Projects/Parallel KDC
Problem
The KDC is a single-threaded daemon: once it receives a complete request from a client, it fully processes that request before receiving another. The performance consequences of this are threefold:
- Only one CPU services KDC requests, including cryptography operations.
- When the KDC is reading data from disk (such as the replay cache or a BDB database), it does nothing else.
- If the KDB module retrieves data from a remote source (such as an LDAP query), the KDC does nothing while waiting for a reply.
Most KDCs experience only moderate load and can service requests quickly. In some circumstances, higher performance may be required.
Candidate Solutions
There are four candidate solutions, the first of which is already available:
- The realm administrator can run multiple KDC processes on the same host, each listening on a different port, each accessing the same database. This is possible with the current implementation, and SRV records can be used to avoid the need for client configuration; however, it does not yield optimal performance. Each client request will select a port without knowing whether the KDC process servicing that port is busy, and will wait for a timeout before trying another port. Moreover, MIT krb5 client code does not implement randomization of equal-priority SRV records, so randomization of SRV responses by the DNS infrastructure would be necessary for load-balancing to occur, and such randomization is sometimes defeated by caching. Parallelism is limited to the number of KDC processes.
- We could make the KDC event-oriented. This approach would require refactoring the entire KDC code base and all KDB modules. The DAL would have to provide KDB modules access to the listen_and_process main loop, and all DAL requests would have to be structured with callbacks or other mechanisms to allow the answer to arrive after further iterations of the main loop. This approach would only solve the problem of allowing the KDC to perform work while waiting for remote data sources such as LDAP; it would not allow multiple CPUs to service KDC requests or allow the KDC to perform work while waiting for disk accesses to complete.
- We could make the KDC multithreaded. This approach would require eliminating all use of global state (in particular, the kdc_active_realm variable and all of the macros such as kdc_context which derive from it) and ensuring that all library code used by the KDC is thread-safe. Any mistakes in thread-safety might result in difficult-to-debug race conditions, some of which might have security consequences.
- We could make the KDC use a multi-process worker model. After setting up its initial state including listener sockets, the KDC would fork multiple subprocesses. The set of idle subprocesses would compete for UDP packets or incoming TCP connections on the listener sockets, invisibly to clients. Once a worker process has obtained a request, it would service it according to the current single-threaded logic. Parallelism would be limited to the number of worker processes.
This project proposes to implement the fourth option, as it requires minimal code changes and does not introduce much additional risk.
Design of Proposed Solution
A new option will need to be added to the getopt() loop in initialize_realms() to specify the number of worker processes. The -w option is a reasonable choice since it is currently unused.
Code to create the worker processes will be invoked from main() after the call to write_pid_file(). The parent process will act as a proxy for SIGTERM and SIGHUP so that killing the pid in the pid file terminates or signals all worker processes. When any child process exits, the parent will terminate the other worker processes and exit.
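The network code will need to set UDP listener sockets to non-blocking mode (TCP listener sockets already are), and process_packet() will need to ignore EAGAIN errors instead of logging them. When several idle workers wake up for the same packet, only one of them will read it; the others will see EAGAIN, which is expected rather than a fault.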
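The non-blocking change could be sketched as follows. The helper names set_nonblocking() and try_recv_request() are invented for this example and do not exist in the KDC sources; the point is only that EAGAIN (or EWOULDBLOCK) is reported as "nothing to do" rather than as an error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put a listener socket into non-blocking mode.  Returns 0 or -1. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/*
 * Hypothetical helper: try to read one datagram from a shared listener.
 * Returns 1 and sets *len_out if a datagram was read, 0 if none was
 * waiting (for instance because another worker read it first), and -1
 * on any other error, with errno set.
 */
int
try_recv_request(int fd, void *buf, size_t buflen, size_t *len_out)
{
    ssize_t n = recvfrom(fd, buf, buflen, 0, NULL, NULL);

    if (n >= 0) {
        *len_out = (size_t)n;
        return 1;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* Expected with multiple workers. */
    return -1;                  /* Worth logging. */
}
```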
On platforms with no IP_PKTINFO support, the KDC must bind to each interface address in order to send UDP replies from the same address on which the request was received. When the host network is reconfigured, the KDC will try to recognize this and rebind the UDP listener ports. This cannot be done in the worker subprocesses since multiple processes cannot all bind to the same port. It would be possible to rebind the ports in the supervisor and then terminate and restart the worker processes, but that would be very complicated. The proposed solution is to disable reconfiguration when worker processes are in use and document the limitation.
KDB modules must be independently opened in each worker process, rather than being opened once and having the resulting process state cloned. It is still a good idea to open and then close the KDB module in the supervisor, prior to starting worker processes, in order to get a more controlled failure if a module is misconfigured. This will be handled by calling finish_realms() prior to creating the worker processes, and then calling initialize_realms() again inside each child process.
The logging code will need to be examined to make sure that concurrent access to the same logging sinks would not create problems.
Additional attention to bug #1671 (no file locking used by replay cache) may be necessary to evaluate whether there is a security impact on a multi-process KDC, keeping in mind that allowing one replay to each independent KDC process is typically not considered a serious security threat in master/slave scenarios. This may be moot in light of the SecurID project work, as the replay cache may no longer be needed in the KDC.
Testing Plan
In most test scenarios, the KDC processes requests too quickly for any difference in behavior from a multi-process worker model to be measurable. It should be possible to test this by hand by temporarily modifying the BDB back end to sleep() for a minute when looking up a particular principal name such as "slowuser". While testing, note that libkrb5 will retry requests after a timeout, so a single "kinit slowuser" will cause multiple worker processes to block unless the retry logic is disabled in the client code. Client retry logic can be disabled in lib/krb5/os/sendto_kdc.c by changing MAX_PASS from 3 to 1 and changing every value assigned to socktype2 to 0 (for example, in place of SOCK_STREAM).
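The temporary back-end modification could be as small as the sketch below. The function names, the macros, and the assumption that the lookup path has the unparsed principal name available as a string are all hypothetical; where the call would actually go in the BDB back end is not specified here.

```c
#include <string.h>
#include <unistd.h>

#define SLOW_PRINCIPAL  "slowuser"
#define SLOW_DELAY_SECS 60

/*
 * Hypothetical test hook: return how long a lookup of the given unparsed
 * principal name (such as "slowuser@EXAMPLE.COM") should stall, in
 * seconds.  Every other name gets no delay.
 */
unsigned int
lookup_delay(const char *name)
{
    size_t n = strlen(SLOW_PRINCIPAL);

    if (strncmp(name, SLOW_PRINCIPAL, n) == 0 &&
        (name[n] == '\0' || name[n] == '@'))
        return SLOW_DELAY_SECS;
    return 0;
}

/* Call from the lookup path while testing; blocks only this worker. */
void
stall_if_slow(const char *name)
{
    unsigned int secs = lookup_delay(name);

    if (secs > 0)
        sleep(secs);
}
```

With a hook like this in place, a "kinit slowuser" should tie up one worker for a minute while requests for other principals continue to be answered by the remaining workers.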
Automated testing of this functionality would be pretty tricky; we would need a special stub KDB back end to cause worker processes to block, as well as a way to control the client retry loop.