logo_kerberos.gif

Projects/replay cache collision avoidance

From K5Wiki
Jump to: navigation, search
This project was completed in release 1.7.

Background

The MIT Kerberos replay cache follows the guidelines of RFC 4120 Section 10. Security Considerations:

  Implementation note: If a client generates multiple requests to the
  KDC with the same timestamp, including the microsecond field, all but
  the first of the requests received will be rejected as replays.  This
  might happen, for example, if the resolution of the client's clock is
  too coarse.  Client implementations SHOULD ensure that the timestamps
  are not reused, possibly by incrementing the microseconds field in
  the time stamp when the clock returns the same time for multiple
  requests.

As computers have become faster and greater numbers of processes or threads are requiring network authentication to network services, it is becoming more likely that the microseconds field will match for multiple requests. It is also more difficult to avoid such collisions without introducing a significant performance hit. As a result, an ever increasing number of false positives are triggered even though the authenticators used by the various client processes differ.

The current FILE replay cache format consists of a file, starting with a two-byte big-endian version number. Integers in the rest of the file are in host byte order, surprisingly. Following the version number is a lifespan value in host byte order, and the remainder of the file consists of records, each of which includes the following fields:

 unsigned int   - client length == strlen(client principal name) + 1
 variable       - client principal name (NUL terminated C-string)
 unsigned int   - server length == strlen(server principal name) + 1
 variable       - server principal name (NUL terminated C-string)
 krb5_int32     - micro seconds
 krb5_timestamp - timestamp

These are the fields that are required by RFC 4120 Section 10. Security Considerations:

  Unless the application server provides its own suitable means to
  protect against replay (for example, a challenge-response sequence
  initiated by the server after authentication, or use of a server-
  generated encryption subkey), the server MUST utilize a replay cache
  to remember any authenticator presented within the allowable clock
  skew.  Careful analysis of the application protocol and
  implementation is recommended before eliminating this cache.  The
  replay cache will store at least the server name, along with the
  client name, time, and microsecond fields from the recently-seen
  authenticators, and if a matching tuple is found, the
  KRB_AP_ERR_REPEAT error is returned.

As stated the requirement is that at least the client and server principals along with the complete time stamp are required to be present. Additional data such as a hash of a canonical representation of the authenticator or the full clear text of a canonical representation of the authenticator would be permitted by RFC 4120 and could be used to avoid false positives.

Simply adding new fields to the MIT Kerberos replay cache record would not be an acceptable solution. As indicated in RFC 4120, it is imperative that all services sharing the same service principal share the same replay cache regardless of which Kerberos implementation is in use.

  If multiple servers (for example, different services on one machine,
  or a single service implemented on multiple machines) share a service
  principal (a practice that we do not recommend in general, but that
  we acknowledge will be used in some cases), either they MUST share
  this replay cache, or the application protocol MUST be designed so as
  to eliminate the need for it.  Note that this applies to all of the
  services.  If any of the application protocols does not have replay
  protection built in, an authenticator used with such a service could
  later be replayed to a different service with the same service
  principal but no replay protection, if the former doesn't record the
  authenticator information in the common replay cache.

As a result, the file format used by MIT Kerberos has been implemented in other Kerberos implementations. It would not be safe to change the replay cache file format in a manner that prevented the sharing of the replay cache among all of the implementations that can be expected to be deployed on a single machine, whether they be from different versions of MIT Kerberos or implementations from different vendors.

The Proposal

In order to avoid false positives we require that the replay cache data structure store sufficient data to be able to distinguish between two authenticators while at the same time maintaining compatibility with existing deployments.

To do this, we define a new type of extension record consisting of the following:

 unsigned int   - 1 (length of sentinel)
     unsigned char  - null byte (sentinel to mark a "new" record)
 unsigned int   - length of authenticator-hash extension
     variable       - extension identification "HASH:"
     variable       - hex-encoded hash of authenticator ciphertext
     char           - space
     variable       - ASCII decimal length of client principal name
     char           - ":"
     variable       - client principal name
     char           - space
     variable       - ASCII decimal length of server principal name
     char           - ":"
     variable       - server principal name
     char           - null terminator
 krb5_int32     - microseconds
 krb5_timestamp - timestamp

The extension identification scheme allows transition to future enhanced replay cache file formats. We eschew binary encoding because embedded null bytes will cause old implementations to truncate that component of the record because they use strlen to determine the length of the principal name to write out.

This record would be followed by the existing record consisting of the client and server principal names, the time stamp and the microseconds value. A complete entry therefore takes up two existing records. We cannot assume that the two records will be read in the same order as they were written, as an expunge operation might reorder entries in the file (the MIT krb5 code reverses the order of records in the file, for instance). Therefore, extension records are defined to supersede, rather than augment, the normal style of record.

Hashing the ciphertext is relatively safe given reasonable assumptions about the security of the encryption, and provides additional opportunities to prevent false positives. (The mandatory components of an authenticator might be identical between two successive legitimate requests.) The default hash method would be a MD5 hash. This will be fast to compare, and if there are collisions, the worst case is a false-positive indication that would have happened anyway in the old implementation. It is not important that the hash be very collision-resistant because a collision merely creates a (relatively unlikely) denial of service.

How does this provide backward compatibility?

New servers reading entries written by new servers will interpret the hash entries containing the hash values, and use those in preference to the old-style records which do not contain hash values.

For new servers reading entries written by old servers, the string fields will be valid principal names and the time stamp value will not be a small integer. As a result, the server will know it is dealing with an old style entry and perform an old style check. This might lead to false positives but there is nothing we can do without additional information that is not available.

For old servers reading entries written by new servers, the new hash based entry will never match the incoming principal names and will therefore be skipped. The old style entries will be used as they are today.

For old servers reading entries written by old servers, the behavior will be the same as today.

Is this a long term fix for the problem?

Perhaps this is the best we can do. RFC 4120 Section 10. Security Considerations describes how the application server must behave when it is unsure that it has enough replay data to cover the allowable clock skew period:

  If a server loses track of authenticators presented within the
  allowable clock skew, it MUST reject all requests until the clock
  skew interval has passed, providing assurance that any lost or
  replayed authenticators will fall outside the allowable clock skew
  and can no longer be successfully replayed.  If this were not done,
  an attacker could subvert the authentication by recording the ticket
  and authenticator sent over the network to a server and replaying
  them following an event that caused the server to lose track of
  recently seen authenticators.

That implies that at the very least an backward-incompatible change in file format would result in an outage.

In addition, a backward-incompatible change in file format would make it impossible to share a replay cache among services built against different Kerberos implementations. While it is certainly possible to create a faster implementation that is built around B-trees, the new format could only be used if it was known that only MIT Kerberos would ever use that replay cache.

Implementation considerations

The krb5_rc_store function is an internal API. The krb5_donot_replay structure can gain a new member containing the hash of the authenticator.

The key to detecting one of these new extension records is finding a replay cache entry that appears to have a zero-length string for the server principal name. The search mechanism that krb5_rc_dfl_store uses will take into account the authenticator hash.

Old implementations will read the entire extension records in the new replay cache entry, but ignore the new entries due to the zero-length strings.

When reading in a replay cache, we expect to see "replays" between the old-style and new-style entries written out by new code. These need to be handled by letting records with message hashes supersede those without.

Review

This section documents the review of the project according to Project policy. It is divided into multiple sections. First, approvals should be listed. To list an approval type

#~~~~

on its own line. The next section is for discussion. Use standard talk page conventions. In particular, sign comments with

--~~~~

and indent replies.

Members of Krbcore raising Blocking objections should preface their comment with {{project-block}}. The member who raised the objection should remove this markup when their objection is handled.

Approvals

Greg Hudson

Discussion