HTTP caching explained

11 Feb 2009 by David Corvoysier
Web developers can use HTTP caching mechanisms to significantly improve the performance and robustness of their sites. These mechanisms are, however, not well known and often poorly understood. This article aims to clarify them without going into too much detail. For a more complete description of these mechanisms, please refer to the HTTP/1.1 RFC.

The HTTP/1.1 protocol includes two basic "caching" mechanisms to:
  • eliminate the need to send requests in many cases, using an "expiration" mechanism,
  • eliminate the need to send full responses in many other cases, using a "validation" mechanism.

The HTTP/1.1 Expiration model

The HTTP/1.1 expiration model, as defined in [RFC2616 §13.2], is designed to spare the client from making requests to the origin server until the corresponding content has expired.

The expiration model does not require the origin server to specify explicit expiration dates for its content. When the server does not provide one, each client falls back on its own caching algorithm.

Note: A simple heuristic is suggested in [RFC2616 §13.2], based on the last modification date of a document. If that information is provided, the document is typically considered fresh for ten percent of the interval between its last modification date and its reception date.

However, deterministic client behaviour is often desirable to avoid server congestion, and two headers can be used to explicitly specify the expiration date of a given content:
  • Expires (see [RFC2616 §14.21]),
  • Cache-Control, max-age directive (see [RFC2616 §14.9]).

Note: When both are present, the max-age directive takes precedence over the Expires header.

Example: A page that expires in ten minutes (the two headers are redundant here):
HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 12:33:48 GMT
Server: Apache/2.2.8
Cache-Control: max-age=600
Expires: Mon, 06 Oct 2008 12:43:48 GMT
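
The way a client could derive a freshness lifetime from these headers can be sketched in a few lines of Python. This is only an illustration, not a complete cache: the function name freshness_lifetime, the plain-dict representation of the headers (with exact header-name casing) and the heuristic_fraction parameter are assumptions made for this example. It also assumes that the Date header is present.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, heuristic_fraction=0.1):
    """Return how long a response stays fresh as a timedelta, or None.

    `headers` is a plain dict of response headers (an assumption of
    this sketch; real clients use case-insensitive header maps).
    """
    # 1. The max-age directive takes precedence over Expires.
    match = re.search(r"\bmax-age=(\d+)",
                      headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        return timedelta(seconds=int(match.group(1)))

    date = parsedate_to_datetime(headers["Date"])

    # 2. Otherwise use Expires, measured against the Date header.
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"]) - date

    # 3. No explicit expiration: fall back on the heuristic, a fraction
    #    of the interval between last modification and reception.
    if "Last-Modified" in headers:
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        return (date - last_modified) * heuristic_fraction

    # Nothing to base a lifetime on.
    return None
```

Applied to the ten-minute example above, this yields a lifetime of 600 seconds, whether it is derived from max-age or from Expires.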
Moreover, the Cache-Control header can be used to further control cache handling on the client side, through a set of directives that act directly on the client caching algorithm.

Example: Never reuse this page without revalidating it first (the Expires header will be ignored):
HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 12:33:48 GMT
Server: Apache/2.2.8
Cache-Control: no-cache
Expires: Mon, 06 Oct 2008 12:43:48 GMT
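Always revalidate this page once it has become stale; a stale copy must never be served without revalidation (the Expires header still sets the expiration date):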
HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 12:33:48 GMT
Server: Apache/2.2.8
Cache-Control: must-revalidate
Expires: Mon, 06 Oct 2008 12:43:48 GMT
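
To make the effect of these directives concrete, here is a minimal Python sketch of how a client might take them into account. It follows the RFC 2616 semantics, where no-cache forces revalidation before every reuse and must-revalidate forbids serving a stale copy. The function names, the is_fresh and origin_reachable parameters, and the policy of serving a stale copy only when the origin server cannot be reached are assumptions made for this example. The parser is deliberately naive: it does not handle quoted arguments that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict.

    Directives without an argument map to None. Quoted arguments
    containing commas are not supported by this simple parser.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def may_serve_from_cache(cache_control, is_fresh, origin_reachable=True):
    """Decide whether a stored response may be served without revalidation."""
    directives = parse_cache_control(cache_control)

    # no-cache: the stored copy must be revalidated before every reuse.
    if "no-cache" in directives:
        return False

    # A fresh copy can be served directly.
    if is_fresh:
        return True

    # must-revalidate: a stale copy must never be served as is.
    if "must-revalidate" in directives:
        return False

    # Assumed policy: serve a stale copy only if the origin is unreachable.
    return not origin_reachable
```

For instance, may_serve_from_cache("must-revalidate", is_fresh=True) returns True, while may_serve_from_cache("no-cache", is_fresh=True) returns False.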

The HTTP/1.1 Validation model

The HTTP/1.1 validation model, as defined in [RFC2616 §13.3], lets the client check whether a content has changed before fetching it again from the server.

To check whether a stale cache entry is still valid, the client inserts specific headers in the request it sends to the origin server, passing back a tag it received in the initial response. If the origin server recognizes that the tag corresponds to a content that has not changed, it responds with "304 Not Modified". Otherwise, it sends back the updated document.

Note: Evidence that this mechanism is at work can thus be found by looking for "304 Not Modified" responses in the web server logs.

The following two headers are typically used to tag a particular content:
  • Last-Modified (see [RFC2616 §14.29]),
  • ETag (see [RFC2616 §14.19]).
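
The exchange described above can be sketched in Python as follows. In HTTP/1.1, the client passes these tags back in the If-None-Match and If-Modified-Since request headers. Everything else here is an assumption made for illustration: the function names, the plain-dict headers, and the simplification that If-None-Match carries a single tag compared by strict equality (the real header can carry a list of tags or "*").

```python
from email.utils import parsedate_to_datetime


def conditional_headers(cached_headers):
    """Build validation headers from the headers of a cached response."""
    request_headers = {}
    if "ETag" in cached_headers:
        request_headers["If-None-Match"] = cached_headers["ETag"]
    if "Last-Modified" in cached_headers:
        request_headers["If-Modified-Since"] = cached_headers["Last-Modified"]
    return request_headers


def origin_status(request_headers, current_etag, current_last_modified):
    """Return 304 if the client's tag still matches the resource, else 200."""
    # The entity tag, when present, is checked first.
    if "If-None-Match" in request_headers:
        unchanged = request_headers["If-None-Match"] == current_etag
        return 304 if unchanged else 200

    # Otherwise compare modification dates.
    if "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        modified = parsedate_to_datetime(current_last_modified)
        return 304 if modified <= since else 200

    # Unconditional request: send the full document.
    return 200
```

On a 304 status, the client keeps using its cached copy; on a 200, it replaces the copy with the body of the new response.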

Caching documents retrieved through HTTPS

By default, most browsers do not cache documents retrieved through HTTPS, for security reasons. However, the origin server can use the Cache-Control header to declare a document public, which allows it to be cached.

Example:
HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 12:33:48 GMT
Server: Apache/2.2.8
Cache-Control: public
Expires: Mon, 06 Oct 2008 12:43:48 GMT