This article has no real structure. It is a collection of important information and my own findings about caching web elements.
Caching controls of web elements
General rules:
– The ‘Expires:’ header comes from HTTP/1.0
– The ‘Cache-Control: max-age=xxxx’ header comes from HTTP/1.1 and OVERRIDES ‘Expires:’
Extract from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html:
If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header, even if the Expires header is more restrictive. This rule allows an origin server to provide, for a given response, a longer expiration time to an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache. This might be useful if certain HTTP/1.0 caches improperly calculate ages or expiration times, perhaps due to desynchronized clocks.
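The override rule can be illustrated with a small sketch of how a cache might derive the freshness lifetime of a response. This is only an illustration under simple assumptions: the function name freshness_lifetime and the plain dict of headers are hypothetical, and real caches also account for the Age header, s-maxage and heuristics.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Return the freshness lifetime in seconds, or None if no header sets one.

    headers is a plain dict of response headers (hypothetical helper input);
    response_time is a timezone-aware datetime of when the response arrived.
    """
    cache_control = headers.get("Cache-Control", "")
    # max-age wins whenever it is present, even if Expires is more restrictive.
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control)
    if match:
        return int(match.group(1))
    expires = headers.get("Expires")
    if expires:
        expires_at = parsedate_to_datetime(expires)
        # An Expires date in the past means the response is already stale.
        return max(0, int((expires_at - response_time).total_seconds()))
    return None


received = datetime(2009, 2, 13, 10, 9, 56, tzinfo=timezone.utc)
both = {
    "Expires": "Fri, 13 Feb 2009 11:09:56 GMT",
    "Cache-Control": "public, max-age=86400",
}
print(freshness_lifetime(both, received))  # max-age is used, not Expires
```

With both headers present the sketch returns the max-age value; with only Expires it falls back to the difference between the Expires date and the time the response was received.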
Using mod_expires:
Activating mod_expires:
ExpiresActive On
– The ‘Expires:’ header is sent by the server in this format, e.g.
Expires: Fri, 13 Feb 2009 11:09:56 GMT
mod_expires modifies and/or creates the headers:
‘Expires:’ and ‘Cache-Control: max-age=xxxxx’
xxxxx = number of seconds from now after which the element is no longer valid, e.g.
Cache-Control: max-age=31536000 (One year from now)
Syntax:
ExpiresByType MIME-type {code}{seconds}
code: ‘M’ = modification date of the file, ‘A’ = access date (time of the client’s request)
seconds: number of seconds after that date at which the cached element becomes invalid
Non verbose config examples:
# expire GIF images after a month in the client's cache
ExpiresByType image/gif A2592000
# HTML documents are good for a week from the time they were changed
ExpiresByType text/html M604800
Verbose config examples:
ExpiresByType text/html "access plus 1 month 15 days 2 hours"
ExpiresByType image/gif "modification plus 5 hours 3 minutes"
Note: if you use a ‘modification date’ based setting, the Expires header will not be added to content that does not come from a file on disk. This is due to the fact that there is no modification time for such content.
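To show how the two syntaxes relate, here is a small sketch that converts a verbose interval into the compact {code}{seconds} form. The helper name to_compact is hypothetical, and the unit lengths are assumptions: a month is taken as 30 days (matching the A2592000 example above) and a year as 365 days.

```python
# Seconds per unit; month and year lengths are assumptions (30 and 365 days).
UNIT_SECONDS = {
    "year": 365 * 86400,
    "month": 30 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
    "second": 1,
}
# "now" is treated as equivalent to "access".
BASE_CODES = {"access": "A", "now": "A", "modification": "M"}


def to_compact(verbose):
    """Convert e.g. 'access plus 1 month' into 'A2592000'."""
    words = verbose.split()
    if not words or words[0] not in BASE_CODES:
        raise ValueError("expected access, now or modification as the base")
    rest = words[1:]
    if rest and rest[0] == "plus":  # the keyword "plus" is optional
        rest = rest[1:]
    if not rest or len(rest) % 2:
        raise ValueError("expected pairs of <number> <unit>")
    total = 0
    for amount, unit in zip(rest[0::2], rest[1::2]):
        total += int(amount) * UNIT_SECONDS[unit.rstrip("s")]
    return f"{BASE_CODES[words[0]]}{total}"


print(to_compact("access plus 1 month 15 days 2 hours"))
print(to_compact("modification plus 5 hours 3 minutes"))
```

Under these assumptions the two verbose examples above correspond to A3895200 and M18180.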
Using mod_headers:
eg.1
Header append Cache-Control "public, max-age=86400"
eg.2
# 1 YEAR (365 days)
Header set Cache-Control "max-age=31536000, public"
IMPORTANT:
By default, the Cache-Control: max-age=xxxx directive on a response implies that the response is cacheable (i.e., “public”) unless some other, more restrictive cache directive is also present.
Caching control notes
From: http://www.askapache.com/htaccess/speed-up-sites-with-htaccess-caching.html
and: http://www.askapache.com/htaccess/caching-tutorial-for-webmasters.html
By setting an expiry time for the files on your Web site, you can go even farther than merely relying on the conditional GET and the 304 response that a server sends when a file has not changed… You can prevent the contact with the server from happening at all by using the Expires header and the Cache-control header.
If a browser receives an image with the cache control headers that say the image can be considered fresh for 2 weeks, then for 2 weeks the image can be pulled directly from the browser’s (or proxy’s) cache on subsequent requests. This is noticeably faster than even a conditional GET and a 304 response from the server since there is no round trip.
After two weeks, a conditional GET would be sent to the server to check the Last-Modified date, then again, no requests would be made for the duration of the specified freshness period.
So, briefly, these 2 mechanisms require no coding changes to your existing Web pages, and they work to avoid unnecessary requests and connections to your Web server for files that do not need to be requested with each visit. Part of the rationale behind ETags is to provide sub-second resolution in validating cached entities, whereas Last-Modified is limited to one-second resolution. The ETag certainly has its uses, but it serves more as an alternative to the Last-Modified value than as an expiry-based caching mechanism.
You can use mod_expires to take care of expires and max-age, and use mod_headers to “manually” configure the following:
Cache-Control: no-store
This object may not be stored in any cache, even the requestor’s browser cache.
Cache-Control: no-cache
This object may be held in any cache but it must be revalidated every time it is requested.
Cache-Control: private
This object can be stored in the requesting browser’s cache but not in a shared cache (e.g. proxy servers) …
Cache-Control: must-revalidate
Tells caches that they must obey any freshness information you give them about an object.
HTTP allows caches to take liberties with the freshness of objects; by specifying this header, you’re telling the cache that you want it to strictly follow your rules.
Cache-Control: proxy-revalidate
Similar to must-revalidate, except that it only applies to proxy caches.
Validators and Validation: Last-Modified and Etag headers
In How Web Caches Work, we said that validation is used by servers and caches to communicate when a representation has changed. By using it, caches avoid having to download the entire representation when they already have a copy locally, but they’re not sure if it’s still fresh.
Validators are very important; if one isn’t present, and there isn’t any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.
The most common validator is the time that the document last changed, as communicated in Last-Modified header. When a cache has a representation stored that includes a Last-Modified header, it can use it to ask the server if the representation has changed since the last time it was seen, with an If-Modified-Since request.
HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make an If-None-Match request, the representation really is the same.
Almost all caches use Last-Modified times in determining if a representation is fresh; ETag validation is also becoming prevalent.
Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically; you won’t have to do anything. However, they don’t know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.
The ETag format for Apache 1.3 and 2.x
Format:
Etag: 'inode-size-timestamp'
After running a number of tests on:
– the mod_expires module
– the mod_headers module
– the FileETag directive
– the mod_gzip module
here is a short report of my findings:
1) The Apache module mod_expires is sufficient to set the cache control headers appropriately, so mod_headers is not needed. It sets the following headers automatically:
– Expires: ‘Date…..’
– Cache-Control: max-age=xxxxxxx
– The ‘Expires:’ header comes from HTTP/1.0
– The ‘Cache-Control: max-age=xxxx’ header comes from HTTP/1.1 and OVERRIDES ‘Expires:’
These headers control whether, and for how long, browsers and proxies may cache web responses.
2) The Last-Modified: and ETag: headers are used to validate a cached object against the original on the web server. The ETag is the newer of the two and has priority.
2b) Although the documentation says otherwise, the directive ‘FileETag None’ doesn’t work in Apache 1.3.xx, but it does work in Apache 2.x.
In Apache 1.3.x, the content of the ETag: header (INode MTime Size) can be reduced to a single value, e.g. ‘FileETag Size’, but the header cannot be eliminated entirely (‘FileETag None’ doesn’t work). Apache 1.3.xx is the version used on servers webc0[0123].
Note: This value can be set to ‘Size’, therefore forcing the cache to use the Last-Modified: header for validation. In any case, since we would be setting ‘Cache-Control: max-age=xxxx’ to one year, no validation would occur before a year has passed anyway.
3) Tests with the mod_gzip module showed that the response content can be reduced to 1/3 of its original size for normal text files (.html, .xml, .css, .js, etc.). By default, mod_gzip sends the header ‘Vary: Accept-Encoding’ to help caches decide whether a client should be served the cached compressed version or not, so I don’t quite understand why the present settings have turned it OFF. Not that it makes much of a difference, because most modern browsers accept the gzip format anyway.
4) Note that if you use a modification date based setting, the Expires header will not be added to content that does not come from a file on disk. This is due to the fact that there is no modification time for such content.
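The size reduction mentioned in finding 3 can be checked quickly outside Apache. The sketch below uses Python's gzip module rather than mod_gzip itself, and the sample markup is made up for illustration; it is highly repetitive, so it compresses much better than a real page would. The exact ratio always depends on the content.

```python
import gzip


def compression_ratio(text):
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Artificial, repetitive sample page (an assumption for the demo only).
sample_html = (
    "<html><body>\n"
    + '<p class="item">Cached content line</p>\n' * 200
    + "</body></html>\n"
)

ratio = compression_ratio(sample_html)
print(f"compressed size is {ratio:.1%} of the original")
```

Running the same function over real .html, .css or .js files gives a more realistic figure for a particular site.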
ETags
After reading the section on cache control and ETags in the O’Reilly book ‘High Performance Web Sites’, I came to the following conclusions:
Normally, when a user loads an object (picture, HTML file, etc.) again, a copy of the object is served immediately from the browser cache if the object has not expired (Expires: and Cache-Control: max-age=….). If the user clicks the Refresh/Reload button, a validation request is sent to the server even if the object in the cache has not yet expired. In this case both the Last-Modified: and ETag information are compared at the server. Both must match at the server for it to send a 304 response to the user’s browser, indicating that the cached object is still valid.
Pragma: no-cache
This header is an HTTP/1.0 mechanism and is mostly no longer used, but many older proxies still follow only HTTP/1.0. To be on the safe side, use it when needed.
Suggestions:
There is no way to prevent the re-validation process when the user clicks the reload/refresh button, but a successful validation does avoid a full reload of the object. If ETags are used in combination with load balancers, the only ETag component that must NOT be present is the ‘INode’. The ‘FileSize’ and ‘TimeStamp’ components (the latter should be identical to Last-Modified:) can stay and do no harm. The reason is that the inode will most likely differ from server to server behind a load balancer. The ideal would be to eliminate ETags altogether, but the difference is strictly one of header size and nothing else.
In any case, since Apache 1.3.xx (which we are using for our main servers) does not allow ETags to be eliminated (FileETag None), my suggestion would be to set it to ‘FileETag Size’ to get just about the same effect regarding validation.
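The load balancer problem can be made concrete with a small sketch. It builds an ETag in the ‘inode-size-timestamp’ layout described earlier for the same file on two servers. The FileInfo tuple, the make_etag helper and the hexadecimal encoding are assumptions for illustration; the exact encoding Apache uses may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for the file attributes a server reads from disk.
FileInfo = namedtuple("FileInfo", "inode size mtime")


def make_etag(info, components=("inode", "size", "mtime")):
    """Join the chosen attributes, hex-encoded, into a quoted ETag value."""
    parts = [format(getattr(info, name), "x") for name in components]
    return '"' + "-".join(parts) + '"'


# The same file deployed on two servers: only the inode differs.
server_a = FileInfo(inode=1001, size=5120, mtime=1234523396)
server_b = FileInfo(inode=2002, size=5120, mtime=1234523396)

print(make_etag(server_a) == make_etag(server_b))        # False
print(make_etag(server_a, ("size", "mtime")) ==
      make_etag(server_b, ("size", "mtime")))            # True
```

With the inode included, a browser that validates against a different server than the one that served the object sees a non-matching ETag and downloads the object again. Without it, both servers produce the same value.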
Useful Cache-Control: response header components
max-age=[seconds]
specifies the maximum amount of time that a representation will be considered fresh. Like Expires, it sets a freshness lifetime, but it is relative to the time of the request rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
s-maxage=[seconds]
similar to max-age, except that it only applies to shared (e.g., proxy) caches.
public
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache.
private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. This allows an origin server to state that the specified parts of the response are intended for only one user and are not a valid response for requests by other users. A private (non-shared) cache MAY cache the response.
Note: This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
no-cache
forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
no-store
instructs caches not to keep a copy of the representation under any conditions.
must-revalidate
tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you’re telling the cache that you want it to strictly follow your rules.
proxy-revalidate
similar to must-revalidate, except that it only applies to proxy caches.
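How a shared cache (e.g. a proxy) might read these directives can be sketched as follows. The function names parse_cache_control and shared_cache_policy are hypothetical, and the parsing is deliberately simplified: it splits on commas and therefore does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directive -> argument."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_policy(value):
    """Decide what a shared cache may do with a response (simplified)."""
    d = parse_cache_control(value)
    # no-store forbids any copy; private forbids copies in shared caches.
    if "no-store" in d or "private" in d:
        return {"store": False, "max_age": None, "revalidate_always": False}
    # For shared caches, s-maxage takes the place of max-age when present.
    max_age = d.get("s-maxage") or d.get("max-age")
    return {
        "store": True,
        "max_age": int(max_age) if max_age is not None else None,
        # no-cache: the copy may be kept but must be validated on every use.
        "revalidate_always": "no-cache" in d,
    }


print(shared_cache_policy("public, max-age=86400"))
print(shared_cache_policy("private, max-age=60"))
```

A browser cache would apply the same rules except that private does not stop it from storing the response and s-maxage does not apply to it.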
Writing Cache-Aware Scripts
Ref: http://www.askapache.com/htaccess/caching-tutorial-for-webmasters.html
By default, most scripts won’t return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what’s in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn’t.
The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it’s probably easiest to do so with Cache-Control: max-age, which will make the request fresh for an amount of time after the request.
If you can’t do that, you’ll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task.
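The core decision a script has to make can be sketched as below. The function name conditional_status and the dict of request headers are hypothetical. The sketch makes two assumptions: If-None-Match takes precedence over If-Modified-Since when both are sent, and weak validators (W/ prefixes) are ignored. A real script would also have to produce the ETag and Last-Modified values itself.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    etag is the current quoted ETag string; last_modified is a
    timezone-aware datetime of the last change to the content.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparsable date: send the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return 304 if last_modified.replace(microsecond=0) <= since else 200

    return 200  # unconditional request


changed = datetime(2009, 2, 13, 11, 9, 56, tzinfo=timezone.utc)
print(conditional_status({"If-None-Match": '"abc"'}, '"abc"', changed))
```

When the result is 304 the script sends only the headers and no body; when it is 200 it sends the full response together with fresh validators.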
Some other tips
Don’t use POST unless it’s appropriate. Responses to the POST method aren’t kept by most caches; if you send information in the path or query (via GET), caches can store that information for the future.
Don’t embed user-specific information in the URL unless the content generated is completely unique to that user.
Don’t count on all requests from a user coming from the same host, because caches often work together.
Generate Content-Length response headers. It’s easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.
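For the last tip, a minimal sketch of generating Content-Length in a script. The helper name build_response is hypothetical. The one thing to watch is that the header counts bytes of the encoded body, not characters.

```python
def build_response(body_text, content_type="text/html; charset=utf-8"):
    """Encode the body and return (headers, body_bytes) with Content-Length set."""
    body = body_text.encode("utf-8")
    headers = {
        "Content-Type": content_type,
        # Length of the encoded bytes, which can exceed the character count.
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_response("Grüße")
print(headers["Content-Length"], len("Grüße"))
```

Here the text has 5 characters but encodes to 7 bytes, so a length based on the character count would be wrong.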