Developers
AvantGo Channel Developer Guide



How caching works — technical details

The AvantGo sync server works very much like any other proxy server. If you are not familiar with proxy servers, that is OK. Read on.



HTTP headers

When a web server sends a page, whether to an ordinary browser or to the AvantGo sync server, it first sends a group of HTTP headers that look something like this:

  

  HTTP/1.1 200 OK
  Date: Fri, 21 Jan 2000 17:28:02 GMT
  Server: Apache/1.3.9 (Unix)
  Cache-Control: max-age=18000
  Last-Modified: Fri, 21 Jan 2000 17:27:09 GMT
  ETag: "0-3cb-38888fe5"
  Accept-Ranges: bytes
  Content-Length: 1971
  Connection: close
  Content-Type: text/html

  

The Cache-Control header and the max-age directive

Most of these headers are of little importance right now (although some of them are interesting). The one that matters is the line that reads Cache-Control: max-age=18000. The AvantGo sync server uses this line to determine how old the web page is allowed to be before we have to check for a fresh version. The value is in seconds, so these headers tell the AvantGo sync server that after 18000/3600 = 5 hours, we should consider the associated web page out of date.
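To make the arithmetic concrete, here is a minimal Python sketch of reading the max-age directive out of a Cache-Control value. The function name parse_max_age is made up for this illustration; it is not part of any AvantGo software.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    # Cache-Control directives are comma-separated, e.g. "public, max-age=18000".
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


max_age = parse_max_age("max-age=18000")
print(max_age, "seconds =", max_age / 3600, "hours")  # 18000 seconds = 5.0 hours
```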

Note: As you will see, there are other ways to control caching, such as setting up an Expires: header, but we prefer using Cache-Control instead.

Suppose that the AvantGo sync server is grabbing a page from your channel with the above headers. It would notice the Cache-Control header, and keep the document on its hard drive. It would also note the current time and the document's Last-Modified date.

Note: Technically, we add on a few seconds depending on how long it took us to retrieve the document, but this is close enough for the purposes of instruction.

Figure 5-7 AvantGo sync server stores information about cached pages

Calculating a page's age

For the purposes of determining when a web page is "born", we look at the time we first access a document, not the document's Last-Modified date. This means that if you create a page at 10:00 with a max-age of 1 hour, but nobody requests it until 10:04, it will expire at 11:04 instead of 11:00. If you want to be incredibly strict about when your web page expires, use an Expires header instead of a max-age value. But this can often be problematic; see Using Expires headers.
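The 10:00 / 10:04 example can be sketched in a few lines of Python. The calendar date and the function name expiry_from_first_access are assumptions made only so the example runs; the point is simply that the max-age clock starts at first access, not at creation.

```python
from datetime import datetime, timedelta


def expiry_from_first_access(first_access: datetime, max_age_seconds: int) -> datetime:
    """With max-age, the page expires max_age_seconds after it is first fetched."""
    return first_access + timedelta(seconds=max_age_seconds)


created = datetime(2000, 1, 21, 10, 0)        # page written at 10:00 (date is arbitrary)
first_access = datetime(2000, 1, 21, 10, 4)   # first request arrives at 10:04

expires = expiry_from_first_access(first_access, 3600)
print(expires.strftime("%H:%M"))  # 11:04, not 11:00
```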

Suppose that later, a different user requests that same document from your channel. The AvantGo sync server knows it has that page locally on its hard drive. The first thing it will do is calculate the page's age. The local time is 19:40:17 GMT, and your page was "born" at 17:28:02 GMT, so the page's age is 2:12:15, or 7935 seconds.
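The age calculation is plain subtraction of timestamps. A small sketch, using the times from the example (the date comes from the sample Date header; the function name page_age_seconds is hypothetical):

```python
from datetime import datetime, timezone


def page_age_seconds(born: datetime, now: datetime) -> int:
    """Age of a cached page: time elapsed since the sync server first fetched it."""
    return int((now - born).total_seconds())


born = datetime(2000, 1, 21, 17, 28, 2, tzinfo=timezone.utc)
now = datetime(2000, 1, 21, 19, 40, 17, tzinfo=timezone.utc)

age = page_age_seconds(born, now)
print(age)  # 7935, i.e. 2 h 12 min 15 s
```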

Figure 5-8 Cached page sent when max-age not exceeded

Determining if the page is still "fresh"

The AvantGo sync server compares this value to the max-age value. Since 7935 is less than 18000, the page is considered fresh and will be uploaded to the client without having to contact your server over the Internet.

Note: Actually, we do a second level of caching here. The client on the mobile device also keeps track of the document's age and its max-age. If the version of the document on the mobile device is still fresh, we do not even bother requesting the page in the first place. If we get a new version of the page, but a checksum reveals that the two pages are exactly alike, we do not upload the same document again. This saves the user a lot of time, as uploading data to the mobile device can be a slow process.

This cache stored on the client is known as the client cache. The cache on the AvantGo sync server, which is used by many different clients, is called the shared cache.
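The checksum comparison mentioned in the note above might look roughly like the following. This guide does not say which checksum is used, so SHA-256 here is purely a stand-in, and needs_upload is a hypothetical name.

```python
import hashlib


def needs_upload(copy_on_device: bytes, newly_fetched: bytes) -> bool:
    """Return True only if the newly fetched page differs from the device's copy."""
    # Assumption: SHA-256 stands in for whatever checksum is actually used.
    old_sum = hashlib.sha256(copy_on_device).hexdigest()
    new_sum = hashlib.sha256(newly_fetched).hexdigest()
    return old_sum != new_sum


print(needs_upload(b"<html>same</html>", b"<html>same</html>"))  # False: skip the upload
print(needs_upload(b"<html>old</html>", b"<html>new</html>"))    # True: send the new page
```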

Suppose a few more hours go by, and once again, somebody requests this page from your channel. The local time is 22:36:44. Now, that page is 5:08:42, or 18522 seconds, old. 18522 is greater than 18000, so the page is considered stale.
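Both outcomes, fresh at 19:40:17 and stale at 22:36:44, come down to one comparison. A minimal sketch follows; is_fresh is a made-up name, and treating an age exactly equal to max-age as stale is an assumption, since the text only covers the less-than and greater-than cases.

```python
def is_fresh(age_seconds: int, max_age_seconds: int) -> bool:
    """A cached page is fresh while its age is below max-age."""
    # Assumption: age == max-age counts as stale.
    return age_seconds < max_age_seconds


MAX_AGE = 18000  # from Cache-Control: max-age=18000

print(is_fresh(7935, MAX_AGE))   # True: serve the cached copy
print(is_fresh(18522, MAX_AGE))  # False: check with the origin server
```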

Determining if a newer page is available

At this point, we have to go out on the Internet and see if there is a newer page available on your server. We do something rather clever: we ask whether the page has been modified since the Last-Modified date we have for our local copy. This is done with a conditional GET, that is, a GET request that carries an If-Modified-Since header, for those of you familiar with HTTP/1.1.

If the page has not been modified, the server on the Internet will tell us that the page has not changed. This comes in the form of a 304 Not Modified response. If this is the case, then even though the AvantGo sync server's copy of the page is stale, it is still the most recent, and we serve it anyway. By doing so, we avoid loading the entire page over again from your web server. All we need to receive is the little 304 response telling us that nothing has changed.

Note: Not all web servers add a Last-Modified date by default. If your web server does not, you should configure it so that it does.

Figure 5-9 Cached page sent when source page has not changed

If the page on the Internet has been updated, then we go out and grab the newer, fresher version of the page, note the time, and then use this newer page in our cache.

Figure 5-10 Outdated cached page is updated from source page
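The two outcomes of the If-Modified-Since check can be sketched as follows. This is an illustration only, not the sync server's code: CachedPage, revalidate, and the fetch callback are hypothetical names, and the callback stands in for whatever actually sends the HTTP request. Whether the stored time is reset after a 304 is not stated here, so the sketch leaves the cached entry untouched in that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedPage:
    body: bytes
    last_modified: str  # Last-Modified header value, kept verbatim
    stored_at: float    # when this copy was stored (seconds, any clock)


# fetch(url, request_headers) -> (status, response_headers, body); hypothetical hook
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def revalidate(url: str, cached: CachedPage, fetch: Fetch, now: float) -> CachedPage:
    """Ask the origin whether a stale page changed; return the page to serve."""
    status, headers, body = fetch(url, {"If-Modified-Since": cached.last_modified})
    if status == 304:
        # Nothing changed: the stale copy is still the most recent one.
        return cached
    if status == 200:
        # The page was updated: store the new body and note the time.
        return CachedPage(body, headers.get("Last-Modified", ""), now)
    raise RuntimeError(f"unexpected status {status} for {url}")


def unchanged_origin(url: str, request_headers: Dict[str, str]):
    """Fake origin server that always answers 304 Not Modified."""
    return 304, {}, b""


cached = CachedPage(b"<html>old</html>", "Fri, 21 Jan 2000 17:27:09 GMT", 0.0)
served = revalidate("http://www.mywebsite.com/somepage.html", cached, unchanged_origin, 18522.0)
print(served is cached)  # True: the cached copy is served again
```

Passing the transport in as a callback keeps the decision logic separate from networking, which makes the 304 and 200 paths easy to exercise without a live server.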

What about those desktop browser caches?

Is this the same thing as the local hard-drive cache that Netscape or IE uses?

No. The cache that you have on your hard drive is similar to the client cache that is stored on the mobile device. But both of them function differently from the shared cache that is used on the AvantGo sync server.

Seeing the HTTP headers for a page

How do I see those HTTP headers for my web page?

The easiest way is to telnet to your web server using port 80. So, you could type telnet www.mywebsite.com 80 (host name, then a space, then the port number) from the Start|Run menu in Windows, or use your own favorite telnet client.

Then enter:

  GET /somedirectory/somepage.html HTTP/1.1 (Enter)

  Host: www.mywebsite.com (then hit Enter twice)

You should get the HTTP headers for that page, followed by the HTML. The server will then disconnect, although with HTTP/1.1 it may hold the connection open for a short keep-alive period first. It is probably worth saving the session to a log file or having a big buffer, so you can see the headers at the beginning of the output.
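If you would rather script this than type it into telnet, the same exchange can be sketched with Python's standard socket module. The function names are invented for this example, and the extra Connection: close header is an addition (not part of the telnet session above) so that the server hangs up as soon as the page has been sent.

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Build the same request you would type into telnet, ending with a blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def split_headers(response: bytes) -> str:
    """Return only the status line and headers (everything before the first blank line)."""
    head, _, _ = response.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1")


def fetch_headers(host: str, path: str, port: int = 80) -> str:
    """Connect, send the request, read until the server disconnects."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_headers(b"".join(chunks))


# Example call (needs network access; host and path are the placeholders used above):
# print(fetch_headers("www.mywebsite.com", "/somedirectory/somepage.html"))
```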


