When choosing the tech stack for a project, we often ask ourselves what kind of website we'll be building to better assess the right options for the project.
Every option has its own tradeoffs, so choosing the one that best fits the project requirements is a must. But that decision gets harder when requirements change down the road, and what looks like a good solution now may not be one later.
In this article, we'll go through one of web developers' most complex problems: caching.
Caching
- What should be cached?
- How long should it be cached for?
- How/When should the cache be updated?
These questions are often underestimated, and getting them wrong can lead to unexpected behavior for our users. Before we can answer them, we should know what a cache is and what kinds of caches are available to us.
What is a cache?
A cache is a place to put things. It temporarily stores data so you don't have to fetch it again every time you need it. Depending on your context, it speeds up access to recently or frequently used data.
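To make the idea concrete, here is a toy in-memory cache with a time-to-live, written in TypeScript. The class, the key names, and the 60-second TTL are made up for illustration and aren't tied to any of the HTTP caches discussed below.

```ts
// Toy in-memory cache with a time-to-live (TTL). Purely illustrative.
class TtlCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      // Entry is stale: drop it so the caller fetches fresh data.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: keep expensive lookups around for 60 seconds.
const cache = new TtlCache<string>(60_000);
cache.set("greeting", "hello");
console.log(cache.get("greeting")); // "hello" until 60 seconds pass
```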
Types of caches
There are two main types of caches in the HTTP Caching specification: private caches and shared caches.
Private caches
A private cache is a cache tied to a specific client, typically a browser cache. Since the response isn't shared with other clients, a private cache can store a personalized response for that user.
You must specify the private directive (Cache-Control: private) if a response contains personalized content and you want it stored only in the private cache.
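As a minimal sketch, assuming a plain Node server and a hypothetical /account route, marking a personalized response as private could look like this:

```ts
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Content-Type", "text/html");
  if (req.url === "/account") {
    // Only this user's browser cache may store the personalized page.
    res.setHeader("Cache-Control", "private, max-age=60");
    res.end("<h1>Your account</h1>");
    return;
  }
  res.end("<h1>Home</h1>");
}).listen(3000);
```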
If personalized content is stored in a cache other than a private cache, other users may be able to retrieve it, which can unintentionally leak that information.
Note that responses to requests carrying an Authorization header cannot be stored in a shared cache unless a directive such as public explicitly allows it.
Shared caches
A shared cache sits between the client and the server and can store responses that can be shared among users. Shared caches can be further subdivided into:
- Proxy caches: In addition to access control, some proxies also implement caching to reduce traffic out of the network. This usually isn't managed by the service developer, so it must be controlled with appropriate HTTP headers.
- Managed caches: Service developers explicitly deploy managed caches to offload the origin server and deliver content efficiently. Examples include reverse proxies, CDNs, and service workers in combination with the Cache API.
- It's also possible to ignore the standard HTTP caching mechanisms in favor of explicit manipulation. For example, you can send Cache-Control: no-store to opt out of private and proxy caches while using your own strategy to cache only in a managed cache, as sketched below.
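Here is a minimal sketch of that approach: the origin responds with Cache-Control: no-store so private and proxy caches skip the response, while a service worker stores it explicitly through the Cache API. The cache name and overall structure are illustrative, not a production-ready strategy.

```ts
/// <reference lib="webworker" />
// service-worker.ts (illustrative). The origin sends "Cache-Control: no-store",
// so the browser's private cache and intermediary proxies skip the response;
// this worker caches it explicitly instead.
declare const self: ServiceWorkerGlobalScope;
export {}; // treat this file as a module so the `self` declaration stays local

self.addEventListener("fetch", (event) => {
  event.respondWith(
    caches.open("managed-cache").then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached; // serve straight from our managed cache
      const response = await fetch(event.request);
      await cache.put(event.request, response.clone()); // store it ourselves
      return response;
    })
  );
});
```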
Next, we’ll go over the different caching strategies so you can better understand which one best suits your website.
Static Site Generation
You have a website, and you run the build script to pre-render all pages into static HTML files. After all pages are built, you upload those static assets to a CDN.
So, what's really great about static site generation is that all those documents are now pre-rendered. They're static, sitting on the CDN, waiting for somebody to visit your website and request one of them.
The user makes a request, and the CDN doesn't have to do any work at all. It doesn't have to build the page and doesn't have to render anything. It can just send the document straight to the user, and the user is super happy because it was a fast, cached response, resulting in a snappy experience.
Now let's say you edit some of the data — what does that mean for static site generation? Well, something on the database/CMS changed, but your CDN still has all those static assets from the last deploy. So, if the user visits the page, they will get a fast response, but it will be stale; it's not updated with the new data yet, and sometimes that's fine as well.
To make the latest changes to your data available to your users, you need to rebuild every single page of the website again, even if you only changed the data for one particular page.
You rebuild every page on every deploy and on every edit to your data.
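To illustrate, here is a minimal sketch of what such a build step does. The renderPage helper and the hard-coded page list are stand-ins for your framework's renderer and data source.

```ts
import { mkdirSync, writeFileSync } from "node:fs";

// Illustrative page list; a real generator would pull this from your CMS/database.
const pages = [
  { path: "index.html", title: "Home" },
  { path: "about.html", title: "About" },
];

// Hypothetical renderer standing in for your framework's build-time rendering.
function renderPage(title: string): string {
  return `<!doctype html><html><body><h1>${title}</h1></body></html>`;
}

mkdirSync("dist", { recursive: true });
for (const page of pages) {
  // Every page is rendered at build time, whether anyone ever visits it or not.
  writeFileSync(`dist/${page.path}`, renderPage(page.title));
}
// The dist/ folder is what you upload to the CDN.
```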
Server Side Rendering
- No CDN
In SSR, you don't have a big build step; you upload your website to the internet, and then build the pages on demand. So, when somebody asks for a page, you build the page on your server, and then you send it back.
The user has to wait for the page to build. That's what's nice about static sites: the pages were built ahead of time, so users never have to wait for a build.
This means every visit rebuilds the requested page, but only the visited pages ever get built. If some pages are never visited, you never build them.
In a nutshell, with a server, you only build the pages that people visit as opposed to static site generation, where you build every single page whether people visit them or not.
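Here is a minimal sketch of that on-demand rendering, again assuming a plain Node server and a hypothetical renderPage helper in place of a real framework:

```ts
import { createServer } from "node:http";

// Hypothetical renderer standing in for your framework's server-side rendering.
function renderPage(url: string): string {
  return `<!doctype html><html><body><h1>Rendered ${url} at ${new Date().toISOString()}</h1></body></html>`;
}

createServer((req, res) => {
  // Nothing is pre-built: each request renders HTML on demand,
  // and pages that are never requested are never rendered.
  res.setHeader("Content-Type", "text/html");
  res.end(renderPage(req.url ?? "/"));
}).listen(3000);
```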
- With CDN
The first visitor shows up and requests the page from the CDN, not your server. The CDN doesn't have the document yet, so it goes over to the origin server (your actual web server). The origin server builds the page and sends it to the CDN; the CDN caches the page and then sends it to the user. That user won't be super happy, because they had to wait for the full cycle to finish.
However, with the CDN, the second visitor requests the page from the CDN, and the CDN already has the document, so it can leave the origin server alone and send the response straight back to the user. The user is happy because it was fast, cached, and fresh/accurate. And this isn't just the second visitor: it's the 3rd, the 4th, the 100th, and so on.
When using a CDN, the maximum age can be configured via the Cache-Control HTTP header using the max-age directive. max-age says how long a response should be cached, and its value is in seconds, so you can cache it for 60 seconds, cache it for a day, cache it for a month, etc.
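For example, assuming the same kind of plain Node origin server, telling caches (including the CDN) to keep a page for a minute could look like this:

```ts
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Content-Type", "text/html");
  // "public" allows shared caches to store the response; max-age is in seconds.
  // Many CDNs also honor s-maxage, which applies only to shared caches.
  res.setHeader("Cache-Control", "public, max-age=60");
  res.end("<h1>Cached for a minute</h1>");
}).listen(3000);
```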
Let's say you set the max-age to 60 (1 minute) on a page, so the CDN caches the page for 60 seconds, and the cache expires when 60 seconds have passed. When a cache expires, the CDN will request a fresh page from the origin server, store the fresh page in the cache and send the page to the user.
The next visitor that asks for the same page within those 60 seconds will receive the page from the CDN cache. After 60 seconds, the same process repeats when a new request comes in. In effect, we rebuild the page at most once a minute, and only when requests for it actually come in.
Incremental Static Regeneration
With the stale-while-revalidate Cache-Control directive, when a user requests a page that is cached on the CDN but has expired, the CDN will return the expired (stale) version of the page.
Then, in the background, it makes a request to the origin server to get a fresh version of the same page. After getting the fresh page, it saves it in the cache, and that copy will be used for new user requests.
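A minimal sketch of that setup, assuming a plain Node origin server: the page is fresh for 60 seconds (s-maxage applies to shared caches like CDNs), and caches that support stale-while-revalidate may keep serving the stale copy for up to five more minutes while they refetch in the background.

```ts
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Content-Type", "text/html");
  // Fresh for 60s in shared caches; stale copies may be served for up to 300s
  // more while the cache revalidates against the origin in the background.
  res.setHeader("Cache-Control", "s-maxage=60, stale-while-revalidate=300");
  res.end(`<h1>Rendered at ${new Date().toISOString()}</h1>`);
}).listen(3000);
```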
How often do you want to rebuild?
At the end of the day, if you’re using SSR + CDN + cache headers, you should ask — how often do you want to rebuild? What kind of page is this? Is this a page with data that is frequently changed? Does it matter if, after the data changes, we still show a stale version of it to the user for a bit? Answering those questions will help you understand the best caching strategy for your website.