Immutable documents and Restful API: cache forever, update instantaneously

Written by Sadek Drobi in Concept on March 27, 2014

One of the challenges of creating a service for Content Management (Content Query API) is achieving and guaranteeing very low latency. Content is very central to an application or website. Basically you can't show the website if you have no content, and any latency in fetching content means a slow website.

Things got more challenging for us with the idea of structuring content: breaking content into different documents. This simply means that a single webpage/screen might need several documents to be shown. Facing latency with a multiplying factor, we had to think of a game-changing solution.

One thing we did, as we've already discussed, was to ensure the scalability and decoupling of the repository API. It is extremely simple to elastically spawn new instances to serve the repository API.

The other thing we needed to optimize is the caching of queries to the repository API.

Immutable Document Versions

Every time you save modifications to a document, you're creating a new version of that document. This is what we call immutable documents.

Immutable documents open a lot of possibilities thanks to their simple and clear semantics. The first obvious one is: since an immutable document version can never change, you can cache it, server or client side, for as long as you want, even forever.

So every document version will have a URL that is guaranteed to change if the document version changes.
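To make the idea concrete, here is a minimal sketch of a cache that never expires or invalidates anything, which is only safe because each URL names one immutable version. The URL paths and the `ForeverCache` name are made up for illustration and are not the actual API.

```python
from typing import Callable, Dict


class ForeverCache:
    """Cache keyed by version URL; entries are never invalidated or expired."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch          # function that goes to the origin server
        self._store: Dict[str, str] = {}
        self.origin_hits = 0         # how many times we had to go to the origin

    def get(self, version_url: str) -> str:
        # A version URL always maps to the same content, so a hit is always fresh.
        if version_url not in self._store:
            self.origin_hits += 1
            self._store[version_url] = self._fetch(version_url)
        return self._store[version_url]


# Hypothetical origin: two versions of the same document, each with its own URL.
ORIGIN = {
    "/versions/home-v1": "Welcome",
    "/versions/home-v2": "Welcome back",
}

cache = ForeverCache(ORIGIN.__getitem__)
cache.get("/versions/home-v1")
cache.get("/versions/home-v1")   # served from cache, no origin hit
cache.get("/versions/home-v2")   # new version, new URL, one origin hit
print(cache.origin_hits)         # prints 2
```

The same reasoning applies to an HTTP cache or a CDN: no expiry logic is needed, because a changed document simply shows up under a different URL.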

Restful API: a single entry point

The Content Query API is a Restful Web API. This simply means that there is only one entry point, /api. Everything else follows from there.

A Restful API is very much like a website: you get the entry document (the homepage), and then you follow links and submit forms from there. The goal here isn't to explain what a Restful API is; that could be the subject of the next blog post. But you get the idea.

So the entry point of a repository's Restful API is /api. A GET request on /api returns a document containing several things.

The main document (the API homepage) contains a list of points in the timeline of the repository; we call them releases, and the identifier of each one is called a ref. The main document also contains different forms for programmatically submitting queries. Every form has a mandatory field called ref: the point in time at which you would like to run the query.
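As a rough illustration, the entry document could be modelled like this. The field names (`refs`, `label`, `forms`, `action`, `fields`), the form name and the paths are assumptions made for the sketch, not the actual payload; only the ideas of a list of releases with refs and of forms with a mandatory ref field come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical shape of the document returned by GET /api.
API_DOCUMENT = {
    "refs": [
        {"label": "Master", "ref": "ref-master-1"},
        {"label": "Spring campaign", "ref": "ref-spring-7"},
    ],
    "forms": {
        "everything": {"action": "/api/documents/search", "fields": ["ref", "q"]},
    },
}


def ref_for(api_document: dict, label: str) -> str:
    """Find the ref of the release with the given label."""
    for release in api_document["refs"]:
        if release["label"] == label:
            return release["ref"]
    raise KeyError(f"no release labelled {label!r}")


def submit_url(api_document: dict, form_name: str, **fields: str) -> str:
    """Build the URL that submitting a form would request."""
    form = api_document["forms"][form_name]
    if "ref" not in fields:
        raise ValueError("every form requires a ref")
    unknown = set(fields) - set(form["fields"])
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Sort the fields so the same query always yields the same URL (and cache key).
    return form["action"] + "?" + urlencode(sorted(fields.items()))


master = ref_for(API_DOCUMENT, "Master")
url = submit_url(API_DOCUMENT, "everything", ref=master, q="blog")
print(url)  # /api/documents/search?q=blog&ref=ref-master-1
```

Sorting the fields is a design choice of this sketch: it keeps equivalent queries on a single URL, which matters once URLs double as cache keys.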

Immutable documents cache + Restful API

Combining immutable document versions with our Restful API means that anytime you want to run a query on the API, you first have to get the /api document, choose your ref, and submit a form. The result of submitting the query is a collection of immutable document versions, and they can be cached forever.

As a result, all API queries can be cached forever, except a single small document /api. If the release you're interested in hasn't changed its ref, all results will be fetched from cache, server and/or client side.
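The flow can be simulated end to end in a few lines. This is a toy model under stated assumptions: `FakeRepository` and `Client` are hypothetical stand-ins (no HTTP involved), and a "query" is reduced to fetching one document by id at a given ref.

```python
from typing import Dict, Tuple


class FakeRepository:
    """In-memory stand-in for the server: one snapshot of content per ref."""

    def __init__(self) -> None:
        self._release = 1
        self._snapshots: Dict[str, Dict[str, str]] = {"ref-1": {"home": "Hello"}}
        self.query_calls = 0

    def api(self) -> dict:
        # The only mutable resource: it points at the current ref.
        return {"ref": f"ref-{self._release}"}

    def query(self, ref: str, doc_id: str) -> str:
        self.query_calls += 1
        return self._snapshots[ref][doc_id]

    def publish(self, doc_id: str, body: str) -> None:
        # Publishing never mutates an old snapshot; it creates a new ref.
        current = dict(self._snapshots[f"ref-{self._release}"])
        current[doc_id] = body
        self._release += 1
        self._snapshots[f"ref-{self._release}"] = current


class Client:
    def __init__(self, repo: FakeRepository) -> None:
        self._repo = repo
        self._cache: Dict[Tuple[str, str], str] = {}  # never invalidated

    def get(self, doc_id: str) -> str:
        ref = self._repo.api()["ref"]      # always fetched, never cached
        key = (ref, doc_id)
        if key not in self._cache:
            self._cache[key] = self._repo.query(ref, doc_id)
        return self._cache[key]


repo = FakeRepository()
client = Client(repo)
first = client.get("home")     # cache miss: one real query
second = client.get("home")    # same ref: served from cache
repo.publish("home", "Hello again")
third = client.get("home")     # new ref: new cache key, fresh result at once
```

Here `first` and `second` are both "Hello" at the cost of a single query, and `third` is "Hello again" right after publishing, without any cache invalidation step.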

Back to the website analogy: the homepage is what we check first, and it then directs us to other resources. If something has changed, the link on the homepage changes. A new link means the update is instantaneous!

A finely grained "cache forever, update instantaneously"

To simplify the API, we have prefixed every query URL with the ref of the needed release. If the ref changes, the result gets a new URL. This leads to the question: will all documents get new URLs any time there is a change?

The answer is yes, but there is a solution for that. For every release, we include a simple correspondence table between document ids and version ids. Basically, this tells you the version id of each document in that particular release. Anytime you want to fetch a document by its id, you can use this table to get a direct URL to the version in question. If that document hasn't changed, its URL won't either, and we get finely grained "cache forever, update instantaneously" semantics.
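A small sketch shows why this gives per-document granularity. The release layout, the `versions` key and the URL pattern are assumptions for the sake of the example, not the actual format.

```python
def version_url(release: dict, doc_id: str) -> str:
    """Resolve a document id to the direct URL of its version in a release."""
    version_id = release["versions"][doc_id]
    return f"/api/documents/versions/{version_id}"


# Two consecutive releases (hypothetical data): only "home" was edited.
release_1 = {"ref": "ref-1", "versions": {"home": "home-v1", "about": "about-v7"}}
release_2 = {"ref": "ref-2", "versions": {"home": "home-v2", "about": "about-v7"}}

# The edited document gets a new URL, so the update is visible immediately.
home_changed = version_url(release_1, "home") != version_url(release_2, "home")

# The untouched document keeps its URL, so it stays cached across releases.
about_unchanged = version_url(release_1, "about") == version_url(release_2, "about")

print(home_changed, about_unchanged)  # True True
```

Only the table itself has to be refetched when the ref changes; every document that was not edited is still a cache hit.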

Sadek Drobi

Sadek is the CEO of and co-created the Play Framework. He also helped LinkedIn and Twitter to scale their architectures. He's an avid badminton player and Juventus fan.