By clicking “Accept All Cookies”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.

Improving performance via async, in-memory caches

Martín Valdez
September 29, 2023


In Java (our preferred technology for enterprise-level backend development at Ensolvers), there are plenty of well-known, proven and open-source caching libraries available. Memcached and Guava Caches are two good examples. We use them in many of our projects to improve performance drastically, especially when reads are far more frequent than write operations.

However, in some projects we've written some specific kinds of caches, tuned to what we need in particular. In this article we are going to describe one of these cases. 

This article presumes that the reader has a basic knowledge about caches such as Memcached and Guava Cache, and relational databases such as MySQL. 

Problem and Solution

In a backend project in which we have been working for several years, due to the size of the user base (hundreds of thousands of daily users) and the type of application, we were in a need to reduce the request response time drastically to provide the best responsiveness possible.

Basically, the backend application consists of a REST API that serves content for a cross-platform mobile application. Due to the size of the user base, backend services receive thousands of requests per minute. Since most of them were only consuming data, initially most of the date objects served were cached using a combination of Memcached and Guava in a two-layer caché configuration. This allowed us to avoid a lot of requests against the underlying relational database (MySQL, clusterized via Amazon Aurora).

In general, these caching solutions trigger a loading function when content expires or is absent. In these cases, they trigger a recalculation of the content to be cached content which, in some cases, results in heavy and expensive database queries.

Due to the behavior of the application in this particular case, even when using distributed caches, under this setup many concurrent requests could trigger the content recalculation causing a DB high CPU usage which impacted in response times.

First improvement: cache warm-up and pre-computation

The first improvement approach we did was to (1) do the cache computation asynchronously in background via scheduled tasks and (2) persisting the content in a denormalized way in the same relational DB so every cache will have content, as a fallback to distributed caches. This was a big milestone in performance. However, having three levels of caching (in-memory with Guava Cache, distributed with Memcached and this new implementation) made it hard to maintain over time. Not only was the code complex, but from time to time the concrete expiration time for the content to be effectively updated was hard to predict. 

Finally, since we were using the same relational DB for storing the cached content, while BD usage was drastically reduced, we thought that this could be improved even more.

Second improvement: in-memory persistent cache

The second and clearly important change was the implementation of our in-memory persistent cache approach. We basically merged the 3 levels of cache in one, custom implementation. We did this by implementing a new custom cache API that 

  1. Keeps the cached content in-memory all the time
  2. It runs the re-processing job periodically in background and in a distributed way, ensuring that only one node in a cluster is running the heavy queries against the DB
  3. When re-computed, content is stored in a separated NoSQL database. 
  4. Every node is synchronized every X amount of minutes with the NoSQL database to maintain the last version of the content

In addition to this, we decided to store more granular dumps of information in the cache so many of the content requests resulted in projections over a particular entry of data stored in memory.

In Java implementation terms, basically this cache is a structure that need to be instantiated with:

  • Supplier function that implements how to calculate the content to be cached in each case
  • Key getters, a list of functions to extract keys from content to index them into the cache structure.
  • Key prefix, a prefix for generating the keys into the structure.
  • Object class, specifying the type of objects being managed by the cache - since Java is strong typed.
  • Persistence storage strategy - a provider for persisting the content computed to work with concurrent tasks (one provider, many consumers)
  • The time between each content recalculation.

And other minor customization parameters.


In the code stub below we have a sample instantiation of the cache solution implemented to cache information of geolocalized cities.

For our main use, we configured the new cache with an implementation to persist the content into AWS S3. This way, with the current implementation, we have many nodes that read the pre-generated content from S3 (in JSON format) and load into their local memory. At the same time, one of these nodes (randomly chosen by Quartz scheduler) acts as the "coordinator" which calculates the content acquired from DB and stores in S3 the processed content. This “lead” node can do this work at the same time that nodes are reading new copies and due to the serverless nature of S3 (and other solutions of the kind like DynamoDB) generates no extra delays or contention. 

With this implementation, we can ensure that all user requests have the updated content available in memory most of the time, resulting in a reduced and practically constant response time (see Figure 1). At the same time we can decide exactly when the content should be recalculated and when it should be read. So we can define the recalculation every hour at 0 minutes, or every half hour, etc. Only two variables affect this: the update periodicity of each node and the re-computing cadence. Some time-critical calculations (for instance, scheduled content) can be done directly in-memory, without any further query to another layer.

Figure 1. New Relic response time charts showing 12.5ms of response time, even under high throughput conditions

This approach can be adjusted to different scenarios with slight configuration variations or small improvements. For example, we can change where the content is stored, the periodicity in which content is recomputed and updated, the keys to index the content with more than one criteria at same time. With some extra improvements it is also possible to split the data to process and upload with batches instead all at once.


In this article we discussed an approach to replace the traditional multi-level caches with a custom, distributed, in-memory and persistent cache. The main purpose of this solution is its simplicity and pluggability. If we need to change one part of the flow, we only need to provide a piece of code that provides a new strategy, while maintaining a one-layer configuration.

Interested in our services?
Please book a call now.