Randa Zraik

Microservices Caching Demystified: Strategies, Topologies, and Best Practices

This article offers a thorough look at caching in microservices, from the fundamentals to more advanced techniques and patterns. Along the way, we’ll see how caching can accelerate performance, keep services decoupled, and respect each microservice’s autonomy. We will go through the following topics:

  1. Introduction: Core Concepts and Definitions
  2. Cache Implementation Approaches
  3. Caching Strategies
  4. Caching Topologies
  5. Caching Patterns and Use Cases
  6. Data Collisions
  7. Eviction Policies
  8. Wrap-Up
  9. Further Reading

Introduction: Core Concepts and Definitions

Let's first clarify a few key concepts and definitions related to microservices and caching before we dive into caching topologies and strategies.

Microservices and Bounded Contexts

What Are Microservices?

Microservices is an architectural style where software is composed of multiple independent services, each focused on a single purpose. These services:

  • Can be deployed, scaled, and updated independently.
  • Communicate (often via HTTP or messaging) rather than relying on a single monolithic database.
  • Avoid tightly coupled monolithic structures, enabling faster iteration and smaller, more isolated failures.

This separation helps teams iterate faster and isolate failures. However, data management across microservices can become more complex, especially when different services need overlapping sets of information.

Bounded Context in Microservices

A bounded context is a principle from domain-driven design, crucial for microservices. It means:

  • Each microservice owns its domain logic and data.
  • Internally, the service can structure or store data however it wants (e.g., a relational database schema, NoSQL documents, or a simple file system).
  • Other services cannot directly query or modify that data store.

This is often called a share-nothing approach at the data level: each service controls its own resources. However, this does not necessarily require each service to have a completely separate physical database instance. A common setup is one database (e.g., PostgreSQL) where each microservice is assigned a dedicated schema or set of tables it alone manages. As long as the service is the only one reading/writing those specific tables (and no other service bypasses it), the bounded context principle holds.

What Is Caching?

Caching means temporarily storing data in a faster medium (often memory) to make subsequent requests for the same data quicker. By avoiding repeated expensive queries or computations, caching can significantly boost performance and scalability. It’s a common technique everywhere from simple in-memory lookups to distributed systems that replicate large data sets.

Consistency vs Eventual Consistency

Consistency or strong consistency means that whenever you read data, you always get the most recent write (like in a traditional database with full ACID guarantees). This is great for correctness but can slow down distributed systems.

Eventual Consistency means data might be out of date for a short while, but eventually, all replicas or caches catch up. In microservices, we often accept a brief window of staleness in exchange for better speed and uptime. For example, if you update user preferences, a remote cache might still have the old version for a few seconds until it’s invalidated or refreshed. That’s “eventual consistency”.

If you want absolute consistency, you might do synchronous writes, which can slow the system or cause partial unavailability. If you accept occasional staleness, you get better performance and resilience.

Why Caching Matters in Microservices

In microservices, caching can:

  • Improve Performance: Serve data from memory instead of re-fetching from databases or external APIs. This is crucial when a microservice must repeatedly call another microservice or run expensive queries.
  • Enhance Scalability: Offloading repeated reads to a cache lightens the load on the original data store or service, allowing the overall system to handle more traffic.
  • Reduce Inter-Service Chatter: Some services might rely heavily on data “owned” by another service. Instead of making many network calls, a local or shared cache can speed things up.
  • Partially Decouple Services: If the owner goes offline temporarily, other services can still serve cached data (for read-only cases).

Yet, caching in microservices introduces additional complexity:

  • Consistency: Cached data can become stale or out-of-sync.
  • Collision Handling: Multiple services or instances writing the same cached data can overwrite each other.
  • Bounded Context: We must ensure that caching external data doesn’t break the share-nothing principle by bypassing the owning service’s authority over updates.
  • Eviction Policies: Which data gets removed when the cache is full or out-of-date?

Cache Implementation Approaches

In many caching products, you’ll find two broad ways to store and query data: IMDG (In-Memory Data Grid) and IMDB (In-Memory Database).

IMDG (In-Memory Data Grid)

  • Definition: A distributed key-value store kept entirely in RAM.
  • Data Model: Typically a map or dictionary of name-value pairs, plus some metadata.
  • Use Case: Fast get/put caching with minimal overhead, primarily for simple data access.
  • Examples: Hazelcast, Apache Ignite, Infinispan, Coherence, GemFire.

If your caching usage centers on straightforward queries, i.e., fetching or updating objects by key, an IMDG is ideal for its simplicity and speed.

IMDB (In-Memory Database)

  • Definition: An in-memory system that can behave more like a database, often supporting SQL-like queries, indexing, or advanced data operations.
  • Data Model: Potentially relational or table-like, capable of handling more complex queries (joins, aggregates).
  • Use Case: You need robust query capabilities or analytics on cached data, not just key-based lookups.
  • Trade-Off: Usually higher memory/CPU usage than an IMDG due to indexing and query engines.

An IMDB is valuable if your cache must support complex queries, like filtering or joining multiple data sets in-memory. This can be a big performance gain for analytics or specialized read patterns but requires more resources.

IMDG vs. IMDB

  • Simplicity: If your data is basically a series of name-value pairs, an IMDG suffices.
  • Complex Queries: If you want advanced querying (e.g., partial scans, joins, SQL), an IMDB is a better fit.
  • Performance Overhead: IMDB’s query engines can be slower and more memory-intensive compared to IMDG.
  • Purpose: Evaluate whether the cache is just a performance booster for repeated gets or a mini-database in memory for more elaborate data logic.

Caching Strategies

These strategies describe how reads and writes flow between your service, the cache, and the underlying data store. You can apply them to almost any caching topology (single in-memory or distributed), though they’re commonly used with local caches.

Read-Through


  1. The microservice always reads from the cache.
  2. If the data is missing (cache miss), the cache itself fetches from the database, updates the cache, and returns the result.
  3. From the microservice’s perspective, it’s only talking to the cache.
  4. Simplifies reading, but if the database belongs to another microservice domain, you bypass the actual owner’s logic.
  5. For purely read-only usage within your own domain, this can be straightforward (see the sketch after this list).
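The snippet below is a minimal read-through sketch for data inside your own domain, using .NET's IMemoryCache. The IProductStore abstraction and the type names are hypothetical, introduced only for illustration: the caller always asks the cache, and on a miss the cache layer itself loads from the backing store.

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public record Product(string Id, string Name);

// Hypothetical abstraction over your own domain's data store (database, repository, etc.)
public interface IProductStore
{
    Task<Product?> LoadAsync(string id);
}

public class ReadThroughProductCache
{
    private readonly IMemoryCache _cache;
    private readonly IProductStore _store;

    public ReadThroughProductCache(IMemoryCache cache, IProductStore store)
    {
        _cache = cache;
        _store = store;
    }

    // The microservice only talks to this method; on a cache miss the loader
    // delegate fetches from the store and the result is cached for later reads.
    public Task<Product?> GetAsync(string id) =>
        _cache.GetOrCreateAsync(id, entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return _store.LoadAsync(id);
        });
}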

Write-Through


  1. The microservice writes directly to the cache.
  2. The cache synchronously writes the change to the underlying database.
  3. From the microservice’s perspective, it’s only talking to the cache.
  4. Keeps data consistent, but can hurt performance because each write must wait for the (potentially slow) database call to complete.
  5. Similarly can break domain boundaries if you are writing to another microservice’s database.

Write-Behind (Write-Back)


  1. The microservice writes to the cache and returns quickly.
  2. The cache asynchronously updates the database afterward.
  3. Reduces write latency since it does not wait for the database write, but risks data loss if the cache node fails before persisting to the database, and can cause timing issues if other processes expect the write to be persisted immediately (see the sketch after this list).
  4. Similar boundary issues if updating another microservice’s database.
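Below is a minimal write-behind sketch, an illustration rather than a production implementation: writes land in the in-memory cache immediately and are queued on a channel, and a hosted background worker persists them later. The Order type, IOrderRepository, and the wiring are assumptions made for this example.

using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Hosting;

public record Order(string Id, decimal Total);

// Hypothetical persistence abstraction for the service's own database
public interface IOrderRepository
{
    Task SaveAsync(Order order, CancellationToken ct);
}

public class WriteBehindOrderCache
{
    private readonly IMemoryCache _cache;
    private readonly Channel<Order> _pending = Channel.CreateUnbounded<Order>();

    public WriteBehindOrderCache(IMemoryCache cache) => _cache = cache;

    public void Put(Order order)
    {
        _cache.Set(order.Id, order);     // fast in-memory write, caller returns immediately
        _pending.Writer.TryWrite(order); // database write happens later, asynchronously
    }

    public ChannelReader<Order> Pending => _pending.Reader;
}

// Background worker that drains the queue and persists the writes
public class WriteBehindWorker : BackgroundService
{
    private readonly WriteBehindOrderCache _cache;
    private readonly IOrderRepository _repository;

    public WriteBehindWorker(WriteBehindOrderCache cache, IOrderRepository repository)
    {
        _cache = cache;
        _repository = repository;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var order in _cache.Pending.ReadAllAsync(stoppingToken))
        {
            // If the process dies before this line runs, the write is lost:
            // that is the core trade-off of write-behind.
            await _repository.SaveAsync(order, stoppingToken);
        }
    }
}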

In strict microservices, letting a cache talk directly to another service's database can undermine the bounded context principle unless it is carefully encapsulated. Prefer applying these strategies to data your own domain owns, or rely on read-only caching for external data; for that, consider the data sidecar or data sharing approaches discussed later, which avoid direct database calls that bypass the rightful domain owner.


Caching Topologies

In microservices, caching can take several architectural forms, each physically arranged in distinct ways. Each topology has strengths and limitations, particularly regarding fault tolerance, data consistency, scalability, and complexity.

Single In-Memory Caching


Here, you simply load data (e.g., user preferences, some small reference set) into local RAM within a microservice instance. Each instance keeps its own cache.

Suitable for:

  • Small or mostly static data sets.
  • Your microservice runs as a single instance or you can tolerate minimal updates and data skew.
  • The data belongs to your domain (bounded context) so you’re not breaking ownership rules.

Pros:

  • Performance: Extremely fast, as data is stored in local memory with no network latency.
  • Complexity: Simple to implement. Requires no extra infrastructure.

Cons:

  • Consistency and Multiple Instances: If your microservice is scaled across containers, each instance has its own local cache. Updates in one instance aren’t automatically propagated to others, leading to data skew or stale data if the data changes often.
  • Scalability: A single instance’s memory might not handle large data sets.
  • Write-Heavy Scenarios: Single in-memory caching suits read-heavy loads. For writes, multiple instances might each update local data, leading to divergent caches or stale state if no synchronization is in place.
  • Bounded Context: If you rely on read-through/write-through for data that belongs to another domain, you skip that domain’s service logic unless you encapsulate calls through their API.

Still, single in-memory caching is simple and great for static or rarely updated data, or small reference sets that every request needs. For bigger or more complex systems, you’ll often turn to more advanced topologies.

Code Snippet:

This example demonstrates how to use an in-memory cache in .NET with IMemoryCache:

// Register IMemoryCache in Program.cs
builder.Services.AddMemoryCache();

// Get or create a cached value (IMemoryCache injected as _memoryCache)
var value = await _memoryCache.GetOrCreateAsync("key 1", _ => Task.FromResult("value 1"));

Distributed Caching (Client-Server)


A distributed cache keeps data in an external caching cluster, often a separate server or group of servers, while microservices connect to it through a client library over the network. Examples include Redis, Memcached, or Apache Ignite/Hazelcast in client-server mode.

How It Works:

  1. You have one unified external caching cluster (e.g., Redis).
  2. You have a cache library in each microservice instance.
  3. Your code calls this library’s API.
  4. The library uses a proprietary protocol to talk to the external cluster.
  5. The cluster stores and replicates data as configured.

The bounded context is not violated because no service hits another service's database; each service has its own (read-only) cache region in the caching server. Either an IMDG or an IMDB can be used here: if you only need key-value access, you’d likely configure IMDG mode; if you want to run queries, you might pick IMDB mode (though that’s less common for a simple caching scenario).

Pros:

  • Consistency: All instances share one cache to read and update. Consistency is simpler to manage.
  • Scalability: If the distributed cache cluster is robust (e.g., horizontally sharded or replicated), it can handle large data volumes and concurrency.
  • Adoption: Many real-world microservices rely on distributed caching (e.g., Redis) because it’s straightforward to manage and widely supported.

Cons:

  • Performance: Slower reads/writes compared to local memory (due to network latency).
  • Complexity: Must manage an external caching layer (e.g., multiple Redis nodes, replication, or clustering).
  • Availability: If the external cluster is unreachable, caching fails for all microservice instances.
  • Fault Tolerance: Potential single point of failure unless replicated or clustered properly. Losing the cache node can disrupt everything.

Code Snippet:

This example demonstrates how to use a distributed cache in .NET with Redis:

// Wire Redis in Program.cs and use IDistributedCache to get/set data
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
});

// Store a value in the cache
await _distributedCache.SetStringAsync("key 1", "value 1");

// Get the value from the cache
var value = await _distributedCache.GetStringAsync("key 1");


Replicated Caching (In-Process)


This type doesn't require an external server. Each microservice instance has an in-process cache, but updates are replicated to all other nodes, and this is handled by the cache engine. Products like Hazelcast, Apache Ignite, GemFire, Coherence, and Infinispan support this mode.

How It Works:

  1. You still use a library (e.g., Hazelcast, Ignite) in each microservice instance.
  2. Each instance has its own in-process memory cache.
  3. When your app writes to the local cache, updates are automatically replicated to other instances via a proprietary protocol.
  4. So every node eventually has the same data in memory.

Pros:

  • Performance: Extremely fast local reads (nanosecond-level) because data is in the same process memory.
  • Fault tolerance: If one instance fails, others still hold the full copy of the data in memory (assuming no partition issues).

Cons:

  • Scalability: Large data sets can cause scaling issues because every instance must store the entire data set in memory.
  • Collisions: High update rates risk collisions or “split-brain” scenarios if replication lags. This will be discussed later in Data Collisions section.
  • Complexity: More complex coordination among large numbers of instances.

Code Snippet:

This example demonstrates how to use a replicated cache in .NET with Hazelcast:

var options = new HazelcastOptionsBuilder()
    .With(args)
    .Build();

// Create a Hazelcast client and connect to a server running on localhost
await using var client = await HazelcastClientFactory.StartNewClientAsync(options);

// Get the distributed map from the cluster
await using var replicatedMap = await client.GetReplicatedMapAsync<string, string>("replicated-map-1");

// Store a value in the replicated map
await replicatedMap.PutAsync("key 1", "value 1");

// Get the value from the replicated map
var value = await replicatedMap.GetAsync("key 1");

Near-Cache Hybrids


A near-cache approach combines distributed caching with a small local in-process cache in each instance.

How It Works:

  1. A microservice instance has a local “front” cache for “hot” items with a capacity limit and an eviction policy configured. We will talk later about Eviction Policies.
  2. There's also a distributed “backing” cache (like Hazelcast or Ignite cluster) that holds the full data set.
  3. Reads first go to the local near/front cache. If it's not there, they retrieve from the backing cache.
  4. Writes usually go to the backing cache, which sends invalidates or updates to local near-caches for other instances via a proprietary protocol to ensure they remain in sync.

Pros:

  • Blends scalability of a distributed store with fast local reads for frequently accessed keys.
  • Reduces repeated remote calls if the item is “hot”.
  • Limits local memory usage (only “most recently/frequently used” items).

Cons:

  • Additional complexity in configuring two-tier caching.
  • Brief staleness possible unless invalidation updates propagate instantaneously.
  • Doesn’t store the entire data set locally, so cache misses still require network access to the backing store.

Code Snippet:

This example demonstrates how to use a near cache in .NET with Hazelcast:

var options = new HazelcastOptionsBuilder()
    .With(args)
    .Build();

// Configure NearCache
options.NearCaches["near-cache-map-1"] = new NearCacheOptions
{
    Eviction = new Hazelcast.Models.EvictionOptions()
    {
        // Evicts least recently used entries
        EvictionPolicy = EvictionPolicy.Lru,
        // Max number of entries kept in the Near Cache
        Size = 10000,
    },
    // Max number of seconds for each entry to stay in the Near Cache
    TimeToLiveSeconds = 60,
    // Max number of seconds an entry can stay in the Near Cache untouched
    MaxIdleSeconds = 3600,
    InvalidateOnChange = true
};

// Create a Hazelcast client and connect to a server running on localhost
await using var client = await HazelcastClientFactory.StartNewClientAsync(options);

// Get the distributed map from the cluster
await using var map = await client.GetMapAsync<string, string>("near-cache-map-1");

// Store a value in the cache
await map.SetAsync("key 1", "value 1");

// Get the value from the cache by key
var value = await map.GetAsync("key 1");

Topologies Comparison

|  | Single In-Memory | Distributed (Client-Server) | Replicated (In-Process) | Near-Cache (Hybrid) |
| --- | --- | --- | --- | --- |
| Performance | Extremely fast local | Network-based reads | Nanosecond local reads | Local + distributed store |
| Data Volume | Small, mostly static | Potentially large | Usually smaller sets | Large in backing |
| Update Rate | Very low changes | Handles high writes | Moderate updates | Moderate / High |
| Fault Tolerance | None if multi-instance | Cluster config dependent | Node-level replication | Partial replication |
| Consistency | Cache is per instance, no unification | Central store | Collision risk under concurrency | Local front can be stale briefly |

Caching Patterns and Use Cases

We will discuss some higher-level, application-focused solutions for typical microservice challenges. These patterns can be built on top of different topologies.

Data Sharing


Scenario: Product microservice owns product information, while Order microservice needs to read that data regularly. Having Order microservice call Product microservice’s API constantly might become a bottleneck or add unnecessary network overhead.

How It Works:

  1. Product microservice remains the sole owner of the data (bounded context).
  2. Order microservice, which needs that data, sets up a local cache to store read-only copies.
  3. When Order microservice needs the data, it can check its cache first. If it’s stale or missing, it calls Product microservice’s API.
  4. Order microservice never writes directly to Product microservice’s data store. Product microservice remains the only one responsible for modifying its own data (see the sketch after this list).
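The following sketch shows one way this could look in .NET, assuming a typed HttpClient pointed at the Product microservice and an IMemoryCache holding the local read-only copies. The type names, route, and expiration window are hypothetical.

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public record ProductDto(string Id, string Name, decimal Price);

public class ProductLookup
{
    private readonly IMemoryCache _cache;
    private readonly HttpClient _productApi; // BaseAddress points at the Product microservice

    public ProductLookup(IMemoryCache cache, HttpClient productApi)
    {
        _cache = cache;
        _productApi = productApi;
    }

    // Check the local cache first; on a miss (or after expiry), ask the owning
    // service's API. The Order microservice never touches the Product database.
    public Task<ProductDto?> GetAsync(string productId) =>
        _cache.GetOrCreateAsync($"product:{productId}", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(2);
            return await _productApi.GetFromJsonAsync<ProductDto>($"/api/products/{productId}");
        });
}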

Pros:

  • Respects boundaries and achieves strong decoupling.
  • Performance: Faster reads due to the local cache for the other services that need the data.
  • Fault Tolerance: The other services can continue to operate even if the original service is unavailable.

Cons:

  • Consistency: The other services might not immediately see changes made by the owning service.
  • Cache Invalidation: The other services must decide how long they trust the cached data before refreshing from the owning service. So avoid this pattern if the data is write-heavy.
  • Memory Overhead: If the dataset is large, the cache can consume significant memory.

Data Sidecars


Scenario: Profile microservice owns detailed user profile data. Several other microservices need to read it heavily. They shouldn’t directly connect to Profile microservice’s database, nor spam the Profile microservice API every time.

How It Works:

  1. Profile microservice writes changes to its domain data as usual.
  2. Whenever data changes, Profile microservice also updates a distributed cache (the “sidecar”).
  3. Other microservices read from the sidecar, which is effectively read-only for them. The domain logic for writes remains in Profile microservice (see the sketch after this list).
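A minimal sketch of this flow using .NET's IDistributedCache (backed by Redis, for example) is shown below. The UserProfile type, key format, and class names are assumptions for illustration: the owning Profile microservice is the only writer, while other services only read.

using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

public record UserProfile(string UserId, string DisplayName, string Locale);

// Inside the Profile microservice: write to its own database, then refresh the sidecar
public class ProfileWriter
{
    private readonly IDistributedCache _sidecar;

    public ProfileWriter(IDistributedCache sidecar) => _sidecar = sidecar;

    public async Task SaveAsync(UserProfile profile)
    {
        // 1. Persist to the Profile microservice's own data store (omitted here).
        // 2. Push the fresh read-only copy into the sidecar cache.
        var json = JsonSerializer.Serialize(profile);
        await _sidecar.SetStringAsync($"profile:{profile.UserId}", json);
    }
}

// Inside any other microservice: read-only access to the sidecar, never to the database
public class ProfileReader
{
    private readonly IDistributedCache _sidecar;

    public ProfileReader(IDistributedCache sidecar) => _sidecar = sidecar;

    public async Task<UserProfile?> GetAsync(string userId)
    {
        var json = await _sidecar.GetStringAsync($"profile:{userId}");
        return json is null ? null : JsonSerializer.Deserialize<UserProfile>(json);
    }
}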

Pros:

  • Respects boundaries and achieves strong decoupling.
  • Performance: Less load on the microservice. Others read from the sidecar cache instead of making direct calls or updating the database.
  • Consistency: Everyone sees a consistent (or eventually consistent) picture from the sidecar.
  • Scalability: Sidecar is scalable and can handle large volumes of data efficiently.

Cons:

  • Fault Tolerance: If the cache node goes down, reading services lose their data unless there’s replication or a fallback path.
  • Extra Complexity: Setting up the push/refresh logic or using events to keep the sidecar in sync.

Multi-Instance Caching


Scenario: One microservice, say Order microservice, needs to be scaled to 10 containers to handle high traffic. Each container needs the same reference data or reads/writes to a shared domain. You want local caching but must keep the instances consistent enough.

How It Works:

  1. If each container does single in-memory caching independently, you get data skew.
  2. Instead, you pick a replicated or near-cache approach so that changes can propagate among instances.
    • Replicated: All instances store the full data set in memory. When one node updates a key, it’s broadcast to others via a proprietary protocol.
    • Near-Cache: Each node has a partial local cache and fetches from a backing store if missing or stale.

Pros:

  • Performance: Each instance can quickly respond to read requests from memory.
  • Scalability: You can add more containers without manually syncing caches.

Cons:

  • Collisions: If multiple nodes write the same key concurrently, overwrites can happen.
  • Memory Usage (replicated) or Complex Invalidation (near-cache).
  • Consistency: Some nodes might see outdated data briefly.

Tuple-Space Pattern

Scenario: You have a system that does high-speed processing (e.g., a stock trading platform), relies on all data being in memory for lightning-fast reads, and can accept the overhead.

How It Works:

  • You load all relevant data into an IMDG or IMDB (like a huge in-memory store).
  • Reads are basically memory-speed lookups, no disk or external service.
  • Writes must also sync with the store or an underlying database eventually.
  • The entire microservice logic might revolve around the in-memory “space” (hence the name “tuple-space” pattern, which also underpins the “space-based” architecture style).

Pros:

  • Performance: Ultra-Fast Reads. Everything is in memory.
  • Ideal for: Very high read or compute-intensive tasks (e.g., real-time analytics, stock trading, or matching engines).

Cons:

  • Huge Memory usage: Storing all data in RAM can be expensive.
  • Complex Writes: If multiple services or instances attempt to update data, concurrency and collisions can be tough to handle.

Wrap-Up on Patterns

These patterns often overlap; for example, a sidecar approach might also leverage multi-instance caching or near-cache logic. The key is to keep the domain lines clear so you never override another service’s data domain rules, and to choose a pattern that balances performance with the realities of concurrency, staleness, and memory cost.


Data Collisions

Understanding Data Collisions

When using replicated caching (or multi-master distributed caching), two instances can update the same record at nearly the same time, with replication lag. For example:

  • Instance A decrements an inventory count from 700 to 690.
  • Instance B decrements from 700 to 695.
  • Both replication messages cross in flight, overwriting each other. End result might incorrectly show 690 or 695 instead of 685 total.

These inconsistencies are typically called split-brain or data collisions.

Avoiding Data Collisions

  • Queueing: Instead of writing to the cache directly, each instance sends a message to a queue. A separate service processes these messages sequentially, ensuring no collisions but the trade off is eventual consistency.
  • Compare-and-Set (Version or Timestamp Checks): The microservice checks a version (timestamp or sequence number) before updating. If the version has changed, someone else updated the data in the meantime and the operation should be retried (see the sketch after this list).
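As an example of compare-and-set, the sketch below revisits the inventory scenario with the Hazelcast .NET client, assuming its conditional ReplaceAsync(key, expectedValue, newValue) overload, which only applies the update if the value has not changed since it was read. The map name and key are made up for illustration.

using Hazelcast;

var options = new HazelcastOptionsBuilder().Build();
await using var client = await HazelcastClientFactory.StartNewClientAsync(options);
await using var inventory = await client.GetMapAsync<string, int>("inventory");

// Retry loop: read the current value, compute the new one, and apply it only
// if no other instance has changed the entry in the meantime.
while (true)
{
    var current = await inventory.GetAsync("sku-42");
    var desired = current - 10;

    if (await inventory.ReplaceAsync("sku-42", current, desired))
        break; // our decrement was applied on top of the latest value
}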

Calculating Collision Probability

Collision probability can be approximated by the following formula:

Collision_Rate ≈ Number_of_Instances × (Update_Rate² / Cache_Size) × Replication_Latency
  • Number_of_Instances: How many instances.
  • Update_Rate: Writes per second.
  • Cache_Size: Total distinct data entries. The bigger it is, the less often the exact same entry collides.
  • Replication_Latency: Average time for updates to propagate, expressed in seconds (e.g., 50 ms = 0.05 s).

If the collision rate is low (like under 1%), you might be fine. If it’s high, you’ll need concurrency mechanisms. The small helper below shows how to evaluate the formula.
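As a quick sanity check, here is a tiny helper (purely illustrative) that evaluates the approximation above, with the update rate in writes per second and the replication latency in seconds:

// Estimate collisions per second: instances × (updateRate² / cacheSize) × latency
static double EstimateCollisionRate(int instances, double updatesPerSecond,
                                    int cacheSize, double replicationLatencySeconds) =>
    instances * (updatesPerSecond * updatesPerSecond / cacheSize) * replicationLatencySeconds;

// The example below: 8 instances, 300 updates/s, 30,000 entries, 50 ms (0.05 s) latency
// => 8 × (300² / 30,000) × 0.05 ≈ 1.2 collisions per second
var rate = EstimateCollisionRate(8, 300, 30_000, 0.05);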

Example:

  • Number_of_Instances = 8
  • Update_Rate = 300 updates per second
  • Cache_Size = 30,000 entries
  • Replication_Latency = 50 ms (0.05 s)
  • Then Collision_Rate ≈ 8 × (300² / 30,000) × 0.05 ≈ 1.2 collisions per second, which is high enough that we should consider one of the concurrency mechanisms above.

Eviction Policies

Caches are finite. When they fill up, something must be removed to make room for new entries. Various eviction policies address different usage patterns.

Time-to-Live (TTL)

  • Definition: Each entry has an expiration timer. After the time elapses, the cache discards it.
  • Pros: Good for data that “naturally” becomes stale quickly (like real-time bidding info).
  • Cons: Does not handle the scenario where the cache is simply full (some items might still be unexpired). See the example after this list.
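For instance, with .NET's IMemoryCache (reusing the injected _memoryCache instance from the earlier snippet), a TTL is just an absolute expiration set on the entry; the key, value, and 30-second window here are arbitrary examples.

using System;
using Microsoft.Extensions.Caching.Memory;

// The entry is discarded 30 seconds after being cached, regardless of how often it is read
_memoryCache.Set("bid:item-7", "latest-bid-value", new MemoryCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30)
});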

Archive (ARC) Policy

  • Definition: Evicts items based on creation date, e.g., only keep entries under 6 months old.
  • Pros: Excellent for storing recent transactions (user orders for the last 6 months) and automatically discarding older data.
  • Cons: Also doesn’t handle the scenario of a “full” cache. If the cache is at capacity but none of the data is older than X months, new entries cannot be added.

Least Frequently Used (LFU)

  • Definition: Evicts the entry with the lowest access frequency.
  • Pros: If data is heavily read over time but rarely updated, this can keep popular items in memory.
  • Cons: When new items are inserted, many LFU algorithms reset counters. Frequently used items might get evicted if a series of puts occur. Can cause surprising evictions in “put-heavy” workloads.

Least Recently Used (LRU)

  • Definition: Evicts items that have not been accessed for the longest period.
  • Pros: Generally the most intuitive for interactive data. Items used recently remain in cache.
  • Cons: Has overhead in tracking recency (often via a linked list or timestamps).

LRU is a common default for near-cache front caches, since it keeps the most recently used items in memory. Just remember that an MRU eviction policy is the opposite: it evicts the most recently used item (rarely beneficial).

Random Replacement (RR)

  • Definition: When the cache is full, pick an item at random to evict.
  • Pros: Minimal overhead, extremely fast.
  • Cons: No intelligence about usage patterns; can evict the most popular item.

Selecting the Right Eviction Policy

A recommended approach:

  1. Start with Random (RR) if usage patterns are unknown. Measure cache hit rates (via logs, counters, or built-in metrics).
  2. Experiment with LRU or LFU for a trial period, measuring the difference in hit ratio and overall performance.
  3. Choose the best performer for your data behavior.
  4. Time-based policies (TTL, ARC) shine when data becomes stale after a certain window or when you only want to keep recent or valid data.

Wrap-Up

Caching in microservices isn’t just about speed; it’s about reducing network calls, managing concurrency, and respecting domain boundaries. Make sure you understand your application’s characteristics, data behavior, and the trade-offs of each caching approach before committing to any caching strategy.


Further Reading
