DEV Community

Truong Vu


Lessons Learned: App Crash When Caches Go Wrong

Redis is a fast, open-source, in-memory key-value data structure store. Everyone knows about it, but are you sure you are using it correctly? I got it wrong, and this post shares my mistake.

I had designed an application using the cache-aside strategy to reduce database load. I thought it was good enough, but then, one day, my phone started blowing up with Service Unavailable notifications, and I knew it was going to be a very long day.

How Cache-Aside Strategy Works


Basically, the application first checks whether the item is currently held in the cache. If the item is not there, it reads the item from the database (MySQL) and then stores a copy of it in the cache.
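The read path described above can be sketched in Python roughly like this. It is a minimal illustration, not the code from my application: `InMemoryCache` is a hypothetical stand-in that only mimics the `get`/`set(..., ex=...)` subset of a Redis client so the example runs without a server, and `get_item`, `load_from_db`, and the `item:` key prefix are names I made up for the sketch.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in for the get/set subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None):
        # `ex` (TTL in seconds) is accepted but ignored by this stand-in.
        self._data[key] = value
        return True


def get_item(cache, load_from_db, item_id, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit

    item = load_from_db(item_id)  # cache miss: read from the database
    if item is not None:
        # Store a copy so the next read is served from the cache.
        cache.set(key, json.dumps(item), ex=ttl_seconds)
    return item
```

With a real Redis client, the same function should work as long as the client exposes `get` and `set` with an `ex` argument. Note that every miss goes straight to the database, which is exactly what caused the trouble described next.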

What went wrong

The application went down because requests bypassed the cache and overwhelmed the database with simultaneous queries. The database couldn’t keep up, and the entire service failed. There were two major issues:

  • A large number of cache keys expired all at once, causing a Thundering Herd problem.

  • A huge number of requests tried to read the same key after it had expired. All of them then read the data directly from the database, leading to a Cache Stampede.

These were the edge cases that I hadn’t thought about.

Solutions

Thundering Herd Problem

Workaround: Randomize the expiration times of cache keys so that they do not all expire at the same time.

Pseudo-code:
[Image: resolve thundering herd problem]
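As a rough Python sketch of randomized expiration, the helper below adds a random offset to a base TTL. The function name `ttl_with_jitter` and the 20% default spread are assumptions chosen for illustration, not values from my production setup.

```python
import random


def ttl_with_jitter(base_ttl_seconds, jitter_ratio=0.2, rng=random):
    """Return the base TTL shifted by a random offset within +/- jitter_ratio."""
    spread = int(base_ttl_seconds * jitter_ratio)
    return base_ttl_seconds + rng.randint(-spread, spread)


# Hypothetical usage when writing to the cache:
#   cache.set(key, value, ex=ttl_with_jitter(300))
```

With a base of 300 seconds and the default ratio, keys written at the same moment expire somewhere between 240 and 360 seconds later, so their reloads are spread out instead of hitting the database together.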

Cache Stampede

Workaround: Use a distributed lock to allow only one process to update the cache, while others wait.

Pseudo-code:

[Image: resolve cache stampede by using a Redis distributed lock]

This ensures:

  • Only one process queries the database and updates the cache.

  • Other requests wait for the cache to be updated instead of overloading the database.
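The locking idea can be sketched in Python as below. This is a simplified illustration under several assumptions: `InMemoryCache` is a hypothetical stand-in that mimics `SET ... NX` so the example runs without a server, `get_with_lock` and its parameters are invented names, cached values are assumed to be strings, and a real Redis client would need to return strings (for example, created with `decode_responses=True`) for the token comparison to work.

```python
import time
import uuid


class InMemoryCache:
    """Hypothetical stand-in for the few Redis commands used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None, nx=False):
        # Mimics SET ... NX: refuse to overwrite an existing key.
        # `ex` (TTL in seconds) is accepted but ignored by this stand-in.
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_lock(cache, load_from_db, key, ttl=300, lock_ttl=10,
                  retries=50, wait=0.05):
    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())  # identifies this caller as the lock owner
    for _ in range(retries):
        value = cache.get(key)
        if value is not None:
            return value  # cache hit, possibly filled by another process
        if cache.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                # Re-check: the cache may have been filled just before we locked.
                value = cache.get(key)
                if value is None:
                    value = load_from_db(key)
                    cache.set(key, value, ex=ttl)
                return value
            finally:
                # Release only our own lock. This check-then-delete is NOT
                # atomic; it is kept simple for the sketch.
                if cache.get(lock_key) == token:
                    cache.delete(lock_key)
        time.sleep(wait)  # someone else is rebuilding the value; wait and retry
    return load_from_db(key)  # fallback after giving up on the lock
```

The lock TTL matters: if the process holding the lock crashes, the lock expires on its own and another request can rebuild the value. In production, the release step is usually made atomic (for example with a Lua script or a client library's lock helper) rather than the plain compare-and-delete shown here.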

Lesson Learned

Many factors led to that nightmare. Above all, I lacked a proper continuous monitoring system to detect and address potential problems. A well-planned strategy for edge cases, together with load testing, would also have helped catch these issues early.

This lesson taught me how important strategic thinking is in software architecture.

If you’re working with a similar setup, trust me—don’t wait for your app to crash before considering robust caching strategies and thorough testing.
