cache global data
See: https://github.com/gaiaops/gaia_core_php/blob/master/examples/cache/global_data.t
This class demonstrates one of the simplest concepts in caching: taking a big chunk of data that doesn't change often and caching it. This takes a lot of load off the database server and distributes it across a pool of memcache servers.
In this example, the call SiteConfig::data() returns key/value pairs from the query:
SELECT name, value FROM config;
Often configuration variables are stored in a lookup table to allow developers to decouple configuration changes from the codebase. But if this configuration data is needed on every page request, the database will be hit very hard over and over by this one query. It doesn't matter that the database can cache the query. At some point the volume of connections will saturate the database and cause performance issues.
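The read-through pattern described above can be sketched in a few lines. This is a minimal, language-neutral illustration (in Python, with an in-process dict standing in for a memcache client); the names `CacheAside` and `site_config` are hypothetical and not part of the gaia_core_php API.

```python
import time


class CacheAside:
    """Tiny in-process stand-in for a memcache client (illustration only)."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._store[key]  # lazily expire stale entries
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl else None
        self._store[key] = (value, expires_at)


def site_config(cache, db_query, ttl=300):
    """Cache-aside read: serve from the cache, repopulate on a miss."""
    data = cache.get('site_config')
    if data is None:
        data = db_query()  # e.g. SELECT name, value FROM config;
        cache.set('site_config', data, ttl)
    return data
```

With this in place, the database only sees the query once per TTL window per cache pool, no matter how many page requests come in; but as the next section explains, the naive expire-and-repopulate version of this pattern has a race condition.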
The Cache\Gate class uses a probabilistic approach to refreshing the cache, avoiding the problem of the 'thundering herd'. The most common approach to refreshing data in the cache is to let the cache expire. The next client to ask for the data sees that it is missing and repopulates it back into the cache. This approach has the benefit of only having to code the cache-update logic in one spot. It is easy to understand and maintain. It doesn't rely on cronjobs or other external mechanisms to maintain the data. And if for whatever reason the data gets evicted from the cache, the code will auto-repopulate it. But the strategy has a big problem. When many clients attempt to access a cache key in parallel and the data is missing from the cache, there is a race condition. This race condition is known as the 'thundering herd': all of the clients stampede over each other trying to repopulate the data back into the cache.
When this happens, you will see a flurry of database connections stack up on the database server at regular intervals. Worse, since the query hasn't been run by the database server in a while, the query cache or innodb buffer pool may not have easy access to the data. It may have to hit the disk. If the query performs poorly (often the reason it is cached in the first place), the problem is that much worse. All the clients sit around waiting while the database reads data off the disk and calculates the results of the query. In the worst-case scenario, the highly parallel stampede of clients can even topple and crash a database server.
Cache\Gate attempts to solve this problem as much as it can. It elects just one client to refresh the data periodically. It does this transparently by caching the data forever and holding onto a soft timeout value in a separate cache key. When the soft timeout is reached the Gate class tells one client that no data was found, relying on that client to know what to do to re-populate the cache. It uses some other nice tricks for performance like probabilistic cache refreshing to avoid the overhead of network mutex locks on the cache key.
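The soft-timeout technique described above can be sketched as follows. This is an illustrative Python sketch of the idea, not the actual Cache\Gate API; the class and method names are hypothetical, and a plain dict stands in for memcache. Once the soft timeout passes, each reader gambles with low odds: roughly one client is told "miss" and refreshes the data, while the rest keep serving the cached copy, with no mutex lock on the key.

```python
import random
import time


class DictCache:
    """Tiny in-process stand-in for a memcache client (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class GateSketch:
    """Soft-timeout gate: data is cached without expiry, and a separate
    key holds a soft timeout. Hypothetical sketch, not Cache\\Gate itself."""

    def __init__(self, cache, ttl, refresh_odds=0.05):
        self.cache = cache
        self.ttl = ttl                    # soft timeout, in seconds
        self.refresh_odds = refresh_odds  # chance a reader is elected

    def get(self, key):
        data = self.cache.get(key)
        if data is None:
            return None  # true miss: caller must repopulate via set()
        stamp = self.cache.get(key + '.timeout')
        if stamp is None or time.time() >= stamp:
            # Soft timeout reached. Each reader gambles; with low odds,
            # roughly one client sees a "miss" and refreshes, while the
            # rest keep serving the existing copy -- no lock required.
            if random.random() < self.refresh_odds:
                return None
        return data

    def set(self, key, value):
        self.cache.set(key, value)  # the data itself never expires
        self.cache.set(key + '.timeout', time.time() + self.ttl)
```

The design choice worth noting: even while a refresh is pending, every other client keeps getting the old data instead of blocking, so the database sees one query per refresh interval instead of a stampede.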
In addition, this example also demonstrates how to set up multiple tiers of caching. We can use apc as our first-layer cache, and then fall back to memcache if the value isn't in apc. Since our memcache layer is wrapped in Cache\Gate, that protects us from the thundering herd hitting the database. If we are worried about one cache server going down periodically, we can keep multiple copies of the data in the cache using Cache\Replica. This insulates us from cache server outages or intermittent network problems.
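The tiering and replication ideas can be sketched like this. Again, this is an illustrative Python sketch under assumed names (`TieredCache`, `ReplicaCache`, `DictCache`), not the gaia_core_php API: a fast local layer plays the apc role in front of a shared layer playing the memcache role, and the replica wrapper writes each value under several keys, which in a real pool would usually hash to different servers.

```python
class DictCache:
    """Tiny in-process stand-in for a cache client (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class TieredCache:
    """Two-tier read path: check the fast local layer first (apc role),
    fall back to the shared layer (memcache role), warming local on a hit."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def get(self, key):
        value = self.local.get(key)
        if value is None:
            value = self.remote.get(key)   # fall back to the shared tier
            if value is not None:
                self.local.set(key, value)  # warm the local tier
        return value

    def set(self, key, value):
        self.remote.set(key, value)
        self.local.set(key, value)


class ReplicaCache:
    """Writes every value under N replica keys and reads them in turn,
    so losing one cache node doesn't take the data with it."""

    def __init__(self, cache, copies=2):
        self.cache = cache
        self.copies = copies

    def set(self, key, value):
        for i in range(self.copies):
            # distinct keys usually hash to distinct servers in a pool
            self.cache.set('%s.rep%d' % (key, i), value)

    def get(self, key):
        for i in range(self.copies):
            value = self.cache.get('%s.rep%d' % (key, i))
            if value is not None:
                return value
        return None
```

Stacking the layers is just composition: wrap the shared cache in the replica layer, then put the local layer in front of it.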
The important thing to take away from this example is this: data that is used heavily in your application and changes infrequently should be cached for as long as possible while staying closely in sync with your database. Cache\Gate provides a nice API for reducing the likelihood of the 'thundering herd' problem when the data needs to be refreshed, and Cache\Replica keeps several copies of the data in the cache to insulate against cache outages and hotspots.