This is a pretty standard way of scaling out the number of queries that you can handle. Postgres forks a process per connection, and MySQL by default dedicates a thread to each connection, so both have a practical upper limit on the number of connections they can handle. And so, you can scale out throughput by running read replicas, but this also isn't free. Your read replicas have to connect to your primary database in order to stream the replication log, which adds load to your primary database. And you're adding significant operational overhead to running your database in production.
So, this is still a lot of work. It's a lot of cost. And at the end of the day, you're only improving throughput, not latency, because it's the same database engine, executing the same query, just on another server.
And that takes us to the third option, which is building a custom caching solution. The idea here is you execute a query once, you store the results in something like Redis or Memcached, and then, as long as the data hasn't changed, you read the results out of Redis or Memcached instead. But this is a lot of work. This is code that you have to write. Previously, your application was just making SQL queries against MySQL or Postgres. Now you have a totally different access pattern for these caches. You have to deal with invalidating the caches on writes, and frequently that's something that's totally manual. So you can introduce bugs where you forget to invalidate a cache on a particular write. I've spent months of my life tracking down bugs where it eventually turned out we were just forgetting to invalidate the cache on a write.
And you're still adding operational overhead, because you have to run this extra service. There are a lot of other fringe problems that come with running these caches, too. Fallback and failover, or running these caches in a distributed scenario, can be really tricky. And because you're invalidating on writes, if you get a lot of writes, you keep having to run the query against your primary database again. Then you have the thundering herd problem, where if a bunch of people request the same data against an invalidated cache entry, you can put a lot of load on your upstream database and cause a lot of production problems.
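
As a rough illustration of the custom caching approach just described, here is a minimal Python sketch of reading through a cache and invalidating on writes. It is not taken from the talk: a plain dict stands in for Redis or Memcached, a toy `Database` class stands in for Postgres or MySQL, and every class and method name is hypothetical. It shows how each write path has to remember to invalidate, and what a forgotten invalidation looks like.

```python
class Database:
    """Toy stand-in for the primary database; counts the queries it runs."""

    def __init__(self):
        self.rows = {1: "alice"}
        self.queries = 0

    def select_name(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)

    def update_name(self, user_id, name):
        self.rows[user_id] = name


class CachedUsers:
    """Hypothetical application layer that caches query results by hand."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # stand-in for Redis/Memcached

    def get_name(self, user_id):
        key = f"user:{user_id}:name"
        if key in self.cache:
            return self.cache[key]  # cache hit: no database query
        value = self.db.select_name(user_id)  # cache miss: query upstream
        self.cache[key] = value
        return value

    def rename(self, user_id, name):
        self.db.update_name(user_id, name)
        # Manual invalidation: every write path has to remember this line.
        self.cache.pop(f"user:{user_id}:name", None)

    def rename_forgetting_invalidation(self, user_id, name):
        # The bug: the write succeeds, but the cached result is now stale.
        self.db.update_name(user_id, name)


def demo():
    db = Database()
    users = CachedUsers(db)
    first = users.get_name(1)   # miss: first database query
    second = users.get_name(1)  # hit: served from the cache
    users.rename(1, "bob")      # write plus invalidation
    third = users.get_name(1)   # miss again: second database query
    users.rename_forgetting_invalidation(1, "carol")
    stale = users.get_name(1)   # hit: still the old value
    return first, second, third, stale, db.queries
```

In this sketch the last read still returns the old name even though the database holds the new one. The miss after each invalidation is also where a thundering herd would start: with many concurrent readers, each of them could take the miss path and query the database at once.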
But the idea behind ReadySet is that all three of these options kind of suck, and it would be awesome if we didn't have to do any of them at all. Instead of focusing on scaling out our database, we want to be focused on building features that our users want and making our customers happy. All of this database scaling is kind of a distraction.
So this brings us to what I think is so exciting about ReadySet: it enables you to scale up without dealing with any of the headaches that Griffin discussed in the last few slides. This shows some of what I think are the most exciting features of ReadySet. Number one, it's plug and play.