Video Summary and Transcription
Key-value databases are optimized for high availability and partition tolerance, making them ideal for storing non-relational data structures. They prioritize speed and high availability over consistency, making them suitable for simple data models or one-to-one mappings. However, they have limited query capabilities compared to relational databases. Some available options for key-value databases include DynamoDB, Cloudflare Workers KV, Redis, and FoundationDB. Using key-value databases in TypeScript requires addressing challenges such as serialization, deserialization, secondary indexes, relationships, and validation. The Talk includes a demonstration of an application that showcases the use of secondary indexes, with a key-value storage layer implemented on both Deno KV and Redis.
1. Introduction to Key-Value Databases
In this part, we'll discuss the key characteristics of a KV database, including its ability to store non-relational data structures and its optimization for high availability and partition tolerance. We'll also explore different flavors of KV databases and how to use them in TypeScript applications.
How's it going, folks? My name is Kevin Winery, and today we're going to talk a little bit about key-value databases. And if you've never used a KV database before in the context of a JavaScript web application, I think you're really going to enjoy it and learn how it's a little bit different than some of the other databases you've used before and kind of when you might want to apply this technology in the code that you're already writing.
So in this presentation specifically, we're going to take a look at some of the key characteristics that make a KV database different from a relational database that you might have used in the past. And we'll also examine some of the use cases for which a KV database is uniquely suited and might be the tool that you reach for in those contexts. We'll also look at some of the different KV database options that are out there. There's a number of hosted platforms, vendors, open source projects that make KV databases available. So we'll take a look at some of the different flavors of KV databases that exist and what makes them different. And finally, we'll take a look at how to use a KV database in the context of a TypeScript application, and how to solve some of the common programming challenges you're likely to run into if you are using a KV database.
So let's dive in by talking a little bit about the key characteristics of a KV database. And what it boils down to really is two key things. A KV database is really good at storing non-relational data structures. And it's also sort of optimized to be highly available and partition tolerant. And we'll dive in a little bit more deeply into both of these areas to understand what that means. But we'll start with what I mean when I say non-relational data structures, specifically, and why KV database is optimized for those things. So as a JavaScript developer, a data structure like this is probably something that you've seen before. It's an object literal with keys that map to other object literals that contain data about a user that is within your application, their full name, their email address, and things like that. And KV database actually functions a lot like an object literal or a map from the JavaScript world where you have a key that maps one-to-one to a value in your application of some kind.
2. Key Characteristics and Techniques of KV Databases
In this part, we'll discuss the key characteristics of a KV database, including its ability to store non-relational data structures and its optimization for high availability and partition tolerance. We'll also explore different techniques, such as hierarchical data structures and secondary indexes, that can be used to create complex data models within a KV database.
And those values can be arbitrary JavaScript objects structured however you'd like. And the keys, depending on the database that you're using, can either be just a plain string or a key could be like a compound value, like an array that could have multiple different kinds of values that come together to form a unique identifier for one of the values in your database.
Now the key characteristic there, though, is that one-to-one mapping between a key and some kind of value that exists in your database. And the way that you typically interact with a KV database from your code is you'll specify a key and you'll create a value. Oftentimes in a TypeScript context there'll be type information associated with that value before it's stripped away at runtime and becomes just a plain JavaScript object. And there's this get/set API that's generally going to be available in every database, where you can associate some bit of data with a particular key in your database.
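As a rough sketch of that get/set interaction, the snippet below uses a hypothetical in-memory class (InMemoryKv) as a stand-in for a real database client. The class, the key format, and the sample user are assumptions made for illustration, and real KV clients are normally asynchronous.

```typescript
// Shape of the value we want to store; this type exists only at compile time.
interface User {
  fullName: string;
  email: string;
}

// Hypothetical in-memory stand-in for a KV database client.
class InMemoryKv {
  private data = new Map<string, string>();

  // Associate a value with a key, serializing it to JSON on the way in.
  set<T>(key: string, value: T): void {
    this.data.set(key, JSON.stringify(value));
  }

  // Look up a key; the caller states which type it expects back.
  get<T>(key: string): T | undefined {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : (JSON.parse(raw) as T);
  }
}

const kv = new InMemoryKv();
kv.set<User>("user:kevin", { fullName: "Kevin Example", email: "kevin@example.com" });

const user = kv.get<User>("user:kevin");
console.log(user?.fullName);
```

Note that the cast inside get is exactly the spot where type information has been lost: nothing checks at runtime that the stored JSON really is a User.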
And in addition to just having this one-to-one mapping, there's also a couple of techniques that you'll use for more complex data within your application. And one of those things is sort of the concept of a hierarchy that could exist in your application as well. Because a KV database, just by itself, is fairly limited in what it can express in terms of the structure of your data. There's a key, there's a value, that's pretty much it.
And for simpler use cases like storing user preferences or some of the other use cases we'll get into in a minute, that might be fine. But by employing just a couple of different tricks with how you think about your data, you can actually get a lot more value and express a lot more meaning in your key value database by introducing some different techniques in how you manage your key space or the set of keys that you use to map to your data. And one of those things is a hierarchy.
So if you have been in web development for a while, you may have used a RESTful API, where this concept of hierarchical data shows up as you build URLs: you can sort of infer from the structure of the URL how the data found at a particular resource might be used within your application. And when you're designing keys for the data in your database, you can actually think about it in a very similar way, where you create a hierarchy of data in how you structure your keys. So you'll see examples here where, if you're storing data in a KV database for a blog post, you might structure the key in such a way that there's a post key that's then followed by a year key, and then you can build the key progressively using the different parts of either the slug for the blog post or the month and the day that it was published. And now you have, as a part of your key, actually encoded some useful data that you could later use to maybe grab all the blog posts from a particular year or a particular month. So structuring your data in a hierarchical way gives you a way to add a little bit of extra value and meaning into the keys that you're using for your data.
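A minimal sketch of the hierarchical key idea, assuming array-style keys and an in-memory Map as the store. The key layout (posts, year, month, day, slug), the helper names, and the sample posts are made up for illustration. A real KV database would typically answer a prefix query with an ordered range scan rather than by filtering every entry.

```typescript
// A key is a list of parts, read from most general to most specific.
type Key = (string | number)[];

interface Post {
  title: string;
}

// Hypothetical in-memory store; keys are serialized so they can live in a Map.
const posts = new Map<string, { key: Key; value: Post }>();

function setPost(key: Key, value: Post): void {
  posts.set(JSON.stringify(key), { key, value });
}

// Return every value whose key starts with the given prefix parts.
function listByPrefix(prefix: Key): Post[] {
  return Array.from(posts.values())
    .filter((entry) => prefix.every((part, i) => entry.key[i] === part))
    .map((entry) => entry.value);
}

setPost(["posts", 2023, 9, 21, "kv-intro"], { title: "Intro to KV" });
setPost(["posts", 2023, 11, 2, "secondary-indexes"], { title: "Secondary indexes" });
setPost(["posts", 2024, 1, 15, "cap-theorem"], { title: "CAP theorem" });

const from2023 = listByPrefix(["posts", 2023]);
const fromSept2023 = listByPrefix(["posts", 2023, 9]);
console.log(from2023.length, fromSept2023.length);
```

Because the year and month are part of the key, "all posts from 2023" needs no extra index; it is just a shorter prefix.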
The other very common technique that you're going to employ in a KV database is this idea of a secondary index. KV databases are non-relational, so you're not going to be storing a join table that's gonna be mapping one record in a data table somewhere to another one. But that doesn't mean that you can't maintain any kind of relationships or have different ways of addressing your data.
So a common need is to be able to access the same type of information by a slightly different key. So let's imagine you have a database of users and the sort of primary key that you use for that is the user's username. That's sort of the primary identifier that you have for them in your system. However, the email address for a user might be another way to uniquely distinguish a user in your application, so you want to be able to address and fetch data for a user based on that as well. So what you would do is, when you store the information in your KV store under one ID, you would at the same time store an entry under another ID, a secondary index, which would allow you to look up the same information using that secondary index. And typically what you'd store in that secondary index is just the key for the primary index, and then you use that to fetch the record, such that only one value in your database is the source of truth for the data associated with that record.
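Here is one way that pattern might look, sketched against an in-memory Map. The key prefixes (users: and users_by_email:), the function names, and the sample user are assumptions for illustration only.

```typescript
interface UserRecord {
  username: string;
  email: string;
  fullName: string;
}

// Hypothetical in-memory stand-in for the KV database.
const db = new Map<string, string>();

function saveUser(user: UserRecord): void {
  // Primary index: the full record, keyed by username (the source of truth).
  db.set(`users:${user.username}`, JSON.stringify(user));
  // Secondary index: only a pointer back to the primary key, keyed by email.
  db.set(`users_by_email:${user.email}`, user.username);
}

function getUserByUsername(username: string): UserRecord | undefined {
  const raw = db.get(`users:${username}`);
  return raw === undefined ? undefined : (JSON.parse(raw) as UserRecord);
}

function getUserByEmail(email: string): UserRecord | undefined {
  // Two lookups: email -> username, then username -> record.
  const username = db.get(`users_by_email:${email}`);
  return username === undefined ? undefined : getUserByUsername(username);
}

saveUser({ username: "kevin", email: "kevin@example.com", fullName: "Kevin Example" });
const byEmail = getUserByEmail("kevin@example.com");
console.log(byEmail?.fullName);
```

One thing this sketch leaves out: if a user changes their email address, the old users_by_email entry has to be deleted as part of the same update, or it will keep pointing at the record.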
So with a combination of hierarchical data structures and secondary indexes, you can actually create pretty complex data models within a KV database. Maybe not at the same level as a relational database, but you'd be surprised how much mileage you can get out of just these two techniques.
So we talked a little bit about a KV database being a non-relational data structure store. Let's talk a little bit about some of the technical parts that make it different from a relational database, and specifically about how these types of databases tend to be very highly available and partition-tolerant. And to understand what those words mean, we will take a quick detour to talk about this idea of the CAP theorem.
3. Distributed Databases and Key Value Trade-offs
In a distributed database system, you have to optimize between availability and consistency. Key value databases prioritize high availability for fast reads, even if the data may not be completely consistent across all nodes. This trade-off depends on the specific use case and the importance of consistency.
So the idea is that, in a distributed database system, you can only guarantee maybe two of these three things: consistency, availability, and partition tolerance. The data that's in the database system can be highly available, which means that every time you try to contact the API, it is going to give you some kind of response back, even if it might not be consistent. And consistent then means that the data that you are reading every time is going to be the same across different clients. So if you are in India reading a value from the database, and I'm here in the United States reading a value from the database, we're both seeing the same data. So that's considered to be consistent.
And partition tolerance means that, because your database is distributed across multiple machines, it's possible to continue interacting with it even when different nodes in the system go down or can't reach each other. And in a distributed database system, you really have to have partition tolerance or it's not going to work very well. So typically what happens when you are designing a database is you have to optimize in one direction or the other between availability and consistency. And key value databases tend to favor high availability instead of consistency for a couple of reasons.
The first one being that key value database use cases really are interested in very fast, high-volume reads. And so availability ends up being much more important for some use cases, like storing a shopping cart or user preferences, than that data being highly consistent. And the sort of trade-off that often gets made is this idea that the data coming from a key value database is eventually consistent. So it might be the case that when you read a value from a KV database, it might not be exactly the same as a value that's being read somewhere across the world, because maybe, after data has been written to your database, it hasn't been replicated across every node in the network. So it's slightly different. There's different scenarios in which the data might not be completely consistent all the time. But for some use cases, that's okay. It's just a different set of trade-offs that you would make depending on what you're trying to implement.
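To make eventual consistency a little more concrete, here is a toy simulation. It is not how any particular database works; the Replica and EventuallyConsistentStore classes and the explicit replicate step are assumptions that stand in for background replication across a network.

```typescript
// One copy of the data, for example in one region.
class Replica {
  private data = new Map<string, string>();

  read(key: string): string | undefined {
    return this.data.get(key);
  }

  apply(key: string, value: string): void {
    this.data.set(key, value);
  }
}

class EventuallyConsistentStore {
  private pending: Array<{ key: string; value: string }> = [];

  constructor(private primary: Replica, private followers: Replica[]) {}

  // Acknowledge the write as soon as the primary has it; replicate later.
  write(key: string, value: string): void {
    this.primary.apply(key, value);
    this.pending.push({ key, value });
  }

  // Stand-in for background replication catching up.
  replicate(): void {
    for (const { key, value } of this.pending) {
      for (const follower of this.followers) {
        follower.apply(key, value);
      }
    }
    this.pending = [];
  }
}

const usReplica = new Replica();
const indiaReplica = new Replica();
const cluster = new EventuallyConsistentStore(usReplica, [indiaReplica]);

cluster.write("cart:kevin", "1 item");
const staleRead = indiaReplica.read("cart:kevin"); // the write has not arrived yet
cluster.replicate();
const freshRead = indiaReplica.read("cart:kevin"); // now both replicas agree
console.log(staleRead, freshRead);
```

Both reads get an answer right away, which is the availability half of the trade-off; the first answer is simply out of date until replication catches up.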
4. Comparison of KV and Relational Databases
A key-value (KV) database excels at fast reads and writes due to its one-to-one mapping between keys and values. It is ideal for storing data that closely matches the data in your application. However, KV databases have limited query capabilities compared to relational databases, which offer rich querying options. Relational databases are strongly consistent by default, making them suitable for systems that require transactional and data integrity. KV databases prioritize speed and high availability, sometimes sacrificing consistency. Use KV databases for simple data models or one-to-one mappings, such as shopping carts or real-time game states. Even complex CRUD applications can benefit from KV databases if secondary indexes and other strategies are utilized. If you anticipate the need for complex querying or unpredictable data extraction, a relational database may be a better choice.
So it's also useful to think about KV database in comparison to a relational database. A KV database tends to be very good at fast reads and writes. Because there's that one-to-one mapping between a key and a value, reading a piece of data is very fast, and writing a piece of data is also very fast. Those tend to be very quick operations.
Because the data structures that you're storing are pretty simple, and the API to store that data is pretty simple, it's really good at storing data that's modeled very closely to the data that exists in your application. In that example we saw earlier, we saw an object literal that we were just able to store in the database and pull back out of the database without having to do a lot of jumping through hoops to make it into a format that our application could understand.
On the relational database side, the reads are still pretty fast, but because they support very rich querying, the reads can be a little bit slower than they would be in the context of a key value database that only has to do sort of an O(1) lookup on a particular piece of data. And because the data in a relational database is relational, there's the join tables and pointers to other tables within the data. The way that that data is expressed is a little bit more complex and sometimes the APIs for interacting with a relational database are a little more complex too. And you have to do a little bit of extra work to get the data that exists in the database to sort of conform to a format that makes sense in your code.
So that's why you'll often see technologies like object relational mappers exist to kind of translate from tabular data to an object model that makes more sense to us as programmers. Something that key-value databases are not all that great at, though, especially in relation to relational databases, is query capabilities. So some key-value databases do provide some amount of query capabilities so you can reference the values of objects and query them from your database in that way. But those capabilities are, as a general rule, much less robust than what's available in a SQL database. Being able to do SQL queries on your data is probably the most powerful way that exists of being able to query your data. So it is a trade-off that you're making there.
A relational database also tends to be strongly consistent, sort of by default. So for creating a system of record, or some kind of system for which transactional integrity and data integrity is very important, a relational database can sometimes be the way to go there. Although many key-value databases do have strongly consistent modes that they can operate in for both reads and writes. So it's not so much a trade-off as something that is enabled by default in a relational database. And just something to keep in mind when you're using a KV database, like the trade-off that you're making for speed, for high availability, is sometimes made in terms of consistency. So the data that your clients are reading at any given time might be different, and you have to kind of design your system with that in mind.
So with those kinds of differences in mind, the kinds of use cases that you want to apply a KV database for would be use cases where the data model is very simple and one-to-one mappings from a key to a value don't really represent much of a trade-off or a disadvantage. So use cases like the shopping carts I mentioned, where you have a logged in user and their shopping cart state is very tied to that particular user. The limitation of a one-to-one mapping from user to shopping cart doesn't really matter there, so you really do get to take advantage of the fast reads and writes in a KV database. Or like a real-time game state of some kind where, again, you have a position in a game world that's mapped one-to-one to a particular player. Oftentimes, a KV database can be a really good fit for those types of things as well. But even if you're actually building a CRUD application that is reasonably complex, if you're able to use secondary indexes and some of the other strategies that we briefly touched on before, it's possible that a KV database is just going to be the easiest way to build some of those simple CRUD apps that we all build every single day. So simplicity of data structures, and knowing ahead of time how you're going to need to query your data, might be indicators that a KV database will be a good choice for the application you're building. On the other side, if you have a sense that your data is going to need to be queryable, maybe there's administrative interfaces where you can't neatly predict ahead of time how your users are going to want to extract data from the system, a relational database might be a better choice for that type of use case.
5. Key-Value Databases: Options and Differences
If the data is naturally relational, a relational database is a better choice. For high consistency use cases, a strongly consistent system may be preferred. Let's explore some available options for KV databases. DynamoDB is a hybrid database with both document and raw KV data modes. Cloudflare Workers KV provides a lightweight key-value storage engine. Redis and FoundationDB are open-source options with Redis being the most well-known. Redis offers a robust set of commands and data structure options. It can be used as a cache or primary datastore. Redis data is durable, but configuration is required. Redis supports transactions for writes.
If the data is naturally relational, where a user has many photos and groups have many users and the relationships between different pieces of data are a pretty important part of the data model, then a relational database is going to make a lot of sense and probably be a much better choice. There are also use cases for which high consistency is desirable, where if you're storing employee records or payroll runs, those types of things, it could be desirable to prefer a system that is strongly consistent most of the time. Maybe at the expense of availability, maybe the system being down for a period of time is less disruptive than that data being inconsistent for any reason.
Now that we've talked a little bit about KV databases and what their unique characteristics are, let's take a look at some of the available options out there for KV databases. On the left, we have the two primary proprietary KV data stores that are a part of a cloud offering, on the AWS and the Cloudflare side. DynamoDB is an interesting database option that's in a space between a pure KV database and something like MongoDB, which is purely a document database, because it actually has both modes of operation, where you can store documents in DynamoDB as well as sort of more raw KV data objects. So Dynamo is, of course, hosted by AWS and kind of fully managed in that way. Cloudflare Workers KV is another option where, in the context of a Cloudflare Worker, you can access a very simple and lightweight key value storage engine.
And then on the right, we have a couple of open-source options, one of them being Redis. If you've heard of a KV database before, there's an excellent chance that Redis is the one that you've heard of or the one that you've tried and used before. A lesser-known one is FoundationDB. It's an open-source project from Apple that they use for iCloud and a lot of other applications within their environment. And it's very similar to Redis in some respects, but it has some of the flexibility and durability that you would see from a DynamoDB or other more traditional databases.
So Redis and FoundationDB both being open-source options, if you're going to use them in an application today, there's probably going to be some kind of hosting provider involved if you don't want to run it yourself. For Redis, there's lots of different Redis hosts out there. A great one is Upstash, and actually, I believe, if you have used Vercel KV, I think they actually resell Upstash as a part of that offering. So Upstash is a great hosted version. And for FoundationDB, Deno Deploy actually uses FoundationDB in production to back its implementation of the Deno KV API, which we'll see here in a second.
Just really quickly breaking down some of the differences. They do tend to break down around both how the data is modeled and how you tend to interact with data in the database, and what the different guarantees and behaviors are with regard to durability, consistency, availability. So with Redis, I think Redis of the four options up here will tend to give the most robust set of commands and different ways in which you can store data in the database. Within the documentation, it actually describes itself as a data structure server, and I think that's true. There's lots of different ways in which you can approach structuring data in the system. They also have a very rich set of queries and commands that you can run for the data that is stored in Redis, and Lua scripting, because why not, I guess. It's possible to encapsulate some business logic and host that in your database as well. A couple things to keep in mind on the Redis side: how durable your data is needs some thought, in that Redis is an in-memory datastore, which is one of the reasons why it's so fast, and it can work really well as a cache in addition to a primary datastore. It can continually back up the contents of in-memory storage to disk, so it is reasonably durable after you write a value to the in-memory store. But there are some caveats there to understand, which definitely go beyond the scope of this talk. The durability of Redis is something that you have to keep in mind in how you configure when and how the data that exists in memory in Redis is sent to disk and then replicated across other nodes in a cluster, that sort of thing. Redis also does support transactions for writes.
6. Using KV Databases in TypeScript
Redis supports eventual consistency for reads, while DynamoDB offers a robust set of querying capabilities and supports eventual or strong consistency. Cloudflare Workers KV offers a fast, low-level API but no transactional guarantees for writes, making it suitable for specific use cases. FoundationDB offers a transaction system and a choice between eventual or strong consistency on reads.
So you can have a reasonable amount of confidence that as you're writing data, nothing is changing out from under you in a way that would invalidate the transaction. And Redis does support eventual consistency for reads and not a strongly consistent read mode. So, basically what that means is it is possible for a client of a Redis database to write a value to the database and get a response back right away, indicating that that write is successful. But all of that could happen before data in your Redis cluster is replicated across other nodes. So, there is a chance that a client somewhere else could be receiving a stale version of a piece of data that was modified in Redis. So, eventual consistency for reads is just kind of part of how the system works.
For DynamoDB, again, a pretty robust set of querying capabilities because it does support both document data and key values. So, you're able to do queries across actual fields inside of the data that you're operating with. And generally, there's a pretty flexible set of options for how you store your data and how you query your data once it's there. And writes in DynamoDB will be immediately durable after you make those requests. There is a transaction system for writes, so you can orchestrate more complex updates that way. And DynamoDB actually supports either eventual or strong consistency mode for reads. So, you as sort of the client can decide, make a decision about what is most important to you. Do you really care about getting the absolute latest version of a piece of data? Or are you okay with eventual consistency? And there's plenty of use cases for which eventual consistency is completely fine.
For Cloudflare Workers KV, it's a fairly low level API with like a get and set, and there's some amount of filtering on like the prefix of keys. But one of the reasons it ends up being really, really fast is because there isn't a ton that you can do in terms of filtering and querying and that sort of thing. Writes are immediately durable, like when you write a value to a Cloudflare Workers KV database. But there is no transaction guarantee for writes, such that when you're writing to one key, in the course of what you would maybe otherwise want to be a transaction, pieces of data could change out from underneath you. So in that sort of scenario, it's really important to understand your use case: how likely is it that data is going to be written out from underneath you, and how big a deal is that for you in your application? There are actually lots of use cases for which that isn't a problem or just as a practical matter wouldn't happen a lot. Especially like those shopping cart use cases where there really is just a one-to-one relationship between a key and a piece of data. So those types of use cases end up being a good fit. They also have eventual consistency for reads without a strong consistency mode. So it's very much possible that a piece of data that you're getting out of Workers KV could have been modified and may not be the same version of the data that somebody else is looking at. So really, if you're okay with that eventual consistency and that write behavior, your reward is some pretty strong performance when you're using the API.
On the FoundationDB side, it's kind of a layer above what I just described for Cloudflare Workers KV, with a few more commands for filtering and sorting, but also on the durability and consistency side, maybe a little bit more similar to what's happening in DynamoDB, where the writes are going to be immediately durable. There is a transaction system for kind of ensuring that your data is written in a consistent way, and you do also have the choice between eventual or strong consistency on reads. So you as a developer can decide what's most important for a given scenario.
7. Integrating KV Databases with TypeScript
In TypeScript, using a KV database can be challenging due to the need to serialize and deserialize unstructured data into typed objects. Additionally, secondary indexes, relationships, and validation must be addressed. A recommended strategy is to implement a pure functional interface that operates on strongly typed objects. The service interface can handle saving and retrieving objects, while a library like Zod can be used for declarative validation.
Okay, now that we've had a chance to look at the different types of KV databases, let's talk a little bit about how you would use it in TypeScript. And the challenge with using a KV database in TypeScript tends to be in serializing and deserializing unstructured data into typed objects. And that's not a unique problem for KV databases. Like if you're working with a JSON API, this is a challenge that you've encountered in TypeScript before, but it is something you need to deal with.
There's also this idea of secondary indexes and relationships. There are some abstraction layers on top of KV databases that can help with this a little bit, but generally you're going to want to write your own service layer that sort of sits on top of your database and operates on those domain objects. And finally, you're going to need to address validation. So you're going to want to ensure that the data you're writing to your KV database matches the shape of objects as you intend in your TypeScript code.
So our general strategy here is going to be to implement like sort of a pure functional interface that our code can consume. And it just operates on strongly typed objects. And we can ask that service interface to save objects to retrieve objects for us. And the clients of that interface are going to be none the wiser that that data is coming from a KV database or what that service layer needs to do to make the, you know, make the data work for us in that way. And for validation, I actually like to pull in a third party library called Zod, which helps you sort of declaratively set up like what is going to be necessary for an object to validate correctly and also gives you a TypeScript type that you can export and use across your application to understand like what the shape of your object is supposed to be.
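A sketch of what such a service interface could look like. To keep the example dependency-free, a small hand-rolled validator (parsePreferences) stands in for Zod here, so this is not the Zod API. The PreferencesService interface, the field rules, and the key format are all assumptions for illustration.

```typescript
type UITheme = "light" | "dark";

interface UserPreferences {
  username: string;
  email: string;
  theme: UITheme;
}

function isTheme(value: unknown): value is UITheme {
  return value === "light" || value === "dark";
}

// Hand-rolled runtime check, standing in for a schema library such as Zod.
function parsePreferences(input: unknown): UserPreferences {
  if (typeof input !== "object" || input === null) {
    throw new Error("preferences must be an object");
  }
  const { username, email, theme } = input as Record<string, unknown>;
  if (typeof username !== "string" || username.length < 3) {
    throw new Error("username must be a string of at least 3 characters");
  }
  if (typeof email !== "string" || email.indexOf("@") === -1) {
    throw new Error("email must look like an email address");
  }
  if (!isTheme(theme)) {
    throw new Error("theme must be 'light' or 'dark'");
  }
  return { username, email, theme };
}

// What the rest of the application sees: typed objects in, typed objects out.
interface PreferencesService {
  save(input: unknown): UserPreferences;
  get(username: string): UserPreferences | undefined;
}

function createPreferencesService(): PreferencesService {
  // Hypothetical in-memory stand-in for the KV database.
  const kvStore = new Map<string, string>();
  return {
    save(input) {
      const prefs = parsePreferences(input); // throws on malformed input
      kvStore.set(`preferences:${prefs.username}`, JSON.stringify(prefs));
      return prefs;
    },
    get(username) {
      const raw = kvStore.get(`preferences:${username}`);
      // Validate again on the way out, since stored JSON carries no types.
      return raw === undefined ? undefined : parsePreferences(JSON.parse(raw));
    },
  };
}

const service = createPreferencesService();
service.save({ username: "kevin", email: "kevin@example.com", theme: "dark" });
const loaded = service.get("kevin");
console.log(loaded?.theme);
```

Callers only ever deal with UserPreferences objects; swapping the Map for a real KV client would change the body of createPreferencesService and nothing else.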
8. Demonstration of a KV Database Application
I'll briefly demonstrate an application that allows users to enter their preferences and save them using a key-value (KV) database. The application showcases the use of a secondary index to access data by either username or email address. This is a common usage pattern for KV databases.
So I'll just very briefly demonstrate this application. I'll show you what it does. We'll dig into the code a tiny bit. You know, the source code here is actually gonna be up on GitHub. So if you do wanna take your time and dig into it a little bit more deeply, you'll have the option to do that.
But for now, let's pop out to the browser. And here I just have a simple form where you can enter a username and an email and like a UI preference. And when you click the button, your preferences are saved. And one thing that this sample application does is show how you use this secondary index idea. So by just appending this query parameter, you can, oops, I forgot the equal sign there. You can actually address the same data by either username or email address. So this is a fairly common usage pattern for a KV database and also a fairly typical kind of data that you would store in a KV database too.
9. Server Application and Object Setup
This part discusses the server application built on Deno and its options to use Deno KV or Redis. It covers the service facade, typing for objects, and the use of Zod for validation. The implementation of the secondary index is explained, along with the storage engine selection. The Deno KV implementation is shown, including the creation of primary and secondary keys and the atomic operation for setting them. Functions for storing and retrieving preferences are provided, with a focus on getting preferences by email. The code is recommended for further exploration, especially the comparison between Deno KV and Redis implementations. The part concludes with gratitude to the audience and encouragement to use key value databases in future projects.
So this is the project and within the code there's a server application, it's actually built on top of Deno. There's options to use it with Deno KV or with Redis and I'll be adding some more options as time goes on as well. And you can kind of see how I go about doing the service facade and also handle some of the typing for these objects. And most of the action there happens in this userpreferences.ts file and this is where I sort of set up the shape of my object.
So I have a TypeScript enum for like the UI theme and then I use Zod to actually set up an object for user preferences. So Zod actually has an email validator. There's a way to, you know, ensure that the theme is set to be one of the members of the enumeration that I had before. And also, you know, you can set minimum length requirements for username and things like that. And one thing that's really nice is it just exports a TypeScript type that has the type information that you would want as a consumer. It's like what you would get if I just created an interface, but instead of just getting that, I also have this validation logic here.
And then as a part of this interface, I have a store preferences function and this implements that secondary index that I was talking about before. So I start off here by parsing an input object and it'll throw errors if there's something amiss with the shape of that object. And then depending on, you know, the storage engine that's selected, I have this function which actually stores that to the KV database. And if I open up the implementation for Deno KV, it uses the built-in KV database that's a part of the Deno runtime. So if you install Deno, you just get it for free and you can just use it by calling await Deno.openKv().
And I create both my primary and secondary key here for preferences and preferences by email. And then I do an atomic operation where I'm going to try to set both of those keys at once. And if either of those things fails, then the entire transaction fails. So I have the preferences key set up, which does actually have the data for the object. And then I set up my secondary key around email there. And similarly for getting preferences or getting preferences by email, I have functions for both of those options. And for get preferences by email, I start off by getting the primary key for the object, which is stored by email. And then I look up the source of truth from the primary key, which is associated with a username.
So again, I do recommend that you check out the code for this. You can play around with it. And there are versions that use Deno KV and Redis, so you can see how those two implementations differ. So I definitely encourage you to check that out. But we are a little bit short on time. So I'm going to go ahead and wrap it up there. And once again, say thanks to everyone for hanging out at TypeScript Congress today. Hopefully you learned a little bit about key value databases. And I hope that you have a fun time using them in your next project.
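The all-or-nothing write described above can be sketched without any particular database. The AtomicBatch class below is a hypothetical in-memory illustration of the idea, not the Deno KV or Redis API. The uniqueness check on the email key is an extra assumption added so that there is something that can make the commit fail.

```typescript
// Stages several writes and applies either all of them or none of them.
class AtomicBatch {
  private writes: Array<[string, string]> = [];
  private mustNotExist: string[] = [];

  constructor(private target: Map<string, string>) {}

  // Require that a key is absent at commit time.
  checkAbsent(key: string): this {
    this.mustNotExist.push(key);
    return this;
  }

  set(key: string, value: string): this {
    this.writes.push([key, value]);
    return this;
  }

  // Apply every staged write, or none of them if any check fails.
  commit(): boolean {
    if (this.mustNotExist.some((key) => this.target.has(key))) {
      return false;
    }
    for (const [key, value] of this.writes) {
      this.target.set(key, value);
    }
    return true;
  }
}

// Hypothetical in-memory stand-in for the KV database.
const kvData = new Map<string, string>();

function storePreferences(username: string, email: string, theme: string): boolean {
  return new AtomicBatch(kvData)
    .checkAbsent(`preferences_by_email:${email}`)
    // Primary key holds the actual data.
    .set(`preferences:${username}`, JSON.stringify({ username, email, theme }))
    // Secondary key holds only a pointer back to the primary key.
    .set(`preferences_by_email:${email}`, username)
    .commit();
}

const firstSave = storePreferences("kevin", "kevin@example.com", "dark");
const secondSave = storePreferences("impostor", "kevin@example.com", "light");
console.log(firstSave, secondSave);
```

The second call fails its check, so neither of its two keys is written and the index never points at a record that does not exist. As written, this simplified check would also reject a user re-saving their own preferences; a fuller version would allow the write when the existing index entry already points at the same username.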