Video Summary and Transcription
This talk discusses the importance of choosing the right database for your application, especially in the context of full-stack development. The ACID properties (Atomicity, Consistency, Isolation, and Durability) are explained one by one, with emphasis on their role in data integrity and reliability. Distributed transactions are highlighted, with a focus on the Calvin protocol used by Fauna, which provides global replication and data integrity across multiple cloud vendors. The talk also mentions other database options such as MongoDB, DynamoDB, Firebase, and Cassandra. For those interested in deeper insights, the CMU Database Group YouTube channel and jepsen.io are recommended. The ultimate goal is a fast, reliable, serverless database that meets your specific requirements. Fauna offers a free tier and support for those willing to try it.
1. Choosing the Right Database and Understanding ACID
In this talk, Tyler Hannon discusses the importance of choosing the right database for your application. He introduces the ACID acronym, which stands for Atomicity, Consistency, Isolation, and Durability. Tyler explains each component of ACID and its significance in maintaining data integrity and reliability. He also highlights the challenges and benefits of distributed transactions in modern Fullstack applications.
Excellent, so thanks so much for joining me today. My name is Tyler Hannon, and I look after community, and support, and customer success at Fauna. My background is actually in distributed systems and databases, so an event like JS Nation is not where I am most comfortable, but similarly, my expertise is perhaps where you're not most comfortable. And that's why I titled this talk When Worlds Collide.
We are in a sociological shift in the full-stack world, from a world where a web server was adjacent to the database to this new world with global clients and global APIs, where latency and consistency at the edge are critical. But unfortunately, we're all being marketed at. So when you think about your application, the reality is we just want to dump data somewhere, and we want to read data from somewhere. And I guarantee that either in a diagram you have on your own or somewhere inside of the company you work for, there is this 1980s-looking bit of clip art that represents a database with the note, "insert the logo here." And that's what I want to talk about today. What is important to consider when you choose what logo to insert into that database diagram?
We know that the database is often a bottleneck, and just like so many database vendors are saying now, we also know that distributed transactions are the solution. But distributed what, even now? And what is this thing, ACID, that database vendors keep talking about? It is not related to the thing you can find here in Amsterdam, the city that I'm from. It's something entirely different. In fact, it's an acronym, and acronyms are weird. So I thought I would take a little bit of time to walk through this ACID acronym with you.
The first is A, atomic or atomicity. Basically what it's saying is that all changes to the data are performed as if they were a single operation. That is, everything happens or nothing happens. The canonical example for this kind of workflow is financial services. If I am debiting an account, that debit must be made successfully and the credit must be made successfully, but that's a lot of words. So I like to think of the atomic property as all or nothing.
Then there's C for consistent. Data is in a consistent state when the transaction starts and when the transaction ends. Again, to use our canonical example, this would be an application that's transferring funds from one place to another. This property ensures that the combined total value is the same both before and after the transaction. It is valid before, it is valid after.
I is for isolation, in that the intermediate state of a transaction is invisible to others. Basically, things happen one at a time or in parallel, but regardless, the result is the same.
And D is durable, or durability. When the transaction is complete, the changes are persisted and are not undone. This is probably the one you are most familiar with. Once it's complete, it survives an outage.
Now, there are a wide variety of database vendors that talk about distributed transactions.
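The all-or-nothing and valid-before, valid-after ideas from the funds-transfer example can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: `Ledger`, `transfer`, and `total` are hypothetical names invented here, the "database" is a plain in-memory object, and nothing below is the API of any vendor mentioned in the talk.

```typescript
// Hypothetical in-memory ledger: account name -> balance.
type Ledger = Record<string, number>;

// Returns a NEW ledger with the transfer applied, or throws and changes nothing.
function transfer(ledger: Ledger, from: string, to: string, amount: number): Ledger {
  if (!(from in ledger) || !(to in ledger)) {
    throw new Error("unknown account");
  }
  if (!(amount > 0) || !(ledger[from] >= amount)) {
    throw new Error("invalid amount or insufficient funds");
  }
  // Work on a private copy: the half-finished state (debit applied,
  // credit not yet applied) is never visible through the original ledger.
  const draft: Ledger = { ...ledger };
  draft[from] = draft[from] - amount; // debit
  draft[to] = draft[to] + amount;     // credit
  return draft;                       // both changes become visible together
}

// Consistency check for this example: the combined balance must not change.
function total(ledger: Ledger): number {
  let sum = 0;
  for (const account of Object.keys(ledger)) {
    sum += ledger[account];
  }
  return sum;
}

const opening: Ledger = { alice: 100, bob: 20 };
const closing = transfer(opening, "alice", "bob", 30);
// closing is { alice: 70, bob: 50 }; the combined total is 120 before and after.
console.log(closing, total(opening) === total(closing));
```

A failed transfer throws before any copy is returned, so the caller never observes a debit without its matching credit. Durability is deliberately out of scope here, since nothing is persisted.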
2. Choosing the Right Database Approach
There are various approaches to solving database challenges, including those of Mongo, Dynamo, Firebase, Cassandra, and Fauna. Fauna's approach is based on the Calvin distributed transaction and data replication protocol. Calvin ensures data integrity and provides global replication, allowing for temporal capabilities and integration with multiple cloud vendors. For those interested in learning more, resources like the CMU Database Group YouTube channel and jepsen.io offer valuable insights. Ultimately, the goal is to have a fast, reliable, and serverless database that meets your specific requirements. Try Fauna for free and reach out with any questions or for support.
There are many approaches to solving this. There's Mongo's approach and Dynamo's approach, and what Firebase talks about, and what Cassandra talks about, and what we at Fauna talk about as well. Our approach, for what it's worth, is based on a distributed transaction and data replication protocol called Calvin. This is the title of the paper: "Calvin: Fast Distributed Transactions for Partitioned Database Systems." If you have any interest, I encourage you to read it. Because ultimately, understanding how your requirements are met is important in ensuring that they are met completely.
Put simply, though, it's a multi-document transaction, and it looks a little bit like this. I have built a compelling, awesome app in some flavor of JavaScript using the frameworks and tools that I trust, be that React, or Next, or whatever. My app has a piece of data, and my friendly little helper in my app goes to the database and says, hey, here's data. And that data falls into the transaction log. That transaction log is then replicated to another site globally. The replication is acknowledged. There's a whole bunch of checks, and maths, and confusing things that go into place. And then the data is written. As that data is replicated globally, because it's in the transaction log, it is returned to the application as complete. This also allows for some really cool capabilities around temporality, knowing what happened at what point in time. And it allows us to reach out to a variety of cloud vendors in a variety of regions around the world, some of which are available today and some of which will be available in the future.
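The flow described above (the write lands in a transaction log, the log is replicated, replication is acknowledged, and only then is the write reported as complete) can be sketched as a toy model. To be clear about assumptions: this is not the Calvin algorithm and not Fauna's implementation. `LogEntry`, `Replica`, `Cluster`, `replicatedWrite`, and `stateAt` are hypothetical names, the replicas are in-memory arrays rather than sites in different regions, and the sketch waits for every replica where a real system would have to deal with failures and partial acknowledgement.

```typescript
// Toy model only: all names are hypothetical and everything lives in memory.
interface LogEntry { seq: number; key: string; value: string }
interface Replica { name: string; log: LogEntry[] }
interface Cluster { nextSeq: number; replicas: Replica[] }

// A replica acknowledges an entry only if it is the next one in order.
function appendToReplica(replica: Replica, entry: LogEntry): boolean {
  if (entry.seq !== replica.log.length + 1) {
    return false;
  }
  replica.log.push(entry);
  return true;
}

// The write is reported as complete only after every replica has
// acknowledged the log entry. Returns the entry's sequence number.
function replicatedWrite(cluster: Cluster, key: string, value: string): number {
  const entry: LogEntry = { seq: cluster.nextSeq, key, value };
  const allAcked = cluster.replicas.every((r) => appendToReplica(r, entry));
  if (!allAcked) {
    throw new Error("replication was not acknowledged");
  }
  cluster.nextSeq += 1;
  return entry.seq;
}

// Temporality: because the log is ordered, replaying it up to a given
// sequence number reconstructs the data as it was at that point.
function stateAt(replica: Replica, upTo: number): Record<string, string> {
  const state: Record<string, string> = {};
  for (const entry of replica.log) {
    if (entry.seq > upTo) break;
    state[entry.key] = entry.value;
  }
  return state;
}
```

With two replicas in a cluster, writing `"v1"` and then `"v2"` under the same key leaves both logs with two entries, and `stateAt(replica, 1)` still returns the first value. That ordered, replayable log is the property the talk points to when it mentions knowing what happened at what point in time.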
But your journey is just beginning. I encourage you to look at the CMU Database Group YouTube channel if you're interested in how these sorts of things work, and to look at jepsen.io and the amazing talks that Kyle Kingsbury has given and the work that he has done in breaking distributed databases and making us as an industry better for it. But at its simplest, you just wanna dump your data somewhere. You just wanna read what you wrote. You just want it to be fast everywhere. You have described a geographically redundant distributed system. And because you don't want to be an operator, you've described a database that's built for serverless. You just want it to work. We want that for you, so try it. It's free, there's a free tier. It's great. Also feel free to hit me up. I'm at Tyler Hannon. You can also find us at @fauna. I am marketing to you. You are being marketed to all the time. Know that your journey of understanding distributed database fundamentals begins with knowing what you care about. Thank you so much, and I'll be in chat if people have questions.