Welcome, everybody. Thank you so much for joining me for this talk. I am very excited to be giving this in a lightning talk format. I've given the same talk before in a full-length format, and I've condensed it down into just the necessary pieces. So, I'm looking forward to seeing how this will go. If you have any questions about the talk after I've given it, feel free to shoot me a message on Twitter and I'll be happy to answer them.
But before we get into the meat and bones of this talk, let's talk about the bigger-picture concept: the world itself runs on data. Whether it's your cell phone, whether you're using Facebook, or maybe your refrigerator or whatever you have that's connected to the internet, everything runs on data, and as developers, it's actually our job to manage this. So it's a heavy load to put on our shoulders, but that's what we signed up for when we became developers: to take this data, do something with it, and spit it out in a format that other pieces of software can use.
So the TLDR of all this is that in order to manage a set of data, you have to have some sort of knowledge about its structure and its purpose. You have to know why you're dealing with your data and why your application's code is doing what it's doing to that data. So to revise the original statement, not only is it our job to manage this data that the world runs on, but it's also our job to understand it, at least to some degree.
And this is hard because, in general, data is hard to model. As technical people, we have a lot going on. We're doing a lot of technical things. We're developing applications. We have a lot of this knowledge to keep in our heads. There's not a whole lot of room to understand the whole data domain of whatever industry you're working in at the time. So, a couple of other reasons why your data is hard to model. As your data flows through different areas of your application, you have to know how to interact with it, so it needs to be modeled in a way that works with the different pieces of your application. Your data model may change as your application evolves. So as new requirements come up in your industry, you may have to evolve your model a bit, and doing that in a way that's safe for your application can be difficult at times. Another one, and this is a big one, is that your data may not have been modeled by you. And I would also revise this to say that your data probably wasn't modeled by you. You're probably consuming data from someone else and using it within your own application.
So for all of these reasons, we as developers came up with this idea of schemas, which are a way to clearly and concisely represent your data model. But there's still a problem, even with schemas. Schemas are now everywhere. We've solved the problem of being able to model out our data in a way that makes sense, but now that we've found a good solution, we're using it everywhere, and the original intent for the schema is lost. What the schema is supposed to be is a source of truth for what your data looks like. But as you start adding different schemas everywhere, it begs the question: now what is the source of truth? You end up with multiple perceived sources of truth, and each schema may describe your data a little bit differently. And also, each schema has a different role.
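To make the "multiple sources of truth" problem a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two schemas, their field names and types, and the `schema_drift` helper are hypothetical and do not come from any particular tool. It only shows how two schemas that both claim to describe the same "user" record can quietly disagree.

```python
# Hypothetical example: two schemas that both claim to describe a "user" record.
# All field names and types here are invented for illustration.

# The schema as a database layer might see it.
DB_USER_SCHEMA = {
    "id": int,
    "email": str,
    "name": str,
    "created_at": str,
}

# The schema as an API validation layer might see it.
API_USER_SCHEMA = {
    "id": str,            # this layer treats ids as strings
    "email": str,
    "display_name": str,  # same idea as "name", but under a different key
}


def schema_drift(a: dict, b: dict) -> dict:
    """Report where two field-name-to-type schemas disagree."""
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "type_mismatch": sorted(k for k in shared if a[k] is not b[k]),
    }


drift = schema_drift(DB_USER_SCHEMA, API_USER_SCHEMA)
print(drift)
```

Neither schema here is wrong on its own terms. Each one serves a different role, which is exactly why it becomes hard to say which one is the source of truth.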