So with this I wanted to move to a definition of Apache Kafka, and I know definitions are really boring, however, I wanted us to be on the same page so that we can understand each other. So Apache Kafka is an event streaming platform that is distributed, scalable, high-throughput, low-latency, and has an amazing ecosystem and community. Or, simply put, it is a platform to handle the transportation of messages across your multiple systems. These can be microservices, IoT devices, or even a teapot in your kitchen sending information about the water to your mobile phone, so anything.
The Apache Kafka platform is distributed, meaning that it relies on multiple servers with data replicated across multiple locations, making sure that if any of those servers go down, we are still fine. Our users can still use the system. It's also scalable, so you can have as many of those servers as you need, and they can handle trillions of messages per day, ending up in petabytes of data persistently, and that's the word that's important, persistently stored on disk. And what is also awesome about Apache Kafka is its community and its wide ecosystem, including the libraries, you'll see JavaScript later in action, and also the connectors, so you don't really have to reinvent the wheel. Kafka has existed for over a decade, so there are a lot of connectors which are already built, making it easy to connect Apache Kafka with your systems as well.
So, to understand how Apache Kafka works and, more importantly, how we can work effectively with Apache Kafka, we need to talk about Kafka's way of thinking about data. And the approach which Kafka takes is simple, but also quite clever. Instead of working with data in terms of static objects or final facts, a final set of data stored in a table in a database, Apache Kafka deals with entities described by continuously arriving events.
So in our example, for our online shop, we have some products which we are selling. And the information about the products and their states we can store in a table in a database. And this gives us some valuable information, some final compressed results. However, if after you store the data you come up with more questions about, I don't know, the search trends, the peak times for some products, you can't really extract that information from the data you stored unless you planned for it in advance. So we can see the data in the table as a compressed snapshot, a one-dimensional view, or a single dot on an infinite timeline of the data.
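Just to make that "compressed snapshot" idea concrete, here is a minimal sketch of what such a table row might look like; the table, column names, and values are hypothetical, not something from the talk:

```javascript
// A hypothetical row from a products table in the online shop's database:
// one row per product, holding only the latest aggregated state.
const productRow = {
  productId: 'tie-042',
  name: 'Tie',
  unitsInStock: 17,
  unitsSold: 3,
  lastUpdated: '2024-05-01T10:15:00Z',
};

// Questions we did not plan for in advance (search trends, peak ordering
// times, how a purchase unfolded) cannot be answered from this single
// compressed row: the history behind it is already gone.
```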
What if, instead, you could see this data as a flow of events? For example, a customer ordered a tie. Another customer searched for a donut. Then we dispatched the tie to the first customer, and the second one decided to buy the donut. And so on, we have more events coming into the system. So, instead of seeing a single data point, we see the whole lifecycle of a product purchase. What is more, we can replay those events. We can't really change past events, they already happened, but we can go and replay them again and again, approach the data from different angles, and answer all the questions which might come to our mind even later.

And this is called an event-driven architecture, and I'm quite sure many of you are familiar with that. But let's see how Apache Kafka plays with event-driven architecture. So here in the center I put the cluster, and on the left and on the right we will see applications which interact with the cluster. So Apache Kafka coordinates data movement and takes care of the incoming messages. It uses a push-pull model to work with the data, which means that on one side we have some structures which will create and push the data into the cluster.
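As a small sketch of that push side, here is roughly what pushing those lifecycle events into a Kafka cluster could look like with the kafkajs library; the topic name, event shapes, and broker address are assumptions for illustration, not details from the talk:

```javascript
const { Kafka } = require('kafkajs');

// Connect to a (locally running) Kafka cluster; broker address is an assumption.
const kafka = new Kafka({ clientId: 'online-shop', brokers: ['localhost:9092'] });
const producer = kafka.producer();

// The lifecycle from the example, modeled as a flow of events
// instead of a single final row in a table.
const events = [
  { type: 'order_placed',     product: 'tie',   customer: 'customer-1' },
  { type: 'product_searched', product: 'donut', customer: 'customer-2' },
  { type: 'order_dispatched', product: 'tie',   customer: 'customer-1' },
  { type: 'order_placed',     product: 'donut', customer: 'customer-2' },
];

async function pushEvents() {
  await producer.connect();
  // The "push" side of the push-pull model: the application creates events
  // and sends them to a topic on the cluster.
  await producer.send({
    topic: 'shop-events',
    messages: events.map((e) => ({ key: e.product, value: JSON.stringify(e) })),
  });
  await producer.disconnect();
}

pushEvents().catch(console.error);
```

On the other side, consumer applications would pull those events from the cluster at their own pace, which is where the replayability comes from: the events stay stored on disk and can be read again and again.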