So think about the standard producer and consumer you get if you just use KafkaJS, or whatever client you decide to use. The plain producer and consumer don't have strong guarantees. What I mean by that is: when you send a message on the producer side, there is no guarantee it lands exactly once; it might be duplicated. That's what we call at-least-once. That's the default guarantee, which means you can have duplicates. And in some systems you cannot accept that. If you're depositing money, you should not do that; especially when withdrawing money from a customer, you should not do that. On the consumer side, you can also get duplicate processing. So that's what you get with the defaults. There are solutions for that.
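To make that concrete, here's a minimal sketch of what those defaults look like in KafkaJS. The broker address, client ID, and group ID are placeholders I made up for illustration:

```typescript
import { Kafka } from 'kafkajs'

// Placeholder broker address and IDs, just for illustration.
const kafka = new Kafka({ clientId: 'payments-app', brokers: ['localhost:9092'] })

// Plain producer: at-least-once. A retry after a network timeout can
// write the same record to the topic twice.
const producer = kafka.producer()

// Plain consumer: offsets are auto-committed in the background, so a
// crash between processing and the next auto-commit replays messages.
const consumer = kafka.consumer({ groupId: 'payments-group' })
```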
So Kafka offers exactly-once semantics, that's the EOS there, together with transaction boundaries. And it's very common to have a pattern like this: a message initiates a process, and you want to consume that message, do some processing, and produce a message to another topic, all in a single transaction. You want to make sure that if you consume, do the processing, and something goes wrong, then when you reprocess or restart your system you process that message again, because you never finalized the three steps, the last one being sending that message on to the next topic. The shape of that loop is sketched below. And for this, you have to do some configuration in Kafka.
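As a rough sketch of that consume-process-produce shape, reusing the `kafka`, `producer`, and `consumer` from the first snippet (the topic names and the `process` function are made up for illustration; the configuration it needs comes next):

```typescript
await consumer.subscribe({ topics: ['orders'] })

await consumer.run({
  autoCommit: false, // we commit offsets ourselves, see below
  eachMessage: async ({ topic, partition, message }) => {
    // Hypothetical processing step that turns the input into an output value.
    const result = process(message.value)

    // Produce the result to the next topic.
    await producer.send({
      topic: 'orders-processed',
      messages: [{ value: result }],
    })

    // Without a transaction, a crash right here (after send, before the
    // offset commit) means the message is replayed AND re-emitted.
  },
})
```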
On the producer's side, you have to set idempotence to true. That guarantees a message won't be duplicated on send. On the consumer's side, you have to disable the auto-commit of offsets, which means your client reads the message but now you have the control: you tell Kafka "I really processed this message" by committing the offset back yourself. You also want to set one property, max in-flight requests, to one, so you don't have parallel requests and you keep the ordering guarantees. And for the last step, remember, it's consuming and then producing, and you want all of this in the same transaction. What you set is a transactional ID, so the broker can establish the transaction boundaries, with idempotence set to true as well. What this guarantees is that the consume, process, and send become part of a single atomic transaction.

Keep in mind this is not a distributed transaction, like an XA transaction. If you do a database call on the side, it is not guaranteed to be rolled back; you have to take care of that yourself. It stays within the boundaries of Kafka. It's the same as the normal database transaction between two tables that you are used to, so just keep that in mind.
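In KafkaJS terms, that producer configuration looks roughly like this. The transactional ID is a placeholder; what matters is that it stays stable for a given producer instance across restarts:

```typescript
// Producer configured for exactly-once: idempotent, one request in
// flight at a time (preserves ordering), and a stable transactional ID
// the broker uses to track transaction boundaries.
const txProducer = kafka.producer({
  idempotent: true,
  maxInFlightRequests: 1,
  transactionalId: 'orders-processor-1', // placeholder; keep stable per instance
})
```

The consumer side is the `autoCommit: false` you already saw in `consumer.run()` above: offsets are only committed as part of the transaction, never automatically.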
Some configuration of the cluster too, and then I'm done because my clock is already blinking. You should have at least three brokers, so the transaction state log can meet its default replication factor of three, and at least two in-sync replicas for your topics for this to work. So if you try it locally and you don't have that, it will not work. And you also have to drive the transaction yourself, as you can see here: start a transaction, send the message, send the offsets, and then commit the transaction, which means either everything happens or it is rolled back. That's basically what you want to do, and it's really important to take care of that. That's it. If you want to know more about this, I wrote a blog post specifically about it, and I also have a Docker Compose setup with multiple Kafka nodes where you can play with this kind of more advanced work on your local machine.
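Putting it together, that begin / send / send-offsets / commit flow looks roughly like this in KafkaJS, using the `consumer` and `txProducer` from the earlier sketches (topic names and the group ID are placeholders):

```typescript
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    // Begin a transaction on the transactional producer.
    const transaction = await txProducer.transaction()
    try {
      // 1. Produce the result to the next topic
      //    (real processing of message.value would go here).
      await transaction.send({
        topic: 'orders-processed',
        messages: [{ value: message.value }],
      })

      // 2. Attach the consumed offset to the same transaction.
      //    The offset you commit is the *next* one to read.
      await transaction.sendOffsets({
        consumerGroupId: 'payments-group', // placeholder group ID
        topics: [{
          topic,
          partitions: [{
            partition,
            offset: (Number(message.offset) + 1).toString(),
          }],
        }],
      })

      // 3. Commit: the produced message and the offset become
      //    visible atomically, or not at all.
      await transaction.commit()
    } catch (err) {
      // Abort rolls back both the send and the offset, so the
      // message is reprocessed after a restart.
      await transaction.abort()
      throw err
    }
  },
})
```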