But what we don't see is who owned it before. The historical state is there, but it's not easily accessible. We know there are transactions further down in the blockchain that changed the state, but to compute what the state was, say, three months ago, we would basically need to undo all those transactions in reverse order. So the blockchain is an event-sourcing system, similar to Redux, if you know Redux. It's not so cool anymore, but yeah.
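To make the event-sourcing comparison concrete, here is a minimal sketch in TypeScript. The `TransferEvent` shape and the function names are made up for illustration and are not part of any blockchain API. Note that it rebuilds a past state by replaying events forward from the start, which is the alternative to undoing transactions backward from the current state.

```typescript
// Hypothetical event shape: one ownership transfer recorded in a block.
interface TransferEvent {
  blockNumber: number;
  tokenId: string;
  from: string;
  to: string;
}

// tokenId -> owner at some point in time
type OwnershipState = Map<string, string>;

// A reducer in the Redux sense: (state, event) -> new state.
function applyTransfer(state: OwnershipState, event: TransferEvent): OwnershipState {
  const next = new Map(state);
  next.set(event.tokenId, event.to);
  return next;
}

// Rebuild the state as of a given block by replaying every event up to it.
function stateAtBlock(events: TransferEvent[], blockNumber: number): OwnershipState {
  return events
    .filter((e) => e.blockNumber <= blockNumber)
    .sort((a, b) => a.blockNumber - b.blockNumber)
    .reduce(applyTransfer, new Map<string, string>());
}
```

The current state is just the replay with the latest block number. Any earlier state costs a full replay up to that block, which is why historical views are expensive without an index.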
Cool. I already touched a little on the problem that we don't have so-called provenance, especially for NFTs. Provenance is a term that comes from art. When I buy a very expensive picture that I want to hang in my villa (which I don't have), I want to know who owned it before. Part of the provenance is also the question: is it a fake, or is it the real one? When I see that it was owned by this gallery or that very famous collector, the chances are high that I don't have a fake. On the blockchain, the provenance is kind of recorded, but it's hard to get at. What we don't really have is a time machine. Theoretically it's possible, the data is there, but you need to understand that when we run a full archive node for the Ethereum blockchain, for example, one that has all the blocks stored since Ethereum started, we currently end up needing roughly three terabytes of storage. And that's already heavily compressed, so to access the data we need to decompress it and so on, which makes it slow. Plus, new blocks are always being produced, so this problem doesn't go away. We always need to think about where to store that data and how to access it. So getting a real historical view gets a bit hard. That's one of the problems we see with blockchain data.
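As a rough illustration of what a provenance query looks like once the transfer records are at hand, here is a small TypeScript sketch. `TransferRecord`, `provenanceOf` and `wasEverOwnedBy` are hypothetical names chosen for this example, and the records are assumed to be already extracted from the chain, which is exactly the hard part.

```typescript
// Hypothetical record of one ownership change; not a real library type.
interface TransferRecord {
  blockNumber: number;
  tokenId: string;
  from: string;
  to: string;
}

interface ProvenanceEntry {
  owner: string;
  sinceBlock: number;
}

// Collect the chain of owners for one token, oldest first.
function provenanceOf(records: TransferRecord[], tokenId: string): ProvenanceEntry[] {
  return records
    .filter((r) => r.tokenId === tokenId)
    .sort((a, b) => a.blockNumber - b.blockNumber)
    .map((r) => ({ owner: r.to, sinceBlock: r.blockNumber }));
}

// "Was this piece ever owned by that famous collector?"
function wasEverOwnedBy(records: TransferRecord[], tokenId: string, owner: string): boolean {
  return provenanceOf(records, tokenId).some((entry) => entry.owner === owner);
}
```

The query itself is trivial. What makes provenance hard in practice is getting a complete, ordered list of records out of terabytes of compressed block data in the first place.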
The other one is the incentivization problem, which is linked to the first. Especially in the early days, we had this idea of fully decentralized apps. That means we have a front end whose code we can distribute. Basically it's HTML, JavaScript and CSS, so we can distribute it however we want. IPFS is one of these technologies, and it works very similarly to BitTorrent. The idea was that everybody runs a blockchain node on their own computer, and the front end interacts directly with that node. This fully decentralized thing is the dream of blockchain researchers. The cool thing is that there are never scaling issues with such a setup, regardless of whether there are 10 users or 10 billion, although 10 billion would be a bit much. It scales because there's no server. Everything runs on the users' computers, so it's only the blockchain nodes that interact with each other. That was the initial idea, but there's a problem with this architecture, and things also went a different way: nowadays almost nobody runs a node on their own computer. People usually rely on third-party node providers. When we use MetaMask, for example, in the background we usually use Infura, which is one of the companies that works with MetaMask for this. When we write data onto the blockchain, we pay transaction fees in gas, and that incentivizes the whole system to work. The validators take the transactions in, they get to keep part of the gas fees, and that's why they add their blocks to the chain. But reading from the blockchain is kind of an unsolved problem, or rather it's not incentivized at the protocol level, so to speak. That's why a lot of companies sprang up, like Infura, as I said, and others that work with MetaMask, to provide access to the data, but they also need to find a business model around it. So there is this whole thing.
And it's not easy to get to that data. As I said before, on the blockchain we share the same block space, ordered by timestamp, with all the other smart contracts and everything else that goes on on the blockchain. Take Uniswap, which is probably the most common name here. To see what happened on Uniswap, we need to look at all the transactions and filter out just the ones for Uniswap, because all the other protocols share the same block space. Also, the interface to get that data is called JSON-RPC. It's a very low-level way to interact with a blockchain node. It really is an RPC: I can call simple functions, but there's no nice data extraction, no indexing, and so on. This is actually code that I saw in the wild. When we want to get the token IDs of all the tokens held by an owner, we see all these await statements here, and a lot of them are a call back to the JSON-RPC endpoint, each of which may take 100 to 200 milliseconds to resolve. Then we get into stuff like this: first I get this one, then I get that one, and then I wait. If someone has 20 tokens, and each token takes 200 milliseconds, we end up with a page that takes multiple seconds to load. And we know that as soon as we go above 100, 200, 300 milliseconds, users usually get impatient. People who use Web3 or blockchain applications are a bit more used to slow loading times, but I don't think that's a good excuse. So these are the problems: data extraction isn't really incentivized, and the underlying protocol is very basic, so it's quite a hard task. And that's exactly where The Graph comes in.
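The slide code is not part of this transcript, so the following TypeScript is only a sketch of the pattern described: one await after another, each standing for a round trip to a JSON-RPC endpoint. The `NftClient` interface and `makeFakeClient` helper are hypothetical. The method names are borrowed from the ERC-721 enumeration and metadata extensions on the assumption that the original code used something similar, and the remote node is replaced by a fake with artificial latency.

```typescript
// Hypothetical minimal client; each method stands for one JSON-RPC round trip.
interface NftClient {
  balanceOf(owner: string): Promise<number>;
  tokenOfOwnerByIndex(owner: string, index: number): Promise<string>;
  tokenURI(tokenId: string): Promise<string>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for a remote node: every call waits `latencyMs` and is counted.
function makeFakeClient(tokens: string[], latencyMs: number) {
  let calls = 0;
  const client: NftClient = {
    async balanceOf(_owner: string): Promise<number> {
      calls++;
      await sleep(latencyMs);
      return tokens.length;
    },
    async tokenOfOwnerByIndex(_owner: string, index: number): Promise<string> {
      calls++;
      await sleep(latencyMs);
      const id = tokens[index];
      if (id === undefined) throw new Error("index out of range");
      return id;
    },
    async tokenURI(tokenId: string): Promise<string> {
      calls++;
      await sleep(latencyMs);
      return `ipfs://example/${tokenId}`; // placeholder URI, not real data
    },
  };
  return { client, callCount: () => calls };
}

// The pattern from the talk: sequential awaits, so the latencies add up.
async function loadTokensSequentially(client: NftClient, owner: string) {
  const count = await client.balanceOf(owner);
  const result: { tokenId: string; uri: string }[] = [];
  for (let i = 0; i < count; i++) {
    const tokenId = await client.tokenOfOwnerByIndex(owner, i);
    const uri = await client.tokenURI(tokenId);
    result.push({ tokenId, uri });
  }
  return result;
}
```

In this sketch an owner with 20 tokens causes 1 + 2 × 20 = 41 round trips. At 200 milliseconds each, that is roughly 8 seconds before the page can render, and the number of calls grows with every token the owner holds.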