And as we speak, things are really happening right now. So it all started a couple of years ago: I got a release-note email from the DJ software I'm using, saying something like, dear DJs, we are now able to provide you a unique technology that will allow you to separate the sources of your track and, with that, be creative and do something with it. At first I thought to myself, well, it's not so interesting; it has probably been solved already. But it was the post-Covid era, there were still restrictions on crowds and everything, so I really didn't pay much attention to it.
And recently a friend of mine came to me and said: I want your help separating the vocals out of a track I have. It's a very old track, there's no studio version or anything. What can I do? Now, I have my equalizer here, and I can play with it and, in some manner, reduce some sounds or enhance others, but that's not really creating a karaoke version; it's not peeling apart the layers. But suddenly I remembered that I have this tool in my DJ software. I read the step-by-step guide of what to do and what I need to configure, clicked a few buttons, and boom, I had it. It was nice, she was happy, but then I played with it on another song, and another song, and it wasn't just nice. I was amazed by it, and everything was happening in real time.
And this is something that was not in the release note, by the way. Or maybe it was, but I didn't read the entire thing. I was amazed, and this really triggered the engineering part of my brain. So what do I do? I want to know how things work, so I go to Google. I looked for music source separation using neural networks, downloaded an article, read it, another article, read it, downloaded the dataset, downloaded the Python code, trained the model myself, and then I tested it with another track and another track and another track, and I was completely blown away by this technology. After a few hours of playing with it, this is what I looked like. A whole new world had opened up to me.
So, the first thing is how we model sound. What is sound? Sound, eventually, is a change in air pressure caused by the vibration of air molecules. Our ears are sensitive to those vibrations, and that is what our brain perceives as sound. Computers do something similar, called sampling. I'm not going to dig into this technique because of time constraints, but the computer measures the amplitude levels of those vibrations. What we get is a waveform, which is the most common visual representation of sound, but this waveform actually holds several kinds of information about the sound. The first is the frequency: if we zoom in, we can read the frequency of the sound off the waveform. The second is the intensity of the sound. The intensity is measured from the squared waveform; essentially, we take the squared area of the waveform and see how large the peaks are in proportion to the minimum and maximum points. And then we have something very important, which is the timbre of the sound. The timbre is also called the tone quality or the tone color. It's not quality in the sense of how clearly I hear the sound; it's the tonal character of the overtones of different instruments layered on top of each other. For example, if I'm playing a C chord on a guitar and at the same time someone plays a C chord on the piano, I want to be able to distinguish between those instruments, and this is something very hard for computers to do. If you think about it, our brain does it pretty much instantly.
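To make those three properties concrete, here is a minimal Python sketch (numpy only; the sample rate, the C4 fundamental, and the overtone weights are assumed values chosen for illustration, not anything taken from the talk). It builds a waveform by sampling a note, estimates the frequency from the FFT peak, computes the intensity as the root mean square of the samples, and uses the overtone weights as a toy stand-in for timbre.

```python
import numpy as np

# --- Sampling: measure the amplitude at a fixed rate (assumed values for illustration) ---
sample_rate = 44_100          # samples per second
duration = 1.0                # seconds
t = np.arange(int(sample_rate * duration)) / sample_rate

# A toy "guitar-like" note: a 261.63 Hz fundamental (C4) plus a few overtones.
# The overtone weights are made up; changing them changes the timbre, not the pitch.
fundamental = 261.63
overtone_weights = [1.0, 0.5, 0.25, 0.125]
waveform = sum(w * np.sin(2 * np.pi * fundamental * (k + 1) * t)
               for k, w in enumerate(overtone_weights))

# --- Frequency: the strongest peak in the spectrum gives the perceived pitch ---
spectrum = np.abs(np.fft.rfft(waveform))
freqs = np.fft.rfftfreq(len(waveform), d=1 / sample_rate)
peak_freq = freqs[np.argmax(spectrum)]

# --- Intensity: root mean square of the squared waveform ---
rms = np.sqrt(np.mean(waveform ** 2))

print(f"estimated pitch: {peak_freq:.2f} Hz")   # ~261.63 Hz regardless of overtones
print(f"RMS intensity:   {rms:.3f}")
```

Two instruments playing the same C share the fundamental frequency but differ in those overtone weights, which is roughly what we perceive as timbre, and it is exactly that overlap that makes telling the instruments apart so hard for a computer.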