Video Summary and Transcription
This talk provides an overview of an open-source guitar tuner project, covering pitch detection, web APIs for microphone access, implementation with React and Next.js, and data visualization. It explores several pitch detection algorithms, including zero crossing, the fast Fourier transform, and autocorrelation. The project uses the MediaStream API and the Web Audio API to access user media. The implementation is done with React and Next.js and includes a utility library for performing pitch estimation and detection. The talk also discusses future improvements for the project, such as noise cancellation and chord detection.
1. Introduction to the Guitar Tuner Project
I am Omar Job, the Technical Lead at Learn, an Italian company aiming to make digital competencies accessible to everyone. Today, we will have an overview of an open-source guitar tuner project. We will explore pitch detection and estimation, web APIs for microphone access, implementation with React and Next.js, data visualization, and discuss issues and improvements. This project was born out of curiosity and aims to explore new territory. It's open source, built with React and Next.js, and utilizes the Web Audio API.
Hi, everyone. It's a pleasure to be here, and I cannot wait to share this topic with you. So, let's start with the introductions. I am Omar Job, and I'm Technical Lead at Learn, an Italian company which aims to make digital competencies accessible to everyone. In my free time I have a lot of passions, one of which is music, and I love to bother my neighbors playing guitar. And that's why we are here.
So, today we're going to see a lot of things. We will have an overview of the project, which is an open-source guitar tuner. We will see the basics of pitch detection and pitch estimation, so how to detect the frequency of a note that's played. Then we will see the web APIs involved in the project, so how to get access to a user's microphone. We will also see the actual implementation with React and Next.js. After that, we will see the data visualization part, so how to display the information that we are retrieving. And finally, we will talk about issues and possible improvements.
I would like to start with a question: do we need another guitar tuner? Well, there are plenty of guitar tuners online. You can find a lot of applications and download them for free. Which leads to another question: why? Why did we come up with yet another guitar tuner? Well, I wanted to test myself, and I was curious: I wanted to see what was under the hood, all the mathematics involved in pitch estimation and detection, and whether I could build a guitar tuner from scratch. So, this is a project made out of pure curiosity. And, disclaimer, I'm not a mathematician. This is the result of my research and my curiosity. So, this is a project that aims to explore something that I had never explored before.
The project is open source. You can find it online. You can check the code, and I will share it with you after the talk so you can explore it. It's made with React and Next.js, and it uses the Web Audio API. The interface is pretty clear: it detects the note that is being played, and, as you can see, the indicator moves when I play a note.
2. Basics of Pitch Detection and Algorithms
The basics of pitch detection involve understanding that notes are represented by frequencies in hertz. A reference frequency, A440, is used for tuning. The guitar signal, represented by a wave, is often noisy, making analysis challenging. Various algorithms exist for pitch detection, including zero crossing.
It's very accurate. It can be improved, but for this stage, it's very, very good and it works very well. So, what are the basics of pitch detection?
Well, I want to align everyone on the topic because if you play an instrument, and I noticed that a lot of programmers do play instruments, but if you're not a musician, this topic can be tricky to understand. So, I want to align everyone on the topic. So, we start with the notes.
And I tell you that notes are represented by a frequency, which is measured in hertz. As you can see in this table, we have a lot of notes, and every note is represented by a number, which is its frequency. On the top row, you can see the note names, which go from C to B. On the left column, you can see numbers that go from 0 to 4: these are octaves. An octave is the same note repeated at a different frequency, and each octave doubles the frequency of the note. In other words, if you look at the A column, starting from the highlighted A440, each step down the rows doubles the frequency.
So, A440 is the reference frequency for tuning: a lot of tuners use this frequency to tune instruments, and it is A4, the A in the fourth octave. A guitar has six strings, and each string is tuned to a note. The sixth string, for instance, is an E in the second octave, while the first string is an E in the fourth octave, and, as you notice, the frequencies are very different. Our goal is to detect the frequency that is being played and tell users which note they are playing so they can tune their guitar.
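The note-to-frequency mapping described above is the standard twelve-tone equal temperament relationship: each semitone multiplies the frequency by 2^(1/12), so twelve semitones (an octave) double it. Here is a minimal illustrative sketch in TypeScript; this is not the project's actual code, and the function names are mine:

```typescript
// Equal temperament: A4 is MIDI number 69 at the 440 Hz reference, and
// each semitone step multiplies the frequency by 2^(1/12).
function midiToFrequency(midi: number, reference = 440): number {
  return reference * Math.pow(2, (midi - 69) / 12);
}

const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Name and octave for a MIDI number (convention: MIDI 60 = C4).
function midiToName(midi: number): string {
  return NOTE_NAMES[midi % 12] + (Math.floor(midi / 12) - 1);
}

console.log(midiToName(69), midiToFrequency(69));            // A4 440
console.log(midiToName(40), midiToFrequency(40).toFixed(2)); // E2 82.41 (low E string)
```

Going up twelve MIDI numbers (one octave) doubles the result, matching the table from the talk.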
So, the guitar signal can be represented with a plot: in sound, a signal is represented by a wave. On the left, we have a clean signal, an A4 at 440 Hz; as you can see, it is periodic and very clear. On the right, you can see the guitar signal, which is not so clean: it can be very noisy, which makes it tricky to perform analysis on. This leads us to our algorithms.
We have a lot of algorithms to perform pitch estimation and pitch detection. These are three main algorithms that I've studied, and I'm going to show you every algorithm step by step. I will point out the pros, the cons, and which one I used to perform the actual pitch detection in the project. We will start with zero crossing. Zero crossing starts with this kind of plot, so the clear signal.
3. Pitch Detection Algorithms and Web APIs
Zero crossing is an algorithm that counts the number of times a signal crosses the zero line to determine frequency. Fast Fourier transform converts a signal from time domain to frequency domain, but can be challenging to interpret. Autocorrelation compares a signal with shifted versions to obtain a clearer plot. The project uses the MediaStream API and Audio API to access user media.
We are applying zero crossing to a clean signal. Zero crossing is an algorithm in which you count the number of times the signal crosses the zero line. Why? Because the frequency in Hz is the number of cycles per second, and in every cycle a clean periodic function crosses the zero line twice. So, if you take the number of crossings and divide it by two, you get the number of cycles, and if you divide the number of cycles by the elapsed time in seconds, you obtain the frequency in Hz. This method is pretty straightforward on a clean signal, but can it work on a guitar signal? Well, no. The answer is no because, as we've seen earlier, this kind of signal is very noisy and not always periodic, so it's difficult to apply this method.
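The counting step described above can be sketched in a few lines. This is an illustrative TypeScript version under my own naming, not the repo's code:

```typescript
// Zero crossing: count sign changes between consecutive samples. A periodic
// signal crosses zero twice per cycle, so frequency ≈ crossings / 2 / duration.
function zeroCrossingFrequency(samples: number[], sampleRate: number): number {
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    const prevNegative = samples[i - 1] < 0;
    const currNegative = samples[i] < 0;
    if (prevNegative !== currNegative) crossings++;
  }
  const durationSeconds = samples.length / sampleRate;
  return crossings / 2 / durationSeconds;
}

// One second of a clean 440 Hz sine at 44100 Hz estimates close to 440.
const sampleRate = 44100;
const sine = Array.from({ length: sampleRate }, (_, i) =>
  Math.sin((2 * Math.PI * 440 * i) / sampleRate));
console.log(zeroCrossingFrequency(sine, sampleRate)); // ≈ 440
```

On the noisy guitar waveform, spurious sign changes from noise inflate the count, which is exactly why the talk moves on to other methods.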
Let's see if another method comes to the rescue: the fast Fourier transform. The fast Fourier transform is an efficient way of applying the discrete Fourier transform to a signal: it converts the signal from its original domain, time, to the frequency domain. What do I mean by that? Let's make it clear. On the left, the x-axis is time and the y-axis is amplitude, the measure of the intensity of the signal. On the right, the signal is converted into a plot in which the x-axis is frequency and the y-axis is still amplitude, and, as you can see, there's a spike around 440. So, the A4 signal at 440 Hz is converted into a plot with a single clear spike, and you can easily read off the frequency. But can this work with a guitar signal? Well, it can, but it's very complicated to decide which spike to choose, because, as you can see here, you have a lot of spikes and it's not very clear.
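The "spike in the frequency domain" idea can be demonstrated with a naive O(N²) discrete Fourier transform; a real implementation would use an FFT, but the naive form makes the math visible. Illustrative TypeScript, not the project's code:

```typescript
// Naive DFT magnitude spectrum. mags[k] is the strength of the frequency
// corresponding to bin k; a pure tone produces a spike at one bin.
function dftMagnitudes(samples: number[]): number[] {
  const N = samples.length;
  const mags: number[] = [];
  for (let k = 0; k < N / 2; k++) { // only bins below the Nyquist frequency
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re += samples[n] * Math.cos(angle);
      im += samples[n] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

// The detected frequency is the peak bin times the bin width (sampleRate / N).
function dftPeakFrequency(samples: number[], sampleRate: number): number {
  const mags = dftMagnitudes(samples);
  let peak = 0;
  for (let k = 1; k < mags.length; k++) if (mags[k] > mags[peak]) peak = k;
  return (peak * sampleRate) / samples.length;
}

const sr = 8192;
const signal = Array.from({ length: 2048 }, (_, i) =>
  Math.sin((2 * Math.PI * 440 * i) / sr));
console.log(dftPeakFrequency(signal, sr)); // 440 (440 Hz falls exactly on bin 110)
```

With a guitar note, the spectrum contains spikes at the fundamental and at every harmonic, which is the "which spike do I choose?" problem the talk mentions.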
Let's move to the last algorithm, the one that I've used in the project: autocorrelation. Autocorrelation compares the signal with shifted versions of itself. What do I mean by that? In the top plot, you can see that the signal is shifted and repeated across time, and in the bottom plot you can see the function that is traced: when the shifted signal lines up with the original, the function has a value of 1; when it is completely opposite, it has a value of -1. So, as you keep shifting the signal forward, you obtain a plot that is easier to understand and very clear. We can say that this algorithm is used to clean up the signal and trace a plot that is more readable and more amenable to calculations.
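The shift-and-compare idea can be sketched as a normalized correlation at a given lag. This is an illustrative TypeScript version with names of my own choosing, not the repo's implementation:

```typescript
// Normalized autocorrelation at a single lag: multiply the signal by a copy
// of itself shifted by `lag` samples and normalize by the total energy.
// Values near 1 mean the shifted copy lines up (the lag is a whole period);
// values near -1 mean the shifted copy is inverted (the lag is a half period).
function autocorrelate(samples: number[], lag: number): number {
  let sum = 0;
  let energy = 0;
  for (let i = 0; i < samples.length - lag; i++) {
    sum += samples[i] * samples[i + lag];
  }
  for (let i = 0; i < samples.length; i++) {
    energy += samples[i] * samples[i];
  }
  return energy === 0 ? 0 : sum / energy;
}

// A 441 Hz sine at 44100 Hz has a period of exactly 100 samples.
const sr = 44100;
const tone = Array.from({ length: 4410 }, (_, i) =>
  Math.sin((2 * Math.PI * 441 * i) / sr));
console.log(autocorrelate(tone, 100).toFixed(2)); // ≈ 0.98 (one full period)
console.log(autocorrelate(tone, 50).toFixed(2));  // ≈ -0.99 (half a period, inverted)
```

Evaluating this over a range of lags traces exactly the 1-to-minus-1 plot the talk describes.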
Let's move on and see the web APIs involved. The main APIs that I've used are the MediaStream API and the Web Audio API. Let's take a look at the MediaStream API first: it allows you to access user media, that is, the user's microphone or camera.
4. Implementation with Audio API and File Structure
The project uses the Web Audio API, which provides interfaces for decoding, processing, and analyzing audio signals. The implementation is done with React and Next.js. BrowserAudio.ts is used to access user media, while Tuner orchestrates the logic between components. Pitch Detector is a utility library for performing frequency estimation, pitch detection, and autocorrelation.
You have a lot of settings to choose from, such as noise cancellation or auto gain control, but I've used only these three settings. The documentation lists plenty more, so you can go check it out, but this is what I needed.
Then, when you call this API, the user is prompted with a pop-up asking for access to the microphone. The user clicks yes, and you get access to the microphone.
Then, we have the Web Audio API, which exposes a lot of interfaces, one of which is AudioContext, an interface that allows you to decode and process the signal and perform calculations. On the audio context, I've used the createAnalyser method, which returns an analyser node that allows you to perform real-time time-domain and frequency analysis: you get access to the signal and you can analyze it.
As you can see in this image, you have the signal, and you insert this analyser node into the signal chain. You perform some calculations, some analysis, but the signal is unchanged: the node does not alter the signal. It is used only for analysis purposes, not for modification.
The actual implementation is made with React and Next.js. The file structure is the following. I have created three main files. BrowserAudio.ts is used to access user media, so all the stuff that we have seen earlier. Then we have Tuner, where all the logic between the components is orchestrated; this is the main component of the application. Then we have Pitch Detector, a utility library that is used to perform the calculations: frequency estimation, pitch detection, and autocorrelation.
Let's see BrowserAudio.ts. This class is very straightforward. It has two attributes, the AudioContext and the analyser, which are the interfaces we are going to use to process the user's microphone signal. Then we have a method that I call getMicStream: you use the MediaDevices API's getUserMedia to access user media, and it's done. The user gets prompted, and you get access to the signal.
Then, in Tuner, we create an instance of BrowserAudio.ts, and we have this buffer, a Float32Array, in which we are going to store all the data related to the signal, because the analyser gives us the signal in this form, converted to numbers. After that, we instantiate the AudioContext and the analyser in order to get access to these interfaces. Then we have a method called startTuner that is used to access the mic stream, via the method we've seen earlier; you get access to the mic stream and can perform all the calculations that you want. After that, I set the source, a state variable holding the audio source, so I get access to the signal, and setListening is only a state variable used for display purposes. Then, this effect is used to perform the continuous estimation: I have an interval that runs every millisecond and calls source.connect, connecting the analyser that we saw in the earlier image into the signal chain in order to perform estimation and time-domain analysis.
5. Pitch Estimation and Detection
Then, the audio signal is converted into numbers representing amplitude. Autocorrelation is performed to obtain a more readable plot. The frequency is then calculated, and the corresponding note is determined based on the frequency. The main utility class performs pitch estimation and detection using autocorrelated values.
Then, this interval goes on and keeps calling this function, so you connect the audio and start getting the signal, because getPitch is the method used to perform the frequency estimation. This is getPitch. You use the analyser's getFloatTimeDomainData method, which does what I said earlier: it copies the signal data as numbers into this buffer, a Float32Array, and, as you notice, every item of the array is a number representing the amplitude of the signal at that instant. So, the plot can be seen as this array: the array is just a sampled version of the waveform.
Then, after having this array, I can perform autocorrelation, because I need to turn this noisy signal into something more readable. So, I perform autocorrelation, and, as you can see, the array on the right, correlatedValues, is easier to understand, because all its numbers are between 1 and -1, and if you trace a plot with these numbers, you obtain the plot that we've seen earlier with the autocorrelation function. So, it's more readable. This function performs the autocorrelation; then you can calculate the frequency from that kind of plot. Once you have the frequency, you can tell which note it is, because, as we said earlier, every note is represented by a frequency: if you know the frequency, you know which note is being played. So, you set the note. I created this type, which has the name, the octave the note is in, the centsOff, and the frequency. centsOff is the number of cents the note is off from the reference frequency. For instance, if the reference frequency is 440 Hz and the user is playing 435 Hz, the note is 5 Hz flat, roughly 20 cents off. So, I tell the user that it's off, and I can point the indicator accordingly.
Then, this is the main utility class used to perform pitch estimation and pitch detection. We get the autocorrelated values, and the normalize function is used to bring all the values between 1 and -1. It performs max-absolute scaling, an algorithm that finds the maximum absolute value and divides every element by it; after this function, all the elements are between 1 and -1, which is more readable and easier to perform calculations on. Then we perform the autocorrelation with the autocorrelation-with-lag method, and, if you notice, there's a variable called rms, the root mean square. What is the root mean square? It's a measure of the magnitude, the loudness, of the signal. I use this variable to store the magnitude of the signal and filter it out if it is below a threshold, because I want to discard everything that isn't loud; it could even be me making noise on my desk or something like that. I want only the notes that are loud. The root mean square is calculated by taking the square root of the mean of the squared samples, as in this formula, and you obtain a value that represents the loudness of the signal. Then there is the main function, the autocorrelation function.
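The two helpers described above, max-absolute scaling and the RMS loudness gate, can be sketched like this. Again, illustrative TypeScript under my own names, not the repo's code; the 0.01 threshold is an assumed placeholder value:

```typescript
// Max-absolute scaling: divide every sample by the largest absolute value,
// so the whole buffer lands in the range [-1, 1].
function normalize(samples: number[]): number[] {
  const maxAbs = Math.max(...samples.map(Math.abs));
  return maxAbs === 0 ? samples.slice() : samples.map((s) => s / maxAbs);
}

// Root mean square: the square root of the mean of the squared samples,
// a measure of the signal's loudness.
function rms(samples: number[]): number {
  const meanSquare =
    samples.reduce((acc, s) => acc + s * s, 0) / samples.length;
  return Math.sqrt(meanSquare);
}

// Gate: ignore buffers quieter than a threshold (desk taps, room noise).
// The threshold value here is an assumption for illustration.
function isLoudEnough(samples: number[], threshold = 0.01): boolean {
  return rms(samples) > threshold;
}

console.log(normalize([2, -4])); // [0.5, -1]
```

Normalizing first means the RMS threshold behaves consistently regardless of the microphone's input level.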
6. Signal Conversion and Frequency Detection
The autocorrelation formula is translated directly into code. The frequency of the signal is detected by finding the highest peak in the autocorrelation function. The frequency is then used to obtain the note type, and the user is informed if they are off the reference frequency by a certain number of cents.
As you can see, it's the same as the mathematical formula: you shift the signal, multiply it by the original, and sum the products in a series. I only converted the mathematical function into code. You can check it out in the repo.
Once you have the cleaned-up, periodic function, you can detect the frequency. This method is the get-frequency one. I could have used, for instance, the zero crossing method, but I decided to go with another approach in which you find the highest peak in the function, in this case the red one. The lag of that peak is the period of the signal in samples, so you can easily calculate the frequency by dividing the sample rate by the index of the largest peak.
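The peak-to-frequency step can be sketched as follows; the lag bounds restrict the search to a plausible pitch range so that the trivially high correlation near lag 0 is skipped. This is an illustrative TypeScript sketch, not the project's get-frequency method:

```typescript
// Frequency from the autocorrelation peak: the lag (in samples) of the
// highest peak is the signal's period, so frequency = sampleRate / peakLag.
// minLag and maxLag bound the search, e.g. sampleRate/maxFreq..sampleRate/minFreq.
function frequencyFromPeak(
  correlated: number[],
  sampleRate: number,
  minLag: number,
  maxLag: number
): number {
  let peakLag = minLag;
  for (let lag = minLag; lag <= maxLag && lag < correlated.length; lag++) {
    if (correlated[lag] > correlated[peakLag]) peakLag = lag;
  }
  return sampleRate / peakLag;
}

// Synthetic autocorrelation result with its peak at lag 100:
const correlated = new Array(1000).fill(0);
correlated[100] = 1;
console.log(frequencyFromPeak(correlated, 44100, 50, 800)); // 441
```

A lag of 100 samples at 44100 Hz is a period of 100/44100 seconds, hence 441 Hz.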
Then you have this utility, getNoteFromFrequency: from the frequency, you obtain the note type that I showed you earlier, so you get the note name, the octave, the cents off, and the frequency. You can see here there are methods for getting the MIDI number from a pitch and the cents off from a pitch. I wrote these methods, but I did not come up with them, because they are standard in the industry. Then, using getNoteFromFrequency, you obtain an object which is the note: the name is A, the octave is 4, and the frequency is 435, about 20 cents flat. So, I tell the user how far off they are from the reference frequency.
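The standard frequency-to-note conversion the talk refers to can be sketched with the MIDI-number and cents formulas: midi = round(69 + 12·log₂(f/440)) and cents = 1200·log₂(f/f_nearest). Note that with these formulas, 435 Hz comes out about 20 cents flat of A4. Illustrative TypeScript, not the repo's code; the `Note` shape mirrors the type described in the talk:

```typescript
interface Note {
  name: string;
  octave: number;
  centsOff: number;
  frequency: number;
}

// Round to the nearest equal-temperament note, then measure the offset
// from that note in cents (100 cents per semitone).
function noteFromFrequency(frequency: number): Note {
  const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
  const midi = Math.round(69 + 12 * Math.log2(frequency / 440));
  const nearest = 440 * Math.pow(2, (midi - 69) / 12);
  return {
    name: NOTE_NAMES[midi % 12],
    octave: Math.floor(midi / 12) - 1, // MIDI 60 = C4 convention
    centsOff: Math.round(1200 * Math.log2(frequency / nearest)),
    frequency,
  };
}

const n = noteFromFrequency(435);
console.log(n.name + n.octave, n.centsOff); // A4 -20
```

A negative centsOff means the string is flat and should be tightened; positive means sharp.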
7. Improvements and Conclusion
I studied the topic extensively and provided the formula references and source code in the repo. After obtaining the note object, the user's tuning accuracy is determined. Data visualization is achieved using the react-gauge-chart library. The project has room for improvement, such as resetting the indicator when no note is detected, noise cancellation, and chord detection. Exploring other pitch detection algorithms like YIN is also an option. Feel free to check out the project, study pitch estimation, and connect with me on social media.
So, I studied this a lot, and I put the references and the sources of these formulas in the repo, in the code, for you. You can go check them out, read them, try to implement them, and come up with your own solutions. But this is standard.
In this case, the user is not in tune, because they are playing at 435 Hz while the reference frequency is 440 Hz. So, I tell the user that they are off, about 20 cents flat of the reference frequency.
Let's move on to the data visualization part and see what we can achieve. I wanted to do it in a simple way, because the main goal was the mathematical part, not the visualization part. So, I decided to go with a library called react-gauge-chart. It's very easy.
You give the component a percentage as a prop, and the indicator moves with that percentage. So, from the frequency I calculate the percentage for the indicator and pass it as a prop, and the needle moves up and down based on the frequency the user is playing. And here I also display the note name, the octave number, and the frequency, easily and without any library, because it's only data.
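One way to compute the percentage for the needle, assuming a gauge component that takes a value between 0 and 1 with the needle centered at 0.5 (the talk does not spell out this mapping, so this is my assumption), is to map the cents offset onto that range:

```typescript
// Map a cents offset in [-50, +50] (one semitone span around the note)
// onto a gauge percentage in [0, 1], centered at 0.5 when in tune.
// The ±50 cent span is an assumed choice for illustration.
function centsToPercent(centsOff: number): number {
  const clamped = Math.max(-50, Math.min(50, centsOff));
  return (clamped + 50) / 100;
}

console.log(centsToPercent(0));   // 0.5 (in tune, needle centered)
console.log(centsToPercent(-50)); // 0   (a full quarter tone flat)
```

Clamping keeps the needle pinned at the edges for notes that are wildly out of range instead of letting the prop go out of bounds.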
After that, we are finally coming to an end, because we're going to talk about the improvements that can be made to this project. As I said, it was a research project, but a lot of improvements can be made, and I want to make them while I'm still studying this topic. For instance, you can challenge yourself and reset the indicator when no note is detected, because currently, after a note is detected, the indicator does not return to its initial position.
That's very easy to fix, but if you want to go further, you can improve noise cancellation and signal pre-processing, filtering out noisy signals. Or, for instance, you can try to detect which chord the user is playing, not just the single note. That is much more difficult to do, but the science behind it is very interesting. You can even study other pitch detection algorithms, such as YIN, but I warn you, this is a rabbit hole, because there is a lot behind these kinds of algorithms. So, the choice is yours.
You can go check out the project, try to understand it, and study pitch estimation and frequency calculation. I tell you, it's a lot of fun to study these things, because I never thought there was so much behind a guitar tuner. This project was the output of long research, and it was a fun project, and I challenge you to do the same: check out the code and try to implement it. So, thank you for having me. I share the code here in this slide: you can scan the QR code to get access to the repo, which is open source. If you like it, add me on my socials; you can find me on Twitter, on LinkedIn, wherever you want, so we can get in touch and talk about pitch estimation, React, and other things related to computer science. It has been a pleasure, and I wish you a nice day.