Come On Barbie, Let’s Go Party: Using AI for Music Mixing


As a DJ, I use many techniques to mix and create new sounds that get people’s hands in the air. In this talk I’ll describe AI algorithms based on neural networks that can break music down into its elements. I’ll cover how our brain differentiates between dozens of different sound signals when we listen to music. Can we instruct AI to do the same?

The cool part: live DJing on stage using AI algorithms.

This talk was presented at JSNation 2024. Check out the latest edition of this JavaScript conference.

FAQ

DJ mixing is the art of blending two or more tracks seamlessly to create a continuous flow of music. It involves curating the right playlist, reading the crowd, and determining the next track based on the energy on the dance floor.

Deep learning revolutionizes DJ mixing by allowing DJs to separate and manipulate different components of a track, such as vocals and instruments, in real-time. This technology uses neural networks to analyze and process sound signals, enabling more creative and precise mixing.

A data scientist at Wix builds machine learning pipelines for data scientists across the organization. They work on extracting and processing data to improve various aspects of Wix's services, including website building and user experience.

The DJ, Ziv Levy, mixes Dark 80s, synthwave, and techno sounds.

Common visual representations of sound include waveforms and spectrograms. Waveforms show how amplitude changes over time, while spectrograms show how the intensity at each frequency changes over time; with time, frequency, and intensity as its three dimensions, a spectrogram provides a 3D representation of sound.
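To make the waveform versus spectrogram distinction concrete, here is a minimal Python sketch that turns a list of samples into a magnitude spectrogram using a short-time Fourier transform. It uses only the standard library and a naive DFT, which is fine for a short toy signal but far too slow for real tracks (real code would use an FFT library). The 8 kHz sample rate, 64-sample frames, 32-sample hop, and Hann window are assumptions chosen for the example, not settings from the talk. Each row of the result is a time frame, each column a frequency bin, and each value an intensity.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window; tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Naive O(N^2) DFT; returns magnitudes of the non-negative frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal: list[float], frame_size: int = 64, hop: int = 32) -> list[list[float]]:
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    window = hann(frame_size)
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        rows.append(dft_magnitudes(frame))
    return rows


# Example: 8 kHz sample rate, a 500 Hz tone followed by a 1500 Hz tone.
SR = 8000
signal = [math.sin(2 * math.pi * 500 * t / SR) for t in range(512)]
signal += [math.sin(2 * math.pi * 1500 * t / SR) for t in range(512, 1024)]

spec = spectrogram(signal)
bin_hz = SR / 64  # width of one frequency bin: 125 Hz
first_peak = spec[0].index(max(spec[0])) * bin_hz
last_peak = spec[-1].index(max(spec[-1])) * bin_hz
print(len(spec), "frames x", len(spec[0]), "bins")
print("loudest frequency, first frame:", first_peak, "Hz; last frame:", last_peak, "Hz")
```

The waveform of this signal is just one long list of numbers; the spectrogram makes the change visible, with the loudest bin moving from 500 Hz in the early frames to 1500 Hz in the late ones.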

The three main levels of audio feature abstraction are: 1) Low-level features, such as the amplitude envelope and zero crossing rate, which are numerical data for machines to process. 2) Medium-level features, such as pitch, beats, and notes, which are more perceptual for humans. 3) High-level features, such as tempo, lyrics, melody, and rhythm, which are what general listeners enjoy.
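Two of the low-level features named here can be computed directly from the samples. The following is a minimal standard-library Python sketch, assuming frames of 256 samples with a hop of 128 (illustrative values, not from the talk): the amplitude envelope takes the peak absolute value in each frame, and the zero crossing rate counts how often the signal changes sign. The fading 440 Hz tone is a made-up test signal.

```python
import math


def split_frames(signal: list[float], frame_size: int, hop: int) -> list[list[float]]:
    """Cut the signal into overlapping frames of frame_size samples, hop samples apart."""
    return [signal[i:i + frame_size] for i in range(0, len(signal) - frame_size + 1, hop)]


def amplitude_envelope(signal: list[float], frame_size: int = 256, hop: int = 128) -> list[float]:
    """Peak absolute amplitude in each frame."""
    return [max(abs(s) for s in frame) for frame in split_frames(signal, frame_size, hop)]


def zero_crossing_rate(signal: list[float], frame_size: int = 256, hop: int = 128) -> list[float]:
    """Fraction of adjacent sample pairs in each frame whose sign differs."""
    rates = []
    for frame in split_frames(signal, frame_size, hop):
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        rates.append(crossings / (len(frame) - 1))
    return rates


# Example: one second at 8 kHz of a 440 Hz tone that slowly fades out.
SR = 8000
note = [math.exp(-t / 4000) * math.sin(2 * math.pi * 440 * t / SR + 0.3) for t in range(SR)]

env = amplitude_envelope(note)
zcr = zero_crossing_rate(note)
print(f"envelope: {env[0]:.2f} -> {env[-1]:.2f}")
print(f"mean ZCR: {sum(zcr) / len(zcr):.3f} (a pure tone crosses zero about 2 * 440 / 8000 = 0.11 times per sample)")
```

Both outputs are plain numbers per frame: useful input for a machine, but not something a listener perceives directly, which is what places them at the low level of the hierarchy.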

Deep learning helps with music source separation by using neural networks to process raw audio or spectrograms and extract different components, such as vocals and instruments. Convolutional neural networks (CNNs) identify local patterns, while recurrent neural networks (RNNs) capture dependencies between different parts of the track over time. The result is a mask that, applied to the original track's spectrogram, separates the desired audio component.
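The masking step at the end of this pipeline can be sketched without any neural network. In the minimal Python example below (standard library only), the mask is an "oracle" ratio mask computed from stems we already know, which is roughly what such a network is trained to predict from the mixture alone; the CNN and RNN themselves are not shown. The toy 1500 Hz "vocal" and 500 Hz "instrument", the frame settings, and the helper names are assumptions for illustration. Real vocals and instruments overlap heavily in frequency, which is exactly why the mask has to be learned.

```python
import cmath
import math


def stft(signal: list[float], frame_size: int = 64, hop: int = 32) -> list[list[complex]]:
    """Complex short-time DFT with a periodic Hann window (naive, illustration only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        frames.append([
            sum(x * cmath.exp(-2j * math.pi * k * t / frame_size) for t, x in enumerate(chunk))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def ratio_mask(target: list[list[complex]], rest: list[list[complex]], eps: float = 1e-9) -> list[list[float]]:
    """Soft mask in [0, 1]: the share of each time-frequency cell's magnitude owned by target."""
    return [
        [abs(a) / (abs(a) + abs(b) + eps) for a, b in zip(row_t, row_r)]
        for row_t, row_r in zip(target, rest)
    ]


def apply_mask(mixture: list[list[complex]], mask: list[list[float]]) -> list[list[complex]]:
    """Scale each cell of the mixture spectrogram by the mask (the mixture's phase is kept)."""
    return [[m * x for m, x in zip(row_m, row_x)] for row_m, row_x in zip(mask, mixture)]


# Toy "stems": a 1500 Hz "vocal" and a 500 Hz "instrument" at 8 kHz.
SR = 8000
vocals = [0.8 * math.sin(2 * math.pi * 1500 * t / SR) for t in range(1024)]
backing = [math.sin(2 * math.pi * 500 * t / SR) for t in range(1024)]
mixture = [v + b for v, b in zip(vocals, backing)]

V, B, X = stft(vocals), stft(backing), stft(mixture)
mask = ratio_mask(V, B)  # oracle mask from the known stems; a network would predict it from X
vocals_est = apply_mask(X, mask)

error = sum(abs(e - v) ** 2 for row_e, row_v in zip(vocals_est, V) for e, v in zip(row_e, row_v))
reference = sum(abs(v) ** 2 for row_v in V for v in row_v)
print(f"relative error of the separated vocals: {error / reference:.1e}")
```

In a full system, the masked spectrogram would then be turned back into audio with an inverse STFT, reusing the mixture's phase as this sketch does.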

Applications of music source separation technology include karaoke, music transcription, lyrics extraction, and beat matching. This technology allows for high-quality separation of vocals and instruments, making it easier to create karaoke tracks, transcribe music, and enhance DJ mixes.

The DJ was inspired to explore music source separation using neural networks after receiving an email about a new technology in his DJ software that could separate track sources. Initially dismissive, he later revisited the feature to help a friend separate vocals from an old track, leading to his amazement at the technology's capabilities.

The DJ used open-source tools, datasets, and Python code to experiment with music source separation. He followed step-by-step guides, read articles, and trained models himself to understand and utilize the technology.

Ziv Levy
27 min
13 Jun, 2024


Video Summary and Transcription

Today, we explore DJ mixing and how deep learning revolutionizes the art by discussing sound processing, extracting features, and using machine learning. Deep learning allows for efficient extraction of audio features and high-resolution track separation. Neural networks can achieve source separation by converting audio to spectrograms and applying convolutional and recurrent neural networks. This has an immediate impact on industries such as karaoke and music transcription.

1. Introduction to DJ Mixing and Deep Learning

Short description:

Today, we're going to explore DJ mixing and how deep learning revolutionizes the art. I'm a data scientist at Wix and a DJ. DJing is more than curating playlists; it's about reading the crowd. Sometimes, when I try to blend in a song that sounds perfect in my headphones, it crashes on the dance floor. Let me show you an example. We'll discuss sound processing, extracting features, and using machine learning. And then, we'll dive into the revolutionary deep learning approach.

Today, we're going to explore and dig into the art of DJ mixing. I'm going to talk about it from my perspective as a DJ, and we're also going to talk about how deep learning brings a whole new revolution to this art of mixing and, more generally, what can be done with sound signals and neural networks.

So, again, a bit more about myself. I've been working at Wix for the past seven years now, in the data science group. In my day job, I build machine learning pipelines for data scientists across the organization. For those of you who are not familiar with Wix, Wix is a website-building platform. And, again, I'm also a DJ. I mix Dark 80s, synthwave, and techno sounds, and this aspect of my life as a DJ is what we're going to talk about today.

And I don't need to tell you that being a DJ is not only about curating the right playlist; it's also the ability to read the crowd and decide which track comes next according to the energy on the dance floor. The problem is that sometimes I hear something in my headphones that fits the dance floor perfectly, and when I try to blend it in, it crashes. Let me show you how I crash a mix, and how awful it sounds. So I picked these two songs. One of them is by Adele. You're familiar with this song, right? And the next one is, oh, not this one. The next one is this one. Also familiar. By the way, everything I do here, I'm doing live. So if there are some glitches or I mess something up, just excuse me. Okay?

So in my head, these songs match perfectly. But let me try to play it, and let's skip to the highlight of the Adele song. I'll try to mix the song in exactly at its highest point. Okay. As you heard, it's a lot of noise. This is where some of you would probably make faces, like, hmm, what? What's wrong with this DJ? Fortunately for me, you'd be surprised what a very drunk crowd can get past. But for me, it's devastating. It really ruins the moment, the energy is unbalanced, I need to recover from it, and it's very stressful. But again, in my head, it was perfect. So what was it? What we're going to talk about today is what sound is, how we process audio with computers, how we pull features out of that audio, and how we use them in machine learning. Okay? And then we're going to talk about the deep learning approach, which is pretty much revolutionary.

2. Exploring Source Separation and Sound Modeling

Short description:

It all started with an email about a unique technology for separating track sources. I didn't pay much attention until a friend asked for help separating vocals. I rediscovered the tool in my DJ software and was amazed by its real-time capabilities. Intrigued, I delved into music source separation using neural networks. Sampling measures amplitude levels, resulting in a waveform that holds information about frequency, intensity, and timbre. Unlike our brains, computers struggle to distinguish between the overtones of different instruments.

And as we speak, things are really happening right now. So it all started a couple of years ago, when I got an email with the release notes of the DJ software I'm using. They said something like: dear DJs, we can now provide you with a unique technology that allows you to separate the sources of your tracks and, by that, be creative and do something with them. At first I thought to myself, well, it's not so interesting; it has probably been solved already. Also, it was the post-Covid era, and there were still restrictions on crowds and everything, so I really didn't pay attention to it.

And recently, a friend of mine came to me and said, I want your help separating the vocals out of a track I have. It's a very old track, and there are no studio versions or anything. What can I do? Now, I have my equalizer here, and to some extent I can use it to cut some sounds or boost others, but that doesn't really create a karaoke version; it doesn't peel the layers apart. But suddenly I remembered that I have this tool in my DJ software. I read the step-by-step guide on what I need to configure, clicked a few buttons, and boom, I had it. It was nice, and she was happy, but then I played with it on another song, and another song, and it wasn't just nice. I was amazed by it, and everything was happening in real time.
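The equalizer remark can be illustrated with a small experiment. An EQ applies a gain to frequency bands of the whole mix, so it cannot tell which instrument a given frequency belongs to. The standard-library Python sketch below assumes a hypothetical "voice" and "guitar", each made of two sine partials, that both have energy at 1000 Hz; the frequencies, amplitudes, and the 900 to 1100 Hz cut are invented for illustration, and zeroing DFT bins is a simplification of how a real DJ EQ filters audio.

```python
import cmath
import math

SR = 8000  # sample rate in Hz (illustrative)
N = 256    # analysis length; one DFT bin spans SR / N = 31.25 Hz


def spectrum(x: list[float]) -> list[complex]:
    """Naive DFT of the whole signal, non-negative frequency bins only."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / N) for t, v in enumerate(x))
        for k in range(N // 2 + 1)
    ]


def note(partials: list[tuple[float, float]]) -> list[float]:
    """Sum of sine partials given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * t / SR) for f, a in partials) for t in range(N)]


def eq_cut(spec: list[complex], low_hz: float, high_hz: float, gain: float = 0.0) -> list[complex]:
    """Apply one gain to every bin between low_hz and high_hz, whatever source it came from."""
    return [c * gain if low_hz <= k * SR / N <= high_hz else c for k, c in enumerate(spec)]


def energy(spec: list[complex]) -> float:
    """Total energy of a spectrum (sum of squared magnitudes)."""
    return sum(abs(c) ** 2 for c in spec)


# A hypothetical "voice" and "guitar" that both have a partial at 1000 Hz.
voice = spectrum(note([(1000, 1.0), (2000, 0.5)]))
guitar = spectrum(note([(500, 1.0), (1000, 0.7)]))

# EQ is linear, so cutting the mix affects each source exactly as cutting it alone would.
voice_kept = energy(eq_cut(voice, 900, 1100)) / energy(voice)
guitar_kept = energy(eq_cut(guitar, 900, 1100)) / energy(guitar)
print(f"voice energy kept: {voice_kept:.0%}, guitar energy kept: {guitar_kept:.0%}")
```

Cutting the band that holds the voice's 1000 Hz partial still leaves a fifth of the voice's energy and also removes about a third of the guitar's, which is why an equalizer alone cannot produce a clean karaoke version.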

And this was not in the release notes, by the way, or maybe it was, but I didn't read the whole thing. Anyway, I was amazed, and this really triggered the engineering part of my brain. So what do I do when I want to know how things work? I go to Google. I looked up music source separation using neural networks, downloaded an article and read it, then another article, downloaded the dataset and the Python code, trained the model myself, and then tested it on one track after another. I was blown away by this technology. And after a few hours of playing with it, this is what I looked like. A whole new world opened up to me.

So, the first thing is how we model sound, okay? What is sound? Sound, eventually, is changes in air pressure caused by the vibrations of air molecules. Our ears are sensitive to those vibrations, and eventually this is what our brain perceives as sound. Computers do something similar, called sampling. I'm not going to dig into this technique because of time constraints, but the computer measures the amplitude of those vibrations at regular intervals. Eventually, what we get is a waveform, which is the most common visual representation of sound, but this waveform actually holds several kinds of information about the sound. The first is the frequency, okay? If we zoom in, we can read the frequency of the sound. The second is the intensity of the sound. Intensity is related to the square of the amplitude, so we look at the squared waveform and at how large its peaks are between the minimum and maximum points. And then we have something very important, which is the timbre of the sound. Timbre is also known as tone quality or tone color. It's not quality in the sense of how clearly I hear the sound; it's the quality that comes from the overtones of different instruments layered on top of each other. For example, if I play a C chord on a guitar while someone else plays a C chord on the piano, I want to be able to distinguish between those instruments, and this is something very hard for computers to do. And if you think about it, our brain can do it pretty much instantly.
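The three properties described here can be sketched on sampled data. The minimal standard-library Python example below samples two hypothetical instruments playing the same 250 Hz note at an assumed 8 kHz sample rate, computes the mean squared amplitude as an intensity measure, and reads off the amplitude of each harmonic. The note, sample rate, and overtone amplitudes are invented for illustration. This does not solve the hard problem the speaker describes, telling instruments apart while they play together; it only shows that two notes with the same fundamental frequency differ in their overtone amplitudes, which is where timbre lives.

```python
import cmath
import math

SR = 8000  # samples per second: how often the amplitude is measured (assumed)


def sample_note(fundamental: float, overtones: list[float], seconds: float = 0.1) -> list[float]:
    """Sample a note; overtones[h] is the amplitude of harmonic h + 1 (h = 0 is the fundamental)."""
    n = round(SR * seconds)
    return [
        sum(a * math.sin(2 * math.pi * fundamental * (h + 1) * t / SR) for h, a in enumerate(overtones))
        for t in range(n)
    ]


def mean_square(x: list[float]) -> float:
    """Average squared amplitude; sound intensity is proportional to amplitude squared."""
    return sum(v * v for v in x) / len(x)


def harmonic_profile(x: list[float], fundamental: float, count: int = 4) -> list[float]:
    """Amplitude of each harmonic via a single-frequency DFT at h * fundamental.

    Exact here because 0.1 s holds a whole number of cycles of every harmonic.
    """
    n = len(x)
    profile = []
    for h in range(1, count + 1):
        c = sum(v * cmath.exp(-2j * math.pi * fundamental * h * t / SR) for t, v in enumerate(x))
        profile.append(2 * abs(c) / n)
    return profile


# Two hypothetical instruments playing the same 250 Hz note with different overtones.
bright = sample_note(250, [1.0, 0.6, 0.4, 0.3])
mellow = sample_note(250, [1.0, 0.1, 0.05, 0.0])

for name, x in [("bright", bright), ("mellow", mellow)]:
    profile = [round(a, 3) for a in harmonic_profile(x, 250)]
    print(f"{name}: {len(x)} samples, mean square {mean_square(x):.4f}, harmonics {profile}")
```

Here the two notes are analyzed separately; once they are summed into one waveform, their harmonics land on the same frequencies, which hints at why separating them is much harder for a computer.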
