Motion Control With Multimodal LLMs

What if you could use multimodal LLMs to interact with websites or IoT devices using motion control?

As advancements in multimodal AI offer new opportunities to push the boundaries of what can be done with this technology, I started wondering how it could be leveraged from the perspective of human-computer interaction.

In this talk, I will take you through my research experimenting with building motion-controlled prototypes using LLMs in JavaScript.

This talk was presented at JSNation 2025. Check out the latest edition of this JavaScript conference.

FAQ

The talk was about motion control with multimodal AI.

The speaker is a Senior Research Engineer at CrowdStrike, known as DevDevCharlie online, with a background in machine learning, particularly using TensorFlow.js.

The speaker enjoys diving, running, hiking, playing drums, learning German, and has also obtained a radio license.

PoseNet and MoveNet are models used for pose detection, providing key points of a person's body, which can be used for applications like motion-based games.

The speaker uses hand gestures to interact with interfaces, such as controlling lights or playing games, by detecting key points on the hands with TensorFlow.js hand-pose models.

Gemini is used for gesture recognition and function calling in the speaker's experiments, allowing for the control of devices like lights through hand gestures (see the sketch after this FAQ).

The speaker faces challenges in accurately detecting right from left gestures and ensuring the AI correctly interprets the intended commands.

The speaker is interested in exploring how LLMs can be used for motion control experiences, aiming to create more intuitive interactions with technology using gestures.

The speaker envisions using motion control technology for home automation, where AI learns user behaviors and automates tasks through gestures and acoustic activity recognition.

The speaker uses tools like TensorFlow.js for key point detection and DataStax for vector databases, along with Gemini for multimodal AI experiments.
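
To make the function-calling point above concrete, here is a minimal sketch of wiring a Gemini tool call to a light control. It assumes the @google/generative-ai Node SDK; the setLightState helper and the model name are hypothetical stand-ins for illustration, not the speaker's actual code.

```js
import { GoogleGenerativeAI } from '@google/generative-ai';

// Hypothetical device helper -- stands in for whatever actually drives the light.
async function setLightState({ on }) {
  console.log(`Light turned ${on ? 'on' : 'off'}`);
}

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash', // assumed model name
  tools: [{
    functionDeclarations: [{
      name: 'setLightState',
      description: 'Turn the light on or off',
      parameters: {
        type: 'OBJECT',
        properties: {
          on: { type: 'BOOLEAN', description: 'true to turn the light on' },
        },
        required: ['on'],
      },
    }],
  }],
});

// Describe the detected gesture (or pass a webcam frame as an image part)
// and let the model decide whether to call the function.
const result = await model.generateContent(
  'The user just raised an open palm toward the camera. Should the light change?'
);

for (const call of result.response.functionCalls() ?? []) {
  if (call.name === 'setLightState') await setLightState(call.args);
}
```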

Charlie Gerard
39 min
12 Jun, 2025

Video Summary and Transcription
The Talk delves into motion control with multimodal AI, exploring TensorFlow.js models for gesture recognition and enhancing user interactions. It discusses leveraging LLMs for gesture-based interaction, investigating Gemini for gesture recognition, and controlling light states with Gemini functions. The conversation includes webcam-based gesture recognition, custom gesture databases, and the future of personalized AI assistance with acoustic recognition.

1. Introduction to Motion Control with Multimodal AI

Short description:

Welcome to a talk on motion control with multimodal AI. The speaker is a senior research engineer at CrowdStrike, known online as DevDevCharlie. With a background in machine learning, particularly TensorFlow.js, the speaker now focuses on recent advancements in multimodal AI. A self-proclaimed creative technologist, the speaker explores the possibilities of JavaScript and the web platform.

Thank you. And welcome to my talk about motion control with multimodal AI. I gave a longer version of this talk recently at another conference, and a colleague of mine watched the recording and she was like, oh, it's like a magic show. So hopefully, if everything works well, it will feel maybe like a magic show, but then you'll also understand how it actually is all built. So I was briefly introduced. I'm going to go quickly over this. So yes, I'm a senior research engineer at CrowdStrike. I go by DevDevCharlie online usually. I'm an author and master's instructor. I've been doing things with machine learning on the web for about eight years now, primarily using TensorFlow.js, and this talk is going to move on to the more recent advancements with multimodal AI. Overall, I guess I'm a self-proclaimed creative technologist, so I like to push the boundaries of what can be done with JavaScript and the web platform, and sometimes try to make tools do what they weren't necessarily built to do. And outside of tech, I've been spending a big part of the year trying to have hobbies that are non-tech related. It includes diving, running, hiking, playing drums, learning German, and I also got my radio license earlier this year. It's a very niche hobby, so I don't know if anybody here knows what it is, but in case you do, my call sign is KO6HPR, if one day you hear me on the radio.

2. Exploring TensorFlow.js Models for Motion Control

Short description:

Discussing previous experiments with TensorFlow.js models like PoseNet and MoveNet for pose detection. Exploring the use of key points data and building interactive experiences with motion control. Augmenting tools with motion detection for enhanced user interactions.

But let's start by talking about previous experiments. When I introduced myself, I talked about TensorFlow.js, and I want to cover a little bit of what can be done with that tool, so you'll understand why I'm now also experimenting with multimodal AI. There are a few different models that you can use with TensorFlow.js, and one of them is for pose detection: it's called PoseNet, or MoveNet for a second one. Usually, you get key points. So forget about the green lines; it's the red dots. Depending on the model, you get a different number of key points, and these key points are raw data: x and y coordinates relative to the screen.
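
As an illustration of what that raw key-point data looks like, here is a minimal sketch using the @tensorflow-models/pose-detection package with MoveNet; the webcam setup for the video element is assumed and omitted.

```js
import '@tensorflow/tfjs-backend-webgl';
import * as poseDetection from '@tensorflow-models/pose-detection';

// The video element is assumed to already be streaming the webcam.
const videoElement = document.querySelector('video');

const detector = await poseDetection.createDetector(
  poseDetection.SupportedModels.MoveNet
);

const poses = await detector.estimatePoses(videoElement);

// Each key point is raw data: x/y coordinates relative to the input, plus a score.
for (const { name, x, y, score } of poses[0]?.keypoints ?? []) {
  console.log(`${name}: (${Math.round(x)}, ${Math.round(y)}) score=${score.toFixed(2)}`);
}
```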

And with this data that you get, you can build something like this. So this is a clone of Red Light, Green Light, the game from Squid Game, so if you have not watched the series, basically, you have a doll, and when it looks at you, you're supposed to not move, and when the head is turned, you're supposed to, in this case, run as close to the screen as you can. Otherwise, you die. Basically, if you move when the doll is looking at you, you die. And I wanted to recreate something using PoseNet, and then you start thinking, well, how do I actually code, like, the fact that I'm not moving or moving?
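
One hedged answer to that question, not necessarily the speaker's implementation: compare each key point's position between frames and treat a total displacement above some threshold as movement.

```js
// A minimal movement check, assuming `keypoints` comes from a pose-detection
// model as in the sketch above. The threshold is illustrative, not the talk's value.
let previousKeypoints = null;

function hasMoved(keypoints, threshold = 20) {
  if (!previousKeypoints) {
    previousKeypoints = keypoints;
    return false;
  }
  // Sum how far each key point travelled since the last frame.
  const totalDistance = keypoints.reduce((sum, kp, i) => {
    const prev = previousKeypoints[i];
    return sum + Math.hypot(kp.x - prev.x, kp.y - prev.y);
  }, 0);
  previousKeypoints = keypoints;
  return totalDistance > threshold;
}
```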

Another model that you can use is one that is specifically around key points on the hands. So right hand and left hand, and here I think you have about 21 key points, and you can build something like this. So I started thinking, well, what if you could augment the tools that you already use but add some kind of motion detection to it as well? So that was interacting with Figma, so it's not necessarily that all of a sudden you will build entire interfaces just with your fingers, but what if you just were augmenting the type of things that you can do?
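
For the hand case, a minimal sketch along the same lines, assuming the @tensorflow-models/hand-pose-detection package (which returns 21 named key points per hand plus handedness); again, the video element setup is assumed.

```js
import '@tensorflow/tfjs-backend-webgl';
import * as handPoseDetection from '@tensorflow-models/hand-pose-detection';

const videoElement = document.querySelector('video');

const detector = await handPoseDetection.createDetector(
  handPoseDetection.SupportedModels.MediaPipeHands,
  { runtime: 'tfjs' }
);

const hands = await detector.estimateHands(videoElement);

for (const hand of hands) {
  // hand.handedness is 'Left' or 'Right'; hand.keypoints holds 21 named points.
  const indexTip = hand.keypoints.find((kp) => kp.name === 'index_finger_tip');
  console.log(hand.handedness, indexTip);
}
```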
