Video Summary and Transcription
This talk explores JavaScript's role in distributed machine learning at scale, discussing the lack of tooling and the accessibility of machine learning deployments. It also covers cloud-based machine learning architecture, machine learning at the edge, and the use of HarperDB for simplified machine learning deployment. The concept of iterative AI and model training is also discussed.
1. Introduction to JavaScript ML
Hi, welcome to my talk for JS Nation entitled To the Edge and Back: JavaScript's Role in Distributed ML at Scale. I am a recovering developer, father of two daughters, based in Denver, Colorado. I work for HarperDB, a Distributed Application Platform built entirely in Node.js. Today, I will explore the JavaScript machine learning ecosystem, tactical architecture, and systems and methods for delivering performant access to machine learning and AI.
Hi, welcome to my talk for JS Nation entitled To the Edge and Back: JavaScript's Role in Distributed ML at Scale. My name's Jackson Repp. I am a recovering developer, father of two daughters. I'm based in Denver, Colorado. I've been a part of eight startups, so I've had two exits, five what I call opportunities for learning. And now I work for HarperDB, which is a Distributed Application Platform. We've been around six years and we've got a lot of production deployments and a fairly robust community.
So when I talk about HarperDB as the place I work, I think of more interest to JS Nation is the fact that we are, in fact, built entirely in Node.js. So we've leveraged the language you already love. And it was one of those things where we looked around and we could have chosen any language, but we realized there were tremendous benefits in terms of simplicity and availability of resources and deployment platforms. Where can JavaScript run? So we love to focus on the JavaScript community, and machine learning is obviously one of those things that has expanded dramatically in the very recent past. And how does that get done? What are the logistics behind it? And that's what I wanted to explore today.
So the syllabus for this course, I guess, would be understanding the JavaScript machine learning ecosystem. What are the resources we have available to us to build these amazing, cool technologies that function out maybe closer to the user, leveraging a language we all love. And then we have a section called tactical architecture, which is sort of how people do it now or how people did it in the past and where we think it's going over time. How do we continue to deliver performant access to machine learning and AI and these incredibly complex models when running them takes so much horsepower and you don't necessarily have all of the horsepower in the world sitting on your phone or perhaps, you know, in a browser. And finally systems and methods. So how can we approach this problem? What are the considerations we need to have in mind or keep in mind when we're planning a system that is truly distributed and iterative as I'll sort of outline what those architectures look like?
2. Machine Learning Tooling and Tactical Architecture
People become aware of machine learning and its potential applications. However, the lack of tooling requires developers to write low-level code to train models and build applications. With the right infrastructure, machine learning deployments become more accessible. ChatGPT has gained significant attention and offers a comprehensive and fast solution. JavaScript is a great choice for pushing machine learning to the edge, with libraries like TensorFlow.js and mobile platforms like CoreML and ML Kit. The hierarchical nature of accessing data suggests opportunities for cloud, near edge, far edge, and mobile deployments. The tactical architecture involves training, testing, and deploying models.
First, people become aware of it, right? They know that machine learning is a thing. They know that it can help them identify stuff in a photo or they know they can make recommendations using it. But the tooling isn't there. So you're out writing super low-level code to train a model, to build something that can act on user input and give you a recommendation or a classification or accomplish whatever that end goal might be.
And then the infrastructure gets built out behind that to support stuff that we are now capable of deploying because we have the tooling. And with that infrastructure, deployments become more available, which obviously you can roll out to a wider audience, and then it starts to get. So if you look at awareness, the number one thing that everybody's talking about is ChatGPT, to the point that the last three weeks of earnings calls have included mentions of AI and ChatGPT in products that didn't even seem like they would take advantage of them, because the stock price goes up, because everybody's so excited and aware. And ultimately, we want to deliver this product, this solution, this result. And it's simple, accessible, comprehensive and fast. And ChatGPT nailed all of those things. And it's tremendous if you've ever used it. You know that there's a wait usually to get in line, and commercial accounts are hard to come by and expensive, because it takes a tremendous amount of resources to do something as impressive as what ChatGPT does. Now, obviously it's also a little terrifying in terms of the scope of what it can do. It's a very large model that's been trained on lots of pieces of data, and not everybody needs to deploy a fully comprehensive human-speaking chat engine, but there are a million other applications for machine learning, especially at the edge, that can leverage a lot of the best practices that ChatGPT put in front of us in terms of accessibility.
We look at the tooling then that we have to continue to push this logic out to the edge, right? How do we get closer to those users? And JavaScript obviously, being on every client device and running just about everywhere, is a great choice for that. And while machine learning and machine learning models and AI have traditionally been, you know, on servers with lots of power, a la ChatGPT training a giant model, there are lots of libraries available. TensorFlow.js is the JavaScript cousin to kind of the king of machine learning platforms sponsored by Google. But you've also got lots of other platforms that are available to take data in, generate a model, and ultimately push that out and run it on the edge, as well as mobile platforms like CoreML and CreateML on iOS and ML Kit for Android. So there's lots of ways to push this out as far as you can. Now, again, you have horsepower that's required to ultimately create and use models, so it really depends where you're going to do it. Traditionally, we've done this in the cloud, right? We run a big server with lots of GPU, and we build big models. And then we set up infrastructure on the edge or in another cloud region to leverage that model, take requests from inbound clients, and to take their data and run it against the model and get some sort of a classification or resulting dataset out of it. But as we continue to look at just the hierarchical nature of, say, how we access data, there's probably an opportunity for bifurcation or trifurcation: a division of responsibilities across the cloud; the near edge, i.e. the servers that are just in regions closer to you; the far edge, i.e. AWS Local Zones or on-prem, things that are very, very close to you; and then finally, things you're actually carrying around with you, a mobile app or a browser on your phone or running on a laptop.
So there were lots of things that needed to be put in place, tooling included, so that we could actually deliver the results at a more local level. So when we look at a tactical architecture, again, the basics are: we want to train a model, we want to test it and validate that it works, and then we want to deploy it. We want to put that out there and have it actually start doing things for us.
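As a rough illustration of that train, test, deploy loop, here is a minimal sketch in plain JavaScript. It is not from the talk: the toy data, the nearest-centroid classifier, and the function names (`train`, `predict`, `evaluate`, `deployIfValid`) are all hypothetical, and "deploying" here only means serializing the model once it clears an accuracy bar.

```javascript
// Hypothetical toy data (an assumption): [temperatureC, humidity] -> product label.
const samples = [
  { features: [30, 0.2], label: "flip-flops" },
  { features: [28, 0.3], label: "flip-flops" },
  { features: [33, 0.4], label: "flip-flops" },
  { features: [2, 0.6], label: "jacket" },
  { features: [5, 0.7], label: "jacket" },
  { features: [-1, 0.5], label: "jacket" },
];

// Train: compute the mean feature vector (centroid) for each label.
function train(trainingSet) {
  const sums = {};
  for (const { features, label } of trainingSet) {
    if (!sums[label]) sums[label] = { total: features.map(() => 0), count: 0 };
    features.forEach((value, i) => { sums[label].total[i] += value; });
    sums[label].count += 1;
  }
  const centroids = {};
  for (const [label, { total, count }] of Object.entries(sums)) {
    centroids[label] = total.map((value) => value / count);
  }
  return { centroids };
}

// Predict: pick the label whose centroid is closest (squared Euclidean distance).
function predict(model, features) {
  let best = null;
  let bestDistance = Infinity;
  for (const [label, centroid] of Object.entries(model.centroids)) {
    const distance = centroid.reduce((acc, c, i) => acc + (c - features[i]) ** 2, 0);
    if (distance < bestDistance) {
      best = label;
      bestDistance = distance;
    }
  }
  return best;
}

// Test: fraction of held-out samples classified correctly.
function evaluate(model, testSet) {
  const correct = testSet.filter((s) => predict(model, s.features) === s.label).length;
  return correct / testSet.length;
}

// Deploy: hand back a serialized model only if it passes the accuracy bar.
function deployIfValid(model, testSet, minAccuracy) {
  const accuracy = evaluate(model, testSet);
  return { accuracy, artifact: accuracy >= minAccuracy ? JSON.stringify(model) : null };
}

// Hold out every third sample for testing; train on the rest.
const trainingSet = samples.filter((_, i) => i % 3 !== 2);
const testSet = samples.filter((_, i) => i % 3 === 2);
const model = train(trainingSet);
const release = deployIfValid(model, testSet, 0.9);
console.log(release.accuracy, release.artifact !== null);
```

In a real system the serialized artifact would be handed to whatever distribution layer ships models to servers or clients; that part is deliberately left out here.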
3. Cloud-Based ML Architecture
And if I look at a cloud-based, traditional, old-school architecture, I've got a data source, either static from a data lake or some giant database, or I've got streaming data that comes in from applications, from clients, from sensors, and then I have an ML pipeline where I am accomplishing all of the training and the testing. And then I have some sort of MLOps, which is a super hot keyword right now, and there's lots of tools. Kubeflow is one of them; it works very well with Kubernetes. And then that's the distribution out to the infrastructure that will then run those models, and I just show Kubernetes here because everybody knows it, and it's ubiquitous. So this is the architecture of a lot of machine learning applications.
4. Machine Learning at the Edge
And it's very similar to something along the lines of ChatGPT. It's all central cloud-based, lots of horsepower in the infrastructure to build those models and to run those models, but ultimately it's all in one place. The next iteration is to build a big model using the high horsepower infrastructure of the cloud, and then to push that model out and to retrain the model or augment that model with data that comes from more local or regional clients. So you take that model and you ship it out to the edge and then you retrain it or you augment it with local or regional data and that offers a more customized experience. You did most of the work then in the cloud on that high horsepower engine and now at the edge you can use more distinct resources, more discrete resources to retrain and still provide that result in a timely manner to clients. Hierarchical knowledge and hierarchical classification and recommendation is how our bodies work. We call this ensemble learning. There are refinements that need to be made at every level of this to ensure that that model is both relevant and that it's performant enough at that edge because the devices that are calling in, there may be hundreds of thousands of them or millions or billions of them and they're calling in and they want a recommendation, how do you have a model that's localized enough and performant enough to deliver those results out at the edge?
And it's very similar to something along the lines of ChatGPT. It's all central cloud-based, lots of horsepower in the infrastructure to build those models and to run those models, but ultimately it's all in one place. It's not much closer to the user than perhaps the closest infrastructure component.
So client devices may be waiting a while to reach out to get that workload performed. Perhaps in the case of, for example, ChatGPT, you're put in a queue and you have to wait your turn to interact with it, because there are limitations on what that cloud architecture can accomplish.
The next iteration of that then is to build a big model using the high horsepower infrastructure of the cloud, and then to push that model out and to retrain the model or augment that model with data that comes from more local or regional clients. So you may have, for example, a store, but think of a massive multinational corporation that sells lots of products at thousands of outlets around the world. They know a lot about the general purchasing behaviors of their audience. They know when it's cold, people buy jackets, when it's hot, people buy flip-flops. That is perhaps, one would hope, universal, except it's not, because there are certainly regions where having your feet exposed in a flip-flop is considered rude and so we don't sell that many flip-flops. So a machine learning model that was trained on the global data set perhaps wouldn't be the best source of recommendations for a population in an area where cultural or, you know, weather or lots of other factors are in play, but they don't have access to that when they're building that one central model.
So you take that model and you ship it out to the edge and then you retrain it or you augment it with local or regional data and that offers a more customized experience. You did most of the work then in the cloud on that high horsepower engine and now at the edge you can use more distinct resources, more discrete resources to retrain and still provide, you know, that result in a timely manner to clients. But we could build this out even further because, again, knowledge, much like the human brain, is a hierarchical process. The human brain will take in an image through your eyes. It will immediately try to classify the shape. I see an outline in the darkness if I'm in the jungle and the shape looks like it might be a tiger. All I see though is the profile of that tiger, the silhouette. What the human brain will do is it will look at that edge or the edges of that profile. It will rotate that profile around and see if it can classify it: have I ever seen that shape, that silhouette before? Then imagine sound as an input. Is there what I would consider to be a growl of a tiger? Is it moving in a way that I traditionally associate with a tiger? Is it getting closer to me? Then as I begin to do those classifications, the interesting thing about the human brain as it goes through this process is there's lots of levels where it's able to take in different sensory data, but any single one of them can trigger the chemical reaction that says run or reach for that gun against the tree or make a loud noise or kind of give up because it's over. Hierarchical knowledge and hierarchical classification and recommendation is how our bodies work. If this on the screen is sort of the edge-based, two-tier one, you can imagine that an iterative solution may be able to add layers and layers of knowledge on top of a model that may be generated centrally and then distributed out and continues to get more and more refined the closer it gets to the edge.
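To make the "train centrally, then retrain at the edge with regional data" step a little more concrete, here is a small sketch in plain JavaScript. Everything in it is an assumption for illustration: the single-feature logistic model, the starting weights standing in for a cloud-trained model, the regional samples, and the `fineTune` helper. A real deployment would more likely use a library such as TensorFlow.js; this only shows the shape of the idea.

```javascript
// Stand-in for a model trained in the cloud on global data (weights are made up):
// input is a normalized temperature in [0, 1], output is P(customer buys flip-flops).
const globalModel = { weights: [4], bias: -2 };

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

function predictProbability(model, features) {
  const z = features.reduce((acc, x, i) => acc + x * model.weights[i], model.bias);
  return sigmoid(z);
}

// Edge retraining: start from the global weights and run a few epochs of
// stochastic gradient descent on a small regional data set.
// Returns a new model; the global model is left untouched.
function fineTune(model, regionalData, { learningRate = 0.5, epochs = 50 } = {}) {
  const tuned = { weights: [...model.weights], bias: model.bias };
  for (let epoch = 0; epoch < epochs; epoch += 1) {
    for (const { features, label } of regionalData) {
      const error = predictProbability(tuned, features) - label; // log-loss gradient
      tuned.weights = tuned.weights.map((w, i) => w - learningRate * error * features[i]);
      tuned.bias -= learningRate * error;
    }
  }
  return tuned;
}

// Hypothetical regional observations: even when it is hot, flip-flops do not sell here.
const regionalData = [
  { features: [1.0], label: 0 },
  { features: [0.9], label: 0 },
  { features: [0.1], label: 0 },
];

const regionalModel = fineTune(globalModel, regionalData);
const hotDay = [1.0];
console.log("global:", predictProbability(globalModel, hotDay));
console.log("regional:", predictProbability(regionalModel, hotDay));
```

The expensive part (producing `globalModel`) stays in the cloud, while the edge only runs a short, cheap pass over local data, which is the division of labor the talk describes.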
We call this ensemble learning. When we talk about the flow of training a model in the cloud, it is more performant to, say, use a TensorFlow that's written in Python. That becomes more performant, but the further out you get and the lower the resources available and the closer to the edge and perhaps the more restrictive or sandboxed the environment is, you begin to see lots of places you can use JavaScript to continually refine those models and then ultimately deploy them, so JavaScript becomes an excellent tool as you get out across the far edge or on-prem or even running on clients, right? So I can train something using the camera on my phone, I can retrain that model, and now all of a sudden I can tell it the thing that you're looking at is in fact a tiger. So I can tell it that, and that model will exist on my phone, and anytime I see a silhouette like that it'll be classified as a tiger, and all of that will happen locally. However, you can see that this is a lot of moving parts. It looks like it's synchronous, or like it's effectively copy and paste, and, you know, when you're creating a PowerPoint presentation, you're absolutely copying and pasting, but there are refinements that need to be made at every level of this to ensure that that model is both relevant and that it's performant enough at that edge, because the devices that are calling in, there may be hundreds of thousands of them or millions or billions of them and they're calling in and they want a recommendation. How do you have a model that's localized enough and performant enough to deliver those results out at the edge? So when we look at the systems and methods by which we would implement a solution like that, there's a lot of considerations. So the ML tooling for JavaScript has some limitations in terms of what it can perform, and we're trying to design this iterative system to make as much use of the most appropriate client wherever we can.
So on a server you have to look at host resources and you have to look at the complexity, like how many factors am I considering when I'm putting in data that's going to make a recommendation or a classification? And then sheer data volume, right? How much data am I using to train my model and to test it? And do I have the capacity to store that out on the edge on a phone or in a browser or easily accessed? Or am I leveraging terabytes of data in the cloud and pushing it out to the edge? And then when I get out to the edge, what is my phone capable of? What is the browser capable of? What's that sandbox? What do those restrictions mean for me? And finally, it comes down ultimately to user experience. Is it fast enough? Is it good enough? Is the accuracy high enough? And as I built that model in the cloud with lots of resources, how interoperable is it out at the edge? There's, for example, TensorFlow.
5. Machine Learning Deployment and HarperDB
The models you generate using the Python version have to be run through what's called the TFJS converter. Consider what you're trying to accomplish and what can be accomplished at the edge. Complexity can be a challenge, especially when scaling up. HarperDB is an integrated machine learning platform that simplifies and reduces complexity. It combines a database, applications, and distribution logic. By leveraging HarperDB, you can handle training, distribution, and replication of models. Clients can access the data and models, and iterative AI allows for localized model training and deployment to client devices.
The models you generate using the Python version have to be run through what's called the TFJS converter. And there are some limitations to the structures of those models. So you need to consider, what are you trying to accomplish and what can you accomplish at the edge?
But then there's the other hierarchical nature of it. And we saw all those layers previously. And we talk about complexity sort of killing it. It's no good if it's super performant, but nobody in the world can maintain it. Because God forbid, I become successful, I need to scale it up. If you cannot get a hold on all those moving parts, and you might have 10, 15, a hundred moving parts in an application stack of micro front ends and microservices, APIs, and all that stuff, which is fine, but if you want to be in a hundred places so that you're close to all of your users, well, now I've got 10,000 moving parts to worry about. That's obviously no fun, and it raises the total cost of ownership, obviously, of maintaining a system like this.
So you consider data storage, volume on disk, the business logic, my training workflow, my MLOps, my distribution, the infrastructure, and the horsepower thereof, and ultimately, what's the load, what is the volume of client apps that are going to be calling in and trying to get access to this information. So at this point, I'll just mention that HarperDB, the company for whom I work, is an integrated machine learning platform. It does lots of things, machine learning is one of those things, and we built in all of the pieces we thought would be necessary to simplify and reduce that complexity so that we could be in 100 places, and you still only had to worry about 100 things. So we built a database with an applications tier, and distribution logic. So that's replication between nodes of HarperDB. So if you look at the database, that's the data store, right? And the application tier is where your training workflow lives, and your distribution or replication of these models can be handled by simply leveraging HarperDB's existing solution. So we are not obviously the only machine learning platform, but I'll use this one as an example. We look at all of the sources of data and pull that into a platform like HarperDB. And then you have modules that you can use to train and build those models. And then you have lots of clients that can access this directly. And ultimately, lots of data going in, lots of processing to generate models in real time. And then finally, the clients that can call in and gain access to that data. So a system of iterative AI on HarperDB combines all of the pieces we had in the previous graphic. And you can simply retrain models within the application tier and accept clients within that same application tier who are asking you questions. Those same models obviously can be deployed out to the actual client devices so that they can run at the edge as well.
But the interesting thing about a platform for iterative AI is that I can ask a question of a node that's very close to me. But perhaps that model is locally trained. It's been thinned down. It's optimized for a far edge platform or very low horsepower. Maybe it's running on the edge on a Raspberry Pi. If that node can't answer the question, I can forward that question further up the chain.
6. Iterative AI and Model Training
And I can flag questions as unknown and unresolved, and then ask the next, more powerful node up the chain, one with a more global dataset. If the question can be answered, the knowledge will come back and can be used as new training data. By training on top of previously unanswered questions, you can continue to improve the model.
And I can flag that question as unknown, unresolved. And I can ask the next most powerful thing with perhaps a more global data set that isn't as locally trained. And if it can answer it, great. That knowledge will come back through that original touch point. I can use that now answered but previously unanswered question as a new set of training data to say, if you see something that seems unanswerable, but it follows a paradigm like this, then perhaps this knowledge is worthwhile. And perhaps that might be the answer or an answer that's analogous to that. So you can train on top of that. And you can continue to go up the chain.
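The escalation pattern described here (ask the nearest node, forward unanswered questions up the chain, then feed the eventual answer back as training data) can be sketched roughly as follows. This is not HarperDB's API: the node objects, the lookup-table "models", the confidence threshold, and the fixed 0.9 confidence assigned on retraining are all hypothetical stand-ins chosen to keep the example small.

```javascript
// Hypothetical node: its "model" is just a lookup table of question -> { label, confidence }.
function createNode(name, knowledge) {
  return {
    name,
    knowledge: new Map(Object.entries(knowledge)),
    trainingQueue: [], // answers learned from nodes further up the chain
    answer(question) {
      return this.knowledge.get(question) || null;
    },
    // "Retraining" here simply folds queued answers into the lookup table.
    retrain() {
      for (const { question, label } of this.trainingQueue) {
        this.knowledge.set(question, { label, confidence: 0.9 }); // assumed confidence
      }
      this.trainingQueue = [];
    },
  };
}

// Ask each node in order (closest first). Nodes that cannot answer confidently are
// flagged as unresolved; once a higher tier answers, they receive the question and
// answer as new training data.
function ask(chain, question, minConfidence = 0.8) {
  const unresolvedAt = [];
  for (const node of chain) {
    const result = node.answer(question);
    if (result && result.confidence >= minConfidence) {
      for (const lower of unresolvedAt) {
        lower.trainingQueue.push({ question, label: result.label });
      }
      return { label: result.label, answeredBy: node.name };
    }
    unresolvedAt.push(node);
  }
  return { label: null, answeredBy: null }; // nobody in the chain could answer
}

const edge = createNode("edge", { "cat silhouette": { label: "cat", confidence: 0.95 } });
const regional = createNode("regional", { "tiger silhouette": { label: "tiger", confidence: 0.5 } });
const cloud = createNode("cloud", { "tiger silhouette": { label: "tiger", confidence: 0.97 } });
const chain = [edge, regional, cloud];

const first = ask(chain, "tiger silhouette"); // escalates all the way to the cloud
edge.retrain();
regional.retrain();
const second = ask(chain, "tiger silhouette"); // now answered locally
console.log(first.answeredBy, second.answeredBy);
```

The design choice worth noting is that the lower tiers only learn from questions they actually failed on, so each local model grows toward what its own clients ask rather than toward the full global data set.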