Video Summary and Transcription
Maya Chavin, a senior software engineer at Microsoft, discusses generative AI and the core models behind LLMs. The flow of a document Q&A service and the importance of prompts in enhancing it are explored. The ingestion and querying phases of document Q&A are explained, emphasizing the need for efficient storage, indexing, and computing relevant prompts. The talk also covers the use of embedding models, optimization strategies, and the challenges of testing and validating AI results. Creative uses of LLMs and the impact of AI on job security are mentioned.
1. Introduction to Generative AI and LLMs
Hi, everyone. I'm Maya Chavin, a senior software engineer at Microsoft. Today's talk is about generative AI and the core models behind LLMs. We'll discuss the flow of a document Q&A service and how to enhance it using prompts. An LLM is a large language model that can process human input and is trained on its own data. It works with tokens. A token is a piece of a word; text has to be translated into tokens for the model to understand it. To count tokens, we can use a token counter application.
Hi, everyone. Have you had your lunch? Are you awake or sleepy? Okay, because I don't have real coffee here, I hope you already had your coffee. If not, I'm sorry, but this is going to be the most boring talk of your life. No, I really hope not. But anyway, first and foremost, my name is Maya Chavin. I'm a senior software engineer at Microsoft, working in a team called Microsoft Industrial AI, where we leverage different AI technologies to build AI-integrated solutions and applications for specific industries.
Sorry, I lost my voice during the flight, so I don't know what happened. If it's hard for you to understand me, I'm really sorry, and if you want to understand me better, please feel free to contact me after the talk, okay? Like the introduction said, I've been working with web and JavaScript and TypeScript, but today's talk has nothing to do with TypeScript or JavaScript. It's about AI. And first and foremost, how many people here are working with AI or generative AI? Okay, so we can skip this slide.
Now, anyway, for people who don't know about generative AI, or maybe know the term but never had a chance to experience it: generative AI is AI that can generate text and media from a variety of input data, which we call prompts, basically text or anything (now we can also send it an image to analyze), and it also learns from its system data. That's what our talk is based on. We will talk about the core models that LLMs, or generative AI, use. The talk will also focus on how we're going to use those models to define the core flow of a very simple service, document Q&A, which you can find a hundred times when you Google for document Q&A using AI. In this talk, we will learn a bit more about the flow behind it, what kind of service we can use for each component inside the flow, and finally how we can enhance and expand the service using prompts, plus techniques to pay attention to when we develop document Q&A as a generic service. Okay.
But first and foremost, LLMs. How many people here are working with an LLM, any LLM? Which LLM do you use? GPT? Text embeddings? DALL-E? Raise your hand. Come on, I believe you already had coffee, right? Anyway, just a recap: an LLM, as a service, is a large language model that is able to process human input. It is also capable of training on its own data, whether supervised or unsupervised, and it works with tokens. The nice thing about an LLM is that it provides you a set of APIs as a black box, which helps developers build AI applications more straightforwardly and simply than before. Okay. Some of the LLM providers we can see here: OpenAI, Google, Microsoft, Meta, Anthropic, Hugging Face. Nothing new here.
So we said LLMs work with tokens, right? What exactly is a token? To put it simply, a token is just a piece of a word, which means every single word in a sentence has to be translated into tokens. And to count the tokens, we have a calculator we can use, called a token counter, which is right here. It's an application where you can write your text, and it will calculate how many tokens it will cost you to pass that string to the AI.
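Exact counts require the model's own tokenizer (for OpenAI models, the tiktoken library), but for quick budgeting you can sketch an estimate with a character-based heuristic. The ~4 characters per token figure is a rough rule of thumb for typical English text, not an exact rule:

```python
# Rough token estimate. OpenAI's docs suggest that typical English
# text averages roughly 4 characters per token; this heuristic is
# only for quick cost budgeting, not for exact billing.

def estimate_tokens(text: str) -> int:
    """Approximate how many tokens a string will consume."""
    return max(1, round(len(text) / 4))

prompt = "Answer the question using only the document below."
print(estimate_tokens(prompt))  # a ballpark figure, ~12 tokens
```

For precise counts (and therefore precise costs), always run the text through the real tokenizer for the model you call.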
2. Core Capabilities for Document Q&A
In this part, we'll discuss the core capabilities for document Q&A, including completion, chat, and retrieval. The completion API allows the AI to complete user tasks, while chat is an extension of completion. Retrieval enables search by generating vector representations of text. Document Q&A is not complex, but it's crucial to implement it correctly to avoid issues like the one Air Canada ran into with its chatbot. Document Q&A as a service is a simple text input and button where users ask questions and receive AI-generated answers.
Okay. So that's tokens, and you can also see the approximate calculation of tokens on the OpenAI website. It's very important, because tokens are money. Literally. With AI, we don't work with money, we work with tokens.
So when we talk about LLM core capabilities, we have several by now, six different ones, and the list keeps growing. In this talk, we will only focus on three core capabilities for document Q&A: completion, chat, and retrieval. Chat is actually an extension of completion, so when you look at a completion API, you will usually see the chat API has a /chat extension. It's not a separate model; it uses the same completion.
So what is the completion API? The completion API allows the AI to complete a task given by the user, and chatting is also a task given by the user. Some of the famous completion models are GPT, Gemini, Claude, and Llama (it's very hard to pronounce these names). Anyway, those are some of the famous completion models we use when we do chat or text completion. The other capability is retrieval. What is retrieval? Retrieval means search. Basically, this is a model that allows you to generate an embedding, a vector representation, of a certain text.
And one of the most popular models for this is text embedding: text-embedding-ada from OpenAI, if you've ever heard of it. We use it a lot to help us create a vector representation of a document, so that the search algorithm can use it to find the matching chunks. So these are the three capabilities that we're going to use a lot in document Q&A. Okay.
But before we move to document Q&A: like I said, document Q&A is not something out of the box. It's not really complex, but it's something that easily goes wrong. For example, Air Canada: their AI went wrong and they had to pay money for it. Now, there's an argument that the chatbot wasn't actually an AI chatbot, that it was written with some dumb algorithm behind it and didn't really use ChatGPT or any GPT. But that's a different story. All I know is that the chatbot went wrong, and now the airline has to pay because it gave misleading information. And that's just one part of the problem document Q&A faces if you don't pay attention to, or don't understand, what you implement. So let's take a look at what document Q&A as a service is. To put it simply, it's just a text input and a button, where the user types in a question and sends it to the AI to ask for an answer.
3. Ingestion and Query Phases for Document Q&A
In the ingestion phase, the service takes in documents so it can process user queries and provide relevant answers. Storage and indexing of document chunks are essential for efficient query processing. The smaller and more relevant the chunks, the fewer tokens needed. Embeddings are used for vector or semantic search. In the querying phase, computing the right prompts and obtaining the relevant data chunks without exceeding token limits is important.
Which means: user asks, AI answers. But not with just anything. It has to stay within the range of a document, which we call grounding. In fact, when you look at this description, two things happen here. The first is the ingestion phase, where the service takes in a lot of documents, or a single document, whether predefined or uploaded on the fly by the user. Then, based on the provided documents, it can process a query or question given by the user and give back an answer with the relevant piece of the document, or data section, from the document given. So we have two flows here, two phases. The most important one is the ingestion phase, because it provides the grounding for the AI to process the user's query and give the right answer. Ingestion and query.
So what do we need to pay attention to in the ingestion phase? Every document, every paragraph, every piece of text is tokens. Again, everything present in a document gets translated to tokens, and that is not inexpensive. How are we going to store the data, store the document, so that the service can find and process the query on the right data? So let's take a look at the ingestion flow. Assume you have several files here: a PDF, a markdown file, even a code file, in case you want to build a document Q&A for your code repo.
What you need to do is load and parse these documents and split them into structured chunks. The smaller and more relevant the chunks, the easier it is to pass them to the AI, and you save yourself a lot of tokens. After that, you need to create chunk embeddings. Like we said, an embedding is a vector representation of a chunk, and this is important because you index these chunks with their embeddings into an index database, so that whenever the service gets a question, it can look for the relevant chunks to inform the question and then pass them to the AI. The important thing to note is that the embedding model here is used for vector search, or semantic search.
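The splitting step can be sketched very simply. This is a minimal illustration, assuming fixed-size character chunks with a small overlap (real splitters, such as LangChain's text splitters, are smarter about sentence and paragraph boundaries):

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so sentences
    that straddle a chunk boundary still appear whole in one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "A" * 1200
print(len(split_into_chunks(doc)))  # 3 chunks of up to 500 characters
```

Smaller, more focused chunks mean fewer tokens per query, at the cost of more index entries; the right chunk size is something to tune per document type.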
For the querying phase, there are several things we need to pay attention to. First: how do we compute the right prompts? Prompts are also money. Can we use the same prompt for every industry-specific scenario, or do we have to modify it? We also have to think about how to get the right data, the right chunks, instead of passing a whole large document together with the user query to the AI and causing our service to break down because we don't have enough tokens.
4. Querying and Prompting in Document Q&A
In the querying phase, we need to focus on computing the right prompts and obtaining relevant data chunks without exceeding token limits. The flow involves creating embeddings from the input query, computing prompts and summarizations, formatting the answer and chunks, and conducting efficient vector and semantic search to find the best-matching chunks. Additionally, we can improve the prompt computation by providing examples for training the AI model.
In the diagram, this phase is the one on the left side. When it gets the input, it creates embeddings, ready for the search algorithm to query on.
And then: how do we compute the answer with all the metadata, such as citations, names, titles, and so on?
So let's look at the flow. In the query flow, we create an embedding from the input query, because the search algorithm needs it to find matches. Then we compute the prompt together with the chunks and ask the AI for a summarization from the chunks and the query we received. After that, we format the answer and the chunks according to what we want to display to the user, and we return the answer. For the search, let's say vector and semantic search: in this flow, again on the right side, we create the embedding for the query, then pass that embedding to the search algorithm, which looks it up in the stored index, finds the most similar chunks, and returns them to us. And it's very, very efficient. Vector search and semantic search together are very efficient for finding the best-matching chunks.
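The "find the most similar chunks" step is usually a nearest-neighbor search over embeddings. As a minimal sketch (real vector stores like Azure AI Search or Pinecone do this at scale with approximate indexes), similarity is typically cosine similarity, and the top-k chunk IDs are returned:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_emb: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """indexed: (chunk_id, embedding) pairs from the index DB.
    Returns the ids of the k most similar chunks."""
    scored = sorted(indexed, key=lambda p: cosine_similarity(query_emb, p[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

index = [("c1", [1.0, 0.0]), ("c2", [0.7, 0.7]), ("c3", [0.0, 1.0])]
print(top_k([1.0, 0.1], index))  # ['c1', 'c2']
```

This brute-force version is fine for a handful of chunks; a managed vector store replaces the linear scan with an indexed search.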
The second part of the flow we can improve is the prompt computation, or as you might say, prompt engineering, though I don't know why we call it engineering, because there's no engineering here; it's just playing with text. So how are we going to improve it? Here's an example of a user prompt, a very simple one, that just tells the model to read the question and answer it based on the given document. You pass the chunks as part of the prompt, and you also pass the question as part of the prompt. So how are we going to train our AI with this prompt? Well, we can do something like this: we can add some examples, where we give some example chunks and show what the format of the answer should look like, for instance how a citation would be displayed. We can also give it some example questions so it knows what to refer to. This is what some call fine-tuning, but it's more accurately few-shot prompting. You can also provide more than one example.
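A few-shot prompt like the one described can be assembled with a simple template. This is a hypothetical sketch (the template text, the bracketed citation format, and the `build_prompt` helper are illustrations, not the speaker's exact prompt):

```python
# A few-shot prompt template: one worked example teaches the model
# the expected answer format, including the chunk-id citation.
FEW_SHOT_TEMPLATE = """Answer the question using ONLY the document chunks below.
Cite the chunk id in square brackets after each claim.

Example:
Chunks: [c1] Refunds are processed within 14 days.
Question: How long do refunds take?
Answer: Refunds are processed within 14 days [c1].

Chunks: {chunks}
Question: {question}
Answer:"""

def build_prompt(chunks: list[tuple[str, str]], question: str) -> str:
    """Render retrieved (chunk_id, text) pairs and the user question
    into the few-shot template."""
    chunk_text = "\n".join(f"[{cid}] {text}" for cid, text in chunks)
    return FEW_SHOT_TEMPLATE.format(chunks=chunk_text, question=question)

prompt = build_prompt([("c7", "Tickets are non-refundable.")], "Can I get a refund?")
print(prompt)
```

Adding a second or third example (more "shots") generally makes the output format more consistent, at the cost of more tokens per request.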
5. Improving User Prompt and Service Generality
To improve the user prompt, you can provide more than one example and support localization by specifying the desired language. However, there is no generic document Q&A service as prompts need to target specific industries with specific formats. To make the service more generic, you can deploy multiple instances for different industries.
You can also provide more than one example. Another way to improve the user prompt is when you need to support localization. Say your documents are in English, but you want the document Q&A to answer in Chinese or Italian. There are several ways to do it. You can do it with the prompt, with a sentence saying, "Always return the answer in this language," and if the GPT model supports the language properly, it will return the right answer in the right language. You can also do other things, like pre-processing the data or the query into the target language, and so on. But the prompt is the easiest way to do localization.
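The prompt-based approach amounts to appending a single instruction to whatever prompt you already built. A minimal sketch (the helper name and wording are illustrative assumptions):

```python
def localized_prompt(base_prompt: str, language: str) -> str:
    """Append a localization instruction so the model answers in
    `language` even when the source documents are in another language."""
    return base_prompt + f"\nAlways return the answer in {language}."

p = localized_prompt("Answer from the document chunks only.", "Italian")
print(p)
```

Whether this works well depends on how strong the model's support for the target language is; for weaker languages, pre-translating the query or documents may give better results.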
Other things to make your service more generic. Well, first, a disclaimer: there is no generic document Q&A service. Every prompt has to target a specific industry, because a financial report is different from a code file, and a code file is different from a sustainability report, and so on. So the prompt has to be tied to a specific industry with a specific format. One way to make it a bit more generic is, when you deploy the service, to create several instances and inject your own prompt into each. For example, you can inject the topic into the prompt and make sure the prompt is tailored per instance. That way you can deploy several instances for several industries. It can cost a bit more, but it can support your customers if that's what's needed.
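Per-instance topic injection can be as simple as reading a deployment-time setting into the system prompt. This is a hypothetical sketch: the `DOC_QA_TOPIC` environment variable and the prompt wording are illustrative assumptions, not part of the talk:

```python
import os

# Hypothetical deployment setting: each instance is deployed with its
# own DOC_QA_TOPIC, e.g. "financial reports" or "sustainability reports".
TOPIC = os.environ.get("DOC_QA_TOPIC", "general documents")

SYSTEM_PROMPT = (
    f"You are an assistant answering questions about {TOPIC}. "
    "Only answer from the provided document chunks."
)

print(SYSTEM_PROMPT)
```

Each industry gets its own instance with a tailored prompt, while the code stays identical across deployments.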
6. Flow and Components for Document Assistance
To parse documents, you can use Azure Document Intelligence for PDFs, or a text splitter for already-structured files. To create chunk embeddings, try text-embedding-ada from OpenAI. Store the embeddings in a search service like Azure AI Search or Pinecone. Split the database into two to avoid a heavy index database. For querying, use OpenAI's text embedding model to create embeddings.
Okay. That's enough about prompts. Now we come to the next section: the flow and the components, and which LLM services we can use for each component.
Okay, let's take a look. The document assistant flow for ingestion. For load-and-parse, we need something able to parse the document, right? If you get a PDF, you have to use some sort of PDF reader, something that takes the PDF and outputs the text structure for you. For that, we can use a service called Azure Document Intelligence, which is very good at parsing PDFs into structured data and tables. But if your documentation is only markdown files, or code files that are already structured, you don't really need Document Intelligence: it costs money and it's also very, very heavy. You can use a text splitter, like the one from LangChain, which is also good for splitting text into chunks.
The other component is creating chunk embeddings. For this one, you can use text-embedding-ada from OpenAI. This model is very good at creating embeddings, and you almost don't have to do any work to create them. After you create the embeddings, you need to store them somewhere. For storage, you put them in a smart search service, which can be Azure AI Search (I have to mention Azure AI Search because I work for Microsoft), but you can also use Pinecone, which is a very good product too, with a vector database where you can save your index. And that's not all. You don't save the chunks together with all the embeddings and metadata inside the index database, because that would make the index database very heavy and you may have to pay more. So I suggest you split the storage into two: one database only for the index and embeddings, and another to save the original chunks, connected to each other by an ID or similar, so we can look the chunks up.
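The two-store split can be sketched with two keyed collections sharing chunk IDs. Plain dicts stand in here for real services (e.g. Azure AI Search or Pinecone for the vector index, blob or document storage for the chunk text); the function names are illustrative:

```python
# Vector index: only id, embedding, and minimal metadata.
vector_index: dict[str, dict] = {}
# Chunk store: id -> full original chunk text, kept out of the index.
chunk_store: dict[str, str] = {}

def ingest_chunk(chunk_id: str, text: str, embedding: list[float], source: str) -> None:
    """Write the embedding to the index and the full text to the chunk store,
    linked by the same id."""
    vector_index[chunk_id] = {"embedding": embedding, "source": source}
    chunk_store[chunk_id] = text

def fetch_chunks(chunk_ids: list[str]) -> list[str]:
    """After the vector search returns matching ids, load the full text."""
    return [chunk_store[cid] for cid in chunk_ids]

ingest_chunk("c1", "Refund policy: refunds within 14 days.", [0.1, 0.9], "policy.pdf")
print(fetch_chunks(["c1"]))
```

The index stays small and cheap to query, and the bulky chunk text is only fetched for the handful of IDs the search actually returns.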
Okay. For querying, the first component creates the embedding; again, we use OpenAI's text embedding model for that.
7. Embedding Models and Flow Optimization
To ensure accurate matching, use the same embedding model for documents and queries. GPT-3.5 Turbo and above gave more reliable results. Use Pinecone or Azure AI Search for finding matching chunks. Use LangChain for easy component orchestration. Remember to update the index with new documents instead of rerunning the whole service. Save minimal metadata with the embeddings in the index DB. Optimize tokens by cleaning up queries and compressing prompts.
We need to use the same model so the search algorithm can find the match, because a different model would compute the embedding differently. As for the GPT model used for summarization based on the chunks: if you want a good result, we did some research on this, and we found that GPT-3.5 Turbo and above gives a much more reliable result.
For searching the matching chunks, again we use the smart search service: Pinecone or Azure AI Search. And lastly, to be able to manage the whole flow without a lot of effort, I would suggest not writing your own pipeline but instead using chaining from LangChain. This way all the components are connected to each other, and you don't have to spend time making one component's output become another component's input; LangChain takes care of that for you. This is what we call orchestration, and LangChain has several different ways to do it, available in the website documentation, so you can check it out. You could also use Semantic Kernel, but it doesn't support JavaScript anyway.
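The core idea of chaining, each step's output feeding the next, can be shown without any framework. This is a plain-Python illustration of what an orchestrator does for you (LangChain adds retries, streaming, tracing, and so on on top; the stub steps here are hypothetical):

```python
from functools import reduce

def chain(*steps):
    """Compose steps so each step's output becomes the next step's input.
    This is the bare skeleton of what chaining frameworks automate."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

# A toy query pipeline: clean -> embed (stub) -> search (stub) -> prompt.
pipeline = chain(
    lambda q: q.strip(),                                       # clean the query
    lambda q: {"query": q, "embedding": [0.0]},                # create embedding (stub)
    lambda s: {**s, "chunks": ["c1"]},                         # vector search (stub)
    lambda s: f"prompt for: {s['query']} with {s['chunks']}",  # build the prompt
)
print(pipeline("  what is the refund policy? "))
```

Each lambda would be replaced by a real component (embedding API call, vector-store query, prompt builder); the composition logic itself stays this simple.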
These are some resources you can use to build the flow with these components. And lastly, rules of thumb. If you run a pre-ingested document Q&A, meaning it's based on existing documents, then when a user adds a new document to the system, you don't want to rerun the whole service again. Create a scheduler or something similar that just updates the existing index with the new document: run the whole ingestion flow again, but only on that one document. Second, always save the minimum metadata together with the embeddings in the index DB, to save yourself some time. And lastly, optimization. Token optimization is crucial, because you don't want to pay extra money for it. So always clean up the query: whitespace, trailing spaces, and so on all count as tokens. One option is to pass the query to the GPT model and ask it to clean it for you, receive the cleaned query, and put it into the prompt; or try to optimize the prompt with some compression algorithm. I think MongoDB had a nice talk about how you can compress this. It will save you a lot of tokens and make your document Q&A service much better.
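The cheapest part of that cleanup needs no model call at all: collapsing the whitespace locally before the query ever reaches the API. A minimal sketch:

```python
import re

def clean_query(query: str) -> str:
    """Collapse runs of whitespace and strip the edges. Every stray
    space and newline still counts toward the token bill."""
    return re.sub(r"\s+", " ", query).strip()

print(clean_query("  What   is\n the   refund policy?  "))
# What is the refund policy?
```

Heavier rewrites (dropping filler words, compressing the prompt itself) are where an LLM pass or a dedicated compression technique comes in, but this local step is free and always worth doing first.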
Testing Embedding Models and AI Validation
Thank you for joining my talk. Embedding models can produce quite different results, so testing different models is important but challenging. Manual testing and automation are used to validate the results, though having the AI test itself can be unreliable. Other questions, and a plug for DataStax's Astra DB.
And that's it. Thank you for joining my talk. Thank you. That's why I said there's a lot of new stuff that we have to pick up to work with this generative AI, so thank you for explaining that.
I wondered about embedding models. Have you found kind of different results using different embedding models at all?
Yes. Actually, in the initial stage we tested it with three different models, and the results were pretty different. I have the slide, but I didn't have enough time, so I didn't show it. How misleading the output is depends on which model you use and which prompt you use. Out of a 40-question data set, it could give you anywhere from 20 correct answers to 29 correct answers, so roughly 50% to 70%, and there was a 10% margin between the different prompts.
So is there a way you can test these different models, to see which one is going to give you the right kind of results for your use case?
Yeah, that's another important aspect: testing, right? So initially we had manual testing, where a human had to validate answers one by one. But then we had to run a 400-question data set for testing, and no one is able to do that. That's a new job for QA, isn't it?
Yeah. The data scientists had to come up with some automation for the LLM, an automated process. They pass the questions to the AI and let the AI actually validate the answers for them and return the result. And then you do the final test with a human.
Right, so the AI is creating it and testing it, and then eventually we have a look. But again, it's like you're asking the AI to test its own work, and then you're still not sure whether it's good or not.
Well at least I don't think they have egos. So if they're like, I'm wrong, then that's fine. That's fine.
Well, if the AI is wrong, that's the problem, right?
Right. We've got some other questions. All right, that's lovely. I have to point out, whilst I'm here, that you mentioned Pinecone and the Azure AI service. I work for DataStax; we also have a vector store that you can use, so check out Astra DB. Quick plug, sorry. Okay, this is an interesting one which I've just lost.
Token Usage and Meta Prompts
Constructing prompts with many tokens can be costly; a meta prompt plus customization can reach 1,500 tokens on its own, so token usage needs optimizing. Cleaning queries can remove unnecessary words. Jesse Hall's talk on optimizing meta prompts is recommended. Building a specific meta prompt is important when the model must behave strictly, but smaller prompts can work too. Azure OpenAI includes a content-safety layer, while other platforms may require additional content validation.
The constructed prompt, there we go: adding the instructions on how to answer uses quite a lot of tokens, once you've retrieved your chunks and you've got a load of stuff. So is there a way to abstract that, to make it smaller so it's not costing you so much money? Tokens are money. Well, that's a tricky question, to be honest. On one side, if you put a lot of things in the prompt, you can guard the AI better.
One time I saw one of our (I can't say which one, because it's work) meta prompts, and the guy who wrote it said it covered all the legal aspects of an AI, but it cost about 900 to 1,000 tokens. And that's just the meta prompt. If you add your own system prompt on top, additional prompts to customize it, it can reach 1,500 tokens just for your own prompt plus the meta prompt. So to optimize this, the only way is to run some kind of cleaning over the query, to remove all the words that don't add meaning. I would suggest everyone check out a talk by Jesse Hall, I think from MongoDB, about how to optimize prompts and meta prompts for better token and cost savings. That talk was very good. Yeah, nice. I think it's here somewhere as well.
Yeah, I guess building a big meta prompt like that is important when you need the model to behave very specifically; if you don't have quite such stringent requirements, you can go smaller. Yeah, I mean, if you don't need to cover everything, like safety, content safety, misleading answers, or the AI becoming very chatty and conversational, then it's okay, you can make it smaller. I actually believe (I've not used it myself) that Azure AI does a lot of work to keep you safer as well; I don't know if that's done with a meta prompt. So yeah, that's the thing. If you use Azure OpenAI, there's already a layer of content safety, so you don't have to deal with it. But if you use another provider, you have to add the content safety yourself: you need to be responsible and write another component to validate that the query is not going to be harmful, and so on.
Creative Uses of LLM and Business Rush
Creative uses of LLMs include generating music and product descriptions from images. A video-analysis project that finds the points relevant to you won a hackathon. Whether businesses are rushing to use AI purely for marketing remains a hard question. LLMs still require observation and improvement for many applications.
I brought that up, that's fine.
That's a good question. What's the most creative use of an LLM that you've come across? Well, a lot. Someone generated music using an LLM. One of my current projects I'm working on is generating product descriptions from images. Basically, I pass an image of the product to GPT, and I want to get back the image description and the text related to that product, so I can just upload it to, I don't know, a store and have it deployed.
Very nice. I assume you could use that to build alt text for images, for better accessibility in applications, as well. Yeah. Also, one of the projects that won our hackathon at Microsoft analyzes a video and gives you the, what do you call it, the points where the talk is relevant to you. Very cool. Yeah. Nice.
Alright. We've got one more question. You spoke about how the LLMs can go wrong and gave that example of Air Canada. Do you think businesses are rushing to use AI purely for marketing purposes?
Sorry? Do you think businesses are rushing to use AI just for marketing purposes? Ah, wow, that's a very hard question. I mean, on one hand, I work for Microsoft, you know? So I have to say: okay. Well, I kind of feel like we still need to observe AI and LLMs. It has a lot of potential. Up until now, it's not yet at production level, except Copilot. Take document Q&A: we still have a lot of things to work on there. And other things too, maybe video generation. So really, it has a lot of potential, but I still don't feel confident saying it's there beyond marketing purposes. I think, from where I've seen things, there are some useful things. I think you're also right: we're very early.
Experimenting with AI and Job Security
Developers should experiment with AI and its potential. AI won't replace jobs for at least 100 years. Using AI can enhance job security. Copilot is useful for code, but not in other areas. Thank you, Maya!
But actually, I'd encourage us as developers to be experimenting with it. Let's not let a marketing or a product person who's gone wild with power decide you can take over. You have to have AI and things right now. Build things yourself with it, and see what it can do. And be the person that can actually suggest those ideas. I think that's, as developers, we owe it to ourselves to kind of experiment with these things and know what we can do with them.
Also, one more thing: it's not here to take over our jobs. Yet. Probably in 100 years. So we don't have to worry about it. There are different estimates on that kind of thing; I'm sure I've seen that we're all done in three to five. I don't know, maybe I'm wrong. I mean, at least not in my lifetime. We have had another one come through; that was the next question: do you think we're all going to lose our jobs? Not yet. Cool. Excellent. Good to know. Yeah, well, and again, another way to not lose your job is to know how to use the AI. So you're the one powering it, until it can take over, and then that's the singularity and it's all over, right? I mean, I don't use Copilot in Office. I have Copilot, I just don't use it. I only use Copilot in code, because that's the only place I really enjoy using it. Otherwise, it's just pretty annoying.
All right. Well, I think that was that. Unless there are any more questions, you've got seconds to put one in. We are out of questions. So thank you very much again, Maya. Please give Maya another round of applause. Thank you. Thank you.