LLMs Workshop: What They Are and How to Leverage Them


Join Nathan in this hands-on session where you will first learn, at a high level, what large language models (LLMs) are and how they work. Then dive into an interactive coding exercise where you will implement LLM functionality in a basic example application. During this exercise you will build key skills for working with LLMs in your own applications, such as prompt engineering, and get hands-on exposure to OpenAI's API.
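
For a taste of the API the exercise covers, here is a minimal sketch of calling OpenAI's chat completions endpoint from TypeScript with the official openai package. The model name, prompt, and the generateTagline helper are illustrative placeholders, not the workshop's actual code:

```ts
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const client = new OpenAI();

async function generateTagline(productName: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder; any chat-capable model works
    messages: [
      // The system message is where much of the prompt engineering happens.
      { role: "system", content: "You write short, punchy product taglines." },
      { role: "user", content: `Write a tagline for: ${productName}` },
    ],
    temperature: 0.7, // higher = more creative, lower = more deterministic
  });
  return response.choices[0].message.content ?? "";
}

generateTagline("a dashboard tool").then(console.log);
```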


After this session you will have insight into what LLMs are and how they can practically be used to improve your own applications.


Table of contents: 

- Interactive demo implementing basic LLM powered features in a demo app

- Discuss how to decide where to leverage LLMs in a product

- Lessons learned around integrating with OpenAI / overview of OpenAI API

- Best practices for prompt engineering

- Common challenges specific to React (state management :D / good UX practices), with a sketch after this list
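
To make the React state-management point above concrete, here is a minimal sketch of appending streamed tokens to state as they arrive. It assumes a hypothetical /api/chat endpoint that streams the model's reply as plain text; this is not the workshop's actual code:

```tsx
import { useState } from "react";

export function ChatResponse() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  async function ask(prompt: string) {
    setText("");
    setLoading(true);
    // Hypothetical endpoint that streams the model's reply as plain text.
    const res = await fetch("/api/chat", { method: "POST", body: prompt });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // Functional update avoids stale-closure bugs while chunks stream in.
      setText((prev) => prev + decoder.decode(value, { stream: true }));
    }
    setLoading(false);
  }

  return (
    <div>
      <button disabled={loading} onClick={() => ask("Name a pizza")}>Ask</button>
      <p>{text}{loading && "▍"}</p>
    </div>
  );
}
```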

This workshop was presented at React Summit 2024. Check out the latest edition of this React conference.

FAQ

The LLAMA 2 model, released by Meta.ai, is a large language model with variants ranging from 7 billion to 70 billion parameters. It is an open weights model, meaning that its architecture and parameters are publicly released, allowing anyone to work on it.

Unlike models like ChatGPT, whose architecture and parameters are not publicly available, LLAMA 2's architecture and parameters are released by Meta.ai. This allows individuals to work on the LLAMA 2 model independently.

Large language models like LLAMA 2 are trained by taking a large chunk of internet text (approximately 10 terabytes) and running it through a GPU cluster with about 6,000 specialized GPUs over 12 days, costing around $2 million. This process compresses the text into a parameter file used by the model.
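
A quick back-of-envelope check of those numbers (my arithmetic, not figures from the session):

```ts
// ~10 TB of training text is compressed into a 140 GB parameter file.
const trainingTextBytes = 10e12;   // ≈ 10 TB of internet text
const parameterFileBytes = 140e9;  // 70B parameters × 2 bytes each

// Roughly a 70x "compression", which is why the result is lossy:
// the model keeps a gestalt of the text, not the text itself.
console.log(trainingTextBytes / parameterFileBytes); // ≈ 71
```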

Pre-training involves training the model on a large amount of internet text to learn general knowledge and language patterns. Fine-tuning involves training the model on a smaller, high-quality dataset with specific instructions to generate desired responses, such as answering questions accurately.
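
Concretely, a fine-tuning dataset is mostly instruction/response pairs written or vetted by human labelers. A hypothetical record might look like this (illustrative shape only, not any particular vendor's format):

```ts
// One illustrative fine-tuning example: the labeler writes the ideal answer.
const finetuneExample = {
  prompt: "Can you explain what a parameters file is?",
  response:
    "The parameters file stores the trained weights of the neural network. " +
    "For a 70B-parameter model at two bytes per weight, it is about 140 GB.",
};
```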

The presenters are Nathan Marrs, the tech lead of the DataViz squad at Grafana Labs, and Haris Rozajac, a software engineer on the Explore team at Grafana Labs.

A large language model (LLM) is a type of artificial intelligence model designed to understand and generate human language. It consists of parameters and a run file, and it is trained on vast amounts of text data to predict the next word in a sequence. Examples include Meta's LLAMA 2 and LLAMA 3 models.
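
"Predict the next word" can be pictured as a simple loop: hand the model the tokens so far, pick a next token from its predicted distribution, append it, and repeat. A toy TypeScript sketch with the model itself stubbed out:

```ts
// Stub standing in for a real LLM: maps the token sequence so far to a
// probability distribution over the vocabulary for the next token.
type NextTokenModel = (tokens: string[]) => Map<string, number>;

function generate(model: NextTokenModel, prompt: string[], maxTokens: number): string[] {
  const tokens = [...prompt];
  for (let i = 0; i < maxTokens; i++) {
    const dist = model(tokens);
    // Greedy decoding: take the single most likely next token.
    // Real samplers draw from the distribution (temperature, top-p, ...).
    const next = [...dist.entries()].sort((a, b) => b[1] - a[1])[0][0];
    tokens.push(next);
  }
  return tokens;
}
```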

The purpose of fine-tuning a model is to adapt it from a general document generator to a more specialized assistant that can provide accurate and helpful responses to specific queries. This involves training the model with high-quality Q&A documents created by human labelers.

Model inference is the process of running a trained model to generate outputs, such as text, on a local machine without needing internet connectivity. Model training, on the other hand, involves training the model on large datasets using specialized GPU clusters. Training is computationally intensive and expensive.

Grafana Labs uses large language models in various ways, such as generating panel titles and descriptions, summarizing incidents, and analyzing flame graph profiling data. These applications help reduce user toil and make complex data more accessible.

Example applications of LLMs within Grafana include the Dashboard Assistant, which helps generate titles and descriptions for panels, and Flame Graph AI, which analyzes performance data and highlights bottlenecks.
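
To give a flavor of how a feature like the Dashboard Assistant might be prompted, here is a hypothetical prompt shape; this is a guess for illustration, not Grafana's actual implementation:

```ts
// Hypothetical prompt construction for generating a panel title.
function panelTitlePrompt(panelJson: string) {
  return [
    {
      role: "system" as const,
      content:
        "You write concise, descriptive titles for Grafana dashboard panels. " +
        "Reply with the title only, under 8 words.",
    },
    { role: "user" as const, content: `Panel definition:\n${panelJson}` },
  ];
}
```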

Nathan Marrs
Haris Rozajac
66 min
28 Jun, 2024

Video Summary and Transcription
Today's workshop introduced large language models (LLMs), including how a model can be run from a simple C program. The training process compresses a large amount of text into parameters, resulting in a lossy approximation. LLMs generate text based on their training, but the generated content may include hallucinations or partially correct answers. Fine-tuning and reinforcement learning stages improve the performance of LLMs. In the context of Grafana, LLMs are used for tasks such as generating titles and descriptions, understanding flame graph profiling data, and generating pizza names.

1. Introduction to Large Language Models

Short description:

Today we're going to be looking at an introduction to large language models. We'll first begin with an introduction to large language models and then go over a few early applications within Grafana. LLAMA 2 is a large language model released by Meta.ai, with variants ranging from 7 billion to 70 billion parameters. The LLAMA 2 70B model is just two files on your system: a parameters file and a run file. The parameters file is 140 gigabytes, with each parameter stored as two bytes. The run file contains the code that runs the neural network.

So, welcome everyone. Today we're going to be kind of looking at an introduction to large language models. So quick introductions. I can go and then my colleague, Haris, can go. So my name is Nathan Marrs and I'm the tech lead of the DataViz squad at Grafana Labs. I've been experimenting in this world of large language models and how they can apply to the world of observability for the past year and a half now. And I'm really excited about this space.

Haris, if you want to introduce yourself. Yeah, of course. Yeah, my name is Haris Rozajac. I'm also working at Grafana Labs. I'm a software engineer on the Explore team. And also, yeah, I've been playing with LLMs for about a year now and thinking about how to implement them at Grafana and in some other personal projects.

So we're going to kick it off here. To begin with, we'll start with this high-level agenda. We'll first begin with an introduction to large language models, then go over a few early applications of large language models within Grafana. Then we'll have a short five-minute or so break and transition to our hands-on activity working with large language models, which Haris will be leading.

All right. So first of all, I want to say huge credit goes to Andrej Karpathy for the majority of the content and structure of this intro material. His work was a super helpful resource for creating this. So what is a large language model? It is in actuality just two files. In this example here, we're looking at the LLAMA 2 70B (70 billion parameter) model. This is a large language model released by Meta.ai, and it's the second iteration of the LLAMA model. LLAMA 2 comes in 7 billion, 13 billion, and 70 billion parameter variants. 70 billion is the biggest one, and many people like this model specifically because it's the most powerful open weights model: the weights and architecture were all released by Meta, so anyone can work on this model by themselves. This is unlike many other language models that you might be familiar with. For example, if you're using ChatGPT or something like this, that model's architecture was never released and is owned by OpenAI. You're allowed to use it through a web interface, but you don't have actual access to the underlying model itself. A quick update in the LLAMA world, though: Meta actually released LLAMA 3 a few months ago, and it has 8 billion and 70 billion parameter variants. The underlying concepts we're going to be discussing and exploring today apply to both LLAMA 2 and 3 as well as the other large language models out there today.

So in this case, the LLAMA 2 70B model is really just two files on your system. There's a parameters file and then the run file, which is just some code that runs those parameters. We'll take a high-level look at the run file in a few minutes. The parameters are basically just the weights of the neural network that is the language model. Every one of those parameters is stored as two bytes, and because this is a 70 billion parameter model, the parameters file is 140 gigabytes. Each parameter is two bytes because they're float16 numbers, a pretty standard data format that gives just enough precision while limiting how much memory it takes to store and run the model. So you have the parameters file, and you also need something that runs the neural network. That piece of code is implemented in our run file. This could be a C file or a Python file or any other programming language, really.
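
The 140 gigabyte figure falls straight out of the parameter count and the float16 width:

```ts
const parameterCount = 70e9;  // LLAMA 2 70B
const bytesPerParameter = 2;  // float16 = 16 bits = 2 bytes

const fileSizeBytes = parameterCount * bytesPerParameter;
console.log(fileSizeBytes / 1e9); // 140 (gigabytes)
```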

2. Implementing the Large Language Model

Short description:

You can use JavaScript or any other arbitrary language, but C is recommended for its simplicity and memory efficiency. Implementing the neural network architecture that uses the parameters to run the model would only require about 500 lines of C code. The model can be used on your computer without internet connectivity, as it is a fully self-contained package. By compiling the C code, you can talk to the large language model by providing prompts and receiving generated text as output. The computational complexity lies in how the parameters are formed; the run.c file itself contains the core logic to load and run the model, using matrix multiplication for the performance-critical computations.

You could do it in JavaScript if you want to. It can be written in any arbitrary language, but C is a very simple and efficient language, especially from a memory standpoint. So just to give you a sense, it would only require about 500 lines of C with no other dependencies to implement the neural network architecture that uses the parameters to run the model. So you can take these two files, take your MacBook or whatever computer you have, and use the model. This is a fully self-contained package that has everything that's necessary; you don't need any connectivity to the Internet or anything else. You can compile your C code and you get a binary that you can just point at the parameters, and you can talk to this large language model.
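
In Node/TypeScript terms, "point the binary at the parameters and talk to it" could look like the sketch below. It assumes a llama2.c-style binary named run that takes the checkpoint path as its first argument and a prompt via -i; the exact flags may differ for other run files:

```ts
import { execFileSync } from "node:child_process";

// Assumed CLI shape (llama2.c-style); flags may differ for other run files.
const output = execFileSync(
  "./run",
  ["llama2_70b.bin", "-i", "Write a poem about React Summit"],
  { encoding: "utf8" },
);
console.log(output); // the generated text
```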

So in this example, you can send it a prompt to write a poem about this conference and get the language model generating text back. So in this case, I gave it the direction to write a poem about React Summit. It's running the model, and we can see it outputs some text for a poem. So this is a very small package, but the computational complexity really comes in with how these parameters are actually formed. The run.c file does not contain any surprises; everything is algorithmically understood and open. This is an actual look inside of a run.c file. You can see it's a pretty basic C file. It's basically the core logic to load and run the model. Here's an abstraction of what's going on. The LLAMA 2 model and other models use matrix multiplication to perform computations. The model has a large number of parameters, which are stored in matrices, and the model performs matrix multiplication on these parameters to generate the output text. This run.c program is a relatively simple program. It consists of a main function and a few helper functions. The main function is responsible for reading the input text, calling the model to generate the output text, and writing out the output text. The helper functions perform various tasks such as reading and writing files, allocating memory, which is a huge piece of this, and performing matrix multiplication.
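
That matrix multiplication core is short enough to show. Here is a TypeScript rendering of the typical inner loop from such a run file, computing xout = W·x for a d-by-n weight matrix stored row-major; a sketch of the idea, not the session's exact C code:

```ts
// xout = W @ x, where W has d rows and n columns, stored row-major in `w`.
// This one loop nest is where almost all inference compute goes.
function matmul(xout: Float32Array, x: Float32Array, w: Float32Array, n: number, d: number): void {
  for (let i = 0; i < d; i++) {
    let val = 0;
    for (let j = 0; j < n; j++) {
      val += w[i * n + j] * x[j];
    }
    xout[i] = val;
  }
}
```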

Watch more workshops on this topic

AI on Demand: Serverless AI
DevOps.js Conf 2024
163 min
Top Content
Featured Workshop, Free
Nathan Disidore
In this workshop, we discuss the merits of serverless architecture and how it can be applied to the AI space. We'll explore options around building serverless RAG applications for a more lambda-esque approach to AI. Next, we'll get hands on and build a sample CRUD app that allows you to store information and query it using an LLM with Workers AI, Vectorize, D1, and Cloudflare Workers.
AI for React Developers
React Advanced 2024
142 min
Featured Workshop
Eve Porcello
Knowledge of AI tooling is critical for future-proofing the careers of React developers, and the Vercel suite of AI tools is an approachable on-ramp. In this course, we’ll take a closer look at the Vercel AI SDK and how this can help React developers build streaming interfaces with JavaScript and Next.js. We’ll also incorporate additional 3rd party APIs to build and deploy a music visualization app.
Topics:
- Creating a React Project with Next.js
- Choosing a LLM
- Customizing Streaming Interfaces
- Building Routes
- Creating and Generating Components
- Using Hooks (useChat, useCompletion, useActions, etc)
Working With OpenAI and Prompt Engineering for React Developers
React Advanced 2023
98 min
Top Content
Workshop
Richard Moss
In this workshop we'll take a tour of applied AI from the perspective of front end developers, zooming in on the emerging best practices when it comes to working with LLMs to build great products. This workshop is based on learnings from working with the OpenAI API from its debut last November to build out a working MVP, which became PowerModeAI (a customer-facing ideation and slide creation tool).
In the workshop there'll be a mix of presentation and hands-on exercises to cover topics including:
- GPT fundamentals
- Pitfalls of LLMs
- Prompt engineering best practices and techniques
- Using the playground effectively
- Installing and configuring the OpenAI SDK
- Approaches to working with the API and prompt management
- Implementing the API to build an AI powered customer facing application
- Fine tuning and embeddings
- Emerging best practice on LLMOps
Building AI Applications for the Web
React Day Berlin 2023
98 min
Workshop
Roy Derks
Today every developer is using LLMs in different forms and shapes. Lots of products have introduced embedded AI capabilities, and in this workshop you’ll learn how to build your own AI application. No experience in building LLMs or machine learning is needed. Instead, we’ll use web technologies such as JavaScript, React and GraphQL which you already know and love.
Building Your Generative AI Application
React Summit 2024
82 min
Workshop, Free
Dieter Flick
Generative AI is exciting tech enthusiasts and businesses with its vast potential. In this session, we will introduce Retrieval Augmented Generation (RAG), a framework that provides context to Large Language Models (LLMs) without retraining them. We will guide you step-by-step in building your own RAG app, culminating in a fully functional chatbot.
Key Concepts: Generative AI, Retrieval Augmented Generation
Technologies: OpenAI, LangChain, AstraDB Vector Store, Streamlit, Langflow
Building Your Own GenAI Agent Application
React Summit US 2024
87 min
Workshop, Free
Amit Mandelbaum
Idan Rozin
GenAI agents are one of the most promising directions for complex GenAI-based applications. These agents can search the web, write code, and carry out complex tasks completely autonomously for the user. In this workshop we will learn the basics of GenAI agents, define the basic terms and frameworks, and understand how they differ from traditional use of LLMs. We will cover prompting techniques for LLM agents, specifically the ReAct prompting technique for AI agents (not to be confused with the React JavaScript library). We will build, from scratch, our own GenAI agent that can interact with the user, perform web searches, and write and execute JavaScript code.
Table of contents:
- Introduction to GenAI agents
- Understanding the ReAct framework
- Hands-on building of a simple GenAI agent
- Deployment of the agent with Streamlit
- Tips and frameworks for developing GenAI agents

Check out more articles and videos

We constantly curate articles and videos that might spark your interest, skill you up, or help you build a stellar career.

Building a Voice-Enabled AI Assistant With JavaScript
JSNation 2023
21 min
Top Content
This Talk discusses building a voice-activated AI assistant using web APIs and JavaScript. It covers using the Web Speech API for speech recognition and the speech synthesis API for text to speech. The speaker demonstrates how to communicate with the OpenAI API and handle the response. The Talk also explores enabling speech recognition and addressing the user. The speaker concludes by mentioning the possibility of creating a product out of the project and using Tauri for native desktop-like experiences.
AI and Web Development: Hype or Reality
JSNation 2023
24 min
Top Content
This talk explores the use of AI in web development, including tools like GitHub Copilot and Fig for CLI commands. AI can generate boilerplate code, provide context-aware solutions, and generate dummy data. It can also assist with CSS selectors and regexes, and be integrated into applications. AI is used to enhance the podcast experience by transcribing episodes and providing JSON data. The talk also discusses formatting AI output, crafting requests, and analyzing embeddings for similarity.
The Rise of the AI Engineer
React Summit US 2023
30 min
Top Content
The rise of AI engineers is driven by the demand for AI and the emergence of ML research and engineering organizations. Start-ups are leveraging AI through APIs, resulting in a time-to-market advantage. The future of AI engineering holds promising results, with a focus on AI UX and the role of AI agents. Equity in AI and the central problems of AI engineering require collective efforts to address. The day-to-day life of an AI engineer involves working on products or infrastructure and dealing with specialties and tools specific to the field.
The AI-Assisted Developer Workflow: Build Faster and Smarter Today
JSNation US 2024
31 min
AI is transforming software engineering by using agents to help with coding. Agents can autonomously complete tasks and make decisions based on data. Collaborative AI and automation are opening new possibilities in code generation. Bolt is a powerful tool for troubleshooting, bug fixing, and authentication. Code generation tools like Copilot and Cursor provide support for selecting models and codebase awareness. Cline is a useful extension for website inspection and testing. Guidelines for coding with agents include defining requirements, choosing the right model, and frequent testing. Clear and concise instructions are crucial in AI-generated code. Experienced engineers are still necessary in understanding architecture and problem-solving. Energy consumption insights and sustainability are discussed in the Talk.
Web Apps of the Future With Web AI
JSNation 2024
32 min
Web AI in JavaScript allows for running machine learning models client-side in a web browser, offering advantages such as privacy, offline capabilities, low latency, and cost savings. Various AI models can be used for tasks like background blur, text toxicity detection, 3D data extraction, face mesh recognition, hand tracking, pose detection, and body segmentation. JavaScript libraries like MediaPipe LLM inference API and Visual Blocks facilitate the use of AI models. Web AI is in its early stages but has the potential to revolutionize web experiences and improve accessibility.
Code coverage with AI
TestJS Summit 2023
8 min
Codium is a generative AI assistant for software development that offers code explanation, test generation, and collaboration features. It can generate tests for a GraphQL API in VS Code, improve code coverage, and even document tests. Codium allows analyzing specific code lines, generating tests based on existing ones, and answering code-related questions. It can also provide suggestions for code improvement, help with code refactoring, and assist with writing commit messages.