When Less Is More: A Technical Overview of LLMs and the Strength of Smaller Models


In generative AI, the largest large language models (LLMs) often dominate the headlines, hailed as the best solutions for the most complex and diverse tasks. While they certainly have their place, are they the best option for every enterprise use case?

Smaller language models are gaining traction for their ability to deliver high performance with lower cost and resource requirements. These models are quicker, easier to fine-tune, and better suited for targeted business needs, making them an attractive alternative for many organizations.


In this session, we will:

- Explore the technical structure and content of LLMs.
- Discuss how smaller, purpose-built models can be more efficient and effective for enterprise tasks, including how model optimization techniques can boost performance even more.
- Demonstrate how smaller LLMs can provide faster, more cost-effective solutions while still meeting the demands of specialized use cases.

This talk was presented at AI Coding Summit 2026.

FAQ

The presentation is a technical overview of large language models and the strengths of smaller models.

Common Crawl is a nonprofit that crawls the web and hosts a large open archive of internet pages and their data, widely used for model training.

Data is collected from the internet, filtered for quality, deduplicated, and then tokenized into mathematical representations the model can train on.

Tokenization is the process of converting human language into mathematical symbols that a neural network can process.

Model inference is the process of using a trained AI model to generate outputs from new inputs, often requiring an inference engine.

Smaller models are faster and cheaper to run, and can be used on local hardware for data privacy and control.

Quantization is a technique to reduce the size of AI models by decreasing the precision of their parameters, maintaining accuracy while improving efficiency.
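As a toy illustration of the idea (a sketch only, not any particular library's scheme), symmetric int8 quantization maps each floating-point weight to an 8-bit integer using a single per-tensor scale:

```python
# Toy symmetric int8 quantization -- illustrative only.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing each weight in one byte instead of four cuts model size roughly 4x, which is why the restored values staying within half a step of the originals is usually an acceptable trade.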

AI models can be found on platforms like Hugging Face, which offers models compatible with various inference engines.

GuideLLM and LM Eval Harness are tools mentioned for benchmarking AI models and their performance.

Legare Kerrison is a developer advocate at Red Hat AI.

Legare Kerrison
11 min
26 Feb, 2026

Video Summary and Transcription
Legare Kerrison from Red Hat AI discusses the technical aspects of large language models, including data collection, tokenization, and neural network internals. Model training involves converting human language to mathematical representations and iteratively adjusting parameters in a complex, parameter-rich environment. Inference engines like vLLM aid in deploying models for rapid data processing. Optimizing model size for efficiency without sacrificing accuracy is crucial, with quantization reducing model size while maintaining precision. Local deployment offers privacy and control, and smaller purpose-driven models can enhance workflows and experimentation.

1. Technical Overview of Language Models

Short description:

Legare Kerrison, developer advocate at Red Hat AI, discusses large language models, focusing on data collection, tokenization, neural network internals, and model inferencing, while touching on the strengths of smaller models. The pre-training process includes web scraping, data filtering, conversion of language into mathematical representations, and tokenization.

Hi, I'm Legare Kerrison. I'm a developer advocate at Red Hat AI, and today we're going to talk through a technical overview of large language models and the strengths of smaller models. Here's what we're going to touch on over the next 10 minutes: data collection for the pre-training of these models, the tokenization of that data, what the internals of a neural network look like, and then inferencing these models once it's time to get them into production. Along the way, we'll touch on the strengths of smaller models.

So first up, data collection for pre-training. If you've ever posted on the internet, you've probably helped train these models. Here we can see Common Crawl's statistical graph of what a web hierarchy looks like. Common Crawl is a web crawler that hosts a huge collection of the internet's pages and the data on them. Every lab trains its models on some dataset similar to what Common Crawl has captured.

So once you pull from the websites, people will typically filter out the URLs that will predictably lead to bad results. They'll pull the text from these websites, ignore anything that isn't text, and filter for the mix of languages they want; maybe you want it to be 65% English and some percentage of other languages. At the end of the day, it's all going to be converted to mathematical representations. From there, you remove duplicates and, hopefully, any personally identifiable information, such as social security numbers, passwords, and so on. Then we tokenize that data.
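The filtering steps described above can be sketched as a toy pipeline; the blocklist, PII pattern, and dedup strategy here are illustrative placeholders, far simpler than anything a real lab uses:

```python
import re

# Toy pre-training data filter -- illustrative only.
BLOCKLIST = ("spam.example",)                   # hosts expected to yield bad text
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # crude PII pattern (US SSN shape)

def clean(pages):
    """pages: list of (url, text). Returns filtered, deduplicated texts."""
    seen, out = set(), []
    for url, text in pages:
        if any(host in url for host in BLOCKLIST):
            continue                            # drop predictably bad sources
        text = SSN_RE.sub("[REDACTED]", text)   # scrub obvious PII
        if text not in seen:                    # remove exact duplicates
            seen.add(text)
            out.append(text)
    return out

pages = [
    ("https://spam.example/a", "buy now"),
    ("https://ok.example/b", "Alice was beginning to get very tired."),
    ("https://ok.example/c", "Alice was beginning to get very tired."),
    ("https://ok.example/d", "SSN: 123-45-6789"),
]
docs = clean(pages)
```

Real pipelines layer on quality classifiers, fuzzy deduplication, and language identification, but they follow this same shape: drop bad sources, scrub sensitive strings, deduplicate what remains.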

2. Model Training and Inference

Short description:

Converting human language into math through encoding and tokenization. Training models adjust parameters iteratively to reflect data patterns, operating in a complex parameter-rich environment. Inference engines like VLLM facilitate the production deployment of models for rapid data processing and new data generation.

So this is us converting human language into math. If we see an excerpt from Alice in Wonderland, we can convert it to binaries with UTF-8 encoding, grouping them into 8 bits. This process, combined with byte-pair encoding, results in a tokenized version of the human language, associating words with corresponding tokens.
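A minimal sketch of that pipeline: encode the text as UTF-8 bytes, then merge the most frequent adjacent pair into a new token, which is the core move of byte-pair encoding (a single merge step here; real tokenizers apply thousands of learned merges):

```python
from collections import Counter

def bpe_merge_once(ids):
    """One byte-pair-encoding step: merge the most frequent adjacent pair."""
    pairs = Counter(zip(ids, ids[1:]))
    if not pairs:
        return ids, None
    top = pairs.most_common(1)[0][0]
    new_id = max(ids) + 1                       # fresh token id for the merged pair
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == top:
            merged.append(new_id)               # replace the pair with one token
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged, top

text = "Alice was beginning to get very tired"
ids = list(text.encode("utf-8"))                # human language -> bytes -> integers
shorter, merged_pair = bpe_merge_once(ids)
```

Repeating the merge step builds up a vocabulary where common character sequences become single tokens, which is why frequent words usually map to one token while rare words split into several.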

During training, the model adjusts its parameters to reflect patterns in the data it was trained on. This involves tweaking weights through an iterative process to reduce loss and make the model more representative of the dataset. The complexity lies in the billions of parameters models typically have, with frontier models like ChatGPT trained on trillions of tokens, creating an intricate vector embedding space.
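The adjust-parameters-to-reduce-loss loop, stripped down to a single weight fit by gradient descent (a deliberately tiny sketch, nothing like a real training run):

```python
# Fit y = w * x to toy data by gradient descent: the same
# adjust-parameters-to-reduce-loss loop, with one weight instead of billions.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]     # true relationship: y = 2x
w, lr = 0.0, 0.05                               # initial weight, learning rate

def loss(w):
    """Mean squared error of the current weight over the toy dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

start = loss(w)
for _ in range(200):
    # Gradient of the MSE with respect to w, computed analytically.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                              # step against the gradient
end = loss(w)
```

A real training run does the same thing across billions of weights, with gradients computed by backpropagation over batches of tokenized text rather than a closed-form derivative.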

Inference involves putting models into production to process user inputs quickly and generate new data. An inference engine, like vLLM, is vital for this process, taking inputs such as a config, a tokenizer, and safetensors weight files. vLLM, supported by Red Hat, stands out for its user-friendly compatibility with the wide range of models available on Hugging Face.
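At its core, inference is asking the model for the next token over and over. In this toy sketch a hypothetical bigram lookup table stands in for the neural network, but the generate-one-token-at-a-time loop is the same job an engine like vLLM performs at scale:

```python
# Toy greedy decoding loop. NEXT is a made-up bigram table standing in for
# a trained model; a real engine would run the network to score every token.
NEXT = {
    "<s>": "alice",
    "alice": "was",
    "was": "beginning",
    "beginning": "to",
    "to": "get",
    "get": "tired",
}

def generate(prompt, max_tokens=10):
    """Greedily extend the prompt one token at a time."""
    tokens = [prompt]
    for _ in range(max_tokens):
        nxt = NEXT.get(tokens[-1])
        if nxt is None:                         # no continuation: stop generating
            break
        tokens.append(nxt)
    return tokens[1:]                           # drop the start symbol

out = generate("<s>")
```

Serving engines optimize exactly this loop, batching many users' requests together and caching intermediate results so each next-token step stays fast.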
