Hi, everybody. It's great to see you. My name is Carly, and today I want to talk to you about context engineering. So, you might be wondering why we need to talk about context for agents. Can't we just hand an agent our task and let it figure things out with whatever information it has? I'm going to explain why that's not necessarily the case. We're going to talk about each of the components of context engineering, with examples to show you the kinds of practices you'll be doing.
And, of course, we'll have Q&A at the end after this recording. So, if you want to ask me questions, do come and join me in the Q&A session. So, if you haven't met me before, hi, it's nice to meet you. I'm Carly Richland, and I work in the developer advocacy team at Elastic. I am collecting socials like Pokemon. I'm sure a few of you are as well. So, if I don't get to your question in Q&A and you want to come and ask me something afterwards, just scan that link there. That will take you to my link tree, and you can find me wherever works. And I'm happy to help or answer questions as they come up.
So, why do we care about the context that's being passed to an LLM? Well, firstly, we need to understand what the context window is. In simple terms, the context window is the number of tokens that an LLM can process at once.

Now, you might be thinking, well, a lot of the emerging LLMs have these super big context windows, so it doesn't really matter what I send them. I'm going to disagree and say that's not quite right. It's still very much possible to overflow a context window; it just gets harder as the window grows. But we can also end up poisoning our context: if there's irrelevant information in there that isn't actually useful, it can influence the actions an LLM takes and therefore impact the overall result. So it's still very important to make sure the context window contains the kind of information you want it to. And I'm not just talking about a single agent, but also, when we get into multi-agent architectures, making sure each agent has the right context it needs not only to perform its own task but to coordinate with the others.

It's also partly a cost point, if we think about it. Foundation models tend to charge per token, so there are definitely savings to be made by keeping our agents working within our means, and by making sure we're not blowing the entire company budget when we build these tools. And LLMs do still make things up. They still hallucinate a little bit, and that's down to a few different reasons.
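To make the budget point concrete, here is a minimal sketch of one common practice: trimming an agent's message history so it stays within a token budget, keeping the system prompt and the most recent turns. The message format and the rough 4-characters-per-token estimate are illustrative assumptions, not any particular provider's API.

```python
# Sketch: keep an agent's message history within a token budget.
# Assumptions (not a real LLM API): messages are dicts with "role" and
# "content", and tokens are estimated at ~4 characters each.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the most recent turns survive first.
    for m in reversed(rest):
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question, long since answered."},
    {"role": "assistant", "content": "An old answer we can afford to drop."},
    {"role": "user", "content": "The current question."},
]
trimmed = trim_history(history, budget=20)
```

In a real system you would use the model provider's own tokenizer for the count, but the shape of the idea is the same: decide deliberately what stays in the window instead of letting old, irrelevant turns crowd out the information the agent actually needs.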