Game Changer! Building Search Into Your Applications

Building search into applications can be quite easy. This talk has been really fun for audiences because I get audience members involved, often building out the code themselves, while they try to stump each other by naming the hardest things to find. The application is hosted in a Code Sandbox, so the audience takes the code home with them. Also, I can do the same presentation with movies, if the organizers prefer.

This talk was presented at Node Congress 2023.

FAQ

According to Salesforce, 87% of e-commerce shoppers begin their shopping journey in the search bar.

Forrester states that 68% of shoppers will abandon their search if the user experience is poor, indicating the critical role of effective search functionality in retaining users.

Document databases are better suited for handling large volumes of unstructured and semi-structured data, making them ideal for search functionalities where query patterns are not predefined, unlike relational databases.

Apache Lucene is an open-source search engine software library, extensively used by companies like Netflix, Walmart, and eBay. It excels in processing and indexing documents for efficient search operations.

Lucene enhances search by using an inverted index to map tokens derived from document data to their respective documents, improving the accuracy and speed of search results.

Analyzers in Lucene process input text, break it down into tokens, and optimize these tokens for search by removing punctuation, lowercasing text, and more, which helps in accurately mapping search queries to relevant documents.

Using different analyzers, like the standard or keyword analyzer, can significantly change the search results. For example, the keyword analyzer treats the input as a single token, which is perfect for exact matches, unlike the standard analyzer that splits the input into multiple tokens.

Query operators such as regex, phrase, text, and autocomplete allow users to refine and specify their search queries, catering to different search preferences and improving the likelihood of finding the most relevant data.

Custom scoring allows search engines to rank documents based on relevance to the query and other factors, like a document's inherent value (e.g., FIFA scores in a sports database), ensuring that the most relevant and high-quality results appear first.

Karen Huaulme
8 min
14 Apr, 2023

Video Summary and Transcription

Implementing the right strategies and tools, such as Apache Lucene, can improve search performance and user experience. The choice of analyzer affects search results, and query operators provide various search options. Relevance scoring is crucial for ranking documents by how well they match the query. Custom scoring can prioritize specific criteria. Consider analyzers, query operators, and scoring methods to optimize the search experience.

1. Introduction to Search Game and Apache Lucene

Short description:

You've got data and users who need to access it. The search game is about helping users find what they want. Implementing the right strategies and tools, such as a document database and Apache Lucene, can improve search performance and user experience.

Listen up, people! You've got data and you've got users and your users need to access your data. So whether it's Google, whether it's Amazon, whether it's Stack Overflow, Salesforce says that 87% of e-commerce shoppers start their journey in the search bar. And Forrester says that 68% of those shoppers will give up their journey if you provide a bad user experience.

Now this search bar looks simple, but on the other side of that search bar are your users, and they are not so simple. They don't know what they want. They don't know how to express what they want. They don't know how to spell it. Getting them what they want is what I call the search game. And when you play the search game right, you can take this search bar and turn it into this giant goal. A goal so big, in fact, that your users simply can't miss. They will net everything that they're looking for, including the things that they didn't even know they were looking for. When you play the search game right, you get more engagement, more clicks, more users, more likes, more sharing, and more revenue than your competitors. And everybody who is not you is your competitor.

So I am going to coach you today on how to implement the right strategies and the right tools you'll need to get this. The first piece of proper equipment you need is a document database. Most users, when they want your data, are probably searching through volumes and volumes of unstructured and semi-structured data. Now, relational databases are fantastic for tables. Anything that fits in a column and a row is great. When you know the query pattern ahead of time, they're great for that. But for search, the performance goes down, so you'll want a document database.

Your star player is Apache Lucene. All the winning teams play with Apache Lucene: Netflix, Walmart, eBay. It is battle-tested and open source, and it's been around for 20 years. That's why they play it. You can build your own thing, but why would you farm a promising, upcoming player when Messi's already warmed up, ready and hungry to play for you? So Apache Lucene is your star player, and its big power play is that it takes the data from that document database and runs it through a process called analysis. Analysis takes that data and breaks it down into different tokens, depending on the analyzer you use, and those tokens are stored in an inverted index. So Lucene uses an inverted index.

2. Search Process and Optimization

Short description:

When using the standard analyzer in Lucene, searching for 'Manchester United' will yield two tokens: 'manchester' and 'united'. The choice of analyzer affects the search results, as demonstrated by the keyword analyzer, which returns only players from Manchester United. Query operators, such as regex, phrase, text, facets, and autocomplete, provide users with various search options. Relevance scoring plays a crucial role in search engines, as it ranks documents based on how well they match the search query. Custom scoring can be used to prioritize specific criteria, such as the overall FIFA score. Consider the choice of analyzers, query operators, and scoring methods to optimize the search experience for your users.

So let's run through this process in a practice play and see how it feels. Say I have these four documents with soccer teams in them, each with its unique _id field, and I look through those documents for Manchester United. Using the standard analyzer in Lucene, it would lowercase everything and remove all the punctuation, and I'm left with two different tokens: manchester and united. Those are my two tokens.

So when I look through these documents for those things, my tokens, or my terms, will map to documents: one and two for manchester, and one and three for united. So my inverted index will hold my tokens or my terms, which documents they appear in, and other helpful metadata such as frequency and position. Now, having the right tokens or the right terms can make or break a good search experience for you. So it's important to use the right analyzer to get the right terms.
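The practice play above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Lucene's implementation: the four team names are assumptions chosen to reproduce the mapping described in the talk (the slide itself is not in the transcript), and `analyzeStandard` and `buildInvertedIndex` are hypothetical helper names.

```typescript
// Hypothetical sample documents; only the token-to-document mapping
// (manchester -> 1, 2 and united -> 1, 3) comes from the talk.
type TeamDoc = { _id: number; team: string };

const teamDocs: TeamDoc[] = [
  { _id: 1, team: "Manchester United" },
  { _id: 2, team: "Manchester City" },
  { _id: 3, team: "West Ham United" },
  { _id: 4, team: "Real Madrid" },
];

// Simplified stand-in for a standard analyzer:
// lowercase, strip punctuation, split on whitespace.
function analyzeStandard(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Maps each token to the ids of the documents that contain it.
function buildInvertedIndex(documents: TeamDoc[]): Record<string, number[]> {
  // A prototype-free object, so tokens such as "constructor" are safe keys.
  const index: Record<string, number[]> = Object.create(null);
  for (const doc of documents) {
    for (const token of analyzeStandard(doc.team)) {
      const postings = index[token] ?? [];
      if (postings.indexOf(doc._id) === -1) {
        postings.push(doc._id);
      }
      index[token] = postings;
    }
  }
  return index;
}

const invertedIndex = buildInvertedIndex(teamDocs);
console.log(invertedIndex["manchester"]); // [ 1, 2 ]
console.log(invertedIndex["united"]); // [ 1, 3 ]
```

A real inverted index also stores the metadata mentioned above, such as term frequency and position, which this sketch leaves out.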

Now I'm going to show you what I mean in this example. This is an app that I wrote called Atlas Search Soccer. It uses Atlas Search. In it, I used the FIFA Player Database, so you get lots of different search options to find your FIFA dream team, and you can put your own players on there. It'll also show you the code for the queries. Now I know it's called football everywhere else in the entire world except the United States, but I already bought the domain name, so we're just going to stick with that. In this one I am looking for players from Manchester United. I'm using the standard analyzer. As you remember, that splits Manchester United into two tokens. So I'm going to get 697 players when I look for Manchester United, because it's giving me Manchester United and West Ham United and Manchester City and anything else with either Manchester or United. If, however, I change to the keyword analyzer, as I'm doing here, I find 33 matching players, and they are all indeed from Manchester United. That's because the keyword analyzer takes everything, keeps the punctuation, keeps the capital letters and all the casing, and gives me that one token. So keyword analyzers are fantastic if you're using check boxes.
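To see why the two analyzers return such different result counts, here is a small sketch under simplifying assumptions. The player list is invented for illustration (the app uses the full FIFA data set), and `tokensStandard`, `tokensKeyword`, and `searchByClub` are hypothetical names, not Atlas Search or Lucene APIs.

```typescript
type Player = { name: string; club: string };

// Hypothetical players, one per club, for illustration only.
const samplePlayers: Player[] = [
  { name: "Player A", club: "Manchester United" },
  { name: "Player B", club: "Manchester City" },
  { name: "Player C", club: "West Ham United" },
  { name: "Player D", club: "Chelsea" },
];

type Analyzer = (text: string) => string[];

// Standard-style analysis: lowercase, drop punctuation, one token per word.
const tokensStandard: Analyzer = (text) =>
  text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .split(/\s+/)
    .filter((token) => token.length > 0);

// Keyword-style analysis: the whole input, unchanged, is the single token.
const tokensKeyword: Analyzer = (text) => [text];

// A player matches when the club shares at least one token with the query.
function searchByClub(query: string, analyze: Analyzer): Player[] {
  const queryTokens = analyze(query);
  return samplePlayers.filter((player) => {
    const clubTokens = analyze(player.club);
    return queryTokens.some((token) => clubTokens.indexOf(token) !== -1);
  });
}

const standardHits = searchByClub("Manchester United", tokensStandard);
const keywordHits = searchByClub("Manchester United", tokensKeyword);

console.log(standardHits.map((p) => p.club));
// Manchester United, Manchester City, West Ham United
console.log(keywordHits.map((p) => p.club));
// Manchester United only
```

Because the keyword-style analyzer preserves casing and punctuation, the query must match the stored value exactly, which is why it suits fixed choices such as check boxes.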

So your tokens matter, which means your analyzers matter. The next thing you need to consider is your query operators, whether it's regex, phrase, or text, and whether you're using facets or autocomplete. This is a way to let your users take their best shot. Every user is different. Every user has a different preference for how they're going to go through things, so in your application you want to give them as many options as possible. And of course I can't talk about giving your users their best shot at finding your data without talking about scoring. Relevance scoring is so important in search. All search engines are going to grade all your documents based on how well they match the search query, and that is called relevance. The engine returns your documents to you with the score in descending order. So it's going to give you what it thinks the best matches are, the most relevant matches first.
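As a rough illustration of how these operators differ, the sketch below implements simplified stand-ins for text, phrase, autocomplete, and regex matching over a list of club names. The function names and the sample list are assumptions made for this example; real search engines implement these operators against the index, with far more options.

```typescript
// Hypothetical club names for illustration.
const clubNames: string[] = [
  "Manchester United",
  "Manchester City",
  "West Ham United",
  "Newcastle United",
];

function toTokens(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// text: any query token appears in the value.
function matchesText(query: string, value: string): boolean {
  const valueTokens = toTokens(value);
  return toTokens(query).some((t) => valueTokens.indexOf(t) !== -1);
}

// phrase: all query tokens appear consecutively and in order.
function matchesPhrase(query: string, value: string): boolean {
  const q = toTokens(query);
  const v = toTokens(value);
  if (q.length === 0) return false;
  for (let start = 0; start + q.length <= v.length; start++) {
    let allEqual = true;
    for (let i = 0; i < q.length; i++) {
      if (v[start + i] !== q[i]) {
        allEqual = false;
        break;
      }
    }
    if (allEqual) return true;
  }
  return false;
}

// autocomplete: some token in the value begins with the typed prefix.
function matchesAutocomplete(prefix: string, value: string): boolean {
  const p = prefix.toLowerCase();
  return toTokens(value).some((t) => t.slice(0, p.length) === p);
}

// regex: the raw value matches the pattern.
function matchesRegex(pattern: RegExp, value: string): boolean {
  return pattern.test(value);
}

const textHits = clubNames.filter((c) => matchesText("manchester united", c));
const phraseHits = clubNames.filter((c) => matchesPhrase("manchester united", c));
const autocompleteHits = clubNames.filter((c) => matchesAutocomplete("man", c));
const regexHits = clubNames.filter((c) => matchesRegex(/^West/, c));

console.log(textHits.length, phraseHits, autocompleteHits, regexHits);
```

The same query string gives four clubs with text, one with phrase, which is the kind of choice that lets each user search the way they prefer.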

In this example, for instance, I'm looking for Cristiano Ronaldo for my FIFA dream team. I just search for Ronaldo and I get this lovely gentleman first, but that's not the Ronaldo I want, because it's ranking on relevance alone. I want the overall FIFA score to count heavily as well. So in this query, I'm going to take the overall FIFA score, which is a field in every one of my documents, and factor it into my relevance score, and now I get Cristiano Ronaldo first. He's very difficult personality-wise, but he's very good and I want him on my dream team. So scoring matters, and custom scoring matters, because you want everything right. Think about your data and think about your user interface. Think about your tokens and pick your analyzer accordingly. That goes into your index and into your queries, and all of that will be served up to your users so they have their best shot at finding your data before they find it at your competitors.
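The effect of factoring a field into the relevance score can be sketched as plain arithmetic. The query used in the app is not in the transcript, so this models only the idea: the relevance values and overall ratings below are invented for illustration, multiplication is one assumed way of combining them, and `rankByRelevance` and `rankByCustomScore` are hypothetical names.

```typescript
type Hit = { name: string; relevance: number; overall: number };

// Invented numbers for illustration only: the first hit matches the query
// text slightly better, the second has the higher overall rating.
const ronaldoHits: Hit[] = [
  { name: "Other Ronaldo (hypothetical)", relevance: 2.0, overall: 64 },
  { name: "Cristiano Ronaldo", relevance: 1.6, overall: 91 },
];

// Default behavior: highest relevance first.
function rankByRelevance(hits: Hit[]): Hit[] {
  return hits.slice().sort((a, b) => b.relevance - a.relevance);
}

// Custom scoring (assumed formula): relevance multiplied by the overall
// rating, so a strong player can outrank a slightly better text match.
function customScore(hit: Hit): number {
  return hit.relevance * hit.overall;
}

function rankByCustomScore(hits: Hit[]): Hit[] {
  return hits.slice().sort((a, b) => customScore(b) - customScore(a));
}

const byRelevance = rankByRelevance(ronaldoHits);
const byCustom = rankByCustomScore(ronaldoHits);

console.log(byRelevance[0].name); // Other Ronaldo (hypothetical)
console.log(byCustom[0].name); // Cristiano Ronaldo
```

How heavily the field should weigh against text relevance is a design decision; a multiplier is simple, but it lets a large field value dominate, so the combination is worth testing against real queries.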
