What's Inside Biome's Linter?

With Emanuele, lead maintainer of Biome, we will explore the internals of Biome's analyzer, which powers its linter. You'll learn how lint rules are made, what tools the analyzer provides, and how to use them.

This talk was presented at JSNation 2024.

FAQ

The Biome Analyzer is a versatile tool that goes beyond being just a linter or a CLI tool. It offers functionalities like formatting, analyzing, checking, import sorting, and even code transpiling.

Biome Analyzer is fast because it uses multi-threading to process multiple files simultaneously, uses channels for efficient communication among threads, and caches aggressively so that tokens and nodes can be reused when a file is reparsed.

Import sorting is one of the analyzer's assists: it sorts your imports automatically when you save your file, without emitting a diagnostic.
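
To make the idea concrete, here is a minimal Rust sketch (Rust being the language Biome is written in). It sorts each contiguous run of import lines alphabetically by module specifier and leaves every other line where it was. The helpers specifier, sort_imports, and flush are hypothetical, parsing is a toy line-based scan, and plain alphabetical order is an assumption; Biome's actual assist works on the syntax tree and has its own ordering rules.

```rust
/// Toy parser (hypothetical): returns the quoted module specifier of an import,
/// e.g. `import a from "alpha";` -> `alpha`. Handles both quote styles.
fn specifier(line: &str) -> &str {
    line.rsplit(|c: char| c == '"' || c == '\'')
        .nth(1)
        .unwrap_or(line)
}

/// Sorts the pending run of imports (stable sort) and moves it into `out`.
fn flush<'a>(run: &mut Vec<&'a str>, out: &mut Vec<&'a str>) {
    run.sort_by(|a, b| specifier(a).cmp(specifier(b)));
    out.append(run);
}

/// Sorts every contiguous run of `import` lines by specifier.
/// Non-import lines act as boundaries and keep their original position.
fn sort_imports(source: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    for line in source.lines() {
        if line.starts_with("import ") {
            run.push(line);
        } else {
            flush(&mut run, &mut out);
            out.push(line);
        }
    }
    flush(&mut run, &mut out); // imports at the very end of the file
    out.join("\n")
}

fn main() {
    let source = "import b from \"zeta\";\nimport a from \"alpha\";\n\nconst x = a + b;";
    println!("{}", sort_imports(source));
}
```

Treating any non-import line as a boundary is a deliberately conservative choice for the sketch: imports are never moved across other statements, whose evaluation order could matter.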

Yes, the Biome Analyzer is LSP (Language Server Protocol) ready. It can be used from any IDE or editor that supports LSP, enabling features like automatic sorting of JSON keys or JSX element attributes.

Yes, Biome Analyzer can also be used as a CLI tool. You can configure it to enforce refactors and make the CLI fail if certain conditions, like sorted JSON keys, are not met.

Biome Analyzer is designed to handle large codebases efficiently. It can process thousands of files quickly by leveraging multi-threading and caching, providing a great developer experience.

Multi-threading in Biome Analyzer means that each file is handled by its own worker thread. Each thread is responsible for parsing and analyzing its file, and then reports diagnostics back to the main thread using channels.

Channels are used for communication between the main thread and multiple worker threads. Each thread sends its diagnostics and other information through these channels to the main thread, which then compiles and reports the results.

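Reusable tokens are the tokens and nodes created during parsing: essentially pointers to blocks of memory whose references are saved into a caching object. When a file is reparsed, the unchanged ones are reused instead of being created again, saving memory and improving performance.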

The Biome Analyzer can emit new diagnostics and perform other tasks like import sorting that complement the parsing phase. It can handle tasks that are out of scope for the parser, such as semantic validation.

Emanuele Stoppa
10 min
17 Jun, 2024

Video Summary and Transcription
Today, we're going to talk about the Biome Analyzer, which is more than a linter or a CLI tool. It achieves high performance by taking advantage of multi-threading, channels for communication, and caching. The analyzer complements the parser and provides features like import sorting and emitting new diagnostics. It is LSP ready, can automatically sort JSON keys, and can be used as a CLI tool to enforce refactors. A video demonstration shows its performance on a large codebase.

1. Introduction to Biome Analyzer

Short description:

Today, we're going to talk about the Biome Analyzer. It's not just a linter or a CLI tool. Biome Analyzer is so fast because it takes advantage of multi-threading, uses channels for communication, and employs aggressive caching during parsing.

Hello, everyone. How's it going? So today, we're going to talk about the Biome Analyzer and what's behind it.

So, before going further, who am I? My name is Emanuele Stoppa. I'm Italian, and I live in Ireland. I like open source, games, and traveling. I'm so into open source that I'm involved in two projects, Astro and Biome.

Today, we're going to talk about the Biome Analyzer. So what's really interesting about it? Well, the Biome Analyzer is fast, and we're going to look at why it's that fast. It's not just a linter; it's much more, and a linter is only a small part of it. And it's not just a CLI tool either. So, let's do it.

So, why is Biome so fast? Among other things, there are three things I want to explain to you. It takes advantage of multi-threading: it spawns threads so that each file gets its own thread. It uses channels to keep the communication among the different threads. And it uses aggressive caching during the parsing phase.

Let's start with multi-threading. Every command that you run from the CLI and that crawls your file system, like formatting, analyzing, or checking, works the same way. Once Biome identifies the files that are eligible to be handled, it spawns a thread for them. Each thread is responsible for its own file: it parses it, analyzes it, and emits signals, for example a diagnostic, a code action, and more.
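
A minimal Rust sketch of this pattern, assuming a hypothetical analyze function that stands in for parsing plus linting and simply counts lines that declare a var. It sizes a worker pool with std::thread::available_parallelism, then lets each worker claim whole files from a shared atomic counter, so every file is parsed and analyzed by exactly one thread and the threads never coordinate beyond that. It illustrates the idea only; it is not Biome's actual scheduler.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for "parse + analyze": counts lines declaring a `var`.
fn analyze(source: &str) -> usize {
    source
        .lines()
        .filter(|line| line.trim_start().starts_with("var "))
        .count()
}

/// Analyzes every file on a pool of worker threads and returns
/// (file index, finding count) pairs in completion order.
fn analyze_all(files: &[&str]) -> Vec<(usize, usize)> {
    // "n threads, depending on the operating system": ask how many can run in parallel.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let counter = AtomicUsize::new(0); // index of the next file nobody has claimed yet
    let next = &counter;
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(move || {
                    let mut found = Vec::new();
                    // Each worker claims whole files until none are left, so every
                    // file is parsed and analyzed by exactly one thread.
                    loop {
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        let Some(source) = files.get(i) else { break };
                        found.push((i, analyze(source)));
                    }
                    found
                })
            })
            .collect();
        // Joining waits for every worker before the results are merged.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}

fn main() {
    let files = ["var a = 1;\nlet b = 2;", "var x;\nvar y;", "const z = 3;"];
    let mut results = analyze_all(&files);
    results.sort(); // completion order is nondeterministic; sort for display
    println!("{:?}", results);
}
```

Claiming files through a counter, rather than starting one OS thread per file, keeps the thread count bounded on very large codebases; that trade-off is an assumption of this sketch.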

2. Working of Biome Analyzer

Short description:

Biome Analyzer uses channels for communication among threads and collects diagnostics using multiple senders and one receiver. It also employs reusable tokens to minimize memory usage during reparsing.

Now, when all these threads are spawned, they are not aware of each other; each one just does its job. At the end, each thread has to report something, such as errors, if there are any, or other kinds of information. To do so, we use channels.

So, you have all these files, and for the files we have these threads; there are n threads, depending on the operating system. Then we have the main thread, which waits for all these threads and starts collecting information from them.

Using these channels, we have multiple senders, which are essentially the threads, and one receiver, which belongs to the main thread. Whenever there are diagnostics, we collect them: we track whether there are warnings or errors, and whether we skipped some diagnostics due to restrictions, options, and things like that. That's how the communication happens. Once all the threads have finished, the main thread can resume its work and report everything to your console.
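
The fan-in described above maps directly onto Rust's std::sync::mpsc, a multi-producer, single-consumer channel. In this sketch, which assumes a hypothetical Message type and two toy rules, each worker thread gets a cloned Sender and the main thread keeps the only Receiver. Iterating over the Receiver ends on its own once every Sender has been dropped, which is exactly the point where all workers have finished and the main thread can print its summary. For brevity it spawns one thread per file; it is not Biome's actual reporting code.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Warning,
    Error,
}

/// What a worker can report back (hypothetical message type for this sketch).
enum Message {
    Diagnostic { file: String, line: usize, severity: Severity },
    Skipped { file: String },
}

#[derive(Debug, Default, PartialEq)]
struct Summary {
    warnings: usize,
    errors: usize,
    skipped: usize,
}

fn check_files(files: Vec<(String, String)>) -> Summary {
    // One channel: every worker gets a cloned Sender, the main thread keeps the Receiver.
    let (tx, rx) = mpsc::channel::<Message>();
    for (file, source) in files {
        let tx = tx.clone();
        thread::spawn(move || {
            if source.is_empty() {
                // Stands in for a file skipped because of options or restrictions.
                tx.send(Message::Skipped { file }).unwrap();
                return;
            }
            for (i, text) in source.lines().enumerate() {
                // Toy rules: `debugger` is an error, a `var` declaration is a warning.
                let severity = if text.contains("debugger") {
                    Severity::Error
                } else if text.trim_start().starts_with("var ") {
                    Severity::Warning
                } else {
                    continue;
                };
                tx.send(Message::Diagnostic { file: file.clone(), line: i + 1, severity })
                    .unwrap();
            }
        }); // this thread's Sender is dropped when the thread finishes
    }
    drop(tx); // drop the main thread's own Sender, or the loop below would never end

    let mut summary = Summary::default();
    // Blocks waiting for messages; ends once every Sender is gone (all workers done).
    for message in rx {
        match message {
            Message::Diagnostic { file, line, severity } => {
                println!("{file}:{line} {severity:?}");
                match severity {
                    Severity::Warning => summary.warnings += 1,
                    Severity::Error => summary.errors += 1,
                }
            }
            Message::Skipped { file } => {
                println!("{file}: skipped");
                summary.skipped += 1;
            }
        }
    }
    summary
}

fn main() {
    let files = vec![
        ("a.js".to_string(), "var a = 1;\ndebugger;".to_string()),
        ("b.js".to_string(), "const b = 2;".to_string()),
        ("c.js".to_string(), String::new()),
    ];
    println!("{:?}", check_files(files));
}
```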

And then we have reusable tokens. So, what does that mean? When Biome parses your file, it creates tokens and nodes. These are essentially pointers to blocks of memory on your machine, and their references are saved into a caching object. Now suppose the same file is reparsed: say a code action occurred and changed a snippet from let to const. We reparse to make sure that no more rules are triggered. During that reparse, the nodes for msecret, the equals sign, and the string are reused. Instead of creating new nodes, we already have them: the cache holds a reference saying that msecret points to that block of memory, so we use it rather than create a new one. That's how we reuse the same things for each document, so memory-wise, there's very little waste.
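
A minimal sketch of the reuse idea, assuming a hypothetical TokenCache that interns tokens in a HashMap keyed by (kind, text) and hands out reference-counted Rc pointers. Relexing the let-to-const example from the talk allocates only the new const token; msecret, the equals sign, and the string come back as the very same allocations, which Rc::ptr_eq confirms. The string value is made up, the lexer is a toy, and Biome's real cache (which also covers nodes, not just tokens) is more involved.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Kind {
    Keyword,
    Ident,
    Punct,
    Str,
}

/// A token: an immutable block of memory that any number of trees can point to.
#[derive(Debug)]
struct Token {
    kind: Kind,
    text: String,
}

/// Hypothetical caching object: remembers every token it has handed out.
#[derive(Default)]
struct TokenCache {
    tokens: HashMap<(Kind, String), Rc<Token>>,
    created: usize, // how many tokens were actually allocated
}

impl TokenCache {
    /// Returns the cached token for (kind, text), creating it only on a cache miss.
    fn token(&mut self, kind: Kind, text: &str) -> Rc<Token> {
        let key = (kind, text.to_string());
        if let Some(existing) = self.tokens.get(&key) {
            return Rc::clone(existing); // reuse: just another pointer to the same memory
        }
        self.created += 1;
        let token = Rc::new(Token { kind, text: text.to_string() });
        self.tokens.insert(key, Rc::clone(&token));
        token
    }
}

/// Toy lexer: splits on whitespace and classifies each word.
fn lex(cache: &mut TokenCache, source: &str) -> Vec<Rc<Token>> {
    source
        .split_whitespace()
        .map(|word| {
            let kind = match word {
                "let" | "const" => Kind::Keyword,
                "=" | ";" => Kind::Punct,
                _ if word.starts_with('"') => Kind::Str,
                _ => Kind::Ident,
            };
            cache.token(kind, word)
        })
        .collect()
}

fn main() {
    let mut cache = TokenCache::default();
    let before = lex(&mut cache, "let msecret = \"value\"");
    // A code action turns `let` into `const`, so the snippet is lexed again.
    let after = lex(&mut cache, "const msecret = \"value\"");
    for (old, new) in before.iter().zip(&after) {
        println!(
            "{:?} {:?} -> {:?} {:?}, reused: {}",
            old.kind, old.text, new.kind, new.text, Rc::ptr_eq(old, new)
        );
    }
    println!("tokens allocated: {}", cache.created);
}
```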
