Parse, Don’t Validate

The talk covers various aspects of ensuring data correctness in JavaScript projects. It starts by addressing common challenges such as security, reliability, and data validation. JSON Schema and JSON Type Definition (JTD) are discussed, with JTD being recommended for most API use cases due to its simplicity and type alignment. Fastify is introduced as a library that improves performance by efficiently handling serialization. The AJV library is also mentioned for its role in data validation with JSON Schema. The talk highlights the problems with using JSON, including its wastefulness and security issues like call stack exhaustion and DDoS attacks. For better performance and reliability, the speakers suggest parsing data directly to application types using JTD. They provide a practical example with a Mario Kart character schema, demonstrating how JTD simplifies data validation and parsing.

From Author:

Most JavaScript applications use JSON.parse to create any object first, and then validate and narrow the data type to the expected one. This approach has performance and security problems, as even if the data is invalid, the whole JSON string needs to be parsed first before the data is validated, instead of failing at JSON parsing stage (e.g., if number is passed instead of string in some property).

Many languages support parsing of JSON strings directly into the expected types, but it is not natively supported in JavaScript or TypeScript.

In this talk we will show how the power of TypeScript, combined with the new JSON Type Definition specification (RFC 8927) and the Ajv library, can be used to parse your data directly into the expected application-defined type faster than JSON.parse, and also how to serialize data of a known type approximately 10 times faster than JSON.stringify.
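For contrast, the conventional flow the abstract criticizes (JSON.parse first, then validate and narrow) might look like the sketch below. The `Character` type and both function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical application type, used only for illustration.
type Character = {
  name: string;
  speed: number;
};

// Type guard: validates an already-parsed value and narrows it to Character.
function isCharacter(value: unknown): value is Character {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["name"] === "string" && typeof record["speed"] === "number";
}

// Conventional flow: parse everything into an untyped value, then validate.
function parseThenValidate(json: string): Character | undefined {
  let raw: unknown;
  try {
    raw = JSON.parse(json); // the whole string is parsed, whether or not it fits the type
  } catch {
    return undefined; // malformed JSON
  }
  return isCharacter(raw) ? raw : undefined;
}
```

Note that a payload such as `{"name": 42, ...}` is only rejected after the entire string has been parsed, however large the rest of it is. That is the cost the talk proposes to avoid.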

 

This talk was presented at Node Congress 2023.

FAQ

Simplex Chat is a unique messaging platform founded by Evgeniy, which is distinctive for not having user identifiers.

The primary challenges include security, reliability, and data validation, which are consistent issues across various projects.

JSON Schema is a way to define the format and structure of data, widely adopted since 2009. JTD (JSON Type Definition) is a simpler alternative started in 2020, supporting discriminated unions but with some limitations compared to JSON Schema.

JSON can be wasteful as it requires parsing the entire data piece before validation, it's time-consuming due to its complex, nested nature, and it has security issues like potential for call stack exhaustion and DDoS attacks.

Fastify enhances performance by efficiently handling serialization of responses, understanding the structure of the data it returns, which minimizes loops and accelerates property access.

AJV is a library used for data validation with JSON Schema. It has grown significantly, from nothing when it started in 2015 to 350 million downloads monthly now. It helps manage data correctness and security in JavaScript applications.

For API development, JTD (JSON Type Definition) is recommended for most use cases because of its simplicity and alignment with data types, providing clearer and more efficient data handling.

Type alignment ensures that data parsing and serialization are directly linked to specific data types, which improves performance, security, and reliability by ensuring data is correctly typed and structured.
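To make the type-alignment point concrete, here is a hedged sketch of one application type described in both formats. The `Racer` type and both schemas are hypothetical examples written for this comparison, not the schema shown in the talk.

```typescript
// Hypothetical application type: a discriminated union on "kind".
type Racer =
  | { kind: "driver"; name: string; speed: number }
  | { kind: "item"; name: string; uses: number };

// JTD (RFC 8927): the "discriminator" form maps each tag value to a
// "properties" form, which mirrors the union above almost one to one.
const racerJtd = {
  discriminator: "kind",
  mapping: {
    driver: { properties: { name: { type: "string" }, speed: { type: "float64" } } },
    item: { properties: { name: { type: "string" }, uses: { type: "uint32" } } },
  },
} as const;

// One way to express the same shape in JSON Schema: "oneOf" plus a "const" tag,
// with required fields and additional properties spelled out per branch.
const racerJsonSchema = {
  oneOf: [
    {
      type: "object",
      properties: { kind: { const: "driver" }, name: { type: "string" }, speed: { type: "number" } },
      required: ["kind", "name", "speed"],
      additionalProperties: false,
    },
    {
      type: "object",
      properties: { kind: { const: "item" }, name: { type: "string" }, uses: { type: "integer", minimum: 0 } },
      required: ["kind", "name", "uses"],
      additionalProperties: false,
    },
  ],
} as const;

// A value of the application type that both schemas are meant to accept.
const sampleRacer: Racer = { kind: "driver", name: "Mario", speed: 4.5 };
```

In JTD, listed properties are required and unlisted ones are rejected by default, which is part of why the schema stays close to the TypeScript type.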

Evgeny Poberezkin
Jason Green
26 min
17 Apr, 2023


Video Transcription


1. Introduction to JavaScript and Data Correctness

Short description:

Hello. We're going to talk today about JavaScript and how to ensure data correctness. We've worked together on various projects, including MailOnline and Threads. I'll hand over to Jason to introduce himself. Jason is the director of technology at Threads Styling and has extensive experience with data validation using JSON Schema. We've encountered common problems in our projects, such as security, reliability, and data validation. We'll discuss an alternative approach to validation that involves parsing and carrying the proof of validity. JSON, while flexible, has its challenges and can be wasteful.

Hello. I'm Evgeniy and this is Jason. We're going to talk today about JavaScript and how to ensure data correctness, but before that we'll give a brief introduction.

So, we've done lots of great things together. We worked together at MailOnline and at Threads, and when I built the Ajv library, Jason also helped a lot. Currently, I've founded Simplex Chat, a messaging platform that is the only one of its kind in that it doesn't have user identifiers, but this is not what the talk is about.

So, I'll hand over to Jason to introduce himself.

Thanks, Evgeniy. Obviously, you know by now I'm Jason Green. I'm director of technology at Threads Styling. Threads is a fashion tech company pioneering the world of personalized luxury shopping through chat and social media. I also previously worked with Evgeniy as a principal engineer at MailOnline. I've been a long-time user of data validation with JSON Schema, and in particular of AJV, which I've witnessed grow and mature so much over the years. I'm an early investor in Simplex Chat as well.

Yeah, AJV's growth has indeed been crazy. It went from really nothing in 2015, when it started, to 350 million downloads every month now, with probably every JavaScript application using it.

So why do we want to talk about this? In all these projects we've done some really cool things. We built a content creator at MailOnline when we were already quite mature, and nevertheless we built a very complex in-browser application, with hundreds of thousands of lines of JavaScript code, that allowed editing. The whole MailOnline website is managed by it. Then, when I was leading engineering at Threads and Jason also joined Threads, we built StoryMaker. Mostly Jason built it; I'm just basking in the glory. It was a content management system for Instagram, for which we definitely learned a lot from the previous project. And in all of those projects we have invariably been hitting the same problems of security, reliability, and data validation. Whatever project we do, the problems are invariably the same.

I've done a lot of Haskell, and this "parse, don't validate" maxim belongs to Alexis King, one of the best and most genius Haskell engineers out there, who proposes parsing as an alternative to validation. Rather than just checking that your data is correct, you obtain a proof of validity, as if the data were of a different type, and carry that proof around and use it across your application.

So I'm going to hand over to Jason. With JavaScript, what we'll learn is that you should really not be using native JSON in JavaScript. You should be doing some other things. Jason, over to you.
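The idea of carrying a proof of validity as a type can be sketched in TypeScript with a non-empty list. This is a minimal illustration, not code from the talk; `NonEmptyArray`, `parseNonEmpty`, and `head` are hypothetical names.

```typescript
// The "proof": a type that can only describe arrays with at least one element.
type NonEmptyArray<T> = [T, ...T[]];

// Parsing checks once and returns the refined type, or undefined on failure.
function parseNonEmpty<T>(items: T[]): NonEmptyArray<T> | undefined {
  return items.length > 0 ? (items as NonEmptyArray<T>) : undefined;
}

// Downstream code asks for the proof in its signature, so it never re-checks.
function head<T>(list: NonEmptyArray<T>): T {
  return list[0];
}

const speeds = parseNonEmpty([3, 1, 2]);
const fastestFirst = speeds ? head(speeds) : undefined;
// head([]) would not compile: an empty array is not a NonEmptyArray.
```

A validator would return a boolean and leave the value typed as a plain array, so every caller would have to trust that the check had happened. Here the type records that it did.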

So as we all know, JSON is a widely used format that's generally considered to be flexible and easy to work with. However, it's important to be aware of some of the potential problems and challenges that it has. JSON is particularly wasteful.

2. Challenges with JSON and Importance of Performance

Short description:

Parsing JSON can be wasteful as you need to parse the entire data before checking its validity. JSON has security issues and can exhaust the call stack with deep structures or be used in DDoS attacks. Performance and reliability are important depending on the situation, especially when they affect user experience and satisfaction. Fastify is a library that tackles serialization by defining inputs and outputs in JSON Schema, increasing speed and improving data structure handling.

Now, it's not something you're going to notice in your day-to-day debugging when you're working with it, but parsing JSON can be a very wasteful process, as you need to parse the entire piece of data before you can understand, or even begin to check, whether it's valid or not. Because of the potentially complex and nested nature of JSON, it can be particularly time-consuming to then go on and validate.

Many of us who started in JavaScript have come to love working with TypeScript, but then you go and throw a large blob of unstructured JSON into the mix, and suddenly you're back to square one. None of your types matter, and everything is unknown again.

It also has some security issues. These are issues that I actually wasn't very aware of, despite working with it for a long time, until looking into it. If you have very, very deep structures, they can actually exhaust your call stack. This can be just because of the data itself, or it can be a deliberate attack with very deeply nested structures being sent to your APIs. You can also suffer from very large blobs of data being sent to your APIs in the form of a DDoS attack. Once again, before you can even understand whether it's valid or not, your API will have to dutifully parse those blobs, which once again is very wasteful. It is even possible to do prototype pollution attacks via JSON as well.
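The prototype pollution risk mentioned above can be illustrated with a small sketch. It assumes a hand-written recursive merge, a common pattern but a hypothetical one here; `naiveMerge` and `demonstratePollution` are illustrative names, and the demonstration cleans up after itself.

```typescript
type Dict = Record<string, unknown>;

function isDict(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// A naive recursive merge of the kind often hand-written for settings objects.
function naiveMerge(target: Dict, source: Dict): void {
  for (const key of Object.keys(source)) {
    const incoming = source[key];
    const existing = target[key];
    if (isDict(incoming) && isDict(existing)) {
      // For the key "__proto__", `existing` is the target's prototype.
      naiveMerge(existing, incoming);
    } else {
      target[key] = incoming;
    }
  }
}

function demonstratePollution(): boolean {
  // JSON.parse stores "__proto__" as an ordinary own property of the result.
  const payload = JSON.parse('{"__proto__": {"isAdmin": true}}') as Dict;
  const settings: Dict = {};
  try {
    naiveMerge(settings, payload);
    const unrelated: Dict = {};
    // True if the unrelated object now inherits the attacker's property.
    return unrelated["isAdmin"] === true;
  } finally {
    // Undo the damage so the demonstration leaves no trace.
    delete (Object.prototype as unknown as Dict)["isAdmin"];
  }
}
```

The parse step itself is harmless. The damage happens when untyped parsed data flows into code that treats every key as trusted, which is one reason to parse into a known type that has no place for such a key.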

So before we concern ourselves with performance and reliability, it's important to think about when performance and reliability are actually important. It does seem like an obvious statement. Most people wouldn't go out of their way to argue that they're not important, but they're not going to be important in every situation. It really depends on various factors. Obviously, a slow app is better than no app at all. So if you have an application that's delivering value, you may have much bigger issues to face before worrying about performance and reliability. Particularly in the early stages of app development, you're going to be much more concerned with time to market. If your app isn't even available yet, that's obviously a big issue. You're going to be concerned about budget, the overall user experience of your application, and of course, what your users' needs are and what's most pressing to them.

However, it is going to be an issue when performance is affecting user experience and satisfaction. That can risk you losing users: people who leave your application or site because it didn't load fast enough may not come back, which is what we refer to as a high bounce rate.

Even worse, if reliability is your issue and your customers are losing their work or their data is becoming corrupted, that's a big issue that, in the best case, results in some apologies. In the worst case, you may actually end up having to pay for it in some way, through compensation or discounts to keep people happy.

There is actually a solution to part of this problem, which is tackled by a library called Fastify, a replacement for your Express router. It tackles the serialization part of the problem: by defining the inputs and outputs and their shape in JSON Schema, the library is able to serialize responses more quickly, and it can get quite a good increase in speed because it knows the structure of the data it's supposed to be returning. In this way, it can take a lot of what would normally be loops and turn them into straight property access.

So, if we talk about schemas, for a long time JSON Schema was the only way to define the format of the data, or the type of the data, or whatever you call it. It started in 2009, and since 2020 there has been an alternative specification that was created to address its shortcomings.
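Returning to the Fastify point above, the idea that a known response shape turns loops into straight property access can be sketched as follows. This is an illustration of the principle only, not Fastify's implementation; the `Character` type and both function names are hypothetical.

```typescript
// Hypothetical response shape; in Fastify this knowledge would come from a JSON Schema.
type Character = {
  name: string;
  speed: number;
  unlocked: boolean;
};

// Generic approach: loop over whatever keys the object happens to have.
function serializeGeneric(value: Record<string, string | number | boolean>): string {
  const parts: string[] = [];
  for (const key of Object.keys(value)) {
    parts.push(JSON.stringify(key) + ":" + JSON.stringify(value[key]));
  }
  return "{" + parts.join(",") + "}";
}

// Schema-aware approach: the shape is known ahead of time, so there is no loop
// and no per-key type inspection, only direct property access.
function serializeCharacter(c: Character): string {
  return (
    '{"name":' + JSON.stringify(c.name) + // strings still need escaping
    ',"speed":' + c.speed + // assumes a finite number
    ',"unlocked":' + c.unlocked +
    "}"
  );
}

const mario: Character = { name: "Mario", speed: 4, unlocked: true };
const fastOutput = serializeCharacter(mario);
```

For this value both functions produce the same text as `JSON.stringify(mario)`. The schema-aware version simply does less work per call because the decisions were made when the function was written or generated.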
