Video Summary and Transcription
This talk discusses how the engineering team at Expedia improved the performance of customer flight search using various metrics and techniques. These include prefetching resources during browser idle time, preemptive search to predict responses, and optimizing performance through micro queries and a micro front-end architecture. The team also enforced build and package size limits for better code analysis. Performance monitoring and automation were implemented for ongoing performance improvements.
1. Improving Flight Search Performance
Hi everyone. Ina from the engineering team at Expedia will discuss how we improved the performance of customer flight search. The motivation behind this improvement is the impact of latency on user experience and attention. We use lighthouse metrics such as FCP, first input delay, cumulative layout shift, and time to interactive. Page usable time and non-supply overhead are two derived perf metrics we monitor. Prefetching resources during browser idle time allows faster retrieval, especially for new users. The next experiment is preemptive search.
Hi everyone. I'm Ina, from the engineering team at Expedia. This talk is about search speed: how we made flight search faster and dramatically improved the performance of customer flight search on Expedia.
Before going deeper into the topic, let me first share the motivation, what brought us to improve the performance of the flight search page. On the flight search page, search traffic is at its peak. If the page is not performant, latency increases, the user experience suffers, and as a result user attention is impacted too.
Before getting into the perf experiments, let me first cover performance metrics. For measurement, there is a common set of Lighthouse metrics that can be monitored. A few important ones for our pages are First Contentful Paint, commonly known as FCP, then First Input Delay, Cumulative Layout Shift, and Time to Interactive. Beyond those, we also make use of some derived perf metrics. Two of these that helped us monitor performance for Expedia users are page usable time and non-supply overhead. Page usable time is the metric that is marked when the main component of the flight search page gets mounted. Non-supply overhead is the overall page usable time on the flight search page minus the supply overhead, that is, the total amount of time taken by Expedia to reach the flight search component without depending on the supply. Other than that, we have also put a size limit on the flight search page to make sure that neither the bundle nor the packages within flight search exceed their thresholds.
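A bundle threshold like the one described can be enforced in CI with a tool such as the open-source size-limit package. The sketch below is illustrative only; the paths, package names, and limits are assumptions, not Expedia's actual configuration.

```javascript
// .size-limit.js — illustrative thresholds, not Expedia's real config.
// Each entry fails the CI check if the built artifact exceeds its limit.
module.exports = [
  {
    name: 'flight-search page bundle',
    path: 'dist/flight-search/*.js',
    limit: '250 KB', // measured gzipped by default
  },
  {
    name: 'fare-details package',
    path: 'packages/fare-details/dist/index.js',
    limit: '40 KB',
  },
];
```

Running the check on every commit is what turns a one-off cleanup into a guardrail: a regression that pushes a package past its threshold fails the build instead of quietly shipping.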
Now let's come to the first perf experiment, which is prefetching. Prefetching means we fetch resources beforehand, during the browser's idle time. When we land on the next page, we are not fetching the resources from the CDN path but from the prefetch cache, which helps us retrieve them faster. Before prefetching, it's important to plan which resources to prefetch: not every resource needs to be prefetched on the previous page, only the important ones, ideally those commonly used across multiple pages, so that retrieval is faster. Prefetching is most impactful for new users, the ones who are not yet serving resources from the browser cache. For existing users, the resources already come from the browser cache, so prefetching might not be impactful there, and the same goes if you are opening in incognito. But for a new user, the impact is significant. The next experiment we have is preemptive search.
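A minimal sketch of the idea, under assumptions: the helper names, URLs, and page lists below are illustrative, not Expedia's code. `sharedAssets` picks the resources used on more than one page (the best prefetch candidates, per the talk), and `prefetchWhenIdle` schedules `<link rel="prefetch">` injection during browser idle time.

```javascript
// Pick resources that appear on more than one page — the prefetch
// candidates the talk recommends. Assets lists here are hypothetical.
function sharedAssets(assetsByPage) {
  const counts = new Map();
  for (const assets of Object.values(assetsByPage)) {
    for (const url of new Set(assets)) {
      counts.set(url, (counts.get(url) || 0) + 1);
    }
  }
  return [...counts].filter(([, n]) => n > 1).map(([url]) => url);
}

// In the browser, inject <link rel="prefetch"> tags during idle time,
// falling back to setTimeout where requestIdleCallback is unavailable.
// Outside the browser this is a no-op.
function prefetchWhenIdle(urls) {
  if (typeof document === 'undefined') return;
  const schedule = window.requestIdleCallback || ((cb) => setTimeout(cb, 0));
  schedule(() => {
    for (const href of urls) {
      const link = document.createElement('link');
      link.rel = 'prefetch';
      link.href = href;
      document.head.appendChild(link);
    }
  });
}

// Example: only the bundle shared by both pages is worth prefetching.
const candidates = sharedAssets({
  home: ['/cdn/app.js', '/cdn/home.js'],
  flightSearch: ['/cdn/app.js', '/cdn/search.js'],
});
prefetchWhenIdle(candidates);
```

When the user then navigates to flight search, the shared bundle is served from the prefetch cache rather than re-fetched from the CDN.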
2. Optimizing Performance and Architecture
Preemptive search predicts the response before the user lands on the flight search page, improving performance by 50% on web and native. Micro queries fetch responses in chunks, improving page performance by 20%. Async queries and improved waterfall diagram result in an 8% performance improvement. Micro front-end architecture breaks down page-level components into shareable packages, optimizing performance and ensuring maintainability.
By preemptive search, we mean that we call the search query preemptively: the response is fetched even before the user lands on the flight search page. This is done by knowing what search inputs we have on the previous page, that is, the home page for flight search. As soon as the user triggers the search button, we know which search response the user is going to ask for, so we cache the response beforehand, and when the user lands on the flight search page, the cached response is served. This was a very important experiment in terms of perf, and it helped us improve performance by nearly 50% on both web and native.
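The flow above can be sketched as an in-flight cache keyed by the search inputs. This is an assumption-laden illustration: `searchFlights`, the parameter names, and the cache shape are hypothetical, not Expedia's actual API.

```javascript
// Cache of in-flight search promises, keyed by the search inputs.
const pending = new Map();

function cacheKey(params) {
  // Stable key from the search inputs (origin, destination, date, ...),
  // using sorted keys so property order doesn't matter.
  return JSON.stringify(params, Object.keys(params).sort());
}

// Fired when the user hits "Search" on the home page: start the query
// immediately so it is already in flight while the results page loads.
function preemptiveSearch(params, searchFlights) {
  const key = cacheKey(params);
  if (!pending.has(key)) pending.set(key, searchFlights(params));
  return pending.get(key);
}

// Called by the flight search page on mount: reuse the preempted
// promise if one exists, otherwise fall back to a normal query.
function getSearchResults(params, searchFlights) {
  const key = cacheKey(params);
  return pending.get(key) || searchFlights(params);
}
```

The key point is that the page reuses the same promise, so the network round trip overlaps with the page transition instead of starting after it.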
The next perf experiment we have is micro queries. Initially, the flight search page had one main page-level query that returned all the responses at once. Once that bulky page query was broken down into micro queries, we could fetch the responses in chunks rather than loading everything at once. This way the user sees the important page-level information first, and information not needed during page load is fetched afterwards. We were also able to separate out information such as the fare details, which the user does not need immediately. With that, we improved page performance by nearly 20%.
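The split can be sketched as a critical chunk awaited first and deferred chunks fetched after the initial render. The query and slot names below are illustrative assumptions, not Expedia's real schema.

```javascript
// Sketch of one bulky page query broken into micro queries, assuming
// a generic fetchQuery(name) -> Promise and a render(slot, data) hook.
async function loadFlightSearchPage(fetchQuery, render) {
  // Critical chunk: the flight listings the user needs to see first.
  const listings = await fetchQuery('FlightListingsQuery');
  render('listings', listings);

  // Deferred chunks: not needed at page-load time (e.g. fare details),
  // fetched in parallel after the first render.
  const [fareDetails, offers] = await Promise.all([
    fetchQuery('FareDetailsQuery'),
    fetchQuery('OfferDetailsQuery'),
  ]);
  render('fareDetails', fareDetails);
  render('offers', offers);
}
```

Because only the listings query blocks the first paint, the page becomes usable before the secondary data arrives.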
Another important thing for performance is making sure queries run in an async manner. The first step for any page we analyze is to look at its network waterfall diagram and verify the network calls are happening as expected. It's important that calls do not wait for each other unless they actually depend on each other. In our case, the loading and loaded queries are independent of each other, so we made sure these calls are triggered at the same time rather than waiting for one another. We observed nearly an 8% improvement by fixing how the queries execute and improving the waterfall diagram for the page.
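The waterfall fix boils down to firing independent requests together instead of one after the other. A minimal sketch, with illustrative query names rather than Expedia's actual ones:

```javascript
// Before: the second query waits for the first, even though they are
// independent — a staircase in the waterfall diagram.
async function sequential(fetchQuery) {
  const loading = await fetchQuery('LoadingQuery'); // blocks the next call
  const loaded = await fetchQuery('LoadedQuery');
  return { loading, loaded };
}

// After: both requests start immediately, so the total time is roughly
// the max of the two latencies rather than their sum.
async function parallel(fetchQuery) {
  const [loading, loaded] = await Promise.all([
    fetchQuery('LoadingQuery'),
    fetchQuery('LoadedQuery'),
  ]);
  return { loading, loaded };
}
```

In a waterfall diagram, the parallel version shows the two bars starting at the same timestamp instead of stacked end to end.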
The next thing is ensuring a micro front-end architecture is followed on the page. Whether this helps depends on the page and its requirements; for us, the micro front-end architecture has been useful so far. It meant breaking the page-level components, such as offer details and fare details, into shareable packages, and we were able to make those shareable packages flexible. The packages are also reusable across pages: for example, the flight search and flight information pages are correlated, so we could reuse the same shareable packages between them. We can also optimize at the package level more efficiently than at the page level. Another thing to note is that the packages are maintainable: we were able to define who maintains which package, which created a clear sense of ownership.
3. Improving Build Size and Monitoring
We improved build size and package size limit, enabling better code analysis and performance. The perf experiment had a significant impact on page usable time and non-supply overhead. Performance is a continuous process, with monitoring and automation to detect and alert for perf degradation. Questions and feedback can be directed to my email or LinkedIn profile. Check the references for more detailed perf experiments and performance scoring information.
We were also able to improve the build size by tightening the package size limits overall. This way we did not just refactor the code; it also helped performance, because we could analyze the components better at the package level than we could have at the page level.
Another aspect is that we were able to improve the perf metrics considerably by running these perf experiments. As you can see on the slide, there is a visible impact at the 90th percentile: page usable time improved by 52%, and non-supply overhead improved as well.
I would like to conclude the talk by saying that performance is a continuous process. Once we were done with the perf experiments, we also made sure monitoring is in place. There is an automation process so that if the perf threshold exceeds a specific limit, or if we find a faulty commit version causing perf degradation, we are alerted beforehand.
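The automated check described above can be sketched as a simple threshold comparison per build. The metric names and limits below are hypothetical examples, not Expedia's real thresholds or tooling.

```javascript
// Compare per-build p90 metrics (in ms) against configured limits and
// return a list of alert messages; an empty list means the build passes.
function checkPerfThresholds(metrics, thresholds) {
  const alerts = [];
  for (const [name, limitMs] of Object.entries(thresholds)) {
    if (metrics[name] !== undefined && metrics[name] > limitMs) {
      alerts.push(`${name} p90 is ${metrics[name]}ms, exceeds ${limitMs}ms`);
    }
  }
  return alerts; // non-empty -> page the team / flag the commit
}

// Example run: page usable time regressed past its limit, so one alert
// fires; non-supply overhead is still within budget.
const alerts = checkPerfThresholds(
  { pageUsableTime: 3400, nonSupplyOverhead: 900 },
  { pageUsableTime: 3000, nonSupplyOverhead: 1000 },
);
```

Hooking such a check to per-commit metric collection is what lets a faulty commit be flagged before the regression reaches users.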
Now I'll leave it to the audience for questions and answers. Please feel free to reach out over my email or my LinkedIn profile. I'll also be at the conference, so if anyone has questions, please feel free to network with me or reach out to me.
There is also a set of references that I have added. The first is a Medium blog with a detailed description of the perf experiments I just walked through. Then there is a very useful link I found on prefetching, and one on performance scoring, that is, how performance is scored. Please feel free to go through the references as well for a better understanding.
Thanks a lot for your time.