In this talk, I'll take you through my journey as I joined the team supporting our smart TV application and share my experience learning one of the most overlooked but essential pieces of functionality we have.
Let’s Build a TV Spatial Navigation
FAQ
The topic of Sergio Avalos's talk is Spatial Navigation for smart TV applications.
Sergio Avalos is a software engineer at Spotify, working on the team behind the Spotify client that runs on smart TVs.
Spatial Navigation is a term used to describe the process of navigating a TV interface using the directional keys on a TV remote control.
Using IDs for navigational elements can be error-prone, difficult to work with dynamic views, and adds extra information unrelated to the application logic.
As of 2023, browser support for Spatial Navigation is still a work in progress. There is a proposal in draft, but it is not yet implemented.
Yes, there is an open-source Spatial Navigation project provided by Norigin Media, released in 2019, but it wasn't available when Spotify's smart TV application was initially developed.
Sergio Avalos suggests using a hook function that returns a callback for setting the reference of the HTML element and managing focus without relying on static IDs.
Some advanced challenges in Spatial Navigation include handling non-matrix layouts, managing focus on pop-ups, and implementing circular navigation for convenience.
Developers can use the library linked in Sergio Avalos's presentation to start building smart TV applications without developing Spatial Navigation logic from scratch.
A library for Spatial Navigation is needed because smart TVs have different operating systems, and using a web application for the user interface can lose native platform support, including Spatial Navigation.
1. Introduction to Spatial Navigation
Welcome to the talk on Spatial Navigation. We'll be discussing the challenges of implementing spatial navigation for TV controls and why a library is needed. The market for smart TVs has multiple brands with their own operating systems, making it necessary to have native applications for each. However, to simplify maintenance, we built a web application for the user interface. Unfortunately, this approach resulted in the loss of native platform support for spatial navigation. Although there is a proposal to provide this functionality in browsers, it is still a work in progress.
Welcome, everyone. Thank you very, very much for joining this talk.
My name is Sergio Avalos, and we're going to be talking about Spatial Navigation. But rather than just talking, we're going to be building it.
I'm a software engineer at Spotify, and about a year ago I joined the team behind the Spotify client that runs on your smart TV. That means that for this talk, we're not going to be talking about mobile, nor are we going to be talking about desktop. And most importantly, we're not going to be talking about the mouse. Instead, we're going to be talking about the TV control, that gadget that I bet all of you have in your living rooms.
Spatial navigation is nothing more than a fancy name for what you do with the TV control when you press the directional keys, the arrow keys, to select an application or just navigate to one of them. That got me very curious when I joined the team, because I was surprised that one had to create a library for that. So I decided to dig into the code and I was fascinated. Not because the code was amazing, I mean, the code was fine, but because I felt that it was a very interesting problem to solve. So that's what this talk is about. I want to share with you how I learned about this library, and what better way to learn than building it ourselves.
But in case you wonder, because that was my first impression: why do we need to build a library for spatial navigation? I mean, isn't that a utility that should be provided by the platform? And the answer is yes, totally, if you're building a native application. Let me try to explain.
The market for smart TVs is quite fragmented: there are many brands, and each of them runs its own operating system. That means that you need to have a native application running on each of them. But, to make our lives easier and reduce maintenance costs, we decided to build the user interface as a web application that can be loaded in each of the native apps. That gave us great interoperability, shipping the same code to all these native applications. But it came at the cost of losing the support from the native platform, which in this case is the spatial navigation.
Then, I was thinking, okay, okay. But, the year is 2023. Shouldn't that be provided by the browser? I mean, the browser, nowadays, is a very sophisticated piece of software. And, the answer is not yet. It's a work in progress. There is a proposal. It's still a draft for building this functionality, but it's not there yet.
2. Improving the Approach to Spatial Navigation
We need to continue waiting. Are there any open-source projects we could use? Norigin Media released one in 2019, but our application is older. Let's start building it. The simplest approach is to wrap each navigational element with an ID and tell it where to go. This approach has caveats: it is difficult with dynamic views, prone to mistakes, and adds extra information. Let's improve this approach by developing the extra logic to connect the TV control with our application.
We need to continue waiting. Then I was thinking: okay, but are there any open-source projects out there that we could use? And actually, there is one. Thank you very much, Norigin Media, for providing this. They released it in 2019, though, and our application is a little bit older than that, so we didn't have it back then.
Having answered that question, let's start building it. If I ask you off the top of your head, just from intuition, how would you do it? I don't know about you, but the simplest thing I could come up with, and I think I read it on a blog from Norigin Media, and even from Netflix, is this: you wrap each of what I call navigational elements, meaning the elements that the user can interact with, with an ID to identify them, and then you tell them where to go. Take for example the sidebar of the Spotify application. Each of these elements is just a link to the home view, the search, and so on. Like I explained before, you wrap them with an ID, and in that wrapper you tell them where to go. So if you are on search and you go up, then you tell it: go to this ID, which is home.
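To make the ID-based approach concrete, here is a minimal sketch of what such a hand-wired sidebar might look like. The names `StaticNode`, `sidebar`, and `moveFocus`, and the `library` entry, are hypothetical and only for illustration; they are not taken from the Spotify code or from any library.

```typescript
// Hypothetical sketch of the ID-based approach: every navigational element
// gets a hand-written ID plus the IDs of its neighbours.
type Direction = "up" | "down" | "left" | "right";

interface StaticNode {
  id: string;
  // Neighbour IDs, maintained manually by the developer.
  next: Partial<Record<Direction, string>>;
}

// Illustrative sidebar: home sits above search, search above library.
const sidebar: Record<string, StaticNode> = {
  home: { id: "home", next: { down: "search" } },
  search: { id: "search", next: { up: "home", down: "library" } },
  library: { id: "library", next: { up: "search" } },
};

// Returns the ID to focus next, or the current ID when no neighbour
// is declared for that direction (or the ID is unknown).
function moveFocus(currentId: string, direction: Direction): string {
  if (!Object.prototype.hasOwnProperty.call(sidebar, currentId)) {
    return currentId;
  }
  const target = sidebar[currentId].next[direction];
  return target === undefined ? currentId : target;
}
```

Every new element means another entry and more neighbour IDs to keep in sync by hand, which is where the problems with this approach come from.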
That approach actually gets the job done, but it obviously has a few caveats, as I imagine you can anticipate. One is that it is difficult to work with dynamic views. Think for example of recommendations: the developer doesn't know what they are going to get. Also, it's error-prone, because it is the developer's role to add these IDs manually, so mistakes can happen. We're humans. And finally, it adds extra information that is not related to the application. Like I said, this is a utility that should be invisible to the application layer. So let's improve this approach.
For this presentation, I built a very small application that has just two views. There is a welcome view. You click on one of the boxes, and you go to another view that renders a surprise. And then you have the go back link, which takes you back to the welcome view. It works perfectly well with the mouse, but it doesn't work with the TV control. So this is exactly what we're going to do. We're going to develop the extra logic that we need to connect the TV control with our very simple application.
3. Demo Application and Spatial Navigation Logic
In the demo application, we have the index page with the router configuration for the welcome page and the surprise page. Each view is a React component built from smaller components, such as the question box and the go back link. We register navigational nodes, listen to events from the TV control, and select the next element based on the direction. We create a NavigationEngine class to handle this logic and make it available to the app using a context provider. The API for setting the HTML element reference is straightforward: a hook that returns a ref callback and the focus state.
OK. Awesome. So I'm going to go very briefly through the source code of the demo app. We have the index page that you get from the Create React App script. Inside of it, we have the component for our application, which is just the router configuration for going to the welcome page and the surprise page. For this, I'm using the React Router DOM library.
Each of these views is just another React component. For example, for the welcome page, we have an array of 10 empty elements, and we only use it to render 10 instances of the question box component. For the surprise view, we have hard-coded the links of the images that will be displayed randomly, along with a go back link. Finally, there are the two components themselves, the question box and the go back link. Each of them just uses the Link component from React Router. The question box renders the question box image, and the go back link renders its children, which is the text that says "go back".
Okay. Let's jump into the logic of the spatial navigation. First, we register all the navigational nodes. Then we listen to the events coming from the TV control. From there, we select the element we should go to depending on the direction. And finally, we update the cursor, meaning which element should be focused next.
If I put everything on a diagram so steps 1 to 3 are crystal clear, you can see that each of the question boxes is going to be registered on a class called NavigationEngine with the method registerNode. We add an event listener for onKeyDown that will call the handleNavigation method of the class we just defined.
All right, step number one. Let's create this NavigationEngine class, which has a private variable called nodes, one method for adding nodes to this private variable, and another one for removing them. Then we go back to the index script, where we instantiate the NavigationEngine class. For the purpose of this talk, we make it available globally on the window, and we also make it available inside our app using a context provider. I hope you don't believe that we write directly to the window in production. That's only for this presentation.
Then we go back to the navigational element, because I wanted to show you first the API that I encountered. I felt it was super simple. It is just a hook function that returns a callback for setting the reference of the HTML element that you're rendering, along with whether the element is focused. As simple as that. You don't need to think about IDs.
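The NavigationEngine class from step one might look roughly like the sketch below. Only the class name, the private `nodes` variable, and `registerNode` come from the talk; `NavigationNode`, `unregisterNode`, `getNodeCount`, and the use of an array are assumptions made for illustration.

```typescript
// Assumed shape of a registered node: an ID plus a reference to the
// rendered element (typed loosely so the sketch runs outside a browser).
interface NavigationNode {
  id: string;
  element: object;
}

class NavigationEngine {
  // Private list of every navigational node currently on screen.
  private nodes: NavigationNode[] = [];

  // Adds a node; re-registering an existing ID replaces the old entry.
  registerNode(node: NavigationNode): void {
    this.unregisterNode(node.id);
    this.nodes.push(node);
  }

  // Removes a node, e.g. when its component leaves the screen.
  unregisterNode(id: string): void {
    this.nodes = this.nodes.filter((node) => node.id !== id);
  }

  // Helper for debugging: how many nodes are registered right now.
  getNodeCount(): number {
    return this.nodes.length;
  }
}
```

With ten question boxes on the welcome view, `getNodeCount()` would be expected to report 10, and a single node once only the go back link is rendered.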
4. UseFocusRef Hook and TV Control Integration
To build the useFocusRef hook function, you create a reference value with a callback, generate a unique ID, and obtain an instance of the navigation engine. The registerNode method is called when the component is mounted, and the node is removed when the component is unmounted to avoid memory leaks. We can debug the navigational nodes to ensure they are registered correctly. We add an event listener to the document to listen to key presses and call the handleKeyEvent callback function. We use a map to define directional keys and integrate them with our internal values in the app.
That's the only thing that you need to do. How do you build this useFocusRef hook function? Well, first you create a reference value, with the callback to set this reference. Then you generate a unique ID. And finally you obtain the instance of the navigation engine using the context provider. With the help of the useEffect function, every time the component is mounted, we call the registerNode method. And when it's unmounted, we remove the node to avoid memory leaks.
Cool. Now we are going to debug this. We want to make sure that all the navigational nodes are registered, and if we look at the nodes variable we see that we have 10. We click on one of them, and then we have only one, so it's refreshing. We can even inspect it and see that the reference points to the HTML element. We go back, and again we have 10. So it's working. Let's go to step number two.
Listen to the TV control. In our application component, we add an event listener to the document, so that every time any key is pressed we call a callback function called handleKeyEvent. To generate that callback, we use a method that distinguishes whether the key you are pressing is one of the directional ones, the arrows, and just for this step we console.log it so we're able to debug it. To recognize these directional keys, we already have a map where we define what a directional key is. That map is basically the integration between our internal values in the app, what we define as up, down, and so on, and the values coming from the platform. In this case our platform is a desktop, but the values can change depending on whether you're actually running on a smart TV or on a gaming console, for example.
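The register-on-mount, remove-on-unmount lifecycle that the hook manages can be sketched without React. This is a framework-free illustration, not the hook itself: `NodeRegistry`, `createRegistry`, `generateId`, and `mountFocusable` are hypothetical names. In an actual hook, the body of `mountFocusable` would live inside `useEffect`, and the function it returns would be the effect's cleanup.

```typescript
// Assumed minimal surface of the navigation engine used by the hook.
interface NodeRegistry {
  registerNode(node: { id: string; element: object }): void;
  unregisterNode(id: string): void;
  count(): number;
}

// Tiny in-memory registry so the sketch is self-contained.
function createRegistry(): NodeRegistry {
  let nodes: { id: string; element: object }[] = [];
  return {
    registerNode: (node) => {
      nodes = nodes.filter((n) => n.id !== node.id).concat(node);
    },
    unregisterNode: (id) => {
      nodes = nodes.filter((n) => n.id !== id);
    },
    count: () => nodes.length,
  };
}

// Generates an ID automatically, so the developer never writes one by hand.
let nextNodeId = 0;
function generateId(): string {
  nextNodeId += 1;
  return "node-" + nextNodeId;
}

// Registers the element on "mount" and returns the cleanup to run on
// "unmount", which removes the node again so no stale references leak.
function mountFocusable(registry: NodeRegistry, element: object): () => void {
  const id = generateId();
  registry.registerNode({ id, element });
  return () => registry.unregisterNode(id);
}
```

Because the ID is generated inside the helper, the component only hands over its element reference, which is the property that makes the hook API pleasant to use.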
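As a rough sketch of step two, the key map and the handler could look like the following. The names `DESKTOP_KEY_MAP`, `toDirection`, and `createKeyHandler` are assumptions for illustration. The key names `ArrowUp`, `ArrowDown`, `ArrowLeft`, and `ArrowRight` are the standard `KeyboardEvent.key` values on desktop browsers; a TV or console platform would supply its own map.

```typescript
type Direction = "up" | "down" | "left" | "right";
type KeyMap = { [key: string]: Direction };

// Maps platform key values to the app's internal directions.
// These are the desktop browser values; other platforms may differ.
const DESKTOP_KEY_MAP: KeyMap = {
  ArrowUp: "up",
  ArrowDown: "down",
  ArrowLeft: "left",
  ArrowRight: "right",
};

// Returns the internal direction for a key, or undefined when the key
// is not a directional one.
function toDirection(key: string, keyMap: KeyMap = DESKTOP_KEY_MAP): Direction | undefined {
  return Object.prototype.hasOwnProperty.call(keyMap, key) ? keyMap[key] : undefined;
}

// Builds the keydown callback: directional keys are forwarded to
// onDirection, every other key is ignored.
function createKeyHandler(
  onDirection: (direction: Direction) => void,
  keyMap: KeyMap = DESKTOP_KEY_MAP
): (event: { key: string }) => void {
  return (event) => {
    const direction = toDirection(event.key, keyMap);
    if (direction !== undefined) {
      onDirection(direction);
    }
  };
}

// In a browser this would be wired up roughly as:
//   document.addEventListener("keydown", createKeyHandler(console.log));
```

Keeping the platform values in a separate map means only that map has to change when the same web application is loaded on a different device.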
5. Integration Logic and Node Selection
This part focuses on the integration logic for the native component of the application. We explore how to select the next node after a key press by using the getBoundingClientRect method to obtain the dimensions and coordinates of each node. We then filter the nodes by direction and main axis, and select the closest one based on distance. The NavigationEngine class is updated with the handleNavigation method to implement this logic. Finally, we update the cursor based on the initial diagram.
This is the part of the integration logic that the native application needs to know about, but we're not going to do it for this presentation.
So let's see if the events of the TV control are being registered. And yes, we can see here that I press the down key and it tells me it's the down arrow. I go to the left, and then we have left.
Now we can go to the most fun part of this code, which is selecting the node after the user presses a key. In case you were wondering why we need the reference, it's because we can call a method called getBoundingClientRect, which gives you the exact dimensions and also the coordinates, relative to the viewport, of where the element is rendered. That means that if you take all the nodes and call this method one by one, you get all the information you need to build that logic, so at this point you can forget about the application or whatever is rendered.
With this information we can decide exactly where the focus should go. So how do you choose the next node? First you filter all the nodes by direction, then you filter by the main axis, and finally you pick the closest one by distance.
Let's go step by step. Imagine a bigger matrix, five by five. You are in the middle, and the key that is pressed goes to the right. Then you keep only the last two columns. If you are pressing up instead, you keep the first two rows. From those, you filter by the main axis: if you are going to the right, you keep the nodes that sit between the top and bottom margins of the current element. In the same way, if you are going up, you keep those between the left and right margins. Finally, once you have narrowed it down to those, you pick the closest one by distance.
How does it look in code? We go back to the NavigationEngine class that we defined before, we add the handleNavigation method, and we do this step by step. First we filter by direction, and we do that with the help of a dictionary that already has the predefined functions you need to filter all those nodes. Then you do exactly the same, but filtering by the main axis, and then you pick the closest element. And we're pretty much done. Now we can go back to step number two, remove the console.log, and call the handleNavigation method.
Cool. Let's see if this works. If I press down, we can see, I already have it here, auto-complete, the element that was selected. So we're in this corner. If we press left, then we have this one. If we press down, then you know what's going to happen, right? Awesome. So it's working, but the element that is supposed to be selected is not being focused yet. Let's do that. Step number four, update the cursor.
So we go back to the diagram that I showed at the beginning and we're going to update it.
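The three selection steps described above (filter by direction, filter by main axis, pick the closest by distance) can be sketched as follows. This is an illustrative reading of the description, not the code from the talk: `NavNode`, `isInDirection`, `overlapsMainAxis`, and `selectNextNode` are assumed names, and measuring distance between element centers is one possible choice. In a browser, each `rect` would come from calling `getBoundingClientRect()` on the node's element.

```typescript
type Direction = "up" | "down" | "left" | "right";

// Subset of the fields returned by getBoundingClientRect().
interface Rect { left: number; top: number; right: number; bottom: number; }
interface NavNode { id: string; rect: Rect; }

// Step 1: dictionary of predicates, one per direction, that keep only
// the nodes lying entirely on that side of the current node.
const isInDirection: Record<Direction, (from: Rect, to: Rect) => boolean> = {
  right: (from, to) => to.left >= from.right,
  left: (from, to) => to.right <= from.left,
  down: (from, to) => to.top >= from.bottom,
  up: (from, to) => to.bottom <= from.top,
};

// Step 2: for horizontal moves keep nodes overlapping the current node's
// top..bottom band; for vertical moves, its left..right band.
function overlapsMainAxis(direction: Direction, from: Rect, to: Rect): boolean {
  if (direction === "left" || direction === "right") {
    return to.top < from.bottom && to.bottom > from.top;
  }
  return to.left < from.right && to.right > from.left;
}

// Step 3 helper: distance between the centers of two rectangles.
function centerDistance(a: Rect, b: Rect): number {
  const dx = (a.left + a.right) / 2 - (b.left + b.right) / 2;
  const dy = (a.top + a.bottom) / 2 - (b.top + b.bottom) / 2;
  return Math.sqrt(dx * dx + dy * dy);
}

// Returns the node to focus next, or undefined when nothing qualifies
// (for example when pressing left on the leftmost column).
function selectNextNode(current: NavNode, nodes: NavNode[], direction: Direction): NavNode | undefined {
  const candidates = nodes
    .filter((node) => node.id !== current.id)
    .filter((node) => isInDirection[direction](current.rect, node.rect))
    .filter((node) => overlapsMainAxis(direction, current.rect, node.rect));

  let best: NavNode | undefined;
  let bestDistance = Infinity;
  for (let i = 0; i < candidates.length; i++) {
    const distance = centerDistance(current.rect, candidates[i].rect);
    if (distance < bestDistance) {
      best = candidates[i];
      bestDistance = distance;
    }
  }
  return best;
}
```

Because the function only looks at rectangles, it does not need to know anything about the application or how the elements were rendered.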
6. Adding Subscribers and Final Remarks
Every time a node is registered, we add a subscriber, and we execute callbacks to notify all subscribers. We update the hook function to keep track of the focused element's state. We demonstrate the functionality and mention the challenges of complex arrangements, annoying pop-ups, and circular navigation. We provide links to a repository and a library for further exploration. We encourage getting in touch and building a community around smart TV application development.
So every time a node is registered, we're going to add a subscriber. So the node can say: hey, if anything happens, please let me know. Notify me. Let's do this in code. If we go back to the NavigationEngine class, we add one method for adding those subscriptions, and another one for executing all those callbacks, called notifyAllSubscribers. And we call this method right after we have found the next element.
We update the hook function. We replace the isFocused value that we previously defined with a state hook that keeps track of whether that element is currently focused or not. And again, with the help of the useEffect function, every time the component is mounted, we first check initially: hey, am I focused, yes or no? We update the state. And we also subscribe, so that whenever the navigation engine calls handleNavigation, I just check: hey, is it me who is focused? And if so, we update the state.
Cool, we're pretty much done. Let's do a quick demonstration. So it's going down, down, left, left, we click on OK, and again. So perfect, it's working. I was afraid of the demo.
So this is just the beginning, because trust me, from here it gets a lot more complicated, and a lot more fun. For example, your application is not going to be a perfectly shaped matrix. Instead, you're going to deal with a more complex arrangement, where you have the sidebar or different columns, so you literally need to deal with those corner cases. What about those annoying and inconvenient pop-ups that tell you: hey, do you want to buy premium? This is an interesting case because you need to focus the attention on just two elements, accept or buy. Although all the other navigation nodes are still there behind the pop-up, you just need to focus on those two. And finally, one of my favorite ones is called circular navigation, where you want to relax the constraints in some areas. For example, in the sidebar menu, if the user is hitting down, down, down, down and reaches the bottom, then just for convenience we want to wrap around to the top. Trust me, this is a total game-changer.
But we're not going to continue, because we're out of time. If you feel curious and you want to continue the party, here is the link to the repo that I used for this presentation, so we can continue creating. On the other hand, if you feel inspired and say, I want to create my first application for the smart TV, there is the link to the library that you can use so you don't have to build this from scratch. However, if you've been thinking, oh, this is way too complicated, you can do it in a much easier way, please get in touch. My personal goal for this talk is to create a small community, because building an application for a smart TV is not easy. So let's get in touch. Let's help each other. But that's it for now. Muchas gracias. Thank you very much.
Here you have the QR codes for the links again. And I really hope to hear from you.
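The subscription mechanism might be sketched like this. Only `notifyAllSubscribers` is named in the talk; `FocusEngine`, `subscribe`, `setFocused`, and `isFocused` are assumed names. In the real engine, the call that `setFocused` stands for would happen inside `handleNavigation`, right after the next element has been found.

```typescript
// A subscriber is told which node ID is focused now.
type Subscriber = (focusedId: string) => void;

class FocusEngine {
  private subscribers: Subscriber[] = [];
  private focusedId: string | null = null;

  // Adds a subscription and returns a function that removes it again.
  subscribe(callback: Subscriber): () => void {
    this.subscribers.push(callback);
    return () => {
      this.subscribers = this.subscribers.filter((s) => s !== callback);
    };
  }

  // Lets a node ask, right after mounting, whether it is focused.
  isFocused(id: string): boolean {
    return this.focusedId === id;
  }

  // Updates the cursor and tells every subscriber about it.
  setFocused(id: string): void {
    this.focusedId = id;
    this.notifyAllSubscribers();
  }

  // Executes every registered callback with the currently focused ID.
  private notifyAllSubscribers(): void {
    const focusedId = this.focusedId;
    if (focusedId === null) {
      return;
    }
    this.subscribers.slice().forEach((callback) => callback(focusedId));
  }
}
```

Inside the hook, each node would subscribe with a callback that compares the focused ID with its own and updates its local state accordingly, then call the returned function when it unmounts.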