It ultimately helps you maintain availability of all of your applications. And it's super common, as we all know, to rely on downstream services and providers. But things start to get scary when you have a single point of failure in your delivery. Well, why not protect yourself? De-risk that element. By flagging around that point, you can effectively create a system that allows you to pivot to a fallback service if the worst-case scenario does in fact occur, which we know it often does, unfortunately.
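To make the idea of flagging around a single point of failure concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the flag key `use-primary-provider`, the in-memory `FLAGS` dictionary standing in for a real flag service, and the two provider functions are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical in-memory flag store; in practice a flag service would
# supply this value and let you change it without a redeploy.
FLAGS: Dict[str, bool] = {"use-primary-provider": True}


def primary_provider() -> str:
    """Stand-in for a call to the downstream provider you normally rely on."""
    return "primary"


def fallback_provider() -> str:
    """Stand-in for a call to the fallback service."""
    return "fallback"


def fetch_with_fallback(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    flags: Dict[str, bool] = FLAGS,
) -> str:
    """Route to the primary provider while the flag is on, else to the fallback."""
    # A missing flag defaults to the fallback path, the safer choice in this sketch.
    if flags.get("use-primary-provider", False):
        return primary()
    return fallback()
```

Switching `FLAGS["use-primary-provider"]` to `False` pivots every later call to the fallback without touching the calling code.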
This gives you the ability to roll back gracefully, without having to go offline altogether, all within about 200 milliseconds. You're protecting your uptime, you're supporting your team's SLOs, and everyone's much happier. This can also be done in the case of bad actors. Say someone's using your service for something they really shouldn't be. You can isolate that one endpoint. You can return a 404 for that one bad-acting device while everyone else still gets their 200s. In essence, you get to define how you degrade. You get to ring-fence your blast radius and decide how you roll back. So this is perfect for scenarios like load shedding or for manual control of certain problems. This process is all about putting you back in control of a situation that you likely didn't anticipate or ask for.
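The bad-actor case can be sketched the same way. This is an illustration under assumptions, not a real flag SDK: the `BLOCKED_DEVICES` set stands in for a flag targeting rule, and the device IDs are made up.

```python
from typing import Set

# Hypothetical targeting rule: the devices the flag currently isolates.
BLOCKED_DEVICES: Set[str] = {"device-123"}


def status_for(device_id: str, blocked: Set[str] = BLOCKED_DEVICES) -> int:
    """Return 404 for an isolated device and 200 for everyone else."""
    return 404 if device_id in blocked else 200
```

Only the listed device sees the 404, so the blast radius stays limited to that one caller.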
And of course, when we're talking about resolving these sorts of scenarios, let's take the situation where there's a breaking change and a safety valve is able to maintain uptime by rolling back to a previous state. Switching a flag not only supports you in staying online, but also gives you the agility you need to fix the issue at hand. Using the audit log and your observability platform, you're able to pinpoint the issue and see when and where it occurred. What was the change that contributed towards the outage? And when your fix is, in fact, ready to deploy, you of course need to be sure that it can actually go out to all of your users: that it's a solution that can be applied across your user base and isn't going to cause further problems when implemented. That's where staging your rollout comes in. You can stage your fix by going out to a subset of your users at first and gradually rolling out to more and more people as your confidence grows. Flags give you the gift of certainty here. They give everyone the ability to operate from one single version of the truth. And now that you're back online, your fix is live to your entire user base.
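One common way to stage a rollout like this is to hash each user into a stable bucket and compare it against a percentage. The sketch below assumes that approach; it is not taken from any particular flag product, and the function names and the flag key used in the usage note are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag_key: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, flag_key: str, rollout_percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    # The bucket never changes for a user, so raising the percentage only
    # adds users; nobody who already has the fix loses it.
    return bucket(user_id, flag_key) < rollout_percent
```

Starting with something like `is_enabled(user_id, "checkout-fix", 5)` and raising the percentage as confidence grows gives the gradual rollout described above.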
Of course, we want to stay online, right? Sometimes it's hard to know if your configuration is truly good to go. You may make some guesses based on your platform and how it behaves in certain scenarios. But the thing is, assumptions can easily be proven incorrect and preconceptions proven wrong. When you have a myriad of microservices, or you're dealing with processes that require numerous network calls, there's often some complex tuning required. A lot of the time you have to take a great deal of care when deploying.