Keep Calm and Deploy On: Creating Safer Releases with Feature Flags

Creating and deploying new software is risky. We've all seen how easily bugs arise, causing software to be delivered poorly or to the wrong people. What's more, depending on how tightly we couple our systems and services, they can interact in unexpected and unfortunate ways with existing software or hardware. Beyond unintended consequences, we may also find that people use our services for nefarious purposes. It's essential to have safety nets in place for when things don't go as planned or people attempt to break the rules. In this session, we'll discuss how feature flags can work in both temporary and permanent scenarios to enable you to break the quality triangle and deliver quality promptly.

This talk was presented at React Advanced Conference 2022.

FAQ

Feature flags are used to control the visibility of features within the code, allowing developers to preview features, test in production, and roll out features to subsets of users without fully deploying them to all end users.

Feature flags help mitigate risk by allowing developers to control feature visibility dynamically, degrade non-critical functionality, pivot to fallback services, and isolate bad actors, thus maintaining application availability and uptime.

A safety valve is a long-term feature flag used to degrade non-critical functionality of applications and services, helping to maintain availability and protect against single points of failure.

Feature flags can be used to roll back gracefully to a previous state, maintain uptime, and fix issues dynamically without taking the application offline, thereby supporting service level objectives (SLOs).

Feature flags can isolate bad actors by targeting specific endpoints or devices, allowing the rest of the users to continue receiving normal service while mitigating the impact of malicious activity.

Feature flags allow for staged rollouts by enabling developers to release features to a subset of users initially and gradually expand the rollout as confidence in the feature's stability grows.

Monitoring is crucial when using feature flags as it helps in understanding the impact on CPU, performance, and overall system behavior, allowing for dynamic adjustments and informed decision-making.

Yes, feature flags can store configuration settings, allowing developers to dynamically adjust values and configurations based on real-time data and assumptions, thus reducing guesswork and improving system tuning.

Feature flags provide a way to manage complexity and dynamically adjust configurations in systems with numerous microservices and network calls, helping to maintain stability and performance.

Feature flags allow developers to explore new possibilities with added protection and confidence, helping to manage unknowns, venture into uncharted territory, and continuously improve software through controlled experimentation.

Jessica Cregg
7 min
24 Oct, 2022

Video Summary and Transcription

Feature flags can be used to mitigate risk in software development by altering the visibility of features to end users. By using flags, you can protect against single points of failure and pivot to a fallback service in worst-case scenarios. Monitoring and managing complexity are crucial, and feature flags allow you to make dynamic changes and adjust values based on what is proven correct. Operating in the unknown is inevitable in software development, so it's important to manage complexity and embrace learning. Collaboration is key in making feature failures less painful.

1. Introduction to Flags and Risk Mitigation

Short description:

Hey everyone at React Advanced. I'm Jessica, and I'm going to talk to you about how you can use flags to mitigate risk in your software development. Feature flags are typically used to alter the visibility of a feature to end users. They can be used for testing, rolling out features to a subset of users, and more. At LaunchDarkly, we can flag based on different types of data, allowing you to mitigate risk in complex scenarios.

Hey everyone at React Advanced. Hope you're having a good time. I'm Jessica, and I'm going to talk to you about how you can use flags to mitigate risk in your software development. So, let's get into it.

Now you've likely heard about feature flags solving sort of release-shaped problems, right? And they're often used in these sort of entitlement scenarios, changing what's available to certain users. And they're typically used in that sort of boolean state. We take a feature, we wrap it in a flag, and that effectively becomes our control point within our code, allowing us to alter its visibility to our end users. The feature's either visible or it isn't. It's on or it's off. And once we've validated the changes in production and are confident that our feature can be on for 100% of our audience, we get rid of the flag. That's the kind of typical lifecycle that we see with flags.
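
As a rough sketch of that boolean lifecycle, the snippet below wraps a feature in a flag so the flag becomes the control point. It assumes a hypothetical in-memory flag store; the names FlagStore, isEnabled, renderCheckout, and the "new-checkout" key are made up for illustration and are not part of any real SDK.

```typescript
// Hypothetical in-memory flag store (not a real SDK); keys map to on/off states.
type FlagStore = Map<string, boolean>;

// Look up a flag, falling back to a safe default when the flag is unknown.
function isEnabled(store: FlagStore, key: string, fallback = false): boolean {
  return store.get(key) ?? fallback;
}

// The flag is the control point: it decides which code path the user sees.
function renderCheckout(store: FlagStore): string {
  return isEnabled(store, "new-checkout") ? "new checkout" : "legacy checkout";
}

const store: FlagStore = new Map([["new-checkout", false]]);
console.log(renderCheckout(store)); // prints "legacy checkout"

store.set("new-checkout", true); // flip the flag, no redeploy needed
console.log(renderCheckout(store)); // prints "new checkout"
```

Once the feature is on for everyone, the isEnabled check and the legacy branch would both be deleted, which is the "get rid of the flag" step.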

As you know, this is super useful when it comes to, say, previewing features for testing in production without going out to our end users, or for rolling out customer-facing features to just a subset of our user base. But what if the problem we're trying to solve requires more than just a binary state change or A/B testing? At LaunchDarkly, when we're talking about flags, we're not simply talking about two states. We can actually deal with a whole spectrum of stages in your release process. We can flag based on a number or a string. We even have JSON flags. And that allows you to mitigate risk in these more sort of complex scenarios.
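
To make the idea of non-boolean flags concrete, here is a minimal sketch of reading number, string, and JSON values from a flag store. Everything in it is an assumption for illustration (the FlagValue type, the helper names, the flag keys, and the RetryPolicy shape); it is not LaunchDarkly's API.

```typescript
// Hypothetical flag values: a flag can hold more than true/false.
type FlagValue = boolean | number | string | Record<string, unknown>;

// Hypothetical shape for a JSON flag.
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
}

// Runtime check so a malformed JSON flag never reaches the caller.
function isRetryPolicy(value: unknown): value is RetryPolicy {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["maxAttempts"] === "number" && typeof candidate["backoffMs"] === "number";
}

// Each reader returns the fallback when the flag is missing or has the wrong type.
function numberFlag(flags: Map<string, FlagValue>, key: string, fallback: number): number {
  const value = flags.get(key);
  return typeof value === "number" ? value : fallback;
}

function stringFlag(flags: Map<string, FlagValue>, key: string, fallback: string): string {
  const value = flags.get(key);
  return typeof value === "string" ? value : fallback;
}

function retryPolicyFlag(flags: Map<string, FlagValue>, key: string, fallback: RetryPolicy): RetryPolicy {
  const value = flags.get(key);
  return isRetryPolicy(value) ? value : fallback;
}

const multiFlags = new Map<string, FlagValue>([
  ["search-page-size", 25],
  ["banner-theme", "dark"],
  ["retry-policy", { maxAttempts: 3, backoffMs: 200 }],
]);

console.log(numberFlag(multiFlags, "search-page-size", 10)); // prints 25
console.log(stringFlag(multiFlags, "banner-theme", "light")); // prints "dark"
console.log(retryPolicyFlag(multiFlags, "retry-policy", { maxAttempts: 1, backoffMs: 0 }).maxAttempts); // prints 3
```

Returning a fallback on a type mismatch is a design choice in this sketch: a bad flag value degrades to a known default instead of throwing in the request path.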

2. Flags for Risk Mitigation

Short description:

It's important to protect yourself from single points of failure and mitigate risk by using flags. By flagging around potential failure points, you can create a system that allows you to pivot to a fallback service in worst-case scenarios. This helps you roll back gracefully and maintain uptime, even in the presence of bad actors. Switching flags can support online stability and provide agility in resolving issues. Staging rollouts and using flags ensure a solution that can be applied across your user base. Flags give you certainty and the ability to operate from one version of the truth. It's crucial to take care when deploying in complex environments.

It ultimately helps you maintain availability of all of your applications. And it's super common, as we all know, to rely on downstream services and providers. But things start to get scary when you have a single point of failure in your delivery. Well, why not protect yourself? De-risk that element. By flagging around that point, you could effectively create a system that allows you to pivot to a fallback service in case the worst-case scenario does in fact occur, which we know it often does, unfortunately.
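
Flagging around a single point of failure could look something like the sketch below, where a flag decides whether calls go to the primary provider or to a fallback. The PaymentProvider interface, both providers, and the "use-fallback-payments" key are hypothetical stand-ins for whatever downstream service you depend on.

```typescript
// Hypothetical downstream dependency with two interchangeable implementations.
interface PaymentProvider {
  label: string;
  charge(amountCents: number): string;
}

const primaryProvider: PaymentProvider = {
  label: "primary",
  charge: (amountCents) => `primary charged ${amountCents}`,
};

const fallbackProvider: PaymentProvider = {
  label: "fallback",
  charge: (amountCents) => `fallback charged ${amountCents}`,
};

// The flag wraps the risky dependency; flipping it pivots traffic to the fallback.
function selectProvider(flags: Map<string, boolean>): PaymentProvider {
  return flags.get("use-fallback-payments") === true ? fallbackProvider : primaryProvider;
}

const providerFlags = new Map<string, boolean>([["use-fallback-payments", false]]);
console.log(selectProvider(providerFlags).charge(500)); // prints "primary charged 500"

providerFlags.set("use-fallback-payments", true); // worst case happened: pivot
console.log(selectProvider(providerFlags).charge(500)); // prints "fallback charged 500"
```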

This gives you the ability to roll back gracefully, without having to go offline altogether, all within about 200 milliseconds. You're protecting your uptime, you're supporting your team's SLOs, and everyone's much happier. This can also be done in the case of bad actors. Say someone's using your service for something they really shouldn't be. You can isolate that one endpoint. You can return a 404 for that one bad-acting device while everyone else still gets their 200s. In essence, you get to really define how you degrade. You get to ring-fence your blast radius and make a decision about how you roll back. So this is perfect for scenarios like load shedding or for manual control of certain problems. This process is all about putting you back in control of a situation that you likely didn't anticipate or ask for.
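
Isolating one bad-acting device while everyone else gets normal service might be sketched as follows. The blocked-device set stands in for a targeting rule held in a flag; statusFor and the device IDs are invented for this example.

```typescript
// Hypothetical targeting rule: the set of device IDs currently being isolated.
// In practice this list would come from a flag so it can change without a deploy.
function statusFor(blockedDevices: ReadonlySet<string>, deviceId: string): number {
  // Only the targeted device is degraded; all other traffic is untouched.
  return blockedDevices.has(deviceId) ? 404 : 200;
}

const blockedDevices = new Set(["device-bad-123"]);
console.log(statusFor(blockedDevices, "device-bad-123")); // prints 404
console.log(statusFor(blockedDevices, "device-ok-456")); // prints 200
```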

And of course, when we're talking about resolution of these sorts of scenarios, let's take the situation where a safety valve is able to maintain uptime by rolling back to a previous state when there's, say, a breaking change. Switching a flag not only supports you in staying online, but also gives you the agility needed to fix the issue at hand. Using the audit log and your observability platform, you're able to pinpoint the issue and see when and where it occurred. What was the change that contributed towards the outage? And when your fix is, in fact, ready to deploy, you, of course, need to be sure that it can actually go out to all of your users: that it is going to be a solution that can be applied across your user base and isn't going to cause further problems when implemented. That's where you can stage your rollout. You can stage your fix by going out to a subset of your users at first and gradually rolling out to more and more people as your confidence grows. Flags give you the gift of certainty here. They give everyone the ability to operate from one singular version of the truth. And now that you're back online, your fix is live to your entire user base.
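
One way to picture a staged rollout is deterministic percentage bucketing: each user hashes to a stable bucket from 0 to 99, and the fix is on for users whose bucket falls below the current percentage. The sketch below uses an FNV-1a-style hash over the user ID as an assumption for illustration; it is not how any particular flag service assigns users, and hashToBucket and inRollout are hypothetical names.

```typescript
// Map a user ID to a stable bucket in [0, 99] using an FNV-1a-style 32-bit hash
// over the string's UTF-16 code units. The same ID always lands in the same bucket.
function hashToBucket(userId: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0) % 100; // force unsigned before taking the bucket
}

// A user is in the rollout when their bucket is below the rollout percentage.
function inRollout(userId: string, percentage: number): boolean {
  return hashToBucket(userId) < percentage;
}

console.log(inRollout("user-42", 0)); // prints false: nobody is in a 0% rollout
console.log(inRollout("user-42", 100)); // prints true: everybody is in a 100% rollout
```

Because buckets are stable, raising the percentage only ever adds users; someone who already has the fix at 10% keeps it at 50% and at 100%.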

Of course, we want to stay online, right? Sometimes it's hard to know if your configuration is truly good to go. You may make some guesses based on your platform and how it behaves in certain scenarios. But the thing is that assumptions can easily be proven incorrect and preconceptions proven wrong. You know, when you have a myriad of microservices or you're dealing with processes requiring numerous network calls, there's often some complex tuning required. A lot of the time you're having to take a great deal of care when deploying.