DevOps

When I talk about desired outcomes or answer a question about where to get started with any part of a DevOps initiative, I like to mention NASCAR or Formula 1 racing. Crew chiefs for these race teams have a goal: finish in the best place possible with the resources available while overcoming whatever adversity is thrown at them. If the team feels capable, the goal is raised, level by level, until it becomes holding a trophy at the end of the race.

To achieve their goals, race teams don't think from start to finish; they flip the table to look at the race from the end goal to the beginning. They set a goal and a stretch goal, then work backward from those goals to determine how to get there. Work is delegated to team members to push toward the objectives that will get the team to the desired outcome.

Team roles and goals

The teams work with the physics they know (fuel economy, tire wear, etc.), a safe set of known challenges (pit stops, other drivers impeding progress, etc.), and what I like to refer to as an adversity budget. If you put 100% of your effort into keeping the car in pristine condition to win, your team will fail. One dropped lug nut on a tire change affects everything between that moment and the desired outcome, so the plan has to leave room for it. An adversity budget is like an SRE error budget, but it's way more art than science.
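For readers who haven't met the SRE error budget that the adversity budget is compared to, here is a minimal sketch of the arithmetic behind it. It assumes an availability target (SLO) measured over a fixed window; the function names, the 99.9% target, and the 30-day window are illustrative assumptions, not anything a race team or a specific SRE tool prescribes.

```python
# Hypothetical helpers illustrating SRE error-budget arithmetic.
# The SLO value and window length used below are assumptions for illustration.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO tolerates over the window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after the downtime already spent (negative means overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
    # After a 10-minute incident, roughly 33 minutes remain.
    print(round(budget_remaining(0.999, downtime_minutes=10), 1))
```

The point of the comparison is the same in both worlds: a target of 100% leaves no budget at all, so any dropped lug nut (or failed deploy) immediately puts the team over.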

Everyone on the team knows they have a part to play on race day (from the owner to marketing to crew chief to engineer to pit lane sweeper). There are roles and responsibilities, communication protocols, disaster expectations, and many other business functions humming along in the background. And everyone knows where they can help if there is a deficiency (like pulling the Andon cord and swarming in DevOps) or, even worse, an incident (a crash). An engineer in charge of aerodynamics is engaged when there is an issue impacting airflow over the vehicle. Another engineer will get involved if vibration starts in a particular part of the vehicle. The team on pit lane works to assemble the parts and pieces needed to repair the vehicle as it makes its way to the pit lane.

During a race, as laps get cranked out, several factors impact a team's desired outcome, and all the factors are accounted for and acted on where necessary. What are the weather conditions? How are the tires wearing? How is the vehicle performing, as measured by countless metrics? What's the feedback from the driver? What does the data show compared to the driver's feedback? What feedback can we give the driver to make them feel 100% confident in the equipment they're using to push toward their individual goals? What are our competitors doing? Now we have to make a pit stop a little ahead of schedule.

Pit stops are like releases

I like to think of pit stops as releases. Race teams know they'll have to pit a certain number of times for various reasons. But when that schedule is affected, the team recalculates its plan for the rest of the race. In Formula 1 racing, using flammable fuel on pit row is too dangerous, so they've designed an entire system to eliminate mid-race refueling. If there is something inherently dangerous in a release process for a piece of software, that factor should either be minimized or worked out of the system entirely.

Conversely, in NASCAR, the dangerous process of refueling has been addressed in a different way. Putting the safeguards and processes in place to make the person jumping over a wall with a 40-pound gas can feel as confident as possible is critical. Think about what they're doing. This person is connecting a can of fuel to a hot car with its engine running. Meanwhile, people are running around changing tires, adjusting handling, and so forth. The psychological safety of every team member is essential. But the ones sent to do the more unsafe things should have a forgiving and rapid response system around them in the event of a catastrophe. That response system will remove them from harm while the team adjusts its plan toward its desired outcome. The team will believe in that system because it will be practiced 1,000 times before it's employed once.

Practice and train

Race teams practice pit stops all week before the race. They do weight training and cardio programs to stay physically ready for the grueling conditions of race day. They are continually collaborating to address any issue that comes up. Software teams should also practice software releases often. If safety systems are in place and practice runs have been going well, they can release to production more frequently. Speed makes things safer in this mindset. It's not about doing the "right" thing; it's about addressing as many blockers to the desired outcome (goal) as possible and then collaborating and adjusting based on the real-time feedback that's observed. Everyone in a DevOps world is expected to anticipate anomalies, work to improve quality, and minimize the impact of those anomalies when they occur.

Last updated: August 21, 2019