Building a Continuous Deployment Engine

A couple of weeks ago, I mentioned that we (the Red Hat Inception Team) are building “a thing.” Given our own internal interest in the topic, you may be wondering why we chose a custom Release Engine over pre-baked tooling. There are many reasons why we went in this direction; I’m going to cover four.

Why #1: There are many existing FOSS tools to automate portions of a release process, but there seems to be a tooling gap when it comes to easily tying them together.

For CI, there’s Jenkins/Hudson, buildbot, Travis, etc.; for configuration management, Chef or Puppet; for repository management, Pulp or Satellite; for provisioning, Foreman, Satellite, or Cobbler; for host scripting, Ansible as well as our own Taboot. In Red Hat IT, we’re using at least one of these tools in each category. But there seems to be a lack of easy-to-use tools to automate orchestration across these systems. Yes, Jenkins can tie them all together, but it’s not easy to set up and even harder to maintain. The Release Engine (RE) is our stab at solving this problem.

Why #2: RE must be lightweight and fit for purpose.

This doesn’t mean that our Release Engine efforts will be restricted to our use cases only; we are trying to make it generic enough to be reusable by others outside of Red Hat.

Additionally, we already have a few general-purpose workflow engines at our fingertips, but we chose not to use them. To be successful, we believe that RE must be easy for our development and operations team members to extend and modify. Each of those workflow tools comes with its own difficulties in making it accessible to the entire department. For example, requiring a sysadmin to set up an Eclipse IDE just to add a new release step would render RE dead on arrival.

In the words of Anderson Silva, the manager of our NA & EMEA Platform Operations team, “the key to the success of this engine is not how many things it can do when it gets released, but how easily people can add functionality to it as demand grows.”

Why #3: RE must be written in a commonly known language. See Why #2.

We chose Python. It’s a popular interpreted language that is easy to read and modify, and one in which both our development and operations folks frequently have experience. A system that requires Java, C, C++, Mono, etc. to extend would severely limit its accessibility to team members outside the development side of the house, as well as to those developers who don’t have time to spend writing traditional statically typed, compiled code to get their functionality implemented.
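To make the "easy to extend" goal from Why #2 and Why #3 concrete, here is a minimal sketch of what adding a release step in plain Python could look like. It is not the actual Release Engine API: the `release_step` decorator, the `STEPS` registry, the `run_release` function, and the two example steps are all hypothetical names invented for illustration.

```python
# Hypothetical sketch: none of these names come from the real Release Engine.
from typing import Callable, Dict, List

StepFn = Callable[[dict], str]

# Registry mapping a step name to the function that implements it.
STEPS: Dict[str, StepFn] = {}


def release_step(name: str) -> Callable[[StepFn], StepFn]:
    """Register a plain function as a named release step."""
    def decorator(fn: StepFn) -> StepFn:
        if name in STEPS:
            raise ValueError(f"step {name!r} is already registered")
        STEPS[name] = fn
        return fn
    return decorator


# Adding a new step is just writing a function; no IDE or build is needed.
@release_step("restart_service")
def restart_service(context: dict) -> str:
    return f"restarted {context['service']}"


@release_step("notify")
def notify(context: dict) -> str:
    return f"notified {context['team']} about {context['version']}"


def run_release(step_names: List[str], context: dict) -> List[str]:
    """Run the named steps in order and return a simple audit trail."""
    log = []
    for name in step_names:
        result = STEPS[name](context)  # raises KeyError for an unknown step
        log.append(f"{name}: {result}")
    return log
```

With something along these lines, a sysadmin could add a step with a text editor, and a release would be described as an ordered list of step names such as `["restart_service", "notify"]`.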

Why #4: We need to support a model of decentralized CI, so RE will do the work necessary to maintain a reliable, repeatable and auditable release.  

We will go into the details behind this in the future, but we have already tried centralized CI. As we grew, it became apparent it wasn’t going to continue working for everyone in our department. Developers wanted more control over their CI, and a centralized system gave them little of it. So, we are trying to unwind our centralized CI tool in favor of developer-supported CI environments. No matter what CI solution a team chooses, it should easily hook into RE.

For practical purposes, we need consistency in releases, and since we won’t be driving that through a central CI system, the release engine will have that job instead. It was unclear that any one tool would do what we needed while also answering the following questions:

  1. Who is doing the release and are they allowed to modify that environment/code/thing? We are all about empowerment, but we’re only going to let people push their team’s code… and definitely not to all environments without fulfilling some prerequisites.
  2. When did the release occur and is it allowed to occur at that time? Sometimes, you need to coordinate code releases so they don’t completely blow up critical business processes. You may say “that’s not CD”; I say that’s the reality of being in an enterprise IT shop. We’ll deal with it if and when we need to.
  3. What and how are you actually deploying? Got a special little thing you do over here in this environment because of “reasons?” Not a thing anymore! We want deployments to be repeatable by everyone, including me, the Product Owner. 🙂
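The three questions above can be sketched as pre-release checks. This is only an illustration under stated assumptions, not how RE implements them: the `ReleaseRequest` type, the `check_release` function, and the policy tables (`TEAM_MEMBERS`, `ALLOWED_ENVIRONMENTS`, `BLACKOUTS`) are hypothetical, and a real system would presumably pull this data from a directory service and a change calendar rather than hard-coded dictionaries.

```python
# Hypothetical sketch of "who / when / what" gates; all names are invented.
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Tuple


@dataclass(frozen=True)
class ReleaseRequest:
    user: str                 # who is asking for the release
    team: str                 # team that owns the code being released
    environment: str          # target environment, e.g. "qa" or "prod"
    requested_at: datetime    # when the release would run
    steps: Tuple[str, ...]    # what will be done, as declared step names


# Assumed policy data, hard-coded here only to keep the example runnable.
TEAM_MEMBERS = {"web": {"alice", "bob"}}
ALLOWED_ENVIRONMENTS = {"web": {"dev", "qa"}}
# Same-day blackout windows per environment, as (start, end) times.
BLACKOUTS = {"prod": [(time(8, 0), time(18, 0))]}


def check_release(req: ReleaseRequest) -> List[str]:
    """Return a list of reasons to refuse the release (empty means allowed)."""
    problems = []

    # Who: people may only push their own team's code...
    if req.user not in TEAM_MEMBERS.get(req.team, set()):
        problems.append(f"{req.user} is not a member of team {req.team}")
    # ...and only to environments that team is cleared for.
    if req.environment not in ALLOWED_ENVIRONMENTS.get(req.team, set()):
        problems.append(f"team {req.team} may not release to {req.environment}")

    # When: refuse releases that fall inside a blackout window.
    now = req.requested_at.time()
    for start, end in BLACKOUTS.get(req.environment, []):
        if start <= now < end:
            problems.append(f"{req.environment} is in a blackout window")

    # What: the release must be a declared list of steps, not ad hoc actions.
    if not req.steps:
        problems.append("no release steps declared")

    return problems
```

Returning every failed check, rather than stopping at the first one, also gives the audit log a complete record of why a release was refused.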

Interested in some of the tools we reviewed?
Here are four we looked at with some thoughts from the team. Kudos to the creators and maintainers of the code; they helped guide some of our design decisions along the way.

Deployinator
What we like:

  • Language/Framework is Ruby/Rails, which many people know.
  • Reusable classes for releases are a good idea.

Why we didn’t choose it:

  • Not a lot of current code updates.
  • README was noted as out of date.
  • Seems to be designed more around checking out code and putting it somewhere, which wouldn’t work for us.
  • The authentication seems to be tied to an in-house system.

Dreadnot
What we like:

  • Language/Framework is JavaScript/Node.js. Almost every engineer knows JavaScript.
  • Feature enhancements were added within the last few months, showing it’s still active.
  • Runs asynchronously.

Why we didn’t choose it:

  • Regions seem to be called out in the configs (https://github.com/racker/dreadnot/blob/master/example/local_settings.js#L13), which suggests it may be tied to specific infrastructure.
  • Stacks look like they must be defined and stored on the server side. Another system would need to be in place to support developers’ modification of deployments.

Strider
What we like:

  • Language/Framework is JavaScript/Node.js. Almost every engineer knows JavaScript.
  • Under active development.
  • Strider likes to take common deployment scenarios and make them into reusable recipes.
  • Having reusable stuff is much better than having everyone have custom stuff that ends up being 95% the same.

Why we didn’t choose it:

  • The actual server and runners are all on one machine.
  • It doesn’t seem to delegate out unless you install and configure extensions.
  • Seems to be CI with CD added to it.

ThoughtWorks – Go
What we like:

  • Good flow layout and visualizations.
  • Out-of-the-box LDAP integration.

Why we didn’t choose it:

  • Language is Java and JRuby. While JRuby would be easy for developers, the need to add Java + JRuby libraries seems like overkill.
  • Source code was unavailable when we reviewed it. (The product was only recently open sourced, and the code is now available.)
  • This appears to be a full-on replacement for Jenkins and not a system that would run alongside it.
  • Not pluggable as far as we could tell, though we never tried to set up a server as the code was unavailable at that time.
  • The LDAP setup is a bunch of XML, and it is unclear whether there is an easy way to use the GUI to update users.