Greenboot: Automate rollbacks for atomically updated systems

In mid-July, a faulty configuration update caused a significant global IT disruption, leading to transportation delays, point-of-sale issues, telecommunications outages and more. Affected machines entered a boot loop or boot recovery mode, rendering them inoperative; this underscores the critical need for robust automated recovery mechanisms in IT infrastructure. One answer being iterated upon by the open source community is Greenboot, which is currently available in Red Hat Enterprise Linux (RHEL) for Edge, Red Hat In-Vehicle Operating System and Fedora IoT.

Greenboot is designed to automatically recover from failed upgrades by integrating with systemd and atomically updated distros. It acts as a guardian for your system’s health, so that if an update goes awry, the system can automatically roll back to a previously working state. Here is how Greenboot can help prevent outages similar to July’s global disruption.

Automated health checks and rollbacks

Greenboot performs health checks every time the system boots. It runs scripts that verify that critical components and services are functioning as expected. If any required health check fails, Greenboot can automatically roll back the system to a previous, stable state, minimizing downtime.

Customizable health checks

Administrators can define custom health checks tailored to their specific system needs. These checks fall into two categories: required checks, which must pass for the system to be considered healthy, and wanted (optional) checks, whose failures are logged but do not trigger a rollback.
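The split between required and optional checks can be modeled in a few lines. The following Python sketch is an illustration only, not Greenboot's implementation: `CheckResult` and `evaluate_checks` are hypothetical names, and each check is reduced to a name, a category and a pass/fail flag.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one health check (hypothetical model, not a Greenboot API)."""

    name: str
    required: bool  # True for required checks, False for wanted/optional ones
    passed: bool


def evaluate_checks(results: List[CheckResult]) -> Tuple[bool, List[str]]:
    """Return (healthy, failures_to_log) for a list of check results.

    The system counts as healthy only if every required check passed.
    Failed optional checks are reported for logging but never change the
    verdict, so on their own they cannot trigger a rollback.
    """
    failures = [
        f"{'required' if r.required else 'optional'} check '{r.name}' failed"
        for r in results
        if not r.passed
    ]
    healthy = all(r.passed for r in results if r.required)
    return healthy, failures


if __name__ == "__main__":
    checks = [
        CheckResult("network-online", required=True, passed=True),
        CheckResult("metrics-agent", required=False, passed=False),
    ]
    print(evaluate_checks(checks))
    # → (True, ["optional check 'metrics-agent' failed"])
```

Because optional failures only feed the log, a flaky monitoring agent in this model cannot force a rollback, while a single failed required check can.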

Integration with systemd and OSTree

By leveraging systemd for service management and OSTree for version control of the operating system, Greenboot provides a powerful, integrated solution for maintaining system health. Because OSTree can create and manage bootable system snapshots (deployments), Greenboot can roll back effectively if an update causes issues.
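OSTree keeps the previous deployment on disk next to the new one, which is what makes a rollback cheap. The sketch below models that idea with a plain list of hypothetical deployment names, where the first entry is the default boot target. It approximates the effect of `rpm-ostree rollback` on the deployment order and does not call OSTree.

```python
from typing import List


def rollback(deployments: List[str]) -> List[str]:
    """Model of an OSTree-style rollback on an ordered deployment list.

    deployments[0] is the default deployment the bootloader starts first;
    deployments[1] is the previous deployment kept on disk. Rolling back
    makes the previous deployment the default again while keeping the newer
    one, so nothing has to be downloaded or reinstalled.
    """
    if len(deployments) < 2:
        raise ValueError("no previous deployment to roll back to")
    return [deployments[1], deployments[0], *deployments[2:]]


if __name__ == "__main__":
    print(rollback(["os-v2", "os-v1"]))  # → ['os-v1', 'os-v2']
```

In this model, the older tree becomes the default again and the newer one stays available as the alternative entry.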

How Greenboot works

Greenboot follows a structured approach to manage system health:

System boot
Health check outcomes
Rollback mechanism

System boot

During boot, Greenboot runs health check scripts that are located in /etc/greenboot/check/required.d and /etc/greenboot/check/wanted.d. The scripts in required.d must pass for the boot to be successful, while failures in wanted.d are logged but do not trigger a rollback.
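The directory convention can be sketched as a small runner. This Python example assumes that every executable file in a check directory is a health check, that an exit code of 0 means success and any other value means failure, and that scripts run in name order. `run_check_dir` and `boot_is_healthy` are hypothetical helpers, not part of Greenboot.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Tuple


def run_check_dir(directory: str) -> List[Tuple[str, int]]:
    """Run every executable file in `directory` in name order.

    Returns (script name, exit code) pairs; a missing directory yields an
    empty list. Illustrates the check-directory idea only.
    """
    path = Path(directory)
    if not path.is_dir():
        return []
    results = []
    for script in sorted(path.iterdir()):
        if script.is_file() and os.access(script, os.X_OK):
            proc = subprocess.run([str(script)], check=False)
            results.append((script.name, proc.returncode))
    return results


def boot_is_healthy(check_root: str = "/etc/greenboot/check") -> bool:
    """Healthy only if no script in required.d exited non-zero."""
    required = run_check_dir(os.path.join(check_root, "required.d"))
    wanted = run_check_dir(os.path.join(check_root, "wanted.d"))
    # Failures in wanted.d are only reported, never fatal.
    for name, code in wanted:
        if code != 0:
            print(f"wanted check {name} failed with exit code {code} (logged only)")
    return all(code == 0 for _, code in required)
```

This sketch runs every script even after a required check fails. Stopping at the first required failure would be an equally valid design and would shorten the time to a verdict.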

Health check outcomes

Success: If all required health checks pass, Greenboot executes any scripts in /etc/greenboot/green.d to finalize the boot process.
Failure: If any required health check fails, Greenboot runs scripts in /etc/greenboot/red.d to attempt corrective actions before rebooting. If the issue persists after several retries, Greenboot triggers an OSTree rollback to the previous stable version.
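The two outcomes can be summarized as a decision function. In this sketch, `plan_after_checks` and its `attempts_left` parameter are hypothetical. In Greenboot the retry bookkeeping is handled by the boot counter described under the rollback mechanism, and the step strings here simply echo the directories and command named in this article.

```python
from typing import List


def plan_after_checks(required_passed: bool, attempts_left: int) -> List[str]:
    """Return the ordered steps that follow the health checks (illustrative).

    attempts_left is a hypothetical count of boots still allowed before a
    rollback; it stands in for the bootloader's boot counter.
    """
    if required_passed:
        # Success path: finalize the boot with the green.d scripts.
        return ["run green.d scripts", "mark boot successful"]
    # Failure path: give red.d scripts a chance to take corrective action.
    steps = ["run red.d scripts"]
    if attempts_left > 0:
        steps.append("reboot and retry")
    else:
        steps.append("rpm-ostree rollback --reboot")
    return steps
```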

Rollback mechanism

In the event of repeated failures, Greenboot uses rpm-ostree rollback --reboot to revert the system to the last known good state. This lets the system recover from failed updates without manual intervention. For cases where the system cannot reliably reach Linux userspace at all, the bootloader maintains a boot counter so that it can automatically roll back after a set number of failed boot attempts.
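A boot counter of this kind can be simulated as follows. The starting value of 3, the exact point at which the previous deployment is chosen and the function name `simulate_boots` are all assumptions made for illustration. They are not taken from Greenboot's bootloader scripts.

```python
from typing import Iterable


def simulate_boots(boot_outcomes: Iterable[bool], max_attempts: int = 3) -> str:
    """Return which deployment ends up running: "updated", "previous" or "undecided".

    boot_outcomes yields True for each boot of the updated deployment that
    passes its health checks and False for each boot that fails.
    max_attempts=3 is an assumed value, not a documented default.
    """
    boot_counter = max_attempts  # assumed to be set when the update is staged
    for healthy in boot_outcomes:
        if boot_counter <= 0:
            # Attempts exhausted: the bootloader picks the previous
            # deployment instead of trying the update again.
            return "previous"
        boot_counter -= 1  # one attempt on the updated deployment consumed
        if healthy:
            return "updated"  # checks passed; the counter would now be unset
    return "undecided"  # no more recorded boots, no verdict reached yet
```

With the default value, three failed boots of the update are tolerated in this model and the fourth boot falls back to the previous deployment.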

                              +-------------------------+
     +----------------------->|     System Boot         |
     |                        +-----------+-------------+
     |                                    |
     |                                    |
+--------+                                |
| reboot |                                |
+--------+                                |
     ^                                    |
     |                                    |
Yes  |                                    v
     |                        +-------------------------+
     |                        |  boot_counter == -1 ?   |
     +------------------------+-----------------------+-+
                                                      |
                                                      | No
                                                      v
                                             +-------------------------+
                                             | Continue boot process   |
                                             +-----------+-------------+
                                                         |
                                                         v
                                           +--------------------------+
                                           | greenboot-healthcheck    |
                                           +-----------+--------------+
                                                       |
                                                       v
                                          +--------------------------+
                                          | Run health check scripts  |
                                          | in `required.d` and       |
                                          | `wanted.d` directories    |
                                          +-----------+--------------+
                                                      |
                                                      v
                                  +---------------------------+
                                   | Any required check failed?|
                      +-----------+---------------------------+
                      | No                             | Yes
                      v                                v
       +--------------------------+     +--------------------------+
       |  Boot successful         |     |  Call `redboot.target`   |
       +-----------+--------------+     +-----------+--------------+
                   |                              |
                   v                              v
      +-----------------------------+   +--------------------------+
      | Reach `boot-complete.target`|   | redboot-task-runner      |
      +-----------+-----------------+   | runs `/usr/libexec/      |
                  |                     | greenboot/greenboot red` |
                  v                     +-----------+--------------+
      +-----------------------------+               |
      | greenboot-grub2-set-success |               v
      | unsets `boot_counter` and   |   +--------------------------+
      | sets `boot_success` to 1    |   | Run scripts in `red.d`   |
      +-----------+-----------------+   +-----------+--------------+
                  |                                 |
                  v                                 v
      +-----------------------------+   +--------------------------+
      | greenboot-task-runner runs  |   | greenboot-status.service |
      | `/usr/libexec/greenboot/    |   | creates MOTD with error  |
      | greenboot green` to run     |   | details                  |
      | scripts in `green.d`        |   +-----------+--------------+
      +-----------+-----------------+               |
                  |                                 v
                  v                     +--------------------------+
      +-----------------------------+   | redboot-auto-reboot      |
      | greenboot-status.service    |   | checks if manual         |
      | creates MOTD with success   |   | intervention is needed,  |
      | message                     |   | if not, reboots system   |
      +-----------------------------+   +--------------------------+
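The last step on the failure path, where `redboot-auto-reboot` decides between rebooting and asking for manual intervention, can be sketched as below. We assume here that the deciding factor is whether `boot_counter` is still set in the GRUB environment, since a set counter means the bootloader can still count toward a rollback. Treat this rule, and the helper names `parse_grubenv` and `red_path_decision`, as assumptions rather than a description of the actual service.

```python
from typing import Dict


def parse_grubenv(listing: str) -> Dict[str, str]:
    """Parse `key=value` lines such as those printed by `grub2-editenv - list`."""
    env: Dict[str, str] = {}
    for line in listing.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not key=value pairs
            env[key.strip()] = value.strip()
    return env


def red_path_decision(listing: str) -> str:
    """Decide what to do after red.d scripts ran (assumed rule, see lead-in).

    If a boot counter is still set, rebooting lets the bootloader keep
    counting toward a rollback. If no counter is set, there is no pending
    update to roll back to, so a person has to step in.
    """
    env = parse_grubenv(listing)
    if "boot_counter" in env:
        return "reboot"
    return "manual intervention"
```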

Conclusion

Greenboot offers a robust, automated solution for managing system health and recovering from update failures, reducing the risk of downtime and operational disruptions. By combining customizable health checks with its integration with systemd and OSTree, Greenboot can mitigate the impact of faulty updates and help keep your systems reliable and resilient. Whether you manage a few servers, a fleet of edge devices or many vehicles, Greenboot is an essential tool for maintaining uptime and system integrity in today’s complex IT environments.
