This is part 2 of a four-part series. In part 1, we covered governance: how we made the code base AI-ready. This post details the architectural shift of the migration itself. Future installments will cover verification, specifically how we turned Storybook into a behavioral specification engine (part 3), and infrastructure, focusing on building a unified data layer for the UI, the mocks, and the E2E tests (part 4).
In July 2025, we faced a challenge with one of our React applications: it needed to become a completely different application (new permission model, new API layer, new data management) while the original version kept running in production. Nine months later, we'd replaced the entire stack in a zero-downtime migration. Some regressions still shipped, because no test suite catches everything. But the vast majority were caught before production and the ones that got through told us exactly where to add coverage next.
The counterintuitive part: We spent the first two months writing tests for code we were about to delete.
Note
A note on voice: Where I say "we," I mean the team. Where I say "I," I mean decisions I made personally. The distinction matters because accountability should be specific.
The problem
The access management interface for the Red Hat Hybrid Cloud Console (where administrators manage roles, groups, permissions, and workspaces) was built as a single-version app. Permission checks were baked into the platform shell (a system we call "Chrome"), meaning the UI asked the shell, "Can this user do X?" and trusted whatever came back. A new version (V2) needed to ship alongside the original (V1), backed by Kessel, a standalone authorization service that the UI queries directly instead of relying on the shell. Not after V1. Alongside it.
The business requirement was clear: V2 ships incrementally. V1 stays operational. Customers on V1 see no change. Customers on V2 get the new features. A feature flag controls the switch. Rollback is a flag flip.
That meant we couldn't do a branch-and-merge rewrite. We needed both versions living in the same repository, sharing code where possible, completely isolated where necessary, and both fully tested at all times.
The bet: Specify the old system before touching it
The project had been running for more than a year before I joined. When I took ownership of the repository in July 2025, I assessed two things: the velocity the team could sustain and the deadline (March 2026). The gap between the two was obvious. The only way to close it was to invest in infrastructure before features (tests, boundaries, governance, documentation) so the team and our AI tooling could move faster and safer once the real migration work started.
None of this was in the project brief. The ask was to build V2. I built the infrastructure because the alternative was shipping something I couldn't stand behind.
Like most long-lived production apps, the code base had evolved under real-world pressure: shifting requirements, tight deadlines, team changes. Test coverage was thin and what existed was structural (comparing rendered output, not verifying behavior). That's not unusual. It's the natural state of any system that has been shipping features for years.
So before writing a single line of V2 code, we wrote behavioral specifications for V1. Every table. Every modal. Every wizard. Every permission guard. Tests that ran in a real browser, selected real buttons, and asserted real outcomes.
This felt slow. Two months of writing tests for code we planned to replace. But it was the single most important decision of the entire migration, for two reasons.
First, it forced us to build a complete picture of V1's actual behavior. Any system that has been in production for years accumulates features that aren't fully documented. Edge cases added under deadline pressure. Behaviors that made sense at the time but whose context has been lost. We found several of these: a default group behavior tied to a specific API interaction, permission edge cases that had grown organically, features added in earlier development cycles that hadn't yet been covered by tests. None of this is surprising. It's what happens in production software. But it all became visible only because we tried to specify it.
Second, it gave us a regression safety net. Every subsequent change could be verified against the specification. "Does V1 still behave the way it did before I touched it?" became a question with a mechanical answer, not a judgment call.
By the time we started changing production code, we had 130+ test files watching for regressions.
The architecture: Strangler fig, bottom-up
The strangler fig pattern (replacing a system piece by piece while both versions run) is well-documented for backend services. It's less common in frontend code bases. The key adaptation: we replaced bottom-up, not top-down.
If you replace the UI first, you inherit every bug in the data layer beneath it. If you replace the data layer first, every new component gets a clean foundation. So the sequence was:
- Specification layer (July–September): Write behavioral tests for every existing surface, convert the code base to TypeScript.
- Component layer (October–December): Build shared abstractions, migrate every surface to use them.
- Data layer (January): Replace the state management system feature by feature (roles, users, groups, workspaces), then remove the old one in a single coordinated commit.
- Governance layer (February–March): Unify the permission model, consolidate documentation, harden the mock and test infrastructure.
That's the clean version. Here's what actually happened.
What went wrong
This part is on me. During Phase 2, I got impatient. Instead of converting files to TypeScript and then refactoring them, I tried to do both at once. This caused regressions. The test suite caught some of them, but the pattern was clear: two types of changes in the same commit meant two potential sources of breakage, and when something broke, it was harder to tell which change caused it.
I stopped. Backed up. Split the work into two strict passes: first convert to TypeScript without changing behavior (a mechanical transformation the tests could trivially verify), then refactor the now-typed components. This was slower per file but dramatically safer, because each pass had exactly one reason to fail.
That course-correction taught me the migration's most important operational rule: one kind of change at a time. Convert, then verify. Refactor, then verify. Migrate the data layer, then verify. Never combine transformation types in a single step, no matter how obvious the combined change seems.
The rest of the migration followed that discipline. Each phase was only possible because the previous phase was complete and verified. TypeScript caught errors that JavaScript silently swallowed. Shared components eliminated six independent implementations of the same pattern. The test suite caught every regression during the data layer swap. And the final removal of the old state management system (216 files in a single commit) shipped confidently because 959 tests said the app still worked.
The coexistence boundary
V1 and V2 live side by side in the same repository. A feature flag at the app shell controls which version renders. The directory structure enforces separation: V1 code, V2 code, and a shared layer that neither version owns.
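The flag check itself is deliberately boring. A sketch of its shape (the flag name and function are hypothetical stand-ins, not our actual shell API):

```typescript
type Version = "v1" | "v2";

// Hypothetical flag name; the real flag lives in the platform shell.
const V2_FLAG = "platform.rbac.v2";

function selectVersion(flags: Record<string, boolean>): Version {
  // Default to V1: an absent or misconfigured flag must never strand
  // customers on an unfinished V2. Rollback is flipping the flag off.
  return flags[V2_FLAG] === true ? "v2" : "v1";
}
```

The app shell uses the result to mount one version's entry point or the other; neither bundle ever imports the other.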
The boundary is enforced by a custom lint rule: V1 cannot import from V2, V2 cannot import from V1, and the shared layer cannot import from either. Violations fail the build.
This sounds obvious, but without mechanical enforcement, coexistence boundaries rot. Someone adds a cross-version import "just for this one case," and within a sprint the boundary is fiction. Lint rules don't forget. Lint rules don't make exceptions during crunch.
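Our rule is custom, but a similar boundary can be sketched with `eslint-plugin-import`'s `no-restricted-paths` rule (directory names here are illustrative):

```javascript
// .eslintrc.js (sketch): files in each `target` may not import from `from`.
module.exports = {
  plugins: ["import"],
  rules: {
    "import/no-restricted-paths": [
      "error",
      {
        zones: [
          // Neither version may reach into the other.
          { target: "./src/v1", from: "./src/v2", message: "V1 cannot import from V2." },
          { target: "./src/v2", from: "./src/v1", message: "V2 cannot import from V1." },
          // The shared layer owns neither version and depends on neither.
          { target: "./src/shared", from: "./src/v1", message: "Shared code cannot depend on V1." },
          { target: "./src/shared", from: "./src/v2", message: "Shared code cannot depend on V2." },
        ],
      },
    ],
  },
};
```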
We added three more rules:
- Require a shared utility for every table
- Prevent direct user identity lookups that bypass the shared utility
- Restrict API client imports to the data layer only
The table rule alone eliminated six independent implementations of the same logic; over time, each table had developed its own approach to pagination, sorting, and filtering. Four lint rules, four architectural invariants that hold whether or not anyone is watching.
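The API-client restriction, for instance, can be sketched with ESLint's core `no-restricted-imports` rule plus an override that exempts the data layer (the package and directory names are illustrative, not our actual ones):

```javascript
// .eslintrc.js (sketch): no one imports the API client directly...
module.exports = {
  rules: {
    "no-restricted-imports": [
      "error",
      {
        paths: [
          {
            name: "@example/api-client", // hypothetical package name
            message: "Go through the data layer, not the API client.",
          },
        ],
      },
    ],
  },
  overrides: [
    // ...except the data layer itself.
    {
      files: ["src/data/**"],
      rules: { "no-restricted-imports": "off" },
    },
  ],
};
```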
The test suite as permanent asset
By March 2026, the suite looked like this:
- 959 browser-based interaction tests across 169 test files
- 207 end-to-end tests running against staging with 8 distinct user personas (admin, viewer, etc.)
- 33 full user journey tests covering end-to-end create/edit/delete flows
- 60,772 total lines of test code
That last number matters. It's the reason we could ship a 272-file component library upgrade and a 216-file state management removal with confidence. The test suite isn't overhead. It's the specification that proves V2 does what V1 does.
It also caught real bugs that would have otherwise shipped silently: an API endpoint that didn't support an operation the UI assumed it did, a shared function dependency that caused page freezes for some users, API response shapes that didn't match the SDK's type definitions.
Not everything was caught. Some regressions made it to production, particularly in areas where the behavioral specification was thinnest. Edge cases we hadn't thought to test. Interactions between features that were each correct in isolation. Each one became a new test.
The suite is better now because of what got through, not despite it.
Eliminating flakiness
A flaky test suite is worse than no test suite, because it teaches the team to ignore failures.
We identified the three async patterns responsible for the majority of test flakiness (a double-retry race condition, debug logging that polluted test output, and arbitrary wait timers that masked real timing bugs) and banned all three at the lint level. Not guidelines. Not code review comments. Lint rules that fail the build.
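The timer ban, for example, can be expressed with ESLint's `no-restricted-syntax` rule scoped to test files (a sketch; our actual rules also cover the retry and logging patterns):

```javascript
// .eslintrc.js (sketch): fail the build on arbitrary waits in tests.
module.exports = {
  overrides: [
    {
      files: ["**/*.test.*", "**/*.spec.*"],
      rules: {
        "no-restricted-syntax": [
          "error",
          {
            // Catches setTimeout(...), including sleep-style promise wrappers.
            selector: "CallExpression[callee.name='setTimeout']",
            message:
              "Arbitrary waits mask real timing bugs. Await the actual condition instead.",
          },
        ],
      },
    },
  ],
};
```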
The result: When CI fails, something is actually broken.
What's happening now
The patterns described here (behavioral specification, lint-enforced boundaries, shared abstractions, governance-as-code) are now being adopted across the other repositories our team owns. Not all of them need the full treatment (not every code base has a JavaScript-to-TypeScript migration ahead of it), but the core approach scales: specify the system, enforce the architecture mechanically, and let the test suite guide agents to do the right thing the first time.
What started as a strategy for one migration is becoming how the team works.
What I'd tell you if you're facing the same problem
Specify before you migrate. Write tests for the system you're about to replace. You'll build a complete picture of what it actually does, and you'll get a regression safety net for free.
Replace bottom-up. Data layer before component layer. Shared abstractions before feature code. The sequence matters more than the speed.
Implement one kind of change at a time. Don't convert and refactor in the same step. Don't migrate the data layer and restructure the component in the same commit. Each transformation should have exactly one reason to fail. This is slower per file but dramatically safer per project.
Enforce boundaries with tooling, not with conventions. If two versions coexist, the boundary between them must fail the build when violated. Documentation and code review are not enough.
Ban flakiness patterns, don't fix flaky tests. If you know which patterns cause timing races (and you always do), ban them with lint rules. Prevention beats remediation.
Invest early, harvest late. The bet was that infrastructure compounds. It does.
Try Red Hat Hybrid Cloud Console at console.redhat.com.