If you’re running SQL Server on Red Hat Enterprise Linux (RHEL) and care about high availability, this is a big one. We’re introducing Pacemaker HA Agent v2 (tech preview), the next evolution of high availability for SQL Server on RHEL. This release focuses on one thing above all: making failovers faster, smarter, and more reliable than before.
Let’s walk through what’s changing and why it matters.
Why Pacemaker matters on RHEL
If you’re coming from the Windows world, you’re used to SQL Server Always On Availability Groups working hand-in-hand with Windows Server Failover Clustering (WSFC). That built-in clustering layer handles everything from health checks to failover decisions. On RHEL, things work a bit differently. SQL Server still provides Availability Groups, but it relies on an external cluster manager to orchestrate everything. That’s where Pacemaker comes in.
Think of Pacemaker as the control plane for high availability on RHEL.
Pacemaker does the following:
- Monitors node and SQL Server health
- Decides when failover should happen
- Coordinates role changes between replicas
- Helps prevent split-brain scenarios
- Manages AG resources and listeners
Sitting between SQL Server and Pacemaker is the Pacemaker HA Agent, the component that translates SQL Server health into something the cluster understands.
What needed improvement?
The original Pacemaker HA Agent (v1) did the job, but customers running real production workloads on RHEL quickly ran into these pain points:
- Failovers could take anywhere from 30 seconds to 2 minutes.
- Health checks weren’t always deep enough, missing things like memory pressure or I/O issues.
- Failover behavior wasn’t very flexible.
- Write-lease handling required extra care.
- No support for modern security standards like TLS 1.3.
In short, it worked, but there was room to make it much better.
Enter Pacemaker HA Agent v2
With SQL Server 2025 CU3, we’re introducing a completely reworked HA agent, built from the ground up with RHEL deployments in mind. One of the biggest changes is that it’s now service-based. Instead of being tightly coupled in older ways, the agent runs as a dedicated system service: mssql-pcsag. This makes it easier to manage and more responsive overall.
On RHEL, you control it just like any other service.
# Start the mssql-pcsag service
sudo systemctl start mssql-pcsag
# Restart the mssql-pcsag service
sudo systemctl restart mssql-pcsag
# Check the status of the mssql-pcsag service
sudo systemctl status mssql-pcsag
# Stop the mssql-pcsag service
sudo systemctl stop mssql-pcsag
This is simple, predictable, and RHEL-native.
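Because the agent is an ordinary systemd unit, its state can also be checked from a script. The following is a minimal sketch, not part of the agent itself: the helper names `agent_is_active` and `agent_report` are hypothetical, and it assumes only the standard `systemctl is-active` subcommand and the `mssql-pcsag` unit name given above.

```shell
# Hypothetical helpers for scripting a health check of the agent service.
SERVICE="mssql-pcsag"

# Succeeds only when systemd reports the unit as active.
agent_is_active() {
    systemctl is-active --quiet "$SERVICE"
}

# Prints a one-line summary; returns non-zero when the service is not active.
agent_report() {
    if agent_is_active; then
        echo "$SERVICE is active"
    else
        echo "$SERVICE is not active"
        # Point the operator at the unit's journal for details.
        echo "hint: sudo journalctl -u $SERVICE --since '10 minutes ago'" >&2
        return 1
    fi
}
```

Calling `agent_report` on a cluster node prints the current state, and its exit code makes it usable in monitoring scripts.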
Benefits
The main improvement is faster, smarter failovers. Instead of relying on basic polling, v2 introduces a more advanced health monitoring model.
SQL Server can now surface richer diagnostic signals, which means:
- Quicker detection of problems
- Faster failover decisions
- Reduced overall downtime
Failover you can tune
One of the biggest gaps between Windows and Linux HA has been flexibility. That changes with v2. You can now configure failure condition levels (1–5) and health check timeouts. This allows you to decide how aggressive or conservative failover should be.
Here is an example:
ALTER AVAILABILITY GROUP pacemakerag
SET (FAILURE_CONDITION_LEVEL = 2);
ALTER AVAILABILITY GROUP pacemakerag
SET (HEALTH_CHECK_TIMEOUT = 60000);
Behind the scenes, these decisions are driven by sp_server_diagnostics, which gives much deeper visibility into SQL Server’s internal state, including:
- Memory pressure
- Deadlocks
- Spinlock issues
- Other engine-level problems
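If you tune several Availability Groups, it can help to generate the two ALTER statements from a script and validate the values first. The sketch below is illustrative only: `ag_tuning_sql` is a hypothetical helper, the 1 to 5 range comes from the text above, and the 15000 ms lower bound is the minimum documented for HEALTH_CHECK_TIMEOUT in the general T-SQL reference, which is assumed here to apply to the v2 agent as well.

```shell
# Hypothetical helper: print the T-SQL that tunes failover sensitivity for one AG.
# Usage: ag_tuning_sql <ag-name> <failure-condition-level> <health-check-timeout-ms>
ag_tuning_sql() {
    ag="$1"
    level="$2"
    timeout_ms="$3"

    if [ -z "$ag" ]; then
        echo "availability group name is required" >&2
        return 2
    fi

    # Failure condition level must be a single digit from 1 to 5.
    case "$level" in
        [1-5]) ;;
        *) echo "failure condition level must be 1-5" >&2; return 2 ;;
    esac

    # Timeout must be a whole number of milliseconds.
    case "$timeout_ms" in
        ''|*[!0-9]*) echo "timeout must be a whole number of milliseconds" >&2; return 2 ;;
    esac

    # Assumption: 15000 ms minimum, as in the general T-SQL documentation.
    if [ "$timeout_ms" -lt 15000 ]; then
        echo "timeout must be at least 15000 ms" >&2
        return 2
    fi

    printf 'ALTER AVAILABILITY GROUP [%s] SET (FAILURE_CONDITION_LEVEL = %s);\n' "$ag" "$level"
    printf 'ALTER AVAILABILITY GROUP [%s] SET (HEALTH_CHECK_TIMEOUT = %s);\n' "$ag" "$timeout_ms"
}
```

The output can be reviewed and then passed to a client, for example `sqlcmd -S localhost -Q "$(ag_tuning_sql pacemakerag 2 60000)"` with whatever authentication options your environment needs. The current values are visible in the `failure_condition_level` and `health_check_timeout` columns of `sys.availability_groups`.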
Better protection against split-brain
If you’ve worked with clustering on RHEL, you know split-brain scenarios are something you absolutely want to avoid. SQL Server uses a write-lease mechanism for this. But in v1, it wasn’t fully integrated into failover decisions. With v2, that changes.
The agent now actively evaluates lease validity before making role changes, resulting in:
- Safer failovers
- Better data consistency
- More predictable behavior during edge cases
Modern security with TLS 1.3
Security also gets an upgrade. Pacemaker HA Agent v2 supports TLS 1.3 for communication between SQL Server and the cluster stack (when enabled), helping align with modern security expectations on RHEL systems.
Where can you run it?
Right now, Pacemaker HA Agent v2 is available in tech preview for use in the following environments:
- Red Hat Enterprise Linux 9 and newer
- SQL Server 2025 CU3+
For now, the preview is intended for non-production use only. If you’re already running SQL Server 2025 on RHEL and want to test v2, the upgrade path is straightforward:
1. Remove the existing AG resource.
sudo pcs resource delete <NameForAGResource>
This pauses synchronization but doesn’t delete your Availability Group.
2. Recreate it using the new agent.
sudo pcs resource create <NameForAGResource> \
ocf:mssql:agv2 \
ag_name=<AGName> \
meta failure-timeout=30s promotable notify=true
3. Check the cluster health.
sudo pcs status
Once that’s done, Pacemaker resumes management and everything continues as expected.
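The three steps above can be wrapped in a small script with a dry-run mode, so you can review the exact commands before touching a test cluster. This is a sketch under assumptions: `run` and `recreate_ag_resource` are hypothetical helper names, the `DRY_RUN` variable is an invention of this example, and the `pcs` arguments are copied from the steps above.

```shell
# Print commands instead of executing them unless DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: recreate_ag_resource <resource-name> <ag-name>
recreate_ag_resource() {
    res_name="$1"
    ag_name="$2"
    if [ -z "$res_name" ] || [ -z "$ag_name" ]; then
        echo "usage: recreate_ag_resource <resource-name> <ag-name>" >&2
        return 2
    fi

    # Step 1: remove the existing AG resource (the AG itself is kept).
    run sudo pcs resource delete "$res_name" || return 1

    # Step 2: recreate it with the v2 agent.
    run sudo pcs resource create "$res_name" ocf:mssql:agv2 \
        "ag_name=$ag_name" \
        meta failure-timeout=30s promotable notify=true || return 1

    # Step 3: check the cluster health.
    run sudo pcs status
}
```

With the default dry-run setting, `recreate_ag_resource <NameForAGResource> <AGName>` only prints the three commands, each prefixed with `+`. Setting `DRY_RUN=0` executes them and stops at the first failure.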
Wrap up
Pacemaker HA Agent v2 is a big step forward for SQL Server on RHEL. It closes long-standing gaps, brings Linux HA behavior closer to what Windows users expect, and most importantly, delivers faster, more reliable failovers. If you’re running mission-critical workloads on RHEL, this is definitely worth exploring.
Check out these resources: