How Red Hat re-designed its Single Sign On (SSO) architecture, and why.
Red Hat, Inc. recently released the Red Hat SSO product, which is an enterprise application designed to provide federated authentication for web and mobile applications.
In the SAML world, RH SSO is known as an Identity Provider (IdP), meaning its role in life is to authenticate and authorize users for use in a federated identity management system. For example, it can be used to authenticate internal users against a corporate LDAP instance such that they can then access the corporate Google Docs domain.
Red Hat IT recently re-implemented our customer-facing authentication system, building the platform on Red Hat SSO. This system serves all Red Hat properties, including www.redhat.com and access.redhat.com. Our previous IdP was custom-built on the JBoss EAP PicketLink framework.
While this worked for the original SAML use-case, our development teams were seeking an easier integration experience and support for the OAuth and OpenID Connect protocols. Red Hat SSO comes out of the box with full SAML 2.0, OAuth 2.0, and OpenID Connect support. Re-implementing the IdP from the ground up gave us a chance to re-architect the solution, making the system much more performant and resilient. While outages were never really acceptable in the past, our customers now expect 24/7 uptime. This is especially true with Red Hat’s increased product suite, including hosted offerings such as OpenShift Online.
We set out on this project with several initial goals. We needed:
- Multiple highly available clusters distributed in different geographic regions
- Sub-second authentication response time (the old system took roughly 2 seconds to return after a password entry)
- A self-healing architecture
- Security from the ground up, everything encrypted… everywhere.
- A custom theme and branding
- Custom data providers
To begin with, we started at the database level. RH SSO requires a database backend for storing realm and user data. While it ships with an integrated H2 database, it is a good idea to use an external database service, especially for production traffic. Any of the supported EAP 7 databases will work; we selected MariaDB 10.1.
We knew that we wanted multiple clusters in multiple locations, so we gave MariaDB multi-master replication a try. During performance testing, we encountered several cases where replication was interrupted because conflicting records were added at each site at the same time. This led us to consider other topologies, and we ended up treating each site as an independent cluster with no database replication. We were able to get away with this because the SSO database is not the source of authority for our users and because, when a user accesses sso.redhat.com, they are bound to that cluster for the remainder of the session. Users will fail over to other clusters in the case of a site outage, but that requires re-authenticating.
User data is stored within a few distinct data services, for which we developed custom user federation providers. When a user first logs into a cluster, a user record is created in the RH SSO database. Returning users have their information refreshed upon login. This allows us to keep the sites as completely separate clusters. The downside is that we have to configure service providers (federated identity consumers) multiple times, once for each region, but a script around the RH SSO API solves that problem.
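As an illustration of what a script around the API could look like, here is a minimal Python sketch that prepares the same client (service provider) definition for every regional cluster. It is not the script we run. The cluster URLs, the realm name and the per-cluster admin tokens are placeholders, and the endpoint path assumes the admin REST API layout of the upstream Keycloak project (`/auth/admin/realms/{realm}/clients`).

```python
import json
import urllib.request

# Hypothetical per-region base URLs; the real host names are not given here.
CLUSTERS = [
    "https://sso-region-a.example.com",
    "https://sso-region-b.example.com",
]


def build_client_requests(clusters, realm, client, tokens):
    """Build one 'create client' request per cluster without sending it.

    `tokens` maps each cluster base URL to an admin bearer token, because
    independent clusters each issue their own tokens.
    """
    body = json.dumps(client).encode("utf-8")
    requests = []
    for base in clusters:
        # Assumed path, following the upstream Keycloak admin REST API.
        url = f"{base.rstrip('/')}/auth/admin/realms/{realm}/clients"
        req = urllib.request.Request(url, data=body, method="POST")
        req.add_header("Content-Type", "application/json")
        req.add_header("Authorization", f"Bearer {tokens[base]}")
        requests.append(req)
    return requests


def register_everywhere(clusters, realm, client, tokens):
    """Send the same definition to every cluster; return HTTP status per URL."""
    results = {}
    for req in build_client_requests(clusters, realm, client, tokens):
        with urllib.request.urlopen(req) as resp:
            results[req.full_url] = resp.status
    return results
```

Keeping request construction separate from sending makes it easy to review exactly what would be pushed to each region before anything changes.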
Each RH SSO application cluster has multiple nodes. User session state is shared among nodes using the standard JBoss EAP Infinispan session replication mechanism. We decided to use TCP transport rather than multicast, as this allows us to span availability zones in both our data center and cloud vendors. Incoming connections to the cluster pass through a set of highly available load balancers. The load balancer technology varies based on the site. Per our internal security policy, TLS sessions are encrypted end to end between the user’s browser and the application nodes. For some load balancers, this requires TCP pass-through, in which the load balancer cannot inject a session affinity cookie because it cannot decrypt the connection. In these cases, the only session stickiness option is source IP, which can change frequently when the service is front-ended by a CDN. This is not ideal, but Infinispan distributes user sessions to all cluster nodes, so a request that lands on a different node still finds the session.
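To see why source-IP stickiness is fragile behind a CDN, consider a simplified Python sketch of a hash-based balancer. The node names and the hashing scheme are assumptions for illustration only; real load balancers use their own algorithms.

```python
import hashlib

NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def pick_node_by_source_ip(source_ip, nodes=NODES):
    """Map a source IP to a node the way a hash-based TCP balancer might."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


# The same source address always maps to the same node...
first = pick_node_by_source_ip("203.0.113.10")
again = pick_node_by_source_ip("203.0.113.10")

# ...but if the CDN forwards the next request from another edge address,
# nothing ties that request to the node that served the previous one.
other_edge = pick_node_by_source_ip("198.51.100.7")
```

Stickiness here depends entirely on the address the balancer sees, which is the CDN edge rather than the user, so replicating sessions to every node is what keeps the login usable.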
The Content Distribution Network (CDN) that we use hosts https://sso.redhat.com and directs traffic based on a variety of metrics, but most importantly response time. Should all sites be unavailable, users are redirected to static outage content.
User state is limited to an individual cluster, as Infinispan is not recommended for use across a WAN. In order to maintain a consistent user experience, the CDN continues to serve all subsequent requests from the site that handled the initial login event. The CDN sets a cookie to ensure this operates correctly. Any time a new service provider is accessed, authorization is sourced from this same RH SSO cluster, as SAML operations occur through user browser redirects.
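The routing behavior described above can be summarized in a small Python sketch. This is a model of the decision, not the CDN's actual configuration; the site names, the `Site` type and the `OUTAGE_PAGE` marker are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

OUTAGE_PAGE = "static-outage"  # placeholder for the static outage content


@dataclass
class Site:
    name: str
    healthy: bool
    response_ms: float


def choose_site(sites, affinity_cookie: Optional[str] = None) -> str:
    """Pick the site that should serve a request."""
    healthy = {s.name: s for s in sites if s.healthy}
    if not healthy:
        # Every site is down: serve the static outage content.
        return OUTAGE_PAGE
    if affinity_cookie in healthy:
        # The user already logged in at this site and it is still up.
        return affinity_cookie
    # New user, or the pinned site failed: pick the fastest healthy site.
    return min(healthy.values(), key=lambda s: s.response_ms).name
```

When the pinned site is unhealthy the user lands on another cluster, which has no record of the session, so a fresh login is required there.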
RH SSO should be able to cover most use-cases using native functionality; however, the reality of enterprise environments is that customizations are usually required. This is especially true when dealing with web-based applications. Fortunately, RH SSO supports what is called an SPI (Service Provider Interface). This allows an organization to override or extend RH SSO to meet their needs. These SPIs, and how to use them, are well documented in the upstream project’s documentation.
While RH SSO supports LDAP and Active Directory users out of the box, custom federation providers can be developed to integrate it with other data sources. We ended up developing multiple federation providers, allowing us to source users directly from several data services and fail-over to others when necessary. More information on federation providers may be found at https://keycloak.gitbooks.io/server-developer-guide/content/v/2.2/topics/user-federation.html
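The lookup logic behind such a provider can be sketched in a few lines of Python. Real federation providers are Java classes written against the SPI, so treat this only as a model of the control flow. The `SourceUnavailable` exception, the callables standing in for data services, and the rule of moving on to the next source when a user is not found are all assumptions.

```python
class SourceUnavailable(Exception):
    """Raised by a data source that cannot be reached."""


def lookup_user(username, sources):
    """Try each data source in order, skipping any that are unavailable."""
    for source in sources:
        try:
            record = source(username)
        except SourceUnavailable:
            continue  # fail over to the next data service
        if record is not None:
            return record
    return None


def on_login(username, sources, local_db):
    """Create the local record on first login, refresh it on later logins."""
    record = lookup_user(username, sources)
    if record is None:
        return None
    local_db[username] = dict(record)  # insert or overwrite
    return local_db[username]
```

Because the local record is rebuilt from the authoritative services at every login, each cluster's database can stay independent of the others.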
As one would expect, RH SSO can be themed for providing users a seamless experience. As such, we have developed a custom theme to match the look and feel of other Red Hat web properties.
Packaging and Init scripts
RH SSO, a middleware product, was developed to be deployable on virtually any platform supporting Java 8. It is currently available as a zip archive or as a container for use on OpenShift 3. We deployed RH SSO before it was officially released and before official containers were built.
This meant we deployed the zip archive on RHEL 7, which included managing the zip file in our configuration management system (Puppet and Ansible) and writing our own systemd unit file. Fortunately, more RHEL-friendly packaging will be available in future versions.
RH SSO has a built-in event system, enabling a high degree of customization around logging events. While the built-in logs do provide detailed information, we wanted to feed the SSO logs into our log aggregation system. It was easiest to have this information correlated in a single log that could then be indexed and used to monitor the system.
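As a rough illustration of the single-log idea, the Python sketch below flattens one event into a single JSON line that a log shipper could index. An actual listener would be written in Java against the event listener SPI; the dictionary shape and field names used here (`type`, `details` and so on) are assumptions, not the product's log format.

```python
import json


def event_to_log_line(event):
    """Flatten one event dict into a single JSON line for a log shipper."""
    # Copy the top-level fields, then hoist nested details to dotted keys
    # so every value can be indexed as a flat field.
    flat = {k: v for k, v in event.items() if k != "details"}
    for key, value in (event.get("details") or {}).items():
        flat[f"details.{key}"] = value
    return json.dumps(flat, sort_keys=True)


line = event_to_log_line(
    {"type": "LOGIN", "clientId": "example-sp", "details": {"username": "jdoe"}}
)
```

One event per line with flat keys keeps the aggregation side simple, since each line can be parsed and correlated without any multi-line handling.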
Admin role-based access control model
The role-based access control (RBAC) model of RH SSO is fairly coarse at this point in time. One of the huge benefits of the product is that it is extremely easy to add new service providers, such as Google Docs or Dropbox. Since it is so easy to add these providers, our team would like to delegate maintenance of each provider to the group who supports the corresponding application.
The problem is that the RH SSO privileges required for managing service providers actually allow access to modify all service provider configurations. As such, there is no way we can safely delegate this access to application teams without also giving them access to modify other groups’ configuration. There is a pending feature request to support this in the product.
Given our multi-site architecture and how RH SSO handles session replication, only SAML and the OAuth implicit flow operate correctly.
Applications requiring the OAuth code flow, which relies upon server-to-server communication, will not work. A user will be directed to one cluster, but the back-end service to IdP communication will likely involve a different RH SSO cluster, one where the user does not have an active session. Currently, the only way to resolve this is to revert to an active/passive architecture. This is not something that we want to do. The other option is to use a database replication technology like Galera combined with spanning Infinispan across WAN links.
We have business requests for server-to-IdP OAuth flows and are currently exploring the latter option. We are also working with the RH SSO development teams to articulate the use-case and work towards enhancing the product to better support multi-site deployments.
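The code-flow failure can be reproduced with a toy Python model of two independent sites. The `Cluster` class, its method names and the token format are invented for illustration; the only point is that an authorization code issued by one site is unknown to a site that shares no state with it.

```python
import secrets


class Cluster:
    """One independent SSO site with its own, unreplicated state."""

    def __init__(self, name):
        self.name = name
        self._codes = {}  # authorization code -> username

    def authorize(self, username):
        """Browser step: the logged-in user receives an authorization code."""
        code = secrets.token_urlsafe(16)
        self._codes[code] = username
        return code

    def exchange(self, code):
        """Back-channel step: the application trades the code for a token."""
        username = self._codes.pop(code, None)
        if username is None:
            raise LookupError(f"{self.name}: unknown authorization code")
        return f"token-for-{username}"


east, west = Cluster("east"), Cluster("west")
code = east.authorize("jdoe")  # the user's browser was routed to "east"
try:
    west.exchange(code)  # the back end happened to reach "west"
    exchange_failed = False
except LookupError:
    exchange_failed = True
```

SAML and the implicit flow avoid this because every step travels through the user's browser, which stays pinned to one site.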
Overall, the migration to RH SSO was fairly seamless even though it required a large amount of coordination with all service providers. Switching between IdPs is never an easy thing to do; however, this migration is paying many dividends. As a result, we have a much more resilient and scalable system. Additionally, we now have the benefits of:
- Support for two-factor authentication and social authentication (Facebook, Google, etc.)
- Individual Service Provider settings: SAML configurations are defined at a per-SP level, allowing individual customizations.
- Full OAuth 2.0 and OpenID Connect support
- A RESTful API and an admin GUI
- Built on RHEL 7 and EAP 7
Most importantly, RH SSO is a stand-alone product rather than the build-your-own solution we had previously. It is Red Hat’s official SAML and OAuth Identity Provider and can be deployed alongside Red Hat IdM.
Currently, Red Hat SSO is available as a zip file or a container image. If you are looking to get your feet wet with it and do not have product subscriptions, you can check out the upstream open source Keycloak project, upon which RH SSO is built.
docker pull registry.access.redhat.com/redhat-sso-7/sso70-openshift
About the Author
Brian J. Atkisson is a Senior Principal Systems Engineer and the technical lead on the Red Hat IT Identity and Access Management team. He has 18 years of experience as a Systems Administrator and Systems Engineer, focusing on identity management, virtualization, systems integration, and automation solutions. He is a Red Hat Certified Architect and Engineer, in addition to his academic background in Biochemistry, Microbiology and Philosophy.