Putting OpenShift under pressure – a case study
About a year ago, Red Hat Product Security decided to move its blog, the Red Hat Security Blog, off WordPress.com’s infrastructure and onto Red Hat’s OpenShift. There were some initial growing pains since this was a relatively new thing to do, but it wasn’t long before the blog was in a stable environment. There were plans to put the application on a larger gear (it was hosted on a small gear) and to make it scalable (it wasn’t). But as these things go, once the system was stable those changes were forgotten while other issues, unrelated to the application’s backend, took priority. And for that year WordPress, on a small gear, on OpenShift just worked.
Fast forward to September 23rd, the day before Shellshock became public, when the final touches were being placed on an article that would explain Red Hat’s position on the Bash vulnerability, why it happened, and what we had done to fix the problem. That article would end up testing the blog’s backend to the max.
The day we went public with the vulnerability and the fixes, we published the article and watched eagerly to see if people were actually using it to help answer their questions. People did come, and we started to see pingbacks where others were linking to that article from their own blogs. Even as that first day’s page views exceeded 27,000, more than three times that of the Heartbleed article, no problems were seen from our installation. The views just kept coming in, and the blog kept serving them up. That first day saw over 55,000 views of our blog.
Unfortunately, the first Bash fix wasn’t complete. That led to confusion and a desire for information. While Red Hat Product Security worked diligently to fix the issue, knowledge base articles and updates to our original article were posted. The article and the knowledge base information became one of the de facto standard places to get information about the vulnerability. US-CERT, NIST, and many news organizations pointed their readers at our resources, and the readers did come.
Around mid-morning on Thursday the blog started to fail. Most of the failures came from a lack of capacity to serve the demand. Working with the OpenShift developers, we found that connection rates of roughly 50 to 60 hits per second were being tolerated. Higher rates were being seen, and it was determined that something else would need to be done.
The Security Blog was quickly moved from a small gear to a medium gear, again without scaling. Because WordPress is slightly tricky to set up to be truly scalable, the priority became to just get the system stable and revisit the best means of deploying the site later. The medium gear immediately remedied the outage. With no other changes, the medium gear handled upwards of 100 hits per second for hours on end. Even with the outage we served up 174,244 views of our information that second day.
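To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above; the variable and function names are made up for illustration, and it assumes the daily view count and the hits-per-second rates can be loosely compared. They are not the same unit (a single page view can generate several hits), so treat the ratio as an order-of-magnitude hint rather than a measurement.

```python
# Rough comparison of the daily total against the per-second rates quoted
# in the article. All names here are illustrative, not from any tool.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(total: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average events per second if the total were spread evenly over the period."""
    return total / seconds


day_two_views = 174_244        # views served on the second day (from the article)
rate_before_move = 60          # upper end of the 50 to 60 hits/s tolerated before the move
rate_after_move = 100          # hits/s the medium gear handled for hours

avg_views_per_second = average_rate(day_two_views)

# Assumption: views and hits are treated as comparable here, which they
# are not strictly, so this ratio is only a rough indication of burstiness.
peak_to_average = rate_after_move / avg_views_per_second

print(f"average views per second over the day: {avg_views_per_second:.2f}")
print(f"sustained rate vs. daily average: about {peak_to_average:.0f}x")
print(f"headroom gained by the move: about {rate_after_move / rate_before_move:.1f}x")
```

The point of the sketch is that a daily total says little about sizing: spread evenly, the second day’s views amount to only a couple of requests per second, while the gear actually had to absorb sustained rates many times higher.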
Next steps are already being planned for moving the WordPress instance to even more stable ground. In the works are setting up the application on a large, scalable gear and running Varnish on another gear that is ready to deploy to other geographic areas, which should help reduce the network demand. OpenShift is flexible enough to allow us to change gears (quite literally) to meet demand in a short amount of time, and, with better planning, can meet demand automatically.
Because we self-host our blogging solution, as well as other in-house applications, we’re able to take full control of the system. When availability is a high priority (along with other security requirements), it’s important to maintain that control yourself. OpenShift allows us to do just that.
Red Hat Product Security is a proud user of OpenShift and open source software.