From a developer’s perspective, “incident management” can be an ambiguous term. While the first thing that comes to mind is receiving and responding to alerts, most IT professionals know it covers much more than that. Effective incident management starts with data collection and continues through alerting, escalation, collaboration, and resolution. At the server level, the most important pieces of incident management are infrastructure monitoring and log management, most of which is easily configurable on a Red Hat Enterprise Linux system.
Incident management tools can be grouped into two categories, internal and external, and the security requirements of your organization largely determine which of them you can use.
Internal incident management tools are commercial or open source solutions that are installed and managed on your company’s internal infrastructure. Many organizations have strict security requirements that limit their ability to utilize external third-party SaaS tools, so solutions that can be managed in-house are crucial.
Nagios is one of the most popular open source IT infrastructure monitoring tools on the market today. It is used by hundreds of thousands of users to identify issues in their IT infrastructure as they happen. Nagios provides everything from network analysis to server monitoring and log aggregation, making it one of the most powerful tools your organization can use.
Logstash is an open source log aggregation tool that, when paired with Elasticsearch and Kibana (known as the ELK stack), provides the ability to search and analyze log data from across your entire infrastructure. Beyond simply aggregating log data, Logstash normalizes log data into a common format before it’s inserted into your analytics datastore.
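To make the idea of normalization concrete, here is a minimal Python sketch of mapping two differently shaped log lines onto one common schema. It is not Logstash configuration or Logstash's own code; the two input formats, the field names, and the `normalize` function are assumptions chosen for illustration.

```python
import re

# Hypothetical format 1: "<ISO timestamp> <host> <app>: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"(?P<host>\S+) (?P<app>[\w\-]+): (?P<msg>.*)$"
)
# Hypothetical format 2: key=value pairs, with optional double-quoted values
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def normalize(line: str) -> dict:
    """Return the line as a dict with the same four keys, whatever its format."""
    match = SYSLOG_RE.match(line)
    if match:
        return {
            "timestamp": match["ts"],
            "host": match["host"],
            "source": match["app"],
            "message": match["msg"],
        }
    # Fall back to key=value parsing; missing fields become None and
    # an unparseable line is kept whole as the message.
    fields = {key: value.strip('"') for key, value in KV_RE.findall(line)}
    return {
        "timestamp": fields.get("time"),
        "host": fields.get("host"),
        "source": fields.get("app"),
        "message": fields.get("msg", line),
    }
```

Once every record shares the same keys, a search such as "all messages from host db01 in the last hour" no longer needs to know which application wrote the line.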
Monit is a popular open source tool for managing and monitoring Unix systems. The beauty of Monit is that it not only monitors your system and alerts you to issues, it can also be configured to respond automatically and repair the system.
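The monitor, alert, and repair pattern described here can be sketched in a few lines of Python. This is not Monit's configuration syntax or its implementation; `check_and_repair` and its parameters are hypothetical names, and the health check, repair action, and alert channel are passed in as plain callables so the logic stays independent of any particular service.

```python
from typing import Callable


def check_and_repair(
    name: str,
    is_healthy: Callable[[], bool],
    repair: Callable[[], None],
    alert: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Check a service, try to repair it if needed, and report what happened.

    Returns True if the service is healthy when the function returns.
    """
    if is_healthy():
        return True

    alert(f"{name} is unhealthy; attempting repair")
    for attempt in range(1, max_attempts + 1):
        repair()
        if is_healthy():
            alert(f"{name} recovered after {attempt} attempt(s)")
            return True

    # Automatic repair failed, so a human needs to take over.
    alert(f"{name} still unhealthy after {max_attempts} attempts; escalating")
    return False
```

In practice `is_healthy` might probe a port or a process, and `repair` might restart a service; those choices are left to the caller. Capping the number of attempts keeps a broken service from being restarted forever without anyone being told.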
External incident management tools are software-as-a-service (SaaS) solutions that are built and managed by a third-party provider. These tools are often far more cost-effective for smaller organizations. While external solutions aren’t always practical from a security standpoint, they can provide significantly more features and integrations than internal tools.
PagerDuty is an incident management tool with an emphasis on incident resolution. Through integrations with almost every popular incident management tool available (including the three internal tools mentioned above), PagerDuty is an incredibly powerful alerting and escalation tool, and it’s used by some of the most popular tech companies in the world.
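Escalation, in general terms, means notifying additional people the longer an incident goes unacknowledged. The Python sketch below illustrates that idea only; it does not use or describe PagerDuty's API, and `EscalationStep`, `responders_to_notify`, and the responder names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes without acknowledgement before this step fires
    responder: str      # who gets notified at this step


def responders_to_notify(
    policy: List[EscalationStep], minutes_unacknowledged: int
) -> List[str]:
    """List everyone who should have been notified by now, earliest step first."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [
        step.responder
        for step in ordered
        if step.after_minutes <= minutes_unacknowledged
    ]
```

With a policy of primary on-call at 0 minutes, secondary on-call at 15, and an engineering manager at 30, an incident that has gone 20 minutes without acknowledgement would have reached the first two responders.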
Logentries is a third-party log management and analytics tool that provides much of the same functionality as Logstash, but without all of the configuration overhead. With the ability to search, visualize, and monitor your logs, your team can easily run down issues at any time, from anywhere.
As far as application monitoring goes, New Relic is the holy grail. It allows your team to analyze application performance along the entire stack. By combining user monitoring, application monitoring, and availability monitoring, New Relic is an insanely powerful weapon to have in any organization’s arsenal.
Effective incident management is about more than just the toolset you use; it is also about the process. However, no matter how refined your process is, the right tools can take things to the next level. Thanks to the hard work done on both open source and commercial solutions, you should have no problem finding the right tool for your application.
Zachary Flower (@zachflower) is a freelance web developer, writer, and polymath. He has an eye for simplicity and usability, and strives to build products with both the end user and business goals in mind. From building projects for the NSA to creating features for companies like Name.com and Buffer, Zach has always taken a strong stand against needlessly reinventing the wheel, often advocating for the use of well-established third-party and open source services and solutions to improve the efficiency and reliability of a development project.
Last updated: March 16, 2023