Alertmanager Watchdog monitoring with Nagios passive checks

April 29, 2020
Huub Daems
Related topics: DevOps, Python, Kubernetes, Linux

    After installing a fresh Red Hat OpenShift cluster, go to Monitoring -> Alerting. There, you will find a Watchdog alert, which sends messages to let you know that Alertmanager is not only still running, but is also emitting other signals for alerts you might be interested in. You can hook into Watchdog alerts with an external monitoring system, which in turn can tell you that alerting in your OpenShift cluster is working.

    "You need a check to check if your check checks out."

    How do you do this? Before we can configure Alertmanager for sending out Watchdog alerts, we need something on the receiving side, which is in our case Nagios. Follow me on this journey to get Alertmanager's Watchdog alerting against Nagios with a passive check.

    Set up Nagios

    OpenShift is probably not the first infrastructure element running under your supervision. That is why we start by capturing a message from OpenShift with a self-made Python HTTP receiving server (adapted from an example on the Python 3 website), just to learn how to configure Alertmanager and to possibly modify the received alert message.

    Also, you probably already have Nagios, Checkmk, Zabbix, or something else for external monitoring and alerting. For this journey, I chose Nagios because it is a simple, precooked option available via yum install nagios. Nagios normally only does active checks, meaning that Nagios is the initiator of a check that you configure. To know whether the OpenShift Alertmanager is working, we need a passive check in Nagios.

    So, let's go and let our already existing monitoring system receive something from Alertmanager. Start by installing Nagios and the needed plugins:

    $ yum -y install nagios nagios-plugins-ping nagios-plugins-ssh nagios-plugins-http nagios-plugins-swap nagios-plugins-users nagios-plugins-load nagios-plugins-disk nagios-plugins-procs nagios-plugins-dummy

    Let's be more secure and change the provided default password for the Nagios administrator, using htpasswd:

    $ htpasswd -b /etc/nagios/passwd nagiosadmin <very_secret_password_you_created>
    

    Note: If you also want to change the admin's username nagiosadmin to something else, don't forget to change it also in /etc/nagios/cgi.cfg.
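
    For reference, these are the kinds of entries in /etc/nagios/cgi.cfg that reference the admin account (a sketch of the stock file; your copy may differ):

    authorized_for_system_information=nagiosadmin
    authorized_for_configuration_information=nagiosadmin
    authorized_for_system_commands=nagiosadmin
    authorized_for_all_services=nagiosadmin
    authorized_for_all_hosts=nagiosadmin
    authorized_for_all_service_commands=nagiosadmin
    authorized_for_all_host_commands=nagiosadmin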

    Now, we can enable and start Nagios for the first time:

    $ systemctl enable nagios
    $ systemctl start nagios
    

    Do not forget that every time you modify your configuration files, you should run a sanity check on them. It is important to do this before you (re)start Nagios Core since it will not start if your configuration contains errors. Use the following to check your Nagios configuration:

    $ /sbin/nagios -v /etc/nagios/nagios.cfg
    $ systemctl reload nagios
    $ systemctl status -l nagios
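
    If the configuration is clean, the pre-flight check ends with output along these lines (abridged; the exact wording can vary between versions):

    Total Warnings: 0
    Total Errors:   0

    Things look okay - No serious problems were detected during the pre-flight check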
    

    Dump HTTP POST content to a file

    Before we start configuring, we first need an HTTP POST receiver program in order to receive a message from Alertmanager via a webhook configuration. Alertmanager sends a JSON message to an HTTP endpoint. To handle that, I created a very basic Python program that dumps all data received via POST into a file:

    #!/usr/bin/env python3

    from http.server import HTTPServer, BaseHTTPRequestHandler
    from io import BytesIO

    class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'Hello, world!')

        def do_POST(self):
            # Read the POST body and echo it back to the sender
            content_length = int(self.headers['Content-Length'])
            body = self.rfile.read(content_length)
            self.send_response(200)
            self.end_headers()
            response = BytesIO()
            response.write(b'This is a POST request. ')
            response.write(b'Received: ')
            response.write(body)
            self.wfile.write(response.getvalue())
            # Dump the received payload to a file for inspection
            with open('/tmp/content.json', 'w') as dump_json:
                dump_json.write(body.decode('utf-8'))

    httpd = HTTPServer(('localhost', 8000), SimpleHTTPRequestHandler)
    httpd.serve_forever()

    The above program definitely needs some rework. Both the location and format of the output in the file have to be changed for Nagios.
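
    Before reworking it, give the receiver a quick smoke test: run the program, POST some data to it, and confirm the payload lands in the dump file (receiver.py is a file name chosen here for illustration):

    $ python3 receiver.py &
    $ curl localhost:8000 -d '{"test": "data"}' -X POST
    This is a POST request. Received: {"test": "data"}
    $ cat /tmp/content.json
    {"test": "data"}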

    Configure Nagios for a passive check

    Now that this rudimentary receive program is in place, let's configure the passive checks in Nagios. I added a dummy command to the file /etc/nagios/objects/commands.cfg. That is what I understood from the Nagios documentation, though it was not entirely clear to me whether that is the right place and the right information. In the end, the process worked for me. So keep following along; the goal is to have Alertmanager's Watchdog show up in Nagios.

    Add the following to the end of the commands.cfg file:

    define command {
        command_name    check_dummy
        command_line    $USER1$/check_dummy $ARG1$ $ARG2$
    }
    
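    The check_dummy plugin simply exits with the state you give as its first argument and prints the optional text you give as its second. You can try it by hand to see what Nagios will report when this check fires (the plugin path below is typical for an x86_64 RHEL/CentOS install and may differ on your system):

    $ /usr/lib64/nagios/plugins/check_dummy 2 "Alertmanager FAIL"
    CRITICAL: Alertmanager FAIL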

    Then add this to the server's service object .cfg file:

    define service {
        use                       generic-service
        host_name                 box.example.com
        service_description       OCPALERTMANAGER
        notifications_enabled     0
        passive_checks_enabled    1
        check_interval            15 ; 1.5 times watchdog alerting time
        check_freshness           1
        check_command             check_dummy!2 "Alertmanager FAIL"
    }
    
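    The combination of check_freshness and check_command is what turns this into a useful passive check: as long as passive results keep arriving, Nagios accepts them, but once no result has arrived within the freshness window, Nagios runs check_dummy!2 itself and forces the service into CRITICAL with the text "Alertmanager FAIL". Note that check_interval is expressed in time units of 60 seconds by default, so 15 means 15 minutes. If you prefer the window to be explicit rather than derived from check_interval, you could add the following optional directive (not part of the original setup):

    freshness_threshold       900 ; seconds; alarm if no passive result for 15 minutes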

    It would be nice if we could check that this is working via curl, but first, we have to change the sample Python program. It writes to a file by default, and for this example, it must write to a Nagios command_file.

    This is the adjusted Python program to write to the command_file with the right service_description:

    #!/usr/bin/env python3

    from http.server import HTTPServer, BaseHTTPRequestHandler
    from io import BytesIO
    import time

    class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'Hello, world!')

        def do_POST(self):
            content_length = int(self.headers['Content-Length'])
            body = self.rfile.read(content_length)
            self.send_response(200)
            self.end_headers()
            response = BytesIO()
            response.write(b'This is a POST request. ')
            response.write(b'Received: ')
            response.write(body)
            self.wfile.write(response.getvalue())
            # Nagios external command format:
            # [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output
            msg_string = "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}"
            timestamp = int(time.time())  # Nagios expects an integer epoch timestamp
            hostname = "box.example.com"
            servicedesc = "OCPALERTMANAGER"
            severity = 0  # 0 = OK
            comment = "OK - Alertmanager Watchdog\n"
            # The Nagios command file is a named pipe; writing to it submits the result
            with open('/var/spool/nagios/cmd/nagios.cmd', 'w') as cmd_file:
                cmd_file.write(msg_string.format(timestamp, hostname, servicedesc, severity, comment))

    # Bind to all interfaces (not just localhost) so Alertmanager can reach
    # the receiver from the cluster
    httpd = HTTPServer(('', 8000), SimpleHTTPRequestHandler)
    httpd.serve_forever()
    
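    To keep the receiver running permanently, a small systemd unit is more durable than a shell session. This is only a sketch, assuming the script is saved as /usr/local/bin/alertmanager-receiver.py (a path chosen here for illustration) and that the nagios user may write to the command file:

    [Unit]
    Description=Alertmanager webhook receiver for Nagios
    After=network.target

    [Service]
    User=nagios
    ExecStart=/usr/bin/python3 /usr/local/bin/alertmanager-receiver.py
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target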

    And with a little curl, we can check that the Python program has a connection with the command_file and that Nagios can read it:

    $ curl localhost:8000 -d OK -X POST
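
    Given the hard-coded values in the program, every POST should write a line like this into the command file (the timestamp will differ):

    [1585215450] PROCESS_SERVICE_CHECK_RESULT;box.example.com;OCPALERTMANAGER;0;OK - Alertmanager Watchdog

    Shortly afterwards, the passive result should also appear in /var/log/nagios/nagios.log, and the OCPALERTMANAGER service should turn green in the web UI.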
    

    Now we only have to trigger the POST action. All of the information sent to Nagios is hard-coded in this Python program. Hard coding this kind of information is really not the best practice, but it got me going for now. At this point, we have an endpoint (SimpleHTTPRequestHandler) to which we can connect Alertmanager via a webhook to an external monitoring system—in this case, Nagios with an HTTP helper program.
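
    If you want to move beyond the hard-coded values, the webhook payload itself (shown at the end of this article) carries everything needed. Here is a minimal sketch, assuming the same Watchdog-only routing used in this article, of how do_POST could derive the check result from the JSON body instead (check_result_from_payload is a hypothetical helper name):

    import json

    def check_result_from_payload(body):
        """Derive a Nagios return code and comment from an Alertmanager webhook payload."""
        payload = json.loads(body)
        alertname = payload.get("commonLabels", {}).get("alertname", "unknown")
        # For the Watchdog alert, "firing" is the healthy state:
        # it proves the alerting pipeline is alive.
        if payload.get("status") == "firing":
            return 0, "OK - Alertmanager {}\n".format(alertname)
        return 2, "CRITICAL - Alertmanager {}\n".format(alertname)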

    Configure the webhook in Alertmanager

    To configure Alertmanager's Watchdog, we have to adjust the alertmanager.yaml file stored in the alertmanager-main secret. To get that file out of OpenShift, use the following command:

    $ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' |base64 -d > alertmanager.yaml

    Then add a route and a receiver for the Watchdog alert, so that alertmanager.yaml looks like this:

    global:
      resolve_timeout: 5m
    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'default'
      routes:
      - match:
          alertname: 'Watchdog'
        repeat_interval: 5m
        receiver: 'watchdog'
    receivers:
    - name: 'default'
    - name: 'watchdog'
      webhook_configs:
      - url: 'http://nagios.example.com:8000/'
    

    Note: On the Prometheus website, you can see the possible alerting endpoints. As I found out with webhook_config, you should write that key in its plural form (webhook_configs) in alertmanager.yaml. Also, check out the example provided on the Prometheus GitHub.

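    Before loading the file back into the cluster, you can also lint it locally if you have Alertmanager's amtool at hand (an optional step, not part of the original workflow):

    $ amtool check-config alertmanager.yaml
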
    To get our new fresh configuration back into OpenShift, execute the following command:

    $ oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run -o=yaml | oc -n openshift-monitoring replace secret --filename=-
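
    To confirm the cluster picked up the change, you can re-run the extraction command from earlier and check that the returned file matches what you just loaded:

    $ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' |base64 -d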
    

    In the end, Nagios will receive something similar to the following. Actually, this is the message the Watchdog sends, via webhook_configs, to Nagios:

    {"receiver":"watchdog",
    "status":"firing",
    "alerts":[
    {"status":"firing",
    "labels":
    {"alertname":"Watchdog",
    "prometheus":"openshift-monitoring/k8s",
    "severity":"none"},
    "annotations":
    {"message":"This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n\"DeadMansSnitch\" integration in PagerDuty.\n"},
    "startsAt":"2020-03-26T10:57:30.163677339Z",
    "endsAt":"0001-01-01T00:00:00Z",
    "generatorURL":"https://prometheus-k8s-openshift-monitoring.apps.box.example.com/graph?g0.expr=vector%281%29\u0026g0.tab=1",
    "fingerprint":"e25963d69425c836"}],
    "groupLabels":{},
    "commonLabels":
    {"alertname":"Watchdog",
    "prometheus":"openshift-monitoring/k8s",
    "severity":"none"},
    "commonAnnotations":
    {"message":"This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n\"DeadMansSnitch\" integration in PagerDuty.\n"},
    "externalURL":"https://alertmanager-main-openshift-monitoring.apps.box.example.com",
    "version":"4",
    "groupKey":"{}/{alertname=\"Watchdog\"}:{}"}
    

    In the end, if all went well, you will see a nice green OCPALERTMANAGER service in the Nagios services overview.

    If you want to catch up with Nagios passive checks, read more at Nagios Core Passive Checks.

    Thanks for joining me on this journey!

    Last updated: June 29, 2020
