
Set up a streams for Apache Kafka cluster with Ansible

July 30, 2024
Romain Pelisse
Related topics:
Automation and management, DevOps, Linux, Runtimes, Stream processing
Related products:
Red Hat Ansible Automation Platform, Streams for Apache Kafka


    Based on the Apache Kafka project, streams for Apache Kafka is a powerful, innovative distributed event streaming platform. Because of its fully distributed architecture, a streams for Apache Kafka cluster requires you to set up and deploy numerous services, ensure they are all properly configured and, even more importantly, verify that each component is functional and can communicate with the others.

    For all these reasons, applying an automation solution such as Red Hat Ansible Automation Platform to deploy and manage streams for Apache Kafka makes a lot of sense, especially as Ansible now has a collection dedicated to the product.

    Overview

    As mentioned, a typical implementation of streams for Apache Kafka includes several components, each deployed on numerous target systems. Depending on the organization of the servers, some of those components may actually be collocated.

    At the core, however, there are two main parts: the ZooKeeper instances, often abbreviated zk, in charge of orchestrating the cluster, and the brokers, responsible for processing requests.

    Beyond those foundational elements, a streams for Apache Kafka deployment can include other services, such as Kafka Connect, to help integrate with numerous systems. We'll first tackle the deployment of the cluster itself before discussing this extra service.

    Throughout this article, we will document step-by-step how to deploy each of those components using Ansible. To be thorough and easy to follow, however, we are going to first describe the setup of the Ansible controller itself to ensure the reader has all the information necessary to properly reproduce this demonstration.

    Preparing the Ansible controller

    Install Ansible

    In Ansible's lexicon, the system executing the automation tool itself is called the "controller". The machines handled by the controller are referred to as "targets".

    To set up a controller on Red Hat Enterprise Linux 9 (RHEL 9), the very first step is to enable the Ansible Automation Platform subscription on the system:

    # subscription-manager repos --enable=ansible-automation-platform-2.4-for-rhel-9-x86_64-rpms

    Then, Ansible can be installed simply by using dnf:

    # dnf install -y ansible-core
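    You can quickly confirm that the installation succeeded (the version reported will vary with your environment):

    $ ansible --version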

    Configure Ansible to use Ansible automation hub

    As mentioned above, to smoothly set up and manage our streams for Apache Kafka cluster, we are going to leverage an Ansible collection dedicated to the product. This extension for the automation engine is available on Ansible automation hub. This means that Ansible needs to be configured to use this repository of content (instead of, or in addition to, Ansible Galaxy). This configuration lives in the ansible.cfg configuration file:

    [defaults]
    ...
    [inventory]
    ...
    [galaxy]
    server_list = automation_hub, galaxy
    [galaxy_server.galaxy]
    url=https://galaxy.ansible.com/
    [galaxy_server.automation_hub]
    url=https://cloud.redhat.com/api/automation-hub/
    auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
    token=<insert-your-token-here>

    Please refer to this documentation on how to configure Ansible to use automation hub. Once Ansible is properly configured, simply run the following command to install the Red Hat-provided Ansible collection for streams for Apache Kafka:

    $ ansible-galaxy collection install redhat.amq_streams
     

    Note

    Execute this command as the same system user that will run the playbooks later on. Otherwise, a different user on the same machine won't have access to the installed collection.
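    To confirm that the collection is visible to that user, you can list it (a quick check; the version shown will differ):

    $ ansible-galaxy collection list redhat.amq_streams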

    Prepare targets

    For any system managed by Ansible, sshd must be running and an SSH public key associated with the root user must have been deployed, so that Ansible can connect to it. For RHEL 9 systems, the required subscriptions for streams for Apache Kafka (and RHEL) must be enabled. Please refer to this documentation to learn the necessary steps; a minimal sketch of this preparation is shown below.
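    As a sketch only, assuming the controller already has an SSH key pair and using one of the host names from the inventory below, the preparation could look like this (the exact repositories to enable depend on your subscriptions, so follow the linked documentation):

    # On the controller: deploy the controller's public key to the target's root account
    $ ssh-copy-id root@broker1.example.com

    # On each RHEL 9 target: register the system, then enable the required repositories
    # subscription-manager register
    # subscription-manager repos --enable=<repository-listed-in-the-documentation>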

    An inventory of the targets needs to be provided to Ansible. The simplest way to achieve this is using a plain inventory file, such as this one:

    [all]
    
    [brokers]
    broker*.example.com
    
    [zookeepers]
    zk*.example.com
    
    [topic_writer]
    broker1.example.com

    For testing purposes, you can use localhost, which also removes the need to set up sshd (as there is no SSH connection between Ansible and localhost):

    [all]
    localhost ansible_connection=local
    
    [brokers]
    localhost ansible_connection=local
    
    [zookeepers]
    localhost ansible_connection=local
    
    [topic_writer]
    localhost ansible_connection=local

    Each component of the streams for Apache Kafka cluster uses different ports, so they can all be deployed on a single system. Obviously, this is not recommended for production, but it allows you to run the playbooks provided in this article with only one machine (physical, virtual, or containerized).

     

    Note

    Execute every playbook in this demonstration twice to verify that it is indeed idempotent (meaning the second run does not perform any change).

    Automating the deployment of the ZooKeeper instances

    Now that we have a working Ansible controller and an assortment of target machines ready to be provisioned, we can start the setup of our cluster. The first step will be, obviously, to deploy the zk instances. Indeed, they orchestrate the cluster, so having them already running before the brokers makes sense.

    Thanks to the Ansible collection for streams for Apache Kafka, the playbook to set up the zk nodes is reduced to the bare minimum:

    ---
    - name: "Ansible Playbook to install a Zookeeper ensemble with auth"
      hosts: zookeepers
      gather_facts: yes
      vars_files:
        - vars.yml
      collections:
        - redhat.amq_streams
      roles:
        - role: redhat.amq_streams.amq_streams_zookeeper
    
      post_tasks:
        - name: "Validate that Zookeeper deployment is functional."
          ansible.builtin.include_role:
            name: redhat.amq_streams.amq_streams_zookeeper
            tasks_from: validate.yml

    Note that the hosts value matches the name of the inventory group associated with the machines responsible for running ZooKeeper.

    In essence, the playbook above just uses the role redhat.amq_streams.amq_streams_zookeeper provided by the collection. In the post_tasks: section, it also uses validate.yml, a handy set of tasks supplied by the collection, to check that the deployed zk instances are indeed functional.

    The specific configuration of the ZooKeeper nodes is provided by the variables in vars.yml:

    ---
    amq_streams_common_download_dir: "/opt"

    In our demonstration, we chose to deploy the binaries and files of streams for Apache Kafka into the /opt folder on the target system. As this particular setting is also used for the brokers (see below), it has been placed in a separate vars.yml file so that it can be shared with the playbook managing the broker instances, as we'll cover in the next section.

    You can now run the playbook to deploy zk on the targets belonging to the zookeepers group:

    $ ansible-playbook -i inventory -e @service_account.yml zookeepers.yml
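    The service_account.yml file passed with -e is not shown in this article; it holds the Red Hat credentials the collection uses to download the streams for Apache Kafka archives from the Red Hat Customer Portal. As a purely hypothetical sketch (the actual variable names are defined by the collection, so check its documentation), it could look like this:

    ---
    # Hypothetical variable names; refer to the redhat.amq_streams documentation
    rhn_username: "<service-account-client-id>"
    rhn_password: "<service-account-client-secret>"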
     

    Note

    KRaft can be employed as an alternative to ZooKeeper for orchestration; however, the Ansible collection for streams for Apache Kafka does not yet support deployment using KRaft.

    Automating the deployment of the brokers

    Now that the zk nodes are ready to orchestrate our cluster, we can look into the deployment of the broker nodes. Here again, the Ansible collection for streams for Apache Kafka does the heavy lifting:

    ---
    - name: "Ansible Playbook to install Kafka Brokers with authentication"
      hosts: brokers
      vars_files:
        - vars.yml
      vars:
        amq_streams_replication_factor: "{{ groups['brokers'] | length }}"
        amq_streams_broker_offsets_topic_replication_factor: "{{ amq_streams_replication_factor }}"
        amq_streams_broker_transaction_state_log_replication_factor: "{{ amq_streams_replication_factor }}"
        amq_streams_broker_transaction_state_log_min_isr: "{{ amq_streams_replication_factor }}"
      collections:
        - redhat.amq_streams
      roles:
        - name: amq_streams_broker
      post_tasks:
        - name: "Validate that Broker deployment is functional."
          ansible.builtin.include_role:
            name: amq_streams_broker
            tasks_from: validate.yml

    Similar to the playbook above, we use the role provided by the streams for Apache Kafka collection to deploy the brokers. As mentioned in the previous section, we also include vars.yml to reuse the configuration shared between the zk instances and the brokers.

    A few variables were also added to configure the broker instances. In the example above, we use the number of hosts in the brokers group to determine the replication factor: with three brokers in the inventory, the replication-related settings resolve to 3, while with the single-host test inventory they resolve to 1.

    We can now run this playbook in order to set up our cluster:

    $ ansible-playbook -i inventory -e @service_account.yml brokers.yml

    Creating topics in the cluster

    With our zk and broker nodes fully provisioned, our cluster is now ready. For applications to be able to consume or publish events (or messages), topics must be created.

    Here again, these changes to the setup are fully automated by a third playbook, also leveraging the capabilities of the Ansible collection for streams for Apache Kafka:

    ---
    - name: "Ansible Playbook to ensure topics are created in Red Hat AMQ Streams cluster"
      hosts: topic_writer
      gather_facts: no
      vars_files:
        - vars.yml
      vars:
        amq_streams_broker_topic_to_create_name: "myTopic"
        amq_streams_broker_topics:
          - name: "{{ amq_streams_broker_topic_to_create_name }}"
            partitions: 1
            replication_factor: "{{ groups['brokers'] | length }}"
      collections:
        - redhat.amq_streams
    
      tasks:
        - name: "Create topic: {{ amq_streams_broker_topic_to_create_name }}"
          ansible.builtin.include_role:
            name: amq_streams_broker
            tasks_from: topic/create.yml
          loop: "{{ amq_streams_broker_topics }}"
          loop_control:
            loop_var: topic
          vars:
            topic_name: "{{ topic.name }}"
            topic_partitions: "{{ topic.partitions }}"
            topic_replication_factor: "{{ topic.replication_factor }}"
    
      - name: "Describe topic: {{ amq_streams_broker_topic_to_create_name }}"
        ansible.builtin.include_role:
          name: amq_streams_broker
          tasks_from: topic/describe.yml
        loop: "{{ amq_streams_broker_topics }}"
        loop_control:
          loop_var: topic
        vars:
          topic_name: "{{ topic.name }}"

    The target hosts for this playbook are the members of the topic_writer group. As shown in the inventory above, this group contains only one of the brokers. Indeed, any broker can be used to create a topic for the entire cluster, so there is no need to repeat the operation on all the nodes.
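    For reference, this is essentially equivalent to what the standard Kafka tooling does. Creating the same topic by hand with the upstream CLI would look roughly like the following sketch (the exact installation path under /opt and the broker address depend on your deployment; the replication factor of 1 matches the single-host test inventory):

    $ /opt/kafka_<version>/bin/kafka-topics.sh --create \
        --topic myTopic --partitions 1 --replication-factor 1 \
        --bootstrap-server localhost:9092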

    Here again, vars.yml is used to give this playbook access to the variables shared between all of the playbooks. In our demonstration, the folder hosting the streams for Apache Kafka deployment on the target is the only common parameter.

     

    Note

    In a real-life use case, there would most likely be more configuration shared between the components; for the sake of simplicity, however, our example shares only this one parameter.

    While we only create one topic, we use a list and the associated loop to demonstrate how this playbook can be used to ensure the availability of as many topics as needed.

    Let's run this playbook to create our topic inside the cluster. We don't need to pass our service account credentials anymore, as this playbook does not perform any product installation and thus does not need to access the Red Hat Customer Portal to retrieve the necessary archives:

    $ ansible-playbook -i inventory topics.yml -e amq_streams_common_download_dir=/opt

    Note that we need to specify where streams for Apache Kafka is located on the target system, because this playbook has not performed the installation and thus does not know where the streams for Apache Kafka binaries and configuration files live. If the topic creation were part of the same playbook as the setup, this would not be necessary.

    Also note that this playbook, like the previous one, is idempotent: if we run it again, no change occurs on the target system as the required topics are already created inside the cluster:

    $ ansible-playbook -i inventory topics.yml
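    On that second run, the play recap should report no changes, along these lines (the task counts are illustrative):

    localhost : ok=12   changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0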

    Automating the deployment of Kafka Connect

    With the cluster now fully functional, we can look at an additional component of the rich Red Hat AMQ ecosystem: Kafka Connect. This supplemental service allows the cluster to interact with numerous external systems, typically a data source of some kind, such as a traditional RDBMS or an in-memory cache. In essence, it converts data from a remote service into messages that can be processed by the cluster or, the other way around, transforms messages from the cluster into records persisted or manipulated by an external system.

    Here again, setting up this service on the target system is made extremely easy by the Ansible collection for streams for Apache Kafka:

    ---
    - name: "Ansible Playbook to install a Kafka Connect ensemble with auth"
      hosts: connect
      gather_facts: yes
      vars_files:
        - vars.yml
      collections:
        - redhat.amq_streams
      tasks:
        - name: "Ensure Kafka Connect is running"
          ansible.builtin.include_role:
            name: redhat.amq_streams.amq_streams_connect
      vars:
        connectors:
          - { name: "file", path: "connectors/file.yml" }
          
      post_tasks:
        - name: "Validate that Kafka Connect deployment is functional."
          ansible.builtin.include_role:
            name: redhat.amq_streams.amq_streams_connect
            tasks_from: validate.yml

    The only configuration specific to Kafka Connect is the file connector. It is needed only because Kafka Connect cannot be started without any connectors. In a real-life use case, this is not an issue, as such a component would naturally be deployed with a set of connectors matching the requirements of the applications using the cluster. In our demonstration, we add this connector only so that the installation succeeds.
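    The content of connectors/file.yml is not listed in this article. As a purely hypothetical sketch (the exact layout expected by the role is defined by the collection, so check its documentation), a file-based source connector built on the standard Kafka Connect FileStreamSourceConnector could be described along these lines:

    ---
    # Hypothetical layout; refer to the redhat.amq_streams documentation for the actual schema
    name: "file"
    connector.class: "org.apache.kafka.connect.file.FileStreamSourceConnector"
    tasks.max: 1
    file: "/tmp/source.txt"
    topic: "myTopic"

    With the connector definition in place, run the playbook: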

    $ ansible-playbook -i inventory connect.yml -e @service_account.yml
     

    Note

    In our demonstration, we install each component on the same host, which means the Red Hat service authentication data could be omitted from the command line above, as Kafka is already available on the target. However, in a real-life scenario, Kafka Connect will most likely run on a different system and will thus require that authentication information to set up the service.
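    As an additional manual check, assuming Kafka Connect's REST API is listening on its default port (8083) on the target, you can list the deployed connectors; the file connector defined above should appear in the response:

    $ curl http://localhost:8083/connectors
    ["file"]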

    Summary

    Thanks to both Ansible and its collection for streams for Apache Kafka, we have deployed a fully operational, ready-to-use cluster with only a handful of simple playbooks. This setup works on one machine, but it could effortlessly be extended to hundreds of systems; the only change required is to modify the inventory accordingly.

    From now on, managing the cluster is also made easier by Ansible. New topics can easily be added to the configuration and deployed. Advanced features, such as authentication between components, can be implemented by leveraging the capabilities of the streams for Apache Kafka collection. Last but not least, Ansible can help automate updates of the cluster. Updating is a difficult subject that must be handled according to how the system is used, so there is no out-of-the-box solution; however, the collection and Ansible's primitives provide a lot of assistance in managing such a critical operation.
