Red Hat OpenShift 4.2 IPI

Months ago, a customer asked me about Red Hat OpenShift on OpenStack, especially regarding the network configuration options available in OpenShift at the node level. In order to give them an answer and increase my confidence on $topic, I looked into how to test this scenario.

At the same time, the Italian solution architect "Top Gun Team" was in charge of preparing speeches and demos for the Italian Red Hat Forum (also known as Open Source Day) for the Rome and Milan dates. Brainstorming led me to start my journey toward testing OpenShift 4.2 setup on OpenStack 13 in order to reply to the customer and leverage this effort to build a demo video for Red Hat Forum.

Note: If you want to skip the bits and bytes, skip ahead to the "Demo" section.

OpenShift 4.2 on OpenStack 13: Background

Why OpenShift on OpenStack? There are a number of advantages to combining these two solutions:

  • OpenStack provides OpenShift with a top-class private cloud architecture to host OpenShift nodes, granting multi-tenancy, an as-a-service approach, and modularity at the Infrastructure-as-a-Service (IaaS) level.
  • This combination provides a three-layer scaling architecture because OpenStack nodes, OpenShift nodes, and OpenShift pods can be scaled horizontally. This combination means that you can follow your business needs without constraints.
  • OpenStack provides a programmatic API-driven approach for OpenShift. For instance, you can scale your OpenShift worker nodes via MachineSet by calling the OpenStack API with a single click.
  • OpenShift on OpenStack is integrated with Nova, Cinder, Swift, Octavia, Kuryr, etc. For instance, with Kuryr you can avoid double encapsulation—i.e., OpenShift software-defined networking (SDN) on OpenStack SDN—by using Neutron networks at the pod level.
  • OpenShift on OpenStack is co-engineered by Red Hat, which means having aligned product roadmaps and integration tests created by the Red Hat engineers working on these projects every single day.

OpenShift Installer Provisioned Infrastructure (IPI) was released with OpenShift 4.2. The objectives for the new installer are to provision and configure OpenShift 4.2 in a fully automated and opinionated way, making it easy to get started on day one and granting you more time to focus on your team on day two.

As you may know, IPI on OpenShift 4.2 also supports Red Hat OpenStack Platform 13 as a provider, leveraging OpenStack's virtualization capabilities to host OpenShift nodes. The main concern to me was that I didn't have enough bare-metal nodes to build my environment. A standard high-availability (HA) OpenStack environment is composed of:

  • One director node
  • Three controllers
  • Three Ceph nodes
  • At least two compute nodes

My goal was to build the following to host OpenShift 4.2 and simulate an HA environment at the control plane and storage level:

  • One director node (undercloud)
  • Three controllers
  • Three Ceph nodes
  • One compute node (overcloud)

Why? To simulate the existing customer environment.

How? Using VMs as OpenStack nodes.

I had an idea: to see if I could set up everything with just a single bare-metal server. That effort pushed me to publish this article so I can share and explain how I tested an OpenShift 4.2 IPI setup on OpenStack 13 with a single Red Hat Enterprise Linux (RHEL) server. Doing this was possible because RHEL is properly tuned to use nested virtualization with KVM.
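
Before stacking VMs inside VMs, it may be worth confirming that nested virtualization is actually switched on at the L0 level. What follows is only a minimal sketch: the function name is hypothetical, and it assumes the KVM module exposes its nested parameter under /sys/module, which is the usual location for kvm_intel and kvm_amd.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a KVM "nested" parameter file reports
# that nested virtualization is on. Depending on the kernel, the file
# contains "Y" or "1" when enabled and "N" or "0" when disabled.
nested_enabled() {
    param_file=$1
    [ -r "$param_file" ] || return 1
    case "$(cat "$param_file")" in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on the L0 host (only one of the two modules is loaded on a machine):
#   nested_enabled /sys/module/kvm_intel/parameters/nested ||
#   nested_enabled /sys/module/kvm_amd/parameters/nested ||
#   echo "nested virtualization looks disabled"
```

The check only reads a file, so it is safe to run before any VM is defined.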

Warning: This article was written to help customers, partners, and community members test OpenShift 4.2 on OpenStack 13 only for demo/test purposes. This procedure and the resulting architecture are not supported (and not even suggested) by Red Hat.

I'd like to thank Daniel Bellantuono for sharing helpful tips about OpenStack's architecture.

Scenario

I used just a single bare-metal node (L0) and then, using KVM's nested virtualization features, created a deployment of OpenStack nodes (L1) with virtualized OpenShift nodes (L2) on top. Figure 1 shows a schema summarizing the whole setup.

The resulting schema
Figure 1: The resulting schema.

Now, let's dig into the different layers.

L0 bare metal

The L0 bare metal node was configured with Red Hat Enterprise Linux and KVM to act as a hypervisor. Its server requirements are:

  • At least 32 cores
  • 160 GB RAM
  • 500 GB SSD disk (to host high-performance VM disks, namely the Ceph OSD disks, and the Nova compute disk)
  • 200 GB SAS disk (to host medium-performance VM disks, namely the undercloud disk and the controller disks).

Note: You could use SSDs for every VM, but I had to balance my needs with hardware availability.

The virsh command shows the rest of the bare metal node's specs:

[root@newkvm ~]# virsh nodeinfo
CPU model: x86_64
CPU(s): 32
CPU frequency: 2099 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 2
NUMA cell(s): 2
Memory size: 167676348 KiB
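
If you want to compare another server against the requirements listed above, the nodeinfo output can be checked automatically. This is a rough sketch under stated assumptions: the function name is hypothetical, and I read "160 GB" as 160 * 10^9 bytes, which is 156250000 KiB.

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh nodeinfo` output on stdin and checks it
# against a minimum logical CPU count and a minimum memory size in KiB.
check_nodeinfo() {
    min_cpus=$1
    min_mem_kib=$2
    awk -F': *' -v c="$min_cpus" -v m="$min_mem_kib" '
        $1 == "CPU(s)"      { cpus = $2 + 0 }
        $1 == "Memory size" { mem = $2 + 0 }   # "167676348 KiB" -> 167676348
        END {
            if (cpus >= c && mem >= m) { print "OK"; exit 0 }
            print "INSUFFICIENT"; exit 1
        }'
}

# Usage on the L0 host:
#   virsh nodeinfo | check_nodeinfo 32 156250000
```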

Next, I used the tuned command to perform network latency workload tuning at the L0 level:

[root@newkvm ~]# tuned-adm profile network-latency

In order to successfully configure and deploy your overcloud nodes, you need to do two things. First, you need to define a provisioning network on libvirt for the undercloud to use when installing the overcloud nodes via PXE. Second, you have to define your virtual machines.

Here is a snippet of network config at the L0 level:

[root@newkvm ~]# cat > /tmp/provisioning.xml <<EOF
<network>
<name>provisioning</name>
<ip address="172.16.0.254" netmask="255.255.255.0"/>
</network>
EOF
[root@newkvm ~]# echo "Defining provisioning network..."
[root@newkvm ~]# virsh net-define /tmp/provisioning.xml
[root@newkvm ~]# echo "Setting net-autostart to provisioning network..."
[root@newkvm ~]# virsh net-autostart provisioning
[root@newkvm ~]# echo "Starting provisioning network..."
[root@newkvm ~]# virsh net-start provisioning
[root@newkvm ~]# echo "Disabling DHCP on default network..."
[root@newkvm ~]# if (virsh net-dumpxml default | grep dhcp &>/dev/null); then
virsh net-update default delete ip-dhcp-range "<range start='192.168.122.2' end='192.168.122.254'/>" --live --config
else
echo "DHCP already disabled, skipping"
fi

The provisioning network is usually a pre-existing datacenter network in a native VLAN configuration. This configuration is used by the undercloud to perform node introspection and setup via PXE and TFTP. For this reason, I created a dedicated network called "provisioning" (Figure 1's blue section) to attach to all of my VMs.

As you may already know, the entire OS setup and configuration for OpenStack nodes (VMs in our case) is managed by the Red Hat OpenStack Platform director. In addition, DHCP was disabled on the default (pre-existing) libvirt network because the director assigns IPs during OpenStack setup. Last, but not least, we need to configure our hypervisor to use an Ironic project driver. My choice was to use VirtualBMC to simulate Intelligent Platform Management Interfaces (IPMIs) that are not available in a virtual machine environment.

Note: Read this Red Hat Knowledge Base article to learn more about how to configure VBMC and use it to import and introspect bare metal nodes.
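
To give an idea of what the VirtualBMC side looks like, here is a small sketch that prints one `vbmc add` and one `vbmc start` command per libvirt domain, each on its own IPMI port. The function name, the base port, and the credentials are placeholders I made up for illustration; the Knowledge Base article remains the reference for the actual procedure.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the VirtualBMC commands that
# would expose each libvirt domain as an IPMI endpoint on its own port.
# The base port and the credentials are placeholders, not values from this
# article.
print_vbmc_commands() {
    port=$1
    shift
    for domain in "$@"; do
        echo "vbmc add $domain --port $port --username admin --password changeme"
        echo "vbmc start $domain"
        port=$((port + 1))
    done
}

# Usage: review the output first, then pipe it to sh on the L0 hypervisor.
#   print_vbmc_commands 6230 overcloud-ctrl01 overcloud-ctrl02 overcloud-ctrl03
```

Printing the commands instead of running them keeps the sketch harmless and lets you check the domain-to-port mapping before anything is registered.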

I don't want to go deeper into the details of OpenStack setup because the process is long and difficult to summarize. This article assumes that you have a basic knowledge and understanding of OpenStack architecture. That being said, some basic steps are provided.

L1 virtual machines (OpenStack nodes)

VMs were defined using qemu-img, virt-customize, and virt-install starting from the Red Hat Enterprise Linux 7 KVM guest image downloadable from the Red Hat Customer Portal:

[root@newkvm ~]# echo "Downloading basic RHEL image"
[root@newkvm ~]# curl -o rhel7-guest-official.qcow2 $RHEL_IMAGE_U
[root@newkvm ~]# echo "Cloning RHEL image to a 100G sparse image..."
[root@newkvm ~]# qemu-img create -f qcow2 rhel7-guest.qcow2 100G
[root@newkvm ~]# echo "Extending file system..."
[root@newkvm ~]# virt-resize --expand /dev/sda1 rhel7-guest-official.qcow2 rhel7-guest.qcow2
[root@newkvm ~]# echo "Checking image filesystem size..."
[root@newkvm ~]# virt-filesystems --long -h -a rhel7-guest.qcow2 | grep 100G &> /dev/null
[root@newkvm ~]# echo "Deleting old image..."
[root@newkvm ~]# rm -f rhel7-guest-official.qcow2
[root@newkvm ~]# echo "Create undercloud qcow2 disk..."
[root@newkvm ~]# qemu-img create -f qcow2 -b rhel7-guest.qcow2 undercloud.qcow2

Director needs to have two NICs. The first one (eth0) is attached to the provisioning network in order to successfully deploy overcloud nodes, and the second (eth1) is attached to the default network in order to reach (via the NAT made by the L0 hypervisor) the internet to download the RPM packages needed for the setup:

[root@newkvm ~]# echo "Customizing VM..."
[root@newkvm ~]# virt-customize -a undercloud.qcow2 --root-password password:mypassword --ssh-inject "root:file:/root/.ssh/id_rsa.pub" --selinux-relabel --run-command 'yum remove cloud-init* -y && cp /etc/sysconfig/network-scripts/ifcfg-eth{0,1} && sed -i s/ONBOOT=.*/ONBOOT=no/g /etc/sysconfig/network-scripts/ifcfg-eth0 && cat << EOF > /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
IPADDR=192.168.122.253
NETMASK=255.255.255.0
GATEWAY=192.168.122.1
NM_CONTROLLED=no
DNS1=192.168.122.1
EOF'
[root@newkvm ~]# echo "Creating undercloud VM"
[root@newkvm ~]# virt-install --ram 12288 --vcpus 8  --os-variant rhel7 \
--disk path=/var/lib/libvirt/images/undercloud.qcow2,device=disk,bus=virtio,format=qcow2 \
--import --noautoconsole --vnc --network network:provisioning \
--network network:default --name undercloud 
[root@newkvm ~]# echo "Start undercloud VM now and on-boot"
[root@newkvm ~]# virsh start undercloud
[root@newkvm ~]# virsh autostart undercloud

The setup for other VMs is similar, with the only difference being the amount of resources involved (such as RAM and CPU) and the NIC configuration. For the overcloud nodes, I added two additional NICs (Figure 1's orange section) because I wanted a bond inside Open vSwitch. Within this bond, I configured the OpenStack networks (namely InternalApi, Tenant Network, Storage, and Storage Management) as tagged VLANs and left the external network untagged. As a result, our external network on the OpenStack side will use the default network on the L0 hypervisor.

After this basic setup, I installed the undercloud, imported and introspected the OpenStack nodes, and then built my OSP templates to successfully deploy my overcloud:

The output from building the OSP templates
Figure 2: Building the OSP templates.

I skipped the overcloud endpoint TLS configuration because, at the time of this writing, Red Hat OpenShift Container Platform 4.2 cannot be installed via Installer Provisioned Infrastructure (IPI) on Red Hat OpenStack Platform when the endpoints are encrypted with self-signed certificates (as highlighted in this knowledge base entry). Here are the resulting VMs on the L0 hypervisor:

[root@newkvm ~]# virsh list --all
Id Name State
----------------------------------------------------
17 undercloud running
18 overcloud-ceph01 running
19 overcloud-ceph02 running
20 overcloud-ceph03 running
21 overcloud-compute01 running
22 overcloud-ctrl01 running
23 overcloud-ctrl02 running
24 overcloud-ctrl03 running

Here is the resulting overcloud server list:

(undercloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+------------------+--------+----------------------+----------------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------------+--------+----------------------+----------------+--------------+
| 9c4d82fd-c37e-4341-9a24-ea6416751aa3 | lab-controller01 | ACTIVE | ctlplane=172.16.0.40 | overcloud-full | control |
| ae2431d5-ff70-4fd3-83e3-48c72fca626e | lab-controller03 | ACTIVE | ctlplane=172.16.0.21 | overcloud-full | control |
| 4176914d-23ef-4e5f-83cd-86a53d320fc4 | lab-controller02 | ACTIVE | ctlplane=172.16.0.29 | overcloud-full | control |
| 78e6d4b0-c3de-431d-b144-6aa19664818d | lab-ceph01 | ACTIVE | ctlplane=172.16.0.46 | overcloud-full | ceph-storage |
| b7bb7596-4bf7-45f7-bd3b-c6bb79304531 | lab-ceph02 | ACTIVE | ctlplane=172.16.0.22 | overcloud-full | ceph-storage |
| 35258a3a-ff8b-44d0-b68b-a55039c4451d | lab-compute01 | ACTIVE | ctlplane=172.16.0.26 | overcloud-full | compute |
| 93d7ff6c-4713-431e-9461-0303126eb7ad | lab-ceph03 | ACTIVE | ctlplane=172.16.0.37 | overcloud-full | ceph-storage |
+--------------------------------------+------------------+--------+----------------------+----------------+--------------+

Because of the limited hardware capabilities (and over-committing, too, given that we are talking about one single bare-metal server), I executed many tests in order to successfully deploy OpenShift on OpenStack. I ran into many timeout issues but finally, I found the right tuning to apply. What follows are a couple of tips and tricks regarding OpenStack compute node timeout tuning.

You will probably have to make two edits in the nova_libvirt container configuration file (/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf). The first is in the [neutron] section, setting a timeout value (in my case 300 seconds) big enough to avoid timeouts on the Neutron side when nova spawns a new instance:

[neutron]
url=http://172.17.1.150:9696
ovs_bridge=br-int
default_floating_pool=nova
extension_sync_interval=600
timeout=300

The second is in the [DEFAULT] section, setting a timeout value (in my case 300 seconds) big enough to avoid timeouts on the Neutron side when nova tries to attach a Virtual Interface (VIF) to a new instance:

[DEFAULT]
instance_usage_audit_period=hour
rootwrap_config=/etc/nova/rootwrap.conf
compute_driver=libvirt.LibvirtDriver
allow_resize_to_same_host=False
vif_plugging_is_fatal=True
vif_plugging_timeout=300

After these edits, restart the nova_libvirt container on the compute node.

Be aware that these changes take effect only after the container restart and that they are not persistent: if you want to redeploy your overcloud later, you'll have to customize nova.conf via a custom puppet configuration executed by OpenStack director.
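
After a restart or a redeploy, it is easy to lose track of whether the two timeouts are still in place. The following is a minimal sketch of a read-only check; `ini_get` is a hypothetical helper written for this article, not a tool shipped with OpenStack.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of KEY inside [SECTION] of an
# INI-style file such as nova.conf. Section names are compared
# case-insensitively; comment lines are ignored.
ini_get() {
    file=$1 section=$2 key=$3
    awk -v s="$section" -v k="$key" '
        /^[ \t]*[#;]/ { next }
        /^[ \t]*\[/ {
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (tolower(name) == tolower(s))
            next
        }
        in_section {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                value = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }' "$file"
}

# Usage on the compute node:
#   conf=/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
#   ini_get "$conf" neutron timeout
#   ini_get "$conf" DEFAULT vif_plugging_timeout
```

Both calls should print 300 if the tuning described above is in place.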

L2 nested virtual machines (OpenShift nodes)

In addition to those nodes (VMs in my case), I of course had to consider the list of requirements needed by IPI in terms of vCPU, RAM, floating IPs, and the security groups to be available at the tenant level. The full prerequisites for OpenShift 4.2 IPI on OpenStack are available here.

Because I've tested the setup many times and I didn't want to worry about prerequisites every time I executed a setup, I made a simple bash script to prepare my tenant on OpenStack:

[stack@undercloud osd-ocp-demo]$ cat create_ocp_tenant.sh
#!/bin/bash
source ../overcloudrc
openstack project create ocp-tenant
openstack user create ocp-user --password mypassword
user=$(openstack user show ocp-user -f value -c id)
admin=$(openstack user show admin -f value -c id)
project=$(openstack project show ocp-tenant -f value -c id)
openstack role add --user $user --project $project _member_
openstack role add --user $user --project $project admin
openstack role add --user $admin --project $project admin
openstack role add --user $user --project $project swiftoperator
# show default quota and set new limits on project ocp-tenant
echo "compute quota"
openstack quota list --compute --project ocp-tenant -f yaml
openstack quota set --cores 40 --ram 102400 $project
echo "network quota"
openstack quota list --network --project ocp-tenant -f yaml
openstack quota set --secgroups 40 --secgroup-rules 500 $project
# create needed flavors
openstack flavor create --ram 16384 --vcpus 4 --disk 25 master
echo -e "working on $project"
source ocp-tenant-openrc
openstack object store account set --property Temp-URL-Key=superkey
# create rhcos image
curl --compressed -J -L -o rhcos-openstack.qcow2 https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-42.80.20191002.0-openstack.qcow2
openstack image create --container-format=bare --disk-format=qcow2 --file rhcos-openstack.qcow2 rhcos
mkdir -p /home/stack/osd-ocp-demo
cd /home/stack/osd-ocp-demo
cat <<EOF > clouds.yaml
clouds:
  openstack:
    auth:
      auth_url: http://192.168.122.150:5000/v3
      username: "ocp-user"
      password: "mypassword"
      project_id: $project
      project_name: "ocp-tenant"
      user_domain_name: "Default"
    region_name: "regionOne"
    interface: "public"
    identity_api_version: 3
EOF
wget -r --no-parent -A 'openshift-install-linux*.tar.gz' https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/
wget -r --no-parent -A 'openshift-client-linux*.tar.gz' https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/
tar -xvzf mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux-4.2.4.tar.gz -C .
tar -xvzf mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux-4.2.0.tar.gz -C .
# openstack FIP for API lb
openstack floating ip create --floating-ip-address 192.168.122.164 --project ocp-tenant external
# openstack FIP for APPS lb
openstack floating ip create --floating-ip-address 192.168.122.180 --project ocp-tenant external
# add ssh key to ssh agent
eval "$(ssh-agent -s)"
ssh-add /home/stack/.ssh/id_rsa
# configure KUBECONFIG path
export KUBECONFIG='/home/stack/osd-ocp-demo/auth/kubeconfig'

Now that the prerequisites are here, let us look at our install-config.yaml file, which will instruct the IPI installer about OpenShift configuration in terms of the number of nodes, flavor to be used, network CIDR, etc.

As you can see, I specified fields to:

  • Build two worker nodes.
  • Build three master nodes.
  • Use OpenStack as the provider with the flavor "master" (created by the script create_ocp_tenant.sh).

In addition, I set lbFloatingIP to a floating IP (FIP) that grants external access to the internal API load balancer:

(undercloud) [stack@undercloud osd-ocp-demo-static-nic]$ cat template/install-config.yaml
apiVersion: v1
clusterID: ocp4
baseDomain: osd2019.local
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 2
  type: worker
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
  type: master
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.60.0.0/16
platform:
  openstack:
    cloud: openstack
    computeFlavor: master
    externalNetwork: external
    lbFloatingIP: 192.168.122.164
    octaviaSupport: false
    region: regionOne
    trunkSupport: false
pullSecret: 'mypull secret'
sshKey: ssh-rsa blablabla stack@undercloud.redhat.local
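
The replica counts here can be sanity-checked against the quota set by create_ocp_tenant.sh (40 cores, 102400 MB of RAM) and the "master" flavor (4 vCPUs, 16384 MB). The sketch below is only an estimate: the function name is hypothetical, and it assumes that every instance uses the same flavor and that the installer boots one temporary bootstrap instance on top of the masters and workers.

```shell
#!/bin/sh
# Hypothetical helper: prints the vCPUs and RAM (MB) needed by a cluster
# where every instance uses one flavor, and succeeds if that fits a quota.
# Arguments: masters workers flavor_vcpus flavor_ram_mb quota_cores quota_ram_mb
check_quota() {
    instances=$(( $1 + $2 + 1 ))   # +1 for the temporary bootstrap node (assumption)
    need_cores=$(( instances * $3 ))
    need_ram=$(( instances * $4 ))
    echo "instances=$instances cores=$need_cores ram_mb=$need_ram"
    [ "$need_cores" -le "$5" ] && [ "$need_ram" -le "$6" ]
}

# With the values used in this article:
#   check_quota 3 2 4 16384 40 102400 && echo "fits the quota"
```

Under these assumptions, three masters, two workers, and the bootstrap node need 24 vCPUs and 98304 MB of RAM, which fits within the quota. A sixth permanent node would exceed the RAM quota while the bootstrap node is still running.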

You may also notice that I didn't use Octavia (an OpenStack load balancer-as-a-service) because, in my own test, I specifically wanted to simulate a customer environment where Octavia is not used. Octavia is not a strict requirement unless you are using Kuryr.

We can now execute the installation with a simple command (if you want, you can specify the debug log level in order to have a better understanding of the installation process):

(overcloud) [stack@undercloud osd-ocp-demo]$ ./openshift-install create cluster --log-level debug
DEBUG OpenShift Installer v4.2.4
DEBUG Built from commit 425e4ff0037487e32571258640b39f56d5ee5572
DEBUG Fetching "Terraform Variables"...
DEBUG Loading "Terraform Variables"...
DEBUG Loading "Cluster ID"...
DEBUG Loading "Install Config"...
DEBUG Loading "SSH Key"...
DEBUG Loading "Base Domain"...
DEBUG Loading "Platform"...
DEBUG Loading "Cluster Name"...
DEBUG Loading "Base Domain"...
DEBUG Loading "Pull Secret"...
DEBUG Loading "Platform"...
DEBUG Using "Install Config" loaded from target directory
DEBUG Loading "Install Config"...
DEBUG Loading "Image"...
DEBUG Loading "Install Config"...
DEBUG Loading "BootstrapImage"...
DEBUG Loading "Install Config"...
DEBUG Loading "Bootstrap Ignition Config"...
DEBUG Loading "Install Config"...
DEBUG Loading "Kubeconfig Admin Client"...
DEBUG Loading "Certificate (admin-kubeconfig-client)"...
DEBUG Loading "Certificate (admin-kubeconfig-signer)"...
DEBUG Loading "Certificate (kube-apiserver-complete-server-ca-bundle)"...
DEBUG Loading "Certificate (kube-apiserver-localhost-ca-bundle)"...
OUTPUT TRUNCATED

During the installation, log into the OpenStack dashboard (shown in Figure 3) and you'll see that OpenShift IPI takes care of everything, from spawning new instances to building a dedicated tenant network, configuring security groups, and so on.

The OpenStack dashboard showing the Project -> Compute -> Instances screen.
Figure 3: The OpenStack dashboard lets you watch the installation process in action.

After a while (about 30 minutes) you'll have your Red Hat OpenShift 4.2 cluster up and running, as you can see here:

DEBUG Still waiting for the cluster to initialize: Working towards 4.2.4: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.4: 99% complete
DEBUG Cluster is initialized
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Route found in openshift-console namespace: console
DEBUG Route found in openshift-console namespace: downloads
DEBUG OpenShift console route is created
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/stack/osd-ocp-demo/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.osd2019.local
INFO Login to the console with user: kubeadmin, password: YOURKUBEADMINRANDOMPASSWORD

Looking at OpenStack network topology in Figure 4, you'll see the resulting architecture.

The OpenStack dashboard displaying Project -> Network -> Network Topology.
Figure 4: Your OpenStack network topology.

You can also use the oc client from the client machine used to install OpenShift (in my case, it was my undercloud VM):

(undercloud) [stack@undercloud osd-ocp-demo]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ocp4-4p5fd-master-0 Ready master 9d v1.14.6+c7d2111b9
ocp4-4p5fd-master-1 Ready master 9d v1.14.6+c7d2111b9
ocp4-4p5fd-master-2 Ready master 9d v1.14.6+c7d2111b9
ocp4-4p5fd-worker-76gvc Ready worker 9d v1.14.6+c7d2111b9
ocp4-4p5fd-worker-n6jvq Ready worker 9d v1.14.6+c7d2111b9
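
If you repeat the installation many times, as I did, a scripted check of this output can save some eyeballing. The following is a minimal sketch; `all_nodes_ready` is a hypothetical name, and a node whose status is anything other than exactly "Ready" (for instance "Ready,SchedulingDisabled") is counted as not ready.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get nodes` output on stdin and succeeds only
# when at least one node is listed and every node reports STATUS "Ready".
all_nodes_ready() {
    awk 'NR == 1 { next }                  # skip the header row
         { total++ }
         $2 == "Ready" { ready++ }
         END {
             printf "%d/%d nodes Ready\n", ready, total
             exit !(total > 0 && ready == total)
         }'
}

# Usage:
#   oc get nodes | all_nodes_ready
```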

Only one post-deployment command is required: attaching a pre-allocated floating IP address (FIP) to the Ingress port. Details can be found in the official docs here. This step is needed because the IPI installer configures a Keepalived pod on every master and worker to expose the virtual IPs (VIPs) that route traffic to internal APIs, the Ingress, and DNS services, but it automatically attaches a FIP only to the API VIP.

Let's assign our FIP in order to reach the OpenShift console. We need to assign it to the ingress-port:

(overcloud) [stack@undercloud osd-ocp-demo]$ openstack port show c3c14e9d-750f-46fb-af9c-e9fd375719b2
+-----------------------+-------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------------+
| admin_state_up | UP |
| allowed_address_pairs | |
| binding_host_id | |
| binding_profile | |
| binding_vif_details | |
| binding_vif_type | unbound |
| binding_vnic_type | normal |
| created_at | 2019-11-01T00:49:21Z |
| data_plane_status | None |
| description | |
| device_id | |
| device_owner | |
| dns_assignment | None |
| dns_name | None |
| extra_dhcp_opts | |
| fixed_ips | ip_address='10.0.0.7', subnet_id='9cbfdd62-b1e5-4f01-b49c-db992b9afc8e' |
| id | c3c14e9d-750f-46fb-af9c-e9fd375719b2 |
| ip_address | None |
| mac_address | fa:16:3e:b8:39:8b |
| name | ocp4-ll4qz-ingress-port |
| network_id | ec5de4de-2f52-42c5-87bf-35c8d91bd1a7 |
| option_name | None |
| option_value | None |
| port_security_enabled | True |
| project_id | 699eeaefb7b84291a75d389ec0f10ea2 |
| qos_policy_id | None |
| revision_number | 7 |
| security_group_ids | 9e6ee5d9-fa19-418c-804e-f1c654d2e34b |
| status | DOWN |
| subnet_id | None |
| tags | openshiftClusterID=ocp4-ll4qz |
| trunk_details | None |
| updated_at | 2019-11-01T00:49:29Z |
+-----------------------+-------------------------------------------------------------------------+
(overcloud) [stack@undercloud osd-ocp-demo]$ openstack floating ip set --port c3c14e9d-750f-46fb-af9c-e9fd375719b2 192.168.122.180
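
The port ID changes with every installation, so copying it by hand gets tedious. As a sketch, assuming the installer keeps naming the port with an "-ingress-port" suffix as it did here, the ID can be looked up by name. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "ID NAME" pairs on stdin (the shape produced by
# `openstack port list -f value -c ID -c Name`) and prints the ID of the
# first port whose name ends in "-ingress-port".
find_ingress_port() {
    awk '$2 ~ /-ingress-port$/ { print $1; found = 1; exit }
         END { exit !found }'
}

# Usage (192.168.122.180 is the FIP pre-allocated for the apps load balancer):
#   port_id=$(openstack port list -f value -c ID -c Name | find_ingress_port) &&
#   openstack floating ip set --port "$port_id" 192.168.122.180
```

The function fails when no matching port exists, so the floating IP command in the usage example is skipped rather than run with an empty ID.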

Finally, I updated my hosts file in order to reach OpenShift via FQDN so I didn't have to configure a DNS service:

#ocp4
192.168.122.164 api.ocp4.osd2019.local
192.168.122.180 console-openshift-console.apps.ocp4.osd2019.local
192.168.122.180 integrated-oauth-server-openshift-authentication.apps.ocp4.osd2019.local
192.168.122.180 oauth-openshift.apps.ocp4.osd2019.local
192.168.122.180 prometheus-k8s-openshift-monitoring.apps.ocp4.osd2019.local
192.168.122.180 grafana-openshift-monitoring.apps.ocp4.osd2019.local
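
Since every entry follows the same pattern (one API name on the first FIP, several route names under the apps subdomain on the second), the list can be generated instead of typed. This is a small sketch with a hypothetical function name; the route host names you pass in are up to you.

```shell
#!/bin/sh
# Hypothetical helper: prints /etc/hosts entries for a cluster.
# Arguments: api_fip ingress_fip cluster_name base_domain [route_host ...]
print_hosts_entries() {
    api_fip=$1 ingress_fip=$2 cluster=$3 domain=$4
    shift 4
    echo "#$cluster"
    echo "$api_fip api.$cluster.$domain"
    for route in "$@"; do
        echo "$ingress_fip $route.apps.$cluster.$domain"
    done
}

# Usage:
#   print_hosts_entries 192.168.122.164 192.168.122.180 ocp4 osd2019.local \
#       console-openshift-console oauth-openshift | sudo tee -a /etc/hosts
```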

That's it. Thirty minutes later, you'll have your OpenShift cluster up and running on OpenStack. You can then start playing around to test the capabilities this environment can grant to your organization. See Figure 5 for the results in the Red Hat OpenShift Container Platform.

Red Hat OpenShift Container Platform's dashboard.
Figure 5: Your new cluster in Red Hat OpenShift Container Platform.

Networking deep dive

As you saw, we preallocated two FIPs within our tenant using the bash script I shared in the section "L2 nested virtual machines (OpenShift nodes)." They are:

lbFloatingIP: 192.168.122.164
ingress port floating: 192.168.122.180

These two FIPs are associated with two Neutron ports, namely the API port (internal IP 10.0.0.5) and the Ingress port (internal IP 10.0.0.7). The first FIP assignment (192.168.122.164 -> 10.0.0.5) was made automatically by IPI during setup. The second FIP association is, instead, one we made ourselves, as we saw previously, in order to reach the OpenShift console and other services:

(overcloud) [stack@undercloud osd-ocp]$ openstack floating ip list
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| ID | Floating IP Address | Fixed IP Address | Port | Floating Network | Project |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| 87295f6b-75ed-4420-9548-a37d4ae137fc | 192.168.122.164 | 10.0.0.5 | 4ca1d30d-9931-495b-a295-5eba2019293f | 2b122467-8cd0-4159-a176-2a4bc4c2f1e7 | 699eeaefb7b84291a75d389ec0f10ea2 |
| 8e47ae70-fb99-4a3b-ad66-314b9e1a5400 | 192.168.122.180 | 10.0.0.7 | a8d54ead-3283-4746-94d1-ef724fcd50f9 | 2b122467-8cd0-4159-a176-2a4bc4c2f1e7 | 699eeaefb7b84291a75d389ec0f10ea2 |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+

Looking at the Neutron ports, we can see that, as suspected, those ports are API and Ingress but they are down. So, how can load balancing work? Take a look at this:

(overcloud) [stack@undercloud osd-ocp-demo-static-nic]$ openstack port list | grep "api-port\|ingress"
| 4ca1d30d-9931-495b-a295-5eba2019293f | ocp4-4p5fd-api-port | fa:16:3e:5b:2b:eb | ip_address='10.0.0.5', subnet_id='c38b6eb3-0dc3-41ec-915e-e7c365bcb0a0' | DOWN |
| a8d54ead-3283-4746-94d1-ef724fcd50f9 | ocp4-4p5fd-ingress-port | fa:16:3e:dc:b7:1b | ip_address='10.0.0.7', subnet_id='c38b6eb3-0dc3-41ec-915e-e7c365bcb0a0' | DOWN |

Those ports are not attached to an instance. Instead, they are created on the tenant network so that OpenShift can allocate VIPs via Keepalived, which uses the Virtual Router Redundancy Protocol (VRRP), in order to load balance the internal services (API and DNS) exposed by masters and the Ingress requests exposed by workers (ingress pod = OpenShift router).

Digging into our OpenShift setup, the project openshift-openstack-infra contains three haproxy and three keepalived pods running on masters plus two keepalived running on workers:

(overcloud) [stack@undercloud osd-ocp-demo-static-nic]$ oc get pods -n openshift-openstack-infra
NAME READY STATUS RESTARTS AGE
coredns-ocp4-4p5fd-master-0 1/1 Running 0 9d
coredns-ocp4-4p5fd-master-1 1/1 Running 0 9d
coredns-ocp4-4p5fd-master-2 1/1 Running 0 9d
coredns-ocp4-4p5fd-worker-76gvc 1/1 Running 1 9d
coredns-ocp4-4p5fd-worker-n6jvq 1/1 Running 0 9d
haproxy-ocp4-4p5fd-master-0 2/2 Running 2 9d
haproxy-ocp4-4p5fd-master-1 2/2 Running 0 9d
haproxy-ocp4-4p5fd-master-2 2/2 Running 0 9d
keepalived-ocp4-4p5fd-master-0 1/1 Running 0 9d
keepalived-ocp4-4p5fd-master-1 1/1 Running 0 9d
keepalived-ocp4-4p5fd-master-2 1/1 Running 0 9d
keepalived-ocp4-4p5fd-worker-76gvc 1/1 Running 1 9d
keepalived-ocp4-4p5fd-worker-n6jvq 1/1 Running 0 9d
mdns-publisher-ocp4-4p5fd-master-0 1/1 Running 0 9d
mdns-publisher-ocp4-4p5fd-master-1 1/1 Running 0 9d
mdns-publisher-ocp4-4p5fd-master-2 1/1 Running 0 9d
mdns-publisher-ocp4-4p5fd-worker-76gvc 1/1 Running 1 9d
mdns-publisher-ocp4-4p5fd-worker-n6jvq 1/1 Running 0 9d

Looking at one of these pods running on master nodes, we can see that Keepalived was configured to use VRRP to expose three VIPs:

(overcloud) [stack@undercloud osd-ocp-demo]$ oc rsh keepalived-ocp4-4p5fd-master-0
sh-4.2# cat /etc/keepalived/keepalived.conf | grep -A1 ipaddress
virtual_ipaddress {
10.0.0.5/16
--
virtual_ipaddress {
10.0.0.6/16
--
virtual_ipaddress {
10.0.0.7/16

For instance, in order to route incoming traffic to the internal API, there is a VRRP instance with a VIP assigned (10.0.0.5):

vrrp_instance ocp4_API {
state BACKUP
interface ens3
virtual_router_id 29
priority 40
advert_int 1
authentication {
auth_type PASS
auth_pass ocp4_api_vip
}
virtual_ipaddress {
10.0.0.5/16
}
track_script {
chk_ocp
}
}
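
To see at a glance which VRRP instance owns which VIP without reading the whole file, the configuration can be summarized. This is a minimal sketch with a hypothetical function name; it assumes the layout shown above, with one address per line inside each virtual_ipaddress block.

```shell
#!/bin/sh
# Hypothetical helper: reads a keepalived.conf on stdin and prints one
# "instance VIP" line for each address in a virtual_ipaddress block.
list_vrrp_vips() {
    awk '$1 == "vrrp_instance"      { inst = $2 }
         $1 == "virtual_ipaddress"  { in_vips = 1; next }
         in_vips && $1 == "}"       { in_vips = 0; next }
         in_vips && NF              { print inst, $1 }'
}

# Usage inside the keepalived pod:
#   list_vrrp_vips < /etc/keepalived/keepalived.conf
```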

Looking at the haproxy pod on the master, we can see that it listens on port 7443 on all IPs, and that it balances the API calls across the master nodes (section backend masters):

(overcloud) [stack@undercloud osd-ocp-demo]$ oc rsh haproxy-ocp4-4p5fd-master-0
sh-4.2$ cat /etc/haproxy/haproxy.cfg
defaults
maxconn 20000
mode tcp
log /var/run/haproxy/haproxy-log.sock local0
option dontlognull
retries 3
timeout http-keep-alive 10s
timeout http-request 1m
timeout queue 1m
timeout connect 10s
timeout client 86400s
timeout server 86400s
timeout tunnel 86400s
frontend main
bind :7443
default_backend masters
listen health_check_http_url
bind :50936
mode http
monitor-uri /healthz
option dontlognull
listen stats
bind 127.0.0.1:50000
mode http
stats enable
stats hide-version
stats uri /haproxy_stats
stats refresh 30s
stats auth Username:Password
backend masters
option httpchk GET /healthz HTTP/1.0
option log-health-checks
balance roundrobin
server etcd-0.ocp4.osd2019.local. 10.0.0.11:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
server etcd-2.ocp4.osd2019.local. 10.0.0.18:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
server etcd-1.ocp4.osd2019.local. 10.0.0.26:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3

Logging in via SSH to the CoreOS node (master-0) to double-check, we can see that haproxy is listening on port 7443:

[root@ocp4-4p5fd-master-0 ~]# netstat -anop | grep 0.0.0.0:7443
tcp 0 0 0.0.0.0:7443 0.0.0.0:* LISTEN 336621/haproxy off (0.00/0/0)

The VIP (10.0.0.5), instead, is currently assigned to the master-2 node, which is the master from a Keepalived perspective:

[root@ocp4-4p5fd-master-1 /]# ip a | grep 10.0.0.5
inet 10.0.0.5/16 scope global secondary ens3

What is missing? If the API and Ingress ports on Neutron are down, how does this setup work? It works because, on the Neutron ports assigned to masters and workers, the Keepalived VIPs are allowed from a port security perspective.

How is MAC/IP anti-spoofing relaxed only for particular IPs/MACs? Through the port's allowed_address_pairs setting, which lets traffic for additional IP addresses pass through the same Neutron port:

(overcloud) [stack@undercloud osd-ocp-demo-static-nic]$ neutron port-show e3c60257-1877-45c4-8cae-492ef953207f
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-----------------------+-----------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-----------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | {"ip_address": "10.0.0.5", "mac_address": "fa:16:3e:25:f2:fe"} |
| | {"ip_address": "10.0.0.6", "mac_address": "fa:16:3e:25:f2:fe"} |
| | {"ip_address": "10.0.0.7", "mac_address": "fa:16:3e:25:f2:fe"} |

To summarize the traffic flow for incoming API traffic:

192.168.122.164 -> MASTER-2 NODE (holding keepalived VIP) -> master-2 haproxy pod -> load balancing to other pods

To summarize Ingress traffic flow for incoming HTTP/HTTPS requests:

192.168.122.180 -> Worker-node (holding keepalived VIP for ingress)  -> console pods, prometheus pods, etc

Note: This page explains IPI networking infrastructure with a good level of detail.

In addition, I also tried adding OpenStack Neutron ports to OpenShift nodes and attaching a provider network in order to have a dedicated management network with static IPs and routes. Unfortunately, I was not able to accomplish this because IPI is designed to provide an opinionated setup. Once User-Provisioned Infrastructure (UPI) is available for Red Hat OpenStack, it should give us this option.

Demo

Here is the demo video I recorded with my colleague Rinaldo Bergamini. It shows an OpenShift IPI installation in practice.

Last updated: March 28, 2023