First steps with the data virtualization Operator for Red Hat OpenShift

The Red Hat Integration Q4 release adds many new features and capabilities with an increasing focus on cloud-native data integration. The features I’m most excited about are the introduction of the schema registry, the advancement of change data capture capabilities based on Debezium to technical preview, and data virtualization (technical preview) capabilities.

Data integration is a topic that has not received much attention from the cloud-native community so far, and we will cover it in more detail in future posts. Here, we jump straight into demonstrating the latest release of data virtualization (DV) capabilities on Red Hat OpenShift 4. This is a step-by-step visual tutorial describing how to create a simple virtual database using Red Hat Integration’s data virtualization Operator. By the end of the tutorial, you will have learned:

  • How to deploy the DV Operator.
  • How to create a virtual database.
  • How to access the virtual database.

The steps throughout this article work on any OpenShift 4.x environment with Operator support, even on time- and resource-constrained environments such as the Red Hat OpenShift Interactive Learning Portal.

Prepare the environment

First, you need to prepare an environment for deploying the productized DV operator on OpenShift 4. If you are a Red Hat partner and have access to the Red Hat Product Demo System (RHPDS), request an OpenShift cluster of type “OpenShift 4 Workshop” from the Services menu, as shown in Figure 1. A simple cluster of one node (five users) should be enough:

Figure 1: In the RHPDS, go to Services and select OpenShift 4 Workshop.

ALTERNATIVE: Whether you are a partner or not, if you do not have access to RHPDS, the same steps also work on the Red Hat OpenShift Interactive Learning Portal. There, you get a resource- and time-constrained OpenShift cluster for one hour. In this environment, choose OpenShift 4.2 Playground and follow the same steps (and don’t forget to type fast). The rest of this guide assumes RHPDS as the environment.

Now, create the dv-demo project.

In the GUI

Once the OpenShift cluster is set up, log into OpenShift and go to the Projects page, as shown in Figure 2:

Figure 2: Create your new project from the OpenShift Projects page.

Create a new project named dv-demo.

On the command line

Alternatively, get your token and log in from the command line to create a new project called dv-demo:

$ oc login --token=YOUR_TOKEN --server=YOUR_SERVER
$ oc new-project dv-demo

Then, execute all of the following operations in the dv-demo namespace. Create a Secret using your Red Hat Customer Portal credentials to get access to the Red Hat container images:

$ oc create secret docker-registry dv-pull-secret --docker-server=registry.redhat.io --docker-username=$RH_PORTAL_USERNAME --docker-password=$RH_PORTAL_PASSWORD --docker-email=$RH_PORTAL_EMAIL
$ oc secrets link builder dv-pull-secret
$ oc secrets link builder dv-pull-secret --for=pull
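The secret creation above reads your Red Hat Customer Portal credentials from environment variables. A quick sketch of defining them beforehand (the values are placeholders; substitute your own):

```shell
# Placeholder credentials; substitute your own Red Hat Customer Portal login.
export RH_PORTAL_USERNAME="jdoe"
export RH_PORTAL_PASSWORD="changeme"
export RH_PORTAL_EMAIL="jdoe@example.com"

# The variables are now available to the oc create secret command above.
echo "$RH_PORTAL_USERNAME"
```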

Deploy the DV operator

The following steps deploy the DV operator:

  1. Select Catalog -> OperatorHub from the OpenShift menu.
  2. Search for “Data Virtualization.”
  3. Select the Data Virtualization Operator 7.5.0 provided by Red Hat, Inc., as shown in Figure 3:

    Figure 3: The Data Virtualization Operator dialog box.

    ALTERNATIVE: Data Virtualization Operator 7.5.0 provided by Red Hat, Inc., requires credentials to access the Red Hat Container Catalog. Instead, you can use the Teiid Community Operator, which is based on upstream container images. Its implementation does not require Red Hat credentials. This Operator is not supported by Red Hat, but it can be used for quick demo purposes.

  4. Click Install.
  5. Make sure the namespace is the same as the one created earlier and then click Subscribe, as shown in Figure 4:

    Figure 4: The Create Operator Subscription dialog box.

After a few minutes, you should see the DV Operator deployed as a pod in Running status, as shown in Figure 5:

Figure 5: The DV Operator as shown in the Pods listing.

  6. Link the Secret created earlier with the dv-operator service account so that the DV Operator can also pull images from the Red Hat registry. Perform this step after the Operator is installed and its service account created, and before creating a new VirtualDatabase custom resource (CR):
    $ oc secrets link dv-operator dv-pull-secret --for=pull

With that, the Operator installation process is complete, and we are ready to create virtualizations.

Create a sample database

Before creating a virtualization, we need a sample database populated with data. The following script installs a PostgreSQL database and inserts sample data. If you are going to create a virtualization on top of an existing database instead, you can skip this step and go to the next section:

  1. Create a PostgreSQL database:
    $ oc new-app \
      -e POSTGRESQL_USER=user \
      -e POSTGRESQL_PASSWORD=mypassword \
      -e POSTGRESQL_DATABASE=sampledb \
      postgresql:9.6
  2. Once the PostgreSQL pod is in Running status, connect to it and run the psql client:
    $ oc rsh $(oc get pods -o name -l app=postgresql)
    $ psql -U user sampledb
  3. Create a few tables and populate them with data:
    CREATE TABLE CUSTOMER (
    ID bigint,
    SSN char(25),
    NAME varchar(64),
    CONSTRAINT CUSTOMER_PK PRIMARY KEY(ID));
    
    CREATE TABLE ADDRESS (
    ID bigint,
    STREET char(25),
    ZIP char(10),
    CUSTOMER_ID bigint,
    CONSTRAINT ADDRESS_PK PRIMARY KEY(ID),
    CONSTRAINT CUSTOMER_FK FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (ID));
    INSERT INTO CUSTOMER (ID,SSN,NAME) VALUES (10, 'CST01002','Joseph Smith');
    INSERT INTO CUSTOMER (ID,SSN,NAME) VALUES (11, 'CST01003','Nicholas Ferguson');
    INSERT INTO CUSTOMER (ID,SSN,NAME) VALUES (12, 'CST01004','Jane Aire');
    INSERT INTO ADDRESS (ID, STREET, ZIP, CUSTOMER_ID) VALUES (10, 'Main St', '12345', 10);
    \q
    exit

With that, we have a simple database set up with data and ready for our demo.
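Optionally, before quitting psql, you can sanity-check the data with a join that mirrors the view we will define in the virtualization later:

```sql
-- Expect three rows; only Joseph Smith has an address row,
-- so the other ZIP values come back NULL.
SELECT c.NAME, c.SSN, a.ZIP
FROM CUSTOMER c LEFT OUTER JOIN ADDRESS a
  ON c.ID = a.CUSTOMER_ID;
```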

Create a virtual database

We now have our operator installed and a sample database populated. We are all set to create a virtualization. Here are a few ways to do it.

The simplest approach

For this purpose, we will use the simplest approach, where the data definition language (DDL) definition is embedded in the CR itself. For other approaches, where the DDL and its dependencies are retrieved from external locations, check out the “Next steps” section at the end of this document:

  1. From the OpenShift menu, select Catalog -> Installed Operators as shown in Figure 6:

Figure 6: OpenShift’s Catalog -> Installed Operators page.

  2. Select Data Virtualization Operator.
  3. Select the Virtual Database tab.
  4. Click Create Virtual Database, which opens the Create Virtual Database dialog box shown in Figure 7:

Figure 7: The Create Virtual Database dialog box.

  5. Inspect and modify the example CR that defines the data virtualization.

The essence of virtualization is the DDL definition, which creates new abstract data layers on top of the physical data layers using the SQL/MED specification. Here is a brief explanation of the example:

// Step 1: Defines a name for the virtual database you want to create
CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
USE DATABASE customer;

// Step 2: Configures a translator to interpret data from the datasource
CREATE FOREIGN DATA WRAPPER postgresql;

// Step 3: Configures the datasource connection details for the external source
CREATE SERVER sampledb TYPE 'NONE' FOREIGN DATA WRAPPER postgresql;

// Step 4: Creates schemas to hold metadata about the source and virtual layers
CREATE SCHEMA accounts SERVER sampledb;
CREATE VIRTUAL SCHEMA portfolio;

// Step 5: Imports the metadata from the source schema into a virtual schema
SET SCHEMA accounts;
IMPORT FOREIGN SCHEMA public FROM SERVER sampledb INTO accounts OPTIONS("importer.useFullSchemaName" 'false');

// Step 6: Creates virtual views
SET SCHEMA portfolio;
CREATE VIEW CustomerZip(id bigint PRIMARY KEY, name string, ssn string, zip string) AS
   SELECT c.ID as id, c.NAME as name, c.SSN as ssn, a.ZIP as zip
FROM accounts.CUSTOMER c LEFT OUTER JOIN accounts.ADDRESS a
ON c.ID = a.CUSTOMER_ID;

To read more about these constructs, check out the data virtualization guide.
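For reference, a trimmed-down VirtualDatabase CR with the DDL embedded might look like the following sketch. It is based on the upstream Teiid OpenShift examples; the apiVersion, field names, and env entries are assumptions that can differ between Operator versions, and the DDL body is the one explained above (abridged here):

```yaml
apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: rdbms-springboot
spec:
  replicas: 1
  # Hardcoded connection details for the sample PostgreSQL database;
  # see "Using a Kubernetes Secret object instead" for the Secret-based form.
  env:
  - name: SPRING_DATASOURCE_SAMPLEDB_USERNAME
    value: user
  - name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
    value: mypassword
  - name: SPRING_DATASOURCE_SAMPLEDB_DATABASENAME
    value: sampledb
  build:
    source:
      ddl: |
        CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
        USE DATABASE customer;
        -- ...rest of the DDL shown above...
```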

ALTERNATIVE: Before creating the virtual database, you can customize the CR containing the details of the virtualization and the embedded DDL. In the example CR above, the connection details are hardcoded and match with those of the PostgreSQL database created earlier. If you just want to get this example running, don’t edit anything and hit the Create button. If you prefer to retrieve the PostgreSQL connection details from a Kubernetes Secret object rather than hard-coding them in the CR, see the section “Using a Kubernetes Secret object instead.”

Using a Kubernetes Secret object instead

To use a Kubernetes Secret object:

  1. Create a Secret that will be used to access the database from the virtualization:
oc create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: postgresql
type: Opaque
stringData:
  database-user: user
  database-name: sampledb
  database-password: mypassword
EOF
  2. In the CR, replace:
- name: SPRING_DATASOURCE_SAMPLEDB_USERNAME
  value: user
- name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
  value: mypassword
- name: SPRING_DATASOURCE_SAMPLEDB_DATABASENAME
  value: sampledb

with:

- name: SPRING_DATASOURCE_SAMPLEDB_USERNAME
  valueFrom:
    secretKeyRef:
      name: postgresql
      key: database-user
- name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: postgresql
      key: database-password
- name: SPRING_DATASOURCE_SAMPLEDB_DATABASENAME
  valueFrom:
    secretKeyRef:
      name: postgresql
      key: database-name
  3. Hit the Create button to instantiate the virtualization, as shown in Figure 8:

Figure 8: Creating the alternative virtual database.

After a few minutes, you should see that in addition to the dv-operator and postgresql pods, there is also an rdbms-springboot pod running our new virtualization. Once this pod is in Running status, we are ready to use our new virtualization, as you can see in Figure 9:

Figure 9: The dv-operator, postgresql, and rdbms-springboot pods all running in the dv-demo namespace.

Using the CLI

Virtualizations can be created not only from the OpenShift console, as demonstrated in the previous section, but also through the command line as part of a CI/CD process. You can create a similar virtualization (named dv-customer) with one CLI command as follows:

$ oc create -f https://raw.githubusercontent.com/teiid/teiid-openshift-examples/7.5-1.2.x/rdbms-example/dv-customer.yaml

In the end, you can mix and match your methods: create your virtualizations through the CLI and then manage them from the OpenShift console, or the other way around. Once done, you can also delete a virtualization from the CLI like this:

$ oc delete vdb dv-customer

Access the virtual database

In addition to deploying the virtualization, the DV Operator also creates a service and a route for exposing the various virtual database endpoints.

Access from inside the cluster

To see what protocols are exposed through the service, go to the OpenShift menu and select Networking -> Services. Then, choose the service named rdbms-springboot, as shown in Figure 10:

Figure 10: The rdbms-springboot service overview window.

All of the ports named here are accessible from inside the OpenShift cluster; for example, from other pods that want to use the virtualization. Here we can see the following endpoints listed:

    • Monitoring data exposed through Jolokia and Prometheus ports.
    • OData and OpenAPI exposed over the HTTP port.
    • JDBC access available as a Teiid port.
    • ODBC access available as a PostgreSQL (pg) port.
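For example, a client pod in the same cluster could reach the virtualization through the service's DNS name. Assuming the default port numbers (31000 for Teiid JDBC, 35432 for pg/ODBC) and the dv-demo namespace, the connection strings would look roughly like this (a sketch; adjust the service name, namespace, and ports to your environment):

```
jdbc:teiid:customer@mm://rdbms-springboot.dv-demo.svc:31000
jdbc:postgresql://rdbms-springboot.dv-demo.svc:35432/customer?sslMode=disable
```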

HTTP access from outside the cluster

The simplest way to access the virtualization from outside of the OpenShift cluster is through HTTP and OData/OpenAPI. To find the created route, go to the OpenShift menu, select Networking -> Routes, and choose the route named after the virtualization, rdbms-springboot, as shown in Figure 11:

Figure 11: The rdbms-springboot route.

Clicking on the location will give you “404 – Not found,” as there is no endpoint exposed at “/”. To see the OData metadata, append odata/$metadata to the end of the URL (for example, https://rdbms-springboot-dv-demo.example.com/odata/$metadata).

If you don’t want to edit the URL and want to directly access the OData endpoint, go to the OpenShift menu, select Installed Operators -> Data Virtualization Operator -> Virtual Databases -> rdbms-springboot, and then click the route link.

Try out other queries based on the OData Basic Tutorial, such as:

  • Retrieve the top two customer records, ordered by name:
    https://rdbms-springboot-dv-demo.example.com/odata/portfolio/CustomerZip?$top=2&$orderby=name
  • Retrieve a customer record by ID and pick the name field value:
    https://rdbms-springboot-dv-demo.example.com/odata/portfolio/CustomerZip(10)/name/$value
  • Search customer records by name, and display the result in JSON:
    https://rdbms-springboot-dv-demo.example.com/odata/portfolio/CustomerZip?$filter=startswith(name,%27Joseph%27)&$format=JSON
  • Generate an OpenAPI v3 schema from the OData metadata:
    https://rdbms-springboot-dv-demo.example.com/odata/openapi.json?version=3
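When calling these endpoints from a terminal, note that OData system query options start with $, which the shell would otherwise expand as variables. A minimal sketch, using the example route host from above (substitute your own):

```shell
# Example route host from this article; replace with your own route.
HOST="https://rdbms-springboot-dv-demo.example.com"

# Single-quote the query part so the shell does not expand $top/$orderby:
URL="$HOST"'/odata/portfolio/CustomerZip?$top=2&$orderby=name'
echo "$URL"

# Then fetch it with curl (-k skips TLS verification for self-signed certs):
# curl -k "$URL"
```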

JDBC/ODBC access from outside the cluster

To access the virtualizations over JDBC/ODBC from outside of the OpenShift cluster, we have to expose the JDBC/ODBC ports to the outside world. Then, we can use an SQL client supporting these protocols, or use this Spring Boot-based client application.

The simplest way

The simplest way to expose the JDBC/ODBC ports, specifically for development purposes, is port forwarding with the oc client as follows:

$ oc port-forward $(oc get pods -o=jsonpath='{.items[0].metadata.name}' -l app=rdbms-springboot) 35432 31000

This command maps ports 35432 (ODBC/pg) and 31000 (Teiid JDBC) from the pod with the label app=rdbms-springboot to the local machine. While this command is running in the terminal, you can start an application and connect to the virtualization with connection strings such as:

jdbc:postgresql://127.0.0.1:35432/customer?sslMode=disable

or

jdbc:teiid:customer@mm://127.0.0.1:31000
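While the port-forward is active, any PostgreSQL-compatible client can also talk to the pg endpoint directly. A minimal sketch (the psql call is commented out because it needs the tunnel running; whether credentials are required depends on how the virtualization's security is configured):

```shell
# Assumes `oc port-forward ... 35432 31000` is running in another terminal.
# The database name is the virtual database name, `customer`.
PGURL="postgresql://127.0.0.1:35432/customer?sslmode=disable"
echo "$PGURL"

# Connect with any PostgreSQL-compatible client, e.g.:
# psql "$PGURL" -c 'SELECT * FROM portfolio.CustomerZip;'
```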

Create a Kubernetes LoadBalancer service

Another way to expose the same port is by creating a Kubernetes LoadBalancer service that can be customized with sticky sessions, etc. To create the LoadBalancer service, execute the following:

$ oc apply -f - << EOF
apiVersion: v1
kind: Service
metadata:
  name: rdbms-springboot-expose
spec:
  type: LoadBalancer
  ports:
  - name: teiid
    port: 31000  
  - name: pg
    port: 35432          
  selector:
    app: rdbms-springboot
EOF

Once the service is created, you can discover the full hostname for accessing the LoadBalancer on your cluster and infrastructure provider with the following query:

$ oc get svc rdbms-springboot-expose -o=jsonpath='{..ingress[0].hostname}'

For my OpenShift cluster, the command returns the following hostname:

a4d3bf5fd1b9311eab2b602474b8b0b4-143190945.eu-central-1.elb.amazonaws.com

This hostname can be used by the sample client application mentioned earlier, or any other tool, as follows:

jdbc:postgresql://a4d3bf5fd1b9311eab2b602474b8b0b4-143190945.eu-central-1.elb.amazonaws.com:35432/customer?sslMode=disable

or

jdbc:teiid:customer@mm://a4d3bf5fd1b9311eab2b602474b8b0b4-143190945.eu-central-1.elb.amazonaws.com:31000

The above mechanisms give easy access to the running virtualizations from outside of the OpenShift cluster. There are other mechanisms, too, but these are enough for a getting-started experience.

Next steps

Congratulations, you have reached the end of this guide to getting started with the data virtualization Operator for OpenShift. Data virtualization is an essential part of the Red Hat Integration stack, and we will cover how it integrates with other Red Hat technologies in future pieces. To find out more about data virtualization, try out our examples and check out the latest Red Hat Integration documentation on how to configure data virtualization with other data sources and secure its endpoints through 3scale.

If you have requests or questions related to running the data virtualization technical preview, please let us know by sending an email to the data-integration-preview mailing list.
