Unlock Your Cloudera Data with Red Hat JBoss Data Virtualization

May 11, 2017
Madou Coulibaly

    After the "Unlock your Hadoop data with Hortonworks and Red Hat JBoss Data Virtualization" episode, let's continue the journey with another Apache Hadoop episode of the series "Unlock your [….] data with Red Hat JBoss Data Virtualization." In this blog series, we look at how to connect Red Hat JBoss Data Virtualization (JDV) to different, heterogeneous data sources.

    JDV is a lean, virtual data integration solution that unlocks trapped data and delivers it as easily consumable, unified, and actionable information. It makes data spread across physically diverse systems — such as multiple databases, XML files, and Hadoop systems — appear as a set of tables in a local database. By providing the following functionality, JDV enables agile data use:

    1. Connect: Access data from multiple, heterogeneous data sources.
    2. Compose: Easily combine and transform data into reusable, business-friendly virtual data models and views.
    3. Consume: Make unified data easily consumable through open standards interfaces.

    It hides complexities, like the true locations of data or the mechanisms required to access or merge it. Data becomes easier for developers and users to work with.

    This post guides you step by step through connecting JDV to Cloudera Distribution Hadoop (CDH) via the Cloudera JDBC Driver for Impala, using Teiid Designer. On the JDV side, the connection goes through the Cloudera Impala Translator; a translator acts as the bridge between JBoss Data Virtualization and an external system.

    Prerequisites

    • JDV 6.3 Environment

      Download: https://developers.redhat.com/products/datavirt/overview
      Install: https://developers.redhat.com/products/datavirt/overview

      We will refer to the installation directory of JDV 6.3 as $JDV_HOME.

    • Red Hat JBoss Developer Studio (JBDS) 9.1.0 with Teiid Designer plugins

      Download: https://developers.redhat.com/download-manager/file/jboss-devstudio-9.1.0.GA-installer-eap.jar
      Install: https://developers.redhat.com/products/datavirt/overview

    • Cloudera JDBC Driver for Impala

      Download: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-37.html
      Install: Follow the installation instructions.

      In this example, we will connect to a database called "unlockdata" with the username/password "cloudera_dev/cloudera_dev".
      We will refer to the directory containing the driver jar files as $DRIVER_HOME.

    • Cloudera Impala Translator

      The Cloudera Impala Translator is provided with JDV. No installation needed.
      Note: Please visit to discover all the translators available with JDV. If you cannot find a suitable translator for your system, you can develop a custom one.
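    Before going further, it can help to confirm that both directories are in place. The POSIX shell function below is a minimal sketch: it relies only on the $JDV_HOME and $DRIVER_HOME variables defined above, and the specific files it looks for (bin/standalone.sh, and ImpalaJDBC41.jar, the driver jar listed in the module.xml later in this post) are assumptions about a typical layout rather than an official check.

```shell
#!/bin/sh
# Sketch: sanity-check the two directories used throughout this post.
# Assumption: a usable JDV_HOME has an executable bin/standalone.sh and a
# usable DRIVER_HOME contains ImpalaJDBC41.jar (driver versions may differ).

# check_prereqs: report each problem on stderr; return 0 only if both
# directories look usable.
check_prereqs() {
    status=0
    if [ -z "${JDV_HOME:-}" ] || [ ! -x "$JDV_HOME/bin/standalone.sh" ]; then
        echo "JDV_HOME is unset or has no executable bin/standalone.sh" >&2
        status=1
    fi
    if [ -z "${DRIVER_HOME:-}" ] || [ ! -f "$DRIVER_HOME/ImpalaJDBC41.jar" ]; then
        echo "DRIVER_HOME is unset or does not contain ImpalaJDBC41.jar" >&2
        status=1
    fi
    return "$status"
}

# Example usage (paths are hypothetical):
#   export JDV_HOME=/opt/jboss/jdv-6.3
#   export DRIVER_HOME=/opt/cloudera/impala-jdbc
#   check_prereqs && echo "prerequisites look OK"
```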

    Install & Configure the Cloudera JDBC Driver for Impala on JDV

    1. Create a JBoss module directory for the Cloudera JDBC Driver for Impala
      $ mkdir -p $JDV_HOME/modules/system/layers/dv/com/cloudera/impala/main
    2. Copy the Cloudera JDBC jar files into the new JBoss module directory
      $ cp $DRIVER_HOME/*.jar $JDV_HOME/modules/system/layers/dv/com/cloudera/impala/main
    3. Create a module.xml file with the following content in the new JBoss module directory
      <?xml version="1.0"?>
      <module xmlns="urn:jboss:module:1.1" name="com.cloudera.impala">
          <resources>
              <resource-root path="commons-codec-1.3.jar"/>
              <resource-root path="commons-logging-1.1.1.jar"/>
              <resource-root path="hive_metastore.jar"/>
              <resource-root path="hive_service.jar"/>
              <resource-root path="httpclient-4.1.3.jar"/>
              <resource-root path="httpcore-4.1.3.jar"/>
              <resource-root path="ImpalaJDBC41.jar"/>
              <resource-root path="libfb303-0.9.0.jar"/>
              <resource-root path="libthrift-0.9.0.jar"/>
              <resource-root path="log4j-1.2.14.jar"/>
              <resource-root path="ql.jar"/>
              <resource-root path="slf4j-api-1.5.11.jar"/>
              <resource-root path="slf4j-log4j12-1.5.11.jar"/>
              <resource-root path="TCLIServiceClient.jar"/>
              <resource-root path="zookeeper-3.4.6.jar"/>
          </resources>
          <dependencies>
              <module name="javax.api"/>
              <module name="javax.transaction.api"/>
          </dependencies>
      </module>
    4. Start your local JDV 6.3 environment
      $ $JDV_HOME/bin/standalone.sh
    5. Register the Cloudera JDBC Driver for Impala with the management CLI
      $ $JDV_HOME/bin/jboss-cli.sh --connect
      [standalone@localhost:9999 /] /subsystem=datasources/jdbc-driver=impala:add(driver-name=impala,driver-module-name=com.cloudera.impala,driver-class-name=com.cloudera.impala.jdbc41.Driver,driver-xa-datasource-class-name=com.cloudera.impala.jdbc41.DataSource)
      
    6. Add the Cloudera datasource and enable it
      [standalone@localhost:9999 /] data-source add --name=UnlockData_Cloudera_DS --jndi-name=java:/UnlockData_Cloudera_DS --connection-url=jdbc:impala://localhost:21050/unlockdata --driver-name=impala --user-name=cloudera_dev --password=cloudera_dev
      [standalone@localhost:9999 /] data-source enable --name=UnlockData_Cloudera_DS
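    Because the jar names in module.xml must match the files that actually ship with your driver version, a quick cross-check can save a failed server start. The function below is a hedged sketch in POSIX shell: it extracts every resource-root path with sed (assuming one resource-root element per line, as in the module.xml above), reports jars that are listed but missing, and warns about jars in the directory that module.xml does not reference.

```shell
#!/bin/sh
# Sketch: cross-check module.xml against the jars in a JBoss module directory.
# Assumption: each <resource-root path="..."/> element sits on its own line.

# check_module_jars DIR: return 1 if module.xml is missing or lists a jar
# that is not present in DIR; unreferenced jars only produce a warning.
check_module_jars() {
    module_dir=$1
    if [ ! -f "$module_dir/module.xml" ]; then
        echo "no module.xml in $module_dir" >&2
        return 1
    fi
    missing=0
    # Every path listed in module.xml must exist next to it.
    for jar in $(sed -n 's/.*<resource-root path="\([^"]*\)".*/\1/p' "$module_dir/module.xml"); do
        if [ ! -f "$module_dir/$jar" ]; then
            echo "missing: $jar" >&2
            missing=1
        fi
    done
    # Jars that were copied but never referenced are not on the module classpath.
    for f in "$module_dir"/*.jar; do
        [ -e "$f" ] || continue   # the glob matched nothing
        name=$(basename "$f")
        if ! grep -qF "path=\"$name\"" "$module_dir/module.xml"; then
            echo "not referenced in module.xml: $name" >&2
        fi
    done
    return "$missing"
}

# Example usage:
#   check_module_jars "$JDV_HOME/modules/system/layers/dv/com/cloudera/impala/main"
```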

    Start Your Development Environment

      1. Start your local JDV 6.3 environment
        $ $JDV_HOME/bin/standalone.sh
      2. Start your local JBDS environment

        Start JBDS 9.1.0 and open the Teiid Designer perspective.

        Note: Use the menu options "Window" > "Perspective" > "Open Perspective" > "Other..." > "Teiid Designer" to switch JBDS to the Teiid Designer perspective.
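    The JDV server from step 1 needs to be running before Teiid Designer can use it, so when scripting these steps it helps to wait for the server before moving on. The sketch below is a generic retry helper in POSIX shell; the function name is hypothetical, and the usage line relies on the assumption that jboss-cli.sh exits with a non-zero status while it cannot yet connect to the management interface.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.

# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 after ATTEMPTS failures.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example usage (assumes the CLI fails while the server is still starting):
#   "$JDV_HOME/bin/standalone.sh" > /tmp/jdv.log 2>&1 &
#   wait_until 30 2 "$JDV_HOME/bin/jboss-cli.sh" --connect \
#       --command=":read-attribute(name=server-state)" && echo "JDV is up"
```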

    Create Your Teiid Project

      1. Create a Teiid Model Project called "ClouderaSample"

        Create a new Teiid Model Project by right-clicking in the Model Explorer window and selecting "New" > "Teiid Model Project".

      2. Import metadata using the Teiid importer

        We are now going to import metadata directly using the Teiid importer. Right-click the "ClouderaSample" project, select "Import", choose "JDBC Database >> Source Model", and click "Next >".

    What about automating configuration? (Optional)


    Ansible, by Red Hat, is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It uses no agents and no additional custom security infrastructure, so it's easy to deploy. Most importantly, it uses a very simple language (YAML, in the form of Ansible Playbooks) that allows you to describe your automation jobs in a way that approaches plain English. For convenience, most of the installation and configuration steps above are automated in an Ansible playbook called cloudera on GitHub. From a clone of the unlock-your-data repository, you only need to run one command, and you should see output similar to the following:

    $ cd unlock-your-data/cloudera
    $ ansible-playbook local.yml
    PLAY [Configure local JBoss Data Virtualization with Cloudera JDBC driver for Impala] ***
    
    TASK [Gathering Facts] ******************************************************************
    ok: [localhost]
    
    TASK [Create JBoss module directory for the Cloudera JDBC driver for Impala] ************
    changed: [localhost]
    
    TASK [Download and Unarchive the Cloudera JDBC zip file] ********************************
    changed: [localhost]
    
    TASK [Unarchive the Sub Cloudera JDBC zip file] *****************************************
    changed: [localhost]
    
    TASK [Remove Cloudera JDBC unarchived directory] ****************************************
    changed: [localhost]
    
    TASK [Copy module.xml into JBoss modules directory] *************************************
    changed: [localhost]
    
    TASK [Execute Management CLI file(s)] ***************************************************
    changed: [localhost] => (item=add_datasource.cli)
    changed: [localhost] => (item=add_driver.cli)
    
    PLAY RECAP ******************************************************************************
    localhost                  : ok=7    changed=6    unreachable=0    failed=0

    Note: See https://github.com/cvanball/unlock-your-data/tree/master/cloudera for more information.
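    If you prefer not to use Ansible, steps 5 and 6 can still be scripted with plain management CLI files. This is a minimal sketch, not the content of the repository above: it writes the same commands shown earlier in this post into two files whose names merely mirror the add_driver.cli and add_datasource.cli items in the playbook output, and applies them with jboss-cli.sh --file, registering the driver before the datasource that references it.

```shell
#!/bin/sh
# Sketch: keep the CLI commands from steps 5 and 6 in files so they can be
# replayed. File names mirror the Ansible output; their contents here are
# copied from this post, not from the unlock-your-data repository.

# write_cli_files DIR: write add_driver.cli and add_datasource.cli into DIR.
write_cli_files() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/add_driver.cli" <<'EOF'
/subsystem=datasources/jdbc-driver=impala:add(driver-name=impala,driver-module-name=com.cloudera.impala,driver-class-name=com.cloudera.impala.jdbc41.Driver,driver-xa-datasource-class-name=com.cloudera.impala.jdbc41.DataSource)
EOF
    cat > "$dir/add_datasource.cli" <<'EOF'
data-source add --name=UnlockData_Cloudera_DS --jndi-name=java:/UnlockData_Cloudera_DS --connection-url=jdbc:impala://localhost:21050/unlockdata --driver-name=impala --user-name=cloudera_dev --password=cloudera_dev
data-source enable --name=UnlockData_Cloudera_DS
EOF
}

# run_cli_files DIR: apply both files to a running server, driver first,
# because the datasource refers to the driver by name.
run_cli_files() {
    dir=$1
    for f in add_driver.cli add_datasource.cli; do
        "$JDV_HOME/bin/jboss-cli.sh" --connect --file="$dir/$f" || return 1
    done
}

# Example usage:
#   write_cli_files ./cli && run_cli_files ./cli
```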

    Create Your Connection Profile

    1. Create the connection profile

      Before we can proceed, we need a connection profile that connects to CDH using the JDBC jars downloaded earlier. On the first page of the wizard, click "New..." to create a new Connection Profile.

      Select "Generic" for the Connection Profile Type and name the Connection Profile "UnlockData_Cloudera_DS". Click "Next >".

    2. Add and configure the JDBC driver

      Click the "Add Driver Definition" button.

      In the "Name/Type" tab, select the "Generic JDBC Driver" template.

      In the "JAR List" tab, click "Clear All", then click "Add JAR/Zip..." to add the JDBC driver jar files from $DRIVER_HOME.

      In the "Properties" tab, specify the parameters as follows:

      Connection URL = jdbc:impala://host:port/database
      Database Name = database
      Driver Class = com.cloudera.impala.jdbc41.Driver
      User ID = user

      Click "OK". We are now ready to connect to CDH via Impala by providing the connection details.

      The Impala JDBC URL is a string with the following syntax:

      jdbc:impala://host:port/database

      where:

      • host is the host name of the server.
      • port is the port number the server is listening on. The default port number is 21050.
      • database is the database name.

      Here we will use the following connection details:

      Username = cloudera_dev
      Password = cloudera_dev
      Database = unlockdata
      URL = jdbc:impala://localhost:21050/unlockdata

      Check the "Save password" box and click "Test Connection" to validate that the server can be pinged successfully.

      Once the ping succeeds, we are ready to select tables from CDH and create a source model from them. Click "OK", then "Finish".
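    The Impala JDBC URL syntax described above is easy to get wrong by hand, so a tiny helper can build it consistently for both the JBDS connection profile and the JDV datasource. This POSIX shell function is an illustrative sketch; its name is hypothetical, and it applies the default port 21050 mentioned above only when no port is given.

```shell
#!/bin/sh
# Sketch: build an Impala JDBC URL of the form jdbc:impala://host:port/database.

# impala_jdbc_url HOST DATABASE [PORT]  (PORT defaults to 21050)
impala_jdbc_url() {
    host=${1:-}
    database=${2:-}
    port=${3:-21050}
    if [ -z "$host" ] || [ -z "$database" ]; then
        echo "usage: impala_jdbc_url HOST DATABASE [PORT]" >&2
        return 1
    fi
    printf 'jdbc:impala://%s:%s/%s\n' "$host" "$port" "$database"
}

# Example usage with the connection details above:
#   impala_jdbc_url localhost unlockdata
#   # prints jdbc:impala://localhost:21050/unlockdata
```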

    Create Your Source Model

    1. Import the metadata from the database

      Click "Next >" twice to select database objects.

      Select all database objects you want to import then click "Next >".

      Specify the target folder for the source models (here "ClouderaSample/DataSourceLayer"). Make sure that the JNDI name corresponds to the one we created in the JDV environment (Hint: UnlockData_Cloudera_DS) and that "Auto-create Data Source" is not selected. Click "Finish" to create the source models.

    2. Preview the data through JDV

      Select any model and click the running man icon to preview the data.
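    The JNDI name entered in the import wizard has to match the datasource defined on the server. As a hedged sketch, that name can be read back from the running JDV instance with the management CLI's read-attribute operation on the datasource resource; the function names are hypothetical, and the sed-based parsing assumes the usual "result" => "..." response format rather than being an official parser.

```shell
#!/bin/sh
# Sketch: read a datasource's JNDI name back from a running JDV server.

# cli_result: print the string value of "result" from CLI output on stdin,
# e.g. {"outcome" => "success", "result" => "java:/UnlockData_Cloudera_DS"}.
# Prints nothing if the response has no string result (e.g. a failure).
cli_result() {
    sed -n 's/.*"result" => "\([^"]*\)".*/\1/p'
}

# datasource_jndi_name NAME: query the server for the datasource's jndi-name.
datasource_jndi_name() {
    "$JDV_HOME/bin/jboss-cli.sh" --connect \
        --command="/subsystem=datasources/data-source=$1:read-attribute(name=jndi-name)" |
        cli_result
}

# Example usage:
#   datasource_jndi_name UnlockData_Cloudera_DS
#   # should print the jndi-name set in step 6 (java:/UnlockData_Cloudera_DS)
```

    A similar call using the :test-connection-in-pool operation on the same datasource resource can confirm that the server itself, not just JBDS, can reach Impala.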

    Conclusion

    In this post, we've shown the configuration steps needed to unlock your Cloudera data using the Cloudera JDBC Driver for Impala with Red Hat JBoss Data Virtualization.

    We are now ready to federate this data with data from other, physically distinct systems, such as other SQL databases, XML/Excel files, NoSQL databases, enterprise applications, and web services.

    For more information about Cloudera, Ansible, and Red Hat JBoss Data Virtualization, please refer to the following websites:

    • https://www.cloudera.com
    • https://www.ansible.com
    • https://developers.redhat.com/products/datavirt/overview

    Download JBoss Data Virtualization and accept the terms and conditions of the Red Hat Developer Program, which provides no-cost subscriptions for development use only.

    Last updated: November 9, 2023
