Unlock your Hadoop data with Hortonworks and Red Hat JBoss Data Virtualization

November 16, 2016
Cojan van Ballegooijen

    Welcome to the first episode of the series "Unlock your [....] data with Red Hat JBoss Data Virtualization (JDV)."

    This post guides you through an example of connecting to a Hadoop source via the Hive2 driver, using Teiid Designer. We will demonstrate the connection against a local Hadoop source: the Hortonworks 2.5 Sandbox running in VirtualBox. You can connect to another Hortonworks source using the same steps.

    Hortonworks provides Hive JDBC and ODBC drivers that let you connect popular tools to query, analyze and visualize data stored within the Hortonworks Data Platform (HDP).

    Note: we support HBase as well; stay tuned for a future episode, "Unlock your HBase data with Hortonworks and JDV."

    Setup

    • JDV 6.3 environment
      Download: https://developers.redhat.com/products/datavirt/overview
      Install: https://developers.redhat.com/products/datavirt/overview
      We will refer to the installation directory of JDV 6.3 as $JDV_HOME.
    • VirtualBox
      Download: https://www.virtualbox.org/wiki/Downloads
      Install: https://www.virtualbox.org/manual/ch02.html
    • HDP Sandbox 2.5 environment for VirtualBox
      Download: https://hortonworks.com/products/hortonworks-sandbox/#install
      Note: at the time of this blog post the version of HDP Sandbox is 2.5. Hortonworks updates the HDP Sandbox regularly.
      Install: https://hortonworks.com/products/hortonworks-sandbox/#install
    • HDP Hive JDBC jars
      JDV requires access to the following jar files to successfully use the Hive JDBC driver to connect to the Hortonworks Sandbox environment in both HTTP and HTTPS modes. You can download them from the HDP Hive JDBC driver archive (a download sketch follows this list).
      Download: hive-jdbc-standalone.jar, hadoop-common.jar, and hadoop-auth.jar.
      We will refer to the directory of these jar files as $HOME/hdp.
    • Red Hat JBoss Developer Studio (JBDS) 9.1.0 with Teiid Designer plugins
      Download: https://developers.redhat.com/download-manager/file/jboss-devstudio-9.1.0.GA-installer-eap.jar
      Install: https://developers.redhat.com/products/datavirt/overview
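
    For reference, here is a minimal sketch of downloading the three jar files into $HOME/hdp. It assumes the Hortonworks release repository paths shown later in the Ansible playbook output; the exact version strings may differ if Hortonworks has updated the sandbox.

    $ mkdir -p $HOME/hdp && cd $HOME/hdp
    # Fetch the jars from the Hortonworks release repository
    # (same URLs as the playbook downloads shown further below).
    $ wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-auth/2.7.3.2.5.0.0-1245/hadoop-auth-2.7.3.2.5.0.0-1245.jar
    $ wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-common/2.7.3.2.5.0.0-1245/hadoop-common-2.7.3.2.5.0.0-1245.jar
    $ wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-jdbc/1.2.1000.2.5.0.0-1245/hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar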

    Configure

    1. Create a JBoss module directory for the HDP Hive2 JDBC driver
      $ cd $JDV_HOME
      $ mkdir -p modules/system/layers/dv/org/apache/hadoop/hive2/main

      Note: copy the previously downloaded HDP Hive JDBC jar files to modules/system/layers/dv/org/apache/hadoop/hive2/main

    2. Create a module.xml file, with the content shown below, in the JBoss module directory created in the previous step
      <?xml version="1.0" encoding="UTF-8"?>
       <module xmlns="urn:jboss:module:1.0" name="org.apache.hadoop.hive2">
         <resources>
           <!-- The resource paths must match the jar file names you downloaded;
                adjust the version suffixes if your sandbox version differs. -->
           <resource-root path="hadoop-auth-2.7.3.2.5.0.0-1245.jar"/>
           <resource-root path="hadoop-common-2.7.3.2.5.0.0-1245.jar"/>
           <resource-root path="hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar"/>
         </resources>
         <dependencies>
           <!-- Logging and Java EE APIs provided by the EAP runtime. -->
           <module name="org.slf4j"/>
           <module name="org.apache.commons.logging"/>
           <module name="javax.api"/>
           <module name="javax.resource.api"/>
         </dependencies>
       </module>
    3. Start your local JDV 6.3 environment
      $ cd $JDV_HOME/bin
      $ ./standalone.sh
    4. Add HDP Hive2 JDBC driver
      $ cd $JDV_HOME/bin
      $ ./jboss-cli.sh 
      [disconnected /] connect localhost:10099
      [standalone@localhost:10099 /] /subsystem=datasources/jdbc-driver=hdp-sandbox:add(driver-name=hdp-sandbox,driver-module-name=org.apache.hadoop.hive2,driver-class-name=org.apache.hive.jdbc.HiveDriver)
      
    5. Add the HDP Hive2 datasource and enable it (a quick verification sketch follows these steps)
      [standalone@localhost:10099 /] data-source add --name=HDPHive2 --jndi-name=java:/HDPHive2DS --connection-url=jdbc:hive2://localhost:10000/default --driver-name=hdp-sandbox --user-name=maria_dev --password=maria_dev
      [standalone@localhost:10099 /] data-source enable --name=HDPHive2
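
    With the driver and datasource in place, here is a minimal verification sketch using the management CLI. This is an optional sanity check; the test-connection-in-pool operation assumes the HDP Sandbox is already running and reachable on port 10000.

    $ cd $JDV_HOME/bin
    $ ./jboss-cli.sh --connect controller=localhost:10099
    # The hdp-sandbox driver should appear in this list.
    [standalone@localhost:10099 /] /subsystem=datasources:installed-drivers-list
    # Returns true when JDV can reach HiveServer2 through the datasource.
    [standalone@localhost:10099 /] /subsystem=datasources/data-source=HDPHive2:test-connection-in-pool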


    What about automating the configuration?

    A question you might ask: can we automate the above configuration steps? The answer is yes, we can, with Ansible by Red Hat.


    As you can imagine, I like drinking our own champagne (Kool-Aid).

    Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It uses no agents and no additional custom security infrastructure, so it's easy to deploy. Most importantly, it uses a very simple language (YAML, in the form of Ansible Playbooks) that allows you to describe your automation jobs in a way that approaches plain English.

    For your convenience, most of the steps are automated in an Ansible playbook called hdphive2 on GitHub. Running it takes a single command, and you should see output similar to that shown below:

    $ cd unlock-your-data/hdphive2
    $ ansible-playbook local.yml
    PLAY [Configure local JBoss Data Virtualization with HDP Hive2 JDBC driver] ****
    
    TASK [setup] *******************************************************************
    ok: [localhost]
    
    TASK [Create JBoss module directory for the HDP Hive2 JDBC driver] *************
    changed: [localhost]
    
    TASK [Download HDP Hive JDBC jar files] ****************************************
    changed: [localhost] => (item=http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-auth/2.7.3.2.5.0.0-1245/hadoop-auth-2.7.3.2.5.0.0-1245.jar)
    changed: [localhost] => (item=http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-common/2.7.3.2.5.0.0-1245/hadoop-common-2.7.3.2.5.0.0-1245.jar)
    changed: [localhost] => (item=http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-jdbc/1.2.1000.2.5.0.0-1245/hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar)
    
    TASK [Copy module.xml into JBoss modules directory] ****************************
    changed: [localhost]
    
    TASK [Execute Management CLI file(s)] ******************************************
    changed: [localhost] => (item=add_datasource.cli)
    changed: [localhost] => (item=add_driver.cli)
    
    PLAY RECAP *********************************************************************
    localhost : ok=5 changed=4 unreachable=0 failed=0

    See http://github.com/cvanball/unlock-your-data/tree/master/hdphive2 for more information.
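
    If you want to preview what the playbook would change before applying it, you can use Ansible's check mode. Here is a small sketch; note that check mode may skip or misreport tasks that don't support it, such as command executions.

    $ cd unlock-your-data/hdphive2
    # Dry run: report what would change without modifying the system.
    $ ansible-playbook local.yml --check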

    Steps to unlock your Hadoop data

      1. Start your local JDV 6.3 environment
        $ cd $JDV_HOME/bin
        $ ./standalone.sh
      2. Start your local HDP Sandbox environment
        Note: before starting the HDP Sandbox environment, change port 8080 to 8070 in the port-forwarding section of the HDP Sandbox VM, since the sandbox exposes port 8080 by default for Ambari. The HDP Sandbox environment contains several databases with Hive tables out of the box, as shown below in the Hive View of Ambari.
      3. Start your local JBDS environment
        Start JBDS 9.1.0 and open the Teiid Designer perspective as shown below.

        Note: use the menu options Window > Perspective > Open Perspective > Teiid Designer to switch JBDS to the Teiid Designer perspective.
      4. Create Teiid Model Project HDPHive2Sample
        Create a new Teiid Model project by right-clicking New > Teiid Model Project in the Model Explorer window, as shown below.
      5. Create connection profile and source model
        We are now going to use the HDP Hive2 JDBC driver and import metadata directly using the JDBC importer. Right-click the HDPHive2Sample project, select Import, and then select JDBC Database >> Source Model as shown below.

        Click "Next >".

        a) Create Connection Profile

        Set up a connection profile to connect to the local HDP Sandbox environment.

        On the first page of the wizard, click "New..." to create a new connection profile, which we need in order to connect to the HDP Sandbox environment using the JDBC jars previously downloaded from the HDP Hive JDBC archive site. Click the "Add Driver Definition" button so that JBDS can connect to the HDP Sandbox environment. Select "Generic JDBC" for the connection profile type and name the connection profile "HDPHive2DS". Click "Next >".

        Add the jars (hive-jdbc-standalone.jar, hadoop-common.jar, and hadoop-auth.jar) mentioned previously to the JAR List tab, as shown below.

        Now we are ready to connect to the HDP Sandbox environment using Hive by providing the correct connection details in the Properties tab, as shown below.
        The Hive JDBC URL is a string with the following syntax:

        jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;sess_var_list?hive_conf_list#hive_var_list

        where:

        • <host1>:<port1>,<host2>:<port2> is a server instance or a comma-separated list of server instances to connect to (if dynamic service discovery is enabled). If empty, the embedded server will be used.
        • dbName is the name of the initial database.
        • sess_var_list is a semicolon-separated list of key=value pairs of session variables (e.g., user=foo;password=bar).
        • hive_conf_list is a semicolon-separated list of key=value pairs of Hive configuration variables for this session (e.g., hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2).
        • hive_var_list is a semicolon-separated list of key=value pairs of Hive variables for this session.

        The default port for HiveServer2 is 10000.

        In the Hortonworks Sandbox environment we can use the following connection details:
        Username = maria_dev
        Password = maria_dev
        Database = foodmart
        URL = jdbc:hive2://localhost:10000/foodmart
        Check “Save password”.

        Click "Ok".

        Click "Test Connection" to validate if the connection to the HDP Sandbox environment can ping successfully.

        Since the connection can ping successfully we are ready to select Hive tables from the HDP Sandbox environment and create a source model out of it.

        b) Create Source model


        Click "Next >".


        Click "Next >" to select database objects. Select all tables in the foodmart database.

        Click “Next >”.

        Make sure that the JNDI name corresponds to the one we created in the JDV environment (hint: HDPHive2DS) and that Auto-create Data Source is not selected. Click "Finish" to create the source models. Select the customer model and click the running-man icon to preview the data, as depicted below.
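
    If you would like to double-check the Hive connection details outside of JBDS, here is a minimal sketch using Beeline, the Hive command-line client shipped with the HDP Sandbox (run it inside the sandbox VM, or from any host with a Hive client on the path):

    # Connect with the same URL and credentials as the connection profile
    # and list the tables in the foodmart database.
    $ beeline -u "jdbc:hive2://localhost:10000/foodmart" -n maria_dev -p maria_dev -e "show tables;"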

    Conclusion

    In this post we've shown the configuration steps needed to unlock your Hadoop data using Hive, with Hortonworks and Red Hat JBoss Data Virtualization. We have also shown that Ansible is not only targeted at system administrators; it is an invaluable tool for developers, testers, and others. I encourage you to experiment with the simple basics provided in this article and gradually expand the functionality of your playbook to create even more sophisticated provisioning scripts.

    Now we are ready to add other data sources from physically distinct systems into the mix, such as SQL databases, XML/Excel files, NoSQL databases, enterprise applications, and web services.

    For more information on Ansible, Hortonworks, and Red Hat JBoss Data Virtualization, please refer to their respective websites:

    • http://www.ansible.com
    • http://www.hortonworks.com
    • https://developers.redhat.com/products/datavirt/overview

    Special thanks to Marc Zottner and Roeland van der Pol for their Ansible scripts, which gave me the inspiration to use Ansible for the configuration in this post. Do you want to get inspired as well? See more sophisticated Red Hat JBoss Middleware Ansible playbooks at https://github.com/Maarc/ansible_middleware_soe.

    Last updated: October 26, 2023
