Unlock your Microsoft Excel data with Red Hat JBoss Data Virtualization

After Unlock your MariaDB/MySQL data, Unlock your PostgreSQL data, and Unlock your Hadoop data with Hortonworks episodes, let’s continue the journey with this new episode of the series: “Unlock your [….] data with Red Hat JBoss Data Virtualization.” Through this blog series, we will look at how to connect Red Hat JBoss Data Virtualization (JDV) to different and heterogeneous data sources.

JDV is a lean, virtual data integration solution that unlocks trapped data and delivers it as easily consumable, unified, and actionable information. It makes data spread across physically diverse systems — such as multiple databases, XML files, and Hadoop systems — appear as a set of tables in a local database. By providing the following functionality, JDV enables agile data use:

  1. Connect: Access data from multiple, heterogeneous data sources.
  2. Compose: Easily combine and transform data into reusable, business-friendly virtual data models and views.
  3. Consume: Makes unified data easily consumable through open standards interfaces.

It hides complexities, like the true locations of data or the mechanisms required to access or merge it. Data becomes easier for developers and users to work with.

This post will guide you step-by-step on how to connect JDV to a Microsoft Excel spreadsheet using Teiid Designer and the Microsoft Excel translator. A translator acts as the bridge between JBoss Data Virtualization and an external system. The Microsoft Excel translator provides a quick and easy way to read a Microsoft Excel spreadsheet and provides contents of the spreadsheet in the tabular form that can be integrated with other sources.

Prerequisites

Create the File Datasource

  1. Start your local JDV 6.3 environment then start the CLI tool
    $ $JDV_HOME/bin/standalone.sh
    $ $JDV_HOME/bin/jboss-cli.sh --connect
  2. Display all the properties supported by the File Connector
    [standalone@localhost:9999 /] /subsystem=teiid:read-rar-description(rar-name=file)
    

    Here, we are going to use the following properties:

    ParentDirectory: Directory where the data files are stored.
    AllowParentPaths: Set AllowParentPaths to false to disallow '..' in paths. This prevents requesting files that are not contained in the parent directory. Optional.
    
  3. Add the File Datasource and enable it
    [standalone@localhost:9999 /] batch
    [standalone@localhost:9999 / #] /subsystem=resource-adapters/resource-adapter=file/connection-definitions=UnlockData_MSExcel_DS:add(jndi-name=java:/UnlockData_MSExcel_DS, class-name=org.teiid.resource.adapter.file.FileManagedConnectionFactory, enabled=true, use-java-context=true)
    [standalone@localhost:9999 / #] /subsystem=resource-adapters/resource-adapter=file/connection-definitions=UnlockData_MSExcel_DS/config-properties=ParentDirectory:add(value=absolute path of $DATA_HOME)
    [standalone@localhost:9999 / #] /subsystem=resource-adapters/resource-adapter=file/connection-definitions=UnlockData_MSExcel_DS/config-properties=AllowParentPaths:add(value=true)
    [standalone@localhost:9999 / #] /subsystem=resource-adapters/resource-adapter=file:activate
    [standalone@localhost:9999 / #] run-batch

What about automating configuration? (Optional)

A question you might ask: Can we automate the above configuration steps? The answer is yes. We can, with Ansible, by Red Hat.

Logotype_RH_Ansible_RGB_Gray

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It uses no agents and no additional custom security infrastructure, so it’s easy to deploy – and most importantly, it uses a very simple language (YAML, in the form of Ansible Playbooks) that allow you to describe your automation jobs in a way that approaches plain English. For your convenience, most of the steps are automated in an ansible playbook called msexcel on GitHub and to run you only need to run one command and you should see similar output as shown below:

$ cd unlock-your-data/msexcel
$ ansible-playbook local.yml
PLAY [Configure local JBoss Data Virtualization to connect to Excel spreadsheets] ***

TASK [setup] *******************************************************************
ok: [localhost]

TASK [Execute Management CLI file(s)] ******************************************
changed: [localhost] => (item=add_datasource.cli)

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0

Note: See https://github.com/cvanball/unlock-your-data/tree/master/msexcel for more information.

Start Your Development Environment

    1. Start your local JDV 6.3 environment
      $ $JDV_HOME/bin/standalone.sh
    2. Start your local JBDS environment

      Start JBDS 9.1.0 and open the Teiid Designer Perspective as shown below:

      Note: Use the following menu options “Window” > “Perspective” > “Open Perspective” > “Other…” > “Teiid Designer” to set JBDS in Teiid Designer perspective

Create Your Teiid Project

    1. Create Teiid Model Project called “MSExcelSample”

      Create a new Teiid Model project using right-click “New” > “Teiid Model Project” in the Model Explorer window as shown below:

    2. Import Metadata using Teiid connection

      We are now going to import metadata directly using the Teiid connection. Right-click the project “MSExcelSample” and select Import and select “Teiid Connection >> Source Model” as shown above and click “Next >”.

Create Your Source Model

  1. Select the Datasource “UnlockData_MSExcel_DS”

    Click “Next >”

  2. Configuration the translator

    Select the translator “excel” and fill the Import Properties with the following information:

    Data Row Number = 2
    Excel File = unlockdata.xlsx
    Header Row Number = 1

    Click “Next >”

    Specify the target folder for the source models (here “MSExcelSample/DataSourceLayer”) and the name of this Source model (Hint: UnlockData_MSExcel_DS). Click “Next >”.

    Check the box “Set all tables read-only (UPDATABLE FALSE)” then click “Next >”.

    Click “Finish” to create the source model.

Preview Data

Select any model and click the running man icon to preview the data as depicted below:

Conclusion

In this post, we’ve shown the configuration steps needed to perform in order to unlock your Microsoft Excel data with Red Hat JBoss Data Virtualization.

Now we are ready to federate this data with other data sources from physically distinct systems into such as other SQL databases, Hadoop systems, NoSQL databases, enterprise applications and web services etc.

For more information about Red Hat JBoss Data Virtualization or Ansible, please refer to the following websites:


Join the Red Hat Developer Program (it’s free) and get access to related cheat sheets, books, and product downloads.

Share