Big Data

JBoss Data Virtualization: Integrating with Impala on Cloudera

Cloudera Impala is a tool for rapidly querying Hadoop data stored in HBase or HDFS using SQL syntax. You can use Red Hat JBoss Data Virtualization to query that same data through Impala and take advantage of Impala’s query optimization, and you can also combine that data with other data sources in real time. The goal of this guide is to import data from a Cloudera Impala instance, manipulate it, and then expose it as a data service. The guide links to a repository of example scripts and walks through creating a custom base model and view model, exposing them as a data service, and finally consuming that data via REST. This is a companion article to “Unlock Your Cloudera Data with Red Hat JBoss Data Virtualization.”

Unlock Your Cloudera Data with Red Hat JBoss Data Virtualization

After the “Unlock your Hadoop data with Hortonworks and Red Hat JBoss Data Virtualization” episode, let’s continue the journey with another Apache Hadoop episode of the series “Unlock your [….] data with Red Hat JBoss Data Virtualization.” In this blog series, we look at how to connect Red Hat JBoss Data Virtualization (JDV) to different, heterogeneous data sources.

Achieving Deployment Excellence with Red Hat OpenShift.io

Recently, the focus on the continuous delivery of value has created a lot of interest in microservices, CI/CD, and containers. The idea is that microservices are small and well defined enough to enable rapid innovation, automated testing, and frequent deployments with minimal risk. This is made possible by adopting continuous integration and continuous delivery pipelines. CI/CD requires the ability to quickly, easily, reliably, and automatically create and tear down complete execution environments. Linux containers address this need by creating lightweight, portable, and isolated runtime environments. It is easy to conclude that the path to digital transformation is continuous value delivery via container-based microservices and CI/CD.

OpenShift.io The Gathering – Summit 2017 – Developer Tools, Overview and Roadmap Part I

Yesterday, at Red Hat Summit, Red Hat announced OpenShift.io. OpenShift.io is the next-generation OpenShift platform, based on OpenShift 3, for building and running applications in the cloud. It gives you complete control of your application’s lifecycle, from build to production, whether you deploy from source or run a pre-built container.

Offload your database data into an in-memory data grid for fast processing made easy

An in-memory data grid is a distributed data management platform for application data that:

  • Uses memory (RAM) to store information, for very fast, low-latency response times and very high throughput.
  • Keeps copies of that information synchronized across multiple servers for continuous availability, information reliability, and linear scalability.
  • Can be used as a distributed cache, NoSQL database, event broker, compute grid, or Apache Spark data store.

The technical advantages of in-memory data grids (IMDGs) translate into business benefits: faster decision-making, greater productivity, and improved customer engagement and experience.
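To make the first two properties concrete, here is a deliberately tiny, single-process Python sketch of the idea of keeping each entry in RAM on more than one node, so that losing a node does not lose data. It is an illustration under simplifying assumptions, not how any particular data grid product works: the `ToyDataGrid` class, the node names, and the `num_owners` parameter are hypothetical, the "nodes" are plain dictionaries in one process, and real grids add network replication, rebalancing, and failure detection.

```python
import hashlib


class ToyDataGrid:
    """Hypothetical toy model of a replicated in-memory data grid.

    Each "node" is a plain dict in this process. Every key is stored on
    `num_owners` distinct nodes, chosen by hashing the key, so losing one
    node does not lose the entry.
    """

    def __init__(self, node_names, num_owners=2):
        if num_owners > len(node_names):
            raise ValueError("num_owners cannot exceed the number of nodes")
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}
        self.num_owners = num_owners
        self.down = set()

    def owners(self, key):
        # Hash the key to pick a primary node, then take the next nodes
        # in a fixed ring order as backup owners.
        digest = hashlib.sha256(str(key).encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.num_owners)]

    def put(self, key, value):
        # Write to every live owner so all copies stay in sync.
        for name in self.owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Read from the first live owner that holds the key.
        for name in self.owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        # Simulate a node crash: the node stops serving and its memory is lost.
        self.down.add(name)
        self.nodes[name].clear()


grid = ToyDataGrid(["node-a", "node-b", "node-c"], num_owners=2)
grid.put("customer:42", {"name": "Ada"})
grid.fail(grid.owners("customer:42")[0])  # lose the primary copy
print(grid.get("customer:42"))  # {'name': 'Ada'}
```

Placing backups on the next nodes in a fixed order is just the simplest choice for a sketch; production systems typically use consistent hashing so that adding or removing a node moves only a fraction of the keys.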

Red Hat JBoss Data Virtualization on OpenShift: Part 4 – Bringing data from outside to inside the PaaS

Welcome to part 4 of our series on running Red Hat JBoss Data Virtualization (JDV) on OpenShift.

JDV is a lean, virtual data integration solution that unlocks trapped data and delivers it as easily consumable, unified, and actionable information. JDV makes data spread across physically diverse systems, such as multiple databases, XML files, and Hadoop systems, appear as a set of tables in a local database.
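As a rough analogy for data in separate systems appearing as tables in one local database, the following Python sketch uses only the standard `sqlite3` module: two separately attached in-memory databases stand in for two source systems, and a temporary view joins them so that a client queries a single table-like object. This is an assumption-laden illustration, not JDV’s mechanism (JDV uses virtual databases, or VDBs, built from source and view models); the `crm` and `sales` databases, their tables, and the `customer_totals` view are all hypothetical.

```python
import sqlite3

# One connection; two attached in-memory databases act as hypothetical
# stand-ins for two physically separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE sales.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 30.0)],
)

# A unified "view model": clients query customer_totals as if it were one
# local table, without knowing the data lives in two different databases.
conn.execute("""
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.total) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")

rows = conn.execute("SELECT name, total FROM customer_totals ORDER BY name").fetchall()
print(rows)  # [('Ada', 40.0), ('Grace', 30.0)]
```

The view is declared `TEMP` because SQLite only lets temporary views reference tables in other attached databases.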

Running Spark Jobs On OpenShift

Introduction:

One feature of OpenShift is jobs, and today I will explain how you can use jobs to run your Spark machine learning and data science applications against Spark running on OpenShift. You can run jobs as one-off batches or on a schedule, which provides cron-like functionality. If a job fails, OpenShift retries it by default. At the end of this article, there is a video demonstration of running Spark jobs from OpenShift templates against Spark running on OpenShift v3.
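As a sketch of what such a job can look like, the Python below builds a Kubernetes-style `batch/v1` Job manifest, wraps it in a CronJob for scheduled runs, and prints the result as JSON. The image name `example/spark-driver:latest`, its arguments, and the helper function names are hypothetical placeholders. The `apiVersion` values reflect current Kubernetes; OpenShift v3 clusters from the era of this article may require older API versions for scheduled jobs. The retry behavior described above corresponds to the `backoffLimit` and `restartPolicy` fields.

```python
import json


def spark_job_manifest(name, image, app_args, backoff_limit=6):
    """Build a batch/v1 Job manifest as a plain dict (hypothetical helper).

    `image` and `app_args` are placeholders for whatever Spark driver
    image and arguments your own cluster uses.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Retries before the Job is marked failed (6 is the Kubernetes default).
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Do not restart the container in place; a failed run
                    # is retried in a new pod by the Job controller.
                    "restartPolicy": "Never",
                    "containers": [{"name": name, "image": image, "args": app_args}],
                }
            },
        },
    }


def spark_cronjob_manifest(name, schedule, job_manifest):
    """Wrap a Job's spec in a CronJob so it runs on a cron schedule (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {"schedule": schedule, "jobTemplate": {"spec": job_manifest["spec"]}},
    }


job = spark_job_manifest("sparkpi", "example/spark-driver:latest", ["--help"])
nightly = spark_cronjob_manifest("sparkpi-nightly", "0 2 * * *", job)
print(json.dumps(nightly, indent=2))
```

The printed JSON could be saved to a file and submitted with `oc create -f <file>`; submitting `job` instead of `nightly` gives a one-off batch run rather than a scheduled one.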
