Cloudera Impala is a tool to rapidly query Hadoop data in HBase or HDFS using SQL syntax. You can use Red Hat JBoss Data Virtualization to query that same data via Impala to take advantage of its optimization. You can also combine that data with other data sources in real time. The goal of this guide is to import data from a Cloudera Impala instance, manipulate it, and then expose that data as a data service. This guide covers accessing a repository with example scripts, creating a custom base and view model, exposing it as a data service, and finally consuming that data via REST. This is a peer article to “Unlock Your Cloudera Data with Red Hat JBoss Data Virtualization.”
Continue reading “JBoss Data Virtualization: Integrating with Impala on Cloudera”
After the “Unlock your Hadoop data with Hortonworks and Red Hat JBoss Data Virtualization” episode, let’s continue the journey with another “Apache Hadoop” episode of the series “Unlock your [….] data with Red Hat JBoss Data Virtualization.” Through this blog series, we will look at how to connect Red Hat JBoss Data Virtualization (JDV) to different, heterogeneous data sources.
Continue reading “Unlock Your Cloudera Data with Red Hat JBoss Data Virtualization”
Recently, the focus on the continuous delivery of value has created a lot of interest in microservices, CI/CD, and containers. The idea is that microservices are small and well defined enough to enable rapid innovation, automated testing, and frequent deployments with minimal risk. This is made possible by adopting continuous integration and continuous delivery pipelines. CI/CD requires the ability to quickly, easily, reliably, and automatically create and tear down complete execution environments. Linux containers address this need by creating lightweight, portable, and isolated runtime environments. It becomes easy to reach the conclusion that the path to digital transformation is continuous value delivery via microservices based on containers and CI/CD.
Continue reading “Achieving Deployment Excellence with Red Hat OpenShift.io”
Part II of the OpenShift.io Developer Tools overview follows on the heels of the introduction session, this time presented by Pete Muir and Gorkem Ercan. In this session, we are taken through the integrated OpenShift.io Eclipse Che IDE.
Continue reading “OpenShift.io Developer Tools Overview – Summit 2017 – The Power of Cloud Workspaces – Part 2”
Yesterday, at Red Hat Summit, Red Hat announced OpenShift.io. OpenShift.io is the next-generation OpenShift platform, based on OpenShift 3, for building and running applications in the cloud. It gives you complete control of your application’s lifecycle, from build to production, regardless of whether you deploy from source or run a pre-built container.
Continue reading “OpenShift.io The Gathering – Summit 2017 – Developer Tools, Overview and Roadmap Part I”
Today’s announcement of Red Hat OpenShift.io was followed by a full day of developer toolset Summit sessions. These were presented by the OpenShift.io product development team and covered some truly amazing OpenShift.io features. While there are too many features to cover in a single blog post, these were my top 7 items.
Continue reading “7 Freaking Awesome things about OpenShift.io”
This year in Boston, MA you can attend the Red Hat Summit 2017, the event to get your updates on open source technologies and meet with all the experts you follow throughout the year.
It’s taking place from May 2-4 and is full of interesting sessions, keynotes, and labs.
Continue reading “Red Hat Summit 2017 – Planning your AppDev & DevOps labs”
An in-memory data grid is a distributed data management platform for application data that:
- Uses memory (RAM) to store information for very fast, low-latency response time, and very high throughput.
- Keeps copies of that information synchronized across multiple servers for continuous availability, information reliability, and linear scalability.
- Can be used as a distributed cache, NoSQL database, event broker, compute grid, and Apache Spark data store.
The technical advantages of in-memory data grids (IMDGs) provide business benefits in the form of faster decision-making, greater productivity, and improved customer engagement and experience.
Continue reading “Offload your database data into an in-memory data grid for fast processing made easy”
Welcome to part 4 of Red Hat JBoss Data Virtualization (JDV) running on OpenShift.
JDV is a lean, virtual data integration solution that unlocks trapped data and delivers it as easily consumable, unified, and actionable information. JDV makes data spread across physically diverse systems such as multiple databases, XML files, and Hadoop systems appear as a set of tables in a local database.
Continue reading “Red Hat JBoss Data Virtualization on OpenShift: Part 4 – Bringing data from outside to inside the PaaS”
Jobs are a feature of OpenShift, and today I will explain how you can use them to run your Spark machine learning and data science applications against Spark running on OpenShift. You can run jobs as a batch or on a schedule, which provides cron-like functionality. If a job fails, OpenShift will retry it by default. At the end of this article, I have a video demonstration of running Spark jobs from OpenShift templates against Spark running on OpenShift v3.
Continue reading “Running Spark Jobs On OpenShift”