Red Hat OpenShift

Introduction:

Jobs are a feature of OpenShift, and today I will explain how you can use them to run your Spark machine learning and data science applications against a Spark cluster running on OpenShift. Jobs can run as one-time batch jobs or on a schedule, which provides cron-like functionality. If a job fails, OpenShift will by default retry creating it. At the end of this article, there is a video demonstration of running Spark jobs from OpenShift templates against Spark running on OpenShift v3.

Environment:

  • Infinispan 9.0.0
  • Spark 2.0.1
  • OpenShift Dedicated v3.3
  • Oshinko

Spark Batch Job Example:

apiVersion: batch/v1
kind: Job
metadata:
  name: recommend-mllib-scheduled
spec:
  parallelism: 1          # run one pod at a time
  completions: 1          # the job is complete after one successful run
  template:
    metadata:
      name: recommend-mllib
    spec:
      containers:
      - name: recommend-mllib-job
        image: docker.io/metadatapoc/recommend-mllib:latest
        imagePullPolicy: "Always"
        env:
        - name: SPARK_MASTER_URL                # Spark master service inside the cluster
          value: "spark://instance:7077"
        - name: RECOMMEND_SERVICE_SERVICE_HOST  # JBoss Data Grid service
          value: "jboss-datagrid-service"
        - name: SPARK_USER
          value: bob
      restartPolicy: Never
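
To submit the batch job, save the manifest to a file and create it with the oc client. A minimal sketch follows; the file name is an assumption, and the pod name placeholder comes from the oc get pods output:

oc create -f recommend-mllib-job.yaml   # file name is an assumption
oc get jobs                             # SUCCESSFUL shows 1 once the run completes
oc get pods                             # find the pod the job created
oc logs <job-pod-name>                  # inspect the Spark driver output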

 

Scheduled Job Example (Running a Spark Job Every 5 Minutes):

apiVersion: batch/v2alpha1
kind: ScheduledJob
metadata:
  name: sparkrecommendcron
spec:
  schedule: "*/5 * * * *"   # standard cron syntax: every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: recommend-mllib-job
            image: docker.io/metadatapoc/recommend-mllib:latest
            imagePullPolicy: "Always"
            env:
            - name: SPARK_MASTER_URL                # Spark master service inside the cluster
              value: "spark://instance:7077"
            - name: RECOMMEND_SERVICE_SERVICE_HOST  # JBoss Data Grid service
              value: "jboss-datagrid-service"
            - name: SPARK_USER
              value: bob
          restartPolicy: Never
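
Note that ScheduledJob lives in the batch/v2alpha1 API group, which may need to be enabled by a cluster administrator before the manifest is accepted. A minimal sketch of creating and watching the scheduled job (the file name is an assumption):

oc create -f spark-recommend-cron.yaml   # file name is an assumption
oc get scheduledjobs                     # confirm the schedule was registered
oc get jobs -w                           # a new job should appear every 5 minutes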

Environment Setup

oc cluster up                                # start a local OpenShift cluster
oc new-app -f http://goo.gl/ZU02P4           # deploy Oshinko
oc policy add-role-to-user edit -z oshinko   # let the oshinko service account manage cluster objects
oc new-app -f https://goo.gl/XDddW5          # deploy JBoss Data Grid (Infinispan)
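
Before moving on, it is worth verifying that the pods came up; a quick check (exact pod and service names depend on the templates used):

oc get pods       # the oshinko and data grid pods should reach Running
oc get services   # note the data grid service name used for RECOMMEND_SERVICE_SERVICE_HOST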

 
Once you have Oshinko and Infinispan/JDG set up, you will need to spin up a Spark cluster.
You can follow the steps in the screenshots below:
[Screenshot: Spark Cluster]

[Screenshot: Environment Setup 2]

[Screenshot: Environment Setup 3]

Spark Job Template

Spark jobs may run as scheduled jobs or as one-time batch jobs. You have the option of using source-to-image (S2I) or building a custom container that extends our openshift-spark image and runs spark-submit, all within OpenShift. I will be demonstrating the custom container approach with a spark-submit run. I have created a template that wraps the OpenShift job and runs our Spark job against the cluster; it requires a few inputs, listed below (a minimal sketch of such a template follows the list):
i) name of the job
ii) spark master ip or service name
iii) JBoss data grid ip or service name
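
Below is a minimal sketch of what such a wrapper template can look like. The template name and the parameter names (JOB_NAME, SPARK_MASTER_URL, JDG_SERVICE) are assumptions chosen to match the three inputs above, not the exact template from the demo:

apiVersion: v1
kind: Template
metadata:
  name: spark-job-template    # name is an assumption
parameters:
- name: JOB_NAME              # i) name of the job
  required: true
- name: SPARK_MASTER_URL      # ii) spark master ip or service name
  required: true
- name: JDG_SERVICE           # iii) JBoss Data Grid ip or service name
  required: true
objects:
- apiVersion: batch/v1
  kind: Job
  metadata:
    name: ${JOB_NAME}
  spec:
    template:
      metadata:
        name: ${JOB_NAME}
      spec:
        containers:
        - name: ${JOB_NAME}
          image: docker.io/metadatapoc/recommend-mllib:latest
          imagePullPolicy: "Always"
          env:
          - name: SPARK_MASTER_URL
            value: ${SPARK_MASTER_URL}
          - name: RECOMMEND_SERVICE_SERVICE_HOST
            value: ${JDG_SERVICE}
        restartPolicy: Never

The template is rendered and submitted with oc process, which expands the parameters into a plain Job manifest that oc create then submits (parameter flag syntax varies slightly between oc versions):

oc process -f spark-job-template.yaml \
    -p JOB_NAME=recommend-mllib \
    -p SPARK_MASTER_URL=spark://instance:7077 \
    -p JDG_SERVICE=jboss-datagrid-service | oc create -f -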
 
[Screenshot: Spark Job Template]

[Screenshot: Spark Job]

[Screenshot: Application Creator]

[Screenshot: Job Service]

Video Demonstration:

[Video: running Spark jobs from OpenShift templates against Spark running on OpenShift v3]

Links to Project and Example Source Code Used in Demo

RadAnalytics - http://radanalytics.io/
 

Red Hat JBoss Data Grid - an in-memory data grid that accelerates application performance; it is fast, distributed, scalable, and independent from the data tier.
