Running Spark Jobs On OpenShift

Introduction:

One feature of OpenShift is jobs, and today I will explain how you can use them to run your Spark machine learning and data science applications against Spark running on OpenShift. Jobs can run once as batch jobs or on a schedule, which provides cron-like functionality. If a job fails, OpenShift by default retries it. At the end of this article, there is a video demonstration of running Spark jobs from OpenShift templates against Spark running on OpenShift v3.
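To make the retry behavior concrete, here is a simplified Python model of a Job with `restartPolicy: Never`, the policy both manifests below use. It is not the actual OpenShift Job controller: it only captures the idea that each failed pod is replaced by a new pod until the requested number of completions have succeeded. The `run_job` function, the `run_pod` callback, and the `max_attempts` cap are hypothetical; the cap exists only so the sketch always terminates and is not an OpenShift v3.3 setting.

```python
from typing import Callable, List


def run_job(run_pod: Callable[[int], bool], completions: int = 1,
            max_attempts: int = 10) -> List[bool]:
    """Simplified model of a Job whose pods use restartPolicy: Never.

    A failed pod is never restarted in place; instead a new pod (a new
    attempt) is started, until `completions` pods have succeeded.
    `max_attempts` is a hypothetical cap so the model terminates.
    Returns the outcome of every pod attempt, in order.
    """
    results: List[bool] = []
    succeeded = 0
    while succeeded < completions and len(results) < max_attempts:
        ok = run_pod(len(results))  # attempt number -> did this pod succeed?
        results.append(ok)
        if ok:
            succeeded += 1
    return results
```

For example, `run_job(lambda attempt: attempt >= 2)` models a pod that fails twice before succeeding and returns `[False, False, True]`.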

Environment:

Infinispan 9.0.0
Spark 2.0.1
OpenShift Dedicated v3.3
Oshinko

Spark Batch Job Example:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: recommend-mllib-scheduled
spec:
  parallelism: 1
  completions: 1
  template:
    metadata:
      name: recommend-mllib
    spec:
      containers:
      - name: recommend-mllib-job
        image: docker.io/metadatapoc/recommend-mllib:latest
        imagePullPolicy: "Always"
        env:
        - name: SPARK_MASTER_URL
          value: "spark://instance:7077"
        - name: RECOMMEND_SERVICE_SERVICE_HOST
          value: "jboss-datagrid-service"
        - name: SPARK_USER
          value: bob
      restartPolicy: Never
```
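Inside the container, the Spark application picks up its connection details from the environment variables that the Job injects. The sketch below assumes a hypothetical Python driver (the demo application may well be written in another language) and shows one way to read and validate `SPARK_MASTER_URL`, `RECOMMEND_SERVICE_SERVICE_HOST`, and `SPARK_USER`. The names `JobConfig` and `load_config`, and the `"spark"` fallback user, are illustrative and not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass
class JobConfig:
    master_host: str
    master_port: int
    jdg_host: str
    spark_user: str


def load_config(env: Optional[Mapping[str, str]] = None) -> JobConfig:
    """Read the variables the Job manifest sets on the container."""
    env = os.environ if env is None else env
    raw_master = env["SPARK_MASTER_URL"]  # e.g. "spark://instance:7077"
    master = urlparse(raw_master)
    if master.scheme != "spark" or master.hostname is None:
        raise ValueError(f"not a standalone Spark master URL: {raw_master}")
    return JobConfig(
        master_host=master.hostname,
        # 7077 is the default port of a standalone Spark master.
        master_port=master.port or 7077,
        jdg_host=env["RECOMMEND_SERVICE_SERVICE_HOST"],
        spark_user=env.get("SPARK_USER", "spark"),  # hypothetical fallback
    )
```

With the values from the manifest above, `load_config` yields host `instance`, port `7077`, data grid host `jboss-datagrid-service`, and user `bob`.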

Scheduled Job (Running Spark Job Every 5 mins):

```yaml
apiVersion: batch/v2alpha1
kind: ScheduledJob
metadata:
  name: sparkrecommendcron
spec:
  schedule: "*/5 * * * ?"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: pi
            image: docker.io/metadatapoc/recommend-mllib:latest
            imagePullPolicy: "Always"
            env:
            - name: SPARK_MASTER_URL
              value: "spark://instance:7077"
            - name: RECOMMEND_SERVICE_SERVICE_HOST
              value: "jboss-datagrid-service"
            - name: SPARK_USER
              value: bob
          restartPolicy: Never
```
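The schedule uses cron fields: `*/5` in the minute field matches minutes 0, 5, 10, ..., 55, and the remaining fields place no restriction (the trailing `?` in the day-of-week field is accepted as "any day"), so a new Job is created every five minutes. The hypothetical `next_runs` helper below computes upcoming fire times for a `*/N` minute schedule. It models only the minute field and is a sketch for reasoning about the schedule, not the scheduler OpenShift uses.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(after: datetime, step_minutes: int = 5, count: int = 3) -> List[datetime]:
    """Next fire times of a '*/N * * * *' schedule, strictly after `after`.

    '*/N' in the minute field matches every minute where minute % N == 0,
    in every hour of every day; all other fields are unrestricted.
    """
    # Start at the first whole minute strictly after `after`.
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs: List[datetime] = []
    while len(runs) < count:
        if t.minute % step_minutes == 0:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs
```

Starting from 10:02, `next_runs(datetime(2017, 1, 1, 10, 2))` returns 10:05, 10:10, and 10:15 on the same day.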

Environment Setup

```shell
oc cluster up
oc new-app -f http://goo.gl/ZU02P4
oc policy add-role-to-user edit -z oshinko
oc new-app -f https://goo.gl/XDddW5
```

Once you have Oshinko and Infinispan/JDG set up, you will need to spin up a Spark cluster.

You can follow these steps in the screenshots below:

Spark Job Template

Spark jobs can run as scheduled jobs or as one-time batch jobs. You can either use Source-to-Image (S2I) or build a custom container that extends our OpenShift-Spark image, and then run a spark-submit job entirely within OpenShift. I will demonstrate the custom container approach with a spark-submit run. I have created a template that wraps the OpenShift job and runs our Spark job against the cluster. It requires the following inputs:

i) Name of the job

ii) Spark master IP or service name

iii) JBoss Data Grid IP or service name
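OpenShift templates fill in `${PARAMETER}` placeholders when the template is processed (for example, through `oc new-app -f`). Python's `string.Template` happens to use the same `${NAME}` syntax, so the sketch below mimics that substitution step locally on a trimmed-down version of the batch Job above. The parameter names `JOB_NAME`, `SPARK_MASTER`, and `JDG_HOST` are hypothetical stand-ins for the three inputs listed above; the template used in the demo may name them differently and may build the master URL in another way.

```python
from string import Template

# Hypothetical Job template: the three inputs become ${...} placeholders.
JOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: Job
metadata:
  name: ${JOB_NAME}
spec:
  template:
    spec:
      containers:
      - name: ${JOB_NAME}
        image: docker.io/metadatapoc/recommend-mllib:latest
        env:
        - name: SPARK_MASTER_URL
          value: "spark://${SPARK_MASTER}:7077"
        - name: RECOMMEND_SERVICE_SERVICE_HOST
          value: "${JDG_HOST}"
      restartPolicy: Never
""")


def render_job(job_name: str, spark_master: str, jdg_host: str) -> str:
    """Fill in the three inputs.

    Template.substitute raises KeyError for any placeholder left without
    a value, loosely mirroring a required template parameter.
    """
    return JOB_TEMPLATE.substitute(
        JOB_NAME=job_name, SPARK_MASTER=spark_master, JDG_HOST=jdg_host
    )
```

Calling `render_job("recommend-once", "instance", "jboss-datagrid-service")` produces a Job named `recommend-once` whose `SPARK_MASTER_URL` is `spark://instance:7077`.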

Video Demonstration:

Links to Project and Example Source Code Used in Demo

RadAnalytics - http://radanalytics.io/

Spark Machine Learning App Source - https://github.com/zmhassan/Spark-MLlib-Movie-Recommendation-JDG-Example.git

Download and learn more about Red Hat JBoss Data Grid, a fast, distributed, and scalable in-memory data grid that accelerates application performance and is independent of the data tier.

Last updated: August 23, 2023
