June 4, 2020

Erik Erlandson, Burr Sutter

Machine learning with Apache Spark on Kubernetes | DevNation Tech Talk

The first challenge for an AI/ML practitioner is gathering the data inputs needed to feed a learning model. Apache Spark addresses this with a unified DataFrame API and a scale-out compute model, which together let you run parallelized queries against SQL databases, Kafka, and S3. In this session, we explore the use of https://radanalytics.io/ and https://opendatahub.io/ on top of Kubernetes/OpenShift to demonstrate a dynamically scalable ETL pipeline for federated data ingestion.
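
To make the "unified DataFrame API" idea concrete, here is a minimal PySpark sketch of reading from a SQL database, a Kafka topic, and an S3 bucket through the same reader interface. It is an illustration under stated assumptions, not the pipeline shown in the talk: the helper names (`jdbc_options`, `kafka_options`, `load_sources`) and all hosts, tables, topics, and paths are hypothetical, and the code assumes an existing `SparkSession` with the JDBC driver, the Kafka connector package, and S3A support already on the classpath.

```python
from typing import Any, Dict, Tuple


def jdbc_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Build options for Spark's built-in JDBC data source."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


def kafka_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build options for the Kafka source (a batch read from the earliest offset)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def load_sources(
    spark: Any, jdbc: Dict[str, str], kafka: Dict[str, str], s3_path: str
) -> Tuple[Any, Any, Any]:
    """Read three different sources through the same DataFrameReader.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    Each call returns a DataFrame, so the results can be joined or
    unioned downstream regardless of where the data came from.
    """
    sql_df = spark.read.format("jdbc").options(**jdbc).load()
    kafka_df = spark.read.format("kafka").options(**kafka).load()
    s3_df = spark.read.parquet(s3_path)  # e.g. an s3a:// path (hypothetical)
    return sql_df, kafka_df, s3_df
```

With a live session, this might be called as `load_sources(spark, jdbc_options("jdbc:postgresql://db:5432/sales", "public.orders", "user", "secret"), kafka_options("broker:9092", "events"), "s3a://my-bucket/history/")`. Because every source yields a DataFrame, Spark can distribute the subsequent queries across however many executors the cluster provides.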