Erik Erlandson, Burr Sutter
May 19, 2020

Machine learning with Apache Spark on Kubernetes | DevNation Tech Talk

The first challenge for an AI/ML practitioner is gathering the data needed to feed a learning model. Apache Spark addresses this with a unified DataFrame API and a scale-out compute model that let you run parallelized queries against sources such as SQL databases, Kafka, and S3. In this session, we explore the use of https://radanalytics.io/ and https://opendatahub.io/ on top of Kubernetes/OpenShift to demonstrate a dynamically scalable ETL pipeline for federated data ingestion.
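
As a rough illustration of what federated ingestion through the DataFrame API can look like, here is a minimal PySpark sketch. It is not taken from the talk: every hostname, database, table, topic, credential, and bucket path is a hypothetical placeholder, and it assumes the JDBC driver, the spark-sql-kafka connector, and the hadoop-aws package are available to the Spark session.

```python
# Hypothetical connection settings; none of these values come from the talk.
JDBC_OPTIONS = {
    "url": "jdbc:postgresql://db.example.com:5432/sales",
    "dbtable": "public.orders",
    "user": "reader",
    "password": "changeme",
}

KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "kafka.example.com:9092",
    "subscribe": "order-events",
    "startingOffsets": "earliest",
}

S3_PATH = "s3a://example-bucket/customers/"


def load_sources(spark):
    """Read three different sources into DataFrames with one reader API.

    `spark` is expected to be an existing pyspark.sql.SparkSession.
    """
    # Relational table, read over JDBC.
    orders = spark.read.format("jdbc").options(**JDBC_OPTIONS).load()

    # Kafka topic, read as a batch; key and value arrive as binary columns.
    events = (
        spark.read.format("kafka")
        .options(**KAFKA_OPTIONS)
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )

    # Parquet files in an S3 bucket, addressed through the s3a connector.
    customers = spark.read.parquet(S3_PATH)

    return {"orders": orders, "events": events, "customers": customers}
```

Given a session created with `SparkSession.builder.getOrCreate()`, calling `load_sources(spark)` returns ordinary DataFrames that can be joined or filtered together regardless of where each one originated, and Spark distributes the work across whatever executors the cluster provides.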