Date: June 4, 2020
Time: 16:00 UTC / 12:00 PM EDT
Abstract: The first challenge for an AI/ML practitioner is gathering the data needed to feed a learning model. Apache Spark addresses this with a unified DataFrame API and a scale-out compute model that let you run parallelized queries against sources such as SQL databases, Kafka, and S3. In this session, we will explore using https://radanalytics.io/ and https://opendatahub.io/ on top of Kubernetes/OpenShift to demonstrate a dynamically scalable ETL pipeline for federated data ingestion.
Speakers: Erik Erlandson
Erik Erlandson is a Software Engineer at Red Hat’s AI Center of Excellence, where he delivers machine learning solutions on container platforms for customers. He is a chair of the Kubernetes Big Data User Group and a committer on the Apache Spark project.