DevNation 2014 - Pat McDonough - Interactive Big Data Applications with Apache Spark
Hadoop has brought distributed data platforms to the masses. It has disrupted decades of data storage and processing practices with scalable platforms built on commodity components, and it has conquered the problem of large-scale batch processing such as ETL. But solving problems beyond batch on Hadoop has been a challenge from the beginning, and nobody has ever accused Hadoop of being all that easy to use, which has led to a plethora of bolt-on tools. Suppose the goal is writing applications against an API that's easy to reason about, building multi-stage data processing pipelines, processing high-throughput streams, delivering interactive experiences to users, processing SQL, or simply taking advantage of all the memory on the cluster. In each case, the typical Hadoop deployment has become a complex array of only somewhat interoperable moving parts. Apache Spark is rapidly becoming the tool of choice for solving many of these problems with a single framework for fast, in-memory computing. With the second-largest community of active developers in Big Data (trailing only Apache Hadoop MapReduce), you can expect it to continue to thrive. Spark offers an easy-to-use API (available in several languages), interactive shells for analysis, and a broad range of libraries from graph processing to machine learning. It is compatible with Hadoop-stored data and can run on Hadoop 2's YARN resource manager or in standalone environments. Come learn how to get started writing Spark applications in just a few easy steps, and bring the power of Spark to your big data platform.