Open Source Data Pipelines for Intelligent Applications

Kyle Bader, Sherard Griffin, Pete Brey, Daniel Riek, Nathan LeClaire
English

Overview

Open Source Data Pipelines for Intelligent Applications gives data engineers and data scientists insight into how Kubernetes serves as a foundation for building data platforms that increase an organization’s data agility.

The execution environment for today’s applications and application architectures is not a single system but a system of systems. Kubernetes tackles many of the challenges inherent to deploying applications and application architectures in a distributed way, including but not limited to service scheduling and discovery, batch execution, load balancing, and self-healing. With data continuity ensured from the device edge through core sites, from datacenter to cloud, data scientists can create, update, and act on data throughout its life cycle, reducing time to meaningful insight and driving more business value.

You’ll learn:

  • How data platforms are evolving to meet the flexibility and agility required by today’s organizations.
  • How Kubernetes has changed the way we process big data and why businesses must adapt.
  • How to design scalable data storage and artificial intelligence applications for private, public, and multicloud infrastructures.

Excerpt

Why use object storage instead of having a server somewhere that users transfer files to and from? There are many reasons, such as massive scalability, resilience, and a predictable API—and often features such as versioning, tagging, and encryption. Not only is object storage designed to be reliable, but it also provides sophisticated capabilities for auditing data access: a comprehensive trail of who modified or accessed objects and when.
