Manipulating Data in Apache Spark™ 

How to Use DataFrames for Large-Scale Data Science

Apache Spark™, with its DataFrame API, is uniquely suited to the complexity of preparing and processing massively diverse data sources and data types, enabling large-scale data science.
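As a flavor of what this looks like in practice, here is a minimal PySpark sketch that reads two hypothetical sources, JSON event logs and a CSV customer file (the paths and column names are assumptions for illustration), and combines them with the DataFrame API:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; on Databricks a session named `spark` is already provided.
spark = SparkSession.builder.appName("diverse-sources").getOrCreate()

# Hypothetical inputs: clickstream events as JSON and customer records as CSV.
events = spark.read.json("/data/events/*.json")
customers = spark.read.option("header", True).csv("/data/customers.csv")

# Join the two sources and count events per customer segment.
events_per_segment = (
    events.join(customers, on="customer_id")
          .groupBy("segment")
          .count()
)

events_per_segment.show()
```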

Databricks, founded by the team that originally created Apache Spark, is glad to share this eBook, in which we cover:

  • How DataFrames leverage the power of distributed processing through Spark and make big data processing easier for a wider audience.
  • How Spark makes it easy to manipulate different types of data, whether you're running business logic on your local machine or on a 100-node cluster (see the sketch after this list).
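The second point is easiest to see in code: the same DataFrame transformations run unchanged whether Spark is pointed at a laptop or a cluster. The sketch below assumes a local master URL and a small in-memory dataset purely for illustration; on a real cluster only the master/deployment configuration would differ.

```python
from pyspark.sql import SparkSession

# The master URL is the main thing that changes between a laptop and a cluster;
# "local[*]" here is an assumption for running on a single machine.
spark = (
    SparkSession.builder
    .appName("portable-business-logic")
    .master("local[*]")   # on a cluster this would point at YARN, Kubernetes, etc.
    .getOrCreate()
)

# Hypothetical sales data; the same aggregation runs unchanged at any scale.
sales = spark.createDataFrame(
    [("US", 120.0), ("DE", 80.5), ("US", 42.0)],
    ["country", "amount"],
)

totals = sales.groupBy("country").sum("amount")
totals.show()
```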

Get the eBook to learn more.
