Apache Spark™ has rapidly emerged as the de facto standard for big data processing across all industries and use cases, from providing recommendations based on user behavior to analyzing millions of genomic sequences to accelerate drug innovation and development for personalized medicine.
This eBook, the second in a series, offers a collection of the most popular technical blog posts that provide an introduction to machine learning on Apache Spark and highlight many of the major developments around Spark MLlib and GraphX.
Whether you are just getting started with Spark or are already a Spark power user, this eBook will arm you with the knowledge you need to succeed on your next Spark project, including:
- An introduction to machine learning in Apache Spark
- Using Spark for advanced topics such as clustering, trees, and graph processing
- How you can use SparkR to analyze data at scale with the R language