"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia

Chapters

This directory contains the code for every chapter of "Data Algorithms with Spark".


Bonus Chapters

The following directories are bonus chapters:

| Bonus Chapter | Description |
|---------------|-------------|
| Word Count | Multiple solutions to the word count problem using the reduceByKey() and groupByKey() reducers |
| Anagrams | Finds words that are anagrams of each other: multiple solutions using the reduceByKey(), groupByKey(), and combineByKey() reducers |
| Lambda Expressions | How to use lambda expressions in PySpark programs |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| K-mers | K-mers for DNA sequences |
| Correlation | All vs. all correlation |
| mapPartitions() Transformation | Complete mapPartitions() example |
| UDF | User-defined function example |
| DataFrames Transformations | Examples of creating and transforming DataFrames |
| DataFrames Tutorials | DataFrame tutorials: building DataFrames from collections and CSV text files |
| Join Operations | Examples of joining RDDs |
| PySpark Tutorial 101 | Examples of using PySpark RDDs and DataFrames |
| Physical Data Partitioning | Tutorial on physical data partitioning |
| Monoid: Design Principle | Monoid as a design principle |
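
To give a feel for the two reducer styles named in the Word Count chapter, here is a minimal plain-Python sketch. It does not use Spark and is not the code from the bonus chapter: the helpers `reduce_by_key` and `group_by_key` and the sample lines are hypothetical, written only to mimic the per-key semantics of `reduceByKey()` and `groupByKey()` on a small in-memory list.

```python
from collections import defaultdict


def reduce_by_key(pairs, func):
    """Hypothetical helper: merge values per key one at a time,
    the way reduceByKey() combines values with a binary function."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def group_by_key(pairs):
    """Hypothetical helper: collect all values per key into a list,
    the way groupByKey() gathers values before any aggregation."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Made-up sample input standing in for lines of a text file.
lines = ["fox jumped", "fox ran", "dog ran"]

# Emit a (word, 1) pair for every word, as in the classic word count.
pairs = [(word, 1) for line in lines for word in line.split()]

# Style 1: reduce values per key as they arrive.
counts_reduce = reduce_by_key(pairs, lambda a, b: a + b)

# Style 2: group all values per key first, then sum each group.
counts_group = {word: sum(ones) for word, ones in group_by_key(pairs).items()}

print(counts_reduce)
print(counts_group)
```

Both styles produce the same counts. In PySpark the equivalent calls would be `rdd.reduceByKey(lambda a, b: a + b)` and `rdd.groupByKey().mapValues(sum)` on an RDD of `(word, 1)` pairs. The reduce style is usually preferred on large data because values can be combined within each partition before they are shuffled.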
