pksX01/PySpark_Tutorials

PySpark_Tutorials

This repo contains code for practising PySpark.

Contents
1. rdd_.ipynb -> This notebook covers the basics of RDDs.
2. Pyspark_Intro.ipynb -> This notebook contains code for creating RDDs, PySpark DataFrames from RDDs, and pandas DataFrames from PySpark DataFrames.
3. Working_with_Hive_and_PySpark_in_Google_Cloud_Dataproc.ipynb -> This notebook explains how to save a PySpark DataFrame to Hive tables and how to run this code on Google Cloud Dataproc.
4. PySpark_Advanced.ipynb -> This notebook delves into DataFrames, handling different types of data, Spark SQL, and some advanced RDD concepts.
5. Algoscale_Assignment.ipynb -> This notebook contains the solution to the take-home assignment round of the interview for the Data Engineer position at AlgoScale.
6. AlgoScale_Interview_Problems.ipynb -> This notebook contains the solutions to the problems from the technical round of the interview for the Data Engineer position at AlgoScale.
