This repository contains the files used in the Hadoop Data Science workshop, which originated at the GOTO conference in Amsterdam, 2015.
README.md
        Meaning that you should read this
├── tutorial
│   ├── 01-IPython-notebook.ipynb
│   │       Explore the possibilities of an IPython notebook
│   ├── 01-IPython-notebook-exercise.ipynb
│   │       Play around with a notebook
│   ├── 02-Apache-Spark.ipynb
│   │       Explore Spark for data processing, and see how it differs from regular Python
│   ├── 02-Apache-Spark-exercise.ipynb
│   ├── 02-Apache-Spark-solution.ipynb
│   │       Load and process some data with Spark
│   ├── 03-Pandas.ipynb
│   │       Explore Pandas for data processing and visualization
│   ├── 03-Pandas-exercise.ipynb
│   ├── 03-Pandas-solution.ipynb
│   │       Pandas: DIY and enjoy!
│   ├── 04-Machine-Learning-example.ipynb
│   │       Find your way in solving a simple problem using machine learning and Spark
│   ├── example_module.py
│   │       This is how you create a module in Python
│   ├── fizzbuzz.csv
│   │       Small dataset that is used throughout the notebooks
├── exploration
│   │       Inspect these files if you have time and are up for a challenge!
│   │       Here, you will predict how many upvotes a Stack Exchange question
│   │       will have received after a month, based on the number of upvotes
│   │       it received in its first day.
│   ├── explore.ipynb
│   ├── plots.ipynb
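
To give a feel for the exploration task, here is a minimal sketch of predicting month-end upvotes from first-day upvotes with an ordinary least-squares line. It is not taken from the notebooks, which may use Spark, Pandas, or a different model entirely. The function names `fit_line` and `predict` are hypothetical, and the numbers are made-up toy values, not real Stack Exchange data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(first_day_upvotes, slope, intercept):
    """Estimate upvotes after a month from upvotes on the first day."""
    return slope * first_day_upvotes + intercept


# Made-up toy values for illustration only (NOT real data).
first_day = [1, 2, 4, 10]
after_month = [5, 8, 14, 32]

slope, intercept = fit_line(first_day, after_month)
estimate = predict(6, slope, intercept)
```

A straight line is only a starting point. Real upvote counts are likely skewed, so a transformation of the counts or a different model may fit better once you have explored the data.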