The project is divided into four subprojects. All the code for the project is in the file project0.ipynb.
This implementation uses Dask tasks to perform a word count on the files in the input directory.
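As a rough illustration of the approach, the sketch below runs a word count with one lazy task per file and a final task that merges the results. It assumes "Dask tasks" means dask.delayed; the notebook may build its tasks differently. The function names count_words, merge_counts and word_count are hypothetical, and a temporary directory stands in for the real input directory.

```python
import tempfile
from collections import Counter
from pathlib import Path

from dask import delayed  # assumption: tasks are built with dask.delayed


def count_words(path):
    """Count lowercased, whitespace-separated words in one text file."""
    text = Path(path).read_text(encoding="utf-8")
    return Counter(text.lower().split())


def merge_counts(counts):
    """Sum a list of per-file Counters into a single Counter."""
    total = Counter()
    for c in counts:
        total.update(c)
    return total


def word_count(input_dir):
    """Build one lazy task per file, then one task that merges them."""
    files = sorted(p for p in Path(input_dir).iterdir() if p.is_file())
    per_file = [delayed(count_words)(p) for p in files]
    # Nothing runs until compute() is called on the final task.
    return delayed(merge_counts)(per_file).compute(scheduler="threads")


# Demo on throwaway files (hypothetical data, not the project's inputs).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("the cat sat", encoding="utf-8")
    Path(tmp, "b.txt").write_text("The dog sat", encoding="utf-8")
    demo_counts = word_count(tmp)
```

Splitting on whitespace and lowercasing is only one possible tokenization; punctuation handling is left out to keep the sketch short.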
The input files for the project are expected to be in the handout/data/ directory.
The output of the project is written to the output/ directory. A separate JSON file is generated for each of the four subprojects.
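The per-subproject output step could look roughly like the sketch below, which writes one JSON file per result into an output directory. The helper write_result and the file name subproject1.json are hypothetical; the notebook's actual file names and JSON layout are not specified here.

```python
import json
import tempfile
from pathlib import Path


def write_result(output_dir, name, counts):
    """Write one subproject's word counts to <output_dir>/<name>.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target = out / f"{name}.json"
    with target.open("w", encoding="utf-8") as fh:
        # dict() also accepts a collections.Counter
        json.dump(dict(counts), fh, indent=2, sort_keys=True)
    return target


# Demo in a temporary directory (hypothetical name and data).
with tempfile.TemporaryDirectory() as tmp:
    path = write_result(Path(tmp) / "output", "subproject1", {"the": 2, "cat": 1})
    written = json.loads(path.read_text(encoding="utf-8"))
```

Sorting the keys keeps the files stable between runs, which makes the outputs of the subprojects easier to compare.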
Open project0.ipynb in Jupyter and execute each cell of the notebook sequentially.