MongoDB-Hadoop Workshop Exercises

MongoDB powers applications as an operational database, and Hadoop delivers intelligence with a powerful analytical infrastructure. In this workshop we'll start by learning how these technologies fit together with the MongoDB Connector for Hadoop. Then we'll cover reading and writing MongoDB data using MapReduce, Pig, Hive, and Spark. Finally, we'll discuss the broader data ecosystem and operational considerations.

Data

Prior to running any of the exercises, load the sample dataset into MongoDB.

Download MongoDB
Install MongoDB
Download the MovieLens 10M archive and unzip it

Finally, load the dataset:

$ python dataset/movielens.py [/path/to/movies.dat] [/path/to/ratings.dat]

For more information refer to the dataset README.
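
To give a feel for what loading involves, here is a minimal sketch of parsing the MovieLens 10M files into documents. It is not the repository's dataset/movielens.py; it only assumes the published MovieLens 10M layout, where movies.dat lines are MovieID::Title::Genres and ratings.dat lines are UserID::MovieID::Rating::Timestamp. The field names (movieid, userid, and so on), the database name "movielens", and the helper names are hypothetical choices for illustration.

```python
def parse_movie(line):
    """Parse one movies.dat line: MovieID::Title::Genres (genres are '|'-separated)."""
    movie_id, rest = line.rstrip("\n").split("::", 1)
    title, genres = rest.rsplit("::", 1)
    return {
        "movieid": int(movie_id),      # hypothetical field name
        "title": title,
        "genres": genres.split("|"),
    }


def parse_rating(line):
    """Parse one ratings.dat line: UserID::MovieID::Rating::Timestamp."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("::")
    return {
        "userid": int(user_id),        # hypothetical field name
        "movieid": int(movie_id),
        "rating": float(rating),       # 10M ratings use half-star steps
        "ts": int(timestamp),          # seconds since the Unix epoch
    }


def load_file(path, parse, collection, batch_size=1000):
    """Stream a .dat file into a MongoDB collection in batches.

    `collection` is expected to be a pymongo Collection; the batch size
    is an arbitrary choice for this sketch.
    """
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            batch.append(parse(line))
            if len(batch) >= batch_size:
                collection.insert_many(batch)
                batch = []
    if batch:
        collection.insert_many(batch)


if __name__ == "__main__":
    import sys
    from pymongo import MongoClient  # requires the pymongo package

    movies_path, ratings_path = sys.argv[1], sys.argv[2]
    db = MongoClient()["movielens"]  # assumed database name, local default server
    load_file(movies_path, parse_movie, db["movies"])
    load_file(ratings_path, parse_rating, db["ratings"])
```

Batching the inserts keeps memory use flat on the ratings file, which holds roughly ten million lines. The actual collection names and document shape expected by the exercises are defined by the dataset README, so treat the ones above as placeholders.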

Exercises

Refer to the individual READMEs for steps on building and deploying each exercise.

llvtt/mongodb-hadoop-workshop
