All 3 questions use the database collection sample_mflix.embedded_movies.
To load the sample database, follow the instructions here: https://github.com/mirkenstein/MSAI-339-NoSQL/blob/master/Mongo/README_ATLAS.md
- Write a query which captures all 3 requirements:
  - Movies with year between 1975 and 1980
  - Display only 3 columns: title, year, runtime
  - Order by runtime (asc or desc)
Important
Submission: Return the top 5 results and include them in your homework submission.
The returned results should look like this:
title,year,runtime
The Terminator,1980,120
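The three requirements map onto a filter, a projection, and a sort. The following is a minimal sketch, assuming the pymongo driver and reading "between 1975 and 1980" as inclusive on both ends. The helper names (get_collection, top_movies) and the connection string are hypothetical placeholders, not part of the assignment.

```python
# Sketch for sample_mflix.embedded_movies; assumes the pymongo driver.
# ASSUMPTION: "between 1975 and 1980" is read as inclusive on both ends.
YEAR_FILTER = {"year": {"$gte": 1975, "$lte": 1980}}

# Keep only the three requested fields; _id is returned unless excluded.
PROJECTION = {"_id": 0, "title": 1, "year": 1, "runtime": 1}

SORT_FIELD = "runtime"
SORT_DIRECTION = 1  # 1 = ascending, -1 = descending


def get_collection(uri):
    """Open the collection; `uri` is your own Atlas connection string."""
    # Imported here so the query parts above can be inspected without the driver.
    from pymongo import MongoClient

    return MongoClient(uri)["sample_mflix"]["embedded_movies"]


def top_movies(collection, limit=5):
    """Return the first `limit` matching movies, ordered by runtime."""
    cursor = (
        collection.find(YEAR_FILTER, PROJECTION)
        .sort(SORT_FIELD, SORT_DIRECTION)
        .limit(limit)
    )
    return list(cursor)


# Example call (hypothetical placeholder URI):
# for movie in top_movies(get_collection("mongodb+srv://<user>:<password>@<cluster>/")):
#     print(movie)
```

The same filter, projection, and sort documents can be passed to find() in mongosh if you prefer the shell over Python.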
Helpful documentation
- Write an aggregation that groups by year and calculates the sum of all runtime values for movies where year is between 1975 and 1980, inclusive.
Important
Submission: Return in your homework the year and the sum of runtime for each year.
The returned results should look like this:
year,sumRuntime
1975,1234
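One possible shape for this aggregation is a pipeline that filters by year, groups on year, and renames the group key. This is a sketch, assuming pymongo and a collection handle for sample_mflix.embedded_movies obtained elsewhere. The function name runtime_by_year and the final sort stage are illustrative choices, not requirements of the assignment.

```python
# Sketch of an aggregation pipeline; stages run in the order listed.
PIPELINE = [
    # Keep only movies from 1975 through 1980, inclusive.
    {"$match": {"year": {"$gte": 1975, "$lte": 1980}}},
    # One output document per year, with the summed runtime.
    {"$group": {"_id": "$year", "sumRuntime": {"$sum": "$runtime"}}},
    # Rename the group key so the output reads year,sumRuntime.
    {"$project": {"_id": 0, "year": "$_id", "sumRuntime": 1}},
    # ASSUMPTION: ordering by year is added for readability only.
    {"$sort": {"year": 1}},
]


def runtime_by_year(collection):
    """Run the pipeline on a pymongo collection and return the rows as dicts."""
    return list(collection.aggregate(PIPELINE))


# Example call, given a pymongo collection object named `movies`:
# for row in runtime_by_year(movies):
#     print(row["year"], row["sumRuntime"])
```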
Helpful documentation
https://www.mongodb.com/docs/manual/aggregation/
Every cluster in Databricks has access to preloaded datasets.
The dataset /databricks-datasets/adult/adult.data
is part of that preloaded collection.
See the notebook demo discussed in class.
- Read the sample dataset /databricks-datasets/adult/adult.data into a dataframe.
- Display the top 5 rows ordered in ascending order by age and then in ascending order by education_num.
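The two steps above can be sketched in PySpark as follows. This assumes a Databricks notebook, where the spark session is already defined, and assumes the file has no header row, so the column names and types in ADULT_SCHEMA are supplied by hand. The schema and the helper name top_adult_rows are illustrative; the notebook demo from class may read the file differently.

```python
# Sketch for a Databricks notebook, where `spark` is already defined.
ADULT_PATH = "/databricks-datasets/adult/adult.data"

# ASSUMPTION: the file has no header row, so names and types are given here.
ADULT_SCHEMA = (
    "age DOUBLE, workclass STRING, fnlwgt DOUBLE, education STRING, "
    "education_num DOUBLE, marital_status STRING, occupation STRING, "
    "relationship STRING, race STRING, sex STRING, capital_gain DOUBLE, "
    "capital_loss DOUBLE, hours_per_week DOUBLE, native_country STRING, "
    "income STRING"
)

SORT_COLUMNS = ["age", "education_num"]


def top_adult_rows(spark, n=5):
    """Read the dataset and return the first n rows, sorted ascending."""
    df = spark.read.csv(ADULT_PATH, schema=ADULT_SCHEMA)
    # Sort by age first, then by education_num, both ascending.
    return df.orderBy(SORT_COLUMNS, ascending=[True, True]).limit(n)


# In a notebook cell:
# top_adult_rows(spark).show()
```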
Important
Submission: Submit the 5 rows from the result in point 2 as part of your homework submission.
Helpful documentation
Configure cluster -> https://docs.databricks.com/en/clusters/configure.html
Import notebook -> https://docs.databricks.com/en/notebooks/notebook-export-import.html