The most common problem at the start of a new data science exploration task, such as understanding COVID-19, is finding and collecting datasets for it.
Luckily, we found many interesting datasets that we could use to curate our own datasets for the task at hand. Of all the datasets we found, we were particularly interested in the five below:
- Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE and available on GitHub, along with their awesome Visualization Tools.
- Worldometers Daily Data, web-scraped by David Bumbeishvili and published on his GitHub.
- Our World In Data Coronavirus Dataset by the OWID organization, also available on GitHub.
- COVID19 Global Weather Data by Pierre Winter on Kaggle.
- COVID-19 Country Data by Patrick on Kaggle.
We made four notebooks, each exploring a specific dataset and curating what we need from it (the first dataset has no notebook of its own, since it is largely the same as the second). Our exploration steps are downloading, reading, and understanding the dataset, and finally choosing the features we need from it. Each notebook ends by extracting the appropriate features into dictionary objects to be used later by the models.
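
As a rough illustration of the last two steps (choosing features and extracting them into dictionary objects), here is a minimal sketch. It is not the code from our notebooks: the sample data, the column names, and the `extract_features` helper are all hypothetical, and it uses only the Python standard library so it runs without any downloads.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded dataset file;
# the column names and values are made up for illustration.
SAMPLE_CSV = """country,date,confirmed,deaths,source_note
A,2020-03-01,10,0,x
A,2020-03-02,15,1,x
B,2020-03-01,3,0,y
"""


def extract_features(csv_text, key_column, feature_columns):
    """Read CSV text and return {key: {feature: [values, ...]}}.

    Only the columns listed in feature_columns are kept; every other
    column (here: date and source_note) is dropped.
    """
    features = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Create the per-key dictionary the first time a key is seen.
        per_key = features.setdefault(
            row[key_column], {name: [] for name in feature_columns}
        )
        for name in feature_columns:
            # Assumes the chosen features are integer counts.
            per_key[name].append(int(row[name]))
    return features


features = extract_features(SAMPLE_CSV, "country", ["confirmed", "deaths"])
print(features)
```

The resulting dictionary is keyed by country, with one list of values per selected feature. Whether to key by country, by date, or by both is a design choice that depends on what the models expect as input.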