To curate our own dataset from features spread across multiple datasets, we face a problem: each dataset has its own format and shapes. That is why we have to build our own format for our dataset, one that brings all the datasets under one umbrella and keeps their features mergeable throughout our work.
We save features in the form of dicts. A dict is Python's hash map implementation, where the key (K) is the country and the value (V) is the feature.
From each dataset we explore, we curate the features we need in that format, aiming for one unified dataset shape that can be merged on demand.
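As a rough illustration of the format described above, here is a minimal sketch of two per-feature dicts keyed by country and one way to merge them on demand. The feature names, the numbers, and the `merge_features` helper are hypothetical placeholders, not part of the CCE dataset or its code.

```python
# Hypothetical feature dicts: key = country, value = feature.
# Names and numbers are illustrative placeholders, not CCE values.
population_millions = {"Egypt": 100, "Italy": 60, "Japan": 126}
median_age = {"Italy": 47.3, "Japan": 48.4, "Brazil": 33.5}


def merge_features(**features):
    """Merge feature dicts, keeping only countries present in every dict."""
    if not features:
        return {}
    # Countries shared by all of the given feature dicts.
    common = set.intersection(*(set(d) for d in features.values()))
    # One row per country, one entry per feature name.
    return {
        country: {name: d[country] for name, d in features.items()}
        for country in sorted(common)
    }


merged = merge_features(
    population_millions=population_millions,
    median_age=median_age,
)
print(merged)
```

Keeping one dict per feature means any subset of features can be combined later without reshaping the others. This sketch drops countries that are missing from any feature; keeping them with a missing-value marker would be an equally reasonable choice.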
You will find four folders here, one per dataset. The four main datasets we explored are as follows:
- Worldometers Daily Data, which is web-scraped by David Bumbeishvili on his GitHub.
- Our World In Data Coronavirus Dataset by the OWID organization, also available on GitHub.
- COVID19 Global Weather Data by Pierre Winter on Kaggle.
- COVID-19 Country Data by Patrick on Kaggle.
In each folder, you will find the original dataset files, and also the files we curated as feature-independent dicts. Collectively, these files form the CCE dataset, which is the core of our work and the second contribution of this work to the world.
Note. We use Pickle files to save our dicts in order to preserve the object types through serialization and deserialization of objects.
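A minimal sketch of that round trip, assuming a feature dict shaped like the ones described above. The file name and the values are hypothetical; only the standard-library `pickle`, `tempfile`, and `pathlib` modules are used.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical feature dict: country -> tuple of placeholder numbers.
feature = {"Italy": (1.0, 2.5), "Japan": (3, 4)}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_feature.pkl"  # hypothetical file name

    # Serialize the dict to disk.
    with open(path, "wb") as f:
        pickle.dump(feature, f)

    # Deserialize it back into a Python object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object keeps its Python types (dict, tuple, int, float).
print(type(restored), type(restored["Italy"]))
```

Unlike a text format such as JSON, where a tuple would come back as a list, the restored values keep their original Python types. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.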
- Weather Features Dataset by giangnguyen on Kaggle.
- Weather Data for COVID-19 Data Analysis by Davide Bonin on Kaggle.
- COVID-19: current situation on May, a notebook on Kaggle.
- COVID-19: Digging a Bit Deeper, a notebook on Kaggle.
- Google's COVID-19 Community Mobility Reports. (Tune in later, we might try that one.)
- Forecasting the spread of the novel coronavirus, a GitHub repository that does work similar to ours.
- Country Info Dataset by My Koryto on Kaggle.
- Covid19 Analysis: EDA + SEIR Model + Predictions, a notebook on Kaggle.
- COVID-19 Lockdown dates by country by jcyzag on Kaggle.
- Covid Severity Forecasting, a GitHub repository that does work similar to ours but exposes it as a Python library and is a great place to start from.
- The Awesome Coronavirus Project, do I need to explain?