This repository contains cleaned daily reports and time series data from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It has the following aims:
- To provide a stable version of the JHU CSSE data repository where variable names are not subject to change.
- To provide an updated data set that maintains consistent naming conventions across all files.
- To address various inconsistencies in how data were named and managed, including how dates and times are encoded and how programs may interpret missing values.
- To provide time series data in long (tidy) format that will be familiar to most R users.
The cleaned data are organized into the following folders, mirroring how JHU CSSE organize their data:
- Cleaned Daily Reports (csse_covid_19_daily_reports)
- Cleaned Time Series Data (csse_covid_19_time_series)
- Cleaned JHU CSSE Data (csse_covid_19_clean_data)
  - This is a new folder. It contains the above daily reports (concatenated) and time series data in .Rdata format.
The csse_covid_19_daily_reports folder contains cleaned daily reports from JHU CSSE. Unlike JHU CSSE's raw csv files, every file in this folder contains the same variables. These variables follow a consistent naming scheme and order that will not change (although new variables may be added sequentially). Cleaned daily reports are added to this folder every day to reflect the latest additions and updates from JHU CSSE.
The following columns are found in every csv file in this order:
- Date_Published *
- FIPS
- Admin2
- Province_State
- Country_Region
- Last_Updated
- Latitude
- Longitude
- Confirmed
- Deaths
- Recovered
- Active
- Combined_Key
New variables will be added IF and WHEN JHU CSSE make changes. However, the above variables and their names will not change. If JHU CSSE introduce new variables, these will be added sequentially to the variables above.
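Because new variables are only ever appended, a script can check that a file's header starts with the documented columns and treat anything after them as an addition. The following is a minimal sketch of that check, not code from this repository. The column names are copied from the list above; the helper `extra_columns` and the trailing `Hypothetical_New_Column` are made up for illustration.

```python
import csv
import io

# Column names and order as listed in this README.
EXPECTED_COLUMNS = [
    "Date_Published", "FIPS", "Admin2", "Province_State", "Country_Region",
    "Last_Updated", "Latitude", "Longitude", "Confirmed", "Deaths",
    "Recovered", "Active", "Combined_Key",
]


def extra_columns(header):
    """Return any columns appended after the documented ones.

    Raises ValueError if the header does not start with EXPECTED_COLUMNS.
    """
    n = len(EXPECTED_COLUMNS)
    if header[:n] != EXPECTED_COLUMNS:
        raise ValueError("header does not start with the documented columns")
    return header[n:]


# Hypothetical header with one made-up trailing column, for illustration only.
sample = io.StringIO(",".join(EXPECTED_COLUMNS + ["Hypothetical_New_Column"]) + "\n")
header = next(csv.reader(sample))
print(extra_columns(header))
```

Checking a prefix rather than the full header keeps the script working if JHU CSSE introduce further variables later.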
Note: * `Date_Published` is the only variable above that is unique to this repository. It is used for data wrangling, to keep track of daily reports. Note that the dates recorded in `Date_Published` may differ from those recorded in `Last_Update`. The latter ought to be used for time series analysis.

- `Last_Update` fixes inconsistencies in how dates and times were formatted across csv files.
- All timestamps are in UTC (GMT+0) and adopt a standard M/DD/YYYY HH:MM:SS POSIXct format.
- Blank cells indicating an absence of COVID-19 `Confirmed`, `Deaths`, and `Recovered` cases are replaced with zeros. (This prevents programs like R from treating these as missing values.)
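For readers working outside R, the two conventions above can be sketched in Python as follows. This is an illustration under assumptions, not the code used to clean the data: the `strptime` pattern is an assumed equivalent of M/DD/YYYY HH:MM:SS, the helpers `parse_utc` and `count_or_zero` are hypothetical, the sample timestamp is made up, and counts are assumed to be whole numbers.

```python
from datetime import datetime, timezone

# Assumed strptime equivalent of the M/DD/YYYY HH:MM:SS format described above.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_utc(stamp):
    """Parse a timestamp string and mark it as UTC."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def count_or_zero(cell):
    """Treat a blank case-count cell as zero, otherwise read it as an integer."""
    cell = cell.strip()
    return int(cell) if cell else 0


ts = parse_utc("3/22/2020 23:45:00")  # made-up timestamp for illustration
print(ts.isoformat())
print(count_or_zero(""), count_or_zero("17"))
```

The cleaned files already have blanks replaced, so `count_or_zero` only mirrors the rule for anyone applying it to other data.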
The csse_covid_19_time_series folder contains time series data in a tidy rather than wide format. Data include `confirmed`, `deaths`, and `recovered` cases of COVID-19. All data are from JHU CSSE's time series csv files (which JHU CSSE create from their daily case reports).
The following variables are recorded in this order:
- Province_State
- Country_Region
- Latitude
- Longitude
- Date
- Confirmed
- Deaths
- Recovered
New variables will be added IF and WHEN JHU CSSE make changes. However, the above variable names will not change. If JHU CSSE introduce new variables, these will be added sequentially to the variables above.
- `Date` is encoded as date objects and adopts a standard YYYY-MM-DD format.
- Data on `Confirmed`, `Deaths`, and `Recovered` cases are concatenated into a single csv file in a tidy format.
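To illustrate what the change from wide to tidy format involves, here is a minimal Python sketch that reshapes a wide table (one column per date) into long rows with a YYYY-MM-DD `Date`. It is not the code used to build this repository. The sample rows are made up, the wide column names follow the style of JHU CSSE's time series files, and `wide_to_long` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Made-up wide-format data in the style of a JHU CSSE time series file.
WIDE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Exampleland,10.0,20.0,1,3
"""

ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]


def wide_to_long(text, value_name):
    """Turn one-column-per-date rows into one row per location and date."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column in ID_COLUMNS:
                continue
            # Date columns such as 1/22/20 become ISO dates such as 2020-01-22.
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            rows.append({
                "Province_State": record["Province/State"],
                "Country_Region": record["Country/Region"],
                "Latitude": record["Lat"],
                "Longitude": record["Long"],
                "Date": date,
                value_name: int(value),
            })
    return rows


long_rows = wide_to_long(WIDE, "Confirmed")
for row in long_rows:
    print(row["Country_Region"], row["Date"], row["Confirmed"])
```

In the long form every observation is its own row, so filtering by date or grouping by country needs no knowledge of which date columns exist.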
Warning: The `Recovered` series is shorter than the `Confirmed` and `Deaths` series. Missing values indicate dates where data on recoveries are unavailable. JHU CSSE also warn that there are no reliable data sources reporting recovered cases for many countries. Therefore, please exercise caution when interpreting data on the number of recoveries.
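A missing recovery count is not the same as zero recoveries, so it helps to keep the two apart when reading the time series. The sketch below shows one way to do that in Python. The sample rows are made up, `optional_int` is a hypothetical helper, and the assumption that missing values appear as empty cells or as `NA` should be checked against the actual files.

```python
import csv
import io

# Made-up tidy rows; "NA" and empty cells are both assumed to mark missing values.
TIDY = """Country_Region,Date,Confirmed,Deaths,Recovered
Exampleland,2020-01-22,5,0,1
Exampleland,2020-01-23,9,1,NA
Exampleland,2020-01-24,12,1,
"""

MISSING = {"", "NA"}


def optional_int(cell):
    """Return None for a missing cell instead of silently treating it as zero."""
    cell = cell.strip()
    return None if cell in MISSING else int(cell)


rows = list(csv.DictReader(io.StringIO(TIDY)))
recovered = [optional_int(r["Recovered"]) for r in rows]
known = [v for v in recovered if v is not None]
print(recovered)
print(f"{len(known)} of {len(recovered)} dates report recoveries")
```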
The csse_covid_19_clean_data folder contains the latest data from JHU CSSE in .Rdata format. The first file, CSSE_DailyReports.Rdata, concatenates the daily reports; `Date_Published` identifies the csv file behind each daily report. The second file, CSSE_TimeSeries.Rdata, contains the latest time series data. Both files are presented in long rather than wide format. Data are also cleaned per the descriptions above.
A huge thanks to everyone at Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) who is involved in collecting and maintaining the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository.
Data can be found at the JHU CSSE's Data Repository.
I'm just one guy. I make no warranties regarding the accuracy of these data and disclaim any liability for damages resulting from using this repository. JHU CSSE's Terms of Use apply.