South Korea dataset module #45

Open: wants to merge 3 commits into base: master
100 changes: 100 additions & 0 deletions .gitignore
@@ -108,3 +108,103 @@ ENV/

notebooks/data/
docs/notebooks

# Created by https://www.gitignore.io/api/pycharm
# Edit at https://www.gitignore.io/?templates=pycharm

### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
Comment on lines +115 to +117
Collaborator

I'm totally fine with you updating the .gitignore with required files, but adding your personal IDE files is considered bad practice. You can see how to move these contents into a global .gitignore file for your local installation here.

.gitignore has been updated.
Also, audit.md and datapackage.json will be added too.
Also, I didn't notice the PULL_REQUEST_TEMPLATE.md, so I'll take a look at it. Thanks!

Author

This is awesome, I didn't know you could set a global gitignore. Thank you, @ManuelAlvarezC!

@ManuelAlvarezC Could you elaborate on audit.md and datapackage.json? Just these two are left.

Collaborator

Although it is not as complete as it should be, the documentation will help you get a good grasp of it.

@ManuelAlvarezC Thank you, I'll take a look at it, finalize the edit, and push.


# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

# Sonarlint plugin
.idea/**/sonarlint/

# SonarQube Plugin
.idea/**/sonarIssues.xml

# Markdown Navigator plugin
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator/

/.idea/.gitignore
/.idea/misc.xml
/.idea/modules.xml
/.idea/inspectionProfiles/profiles_settings.xml
/.idea/rSettings.xml
/.idea/task-geo.iml
/.idea/vcs.xml
# End of https://www.gitignore.io/api/pycharm
3 changes: 3 additions & 0 deletions task_geo/data_sources/covid/south_korea/__init__.py
@@ -0,0 +1,3 @@
from task_geo.data_sources.covid.south_korea.kr_covid import kr_covid

__all__ = ['kr_covid']
24 changes: 24 additions & 0 deletions task_geo/data_sources/covid/south_korea/__main__.py
@@ -0,0 +1,24 @@
import argparse

from kr_covid import kr_covid

def get_argparser():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        '-o', '--output', required=True,
        help='Destination file to store the processed dataset.')

    return parser


def main():
    parser = get_argparser()
    args = parser.parse_args()

    dataset = kr_covid()
    dataset.to_csv(args.output, index=False, header=True)


if __name__ == '__main__':
    main()
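Review note: the command-line entry point above takes a single required `-o/--output` option. A minimal sketch of how that parser behaves, assuming the same `get_argparser` definition as in the diff; the file name `south_korea.csv` is hypothetical and only there for illustration:

```python
import argparse


def get_argparser():
    """Build the same single-option parser as the __main__.py in this diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-o', '--output', required=True,
        help='Destination file to store the processed dataset.')
    return parser


# Passing an explicit argument list avoids reading sys.argv,
# which makes the parser easy to exercise in a test.
# 'south_korea.csv' is a hypothetical output path.
args = get_argparser().parse_args(['-o', 'south_korea.csv'])
print(args.output)
```

Because the option is declared with `required=True`, calling `parse_args([])` would exit with a usage error instead of returning a namespace.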
Empty file.
59 changes: 59 additions & 0 deletions task_geo/data_sources/covid/south_korea/kr_covid.py
@@ -0,0 +1,59 @@
import io
import pandas as pd
import requests


def kr_covid_connector():
    """Retrieves data from south_korea_patients.

    Arguments:
        None
    Returns:
        pandas.DataFrame
    """
    url = 'https://raw.githubusercontent.com/KrSuma/COVID19_Kr/master/Datasets/PatientInfo.csv'
    csv = requests.get(url).content
    return pd.read_csv(io.StringIO(csv.decode('utf-8')))


def kr_covid_formatter(df):
    """Formats data retrieved from south_korea_patients.

    Arguments:
        df(pandas.DataFrame): Raw data returned by kr_covid_connector.

    Returns:
        pandas.DataFrame
    """
    cols_ordered = [
        'country', 'state', 'province', 'confirmed_date',
        'released_date', 'deceased_date', 'exposure_start',
        'exposure_end', 'global_id', 'birth_year',
        'local_id', 'sex', 'disease',
        'group', 'infection_reason', 'infection_order',
        'infected_by', 'contact_number'
    ]
    df = df.reindex(columns=cols_ordered)
    date_columns = ['confirmed_date', 'released_date', 'deceased_date', 'exposure_start',
                    'exposure_end']
    df[date_columns] = df[date_columns].apply(pd.to_datetime)

    # df['confirmed_date'] = pd.to_datetime(df.confirmed_date)
    # df['released_date'] = pd.to_datetime(df.released_date)
    # df['deceased_date'] = pd.to_datetime(df.deceased_date)
    # df['exposure_start'] = pd.to_datetime(df.exposure_start)
    # df['exposure_end'] = pd.to_datetime(df.exposure_end)
    return df


def kr_covid():
    """Data Source for south_korea_patients.

    Arguments:
        None

    Returns:
        pandas.DataFrame
    """
    data = kr_covid_connector()
    return kr_covid_formatter(data)
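Review note: the connector and formatter steps can be checked offline, without hitting the remote CSV. The sketch below is only an illustration under stated assumptions: the byte payload is a hypothetical stand-in for `requests.get(url).content`, and the shortened column list is made up for the example (the real PatientInfo.csv may have different columns).

```python
import io

import pandas as pd

# Hypothetical payload standing in for `requests.get(url).content`;
# the real dataset has more columns and rows.
content = (
    b"sex,confirmed_date,released_date\n"
    b"female,2020-01-20,2020-02-05\n"
    b"male,2020-01-30,\n"
)

# Connector step: decode the bytes and parse them as CSV.
raw = pd.read_csv(io.StringIO(content.decode('utf-8')))

# Formatter step 1: reindex fixes the column order and adds any
# requested column that is missing (here `deceased_date`) filled with NaN.
cols_ordered = ['confirmed_date', 'released_date', 'deceased_date', 'sex']
df = raw.reindex(columns=cols_ordered)

# Formatter step 2: pass the function object itself to apply (no
# parentheses), so pandas calls pd.to_datetime once per column.
date_columns = ['confirmed_date', 'released_date', 'deceased_date']
df[date_columns] = df[date_columns].apply(pd.to_datetime)

print(df.dtypes)
```

Two things this makes visible: a column name that is absent from the source silently becomes an all-empty column after `reindex`, and every name in `date_columns` has to match a name in `cols_ordered` exactly, otherwise the column selection raises a `KeyError`.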