Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

South Korea dataset module #45

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cgomez9
Copy link

@cgomez9 cgomez9 commented Apr 7, 2020

Extraction and formatting of the dataset of confirmed cases, deaths and recovered per patient in South Korea #20

Copy link
Collaborator

@ManuelAlvarezC ManuelAlvarezC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

audit.md and datapackage.json are missing.

Also, on your PR template, there was no checklist?

Comment on lines +115 to +117
### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally fine with you updating the .gitignore with required files, but adding your personal IDE files is considered a bad practice, you can see how to move this contents into a global .gitignore file for your local installation here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.gitignore has been updated.
also, audit.md and datapackage.json will be added too.
also, didnt notice the PULL_REQUEST_TEMPLATE.md, so ill take a look into it! thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, I didn't know you can set a global gitignore thank you @ManuelAlvarezC !

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManuelAlvarezC Could you elaborate on the audit.md and the datapackage.json? just this two left.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although is not as complete as it should be, thedocumentation will help you get a good grab on it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManuelAlvarezC Thank you, ill take a look at it, finalize the edit and make the push.

@@ -0,0 +1,3 @@
from task_geo.data_sources.covid.south_korea.south_korea_patients import south_korea_patients

__all__ = ['south_korea_patients']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename your data source to south_korea, or even better, the ISO code, something like kr_covid

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

'infected_by', 'contact_number'
]
df = df.reindex(columns=cols_ordered)
df['confirmed_date'] = pd.to_datetime(df.confirmed_date)
Copy link
Collaborator

@ManuelAlvarezC ManuelAlvarezC Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast can be done in two lines with:

date_columns = [...]
df[date_columns] = df[date_columns].apply(pd.to_datetime)
# This was written originally as pd.to_datetime(df[columns]) which crashes.

Copy link

@KrSuma KrSuma Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManuelAlvarezC Could you elaborate on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

On the lines between 37-41(I only selected the line 37 to comment, a blunder of mine) you are casting columns to datetime and reassigning them multiple times. This approach has two drawbacks:

  1. More lines of code to read and write, making it easier to miss details and introduce errors.
  2. It's in fact much faster making the casting and assigning of all the columns at once. This is what is called vectorization and pandas ( and numpy too) are designed to have vectorized operations run much faster than regular iteration in python.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, passing the date format to to_datetime will further improve its performance.

import requests


def south_korea_patients_connector(*args, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the signature of the functions is with *args, and **kwargs if you are only expecting one argument.
Also, this argument is a url, does this data source work with any url? I think this connector should take no arguments. Can you please update the signature for both connector and data source?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, takes no argument now and the url has been set

@KrSuma
Copy link

KrSuma commented Apr 14, 2020

Update: audit.md is done, some changes are made to comply with the lint for flake8 (checks during commit).

Just have the datapackage.json left to do.

once this is done, will be making another commit

@ManuelAlvarezC ManuelAlvarezC added ci-fail When the CI fails for a PR waiting-review-changes labels Apr 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-fail When the CI fails for a PR waiting-review-changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants