South Korea dataset module #45

cgomez9 · 2020-04-07T01:20:38Z

Extraction and formatting of the dataset of confirmed cases, deaths and recovered per patient in South Korea #20

ManuelAlvarezC

audit.md and datapackage.json are missing.

Also, why was there no checklist in your PR template?

ManuelAlvarezC · 2020-04-07T12:00:15Z

.gitignore

+### PyCharm ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839


I'm totally fine with you updating the .gitignore with required files, but adding your personal IDE files is considered bad practice. You can see how to move these contents into a global .gitignore file for your local installation here.

.gitignore has been updated.
Also, audit.md and datapackage.json will be added.
I also didn't notice PULL_REQUEST_TEMPLATE.md, so I'll take a look at it. Thanks!

This is awesome; I didn't know you could set a global gitignore. Thank you, @ManuelAlvarezC!

@ManuelAlvarezC Could you elaborate on audit.md and datapackage.json? Those are the only two items left.

Although it is not as complete as it should be, the documentation will help you get a good grasp of it.
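For orientation, a datapackage.json descriptor commonly follows the Frictionless Data "Data Package" specification: a top-level `name` plus a `resources` list, where each tabular resource points at a file and may describe its columns in a `schema`. The sketch below builds such a descriptor with the standard library only. It is an assumption-laden illustration, not the format task_geo requires: the package name, file path, and field types are hypothetical placeholders (the field names are just columns mentioned in this PR), so check the project documentation for the keys it actually expects.

```python
import json

# Hypothetical descriptor loosely following the Frictionless Data Package spec.
# Package name, path, and field types are placeholders, not task_geo's values.
descriptor = {
    "name": "kr_covid",  # hypothetical package name
    "resources": [
        {
            "name": "kr_covid",
            "path": "kr_covid.csv",  # hypothetical output file
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "confirmed_date", "type": "date"},
                    {"name": "infected_by", "type": "string"},   # guessed type
                    {"name": "contact_number", "type": "integer"},  # guessed type
                ]
            },
        }
    ],
}


def write_datapackage(descriptor, path="datapackage.json"):
    """Write the descriptor as pretty-printed JSON and return the text written."""
    text = json.dumps(descriptor, indent=2)
    with open(path, "w") as f:
        f.write(text)
    return text
```

Calling `write_datapackage(descriptor)` from the data source folder would produce a datapackage.json next to the module; the field list would need to match the columns the formatter actually outputs.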

@ManuelAlvarezC Thank you, I'll take a look at it, finalize the edits, and push.

ManuelAlvarezC · 2020-04-07T12:25:53Z

task_geo/data_sources/covid/south_korea/__init__.py

@@ -0,0 +1,3 @@
+from task_geo.data_sources.covid.south_korea.south_korea_patients import south_korea_patients
+
+__all__ = ['south_korea_patients']


Rename your data source to south_korea or, even better, use the ISO code, something like kr_covid.

ManuelAlvarezC · 2020-04-07T15:29:39Z

task_geo/data_sources/covid/south_korea/south_korea_patients.py

+        'infected_by', 'contact_number'
+    ]
+    df = df.reindex(columns=cols_ordered)
+    df['confirmed_date'] = pd.to_datetime(df.confirmed_date)


This cast can be done in two lines with:

date_columns = [...]
df[date_columns] = df[date_columns].apply(pd.to_datetime)  # This was originally written as pd.to_datetime(df[date_columns]), which crashes.
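A self-contained version of the suggested cast, run on a toy frame. The sample values and the `released_date` column are hypothetical stand-ins for the real dataset. Applying `pd.to_datetime` through `DataFrame.apply` works because each column is passed to the function as a Series. Calling `pd.to_datetime` on the sub-DataFrame directly fails because, given a DataFrame, pandas tries to assemble one date per row from columns named year/month/day.

```python
import pandas as pd

# Toy data; "released_date" is a hypothetical second date column.
df = pd.DataFrame({
    "confirmed_date": ["2020-01-20", "2020-01-24", None],
    "released_date": ["2020-02-05", None, "2020-03-01"],
    "contact_number": [45, 2, 17],
})

date_columns = ["confirmed_date", "released_date"]

# DataFrame.apply calls pd.to_datetime once per column (each column is a Series),
# so all date columns are cast and reassigned in a single statement.
# Missing values (None) become NaT.
df[date_columns] = df[date_columns].apply(pd.to_datetime)

# By contrast, pd.to_datetime(df[date_columns]) would try to build dates from
# year/month/day columns and raises ValueError for this frame.
```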

@ManuelAlvarezC Could you elaborate on this?

Sure thing.

On lines 37-41 (I only selected line 37 for the comment, a blunder on my part), you are casting columns to datetime and reassigning them one at a time. This approach has two drawbacks:

More lines of code to read and write, making it easier to miss details and introduce errors.

It is in fact much faster to cast and assign all the columns at once. This is what is called vectorization: pandas (and NumPy too) are designed so that vectorized operations run much faster than regular iteration in Python.

Also, passing the date format to to_datetime (through its format argument) will further improve its performance.
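Both suggestions can be combined by passing the format through `apply` with a small lambda. The `"%Y-%m-%d"` pattern is an assumption about how dates appear in the raw file, and the column names and values are hypothetical. With an explicit format, pandas does not need to infer one (how much time that saves depends on the pandas version), and strings that do not match the format raise an error instead of being parsed some other way.

```python
import pandas as pd

# Assumed raw format: ISO-style year-month-day strings.
DATE_FORMAT = "%Y-%m-%d"

df = pd.DataFrame({
    "confirmed_date": ["2020-01-20", "2020-02-19"],
    "released_date": ["2020-02-05", "2020-03-12"],  # hypothetical column
})
date_columns = ["confirmed_date", "released_date"]

# Cast every date column in one statement, telling pandas the exact format
# so it does not have to guess it.
df[date_columns] = df[date_columns].apply(
    lambda col: pd.to_datetime(col, format=DATE_FORMAT)
)
```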

ManuelAlvarezC · 2020-04-07T15:30:39Z

task_geo/data_sources/covid/south_korea/south_korea_patients.py

+import requests
+
+
+def south_korea_patients_connector(*args, **kwargs):


I'm not sure why the functions' signatures use *args and **kwargs if you are only expecting one argument.
Also, this argument is a URL; does this data source work with any URL? I think this connector should take no arguments. Can you please update the signatures of both the connector and the data source?

Done. It takes no arguments now, and the URL has been set.
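A minimal sketch of what the no-argument signatures could look like, assuming the URL lives in a module-level constant. The URL below is a placeholder (the real one is not shown in this thread), reading it with `pd.read_csv` assumes the source is a CSV file, and `south_korea_patients_formatter` is a hypothetical stub standing in for the PR's actual formatting step.

```python
import pandas as pd

# Placeholder only: the real source URL is set in the PR but not shown here.
SOUTH_KOREA_PATIENTS_URL = "https://example.com/south_korea/patients.csv"


def south_korea_patients_connector():
    """Download the raw per-patient table from the fixed source URL."""
    return pd.read_csv(SOUTH_KOREA_PATIENTS_URL)


def south_korea_patients_formatter(raw):
    """Hypothetical formatter stub; column reordering and date casting omitted."""
    return raw.copy()


def south_korea_patients():
    """Data source entry point: takes no arguments, just like the connector."""
    return south_korea_patients_formatter(south_korea_patients_connector())
```

Since the URL is fixed in the module, callers cannot accidentally point the data source at an unrelated dataset, which is the concern raised in the review.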

KrSuma · 2020-04-14T03:40:35Z

Update: audit.md is done, and I made some changes to comply with the flake8 linter (checked during commit).

Only datapackage.json is left to do.

Once that is done, I will make another commit.

South Korea dataset module (85b25ef)

ManuelAlvarezC requested changes Apr 7, 2020

without audit.md and datapackage.json (4274adb)

added audit (unfinished), datapackage left (cdc16de)

ManuelAlvarezC added the ci-fail (When the CI fails for a PR) and waiting-review-changes labels Apr 19, 2020
