Small fixes after course (clarify some instructions) #92

Open
wants to merge 1 commit into
base: main
32 changes: 23 additions & 9 deletions notebooks/pandas_06_data_cleaning.ipynb
@@ -2503,14 +2503,11 @@
"\n",
"<details><summary>Hints</summary>\n",
" \n",
"- Use the `rename` method and apply the mapping on the `columns`.\n",
"- The input of the `rename` method van be a dictionary or a function. Use the `clean_column_name` as the function to rename the columns. \n",
"- Make sure to explicitly set the columns= parameter. \n",
"- To rename columns, we can use the `rename()` method.\n",
"- The input of the `rename()` method can also be a function in addition to a dictionary. When passing a function to `rename()`, pandas will under the hood call this function for each the column name individually, and use the return value as the renamed column name.\n",
"- Make sure to explicitly set the `columns=` parameter. \n",
" \n",
"__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. \n",
"\n",
"- The pandas method `rename` applies this function to each column name individually. \n",
"- `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.\n",
"__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.\n",
"\n",
"</details>\n",
"\n",
@@ -2524,10 +2521,27 @@
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'DAY_OF_WEEK'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def clean_column_name(name):\n",
" return name.removeprefix(\"TX_\").removesuffix(\"_DESCR_NL\")"
" \"\"\"\n",
" Takes a string and returns it after removing \"TX_\" and \"_DESCR_NL\".\n",
" \"\"\"\n",
" return name.removeprefix(\"TX_\").removesuffix(\"_DESCR_NL\")\n",
"\n",
"# example to show what the 'clean_column_name' function does\n",
"clean_column_name(\"TX_DAY_OF_WEEK_DESCR_NL\")"
]
},
{
17 changes: 10 additions & 7 deletions notebooks/pandas_06_data_cleaning.md
@@ -383,22 +383,25 @@ A number of the remaining metadata columns names have the `TX_` and the `_DESCR_

<details><summary>Hints</summary>

- Use the `rename` method and apply the mapping on the `columns`.
- The input of the `rename` method van be a dictionary or a function. Use the `clean_column_name` as the function to rename the columns.
- Make sure to explicitly set the columns= parameter.
- To rename columns, we can use the `rename()` method.
- The input of the `rename()` method can also be a function in addition to a dictionary. When passing a function to `rename()`, pandas will under the hood call this function for each the column name individually, and use the return value as the renamed column name.
- Make sure to explicitly set the `columns=` parameter.

__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix.

- The pandas method `rename` applies this function to each column name individually.
- `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.
__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.

</details>

</div>

```{code-cell} ipython3
def clean_column_name(name):
    """
    Takes a string and returns it after removing "TX_" and "_DESCR_NL".
    """
    return name.removeprefix("TX_").removesuffix("_DESCR_NL")

# example to show what the 'clean_column_name' function does
clean_column_name("TX_DAY_OF_WEEK_DESCR_NL")
```
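
The hint about passing a function to `rename()` can be made concrete with a plain-Python sketch. This is only an approximation of the behaviour the hint describes, not pandas' actual implementation; the column names other than `TX_DAY_OF_WEEK_DESCR_NL` are hypothetical and chosen purely for illustration.

```python
def clean_column_name(name):
    """Return `name` without the "TX_" prefix and the "_DESCR_NL" suffix."""
    return name.removeprefix("TX_").removesuffix("_DESCR_NL")


# Hypothetical column names (assumption, not taken from the dataset).
columns = ["TX_DAY_OF_WEEK_DESCR_NL", "TX_ROAD_TYPE_DESCR_NL", "n_victims"]

# Roughly what passing the function as `columns=` amounts to: the function is
# called once per column name, and its return value becomes the new name.
renamed = [clean_column_name(name) for name in columns]
print(renamed)  # ['DAY_OF_WEEK', 'ROAD_TYPE', 'n_victims']
```

Names without the prefix or suffix (here `n_victims`) pass through unchanged, because `removeprefix()` and `removesuffix()` only strip the text when it is present. With a real dataframe `df`, the equivalent call would be `df.rename(columns=clean_column_name)`.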

```{code-cell} ipython3
5 changes: 3 additions & 2 deletions notebooks/visualization_02_seaborn.ipynb
@@ -1558,9 +1558,10 @@
"<details><summary>Hints</summary>\n",
"\n",
"- The sum of victims _for each_ hour of the day requires `groupby`. One can create a new column with the hour of the day or pass the hour directly to `groupby`.\n",
"- The groupby operation sets the key that you are grouping on as the index (row labels) of the result. The `reset_index()` method can be used to turn that index into normal dataframe columns.\n",
"- The `.dt` accessor provides access to all kinds of datetime information.\n",
"- `rename` requires a dictionary with a mapping of the old vs new names.\n",
"- A bar plot is in seaborn one of the `catplot` options. \n",
"- `rename()` requires a dictionary with a mapping of the old to new names.\n",
"- A bar plot is in seaborn one of the `catplot()` options. \n",
" \n",
"</details>"
]
5 changes: 3 additions & 2 deletions notebooks/visualization_02_seaborn.md
@@ -434,9 +434,10 @@ Use the `height` and `aspect` to adjust the figure width/height.
<details><summary>Hints</summary>

- The sum of victims _for each_ hour of the day requires `groupby`. One can create a new column with the hour of the day or pass the hour directly to `groupby`.
- The groupby operation sets the key that you are grouping on as the index (row labels) of the result. The `reset_index()` method can be used to turn that index into normal dataframe columns.
- The `.dt` accessor provides access to all kinds of datetime information.
- `rename` requires a dictionary with a mapping of the old vs new names.
- A bar plot is in seaborn one of the `catplot` options.
- `rename()` requires a dictionary with a mapping of the old to new names.
- A bar plot is in seaborn one of the `catplot()` options.

</details>
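
The data-preparation steps named in these hints (`groupby` on the hour, `reset_index()`, `rename()`) could look roughly like the sketch below. The dataframe, its column names (`datetime`, `n_victims`) and its values are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical data (assumption): column names and values are illustrative only.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 08:15", "2020-01-01 08:45",
        "2020-01-02 17:30", "2020-01-03 17:05",
    ]),
    "n_victims": [1, 2, 1, 3],
})

# Pass the hour of the day directly to groupby via the .dt accessor,
# then sum the victims within each hour.
victims_per_hour = df.groupby(df["datetime"].dt.hour)["n_victims"].sum()

# The grouping key ends up as the index; reset_index() turns it back into a
# regular column, and rename() maps the old column name to the new one.
victims_per_hour = (
    victims_per_hour
    .reset_index()
    .rename(columns={"datetime": "hour"})
)
print(victims_per_hour)
```

The resulting dataframe has one row per hour, so it could then be passed to seaborn, for example as `sns.catplot(data=victims_per_hour, x="hour", y="n_victims", kind="bar")`.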
