diff --git a/notebooks/pandas_06_data_cleaning.ipynb b/notebooks/pandas_06_data_cleaning.ipynb index 3c3d6c6..a6696ff 100644 --- a/notebooks/pandas_06_data_cleaning.ipynb +++ b/notebooks/pandas_06_data_cleaning.ipynb @@ -2503,14 +2503,11 @@ "\n", "
Hints\n", " \n", - "- Use the `rename` method and apply the mapping on the `columns`.\n", - "- The input of the `rename` method van be a dictionary or a function. Use the `clean_column_name` as the function to rename the columns. \n", - "- Make sure to explicitly set the columns= parameter. \n", + "- To rename columns, we can use the `rename()` method.\n", + "- The input of the `rename()` method can also be a function in addition to a dictionary. When passing a function to `rename()`, pandas will under the hood call this function for each column name individually, and use the return value as the renamed column name.\n", + "- Make sure to explicitly set the `columns=` parameter. \n", " \n", - "__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. \n", - "\n", - "- The pandas method `rename` applies this function to each column name individually. \n", - "- `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.\n", + "__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.\n", "\n", "
\n", "\n", @@ -2524,10 +2521,27 @@ "metadata": { "tags": [] }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "'DAY_OF_WEEK'" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "def clean_column_name(name):\n", - " return name.removeprefix(\"TX_\").removesuffix(\"_DESCR_NL\")" + " \"\"\"\n", + " Takes a string and returns it after removing \"TX_\" and \"_DESCR_NL\".\n", + " \"\"\"\n", + " return name.removeprefix(\"TX_\").removesuffix(\"_DESCR_NL\")\n", + "\n", + "# example to show what the 'clean_column_name' function does\n", + "clean_column_name(\"TX_DAY_OF_WEEK_DESCR_NL\")" ] }, { diff --git a/notebooks/pandas_06_data_cleaning.md b/notebooks/pandas_06_data_cleaning.md index 475247e..7e3b525 100644 --- a/notebooks/pandas_06_data_cleaning.md +++ b/notebooks/pandas_06_data_cleaning.md @@ -383,14 +383,11 @@ A number of the remaining metadata columns names have the `TX_` and the `_DESCR_
Hints -- Use the `rename` method and apply the mapping on the `columns`. -- The input of the `rename` method van be a dictionary or a function. Use the `clean_column_name` as the function to rename the columns. -- Make sure to explicitly set the columns= parameter. +- To rename columns, we can use the `rename()` method. +- The input of the `rename()` method can also be a function in addition to a dictionary. When passing a function to `rename()`, pandas will under the hood call this function for each column name individually, and use the return value as the renamed column name. +- Make sure to explicitly set the `columns=` parameter. -__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. - -- The pandas method `rename` applies this function to each column name individually. -- `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present. +__NOTE__ The function `clean_column_name` takes as input a string and returns the string after removing the prefix and suffix. `removeprefix()` and `removesuffix()` are [Python string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) to remove start/trailing characters if present.
@@ -398,7 +395,13 @@ __NOTE__ The function `clean_column_name` takes as input a string and returns th ```{code-cell} ipython3 def clean_column_name(name): + """ + Takes a string and returns it after removing "TX_" and "_DESCR_NL". + """ return name.removeprefix("TX_").removesuffix("_DESCR_NL") + +# example to show what the 'clean_column_name' function does +clean_column_name("TX_DAY_OF_WEEK_DESCR_NL") ``` ```{code-cell} ipython3 diff --git a/notebooks/visualization_02_seaborn.ipynb b/notebooks/visualization_02_seaborn.ipynb index 4ab2e21..92a0ef1 100644 --- a/notebooks/visualization_02_seaborn.ipynb +++ b/notebooks/visualization_02_seaborn.ipynb @@ -1558,9 +1558,10 @@ "
Hints\n", "\n", "- The sum of victims _for each_ hour of the day requires `groupby`. One can create a new column with the hour of the day or pass the hour directly to `groupby`.\n", + "- The groupby operation sets the key that you are grouping on as the index (row labels) of the result. The `reset_index()` method can be used to turn that index into normal dataframe columns.\n", "- The `.dt` accessor provides access to all kinds of datetime information.\n", - "- `rename` requires a dictionary with a mapping of the old vs new names.\n", - "- A bar plot is in seaborn one of the `catplot` options. \n", + "- `rename()` requires a dictionary with a mapping of the old to the new names.\n", + "- In seaborn, a bar plot is one of the `catplot()` options. \n", " \n", "
" ] diff --git a/notebooks/visualization_02_seaborn.md b/notebooks/visualization_02_seaborn.md index db1d409..b8aeb58 100644 --- a/notebooks/visualization_02_seaborn.md +++ b/notebooks/visualization_02_seaborn.md @@ -434,9 +434,10 @@ Use the `height` and `aspect` to adjust the figure width/height.
Hints - The sum of victims _for each_ hour of the day requires `groupby`. One can create a new column with the hour of the day or pass the hour directly to `groupby`. +- The groupby operation sets the key that you are grouping on as the index (row labels) of the result. The `reset_index()` method can be used to turn that index into normal dataframe columns. - The `.dt` accessor provides access to all kinds of datetime information. -- `rename` requires a dictionary with a mapping of the old vs new names. -- A bar plot is in seaborn one of the `catplot` options. +- `rename()` requires a dictionary with a mapping of the old to the new names. +- In seaborn, a bar plot is one of the `catplot()` options.
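
Reviewer note: the hints touched by this diff can be checked with a small sketch like the one below. It is only an illustration under stated assumptions: the toy dataframe, the column names `datetime` and `n_victims`, and the final display names are hypothetical and not taken from the notebooks. Only `clean_column_name` is copied from the diff (its string methods need Python 3.9 or later).

```python
import pandas as pd


def clean_column_name(name):
    """
    Takes a string and returns it after removing "TX_" and "_DESCR_NL".
    """
    return name.removeprefix("TX_").removesuffix("_DESCR_NL")


# Hypothetical toy data; the real notebooks use a different dataset.
casualties = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2020-01-01 08:15", "2020-01-01 08:40", "2020-01-02 17:05", "2020-01-03 17:30"]
        ),
        "TX_DAY_OF_WEEK_DESCR_NL": ["woensdag", "woensdag", "donderdag", "vrijdag"],
        "n_victims": [1, 2, 1, 3],
    }
)

# rename() with a function: pandas calls it once per column name.
# The columns= keyword makes explicit that column labels (not the index) are renamed.
cleaned = casualties.rename(columns=clean_column_name)

# Group on the hour taken from the .dt accessor, sum the victims,
# turn the group key (the index) back into a regular column,
# then rename with a dictionary mapping old names to new names.
victims_hour = (
    cleaned.groupby(cleaned["datetime"].dt.hour)["n_victims"]
    .sum()
    .reset_index()
    .rename(columns={"datetime": "Hour of the day", "n_victims": "Number of victims"})
)

print(list(cleaned.columns))
print(victims_hour)
```

Because the grouping key is a Series named `datetime`, the column produced by `reset_index()` carries that name, which is why the dictionary passed to `rename()` maps `"datetime"` to the display label.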