"
+ ],
+ "text/plain": [
+ " City name Population Area square miles Population density\n",
+ "0 San Francisco 852469 46.87 18187.945381\n",
+ "1 San Jose 1015785 176.53 5754.177760\n",
+ "2 Sacramento 485199 97.92 4955.055147"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "6qh63m-ayb-c"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "## Exercise #1\n",
+ "\n",
+ "Modify the `cities` table by adding a new boolean column that is True if and only if *both* of the following are True:\n",
+ "\n",
+ " * The city is named after a saint.\n",
+ " * The city has an area greater than 50 square miles.\n",
+ "\n",
+ "**Note:** Boolean `Series` are combined using the bitwise, rather than the traditional boolean, operators. For example, when performing *logical and*, use `&` instead of `and`.\n",
+ "\n",
+ "**Hint:** \"San\" in Spanish means \"saint.\""
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "zCOn8ftSyddH",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "outputId": "59689928-0c91-43d2-963f-6e5d7c16a527"
+ },
+ "cell_type": "code",
+ "source": [
+ "# Your code here\n",
+ "cities['Is wide and has saint name'] = (cities['Area square miles'] > 50) & cities['City name'].apply(lambda name: name.startswith('San'))\n",
+ "cities"
+ ],
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "0 San Francisco 852469 46.87 18187.945381 \n",
+ "1 San Jose 1015785 176.53 5754.177760 \n",
+ "2 Sacramento 485199 97.92 4955.055147 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "0 False \n",
+ "1 True \n",
+ "2 False "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "YHIWvc9Ms-Ll"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "### Solution\n",
+ "\n",
+ "Click below for a solution."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "T5OlrqtdtCIb",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "outputId": "68302cf2-5454-4e78-c2a4-de3617cfd167"
+ },
+ "cell_type": "code",
+ "source": [
+ "cities['Is wide and has saint name'] = (cities['Area square miles'] > 50) & cities['City name'].apply(lambda name: name.startswith('San'))\n",
+ "cities"
+ ],
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "0 San Francisco 852469 46.87 18187.945381 \n",
+ "1 San Jose 1015785 176.53 5754.177760 \n",
+ "2 Sacramento 485199 97.92 4955.055147 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "0 False \n",
+ "1 True \n",
+ "2 False "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 16
+ }
+ ]
+ },
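+ {
+ "metadata": {
+ "colab_type": "text"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "As a side note, the same column can likely be built without a Python-level `lambda` by using the vectorized `Series.str.startswith` accessor. The cell below is a sketch of that alternative, not a replacement for the solution above. The variable name `wide_saint` is introduced here only for illustration, and the cell assumes the `cities` table (including the column added by the solution) from earlier in this notebook."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code"
+ },
+ "cell_type": "code",
+ "source": [
+ "# Illustrative alternative: vectorized string test instead of apply + lambda.\n",
+ "wide_saint = (cities['Area square miles'] > 50) & cities['City name'].str.startswith('San')\n",
+ "\n",
+ "# Compare the result against the column created by the solution above.\n",
+ "wide_saint.equals(cities['Is wide and has saint name'])"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },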
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "f-xAOJeMiXFB"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "## Indexes\n",
+ "Both `Series` and `DataFrame` objects also define an `index` property that assigns an identifier value to each `Series` item or `DataFrame` row. \n",
+ "\n",
+ "By default, at construction, *pandas* assigns index values that reflect the ordering of the source data. Once created, the index values are stable; that is, they do not change when data is reordered."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "2684gsWNinq9",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "b11186b3-1c6a-4aa4-ebd7-00818a7394e7"
+ },
+ "cell_type": "code",
+ "source": [
+ "city_names.index"
+ ],
+ "execution_count": 17,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "RangeIndex(start=0, stop=3, step=1)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 17
+ }
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "F_qPe2TBjfWd",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "be2d6e8d-d7ed-4259-e79b-7011b121c3ea"
+ },
+ "cell_type": "code",
+ "source": [
+ "cities.index"
+ ],
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "RangeIndex(start=0, stop=3, step=1)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 18
+ }
+ ]
+ },
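+ {
+ "metadata": {
+ "colab_type": "text"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "To see what stable index values mean in practice, the sketch below sorts the rows by population, assuming the `cities` table from the cells above. Each row should keep the index label it was given at construction, so the labels are expected to appear out of numeric order after sorting."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code"
+ },
+ "cell_type": "code",
+ "source": [
+ "# Sort rows by population; each index label travels with its row.\n",
+ "cities.sort_values('Population').index"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },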
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "hp2oWY9Slo_h"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "Call `DataFrame.reindex` to manually reorder the rows. For example, the following has the same effect as sorting by city name:"
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "sN0zUzSAj-U1",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "outputId": "a2f5d075-3435-4209-9e7b-c6bd3a1317b7"
+ },
+ "cell_type": "code",
+ "source": [
+ "cities.reindex([2, 0, 1])"
+ ],
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "2 Sacramento 485199 97.92 4955.055147 \n",
+ "0 San Francisco 852469 46.87 18187.945381 \n",
+ "1 San Jose 1015785 176.53 5754.177760 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "2 False \n",
+ "0 False \n",
+ "1 True "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 19
+ }
+ ]
+ },
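+ {
+ "metadata": {
+ "colab_type": "text"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "Note that `reindex` returns a new `DataFrame` and leaves `cities` itself unchanged. As a small sketch (assuming the `cities` table from the cells above; the name `reordered` is illustrative), the cell below compares the manually reordered rows with an explicit `sort_values` call and then prints the index of the original table."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code"
+ },
+ "cell_type": "code",
+ "source": [
+ "reordered = cities.reindex([2, 0, 1])\n",
+ "\n",
+ "# Check whether the reordered copy matches an explicit sort by city name.\n",
+ "print(reordered.equals(cities.sort_values('City name')))\n",
+ "\n",
+ "# Print the index of the original DataFrame, which reindex did not modify.\n",
+ "print(cities.index)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },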
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "-GQFz8NZuS06"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "Reindexing is a great way to shuffle (randomize) a `DataFrame`. In the example below, we take the index, which is array-like, and pass it to NumPy's `random.permutation` function, which returns a shuffled copy of its values and leaves the original index untouched. Calling `reindex` with this shuffled array causes the `DataFrame` rows to be shuffled in the same way.\n",
+ "Try running the following cell multiple times!"
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "mF8GC0k8uYhz",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "outputId": "d073772b-41d3-4260-d9d4-12520da345ec"
+ },
+ "cell_type": "code",
+ "source": [
+ "cities.reindex(np.random.permutation(cities.index))"
+ ],
+ "execution_count": 20,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "0 San Francisco 852469 46.87 18187.945381 \n",
+ "1 San Jose 1015785 176.53 5754.177760 \n",
+ "2 Sacramento 485199 97.92 4955.055147 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "0 False \n",
+ "1 True \n",
+ "2 False "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 20
+ }
+ ]
+ },
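+ {
+ "metadata": {
+ "colab_type": "text"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "The sketch below separates the two steps of the shuffle, assuming `np` and `cities` are defined as in the cells above (the name `shuffled_index` is illustrative). `np.random.permutation` hands back a new array rather than modifying its argument, so `cities.index` should print the same before and after. The last line shows `DataFrame.sample(frac=1)`, another common way to shuffle every row."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code"
+ },
+ "cell_type": "code",
+ "source": [
+ "# permutation returns a new shuffled array; the argument is not modified.\n",
+ "shuffled_index = np.random.permutation(cities.index)\n",
+ "print(shuffled_index)\n",
+ "print(cities.index)\n",
+ "\n",
+ "# Alternative: sample all rows (frac=1) without replacement, in random order.\n",
+ "cities.sample(frac=1)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },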
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "fSso35fQmGKb"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "For more information, see the [Index documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#index-objects)."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "8UngIdVhz8C0"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "## Exercise #2\n",
+ "\n",
+ "The `reindex` method allows index values that are not in the original `DataFrame`'s index values. Try it and see what happens if you use such values! Why do you think this is allowed?"
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "PN55GrDX0jzO",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 175
+ },
+ "outputId": "6d125adc-a36f-4717-b964-8b204a63b250"
+ },
+ "cell_type": "code",
+ "source": [
+ "# Your code here\n",
+ "cities.reindex([0, 4, 5, 2])"
+ ],
+ "execution_count": 21,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "0 San Francisco 852469.0 46.87 18187.945381 \n",
+ "4 NaN NaN NaN NaN \n",
+ "5 NaN NaN NaN NaN \n",
+ "2 Sacramento 485199.0 97.92 4955.055147 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "0 False \n",
+ "4 NaN \n",
+ "5 NaN \n",
+ "2 False "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "TJffr5_Jwqvd"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "### Solution\n",
+ "\n",
+ "Click below for the solution."
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "8oSvi2QWwuDH"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "If your `reindex` input array includes values not in the original `DataFrame` index values, `reindex` will add new rows for these \"missing\" indices and populate all corresponding columns with `NaN` values:"
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "code",
+ "id": "yBdkucKCwy4x",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 175
+ },
+ "outputId": "6ef4d8ea-5e17-4387-c0f5-3f5ff8f28cb6"
+ },
+ "cell_type": "code",
+ "source": [
+ "cities.reindex([0, 4, 5, 2])"
+ ],
+ "execution_count": 22,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " City name Population Area square miles Population density \\\n",
+ "0 San Francisco 852469.0 46.87 18187.945381 \n",
+ "4 NaN NaN NaN NaN \n",
+ "5 NaN NaN NaN NaN \n",
+ "2 Sacramento 485199.0 97.92 4955.055147 \n",
+ "\n",
+ " Is wide and has saint name \n",
+ "0 False \n",
+ "4 NaN \n",
+ "5 NaN \n",
+ "2 False "
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 22
+ }
+ ]
+ },
+ {
+ "metadata": {
+ "colab_type": "text",
+ "id": "2l82PhPbwz7g"
+ },
+ "cell_type": "markdown",
+ "source": [
+ "This behavior is desirable because indexes are often strings pulled from the actual data (see the [*pandas* reindex\n",
+ "documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html) for an example\n",
+ "in which the index values are browser names).\n",
+ "\n",
+ "In this case, allowing \"missing\" indices makes it easy to reindex using an external list, as you don't have to worry about\n",
+ "sanitizing the input."
+ ]
+ }
+ ]
+}
\ No newline at end of file