Skip to content

Commit

Permalink
Quarto output
Browse files Browse the repository at this point in the history
  • Loading branch information
MartinSchobben committed Aug 14, 2024
1 parent 711fce8 commit 78b9260
Show file tree
Hide file tree
Showing 6 changed files with 125 additions and 104 deletions.
1 change: 0 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ post-render:
for i in $(QMD); do quarto convert $$i; done
- mv chapters/*.ipynb notebooks/ >/dev/null 2>&1
- for f in chapters/*.quarto_ipynb ; do mv -- "$f" "${f%.quarto_ipynb}.ipynb" >/dev/null 2>&1; done
python src/clean-nb.py
cp Makefile notebooks/

data:
Expand Down
6 changes: 3 additions & 3 deletions _toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,13 @@ parts:
- caption: Preamble
chapters:
- file: notebooks/how-to-cite
- caption: Floodmapping
chapters:
- file: notebooks/02_floodmapping
- caption: References
chapters:
- file: notebooks/references
- caption: Classification
chapters:
- file: notebooks/01_classification
- caption: Floodmapping
chapters:
- file: notebooks/02_floodmapping
root: README
146 changes: 80 additions & 66 deletions notebooks/01_classification.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification\n",
"Finding forests with satelite imagery\n"
"---\n",
"title: Classification\n",
"subtitle: Finding forests with satelite imagery\n",
"jupyter: \n",
" kernelspec:\n",
" name: \"01_classification\"\n",
" language: \"python\"\n",
" display_name: \"01_classification\"\n",
"format: \n",
" html:\n",
" code-fold: show\n",
"eval: true\n",
"---\n",
"\n",
"## Data Acquisition\n",
"In this chapter, we will employ machine learning techniques to classify a scene using satellite imagery. Specifically, we will utilize ``scikit-learn`` to implement two distinct classifiers and subsequently compare their results. To begin, we need to import the following modules."
]
},
{
Expand Down Expand Up @@ -71,12 +85,12 @@
"geom = {\n",
" 'type': 'Polygon',\n",
" 'coordinates': [[\n",
" [minx, miny],\n",
" [minx, maxy],\n",
" [maxx, maxy],\n",
" [maxx, miny],\n",
" [minx, miny]\n",
" ]]\n",
" [minx, miny`,\n",
" [minx, maxy`,\n",
" [maxx, maxy`,\n",
" [maxx, miny`,\n",
" [minx, miny`\n",
" ``\n",
"}\n",
"\n",
"# Set Temporal extent\n",
Expand All @@ -94,7 +108,7 @@
" \"https://earth-search.aws.element84.com/v1\"\n",
").search(\n",
" intersects=geom,\n",
" collections=[\"sentinel-2-l2a\"],\n",
" collections=[\"sentinel-2-l2a\"`,\n",
" datetime=date_query,\n",
" limit=100,\n",
").item_collection()\n",
Expand Down Expand Up @@ -127,7 +141,7 @@
"# lazily combine items\n",
"ds_odc = odc.stac.load(\n",
" items,\n",
" bands=[\"scl\", \"red\", \"green\", \"blue\", \"nir\"],\n",
" bands=[\"scl\", \"red\", \"green\", \"blue\", \"nir\"`,\n",
" chunks={'time': 5, 'x': 600, 'y': 600},\n",
" geobox=geobox,\n",
" resampling=\"bilinear\",\n",
Expand Down Expand Up @@ -159,15 +173,15 @@
" # include only vegetated, not_vegitated, water, and snow\n",
" return ((data > 3) & (data < 7)) | (data==11)\n",
"\n",
"ds_odc['valid'] = is_valid_pixel(ds_odc.scl)\n",
"ds_odc['valid'` = is_valid_pixel(ds_odc.scl)\n",
"#ds_odc.valid.sum(\"time\").plot()\n",
"\n",
"def avg(ds):\n",
" return (ds / ds.max() * 2)\n",
"\n",
"# compute the masked median\n",
"rgb_median = (\n",
" ds_odc[['red', 'green', 'blue']]\n",
" ds_odc[['red', 'green', 'blue'``\n",
" .where(ds_odc.valid)\n",
" .to_dataarray(dim=\"band\")\n",
" .transpose(..., \"band\")\n",
Expand All @@ -185,7 +199,7 @@
"metadata": {},
"source": [
"### False Color Image\n",
"In addition to the regular RGB Image, we can swap any of the bands from the visible spectrum with any other bands. In this specific case the red band has been changed to the near infrared band. This allows us to see vegetated areas more clearly, since they now appear in a bright red color. This is due to the fact that plants absorb regular red light while reflecting near infrared light [@nasa2020]."
"In addition to the regular RGB Image, we can swap any of the bands from the visible spectrum with any other bands. In this specific case the red band has been changed to the near infrared band. This allows us to see vegetated areas more clearly, since they now appear in a bright red color. This is due to the fact that plants absorb regular red light while reflecting near infrared light {cite}`nasa2020`."
]
},
{
Expand All @@ -196,7 +210,7 @@
"source": [
"# compute the false color image\n",
"fc_median = (\n",
" ds_odc[['nir', 'green', 'blue']]\n",
" ds_odc[['nir', 'green', 'blue'``\n",
" .where(ds_odc.valid)\n",
" .to_dataarray(dim=\"band\")\n",
" .transpose(..., \"band\")\n",
Expand All @@ -214,7 +228,7 @@
"metadata": {},
"source": [
"### NDVI Image\n",
"To get an first impression of the data, we can calculate the NDVI (Normalized Difference Vegetation Index) and plot it. The NDVI is calculated by useing the following formula. [@rouse1974monitoring]\n",
"To get an first impression of the data, we can calculate the NDVI (Normalized Difference Vegetation Index) and plot it. The NDVI is calculated by useing the following formula. {cite}`rouse1974monitoring`\n",
"\n",
"$$\n",
"NDVI = \\frac{NIR - Red}{NIR + Red}\n",
Expand Down Expand Up @@ -262,27 +276,27 @@
"source": [
"# Define Polygons\n",
"forest_areas = {\n",
" 0: [Polygon([(16.482772, 47.901753), (16.465133, 47.870124), (16.510142, 47.874382), (16.482772, 47.901753)])],\n",
" 1: [Polygon([(16.594079, 47.938855), (16.581914, 47.894454), (16.620233, 47.910268), (16.594079, 47.938855)])],\n",
" 2: [Polygon([(16.67984, 47.978998), (16.637263, 47.971091), (16.660376, 47.929123), (16.67984, 47.978998)])],\n",
" 3: [Polygon([(16.756477, 48.000286), (16.723024, 47.983256), (16.739446, 47.972916), (16.756477, 48.000286)])],\n",
" 4: [Polygon([(16.80696, 48.135923), (16.780806, 48.125583), (16.798445, 48.115243), (16.80696, 48.135923)])],\n",
" 5: [Polygon([(16.684097, 48.144438), (16.664634, 48.124366), (16.690788, 48.118892), (16.684097, 48.144438)])],\n",
" 6: [Polygon([(16.550894, 48.169984), (16.530822, 48.165118), (16.558801, 48.137139), (16.550894, 48.169984)])],\n",
" 7: [Polygon([(16.588604, 48.402329), (16.556976, 48.401112), (16.580697, 48.382865), (16.588604, 48.402329)])],\n",
" 0: [Polygon([(16.482772, 47.901753), (16.465133, 47.870124), (16.510142, 47.874382), (16.482772, 47.901753)`)`,\n",
" 1: [Polygon([(16.594079, 47.938855), (16.581914, 47.894454), (16.620233, 47.910268), (16.594079, 47.938855)`)`,\n",
" 2: [Polygon([(16.67984, 47.978998), (16.637263, 47.971091), (16.660376, 47.929123), (16.67984, 47.978998)`)`,\n",
" 3: [Polygon([(16.756477, 48.000286), (16.723024, 47.983256), (16.739446, 47.972916), (16.756477, 48.000286)`)`,\n",
" 4: [Polygon([(16.80696, 48.135923), (16.780806, 48.125583), (16.798445, 48.115243), (16.80696, 48.135923)`)`,\n",
" 5: [Polygon([(16.684097, 48.144438), (16.664634, 48.124366), (16.690788, 48.118892), (16.684097, 48.144438)`)`,\n",
" 6: [Polygon([(16.550894, 48.169984), (16.530822, 48.165118), (16.558801, 48.137139), (16.550894, 48.169984)`)`,\n",
" 7: [Polygon([(16.588604, 48.402329), (16.556976, 48.401112), (16.580697, 48.382865), (16.588604, 48.402329)`)`,\n",
"}\n",
"\n",
"nonforest_areas = {\n",
" 0: [Polygon([(16.674974, 48.269126), (16.623882, 48.236281), (16.682272, 48.213168), (16.674974, 48.269126)])],\n",
" 1: [Polygon([(16.375723, 48.228374), (16.357476, 48.188839), (16.399444, 48.185798), (16.375723, 48.228374)])],\n",
" 2: [Polygon([(16.457834, 48.26426), (16.418907, 48.267301), (16.440804, 48.23324), (16.457834, 48.26426)])],\n",
" 3: [Polygon([(16.519266, 48.101861), (16.470607, 48.100645), (16.500411, 48.07145), (16.519266, 48.101861)])],\n",
" 4: [Polygon([(16.453577, 48.051986), (16.412217, 48.067192), (16.425598, 48.012451), (16.453577, 48.051986)])],\n",
" 0: [Polygon([(16.674974, 48.269126), (16.623882, 48.236281), (16.682272, 48.213168), (16.674974, 48.269126)`)`,\n",
" 1: [Polygon([(16.375723, 48.228374), (16.357476, 48.188839), (16.399444, 48.185798), (16.375723, 48.228374)`)`,\n",
" 2: [Polygon([(16.457834, 48.26426), (16.418907, 48.267301), (16.440804, 48.23324), (16.457834, 48.26426)`)`,\n",
" 3: [Polygon([(16.519266, 48.101861), (16.470607, 48.100645), (16.500411, 48.07145), (16.519266, 48.101861)`)`,\n",
" 4: [Polygon([(16.453577, 48.051986), (16.412217, 48.067192), (16.425598, 48.012451), (16.453577, 48.051986)`)`,\n",
"}\n",
"\n",
"# Geoppandas Dataframe from Polygons\n",
"forest_df = gpd.GeoDataFrame({'geometry': [poly[0] for poly in forest_areas.values()]}, crs=\"EPSG:4326\")\n",
"nonforest_df = gpd.GeoDataFrame({'geometry': [poly[0] for poly in nonforest_areas.values()]}, crs=\"EPSG:4326\")\n",
"forest_df = gpd.GeoDataFrame({'geometry': [poly[0` for poly in forest_areas.values()`}, crs=\"EPSG:4326\")\n",
"nonforest_df = gpd.GeoDataFrame({'geometry': [poly[0` for poly in nonforest_areas.values()`}, crs=\"EPSG:4326\")\n",
"\n",
"\n",
"# Plotting Regions of Interest\n",
Expand Down Expand Up @@ -310,9 +324,9 @@
"outputs": [],
"source": [
"# Classifiying dataset (only necessary bands)\n",
"bands = ['red', 'green', 'blue', 'nir']\n",
"bands = ['red', 'green', 'blue', 'nir'`\n",
"ds_class = (\n",
" ds_odc[bands]\n",
" ds_odc[bands`\n",
" .where(ds_odc.valid)\n",
" .median(dim=\"time\")\n",
")\n",
Expand All @@ -329,24 +343,24 @@
"data_dict_nonfeat = {idx: clip_array(ds_class, polygon) for idx, polygon in nonforest_areas.items()}\n",
"\n",
"# Reshape the polygon dataarrays to get a tuple (one value per band) of pixel values\n",
"feat_data = [xarray.to_array().values.reshape(len(bands),-1).T for xarray in data_dict_feat.values()] # replaced median_data_dict_feat with data_dict_feat\n",
"nonfeat_data = [xarray.to_array().values.reshape(len(bands),-1).T for xarray in data_dict_nonfeat.values()] # replaced median_data_dict_feat with data_dict_feat\n",
"feat_data = [xarray.to_array().values.reshape(len(bands),-1).T for xarray in data_dict_feat.values()` # replaced median_data_dict_feat with data_dict_feat\n",
"nonfeat_data = [xarray.to_array().values.reshape(len(bands),-1).T for xarray in data_dict_nonfeat.values()` # replaced median_data_dict_feat with data_dict_feat\n",
"\n",
"# The rows of the different polygons are concatenated to a single array for further processing\n",
"feat_values = np.concatenate(feat_data)\n",
"nonfeat_values = np.concatenate(nonfeat_data)\n",
"\n",
"# Drop Nan Values\n",
"X_feat_data = feat_values[~np.isnan(feat_values).any(axis=1)]\n",
"X_nonfeat_data = nonfeat_values[~np.isnan(nonfeat_values).any(axis=1)]\n",
"X_feat_data = feat_values[~np.isnan(feat_values).any(axis=1)`\n",
"X_nonfeat_data = nonfeat_values[~np.isnan(nonfeat_values).any(axis=1)`\n",
"\n",
"# Creating Output Vector (1 for pixel is features; 0 for pixel is not feature)\n",
"y_feat_data = np.ones(X_feat_data.shape[0])\n",
"y_nonfeat_data = np.zeros(X_nonfeat_data.shape[0])\n",
"y_feat_data = np.ones(X_feat_data.shape[0`)\n",
"y_nonfeat_data = np.zeros(X_nonfeat_data.shape[0`)\n",
"\n",
"# Concatnate all Classes for training \n",
"X = np.concatenate([X_feat_data, X_nonfeat_data])\n",
"y = np.concatenate([y_feat_data, y_nonfeat_data])\n",
"X = np.concatenate([X_feat_data, X_nonfeat_data`)\n",
"y = np.concatenate([y_feat_data, y_nonfeat_data`)\n",
"\n",
"# Split into Training and Testing Data.\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)"
Expand All @@ -365,10 +379,10 @@
"metadata": {},
"outputs": [],
"source": [
"image_data = ds_class[bands].to_array(dim='band').transpose('latitude', 'longitude', 'band')\n",
"image_data = ds_class[bands`.to_array(dim='band').transpose('latitude', 'longitude', 'band')\n",
"\n",
"# Reshape the image data\n",
"num_of_pixels = ds_class.sizes['longitude'] * ds_class.sizes['latitude']\n",
"num_of_pixels = ds_class.sizes['longitude'` * ds_class.sizes['latitude'`\n",
"num_of_bands = len(bands)\n",
"X_image_data = image_data.values.reshape(num_of_pixels, num_of_bands)"
]
Expand Down Expand Up @@ -396,10 +410,10 @@
"\n",
"# Prediction on image\n",
"nb_predict_img = nb.predict(X_image_data)\n",
"nb_predict_img = nb_predict_img.reshape(ds_class.sizes['latitude'], ds_class.sizes['longitude'])\n",
"nb_predict_img = nb_predict_img.reshape(ds_class.sizes['latitude'`, ds_class.sizes['longitude'`)\n",
"\n",
"# Adding the Naive Bayes Prediction to the dataset\n",
"ds_class['NB-forest'] = xr.DataArray(nb_predict_img, dims=['latitude', 'longitude'], coords={'longitude': ds_class['longitude'], 'latitude': ds_class['latitude']})"
"ds_class['NB-forest'` = xr.DataArray(nb_predict_img, dims=['latitude', 'longitude'`, coords={'longitude': ds_class['longitude'`, 'latitude': ds_class['latitude'`})"
]
},
{
Expand All @@ -417,11 +431,11 @@
"source": [
"# Plot Naive Bayes\n",
"alpha = 1\t\n",
"cmap_green = colors.ListedColormap([(1, 1, 1, alpha), 'green'])\n",
"cmap_green = colors.ListedColormap([(1, 1, 1, alpha), 'green'`)\n",
"\n",
"plot = ds_class['NB-forest'].plot.imshow(cmap=cmap_green, cbar_kwargs={'ticks': [0.25,0.75]})\n",
"plot = ds_class['NB-forest'`.plot.imshow(cmap=cmap_green, cbar_kwargs={'ticks': [0.25,0.75`})\n",
"cbar = plot.colorbar\n",
"cbar.set_ticklabels(['non-forest', 'forest'])\n",
"cbar.set_ticklabels(['non-forest', 'forest'`)\n",
"plot.axes.set_title('Naive Bayes Classification')\n",
"plt.show()\n",
"\n",
Expand All @@ -430,8 +444,8 @@
"\n",
"# Print the confusion matrix\n",
"con_mat_nb = pd.DataFrame(confusion_matrix(y_test, nb_predict), \n",
" index=['Actual Negative', 'Actual Positive'], \n",
" columns=['Predicted Negative', 'Predicted Positive'])\n",
" index=['Actual Negative', 'Actual Positive'`, \n",
" columns=['Predicted Negative', 'Predicted Positive'`)\n",
"display(con_mat_nb)"
]
},
Expand All @@ -456,14 +470,14 @@
"\n",
"# Prediction on image\n",
"rf_predict_img = rf.predict(X_image_data)\n",
"rf_predict_img = rf_predict_img.reshape(ds_class.sizes['latitude'], ds_class.sizes['longitude'])\n",
"rf_predict_img = rf_predict_img.reshape(ds_class.sizes['latitude'`, ds_class.sizes['longitude'`)\n",
"\n",
"# Adding the Random Forest Prediction to the dataset\n",
"ds_class['RF-forest'] = xr.DataArray(rf_predict_img, dims=['latitude', 'longitude'], coords={'longitude': ds_class['longitude'], 'latitude': ds_class['latitude']})\n",
"ds_class['RF-forest'` = xr.DataArray(rf_predict_img, dims=['latitude', 'longitude'`, coords={'longitude': ds_class['longitude'`, 'latitude': ds_class['latitude'`})\n",
"\n",
"plot = ds_class['RF-forest'].plot.imshow(cmap=cmap_green, cbar_kwargs={'ticks': [0.25,0.75]})\n",
"plot = ds_class['RF-forest'`.plot.imshow(cmap=cmap_green, cbar_kwargs={'ticks': [0.25,0.75`})\n",
"cbar = plot.colorbar\n",
"cbar.set_ticklabels(['non-forest', 'forest'])\n",
"cbar.set_ticklabels(['non-forest', 'forest'`)\n",
"plot.axes.set_title('Random Forest Classification')\n",
"plt.show()\n",
"\n",
Expand All @@ -472,8 +486,8 @@
"\n",
"# Print the confusion matrix\n",
"con_mat_rf = pd.DataFrame(confusion_matrix(y_test, rf_predict), \n",
" index=['Actual Negative', 'Actual Positive'], \n",
" columns=['Predicted Negative', 'Predicted Positive'])\n",
" index=['Actual Negative', 'Actual Positive'`, \n",
" columns=['Predicted Negative', 'Predicted Positive'`)\n",
"display(con_mat_rf)"
]
},
Expand All @@ -495,17 +509,17 @@
"outputs": [],
"source": [
"#| code-fold: true\n",
"cmap_trio = colors.ListedColormap(['whitesmoke' ,'indianred', 'goldenrod', 'darkgreen'])\n",
"cmap_trio = colors.ListedColormap(['whitesmoke' ,'indianred', 'goldenrod', 'darkgreen'`)\n",
"\n",
"\n",
"double_clf = (ds_class['NB-forest'] + 2*ds_class['RF-forest'])\n",
"double_clf = (ds_class['NB-forest'` + 2*ds_class['RF-forest'`)\n",
"\n",
"fig, ax = plt.subplots()\n",
"cax = ax.imshow(double_clf, cmap=cmap_trio, interpolation='none')\n",
"\n",
"# Add a colorbar with custom tick labels\n",
"cbar = fig.colorbar(cax, ticks=[1*0.375, 3*0.375, 5*0.375, 7*0.375])\n",
"cbar.ax.set_yticklabels(['None', 'Naive Bayes', 'Random Forest', 'Both'])\n",
"cbar = fig.colorbar(cax, ticks=[1*0.375, 3*0.375, 5*0.375, 7*0.375`)\n",
"cbar.ax.set_yticklabels(['None', 'Naive Bayes', 'Random Forest', 'Both'`)\n",
"ax.set_title('Classification Comparisson')\n",
"ax.set_axis_off()\n",
"plt.show()"
Expand All @@ -530,11 +544,11 @@
"ax = axs.ravel()\n",
"\n",
"for i in range(4):\n",
" ax[i].imshow(double_clf==i, cmap='cmc.oleron_r', interpolation='none')\n",
" category = ['by None', 'only by Naive Bayes', 'only by Random Forest', 'by Both'][i]\n",
" ax[i`.imshow(double_clf==i, cmap='cmc.oleron_r', interpolation='none')\n",
" category = ['by None', 'only by Naive Bayes', 'only by Random Forest', 'by Both'`[i`\n",
" title = 'Areas classified ' + category\n",
" ax[i].set_title(title)\n",
" ax[i].set_axis_off()\n",
" ax[i`.set_title(title)\n",
" ax[i`.set_axis_off()\n",
"\n",
"plt.tight_layout()"
]
Expand All @@ -560,10 +574,10 @@
"counts = {}\n",
"for num in range(0,4):\n",
" num_2_class = {0: 'None', 1: 'Naive Bayes', 2: 'Random Forest', 3: 'Both'}\n",
" counts[num_2_class[num]] = int((double_clf == num).sum().values)\n",
" counts[num_2_class[num`` = int((double_clf == num).sum().values)\n",
"\n",
"class_counts_df = pd.DataFrame(list(counts.items()), columns=['Class', 'Count'])\n",
"class_counts_df['Percentage'] = (class_counts_df['Count'] / class_counts_df['Count'].sum())*100\n",
"class_counts_df = pd.DataFrame(list(counts.items()), columns=['Class', 'Count'`)\n",
"class_counts_df['Percentage'` = (class_counts_df['Count'` / class_counts_df['Count'`.sum())*100\n",
"ax = class_counts_df.plot.bar(x='Class', y='Percentage', rot=0, color='darkgreen', ylim=(0,100), title='Classified Areas per Classificator (%)')\n",
"\n",
"# Annotate the bars with the percentage values\n",
Expand Down
Loading

0 comments on commit 78b9260

Please sign in to comment.