forked from mjevans26/Satellite_ComputerVision
Commit a5adc95: Implement seasonal training data in UNET_G4G_2019_solar.ipynb
root committed May 6, 2020
1 parent: 3229d3e
Showing 1 changed file with 1 addition and 1 deletion.
@@ -1 +1 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"UNET_G4G_2019_solar.ipynb","provenance":[],"private_outputs":true,"collapsed_sections":[],"toc_visible":true,"machine_shape":"hm"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"view-in-github","colab_type":"text"},"source":["<a href=\"https://colab.research.google.com/github/mjevans26/Satellite_ComputerVision/blob/master/UNET_G4G_2019_solar.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"]},{"cell_type":"code","metadata":{"id":"esIMGVxhDI0f","colab_type":"code","colab":{}},"source":["#@title Author: Michael Evans { display-mode: \"form\" }\n","# Licensed under the Apache License, Version 2.0 (the \"License\");\n","# you may not use this file except in compliance with the License.\n","# You may obtain a copy of the License at\n","#\n","# https://www.apache.org/licenses/LICENSE-2.0\n","#\n","# Unless required by applicable law or agreed to in writing, software\n","# distributed under the License is distributed on an \"AS IS\" BASIS,\n","# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n","# See the License for the specific language governing permissions and\n","# limitations under the License."],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_SHAc5qbiR8l","colab_type":"text"},"source":["# Introduction\n","\n","This notebook demonstrates methods used to delineate all of the ground-mounted solar arrays in North Carolina using free satellite imagery. Our workflow generates and exports satellite imagery data from Google Earth Engine for analysis in Tensorflow. This analysis predicts the probability of the presence of a solar array as a function of the visible, infrared, and near infrared bands in Sentinel-2 imagery. The model is a [fully convolutional neural network (FCNN)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf), specifically [U-net](https://arxiv.org/abs/1505.04597). This relatively simple model is a mostly unmodified version of [this example](https://github.com/tensorflow/models/blob/master/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb) from the TensorFlow docs. This notebook shows:\n","\n","1. Exporting training/testing patches from Earth Engine, suitable for training an FCNN model.\n","2. Preprocessing.\n","3. Training and validating an FCNN model.\n","4. Making predictions with the trained model and importing them to Earth Engine."]},{"cell_type":"markdown","metadata":{"id":"_MJ4kW1pEhwP","colab_type":"text"},"source":["# Setup software libraries\n","\n","Install needed libraries to the notebook VM. 
Authenticate as necessary."]},{"cell_type":"code","metadata":{"id":"neIa46CpciXq","colab_type":"code","colab":{}},"source":["# Cloud authentication.\n","from google.colab import auth\n","auth.authenticate_user()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"4D6ArFWrckmS","colab_type":"code","colab":{}},"source":["#@title Earth Engine install to notebook VM.\n","!pip install earthengine-api"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"jat01FEoUMqg","colab_type":"code","colab":{}},"source":["# Import, authenticate and initialize the Earth Engine library.\n","import ee\n","ee.Authenticate()\n","ee.Initialize()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"8RnZzcYhcpsQ","colab_type":"code","colab":{}},"source":["# Tensorflow setup.\n","import tensorflow as tf\n","device_name = tf.test.gpu_device_name()\n","tf.executing_eagerly()\n","print(tf.__version__)\n","print(device_name)\n","%load_ext tensorboard"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"n1hFdpBQfyhN","colab_type":"code","colab":{}},"source":["# Folium setup.\n","import folium\n","print(folium.__version__)\n","\n","# Define a method for displaying Earth Engine image tiles to a folium map.\n","def add_ee_layer(self, ee_image_object, vis_params, name):\n"," map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)\n"," folium.raster_layers.TileLayer(\n"," tiles = map_id_dict['tile_fetcher'].url_format,\n"," attr = \"Map Data © Google Earth Engine\",\n"," name = name,\n"," overlay = True,\n"," control = True\n"," ).add_to(self)\n","\n","# Add EE drawing method to folium.\n","folium.Map.add_ee_layer = add_ee_layer\n","\n","# Define the URL format used for Earth Engine generated map tiles.\n","#EE_TILES = 'https://earthengine.googleapis.com/map/{mapid}/{{z}}/{{x}}/{{y}}?token={token}'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"WjUgYcsAs9Ed","colab_type":"text"},"source":["##Mount Google Drive"]},{"cell_type":"code","metadata":{"id":"JKDKpX4FtQA1","colab_type":"code","colab":{}},"source":["from google.colab import drive\n","drive.mount('/content/drive')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OyLDVLD7VKsM","colab_type":"code","colab":{}},"source":["# move into repo directory containing modules we need\n","%cd '/content/drive/My Drive/repos/Satellite_ComputerVision'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"iT8ycmzClYwf","colab_type":"text"},"source":["# Variables\n","\n","Declare the variables that will be in use throughout the notebook."]},{"cell_type":"markdown","metadata":{"id":"qKs6HuxOzjMl","colab_type":"text"},"source":["Specify a cloud storage bucket to which you have read/write access"]},{"cell_type":"code","metadata":{"id":"obDDH1eDzsch","colab_type":"code","colab":{}},"source":["from os.path import join\n","BUCKET = 'cvod-203614-mlengine'\n","BUCKET_PATH = join('gs://', BUCKET)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"wmfKLl9XcnGJ","colab_type":"text"},"source":["## Set other global variables"]},{"cell_type":"code","metadata":{"id":"psz7wJKalaoj","colab_type":"code","colab":{}},"source":["# Specify names locations for outputs in Cloud Storage. 
\n","FOLDER = 'NC_solar'\n","PRED_BASE = 'data/predict'\n","TRAIN_BASE = 'data/training'\n","EVAL_BASE = 'data/eval'\n","MODEL_BASE = 'models/UNET256'\n","log_dir = 'drive/My Drive/Tensorflow/models/UNET256'\n","\n","# Specify inputs (Sentinel bands) to the model and the response variable.\n","opticalBands = ['B2', 'B3', 'B4']\n","thermalBands = ['B8', 'B11', 'B12']\n","# We may want to run some experiments where we use pca components\n","pcaBands = ['pc1', 'pc2', 'pc3']\n","BANDS = opticalBands + thermalBands# + pcaBands\n","RESPONSE = 'landcover'\n","FEATURES = BANDS + [RESPONSE]\n","SCENEID = 'SENSING_ORBIT_NUMBER'\n","\n","# Specify the size and shape of patches expected by the model.\n","KERNEL_SIZE = 256\n","KERNEL_SHAPE = [KERNEL_SIZE, KERNEL_SIZE]\n","COLUMNS = [\n"," tf.io.FixedLenFeature(shape=KERNEL_SHAPE, dtype=tf.float32) for k in FEATURES\n","]\n","FEATURES_DICT = dict(zip(FEATURES, COLUMNS))\n","\n","# Sizes of the training and evaluation datasets.\n","TRAIN_SIZE = 7700\n","EVAL_SIZE = 3300\n","\n","# Specify model training parameters.\n","BATCH_SIZE = 16\n","EPOCHS = 20\n","BUFFER_SIZE = 11000\n","OPTIMIZER = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)\n","LOSS = 'binary_crossentropy'\n","METRICS = [tf.keras.metrics.categorical_accuracy, tf.keras.metrics.MeanIoU(num_classes=2)]"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"hgoDc7Hilfc4","colab_type":"text"},"source":["# Imagery\n","\n","Process the imagery to use for predictor variables. This is a three-month, cloud-free, Sentinel-2 composite corresponding to the latest date from which we have confirmed training data. Display it in the notebook for a sanity check."]},{"cell_type":"markdown","metadata":{"id":"MjNmEImcGuMb","colab_type":"text"},"source":["## Create sample image"]},{"cell_type":"code","metadata":{"id":"c12hxNU2S89-","colab_type":"code","colab":{}},"source":["# add the Google Drive repo directory to path so we can use our modules\n","import sys\n","sys.path.append('/content/drive/My Drive/repos/Satellite_ComputerVision/utils')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-IlgXu-vcUEY","colab_type":"code","colab":{}},"source":["from clouds import basicQA\n","# Use Sentinel-2 surface reflectance data.\n","S2 = ee.ImageCollection(\"COPERNICUS/S2\")\n","# Grab a feature corresponding to our study area - North Carolina\n","states = ee.FeatureCollection(\"TIGER/2016/States\")\n","nc = states.filter(ee.Filter.eq('NAME', 'North Carolina'))\n","begin = '2019-01-01'\n","end = '2019-04-30'\n","\n","# The image input collection is cloud-masked.\n","filtered = S2.filterDate(begin, end)\\\n",".filterBounds(nc)\\\n",".filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))\\\n",".map(basicQA)\n","\n","# Create a simple median composite to visualize\n","image = filtered.median().select(BANDS).clip(nc)\n","\n","# Use folium to visualize the imagery.\n","#mapid = image.getMapId({'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 0.3})\n","rgbParams = {'bands': ['B4', 'B3', 'B2'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","nirParams = {'bands': ['B8', 'B11', 'B12'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","map = folium.Map(location=[35.402, -78.376])\n","map.add_ee_layer(image, rgbParams, 'Color')\n","map.add_ee_layer(image, nirParams, 'Thermal')\n","\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"gHznnctkJsZJ","colab_type":"text"},"source":["Prepare the response variable. 
This is the footprints of ground mounted solar arrays as of 2016, coded into a background class [0] and a target class [1]. Display on the map to verify."]},{"cell_type":"code","metadata":{"id":"5Wxz9BPYHBwh","colab_type":"code","colab":{}},"source":["def set_landcover(ft):\n"," \"\"\"\n"," Add a property to a feature and set it to 1\n"," Parameters:\n"," ft (ee.Feature): feature to have property added\n"," Returns:\n"," ee.Feature: input feature with 'label' property set to 1\n"," \"\"\"\n"," return ft.set('label', 1)\n","\n","# Get solar footprints data from our GEE Asset\n","NC_solar_footprints = ee.FeatureCollection(\"users/taylorminich/NC_solar_footprints\")\n","# Label each polygon with property 'label' equal to 1\n","NC_solar_footprints = NC_solar_footprints.map(set_landcover)\n","# Create an image with all pixels equal to 0\n","blankimg = ee.Image.constant(0)\n","# Convert solar footprints to an image (band value will be 1 based on 'label')\n","solar_footprint = NC_solar_footprints.reduceToImage(['label'], ee.Reducer.first())\n","# Convert pixels of blank image to 1 where the values of the footprint image are 1\n","# and rename to 'landcover'\n","labelimg = blankimg.where(solar_footprint, solar_footprint).rename('landcover')\n","\n","solarParams = {'bands': 'landcover', 'min':0, 'max': 1}\n","\n","map = folium.Map(location = [35.402, -78.376])\n","map.add_ee_layer(labelimg, solarParams, 'Solar footprint')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"F4djSxBRG2el","colab_type":"text"},"source":["Use some pre-made geometries to sample the stack in strategic locations. We constrain sampling to occur within 10km of mapped solar arrays. Because our target features are small and sparse, relative to the landscape, we also guide sampling based on their centroids to ensure that we get training data for solar arrays."]},{"cell_type":"code","metadata":{"id":"ure_WaD0itQY","colab_type":"code","colab":{}},"source":["def buff(ft):\n"," return ft.buffer(10000)\n","\n","def centroid(ft):\n"," return ft.centroid()\n","\n","centroids = NC_solar_footprints.map(centroid)\n","studyArea = NC_solar_footprints.map(buff).union()\n","studyImage = ee.Image(0).byte().paint(studyArea, 1)\n","studyImage = studyImage.updateMask(studyImage)\n","\n","aoiParams = {'min':0, 'max': 1, 'palette': ['red']}\n","map = folium.Map(location=[35.402, -78.376], zoom_start=8)\n","map.add_ee_layer(studyImage, aoiParams, 'Sampling area')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BRHnejNf6SyM","colab_type":"text"},"source":["### Calibration (experimental)\n","For consistency in predictive ability across contexts, we calibrate all images in the collection to a standard, then normalize the image bands to [0,1] after squashing extreme values where the sensor was likely 'washed out'"]},{"cell_type":"code","metadata":{"id":"IdTBDdggTJrp","colab_type":"code","colab":{}},"source":["from calibration import equalize_collection, clamp_and_scale"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"wCECkm2xdrTp","colab_type":"code","colab":{}},"source":["# calibrate all scenes in the collection using histogram equalization\n","equalized = equalize_collection(filtered, BANDS, SCENEID)\n","\n","# need to cast all images in resulting collection to same type for \n","equalImage = equalized.cast(dict(zip(BANDS, ['float']*6)), 
BANDS).median()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"6iDwJS-9ToXf","colab_type":"code","colab":{}},"source":["# normalize the calibrated image to [0,1]\n","normImage = clamp_and_scale(equalImage, BANDS, 99, nc)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"CTS7_ZzPDhhg","colab_type":"text"},"source":["Stack the normalized sentinel composite and binary solar indicator image to create a single image from which samples can be taken. Convert the image into an array image in which each pixel stores 256x256 patches of pixels for each band. This is a key step that bears emphasis: to export training patches, convert a multi-band image to [an array image](https://developers.google.com/earth-engine/arrays_array_images#array-images) using [`neighborhoodToArray()`](https://developers.google.com/earth-engine/api_docs#eeimageneighborhoodtoarray), then sample the image at points."]},{"cell_type":"code","metadata":{"id":"eGHYsdAOipa4","colab_type":"code","colab":{}},"source":["featureStack = ee.Image.cat([\n"," image.select(BANDS),\n"," labelimg.select(RESPONSE)\n","]).float()\n","\n","list = ee.List.repeat(1, KERNEL_SIZE)\n","lists = ee.List.repeat(list, KERNEL_SIZE)\n","kernel = ee.Kernel.fixed(KERNEL_SIZE, KERNEL_SIZE, lists)\n","\n","arrays = featureStack.neighborhoodToArray(kernel)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ZV890gPHeZqz","colab_type":"text"},"source":["# Sampling\n","\n","The mapped data look reasonable so take a sample from each polygon and merge the results into a single export. The key step is sampling the array image at points, to get all the pixels in a 256x256 neighborhood at each point. It's worth noting that to build the training and testing data for the FCNN, you export a single TFRecord file that contains patches of pixel values in each record. You do NOT need to export each training/testing patch to a different image. Since each record potentially contains a lot of data (especially with big patches or many input bands), some manual sharding of the computation is necessary to avoid the `computed value too large` error. 
Specifically, the following code takes multiple (smaller) samples within each geometry, merging the results to get a single export."]},{"cell_type":"code","metadata":{"id":"1T1cc6haU_oS","colab_type":"code","colab":{}},"source":["join(BUCKET_PATH, FOLDER, TRAIN_BASE, 'calibrated/')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"2CqL0Y6iLQPP","colab_type":"code","colab":{}},"source":["!gsutil mv {join(BUCKET_PATH, FOLDER, TRAINING_BASE, '*')} {join(BUCKET_PATH, FOLDER, TRAINING_BASE, 'calibrated/')}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"VXes-Ot17RGI","colab_type":"code","colab":{}},"source":["!gsutil ls gs://cvod-203614-mlengine/NC_solar/data/predict"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"aJ4nGSvdYop6","colab_type":"text"},"source":["First we'll collect image patches from the centroids of known solar array locations"]},{"cell_type":"code","metadata":{"id":"FyRpvwENxE-A","colab_type":"code","cellView":"code","colab":{}},"source":["#@title Centroids slicing\n","# Get samples from delineated features using slice() on a feature collection\n","# THIS TAKES DAYS TO RUN...probably not the optimal\n","\n","# Add a random column to the centroids\n","centroids = centroids.randomColumn('random')\n","centroidList = centroids.toLis()\n","\n","S = centroidList.size().getInfo()\n","x = 0\n","\n","while x < S:\n"," region = centroidList.slice(x, x+10)\n"," sample = arrays.sampleRegions(\n"," collection = region,\n"," scale = 10,\n"," properties = ['label'],\n"," tileScale = 12\n"," )\n"," x += 10\n"," \n"," # assign a random number to samples and create a 70/30 train/test split\n"," sample = sample.randomColumn('random')\n"," training = sample.filter(ee.Filter.gte('random', 0.3))\n"," testing = sample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_trainCent' + str(x)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAINING_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_evalCent' + str(x)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"YoJMncFKYwq2","colab_type":"code","colab":{}},"source":["#@title Centroids random sampling\n","\n","# Define sample sizes for shards and chunks. 
\n","# These numbers determined experimentally.\n","n = 100 # Number of shards in each chunk.\n","N = 200 # Total sample size in each chunk.\n","C = 5 # Number of chunks\n","\n","iterator = iter(range(N*C))\n","\n","# for each 'chunk' - which defines 2 export tasks per chunk: 1 train, 1 eval\n","for c in range(C):\n"," geomSample = ee.FeatureCollection([])\n","\n"," # for each 'shard' - which defines a batch of samples of size N/n\n"," for i in range(n):\n"," # generate a different seed for this iteration\n"," seed = next(iterator)\n"," sample = arrays.sample(\n"," region = NC_solar_footprints,\n"," scale = 10,\n"," numPixels = N/n,\n"," seed = seed,\n"," tileScale = 8\n"," )\n"," geomSample = geomSample.merge(sample)\n","\n"," #divide samples into training and evaluation data\n"," geomSample = geomSample.randomColumn('random')\n"," training = geomSample.filter(ee.Filter.gte('random', 0.3))\n"," testing = geomSample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_footprintTrain'+str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAINING_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_footprintEval' + str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start() "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"QuRyLGmOYmrR","colab_type":"code","colab":{}},"source":["#@title Random sampling\n","\n","# Define sample sizes for shards and chunks. 
\n","# These numbers determined experimentally.\n","n = 100 # Number of shards in each chunk.\n","N = 1000 # Total sample size in each chunk.\n","C = 10# Number of chunks\n","\n","iterator = iter(range(N*C))\n","\n","for c in range(C):\n"," geomSample = ee.FeatureCollection([])\n","\n"," for i in range(n):\n"," seed = next(iterator)\n"," sample = arrays.sample(\n"," region = studyArea,\n"," scale = 10,\n"," numPixels = N/n,\n"," seed = seed,\n"," tileScale = 8\n"," )\n"," geomSample = geomSample.merge(sample)\n","\n"," #divide samples into training and evaluation data\n"," geomSample = geomSample.randomColumn('random')\n"," training = geomSample.filter(ee.Filter.gte('random', 0.3))\n"," testing = geomSample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_train'+str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAIN_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_eval' + str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start() "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Nj1sFkUyYgnj","colab_type":"code","colab":{}},"source":[""],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"dk51-l7MH2Sa","colab_type":"text"},"source":["# Model data"]},{"cell_type":"markdown","metadata":{"id":"cC-zatoXNdlO","colab_type":"text"},"source":["## Preprocessing\n","Define functions that apply random manipulations to our training data"]},{"cell_type":"code","metadata":{"id":"ajyp48-vINuy","colab_type":"code","colab":{}},"source":["def augColor(x):\n"," \"\"\"Color augmentation\n","\n"," Args:\n"," x: Image\n","\n"," Returns:\n"," Augmented image\n"," \"\"\"\n"," x = tf.image.random_hue(x, 0.08)\n"," x = tf.image.random_saturation(x, 0.6, 1.6)\n"," x = tf.image.random_brightness(x, 0.05)\n"," x = tf.image.random_contrast(x, 0.7, 1.3)\n"," return x\n"," \n","def standard(x, axes=[0, 1, 2], epsilon=1e-8):\n"," \"\"\"\n"," Standardize incoming image patches by local mean and variance\n"," Parameters:\n"," x (tensor): nD image tensor\n"," axes (array): Array of ints. 
Axes along which to compute mean and variance, usually length n-1\n"," epsilon (float): small number to avoid dividing by zero\n"," Return:\n"," tensor: nD image tensor normalized by channels\n"," \"\"\"\n"," mean, variance = tf.nn.moments(x, axes=axes)\n"," x_normed = (x - mean) / tf.sqrt(variance + epsilon) # epsilon to avoid dividing by zero\n"," return x_normed\n","\n","def augImg(img):\n"," outDims = tf.shape(img)[0:1]\n"," x = tf.image.random_flip_left_right(img)\n"," x = tf.image.random_flip_up_down(x)\n"," x = rotated = tf.image.rot90(x, tf.random.uniform(shape=[], minval=0, maxval=4, dtype=tf.int32))\n"," #x = zoom(x, outDims)\n"," #since were gonna map_fn this on a 4d image, output must be 3d, so squeeze the artificial 'sample' dimension\n"," return tf.squeeze(x)\n","\n","def preprocess(img, labels):\n"," dims = tf.shape(img)\n"," #need to combine labels and bands for morphological transformations\n"," comb = tf.concat([img, tf.expand_dims(labels, axis = 2)], axis = 2)\n"," aug = aug_img(comb)\n"," #aug = tf.map_fn(fn = aug_img, elems = comb)\n"," labels = tf.squeeze(aug[:, :, -1])\n"," band_stack = color(aug[:, :, 0:dims[2]])\n"," return band_stack, labels"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"rWXrvBE4607G","colab_type":"text"},"source":["# Training data\n","\n","Load the data exported from Earth Engine into a `tf.data.Dataset`. The following are helper functions for that."]},{"cell_type":"code","metadata":{"id":"WWZ0UXCVMyJP","colab_type":"code","colab":{}},"source":["def parse_tfrecord(example_proto):\n"," \"\"\"The parsing function.\n"," Read a serialized example into the structure defined by FEATURES_DICT.\n"," Args:\n"," example_proto: a serialized Example.\n"," Returns: \n"," A dictionary of tensors, keyed by feature name.\n"," \"\"\"\n"," return tf.io.parse_single_example(example_proto, FEATURES_DICT)\n","\n","\n","def to_tuple(inputs):\n"," \"\"\"Function to convert a dictionary of tensors to a tuple of (inputs, outputs).\n"," Turn the tensors returned by parse_tfrecord into a stack in HWC shape.\n"," Args:\n"," inputs: A dictionary of tensors, keyed by feature name.\n"," Returns: \n"," A dtuple of (inputs, outputs).\n"," \"\"\"\n"," inputsList = [inputs.get(key) for key in FEATURES]\n"," stacked = tf.stack(inputsList, axis=0)\n"," # Convert from CHW to HWC\n"," stacked = tf.transpose(stacked, [1, 2, 0])\n"," # Perform image augmentation\n"," stacked = augImg(stacked)\n"," # split input bands and labels\n"," bands = stacked[:,:,:len(BANDS)]\n"," labels = stacked[:,:,len(BANDS):]\n"," # standardize each patch of bands\n"," bands = standard(bands, [0, 1])\n"," return bands, labels \n","\n","def get_dataset(pattern):\n"," \"\"\"Function to read, parse and format to tuple a set of input tfrecord files.\n"," Get all the files matching the pattern, parse and convert to tuple.\n"," Args:\n"," pattern: A file pattern to match in a Cloud Storage bucket.\n"," Returns: \n"," A tf.data.Dataset\n"," \"\"\"\n"," glob = tf.io.gfile.glob(pattern)\n"," dataset = tf.data.TFRecordDataset(glob, compression_type='GZIP')\n"," dataset = dataset.map(parse_tfrecord, num_parallel_calls=5)\n"," dataset = dataset.map(to_tuple, num_parallel_calls=5)\n"," return dataset"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Xg1fa18336D2","colab_type":"text"},"source":["Use the helpers to read in the training dataset. 
Print the first record to check."]},{"cell_type":"code","metadata":{"id":"rm0qRF0fAYcC","colab_type":"code","colab":{}},"source":["def get_training_dataset(pattern):\n","\t\"\"\"Get the preprocessed training dataset\n","\tParameters:\n","\t\tpattern (str): directory path to training data\n"," Returns: \n"," A tf.data.Dataset of training data.\n"," \"\"\"\n","\t#glob = 'gs://' + BUCKET + '/' + FOLDER + '/' + TRAINING_BASE + '/*'\n","\tdataset = get_dataset(pattern)\n","\tdataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()\n","\treturn dataset"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"bk9rFou0J_dZ","colab_type":"code","colab":{}},"source":["# make sure we have training records\n","!gsutil ls {join(BUCKET_PATH, FOLDER, TRAIN_BASE, '*.tfrecord.gz')}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"JzpG3kUwZ9J5","colab_type":"code","colab":{}},"source":["training = get_training_dataset(join(BUCKET_PATH, FOLDER, TRAIN_BASE, '*.tfrecord.gz'))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"cIueW4_Fs0ID","colab_type":"code","colab":{}},"source":["#check to make sure our records look like we expect\n","print(iter(training.take(1)).next())"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"j-cQO5RL6vob","colab_type":"text"},"source":["# Evaluation data\n","\n","Now do the same thing to get an evaluation dataset. Note that unlike the training dataset, the evaluation dataset has a batch size of 1, is not repeated and is not shuffled."]},{"cell_type":"code","metadata":{"id":"ieKTCGiJ6xzo","colab_type":"code","colab":{}},"source":["def get_eval_dataset(pattern):\n","\t\"\"\"\n","\tGet the preprocessed evaluation dataset\n","\tParameters:\n","\t\tpattern (str): directory path to training data\n"," Returns: \n"," A tf.data.Dataset of evaluation data.\n"," \"\"\"\n","\t#glob = 'gs://' + BUCKET + '/' + FOLDER + '/' + EVAL_BASE + '/*'\n","\tdataset = get_dataset(pattern)\n","\tdataset = dataset.batch(1).repeat()\n","\treturn dataset"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"fkU1JcYlK1s3","colab_type":"code","colab":{}},"source":["# make sure we have eval data\n","path = join(BUCKET_PATH, FOLDER, EVAL_BASE, '*.tfrecord.gz')\n","!gsutil ls {path}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"NpcsljQeKzq7","colab_type":"code","colab":{}},"source":["evaluation = get_eval_dataset(path)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9JIE7Yl87lgU","colab_type":"text"},"source":["# Model\n","\n","Here we use the Keras implementation of the U-Net model as found [in the TensorFlow examples](https://github.com/tensorflow/models/blob/master/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb). The U-Net model takes 256x256 pixel patches as input and outputs per-pixel class probability, label or a continuous output. We can implement the model essentially unmodified, but will use mean squared error loss on the sigmoidal output since we are treating this as a regression problem, rather than a classification problem. 
Since impervious surface fraction is constrained to [0,1], with many values close to zero or one, a saturating activation function is suitable here."]},{"cell_type":"markdown","metadata":{"id":"Xh2EZyyPu84H","colab_type":"text"},"source":["##Metrics"]},{"cell_type":"markdown","metadata":{"id":"HK6BKW_xMNqL","colab_type":"text"},"source":["We define a weighted binary cross entropy loss function because the training data is potentially sparse. This also gives us greater control over the rates of omission and commission prediciton errors. Because this is an image segmentation exercise, we may also be interested in the intersection over union as a loss measure."]},{"cell_type":"code","metadata":{"id":"wsnnnz56yS3l","colab_type":"code","colab":{}},"source":["from tensorflow.python.keras import layers\n","from tensorflow.python.keras import losses\n","from tensorflow.python.keras import models\n","from tensorflow.python.keras import metrics\n","from tensorflow.python.keras import optimizers\n","\n","def weighted_bce(y_true, y_pred):\n"," \"\"\"\n"," Compute the weighted binary cross entropy between predictions and observations\n"," Parameters:\n"," y_true (): 2D tensor of labels\n"," y_pred (): 2D tensor of probabilities\n"," weight (int): weighting factor for positive examples\n"," Returns:\n"," 2D tensor\n"," \"\"\"\n"," bce = tf.nn.weighted_cross_entropy_with_logits(labels = y_true, logits = y_pred, pos_weight = 20)\n"," return tf.reduce_mean(bce)\n","\n","def iou(true, pred):\n"," \"\"\"\n"," Calcaulate the intersection over union metric\n"," \"\"\"\n"," intersection = true * pred\n","\n"," notTrue = 1 - true\n"," union = true + (notTrue * pred)\n","\n"," return tf.reduce_sum(intersection)/tf.reduce_sum(union)\n","\n","def conv_block(input_tensor, num_filters):\n","\tencoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)\n","\tencoder = layers.BatchNormalization()(encoder)\n","\tencoder = layers.Activation('relu')(encoder)\n","\tencoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)\n","\tencoder = layers.BatchNormalization()(encoder)\n","\tencoder = layers.Activation('relu')(encoder)\n","\treturn encoder\n","\n","def encoder_block(input_tensor, num_filters):\n","\tencoder = conv_block(input_tensor, num_filters)\n","\tencoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)\n","\treturn encoder_pool, encoder\n","\n","def decoder_block(input_tensor, concat_tensor, num_filters):\n","\tdecoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)\n","\tdecoder = layers.concatenate([concat_tensor, decoder], axis=-1)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\tdecoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\tdecoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\treturn decoder\n","\n","def get_model():\n","\tinputs = layers.Input(shape=[None, None, len(BANDS)])\n","\tencoder0_pool, encoder0 = encoder_block(inputs, 32)\n","\tencoder1_pool, encoder1 = encoder_block(encoder0_pool, 64)\n","\tencoder2_pool, encoder2 = encoder_block(encoder1_pool, 128)\n","\tencoder3_pool, encoder3 = encoder_block(encoder2_pool, 256)\n","\tencoder4_pool, encoder4 = encoder_block(encoder3_pool, 512)\n","\tcenter = 
conv_block(encoder4_pool, 1024)# center\n","\tdecoder4 = decoder_block(center, encoder4, 512)\n","\tdecoder3 = decoder_block(decoder4, encoder3, 256)\n","\tdecoder2 = decoder_block(decoder3, encoder2, 128)\n","\tdecoder1 = decoder_block(decoder2, encoder1, 64)\n","\tdecoder0 = decoder_block(decoder1, encoder0, 32)\n","\toutputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(decoder0)\n","\n","\tmodel = models.Model(inputs=[inputs], outputs=[outputs])\n","\n","\tmodel.compile(\n","\t\toptimizer=OPTIMIZER, \n"," loss = weighted_bce,\n","\t\t#loss=losses.get(LOSS),\n","\t\tmetrics=[metrics.get(metric) for metric in METRICS])\n","\n","\treturn model\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"PA2gJENE8-J1","colab_type":"code","colab":{}},"source":["# set up tensorboard and checkpoint callbacks\n","log_dir = 'drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated'\n","\n","tensorboard = tf.keras.callbacks.TensorBoard(log_dir= log_dir)\n","\n","checkpoint = tf.keras.callbacks.ModelCheckpoint(\n"," join(log_dir, 'best_weights.hdf5'),\n"," monitor='val_mean_io_u_2',\n"," verbose=1,\n"," save_best_only=True,\n"," mode='max'\n"," )"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uu_E7OTDBCoS","colab_type":"text"},"source":["# Training the model\n","\n","You train a Keras model by calling `.fit()` on it. Here we're going to train for 10 epochs, which is suitable for demonstration purposes. For production use, you probably want to optimize this parameter, for example through [hyperparamter tuning](https://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning)."]},{"cell_type":"code","metadata":{"id":"NzzaWxOhSxBy","colab_type":"code","colab":{}},"source":["m = get_model()\n","\n","m.fit(\n"," x=training, \n"," epochs=EPOCHS, \n"," steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), \n"," validation_data=evaluation,\n"," validation_steps=int(EVAL_SIZE/BATCH_SIZE),\n"," callbacks = [checkpoint, tensorboard]\n"," )\n","\n","#We save the model definition and weights to google drive (free) \n","m.save('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zvIqqpNXqJSE","colab_type":"text"},"source":["##Train from checkpoints\n","If we want to resume or continue training from a previous checkpoint we load the model and best weights from GDrive, check the current accuracy on the evaluation data, and resume training."]},{"cell_type":"code","metadata":{"id":"q0xgBhsaqInV","colab_type":"code","colab":{}},"source":["#bring in the architecture and best weights from Drive\n","m = models.load_model('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/UNET256.h5', custom_objects={'weighted_bce': weighted_bce})\n","m.load_weights('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/best_weights.hdf5') "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"umZy0rBzs1Th","colab_type":"code","colab":{}},"source":["#lets see where were at\n","evalMetrics = m.evaluate(x=evaluation, steps = EVAL_SIZE, verbose = 1)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"xlsFciElxOUA","colab_type":"code","colab":{}},"source":["#set the monitored value (val_mean_io_u) to current evaluation output\n","checkpoint = tf.keras.callbacks.ModelCheckpoint(\n"," log_dir+'best_weights.hdf5',\n"," monitor='val_mean_io_u',\n"," verbose=1,\n"," save_best_only=True,\n"," mode='max'\n"," 
)\n","\n","checkpoint.best = evalMetrics[2]\n","print(checkpoint.__dict__)\n","print(checkpoint.best)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Ty8wCxDtqWBM","colab_type":"code","colab":{}},"source":["#Now keep training!\n","m.fit(\n"," x=training, \n"," epochs= 10, \n"," steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), \n"," validation_data=evaluation,\n"," validation_steps=EVAL_SIZE/BATCH_SIZE,\n"," callbacks = [checkpoint, tensorboard]\n"," )\n","#m.save('drive/My Drive/Tensorflow/models/UNET256/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"tyhWcGHJ82e8","colab_type":"code","colab":{}},"source":["m.save('drive/My Drive/Tensorflow/models/UNET256/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"i9OM5BiS1xYQ","colab_type":"code","colab":{}},"source":["%tensorboard --logdir 'drive/My Drive/Tensorflow/models/UNET256'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J1ySNup0xCqN","colab_type":"text"},"source":["# Prediction\n","\n","The prediction pipeline is:\n","\n","1. Export imagery on which to do predictions from Earth Engine in TFRecord format to a Cloud Storge bucket.\n","2. Use the trained model to make the predictions.\n","3. Write the predictions to a TFRecord file in a Cloud Storage.\n","4. Upload the predictions TFRecord file to Earth Engine.\n","\n","The following functions handle this process. It's useful to separate the export from the predictions so that you can experiment with different models without running the export every time."]},{"cell_type":"code","metadata":{"id":"lv6nb0ShH4_T","colab_type":"code","colab":{}},"source":["#Inspect the prediction outputs\n","predictions = m.predict(evaluation, steps=1, verbose=1)\n","for prediction in predictions:\n"," print(predictions)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"M3WDAa-RUpXP","colab_type":"code","colab":{}},"source":["def doExport(image, path, out_image_base, kernel_buffer, region):\n"," \"\"\"\n"," Run an image export task on which to run predictions. Block until complete.\n"," Parameters:\n"," image (ee.Image): image to be exported for prediction\n"," path (str): google cloud directory path for export\n"," out_image_base (str): base filename of exported image\n"," kernel_buffer (array<int>): pixels to buffer the prediction patch. 
half added to each side\n"," region (ee.Geometry):\n"," \"\"\"\n"," task = ee.batch.Export.image.toCloudStorage(\n"," image = image.select(BANDS), \n"," description = out_image_base, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(path, out_image_base),\n"," region = region.getInfo()['coordinates'], \n"," scale = 10, \n"," fileFormat = 'TFRecord', \n"," maxPixels = 1e13,\n"," formatOptions = { \n"," 'patchDimensions': KERNEL_SHAPE,\n"," 'kernelSize': kernel_buffer,\n"," 'compressed': True,\n"," 'maxFileSize': 104857600\n"," }\n"," )\n"," task.start()\n","\n"," # Block until the task completes.\n"," print('Running image export to Cloud Storage...')\n"," import time\n"," while task.active():\n"," time.sleep(30)\n","\n"," # Error condition\n"," if task.status()['state'] != 'COMPLETED':\n"," print('Error with image export.')\n"," else:\n"," print('Image export completed.')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"zb_9_FflygVw","colab_type":"code","colab":{}},"source":["def doPrediction(pred_path, pred_image_base, user_folder, out_image_base, kernel_buffer, region):\n"," \"\"\"\n"," Perform inference on exported imagery, upload to Earth Engine.\n"," Parameters:\n"," pred_path (str): Google cloud (or Drive) path storing prediction image files\n"," pred_image_base (str):\n"," user_folder (str): \n"," out_image_base (str): base filename for \n"," kernel_buffer (Array<int>): length 2 array \n"," region (ee.Geometry)):\n"," \"\"\"\n","\n"," print('Looking for TFRecord files...')\n"," \n"," # Get a list of all the files in the output bucket.\n"," filesList = !gsutil ls {join(BUCKET_PATH, pred_path)}\n"," # Get only the files generated by the image export.\n"," exportFilesList = [s for s in filesList if pred_image_base in s]\n","\n"," # Get the list of image files and the JSON mixer file.\n"," imageFilesList = []\n"," jsonFile = None\n"," for f in exportFilesList:\n"," if f.endswith('.tfrecord.gz'):\n"," imageFilesList.append(f)\n"," elif f.endswith('.json'):\n"," jsonFile = f\n","\n"," # Make sure the files are in the right order.\n"," imageFilesList.sort()\n","\n"," from pprint import pprint\n"," pprint(imageFilesList)\n"," print(jsonFile)\n"," \n"," import json\n"," # Load the contents of the mixer file to a JSON object.\n"," jsonText = !gsutil cat {jsonFile}\n"," # Get a single string w/ newlines from the IPython.utils.text.SList\n"," mixer = json.loads(jsonText.nlstr)\n"," pprint(mixer)\n"," patches = mixer['totalPatches']\n"," \n"," # Get set up for prediction.\n"," x_buffer = int(kernel_buffer[0] / 2)\n"," y_buffer = int(kernel_buffer[1] / 2)\n","\n"," buffered_shape = [\n"," KERNEL_SHAPE[0] + kernel_buffer[0],\n"," KERNEL_SHAPE[1] + kernel_buffer[1]]\n","\n"," imageColumns = [\n"," tf.io.FixedLenFeature(shape=buffered_shape, dtype=tf.float32) \n"," for k in BANDS\n"," ]\n","\n"," imageFeaturesDict = dict(zip(BANDS, imageColumns))\n","\n"," def parse_image(example_proto):\n"," return tf.io.parse_single_example(example_proto, imageFeaturesDict)\n","\n"," def toTupleImage(dic):\n"," inputsList = [dic.get(key) for key in BANDS]\n"," stacked = tf.stack(inputsList, axis=0)\n"," stacked = tf.transpose(stacked, [1, 2, 0])\n"," return stacked\n"," \n"," # Create a dataset from the TFRecord file(s) in Cloud Storage.\n"," imageDataset = tf.data.TFRecordDataset(imageFilesList, compression_type='GZIP')\n"," imageDataset = imageDataset.map(parse_image, num_parallel_calls=5)\n"," imageDataset = imageDataset.map(toTupleImage).batch(1)\n"," \n"," # Perform inference.\n"," 
print('Running predictions...')\n"," predictions = m.predict(imageDataset, steps=patches, verbose=1)\n"," # print(predictions[0])\n","\n"," print('Writing predictions...')\n"," out_image_file = join(BUCKET_DIR, pred_path, 'outputs', out_image_base, '.TFRecord')\n"," writer = tf.io.TFRecordWriter(out_image_file)\n"," patches = 0\n"," for predictionPatch in predictions:\n"," print('Writing patch ' + str(patches) + '...')\n"," predictionPatch = predictionPatch[\n"," x_buffer:x_buffer+KERNEL_SIZE, y_buffer:y_buffer+KERNEL_SIZE]\n","\n"," # Create an example.\n"," example = tf.train.Example(\n"," features=tf.train.Features(\n"," feature={\n"," 'probability': tf.train.Feature(\n"," float_list=tf.train.FloatList(\n"," value=predictionPatch.flatten()))\n"," }\n"," )\n"," )\n"," # Write the example.\n"," writer.write(example.SerializeToString())\n"," patches += 1\n","\n"," writer.close()\n","\n"," # Start the upload.\n"," out_image_asset = user_folder + '/' + out_image_base\n"," !earthengine upload image --asset_id={out_image_asset} {out_image_file} {jsonFile}"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"LZqlymOehnQO","colab_type":"text"},"source":["Now there's all the code needed to run the prediction pipeline, all that remains is to specify the output region in which to do the prediction, the names of the output files, where to put them, and the shape of the outputs. In terms of the shape, the model is trained on 256x256 patches, but can work (in theory) on any patch that's big enough with even dimensions ([reference](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf)). Because of tile boundary artifacts, give the model slightly larger patches for prediction, then clip out the middle 256x256 patch. This is controlled with a kernel buffer, half the size of which will extend beyond the kernel buffer. For example, specifying a 128x128 kernel will append 64 pixels on each side of the patch, to ensure that the pixels in the output are taken from inputs completely covered by the kernel. 
"]},{"cell_type":"markdown","metadata":{"id":"G9UaJxPS3uZw","colab_type":"text"},"source":["### Test images"]},{"cell_type":"code","metadata":{"id":"BqDRwb6j27w-","colab_type":"code","colab":{}},"source":["# create several small aois to test predictions\n","test_aoi_1 = ee.Geometry.Polygon(\n"," [[[-78.19610376358034, 35.086989862385884],\n"," [-78.19610376358034, 34.735631502732396],\n"," [-77.67974634170534, 34.735631502732396],\n"," [-77.67974634170534, 35.086989862385884]]], None, False)\n","test_aoi_2 = ee.Geometry.Polygon(\n"," [[[-81.59087915420534, 35.84308746418702],\n"," [-81.59087915420534, 35.47711130797561],\n"," [-81.03057641983034, 35.47711130797561],\n"," [-81.03057641983034, 35.84308746418702]]], None, False)\n","test_aoi_3 = ee.Geometry.Polygon(\n"," [[[-78.74447677513596, 36.4941960586897],\n"," [-78.74447677513596, 36.17115435938789],\n"," [-78.21713302513596, 36.17115435938789],\n"," [-78.21713302513596, 36.4941960586897]]], None, False)\n","test_aoi_4 = ee.Geometry.Polygon(\n"," [[[-76.62411544701096, 36.33505523381603],\n"," [-76.62411544701096, 36.03800955668766],\n"," [-76.16818282982346, 36.03800955668766],\n"," [-76.16818282982346, 36.33505523381603]]], None, False)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"CQyLfPdt3TcA","colab_type":"code","colab":{}},"source":["# Create a different test image\n","S2 = ee.ImageCollection(\"COPERNICUS/S2\")\n","# Grab a feature corresponding to our study area - North Carolina\n","states = ee.FeatureCollection(\"TIGER/2016/States\")\n","nc = states.filter(ee.Filter.eq('NAME', 'North Carolina'))\n","begin = '2018-05-01'\n","end = '2018-08-30'\n","\n","# The image input collection is cloud-masked.\n","filtered = S2.filterDate(begin, end)\\\n",".filterBounds(nc)\\\n",".filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))\\\n",".map(basicQA)\n","\n","# Create a simple median composite to visualize\n","test = filtered.median().select(BANDS).clip(test_aoi_4)\n","\n","# Use folium to visualize the imagery.\n","#mapid = image.getMapId({'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 0.3})\n","rgbParams = {'bands': ['B4', 'B3', 'B2'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","nirParams = {'bands': ['B8', 'B11', 'B12'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","map = folium.Map(location=[35.402, -78.376])\n","map.add_ee_layer(test, rgbParams, 'Color')\n","map.add_ee_layer(test, nirParams, 'Thermal')\n","\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"FPANwc7B1-TS","colab_type":"code","colab":{}},"source":["# Choose the GEE folder in which to ingest prediction image:\n","user_folder = 'users/defendersofwildlifeGIS/NC_solar'\n","\n","# prediction path\n","nc_path = join(FOLDER, PRED_BASE)\n","# Base file name to use for TFRecord files and assets. 
The name structure includes:\n","# the image processing used ['raw', 'calibrated', 'normalized'], the model\n","nc_image_base = 'raw_unet256_testpred4'\n","# Half this will extend on the sides of each patch.\n","nc_kernel_buffer = [128, 128]\n","# NC\n","nc_region = test_aoi_4"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"lLNEOLkXWvSi","colab_type":"code","cellView":"both","colab":{}},"source":["# Run the export.\n","doExport(test, nc_path, nc_image_base, nc_kernel_buffer, nc_region)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"KxACnxKFrQ_J","colab_type":"code","cellView":"both","colab":{}},"source":["# Run the prediction.\n","doPrediction(nc_path, nc_image_base, user_folder, 'raw_unet256_20_test4', nc_kernel_buffer, nc_region)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uj_G9OZ1xH6K","colab_type":"text"},"source":["# Display the output\n","\n","One the data has been exported, the model has made predictions and the predictions have been written to a file, and the image imported to Earth Engine, it's possible to display the resultant Earth Engine asset. Here, display the impervious area predictions over Beijing, China."]},{"cell_type":"code","metadata":{"id":"Jgco6HJ4R5p2","colab_type":"code","colab":{}},"source":["out_image = ee.Image(user_folder + '/' + nc_image_base)\n","mapid = out_image.getMapId({'min': 0, 'max': 1})\n","map = folium.Map(location=[39.898, 116.5097])\n","map.add_ee_layer(out_image, {'min': 0, 'max': 1}, 'solar predictions')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]}]} | ||
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"UNET_G4G_2019_solar.ipynb","provenance":[],"private_outputs":true,"collapsed_sections":[],"toc_visible":true,"machine_shape":"hm"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"view-in-github","colab_type":"text"},"source":["<a href=\"https://colab.research.google.com/github/mjevans26/Satellite_ComputerVision/blob/master/UNET_G4G_2019_solar.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"]},{"cell_type":"code","metadata":{"id":"esIMGVxhDI0f","colab_type":"code","colab":{}},"source":["#@title Author: Michael Evans { display-mode: \"form\" }\n","# Licensed under the Apache License, Version 2.0 (the \"License\");\n","# you may not use this file except in compliance with the License.\n","# You may obtain a copy of the License at\n","#\n","# https://www.apache.org/licenses/LICENSE-2.0\n","#\n","# Unless required by applicable law or agreed to in writing, software\n","# distributed under the License is distributed on an \"AS IS\" BASIS,\n","# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n","# See the License for the specific language governing permissions and\n","# limitations under the License."],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_SHAc5qbiR8l","colab_type":"text"},"source":["# Introduction\n","\n","This notebook demonstrates methods used to delineate all of the ground-mounted solar arrays in North Carolina using free satellite imagery. Our workflow generates and exports satellite imagery data from Google Earth Engine for analysis in Tensorflow. This analysis predicts the probability of the presence of a solar array as a function of the visible, infrared, and near infrared bands in Sentinel-2 imagery. The model is a [fully convolutional neural network (FCNN)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf), specifically [U-net](https://arxiv.org/abs/1505.04597). This relatively simple model is a mostly unmodified version of [this example](https://github.com/tensorflow/models/blob/master/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb) from the TensorFlow docs. This notebook shows:\n","\n","1. Exporting training/testing patches from Earth Engine, suitable for training an FCNN model.\n","2. Preprocessing.\n","3. Training and validating an FCNN model.\n","4. Making predictions with the trained model and importing them to Earth Engine."]},{"cell_type":"markdown","metadata":{"id":"_MJ4kW1pEhwP","colab_type":"text"},"source":["# Setup software libraries\n","\n","Install needed libraries to the notebook VM. 
Authenticate as necessary."]},{"cell_type":"code","metadata":{"id":"neIa46CpciXq","colab_type":"code","colab":{}},"source":["# Cloud authentication.\n","from google.colab import auth\n","auth.authenticate_user()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"4D6ArFWrckmS","colab_type":"code","colab":{}},"source":["#@title Earth Engine install to notebook VM.\n","!pip install earthengine-api --upgrade"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"jat01FEoUMqg","colab_type":"code","colab":{}},"source":["# Import, authenticate and initialize the Earth Engine library.\n","import ee\n","ee.Authenticate()\n","ee.Initialize()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"8RnZzcYhcpsQ","colab_type":"code","colab":{}},"source":["# Tensorflow setup.\n","import tensorflow as tf\n","device_name = tf.test.gpu_device_name()\n","tf.executing_eagerly()\n","print(tf.__version__)\n","print(device_name)\n","%load_ext tensorboard"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"n1hFdpBQfyhN","colab_type":"code","colab":{}},"source":["# Folium setup.\n","import folium\n","print(folium.__version__)\n","\n","# Define a method for displaying Earth Engine image tiles to a folium map.\n","def add_ee_layer(self, ee_image_object, vis_params, name):\n"," map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)\n"," folium.raster_layers.TileLayer(\n"," tiles = map_id_dict['tile_fetcher'].url_format,\n"," attr = \"Map Data © Google Earth Engine\",\n"," name = name,\n"," overlay = True,\n"," control = True\n"," ).add_to(self)\n","\n","# Add EE drawing method to folium.\n","folium.Map.add_ee_layer = add_ee_layer\n","\n","# Define the URL format used for Earth Engine generated map tiles.\n","#EE_TILES = 'https://earthengine.googleapis.com/map/{mapid}/{{z}}/{{x}}/{{y}}?token={token}'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"WjUgYcsAs9Ed","colab_type":"text"},"source":["##Mount Google Drive"]},{"cell_type":"code","metadata":{"id":"JKDKpX4FtQA1","colab_type":"code","colab":{}},"source":["from google.colab import drive\n","drive.mount('/content/drive')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OyLDVLD7VKsM","colab_type":"code","colab":{}},"source":["# move into repo directory containing modules we need\n","%cd '/content/drive/My Drive/repos/Satellite_ComputerVision'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"iT8ycmzClYwf","colab_type":"text"},"source":["# Variables\n","\n","Declare the variables that will be in use throughout the notebook."]},{"cell_type":"markdown","metadata":{"id":"qKs6HuxOzjMl","colab_type":"text"},"source":["Specify a cloud storage bucket to which you have read/write access"]},{"cell_type":"code","metadata":{"id":"obDDH1eDzsch","colab_type":"code","colab":{}},"source":["from os.path import join\n","BUCKET = 'cvod-203614-mlengine'\n","BUCKET_PATH = join('gs://', BUCKET)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"wmfKLl9XcnGJ","colab_type":"text"},"source":["## Set other global variables"]},{"cell_type":"code","metadata":{"id":"psz7wJKalaoj","colab_type":"code","colab":{}},"source":["# Specify names locations for outputs in Cloud Storage. 
\n","FOLDER = 'NC_solar'\n","PRED_BASE = 'data/predict'\n","TRAIN_BASE = 'data/training'\n","EVAL_BASE = 'data/eval'\n","MODEL_BASE = 'models/UNET256'\n","log_dir = 'drive/My Drive/Tensorflow/models/UNET256'\n","\n","# Specify inputs (Sentinel bands) to the model and the response variable.\n","opticalBands = ['B2', 'B3', 'B4']\n","thermalBands = ['B8', 'B11', 'B12']\n","# We may want to run some experiments where we use pca components\n","pcaBands = ['pc1', 'pc2', 'pc3']\n","BANDS = opticalBands + thermalBands# + pcaBands\n","RESPONSE = 'landcover'\n","FEATURES = BANDS + [RESPONSE]\n","SCENEID = 'SENSING_ORBIT_NUMBER'\n","\n","# Specify the size and shape of patches expected by the model.\n","KERNEL_SIZE = 256\n","KERNEL_SHAPE = [KERNEL_SIZE, KERNEL_SIZE]\n","COLUMNS = [\n"," tf.io.FixedLenFeature(shape=KERNEL_SHAPE, dtype=tf.float32) for k in FEATURES\n","]\n","FEATURES_DICT = dict(zip(FEATURES, COLUMNS))\n","\n","# Sizes of the training and evaluation datasets.\n","TRAIN_SIZE = 7700\n","EVAL_SIZE = 3300\n","\n","# Specify model training parameters.\n","BATCH_SIZE = 16\n","EPOCHS = 20\n","BUFFER_SIZE = 11000\n","OPTIMIZER = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)\n","LOSS = 'binary_crossentropy'\n","METRICS = [tf.keras.metrics.categorical_accuracy, tf.keras.metrics.MeanIoU(num_classes=2)]"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"hgoDc7Hilfc4","colab_type":"text"},"source":["# Imagery\n","\n","Process the imagery to use for predictor variables. This is a three-month, cloud-free, Sentinel-2 composite corresponding to the latest date from which we have confirmed training data. Display it in the notebook for a sanity check."]},{"cell_type":"markdown","metadata":{"id":"MjNmEImcGuMb","colab_type":"text"},"source":["## Create sample image"]},{"cell_type":"code","metadata":{"id":"c12hxNU2S89-","colab_type":"code","colab":{}},"source":["# add the Google Drive repo directory to path so we can use our modules\n","import sys\n","sys.path.append('/content/drive/My Drive/repos/Satellite_ComputerVision/utils')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-IlgXu-vcUEY","colab_type":"code","colab":{}},"source":["from clouds import basicQA\n","# Use Sentinel-2 surface reflectance data.\n","S2 = ee.ImageCollection(\"COPERNICUS/S2\")\n","# Grab a feature corresponding to our study area - North Carolina\n","states = ee.FeatureCollection(\"TIGER/2016/States\")\n","nc = states.filter(ee.Filter.eq('NAME', 'North Carolina'))\n","begin = '2019-01-01'\n","end = '2020-03-01'\n","\n","# The image input collection is cloud-masked.\n","filtered = S2.filterDate(begin, end)\\\n",".filterBounds(nc)\\\n",".filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))\n","\n","\n","# Create a simple median composite to visualize\n","winter = filtered.filterDate('2019-12-01', '2020-02-28').map(basicQA).median().select(BANDS).clip(nc)\n","spring = filtered.filterDate('2019-03-01', '2019-05-31').map(basicQA).median().select(BANDS).clip(nc)\n","summer = filtered.filterDate('2019-06-01', '2019-08-31').map(basicQA).median().select(BANDS).clip(nc)\n","fall = filtered.filterDate('2019-09-01', '2019-11-30').map(basicQA).median().select(BANDS).clip(nc)\n","\n","# Use folium to visualize the imagery.\n","#mapid = image.getMapId({'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 0.3})\n","rgbParams = {'bands': ['B4', 'B3', 'B2'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","nirParams = {'bands': ['B8', 'B11', 'B12'],\n"," 'min': 0,\n"," 'max': 
0.3}\n","\n","map = folium.Map(location=[35.402, -78.376])\n","map.add_ee_layer(spring, rgbParams, 'Color')\n","map.add_ee_layer(spring, nirParams, 'Thermal')\n","\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"gHznnctkJsZJ","colab_type":"text"},"source":["Prepare the response variable. This is the footprints of ground mounted solar arrays as of 2016, coded into a background class [0] and a target class [1]. Display on the map to verify."]},{"cell_type":"code","metadata":{"id":"5Wxz9BPYHBwh","colab_type":"code","colab":{}},"source":["def set_landcover(ft):\n"," \"\"\"\n"," Add a property to a feature and set it to 1\n"," Parameters:\n"," ft (ee.Feature): feature to have property added\n"," Returns:\n"," ee.Feature: input feature with 'label' property set to 1\n"," \"\"\"\n"," return ft.set('label', 1)\n","\n","# Get solar footprints data from our GEE Asset\n","NC_solar_footprints = ee.FeatureCollection(\"users/taylorminich/NC_solar_footprints\")\n","# Label each polygon with property 'label' equal to 1\n","NC_solar_footprints = NC_solar_footprints.map(set_landcover)\n","# Create an image with all pixels equal to 0\n","blankimg = ee.Image.constant(0)\n","# Convert solar footprints to an image (band value will be 1 based on 'label')\n","solar_footprint = NC_solar_footprints.reduceToImage(['label'], ee.Reducer.first())\n","# Convert pixels of blank image to 1 where the values of the footprint image are 1\n","# and rename to 'landcover'\n","labelimg = blankimg.where(solar_footprint, solar_footprint).rename('landcover')\n","\n","solarParams = {'bands': 'landcover', 'min':0, 'max': 1}\n","\n","map = folium.Map(location = [35.402, -78.376])\n","map.add_ee_layer(labelimg, solarParams, 'Solar footprint')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"F4djSxBRG2el","colab_type":"text"},"source":["Use some pre-made geometries to sample the stack in strategic locations. We constrain sampling to occur within 10km of mapped solar arrays. 
Because our target features are small and sparse relative to the landscape, we also guide sampling based on their centroids to ensure that we get training data for solar arrays."]},{"cell_type":"code","metadata":{"id":"ure_WaD0itQY","colab_type":"code","colab":{}},"source":["def buff(ft):\n"," return ft.buffer(10000)\n","\n","def centroid(ft):\n"," return ft.centroid()\n","\n","centroids = NC_solar_footprints.map(centroid)\n","studyArea = NC_solar_footprints.map(buff).union()\n","studyImage = ee.Image(0).byte().paint(studyArea, 1)\n","studyImage = studyImage.updateMask(studyImage)\n","\n","aoiParams = {'min':0, 'max': 1, 'palette': ['red']}\n","map = folium.Map(location=[35.402, -78.376], zoom_start=8)\n","map.add_ee_layer(studyImage, aoiParams, 'Sampling area')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BRHnejNf6SyM","colab_type":"text"},"source":["### Calibration (experimental)\n","For consistency in predictive ability across contexts, we calibrate all images in the collection to a common standard, then normalize the image bands to [0,1] after squashing extreme values where the sensor was likely 'washed out'."]},{"cell_type":"code","metadata":{"id":"IdTBDdggTJrp","colab_type":"code","colab":{}},"source":["from calibration import equalize_collection, clamp_and_scale"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"wCECkm2xdrTp","colab_type":"code","colab":{}},"source":["# calibrate all scenes in the collection using histogram equalization\n","equalized = equalize_collection(filtered, BANDS, SCENEID)\n","\n","# cast all images in the resulting collection to the same type before compositing\n","equalImage = equalized.cast(dict(zip(BANDS, ['float']*len(BANDS))), BANDS).median()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"6iDwJS-9ToXf","colab_type":"code","colab":{}},"source":["# normalize the calibrated image to [0,1]\n","normImage = clamp_and_scale(equalImage, BANDS, 99, nc)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"CTS7_ZzPDhhg","colab_type":"text"},"source":["Stack the Sentinel composite (here the summer composite; the calibrated, normalized image can be swapped in) and the binary solar indicator image to create a single image from which samples can be taken. Convert the image into an array image in which each pixel stores 256x256 patches of pixels for each band. This is a key step that bears emphasis: to export training patches, convert a multi-band image to [an array image](https://developers.google.com/earth-engine/arrays_array_images#array-images) using [`neighborhoodToArray()`](https://developers.google.com/earth-engine/api_docs#eeimageneighborhoodtoarray), then sample the image at points."]},{"cell_type":"code","metadata":{"id":"eGHYsdAOipa4","colab_type":"code","colab":{}},"source":["featureStack = ee.Image.cat([\n"," summer.select(BANDS),\n"," labelimg.select(RESPONSE)\n","]).float()\n","\n","ls = ee.List.repeat(1, KERNEL_SIZE)\n","lists = ee.List.repeat(ls, KERNEL_SIZE)\n","kernel = ee.Kernel.fixed(KERNEL_SIZE, KERNEL_SIZE, lists)\n","\n","arrays = featureStack.neighborhoodToArray(kernel)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ZV890gPHeZqz","colab_type":"text"},"source":["# Sampling\n","\n","The mapped data look reasonable, so take a sample from each polygon and merge the results into a single export. The key step is sampling the array image at points, to get all the pixels in a 256x256 neighborhood at each point. 
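Conceptually, each sampled record will hold one 256x256 array per band, plus the matching label patch. A minimal sketch of pulling a single record (toy point and buffer, purely illustrative):\n","\n","```python\n","pt = ee.Geometry.Point([-78.376, 35.402])\n","rec = arrays.sample(region = pt.buffer(1280), scale = 10, numPixels = 1).first()\n","# each property ('B2', ..., 'landcover') holds a 256x256 array of pixel values\n","```\n","\n","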
It's worth noting that to build the training and testing data for the FCNN, you export a single TFRecord file that contains patches of pixel values in each record. You do NOT need to export each training/testing patch to a different image. Since each record potentially contains a lot of data (especially with big patches or many input bands), some manual sharding of the computation is necessary to avoid the `computed value too large` error. Specifically, the following code takes multiple (smaller) samples within each geometry, merging the results to get a single export."]},{"cell_type":"code","metadata":{"id":"1T1cc6haU_oS","colab_type":"code","colab":{}},"source":["# preview the destination path for calibrated training data\n","join(BUCKET_PATH, FOLDER, TRAIN_BASE, 'calibrated/')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"2CqL0Y6iLQPP","colab_type":"code","colab":{}},"source":["!gsutil mv {join(BUCKET_PATH, FOLDER, TRAIN_BASE, '*')} {join(BUCKET_PATH, FOLDER, TRAIN_BASE, 'calibrated/')}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"VXes-Ot17RGI","colab_type":"code","colab":{}},"source":["!gsutil ls gs://cvod-203614-mlengine/NC_solar/data/predict"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"aJ4nGSvdYop6","colab_type":"text"},"source":["First we'll collect image patches from the centroids of known solar array locations."]},{"cell_type":"code","metadata":{"id":"FyRpvwENxE-A","colab_type":"code","cellView":"code","colab":{}},"source":["#@title Centroids slicing\n","# Get samples from delineated features using slice() on a feature collection\n","# THIS TAKES DAYS TO RUN...probably not the optimal approach\n","\n","# Add a random column to the centroids\n","centroids = centroids.randomColumn('random')\n","S = centroids.size().getInfo()\n","centroidList = centroids.toList(S)\n","\n","x = 0\n","\n","while x < S:\n"," # sampleRegions expects a FeatureCollection, not an ee.List\n"," region = ee.FeatureCollection(centroidList.slice(x, x+50))\n"," sample = arrays.sampleRegions(\n"," collection = region,\n"," scale = 10,\n"," tileScale = 12\n"," )\n"," x += 50\n"," \n"," # assign a random number to samples and create a 70/30 train/test split\n"," sample = sample.randomColumn('random')\n"," training = sample.filter(ee.Filter.gte('random', 0.3))\n"," testing = sample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_trainCentspring' + str(x)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAIN_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_evalCentspring' + str(x)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"YoJMncFKYwq2","colab_type":"code","colab":{}},"source":["#@title Centroids random sampling\n","\n","# Define sample sizes for shards and chunks. 
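Each chunk merges n = 100\n","# shard samples of N/n = 2 patches each into one train and one eval export,\n","# so the C chunks yield roughly N*C = 1,000 patches in total.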
\n","# These numbers determined experimentally.\n","n = 100 # Number of shards in each chunk.\n","N = 200 # Total sample size in each chunk.\n","C = 5 # Number of chunks\n","\n","iterator = iter(range(N*C))\n","\n","# for each 'chunk' - which defines 2 export tasks per chunk: 1 train, 1 eval\n","for c in range(C):\n"," geomSample = ee.FeatureCollection([])\n","\n"," # for each 'shard' - which defines a batch of samples of size N/n\n"," for i in range(n):\n"," # generate a different seed for this iteration\n"," seed = next(iterator)\n"," sample = arrays.sample(\n"," region = NC_solar_footprints,\n"," scale = 10,\n"," numPixels = N/n,\n"," seed = seed,\n"," tileScale = 8\n"," )\n"," geomSample = geomSample.merge(sample)\n","\n"," #divide samples into training and evaluation data\n"," geomSample = geomSample.randomColumn('random')\n"," training = geomSample.filter(ee.Filter.gte('random', 0.3))\n"," testing = geomSample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_footprintTrain'+str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAINING_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_footprintEval' + str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start() "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"QuRyLGmOYmrR","colab_type":"code","colab":{}},"source":["#@title Random sampling\n","\n","# Define sample sizes for shards and chunks. 
\n","# These numbers determined experimentally.\n","n = 100 # Number of shards in each chunk.\n","N = 1000 # Total sample size in each chunk.\n","C = 10# Number of chunks\n","\n","iterator = iter(range(N*C))\n","\n","for c in range(C):\n"," geomSample = ee.FeatureCollection([])\n","\n"," for i in range(n):\n"," seed = next(iterator)\n"," sample = arrays.sample(\n"," region = studyArea,\n"," scale = 10,\n"," numPixels = N/n,\n"," seed = seed,\n"," tileScale = 8\n"," )\n"," geomSample = geomSample.merge(sample)\n","\n"," #divide samples into training and evaluation data\n"," geomSample = geomSample.randomColumn('random')\n"," training = geomSample.filter(ee.Filter.gte('random', 0.3))\n"," testing = geomSample.filter(ee.Filter.lt('random', 0.3))\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_train'+str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = training,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, TRAIN_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start()\n","\n"," desc = 'UNET_' + str(KERNEL_SIZE) + '_eval' + str(c)\n"," task = ee.batch.Export.table.toCloudStorage(\n"," collection = testing,\n"," description = desc, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(FOLDER, EVAL_BASE, desc),\n"," fileFormat = 'TFRecord',\n"," selectors = BANDS + [RESPONSE]\n"," )\n"," task.start() "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Nj1sFkUyYgnj","colab_type":"code","colab":{}},"source":[""],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"dk51-l7MH2Sa","colab_type":"text"},"source":["# Model data"]},{"cell_type":"markdown","metadata":{"id":"cC-zatoXNdlO","colab_type":"text"},"source":["## Preprocessing\n","Define functions that apply random manipulations to our training data"]},{"cell_type":"code","metadata":{"id":"cwzvsqTavJFL","colab_type":"code","colab":{}},"source":["import numpy as np\n","test = np.array([[1,2,3],[5,6,2],[3,3,3]])\n","H,W = test.shape\n","print(H)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ajyp48-vINuy","colab_type":"code","colab":{}},"source":["import tensorflow_probability as tfp\n","\n","def augColor(x):\n"," \"\"\"Color augmentation\n","\n"," Args:\n"," x: Image\n","\n"," Returns:\n"," Augmented image\n"," \"\"\"\n"," x = tf.image.random_hue(x, 0.08)\n"," x = tf.image.random_saturation(x, 0.6, 1.6)\n"," x = tf.image.random_brightness(x, 0.05)\n"," x = tf.image.random_contrast(x, 0.7, 1.3)\n"," return x\n"," \n","def normalize(x, axes=[0, 1, 2], epsilon=1e-8):\n"," \"\"\"\n"," Standardize incoming image patches by local mean and variance\n"," Parameters:\n"," x (tensor): nD image tensor\n"," axes (array): Array of ints. 
\n"," epsilon (float): small number to avoid dividing by zero\n"," Return:\n"," tensor: nD image tensor normalized by channels\n"," \"\"\"\n"," mean, variance = tf.nn.moments(x, axes=axes)\n"," x_normed = (x - mean) / tf.sqrt(variance + epsilon) # epsilon to avoid dividing by zero\n"," return x_normed\n","\n","def standard(img, axes = [0, 1, 2]):\n"," # tf.shape returns a 1D tensor of dimensions, e.g. [256, 256, 6]\n"," dims = tf.shape(img)\n"," H = dims[0]\n"," W = dims[1]\n"," C = dims[2]\n"," ninetyninth = tfp.stats.percentile(img, 99, axis = axes, interpolation = 'lower')\n"," # create an HxWxC tensor holding the 99th percentile value per band\n"," maximum = tf.reshape(tf.repeat(ninetyninth, repeats = [H*W, H*W, H*W, H*W, H*W, H*W]), [H,W,C]) # one repeat count per band (assumes 6 bands)\n"," minimum = tf.reshape(tf.repeat([0.0], repeats = [H*W*C]), shape = (H, W, C))\n"," #maximum = tf.reshape(tf.repeat([100.0], repeats = [H*W*C]), shape = (H, W, C))\n"," clipped = tf.clip_by_value(img, clip_value_min = minimum, clip_value_max = maximum)\n"," scaled = tf.divide(tf.subtract(clipped, minimum), tf.subtract(maximum, minimum))\n"," return scaled\n","\n","def augImg(img):\n"," outDims = tf.shape(img)[0:1]\n"," x = tf.image.random_flip_left_right(img)\n"," x = tf.image.random_flip_up_down(x)\n"," x = tf.image.rot90(x, tf.random.uniform(shape=[], minval=0, maxval=4, dtype=tf.int32))\n"," #x = zoom(x, outDims)\n"," # since we may map_fn this over a 4d batch, output must be 3d, so squeeze the artificial 'sample' dimension\n"," return tf.squeeze(x)\n","\n","def preprocess(img, labels):\n"," dims = tf.shape(img)\n"," # need to combine labels and bands for morphological transformations\n"," comb = tf.concat([img, tf.expand_dims(labels, axis = 2)], axis = 2)\n"," aug = augImg(comb)\n"," #aug = tf.map_fn(fn = augImg, elems = comb)\n"," labels = tf.squeeze(aug[:, :, -1])\n"," band_stack = augColor(aug[:, :, 0:dims[2]]) # note: augColor assumes a 3-channel image\n"," return band_stack, labels"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"rWXrvBE4607G","colab_type":"text"},"source":["# Training data\n","\n","Load the data exported from Earth Engine into a `tf.data.Dataset`. 
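A single serialized example can be inspected by parsing it with the fixed-length feature spec defined above (illustrative sketch; the glob pattern is just an example):\n","\n","```python\n","files = tf.io.gfile.glob('gs://cvod-203614-mlengine/NC_solar/data/training/*.tfrecord.gz')\n","raw = next(iter(tf.data.TFRecordDataset(files, compression_type='GZIP')))\n","parsed = tf.io.parse_single_example(raw, FEATURES_DICT) # dict of 256x256 tensors\n","```\n","\n","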
The following are helper functions for that."]},{"cell_type":"code","metadata":{"id":"WWZ0UXCVMyJP","colab_type":"code","colab":{}},"source":["def parse_tfrecord(example_proto):\n"," \"\"\"The parsing function.\n"," Read a serialized example into the structure defined by FEATURES_DICT.\n"," Args:\n"," example_proto: a serialized Example.\n"," Returns: \n"," A dictionary of tensors, keyed by feature name.\n"," \"\"\"\n"," return tf.io.parse_single_example(example_proto, FEATURES_DICT)\n","\n","\n","def to_tuple(inputs):\n"," \"\"\"Function to convert a dictionary of tensors to a tuple of (inputs, outputs).\n"," Turn the tensors returned by parse_tfrecord into a stack in HWC shape.\n"," Args:\n"," inputs: A dictionary of tensors, keyed by feature name.\n"," Returns: \n"," A tuple of (inputs, outputs).\n"," \"\"\"\n"," inputsList = [inputs.get(key) for key in FEATURES]\n"," stacked = tf.stack(inputsList, axis=0)\n"," # Convert from CHW to HWC\n"," stacked = tf.transpose(stacked, [1, 2, 0])\n"," # Perform image augmentation\n"," # (note: this runs for every dataset built via get_dataset, including evaluation)\n"," stacked = augImg(stacked)\n"," # split input bands and labels\n"," bands = stacked[:,:,:len(BANDS)]\n"," labels = stacked[:,:,len(BANDS):]\n"," # standardize each patch of bands\n"," bands = standard(bands, [0, 1])\n"," return bands, labels \n","\n","def get_dataset(pattern):\n"," \"\"\"Function to read, parse and format to tuple a set of input tfrecord files.\n"," Get all the files matching the pattern, parse and convert to tuple.\n"," Args:\n"," pattern: A file pattern to match in a Cloud Storage bucket.\n"," Returns: \n"," A tf.data.Dataset\n"," \"\"\"\n"," glob = tf.io.gfile.glob(pattern)\n"," dataset = tf.data.TFRecordDataset(glob, compression_type='GZIP')\n"," dataset = dataset.map(parse_tfrecord, num_parallel_calls=5)\n"," dataset = dataset.map(to_tuple, num_parallel_calls=5)\n"," return dataset"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Xg1fa18336D2","colab_type":"text"},"source":["Use the helpers to read in the training dataset. 
Print the first record to check."]},{"cell_type":"code","metadata":{"id":"rm0qRF0fAYcC","colab_type":"code","colab":{}},"source":["def get_training_dataset(pattern):\n","\t\"\"\"Get the preprocessed training dataset\n","\tParameters:\n","\t\tpattern (str): glob pattern matching training TFRecord files\n"," Returns: \n"," A tf.data.Dataset of training data.\n"," \"\"\"\n","\t#glob = 'gs://' + BUCKET + '/' + FOLDER + '/' + TRAIN_BASE + '/*'\n","\tdataset = get_dataset(pattern)\n","\tdataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()\n","\treturn dataset"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"bk9rFou0J_dZ","colab_type":"code","colab":{}},"source":["# make sure we have training records\n","!gsutil ls {join(BUCKET_PATH, FOLDER, TRAIN_BASE, '*.tfrecord.gz')}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"JzpG3kUwZ9J5","colab_type":"code","colab":{}},"source":["training = get_training_dataset('gs://cvod-203614-mlengine/NC_solar/data/training/UNET_256_trainCentspring100.tfrecord.gz')\n","#training = get_training_dataset(join(BUCKET_PATH, FOLDER, TRAIN_BASE, '*winter500.tfrecord.gz'))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"cIueW4_Fs0ID","colab_type":"code","colab":{}},"source":["# check to make sure our records look like we expect\n","print(next(iter(training.take(1))))"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"j-cQO5RL6vob","colab_type":"text"},"source":["# Evaluation data\n","\n","Now do the same thing to get an evaluation dataset. Note that unlike the training dataset, the evaluation dataset has a batch size of 1 and is not shuffled; it is still repeated so that evaluation can run for a fixed number of steps."]},{"cell_type":"code","metadata":{"id":"ieKTCGiJ6xzo","colab_type":"code","colab":{}},"source":["def get_eval_dataset(pattern):\n","\t\"\"\"\n","\tGet the preprocessed evaluation dataset\n","\tParameters:\n","\t\tpattern (str): glob pattern matching evaluation TFRecord files\n"," Returns: \n"," A tf.data.Dataset of evaluation data.\n"," \"\"\"\n","\t#glob = 'gs://' + BUCKET + '/' + FOLDER + '/' + EVAL_BASE + '/*'\n","\tdataset = get_dataset(pattern)\n","\tdataset = dataset.batch(1).repeat()\n","\treturn dataset"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"fkU1JcYlK1s3","colab_type":"code","colab":{}},"source":["# make sure we have eval data\n","path = join(BUCKET_PATH, FOLDER, EVAL_BASE, '*.tfrecord.gz')\n","!gsutil ls {path}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"NpcsljQeKzq7","colab_type":"code","colab":{}},"source":["evaluation = get_eval_dataset(path)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9JIE7Yl87lgU","colab_type":"text"},"source":["# Model\n","\n","Here we use the Keras implementation of the U-Net model as found [in the TensorFlow examples](https://github.com/tensorflow/models/blob/master/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb). The U-Net model takes 256x256 pixel patches as input and outputs per-pixel class probabilities. We can implement the model essentially unmodified, but we use a weighted binary cross-entropy loss on the sigmoidal output 
because our target class (solar arrays) covers only a tiny fraction of most patches; an unweighted loss would be dominated by the background class."]},{"cell_type":"markdown","metadata":{"id":"Xh2EZyyPu84H","colab_type":"text"},"source":["##Metrics"]},{"cell_type":"markdown","metadata":{"id":"HK6BKW_xMNqL","colab_type":"text"},"source":["We define a weighted binary cross entropy loss function because the training data is potentially sparse. This also gives us greater control over the rates of omission and commission prediction errors: with a positive weight of 20 (used below), a missed solar pixel is penalized 20 times as heavily as a false positive. Because this is an image segmentation exercise, we may also be interested in the intersection over union as a loss measure."]},{"cell_type":"code","metadata":{"id":"wsnnnz56yS3l","colab_type":"code","colab":{}},"source":["from tensorflow.python.keras import layers\n","from tensorflow.python.keras import losses\n","from tensorflow.python.keras import models\n","from tensorflow.python.keras import metrics\n","from tensorflow.python.keras import optimizers\n","\n","def weighted_bce(y_true, y_pred):\n"," \"\"\"\n"," Compute the weighted binary cross entropy between predictions and observations\n"," Parameters:\n"," y_true (tensor): 2D tensor of labels\n"," y_pred (tensor): 2D tensor of probabilities\n"," Returns:\n"," scalar tensor: mean weighted binary cross entropy\n"," \"\"\"\n"," # NOTE: weighted_cross_entropy_with_logits expects raw logits; here it receives\n"," # sigmoid outputs, so this acts as a monotonic surrogate rather than exact BCE\n"," bce = tf.nn.weighted_cross_entropy_with_logits(labels = y_true, logits = y_pred, pos_weight = 20)\n"," return tf.reduce_mean(bce)\n","\n","def iou(true, pred):\n"," \"\"\"\n"," Calculate the intersection over union metric\n"," \"\"\"\n"," intersection = true * pred\n","\n"," notTrue = 1 - true\n"," union = true + (notTrue * pred)\n","\n"," return tf.reduce_sum(intersection)/tf.reduce_sum(union)\n","\n","def conv_block(input_tensor, num_filters):\n","\tencoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)\n","\tencoder = layers.BatchNormalization()(encoder)\n","\tencoder = layers.Activation('relu')(encoder)\n","\tencoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)\n","\tencoder = layers.BatchNormalization()(encoder)\n","\tencoder = layers.Activation('relu')(encoder)\n","\treturn encoder\n","\n","def encoder_block(input_tensor, num_filters):\n","\tencoder = conv_block(input_tensor, num_filters)\n","\tencoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)\n","\treturn encoder_pool, encoder\n","\n","def decoder_block(input_tensor, concat_tensor, num_filters):\n","\tdecoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)\n","\tdecoder = layers.concatenate([concat_tensor, decoder], axis=-1)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\tdecoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\tdecoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n","\tdecoder = layers.BatchNormalization()(decoder)\n","\tdecoder = layers.Activation('relu')(decoder)\n","\treturn decoder\n","\n","def get_model():\n","\tinputs = layers.Input(shape=[None, None, len(BANDS)])\n","\tencoder0_pool, encoder0 = encoder_block(inputs, 32)\n","\tencoder1_pool, encoder1 = encoder_block(encoder0_pool, 64)\n","\tencoder2_pool, encoder2 = encoder_block(encoder1_pool, 128)\n","\tencoder3_pool, encoder3 = encoder_block(encoder2_pool, 256)\n","\tencoder4_pool, encoder4 = encoder_block(encoder3_pool, 512)\n","\tcenter = 
conv_block(encoder4_pool, 1024) # center\n","\tdecoder4 = decoder_block(center, encoder4, 512)\n","\tdecoder3 = decoder_block(decoder4, encoder3, 256)\n","\tdecoder2 = decoder_block(decoder3, encoder2, 128)\n","\tdecoder1 = decoder_block(decoder2, encoder1, 64)\n","\tdecoder0 = decoder_block(decoder1, encoder0, 32)\n","\toutputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(decoder0)\n","\n","\tmodel = models.Model(inputs=[inputs], outputs=[outputs])\n","\n","\tmodel.compile(\n","\t\toptimizer=OPTIMIZER, \n"," loss = weighted_bce,\n","\t\t#loss=losses.get(LOSS),\n","\t\tmetrics=[metrics.get(metric) for metric in METRICS])\n","\n","\treturn model\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"PA2gJENE8-J1","colab_type":"code","colab":{}},"source":["# set up tensorboard and checkpoint callbacks\n","log_dir = 'drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated'\n","\n","tensorboard = tf.keras.callbacks.TensorBoard(log_dir= log_dir)\n","\n","checkpoint = tf.keras.callbacks.ModelCheckpoint(\n"," join(log_dir, 'best_weights.hdf5'),\n"," monitor='val_mean_io_u_2',\n"," verbose=1,\n"," save_best_only=True,\n"," mode='max'\n"," )"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uu_E7OTDBCoS","colab_type":"text"},"source":["# Training the model\n","\n","You train a Keras model by calling `.fit()` on it. Here we're going to train for `EPOCHS` (20) epochs, which is suitable for demonstration purposes. For production use, you probably want to optimize this parameter, for example through [hyperparameter tuning](https://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning)."]},{"cell_type":"code","metadata":{"id":"NzzaWxOhSxBy","colab_type":"code","colab":{}},"source":["m = get_model()\n","\n","m.fit(\n"," x=training, \n"," epochs=EPOCHS, \n"," steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), \n"," validation_data=evaluation,\n"," validation_steps=int(EVAL_SIZE/BATCH_SIZE),\n"," callbacks = [checkpoint, tensorboard]\n"," )\n","\n","# We save the model definition and weights to Google Drive (free) \n","m.save('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zvIqqpNXqJSE","colab_type":"text"},"source":["##Train from checkpoints\n","If we want to resume or continue training from a previous checkpoint we load the model and best weights from GDrive, check the current accuracy on the evaluation data, and resume training."]},{"cell_type":"code","metadata":{"id":"q0xgBhsaqInV","colab_type":"code","colab":{}},"source":["# bring in the architecture and best weights from Drive\n","m = models.load_model('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/UNET256.h5', custom_objects={'weighted_bce': weighted_bce})\n","m.load_weights('drive/My Drive/Tensorflow/NC_solar/models/UNET256/Uncalibrated/best_weights.hdf5') "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"umZy0rBzs1Th","colab_type":"code","colab":{}},"source":["# let's see where we're at\n","evalMetrics = m.evaluate(x=evaluation, steps = EVAL_SIZE, verbose = 1)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"xlsFciElxOUA","colab_type":"code","colab":{}},"source":["# set the monitored value (val_mean_io_u) to current evaluation output\n","checkpoint = tf.keras.callbacks.ModelCheckpoint(\n"," join(log_dir, 'best_weights.hdf5'),\n"," monitor='val_mean_io_u',\n"," verbose=1,\n"," save_best_only=True,\n"," mode='max'\n"," 
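# 'checkpoint.best' is seeded below from the current evaluation metrics so\n"," # the first epochs of resumed training don't overwrite the stored best weights\n"," 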
)\n","\n","checkpoint.best = evalMetrics[2]\n","print(checkpoint.__dict__)\n","print(checkpoint.best)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Ty8wCxDtqWBM","colab_type":"code","colab":{}},"source":["#Now keep training!\n","m.fit(\n"," x=training, \n"," epochs= 10, \n"," steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), \n"," validation_data=evaluation,\n"," validation_steps=EVAL_SIZE/BATCH_SIZE,\n"," callbacks = [checkpoint, tensorboard]\n"," )\n","#m.save('drive/My Drive/Tensorflow/models/UNET256/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"tyhWcGHJ82e8","colab_type":"code","colab":{}},"source":["m.save('drive/My Drive/Tensorflow/models/UNET256/UNET256.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"i9OM5BiS1xYQ","colab_type":"code","colab":{}},"source":["%tensorboard --logdir 'drive/My Drive/Tensorflow/models/UNET256'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J1ySNup0xCqN","colab_type":"text"},"source":["# Prediction\n","\n","The prediction pipeline is:\n","\n","1. Export imagery on which to do predictions from Earth Engine in TFRecord format to a Cloud Storge bucket.\n","2. Use the trained model to make the predictions.\n","3. Write the predictions to a TFRecord file in a Cloud Storage.\n","4. Upload the predictions TFRecord file to Earth Engine.\n","\n","The following functions handle this process. It's useful to separate the export from the predictions so that you can experiment with different models without running the export every time."]},{"cell_type":"code","metadata":{"id":"lv6nb0ShH4_T","colab_type":"code","colab":{}},"source":["#Inspect the prediction outputs\n","predictions = m.predict(evaluation, steps=1, verbose=1)\n","for prediction in predictions:\n"," print(predictions)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"M3WDAa-RUpXP","colab_type":"code","colab":{}},"source":["def doExport(image, path, out_image_base, kernel_buffer, region):\n"," \"\"\"\n"," Run an image export task on which to run predictions. Block until complete.\n"," Parameters:\n"," image (ee.Image): image to be exported for prediction\n"," path (str): google cloud directory path for export\n"," out_image_base (str): base filename of exported image\n"," kernel_buffer (array<int>): pixels to buffer the prediction patch. 
Half is added to each side.\n"," region (ee.Geometry): region to export for prediction\n"," \"\"\"\n"," task = ee.batch.Export.image.toCloudStorage(\n"," image = image.select(BANDS), \n"," description = out_image_base, \n"," bucket = BUCKET, \n"," fileNamePrefix = join(path, out_image_base),\n"," region = region.getInfo()['coordinates'], \n"," scale = 10, \n"," fileFormat = 'TFRecord', \n"," maxPixels = 1e13,\n"," formatOptions = { \n"," 'patchDimensions': KERNEL_SHAPE,\n"," 'kernelSize': kernel_buffer,\n"," 'compressed': True,\n"," 'maxFileSize': 104857600\n"," }\n"," )\n"," task.start()\n","\n"," # Block until the task completes.\n"," print('Running image export to Cloud Storage...')\n"," import time\n"," while task.active():\n"," time.sleep(30)\n","\n"," # Error condition\n"," if task.status()['state'] != 'COMPLETED':\n"," print('Error with image export.')\n"," else:\n"," print('Image export completed.')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"zb_9_FflygVw","colab_type":"code","colab":{}},"source":["def doPrediction(pred_path, pred_image_base, user_folder, out_image_base, kernel_buffer, region):\n"," \"\"\"\n"," Perform inference on exported imagery, upload to Earth Engine.\n"," Parameters:\n"," pred_path (str): Google cloud (or Drive) path storing prediction image files\n"," pred_image_base (str): base filename of the exported prediction imagery\n"," user_folder (str): Earth Engine user folder to receive the uploaded asset\n"," out_image_base (str): base filename for the output predictions asset\n"," kernel_buffer (Array<int>): length 2 array of buffer pixels in x and y\n"," region (ee.Geometry): region over which predictions are made\n"," \"\"\"\n","\n"," print('Looking for TFRecord files...')\n"," \n"," # Get a list of all the files in the output bucket.\n"," filesList = !gsutil ls {join(BUCKET_PATH, pred_path)}\n"," # Get only the files generated by the image export.\n"," exportFilesList = [s for s in filesList if pred_image_base in s]\n","\n"," # Get the list of image files and the JSON mixer file.\n"," imageFilesList = []\n"," jsonFile = None\n"," for f in exportFilesList:\n"," if f.endswith('.tfrecord.gz'):\n"," imageFilesList.append(f)\n"," elif f.endswith('.json'):\n"," jsonFile = f\n","\n"," # Make sure the files are in the right order.\n"," imageFilesList.sort()\n","\n"," from pprint import pprint\n"," pprint(imageFilesList)\n"," print(jsonFile)\n"," \n"," import json\n"," # Load the contents of the mixer file to a JSON object.\n"," jsonText = !gsutil cat {jsonFile}\n"," # Get a single string w/ newlines from the IPython.utils.text.SList\n"," mixer = json.loads(jsonText.nlstr)\n"," pprint(mixer)\n"," patches = mixer['totalPatches']\n"," \n"," # Get set up for prediction.\n"," x_buffer = int(kernel_buffer[0] / 2)\n"," y_buffer = int(kernel_buffer[1] / 2)\n","\n"," buffered_shape = [\n"," KERNEL_SHAPE[0] + kernel_buffer[0],\n"," KERNEL_SHAPE[1] + kernel_buffer[1]]\n","\n"," imageColumns = [\n"," tf.io.FixedLenFeature(shape=buffered_shape, dtype=tf.float32) \n"," for k in BANDS\n"," ]\n","\n"," imageFeaturesDict = dict(zip(BANDS, imageColumns))\n","\n"," def parse_image(example_proto):\n"," return tf.io.parse_single_example(example_proto, imageFeaturesDict)\n","\n"," def toTupleImage(dic):\n"," inputsList = [dic.get(key) for key in BANDS]\n"," stacked = tf.stack(inputsList, axis=0)\n"," stacked = tf.transpose(stacked, [1, 2, 0])\n"," # NOTE: training patches were standardized with standard(bands, [0, 1]);\n"," # consider applying the same scaling here so inference inputs match training\n"," return stacked\n"," \n"," # Create a dataset from the TFRecord file(s) in Cloud Storage.\n"," imageDataset = tf.data.TFRecordDataset(imageFilesList, compression_type='GZIP')\n"," imageDataset = imageDataset.map(parse_image, num_parallel_calls=5)\n"," imageDataset = imageDataset.map(toTupleImage).batch(1)\n"," \n"," # Perform inference.\n"," 
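# m.predict will pull exactly 'patches' batches from the dataset, matching\n"," # the totalPatches count recorded in the mixer file\n"," 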
print('Running predictions...')\n"," predictions = m.predict(imageDataset, steps=patches, verbose=1)\n"," # print(predictions[0])\n","\n"," print('Writing predictions...')\n"," out_image_file = join(BUCKET_PATH, pred_path, 'outputs', out_image_base + '.TFRecord')\n"," writer = tf.io.TFRecordWriter(out_image_file)\n"," patch_count = 0\n"," for predictionPatch in predictions:\n"," print('Writing patch ' + str(patch_count) + '...')\n"," predictionPatch = predictionPatch[\n"," x_buffer:x_buffer+KERNEL_SIZE, y_buffer:y_buffer+KERNEL_SIZE]\n","\n"," # Create an example.\n"," example = tf.train.Example(\n"," features=tf.train.Features(\n"," feature={\n"," 'probability': tf.train.Feature(\n"," float_list=tf.train.FloatList(\n"," value=predictionPatch.flatten()))\n"," }\n"," )\n"," )\n"," # Write the example.\n"," writer.write(example.SerializeToString())\n"," patch_count += 1\n","\n"," writer.close()\n","\n"," # Start the upload.\n"," out_image_asset = user_folder + '/' + out_image_base\n"," !earthengine upload image --asset_id={out_image_asset} {out_image_file} {jsonFile}"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"LZqlymOehnQO","colab_type":"text"},"source":["Now there's all the code needed to run the prediction pipeline; all that remains is to specify the output region in which to do the prediction, the names of the output files, where to put them, and the shape of the outputs. In terms of the shape, the model is trained on 256x256 patches, but can work (in theory) on any patch that's big enough with even dimensions ([reference](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf)). Because of tile boundary artifacts, give the model slightly larger patches for prediction, then clip out the middle 256x256 patch. This is controlled with a kernel buffer, half the size of which will extend beyond the patch on each side. For example, specifying a 128x128 kernel will append 64 pixels on each side of the patch, to ensure that the pixels in the output are taken from inputs completely covered by the kernel. 
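Concretely, with the [128, 128] buffer used below, each exported patch is 256 + 128 = 384 pixels on a side, and after prediction only the central window `patch[64:64+256, 64:64+256]` is written out.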
"]},{"cell_type":"markdown","metadata":{"id":"G9UaJxPS3uZw","colab_type":"text"},"source":["### Test images"]},{"cell_type":"code","metadata":{"id":"BqDRwb6j27w-","colab_type":"code","colab":{}},"source":["# create several small aois to test predictions\n","test_aoi_1 = ee.Geometry.Polygon(\n"," [[[-78.19610376358034, 35.086989862385884],\n"," [-78.19610376358034, 34.735631502732396],\n"," [-77.67974634170534, 34.735631502732396],\n"," [-77.67974634170534, 35.086989862385884]]], None, False)\n","test_aoi_2 = ee.Geometry.Polygon(\n"," [[[-81.59087915420534, 35.84308746418702],\n"," [-81.59087915420534, 35.47711130797561],\n"," [-81.03057641983034, 35.47711130797561],\n"," [-81.03057641983034, 35.84308746418702]]], None, False)\n","test_aoi_3 = ee.Geometry.Polygon(\n"," [[[-78.74447677513596, 36.4941960586897],\n"," [-78.74447677513596, 36.17115435938789],\n"," [-78.21713302513596, 36.17115435938789],\n"," [-78.21713302513596, 36.4941960586897]]], None, False)\n","test_aoi_4 = ee.Geometry.Polygon(\n"," [[[-76.62411544701096, 36.33505523381603],\n"," [-76.62411544701096, 36.03800955668766],\n"," [-76.16818282982346, 36.03800955668766],\n"," [-76.16818282982346, 36.33505523381603]]], None, False)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"CQyLfPdt3TcA","colab_type":"code","colab":{}},"source":["# Create a different test image\n","S2 = ee.ImageCollection(\"COPERNICUS/S2\")\n","# Grab a feature corresponding to our study area - North Carolina\n","states = ee.FeatureCollection(\"TIGER/2016/States\")\n","nc = states.filter(ee.Filter.eq('NAME', 'North Carolina'))\n","begin = '2018-05-01'\n","end = '2018-08-30'\n","\n","# The image input collection is cloud-masked.\n","filtered = S2.filterDate(begin, end)\\\n",".filterBounds(nc)\\\n",".filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))\\\n",".map(basicQA)\n","\n","# Create a simple median composite to visualize\n","test = filtered.median().select(BANDS).clip(test_aoi_4)\n","\n","# Use folium to visualize the imagery.\n","#mapid = image.getMapId({'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 0.3})\n","rgbParams = {'bands': ['B4', 'B3', 'B2'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","nirParams = {'bands': ['B8', 'B11', 'B12'],\n"," 'min': 0,\n"," 'max': 0.3}\n","\n","map = folium.Map(location=[35.402, -78.376])\n","map.add_ee_layer(test, rgbParams, 'Color')\n","map.add_ee_layer(test, nirParams, 'Thermal')\n","\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"FPANwc7B1-TS","colab_type":"code","colab":{}},"source":["# Choose the GEE folder in which to ingest prediction image:\n","user_folder = 'users/defendersofwildlifeGIS/NC'\n","\n","# prediction path\n","nc_path = join(FOLDER, PRED_BASE)\n","# Base file name to use for TFRecord files and assets. 
The name structure includes:\n","# the image processing used ['raw', 'calibrated', 'normalized'] and the model,\n","# e.g. 'raw_unet256_testpred1' = raw imagery run through the UNET256 model.\n","nc_image_base = 'raw_unet256_testpred1'\n","# Half this will extend on the sides of each patch.\n","nc_kernel_buffer = [128, 128]\n","# NC\n","nc_region = test_aoi_1"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"sopslcqBzkjA","colab_type":"code","colab":{}},"source":["print(join(BUCKET_PATH, nc_path, nc_image_base + '*.tfrecord.gz'))\n","!gsutil ls {join(BUCKET_PATH, nc_path, nc_image_base + '*.tfrecord.gz')}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"9FDKc2ZwzODu","colab_type":"code","colab":{}},"source":["# Start the upload.\n","jsonFile = 'gs://cvod-203614-mlengine/NC_solar/data/predict/raw_unet256_testpred3mixer.json'\n","out_image_file = 'gs://cvod-203614-mlengine/NC_solar/data/predict/outputs/raw_unet256_20_test3.TFRecord'\n","!earthengine upload image --asset_id=users/defendersofwildlifeGIS/NC/raw_unet256_20_test3 {out_image_file} {jsonFile}"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"lLNEOLkXWvSi","colab_type":"code","cellView":"both","colab":{}},"source":["# Run the export.\n","doExport(test, nc_path, nc_image_base, nc_kernel_buffer, nc_region)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"KxACnxKFrQ_J","colab_type":"code","cellView":"both","colab":{}},"source":["# Run the prediction.\n","doPrediction(nc_path, nc_image_base, user_folder, 'raw_unet256_20_test1', nc_kernel_buffer, nc_region)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uj_G9OZ1xH6K","colab_type":"text"},"source":["# Display the output\n","\n","Once the data has been exported, the model has made predictions, the predictions have been written to a file, and the image has been imported to Earth Engine, it's possible to display the resultant Earth Engine asset. Here, display the solar array predictions over North Carolina."]},{"cell_type":"code","metadata":{"id":"Jgco6HJ4R5p2","colab_type":"code","colab":{}},"source":["# the asset id matches the out_image_base passed to doPrediction above\n","out_image = ee.Image(user_folder + '/raw_unet256_20_test1')\n","map = folium.Map(location=[35.402, -78.376])\n","map.add_ee_layer(out_image, {'min': 0, 'max': 1}, 'solar predictions')\n","map.add_child(folium.LayerControl())\n","map"],"execution_count":0,"outputs":[]}]}