SeasyHQ/satellite-image-deep-learning

 
 

Introduction

This document lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent classical machine learning (ML, e.g. random forests) is also discussed, as are classical image processing techniques. Note there is a huge volume of academic literature published on these topics, and this repo does not seek to index it all but rather to list approachable resources with published code that will benefit both the research and developer communities.

Table of contents

Top links

Techniques

This section explores the different deep and machine learning techniques people are applying to common problems in satellite imagery analysis.

Classification

The classic cats vs dogs labelling task, which in the remote sensing domain is used to assign a label to an image, e.g. this is an image of a forest. The more complex case is applying multiple labels to an image. Not to be confused with pixel-level classification which is called segmentation.

Segmentation

Segmentation will assign a class label to each pixel in an image. Segmentation is typically grouped into semantic or instance segmentation. In semantic segmentation objects of the same class are assigned the same label, whilst in instance segmentation each object is assigned a unique label. Read this beginner’s guide to segmentation. Single class models are often trained for road or building segmentation, with multi class for land use/crop type classification. Image annotation can take longer than for classification/object detection since every pixel must be annotated. Note that many articles which refer to 'hyperspectral land classification' are actually describing semantic segmentation.

Semantic segmentation

Almost always performed using U-Net. For multi/hyper-spectral imagery more classical techniques may be used (e.g. k-means).
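As a rough illustration of the classical route mentioned above, here is a minimal pure-Python sketch of k-means applied to per-pixel spectra. The two-band reflectance values and the farthest-point initialisation are assumptions chosen for this example; in practice you would likely use a library implementation on real imagery.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two pixel spectra."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(pixels, k, iterations=20):
    """Cluster pixel spectra (tuples of band values) with Lloyd's algorithm."""
    # Deterministic farthest-point initialisation (an assumption of this sketch)
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(max(pixels, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in pixels]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical (red, near-infrared) reflectances: three soil-like, three vegetation-like pixels
pixels = [(0.10, 0.05), (0.12, 0.04), (0.09, 0.06),
          (0.05, 0.50), (0.06, 0.55), (0.04, 0.45)]
labels, centroids = kmeans(pixels, k=2)
print(labels)
```

The cluster ids carry no meaning by themselves, so a person (or a lookup against known spectra) still has to decide which cluster is which land cover.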

Semantic segmentation - multi class classification

Semantic segmentation - buildings & roads

Semantic segmentation - vegetation & crop boundaries

Semantic segmentation - water & floods

Semantic segmentation - fire & burn areas

Semantic segmentation - glaciers

  • HED-UNet -> a model for simultaneous semantic segmentation and edge detection, examples provided are glacier fronts and building footprints using the Inria Aerial Image Labeling dataset
  • glacier_mapping -> Mapping glaciers in the Hindu Kush Himalaya, Landsat 7 images, Shapefile labels of the glaciers, Unet with dropout

Instance segmentation

In instance segmentation, each individual 'instance' of a segmented area is given a unique label. For detection of very small objects this may be a good approach, but it can struggle to separate individual areas that are closely spaced.
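To make the difference between a semantic mask and instance labels concrete, the sketch below derives instances from a binary mask by connected-component labelling. This is a simple post-processing stand-in, not an instance segmentation model, and the toy mask is invented for illustration. It also shows the failure mode described above: two buildings that touch end up sharing one label.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected blob of 1s in a binary mask its own integer label."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill over the 4 neighbours
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Toy semantic mask: 1 = building, 0 = background
semantic_mask = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],  # imagine two adjacent buildings here: they touch, so they merge
]
instance_labels, n_instances = label_instances(semantic_mask)
print(n_instances)
```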

Object detection

Put a box around individual objects in an image. A good introduction to the challenge of performing object detection on aerial imagery is given in this paper. In summary, images are large and objects may comprise only a few pixels, so they are easily confused with random features in the background. In general object detection performs well on large objects, and gets increasingly difficult as the objects get smaller & more densely packed. Model accuracy falls off rapidly as resolution degrades, so it is common for object detection to use very high resolution imagery, e.g. 30cm RGB.
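A quick back-of-the-envelope calculation shows why resolution matters so much here. The sketch below divides an object's size by the ground sample distance (GSD) to get the number of pixels it spans; the 4.5 m car length and the three example resolutions are assumptions picked for illustration.

```python
def pixels_across(object_size_m, gsd_m):
    """Number of pixels an object spans along one axis at a given ground sample distance."""
    return object_size_m / gsd_m

CAR_LENGTH_M = 4.5  # assumed typical car length, for illustration only

for label, gsd_m in [("30 cm", 0.3), ("3 m", 3.0), ("10 m", 10.0)]:
    print(f"{label:>5} imagery: car spans ~{pixels_across(CAR_LENGTH_M, gsd_m):.2f} px")
```

At 30 cm the car is about 15 pixels long, which is workable for a detector, while at 10 m it occupies less than a single pixel.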

Object detection - buildings, rooftops & solar panels

Object detection - boats

Object detection - vehicles

Object detection - planes

Object detection - animals

  • cownter_strike -> counting cows, located with point-annotations, two models: CSRNet (a density-based method) & LCFCN (a detection-based method)

Counting trees

Oil storage tank detection & oil spills

Oil is stored in tanks at many points between extraction and sale, and the volume of oil in storage is an important economic indicator.

Cloud detection & removal

Generally treated as a semantic segmentation problem.

Change detection & time-series

Monitor water levels, coast lines, size of urban areas, wildfire damage. Note, clouds change often too..!
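The simplest baseline for change detection is image differencing with a threshold, sketched below on two tiny single-band images. It assumes the images are already co-registered and radiometrically comparable; the pixel values and the threshold are made up. Anything that changes brightness gets flagged, which is exactly why clouds are a problem.

```python
def change_mask(before, after, threshold):
    """Flag pixels whose absolute difference between two co-registered images exceeds threshold."""
    return [
        [int(abs(a - b) > threshold) for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]

# Hypothetical single-band digital numbers for the same area at two dates
before = [[10, 10, 10],
          [10, 10, 10]]
after = [[10, 60, 10],
         [10, 65, 12]]

mask = change_mask(before, after, threshold=20)
changed_fraction = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
print(mask, changed_fraction)
```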

Wealth and economic activity measurement

The goal is to predict economic activity from satellite imagery rather than conducting labour intensive ground surveys.

Super-resolution

Super-resolution attempts to enhance the resolution of an imaging system, and can be applied as a pre-processing step to improve the detection of small objects. For an introduction to this topic read this excellent article. Note that SR techniques operate on a single image or a stack of images/video frames.

Image-to-image translation

Translate images e.g. from SAR to RGB.

GANS

Autoencoders & Dimensionality Reduction

Self-supervised/unsupervised learning

The terms self-supervised & unsupervised learning are often used interchangeably in the literature, and describe techniques using unlabelled data. In general, the more classical techniques such as k-means classification or PCA are referred to as unsupervised, whilst newer techniques using CNN feature extraction or autoencoders are referred to as self-supervised. Yann LeCun has described self-supervised/unsupervised learning as the 'base of the cake': If we think of our brain as a cake, then the cake base is unsupervised learning. The machine predicts any part of its input for any observed part, all without the use of labelled data. Supervised learning forms the icing on the cake, and reinforcement learning is the cherry on top.

Mixed data learning

These techniques combine multiple data types, e.g. imagery and text data.

Pansharpening

Image fusion of low res multispectral with high res pan band.
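One classical way to do this fusion is a Brovey-style ratio: upsample the multispectral bands to the pan grid, then rescale each band so that the mean intensity matches the pan band. The sketch below is a minimal pure-Python version under stated assumptions (nearest-neighbour upsampling, an unweighted band mean, toy values); it is one of several pansharpening methods, not a reference implementation.

```python
def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling of a 2D band by an integer factor."""
    return [[band[r // factor][c // factor] for c in range(len(band[0]) * factor)]
            for r in range(len(band) * factor)]

def brovey(ms_bands, pan, factor, eps=1e-9):
    """Brovey-style pansharpening: scale each upsampled band by pan / mean intensity."""
    up = [upsample_nearest(band, factor) for band in ms_bands]
    rows, cols = len(pan), len(pan[0])
    fused = [[[0.0] * cols for _ in range(rows)] for _ in up]
    for r in range(rows):
        for c in range(cols):
            intensity = sum(band[r][c] for band in up) / len(up)
            ratio = pan[r][c] / (intensity + eps)  # eps avoids division by zero
            for i, band in enumerate(up):
                fused[i][r][c] = band[r][c] * ratio
    return fused

# Toy data: one low-res pixel per band, and a 2x2 high-res pan patch covering it
red, green, blue = [[2.0]], [[4.0]], [[6.0]]
pan = [[4.0, 8.0],
       [2.0, 4.0]]
fused = brovey([red, green, blue], pan, factor=2)
print(fused[0])  # sharpened red band
```

The spatial detail now comes from the pan band while the ratios between bands, and so the colour, are preserved.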

NDVI - vegetation index
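The Normalized Difference Vegetation Index is computed per pixel as (NIR - Red) / (NIR + Red), giving values between -1 and 1, with dense healthy vegetation towards the high end. A minimal sketch follows; the reflectance values are invented, and returning 0 when both bands are zero is a convention chosen for this example.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # convention for this sketch: no signal in either band
    return (nir - red) / denominator

def ndvi_image(nir_band, red_band):
    """Apply NDVI pixel-wise to two 2D bands of equal shape."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]

# Hypothetical reflectances: vegetation (top-left) vs bare ground and water
nir_band = [[0.50, 0.30], [0.05, 0.00]]
red_band = [[0.10, 0.30], [0.10, 0.00]]
print(ndvi_image(nir_band, red_band))
```

For Sentinel-2 the inputs would be band 8 (NIR) and band 4 (red).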

General image quality

Image registration

Image registration is the process of transforming different sets of data into one coordinate system. Typical use is overlapping images taken at different times or with different cameras.
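When both images are already georeferenced, the "one coordinate system" is the map projection, and the mapping is an affine transform. The sketch below uses the six-coefficient geotransform convention used by GDAL to go from pixel to world coordinates and back, and so finds which pixel in image B corresponds to a pixel in image A. The two geotransforms are invented; images without georeferencing need feature matching or correlation, which is beyond this sketch.

```python
def pixel_to_world(gt, col, row):
    """Apply a GDAL-style geotransform (x0, dx, rx, y0, ry, dy) to pixel coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def world_to_pixel(gt, x, y):
    """Invert the geotransform to get (col, row) for a world coordinate."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row

# Hypothetical north-up images in the same projection: A at 10 m, B at 20 m with a shifted origin
gt_a = (500000.0, 10.0, 0.0, 4000000.0, 0.0, -10.0)
gt_b = (500100.0, 20.0, 0.0, 4000000.0, 0.0, -20.0)

x, y = pixel_to_world(gt_a, col=20, row=10)
col_b, row_b = world_to_pixel(gt_b, x, y)
print((x, y), (col_b, row_b))
```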

Terrain mapping, Lidar & DEMs

Measure surface contours.

Thermal Infrared

SAR

ML best practice

This section includes tips and ideas I have picked up from other practitioners including ai-fast-track, FraPochetti & the IceVision community

Datasets

Warning: satellite image files can be LARGE, even a small data set may comprise 50 GB of imagery

Sentinel

Landsat

Maxar

Planet

UC Merced

PatternNet

Spacenet

Kaggle

Kaggle hosts over 100 satellite image datasets, search results here. The Kaggle blog is an interesting read.

Kaggle - Amazon from space - classification challenge

Kaggle - DSTL - segmentation challenge

Kaggle - Airbus Ship Detection Challenge

Kaggle - Draper - place images in order of time

Kaggle - Deepsat - classification challenge

Not satellite but airborne imagery. Each sample image is a 28x28 pixel patch consisting of 4 bands: red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Data is in .mat Matlab format. JPEG?

  • Imagery source
  • Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three
  • Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.
  • Deep Gradient Boosted Learning article
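For readers unfamiliar with one-hot labels, the sketch below encodes and decodes a 1x6 vector using the six Sat6 class names listed above. The ordering of the classes within the vector is an assumption made for this example; check the dataset documentation for the real column order.

```python
# Class names from the Sat6 description; the ORDER here is assumed, not taken from the dataset
SAT6_CLASSES = ["barren land", "trees", "grassland", "roads", "buildings", "water bodies"]

def one_hot(class_name, classes=SAT6_CLASSES):
    """Encode a class name as a one-hot list with a single 1 at the class index."""
    vector = [0] * len(classes)
    vector[classes.index(class_name)] = 1
    return vector

def decode(vector, classes=SAT6_CLASSES):
    """Recover the class name from a one-hot (or score) vector via argmax."""
    return classes[vector.index(max(vector))]

label = one_hot("grassland")
print(label, decode(label))
```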

Kaggle - Understanding Clouds from Satellite Images

In this challenge, you will build a model to classify cloud organization patterns from satellite images.

Kaggle - Airbus oil storage detection dataset

Kaggle - Satellite images of hurricane damage

Kaggle - miscellaneous

Tensorflow datasets

  • resisc45 - RESISC45 dataset is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class.
  • eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples.
  • bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
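The "variable image size" for BigEarthNet follows directly from the fixed 1.2 km footprint: Sentinel-2 bands come at 10 m, 20 m and 60 m resolution, so the same patch has a different pixel count per band group. A small sketch of the arithmetic:

```python
PATCH_SIZE_M = 1200  # 1.2 km patch footprint on the ground

def patch_pixels(resolution_m, patch_size_m=PATCH_SIZE_M):
    """Side length in pixels of a square patch at the given band resolution."""
    return patch_size_m // resolution_m

for resolution_m in (10, 20, 60):  # Sentinel-2 band resolutions in metres
    side = patch_pixels(resolution_m)
    print(f"{resolution_m} m bands -> {side} x {side} pixels")
```

A model that consumes all bands together therefore has to resample them to a common grid first, or process the band groups in separate branches.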

AWS datasets

Microsoft

Google Earth Engine (GEE)

Since there is a whole community around GEE I will not reproduce it here but list very select references. Get started at https://developers.google.com/earth-engine/

Radiant Earth

FAIR1M ‘world’s largest satellite image database’

DEM (digital elevation maps)

  • Shuttle Radar Topography Mission, search online at usgs.gov
  • Copernicus Digital Elevation Model (DEM) on S3, represents the surface of the Earth including buildings, infrastructure and vegetation. Data is provided as Cloud Optimized GeoTIFFs. link

Weather Datasets

Time series & change detection datasets

  • BreizhCrops -> A Time Series Dataset for Crop Type Mapping
  • The SeCo dataset contains image patches from Sentinel-2 tiles captured at different timestamps at each geographical location. Download SeCo here
  • Onera Satellite Change Detection Dataset comprises 24 pairs of multispectral images taken from the Sentinel-2 satellites between 2015 and 2018
  • SYSU-CD -> The dataset contains 20000 pairs of 0.5-m aerial images of size 256×256 taken between the years 2007 and 2014 in Hong Kong

UAV & Drone datasets

Synthetic data

Interesting deep learning projects

Raster Vision by Azavea

torchrs - PyTorch Remote Sensing

  • torchrs
  • PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image-to-Image Translation, etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors

chip-n-scale-queue-arranger by developmentseed

spaceml.org

  • http://spaceml.org/
  • A Machine Learning toolbox and developer community building the next generation AI applications for space science and exploration.

TorchSat (no activity since June 2020)

  • TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch

DeepNetsForEO (no activity since 2019)

Skynet-data (no activity since 2018)

RoboSat (no longer maintained)

  • https://github.com/mapbox/robosat
  • Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
  • robosat-jupyter-notebook -> walks through all of the steps in an excellent blog post on the Robosat feature extraction and machine learning pipeline.
  • Note there is/was fork of Robosat, originally named RoboSat.pink, and subsequently neat-EO.pink although this appears to be dead/archived

DeepOSM (no activity since 2017)

State of the art

  • Compute and data storage are moving to the cloud
  • A combination of batch processing on clusters and serverless functions is common for routine compute tasks
  • Custom hardware is being developed for rapid training and inferencing with deep learning models
  • Traditional data formats aren't designed for processing on the cloud, so new standards are evolving such as COGs and STAC
  • Read about how Planet and Airbus use Google Cloud as their backend
  • Google Earth Engine and Microsoft Planetary Computer are democratising access to huge compute platforms
  • Whilst the combo of python and keras/pytorch is currently preeminent, new python libraries such as Jax and alternative languages such as Julia are showing serious promise

Online platforms for performing analytics

  • This article discusses some of the available platforms
  • Pangeo -> There is no single software package called “pangeo”; rather, the Pangeo project serves as a coordination point between scientists, software, and computing infrastructure. Includes open source resources for parallel processing using Dask and Xarray. Pangeo recently announced their 2.0 goals: pivoting away from directly operating cloud-based JupyterHubs, and towards education and research
  • Airbus Sandbox -> will provide access to imagery
  • Descartes Labs -> access to EO imagery from a variety of providers via python API
  • DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL.
  • Planet have a Jupyter notebook platform which can be deployed locally.
  • eurodatacube.com -> data & platform for EO analytics in Jupyter env, paid
  • up42 is a developer platform and marketplace, offering all the building blocks for powerful, scalable geospatial products
  • Microsoft Planetary Computer -> direct Google Earth Engine competitor in the making?
  • eofactory.ai -> supports multi public and private data sources that can be used to analyse and extract information

Free online computing resources

A GPU is required for training deep learning models (but not necessarily for inferencing), and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter development environments on the fastai site. I personally use Colab Pro with data hosted on Google Drive, or Sagemaker if I have very long running training jobs.

Google Colab

  • Colaboratory notebooks with a GPU backend, free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
  • Also a pro tier for $10 a month -> https://colab.research.google.com/signup
  • Tensorflow, pytorch & fastai available but you may need to update them
  • Colab Alive is a chrome extension that keeps Colab notebooks alive.
  • colab-ssh -> lets you ssh to a colab instance like it’s an EC2 machine and install packages that require full linux functionality

Kaggle - also Google!

  • Free to use
  • GPU Kernels - may run for 1 hour
  • Tensorflow, pytorch & fastai available but you may need to update them
  • Advantage that many datasets are already available

Cloud providers

An overview of the most relevant services provided by the main cloud providers. This section is limited since I personally use AWS and have a small amount of experience with Google. Also consider Microsoft Azure.

AWS

Google cloud

  • For storage use Cloud Storage (AWS S3 equivalent)
  • For data warehousing use BigQuery (AWS Redshift equivalent). Visualize massive spatial datasets directly in BigQuery using CARTO
  • For model training use Vertex (AWS Sagemaker equivalent)
  • For containerised apps use Cloud Run (AWS App Runner equivalent but can scale to zero)

Deploying models to production

This section discusses how to get a trained machine learning & specifically deep learning model into production. For an overview on serving deep learning models checkout Practical-Deep-Learning-on-the-Cloud. There are many options if you are happy to dedicate a server, although you may want a GPU for batch processing. For serverless consider AWS lambda.

Rest API on dedicated server

A common approach to serving up deep learning model inference code is to wrap it in a rest API. The API can be implemented in python (flask or FastAPI), and hosted on a dedicated server e.g. EC2 instance. Note that making this a scalable solution will require significant experience.
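To show the shape of such a service without committing to a framework, here is a minimal sketch that uses only the Python standard library's http.server in place of flask or FastAPI. The `/predict` route, the JSON payload shape and the `predict` function (a stand-in that thresholds the mean pixel value rather than running a real model) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(pixels):
    """Stand-in for model inference: label a patch from its mean pixel value."""
    mean = sum(pixels) / len(pixels)
    return {"label": "bright" if mean > 0.5 else "dark", "mean": mean}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict(payload["pixels"])).encode()
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, "Expected JSON body with a non-empty 'pixels' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Start the blocking server; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()

print(predict([0.9, 0.8, 0.7]))
```

Calling `serve()` and POSTing `{"pixels": [0.9, 0.8]}` to `/predict` would return the JSON prediction. Keeping `predict` separate from the HTTP handler makes it easy to unit test and to move behind flask, FastAPI or a serverless function later.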

Framework specific model serving options

If you are happy to live exclusively in the Tensorflow or Pytorch ecosystem, these are good options.

NVIDIA Triton server

Image formats, data management and catalogues

Cloud Optimised GeoTiff (COG)

A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF whose internal layout (tiling and overviews) lets clients use HTTP range requests to download specific tiles rather than the full file. COGs generally work normally in GIS software such as QGIS, but are larger than regular GeoTIFFs.
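The mechanism underneath is the standard HTTP `Range` header. The sketch below builds such a request with the standard library; the URL is a placeholder and the 16 KB header read is an arbitrary choice for illustration. Libraries such as GDAL and Rasterio issue these requests for you, so this is only to show what travels over the wire.

```python
import urllib.request

def range_request(url, start, length):
    """Build a request for `length` bytes starting at byte offset `start`."""
    end = start + length - 1  # the end offset in a Range header is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

# Placeholder URL, not a real dataset
request = range_request("https://example.com/data/cog.tif", start=0, length=16384)
print(request.get_header("Range"))

# To actually fetch (requires a server that supports range requests):
# with urllib.request.urlopen(request) as response:
#     header_bytes = response.read()  # a 206 Partial Content status indicates success
```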

SpatioTemporal Asset Catalog specification (STAC)

The STAC specification provides a common metadata specification, API, and catalog format to describe geospatial assets, so they can be more easily indexed and discovered.
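A STAC Item is a GeoJSON Feature with a few extra fields, which is what makes indexing and discovery straightforward. The sketch below shows a minimal item as a Python dict and a local bounding-box filter over a list of items. The id, coordinates and asset URL are invented, and the filter is an in-memory stand-in for illustration, not the STAC API search endpoint.

```python
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",  # hypothetical scene id
    "bbox": [-1.0, 50.0, 0.0, 51.0],  # [west, south, east, north]
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-1.0, 50.0], [0.0, 50.0], [0.0, 51.0], [-1.0, 51.0], [-1.0, 50.0]]],
    },
    "properties": {"datetime": "2021-06-01T10:30:00Z"},
    "links": [],
    "assets": {"visual": {"href": "https://example.com/scene-001.tif"}},  # placeholder URL
}

def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(items, bbox):
    """Return the ids of items whose bbox overlaps the query bbox."""
    return [i["id"] for i in items if bbox_intersects(i["bbox"], bbox)]

print(search([item], [-0.5, 50.5, 0.5, 51.5]))
```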

Image annotation

For supervised machine learning, you will require annotated images. For example if you are performing object detection you will need to annotate images with bounding boxes. Check that your annotation tool of choice supports large image (likely geotiff) files, as not all will. Note that GeoJSON is widely used by remote sensing researchers but this annotation format is not commonly supported in general computer vision frameworks, and in practice you may have to convert the annotation format to use the data with your chosen framework. There are both closed and open source tools for creating and converting annotation formats. Some of these tools are simply for performing annotation, whilst others add features such as dataset management and versioning.

Open source & desktop annotation tools

Start with labelImg or labelme if you are annotating solo, or CVAT if you are in a team.

  • If you are considering building an in house annotation platform read this article. Used PostGis database, GeoJson format and GIS standard in a stateless architecture.
  • labelImg is the classic desktop tool, limited to bounding boxes for object detection. Also checkout roLabelImg which supports ROTATED rectangle regions, as often occurs in aerial imagery.
  • Labelme is a simple desktop app for polygonal annotation, but note it outputs annotations in a custom LabelMe JSON format which you will need to convert. Read Labelme Image Annotation for Geotiffs
  • CVAT supports object detection, segmentation and classification via a local web app. There is an open issue to support large TIFF files. This article on Roboflow gives a good intro to CVAT.
  • Create your own annotation tool using Bokeh Holoviews
  • geolabel-maker -> combine satellite or aerial imagery with vector spatial data to create your own ground-truth dataset in the COCO format for deep-learning models
  • VoTT -> an electron app for building end to end Object Detection Models from Images and Videos, by Microsoft
  • Label Studio is a multi-type data labeling and annotation tool with standardized output format, webpage at labelstud.io
  • Deeplabel is a cross-platform tool for annotating images with labelled bounding boxes. Deeplabel also supports running inference using state-of-the-art object detection models like Faster-RCNN and YOLOv4. With support out-of-the-box for CUDA, you can quickly label an entire dataset using an existing model.
  • Alturos.ImageAnnotation is a collaborative tool for labeling image data on S3 for yolo
  • rectlabel is a desktop app for MacOS to annotate images for bounding box object detection and segmentation, paid and free (rectlabel-lite) versions
  • pigeonXT can be used to create custom image classification annotators within Jupyter notebooks
  • ipyannotations -> Image annotations in python using jupyter notebooks
  • diffgram supports cloud backends, also available as hosted service
  • Label-Detect -> is a graphical image annotation tool and using this tool a user can also train and test large satellite images, fork of the popular labelImg tool
  • Swipe-Labeler -> Swipe Labeler is a Graphical User Interface based tool that allows rapid labeling of image data
  • SuperAnnotate can be run locally or used via a cloud service
  • dash_doodler -> A web application built with plotly/dash for image segmentation with minimal supervision

Enterprise grade annotation platforms

Generally more fully featured than open source tools, often adding model assisted labelling & integration with providers of annotation as a service (outsourced annotation). There are many companies competing in this space, so I just list a few I have experience with.

  • GroundWork is designed for annotating and labeling geospatial data like satellite imagery, from Azavea
  • Roboflow can be used to convert between annotation formats & manage datasets, as well as train and deploy custom models. Free tier quite useful
  • supervise.ly is one of the more fully featured platforms, decent free tier
  • AWS supports image annotation via the Rekognition Custom Labels console
  • The labelbox.com free tier is quite generous

Annotation formats

Note there are many annotation formats, although PASCAL VOC and coco-json are the most commonly used.

  • PASCAL VOC format: XML files in the format used by ImageNet
  • coco-json format: JSON in the format used by the 2015 COCO dataset
  • YOLO Darknet TXT format: contains one text file per image, used by YOLO
  • Tensorflow TFRecord: a proprietary binary file format used by the Tensorflow Object Detection API
  • Many more formats listed here
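Converting bounding boxes between these formats is mostly arithmetic. As a minimal sketch: COCO stores `[x_min, y_min, width, height]` in pixels, PASCAL VOC stores `xmin, ymin, xmax, ymax` in pixels, and YOLO stores the box centre and size normalised by the image dimensions. The function names and example numbers below are made up for illustration, and details such as VOC's pixel indexing convention are ignored.

```python
def coco_to_voc(bbox):
    """[x_min, y_min, width, height] -> [xmin, ymin, xmax, ymax], all in pixels."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def coco_to_yolo(bbox, image_width, image_height):
    """[x_min, y_min, width, height] in pixels -> normalised [x_center, y_center, width, height]."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    ]

def yolo_to_coco(bbox, image_width, image_height):
    """Inverse of coco_to_yolo."""
    xc, yc, w, h = bbox
    w_px, h_px = w * image_width, h * image_height
    return [xc * image_width - w_px / 2, yc * image_height - h_px / 2, w_px, h_px]

coco_box = [100, 50, 40, 20]  # hypothetical box in a 400 x 200 pixel image
print(coco_to_voc(coco_box))
print(coco_to_yolo(coco_box, 400, 200))
```

In a YOLO label file each line would additionally be prefixed with the integer class id.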

Useful paid software

Many of these companies & products predate the open source software boom, and offer functionality which can be found in open source alternatives. However it is important to consider the licensing and support aspects before adopting an open source stack.

  • ArcGIS -> mapping and analytics software, with both local and cloud hosted options. Checkout Geospatial deep learning with arcgis.learn. It appears ArcGIS are using fastai for their deep learning backend. ArcGIS Jupyter Notebooks in ArcGIS Enterprise are built to run big data analysis, deep learning models, and dynamic visualization tools.
  • ENVI -> image processing and analysis
  • ERDAS IMAGINE -> remote sensing, photogrammetry, LiDAR analysis, basic vector analysis, and radar processing into a single product
  • PEARL -> a human-in-the-loop AI tool to drastically reduce the time required to produce an accurate land cover map, blog post, uses Microsoft Planetary Computer and (some?) ML models run locally in the browser
  • Spacemetric Keystone -> transform unprocessed sensor data into quality geospatial imagery ready for analysis
  • microimages TNTgis -> advanced GIS, image processing, and geospatial analysis at an affordable price

Useful open source software

A note on licensing: The two general types of licenses for open source are copyleft and permissive. Copyleft requires that subsequent derived software products also carry the license forward, e.g. the GNU General Public License (GNU GPLv3). For permissive licenses, options to modify and use the code as one pleases are more open, e.g. MIT & Apache 2. Checkout choosealicense.com/

GDAL & Rasterio

  • So important this pair gets its own section. GDAL is THE command line tool for reading and writing raster and vector geospatial data formats. If you are using python you will probably want to use Rasterio which provides a pythonic wrapper for GDAL
  • GDAL and on twitter
  • GDAL is a dependency of Rasterio and can be difficult to build and install. I recommend using conda, brew (on OSX) or docker in these situations
  • GDAL docker quickstart: docker pull osgeo/gdal then docker run --rm -v $(pwd):/data/ osgeo/gdal gdalinfo /data/cog.tiff
  • Even Rouault maintains GDAL, please consider sponsoring him
  • Rasterio -> reads and writes GeoTIFF and other raster formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON. There are a variety of plugins that extend Rasterio functionality.
  • rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
  • rioxarray -> geospatial xarray extension powered by rasterio
  • aws-lambda-docker-rasterio -> AWS Lambda Container Image with Python Rasterio for querying Cloud Optimised GeoTiffs. See this presentation
  • godal -> golang wrapper for GDAL
  • Write rasterio to xarray
  • Loam: A Client-Side GDAL Wrapper for Javascript
  • Short list of useful GDAL commands while working in data science for remote sensing

General utilities

  • PyShp -> The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python
  • s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
  • EarthPy -> A set of helper functions to make working with spatial data in open source tools easier. Read Exploratory Data Analysis (EDA) on Satellite Imagery Using EarthPy
  • pygeometa -> provides a lightweight and Pythonic approach for users to easily create geospatial metadata in standards-based formats using simple configuration files
  • pesto -> PESTO is designed to ease the process of packaging a Python algorithm as a processing web service into a docker image. It contains shell tools to generate all the boiler plate to build an OpenAPI processing web service compliant with the Geoprocessing-API. By Airbus Defence And Space
  • GEOS -> Google Earth Overlay Server (GEOS) is a python-based server for creating Google Earth overlays of tiled maps. You can also display maps in the web browser, measure distances and print maps as high-quality PDFs.
  • GeoDjango intends to be a world-class geographic Web framework. Its goal is to make it as easy as possible to build GIS Web applications and harness the power of spatially enabled data. Some features of GDAL are supported.
  • rasterstats -> summarize geospatial raster datasets based on vector geometries
  • turfpy -> a Python library for performing geospatial data analysis which reimplements turf.js
  • image-similarity-measures -> Implementation of eight evaluation metrics to assess the similarity between two images. Blog post here
  • rsgislib -> Remote Sensing and GIS Software Library; python module tools for processing spatial data.
  • eo-learn is a collection of open source Python packages that have been developed to seamlessly access and process spatio-temporal image sequences acquired by any satellite fleet in a timely and automatic manner
  • RStoolbox: Tools for Remote Sensing Data Analysis in R
  • nd -> Framework for the analysis of n-dimensional, multivariate Earth Observation data, built on xarray
  • reverse-geocoder -> a fast, offline reverse geocoder in Python

Low level numerical & data formats

  • xarray -> N-D labeled arrays and datasets. Read Handling multi-temporal satellite images with Xarray. Checkout xarray_leaflet for tiled map plotting
  • xarray-spatial -> Fast, Accurate Python library for Raster Operations. Implements algorithms using Numba and Dask, free of GDAL
  • xarray-beam -> Distributed Xarray with Apache Beam by Google
  • Geowombat -> geo-utilities applied to air- and space-borne imagery, uses Rasterio, Xarray and Dask for I/O and distributed computing with named coordinates
  • NumpyTiles -> a specification for providing multiband full-bit depth raster data in the browser
  • Zarr -> Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. Zarr depends on NumPy

Image handling and manipulation packages

  • Pillow is the Python Imaging Library -> this will be your go-to package for image manipulation in python
  • opencv-python is pre-built CPU-only OpenCV packages for Python
  • kornia is a differentiable computer vision library for PyTorch, like openCV but on the GPU. Perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors.
  • tifffile -> Read and write TIFF files
  • xtiff -> A small Python 3 library for writing multi-channel TIFF stacks
  • geotiff -> A noGDAL tool for reading and writing geotiff files
  • image_slicer -> Split images into tiles. Join the tiles back together.
  • tiler -> split images into tiles and merge tiles into a large image
  • felicette -> Satellite imagery for dummies. Generate JPEG earth imagery from coordinates/location name with publicly available satellite data.
  • imagehash -> Image hashes tell whether two images look nearly identical.
  • xbatcher -> Xbatcher is a small library for iterating xarray DataArrays in batches. The goal is to make it easy to feed xarray datasets to machine learning libraries such as Keras.
  • fake-geo-images -> A module to programmatically create geotiff images which can be used for unit tests
  • sahi -> A vision library for performing sliced inference on large images/small objects
  • imagededup -> Finding duplicate images made easy! Uses perceptual hashing
  • rmstripes -> Remove stripes from images with a combined wavelet/FFT approach
  • activeloopai Hub -> The fastest way to store, access & manage datasets with version-control for PyTorch/TensorFlow. Works locally or on any cloud. Scalable data pipelines.
  • sewar -> All image quality metrics you need in one package
  • fiftyone -> open-source tool for building high-quality datasets and computer vision models. Visualise complex labels, evaluating models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more!
  • GeoTagged_ImageChip -> A simple script to create geo tagged image chips from high resolution RS images for training deep learning models such as Unet.
  • Label Maker -> downloads OpenStreetMap QA Tile information and satellite imagery tiles and saves them as an .npz file for use in machine learning training.
  • Satellite imagery label tool -> provides an easy way to collect a random sample of labels over a given scene of satellite imagery
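Several of the packages above (image_slicer, tiler, sahi) revolve around cutting a large scene into model-sized tiles. The sketch below computes overlapping tile windows for an image of a given size. It is an independent illustration, not the API of any of those packages, and the policy of shifting the last tile inward so every tile is full size is just one reasonable choice.

```python
def tile_windows(width, height, tile, overlap=0):
    """Return (x, y, w, h) windows that cover the image with the given overlap."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(size):
        if size <= tile:
            return [0]
        offsets = list(range(0, size - tile, stride))
        offsets.append(size - tile)  # final tile is shifted inward to stay full size
        return offsets

    return [
        (x, y, min(tile, width), min(tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

# Hypothetical 1000 x 512 pixel scene, 512 pixel tiles, 64 pixel overlap
windows = tile_windows(1000, 512, tile=512, overlap=64)
print(windows)
```

The overlap gives objects that straddle a tile boundary a chance of appearing whole in at least one tile; detections from overlapping tiles then need merging, for example with non-maximum suppression.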

Image augmentation packages

Image augmentation is a technique used to expand a training dataset in order to improve the ability of the model to generalise.

  • AugLy -> A data augmentations library for audio, image, text, and video. By Facebook
  • albumentations -> Fast image augmentation library and an easy-to-use wrapper around other libraries
  • FoHIS -> Towards Simulating Foggy and Hazy Images and Evaluating their Authenticity
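Flips and 90 degree rotations are the simplest augmentations and tend to suit overhead imagery, since a nadir view usually has no canonical "up". The pure-Python sketch below generates these variants for a tiny single-band image; it is for illustration only, and in practice a library such as albumentations would handle this along with the matching transform of masks and boxes.

```python
def hflip(image):
    """Mirror left-right."""
    return [row[::-1] for row in image]

def vflip(image):
    """Mirror top-bottom."""
    return [row[:] for row in image[::-1]]

def rot90(image):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the original plus flipped and rotated copies."""
    r90 = rot90(image)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [image, hflip(image), vflip(image), r90, r180, r270]

image = [[1, 2],
         [3, 4]]
variants = augment(image)
print(len(variants), variants[3])
```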

Model specification and versioning

  • geo-ml-model-catalog -> provides a common metadata definition for ML models that operate on geospatial data
  • dvc -> not specific to EO ML models, dvc is a git extension to keep track of changes in data, source code, and ML models together

Deep learning packages

  • rastervision
  • torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF file
  • DeepHyperX -> A Python/pytorch tool to perform deep learning experiments on various hyperspectral datasets.

Data discovery and ingestion

  • landsat_ingestor -> Scripts and other artifacts for landsat data ingestion into Amazon public hosting
  • satpy -> a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats
  • GIBS-Downloader -> a command-line tool which facilitates the downloading of NASA satellite imagery and offers different functionalities in order to prepare the images for training in a machine learning pipeline
  • eodag -> Earth Observation Data Access Gateway
  • pylandsat -> Search, download, and preprocess Landsat imagery
  • sentinelsat -> Search and download Copernicus Sentinel satellite images
  • landsatxplore -> Search and download Landsat scenes from EarthExplorer

Graphing and visualisation

  • hvplot -> A high-level plotting API for the PyData ecosystem built on HoloViews. Allows overlaying data on map tiles, see Exploring USGS Terrain Data in COG format using hvPlot
  • Pyviz examples include several interesting geospatial visualisations
  • napari -> napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images. By integrating closely with the Python ecosystem, napari can be easily coupled to leading machine learning and image analysis tools. Note that to view a 3GB COG I had to install the napari-tifffile-reader plugin.
  • pixel-adjust -> Interactively select and adjust specific pixels or regions within a single-band raster. Built with rasterio, matplotlib, and panel.
  • Plotly Dash can be used for making interactive dashboards
  • folium -> a python wrapper to the excellent leaflet.js which makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. Also checkout the streamlit-folium component for adding folium maps to your streamlit apps
  • ipyearth -> An IPython Widget for Earth Maps
  • geopandas-view -> Interactive exploration of GeoPandas GeoDataFrames
  • geogif -> Turn xarray timestacks into GIFs
  • leafmap -> geospatial analysis and interactive mapping with minimal coding in a Jupyter environment
  • xmovie -> A simple way of creating movies from xarray objects
  • acquisition-time -> Drawing (Satellite) acquisition dates in a timeline
  • splot -> Lightweight plotting for geospatial analysis in PySAL
  • prettymaps -> A small set of Python functions to draw pretty maps from OpenStreetMap data
  • Tools to Design or Visualize Architecture of Neural Network

Streamlit

Streamlit is an awesome python framework for creating apps. Additionally they will host the apps free of charge. Here I list resources which are EO related. Note that a component is an addon which extends Streamlit's basic functionality. If you like Streamlit also checkout gradio

Cluster computing with Dask

Algorithms

  • WaterDetect -> an end-to-end algorithm to generate open water cover mask, specially conceived for L2A Sentinel 2 imagery. It can also be used for Landsat 8 images and for other multispectral clustering/segmentation tasks.
  • GatorSense Hyperspectral Image Analysis Toolkit -> This repo contains algorithms for Anomaly Detectors, Classifiers, Dimensionality Reduction, Endmember Extraction, Signature Detectors, Spectral Indices
  • detectree -> Tree detection from aerial imagery
  • pylandstats -> compute landscape metrics
  • dg-calibration -> Coefficients and functions for calibrating DigitalGlobe imagery
  • python-fmask -> Implementation in Python of the cloud and shadow algorithms known collectively as Fmask
  • pyshepseg -> Python implementation of image segmentation algorithm of Shepherd et al (2019) Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination.
  • Shadow-Detection-Algorithm-for-Aerial-and-Satellite-Images -> shadow detection and correction algorithm

Movers and shakers on Github

Companies on Github

For a full list of companies, on and off Github, checkout awesome-geospatial-companies. The following lists companies with interesting Github profiles.

Courses

Books

Online communities

Jobs

Sign up for the geospatial-jobs-newsletter. The Pangeo discourse also lists multiple jobs, globally. A list of company job portals is below:

Neural nets in space

Processing on the satellite allows less data to be downlinked. E.g. a super-resolution image might take 4-8 input images to generate, but only the single output image is downlinked.

About the author

My background is in optical physics, and I hold a PhD from Cambridge on the topic of localised surface Plasmons. Since academia I have held a variety of roles, including doing research at Sharp Labs Europe, developing optical systems at Surrey Satellites (SSTL), and working at an IOT startup. It was whilst at SSTL that I started this repository as a personal resource. Over time I have steadily gravitated towards data analytics and software engineering with python, and I now work as a senior data scientist at Satellite Vu. Please feel free to connect with me on Twitter & LinkedIn, and please do let me know if this repository is useful to your work.

Linkedin: robmarkcole Twitter Follow
