This document lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent, classical machine learning (ML, e.g. random forests) is also discussed, as are classical image processing techniques. Note there is a huge volume of academic literature published on these topics; this repo does not seek to index it all but rather to list approachable resources with published code that will benefit both the research and developer communities.
- Top links
- Techniques
- ML best practice
- Datasets
- Interesting deep learning projects
- State of the art
- Online platforms for performing analytics
- Free online computing resources
- Cloud providers
- Deploying models to production
- Image formats, data management and catalogues
- Image annotation
- Useful paid software
- Useful open source software
- Movers and shakers on Github
- Companies on Github
- Courses
- Online communities
- Jobs
- Neural nets in space
- About the author
- awesome-satellite-imagery-datasets
- awesome-earthobservation-code
- awesome-sentinel
- geospatial-machine-learning
- Long list of satellite missions with example imagery
- paperswithcode aggregates SoTA Computer Vision techniques
- Deep learning in remote sensing applications: A meta-analysis and review
- Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review
This section explores the different deep and machine learning techniques people are applying to common problems in satellite imagery analysis.
Classification is the classic cats vs dogs labelling task, which in the remote sensing domain is used to assign a label to an image, e.g. this is an image of a forest. The more complex case is applying multiple labels to an image. Not to be confused with pixel-level classification, which is called segmentation.
- Land classification on Sentinel 2 data using a simple sklearn cluster algorithm or deep learning CNN
- Land Use Classification on Merced dataset using CNN in Keras or fastai. Also check out Multi-label Land Cover Classification using the redesigned multi-label Merced dataset with 17 land cover classes. For alternative visualisations see this approach
- Multi-Label Classification of Satellite Photos of the Amazon Rainforest using keras or FastAI
- Detecting Informal Settlements from Satellite Imagery using fine-tuning of ResNet-50 classifier with repo
- Vision Transformers Use Case: Satellite Image Classification without CNNs
- Applying Deep Learning on Satellite Imagery Classification -> using EuroSAT dataset of RGB and multi spectral covering 13 spectral bands, resnet50 & pytorch, with repo
- Land Cover Classification of Satellite Imagery using Convolutional Neural Networks using Keras and a multi spectral dataset captured over vineyard fields of Salinas Valley, California
- Detecting deforestation from satellite images -> using FastAI and ResNet50, with repo fsdl_deforestation_detection
- Neural Network for Satellite Data Classification Using Tensorflow in Python -> A step-by-step guide for Landsat 5 multispectral data classification for binary built-up/non-built-up class prediction, with repo
- Slums mapping from pretrained CNN network on VHR (Pleiades: 0.5m) and MR (Sentinel: 10m) imagery
- Comparing urban environments using satellite imagery and convolutional neural networks -> includes interesting study of the image embedding features extracted for each image on the Urban Atlas dataset. Accompanying paper
- RSI-CB -> A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data
- NAIP_PoolDetection -> modelled as an object recognition problem, a CNN is used to identify images as being swimming pools or something else - specifically a street, rooftop, or lawn
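The difference between assigning one label and assigning multiple labels to an image usually comes down to how the raw model scores are turned into decisions. The following is a minimal sketch, assuming a model that outputs one raw score (logit) per class; the class names, scores and the 0.5 threshold are hypothetical and not taken from any of the projects listed above.

```python
import numpy as np

def single_label(logits: np.ndarray) -> np.ndarray:
    """Pick exactly one class per image: softmax followed by argmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return probs.argmax(axis=1)

def multi_label(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Allow any number of classes per image: independent sigmoid per class."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs >= threshold

# Hypothetical raw scores for two images over three classes (forest, water, urban)
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, 1.5, 2.5]])

print(single_label(logits))  # one class index per image
print(multi_label(logits))   # one boolean per (image, class) pair
```

In the single-label case the classes compete with each other, whereas in the multi-label case each class is decided independently, so an image can be tagged as both forest and urban.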
Segmentation will assign a class label to each pixel in an image. Segmentation is typically grouped into semantic or instance segmentation. In semantic segmentation objects of the same class are assigned the same label, whilst in instance segmentation each object is assigned a unique label. Read this beginner’s guide to segmentation. Single class models are often trained for road or building segmentation, with multi class for land use/crop type classification. Image annotation can take longer than for classification/object detection since every pixel must be annotated. Note that many articles which refer to 'hyperspectral land classification' are actually describing semantic segmentation.
Almost always performed using U-Net. For multi/hyper-spectral imagery more classical techniques may be used (e.g. k-means).
- awesome-satellite-images-segmentation
- Satellite Image Segmentation: a Workflow with U-Net is a decent intro article
- nga-deep-learning -> performs semantic segmentation on high resolution GeoTIFF data using a modified U-Net & Keras, published by NASA researchers
- How to create a DataBlock for Multispectral Satellite Image Semantic Segmentation using Fastai
- Using a U-Net for image segmentation, blending predicted patches smoothly is a must to please the human eye -> python code to blend predicted patches smoothly
- Automatic Detection of Landfill Using Deep Learning
- SpectralNET -> a 2D wavelet CNN for Hyperspectral Image Classification, uses Salinas Scene dataset & Keras
- FactSeg -> Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS), also see FarSeg and FreeNet, implementations of research paper
- SCAttNet -> Semantic Segmentation Network with Spatial and Channel Attention Mechanism
- Land Cover Classification with U-Net -> Satellite Image Multi-Class Semantic Segmentation Task with PyTorch Implementation of U-Net
- Multi-class semantic segmentation of satellite images using U-Net using DSTL dataset, tensorflow 1 & python 2.7. Accompanying article
- Codebase for multi class land cover classification with U-Net accompanying a masters thesis, uses Keras
- Land cover classification of Sundarbans satellite imagery using K-Nearest Neighbor(K-NNC), Support Vector Machine (SVM), and Gradient Boosting classification algorithms with Python with repo
- dubai-satellite-imagery-segmentation -> due to the small dataset, image augmentation was used
- Semantic Segmentation on Aerial Images using fastai uses U-Net on the Inria Aerial Image Labeling Dataset of urban settlements in Europe and the United States, which is labelled with building and not-building classes (no repo)
- Road and Building Semantic Segmentation in Satellite Imagery uses U-Net on the Massachusetts Roads Dataset & keras
- Semantic segmentation of roads and highways using Sentinel-2 imagery (10m) super-resolved using the SENX4 model up to x4 the initial spatial resolution (2.5m) (results, no repo)
- find-unauthorized-constructions-using-aerial-photography -> semantic segmentation using U-Net with custom_f1 metric & Keras. The creation of the dataset is described in this article
- semantic segmentation model to identify newly developed or flooded land using NAIP imagery provided by the Chesapeake Conservancy, training on MS Azure
- Semantic Segmentation of roads using U-net Keras, OSM data, project summary article by student, no code
- Road detection using semantic segmentation and albumentations for data augmentation using the Massachusetts Roads Dataset, U-net & Keras
- Winning Solutions from SpaceNet Road Detection and Routing Challenge
- Pix2Pix-for-Semantic-Segmentation-of-Satellite-Images -> using Pix2Pix GAN network to segment the building footprint from Satellite Images, uses tensorflow
- SpaceNetUnet -> Baseline model is U-net like, applied to SpaceNet Vegas data, using Keras
- Building footprint detection with fastai on the challenging SpaceNet7 dataset uses U-Net
- automated-building-detection -> Input: very-high-resolution (<= 0.5 m/pixel) RGB satellite images. Output: buildings in vector format (geojson), to be used in digital map products. Built on top of robosat and robosat.pink.
- project_sunroof_india -> Analyzed Google Satellite images to generate a report on individual house rooftop's solar power potential, uses a range of classical computer vision techniques (e.g. Canny Edge Detection) to segment the roofs
- JointNet-A-Common-Neural-Network-for-Road-and-Building-Extraction
- Mapping Africa’s Buildings with Satellite Imagery: Google AI blog post
- DeepSolar: A Machine Learning Framework to Efficiently Construct a Solar Deployment Database in the United States -> with website and dataset on kaggle, actually used a CNN for classification and segmentation is obtained by applying a threshold to the activation map
- nz_convnet -> A U-net based ConvNet for New Zealand imagery to classify building outlines
- RoadVecNet -> Road-Network-Segmentation-and-Vectorization in keras with dataset and paper
- polycnn -> End-to-End Learning of Polygons for Remote Sensing Image Classification
- spacenet_building_detection solution by motokimura using Unet
- Crop field boundary detection: approaches overview and main challenges - review article, no code
- kenya-crop-mask -> Annual and in-season crop mapping in Kenya - LSTM classifier to classify pixels as containing crop or not, and a multi-spectral forecaster that provides a 12 month time series given a partial input. Dataset downloaded from GEE and pytorch lightning used for training
- What’s growing there? Identify crops from multi-spectral remote sensing data (Sentinel 2) using eo-learn for data pre-processing, cloud detection, NDVI calculation, image augmentation & fastai
- Tree species classification from airborne LiDAR and hyperspectral data using 3D convolutional neural networks accompanies research paper and uses fastai
- crop-type-classification -> using Sentinel 1 & 2 data with a U-Net + LSTM, more features (i.e. bands) and higher resolution produced better results (article, no code)
- Find sports fields using Mask R-CNN and overlay on open-street-map
- UNSOAT used fastai to train a Unet to perform semantic segmentation on satellite imagery to detect water - paper + notebook, accuracy 0.97, precision 0.91, recall 0.92
- Semi-Supervised Classification and Segmentation on High Resolution Aerial Images - Solving the FloodNet problem
- Flood Detection and Analysis using UNET with Resnet-34 as the back bone uses fastai
- Houston_flooding -> labeling each pixel as either flooded or not using data from Hurricane Harvey. Dataset consisted of pre and post flood images, and a ground truth floodwater mask was created using unsupervised clustering (with DBScan) of image pixels with human cluster verification/adjustment
- ml4floods -> An ecosystem of data, models and code pipelines to tackle flooding with ML
- Wild Fire Detection using U-Net trained on Databricks & Keras, semantic segmentation
- A Practical Method for High-Resolution Burned Area Monitoring Using Sentinel-2 and VIIRS with code. Dataset created on Google Earth Engine, downloaded to local machine for model training using fastai. The BA-Net model used is much smaller than U-Net, resulting in lower memory requirements and a faster computation
- HED-UNet -> a model for simultaneous semantic segmentation and edge detection, examples provided are glacier fronts and building footprints using the Inria Aerial Image Labeling dataset
- glacier_mapping -> Mapping glaciers in the Hindu Kush Himalaya, Landsat 7 images, Shapefile labels of the glaciers, Unet with dropout
In instance segmentation, each individual 'instance' of a segmented area is given a unique label. For detection of very small objects this may be a good approach, but it can struggle separating individual areas that are closely spaced.
- Instance segmentation of center pivot irrigation system in Brazil using free Landsat images, mask R-CNN & Keras
- Oil tank instance segmentation with Mask R-CNN with accompanying article using Keras & Airbus Oil Storage Detection Dataset on Kaggle
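To illustrate the relationship between semantic and instance labels, and why closely spaced areas are hard to separate, here is a minimal sketch that converts a binary semantic mask into instance labels using connected components. This is a classical post-processing step rather than what Mask R-CNN does internally; the mask is hypothetical toy data and `label_instances` is an illustrative helper (in practice `scipy.ndimage.label` performs the same job).

```python
from collections import deque
import numpy as np

def label_instances(mask: np.ndarray):
    """Assign a unique integer to each 4-connected blob of True pixels."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c]:
                continue  # background, or already part of a labelled blob
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

# Hypothetical semantic mask containing four objects, two of which touch
semantic = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],  # two adjacent 2x2 objects merge into a single blob
    [1, 1, 1, 1, 0, 0],
], dtype=bool)

instances, n = label_instances(semantic)
print(n)
print(instances)
```

The two touching objects in the bottom rows receive a single label, which is the failure mode described above; dedicated instance segmentation models aim to separate such cases.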
Put a box around individual objects in an image. A good introduction to the challenge of performing object detection on aerial imagery is given in this paper. In summary, images are large and objects may comprise only a few pixels, easily confused with random features in the background. In general object detection performs well on large objects, and gets increasingly difficult as the objects get smaller & more densely packed. Model accuracy falls off rapidly as resolution degrades, so it is common for object detection to use very high resolution imagery, e.g. 30cm RGB.
- Super-Resolution and Object Detection -> Super-resolution is a relatively inexpensive enhancement that can improve object detection performance
- Tackling the Small Object Problem in Object Detection
- Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) -> combines some of the leading object detection algorithms into a unified framework designed to detect objects both large and small in overhead imagery. Train models and test on arbitrary image sizes with YOLO (versions 2 and 3), Faster R-CNN, SSD, or R-FCN.
- Several useful articles on awesome-tiny-object-detection
- YOLTv4 -> YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitrarily large images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Read Announcing YOLTv4: Improved Satellite Imagery Object Detection
- Tensorflow Benchmarks for Object Detection in Aerial Images -> tensorflow-based codebase created to build benchmarks for object detection in aerial images
- Pytorch Benchmarks for Object Detection in Aerial Images -> pytorch-based codebase created to build benchmarks for object detection in aerial images
- ASPDNet -> Counting dense objects in remote sensing images, arxiv paper
- EESRGAN -> Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
- awesome-aerial-object-detection -> lists many relevant papers
- Machine Learning For Rooftop Detection and Solar Panel Installment discusses tiling large images and generating annotations from OSM data. Features of the roofs were calculated using a combination of contour detection and classification. Follow up article using semantic segmentation
- Building Extraction with YOLT2 and SpaceNet Data
- AIcrowd dataset of building outlines -> 300x300 pixel RGB images with annotations in MS-COCO format
- XBD-hurricanes -> Models for building (and building damage) detection in high-resolution (<1m) satellite and aerial imagery using a modified RetinaNet model
- Detecting solar panels from satellite imagery using segmentation
- How hard is it for an AI to detect ships on satellite images?
- Object Detection in Satellite Imagery, a Low Overhead Approach
- Detecting Ships in Satellite Imagery using the Planet dataset and Keras
- Planet use non DL felzenszwalb algorithm to detect ships
- Ship detection using k-means clustering & CNN classifier on patches
- sentinel2-xcube-boat-detection -> detect and count boat traffic in Sentinel-2 imagery using temporal, spectral and spatial features
- Truck Detection with Sentinel-2 during COVID-19 crisis -> moving objects in Sentinel-2 data causes a specific reflectance relationship in the RGB, which looks like a rainbow, and serves as a marker for trucks. Improve accuracy by only analysing roads. Not using object detection but relevant
- cowc_car_counting -> car counting on the Cars Overhead With Context (COWC) dataset. Not strictly object detection but a CNN to predict the car count in a tile
- yoltv4 includes examples on the RarePlanes dataset
- cownter_strike -> counting cows, located with point-annotations, two models: CSRNet (a density-based method) & LCFCN (a detection-based method)
- DeepForest is a python package for training and predicting individual tree crowns from airborne RGB imagery
- Official repository for the "Identifying trees on satellite images" challenge from Omdena
- Counting-Trees-using-Satellite-Images -> create an inventory of incoming and outgoing trees for annual tree inspections, uses keras & semantic segmentation
- 2020 Nature paper - An unexpectedly large count of trees in the West African Sahara and Sahel -> tree detection framework based on U-Net & tensorflow 2 with code here
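Because satellite images are far larger than the inputs most detectors accept, a common first step is to cut the image into overlapping tiles and run the detector on each one. The following is a minimal sketch of that tiling step only; it is not the YOLT or SIMRDWN implementation, and the function name, the 600 pixel tile size and the 100 pixel overlap are illustrative assumptions.

```python
def tile_windows(width: int, height: int, tile: int = 600, overlap: int = 100):
    """Return (x0, y0, x1, y1) pixel windows that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size: int):
        if size <= tile:
            return [0]  # image is smaller than one tile along this axis
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last window sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

# Hypothetical 1500 x 1000 pixel scene
windows = tile_windows(1500, 1000)
for window in windows:
    print(window)
```

The overlap means an object cut by one tile boundary is likely to appear whole in a neighbouring tile. Detections from overlapping tiles then need to be shifted back to full-image coordinates and de-duplicated, for example with non-maximum suppression.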
Oil is stored in tanks at many points between extraction and sale, and the volume of oil in storage is an important economic indicator.
- Oil Storage Tank’s Volume Occupancy On Satellite Imagery Using YoloV3 with repo
- Oil-Tank-Volume-Estimation -> combines object detection and classical computer vision
- MCAN-OilSpillDetection -> Oil Spill Detection with A Multiscale Conditional Adversarial Network under Small Data Training, with paper. A multiscale conditional adversarial network (MCAN) trained with four oil spill observation images accurately detects oil spills in new images.
Generally treated as a semantic segmentation problem.
- From this article on sentinelhub there are three popular classical algorithms that detect thresholds in multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but state that this requires too many compute resources.
- This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and a Bayesian method.
- Segmentation of Clouds in Satellite Images Using Deep Learning -> semantic segmentation using a Unet on the Kaggle 38-cloud Landsat dataset
- Cloud Detection in Satellite Imagery compares FPN+ResNet18 and CheapLab architectures on Sentinel-2 L1C and L2A imagery
- Cloud-Removal-with-GAN-Satellite-Image-Processing
- Benchmarking Deep Learning models for Cloud Detection in Landsat-8 and Sentinel-2 images
- Landsat-8 to Proba-V Transfer Learning and Domain Adaptation for Cloud detection
- Multitemporal Cloud Masking in Google Earth Engine
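To give a feel for what thresholding in multiple bands means, here is a toy sketch that flags pixels which are bright in every band and spectrally flat (whitish). It is not one of the classical algorithms referred to above; the band choice, the function name and both thresholds are assumptions made purely for illustration, and inputs are assumed to be reflectance values in the range 0 to 1.

```python
import numpy as np

def simple_cloud_mask(blue, red, nir, brightness_min=0.3, flatness_tol=0.25):
    """Flag pixels that are bright in all bands and have a flat spectrum."""
    stack = np.stack([blue, red, nir]).astype(np.float64)
    darkest = stack.min(axis=0)
    bright = darkest >= brightness_min            # clouds are bright in every band
    spread = stack.max(axis=0) - darkest
    mean = np.maximum(stack.mean(axis=0), 1e-6)   # avoid multiplying by zero
    flat = spread <= flatness_tol * mean          # clouds are roughly white
    return bright & flat

# Hypothetical reflectances: cloud, vegetation (top row), water, bright sand (bottom row)
blue = np.array([[0.80, 0.05], [0.06, 0.35]])
red = np.array([[0.80, 0.04], [0.03, 0.45]])
nir = np.array([[0.85, 0.50], [0.01, 0.55]])

mask = simple_cloud_mask(blue, red, nir)
print(mask)
```

Fixed thresholds like these are brittle over snow, bright roofs and thin cloud, which is one reason the learned approaches listed above are attractive.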
Monitor water levels, coast lines, size of urban areas, wildfire damage. Note, clouds change often too..!
- awesome-remote-sensing-change-detection lists many datasets and publications
- Change-Detection-Review -> A review of change detection methods, including code and open data sets for deep learning
- Unsupervised Changed Detection in Multi-Temporal Satellite Images using PCA & K-Means -> python 2
- LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest
- Unstructured-change-detection-using-CNN
- Siamese neural network to detect changes in aerial images -> uses Keras and VGG16 architecture
- Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery
- QGIS plugin for applying change detection algorithms on high resolution satellite imagery
- LamboiseNet -> Master thesis about change detection in satellite imagery using Deep Learning
- Fully Convolutional Siamese Networks for Change Detection -> with paper
- Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks -> with paper, used the Onera Satellite Change Detection (OSCD) dataset
- STANet -> official implementation of the spatial-temporal attention neural network (STANet) for remote sensing image change detection
- BIT_CD -> Official Pytorch Implementation of Remote Sensing Image Change Detection with Transformers
- IAug_CDNet -> Official Pytorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images
- dpm-rnn-public -> Code implementing a damage mapping method combining satellite data with deep learning
- SenseEarth2020-ChangeDetection -> 1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime; predictions of five HRNet-based segmentation models are ensembled, serving as pseudo labels of unchanged areas
- KPCAMNet -> Python implementation of the paper Unsupervised Change Detection in Multi-temporal VHR Images Based on Deep Kernel PCA Convolutional Mapping Network
- CDLab -> benchmarking deep learning-based change detection methods.
- Siam-NestedUNet -> The pytorch implementation for "SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images"
- SUNet-change_detection -> Implementation of paper SUNet: Change Detection for Heterogeneous Remote Sensing Images from Satellite and UAV Using a Dual-Channel Fully Convolution Network
- temporalCNN -> Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series
- Self-supervised Change Detection in Multi-view Remote Sensing Images
- MFPNet -> Remote Sensing Change Detection Based on Multidirectional Adaptive Feature Fusion and Perceptual Similarity
- pytorch-psetae -> PyTorch implementation of the model presented in Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention
- GitHub for the DIUx xView Detection Challenge -> The xView2 Challenge focuses on automating the process of assessing building damage after a natural disaster
- DASNet -> Dual attentive fully convolutional siamese networks for change detection of high-resolution satellite images
- Self-Attention for Raw Optical Satellite Time Series Classification
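As a baseline for the methods above, the simplest form of change detection is image differencing between two co-registered acquisitions. The sketch below is an assumption-laden illustration rather than any listed project's method: it thresholds the per-pixel difference at the mean plus `k` standard deviations, where `k` and the synthetic data are hypothetical.

```python
import numpy as np

def change_mask(before: np.ndarray, after: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Flag pixels whose difference is unusually large for this image pair."""
    diff = np.abs(after.astype(np.float64) - before.astype(np.float64))
    if diff.ndim == 3:
        # Multi-band input shaped (bands, rows, cols): use the change magnitude
        diff = np.sqrt((diff ** 2).sum(axis=0))
    threshold = diff.mean() + k * diff.std()
    return diff > threshold

# Synthetic single-band example: sensor noise everywhere plus one changed patch
rng = np.random.default_rng(0)
before = rng.normal(0.3, 0.01, size=(50, 50))
after = before + rng.normal(0.0, 0.01, size=(50, 50))
after[10:15, 20:25] += 0.5  # hypothetical new building or flooded field

mask = change_mask(before, after)
print(mask.sum(), "pixels flagged as changed")
```

Differencing only works when the two images are well registered and radiometrically comparable, and as noted above it will happily flag clouds as change.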
The goal is to predict economic activity from satellite imagery rather than conducting labour intensive ground surveys
- Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> Used CNN on Landsat imagery (night & day) to predict asset wealth of African villages
- Combining Satellite Imagery and machine learning to predict poverty -> review article
- Measuring Human and Economic Activity from Satellite Imagery to Support City-Scale Decision-Making during COVID-19 Pandemic
- Predicting Food Security Outcomes Using CNNs for Satellite Tasking
- Crop yield Prediction with Deep Learning -> The necessary code for the paper Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data, AAAI 2017 (Best Student Paper Award in Computational Sustainability Track).
- https://github.com/taspinar/sidl/blob/master/notebooks/2_Detecting_road_and_roadtypes_in_sattelite_images.ipynb
- Measuring the Impacts of Poverty Alleviation Programs with Satellite Imagery and Deep Learning
- Traffic density estimation as a regression problem
- Crop Yield Prediction Using Deep Neural Networks and LSTM and Building a Crop Yield Prediction App in Senegal Using Satellite Imagery and Jupyter Voila
- Advanced Deep Learning Techniques for Predicting Maize Crop Yield using Sentinel-2 Satellite Imagery
- Building a Spatial Model to Classify Global Urbanity Levels -> estimate global urbanity levels from population data, nighttime lights and road networks
Super-resolution attempts to enhance the resolution of an imaging system, and can be applied as a pre-processing step to improve the detection of small objects. For an introduction to this topic read this excellent article. Note that SR techniques operate on a single image or a stack of images/video frames.
- https://medium.com/the-downlinq/super-resolution-on-satellite-imagery-using-deep-learning-part-1-ec5c5cd3cd2 -> Nov 2016 blog post by CosmiQ Works with a nice introduction to the topic. Proposes and demonstrates a new architecture with perturbation layers with practical guidance on the methodology and code. Three part series
- Super Resolution for Satellite Imagery - srcnn repo
- TensorFlow implementation of "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" adapted for working with geospatial data
- Random Forest Super-Resolution (RFSR repo) including sample data
- Super-Resolution (python) Utilities for managing large satellite images
- Enhancing Sentinel 2 images by combining Deep Image Prior and Decrappify. Repo for deep-image-prior and article on decrappify
- The keras docs have a great tutorial - Image Super-Resolution using an Efficient Sub-Pixel CNN
- HighRes-net -> Pytorch implementation of HighRes-net, a neural network for multi-frame super-resolution, trained and tested on the European Space Agency’s Kelvin competition
- super-resolution-using-gan -> Super-Resolution of Sentinel-2 Using Generative Adversarial Networks
- Super-resolution of Multispectral Satellite Images Using Convolutional Neural Networks with paper
- Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network -> enhanced super-resolution GAN (ESRGAN)
- pytorch-enhance -> Library of Image Super-Resolution Models, Datasets, and Metrics for Benchmarking or Pretrained Use. Also check out this implementation in Jax
- Multi-temporal Super-Resolution on Sentinel-2 Imagery using HighRes-Net, repo
- image-super-resolution -> Super-scale your images and run experiments with Residual Dense and Adversarial Networks.
- SSPSR-Pytorch -> A spatial-spectral prior deep network for single hyperspectral image super-resolution
- Sentinel-2 Super-Resolution: High Resolution For All (Bands)
- super-resolution for satellite images using SRCNN
- CinCGAN -> Unofficial Implementation of Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks
- Satellite-image-SRGAN using PyTorch
- Super Resolution in OpenCV
- deepsum -> Deep neural network for Super-resolution of Unregistered Multitemporal images (ESA PROBA-V challenge)
- 3DWDSRNet -> code to reproduce Satellite Image Multi-Frame Super Resolution (MISR) Using 3D Wide-Activation Neural Networks
- RAMS -> Official TensorFlow code for paper Multi-Image Super Resolution of Remotely Sensed Images Using Residual Attention Deep Neural Networks
- TR-MISR -> Transformer-based MISR framework for the PROBA-V super-resolution challenge
- EEGAN -> Edge Enhanced GAN For Remote Sensing Image Super-Resolution, TensorFlow 1.1
- PECNN -> A Progressively Enhanced Network for Video Satellite Imagery Super-Resolution
- Awesome-Super-Resolution -> another 'awesome' repo, getting a little out of date now
Translate images e.g. from SAR to RGB.
- How to Develop a Pix2Pix GAN for Image-to-Image Translation -> how to develop a Pix2Pix model for translating satellite photographs to Google map images. A good intro to GANs
- SAR to RGB Translation using CycleGAN -> uses a CycleGAN model in the ArcGIS API for Python
- A growing problem of ‘deepfake geography’: How AI falsifies satellite images
- Kaggle Pix2Pix Maps -> dataset for pix2pix to take a google map satellite photo and build a street map
- guided-deep-decoder -> With guided deep decoder, you can solve different image pair fusion problems, allowing super-resolution, pansharpening or denoising
- hackathon-ci-2020 -> generate nighttime imagery from infrared observations
- satellite-to-satellite-translation -> VAE-GAN architecture for unsupervised image-to-image translation with shared spectral reconstruction loss. Model is trained on GOES-16/17 and Himawari-8 L1B data
- Anomaly Detection on Mars using a GAN
- Using Generative Adversarial Networks to Address Scarcity of Geospatial Training Data -> GANs perform better than CNNs in segmenting land cover classes outside of the training dataset (article, no code)
- Autoencoders & their Application in Remote Sensing -> intro article and example use case applied to SAR data for land classification
- LEt-SNE -> Dimensionality Reduction and visualization technique that compensates for the curse of dimensionality
- AutoEncoders for Land Cover Classification of Hyperspectral Images -> An autoencoder neural net is used to reduce 103 band data to 60 features (dimensionality reduction), keras. Also read part 2 which implements K-NNC, SVM and Gradient Boosting
The terms self-supervised & unsupervised learning are often used interchangeably in the literature, and describe techniques using unlabelled data. In general, the more classical techniques such as k-means classification or PCA are referred to as unsupervised, whilst newer techniques using CNN feature extraction or autoencoders are referred to as self-supervised. Yann LeCun has described self-supervised/unsupervised learning as the 'base of the cake': If we think of our brain as a cake, then the cake base is unsupervised learning. The machine predicts any part of its input for any observed part, all without the use of labelled data. Supervised learning forms the icing on the cake, and reinforcement learning is the cherry on top.
- Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data -> Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of remote sensing representations. Models trained with SeCo achieve better performance than their ImageNet pre-trained counterparts and state-of-the-art self-supervised learning methods on multiple downstream tasks. paper and repo
- Train SimSiam on Satellite Images using Lightly to generate embeddings that can be used for data exploration and understanding
- Unsupervised Learning for Land Cover Classification in Satellite Imagery
- Tile2Vec: Unsupervised representation learning for spatially distributed data
- Contrastive Sensor Fusion -> Code implementing Contrastive Sensor Fusion, an approach for unsupervised learning of multi-sensor representations targeted at remote sensing imagery.
- hyperspectral-autoencoders -> Tools for training and using unsupervised autoencoders and supervised deep learning classifiers for hyperspectral data, built on tensorflow. Autoencoders are unsupervised neural networks that are useful for a range of applications such as unsupervised feature learning and dimensionality reduction.
- Sentinel-2 image clustering in python
- MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification and code
- A generalizable and accessible approach to machine learning with global satellite imagery nature publication -> MOSAIKS is designed to solve an unlimited number of tasks at planet-scale quickly using feature vectors, with repo. Also see mosaiks-api
These techniques combine multiple data types, e.g. imagery and text data.
- Building a mixed-data neural network in Keras to predict accident locations -> Combining satellite imagery and structured data to predict the location of traffic accidents with a neural network of neural networks
- Multi-Input Deep Neural Networks with PyTorch-Lightning - Combine Image and Tabular Data -> excellent intro article using pytorch, not actually applied to satellite data but to real estate data
Image fusion of low res multispectral with high res pan band.
- Several algorithms described in the ArcGIS docs, with the simplest being taking the mean of the pan and RGB pixel value.
- Does not require DL, classical algos suffice, see this notebook and this kaggle kernel
- https://github.com/mapbox/rio-pansharpen
- PSGAN -> A Generative Adversarial Network for Remote Sensing Image Pan-sharpening, arxiv paper
- Pansharpening-by-Convolutional-Neural-Network
- PBR_filter -> {P}ansharpening by {B}ackground {R}emoval algorithm for sharpening RGB images
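The simplest pansharpening approach mentioned above, taking the mean of the pan and RGB pixel values, can be sketched in a few lines of NumPy. This is an illustrative sketch and not the ArcGIS implementation: it assumes the pan band dimensions are an exact integer multiple of the RGB dimensions, and it uses nearest-neighbour upsampling, which is the crudest option.

```python
import numpy as np

def mean_pansharpen(rgb: np.ndarray, pan: np.ndarray) -> np.ndarray:
    """Average each upsampled RGB band with the high-resolution pan band.

    rgb has shape (3, h, w); pan has shape (H, W) with H and W being
    integer multiples of h and w.
    """
    scale_y, rem_y = divmod(pan.shape[0], rgb.shape[1])
    scale_x, rem_x = divmod(pan.shape[1], rgb.shape[2])
    if rem_y or rem_x:
        raise ValueError("pan dimensions must be integer multiples of rgb dimensions")
    # Nearest-neighbour upsampling: repeat every low-res pixel scale times
    upsampled = rgb.repeat(scale_y, axis=1).repeat(scale_x, axis=2)
    return (upsampled.astype(np.float64) + pan.astype(np.float64)) / 2.0

# Hypothetical 2x2 RGB image and 4x4 pan band
rgb = np.array([[[10, 20], [30, 40]]] * 3, dtype=np.float64)
pan = np.full((4, 4), 100.0)

sharp = mean_pansharpen(rgb, pan)
print(sharp.shape)
print(sharp[0])
```

Averaging injects the spatial detail of the pan band but also shifts the colours, which is why more refined classical methods and the learned approaches listed above exist.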
- Simple band math `ndvi = np.true_divide((ir - r), (ir + r))` but challenging due to the size of the imagery.
- Example notebook local
- Landsat data in cloud optimised (COG) format analysed for NDVI with medium article here.
- Visualise water loss with Holoviews
- Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”
- Seeing Through the Clouds - Predicting Vegetation Indices Using SAR
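The NDVI band math above has two practical pitfalls: integer band data can wrap around on subtraction, and pixels where both bands are zero cause a division by zero. The sketch below addresses both and processes the image in row blocks to limit memory use, which is one way to cope with large imagery. The function names and block size are illustrative assumptions; with real rasters a library such as rasterio can read windows from disk rather than slicing an in-memory array.

```python
import numpy as np

def ndvi(ir: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Compute (ir - r) / (ir + r), returning NaN where ir + r is zero."""
    ir = ir.astype(np.float64)  # cast first so unsigned integers cannot wrap
    r = r.astype(np.float64)
    total = ir + r
    out = np.full(total.shape, np.nan)
    np.true_divide(ir - r, total, out=out, where=total != 0)
    return out

def ndvi_in_blocks(ir: np.ndarray, r: np.ndarray, block_rows: int = 1024) -> np.ndarray:
    """Compute NDVI a block of rows at a time to bound temporary memory."""
    out = np.empty(ir.shape, dtype=np.float64)
    for start in range(0, ir.shape[0], block_rows):
        rows = slice(start, start + block_rows)
        out[rows] = ndvi(ir[rows], r[rows])
    return out

# Hypothetical 8-bit bands, including a no-data pixel where both bands are 0
ir = np.array([[200, 50], [0, 120]], dtype=np.uint8)
r = np.array([[100, 150], [0, 40]], dtype=np.uint8)

result = ndvi(ir, r)
print(result)
```

Values close to 1 indicate dense vegetation, whilst values near or below 0 indicate bare ground, water or built-up areas.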
- A convolutional autoencoder network can be employed for image denoising, read about this on the Keras blog
- jitter-compensation -> Remote Sensing Image Jitter Detection and Compensation Using CNN
- DeblurGANv2 -> Deblurring (Orders-of-Magnitude) Faster and Better
- image-quality-assessment -> CNN to predict the aesthetic and technical quality of images
- Convolutional autoencoder for image denoising -> keras guide
- piq -> a collection of measures and metrics for image quality assessment
Image registration is the process of transforming different sets of data into one coordinate system. Typical use is overlapping images taken at different times or with different cameras.
- Wikipedia article on registration -> register for change detection or image stitching
- Phase correlation is used to estimate the XY translation between two images with sub-pixel accuracy. Can be used for accurate registration of low resolution imagery onto high resolution imagery, or to register a sub-image on a full image -> Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. With additional pre-processing image rotation and scale changes can also be calculated.
- cnn-registration -> An image registration method using convolutional neural network features, written in Python 2 with Tensorflow 1.5
- Detecting Ground Control Points via Convolutional Neural Network for Stereo Matching -> code?
- Image Registration: From SIFT to Deep Learning -> background reading on how the state of the art has evolved from OpenCV to Neural Networks
- ImageCoregistration -> Image registration with openCV using sift and RANSAC
- mapalignment -> Aligning and Updating Cadaster Maps with Remote Sensing Images
- CVPR21-Deep-Lucas-Kanade-Homography -> deep learning pipeline to accurately align challenging multimodality images. The method is based on traditional Lucas-Kanade algorithm with feature maps extracted by deep neural networks.
- imreg_dft -> Image registration using discrete Fourier transform.
- arosics -> Perform automatic subpixel co-registration of two satellite image datasets using phase-correlation, XY translations only.
- eolearn implements phase correlation, feature matching and ECC
- RStoolbox supports Image to Image Co-Registration based on Mutual Information
- Reprojecting the Perseverance landing footage onto satellite imagery
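As a rough illustration of the phase correlation idea described above, the sketch below estimates an integer-pixel XY translation between two same-sized, single-band images using only NumPy FFTs. The function name is hypothetical, and sub-pixel accuracy, rotation and scale are deliberately left out. Libraries such as scikit-image (`skimage.registration.phase_cross_correlation`) provide fuller implementations.

```python
import numpy as np

def phase_correlation_shift(reference, moving):
    """Return the integer (dy, dx) shift that, applied to `moving` with
    np.roll, aligns it with `reference`. Assumes a circular translation."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    # Normalised cross-power spectrum keeps only the phase difference
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    # The correlation peak marks the translation between the images
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    shift = [p - n if p > n // 2 else p for p, n in zip(peak, correlation.shape)]
    return tuple(int(s) for s in shift)
```

For example, if `moving` was produced by `np.roll(reference, (3, -5), axis=(0, 1))`, the function returns `(-3, 5)`, the shift that undoes the offset.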
Measure surface contours.
- Wikipedia DEM article and phase correlation article
- Intro to depth from stereo
- Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview or GeoEye.
- Process of creating a DEM here and here.
- ArcGIS can generate DEMs from stereo images
- https://github.com/MISS3D/s2p -> produces elevation models from images taken by high resolution optical satellites -> demo code on https://gfacciol.github.io/IS18/
- Automatic 3D Reconstruction from Multi-Date Satellite Images
- Semi-global matching with neural networks
- Predict the fate of glaciers
- monodepth - Unsupervised single image depth prediction with CNNs
- Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
- Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package
- Phase correlation in scikit-image
- s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
- The Mapbox API provides images and elevation maps, article here
- Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN
- ResDepth -> A Deep Prior For 3D Reconstruction From High-resolution Satellite Images
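For intuition about depth from stereo, the textbook relation for a rectified pinhole stereo pair is depth = focal length × baseline / disparity. The sketch below applies it to a disparity map. It assumes an idealised frame-camera setup with hypothetical parameter names. Satellite pipelines such as s2p typically work with the sensor's own camera model rather than this simplified geometry.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to depth (metres) for a
    rectified pinhole stereo pair. Non-positive disparities map to inf."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    # depth = f * B / d, evaluated only where the disparity is valid
    np.divide(focal_length_px * baseline_m, disparity, out=depth, where=disparity > 0)
    return depth
```

The inverse relationship shows why small disparity errors matter most for distant surfaces: as disparity shrinks, a one-pixel error changes the estimated depth by a larger amount.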
- The World Needs (a lot) More Thermal Infrared Data from Space
- IR2VI thermal-to-visible image translation framework based on GANs with code
- The finest resolution urban outdoor heat exposure maps in major US cities -> urban microclimate modeling based on high resolution 3D urban models and meteorological data makes it possible to examine how people are exposed to heat stress at a fine spatio-temporal level.
- Object_Classification_in_Thermal_Images -> classification accuracy was improved by adding the object size as a feature directly within the CNN
- Thermal imaging with satellites blog post by Christoph Rieke
- Removing speckle noise from Sentinel-1 SAR using a CNN
- A dataset which is specifically made for deep learning on SAR and optical imagery is the SEN1-2 dataset, which contains corresponding patch pairs of Sentinel 1 (VV) and 2 (RGB) data. It is the largest manually curated dataset of S1 and S2 products, with corresponding labels for land use/land cover mapping, SAR-optical fusion, segmentation and classification tasks. Data: https://mediatum.ub.tum.de/1474000
- so2sat on Tensorflow datasets -> So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
- You do not need clean images for SAR despeckling with deep learning -> How Speckle2Void learned to stop worrying and love the noise
- PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python
- Synthetic Aperture Radar (SAR) Analysis With Clarifai
- Labeled SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode consists of more than 37,000 SAR vignettes divided into ten defined geophysical categories
- Deep Learning and SAR Applications
This section includes tips and ideas I have picked up from other practitioners including ai-fast-track, FraPochetti & the IceVision community
- Almost all imagery data on the internet is in RGB format, and common techniques designed for working with this 3 band imagery may fail or need significant adaptation to work with multiband data (e.g. 13-band Sentinel 2)
- In general, classification and object detection models are created using transfer learning, where the majority of the weights are not updated in training but have been pre-computed using standard vision datasets such as ImageNet
- Since satellite images are typically very large, it is common to tile them before processing. Alternatively checkout Fully Convolutional Image Classification on Arbitrary Sized Image -> TLDR replace the fully-connected layer with a convolution-layer
- Where you have small sample sizes, e.g. for a small object class which may be under represented in your training dataset, use image augmentation
- In general, larger models will outperform smaller models, particularly on challenging tasks such as detecting small objects
- If model performance is unsatisfactory, try to increase your dataset size before switching to another model architecture
- In training, whenever possible increase the batch size, as small batch sizes produce poor normalization statistics
- The vast majority of the literature uses supervised learning with the requirement for large volumes of annotated data, which is a bottleneck to development and deployment. We are just starting to see self-supervised approaches applied to remote sensing data
- 4-ways-to-improve-class-imbalance discusses the pros and cons of several rebalancing techniques, applied to an aerial dataset. Reason to read: models can reach an accuracy ceiling where majority classes are easily predicted but minority classes poorly predicted. Overall model accuracy may not improve until steps are taken to account for class imbalance.
- For general guidance on dataset size see this issue
- Read A Recipe for Training Neural Networks by Andrej Karpathy
- Seven steps towards a satellite imagery dataset
- Implementing Transfer Learning from RGB to Multi-channel Imagery -> takes a resnet50 model pre-trained on an input of 224x224 pixels with 3 channels (RGB) and updates it for a new input of 480x400 pixels and 15 channels (12 new + RGB) using keras
- How to implement augmentations for Multispectral Satellite Images Segmentation using Fastai-v2 and Albumentations
- Principal Component Analysis: In-depth understanding through image visualization applied to Landsat TM images, with repo
- Leveraging Geolocation Data for Machine Learning: Essential Techniques -> A Gentle Guide to Feature Engineering and Visualization with Geospatial data, in Plain English
- 3 Tips to Optimize Your Machine Learning Project for Data Labeling
- Image Classification Labeling: Single Class versus Multiple Class Projects
- Labeling Satellite Imagery for Machine Learning
- Image Augmentations for Aerial Datasets
- Leveraging satellite imagery for machine learning computer vision applications
- Best Practices for Preparing and Augmenting Image Data for CNNs
- Using TensorBoard While Training Land Cover Models with Satellite Imagery
- An Overview of Model Compression Techniques for Deep Learning in Space
- Visualise Embeddings with Tensorboard -> also checkout the Tensorflow Embedding Projector
- Introduction to Satellite Image Augmentation with Generative Adversarial Networks - video
- Use Gradio and W&B together to monitor training and view predictions
- Every important satellite imagery analysis project is challenging, but here are ten straightforward steps to get started
- Challenges with SpaceNet 4 off-nadir satellite imagery: Look angle and target azimuth angle -> building prediction in images taken at nearly identical look angles — for example, 29 and 30 degrees — produced radically different performance scores.
- How not to test your deep learning algorithm? - bad ideas to avoid
- AI products and remote sensing: yes, it is hard and yes, you need a good infra -> advice on building an in-house data annotation service
- Boosting object detection performance through ensembling on satellite imagery
- How to use deep learning on satellite imagery — Playing with the loss function
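The tip above about tiling large images before processing can be sketched as follows. This is a minimal, non-overlapping tiler that zero-pads the right and bottom edges so that every tile has the same shape. The function name and the padding choice are assumptions for illustration, and many pipelines use overlapping tiles instead.

```python
import numpy as np

def tile_image(image, tile_size):
    """Yield (row, col, tile) for non-overlapping square tiles.

    `image` is (height, width) or (height, width, bands). Edges are
    zero-padded so every tile is tile_size x tile_size.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    # Pad only the two spatial axes, leaving any band axis untouched
    pad_spec = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_spec, mode="constant")
    for row in range(0, padded.shape[0], tile_size):
        for col in range(0, padded.shape[1], tile_size):
            yield row, col, padded[row:row + tile_size, col:col + tile_size]
```

The `row` and `col` offsets make it possible to stitch per-tile predictions back into the original image coordinates.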
Warning: satellite image files can be LARGE, even a small data set may comprise 50 GB of imagery
- As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see wikipedia.
- 13 bands, Spatial resolution of 10 m, 20 m and 60 m, 290 km swath, the temporal resolution is 5 days
- awesome-sentinel - a curated list of awesome tools, tutorials and APIs related to data from the Copernicus Sentinel Satellites.
- Sentinel-2 Cloud-Optimized GeoTIFFs and Sentinel-2 L2A 120m Mosaic
- Open access data on GCP and paid access via sentinel-hub and python-api.
- Example loading sentinel data in a notebook
- so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
- eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Dataset and usage in EuroSAT: Land Use and Land Cover Classification with Sentinel-2, where a CNN achieves a classification accuracy 98.57%.
- bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
- Jupyter Notebooks for working with Sentinel-5P Level 2 data stored on S3. The data can be browsed here
- Sentinel NetCDF data
- Analyzing Sentinel-2 satellite data in Python with Keras
- Xarray backend to Copernicus Sentinel-1 satellite data products
- Long running US program -> see Wikipedia
- 8 bands, 15 to 60 meters, 185km swath, the temporal resolution is 16 days
- Landsat 4, 5, 7, and 8 imagery on Google, see the GCP bucket here, with Landsat 8 imagery in COG format analysed in this notebook
- Landsat 8 imagery on AWS, with many tutorials and tools listed
- https://github.com/kylebarron/landsat-mosaic-latest -> Auto-updating cloudless Landsat 8 mosaic from AWS SNS notifications
- Visualise landsat imagery using Datashader
- Landsat-mosaic-tiler -> This repo hosts all the code for landsatlive.live website and APIs.
- Satellites owned by Maxar (formerly DigitalGlobe)
- Open Data images for humanitarian response
- Maxar ARD (COG plus data masks, with STAC) sample data in S3
- Dataset on AWS -> see this getting started notebook and this notebook on the off-Nadir dataset
- cloud_optimized_geotif here used in the 3D modelling notebook here.
- WorldView cloud optimized geotiffs used in the 3D modelling notebook here.
- For more Worldview imagery see Kaggle DSTL competition.
- Planet’s high-resolution, analysis-ready mosaics of the world’s tropics, supported through Norway’s International Climate & Forests Initiative. BBC coverage
- Planet have made imagery available via kaggle competitions
- Land use classification dataset with 21 classes and 100 RGB TIFF images for each class
- Each image measures 256x256 pixels with a pixel resolution of 1 foot
- http://weegee.vision.ucmerced.edu/datasets/landuse.html
- Available as a Tensorflow dataset -> https://www.tensorflow.org/datasets/catalog/uc_merced
- Also available as a multi-label dataset
- Read Vision Transformers for Remote Sensing Image Classification where a Vision Transformer classifier achieves 98.49% classification accuracy on Merced
- Land use classification dataset with 38 classes and 800 RGB JPG images for each class
- https://sites.google.com/view/zhouwx/dataset?authuser=0
- Publication: PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval
- Spacenet is an online hub for data, challenges, algorithms, and tools.
- spacenet.ai website covering the series of SpaceNet challenges, lots of useful resources (blog, video and papers)
- Getting Started with SpaceNet
- Package of utilities to assist working with the SpaceNet dataset.
- The SpaceNet 7 Multi-Temporal Urban Development Challenge: Dataset Release
- SpaceNet - WorldView-3 article here, and semantic segmentation using Raster Vision
Kaggle hosts over 100 satellite image datasets, search results here. The kaggle blog is an interesting read.
- https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
- 3-5 meter resolution GeoTIFF images from planet Dove satellite constellation
- 12 classes including - cloudy, primary + waterway etc
- 1st place winner interview - used 11 custom CNNs
- FastAI Multi-label image classification
- Multi-Label Classification of Satellite Photos of the Amazon Rainforest
- https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
- Rating - medium, many good examples (see the Discussion as well as kernels), but as this competition was run a couple of years ago many examples use python 2
- WorldView 3 - 45 satellite images covering 1km x 1km in both 3 (i.e. RGB) and 16-band (400nm - SWIR) images
- 10 Labelled classes include - Buildings, Road, Trees, Crops, Waterway, Vehicles
- Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for particular target (e.g. roads, trees)
- Deepsense 4th place solution
- Entry by lopuhin using UNet with batch-normalization
- https://www.kaggle.com/c/airbus-ship-detection/overview
- Rating - medium, most solutions using deep-learning, many kernels, good example kernel
- I believe there was a problem with this dataset, which led to many complaints that the competition was ruined
- https://www.kaggle.com/c/draper-satellite-image-chronology/data
- Rating - hard. Not many useful kernels.
- Images are grouped into sets of five, each of which have the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.
- Kaggle interviews for entrants who used XGBOOST and a hybrid human/ML approach
Not satellite but airborne imagery. Each sample image is 28x28 pixels and consists of 4 bands - red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Each image patch is size normalized to 28x28 pixels. Data in .mat Matlab format. JPEG?
- Imagery source
- Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three
- Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.
- Deep Gradient Boosted Learning article
In this challenge, you will build a model to classify cloud organization patterns from satellite images.
- https://www.kaggle.com/c/understanding_cloud_organization/
- 3rd place solution on Github by naivelamb
- https://www.kaggle.com/airbusgeo/airbus-oil-storage-detection-dataset
- Oil-Storage Tank Instance Segmentation with Mask R-CNN with accompanying article
- https://www.kaggle.com/kmader/satellite-images-of-hurricane-damage
- https://github.com/dbuscombe-usgs/HurricaneHarvey_buildingdamage
- https://www.kaggle.com/reubencpereira/spatial-data-repo -> Satellite + loan data
- https://www.kaggle.com/towardsentropy/oil-storage-tanks -> Image data of industrial tanks with bounding box annotations, estimate tank fill % from shadows
- https://www.kaggle.com/rhammell/ships-in-satellite-imagery -> Classify ships in San Francisco Bay using Planet satellite imagery
- https://www.kaggle.com/rhammell/planesnet -> Detect aircraft in Planet satellite image chips
- https://www.kaggle.com/datamunge/overheadmnist -> A Benchmark Satellite Dataset as Drop-In Replacement for MNIST
- https://www.kaggle.com/balraj98/deepglobe-land-cover-classification-dataset -> Land Cover Classification Dataset from DeepGlobe Challenge
- resisc45 - RESISC45 dataset is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class.
- eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples.
- bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
- Earth on AWS is the AWS equivalent of Google Earth Engine
- Currently 36 satellite datasets on the Registry of Open Data on AWS
- USBuildingFootprints -> computer generated building footprints in all 50 US states, GeoJSON format, generated using semantic segmentation
- Checkout Microsoft's Planetary Computer project
Since there is a whole community around GEE I will not reproduce it here but list very select references. Get started at https://developers.google.com/earth-engine/
- Various imagery and climate datasets, including Landsat & Sentinel imagery
- awesome-google-earth-engine & awesome-earth-engine-apps
- How to Use Google Earth Engine and Python API to Export Images to Roboflow -> to acquire training data
- Reduce Satellite Image Resolution with Google Earth Engine -> a crucial step before applying machine learning to satellite imagery
- ee-fastapi is a simple FastAPI web application for performing flood detection using Google Earth Engine in the backend.
- How to Download High-Resolution Satellite Data for Anywhere on Earth
- https://www.radiant.earth/
- Datasets and also models on https://mlhub.earth/
- Database of 15,000 high-definition images with 1 million labelled ‘scenes’ will be open to the international community in June 2021
- FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery
- Download at gaofen-challenge.com
- Shuttle Radar Topography Mission, search online at usgs.gov
- Copernicus Digital Elevation Model (DEM) on S3, represents the surface of the Earth including buildings, infrastructure and vegetation. Data is provided as Cloud Optimized GeoTIFFs. link
- UK metoffice -> https://www.metoffice.gov.uk/datapoint
- NASA (make request and emailed when ready) -> https://search.earthdata.nasa.gov
- NOAA (requires BigQuery) -> https://www.kaggle.com/noaa/goes16/home
- Time series weather data for several US cities -> https://www.kaggle.com/selfishgene/historical-hourly-weather-data
- BreizhCrops -> A Time Series Dataset for Crop Type Mapping
- The SeCo dataset contains image patches from Sentinel-2 tiles captured at different timestamps at each geographical location. Download SeCo here
- Onera Satellite Change Detection Dataset comprises 24 pairs of multispectral images taken from the Sentinel-2 satellites between 2015 and 2018
- SYSU-CD -> The dataset contains 20000 pairs of 0.5-m aerial images of size 256×256 taken between the years 2007 and 2014 in Hong Kong
- Many on https://www.visualdata.io
- AU-AIR dataset -> a multi-modal UAV dataset for object detection.
- ERA -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos.
- Aerial Maritime Drone Dataset
- Stanford Drone Dataset
- RetinaNet for pedestrian detection
- EmergencyNet -> identify fire and other emergencies from a drone
- OpenDroneMap -> generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images.
- Dataset of thermal and visible aerial images for multi-modal and multi-spectral image registration and fusion -> The dataset consists of 30 visible images and their metadata, 80 thermal images and their metadata, and a visible georeferenced orthoimage.
- BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos -> TIR videos of humans and animals with several challenging scenarios like scale variations, background clutter due to thermal reflections, large camera rotations, and motion blur
- ERA: A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos
- The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation
- RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft.
- Checkout Microsoft AirSim, which is a simulator for drones, cars and more, built on Unreal Engine
- Combining Synthetic Data with Real Data to Improve Detection Results in Satellite Imagery
- Synthinel -> synthetic overhead imagery with full pixel-wise building labels, created using ESRI CityEngine
- BlenderGIS could be used for synthetic data generation
- https://www.azavea.com/projects/raster-vision/
- An open source Python framework for building computer vision models on aerial, satellite, and other large imagery sets.
- Accessible through the Raster Foundry
- Example use cases on open data
- torchrs
- PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image-to-Image Translation, etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors
- https://github.com/developmentseed/chip-n-scale-queue-arranger
- an orchestration pipeline for running machine learning inference at scale
- Supports fastai models
- http://spaceml.org/
- A Machine Learning toolbox and developer community building the next generation AI applications for space science and exploration.
- TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch
- https://github.com/nshaud/DeepNetsForEO
- Uses SegNET for working on remote sensing images using deep learning.
- https://github.com/developmentseed/skynet-data
- Data pipeline for machine learning with OpenStreetMap
- https://github.com/mapbox/robosat
- Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
- robosat-jupyter-notebook -> walks through all of the steps in an excellent blog post on the Robosat feature extraction and machine learning pipeline.
- Note there is/was fork of Robosat, originally named RoboSat.pink, and subsequently neat-EO.pink although this appears to be dead/archived
- https://github.com/trailbehind/DeepOSM
- Train a deep learning net with OpenStreetMap features and satellite imagery.
- Compute and data storage are moving to the cloud
- A combination of batch processing on clusters and serverless functions is common for routine compute tasks
- Custom hardware is being developed for rapid training and inferencing with deep learning models
- Traditional data formats aren't designed for processing on the cloud, so new standards are evolving such as COG and STAC
- Read about how Planet and Airbus use Google Cloud as their backend
- Google Earth Engine and Microsoft Planetary Computer are democratising access to huge compute platforms
- Whilst the combo of python and keras/pytorch are currently preeminent, new python libraries such as Jax and alternative languages such as Julia are showing serious promise
- This article discusses some of the available platforms
- Pangeo -> There is no single software package called “pangeo”; rather, the Pangeo project serves as a coordination point between scientists, software, and computing infrastructure. Includes open source resources for parallel processing using Dask and Xarray. Pangeo recently announced their 2.0 goals: pivoting away from directly operating cloud-based JupyterHubs, and towards education and research
- Airbus Sandbox -> will provide access to imagery
- Descartes Labs -> access to EO imagery from a variety of providers via python API
- DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL.
- Planet have a Jupyter notebook platform which can be deployed locally.
- eurodatacube.com -> data & platform for EO analytics in Jupyter env, paid
- up42 is a developer platform and marketplace, offering all the building blocks for powerful, scalable geospatial products
- Microsoft Planetary Computer -> direct Google Earth Engine competitor in the making?
- eofactory.ai -> supports multi public and private data sources that can be used to analyse and extract information
A GPU is required for training deep learning models (but not necessarily for inferencing), and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter development environments on the fastai site. I personally use Colab Pro with data hosted on Google Drive, or Sagemaker if I have very long running training jobs.
- Colaboratory notebooks with GPU as a backend for free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
- Also a pro tier for $10 a month -> https://colab.research.google.com/signup
- Tensorflow, pytorch & fastai available but you may need to update them
- Colab Alive is a chrome extension that keeps Colab notebooks alive.
- colab-ssh -> lets you ssh to a colab instance like it’s an EC2 machine and install packages that require full linux functionality
- Free to use
- GPU Kernels - may run for 1 hour
- Tensorflow, pytorch & fastai available but you may need to update them
- Advantage that many datasets are already available
An overview of the most relevant services provided by the main cloud providers. This section is limited since I personally use AWS and have a small amount of experience with Google. Also consider Microsoft Azure.
- Host your data on S3 and metadata in a db such as postgres
- For batch processing use Batch. GPU instances are available for batch deep learning inferencing. See how Rastervision implement this here
- If processing can be performed in 15 minutes or less, serverless Lambda functions are an attractive option owing to their ability to scale. Note that lambda may not be a particularly quick solution for deep learning applications, since you do not have the option to batch inference on a GPU. Creating a docker container with all the required dependencies can be a challenge. To get started read Using container images to run PyTorch models in AWS Lambda and for an image classification example checkout this repo
- Use Glue for data preprocessing
- To orchestrate basic data pipelines use Step Functions. Use the AWS Step Functions Workflow Studio to get started. Read Orchestrating and Monitoring Complex, Long-running Workflows Using AWS Step Functions. Note that step functions are defined in JSON
- If step functions are too limited or you want to write pipelines in python and use Directed Acyclic Graphs (DAGs) for workflow management, checkout hosted AWS managed Airflow. Read Orchestrate XGBoost ML Pipelines with Amazon Managed Workflows for Apache Airflow and checkout amazon-mwaa-examples
- Sagemaker includes a hosted Jupyter environment for training of ML models. There are also tools for deployment of models, using docker.
- Deep learning AMIs are EC2 instances with deep learning frameworks preinstalled. They do require more setup from the user than Sagemaker but in return allow access to the underlying hardware, which makes debugging issues more straightforward. There is a good guide to setting up your AMI instance on the Keras blog
- Specifically created for deep learning inferencing is AWS Inferentia
- Rekognition custom labels is a 'no code' annotation, training and inferencing service. Read Training models using Satellite (Sentinel-2) imagery on Amazon Rekognition Custom Labels. For a comparison with Azure and Google alternatives read this article
- When developing you will definitely want to use boto3 and probably aws-data-wrangler
- For managing infrastructure use Terraform. Alternatively if you wish to use TypeScript, JavaScript, Python, Java, or C# checkout AWS CDK, although I found relatively few examples to get going using python
- AWS Ground Station now supports data delivery to Amazon S3
- Redshift is a fast, scalable data warehouse that can extend queries to S3. Redshift is based on PostgreSQL but has some differences. Redshift supports geospatial data.
- AWS App Runner enables quick deployment of containers as apps
- For storage use Cloud Storage (AWS S3 equivalent)
- For data warehousing use BigQuery (AWS Redshift equivalent). Visualize massive spatial datasets directly in BigQuery using CARTO
- For model training use Vertex (AWS Sagemaker equivalent)
- For containerised apps use Cloud Run (AWS App Runner equivalent but can scale to zero)
This section discusses how to get a trained machine learning & specifically deep learning model into production. For an overview on serving deep learning models checkout Practical-Deep-Learning-on-the-Cloud. There are many options if you are happy to dedicate a server, although you may want a GPU for batch processing. For serverless consider AWS lambda.
A common approach to serving up deep learning model inference code is to wrap it in a REST API. The API can be implemented in python (flask or FastAPI), and hosted on a dedicated server e.g. EC2 instance. Note that making this a scalable solution will require significant experience.
- Basic API: https://blog.keras.io/building-a-simple-keras-deep-learning-rest-api.html with code here
- Advanced API with request queuing: https://www.pyimagesearch.com/2018/01/29/scalable-keras-deep-learning-rest-api/
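To make the REST API pattern concrete without depending on a web framework, the sketch below uses a plain WSGI callable from the Python standard library rather than flask or FastAPI. The `/predict` route, the JSON payload shape and the `predict` function are all hypothetical stand-ins. A real service would load a trained model and decode an image instead of averaging numbers.

```python
import json

def predict(pixels):
    # Hypothetical stand-in for model inference: returns the mean pixel value
    return {"mean_pixel_value": sum(pixels) / len(pixels)}

def _json_response(start_response, status, body):
    data = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(data)))]
    start_response(status, headers)
    return [data]

def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body {"pixels": [...]}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        pixels = [float(p) for p in payload["pixels"]]
        if not pixels:
            raise ValueError("empty input")
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key or non-numeric values all end up here
        return _json_response(start_response, "400 Bad Request",
                              {"error": "expected a JSON body with a non-empty 'pixels' list"})
    return _json_response(start_response, "200 OK", predict(pixels))
```

To try it locally, `app` can be passed to `wsgiref.simple_server.make_server`. The same validate, infer and respond structure carries over to a flask or FastAPI route.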
If you are happy to live exclusively in the Tensorflow or Pytorch ecosystem, these are good options
- Tensorflow serving is limited to Tensorflow models
- Pytorch serve is easy to use, limited to Pytorch models, can be deployed via AWS Sagemaker
- The Triton Inference Server provides an optimized cloud and edge inferencing solution
- Supports TensorFlow, ONNX, PyTorch TorchScript and OpenVINO model formats. Both TensorFlow 1.x and TensorFlow 2.x are supported.
- Read CAPE Analytics Uses Computer Vision to Put Geospatial Data and Risk Information in Hands of Property Insurance Companies
- Available on the AWS Marketplace
- GeoServer -> an open source server for sharing geospatial data
- Open Data Cube - serve up cubes of data https://www.opendatacube.org/
- https://terria.io/ for pretty catalogues
- Sentinel-hub eo-browser
- Large datasets may come in HDF5 format, which can be viewed with HDFView -> https://www.hdfgroup.org/downloads/hdfview/
- Climate data is often in NetCDF format, which can be opened using xarray
- The xarray docs list a number of ways that data can be stored and loaded.
- TileDB -> a 'Universal Data Engine' to store, analyze and share any data (beyond tables), with any API or tool (beyond SQL) at planet-scale (beyond clusters), open source and managed options. Recently hiring to work with xarray, dask, netCDF and cloud native storage
- BigVector database -> A fully-managed, highly-scalable, and cost-effective database for vectors. Vectorize structured data or orbital imagery and discover new insights
- Read about Serverless PostGIS on AWS Aurora
- Hub -> The fastest way to store, access & manage datasets with version-control for PyTorch/TensorFlow. Works locally or on any cloud. Read Faster Machine Learning Using Hub by Activeloop: A code walkthrough of using the hub package for satellite imagery
- A Comparison of Spatial Functions: PostGIS, Athena, PrestoDB, BigQuery vs RedShift
- Unfolded Studio -> visualization platform building on open source geospatial technologies including kepler.gl, deck.gl and H3. Processing is performed browser side enabling very responsive visualisations.
A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF that supports HTTP range requests, enabling downloading of specific tiles rather than the full file. COGs generally work normally in GIS software such as QGIS, but are larger than regular GeoTIFFs.
- https://www.cogeo.org/
- cog-best-practices
- COGs in production
- rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
- aiocogeo -> Asynchronous cogeotiff reader (python asyncio)
- Landsat data in cloud optimised (COG) format analysed for NDVI with Medium article Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo.
- Working with COGS and STAC in python using geemap
- Load, Experiment, and Download Cloud Optimized Geotiffs (COG) using Python with Google Colab -> short read which covers finding COGS, opening with Rasterio and doing some basic manipulations, all in a Colab Notebook.
- Exploring USGS Terrain Data in COG format using hvPlot -> local COG from public AWS bucket, open with rioxarray, visualise with hvplot. See the Jupyter notebook
- aws-lambda-docker-rasterio -> AWS Lambda Container Image with Python Rasterio for querying Cloud Optimised GeoTiffs. See this presentation
- cogbeam -> a python based Apache Beam pipeline, optimized for Google Cloud Dataflow, which aims to expedite the conversion of traditional GeoTIFFs into COGs
- cogserver -> Expose a GDAL file as a HTTP accessible on-the-fly COG
- Displaying a gridded dataset on a web-based map - Step by step guide for displaying large GeoTIFFs, using Holoviews, Bokeh, and Datashader
- cog_worker -> Scalable arbitrary analysis on COGs
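To illustrate why HTTP range requests matter for COGs, here is a small sketch of how a client might turn one tile's byte offset and length into a `Range` header. TIFF files record these values per tile in the `TileOffsets` and `TileByteCounts` tags. The offsets, byte counts and file size below are made-up values, and in practice readers such as GDAL or Rasterio handle all of this for you.

```python
def tile_range_header(tile_offsets, tile_byte_counts, tile_index):
    """Build the HTTP Range header needed to fetch a single tile.

    tile_offsets and tile_byte_counts mirror the TIFF TileOffsets and
    TileByteCounts tags: one entry per tile, in bytes.
    """
    start = tile_offsets[tile_index]
    end = start + tile_byte_counts[tile_index] - 1  # the Range end is inclusive
    return {"Range": f"bytes={start}-{end}"}


def fraction_downloaded(tile_byte_counts, tile_indices, file_size):
    """Fraction of the file transferred when only some tiles are requested."""
    return sum(tile_byte_counts[i] for i in tile_indices) / file_size


# Hypothetical values for a 4-tile file with a 1000 byte header
offsets = [1000, 51000, 99000, 150000]
byte_counts = [50000, 48000, 51000, 50000]
file_size = 200000

print(tile_range_header(offsets, byte_counts, 2))  # {'Range': 'bytes=99000-149999'}
print(fraction_downloaded(byte_counts, [2], file_size))  # 0.255
```

In this toy case a client interested in one tile transfers about a quarter of the file, plus a small request for the header that holds the tile index.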
The STAC specification provides a common metadata specification, API, and catalog format to describe geospatial assets, so they can be more easily indexed and discovered.
- Spec at https://github.com/radiantearth/stac-spec
- STAC 1.0.0: The State of the STAC Software Ecosystem
- Planet Disaster Data catalogue has the catalogue source on Github and uses the stac-browser
- Getting Started with STAC APIs intro article
- SpatioTemporal Asset Catalog API specification -> an API to make geospatial assets openly searchable and crawlable
- stacindex -> STAC Catalogs, Collections, APIs, Software and Tools
- Several useful repos on https://github.com/sat-utils
- Intake-STAC -> Intake-STAC provides an opinionated way for users to load Assets from STAC catalogs into the scientific Python ecosystem. It uses the intake-xarray plugin and supports several file formats including GeoTIFF, netCDF, GRIB, and OpenDAP.
- sat-utils/sat-search -> Sat-search is a Python 3 library and a command line tool for discovering and downloading publicly available satellite imagery using STAC compliant API
- franklin -> A STAC/OGC API Features Web Service focused on ease-of-use for end-users.
- stacframes -> A Python library for working with STAC Catalogs via Pandas DataFrames
- sat-api-pg -> A Postgres backed STAC API
- stactools -> Command line utility and Python library for STAC
- pystac -> Python library for working with any STAC Catalog
- STAC Examples for Nightlights data -> minimal example STAC implementation for the Light Every Night dataset of all VIIRS DNB and DMSP-OLS nighttime satellite data
- stackstac -> Turn a STAC catalog into a dask-based xarray
- stac-fastapi -> STAC API implementation with FastAPI
- ml-aoi -> An Item and Collection extension to provide labeled training data for machine learning models
- Using STAC to catalog machine learning training data
- eoAPI -> Earth Observation API with STAC + dynamic Raster/Vector Tiler
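To make the idea of a common metadata format concrete, the sketch below builds a minimal STAC Item as a Python dictionary and filters a list of Items by bounding box, which is roughly what a STAC API search does. The top-level field names follow the STAC Item spec as I understand it. The ids, coordinates and asset URLs are invented for illustration, and a library such as pystac should be preferred for real catalogs since it adds validation.

```python
def make_item(item_id, bbox, datetime_utc, asset_href):
    """Build a minimal STAC Item (a GeoJSON Feature with extra STAC fields)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "visual": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            },
        },
    }


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox):
    """Return the ids of Items whose bbox intersects the query bbox."""
    return [item["id"] for item in items if bbox_intersects(item["bbox"], bbox)]


# Hypothetical scenes and asset URLs
items = [
    make_item("scene-london", (-0.5, 51.3, 0.3, 51.7),
              "2021-06-30T10:00:00Z", "https://example.com/london.tif"),
    make_item("scene-paris", (2.2, 48.8, 2.5, 48.9),
              "2021-07-01T10:30:00Z", "https://example.com/paris.tif"),
]

print(search(items, (-1.0, 51.0, 0.0, 52.0)))  # ['scene-london']
```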
For supervised machine learning, you will require annotated images. For example if you are performing object detection you will need to annotate images with bounding boxes. Check that your annotation tool of choice supports large image (likely geotiff) files, as not all will. Note that GeoJSON is widely used by remote sensing researchers but this annotation format is not commonly supported in general computer vision frameworks, and in practice you may have to convert the annotation format to use the data with your chosen framework. There are both closed and open source tools for creating and converting annotation formats. Some of these tools are simply for performing annotation, whilst others add features such as dataset management and versioning.
Start with labelImg or labelme if you are annotating solo, or CVAT if you are in a team.
- If you are considering building an in house annotation platform read this article. The platform it describes uses a PostGIS database, the GeoJSON format and GIS standards in a stateless architecture.
- labelImg is the classic desktop tool, limited to bounding boxes for object detection. Also check out roLabelImg which supports rotated rectangle regions, which often occur in aerial imagery.
- Labelme is a simple desktop app for polygonal annotation, but note it outputs annotations in a custom LabelMe JSON format which you will need to convert. Read Labelme Image Annotation for Geotiffs
- CVAT supports object detection, segmentation and classification via a local web app. There is an open issue to support large TIFF files. This article on Roboflow gives a good intro to CVAT.
- Create your own annotation tool using Bokeh Holoviews
- geolabel-maker -> combine satellite or aerial imagery with vector spatial data to create your own ground-truth dataset in the COCO format for deep-learning models
- VoTT -> an electron app for building end to end Object Detection Models from Images and Videos, by Microsoft
- Label Studio is a multi-type data labeling and annotation tool with standardized output format, webpage at labelstud.io
- Deeplabel is a cross-platform tool for annotating images with labelled bounding boxes. Deeplabel also supports running inference using state-of-the-art object detection models like Faster-RCNN and YOLOv4. With support out-of-the-box for CUDA, you can quickly label an entire dataset using an existing model.
- Alturos.ImageAnnotation is a collaborative tool for labeling image data on S3 for yolo
- rectlabel is a desktop app for MacOS to annotate images for bounding box object detection and segmentation, paid and free (rectlabel-lite) versions
- pigeonXT can be used to create custom image classification annotators within Jupyter notebooks
- ipyannotations -> Image annotations in python using jupyter notebooks
- diffgram supports cloud backends, also available as hosted service
- Label-Detect -> a graphical image annotation tool which can also be used to train and test on large satellite images, fork of the popular labelImg tool
- Swipe-Labeler -> Swipe Labeler is a Graphical User Interface based tool that allows rapid labeling of image data
- SuperAnnotate can be run locally or used via a cloud service
- dash_doodler -> A web application built with plotly/dash for image segmentation with minimal supervision
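As an example of the GeoJSON conversion step mentioned at the top of this section, the sketch below converts a GeoJSON polygon in map coordinates into a pixel bounding box, using a GDAL-style geotransform for a north-up image. The geotransform values and the polygon are hypothetical, and rotated images or reprojection between coordinate systems are not handled.

```python
import math


def world_to_pixel(x, y, geotransform):
    """Map coordinates -> fractional (col, row) for a north-up image.

    geotransform follows the GDAL ordering:
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height)
    """
    x_origin, pixel_width, _, y_origin, _, pixel_height = geotransform
    return (x - x_origin) / pixel_width, (y - y_origin) / pixel_height


def polygon_to_pixel_bbox(geometry, geotransform):
    """Return (xmin, ymin, xmax, ymax) in pixels for a GeoJSON Polygon."""
    exterior = geometry["coordinates"][0]  # the first ring is the exterior
    # GeoJSON positions may carry an optional third (elevation) value
    pixels = [world_to_pixel(x, y, geotransform) for x, y, *_ in exterior]
    cols = [col for col, _ in pixels]
    rows = [row for _, row in pixels]
    return (math.floor(min(cols)), math.floor(min(rows)),
            math.ceil(max(cols)), math.ceil(max(rows)))


# Hypothetical 10 m resolution image with its top-left corner at (500000, 4100000)
geotransform = (500000.0, 10.0, 0.0, 4100000.0, 0.0, -10.0)
building = {
    "type": "Polygon",
    "coordinates": [[
        [500100.0, 4099900.0], [500250.0, 4099900.0],
        [500250.0, 4099780.0], [500100.0, 4099780.0],
        [500100.0, 4099900.0],
    ]],
}

print(polygon_to_pixel_bbox(building, geotransform))  # (10, 10, 25, 22)
```

The resulting box is in the corner format used by PASCAL VOC, so it can be passed on to whichever converter your framework requires.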
Generally more fully featured than open source tools, often adding model assisted labelling & integration with providers of annotation as a service (outsourced annotation). There are many companies competing in this space, so I just list a few I have experience with.
- GroundWork is designed for annotating and labeling geospatial data like satellite imagery, from Azavea
- Roboflow can be used to convert between annotation formats & manage datasets, as well as train and deploy custom models. Free tier quite useful
- supervise.ly is one of the more fully featured platforms, decent free tier
- AWS supports image annotation via the Rekognition Custom Labels console
- The labelbox.com free tier is quite generous
Note there are many annotation formats, although PASCAL VOC and coco-json are the most commonly used.
- PASCAL VOC format: XML files in the format used by ImageNet
- coco-json format: JSON in the format used by the 2015 COCO dataset
- YOLO Darknet TXT format: contains one text file per image, used by YOLO
- TensorFlow TFRecord: a TensorFlow-specific binary file format used by the TensorFlow Object Detection API
- Many more formats listed here
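The main practical difference between these formats is how a bounding box is written down. The sketch below converts a PASCAL VOC style box into the COCO and YOLO conventions. The example box, image size and class id are arbitrary, and a tool such as Roboflow or a framework's own converter is a better choice for whole datasets.

```python
def voc_to_coco(box):
    """PASCAL VOC (xmin, ymin, xmax, ymax) -> COCO (x, y, width, height), in pixels."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(box, image_width, image_height):
    """PASCAL VOC -> YOLO (x_center, y_center, width, height), normalised to 0-1."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / image_width,
        (ymin + ymax) / 2 / image_height,
        (xmax - xmin) / image_width,
        (ymax - ymin) / image_height,
    )


def yolo_line(class_id, yolo_box):
    """Format one object as a line of a YOLO Darknet TXT file."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in yolo_box)


# Arbitrary example: one box on a 1000 x 800 pixel image
voc_box = (100, 200, 300, 400)

print(voc_to_coco(voc_box))  # (100, 200, 200, 200)
print(voc_to_yolo(voc_box, 1000, 800))  # (0.2, 0.375, 0.2, 0.25)
print(yolo_line(0, voc_to_yolo(voc_box, 1000, 800)))  # 0 0.200000 0.375000 0.200000 0.250000
```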
Many of these companies & products predate the open source software boom, and offer functionality which can be found in open source alternatives. However it is important to consider the licensing and support aspects before adopting an open source stack.
- ArcGIS -> mapping and analytics software, with both local and cloud hosted options. Checkout Geospatial deep learning with arcgis.learn. It appears ArcGIS are using fastai for their deep learning backend. ArcGIS Jupyter Notebooks in ArcGIS Enterprise are built to run big data analysis, deep learning models, and dynamic visualization tools.
- ENVI -> image processing and analysis
- ERDAS IMAGINE -> combines remote sensing, photogrammetry, LiDAR analysis, basic vector analysis, and radar processing in a single product
- PEARL -> a human-in-the-loop AI tool to drastically reduce the time required to produce an accurate land cover map, blog post, uses Microsoft Planetary Computer and (some?) ML models run locally in the browser
- Spacemetric Keystone -> transform unprocessed sensor data into quality geospatial imagery ready for analysis
- microimages TNTgis -> advanced GIS, image processing, and geospatial analysis at an affordable price
A note on licensing: the two general types of open source license are copyleft and permissive. Copyleft requires that subsequent derived software products also carry the license forward, e.g. the GNU General Public License (GNU GPLv3). Permissive licenses, e.g. MIT & Apache 2, leave users freer to modify and use the code as they please. Check out choosealicense.com/
- QGIS- Create, edit, visualise, analyse and publish geospatial information. Python scripting and plugins. Open source alternative to ArcGIS.
- Orfeo toolbox - remote sensing toolbox with python API (just a wrapper to the C code). Do activities such as pansharpening, ortho-rectification, image registration, image segmentation & classification. Not much documentation.
- QUICK TERRAIN READER - view DEMS, Windows
- dl-satellite-docker -> docker files for geospatial analysis, including tensorflow, pytorch, gdal, xgboost...
- AIDE V2 - Tools for detecting wildlife in aerial images using active learning
- Land Cover Mapping web app from Microsoft
- Solaris -> An open source ML pipeline for overhead imagery by CosmiQ Works, similar to Rastervision but with some unique very cool features
- openSAR -> Synthetic Aperture Radar (SAR) Tools and Documents from Earth Big Data LLC (http://earthbigdata.com/)
- qhub -> QHub enables teams to build and maintain a cost effective and scalable compute/data science platform in the cloud.
- imagej -> a very versatile image viewer and processing program
- Geo Data Viewer extension for VSCode which enables opening and viewing various geo data formats with nice visualisations
- Datasette is a tool for exploring and publishing data as an interactive website and accompanying API, with SQLite backend. Various plugins extend its functionality, for example to allow displaying geospatial info, render images (useful for thumbnails), and add user authentication.
- Photoprism is a privately hosted app for browsing, organizing, and sharing your photo collection, with support for tiffs
- dbeaver is a free universal database tool and SQL client with geospatial features
- Grafana can be used to make interactive dashboards, checkout this example showing Point data. Note there is an AWS managed service for Grafana
- litestream -> Continuously stream SQLite changes to S3-compatible storage
- ImageFusion -> Temporal fusion of raster image time series
- nvtop -> NVIDIA GPUs htop like monitoring tool
- So important that this pair gets its own section. GDAL is THE command line tool for reading and writing raster and vector geospatial data formats. If you are using python you will probably want to use Rasterio which provides a pythonic wrapper for GDAL
- GDAL and on twitter
- GDAL is a dependency of Rasterio and can be difficult to build and install. I recommend using conda, brew (on OSX) or docker in these situations
- GDAL docker quickstart: `docker pull osgeo/gdal` then `docker run --rm -v $(pwd):/data/ osgeo/gdal gdalinfo /data/cog.tiff`
- Even Rouault maintains GDAL, please consider sponsoring him
- Rasterio -> reads and writes GeoTIFF and other raster formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON. There are a variety of plugins that extend Rasterio functionality.
- rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
- rioxarray -> geospatial xarray extension powered by rasterio
- aws-lambda-docker-rasterio -> AWS Lambda Container Image with Python Rasterio for querying Cloud Optimised GeoTiffs. See this presentation
- godal -> golang wrapper for GDAL
- Write rasterio to xarray
- Loam: A Client-Side GDAL Wrapper for Javascript
- Short list of useful GDAL commands while working in data science for remote sensing
- PyShp -> The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python
- s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
- EarthPy -> A set of helper functions to make working with spatial data in open source tools easier. Read Exploratory Data Analysis (EDA) on Satellite Imagery Using EarthPy
- pygeometa -> provides a lightweight and Pythonic approach for users to easily create geospatial metadata in standards-based formats using simple configuration files
- pesto -> PESTO is designed to ease the process of packaging a Python algorithm as a processing web service into a docker image. It contains shell tools to generate all the boiler plate to build an OpenAPI processing web service compliant with the Geoprocessing-API. By Airbus Defence And Space
- GEOS -> Google Earth Overlay Server (GEOS) is a python-based server for creating Google Earth overlays of tiled maps. You can also display maps in the web browser, measure distances and print maps as high-quality PDFs.
- GeoDjango intends to be a world-class geographic Web framework. Its goal is to make it as easy as possible to build GIS Web applications and harness the power of spatially enabled data. Some features of GDAL are supported.
- rasterstats -> summarize geospatial raster datasets based on vector geometries
- turfpy -> a Python library for performing geospatial data analysis which reimplements turf.js
- image-similarity-measures -> Implementation of eight evaluation metrics to assess the similarity between two images. Blog post here
- rsgislib -> Remote Sensing and GIS Software Library; python module tools for processing spatial data.
- eo-learn is a collection of open source Python packages that have been developed to seamlessly access and process spatio-temporal image sequences acquired by any satellite fleet in a timely and automatic manner
- RStoolbox: Tools for Remote Sensing Data Analysis in R
- nd -> Framework for the analysis of n-dimensional, multivariate Earth Observation data, built on xarray
- reverse-geocoder -> a fast, offline reverse geocoder in Python
- xarray -> N-D labeled arrays and datasets. Read Handling multi-temporal satellite images with Xarray. Checkout xarray_leaflet for tiled map plotting
- xarray-spatial -> Fast, Accurate Python library for Raster Operations. Implements algorithms using Numba and Dask, free of GDAL
- xarray-beam -> Distributed Xarray with Apache Beam by Google
- Geowombat -> geo-utilities applied to air- and space-borne imagery, uses Rasterio, Xarray and Dask for I/O and distributed computing with named coordinates
- NumpyTiles -> a specification for providing multiband full-bit depth raster data in the browser
- Zarr -> Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. Zarr depends on NumPy
- Pillow is the maintained fork of the Python Imaging Library (PIL) -> this will be your go-to package for image manipulation in python
- opencv-python provides pre-built CPU-only OpenCV packages for Python
- kornia is a differentiable computer vision library for PyTorch, like openCV but on the GPU. Perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors.
- tifffile -> Read and write TIFF files
- xtiff -> A small Python 3 library for writing multi-channel TIFF stacks
- geotiff -> A noGDAL tool for reading and writing geotiff files
- image_slicer -> Split images into tiles. Join the tiles back together.
- tiler -> split images into tiles and merge tiles into a large image
- felicette -> Satellite imagery for dummies. Generate JPEG earth imagery from coordinates/location name with publicly available satellite data.
- imagehash -> Image hashes tell whether two images look nearly identical.
- xbatcher -> Xbatcher is a small library for iterating xarray DataArrays in batches. The goal is to make it easy to feed xarray datasets to machine learning libraries such as Keras.
- fake-geo-images -> A module to programmatically create geotiff images which can be used for unit tests
- sahi -> A vision library for performing sliced inference on large images/small objects
- imagededup -> Finding duplicate images made easy! Uses perceptual hashing
- rmstripes -> Remove stripes from images with a combined wavelet/FFT approach
- activeloopai Hub -> The fastest way to store, access & manage datasets with version-control for PyTorch/TensorFlow. Works locally or on any cloud. Scalable data pipelines.
- sewar -> All image quality metrics you need in one package
- fiftyone -> open-source tool for building high-quality datasets and computer vision models. Visualise complex labels, evaluating models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more!
- GeoTagged_ImageChip -> A simple script to create geo tagged image chips from high resolution RS images for training deep learning models such as Unet.
- Label Maker -> downloads OpenStreetMap QA Tile information and satellite imagery tiles and saves them as an `.npz` file for use in machine learning training.
- Satellite imagery label tool -> provides an easy way to collect a random sample of labels over a given scene of satellite imagery
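Several of the tools above (image_slicer, tiler, sahi) revolve around splitting a large image into tiles. The sketch below shows one simple way to compute overlapping tile windows and to shift a detection from tile coordinates back into full-image coordinates. It is a generic illustration and not how any of these libraries is implemented. The function names and the example sizes are my own.

```python
def tile_windows(width, height, tile_size, overlap=0):
    """Yield (x, y, w, h) windows covering an image, with optional overlap.

    Edge tiles are clipped to the image bounds, so they may be smaller.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be >= 0 and smaller than tile_size")
    stride = tile_size - overlap
    # Stop before starting a tile that would only contain already-covered pixels
    for y in range(0, max(height - overlap, 1), stride):
        for x in range(0, max(width - overlap, 1), stride):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


def to_image_coords(box, window):
    """Shift an (xmin, ymin, xmax, ymax) box from tile to full-image coordinates."""
    xmin, ymin, xmax, ymax = box
    x_off, y_off = window[0], window[1]
    return (xmin + x_off, ymin + y_off, xmax + x_off, ymax + y_off)


windows = list(tile_windows(1000, 600, tile_size=512, overlap=64))
print(len(windows))  # 6
print(windows[0], windows[-1])  # (0, 0, 512, 512) (896, 448, 104, 152)
print(to_image_coords((10, 20, 50, 60), windows[1]))  # (458, 20, 498, 60)
```

Overlap helps objects that straddle a tile boundary appear whole in at least one tile, at the cost of duplicate detections that then need merging, for example with non-maximum suppression.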
Image augmentation is a technique used to expand a training dataset in order to improve the ability of the model to generalise.
- AugLy -> A data augmentations library for audio, image, text, and video. By Facebook
- albumentations -> Fast image augmentation library and an easy-to-use wrapper around other libraries
- FoHIS -> Towards Simulating Foggy and Hazy Images and Evaluating their Authenticity
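As a minimal illustration of augmentation, the sketch below generates the eight flip and 90 degree rotation variants of a small image. Overhead imagery often has no canonical "up", so these transforms usually preserve the label, although that is an assumption to check for your own task. Plain nested lists are used so the example runs without dependencies. In practice a library such as albumentations operating on arrays is the better choice.

```python
def flip_horizontal(image):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the 8 flip/rotation variants of an image, including the original."""
    variants = []
    current = image
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Tiny 2 x 2 single-band "image" with distinct pixel values
image = [[1, 2],
         [3, 4]]

print(rotate90(image))  # [[3, 1], [4, 2]]
print(len(augment(image)))  # 8
```

For segmentation or detection the same transform must also be applied to the mask or the bounding boxes, which is one of the things augmentation libraries take care of.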
- geo-ml-model-catalog -> provides a common metadata definition for ML models that operate on geospatial data
- dvc -> not specific to EO ML models, dvc is a git extension to keep track of changes in data, source code, and ML models together
- rastervision
- torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF file
- DeepHyperX -> A Python/pytorch tool to perform deep learning experiments on various hyperspectral datasets.
- landsat_ingestor -> Scripts and other artifacts for landsat data ingestion into Amazon public hosting
- satpy -> a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats
- GIBS-Downloader -> a command-line tool which facilitates the downloading of NASA satellite imagery and offers different functionalities in order to prepare the images for training in a machine learning pipeline
- eodag -> Earth Observation Data Access Gateway
- pylandsat -> Search, download, and preprocess Landsat imagery
- sentinelsat -> Search and download Copernicus Sentinel satellite images
- landsatxplore -> Search and download Landsat scenes from EarthExplorer
- hvplot -> A high-level plotting API for the PyData ecosystem built on HoloViews. Allows overlaying data on map tiles, see Exploring USGS Terrain Data in COG format using hvPlot
- Pyviz examples include several interesting geospatial visualisations
- napari -> napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images. By integrating closely with the Python ecosystem, napari can be easily coupled to leading machine learning and image analysis tools. Note that to view a 3GB COG I had to install the napari-tifffile-reader plugin.
- pixel-adjust -> Interactively select and adjust specific pixels or regions within a single-band raster. Built with rasterio, matplotlib, and panel.
- Plotly Dash can be used for making interactive dashboards
- folium -> a python wrapper to the excellent leaflet.js which makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. Also checkout the streamlit-folium component for adding folium maps to your streamlit apps
- ipyearth -> An IPython Widget for Earth Maps
- geopandas-view -> Interactive exploration of GeoPandas GeoDataFrames
- geogif -> Turn xarray timestacks into GIFs
- leafmap -> geospatial analysis and interactive mapping with minimal coding in a Jupyter environment
- xmovie -> A simple way of creating movies from xarray objects
- acquisition-time -> Drawing (Satellite) acquisition dates in a timeline
- splot -> Lightweight plotting for geospatial analysis in PySAL
- prettymaps -> A small set of Python functions to draw pretty maps from OpenStreetMap data
- Tools to Design or Visualize Architecture of Neural Network
Streamlit is an awesome Python framework for creating apps. Additionally they will host the apps free of charge. Here I list resources which are EO related. Note that a component is an addon which extends Streamlit's basic functionality. If you like Streamlit also check out gradio
- cogviewer -> Simple Cloud Optimized GeoTIFF viewer
- cogcreator -> Simple Cloud Optimized GeoTIFF Creator. Generates COG from GeoTIFF files.
- cogvalidator -> Simple Cloud Optimized GeoTIFF validator
- streamlit-image-juxtapose -> A simple Streamlit component to compare images in Streamlit apps
- streamlit-folium -> Streamlit Component for rendering Folium maps
- streamlit-keplergl -> Streamlit component for rendering kepler.gl maps
- streamlit-light-leaflet -> Streamlit quick & dirty Leaflet component that sends back coordinates on map click
- leafmap-streamlit -> various examples showing how to use streamlit to: create a 3D map using Kepler.gl, create a heat map, display a GeoJSON file on a map, and add a colorbar or change the basemap on a map
- BirdsPyView -> convert images to top-down view and get coordinates of objects
- Build a useful web application in Python: Geolocating Photos
- Wild fire detection app
- Dask works with your favorite PyData libraries to provide performance at scale for the tools you love -> checkout Read and manipulate tiled GeoTIFF datasets
- Coiled is a managed Dask service. Get started by reading Democratizing Satellite Imagery Analysis with Dask
- Dask with PyTorch for large scale image analysis
- stackstac -> Turn a STAC catalog into a dask-based xarray
- dask-geopandas -> Parallel GeoPandas with Dask
- dask-image -> many SciPy ndimage functions implemented
- WaterDetect -> an end-to-end algorithm to generate open water cover mask, specially conceived for L2A Sentinel 2 imagery. It can also be used for Landsat 8 images and for other multispectral clustering/segmentation tasks.
- GatorSense Hyperspectral Image Analysis Toolkit -> This repo contains algorithms for Anomaly Detectors, Classifiers, Dimensionality Reduction, Endmember Extraction, Signature Detectors, Spectral Indices
- detectree -> Tree detection from aerial imagery
- pylandstats -> compute landscape metrics
- dg-calibration -> Coefficients and functions for calibrating DigitalGlobe imagery
- python-fmask -> Implementation in Python of the cloud and shadow algorithms known collectively as Fmask
- pyshepseg -> Python implementation of image segmentation algorithm of Shepherd et al (2019) Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination.
- Shadow-Detection-Algorithm-for-Aerial-and-Satellite-Images -> shadow detection and correction algorithm
- Adam Van Etten is doing interesting things in object detection and segmentation
- Andrew Cutts cohosts the Scene From Above podcast and has many interesting repos
- Ankit Kariryaa published a recent nature paper on tree detection
- Chris Holmes is doing great things at Planet
- Christoph Rieke maintains a very popular imagery repo and has published his thesis on segmentation
- Daniel J Dufour builds geotiff.io and more
- Daniel Moraite is publishing some excellent articles
- Even Rouault maintains several of the most critical tools in this domain such as GDAL, please consider sponsoring him
- Gonzalo Mateo García is working on clouds and Water segmentation with CNNs
- Isaac Corley is working on super-resolution and torchrs
- Jake Shermeyer has many interesting repos
- Mort Canty is an expert in change detection
- Mykola Kozyr is working on streamlit apps
- Nicholas Murray is an Australia-based scientist with a focus on delivering the science necessary to inform large scale environmental management and conservation
- Oscar Mañas is advancing the state of the art in SSL
- Qiusheng Wu is an Assistant Professor in the Department of Geography at the University of Tennessee, checkout his YouTube channel
- Rodrigo Caye Daudt is doing great work on change detection
- Robin Wilson is a former academic who is very active in the satellite imagery space
For a full list of companies, on and off Github, checkout awesome-geospatial-companies. The following lists companies with interesting Github profiles.
- Airbus Defence And Space
- Azavea -> lots of interesting repos around STAC
- Development Seed
- Descartes Labs
- DHI GRAS
- ElementAI
- Element 84
- Hummingbird Technologies Ltd -> sustainability and optimised food production
- ICEYE
- Mapbox -> thanks for Rasterio!
- Maxar-Analytics
- Near Space Labs
- Planet Labs -> thanks for COGS!
- Preligens -> formerly Earthcube Lab
- SatelliteVu -> currently it's all private!
- SpaceKnow
- Sparkgeo
- up42 -> Airbus spinout providing 'The easiest way to build geospatial solutions'
- Introduction to Geospatial Raster and Vector Data with Python -> an intro course on a single page
- Manning: Monitoring Changes in Surface Water Using Satellite Image Data
- Automating GIS processes includes a lesson on automating raster data processing
- For deep learning checkout the fastai course which uses the fastai library & pytorch
- pyimagesearch.com hosts courses and plenty of material using opencv and keras
- Official opencv courses on opencv.org
- TensorFlow Developer Professional Certificate
- Geospatial_Python_CourseV1 -> a collection of blog posts turned into a course format
- Satellite Machine Learning Training -> lessons on how to apply Machine Learning analysis to satellite data.
- Image Analysis, Classification and Change Detection in Remote Sensing With Algorithms for Python, Fourth Edition, By Morton John Canty -> code here
- I highly recommend Deep Learning with Python by François Chollet
- fast AI geospatial study group
- Kaggle Intro to Satellite imagery Analysis group
- Omdena brings together small teams of engineers to work on AI projects
Sign up for the geospatial-jobs-newsletter. The Pangeo discourse also lists multiple jobs, globally. List of company job portals below:
Processing on board the satellite allows less data to be downlinked. For example, a super-resolution image might take 4-8 input images to generate, after which only the single resulting image is downlinked.
- Lockheed Martin and USC to Launch Jetson-Based Nanosatellite for Scientific Research Into Orbit - Aug 2020 - One app that will run on the GPU-accelerated satellite is SuperRes, an AI-based application developed by Lockheed Martin, that can automatically enhance the quality of an image.
- Intel to place movidius in orbit to filter images of clouds at source - Oct 2020 - Getting rid of these images before they’re even transmitted means that the satellite can actually realize a bandwidth savings of up to 30%
- Whilst not involving neural nets the PyCubed project gets a mention here as it is putting python on space hardware such as the V-R3x
- WorldFloods will pioneer the detection of global flood events from space, launched on June 30, 2021. This paper describes the model which is run on Intel Movidius Myriad2 hardware capable of processing a 12 MP image in less than a minute
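The downlink argument for on-board processing can be put into rough numbers. The sketch below estimates the fraction of downlink volume saved when one on-board result replaces its input images. It assumes, purely for illustration, equal-sized inputs and an output whose pixel count grows with the square of an upscale factor at the same bits per pixel. None of these figures come from the missions listed above.

```python
def downlink_saving(n_input_images, upscale_factor=1.0):
    """Fraction of downlink volume saved by sending one on-board result
    instead of all of its input images.

    The output size is measured in units of one input image and is assumed
    to scale with the square of the upscale factor.
    """
    output_size = upscale_factor ** 2
    return 1 - output_size / n_input_images


for n in (4, 8):
    print(n, downlink_saving(n), downlink_saving(n, upscale_factor=2))
# 4 0.75 0.0
# 8 0.875 0.5
```

Under these assumptions the saving depends strongly on how much larger the super-resolved product is than a single input frame, so the benefit should be estimated per mission.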
My background is in optical physics, and I hold a PhD from Cambridge on the topic of localised surface plasmons. Since academia I have held a variety of roles, including doing research at Sharp Labs Europe, developing optical systems at Surrey Satellites (SSTL), and working at an IoT startup. It was whilst at SSTL that I started this repository as a personal resource. Over time I have steadily gravitated towards data analytics and software engineering with Python, and I now work as a senior data scientist at Satellite Vu. Please feel free to connect with me on Twitter & LinkedIn, and please do let me know if this repository is useful to your work.