This module is designed to facilitate interaction with the various data products used in the GLAM system, both Normalized Difference Vegetation Index (NDVI) products and the "ancillary" data products that fall outside the core NDVI functionality. These products can be downloaded and processed by glam_data_processing.
glam_data_processing was developed as part of the GLAM system's move to the cloud in 2019-2020. It provides a programmatic way to pull and ingest the necessary ancillary data, and offers reliable interaction with the AWS portion of the system.
Given the volume of data the GLAM system handles, a re-usable, general engine for downloading imagery, uploading it to the cloud, and extracting the relevant statistics is vital. This module provides all of that functionality for both the NDVI and ancillary data products.
The Downloader class can be used to pull any available data product, whether from its source (NASA, Copernicus, etc.) or from the GLAM AWS S3 bucket. The Downloader.pull() method allows quick and easy retrieval of image files. The resulting file name is automatically formatted for use with Image objects (see below), allowing for efficient automation.
Downloading of NDVI products relies on the octvi package.
This module offers two classes used for image ingestion and stats generation: one for use with ancillary data products, and one for use with NDVI products. Instances are initialized by providing the path to a well-named image on disk (e.g. Image("C:/swi.2019-01-01.tif")). Both classes inherit from the generic Image class, and thus share common attributes and methods.
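"Well-named" here means the product and date can be recovered from the filename itself. As an illustration only (this helper is not part of glam_data_processing), an ancillary filename such as "swi.2019-01-01.tif" decomposes as follows:

```python
import os

def parse_image_name(path):
    """Split a well-named ancillary image file (e.g. 'swi.2019-01-01.tif')
    into its product and date components.
    Illustrative helper; not part of the glam_data_processing API."""
    base = os.path.basename(path)
    product, date_string, _extension = base.split(".", 2)
    return product, date_string

product, date_string = parse_image_name("C:/swi.2019-01-01.tif")
print(product, date_string)  # swi 2019-01-01
```

This is why the file names produced by the Downloader can be handed directly to the Image classes: all the metadata they need travels in the name.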
Image.ingest() performs database ingestion and S3 uploading for the given image. Once this method executes successfully, the file will be available for display in the GLAM system, and custom statistics generation can be performed. Regional cached statistics, however, are not generated by this method.
Image.uploadStats() extracts and uploads regional statistics for the image, making them available for retrieval from the GLAM statistics database. Note that the image will not be visible through the GLAM system unless successfully ingested (see Image.ingest() above).
Handling of ancillary products (CHIRPS rainfall, MERRA-2 temperature, and Soil Water Index) should be done through the AncillaryImage class. When an instance of this class is successfully initialized (by passing the constructor the full path to the file on disk), the ingest() and uploadStats() methods will be available (see above).
Date format for ancillary files is "%Y-%m-%d"; e.g. "2019-01-01".
Handling of NDVI products (M*D09Q1, M*D13Q1, etc.) should be done through the ModisImage class. When an instance of this class is successfully initialized (by passing the constructor the full path to the file on disk), the ingest() and uploadStats() methods will be available (see above).
Date format for NDVI files is "%Y.%j"; e.g. "2019.001".
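The two date conventions can be translated into one another with the standard library alone; for example, converting an NDVI-style date ("%Y.%j", year and day-of-year) into the ancillary style ("%Y-%m-%d"):

```python
from datetime import datetime

# Parse an NDVI-style date string (year plus day-of-year) ...
ndvi_date = datetime.strptime("2019.001", "%Y.%j")

# ... and re-express it in the ancillary "%Y-%m-%d" style
print(ndvi_date.strftime("%Y-%m-%d"))  # 2019-01-01
```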
It is possible to set credentials and update all data streams from the command line.
The glamconfigure script prompts the user to configure their credentials for the GLAM database and for the two password-protected data archives (MERRA-2 and Copernicus) in the data stream. These credentials are written to a JSON file that glam_data_processing reads on load.
The glamupdatedata script is an all-in-one tool for keeping the GLAM archive up to date. The script identifies files that are missing from the archive but available for download, downloads them, ingests them, calculates their statistics, and then deletes the local copies. It can be run as a cron job to keep the data pool as current as possible.
```python
import glam_data_processing as glam  # helpful to provide a shorter name

# The ToDoList class searches the database for missing files,
# and also creates new database records for all potentially
# available files between the current date and the last record
# for each product type.

# create a ToDoList object
toDo = glam.ToDoList()

# filter out unavailable files; leave only the dates that have
# available imagery
toDo.filterUnavailabe()

# create a Downloader object
downloader = glam.Downloader()

# iterate over ToDoList
for t in toDo:  # yields tuples of the form ("PRODUCT", "DATE")
    files = downloader.pullFromSource(*t, "C:/temp")  # returns tuple of filepaths
    for f in files:
        # Use either ModisImage or AncillaryImage to ingest the data
        # and generate regional statistics
        img = glam.getImageType(f)(f)  # create correct Image object type
        img.ingest()  # ingest image to S3 bucket and database
        img.uploadStats()  # upload image statistics
```
MIT License
Copyright (c) 2020 F. Dan O'Neill
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.