LightningDataModule to load GeoTIFF files (#52)
* ➕ Add torchdata

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries!

* ♻️ Refactor test_model_vit to use datapipe fixture

Decouple the neural network model's unit test from the LightningDataModule by implementing a standalone datapipe fixture instead.

* ✨ Implement GeoTIFFDataPipeModule

Create a LightningDataModule to load GeoTIFF files, using torchdata to build the data pipeline. The FileLister DataPipe iterates over *.tif files in the data/ folder, and a random 80/20 split divides them into training and validation sets. The GeoTIFF files are read into numpy.ndarrays using rasterio and converted to torch.Tensors by the default collate function. rasterio is used instead of rioxarray to avoid an extra layer of overhead during data loading.
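The split and batching steps above can be sketched with only the standard library and NumPy. This is a minimal illustration under stated assumptions, not the code in this commit: `split_train_val` and `collate_arrays` are hypothetical helpers, the seed value is arbitrary, and `np.stack` stands in for PyTorch's default collate function (which returns a torch.Tensor rather than a NumPy array). Reading each file with rasterio is left out so the sketch runs without GDAL.

```python
import random
from typing import List, Sequence, Tuple

import numpy as np


def split_train_val(
    files: Sequence[str], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Randomly split file paths into training and validation lists (hypothetical helper)."""
    shuffled = list(files)
    # A dedicated, seeded Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def collate_arrays(arrays: Sequence[np.ndarray]) -> np.ndarray:
    """Stack (bands, height, width) arrays into a (batch, bands, height, width) array."""
    # Assumption: every tile has the same shape, e.g. (3, 256, 256).
    return np.stack(arrays, axis=0)


if __name__ == "__main__":
    files = [f"data/tile_{i}.tif" for i in range(10)]
    train_files, val_files = split_train_val(files)
    print(len(train_files), len(val_files))  # 8 2
    batch = collate_arrays([np.zeros((3, 256, 256), dtype=np.float32)] * 4)
    print(batch.shape)  # (4, 3, 256, 256)
```

Because `int()` truncates, small file lists give the remainder to the validation set (for example, 3 files split into 2 training and 1 validation path).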

* 🧵 Allow configuring num_workers in DataLoader

Enable setting the number of subprocesses used for data loading. Defaults to 8 for now, but can be configured on LightningCLI using `python trainer.py fit --data.num_workers=8`.

* 📌 Install torchdata=0.7.1 from conda-forge instead of PyPI

The conda-forge package contains a build of torchdata that is pre-compiled with the correct AWSSDK extension, so it won't raise errors like `ValueError: curlCode: 77, Problem with the SSL CA cert (path? access rights?)`.

* 🔧 Allow configuring data path containing the GeoTIFF files

Enable setting the path to the folder containing the GeoTIFF data files. Defaults to data/ for now, but can be configured on LightningCLI using `python trainer.py fit --data.data_path=data/56HKH`. Also set the recursive=True flag so that files in nested subdirectories are included.
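A standard-library sketch of how a configurable `data_path` combined with recursive listing could behave. `list_tif_files` is a hypothetical helper for illustration only; the commit itself relies on torchdata's FileLister DataPipe and its `recursive` flag, and this sketch simply mimics that behaviour with `pathlib`.

```python
from pathlib import Path
from typing import List, Union


def list_tif_files(data_path: Union[str, Path] = "data/", recursive: bool = True) -> List[Path]:
    """List *.tif files under data_path (hypothetical helper mirroring the FileLister setup)."""
    root = Path(data_path)
    # rglob descends into every nested subdirectory; glob only searches data_path itself.
    matches = root.rglob("*.tif") if recursive else root.glob("*.tif")
    # Keep regular files only and sort so the listing order is deterministic.
    return sorted(p for p in matches if p.is_file())
```

Calling `list_tif_files("data/56HKH")` would correspond to passing `--data.data_path=data/56HKH` on the command line, with tiles in any subfolder of `data/56HKH` picked up as well.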

* ✅ Add unit test for GeoTIFFDataModule

Ensure that loading one mini-batch of data from a data folder works. The test fixture creates two temporary GeoTIFF files filled with random values, each containing an array of shape (3, 256, 256).
weiji14 authored Nov 28, 2023
1 parent 7704b9b commit be426c1
Showing 7 changed files with 266 additions and 148 deletions.