Add CaFFe Dataset #2350

Open · wants to merge 12 commits into base: main
5 changes: 5 additions & 0 deletions docs/api/datamodules.rst
@@ -62,6 +62,11 @@ CaBuAr

.. autoclass:: CaBuArDataModule

CaFFe
^^^^^

.. autoclass:: CaFFeDataModule

ChaBuD
^^^^^^

5 changes: 5 additions & 0 deletions docs/api/datasets.rst
@@ -222,6 +222,11 @@ CaBuAr

.. autoclass:: CaBuAr

CaFFe
^^^^^

.. autoclass:: CaFFe

ChaBuD
^^^^^^

1 change: 1 addition & 0 deletions docs/api/datasets/non_geo_datasets.csv
@@ -4,6 +4,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
`BigEarthNet`_,C,Sentinel-1/2,"CDLA-Permissive-1.0","590,326",19--43,120x120,10,"SAR, MSI"
`BioMassters`_,R,Sentinel-1/2 and Lidar,"CC-BY-4.0",,,256x256, 10, "SAR, MSI"
`CaBuAr`_,CD,Sentinel-2,"OpenRAIL",424,2,512x512,20,MSI
`CaFFe`_,S,"Sentinel-1, TerraSAR-X, TanDEM-X, ENVISAT, European Remote Sensing Satellite 1&2, ALOS PALSAR, and RADARSAT-1","CC-BY-4.0","19,092",4,512x512,6-20,SAR
`ChaBuD`_,CD,Sentinel-2,"OpenRAIL",356,2,512x512,10,MSI
`Cloud Cover Detection`_,S,Sentinel-2,"CC-BY-4.0","22,728",2,512x512,10,MSI
`COWC`_,"C, R","CSUAV AFRL, ISPRS, LINZ, AGRC","AGPL-3.0-only","388,435",2,256x256,0.15,RGB
Binary file added tests/data/caffe/caffe.zip
Binary file not shown.
80 changes: 80 additions & 0 deletions tests/data/caffe/data.py
@@ -0,0 +1,80 @@
#!/usr/bin/env python3

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import hashlib
import os
import shutil

import numpy as np
from PIL import Image

# Define the root directory and subdirectories
root_dir = 'caffe'
sub_dirs = ['zones', 'sar_images', 'fronts']
splits = ['train', 'val', 'test']

zone_file_names = [
    'Crane_2002-11-09_ERS_20_2_061_zones__93_102_0_0_0.png',
    'Crane_2007-09-22_ENVISAT_20_1_467_zones__93_102_8_1024_0.png',
    'JAC_2015-12-23_TSX_6_1_005_zones__57_49_195_384_1024.png',
]

IMG_SIZE = 32


# Function to create dummy images
def create_dummy_image(
    path: str, shape: tuple[int, ...], pixel_values: list[int]
) -> None:
    data = np.random.choice(pixel_values, size=shape, replace=True).astype(np.uint8)
    img = Image.fromarray(data)
    img.save(path)


def create_zone_images(split: str, filename: str) -> None:
    zone_pixel_values = [0, 64, 127, 255]
    path = os.path.join(root_dir, 'zones', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), zone_pixel_values)


def create_sar_images(split: str, filename: str) -> None:
    sar_pixel_values = list(range(256))
    path = os.path.join(root_dir, 'sar_images', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), sar_pixel_values)


def create_front_images(split: str, filename: str) -> None:
    front_pixel_values = list(range(256))
    path = os.path.join(root_dir, 'fronts', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), front_pixel_values)


if os.path.exists(root_dir):
    shutil.rmtree(root_dir)

# Create the directory structure
for sub_dir in sub_dirs:
    for split in splits:
        os.makedirs(os.path.join(root_dir, sub_dir, split), exist_ok=True)

# Create dummy data for all splits and filenames
for split in splits:
    for filename in zone_file_names:
        create_zone_images(split, filename)
        create_sar_images(split, filename.replace('_zones_', '_'))
        create_front_images(split, filename.replace('_zones_', '_front_'))

# Zip the directory and compute its MD5 checksum
shutil.make_archive(root_dir, 'zip', '.', root_dir)


def md5(fname: str) -> str:
    hash_md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


md5sum = md5('caffe.zip')
print(f'MD5 checksum: {md5sum}')
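The `md5` helper above hashes the archive in 4 KiB chunks so that even a large zip never has to be read into memory at once. A minimal standalone sketch of the same streaming pattern (the `md5_stream` name and the in-memory `BytesIO` stand-in are illustrative, not part of the PR):

```python
import hashlib
import io


def md5_stream(stream: io.BufferedIOBase, chunk_size: int = 4096) -> str:
    """Compute an MD5 hex digest by reading a stream in fixed-size chunks."""
    hash_md5 = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b''
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        hash_md5.update(chunk)
    return hash_md5.hexdigest()


# io.BytesIO stands in for an open file handle here
digest = md5_stream(io.BytesIO(b'hello world'))
print(digest)
```

Because the hash state is updated incrementally, the result is identical to hashing the whole byte string in one call.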
42 changes: 42 additions & 0 deletions tests/datamodules/test_caffe.py
@@ -0,0 +1,42 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os

import matplotlib.pyplot as plt
import pytest

from torchgeo.datamodules import CaFFeDataModule


class TestCaFFeDataModule:
    @pytest.fixture
    def datamodule(self) -> CaFFeDataModule:
        root = os.path.join('tests', 'data', 'caffe')
        batch_size = 2
        num_workers = 0
        dm = CaFFeDataModule(root=root, batch_size=batch_size, num_workers=num_workers)
        return dm

    def test_train_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('fit')
        next(iter(datamodule.train_dataloader()))

    def test_val_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('validate')
        next(iter(datamodule.val_dataloader()))

    def test_test_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('test')
        next(iter(datamodule.test_dataloader()))

    def test_plot(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('validate')
        batch = next(iter(datamodule.val_dataloader()))
        sample = {
            'image': batch['image'][0],
            'mask_zones': batch['mask_zones'][0],
            'mask_front': batch['mask_front'][0],
        }
        datamodule.plot(sample)
        plt.close()
72 changes: 72 additions & 0 deletions tests/datasets/test_caffe.py
@@ -0,0 +1,72 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os
import shutil
from pathlib import Path

import matplotlib.pyplot as plt
import pytest
import torch
import torch.nn as nn
from _pytest.fixtures import SubRequest
from pytest import MonkeyPatch

from torchgeo.datasets import CaFFe, DatasetNotFoundError


class TestCaFFe:
    @pytest.fixture(params=['train', 'val', 'test'])
    def dataset(
        self, monkeypatch: MonkeyPatch, tmp_path: Path, request: SubRequest
    ) -> CaFFe:
        md5 = 'f06c155a3fea372e884c234115c169e1'
        monkeypatch.setattr(CaFFe, 'md5', md5)
        url = os.path.join('tests', 'data', 'caffe', 'caffe.zip')
        monkeypatch.setattr(CaFFe, 'url', url)
        root = tmp_path
        split = request.param
        transforms = nn.Identity()
        return CaFFe(root, split, transforms, download=True, checksum=True)

    def test_getitem(self, dataset: CaFFe) -> None:
        x = dataset[0]
        assert isinstance(x, dict)
        assert isinstance(x['image'], torch.Tensor)
        assert x['image'].shape[0] == 1
        assert isinstance(x['mask_zones'], torch.Tensor)
        assert x['image'].shape[-2:] == x['mask_zones'].shape[-2:]

    def test_len(self, dataset: CaFFe) -> None:
        # The dummy archive contains three samples per split
        assert len(dataset) == 3

    def test_already_downloaded(self, dataset: CaFFe) -> None:
        CaFFe(root=dataset.root)

    def test_not_yet_extracted(self, tmp_path: Path) -> None:
        filename = 'caffe.zip'
        dir = os.path.join('tests', 'data', 'caffe')
        shutil.copyfile(
            os.path.join(dir, filename), os.path.join(str(tmp_path), filename)
        )
        CaFFe(root=str(tmp_path))

    def test_invalid_split(self) -> None:
        with pytest.raises(AssertionError):
            CaFFe(split='foo')

    def test_not_downloaded(self, tmp_path: Path) -> None:
        with pytest.raises(DatasetNotFoundError, match='Dataset not found'):
            CaFFe(tmp_path)

    def test_plot(self, dataset: CaFFe) -> None:
        dataset.plot(dataset[0], suptitle='Test')
        plt.close()

        sample = dataset[0]
        sample['prediction'] = torch.clone(sample['mask_zones'])
        dataset.plot(sample, suptitle='Prediction')
        plt.close()
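The fixture above redirects `CaFFe.url` to the local test zip by patching the class attribute, so `download=True` exercises the real download/extract code path without touching the network. The same temporary-attribute-swap idea can be sketched with the standard library's `unittest.mock` (the `Downloader` class and URLs below are hypothetical stand-ins, not TorchGeo API):

```python
from unittest import mock


class Downloader:
    # Hypothetical stand-in for a dataset class with a class-level download URL
    url = 'https://example.com/caffe.zip'


# Temporarily point the class at a local fixture, as monkeypatch.setattr does
# in the tests; the original value is restored when the context exits
with mock.patch.object(Downloader, 'url', 'tests/data/caffe/caffe.zip'):
    print(Downloader.url)  # the patched local path

print(Downloader.url)  # original URL, restored on exit
```

Pytest's `monkeypatch` fixture does the restoration automatically at test teardown, which is why the fixture never needs an explicit undo.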
2 changes: 2 additions & 0 deletions torchgeo/datamodules/__init__.py
@@ -6,6 +6,7 @@
from .agrifieldnet import AgriFieldNetDataModule
from .bigearthnet import BigEarthNetDataModule
from .cabuar import CaBuArDataModule
from .caffe import CaFFeDataModule
from .chabud import ChaBuDDataModule
from .chesapeake import ChesapeakeCVPRDataModule
from .cowc import COWCCountingDataModule
@@ -67,6 +68,7 @@
    'SouthAfricaCropTypeDataModule',
    # NonGeoDataset
    'BigEarthNetDataModule',
    'CaBuArDataModule',
    'CaFFeDataModule',
    'ChaBuDDataModule',
    'COWCCountingDataModule',
55 changes: 55 additions & 0 deletions torchgeo/datamodules/caffe.py
@@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""CaFFe datamodule."""

from typing import Any

import kornia.augmentation as K
import torch

from ..datasets import CaFFe
from ..transforms import AugmentationSequential
from .geo import NonGeoDataModule


class CaFFeDataModule(NonGeoDataModule):
    """LightningDataModule implementation for the CaFFe dataset.

    Implements the default splits that come with the dataset.

    .. versionadded:: 0.7
    """

    mean = torch.Tensor([0.5517])
    std = torch.Tensor([11.8478])

    def __init__(
        self, batch_size: int = 64, num_workers: int = 0, size: int = 256, **kwargs: Any
    ) -> None:
        """Initialize a new CaFFeDataModule instance.

        Args:
            batch_size: Size of each mini-batch.
            num_workers: Number of workers for parallel data loading.
            size: Resize the 512x512 input images to ``size`` x ``size``.
            **kwargs: Additional keyword arguments passed to
                :class:`~torchgeo.datasets.CaFFe`.
        """
        super().__init__(CaFFe, batch_size, num_workers, **kwargs)

        self.size = size

        self.train_aug = AugmentationSequential(
            K.Normalize(mean=self.mean, std=self.std),
            K.Resize(size),
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            data_keys=['image', 'mask'],
        )

        self.aug = AugmentationSequential(
            K.Normalize(mean=self.mean, std=self.std),
            K.Resize(size),
            data_keys=['image', 'mask'],
        )
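Both augmentation pipelines begin by standardizing the single SAR channel with the class-level `mean` and `std`. A minimal sketch of what that normalization step computes, using plain tensor arithmetic (the dummy batch values are illustrative; `K.Normalize` performs the equivalent per-channel operation):

```python
import torch

# Per-channel statistics from CaFFeDataModule (one SAR band)
mean = torch.tensor([0.5517])
std = torch.tensor([11.8478])

# A dummy (batch, channels, height, width) tensor stands in for real SAR imagery
x = torch.full((2, 1, 4, 4), 128.0)

# Normalization subtracts the channel mean and divides by the channel std,
# broadcasting the (C,) statistics over batch and spatial dimensions
normed = (x - mean[:, None, None]) / std[:, None, None]
print(normed.mean().item())
```

With a large `std` like 11.85, raw 0-255 SAR intensities land in a compact range after normalization, which keeps the downstream optimization well conditioned.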
2 changes: 2 additions & 0 deletions torchgeo/datasets/__init__.py
@@ -12,6 +12,7 @@
from .bigearthnet import BigEarthNet
from .biomassters import BioMassters
from .cabuar import CaBuAr
from .caffe import CaFFe
from .cbf import CanadianBuildingFootprints
from .cdl import CDL
from .chabud import ChaBuD
@@ -205,6 +206,7 @@
    'BigEarthNet',
    'BioMassters',
    'CaBuAr',
    'CaFFe',
    'ChaBuD',
    'CloudCoverDetection',
    'COWC',