Commit

Reorganize some information
adamjstewart committed Feb 29, 2024
1 parent 18bd060 commit cf38ce3
Showing 2 changed files with 40 additions and 30 deletions.
4 changes: 2 additions & 2 deletions docs/tutorials/custom_raster_dataset.ipynb
@@ -329,11 +329,11 @@
"\n",
"### `is_image`\n",
"\n",
"If your dataset only contains source data, such as image files, like Sentinel-2, or a digital surface, like a Digital Elevation Model, Digital Surface Model, Digital Terrain Model, or a raster of temperature values, use `is_image = True`. If your dataset only contains target data, such as a segmentation mask, like land use or land cover classification, use `is_image = False` instead.\n",
"If your data only contains model inputs (such as images), use `is_image = True`. If your data only contains ground truth model outputs (such as segmentation masks), use `is_image = False` instead.\n",
"\n",
"### `dtype`\n",
"\n",
"Defaults to float32 for `is_image == True` and long for `is_image == False`. This is what is usually wanted for 99% of datasets but can be overridden for pixel-wise regression masks (where the target should be float32). Uint16 and uint32 are automatically cast to int32 and int64, respectively, because numpy supports the former but torch does not.\n",
"Defaults to float32 for `is_image == True` and long for `is_image == False`. This is what you want for 99% of datasets, but can be overridden for tasks like pixel-wise regression (where the target mask should be float32).\n",
"\n",
"### `separate_files`\n",
"\n",
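To make the reorganized tutorial wording concrete, here is a minimal sketch of the two cases it describes. The class names and file globs are hypothetical and not part of this commit.

```python
from torchgeo.datasets import RasterDataset


class MyImagery(RasterDataset):
    """Hypothetical model inputs: samples use the "image" key and default to float32."""

    filename_glob = "scene_*.tif"
    is_image = True


class MyLandCover(RasterDataset):
    """Hypothetical ground truth: samples use the "mask" key and default to long."""

    filename_glob = "landcover_*.tif"
    is_image = False
```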
66 changes: 38 additions & 28 deletions torchgeo/datasets/geo.py
@@ -55,13 +55,11 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
based on latitude/longitude. This allows users to do things like:
* Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
-* Combine image and digital surface (e.g., elevation, temperature,
-pressure) and sample from both simultaneously (e.g. Sentinel-2 and an Aster
-Global DEM tile)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)
These combinations require that all queries are present in *both* datasets,
and can be combined using an :class:`IntersectionDataset`:
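For reference, the intersection described in this docstring is usually built with the `&` operator. A sketch, assuming the imagery and CDL rasters already exist at the placeholder paths:

```python
from torch.utils.data import DataLoader

from torchgeo.datasets import CDL, Landsat8, stack_samples
from torchgeo.samplers import RandomGeoSampler

# placeholder paths; the data must already be on disk
landsat = Landsat8("data/landsat8")
cdl = CDL("data/cdl")

dataset = landsat & cdl  # IntersectionDataset: queries must hit both inputs

sampler = RandomGeoSampler(dataset, size=256, length=1000)
dataloader = DataLoader(dataset, batch_size=8, sampler=sampler, collate_fn=stack_samples)

for batch in dataloader:
    image, mask = batch["image"], batch["mask"]  # Landsat bands and CDL labels
```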
@@ -73,9 +71,9 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
Users may also want to:
* Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
* Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)
These combinations require that all queries are present in *at least one* dataset,
and can be combined using a :class:`UnionDataset`:
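The union case in this docstring is usually built with the `|` operator. A sketch with placeholder paths; in practice you would also pick comparable bands from each mission:

```python
from torchgeo.datasets import Landsat7, Landsat8

landsat7 = Landsat7("data/landsat7")
landsat8 = Landsat8("data/landsat8")

landsat = landsat7 | landsat8  # UnionDataset: a query needs to hit only one input
print(landsat.bounds)  # combined spatiotemporal extent of both missions
```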
@@ -112,7 +110,7 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
def __init__(
self, transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None
) -> None:
"""Initialize a new Dataset instance.
"""Initialize a new GeoDataset instance.
Args:
transforms: a function/transform that takes an input sample
@@ -348,14 +346,14 @@ class RasterDataset(GeoDataset):
#: ``start`` and ``stop`` groups.
date_format = "%Y%m%d"

-#: True if the dataset contains source data, such as imagery. False if the dataset
-#: contains target data, such as a mask. In order to maintain consistency with Kornia,
-#: if ``is_image`` is set to ``True``, ``__getitem__`` uses ``image`` as the key
-#: for the data; otherwise, ``__getitem__`` uses ``mask`` as the key. The default
-#: behavior when multiple datasets with different keys are combined and the same
-#: key is used for multiple datasets, for example 2 "image" and 1 "mask", is that
-#: the channels will be stacked so that there's still a single value for that key.
-#: However, this behavior can be changed by specifying adifferent ``collate_fn``.
+#: True if the dataset only contains model inputs (such as images). False if the
+#: dataset only contains ground truth model outputs (such as segmentation masks).
+#:
+#: The sample returned by the dataset/data loader will use the ``image`` key if
+#: ``is_image`` is True, otherwise it will use the ``mask`` key.
+#:
+#: For datasets with both model inputs and outputs, a custom ``__getitem__`` method
+#: must be implemented.
is_image = True

#: True if data is stored in a separate file for each band, else False.
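The ``separate_files`` flag documented just above pairs with a ``band`` group in ``filename_regex``, which the dataset substitutes when assembling a sample from per-band files. A hedged sketch with made-up file names and band lists:

```python
from torchgeo.datasets import RasterDataset


class MySensor(RasterDataset):
    """Hypothetical sensor with one GeoTIFF per band per date."""

    filename_glob = "*_B??.tif"
    filename_regex = r"^(?P<date>\d{8})_(?P<band>B\d{2})\.tif$"
    date_format = "%Y%m%d"
    is_image = True
    separate_files = True  # each band lives in its own file
    all_bands = ["B01", "B02", "B03", "B04"]
    rgb_bands = ["B04", "B03", "B02"]
```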
@@ -374,9 +372,9 @@ class RasterDataset(GeoDataset):
def dtype(self) -> torch.dtype:
"""The dtype of the dataset (overrides the dtype of the data file via a cast).
-Defaults to float32 for is_image = True and long for is_image = False. This is
-what we usually want for 99% of datasets but can be overridden for pixel-wise
-regression masks (where it should be float32).
+Defaults to float32 for ``is_image = True`` and long for ``is_image = False``.
+Can be overridden for tasks like pixel-wise regression where the mask should be
+float32 instead of long.
Returns:
the dtype of the dataset
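The override mentioned in this docstring looks like the following sketch; the dataset name and file glob are hypothetical, but the property signature matches the code shown above.

```python
import torch

from torchgeo.datasets import RasterDataset


class CanopyHeight(RasterDataset):
    """Hypothetical pixel-wise regression target stored as a raster mask."""

    filename_glob = "chm_*.tif"
    is_image = False  # still returned under the "mask" key

    @property
    def dtype(self) -> torch.dtype:
        # regression targets need a floating point dtype instead of long
        return torch.float32
```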
@@ -397,7 +395,7 @@ def __init__(
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
cache: bool = True,
) -> None:
"""Initialize a new Dataset instance.
"""Initialize a new RasterDataset instance.
Args:
paths: one or more root directories to search or files to load
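A sketch of the constructor arguments listed here: ``paths`` may be a single root directory or an explicit list of files, ``transforms`` is applied to every sample, and ``cache`` (True by default) keeps warped file handles cached between queries. The class name, paths, and scale factor are placeholders.

```python
from torchgeo.datasets import RasterDataset


class MyScenes(RasterDataset):
    """Hypothetical imagery used only to illustrate the constructor arguments."""

    filename_glob = "*.tif"


def scale(sample):
    """Example transform: rescale raw digital numbers to reflectance-like values."""
    sample["image"] = sample["image"] / 10000.0
    return sample


# one root directory searched for matching files
ds_a = MyScenes("data/scenes")

# or an explicit list of files, with a per-sample transform
ds_b = MyScenes(["data/a.tif", "data/b.tif"], transforms=scale)
```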
@@ -620,7 +618,7 @@ def __init__(
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
label_name: Optional[str] = None,
) -> None:
"""Initialize a new Dataset instance.
"""Initialize a new VectorDataset instance.
Args:
paths: one or more root directories to search or files to load
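For context on the ``label_name`` parameter shown here, a hedged sketch of a vector dataset that is rasterized on the fly; the file pattern, path, and attribute name are assumptions, not part of this commit.

```python
from torchgeo.datasets import VectorDataset


class BuildingFootprints(VectorDataset):
    """Hypothetical vector layer rasterized on the fly into a mask."""

    filename_glob = "buildings_*.geojson"


# label_name selects the feature attribute whose value is burned into the mask
ds = BuildingFootprints("data/buildings", res=1.0, label_name="num_floors")
```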
@@ -888,9 +886,11 @@ class IntersectionDataset(GeoDataset):
This allows users to do things like:
* Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)
These combinations require that all queries are present in *both* datasets,
and can be combined using an :class:`IntersectionDataset`:
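A sketch of the newly added bullet: intersecting imagery with another raster layer (elevation) so every sample carries both. ``Elevation`` is a hypothetical subclass configured as a model input, so the default collate function concatenates its band with the Landsat bands; the paths are placeholders.

```python
from torchgeo.datasets import Landsat8, RasterDataset


class Elevation(RasterDataset):
    """Hypothetical DEM tiles configured as an additional model input."""

    filename_glob = "dem_*.tif"
    is_image = True


landsat = Landsat8("data/landsat8")
dem = Elevation("data/dem")

dataset = landsat & dem  # IntersectionDataset over the shared footprint
# a query inside the shared extent yields one sample whose "image" tensor holds
# the Landsat bands followed by the elevation band (the default collate_fn
# concatenates matching keys along the channel dimension)
```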
@@ -911,7 +911,12 @@ def __init__(
] = concat_samples,
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
) -> None:
"""Initialize a new Dataset instance.
"""Initialize a new IntersectionDataset instance.
When computing the intersection between two datasets that both contain model
inputs (such as images) or model outputs (such as masks), the default behavior
is to stack the data along the channel dimension. The *collate_fn* parameter
can be used to change this behavior.
Args:
dataset1: the first dataset
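A hedged sketch of overriding the default behavior described in the new paragraph: instead of stacking two image sources along the channel dimension, keep each under its own key so a multimodal model can consume them separately. The key names, paths, and helper function are hypothetical.

```python
from torchgeo.datasets import IntersectionDataset, Landsat8, Sentinel2


def keep_separate(samples):
    """Keep each source under its own (hypothetical) key instead of stacking."""
    sample = {key: value for key, value in samples[0].items() if key != "image"}
    sample["image1"] = samples[0]["image"]
    sample["image2"] = samples[1]["image"]
    return sample


# placeholder paths for two overlapping image sources
landsat = Landsat8("data/landsat8")
sentinel = Sentinel2("data/sentinel2")
dataset = IntersectionDataset(landsat, sentinel, collate_fn=keep_separate)
```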
@@ -1041,9 +1046,9 @@ class UnionDataset(GeoDataset):
This allows users to do things like:
* Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
* Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)
These combinations require that all queries are present in *at least one* dataset,
and can be combined using a :class:`UnionDataset`:
@@ -1064,7 +1069,12 @@ def __init__(
] = merge_samples,
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
) -> None:
"""Initialize a new Dataset instance.
"""Initialize a new UnionDataset instance.
When computing the union between two datasets that both contain model inputs
(such as images) or model outputs (such as masks), the default behavior is to
merge the data to create a single image/mask. The *collate_fn* parameter can be
used to change this behavior.
Args:
dataset1: the first dataset
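A sketch of the default behavior described in the new paragraph, written with the explicit constructor rather than the `|` operator: two land cover products covering disjoint states are unioned so a query hitting either region still yields a single merged mask. The paths are placeholders.

```python
from torchgeo.datasets import ChesapeakeNY, ChesapeakePA, UnionDataset, merge_samples

ny = ChesapeakeNY("data/chesapeake/ny")
pa = ChesapeakePA("data/chesapeake/pa")

# equivalent to ``ny | pa``; merge_samples is already the default collate_fn and
# merges overlapping rasters into a single "mask" per query
dataset = UnionDataset(ny, pa, collate_fn=merge_samples)
```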