Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS 2-3: Investigate accessing AWS Annual 10m LULC dataset #112

Open
rajadain opened this issue Apr 15, 2024 · 13 comments
Open

AWS 2-3: Investigate accessing AWS Annual 10m LULC dataset #112

rajadain opened this issue Apr 15, 2024 · 13 comments
Assignees
Labels
AWS Funding Source: AWS

Comments

@rajadain
Copy link
Member

This is the dataset: https://aws.amazon.com/marketplace/pp/prodview-llgtvqflvozwu

Test accessing this via GDAL / QGIS. Confirm if this requires specifying an AWS account.

Ensure the data can be subsetted to an Areas of Interest.

@rajadain
Copy link
Member Author

rajadain commented May 6, 2024

The dataset is saved in us-west-2, we should be cognizant of associated data transfer costs.

@rajadain
Copy link
Member Author

rajadain commented May 6, 2024

Browsing the dataset here: https://radiantearth.github.io/stac-browser/#/external/api.impactobservatory.com/stac-aws/collections/io-10m-annual-lulc, we see that the TIFFs are in an arrangement that spans the globe:

2024-05-06.16.23.07.mp4

It's also not quite clear how to navigate the TIFF files. There is an index file: s3://io-10m-annual-lulc/io-10m-annual-lulc.ndjson with each newline-delimited row containing JSON like:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "60F-2022",
  "properties": {
    "proj:bbox": [276230, 3789860, 723770, 4683690],
    "proj:epsg": 32760,
    "proj:shape": [89383, 44754],
    "end_datetime": "2023-01-01T00:00:00Z",
    "proj:geometry": {
      "type": "Polygon",
      "coordinates": [
        [
          [276230, 3789860],
          [723770, 3789860],
          [723770, 4683690],
          [276230, 4683690],
          [276230, 3789860]
        ]
      ]
    },
    "proj:projjson": {
      "id": {"code": 32760, "authority": "EPSG"},
      "name": "WGS 84 / UTM zone 60S",
      "type": "ProjectedCRS",
      "$schema": "https://proj.org/schemas/v0.4/projjson.schema.json",
      "base_crs": {
        "id": {"code": 4326, "authority": "EPSG"},
        "name": "WGS 84",
        "datum": {
          "name": "World Geodetic System 1984",
          "type": "GeodeticReferenceFrame",
          "ellipsoid": {
            "name": "WGS 84",
            "semi_major_axis": 6378137,
            "inverse_flattening": 298.257223563
          }
        },
        "coordinate_system": {
          "axis": [
            {
              "name": "Geodetic latitude",
              "unit": "degree",
              "direction": "north",
              "abbreviation": "Lat"
            },
            {
              "name": "Geodetic longitude",
              "unit": "degree",
              "direction": "east",
              "abbreviation": "Lon"
            }
          ],
          "subtype": "ellipsoidal"
        }
      },
      "conversion": {
        "name": "UTM zone 60S",
        "method": {
          "id": {"code": 9807, "authority": "EPSG"},
          "name": "Transverse Mercator"
        },
        "parameters": [
          {
            "id": {"code": 8801, "authority": "EPSG"},
            "name": "Latitude of natural origin",
            "unit": "degree",
            "value": 0
          },
          {
            "id": {"code": 8802, "authority": "EPSG"},
            "name": "Longitude of natural origin",
            "unit": "degree",
            "value": 177
          },
          {
            "id": {"code": 8805, "authority": "EPSG"},
            "name": "Scale factor at natural origin",
            "unit": "unity",
            "value": 0.9996
          },
          {
            "id": {"code": 8806, "authority": "EPSG"},
            "name": "False easting",
            "unit": "metre",
            "value": 500000
          },
          {
            "id": {"code": 8807, "authority": "EPSG"},
            "name": "False northing",
            "unit": "metre",
            "value": 10000000
          }
        ]
      },
      "coordinate_system": {
        "axis": [
          {
            "name": "Easting",
            "unit": "metre",
            "direction": "east",
            "abbreviation": ""
          },
          {
            "name": "Northing",
            "unit": "metre",
            "direction": "north",
            "abbreviation": ""
          }
        ],
        "subtype": "Cartesian"
      }
    },
    "proj:transform": [10, 0, 276230, 0, -10, 4683690, 0, 0, 1],
    "start_datetime": "2022-01-01T00:00:00Z",
    "io:supercell_id": "60F",
    "datetime": "2022-06-01T00:00:00Z"
  },
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [180, -55.99999],
          [180, -48.00137],
          [180, -48.00096176128751],
          [179.99766, -47.96096],
          [179.71251, -47.96805],
          [179.42727, -47.97444],
          [179.14193, -47.98011],
          [178.8565, -47.98508],
          [178.57101, -47.98934],
          [178.28545, -47.9929],
          [177.99985, -47.99574],
          [177.71421, -47.99787],
          [177.42853, -47.99929],
          [177.14285, -48],
          [176.85715, -48],
          [176.57147, -47.99929],
          [176.28579, -47.99787],
          [176.00015, -47.99574],
          [175.71455, -47.9929],
          [175.42899, -47.98934],
          [175.1435, -47.98508],
          [174.85807, -47.98011],
          [174.57273, -47.97444],
          [174.28749, -47.96805],
          [174.00234, -47.96096],
          [173.97997, -48.34337],
          [173.95712, -48.72575],
          [173.93378, -49.10809],
          [173.90995, -49.4904],
          [173.8856, -49.87268],
          [173.86073, -50.25492],
          [173.83532, -50.63713],
          [173.80935, -51.0193],
          [173.7828, -51.40144],
          [173.75567, -51.78354],
          [173.72792, -52.16561],
          [173.69955, -52.54764],
          [173.67054, -52.92964],
          [173.64086, -53.3116],
          [173.6105, -53.69352],
          [173.57943, -54.07541],
          [173.54763, -54.45726],
          [173.51508, -54.83907],
          [173.48176, -55.22085],
          [173.44763, -55.60258],
          [173.41267, -55.98428],
          [173.75367, -55.99374],
          [174.09487, -56.00226],
          [174.43624, -56.00984],
          [174.77776, -56.01646],
          [175.11942, -56.02215],
          [175.46119, -56.02689],
          [175.80305, -56.03068],
          [176.14498, -56.03352],
          [176.48697, -56.03542],
          [176.82899, -56.03636],
          [177.17101, -56.03636],
          [177.51303, -56.03542],
          [177.85502, -56.03352],
          [178.19695, -56.03068],
          [178.53881, -56.02689],
          [178.88058, -56.02215],
          [179.22224, -56.01646],
          [179.56376, -56.00984],
          [179.90513, -56.00226],
          [180, -55.99999]
        ]
      ],
      [
        [
          [-180, -55.99999],
          [-179.75367, -55.99374],
          [-179.41267, -55.98428],
          [-179.44763, -55.60258],
          [-179.48176, -55.22085],
          [-179.51508, -54.83907],
          [-179.54763, -54.45726],
          [-179.57943, -54.07541],
          [-179.6105, -53.69352],
          [-179.64086, -53.3116],
          [-179.67054, -52.92964],
          [-179.69955, -52.54764],
          [-179.72792, -52.16561],
          [-179.75567, -51.78354],
          [-179.7828, -51.40144],
          [-179.80935, -51.0193],
          [-179.83532, -50.63713],
          [-179.86073, -50.25492],
          [-179.8856, -49.87268],
          [-179.90995, -49.4904],
          [-179.93378, -49.10809],
          [-179.95712, -48.72575],
          [-179.97997, -48.34337],
          [-180, -48.00096176128751],
          [-180, -48.00137],
          [-180, -55.99999]
        ]
      ]
    ]
  },
  "links": [],
  "assets": {
    "supercell": {
      "href": "https://s3.us-west-2.amazonaws.com/io-10m-annual-lulc/60F_2022.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "file:values": [
        {"values": [0], "summary": "No Data"},
        {"values": [1], "summary": "Water"},
        {"values": [2], "summary": "Trees"},
        {"values": [4], "summary": "Flooded vegetation"},
        {"values": [5], "summary": "Crops"},
        {"values": [7], "summary": "Built area"},
        {"values": [8], "summary": "Bare ground"},
        {"values": [9], "summary": "Snow/ice"},
        {"values": [10], "summary": "Clouds"},
        {"values": [11], "summary": "Rangeland"}
      ],
      "raster:bands": [{"nodata": 0, "spatial_resolution": 10}],
      "roles": ["data"]
    }
  },
  "bbox": [173.41267, -56.03636, -179.41267, -47.96096],
  "stac_extensions": [
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
    "https://stac-extensions.github.io/file/v2.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json"
  ],
  "collection": "io-10m-annual-lulc"
}

@rajadain
Copy link
Member Author

rajadain commented May 7, 2024

After Impact Observatory fixed their API, I was successful in using it to load relevant GeoTIFFs in QGIS:

image

Thanks to @gadomski, here's a way to do the same using pystac-client:

stac-client search https://api.impactobservatory.com/stac-aws -c io-10m-annual-lulc --intersects "$(cat Lower\ Schuylkill\ River,\ HUC-10\ Watershed\ \(ID\ 0204020310\).geojson )" --max-items 10 | stac-asset download

So it seems that we can use STAC clients to identify the COGs that intersect with an area of interest. Next we need to pull just the bits from them that are in the AoI, and not the rest. This will be done via GeoTrellis. I'll next delve into the GeoTrellis documentation to study that.

@rajadain
Copy link
Member Author

rajadain commented May 8, 2024

I was able to make a demo gist using the Python GIS stack to fetch and subset data for the above shape: https://gist.github.com/rajadain/45e00f2e9518350d12fdb2b92f5f38be. This clarifies the use case in my mind. I'd like to edit this to also use the correct color bands, and do a simple count operation. Then I'll move on to implementing this in GeoTrellis.

@rajadain
Copy link
Member Author

rajadain commented May 8, 2024

Should use the colors specified here: https://www.impactobservatory.com/maps-for-good/

@rajadain
Copy link
Member Author

This is a good example notebook that shows how to work with this data using other Python tools:

https://edsbook.org/notebooks/gallery/b128b282-dee7-44a7-bc21-f1fd21452a83/notebook

@rajadain
Copy link
Member Author

My latest investigation is here: https://gist.github.com/rajadain/fae21522bfb4cfb2a9d28b35d4e3f573

I'm using this scheme to translate the NLCD values to Impact Observatory classes:

NLCD_TO_IO = {
    11: 1, # NLCD Open Water to IO Water
    12: 9, # NLCD Perennial Ice/Snow to IO Snow/Ice
    21: 7, # NLCD Developed, Open Space to IO Built area
    22: 7, # NLCD Developed, Low Intensity to IO Built area
    23: 7, # NLCD Developed, Medium Intensity to IO Built area
    24: 7, # NLCD Developed, High Intensity to IO Built area
    31: 8, # NLCD Barren Land (Rock/Sand/Clay) to IO Bare ground
    41: 2, # NLCD Deciduous Forest to IO Trees
    42: 2, # NLCD Evergreen Forest to IO Trees
    43: 2, # NLCD Mixed Forest to IO Trees
    52: 11, # NLCD Shrub/Scrub to IO Rangeland
    71: 11, # NLCD Grassland/Herbaceous to IO Rangeland
    81: 5, # NLCD Pasture/Hay to IO Crops
    82: 5, # NLCD Cultivated Crops to IO Crops
    90: 4, # NLCD Woody Wetlands to IO Flooded vegetation
    95: 4, # NLCD Emergent Herbaceous Wetlands to IO Flooded vegetation
}

For the Lower Schuylkill HUC-10 0204020310, this is the output from the IO Global Dataset:

image

and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:

image

For the Conococheague-Opequon HUC-8 02070004, this is the output from the IO Global Dataset:

image

and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:

image

These spot checks show that while there is general agreement, these two datasets are not identical by any means.

@aufdenkampe
Copy link
Member

aufdenkampe commented May 28, 2024

@rajadain, thanks for this analysis! I'm not too surprised that are discrepancies, given that the land use / land cover (LULC) classes have different definitions, especially for short vegetation (pasture, shrub, rangeland). Here's some extra into to on IO & NLCD classes for examining your initial crosswalk.

io_lulc_classes = {
    'No Data': 0, 
    'Water': 1, 
    'Trees': 2, 
    'Flooded vegetation': 4, 
    'Crops': 5, 
    'Built area': 7, 
    'Bare ground': 8, 
    'Snow/ice': 9, 
    'Clouds': 10, 
    'Rangeland': 11
}

@aufdenkampe
Copy link
Member

@rajadain, one suggested edit to your crosswalk might be to change NLCD 81 Pasture/Hay from IO 5 Crops to IO 11 Rangeland:

81: 11, # NLCD Pasture/Hay to IO Rangeland

I'm guessing this update will substantially improve the comparison.

@aufdenkampe
Copy link
Member

It seems like further exploration of the crosswalk will require close inspection of some of these areas vs satellite imagery, probably in QGIS.

I wonder if anyone else has already come up with an NCLD to IO crosswalk?

@rajadain
Copy link
Member Author

rajadain commented May 28, 2024

Good idea, I was wondering about that too. Updated gist: https://gist.github.com/rajadain/1d3591a7a00c750466e3793bf1a9bcd2

Shape MMW vs IO if NLCD 81 mapped to IO 5 MMW vs IO if NLCD 81 mapped to IO 11
Lower Schuylkil HUC-10 0204020310 image Diff: 34.3% image Diff 24.8%
Conococheague-Opequon HUC-8 02070004 image Diff 30.3% image Diff 30.0%

@rajadain
Copy link
Member Author

Also added a diff percentage above that divides the total absolute difference by the total MMW baseline, which shows that NLCD 81 to IO 11 is a better assignment.

@aufdenkampe
Copy link
Member

@rajadain, thanks for showing the comparison and adding the overall difference.

It looks like cross walking "NLCD 81 pasture/hay" to "IO 11 Rangeland" isn't the perfect solution, as it overshoots the previously observed differences. That said, it is likely the best choice.

Frankly, the overall comparison is quite good. One wouldn't expect a perfect match, given the different resolution, the different satellites, the different processing methods, and the different class definitions. It's actually surprisingly good given all that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AWS Funding Source: AWS
Projects
None yet
Development

No branches or pull requests

2 participants