Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS 2-3: Add New Operation for RasterGroupedCount on COGs #113

Open
rajadain opened this issue Apr 15, 2024 · 10 comments · May be fixed by #117
Open

AWS 2-3: Add New Operation for RasterGroupedCount on COGs #113

rajadain opened this issue Apr 15, 2024 · 10 comments · May be fixed by #117
Assignees
Labels
AWS Funding Source: AWS

Comments

@rajadain
Copy link
Member

Following up to #112, add a new operation that is similar to the RasterGroupedCount operation but works on COGs.

@rajadain rajadain added the AWS Funding Source: AWS label Apr 15, 2024
@rajadain rajadain self-assigned this Apr 15, 2024
@pomadchin
Copy link

pomadchin commented May 14, 2024

Also a tiny example here to help / isolated demo (summary of all GT Server / Avris links)

That FS2 ~ Futures on edges is a bit of a rough spot; worth considering moving server to cats-effects 3. It should be not too complex given that the server code is not that big.

GT Server artifacts are updated & published up to the most resent GT / stac4s / maml versions.

@rajadain
Copy link
Member Author

I've got some work in this branch, mainly copied from the very handy stac-example-simple above (thank you @pomadchin!), but it's not really working.

I've implemented a sample /stac endpoint, but when I hit it with this request it fails:

time xh --verbose :8090/stac shape="{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
POST /stac HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 388
Content-Type: application/json
Host: localhost:8090
User-Agent: xh/0.22.0

{
    "shape": "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
}



HTTP/1.1 200 OK
Content-Length: 111
Content-Type: text/plain; charset=UTF-8
Date: Wed, 15 May 2024 17:44:35 GMT
Server: akka-http/10.5.3

no rasters found in extent Extent(-75.2292251586914, 39.894723763816785, -75.12039184570312, 39.93251203374031)
________________________________________________________
Executed in    4.86 secs      fish           external
   usr time   95.58 millis    0.19 millis   95.38 millis
   sys time   27.09 millis    2.32 millis   24.77 millis

The two issues I'm seeing are:

  1. The bbox filter is not working correctly. This is possibly because the shape is not in the right CRS, or I'm not converting a shape into a bbox correctly.
  2. The service is very slow. Corresponding operations in Python feel faster.

Will pair on this tomorrow with @philvarner and @jpolchlo.

@rajadain
Copy link
Member Author

rajadain commented May 17, 2024

I was able to get a STAC Summary endpoint working in this branch: https://github.com/WikiWatershed/mmw-geoprocessing/tree/tt/113/add-stac-cog-count-endpoint

time xh --verbose http://172.20.0.1:8090/stac shape="{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
POST /stac HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 388
Content-Type: application/json
Host: 172.20.0.1:8090
User-Agent: xh/0.18.0

{
    "shape": "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
}



HTTP/1.1 200 OK
Content-Length: 80
Content-Type: application/json
Date: Fri, 17 May 2024 03:04:00 GMT
Server: akka-http/10.5.3

{
    "List(2)": 7532,
    "List(8)": 2505,
    "List(11)": 5150,
    "List(7)": 331794,
    "List(1)": 46323
}
________________________________________________________
Executed in    4.55 secs      fish           external
   usr time    2.64 millis  174.00 micros    2.46 millis
   sys time    0.09 millis   87.00 micros    0.00 millis

I think this is cropping the raster to the extent of the AoI correctly, but not the AoI itself. Need to check that next.

@rajadain
Copy link
Member Author

I made updated notebooks with simple shapes to set as the baseline correct numbers. Will compare the output of the geoprocessing service to this.

https://gist.github.com/rajadain/aa7a569fea3f316a0c9c03c846291d17

@pomadchin
Copy link

Btw trying it with GDAL on could be a nice experiment, may give some performance benefits (tho carries some hidden GDAL pitfalls along the way).

I’ll cut an updated GT release soon that properly supports GDAL 3.7+

@rajadain
Copy link
Member Author

Anecdotally, running it on my Windows Core i7 was faster than macOS M1 Pro. Possibly an x86 vs ARM issue? Final deployment will be on x86 so should be fine 🤞

@pomadchin
Copy link

Could be a JDK issue, what Java version is on M1 vs x86? I’d recommend going to 21 if possible.

@rajadain
Copy link
Member Author

That was a good tip. M1 was on JDK 18, installed 21 and it is markedly faster.

@rajadain
Copy link
Member Author

Alright, I've got a pretty good implementation in my branch https://github.com/WikiWatershed/mmw-geoprocessing/tree/tt/113/add-stac-cog-count-endpoint. It follows the recipe in Python / Jupyter here: https://gist.github.com/rajadain/1d3591a7a00c750466e3793bf1a9bcd2#file-lower-schuylkill-huc-10-test-2-ipynb.

The order of operations in Jupyter is:

  1. Find tiffs in the STAC API that intersect the AoI
  2. For each tiff, clip to AoI extent and reproject to EPSG:5070
    • (really to the correct Albers projection based on where the AoI is, but for simplicity I'm using ConusAlbers EPSG:5070 here)
  3. For each reprojected clip, mask to the AoI
  4. Merge all masked clips
  5. Calculate histogram using np.unique(data, return_counts=True)

This generates an output for the Lower Schuylkill HUC-10 like:

{
    "List(1)": 87250,
    "List(11)": 308877,
    "List(2)": 639920,
    "List(4)": 283,
    "List(5)": 173754,
    "List(7)": 2855276,
    "List(8)": 3126
}

The order of operations in GeoTrellis is:

  1. Find tiffs in the STAC API that intersect with the AoI
  2. Create a MosaicRasterSource from those tiffs, reprojected to EPSG:5070
  3. Clip the RasterSource to the AoI's extent
  4. Mask the Raster to the AoI
  5. Calculate histogram using histogram.binCounts

This results in numbers that are quite close, but not exact:

{
    "List(1)": 105463,
    "List(11)": 390845,
    "List(2)": 730907,
    "List(4)": 525,
    "List(5)": 169751,
    "List(7)": 3591677,
    "List(8)": 5066
}

Here's how they compare:

image

image

Notably, we use the relative distribution in MMW.

Here's another example with Conococheague-Opequon HUC-8:

image

image

I'm going to put up a PR for this and request feedback from the multiple folks that have assisted so far 🙏

@rajadain rajadain linked a pull request May 30, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AWS Funding Source: AWS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants