Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update deafrica-waterbodies command line tools #11

Merged
merged 133 commits into from
Nov 27, 2023
Merged

Update deafrica-waterbodies command line tools #11

merged 133 commits into from
Nov 27, 2023

Conversation

vikineema
Copy link
Collaborator

  • Update the generate-polygons and generate-timeseries tools and add tests
  • Add publishing workflows to DockerHub and PYPI

Copy link
Collaborator

@caitlinadams caitlinadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making the necessary changes! The current level of buffering looks good to me -- I have made a suggestion that we avoid hard-coding this value if possible, and base it on a default value of a 500 m buffer, to then be divided by the resolution of the land-sea mask. If you can make this change, I think that would help us re-use this in the future if we ever want to change the mask.

Copy link
Collaborator

@caitlinadams caitlinadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the remaining changes! It's really close now. My only comment is that I think it would be best to calculate the longest length at this stage, but to only filter polygons during the conflux processing.

This allows us to still map large water bodies and display them on DE Africa maps.

I think some thinking will need to go into how we handle the attributes for these large polygons though. This might also change if we end up moving to much more of a raster-based pipeline as well.

I think for now, it might be best to remove the filtering step from here and just do the filtering in conflux.


waterbodies_gdf = add_polygon_properties(polygons=waterbodies_gdf)

waterbodies_gdf = filter_by_length(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does running the filter by length step at the polygon generation stage mean that our output data set wouldn't include large polygons at all? This might not be ideal for displaying the product on the DE Africa maps. I can think of two options:

  1. Calculate the length attribute here, but only filter the polygons in the conflux repository.
  2. Write out two copies of the parquet file: one with all water bodies, and one with large water bodies removed. The one with large waterbodies removed could then be used for the conflux processing.

For simplicity, I think it might be best to keep all water bodies at this stage of the code, and then remove the large waterbodies at the conflux stage. Given the length attribute has been calculated here, that attribute can be used to filter out the large water bodies during the conflux processing.

# Mask any pixels whose frequency of water detection is less than the threshold.
wofs_alltime_summary_valid_wetness = wofs_alltime_summary.frequency > threshold
# Convert to numpy arrays for image processing.
np_detection = xr_detection.to_numpy().astype(int)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To save memory, we could use the same change here that we made to the conflux processing -- use xr_detection.values.astype(int) (and same for xr_extent)

output_directory = TEST_OUTPUT_DIRECTORY
min_polygon_size = 4500
max_polygon_size = math.inf
length_threshold_km = 150
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest removing filtering by length from this step and only doing this for conflux. Length attribute should still be calculated.

Copy link
Collaborator

@caitlinadams caitlinadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vicky, this is looking great!

@vikineema vikineema merged commit 895a18c into stable Nov 27, 2023
4 checks passed
@vikineema vikineema deleted the update-cli branch January 22, 2024 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants