-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update deafrica-waterbodies command line tools #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the necessary changes! The current level of buffering looks good to me -- I have made a suggestion that we avoid hard-coding this value if possible, and base it on a default value of a 500 m buffer, to then be divided by the resolution of the land-sea mask. If you can make this change, I think that would help us re-use this in the future if we ever want to change the mask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the remaining changes! It's really close now. My only comment is that I think it would be best to calculate the longest length at this stage, but to only filter polygons during the conflux processing.
This allows us to still map large water bodies and display them on DE Africa maps.
I think some thinking will need to go into how we handle the attributes for these large polygons though. This might also change if we end up moving to much more of a raster-based pipeline as well.
I think for now, it might be best to remove the filtering step from here and just do the filtering in conflux.
|
||
waterbodies_gdf = add_polygon_properties(polygons=waterbodies_gdf) | ||
|
||
waterbodies_gdf = filter_by_length( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does running the filter by length step at the polygon generation stage mean that our output data set wouldn't include large polygons at all? This might not be ideal for displaying the product on the DE Africa maps. I can think of two options:
- Calculate the length attribute here, but only filter the polygons in the conflux repository.
- Write out two copies of the parquet file: one with all water bodies, and one with large water bodies removed. The one with large waterbodies removed could then be used for the conflux processing.
For simplicity, I think it might be best to keep all water bodies at this stage of the code, and then remove the large waterbodies at the conflux stage. Given the length attribute has been calculated here, that attribute can be used to filter out the large water bodies during the conflux processing.
# Mask any pixels whose frequency of water detection is less than the threshold. | ||
wofs_alltime_summary_valid_wetness = wofs_alltime_summary.frequency > threshold | ||
# Convert to numpy arrays for image processing. | ||
np_detection = xr_detection.to_numpy().astype(int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save memory, we could use the same change here that we made to the conflux processing -- use xr_detection.values.astype(int) (and same for xr_extent)
output_directory = TEST_OUTPUT_DIRECTORY | ||
min_polygon_size = 4500 | ||
max_polygon_size = math.inf | ||
length_threshold_km = 150 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removing filtering by length from this step and only doing this for conflux. Length attribute should still be calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Vicky, this is looking great!
generate-polygons
andgenerate-timeseries
tools and add tests