-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scripts for generating basins #68
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of careful, highly-informed work. Wow.
I have two main categories of change request:
- Description/documentation: Define terms (watershed, basin) in more detail. Define what the files contain at a high level -- whole province, individual watershed/basin? And broadly what the pipeline does to these files.
- Code: Some items to improve code readability and maintainability.
|
||
You might think that more points - a higher resolution geometry - is ideal, but given the way the concave hull algorithm works, this is not always the case. You might imagine concave hull as attempting to shrinkwrap all the individual points with one sheet of shrinkwrap with the smallest possible amount of air inside the shrink wrap. Lower ratios corresponding to giving the algorithm longer sheets of shrinkwrap. | ||
|
||
Imagine you have three points, and concave hull is attempting to shrinkwrap them. You give the algorithm a short piece of shrinkwrap, and it wraps around the outside of the three points, making a triangle. If, instead, you give the algorithm a longer piece of shrink wrap, it might wrap around the outside from point one to point two and from point two to point three, then turn and wrap arond the inside from point three back to point two, and along the inside back to point one. In this case, rather than a triangle, you get the hollow outline of one, a narrow wrapped strip just big enough to contain the three points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good explanation.
|
||
A ratio of .8 was good for the shapefiles included in this directory, but different ratios might be good for different files. You'll have to open your shapefile in QGIS or something to make sure the generated polygons make sense, and rerun with a different ratio if they don't. | ||
|
||
This repository includes an example of a poorly selected ratio. ratio-original.png shows the original shapefile (brown) for Haida Gwai. ratio-08.png shows the resulting geometry (yellow) run with a ratio of .8, which came out well. Ratio-01.png shows the resulting geometry (green) run with a ratio of .1. You can see that the large island at the top has become a thin strip around the outside of the region, instead of enclosing the entire region. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Can you be more explicit about the the qualities or criteria you wanted in the result? That is, what is the difference in the result between "poorly-selected" and "well-selected"? It seems something like
- Small(est?) number of polygons? Ratio 0.1 seems to give more polygons (4) than 0.8 does (2) , yes?
- Small amount of concavity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or possibly I am misreading the images. In either case, a bit more explanation of criteria and meaning of images would help.
|
||
## Files Processed | ||
|
||
watershed-degrees.shp was processed to concave.shp by concave_hull.py. basins.shp was generated from concave.shp with MjrDrain_Formatted.csv as a list of which watershed is in which basin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel a bit foggy on what the files are and how the processing transforms them. A few questions:
- Still a bit foggy on the relationship between watersheds and basins, despite the explanation at the top of this document.
- What is in
watershed-degrees.shp
? All watersheds in the province? (Also, what does the word "degrees" communicate in its name? Is that units in the projection or the like? Is that important; would this fail in other projections/units?) - What, in overview, is in
concave.shp
? I'm guessing that it is all the watersheds in the province, concave-ized? - What, in overview, is in
basin.shp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I just spotted this in your initial PR comment
The Salmon Climate Impacts Portal uses a hierarchical model of watersheds. A "watershed" is a small section of river or stream. Each watershed belongs to a "basin," which is a major river like the Fraser. These scripts generate shapefiles for the "basins" from the "watersheds".
That would be an excellent thing to include in DESCRIPTION.md , along with any other commentary you think would be enlightening. Domain terminology is always key!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in fact, if I understand the usual meaning of "watershed" correctly, the particular usage/meaning of it in this context idiosyncratic and just a tad confusing. It might be helpful to point this out too. But this may be a naive comment; I'm hardly a domain expert.
logger.info("Loading input files") | ||
|
||
driver = ogr.GetDriverByName("ESRI Shapefile") | ||
input = driver.Open(args.input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work as a context manager? If so, it would probably be better to use it.
|
||
# create output files | ||
logger.info("Creating output files") | ||
output = driver.CreateDataSource(args.output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context manager?
logger.info("Removing holes from {} multipolygon".format(basin)) | ||
polygons = [] | ||
for polygon in from_wkt(geom).geoms: | ||
polygons.append(Polygon(polygon.exterior.coords)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer list comprehension.
) | ||
basins[basin] = {"watersheds": 1, "geometry": geom} | ||
|
||
logger.info("Outputting basins") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining what exactly is output (per basin) would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in the output file, a feature is a basin. Yes? See comment above re. explaining correspondence between file elements and domain concepts.
output_feature.SetGeometry(ogr.CreateGeometryFromWkt(to_wkt(no_holes))) | ||
|
||
out_layer.CreateFeature(output_feature) | ||
num_basins = num_basins + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as if num_basins
just ends up being len(basins)
. Or am I missing something?
logger.info("Loading shapefile") | ||
|
||
driver = ogr.GetDriverByName("ESRI Shapefile") | ||
watersheds = driver.Open(args.watersheds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context manager?
out_layer.CreateField(ogr.FieldDefn("NAME", ogr.OFTString)) | ||
out_layer.CreateField(ogr.FieldDefn("WATERSH", ogr.OFTInteger)) | ||
|
||
# step through the watersheds and create each basin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused as to the relationship between file elements (layers, features) and domain concepts (watersheds, basins). Some comments explaining this would be excellent. Possibly in DESCRIPTION.md, also here.
These scripts are unintentionally stripping projection information out of the shapefile and should be updated to not do that. |
The Salmon Climate Impacts Portal uses a hierarchical model of watersheds. A "watershed" is a small section of river or stream. Each watershed belongs to a "basin," which is a major river like the Fraser. These scripts generate shapefiles for the "basins" from the "watersheds".
The plan is to use this data to help a used find which watershed they care about - they start by selecting a river (basin) and then they can look through its tributaries (watersheds) to find the area they're interested in, or view data for the entire basin, if they prefer.
resolves #67