Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scripts for generating basins #68

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Scripts for generating basins #68

wants to merge 1 commit into from

Conversation

corviday
Copy link
Contributor

@corviday corviday commented Jul 25, 2023

The Salmon Climate Impacts Portal uses a hierarchical model of watersheds. A "watershed" is a small section of river or stream. Each watershed belongs to a "basin," which is a major river like the Fraser. These scripts generate shapefiles for the "basins" from the "watersheds".

The plan is to use this data to help a used find which watershed they care about - they start by selecting a river (basin) and then they can look through its tributaries (watersheds) to find the area they're interested in, or view data for the entire basin, if they prefer.

resolves #67

Copy link

@rod-glover rod-glover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of careful, highly-informed work. Wow.

I have two main categories of change request:

  • Description/documentation: Define terms (watershed, basin) in more detail. Define what the files contain at a high level -- whole province, individual watershed/basin? And broadly what the pipeline does to these files.
  • Code: Some items to improve code readability and maintainability.


You might think that more points - a higher resolution geometry - is ideal, but given the way the concave hull algorithm works, this is not always the case. You might imagine concave hull as attempting to shrinkwrap all the individual points with one sheet of shrinkwrap with the smallest possible amount of air inside the shrink wrap. Lower ratios corresponding to giving the algorithm longer sheets of shrinkwrap.

Imagine you have three points, and concave hull is attempting to shrinkwrap them. You give the algorithm a short piece of shrinkwrap, and it wraps around the outside of the three points, making a triangle. If, instead, you give the algorithm a longer piece of shrink wrap, it might wrap around the outside from point one to point two and from point two to point three, then turn and wrap arond the inside from point three back to point two, and along the inside back to point one. In this case, rather than a triangle, you get the hollow outline of one, a narrow wrapped strip just big enough to contain the three points.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanation.


A ratio of .8 was good for the shapefiles included in this directory, but different ratios might be good for different files. You'll have to open your shapefile in QGIS or something to make sure the generated polygons make sense, and rerun with a different ratio if they don't.

This repository includes an example of a poorly selected ratio. ratio-original.png shows the original shapefile (brown) for Haida Gwai. ratio-08.png shows the resulting geometry (yellow) run with a ratio of .8, which came out well. Ratio-01.png shows the resulting geometry (green) run with a ratio of .1. You can see that the large island at the top has become a thin strip around the outside of the region, instead of enclosing the entire region.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Can you be more explicit about the the qualities or criteria you wanted in the result? That is, what is the difference in the result between "poorly-selected" and "well-selected"? It seems something like

  • Small(est?) number of polygons? Ratio 0.1 seems to give more polygons (4) than 0.8 does (2) , yes?
  • Small amount of concavity?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or possibly I am misreading the images. In either case, a bit more explanation of criteria and meaning of images would help.


## Files Processed

watershed-degrees.shp was processed to concave.shp by concave_hull.py. basins.shp was generated from concave.shp with MjrDrain_Formatted.csv as a list of which watershed is in which basin.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel a bit foggy on what the files are and how the processing transforms them. A few questions:

  • Still a bit foggy on the relationship between watersheds and basins, despite the explanation at the top of this document.
  • What is in watershed-degrees.shp? All watersheds in the province? (Also, what does the word "degrees" communicate in its name? Is that units in the projection or the like? Is that important; would this fail in other projections/units?)
  • What, in overview, is in concave.shp? I'm guessing that it is all the watersheds in the province, concave-ized?
  • What, in overview, is in basin.shp?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I just spotted this in your initial PR comment

The Salmon Climate Impacts Portal uses a hierarchical model of watersheds. A "watershed" is a small section of river or stream. Each watershed belongs to a "basin," which is a major river like the Fraser. These scripts generate shapefiles for the "basins" from the "watersheds".

That would be an excellent thing to include in DESCRIPTION.md , along with any other commentary you think would be enlightening. Domain terminology is always key!

Copy link

@rod-glover rod-glover Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in fact, if I understand the usual meaning of "watershed" correctly, the particular usage/meaning of it in this context idiosyncratic and just a tad confusing. It might be helpful to point this out too. But this may be a naive comment; I'm hardly a domain expert.

logger.info("Loading input files")

driver = ogr.GetDriverByName("ESRI Shapefile")
input = driver.Open(args.input)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work as a context manager? If so, it would probably be better to use it.


# create output files
logger.info("Creating output files")
output = driver.CreateDataSource(args.output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context manager?

logger.info("Removing holes from {} multipolygon".format(basin))
polygons = []
for polygon in from_wkt(geom).geoms:
polygons.append(Polygon(polygon.exterior.coords))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer list comprehension.

)
basins[basin] = {"watersheds": 1, "geometry": geom}

logger.info("Outputting basins")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining what exactly is output (per basin) would be helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like in the output file, a feature is a basin. Yes? See comment above re. explaining correspondence between file elements and domain concepts.

output_feature.SetGeometry(ogr.CreateGeometryFromWkt(to_wkt(no_holes)))

out_layer.CreateFeature(output_feature)
num_basins = num_basins + 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as if num_basins just ends up being len(basins). Or am I missing something?

logger.info("Loading shapefile")

driver = ogr.GetDriverByName("ESRI Shapefile")
watersheds = driver.Open(args.watersheds)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context manager?

out_layer.CreateField(ogr.FieldDefn("NAME", ogr.OFTString))
out_layer.CreateField(ogr.FieldDefn("WATERSH", ogr.OFTInteger))

# step through the watersheds and create each basin

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to the relationship between file elements (layers, features) and domain concepts (watersheds, basins). Some comments explaining this would be excellent. Possibly in DESCRIPTION.md, also here.

@corviday
Copy link
Contributor Author

These scripts are unintentionally stripping projection information out of the shapefile and should be updated to not do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

create basin-assembling script
2 participants