[ENH] Overlay functionality with Dask-geopandas #217

Open · wants to merge 8 commits into base: main

slumnitz (Contributor) commented Aug 27, 2022

  • add overlay.py to the codebase
  • add overlay to __init__.py

To-Do

  • add tests
  • add keywords

martinfleis (Member) commented

Thanks for looking into this. Seeing that this is essentially a mirror of the inner sjoin implementation we have here, I am afraid it won't be that easy.

Take difference as an example, and assume you have a polygon in df1 that intersects two polygons in df2, each of which needs to be clipped out of the original. The problem arises when one of these two polygons is in one partition and the other is in another. With this PR, you end up with two versions of the original polygon from df1: one resulting from the difference with the first partition, and the other from the difference with the second. So we get a duplicated entry, and both copies are wrong, because each has been clipped by only one of the two polygons.
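
To make the failure mode concrete, here is a minimal sketch using shapely directly rather than dask-geopandas. The box geometries are hypothetical stand-ins for df1 and df2, and the split of the two df2 polygons across partitions is assumed by hand:

```python
from shapely.geometry import box

# Hypothetical geometries: one polygon from df1 and two polygons from df2.
a = box(0, 0, 10, 10)         # polygon in df1
b_part0 = box(0, 0, 3, 10)    # df2 polygon assumed to land in partition 0
b_part1 = box(7, 0, 10, 10)   # df2 polygon assumed to land in partition 1

# Naive per-partition difference: each partition only sees one clipping polygon,
# so `a` appears twice, and each copy is clipped by only one polygon.
naive_rows = [a.difference(b_part0), a.difference(b_part1)]

# Correct result: clip `a` by both df2 polygons at once.
correct = a.difference(b_part0.union(b_part1))

for row in naive_rows:
    print(row.area)  # 70.0 for each naive copy
print(correct.area)  # 40.0
```

Neither naive row matches the correct geometry, which is exactly the "duplicated and both wrong" outcome described above.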

We face a very similar issue with left sjoin, which is why only the inner join is implemented at the moment.

For overlay, we would need some form of very smart overlapping computation (#40), and I am not fully sure how that would work at the moment.

keewis commented Jul 12, 2024

I wonder whether it would be better to solve this using the map-reduce pattern. For computing differences, the idea would be to first compute the element-wise differences, then group by the geometries we subtract from and reduce using intersection (somewhat like the "map-reduce" method from the flox documentation).

That way, we avoid having to enlarge partitions to implement map_overlap, which could make some partitions larger than memory.
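
A minimal sketch of this map-reduce idea, written with plain shapely and Python rather than dask-geopandas or flox. The partition layout, the geometries, and the helper names map_partition and reduce_by_key are hypothetical. It relies on the set identity A \ (B1 ∪ B2) = (A \ B1) ∩ (A \ B2), so partial differences computed per partition can be combined by intersection:

```python
from collections import defaultdict
from functools import reduce

from shapely.geometry import box

# Hypothetical data: one left polygon, right polygons split across two partitions.
left = {"A": box(0, 0, 10, 10)}
right_partitions = [
    [box(0, 0, 3, 10)],   # partition 0
    [box(7, 0, 10, 10)],  # partition 1
]


def map_partition(left_geoms, right_geoms):
    """Map step (hypothetical helper): subtract one partition's right geometries
    from every left geometry, emitting (left key, partial difference) pairs."""
    out = []
    for key, geom in left_geoms.items():
        partial = geom
        for other in right_geoms:
            partial = partial.difference(other)
        out.append((key, partial))
    return out


def reduce_by_key(pairs):
    """Reduce step (hypothetical helper): group partial results by left key
    and intersect them, which yields the difference with all right geometries."""
    groups = defaultdict(list)
    for key, geom in pairs:
        groups[key].append(geom)
    return {
        key: reduce(lambda acc, g: acc.intersection(g), geoms)
        for key, geoms in groups.items()
    }


# Map over every partition, then reduce across partitions.
mapped = [pair for part in right_partitions for pair in map_partition(left, part)]
result = reduce_by_key(mapped)
print(result["A"].area)  # 40.0, same as subtracting both right polygons at once
```

In a real implementation the map step would presumably run independently on each partition and the reduce step would be a groupby-style aggregation on the left index; neither is shown here. The key property is that no partition needs to see its neighbours' geometries, so partitions never have to grow.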
