ENH: add subgraph method to Graph to get subsets #640

Merged (2 commits) on Nov 19, 2023

martinfleis
Member

I found myself needing to create subsets of a large graph covering only specific portions of my geometries. Hence, I have developed a method to create a subgraph, since making a subset of the adjacency and passing that to the constructor discards isolates.
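To illustrate why subsetting the adjacency alone loses isolates, here is a minimal pandas-only sketch. It is not the code from this PR: the helper name `subset_adjacency`, the `focal`/`neighbor` level names, and the choice to encode an isolate as a zero-weight self-loop are assumptions made for the example.

```python
import pandas as pd

# Toy adjacency: a MultiIndex of (focal, neighbor) pairs with a weight each.
index = pd.MultiIndex.from_tuples(
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c")],
    names=["focal", "neighbor"],
)
adjacency = pd.Series(1.0, index=index, name="weight")


def subset_adjacency(adjacency: pd.Series, ids) -> pd.Series:
    """Hypothetical helper: keep edges within `ids` and re-add lost isolates."""
    ids = pd.Index(ids)
    focal = adjacency.index.get_level_values("focal")
    neighbor = adjacency.index.get_level_values("neighbor")

    # Keep only edges whose both ends are inside the subset.
    kept = adjacency[focal.isin(ids) & neighbor.isin(ids)]

    # Any requested id that no longer appears as a focal has lost all its
    # neighbors; a plain subset would silently drop it.
    heads = kept.index.get_level_values("focal")
    islands = ids.difference(pd.Index(heads))

    # Assumption: represent each isolate as a self-loop with weight 0.
    isolates = pd.Series(
        0.0,
        index=pd.MultiIndex.from_arrays(
            [islands, islands], names=["focal", "neighbor"]
        ),
        name=adjacency.name,
    )
    return pd.concat([kept, isolates]).sort_index()


sub = subset_adjacency(adjacency, ["a", "b", "d"])
```

In this toy case `d` only neighbored `c`, which is outside the subset, so without the extra step `d` would vanish from the result entirely.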

- islands = np.setdiff1d(ids, heads)
+ islands = pd.Index(ids).difference(pd.Index(heads))
Member Author

This turned out to be several orders of magnitude faster for large string indices.
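A small sketch of the two approaches side by side, on made-up string ids (the arrays here are illustrative only). It checks only that both return the same set difference and makes no timing claim; any speed-up would need to be measured on real data.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: all node ids, and the ids that appear as focals.
ids = np.array(["a", "b", "c", "d"], dtype=object)
heads = np.array(["a", "c"], dtype=object)

# NumPy: sorted unique values of `ids` that are not in `heads`.
islands_np = np.setdiff1d(ids, heads)

# pandas: the same set difference, computed on Index objects.
islands_pd = pd.Index(ids).difference(pd.Index(heads))
```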

codecov bot commented Nov 10, 2023

Codecov Report

Merging #640 (77bd0ff) into main (79c4f82) will increase coverage by 0.0%.
Report is 1 commits behind head on main.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #640   +/-   ##
=====================================
  Coverage   83.9%   83.9%           
=====================================
  Files        139     139           
  Lines      14970   14976    +6     
=====================================
+ Hits       12562   12569    +7     
+ Misses      2408    2407    -1     
| Files | Coverage Δ |
|---|---|
| libpysal/graph/_utils.py | 89.3% <100.0%> (ø) |
| libpysal/graph/base.py | 97.7% <100.0%> (+<0.1%) ⬆️ |
| libpysal/graph/tests/test_base.py | 100.0% <100.0%> (ø) |

... and 6 files with indirect coverage changes

@knaaptime
Member

Cool. I wonder if we can use this to enhance pysal/esda#259 as discussed over there (e.g. by computing the largest range first, then successively cutting down the graph). Seems like you'd still need a tree query to get the indices though?

@martinfleis
Member Author

martinfleis commented Nov 10, 2023

> I wonder if we can use this to enhance pysal/esda#259 as discussed over there (e.g. by computing the largest range first, then successively cutting down the graph). Seems like you'd still need a tree query to get the indices though?

I don't think so. This aims to create a subgraph based on a subset of focals. In a correlogram, you need to keep the same focals but cut their neighbors.

What you could do once #635 is in is something like this:

for i in distances:
    adj = graph_w_distance.adjacency.copy()
    adj[adj > i] = 0
    smaller_graph = graph.Graph(adj, is_sorted=True).transform("r")
    # compute stats using smaller_graph

This assumes that graph_w_distance has weight == distance. Right now, sorting is a bottleneck in this code, but after #635 that will be gone.

Edit: you could also call eliminate_zeros() from #634 on that before the transform to get a cleaner graph, but I am less sure about the performance benefits of that.
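The thresholding idea can be tried without libpysal on a plain pandas adjacency. This is a rough sketch under stated assumptions: `band_weights` is a hypothetical helper, the toy distances are invented, and the groupby-based row standardization only stands in for what `transform("r")` is expected to do.

```python
import pandas as pd

# Toy adjacency where weight == distance between focal and neighbor.
index = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")],
    names=["focal", "neighbor"],
)
distance_adj = pd.Series([1.0, 3.0, 1.0, 3.0], index=index, name="weight")


def band_weights(adjacency: pd.Series, threshold: float) -> pd.Series:
    """Hypothetical helper: zero out far neighbors, then row-standardize."""
    adj = adjacency.copy()
    # Same focals, fewer effective neighbors: edges beyond the threshold get 0.
    adj[adj > threshold] = 0
    # Row standardization: divide each weight by the sum over its focal.
    row_sums = adj.groupby(level="focal").transform("sum")
    # A focal with no remaining neighbors yields 0/0 = NaN; treat it as 0.
    return (adj / row_sums).fillna(0)


# One standardized adjacency per distance band.
bands = {d: band_weights(distance_adj, d) for d in [2.0, 5.0]}
```

Keeping the zeroed edges preserves every focal in every band, which is the property the correlogram case needs; dropping them afterwards would be the analogue of the eliminate_zeros() step.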

@martinfleis martinfleis merged commit 85e6d5f into pysal:main Nov 19, 2023
@martinfleis martinfleis deleted the subset branch November 19, 2023 09:23