path compression variants for union-find IntDisjointSet #913

Open: wants to merge 2 commits into master

jacob-roth

Based on a question I asked on Discourse, I received some encouragement to write path-halving and path-splitting implementations for the union-find data structure. I have a first draft for consideration. This is also related to #911.
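For context, here is a minimal sketch of the two full-compression variants that the benchmarks compare against (recursive and iterative two-pass). It assumes a plain parent vector in which `parents[i] == i` marks a root; the function names and the `make_chain` helper are hypothetical and are not DataStructures.jl's internal API.

```julia
# Minimal sketch of full path compression on a plain parent vector, where
# parents[i] == i marks a root. Names are hypothetical, not the package API.

# Recursive: after the call, every node on the path points at the root.
# Each level of recursion uses a stack frame, so very deep paths can overflow.
function find_root_recursive!(parents::Vector{Int}, x::Int)
    p = parents[x]
    if p != x
        p = find_root_recursive!(parents, p)
        parents[x] = p
    end
    return p
end

# Iterative, two passes: walk up to find the root, then walk the same path
# again and point each node directly at the root. Uses constant stack space.
function find_root_twopass!(parents::Vector{Int}, x::Int)
    root = x
    while parents[root] != root
        root = parents[root]
    end
    while parents[x] != root
        next = parents[x]
        parents[x] = root
        x = next
    end
    return root
end

# Hypothetical helper: a chain where parents[i] = i - 1 and node 1 is the root.
make_chain(n) = [max(i - 1, 1) for i in 1:n]

p = make_chain(6)
find_root_twopass!(p, 6)
println(p)   # [1, 1, 1, 1, 1, 1]
```

Both variants leave the path fully flattened; they differ only in whether the walk uses the call stack or two explicit loops.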

jacob-roth changed the title [911](https://github.com/JuliaCollections/DataStructures.jl/issues/911) path compression variants for union-find IntDisjointSet Sep 10, 2024
jacob-roth (Author) commented Sep 10, 2024

Here are benchmark times for a find on the last element of a vector in which each element's parent is its predecessor (x[i] is the parent of x[i+1]). Path splitting appears to perform well relative to the recursive version, which overflows the stack for depth > 100_000.

julia> benchmark_find_root(10)
Benchmarking recursive path compression implementation (find_root_impl!):
  3.468 ns (0 allocations: 0 bytes)
Benchmarking iterative path compression implementation (find_root_iterative!):
  3.289 ns (0 allocations: 0 bytes)
Benchmarking path-halving implementation (find_root_halving!):
  2.909 ns (0 allocations: 0 bytes)
Benchmarking path-splitting implementation (find_root_path_splitting!):
  2.912 ns (0 allocations: 0 bytes)

julia> benchmark_find_root(100)
Benchmarking recursive path compression implementation (find_root_impl!):
  3.045 ns (0 allocations: 0 bytes)
Benchmarking iterative path compression implementation (find_root_iterative!):
  3.472 ns (0 allocations: 0 bytes)
Benchmarking path-halving implementation (find_root_halving!):
  2.627 ns (0 allocations: 0 bytes)
Benchmarking path-splitting implementation (find_root_path_splitting!):
  2.752 ns (0 allocations: 0 bytes)

julia> benchmark_find_root(1_000)
Benchmarking recursive path compression implementation (find_root_impl!):
  2.752 ns (0 allocations: 0 bytes)
Benchmarking iterative path compression implementation (find_root_iterative!):
  3.477 ns (0 allocations: 0 bytes)
Benchmarking path-halving implementation (find_root_halving!):
  2.909 ns (0 allocations: 0 bytes)
Benchmarking path-splitting implementation (find_root_path_splitting!):
  2.902 ns (0 allocations: 0 bytes)

julia> benchmark_find_root(10_000)
Benchmarking recursive path compression implementation (find_root_impl!):
  3.018 ns (0 allocations: 0 bytes)
Benchmarking iterative path compression implementation (find_root_iterative!):
  3.489 ns (0 allocations: 0 bytes)
Benchmarking path-halving implementation (find_root_halving!):
  2.911 ns (0 allocations: 0 bytes)
Benchmarking path-splitting implementation (find_root_path_splitting!):
  2.483 ns (0 allocations: 0 bytes)

julia> benchmark_find_root(100_000)
Benchmarking recursive path compression implementation (find_root_impl!):
Recursive may path compression may encounter stack-overflow; skipping
Benchmarking iterative path compression implementation (find_root_iterative!):
  3.293 ns (0 allocations: 0 bytes)
Benchmarking path-halving implementation (find_root_halving!):
  2.631 ns (0 allocations: 0 bytes)
Benchmarking path-splitting implementation (find_root_path_splitting!):
  2.755 ns (0 allocations: 0 bytes)

I'm using:

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  LD_LIBRARY_PATH = /Users/myuser/.julia/v0.6/Pardiso/deps/libpardiso500-MACOS-X86-64.dylib
  JULIA_BINDIR = /Applications/Julia-1.8.app/Contents/Resources/julia/bin/
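One possible reason the timings stay flat as n grows: `@btime find_root!($s, $n, ...)` reuses the same `s` for every evaluation, and each find shortens the path it walks. With full compression the last node is at depth 1 after the first call. With halving or splitting its depth is roughly halved on each call, so after about log2(n) calls (17 for n = 100_000) it is also at depth 1. The sketch below, on a plain parent vector with hypothetical helper names, tracks that depth for path halving on the n = 100_000 chain.

```julia
# Path halving (hypothetical name): point the current node at its
# grandparent, then jump to that grandparent.
function find_root_halving!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        parents[x] = parents[parents[x]]
        x = parents[x]
    end
    return x
end

# Number of parent links from x to its root; does not modify anything.
function depth(parents::Vector{Int}, x::Int)
    d = 0
    while parents[x] != x
        x = parents[x]
        d += 1
    end
    return d
end

n = 100_000
parents = [max(i - 1, 1) for i in 1:n]   # parents[i] = i - 1, root at 1
depths = [depth(parents, n)]
for _ in 1:20
    find_root_halving!(parents, n)       # same structure reused, as in @btime
    push!(depths, depth(parents, n))
end
println(depths[1:5])   # [99999, 50000, 25000, 12500, 6250]
println(depths[end])   # 1
```

If this is the cause, most evaluations in the numbers above measure a find on a node that is already at or near depth 1.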

println("Recursive path compression may overflow the stack; skipping")
else
s = create_disjoint_set_struct(n)
@btime find_root!($s, $n, PCRecursive())
Member

Increase the number of evals to, say, 100, and post the median and max times. Do this for all of the methods.
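A sketch of one way to collect per-method median and max times, assuming the plain-parent-vector layout and a hypothetical `time_fresh_finds` helper (Base only, no BenchmarkTools). It rebuilds the chain before every timed call, outside the timed region, so each sample measures a find over the full uncompressed path. With BenchmarkTools, a comparable effect should come from `@benchmark ... setup=(...) evals=1`: `setup` runs once per sample, not once per evaluation, so with `evals=100` all but the first evaluation in a sample would see an already-compressed path.

```julia
# Hypothetical harness: time one find per sample on a freshly built chain
# (parents[i] = i - 1, root at 1), so no sample benefits from compression
# done by an earlier call. Returns median and max in microseconds.
function time_fresh_finds(find!::Function, n::Int; samples::Int = 100)
    find!([max(i - 1, 1) for i in 1:n], n)       # warm-up call (compilation)
    times = Vector{Float64}(undef, samples)
    for s in 1:samples
        parents = [max(i - 1, 1) for i in 1:n]   # rebuilt outside the timed region
        t0 = time_ns()
        find!(parents, n)
        times[s] = (time_ns() - t0) / 1e3
    end
    sort!(times)
    mid = samples ÷ 2
    med = isodd(samples) ? times[mid + 1] : (times[mid] + times[mid + 1]) / 2
    return (median_us = med, max_us = times[end])
end

# Example strategy to time: path halving on a plain parent vector.
function find_root_halving!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        parents[x] = parents[parents[x]]
        x = parents[x]
    end
    return x
end

for n in (10, 1_000, 100_000)
    println(n, " => ", time_fresh_finds(find_root_halving!, n))
end
```

Passing each variant in turn as `find!` gives the median and max for every method under the same conditions.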

return current
end

# path-splitting: every node on the path points to its grandparent
Member

What's the exact difference between path compression using path halving and path splitting? It's not very clear. Can you illustrate with an example?
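A small worked example may help. The sketch below assumes a plain parent vector (`parents[i] == i` marks a root) and hypothetical function names, and runs both variants from node 6 on the chain where `parents[i] = i - 1` and node 1 is the root. Both point the current node at its grandparent; halving then jumps to that grandparent, so only every other node is rewired, while splitting steps to the old parent, so every node is rewired. Node 6 ends up at depth 3 instead of 5 either way, but splitting also rewires nodes 5 and 3, which halving leaves pointing at their old parents.

```julia
# Path halving: point the current node at its grandparent, then jump to
# that grandparent (only every other node on the path is rewired).
function find_root_halving!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        parents[x] = parents[parents[x]]
        x = parents[x]
    end
    return x
end

# Path splitting: point the current node at its grandparent, then step to
# its old parent (every node on the path is rewired).
function find_root_splitting!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        next = parents[x]
        parents[x] = parents[next]
        x = next
    end
    return x
end

chain = [1, 1, 2, 3, 4, 5]               # parents[i] = i - 1; node 1 is the root

after_halving = copy(chain)
find_root_halving!(after_halving, 6)     # rewires nodes 6, 4, 2
println(after_halving)                   # [1, 1, 2, 2, 4, 4]

after_splitting = copy(chain)
find_root_splitting!(after_splitting, 6) # rewires nodes 6, 5, 4, 3, 2
println(after_splitting)                 # [1, 1, 1, 2, 3, 4]
```

After halving the paths are 6→4→2→1 and 5→4→2→1; after splitting they are 6→4→2→1 and 5→3→1.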

root = current
# compress the path: make every node point directly to the root
current = x
@inbounds while parents[current] != root
Member

Address the test coverage warning.
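One way to cover the loop branches that a coverage report is likely to flag is a small `Test` suite: a node that is already a root, a direct child of the root, a long chain, and random forests checked against a read-only reference walk. The sketch assumes the plain-parent-vector layout and hypothetical function names; in the package, the same checks would go through `IntDisjointSet` and its `find_root!`.

```julia
using Test

# Hypothetical variants under test.
function find_root_halving!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        parents[x] = parents[parents[x]]
        x = parents[x]
    end
    return x
end

function find_root_splitting!(parents::Vector{Int}, x::Int)
    while parents[x] != x
        next = parents[x]
        parents[x] = parents[next]
        x = next
    end
    return x
end

# Reference: walk to the root without modifying anything.
function naive_root(parents::Vector{Int}, x::Int)
    while parents[x] != x
        x = parents[x]
    end
    return x
end

# Random forest: parents[i] <= i, so following parents always terminates.
random_forest(n) = [rand(1:i) for i in 1:n]

@testset "path-compression variants" for find! in (find_root_halving!, find_root_splitting!)
    # Edge cases: a root, a direct child of the root, and a long chain.
    chain = [max(i - 1, 1) for i in 1:10]
    @test find!(copy(chain), 1) == 1
    @test find!(copy(chain), 2) == 1
    @test find!(copy(chain), 10) == 1

    # Properties: the returned root matches the reference, and compression
    # never changes which root any node belongs to.
    for _ in 1:50
        parents = random_forest(200)
        expected = [naive_root(parents, i) for i in 1:200]
        x = rand(1:200)
        @test find!(parents, x) == expected[x]
        @test [naive_root(parents, i) for i in 1:200] == expected
    end
end
```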

end


struct PCRecursive end # path compression types
Member

Make this an enum.
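A minimal sketch of the enum version, assuming a hypothetical three-argument `find_root!(parents, x, mode)` on a plain parent vector rather than the package's `IntDisjointSet` method. With `@enum`, the strategy is a run-time value and the branch is chosen inside a single method, whereas the singleton structs let Julia select a method at compile time through dispatch. The branch should be cheap next to walking the path, but that is an expectation to confirm with the benchmarks. The default `PCIterative` below is an arbitrary choice for the sketch.

```julia
# Hypothetical sketch: the strategy is a value of one enum type.
@enum PathCompression PCRecursive PCIterative PCHalving PCSplitting

function find_root!(parents::Vector{Int}, x::Int, mode::PathCompression = PCIterative)
    if mode == PCRecursive
        # Full compression via recursion (deep paths can overflow the stack).
        p = parents[x]
        p == x && return x
        root = find_root!(parents, p, PCRecursive)
        parents[x] = root
        return root
    elseif mode == PCIterative
        # Full compression in two passes: find the root, then relink the path.
        root = x
        while parents[root] != root
            root = parents[root]
        end
        while parents[x] != root
            next = parents[x]
            parents[x] = root
            x = next
        end
        return root
    elseif mode == PCHalving
        # Every other node on the path is pointed at its grandparent.
        while parents[x] != x
            parents[x] = parents[parents[x]]
            x = parents[x]
        end
        return x
    else # PCSplitting
        # Every node on the path is pointed at its grandparent.
        while parents[x] != x
            next = parents[x]
            parents[x] = parents[next]
            x = next
        end
        return x
    end
end

chain = [1, 1, 2, 3, 4, 5]   # parents[i] = i - 1; node 1 is the root
for mode in instances(PathCompression)
    p = copy(chain)
    println(mode, ": root ", find_root!(p, 6, mode), ", parents ", p)
end
```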

@@ -68,6 +68,7 @@ module DataStructures
include("queue.jl")
include("accumulator.jl")
include("disjoint_set.jl")
export PCRecursive, PCIterative, PCHalving, PCSplitting
Member

After making it an enum, export the enum.
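A sketch of the export, assuming the enum is named `PathCompression` (a hypothetical name) and using a stand-in module instead of DataStructures itself. `@enum` defines each value as a separate constant in the enclosing module, so exporting only the type name would not make `PCHalving` and the other values visible after `using`; each value still needs its own entry in the `export` list, as the existing line already does for the struct names.

```julia
# Hypothetical stand-in for the package module.
module UnionFindModes

# Export the enum type and each of its values.
export PathCompression, PCRecursive, PCIterative, PCHalving, PCSplitting

@enum PathCompression PCRecursive PCIterative PCHalving PCSplitting

end # module

using .UnionFindModes

println(PCHalving isa PathCompression)   # true
```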

jacob-roth (Author)

Thanks for reviewing, @eulerkochy! I'll address these when I have a chance (next week). I also wanted to draw your attention to a concurrent implementation: https://github.com/kalmarek/ConcurrentDisjointSets.jl

Labels
None yet
Projects
None yet

2 participants