-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MG weighted similarity test failure #4054
Changes from 4 commits
fc90902
e291883
bf33d09
8700a4f
137f1f1
ce3c0df
a6bdc53
26e0f65
81df775
7722003
a68868f
1ceb38d
4268d00
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1005,9 +1005,14 @@ remove_self_loops(raft::handle_t const& handle, | |
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types); | ||
|
||
/** | ||
* @brief Remove all but one edge when a multi-edge exists. Note that this function does not use | ||
* stable methods. When a multi-edge exists, one of the edges will remain, there is no | ||
* guarantee on which one will remain. | ||
* @brief Remove all but one edge when a multi-edge exists. | ||
* | ||
* When a multi-edge exists, one of the edges will remain. If @p keep_min_value_edge is false, an | ||
* arbitrary edge will be selected among the edges in the multi-edge. If @p keep_min_value_edge is | ||
* true, the edge with the minimum value will be selected. The edge weights will be first compared | ||
* (if @p edgelist_weights.has_value() is true); edge IDs will be compared next (if @p | ||
* edgelist_edge_ids.has_value() is true); and edge types (if @p edgelist_edge_types.has_value() is | ||
* true) will compared last. | ||
* | ||
* In an MG context it is assumed that edges have been shuffled to the proper GPU, | ||
* in which case any multi-edges will be on the same GPU. | ||
|
@@ -1024,6 +1029,11 @@ remove_self_loops(raft::handle_t const& handle, | |
* @param edgelist_weights Optional list of edge weights | ||
* @param edgelist_edge_ids Optional list of edge ids | ||
* @param edgelist_edge_types Optional list of edge types | ||
* @param keep_min_value_edge Flag indicating whether to keep an arbitrary edge (false) or the | ||
* minimum value edge (true) among the edges in a multi-edge. Relevant only if @p | ||
* edgelist_wegihts.has_value() | @p edgelist_edge_ids.has_value() | @p | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. wegihts => weights |
||
* edgelist_edge_types.has_value() is true. Setting this to true incurs performance overhead as this | ||
* requires more comparisons. | ||
* @return Tuple of vectors storing edge sources, destinations, optional weights, | ||
* optional edge ids, optional edge types. | ||
*/ | ||
|
@@ -1038,6 +1048,7 @@ remove_multi_edges(raft::handle_t const& handle, | |
rmm::device_uvector<vertex_t>&& edgelist_dsts, | ||
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights, | ||
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids, | ||
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types); | ||
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types, | ||
bool keep_min_value_edge = false); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this flag also be exposed to the CAPI? In fact the new There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice catch, yes, if the graph is symmetric, we need to set keep_min_value_edge to true to maintain symmetry. See a68868f for the update. |
||
|
||
} // namespace cugraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about this when I put this function in. The decision to select an arbitrary edge was both simpler, and didn't assume what criteria would be desired. But it does limit the usefulness of the feature.
It seems like there are more reasonable choices than just the minimum edge. Here some thoughts:
We also support other edge properties (currently we use edge type and edge id), but the data structures and primitives support arbitrary properties.
Would this better be handled by passing in some sort of struct that creates the sorting and reduction criteria? Making it an optional would allow the arbitrary behavior if
std::nullopt
or use the struct if we want some sort of specific criteria.Thinking off the top of my head (leaving out lots of details), perhaps something like:
Doesn't quite cover all of the cases (averaging would require both summing values and counting values so we could divide at the end... so perhaps there's an optional transform at the end.
I'd be fine marking this with a FIXME and letting the smallest value fix our immediate problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I was also thinking about edge ID & type... If we want to maintain a symmetric graph, should we also assume that the reverse edge to have the same ID & type? (or this can really be a case by case thing?)
We can discuss what options are really necessary in cuGraph's context. NetworkX supports "arbitrary".
"If both edges exist in digraph and their edge data is different, only one edge is created with an arbitrary choice of which edge data to use."
https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.to_undirected.html
And I wasn't sure how much should we go beyond this without clear use cases, but now we have at least one use case (maintaining weight symmetry). We should clearly make updates if we can identify more use cases (e.g. anything related to edge ID & types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue to track this idea. Definitely need to see what kind of use cases would drive this feature before we spend to much time on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to interject on this late but the python API has a tangential issue. In fact, the python API does not symmetrize edges that have
edge_ids
under the assumption that each edge has a unique edge id and will throw an exception if the user attempts to create such undirected graph. If the reverse edge has the same ID, wouldn't that violate uniqueness of the edge_ids?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this really depends on the application context. If a user considers an undirected edge as a single edge, representing this single edge as two edges with the opposite direction is just an internal implementation issue and those two edges may have the same edge ID. If a user considers a directed graph which happens to be symmetric, then, the two edges may better have two different edge IDs. A similar issue can happen with edge types.
I assume this needs more in-depth discussion.