Fix MG weighted similarity test failure #4054

Merged
merged 13 commits into from
Jan 10, 2024
Changes from 4 commits
19 changes: 15 additions & 4 deletions cpp/include/cugraph/graph_functions.hpp
@@ -1005,9 +1005,14 @@ remove_self_loops(raft::handle_t const& handle,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);

/**
* @brief Remove all but one edge when a multi-edge exists. Note that this function does not use
* stable methods. When a multi-edge exists, one of the edges will remain, there is no
* guarantee on which one will remain.
* @brief Remove all but one edge when a multi-edge exists.
*
* When a multi-edge exists, one of the edges will remain. If @p keep_min_value_edge is false, an
* arbitrary edge will be selected among the edges in the multi-edge. If @p keep_min_value_edge is
* true, the edge with the minimum value will be selected. The edge weights will be first compared
* (if @p edgelist_weights.has_value() is true); edge IDs will be compared next (if @p
* edgelist_edge_ids.has_value() is true); and edge types (if @p edgelist_edge_types.has_value() is
* true) will be compared last.
Collaborator

I had thought about this when I put this function in. Selecting an arbitrary edge was simpler and didn't assume what criteria would be desired, but it does limit the usefulness of the feature.

It seems like there are more reasonable choices than just the minimum edge. Here are some thoughts:

  1. Arbitrary (original code, or this code with the flag set to false)
  2. Minimum edge weight (this code with the flag set to true)
  3. Maximum edge weight
  4. Sum of edge weights
  5. Average edge weight

We also support other edge properties (currently we use edge type and edge id), but the data structures and primitives support arbitrary properties.

Would this be better handled by passing in some sort of struct that defines the sorting and reduction criteria? Making it a std::optional would give the arbitrary behavior when it is std::nullopt and use the struct when we want specific criteria.

Thinking off the top of my head (leaving out lots of details), perhaps something like:

struct edge_reduction_t {
   ...
   template ...
   auto key_first() { // function that returns iterator to the first sort key }
   auto value_first() { // function that returns iterator to the first value key }
   auto reduce_function(auto key, auto value) { // returns the reduced edge }
};

This doesn't quite cover all of the cases (averaging would require both summing values and counting values so we could divide at the end), so perhaps there's an optional transform at the end.

I'd be fine marking this with a FIXME and letting the smallest value fix our immediate problem.
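
One possible reading of the struct idea above, as a host-side sketch only. Nothing here is cuGraph code: `Edge`, `keep_min_weight`, `keep_max_weight`, `sum_weights`, and `reduce_multi_edges` are hypothetical names, and plain `std::sort` stands in for the device primitives. It shows options 2 through 4 from the list expressed as interchangeable reduction functors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical host-side edge record; cuGraph keeps these in separate device vectors.
struct Edge {
  int32_t src;
  int32_t dst;
  float weight;
};

// Hypothetical reduction policies: each combines two parallel edges into one.
struct keep_min_weight {
  Edge operator()(Edge const& a, Edge const& b) const { return b.weight < a.weight ? b : a; }
};
struct keep_max_weight {
  Edge operator()(Edge const& a, Edge const& b) const { return b.weight > a.weight ? b : a; }
};
struct sum_weights {
  Edge operator()(Edge const& a, Edge const& b) const
  {
    return Edge{a.src, a.dst, a.weight + b.weight};
  }
};

// Sort by the (src, dst) key, then fold every run of equal keys with the given policy.
template <typename Reduce>
std::vector<Edge> reduce_multi_edges(std::vector<Edge> edges, Reduce reduce)
{
  std::sort(edges.begin(), edges.end(), [](Edge const& a, Edge const& b) {
    return std::tie(a.src, a.dst) < std::tie(b.src, b.dst);
  });
  std::vector<Edge> out;
  for (auto const& e : edges) {
    if (!out.empty() && out.back().src == e.src && out.back().dst == e.dst) {
      out.back() = reduce(out.back(), e);  // same (src, dst): merge into the survivor
    } else {
      out.push_back(e);  // first edge of a new (src, dst) run
    }
  }
  return out;
}
```

Averaging does not fit this shape directly: as noted above, it would need a running count next to the sum and a final transform that divides.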

Contributor Author

Yup, I was also thinking about edge ID & type... If we want to maintain a symmetric graph, should we also assume that the reverse edge has the same ID & type? (Or can this really be a case-by-case thing?)

We can discuss what options are really necessary in cuGraph's context. NetworkX supports "arbitrary".

"If both edges exist in digraph and their edge data is different, only one edge is created with an arbitrary choice of which edge data to use."
https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.to_undirected.html

I wasn't sure how far we should go beyond this without clear use cases, but now we have at least one (maintaining weight symmetry). We should certainly make updates if we identify more use cases (e.g. anything related to edge IDs & types).

Collaborator

Created an issue to track this idea. We definitely need to see what kind of use cases would drive this feature before we spend too much time on it.

Contributor

> If we want to maintain a symmetric graph, should we also assume that the reverse edge has the same ID & type? (Or can this really be a case-by-case thing?)

Sorry to interject on this late, but the Python API has a tangential issue. The Python API does not symmetrize edges that have edge_ids, under the assumption that each edge has a unique edge ID, and it will throw an exception if the user attempts to create such an undirected graph. If the reverse edge has the same ID, wouldn't that violate the uniqueness of the edge_ids?

Contributor Author

I guess this really depends on the application context. If a user considers an undirected edge to be a single edge, representing it as two edges with opposite directions is just an internal implementation detail, and those two edges may have the same edge ID. If a user considers a directed graph that happens to be symmetric, then the two edges should probably have two different edge IDs. A similar issue can arise with edge types.

I assume this needs more in-depth discussion.

*
* In an MG context it is assumed that edges have been shuffled to the proper GPU,
* in which case any multi-edges will be on the same GPU.
@@ -1024,6 +1029,11 @@ remove_self_loops(raft::handle_t const& handle,
* @param edgelist_weights Optional list of edge weights
* @param edgelist_edge_ids Optional list of edge ids
* @param edgelist_edge_types Optional list of edge types
* @param keep_min_value_edge Flag indicating whether to keep an arbitrary edge (false) or the
* minimum value edge (true) among the edges in a multi-edge. Relevant only if @p
* edgelist_wegihts.has_value() | @p edgelist_edge_ids.has_value() | @p
Collaborator

wegihts => weights

* edgelist_edge_types.has_value() is true. Setting this to true incurs performance overhead as this
* requires more comparisons.
* @return Tuple of vectors storing edge sources, destinations, optional weights,
* optional edge ids, optional edge types.
*/
@@ -1038,6 +1048,7 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types,
bool keep_min_value_edge = false);
Contributor

Shouldn't this flag also be exposed to the C API? The new cugraph_graph_create_sg takes drop_multi_edges; however, users higher in the stack (C, PLC, Python) won't be able to control this parameter.

Contributor Author

Nice catch, yes, if the graph is symmetric, we need to set keep_min_value_edge to true to maintain symmetry. See a68868f for the update.


} // namespace cugraph
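
The symmetry argument in the reply above can be illustrated with a small host-side sketch. This is not the cuGraph implementation: `dedup_arbitrary`, `dedup_min_weight`, `keep_first_per_pair`, and `weight_of` are hypothetical helpers, and a stable sort is used in the "arbitrary" path only to make the demo deterministic. When the parallel edges for (u, v) and (v, u) arrive in different orders, keeping whichever edge comes first can leave the two directions with different weights, while keeping the minimum cannot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

using edge_tuple = std::tuple<int32_t, int32_t, float>;  // (src, dst, weight)

// Keep the first edge of every (src, dst) run; assumes equal pairs are already adjacent.
std::vector<edge_tuple> keep_first_per_pair(std::vector<edge_tuple> edges)
{
  auto last = std::unique(edges.begin(), edges.end(), [](auto const& a, auto const& b) {
    return std::get<0>(a) == std::get<0>(b) && std::get<1>(a) == std::get<1>(b);
  });
  edges.erase(last, edges.end());
  return edges;
}

// Stand-in for keep_min_value_edge == false: order by (src, dst) only,
// so the surviving weight depends on the input order.
std::vector<edge_tuple> dedup_arbitrary(std::vector<edge_tuple> edges)
{
  std::stable_sort(edges.begin(), edges.end(), [](auto const& a, auto const& b) {
    return std::tie(std::get<0>(a), std::get<1>(a)) < std::tie(std::get<0>(b), std::get<1>(b));
  });
  return keep_first_per_pair(std::move(edges));
}

// Stand-in for keep_min_value_edge == true: order by (src, dst, weight),
// so the minimum-weight edge leads every run.
std::vector<edge_tuple> dedup_min_weight(std::vector<edge_tuple> edges)
{
  std::sort(edges.begin(), edges.end());
  return keep_first_per_pair(std::move(edges));
}

// Weight of the edge (src, dst), or -1 if the edge is absent.
float weight_of(std::vector<edge_tuple> const& edges, int32_t src, int32_t dst)
{
  for (auto const& e : edges) {
    if (std::get<0>(e) == src && std::get<1>(e) == dst) { return std::get<2>(e); }
  }
  return -1.0f;
}
```

With the input (0,1,3), (0,1,1), (1,0,1), (1,0,3), the arbitrary path keeps weight 3 for (0,1) but weight 1 for (1,0); the minimum path keeps weight 1 in both directions.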
18 changes: 12 additions & 6 deletions cpp/src/structure/remove_multi_edges.cu
@@ -27,7 +27,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int32_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<float>>&& edgelist_weights,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

template std::tuple<rmm::device_uvector<int32_t>,
rmm::device_uvector<int32_t>,
@@ -39,7 +40,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int32_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<float>>&& edgelist_weights,
std::optional<rmm::device_uvector<int64_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

template std::tuple<rmm::device_uvector<int64_t>,
rmm::device_uvector<int64_t>,
@@ -51,7 +53,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int64_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<float>>&& edgelist_weights,
std::optional<rmm::device_uvector<int64_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

template std::tuple<rmm::device_uvector<int32_t>,
rmm::device_uvector<int32_t>,
@@ -63,7 +66,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int32_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<double>>&& edgelist_weights,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

template std::tuple<rmm::device_uvector<int32_t>,
rmm::device_uvector<int32_t>,
@@ -75,7 +79,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int32_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<double>>&& edgelist_weights,
std::optional<rmm::device_uvector<int64_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

template std::tuple<rmm::device_uvector<int64_t>,
rmm::device_uvector<int64_t>,
@@ -87,6 +92,7 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<int64_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<double>>&& edgelist_weights,
std::optional<rmm::device_uvector<int64_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<int32_t>>&& edgelist_edge_types,
bool keep_min_value_edge);

} // namespace cugraph
61 changes: 40 additions & 21 deletions cpp/src/structure/remove_multi_edges_impl.cuh
@@ -104,10 +104,12 @@ group_multi_edges(
rmm::device_uvector<vertex_t>&& edgelist_srcs,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
decltype(allocate_dataframe_buffer<edge_value_t>(0, rmm::cuda_stream_view{}))&& edgelist_values,
size_t mem_frugal_threshold)
size_t mem_frugal_threshold,
bool keep_min_value_edge)
{
auto pair_first = thrust::make_zip_iterator(edgelist_srcs.begin(), edgelist_dsts.begin());
auto value_first = get_dataframe_buffer_begin(edgelist_values);
auto edge_first = thrust::make_zip_iterator(pair_first, value_first);

if (edgelist_srcs.size() > mem_frugal_threshold) {
// FIXME: Tuning parameter to address high frequency multi-edges
@@ -128,19 +130,28 @@ group_multi_edges(
raft::update_host(
h_group_counts.data(), group_counts.data(), group_counts.size(), handle.get_stream());

thrust::sort_by_key(handle.get_thrust_policy(),
pair_first,
pair_first + h_group_counts[0],
get_dataframe_buffer_begin(edgelist_values));
thrust::sort_by_key(handle.get_thrust_policy(),
pair_first + h_group_counts[0],
pair_first + edgelist_srcs.size(),
get_dataframe_buffer_begin(edgelist_values) + h_group_counts[0]);
if (keep_min_value_edge) {
thrust::sort(handle.get_thrust_policy(), edge_first, edge_first + h_group_counts[0]);
thrust::sort(handle.get_thrust_policy(),
edge_first + h_group_counts[0],
edge_first + edgelist_srcs.size());
} else {
thrust::sort_by_key(
handle.get_thrust_policy(), pair_first, pair_first + h_group_counts[0], value_first);
thrust::sort_by_key(handle.get_thrust_policy(),
pair_first + h_group_counts[0],
pair_first + edgelist_srcs.size(),
value_first + h_group_counts[0]);
}
} else {
thrust::sort_by_key(handle.get_thrust_policy(),
pair_first,
pair_first + edgelist_srcs.size(),
get_dataframe_buffer_begin(edgelist_values));
if (keep_min_value_edge) {
thrust::sort(handle.get_thrust_policy(), edge_first, edge_first + edgelist_srcs.size());
} else {
thrust::sort_by_key(handle.get_thrust_policy(),
pair_first,
pair_first + edgelist_srcs.size(),
get_dataframe_buffer_begin(edgelist_values));
}
}

return std::make_tuple(
@@ -160,7 +171,8 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types)
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types,
bool keep_min_value_edge)
{
auto total_global_mem = handle.get_device_properties().totalGlobalMem;
size_t element_size = sizeof(vertex_t) * 2;
@@ -187,7 +199,8 @@ remove_multi_edges(raft::handle_t const& handle,
std::make_tuple(std::move(*edgelist_weights),
std::move(*edgelist_edge_ids),
std::move(*edgelist_edge_types)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
} else {
std::forward_as_tuple(
edgelist_srcs, edgelist_dsts, std::tie(edgelist_weights, edgelist_edge_ids)) =
@@ -196,7 +209,8 @@ remove_multi_edges(raft::handle_t const& handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_weights), std::move(*edgelist_edge_ids)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
}
} else {
if (edgelist_edge_types) {
@@ -207,15 +221,17 @@ remove_multi_edges(raft::handle_t const& handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_weights), std::move(*edgelist_edge_types)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
} else {
std::forward_as_tuple(edgelist_srcs, edgelist_dsts, std::tie(edgelist_weights)) =
detail::group_multi_edges<vertex_t, thrust::tuple<weight_t>>(
handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_weights)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
}
}
} else {
@@ -228,15 +244,17 @@ remove_multi_edges(raft::handle_t const& handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_edge_ids), std::move(*edgelist_edge_types)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
} else {
std::forward_as_tuple(edgelist_srcs, edgelist_dsts, std::tie(edgelist_edge_ids)) =
detail::group_multi_edges<vertex_t, thrust::tuple<edge_t>>(
handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_edge_ids)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
}
} else {
if (edgelist_edge_types) {
@@ -246,7 +264,8 @@ remove_multi_edges(raft::handle_t const& handle,
std::move(edgelist_srcs),
std::move(edgelist_dsts),
std::make_tuple(std::move(*edgelist_edge_types)),
mem_frugal_threshold);
mem_frugal_threshold,
keep_min_value_edge);
} else {
std::tie(edgelist_srcs, edgelist_dsts) = detail::group_multi_edges(
handle, std::move(edgelist_srcs), std::move(edgelist_dsts), mem_frugal_threshold);
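
The change in `group_multi_edges` sorts the zipped (src, dst, values) records instead of sorting by (src, dst) alone. A minimal host-side sketch of the effect, assuming the device tuples compare lexicographically the way `std::tuple` does: `value_tuple`, `edge_record`, and `sort_edges` are hypothetical names for illustration, not cuGraph types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical host-side stand-ins for the zipped device records.
using value_tuple = std::tuple<float, int64_t, int32_t>;          // (weight, edge id, edge type)
using edge_record = std::tuple<int32_t, int32_t, value_tuple>;    // (src, dst, values)

// Sort whole records. std::tuple's operator< is lexicographic, so inside each
// (src, dst) run the records are ordered by weight, then edge id, then edge type,
// and the first record of the run is the "minimum value" edge.
std::vector<edge_record> sort_edges(std::vector<edge_record> edges)
{
  std::sort(edges.begin(), edges.end());
  return edges;
}
```

For three parallel (0, 1) edges with values (2.0, 7, 1), (2.5, 1, 0), and (2.0, 3, 5), the two weight-2.0 edges tie on weight, so the edge ID decides and (2.0, 3, 5) leads the run. The extra comparisons per element are the performance overhead that the `keep_min_value_edge` documentation mentions.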
12 changes: 7 additions & 5 deletions cpp/tests/link_prediction/weighted_similarity_test.cpp
@@ -27,9 +27,9 @@

struct Similarity_Usecase {
bool use_weights{false};
bool check_correctness{true};
size_t max_seeds{std::numeric_limits<size_t>::max()};
size_t max_vertex_pairs_to_check{std::numeric_limits<size_t>::max()};
bool check_correctness{true};
};

template <typename input_usecase_t>
@@ -293,7 +293,7 @@ INSTANTIATE_TEST_SUITE_P(
// Disable weighted computation testing in 22.10
//::testing::Values(Similarity_Usecase{true, true, 20, 100}, Similarity_Usecase{false, true, 20,
//: 100}),
::testing::Values(Similarity_Usecase{true, true, 20, 100}),
::testing::Values(Similarity_Usecase{true, 20, 100, true}),
::testing::Values(cugraph::test::File_Usecase("test/datasets/karate.mtx"),
cugraph::test::File_Usecase("test/datasets/dolphins.mtx"))));

@@ -305,7 +305,7 @@ INSTANTIATE_TEST_SUITE_P(
// Disable weighted computation testing in 22.10
//::testing::Values(Similarity_Usecase{true, true, 20, 100},
//: Similarity_Usecase{false,true,20,100}),
::testing::Values(Similarity_Usecase{true, true, 20, 100}),
::testing::Values(Similarity_Usecase{true, 20, 100, true}),
::testing::Values(cugraph::test::Rmat_Usecase(10, 16, 0.57, 0.19, 0.19, 0, true, false))));

INSTANTIATE_TEST_SUITE_P(
@@ -319,7 +319,8 @@ INSTANTIATE_TEST_SUITE_P(
// disable correctness checks
// Disable weighted computation testing in 22.10
//::testing::Values(Similarity_Usecase{false, false}, Similarity_Usecase{true, false}),
::testing::Values(Similarity_Usecase{true, true}),
::testing::Values(Similarity_Usecase{
true, std::numeric_limits<size_t>::max(), std::numeric_limits<size_t>::max(), true}),
::testing::Values(cugraph::test::File_Usecase("test/datasets/karate.mtx"))));

INSTANTIATE_TEST_SUITE_P(
@@ -332,7 +333,8 @@ INSTANTIATE_TEST_SUITE_P(
::testing::Combine(
// disable correctness checks for large graphs
//::testing::Values(Similarity_Usecase{false, false}, Similarity_Usecase{true, false}),
::testing::Values(Similarity_Usecase{true, false}),
::testing::Values(Similarity_Usecase{
true, std::numeric_limits<size_t>::max(), std::numeric_limits<size_t>::max(), false}),
::testing::Values(cugraph::test::Rmat_Usecase(10, 16, 0.57, 0.19, 0.19, 0, true, false))));

CUGRAPH_TEST_PROGRAM_MAIN()
14 changes: 8 additions & 6 deletions cpp/tests/utilities/test_graphs.hpp
@@ -633,12 +633,14 @@ construct_graph(raft::handle_t const& handle,

if (drop_multi_edges) {
std::tie(d_src_v, d_dst_v, d_weights_v, std::ignore, std::ignore) =
cugraph::remove_multi_edges<vertex_t, edge_t, weight_t, int32_t>(handle,
std::move(d_src_v),
std::move(d_dst_v),
std::move(d_weights_v),
std::nullopt,
std::nullopt);
cugraph::remove_multi_edges<vertex_t, edge_t, weight_t, int32_t>(
handle,
std::move(d_src_v),
std::move(d_dst_v),
std::move(d_weights_v),
std::nullopt,
std::nullopt,
is_symmetric ? true /* keep minimum weight edges to maintain symmetry */ : false);
}

graph_t<vertex_t, edge_t, store_transposed, multi_gpu> graph(handle);