-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GraphBLAS needs methods to sort a vector or matrix #50
Comments
Agreed, the python sort function, while state of the art, is inherently single threaded. Also due to the Python Global Interpreter Lock (GIL) there's really no way to do good parallel sorts in pure python. A parallel sort in in the API would be very useful!
I have a hunch the Louvain community detection method could benefit from this as well by first permuting the graph by out-degree order. I suspect having larger initial wave fronts would group larger changes per iteration, but I haven't explored this further.
Maybe Cuthill McKee? https://en.wikipedia.org/wiki/Cuthill%E2%80%93McKee_algorithm
+1
Yeah I think finding the ordinal position of entries like that would be quite useful! |
Could The "move everything to the left" operation can be useful--I use it to do parallel prefix sum. But, this seems like a completely different operation. Why shoehorn it into We could do easily do parallel sort in Python using |
Yes, I think either P and/or X could be NULL. Probably they should be created first and passed in as P and X, not &P or &X. And X should not always be sorted in-place, but an output matrix could be specified, say C, then if X is passed in as C, it would be done in-place. The "move everything to the left" would be a byproduct of the sort. It would just be the sorted data itself. |
Yeah, thanks, I didn't catch that upon first reading. |
If the output could be smaller, then this could also calculate |
Right. The same thing is true if you want to answer the query "In each row A(i,:), give me a matrix C where C(i,:) has just the first 5 entries in A(i,:), looking left to right". That computation is needed in the LAGraph_ConnectedComponents method. That would be best solved with a kind of GrB_select but with a different kind of operator (one that has "k" as an input, if A(i,j) is the kth entry in its row i, or in its column j, depending on the descriptor, say). So the GrB_*sort methods wouldn't be used everywhere, but they'd be very useful and fast when a complete sort is actually needed. |
Yes, that's a perfect example. Cuthill-McKee is a revised BFS where each level needs to be sorted by degree. This could be done one frontier at a time, in the inner loop of the LAGraph BFS. Or perhaps it could be done at the end, if the sort metric is, say, level * n + degree. Either way, it needs a sort. |
Here is a possible specification. The "implemenation-specific" comments would not be part of a spec; just what I would do in SuiteSparse:GraphBLAS. Feedback welcome:
|
The matrix or vector could be sorted in place if called this way:
|
MATLAB has a similar feature, where the outputs B and I are like my usage of C and P for GxB_Matrix_sort.
|
I've drafted two methods. It turns out the op_eq is not needed, just op_lt. The signature
where a and be are double, say, and are entries from A. I've drafted and partially tested GxB_Matrix_sort and GxB_Vector_sort. The Matrix sort can sort either rows or columns of A. A can also be sorted in place, with |
GxB_Matrix_sort and GxB_Vector_sort are now fully documented and tested. They will appear in the pre-release v5.2.0.alpha14 and v6.0.0.alph14 shortly. |
I'm coming across more and more graph algorithms that need to sort a vector, and some need to sort a matrix. We have some LAGraph_Sort methods but they are not sufficient.
Sorting a vector or matrix comes up in these methods:
I'm sure there are more uses for sorting a matrix or vector.
I think we need a GrB_Vector_sort (&P, X, op, descriptor) that sorts the vector X, in place, using the GrB_BinaryOp op to compare entries (GrB_LT_FP64 would sort a vector X in ascending order, for example). The output vector X would have the same size as on input. If X had e entries on input, then on output X[0...e-1] would hold all the entries in ascending order. The output vector P would be of type GrB_UINT64 and would hold the row indices of the entries in X, in P[0..e-1]. If &P is NULL it would not be returned.
A GrB_Matrix_sort would do the same, except it would sort each row of the matrix X. The output matrix P would have the same dimensions as X, and P(i, 0...e-1) would hold the row indices of
the sorted X(i, 0...e-1). If GrB_DESC_T0 is used, it would sort each column of the matrix.
I don't think this method could take a mask or accumulator, since it has two outputs P and X.
If op is NULL, then no sort would be done. The GrB_Matrix_sort (&P, X, NULL, NULL) would return a matrix X with all entries in X(i,:) moved to the "left", in X(i,0..e-1) if row i of X had e entries. This would allow Jeremy's query: "Give me the 4th entry in each row of a matrix". All it would require is GrB_extract to get X(:,3) and P(:,3).
Computing the median or mode of each row of a matrix after a sort would be simple, but this might need yet another GrB_* method. Perhaps GrB_Matrix_statistic (&C, P, X, method, descriptor) would take in the output P and X of a GrB_Matrix_sort, and return a vector C with the mode, median, etc (chosen by the method parameter). This needs some more thought, but it's clear to me that some kind of sorting mechanism is needed to sort the rows/cols of a GrB_Matrix, or the entries of a single GrB_Vector.
The text was updated successfully, but these errors were encountered: