[BUG]: Leiden clustering numbering is off #4791

Open
2 tasks done
abs51295 opened this issue Nov 27, 2024 · 5 comments
@abs51295

Version

24.12

Which installation method(s) does this occur on?

No response

Describe the bug.

Hey,

We ran Leiden clustering on our dataset after the recent fix #4730 and found that it skips some cluster numbers, seemingly at random. I wonder if it's just a labeling issue and not a problem with the algorithm itself.
Image

Minimum reproducible example

Relevant log output

Environment details

Other/Misc.

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@abs51295 abs51295 added ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 27, 2024
@ChuckHastings
Collaborator

I believe this is a label numbering problem and not a clustering problem. Leiden is a hierarchical clustering algorithm. At each level we combine clusters, and each cluster is numbered by one of the vertices in it. That assignment is arbitrary (it depends on when the algorithm decides to move things). For example, if vertex 10 is being evaluated and it is determined that it should merge into cluster 5, then it will be assigned to cluster 5. Cluster 10 would then be empty.

The vertices/clusters are renumbered when we move to a new level of the hierarchy, but I don't believe we renumber them at any other point.
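The effect described above can be illustrated with a toy sketch in plain Python. This is not cuGraph's implementation: `local_moves` and the list of moves are hypothetical, and the only assumption carried over from the explanation is that each cluster ID starts out as the ID of one of its vertices.

```python
def local_moves(num_vertices, moves):
    """Apply (vertex, target_cluster) moves; every vertex starts in its own cluster."""
    cluster_of = list(range(num_vertices))  # cluster ID == ID of its founding vertex
    for vertex, target in moves:
        cluster_of[vertex] = target  # the vertex's old cluster may now be empty
    return cluster_of


# Hypothetical moves on a 12-vertex graph: vertex 10 merges into cluster 5, etc.
labels = local_moves(12, [(10, 5), (3, 0), (7, 0)])

# IDs that no vertex uses any more. They stay in the numbering as gaps
# unless a separate renumbering pass compacts them.
unused = sorted(set(range(12)) - set(labels))
print(labels)
print(unused)
```

Here cluster 10 ends up empty exactly as in the example: vertex 10 now carries label 5, and nothing else was ever labeled 10.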

@abs51295
Author

abs51295 commented Dec 3, 2024

Thanks @ChuckHastings for clearing this up. I wonder, though: I didn't see this in the previous version of RAPIDS (24.10), so it may have something to do with the recent update.

@ChuckHastings
Collaborator

The PR you referenced that fixed Leiden corrected a bug where the convergence criterion was wrong and caused the algorithm to abort early. It is likely that this bug was masking the numbering effect.

Do you have a small example you can share where this occurs? I can try to recreate it to get a better understanding of what you're seeing.

@abs51295
Author

abs51295 commented Dec 5, 2024

Hey Chuck,

Here's the file for the adjacency matrix: https://cedars.box.com/s/4mg82y2u0m77pi8c4i9izt3yzx52xq1c. I just use this function (from rapids-singlecell) to get a weighted graph:

import warnings

import cudf
import numpy as np


def _create_graph(adjacency, use_weights=True):
    from cugraph import Graph

    # Row/column indices and values of the non-zero adjacency entries.
    sources, targets = adjacency.nonzero()
    weights = adjacency[sources, targets]
    # Indexing may return an np.matrix (e.g. for SciPy sparse input); flatten it to 1-D.
    if isinstance(weights, np.matrix):
        weights = weights.A1
    df = cudf.DataFrame({"source": sources, "destination": targets, "weights": weights})
    g = Graph()
    # Silence warnings raised while the edge list is loaded.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        if use_weights:
            g.from_cudf_edgelist(
                df, source="source", destination="destination", weight="weights"
            )
        else:
            g.from_cudf_edgelist(df, source="source", destination="destination")
    return g

and then run Leiden using the following:

    from cugraph import leiden as culeiden

    leiden_parts, _ = culeiden(
        g,
        resolution=1,
        random_state=0,
        max_iter=100,
    )

which generates the following output:
Image
where cluster number 18 is skipped.
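If contiguous labels are needed downstream, one possible workaround is to check for gaps and renumber after the fact. The following is a minimal sketch in plain Python, assuming the `partition` column of `leiden_parts` has been copied to a host list (for example with `leiden_parts["partition"].to_pandas().tolist()`). The helper names `missing_labels` and `make_contiguous` are hypothetical and not part of cuGraph, and the sample labels are made up.

```python
def missing_labels(partition):
    """Return the IDs in 0..max(partition) that no vertex was assigned to."""
    used = set(partition)
    return [c for c in range(max(used) + 1) if c not in used]


def make_contiguous(partition):
    """Relabel clusters as 0..k-1; membership and relative ID order are unchanged."""
    new_id = {old: new for new, old in enumerate(sorted(set(partition)))}
    return [new_id[c] for c in partition]


# Made-up labels standing in for the "partition" column; ID 2 is never used.
partition = [0, 1, 3, 1, 4, 0, 3]

print(missing_labels(partition))   # IDs skipped in the numbering
print(make_contiguous(partition))  # same clusters, gap-free IDs
```

Relabeling this way only changes the names of the clusters, so any quality measure computed from cluster membership is unaffected.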

Thanks for all your help.

@rlratzel
Contributor

Update: @jnke2016 is also taking a look.
