[FEA]: Implementing cuGraph.node2vec on Kaggle or other datasets #4094

Closed
2 tasks done
ShivanjaliR opened this issue Jan 16, 2024 · 11 comments · Fixed by #4161
@ShivanjaliR

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

I created a directed graph using cuGraph from a dataset that does not ship with the cuGraph library. When I try to run node2vec on this directed graph, I get the error shown in the attached screenshot.
cuda_issue

If I use the karate or karate_asymmetric datasets (from cugraph.datasets import karate, karate_asymmetric), node2vec works as expected.

Traceback (most recent call last):
  File "/app/web_reading.py", line 239, in <module>
    paths, weights, path_sizes = cugraph.node2vec(original_bitcoin_graph, start_vertices, 10, True, p, q)
  File "/opt/conda/lib/python3.10/site-packages/cugraph/sampling/node2vec.py", line 123, in node2vec
    vertex_set, edge_set, sizes = pylibcugraph_node2vec(
  File "node2vec.pyx", line 160, in pylibcugraph.node2vec.node2vec
  File "utils.pyx", line 53, in pylibcugraph.utils.assert_success
RuntimeError: non-success value returned from cugraph_node2vec: CUGRAPH_UNKNOWN_ERROR CUDA error encountered at: file=/opt/conda/include/raft/util/cudart_utils.hpp line=148:
['/app', '/opt/conda/lib/python310.zip', '/opt/conda/lib/python3.10', '/opt/conda/lib/python3.10/lib-dynload', '/opt/conda/lib/python3.10/site-packages']

def createDirectedWebGraph(sources, targets):
    G = cugraph.Graph(directed=True)
    edges_df = cudf.DataFrame({'source': sources, 'target': targets})
    G.from_cudf_edgelist(edges_df, source='source', destination='target', renumber=True)
    return G

node_content = cudf.read_csv(bitcoin_inputfile)
numpy_array = node_content.to_pandas().to_numpy()
source = numpy_array[:, 0]
target = numpy_array[:, 1]
original_bitcoin_graph = createDirectedWebGraph(source, target)
start_vertices = cudf.Series(original_bitcoin_graph.nodes(), dtype=np.int32)
print(original_bitcoin_graph.nodes())
paths, weights, path_sizes = cugraph.node2vec(original_bitcoin_graph, start_vertices, 10, True, p, q)
print(paths)
print(weights)
print(path_sizes)

Can node2vec be used on a cuGraph Graph that is created from another dataset?

Describe your ideal solution

We should be able to run node2vec on other Kaggle datasets. Please let me know if I am doing something wrong in the code above.

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@ShivanjaliR added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels Jan 16, 2024
@jnke2016
Contributor

Hi @ShivanjaliR. Thank you for raising this; I am looking into it. cuGraph should be able to handle any dataset, including those from Kaggle. Is bitcoin_inputfile public, or can you share it so that I can try to reproduce the issue locally? As you already mentioned, I tried with our test datasets and it worked.

@jnke2016 jnke2016 self-assigned this Jan 18, 2024
@ShivanjaliR
Author

https://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html

It's a publicly available dataset.
soc-sign-bitcoinalpha.csv

I have attached the CSV file as well.

@jnke2016
Contributor

Thanks. Looking into it now.

@jnke2016
Contributor

jnke2016 commented Jan 18, 2024

I was able to reproduce the error and figured out what is wrong. The error you are getting is Reason=cudaErrorInvalidValue:invalid argument.
If you look closely at the arguments you pass to cugraph.node2vec, you will see that original_bitcoin_graph.nodes() and start_vertices do not have the same type: you cast the latter to int32, while the former is int64. The source and target types need to match the type of start_vertices, and the type of original_bitcoin_graph.nodes() is derived from your source and target. All you have to do is make the types of source, target and start_vertices match. Below are some options to resolve the issue:

  1. Remove the cast to int32 of start_vertices: start_vertices = cudf.Series(original_bitcoin_graph.nodes())
  2. Specify the type of the edge list dataframe: edges_df = cudf.DataFrame({'source': sources, 'target': targets}, dtype=np.int32)

The first option ensures that start_vertices, source and target all have the same type, int64, but this is not memory efficient: the max vertex ID is 7604, which fits comfortably in an int32.

The second option ensures that start_vertices, source and target are of the same type int32 which is probably what you want.
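As a rough sketch of the second option, the snippet below builds the edge list with an explicit vertex dtype and reuses that same dtype for start_vertices. The helper names pick_vertex_dtype and run_node2vec are hypothetical and introduced only for illustration; the cugraph.node2vec call mirrors the positional arguments used in the original report, and the GPU part assumes a working cudf/cugraph installation.

```python
INT32_MAX = 2**31 - 1


def pick_vertex_dtype(max_vertex_id):
    """Return the narrower of int32/int64 that can hold max_vertex_id."""
    return "int32" if max_vertex_id <= INT32_MAX else "int64"


def run_node2vec(sources, targets, max_depth=10, p=1.0, q=1.0):
    """Hypothetical helper: build a directed graph and run node2vec on it.

    Needs a GPU with cudf and cugraph installed; they are imported here so
    that pick_vertex_dtype stays usable without them.
    """
    import cudf
    import cugraph

    max_id = max(int(max(sources)), int(max(targets)))
    vertex_dtype = pick_vertex_dtype(max_id)

    # Edge list columns and start vertices all share vertex_dtype.
    edges_df = cudf.DataFrame({
        "source": cudf.Series(sources, dtype=vertex_dtype),
        "target": cudf.Series(targets, dtype=vertex_dtype),
    })
    G = cugraph.Graph(directed=True)
    G.from_cudf_edgelist(edges_df, source="source", destination="target", renumber=True)

    # Cast to the same dtype as the edge list rather than to a fixed int32.
    start_vertices = G.nodes().astype(vertex_dtype)
    return cugraph.node2vec(G, start_vertices, max_depth, True, p, q)
```

With the dataset from this issue, pick_vertex_dtype(7604) selects "int32", so the graph and the start vertices both end up as int32.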

Please let me know if this solves your issue or if you have any further questions. I will also push a PR that checks that the graph's vertices and start_vertices are of the same type and, if they are not, throws a more helpful exception that the user can understand.
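The kind of check described here could look roughly like the sketch below. This is not the code from the eventual PR: check_start_vertices_dtype is a hypothetical name, and the stand-in objects only mimic the .dtype attribute that cudf.Series and NumPy arrays expose, so the example runs without a GPU.

```python
from types import SimpleNamespace


def check_start_vertices_dtype(graph_vertices, start_vertices):
    """Raise a readable TypeError when the two vertex dtypes differ.

    Both arguments only need a .dtype attribute.
    """
    graph_dtype = str(graph_vertices.dtype)
    start_dtype = str(start_vertices.dtype)
    if graph_dtype != start_dtype:
        raise TypeError(
            f"start_vertices has dtype {start_dtype} but the graph's vertices "
            f"have dtype {graph_dtype}; cast one of them so they match."
        )


# Stand-ins for original_bitcoin_graph.nodes() and the int32 start_vertices.
graph_vertices = SimpleNamespace(dtype="int64")
start_vertices = SimpleNamespace(dtype="int32")

try:
    check_start_vertices_dtype(graph_vertices, start_vertices)
except TypeError as err:
    message = str(err)
    print(message)
```

Calling such a check before handing the arguments to the lower layers would turn the opaque CUDA error into a message that names both dtypes.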

@ChuckHastings
Collaborator

@jnke2016 - I suggest making the change in both the Python and C APIs, since this could affect users coming in at any layer. In C++ this is already enforced by the template parameters.

@BradReesWork removed the "? - Needs Triage" (Need team to review and classify) label Feb 8, 2024
@rlratzel added the "bug" (Something isn't working) label and removed the "feature request" (New feature or request) label Feb 19, 2024
@rlratzel
Contributor

@jnke2016 do you think this should be a bug or a feature request? I'm leaning toward classifying this as a bug since it failed to provide an accurate error message.

@ShivanjaliR
Author

I think my previous request can be considered a bug, but I am now facing new issues that need new feature requests:

  1. Just like the networkx graph object, cugraph should offer a way to remove a specific node or a list of nodes from a graph.
  2. The CPU node2vec library takes a dimension argument and returns node embeddings of that size, whereas cugraph.node2vec returns random walks, not node embeddings. Is there any way to get node embeddings of the graph from cugraph.node2vec as well, just like node2vec on the CPU?

Waiting for reply.
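For context on the second request: when called with compress_result=True, as in the code in this issue, cugraph.node2vec returns the walks as one flat sequence of vertices plus a sequence of per-walk sizes. A minimal sketch of regrouping that output into one list per walk, which is the usual input shape for a skip-gram model that produces embeddings, is shown below. The function split_walks is hypothetical and works on plain Python lists; with cuDF results one would first copy them to the host, for example with to_pandas().tolist().

```python
def split_walks(flat_vertices, sizes):
    """Split a flat vertex sequence into one list per walk, given walk lengths."""
    if sum(sizes) != len(flat_vertices):
        raise ValueError("sizes must add up to the number of vertices")
    walks, start = [], 0
    for size in sizes:
        walks.append(list(flat_vertices[start:start + size]))
        start += size
    return walks


# Toy data standing in for the (paths, path_sizes) output of two walks.
walks = split_walks([0, 3, 5, 2, 4], [3, 2])
print(walks)  # [[0, 3, 5], [2, 4]]
```

Lists like these could then be fed to a separate skip-gram implementation to obtain fixed-size embeddings; that training step is not something cugraph.node2vec itself performs.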

@ChuckHastings
Collaborator

@ShivanjaliR - please file this as a separate issue on GitHub. That will allow us to close this bug when the linked PR is merged. The new feature request will need to be tracked separately.

@ShivanjaliR
Author

I raised the new issue as follows:

#4198

Please look into it as soon as possible.

@ChuckHastings
Collaborator

Thanks!

@ShivanjaliR
Author

Any updates on this issue? Any progress?

@rapids-bot rapids-bot bot closed this as completed in 728ffd0 Mar 13, 2024