Merge pull request #117 from superlinked/robertdhayanturner-patch-2
Update node_representation_learning.md
robertdhayanturner authored Jan 12, 2024
2 parents bc46a4f + de44391 commit 500442d
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions docs/use_cases/node_representation_learning.md
@@ -263,7 +263,7 @@ print(next(iter(loader)))
>>> Data(x=[2646, 1433], edge_index=[2, 8642], edge_label_index=[2, 2048], edge_label=[2048], ...)
```

In the `Data` object `x` contains the BoW node features. The `edge_label_index` tensor contains the head and tail node indices for the positive and negative samples. `edge_label` is the binary target for these pairs (1 for positive 0 for negative samples). The `edge_index` tensor holds the adjacency list for the current batch of nodes.
In the `Data` object, `x` contains the BoW node features. The `edge_label_index` tensor contains the head and tail node indices for the positive and negative samples. `edge_label` is the binary target for these pairs (1 for positive, 0 for negative samples). The `edge_index` tensor holds the adjacency list for the current batch of nodes.
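To make these fields concrete, here is a small sketch using toy tensors in the same roles. The shapes and values are made up for illustration and are not taken from the loader above:

```python
import torch

# Toy stand-ins for the fields of the Data object (illustrative shapes only)
x = torch.randn(5, 8)                               # node features (BoW-like)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])           # adjacency list in COO format
edge_label_index = torch.tensor([[0, 2, 1],
                                 [4, 3, 0]])        # head/tail indices of candidate pairs
edge_label = torch.tensor([1.0, 1.0, 0.0])          # 1 = real edge, 0 = negative sample

heads, tails = edge_label_index                     # unpack the candidate node pairs
pos_frac = edge_label.mean().item()                 # fraction of positive samples
print(heads.tolist(), tails.tolist(), pos_frac)
```

Each column of `edge_label_index` is one candidate edge to score, and `edge_label` supplies its binary supervision signal.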

Now we can **train** our model as follows:

@@ -329,12 +329,13 @@ The results obtained with LLM only, Node2Vec combined with LLM, and GraphSAGE tr
| F1 (macro) | 0.779 (+7.8%) | **0.840** (+0.9%) | 0.831 (+1.1%) |


Let's explore how good LLM vectors are at *representing citation data*.
Let's explore how well LLM vectors *represent citation data*.

![LLM cosine similarity edge counts](../assets/use_cases/node_representation_learning/bins_llm.png)

With LLM embeddings, connected nodes have markedly more similar representations than they do with Bag of Words (BoW) features. However, pairs of nodes that aren't connected still span a wide range of similarity values, which makes them hard to separate from connected pairs. In capturing graph structure, LLM embeddings therefore fall somewhere between BoW and Node2Vec features.
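The comparison behind the histogram can be sketched as follows: compute the cosine similarity of embedding pairs for connected versus unconnected nodes and compare the two distributions. The embeddings and edge lists below are made-up stand-ins, not the article's LLM vectors:

```python
import torch
import torch.nn.functional as F

# Toy node embeddings (2-dimensional for illustration)
emb = torch.tensor([[1.0, 0.0],
                    [1.0, 0.1],
                    [0.0, 1.0],
                    [1.0, 1.0]])

def pair_cosine(emb, pairs):
    # pairs: [num_pairs, 2] tensor of node-index pairs
    a, b = emb[pairs[:, 0]], emb[pairs[:, 1]]
    return F.cosine_similarity(a, b, dim=1)

connected = torch.tensor([[0, 1]])      # a pair joined by an edge
unconnected = torch.tensor([[0, 2]])    # a pair with no edge

print(pair_cosine(emb, connected))      # high similarity for the linked pair
print(pair_cosine(emb, unconnected))    # low similarity for the unlinked pair
```

Binning these per-pair similarities for all connected and unconnected pairs yields histograms like the one shown above.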


## Conclusion: LLM, Node2Vec, GraphSAGE better at learning node and node relationship data than BoW

For classification tasks on our article citation dataset, we can conclude that:
