Commit c55377b: BPE for KGE added (1 parent: e8ca9e9)

File tree

2 files changed: +24 -33 lines

pages/theses/BPE_KGE.mdx

+24
---
date: '2024-11-15'
title: 'Byte pair encoding for Knowledge Graph Embeddings'
type: 'Bachelor'
supervisor: dice:CaglarDemir
contact: dice:CaglarDemir
---

# Topic

A knowledge graph embedding (KGE) model assigns a unique embedding row to each unique entity/node and relation/edge.
As the number of unique entities or relations grows, the memory usage of the KGE model increases.
Therefore, the memory required to train a KGE model, or to deploy a trained one, grows with the size of the data.
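
To make the memory argument concrete, here is a rough back-of-the-envelope sketch; the entity/relation counts, the dimension, and the function name are illustrative assumptions, not taken from the thesis description:

```python
def kge_memory_bytes(num_entities: int, num_relations: int, dim: int,
                     bytes_per_param: int = 4) -> int:
    """Memory of the embedding tables alone: one float32 row per
    unique entity and per unique relation."""
    return (num_entities + num_relations) * dim * bytes_per_param

# Illustrative numbers: 10M entities, 1k relations, 128-dim float32 rows.
print(kge_memory_bytes(10_000_000, 1_000, 128) / 1e9)  # ~5.12 GB
```

Doubling the number of unique entities doubles this figure, which is exactly the scaling that a subword vocabulary is meant to avoid.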

LLMs use byte pair encoding (BPE) to learn to represent sequences of characters as subword units.
LLM embeddings therefore correspond to subword units instead of unique words.
Recently, we showed that the byte pair encoding scheme developed for LLMs can also be used for KGEs (see
[Inference over Unseen Entities, Relations and Literals on Knowledge Graphs](https://arxiv.org/pdf/2410.06742)).
In this thesis, the student will design a byte pair encoding scheme based on a given knowledge graph.
The student will work closely with [dice-embeddings](https://github.com/dice-group/dice-embeddings).
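
As a minimal sketch of the core idea, the toy BPE learner below operates over entity names; it is not the dice-embeddings implementation, and the function names and example names are hypothetical. A subword-based KGE model would then keep one embedding row per subword and aggregate (e.g. sum) the rows of an entity's subwords to obtain its embedding, so entities sharing subwords share parameters:

```python
from collections import Counter

def learn_bpe(names, num_merges):
    # Start from character sequences; greedily merge the most frequent
    # adjacent pair, num_merges times.
    corpus = [list(n) for n in names]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        corpus = [_apply(seq, best) for seq in corpus]
    return merges

def _apply(seq, pair):
    # Replace every occurrence of `pair` in `seq` with the merged token.
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def encode(name, merges):
    # Tokenize a (possibly unseen) entity name with the learned merges.
    seq = list(name)
    for pair in merges:
        seq = _apply(seq, pair)
    return seq

merges = learn_bpe(["low", "lower", "lowest"], 2)
print(encode("lowest", merges))  # ['low', 'e', 's', 't']
```

Note that `encode` also tokenizes names never seen during training, which is what enables inference over unseen entities in the paper above.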

#### Question & Answer Session

In case you have further questions, feel free to contact [Caglar Demir](https://dice-research.org/CaglarDemir).

pages/theses/RobostEmbeddings.mdx

-33
This file was deleted.

0 commit comments

Comments
 (0)