Merge pull request #567 from dice-group/thesis_ml
Retrieval Augmented Generation over KGE
Showing 3 changed files with 63 additions and 33 deletions.
@@ -0,0 +1,24 @@
---
date: '2024-11-15'
title: 'Byte pair encoding for Knowledge Graph Embeddings'
type: 'Bachelor'
supervisor: dice:CaglarDemir
contact: dice:CaglarDemir
---
# Topic
A knowledge graph embedding (KGE) model assigns a unique embedding row to each unique entity/node and relation/edge.
As the number of unique entities or relations grows, so does the memory usage of the KGE model.
Therefore, the memory required to train a KGE model or to deploy a trained one is bounded by the size of the data.
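
As a rough illustration of this bound, the sketch below estimates the embedding-table size as the number of entities grows; the entity/relation counts, embedding dimension, and float size are illustrative assumptions, not values fixed by this topic.

```python
# Rough, illustrative estimate of KGE embedding-table memory (all numbers are assumptions).
def kge_embedding_mb(num_entities: int, num_relations: int,
                     dim: int = 128, bytes_per_float: int = 4) -> float:
    """One embedding row per unique entity and per unique relation."""
    num_parameters = (num_entities + num_relations) * dim
    return num_parameters * bytes_per_float / 1024 ** 2

for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} entities -> {kge_embedding_mb(n, num_relations=1_000):,.1f} MB")
# Memory grows linearly with the number of unique entities and relations.
```
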
LLMs use byte pair encoding to learn to represent sequences of characters as subword units.
Therefore, LLM embeddings correspond to subword units instead of unique words.
Recently, we showed that the byte pair encoding scheme developed for LLMs can also be used for KGEs (see
[Inference over Unseen Entities, Relations and Literals on Knowledge Graphs](https://arxiv.org/pdf/2410.06742)).
In this thesis, the student will design a byte pair encoding scheme based on a given knowledge graph.
The student will work closely with [dice-embeddings](https://github.com/dice-group/dice-embeddings).
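
A minimal sketch of the underlying idea, under stated assumptions: a toy subword vocabulary and a greedy longest-match tokenizer stand in for real BPE merges, and mean pooling of subword embeddings is likewise an assumption rather than the scheme the student is expected to design.

```python
# Minimal sketch (not the dice-embeddings implementation): entities share a small
# subword vocabulary, so the embedding table no longer grows with the entity count.
import numpy as np

subword_vocab = ["Caglar", "Demir", "Computer", "Scientist", "Person"]  # toy vocabulary
dim = 4
rng = np.random.default_rng(0)
subword_table = {tok: rng.normal(size=dim) for tok in subword_vocab}  # one row per subword unit

def tokenize(label: str) -> list[str]:
    """Greedy longest-match split into subword units (a simplification of BPE merges)."""
    tokens, rest = [], label
    while rest:
        match = max((t for t in subword_vocab if rest.startswith(t)), key=len, default=rest[0])
        tokens.append(match)
        rest = rest[len(match):]
    return tokens

def embed(label: str) -> np.ndarray:
    """Entity embedding = mean of its subword-unit embeddings."""
    return np.mean([subword_table.get(t, np.zeros(dim)) for t in tokenize(label)], axis=0)

# An entity never seen as a whole still receives an embedding from its subword units.
print(tokenize("ComputerScientist"), embed("ComputerScientist").round(2))
```
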
#### Question & Answer Session

In case you have further questions, feel free to contact [Caglar Demir](https://dice-research.org/CaglarDemir).
@@ -0,0 +1,39 @@
---
date: '2024-11-15'
title: 'RAG over Neural Triple Stores'
type: 'Bachelor'
supervisor: dice:CaglarDemir
contact: dice:CaglarDemir
---
# Topic
Most knowledge graphs are incomplete.
Neural link predictors (most knowledge graph embedding models) can accurately infer missing knowledge, even when multi-hop reasoning is required.

In this thesis, the student will focus on techniques that combine LLMs and neural link predictors in the context of retrieval augmented generation (RAG).
By designing a novel and effective model, we aim to achieve the following workflow:

1. A user asks a question.
2. The LLM renders the question into a first-order logic expression via prompt engineering.
3. The first-order logic expression is passed to a neural link predictor to perform [multi-hop query answering](https://github.com/dice-group/dice-embeddings?tab=readme-ov-file#answering-complex-queries).
4. The result (e.g., an ordered sequence of nodes/entities) is preprocessed and given to the LLM to generate a fluent response for the user.
The student will work closely with [dice-embeddings](https://github.com/dice-group/dice-embeddings) and an LLM provided by us.

A simple working example:
```python
# Illustrative sketch: students_work is the pipeline the student will develop.
Graph = {("ComputerScientist", "subclass", "Scientist"),
         ("Scientist", "subclass", "Person"),
         ("CaglarDemir", "type", "ComputerScientist")}
trained_kge = KGE().train(Graph)
user_query = "What is the occupation of Caglar?"
llm_endpoint = ""
response = students_work(user_query, trained_kge, llm_endpoint)
"""
response ~ Caglar Demir is a Computer Scientist.
"""
```
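
One possible shape for `students_work`, shown only as a sketch of the workflow above: `call_llm` and `answer_multi_hop` are hypothetical stubs standing in for the prompt-engineering step, the neural link predictor's complex-query answering, and the LLM endpoint; none of them are part of an existing API.

```python
# Hypothetical sketch of the RAG workflow; every helper below is a stub, not an existing API.
def call_llm(endpoint: str, prompt: str) -> str:
    """Stub for the LLM provided by the group; a real version would query `endpoint`."""
    if "first-order logic" in prompt:
        return "?occ : type(CaglarDemir, ?c) AND subclass(?c, ?occ)"  # canned FOL rendering
    return "Caglar Demir is a Computer Scientist."                    # canned fluent answer

def answer_multi_hop(kge, fol_query: str) -> list[str]:
    """Stub for the neural link predictor's multi-hop query answering; returns ranked entities."""
    return ["ComputerScientist", "Scientist", "Person"]

def students_work(user_query: str, trained_kge, llm_endpoint: str) -> str:
    # 1. Prompt the LLM to render the question into a first-order logic expression.
    fol_query = call_llm(llm_endpoint, f"Rewrite as a first-order logic query: {user_query}")
    # 2. Answer the (possibly multi-hop) query with the neural link predictor.
    ranked_entities = answer_multi_hop(trained_kge, fol_query)
    # 3. Verbalise the top-ranked entities into a fluent response.
    context = ", ".join(ranked_entities[:3])
    return call_llm(llm_endpoint, f"Question: {user_query}\nRetrieved: {context}\nAnswer fluently.")

print(students_work("What is the occupation of Caglar?", trained_kge=None, llm_endpoint=""))
```
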
#### Question & Answer Session

In case you have further questions, feel free to contact [Caglar Demir](https://dice-research.org/CaglarDemir).
This file was deleted.