AskWikidata

A Prototype for a Wikidata Question-Answering System

This system allows users to query Wikidata using natural language questions. The responses contain links to sources. If Wikidata does not provide the information requested, the system refuses to answer.

The system is in an early proof-of-concept state.

Demo

Quickstart

To give it a try, use ➡️ this Google Colab Notebook or load AskWikidata_Quickstart.ipynb in your infrastructure.

Implementation

To answer questions based on Wikidata, the system uses retrieval-augmented generation (RAG). First, it transforms Wikidata items into text and generates embeddings for them. The user query is then embedded as well. Nearest neighbor search identifies the most relevant Wikidata items, and a reranker model keeps only the best matches among those neighbors. Finally, these matches are incorporated into the LLM prompt so that the LLM can generate its answer from Wikidata knowledge.
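The flow described above can be sketched in a few lines of plain Python. This is only an illustration of the retrieve, rerank, and prompt steps, not the project's implementation: the bag-of-words `embed`, the word-overlap `rerank`, the exhaustive search, and the prompt wording are all stand-ins chosen so the example runs without any model downloads.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a neural embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k):
    """Exhaustive nearest neighbor search (stand-in for an approximate index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query, candidates, k):
    """Toy reranker: fraction of distinct query words found in the candidate."""
    q_words = set(embed(query))

    def score(candidate):
        return len(q_words & set(embed(candidate))) / len(q_words) if q_words else 0.0

    return sorted(candidates, key=score, reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Place the selected chunks in front of the question (hypothetical wording)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "Berlin is the capital of Germany.",
    "The head of government of Berlin is the Governing Mayor of Berlin.",
    "Paris is the capital of France.",
]
query = "Who is the mayor of Berlin?"
neighbors = retrieve(query, chunks, k=2)  # wide first pass
context = rerank(query, neighbors, k=1)   # narrow second pass
prompt = build_prompt(query, context)
print(prompt)
```

Retrieving more neighbors than are finally used gives the reranker room to discard chunks that are close in embedding space but do not actually address the question.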

All models, including the LLM, can run on the local machine using PyTorch and bitsandbytes quantization. For nearest neighbor search, an Annoy index is used.

Usage

Install dependencies

Nix

On Nix, the dev shell installs all required dependencies.

nix develop .

Pip

Alternatively, install the Python requirements using pip.

pip install -r requirements.txt

Unpack provided caches

For faster execution, the results of some pre-computation steps are cached. In order to use those caches, unpack them:

bunzip2 --keep --force *.json.bz2
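On systems without bunzip2, the same unpacking can be approximated with Python's standard library. The helper name `unpack_caches` is hypothetical and not part of the repository; like `--keep --force`, it leaves the archives in place and overwrites any existing output file.

```python
import bz2
import glob
import shutil


def unpack_caches(pattern="*.json.bz2"):
    """Decompress every archive matching pattern next to itself, keeping the archive."""
    unpacked = []
    for archive in sorted(glob.glob(pattern)):
        # Assumes the pattern only matches names ending in ".bz2".
        target = archive[: -len(".bz2")]
        with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(target)
    return unpacked
```

Calling `unpack_caches()` from the repository root returns the list of files it wrote.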

Generate dataset

Generate text representations for Wikidata items. The list of items to use is currently hardcoded in text_representation.py.

python text_representation.py
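To give a feel for what a text representation might look like, here is a minimal sketch. The simplified item dictionary and the function `item_to_text` are assumptions made for illustration; the actual format produced by text_representation.py may differ.

```python
def item_to_text(item):
    """Render a simplified item dict as one plain-text statement per line."""
    label = item["label"]
    lines = [f"{label} ({item['id']}): {item['description']}."]
    for prop, values in item["claims"].items():
        lines.append(f"{label} {prop}: {', '.join(values)}.")
    return "\n".join(lines)


# Hypothetical, heavily simplified view of the Wikidata item for Berlin (Q64).
berlin = {
    "id": "Q64",
    "label": "Berlin",
    "description": "capital city of Germany",
    "claims": {"instance of": ["city"], "country": ["Germany"]},
}
text = item_to_text(berlin)
print(text)
```

Repeating the item label in every statement keeps each line meaningful on its own, which helps if the text is later split into chunks.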

Answer a question

This Python code uses AskWikidata to answer a single question.

from askwikidata import AskWikidata

config = {
    "chunk_size": 1280,
    "chunk_overlap": 0,
    "index_trees": 1024,
    "retrieval_chunks": 16,
    "context_chunks": 5,
    "embedding_model_name": "BAAI/bge-small-en-v1.5",
    "reranker_model_name": "BAAI/bge-reranker-base",
    "qa_model_url": "Qwen/Qwen2.5-3B-Instruct",
}

askwikidata = AskWikidata(**config)
askwikidata.setup()
print(askwikidata.ask("Who is the current mayor of Berlin? And since when have they been serving?"))
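The chunk_size and chunk_overlap settings suggest that the text representations are split into windows before embedding. A minimal sketch of such a splitter follows, assuming both values are measured in characters; the function `split_into_chunks` is hypothetical and AskWikidata's own splitting logic may work differently.

```python
def split_into_chunks(text, chunk_size=1280, chunk_overlap=0):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


sizes = [len(c) for c in split_into_chunks("x" * 3000)]
print(sizes)  # → [1280, 1280, 440]
```

With an overlap of 0, as in the configuration above, every character belongs to exactly one chunk.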

Interactive REPL

A simple interactive read-eval-print loop (REPL) can be used to ask questions.

python repl.py

Run evaluation

A script to evaluate the performance of different configurations is provided.

python eval.py

Configure API Keys

If you do not want to use a local LLM, AskWikidata can access the Hugging Face LLM API. Configure your Hugging Face API key in the HUGGINGFACE_API_KEY environment variable.
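A small pre-flight check can make a missing key fail early with a clear message. This is a sketch only: the helper `get_huggingface_api_key` is hypothetical, and how AskWikidata itself reads the variable is not shown here.

```python
import os


def get_huggingface_api_key(env=os.environ):
    """Return the API key from the environment, or raise if it is missing or empty."""
    key = env.get("HUGGINGFACE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "HUGGINGFACE_API_KEY is not set; export it before using the hosted LLM."
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise in tests without touching the real process environment.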

Run tests

To execute the unit test suite, run:

python -m unittest

To get a coverage report, run:

coverage run -m unittest
coverage report --omit="test_*,/nix/*" --show-missing
