Search tools

This package contains tools for internet and Wikipedia searches.

Search internet tool

The SearchInternetTool is a RAG-based search tool that performs internet searches across multiple search engines and synthesizes concise responses using a large language model (LLM).

The tool does not require API keys because it queries various search engines through a local SearXNG metasearch instance. The search results are retrieved from the internet, filtered, re-ranked by relevance, and finally passed to the LLM to generate a concise response.

For implementation details, refer to the section search internet tool implementation.

To use the SearchInternetTool, you need to set up a local SearXNG instance and provide its endpoint URL to the tool.

Setup

Set up a local SearXNG instance using the official Docker container (see also the SearXNG docs).

docker run \
  --name searxng \
  -d -p 8080:8080 \
  -v "${PWD}/.searxng:/etc/searxng" \
  -e "BASE_URL=http://localhost:8080" \
  -e "INSTANCE_NAME=my-instance" \
  searxng/searxng:2024.5.24-75e4b6512

The .searxng/settings.yaml file used in this project has been modified to additionally support the JSON output format:

search:
  formats:
    - html
    - json
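
With the JSON format enabled, the instance can be checked independently of the tool. The following is a minimal sketch, assuming the instance from the Docker command above is listening on http://localhost:8080; the helper names `build_search_url` and `fetch_results` are hypothetical and not part of this package. It uses the SearXNG `/search` endpoint with the `q` and `format` query parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(endpoint: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output (hypothetical helper)."""
    params = urlencode({"q": query, "format": "json"})
    return f"{endpoint.rstrip('/')}/search?{params}"


def fetch_results(endpoint: str, query: str, timeout: float = 10.0) -> list:
    """Return the result entries reported by a running SearXNG instance."""
    with urlopen(build_search_url(endpoint, query), timeout=timeout) as response:
        payload = json.load(response)
    # Each entry is a dict; fields such as "title", "url" and "content" are typical.
    return payload.get("results", [])
```

Calling `fetch_results("http://localhost:8080", "The Last of Us release date")` should return a list of result entries. If `json` is missing from `search.formats`, SearXNG typically rejects the request with HTTP 403.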

See getting started for instructions on how to serve the Llama 3 model used by these tools.

Usage

from gba.tools.search import create_search_internet_tool

search_internet = create_search_internet_tool(
    llama3_endpoint="http://localhost:8084/completion",
    searxng_endpoint="http://localhost:8080",
)

response = search_internet.search(
    query="When was the video game 'The Last of Us' released"
)

Parameters

top_k_documents: number of top-ranked webpages to select from the search results (default: 3)
top_k_nodes_per_document: number of most relevant text nodes to select from each webpage for generating the response (default: 5)
top_k_snippets: number of top-ranked webpage snippets to include (default: top_k_documents)

The complete list of parameters can be found in the SearchInternetTool class.
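
To illustrate how the three parameters above could interact, here is a toy sketch. It is not the SearchInternetTool implementation: the word-overlap score stands in for whatever relevance model the tool uses, ranking documents by their snippet is an assumption, and the names `overlap_score`, `top_k` and `select_context` are hypothetical.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def top_k(query: str, texts: list, k: int) -> list:
    """Return the k texts with the highest toy score (input order kept on ties)."""
    return sorted(texts, key=lambda t: overlap_score(query, t), reverse=True)[:k]


def select_context(query, documents, top_k_documents=3,
                   top_k_nodes_per_document=5, top_k_snippets=None):
    """Select snippets and text nodes to pass to the LLM.

    documents is a list of dicts with the keys "snippet" and "nodes"
    (an assumed structure, chosen only for this sketch).
    """
    if top_k_snippets is None:
        top_k_snippets = top_k_documents  # mirrors the documented default
    # Assumption: documents are ranked by the relevance of their snippet.
    ranked = sorted(documents,
                    key=lambda d: overlap_score(query, d["snippet"]),
                    reverse=True)
    snippets = [d["snippet"] for d in ranked[:top_k_snippets]]
    nodes = []
    for document in ranked[:top_k_documents]:
        nodes.extend(top_k(query, document["nodes"], top_k_nodes_per_document))
    return {"snippets": snippets, "nodes": nodes}
```

With the defaults, at most 3 snippets and 3 x 5 = 15 text nodes reach the LLM, which is why lowering either value shrinks the prompt.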

Search Wikipedia tool

The SearchWikipediaTool is a RAG-based search tool designed for efficient search in a local Wikipedia dataset.

The tool uses multiple locally stored, quantized search indices for memory- and runtime-efficient nearest neighbor searches in the dataset. Given a search query, the tool retrieves the most relevant text nodes from Wikipedia articles and synthesizes a response using a large language model (LLM).

Note: the dataset has a knowledge cutoff of November 2023.

For implementation details, refer to the section search wikipedia tool implementation.
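
To give an intuition for why quantized indices save memory and runtime, the following sketch shows one common scheme, binary quantization with Hamming-distance search. This README does not state which quantization the tool uses, so the scheme and the names `binarize`, `hamming` and `nearest` are assumptions made for illustration only.

```python
def binarize(vector: list) -> int:
    """Quantize a float vector to one bit per dimension (1 if the value is positive)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two quantized vectors differ."""
    return bin(a ^ b).count("1")


def nearest(query: list, index: dict, k: int = 1) -> list:
    """Return the keys of the k index entries closest to the query."""
    quantized_query = binarize(query)
    return sorted(index, key=lambda key: hamming(quantized_query, index[key]))[:k]


# Toy index: each entry stores 4 bits instead of 4 floats.
index = {
    "a": binarize([0.9, -0.2, 0.4, -0.7]),
    "b": binarize([-0.5, 0.3, -0.1, 0.8]),
    "c": binarize([0.8, 0.1, 0.5, -0.6]),
}
closest = nearest([1.0, -0.1, 0.3, -0.9], index, k=2)
```

Each dimension is stored in a single bit and compared with an XOR plus a bit count, so the index is much smaller and cheaper to scan than one holding full-precision floats, at the cost of some ranking accuracy.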

Usage

from gba.tools.search import create_search_wikipedia_tool

search_wikipedia = create_search_wikipedia_tool(
    llama3_endpoint="http://localhost:8084/completion",
)

response = search_wikipedia.search(
    query="Search Wikipedia for the launch date of the first iPhone."
)

Parameters

top_k_nodes: number of top-ranked text nodes to select from the initial search results (default: 10)
top_k_related_documents: number of top-ranked related documents to select from the initial search results (default: 1)
top_k_related_nodes: number of top-ranked text nodes to select from the related documents (default: 3)

The complete list of parameters can be found in the SearchWikipediaTool class.
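
One possible reading of these three parameters is sketched below. It is an interpretation, not the SearchWikipediaTool implementation: the data structures, the choice to apply top_k_related_nodes per related document, and the name `select_nodes` are all assumptions.

```python
def select_nodes(hits, nodes_by_document, top_k_nodes=10,
                 top_k_related_documents=1, top_k_related_nodes=3):
    """Pick text nodes from search hits and from their source documents.

    hits: list of (score, document_id, text) tuples, higher score is better.
    nodes_by_document: maps a document_id to a list of (score, text) tuples.
    Both structures are assumed for this sketch.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    selected = ranked[:top_k_nodes]
    selected_texts = [text for _, _, text in selected]

    # Related documents: the documents of the best hits, in rank order.
    related_documents = []
    for _, document_id, _ in selected:
        if document_id not in related_documents:
            related_documents.append(document_id)
    related_documents = related_documents[:top_k_related_documents]

    # From each related document, add its best nodes that were not already selected.
    related_texts = []
    for document_id in related_documents:
        candidates = sorted(nodes_by_document.get(document_id, []),
                            key=lambda node: node[0], reverse=True)
        extra = [text for _, text in candidates if text not in selected_texts]
        related_texts.extend(extra[:top_k_related_nodes])
    return selected_texts, related_texts
```

Under this reading, the related nodes add surrounding context from the article that produced the best hit, which can help when the answer is spread over several passages.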
