issue #9, Developing a Search Function Test Utilizing LLM #47

Merged
merged 149 commits into from
Mar 7, 2024
f3db963
issue #41 - creation of the sql view
melanie-fressard Nov 2, 2023
9362d33
issue #44: fix package configuration
k-allagbe Nov 8, 2023
acb8b5d
issue #44: fix module import
k-allagbe Nov 8, 2023
17c7877
Merge remote-tracking branch 'origin/main' into 41-individual-scoring
melanie-fressard Nov 9, 2023
da2ad52
issue #41 - adding avg score
melanie-fressard Nov 9, 2023
8ba87c8
modification of .env.template to match standards
melanie-fressard Nov 9, 2023
9adc2d8
imports modification
melanie-fressard Nov 9, 2023
31df218
Fixes #9, new script
Nov 14, 2023
4f94a84
issue#24, change louis.db for ailab.db
Nov 14, 2023
c90769a
issue #41 - creation of init in ailab/db/finesse
melanie-fressard Nov 14, 2023
e668662
issue #20 - moving search.py
melanie-fressard Nov 15, 2023
30f6069
issue #41 - script to use the code
melanie-fressard Nov 15, 2023
67eca6e
issue #41 - debbug connexion db
melanie-fressard Nov 15, 2023
c97cd7d
issue #41 - debbug sql function
melanie-fressard Nov 15, 2023
21c6614
issue #41 - formatting
melanie-fressard Nov 16, 2023
403f782
issue #41 - separating creation and select
melanie-fressard Nov 16, 2023
32bc58a
issue #41 - csv file output + correction sql
melanie-fressard Nov 16, 2023
a1d7ab3
issue #41 - suppressing similarity column
melanie-fressard Nov 17, 2023
a3d10d2
issue #41 - reworking output making
melanie-fressard Nov 17, 2023
4924fc7
changes in template
melanie-fressard Nov 20, 2023
68cf448
issue #49 - modification from changing repo name
melanie-fressard Nov 20, 2023
3c38ab7
issue #49 - results file by query name
melanie-fressard Nov 20, 2023
ecfc757
Adding base script
JolanThomassin Nov 21, 2023
a687fdc
adressing issue #48 and final line
melanie-fressard Nov 21, 2023
92f327a
issue #49 - minor correction + script to launch
melanie-fressard Nov 21, 2023
08401f8
Fixes #9, Query and LLM Q&A generation
JolanThomassin Nov 21, 2023
7a195f1
changes of the call pf the search function
melanie-fressard Nov 21, 2023
6888691
issue #49 - search output
melanie-fressard Nov 21, 2023
67d5b63
line eof
melanie-fressard Nov 21, 2023
8f34a6c
Enhances Chunk Selection Quality - Resolves #9
JolanThomassin Nov 21, 2023
93841ef
lintests fails attempt to correct
melanie-fressard Nov 22, 2023
16a5528
lintests fails attempt to correct
melanie-fressard Nov 22, 2023
4363ef1
Merge branch '49-expand-search-examples' of https://github.com/ai-cfi…
melanie-fressard Nov 22, 2023
baaa739
issue #41 - eof line
melanie-fressard Nov 22, 2023
963c049
issue #49 - secrets correction
melanie-fressard Nov 22, 2023
7f436f1
issue #41 - eof without tabs
melanie-fressard Nov 23, 2023
bb483ed
issue #49 - removing search balise
melanie-fressard Nov 23, 2023
1e40ae9
eof
melanie-fressard Nov 23, 2023
009d2de
Fixes #9, new SQL files
JolanThomassin Nov 23, 2023
53758f5
issue #41 - avg instead of sum
melanie-fressard Nov 23, 2023
eb5e1e1
Fixes #9, save, check tokens, better query
JolanThomassin Nov 23, 2023
9bd6b6c
issue #49 - linttest correction
melanie-fressard Nov 24, 2023
27bec9f
test
melanie-fressard Nov 24, 2023
ffb7126
issue #31 - correction of lint test
melanie-fressard Nov 24, 2023
7f4d360
Merge pull request #50 from ai-cfia/49-expand-search-examples
melanie-fressard Nov 24, 2023
9428c57
issue #31 - resolve lint test
melanie-fressard Nov 24, 2023
080a005
issue #31 - lint test merge
melanie-fressard Nov 24, 2023
352b767
issue #41 - solve merge conflict
melanie-fressard Nov 24, 2023
31ee685
Merge branch 'main' into 41-individual-scoring
melanie-fressard Nov 24, 2023
776cbfb
Merge pull request #42 from ai-cfia/41-individual-scoring
melanie-fressard Nov 24, 2023
6446b30
issue #54: workflow call to main
vivalareda Nov 24, 2023
f5ca3e4
Merge pull request #55 from ai-cfia/issue-54-fix-workflow-for-ailab-db
vivalareda Nov 24, 2023
8871006
issue #56: run deploy only on main
vivalareda Nov 25, 2023
b862a4a
Merge branch 'main' into k-allagbe/issue44-package-submodule-configur…
k-allagbe Nov 25, 2023
3d44f7c
Merge pull request #60 from ai-cfia/k-allagbe/issue44-package-submodu…
k-allagbe Nov 27, 2023
f9e0bc7
Bump certifi from 2023.5.7 to 2023.7.22
dependabot[bot] Nov 27, 2023
e7cec22
Bump urllib3 from 2.0.3 to 2.0.7
dependabot[bot] Nov 27, 2023
d8079a5
Bump aiohttp from 3.8.4 to 3.8.6
dependabot[bot] Nov 27, 2023
59cf33b
Merge branch 'main' into 56-run-deploy-workflow-step-only-if-branch-i…
vivalareda Nov 27, 2023
ffe988a
Merge pull request #57 from ai-cfia/56-run-deploy-workflow-step-only-…
vivalareda Nov 27, 2023
021b3cf
Merge branch 'main' into dependabot/pip/certifi-2023.7.22
melanie-fressard Nov 27, 2023
782a6a2
Merge branch 'main' into dependabot/pip/urllib3-2.0.7
melanie-fressard Nov 27, 2023
7ece22e
Merge branch 'main' into dependabot/pip/aiohttp-3.8.6
melanie-fressard Nov 27, 2023
e9164f6
Merge pull request #53 from ai-cfia/dependabot/pip/urllib3-2.0.7
melanie-fressard Nov 27, 2023
6ddcca1
Merge branch 'main' into dependabot/pip/aiohttp-3.8.6
melanie-fressard Nov 27, 2023
f372a04
Merge pull request #51 from ai-cfia/dependabot/pip/aiohttp-3.8.6
melanie-fressard Nov 27, 2023
6c382d9
Merge branch 'main' into dependabot/pip/certifi-2023.7.22
melanie-fressard Nov 27, 2023
f29d8d6
Merge pull request #52 from ai-cfia/dependabot/pip/certifi-2023.7.22
melanie-fressard Nov 27, 2023
eea27c0
Bump aiohttp from 3.8.6 to 3.9.0
dependabot[bot] Nov 28, 2023
6e58fa0
Fixes #9, new SQL script
JolanThomassin Nov 28, 2023
62ac36d
issue #61 - add md file
melanie-fressard Nov 28, 2023
5733ba0
log on schema
melanie-fressard Nov 29, 2023
8b6b929
fix of lint tests
melanie-fressard Nov 29, 2023
1bfa1cf
ruff error
melanie-fressard Nov 29, 2023
8607c58
Merge pull request #63 from ai-cfia/dependabot/pip/aiohttp-3.9.0
melanie-fressard Nov 29, 2023
6525f59
issue #61 - adding jolan's scores
melanie-fressard Nov 29, 2023
9b21633
issue #61 - adding a title and last updated date
melanie-fressard Nov 29, 2023
f83b7b3
Fixes #9, refactored code
JolanThomassin Nov 29, 2023
8fca008
Fixes #9, black formatter
JolanThomassin Nov 29, 2023
27b5df4
issue #61 - added similarity
melanie-fressard Nov 29, 2023
0e0365c
issue #61 - add file where each score is computed
melanie-fressard Dec 1, 2023
3e93698
Merge branch 'main' into 61-explain-scores-and-weights
melanie-fressard Dec 1, 2023
f1a39a4
issue #61 - removing common standards
melanie-fressard Dec 1, 2023
6ae4b00
Merge remote-tracking branch 'refs/remotes/origin/61-explain-scores-a…
melanie-fressard Dec 1, 2023
1f06baa
issue #61 - adding scale for each score
melanie-fressard Dec 1, 2023
656f100
issue #61 - changing description of didactic
melanie-fressard Dec 5, 2023
c8d5b51
issue #61 - link to file
melanie-fressard Dec 5, 2023
40f270e
adding future scores
melanie-fressard Dec 6, 2023
7db06b4
Merge pull request #64 from ai-cfia/61-explain-scores-and-weights
melanie-fressard Dec 11, 2023
5e716ce
Fixes #9, new SQL scripts
JolanThomassin Dec 12, 2023
8d2e67f
Fixes #9, code clarification
JolanThomassin Dec 12, 2023
afffd40
Fixes #9, unit test for search qna function
JolanThomassin Dec 12, 2023
42965b0
Fixes #9, set schema fix
JolanThomassin Dec 13, 2023
2c0e141
Fixes #9, adding seed to get random chunk
JolanThomassin Dec 13, 2023
4f869e7
Fixes #9, script rename
JolanThomassin Dec 13, 2023
12448f0
Fixes #9, delete old script
JolanThomassin Dec 13, 2023
86adfa3
Fixes #9, renaming scripts mistakes
JolanThomassin Dec 13, 2023
9e4aa52
Fixes #9, cursor only open once
JolanThomassin Dec 13, 2023
a918685
Fixes #9, black formatter
JolanThomassin Dec 13, 2023
e9e0533
Fixes #9, character length
Jan 15, 2024
4f8664d
Fixes #9, magic string
Jan 15, 2024
6d85615
Fixes #9, argparse
Jan 15, 2024
384e58e
Fixes #9, new script
Nov 14, 2023
9fa9902
issue#24, change louis.db for ailab.db
Nov 14, 2023
6bc85ed
Fixes #9, Query and LLM Q&A generation
JolanThomassin Nov 21, 2023
38d8e42
Enhances Chunk Selection Quality - Resolves #9
JolanThomassin Nov 21, 2023
c8e68bb
Fixes #9, new SQL files
JolanThomassin Nov 23, 2023
02252f0
Fixes #9, save, check tokens, better query
JolanThomassin Nov 23, 2023
12b3042
Fixes #9, new SQL script
JolanThomassin Nov 28, 2023
5116063
Fixes #9, refactored code
JolanThomassin Nov 29, 2023
e509c4b
Fixes #9, black formatter
JolanThomassin Nov 29, 2023
9b2f83b
Fixes #9, new SQL scripts
JolanThomassin Dec 12, 2023
f3585d3
Fixes #9, code clarification
JolanThomassin Dec 12, 2023
3b89bf6
Fixes #9, unit test for search qna function
JolanThomassin Dec 12, 2023
b172f5f
Fixes #9, set schema fix
JolanThomassin Dec 13, 2023
8fb2add
Fixes #9, adding seed to get random chunk
JolanThomassin Dec 13, 2023
daffde7
Fixes #9, script rename
JolanThomassin Dec 13, 2023
f1d6ec5
Fixes #9, delete old script
JolanThomassin Dec 13, 2023
131830f
Fixes #9, renaming scripts mistakes
JolanThomassin Dec 13, 2023
8df4ba4
Fixes #9, cursor only open once
JolanThomassin Dec 13, 2023
0e9dfe5
Fixes #9, black formatter
JolanThomassin Dec 13, 2023
a625206
Fixes #9, character length
Jan 15, 2024
408acb5
Fixes #9, magic string
Jan 15, 2024
53e7494
Fixes #9, argparse
Jan 15, 2024
d08f373
Merge remote-tracking branch 'origin/issue#9-search-function-test-jt'…
Feb 1, 2024
2565a5c
Fixes #9, fixed ruff error
Feb 8, 2024
3d1266d
Fixes #9, file rename
Feb 8, 2024
c25f8fc
Fixes #9, test first function
Feb 8, 2024
a110fad
Fixes #9, clearer JSON template
JolanThomassin Feb 8, 2024
9215998
Fixes #9, missing line break
JolanThomassin Feb 8, 2024
4c02761
Fixes #9, path changes for test
JolanThomassin Feb 8, 2024
b1a769e
Fixes #9, new ENV var for schema
JolanThomassin Feb 8, 2024
6b3ede6
Fixes #9, test_generate_question
JolanThomassin Feb 8, 2024
ee5374e
Fixes #9, lint ruff error
JolanThomassin Feb 8, 2024
b721451
Fixes #9, add black formatter extension
JolanThomassin Feb 8, 2024
418399b
Fixes #9, add semver to requirements
JolanThomassin Feb 8, 2024
39e159d
Fixes #9, changes semver version
JolanThomassin Feb 8, 2024
b86fbe9
Fixes #9, test for db failure
JolanThomassin Feb 12, 2024
b7e5b56
Fixes #9, replace sys.exit(1)
JolanThomassin Feb 12, 2024
2046aa7
Fixes #9, import removed
JolanThomassin Feb 22, 2024
a40b7d5
Fixes #9, separate save for test
JolanThomassin Feb 22, 2024
7ecb190
Fixes #9, import at the top
JolanThomassin Feb 22, 2024
252d3b3
Fixes #9, import at the top
JolanThomassin Feb 22, 2024
d19bb55
Fixes #9, fixed number of generated question
JolanThomassin Feb 26, 2024
f1b016a
Fixes #9, adding "question_quality" variable
JolanThomassin Feb 26, 2024
ca58322
Fixes #9, random query new method
JolanThomassin Feb 29, 2024
8cd1664
Fixes #9, adding parameter into call
JolanThomassin Feb 29, 2024
fa67e42
Fixes #9, user_prompt more example
JolanThomassin Feb 29, 2024
71a436f
Fixes #9, remove "question_quality" variable
JolanThomassin Mar 4, 2024
3 changes: 2 additions & 1 deletion .devcontainer/devcontainer.json
Original file line number Diff line number Diff line change
@@ -19,7 +19,8 @@
"extensions": [
"timonwong.shellcheck",
"GitHub.vscode-pull-request-github",
"charliermarsh.ruff"
"charliermarsh.ruff",
"ms-python.black-formatter"
]
}
},
6 changes: 4 additions & 2 deletions .env.template
@@ -5,8 +5,10 @@ USER=
PGHOST=
POSTGRES_PASSWORD=
PGPASSWORD=
PGDATA=
OPENAI_API_KEY=
AZURE_OPENAI_SERVICE=
OPENAI_ENDPOINT=
OPENAI_API_ENGINE=
LOUIS_SCHEMA=
DB_SERVER_CONTAINER_NAME=
PGDATA=
FINESSE_WEIGHTS=
4 changes: 2 additions & 2 deletions .github/workflows/pull-request.yml
@@ -14,7 +14,7 @@ jobs:
secrets: inherit

build:
# if: github.ref == 'refs/heads/main'
if: github.ref == 'refs/heads/main'
needs: lint-test
uses: ai-cfia/github-workflows/.github/workflows/workflow-build-container.yml@main
with:
@@ -23,7 +23,7 @@ jobs:
secrets: inherit

deploy:
# if: github.ref == 'refs/heads/main'
if: github.ref == 'refs/heads/main'
needs: build
uses: ai-cfia/github-workflows/.github/workflows/workflow-deploy-gcp.yml@main
with:
4 changes: 3 additions & 1 deletion DEVELOPER.md
@@ -24,14 +24,16 @@

* OPENAI_API_KEY: The API key required for authentication when making requests to the OpenAI API. It can be found [here](https://portal.azure.com/#home).

* AZURE_OPENAI_ENDPOINT: The link used to call into Azure OpenAI endpoints. It can be found at the same place as the OPENAI_API_KEY.
* OPENAI_ENDPOINT: The link used to call into Azure OpenAI endpoints. It can be found at the same place as the OPENAI_API_KEY.

* OPENAI_API_ENGINE: The name of the model deployment you want to use (ex:ailab-gpt-35-turbo).

* LOUIS_SCHEMA: The Louis schema within database (ex: louis_v005).

* DB_SERVER_CONTAINER_NAME: The name of your database server container (ex: louis-db-server).

* AILAB_SCHEMA_VERSION: The version of the schema you want to use.

* Run database locally (see bin/postgres.sh)
* Restore latest schema dump

41 changes: 27 additions & 14 deletions README.md
@@ -1,22 +1,32 @@
# What is louis-db ?
# What is ailab-db ?

Louis-db is the database of [Louis](https://github.com/ai-cfia/louis). It includes all of the python and bash scripts as well as sql functions. It uses ada-002 API (Application Programming Interface) for semantic search. Ada is a model that vectorises text to make a semantic representation out of it. It follows this process :
> Taking the text, passing it through the model, putting it in an embedding table that will work like an index.
Ailab-db contains the database of [Louis](https://github.com/ai-cfia/louis),
[Nachet](https://github.com/ai-cfia/nachet-backend) and any other product of the
CFIA's AI Lab. It includes all of the python and bash scripts as well as sql
functions. It uses ada-002 API (Application Programming Interface) for semantic
search. Ada is a model that vectorises text to make a semantic representation
out of it. It follows this process :
> Taking the text, passing it through the model, putting it in an embedding
> table that will work like an index.

- The *bin* folder : This folder contains all of the bash scripts that would be useful to set up the database.
- The *bin* folder : This folder contains all of the bash scripts that would be
useful for the backends or to set up the database.

- The *louis* folder : This folder is a python module structure that allows connections to the database or the api.
- The *ailab* folder : This folder is a python module structure that allows
connections to the database or the api as well as containing useful functions
for product backends.

- The *postgres* folder : This folder contains the bash script that will set up the docker container.
- The *postgres* folder : This folder contains the bash script that will set up
the docker container.

- The *sql* folder : This folder holds all of the sql functions and scripts.

- The *tests* folder : This is the test folder, it allows you to test the code.

Here is the database schema :
![database schema](img/database-schema.png)
Here is the database schema : ![database schema](img/database-schema.png)

If you need to set up the database locally, please follow [this procedure](setup-procedure.md).
If you need to set up the database locally, please follow [this
procedure](setup-procedure.md).

---

@@ -27,17 +37,20 @@ If you need to set up the database locally, please follow [this procedure](setup
If you need to interface with the database, use this to install:

```
pip install git+https://github.com/ai-cfia/louis[email protected]
pip install git+https://github.com/ai-cfia/ailab[email protected]
```

You'll often want to add, move or modify existing database layer functions found in louis-db from a client repository.
You'll often want to add, move or modify existing database layer functions found
in ailab-db from a client repository.

To edit, you can install an editable version of the package dependencies such as:
To edit, you can install an editable version of the package dependencies such
as:

```
pip install -e git+https://github.com/ai-cfia/louis-db#egg=louis_db
```

this will checkout the latest source in a local git in src/louis-db allowing edits in that directory to be immediately available for use by louis-crawler.
this will checkout the latest source in a local git in src/louis-db allowing
edits in that directory to be immediately available for use by louis-crawler.

Don't forget to create a PR with your changes once you're done!
Don't forget to create a PR with your changes once you're done!
4 changes: 2 additions & 2 deletions ailab/db/__init__.py
@@ -1,4 +1,4 @@
"""Database functions for the Louis project."""
"""Database functions for the ailab project."""
import hashlib
import logging
import os
@@ -72,4 +72,4 @@ def hash(text):

We hash using the Python library to remove a roundtrip to the database
"""
return hashlib.md5(text.encode()).hexdigest()
return hashlib.md5(text.encode()).hexdigest()
10 changes: 5 additions & 5 deletions ailab/db/api/__init__.py
@@ -1,14 +1,14 @@
import louis.models.openai as openai
import ailab.models.openai as openai
import os
import json
import sys
import louis.db as db
import ailab.db as db
import dotenv
# This is used to load the .env file
dotenv.load_dotenv()

FINESSE_WEIGHTS = os.environ.get("FINESSE_WEIGHTS") \
or db.raise_db_error("FINESSE_WEIGHTS is not set")
or db.raise_error("FINESSE_WEIGHTS is not set")


if FINESSE_WEIGHTS:
@@ -71,7 +71,7 @@ def search(cursor, query_embedding):
'text': ' '.join(sys.argv[1:]),
'query_embedding': query_embedding,
'match_threshold': 0.5,
'match_count': 1,
'match_count': 10,
'weights': json.dumps(FINESSE_JSON_PARSED_WEIGHTS)
}

@@ -105,4 +105,4 @@ def search_from_text_query(cursor, query):
else:
data.update(db_data)
docs = search(cursor, data['embedding'])
return docs
return docs
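
Note on the `FINESSE_WEIGHTS` handling in this file: the hunks show the variable being read from the environment and later serialized with `json.dumps(FINESSE_JSON_PARSED_WEIGHTS)`, but the parsing step itself is collapsed in the diff. A minimal sketch of what such a parsing step could look like is given below. The function name `parse_weights` and the weight names in the example are assumptions made for illustration (the names are borrowed from the `scores` keys in the benchmark output); this is not the code from the PR.

```python
import json


def parse_weights(raw):
    """Parse a FINESSE_WEIGHTS-style JSON string into a dict of floats.

    Hypothetical helper: the real parsing code is not visible in the diff.
    """
    if not raw:
        raise ValueError("FINESSE_WEIGHTS is not set")
    try:
        weights = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"FINESSE_WEIGHTS is not valid JSON: {exc}") from exc
    if not isinstance(weights, dict):
        raise ValueError("FINESSE_WEIGHTS must be a JSON object")

    parsed = {}
    for name, value in weights.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"weight {name!r} must be a number")
        parsed[name] = float(value)
    return parsed


# Assumed example value; the actual weight names and values are not in the diff
example = parse_weights('{"recency": 1, "similarity": 2.5}')
print(json.dumps(example))
```

Validating once at import time keeps a malformed environment value from surfacing later as a database error inside `search`.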
2 changes: 1 addition & 1 deletion ailab/db/crawler/__init__.py
@@ -1,7 +1,7 @@
import psycopg
import numpy as np

import louis.db as db
import ailab.db as db

def link_pages(cursor, source_url, destination_url):
"""Link two pages together in the database."""
32 changes: 32 additions & 0 deletions ailab/db/finesse/__init__.py
@@ -0,0 +1,32 @@
import os
import json

# Constants
DEFAULT_JSON_PATH = "/ailab/db/finesse/prompt"


def load_prompt(prompt_path, filename):
    full_path = os.path.join(prompt_path, filename)

    with open(full_path, "r") as file:
        content = file.read()

    return content


def load_json_template(json_path=DEFAULT_JSON_PATH):
    """
    Load a JSON template from the specified path.

    Returns:
        str: A JSON-formatted string representing the loaded data.
    """
    json_file_path = json_path + "/json_template.json"

    with open(json_file_path, "r") as file:
        data = json.load(file)  # Load the JSON data as a Python dictionary

    # Convert the dictionary back to a JSON-formatted string
    content = json.dumps(data, indent=4)  # Use 'indent' for pretty printing

    return content
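
Note on `load_json_template`: the function reads the template, parses it, and re-serializes it with `indent=4`, so the caller always gets a normalized JSON string. The sketch below shows the same round trip using `pathlib` for path joining instead of string concatenation, exercised against a temporary directory. `load_json_template_from` is a hypothetical name introduced here for illustration, not a function from the PR.

```python
import json
import tempfile
from pathlib import Path

TEMPLATE_FILENAME = "json_template.json"  # same file name as in the diff


def load_json_template_from(directory):
    """Hypothetical variant: load the template and return it as pretty-printed JSON."""
    template_path = Path(directory) / TEMPLATE_FILENAME
    with template_path.open("r", encoding="utf-8") as file:
        data = json.load(file)  # fails early if the template is not valid JSON
    return json.dumps(data, indent=4)


# Usage against a throwaway directory, so no project path is assumed
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / TEMPLATE_FILENAME).write_text(
        '{"question": "", "answer": ""}', encoding="utf-8"
    )
    template_text = load_json_template_from(tmp)

print(template_text)
```

Because `DEFAULT_JSON_PATH` in the diff starts with `/`, it is treated as an absolute path; passing the prompt directory explicitly, as `bin/generate-qna.sh` does, avoids depending on that default.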
10 changes: 10 additions & 0 deletions ailab/db/finesse/prompt/json_template.json
@@ -0,0 +1,10 @@
{
    "page_score": "",
    "crawl_id": "",
    "chunk_id": "",
    "title": "",
    "url": "",
    "text_content": "",
    "question": "",
    "answer": ""
}
1 change: 1 addition & 0 deletions ailab/db/finesse/prompt/qna_system_prompt.txt
@@ -0,0 +1 @@
Generate Q&A pairs from search results, ensuring that the language in the Q&A pairs matches that of the search results. Create a single JSON file with appropriate formatting and indentation. Always leave "text_content" empty. If you got more than one query, only treat and return result for the first one. Always return a SINGLE JSON file.
1 change: 1 addition & 0 deletions ailab/db/finesse/prompt/qna_user_prompt.txt
@@ -0,0 +1 @@
Your task is to process a JSON file containing text relevant to a search query. Produce a new JSON file completing all existing keys in the JSON template. Specifically, generate entries for 'question' and 'answer' where 'question' represents a question derived from the text and 'answer' provides an appropriate answer addressing the generated question.
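
Note on how the two prompts and the JSON template fit together: the generation script itself is not part of the visible diff, so the following is only a sketch of one plausible way to combine them into a chat request. `build_messages` and its parameters are hypothetical. The `user` message shape matches the `{"role": "user", ...}` entry visible in `ailab/models/openai.py`; the `system` entry is an assumption.

```python
import json


def build_messages(system_prompt, user_prompt, json_template, chunk):
    """Hypothetical helper: fill the template with chunk data and build chat messages."""
    filled = dict(json.loads(json_template))
    for key in filled:
        # Copy only the keys the template declares; extra chunk fields are ignored
        if key in chunk:
            filled[key] = chunk[key]

    user_content = user_prompt + "\n\n" + json.dumps(filled, indent=4)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Usage with made-up values
messages = build_messages(
    "Generate Q&A pairs from search results.",
    "Your task is to process a JSON file.",
    '{"title": "", "question": "", "answer": ""}',
    {"title": "Bringing animals to Canada", "unused": 1},
)
print(messages[1]["content"])
```

Keeping `question` and `answer` empty in the filled template gives the model the exact keys it is asked to complete.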
46 changes: 46 additions & 0 deletions ailab/db/finesse/test_queries/__init__.py
@@ -0,0 +1,46 @@
import time
import semver
import math


def get_random_chunk(cursor, schema_version, seed=None):
    assert (
        semver.compare(schema_version, "0.0.6") >= 0
    ), "Schema version must be >= 0.0.6"
    schema_version = "louis_" + schema_version

    if seed is None:
        seed = math.sin(time.time())

    # Execute the SET commands separately
    cursor.execute(f'SET SEARCH_PATH TO "{schema_version}", public;')
    cursor.execute(f"SET SEED TO {seed};")

    query = """
        SELECT
            dc.score AS score, cr.id AS crawl_id, ch.id AS chunk_id, ch.title, cr.url, ch.text_content
        FROM
            Chunk ch
        INNER JOIN
            documents dc ON ch.id = dc.chunk_id
        INNER JOIN
            html_content_to_chunk hctc ON ch.id = hctc.chunk_id
        INNER JOIN
            html_content hc ON hctc.md5hash = hc.md5hash
        INNER JOIN
            crawl cr ON hc.md5hash = cr.md5hash
        WHERE
            dc.score > 0.01
        ORDER BY
            floor(random() * (
                SELECT
                    COUNT(*)
                FROM
                    Chunk
            ))
        LIMIT
            1;
    """

    cursor.execute(query)
    return cursor.fetchall()
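
Note on the inputs to `get_random_chunk`: the schema name and the seed are interpolated into `SET` statements, and PostgreSQL only accepts seed values between -1 and 1, which is why the default `math.sin(time.time())` works. The sketch below shows how both inputs could be checked before they reach the database. All names are hypothetical, the version parser is a simplified stand-in for `semver.compare` (plain `MAJOR.MINOR.PATCH` only), and it raises `ValueError` instead of using `assert`, since assertions are skipped when Python runs with `-O`.

```python
import math
import time

MINIMUM_SCHEMA_VERSION = (0, 0, 6)  # same lower bound as the assert in the diff


def parse_version(version):
    """Simplified stand-in for semver: accepts only MAJOR.MINOR.PATCH digits."""
    parts = version.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in parts)


def schema_name(version):
    """Return the schema name, using the same "louis_" prefix as the diff."""
    if parse_version(version) < MINIMUM_SCHEMA_VERSION:
        raise ValueError("Schema version must be >= 0.0.6")
    return "louis_" + version


def resolve_seed(seed=None):
    """Return a seed inside the [-1, 1] range that SET SEED accepts."""
    if seed is None:
        seed = math.sin(time.time())  # sin() always lands in [-1, 1]
    if not -1.0 <= seed <= 1.0:
        raise ValueError("seed must be between -1 and 1")
    return float(seed)


print(schema_name("0.0.6"), resolve_seed(0.25))
```

Passing a fixed seed makes the "random" chunk reproducible, which is what a unit test of the function needs.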
2 changes: 1 addition & 1 deletion ailab/models/openai.py
@@ -56,4 +56,4 @@ def get_chat_answer(system_prompt, user_prompt, max_token):
{"role": "user", "content": (user_prompt)}
]
)
return response
return response
72 changes: 70 additions & 2 deletions benchmarking/search_results.json
@@ -1,2 +1,70 @@
{"commit_version": "b2badf1069c3feb0475a4b56da8ddb4d61ff06f0", "function_version": "0.0.1", "args": ["\"how to bring a cat from france to canada\""], "kwargs": {}, "function_name": "init_bench", "start_time": "2023-10-30T20:06:34.197284", "finish_time": "2023-10-30T20:06:34.495414", "return_value": [{"search": [{"id": "e9a64ea3-c7c4-4f0e-bd49-c4e1d9de1a7f", "url": "https://inspection.canada.ca/importing-food-plants-or-animals/pets/eng/1326600389775/1326600500578", "score": 3.017821829567311, "title": "Bringing animals to Canada: Importing and travelling with pets - Canadian Food Inspection Agency", "scores": {"current": 1, "recency": 0.8708793589157114, "traffic": 0.8188254189764937, "similarity": 0.8272663498460967, "typicality": 0.004253509145044662}, "content": "Bringing animals to Canada: Importing and travelling with pets If you are travelling with a pet or planning to import an animal to Canada, you will need the right paperwork at the border to meet Canada's import requirements\n If you don't, you risk experiencing delays at the border and your animal may not be allowed into Canada\n Canada has specific import requirements in place to avoid introducing animal diseases to protect its people, plants and animals", "query_id": "9f5e33a0-6b81-48da-9681-349dd6d3bb9c", "subtitle": "Bringing animals to Canada: Importing and travelling with pets", "last_updated": "2022-10-20", "tokens_count": 84}]}]}
{"commit_version": "f8c9d7a72944fe0c94fd41bccd2633fc17a11d8c", "function_version": "0.0.1", "args": ["\"how to bring a cat from france to canada\""], "kwargs": {}, "function_name": "init_bench", "start_time": "2023-10-31T14:34:26.463150", "finish_time": "2023-10-31T14:34:32.552876", "return_value": [{"search": [{"id": "e9a64ea3-c7c4-4f0e-bd49-c4e1d9de1a7f", "url": "https://inspection.canada.ca/importing-food-plants-or-animals/pets/eng/1326600389775/1326600500578", "score": 3.017821829567311, "title": "Bringing animals to Canada: Importing and travelling with pets - Canadian Food Inspection Agency", "scores": {"current": 1, "recency": 0.8708793589157114, "traffic": 0.8188254189764937, "similarity": 0.8272663498460967, "typicality": 0.004253509145044662}, "content": "Bringing animals to Canada: Importing and travelling with pets If you are travelling with a pet or planning to import an animal to Canada, you will need the right paperwork at the border to meet Canada's import requirements\n If you don't, you risk experiencing delays at the border and your animal may not be allowed into Canada\n Canada has specific import requirements in place to avoid introducing animal diseases to protect its people, plants and animals", "query_id": "b7b0a221-1239-48e9-98f7-016d75946b38", "subtitle": "Bringing animals to Canada: Importing and travelling with pets", "last_updated": "2022-10-20", "tokens_count": 84}]}]}
{
"commit_version": "b2badf1069c3feb0475a4b56da8ddb4d61ff06f0",
"function_version": "0.0.1",
"args": [
"\"how to bring a cat from france to canada\""
],
"kwargs": {},
"function_name": "init_bench",
"start_time": "2023-10-30T20:06:34.197284",
"finish_time": "2023-10-30T20:06:34.495414",
"return_value": [
{
"search": [
{
"id": "e9a64ea3-c7c4-4f0e-bd49-c4e1d9de1a7f",
"url": "https://inspection.canada.ca/importing-food-plants-or-animals/pets/eng/1326600389775/1326600500578",
"score": 3.017821829567311,
"title": "Bringing animals to Canada: Importing and travelling with pets - Canadian Food Inspection Agency",
"scores": {
"current": 1,
"recency": 0.8708793589157114,
"traffic": 0.8188254189764937,
"similarity": 0.8272663498460967,
"typicality": 0.004253509145044662
},
"content": "Bringing animals to Canada: Importing and travelling with pets If you are travelling with a pet or planning to import an animal to Canada, you will need the right paperwork at the border to meet Canada's import requirements\n If you don't, you risk experiencing delays at the border and your animal may not be allowed into Canada\n Canada has specific import requirements in place to avoid introducing animal diseases to protect its people, plants and animals",
"query_id": "9f5e33a0-6b81-48da-9681-349dd6d3bb9c",
"subtitle": "Bringing animals to Canada: Importing and travelling with pets",
"last_updated": "2022-10-20",
"tokens_count": 84
}
]
}
]
}
{
"commit_version": "f8c9d7a72944fe0c94fd41bccd2633fc17a11d8c",
"function_version": "0.0.1",
"args": [
"\"how to bring a cat from france to canada\""
],
"kwargs": {},
"function_name": "init_bench",
"start_time": "2023-10-31T14:34:26.463150",
"finish_time": "2023-10-31T14:34:32.552876",
"return_value": [
{
"search": [
{
"id": "e9a64ea3-c7c4-4f0e-bd49-c4e1d9de1a7f",
"url": "https://inspection.canada.ca/importing-food-plants-or-animals/pets/eng/1326600389775/1326600500578",
"score": 3.017821829567311,
"title": "Bringing animals to Canada: Importing and travelling with pets - Canadian Food Inspection Agency",
"scores": {
"current": 1,
"recency": 0.8708793589157114,
"traffic": 0.8188254189764937,
"similarity": 0.8272663498460967,
"typicality": 0.004253509145044662
},
"content": "Bringing animals to Canada: Importing and travelling with pets If you are travelling with a pet or planning to import an animal to Canada, you will need the right paperwork at the border to meet Canada's import requirements\n If you don't, you risk experiencing delays at the border and your animal may not be allowed into Canada\n Canada has specific import requirements in place to avoid introducing animal diseases to protect its people, plants and animals",
"query_id": "b7b0a221-1239-48e9-98f7-016d75946b38",
"subtitle": "Bringing animals to Canada: Importing and travelling with pets",
"last_updated": "2022-10-20",
"tokens_count": 84
}
]
}
]
}
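
Note on the new layout of `benchmarking/search_results.json`: the file now holds several pretty-printed JSON objects one after another, so it is neither a single JSON document nor one-object-per-line JSON. A sketch of one way to read such a file with the standard library follows, using `json.JSONDecoder.raw_decode` to consume one object at a time. The helper names are hypothetical, and the sample keeps only the two timestamp fields from the records above.

```python
import json
from datetime import datetime


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def elapsed_seconds(record):
    """Duration of one benchmark run, from its ISO 8601 timestamps."""
    start = datetime.fromisoformat(record["start_time"])
    finish = datetime.fromisoformat(record["finish_time"])
    return (finish - start).total_seconds()


# Timestamps copied from the two records in the diff; other fields omitted
sample = """
{
    "start_time": "2023-10-30T20:06:34.197284",
    "finish_time": "2023-10-30T20:06:34.495414"
}
{
    "start_time": "2023-10-31T14:34:26.463150",
    "finish_time": "2023-10-31T14:34:32.552876"
}
"""

records = list(iter_json_objects(sample))
durations = [elapsed_seconds(record) for record in records]
print(durations)
```

A plain `json.load` on the whole file would stop with an "Extra data" error at the second object, which is why the objects are decoded one by one.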
4 changes: 2 additions & 2 deletions bin/README.md
@@ -6,7 +6,7 @@ This assumes:

* you are running WSL
* you are running a dockerized version of Postgresql 15 under WSL
* you are running louis-db in a DevContainer under Visual Studio Code
* you are running ailab-db in a DevContainer under Visual Studio Code
* your source is on WSL under ~/src

## configuration
@@ -61,4 +61,4 @@ validate manually that schema is as expected here (dbBeaver ERD diagram) before

```
./bin/load-versioned-data.sh louis_v005
```
```
3 changes: 1 addition & 2 deletions bin/deprecated/migrate.sh
@@ -1,4 +1,3 @@

set -x

SCHEMA_CHANGES=sql/schema2.sql
@@ -34,4 +33,4 @@ if [ "$?" -eq 0 ]; then
else
echo "update schema failed"
exit 1
fi
fi
7 changes: 7 additions & 0 deletions bin/generate-qna.sh
@@ -0,0 +1,7 @@
#!/bin/bash
DIRNAME=$(dirname "$0")
. "$DIRNAME"/lib.sh

PROMPT_PATH=$PROJECT_DIR/ailab/db/finesse/prompt

PYTHONPATH=$PROJECT_DIR python "$DIRNAME"/generate_qna.py "$PROMPT_PATH" --storage_path "../qna-test"
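
Note on the command line used above: the script is called with the prompt directory as a positional argument and `--storage_path` as an option. `generate_qna.py` is not shown in the loaded part of the diff, so the parser below is only a sketch of an `argparse` setup that would accept this invocation. The function name, help texts, and the default for `--storage_path` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser matching the arguments passed by bin/generate-qna.sh."""
    parser = argparse.ArgumentParser(
        description="Generate Q&A pairs from random chunks (sketch)."
    )
    # Positional argument: directory holding the prompt and template files
    parser.add_argument("prompt_path", help="directory containing the prompt files")
    # Optional argument: where generated Q&A files are written (default is assumed)
    parser.add_argument(
        "--storage_path",
        default=".",
        help="directory where generated files are stored",
    )
    return parser


# Same argument shape as the shell script, with a placeholder prompt directory
args = build_parser().parse_args(["prompts", "--storage_path", "../qna-test"])
print(args.prompt_path, args.storage_path)
```

Passing an explicit list to `parse_args` keeps the parser testable without touching `sys.argv`.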