Search functions return wrong content in ailab-db #91

Open
k-allagbe opened this issue Mar 27, 2024 · 0 comments

Description

Search functions in ailab-db now return unexpected values in the content field: a short fragment such as "2004" instead of the chunk's text.

import json
import sys
from pprint import pprint

# FINESSE_JSON_PARSED_WEIGHTS is assumed to be defined elsewhere in the module.


def search(cursor, query_embedding):
    """Search matching documents for a given query and return a list of dicts."""
    data = {
        # The query text is taken from the command-line arguments.
        'text': ' '.join(sys.argv[1:]),
        'query_embedding': query_embedding,
        'match_threshold': 0.5,
        'match_count': 10,
        'weights': json.dumps(FINESSE_JSON_PARSED_WEIGHTS)
    }

    cursor.execute("""
        SELECT *
        FROM search(%(text)s, %(query_embedding)s::vector, %(match_threshold)s,
                   %(match_count)s, %(weights)s::JSONB)
    """, data)
    # A single row comes back; its "search" column holds the list of results.
    results = cursor.fetchall()
    pprint(results)  # debug output, shown below
    # Convert to a list of dicts now to preserve dictionaries.
    return [dict(r) for r in results[0]["search"]]

Printed value for the query "who's the president?":

[
    {
        "search": [
            {
                "content": "2004",
                "id": "1bc6def7-339e-49cf-996c-8559c8f074b0",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43899683490982105,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08711416212970638,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/salubrite-des-aliments/chemin-parcouru/fra/1655235733366/1655496070310",
            },
            {
                "content": "2004",
                "id": "9a8ba37c-8b15-4b3f-9f5d-49ca50f92d35",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43874722259501314,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08586610055566671,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/sante-des-vegetaux/chemin-parcouru/fra/1655235733366/1655496024138",
            }
        ]
    }
]
  1. The content field is expected to contain the chunk's full text or a relevant snippet, not "2004".
  2. The subtitle values also look wrong, although I'm not sure what that field is supposed to contain.
  3. I suggest we receive the list of results directly (the value of the search field) instead of the wrapped form [ { "search": [...] } ].
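
Until point 3 is addressed on the database side, the wrapped form can be flattened on the client. The sketch below is only an illustration: `unwrap_search_results` is a hypothetical helper name, and the handling of an empty result set or a NULL `search` column is an assumption about how the database function might behave, not something confirmed here.

```python
def unwrap_search_results(rows):
    """Flatten [{"search": [...]}] into the inner list of result dicts."""
    if not rows:
        # No row at all: treat as "no results" (assumption).
        return []
    inner = rows[0]["search"]
    # The search column could be NULL when nothing matches (assumption).
    return [dict(r) for r in (inner or [])]


# Shape taken from the printed value above, trimmed to two fields.
wrapped = [{"search": [{"id": "a", "content": "2004"},
                       {"id": "b", "content": "2004"}]}]
flat = unwrap_search_results(wrapped)
print(flat)
```

Unlike `results[0]["search"]` in the snippet above, this does not raise an IndexError when the query returns no rows.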

For comparison, this is what we receive from azure-db:

[
    {
        "content": "Kochhar\n   \n \n  CFIA <strong>President</strong>\n Dr.",
        "id": "ZGI2M2VmNjktNTVmMC00ODQ2LThlZWItZDljYTYwZDMwNTI10",
        "last_updated": "2023-04-18T00:00:00Z",
        "score": 8.465198,
        "title": "Dr. Harpreet S. Kochhar - Canadian Food Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/organizational-structure/cfia-president/eng/1681496883837/1681496884212",
    },
    {
        "content": "<strong>The</strong> CFIA is headed by a "
        "<strong>President</strong>, who has <strong>the</strong> rank "
        "and all <strong>the</strong> powers of a Deputy Head of a "
        "Department.",
        "id": "ZDg1YzNlMjEtMzhjMS00NTQ2LWFhYTEtM2ZjOWUyOTZmZWFm0",
        "last_updated": "2017-08-28T00:00:00Z",
        "score": 8.322858,
        "title": "Canadian Food Inspection Agency (CFIA) - Quarterly Financial "
        "Report (QFR) for the Quarter ended June 30, 2017 - Canadian Food "
        "Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/transparency/corporate-management-reporting/reports-to-parliament/financial-reporting/quarter-ended-june-30-2017/eng/1502989987656/1502989988316",
    },
]

Acceptance criteria

  • Full text or relevant snippet is received in the content field
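
One way to check this criterion automatically is a small heuristic over each result. This is a rough sketch, not an agreed test: `has_plausible_content` is a hypothetical name, and the 20-character minimum is an arbitrary assumption chosen only to separate fragments like "2004" from real snippets.

```python
def has_plausible_content(result, min_length=20):
    """Return True if the content field looks like real text, not a stray fragment."""
    content = (result.get("content") or "").strip()
    subtitle = (result.get("subtitle") or "").strip()
    if len(content) < min_length:
        # Too short to be a chunk or snippet (threshold is an assumption).
        return False
    if subtitle and content == subtitle:
        # Content merely repeats the subtitle, as in the ailab-db output.
        return False
    return True


# Values modelled on the two outputs quoted in this issue.
ailab_result = {"content": "2004", "subtitle": "2004"}
azure_result = {"content": "Kochhar\n   \n \n  CFIA <strong>President</strong>\n Dr."}

print(has_plausible_content(ailab_result))
print(has_plausible_content(azure_result))
```

With these inputs the ailab-db style result is rejected and the azure-db style result is accepted.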
@github-project-automation github-project-automation bot moved this to Todo in Database Mar 27, 2024
@k-allagbe k-allagbe changed the title Search functions return wrong content in ailab-db Search functions return wrong content in ailab-db Mar 27, 2024
@k-allagbe k-allagbe self-assigned this Apr 2, 2024
@k-allagbe k-allagbe added this to Finesse Apr 2, 2024
@k-allagbe k-allagbe moved this to Todo in Finesse Apr 2, 2024