Issue #79, Testing Current Score #84

Draft
wants to merge 2 commits into
base: main

Conversation

JolanThomassin
Contributor

@JolanThomassin JolanThomassin commented Mar 14, 2024

issue #79

Test our schema scores to check that they are present and accurate.

Schema tested: Louis_v005

  • Pick a list of random chunks/crawls

Test our score weights

  • Similarity
  • Recency
  • Traffic
  • Current
  • Typicality

Evaluation Criteria

Similarity: ?
Recency: Compare dates and scores to ensure alignment.
Traffic: ?
Current: Confirm that archived documents receive a score of 0.
Typicality: Compare the average number of site references within the dataset and verify if documents with more references receive higher scores.
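
As a rough illustration of the Current, Recency, and Typicality criteria above, here is a minimal sketch in Python. The row shapes, function names, and sample values are all hypothetical and not taken from Louis_v005; it only assumes each check can be reduced to tuples fetched from the database. A strict "never decreases" test is probably too rigid for real data, where a tolerance or a correlation measure may be a better fit.

```python
from datetime import date


def archived_with_nonzero_current(rows):
    """Return ids of archived documents whose current score is not 0.

    rows: iterable of (doc_id, is_archived, current_score) tuples.
    """
    return [doc_id for doc_id, is_archived, score in rows if is_archived and score != 0]


def score_never_drops(pairs):
    """True if the score never decreases as the key (date, reference count) grows.

    pairs: iterable of (key, score) tuples. Sorting on the full tuple orders
    equal keys by score, so ties in the key cannot cause a false failure.
    """
    ordered = sorted(pairs)
    return all(a[1] <= b[1] for a, b in zip(ordered, ordered[1:]))


# Hypothetical sample rows, not taken from Louis_v005.
current_rows = [(1, True, 0.0), (2, True, 0.4), (3, False, 1.0)]
recency_pairs = [(date(2020, 1, 1), 0.1), (date(2024, 1, 1), 0.9)]
typicality_pairs = [(1, 0.9), (5, 0.2)]  # more references, lower score

print(archived_with_nonzero_current(current_rows))  # [2]
print(score_never_drops(recency_pairs))             # True
print(score_never_drops(typicality_pairs))          # False
```

An empty list from the first function would mean every archived document in the sample has a Current score of 0.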

@JolanThomassin JolanThomassin added this to the louis_v005 milestone Mar 14, 2024
@JolanThomassin JolanThomassin self-assigned this Mar 14, 2024
@JolanThomassin JolanThomassin linked an issue Mar 14, 2024 that may be closed by this pull request
6 tasks
"""

cursor.execute(query)
return cursor.fetchall()
Contributor

Missing newline at end of file.

Contributor

(this should be validated by the repo standards check)

WITH random_crawl AS (
SELECT id
FROM crawl
ORDER BY
Contributor

this came up before; ordering the table randomly is expensive, because it means fetching every row and then picking one.

it's cheaper to pick a random number between 0 and count(*) - 1 and use it as the OFFSET, combined with LIMIT 1:

https://www.postgresql.org/docs/current/queries-limit.html
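
A minimal sketch of the random-offset suggestion, in Python. It uses an in-memory SQLite table as a stand-in for the real PostgreSQL `crawl` table, and the function name `pick_random_crawl_id` is hypothetical. SQLite takes `?` placeholders, whereas psycopg uses `%s`, so the query would need that adjustment. Note that `count(*)` and `OFFSET` still read rows, so this avoids sorting the whole table in random order rather than avoiding all scanning.

```python
import random
import sqlite3


def pick_random_crawl_id(conn, rng=random):
    """Pick one crawl id via a random OFFSET instead of a random ORDER BY."""
    (total,) = conn.execute("SELECT count(*) FROM crawl").fetchone()
    if total == 0:
        return None  # nothing to pick from
    # Valid offsets run from 0 to total - 1; OFFSET total would return no row.
    offset = rng.randrange(total)
    row = conn.execute(
        "SELECT id FROM crawl ORDER BY id LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    return row[0]


# Stand-in table with 100 rows (assumption: the real table has an id column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawl (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO crawl (id) VALUES (?)", [(i,) for i in range(1, 101)])

picked = pick_random_crawl_id(conn, random.Random(0))
print(picked)
```

The `ORDER BY id` keeps the offset deterministic for a given table state; without it the database is free to return rows in any order.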

Contributor Author

Yes, you told me about that. When I copy-pasted the query, I forgot to replace ORDER BY with OFFSET. I will change it soon.

@rngadam
Contributor

rngadam commented Mar 21, 2024

  • the issue reference (issue #79) should be in the description to make it clickable (edited)
  • this PR is open but has no reviewers; did you mean to set it to Draft?
  • the branch name does not follow our naming conventions: issue#79-testing-current-score

https://github.com/ai-cfia/.github/blob/main/profile/CONTRIBUTING.md

@JolanThomassin
Contributor Author

Yes, sorry, I forgot to mark it as a draft. I'm working on different PRs at the same time for testing purposes, so everything will probably change.

@JolanThomassin JolanThomassin marked this pull request as draft March 21, 2024 17:54
@JolanThomassin JolanThomassin removed their assignment Jun 11, 2024
Labels
None yet
Projects
Status: Done
Archived in project
Development

Successfully merging this pull request may close these issues.

Testing current Score in our Schema
2 participants