Issue #79, Testing Current Score #84
base: main
""" | ||

cursor.execute(query)
return cursor.fetchall()
Missing newline at end of file.
(this should be validated by the repo standards check)
WITH random_crawl AS (
SELECT id
FROM crawl
ORDER BY
This came up before: ordering the table randomly is expensive, because it means fetching every row and then picking just one.
It's cheaper to pick a row number between 1 and count(*) and use that with OFFSET combined with LIMIT (OFFSET is zero-based, so row n corresponds to OFFSET n - 1).
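A minimal sketch of the OFFSET approach described above, not the code from this PR. It uses Python's built-in `sqlite3` with a stand-in `crawl` table so it runs on its own; the helper name `fetch_random_crawl_id`, the integer `id` column, and the sample URLs are assumptions for illustration. With a PostgreSQL driver such as psycopg the placeholder would be `%s` rather than `?`.

```python
import random
import sqlite3


def fetch_random_crawl_id(cursor, rng=random):
    """Return the id of one randomly chosen crawl row, or None if the table is empty."""
    cursor.execute("SELECT COUNT(*) FROM crawl")
    (row_count,) = cursor.fetchone()
    if row_count == 0:
        return None
    # OFFSET is zero-based, so valid offsets run from 0 to row_count - 1.
    offset = rng.randrange(row_count)
    # ORDER BY id keeps the row order stable so the offset is well defined.
    cursor.execute(
        "SELECT id FROM crawl ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    )
    row = cursor.fetchone()
    return row[0] if row else None


# Demo against an in-memory database with a hypothetical crawl table.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE crawl (id INTEGER PRIMARY KEY, url TEXT)")
cursor.executemany(
    "INSERT INTO crawl (id, url) VALUES (?, ?)",
    [(i, f"https://example.com/{i}") for i in range(1, 6)],
)
picked = fetch_random_crawl_id(cursor)
print(picked)
```

The count and the lookup are two separate statements, so a row deleted in between could leave the offset past the end of the table; the sketch returns `None` in that case rather than raising.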
Yes, you told me about that. When I copy-pasted the query, I forgot to replace ORDER with OFFSET. It will be changed soon.
https://github.com/ai-cfia/.github/blob/main/profile/CONTRIBUTING.md
Yes, sorry, I forgot to mark it as a draft. I'm working on several PRs at the same time for testing purposes, so everything is probably going to change.
issue #79
Test our schema scores to check that they are present and accurate.
Schema tested: Louis_v005
Test our score weights
Evaluation Criteria
Similarity: ?
Recency: Compare dates and scores to ensure alignment.
Traffic: ?
Current: Confirm that archived documents receive a score of 0.
Typicality: Compare the average number of site references within the dataset and verify if documents with more references receive higher scores.
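The "Current" criterion above could be checked with something like the sketch below. It is only an illustration: the row shape (`id`, `archived`, `current`) and the helper name `find_current_score_violations` are hypothetical and are not taken from the Louis_v005 schema, and the sample rows are made up.

```python
def find_current_score_violations(rows):
    """Return ids of archived documents whose current score is not 0."""
    return [
        row["id"]
        for row in rows
        if row["archived"] and row["current"] != 0
    ]


# Hypothetical sample data; real rows would come from a query on the score table.
sample_rows = [
    {"id": "a", "archived": True, "current": 0.0},   # archived, scored 0: fine
    {"id": "b", "archived": False, "current": 1.0},  # not archived: ignored
    {"id": "c", "archived": True, "current": 0.4},   # archived, non-zero: violation
]

violations = find_current_score_violations(sample_rows)
print(violations)  # prints ['c']
```

A test for this criterion would then assert that the list of violations is empty for the real data.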