PR #244 [searching] is prone to false positives [ALSO: database integrity questions about safely/synchronously deleting "videos" from both/all DB's] #246
Comments
The challenge will likely be establishing a primary key (much like a UUID / GUID) for every video-or-similar, cleanly guaranteeing integrity across both (all 3) databases: CLARIFS:
Related:
cc: @mabuelhagag, @avni, @codewiz |
Related IDEA from @deldesir:
|
For metadata.db, according to the calibredb manual, these are the available book fields:
Could uuid serve as a unique identifier for a given book? Either way, it may help to map the relevant columns across the three databases (metadata.db, xklb-metadata.db, app.db) as a mini-specification to support solving this. Foreign keys can be used to map across tables if that is not already being done. |
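To make the mapping idea concrete, here is a minimal sketch, not the project's actual schema. It assumes the Calibre side has a `books` table with a `uuid` column, and that the xklb side could carry a hypothetical `book_uuid` column pointing back at it; the `media` table layout, the `transcript` column and both function names are made up for illustration. Two in-memory SQLite databases stand in for metadata.db and xklb-metadata.db, joined through `ATTACH DATABASE`.

```python
import sqlite3


def build_demo():
    """Create two in-memory databases standing in for metadata.db and xklb-metadata.db."""
    conn = sqlite3.connect(":memory:")
    # The second database is attached under the (hypothetical) schema name "xklb".
    conn.execute("ATTACH DATABASE ':memory:' AS xklb")
    conn.executescript(
        """
        CREATE TABLE main.books (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, title TEXT);
        -- book_uuid and transcript are assumed columns, not the real xklb schema
        CREATE TABLE xklb.media (id INTEGER PRIMARY KEY, book_uuid TEXT, transcript TEXT);

        -- Two different videos that happen to share a title
        INSERT INTO main.books VALUES (1, 'uuid-a', 'Intro');
        INSERT INTO main.books VALUES (2, 'uuid-b', 'Intro');
        INSERT INTO xklb.media VALUES (10, 'uuid-a', 'how solar panels work');
        INSERT INTO xklb.media VALUES (11, 'uuid-b', 'how water pumps work');
        """
    )
    return conn


def search_transcripts(conn, term):
    """Return (book id, title) for books whose transcript contains term, joined on uuid."""
    sql = """
        SELECT b.id, b.title
        FROM xklb.media AS m
        JOIN main.books AS b ON b.uuid = m.book_uuid
        WHERE m.transcript LIKE ?
        ORDER BY b.id
    """
    return conn.execute(sql, (f"%{term}%",)).fetchall()
```

Because the join goes through the shared key rather than the title, a search for "solar" here only brings back the first "Intro", even though both videos have the same title.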
Recap:
|
I'm seeking clarification on the problem statement, i.e., what does false positive mean, what's a use case that shows the problem? If different videos have the same title, I think that's okay right? Hypothetical use case: 2 different people in different accounts may have uploaded videos with the same title coincidentally. An IIAB user downloads those two videos into their IIAB Calibre-Web instance. When they search for a term that includes the videos' title, we want both of these videos to return. Not clear to me yet what false positive means. |
The currently proposed PR #244 does search through video transcripts, but it incorrectly returns every video that contains the same title. That's an obvious bug, and we should stay very focused on fixing it. Recall from 2-3 weeks ago (PR #244 originally from Aug 30) that poorly named variable
That's Line 982 of cps/db.py in PR #244 here: https://github.com/iiab/calibre-web/pull/244/files#diff-204c1e4c10ba05516a4a6ed88fed4a34a133e781e4db0c80a4d96cd64f0b268aR982 (hope that helps?!) [*] This hack of putting search string and video titles together into the very same list... is a fine proof-of-concept for now... but probably should not be merged! 😅 |
Agree
From what I can tell there's a chance that the search results contain the same video multiple times, which is not desired. If there are multiple videos with the same title (coincidentally) and a user is searching for that term, every unique video should be returned. You could/perhaps should also make sure the videos returned are unique (using a PK if available), which is perhaps what this is trying to address? I may be missing something though! |
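A rough sketch of the uniqueness check suggested above, assuming each result row is a dict with an `id` primary key (the row shape and the helper name `unique_by_id` are hypothetical, not taken from cps/db.py):

```python
def unique_by_id(rows):
    """Drop repeated rows by primary key, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique
```

Deduplicating on `id` rather than `title` means two different videos with the same title both stay in the results, while the same video matched twice is only listed once.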
After speaking with @holta further on Friday, separating out the search string and video titles into separate variables should resolve this so that false positives do not return! 🚀 |
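For illustration only, here is one way the difference could look. This is not the code from PR #244; the function names, the dict-based `books` rows and the matching rules are all assumptions. The first function puts the search string and the transcript-derived titles in one list and substring-matches everything; the second keeps them in separate variables, so titles are only matched exactly.

```python
def search_mixed(books, term, transcript_titles):
    """Search string and titles share one list, so titles are substring-matched too."""
    terms = [term] + list(transcript_titles)
    return [
        b["id"]
        for b in books
        if any(t.lower() in b["title"].lower() for t in terms)
    ]


def search_separate(books, term, transcript_titles):
    """Search string is substring-matched; transcript-derived titles must match exactly."""
    exact_titles = set(transcript_titles)
    return [
        b["id"]
        for b in books
        if term.lower() in b["title"].lower() or b["title"] in exact_titles
    ]


books = [
    {"id": 1, "title": "Water"},
    {"id": 2, "title": "Water Pumps Explained"},
    {"id": 3, "title": "Solar Basics"},
]
# Suppose only the transcript of video 1 ("Water") mentions "irrigation".
mixed = search_mixed(books, "irrigation", ["Water"])
separate = search_separate(books, "irrigation", ["Water"])
```

In this toy data the mixed version also returns video 2, because "Water" is treated like a user search term. Exact title matching still cannot tell apart two videos with identical titles, which is where a shared key across the databases would come in.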
The queries made to xklb-metadata.db return titles that are then used as terms to fetch video records from metadata.db. What if different videos have the same title?