Support searching videos [subtitles, captions, transcripts?] #152
base: master
Adds a function to search through subtitles in xklb-metadata.db. The video titles returned as results are used to enhance Calibre-Web's simple search.
This PR needs #140 to work. It's ready for merge.
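For readers new to the idea, here is a minimal sketch of "search subtitles, return matching video titles" using only Python's sqlite3 module. It is not the PR's code: the schema (a media(id, title) table and a captions(media_id, text) table) is a hypothetical stand-in, and the real xklb-metadata.db layout may differ. SQLite's LIKE is case-insensitive for ASCII text, so "nitrogen" also matches "Nitrogen".

```python
import sqlite3


def search_subtitles(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return distinct titles of media whose subtitle text contains term.

    Assumes a hypothetical schema media(id, title) / captions(media_id, text);
    the real xklb-metadata.db layout may differ.
    """
    # Escape LIKE wildcards so a "%" or "_" typed by the user matches literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT DISTINCT m.title "
        "FROM captions AS c JOIN media AS m ON m.id = c.media_id "
        "WHERE c.text LIKE ? ESCAPE '\\' "
        "ORDER BY m.title",
        (f"%{escaped}%",),
    ).fetchall()
    return [title for (title,) in rows]


# Tiny in-memory demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE captions (media_id INTEGER, text TEXT);
    INSERT INTO media VALUES (1, 'Soil basics'), (2, 'Traffic tips');
    INSERT INTO captions VALUES (1, 'Plants need Nitrogen to grow'),
                                (2, 'Check your mirrors');
    """
)
print(search_subtitles(conn, "nitrogen"))  # ['Soil basics']
```

Each returned title could then be passed to the existing title search, which is the role the PR gives to the subtitle lookup.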
# the search_query function below only searches for book titles
result += self.search_query(term_part, config, *join).order_by(*order).all()
# we need to remove duplicates because the same book/video could be found multiple times
result = list(set(result))
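One side effect of list(set(result)) is that a set does not keep the ordering produced by order_by(*order). A possible alternative, sketched below under the assumption that the rows are hashable (which list(set(result)) already requires), is to deduplicate with dict.fromkeys, whose keys keep insertion order. The merge_unique helper name is hypothetical, and the integers are made-up stand-ins for result rows.

```python
from collections.abc import Hashable, Iterable
from itertools import chain
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def merge_unique(batches: Iterable[Iterable[T]]) -> list[T]:
    """Concatenate result batches and drop repeats, keeping first-seen order.

    dict keys preserve insertion order (Python 3.7+), unlike a set, so the
    order_by() ordering of each batch survives deduplication.
    """
    return list(dict.fromkeys(chain.from_iterable(batches)))


# Made-up IDs standing in for the rows returned for each term_part.
batches = [[7, 3], [3, 9], [7]]
print(merge_unique(batches))  # [7, 3, 9]
```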
I'm wondering whether (a) the user's search term/query (e.g. "feelings") and (b) the lists of video/book titles whose subtitles contain that term/query can be more crisply/cleanly disambiguated.
@deldesir can you please clarify:
- Is variable result on Line 971 of cps/db.py a list of Calibre-Web book/video IDs, e.g. Calibre-Web's actual counting numbers like [1, 2, 3, 27] that appear in its web UI? i.e., is that what function search_query outputs, do you know...?

Lines 909 to 938 in a5486db

def search_query(self, term, config, *join):
    term.strip().lower()
    self.session.connection().connection.connection.create_function("lower", 1, lcase)
    q = list()
    author_terms = re.split("[, ]+", term)
    for author_term in author_terms:
        q.append(Books.authors.any(func.lower(Authors.name).ilike("%" + author_term + "%")))
    query = self.generate_linked_query(config.config_read_column, Books)
    if len(join) == 6:
        query = query.outerjoin(join[0], join[1]).outerjoin(join[2]).outerjoin(join[3], join[4]).outerjoin(join[5])
    if len(join) == 3:
        query = query.outerjoin(join[0], join[1]).outerjoin(join[2])
    elif len(join) == 2:
        query = query.outerjoin(join[0], join[1])
    elif len(join) == 1:
        query = query.outerjoin(join[0])
    cc = self.get_cc_columns(config, filter_config_custom_read=True)
    filter_expression = [Books.tags.any(func.lower(Tags.name).ilike("%" + term + "%")),
                         Books.series.any(func.lower(Series.name).ilike("%" + term + "%")),
                         Books.authors.any(and_(*q)),
                         Books.publishers.any(func.lower(Publishers.name).ilike("%" + term + "%")),
                         func.lower(Books.title).ilike("%" + term + "%")]
    for c in cc:
        if c.datatype not in ["datetime", "rating", "bool", "int", "float"]:
            filter_expression.append(
                getattr(Books, 'custom_column_' + str(c.id)).any(
                    func.lower(cc_classes[c.id].value).ilike("%" + term + "%")))
    return query.filter(self.common_filters(True)).filter(or_(*filter_expression))

- Or, maybe it's a list of some equivalent book/video pointers within SQLite?

(Please paste in an actual sample of result, as an example will be extremely useful!)
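To make the matching rule of search_query easier to follow, here is a plain-Python approximation of which books it keeps. It is a sketch, not the SQLAlchemy code: it ignores the joins, custom columns, common_filters and LIKE wildcards, and the flattened Book record is hypothetical. It does show that subtitle text is not among the searched fields, which is why the PR first looks up matching titles in the subtitles and then feeds them in as search terms.

```python
import re
from typing import TypedDict


class Book(TypedDict):
    """Hypothetical flattened book record (the real code uses SQLAlchemy models)."""
    title: str
    tags: list[str]
    series: list[str]
    authors: list[str]
    publishers: list[str]


def _ilike_contains(value: str, needle: str) -> bool:
    # Approximates ilike('%needle%'): a case-insensitive substring test
    # (LIKE wildcards such as "_" inside needle are not emulated here).
    return needle.lower() in value.lower()


def matches(book: Book, term: str) -> bool:
    # Each author term (split on commas/spaces) must appear in some author's name.
    author_terms = re.split("[, ]+", term)
    authors_ok = all(
        any(_ilike_contains(a, t) for a in book["authors"]) for t in author_terms
    )
    # search_query ORs the per-field conditions together.
    return (
        any(_ilike_contains(t, term) for t in book["tags"])
        or any(_ilike_contains(s, term) for s in book["series"])
        or authors_ok
        or any(_ilike_contains(p, term) for p in book["publishers"])
        or _ilike_contains(book["title"], term)
    )


book: Book = {"title": "Soil basics", "tags": ["Chemistry"], "series": [],
              "authors": ["Jane Doe"], "publishers": []}
print(matches(book, "chem"), matches(book, "doe jane"), matches(book, "nitrogen"))
# True True False
```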
@deldesir, are all these variables that start with cc and cc_ (in function search_query and similar functions within cps/db.py) about Calibre and/or Calibre-Web "custom columns"?
And if so, can we mostly ignore those for now?!
@deldesir, a stub was inserted at Line 961 below (to log the value of variable result):
result = self.search_query(term, config, *join).order_by(*order).all()
log.debug("***Search results***: {}".format(result))
...yielding this output from tail -f /var/log/calibre-web.log:
[2024-05-29 09:54:46,674] INFO {cps.server:268} Starting Tornado server on :8083
[2024-05-29 09:55:01,271] DEBUG {cps.db:961} ***Search results***: [(<Books('Top 5 MISTAKES Beginner Rides Make in TRAFFIC,Top 5 MISTAKES Beginner Rides Make in TRAFFICChaseontwowheels2024-05-29 13:07:08.6832822023-11-15 00:00:001.02024-05-29 13:07:08.683286Chaseontwowheels/Top 5 MISTAKES Beginner Rides Make in TRAFFIC (7)1')>, None, None)]
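A small aside on the stub: "{}".format(result) builds the whole string, including every row's repr, before log.debug is even called, so the cost is paid even when DEBUG logging is off. A common alternative, sketched below with a hypothetical log_search_results helper, is to pass result as a logging argument so the message is only formatted if the record is actually emitted. The logger name cps.db is taken from the log line above.

```python
import logging

log = logging.getLogger("cps.db")  # logger name as it appears in the log output above


def log_search_results(result: object) -> None:
    # Pass result as an argument instead of calling .format() first: logging
    # only builds the message (and each row's repr) when DEBUG is enabled.
    log.debug("***Search results***: %s", result)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log_search_results([("book 7", None, None)])
```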
The (7) in the above output definitely appears to be the book/video ID, and other aspects of the result variable (tags, series, authors, publishers, etc.!) might be more understandable thanks to:

Lines 363 to 390 in a5486db
class Books(Base):
    __tablename__ = 'books'

    DEFAULT_PUBDATE = datetime(101, 1, 1, 0, 0, 0, 0)  # ("0101-01-01 00:00:00+00:00")

    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(collation='NOCASE'), nullable=False, default='Unknown')
    sort = Column(String(collation='NOCASE'))
    author_sort = Column(String(collation='NOCASE'))
    timestamp = Column(TIMESTAMP, default=datetime.utcnow)
    pubdate = Column(TIMESTAMP, default=DEFAULT_PUBDATE)
    series_index = Column(String, nullable=False, default="1.0")
    last_modified = Column(TIMESTAMP, default=datetime.utcnow)
    path = Column(String, default="", nullable=False)
    has_cover = Column(Integer, default=0)
    uuid = Column(String)
    isbn = Column(String(collation='NOCASE'), default="")
    flags = Column(Integer, nullable=False, default=1)

    authors = relationship(Authors, secondary=books_authors_link, backref='books')
    tags = relationship(Tags, secondary=books_tags_link, backref='books', order_by="Tags.name")
    comments = relationship(Comments, backref='books')
    data = relationship(Data, backref='books')
    series = relationship(Series, secondary=books_series_link, backref='books')
    ratings = relationship(Ratings, secondary=books_ratings_link, backref='books')
    languages = relationship(Languages, secondary=books_languages_link, backref='books')
    publishers = relationship(Publishers, secondary=books_publishers_link, backref='books')
    identifiers = relationship(Identifiers, backref='books')
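Since Books.id is the integer primary key, the IDs do not have to be read off the "(7)" at the end of the path string; they can be taken straight from each row. The sketch below assumes, based on the logged "(<Books(...)>, None, None)" tuple, that every row in result has the Books object as its first element. BookStub and book_ids are hypothetical names used only to keep the example self-contained.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BookStub:
    """Hypothetical stand-in for cps.db.Books, with only the columns used here."""
    id: int
    title: str
    path: str


def book_ids(result: Sequence[Sequence[Any]]) -> list[int]:
    # Each row is assumed to look like (Books, <extra column>, <extra column>),
    # as in the logged tuple; the primary-key ID lives on the Books object.
    return [row[0].id for row in result]


title = "Top 5 MISTAKES Beginner Rides Make in TRAFFIC"
result = [(BookStub(7, title, f"Chaseontwowheels/{title} (7)"), None, None)]
print(book_ids(result))  # [7]
```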
The search feature using
Thanks @EMG70. Can you test with an actual word or term spoken in the video? This term should not be part of the title. The idea is to get the right videos by searching for something you heard in the video. This should help when you don't remember in which video you heard this specific term.
@EMG70, it looks like, per your screenshot, you forgot to include this PR (#152) in your test. To be successful, your … cf. how to test a PR.
Sorry, I mistakenly thought that running iiab-update -f would include the patch -26. I am redoing it now.
A new VM was created and PR #152 was applied. The word "nitrogen", which features in the video https://www.youtube.com/watch?v=j2vm9cq9l9Y&t=7s, was searched using the GUI search, but no matches were returned. An attempt in the Advanced Search box returned the correct result. Although Advanced Search picked up the word "nitrogen" from the video, I would have expected to see "nitrogen" in the square brackets, but the screenshot above only says "1 search result for []".
What's happening in your case is that your lb-wrapper is not updated. To fix this, run in your VM terminal:
then try searching again.
@deldesir showed me in a VM that searching (using the Calibre-Web interface) takes 2-3 seconds with PR #152, whereas it takes much less than 1 second with PR #244. This is confusing, as an extremely simple Python external call to … Mysterious! 🙃
There seems to be a slight difference in speed between PR #152 and PR #244. It is hard to measure the difference quantitatively from the front end, but it is perceptible. Here are two screen recordings in case this helps. http://192.168.64.33 has PR #152 applied.
Screen.Recording.2024-09-04.at.1.04.34.AM.mov
Screen.Recording.2024-09-04.at.1.06.23.AM.mov
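Because the difference is hard to quantify from the front end, one option is to time the two search paths directly in Python. The sketch below is a generic timing helper, not code from either PR: median_latency is a hypothetical name, and slower_search and faster_search are placeholder stand-ins for whichever functions PR #152 and PR #244 actually call. It reports the median of several runs, which is less skewed than the mean by a single slow, cold-cache run.

```python
import statistics
import time
from collections.abc import Callable


def median_latency(search: Callable[[str], object], term: str, repeats: int = 5) -> float:
    """Run search(term) `repeats` times; return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        search(term)
        samples.append(time.perf_counter() - start)
    # The median is less affected than the mean by one slow, cold-cache run.
    return statistics.median(samples)


# Hypothetical stand-ins for the two implementations being compared.
def slower_search(term: str) -> None:
    time.sleep(0.05)


def faster_search(term: str) -> str:
    return term.lower()


for name, fn in [("slower", slower_search), ("faster", faster_search)]:
    print(f"{name}: {median_latency(fn, 'nitrogen', repeats=3):.3f} s")
```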
🚀 Pull Request Overview:
With this PR, searching subtitles is now possible. The video titles returned as results are used to enhance Calibre-Web's simple search.
📋 Checklist:
📌 Testing scenario:
Related to #140
cc @EMG70