Support searching videos [subtitles, captions, transcripts?] #152

Open
wants to merge 10 commits into
base: master
15 changes: 13 additions & 2 deletions cps/db.py
@@ -45,7 +45,7 @@
from flask_babel import get_locale
from flask import flash

-from . import logger, ub, isoLanguages
+from . import logger, ub, isoLanguages, lb_search
from .pagination import Pagination

from weakref import WeakSet
@@ -957,7 +957,18 @@ def get_cc_columns(self, config, filter_config_custom_read=False):
     def get_search_results(self, term, config, offset=None, order=None, limit=None, *join):
         order = order[0] if order else [Books.sort]
         pagination = None
-        result = self.search_query(term, config, *join).order_by(*order).all()
+
+        # search also through the subtitles (for videos)
+        other_terms = lb_search.get_search_terms(term)
+        # lb_search.get_search_terms returns a list of video titles, "term" parameter is expected to be a book/video title
+        term = [term] + other_terms
holta marked this conversation as resolved.

+        result = list()
+        for term_part in term:
+            # the search_query function below only searches for books titles
+            result += self.search_query(term_part, config, *join).order_by(*order).all()
+        # we need to remove duplicates because the same book/video could be found multiple times
+        result = list(set(result))
Member

@holta holta May 28, 2024

I'm wondering whether (a) the user's search term/query (e.g. "feelings") and (b) the list of video/book titles whose subtitles contain that term/query can be more cleanly disambiguated.

@deldesir can you please clarify:

  • Is var result on Line 971 of cps/db.py a list of Calibre-Web book/video IDs — e.g. Calibre-Web's actual counting numbers like [1, 2, 3, 27] that appear in its web UI? i.e. Is that what function search_query outputs, do you know...?

    calibre-web/cps/db.py

    Lines 909 to 938 in a5486db

    def search_query(self, term, config, *join):
        term.strip().lower()
        self.session.connection().connection.connection.create_function("lower", 1, lcase)
        q = list()
        author_terms = re.split("[, ]+", term)
        for author_term in author_terms:
            q.append(Books.authors.any(func.lower(Authors.name).ilike("%" + author_term + "%")))
        query = self.generate_linked_query(config.config_read_column, Books)
        if len(join) == 6:
            query = query.outerjoin(join[0], join[1]).outerjoin(join[2]).outerjoin(join[3], join[4]).outerjoin(join[5])
        if len(join) == 3:
            query = query.outerjoin(join[0], join[1]).outerjoin(join[2])
        elif len(join) == 2:
            query = query.outerjoin(join[0], join[1])
        elif len(join) == 1:
            query = query.outerjoin(join[0])
        cc = self.get_cc_columns(config, filter_config_custom_read=True)
        filter_expression = [Books.tags.any(func.lower(Tags.name).ilike("%" + term + "%")),
                             Books.series.any(func.lower(Series.name).ilike("%" + term + "%")),
                             Books.authors.any(and_(*q)),
                             Books.publishers.any(func.lower(Publishers.name).ilike("%" + term + "%")),
                             func.lower(Books.title).ilike("%" + term + "%")]
        for c in cc:
            if c.datatype not in ["datetime", "rating", "bool", "int", "float"]:
                filter_expression.append(
                    getattr(Books,
                            'custom_column_' + str(c.id)).any(
                        func.lower(cc_classes[c.id].value).ilike("%" + term + "%")))
        return query.filter(self.common_filters(True)).filter(or_(*filter_expression))
  • Or, maybe it's a list of some equivalent book/video pointers within SQLite ?

(Please paste in an actual result sample, as an example will be extremely useful!)
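
For orientation, here is a small standalone sketch of the string handling inside the quoted search_query, using only the standard library. The helper name build_like_patterns is hypothetical and not part of Calibre-Web; it only mirrors the re.split call and the "%" + term + "%" wrapping from the lines above, and does not touch SQLAlchemy.

```python
import re


def build_like_patterns(term):
    """Hypothetical helper mirroring the string handling in search_query."""
    # Same split as the quoted code: runs of commas/spaces separate author terms.
    author_terms = re.split("[, ]+", term)
    # Each author term is wrapped in SQL LIKE wildcards.
    author_patterns = ["%" + t + "%" for t in author_terms]
    # Title, tags, series and publishers are matched against the whole term.
    whole_pattern = "%" + term + "%"
    return author_patterns, whole_pattern


authors, whole = build_like_patterns("Jane Austen, Emma")
print(authors)  # ['%Jane%', '%Austen%', '%Emma%']
print(whole)    # %Jane Austen, Emma%

# Python strings are immutable, so a bare term.strip().lower() statement
# leaves the original variable unchanged.
raw = "  Feelings "
raw.strip().lower()
print(repr(raw))  # '  Feelings '
```

Since the return value of term.strip().lower() is not assigned in the quoted code, the patterns are built from the term exactly as passed in; case-insensitivity appears to come from func.lower and ilike instead.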

Member

@holta holta May 29, 2024


@deldesir are all these variables that start with cc and cc_ about Calibre and/or Calibre-Web "custom columns" ?

(In function search_query and similar functions, within cps/db.py ?)

And if so, can we mostly ignore those for now?!

Member

@holta holta May 29, 2024


@deldesir I inserted a stub at Line 961 below (to log the value of variable result):

        result = self.search_query(term, config, *join).order_by(*order).all()
        log.debug("***Search results***: {}".format(result))

...yielding tail -f /var/log/calibre-web.log output:

[2024-05-29 09:54:46,674] INFO {cps.server:268} Starting Tornado server on :8083
[2024-05-29 09:55:01,271] DEBUG {cps.db:961} Search results: [(<Books('Top 5 MISTAKES Beginner Rides Make in TRAFFIC,Top 5 MISTAKES Beginner Rides Make in TRAFFICChaseontwowheels2024-05-29 13:07:08.6832822023-11-15 00:00:001.02024-05-29 13:07:08.683286Chaseontwowheels/Top 5 MISTAKES Beginner Rides Make in TRAFFIC (7)1')>, None, None)]

Member

@holta holta May 29, 2024


The (7) in the above output definitely appears to be the book/video ID, and other aspects of the result variable (tags, series, authors, publishers, ETC!) might be more understandable thanks to:

calibre-web/cps/db.py

Lines 363 to 390 in a5486db

class Books(Base):
    __tablename__ = 'books'
    DEFAULT_PUBDATE = datetime(101, 1, 1, 0, 0, 0, 0) # ("0101-01-01 00:00:00+00:00")
    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(collation='NOCASE'), nullable=False, default='Unknown')
    sort = Column(String(collation='NOCASE'))
    author_sort = Column(String(collation='NOCASE'))
    timestamp = Column(TIMESTAMP, default=datetime.utcnow)
    pubdate = Column(TIMESTAMP, default=DEFAULT_PUBDATE)
    series_index = Column(String, nullable=False, default="1.0")
    last_modified = Column(TIMESTAMP, default=datetime.utcnow)
    path = Column(String, default="", nullable=False)
    has_cover = Column(Integer, default=0)
    uuid = Column(String)
    isbn = Column(String(collation='NOCASE'), default="")
    flags = Column(Integer, nullable=False, default=1)
    authors = relationship(Authors, secondary=books_authors_link, backref='books')
    tags = relationship(Tags, secondary=books_tags_link, backref='books', order_by="Tags.name")
    comments = relationship(Comments, backref='books')
    data = relationship(Data, backref='books')
    series = relationship(Series, secondary=books_series_link, backref='books')
    ratings = relationship(Ratings, secondary=books_ratings_link, backref='books')
    languages = relationship(Languages, secondary=books_languages_link, backref='books')
    publishers = relationship(Publishers, secondary=books_publishers_link, backref='books')
    identifiers = relationship(Identifiers, backref='books')
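
One thing worth noting about the proposed result = list(set(result)): a Python set does not preserve insertion order, so the ordering produced by order_by may be lost after de-duplication. A minimal sketch of an order-preserving alternative follows, assuming each row has the shape seen in the log output above (a tuple whose first element is a Books object with an id). FakeBook and dedupe_by_book_id are hypothetical names used only for this illustration, and the second title is made up.

```python
from collections import namedtuple

# Stand-in for the ORM object; only the id and title columns are modelled.
FakeBook = namedtuple("FakeBook", ["id", "title"])


def dedupe_by_book_id(rows):
    """Drop repeated rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        book_id = row[0].id  # first tuple element is the Books object
        if book_id not in seen:
            seen.add(book_id)
            unique.append(row)
    return unique


rows = [
    (FakeBook(7, "Top 5 MISTAKES Beginner Rides Make in TRAFFIC"), None, None),
    (FakeBook(3, "Hypothetical second video"), None, None),
    (FakeBook(7, "Top 5 MISTAKES Beginner Rides Make in TRAFFIC"), None, None),
]
unique = dedupe_by_book_id(rows)
print([row[0].id for row in unique])  # [7, 3]
```

Keying on the book ID rather than on the whole row also avoids relying on how the ORM rows hash and compare, which this sketch does not attempt to verify.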

         result_count = len(result)
         if offset != None and limit != None:
             offset = int(offset)
31 changes: 31 additions & 0 deletions cps/lb_search.py
@@ -0,0 +1,31 @@
+import os
+import re
+
+from . import logger
+from .constants import XKLB_DB_FILE
+from .subproc_wrapper import process_open
+
+log = logger.create()
+
+def get_search_terms(term):
+    """Perform a search against xklb-metadata.db"""
+    video_titles = []
+    lb_executable = os.getenv("LB_WRAPPER", "lb-wrapper")
+
+    if term:
+        subprocess_args = [lb_executable, "search", term]
+        log.debug("Executing: %s", subprocess_args)
+
+        try:
+            p = process_open(subprocess_args, newlines=True)
+            stdout, stderr = p.communicate()
+            if p.returncode != 0:
+                log.error("Error executing lb-wrapper: %s", stderr)
+                return video_titles
+            pattern = r"^[^\d\n].*?(?= - )"
+            matches = re.findall(pattern, stdout, re.MULTILINE)
+            video_titles.extend(matches)
+        except Exception as ex:
+            log.error("Error executing lb-wrapper: %s", ex)
+
+    return video_titles
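
To make the title-extraction step easier to review, here is a minimal sketch that runs the same regular expression over a made-up stdout string. The sample text is an assumption: the actual lb-wrapper output format is not shown in this PR, so the lines below are only meant to exercise the pattern.

```python
import re

# Hypothetical lb-wrapper output; the real format is not shown in this PR.
sample_stdout = (
    "Top 5 MISTAKES Beginner Rides Make in TRAFFIC - first caption hit\n"
    "12:34 a line starting with a digit is skipped\n"
    "Hypothetical Second Video - another caption hit\n"
    "a line without the separator is skipped too\n"
)

# Same pattern as in get_search_terms: a line that does not start with a
# digit, captured lazily up to (but not including) the first " - ".
pattern = r"^[^\d\n].*?(?= - )"
titles = re.findall(pattern, sample_stdout, re.MULTILINE)
print(titles)
# ['Top 5 MISTAKES Beginner Rides Make in TRAFFIC', 'Hypothetical Second Video']
```

Two consequences follow directly from the pattern: a title that itself begins with a digit would not be captured, and a title containing " - " would be cut at its first occurrence.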