More efficient search #89

jstone-dev · 2023-10-16T20:34:36Z

Full-text searches are currently implemented using wildcards; this approach is slow, although it may be faster than the Django version was. Full-text search should be made more efficient.

PostgreSQL's text search capabilities are likely to be sufficient. A separate full-text search engine like Elasticsearch is also a possibility.

Two challenges are anticipated:

Integrating full-text search oriented toward natural languages with substring search. The latter is likely to be needed for many use cases, unless the full-text search engine can be configured to support it.
Running appropriate queries using SQLAlchemy.

nickzoic · 2024-04-09T01:38:40Z

Searching from the homepage is done using the endpoint /api/v1/score-sets/search which mostly comes down to this query in src/mavedb/lib/score_sets.py:

lower_search_text = search.text.lower()
        query = query.filter(
            or_(
                ScoreSet.urn.icontains(lower_search_text),
                ScoreSet.title.icontains(lower_search_text),
                ScoreSet.short_description.icontains(lower_search_text),
                ScoreSet.abstract_text.icontains(lower_search_text),
                ScoreSet.target_genes.any(func.lower(TargetGene.name).icontains(lower_search_text)),
                ScoreSet.target_genes.any(func.lower(TargetGene.category).icontains(lower_search_text)),
                ScoreSet.keyword_objs.any(func.lower(Keyword.text).icontains(lower_search_text)),
                ScoreSet.target_genes.any(
                    TargetGene.target_sequence.has(
                        TargetSequence.reference.has(
                            func.lower(ReferenceGenome.organism_name).icontains(lower_search_text)
                        )
                    )
                ),
                # TODO(#94): add UNIPROT, ENSEMBL, REFSEQ, LICENSE, plus TAX_ID if numeric
                ScoreSet.publication_identifiers.any(
                    func.lower(PublicationIdentifier.identifier).icontains(lower_search_text)
                ),
                ScoreSet.publication_identifiers.any(
                    func.lower(PublicationIdentifier.abstract).icontains(lower_search_text)
                ),
                ScoreSet.publication_identifiers.any(
                    func.lower(PublicationIdentifier.title).icontains(lower_search_text)
                ),
                ScoreSet.publication_identifiers.any(
                    func.lower(PublicationIdentifier.publication_journal).icontains(lower_search_text)
                ),
                ScoreSet.publication_identifiers.any(
                    func.jsonb_path_exists(
                        PublicationIdentifier.authors, f"""$[*].name ? (@ like_regex "{lower_search_text}" flag "i")"""
                    )
                ),
            )
        )

which, as a massive OR of ILIKE clauses over multiple tables is not indexable and thus not efficient as our number of score sets increases. Experiment search has the same issue but fewer clauses. Results can take 3-5 seconds to return.

The links on the homepage eg: https://www.mavedb.org/#/search?search=Homo+sapiens also use score set search and so are very slow.

There's a few approaches possible here:

Caching would be particularly effective for comment searches such as the link on the homepage and maybe for really popular datasets but not very effective for arbitrary queries.
Speeding up that query via a GIN or GIST index on a generated column would be more complicated but also more generally applicable.

just as a little example, at an SQL level, something like:

alter table "scoresets" add column "text_search" tsvector generated always as (to_tsvector('english', title || ' ' || short_description || ' ' || abstract_text)) stored;

create index on scoresets using gin (text_search);

select id from scoresets where text_search @@ to_tsquery('score');

results in a fast bitmap heap scan instead of a slow sequential scan.

A materialized view is might be even more efficient as we could include the other tables, eg something like

create materialized view scoreset_search as select S.id, to_tsvector(S.title || ' ' || S.short_description || ' ' || S.abstract_text || ' ' || string_agg(G.name, ' ')) as text from scoresets S join target_genes G on G.scoreset_id = S.id group by S.id;
create index scoreset_search_idx on scoreset_search using gin (text);
select id from scoreset_search where text @@ to_tsquery('score');

we could also include whichever fields are needed in the search response to avoid joining back to scoresets. The downside is we'd have to periodically refresh materialized view scoreset_search and that changes to the contents of the table(s) would take time to be reflected in the materialized view so there's some corner cases there.

nickzoic · 2024-10-07T23:07:15Z

(one year later) Perhaps a combination of full text indexing w/ GIN index and some hooks in the API level to denormalize genome name and whatever else into field(s) on scoresets: that would get us the multi-table-spanning search without having to mess around with materialized views and should end up being very fast.

Related: #94
Related: VariantEffect/mavedb-ui#157

nickzoic · 2024-10-15T02:59:37Z

At an SQL level, a query for Homo sapiens ends up with:

SELECT experiments.id, experiments.urn, experiments.title, experiments.short_description, experiments.abstract_text, experiments.method_text, experiments.extra_metadata, experiments.private, experiments.approved, experiments.published_date, experiments.processing_state, experiments.num_scoresets, experiments.experiment_set_id, experiments.created_by_id, experiments.modified_by_id, experiments.creation_date, experiments.modification_date, scoresets.id AS id_1, scoresets.urn AS urn_1, scoresets.title AS title_1, scoresets.method_text AS method_text_1, scoresets.abstract_text AS abstract_text_1, scoresets.short_description AS short_description_1, scoresets.extra_metadata AS extra_metadata_1, scoresets.dataset_columns, scoresets.external_links, scoresets.normalised, scoresets.private AS private_1, scoresets.approved AS approved_1, scoresets.published_date AS published_date_1, scoresets.processing_state AS processing_state_1, scoresets.processing_errors, scoresets.data_usage_policy, scoresets.num_variants, scoresets.experiment_id, scoresets.licence_id, scoresets.replaces_id, scoresets.created_by_id AS created_by_id_1, scoresets.modified_by_id AS modified_by_id_1, scoresets.creation_date AS creation_date_1, scoresets.modification_date AS modification_date_1, experiment_sets_1.id AS id_2, experiment_sets_1.urn AS urn_2, experiment_sets_1.extra_metadata AS extra_metadata_2, experiment_sets_1.private AS private_2, experiment_sets_1.approved AS approved_2, experiment_sets_1.published_date AS published_date_2, experiment_sets_1.processing_state AS processing_state_2, experiment_sets_1.num_experiments, experiment_sets_1.created_by_id AS created_by_id_2, experiment_sets_1.modified_by_id AS modified_by_id_2, experiment_sets_1.creation_date AS creation_date_2, experiment_sets_1.modification_date AS modification_date_2, users_1.id AS id_3, users_1.username, users_1.first_name, users_1.last_name, users_1.email, users_1.is_superuser, users_1.is_staff, users_1.is_active, users_1.date_joined, users_1.is_first_login, users_1.last_login, users_2.id AS id_4, users_2.username AS username_1, users_2.first_name AS first_name_1, users_2.last_name AS last_name_1, users_2.email AS email_1, users_2.is_superuser AS is_superuser_1, users_2.is_staff AS is_staff_1, users_2.is_active AS is_active_1, users_2.date_joined AS date_joined_1, users_2.is_first_login AS is_first_login_1, users_2.last_login AS last_login_1, keywords_1.id AS id_5, keywords_1.text, keywords_1.creation_date AS creation_date_3, keywords_1.modification_date AS modification_date_3, doi_identifiers_1.id AS id_6, doi_identifiers_1.identifier, doi_identifiers_1.db_name, doi_identifiers_1.db_version, doi_identifiers_1.url, doi_identifiers_1.creation_date AS creation_date_4, doi_identifiers_1.modification_date AS modification_date_4, publication_identifiers_1.id AS id_7, publication_identifiers_1.identifier AS identifier_1, publication_identifiers_1.db_name AS db_name_1, publication_identifiers_1.db_version AS db_version_1, publication_identifiers_1.title AS title_2, publication_identifiers_1.abstract, publication_identifiers_1.authors, publication_identifiers_1.doi, publication_identifiers_1.publication_year, publication_identifiers_1.publication_journal, publication_identifiers_1.url AS url_1, publication_identifiers_1.reference_html, publication_identifiers_1.creation_date AS creation_date_5, publication_identifiers_1.modification_date AS modification_date_5, experiment_publication_identifiers_1.experiment_id AS experiment_id_1, experiment_publication_identifiers_1.publication_identifier_id, experiment_publication_identifiers_1."primary", sra_identifiers_1.id AS id_8, sra_identifiers_1.identifier AS identifier_2, sra_identifiers_1.db_name AS db_name_2, sra_identifiers_1.db_version AS db_version_2, sra_identifiers_1.url AS url_2, sra_identifiers_1.creation_date AS creation_date_6, sra_identifiers_1.modification_date AS modification_date_6, licenses_1.id AS id_9, licenses_1.long_name, licenses_1.short_name, licenses_1.text AS text_1, licenses_1.link, licenses_1.version, licenses_1.creation_date AS creation_date_7, licenses_1.modification_date AS modification_date_7, keywords_2.id AS id_10, keywords_2.text AS text_2, keywords_2.creation_date AS creation_date_8, keywords_2.modification_date AS modification_date_8, doi_identifiers_2.id AS id_11, doi_identifiers_2.identifier AS identifier_3, doi_identifiers_2.db_name AS db_name_3, doi_identifiers_2.db_version AS db_version_3, doi_identifiers_2.url AS url_3, doi_identifiers_2.creation_date AS creation_date_9, doi_identifiers_2.modification_date AS modification_date_9, publication_identifiers_2.id AS id_12, publication_identifiers_2.identifier AS identifier_4, publication_identifiers_2.db_name AS db_name_4, publication_identifiers_2.db_version AS db_version_4, publication_identifiers_2.title AS title_3, publication_identifiers_2.abstract AS abstract_1, publication_identifiers_2.authors AS authors_1, publication_identifiers_2.doi AS doi_1, publication_identifiers_2.publication_year AS publication_year_1, publication_identifiers_2.publication_journal AS publication_journal_1, publication_identifiers_2.url AS url_4, publication_identifiers_2.reference_html AS reference_html_1, publication_identifiers_2.creation_date AS creation_date_10, publication_identifiers_2.modification_date AS modification_date_10, scoreset_publication_identifiers_1.scoreset_id, scoreset_publication_identifiers_1.publication_identifier_id AS publication_identifier_id_1, scoreset_publication_identifiers_1."primary" AS primary_1, taxonomies_1.id AS id_13, taxonomies_1.tax_id, taxonomies_1.organism_name, taxonomies_1.common_name, taxonomies_1.rank, taxonomies_1.has_described_species_name, taxonomies_1.url AS url_5, taxonomies_1.article_reference, taxonomies_1.genome_identifier_id, taxonomies_1.creation_date AS creation_date_11, taxonomies_1.modification_date AS modification_date_11, target_sequences_1.id AS id_14, target_sequences_1.sequence_type, target_sequences_1.sequence, target_sequences_1.label, target_sequences_1.taxonomy_id, target_sequences_1.creation_date AS creation_date_12, target_sequences_1.modification_date AS modification_date_12, target_accessions_1.id AS id_15, target_accessions_1.assembly, target_accessions_1.accession, target_accessions_1.gene, target_accessions_1.creation_date AS creation_date_13, target_accessions_1.modification_date AS modification_date_13, ensembl_identifiers_1.id AS id_16, ensembl_identifiers_1.identifier AS identifier_5, ensembl_identifiers_1.db_name AS db_name_5, ensembl_identifiers_1.db_version AS db_version_5, ensembl_identifiers_1.url AS url_6, ensembl_identifiers_1.creation_date AS creation_date_14, ensembl_identifiers_1.modification_date AS modification_date_14, ensembl_offsets_1.identifier_id, ensembl_offsets_1.target_gene_id, ensembl_offsets_1."offset", refseq_identifiers_1.id AS id_17, refseq_identifiers_1.identifier AS identifier_6, refseq_identifiers_1.db_name AS db_name_6, refseq_identifiers_1.db_version AS db_version_6, refseq_identifiers_1.url AS url_7, refseq_identifiers_1.reference_html AS reference_html_2, refseq_identifiers_1.creation_date AS creation_date_15, refseq_identifiers_1.modification_date AS modification_date_15, refseq_offsets_1.identifier_id AS identifier_id_1, refseq_offsets_1.target_gene_id AS target_gene_id_1, refseq_offsets_1."offset" AS offset_1, uniprot_identifiers_1.id AS id_18, uniprot_identifiers_1.identifier AS identifier_7, uniprot_identifiers_1.db_name AS db_name_7, uniprot_identifiers_1.db_version AS db_version_7, uniprot_identifiers_1.url AS url_8, uniprot_identifiers_1.reference_html AS reference_html_3, uniprot_identifiers_1.creation_date AS creation_date_16, uniprot_identifiers_1.modification_date AS modification_date_16, uniprot_offsets_1.identifier_id AS identifier_id_2, uniprot_offsets_1.target_gene_id AS target_gene_id_2, uniprot_offsets_1."offset" AS offset_2, target_genes_1.id AS id_19, target_genes_1.name, target_genes_1.category, target_genes_1.scoreset_id AS scoreset_id_1, target_genes_1.target_sequence_id, target_genes_1.accession_id, target_genes_1.creation_date AS creation_date_17, target_genes_1.modification_date AS modification_date_17 
FROM scoresets JOIN experiments ON experiments.id = scoresets.experiment_id LEFT OUTER JOIN experiment_sets AS experiment_sets_1 ON experiment_sets_1.id = experiments.experiment_set_id LEFT OUTER JOIN users AS users_1 ON users_1.id = experiments.created_by_id LEFT OUTER JOIN users AS users_2 ON users_2.id = experiments.modified_by_id LEFT OUTER JOIN (experiment_keywords AS experiment_keywords_1 JOIN keywords AS keywords_1 ON keywords_1.id = experiment_keywords_1.keyword_id) ON experiments.id = experiment_keywords_1.experiment_id LEFT OUTER JOIN (experiment_doi_identifiers AS experiment_doi_identifiers_1 JOIN doi_identifiers AS doi_identifiers_1 ON doi_identifiers_1.id = experiment_doi_identifiers_1.doi_identifier_id) ON experiments.id = experiment_doi_identifiers_1.experiment_id LEFT OUTER JOIN experiment_publication_identifiers AS experiment_publication_identifiers_1 ON experiments.id = experiment_publication_identifiers_1.experiment_id LEFT OUTER JOIN publication_identifiers AS publication_identifiers_1 ON publication_identifiers_1.id = experiment_publication_identifiers_1.publication_identifier_id LEFT OUTER JOIN (experiment_sra_identifiers AS experiment_sra_identifiers_1 JOIN sra_identifiers AS sra_identifiers_1 ON sra_identifiers_1.id = experiment_sra_identifiers_1.sra_identifier_id) ON experiments.id = experiment_sra_identifiers_1.experiment_id LEFT OUTER JOIN licenses AS licenses_1 ON licenses_1.id = scoresets.licence_id LEFT OUTER JOIN (scoreset_keywords AS scoreset_keywords_1 JOIN keywords AS keywords_2 ON keywords_2.id = scoreset_keywords_1.keyword_id) ON scoresets.id = scoreset_keywords_1.scoreset_id LEFT OUTER JOIN (scoreset_doi_identifiers AS scoreset_doi_identifiers_1 JOIN doi_identifiers AS doi_identifiers_2 ON doi_identifiers_2.id = scoreset_doi_identifiers_1.doi_identifier_id) ON scoresets.id = scoreset_doi_identifiers_1.scoreset_id LEFT OUTER JOIN scoreset_publication_identifiers AS scoreset_publication_identifiers_1 ON scoresets.id = scoreset_publication_identifiers_1.scoreset_id LEFT OUTER JOIN publication_identifiers AS publication_identifiers_2 ON publication_identifiers_2.id = scoreset_publication_identifiers_1.publication_identifier_id LEFT OUTER JOIN target_genes AS target_genes_1 ON scoresets.id = target_genes_1.scoreset_id LEFT OUTER JOIN target_sequences AS target_sequences_1 ON target_sequences_1.id = target_genes_1.target_sequence_id LEFT OUTER JOIN taxonomies AS taxonomies_1 ON taxonomies_1.id = target_sequences_1.taxonomy_id LEFT OUTER JOIN target_accessions AS target_accessions_1 ON target_accessions_1.id = target_genes_1.accession_id LEFT OUTER JOIN ensembl_offsets AS ensembl_offsets_1 ON target_genes_1.id = ensembl_offsets_1.target_gene_id LEFT OUTER JOIN ensembl_identifiers AS ensembl_identifiers_1 ON ensembl_identifiers_1.id = ensembl_offsets_1.identifier_id LEFT OUTER JOIN refseq_offsets AS refseq_offsets_1 ON target_genes_1.id = refseq_offsets_1.target_gene_id LEFT OUTER JOIN refseq_identifiers AS refseq_identifiers_1 ON refseq_identifiers_1.id = refseq_offsets_1.identifier_id LEFT OUTER JOIN uniprot_offsets AS uniprot_offsets_1 ON target_genes_1.id = uniprot_offsets_1.target_gene_id LEFT OUTER JOIN uniprot_identifiers AS uniprot_identifiers_1 ON uniprot_identifiers_1.id = uniprot_offsets_1.identifier_id 
WHERE NOT (EXISTS (SELECT 1 
FROM scoresets AS scoresets_1 
WHERE scoresets.id = scoresets_1.replaces_id)) AND ((lower(scoresets.urn) LIKE '%' || lower('homo sapiens') || '%') OR (lower(scoresets.title) LIKE '%' || lower('homo sapiens') || '%') OR (lower(scoresets.short_description) LIKE '%' || lower('homo sapiens') || '%') OR (lower(scoresets.abstract_text) LIKE '%' || lower('homo sapiens') || '%') OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (lower(lower(target_genes.name)) LIKE '%' || lower('homo sapiens') || '%'))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (lower(lower(target_genes.category)) LIKE '%' || lower('homo sapiens') || '%'))) OR (EXISTS (SELECT 1 
FROM keywords, scoreset_keywords 
WHERE scoresets.id = scoreset_keywords.scoreset_id AND keywords.id = scoreset_keywords.keyword_id AND (lower(lower(keywords.text)) LIKE '%' || lower('homo sapiens') || '%'))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM target_sequences 
WHERE target_sequences.id = target_genes.target_sequence_id AND (EXISTS (SELECT 1 
FROM taxonomies 
WHERE taxonomies.id = target_sequences.taxonomy_id AND (lower(lower(taxonomies.organism_name)) LIKE '%' || lower('homo sapiens') || '%'))))))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM target_sequences 
WHERE target_sequences.id = target_genes.target_sequence_id AND (EXISTS (SELECT 1 
FROM taxonomies 
WHERE taxonomies.id = target_sequences.taxonomy_id AND (lower(lower(taxonomies.common_name)) LIKE '%' || lower('homo sapiens') || '%'))))))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM target_accessions 
WHERE target_accessions.id = target_genes.accession_id AND (lower(lower(target_accessions.assembly)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND (lower(lower(publication_identifiers.identifier)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND (lower(lower(publication_identifiers.doi)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND (lower(lower(publication_identifiers.abstract)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND (lower(lower(publication_identifiers.title)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND (lower(lower(publication_identifiers.publication_journal)) LIKE '%' || lower('homo sapiens') || '%'))))) OR (EXISTS (SELECT 1 
FROM scoreset_publication_identifiers 
WHERE scoresets.id = scoreset_publication_identifiers.scoreset_id AND (EXISTS (SELECT 1 
FROM publication_identifiers 
WHERE publication_identifiers.id = scoreset_publication_identifiers.publication_identifier_id AND jsonb_path_exists(publication_identifiers.authors, '$[*].name ? (@ like_regex "homo sapiens" flag "i")'))))) OR (EXISTS (SELECT 1 
FROM doi_identifiers, scoreset_doi_identifiers 
WHERE scoresets.id = scoreset_doi_identifiers.scoreset_id AND doi_identifiers.id = scoreset_doi_identifiers.doi_identifier_id AND (lower(lower(doi_identifiers.identifier)) LIKE '%' || lower('homo sapiens') || '%'))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM uniprot_offsets 
WHERE target_genes.id = uniprot_offsets.target_gene_id AND (EXISTS (SELECT 1 
FROM uniprot_identifiers 
WHERE uniprot_identifiers.id = uniprot_offsets.identifier_id AND (lower(lower(uniprot_identifiers.identifier)) LIKE '%' || lower('homo sapiens') || '%'))))))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM refseq_offsets 
WHERE target_genes.id = refseq_offsets.target_gene_id AND (EXISTS (SELECT 1 
FROM refseq_identifiers 
WHERE refseq_identifiers.id = refseq_offsets.identifier_id AND (lower(lower(refseq_identifiers.identifier)) LIKE '%' || lower('homo sapiens') || '%'))))))) OR (EXISTS (SELECT 1 
FROM target_genes 
WHERE scoresets.id = target_genes.scoreset_id AND (EXISTS (SELECT 1 
FROM ensembl_offsets 
WHERE target_genes.id = ensembl_offsets.target_gene_id AND (EXISTS (SELECT 1 
FROM ensembl_identifiers 
WHERE ensembl_identifiers.id = ensembl_offsets.identifier_id AND (lower(lower(ensembl_identifiers.identifier)) LIKE '%' || lower('homo sapiens') || '%')))))))) ORDER BY experiments.title

this takes about 8 seconds on my machine from HTTP and about 5.3 seconds from psql so it could use some work.
It also looks like some fields which are getting retrieved aren't needed for serialization: eg the full license text is getting retrieved by joinedload(ScoreSet.license) when it isn't probably needed.

EstelleDa · 2024-10-15T04:34:35Z

So long the query!!

nickzoic · 2024-10-18T04:27:04Z

It is indeed! Ok so https://github.com/VariantEffect/mavedb-api/tree/nick/89-efficient-search is a start on this,
most of the code is in https://github.com/VariantEffect/mavedb-api/blob/efa7dc3a407b8a7062ef510f2dd3b1b38d4fc625/src/mavedb/models/score_set_fulltext.py

Build a materialized view scoreset_fulltext which contains denormalization of all the relevant fields as ts_vectors.
use a GIN index over the ts_vectors for fast lookups.
translate "text" queries to GIN lookups using postgres websearch_to_tsquery
Refresh materialized view when scoresets are updated

What's missing (as of right now):

Refresh materialized view asynchronously when scoresets are updated.
Tidy up other aspects of the search query.

This is probably easiest to do using the arq / worker system we already have, although there are other ways.

jstone-dev added the type: enhancement Enhancement to an existing feature label Oct 16, 2023

nickzoic self-assigned this Apr 8, 2024

jstone-dev changed the title ~~More efficient full-text search~~ More efficient search Apr 15, 2024

bencap added app: backend Task implementation touches the backend app: database Task implementation requires database changes labels Oct 1, 2024

This was referenced Oct 7, 2024

search results overwritten by previous (unfinished) request VariantEffect/mavedb-ui#157

Open

Advanced Search #335

Open

nickzoic linked a pull request Oct 22, 2024 that will close this issue

Nick/89 efficient search #343

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More efficient search #89

More efficient search #89

jstone-dev commented Oct 16, 2023

nickzoic commented Apr 9, 2024 •

edited

Loading

nickzoic commented Oct 7, 2024 •

edited

Loading

nickzoic commented Oct 15, 2024 •

edited

Loading

EstelleDa commented Oct 15, 2024

nickzoic commented Oct 18, 2024 •

edited

Loading

More efficient search #89

More efficient search #89

Comments

jstone-dev commented Oct 16, 2023

nickzoic commented Apr 9, 2024 • edited Loading

nickzoic commented Oct 7, 2024 • edited Loading

nickzoic commented Oct 15, 2024 • edited Loading

EstelleDa commented Oct 15, 2024

nickzoic commented Oct 18, 2024 • edited Loading

nickzoic commented Apr 9, 2024 •

edited

Loading

nickzoic commented Oct 7, 2024 •

edited

Loading

nickzoic commented Oct 15, 2024 •

edited

Loading

nickzoic commented Oct 18, 2024 •

edited

Loading