Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multi-select facets #1329

Merged
merged 4 commits into from
Nov 20, 2023
Merged

Conversation

olovy
Copy link
Contributor

@olovy olovy commented Nov 15, 2023

Disclaimer: names for these concepts are up for discussion.

Add support for multi-select facets. Normally rendered as check boxes in an interface.
The selected facets from the group are OR-ed.

Make a facet group multi-select by setting connective (as in "logical connective") to OR in _statsrepr

Example

{ "dimensionChain": ["issuanceType"], "itemLimit": 100, "connective": "OR" }

Normally configured in https://github.com/libris/definitions/blob/develop/source/apps.jsonld

In the search response the facet (observation) is marked as selected: true/false and the view-link selects/deselects the facet.
Example: Collection and Integrating are selected.

   "observation": [
      {
        "totalItems": 17299,
        "view": { "@id": "/find?q=*&_limit=20&issuanceType=Collection&issuanceType=Integrating&issuanceType=Monograph" },
        "selected": false,
        "object": { "@id": "https://id.kb.se/vocab/Monograph", ... }
      },
      {
        "totalItems": 10,
        "view": { "@id": "/find?q=*&_limit=20&issuanceType=Integrating" },
        "selected": true,
        "object": { "@id": "https://id.kb.se/vocab/Collection", ... }
      }, 
      ...

The only change needed to the client is to render facets as checkboxes when the selected property is present.

Changes

We've of course always supported filtering on multiple values for the same property. That is in fact the default behavior.
The big changes are in how document counts in facets have to be performed.

ES aggregations are calculated on the query result. We now want to have counts on documents that are outside the result set. For example, when "Serial" is selected we should still show the number of documents with "Monograph". Before this change filters were always applied in the filter section of the query. Multi-select facets/filters have now been moved to post_filter [1] which is run after aggregations. This means we still correct result set in the end.

Since multi-facet filters are applied after aggregation we now have to do some filtering inside the aggregations to get the right count. This is done by moving applying the same filters as in post_filter to each aggregation. Except any filter that refers to the same field.

Elastic is supposedly good at caching filter conditions so this should be efficient.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html#post-filter

Example

Serial and Monograph are selected

"post_filter": { "bool": { "must": [ { "bool": { "should": [
   { "simple_query_string": { "query": "Serial", "fields": ["issuanceType"], "default_operator": "AND" }},
   { "simple_query_string": { "query": "Monograph", "fields": [ "issuanceType" ], "default_operator": "AND" }
}]}}]}}

same filter on other aggs, for instance meta.bibliography.@id.
a is just a dummy name for the nested agg.

"meta.bibliography.@id": {
  "aggs": {
    "a": {
      "terms": {
        "field": "meta.bibliography.@id",
        "size": 100,
        "order": { "_count": "desc" }
      }
    }
  },
  "filter": { "bool": { "must": [ { "bool": { "should": [
    { "simple_query_string": { "query": "Serial", "fields": ["issuanceType"], "default_operator": "AND" }},
    { "simple_query_string": { "query": "Monograph", "fields": [ "issuanceType" ], "default_operator": "AND" }
  }]}}]}}
}

but not on issuanceType agg

"issuanceType": {
  "aggs": {
    "a": {
      "terms": {
        "field": "issuanceType",
        "size": 100,
        "order": { "_count": "desc" }
      }
    }
  },
  "filter": { "bool": { "must": [] } }
}

Future

This works, but we are reaching the point where the query building has to be refactored. For example not passing around queryParameters everywhere but parse everything into a better representation of the state and requested query.

Copy link
Contributor

@lrosenstrom lrosenstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reviewed in detail but LGTM!

@lrosenstrom
Copy link
Contributor

Also, from a frontend perspective, this is exactly what I need.

@olovy olovy marked this pull request as ready for review November 20, 2023 09:25
@olovy olovy merged commit d60ebb8 into develop Nov 20, 2023
1 check passed
@olovy olovy deleted the feature/multi-select-facets branch November 20, 2023 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants