Amundsen Search now has a new API, v2, which supports searching all resources indexed in Amundsen. It handles both filtered and unfiltered search through a single /v2/search endpoint, as opposed to the old endpoints, which were defined per resource. The new API also supports updating documents in Elasticsearch through /v2/document. The old endpoints relied on the ElasticsearchProxy class, which doesn't support v2. The frontend service has been migrated to the v2 API, so the old search proxy cannot be used with amundsen-frontend >= 4.0.0.
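As a rough illustration of a single endpoint serving both filtered and unfiltered search, the sketch below builds two request bodies that could be sent as JSON in a POST to /v2/search. The field names (query_term, resource_types, page_index, results_per_page, filters, and the name/values/operation keys of each filter) and the helper build_search_request are assumptions made for this example, not something this page defines; check the request schema of your installed search service before relying on them.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_request(
    query_term: str,
    resource_types: List[str],
    filters: Optional[List[Dict[str, Any]]] = None,
    page_index: int = 0,
    results_per_page: int = 10,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for a POST to /v2/search.

    Hypothetical helper: every field name below is an assumption and
    should be checked against your search service version.
    """
    return {
        "query_term": query_term,
        "resource_types": resource_types,
        "page_index": page_index,
        "results_per_page": results_per_page,
        "filters": filters or [],
    }


# Unfiltered search across two resource types in one request.
unfiltered = build_search_request("orders", ["table", "dashboard"])

# Filtered search: tables whose schema is "sales" OR "finance".
filtered = build_search_request(
    "orders",
    ["table"],
    filters=[
        {"name": "schema", "values": ["sales", "finance"], "operation": "OR"}
    ],
)

print(json.dumps(filtered, indent=2))
```

The same function covers both cases because the only difference between a filtered and an unfiltered request in this sketch is whether the filters list is empty.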
ElasticsearchProxyV2 supports v2 and has feature parity with the old API. However, this proxy doesn't include any enhancements beyond multi-valued filters with AND/OR logic, because further enhancements (such as search fuzziness, stemming, and highlighting) require new mappings to be created by databuilder.
Finally, there is ElasticsearchProxyV2_1. This latest proxy supports all the new search enhancements that rely on the new Elasticsearch mappings, and it is also accessible through /v2/search and /v2/document. This proxy class is configured by default, but it will fall back to ElasticsearchProxyV2 if it cannot find the new mappings in Elasticsearch.
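The fallback behavior can be pictured with a small sketch. It assumes, purely for illustration, that "finding the new mappings" means an alias following the {resource}_search_index_v2_1 pattern exists for every resource. The function pick_proxy and this detection rule are hypothetical, and the real check inside the search service may differ.

```python
from typing import Iterable, Set

# Alias pattern for the new mappings, as named later in this tutorial.
V2_1_ALIAS_TEMPLATE = "{resource}_search_index_v2_1"


def pick_proxy(existing_aliases: Set[str], resources: Iterable[str]) -> str:
    """Return the name of the proxy class that would serve requests.

    Hypothetical rule: use the v2_1 proxy only when every resource has
    an alias matching the v2_1 template; otherwise fall back to v2.
    """
    needed = {V2_1_ALIAS_TEMPLATE.format(resource=r) for r in resources}
    if needed <= existing_aliases:
        return "ElasticsearchProxyV2_1"
    return "ElasticsearchProxyV2"


# Only the old alias exists, so the sketch falls back to the v2 proxy.
print(pick_proxy({"table_search_index"}, ["table"]))

# The new alias exists, so the v2_1 proxy is used.
print(pick_proxy({"table_search_index", "table_search_index_v2_1"}, ["table"]))
```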
The goal of this tutorial is to explain what the new search functionality offers, how to transition to the /v2/search search service endpoint out of the box, and how to customize it ahead of the deprecation of the old search endpoints. The updated search service offers fuzziness, word stemming, and boosted search ranking based on usage.
For search to function, metadata for all configured searchable resources is indexed into Elasticsearch by a databuilder task, and Elasticsearch is then queried via the search service. Two Elasticsearch concepts are fundamental to understanding how search works: mappings and search queries. Mappings define how fields are indexed into Elasticsearch by specifying the types of fields and how text fields are analyzed. Queries should then be written with the mapping in mind, since certain field types are required to perform certain types of queries efficiently.
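To make the relationship between mappings and queries concrete, here is a minimal sketch written as Python dictionaries in Elasticsearch's JSON format. The field names (description, schema, total_usage) and the helper field_type are made up for this example and are not the mappings shipped with databuilder. The "english" analyzer, the keyword type, and the fuzziness option of the match query are standard Elasticsearch features.

```python
from typing import Any, Dict

# Mapping: declares each field's type and how text is analyzed at index time.
example_mapping: Dict[str, Any] = {
    "mappings": {
        "properties": {
            # Analyzed text; the built-in "english" analyzer applies stemming.
            "description": {"type": "text", "analyzer": "english"},
            # Exact-value field, suited to filtering.
            "schema": {"type": "keyword"},
            # Numeric field that could feed usage-based ranking.
            "total_usage": {"type": "long"},
        }
    }
}

# Query written with the mapping in mind: a fuzzy full-text match on the
# analyzed text field, plus an exact term filter on the keyword field.
example_query: Dict[str, Any] = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "description": {
                            "query": "custmer orders",  # typo on purpose
                            "fuzziness": "AUTO",
                        }
                    }
                }
            ],
            "filter": [{"term": {"schema": "sales"}}],
        }
    }
}


def field_type(mapping: Dict[str, Any], field: str) -> str:
    """Look up the declared type of a field in a mapping body."""
    return mapping["mappings"]["properties"][field]["type"]


print(field_type(example_mapping, "description"))
print(field_type(example_mapping, "schema"))
```

The full-text match targets the text field because only analyzed fields benefit from stemming and fuzziness, while the exact filter targets the keyword field, which is stored unanalyzed.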
To ensure a smooth transition from the previous search functionality to a version that supports these new features, follow these steps:
- Bump to amundsen-databuilder >= 6.8.0
- Configure SearchMetadatatoElasticsearchTask
  - Make sure to configure it using a different index alias. For example, if the previous configuration for writing new ES indices was (A), then (B) must be configured for this task instead, for each resource.
    (A) ELASTICSEARCH_ALIAS_CONFIG_KEY → table_search_index
    (B) ELASTICSEARCH_ALIAS_CONFIG_KEY → table_search_index_v2_1
  - This way the previous index is preserved, and the new index aliases have the new mappings that enable all of the functionality mentioned above.
  - Note that this task uses the mappings already provided by databuilder and default queries to extract metadata from neo4j. Elasticsearch mappings can be customized by extending the mapping classes and configuring the task to use the custom mapping via MAPPING_CLASS. Queries to extract metadata from neo4j can be customized and configured through CYPHER_QUERY_CONFIG_KEY.
- Run the task.
- Verify that your ES mappings match the new mapping definitions by running the following directly on Elasticsearch, replacing new_table_search_index with the name of your new index or alias:
  GET new_table_search_index
- Bump to amundsen-search >= 4.0.0
- Configure the Elasticsearch ES_PROXY_CLIENT to use ELASTICSEARCH_V2_1, which is enabled by default in the latest version of search unless configured differently.
  - You can customize the search query by providing a custom client. Create your own client by extending the class from es_proxy_v2_1.py (e.g. class MyESClient(ElasticsearchProxyV2_1):) and overriding any of the provided functions to change the query.
- (OPTIONAL) If the alias your new mappings are indexed under differs from {resource}_search_index_v2_1, make sure to configure the correct string template by adding ES_ALIAS_TEMPLATE = 'my_{resource}_search_index_alias' to the config with your custom alias name.
- Make sure you are using amundsen-frontend >= 4.0.0, which calls the search service's /v2/search endpoint.
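Because the alias written by the databuilder task and the alias template read by the search service have to agree, it can help to derive both from a single template. The sketch below is a plain-Python illustration of that idea. The helper aliases_for and the resource list are assumptions made for this example (this page only names table), and the real configuration keys live in databuilder and the search service config.

```python
from typing import Dict, Iterable

# Default pattern the search service expects, per the steps above.
ES_ALIAS_TEMPLATE = "{resource}_search_index_v2_1"

# Assumed resource names for illustration; use the ones you actually index.
RESOURCES = ["table", "dashboard", "user", "feature"]


def aliases_for(resources: Iterable[str], template: str) -> Dict[str, str]:
    """Expand an alias template into one alias name per resource."""
    return {r: template.format(resource=r) for r in resources}


# Aliases to configure on the databuilder task, one per resource.
default_aliases = aliases_for(RESOURCES, ES_ALIAS_TEMPLATE)
print(default_aliases["table"])  # table_search_index_v2_1

# With a custom template, the same expansion yields the custom alias.
custom_aliases = aliases_for(["table"], "my_{resource}_search_index_alias")
print(custom_aliases["table"])  # my_table_search_index_alias
```

Generating the names from one template keeps the indexing side and the search side from drifting apart when a custom alias is introduced.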