From 8a54ee74928503525207da8b3d8c298286feb7ad Mon Sep 17 00:00:00 2001
From: thomas nares
Date: Mon, 11 Dec 2023 20:19:20 +0100
Subject: [PATCH 1/3] WIP Search index doc

---
 development/products-lifecycle/_index.md      |   9 ++
 .../products-lifecycle/search-index.md        | 112 ++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 development/products-lifecycle/_index.md
 create mode 100644 development/products-lifecycle/search-index.md

diff --git a/development/products-lifecycle/_index.md b/development/products-lifecycle/_index.md
new file mode 100644
index 0000000000..8423f4a40b
--- /dev/null
+++ b/development/products-lifecycle/_index.md
@@ -0,0 +1,9 @@
+---
+title: Products lifecycle
+weight: 42
+useMermaid: true
+---
+
+# Products lifecycle
+
+{{% children /%}}
diff --git a/development/products-lifecycle/search-index.md b/development/products-lifecycle/search-index.md
new file mode 100644
index 0000000000..9eaa59eed1
--- /dev/null
+++ b/development/products-lifecycle/search-index.md
@@ -0,0 +1,112 @@
+---
+title: Products search index
+weight: 42
+useMermaid: true
+---
+
+# Products search index
+
+In PrestaShop, products are natively indexed for search in database, by keywords.
+
+When a query is made in a search bar, it is sanitized, splitted in words, and queries are made against `ps_search_word`.
+
+Then, hits are retrieved from `ps_search_index`, to retrieve matching product ids, and then a sort is made to return relevant products.
+
+## Search index structure
+
+
+classDiagram
+    ps_product <-- ps_search_index
+    ps_search_word <-- ps_search_index
+    ps_product : int product_id
+    ps_product : ...
+    class ps_search_index{
+        int id_product
+        int id_word
+        int weight
+    }
+    class ps_search_word{
+        int id_word
+        int id_shop
+        int id_lang
+        varchar word
+    }
+
+
+## Search index lifecycle
+
+Several actions can trigger a reindex of a Product in the database:
+
+| Location | action | indexation type |
+| --- | --- | --- |
+| PrestaShop installation | injecting fixtures | product |
+| PrestaShop installation | installing a theme | full |
+| Back Office | creating a product from Back Office | product |
+| Back Office | updating a product from Back Office | product |
+| Back Office | duplicating a product from Back Office | product |
+| Back Office | deleting a product from Back Office | product |
+| Back Office | activating a product from Back Office | product |
+| Back Office | creating a new shop | full |
+| Back Office | installing a theme | full |
+| Back Office | requesting an index rebuild | full or missing product only |
+| Webservices | creating a product | product |
+| Webservices | updating a product | product |
+| Crons | refreshing search index | full |
+
+## Search index fields weights
+
+Almost every field / information in the product is weighted to fine tune result relevance.
+
+The weight of fields is adjustable from the Back Office (Shop Parameters > Search) with the configuration keys below:
+
+| Field | Configuration key | Default weight | Description |
+| --- | --- | --- | --- |
+| pname | PS_SEARCH_WEIGHT_PNAME | 6 | Product name |
+| reference | PS_SEARCH_WEIGHT_REF | 10 | Product reference |
+| pa_reference | PS_SEARCH_WEIGHT_REF | 10 | Combination reference |
+| supplier_reference | PS_SEARCH_WEIGHT_REF | 10 | Supplier reference
+| pa_supplier_reference | PS_SEARCH_WEIGHT_REF | 10 | Combination supplier reference |
+| ean13 | PS_SEARCH_WEIGHT_REF | 10 | Product EAN13 |
+| pa_ean13 | PS_SEARCH_WEIGHT_REF | 10 | Combination EAN13 |
+| isbn | PS_SEARCH_WEIGHT_REF | 10 | Product ISBN |
+| pa_isbn | PS_SEARCH_WEIGHT_REF | 10 | Combination ISBN |
+| upc | PS_SEARCH_WEIGHT_REF | 10 | Product UPC |
+| pa_upc | PS_SEARCH_WEIGHT_REF | 10 | Combination UPC |
+| mpn | PS_SEARCH_WEIGHT_REF | 10 | Product MPN |
+| pa_mpn | PS_SEARCH_WEIGHT_REF | 10 | Combination MPN |
+| description_short | PS_SEARCH_WEIGHT_SHORTDESC | 1 | Product short description |
+| description | PS_SEARCH_WEIGHT_DESC | 1 | Product description |
+| cname | PS_SEARCH_WEIGHT_CNAME | 3 | Category name |
+| mname | PS_SEARCH_WEIGHT_MNAME | 3 | Manufacturer name |
+| tags | PS_SEARCH_WEIGHT_TAG | 4 | Product tags |
+| attributes | PS_SEARCH_WEIGHT_ATTRIBUTE | 2 | Combinations |
+| features | PS_SEARCH_WEIGHT_FEATURE | 2 | Product features |
+
+## Trigger a Search Index refresh by cron
+
+To trigger a Search Index refresh by cron, craft a GET url to the Back Office to be called with `curl` (or `wget`, or whatever http request tool), to the Admin controller **AdminSearch**.
+
+| Param | Value | Description |
+| --- | --- | --- |
+| action | searchCron | |
+| ajax | 1 | |
+| full | 1 | If 1, it will rebuild the full index.
If 0 or omitted, it will build only missing products |
+| token | **tokenValue** | |
+
+To create the **tokenValue**, you need to retrieve the `_COOKIE_KEY_` constant from the Configuration.
+
+Then, your token is created by extracting a substring from this constant.
+
+```php
+$tokenValue = substr(
+    _COOKIE_KEY_,
+    AdminSearchController::TOKEN_CHECK_START_POS,
+    AdminSearchController::TOKEN_CHECK_LENGTH
+);
+```
+
+If correctly crafted, your URL should look like:
+
+```
+https://domain.tld/admin-xxx/index.php?controller=AdminSearch&action=searchCron&ajax=1&full=1&token=xxxxxxxx
+```
\ No newline at end of file
From 17ba0a394c8e72954ff5fb44ab2272f1efd07690 Mon Sep 17 00:00:00 2001
From: thomas nares
Date: Tue, 12 Dec 2023 08:30:40 +0100
Subject: [PATCH 2/3] Improve search index page

---
 .../products-lifecycle/search-index.md        | 68 ++++++++++++-------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/development/products-lifecycle/search-index.md b/development/products-lifecycle/search-index.md
index 9eaa59eed1..bf204341a2 100644
--- a/development/products-lifecycle/search-index.md
+++ b/development/products-lifecycle/search-index.md
@@ -10,7 +10,15 @@ In PrestaShop, products are natively indexed for search in database, by keywords
 
 When a query is made in a search bar, it is sanitized, splitted in words, and queries are made against `ps_search_word`.
 
-Then, hits are retrieved from `ps_search_index`, to retrieve matching product ids, and then a sort is made to return relevant products.
+Then, hits are retrieved from `ps_search_index`, to retrieve matching product ids, and then a weighting and a sort is made to return relevant products.
+
+
+flowchart TB
+    id1[Search Query] --> id2[Sanitize,\nremove unwanted words,\nsplit by words]
+    id2 --> id3[Retrieve `id_word` from `ps_search_word`]
+    id3 --> id4[Retrieve `id_product` and `weight` from `ps_search_index`]
+    id4 --> id5[Weight results to return most relevant products]
+
 ## Search index structure
 
@@ -35,7 +43,7 @@ classDiagram
 
 ## Search index lifecycle
 
-Several actions can trigger a reindex of a Product in the database:
+Several actions can trigger a reindex of a Product or of the complete catalog in the database:
 
 | Location | action | indexation type |
 | --- | --- | --- |
@@ -57,34 +65,34 @@ Several actions can trigger a reindex of a Product in the database:
 
 Almost every field / information in the product is weighted to fine tune result relevance.
 
-The weight of fields is adjustable from the Back Office (Shop Parameters > Search) with the configuration keys below:
+The weight of fields is adjustable from the `Back Office > Shop Parameters > Search` with the configuration keys below:
 
 | Field | Configuration key | Default weight | Description |
 | --- | --- | --- | --- |
-| pname | PS_SEARCH_WEIGHT_PNAME | 6 | Product name |
-| reference | PS_SEARCH_WEIGHT_REF | 10 | Product reference |
-| pa_reference | PS_SEARCH_WEIGHT_REF | 10 | Combination reference |
-| supplier_reference | PS_SEARCH_WEIGHT_REF | 10 | Supplier reference
-| pa_supplier_reference | PS_SEARCH_WEIGHT_REF | 10 | Combination supplier reference |
-| ean13 | PS_SEARCH_WEIGHT_REF | 10 | Product EAN13 |
-| pa_ean13 | PS_SEARCH_WEIGHT_REF | 10 | Combination EAN13 |
-| isbn | PS_SEARCH_WEIGHT_REF | 10 | Product ISBN |
-| pa_isbn | PS_SEARCH_WEIGHT_REF | 10 | Combination ISBN |
-| upc | PS_SEARCH_WEIGHT_REF | 10 | Product UPC |
-| pa_upc | PS_SEARCH_WEIGHT_REF | 10 | Combination UPC |
-| mpn | PS_SEARCH_WEIGHT_REF | 10 | Product MPN |
-| pa_mpn | PS_SEARCH_WEIGHT_REF | 10 | Combination MPN |
-| description_short | PS_SEARCH_WEIGHT_SHORTDESC | 1 | Product short description |
-| description | PS_SEARCH_WEIGHT_DESC | 1 | Product description |
-| cname | PS_SEARCH_WEIGHT_CNAME | 3 | Category name |
-| mname | PS_SEARCH_WEIGHT_MNAME | 3 | Manufacturer name |
-| tags | PS_SEARCH_WEIGHT_TAG | 4 | Product tags |
-| attributes | PS_SEARCH_WEIGHT_ATTRIBUTE | 2 | Combinations |
-|
features | PS_SEARCH_WEIGHT_FEATURE | 2 | Product features |
+| pname | `PS_SEARCH_WEIGHT_PNAME` | 6 | Product name |
+| reference | `PS_SEARCH_WEIGHT_REF` | 10 | Product reference |
+| pa_reference | `PS_SEARCH_WEIGHT_REF` | 10 | Combination reference |
+| supplier_reference | `PS_SEARCH_WEIGHT_REF` | 10 | Supplier reference
+| pa_supplier_reference | `PS_SEARCH_WEIGHT_REF` | 10 | Combination supplier reference |
+| ean13 | `PS_SEARCH_WEIGHT_REF` | 10 | Product EAN13 |
+| pa_ean13 | `PS_SEARCH_WEIGHT_REF` | 10 | Combination EAN13 |
+| isbn | `PS_SEARCH_WEIGHT_REF` | 10 | Product ISBN |
+| pa_isbn | `PS_SEARCH_WEIGHT_REF` | 10 | Combination ISBN |
+| upc | `PS_SEARCH_WEIGHT_REF` | 10 | Product UPC |
+| pa_upc | `PS_SEARCH_WEIGHT_REF` | 10 | Combination UPC |
+| mpn | `PS_SEARCH_WEIGHT_REF` | 10 | Product MPN |
+| pa_mpn | `PS_SEARCH_WEIGHT_REF` | 10 | Combination MPN |
+| description_short | `PS_SEARCH_WEIGHT_SHORTDESC` | 1 | Product short description |
+| description | `PS_SEARCH_WEIGHT_DESC` | 1 | Product description |
+| cname | `PS_SEARCH_WEIGHT_CNAME` | 3 | Category name |
+| mname | `PS_SEARCH_WEIGHT_MNAME` | 3 | Manufacturer name |
+| tags | `PS_SEARCH_WEIGHT_TAG` | 4 | Product tags |
+| attributes | `PS_SEARCH_WEIGHT_ATTRIBUTE` | 2 | Combinations |
+| features | `PS_SEARCH_WEIGHT_FEATURE` | 2 | Product features |
 
 ## Trigger a Search Index refresh by cron
 
-To trigger a Search Index refresh by cron, craft a GET url to the Back Office to be called with `curl` (or `wget`, or whatever http request tool), to the Admin controller **AdminSearch**.
+To trigger a Search Index refresh by cron, craft an url to be called with GET method, to the Back Office, to the Admin controller **AdminSearch**.
 | Param | Value | Description |
 | --- | --- | --- |
@@ -109,4 +117,16 @@ If correctly crafted, your URL should look like:
 
 ```
 https://domain.tld/admin-xxx/index.php?controller=AdminSearch&action=searchCron&ajax=1&full=1&token=xxxxxxxx
+```
+
+{{% notice note %}}
+You can also find this URL already generated in `Back Office > Shop Parameters > Search > Indexing`
+{{% /notice %}}
+
+Then, your URL can be used by cURL in a cron:
+
+``` bash
+# crontab
+# triggers a reindex everyday at 6:00AM
+0 6 * * * curl https://domain.tld/admin-xxx/index.php?controller=AdminSearch&action=searchCron&ajax=1&full=1&token=xxxxxxxx
 ```
\ No newline at end of file
From 06a62dc74ced5b80835bda05d70af0fcd1d340e7 Mon Sep 17 00:00:00 2001
From: Thomas NARES
Date: Wed, 13 Dec 2023 15:14:37 +0100
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Krystian Podemski
---
 .../products-lifecycle/search-index.md        | 33 ++++---------------
 1 file changed, 6 insertions(+), 27 deletions(-)

diff --git a/development/products-lifecycle/search-index.md b/development/products-lifecycle/search-index.md
index bf204341a2..ad3dcf244a 100644
--- a/development/products-lifecycle/search-index.md
+++ b/development/products-lifecycle/search-index.md
@@ -6,11 +6,7 @@ useMermaid: true
 
 # Products search index
 
-In PrestaShop, products are natively indexed for search in database, by keywords.
-
-When a query is made in a search bar, it is sanitized, splitted in words, and queries are made against `ps_search_word`.
-
-Then, hits are retrieved from `ps_search_index`, to retrieve matching product ids, and then a weighting and a sort is made to return relevant products.
+In PrestaShop, product search functionality relies on keyword-based indexing. Each search query entered in the search bar undergoes sanitization and is split into individual words. These words are then matched against the `ps_search_word` table.
Matching product IDs are retrieved from the `ps_search_index`, followed by the process of weighting and sorting to deliver the most relevant product results.
 flowchart TB
@@ -43,7 +39,7 @@ classDiagram
 
 ## Search index lifecycle
 
-Several actions can trigger a reindex of a Product or of the complete catalog in the database:
+There are several actions that can trigger a reindex of a product or the complete catalog in the database:
 
 | Location | action | indexation type |
 | --- | --- | --- |
@@ -92,38 +88,21 @@ The weight of fields is adjustable from the `Back Office > Shop Parameters > Sea
 
 ## Trigger a Search Index refresh by cron
 
-To trigger a Search Index refresh by cron, craft an url to be called with GET method, to the Back Office, to the Admin controller **AdminSearch**.
+To trigger a search index refresh via cron, create a GET request URL to the Back Office Admin controller, **AdminSearch**.
 
 | Param | Value | Description |
 | --- | --- | --- |
 | action | searchCron | |
 | ajax | 1 | |
-| full | 1 | If 1, it will rebuild the full index. If 0 or omitted, it will build only missing products |
+| full | 1 | If 1, it will rebuild the full index. If 0 or omitted, it will index only missing products |
 | token | **tokenValue** | |
 
-To create the **tokenValue**, you need to retrieve the `_COOKIE_KEY_` constant from the Configuration.
-
-Then, your token is created by extracting a substring from this constant.
-
-```php
-$tokenValue = substr(
-    _COOKIE_KEY_,
-    AdminSearchController::TOKEN_CHECK_START_POS,
-    AdminSearchController::TOKEN_CHECK_LENGTH
-);
-```
-
-If correctly crafted, your URL should look like:
-
-```
-https://domain.tld/admin-xxx/index.php?controller=AdminSearch&action=searchCron&ajax=1&full=1&token=xxxxxxxx
-```
 
 {{% notice note %}}
-You can also find this URL already generated in `Back Office > Shop Parameters > Search > Indexing`
+You can find indexation URL in `Back Office > Shop Parameters > Search > Indexing`
 {{% /notice %}}
 
-Then, your URL can be used by cURL in a cron:
+Instead of manually running the script, you can use an indexation URL with cURL in a crontab.
 
 ``` bash
 # crontab