feat: Use openfoodfacts-query service for facet queries instead of product_tags collection #8947

Merged
merged 31 commits into from
Oct 9, 2023
Changes from 15 commits
335e165
Call local postgres service instead of mongo
john-gom Jul 24, 2023
d541056
Initial count support
john-gom Aug 7, 2023
de06ea3
Removing unused code
john-gom Aug 7, 2023
f68e5cf
Separate queries so easier to mock results
john-gom Aug 7, 2023
8377b6c
Make QUERY_URL configurable
john-gom Aug 7, 2023
9e29c3b
Removing references to products_tags
john-gom Aug 14, 2023
b545ee8
Note and tweaks
john-gom Aug 14, 2023
233a7ab
Perl tidy fixes
john-gom Aug 14, 2023
1cf12fc
Get from query first then fallback
john-gom Aug 21, 2023
77cba44
Merge remote-tracking branch 'origin/main' into issue/8676
john-gom Aug 21, 2023
85a8e97
Fix merge issue
john-gom Aug 22, 2023
42275c7
Update port
john-gom Aug 22, 2023
b02b765
Tidy up comment
john-gom Aug 22, 2023
11b84e6
Merge remote-tracking branch 'origin/main' into issue/8676
john-gom Sep 4, 2023
5e3055f
Merge branch 'main' into issue/8676
john-gom Sep 4, 2023
c8c8704
Fix comment block
john-gom Sep 13, 2023
abe07e6
Remove commented code that prevented caching
john-gom Sep 13, 2023
a93375d
Removed $@ reference as not being used
john-gom Sep 13, 2023
f53d99b
Move estimate count into a separate function
john-gom Sep 13, 2023
5b83916
Merge branch 'main' into issue/8676
john-gom Sep 13, 2023
b4bf580
Check fixes
john-gom Sep 13, 2023
eb6bbe2
extra_hosts for Linux
john-gom Sep 15, 2023
ad2c868
Merge remote-tracking branch 'origin/main' into issue/8676
john-gom Sep 15, 2023
8e98741
Allow Product Opener to continue to use product_tags during transition
john-gom Sep 19, 2023
600ddd3
Perltidy fixes
john-gom Sep 19, 2023
fa5b2a5
Merge remote-tracking branch 'origin/main' into issue/8676
john-gom Oct 2, 2023
894b9c5
Prevent errors when query_url not set on refresh_product_tags
john-gom Oct 2, 2023
612a563
Set query_url for staging
john-gom Oct 2, 2023
d327c37
Merge remote-tracking branch 'origin/main' into issue/8676
john-gom Oct 2, 2023
6a303fc
Merge branch 'main' into issue/8676
john-gom Oct 9, 2023
f311104
Make sure product_tags is not used when no_cache=1
john-gom Oct 9, 2023
3 changes: 3 additions & 0 deletions .env
@@ -45,6 +45,9 @@ MONGODB_CACHE_SIZE=8 # GB
MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=test
ROBOTOFF_URL=http://robotoff.openfoodfacts.localhost:5500 # connect to Robotoff running in separate docker-compose deployment
# connect to openfoodfacts-query running in separate docker-compose deployment.
# To test locally change to http://host.docker.internal:5510
QUERY_URL=http://query:5510
EVENTS_URL=
FACETS_KP_URL = https://facets-kp.openfoodfacts.org/render-to-html
# use this to push products to openfoodfacts-search
1 change: 0 additions & 1 deletion .github/labeler.yml
@@ -12,7 +12,6 @@ nginx:

mongodb:
- conf/mongodb/create_indexes.js
- scripts/refresh_products_tags.js
- .github/workflows/mongo-deploy.yml
- scripts/checkmongodb.pl
- scripts/update_all_products_from_dir_in_mongodb.pl
2 changes: 1 addition & 1 deletion .github/workflows/daily.yml
@@ -84,7 +84,7 @@ jobs:
echo "SSH_HOST=10.1.0.200" >> $GITHUB_ENV
echo "SSH_PROXY_HOST=ovh1.openfoodfacts.org" >> $GITHUB_ENV
echo "SSH_USERNAME=off" >> $GITHUB_ENV
- name: Refresh MongoDB products_tags collection
- name: Refresh Postgres cache
uses: appleboy/ssh-action@master
with:
host: ${{ env.SSH_HOST }}
6 changes: 2 additions & 4 deletions Makefile
@@ -192,10 +192,8 @@ create_mongodb_indexes:
${DOCKER_COMPOSE} exec -T mongodb //bin/sh -c "mongo off /data/db/create_indexes.js"

refresh_product_tags:
@echo "🥫 Refreshing products tags (update MongoDB products_tags collection) …"
# get id for mongodb container
docker cp scripts/refresh_products_tags.js $(shell docker-compose ps -q mongodb):/data/db
${DOCKER_COMPOSE} exec -T mongodb //bin/sh -c "mongo off /data/db/refresh_products_tags.js"
@echo "🥫 Refreshing product data cached in Postgres …"
${DOCKER_COMPOSE} run --rm backend perl /opt/product-opener/scripts/refresh_postgres.pl ${from}

import_sample_data:
@echo "🥫 Importing sample data (~200 products) into MongoDB …"
1 change: 1 addition & 0 deletions conf/apache.conf
@@ -14,6 +14,7 @@ PerlPassEnv PRODUCT_OPENER_DOMAIN
PerlPassEnv PRODUCT_OPENER_PORT
PerlPassEnv PRODUCERS_PLATFORM
PerlPassEnv ROBOTOFF_URL
PerlPassEnv QUERY_URL
PerlPassEnv EVENTS_URL
PerlPassEnv FACETS_KP_URL
PerlPassEnv EVENTS_USERNAME
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -14,6 +14,7 @@ x-backend-conf: &backend-conf
- POSTGRES_USER
- POSTGRES_PASSWORD
- ROBOTOFF_URL
- QUERY_URL
- EVENTS_URL
- FACETS_KP_URL
- EVENTS_USERNAME
1 change: 1 addition & 0 deletions docs/dev/how-to-quick-start-guide.md
@@ -96,6 +96,7 @@ The `.env` file contains ProductOpener default settings:
| `PRODUCT_OPENER_FLAVOR_SHORT` | can be modified to run different flavors of OpenFoodFacts, amongst `off` (default), `obf`, `oppf`, `opf`.|
| `PRODUCERS_PLATFORM` | can be set to `1` to build / run the **producer platform**.|
| `ROBOTOFF_URL` | can be set to **connect with a Robotoff instance**.|
| `QUERY_URL` | can be set to **connect with a Query instance**.|
| `REDIS_URL` | can be set to **connect with a Redis instance for populating the search index**.|
| `GOOGLE_CLOUD_VISION_API_KEY` | can be set to **enable OCR using Google Cloud Vision**.|
| `CROWDIN_PROJECT_IDENTIFIER` and `CROWDIN_PROJECT_KEY` | can be set to **run translations**.|
5 changes: 5 additions & 0 deletions lib/ProductOpener/Config2_docker.pm
@@ -47,6 +47,7 @@ BEGIN {
$crowdin_project_identifier
$crowdin_project_key
$robotoff_url
$query_url
$events_url
$facets_kp_url
$events_username
@@ -105,6 +106,10 @@ $log_emails = $ENV{OFF_LOG_EMAILS} // 0;
# enable an in-site robotoff-asker in the product page
$robotoff_url = $ENV{ROBOTOFF_URL};

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-query/ to
# enable product counts and aggregations / facets
$query_url = $ENV{QUERY_URL};

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-events
# enable creating events for some actions (e.g. when a product is edited)
$events_url = $ENV{EVENTS_URL};
2 changes: 2 additions & 0 deletions lib/ProductOpener/Config2_sample.pm
@@ -39,6 +39,7 @@ BEGIN {
$crowdin_project_identifier
$crowdin_project_key
$robotoff_url
$query_url
$events_url
$events_username
$events_password
@@ -75,6 +76,7 @@ $crowdin_project_key = '';
# Set this to your instance of https://github.com/openfoodfacts/robotoff/ to
# enable an in-site robotoff-asker in the product page
$robotoff_url = '';
$query_url = '';

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-events
# enable creating events for some actions (e.g. when a product is edited)
2 changes: 2 additions & 0 deletions lib/ProductOpener/Config_obf.pm
@@ -49,6 +49,7 @@ BEGIN {
$crowdin_project_key

$robotoff_url
$query_url
$events_url
$events_username
$events_password
@@ -206,6 +207,7 @@ $crowdin_project_key = $ProductOpener::Config2::crowdin_project_key;
# Set this to your instance of https://github.com/openfoodfacts/robotoff/ to
# enable an in-site robotoff-asker in the product page
$robotoff_url = $ProductOpener::Config2::robotoff_url;
$query_url = $ProductOpener::Config2::query_url;

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-events
# enable creating events for some actions (e.g. when a product is edited)
4 changes: 3 additions & 1 deletion lib/ProductOpener/Config_off.pm
@@ -49,6 +49,7 @@ BEGIN {

$log_emails
$robotoff_url
$query_url
$events_url
$events_username
$events_password
@@ -354,6 +355,7 @@ $crowdin_project_key = $ProductOpener::Config2::crowdin_project_key;
# Set this to your instance of https://github.com/openfoodfacts/robotoff/ to
# enable an in-site robotoff-asker in the product page
$robotoff_url = $ProductOpener::Config2::robotoff_url;
$query_url = $ProductOpener::Config2::query_url;

# do we want to send emails
$log_emails = $ProductOpener::Config2::log_emails;
@@ -461,7 +463,6 @@ my $manifest = {
};
$options{manifest} = $manifest;

$options{mongodb_supports_sample} = 0; # from MongoDB 3.2 onward
$options{display_random_sample_of_products_after_edits} = 0; # from MongoDB 3.2 onward

$options{favicons} = <<HTML
@@ -702,6 +703,7 @@ $options{replace_existing_values_when_importing_those_tags_fields} = {
);

# fields for drilldown facet navigation
# If adding to this list ensure that the tables are being replicated to Postgres in the openfoodfacts-query repo

@drilldown_fields = qw(
nutrition_grades
2 changes: 2 additions & 0 deletions lib/ProductOpener/Config_opf.pm
@@ -49,6 +49,7 @@ BEGIN {
$crowdin_project_key

$robotoff_url
$query_url
$events_url
$events_username
$events_password
@@ -204,6 +205,7 @@ $crowdin_project_key = $ProductOpener::Config2::crowdin_project_key;
# Set this to your instance of https://github.com/openfoodfacts/robotoff/ to
# enable an in-site robotoff-asker in the product page
$robotoff_url = $ProductOpener::Config2::robotoff_url;
$query_url = $ProductOpener::Config2::query_url;

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-events
# enable creating events for some actions (e.g. when a product is edited)
2 changes: 2 additions & 0 deletions lib/ProductOpener/Config_opff.pm
@@ -49,6 +49,7 @@ BEGIN {
$crowdin_project_key

$robotoff_url
$query_url
$events_url
$events_username
$events_password
@@ -203,6 +204,7 @@ $crowdin_project_key = $ProductOpener::Config2::crowdin_project_key;
# Set this to your instance of https://github.com/openfoodfacts/robotoff/ to
# enable an in-site robotoff-asker in the product page
$robotoff_url = $ProductOpener::Config2::robotoff_url;
$query_url = $ProductOpener::Config2::query_url;

# Set this to your instance of https://github.com/openfoodfacts/openfoodfacts-events
# enable creating events for some actions (e.g. when a product is edited)
71 changes: 46 additions & 25 deletions lib/ProductOpener/Data.pm
@@ -27,19 +27,10 @@ ProductOpener::Data - methods to create or get the mongoDB client and fetch "dat
The module implements the methods required to fetch certain collections from the MongoDB database.
The functions used in this module are responsible for executing queries, to get connection to database and also to select the collection required.

The module exposes 2 distinct kinds of collections, products and products_tags, returned by the Data::get_products_collections
and the Data::get_products_tags_collections methods respectively.

The products collection contains a complete document for each product in the OpenFoodFacts database which exposes all
available information about the product.

The products_tags collection contains a stripped down version of the data in the products collection, where each
product entry has a select few fields, including fields used in tags. The main purpose of having this copy is to
improve performance of aggregate queries for an improved user experience and more efficient resource usage. This
collection was initially proposed in L<issue#1610|https://github.com/openfoodfacts/openfoodfacts-server/issues/1610> on
GitHub, where some additional context is available.

Obsolete products that have been withdrawn from the market have separate collections: products_obsolete and products_obsolete_tags
Obsolete products that have been withdrawn from the market have a separate collection: products_obsolete

=cut

@@ -52,6 +43,8 @@ BEGIN {
use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS);
@EXPORT_OK = qw(
&execute_query
&execute_aggregate_tags_query
&execute_count_tags_query
&get_database
&get_collection
&get_products_collection
@@ -71,6 +64,7 @@ use ProductOpener::Config qw/:all/;

use MongoDB;
use Tie::IxHash;
use JSON::PP;
use Log::Any qw($log);

use Action::CircuitBreaker;
@@ -116,6 +110,48 @@ sub execute_query ($sub) {
)->run();
}

sub execute_aggregate_tags_query ($aggregate_parameters) {
return execute_tags_query('aggregate', $aggregate_parameters);
}

sub execute_count_tags_query ($query_ref) {
return execute_tags_query('count', $query_ref);
}

sub execute_tags_query ($type, $parameters) {
if ((defined $query_url) and (length($query_url) > 0)) {
$query_url =~ s/^\s+|\s+$//g;
my $path = "$query_url/$type";
$log->debug('Executing PostgreSQL ' . $type . ' query on ' . $path, {query => $parameters})
if $log->is_debug();

my $ua = LWP::UserAgent->new();
my $resp = $ua->post(
$path,
Content => encode_json($parameters),
'Content-Type' => 'application/json; charset=utf-8'
);
if ($resp->is_success) {
return decode_json($resp->decoded_content);
}
else {
$log->warn(
"query response not ok",
{
code => $resp->code,
status_line => $resp->status_line,
response => $resp
}
) if $log->is_warn();
return;
}
}
else {
$log->debug('QUERY_URL not defined') if $log->is_debug();
return;
}
}
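
Reviewer's sketch (not part of the diff): `execute_tags_query` returns nothing both when `QUERY_URL` is unset and when the service responds with an error, so a caller can treat an undefined result as the signal to fall back, in line with the "Get from query first then fallback" commit. The example below is a minimal, self-contained illustration of that pattern. The names `count_from_query_service`, `count_from_mongodb` and `count_products`, the `$service_up` flag, and the hard-coded counts are all hypothetical stand-ins, not functions or data from Product Opener.

```perl
use strict;
use warnings;

# Hypothetical stand-in for execute_count_tags_query(): returns a count,
# or undef when the query service is unavailable or QUERY_URL is not set.
sub count_from_query_service {
	my ($query_ref, $service_up) = @_;
	return $service_up ? 42 : undef;
}

# Hypothetical stand-in for counting directly in MongoDB.
sub count_from_mongodb {
	my ($query_ref) = @_;
	return 40;
}

# Ask the query service first; fall back to MongoDB only when the result is undef.
# The defined-or operator (//) keeps a legitimate count of 0 from triggering the fallback.
sub count_products {
	my ($query_ref, $service_up) = @_;
	my $count = count_from_query_service($query_ref, $service_up);
	return $count // count_from_mongodb($query_ref);
}

print count_products({categories_tags => 'en:teas'}, 1), "\n";    # query service answers
print count_products({categories_tags => 'en:teas'}, 0), "\n";    # falls back to MongoDB
```

Using `//` rather than `||` matters here because a facet with zero matching products is a valid answer from the service and should not cause a second, slower query.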

=head2 get_products_collection( $parameters_ref )

C<get_products_collection()> establishes a connection to MongoDB and uses timeout as an argument. This then selects a collection
@@ -139,16 +175,6 @@ This is useful when moving products to another flavour
If set to a true value, the function returns a collection that contains only obsolete products,
otherwise it returns the collection with products that are not obsolete.

=head4 tags

If set to a true value, the function may return a smaller collection that contains only the *_tags fields,
in order to speed aggregate queries. The smaller collection is created every night,
and may therefore contain slightly stale data.

As of 2023/03/13, we return the products_tags collection for non obsolete products.
For obsolete products, we currently return the products_obsolete collection, but we might
create a separate products_obsolete_tags collection in the future, if it becomes necessary to create one.

=head3 Return values

Returns a mongoDB collection object.
@@ -161,11 +187,6 @@ sub get_products_collection ($parameters_ref = {}) {
if ($parameters_ref->{obsolete}) {
$collection .= '_obsolete';
}
# We don't have a products_obsolete_tags collection at this point
# if it changes, the following elsif should be changed to a if
elsif ($parameters_ref->{tags}) {
$collection .= '_tags';
}
return get_collection($database, $collection, $parameters_ref->{timeout});
}
