feat: add rate-limiting on facet pages (openfoodfacts#10738)
Add rate-limiting on facet pages, such as:
- /categories
- /label/organic
- /ingredient/salt/category/breads

The rate limit depends on the user:
- unregistered users: 5 req/min
- registered users: 10 req/min
- crawling bots: 10 req/min. The limit is higher for crawling bots
than for unregistered users, as otherwise web indexer bots would
be slowed down when indexing the full website

This mechanism also provides an easy way to completely block some
resource-intensive pages, such as facets, if the server is unresponsive
due to heavy load (by setting the rate-limit value for the corresponding
bucket to 0).
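
The per-user limits and the "set the bucket to 0" trick described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the `%limits` table and the helpers `user_class`, `limit_for` and `is_blocked` are hypothetical names, and only the request fields `rate_limiter_bucket`, `is_crawl_bot` and `user_id` are taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical limits table mirroring the commit message (requests per minute).
my %limits = (
    facet_products => {unregistered => 5, registered => 10, crawl_bot => 10},
    facet_tags     => {unregistered => 5, registered => 10, crawl_bot => 10},
);

# Classify the requester; the crawling-bot check comes before the logged-in check.
sub user_class {
    my ($request) = @_;
    return 'crawl_bot'  if $request->{is_crawl_bot};
    return 'registered' if defined $request->{user_id};
    return 'unregistered';
}

# Return the limit for the request's bucket, or undef when the bucket is not rate-limited.
sub limit_for {
    my ($request) = @_;
    my $bucket = $request->{rate_limiter_bucket};
    return unless defined $bucket and exists $limits{$bucket};
    return $limits{$bucket}{user_class($request)};
}

# A request is blocked once the per-minute count reaches the limit.
# With a limit of 0 the comparison holds even for a count of 0,
# so every request to that bucket is blocked.
sub is_blocked {
    my ($limit, $user_requests) = @_;
    return (defined $limit and defined $user_requests and $user_requests >= $limit) ? 1 : 0;
}

my $limit = limit_for({rate_limiter_bucket => 'facet_products', user_id => 'alice'});
print "limit: $limit\n";                           # limit: 10
print "blocked at 0: ", is_blocked(0, 0), "\n";    # blocked at 0: 1
```

Keeping the limit undefined (rather than 0) for buckets without a configured value leaves those requests unthrottled, which matches the "not rate-limited" early return in the diff.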
raphael0202 authored Aug 28, 2024
1 parent 1e255d4 commit 9e9e321
Showing 21 changed files with 221 additions and 50 deletions.
4 changes: 2 additions & 2 deletions cgi/search.pl
@@ -68,8 +68,8 @@
}

$request_ref->{search} = 1;
-# api_action is required for `check_and_update_rate_limits`
-$request_ref->{api_action} = 'search';
+# rate_limiter_bucket is required for `check_and_update_rate_limits`
+$request_ref->{rate_limiter_bucket} = 'search';

check_and_update_rate_limits($request_ref);

1 change: 1 addition & 0 deletions docs/api/index.md
@@ -39,6 +39,7 @@ To protect our infrastructure, we enforce rate-limits on the API and the website

- 100 req/min for all read product queries (`GET /api/v*/product` requests or product page). There is no limit on product write queries.
- 10 req/min for all search queries (`GET /api/v*/search` or `GET /cgi/search.pl` requests)
+- 2 req/min for facet queries (such as `/categories`, `/label/organic`, `/ingredient/salt/category/breads`,...)

If these limits are reached, we reserve the right to deny you access to the website and the API through IP address ban. If your IP has been banned, feel free to [email us to explain why you reached the limits][why_reached_limits]: reverting the ban is possible.

11 changes: 10 additions & 1 deletion lib/ProductOpener/Config_off.pm
@@ -1634,7 +1634,16 @@ $options{sample_product_code} = "093270067481501"; # A good product for you -

# Number of requests per minutes for the search API
$options{rate_limit_search} = 10;
-# Number of requests per minutes for the product API
+# Number of requests per minutes for all facets for anonymous users
+$options{rate_limit_facet_products_unregistered} = 5;
+# Number of requests per minutes for facets for registered users
+$options{rate_limit_facet_products_registered} = 10;
+# Number of requests per minutes for facets for bots
+$options{rate_limit_facet_products_crawl_bot} = 10;
+# Number of requests per minutes for facet tags (list of tags with count) for anonymous users
+$options{rate_limit_facet_tags_unregistered} = 5;
+$options{rate_limit_facet_tags_registered} = 10;
+$options{rate_limit_facet_tags_crawl_bot} = 10;
$options{rate_limit_product} = 100;

# Rate limit allow list
12 changes: 12 additions & 0 deletions lib/ProductOpener/Display.pm
@@ -634,6 +634,18 @@ sub init_request ($request_ref = {}) {
$request_ref->{deny_all_robots_txt} = 1;
}

+# Rate-limiter specific settings
+# Also see set_rate_limit_attributes in Routing.pm
+
+# Each request is (possibly) associated with a rate limiter bucket
+$request_ref->{rate_limiter_bucket} = undef;
+# Number of requests the user did in the last minute
+$request_ref->{rate_limiter_user_requests} = undef;
+# Limit of requests for the specific bucket (and/or user)
+$request_ref->{rate_limiter_limit} = undef;
+# If the rate limiter is blocking the request
+$request_ref->{rate_limiter_blocking} = 0;
+
# TODO: global variables should be moved to $request_ref
$request_ref->{styles} = '';
$request_ref->{scripts} = '';
36 changes: 18 additions & 18 deletions lib/ProductOpener/Redis.pm
@@ -160,9 +160,9 @@ sub push_to_redis_stream ($user_id, $product_ref, $action, $comment, $diffs, $ti
return;
}

-=head2 get_rate_limit_user_requests ($ip, $api_action)
+=head2 get_rate_limit_user_requests ($ip, $bucket)
-Return the number of requests performed by the given user for the current minute for the given API route.
+Return the number of requests performed by the given user for the current minute for the given rate-limit bucket.
See https://redis.com/glossary/rate-limiting/ for more information.
If the rate-limiter is not configured or if an error occurs, returns undef.
@@ -173,13 +173,13 @@ If the rate-limiter is not configured or if an error occurs, returns undef.
The IP address of the user who is making the request.
-=head4 String $api_action
+=head4 String $bucket
-The API action that is being requested.
+The rate-limit bucket that is being requested.
=cut

-sub get_rate_limit_user_requests ($ip, $api_action) {
+sub get_rate_limit_user_requests ($ip, $bucket) {
if (!$redis_url) {
# No Redis URL provided, we can't get the remaining number of requests
if (!$sent_warning_about_missing_redis_url) {
@@ -197,17 +197,17 @@ sub get_rate_limit_user_requests ($ip, $api_action) {
}
my $resp;
if (defined $redis_client) {
-$ratelimiter_log->debug("Getting rate-limit remaining requests", {ip => $ip, api_action => $api_action})
+$ratelimiter_log->debug("Getting rate-limit user requests", {ip => $ip, bucket => $bucket})
if $ratelimiter_log->is_debug();
my $current_minute = int(time() / 60);
-eval {$resp = $redis_client->get("po-rate-limit:$ip:$api_action:$current_minute");};
+eval {$resp = $redis_client->get("po-rate-limit:$ip:$bucket:$current_minute");};
$error = $@;
}
else {
$error = "Can't connect to Redis";
}
if (!($error eq "")) {
-$ratelimiter_log->error("Failed to get remaining number of requests from Redis rate-limiter", {error => $error})
+$ratelimiter_log->error("Failed to get number of user requests logged by Redis rate-limiter", {error => $error})
if $ratelimiter_log->is_warn();
# ask for eventual reconnection for next call
$redis_client = undef;
@@ -219,17 +219,17 @@ sub get_rate_limit_user_requests ($ip, $api_action) {
else {
$resp = 0;
}
-$ratelimiter_log->debug("Remaining number of requests from Redis rate-limiter", {remaining_requests => $resp})
+$ratelimiter_log->debug("Number of user requests logged by Redis rate-limiter", {requests => $resp})
if $ratelimiter_log->is_debug();
return $resp;
}

return;
}

-=head2 increment_rate_limit_requests ($ip, $api_action)
+=head2 increment_rate_limit_requests ($ip, $bucket)
-Increment the number of requests according to the Redis rate-limiter for the current minute for the given user and API route.
+Increment the number of requests according to the Redis rate-limiter for the current minute for the given user and bucket.
The expiration of the counter is set to 59 seconds.
See https://redis.com/glossary/rate-limiting/ for more information.
@@ -239,13 +239,13 @@ See https://redis.com/glossary/rate-limiting/ for more information.
The IP address of the user who is making the request.
-=head4 String $api_action
+=head4 String $bucket
-The API action that is being requested.
+The rate-limit bucket that is being requested.
=cut

-sub increment_rate_limit_requests ($ip, $api_action) {
+sub increment_rate_limit_requests ($ip, $bucket) {
if (!$redis_url) {
# No Redis URL provided, we can't increment the number of requests
if (!$sent_warning_about_missing_redis_url) {
@@ -262,14 +262,14 @@ sub increment_rate_limit_requests ($ip, $api_action) {
init_redis();
}
if (defined $redis_client) {
-$ratelimiter_log->debug("Incrementing rate-limit requests", {ip => $ip, api_action => $api_action})
+$ratelimiter_log->debug("Incrementing rate-limit requests", {ip => $ip, bucket => $bucket})
if $ratelimiter_log->is_debug();
my $current_minute = int(time() / 60);
eval {
# Use a MULTI/EXEC block to increment the counter and set the expiration atomically
$redis_client->multi();
-$redis_client->incr("po-rate-limit:$ip:$api_action:$current_minute");
-$redis_client->expire("po-rate-limit:$ip:$api_action:$current_minute", 59);
+$redis_client->incr("po-rate-limit:$ip:$bucket:$current_minute");
+$redis_client->expire("po-rate-limit:$ip:$bucket:$current_minute", 59);
$redis_client->exec();
};
$error = $@;
@@ -285,7 +285,7 @@ sub increment_rate_limit_requests ($ip, $api_action) {
}
else {
$ratelimiter_log->debug("Incremented number of requests from Redis rate-limiter",
-{ip => $ip, api_action => $api_action})
+{ip => $ip, bucket => $bucket})
if $ratelimiter_log->is_debug();
}
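
The two functions above implement a fixed-window counter: the key embeds the IP, the bucket and the current minute, so a new minute starts from a fresh key. A rough, Redis-free sketch of that keying scheme is shown below. The `%counters` hash is an in-memory stand-in used only for illustration, the helper names are hypothetical, and `$now` is passed in explicitly so the behaviour is reproducible; only the key pattern is taken from the diff.

```perl
use strict;
use warnings;

# In-memory stand-in for Redis, for illustration only: key => counter.
my %counters;

# Build the per-minute key, following the "po-rate-limit:$ip:$bucket:$current_minute" pattern.
sub rate_limit_key {
    my ($ip, $bucket, $now) = @_;
    my $current_minute = int($now / 60);
    return "po-rate-limit:$ip:$bucket:$current_minute";
}

# Number of requests recorded in the current window (0 when the key does not exist).
sub get_requests {
    my ($ip, $bucket, $now) = @_;
    return $counters{rate_limit_key($ip, $bucket, $now)} // 0;
}

# Record one more request in the current window and return the new count.
sub increment_requests {
    my ($ip, $bucket, $now) = @_;
    return ++$counters{rate_limit_key($ip, $bucket, $now)};
}

my $t = 120;    # epoch seconds, falls in minute number 2
increment_requests('203.0.113.7', 'facet_tags', $t) for 1 .. 3;
my $same_window = get_requests('203.0.113.7', 'facet_tags', $t);
my $next_window = get_requests('203.0.113.7', 'facet_tags', $t + 60);
print "$same_window $next_window\n";    # 3 0
```

Because the minute number is part of the key, counts never carry over between windows; in the real code the 59-second expiry appears to serve mainly to clean up keys from past minutes.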

57 changes: 42 additions & 15 deletions lib/ProductOpener/Routing.pm
@@ -355,6 +355,10 @@ sub api_route($request_ref) {
set_request_stats_value($request_ref->{stats}, "api_method", $request_ref->{api_method});
set_request_stats_value($request_ref->{stats}, "api_version", $request_ref->{api_version});

+if ($api_action eq "product") {
+$request_ref->{rate_limiter_bucket} = "product";
+}
+
$log->debug("api_route", {request_ref => $request_ref}) if $log->is_debug();
return 1;
}
@@ -364,6 +368,7 @@ sub api_route($request_ref) {
sub search_route($request_ref) {
$request_ref->{search} = 1;
set_request_stats_value($request_ref->{stats}, "route", "search");
+$request_ref->{rate_limiter_bucket} = "search";
return 1;
}

@@ -409,6 +414,7 @@ sub product_route($request_ref) {
$request_ref->{product} = 1;
$request_ref->{code} = $request_ref->{components}[1];
$request_ref->{titleid} = $request_ref->{components}[2] // '';
+$request_ref->{rate_limiter_bucket} = "product";
set_request_stats_value($request_ref->{stats}, "route", "product");
}
else {
@@ -547,10 +553,12 @@ sub facets_route($request_ref) {
$request_ref->{canon_rel_url} .= $canon_rel_url_suffix;

if (defined $request_ref->{groupby_tagtype}) {
+$request_ref->{rate_limiter_bucket} = "facet_tags";
set_request_stats_value($request_ref->{stats}, "route", "facets_tags");
set_request_stats_value($request_ref->{stats}, "groupby_tagtype", $request_ref->{groupby_tagtype});
}
else {
+$request_ref->{rate_limiter_bucket} = "facet_products";
set_request_stats_value($request_ref->{stats}, "route", "facets_products");
}
set_request_stats_value($request_ref->{stats}, "facets_tags", (scalar @{$request_ref->{tags}}));
@@ -943,28 +951,47 @@ sub set_rate_limit_attributes ($request_ref, $ip) {
$request_ref->{rate_limiter_limit} = undef;
$request_ref->{rate_limiter_blocking} = 0;

-my $api_action = $request_ref->{api_action};
-if (not defined $api_action) {
-# The request is not an API request, we don't need to check the rate-limiter
+my $rate_limit_bucket = $request_ref->{rate_limiter_bucket};
+
+if (not defined $rate_limit_bucket) {
+# The request is not rate-limited
return;
}
-$request_ref->{rate_limiter_user_requests} = get_rate_limit_user_requests($ip, $api_action);
+$request_ref->{rate_limiter_user_requests} = get_rate_limit_user_requests($ip, $rate_limit_bucket);

my $limit;
-if (($api_action eq "search") or ($request_ref->{search})) {
+if ($rate_limit_bucket eq "search") {
$limit = $options{rate_limit_search};
}
-elsif ($api_action eq "product") {
+elsif ($rate_limit_bucket eq "product") {
$limit = $options{rate_limit_product};
}
-else {
-# No rate-limit is defined for this API action
-return;
+elsif ($rate_limit_bucket eq "facet_products") {
+if ($request_ref->{is_crawl_bot}) {
+$limit = $options{rate_limit_facet_products_crawl_bot};
+}
+elsif (defined $request_ref->{user_id}) {
+$limit = $options{rate_limit_facet_products_registered};
+}
+else {
+$limit = $options{rate_limit_facet_products_unregistered};
+}
}
+elsif ($rate_limit_bucket eq "facet_tags") {
+if ($request_ref->{is_crawl_bot}) {
+$limit = $options{rate_limit_facet_tags_crawl_bot};
+}
+elsif (defined $request_ref->{user_id}) {
+$limit = $options{rate_limit_facet_tags_registered};
+}
+else {
+$limit = $options{rate_limit_facet_tags_unregistered};
+}
+}
$request_ref->{rate_limiter_limit} = $limit;

if (
-# if $limit is not defined, the rate-limiter is disabled for this API action
+# if $limit is not defined, the rate-limiter is disabled for this route and/or user
defined $limit
and defined $request_ref->{rate_limiter_user_requests}
and $request_ref->{rate_limiter_user_requests} >= $limit
@@ -998,9 +1025,10 @@ sub set_rate_limit_attributes ($request_ref, $ip) {
$block_message,
{
ip => $ip,
-api_action => $api_action,
+rate_limit_bucket => $rate_limit_bucket,
user_requests => $request_ref->{rate_limiter_user_requests},
-limit => $limit
+limit => $limit,
+user_agent => $request_ref->{user_agent},
}
) if $ratelimiter_log->is_info();
}
@@ -1013,11 +1041,10 @@ sub check_and_update_rate_limits($request_ref) {
my $ip_address = remote_addr();
# Set rate-limiter related request attributes
set_rate_limit_attributes($request_ref, $ip_address);
-my $api_action = $request_ref->{api_action};

-if (defined $api_action) {
+if (defined $request_ref->{rate_limiter_bucket}) {
# Increment the number of requests performed by the user for the current minute
-increment_rate_limit_requests($ip_address, $api_action);
+increment_rate_limit_requests($ip_address, $request_ref->{rate_limiter_bucket});
}
}
return;
@@ -15,6 +15,7 @@
"original_query_string" : "api/v0/attribute_groups",
"page" : 1,
"query_string" : "api/v0/attribute_groups",
+"rate_limiter_bucket" : null,
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : null,
"rate_limiter_user_requests" : null
@@ -17,6 +17,7 @@
"original_query_string" : "api/v3/product/03564703999971",
"page" : 1,
"query_string" : "api/v3/product/03564703999971",
+"rate_limiter_bucket" : "product",
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : 100,
"rate_limiter_user_requests" : null
@@ -17,6 +17,7 @@
"original_query_string" : "api/v3/product/https%3A%2F%2Fid.gs1.org%2F01%2F03564703999971%2F10%2FABC%2F21%2F123456%3F17%3D211200",
"page" : 1,
"query_string" : "api/v3/product/https%3A%2F%2Fid.gs1.org%2F01%2F03564703999971%2F10%2FABC%2F21%2F123456%3F17%3D211200",
+"rate_limiter_bucket" : "product",
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : 100,
"rate_limiter_user_requests" : null
@@ -22,5 +22,9 @@
"tagtype" : "categories"
}
],
-"tagtype" : "categories"
+"tagtype" : "categories",
+"rate_limiter_bucket": "facet_tags",
+"rate_limiter_blocking" : 0,
+"rate_limiter_limit" : null,
+"rate_limiter_user_requests" : null
}
@@ -22,5 +22,9 @@
"tagtype" : "categories"
}
],
-"tagtype" : "categories"
+"tagtype" : "categories",
+"rate_limiter_bucket": "facet_tags",
+"rate_limiter_blocking" : 0,
+"rate_limiter_limit" : null,
+"rate_limiter_user_requests" : null
}
@@ -21,5 +21,9 @@
"tagtype" : "categories"
}
],
-"tagtype" : "categories"
+"tagtype" : "categories",
+"rate_limiter_bucket" : "facet_products",
+"rate_limiter_blocking" : 0,
+"rate_limiter_limit" : null,
+"rate_limiter_user_requests" : null
}
@@ -10,8 +10,9 @@
"page" : "4",
"param" : {},
"query_string" : "category/bread/4",
+"rate_limiter_bucket" : "facet_products",
"rate_limiter_blocking" : 0,
-"rate_limiter_limit" : null,
+"rate_limiter_limit" : 5,
"rate_limiter_user_requests" : null,
"tag" : "en:bread",
"tag_prefix" : "",
3 changes: 2 additions & 1 deletion tests/unit/expected_test_results/routing/facet-url.json
@@ -10,8 +10,9 @@
"page" : 1,
"param" : {},
"query_string" : "category/breads",
+"rate_limiter_bucket" : "facet_products",
"rate_limiter_blocking" : 0,
-"rate_limiter_limit" : null,
+"rate_limiter_limit" : 10,
"rate_limiter_user_requests" : null,
"tag" : "en:breads",
"tag_prefix" : "",
@@ -16,6 +16,7 @@
"original_query_string" : "api/v3/geopip/12.45.23.45",
"page" : 1,
"query_string" : "api/v3/geopip/12.45.23.45",
+"rate_limiter_bucket" : null,
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : null,
"rate_limiter_user_requests" : null
@@ -16,6 +16,7 @@
"original_query_string" : "api/v3/geopip/2001:ac8:25:3b::e01d",
"page" : 1,
"query_string" : "api/v3/geopip/2001:ac8:25:3b::e01d",
+"rate_limiter_bucket" : null,
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : null,
"rate_limiter_user_requests" : null
@@ -13,6 +13,7 @@
"page" : 1,
"param" : {},
"query_string" : "category/breads/no-nutrition-data",
+"rate_limiter_bucket" : null,
"rate_limiter_blocking" : 0,
"rate_limiter_limit" : null,
"rate_limiter_user_requests" : null,