
Conversation

@ikreymer (Member) commented Oct 28, 2025

  • set 'max_concur_queue_to_limit_scale' to determine the maximum number of concurrent crawls that can be queued
  • if set, queueing new crawls above this limit is rejected with a 429 until there are fewer concurrent crawls waiting in the queue (see the sketch below this list)
  • defaults to disabled
  • tests: add max queue limit to concurrent crawl tests
  • fixes [Task]: Add a way to limit concurrent crawler queue #2938
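
A minimal sketch of the check described above, assuming a FastAPI-style HTTPException; the function name and error detail string are illustrative, not the actual Browsertrix code:

```python
# Hedged sketch: reject queueing another crawl once the number of crawls
# already waiting reaches the org's concurrent crawl limit multiplied by
# the configured scale. A scale of 0 leaves the check disabled (default).
from fastapi import HTTPException

def check_can_queue_crawl(
    queued_count: int, max_concur_crawls: int, queue_to_limit_scale: int
) -> None:
    if not queue_to_limit_scale or not max_concur_crawls:
        # feature disabled (default) or no concurrent crawl limit on the org
        return
    if queued_count >= max_concur_crawls * queue_to_limit_scale:
        # 429 Too Many Requests until queued crawls drop below the cap
        raise HTTPException(status_code=429, detail="too_many_queued_crawls")
```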

Testing:

  • Set a low concurrent crawl limit on an org, e.g. 1
  • Set max_concur_queue_to_limit_scale: 1
  • Attempt to start another crawl after 1 is already waiting; the request should be rejected with an error / 429 (a scripted version of these steps is sketched below)
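
A hypothetical script for the steps above; the base URL, auth token, and crawl-start endpoint path are placeholders, not the real Browsertrix API:

```python
# Hypothetical manual-test script; all URLs and credentials are placeholders.
import requests

API = "https://btrix.example.com/api"        # placeholder deployment URL
HEADERS = {"Authorization": "Bearer TOKEN"}  # placeholder access token
# placeholder crawl-start endpoint; the real path may differ
RUN_URL = f"{API}/orgs/ORG_ID/crawlconfigs/CONFIG_ID/run"

# With the org's concurrent crawl limit set to 1 and
# max_concur_queue_to_limit_scale: 1, the expectation is:
#   1st start runs, 2nd waits in the queue, 3rd is rejected with a 429
for attempt in range(1, 4):
    resp = requests.post(RUN_URL, headers=HEADERS)
    print(f"attempt {attempt}: HTTP {resp.status_code}")
```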

@ikreymer ikreymer requested a review from tw4l October 28, 2025 19:37
@tw4l (Member) left a comment

In general working well! Tested locally with a limit scale multiplier of 1 and 2.

The nightly concurrent crawl limit test doesn't pass yet and should be tweaked a bit; I left a comment in-line.

I wonder if we also want to have a limit for the number of queued crawls even when a concurrent crawl quota isn't set? Not sure if self-deployments that don't use a concurrent crawl limit might still run into resourcing issues from starting too many crawls at the same time.

My only other comment is that I find max_concur_queue_to_limit_scale pretty cryptic. Without reading the comments in values.yaml, I'm not sure I'd be able to piece together what it means. Something like crawl_queue_limit_scale might be a bit easier to read?

@ikreymer (Member, Author)

> I wonder if we also want to have a limit for the number of queued crawls even when a concurrent crawl quota isn't set? Not sure if self-deployments that don't use a concurrent crawl limit might still run into resourcing issues from starting too many crawls at the same time.

PR #2945 adds a separate optimization which should make the concurrent crawl check more efficient in general, even if there is no limit. I think that should avoid the main resourcing issue, and it maybe makes this PR less important, but it's still a good option to have.

@ikreymer ikreymer requested a review from tw4l October 30, 2025 02:34
