Contributor

@aponb aponb commented Aug 13, 2025

This change introduces a new CLI option --extraChromeArgs to Browsertrix Crawler, allowing users to pass arbitrary Chrome flags without modifying the codebase.

This approach is future-proof: any Chrome flag can be provided at runtime, avoiding the need for hard-coded allowlists.
It also maintains backward compatibility: if no extraChromeArgs are passed, behavior remains unchanged.
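
As a rough illustration of the behavior described above (not the code in this PR), extra flags could simply be appended to the launch arguments the crawler already uses. The helper name `mergeChromeArgs` and the sample default flags are assumptions made for this sketch; the crawler's real default list and option parsing may differ.

```typescript
// Hypothetical helper, for illustration only: not Browsertrix Crawler's actual code.
// Appends user-supplied Chrome flags to an existing list of launch arguments.
function mergeChromeArgs(
  defaultArgs: readonly string[],
  extraArgs: readonly string[] = [],
): string[] {
  // Trim whitespace and drop empty entries so a blank value adds no flag.
  const cleaned = extraArgs
    .map((arg) => arg.trim())
    .filter((arg) => arg.length > 0);

  // With no extra flags, the result equals the defaults (backward compatible).
  return [...defaultArgs, ...cleaned];
}

// Sample defaults chosen for the example; they are an assumption,
// not the crawler's curated list.
const sampleDefaults = ["--no-first-run", "--disable-background-networking"];

const unchanged = mergeChromeArgs(sampleDefaults);
const extended = mergeChromeArgs(sampleDefaults, ["--disk-cache-dir=/mnt/cache"]);

console.log(unchanged);
console.log(extended);
```

Appending rather than replacing keeps the existing flags intact, which is what makes the option a no-op when it is not used.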

@ikreymer
Member

Hm, are there specific Chromium flags you're interested in?
I can see this being useful, but there are also many flags that could break the crawler, and the curated list makes sense because it keeps configuration simple. I'm not opposed to adding it per se, I just wanted to understand whether there's a specific use case.

@aponb
Contributor Author

aponb commented Sep 16, 2025

We’re running Browsertrix in a virtualized environment. Our biggest bottleneck is limited fast local storage, while the crawler writes a lot of small files.
Our main use case is to be able to redirect the Chrome cache to another storage location with --disk-cache-dir, and also control related settings like --disk-cache-size. Having the flexibility to pass these flags directly lets us optimize performance without needing to patch Browsertrix itself every time.
So the initial motivation was the cache directory, but having a general extraChromeArgs option makes it future-proof: if other flags become useful later, we don’t need code changes.
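
To make the cache use case concrete, here is a minimal sketch of how the two flags mentioned above could be assembled before being passed through an option like extraChromeArgs. The `CacheOptions` type, the `buildCacheArgs` helper, and the path `/mnt/bulk-storage/chrome-cache` are hypothetical names invented for this example. It assumes that `--disk-cache-size` takes a value in bytes, as Chromium's switch documentation describes.

```typescript
// Hypothetical types and helper, for illustration only.
interface CacheOptions {
  dir: string;           // target directory for Chrome's disk cache
  maxSizeBytes?: number; // optional cache size limit, in bytes
}

// Builds the Chromium cache flags discussed in this thread.
function buildCacheArgs(opts: CacheOptions): string[] {
  if (opts.dir.trim().length === 0) {
    throw new Error("cache directory must not be empty");
  }
  const args = [`--disk-cache-dir=${opts.dir}`];

  if (opts.maxSizeBytes !== undefined) {
    // Reject values that cannot be a meaningful byte count.
    if (!Number.isInteger(opts.maxSizeBytes) || opts.maxSizeBytes <= 0) {
      throw new RangeError("maxSizeBytes must be a positive integer");
    }
    args.push(`--disk-cache-size=${opts.maxSizeBytes}`);
  }
  return args;
}

// Example: redirect the cache to a separate volume and cap it at 512 MiB.
const cacheArgs = buildCacheArgs({
  dir: "/mnt/bulk-storage/chrome-cache", // assumed path for the example
  maxSizeBytes: 512 * 1024 * 1024,
});

console.log(cacheArgs);
```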

@ikreymer
Member

Hm, I suppose that's something we could expose, but the directory can also be controlled by mapping a new volume to it.

I suppose there's no harm in supporting this as a catch-all, I just wanted to see if there's a better way we can support what you're doing. Can you rebase this on main so we can see whether the tests pass now?

@aponb
Contributor Author

aponb commented Oct 1, 2025

I rebased this, but some checks are still failing. It seems a Docker Hub login is required somewhere?
