
Add Proxy Settings for Crawler #745

Open
1 task done
Sinterdial opened this issue Dec 21, 2024 · 6 comments
Comments

@Sinterdial

Describe the feature you'd like

In certain regions of the world, internet censorship is very strict, so proxies must be used to access some very popular websites.

However, when I tried to route the container's traffic through a proxy with the environment variables HTTP_PROXY and HTTPS_PROXY, it had no effect. I hope configuration options can be added to the .env file to enable this functionality.
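A minimal sketch of what the requested configuration might look like, as a docker-compose fragment. The proxy address, service name, and image tag here are illustrative assumptions, not part of Hoarder's documented setup; the variable names just follow the common HTTP(S)_PROXY convention.

```yaml
# Hypothetical sketch: proxy environment variables for the Hoarder container.
# The proxy URL 192.168.1.10:7890 is an example value.
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:latest   # assumed image name
    environment:
      HTTP_PROXY: "http://192.168.1.10:7890"
      HTTPS_PROXY: "http://192.168.1.10:7890"
      NO_PROXY: "localhost,127.0.0.1"
```

Note that these variables only affect processes that honor them; as the issue describes, the crawler's browser traffic may bypass them, which is why a first-class setting is being requested.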

Describe the benefits this would bring to existing Hoarder users

Users around the world

Can the goal of this request already be achieved via other means?

Yes, this could perhaps be achieved by setting the proxy in the Docker daemon's configuration file, but I only want to enable the proxy for specific containers.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

@MohamedBassem
Collaborator

You should be able to set up a proxy for the Chrome container (where most of the fetching happens) using Chrome flags. Check out #420
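The suggestion above can be sketched as a compose fragment for the Chrome container. `--proxy-server` is a standard Chromium flag; the image, tag, and proxy address below are assumptions for illustration, and the remote-debugging flags mirror a typical headless-Chrome sidecar setup.

```yaml
# Sketch: pointing the headless Chrome sidecar at a proxy via Chromium flags.
services:
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123   # assumed image/tag
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --proxy-server=http://192.168.1.10:7890  # example proxy address
```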

@KortanZ

KortanZ commented Dec 22, 2024

Is there any way to set up a proxy server when using browserless? It would be great if I could reuse my browserless service :D

@MohamedBassem
Collaborator

@KortanZ Seems like you can, by appending it to the end of the browserless URL you give to Hoarder (https://docs.browserless.io/recipes/proxies#specifying-the-proxy)
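Per the linked browserless recipe, Chromium launch flags can be passed as query parameters on the connection URL. A hedged sketch, assuming Hoarder's browser websocket URL setting and an example token and proxy address (all values hypothetical):

```yaml
# Sketch: forwarding a --proxy-server launch flag through browserless.
services:
  web:
    environment:
      # Query parameters after the token are passed to the launched browser.
      BROWSER_WEBSOCKET_URL: "ws://browserless:3000?token=MY_TOKEN&--proxy-server=http://192.168.1.10:7890"
```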

@primejava

primejava commented Dec 24, 2024

I have added the `proxy-server` parameter in both the command and environment sections of the Chrome container. After successfully starting the container, I verified that the proxy is working by running `curl -I www.youtube.com` inside the container. However, when I paste a URL into Hoarder, I still encounter the error `Crawling job failed: TimeoutError: Navigation timeout of 30000 ms exceeded`.

@Sinterdial
Author

> I have added the proxy-server parameter in both the command and environment sections of the Chrome container. After successfully starting the container, I verified that the proxy is working by executing the command curl -I www.youtube.com inside the container. However, when I paste a URL into Hoarder, I still encounter the error Crawling job failed: TimeoutError: Navigation timeout of 30000 ms exceeded

same

@waynexia

In a Docker environment the networking is a bit different: you need the special IP 172.17.0.1 to reach the host network (my handbook)
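The point above can be sketched as a compose fragment: if the proxy runs on the Docker host rather than in another container, the Chrome container must address it via the host gateway, not `localhost`. The flags and proxy port are illustrative assumptions.

```yaml
# Sketch: reaching a proxy that listens on the Docker host.
services:
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123   # assumed image/tag
    command:
      - --no-sandbox
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      # 172.17.0.1 is the default docker0 bridge gateway on Linux;
      # the proxy on the host must also listen on that interface,
      # not only on 127.0.0.1.
      - --proxy-server=http://172.17.0.1:7890
    # Portable alternative: map host.docker.internal to the host gateway
    # and use it in --proxy-server instead of the hardcoded bridge IP.
    extra_hosts:
      - "host.docker.internal:host-gateway"
```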


5 participants