Selenium Wire not working with zyte-smartproxy-headless-proxy #322
Comments
When you're using a proxy, you need to use Selenium Wire's proxy option to specify it. This is because Selenium Wire hijacks the normal proxy mechanism in order to capture requests. So in your case, you'd need to do:
options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)
and then you should remove the |
Dear wkeeling, Thanks a lot for your quick response. I have done what you said, but it gives me the following error:
Much appreciated, Andreu Jové |
Are you able to share the code you're using and the config options you're passing to the webdriver? |
Dear wkeeling, I can share some of the code. The proxy needs authentication, but I already authenticate when I run the proxy on port 3128. I have checked that the traffic is going through the proxy, but it fails in
Here is the code that I'm using now:
from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]


def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)


def main():
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=options
    )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()


if __name__ == "__main__":
    main() |
Thanks, that all looks ok as far as I can see. Are you able to post the full traceback you're getting? |
Dear wkeeling, Thanks for your quick reply. Here is the full traceback. It is quite weird, because it works fine with normal Selenium.
|
Dear wkeeling, The following code is the same, but for the normal selenium package, where it works fine. I thought it might be helpful.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]


def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)


def main():
    headless_proxy = "127.0.0.1:3128"
    proxy = Proxy({
        'proxyType': ProxyType.MANUAL,
        'httpProxy': headless_proxy,
        'ftpProxy': headless_proxy,
        'sslProxy': headless_proxy,
        'noProxy': ''
    })
    capabilities = dict(DesiredCapabilities.CHROME)
    proxy.add_to_capabilities(capabilities)
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
        options=chrome_options,
        desired_capabilities=capabilities
    )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()


if __name__ == "__main__":
    main() |
Thanks. It looks like you're using an older version of Selenium Wire. The old versions sometimes had issues with SSL handshaking and proxy servers - and the traceback indicates that seems to be happening here. Is it possible for you to upgrade?
The latest version is 4.3.0 |
Dear wkeeling, Thanks for your quick reply.
Do you have any idea about this? Much appreciated, Andreu |
That looks as though you're using a different version of pyopenssl to what Selenium Wire needs. Are you able to see what version with:
Selenium Wire needs 19.1.0 or above. |
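A minimal sketch of one way to check this from Python, assuming Python 3.8+ (for `importlib.metadata`) and that the package is installed under the distribution name `pyOpenSSL`. The helper names `parse_version`, `meets_minimum` and `installed_pyopenssl_version` are hypothetical, and this is not necessarily the command suggested above; the only fact taken from the thread is the 19.1.0 minimum.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum pyOpenSSL version quoted in the thread.
MINIMUM = (19, 1, 0)


def parse_version(text):
    """Turn '19.1.0' (or '21.0', '20.0.1rc1') into a 3-tuple of ints."""
    numbers = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits of each component.
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_pyopenssl_version():
    """Return the installed pyOpenSSL version string, or None if absent."""
    try:
        return version("pyOpenSSL")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyopenssl_version()
    if found is None:
        print("pyOpenSSL is not installed in this environment")
    else:
        print(found, "OK" if meets_minimum(found) else "too old")
```

Running this inside the same virtual environment (or Docker image) as the scraper matters, since a different environment may carry a different pyOpenSSL.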
Now I look again, this may be because OpenSSL itself isn't up to date. OpenSSL normally comes preinstalled on most platforms but it's possible that the version you're using could be old. What OS are you using? |
Dear wkeeling, I'm using Linux. I managed to create a new environment and it is working. But I'm facing a new problem:
I'm using these arguments for the driver:
SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors-spki-list",
    "ignore-certificate-errors",
]
I'm only installing selenium-wire:
|
Thanks @AndreuJove I've not seen that one before. Could you try adding
options = {
    'mitm_http2': False,  # Add this
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}
Also could you let me know what version of OpenSSL you've got installed, with:
On my version of Linux, I have |
Dear wkeeling, Thanks for your update.
The problem occurs when running the code inside a Docker container( Do you have any idea what the problem is? Thanks a lot, Andreu |
Does it work if you omit the proxy settings and go direct from Selenium Wire to the target site - using your Docker setup? |
Dear wkeeling, Thank you for your response. I have tried without the proxy and it's not working either.
selenium_wire_logger = logging.getLogger("seleniumwire")
selenium_wire_logger.setLevel(logging.ERROR)
But it doesn't work. Is there any other way? Thanks a lot for your help, Andreu |
Dear wkeeling, Do you have any news on:
Thanks a lot, Andreu Jové |
Dear wkeeling, I found the problem. I'm using the Crawlera headless proxy and it has the following problem:
https://github.com/zytedata/zyte-smartproxy-headless-proxy With the configuration that you told me of Are there any other seleniumwire options I can pass? Thanks a lot, Andreu Jové |
I suspect that it has deactivated the http2 connections but the proxy is closing off the connection for some other reason. Just looking at the GitHub page for the proxy, have you tried setting |
Dear wkeeling, Unfortunately I tried that and it didn't work either. It is quite weird. Is there any other way to deactivate HTTP/2? Thanks a lot for your help, Andreu Jové |
Ok thanks. I'm afraid I'm running out of ideas at this point. The |
Dear wkeeling, Can you please reopen the issue? It doesn't work either. I guess it is a problem with the connection between the two proxies. Is there any other way to ignore certificate errors in selenium-wire?
SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors",
    "--ignore-certificate-errors-spki-list",
    "--ignore-ssl-errors",
    "--allow-insecure-localhost",
]
Thank you so much, Andreu Jové |
@AndreuJove will re-open. Selenium Wire ignores SSL certificate errors by default. It's going to require some further debugging. I feel we should perhaps update the title of this ticket to e.g. "Selenium Wire not working with zyte-smartproxy-headless-proxy" if you agree? |
Yes, sure, I'll change it. |
@AndreuJove , @wkeeling had the same issue, managed to fix it by |
Thanks @heisen273 @AndreuJove are you able to confirm whether that fixes for you? |
Thank you very much for your help. Could you please tell me which version of cryptography you are using? I need to put it in the requirements.txt of my project, which is installed in my Docker container. Thanks! |
@AndreuJove there must be something in the stack (or network) that's stripping out the X- headers. I don't have access to my machine currently, but I'll see if I can reproduce locally a bit later. |
Dear @wkeeling, Okay! Can't wait for your answer. |
@AndreuJove so I've tried reproducing with:
def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = "new_referer"
    request.headers['foo'] = 'bar'
    request.headers['X-Crawlera-Cookies'] = "disable"
    request.headers['X-Crawlera-Profile'] = "desktop"
and I'm getting:
{
    "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Foo": "bar",
        "Host": "httpbin.org",
        "Proxy-Connection": "keep-alive",
        "Referer": "new_referer",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
        "X-Amzn-Trace-Id": "Root=1-60ccddf3-6c7a7c3f237a3b433c331272",
        "X-Crawlera-Cookies": "disable",
        "X-Crawlera-Profile": "desktop"
    }
}
I'm using an upstream proxy - but an instance of mitmproxy. I'm thinking it must be something local to your environment. Is the proxy you're using stripping the headers out? Have you tried without the proxy - just a direct request to httpbin.org? |
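To make the comparison above repeatable, here is a minimal sketch that checks an httpbin-style JSON echo against the headers the interceptor is supposed to set. It assumes the echo has the `{"headers": {...}}` shape shown in the output above; the `EXPECTED` mapping and the `missing_headers` helper are hypothetical names, and the sketch only inspects a captured response body rather than driving a browser.

```python
import json

# Headers the interceptor in this thread is meant to add or replace.
EXPECTED = {
    "Referer": "new_referer",
    "Foo": "bar",
    "X-Crawlera-Cookies": "disable",
    "X-Crawlera-Profile": "desktop",
}


def missing_headers(expected, echoed_json):
    """Return the expected headers that are absent or different in the echo.

    Header names are compared case-insensitively, because the echo may
    change their capitalisation (e.g. 'foo' comes back as 'Foo').
    """
    echoed = {
        name.lower(): value
        for name, value in json.loads(echoed_json)["headers"].items()
    }
    return {
        name: value
        for name, value in expected.items()
        if echoed.get(name.lower()) != value
    }


if __name__ == "__main__":
    # Sample echo in which something upstream dropped the X- headers.
    sample = json.dumps(
        {"headers": {"Foo": "bar", "Referer": "new_referer"}}
    )
    print(missing_headers(EXPECTED, sample))
```

An empty result means every expected header reached the echo endpoint; anything listed was removed or altered somewhere between the interceptor and the target, which helps narrow down whether a proxy in the chain is responsible.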
Dear @wkeeling, I have seen that yes, they disappear; maybe it is because of the proxy that I'm using after selenium-wire. I need to debug more because the two proxies are not connecting well. |
Dear @wkeeling, I'm receiving
Do you know why this is happening if selenium-wire ignores SSL certificates? Thanks a lot, Andreu |
That is strange as Selenium Wire does allow insecure SSL certificates by default. Have you tried adding the upstream proxy root certificate to the browser's trusted root certificate authorities? |
Dear wkeeling, I do apologise for not replying. I had to work on other issues. What do you mean by adding the upstream proxy root? The problem that I face now is that I'm receiving this warning:
But I already have the argument for Selenium:
Thanks a lot, Andreu Jové |
Dear @wkeeling, I'm facing a new error: when using the interceptor, the Crawlera headless proxy does not receive what it should receive. Do you know anything about that? Thanks a lot, Andreu |
What data is missing that the proxy should be receiving? It may be worth temporarily disabling the crawlera proxy and validating that the interceptor is doing the right thing, by checking it against https://httpbin.org. Once confirmed the interceptor is correct, add the proxy back. Regarding the certificate error can you try adding |
Dear @wkeeling , Sorry for not replying, I had to develop other features. We have solved the previous error by adding another ca.crt from Crawlera. But we are facing another issue related to TLS certificates:
Do we need to install any other certificates? The problem only appears in the deployment; locally it works fine. That's why I think that we are missing something to install in our Docker container. |
Dear @wkeeling , From the logs:
The proxy that we are using, https://github.com/zytedata/zyte-smartproxy-headless-proxy, is configured on port 3128:
Why is selenium-wire running on 47122? Thanks a lot for your help, Andreu Jové |
Thanks @AndreuJove Port 47122 is probably the port the Selenium Wire server is listening on. When you run Selenium Wire it starts the server on a random free port number. As mentioned previously, Selenium Wire is configured to ignore certificate errors by default. Does that message actually cause a problem loading farmavazquez.com ? |
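As an illustration of why the port differs from 3128 on each run, here is a minimal sketch of the general "bind to port 0" technique for obtaining a random free port. This is not taken from Selenium Wire's source; the `find_free_port` helper is a hypothetical name used only to show how an operating system hands out an ephemeral port such as 47122.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`.

    Binding to port 0 tells the OS to pick any free ephemeral port;
    getsockname() then reports which one it chose.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    # The printed number will usually differ between runs.
    print(find_free_port())
```

In this setup the browser talks to Selenium Wire on its own local port, and Selenium Wire in turn forwards to the upstream proxy on 3128, so seeing two different ports in the logs is expected. The Selenium Wire README also lists a `port` entry in `seleniumwire_options` for pinning the backend to a fixed port; that is worth verifying against the installed version before relying on it.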
Dear @wkeeling, Thank you for your help. Yes, that message might be causing an error on our deployment site (Scrapy Cloud), but I'm not sure whether that is the only cause. How can we avoid all these kinds of messages? Also the log shows
But I guess it is not working either. To pass `--ssl- Do we have to install any certificate from selenium-wire? Kind regards. I hope that we can finally find a solution; your library is very powerful and we would like to use it in production. Andreu |
Hmm ok that doesn't seem right. That option is enabled by default. I'll need to look at it in some more detail. Selenium Wire does have its own certificate that you can install into your browser, but I guess you've probably already done that? The instructions are on the main README in the "Certificates" section. I'm away at the moment but I'll try and investigate once I'm back. |
Dear @wkeeling , The problem is that I have to install that certificate in my Dockerfile. I'm doing it like this.
Not working either. Thanks a lot, Andreu Jové |
@AndreuJove try using the
The
|
Dear @wkeeling , Thank you for your help. I'm copying the certificate directly, since curl can sometimes fail. I had to create a directory first in order to run certutil.
I'm facing a new error now:
Do you have any idea of this? Andreu Jové |
Dear @wkeeling , I'm having the same error even without using the proxy and without the interceptor, so clearly the problem is selenium-wire. With normal Selenium it works on the cloud, but with Selenium Wire it does not.
I'm thinking about incompatibilities between the version of selenium-wire I'm using and the Chrome version or the Chrome webdriver. Here is how I'm building Chrome and the webdriver:
selenium-wire==4.3.0 More information:
Thanks a lot for your help, Andreu Jové |
Ok thanks @AndreuJove Are you able to share the Dockerfile that reproduces the issue? I'll try and debug this locally to figure out what's going on. |
Dear @wkeeling , The problem I'm having is that using Selenium Wire in Scrapy Cloud (now called Zyte) raises the error I described. In my Docker container it works fine. This is the Dockerfile I'm using. Could there be an incompatibility? Thank you for your help, I have tried installing certificates and 100 other combinations and they don't work in Scrapy Cloud. Andreu Jové |
Dear @wkeeling , We have discovered that version 3.0.6 works on Scrapy Cloud (it does not give the SSL error); the problem is that it doesn't work with the proxy. Do you know anything about this? Thank you for your help, Andreu Jové |
Version 3.0.6 was the last version to use the old backend. From 4.0.0 onward Selenium Wire uses mitmproxy as its backend. The reason for the change was that the old backend struggled to handle upstream proxy servers and would often fail with low level socket errors or fail to connect at all (which sounds like the issue you've had with it). mitmproxy has much better stability with upstream proxies. |
hey everyone! In this library we call In that issue you'll find a link to the issue in |
@ejulio great work in discovering this. I'll consider each of the workarounds in pyca/pyopenssl#168 and see if I can apply a fix to Selenium Wire. Thanks again. |
@ejulio there's now a new version of Selenium Wire available with a fix for the socket timeout issue (v4.5.3). |
Thanks @wkeeling |
Dear selenium-wire,
I have been using a proxy running on my localhost port 3128, and it works in normal Selenium.
I guess that the command:
chrome_options.add_argument("--proxy-server=localhost:3128")
is not working for selenium-wire. Any idea how to solve this?
Thanks a lot