This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Selenium Wire not working with zyte-smartproxy-headless-proxy #322

Closed
AndreuJove opened this issue Jun 12, 2021 · 63 comments · Fixed by #415

Comments

@AndreuJove

AndreuJove commented Jun 12, 2021

Dear selenium-wire,

I have been using a proxy running on localhost port 3128, and it works with plain Selenium.

It seems that chrome_options.add_argument("--proxy-server=localhost:3128") does not work with selenium-wire.

Any idea how to solve this?

Thanks a lot

@wkeeling
Owner

wkeeling commented Jun 12, 2021

When you're using a proxy, you need to use Selenium Wire's proxy option to specify it. This is because Selenium Wire hijacks the normal proxy mechanism in order to capture requests. So in your case, you'd need to do:

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

and then you should remove the --proxy-server argument from your chrome_options.
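As a minimal sketch of that migration, the helper below pulls a `--proxy-server=` flag out of a list of Chrome arguments and turns it into a `seleniumwire_options` dictionary of the shape shown above. The function name `split_proxy_argument` is hypothetical, and the sketch assumes the flag value is a bare `host:port` without a scheme.

```python
def split_proxy_argument(arguments):
    """Separate a --proxy-server flag from the other Chrome arguments.

    Returns (remaining_arguments, seleniumwire_options).
    Hypothetical helper: assumes the flag value is a bare host:port.
    """
    remaining = []
    proxy_address = None
    for argument in arguments:
        if argument.startswith("--proxy-server="):
            proxy_address = argument.split("=", 1)[1]
        else:
            remaining.append(argument)

    seleniumwire_options = {}
    if proxy_address is not None:
        # Same dictionary shape as the 'proxy' option shown in this thread.
        seleniumwire_options["proxy"] = {
            "http": f"http://{proxy_address}",
            "https": f"https://{proxy_address}",
        }
    return remaining, seleniumwire_options


chrome_arguments, seleniumwire_options = split_proxy_argument(
    ["--no-sandbox", "--proxy-server=localhost:3128"]
)
print(chrome_arguments)
print(seleniumwire_options)
```

The remaining arguments would then go to `chrome_options.add_argument(...)` and the dictionary to `webdriver.Chrome(seleniumwire_options=...)`.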

@AndreuJove
Author

AndreuJove commented Jun 12, 2021

Dear wkeeling,

Thanks a lot for your quick response.

I did what you suggested, but it gives me the following error:

OSError: [Errno 0] Error
2021-06-12 19:47:49 [seleniumwire.proxy.handler] ERROR: Error making request

Much appreciated,

Andreu Jové

@wkeeling
Owner

Are you able to share the code you're using and the config options you're passing to the webdriver?

@AndreuJove
Author

AndreuJove commented Jun 12, 2021

Dear wkeeling,

I can share some of the code. The proxy needs authentication, but I already authenticate when I start the proxy on port 3128.

I have checked that the request goes through the proxy, but it fails in seleniumwire/proxy/proxy2.py, line 91, in proxy_request.

Here is the code that I'm using now.

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]

def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)

def main():
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=options,
    )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()

if __name__ == "__main__":
    main()

@wkeeling
Owner

Thanks, that all looks ok as far as I can see. Are you able to post the full traceback you're getting?

@AndreuJove
Author

Dear wkeeling,

Thanks for your quick reply.

Here is the full traceback. It is quite weird, because it works fine with plain Selenium.

Error making request
Traceback (most recent call last):
  File ".local/lib/python3.8/site-packages/seleniumwire/proxy/proxy2.py", line 91, in proxy_request
    conn.request(self.command, path, req_body, dict(req.headers))
  File "/usr/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/.local/lib/python3.8/site-packages/seleniumwire/proxy/proxy2.py", line 368, in connect
    super().connect()
  File "/usr/lib/python3.8/http/client.py", line 1424, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error

@AndreuJove
Author

Dear wkeeling,

The following code does the same thing with the plain selenium package, where it works fine. I thought it might be helpful.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]


def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)


def main():
    headless_proxy = "127.0.0.1:3128"
    proxy = Proxy({
        'proxyType': ProxyType.MANUAL,
        'httpProxy': headless_proxy,
        'ftpProxy' : headless_proxy,
        'sslProxy' : headless_proxy,
        'noProxy'  : ''
    })

    capabilities = dict(DesiredCapabilities.CHROME)
    proxy.add_to_capabilities(capabilities)
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
        options=chrome_options,
        desired_capabilities=capabilities,
    )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()

if __name__ == "__main__":
    main()

@wkeeling
Owner

wkeeling commented Jun 12, 2021

Thanks. It looks like you're using an older version of Selenium Wire. The old versions sometimes had issues with SSL handshaking and proxy servers - and the traceback indicates that seems to be happening here. Is it possible for you to upgrade?

pip install --upgrade selenium-wire

The latest version is 4.3.0

@AndreuJove
Author

Dear wkeeling,

Thanks for your quick reply.
I have upgraded (I was on 2.1.2), but now I'm getting a new error:

127.0.0.1:35492: Traceback (most recent call last):
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 113, in handle
    root_layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/modes/http_proxy.py", line 23, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 285, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http1.py", line 100, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 205, in __call__
    if not self._process_flow(flow):
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 306, in _process_flow
    return self.handle_upstream_connect(f)
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 253, in handle_upstream_connect
    return layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 102, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 285, in create_client_context
    param = SSL._lib.SSL_CTX_get0_param(context._context)
AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param'

Do you have any idea about this?

Much appreciated,

Andreu

@wkeeling
Owner

That looks as though you're using a different version of pyopenssl to what Selenium Wire needs. Are you able to check which version you have with:

pip show pyopenssl

Selenium Wire needs 19.1.0 or above.

@wkeeling
Owner

Now I look again, this may be because OpenSSL itself isn't up to date. OpenSSL normally comes preinstalled on most platforms but it's possible that the version you're using could be old. What OS are you using?
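The same checks can be scripted from inside the Python environment that runs the browser session, which helps when comparing a container against a local machine. This is only a sketch: `installed_version` and `version_tuple` are hypothetical helpers, `ssl.OPENSSL_VERSION` reports the OpenSSL build linked into Python's own `ssl` module (which may differ from the one pyOpenSSL uses through the `cryptography` package), and the 19.1.0 minimum is the figure quoted above.

```python
import re
import ssl
from importlib import metadata

MINIMUM_PYOPENSSL = (19, 1, 0)  # minimum quoted in this thread


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '19.1.0' into (19, 1, 0), stopping at the first non-numeric part."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


print("OpenSSL linked into Python's ssl module:", ssl.OPENSSL_VERSION)
for name in ("pyOpenSSL", "cryptography", "selenium-wire"):
    print(f"{name}: {installed_version(name)}")

pyopenssl = installed_version("pyOpenSSL")
if pyopenssl is not None and version_tuple(pyopenssl) < MINIMUM_PYOPENSSL:
    print("pyOpenSSL is older than 19.1.0")
```

Running it in both environments and comparing the output line by line is one way to spot a stale package.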

@AndreuJove
Author

AndreuJove commented Jun 12, 2021

Dear wkeeling,

I'm using Linux. I managed to create a new environment and it is working. But I'm facing a new problem:

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

I'm using these arguments for the driver:

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors-spki-list",
    "ignore-certificate-errors",
]

I'm only installing selenium-wire:

selenium-wire==4.3.0

@wkeeling
Owner

Thanks @AndreuJove I've not seen that one before. Could you try adding mitm_http2: False to your seleniumwire_options:

options = {
    'mitm_http2': False,  # Add this
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

Also could you let me know what version of OpenSSL you've got installed, with:

openssl version

On my version of Linux, I have OpenSSL 1.1.1 11 Sep 2018 installed.

@AndreuJove
Author

Dear wkeeling,

Thanks for your update.
I have added this option to seleniumwire_options, but I still get the exact same error.

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

The problem occurs when running the code inside a Docker container (OpenSSL 1.1.1d 10 Sep 2019); on my local host it runs fine (OpenSSL 1.1.1f 31 Mar 2020).

Do you have any idea what the problem is?

Thanks a lot,

Andreu

@wkeeling
Owner

wkeeling commented Jun 13, 2021

Does it work if you omit the proxy settings and go direct from Selenium Wire to the target site - using your Docker setup?

@AndreuJove
Author

AndreuJove commented Jun 13, 2021

Dear wkeeling,

Thank you for your response. I have tried without the proxy and it's not working either.
I also have one more question about selenium-wire while I'm debugging the Docker container problem. I would like to change the logging level of selenium-wire.
I have tried:

import logging

selenium_wire_logger = logging.getLogger("seleniumwire")
selenium_wire_logger.setLevel(logging.ERROR)

But it doesn't work. Is there any other way?

Thanks a lot for your help,

Andreu
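One way to check whether a level set on the parent logger actually reaches the loggers that emit the messages is to inspect a child logger's effective level. The sketch below uses only the standard library; the child name `seleniumwire.server` is taken from the log prefixes quoted in this thread, and the sketch assumes nothing else (for example a framework that reconfigures logging later) overrides the level afterwards.

```python
import logging

# Set the level on the package-level logger, as in the snippet above.
parent = logging.getLogger("seleniumwire")
parent.setLevel(logging.ERROR)

# A child logger with no level of its own inherits from its nearest
# ancestor that has one.
child = logging.getLogger("seleniumwire.server")
effective = logging.getLevelName(child.getEffectiveLevel())
print(f"{child.name}: effective level {effective}")
print("WARNING enabled:", child.isEnabledFor(logging.WARNING))
```

If warnings still appear after this reports ERROR, a likely place to look is whatever else configures logging in the process after these lines run.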

@AndreuJove
Author

AndreuJove commented Jun 14, 2021

Dear wkeeling,

Do you have any news on:

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

Thanks a lot,

Andreu Jové

@AndreuJove
Author

Dear wkeeling,

I found the problem. I'm using crawlera-headless-proxy, which has the following limitation:

Since crawlera-headless-proxy has to inject X-Headers into responses, it works with your browser only by HTTP 1.1. Unfortunately, there is no clear way how to hijack HTTP2 connections. Also, since it is effectively MITM proxy, you need to use its own TLS certificate. This is hardcoded into the binary so you have to download it and apply it to your system. Please consult with manuals of your operating system how to do that.

https://github.com/zytedata/zyte-smartproxy-headless-proxy

The mitm_http2: False setting you suggested should deactivate HTTP/2 connections and make it work, but it doesn't.

Are there any other seleniumwire options I can pass?

Thanks a lot,

Andreu Jové

@wkeeling
Owner

I suspect that it has deactivated the http2 connections but the proxy is closing off the connection for some other reason. Just looking at the GitHub page for the proxy, have you tried setting --dont-verify-crawlera-cert for the proxy itself?

@AndreuJove
Author

Dear wkeeling,

Unfortunately, I tried that and it didn't work either. It is quite weird. Is there any other way to deactivate HTTP/2?

Thanks a lot for your help,

Andreu Jové

@wkeeling
Owner

Ok thanks. I'm afraid I'm running out of ideas at this point. The mitm_http2: False should definitely work when placed in the seleniumwire_options as it's used as a workaround for other issues. Not all websites support HTTP2. If you specify a site that doesn't use HTTP2 does it work? E.g. http://httpbin.org/anything

@AndreuJove
Author

Dear wkeeling,

Can you please reopen the issue? That doesn't work either. I guess the problem is in the connection between the two proxies. Is there any other way to ignore certificate errors in selenium-wire?

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors",
    "--ignore-certificate-errors-spki-list",
    "--ignore-ssl-errors",
    "--allow-insecure-localhost",
]

Thank you so much,

Andreu Jové

@wkeeling
Owner

@AndreuJove will re-open. Selenium Wire ignores SSL certificate errors by default. It's going to require some further debugging. I feel we should perhaps update the title of this ticket to e.g. "Selenium Wire not working with zyte-smartproxy-headless-proxy" if you agree?

@wkeeling reopened this on Jun 15, 2021

@AndreuJove
Author

@wkeeling

Yes, sure, I'll change it.

@AndreuJove changed the title from "--proxy-server=localhost:3128 not working in seleniumwire" to "Selenium Wire not working with zyte-smartproxy-headless-proxy" on Jun 15, 2021

@heisen273

@AndreuJove, @wkeeling: I had the same issue and managed to fix it with pip3 install -U cryptography.
The cryptography module was outdated.

@wkeeling
Owner

Thanks @heisen273

@AndreuJove are you able to confirm whether that fixes it for you?

@AndreuJove
Author

@heisen273

Thank you very much for your help. Could you please tell me which version of cryptography you are using? I need to pin it in my project's requirements.txt, which is installed in my Docker container.

Thanks!

@wkeeling
Owner

@AndreuJove there must be something in the stack (or network) that's stripping out the X- headers. I don't have access to my machine currently, but I'll see if I can reproduce locally a bit later.

@AndreuJove
Author

Dear @wkeeling,

Okay! Looking forward to your answer.

@wkeeling
Owner

wkeeling commented Jun 18, 2021

@AndreuJove so I've tried reproducing with:

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = "new_referer"
    request.headers['foo'] = 'bar'
    request.headers['X-Crawlera-Cookies'] = "disable"
    request.headers['X-Crawlera-Profile'] = "desktop"

and I'm getting:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8", 
    "Foo": "bar", 
    "Host": "httpbin.org", 
    "Proxy-Connection": "keep-alive", 
    "Referer": "new_referer", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-60ccddf3-6c7a7c3f237a3b433c331272", 
    "X-Crawlera-Cookies": "disable", 
    "X-Crawlera-Profile": "desktop"
  }
}

I'm using an upstream proxy - but an instance of mitmproxy.

I'm thinking it must be something local to your environment. Is the proxy you're using stripping the headers out? Have you tried without the proxy - just a direct request to httpbin.org?
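The `del` before the `Referer` assignment in the interceptor above matters if the headers object permits duplicate names, because assignment then appends rather than replaces. The sketch below illustrates that behaviour with the standard library's `http.client.HTTPMessage` as a stand-in; that `request.headers` behaves the same way in this respect is an assumption here, and `set_header` is a hypothetical helper.

```python
from http.client import HTTPMessage


def set_header(headers, name, value):
    """Replace a header instead of appending a duplicate (hypothetical helper)."""
    del headers[name]  # removes every occurrence; no error if absent
    headers[name] = value


headers = HTTPMessage()
headers["Referer"] = "https://example.com/"

# Plain assignment appends a second header with the same name.
headers["Referer"] = "new_referer"
duplicated = headers.get_all("Referer")
print(duplicated)

# Deleting first leaves exactly one value.
set_header(headers, "Referer", "new_referer")
replaced = headers.get_all("Referer")
print(replaced)
```

With Selenium Wire the interceptor is attached with `driver.request_interceptor = interceptor` before calling `driver.get(...)`.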

@AndreuJove
Author

Dear @wkeeling,

I have seen that, yes, the headers disappear; maybe it is because of the proxy that I'm using after selenium-wire. I need to debug more, because the two proxies are not connecting well.

@AndreuJove
Author

Dear @wkeeling,

I'm receiving:


[seleniumwire.server] 127.0.0.1:54132: Certificate verification error for www.dieteticacentral.com: self signed certificate in certificate chain (errno: 19, depth: 1)

WARNING | [seleniumwire.server] 127.0.0.1:54132: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

Do you know why this is happening if selenium-wire ignores SSL certificate errors?

Thanks a lot,

Andreu

@wkeeling
Owner

That is strange as Selenium Wire does allow insecure SSL certificates by default. Have you tried adding the upstream proxy root certificate to the browser's trusted root certificate authorities?

@AndreuJove
Author

AndreuJove commented Jul 12, 2021

Dear wkeeling,

I do apologise for not replying. I had to work on other issues.

What do you mean by adding the upstream proxy root certificate?

The problem that I face now is that I'm receiving this warning:

[seleniumwire.server] 127.0.0.1:42076: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

But I already pass that argument to Selenium:

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors-spki-list",
    "--ignore-ssl-errors",
    "--ssl-insecure"
]

Thanks a lot,

Andreu Jové

@AndreuJove
Author

Dear @wkeeling,

I'm facing a new error: when I use the interceptor, the crawlera headless proxy does not receive what it should receive.

Do you know anything about that?

Thanks a lot,

Andreu

@wkeeling
Owner

What data is missing that the proxy should be receiving? It may be worth temporarily disabling the crawlera proxy and validating that the interceptor is doing the right thing, by checking it against https://httpbin.org. Once you have confirmed the interceptor is correct, add the proxy back.

Regarding the certificate error can you try adding --ignore-certificate-errors to your list of options? I wouldn't expect that to make any difference as Selenium Wire implicitly sets it - but worth a try.

@AndreuJove
Author

AndreuJove commented Aug 3, 2021

Dear @wkeeling ,

Sorry for not replying; I had to work on other features.

We solved the previous error by adding another Crawlera ca.crt. But we are facing another issue related to TLS certificates:

502 Bad Gateway   502 Bad Gateway TlsProtocolException("Cannot establish TLS with www.farmavazquez.com:443 (sni: www.farmavazquez.com): TlsException('SSL handshake error: WantReadError()')")

Do we need to install any other certificates? The odd thing is that the local deployment works fine. That's why I think we are missing something that needs to be installed in our Docker container.

@AndreuJove
Author

AndreuJove commented Aug 4, 2021

Dear @wkeeling ,

From the logs:

[seleniumwire.server] 127.0.0.1:47122: Certificate verification error for www.farmavazquez.com: self signed certificate in certificate chain (errno: 19, depth: 1)

The proxy that we are using, https://github.com/zytedata/zyte-smartproxy-headless-proxy, is configured on port 3128:

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

Why is selenium-wire running on port 47122?

Thanks a lot for your help,

Andreu Jové

@wkeeling
Owner

wkeeling commented Aug 4, 2021

Thanks @AndreuJove

Port 47122 is probably the port the Selenium Wire server is listening on. When you run Selenium Wire it starts the server on a random free port number.

As mentioned previously, Selenium Wire is configured to ignore certificate errors by default. Does that message actually cause a problem loading farmavazquez.com ?
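For readers unfamiliar with how a "random free port" is usually obtained, the sketch below asks the operating system for one by binding to port 0. It illustrates the general technique with the standard library only; how Selenium Wire itself chooses its port is not shown in this thread, and `pick_free_port` is a hypothetical name.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]


port = pick_free_port()
print(f"The OS handed out port {port}")
```

The number differs from run to run, which is why the port in the log lines does not match the 3128 configured for the upstream proxy.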

@AndreuJove
Author

AndreuJove commented Aug 4, 2021

Dear @wkeeling,

Thank you for your help.

Yes, that message might be causing an error in our deployment environment (Scrapy Cloud). But I'm not sure if it is the only cause. How can we avoid all these kinds of messages?

Also the log shows

[seleniumwire.server] 127.0.0.1:47122: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

But I guess is not working either. To pass `--ssl-

Do we have to install any certificate of selenium-wire?

Kind regards,

I hope that we can finally find a solution. Your library is very powerful and we would like to use it in production.

Andreu

@wkeeling
Owner

wkeeling commented Aug 4, 2021

Hmm ok that doesn't seem right. That option is enabled by default. I'll need to look at it in some more detail. Selenium Wire does have its own certificate that you can install into your browser, but I guess you've probably already done that? The instructions are on the main README in the "Certificates" section. I'm away at the moment but I'll try and investigate once I'm back.

@AndreuJove
Author

AndreuJove commented Aug 4, 2021

Dear @wkeeling ,

The problem is that I have to install that certificate in my Dockerfile. I'm doing it like this:

RUN curl https://raw.githubusercontent.com/wkeeling/selenium-wire/master/seleniumwire/ca.crt -o seleniumire-certificate.crt
RUN sudo cp seleniumire-certificate.crt /usr/local/share/ca-certificates/seleniumire-certificate.crt

RUN sudo update-ca-certificates

Not working either.

Thanks a lot,

Andreu Jové

@wkeeling
Owner

wkeeling commented Aug 8, 2021

@AndreuJove try using the certutil command to install the certificate:

RUN curl https://raw.githubusercontent.com/wkeeling/selenium-wire/master/seleniumwire/ca.crt -o seleniumire-certificate.crt
RUN certutil -d sql:$HOME/.pki/nssdb -A -t TC -n "Selenium Wire" -i seleniumire-certificate.crt

The certutil command is part of the libnss3-tools package, so assuming you're using a Debian derivative, you'll need to install that by adding to your Dockerfile:

RUN apt-get update && apt-get install -y libnss3-tools

@AndreuJove
Author

AndreuJove commented Aug 10, 2021

Dear @wkeeling ,

Thank you for your help.

I'm copying the certificate directly, since curl can sometimes fail. I had to create a directory first in order to run certutil.

# RUN curl https://raw.githubusercontent.com/wkeeling/selenium-wire/master/seleniumwire/ca.crt -o seleniumire-certificate.crt
COPY ./seleniumire-certificate.crt /usr/local/share/ca-certificates/seleniumire-certificate.crt
RUN mkdir -p $HOME/.pki/nssdb
RUN certutil -d sql:$HOME/.pki/nssdb -A -t TC -n "Selenium Wire" -i /usr/local/share/ca-certificates/seleniumire-certificate.crt

COPY ./zyte-proxy-ca.crt /usr/local/share/ca-certificates/zyte-proxy-ca.crt

RUN sudo update-ca-certificates

I'm facing a new error now:

Message: unknown error: net::ERR_SSL_PROTOCOL_ERROR (Session info: headless chrome=92.0.4515.131)

Do you have any idea about this?

Andreu Jové

@AndreuJove
Author

AndreuJove commented Aug 11, 2021

Dear @wkeeling ,

I'm getting the same error even without the proxy and without the interceptor, so the problem seems to be in selenium-wire. Plain Selenium works in the cloud, but Selenium Wire does not.

Message: unknown error: net::ERR_SSL_PROTOCOL_ERROR (Session info: headless chrome=92.0.4515.131)

I suspect an incompatibility between the version of selenium-wire I'm using and the Chrome version or the Chrome webdriver.

Here is how I'm installing Chrome and the webdriver:


#============================================
# Google Chrome
#============================================
# can specify versions by CHROME_VERSION;
#  e.g. google-chrome-stable=53.0.2785.101-1
#       google-chrome-beta=53.0.2785.92-1
#       google-chrome-unstable=54.0.2840.14-1
#       latest (equivalent to google-chrome-stable)
#       google-chrome-beta  (pull latest beta)
#============================================

RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
  && apt-get update -qqy \
  && apt-get -qqy install \
    ${CHROME_VERSION:-google-chrome-stable} \
  && rm /etc/apt/sources.list.d/google-chrome.list \
  && rm -rf /var/lib/apt/lists/* /var/cache/apt/*

#============================================
# Chrome Webdriver
#============================================
# can specify versions by CHROME_DRIVER_VERSION
# Latest released version will be used by default
#============================================
RUN CHROME_STRING=$(google-chrome --version) \
  && CHROME_VERSION_STRING=$(echo "${CHROME_STRING}" | grep -oP "\d+\.\d+\.\d+\.\d+") \
  && CHROME_MAYOR_VERSION=$(echo "${CHROME_VERSION_STRING%%.*}") \
  && wget --no-verbose -O /tmp/LATEST_RELEASE "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAYOR_VERSION}" \
  && CD_VERSION=$(cat "/tmp/LATEST_RELEASE") \
  && rm /tmp/LATEST_RELEASE \
  && if [ -z "$CHROME_DRIVER_VERSION" ]; \
     then CHROME_DRIVER_VERSION="${CD_VERSION}"; \
     fi \
  && CD_VERSION=$(echo $CHROME_DRIVER_VERSION) \
  && echo "Using chromedriver version: "$CD_VERSION \
  && wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CD_VERSION/chromedriver_linux64.zip \
  && rm -rf /opt/selenium/chromedriver \
  && unzip /tmp/chromedriver_linux64.zip -d /opt/selenium \
  && rm /tmp/chromedriver_linux64.zip \
  && mv /opt/selenium/chromedriver /opt/selenium/chromedriver-$CD_VERSION \
  && chmod 755 /opt/selenium/chromedriver-$CD_VERSION \
  && sudo ln -fs /opt/selenium/chromedriver-$CD_VERSION /usr/bin/chromedriver

selenium-wire==4.3.0

More information:
The Docker container in which selenium-wire is not working is:

uname_result(system='Linux', node='job-376562-5-285', release='4.15.0-72-generic', version='#81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019', machine='x86_64', processor='')

Thanks a lot for your help,

Andreu Jové

@wkeeling
Owner

Ok thanks @AndreuJove

Are you able to share the Dockerfile that reproduces the issue? I'll try and debug this locally to figure out what's going on.

@AndreuJove
Author

AndreuJove commented Aug 13, 2021

Dear @wkeeling ,

The problem occurs when I use Selenium Wire in Scrapy Cloud (now called Zyte): it raises the error I described. In my own Docker container it works fine.

This is the Dockerfile I'm using:
https://support.zyte.com/support/solutions/articles/22000240310-deploying-custom-docker-image-with-selenium-on-scrapy-cloud

Could there be an incompatibility?

Thank you for your help,

I have tried installing certificates and 100 other combinations, and none of them work in Scrapy Cloud.

Andreu Jové

@AndreuJove
Author

AndreuJove commented Aug 18, 2021

Dear @wkeeling ,

We have discovered that version 3.0.6 works on Scrapy Cloud (it does not raise the SSL error); the problem is that it doesn't work with the proxy.

Do you know anything about this?

Thank you for your help,

Andreu Jové

@wkeeling
Owner

Version 3.0.6 was the last version to use the old backend. From 4.0.0 onward Selenium Wire uses mitmproxy as its backend. The reason for the change was that the old backend struggled to handle upstream proxy servers and would often fail with low level socket errors or fail to connect at all (which sounds like the issue you've had with it). mitmproxy has much better stability with upstream proxies.

@ejulio

ejulio commented Sep 28, 2021

hey everyone!
I'm Júlio, a developer from Zyte, and I've been investigating the issue over the last couple of days.
It is not related to our servers, but to a library we developed to link the containers with our servers to store the scraping results.

In this library we call socket.setdefaulttimeout, which seems to cause some issues with pyOpenSSL.
I've created an issue on our library to investigate if we can overcome it somehow:
scrapinghub/scrapinghub-entrypoint-scrapy#62

In that issue you'll find a link to the issue in pyOpenSSL, and it seems they provide some approaches to fix the problem when using pyOpenSSL. Maybe you can check if it makes sense to add this fix to selenium-wire as well, since it will fail whenever the user has called socket.setdefaulttimeout.
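To make the mechanism concrete: once a default timeout is set, every new socket starts in timeout mode, which Python implements internally as a non-blocking socket, and code that drives the TLS handshake on the raw socket can then see "want read" conditions. The sketch below only demonstrates the timeout inheritance and one possible workaround (switching a socket back to blocking mode before the handshake). It is an illustration under those assumptions, not the fix that was applied to Selenium Wire.

```python
import socket

previous_default = socket.getdefaulttimeout()
socket.setdefaulttimeout(5.0)  # what the entrypoint library is described as doing
try:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # New sockets silently inherit the process-wide default timeout.
        timeout_before = sock.gettimeout()

        # Possible workaround: return this socket to blocking mode
        # before handing it to the TLS layer.
        sock.settimeout(None)
        timeout_after = sock.gettimeout()
    finally:
        sock.close()
finally:
    # Restore the process-wide default so other code is unaffected.
    socket.setdefaulttimeout(previous_default)

print("inherited timeout:", timeout_before)
print("after settimeout(None):", timeout_after)
```

Because the default is process-wide, any library imported into the same process can change it, which fits the observation that the failure appeared only in one deployment environment.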

@wkeeling
Owner

wkeeling commented Oct 1, 2021

@ejulio great work in discovering this. I'll consider each of the workarounds in pyca/pyopenssl#168 and see if I can apply a fix to Selenium Wire. Thanks again.

@wkeeling
Owner

wkeeling commented Oct 3, 2021

@ejulio there's now a new version of Selenium Wire available with a fix for the socket timeout issue (v4.5.3).

@ejulio

ejulio commented Oct 4, 2021

Thanks @wkeeling

This issue was closed.