This repository has been archived by the owner on Sep 10, 2020. It is now read-only.

Crawler stalling indefinitely--cause unknown #21

psivesely opened this issue Jul 18, 2016 · 1 comment

@psivesely

http://xnsoeplvch4fhk3s.onion/ stalls the crawler indefinitely. The 20 s page-load timeout should abort the load, but for some reason Selenium fails to enforce it for this site.

Here's the Firefox log:

[07-18 18:00:04] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/ via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/effects.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/prettyPhoto.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jss-style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/attentionGrabber_css.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/wp-customer-reviews.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/woocommerce.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css3_grid_style_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css3_grid_style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/styles.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/agent.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/default.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/rounded.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/custom_002.htm via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/converter.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/social-product-automation.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/faq.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/ga_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/ga.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-2.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jss-script.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/attentionGrabber_js.htm via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/sws_frontend.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/wp-customer-reviews.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/comment-reply.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/iphorm.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfobject.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload_003.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-migrate.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/social-product-automation.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/superfish.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/general.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/slides.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/affiliate_platform_style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/black.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/shortcodes.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/custom.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/select-package.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/featured-tag.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/starttag.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/tick_04.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/TwitterFollowers-Payments-Badges-New1a.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/TwitterFollowers-Payments-Badges-New1b.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/logos2.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/Twitter001.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/1369009171_twitter_bird_blueprint-social.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/1364267098_anonymous.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/guarantee4.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-ui-1.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_008.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-ui-1.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/des_expander.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/money.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/cookie.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/folding.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_007.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_004.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_006.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_005.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_003.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/rounded.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-plugins.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/woocommerce.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/des_expander.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css/reset.css via xnsoeplvch4fhk3s.onion:0
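
To see where request activity stops in a log like the one above, the lines can be summarized per timestamp. This is only a minimal sketch: the regular expression assumes every line has the `[MM-DD HH:MM:SS] Torbutton INFO: tor SOCKS: <url> via <host>` shape shown above, and the helper names `requests_per_second` and `last_request` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above.
LOG_LINE = re.compile(
    r"^\[(\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] Torbutton INFO: tor SOCKS: (\S+) via (\S+)"
)


def requests_per_second(lines):
    """Count logged SOCKS requests for each timestamp string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def last_request(lines):
    """Return (timestamp, url) of the last matching line, or None."""
    last = None
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            last = (match.group(1), match.group(2))
    return last
```

Run over the excerpt above, `last_request` would point at the `css/reset.css` request at 18:00:10, after which nothing more is logged.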

Here's the traceback after I killed the crawler with ^C:

noah@hs-crawler-nyc:~/FingerprintSecureDrop/fpsd$ ./crawler.py
^C[tbselenium] Request-sent
Traceback (most recent call last):
  File "./crawler.py", line 212, in collect_onion_trace
    self.crawl_url(url)
  File "./crawler.py", line 270, in crawl_url
    wait_for_page_body=True)
  File "/home/noah/FingerprintSecureDrop/fpsd/tor-browser-selenium/tbselenium/tbdriver.py", line 156, in load_url
    self.find_element_by("body", find_by=By.TAG_NAME)
  File "/home/noah/FingerprintSecureDrop/fpsd/tor-browser-selenium/tbselenium/tbdriver.py", line 163, in find_element_by
    EC.presence_of_element_located((find_by, selector)))
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/wait.py", line 71, in until
    value = method(self._driver)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/expected_conditions.py", line 59, in __call__
    return _find_element(driver, self.locator)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/expected_conditions.py", line 274, in _find_element
    return driver.find_element(*by)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
    {'using': by, 'value': value})['value']
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./crawler.py", line 466, in <module>
    ratio=int(config["monitored_nonmonitored_ratio"]))
  File "./crawler.py", line 437, in crawl_monitored_nonmonitored_classes
    trace_dir=nonmon_trace_dir)
  File "./crawler.py", line 398, in collect_set_of_traces
    retry=False)
  File "./crawler.py", line 387, in collect_set_of_traces
    iteration=iteration) == "failed"
  File "./crawler.py", line 225, in collect_onion_trace
    self.controller.get_circuits()
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 414, in wrapped
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 409, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 3035, in get_circuits
    response = self.get_info('circuit-status')
[07-18 17:58:24] Torbutton INFO: tor SOCKS: https://fonts.gstatic.com/s/permanentmarker/v5/9vYsg5VgPHKK8SXYbf3sMsW72xVeg1938eUHStY_AJ4.woff2 via cmyaw5mzy7dse3xl
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 414, in wrapped
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 409, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 1113, in get_info
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 1065, in get_info
    response = self.msg('GETINFO %s' % ' '.join(params))
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 580, in msg
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 563, in msg
    raise response
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 853, in _reader_loop
    control_message = self._socket.recv()
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 177, in recv
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 156, in recv
    return recv_message(socket_file)
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 561, in recv_message
    raise stem.SocketClosed('Received empty socket content.')
stem.SocketClosed: Received empty socket content.
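
The first traceback shows the crawler blocked in `recv_into`, waiting on the WebDriver HTTP response, so the page-load timeout never gets a chance to fire on the Python side. One possible client-side guard is a hard deadline around the blocking call. The following is a sketch only, not what the crawler or tbselenium does: `hard_timeout`, `HardTimeout`, and `demo` are hypothetical names, the approach relies on `SIGALRM` (Unix, main thread only), and `time.sleep` stands in for a stalled `load_url()` call.

```python
import signal
import time
from contextlib import contextmanager


class HardTimeout(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def hard_timeout(seconds):
    """Interrupt the enclosed block after `seconds` whole seconds.

    Uses SIGALRM, so it only works on Unix and in the main thread.
    """
    def _on_alarm(signum, frame):
        raise HardTimeout("block exceeded %d s" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def demo():
    """Show the guard interrupting a call that would otherwise block."""
    try:
        with hard_timeout(1):
            time.sleep(5)  # stand-in for a stalled load_url() call
    except HardTimeout:
        return "timed out"
    return "finished"
```

This interrupts the blocking read the same way ^C did above, but it does not unstick the browser itself; the caller would presumably still need to restart the driver before moving on to the next URL.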

I also tried visiting it on my desktop and no page content would load. From the console:

getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[07-18 21:26:11] Torbutton WARN: no SOCKS credentials found for current document.
getFirstPartyURI failed for view-source:http://xnsoeplvch4fhk3s.onion/: no host in first party URI view-source:http://xnsoeplvch4fhk3s.onion/
[07-18 21:26:13] Torbutton WARN: no SOCKS credentials found for current document.
@psivesely

Seeing these same errors

getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[07-18 22:06:29] Torbutton WARN: no SOCKS credentials found for current document.

when visiting http://cbw7pgk4jfjl4m6x.onion/, which also stalled out the crawler.

psivesely added the bug label Sep 15, 2016