All URLs being passed as http #7

Open
salomoneb opened this issue Sep 26, 2020 · 3 comments

Comments

@salomoneb

salomoneb commented Sep 26, 2020

Expected Result

When I enter https://rollingstone.com, Blacklight tests https://rollingstone.com.

Actual Result

When I enter https://rollingstone.com, Blacklight tests http://rollingstone.com.

Description

It looks like all URLs, even ones specified as https, are being passed to the back end as http. The example.js file has http hardcoded, though I don't know whether that is what your production app uses.

Demo:
https://www.dropbox.com/s/uye88dsfr0qf81c/http.mov?dl=0

This issue occurs with sites other than http://rollingstone.com as well. I used that example because I was finding that my Blacklight results kept timing out when I tested https://rollingstone.com. Rolling Stone does redirect to https if you go to http://rollingstone.com.

I don't know if the http was causing my timeout issue or if it's some other quirk related to that particular site, but not using https when the user enters it in the input field seems like unintended behavior.
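For illustration, the expected behavior could be sketched as a small normalization step. This is not Blacklight's actual front-end code: `normalizeInputUrl` is a hypothetical name, and falling back to https when no scheme is typed is an assumption made for the example.

```javascript
// Hypothetical helper (not from the Blacklight code base):
// keep an explicit http/https scheme as typed, and only fall back
// to a default when the user gave no scheme at all.
function normalizeInputUrl(input, defaultScheme = "https") {
  const trimmed = input.trim();
  // An explicit scheme looks like "http://" or "https://" (case-insensitive).
  const hasScheme = /^https?:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `${defaultScheme}://${trimmed}`;
  // The WHATWG URL constructor throws a TypeError on invalid input.
  return new URL(candidate).href;
}

console.log(normalizeInputUrl("https://rollingstone.com")); // https://rollingstone.com/
console.log(normalizeInputUrl("http://rollingstone.com"));  // http://rollingstone.com/
```

The point of the sketch is only that a user-supplied `https://` is never rewritten; what to do with scheme-less input is a separate design choice.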

@kjetilk

kjetilk commented Jan 8, 2024

FWIW, I currently see the same thing if I test rollingstone.com with the Web interface, i.e. they both time out.

I hacked together my own script based on example.js, with that assumption removed, and the collector then appears to test fine. The inspection result includes:

  "args": "Blacklight Inspection",
  "uri_ins": "http://rollingstone.com",
  "uri_dest": "https://www.rollingstone.com/",
  "uri_redirects": [
    "http://rollingstone.com/",
    "https://rollingstone.com/"
  ]
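As a rough sketch, the fields in that excerpt can be read programmatically to see where the scheme changed along the redirect chain. `describeSchemeUpgrade` is a hypothetical helper, and the sample object only copies the three URI fields shown above; the rest of the inspection result is not assumed.

```javascript
// Hypothetical helper: report whether an inspection started on http
// and ended on https, using only the uri_* fields from the excerpt.
function describeSchemeUpgrade(result) {
  const start = new URL(result.uri_ins).protocol; // "http:" or "https:"
  const end = new URL(result.uri_dest).protocol;
  // First hop in the redirect chain that is already on https, if any.
  const firstHttpsHop = (result.uri_redirects || []).find(
    (u) => new URL(u).protocol === "https:"
  );
  return {
    startedOnHttp: start === "http:",
    endedOnHttps: end === "https:",
    upgraded: start === "http:" && end === "https:",
    firstHttpsHop: firstHttpsHop ?? null,
  };
}

const sample = {
  uri_ins: "http://rollingstone.com",
  uri_dest: "https://www.rollingstone.com/",
  uri_redirects: ["http://rollingstone.com/", "https://rollingstone.com/"],
};

console.log(describeSchemeUpgrade(sample));
```

For the excerpt above, `upgraded` is `true` and `firstHttpsHop` is `https://rollingstone.com/`.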

@BatMiles
Member

@salomoneb can you confirm whether this report refers to the Blacklight interface at https://themarkup.org/blacklight, your local version of the blacklight-collector, or both?

@salomoneb
Author

salomoneb commented Mar 5, 2024

I was referring to the Blacklight interface at https://themarkup.org/blacklight. I think Rolling Stone has changed their website since I filed this 4 (!) years ago, but I just tried the URL again and the Blacklight interface timed out after 30s. Here's the request copied from Chrome. Note that I entered https://www.rollingstone.com/ in the UI bar, but it seems to be automatically converted to http. I did this multiple times to confirm. It might have something to do with a validation regex in the page's scripts, but I was poking around in obfuscated code and don't want to speculate.

[Screenshot 2024-03-05 at 3.12.45 PM]

Request

curl 'https://blacklight.api.themarkup.org/graphic-api' \
  -H 'authority: blacklight.api.themarkup.org' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'content-type: text/plain;charset=UTF-8' \
  -H 'origin: https://themarkup.org' \
  -H 'sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36' \
  --data-raw '{"inUrl": "http://www.rollingstone.com/", "device": "mobile"}'

That returned a 502 and two error messages:

Access to XMLHttpRequest at 'https://blacklight.api.themarkup.org/graphic-api' from origin 'https://themarkup.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

POST https://blacklight.api.themarkup.org/graphic-api net::ERR_FAILED 502 (Bad Gateway)

Just for fun, I also tried https://rollingstone.com. That failed in an entirely different way!

Request

curl 'https://blacklight.api.themarkup.org/graphic-api' \
  -H 'authority: blacklight.api.themarkup.org' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'content-type: text/plain;charset=UTF-8' \
  -H 'origin: https://themarkup.org' \
  -H 'sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36' \
  --data-raw '{"inUrl": "http://rollingstone.com/", "device": "mobile"}'

Response

{
    "status": "error",
    "page_response": "Navigation timeout of 30000 ms exceeded",
    "error_message": "Navigation timeout of 30000 ms exceeded"
}
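The mismatch in the requests above (https typed, http sent in `inUrl`) could be captured as a small check. This is only a sketch: `schemeWasPreserved` is a hypothetical name, and the body string is copied from the first request's `--data-raw` payload.

```javascript
// Hypothetical check: does the request body keep the scheme the user typed?
function schemeWasPreserved(typedUrl, requestBody) {
  const sent = JSON.parse(requestBody).inUrl;
  return new URL(typedUrl).protocol === new URL(sent).protocol;
}

// Payload as captured from Chrome in the first request above.
const body = '{"inUrl": "http://www.rollingstone.com/", "device": "mobile"}';

console.log(schemeWasPreserved("https://www.rollingstone.com/", body)); // false
```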
