PSI API giving ERRORED_DOCUMENT_REQUEST error for some urls that worked recently #15989
Comments
Does this still occur with Lighthouse 12.0 (we just updated the PSI API)? I just tried a few times and it seems to work for me. It may be an intermittent error.
I just tried the endpoint we've been using, "https://www.googleapis.com/pagespeedonline/v5/runPagespeed", and still get the same error. Here is the curl command: … How do I test this API with 12.0? Using …
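For comparison with a curl call, here is a minimal sketch of building the same GET request to the v5 `runPagespeed` endpoint from Python. It assumes only the `url`, `strategy`, `category` and optional `key` query parameters, written in their commonly used lowercase form. The function name `build_psi_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# The endpoint quoted in the comment above.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, api_key: Optional[str] = None,
                  strategy: str = "mobile") -> str:
    """Return a GET URL for the PSI v5 runPagespeed endpoint (hypothetical helper)."""
    params = [("url", page_url), ("strategy", strategy), ("category", "performance")]
    if api_key:
        params.append(("key", api_key))
    # urlencode percent-escapes the target URL so it travels as a single parameter.
    return f"{PSI_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be passed to `urllib.request.urlopen` or pasted into curl or Postman. That makes it easy to confirm that all three clients send the same request.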
Thanks. I'll look further tomorrow.
You already are. There's only one PSI version (v5), but we update the Lighthouse (LH) version it runs (which is now 12).
Hopefully you're able to reproduce the issue. Let me know if not. Thanks!
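PSI has only one endpoint, and only the bundled Lighthouse version changes. One way to confirm which version served a request is therefore to read `lighthouseResult.lighthouseVersion` from a successful response.

Below is a minimal sketch. It assumes the response JSON has already been parsed into a dict. The helper names are hypothetical.

```python
from typing import Optional


def lighthouse_version(psi_response: dict) -> Optional[str]:
    """Return the Lighthouse version reported in a PSI v5 response.

    Returns None when there is no `lighthouseResult`, for example when the
    request failed with an error such as ERRORED_DOCUMENT_REQUEST.
    """
    return psi_response.get("lighthouseResult", {}).get("lighthouseVersion")


def uses_lighthouse_major(psi_response: dict, major: int) -> bool:
    """True if the response was produced by the given Lighthouse major version."""
    version = lighthouse_version(psi_response)
    return version is not None and version.split(".")[0] == str(major)
```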
I overlooked the 403 in your error message. I get the same locally when using the API, and also via plain usage of curl:
It seems your web server is blocking user agents (UAs) that indicate curl was used (or rather, that a web browser is not being used), which would explain failures of the API from programmatic usage. But the 403 is coming from a Google machine making requests to your web server, which, IIUC, should be the same whether curl kicks off the API request or the web server does it... so I'm actually unsure why this could be happening. @paulirish mentions perhaps …
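The UA-blocking hypothesis can be checked from a local machine by requesting the page twice with only the `User-Agent` header changed. This cannot reproduce what the PSI backend itself sends.

Below is a sketch under that assumption. Both UA strings are hypothetical examples, and the fetcher is injected so the comparison can also be run against a stub.

```python
from typing import Callable, Dict
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical User-Agent strings, chosen for illustration only.
CURL_LIKE_UA = "curl/8.4.0"
BROWSER_LIKE_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36")


def build_request(url: str, user_agent: str) -> Request:
    """Build a GET request that differs only in its User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


def fetch_status(req: Request) -> int:
    """Perform the request and return the HTTP status, including 4xx/5xx codes."""
    try:
        with urlopen(req, timeout=30) as resp:
            return resp.status
    except HTTPError as err:  # urlopen raises for 4xx/5xx; the code is on the error
        return err.code


def status_by_user_agent(url: str,
                         fetch: Callable[[Request], int] = fetch_status) -> Dict[str, int]:
    """Map each User-Agent to the status it received.

    A 403 only for the curl-like UA would support the UA-blocking theory.
    """
    return {ua: fetch(build_request(url, ua)) for ua in (CURL_LIKE_UA, BROWSER_LIKE_UA)}
```

Bot-control products often look at more than the UA, for example IP reputation or TLS fingerprints. Identical results for both UAs would therefore not rule out blocking.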
I tried … I don't work for … Is there any way to make this work by sending custom headers to the PSI API? Thanks for looking into this.
I think the options for using PSI on the mentioned domain are limited, given the bot-control mechanism in place. Can the CrUX API or CrUX History API be used to fetch the aggregated data from BigQuery without reaching the origin URL?
We have some planned changes to the PSI API that preclude spending time now on returning the CrUX part of the response even when the Lighthouse part fails. For now, any error in the Lighthouse part fails the entire request. Is what you're looking for not part of these APIs? https://developer.chrome.com/docs/crux/methodology/tools#tool-crux-api or https://developer.chrome.com/docs/crux/methodology/tools#tool-crux-history-api
It would be great for the PSI API to return the CrUX part despite Lighthouse failures. Do you have a rough idea of when these changes may be available? Is it something like 1-2 quarters, or longer? For now I'm going to see if we can use the CrUX or CrUX History API. Thanks!
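The CrUX API reads Google's aggregated real-user dataset rather than loading the page. A query therefore never reaches the origin and is not affected by its bot protection.

Below is a minimal sketch of a `records:queryRecord` (or History API `records:queryHistoryRecord`) request and of reading the p75 value from a current-data response. The helper names and the choice of metrics are assumptions for illustration, and an API key is required.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Current (trailing 28-day) data and the History API time series.
CRUX_QUERY_RECORD = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"
CRUX_QUERY_HISTORY = "https://chromeuxreport.googleapis.com/v1/records:queryHistoryRecord"


def build_crux_request(page_url: str, api_key: str, history: bool = False,
                       form_factor: str = "PHONE") -> Request:
    """Build a POST request for CrUX field data about one URL (hypothetical helper)."""
    body = {
        "url": page_url,  # use "origin" instead for origin-level data
        "formFactor": form_factor,
        "metrics": ["largest_contentful_paint", "interaction_to_next_paint",
                    "cumulative_layout_shift"],
    }
    endpoint = CRUX_QUERY_HISTORY if history else CRUX_QUERY_RECORD
    return Request(f"{endpoint}?{urlencode({'key': api_key})}",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="POST")


def p75(query_record_response: dict, metric: str) -> Optional[Any]:
    """Return the p75 value for `metric` from a queryRecord response.

    History API responses use `percentilesTimeseries` instead and are not
    handled by this helper.
    """
    metrics = query_record_response.get("record", {}).get("metrics", {})
    return metrics.get(metric, {}).get("percentiles", {}).get("p75")
```

URL-level records only exist for pages with enough Chrome traffic. If a URL query returns no data, querying by `origin` is the usual fallback.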
Unlikely.
Good plan. :)
URL
https://www.realtor.com/realestateandhomes-search/Chicago_IL
What happened?
The URL https://www.realtor.com/realestateandhomes-search/Chicago_IL and some other valid URLs from the same domain have started failing in PSI API calls. We used the PSI API for these URLs successfully for a long time, but have been seeing these errors for the past couple of weeks. Here is the error:
[Lighthouse returned error: ERRORED_DOCUMENT_REQUEST. Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests. (Status code: 403)]
All of these failing URLs continue to work on https://pagespeed.web.dev. I checked bug reports for similar errors, but most of those are for Lighthouse rather than the PSI API. I see some possible causes listed in #2784, but I'm curious why the same URLs work on the PSI site. We call the API from a Python script, but the same error can also be reproduced by calling the API from Postman.
Please suggest what can be done to resolve this.
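A script calling the API can separate this failure from transient ones by pulling the Lighthouse error code and the origin's status code out of the error message. That message has the form quoted above.

Below is a minimal sketch. It assumes the standard Google API error body `{"error": {"message": ...}}` and that the message wording stays as shown. The helper name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Patterns based on the message quoted in this issue.
_LH_ERROR = re.compile(r"Lighthouse returned error: ([A-Z_]+)\.")
_STATUS = re.compile(r"\(Status code: (\d+)\)")


def parse_lighthouse_error(psi_response: dict) -> Optional[Tuple[str, Optional[int]]]:
    """Return (Lighthouse error code, origin HTTP status) or None if no such error."""
    message = psi_response.get("error", {}).get("message", "")
    match = _LH_ERROR.search(message)
    if match is None:
        return None
    status = _STATUS.search(message)
    return (match.group(1), int(status.group(1)) if status else None)
```

A result such as `("ERRORED_DOCUMENT_REQUEST", 403)` suggests the origin refused the request. In that case, retrying is unlikely to help, unlike an intermittent failure.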
What did you expect?
As mentioned earlier, these URLs worked until a couple of weeks ago. We expect the API to return Web Vitals data (field and lab metrics) very similar to what we still see on https://pagespeed.web.dev.
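In a successful v5 response, field data (CrUX) sits under `loadingExperience.metrics`, and lab data (Lighthouse) sits under `lighthouseResult.audits`.

Below is a minimal sketch that pulls one LCP and one FCP value from each section. The choice of metrics and the helper name are assumptions for illustration.

```python
from typing import Any, Dict


def web_vitals_summary(psi_response: dict) -> Dict[str, Any]:
    """Collect a few field (CrUX) and lab (Lighthouse) values from a PSI v5 response.

    Missing sections yield None instead of raising.
    """
    field = psi_response.get("loadingExperience", {}).get("metrics", {})
    audits = psi_response.get("lighthouseResult", {}).get("audits", {})
    return {
        # Field data: 75th percentile over real Chrome users, in milliseconds.
        "field_lcp_p75_ms": field.get("LARGEST_CONTENTFUL_PAINT_MS", {}).get("percentile"),
        "field_fcp_p75_ms": field.get("FIRST_CONTENTFUL_PAINT_MS", {}).get("percentile"),
        # Lab data: values measured in the single Lighthouse run, in milliseconds.
        "lab_lcp_ms": audits.get("largest-contentful-paint", {}).get("numericValue"),
        "lab_fcp_ms": audits.get("first-contentful-paint", {}).get("numericValue"),
    }
```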
What have you tried?
I tested different URLs and validated them on https://pagespeed.web.dev. URLs from other sites in our test suite continue to work; only the URLs from this domain recently stopped working.
How were you running Lighthouse?
PageSpeed Insights, Other
Lighthouse Version
11.5.0
Chrome Version
119.0.0.0
Node Version
No response
OS
Linux & Mac
Relevant log output