
Too early timeout / timeout not controllable #135

Open
emp-00 opened this issue Oct 22, 2024 · 11 comments

Comments

@emp-00

emp-00 commented Oct 22, 2024

This is another issue in the context of #134

In this case I can confirm it's not only happening with Thorium but also with Chrome (Win11, latest single-file revision).

Symptom:

  • I am using --urls-file="test_URL-List.txt" with a total of 4 urls to be downloaded (the urls cannot be shared since they are network-statistics pages behind a cookie/login firewall; they deliver regular modern html/javascript pages, sometimes taking up to 10 seconds to load due to processing time on the server). For all other command-line switches, see Not working: --browser-load-max-time (no timeout, browser running "forever") #134. This issue is also independent of --browser-load-max-time, which I have currently removed
  • 4+1 instances of chrome.exe are started by single-file.exe and the download works most of the time
  • However, sometimes single-file.exe terminates the download too early: the downloaded file consists of a short html/script section with the "hourglass" shown by the page while it has not yet fully loaded
  • In my logs I see that the total single-file.exe processing time for such a call is always around ~30 (±3) seconds

Obviously, in this case single-file.exe stops/terminates the Chrome instances too early. The expected behavior is to fully download the pages...

From my observations, I have a feeling that a 30-second timeout is currently hardcoded in single-file.exe? This may be completely wrong, or I might be doing something stupid - any help to make my download process more reliable is much appreciated...

I was looking into the option below, but I don't fully understand what to use or what the values mean, so I have not tested any of them yet... Is this the path forward for my issue?

--browser-wait-until: When to consider the page is loaded (InteractiveTime, networkIdle, networkAlmostIdle, load, domContentLoaded) <string> (default: "networkIdle")

gildas-lormeau added a commit that referenced this issue Oct 22, 2024
@gildas-lormeau
Owner

gildas-lormeau commented Oct 22, 2024

I've improved the implementation by adding a watchdog that starts after the network event is triggered. If a new network event is triggered in the meantime, the watchdog is reset. Hopefully, this should fix your issue. The watchdog delay value can be set with the new --browser-wait-until-delay option (1000 ms by default). This new implementation will be available in the next version.
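Conceptually, the mechanism described above works like a resettable idle timer. The sketch below is illustrative only (it is not the actual single-file source); `createWatchdog` and its parameters are hypothetical names used to show the idea that each network event pushes the deadline back:

```javascript
// Minimal sketch of a resettable watchdog (illustrative, not single-file's code).
// Every observed network/browser event calls reset(); the page is considered
// loaded once no event arrives within delayMs after the last reset.
function createWatchdog(delayMs, onIdle) {
    let timer = null;
    return {
        // Call on every network/browser event: restarts the countdown.
        reset() {
            clearTimeout(timer);
            timer = setTimeout(onIdle, delayMs);
        },
        // Stop the watchdog without firing.
        cancel() {
            clearTimeout(timer);
        }
    };
}
```

With a 1000 ms delay, the capture therefore starts roughly one second after the last network event, which explains why a larger value makes the whole run take at least that much longer.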

@emp-00
Author

emp-00 commented Oct 22, 2024

Thank you @gildas-lormeau for the heads-up. Sounds good, I will test it as soon as it's available.

What exactly is "browser-wait-until-delay" with only 1 second as the default timing? Do you suggest keeping it at the default for my issue above, or rather going to 10 or more seconds? 1 second for a watchdog is pretty short, but I probably misunderstand the meaning of this option. It would be great if you wrote a clear description in the help file.

For the other option "--browser-wait-until" I also suggest improving the help text. At least I don't really understand what those cryptic values mean. Since you did not mention it, am I right that this option does not help solve my issue?

@gildas-lormeau
Owner

I've updated single-file. Please let me know if it fixes your issue with the default settings.

@emp-00

This comment was marked as outdated.

@gildas-lormeau
Owner

@emp-00 Sorry for the late reply, did you try using the option --browser-wait-until-delay in that particular case?

@emp-00
Author

emp-00 commented Nov 5, 2024

@gildas-lormeau : Yes, I did and I am still using it. However, I cannot tie the option 100% to success or failure.

What I do: I am pulling data from a personal photovoltaics monitoring status webpage every 15 minutes. Practically daily I run into the situation where single-file only downloads the "hourglass page" (see above). When I open the webpage manually in the same Chrome browser at the same time, the page opens normally after showing the hourglass for usually just 5 seconds.

My standard call uses "--browser-wait-until-delay=25000" (25 secs). Yesterday, while the download kept failing (every 15 mins, for ~2 hours), I tried removing the option; the single-file download then suddenly worked, but in the next 15-min cycle it again only downloaded the hourglass page.

I also tried 2500, 5000 and 10000 ms -> sometimes it works, sometimes it does not. My best guess is that it's a problem with how single-file interacts with "my status page". Only very rarely do I find that the page has a delay of more than 10 seconds, and with 25000 ms even that should be covered, which it isn't... I'm puzzled. Can you see something in the hourglass html code that explains why single-file does not wait long enough for the regular page to be displayed?

Stupid question: What exactly does "browser-wait-until-delay" do? You explained it's a watchdog timer. I clearly noticed that the single-file execution always takes at least as long as this "delay": I therefore have a feeling that this option is just a "delay" before single-file downloads the page? So it does not start downloading and then wait at least 25 secs? Maybe there's a glitch in the code for this option? I only leave it at 25 secs because I see that the download takes longer - believing that this helps, but my gut feeling says differently.

Your description reads "delay of time in ms to wait before considering the page is loaded when the value of --browser-wait-until is reached" --> honestly, I still don't fully understand what "the value of --browser-wait-until" really means. I just don't understand the values listed for that other option. Maybe I need to use a different value for --browser-wait-until?

Thanks for your time + efforts, single-file.exe is really a great tool!

@gildas-lormeau
Owner

gildas-lormeau commented Nov 7, 2024

You could make it 100% reliable by injecting a script that waits for the dynamic content of the status page. This can be done with the --browser-script option.

For example, this demo page https://gildas-lormeau.github.io/tmp/index-popup.html displays "Hello!" after a delay between 2s and 7s. So, if you run the command single-file https://gildas-lormeau.github.io/tmp/index-popup.html --dump-content multiple times, sometimes it works (see <div class=popup-container><div>Hello!</div></div>) and sometimes <div>Hello!</div> is not present in the saved page.

This can be fixed by creating a script file (e.g. "script.js") with the content below and running single-file https://gildas-lormeau.github.io/tmp/index-popup.html --dump-content --browser-script script.js.

const CSS_SELECTOR_TO_WAIT = ".popup-container > div";

// Tell single-file that a user script wants to hook into the capture.
dispatchEvent(new CustomEvent("single-file-user-script-init"));

// Defer the capture until the target element is present in the DOM.
addEventListener("single-file-on-before-capture-request", async event => {
    event.preventDefault();
    await waitForElement(CSS_SELECTOR_TO_WAIT);
    dispatchEvent(new CustomEvent("single-file-on-before-capture-response"));
});

// Poll the DOM every second until the selector matches an element.
function waitForElement(selector) {
    return new Promise(deferWaitForElement);

    function deferWaitForElement(callback) {
        setTimeout(() => {
            if (document.querySelector(selector)) {
                callback();
            } else {
                deferWaitForElement(callback);
            }
        }, 1000);
    }
}

You can also use this script as-is. All you need to do is change the value of CSS_SELECTOR_TO_WAIT (i.e. ".popup-container > div") to a CSS selector adapted to the page you want to save. You can use Chrome to generate this selector by following the procedure described here: https://www.geeksforgeeks.org/how-to-generate-css-selector-automatically/. You have to select an element which is present in the page only when it is fully loaded.

--browser-wait-until-delay is the timeout delay of a watchdog which is reset if a network event or a browser event is detected before the timeout expires. I will improve the documentation, I agree it's unclear.

A new option --debug-messages-file will be added in the next version; it can help you understand how --browser-wait-until-delay and --browser-wait-until work.

@emp-00
Author

emp-00 commented Nov 9, 2024

Thanks @gildas-lormeau for the --browser-script idea!

  1. I have copied a fitting CSS_SELECTOR from my two (fully loaded) pages in question
  2. saved two complete script files as "StatusPage.js" and "EnergyPage.js"
  3. I'm also using this URL-List.txt file for 3 pages in total to download:
https://enlighten.enphaseenergy.com/web/XXXXXXX/today/graph/hours --browser-script="StatusPage.js"
https://enlighten.enphaseenergy.com/web/XXXXXXX/history/graph/hours --browser-script="EnergyPage.js"
https://www.cnbc.com/quotes/NVDA?qsearchterm=nvidia
  4. executed single-file as follows:
    single-file.exe --browser-width=640 --browser-height=1200 --block-fonts --compress-HTML=false --errors-file="ERROR-Log.txt" --filename-conflict-action=overwrite --urls-file="URL-List.txt" --filename-template="Grabber_{url-pathname-flat}{url-search}.html"

This seems to work :-) regarding the js-script waiting for the page to fully load (still to be confirmed with a longer testing period).

However, I notice that --filename-template does not work as expected anymore: the downloaded files for the first two pages using the --browser-script option (see URL-List.txt) are NOT saved according to the template. This is a problem for my script, which relies on unique filenames starting with the keyword "Grabber"...

Could you check whether this is my fault or maybe a bug in single-file.exe? No errors are displayed, but the filenames of the first two pages in URL-List.txt are definitely not generated from the desired template. Maybe --filename-template conflicts with using parameters after the url in URL-List.txt? Thanks for looking into this!

@gildas-lormeau
Owner

gildas-lormeau commented Nov 9, 2024

Thank you for the feedback, the --filename-template issue should be fixed in the latest version I've just published.

@emp-00
Author

emp-00 commented Nov 11, 2024

@gildas-lormeau : Thank you very much - I confirm
a) the above --filename-template bug is fixed with yesterday's release and
b) the --browser-script option with your JavaScript above is working perfectly for my case! :-)

"Last question": Is there any timeout for the --browser-script waiting via the above script, or does it wait "forever"? Can I use --browser-load-max-time to adjust the timeout for the waiting script as well, e.g. to 30 seconds?

@gildas-lormeau
Owner

gildas-lormeau commented Nov 12, 2024

I've added the option --browser-capture-max-time in the latest version in order to set the maximum time allowed when capturing the page. Setting --browser-capture-max-time=30000 should help. Note that this will trigger a "Capture timeout" error when the time has elapsed.
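Alternatively, the wait can be bounded inside the user script itself, so the polling can never run forever regardless of CLI options. This is only a sketch building on the waitForElement helper from the script above; maxWaitMs and pollMs are illustrative parameters, not single-file options:

```javascript
// Variant of the polling helper from the user script above, with an upper
// bound. The promise resolves either when the element appears or when the
// deadline passes, so the capture proceeds in either case.
function waitForElement(selector, maxWaitMs = 30000, pollMs = 1000) {
    const deadline = Date.now() + maxWaitMs;
    return new Promise(function poll(resolve) {
        setTimeout(() => {
            if (document.querySelector(selector) || Date.now() >= deadline) {
                resolve();
            } else {
                poll(resolve); // named function expression: re-arm the timer
            }
        }, pollMs);
    });
}
```

Note the trade-off: with this variant an expired deadline saves the page as-is (possibly still showing the hourglass), whereas --browser-capture-max-time aborts the capture with an error.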
