The site I was trying to crawl uses query strings as part of its navigation, which causes the script to fail when it tries to save the screenshot on Windows (this may or may not reproduce on other platforms). It appears slugify doesn't strip all characters that are illegal in Windows file names.
Example error sequence (sort headers on a table add to the query string):
Loading: https://example.org/index/99984?sort=NAME&order=asc
(node:19708) UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open 'C:\Users\Reeves\source\repos\puppeteer\output\https___example.org\https___example.org\index_99984?sort=NAME&order=asc'
I corrected this in my script by adding the 'sanitize-filename' module and applying it in the screenshots section of the code (line 146 at the time of writing):
const path = `./${OUT_DIR}/${slugify(sanitize(page.url))}.png`;
The slugify in this context may be redundant.
I overlooked that you are using a custom slugify function and not the module. I extended the custom slugify function to cover all characters that shouldn't appear in a file path (the character list is based on the reserved characters list in Wikipedia's file path article).
Here's my proposed fix:
// Replaces characters from the URL which are illegal in a file path for working dir and saving screenshots.
function slugify(str) {
  return str.replace(/[\/:?*%|"<>. ]/g, '_');
}
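As a quick sanity check, here is a minimal sketch of that function applied to the URL from the error above. The variable names `url` and `fileName` are only for illustration and are not taken from the crawl script, and the `.png` suffix simply mirrors the screenshot path shown earlier.

```javascript
// Proposed slugify: replaces characters that are illegal (or awkward)
// in a file path with underscores.
function slugify(str) {
  return str.replace(/[\/:?*%|"<>. ]/g, '_');
}

// Illustrative input: the URL from the failing run above.
const url = 'https://example.org/index/99984?sort=NAME&order=asc';

// The extension is appended after slugifying, so its dot is preserved.
const fileName = `${slugify(url)}.png`;

console.log(fileName);
// https___example_org_index_99984_sort=NAME&order=asc.png
```

The `?` that triggered the ENOENT error is gone, while `=` and `&` are left alone since Windows allows them in file names. Note that the character class does not include a backslash, which should not matter for URLs but would for arbitrary input.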
Thanks,
Reeves
ReevesL added a commit to ReevesL/puppeteer-examples that referenced this issue on Feb 11, 2019