Option `--crawl-replace-urls` does not replace the crawled URLs #131

VAdri · 2024-10-17T22:03:50Z

The option --crawl-replace-urls indicates:

Replace URLs of saved pages with relative paths of saved pages on the filesystem

So, if I understand correctly, in the HTML saved by single-file, every URL crawled via the option --crawl-links should be replaced by the path of the file it was saved to.

However, when I try this command I get only the original URLs:

./single-file-x86_64-linux https://example.com --crawl-links=true --crawl-max-depth=1 --crawl-inner-links-only=false --crawl-replace-urls=true

I also tried this command from the README using the option --crawl-rewrite-rule but it did not work either:

./single-file-x86_64-linux https://www.wikipedia.org --crawl-links=true --crawl-inner-links-only=true --crawl-max-depth=1 --crawl-rewrite-rule="^(.*)\\?.*$ $1"

I was able to make it work on v2.0.0 but not since v2.0.2.
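For reference, the rewrite rule in the second command reads as a regular expression followed by a replacement, which would drop everything from the `?` onward. The sketch below only illustrates that pattern with `sed -E`; `strip_query` is a hypothetical helper, and this is an assumption about the intent of the rule, not how single-file applies it internally.

```shell
#!/bin/sh
# Hypothetical helper, not part of single-file: applies the same pattern
# (^(.*)\?.*$) and replacement (first capture group) as the rewrite rule.
strip_query() {
  printf '%s\n' "$1" | sed -E 's/^(.*)\?.*$/\1/'
}

# A URL with a query string loses the query part.
strip_query "https://www.wikipedia.org/wiki/Example?action=history"

# A URL without "?" does not match the pattern and is left unchanged.
strip_query "https://example.com/"
```

If the rule behaves this way, it only changes URLs that contain a query string.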


gildas-lormeau · 2024-10-21T20:49:38Z

In the first example, there are no inner links. The second example does not work anymore (I'm pretty sure it used to work in the past) because there is no link in the page whose resolved URL starts with "https://www.wikipedia.org/".
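Reading the explanation above as a plain prefix test, an inner link would be one whose resolved URL starts with the URL of the saved page. A minimal sketch of that test follows; `is_inner_link` is a hypothetical helper, and the prefix comparison is an assumption drawn from this comment, not single-file's actual code.

```shell
#!/bin/sh
# Hypothetical helper, not part of single-file.
# $1: base URL used as the prefix, $2: resolved URL of a link.
# Returns 0 when the link starts with the base URL, 1 otherwise.
is_inner_link() {
  case "$2" in
    "$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

base="https://www.wikipedia.org/"
for link in "https://www.wikipedia.org/portal" "https://en.wikipedia.org/"; do
  if is_inner_link "$base" "$link"; then
    echo "inner: $link"
  else
    echo "outer: $link"
  fi
done
```

Under that assumption, a link to another host, such as a language subdomain, would not count as inner, so nothing would be crawled with --crawl-inner-links-only=true.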

VAdri · 2024-10-22T20:06:12Z

Is it supposed to work only for inner links? Because I did put the option --crawl-inner-links-only=false in my first example.

But even with inner links only it doesn't do the trick apparently:

./single-file https://matklad.github.io/2024/09/23/what-is-io-uring.html --crawl-links=true --crawl-max-depth=1 --crawl-inner-links-only=true --crawl-replace-urls=true

gildas-lormeau · 2024-10-22T22:09:57Z

It was not working because --crawl-replace-urls did not work when written in lowercase. I fixed this issue in the latest version, which I've just published.
