
Special characters in article names break on HTML export #17

Open
joepie91 opened this issue Oct 30, 2012 · 0 comments


Comments

@joepie91

Hi,

I attempted to do an HTML export of an Instiki instance I had set up, and noticed that special characters in URLs were causing problems.

Example:

The original article name would be bogus, name. This would be URL-encoded into bogus%2C+name. So far so good. However, while the HTML file is (correctly!) saved as bogus%2C+name.xhtml, the links to that page on other pages are not URL-encoded a second time. A link to the page from another page therefore points at bogus%2C+name.xhtml, which the browser decodes (the %2C becomes a comma) into a file name that does not exist.

The fix for this would be to encode the % once more, into %25, making the link bogus%252C+name.xhtml. That decodes to bogus%2C+name.xhtml, the correct filename.
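A minimal sketch of the fix described above: when an exported file name is used as a link target, escape each % once more. `href_for_file` is a hypothetical helper name for illustration, not part of Instiki. It assumes the exported names contain only percent-escapes, '+', and characters already safe in a URL path, so '%' is the only character that needs escaping again.

```shell
#!/bin/sh
# Hypothetical helper (not part of Instiki): given the name an exported page
# was saved under, print the href another page should use to link to it.
# Assumption: exported names contain only percent-escapes, '+', and characters
# that are already safe in a URL path, so only '%' needs escaping again.
href_for_file() {
  # Escape every '%' as '%25'; decoding the href once restores the file name.
  printf '%s\n' "$1" | sed 's/%/%25/g'
}

href_for_file 'bogus%2C+name.xhtml'   # prints bogus%252C+name.xhtml
```

Decoding the printed href once turns each %25 back into %, which yields exactly the name the file was saved under.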

A quick fix until this bug is solved (for others who are running into the same problem) is to run the following in a directory of exported files:

find ./ -name "*.xhtml" -exec sed -i.bak -r -e 's/%([0-9A-Fa-f]{2})/%25\1/g' {} +

Note that this will also rewrite percent-encoded sequences that appear elsewhere on the page, not just in links, so it's by no means perfect, but at least your links will work. Only run it once: a second pass would turn each %25 into %2525 and break the links again.
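To see what the quick fix does, and where it overreaches, here is the same substitution (character class restricted to hex digits) applied to a single line. The sample line is hypothetical: its link text contains an escape purely to show that text outside the href gets rewritten too. `double_encode` is an illustrative name.

```shell
#!/bin/sh
# Same substitution as the quick fix, applied to standard input
# (character class restricted to hex digits, upper or lower case).
double_encode() {
  sed -E 's/%([0-9A-Fa-f]{2})/%25\1/g'
}

# Hypothetical exported line: the href is fixed, but the escape in the
# visible link text is rewritten too.
printf '%s\n' '<a href="bogus%2C+name.xhtml">bogus%2C+name</a>' | double_encode
# prints <a href="bogus%252C+name.xhtml">bogus%252C+name</a>

# A second pass encodes the %25 again and breaks the link:
printf '%s\n' 'bogus%252C+name.xhtml' | double_encode
# prints bogus%25252C+name.xhtml
```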

- Sven