Why are escaped characters later unescaped? #36

Open
chrisnicola opened this issue Nov 1, 2019 · 5 comments

Comments

@chrisnicola

// Un-escape characters that get escaped by parse5

I am trying to debug an issue where I basically can't match translation keys for HTML with HTML entities like &amp; in it. It brought me to the above line of code, which seems problematic.

In my use case I'm translating an element at runtime using element.innerHTML. Using innerText is not practical because some translations may actually require the HTML to be part of the translation, as with a hyperlink.

As a result, the innerHTML has the entity as &amp; but the key is forced to be & by the extractor, so they can never match.

Is this intended? Could it be made an optional capability instead?

@lukasgeiter
Owner

Yes this is indeed intended.

My goal for the extracted messages is to match the string in the source code as closely as possible. If a developer writes a string containing & in the code, I believe it should be extracted this way (at least by default). Translators will probably also prefer & over &amp;, which they might not understand.

That said, I just noticed that the current implementation doesn't handle strings that actually contain &amp; in the source nicely. That is, they will also get converted to &. But that's not really what this issue is about...

I will look into adding an option to escape HTML entities in the same way innerHTML does.

If you don't mind not having entities in your translations, you might also want to consider changing your runtime code to match the behavior of the extractor.

@chrisnicola
Author

chrisnicola commented Nov 4, 2019

@lukasgeiter yeah, I'm not sure there is any easy way to do this because of how parse5 works. It is possible parse5 has an option to not change the original text, but that would be necessary.

Also, changing the runtime code to match the behaviour is not possible. Currently this completely breaks translating strings with & in them for me. Browsers will automatically convert & into &amp; when accessing innerHTML, so when I read the rendered HTML to look up the key, the lookup will always fail. In fact, the browser spec is the reason parse5 does this as well.

The bottom line is that I can't change the runtime behaviour of web browsers.
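
To make the mismatch concrete, here is a minimal sketch of the escaping that HTML serialization applies to text nodes, which is what innerHTML returns. The function name escapeTextLikeInnerHTML and the sample string are made up for illustration. This is not the extractor's or parse5's actual code, just an approximation of the text-node escaping rules.

```javascript
// Hypothetical helper approximating how text nodes are escaped when a browser
// serializes them for innerHTML.
function escapeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, '&amp;')        // must run first so later entities are not double-escaped
    .replace(/\u00a0/g, '&nbsp;')  // non-breaking space
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Assumed example: the extractor emits the key with a plain "&".
const extractedKey = 'Terms & Conditions';
// At runtime, innerHTML yields the escaped form of the same text.
const runtimeKey = escapeTextLikeInnerHTML('Terms & Conditions');

console.log(runtimeKey);                  // Terms &amp; Conditions
console.log(runtimeKey === extractedKey); // false
```

Because the two strings differ, a direct lookup using the innerHTML value never finds the extracted key.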

@chrisnicola
Author

I should have noted that my workaround for the time being will be to convert &amp; to & at the point I do the key lookup at runtime. This will work, but it seems less than ideal, as from what I can tell this looks like it would be a common problem.
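
A rough sketch of that workaround, assuming translations are held in a plain Map keyed by the extracted string. The names normalizeKey, translate, and the sample entries are hypothetical and not part of any library discussed here.

```javascript
// Hypothetical helper: undo the "&" escaping that innerHTML applies so the
// runtime string matches the key produced by the extractor.
function normalizeKey(html) {
  return html.replace(/&amp;/g, '&');
}

// Hypothetical lookup: fall back to the original markup when no translation exists.
function translate(html, translations) {
  const key = normalizeKey(html);
  return translations.has(key) ? translations.get(key) : html;
}

// Assumed sample data; the key keeps its markup because the link is part of the message.
const translations = new Map([
  ['Terms & <a href="/terms">Conditions</a>', 'AGB & <a href="/terms">Bedingungen</a>'],
]);

// The string below stands in for what element.innerHTML would return.
const fromInnerHTML = 'Terms &amp; <a href="/terms">Conditions</a>';
console.log(translate(fromInnerHTML, translations));
// AGB & <a href="/terms">Bedingungen</a>
```

This only reverses &amp;. Other entities that innerHTML produces, such as &lt; or &nbsp;, would need the same treatment if the keys contain those characters.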

@lukasgeiter
Owner

lukasgeiter commented Nov 4, 2019

The workaround you mention is precisely what I meant by changing your runtime code. I understand that this is not an optimal solution for you. I will definitely add an option for this in the future.

@chrisnicola
Author

Ok thanks, that makes sense.
