Why are escaped characters later unescaped? #36

chrisnicola · 2019-11-01T22:35:43Z

Line 20 in cea9a47

// Un-escape characters that get escaped by parse5

I am trying to debug an issue where I can't match translation keys for HTML that contains HTML entities like &amp;. That brought me to the line of code above, which seems problematic.

In my use case I'm translating an element at runtime using element.innerHTML. Using innerText is not practical because some translations require the HTML to be part of the translation, as with a hyperlink.

As a result, the innerHTML has the entity as &amp; but the key is forced to be & by the extractor, so they can never match.

Is this intended? Could it be made an optional capability instead?

lukasgeiter · 2019-11-02T14:52:12Z

Yes, this is indeed intended.

My goal for the extracted messages is to match the string in the source code as closely as possible. If a developer writes a string containing & in the code, I believe it should be extracted this way (at least by default). Translators will probably also prefer & over &amp;, which they might not understand.

That said, I just noticed that the current implementation doesn't handle strings that actually contain &amp; in the source nicely. That is, they will also get converted to &. But that's not really what this issue is about...

I will look into adding an option to escape HTML entities in the same way innerHTML does.

If you don't mind not having entities in your translations, you might also want to consider changing your runtime code to match the behavior of the extractor.

chrisnicola · 2019-11-04T18:50:02Z

@lukasgeiter yeah, I'm not sure there is any easy way to do this because of how parse5 works. It is possible parse5 has an option to not change the original text, but that would be necessary.

Also, changing the runtime code to match the behaviour is not possible. Currently this completely breaks translating strings with & in them for me. Browsers will automatically convert & into &amp; when accessing innerHTML, so when I use the rendered HTML to look up the key, the lookup always fails. In fact, the browser spec is the reason parse5 does this as well.

The bottom line is that I can't change the runtime behaviour of web browsers.
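To make the mismatch concrete, here is a rough sketch of the escaping that HTML serialization applies to text content, which is what innerHTML returns. The function name escapeText and the sample string are made up for illustration, and this only models text nodes, not attribute values.

```typescript
// Approximates how HTML serialization escapes text content.
// Hypothetical helper for illustration; it is not part of parse5 or the extractor.
function escapeText(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// What the developer wrote, and what the extractor stores after un-escaping.
const extractedKey = "Terms & Conditions";

// What innerHTML hands back at runtime for the same text.
const runtimeKey = escapeText(extractedKey);

console.log(runtimeKey);                  // Terms &amp; Conditions
console.log(runtimeKey === extractedKey); // false
```

The two strings differ only in the entity, which is enough to make an exact key lookup fail.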

chrisnicola · 2019-11-04T18:51:22Z

I should have noted that my workaround for the time being is to convert &amp; to & at the point where I do the key lookup at runtime. This will work, but it seems less than ideal, since from what I can tell this looks like a common problem.
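A minimal sketch of that workaround, assuming the translations live in a plain object keyed by the extracted string. The names translations, normalizeKey and lookupTranslation are hypothetical, and only &amp; is converted because that is the only entity discussed here.

```typescript
// Hypothetical catalog keyed by the extracted (un-escaped) message.
const translations: Record<string, string> = {
  'Read the <a href="/terms">Terms & Conditions</a>':
    'Lire les <a href="/terms">conditions générales</a>',
};

// Convert the entity produced by innerHTML back to the form used in the extracted key.
function normalizeKey(innerHtml: string): string {
  return innerHtml.replace(/&amp;/g, "&");
}

// Look up a translation using the element's innerHTML as the key.
function lookupTranslation(innerHtml: string): string | undefined {
  return translations[normalizeKey(innerHtml)];
}

// The string below is what innerHTML would return for the element.
const fromDom = 'Read the <a href="/terms">Terms &amp; Conditions</a>';
console.log(lookupTranslation(fromDom) !== undefined); // true
```

One caveat with this approach: a message whose visible text is literally "&amp;" would also be rewritten, so the conversion should stay limited to what the extractor itself un-escapes.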

lukasgeiter · 2019-11-04T19:02:44Z

The workaround you mention is precisely what I meant by changing your runtime code. I understand that this is not an optimal solution for you. I will definitely add an option for this in the future.

chrisnicola · 2019-11-04T19:08:42Z

Ok thanks, that makes sense.

lukasgeiter added the enhancement label Nov 2, 2019

lukasgeiter mentioned this issue Aug 6, 2020

Feat: Optionally extract raw html instead of parse5 serialization #42

Open
