Feat: Optionally extract raw html instead of parse5 serialization #42

vbraun · 2020-08-03T18:48:23Z

This adds a rawHtml option to extract the actual source HTML instead of the parse5 round-tripped version. Not sure if it's a good idea, but I'm trying to replace a gettext extractor that does just this.

extractor
    .createHtmlParser([
        HtmlExtractors.elementContent('translate, [translate]', {
            attributes: {
                context: 'translate-context',
                comment: 'translate-comment',
            },
            rawHtml: true,
        }),
    ])
    .parseFilesGlob('./src/**/*.html');

Documentation and lint need fixing, but maybe it's not a good idea to start with? ;-)

lukasgeiter · 2020-08-06T14:15:30Z

Can you go a bit more into detail on the problem your change addresses? Is this just about HTML entities (similar to #36) or do you have other issues with the extracted contents?

vbraun · 2020-08-06T15:33:06Z

Yes, it's about HTML entities; that is, round-tripping through parse5 loses information. In particular, the angular-gettext-cli extractor doesn't do that, and … is extracted as a literal. As a first step towards replacing it, I wanted to reproduce the extracted po file in an existing project, and found that I was unable to do so for various HTML entities.
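To make the information loss concrete, here is a minimal sketch. It does not use parse5; `decodeEntities` and its tiny `ENTITIES` table are hypothetical stand-ins for what any HTML parser does when it builds the DOM, and `&hellip;` is just an assumed example entity. Two different template sources decode to the same text, so nothing downstream can tell which spelling the template used.

```javascript
// Hypothetical, deliberately tiny entity table; a real HTML parser knows the full set.
const ENTITIES = { amp: '&', lt: '<', gt: '>', nbsp: '\u00A0', hellip: '\u2026' };

// Decode the named entities listed above, leaving unknown ones untouched.
function decodeEntities(source) {
    return source.replace(/&([a-z]+);/g, (match, name) =>
        Object.prototype.hasOwnProperty.call(ENTITIES, name) ? ENTITIES[name] : match);
}

const withEntity = 'Loading&hellip;';   // template author wrote the entity
const withLiteral = 'Loading\u2026';    // template author wrote the character itself

// Both sources collapse to the same decoded text.
console.log(decodeEntities(withEntity) === decodeEntities(withLiteral)); // true
```

An extractor that works on the parsed tree therefore produces the same msgid for both templates, while one that copies the source text keeps them distinct.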

Now one might argue that this is the correct way of doing things, since the DOM does that as well and you are going to match el.innerText / el.innerHTML anyway. And I'm open to editing my po files to move HTML entities around. Still, it seems that for full flexibility one should at least be able to have po files where the msgid is any one of:

- innerHTML
- innerText
- actual source of the template

Slightly related question: getElementContent has some special handling for <, >, and & but not &nbsp;, even though that's also in the spec: https://html.spec.whatwg.org/multipage/parsing.html#escapingString
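For reference, a rough sketch of the text-content (non-attribute) case of the spec's "escaping a string" steps, including the non-breaking space. `escapeText` is a hypothetical name used only for illustration; this is not the code in getElementContent.

```javascript
// Escape a string for use as text content, in the order the HTML spec lists
// the steps for the non-attribute case. Illustrative only.
function escapeText(text) {
    return text
        .replace(/&/g, '&amp;')        // first, so the entities produced below are not re-escaped
        .replace(/\u00A0/g, '&nbsp;')  // the non-breaking space step
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

console.log(escapeText('a\u00A0< b & c')); // a&nbsp;&lt; b &amp; c
```

With the U+00A0 step included, a source `&nbsp;` survives the parse and re-serialize round trip as `&nbsp;` rather than as an invisible literal character in the msgid.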

Feat: Optionally extract raw html instead of parse5 serialization

60884c7
