Lukas Geiter edited this page Mar 13, 2023 · 5 revisions

Parser

For extracting messages from HTML sources, we'll start by creating a parser:

extractor.createHtmlParser([
    // extractor functions (see the factories below)
])

The returned parser instance offers the following methods for parsing source code:

parseFile(fileName, [options]) reads and parses a single file.

parseFilesGlob(pattern, [globOptions], [options]) reads and parses all files matching a pattern.

parseString(source, [fileName], [options]) parses a string.

More details can be found in the API Reference.
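
As a rough sketch of how these methods are typically called, the helper below takes an `extractor` (assumed to be a `GettextExtractor` instance) and the `HtmlExtractors` factories as parameters. The helper name `parseTemplates`, the inline snippet, and the file name `inline.html` are made up for illustration; only the method names and argument order come from the list above.

```javascript
// Illustrative helper (hypothetical name). `extractor` is assumed to be a
// GettextExtractor instance and `HtmlExtractors` the factories exported by
// the library; both are passed in rather than imported here.
function parseTemplates(extractor, HtmlExtractors, pattern) {
    const parser = extractor.createHtmlParser([
        HtmlExtractors.elementContent('[translate]')
    ]);

    // Parse an in-memory snippet; the optional file name labels the source.
    parser.parseString('<p translate>Hello</p>', 'inline.html');

    // Read and parse every file matching the glob pattern.
    parser.parseFilesGlob(pattern);

    return parser;
}
```

With the real library, this might be called as `parseTemplates(new GettextExtractor(), HtmlExtractors, './src/**/*.html')`.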


Built-in Extractors

These are factories included in the library that offer an easy way to create extractor functions.

Text from Element Content

This factory creates an extractor function that can extract the message text from the element content as well as additional data from attributes. Here's an example:

HtmlExtractors.elementContent('translate', {
    attributes: {
        textPlural: 'translate-plural',
        context: 'translate-context',
        comment: 'translate-comment'
    }
})

Matching

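The first parameter is a CSS selector that determines which elements are used for extraction. In the example above, 'translate' matches <translate> elements; to match elements that carry a translate attribute instead, use '[translate]'. See the API Reference for more details.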

Extracting

With this extractor, the message text is extracted from the content of the HTML element.

Note: If the matched element contains any HTML tags, they will get extracted as well. This can be especially useful if you need to support tags like <strong> in your messages.

The second parameter is an options object which can contain an optional attributes property to specify the HTML attribute names of additional message information:

  • textPlural
  • context
  • comment

You can also customize how the element content is treated (e.g. trimming whitespace) via the content property in options. More on this in the API Reference.
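
For illustration, an options object combining both properties might look like the sketch below. The `content` keys shown (`trimWhiteSpace`, `preserveIndentation`, `replaceNewLines`) are assumptions about the option names and should be checked against the API Reference; the attribute names are the ones used in the example above.

```javascript
// Options for HtmlExtractors.elementContent (second parameter).
// The `content` keys below are assumptions; check the API Reference
// for the exact names and defaults.
const elementContentOptions = {
    attributes: {
        textPlural: 'translate-plural',
        context: 'translate-context',
        comment: 'translate-comment'
    },
    content: {
        trimWhiteSpace: true,        // assumed: strip leading/trailing whitespace
        preserveIndentation: false,  // assumed: drop indentation inside the text
        replaceNewLines: ' '         // assumed: replace line breaks with a space
    }
};
```

It would then be passed as `HtmlExtractors.elementContent('translate', elementContentOptions)`.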

Text from Attribute Value

This factory creates an extractor function that can extract the message text and other data from the HTML element attributes. Here's an example:

HtmlExtractors.elementAttribute('[translate-text]', 'translate-text', {
    attributes: {
        textPlural: 'translate-plural',
        context: 'translate-context',
        comment: 'translate-comment'
    }
})

Matching

The first parameter is a CSS selector that is used to find elements that should be used for extraction. See the API Reference for more details.

Extracting

The second parameter specifies the name of the attribute holding the message text.

The third parameter is an options object which can contain an optional attributes property to specify the HTML attribute names of additional message information:

  • textPlural
  • context
  • comment

Embedded JavaScript

If your HTML contains JavaScript, usually in a <script> tag, you can extract messages from that code as well. To do this, first create a JavaScript parser and pass it to this extractor function factory.

For all matching HTML elements, the extractor function passes the element contents to the JavaScript parser, which parses them as if they were a normal file.

Here's an example for extracting a translations.getText(...) call expression inside a <script> tag with type text/javascript:

let jsParser = extractor.createJsParser([
    JsExtractors.callExpression('translations.getText', {
        arguments: {
            text: 0
        }
    })
]);

HtmlExtractors.embeddedJs('script[type="text/javascript"]', jsParser);

The same can be accomplished for attributes using embeddedAttributeJs.
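
The embedded-JS extractor is an extractor function like the others, so it still has to be handed to an HTML parser. A minimal sketch of that wiring follows. The helper name `createScriptAwareHtmlParser` is hypothetical, the plain `script` selector is chosen only to keep the example short, and `extractor`, `JsExtractors`, and `HtmlExtractors` are assumed to come from the library and are passed in as parameters.

```javascript
// Hypothetical helper showing where the embedded-JS extractor is plugged in.
function createScriptAwareHtmlParser(extractor, JsExtractors, HtmlExtractors) {
    // JavaScript parser that understands translations.getText('...') calls.
    const jsParser = extractor.createJsParser([
        JsExtractors.callExpression('translations.getText', {
            arguments: { text: 0 }
        })
    ]);

    // HTML parser that forwards the contents of <script> elements to the
    // JavaScript parser and also extracts the content of [translate] elements.
    return extractor.createHtmlParser([
        HtmlExtractors.embeddedJs('script', jsParser),
        HtmlExtractors.elementContent('[translate]')
    ]);
}
```

The returned parser can then be used with parseFile, parseFilesGlob, or parseString as described at the top of this page.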


Full Example

header.html

<header>
    <img src="logo.png" translate-alt alt="Logo"/>
    <ul class="user-menu">
        <li translate translation-context="Menu" translate-plural="{{n}} notifications">{{n}} notification</li>
        <li translate translation-context="Menu">Settings</li>
        <li translate translation-context="Menu" translation-comment="Comment">Logout</li>
    </ul>
</header>

Extractor configuration

const { GettextExtractor, HtmlExtractors } = require('gettext-extractor');

let extractor = new GettextExtractor();

extractor
    .createHtmlParser([
        HtmlExtractors.elementContent('[translate]', {
            attributes: {
                textPlural: 'translate-plural',
                context: 'translation-context',
                comment: 'translation-comment'
            }
        }),
        HtmlExtractors.elementAttribute('[translate-alt]', 'alt')
    ])
    .parseFile('header.html');

extractor.savePotFile('template.pot');

Messages in template.pot

#: header.html:2
msgid "Logo"
msgstr ""

#: header.html:4
msgctxt "Menu"
msgid "{{n}} notification"
msgid_plural "{{n}} notifications"
msgstr[0] ""
msgstr[1] ""

#. Comment
#: header.html:6
msgctxt "Menu"
msgid "Logout"
msgstr ""

#: header.html:5
msgctxt "Menu"
msgid "Settings"
msgstr ""
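
To make the mapping from extracted fields to the entries above explicit, here is a small sketch that formats one message in the same layout. It is not the library's implementation: `formatPotEntry` and the shape of its argument are hypothetical, only two plural forms are emitted, and string escaping is deliberately left out.

```javascript
// Hypothetical helper: renders one message as a POT entry.
// No escaping of quotes or newlines is performed.
function formatPotEntry({ text, textPlural, context, comments = [], references = [] }) {
    const lines = [];
    for (const comment of comments) lines.push(`#. ${comment}`);       // extracted comments
    for (const reference of references) lines.push(`#: ${reference}`); // source locations
    if (context) lines.push(`msgctxt "${context}"`);
    lines.push(`msgid "${text}"`);
    if (textPlural) {
        lines.push(`msgid_plural "${textPlural}"`);
        lines.push('msgstr[0] ""', 'msgstr[1] ""');
    } else {
        lines.push('msgstr ""');
    }
    return lines.join('\n');
}

// Reproduces the "Logout" entry from the example output.
const entry = formatPotEntry({
    text: 'Logout',
    context: 'Menu',
    comments: ['Comment'],
    references: ['header.html:6']
});
console.log(entry);
```

The comment attribute becomes the `#.` line, the context attribute becomes `msgctxt`, and a plural attribute switches the entry to the `msgid_plural` form with indexed `msgstr` lines.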