Translated comments shown in final document #9

nmontesoro · 2023-02-15T15:46:35Z

The contents of the comments get translated correctly, but when reconstructing the BeautifulSoup object their tags are lost, causing the final translated document to show the contents of the comments when opened with a web browser.

The issue, as far as I can work out, is that neither itag_of_soup nor soup_of_itag differentiates between a bs4.element.NavigableString and a bs4.element.Comment (which inherits from the former).

So, itag_of_soup returns a str object regardless of whether it's processing a NavigableString or a Comment. When soup_of_itag is called, it checks whether the object passed to it is an instance of str and, if so, constructs a NavigableString, which in the case of comments results in losing the <!-- and --> delimiters in the final document.
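The mechanism described above can be reproduced with BeautifulSoup alone. The sketch below is not translatehtml's code: round_trip, rebuild_naive and rebuild_typed are hypothetical names that only mimic the "flatten to str, then rebuild" step, assuming bs4 is installed.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def rebuild_naive(node):
    # Mimics the reported behaviour: every string becomes a NavigableString,
    # so a Comment loses its type and therefore its <!-- --> delimiters.
    return NavigableString(str(node))


def rebuild_typed(node):
    # One possible fix direction (an assumption, not the library's code):
    # rebuild with the node's own class so a Comment stays a Comment.
    return type(node)(str(node))


def round_trip(html, rebuild):
    # Hypothetical helper: replace each string node with a rebuilt copy.
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(string=True):  # includes Comment nodes
        node.replace_with(rebuild(node))
    return str(soup)


html = "<p><!-- hidden -->shown</p>"
print(round_trip(html, rebuild_naive))  # <p> hidden shown</p>
print(round_trip(html, rebuild_typed))  # <p><!-- hidden -->shown</p>
```

Because Comment is a subclass of NavigableString (and of str), an isinstance check against either base class matches both, so the node's concrete class has to be carried along if it is to survive the round trip.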

Here's an example:

import argostranslate.translate
import translatehtml

# Original "file"
content = """
<html>
    <head>
        <title>Test</title>
    </head>
    <body>
        <!-- This should not be seen in a browser -->
        <h1>Welcome to Test!</h1>
    </body>
</html>
"""

# Define languages for translation from English to Hindi
en = argostranslate.translate.get_language_from_code("en")
hi = argostranslate.translate.get_language_from_code("hi")
ut = en.get_translation(hi)

# Translate the file with translate_html
content = translatehtml.translate_html(ut, content)

# Write the translated file
with open("test.html", "wt", encoding="utf-8") as fp:
    fp.write(str(content))


Original file


Translation
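One way to confirm the symptom without opening a browser is to compare the comment nodes found in the input with those found in the output. The sketch below uses only the standard library; CommentCollector and find_comments are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment it encounters."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> with the text between the delimiters.
        self.comments.append(data)


def find_comments(html):
    # Hypothetical helper: return the list of comment texts in an HTML string.
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


before = "<body><!-- note --><h1>Welcome</h1></body>"
after = "<body> note <h1>Welcome</h1></body>"  # delimiters lost, text now visible
print(find_comments(before))  # [' note ']
print(find_comments(after))   # []
```

If the input contains comments and the translated output contains none, the comment text has either been dropped or, as reported here, turned into visible text.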


nmontesoro · 2023-02-15T15:59:19Z

A workaround might be to remove the comments from the tree before using translate_html, like so:

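from bs4 import BeautifulSoup, Comment

# Parse the document and drop every comment node before translating
soup = BeautifulSoup(content, "html.parser")
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()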

Then pass str(soup) instead of content to translate_html.
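The workaround could be packaged as a small helper, sketched below under the assumption that bs4 is installed. strip_comments and translate_without_comments are hypothetical names, and the translate argument stands in for whatever function performs the translation.

```python
from typing import Callable

from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every comment node removed."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


def translate_without_comments(html: str, translate: Callable[[str], str]) -> str:
    """Strip comments first, then pass the cleaned markup to the translator."""
    return translate(strip_comments(html))
```

With the names from the example in the first post, the callable might be lambda h: str(translatehtml.translate_html(ut, h)). Note the trade-off: the comments are dropped from the output entirely rather than preserved untranslated.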
