made CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE log message better readable… #1158


@zs-stpa zs-stpa commented Oct 11, 2024

… by surrounding the faulty language code with quotes.

this only touches localized variants of the string, no code changes at all.

this is how it looked prior to the change with the example faulty language code 'Segmentierung der Textdateien'
[Screenshot 2024-10-11 at 15:25:30]
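
A minimal sketch of how quoting a placeholder can behave, assuming the bundle strings are expanded with java.text.MessageFormat (an assumption; the thread does not say which formatter is used). The class name and the message texts are hypothetical, not the real bundle entries. In a MessageFormat pattern a single apostrophe starts a quoted section, so apostrophes around {0} have to be doubled, while double quotes need no escaping in the pattern.

```java
import java.text.MessageFormat;

public class QuotedPlaceholderDemo {
    // Hypothetical message texts for illustration; not the real bundle strings.
    static final String UNQUOTED = "Unknown language code: {0}";
    static final String DOUBLE_QUOTED = "Unknown language code: \"{0}\"";
    static final String DOUBLED_APOSTROPHES = "Unknown language code: ''{0}''";
    static final String SINGLE_APOSTROPHES = "Unknown language code: '{0}'";

    // Expands the {0} placeholder using MessageFormat rules.
    static String render(String pattern, String code) {
        return MessageFormat.format(pattern, code);
    }

    public static void main(String[] args) {
        String code = "Segmentierung der Textdateien";
        // Without quotes the value blends into the sentence.
        System.out.println(render(UNQUOTED, code));
        // Double quotes are ordinary characters in a pattern.
        System.out.println(render(DOUBLE_QUOTED, code));
        // Two apostrophes produce one literal apostrophe.
        System.out.println(render(DOUBLED_APOSTROPHES, code));
        // A lone pair of apostrophes quotes the placeholder itself,
        // so the argument is never substituted.
        System.out.println(render(SINGLE_APOSTROPHES, code));
    }
}
```

With these patterns, the last call prints the literal text {0} instead of the value, which is the pitfall to watch for when adding apostrophes to a localized string.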

@brandelune
Contributor

Nice catch. Would you mind changing the other bundles as well?

But the wording itself is strange. Why "unknown language code"? What does that refer to?
@miurahr do you have any idea?

@zs-stpa
Author

zs-stpa commented Oct 11, 2024

> Nice catch. Would you mind changing the other bundles as well?

french already had parentheses around the code, my editor only showed me escaped unicode for the japanese bundle (where i imagine the code stands out naturally against kana and kanji), and no other bundles contained that string. so, thanks for asking but no thanks ^_^

> But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:
[Screenshot 2024-10-11 at 17:45:46]

@brandelune
Contributor

> > But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

> this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:

What this shows is that the problematic "language code" is the second before last. And there is no "language code" attribute to that languagemap element. What is closest to a "language code" is the "languagepattern" attribute but I’m not seeing anything weird there.

@zs-stpa
Author

zs-stpa commented Oct 12, 2024

> > > But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

> > this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:

> What this shows is that the problematic "language code" is the second before last. And there is no "language code" attribute to that languagemap element. What is closest to a "language code" is the "languagepattern" attribute but I’m not seeing anything weird there.

unclear where that languagerulename value came from. i probably typed something stupid when getting to know Ωt years ago and left it there because it never bothered me until now (had to finally convert from legacy version to current one). i am also unsure why the rule name should comply with language codes or names. possibly this was not the intended effect of that method, but this PR is only about readability of the log message itself, not about when or why it is emitted. the author of that lambda expression could maybe have a look at that.
[Screenshot 2024-10-12 at 09:03:38]
[Screenshot 2024-10-12 at 09:03:12]

@miurahr
Member

miurahr commented Oct 31, 2024

@zs-stpa
Please report in plain text, and include the conditions needed to reproduce the problem.

The languagerulename field can be localized in the segmentation rule file.
For German, it is defined in Bundle_de.properties:

CORE_SRX_RULES_LANG_DEFAULT=Standard
CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung von Textdateien
CORE_SRX_RULES_FORMATTING_HTML=Segmentierung von HTML-, XHTML-, ODF- und Infix-Dateien

You are reporting a case where the rule file contains a name that differs from "Segmentierung von Textdateien", so detection failed.

@brandelune we should not change this just for the message; we need to understand the root cause of the problem. The current rule detection logic compares localized rule names.

Please check Bundle.properties:

CORE_SRX_RULES_LANG_CATALAN=Catalan
CORE_SRX_RULES_LANG_CZECH=Czech
CORE_SRX_RULES_LANG_GERMAN=German
CORE_SRX_RULES_LANG_ENGLISH=English
CORE_SRX_RULES_LANG_SPANISH=Spanish
CORE_SRX_RULES_LANG_FINNISH=Finnish
CORE_SRX_RULES_LANG_FRENCH=French
CORE_SRX_RULES_LANG_ITALIAN=Italian
CORE_SRX_RULES_LANG_JAPANESE=Japanese
CORE_SRX_RULES_LANG_DUTCH=Dutch
CORE_SRX_RULES_LANG_POLISH=Polish
CORE_SRX_RULES_LANG_RUSSIAN=Russian
CORE_SRX_RULES_LANG_SWEDISH=Swedish
CORE_SRX_RULES_LANG_SLOVAK=Slovak
CORE_SRX_RULES_LANG_CHINESE=Chinese
CORE_SRX_RULES_LANG_DEFAULT=Default
CORE_SRX_RULES_FORMATTING_TEXT=Text files segmentation
CORE_SRX_RULES_FORMATTING_HTML=HTML, XHTML, ODF and Infix segmentation
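
A minimal sketch of why a comparison on localized names is fragile, under the assumption that detection boils down to an exact string match against the current bundle values. The class and method names are hypothetical; only the bundle keys and German strings come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class LocalizedNameLookupDemo {
    // Bundle key -> current German display name, as quoted in this thread.
    static final Map<String, String> CURRENT_DE = new LinkedHashMap<>();
    static {
        CURRENT_DE.put("CORE_SRX_RULES_LANG_DEFAULT", "Standard");
        CURRENT_DE.put("CORE_SRX_RULES_FORMATTING_TEXT",
                "Segmentierung von Textdateien");
        CURRENT_DE.put("CORE_SRX_RULES_FORMATTING_HTML",
                "Segmentierung von HTML-, XHTML-, ODF- und Infix-Dateien");
    }

    // Hypothetical detection: find the bundle key whose localized value
    // equals the name stored in the rule file, using an exact match.
    static Optional<String> keyForName(String localizedName) {
        return CURRENT_DE.entrySet().stream()
                .filter(e -> e.getValue().equals(localizedName))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Name as it appears in the current German bundle: found.
        System.out.println(keyForName("Segmentierung von Textdateien"));
        // Name reported at the top of this thread: one word differs,
        // so the exact match fails and the name looks "unknown".
        System.out.println(keyForName("Segmentierung der Textdateien"));
    }
}
```

Any rewording of a translation silently invalidates names stored in existing rule files, because nothing ties the stored string back to its bundle key.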

@miurahr
Member

miurahr commented Oct 31, 2024

The message was introduced in #539.

@miurahr
Member

miurahr commented Oct 31, 2024

As discussed with @t-cordonnier, the previous CONF file used localized rule names:
#539 (comment)

There is room to improve this by using standard codes instead of localized names.
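
One way to picture that improvement, as a sketch only: store a stable code in the rule file and look up the display name at runtime. The RuleCode enum, the method names, and the two maps are hypothetical and do not describe the project's actual design; the two German wordings are the one from the reporter's rule file and the one in the current Bundle_de.properties.

```java
import java.util.Map;
import java.util.Optional;

public class StableRuleCodeDemo {
    // Hypothetical stable identifiers; a real design may pick other codes.
    enum RuleCode { DEFAULT, TEXT, HTML }

    // Two German wordings of the same rule name seen in this thread.
    static final Map<RuleCode, String> DE_FROM_RULE_FILE =
            Map.of(RuleCode.TEXT, "Segmentierung der Textdateien");
    static final Map<RuleCode, String> DE_CURRENT_BUNDLE =
            Map.of(RuleCode.TEXT, "Segmentierung von Textdateien");

    // Parses the code stored in a rule file; empty if it is not a known code.
    static Optional<RuleCode> parseStored(String stored) {
        try {
            return Optional.of(RuleCode.valueOf(stored));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    // Resolves the display name from whichever translation is loaded,
    // falling back to the code itself when no translation exists.
    static String displayName(RuleCode code, Map<RuleCode, String> bundle) {
        return bundle.getOrDefault(code, code.name());
    }

    public static void main(String[] args) {
        RuleCode code = parseStored("TEXT").orElseThrow();
        // The stored code resolves under either wording.
        System.out.println(displayName(code, DE_FROM_RULE_FILE));
        System.out.println(displayName(code, DE_CURRENT_BUNDLE));
    }
}
```

The design choice here is that translations only affect what is displayed, never what is stored, so rewording a bundle entry cannot break detection.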

@miurahr
Member

miurahr commented Oct 31, 2024

A line in Bundle_de.properties,

CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung von Textdateien

was changed by the L10N project for v5.5 from

CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung der Textdateien

on 9 May 2021, in commit 293c930#diff-a0b5e935869c4620cf942b3b73226a631b3f7dafd7f028386e369425a665c00d

Your rule file may have been generated before v5.5.

@zs-stpa
Author

zs-stpa commented Oct 31, 2024

sorry for using screenshots instead of plain text.

yes the segmentation.conf file is from an ancient version, 3.6.0_10 i believe, and the error message occurs when converting the old style team project to a new style project with segmentation.srx. in this case, the rule with the offending name is totally redundant, and i think it is a little bit strange that the name of the segmentation rule groups should conform to locale naming conventions.

i am not that familiar with lambda expressions, but in my last screenshot of the debugger i can see in the lower left corner that the call to that error message comes from within a lambda expression, which is expanded for readability in the box on the right hand side. so the mapping rule is calling getLanguageCodeByName(code) with what should be just a human readable name instead of a code, since that exact name is a localized string from an earlier version as you have pointed out.

@miurahr
Member

miurahr commented Nov 2, 2024

I think the message defined as "CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE" is there for debugging purposes.
So we can remove it from the resource bundle and change MapRule.java as follows:

  1. Add the logger definition private static final ILogger LOGGER = LoggerFactory.getLogger(MapRule.class); after the serialVersionUID definition in MapRule.java.
  2. Replace Log.logWarningRB("CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE", code); with LOGGER.atDebug().setMessage("got unknown language rule name {}").addArgument(code).log();
  3. Remove "CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE" from Bundle_*.properties.
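
The steps above can be pictured with a self-contained sketch. It does not use the ILogger and LoggerFactory types named in the proposal; java.util.logging is swapped in as a stand-in so the example runs on a plain JDK. The class, the lookup method, and the "German" to "de" mapping are hypothetical and only illustrate the shape of the change: an unknown name is reported at debug level with a fixed, non-localized message.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLogSketch {
    // Stand-in for the logger field from step 1, using java.util.logging.
    private static final Logger LOGGER =
            Logger.getLogger(DebugLogSketch.class.getName());

    // Hypothetical lookup: returns a code for a known name, or null
    // after logging the unknown name at debug (FINE) level.
    static String languageCodeByName(String name) {
        if ("German".equals(name)) {
            return "de"; // illustrative mapping only
        }
        // Stand-in for step 2: a plain English debug message with the
        // offending name as a parameter, no resource bundle involved.
        LOGGER.log(Level.FINE, "got unknown language rule name {0}", name);
        return null;
    }

    public static void main(String[] args) {
        System.out.println(languageCodeByName("German"));
        System.out.println(languageCodeByName("Segmentierung der Textdateien"));
    }
}
```

With the default java.util.logging configuration, FINE records are not shown on the console, so the unknown name stays out of the user's sight unless debug logging is enabled.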

@miurahr
Member

miurahr commented Nov 2, 2024

I pushed the fix as #1159.
With #1159, the problematic message is no longer in the resource bundle.

@zs-stpa zs-stpa closed this Nov 4, 2024