made CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE log message better readable… #1158

zs-stpa · 2024-10-11T13:27:00Z

… by surrounding the faulty language code with quotes.

this only touches localized variants of the string, no code changes at all.
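
For illustration, here is a minimal sketch of what quoting the placeholder does to the rendered message. It assumes the bundle string is rendered with java.text.MessageFormat, which this thread does not confirm. The message texts, the class name and the render helper are hypothetical and are not the actual bundle wording.

```java
import java.text.MessageFormat;

public class QuotedPlaceholderDemo {
    // Hypothetical message texts; the real bundle wording may differ.
    static final String UNQUOTED = "Unknown language code {0}";
    // In a MessageFormat pattern, a literal apostrophe is written as two apostrophes.
    static final String SINGLE_QUOTED = "Unknown language code ''{0}''";
    // Double quotes need no escaping inside a MessageFormat pattern.
    static final String DOUBLE_QUOTED = "Unknown language code \"{0}\"";

    /** Renders a pattern with the offending value substituted for {0}. */
    static String render(String pattern, String code) {
        return MessageFormat.format(pattern, code);
    }

    public static void main(String[] args) {
        String code = "Segmentierung der Textdateien";
        System.out.println(render(UNQUOTED, code));
        System.out.println(render(SINGLE_QUOTED, code));
        System.out.println(render(DOUBLE_QUOTED, code));
    }
}
```

Without quotes, a multi-word value blends into the surrounding sentence; with quotes, the boundaries of the offending value are visible in the log.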

this is how it looked prior to the change with the example faulty language code 'Segmentierung der Textdateien'

brandelune · 2024-10-11T13:46:38Z

Nice catch. Would you mind changing the other bundles as well?

But the wording itself is strange. Why "unknown language code"? What does that refer to?
@miurahr do you have any idea?

zs-stpa · 2024-10-11T15:46:05Z

> Nice catch. Would you mind changing the other bundles as well?

french already had parentheses around the code and my editor only showed me escaped unicode for the japanese (where i imagine the code pops out naturally in contrast with kana and kanji) and no other bundles contained that string. so, thanks for asking but no thanks ^_^

> But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:

brandelune · 2024-10-12T01:15:09Z

> But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

> this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:

What this shows is that the problematic "language code" is in the second-to-last element. That languagemap element has no "language code" attribute. The closest thing to a "language code" is the "languagepattern" attribute, but I’m not seeing anything weird there.

zs-stpa · 2024-10-12T07:26:06Z

> But the wording itself is strange. Why "unknown language code"? What does that refer to? @miurahr do you have any idea?

> this seems to only happen when there is something strange at the very bottom of the segmentation rules like this:

> What this shows is that the problematic "language code" is the second before last. And there is no "language code" attribute to that languagemap element. What is closest to a "language code" is the "languagepattern" attribute but I’m not seeing anything weird there.

unclear where that languagerulename value came from, i probably typed something stupid when getting to know Ωt years ago and left it there because it never bothered me until now (had to finally convert from legacy version to current one). i am also unsure why the rule name should comply with language codes or names. possibly this was not the intended effect of that method, but this PR is only about readability of the log message itself, not about when or why it is emitted, the author of that lambda expression could maybe have a look at that.

miurahr · 2024-10-31T00:20:35Z

@zs-stpa
Please report in plain text rather than screenshots, and include the conditions needed to reproduce the problem.

The languagerulename field can be localized in the segmentation rule file.
For German, the names are defined in Bundle_de.properties:

CORE_SRX_RULES_LANG_DEFAULT=Standard
CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung von Textdateien
CORE_SRX_RULES_FORMATTING_HTML=Segmentierung von HTML-, XHTML-, ODF- und Infix-Dateien

You are reporting a rule file whose rule name differs from Segmentierung von Textdateien, so detection failed.

@brandelune we should not just change the message; we need to understand the root cause of the problem. The current rule detection logic compares against localized rule names.

Please check Bundle.properties:

CORE_SRX_RULES_LANG_CATALAN=Catalan
CORE_SRX_RULES_LANG_CZECH=Czech
CORE_SRX_RULES_LANG_GERMAN=German
CORE_SRX_RULES_LANG_ENGLISH=English
CORE_SRX_RULES_LANG_SPANISH=Spanish
CORE_SRX_RULES_LANG_FINNISH=Finnish
CORE_SRX_RULES_LANG_FRENCH=French
CORE_SRX_RULES_LANG_ITALIAN=Italian
CORE_SRX_RULES_LANG_JAPANESE=Japanese
CORE_SRX_RULES_LANG_DUTCH=Dutch
CORE_SRX_RULES_LANG_POLISH=Polish
CORE_SRX_RULES_LANG_RUSSIAN=Russian
CORE_SRX_RULES_LANG_SWEDISH=Swedish
CORE_SRX_RULES_LANG_SLOVAK=Slovak
CORE_SRX_RULES_LANG_CHINESE=Chinese
CORE_SRX_RULES_LANG_DEFAULT=Default
CORE_SRX_RULES_FORMATTING_TEXT=Text files segmentation
CORE_SRX_RULES_FORMATTING_HTML=HTML, XHTML, ODF and Infix segmentation

miurahr · 2024-10-31T00:41:46Z

The message was introduced in #539

miurahr · 2024-10-31T00:48:07Z

In a discussion with @t-cordonnier, it was noted that the previous CONF file format uses localized rule names:
#539 (comment)

There is room to improve this by using standard codes instead of localized names.
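
As a rough sketch of that idea, the rule file could store a stable code, and the localized name would only be looked up for display. Everything below is hypothetical: the RuleGroup enum, its code values and the displayName helper are illustrative and are not OmegaT's actual API or file format.

```java
import java.util.Map;
import java.util.Optional;

public class StableRuleCodeDemo {
    /** Hypothetical stable identifiers that would be written to the rule file. */
    enum RuleGroup {
        DEFAULT("default"), TEXT("text"), HTML("html");

        final String code;

        RuleGroup(String code) {
            this.code = code;
        }

        /** Resolves a stored code; never consults any translation. */
        static Optional<RuleGroup> fromCode(String code) {
            for (RuleGroup group : values()) {
                if (group.code.equals(code)) {
                    return Optional.of(group);
                }
            }
            return Optional.empty();
        }
    }

    // Two versions of the German translation, as quoted in this thread.
    static final Map<RuleGroup, String> DE_BEFORE_V55 =
            Map.of(RuleGroup.TEXT, "Segmentierung der Textdateien");
    static final Map<RuleGroup, String> DE_CURRENT =
            Map.of(RuleGroup.TEXT, "Segmentierung von Textdateien");

    /** Looks up the display name for a stored code; falls back to the raw code. */
    static String displayName(String storedCode, Map<RuleGroup, String> bundle) {
        return RuleGroup.fromCode(storedCode)
                .map(group -> bundle.getOrDefault(group, storedCode))
                .orElse(storedCode);
    }

    public static void main(String[] args) {
        // The stored value "text" resolves regardless of which translation is installed.
        System.out.println(displayName("text", DE_BEFORE_V55));
        System.out.println(displayName("text", DE_CURRENT));
    }
}
```

With this layout a retranslated bundle only changes what the user sees, not whether a stored rule can be recognized.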

miurahr · 2024-10-31T01:01:15Z

A line in Bundle_de.properties,

CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung von Textdateien

was changed by the L10N project for v5.5, from

CORE_SRX_RULES_FORMATTING_TEXT=Segmentierung der Textdateien

on 9 May 2021 in commit 293c930#diff-a0b5e935869c4620cf942b3b73226a631b3f7dafd7f028386e369425a665c00d

Your rule file may have been generated before v5.5.
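
The mismatch can be sketched as follows. This is only an illustration of a lookup keyed on localized names, not the actual detection code; the class, the map and the findKeyByLocalizedName helper are hypothetical, while the strings and bundle keys are the ones quoted in this thread.

```java
import java.util.Map;
import java.util.Optional;

public class LocalizedNameLookupDemo {
    // Hypothetical stand-in for the current German bundle: localized name -> bundle key.
    static final Map<String, String> CURRENT_DE_NAMES = Map.of(
            "Standard", "CORE_SRX_RULES_LANG_DEFAULT",
            "Segmentierung von Textdateien", "CORE_SRX_RULES_FORMATTING_TEXT",
            "Segmentierung von HTML-, XHTML-, ODF- und Infix-Dateien",
            "CORE_SRX_RULES_FORMATTING_HTML");

    /** Finds the bundle key for a rule name by exact match on the localized string. */
    static Optional<String> findKeyByLocalizedName(String name) {
        return Optional.ofNullable(CURRENT_DE_NAMES.get(name));
    }

    public static void main(String[] args) {
        // Name as written by a pre-v5.5 installation: no longer present in the bundle.
        System.out.println(findKeyByLocalizedName("Segmentierung der Textdateien").isPresent());
        // Name as it appears in the current bundle: found.
        System.out.println(findKeyByLocalizedName("Segmentierung von Textdateien").isPresent());
    }
}
```

Any exact-match lookup of this kind stops recognizing old files as soon as a translator rewords the string.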

zs-stpa · 2024-10-31T08:20:23Z

sorry for using screenshots instead of plain text.

yes the segmentation.conf file is from an ancient version, 3.6.0_10 i believe, and the error message occurs when converting the old style team project to a new style project with segmentation.srx. in this case, the rule with the offending name is totally redundant, and i think it is a little bit strange that the name of the segmentation rule groups should conform to locale naming conventions.

i am not that familiar with lambda expressions, but in my last screenshot of the debugger i can see in the lower left corner that the call to that error message comes from within a lambda expression, which is expanded for readability in the box on the right hand side. so the mapping rule is calling getLanguageCodeByName(code) with what should be just a human readable name instead of a code, since that exact name is a localized string from an earlier version as you have pointed out.

miurahr · 2024-11-02T03:12:03Z

I think the message defined as "CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE" is really for debugging purposes.
So we can remove it from the resource bundle and change the MapRule.java code as follows:

1. Put the logger definition private static final ILogger LOGGER = LoggerFactory.getLogger(MapRule.class); after the serialVersionUID definition in MapRule.java.
2. Replace Log.logWarningRB("CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE", code); with LOGGER.atDebug().setMessage("got unknown language rule name {}").addArgument(code).log();
3. Remove "CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE" from Bundle_*.properties.
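
The effect of demoting the message to debug level can be sketched with java.util.logging as a stand-in. This is not the ILogger/LoggerFactory code named above (that API is not reproduced here). Level.FINE plays the role of debug, and the class, handler and logUnknownRuleName helper are hypothetical.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DebugLevelLogDemo {
    /** Collects formatted messages instead of printing them. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(MessageFormat.format(record.getMessage(), record.getParameters()));
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Logs the unknown rule name at FINE and returns what actually got emitted. */
    static List<String> logUnknownRuleName(String code, Level loggerLevel) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setUseParentHandlers(false); // keep the demo output out of the console
        logger.setLevel(loggerLevel);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        // Hard-coded English text: no resource bundle entry is needed any more.
        logger.log(Level.FINE, "got unknown language rule name {0}", code);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(logUnknownRuleName("Segmentierung der Textdateien", Level.INFO));
        System.out.println(logUnknownRuleName("Segmentierung der Textdateien", Level.FINE));
    }
}
```

At the INFO level the list stays empty, so ordinary users no longer see the message, while enabling FINE still surfaces it for troubleshooting.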

miurahr · 2024-11-02T04:59:09Z

I pushed the fix as #1159.
With #1159, the problematic message is no longer in the resource bundle.

made CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE log message better readable…

04337bf

… by surrounding the faulty language code with quotes

zs-stpa requested review from Kazephil, brandelune and kosivantsov as code owners October 11, 2024 13:27

brandelune approved these changes Oct 11, 2024

Kazephil approved these changes Oct 12, 2024

miurahr mentioned this pull request Oct 12, 2024

fix: o.o.c.segmentation.SRX to load conf and save srx in more robust way and remove warning message #1159

Merged

zs-stpa closed this Nov 4, 2024
