ISO_8859_1 breaking UTF-8 in CLP logtype String #42

Open
intr3p1d opened this issue Feb 19, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@intr3p1d

Bug

Cause

clp-ffi-java internally uses StandardCharsets.ISO_8859_1 in EncodedMessage.getLogTypeAsString():

public String getLogTypeAsString() {
    if (null == logtype) {
        return null;
    } else {
        return new String(logtype, StandardCharsets.ISO_8859_1);
    }
}

(getDictionaryVarsAsStrings does the same.)
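The mismatch can be shown with the JDK alone. This is a minimal sketch, assuming the logtype byte array holds UTF-8 bytes; the class and method names are hypothetical and nothing here calls clp-ffi-java.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes bytes with the charset used by the getter quoted above.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8 (assumed here to be their real encoding).
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First two Hangul syllables of the sample message, written as
        // escapes so the source file stays ASCII-only.
        String original = "\uC774\uC0C1";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String latin1View = decodeAsLatin1(utf8Bytes);
        // ISO_8859_1 maps each byte to one char, so the length equals the byte count.
        System.out.println(latin1View.length() == utf8Bytes.length);  // true
        System.out.println(original.equals(latin1View));              // false
        System.out.println(original.equals(decodeAsUtf8(utf8Bytes))); // true
    }
}
```

Each Hangul syllable takes three bytes in UTF-8, so the ISO_8859_1 view turns two characters into six unrelated ones.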

Effect

https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-plugins/pinot-input-format/pinot-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java#L151-L154

This corrupts some characters. For example:
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: /u0011 이상이어야 합니다
becomes
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: � ����� ���

The message is restored correctly after going through the decode function, but when a logtype is used on its own (LIKE searches, etc.), these broken strings are not appropriate.
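As a possible caller-side workaround, the string can be converted back before it is stored or searched. This is a sketch only: `recoverUtf8` is a hypothetical helper, not part of clp-ffi-java, and it assumes the string was produced by decoding UTF-8 bytes as ISO_8859_1. It relies on the fact that Java's ISO_8859_1 maps every byte value to exactly one char, so the original bytes can be recovered without loss.

```java
import java.nio.charset.StandardCharsets;

class LogtypeRecoverySketch {
    // Hypothetical helper: undoes an ISO_8859_1 decode of UTF-8 bytes.
    static String recoverUtf8(String latin1Decoded) {
        if (null == latin1Decoded) {
            return null;
        }
        // Every char is <= U+00FF, so this yields the original bytes unchanged.
        byte[] rawBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII prefix plus two Hangul syllables, written as escapes.
        String original = "from: \uC774\uC0C1";
        // Simulate the broken string: UTF-8 bytes decoded as ISO_8859_1.
        String broken = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(broken));              // false
        System.out.println(original.equals(recoverUtf8(broken))); // true
    }
}
```

This only works on strings that really came from such a decode. A string containing characters above U+00FF would lose them when re-encoded as ISO_8859_1.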

clp-ffi version

0.4.4

Environment

Linux, Java
https://github.com/apache/pinot/blob/1d490c1ac3268103a16d77ddfa70f8f8602f9e96/pom.xml#L160

Reproduction steps

Encode a message containing characters that ISO_8859_1 does not support:
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: /u0011 이상이어야 합니다
Then get the logtype as a string:
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: � ����� ���

@intr3p1d intr3p1d added the bug Something isn't working label Feb 19, 2024