Option to define String codepage #17

Open
mattywausb opened this issue May 14, 2018 · 5 comments
Labels
enhancement-functional New functions to add to the component

Comments

@mattywausb
Collaborator

mattywausb commented May 14, 2018

When hashing a string, the code page (character encoding) used for its byte representation is essential. It should default to UTF-8, and for more advanced use cases it should be changeable. If strings in Java are always UTF-8, this behaviour must be added to the documentation for clarity.

@mattywausb mattywausb added the enhancement-functional New functions to add to the component label May 14, 2018
@jlolling
Owner

jlolling commented May 14, 2018

The code page of Strings in Java is always UTF-16. Anything else is only relevant when we read or write files or streams. I do not think we have to declare the code page of Strings.

@rbtrtr
Contributor

rbtrtr commented May 18, 2018

The current implementation converts the string that is used to calculate the hash to UTF-8 bytes by default:
final byte[] result = messageDigest.digest(content.getBytes(Charset.forName("UTF-8")));
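
To illustrate why the charset passed to `getBytes` affects the result, here is a minimal sketch. It assumes SHA-256 as the digest algorithm and uses a hypothetical helper `sha256Hex`; neither is taken from the component itself. The digest is computed over bytes, so the same String can hash differently depending on how it is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CharsetHashDemo {

    // Hypothetical helper for illustration only (not the component's code):
    // encodes `content` with the given charset and returns the SHA-256 digest as hex.
    static String sha256Hex(String content, Charset charset) {
        try {
            MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
            byte[] result = messageDigest.digest(content.getBytes(charset));
            StringBuilder hex = new StringBuilder();
            for (byte b : result) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "M\u00fcller"; // "Müller", contains one non-ASCII character
        // The same String yields different bytes, and so different hashes, per charset.
        System.out.println("UTF-8:      " + sha256Hex(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + sha256Hex(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-16LE:   " + sha256Hex(text, StandardCharsets.UTF_16LE));
    }
}
```

For pure ASCII input, UTF-8 and ISO-8859-1 produce identical bytes and therefore identical hashes; the difference only shows up once the text contains characters outside ASCII.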

@jlolling
Owner

I have just made this encoding configurable.

@mattywausb
Collaborator Author

I haven't checked it yet, but thank you already. That's great.

@jlolling
Owner

jlolling commented Oct 9, 2021

Sorry, I started to make it configurable, but after thinking about it I doubt this would make sense.
A Java String is always encoded in UTF-16, and the only part of a job where the encoding is relevant is where we read bytes and make a String from them. This is not the case in this component: we do not read bytes, so we do not need to know the encoding.
Let us talk about this, please.

3 participants