-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cp1252 as default encoding for Strings #300
base: master
Are you sure you want to change the base?
Conversation
For Strings with special characters the length needs to be of the utf-8 encoded value or else there's an 1797 ADS Error.
@hkalteBr would you mind bringing an example for this? As strings are 1-Byte characters there should be the same length for type string and type byte. |
I'm using the attribute utf-8 encoding for some strings: {attribute 'TcEncoding':='UTF-8'} |
This is a hard one. Honestly, I didn't know about this attribute. It will be hard to read values encoded this way because it is not determined which size a character has. Also we don't know the encoding on the client side. |
Thanks for looking into this. I remember having the encoding issue with symbol comments. I think we already changed the encoding to cp1252 for them. But we use utf-8 for the normal read/write operations at the moment. Would you mind sharing the link to the documentation? |
https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2529327243.html&id= |
My initial guess is that no-one really thought about the encoding too much at that time. To address this issue I suppose we could add an attribute to the read/write functions for string_encoding. Alternatively this attribute could be set in the Connection object. |
It is good practice to open an issue before opening a PR so I created #301 on this topic where we can further discuss the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String encoding is an issue concerning most of the functions for read/write activities. So finally they all need to be altered. Also I feel we need to add some tests concerning encoding.
@@ -1089,7 +1089,7 @@ def adsSumWrite( | |||
if data_name in structured_data_names: | |||
buf[offset: offset + data_symbols[data_name].size] = value | |||
elif data_symbols[data_name].dataType == ADST_STRING: | |||
buf[offset: offset + len(value)] = value.encode("utf-8") | |||
buf[offset: offset + len(value.encode("utf-8"))] = value.encode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good starting point. We should introduce a constant DEFAULT_TCENCODING = "cp1252"
. Also we should buffer the encoded value to avoid multiple encodings.
For Strings with special characters the length needs to be of the utf-8 encoded value or else there's an 1797 ADS Error.