Add tests {ValidEq,Expr}StrEncodings
Following the comment kaitai-io/kaitai_struct_ruby_runtime#7 (comment)

Test that a string parsed from the stream in the specified encoding
is equal to its UTF-8 literal counterpart.

This is designed to expose a problem in Ruby, where each string
is represented as a byte array holding the characters in the specified
encoding. Comparing two strings with the same characters for equality
will then fail if their byte representations differ (i.e. if they use
encodings that represent the characters differently).
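
The Ruby behavior described above can be reproduced in a few lines. This is a minimal sketch, not code from the Kaitai Struct runtime: `read_as` and `same_text?` are hypothetical helpers introduced only for illustration, and transcoding to UTF-8 before comparing is one possible workaround, not necessarily the fix the runtime adopts.

```ruby
# Simulate a string read from a stream: raw bytes tagged with the
# encoding declared in the format (hypothetical helper, not Kaitai API).
def read_as(bytes, encoding)
  bytes.dup.force_encoding(encoding)
end

# Compare by characters rather than by bytes: transcode both sides
# to UTF-8 first (hypothetical helper).
def same_text?(a, b)
  a.encode("UTF-8") == b.encode("UTF-8")
end

literal    = "こんにちは"                       # UTF-8 source literal
sjis_bytes = literal.encode("Shift_JIS").b      # bytes as they would sit in a file
parsed     = read_as(sjis_bytes, "Shift_JIS")

puts parsed == literal            # false: same characters, different bytes
puts same_text?(parsed, literal)  # true: compared after transcoding
```

Strings that contain only ASCII characters compare equal across ASCII-compatible encodings, which is why the problem only shows up for the non-ASCII test strings.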
generalmimon committed Jul 1, 2021
1 parent d201687 commit 585729a
Showing 4 changed files with 97 additions and 0 deletions.
44 changes: 44 additions & 0 deletions formats/expr_str_encodings.ksy
@@ -0,0 +1,44 @@
meta:
  id: expr_str_encodings
  endian: le
seq:
  - id: len_of_1
    type: u2
  - id: str1
    type: str
    size: len_of_1
    encoding: ASCII
  - id: len_of_2
    type: u2
  - id: str2
    type: str
    size: len_of_2
    encoding: UTF-8
  - id: len_of_3
    type: u2
  - id: str3
    type: str
    size: len_of_3
    encoding: SJIS
  - id: len_of_4
    type: u2
  - id: str4
    type: str
    size: len_of_4
    encoding: CP437
instances:
  str1_eq:
    value: str1 == "Some ASCII"
  str2_eq:
    value: str2 == "こんにちは"
  str3_eq:
    value: str3 == "こんにちは"
  str3_eq_str2:
    value: str3 == str2
  str4_eq:
    value: str4 == "░▒▓"
  str4_gt_str_calc:
    value: str4 > "┤" # in UTF-8 "░" (U+2591) > "┤" (U+2524),
                      # in CP437 "░" (0xB0) < "┤" (0xB4)
  str4_gt_str_from_bytes:
    value: 'str4 > [0xb4].to_s("CP437")'
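
The ordering comparisons in `str4_gt_str_calc` and `str4_gt_str_from_bytes` depend on whether strings are compared byte by byte or by code point. The Ruby sketch below illustrates the difference; `cp437` and `utf8_gt?` are hypothetical helpers introduced for this example, and `IBM437` is the encoding name Ruby uses for CP437.

```ruby
# Build a string from raw bytes and tag it as CP437
# (hypothetical helper, not Kaitai API).
def cp437(*bytes)
  bytes.pack("C*").force_encoding("IBM437")
end

# "Greater than" on the UTF-8 forms of both strings, so the result
# follows code point order (hypothetical helper).
def utf8_gt?(a, b)
  (a.encode("UTF-8") <=> b.encode("UTF-8")) == 1
end

str4 = cp437(0xB0, 0xB1, 0xB2)  # "░▒▓" as stored in the file
box  = cp437(0xB4)              # "┤" built from its CP437 byte

puts str4 > box           # false: byte-wise, 0xB0 < 0xB4
puts utf8_gt?(str4, box)  # true: U+2591 > U+2524 after transcoding
puts utf8_gt?(str4, "┤")  # true: same result against a UTF-8 literal
```

Both instances in the test expect `true`, i.e. the result of comparing the decoded characters rather than the stored CP437 bytes.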
32 changes: 32 additions & 0 deletions formats/valid_eq_str_encodings.ksy
@@ -0,0 +1,32 @@
meta:
  id: valid_eq_str_encodings
  endian: le
seq:
  - id: len_of_1
    type: u2
  - id: str1
    type: str
    size: len_of_1
    encoding: ASCII
    valid: '"Some ASCII"'
  - id: len_of_2
    type: u2
  - id: str2
    type: str
    size: len_of_2
    encoding: UTF-8
    valid: '"こんにちは"'
  - id: len_of_3
    type: u2
  - id: str3
    type: str
    size: len_of_3
    encoding: SJIS
    valid: '"こんにちは"'
  - id: len_of_4
    type: u2
  - id: str4
    type: str
    size: len_of_4
    encoding: CP437
    valid: '"░▒▓"'
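
Each `valid` key above amounts to an equality check that fails parsing on mismatch. As a rough Ruby illustration of how such a check could be made independent of the source encoding (an assumption for this sketch, not the code the Kaitai Struct compiler generates), `NotEqualError` and `validate_eq!` below are hypothetical names:

```ruby
# Hypothetical error type for this sketch; not the Kaitai runtime's class.
class NotEqualError < StandardError; end

# Transcode the parsed string to UTF-8, then compare it with the
# expected UTF-8 literal; raise if the characters differ.
def validate_eq!(actual, expected)
  normalized = actual.encode("UTF-8")
  unless normalized == expected
    raise NotEqualError, "expected #{expected.inspect}, got #{normalized.inspect}"
  end
  normalized
end

# Bytes of "░▒▓" as stored in CP437 (Ruby calls this encoding IBM437).
str4 = [0xB0, 0xB1, 0xB2].pack("C*").force_encoding("IBM437")

validate_eq!(str4, "░▒▓")  # passes: same characters once transcoded

begin
  validate_eq!(str4, "┤")
rescue NotEqualError => e
  puts "validation failed: #{e.message}"
end
```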
17 changes: 17 additions & 0 deletions spec/ks/expr_str_encodings.kst
@@ -0,0 +1,17 @@
id: expr_str_encodings
data: str_encodings.bin
asserts:
  - actual: str1_eq
    expected: true
  - actual: str2_eq
    expected: true
  - actual: str3_eq
    expected: true
  - actual: str3_eq_str2
    expected: true
  - actual: str4_eq
    expected: true
  - actual: str4_gt_str_calc
    expected: true
  - actual: str4_gt_str_from_bytes
    expected: true
4 changes: 4 additions & 0 deletions spec/ks/valid_eq_str_encodings.kst
@@ -0,0 +1,4 @@
id: valid_eq_str_encodings
data: str_encodings.bin

# No asserts, validation is built into the generated code.
