-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialization #255
base: master
Are you sure you want to change the base?
Serialization #255
Conversation
…iting that gets expression to write properly
…r to use new KaitaiStruct.* abstract classes
…f members in a collection
…or implementations
…ry size checks, terminator inclusion checks, etc
These languages use UniversalFooter, which implements fetchInstancesFooter, but fetchInstancesHeader does nothing there - so there was a misplaced footer that did not correspond to any active block, and all generated code was typically broken by that.
See b5c2e92 {,write}alignToByte() methods are now called inside the runtime library as needed: kaitai-io/kaitai_struct_python_runtime@1cb84b8
+ let --read-write imply --zero-copy-substream=false, see kaitai-io/kaitai_struct#1060 (comment)
(fix regression from d1f16dd)
var wasUnaligned = false | ||
seq.foreach { (attr) => | ||
val nowUnaligned = isUnalignedBits(attr.dataType) | ||
if (wasUnaligned && !nowUnaligned) | ||
lang.alignToByte(lang.normalIO) | ||
lang.attrWrite(attr, attr.id, defEndian) | ||
wasUnaligned = nowUnaligned | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to avoid using var
and imperative. foldLeft
seems to do the trick here, cleanly conveying the semantics and allowing e.g. advanced parallelization of such loops:
var wasUnaligned = false | |
seq.foreach { (attr) => | |
val nowUnaligned = isUnalignedBits(attr.dataType) | |
if (wasUnaligned && !nowUnaligned) | |
lang.alignToByte(lang.normalIO) | |
lang.attrWrite(attr, attr.id, defEndian) | |
wasUnaligned = nowUnaligned | |
} | |
seq.foldLeft(false) { case (wasUnaligned, attr) => | |
val nowUnaligned = isUnalignedBits(attr.dataType) | |
if (wasUnaligned && !nowUnaligned) | |
lang.alignToByte(lang.normalIO) | |
lang.attrWrite(attr, attr.id, defEndian) | |
nowUnaligned | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, one of the major changes in Java and Python runtime libraries related to serialization was that all alignToByte
insertion logic is handled in the runtime libraries themselves and the alignToByte()
call generation is disabled in the compiler for both Java and Python:
kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/languages/JavaCompiler.scala
Lines 497 to 500 in 2eca3de
// NOTE: the compiler does not need to output alignToByte() calls for Java anymore, since the byte | |
// alignment is handled by the runtime library since commit | |
// https://github.com/kaitai-io/kaitai_struct_java_runtime/commit/1bc75aa91199588a1cb12a5a1c672b80b66619ac | |
override def alignToByte(io: String): Unit = {} |
kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/languages/PythonCompiler.scala
Lines 486 to 489 in 2eca3de
// NOTE: the compiler does not need to output alignToByte() calls for Python anymore, since the byte | |
// alignment is handled by the runtime library since commit | |
// https://github.com/kaitai-io/kaitai_struct_python_runtime/commit/1cb84b84d358e1cdffe35845d1e6688bff923952 | |
override def alignToByte(io: String): Unit = {} |
(and now I see a small mistake in the comment in PythonCompiler
- technically it should be "the compiler does not need to output alignToByte()align_to_byte()
" per Python naming)
So I guess the piece of code you're reviewing can be simplified to just lang.attrWrite(attr, attr.id, defEndian)
.
shared/src/main/scala/io/kaitai/struct/datatype/Endianness.scala
Outdated
Show resolved
Hide resolved
out.puts | ||
out.puts("def _fetch_instances(self):") | ||
out.inc | ||
out.puts("pass") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a limitation to make the insertion of pass
conditional? Like in readHeader with isEmpty
. For this particular function it doesn't look so straightforward given that attrs seems to always be non-empty.
Although innocuous, having a "dangling" pass
stands out on first read and makes one wonder if there was some kind of bug in the code generation or in the source ksy file. Probably a nit though!
def _fetch_instances(self):
pass
self.attr_01._fetch_instances()
...
Same occurrences in:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a limitation to make the insertion of
pass
conditional? Like in readHeader withisEmpty
. For this particular function it doesn't look so straightforward
Yep, it's not very straightforward with how the current compiler code is internally structured. The problem is that when the compiler generates a method header (typically in the *Header
methods), it doesn't know whether that function will end up having some body or not. In case of readHeader
, we're just very lucky that the body will be empty if and only if there are seq
fields. Things get a bit more complicated in case of writeHeader
, but it's doable - generated _write
methods might not be empty even if there are no seq
fields, provided there are parse instances that need their "write itself upon access" flags initialized:
kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/ClassCompiler.scala
Line 333 in cb0c1eb
lang.writeHeader(defEndian, !instances.values.exists(i => i.isInstanceOf[ParseInstanceSpec]) && seq.isEmpty) |
For _check
or _fetchInstances
(let alone something reused for many different purposes like condIfHeader
!), trying to predict ahead of time with an ultimate boolean expression whether the function will or won't be empty borders on insanity (again, with the current code structure). It's of course theoretically possible, just absolutely unmaintainable and definitely not worth spending time on.
I'll rather take all the superfluous pass
es - the worst thing that can happen is that someone inspects the generated code and the pass
es "stand out" to them, as you put it. Far worse would be if we tried to eliminate them, but in that effort we forget to insert a pass
in some cases when it should be there, which means the entire generated Python code will be rejected by Python interpreter and people will actually have to patch these manually because KS compiler produced invalid code.
So until there's time and willingness to refactor the compiler code so that we can reliably know whether the function body is empty or not at the time of generating its header, I don't think there's any point in doing anything about this. And this is very low priority for me, given that it only affects code purity. Almost every KS issue should have a higher priority than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the long and insightful answer! This confirms my suspicion that it was tricky. I agree this is very low priority, sorry for the noise[1].
Thanks also for the tremendous effort done in serialization. I'm super excited about this feature :D
[1] At the risk of adding more noise, just in case I wanted to point out that an empty docstring is an alternative to using pass
for an empty body function. But probably without content it's still not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness: I've realized it wouldn't be hard to only insert pass
es when they're needed if we added pass
in the *Footer
method instead of the *Header
method. At the time of when *Footer
is called, we already know whether there was some code inserted since the corresponding *Header
.
I may explore this idea one day.
Relevant tests were added in these commits: - Java: kaitai-io/kaitai_struct_tests@e92fb33 - Python: kaitai-io/kaitai_struct_tests@e7869f0 This commit is needed for the `testCheckBadValidOldIo` / `test_check_bad_valid_old_io` test methods to pass. The _check() method is intended to verify pure data consistency and is supposed to be called at the time when the actual `_io` is not available yet (or is not in the correct state) and should not be used even if it's not `null`. Before this commit, if we wanted to initialize a KS object by reading an existing stream and then edit the data and write them, _check() would read the position from the old `_io` used for reading and report it in the validation error, which is wrong.
@GreyCat Could you please continue with the review? |
... as in Java, see https://github.com/kaitai-io/kaitai_struct_compiler/blob/829a14f1e33e8e48eeae726c8a287a5967bcb668/shared/src/main/scala/io/kaitai/struct/languages/JavaCompiler.scala#L153 This commit was extracted from the `serialization` branch (originally e776c98), see #255
@GreyCat @generalmimon Hey, how is this getting along? Would be great to have serialization support in main! |
No description provided.