Alternative runtime without struct #21

Open
GreyCat opened this issue Mar 6, 2018 · 5 comments

@GreyCat
Member

GreyCat commented Mar 6, 2018

Python's struct module seems to be pretty inefficient for our purposes. In every API it provides, it requires passing a format string to an unpack-like function, which then parses that format string at runtime, calls the relevant unpack routines, and constructs a tuple with a single value, which we extract right away.

Internally, struct already has everything we need: for example, it contains functions that read ("unpack") integers, but they are not exposed as a Python API.

Would it make sense to introduce an alternative, native Kaitai Struct API, written in C, that would be faster than the existing one?
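
To make the overhead described above concrete, here is a minimal sketch. The sample bytes are made up for illustration and are not taken from the runtime.

```python
import struct

# Hypothetical sample: the value 42 encoded as a little-endian u4.
data = b"\x2a\x00\x00\x00"

# The format string is passed on every call, and the result is always a
# tuple, even when only a single value is requested.
result = struct.unpack("<I", data)

# The single element is extracted right away and the tuple is discarded.
value = result[0]

print(result, value)
```

Every integer read therefore pays for one format-string argument and one throwaway tuple.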

Cc @koczkatamas @KOLANICH @arekbulski

@arekbulski arekbulski self-assigned this Mar 6, 2018
@arekbulski
Member

arekbulski commented Mar 6, 2018

I will reach out to the Python mailing list; the people there are very helpful with advice, and knowledgeable too.

There is a way to pre-compile a format string into a struct.Struct object, but its unpack method also returns a tuple.
https://docs.python.org/3/library/struct.html#classes

>>> timeit.timeit("struct.unpack('=b', b'x')", "import struct")
0.18020300199714256
>>> timeit.timeit("p.unpack(b'x')", "import struct; p = struct.Struct('=b')")
0.11721895999653498
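
The same comparison can be written as a standalone script, which may be easier to rerun. This is only a sketch: the function names are made up for illustration, and the timings depend on the machine and Python version, so none are quoted here.

```python
import struct
import timeit

data = b"x"
precompiled = struct.Struct("=b")  # format string is parsed once, up front


def via_module():
    # Passes the format string on every call.
    return struct.unpack("=b", data)[0]


def via_struct_object():
    # Reuses the precompiled Struct; still builds a one-element tuple.
    return precompiled.unpack(data)[0]


def via_unpack_from():
    # Same precompiled Struct, reading at an explicit offset in the buffer.
    return precompiled.unpack_from(data, 0)[0]


for fn in (via_module, via_struct_object, via_unpack_from):
    elapsed = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {elapsed:.4f} s")
```

All three variants return the same integer; only the per-call overhead differs.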

@KOLANICH
Contributor

KOLANICH commented Mar 6, 2018

Would it make sense / be faster to introduce alternative, native Kaitai Struct API which would be written in C

IMHO no.

1. C is fast, but I wonder if Rust may be better here.
2. If we make a better API, let it be a part of Python, not a standalone library.
3. As @arekbulski has mentioned, it is possible to precompile the struct parsers. Of course, it is the job of KSC to merge adjacent fields into a single struct where possible. I wonder if it makes any sense to do some flattening.

meta:
  id: l_l_l
seq:
  - id: a
    type: u4
  - id: b
    type: u4
  - id: c
    type: aa
  - id: d
    type: f8
types:
  aa:
    seq:
      - id: b
        type: u1

now (simplified, only conveys the idea)

class LLL(...):
  ...
    a=unpack("I", ...)
    b=unpack("I", ...)
    c=Aa(...)
    d=unpack("d", ...)

with precompilation

class LLL(...):
  ab_unp=Struct("II")
  d_unp=Struct("d") # in fact, we can precompile single-field structs once and reuse them.
  ...
    a, b=self.__class__.ab_unp.unpack(...)
    c=Aa(...)
    d,=self.__class__.d_unp.unpack(...) # unpack always returns a tuple

with flattening:

class LLL(...):
  abcd_unp=Struct("=IIBd") # "=" disables native alignment; a bare "IIBd" would insert padding before "d"
  ...
    a, b, c_b, d=self.__class__.abcd_unp.unpack(...)
    c=Aa._from_unpacked_tuple((c_b,))
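
A runnable version of the flattening idea might look like the sketch below. Several things here are assumptions rather than anything KSC generates: the `=` prefix (native byte order, standard sizes, no alignment padding) is chosen because the .ksy above states no endianness, `_from_unpacked_tuple` is the hypothetical helper named in the snippet above, and the constructor signature is made up for illustration.

```python
import struct

# One precompiled Struct covers a, b, the nested c.b, and d.
# Without a prefix, native alignment would add padding between "B" and "d".
ABCD = struct.Struct("=IIBd")


class Aa:
    def __init__(self, b):
        self.b = b

    @classmethod
    def _from_unpacked_tuple(cls, values):
        # Hypothetical helper: build the subtype from already-unpacked values.
        (b,) = values
        return cls(b)


class LLL:
    def __init__(self, buf, offset=0):
        # A single unpack call reads all four primitive fields at once.
        self.a, self.b, c_b, self.d = ABCD.unpack_from(buf, offset)
        self.c = Aa._from_unpacked_tuple((c_b,))


sample = struct.pack("=IIBd", 1, 2, 3, 4.5)
obj = LLL(sample)
print(obj.a, obj.b, obj.c.b, obj.d)
print(ABCD.size)  # 4 + 4 + 1 + 8 bytes, no padding
```

Flattening like this only works when every merged field has a fixed size and no field depends on a previously parsed value.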

@arekbulski
Member

I suggest closing this topic. I think we have already arrived at a conclusion: implementing the Python parser (not the runtime) in C would be a major hurdle that would not be worth the effort, and we have already exhausted what can be done in pure Python.

@armijnhemel

If dropping Python 2 support is actually an option, then it might be worth looking at the from_bytes() class method of the built-in int type. It basically works like this:

int.from_bytes(byte_string, byteorder=byteorder, signed=signed)

(default: unsigned)

for example:

>>> int.from_bytes(b'\x01\x01\x00\x00', byteorder='big')
16842752
>>> int.from_bytes(b'\x01\x01\x00\x00', byteorder='little')
257

It also converts byte strings of arbitrary length, so implementing something like u6 becomes trivial:

>>> int.from_bytes(b'\x01\x01\x00\x00\x00\x00', byteorder='little')
257
>>> int.from_bytes(b'\x01\x01\x00\x00\x00\x00', byteorder='big')
1103806595072

https://docs.python.org/3/library/stdtypes.html#int.from_bytes
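
As a sketch of how this could be wrapped for stream reading, the helpers below read fixed-width integers of any byte length. The names `read_uint` and `read_sint` are hypothetical and are not part of the existing runtime.

```python
import io


def _read_exact(stream, size):
    # Read exactly `size` bytes or fail loudly on a short read.
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError(f"requested {size} bytes, got {len(raw)}")
    return raw


def read_uint(stream, size, byteorder="little"):
    # Hypothetical helper: unsigned integer of arbitrary byte width.
    return int.from_bytes(_read_exact(stream, size), byteorder=byteorder, signed=False)


def read_sint(stream, size, byteorder="little"):
    # Hypothetical helper: signed (two's complement) integer.
    return int.from_bytes(_read_exact(stream, size), byteorder=byteorder, signed=True)


stream = io.BytesIO(b"\x01\x01\x00\x00\x00\x00\xff\xff")
u6 = read_uint(stream, 6)  # six-byte little-endian unsigned value
s2 = read_sint(stream, 2)  # two-byte little-endian signed value
print(u6, s2)
```

No format string and no intermediate tuple are involved, although whether this is actually faster than struct would need to be measured.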

@KOLANICH
Contributor

For arrays of numbers, we should probably use the array module.
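
A minimal sketch of that idea, assuming the data is a run of little-endian u4 values (the sample bytes are made up). The item size of typecode "I" is platform-dependent, so the sketch checks it before use.

```python
import array
import sys

# Hypothetical sample: three little-endian u4 values (1, 2, 3).
raw = bytes([1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0])

values = array.array("I")
if values.itemsize != 4:
    # "I" is only guaranteed to be at least 2 bytes wide.
    raise RuntimeError("typecode 'I' is not 4 bytes on this platform")

# Decode the whole buffer in one call instead of one unpack per element.
values.frombytes(raw)

# array uses the machine's native byte order, so swap on big-endian hosts.
if sys.byteorder != "little":
    values.byteswap()

print(values.tolist())
```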


4 participants