Any port for plain C? #263

Open
Zorgatone opened this issue Sep 25, 2017 · 103 comments

@Zorgatone

Zorgatone commented Sep 25, 2017

Hi,
I would like to know if you would consider (or already have plans for) porting the project to "plain" C, in addition to C++ and C#. I would use it, and not all systems (perhaps even embedded ones) support C++ and/or C#. Having a C version would enable portability on any system and even more languages with C bindings.

@GreyCat
Member

GreyCat commented Sep 25, 2017

You're completely correct, a C port has been in heavy discussion since almost the very beginning of the project, yet nobody ever created an issue about it (and that's bad, because it's hard to collect all these discussions in one place).

There are/were several major issues with a C target, though. This became a somewhat lengthy review of what's been discussed over the years, but I believe I've remembered most of the points and tried to order them from most serious to least serious.

Completely different workflow in mind

It turns out that most people who need C support in KS have a completely different workflow in mind than what KS provides now. Right now, KS does a very simple thing: it takes a binary format serialization spec and generates an API around it. It usually does zero transformations, except for very simple and technical ones (e.g. endianness and that kind of stuff): whatever's in the format is reflected exactly as is in memory. C people usually strive for performance and memory efficiency and would prefer not to store stuff that can be used right away and then just thrown out.

A very simple example:

seq:
  - id: len_foo
    type: u2
  - id: foo
    size: len_foo
    type: str

This is usually ok for many modern languages, but a lot of people who wanted a C target automatically suggest that:

  • len_foo must not be stored in the structures that KS generates in memory at all — it must be used once during the parsing and then just thrown away
  • Given that we're talking about "string" data type, why not convert it into "pure C string", as most C stdlib functions expect it to be — i.e. no length information, just a zero byte termination. Of course, this also implies that (1) we'll need to allocate one more byte than len_foo for that zero byte, (2) we need to actually put that zero byte into a string (although it's clearly not existing in the stream), (3) we're silently agreeing on dealing with zero-terminated strings, i.e. foo could never contain a zero inside it.
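The two suggestions above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper name `read_len_prefixed_cstr` is hypothetical, the input is taken from an in-memory buffer, and `u2` is assumed to be little-endian because the spec fragment does not state an endianness.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses a u2 (assumed little-endian) length followed
 * by that many bytes, and returns a freshly allocated zero-terminated copy.
 * Returns NULL on truncated input, on allocation failure, or if the payload
 * contains a zero byte (which a plain C string cannot represent). */
char *read_len_prefixed_cstr(const uint8_t *buf, size_t buf_len)
{
    if (buf_len < 2)
        return NULL;

    /* len_foo lives only in this local variable; it is never stored. */
    size_t len_foo = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (buf_len - 2 < len_foo)
        return NULL;

    /* Point (3): refuse payloads with a zero byte inside. */
    if (memchr(buf + 2, 0, len_foo) != NULL)
        return NULL;

    /* Point (1): one byte more than len_foo for the terminator. */
    char *foo = malloc(len_foo + 1);
    if (foo == NULL)
        return NULL;

    memcpy(foo, buf + 2, len_foo);
    foo[len_foo] = '\0'; /* Point (2): a byte that is not in the stream. */
    return foo;
}
```

The caller owns the returned string and releases it with `free()`. Rejecting embedded zero bytes is one possible policy; silently truncating would be another.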

A more complex (and real-life) example is the typical parsing of a network packet, for example, a udp_datagram. The typical current vision of what KS might create is something like this:

typedef struct udp_datagram_t {
  uint16_t src_port;
  uint16_t dst_port;
  uint16_t length;
  uint16_t checksum;
  char* body;
} udp_datagram_t;

udp_datagram_t* read_udp_datagram(kaitai_stream* io) {
  udp_datagram_t* r = (udp_datagram_t*) malloc(sizeof(udp_datagram_t));

  r->src_port = read_u2be(io);
  r->dst_port = read_u2be(io);
  r->length = read_u2be(io);
  r->checksum = read_u2be(io);
  r->body = read_byte_eos(io);

  return r;
}

It turns out that many users would be comfortable with a completely different mechanism than "the read function just fills in some structures in memory and returns a pointer to them":

  • Some would want certain callbacks to be called every time an attribute is "parsed", and do not need it to be stored in single memory structure at all, i.e.:
void read_udp_datagram(kaitai_stream* io, udp_datagram_callbacks* callbacks) {
  uint16_t src_port = read_u2be(io);
  if (io->status != OK) {
    // Report the failure through the callbacks instance, not the type name
    callbacks->on_error(io->status);
    return;
  }
  callbacks->on_read_src_port(src_port);

  // ...
}
  • Some suggested more complex pubsub-like models, so there's some intermediate machinery where user applies to "subscribe" to only certain events like "this part of structure is finally completely read".

  • Some users suggested that such low-level packet parsing usually happens on incomplete/fragmented packets/structures. Typically, in such a situation KS would just stop reading and throw an exception. In C, however, they would prefer to be able to continuously resupply additional stream buffer contents into a single "reader" procedure, which would keep track of what has already been "parsed" on previous iterations (and not invoke relevant callbacks twice), and actually even to be able to resume parsing from certain points.
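To make the callback and "resupply the buffer" ideas more concrete, here is a minimal sketch of a resumable parser for the four 16-bit big-endian fields of the UDP header. All names (`udp_header_parser`, `udp_header_feed`, `store_field`) are hypothetical and not part of any KS runtime; the parser state is just a byte counter plus one pending byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback set: on_field fires once per completed field. */
typedef struct {
    void (*on_field)(void *user, int index, uint16_t value);
    void *user;
} udp_header_callbacks;

/* Parser state that survives between chunks. */
typedef struct {
    size_t pos;      /* bytes of the 8-byte header consumed so far */
    uint8_t pending; /* high byte of a field whose low byte has not arrived */
} udp_header_parser;

void udp_header_parser_init(udp_header_parser *p)
{
    p->pos = 0;
    p->pending = 0;
}

/* Feeds one chunk of input. Returns how many bytes were consumed.
 * The header is complete once p->pos == 8. */
size_t udp_header_feed(udp_header_parser *p, const uint8_t *chunk, size_t len,
                       const udp_header_callbacks *cb)
{
    size_t used = 0;
    while (used < len && p->pos < 8) {
        uint8_t b = chunk[used++];
        if (p->pos % 2 == 0) {
            p->pending = b; /* first half of a field: remember it */
        } else {
            uint16_t v = (uint16_t)((p->pending << 8) | b);
            cb->on_field(cb->user, (int)(p->pos / 2), v);
        }
        p->pos++;
    }
    return used;
}

/* Example callback: stores each field into a uint16_t[4] passed as user. */
void store_field(void *user, int index, uint16_t value)
{
    ((uint16_t *)user)[index] = value;
}
```

Because the position is part of the state, a field split across two chunks is reported exactly once, when its second byte arrives.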

Not an "everything is an expression" language

Simply put, almost everything we had before supported "every KS expression translates into target language expression" idiom. That is, if you need to do string concatenation, i.e.

seq:
  - id: a
    type: strz
  - id: b
    type: strz
instances:
  c:
    value: a + b

... you do that a + b in one single-line expression everywhere. Even C++ allowed us to get away with a + b using std::string. In C, however, it traditionally boils down to many lines and temporary variables:

// Real-life code would be even more complex, probably with more checks, etc.
size_t len_a = strlen(a);
size_t len_b = strlen(b);
char *tmp = (char *) malloc(len_a + len_b + 1);
memcpy(tmp, a, len_a);
memcpy(tmp + len_a, b, len_b);
tmp[len_a + len_b] = 0;

This issue, however, was more or less solved with the advent of #146.

Complex memory management

What's not solved, however, is that such arbitrary allocations of temporary variables sometimes result in more complex memory management and a need for additional manual cleanup. In the example above, tmp would likely be used directly as the value of c, and thus there's no need to store it additionally. However, if multiple operations occur, we'll either need to store these intermediate values, or use some clever logic to either reuse these temporary buffers (and/or avoid extra copying), or clean them up right after they're no longer needed (i.e. earlier than in the object's destructor).

Actually, even "allocate everything on the heap" is not universally agreed upon in many C apps. So, typical parsing of a user-defined type like this:

udp_datagram_t* r = (udp_datagram_t*) malloc(sizeof(udp_datagram_t));

might instead be replaced with passing a pointer to an already existing structure into the read_* functions to fill in, with the udp_datagram_t itself created on the caller's stack.
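A minimal sketch of that caller-allocated style, assuming an in-memory buffer as input. The names `udp_datagram_view` and `udp_datagram_parse` are hypothetical; the body is not copied but referenced inside the caller's buffer, so nothing touches the heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "view" type: nothing in it owns memory. */
typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
    const uint8_t *body; /* points into the caller's buffer */
    size_t body_len;
} udp_datagram_view;

static uint16_t get_u2be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Fills *out from buf. Returns 0 on success, -1 if the 8-byte header
 * does not fit. Performs no heap allocation. */
int udp_datagram_parse(udp_datagram_view *out, const uint8_t *buf, size_t len)
{
    if (len < 8)
        return -1;
    out->src_port = get_u2be(buf);
    out->dst_port = get_u2be(buf + 2);
    out->length   = get_u2be(buf + 4);
    out->checksum = get_u2be(buf + 6);
    out->body     = buf + 8;
    out->body_len = len - 8;
    return 0;
}
```

The trade-off is lifetime: the view is only valid while the caller's buffer is, which is exactly the kind of decision the generated code cannot make for every user.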

No single standard library

For KS, we need some basic stuff like:

  • Byte arrays, which could report the length of the contents that they store. There is no standard structure like that in C:
typedef struct byte_array {
    int len;
    void* data;
} byte_array;
  • Strings (again, knowing their length and, ideally, encoding-aware). If we stick to traditional char* strings, then we get hit with the "no zero bytes inside" requirement, which might hurt some formats.
  • True element arrays which (1) know their size, (2) allow growth.

There are tons of "enhanced standard" libraries that do that, but there's no universal agreement on which one to use. Probably roughly 80% of C applications roll something homebrew like that inside them. Out of the "standard" implementations, there is glib, klib, libmowgli, libulz, tons of lesser known libraries, and a huge assortment of string-related libs, array-related libs, etc. Out of them, glib is probably the most well-known and well-maintained, but even a suggestion to use it frequently encounters huge resistance from many C developers.

Another possible way (albeit not way too well-received by many developers) is to roll our own (yet another) implementation of all that stuff, and deal with ks_string*, ks_bytes*, ks_array*, etc, instead of char*, whatever_t[], etc.
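For a sense of scale, a "roll our own" byte array does not need much code. The sketch below borrows the `ks_bytes` name from the paragraph above, but the layout and the function names are assumptions made for illustration only. It knows its length, can hold zero bytes, and grows on demand.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte array. */
typedef struct {
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
    uint8_t *data;
} ks_bytes;

void ks_bytes_init(ks_bytes *b)
{
    b->len = 0;
    b->cap = 0;
    b->data = NULL;
}

/* Appends n bytes from src. Returns 0 on success, -1 on overflow or
 * allocation failure (the array is left unchanged in that case). */
int ks_bytes_append(ks_bytes *b, const void *src, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return -1;
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2; /* geometric growth keeps appends amortized O(1) */
        }
        uint8_t *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

void ks_bytes_free(ks_bytes *b)
{
    free(b->data);
    ks_bytes_init(b);
}
```

The hard part is not this code but agreeing on it: every project that already has its own equivalent would need conversion glue.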

No simple solution here, and whatever we choose probably won't be accepted by many C developers. Probably, if we implement support for the top 3 (or top 5) popular libs, that will cover at least some popular options.

Exception support

As we all know, C does not have any standard exception support, and typical KS-generated code relies on them a lot, i.e.:

  r->src_port = read_u2be(io);
  r->dst_port = read_u2be(io);
  r->length = read_u2be(io);
  // ...

On every step, read_u2be might encounter end of stream (or an IO error) and won't be able to succeed in parsing yet another 2 bytes. The typical solution for that in C is using return codes and passing the value-to-fill by reference, i.e.:

int err;

err = read_u2be(io, &(r->src_port));
if (err != 0)
  return err;

err = read_u2be(io, &(r->dst_port));
if (err != 0)
  return err;

// ...

Since the introduction of Go support (#146), that became possible, although it would probably still be a pain-in-the-ass to use in C :(
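One common way to reduce the repetition of that return-code pattern is a small early-return macro. The following is a sketch, not KS runtime code: `mem_stream`, `KS_TRY`, the error constants and `read_udp_header` are all hypothetical names, and the stream is a plain in-memory buffer.

```c
#include <stddef.h>
#include <stdint.h>

enum { KS_OK = 0, KS_ERR_EOS = 1 }; /* hypothetical error codes */

typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} mem_stream;

/* Reads a big-endian u2 into *out, or reports end of stream. */
static int read_u2be(mem_stream *io, uint16_t *out)
{
    if (io->len - io->pos < 2)
        return KS_ERR_EOS;
    *out = (uint16_t)((io->data[io->pos] << 8) | io->data[io->pos + 1]);
    io->pos += 2;
    return KS_OK;
}

/* Propagates the first non-zero error code to the caller. */
#define KS_TRY(expr)                 \
    do {                             \
        int ks_err_ = (expr);        \
        if (ks_err_ != KS_OK)        \
            return ks_err_;          \
    } while (0)

typedef struct {
    uint16_t src_port, dst_port, length, checksum;
} udp_header;

int read_udp_header(mem_stream *io, udp_header *r)
{
    KS_TRY(read_u2be(io, &r->src_port));
    KS_TRY(read_u2be(io, &r->dst_port));
    KS_TRY(read_u2be(io, &r->length));
    KS_TRY(read_u2be(io, &r->checksum));
    return KS_OK;
}
```

The macro hides a `return`, which some style guides dislike, and it does not help once cleanup of partially built structures is needed before returning.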

Another quick "solution" for C is to use signals/aborts to handle these erroneous situations. In fact, it would even be ok in many use cases like embedded stuff, because things are not usually supposed to blow up there and if they do, then everything is lost already: there are no graceful exits, user interactions, "Send error report to the vendor" dialogs, etc.

Stream abstraction

A relatively minor and solvable issue, but still an issue: what would the concept of a "KS stream" be in C? Two popular options:

  • FILE*: usually it's not buffered, so many sequential "read_u2be" calls would translate into literal "read 2 bytes" syscalls, which is terribly inefficient. Besides, one can't read from an in-memory array using FILE*.
  • char*: just use an in-memory array and screw everything else. On-disk file parsing can be done using mmap, but this is (1) very platform-dependent, (2) pretty inefficient for lots of smaller files. And the question of handling IO errors (or at least end-of-stream) still remains, so we'll need a wrapper for that to store the mapped length.

Probably the C runtime would need to implement all these options and allow the end user to choose. Nothing too scary, but still an issue to be solved.
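One way to let the end user choose is a tiny interface with a read function pointer, so that the same parsing code runs over a FILE* or over memory. This is a sketch under assumptions: `byte_source`, `mem_ctx` and the helper names are hypothetical and do not describe an existing runtime.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stream interface: one function pointer plus backend state. */
typedef struct byte_source byte_source;
struct byte_source {
    /* Reads up to n bytes into dst, returns how many were actually read. */
    size_t (*read)(byte_source *s, void *dst, size_t n);
    void *ctx;
};

/* Backend 1: in-memory array with a known length. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} mem_ctx;

static size_t mem_read(byte_source *s, void *dst, size_t n)
{
    mem_ctx *m = s->ctx;
    size_t avail = m->len - m->pos;
    if (n > avail)
        n = avail;
    if (n > 0)
        memcpy(dst, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* Backend 2: stdio FILE*. */
static size_t file_read(byte_source *s, void *dst, size_t n)
{
    return fread(dst, 1, n, (FILE *)s->ctx);
}

byte_source byte_source_from_mem(mem_ctx *m)
{
    byte_source s = { mem_read, m };
    return s;
}

byte_source byte_source_from_file(FILE *fp)
{
    byte_source s = { file_read, fp };
    return s;
}

/* Returns 0 if exactly n bytes were read, -1 on a short read. */
int byte_source_read_exact(byte_source *s, void *dst, size_t n)
{
    return s->read(s, dst, n) == n ? 0 : -1;
}
```

The indirect call costs a little per read, which is the price of not fixing the backend at compile time.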

@GreyCat
Member

GreyCat commented Sep 25, 2017

And, to answer these:

Having a C version would enable portability on any system

Well, I won't be that optimistic. Given all the stuff above, chances are tons of C people would still opt to roll things manually because of all these compromises and "does not exactly fit my workflow" argument.

and even more languages with C bindings

Probably it won't be that easy :( KS C runtime is likely to be easier to rewrite in another language than go through all that binding hassle, and then you'll have to do that "binding" glue code for every particular type ported.

@Zorgatone
Author

Hi, thanks for the lengthy and detailed answer. I'm glad to hear that C has already been discussed and considered.
For the "string" argument I would go for "standard C" zero-terminated strings. Other "strings" that contain zeros in them I would treat as binary data of a given length.
For the libraries to use (many of which would encounter resistance) I'd go for a custom implementation. That could take long to make but shouldn't be too hard to do (let me know if you want some help, I would be happy to do so).

For Exception support what about CException? See link
Otherwise we could do something like C11's bounds-checked string functions and return errno_t.

For the KS stream any of the two solutions would be ok. If I remember correctly you can set/enable the default buffering/buffer of FILE *. Otherwise allocate everything manually on memory and release it later.
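The buffering of a FILE* can indeed be adjusted with the standard `setvbuf` function. A minimal sketch follows; the wrapper name `use_large_read_buffer` is made up for illustration.

```c
#include <stdio.h>

/* Requests full buffering with a buffer of the given size, allocated by
 * the C library. Must be called after fopen() and before any other
 * operation on fp. Returns 0 on success, non-zero on failure. */
int use_large_read_buffer(FILE *fp, size_t size)
{
    return setvbuf(fp, NULL, _IOFBF, size);
}
```

With a buffer in place, many small sequential reads are served from memory and only occasionally reach the operating system.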

About the "workflow" argument, everyone will always decide on their own what library to use or what to do with their own code (even doing all custom handling), so I wouldn't think too much about that.

For the "C bindings", it would be good for languages not yet implemented that can use the C bindings easily.

I think a good solution would be to have a kslib_init() and kslib_free() or something similar if the library needs to initialize and allocate/release its own resources. Even if it looks ugly or you have to save and pass around extra arguments to the library's functions. Still better than nothing.

I believe it would be "uglier" to just have to make C functions "wrapped around" C++ API calls, or even worse not being able to compile on some systems, or having to implement everything (without this library) manually every time.

I like the project (even if I haven't had the chance to play around with it yet) and, if I have some extra time, I'd really like to give a hand and help to make a C port (even if it would be a side-project with some differences)

@GreyCat
Member

GreyCat commented Sep 25, 2017

@Zorgatone Ok, for a start, I would suggest really playing around with KS to see what it does and what it does not. Maybe you'll decide that it won't meet your expectations anyway?..

For Exception support what about CException? See link

The link just says "Non-Image content-type returned" for me :( If you mean something like https://github.com/ThrowTheSwitch/CException, then at the very least that's +1 extra library dependency, and in the C world every library is usually a major hassle. But maybe that could be done too.
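For reference, exception-like control flow in C is usually built on the standard `setjmp`/`longjmp` pair, which needs no extra library. The sketch below only illustrates that mechanism; it is not CException's API, and `jmp_stream`, `read_u2be_or_jump` and `read_ports` are hypothetical names.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stream carrying its own error landing point. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    jmp_buf on_error;
} jmp_stream;

/* Returns the value directly; on end of stream it jumps instead of
 * returning, so callers need no per-call error check. */
static uint16_t read_u2be_or_jump(jmp_stream *io)
{
    if (io->len - io->pos < 2)
        longjmp(io->on_error, 1);
    uint16_t v = (uint16_t)((io->data[io->pos] << 8) | io->data[io->pos + 1]);
    io->pos += 2;
    return v;
}

/* Reads two ports. Returns 0 on success, -1 if any read "threw". */
int read_ports(jmp_stream *io, uint16_t ports[2])
{
    if (setjmp(io->on_error) != 0)
        return -1; /* arrived here via longjmp */
    ports[0] = read_u2be_or_jump(io);
    ports[1] = read_u2be_or_jump(io);
    return 0;
}
```

The reads look like the exception-based code again, but anything allocated between `setjmp` and `longjmp` is not released automatically, so the memory management concerns remain.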

I'd really like to give a hand and help to make a C port

You've probably seen http://doc.kaitai.io/new_language.html — right now we're somewhere in between stages (2) and (3). From all the issues that I've outlined, this "totally different workflow expected" is definitely the most serious one. I'm not too keen on doing lots of work that almost nobody would want to use.

@Zorgatone
Author

Zorgatone commented Sep 25, 2017

Understandable, thanks for the reply. I was planning to do some testing with KS in the near future, maybe I will try and make my own library in C if I think I'll need it :)

PS: thanks for the link, it's a good starting point

@KOLANICH

len_foo must not be stored in the structures that KS generates in memory at all — it must be used once during the parsing and then just thrown away

I don't use C, I use C++ and IMHO the preferred approach is not to store the info in a standalone structure, but to decompose the thing into a set of fixed (or variable size, if language supports it) dumb structures and put them upon raw virtual memory. #65

Given that we're talking about "string" data type, why not convert it into "pure C string", as most C stdlib functions expect it to be — i.e. no length information, just a zero byte termination.

For the strz type, just pass a pointer to that memory. There is an issue with non-zero-byte terminators, though.

Complex memory management

IMHO we should just use C++ for that. C coders can write in C++ in C-style if they want.

@GreyCat
Member

GreyCat commented Sep 25, 2017

I'll just leave it here, just in case: https://matt.sh/howto-c

This link was heavily suggested by several modern C proponents with whom I've discussed KS support for C. Suggestions of modern C style guides are also most welcome. The only one that I know is the Linux kernel coding style guide (this is my personal preference for C as well), but chances are that there are other popular style guides in other areas?

@Zorgatone
Author

Zorgatone commented Sep 25, 2017

@GreyCat nice link! Useful to know that. But still not all compilers support all the C11 features unfortunately. At least it should be good to use C99, especially for the stdint.h int types (I really didn't know about the fast and least ints! I knew about the fixed-size ones, though).

@KOLANICH

KOLANICH commented Sep 25, 2017

Most of things from that are also valid for C++.

@Zorgatone
Author

I'm also linking another article, a critique of Matt's "How to C in 2016" article, so that other opinions are considered as well: https://github.com/Keith-S-Thompson/how-to-c-response

@arekbulski
Member

For C strings, I would recommend that one field end up adding a few fields to the resulting struct, with similar names and different types. For example:

  r->text_array = read_array(io, 10);
  r->text_str = r->text_array.to_str();

This does not consume more memory (only const amount), as the char* pointer points to same data as the array. End user might want some glib arrays, or char*, why not give them both?

@GreyCat
Member

GreyCat commented Jan 19, 2018

@arekbulski Giving them both is probably a bad idea: it will require dependency on glib, and would add extra unneeded bloat for both parties. Besides, char* strings are just not enough anyway: you need to be able to do .length on that, and you just can't do that with char* string.

@arekbulski
Member

arekbulski commented Jan 19, 2018

Another possible way (albeit not way too well-received by many developers) is to roll our own (yet another) implementation of all that stuff, and deal with ks_string*, ks_bytes*, ks_array*, etc, instead of char*, whatever_t[], etc.

You suggested using our own types, and the runtime could provide convenience functions for transforming ks_arrays to glib byte arrays and other types. Hm? glib would be supported, not required.

@arekbulski
Member

@GreyCat
I would be willing to start implementing the C runtime. If you approve, I would outline the runtime file first (the types and methods for byte arrays etc.), and if that meets your standards, we (you) would update the compiler/translator to support the runtime, and I would implement the meat of the runtime. What do you think?

@smaximov

smaximov commented Feb 8, 2018

Besides, char* strings are just not enough anyway: you need to be able to do .length on that, and you just can't do that with char* string

@GreyCat, you may consider rolling your own string implementation which uses the same technique as sds. This will make Kaitai strings compatible with most functions accepting char* (unless a Kaitai string contains an extra zero byte in addition to the terminating NUL).
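The technique in question keeps the length in a header placed directly before the character data and hands out a pointer to the data itself. Below is a minimal sketch of the idea with a fixed-size header; the `kstr_*` names are hypothetical and this is not the sds API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hidden header stored immediately before the characters. */
typedef struct {
    size_t len;
    char buf[]; /* flexible array member (C99) */
} kstr_header;

/* Creates a string from len bytes at src. The returned pointer is an
 * ordinary zero-terminated char*, usable with <string.h> functions. */
char *kstr_new(const void *src, size_t len)
{
    if (len > SIZE_MAX - sizeof(kstr_header) - 1)
        return NULL;
    kstr_header *h = malloc(sizeof(kstr_header) + len + 1);
    if (h == NULL)
        return NULL;
    h->len = len;
    if (len > 0)
        memcpy(h->buf, src, len);
    h->buf[len] = '\0';
    return h->buf;
}

/* O(1) length: step back from the characters to the header. */
size_t kstr_len(const char *s)
{
    const kstr_header *h =
        (const kstr_header *)(s - offsetof(kstr_header, buf));
    return h->len;
}

void kstr_free(char *s)
{
    if (s != NULL)
        free(s - offsetof(kstr_header, buf));
}
```

For a string with an embedded zero byte, `kstr_len` still reports the full length while `strlen` stops early, which is exactly the caveat mentioned above. Such pointers must only be released with `kstr_free`, never with plain `free`.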

@GreyCat
Member

GreyCat commented Feb 8, 2018

@arekbulski Sure, go ahead :) I'm not sure you've seen it, we also have this article with an overall new language support plan and an implementation graph like this one.

@GreyCat
Member

GreyCat commented Feb 8, 2018

@smaximov Yeah, that's probably how it should be done for "roll your own" implementation.

@arekbulski
Member

I have bittersweet feelings about SDS. I really like the idea, I really do, but the implementation is horrible. The repo you linked has bug reports and bugfixes going back 4 years and still hanging. They also implemented a variable-length prefix (the count field), which makes it bananas. We can implement our own SDS; I do not recommend using theirs.

Big thanks for sharing this with us, @smaximov !

@jonahharris

Is anyone working on this, even as a prototype?

@GreyCat
Member

GreyCat commented Apr 4, 2018

Not really. Personally, I would probably return to this one after #146, as the experience with Go is very much the same as with C (except for the fact that Go has relatively ok strings and slices).

@arekbulski
Member

I promised to implement the C runtime, but that was a few months ago. Since then I have had much work on Construct, and now I am working on a few things in KS. I am still willing to implement this, but I can't work on everything at once. If you wish, I will get on top of C, but other work items would need to be shelved instead.

@DarkShadow44

Any updates on this? I'd like to help, but I'm not familiar with scala...

@GreyCat
Member

GreyCat commented Dec 20, 2018

No updates. Unfortunately, most of #263 (comment) still stands. It's probably still a good idea to complete the Go port first, as it shares many common concepts (except for the hassle with memory management).

@DarkShadow44

FWIW, I have a (for me) working C version at
https://github.com/DarkShadow44/UIRibbon-Reversing/blob/master/tests/UIRibbon/parser_generic.c
https://github.com/DarkShadow44/UIRibbon-Reversing/blob/master/tests/UIRibbon/parser_uiribbon.c
It's pretty simple, copying data from the file into an in-memory struct. It also supports writing data. What do you think about that approach? It might not fulfill all use cases, but to me it does the job.

@KOLANICH

KOLANICH commented Feb 2, 2021

BTW, I have a half-finished (but not yet published, development stalled because I got other tasks) proposal of how it should look for C and C++ for one damn simple spec.

In general:

  • C structs made of pointers are the public interface for access. They are headers of private structures.
  • Private structures are pieces of memory plus a header of pointers to them. Private structures are used to insert items.
  • When serializing, private structures are memcpy'ed into the map, then their sources are truncated to their headers using realloc and the pointers are changed to point into the memory map.
  • When parsing, raw structures are laid over memory. No streamed IO at all, only memory-mapped IO. Compatible with larger-than-RAM files as long as one doesn't need random access to more than the mapped pages, and the index plus the pages fit into memory. Also compatible with driving hardware.
  • Serialization of simple structs already in their place is almost zero-cost.

@DarkShadow44

Would you have an example of how that C code would look like? I don't quite understand the "private structures" bit. In my example, all structs are public.
I don't really do streams either, it's an in-memory stream abstraction. How do you do memory mapping in standard C?

@KOLANICH

KOLANICH commented Feb 2, 2021

Would you have an example of how that C code would look like?

I have said that it is unfinished. But I'll create a small example right now illustrating what I mean, without any guarantees of correctness.

I don't quite understand the "private structures" bit. In my example, all structs are public.

Very easy

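#include <stdint.h>
#include <stdlib.h>

/* Public view: holds only pointers to the actual data. */
struct a {
  uint64_t *c;
};
/* Private storage for the data itself. */
struct a_priv {
  uint64_t c;
};
/* Allocation unit: the public header followed by its private storage. */
struct a_full {
  struct a pub;
  struct a_priv priv;
};

struct a *construct_a(void) {
  struct a_full *a = malloc(sizeof(struct a_full));
  if (a == NULL)
    return NULL;
  a->pub.c = &a->priv.c;
  /* pub is the first member, so its address is the address of the block */
  return &a->pub;
}

void process(struct a *c) {
  *(c->c) = 42;
}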

This way we access the data only via pointers, so we access it uniformly no matter where it is. It comes at a cost: there is overhead, one pointer per variable. It is possible to make it more efficient by keeping only pointers to structs, not to every field, but in C that would make the API terrible and substantially different from the API in other languages. In C++ it can be fixed with operator overloading and constexprs.

I don't really do streams either, it's an in-memory stream abstraction.

I guess some libc can implement fread fseek fwrite API over mmaps.

How do you do memory mapping in standard C?

Standard C doesn't even have any sane functions to work with strings. It is an extremely bad, too -fpermissive, stagnating language (once I was debugging a memory-safety issue for quite a long time ... because the C compiler almost silently (with a warning, but who looks at warnings in a project that is already filled with warnings?) allowed passing an incompatible type as an arg (or maybe I missed an arg, I don't remember exactly)). IMHO there is no sense in using C where C++ can be used. Usually when I see C fans, I see unacceptable shitcode. The only real way to fix that shitcode ... is to implement a kind of OOP myself on top of plain C. I prefer to just use C++, but there are some projects created by C fanatics (in the sense I have described above, the projects are full of shitcode) that I had to contribute to.

@Nosh-Ware

I'm not too experienced here, but I'll give it a go once I get to a phase where I'll need it. Hopefully, others will lend a hand with that review part?

@jonahharris

jonahharris commented May 31, 2022

@DarkShadow44 I wanted to say excellent work on this - I was playing with it and am pretty happy with it. There are a couple of issues; if you pull the SQLite KSY and build that, you'll see a minor case of incorrect code generation. But everything else I've tried, including my personal KSYs, seems fine. As far as memory allocations go, were you thinking of generating a destruction/free function as well, or leaving that up to the implementor?


For anyone who wants to try the C generation fork but isn't familiar with Scala builds, it's easy to build via Docker:

git clone git@github.com:DarkShadow44/kaitai_struct_compiler.git && cd kaitai_struct_compiler
docker run \
  -it \
  --rm \
  --name metarank-sbt-build \
  -u sbtuser \
  -v "$(pwd)":/home/sbtuser \
  -w /home/sbtuser \
  hseeberger/scala-sbt:11.0.7_1.3.13_2.12.12 \
  sbt compilerJVM/universal:packageBin

You'll find the distributable zip (kaitai-struct-compiler-0.10-SNAPSHOT.zip) in jvm/target/universal/.

A crappy basic driver (for the zip example) is:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

#include "kaitaistruct.h"
#include "zip.h"

void
logger (
  const char *txt
) {
  fprintf(stdout, "%s\n", txt);
}

int
main (
  int    argc,
  char **argv
) {
  FILE *fp = fopen("ramdisk.zip", "rb");

  if (NULL == fp) {
    fprintf(stderr, "Couldn't open file...");
    return (1);
  }

  ks_config config;

  ks_config_init(&config, logger);
  ks_stream *ksfs = ks_stream_create_from_file(fp, &config);
  ksx_zip zip;
  int rc = ksx_read_zip_from_stream(ksfs, &zip);

  printf("rc = %d\n", rc);
  printf("sections = %" PRIi64 "\n", zip.sections->size);
  for (size_t ii = 0; ii < zip.sections->size; ++ii) {
    printf("section[%04zu] type... %" PRIu16 "\n", ii,
      zip.sections->data[ii]->section_type);

    /* If this is the central directory, print out the file name as well... */
    if (513 == zip.sections->data[ii]->section_type) {
      ksx_central_dir_entry *de = (ksx_central_dir_entry *) zip.sections->data[ii]->body;
      printf("              file... %.*s\n",
        (int) de->file_name->len, de->file_name->data);
    }
  }
  ks_stream_destroy(ksfs);
  fclose(fp);
  return (0);
}

@DarkShadow44

@jonahharris Thanks! I'll look into the SQLite issues. free functions are the last big thing missing, for my simple programs that simply isn't necessary. But of course I'll add that into the code generation. It's just that I'm really good at procrastinating. ;)

@iboofkratom

iboofkratom commented Nov 28, 2022

Hi @DarkShadow44, are there any updates on this project? I found out about it a week ago and I'm realizing that it would be very useful for me. Do you plan to continue with it, or is it abandoned for good?
I sadly can't help much with coding, as I don't know Scala. But I would gladly test it thoroughly, if that would help.
I already tried rebasing your repo onto the upstream kaitai-struct-compiler repo, and there are just two conflicting lines that are easily resolvable. The only problem is that it doesn't compile afterwards :D The error is:

[error] /home/escaje/exchange_adam_jenik/SW/ksc_updated/shared/src/main/scala/io/kaitai/struct/languages/CCompiler.scala:13:7: class CCompiler needs to be abstract, since:                                        
[error] it has 5 unimplemented members.                                                                                                                                                                            
[error] /** As seen from class CCompiler, the missing signatures are as follows.                                                                                                                                   
[error]  *  For convenience, these are usable as stub implementations.                                   
[error]  */                                         
[error]   def condRepeatCommonInit(id: io.kaitai.struct.format.Identifier,dataType: io.kaitai.struct.datatype.DataType,needRaw: io.kaitai.struct.datatype.NeedRaw): Unit = ???                                     
[error]   def condRepeatEosHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType): Unit = ???                                                                      
[error]   def condRepeatExprHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,repeatExpr: io.kaitai.struct.exprlang.Ast.expr): Unit = ???                      
                                                                                                                                                                                                                   
[error]   def condRepeatUntilFooter(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,untilExpr: io.kaitai.struct.exprlang.Ast.expr): Unit = ???                      
                                                                                                         
[error]   def condRepeatUntilHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,untilExpr: io.kaitai.struct.exprlang.Ast.expr): Unit = ???                      
                                                    
[error] class CCompiler(typeProvider: ClassTypeProvider, config: RuntimeConfig)                                                                                                                                    
[error]       ^                                                                                                                                                                                                    
[error] /home/escaje/exchange_adam_jenik/SW/ksc_updated/shared/src/main/scala/io/kaitai/struct/languages/CCompiler.scala:567:16: method condRepeatEosHeader overrides nothing.                                     
[error] Note: the super classes of class CCompiler contain the following, non final members named condRepeatEosHeader:                                                                                             
[error] def condRepeatEosHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType): Unit                                                                              
[error]   override def condRepeatEosHeader(id: Identifier, io: String, dataType: DataType, needRaw: NeedRaw): Unit = {                                                                                             
[error]                ^                            
[error] /home/escaje/exchange_adam_jenik/SW/ksc_updated/shared/src/main/scala/io/kaitai/struct/languages/CCompiler.scala:607:16: method condRepeatExprHeader overrides nothing.
[error] Note: the super classes of class CCompiler contain the following, non final members named condRepeatExprHeader:                                                                                            
[error] def condRepeatExprHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,repeatExpr: io.kaitai.struct.exprlang.Ast.expr): Unit                              
[error]   override def condRepeatExprHeader(id: Identifier, io: String, dataType: DataType, needRaw: NeedRaw, repeatExpr: expr): Unit = {                                                                          
[error]                ^                            
[error] /home/escaje/exchange_adam_jenik/SW/ksc_updated/shared/src/main/scala/io/kaitai/struct/languages/CCompiler.scala:636:16: method condRepeatUntilHeader overrides nothing.                                   
[error] Note: the super classes of class CCompiler contain the following, non final members named condRepeatUntilHeader:                                                                                           
[error] def condRepeatUntilHeader(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,untilExpr: io.kaitai.struct.exprlang.Ast.expr): Unit                              
[error]   override def condRepeatUntilHeader(id: Identifier, io: String, dataType: DataType, needRaw: NeedRaw, untilExpr: expr): Unit = {                                                                          
[error]                ^                                                                                                                                                                                           
[error] /home/escaje/exchange_adam_jenik/SW/ksc_updated/shared/src/main/scala/io/kaitai/struct/languages/CCompiler.scala:679:16: method condRepeatUntilFooter overrides nothing.                                   
[error] Note: the super classes of class CCompiler contain the following, non final members named condRepeatUntilFooter:                                                                                           
[error] def condRepeatUntilFooter(id: io.kaitai.struct.format.Identifier,io: String,dataType: io.kaitai.struct.datatype.DataType,untilExpr: io.kaitai.struct.exprlang.Ast.expr): Unit                              
[error]   override def condRepeatUntilFooter(id: Identifier, io: String, dataType: DataType, needRaw: NeedRaw, untilExpr: expr): Unit = { 

I tried to fix it myself, but as I said, my Scala skills are nonexistent, so I couldn't tackle the problem. Does anyone have an idea what could cause this?

Thanks!!

@DarkShadow44

@iboofkratom
No updates as of yet, but I planned to finish and try to upstream it when I am on holidays, starting second week of December.

For now, don't rebase it, it needs fixes to work with current KaitaiStruct. I also need to rework some of the parsing, so it fits the current infrastructure better.

Current plan:

  • Get reader functional and mergeable
  • Merge if maintainers allow it at this point
  • "Destructor" function generation, no more memory leaks from C code
  • Hopefully merge
  • General Writing support in Kaitai Struct
  • Writing support for C
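To illustrate the "destructor" item in the plan above, here is a minimal sketch of what a free function for a parsed structure might look like. The names `sketch_file`, `sketch_entry`, `sketch_file_new` and `sketch_file_destroy` are hypothetical and are not taken from the actual generated code or runtime; the point is only that the destructor walks the object tree and releases every owned allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parsed types; real generated structs will differ. */
typedef struct sketch_entry {
    char *name;               /* owned, heap-allocated, may be NULL */
} sketch_entry;

typedef struct sketch_file {
    size_t count;             /* number of entries */
    sketch_entry *entries;    /* owned array, NULL when count == 0 */
} sketch_file;

/* Allocate a file object with `count` empty entries; NULL on failure. */
static sketch_file *sketch_file_new(size_t count)
{
    size_t i;
    sketch_file *f = malloc(sizeof *f);
    if (f == NULL) {
        return NULL;
    }
    f->count = 0;
    f->entries = NULL;
    if (count > 0) {
        f->entries = malloc(count * sizeof *f->entries);
        if (f->entries == NULL) {
            free(f);
            return NULL;
        }
        for (i = 0; i < count; i++) {
            f->entries[i].name = NULL;
        }
        f->count = count;
    }
    return f;
}

/* "Destructor": free children first, then the array, then the object. */
static void sketch_file_destroy(sketch_file *f)
{
    size_t i;
    if (f == NULL) {
        return;               /* mirrors free(NULL) being a no-op */
    }
    for (i = 0; i < f->count; i++) {
        free(f->entries[i].name);
    }
    free(f->entries);
    free(f);
}
```

A caller would pair every successful `sketch_file_new` with exactly one `sketch_file_destroy`, which also accepts NULL so cleanup paths stay simple.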

@iboofkratom

@DarkShadow44 Thanks for the info, good work! I finally managed to parse a HelloWorld packet (it took some figuring out), so I'm looking forward to using this for my C projects.

@DarkShadow44

DarkShadow44 commented Dec 14, 2022

Just updated to the latest kaitai struct... Branches are at

EDIT: Those branches are now outdated, see my later comments for the MR!

Feedback, like always, welcome!

There are still a few kinks to iron out, but at least the tests (mostly) pass.
Things on my agenda:

  • Fixing broken tests
  • Rethink error architecture: right now errors are stored inside a stream, and not all functions set errors properly
  • Refactor expression/assignment handling, get rid of __EXPR__
  • Rethink _x / _le / _be functions, maybe merge _x functions
  • Recheck C code style
  • Code cleanup

As I already said, the generated C code and the library code are pretty much finished, so feel free to review! Keep in mind, though, that it needs to support C89 and gcc 4.3 for best compatibility. It's gotten pretty complex, but I think that's needed to support all the use cases.

EDIT: Last update: 2022-12-17

@KOLANICH

Aren't ks_div and ks_mod pure functions?

@DarkShadow44

Sure, why do you ask? Btw, feel free to open issues in my own repos if you want to discuss my changes.
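For readers wondering what "pure" means in practice here, the following is a minimal sketch, not the runtime's actual `ks_div` / `ks_mod`. It assumes floor-style division and a non-negative remainder for a positive divisor, which is an assumption about the intended semantics. The `sketch_*` names and the `SKETCH_PURE` macro are hypothetical. The functions depend only on their arguments, so on GCC they can carry `__attribute__((const))`.

```c
#include <assert.h>

/* Hypothetical marker: GCC's "const" attribute tells the compiler that the
 * result depends only on the arguments; other compilers get nothing. */
#if defined(__GNUC__)
#define SKETCH_PURE __attribute__((const))
#else
#define SKETCH_PURE
#endif

static long sketch_mod(long a, long b) SKETCH_PURE;
static long sketch_div(long a, long b) SKETCH_PURE;

/* Remainder in the range [0, b). Precondition (assumed): b > 0. */
static long sketch_mod(long a, long b)
{
    long r = a % b;
    if (r < 0) {
        r += b;     /* C may yield a negative remainder for negative a */
    }
    return r;
}

/* Division rounded toward negative infinity. Precondition (assumed): b > 0. */
static long sketch_div(long a, long b)
{
    long q = a / b;
    if ((a % b) < 0) {
        q -= 1;     /* quotient was truncated toward zero; step down to floor */
    }
    return q;
}

/* Small self-check showing the results for a negative dividend. */
static void sketch_selfcheck(void)
{
    assert(sketch_mod(-7, 3) == 2);
    assert(sketch_div(-7, 3) == -3);
}
```

Because neither function reads or writes any state outside its arguments, a compiler that honours the attribute is free to merge repeated calls with identical arguments.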

@DarkShadow44

DarkShadow44 commented Dec 18, 2022

Pushed a few major reworks; the rest should be minor adjustments... But I needed to get the architecture into a shape I like.
Thoughts on the current state of generated C code?
EDIT: Sqlite issues are solved as well. Still need to make a testcase though...

@alokprasad

Is there any branch with C support?

@DarkShadow44

Yes, see #263 (comment)

I need more time though, work started again and I'm pretty busy...

@DarkShadow44

DarkShadow44 commented Jun 25, 2023

I added memory management, and created a MR:

Could you please create a repo for the C runtime so I can make a MR as well?

Review and feedback are very welcome. Of course, this goes not only to the maintainers but to everyone who is interested in a C implementation!

@generalmimon
Member

@DarkShadow44:

Could you please create a repo for the C runtime so I can make a MR as well?

Sorry, I missed this call to action. Here it is: https://github.com/kaitai-io/kaitai_struct_c_runtime

@DarkShadow44

DarkShadow44 commented Jul 4, 2023

Thanks, created a PR and updated my last comment.
Also, note that I intentionally squashed all commits, since I didn't think my messy development history would be useful. I personally think keeping it would make the review even harder...
If you want, I could break this down into smaller parts though, just tell me if you want me to do so.

@DarkShadow44

Any news regarding a review?

@SamuelMarks

Oh, I missed this issue when I made mine: #1078

Great to see progress in creating a C target.

PS: One thing that no one has mentioned here is Nim. Kaitai supports this target. This target can natively produce C.

@DarkShadow44

Thanks, feel free to test and report any issues. Or help reviewing. :)
Nim can compile to C, but that doesn't seem very suitable for C interop. I very much prefer a native C generator.

@Ryanf55

Ryanf55 commented Dec 6, 2023

Is there any chance you could put together a "hello world" example for the native C support? I'm new to Kaitai, but have written a fair number of low-level UART drivers in ArduPilot. I'm sure the developer team there would have some useful feedback on the generated C code.

@DarkShadow44

Not exactly sure what a "hello world" example would be. There's a bunch of formats here:

I could provide generated C code for some of those if that would help?

For a simple hello world the code probably looks way too verbose, but this is needed to handle all the advanced features kaitai struct supports.

@Ryanf55

Ryanf55 commented Dec 6, 2023

Not exactly sure what a "hello world" example would be. There's a bunch of formats here:

I could provide generated C code for some of those if that would help?

For a simple hello world the code probably looks way too verbose, but this is needed to handle all the advanced features kaitai struct supports.

I understand. A representative and well-documented protocol can be found here:
https://s3.amazonaws.com/files.microstrain.com/GQ7+User+Manual/dcp_content/introduction/Command%20Overview.htm

It has start/end bytes to sync up the packet. There are commands and responses, and the responses are interleaved with sensor data on the wire. Most protocols have some sort of custom checksum; this one, however, uses the 2-byte Fletcher checksum.

Thus, it would be great to see the generated code for

  • Packing a ping packet
  • How to add in the checksum computation
  • Call serialize to a buffer
  • Deserialize a buffer
  • Unpack the data

If you think any of the existing protocols you linked are similar to that, then great; otherwise I'm happy to take a stab at writing a short .ksy file.

@DarkShadow44

DarkShadow44 commented Dec 6, 2023

Packing a ping packet

Kaitai Struct is mostly focused on reading data, and this C implementation is for reading only.

How to add in the checksum computation

Checksums should IMHO be handled outside kaitai struct as well.
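As a rough illustration of handling the checksum outside the parser, here is a minimal sketch of the classic Fletcher-16 algorithm applied to a raw byte buffer. The function name `sketch_fletcher16` is hypothetical, and the modulus of 255 and the high/low byte order are assumptions: a specific device protocol may use a different variant (for example 8-bit wrapping sums), so the device manual should be checked before relying on this.

```c
#include <assert.h>
#include <stddef.h>

/* Classic Fletcher-16: two running sums, each reduced modulo 255.
 * Returns (sum2 << 8) | sum1. Modulus and byte order are assumptions
 * and may not match a particular device protocol. */
static unsigned int sketch_fletcher16(const unsigned char *data, size_t len)
{
    unsigned int sum1 = 0;
    unsigned int sum2 = 0;
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255u;   /* plain sum of the bytes */
        sum2 = (sum2 + sum1) % 255u;      /* sum of the running sums */
    }
    return (sum2 << 8) | sum1;
}
```

The application would compute this over the received packet bytes (excluding the checksum field itself) and compare it against the checksum value that the parser read from the packet.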

Call serialize to a buffer
deserialize a buffer
Unpack the data

Not sure what you mean.

Give me a ksy you want to see and I can generate code for you though.

@jhgorse

jhgorse commented Dec 29, 2023

Referring back to @GreyCat's OR to OP on the topic of C Exceptions. This is more of a bookmark-for-later-addition rather than something to add to the current stack for development. Here is a minimal-dependency implementation from Ren-C, an implementation of Rebol 3, an interpreted language:
https://github.com/metaeducation/ren-c/blob/master/src/include/sys-trap.h

The complexity seems to come from supporting WebAssembly. It is a setjmp()/longjmp() based implementation. It is easier "to implement for compilers on traditional CPU architectures, but much more difficult when the underlying platform is abstract/structured (e.g. WebAssembly)."
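For readers unfamiliar with the technique, here is a minimal sketch of setjmp()/longjmp()-based error handling. It is not taken from the linked header; all `sketch_*` names are hypothetical, and the single global trap makes it non-reentrant and not thread-safe, which a real implementation would have to address.

```c
#include <assert.h>
#include <setjmp.h>

/* One global trap and error slot: illustration only, not reentrant. */
static jmp_buf sketch_trap;
static int sketch_last_error;

/* "Throw": record the error code and jump back to the trap. */
static void sketch_fail(int code)
{
    sketch_last_error = code;
    longjmp(sketch_trap, 1);
}

/* Deep helper that assumes success and fails by jumping, not by returning. */
static int sketch_parse_byte(int value)
{
    if (value < 0) {
        sketch_fail(2);       /* hypothetical "invalid input" code */
    }
    return value & 0xFF;
}

/* "Try/catch": returns 0 on success, or the recorded error code. */
static int sketch_run(int input, int *out)
{
    if (setjmp(sketch_trap) != 0) {
        return sketch_last_error;   /* reached via longjmp */
    }
    *out = sketch_parse_byte(input);
    return 0;
}
```

Note that code between the trap and the failure point gets no chance to release resources, which is one reason this style needs extra care in C.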

Cheers.

@DarkShadow44

DarkShadow44 commented Dec 29, 2023

While that is possible, I think using error codes is more idiomatic. An added benefit of my current implementation is that you get a proper stack trace when an error occurs; the macros make sure to print it. I don't see the benefit of using setjmp, and it feels pretty icky IMHO.
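To make the error-code approach concrete, here is a minimal sketch of how a checking macro can propagate an error and print one trace line per call level. This is not the actual implementation from the branch: `SKETCH_CHECK`, `sketch_read_u1`, `sketch_read_header` and the error constants are all hypothetical, and the sketch returns codes directly rather than storing them in a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_OK 0
#define SKETCH_ERR_EOF 1

/* Evaluate expr; on error, print where it was seen and return it upward.
 * Each nesting level that uses the macro adds one line to the trace. */
#define SKETCH_CHECK(expr) \
    do { \
        int sketch_err_ = (expr); \
        if (sketch_err_ != SKETCH_OK) { \
            fprintf(stderr, "  at %s:%d: %s failed with %d\n", \
                    __FILE__, __LINE__, #expr, sketch_err_); \
            return sketch_err_; \
        } \
    } while (0)

/* Read one byte from buf at *pos, advancing the position. */
static int sketch_read_u1(const unsigned char *buf, size_t len,
                          size_t *pos, unsigned char *out)
{
    if (*pos >= len) {
        return SKETCH_ERR_EOF;
    }
    *out = buf[(*pos)++];
    return SKETCH_OK;
}

/* Read a two-byte header; any failure is traced and propagated. */
static int sketch_read_header(const unsigned char *buf, size_t len,
                              size_t *pos, unsigned char *magic,
                              unsigned char *version)
{
    SKETCH_CHECK(sketch_read_u1(buf, len, pos, magic));
    SKETCH_CHECK(sketch_read_u1(buf, len, pos, version));
    return SKETCH_OK;
}
```

Calling `sketch_read_header` on a one-byte buffer returns `SKETCH_ERR_EOF` and writes a trace line for the failing read to stderr, while control flow stays explicit at every call site.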

@jhgorse

jhgorse commented Dec 30, 2023

I understand. The purpose would be to satisfy the programming style which assumes success rather than defensive error code checking. I think your approach is good and idiomatic, just adding the bookmark for folks who march to a different beat.

What is left to make your C port official?

@DarkShadow44

What's left is an official review and then possibly adjustments.

@jhgorse

jhgorse commented Dec 31, 2023 via email

@DarkShadow44

Please see #263 (comment)

@DarkShadow44

It's been a while, anything I can do to help get this along?
