Any port for plain C? #263
You're completely correct: a C port has been in heavy discussion since almost the very beginning of the project, yet nobody ever created an issue about it (and that's bad, because it's hard to collect all these discussions in one place). There are/were several major issues with a C target, though. This became a somewhat lengthy review of what's been discussed over the years, but I believe I've remembered most of the points and tried to order them from most serious to least serious.

Completely different workflow in mind

It turns out that most people who need C support in KS have a completely different workflow than what KS provides now. Right now, KS does a very simple thing: it gets a binary format serialization spec and generates an API around it. It usually does zero transformations, except for very simple and technical ones (i.e. endianness and that kind of stuff): whatever's in the format is reflected exactly as is in memory. C people usually strive for performance and memory efficiency and would prefer not to store stuff that can be used right away and then just thrown out.

A very simple example:

seq:
  - id: len_foo
    type: u2
  - id: foo
    size: len_foo
    type: str

This is usually OK for many modern languages, but a lot of people who wanted a C target automatically suggest that:
A more complex (and real-life) example is the typical parsing of any network packet, for example, a udp_datagram. The typical current vision of what KS might create is something like this:

typedef struct udp_datagram_t {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
    char* body;
} udp_datagram_t;

udp_datagram_t* read_udp_datagram(kaitai_stream* io) {
    udp_datagram_t* r = (udp_datagram_t*) malloc(sizeof(udp_datagram_t));
    r->src_port = read_u2be(io);
    r->dst_port = read_u2be(io);
    r->length = read_u2be(io);
    r->checksum = read_u2be(io);
    r->body = read_byte_eos(io);
    return r;
}

It turns out that many users would be comfortable with a completely different mechanism than "read function just fills in some structures in memory and returns a pointer to them":
void read_udp_datagram(kaitai_stream* io, udp_datagram_callbacks* callbacks) {
    uint16_t src_port = read_u2be(io);
    if (io->status != OK) {
        callbacks->on_error(io->status);
        return;
    }
    callbacks->on_read_src_port(src_port);
    // ...
}
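To make the callback-driven (push) style concrete, here is a self-contained sketch that parses the same UDP header from an in-memory buffer. Everything in it is an assumption for illustration: ks_mem_stream, ks_status, the generic on_u2 callback and the udp_summary consumer are hypothetical names, not part of any actual KS runtime. The parser never allocates; it hands each field (and a view of the body) to the caller and stops at the first error.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal stream over an in-memory buffer (not the real KS runtime). */
typedef enum { KS_OK = 0, KS_EOF = 1 } ks_status;

typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    ks_status status;
} ks_mem_stream;

/* Reads a big-endian u2; on short input sets io->status and returns 0. */
static uint16_t read_u2be(ks_mem_stream *io) {
    uint16_t v;
    if (io->status != KS_OK || io->len - io->pos < 2) {
        io->status = KS_EOF;
        return 0;
    }
    v = (uint16_t) ((io->data[io->pos] << 8) | io->data[io->pos + 1]);
    io->pos += 2;
    return v;
}

/* Callbacks supplied by the consumer; "user" is passed back untouched. */
typedef struct {
    void *user;
    void (*on_error)(void *user, ks_status status);
    void (*on_u2)(void *user, const char *field, uint16_t value);
    void (*on_body)(void *user, const uint8_t *body, size_t len);
} udp_datagram_callbacks;

/* Push-style parser: nothing is stored, every field goes to a callback. */
void read_udp_datagram(ks_mem_stream *io, const udp_datagram_callbacks *cb) {
    static const char *const names[4] = { "src_port", "dst_port", "length", "checksum" };
    int i;
    for (i = 0; i < 4; i++) {
        uint16_t v = read_u2be(io);
        if (io->status != KS_OK) {
            cb->on_error(cb->user, io->status);
            return;
        }
        cb->on_u2(cb->user, names[i], v);
    }
    /* The body is the rest of the stream, passed as a view (no copy, no malloc). */
    cb->on_body(cb->user, io->data + io->pos, io->len - io->pos);
    io->pos = io->len;
}

/* Example consumer that keeps only what it cares about. */
typedef struct {
    uint16_t src_port, dst_port;
    size_t body_len;
    int failed;
} udp_summary;

static void summary_on_error(void *user, ks_status status) {
    (void) status;
    ((udp_summary *) user)->failed = 1;
}

static void summary_on_u2(void *user, const char *field, uint16_t value) {
    udp_summary *s = (udp_summary *) user;
    if (strcmp(field, "src_port") == 0)
        s->src_port = value;
    else if (strcmp(field, "dst_port") == 0)
        s->dst_port = value;
}

static void summary_on_body(void *user, const uint8_t *body, size_t len) {
    (void) body;
    ((udp_summary *) user)->body_len = len;
}
```

The consumer decides what to keep: udp_summary stores two ports and the body length and ignores the rest. That is the trade-off described above: less convenient than a filled-in struct, but with zero heap usage.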
Not an "everything is an expression" language

Simply put, almost every target we had before supported the "every KS expression translates into a target language expression" idiom. That is, if you need to do string concatenation, i.e.:

seq:
  - id: a
    type: strz
  - id: b
    type: strz
instances:
  c:
    value: a + b

... you do that with a single a + b expression in most target languages. In C, it turns into something like:

// Real-life code would be even more complex, probably with more checks, etc.
size_t len_a = strlen(a);
size_t len_b = strlen(b);
char *tmp = (char *) malloc(len_a + len_b + 1);
memcpy(tmp, a, len_a);
memcpy(tmp + len_a, b, len_b);
tmp[len_a + len_b] = 0;

This issue, however, was more or less solved with the advent of #146.

Complex memory management

What's not solved, however, is that such arbitrary allocations of temporary variables sometimes result in more complex memory management and a need for additional manual cleanup. In the example above, …

Actually, even "allocate everything on the heap" is not universally agreed upon in many C apps. So, typical parsing of a user-defined type like this:

udp_datagram_t* r = (udp_datagram_t*) malloc(sizeof(udp_datagram_t));

might be suggested to be replaced with passing in a ready-made pointer to a structure for the function to fill in.

No single standard library

For KS, we need some basic stuff like:
typedef struct byte_array {
    int len;
    void* data;
} byte_array;
There are tons of "enhanced standard" libraries that do that, but there's no universal agreement on them. Probably roughly 80% of C applications roll something homebrew like that inside them. Out of "standard" implementations, there are glib, klib, libmowgli, libulz, tons of lesser-known libraries, a huge assortment of string-related libs, array-related libs, etc. Of them, glib is probably the most well-known and well-maintained, but even a suggestion to use it frequently meets huge resistance from many C developers. Another possible way (albeit not too well-received by many developers) is to roll our own (yet another) implementation of all that stuff, and deal with …

No simple solution here, and whatever we choose probably won't be accepted by many C developers. If we implement support for the top 3 (or top 5) popular libs, that will probably cover at least some popular options.

Exception support

As we all know, C does not have any standard exception support, and typical KS-generated code relies on exceptions a lot, i.e.:

r->src_port = read_u2be(io);
r->dst_port = read_u2be(io);
r->length = read_u2be(io);
// ...

On every step, a read might fail, so in C this would have to become something like:

int err;
err = read_u2be(io, &(r->src_port));
if (err != 0)
    return err;
err = read_u2be(io, &(r->dst_port));
if (err != 0)
    return err;
// ...

Since the introduction of Go support (#146), that became possible, although it would probably still be a pain-in-the-ass to use in C :( Another quick "solution" for C is to use signals/aborts to handle these erroneous situations. In fact, that would even be OK in many use cases like embedded stuff, because things are not usually supposed to blow up there, and if they do, then everything is lost already: there are no graceful exits, user interactions, "Send error report to the vendor" dialogs, etc.

Stream abstraction

A relatively minor and solvable issue, but still an issue: what would the concept of a "KS stream" be in C? Two popular options:
Probably the C runtime would need to implement all these options and allow the end user to choose. Nothing too scary, but still an issue to be solved. |
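For comparison, the "caller provides the struct" and "return error codes instead of throwing" ideas from the comment above can be combined. This is a minimal sketch under assumptions: ks_mem_stream, ks_err and the KS_CHECK macro are hypothetical names, and the body is exposed as a view into the input buffer rather than copied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes and in-memory stream; not the real KS C runtime API. */
typedef enum { KS_OK = 0, KS_ERR_EOF = 1 } ks_err;

typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} ks_mem_stream;

/* Reads a big-endian u2 into *out; returns an error code instead of throwing. */
static ks_err read_u2be(ks_mem_stream *io, uint16_t *out) {
    if (io->len - io->pos < 2)
        return KS_ERR_EOF;
    *out = (uint16_t) ((io->data[io->pos] << 8) | io->data[io->pos + 1]);
    io->pos += 2;
    return KS_OK;
}

/* Returns early with the error code, playing the role of exception propagation. */
#define KS_CHECK(expr) \
    do { ks_err err_ = (expr); if (err_ != KS_OK) return err_; } while (0)

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
    const uint8_t *body;   /* view into the stream buffer, not a copy */
    size_t body_len;
} udp_datagram_t;

/* The caller supplies the struct (stack, static or heap); nothing is malloc'ed here. */
ks_err read_udp_datagram(ks_mem_stream *io, udp_datagram_t *r) {
    KS_CHECK(read_u2be(io, &r->src_port));
    KS_CHECK(read_u2be(io, &r->dst_port));
    KS_CHECK(read_u2be(io, &r->length));
    KS_CHECK(read_u2be(io, &r->checksum));
    r->body = io->data + io->pos;
    r->body_len = io->len - io->pos;
    io->pos = io->len;
    return KS_OK;
}
```

The macro hides the repetitive "if (err != 0) return err;" lines, at the cost of a hidden early return, which some C style guides dislike.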
And, to answer these:
Well, I wouldn't be that optimistic. Given all the stuff above, chances are tons of C people would still opt to roll things manually because of all these compromises and the "does not exactly fit my workflow" argument.
Probably it won't be that easy :( The KS C runtime is likely to be easier to rewrite in another language than to go through all that binding hassle, and then you'll have to write that "binding" glue code for every particular type ported. |
@Zorgatone OK, for a start, I would suggest really playing around with KS to see what it does and what it does not. Maybe you'll decide that it won't meet your expectations anyway?..
The link just says "Non-Image content-type returned" for me :( If you mean something like this (https://github.com/ThrowTheSwitch/CException), at the very least that's one more library dependency, and in the C world every library is usually a major hassle. But maybe that could be done too.
You've probably seen http://doc.kaitai.io/new_language.html; right now we're somewhere between stages (2) and (3). Of all the issues that I've outlined, this "totally different workflow expected" one is definitely the most serious. I'm not too keen on doing lots of work that almost nobody would want to use. |
Understandable, thanks for the reply. I was planning to do some testing with KS in the near future, maybe I will try and make my own library in C if I think I'll need it :) PS: thanks for the link, it's a good starting point |
I don't use C, I use C++ and IMHO the preferred approach is not to store the info in a standalone structure, but to decompose the thing into a set of fixed (or variable size, if language supports it) dumb structures and put them upon raw virtual memory. #65
for
IMHO we should just use C++ for that. C coders can write in C++ in C-style if they want. |
I'll just leave it here, just in case: https://matt.sh/howto-c This link was heavily suggested by several modern C proponents with whom I've discussed KS support for C. Suggestions for modern C style guides are also most welcome. The only one that I know is the Linux kernel coding style guide; this is my personal preference for C as well, but chances are that there are other popular style guides in other areas? |
@GreyCat nice link! Useful to know that. But unfortunately, still not all compilers support all the C11 features. At least it would be good to use C99, especially for the |
Most of things from that are also valid for C++. |
I'm also linking another article criticizing Matt's "How to C in 2016" article, to consider other opinions as well: https://github.com/Keith-S-Thompson/how-to-c-response |
For C strings, I would recommend that one field end up adding a few fields to the resulting struct, with similar names and different types. For example:

r->text_array = read_array(io, 10);
r->text_str = array_to_str(r->text_array); /* hypothetical helper: returns a char* view of the same data */

This does not consume more memory (only a constant amount), as the char* pointer points to the same data as the array. The end user might want some glib arrays, or char*, so why not give them both? |
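A minimal sketch of that idea, assuming a hypothetical ks_bytes type (not an existing KS API): the reader allocates one extra byte and writes a terminating NUL, so the same buffer can be exposed both as a length-carrying array and as a char * without a second copy. One caveat: if the data contains an embedded NUL byte, the char * view will appear shorter than text_array.len.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte-array type; not the actual Kaitai Struct C runtime API. */
typedef struct {
    size_t len;
    uint8_t *data;
} ks_bytes;

/* Copies len bytes and appends a hidden NUL, so data can double as a C string. */
int ks_bytes_init(ks_bytes *b, const uint8_t *src, size_t len) {
    b->data = (uint8_t *) malloc(len + 1);
    if (b->data == NULL)
        return -1;
    memcpy(b->data, src, len);
    b->data[len] = 0;
    b->len = len;
    return 0;
}

/* One .ksy field, two C views of it. */
typedef struct {
    ks_bytes text_array;   /* exact bytes, explicit length */
    const char *text_str;  /* same buffer as char *, no extra copy */
} text_record;

int text_record_read(text_record *r, const uint8_t *src, size_t len) {
    if (ks_bytes_init(&r->text_array, src, len) != 0)
        return -1;
    r->text_str = (const char *) r->text_array.data;
    return 0;
}
```

Only text_array.data needs to be freed; text_str is just an alias for it.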
@arekbulski Giving them both is probably a bad idea: it will require a dependency on glib, and would add extra unneeded bloat for both parties. Besides, |
You suggested using our own types, and they could provide convenience functions for transforming ks_arrays to glib byte arrays and other types. Hm? glib would be supported, not required. |
@GreyCat |
@GreyCat, you may consider rolling your own string implementation which uses the same technique as sds. This will make Kaitai strings compatible with most functions accepting |
@arekbulski Sure, go ahead :) I'm not sure you've seen it, we also have this article with an overall new language support plan and an implementation graph like this one. |
@smaximov Yeah, that's probably how it should be done for "roll your own" implementation. |
I have sweet-and-sour feelings about SDS. I really like the idea, I really do, but the implementation is horrible. The repo you linked has bug reports and bugfixes going back 4 years and still hanging. They also implemented a variable-length prefix (the count field), which makes it bananas. We can implement our own SDS; I do not recommend using theirs. Big thanks for sharing this with us, @smaximov! |
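To illustrate the SDS technique being discussed: the string handed out is a plain char * pointing just past a hidden length header, so it works with any function accepting a C string, while the length stays available in O(1). This is a C89-compatible sketch with hypothetical names (ks_str_new, ks_str_len, ks_str_free), not code from the sds library itself; unlike sds, it uses a fixed size_t header and keeps no spare capacity.

```c
#include <stdlib.h>
#include <string.h>

/* Layout of one allocation: [size_t len][len bytes of data]['\0'] */

/* Returns a pointer to the data part, or NULL if allocation fails. */
char *ks_str_new(const char *data, size_t len) {
    char *block = (char *) malloc(sizeof(size_t) + len + 1);
    char *s;
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof(size_t));   /* memcpy avoids alignment issues */
    s = block + sizeof(size_t);
    memcpy(s, data, len);
    s[len] = '\0';                         /* usable as a plain C string */
    return s;
}

/* O(1) length, also correct when the data contains embedded NUL bytes. */
size_t ks_str_len(const char *s) {
    size_t len;
    memcpy(&len, s - sizeof(size_t), sizeof(size_t));
    return len;
}

/* Must be used instead of free(): the allocation starts before s. */
void ks_str_free(char *s) {
    if (s != NULL)
        free(s - sizeof(size_t));
}
```

The main hazard of this design is that the returned pointer looks like any other char *, so passing it to free() or realloc() directly is a bug the compiler cannot catch.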
Is anyone working on this, even as a prototype? |
Not really. Personally, I would probably return to this one after #146, as the experience with Go is very much the same as with C (except for the fact that Go has relatively OK strings and slices). |
I promised to implement the C runtime, but that was a few months ago. Since then I have had a lot of work on Construct, and now I am working on a few things in KS. I am still willing to implement this, but I can't work on everything at once. If you wish, I will get on top of C, but other work items would need to be shelved instead. |
Any updates on this? I'd like to help, but I'm not familiar with Scala... |
No updates. Unfortunately, most of #263 (comment) still stands. It's probably still a good idea to complete the Go port first, as it shares many common concepts (except for the hassle with memory management). |
FWIW, I have a (for me) working C version at |
BTW, I have a half-finished (but not yet published; development stalled because I got other tasks) proposal of how it should look for C and C++ for one damn simple spec. In general:
|
Would you have an example of how that C code would look like? I don't quite understand the "private structures" bit. In my example, all structs are public. |
I have said that it is unfinished. But I've created a small example just now illustrating what I mean, without any guarantees of correctness.

Very easy:

#include <stdint.h>
#include <stdlib.h>

/* Public view: every field is accessed through a pointer. */
struct a {
    uint64_t *c;
};

/* Private storage for the actual values. */
struct a_priv {
    uint64_t c;
};

/* Public view first, so a struct a_full * can be cast to struct a *. */
struct a_full {
    struct a pub;
    struct a_priv priv;
};

struct a *construct_a(void) {
    struct a_full *a = (struct a_full *) malloc(sizeof(struct a_full));
    if (a == NULL)
        return NULL;
    a->pub.c = &a->priv.c;   /* point the public field at its private storage */
    return (struct a *) a;
}

void process(struct a *c) {
    *(c->c) = 42;
}

This way we access the data only via pointers, so we access it uniformly no matter where it is. It comes at a cost: there is overhead, one pointer per variable. It is possible to make it more efficient by keeping only pointers to structs, not to every field, but in C that would make the API terrible and significantly different from the API in other languages. In C++ it can be fixed with operator overloading and constexprs.
I guess some libc can implement
Standard C doesn't even have any sane functions to work with strings. It is an extremely bad too |
I'm not too experienced here, but I'll give it a go once I get to a phase where I'll need it. Hopefully, others will lend a hand with that review part? |
@DarkShadow44 I wanted to say excellent work on this: I was playing with it and am pretty happy with it. There are a couple of issues; if you pull the SQLite KSY and build it, you'll see a minor code generation error. But everything else I've tried, including my personal KSYs, seems fine. As far as memory allocations go, were you thinking of generating a destruction/free function as well, or leaving that up to the implementor? For anyone who wants to try the C generation fork but isn't familiar with Scala builds, it's easy to build via Docker:

git clone git@github.com:DarkShadow44/kaitai_struct_compiler.git && cd kaitai_struct_compiler
docker run \
  -it \
  --rm \
  --name metarank-sbt-build \
  -u sbtuser \
  -v "$(pwd)":/home/sbtuser \
  -w /home/sbtuser \
  hseeberger/scala-sbt:11.0.7_1.3.13_2.12.12 \
  sbt compilerJVM/universal:packageBin

You'll find the distributable zip (…). A crappy basic driver (for the zip example) is:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include "kaitaistruct.h"
#include "zip.h"
void
logger (
    const char *txt
) {
    fprintf(stdout, "%s\n", txt);
}

int
main (
    int argc,
    char **argv
) {
    FILE *fp = fopen("ramdisk.zip", "rb");
    if (NULL == fp) {
        fprintf(stderr, "Couldn't open file...\n");
        return (1);
    }

    ks_config config;
    ks_config_init(&config, logger);
    ks_stream *ksfs = ks_stream_create_from_file(fp, &config);

    ksx_zip zip;
    int rc = ksx_read_zip_from_stream(ksfs, &zip);
    printf("rc = %d\n", rc);
    printf("sections = %" PRIi64 "\n", zip.sections->size);
    for (size_t ii = 0; ii < zip.sections->size; ++ii) {
        printf("section[%04zu] type... %" PRIu16 "\n", ii,
               zip.sections->data[ii]->section_type);

        /* If this is the central directory, print out the file name as well... */
        if (513 == zip.sections->data[ii]->section_type) {
            ksx_central_dir_entry *de = (ksx_central_dir_entry *) zip.sections->data[ii]->body;
            /* %.*s prints exactly len bytes; the name is not necessarily NUL-terminated. */
            printf("  file... %.*s\n",
                   (int) de->file_name->len, (const char *) de->file_name->data);
        }
    }

    ks_stream_destroy(ksfs);
    fclose(fp);
    return (0);
} |
@jonahharris Thanks! I'll look into the SQLite issues. free functions are the last big thing missing, for my simple programs that simply isn't necessary. But of course I'll add that into the code generation. It's just that I'm really good at procrastinating. ;) |
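As an illustration of what a generated free function might look like, here is a sketch for the earlier a + b instance example, assuming a hypothetical generated struct ab_concat_t that owns a, b and a c computed on first access. The point is that every heap allocation, including the temporary created for the instance, is recorded in the struct, so a single call can release it all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generated struct for seq a, b (strz) plus instance c = a + b. */
typedef struct {
    char *a;
    char *b;
    char *c;   /* instance value; NULL until first accessed */
} ab_concat_t;

/* Computes c on first use and caches it in the struct, which owns it. */
const char *ab_concat_get_c(ab_concat_t *r) {
    size_t len_a, len_b;
    if (r->c != NULL)
        return r->c;                     /* already computed */
    len_a = strlen(r->a);
    len_b = strlen(r->b);
    r->c = (char *) malloc(len_a + len_b + 1);
    if (r->c == NULL)
        return NULL;
    memcpy(r->c, r->a, len_a);
    memcpy(r->c + len_a, r->b, len_b);
    r->c[len_a + len_b] = '\0';
    return r->c;
}

/* Frees everything the object owns; safe to call on a zero-initialized struct. */
void ab_concat_free(ab_concat_t *r) {
    free(r->a);
    free(r->b);
    free(r->c);
    r->a = r->b = r->c = NULL;
}
```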
Hi @DarkShadow44, are there any updates on this project? I found out about it a week ago and I'm realizing that it would be very useful for me. Do you plan to continue with it, or is it abandoned for good?
I tried to fix it myself, but as I said, my Scala skills are nonexistent, so I couldn't tackle the problem. Does anyone have any idea what could cause this? Thanks!! |
@iboofkratom For now, don't rebase it, it needs fixes to work with current KaitaiStruct. I also need to rework some of the parsing, so it fits the current infrastructure better. Current plan:
|
@DarkShadow44 Thanks for the info, good work! I finally managed to parse a HelloWorld packet (it took some figuring out), so I'm looking forward to start using this for my C projects. |
Just updated to the latest kaitai struct... Branches are at EDIT: Those branches are now outdated, see my later comments for the MR! Feedback, like always, welcome! There are still a few kinks to iron out, but at least the tests (mostly) pass.
As I already said, generated C code and library code is pretty finished, feel free to review! Although keep in mind it needs to support C89 and gcc4.3, for best compatibility. It's gotten pretty complex, but I think that's needed to support all the usecases. EDIT: Last update: 2022-12-17 |
Aren't |
Sure, why do you ask? Btw, feel free to open issues in my own repos if you want to discuss my changes. |
Pushed a few major reworks, the rest should be minor adjustments... But I needed to get the architecture in a way I like it. |
Is there any branch with C support? |
Yes, see #263 (comment) I need more time though, work started again and I'm pretty busy... |
I added memory management, and created a MR: Could you please create a repo for the C runtime so I can make a MR as well? Review and Feedback very welcome, of course this goes not only to the maintainers but to everyone who is interested in a C implementation! |
Sorry, I missed this call to action. Here it is: https://github.com/kaitai-io/kaitai_struct_c_runtime |
Thanks, created a PR and updated my last comment. |
Any news regarding a review? |
Thanks, feel free to test and report any issues. Or help reviewing. :) |
Is there any chance you could put together a "hello world" example for the native C support? I'm new to Kaitai, but have written a fair number of low-level UART drivers in ArduPilot. I'm sure the developer team there would have some useful feedback on the generated C code. |
Not exactly sure what a "hello world" example would be. There's a bunch of formats here:
I could provide generated C code for some of those if that would help? For a simple hello world the code probably looks way too verbose, but this is needed to handle all the advanced features kaitai struct supports. |
I understand. A representative and well-documented protocol can be found here: It has start/end bytes to sync up the packet. There are commands and responses, and the responses are interleaved with sensor data on the wire. Most of the protocols have some sort of custom checksum; this one uses the 2-byte Fletcher checksum. Thus, it would be great to see the generated code for
If you think any of the existing protocols you linked are similar that, then great, otherwise I'm happy to take a stab at writing a short |
Kaitai struct is mostly focused on reading data, this C implementation is for reading only.
Checksums should IMHO be handled outside kaitai struct as well.
Not sure what you mean. Give me a ksy you want to see and I can generate code for you though. |
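Computing such a checksum outside of KS is straightforward. Below is a sketch of the classic Fletcher-16 algorithm (two running sums modulo 255). Whether the protocol in question uses exactly this variant, which byte order it stores the result in, and which bytes the checksum covers are assumptions to check against its spec; some "Fletcher" checksums in serial protocols use 8-bit sums modulo 256 instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums modulo 255, packed as (sum2 << 8) | sum1. */
uint16_t fletcher16(const uint8_t *data, size_t len) {
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        sum1 = (uint16_t) ((sum1 + data[i]) % 255);
        sum2 = (uint16_t) ((sum2 + sum1) % 255);
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

The frame's declared checksum, read by the KS-generated parser as a plain integer field, can then be compared with fletcher16() over the relevant bytes.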
Referring back to @GreyCat's OR to OP on the topic of C exceptions. This is more of a bookmark-for-later-addition rather than something to add to the current stack for development. Here is a minimal-dependency implementation from Ren-C, an implementation of Rebol 3, an interpreted language: The complexity seems to come from supporting WebAssembly. It is a setjmp()/longjmp() based implementation. It is easier "to implement for compilers on traditional CPU architectures, but much more difficult when the underlying platform is abstract/structured (e.g. WebAssembly)." Cheers. |
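For reference, the setjmp()/longjmp() approach boils down to something like the following sketch. It is not the Ren-C code; ks_ctx, parse_udp_header and KS_ERR_EOF are hypothetical names. Reads "throw" by jumping back to the setjmp() point, so the parsing body needs no per-call checks. Note the usual caveat: non-volatile local variables modified after setjmp() have indeterminate values after the jump, which is why the "catch" branch returns a constant instead of reading any state changed in between.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; not from Ren-C or any KS runtime. */
enum { KS_OK = 0, KS_ERR_EOF = 1 };

/* Hypothetical reader context: an in-memory buffer plus the "catch" target. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    jmp_buf on_error;
} ks_ctx;

/* Reads a big-endian u2; on short input "throws" by jumping to on_error. */
static uint16_t read_u2be(ks_ctx *io) {
    uint16_t v;
    if (io->len - io->pos < 2)
        longjmp(io->on_error, KS_ERR_EOF);
    v = (uint16_t) ((io->data[io->pos] << 8) | io->data[io->pos + 1]);
    io->pos += 2;
    return v;
}

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
} udp_header;

/* Returns KS_OK on success or KS_ERR_EOF if the input is too short. */
int parse_udp_header(const uint8_t *data, size_t len, udp_header *out) {
    ks_ctx io;
    io.data = data;
    io.len = len;
    io.pos = 0;
    if (setjmp(io.on_error) != 0)
        return KS_ERR_EOF;   /* "catch": a read jumped back here */
    /* "try": straight-line code, no error check after each read */
    out->src_port = read_u2be(&io);
    out->dst_port = read_u2be(&io);
    out->length = read_u2be(&io);
    out->checksum = read_u2be(&io);
    return KS_OK;
}
```

Anything allocated inside the "try" part would leak on a jump unless it is tracked somewhere the "catch" branch can reach, which is one reason error codes are often preferred in C.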
While that is possible, I think using error codes is more idiomatic. An added benefit of my current implementation is that you get a proper stack trace when an error occurs; the macros make sure to print it. I don't see the benefit of using setjmp, and it feels pretty icky IMHO. |
I understand. The purpose would be to satisfy the programming style which assumes success rather than defensive error code checking. I think your approach is good and idiomatic, just adding the bookmark for folks who march to a different beat. What is left to make your C port official? |
What's left is an official review and then possibly adjustments. |
Have you opened a PR? I find the review mechanism of GitHub useful.
Also, which branch are you working from? I saw master was significantly behind the latest here.
On Sat, Dec 30, 2023 at 07:13, DarkShadow44 wrote: What's left is an official review and then possibly adjustments.
|
Please see #263 (comment) |
It's been a while, anything I can do to help get this along? |
Hi,
I would like to know if you would consider (or have any plans already) to port the project for use with "plain" C (other than C++ and C#). I would use it, and not all the systems (even embedded maybe?) support C++ and/or C#. Having a C version would enable portability on any system and even more languages with C bindings