-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce size even more: generate s-box table? #228
Comments
Nervermind I found it in the comments, p is set to the polynomial with x=2, i.e. p=283. So putting sbox in RAM and adding this function reduces the code size by 212 bytes :) void AES_generateSbox(void) {
uint32_t p = 283;
uint32_t t[256];
uint32_t x = 1;
for (int i = 0; i < 256; i ++) {
t[i] = x;
x ^= (x << 1) ^ ((x >> 7) * p);
}
sbox[0] = 0x63;
for (int i = 0; i < 255; i ++) {
x = t[255 - i];
x |= x << 8;
x ^= (x >> 4) ^ (x >> 5) ^ (x >> 6) ^ (x >> 7);
sbox[t[i]] = (x ^ 0x63) & 0xFF;
}
} I checked and the generated sbox is correct. If anybody has any ideas on how to reduce the function's code size, I'm happy to hear it :) |
Just FYI I found a few more minor optimizations, when compiling for a 32-bit processor. All the operations are byte-based, while for a 32-bit processor doing operations as 32-bit words (when possible) is much more efficient, saving both space and computing time. Fortunately the compiler optimizes already most of the possible ones, and there are many which are not optimizable (for example when working on 4 non-adjacent bytes). Replacing all the Since I'm using only CTR most of the functions benefit from being inlined, so I added a bunch of "inline" on these functions' declaration... Only to find out that the compiler was inlining them already :D This last optimization, in // The first round key is the key itself.
for (i = 0; i < Nk; ++i)
{
((uint32_t*)roundKey)[i] = ((uint32_t*)AES_key)[i]; // optimized version of the commented 4 lines below
/*roundKey[(i * 4) + 0] = AES_key[(i * 4) + 0];
roundKey[(i * 4) + 1] = AES_key[(i * 4) + 1];
roundKey[(i * 4) + 2] = AES_key[(i * 4) + 2];
roundKey[(i * 4) + 3] = AES_key[(i * 4) + 3];*/
} With these, as well as removing the |
Nice. It’d be great if you sent a merge/pull request. |
there are a few more cases where the code could be optimized for speed and maybe code size, assuming the buffers are DW aligned. the block structure can be represented by array of 2 uint64_t. uint64_t will be emulated with 32 bit operations on 32 bit machine and utilize power of 64 machine.
Probably 4/8 byte operations could be enforced more elegantly having a union in AES_ctx buffers with byte and DW arrays. |
Perhaps types such as |
Hi @Midronome 😄
You're very welcome! And thanks a lot, I really appreciate the kind words 😊 👍 It looks like you found the answer to your original question already (how to generate the s-box table) ?
Yes definitely, both speed & size I'd say. There are some size optimizations easily available if you are willing to compromise on portability (e.g. targeting 32-bit MCUs) or flexibility (e.g. hard-coding key and IV). W.r.t. speed, there is much that can be improved - this library is honestly quite slow compared to other implementations (e.g. with table-lookups) because of the focus on small binary size.
To be frank I would reject it if targeting A write-up / tutorial in a markdown-document would be much preferable IMO. I have been wanting a writeup of usage-examples for some times. Something that demonstrates various ways to use or (slightly) mold the code to various use-cases. The main focus of this library, is to make AES available on machines where traditional implementations require too many resources (RAM/ROM). I originally wrote this library for an 8-bit processor many years ago, because I couldn't find anything I could fit on the chip 😆 |
Awesome :) I would love to hear about those specific optimization removing portability (32-bit MCUs only) and flexibility (hard-coding key + IV). Just like you, I have no concerns about execution speed, only about small binary size (actually in my case there is plenty RAM but ROM is extremely limited). |
Hi :)
First a HUGE thanks for making this and releasing it publically this way, really awesomely useful :))
I'm implementing this on a tiny uC and I'm trying to reduce (compiled) code size as much as possible. As suggested in the readme, I'm only using CTR.
The aes.c file mentions this:
I've been looking around the net for a way to generate this tables (I'm not using rsbox since I'm only using CTR), the best I could find is this: https://crypto.stackexchange.com/questions/98396/how-does-this-code-calculating-aes-s-box-work
But I am not sure how to get the value of
p
from the polynomial1 + x + x^3 + x^4 + x^8
in C code?Thank you!
The text was updated successfully, but these errors were encountered: