Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsigned char is not uint1, truncations != C casts != promises #1606

Open
andres-erbsen opened this issue May 23, 2023 · 35 comments
Open

unsigned char is not uint1, truncations != C casts != promises #1606

andres-erbsen opened this issue May 23, 2023 · 35 comments

Comments

@andres-erbsen
Copy link
Contributor

From #1605

Currently, variables for which range analysis determines the range [0, 1] are assigned the type uint1. That type is then printed to C, and perhaps to other languages, as a typedef for unsigned char. That's no good: casting to and adding in these two types produce different results.

We should stop printing uint1. If we really want to output code for which it is locally evident that a variable is in the range [0, 1], refencing it as x&1 (which is what we were also often doing up to recently) is less bad. We can probably do even better.

Our use of signed types also warrants some review.

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented May 23, 2023

Re #1605 (comment):

We should probably parametrize the pass on a list of acceptable bitwidths to remove, then pass in [8; 16; 32; 64; 128] or something in BoundsPipeline

Yes, we could restrict the redundant-& optinization to uint8+, but could we remove uint1 altogether? Do you remember why/when we added it in the first place?

@real-or-random
Copy link

real-or-random commented May 23, 2023

I'd recommend against using any type that has a rank less than the rank of int. (That is, don't use (unsigned) char and (unsigned) short. In practice, it also means don't use (u)int8_t and uint16_t ). These can be very nasty due to C's integer promotion rules, which say that before any arithmetic operation, any type A of rank less than int will first be promoted to either int or unsigned int, depending on whether all values of A fit in int or in unsigned int.

Want a nasty example? The following code has UB for some inputs due to signed (!) overflow:

uint16_t mul(uint16_t a, uint16_t b) {
    return a*b;
}

See here for more background: https://stackoverflow.com/questions/73600275/c-undefined-behavior-when-multiplying-uint16-t
A standard technique to avoid against these problems is to start multiplications with 1U * ... This will make sure that arithmetic only ever happens on unsigned types.

Our use of signed types also warrants some review.

Happy to look at some generated code here.
I think unsigned overflow can also happen with left shifts: 1 << 31 should also be UB on a platform where int has 32 bits (use 1U << 31 instead), but multiplication and left shift should be the only operations that can be problematic (unless you're on a non-existent platform where short is just one bit shorter than int, so overflow could happen with additions too).

(Moreover, small types may be slower.)

@samoht
Copy link

samoht commented May 23, 2023

Hello,

I am not sure it is related, but we also just noticed some wrong behaviour of the inverse function when upgrading from clang 14.0.0 to 14.0.3 on macOS. I haven't managed to create a short repro yet, but I have this which use mirage-crypto: https://gist.github.com/samoht/7c27b6418148ee04590e571f390f1997

mc_p256_inv is implemented by https://github.com/mirage/mirage-crypto/blob/main/ec/native/p256_stubs.c#L94-L99 which calls https://github.com/mirage/mirage-crypto/blob/main/ec/native/inversion_template.h (which is almost a raw import of the extracted C code, with a small patch on top where the "inversion" function is declared "static", and the convenience function "inversion" is provided)

@JasonGross
Copy link
Collaborator

Yes, we could restrict the redundant-& optinization to uint8+, but could we remove uint1 altogether? Do you remember why/when we added it in the first place?

I think we need it because we want code that relaxes uint8 to uint32, but does not relax carries to uint32? Plausibly we should support for non-idempotent bounds relaxation functions that turn uint1 into uint8 but turn larger things into uint32? I could be misremembering though. (It's also possible we need it for bounds analysis, or that the rewrite rules that deal with carries expect casts to be uint1?)

@andres-erbsen
Copy link
Contributor Author

@samoht take a look at #1605 -- if you got fiat-crypto code in the range of bad commits described there, it's reasonably likely the same issue. If not, I'm not sure.

@hannesm
Copy link
Contributor

hannesm commented May 26, 2023

@samoht take a look at #1605 -- if you got fiat-crypto code in the range of bad commits described there, it's reasonably likely the same issue. If not, I'm not sure.

unfortunately, the error on some macOS machines / versions seem to be present with a commit after the revert (i.e. with c79cf60)

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented May 26, 2023

I took a look at divstep_precomp in https://github.com/mirage/mirage-crypto/blob/main/ec/native/np256_64.h#L1227 and it looks like that function has no arithmetic on uint1 or any other small integer type, and all casts to uint1 have &1 on them.

Lacking a specific guess for the issue, I figure we can perhaps just track down the change.
Does the issue show up if you prevent inlining of the functions called by the divstep template, and does it show up when you further prevent inlining of all functions in fiat-crypto-generated code? Can macos clang cross-compile for linux, and does the issue show up then? Can you make a good binary and a bad binary and diff the generated assembly code, or send it to me so I can?

@samoht
Copy link

samoht commented May 30, 2023

@andres-erbsen I've added an ok (generated with clang 14.0.0) and a ko (compiled with clang 14.0.3) binary to the gist: https://gist.github.com/samoht/7c27b6418148ee04590e571f390f1997 -- let me know if you want to track this in a separate issue.

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented May 30, 2023

Thank you. It appears that any objdump in Arch repos doesn't grok mac executables, so if you could run objdump -M intel -d repro-ko.exe > repro-ko.s or such for both executables that'd be great. Update: radare2 works.

@samoht
Copy link

samoht commented May 30, 2023

removing noise
#include <stdint.h>
#include <stdio.h>

int main() {
  uint64_t x=UINT64_C(0xffffffffffffffff);
  printf("x=%llu\n", x);
}

is generating a different result on GCC/clang 14.0.0 vs. clang 14.0.3

CLANG-14.0.3
x=18446744073709551615
GCC / CLANG-14.0.0
x=18446744073709551615

And this value is used by fiat_p256_msat.

@real-or-random
Copy link

CLANG-14.0.3
x=18446744073709551615
GCC / CLANG-14.0.0
x=18446744073709551615

These values are the same.

@samoht
Copy link

samoht commented May 30, 2023

Ok, I'm now very confused - it used to by x=-1 for clan 14.0.3, but now I cannot reproduce when scripting this (I copied/pasted the output of my scripts without reading the results...). I'll try to understand what is going on... apologies for the noise.

@real-or-random
Copy link

It probably doesn't matter for that issue, but %lld or %ld should give you -1, because d interprets (the unsigned value) as signed.

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented May 30, 2023

Ok, I looked into this for a bit. The diff is rather large so it's hard to say anything conclusive. However, it's rather striking that carry propagation (adcs with xzr) only appears prominently on the "ok" side. I also appears in the source code here. It's possible that whatever the ko side does is very clever and correct, but it wouldn't be the first time fiat-crypto code runs into a bug in exactly this scenario with a mainstream compiler. 🤔

@samoht
Copy link

samoht commented May 30, 2023

ok more news - -O0 fixes the issue on clang 14.0.3.

@andres-erbsen
Copy link
Contributor Author

On gcc I debugged a similar issue by listing the compilation passes and disabling them one-by-one. Clang docs seem to say they have a tool that does that https://llvm.org/docs/HowToSubmitABug.html#miscompilations and the other suggestions there also sound good.

Coincidentally, I recalled that we at least have a C-language test template at
https://github.com/search?q=repo%3Amit-plv%2Ffiat-crypto%20inversion_test_template&type=code so if that reproduces the issue it might be a good alternative to trying to bughunt through ocaml.

@samoht
Copy link

samoht commented May 31, 2023

Bisecting the compilation passes with opt-bisect-limit points to this pass:

BISECT: running pass (303) InstCombinePass on fiat_p256_divstep

Also I've added a p256_64 test using the C-language test template, but this always seems to pass.

@andres-erbsen
Copy link
Contributor Author

Great find! I figure it should be possible to make a test case in C still, but it'd probably require replicating the compilation-unit structure of the ocaml example. Perhaps the most sure we can get is to have the same object file for divstep and link it against an ocaml test harness and a C test harness, confirming with gdb that the same arguments get passed.

@andres-erbsen
Copy link
Contributor Author

I looked through llvm/llvm-project@llvmorg-14.0.0...llvmorg-14.0.3 and the two commits that mention InstCombine are llvm/llvm-project@43ee392 llvm/llvm-project@9a3e81e but sticks out to me right now. But even if InstCombine is buggy pass in the code, the bug may have been already present and only triggered by a 14.0.3 change.

@samoht
Copy link

samoht commented May 31, 2023

I managed to reproduce using C only, by tracing what values the OCaml code was sending to the C side.

#define LIMBS 4
#define WORD uint64_t
#define WORDSIZE 64
#define LEN_PRIME 256
#define CURVE_DESCRIPTION fiat_p256

#include "p256_64.h"
#include "inversion_template.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>

int main (void) {
  uint64_t init[5];
  uint64_t zero[4];
  uint64_t result[4];
  int i;

  init[0] = 3164219169453123020;
  init[1] = 5610335456920920796;
  init[2] = 4531946527682827593;
  init[3] = UINT64_C(17678136164097313971);
  init[4] = 0;
  zero[0] = 0;
  zero[1] = 0;
  zero[2] = 0;
  zero[3] = 0;
  result[0] = 5547023629237264145;
  result[1] = 6634760001876509537;
  result[2] = 2568289811407991911;
  result[3] = UINT64_C(9585681005862132995);

  inverse(zero, init);

  for (i = 0; i < 4; i++) {
    if (zero[i] != result[i]) {
      printf("FAIL zero[%d]=%llu result[%d]=%llu\n", i, zero[i], i, result[i]);
      return 1;
    }
  }
  printf("PASS\n");
  return 0;

}

I've pushed to the gist two updated binaries (which are now much smaller!)

@andres-erbsen
Copy link
Contributor Author

Ok that looks like we almost have it. If I had a machine that the bug reproduced on my approach at this point would be to aggressively inline and minimize the example that tells the difference between -O0 and -O1. I do it manually (while : inotifywait compile && test && bell...) but gcc has recommendations here.

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented Jun 1, 2023

Out of the two binaries you posted, the -ok one has ubsan clutter all over it so I won't spend time comparing them now. Minimizing is likely a more productive avenue anyway, because if we need to file a bug report we should post an example. Edit: if we do need to look at binaries, they should be as similar as you can make them otherwise.

@samoht
Copy link

samoht commented Jun 2, 2023

I've pushed a smaller repro with less clutter. It is still too big?

@andres-erbsen
Copy link
Contributor Author

The asm for the two sides looks very different (much more different than at first) due to different optimizations being applied. For example, the -ok file still has addcarryx and cmovznz as function calls instead of inlined.

@samoht
Copy link

samoht commented Jun 3, 2023

Apologies for the painful process. I still haven't managed to find enough time to inline all of this properly and I do realise it doesn't help you very much.

But it's now clear something is fishing with inscombine:

clang -O1 -mllvm --instcombine-max-iterations=0 main.c # ok
clang -O1 -mllvm --instcombine-max-iterations=1 main.c # ko

Unfortunately, this iteration still leads to very different assembly outputs. I'll try to dig into this more later this weekend and manually inline functions.

@real-or-random
Copy link

Perhaps it's better to report to LLVM already now, even with a somewhat large source file. I can imagine that they have more experience and/or good suggestions on how to narrow down the issue further.

@andres-erbsen
Copy link
Contributor Author

Yep, I think it's a fine time to report. Hopefully breaking as late as instcombine will dodge the question of why we believe a program so large to be devoid of undefined behavior.

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented Jun 3, 2023

The diff is actually nice and not-too-large again, but I still can't figure out what is going on 🤣 . The most glaring change seems to be the following. In particular, the addcarry of the highest limb of p256 isn't doing the obvious thing on either side:
diff

https://andres.systems/tmp/rrddiff0.html

@samoht
Copy link

samoht commented Jun 3, 2023

I have updated to clang version 16.0.4 (the one shipped with homebrew and not Xcode) and the bug disappeared - not totally sure what to do now. I have pushed another version of the binaries compiled with the same optimisation flags but with 2 versions of the compiler..

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented Jun 3, 2023

Ok, I looked at the new ok as well and here's a hypothesis. Looking at the last adcs in the image I posted, we can see on the left that it computes x, carry = limb + 0 + carry and then x = x-y whereas on the right we have x, carry = limb + (-y) + carry. The resulting x is the same, but the resulting carry could be different. In particular, in both -ok versions carry is 0 since limb < 2^64-1. In the -ko, carry could be nonzero depending on y. Below, these carry bits are fed into similar addition chains that I do not understand.

@JasonGross
Copy link
Collaborator

How far is the asm equivalence checker from being able to ingest these compilations and report on differences? (And would it be useful?)

@andres-erbsen
Copy link
Contributor Author

This is AArch64 so quite far.

@andres-erbsen
Copy link
Contributor Author

@samoht Would you be up to filing a report for this? Even though the issue has stopped appearing with latest compiler, it'd be good to make sure that the bad rule was actually fixed or removed instead of just being no longer triggered due to other optimizations changing. (Or at the very least, there should be a regression test in their repo.)

@andres-erbsen
Copy link
Contributor Author

https://patchew.org/QEMU/[email protected]/ has been brought to my attention wrt the clang14 issue. I have not dug deep enough to confirm it is the same root cause, but the version matches. Just FYI, the linked patch says "There is not currently a version of Apple Clang which has the bug fix".

hannesm added a commit to hannesm/opam-repository that referenced this issue Sep 18, 2023
…age, mirage-crypto-rng-lwt, mirage-crypto-rng-eio, mirage-crypto-rng-async, mirage-crypto-pk and mirage-crypto-ec (0.11.2)

CHANGES:

* mirage-crypto-rng-eio: improve portability by using eio 0.7's monotonic clock
  interface instead of mtime.clock.os. (mirage/mirage-crypto#176 @TheLortex)
* mirage-crypto-rng-eio: update to eio 0.12 (mirage/mirage-crypto#182 @talex5)
* mirage-crypto-rng: fix typo in RNG setup (mirage/mirage-crypto#179 @samueldurantes)
* macOS: on arm64 with clang 14.0.3, avoid instcombine (due to miscompilations)
  reported by @samoht mit-plv/fiat-crypto#1606 (comment)
  re-reported in ulrikstrid/ocaml-jose#63 and mirleft/ocaml-tls#478
  (mirage/mirage-crypto#185 @hannesm @kit-ty-kate)
* avoid "stringop-overflow" warning on PPC64 and S390x (spurious warnings) when
  in devel mode (mirage/mirage-crypto#178 mirage/mirage-crypto#184 @avsm @hannesm)
* stricter C prototypes, unsigned/signed integers (mirage/mirage-crypto#175 @MisterDA @haesbaert
  @avsm @hannesm)
* support DragonFlyBSD (mirage/mirage-crypto#181 @movepointsolutions)
* support GNU/Hurd (mirage/mirage-crypto#174 @pinotree)
@hannesm
Copy link
Contributor

hannesm commented Sep 21, 2023

To update on the issue described within this issue (on macOS/arm64 with clang 14.0.3, there is a miscompilation - starting at #1606 (comment)). Since it is unclear when Apple will update their compiler toolchain, we shipped "mirage-crypto" with using "-mllvm --instcombine-max-iterations=0" on that specific platform and C compiler. Thanks a lot for the investigations and remarks in here.

nberth pushed a commit to nberth/opam-repository that referenced this issue Jun 18, 2024
…age, mirage-crypto-rng-lwt, mirage-crypto-rng-eio, mirage-crypto-rng-async, mirage-crypto-pk and mirage-crypto-ec (0.11.2)

CHANGES:

* mirage-crypto-rng-eio: improve portability by using eio 0.7's monotonic clock
  interface instead of mtime.clock.os. (mirage/mirage-crypto#176 @TheLortex)
* mirage-crypto-rng-eio: update to eio 0.12 (mirage/mirage-crypto#182 @talex5)
* mirage-crypto-rng: fix typo in RNG setup (mirage/mirage-crypto#179 @samueldurantes)
* macOS: on arm64 with clang 14.0.3, avoid instcombine (due to miscompilations)
  reported by @samoht mit-plv/fiat-crypto#1606 (comment)
  re-reported in ulrikstrid/ocaml-jose#63 and mirleft/ocaml-tls#478
  (mirage/mirage-crypto#185 @hannesm @kit-ty-kate)
* avoid "stringop-overflow" warning on PPC64 and S390x (spurious warnings) when
  in devel mode (mirage/mirage-crypto#178 mirage/mirage-crypto#184 @avsm @hannesm)
* stricter C prototypes, unsigned/signed integers (mirage/mirage-crypto#175 @MisterDA @haesbaert
  @avsm @hannesm)
* support DragonFlyBSD (mirage/mirage-crypto#181 @movepointsolutions)
* support GNU/Hurd (mirage/mirage-crypto#174 @pinotree)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants