521.wrf_r segfaults when built with -flto=full -O2 #1429
As discussed at the call, I will follow up in this issue with information about which variables are left uninitialized in the source, and whether the problem occurs at |
The problem does occur at The compiler flags I used are:
The linker flags I used (with
The problematic |
@pawosm-arm @shivaramaarao Could you check things on your side to see if you can reproduce the problem, or determine why you don't see the problem?
Hi @bryanpkc, assuming the problem manifests itself at runtime, I need to ask: which input data set are you using (test, ref, or something else)? Also, with -mcpu=native used at build time, what are the CPU features (namely, NEON vs SVE)?
@pawosm-arm I have confirmed that the problem can be reproduced without specifying any |
I've been testing it all night in our CI, with your flags, with O3 and even with Ofast, and it didn't fail. It's time to do it manually, but bear in mind that building SPEC's wrf with LTO takes ages, and I want to try two different builds of the compiler, so it may take days before I can confirm anything.
Just noticed, our CI always adds |
...also, we don't seem to use (or even build) lld, but that should make no difference to what LTO does when optimizing the code.
Should |
@pawosm-arm Can you confirm in the |
I could reproduce the issue with -O2 and -flto. I still need to check why we don't see the issue in our downstream compiler. I will update when I have additional information.
Regards,
Shivaram
I've got two versions here: 1.0.1 and 1.1.5. The Since I can't reproduce the problem with the most recent release of our compiler, I'll try to build spec2k17's wrf with vanilla classic. |
Indeed, wrf fails with a segfault right after startup when built like that (spec2k17 ver. 1.1.5), but pop2 doesn't.
As I mentioned above, our downstream compiler has an option to zero-initialize variables automatically, so we have a workaround, which we may be able to upstream. gfortran has a similar set of options. Oracle's Fortran compiler has an

I had a quick look at the language standard. While it discusses definition and undefinition of variables, it does not specify any default initialization behaviour, and uses of uninitialized variables are undefined behaviour. From what I can see in online discussions, programmers are encouraged to define variables explicitly.

What should be our next step? Any opinion? @pawosm-arm @shivaramaarao
I'm slightly confused. Doesn't classic-flang already zero-initialize variables (at least globals/module variables)?
I think the global/module variables are allocated in the BSS section and zero-initialized only as a side effect. Local variables that are not explicitly initialized are left undefined.
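To illustrate the storage distinction described above, here is a rough analogy in C rather than Fortran. It is only a sketch and does not show Classic Flang's actual code generation; the names `module_like`, `sum_static` and `sum_local` are made up for this example. One difference to keep in mind: C guarantees zero-initialization of static objects, while Fortran does not, which is why the zeros seen for module variables are only a side effect of BSS placement.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Static storage duration: usually emitted into .bss and zero-filled by the
   loader. C guarantees the zeros; Fortran makes no such promise. */
static double module_like[N];

/* Sums the static array, which was never written explicitly. */
double sum_static(void) {
    double s = 0.0;
    for (size_t i = 0; i < N; ++i)
        s += module_like[i];
    return s;
}

/* Automatic (stack) storage: contents are indeterminate until written,
   so the first n elements are defined explicitly before they are read. */
double sum_local(size_t n, double fill) {
    double local[N];
    assert(n <= N);
    for (size_t i = 0; i < n; ++i)
        local[i] = fill;          /* explicit definition */
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += local[i];            /* only defined elements are read */
    return s;
}
```

Calling `sum_static()` reads only loader-zeroed storage, whereas `sum_local` would read indeterminate values if the first loop were removed.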
After upgrading to LLVM 17, the 521.wrf_r benchmark in SPEC CPU2017 is miscompiled at O3 with LTO enabled. Examination of the crash site shows that a large amount of code that used to be inlined into `wrf_init` (via `alloc_and_configure_domain`) has been deleted, leaving the `head_grid` pointer uninitialized.

The root cause of this problem is that upstream LLVM had strengthened the pruning of unreachable code to handle `undef` and `poison`. Specifically, these three patches are found to be relevant:

Classic Flang leaves variables uninitialized by default. The corresponding IR values are therefore considered to contain `undef`. In our downstream compiler, we have tested with initializing all variables to zeroes, and that successfully avoids the problem.

628.pop2_s is also affected by this problem.
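
The shape of the failure can be sketched in C. This is an analogy under stated assumptions, not the WRF source: `struct grid`, `root_grid`, `configure_sketch` and `init_sketch` are hypothetical names. The sketch shows the defined version, where both the controlling value and the pointer are initialized explicitly. If `enabled` were instead left uninitialized, the branch would depend on an indeterminate value, and an optimizer that treats such a branch as unreachable would be permitted to drop the code that sets the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain/grid object; not WRF's real type. */
struct grid { int id; };
static struct grid root_grid;

/* Points *head at the configured grid only when `enabled` is nonzero. */
void configure_sketch(int enabled, struct grid **head) {
    assert(head != NULL);
    if (enabled) {
        root_grid.id = 1;
        *head = &root_grid;
    }
}

struct grid *init_sketch(void) {
    /* Defined before use. Left uninitialized, this value would be
       indeterminate, and the branch in configure_sketch (once inlined)
       could legally be pruned by the optimizer. */
    int enabled = 1;
    /* Explicit initialization, so the pointer is never garbage. */
    struct grid *head = NULL;
    configure_sketch(enabled, &head);
    return head;
}
```

With both variables defined, `init_sketch()` has a single well-defined result, which is the property that zero-initializing all variables restores in the downstream compiler.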