Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support LTO on windows with clang #12043

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

GertyP
Copy link
Contributor

@GertyP GertyP commented Jul 27, 2023

After gettting no response to my questions on the status of some aspects of meson's LTO support here, I decided to poke around and work through what seemed to be needed to get LTO fully supported on windows with clang.

One area that became apparent as being a problem is the automatic selection of the static library linker/archiver tool, which may be incompatible with the currently selected compiler. It seems as though there are two ways to help/notify a user who runs in to problems resulting from incompatibilities between their compiler/archiver/linker tools. One would be to simply try to detect known incompatibilities and notify the user, along with better documentation on selecting the static linker / AR tool (this is what I've done in these changes, along with a big, waffly comment next to the new _validate_tool_combination(...) function, which is just to prompt some discussion; I'll trim down that huge comment if we settle an approach). Another alternative would be to try to more intelligently, quietly/automatically select a static linker tool, which I'm less keen on for reasons I've put in the big comment.

Actually, for convenience, this is the big comment I left in these changes -

If the user's enabled 'b_lto' and are using a supporting compiler/linker on windows (i.e. clang/ld.lld, clang-cl/lld-link), then unless they also specify a suitable static linker (llvm-ar or llvm-lib), meson unfortunately tries to quietly go and guess at a static lib/linker tool to use. Under the normal visual studio native tools environment, it picks the 'lib' tool, which results in an unhelpful error -
"invalid or corrupt file: cannot read at 0x..."
We can provide a much more helpful error to the user.

More generally, however, we can do this kind of tool combination problem checking here. E.g. we don't expect any of the clang/llvm tools to be mixed with the msvc tools like 'cl', 'lib', 'link'.

"Why even bother with this at all? If the user's mixing weird combinations of tools, they're asking for trouble and we aren't obliged to go out of our way to clean up after them": Normally, I'd agree with this, however, meson has the undesirable tendency to quietly (i.e. without requiring some action/acknowledgement from the user) go and search and guess at things (like a static linker tool) that are needed but which the user hasn't specified. So, in the example above (and maybe with other similar 'find and guess' cases), either meson needs to do a better job at guessing at a static linker or it at least should do more checking of problem combinations and better informing the user of the problem and how to fix it.

I'd argue, in the overwhelming majority of cases, users are easily capable of - and well placed to - decide and specify suitable tools on the occasions where meson could simply report the need for a tool the user hasn't specified.
In general, the principle of avoiding finding/guessing, in favour of just getting an explicit choice from a user has longer term benefits (less maintenance, testing, responsibility, less user surprise and frustration in problem cases, greater user understanding) that outweigh any short-term, small inconveniences.

What are people's thoughts on this?

Comment on lines 461 to 465
clangcl_color_args: T.Dict[str, T.List[str]] = {
'auto': ['-fcolor-diagnostics'],
'always': ['-fcolor-diagnostics'],
'never': ['-fno-color-diagnostics'],
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just move this into the function since it is only used once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might also add colorout support separately since it is again a commit that could just be pulled in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tristan957. Thanks for taking the time to give some review feedback.

move this into the function

I considered this but wondered if there is a performance impact on the way the python interpreter processes repeated calls to a function that defines a function-local variable/table/list/object like this... And there is, albeit very small and practically insignificant; a function-local structure definition incurs a very small overhead, in repeated function calls, compared to the equivalent file-scoped variable definition. So, even the performance of a function-local clangcl_color_args is practically identical, and even though I normally do prefer restricting scope and hiding access where reasonable, I'd rather avoid incurring any unnecessary overhead since the alternative (file-scoped, as we have in this change) is perfectly vanilla and not a practical concern as far as I can see... unless you can suggest why this might be a realistic issue.

Copy link
Member

@eli-schwartz eli-schwartz Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NACK to moving it into a function, the current approach matches the style of other mixins -- to keep that table as a module-global dictionary.

But it could be from .clang import clang_color_args instead, which validates the reasoning for keeping these as module-global. :)

Comment on lines +522 to +507
if mode == 'thin':
args.append(f'-flto={mode}')
else:
assert mode == 'default', 'someone forgot to wire something up'
args.append('-flto')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely on my list of things to make this an enum at some point

# checking here. E.g. we don't expect any of the clang/llvm tools to be mixed
# with the msvc tools like 'cl', 'lib', 'link'.
#
# "Why even bother with this at all? If the user's mixing weird combinations of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random quote

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random statement 😄

To repeat my rambling PR description point on this big comment block in particular -

I'll trim down that huge comment if we settle an approach

so that quoted question ("Why even bother ...") is my way of presenting and addressing a possible argument against having this piece of validation that tries to help users from shooting themselves (intentionally or accidentally) in the foot.

"Why not just make a statement without posing a question like this?", you might wonder 😄 It's just a simple, effective way of walking the reader through a thought process involving pros and cons to an approach, in an area where I'm also trying to draw attention to what I think is perhaps an unfortunate choice of meson to employ a kind of "quiet, search and guess"-like behaviour that I've fallen foul of in a few areas now, which I think is probably less useful in the long term than a kind of simpler "stop, report, and require explicit user choice"-like principle.

if cc.get_id() != 'clang':
raise SkipTest('Only clang currently supports thinLTO')
if cc.get_id() not in {'clang', 'clang-cl'}:
raise SkipTest('Only clang/clang-cl currently supports thinLTO')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/supports/support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. Are you saying: Replace 'supports' with 'support'? If so, that's not valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I am saying, and yes it is correct. The string is equivalent to: "Only clang and clang-cl supports thinLTO", which doesn't make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Think I was reading it wrongly.
Done.

@@ -349,15 +349,25 @@ class VisualStudioLinker(VisualStudioLikeLinker, StaticLinker):

"""Microsoft's lib static linker."""

id = 'lib'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be able to reduce your diff by adding these ids in a separate commit that could just be pulled into master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature directly requires this change. I'm not sure what problem it gives by having it included here (perhaps you can explain your thinking)... While I am sure that the re-shuffling and unpicking of this and related changes, to make a separate commit for a separate PR, which this one would then depend on, is quite a lot of potentially error-prone jiggling around and paper shuffling for an unclear purpose and benfit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're only hurting your own chances of getting this PR merged. Up to you.

It doesn't have to be a separate PR, just another commit.

Comment on lines +219 to +225
# It is using the VS-style linker/archiver syntax but it's important to
# distinguish between 'lib' and 'llvm-lib', which have incompatibilities
# (notably, when using LTO).
if 'LLVM' in out:
return linkers.LlvmVisualStudioLinker(linker, getattr(compiler, 'machine', None))
else:
return linkers.VisualStudioLinker(linker, getattr(compiler, 'machine', None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change could also be a separate commit to reduce the diff since this look like a bug fix. Same with the similar chunk below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a bug fix; it's a change that's needed in order to perform the 'mixing toolchain streams' validation done by _validate_tool_combination in interpreter.py. Previously, use of both 'lib' or 'llvm-lib' would end up instantiating a VisualStudioLinker, after which, we can no longer distinguish the difference by normal means (i.e. 'id'). I think we'd need to open it up and inspect other members to figure out which flavour we're dealing with.

Comment on lines 174 to 190
proc = subprocess.run(command, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT if stderr else subprocess.PIPE,
env=env,
encoding='utf-8',
text=True, cwd=workdir, timeout=60 * 5)
print('$', join_args(command))
print('stdout:')
print(p.stdout)
print(proc.stdout)
if not stderr:
print('stderr:')
print(p.stderr)
if p.returncode != 0:
if 'MESON_SKIP_TEST' in p.stdout:
print(proc.stderr)
if proc.returncode != 0:
if 'MESON_SKIP_TEST' in proc.stdout:
raise SkipTest('Project requested skipping.')
raise subprocess.CalledProcessError(p.returncode, command, output=p.stdout)
return p.stdout
raise subprocess.CalledProcessError(proc.returncode, command, output=proc.stdout)
return proc.stdout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, split this change out into its own commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that, but I'd like to better understand why you think that's necessary/preferable.

If a change is fairly incidental, superficial, and is deemed OK, is there any practical harm it does to get rolled in with a more substantial change?

Just because something can be done, does that mean it should? Any time someone renames a variable or corrects some comment spelling/grammer, are you on the case to get it out into a separate commit? Is that worth the shennanigans involved in cherry-picking/juggling changes of this kind? In what way is this a practical benefit?

I'm not trying to be objectionable but I do think it best to at least make an honest attempt to discuss, understand, and negotiate where I might initially disagree or not see the value of something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits should be self-contained and atomic. Your commit message gives it away that it should be split into two commits.

Fix 'test_lto_mode' and rename variables that clash with pdb commands.

2 completely unrelated changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a change is fairly incidental, superficial, and is deemed OK, is there any practical harm it does to get rolled in with a more substantial change?

It violates the project coding standards, which is "one commit per logical change" and the two logical changes here are:

  • Fix 'test_lto_mode'
  • rename variables that clash with pdb commands.

Just because something can be done, does that mean it should? Any time someone renames a variable or corrects some comment spelling/grammer, are you on the case to get it out into a separate commit? Is that worth the shennanigans involved in cherry-picking/juggling changes of this kind? In what way is this a practical benefit?

It makes it orders of magnitude easier for:

  • project maintainers to review the PR (in particular, a commit that only contains "inert" changes such as variable renaming or comment tweaks can be mentally filtered out and allow reviewers to look at just the active changes)
  • future readers of the code to understand the flow of changes
  • git bisect to accurately pin down which commit affected some functionality (what if a variable name was typoed in an uncommon branch? If the bisect points to a commit that changes 3 or 4 different things, which change caused the regression?)
  • backporting patches to maintenance branches (changes to comments or variable names in maintenance branches increases the likelihood of conflicts, and generally constitutes an unnecessary risk)

which is why the general guideline is in place, it's not specific to this PR and it's not even specific to the Meson project. In fact, the git software itself includes the same recommendation in its own guidance: https://git-scm.com/docs/SubmittingPatches#separate-commits

If it helps, I find that it becomes much easier to juggle the changes if I start off by adding changes using git add -p to interactively add just part of the change to a commit, and then commit the change and use git add -p to add other changes to a second commit. It's also easier to start off with two commits and later squash them, than to start off with one commit and later break them apart.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both of you for taking the time to try to elaborate on this point but I'm still struggling to see the real-world net benefits in this particular case.

I'm still left with the impression that there is a general guideline, which seems reasonable, but which has now become sacrosanct and can be, in a few circumstances, of dubious value, if applied to the extreme.
I hope you can see that neither "2 completely unrelated changes" nor discussing general guidlines and more nebulous "understand the flow of changes" are things that I have any problem with, but neither do they specifically explain concrete examples of problems that arise from failing to split incidental changes like renaming out into a separate change. They just restate a guideline/doctrine or talking in general terms, avoiding specifics, which I find doesn't help clarify.

To re-iterate: I'm not disputing the general niceness of having separate logical changes, but I would like to understand real world examples of the value of splitting some incidental renaming changes out into a separate change.

Renaming variables must happen a lot. Of those, I'd imagine a significant proportion would be to simply use better, more meaningful variable names. Now, I don't think anyone could dispute that incidental, one-off, isolated variable renaming to something marginally better is a change that technically CAN be separated out into a new commit, but a whole separate commit of nothing more than 'Rename: More meaningful variable name', for example, still strikes me as of rather dubious value to go to the effort of reorganising commits for some incidental renaming.
But if that were really the case, then huge amounts of commits for all projects would be bloated with the likes of 'Rename (X) to better name (Y)' ... Only we don't have that, in general, because I think there is a more sensible approach than just blindly following the standards/guidlines(/dogma) to the extreme (i.e. in all possible cases). All of which means that there must be some pragmatic grey area that prevents the blind, unquestioning enforcement/application of the rule: "If it can conceivably be considered a separate logical change then it must be split out into a separate change". And what defines that grey area must surely come from an understanding of the practical problems that are fixed/avoided by going to the effort of splitting up changes to the Nth degree. For me, simple renaming improvements (among other things) has always fallen into this grey area of there being no clear benefit to splitting out as a separate change (as well as quite tangible drawbacks of busying commit history with arguably unnecessary noise, adding sometimes fiddly work with potential to introduce mistakes)... but maybe there are scenarios I'm unaware of where this really does pay off.

To avoid talking in less helpful, general terms, or merely refering to rules/guidelines as if that should be answer enough, in the specific case of incidental variable renaming, I would appreciate it if you could humour me with some real-world scenarios that might better illustrate the merits of going to the effort of splitting them out into separate changes, as well as clarifying if you think there's a pragmatic grey area where it's not worth splitting out spearate changes to Nth degree, and if so, what distinguishes that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A real world example is that people don't like reviewing commits with changes that aren't logically the same. It's a context switch for no logical reason.

Variable name changes typically happen in the course of other changes in the same unit of code.

You are also hurting the chances that anyone gives you a serious review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a context switch for no logical reason

Thank you. I think this is perhaps the closest we've come to getting something tangible to grasp on to what a possible problem might be (and which is not too broad a generalisation or "Them's the rules; they must be applied to the Nth degree"). I didn't intend for it to be a difficult or trick question. However, I do think it's quite a stretch of the imagination to think that such a simple, incidental renaming change could realistically give any reviewer/reader any difficulty at all... but I do conceed that perhaps there's a remote chance that, even in this case, it does.

Anyway, on this point, even though I think I still disagree with the problem that this renaming (being lumped in with another change) realistically imposes, and I stand by the view that I think it is a tangible, error prone risk to unpick changes of this type, with drawbacks that outweigh the benefits of splitting out, I've done this change.

You are also hurting the chances that anyone gives you a serious review.

By doing what? I'm not refusing to make changes, but I am asking for clear explanation if I don't understand or dispute the value of changes. I think it's the job of a responsible, diligent reviewee to not merely unquestioningly do as they're asked, but to truly understand the value of suggestions and to also challenge what is being asked if the benefits are unclear. This way, both the reviewee can learn, but also the reviewer has the opportunity to learn, to question the extent to which they understand the value of their practices; to be a better communicator as well as to refine and develop pragmatic and nuanced opinions (or even harden their views) under questioning. I also hope a good reviewer would take questions at face value (to not read any malicious intent or get upset at being pressed to better explain and justify their view), and be able to discuss, negotiate, learn, change, and be flexible if debate ensues.

Thanks again both of you (@tristan957 and @eli-schwartz ) for taking the time to discuss this.

@tristan957
Copy link
Contributor

Thanks for contributing!

After gettting no response to my questions on the status of some aspects of meson's LTO support #11933

Highly recommend joining the Matrix/IRC channel for more real time discussion. That is where most of us hangout.

@tristan957
Copy link
Contributor

Please squash the grammar and lint complaint commits into their respective parent commits.

@@ -1545,6 +1598,7 @@ def add_languages_for(self, args: T.List[str], required: bool, for_machine: Mach
mlog.bold(' '.join(comp.linker.get_exelist())), comp.linker.id, comp.linker.version)
self.build.ensure_static_linker(comp)
self.compilers[for_machine][lang] = comp
_validate_tool_combination(comp, self.build.static_linker[comp.for_machine])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems semantically out of place here -- shouldn't it be invoked inside of detect_static_linker, rather than in the interpreter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do. They're done immediately one after the other and we also have everything we need at (or immediately after the call to 'detect_static_linker') this validate could easily be pushed down fractionally more out of the way. However, injecting new 'whole tool chain combination' validation within build.ensure_static_linker or detect_static_linker could surprise/mislead the reader since they'd no longer be named exactly what they do; 'detect_static_linker' is no longer just detecting static linker but detecting linker and validating compiler/static/dynamic tool combinations. Similarly 'ensure_static_linker' is doing exactly and only what its name says.

Thoughts -

  1. What about fattening up the name? I'm slightly uncomforable with renaming to something like ensure_static_linker_and_validate_tools: Although it makes expectations clearer to the reader, it is smushing a couple of not strictly related things together... But is that actually going to be a problem? Hard to say, but what if, some time in the future, there's another tool in the chain that we care about (an assembler, a pre-processor, some kind of decompiler, instrumentation injection tool, bespoke symbol extracter/stripper, or whatever)... Because we sneaked toolchain validation into the static linker selection func, it then may not be convenient to validate the whole tool chain at that point.
  2. What about not fattening up the name but just sneaking this extra validation in there anyway? That feels like inviting trouble by surprising the reader, when a function's name suggests it's ONLY concerned with the static linker. Yes, 'build.ensure_static_linker(comp)' is passed in a Compiler, which is pretty explicit that its state is valid and used, but I prefer to more precisely convey what is done with a name than fret over too long a name because of concerns with some limit guidelines.
  3. Where _validate_tool_combination(...) is currently called, in interpreter.py certainly feels like similar kind of work currently being done by self.build.ensure_static_linker(comp) and is the earliest opportunity to inspect all the current tools at hand as well as saying exactly and only what it does, and while also allowing us to more easily broaden the set of tools and the checks done across them all.

My preference is 3 but I wouldn't object (beyond what I've already said) if others preferred 1. Any alternatives. Can we get a consensus?

@@ -349,15 +349,25 @@ class VisualStudioLinker(VisualStudioLikeLinker, StaticLinker):

"""Microsoft's lib static linker."""

id = 'lib'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general regarding these ids, I think you're adding them for static linkers while currently we mostly do not implement self.id for static linkers at all, and do not seem to check this attribute or expose it anywhere particularly?

e.g. if I delete them all from this file in git master, the tests all pass.

This makes me inclined to believe it's okay to treat the addition of these ids as part of the same logical change as the validation logic you are adding.

It also means I'm not concerned about e.g. documenting these ids or caring about whether this will change what users see -- because users can only get access to cc.get_linker_id() providing the dynamic linker's identity.

# In which case, duplicating the compile lto options for the linker args just produces
# unnecessary warnings, so we expect the compile options (but not necessarily the link
# options) to be non-empty.
self.assertTrue(len(expected_compile) > 0, f'No LTO compile args added "{cc.id}" compiler')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A set with length 0 is always false, a set with length >0 is always true... I think the usual style for unittest asserts would be:

        self.assertTrue(expected_compile, f'No LTO compile args added "{cc.id}" compiler')

The practical difference here is in the logging. When you use len() you get this error:

AssertionError: False is not true : No LTO compile args added "{cc.id}" compiler

When directly asserting self.assertTrue(expected_compile) you get this test failure message:

AssertionError: set() is not true: No LTO compile args added "{cc.id}" compiler

Although strictly speaking the only crucial bit is the explicit message passed, it doesn't hurt to permit the internal diagnostic pretty-print the actual value passed to the assertion method.

Also, it's shorter to type. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

@tristan957 tristan957 added this to the 1.3.0 milestone Jul 31, 2023
@GertyP
Copy link
Contributor Author

GertyP commented Jul 31, 2023

I saw that the 'clangx64ninja' automated tests were failing with -

$ D:\a\_temp\msys64\mingw64\bin\python3.exe D:\a\meson\meson\meson.py setup --backend=ninja --prefix /usr --libdir lib -Db_lto=true -Db_lto_threads=8 D:\a\meson\meson\tmpje_pcf0_ "D:\a\meson\meson\test cases/common\6 linkshared"
...
C compiler for the host machine: clang (clang 16.0.5 "clang version 16.0.5")
C linker for the host machine: clang ld.bfd 2.40
C++ compiler for the host machine: clang++ (clang 16.0.5 "clang version 16.0.5")
C++ linker for the host machine: clang++ ld.bfd 2.40
...
[5/10] Linking target libmylib.dll
FAILED: libmylib.dll 
"clang"  -o libmylib.dll libmylib.dll.p/libfile.c.obj "-flto" "-flto-jobs=8" "-Wl,--allow-shlib-undefined" "-shared" "-Wl,--start-group" "-Wl,--out-implib=libmylib.dll.a" "-lkernel32" "-luser32" "-lgdi32" "-lwinspool" "-lshell32" "-lole32" "-loleaut32" "-luuid" "-lcomdlg32" "-ladvapi32" "-Wl,--end-group"
libmylib.dll.p/libfile.c.obj: file not recognized: file format not recognized
...

which seems to me to be a case of mixing incompatible tools; in this case, a clang compiler and the gnu bfd linker.
I wouldn't be surprised if the two happened to be compatible when vanilla compiling but, just as appears to be the case with "clang-compiled objects to msvc's 'lib'", the introduction of clang's LTO byte-code perhaps understandably seems to upset tools downstream from LTO compilation (ld.bfd in this case).

So, can anyone help with this and point me to where I can change this 'clangx64ninja' build environment so that it explicitly selects a more appropriate linker (i.e. 'ld.lld')?
Expecting some tool family uniformity seems like a reasonable thing to have in the build automation (and user) environments but perhaps mixing the tools is done intentionally. Does anyone know if this llvm/clang + gnu linker mixing is done intentionally or have an opinion on now restricting the testing environments to avoid this mixing?

I don't know how many user build environments rely on being able to mix an llvm-based compiler with a gnu or msvc-based lib/ar or linker tool... but I can't imagine it being difficult for the user to specify a compatible tool after the incompatibility is pointed out to them. This raises further questions of how far we should go out of the way to try to accommodate a weird mixture of tools; we now have a couple of concrete cases where use of LTO seems to rule out certain toolchain mixing, but do we now try to add potentially ever more nuanced compatibility checks (e.g. in more extreme cases, do we get "clang-cl with option X, Y, or Z, used in combination with option A is incompatible with static linkers L, M, and N, and dynamic linkers U, V, W") or does it seem reasonable to now restrict/expect a single family LLVM/msvc/gnu/etc tool chain only, like what I've started in _validate_tool_combination(...)?

Finally, since my big comment hasn't provoked any further discussion or controversy on the relative benefits/drawbacks of the "search and guess" philosophy - which, in my experience is more trouble than benefit - compared to the "stop, report, and require explicit user selection" approach, I'll also trim that comment down a little.

@eli-schwartz
Copy link
Member

The comment is too long so I was planning to wait until it was a reasonable length before trying to read it and decide what I think about it.

@GertyP
Copy link
Contributor Author

GertyP commented Jul 31, 2023

It boils down to this 😄 Supposing you object to some tool chain validation on the grounds of ...

"Why even bother with this at all? If the user's mixing weird combinations of tools, they're asking for trouble and we aren't obliged to go out of our way to clean up after them": Normally, I'd agree with this, however, meson has the undesirable tendency to quietly (i.e. without requiring some action/acknowledgement from the user) go and search and guess at things (like a static linker tool) that are needed but which the user hasn't specified. So, in the example above (and maybe with other similar 'find and guess' cases), either meson needs to do a better job at guessing at a static linker or it at least should do more checking of problem combinations and better informing the user of the problem and how to fix it.

Maybe it's too late to rein in some of meson's unfortunate 'find and guess' tendencies in these regards, which implicitly invite these issues and more... but I do sometimes daydream about a 'strict' or 'explicit' mode option where if you haven't specified it, you don't get it. Clean, simple, strict, fast, informative.

@eli-schwartz
Copy link
Member

eli-schwartz commented Jul 31, 2023

For clarity on the pdb changes... your problem is typing "s" and expecting to print a variable but stepping through the function instead? Or typing "s" and expecting to step through the function, but printing a variable instead?

@GertyP
Copy link
Contributor Author

GertyP commented Jul 31, 2023

For clarity on the pdb changes... your problem is typing "s" and expecting to print a variable but stepping through the function instead? Or typing "p" and expecting to step through the function, but printing a variable instead?

Your statement on typing 's' is correct. It's exactly the same issue with 'p'. I think I may have been debugging an error in a p = subprocess.run(...) (that may not have actually been an error at all) and tried to just inspect/use 'p' in the interpreter prompt, only to see something like *** Error: Syntax error. I wasn't aware of the 'p' (evaluate expression and print) pdb command so thought it was an error coming from the execution of the command and/or arguments and options, rather than from the interpreter/debugger itself 🙄

@eli-schwartz
Copy link
Member

Merged the pdb changes as commit 7c95561 (with an updated commit message including better matching https://cbea.ms/git-commit/).

@GertyP
Copy link
Contributor Author

GertyP commented Aug 1, 2023

... with an updated commit message including better matching https://cbea.ms/git-commit/

Thanks. By the way, (and apologies in advance for this not being an ideal place to do this) I think that's an unfortunately fairly poor blog in that it consistently fails to apply its own message of communicating the 'why' along with the 'what'; little to no rationale on the reasons for many of the rules. And if someone might argue that the rationale should be pretty obvious, then I'd ask does an obvious rule add any value to the list?

Everywhere I've seen coding standards or working practices/guidelines, they don't have things like "don't write hard to read, bug-ridden code" or "avoid using Esperanto" because it's obvious and uncontentious to all. That leaves issues about which there could conceivably be some debate or require a little rationale. But rather than exhaustively go through all that this leaves, they focus on the most important, highest value items and are accompanied by a discussion and rationale, which serves a few purposes -

  • Any list of rules/guides is a small cognitive burden, which becomes harder to retain and apply, the longer it gets, so the shorter, higher impact, the better.
  • Understanding pros/cons helps better embed the rules in the minds of the readers.
  • Including rationale prevents the many, "But why?" and "This sounds a bad idea because ..."-style follow-ups, which are apparent in the discussion section of that article.
  • Laying out the rationale also shows what pros and cons have been considered to support/oppose each item, allowing new arguments to evolve the rules/guidelines in the future.

None of this is to say I particularly disagree with any of the rules, just that it totally fails to include a hugely important aspect.

@GertyP
Copy link
Contributor Author

GertyP commented Aug 1, 2023

Regarding the clangx64ninja failure I mentioned above, where it's mixing the clang compiler (that emits LTO byte-code in one test) and mixing it with the gnu bfd linker: After a bit of poking around the build automation pages, I found this -

...
name: Run Tests
run: |
  if [[ "${{ matrix.MSYS2_ARCH }}" == "x86_64" ]]; then
    # There apparently is no clean way to add to the PATH in the 
    # previous step?
    # See for instance https://github.com/msys2/setup-msys2/issues/171
    export PATH=$PATH:$PWD/pypy3local
    # Make sure it is on the PATH
    pypy3 -c "import sys; print(sys.version)"
  fi
  export PATHEXT="$PATHEXT;.py"
  if [[ '${{ matrix.COMPILER }}' == 'clang' ]]; then
    export CC=clang
    export CXX=clang++
    export OBJC=clang
    export OBJCXX=clang++
  fi
...

in the middle of this workflow file page.

I think this looks like we're only partially specify some tools, leaving the rest up to meson and whatever can be found on the local machine, but where we can explicitly specify all tools to use the same family (llvm/gnu/msvc/arm/etc). This looks out of my reach to modify. I want to see if changing the ... == 'clang ]]; then block to add -

export CC_LD=ld.lld
export CXX_LD=ld.lld
export AR=llvm-ar

and perhaps it might be suitable to also specify the linker env.vars for OBJC_LD and OBJCXX_LD with the same tool family. I'm guessing that since the compiler's clang/++, it'd also be suitable to use ld.lld for these, too.

Providing no one objects to explicit and uniform tool selection in this build environment, or is concerned that users might somehow be relying on weird tool family mixing and will struggle to explicitly specify tools from the same set/family if they encounter the new validation message, is this something someone could help change, to see if it fixes this particular build error (while not introducing new ones)?

@GertyP
Copy link
Contributor Author

GertyP commented Aug 6, 2023

An alternative that others might prefer is to add extra skipping logic just for these LTO test tool combinations.

The meson LTO tests in the case we have above ('clangx64ninja' specifying only clang/clang++ compilers and leaving meson to guess/pick the Gnu 'ld.bfd' linker, which isn't compatible with clang LTO, instead of picking 'ld.lld') would be quietly skipped.

That would be a simple work-around but also a shame because we have an opportunity to help the user out by pointing out that they're mixing tool families that are incompatible (at least under some circumstances like using LTO) and giving them the opportinity to simply explicitly select compiler/static-linker/dynamic-linker from the same family. Without that, there's a good chance they'll run into the fairly unhelpful "invalid or corrupt file: cannot read at 0x..." error when turning on 'b_lto'... But I would prefer it if we can get windows + LTO tested under the 'clangx64ninja' build test environment, which I hope can just be a case of adding explicit linker/'_LD' (and possibly static linker/'AR') env.vars using llvm-based tools, too.

Comment on lines 461 to 465
clangcl_color_args: T.Dict[str, T.List[str]] = {
'auto': ['-fcolor-diagnostics'],
'always': ['-fcolor-diagnostics'],
'never': ['-fno-color-diagnostics'],
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you added this in a previous commit, might as well just fold the commit into the other

Comment on lines 461 to 465
clangcl_color_args: T.Dict[str, T.List[str]] = {
'auto': ['-fcolor-diagnostics'],
'always': ['-fcolor-diagnostics'],
'never': ['-fno-color-diagnostics'],
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend pulling the color-related changes into their own commit since I don't see their relevance to supporting LTO on Windows.

@eli-schwartz
Copy link
Member

By the way, (and apologies in advance for this not being an ideal place to do this) I think that's an unfortunately fairly poor blog in that it consistently fails to apply its own message of communicating the 'why' along with the 'what'; little to no rationale on the reasons for many of the rules.

The rule about capitalizing the subject line is the only one lacking a "why" -- it is also the only rule I frequently ignore. :)

I would recommend pulling the color-related changes into their own commit since I don't see their relevance to supporting LTO on Windows.

👍

I think this looks like we're only partially specify some tools, leaving the rest up to meson and whatever can be found on the local machine, but where we can explicitly specify all tools to use the same family (llvm/gnu/msvc/arm/etc). This looks out of my reach to modify. I want to see if changing the ... == 'clang ]]; then block to add -

This should be doable by adding mingw-w64-${{ matrix.MSYS2_ARCH }}-lld as one of the packages to install around line 85.

An alternative that others might prefer is to add extra skipping logic just for these LTO test tool combinations.

I'd prefer not to skip tests. :D

If we are going to consider alternatives to globally setting lld on the clang test runner, the alternative would be better off as setting lld just within the definition of test_lto_threads.

I'm not sure it matters much which option to pick. It's simpler to globally set lld. But then we don't test that clang with bfd works in non-lto cases. However, I think we don't have any compelling need to test that clang + bfd works in general, since that doesn't really have anything to do with meson.

So probably just set up lld right in the workflow file. You don't need permission to modify the workflow, it doesn't require special privileges or anything. (There is stuff in the ci/ciimage/ directory that does need to be specially deployed before PRs can use them, but that's a very different story. Those handle our docker test images. We don't use docker on msys2.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this commit be folded into the one where you enabled color_out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Will get on to that, thanks.
Bit puzzled about the effect that the above Added explicit use of ld.lld ... to github workflow change has had on the tests though. It's producing clang: error: invalid linker name in argument '-fuse-ld=ld.lld', which I'd guess suggests that adding -

mingw-w64-${{ matrix.MSYS2_ARCH }}-lld

to the msys2.yml workflow file may not have adequately added and pathed/set-up access to the llvm ld.lld.exe linker.

Can anyone see what I've done wrong in that regards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really think of anything. Here is the package on msys2.

https://packages.msys2.org/package/mingw-w64-x86_64-lld?repo=mingw64

@lazka I forget if you have the expertise to help here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And some of the other tests failures are a bit more expected in that they're saying things like -

...
C compiler for the host machine: clang (clang 15.0.7 "Ubuntu clang version 15.0.7")
C linker for the host machine: clang ld.bfd 2.40

__w/meson/meson/test cases/common/1 trivial/meson.build:2:0: ERROR: Mixing tools with known incompatibilities: compiler (clang), static linker (llvm-ar), dynamic linker (ld.bfd)

which is more gnu/llvm toolchain mixing that's caught by my new validation func, which I hope can easily be fixed through similar github workflow .yml changes as I've tried to do with msys2.yml. Once I figure how/what is going wrong with that one in particular.

As discussed a little above, the other build test environments are also complaining of mixing tools with known incompatibilities, even in cases where it will actually happen to work in many cases, but not in others (i.e. LTO). So I'm hoping it's reasonable to simply get all test env. build tools explicitly aligned to the same toolchain, although this may expose users to the fact that, along with some of these build test environments, they might also be relying on - and getting away with - similar toolchain mixing, but which I hope is trivial for them to fix (through AR=llvm.ar and ..._LD=ld.lld, etc)... Or we could go a little out of our way to try and accommodate mixed tools (like clang + ld.bfd) under certain conditions, but I suspect the list of conditions and weird combinations to test may grow and become tricker to maintain; I prefer the simpler but more blunt all-same-toolchain-family validation, but I can believe that some people might prefer a little more meson complexity to avoid forcing users exposed to this scenario to have to explicitly set their tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think we weren't both on the same page about what the proposal was -- my fault, I didn't read clearly enough.

But see what I said above:

If we are going to consider alternatives to globally setting lld on the clang test runner, the alternative would be better off as setting lld just within the definition of test_lto_threads.

I'm not sure it matters much which option to pick. It's simpler to globally set lld. But then we don't test that clang with bfd works in non-lto cases. However, I think we don't have any compelling need to test that clang + bfd works in general, since that doesn't really have anything to do with meson.

I was assuming that meson would allow clang + bfd as long as LTO isn't enabled, and that we didn't "have to" test it since it surely works. With the additional perspective that this PR was causing test failures reprimanding the CI runner that clang + bfd is a "known incompatibility", I'm going to have to take back what I said.

Specifically, the following points are crucial:

  • lld is, intentionally and by design, a bfd-compatible linker. This means that:

    • GCC is intended and able to use lld by default, and many people rely on that
    • clang is intended and able to use bfd by default, and many people rely on that

    ... and in both cases, the default is usually to use bfd unless explicitly configured. This means that clang's out of the box behavior on Linux is to use bfd -- a situation that works fine as a default, though it may not work for every non-default option available.

  • binutils BFD supports clang LTO as long as binutils was built with plugin support and the LLVMgold.so plugin is available. This is the case on a number of linux distros. I successfully use clang LTO with binutils bfd.

So it is one thing to:

  • check if clang is the compiler and lto is enabled and the linker is something known to not work with that, such as -- per my understanding -- the MSVC tools, or
  • perform a configure check to see if a multi-stage "compile object file with LTO, make a static library, then try to link it into a program" sanity check works

It's another thing entirely to reject combinations we don't have proof are broken, simply because we are being overly cautious. That is the kind of thing users would submit bug reports over, saying that meson is broken -- and tbh, they would be right, meson would be broken. ;)

tl;dr

It's fine for meson to error out early during configure if it can prove that building would fail. If it has false positives, then it is not fine. In such cases, emitting a warning is acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That's just the kind of clarification I needed.
I wasn't aware that the risky-sounding mixing of clang and gnu ar/bfd tools is ubiquitous, nor that they can be clang LTO-byte-code-compatible, only under particular circumstances that we can't detect just from inspecting the static/dynamic linker ID in use.

So, I think the thing to do is reign in this blunt validation to now simply check if any of the msvc lib/link tools are used in conjunction with LTO, which we know never works.
That leaves the user exposed to trying to use LTO with clang/++ and gnu ar or bfd tools that haven't been configured with what's needed for LTO plugin compatibility. If that's the case, they'll just get a slightly unhelpful link error and that's on them (for now at least).

@tristan957
Copy link
Contributor

Test erros look extremely relevant

Comment on lines 115 to 126
export CC_LD=ld.lld
export CXX=clang++
export CXX_LD=ld.lld
export OBJC=clang
export OBJC_LD=ld.lld
export OBJCXX=clang++
export OBJCXX_LD=ld.lld
export AR=llvm-ar
Copy link
Contributor

@lazka lazka Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this?

Suggested change
export CC_LD=ld.lld
export CXX=clang++
export CXX_LD=ld.lld
export OBJC=clang
export OBJC_LD=ld.lld
export OBJCXX=clang++
export OBJCXX_LD=ld.lld
export AR=llvm-ar
export CC_LD=lld
export CXX=clang++
export CXX_LD=lld
export OBJC=clang
export OBJC_LD=lld
export OBJCXX=clang++
export OBJCXX_LD=lld
export AR=llvm-ar

Note that the combination clang+lld+libstdc++ is not really used in MSYS2, so there might be bugs.

Maybe meson CI should be switched to test with the CLANG64 environment instead? I can give that a try if wanted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help. That's fixed one significant problem. I'll work through the remaining ones when I get time.
I see one of them is a test_pefile_checksum assertion failure that I suspect is down to the previous gnu ld.bfd linker writing out some checksum field in the PE format, which llvm ld.lld may not do yet. I'll find out for sure and potentially add case to handle different ld_id = cc.get_linker_id() expectations.

@tristan957
Copy link
Contributor

https://github.com/mesonbuild/meson/actions/runs/5812159814/job/15756937887?pr=12043#step:6:1118

@GertyP
Copy link
Contributor Author

GertyP commented Aug 9, 2023

I notice that the msys2, gccx86ninja job is complaining that the 15 llvm framework tests are now running when they're expected to be skipped.

Might this be a result of unconditionally installing the lld package in the workflow file for all jobs? -

    mingw-w64-${{ matrix.MSYS2_ARCH }}-lld

This might take me a while to find out exactly where and how these framework test expectations are set, so maybe someone knows straight away if the thing to do is to guard the ...-lld package line to only be used for the 'clangx64ninja' job (and if we do something similar elsewhere so I can figure out the right syntax) or whether it's ok to change wherever the failing expectation is set (although it doesn't sound like we should concern ourselves with llvm framework tests in a 'gcc' job).

@tristan957
Copy link
Contributor

test cases/common/1 trivial/meson.build:2:0: ERROR: Mixing tools with known incompatibilities: compiler (clang), static linker (llvm-ar), dynamic linker (ld.bfd)

@tristan957
Copy link
Contributor

Might this be a result of unconditionally installing the lld package in the workflow file for all jobs?

I am thinking it is. I would just special-case that package.

@tristan957
Copy link
Contributor

@GertyP
Copy link
Contributor Author

GertyP commented Aug 29, 2023

Dear reviewers. I understand the one outstanding automated check failure is a known issue (cygwin cmake ZLIB_VERSION regex issue). Aside from that, is there anything I've missed that's preventing this from being merged?

Some kind of update or feedback would be much appreciated, whether it's, "Too busy dealing with higher priority stuff at the moment. Hope to get round to it soon", or, "Not going to take this because ...".

Again, as mentioned above, thanks to all those so far for taking the time to contribute/engage in discussions above.

@tristan957
Copy link
Contributor

I'm sure Eli will get to it when he has some time. Luckily we are tracking it in the milestone, so hopefully it isn't dropped this cycle.

@GertyP GertyP force-pushed the win_lto branch 2 times, most recently from e1787a8 to 9d8099d Compare September 15, 2023 07:40
@tristan957
Copy link
Contributor

@GertyP Looks like there are some conflicts on this branch. Wanna fix those so the PR is ready for when people can review?

@GertyP
Copy link
Contributor Author

GertyP commented Nov 2, 2023

A strange scenario popped in to my head that might account for the lack of progress/updates on merging this PR after addressing all feedback that I'm aware of.

Of course, without any clear indication otherwise, I'm just guessing it's down to massively bottlenecked individuals with long backlogs of higher priority, which is understandable (although, combined with limited feedback, maybe not ideal for fostering greater engagement and contributions from newcomers like myself)... Still, it would be reassuring to at least rule out another possibility: Is it possible that everyone is waiting for someone/anyone else to review/merge this PR and as a result, no one is actually likely to get round to it? Or does it contain changes that put it in an area where nobody is particularly confident or willing to go and merge it (and if so, what happens in such scenarios)?

Meson's automatic detection and use of the 'lib' static library tool is
perhaps working by chance with clang/clang-cl code generation.  At
least it can be incompatible when compiling with LTO enabled.  It invites
potential trouble by mixing different toolchains (llvm vs msvc) like
this.  Because of this, we also -
- Better distinguish StaticLinker 'id's. It was either that or change code
  to identify based on 'isinstance()' instead. I'm not commited to one or
  the other.
- Add some initial compiler/linker/lib-archiver family compatibility
  checking.

The new validation can't expect a valid linker: c-sharp 'csc' compiler
has none.

Because clang/clang++ & ar/ld.bfd is not necessarily an LTO-incompatible
combination (apparently it depends on the flavour/configuration of the
gnu binutils available and ones with plugin support and suitable plugins
may be just as ubiquitous as those without), unfortunately we can't have
the tool validation do a simple, blunt 'make sure all tools are from the
same toolchain family' check.  We only do a limited 'no msvc tools' check
since msvc's 'lib' or 'link' tools are always incompatible with
clang-cl's LTO byte-code.
clang-cl supports the '-fcolor-diagnostics' and '-fno-color-diagnostics'
args.

Can't re-use clang.py's 'clang_color_args':  A recent commit (3de0f6d)
changed the the clang colour options to gnu-style options, which aren't
recognised by clang-cl.
It was incorrect to expect the compiler lto args to be used in the link
params as well as the compile params.  lld-link doesn't need any '-flto'
option; it would just warn about an unknown option.
The clangx64ninja build test env explicitly specified only the clang/++
compiler env.vars (CC, CXX, OBJC, OBJCXX), which leaves meson to guess
at what static & dynamic linkers to use.  Unfortunately, it can decide
to go for the gnu 'ld.bfd' linker, which might sometimes be compatible
with clang-compiled objects but is not compatible with clang's LTO byte-
code output. Now we explictly use all llvm-based tools by adding -
- ..._LD=ld.lld
- AR=llvm-ar

Tried and failed using `..._LD=ld.lld`.  Meson seems to use the _LD
linker env.var value to directly invoke -
`clang -Wl,--version -fuse-ld=ld.lld`
which doesn't work; needs to be 'lld' (`..._LD=lld`) it seems.
The windowstests.py 'test_modules' test incorrectly assumed we're dealing
with the msvc compiler, when it's perfectly reasonable to be using the
clang-cl compiler (i.e. through CXX=clang-cl env.var), which doesn't yet
support C++ modules, so have added an extra skip check, along with a
TODO/FIXME to refine the skip check with a version check when clang-cl
supports modules.
The Gnu 'ld.bfd' linker and msvc's 'link' correctly output the PE checksum
but the llvm-based linkers like 'ld.lld' and 'lld-link' do not (yet).

This became apparent after changing the msys2 github clangx64ninja workflow
environment to explicitly use 'ld.lld' for better toolchain uniformity.
@GertyP
Copy link
Contributor Author

GertyP commented Mar 4, 2024

Can someone please either merge this or help by pointing out any blocking issues that I'm unaware of?

I do see there's one outstanding requested change but from my point of view, that's outdated and no longer applicable. Also, since it's not me who added that requested change, I think I don't have the right to close/resolve that issue.

@tristan957
Copy link
Contributor

tristan957 commented Mar 4, 2024

I tried bringing this up in Matrix. @eli-schwartz still seems to have unresolved reservations.

@GertyP
Copy link
Contributor Author

GertyP commented Mar 5, 2024

Thanks for responding @tristan957

It helps to know that there is at least a concern (as opposed to being left wondering whether I've said/done something to upset someone and ignored as a result). Still, it would be more helpful if I understood at least some of what the concerns were so that we might be able to work something out and get LTO support for clang-cl.

@tristan957
Copy link
Contributor

My guess is that it is related to #12043 (comment). I will try to advocate more for 1.5. 1.4 window has closed unfortunately.

@tristan957
Copy link
Contributor

Or #12043 (comment). Only Eli can say for sure.

@GertyP
Copy link
Contributor Author

GertyP commented Mar 7, 2024

Everything mentioned in that first linked comment is addressed in _validate_tools: No more blunt, sweeping requirement for same family of compiler, lib, and linker tools. Instead, a more limited and basic validation only of known incompatible combinations that we can be sure of from only inspecting the linker or static linker ID (i.e. lto + clang + msvc's 'lib'). This allows the user to again mix gnu tools with clang (or most other combinations), as before, but if the user tries to enable LTO then they are not helped out by any further meson checking; if their linker or static linker isn't combatible with clang's LTO bytecode, they'll be exposed to an error from the tool (for 'ld.bfd', I think that's along the lines of 'object file corruption').

Everything from the other linked comment above seems to be covered by Added explicit use of lld and llvm-ar to msys2 github workflow . I.e. make the workflow that specifies the clang compiler also use/install and specify the 'lld' and 'llvm-ar' tools in the same set of environment variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants