Don't leak non-exported symbols from staticlibs #104707
Comments
cc @chorman0773 |
FTR, I'm unsure there is a way to limit exported symbols from a staticlib, without both limiting the number of CGUs within the crate to 1 and preventing any upstream crates from using the |
Maybe partial linking would work? If not we will at least need to version all |
IDK whether all linkers support partial linking (in particular, I don't know about Microsoft's link.exe). Though, personally, I don't particularly want to touch
I plan to do it slightly differently when possible - for staticlibs resolve the dependencies and produce a |
That is a backward incompatible change for rustc
That will share and thus leak the global state of libstd across the cdylib boundary. This among other things will break the mitigation of #102721 to prevent catching foreign rust panics. |
On Tue, Nov 22, 2022 at 07:28 bjorn3 ***@***.***> wrote:
for staticlibs resolve the dependencies and produce a links.o "object"
that is just a linker script.
That is a backward incompatible change for rustc
rustc currently just emits the library directly into the archive, right?
I'm not particularly sure what the difference here is, except in terms of
file size, and the fact that using libstd.so.0.1 is an option in addition
to libstd.rlib.0.1.
For cdylibs, just link as normal and when dynamically linking, add
DT_RPATH (to the stdlib directory)+DT_NEEDED as needed.
That will share and thus leak the global state of libstd across the cdylib
boundary.
That is true, though I'm unsure how it is avoidable - even when statically
linking the symbols from libstd et al. need to have default visibility so
that when linking into a dylib, the symbols can be used via the dylib. It
would not make a difference on ELF.
|
The difference is that currently you can ship a staticlib as a standalone file and expect linking to succeed, but with your proposal you also need all dependencies to be available at exactly the same place.
When linking a cdylib, libstd is statically linked and none of its symbols are exported from the cdylib. When linking a rust dylib, sharing state is just fine. In fact you aren't allowed to duplicate crates in that case. |
Fair enough, though the result is potentially shipping GiB for an API surface that should take MiB.
On ELF I'm unsure how this would be achieved. Symbol visibility is controlled when the symbol is defined (in the object file), and I just send everything on to ld (post processing? What is that? I only know "add head and tail libraries to link line"). ELF shared objects don't have an "Export List"; the dynamic symbol list is just built from the static symbol list, usually excluding internal and hidden symbols. Every GLOBAL or WEAK symbol in the list with PROTECTED or DEFAULT visibility can be imported from the cdylib, and every symbol with DEFAULT visibility can additionally be replaced. These are functions of the link editor producing the files and the dynamic linker-loader resolving runtime relocations, and are far from under the control of rust as a language or any particular implementation. |
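The ELF export rules described in the preceding comment can be modelled as a small sketch. The enum and function names here are hypothetical and chosen only for illustration; they are not taken from any linker or from rustc, and the model ignores details such as version scripts and undefined symbols.

```rust
/// ELF symbol binding, restricted to the cases discussed in the thread.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Binding {
    Local,
    Global,
    Weak,
}

/// ELF symbol visibility.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Visibility {
    Default,
    Protected,
    Hidden,
    Internal,
}

/// True if the symbol lands in the dynamic symbol list and can be imported
/// from the shared object: GLOBAL or WEAK binding with DEFAULT or PROTECTED
/// visibility.
fn is_importable(binding: Binding, visibility: Visibility) -> bool {
    matches!(binding, Binding::Global | Binding::Weak)
        && matches!(visibility, Visibility::Default | Visibility::Protected)
}

/// True if another module may additionally replace (preempt) the symbol:
/// only importable symbols with DEFAULT visibility qualify.
fn is_replaceable(binding: Binding, visibility: Visibility) -> bool {
    is_importable(binding, visibility) && visibility == Visibility::Default
}

fn main() {
    let cases = [
        (Binding::Global, Visibility::Default),
        (Binding::Weak, Visibility::Protected),
        (Binding::Global, Visibility::Hidden),
        (Binding::Local, Visibility::Default),
        (Binding::Global, Visibility::Internal),
    ];
    for (b, v) in cases {
        println!(
            "{:?}/{:?}: importable={}, replaceable={}",
            b,
            v,
            is_importable(b, v),
            is_replaceable(b, v)
        );
    }
}
```

Under this model, hiding a symbol means changing its visibility (or binding) before the dynamic symbol list is built, which is the job a version script does at link time.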
Rustc passes a version script to the linker specifying exactly which symbols to export and making it hide everything else. This is but one of the reasons rustc is in charge of invoking the linker. |
Ah. My problem is that I have to support link editors that are like "Version Script? What is this? Expected PHDRS, MEMORY, or SECTIONS" |
Linker scripts and version scripts are different. Linker scripts tell what should be put where in the linked artifact. Version scripts only list which symbols are exported and which aren't, and optionally provide a version for the purpose of symbol versioning. The format of version scripts is trivial in comparison to linker scripts. See rust/compiler/rustc_codegen_ssa/src/back/linker.rs Lines 666 to 724 in b7463e8
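To show how small the format is, here is a minimal sketch that builds an anonymous GNU-style version script exporting a given list of symbols and hiding everything else. The function name `version_script` and the symbol names are hypothetical, and this is not a copy of the rustc code referenced above.

```rust
/// Builds version-script text that marks `exported` as global and every
/// other symbol as local (hidden from the dynamic symbol table).
fn version_script(exported: &[&str]) -> String {
    let mut out = String::from("{\n");
    if !exported.is_empty() {
        out.push_str("  global:\n");
        for sym in exported {
            out.push_str("    ");
            out.push_str(sym);
            out.push_str(";\n");
        }
    }
    // The wildcard under `local:` hides every symbol not listed above.
    out.push_str("  local:\n    *;\n};\n");
    out
}

fn main() {
    // `mylib_init` and `mylib_run` are made-up names for a C API.
    print!("{}", version_script(&["mylib_init", "mylib_run"]));
}
```

With GNU ld or lld, a file with this content is typically passed via `--version-script`; linkers without version-script support need a different mechanism, which is the concern raised in the following comments.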
|
I am aware of version scripts. The simplicity is not the problem. The problem is if I'm faced with a link editor that doesn't support them, which I cannot always assume. |
Which linker doesn't support it? AFAIK every platform targeted by rustc has a linker supporting them or some other way to hide symbols. |
Hmm... I'm actually not sure. Some quick research on autoconf says only GNU ld + solaris LD (and lld supports it, as will lcld). I guess older platforms may have none of the above, but IDK how old you have to get. I'm sure given enough time I could find a counterexample, but I don't want to look rn. |
Wrt. staticlibs, with the versioned symbols would it be permissible to have same/compatible versions of a rust compiler share things like the global_allocator between compiled staticlibs and final link targets (rather than rejecting with multidef errors)? |
Won't that risk mixing alloc for one allocator with free for another allocator? |
It shouldn't, since by the time any alloc calls have been made, the
allocator symbol would be resolved entirely (in my case, I'd have to
inhibit devirtualization of the global_allocator, but I'm privileged to be
able to create a way to do that, though marking the symbol as weak would be
sufficient).
…On Tue, Nov 29, 2022 at 06:14 bjorn3 ***@***.***> wrote:
Won't that risk mixing alloc for one allocator with free for another
allocator?
|
Symbol resolution may choose the __rust_alloc symbol from one allocator shim and the __rust_dealloc symbol from another allocator shim, right? Even if current linkers are likely to choose both from the same allocator shim, there is no guarantee that this will always be the case AFAIK. |
In my case, the allocator provider is a single symbol that is just a
&'static dyn GlobalAlloc.
In rustc's case, linkonce COMDAT groups exist. Put __rust_alloc and
__rust_dealloc in the same linkonce group.
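A minimal sketch of the single-provider idea, assuming the provider is one static of type `&'static dyn GlobalAlloc`. The name `ALLOCATOR_PROVIDER` is hypothetical, and the `+ Sync` bound is added here only so the reference can live in a `static`. Because allocation and deallocation both go through the same symbol, the linker has no way to pair an `alloc` from one allocator with a `dealloc` from another.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// Hypothetical single "allocator provider" symbol. Whichever definition the
// linker picks, alloc and dealloc are both reached through this one object.
static ALLOCATOR_PROVIDER: &(dyn GlobalAlloc + Sync) = &System;

/// Allocates a u64 through the provider, writes and reads it back, then
/// frees it through the same provider.
fn round_trip(value: u64) -> u64 {
    let layout = Layout::new::<u64>();
    unsafe {
        let p = ALLOCATOR_PROVIDER.alloc(layout) as *mut u64;
        assert!(!p.is_null(), "allocation failed");
        p.write(value);
        let read = p.read();
        ALLOCATOR_PROVIDER.dealloc(p as *mut u8, layout);
        read
    }
}

fn main() {
    println!("{}", round_trip(7));
}
```

With separate `__rust_alloc` and `__rust_dealloc` symbols, the same guarantee has to come from the object format instead, which is why grouping them is suggested for rustc.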
…On Tue, Nov 29, 2022 at 10:31 bjorn3 ***@***.***> wrote:
Symbol resolution may choose the __rust_alloc symbol from one allocator
shim and the __rust_dealloc symbol from another allocator shim, right? Even
if current linkers are likely to choose both from the same allocator shim,
there is no guarantee that this will always be the case AFAIK.
|
I just checked if COMDAT is supported for Mach-O. It isn't: https://godbolt.org/z/f5o9Pj6rY
|
I looked a while ago and they should be? It should be possible the way C++ handles replacing |
I couldn't find any references to COMDAT + Mach-O other than that error in the LLVM source code. For |
Maybe, though vtables would also be found in COMDATs. |
Mach-O only supports 252 sections (1 through 253, 0 is for the current image, 254 is for undefined symbols, 255 for the executable image). https://github.com/aidansteele/osx-abi-macho-file-format-reference
Lld seems to ignore S_COALESCED other than in a check if the section is a code section. If S_COALESCED behaved like COMDAT lld ignoring it would be incorrect. What I think is the case is that all weak symbols are put together in a single S_COALESCED section and then the linker coalesces every function individually rather than in a group like with COMDAT. I can't test my theory. If S_COALESCED is treated as COMDAT though, I still don't think it is very realistic to convince LLVM to support it as it would mean you can't have much more than 200 COMDAT groups in a single object file. |
Isn't this per-segment?
In that case, that would mean that the definition of C++ vtables w/o a key function (All virtual functions are inline or inherited) would be invalid, at least under the Itanium C++ ABI which AFAIK apple clang (and clang on darwin) follows (RTTI and VTable definitions provided by the same TU). |
I thought Mach-O object files (before linking) only allowed a single segment. |
Given that text and data are separate segments pre-link, I'd doubt that. |
https://github.com/aidansteele/osx-abi-macho-file-format-reference
|
Yeah, I just saw that. |
Side note, I would like to do the same thing with cdylib, though I can settle on just panic stuff, since my abi already requires me to support crossing any abi boundary that allows unwinding and treat it as native, even across abi versions - but that should have zero problem because of how panic unwinding is handled. |
cc #33221 |
To plug the leak for 1.58.0, we're trying an I wrote a post which has more details in case anyone cares. |
Great writeup @XrXr!
Interesting
That makes solving this issue a lot easier 😆 By the way rustc uses version scripts to select which symbols should be exported when linking rather than |
…bol, r=<try> Mangle rustc_std_internal_symbols functions This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (currently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`. Helps mitigate rust-lang#104707
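The effect of mangling such symbols can be illustrated with a toy sketch: if the emitted name depends on the compiler version, two toolchains no longer define the same symbol. The function `versioned_symbol`, the hashing, and the name format below are assumptions made for illustration only; this is not the mangling scheme rustc uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy illustration: derive a per-compiler-version symbol name by appending
/// a hash of the version string to the base name.
fn versioned_symbol(base: &str, compiler_version: &str) -> String {
    let mut hasher = DefaultHasher::new();
    compiler_version.hash(&mut hasher);
    format!("{base}_{:016x}", hasher.finish())
}

fn main() {
    // Version strings are made-up inputs for the example.
    println!("{}", versioned_symbol("__rust_alloc", "1.80.0"));
    println!("{}", versioned_symbol("__rust_alloc", "1.81.0"));
}
```

Two staticlibs built by the same compiler version would still produce identical names under such a scheme, so conflicts within one version remain.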
Hello, what is the status of this? I ran into this problem recently when trying to link 2 rust static libs into an existing C program. Each static lib has the symbols from the stl, causing conflicts. I do not quite understand the solution to this problem you appear to be discussing with linker scripts... Personally I would prefer a solution where rust outputs a static lib that only contains one object file that has all symbols stripped except those that should in fact be exported. I am aware that this would bloat the static lib's size somewhat, but at least in my use case I presume that enabling lto during linkage of the final binary program would remove a lot of the fat. I do not care that the static lib itself would be 20 MB larger as I do not ship it. The rust targets I use are linux-musl and windows-gnu. PS: |
Nothing's changed. I've got a PR waiting on review to at least prevent symbol conflicts between different rustc versions: #127173 But this won't help with symbol conflicts when using the same rustc version.
That was something not directly related to this issue. It is a solution to another issue with staticlibs, but doesn't affect the symbol conflicts.
That is the partial linking option I suggested. It doesn't work on Windows and last time I tried it, on macOS it wasn't really working either. Could be that I did something wrong for macOS though. |
Currently two static libraries generated by a Rust toolchain cannot be linked together in a single binary due to symbol conflicts (see rust-lang/rust#104707). This is a problem for WebAssembly targets, where dynamic linking is not stable yet. To link multiple Rust-originated static libraries together, we need to produce a single static library from an umbrella crate that re-exports everything from its dependencies. This change allows `uniffi_automerge` to be consumed as a crate dependency by the umbrella crate.
When compiling a cdylib, only #[no_mangle] symbols are exported; #[rustc_std_internal_symbol] and mangled symbols are not exported. This prevents symbol conflicts and avoids overriding symbols in ways that cause UB when using a rust cdylib in a rust program. For staticlibs, however, all symbols leak out of the staticlib, causing symbol conflicts and symbol overrides that are potentially UB. For example, when statically linking spidermonkey, you will get symbol conflicts if the rust version you compile your program with doesn't match the one the rust parts of spidermonkey were compiled with.
cc rust-lang/wg-allocators#108 (comment)
cc https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/.E2.9C.94.20spidermonkey-wasm-rs/near/309604553
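As a minimal sketch of the distinction the issue draws, consider a crate with one symbol that is part of the intended C API and one internal helper. The names `mylib_add` and `saturating_helper` are hypothetical. The attribute is written `#[unsafe(no_mangle)]`, the spelling newer toolchains accept in every edition; older compilers use plain `#[no_mangle]`.

```rust
/// Intended C API: exported from a cdylib under its unmangled name.
#[unsafe(no_mangle)]
pub extern "C" fn mylib_add(a: i32, b: i32) -> i32 {
    saturating_helper(a, b)
}

/// Internal helper with an ordinary mangled name. A cdylib does not export
/// it; per the issue, a staticlib does not hide it, and the same applies to
/// the standard library symbols bundled into the archive.
pub fn saturating_helper(a: i32, b: i32) -> i32 {
    a.saturating_add(b)
}

// `main` exists only so the sketch runs on its own; a real staticlib or
// cdylib crate would have no `main`.
fn main() {
    println!("{}", mylib_add(2, 3));
}
```

The crate type is selected in the build configuration (for example `crate-type = ["staticlib"]` or `["cdylib"]` in Cargo.toml), not in the source, so the same code behaves differently depending on which artifact is produced.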