Reduced size binary #20064
Thanks for opening this discussion @fungs! I agree that Vector's current binary size is not what I would expect from a "lightweight" binary; even Vector's first "official" release (v0.10.0) had an 80 MB binary. I think that statement was likely written with Splunk, FluentD, and Logstash in mind, which are quite a bit heavier. FluentBit might be a better comparison, though FluentBit's binary is 50 MB, so it's not really that far off (I was expecting an order-of-magnitude difference). As another data point, the OpenTelemetry Collector, even without any contrib modules, is 99 MB. All of these figures are for x86_64 builds, and all of them seem pretty heavyweight for a "sidecar" deployment. I agree with your list of things to investigate, and would add a couple of things like
Another note is that Vector statically compiles most dependencies (librdkafka, libsasl, etc.), which probably doesn't help the overall binary size. This is done for portability reasons.
@jszwedko, that's exactly how I'm looking at it. I was referring to the x86_64 architecture, but I assume the picture is similar for other architectures. I was also comparing against fluentbit, which ships a 50 MiB all-in-one binary and seems to have a similar stack and purpose.
Looking at the static compilation issue:
How others do it:
Naively, and from a purely technical standpoint, I'd think one could build a set of binary artifacts and bind them together per individual use case, but I'm not familiar with the whole Rust toolchain.
Cheers
I posted some suggestions on this at: #17342 (comment)
Our internal build, with just a few sinks and a handful of other components, is about 20 MB with LTO.
Incidentally, the
I don't have much time at the moment to engage in this discussion, but this was a concern for me, and I spent a fair amount of time looking into building Vector for minimal size.
I don't recall fat vs. thin LTO making a notable difference in size. I should add that I'm skimming through some old notes for those sizes.
# `Cargo.toml` sets `opt-level = "z"` and `lto = "thin"` (not much value in fat LTO).
RUSTFLAGS="-C strip=symbols -C relocation-model=static" OPENSSL_NO_VENDOR=1 cargo build \
  --release \
  --no-default-features \
  --features "codecs-syslog,sources-stdin,sources-syslog,sinks-console" \
  --bin vector \
  --target x86_64-unknown-linux-musl
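The `opt-level` and `lto` settings mentioned in the comment above can also be applied without editing `Cargo.toml`, because Cargo reads profile overrides from `CARGO_PROFILE_RELEASE_*` environment variables. Below is a minimal sketch of that, assuming a Vector checkout as the working directory. `join_features` and `build_small_vector` are hypothetical helper names, `CARGO_PROFILE_RELEASE_STRIP=symbols` stands in for the `-C strip=symbols` RUSTFLAG, and the helper defaults to a dry run that only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: size-oriented Vector build without patching Cargo.toml.
# Cargo picks up [profile.release] overrides from CARGO_PROFILE_RELEASE_* env vars.
set -euo pipefail

# Join the arguments with commas, the format `cargo build --features` expects.
join_features() {
  local IFS=,
  printf '%s\n' "$*"
}

# Usage: build_small_vector TARGET FEATURE...
# With DRY_RUN unset or 1 it prints the command; with DRY_RUN=0 it runs it.
build_small_vector() {
  local target="$1"; shift
  local features
  features="$(join_features "$@")"
  local -a cmd=(
    env
    CARGO_PROFILE_RELEASE_OPT_LEVEL=z      # same as opt-level = "z"
    CARGO_PROFILE_RELEASE_LTO=thin         # same as lto = "thin"
    CARGO_PROFILE_RELEASE_STRIP=symbols    # replaces -C strip=symbols
    RUSTFLAGS=-Crelocation-model=static    # kept from the original command
    OPENSSL_NO_VENDOR=1
    cargo build --release --no-default-features
    --features "$features" --bin vector --target "$target"
  )
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=0 build_small_vector x86_64-unknown-linux-musl codecs-syslog sources-stdin sources-syslog sinks-console` would run the same build as the command above. Environment variables take precedence over `Cargo.toml`, so this is handy for one-off experiments.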
It'd be good to know which features are lightweight and which are heavy, as I'd like to include a lightweight version of Vector for users to manage their logs with, rather than the less pleasant logging setup that an image I maintain currently has.
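One rough way to see which features are heavy is to build a baseline and then the baseline plus one feature at a time, comparing binary sizes. This is a sketch of that idea, not tooling from this thread: it assumes a Vector checkout, GNU coreutils `stat`, and uses the feature list from the build command above as a hypothetical baseline; `to_mib`, `binary_size` and `compare_features` are illustrative names. Every build is slow, especially with LTO.

```shell
#!/usr/bin/env bash
# Rough sketch: estimate how much each Vector feature adds to the binary by
# building a baseline, then the baseline plus one extra feature at a time.
# Assumes: run from a Vector checkout, GNU coreutils `stat`, and a baseline
# taken from the earlier build command (override via BASE_FEATURES).
set -euo pipefail

TARGET="${TARGET:-x86_64-unknown-linux-musl}"
BASE_FEATURES="${BASE_FEATURES:-codecs-syslog,sources-stdin,sources-syslog,sinks-console}"

# Convert a byte count to MiB, one decimal place.
to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# Build with the given comma-separated features; print the binary size in bytes.
# Any cargo stdout is sent to stderr so that only the size is captured.
binary_size() {
  cargo build --release --no-default-features --features "$1" \
    --bin vector --target "$TARGET" >&2
  stat -c %s "target/$TARGET/release/vector"
}

# Print "<feature> <MiB added over the baseline>" for each feature argument.
compare_features() {
  local base size feature
  base="$(binary_size "$BASE_FEATURES")"
  for feature in "$@"; do
    size="$(binary_size "$BASE_FEATURES,$feature")"
    printf '%s %s\n' "$feature" "$(to_mib "$((size - base))")"
  done
}
```

Something like `compare_features sinks-http sources-kafka` would then print one line per feature. Features share dependencies, so the deltas are not strictly additive; treat them as a rough ranking rather than exact costs.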
I've been meaning to contribute at some point a
I remember hitting quite a few walls: some of it was unfamiliar territory, other parts were making sense of what the repo build scripts were doing, looking at the
At the time, official Vector release binaries were around 180 MB uncompressed 😨
That's pretty cool, cheers 👍 I modified it to output the feature list instead of running
👍 You can also try
@paolobarbolini, if you could share a short recipe for how you achieved that, it would certainly help me and others.
What we did was patch Cargo.toml with:
diff --git a/Cargo.toml b/Cargo.toml
index 78cd48b..cccfdf1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,9 @@ path = "tests/e2e/mod.rs"
# compiled via the CI pipeline.
[profile.release]
debug = false # Do not include debug symbols in the executable.
+lto = true
+codegen-units = 1
+strip = true
[profile.bench]
debug = true
Then we looked at
cargo build --release --no-default-features --features COMMA_SEPARATED_LIST_OF_FEATURES
For example
Expect the build, especially the linking step at the end, to be very slow.
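One possible way to fill in COMMA_SEPARATED_LIST_OF_FEATURES for a specific pipeline is to derive it from that pipeline's config. The sketch below assumes a TOML config and Vector's `sources-<type>` / `transforms-<type>` / `sinks-<type>` feature naming. It is a rough awk scan, not a real TOML parser: it ignores YAML/JSON configs and won't pick up codec features such as `codecs-syslog`, so check the result against the features in `Cargo.toml`. `features_from_config` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: guess Vector cargo features from a TOML pipeline config.
# Assumption: a component of type T under [sources.<id>] needs feature
# "sources-T" (likewise for transforms and sinks). Codec features are not detected.

# Usage: features_from_config path/to/vector.toml
# Prints a sorted, de-duplicated, comma-separated feature list.
features_from_config() {
  awk '
    # [sources.<id>], [transforms.<id>] or [sinks.<id>] starts a component table
    /^\[(sources|transforms|sinks)\.[^].]+\][ \t]*$/ {
      kind = $0
      sub(/^\[/, "", kind)
      sub(/\..*/, "", kind)
      next
    }
    # any other table header (e.g. [sinks.out.buffer]) ends the component
    /^\[/ { kind = ""; next }
    # first top-level type = "..." line inside a component table
    kind != "" && /^[ \t]*type[ \t]*=/ {
      t = $0
      sub(/^[^"]*"/, "", t)
      sub(/".*/, "", t)
      print kind "-" t
      kind = ""
    }
  ' "$1" | sort -u | paste -sd, -
}
```

The output can be passed straight to the build, e.g. `cargo build --release --no-default-features --features "$(features_from_config vector.toml)"`, adding any codec features by hand.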
To simplify the above, you can automatically get the features by running
Looks like it's the same as what I shared earlier: #20064 (comment)
Additional tips:
Unfortunately, while I was writing up an update to respond to this, my PC crashed and I lost a fair amount of good information :( Rough recollection (my original write-up was much better formatted and more detailed):
Use Cases
The first sentence on the Vector website states "A lightweight, ultra-fast tool for building observability pipelines". When I looked at the vector binary in the different Debian packages, it was about 127 MiB, comparable to a full Linux distribution image. That's not really lightweight for most people (including myself, of course). The binary size can be an issue in some situations like
Attempted Solutions
No response
Proposal
I don't understand why the binary is so bloated, but here are some ideas to get it down to a reasonable size, or at least to make it more plausible
I just feel bad about adding the vector binary to a container image for something as simple as forwarding metrics, and thereby doubling the image's size.
References
No response
Version
vector 0.36.1 (x86_64-unknown-linux-gnu 2857180 2024-03-11 14:32:52.417737479)