-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tput improvements #13
Conversation
here's the output of running both the throughput_fast and throughput_slow files ``` aduffy@DuffyProBook /V/C/fsst (aduffy/improve-throughput)> cargo run --release --example throughput_fast Finished `release` profile [optimized] target(s) in 0.03s Running `target/release/examples/throughput_fast` building a simple symbol table building new text array of 1073741824 bytes beginning compression benchmark... test completed compression ratio: 3.7142857172507413 wall time = 1.323992917s tput: 810987589.2938784 bytes/sec aduffy@DuffyProBook /V/C/fsst (aduffy/improve-throughput)> cargo run --release --example throughput_slow Compiling fsst-rs v0.1.0 (/Volumes/Code/fsst) Finished `release` profile [optimized] target(s) in 0.15s Running `target/release/examples/throughput_slow` building a simple symbol table building new text array of 1073741824 bytes beginning compression benchmark... test completed compression ratio: 0.5 wall time = 3.814265125s tput: 281506866.6733019 bytes/sec ``` It seems like when we have a lot of escape codes, we're considerably (~4x) slower than when we have a lot of code table hits. Need to dig into this.
This comment was marked as outdated.
This comment was marked as outdated.
@@ -436,11 +459,11 @@ impl Compressor { | |||
|
|||
// SAFETY: `end` will point just after the end of the `plaintext` slice. | |||
let in_end = unsafe { in_ptr.byte_add(plaintext.len()) }; | |||
let in_end_sub8 = unsafe { in_end.byte_sub(8) }; | |||
let in_end_sub8 = in_end as usize - 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miri caught that the old thing was technically a dangling ptr.
i never dereferenced it but just to make it happy i work with it as an address instead of as a pointer
Self { | ||
bytes: [value, 0, 0, 0, 0, 0, 0, 0], | ||
} | ||
Self { num: value as u64 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to compile to the same thing in all but debug mode (godbolt link) but probably just clearer this way anyway
## 🤖 New release * `fsst-rs`: 0.1.0 -> 0.2.0 <details><summary><i><b>Changelog</b></i></summary><p> <blockquote> ## [0.2.0](v0.1.0...v0.2.0) - 2024-08-20 ### Other - tput improvements ([#13](#13)) </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/MarcoIeni/release-plz/). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Improvements in throughput and allocations
find_longest_symbol
stuff and rewrotecompress_count
to just usecompress
, 5x speedup for the train benchmarkvec![Symbol::EMPTY; N]
withvec![0u64; N]
and then transmuting, savingN
calls toSymbol.clone
and replacing 2D with 1D vector (allows us to use thevec!
specialization for creating a vector of all zeros)