sort: fix newline handling across large and/or multiple files #8746
I am a bit embarrassed to admit I did not notice #8652 before submitting this PR. I am not sure how this PR affects or overlaps with #8652, but I don't doubt that #8652 is also a valid approach to the problem. :) Edit: If nothing else, this PR still fixes the
CodSpeed Performance Report: Merging #8746 will not alter performance.
Test fixtures are now generated programmatically. Let me know if I should squash to remove the hardcoded fixtures from the git history.
If anything, this change is much leaner and easier to review than #8652 (and it passes CI, at the very least). :)
When the sort utility is searching for newlines in a large buffer, skip past any previously searched data. This fixes a quadratic-time overhead that would occur for a line that is far longer than the configured buffer size (and `START_BUFFER_SIZE`).

On my M1 MacBook Pro:

This PR also fixes a separate issue where the check `last_file_target_size != leftover_len` was used to determine whether a file is non-empty. This check could fail if the buffer was recently resized, since `leftover_len` accounts for the additional capacity but `last_file_target_size` does not. This can cause two files to be concatenated without a newline in between. To reproduce, run `head -c 8000 /dev/zero | tr '\0' 'b' >b.txt; echo aaa >a.txt; cargo run sort b.txt a.txt`. I added a new test, `test_sort::test_start_buffer`, to cover this.

Fixes #8583.
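
For reviewers who want the idea behind the newline-search fix in isolation, here is a minimal sketch. It is not the code in this PR: `LineScanner`, `searched`, and `bytes_examined` are hypothetical names used only for illustration. The sketch remembers how far the buffer has already been scanned, so each newly read chunk is searched once instead of rescanning the whole buffer after every read.

```rust
/// Hypothetical illustration (not the PR's actual code): a growing buffer
/// that remembers how much of it has already been searched for newlines.
struct LineScanner {
    buf: Vec<u8>,
    /// Number of leading bytes of `buf` already searched.
    searched: usize,
    /// Upper bound on the total bytes examined, to show the cost stays linear.
    bytes_examined: usize,
}

impl LineScanner {
    fn new() -> Self {
        Self { buf: Vec::new(), searched: 0, bytes_examined: 0 }
    }

    /// Append `chunk` and return the index of the last newline found in the
    /// newly appended bytes, if any. Bytes before `searched` are skipped.
    fn push(&mut self, chunk: &[u8]) -> Option<usize> {
        self.buf.extend_from_slice(chunk);
        let unsearched = &self.buf[self.searched..];
        self.bytes_examined += unsearched.len();
        // Search only the new bytes, from the end, for the last newline.
        let found = unsearched
            .iter()
            .rposition(|&b| b == b'\n')
            .map(|i| self.searched + i);
        self.searched = self.buf.len();
        found
    }
}
```

With a single very long line arriving in many chunks, rescanning from the start after every read costs time proportional to the square of the line length, while the sketch above examines each byte at most once.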