Huge slowdown when decompressing big RAR archive #714

Open
justbispo opened this issue Sep 7, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@justbispo

Version

0.5.1

Description

I haven't tried this with big files of other formats, but at least with RAR files the program is very slow at decompressing compared with other programs.

I've made a few quick tests with the same archive. It includes 20582 files and the total size of the archive is 306 MB.

| Command | Duration |
| --- | --- |
| ouch d file.rar | 30s 523ms |
| ouch d --accessible file.rar | 27s 618ms |
| unrar x file.rar | 11s 621ms |
| 7zz x file.rar | 2s 879ms |

In this case it's still usable. But a few hours ago I was trying to extract an 11 GB file with almost 1 million files and couldn't even finish it: I left it running for more than 3 hours before cancelling it. I didn't time it, but 7-zip took about 1 minute to decompress it.

Maybe it's an issue with the unrar.rs library.

Current Behavior

No response

Expected Behavior

No response

Additional Information

No response

@justbispo justbispo added the bug Something isn't working label Sep 7, 2024
@AntoniosBarotsis
Contributor

That's weird. I get this:

| Command | Duration |
| --- | --- |
| ouch d test.rar -y -q | 7sec 630ms 860µs 800ns |
| unrar x test.rar -y -idq | 7sec 232ms 493µs |

These are not averaged out so some noise will be present.
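To reduce that noise, each command can be run several times and summarized. The following is a minimal sketch, not part of ouch or unrar.rs: the names `Summary`, `summarize`, and `time_command` are hypothetical helpers, and it assumes the command being timed is on `PATH` and can safely be re-run (for example with `-y` so it overwrites earlier output).

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Min / mean / max over a set of timed runs.
#[derive(Debug, PartialEq)]
struct Summary {
    min: Duration,
    mean: Duration,
    max: Duration,
}

/// Reduce a list of samples to min/mean/max. Returns `None` for an empty list.
fn summarize(samples: &[Duration]) -> Option<Summary> {
    let min = *samples.iter().min()?;
    let max = *samples.iter().max()?;
    let total: Duration = samples.iter().sum();
    let mean = total / samples.len() as u32;
    Some(Summary { min, mean, max })
}

/// Run `program` with `args` `runs` times and record the wall-clock time of
/// each run. Output of the child process is discarded so that terminal
/// printing does not dominate the measurement.
fn time_command(program: &str, args: &[&str], runs: usize) -> io::Result<Vec<Duration>> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        let status = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()?;
        let elapsed = start.elapsed();
        if !status.success() {
            // A failed run would skew the numbers, so stop instead of recording it.
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{} exited with {}", program, status),
            ));
        }
        samples.push(elapsed);
    }
    Ok(samples)
}
```

A `main` could call `time_command("ouch", &["d", "test.rar", "-y", "-q"], 5)` and the same for `unrar`, then print `summarize(&samples)` for each. Reporting the minimum next to the mean helps separate the tool's own cost from interference by other processes.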

My archive was the target folder after a debug and release build, which is ~330 MB (and apparently 5k+ unpacked files).

I'm using the unrar binary that comes with winrar.

I did notice that if the files are in contention by another process, ouch was a few seconds slower than unrar (at first I was getting 12 seconds vs 8, I think). Instead of decompressing into the target folder (which VS Code was probably reading), I put the archive in a separate folder, which brought the time down to the same as unrar.

This is something that could depend a lot on the archive structure (maybe the problem arises with way more files?) but at least at 5k it seems normal. I'll try and do some more tests later with more files and see how that goes.

@AntoniosBarotsis
Contributor

Ok, so creating a new archive by duplicating the target folder 3 times (so 4 copies total, at 1.3 GB and 22k files) did show a small slowdown:

| Command | Duration |
| --- | --- |
| ouch d test.rar -y -q | 37sec 699ms 69µs 500ns |
| unrar x test.rar -y -idq | 30sec 747ms 198µs 200ns |
| ouch d test.rar -y | 37sec 597ms 47µs 600ns |

I'm also glad logging has zero added cost now even with just 1 file 🫡

I'm not sure what might be causing such a massive time difference in your tests. The crate ouch uses is just a "High-level wrapper around the unrar C library provided by rarlab", so it isn't like they implemented something themselves; ideally it should be just as fast.

@justbispo
Author

If it helps, I can give you a bit more information.

The files I tried to decompress are archives of faces for the game Football Manager. You can find the files here. The one with 1 million files that I tried first is the 2024.00 one, at 11.19 GB, and the smaller one is the 2024.01, at 305.89 MB. I don't think I can link it for legal reasons (not sure if it counts as piracy), but if you'd like to try the tests with my example I can send it in another way. I've searched the internet for test files similar to mine but couldn't find any.
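Since a comparable public test archive is hard to find, a synthetic tree with many small files can stand in for it. This is a rough sketch under stated assumptions: `generate_tree` and `XorShift64` are hypothetical helpers, the directory layout and file names are invented, and the pseudo-random contents only approximate image data that barely compresses.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Tiny xorshift generator so file contents are not trivially compressible.
/// Not cryptographic; it only avoids pulling in an external crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Create `dirs` subdirectories under `root`, each holding `files_per_dir`
/// files of `file_size` bytes. Returns the number of files written.
fn generate_tree(
    root: &Path,
    dirs: usize,
    files_per_dir: usize,
    file_size: usize,
) -> io::Result<usize> {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let mut written = 0;
    for d in 0..dirs {
        let dir = root.join(format!("dir_{:05}", d));
        fs::create_dir_all(&dir)?;
        for f in 0..files_per_dir {
            let path = dir.join(format!("file_{:05}.bin", f));
            let mut out = BufWriter::new(File::create(path)?);
            let mut remaining = file_size;
            // Fill the file 8 bytes at a time, truncating the last chunk.
            while remaining > 0 {
                let bytes = rng.next_u64().to_le_bytes();
                let n = remaining.min(bytes.len());
                out.write_all(&bytes[..n])?;
                remaining -= n;
            }
            out.flush()?;
            written += 1;
        }
    }
    Ok(written)
}
```

For instance, `generate_tree(Path::new("synthetic"), 200, 100, 15_000)` would write 20,000 files of about 15 KB each, roughly the shape of the smaller archive; the result can then be packed with `rar a -r synthetic.rar synthetic` and extracted with each tool.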

I'm using Arch Linux, so both ouch and unrar come from the official Arch Linux repository, and the 7zz command comes from the 7-zip-bin package in the Arch User Repository.

I've also made the same tests you did (tho the size and number of files of the target directory were a bit different than yours) and the results are similar to yours. 7-zip still performed a bit better in both tests.

I tend to believe that there's some issue with unrar.rs with archives that have such a big number of small files. I wanted to test if the issue was with that crate, but no project that offers an easy CLI interface uses it. And the example basic_extract.rs didn't work for me (and I have no experience with Rust).

@muja

muja commented Sep 9, 2024

> And the example basic_extract.rs didn't work for me (and I have no experience with Rust).

Hey, unrar.rs author here. What issues did you face with that example?

@muja

muja commented Sep 9, 2024

Unfortunately I cannot register on that page, it wants me to link a Steam Account.

I'll try to reproduce some other way

@justbispo
Author

> Hey, unrar.rs author here. What issues did you face with that example?

When I try to run cargo run --example basic_extract test.rar I get this error:

thread 'main' panicked at examples/basic_extract.rs:7:14:
called `Result::unwrap()` on an `Err` value: EOpen@Open (Could not open archive)

The basic_list example works tho.

@ttys3
Contributor

ttys3 commented Dec 15, 2024

Decompression Speed Test

All these tests were run under the Linux /tmp dir (which is tmpfs, i.e. just RAM, so there's no I/O bottleneck).

cpu: 12th Gen Intel(R) Core(TM) i7

Step 1: Download the master.zip

curl -LZO https://github.com/torvalds/linux/archive/refs/heads/master.zip
ls -lh master.zip
# Output:
# 283 M

Step 2: Extract master.zip using unzip

unzip master.zip -d kernel-master
# took 7s
du -sh kernel-master
# Output:
# 1.7G  kernel-master

Step 3: Compress using rar

rar a -r -mt20 kernel.rar kernel-master
# took 19s
rm -rf kernel-master

Step 4: Verify the compressed files

ls *.zip *.rar -lh
# Output:
# .rw-r--r-- ttys3 ttys3 264 MB Sun Dec 15 21:22:20 2024  kernel.rar
# .rw-r--r-- ttys3 ttys3 283 MB Sun Dec 15 21:18:22 2024  master.zip

Step 5: Extract kernel.rar using unrar

unrar x kernel.rar ./xxx/
# took 6s

Step 6: Extract kernel.rar using ouch

ouch d kernel.rar -d ./ouch-xxx/
# took 1m2s
# Files unpacked: 93997
ouch d kernel.rar --accessible -d ./ouch-xxx/
# took 1m1s

Comparison Table

| Tool | Operation | Time | Notes |
| --- | --- | --- | --- |
| unzip | Extract master.zip | 7s | |
| rar | Compress kernel-master | 19s | 264 MB final size |
| unrar | Extract kernel.rar | 6s | |
| ouch | Extract kernel.rar | 1m2s | main branch release build, without accessibility option |
| ouch | Extract kernel.rar | 1m1s | main branch release build, with accessibility option |

This table summarizes the decompression speed of different tools and highlights their performance for the given operations.

We can see that unzip and unrar both took almost the same time (6s-7s), while the unrar lib (which ouch uses) took 60s+.
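One way to read these numbers is as extra wall-clock time per extracted file. The helper below is a hypothetical illustration of that arithmetic only; it does not identify where the time goes.

```rust
/// Extra wall-clock time per file, in milliseconds, of a slow run compared
/// with a fast run over the same archive.
fn extra_ms_per_file(slow_secs: f64, fast_secs: f64, files: u64) -> f64 {
    (slow_secs - fast_secs) * 1000.0 / files as f64
}
```

With the kernel archive above (62s vs 6s over 93997 files) this gives about 0.6 ms per file; with the original report's smaller archive (30.523s vs 11.621s over 20582 files) it gives about 0.9 ms per file. Both are of the same order, which would be consistent with a per-file cost rather than a throughput limit, though that remains an assumption until profiled.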

related issue: muja/unrar.rs#61
