Create a benchmark suite for storage drivers #7300

Open
DemiMarie opened this issue Feb 23, 2022 · 9 comments
Labels
C: storage P: default Priority: default. Default priority for new issues, to be replaced given sufficient information.

Comments

@DemiMarie

How to file a helpful issue

Qubes OS release (if applicable)

N/A?

Brief summary

There needs to be an answer to whether LVM or BTRFS storage pools are faster.

@crat0z

crat0z commented Mar 9, 2022

I have recently upgraded SSDs, so I needed to reinstall Qubes. Seeing this issue, I figured I should try out both LVM thin pools and btrfs. However, I don't really know what I'm doing, so I hope the data I collected is of some help or interest.

For reference, I have a Ryzen 5 4650U, 16 GB of RAM, and a 970 EVO Plus. Both installations were default, with encryption enabled. After installation, I simply installed kdiskmark in both dom0 and the Fedora template; I did not otherwise update.

To benchmark, I used kdiskmark in dom0 and in an AppVM with the "peak performance" and "real world performance" presets. For qvm-copy-to-vm, I changed the policy to not ask before copying, and I copied a 40 GB file of zeros and 10,000 files (4 MB each) of zeros, then did the same with random data. Lastly, I timed the start-up of a VM 25 times.
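(Not part of my original runs, but for anyone wanting to repeat the non-kdiskmark steps, a rough sketch is below. The AppVM name "testvm" is a placeholder; adjust names and sizes to taste.)

```bash
#!/bin/bash
# Rough sketch of the copy and start-up timing steps described above.
# "testvm" is a placeholder AppVM name; sizes match the description (40 GB + 10,000 x 4 MB).

# Generate the zero-filled payloads inside the source VM
# (repeat with /dev/urandom instead of /dev/zero for the random-data variant).
dd if=/dev/zero of=big-zero.img bs=1M count=40960
mkdir -p small-zero
for i in $(seq 1 10000); do
    dd if=/dev/zero of=small-zero/f$i bs=1M count=4 status=none
done

# Time the inter-VM copies (qrexec policy must already allow them without asking).
time qvm-copy-to-vm testvm big-zero.img
time qvm-copy-to-vm testvm small-zero/

# From dom0: time VM start-up 25 times.
for i in $(seq 1 25); do
    time qvm-start testvm
    qvm-shutdown --wait testvm
done
```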

Results

thin pools

(Don't mind the error in the terminal; I forgot to rename the folder.)
[Screenshot: kdiskmark results for the thin pool install (Screenshot_2022-03-08_16-29-21)]

real 5.05 user 0.03 sys 0.01
real 4.88 user 0.03 sys 0.00
real 4.82 user 0.03 sys 0.01
real 4.86 user 0.03 sys 0.01
real 4.83 user 0.03 sys 0.00
real 5.04 user 0.03 sys 0.01
real 4.80 user 0.03 sys 0.01
real 4.61 user 0.03 sys 0.01
real 4.84 user 0.03 sys 0.00
real 4.91 user 0.03 sys 0.01
real 4.84 user 0.03 sys 0.01
real 4.73 user 0.03 sys 0.01
real 4.91 user 0.03 sys 0.00
real 4.66 user 0.03 sys 0.00
real 4.87 user 0.03 sys 0.00
real 4.94 user 0.03 sys 0.01
real 4.82 user 0.03 sys 0.00
real 4.89 user 0.03 sys 0.01
real 4.76 user 0.03 sys 0.00
real 4.87 user 0.03 sys 0.01
real 4.85 user 0.02 sys 0.01
real 4.86 user 0.03 sys 0.00
real 4.86 user 0.03 sys 0.00
real 4.77 user 0.03 sys 0.01
real 4.94 user 0.03 sys 0.00

btrfs

[Screenshot: kdiskmark results for the btrfs install (Screenshot_2022-03-08_14-09-03)]

real 4.59 user 0.03 sys 0.01
real 4.75 user 0.03 sys 0.01
real 4.71 user 0.03 sys 0.00
real 4.64 user 0.03 sys 0.01
real 4.72 user 0.03 sys 0.00
real 4.49 user 0.03 sys 0.01
real 4.59 user 0.02 sys 0.01
real 4.74 user 0.03 sys 0.00
real 4.64 user 0.03 sys 0.00
real 4.75 user 0.03 sys 0.01
real 4.82 user 0.03 sys 0.00
real 4.76 user 0.03 sys 0.00
real 4.64 user 0.03 sys 0.01
real 4.89 user 0.03 sys 0.01
real 4.88 user 0.03 sys 0.00
real 4.76 user 0.03 sys 0.01
real 4.80 user 0.03 sys 0.00
real 4.64 user 0.03 sys 0.00
real 4.69 user 0.03 sys 0.01
real 4.65 user 0.03 sys 0.00
real 4.76 user 0.03 sys 0.00
real 4.61 user 0.03 sys 0.01
real 4.82 user 0.03 sys 0.01
real 4.83 user 0.03 sys 0.00
real 4.68 user 0.02 sys 0.01

Observations

Not sure what to make of the btrfs dom0 results. Running kdiskmark in both dom0 and an AppVM stresses the SSD a lot: one sensor would report 50-60°C and another 60-75°C, so I could be thermal throttling. However, from my brief reading online, that should only reduce performance by about 20%. As far as I recall, qvm-copy is and always has been CPU-bound, so those results might not be as meaningful.
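(If anyone wants to rule out thermal throttling: assuming an NVMe drive and the nvme-cli package, neither of which is a given on the setup above, the controller's reported temperature can be watched while the benchmark runs.)

```bash
# Poll the NVMe SMART log every 5 seconds and show the temperature lines.
# Requires nvme-cli; adjust the controller path for your system.
watch -n 5 'sudo nvme smart-log /dev/nvme0 | grep -i temperature'
```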

@DemiMarie
Author

Looking at the benchmark results, it appears that in the real-world test BTRFS is faster for everything except sequential writes, while LVM2 has better results across the board in the peak performance test. I believe one cause of the weird behavior is that Qubes OS does not use direct I/O for the loop device, which causes all sorts of performance problems.
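(For reference, direct I/O on loop devices can be inspected and toggled with losetup; this is only a generic illustration with placeholder device and file names, not the actual change Qubes' block scripts would need.)

```bash
# Show existing loop devices, their backing files, and whether direct I/O (DIO) is on.
losetup --list --output NAME,BACK-FILE,DIO

# Enable direct I/O on an already-attached loop device (placeholder device number).
sudo losetup --direct-io=on /dev/loop0

# Or attach a backing file with direct I/O enabled from the start.
sudo losetup --direct-io=on --find --show /path/to/volume.img
```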

@tlaurion
Contributor

tlaurion commented Mar 21, 2022

@DemiMarie another thread was opened on the forum; the use case there is simply re-encrypting LUKS, where I saw speeds varying from 50 MiB/s to 150 MiB/s on a commodity SSD.

This is linked to the assumptions the different tools make from what the hardware reports (disk block size), the partition table, partition alignment, and the block size of the created partition.

To summarize, merely changing the LUKS sector size at the luksFormat step has a big impact on all performance from there on.
The basic culprits seem to be hardware not reporting its real physical block size, cryptsetup 2.4.0 being the first version to detect this properly (we don't ship it), and other filesystem tools making their own non-optimized decisions at initial partitioning. Some automated tests are being done there to validate whether aligning sector size, block size, erase block size, and partition boundaries improves speed by a large factor. One user reported a 2x speed improvement from manually aligning partitions with the erase block size on a TLC-based SSD.

The point is not necessarily to go advanced, but rather that trusting the hardware to report the right thing (i.e. a 512-byte block size nowadays) should at minimum be followed by testing whether 4096 is actually better before accepting the reported value as the truth.
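(As a sketch of the kind of minimal test meant here: LUKS2 lets you force the sector size at format time, so 512 and 4096 can be benchmarked against each other. The device below is a placeholder scratch partition, and luksFormat is destructive, so this must never be run on a disk holding data.)

```bash
# DANGER: luksFormat destroys everything on the target. Use a scratch partition only.
DEV=/dev/sdX2   # placeholder

# Format once with 512-byte sectors, benchmark, then repeat with 4096-byte sectors.
sudo cryptsetup luksFormat --type luks2 --sector-size 512  "$DEV"
sudo cryptsetup luksFormat --type luks2 --sector-size 4096 "$DEV"

# For comparison, what the kernel says the device reports.
cat /sys/block/sdX/queue/logical_block_size
cat /sys/block/sdX/queue/physical_block_size
```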

The thread is here and needs more testing: https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment

@DemiMarie
Author

@marmarek is there any chance we could ship a more recent version of cryptsetup?

@tlaurion
Contributor

tlaurion commented Mar 21, 2022

@DemiMarie @marmarek cryptsetup 2.4.0 alone might not be enough; testing is needed. The real problem is devices lying about their block sizes in what they report, which can lead cryptsetup (and the other tools, from partition table creation through LUKS partition alignment down to the filesystem, LVM or otherwise) to the same wrong decision.

Misalignment results in more blocks needing to be read/written, and in the SSD firmware having to compensate and touch regions it otherwise would not need to. That produces speed differences easily observable with cryptsetup-reencrypt, but filesystem benchmarks that don't fill the SSD cache will not show the consequences.

@DemiMarie
Author

@DemiMarie @marmarek cryptsetup 2.4.0 alone might not be enough; testing is needed. The real problem is devices lying about their block sizes in what they report, which can lead cryptsetup (and the other tools, from partition table creation through LUKS partition alignment down to the filesystem, LVM or otherwise) to the same wrong decision.

Sadly I am not aware of any solution other than fixing each of these tools separately 😞. Is there any way to find out the erase block size?
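(For what it's worth, what the kernel exposes can at least be inspected; the erase block size itself is typically only reported for eMMC/MMC devices, not for SATA/NVMe SSDs. The device names below are placeholders.)

```bash
# Logical/physical sector sizes and I/O geometry as the kernel sees them.
lsblk -o NAME,LOG-SEC,PHY-SEC,MIN-IO,OPT-IO /dev/nvme0n1

# The same values straight from sysfs.
cat /sys/block/nvme0n1/queue/logical_block_size
cat /sys/block/nvme0n1/queue/physical_block_size

# eMMC/MMC devices expose a preferred erase size; most SSDs do not report one at all.
cat /sys/block/mmcblk0/device/preferred_erase_size 2>/dev/null || echo "not reported"
```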

@crat0z

crat0z commented Mar 26, 2022

After spending some time on btrfs, I cannot use it myself. It could be related to discards, as mentioned in one of the other issues which I've since lost track of, but btrfs has a tendency to just murder I/O on my laptop. VMs become completely unresponsive if the drive is doing something for long enough.

For example, I tried syncing the Monero blockchain, and once I hit about 75 Mbps average download speed, VMs would just start hanging intermittently. Terminals don't respond, new processes won't start, and Qubes services in dom0 freeze too.

I also tried enabling direct I/O in /etc/xen/scripts/block, and unfortunately it didn't help the situation. I did not keep the results, but they were pretty much identical to before.

@rustybird

I tried syncing the Monero blockchain, and once I hit about 75 Mbps average download speed, VMs would just start hanging intermittently. Terminals don't respond, new processes won't start, and Qubes services in dom0 freeze too.

Try running filefrag on the private.img if you still have it around. monerod causes some truly monstrous fragmentation, easily tens of millions of extents after a long sync. (It's also the only blockchainy thing I've noticed corrupting its own data on crash.)

btrfs filesystem defragment private.img can do wonders here. Unfortunately it blows up the space used for shared data, so you want to get rid of all that first. Basically: shut down the VM, delete private-precache.img, delete all private.img.*@* revisions, delete relevant subvolume snapshots if you have any, consider deleting any clones of the VM, and then run defrag (rough sketch below).
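(A rough sketch of that sequence, assuming the default Btrfs pool layout under /var/lib/qubes and a VM named "myvm"; both are assumptions, so double-check the paths before deleting anything.)

```bash
# Run from dom0. Shut the VM down first.
qvm-shutdown --wait myvm

cd /var/lib/qubes/appvms/myvm

# See how fragmented the private volume currently is.
sudo filefrag private.img

# Remove data that shares extents with private.img, so defragmenting it
# doesn't balloon disk usage: the precache file and any retained revisions.
sudo rm -f private-precache.img
sudo rm -f private.img.*@*

# Defragment the private volume and re-check.
sudo btrfs filesystem defragment private.img
sudo filefrag private.img
```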

@tlaurion
Contributor

Note progress of QubesOS/qubes-core-admin#649
