HTX is crippled by a large number of disks and/or multipath disks. #115
Comments
Hi Doug, we also run parted on the disks again at HTX run time. This is by design: if someone changes the system configuration between "su - htx" and "htx" (i.e. the start of the test), we might unknowingly run on a system disk or a boot partition and corrupt it. To avoid this, we run parted again just before starting the test.
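To illustrate the safety concern above, here is a minimal sketch of a pre-start recheck that looks only at the disks enabled for the run and compares them against the current mount table, rather than re-reading every partition table. This is not how HTX does it; `mounted_sources` and `disks_in_use` are hypothetical names, and the check is deliberately narrow: it assumes the usual kernel naming (`sda1`, `nvme0n1p1`) and does not cover LVM, device-mapper, swap, or unmounted boot partitions.

```python
import re
from typing import Iterable, List, Set


def mounted_sources(mounts_text: str) -> Set[str]:
    """Collect the /dev/... sources listed in /proc/mounts-style text."""
    sources = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            sources.add(fields[0])
    return sources


def disks_in_use(enabled_disks: Iterable[str], sources: Set[str]) -> List[str]:
    """Return the enabled disks that are mounted, whole or by partition.

    Naming heuristic (an assumption): partitions of a disk whose name ends
    in a digit are named <disk>p<N>, otherwise <disk><N>.
    """
    in_use = []
    for disk in enabled_disks:
        suffix = r"(p\d+)?" if disk[-1].isdigit() else r"(\d+)?"
        pattern = re.compile(re.escape(disk) + suffix)
        if any(pattern.fullmatch(src) for src in sources):
            in_use.append(disk)
    return in_use
```

On a live system the input would come from something like `Path("/proc/mounts").read_text()`. Disks that are halted in the exerciser are simply left out of `enabled_disks`, so they are never touched.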
Hi Preeti,
Thanks, Doug, for the inputs. Right now, though, we don't have the infrastructure to use IOCTL. That would need a lot of changes on our side. We need to take a call on this.
Note, the latest HTX still has severe scaling issues. I was starting up HTX on a system with 2x69 (2 paths) disks in a SAS enclosure, plus 4x5 FC disks, plus 4 local disks. HTX took over 45 minutes just to complete the "su - htx". Based on the messages, HTX is doing something fundamentally "wrong" in this area - I keep seeing the NVMe drive "repartition" continually during that time. How many times does HTX need to get the partition table for the same drive? It should only need to access each drive once. I seem to recall stumbling across some script/config file that was calling parted without specifying a disk - which is clearly unscalable, as that causes parted to scan all disks each time.
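The "access each drive once" idea can be sketched roughly as follows. This is an illustration, not HTX code: `run_parted` and `PartitionTableCache` are hypothetical names. The sketch always passes an explicit device to parted, so parted is never asked to scan every disk, and it remembers the result so that repeated lookups for the same drive do not probe it again.

```python
import subprocess
from typing import Callable, Dict


def run_parted(device: str) -> str:
    """Print the partition table of ONE named device in machine-readable form."""
    result = subprocess.run(
        ["parted", "-s", "-m", device, "print"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


class PartitionTableCache:
    """Probe each device at most once and reuse the answer afterwards."""

    def __init__(self, probe: Callable[[str], str] = run_parted) -> None:
        self._probe = probe
        self._tables: Dict[str, str] = {}
        self.probe_count = 0  # number of real probes, useful for diagnostics

    def get(self, device: str) -> str:
        if device not in self._tables:
            self._tables[device] = self._probe(device)
            self.probe_count += 1
        return self._tables[device]
```

With N disks this performs N probes in total, however many times the setup scripts ask about a given drive. A cache like this would have to be discarded before any deliberate recheck at test start, since a stale entry would defeat the purpose of that recheck.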
When HTX is preparing, during the "su - htx", it appears to be going through disks running "parted" (I see console messages indicating that various disk partition tables are being re-read, which is a known symptom of a deficiency in "parted"). With a large disk enclosure, possibly multipath, it seems to be looping and touching the same disk over and over. It would appear that whatever algorithm is being used to discover/configure disks does not scale well at all, or has a bug in it.
It did eventually finish, but took over 20 minutes. There were about 130 disks on the system. There were some multipath disks, part of a 106-disk enclosure.
It also appeared to re-touch many of the disks when HTX started (Running), in spite of all those disks being disabled (halted) in the exerciser.
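For the multipath case described above, where the same physical disk shows up under more than one device name, one way to avoid touching it repeatedly is to group the paths by a disk identifier and probe a single path per group. The sketch below is an illustration under assumptions, not HTX's discovery logic: `read_wwid`, `group_paths` and `one_path_per_disk` are hypothetical names, and it assumes the identifier is readable from a `device/wwid` attribute under `/sys/block/<name>`, which may not be present for every driver.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def read_wwid(device_name: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Read the disk identifier from sysfs (assumed location), or None."""
    path = Path(sysfs_root) / device_name / "device" / "wwid"
    try:
        return path.read_text().strip() or None
    except OSError:
        return None


def group_paths(
    device_names: Iterable[str],
    wwid_of: Callable[[str], Optional[str]] = read_wwid,
) -> Dict[str, List[str]]:
    """Group device names that report the same identifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in device_names:
        wwid = wwid_of(name)
        # Devices without an identifier are kept separate, never merged.
        key = wwid if wwid else f"unidentified:{name}"
        groups[key].append(name)
    return dict(groups)


def one_path_per_disk(
    device_names: Iterable[str],
    wwid_of: Callable[[str], Optional[str]] = read_wwid,
) -> List[str]:
    """Pick the first-seen path of each physical disk."""
    return sorted(paths[0] for paths in group_paths(device_names, wwid_of).values())
```

On a two-path enclosure this halves the number of devices that need a partition-table probe.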