HTX is crippled by a large number of disks and/or multipath disks. #115

Open · dougmill-ibm opened this issue Aug 8, 2017 · 4 comments

@dougmill-ibm

When HTX is preparing, during the "su - htx" step, it appears to iterate over disks running "parted" (I see console messages indicating that various disk partition tables are being re-read, a known symptom of a deficiency in "parted"). With a large disk enclosure, possibly multipath, it seems to loop and touch the same disk over and over. Whatever algorithm is being used to discover/configure disks either does not scale well at all or has a bug in it.

It did eventually finish, but took over 20 minutes. There were about 130 disks on the system. There were some multipath disks, part of a 106-disk enclosure.

It also appeared to re-touch many of the disks when HTX started (Running), in spite of all those disks being disabled (halted) in the exerciser.

@preeti-dhir (Contributor)

Hi Doug,
We run the parted command (parted -mls) during setup time (su - htx) to figure out which disks, or which of their partitions, HTX can run the test on. For multipath, we run parted and kpartx again on that particular device to figure out the mpath device and its partitions that can be used for testing (see the sketch below).
These steps are a must for us to check the availability of each device.
Now, if the number of disks on the system is large, scaling is a challenge because of the known parted issue.
We will be looking into how we can optimize this further.
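To make the flow concrete, here is a simplified sketch of the kind of per-device discovery loop described above (a hypothetical script, not the actual HTX code; the device globs and cache paths are assumptions):

```sh
#!/bin/sh
# Simplified sketch of setup-time disk discovery (not actual HTX code).
# For each whole disk, capture its partition layout once with parted.
for dev in /dev/sd? /dev/nvme?n?; do
    [ -b "$dev" ] || continue
    # -m: machine-readable output, -s: no prompts; query only this device.
    parted -ms "$dev" unit s print > "/tmp/htx_parts.$(basename "$dev")" 2>/dev/null
done

# For each multipath device, list its partition mappings with kpartx
# and query the mpath device itself, again one invocation per device.
for mpath in /dev/mapper/mpath*; do
    [ -b "$mpath" ] || continue
    kpartx -l "$mpath"    # list partition mappings without creating them
    parted -ms "$mpath" unit s print > "/tmp/htx_parts.$(basename "$mpath")" 2>/dev/null
done
```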

We also run parted on the disks again at HTX run time. This is by design: if someone changes the system configuration between "su - htx" and "htx" (i.e. the start of the test), we might unknowingly run on a system disk or a boot partition and corrupt it. To avoid this, we run parted again right before actually starting the test.

@dougmill-ibm (Author)

Hi Preeti,
Yeah, I have run into the parted issues before, too. We know that fdisk does not have the same problems, but there are rumors (which I have not been able to confirm) that fdisk is going to be removed. If fdisk is indeed going away, another approach would be to simply use an ioctl to read the partition table, although I don't know the structure of the HTX code well enough to say whether using C code is practical at that point. But, as you also state, the one thing parted does is make us aware of just how many times the partition tables are being read. I understand wanting to make sure things haven't changed since the last check, but just within the "su - htx" I see many, many calls to parted for the same disk. This seems like a place for optimization. Even a shell script could optimize some of this, I think, by using the "-nt" test on the block device against the last-captured partition data file (see the sketch below).
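Something along these lines (an untested sketch; the cache location and helper name are made up for illustration):

```sh
#!/bin/sh
# Untested sketch: re-run parted only when the device node is newer
# than the cached partition data, instead of unconditionally.
cache_dir=/tmp/htx_part_cache    # hypothetical cache location
mkdir -p "$cache_dir"

refresh_parts() {
    dev=$1
    cache="$cache_dir/$(basename "$dev")"
    # -nt: true if $dev was modified more recently than $cache,
    # or if $cache does not exist yet.
    if [ "$dev" -nt "$cache" ]; then
        parted -ms "$dev" unit s print > "$cache" 2>/dev/null
    fi
    cat "$cache"
}

refresh_parts /dev/sda
```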

@preeti-dhir (Contributor)

Thanks, Doug, for the input. Right now, though, we don't have the infrastructure to use an IOCTL; that would need a lot of change on our side, and we need to take a call on it.
For now, I am planning to remove the parted invocations at a couple of places during "su - htx" that I think may not be required. I will provide you a patch for testing.

@dougmill-ibm (Author)

Note, the latest HTX still has severe scaling issues. I was starting HTX on a system with 2x69 (2 paths) disks in a SAS enclosure, plus 4x5 FC disks, plus 4 local disks. HTX took over 45 minutes just to complete the "su - htx". Based on the messages, HTX is doing something fundamentally wrong in this area: I keep seeing the NVMe drive "repartition" continually during that time. How many times does HTX need to get the partition table for the same drive? It should only need to access each drive once. I seem to recall stumbling across some script/config file that was calling parted without specifying a disk, which is clearly unscalable since it causes parted to scan all disks on every invocation (see the example below).
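To illustrate the difference (the device name is just an example):

```sh
# No device given: parted scans and reports every disk on the system.
parted -mls

# Device given: parted touches only the one disk being configured.
parted -ms /dev/sda unit s print
```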
