[Bug] Bug in gpinitsystem when creating multiple segments per segment host #625
Comments
Hey @lmugnano4537, welcome!🎊 Thanks for taking the time to point this out.🙌
Hey @RyanWei, could you help assign engineers to have a look? Thanks.
Hi @lmugnano4537, looking at the log
Hey @liang8283. Some more context: I've added some files to look at. The issue seems to be the same thing that happens with gpssh. You pass these utilities a host file with a list of hosts for the cluster, and in the background the utilities appear to ssh into each host in that file and grab the hostname using something like the `hostname` command.

At the very least, I would hope that gpssh and gpinitsystem (and any other utilities that behave this way) would do a duplicate hostname check when they grab the hostnames off the machines they ssh into, and if there is a duplicate, throw some sort of exception and fail explicitly. I'd be happy to implement this as a bug fix if you think there is good reason to do so; a rough sketch follows below.

The real challenge is that duplicate hostnames are not at all obvious as the cause when you only see the side effects. For instance, gpinitsystem will not use all of the specified data directories and creates a weird configuration (I've attached an example of this with comments in the files; on sdw1 and sdw2 I manually set the hostname to be the same at the OS level on both machines). This also caused all kinds of weird rsync error messages when I was running gpcheckperf, which led me down a rabbit hole trying to find a bug in rsync. I should be able to replicate this too if needed.
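A minimal sketch of what such a guard could look like, written as a standalone bash script. Everything here is illustrative: the host-file format (one host per line) and the ssh invocation are assumptions, not the utilities' actual code.

```bash
#!/bin/bash
# Illustrative duplicate-hostname guard (assumed host-file format: one host per line).
# Not actual gpssh/gpinitsystem code; requires bash 4+ for associative arrays.
HOSTFILE="$1"
declare -A SEEN   # maps reported hostname -> host-file entry that produced it

while read -r HOST; do
    [ -z "$HOST" ] && continue
    # gpssh-style: ssh into the host and ask it for its own hostname
    NAME=$(ssh -o BatchMode=yes "$HOST" hostname) || exit 1
    if [ -n "${SEEN[$NAME]}" ]; then
        echo "[FATAL] '${SEEN[$NAME]}' and '$HOST' both report hostname '$NAME'" >&2
        exit 1
    fi
    SEEN[$NAME]="$HOST"
done < "$HOSTFILE"

echo "No duplicate hostnames found."
```

Run against the same host file handed to gpinitsystem (e.g. `./check_hostnames.sh hostfile`), this would fail fast on the sdw1/sdw2 scenario described above instead of silently producing a half-built cluster.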
Cloudberry Database version
1.6.0
What happened
See attached documents. We are creating a cluster on 12 segment hosts that is supposed to have 4 primaries per segment host spread across 4 mounted disks:
/data1
/data2
/data3
/data4
What resulted instead is that we ended up with 2 primaries per segment host, and it was fairly random which disks they were created on.
As a workaround I had to change the config file to double up the disk list; gpinitsystem just seems to be cutting the list in half for some reason (see the illustration below).
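For concreteness, here is a hedged illustration of the two configurations. The `/data*/primary` paths are assumptions derived from the mount points above; the real files are in the attachments below.

```bash
# Intended gpinitsystem_config excerpt: four DATA_DIRECTORY entries
# should yield four primaries per segment host, one per mounted disk.
declare -a DATA_DIRECTORY=(/data1/primary /data2/primary /data3/primary /data4/primary)

# Workaround that produced the desired layout: double up the list so the
# (apparently halved) segment count still lands one primary on each disk.
declare -a DATA_DIRECTORY=(/data1/primary /data2/primary /data3/primary /data4/primary
                           /data1/primary /data2/primary /data3/primary /data4/primary)
```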
What you think should happen instead
It should create one primary segment per data drive listed in DATA_DIRECTORY, i.e. four primaries per segment host in this case.
How to reproduce
See attached
gpinit_bug_1.txt
gpinit_bug_workaround.txt
Operating System
Rocky Linux 8 (rocky8)
Anything else
No response
Are you willing to submit PR?
Code of Conduct