Setting up new server (replica) in an existing system

Charles Hedrick edited this page Feb 17, 2023 · 58 revisions

WARNING: This page and the files it refers to are not maintained. They have been tested on RHEL 9 but won't be updated for later releases. The current version is in the Rutgers internal GitLab.

This uses kdc1.yml, kdc2.yml and kdc-data. They are in clhedrick/kerberos-ansible. The rest of the repo is no longer maintained, but these files are up to date as of Jun 15, 2022, for RHEL 9.

See Issues section at the end to see what went wrong in previous tries. With RHEL 9, things worked quite well.

This page describes setting up a new server where you have access to an existing server with the data. That's called creating a replica. You can completely replace the servers by killing each one and recreating it as a replica. As long as one server with the data is always left you can kill the rest and recreate them.

The process is mostly done using ansible. kdc1.yml starts with a minimal RHEL 9 installation and prepares it. Then you do the normal ipa install commands. kdc2.yml then adds the Rutgers stuff. See https://github.com/clhedrick/kerberos-ansible for the ansible setup. (It is missing our key tables, for obvious reasons. Those will have to come from the production servers.) (Our ansible setup has some extra things you won't need unless you're using our extra services. So look at the yml files to make sure they do what you want.) You'll also need the ansible host file. Here's what ours currently looks like:

krb1.cs.rutgers.edu
krb2.cs.rutgers.edu
krb4.cs.rutgers.edu
[all:vars]
kdc_ips=x.x.x.x,y.y.y.y,z.z.z.z
kdc_ips should be the IP addresses of the KDCs. If this is a test setup, it's the IP addresses of the test servers.

I haven't put the IPA commands into Ansible because they depend upon the location of certificate files, and may differ if you're doing a test where DNS points to the production servers rather than your test servers.

This page is out of order. The order to do things:

  • Do a normal Redhat server install, set up the network, and get licensing working.
  • Preliminary steps
  • Ansible kdc1.yml
  • IPA install
  • Post-install. That has to be done as admin or netid.admin.
  • Ansible kdc2.yml
  • PKINIT setup. That requires certificates, which is why it's not in Ansible.
The system goes live once kdc2.yml is done. kdc1 creates firewall rules that prevent clients from seeing the system. kdc2 creates a cron job that puts clients into the firewall. If we were actually using PKINIT, I'd do it before kdc2.

Preliminary steps

  1. On the old system, before you take it down, do "ipa-replica-manage dnarange-show" and save the results
  2. subscription-manager unregister
  3. after you take it down, on another system do "ipa-replica-manage del SERVER"
After redhat installation (I recommend using the server software option, not server with GUI)
  1. get networking to work: nmcli c edit INTERFACE and set up the IP address, gateway and DNS.
  2. subscription-manager register. user and password for our subscription are in 1password
  3. I used the RHEL web interface to add subscriptions for RHEL Linux to the server. We have two servers with the same hostname. Doing it with subscription-manager confused things.
  4. Add linux-admin with password from 1password.
  5. Install python3 if needed, though it came with RHEL server 9.
  6. Verify you can do ssh linux-admin@host from where you have the ansible files
  7. Put the certificate, chain, and key on the machine. Note that Internet 2 gives you a bad chain. I used the first cert from chain.pem and replaced the rest with the current self-signed cert for " C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority" There should only be 2 certs in chain.pem. Once chain.pem is right "cat cert.pem chain.pem > fullchain.pem"
  8. Put /etc/krb5.anonymous.keytab and /etc/krb5.tgt.keytab and /etc/scripts.keytab on the machine. They aren't in ansible for security reasons. (I think scripts may no longer be used, but I can't be sure.)
  9. on krb1 only run the ansibleSetup role, because we need to be able to look at the hosts table for updating the Guacamole data
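
The certificate step above is easy to get wrong, so here is a minimal sketch of how the files could be checked before building fullchain.pem. The function names count_certs and build_fullchain are made up for this example and are not part of the Ansible repo. The only facts taken from this page are that chain.pem should contain exactly 2 certificates and that fullchain.pem is cert.pem followed by chain.pem.

```shell
# Hypothetical helpers, not part of the Ansible repo.

# Print the number of PEM certificates in a file.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Build the combined file, refusing to continue if the inputs do not
# contain the expected number of certificates.
# usage: build_fullchain CERT CHAIN OUTPUT
build_fullchain() {
    if [ "$(count_certs "$1")" != "1" ]; then
        echo "expected exactly 1 certificate in $1" >&2
        return 1
    fi
    if [ "$(count_certs "$2")" != "2" ]; then
        echo "expected exactly 2 certificates in $2" >&2
        return 1
    fi
    cat "$1" "$2" > "$3"
}
```

For example, "build_fullchain cert.pem chain.pem fullchain.pem". This only counts certificates. It does not check that the second cert in chain.pem is the right USERTrust root.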

Ansible installation

ansible-playbook -u linux-admin -k -K kdc1.yml --limit=krb1.cs.rutgers.edu --become-method=su

Now set up IPA. I haven't done this via ansible because the locations of the certs may be different, and the install tends to fail and need error recovery.

ansible-playbook -u linux-admin -k -K kdc2.yml --limit=krb1.cs.rutgers.edu --become-method=su

Post-install steps

These need to be done when kinit'ed as admin or netid.admin. That's why they're not in ansible

1) If you're doing all 3 servers, you'll be left with the 3 talking to each other, but just pairwise. You'll need to connect the remaining two. First do this to see what the current setup is

ipa topologysegment-find domain

There should be three agreements, bidirectional between each pair. Probably one is missing. If so, add it:

ipa topologysegment-add domain --leftnode=krb1.cs.rutgers.edu --rightnode=krb4.cs.rutgers.edu

Sometimes the install process creates a one-directional sync. Kill it with "ipa topologysegment-del domain NAME" and recreate it using the previous command. NAME is the name of the segment as shown by "ipa topologysegment-find domain".
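
Checking that every pair of servers has a segment can be sketched as a small shell function. This is only an illustration: missing_pairs is a made-up name, and it assumes you have copied the left and right node of each segment shown by "ipa topologysegment-find domain" into lines of the form "LEFT RIGHT". It prints each pair of servers that has no segment in either order.

```shell
# Hypothetical helper.
# stdin: one existing segment per line, as "LEFT RIGHT"
# args:  the hostnames of all KDCs
# output: one line "A B" for every pair that has no segment
missing_pairs() {
    segs=$(cat)
    while [ "$#" -gt 1 ]; do
        a=$1
        shift
        # Compare the current host with every host after it in the list.
        for b in "$@"; do
            if ! printf '%s\n' "$segs" | grep -qxF -e "$a $b" -e "$b $a"; then
                printf '%s %s\n' "$a" "$b"
            fi
        done
    done
}
```

Each line it prints is a candidate for "ipa topologysegment-add domain --leftnode=A --rightnode=B". It does not detect a one-directional segment. For that you still have to read the output of topologysegment-find.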

2) In order for credserv to work, you need to skinit as an administrator and do

ipa role-add-member "Rutgers Credserv Service" --services=credserv/krb1.cs.rutgers.edu

for all kdcs.

3) The accounts web app needs to be able to set user passwords. In order to do that, it needs to be defined as a password sync application. Unfortunately this configuration is per-server, so it has to be done on all new servers. As admin do "ldapmodify -Y GSSAPI < sync.ldif", where sync.ldif contains

dn: cn=ipa_pwd_extop,cn=plugins,cn=config
changetype: modify
add: passSyncManagersDNs
passSyncManagersDNs: krbprincipalname=http/[email protected],cn=services,cn=accounts,dc=cs,dc=rutgers,dc=edu
passSyncManagersDNs: krbprincipalname=http/[email protected],cn=services,cn=accounts,dc=cs,dc=rutgers,dc=edu
passSyncManagersDNs: uid=hedrick.admin,cn=users,cn=accounts,dc=cs,dc=rutgers,dc=edu

4) Installing a new replica could leave it without the randomly chosen range for new UIDs and GIDs, though for RHEL 9 it was OK. It will assign one automatically the first time you create a user or group. Hopefully it will pick something reasonable. If not, you may want to put back the old range that you saved from "ipa-replica-manage dnarange-show".

ipa group-add tempgroup
ipa group-del tempgroup
ipa-replica-manage dnarange-show

Make sure it has allocated a range that's close to the other servers. If not, you may want to do ipa-replica-manage dnarange-set.
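
As a rough sketch of putting back a saved range: the helper below pulls one server's range out of the output saved from "ipa-replica-manage dnarange-show". The name saved_range and the file path in the usage note are invented for this example, and it assumes each saved line looks like "SERVER: START-END", so check that against your own saved file first.

```shell
# Hypothetical helper. Assumes the saved dnarange-show output has one
# line per server in the form "SERVER: START-END".
# usage: saved_range SERVER SAVED_FILE
saved_range() {
    awk -v host="$1" -F': ' '$1 == host { print $2 }' "$2"
}
```

If the output of "saved_range krb1.cs.rutgers.edu /root/dnarange.saved" looks like a sensible START-END pair, that is the value to give to ipa-replica-manage dnarange-set along with the server name.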

5) See the section below on PKINIT setup. This is needed for "kinit -n" to work. However, we are currently using "kgetcred -a", so this isn't actually used. I still recommend doing it.

Tests

On the new KDC

  1. kinit with a user that doesn't use two factor.
  2. skinit with a user that uses two factor
  3. kinit or skinit with some user and do "kgetcred -l". It's best to do this with a user that has information registered. You can do "kgetcred -r" on one of our servers to do that.
  4. verify that "kgetcred -a" gives you a certificate as the anonymous user
  5. after a day, verify that the cron jobs are all working.

IPA setup commands

Now for IPA commands. If the servers you're setting up are in DNS, it's easy

  1. make sure there are no vestiges of the old server. For RHEL 9, "ipa host-del SERVER" did it.
  2. Make sure you don't have any kerberos tickets, "kdestroy -A". If the install fails, do this every time before a reinstall.
  3. make sure you have added the other servers to /etc/sysconfig/nftables.lcsr
  4. fullchain is a file with the cert first and then the chain. See above for issues we had. I had to fix up fullchain.pem
ipa-client-install
# in a reinstall you may need ipa-client-install --force-join
ipa-replica-install \
    --dirsrv-cert-file /root/fullchain.pem \
    --dirsrv-cert-file /root/privkey.pem \
    --http-cert-file /root/fullchain.pem \
    --http-cert-file /root/privkey.pem \
    --no-pkinit [possibly --skip-conncheck]
You'll be prompted for keys twice. Currently hit CR, but it's possible you'd need to use the "kerberos / ldap key for certs" in 1Password.

Obviously you should use the actual file names of your certificates. For the first file, I used the combined file, i.e. a file that starts with the system's cert, and then the intermediate certs.

Next

  • Post-install section
  • ansible kdc2.yml
  • PKINIT setup

PKINIT setup

If you find this section confusing, you can skip it. We're not currently using PKINIT for anything. It can be done right after ipa-replica-install, or you can wait and do it later.

  • mkdir /var/kerberos/certs
  • cp cert.pem /var/kerberos/certs/kdc.crt
  • cp privkey.pem /var/kerberos/certs/kdc.key
  • cp chain.pem /var/kerberos/certs/cacert.pem
Now set up certificates for PKINIT. Unfortunately the normal commands don't let you do that using a commercial certificate. Edit /var/kerberos/krb5kdc/kdc.conf and change two lines as follows. The lines starting with ";" are the old values, commented out:
;  pkinit_identity = FILE:/var/kerberos/krb5kdc/kdc.crt,/var/kerberos/krb5kdc/kdc.key
  pkinit_identity = FILE:/var/kerberos/certs/kdc.crt,/var/kerberos/certs/kdc.key
  pkinit_anchors = FILE:/var/kerberos/krb5kdc/kdc.crt
;  pkinit_anchors = FILE:/var/kerberos/krb5kdc/cacert.pem
  pkinit_anchors = FILE:/var/kerberos/certs/cacert.pem

Then "systemctl restart krb5kdc". To use this data, the client krb5.conf needs the following:

  pkinit_anchors = DIR:/etc/ssl/certs
  pkinit_eku_checking = kpServerAuth
  pkinit_kdc_hostname = krb2.cs.rutgers.edu
On the KDC, the hostname should be the local hostname. On a real client you'd have three lines, with the 3 servers.
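
To make the "three lines" concrete, here is a small sketch that prints the client settings for any list of KDCs. pkinit_client_lines is a made-up name. The two fixed settings are copied from the example above, and where the output goes in krb5.conf is up to you.

```shell
# Hypothetical helper: print the PKINIT client settings with one
# pkinit_kdc_hostname line per KDC given as an argument.
pkinit_client_lines() {
    printf '  pkinit_anchors = DIR:/etc/ssl/certs\n'
    printf '  pkinit_eku_checking = kpServerAuth\n'
    for kdc in "$@"; do
        printf '  pkinit_kdc_hostname = %s\n' "$kdc"
    done
}
```

On a real client you would call it with all three servers, for example "pkinit_client_lines krb1.cs.rutgers.edu krb2.cs.rutgers.edu krb4.cs.rutgers.edu". On a KDC you would pass only the local hostname.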

To test it, do "kinit -n".

Test environments

I set up a test copy of our 3 servers by duplicating snapshots of the production servers. The issue here is that the hostnames of the KDCs are built into a lot of the LDAP data. There's no practical way to rename a KDC. So instead I set up the test systems to think they're the actual KDC hosts.

If you're testing this process for one KDC, this is easy. You just add your IP address with the name of the host you're pretending to be in /etc/hosts. But if your information isn't correct in DNS, you'll need to use a special DNS server that defines krb1, krb2 and krb4 to be the test systems. Our current ubuntu systems use systemd-resolved. To use such a system as a fake DNS server, add to /etc/systemd/resolved.conf

ReadEtcHosts=yes
DNSStubListener=yes
DNSStubListenerExtra=128.6.26.16
where 128.6.26.16 is the address of that system. Then add the fake krb1, 2, and 4 to /etc/hosts.

Restart systemd-resolved.

  • Obviously you have to configure networking to have the IP addresses of the fake servers.
  • Make sure to change /etc/sysconfig/nftables.conf to have the right IP address for the other servers or you won't be able to talk between the servers.
  • Sometimes when starting from a snapshot, /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif is missing. This is the top-level data file for LDAP, so LDAP won't start if it's missing. I have a cron job that saves this file in /var/lib/ipa/backup/dse.ldif. So if it's missing you can restore it from that copy. In one case I restored it by taking it from the corresponding production server.
  • If you're starting from a snapshot, kinit as an admin user, and use "ipa-replica-manage list -v krbX.cs.rutgers.edu" for all 3 servers to verify that replication is in sync. There's a good chance you'll have to resync by doing "ipa-replica-manage re-initialize --from ...". You may need to do it a few times, until you're in sync, because it's not always clear which server to use.

Clearing remains of old servers

When you're doing testing you may need to delete a server and recreate it. You may also need to recover from failure.

Generally "ipa-replica-manage del SERVER" will delete the info in LDAP, including replication agreements. But I'd look at /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif. Look for all occurrences of krbx. If there are any replication agreements left, here's how to delete one:

ldapmodify -ZZ -x -D "cn=Directory Manager" -W -H ldap://localhost -f delreplication

where the file delreplication contains

dn: cn=meTokrb2.cs.rutgers.edu,cn=replica,cn=dc\3Dcs\2Cdc\3Drutgers\2Cdc\3Dedu
 ,cn=mapping tree,cn=config
changetype: delete

That will leave RUVs (replica update vectors). The following should show them:
ipa-replica-manage list-ruv
There are several options for dealing with them. The simplest is
ipa-replica-manage clean-dangling-ruv
Here are manual versions of these things:

Find out the replication id by looking at /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif. Here's an example:

nsds50ruv: {replica 4 ldap://krb1.cs.rutgers.edu:389} 588f7d78000100040000 5da
That's id 4. If there's nothing there, don't bother. Do these things on both remaining servers. There are builtin jobs to clean these up. The following cleans up replication ID 55:
ldapmodify -a -D "cn=Directory Manager" -W -H ldap://krb1.cs.rutgers.edu:389 -x -f cleanruv
dn: cn=clean 55, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=cs,dc=rutgers,dc=edu
replica-id: 55
cn: clean 55
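
The manual steps can be sketched as two small helpers: one lists the replica IDs found in dse.ldif, the other prints the cleanallruv task for a given ID. Both names (ruv_ids and cleanruv_ldif) are invented here, and the base DN is hard-coded to the one used on this page.

```shell
# Hypothetical helpers, not part of IPA.

# stdin: contents of dse.ldif
# output: "ID URL" for every nsds50ruv line that names a replica
ruv_ids() {
    sed -n 's/^nsds50ruv: {replica \([0-9][0-9]*\) \([^}]*\)}.*/\1 \2/p'
}

# Print the cleanallruv task entry for one replica ID.
# usage: cleanruv_ldif ID
cleanruv_ldif() {
    id=$1
    cat <<EOF
dn: cn=clean $id, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=cs,dc=rutgers,dc=edu
replica-id: $id
cn: clean $id
EOF
}
```

For example, "ruv_ids < /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif" shows which IDs exist, and "cleanruv_ldif 55 > cleanruv" writes the file used by the ldapmodify command above. Only clean an ID that belongs to a server you have actually removed.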

Issues

1) Updating from 7, I first had to disable the CA service on krb1, or the replica wouldn't accept certs. This shouldn't be an issue in the future, since none of the new servers is set up as a CA. On krb1:

ldapmodify -Y GSSAPI < noca

dn: cn=CA,cn=krb1.cs.rutgers.edu,cn=masters,cn=ipa,cn=etc,dc=cs,dc=rutgers,dc=edu
changetype:modify
delete:ipaConfigString
ipaConfigString: enabledService
ipaConfigString: caRenewalMaster

2) The key tables aren't on config.lcsr for obvious reasons, so kdc2.yml failed. I transferred the key tables by hand and temporarily removed them from the yml.

3) Somehow ended up with a one-way replication agreement.

ipa topologysegment-find domain

will show them. To kill the bad one:

ipa topologysegment-del domain krb4.cs.rutgers.edu-to-krb2.cs.rutgers.edu

To add it back

ipa topologysegment-add domain --leftnode=krb2.cs.rutgers.edu --rightnode=krb4.cs.rutgers.edu

More issues

When installing krb4, several problems.

1) When old remnants of krb4 weren't out of the database, there were lots of permission failures. Make sure the host is deleted, and that ldap/krbx.cs.rutgers.edu and hosts/krbx.cs.rutgers.edu don't exist.

2) The biggie is a failure

Restart of krb5kdc.service complete
Waiting up to 300 seconds to see our keys appear on host ldap://krb1.cs.rutgers.edu
Starting new HTTPS connection (1): krb1.cs.rutgers.edu:443
https://krb1.cs.rutgers.edu:443 "GET /ipa/keys/dm/DMHash?xxxxx   HTTP/1.1" 502 415
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

File "/usr/lib/python3.6/site-packages/ipapython/admintool.py", line 179, in execute
  return_value = self.run()
File "/usr/lib/python3.6/site-packages/ipapython/install/cli.py", line 340, in run
  return cfgr.run()
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 360, in run
  return self.execute()
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 386, in execute
  for rval in self._executor():
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 431, in __runner
  exc_handler(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 460, in _handle_execute_exception
  self._handle_exception(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 450, in _handle_exception
  six.reraise(*exc_info)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
  raise value
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 421, in __runner
  step()
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 418, in <lambda>
  step = lambda: next(self.__gen)
File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
  six.reraise(*exc_info)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
  raise value
File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
  value = gen.send(prev_value)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 655, in _configure
  next(executor)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 431, in __runner
  exc_handler(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 460, in _handle_execute_exception
  self._handle_exception(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 518, in _handle_exception
  self.__parent._handle_exception(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 450, in _handle_exception
  six.reraise(*exc_info)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
  raise value
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 515, in _handle_exception
  super(ComponentBase, self)._handle_exception(exc_info)
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 450, in _handle_exception
  six.reraise(*exc_info)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
  raise value
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 421, in __runner
  step()
File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 418, in <lambda>
  step = lambda: next(self.__gen)
File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
  six.reraise(*exc_info)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
  raise value
File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
  value = gen.send(prev_value)
File "/usr/lib/python3.6/site-packages/ipapython/install/common.py", line 65, in _install
  for unused in self._installer(self.parent):
File "/usr/lib/python3.6/site-packages/ipaserver/install/server/__init__.py", line 590, in main
  replica_install(self)
File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 402, in decorated
  func(installer)
File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 1298, in install
  custodia.import_dm_password()
File "/usr/lib/python3.6/site-packages/ipaserver/install/custodiainstance.py", line 211, in import_dm_password
  cli.fetch_key('dm/DMHash')
File "/usr/lib/python3.6/site-packages/ipaserver/secrets/client.py", line 120, in fetch_key
  r.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
  raise HTTPError(http_error_msg, response=self)

The ipa-replica-install command failed, exception: HTTPError: 502 Server Error: Proxy Error for url: https://krb1.cs.rutgers.edu/ipa/keys/dm/DMHash?xxxx
502 Server Error: Proxy Error for url: https://krb1.cs.rutgers.edu/ipa/keys/dm/DMHash?ccc

It appears that this happens only when using commercial certs. It's trying to fetch the Directory Manager password (encrypted) from the primary to put it in the new system. I commented out line 211 of custodiainstance.py,

    def import_dm_password(self):
        cli = self._get_custodia_client()
#        cli.fetch_key('dm/DMHash')
and copied it manually.

On the primary, open /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif. Look for

nsslapd-rootpw: {SSHA}
It should be under cn=config. Now shut down IPA on the new server (ipactl stop), edit /etc/dirsrv/slapd-CS-RUTGERS-EDU/dse.ldif, and replace that line with the one you copied from the original server. Restart IPA.

3) After the system went into production, we found that the ipa command failed for some systems. It turned out that these systems were talking to krb4 for the IPA command, but getting authentication from a different system. During installation, the Kerberos data for the principal HTTP/krb4.cs.rutgers.edu had failed to propagate to the other systems, probably because they already had entries for that principal that hadn't been deleted when the original server was deleted. To fix it, do

ldapsearch -ZZ -x -D "cn=Directory Manager" -W -H ldap://localhost krbprincipalname=HTTP/krb4.cs.rutgers.edu@CS.RUTGERS.EDU krbprincipalkey
on the system that is giving errors for ipa, in this case krb4. Do the same command on a different system. If this is the problem, you'll get different values for krbprincipalkey. You need to put the right value, which is the one on the system itself (krb4 in this case) on one of the other systems. It will propagate automatically to all of them.
ldapmodify -ZZ -x -D "cn=Directory Manager" -W -H ldap://localhost -f fixhttp

dn: krbprincipalname=HTTP/krb4.cs.rutgers.edu@CS.RUTGERS.EDU,cn=services,cn=accounts,dc=cs,dc=rutgers,dc=edu
changetype:modify
replace:krbprincipalkey
krbprincipalkey:: xxxx

Of course adjust the hostname in the file. Both commands will prompt for the Directory Manager password.

Note that krbprincipalkey is binary data. The ldapsearch and ldapmodify commands give it in base64. The "::" after the attribute name indicates that the value is base64.
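
Since the value is long, ldapsearch normally wraps it over several lines, which makes comparing two servers by eye error-prone. Here is a minimal sketch for pulling the value out on one line, assuming the standard LDIF convention that a continuation line starts with a single space. principal_key is a made-up name, not an IPA or OpenLDAP tool.

```shell
# Hypothetical helper. Reads ldapsearch LDIF output on stdin and prints
# the base64 value of each krbprincipalkey attribute on a single line,
# joining LDIF continuation lines (lines that begin with one space).
principal_key() {
    awk '
        tolower($0) ~ /^krbprincipalkey:: / {
            if (inkey) print val      # flush a previous value, if any
            val = substr($0, 19)      # text after "krbprincipalkey:: "
            inkey = 1
            next
        }
        inkey && /^ / { val = val substr($0, 2); next }
        inkey         { print val; inkey = 0 }
        END           { if (inkey) print val }
    '
}
```

Save the ldapsearch output on krb4 and on one other server, run each file through principal_key, and compare the two results. If they differ, the value from the server itself is the one to paste after "krbprincipalkey:: " in the fixhttp file.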