IPA operations
Here's a troubleshooting guide: https://www.freeipa.org/page/Troubleshooting
IPA is normally started by systemd. "systemctl start ipa"
However when something fails, you want to use ipactl, because it shows more output.
- ipactl status|stop|start
All we absolutely need is krb5kdc, dirsrv (LDAP), and otpd (handles 2FA). Of these, dirsrv is the most likely to fail. Tomcat and Custodia are for managing certs. They're only on krb1. We use commercial certs, so they're mostly not used, but I believe they handle automatic renewal of the self-signed certs used for "kinit -n" (and skinit).
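To see quickly which components are down, the output of "ipactl status" can be filtered. This is a minimal sketch, assuming each status line has the form "NAME Service: STATE"; the function name not_running is made up for illustration.

```sh
#!/bin/sh
# Print the name of every service that "ipactl status" does not report as RUNNING.
# Assumption: status lines look like "krb5kdc Service: RUNNING".
not_running() {
    awk -F': ' '/Service: / && $2 != "RUNNING" { print $1 }'
}

# Usage on a server:
#   ipactl status 2>&1 | not_running
```

If it prints nothing, everything ipactl knows about is running.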
If it doesn't start properly, the simplest solution is to restore an old copy of the VM that works. Verify that it's fully operational. "skinit foo" for a user with 2FA is a good check. It's possible however that skinit won't work, so try
    kinit yourHackAccount
    klist    # copy the credential location, probably something starting with KEYRING: or FILE:
    kinit -T pasteCredential your2FAaccount
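The copy and paste step can be scripted. This is a sketch only, assuming MIT klist, whose first line reads "Ticket cache: LOCATION"; the helper name ccache_name is hypothetical.

```sh
#!/bin/sh
# Pull the credential cache location out of klist output.
# Assumption: MIT klist prints a line "Ticket cache: FILE:/tmp/..." or "Ticket cache: KEYRING:...".
ccache_name() {
    sed -n 's/^Ticket cache: //p' | head -n 1
}

# Usage, with the placeholder account names from the steps above:
#   kinit yourHackAccount
#   kinit -T "$(klist | ccache_name)" your2FAaccount
```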
Once you've verified that it's working, do
    skinit adminUser
    ipa-replica-manage re-initialize --from=krbx.cs.rutgers.edu

where krbx is another copy that's current.
If this process fails, it's going to be a difficult battle, involving lots of Googling of symptoms.
Most likely if 2FA isn't working, the client's certificate is out of date. This section describes that. Other things to check:
- Time: Make sure that there isn't something wrong with the time on the Kerberos servers or the device you're running the token on (probably your phone).
- See if it's just you. It's possible for your token to be out of sync. There's a resync procedure.
- Has it ever worked on this client? OTP requires a fairly recent client, and it has to be set up properly. The rest of this section describes the setup. Ubuntu 14 and Centos 6 are too old. For the Mac, you need to install Kerberos5 from Macports. For Windows, you need to install MIT's kerberos, but even then it isn't easy.
- As a last resort, I guess it's possible that otpd has failed on one server, but we've never seen that.
We normally use skinit for our administrative principal. It does "kinit -n" and then "kinit -T cachename principal." Kinit -n is used to get an initial credential cache to "armor" the query for yours. Kinit -n requests a ticket for a special "anonymous" principal. If it doesn't work, either an upgrade has broken the server, or the certificate used for the anonymous principal has expired. This happens once a year.
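The two steps can be sketched as a small function. This is not the real skinit, just an illustration of the sequence described above; the function name, the RUN dry-run variable, and the separate armor cache passed with "-c" are all assumptions.

```sh
#!/bin/sh
# Hypothetical stand-in for skinit; the real script may differ.
# Set RUN=echo to print the commands instead of running them.
armored_kinit() {
    principal=$1
    # Assumption: keep the anonymous ticket in its own cache file.
    armor=${ARMOR_CCACHE:-FILE:/tmp/krb5cc_armor_$$}
    # Step 1: anonymous ticket, used only to armor the next request.
    ${RUN:-} kinit -n -c "$armor" || return 1
    # Step 2: the real request, sent through the armored channel.
    ${RUN:-} kinit -T "$armor" "$principal"
}

# Usage: armored_kinit your2FAaccount
```

If step 1 fails, the problem is on the anonymous side (server or certificate), not with your password or token.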
Note that /etc/krb5.conf has a line
    pkinit_anchors = FILE:/etc/krb5.kdc.pem

This file has the certificate from each of the KDCs, combined into one file. It's not the usual SSL cert. The usual SSL cert is a commercial cert, which normal processes can verify. This one is a special self-signed cert, used only for "kinit -n" (in our configuration).
    openssl x509 -in /etc/krb5.kdc.pem -text

will show you the first cert in the file.
I'm reasonably sure that you take /var/kerberos/krb5kdc/kdc.crt from the three servers and concatenate them. Then use ansible to update the hosts. You only need to update the ansible-maintained hosts, since they're the only ones likely to be using kinit -n.
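Since openssl only shows the first cert in a combined file, checking the expiration of all of them means splitting the file first. A minimal sketch, with a made-up function name and output file names:

```sh
#!/bin/sh
# Split a PEM bundle read from stdin into cert-1.pem, cert-2.pem, ... in directory $1.
split_bundle() {
    outdir=$1
    awk -v dir="$outdir" '
        /-----BEGIN CERTIFICATE-----/ { n++; file = dir "/cert-" n ".pem" }
        file != "" { print > file }
        /-----END CERTIFICATE-----/ { close(file); file = "" }
    '
}

# Usage:
#   tmp=$(mktemp -d)
#   split_bundle "$tmp" < /etc/krb5.kdc.pem
#   for c in "$tmp"/cert-*.pem; do
#       openssl x509 -in "$c" -noout -subject -enddate
#   done
```

With three KDCs you'd expect three files, each with its own notAfter date.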
If you get tired of this, edit skinit to use "kgetcred -a" rather than "kinit -n". I'm using kinit -n because I'm trying to minimize the use of Rutgers-specific code, but having to do this update once a year may not be worth it.
Don't know how this happens. A yum upgrade did it in one case, but no other. Note that this process will invalidate existing kerberized mounts. I'd be inclined to unmount everything first. Even that may not be enough. Servers cache information, and that may include the kerberos credentials for the host. The following process will increment the key version number. I don't know for sure that this is an issue; just something to watch for.
ipa-client-install --uninstall
The simplest approach is to make sure that /var/lib/ipa-client/sysrestore/sysrestore.index doesn't exist and then do kerberos-boot.yml and kerberos.yml. That should re-add it.
The following works manually after uninstall, but you'll have to type the admin password, which I really don't like, or play games with key tables. Better just to let ansible do it for you.
ipa-client-install --no-ntp --no-sudo --force-join
You'll have to type "admin" and give the admin password. It doesn't seem to take 2FA.
Then run kerberos-boot.yml and kerberos.yml
Probably it will fail 1/3 of the time, because credserv is down on one of the KDCs. Normally restarting credserv will fix it, though it would be nice to know what is wrong. The client will switch to another credserv just as it does for KDCs, so it's OK for credserv not to be running on one server. The problem is if it's running but returning failures.
Our systems are supposed to be able to use all 3 KDCs. So if one being down causes failures, most likely there are systems with just one KDC's hostname set up. This section will describe how to fix that. Of course one possibility is to configure the system to use krb1, krb2 and krb4. But there's a better approach.
Ideally, our systems don't use the hostnames of the KDCs. Instead, they look up CS.RUTGERS.EDU in DNS and find the location of the servers. Kerberos, LDAP, and credserv know how to do this. It's the same convention used by MS Active Directory.
In /etc/krb5.conf, in the main [libdefaults] section, set
    dns_lookup_kdc = true

though I think that's the default. Then make sure that in the CS.RUTGERS.EDU section, no kdc is listed. Note that an explicit admin_server must be listed. That's the server used for kpasswd. The Kerberos spec doesn't allow discovery of the admin server from DNS yet (because in traditional Kerberos there can only be one admin server; IPA is fully symmetrical, so that limitation doesn't apply to it).
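A quick way to find systems that still have a KDC hostname wired in is to look for "kdc =" lines. This is a rough sketch with a hypothetical function name; it checks the whole file rather than just the CS.RUTGERS.EDU block.

```sh
#!/bin/sh
# List any "kdc = ..." lines in a krb5.conf-style file, with line numbers.
# Exit status is non-zero when there are none, i.e. DNS discovery is in use.
hardcoded_kdcs() {
    grep -nE '^[[:space:]]*kdc[[:space:]]*=' "$1"
}

# Usage:
#   hardcoded_kdcs /etc/krb5.conf || echo "no kdc lines"
```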
For applications that authenticate with LDAP, there are ways to specify the same thing. E.g. in /etc/nslcd.conf on the aberdeens, we use
uri DNS:cs.rutgers.edu
OpenLDAP uses a weirder spec, ldaps:///dc%3Dcs%2Cdc%3Drutgers%2Cdc%3Dedu. However, you can't combine that with a base. Typically the format is ldaps://host/baseDN, but if you use the domain name, it replaces the baseDN.
Here are the actual DNS entries used to find the servers:
    _ldap._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 389 krb1.cs.rutgers.edu.
    _ldap._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 389 krb4.cs.rutgers.edu.
    _ldap._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 389 krb2.cs.rutgers.edu.
    _kerberos._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 88 krb4.cs.rutgers.edu.
    _kerberos._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 88 krb2.cs.rutgers.edu.
    _kerberos._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 88 krb1.cs.rutgers.edu.
    _ldaps._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 636 krb1.cs.rutgers.edu.
    _ldaps._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 636 krb4.cs.rutgers.edu.
    _ldaps._tcp.cs.rutgers.edu. 3600 IN SRV 0 100 636 krb2.cs.rutgers.edu.

The ldaps records are non-standard. I'm not sure if anyone actually uses them.
Kgetcred uses the Kerberos entries, so it's assumed that every kdc is running credserv.
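To check what a client will actually find in DNS, the SRV answers can be reduced to host:port pairs. A small sketch, assuming the short output format of dig (priority, weight, port, target); srv_targets is a made-up name.

```sh
#!/bin/sh
# Turn "dig +short SRV" answers such as "0 100 88 krb1.cs.rutgers.edu."
# into "krb1.cs.rutgers.edu:88".
srv_targets() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Usage:
#   dig +short SRV _kerberos._tcp.cs.rutgers.edu | srv_targets
```

All three servers should show up for each service.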
In our configuration, the servers keep in sync using the Directory Server's builtin LDAP replication. There are replication agreements that control which servers update which other servers. If you look up "IPA replication" you can find documentation on it, e.g. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Identity_Management_Guide/ipa-replica-manage.html.
If you suspect that things are out of sync, try
    skinit adminUser
    ipa-replica-manage list -v krbx.cs.rutgers.edu    # for each server

Each command will show the other two servers. Because the feeds go both directions, there are 6 feeds. They all should be in sync. The error code should be 0, with a last date that is recent. Sometimes you'll get an error "can't acquire busy replica." Try again.
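Picking the bad feeds out of six blocks of output by eye gets tedious. Here is a sketch of a filter, under the assumption that the verbose listing has a "HOST: replica" line followed by an indented "last update status: Error (N) ..." line; check that against real output before relying on it. The function name is hypothetical.

```sh
#!/bin/sh
# Print "server: status" for every agreement whose last update status is not "Error (0)".
# Assumed input format (from "ipa-replica-manage list -v HOST"):
#   krb2.cs.rutgers.edu: replica
#     last update status: Error (0) Replica acquired successfully: ...
failed_agreements() {
    awk '
        /^[^ \t].*: replica$/ { server = $1; sub(/:$/, "", server) }
        /last update status:/ && $0 !~ /Error \(0\)/ {
            sub(/^[ \t]*last update status:[ \t]*/, "")
            print server ": " $0
        }
    '
}

# Usage:
#   ipa-replica-manage list -v krb1.cs.rutgers.edu | failed_agreements
```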
Note that it is possible to resync the servers. E.g. if you restart one from backup, its data will be out of date and you'll want to resync it. On the server to be reset, do "ipa-replica-manage re-initialize --from srv1.example.com", pointing to the server that will supply the data.
It's worth looking at a man page for ipa-replica-manage.
If resync doesn't fix things, you're going to be buried in the details of replication. /var/log/dirsrv/slapd-CS-RUTGERS-EDU/errors should tell you what failed, but fixing it is going to be difficult. If you can figure out which is wrong, it may be easier to restore it from backup and resync.
Here's a guide to restoration: https://www.freeipa.org/page/Backup_and_Restore
As you can see, they recommend using VM snapshots to restore a system, and then resyncing the data.
In principle all you have to do is "yum update", and then reboot. You need to reboot in order to get the new kernel.
WARNING: Do not upgrade more than one server at a time. Changes made on one are replicated to the others. Doing two at the same time could result in conflicts.
I generally clone krb1's VM and try the upgrade there. Once the clone is up, edit /etc/hosts, add an entry pointing krb1.cs.rutgers.edu to its new IP address, and reboot. At that point things should work. Verify that the system is working (see below), then do the upgrade.
In practice, there are some other things:
- Once the upgrade is done, verify that ipa was actually upgraded. yum update doesn't show output from the upgrade process, so you can't tell if it happened. We had one situation where it didn't. /var/log/ipaupgrade.log should be from the date when you did the upgrade, and it should end with something like "2019-01-02T20:37:30Z INFO The ipa-server-upgrade command was successful".
- Once you verify that the upgrade happened, I'd check to make sure that the system is working. The upgrade will have restarted all the components. See below.
- Now reboot.
- Once it's up, login and check that things are working.
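The log check from the first item can be scripted. A minimal sketch, assuming the success message is on the last line of the log as in the example above; the function name is made up.

```sh
#!/bin/sh
# Succeed only if the last line of the given log reports a successful upgrade.
upgrade_succeeded() {
    tail -n 1 "$1" | grep -q 'The ipa-server-upgrade command was successful'
}

# Usage:
#   ls -l /var/log/ipaupgrade.log      # the date should match the upgrade
#   upgrade_succeeded /var/log/ipaupgrade.log && echo "upgrade ran"
```

This only checks the message, so still look at the file date by hand.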
The Kerberos servers should be configured so that command-line utilities use the system itself. Normally we configure Kerberos so that utilities will try all three servers, but on the Kerberos server we want to be able to test the operation of the server, so /etc/krb5.conf is set up to point utilities to just that system.
- The system itself needs to be OK. Date and time need to be right. Hostname needs to be right and DNS working. Kerberos is very sensitive to date/time, hostname, and DNS.
- "ipactl status" should show all components running, except ntpd if chrony is running.
- kinit with a principal that doesn't use one-time passwords. That will check the directory server and krb5kdc.
- skinit with a principal that uses one-time passwords. That will check otpd and time synchronization.
- kgetcred -a. That will check credserv.
When all three systems have been upgraded, verify that they are in sync.
- kinit as a principal that has privileges in Kerberos. The core sysadmin's .admin principals should work
- ipa-replica-manage list -v krb1.cs.rutgers.edu
- same for krb2 and krb4
If the upgrade doesn't run at all (e.g. ipaupgrade.log is zero length or old), try "ipactl stop", then "ipactl start." The start should notice that you have new code and do the upgrade.
If there are issues, there should be python backtraces in ipaupgrade.log. If you look at the python code you may be able to figure out what happened. It's perfectly OK to try to upgrade several times until it works. "ipa-server-upgrade" will do an upgrade. If the system is up to date it shouldn't do anything. It will stop and start components as necessary.
In general, the upgrade does two kinds of things:
- It moves around and updates configuration files and certificates, if location or format has changed.
- It updates the LDAP database. The log will show a large number of LDAP operations. These changes can be because new versions of the components need different configuration (a lot of the configuration is kept in LDAP), or because the format of entries for users and groups has changed.
Kerberos is a fairly standard implementation. You should be able to use MIT documentation.
- Daemon is krb5kdc
- log to /var/log/krb5kdc.log. Every interaction is logged, so it grows very quickly.
- Main configuration is /var/kerberos/krb5kdc/kdc.conf
- Plugin to get data from LDAP. This is part of the IPA integration, but even without IPA, LDAP is currently recommended over the old native Kerberos database.
- Plugin to check password quality for user password changes. Kerberos by default has the usual rules to require different character classes, but NIST currently recommends against this, and instead suggests checking passwords against a database of known passwords. The plugin to do that is defined in /etc/krb5.conf on the 3 servers. The documentation for installing IPA gives the details of how it's set up. The same data (stored in a different database) is used by the Rutgers activator to check password changes from the web.
- /var/kerberos/krb5kdc/kdc.conf was edited to allow old encryptions. We did this when the Netapp couldn't handle new encryption types. This should be undone, probably at the next update.
LDAP is Redhat's ds389. This is the latest incarnation of the original UMich code, having passed through Netscape and Sun. Its big improvement is the ability to do symmetrical synchronization across multiple servers.
- Executable is called ns-slapd (ns meaning Netscape, slapd is the original UMich name).
- Config is /etc/dirsrv. Most of it is slapd-CS-RUTGERS-EDU. You shouldn't need to change any of it. The SSL config and keys are there, but there are utilities to update them. One file to be aware of is dse.ldif. Most of the data is in databases, but the core configuration needed for startup is in a flat file, dse.ldif. Now and then it's gotten lost or zeroed. For that reason, various backup copies are made. If the file is missing, use the most recent copy.
- Logs are in /var/log/dirsrv/slapd-CS-RUTGERS-EDU. The usual access and error files. If you have problems starting or syncing, this is the place to start.
- It's not clear how useful backups are, since the usual approach to failure is to restore a backup VM, but we do keep nightly backups of the data in /var/lib/ipa/backup/. There's a cron job to delete old ones. At times I've untar'ed the most recent backup file to look at the structure of the data. Sometimes an emacs search or grep is a better way to find something than ldap commands.
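For the dse.ldif case mentioned in the list above, finding the most recent usable copy can be sketched like this. The assumption is that the backup copies sit next to dse.ldif with names starting "dse.ldif."; the function name is hypothetical.

```sh
#!/bin/sh
# Print the newest non-empty file named dse.ldif.* in directory $1.
# Returns non-zero if there is none.
newest_dse_backup() {
    dir=$1
    # ls -t sorts newest first; -s skips copies that were zeroed.
    for f in $(ls -t "$dir"/dse.ldif.* 2>/dev/null); do
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Usage (stop the directory server before copying anything over dse.ldif):
#   newest_dse_backup /etc/dirsrv/slapd-CS-RUTGERS-EDU
```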
This is a daemon specific to IPA. MIT Kerberos has provisions for 2FA. Normally Kerberos checks passwords by using a key exchange that doesn't actually send the password to the server. But 2FA has to support various vendor proprietary technologies. There's no practical way to integrate them all with Kerberos. So Kerberos creates an encrypted channel to the server (using a session key from a 2nd Kerberos credential, called the "armor"). It sends the whole password, both the conventional part and the 2FA, through that channel to the server. The server is configured to talk to a Radius server to validate the password. If that works, it generates a Kerberos credential in the usual way. Radius is used because all the proprietary 2FA systems support Radius.
IPA wanted to support multiple 2FA systems. You can select on a per-user basis whether to use 2FA and if so which one. So they have their own Radius server whose only purpose is to look up the user in LDAP to find out what 2FA (if any) they use, and forward the request to the real Radius server. This intermediate Radius server is OTPD. The KDC actually talks to it with a Unix socket, so it's not quite standard radius.
Note that for the builtin IPA 2FA, the actual 2FA implementation is done by a plugin in the LDAP server. Since that's what we use, OTPD just calls LDAP. However if we wanted to use DUO or something else, OTPD would end up calling another Radius server. The reason 2FA works with LDAP as well as Kerberos is because it's actually implemented in the LDAP server. If we used a different scheme such as DUO, the LDAP system wouldn't know anything about it, and users using DUO wouldn't be able to authenticate against our LDAP.
- The daemon is ipa-otpd
- It doesn't seem to have a log file. I'm not sure it even has configuration.
With commercial certificates, as long as the system still trusts the CA, the only thing you need is
    ipa-server-certinstall -w -d mysite.key mysite.crt
    systemctl restart httpd.service
    systemctl restart dirsrv@CS-RUTGERS-EDU

For mysite.crt use just the server cert. It will fail if you include the chain, at least in Redhat 8.4. (It wasn't always that way.) If you need to update the CAs, read on. (However the rest of this section hasn't been tested for the newest version of IPA. As of Centos 8.1, the documentation still claims this is right.)
The command doesn't recognize alternate subjects. So you can't use the same certificate for all 3 servers.
Compare chain.pem with the old one. At the moment we're using InCommon, which is signed by UserTrust. UserTrust goes to 2038, but InCommon will expire in Oct 2024. It looks like if you need to update the InCommon cert, you pull apart chain.pem, get just that cert, and then do
    ipa-cacert-manage install -t C,, FILE
    ipa-certupdate
    update-ca-trust
I think update-ca-trust may not be needed. ipa-certupdate should only be needed on one system, but documentation on all of this is very unclear.
See if the CA's cert chain has changed. Get the intermediate file for the new cert and break it into separate certs. Now do
- certutil -d /etc/ipa/nssdb/ -L
- certutil -d /etc/ipa/nssdb/ -L -n addtrust
- openssl x509 -in FILE -fingerprint
If any of the certs in the new chain aren't there, pick a nickname that isn't in use and do
    ipa-cacert-manage -p ADMIN_PASSWORD -n NICKNAME -t C,, install ca.crt
    ipa-certupdate

You need to do this separately for each cert in the chain that's not already there, starting with the cert for the top-level CA.
In 2024 I had to update the main Comodo root cert. It is still SHA1, so to avoid the system rejecting it I had to edit /etc/crypto-policies/back-ends/nss.config and add SHA1 to the list of acceptable encryption types.
Once the chain is OK, you can import the new cert for the server:
    ipa-server-certinstall -w -d mysite.key mysite.crt
    systemctl restart httpd.service
    systemctl restart dirsrv@CS-RUTGERS-EDU

Sometimes the dir server (ns-slapd) is in a loop and won't stop. You may need to do kill -9.
Assuming that the kdc is also using commercial certs,
- in /var/kerberos/certs/, update cacert.pem kdc.crt kdc.key
systemctl restart krb5kdc
cacert.pem is chain.pem from Internet2
The date used in the Kerberos system rolls over in 2038. IPA should have fixed this, and so should the major distributions. But it's possible something has been omitted, particularly in my code. In 2037 you should probably do some testing. There's a different rollover in 2106, which is going to require changing data structures. One hopes that by that time the community has fixed things. The fix for 2038 will be to specific pieces of code. For 2106 the structures will have to be made bigger, and everything will have to be recompiled.
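The two years fall out of simple arithmetic if you assume, as I do here, that the dates are 32-bit counts of seconds since 1970: a signed counter runs out at 2^31 - 1 and an unsigned one at 2^32 - 1. The sketch below uses GNU date syntax; the function name is made up.

```sh
#!/bin/sh
# Largest second counts that fit in signed and unsigned 32-bit integers.
signed_max=2147483647      # 2^31 - 1
unsigned_max=4294967295    # 2^32 - 1

# GNU date syntax (-d @SECONDS); BSD/macOS date uses -r SECONDS instead.
rollover() {
    date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

rollover "$signed_max"      # prints 2038-01-19T03:14:07Z
rollover "$unsigned_max"    # prints 2106-02-07T06:28:15Z
```

That matches the description above: code that treats the value as signed breaks in 2038 and can be fixed in place, while 2106 is the limit of the 32-bit field itself.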