
SteamOS auto repair process

Michael DeGuzis edited this page May 23, 2017 · 10 revisions


About

If the system repeatedly fails to start, or has been shut down abruptly, this script will likely run the next time you boot. Technically speaking, when the greeter service (lightdm) fails to start, systemd invokes the steamos-autorepair script through a dedicated service unit. A list of services on SteamOS can be found here.

The main autorepair script attempts to:

  • Fix dpkg configurations that may be broken/unfinished
  • Fix unfinished/broken packages with apt-get -f -y install
  • Rebuild DKMS modules (such as the NVIDIA drivers)

Plymouth is called to display a recovery message and a progress bar.

Script outputs updated: 20160711

What triggers this

The SteamOS recovery process involves three key files (as far as is currently known):

/lib/systemd/system/lightdm.service.d/steamos-autorecover.conf

This drop-in configuration file is the trigger. It tells systemd that if the lightdm service fails, the static steamos-autorepair service should be activated.

[Unit]
OnFailure=steamos-autorepair.service
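
To confirm the drop-in is in place, the OnFailure= value can be read back out of the file. The following is a minimal sketch: unit_on_failure is a hypothetical helper, not part of SteamOS, and it assumes only the plain key=value layout shown above.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not shipped with SteamOS.
# Prints the OnFailure= target(s) declared in a unit or drop-in file.
unit_on_failure() {
    local conf_file="$1"
    # Keep everything after the first '=' on each OnFailure line.
    grep '^OnFailure=' "$conf_file" | cut -d= -f2-
}

# Example, using the path from this page:
# unit_on_failure /lib/systemd/system/lightdm.service.d/steamos-autorecover.conf
```

On a running system, systemctl show lightdm.service -p OnFailure should report the same target once systemd has loaded the drop-in.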

steamos-autorepair.service

This is the static systemd unit file. Once the lightdm conf file above activates this service, it runs /usr/bin/steamos-autorepair.sh. Note that the service type is "oneshot": the script runs once per activation and is not retried on its own (say, if lightdm actually starts after this process). If lightdm fails again, the whole process starts over from the top.

[Unit]
Description=SteamOS Autorepair

[Service]
ExecStart=/usr/bin/steamos-autorepair.sh
Type=oneshot
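
Because the repair script reboots the machine when it finishes, any record of a run ends up in an earlier boot's journal. The sketch below lists the journal entries for the unit across all stored boots. It assumes persistent journal storage is enabled, which this page does not confirm for SteamOS, and autorepair_journal is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not shipped with SteamOS.
# Shows journal entries for the autorepair unit across all stored boots.
# Assumption: journald keeps logs across reboots (persistent storage).
autorepair_journal() {
    local unit="${1:-steamos-autorepair.service}"
    journalctl -u "$unit" --no-pager
}

# Example:
# autorepair_journal
```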

/usr/bin/steamos-autorepair.sh

This is the repair process itself.

#!/bin/bash

# 10s is the time window where systemd stops trying to restart a service
sleep 15

# if lightdm is not running after 15s, it's not a random crash, but many
# otherwise nothing to do, systemd will call us again if it crashes more
if pidof -x lightdm > /dev/null
then
    exit 0
fi

# can't have this be a dependency of our unit or it'll trigger too early
service plymouth-reboot start

plymouth display-message --text="SteamOS is attempting to recover from a fatal error"
plymouth system-update --progress=10
dpkg --configure -a
apt-get -f -y install
plymouth system-update --progress=50

#
# force rebuild dkms modules
#
dkms_modules=`find /usr/src -maxdepth 2 -name dkms.conf`
arr=($dkms_modules)
let prog=50

# compute how far to move the progress bar for each module
let delta="50/${#arr[@]}"

for i in $dkms_modules
do
  module_name=`grep ^PACKAGE_NAME $i | cut -d= -f2 | tr -d \"`
  module_version=`grep ^PACKAGE_VERSION $i | cut -d= -f2 | tr -d \"`

  dkms remove $module_name/$module_version --all
  dkms build -m $module_name -v $module_version
  dkms install -m $module_name -v $module_version
  let prog="$prog + $delta"
  plymouth system-update --progress=$prog
done

plymouth system-update --progress=100
plymouth display-message --text="Recovery complete, restarting..."

sleep 1

reboot
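
The DKMS loop above does two small things that are easy to try in isolation: it pulls PACKAGE_NAME and PACKAGE_VERSION out of each dkms.conf, and it divides the remaining 50% of the progress bar evenly across the modules. The sketch below mirrors both steps. The helper names dkms_field and progress_delta are hypothetical, and the guard for zero modules is an addition: as written, the original division would report a division-by-zero error when no dkms.conf is found, although the loop would then simply not run.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the DKMS loop; names are illustrative only.

# Print one field (e.g. PACKAGE_NAME) from a dkms.conf, with quotes stripped.
dkms_field() {
    local field="$1" conf="$2"
    grep "^${field}=" "$conf" | cut -d= -f2 | tr -d '"'
}

# Print the per-module progress step for the 50..100 range.
# Integer division, so three modules give a step of 16.
progress_delta() {
    local count="$1"
    if [ "$count" -gt 0 ]; then
        echo $(( 50 / count ))
    else
        echo 0   # no modules found: nothing to step through
    fi
}
```

Since the division is integer division, the bar may stop short of 100 after the loop (for example 50 + 3 × 16 = 98), which is why the script sets the progress to 100 explicitly afterwards.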

Pausing auto-repair

Pausing is useful if you want to inspect the system while the auto-repair runs, or if you rely on a WLAN connection that you would rather not set up manually. Either of the snippets below can be inserted early in the script, after the line service plymouth-reboot start.

plymouth display-message --text="Sleeping for 500 seconds"
sleep 500s

Using a timer:

i=400; while [ $i -gt 0 ]; do plymouth display-message --text="Sleeping for 400 seconds. $i seconds remaining"; i=$((i - 1)); sleep 1; done
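
The countdown can also be wrapped in a function so that the duration appears in only one place. This is a sketch rather than part of the SteamOS script: pause_countdown, SHOW_CMD and TICK are hypothetical names, and the overridable display command exists only so the loop can be tried outside of Plymouth.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of the SteamOS script.
# SHOW_CMD: command used to display the message (default: plymouth).
# TICK:     seconds to sleep between updates (default: 1).
pause_countdown() {
    local total="$1"
    local show="${SHOW_CMD:-plymouth display-message}"
    local tick="${TICK:-1}"
    local left="$total"
    while [ "$left" -gt 0 ]; do
        # $show is left unquoted on purpose so it splits into command + args.
        $show --text="Sleeping for $total seconds. $left seconds remaining"
        left=$(( left - 1 ))
        sleep "$tick"
    done
}

# Example, placed after "service plymouth-reboot start":
# pause_countdown 400
```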