Create Ansible Playbook for Deploying to macOS #67

Open
wants to merge 19 commits into
base: main
104 changes: 104 additions & 0 deletions macOS/jenkins-node/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Jenkins macOS nodes for Mantid

This describes how to deploy a macOS build node. Such a node is able to perform any of the macOS jobs.

## Prerequisites

- Access to the Keeper password manager and the `ISIS Jenkins Nodes` file.
- Access to the [`mantidproject/ansible-linode`](https://github.com/mantidproject/ansible-linode) and [`dockerfiles`](https://github.com/mantidproject/dockerfiles) repositories.
- If the machine has already been set up, you will need your SSH key added to its list of authorized keys so you can connect remotely.


## Manual Setup

There are a few steps that must be taken manually on a brand new machine before Ansible can take over.

- Log in to the provided administrator account.
- Set up a `mantidbuilder` user on the new machine:

  - Open the `System Preferences -> Users & Groups` menu.
  - Press the `+` button below the list of users and add a new administrator account. Use `mantidbuilder` for both name fields and provide a strong password.

- Enable remote access:

  - Open `System Preferences -> Sharing`.
  - Enable `Remote Login` for all users and allow full disk access.
  - Make a note of the `ssh` login instructions, especially the hostname after the `@`.
  - Store the chosen password and the hostname in the `ISIS Jenkins Nodes` file in Keeper.
  - Enable `Remote Management` for all users.

- Set security settings to allow for builds and consistent access:

  - Open `System Preferences -> Security & Privacy`.
  - In `General`, untick the `Require password [...] after sleep or screensaver begins` checkbox.
  - In `FileVault`, press the button to `Turn Off FileVault`.
    - FileVault encrypts the contents of the disk until the first login. This means that the `ssh` service is not started until someone logs in on the physical machine, which makes the machine a pain to access after a reboot.

- Install Xcode Command Line Tools:

  - Launch a terminal.
  - Run `xcode-select --install`.
  - Wait for the popup to appear and click `Install`.


- Back on the machine you will be doing the deployment from, add your SSH key to the new Mac:

  - `ssh-copy-id mantidbuilder@<HOST>`

## Jenkins Controller Node Creation

- Provision a new node in [Jenkins](https://builds.mantidproject.org/computer) with the following changes:
  - Set *Remote root directory* to `/jenkins_workdir`.
  - Set environment variables:
    - `BUILD_THREADS` => set based on the system, e.g. the number of cores
    - `MANTID_DATA_STORE` => `/mantid_data`
- Once created, make a note of the node's name and secret (the long string of letters and numbers).
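
As a rough guide for picking `BUILD_THREADS`, the sketch below prints the number of online logical cores on the machine it runs on. It assumes one thread per core is wanted. The function name `suggest_build_threads` is illustrative and not part of this repository; `getconf _NPROCESSORS_ONLN` is tried first, with `sysctl -n hw.ncpu` as a macOS fallback.

```sh
# Print a suggested BUILD_THREADS value: the number of online logical cores.
# suggest_build_threads is a hypothetical helper, not part of the playbook.
suggest_build_threads() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || cores=$(sysctl -n hw.ncpu 2>/dev/null) \
        || cores=1   # fall back to a single thread if neither tool works
    echo "$cores"
}

suggest_build_threads
```

Run it on the new Mac (for example over `ssh`) and enter the printed number as the value of `BUILD_THREADS`, adjusting it down if the machine is shared with other work.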

## Deploying to the Agent

**We're calling nodes _agents_ from here on out. There's some nuance, but they're mostly interchangeable terms.**

The Ansible playbook will set up the machine and connect it to the Jenkins controller, ready for running builds and other jobs.

### Getting the Right Environment

1. If you already have the `ansible-linode` repo and associated conda environment, activate it and skip to step 4.
2. Clone the [`mantidproject/ansible-linode`](https://github.com/mantidproject/ansible-linode) repo.
3. Navigate to the base of the cloned repo and run:

   - `mamba create --prefix ./condaenv ansible`
   - `mamba activate ./condaenv`
   - Note: You can activate the environment from anywhere by providing the full path to the `condaenv` directory.

4. Clone the [`dockerfiles`](https://github.com/mantidproject/dockerfiles) repo and navigate to `macOS/jenkins-node/ansible`.
5. Install the required collections from Ansible Galaxy by running:
   - `ansible-galaxy install -r requirements.yml`
6. Time to use that secret you made a note of. Create an `inventory.txt` file with the details of the machines to deploy to (one per line):

```ini
[all]
<IP_ADDRESS_OR_HOSTNAME_1> agent_name=<NAME_OF_AGENT_ON_JENKINS_1> agent_secret=<SECRET_DISPLAYED_ON_CONNECTION_SCREEN_1>
<IP_ADDRESS_OR_HOSTNAME_2> agent_name=<NAME_OF_AGENT_ON_JENKINS_2> agent_secret=<SECRET_DISPLAYED_ON_CONNECTION_SCREEN_2>
```

If you've forgotten the secret, it can be found under `Environment Variables` in the `System Information` section of the agent.
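
If you would rather not type the inventory lines by hand, a small shell helper can print them in the format shown above. This is only a sketch: `inventory_line` is a hypothetical function name, and the host, agent name and secret in the example are placeholders rather than real values.

```sh
# Print one inventory line in the format the playbook expects.
# inventory_line is a hypothetical helper; its arguments are the host,
# the agent name on Jenkins and the agent secret, in that order.
inventory_line() {
    if [ "$#" -ne 3 ]; then
        echo "usage: inventory_line HOST AGENT_NAME AGENT_SECRET" >&2
        return 1
    fi
    printf '%s agent_name=%s agent_secret=%s\n' "$1" "$2" "$3"
}

# Example with placeholder values (not a real host or secret).
echo "[all]"
inventory_line mac-node-1.example.org example-agent-1 0123abcd
```

Redirecting the output to `inventory.txt` gives a file the playbook can consume. Because the file then contains agent secrets, keep it out of version control.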

### Running the Script to Deploy the Agent

1. If you have not already done so, add your SSH key to the host by running `ssh-copy-id mantidbuilder@<HOSTNAME>` in a terminal.
2. Run the playbook to deploy to all the machines defined in your `inventory.txt` file:

```sh
ansible-playbook -i inventory.txt jenkins-agent.yml -u mantidbuilder -K
```

3. When prompted, enter the `mantidbuilder` account password that you set earlier. If you weren't the one who set the password, it should be in the `ISIS Jenkins Nodes` file on Keeper.
4. Wait for the play to complete and visit `https://builds.mantidproject.org/computer/<NAME_OF_AGENT_ON_JENKINS>`. The agent should be connected within five minutes.

- Note: The agent is kept connected to the controller by a crontab entry that runs on every 5th minute. This means that on first setup the agent may not connect until a minute divisible by five has passed.
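
To estimate how long that wait could be, the sketch below computes the number of seconds until the next minute divisible by five. `seconds_until_next_run` is a hypothetical helper name. It assumes the local UTC offset is a multiple of five minutes, so that Unix timestamps divisible by 300 line up with the cron schedule.

```sh
# Print the number of seconds until the next minute divisible by five,
# given a Unix timestamp in seconds.
# seconds_until_next_run is a hypothetical helper, not part of the playbook.
seconds_until_next_run() {
    now=$1
    echo $(( 300 - now % 300 ))
}

# Seconds from now until the crontab entry next fires.
seconds_until_next_run "$(date +%s)"
```

The result is always between 1 and 300, which matches the "within five minutes" expectation above.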


## Troubleshooting

- You may need to log in manually or by using VNC at least once to allow the Ansible playbook to run. This can be due to FileVault blocking SSH connections until the machine is unlocked.
  - To make use of VNC from a Mac: Open Finder and press `Cmd+K`, then enter `vnc://<HOSTNAME>`. Use the `mantidbuilder` login for the machine.
7 changes: 7 additions & 0 deletions macOS/jenkins-node/ansible/jenkins-agent.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
- name: Deploy macOS Jenkins agent for Mantid.
  hosts: all

  roles:
    - role: agent
      tags: "agent"

3 changes: 3 additions & 0 deletions macOS/jenkins-node/ansible/requirements.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
collections:
  - name: geerlingguy.mac
40 changes: 40 additions & 0 deletions macOS/jenkins-node/ansible/roles/agent/tasks/check-connection.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#!/bin/bash
# bash rather than sh, because the script uses [[ ]] tests.

AGENT_NAME=${1}
AGENT_SECRET=${2}


echo "Check if the secret or name has changed and kill the current java process if it has. "

cron_entry=$(crontab -l | grep jenkins-slave.sh)
cron_name_and_secret=$(echo "$cron_entry" | grep -o "$AGENT_NAME .*")

if [[ "$AGENT_NAME $AGENT_SECRET" != "$cron_name_and_secret" ]]; then
    pgrep java | xargs kill -9
fi


echo "Run the agent startup script in the background. "

"$HOME/jenkins-slave.sh" "$AGENT_NAME" "$AGENT_SECRET" &


echo "Wait for the script to get to its hang point. "

sleep 5


echo "Check that the script has connected the agent to the controller. "

jenkins_json=$(curl "https://builds.mantidproject.org/manage/computer/$AGENT_NAME/api/json")
is_offline=$(echo "$jenkins_json" | grep '"icon":"symbol-computer-offline"')

if [[ -n "$is_offline" ]]; then
    echo "Agent failed to connect to Jenkins controller. "
    exit 1
fi


echo "Agent connected successfully. "

exit 0
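
The offline check at the end of this script can be sketched as a small function that reads the JSON from standard input. The marker string is the one the script greps for. The function name `agent_is_offline` is hypothetical, and the sample JSON is hand-written for illustration rather than real controller output.

```sh
# Succeed (exit 0) when the JSON on stdin contains the offline icon marker
# that check-connection.sh looks for.
# agent_is_offline is a hypothetical name, not part of the script above.
agent_is_offline() {
    grep -q '"icon":"symbol-computer-offline"'
}

# Illustrative, hand-written JSON fragments (not real controller output).
offline_json='{"displayName":"example-agent","icon":"symbol-computer-offline"}'
online_json='{"displayName":"example-agent","icon":"symbol-computer"}'

if printf '%s\n' "$offline_json" | agent_is_offline; then
    echo "offline"
fi
if ! printf '%s\n' "$online_json" | agent_is_offline; then
    echo "online"
fi
```

Using the exit status of `grep -q` avoids capturing the matched text in a variable, at the cost of still relying on the exact icon name in the response.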
17 changes: 17 additions & 0 deletions macOS/jenkins-node/ansible/roles/agent/tasks/java11.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Set up Java 11 Installation.

- name: Install Java 11.
  community.general.homebrew:
    name: java11
    state: present

- name: Symlink Java 11.
  shell: ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk
  become: true
  become_user: root

- name: Ensure that the java install has been added to the path.
  ansible.builtin.lineinfile:
    path: ~/.zshrc
    line: export PATH="/opt/homebrew/opt/openjdk@11/bin:$PATH"
    create: true
34 changes: 34 additions & 0 deletions macOS/jenkins-node/ansible/roles/agent/tasks/mac-sdk.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# Download and set up the Mac OSX SDK ready for Conda to use.

- name: Install gnu-tar.
  community.general.homebrew:
    name: gnu-tar
    state: present

- name: Ensure that gnu-tar has been added to the path
  ansible.builtin.lineinfile:
    path: ~/.zshenv
    line: export PATH="/opt/homebrew/opt/gnu-tar/libexec/gnubin:$PATH"
    create: true

- name: Download the Mac SDK.
  ansible.builtin.get_url:
    url: https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX10.10.sdk.tar.xz
    dest: ~/
    mode: '777'
    force: true

- name: Unarchive the Mac SDK.
  ansible.builtin.unarchive:
    src: ~/MacOSX10.10.sdk.tar.xz
    dest: ~/
    remote_src: yes

- name: Move the Mac SDK into opt
  shell: mv /Users/mantidbuilder/MacOSX10.10.sdk /opt
  become: true

- name: Remove the downloaded Mac SDK Tarball.
  ansible.builtin.file:
    path: ~/MacOSX10.10.sdk.tar.xz
    state: absent
62 changes: 62 additions & 0 deletions macOS/jenkins-node/ansible/roles/agent/tasks/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
---
# Deploy Jenkins agent on macOS

# Set up the environment

- name: Add user to sudoers on new macs.
  shell: /Applications/Privileges.app/Contents/Resources/PrivilegesCLI --add
  ignore_errors: true # Not all the macs have these, so don't panic if it fails.

# Install Requirements

- name: Install homebrew
  include_role:
    name: geerlingguy.mac.homebrew

- name: Make sure homebrew bin is in the path.
  ansible.builtin.lineinfile:
    path: /etc/paths
    state: present
    line: '/opt/homebrew/bin'
  become: true
  become_user: root

- name: Install git.
  community.general.homebrew:
    name: git
    state: latest

- name: Install and Set up Java 11
  include_tasks: java11.yml

- name: Check for the MacOSX SDK
  stat:
    path: /opt/MacOSX10.10.sdk
  register: sdk_stats

- name: Download and Install MacOSX SDK
  include_tasks: mac-sdk.yml
  when: not sdk_stats.stat.exists

# Configure macOS Settings.

- name: Disable screensaver.
  shell: defaults write com.apple.screensaver idleTime 0

- name: Disable saved application states to avoid dialog.
  shell: defaults write org.python.python NSQuitAlwaysKeepsWindows -bool false

- name: Ensure the machine boots back up after a power failure.
  shell: systemsetup -setrestartpowerfailure on
  become: true

# Test and start the agent. Note: Connection will only begin consistently every 5th minute if changes are made.

- name: Start the jenkins agent
  include_tasks: start-jenkins-agent.yml

# Tidy up the environment.

- name: Remove user from sudoers on new macs.
  shell: /Applications/Privileges.app/Contents/Resources/PrivilegesCLI --remove
  ignore_errors: true # Not all the macs have these, so don't panic if it fails.
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Test and start the agent. Note: Connection will only begin consistently every 5th minute if changes are made.

- name: Download jenkins slave script.
  shell: curl -o $HOME/jenkins-slave.sh https://raw.githubusercontent.com/mantidproject/mantid/main/buildconfig/Jenkins/jenkins-slave.sh

- name: Make the slave script executable.
  shell: chmod 777 $HOME/jenkins-slave.sh

- name: Check the Jenkins agent connection script.
  script: ./check-connection.sh {{ agent_name }} {{ agent_secret }}

- name: Set up a crontab entry to run the agent script every 5th minute.
  ansible.builtin.cron:
    name: "Run slave script"
    minute: "*/5"
    job: "$HOME/jenkins-slave.sh {{ agent_name }} {{ agent_secret }}"
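
For reference, the entry this task produces has the shape `*/5 * * * * $HOME/jenkins-slave.sh <agent_name> <agent_secret>`, and `check-connection.sh` compares the name and secret in that entry against the current values. A minimal sketch of that extraction follows. `cron_name_and_secret` is a hypothetical helper name, and the crontab line uses placeholder values.

```sh
# Extract "NAME SECRET" from crontab text on stdin for the given agent name,
# mirroring the two greps used in check-connection.sh.
# cron_name_and_secret is a hypothetical helper name.
cron_name_and_secret() {
    agent_name=$1
    grep 'jenkins-slave.sh' | grep -o "$agent_name .*"
}

# Illustrative crontab line with a placeholder name and secret.
sample='*/5 * * * * $HOME/jenkins-slave.sh example-agent 0123abcd'
printf '%s\n' "$sample" | cron_name_and_secret example-agent
```

The agent name is used as a regular expression here, as in the original script, so a name containing characters such as `.` would match more loosely than intended.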