More failure to build #157

Open
drbitboy opened this issue May 16, 2024 · 14 comments

Comments

@drbitboy
Contributor

This code:

https://github.com/magao-x/MagAOX/blob/dev/setup/provision.sh#L29

# Function to refresh sudo timer
refresh_sudo_timer() {
    while true; do
        $_REAL_SUDO -v
        sleep 60
    done
}

# Start refreshing sudo timer in the background
if [[ $EUID != 0 ]]; then
    $_REAL_SUDO -v
    refresh_sudo_timer &
fi

in /setup/provision.sh

causes the build to freeze, waiting for the password to be passed to sudo.

@joseph-long
Member

You do need to type in a password, yes. Am I missing something here?

@drbitboy
Contributor Author

drbitboy commented May 16, 2024

You do need to type in a password, yes. Am I missing something here?

Yes, several things actually:

  1. The ubuntu account is configured by default to use sudo without a password (NOPASSWD under /etc/sudoers), so this is unnecessary.
  2. Anyone who runs the provision.sh script without that NOPASSWD option, even if temporarily, is doing it wrong.
  3. This code requires the user to type in a password every sixty seconds. I don't think so.
  4. If the user does not redirect the voluminous output of bash -lx provision.sh to somewhere other than the terminal, then that [sudo] password for ubuntu: prompt will never be noticed.

Bottom line, this code is ill-advised.

@drbitboy
Contributor Author

Here is what I did:

linux-host$ multipass launch -n devtest -c 4 -d 20.0GiB -m 8.0GiB 22.04
Launched: devtest
linux-host$ multipass exec devtest -- touch .hushlogin
linux-host$ multipass shell devtest 
ubuntu@devtest:~$ mkdir devel
ubuntu@devtest:~$ cd devel
ubuntu@devtest:~/devel$ git clone -q --depth=1 -b dev https://github.com/magao-x/MagAOX.git
ubuntu@devtest:~/devel$ cd MagAOX/setup/
ubuntu@devtest:~/devel/MagAOX/setup$ git log --all
commit cec3472ac3c8d935775feb629562024939882978 (grafted, HEAD -> dev, origin/dev, origin/HEAD)
Author: Joseph Long <[email protected]>
Date:   Thu May 16 01:30:00 2024 -0500

    audibleAlerts: remove 30mph maggiealert
ubuntu@devtest:~/devel/MagAOX/setup$ bash -lx provision.sh 1>/home/ubuntu/provision.log 2>&1
[sudo] password for ubuntu: 
[sudo] password for ubuntu: ubuntu@devtest:~/devel/MagAOX/setup$ 

That last line is where I killed the sudo process - using sudo, btw - that had halted the provisioning.

Also, because my sudo -H PR has not yet been merged into the magao-x dev branch, the provisioning left root-owned files under /home/ubuntu:

ubuntu@devtest:~$ find ~ -ls | grep root
   809059      4 drwxr-xr-x   2 root     root         4096 May 16 13:07 /home/ubuntu/.conda
   794624      4 drwxr-xr-x   3 root     root         4096 May 16 13:08 /home/ubuntu/.cache/conda
   794625      4 drwxr-xr-x   2 root     root         4096 May 16 13:08 /home/ubuntu/.cache/conda/notices
   794632      0 -rw-r--r--   1 root     root            0 May 16 13:08 /home/ubuntu/.cache/conda/notices/notices.cache
   775660      4 drwxr-xr-x   3 root     root         4096 May 16 13:02 /home/ubuntu/.cmake
   775664      4 drwxr-xr-x   3 root     root         4096 May 16 13:02 /home/ubuntu/.cmake/packages
   775665      4 drwxr-xr-x   2 root     root         4096 May 16 13:02 /home/ubuntu/.cmake/packages/Eigen3
   775666      4 -rw-r--r--   1 root     root           38 May 16 13:02 /home/ubuntu/.cmake/packages/Eigen3/ab180de1e754d8e7df17598fcfafb882
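
A listing like the one above can also be produced by matching on ownership directly rather than grepping the `-ls` output. The following is a minimal sketch; the function name `list_not_owned_by` is hypothetical and is not part of the MagAOX setup scripts.

```shell
# Hypothetical helper (not part of the MagAOX scripts): print every path under
# a directory that is NOT owned by the given user name or numeric UID.
list_not_owned_by() {
    # $1 = directory to scan, $2 = expected owner (name or numeric UID)
    find "$1" ! -user "$2" -print
}

# Assumed usage: anything printed here was created by some other account,
# for example by a command that ran under sudo without -H.
# list_not_owned_by "$HOME" "$(id -u)"
```

Matching on `! -user` avoids false positives from paths that merely contain the string "root".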

I am fairly certain those files caused the eventual failure of the provisioning script:


 SRCNAME = AOloopControl_perfTest -> LIBNAME = cacaoAOloopControlperfTest
 SRCNAME = milk_module_example
-- Checking for module 'ImageStreamIO'
--   No package 'ImageStreamIO' found
====================================================
BUILD_FLAGS             :  -DPACKAGE_NAME=\"milk\" -DCONFIGDIR=\"/opt/MagAOX/source/milk/config\" -DINSTALLDIR=\"/usr/local/milk-1.03.00\" -DSOURCEDIR=\"/opt/MagAOX/source/milk\" -DABSSRCTOPDIR=\"/opt/MagAOX/source/milk\" -DPACKAGE_BUGREPORT=\"https://github.com/milk-org/milk/issues\"
COMPILE_FLAGS           :  
COMPILE_OPTIONS         :  $<$<COMPILE_LANGUAGE:C>:-march=native>;$<$<COMPILE_LANGUAGE:C>:-flto=auto>;$<$<COMPILE_LANGUAGE:C>:-fwhole-program>;$<$<COMPILE_LANGUAGE:C>:-pipe>
CMAKE_EXE_LINKER_FLAGS  :  
CMAKE_C_FLAGS           :  
CMAKE_CXX_FLAGS         :  
CMAKE_C_FLAGS_DEBUG           : -O0 -g -Wall -Wextra
CMAKE_C_FLAGS_RELEASE         : -Ofast -DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO  : -O2 -g -DNDEBUG
CMAKE_C_FLAGS_MINSIZEREL      : -Os -DNDEBUG
CMAKE_CURRENT_SOURCE_DIR      : /opt/MagAOX/source/milk
CMAKE_CURRENT_BINARY_DIR      : /opt/MagAOX/source/milk/_build
INSTALL_PKGCONFIG_DIR         : lib/pkgconfig
====================================================
LINKSTRING: -lCLIcore -lImageStreamIO -lm -lmilkCOREMODarith -lmilkCOREMODiofits -lmilkCOREMODmemory -lmilkCOREMODtools 
/opt/conda/bin/python: No module named pybind11
CMake Error at python_module/CMakeLists.txt:15 (string):
  string sub-command REPLACE requires at least four arguments.


CMake Error at python_module/CMakeLists.txt:16 (string):
  string sub-command REPLACE requires at least four arguments.


-- Configuring incomplete, errors occurred!
See also "/opt/MagAOX/source/milk/_build/CMakeFiles/CMakeOutput.log".
+ exit_with_error 'milk/cacao install failed'
+ log_error 'milk/cacao install failed'
++ tput setaf 1
++ tput sgr0
+ echo -e 'milk/cacao install failed'
milk/cacao install failed
+ exit 1
++ '[' 2 = 1 ']'
ubuntu@devtest:~$ 

@joseph-long
Member

  1. The primary purpose of the scripts in this folder is to install the MagAO-X instrument software on the instrument, not a virtual machine. That means an interactive session.
  2. I disagree.
  3. This code requires the user to type in a password once, and re-validates the sudo timestamp so they do not need to enter a password every 60 seconds.
  4. I’m not sure what to tell you about this point since it seems unlikely that you both have sudo with NOPASSWD and are getting a password prompt. If sudo -v produces a password prompt on a machine where sudo is configured to be passwordless, that’s unexpected behavior and I guess a workaround is required.
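
The pattern described in point 3 (prompt once, then re-validate the timestamp in the background) can be sketched so that the background loop can never block on an unseen prompt. This is a hypothetical variant, not the code in provision.sh: the function name `keep_sudo_alive` and the `SUDO_CMD` override are assumptions made for illustration. It relies on sudo's `-n` flag, which makes sudo fail instead of prompting.

```shell
# Hypothetical variant of the refresh loop; not the code in provision.sh.
# SUDO_CMD is an assumed override hook (defaults to plain sudo).
keep_sudo_alive() {
    # $1 = PID to watch, $2 = seconds between refreshes (default 60)
    watched_pid=$1
    interval=${2:-60}
    # Stop as soon as the watched process is gone.
    while kill -0 "$watched_pid" 2>/dev/null; do
        # -n: fail instead of prompting; -v: refresh the cached credentials.
        "${SUDO_CMD:-sudo}" -n -v 2>/dev/null || return 1
        sleep "$interval"
    done
}

# Assumed usage in a script run by a non-root user:
#   sudo -v                      # the one interactive prompt
#   keep_sudo_alive "$$" 60 &    # silent refresh afterwards
```

If the refresh ever requires a password, the loop simply ends rather than leaving a `[sudo] password` prompt waiting in the background.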

@joseph-long
Member

If you’re worried that there is a problem with the install process, I recommend checking the build history here: https://github.com/magao-x/MagAOX/actions

Currently, the scripted install is succeeding on Rocky Linux and Ubuntu in CI.

(As you might guess, the currently failing “build image” task is to eventually produce a downloadable VM artifact so that you no longer have to run this install process yourself. Naturally, this is a little more complicated than just verifying that installation works, so stay tuned.)

@joseph-long
Member

Also, depending on your use case, you may want to investigate the “container build”. The resulting image contains all the MagAO-X software dependencies.

@drbitboy
Contributor Author

drbitboy commented May 16, 2024

If sudo -v produces a password prompt on a machine where sudo is configured to be passwordless, that’s unexpected behavior and I guess a workaround is required.

Then a workaround is required; see below:

linux-host$ multipass launch -n devtest -c 4 -d 20.0GiB -m 8.0GiB 22.04
Launched: devtest
linux-host$ multipass exec devtest -- touch .hushlogin
linux-host$ sudo su - 
[sudo] password for dad:   
linux-host$ multipass shell devtest 

ubuntu@devtest:~$ sudo -H su -     ### <== NOPASSWD sudo
root@devtest:~# passwd ubuntu
New password: 
Retype new password: 
passwd: password updated successfully
root@devtest:~# exit
logout

ubuntu@devtest:~$ sudo -v
[sudo] password for ubuntu:    ### [sudo -v] requests a password, even though NOPASSWD sudo is configured

@drbitboy
Contributor Author

I understand why maintaining a working [vm] build is at the bottom of your list of priorities.

But it is consuming all my time and effort because [vm] is what I have to work with, to the point where I am not making any progress anywhere else.

@joseph-long
Member

Thanks, that’s an interesting wrinkle. I would recommend commenting out the code in that case when you are running on a VM image from Canonical. If that combination of circumstances can be detected in the script, it can be omitted automatically.
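
One possible way to detect that combination is to ask sudo whether it can run a command without prompting, using `sudo -n true`, and to skip the refresh logic when it can. The sketch below is an assumption about how such a check might look; `sudo_needs_password` and `SUDO_CMD` are hypothetical names. The difference between `sudo -v` and running a command is consistent with the sudoers `verifypw` option, whose documented default of `all` requires every one of the user's entries to carry NOPASSWD before `-v` skips the prompt; whether that is the cause on the Canonical image has not been verified in this thread.

```shell
# Hypothetical check; SUDO_CMD is an assumed override hook (defaults to sudo).
sudo_needs_password() {
    # "sudo -n true" succeeds only if a command can run without a prompt
    # (a NOPASSWD rule or still-valid cached credentials).
    if "${SUDO_CMD:-sudo}" -n true 2>/dev/null; then
        return 1
    fi
    return 0
}

# Assumed usage: only prompt (and refresh) when a prompt would be needed.
# if [ "$(id -u)" -ne 0 ] && sudo_needs_password; then
#     sudo -v
# fi
```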

I am no longer recommending multipass to members of our team here because their canned images are not quite like a “real” install in various ways, and this is just the latest. For a more “flight-like” configuration, I would recommend setting up a standard Rocky Linux 9 VM with your favorite virtualization tool and following the normal install process. Then, perform provisioning using a MAGAOX_ROLE of workstation.

@joseph-long
Member

If you want to save some time using multipass, I would recommend making a “snapshot” of a successful VM build to work from (when you have one). Our dependencies don’t change THAT much, except perhaps for MILK/CACAO, so it would be possible to “spruce up” a snapshot with the latest MagAO-X code by simply doing “git pull” and “make install”.

I will revisit the “MagAO-X VM in a can” project soon; it was actually working for ARM hosts previously … but Rocky 9.4 released and things broke.

@drbitboy
Contributor Author

For a more “flight-like” configuration, I would recommend setting up a standard Rocky Linux 9 VM with your favorite virtualization tool and following the normal install process. Then, perform provisioning using a MAGAOX_ROLE of workstation.

I don't need "flight-like," I need a working system to fiddle around with processes and sockets and pipes, and I need something that spins up quickly without a lot of user intervention, and multipass provided that.

Now I have to fiddle around with my "favorite virtualization tool" (I assume that means something like VirtualBox?) and do a lot of manual work. Do we have a procedure somewhere for "setting up a standard Rocky Linux 9 VM," or do I need to find a Rocky installer ISO and sit through the from-scratch installer, or is there something further along?

I see the CI install uses the root user (EUID is 0), so that explains why it is immune to, and useless for detecting, issues with sudo or root-owned files in random places.

@drbitboy
Contributor Author

If you want to save some time using multipass, I would recommend making a “snapshot”

multipass launch is so fast that snapshots are not much of a saving, and the only reason I don't get a later snapshot is that I am chasing dev's provision.sh so I need to start at square one.

@drbitboy
Contributor Author

If the CI builds are supposed to be representative, then the provisioning command should be

sudo -H bash -lx provision.sh

and we can dispense with all of the sudos in the scripts under /setup/.
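
If the scripts were run that way, a guard at the top of the script could make the requirement explicit. The following is only a sketch of that idea; `require_root` is a hypothetical function and is not present in the repository. The `-H` flag matters because it sets HOME to the target user's home directory, so tools run as root write their caches under root's home rather than under the invoking user's.

```shell
# Hypothetical guard for the top of a provisioning script; not in the repo.
require_root() {
    # $1 = UID to check (defaults to the current effective UID)
    uid=${1:-$(id -u)}
    if [ "$uid" -ne 0 ]; then
        echo "Run as root, e.g.: sudo -H bash -lx provision.sh" >&2
        return 1
    fi
}

# Assumed usage at the start of the script:
# require_root || exit 1
```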

@drbitboy
Contributor Author

... If sudo -v produces a password prompt on a machine where sudo is configured to be passwordless, that’s unexpected behavior and I guess a workaround is required.

Done. #155

2 participants