Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new SPRAS image and handle unpacked singularity images #145

Merged
merged 30 commits into from
Jul 1, 2024

Conversation

jhiemstrawisc
Copy link
Collaborator

This PR aims to accomplish a few things:

  1. The overarching goal is to make SPRAS something that we can run in distributed environments where containers don't necessarily have any privilege (like the OSPool). This requires handling singularity/apptainer images in a slightly different way, where instead of converting and running a .sif file directly, we first "unpack" the sif into a directory. Hence the new unpack_singularity config option.
  2. A side effect of 1 is that we need a portable runtime environment that can run SPRAS in these distributed settings. The conda environment doesn't handle installations of apptainer/docker, so this adds a new container docker-wrappers/SPRAS that can be used to this end.
  3. The next big step is going to be returning to the Snakemake executor for HTCondor, which will require a relatively new version of Snakemake (definitely in the 8.X.X range). In updating the version here, I had to chase a few other version incompatibilities. I'm sure there are more to find.

@jhiemstrawisc
Copy link
Collaborator Author

@agitter, I've tested this and it seems to work when I run it from an OSPool EP. Since I haven't yet set up the actual building of reedcompbio/spras:v1, you're welcome to use jhiemstra/spras:final-v1 as the container in the HTCondor submit file if you want to test running in the CHTC pool (I haven't tested there, but I suspect it will work since this was working in what constitutes an even more locked-down environment). Otherwise, you can let me know if my container-building instructions were clear enough to follow along with!

I'm opening this as a draft for now, until we've had a chance to discuss further.

@jhiemstrawisc jhiemstrawisc marked this pull request as ready for review February 28, 2024 22:33
@jhiemstrawisc
Copy link
Collaborator Author

One thing I'm unsure about is how garbage collecting unpacked sif images might work. I believe other images, both docker and apptainer, are added to some kind of cache that has its own garbage collection rules. However, these unpacked images are written out to disk, so they have the potential to clutter environments that aren't already ephemeral (as would be the case with container-in-container runs). Without some semi-significant refactoring, it's not obvious to me how we can return to delete them.

Two things to consider:

  • Is this our problem? Or are we fine leaving these unpacked containers around. Presumably someone using the tool might want to run things repeatedly, in which case image persistence prevents the user from re-pulling/unpacking things, which should offer a slight speedup. I'm not opposed to saying "if you use this feature, you're responsible for deleting unwanted byproducts." There aren't currently too many containers anyway, and most of them are under 1GB.
  • Do we envision anyone using this feature outside of container-in-container settings? If not, then I'm even more okay not cleaning things up -- the outer container is most likely short lived, and so when it disappears so too do the inner unpacked images.

Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial pass focused on testing the HTCondor job, Docker image build, and conda environment. I haven't gone through the code changes in detail yet.

environment.yml Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
@jhiemstrawisc
Copy link
Collaborator Author

@agitter Updates for this:

As it stands, this submit file is working for me from AP2002, and submits jobs to the OSPool. However, there's some container funkiness that I'm still looking into. In particular, running the submit file that points to an image on Dockerhub very frequently causes the job to fail with error /srv//spras.sh: line 10: snakemake: command not found. In the OSPool, most (if not all) Docker images are immediately converted to Apptainer, and something about this process is overwriting /usr/local/bin in the container. Unfortunately, that's where things like snakemake are supposed to be installed! This shouldn't be happening, so I'd like to keep the bits about using container_image = docker://reedcompbio/spras:v0.1.0 for now, with the note about how to work around this error in the README.

I've also observed that some snakemake jobs intermittently fail on the EPs, but often succeed after a re-run. That's why I'm using --retries 3 in the spras.sh wrapper. Hopefully that leads to more reliable execution.

To test things yourself, follow the container build instructions from spras/docker-wrappers/SPRAS/README.md. Until I can figure out what's going on with OSPool Docker containers, it might be best to immediately convert to .sif using the provided instructions, and then run the workflow.

@jhiemstrawisc
Copy link
Collaborator Author

On a side note, when we merge this, we should make sure we actually build/publish reedcompbio/spras:v0.1.0.

@jhiemstrawisc jhiemstrawisc requested a review from agitter June 28, 2024 20:56
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I built and pushed reedcompbio/spras:v0.1.0. The test job in spras.sub worked for me as well when submitted from ap2001.

The overall design, readme, and test instructions look great. My remaining comments here are pretty minor.

There are a couple things to discuss with the broader team. One is the environment updates. The other is whether to make this an initial release in the GitHub repo. I'm in favor of that now that we'll have tagged Docker images.

spras/containers.py Outdated Show resolved Hide resolved
Snakefile Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Show resolved Hide resolved
docker-wrappers/SPRAS/README.md Outdated Show resolved Hide resolved
docker-wrappers/SPRAS/spras.sub Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
spras/analysis/ml.py Outdated Show resolved Hide resolved
spras/config.py Outdated Show resolved Hide resolved
@jhiemstrawisc jhiemstrawisc requested a review from agitter July 1, 2024 17:20
When running nested, unprivileged containers singularity containers, there's a bit
of extra setup we need to perform in order to get things working. In particular,
in order to avoid FUSE requirements, we can "unpack" containers into a directory
after we've converted the docker image to a `.sif`.

This PR adds a new configuration option to allow that unpacking, as well as a SPRAS
runtime container that can be used to demonstrate this behavior in HTCondor. Bundled
along with these changes are a variety of package version updates that I needed to
change to get things working.
I've tested, and the conda/pip packages seem to resolve/build for both x86/arm
on Linux and MacOS.
@agitter
Copy link
Collaborator

agitter commented Jul 1, 2024

We believe the latest test failure is due to Docker pull rate limits. The workflow and tests run for me locally.

@agitter agitter merged commit 0557289 into Reed-CompBio:master Jul 1, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants