bump to 24.1 - Single-container setup #607

Merged
merged 24 commits into from
Nov 12, 2024

Conversation

jyotipm29
Contributor

@jyotipm29 jyotipm29 commented Oct 3, 2024

Upgrades:

  • Base Ubuntu Image: Upgraded from version 18.04 to 22.04
  • Galaxy: Upgraded from version 20.09 to 24.1
  • PostgreSQL: Upgraded from version 11 to 15
  • Python3: Upgraded from version 3.7 to 3.10 (Python 3.10 is set as the default interpreter)

Updates:

  • The Dockerfile now uses a multi-stage build to reduce the final image size and include only necessary files.

  • New Service Support:

    • Gunicorn: Replaces uWSGI as the web server for Galaxy. Installed by default inside Galaxy's virtual environment. Nginx is configured to proxy Gunicorn, which listens on port 4001.
    • Celery: Installed by default inside Galaxy's virtual environment. Enabled Celery for distributed task queues and Celery Beat for periodic task running. RabbitMQ serves as the broker for Celery (if RabbitMQ is disabled, it defaults to PostgreSQL database connection). Redis is used as the backend for Celery (if Redis is disabled, it defaults to a SQLite database). Flower service is added for monitoring and debugging Celery.
    • RabbitMQ Management: Enabled the RabbitMQ management plugin on port 15672 for managing and monitoring the RabbitMQ server. The dashboard is exposed via Nginx and is accessible at the /rabbitmq/ path. The default access credentials are admin:admin.
    • Redis: Added Redis server on port 6379 as a backend for Celery.
    • Flower: Added Flower service on port 5555 for monitoring and debugging Celery. The dashboard is exposed via Nginx and is available at the /flower/ path. The default access credentials are admin:admin.
    • TUSd: Added TUSd server on port 1080 to support fault-tolerant uploads; Nginx is configured to proxy TUSd.
    • gx-it-proxy: Added gx-it-proxy service on port 4002 to support Interactive Tools.
  • Ansible Playbooks:

    • Migrated from the galaxyextras git submodule to maintained Ansible roles.
    • Added configure_rabbitmq_users.yml Ansible playbook, which removes the default guest user and adds admin, galaxy, and flower users for RabbitMQ during container startup.
  • Environment Variables:

    • Added GUNICORN_WORKERS and CELERY_WORKERS magic environment variables to set the number of Gunicorn and Celery workers, respectively, during container startup.
  • Configuration Changes:

    • Replaced the Galaxy Reports sample configuration file.
    • Removed galaxy_web, handlers, reports, and ie_proxy services from Supervisor.
    • Added Gravity for managing Galaxy services such as Gunicorn, Celery, gx-it-proxy, TUSd, reports, and handlers. It uses Supervisor as the process manager, with the configuration file located at /etc/galaxy/gravity.yml.
    • Added support for dynamic handlers (set as the default handler type).
    • Redis and Flower services are now managed by Supervisor.
    • Since Galaxy Interactive Environments are deprecated, they have been replaced by Interactive Tools (ITs). The sample configuration file tools_conf_interactive.xml.sample is placed inside GALAXY_CONFIG_DIR. Nginx is also configured to support both domain and path-based ITs.
    • Switched to using the cvmfs-config.galaxyproject.org repository for automatic configuration and updates of Galaxy project CVMFS repositories. Updated tool data table config path to include CVMFS locations from data.galaxyproject.org in --privileged mode.
    • Enabled IPv6 support in Nginx for ports 80 and 443.
    • Added Subject Alternative Name (SAN) extension (DNS:localhost and IP:127.0.0.1) while generating a self-signed SSL certificate.
    • Ensured the Nginx SSL certificate is trusted system-wide by adding it to the CA store.
    • Updated Galaxy extra dependencies.
    • Added docker_net, docker_auto_rm, and docker_set_user parameters for Docker-enabled job destinations.
    • Added update_yaml_value.py script to update nested key values in a YAML file.
    • Replaced ie_proxy with gx-it-proxy.
    • Replaced nginx_upload_module with TUSd for delegated uploads.
  • CI Tests:

    • Added the dive tool for analyzing the Docker image.
    • Added a test to check data persistence.
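
The description mentions an update_yaml_value.py script that updates nested key values in a YAML file. Its source is not shown in this conversation, so the following is only a minimal sketch of the core idea, working on an already-parsed mapping. The function name `set_nested`, the dotted-key syntax, and the example keys are assumptions made for illustration, not the script's actual interface.

```python
from typing import Any


def set_nested(config: dict, dotted_key: str, value: Any) -> dict:
    """Set config[a][b][c] = value for a dotted key such as 'a.b.c'.

    Hypothetical helper: missing intermediate levels are created, and an
    intermediate value that is not a mapping is replaced by an empty one.
    """
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[keys[-1]] = value
    return config


if __name__ == "__main__":
    # Illustrative structure only; the real gravity.yml may differ.
    gravity = {"gravity": {"gunicorn": {"bind": "localhost:4001"}}}
    set_nested(gravity, "gravity.gunicorn.workers", 4)
    print(gravity)
```

A real script would presumably load the file with a YAML parser, apply an update like this, and write the result back; that part is left out here to keep the sketch dependency-free.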

@bgruening
Owner

Very cool, I triggered a test run. It would be nice if we can get tests to turn green at some point. But they probably also need to be updated.

Thanks a lot!

@jyotipm29 jyotipm29 marked this pull request as draft October 16, 2024 19:29
@bgruening
Owner

As a first step, feel free to concentrate on the single-container option. Then the PRs are easier to review.

@jyotipm29
Contributor Author

jyotipm29 commented Oct 30, 2024

The single-container changes are done, and I’ve also updated the compose files. Everything seems to be working well, and all tests in the forked repo have passed. I also updated the tool versions for the workflow tests, as the old ones seemed incompatible.

Next, I’ll work on adding the Rustus service, integrating interactive tools, and replacing Nginx with Traefik in the compose setup. Let me know if you have any feedback on the current changes. Thanks!

@bgruening
Copy link
Owner

@jyotipm29 really impressive work. Thank you a lot.
Please split this PR into multiple PRs, so that we can get the single-container setup merged with documentation and you can try to hack further on the compose stuff.

@mira-miracoli

This looks really cool @jyotipm29 Thanks a lot! 🚀
From what I see, the tests are failing due to GitHub walltime and storage limits.
If you are sure that it works (maybe we could also test it manually again), we could already merge this. What do you think @sanjaysrikakulam?

A bit more important than the compose setup and Traefik would be to successively replace ansible-galaxy-extras with the roles that are maintained and used, e.g., in usegalaxy-eu/infrastructure-playbook or on the org server. Sorry that we did not come up with this earlier and that you already updated the role in your fork. Maybe it makes the replacement easier, because you can replace the roles one by one and run the CI tests in between.
Sorry, I think we currently lack a clear use case for the compose setup, and we should have told you earlier.

@jyotipm29
Contributor Author

Thanks! I will check that out.
The current PR will track the single-container setup and this will track the compose one.

@jyotipm29 changed the title from "bump to 24.1" to "bump to 24.1 - Single-container setup" on Oct 30, 2024
@sanjaysrikakulam

Sure! Björn already left his comments and suggestions, and I thought Jyoti would probably want to address them. However, as you pointed out, this can be merged.

@jyotipm29 Excellent work! Thank you! :)

@bgruening
Owner

bgruening commented Oct 30, 2024

@jyotipm29 what do you think about postponing the CI compose tests until the single-container tests are green? This way we save a bit of CI time and you can iterate faster on the single-container one.

@jyotipm29
Contributor Author

Yes, good idea. I will temporarily disable those tests in the next commit.

@bgruening
Owner

If someone wants to test it quickly :)

docker run -p 8080:80 quay.io/bgruening/galaxy:24.1-beta

@bgruening
Owner

I get ...

PermissionError: [Errno 13] Permission denied: '/home/galaxy/.config/conda/.condarc'

So installing tools does not work with this container. It's strange, I thought we had a test for this.

@jyotipm29
Contributor Author

This is weird. The tool installation worked in my environment.

@bgruening
Owner

Did you run with or without --privileged=true?

@jyotipm29
Contributor Author

jyotipm29 commented Oct 31, 2024

It worked both ways. I can also see in the CI test logs that the tool installation worked.

@bgruening
Owner

Which tool are you installing?

@jyotipm29
Contributor Author

I tested cherry_pick_fasta and abyss. Is there any particular tool that you want me to check?

@jyotipm29
Contributor Author

Just to confirm, the idea is to completely phase out ansible-galaxy-extras and instead use individual roles like usegalaxy_eu.nginx, usegalaxy_eu.htcondor etc, similar to how we currently use galaxyproject.postgresql in this repository. Is this correct?

@bgruening
Owner

Yes :-). And now since the tests work I would do that one commit at a time and see if tests still work.

@bgruening
Owner

It seems that if you start Galaxy with -v foo:/export, stop it, and start it again, it does not come up (nginx error). This is important functionality, as it allows persisting the state of the application.

@jyotipm29
Contributor Author

I can't reproduce the issue. I have tested it and the persistence works. Please share what error you see.

@bgruening
Owner

Start the container and install a tool.
Look at (base) root@b1b310dd2cfc:/galaxy-central# ll /export/shed_tools/: there is nothing under /export.

It is here:

(base) root@b1b310dd2cfc:/galaxy-central# head /etc/galaxy/shed_tool_conf.xml 
<?xml version="1.0" ?>
<toolbox tool_path="/galaxy-central/database/shed_tools">
    <tool file="toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/541082d03bef/samtools_stats/samtools_stats.xml" guid="toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.5">
        <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
        <repository_name>samtools_stats</repository_name>
        <repository_owner>devteam</repository_owner>
        <installed_changeset_revision>541082d03bef</installed_changeset_revision>
        <id>toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.5</id>
        <version>2.0.5</version>
    </tool>
(base) root@b1b310dd2cfc:/galaxy-central# ll /galaxy-central/database/shed_tools/
total 0
drwxr-xr-x 3 galaxy galaxy 60 Nov  8 10:10 toolshed.g2.bx.psu.edu/
(base) root@b1b310dd2cfc:/galaxy-central# 

When you now stop the container and run the same command again, mounting the same directory, it cannot load the installed tool because it was not in the /export dir. The result is that the tool is no longer installed. But there is also a different problem: I only see an nginx error page. Galaxy does not start in this setting.
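
The check described above (whether the shed tool configuration points at a location under /export) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `tool_path_is_exported` is hypothetical, and the sample XML is reduced to the `toolbox` element from the output quoted above.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def tool_path_is_exported(shed_tool_conf_xml: str, export_dir: str = "/export") -> bool:
    """Return True if the toolbox tool_path is export_dir or lies below it.

    Hypothetical helper; it only inspects the tool_path attribute of the
    root <toolbox> element and does not touch the filesystem.
    """
    root = ET.fromstring(shed_tool_conf_xml)
    tool_path = PurePosixPath(root.get("tool_path", ""))
    export = PurePosixPath(export_dir)
    return tool_path == export or export in tool_path.parents


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" ?>\n'
        '<toolbox tool_path="/galaxy-central/database/shed_tools">\n'
        "</toolbox>\n"
    )
    # False: installed tools land outside the mounted /export volume.
    print(tool_path_is_exported(sample))
```

Whether a path outside /export is still persisted depends on how the container's startup handles the database directory, which this sketch does not model.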

@jyotipm29
Contributor Author

In the original Dockerfile, we just create the /shed_tools directory, and Galaxy is not configured to use that path. As for the nginx issue, are you refreshing the web page immediately after restarting the container? It takes a few minutes to start up again.
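
Because startup can take a few minutes, a restart test is easier to judge with a polling loop than with manual page refreshes. The following is a rough sketch, not something from this PR: the function names, the timeout values, and the probed URL are assumptions. Port 8080 matches the docker run example earlier in this thread, and /api/version is used here as a lightweight Galaxy endpoint.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection refused, proxy error pages (e.g. 502), and timeouts.
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 600.0,
                     interval: float = 5.0) -> bool:
    """Call probe() every `interval` seconds until it succeeds or `timeout` passes."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Assumed URL for a container started with -p 8080:80.
    ready = wait_until_ready(lambda: http_ok("http://localhost:8080/api/version"))
    print("Galaxy is up" if ready else "Galaxy did not come up in time")
```

If the loop times out even with a generous limit, the nginx error page is more likely a real startup failure than a matter of waiting.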

@jyotipm29
Contributor Author

Since we already persist the database folder, do we need the /export/shed_tools dir?

@jyotipm29 jyotipm29 marked this pull request as ready for review November 9, 2024 01:59
@jyotipm29
Contributor Author

Setting managed_config_dir to database/config would fix this, I guess (since database is persisted, and earlier the config pointed to /etc/galaxy by default).

@bgruening
Owner

I will document what I found so far here. Not everything needs to be addressed; it's just a list to keep track.

I'm tempted to merge as it is, close old issues, and ping old users to give this new version a try, as soon as we can ensure the /export functionality works.

  • Major: we need to ensure that most, ideally all, useful tests are running. See my comments about the old Travis stuff. New tests cannot hurt either.

  • I was still not able to get the container running with an exported version of data under /export. My Galaxy simply does not start under those conditions, even if I wait for multiple minutes.

When I restart a container with a full /export:

(base) root@fc28a50a6928:/galaxy-central# galaxyctl status
Dynamic handlers are configured in Gravity but Galaxy is not configured to assign jobs to handlers dynamically, so these handlers will not handle jobs. Set the job handler assignment method in the Galaxy job configuration to `db-skip-locked` or `db-transaction-isolation` to fix this.
supervisord is not running
  • [2024-11-09T16:40:51.456] error: Node configuration differs from hardware: CPUs=12:12(hw) Boards=1:1(hw) SocketsPerBoard=12:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore=1:2(hw)

Cosmetic changes:

  • /root/502/index.shtml does not exist, although it is a configured nginx error page
  • the image is still double the size of the last version (20.09), but we can investigate this another time
  • you did a great job in adding galaxy-root-path everywhere; maybe it's time to rename the entire folder to just galaxy instead of galaxy-central. This is a really old name no one remembers anymore :)

Owner

Travis does not seem to be used anymore; at least it is not triggered. Do you have time to move the tests over to GitHub Actions?

Contributor Author

I can do that

@bgruening
Owner

(base) root@cdb56fab569a:/galaxy-central# ll -ah /home/galaxy/
total 60K
drwxr-x--- 1 galaxy galaxy 4.0K Nov  9 18:30 ./
drwxr-xr-x 1 root   root   4.0K Nov  9 15:57 ../
drwxr-xr-x 3 root   root   4.0K Nov  9 18:02 .ansible/
-rw------- 1 galaxy galaxy   39 Nov  9 18:30 .bash_history
-rw-r--r-- 1 galaxy galaxy  220 Jan  6  2022 .bash_logout
-rw-r--r-- 1 galaxy galaxy 3.4K Nov  9 14:46 .bashrc
drwxrwxr-x 1 galaxy galaxy 4.0K Nov  9 18:03 .cache/
drwx------ 3 root   root   4.0K Nov  9 18:10 .config/
drwxr-xr-x 1 root   root   4.0K Nov  9 16:21 ephemeris/

Some folders are owned by root, which means that certain operations crash when run as the galaxy user.
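
An ownership check like this could be automated in a test. The snippet below is a small sketch under assumptions: the helper name `entries_not_owned_by` is hypothetical, and it only inspects the top level of the directory, like the listing above.

```python
from pathlib import Path


def entries_not_owned_by(directory: str, uid: int) -> list[str]:
    """Return the sorted names of top-level entries whose owner is not uid."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.lstat().st_uid != uid  # lstat: do not follow symlinks
    )


if __name__ == "__main__":
    import pwd  # Unix only

    # Assumes a "galaxy" user exists, as it does inside the container.
    galaxy_uid = pwd.getpwnam("galaxy").pw_uid
    for name in entries_not_owned_by("/home/galaxy", galaxy_uid):
        print(name)
```

For the listing above, such a check would report .ansible, .config, and ephemeris, the three entries owned by root.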

@jyotipm29
Contributor Author

The root user's home directory was set to /home/galaxy. I will fix it in the next commit.

@sanjaysrikakulam sanjaysrikakulam left a comment

I think this is ready to be merged. I ran a quick test, and I was able to spin it up, install the tools, upload files, run jobs, and run ITs.

Great work @jyotipm29!

@jyotipm29
Contributor Author

Thanks! I really appreciate it! :)

@bgruening bgruening merged commit c369bc7 into bgruening:master Nov 12, 2024
2 checks passed
@bgruening
Owner

Great work!

@jyotipm29 jyotipm29 deleted the 24.1 branch November 12, 2024 17:50