merge master
Acribbs committed Dec 3, 2024
2 parents 38e8946 + 243f74e commit 884d0bd
Showing 82 changed files with 4,097 additions and 2,648 deletions.
36 changes: 35 additions & 1 deletion .github/workflows/cgatcore_python.yml
Original file line number Diff line number Diff line change
@@ -19,14 +19,15 @@ jobs:

steps:
- uses: actions/checkout@v3

- name: Cache conda
uses: actions/cache@v3
env:
# Increase this value to reset cache if conda/environments/cgat-core.yml has not changed
CACHE_NUMBER: 0
with:
path: ~/conda_pkgs_dir
key: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('conda/environments/cgat-core.yml') }}

- name: Set installer URL
id: set-installer-url
run: |
@@ -35,6 +36,7 @@ jobs:
elif [[ "${{ matrix.os }}" == "macos-latest" ]]; then
echo "installer-url=https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh" >> $GITHUB_ENV
fi
- uses: conda-incubator/setup-miniconda@v2
with:
installer-url: ${{ env.installer-url }}
@@ -43,13 +45,45 @@ jobs:
channel-priority: true
activate-environment: cgat-core
environment-file: conda/environments/cgat-core.yml

- name: Configure Conda Paths
run: echo "/usr/share/miniconda3/condabin" >> $GITHUB_PATH

- name: Show conda
run: |
conda info
conda list
- name: Debug Python Environment
run: |
python --version
pip list
openssl version
- name: Test
run: |
pip install .
./all-tests.sh
deploy_docs:
name: Deploy MkDocs Documentation
runs-on: ubuntu-latest
needs: build

steps:
- uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install MkDocs and Dependencies
run: |
pip install mkdocs mkdocs-material mkdocstrings[python]
- name: Build and Deploy MkDocs Site
run: mkdocs gh-deploy --force --clean
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2019 cgat-developers
Copyright (c) 2024 cgat-developers

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
33 changes: 5 additions & 28 deletions README.md
@@ -1,29 +1,18 @@

![CGAT-core](https://github.com/cgat-developers/cgat-core/blob/master/docs/img/CGAT_logo.png)
----------------------------------------
![Licence](https://img.shields.io/github/license/cgat-developers/cgat-core.svg)
![Conda](https://img.shields.io/conda/v/bioconda/cgatcore.svg)
![Build Status](https://github.com/cgat-developers/cgat-core/actions/workflows/cgatcore_python.yml/badge.svg)

<p align="left">
<a href="https://readthedocs.org/projects/cgat-core/badge/?version=latest", alt="Documentation">
<img src="https://readthedocs.org/projects/cgat-core/badge/?version=latest" /></a>
<a href="https://travis-ci.org/cgat-developers/cgat-core", alt="Travis">
<img src="https://img.shields.io/travis/cgat-developers/cgat-core.svg" /></a>
<a href="https://twitter.com/cgat_oxford?lang=en", alt="Twitter followers">
<img src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social&logo=twitter" /></a>
<a href="https://twitter.com/cgat_oxford?lang=en", alt="Twitter followers">
<img src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social&logo=twitter" /></a>
</p>

----------------------------------------

CGAT-core is a workflow management system that allows users to quickly and reproducibly build scalable
data analysis pipelines. It provides a set of libraries and helper functions that enable researchers
to design and build computational workflows for large-scale data analysis.

Documentation for CGAT-core can be accessed at [read the docs](http://cgat-core.readthedocs.io/en/latest/)

Used in combination with CGAT-apps, we have demonstrated the functionality of our
flexible implementation using a set of well documented, easy to install and easy to use workflows,
called [CGAT-flow](https://github.com/cgat-developers/cgat-flow) ([Documentation](https://www.cgat.org/downloads/public/cgatpipelines/documentation/)).
Documentation for CGAT-core can be accessed [here](https://cgat-developers.github.io/cgat-core/)

CGAT-core is open-sourced, powerful and user-friendly, and has been continually developed
as a Next Generation Sequencing (NGS) workflow management system over the past 10 years.
@@ -32,19 +21,7 @@ as a Next Generation Sequencing (NGS) workflow management system over the past 1
Installation
============

The following sections describe how to install the [cgatcore](https://cgat-core.readthedocs.io/en/latest/index.html) framework. For instructions on how to install
our other repos, CGAT-apps (scripts) and CGAT-flow (workflows/pipelines), please follow these instructions [here](https://www.cgat.org/downloads/public/cgatpipelines/documentation/InstallingPipelines.html).
The following sections describe how to install the [cgatcore](https://cgat-developers.github.io/cgat-core/) framework.

The preferred method to install the cgatcore is using conda, by following the instructions on [read the docs](https://cgat-core.readthedocs.io/en/latest/getting_started/Installation.html). However, there are a few other methods to install cgatcore, including pip and our own bash script installer.

Linux vs OS X
=============

* ulimit works as expected in Linux but it does not have an effect on OS X. [Disabled](https://github.com/cgat-developers/cgat-core/commit/d4d9b9fb75525873b291028a622aac70c44a5065) ulimit tests for OS X.

* ssh.connect times out in OSX. Exception [caught](https://github.com/cgat-developers/cgat-core/commit/d4d9b9fb75525873b291028a622aac70c44a5065)

* Linux uses /proc/meminfo and OS X uses [vm_stat](https://github.com/cgat-developers/cgat-core/compare/bb1c75df8f42...575f0699b326)

* Currently our testing framework is broken for OSX, however we are working to fix this. However, we dont envisage any issues running the code at present.

1 change: 1 addition & 0 deletions all-tests.sh
@@ -18,3 +18,4 @@ pytest -v tests/test_pipeline_cli.py
pytest -v tests/test_pipeline_actions.py
pytest -v tests/test_execution_cleanup.py
pytest -v tests/test_s3_decorators.py
pytest -v tests/test_container.py
130 changes: 121 additions & 9 deletions cgatcore/pipeline/execution.py
@@ -125,6 +125,60 @@ def _pickle_args(args, kwargs):
return (submit_args, args_file)


class ContainerConfig:
"""Container configuration for pipeline execution."""

def __init__(self, image=None, volumes=None, env_vars=None, runtime="docker"):
"""
Args:
image (str): Container image (e.g., "ubuntu:20.04").
volumes (list): Volume mappings (e.g., ['/data:/data']).
env_vars (dict): Environment variables for the container.
runtime (str): Container runtime ("docker" or "singularity").
"""
self.image = image
self.volumes = volumes or []
self.env_vars = env_vars or {}
self.runtime = runtime.lower() # Normalise to lowercase

if self.runtime not in ["docker", "singularity"]:
raise ValueError("Unsupported container runtime: {}".format(self.runtime))

def get_container_command(self, statement):
"""Convert a statement to run inside a container."""
if not self.image:
return statement

if self.runtime == "docker":
return self._get_docker_command(statement)
elif self.runtime == "singularity":
return self._get_singularity_command(statement)
else:
raise ValueError("Unsupported container runtime: {}".format(self.runtime))

def _get_docker_command(self, statement):
"""Generate a Docker command."""
volume_args = [f"-v {volume}" for volume in self.volumes]
env_args = [f"-e {key}={value}" for key, value in self.env_vars.items()]

return " ".join([
"docker", "run", "--rm",
*volume_args, *env_args, self.image,
"/bin/bash", "-c", f"'{statement}'"
])

def _get_singularity_command(self, statement):
"""Generate a Singularity command."""
volume_args = [f"--bind {volume}" for volume in self.volumes]
env_args = [f"--env {key}={value}" for key, value in self.env_vars.items()]

return " ".join([
"singularity", "exec",
*volume_args, *env_args, self.image,
"bash", "-c", f"'{statement}'"
])
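
A note on the quoting in `_get_docker_command` and `_get_singularity_command` above: both wrap the statement in literal single quotes, so a statement that itself contains a single quote would likely break the generated shell command. Below is a minimal sketch of one alternative, assuming Python 3.8+ for `shlex.join`. The function `build_docker_command` is a hypothetical standalone helper written for illustration and is not part of cgatcore.

```python
import shlex


def build_docker_command(statement, image, volumes=(), env_vars=None):
    """Hypothetical helper: build a `docker run` command with shell-safe quoting."""
    args = ["docker", "run", "--rm"]
    for volume in volumes:
        # Keep the flag and its value as separate arguments.
        args += ["-v", volume]
    for key, value in (env_vars or {}).items():
        args += ["-e", f"{key}={value}"]
    # The whole statement is passed to bash as a single argument.
    args += [image, "/bin/bash", "-c", statement]
    # shlex.join quotes each argument so the string survives shell parsing.
    return shlex.join(args)


cmd = build_docker_command(
    "echo 'hello world'",
    "ubuntu:20.04",
    volumes=["/data:/data"],
    env_vars={"GREETING": "hi there"},
)
print(cmd)
```

Because the result round-trips through `shlex.split`, the statement reaches `/bin/bash -c` as one argument even when it contains quotes or spaces. The same approach would apply to the Singularity variant with `--bind` and `--env`.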


def start_session():
"""start and initialize the global DRMAA session."""
global GLOBAL_SESSION
@@ -789,6 +843,13 @@ def get_val(d, v, alt):

return benchmark_data

def set_container_config(self, image, volumes=None, env_vars=None, runtime="docker"):
"""Set container configuration for all tasks executed by this executor."""

if not image:
raise ValueError("An image must be specified for the container configuration.")
self.container_config = ContainerConfig(image=image, volumes=volumes, env_vars=env_vars, runtime=runtime)
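
One detail worth checking in this diff: `ContainerConfig.__init__` lowercases the runtime before validating it, whereas the validation added to `run()` compares `container_runtime` against the lowercase names directly, so a value such as `"Docker"` would apparently be accepted by one and rejected by the other. A minimal sketch of a shared normalisation step follows; `normalise_runtime` and `SUPPORTED_RUNTIMES` are hypothetical names introduced here, not existing cgatcore identifiers.

```python
SUPPORTED_RUNTIMES = ("docker", "singularity")


def normalise_runtime(runtime):
    """Hypothetical helper: lowercase and validate a container runtime name."""
    if runtime is None:
        # No runtime requested: run the statement without a container.
        return None
    name = str(runtime).strip().lower()
    if name not in SUPPORTED_RUNTIMES:
        raise ValueError(f"Unsupported container runtime: {runtime}")
    return name


print(normalise_runtime("Docker"))
```

Calling one helper like this from both places would keep the two checks consistent.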

def start_job(self, job_info):
"""Add a job to active_jobs list when it starts."""
self.active_jobs.append(job_info)
@@ -838,15 +899,63 @@ def cleanup_failed_job(self, job_info):
else:
self.logger.info(f"Output file not found (already removed or not created): {outfile}")

def run(self, statement_list):
"""Run a list of statements and track each job's lifecycle."""
def run(
self,
statement_list,
job_memory=None,
job_threads=None,
container_runtime=None,
image=None,
volumes=None,
env_vars=None,
**kwargs,):

"""
Execute a list of statements with optional container support.
Args:
statement_list (list): List of commands to execute.
job_memory (str): Memory requirements (e.g., "4G").
job_threads (int): Number of threads to use.
container_runtime (str): Container runtime ("docker" or "singularity").
image (str): Container image to use.
volumes (list): Volume mappings (e.g., ['/data:/data']).
env_vars (dict): Environment variables for the container.
**kwargs: Additional arguments.
"""
# Validation checks
if container_runtime and container_runtime not in ["docker", "singularity"]:
self.logger.error(f"Invalid container_runtime: {container_runtime}")
raise ValueError("Container runtime must be 'docker' or 'singularity'")

if container_runtime and not image:
self.logger.error(f"Container runtime specified without an image: {container_runtime}")
raise ValueError("An image must be specified when using a container runtime")

benchmark_data = []

for statement in statement_list:
job_info = {"statement": statement}
self.start_job(job_info) # Add job to active_jobs
self.start_job(job_info)

try:
# Execute job
# Prepare containerized execution
if container_runtime:
self.set_container_config(image=image, volumes=volumes, env_vars=env_vars, runtime=container_runtime)
statement = self.container_config.get_container_command(statement)

# Add memory and thread environment variables
if job_memory:
env_vars = env_vars or {}
env_vars["JOB_MEMORY"] = job_memory
if job_threads:
env_vars = env_vars or {}
env_vars["JOB_THREADS"] = job_threads

# Debugging: Log the constructed command
self.logger.info(f"Executing command: {statement}")

# Build and execute the statement
full_statement, job_path = self.build_job_script(statement)
process = subprocess.Popen(
full_statement, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
@@ -856,19 +965,22 @@ def run(self, statement_list):
if process.returncode != 0:
raise OSError(
f"Job failed with return code {process.returncode}.\n"
f"stderr: {stderr.decode('utf-8')}\nstatement: {statement}"
f"stderr: {stderr.decode('utf-8')}\ncommand: {statement}"
)

# Collect benchmark data if job was successful
# Collect benchmark data for successful jobs
benchmark_data.append(
self.collect_benchmark_data([statement], resource_usage=[{"job_id": process.pid}])
self.collect_benchmark_data(
statement, resource_usage={"job_id": process.pid}
)
)
self.finish_job(job_info) # Remove job from active_jobs
self.finish_job(job_info)

except Exception as e:
self.logger.error(f"Job failed: {e}")
self.cleanup_failed_job(job_info)
continue
if not self.ignore_errors:
raise

return benchmark_data
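
A possible ordering issue in `run()` as shown: `JOB_MEMORY` and `JOB_THREADS` are added to `env_vars` only after `get_container_command` has already wrapped the statement, so the first statement in the list appears to be built without them. The caller's `env_vars` dict is also modified in place, and `job_threads` is stored as an integer. The sketch below merges the resource hints up front, under those assumptions; `merge_resource_env` is a hypothetical helper and not part of cgatcore.

```python
def merge_resource_env(env_vars=None, job_memory=None, job_threads=None):
    """Hypothetical helper: return a new dict with resource hints added."""
    merged = dict(env_vars or {})  # copy so the caller's dict is not mutated
    if job_memory:
        merged["JOB_MEMORY"] = str(job_memory)
    if job_threads:
        merged["JOB_THREADS"] = str(job_threads)  # env values are strings
    return merged


# Intended order inside a run()-style method (sketch only):
#   1. env = merge_resource_env(env_vars, job_memory, job_threads)
#   2. configure the container with env
#   3. wrap each statement with the container command
caller_env = {"LANG": "C"}
env = merge_resource_env(caller_env, job_memory="4G", job_threads=2)
print(env)
```

Doing the merge once, before the loop over `statement_list`, would give every statement the same environment.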

1 change: 1 addition & 0 deletions conda/environments/cgat-core.yml
@@ -36,3 +36,4 @@ dependencies:
- paramiko
- pytest
- pytest-pep8
- pyopenssl>=23.2.0
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.
