Run caused orphaned MPI processes to accumulate on server #95

Open
ArtPoon opened this issue Dec 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@ArtPoon
Contributor

ArtPoon commented Dec 10, 2024

This seems to be associated with the MPI call here:

gromstole/freyja/process.py

Lines 567 to 581 in 8917d1e

cmd = ["mpirun", "-np", str(args.np),
       "python3", "trim.py", paths.name, processed_files.name,
       "--freyja", args.freyja,
       "--minimap2", args.minimap2,
       "--cutadapt", args.cutadapt,
       "--outdir", args.outdir,
       "--indir", args.indir]
if args.sendemail:
    cmd.append("--sendemail")
try:
    subprocess.check_call(cmd)
except subprocess.CalledProcessError:
    sys.stderr.write(f"Error running {' '.join(cmd)}\n")
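One way to make sure a failed or hung run cannot leave processes behind is to launch mpirun in its own session and signal the whole process group when it fails. The sketch below is a hypothetical replacement for the check_call block above (POSIX only; `run_and_reap`, `timeout` and `grace` are made-up names). It assumes mpirun and whatever it forks locally stay in mpirun's process group. If the launcher moves ranks into groups of their own, the SIGTERM delivered to mpirun itself is what has to prompt it to tear the job down.

```python
import os
import signal
import subprocess
import sys


def _signal_group(pgid, sig):
    """Send sig to every process in group pgid, ignoring an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_and_reap(cmd, timeout=None, grace=10):
    """Run cmd in a new session; if it fails or times out, signal its whole process group.

    Returns the exit status (negative N means the leader was killed by signal N).
    """
    # start_new_session=True runs setsid() in the child, so it becomes the leader
    # of a new process group whose ID equals its PID. Processes it forks inherit
    # that group unless they explicitly move to another one.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        returncode = None  # still running after `timeout` seconds

    if returncode != 0:
        # Ask politely first (Open MPI's mpirun documents relaying SIGTERM to the
        # job's processes before escalating to SIGKILL itself).
        _signal_group(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace)  # give the group leader time to shut down
        except subprocess.TimeoutExpired:
            pass
        # Then force-kill anything still left in the group.
        _signal_group(proc.pid, signal.SIGKILL)
        if returncode is None:
            returncode = proc.wait()
        sys.stderr.write(f"Error running {' '.join(cmd)} (exit status {returncode})\n")
    return returncode
```

Used as `rc = run_and_reap(cmd, timeout=...)` in place of `subprocess.check_call(cmd)`. One side effect of `start_new_session=True` is that Ctrl-C in the terminal no longer reaches mpirun directly, which is usually acceptable for a server-side pipeline.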

ArtPoon added the bug label Dec 10, 2024
@GopiGugan
Collaborator

It looks like the original problem with the run was a corrupt gzipped file:

ERROR: Traceback (most recent call last):
  File "/home/gromstole/.local/lib/python3.10/site-packages/cutadapt/runners.py", line 87, in run
    for index, chunks in enumerate(self._read_chunks(*files)):
  File "/home/gromstole/.local/lib/python3.10/site-packages/cutadapt/runners.py", line 101, in _read_chunks
    for chunks in dnaio.read_paired_chunks(
  File "/home/gromstole/.local/lib/python3.10/site-packages/dnaio/chunks.py", line 172, in read_paired_chunks
    bufend1 = f.readinto(memoryview(buf1)[start1:]) + start1  # type: ignore
  File "/home/gromstole/miniconda3/envs/freyja/lib/python3.10/gzip.py", line 301, in read
    return self._buffer.read(size)
  File "/home/gromstole/miniconda3/envs/freyja/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/gromstole/.local/lib/python3.10/site-packages/isal/igzip.py", line 297, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/home/gromstole/Wastewater/gromstole/freyja/trim.py", line 278, in <module>
    tf1, tf2 = cutadapt(fq1=r1, fq2=r2, ncores=2, path=args.cutadapt, sendemail=args.sendemail)
  File "/home/gromstole/Wastewater/gromstole/freyja/trim.py", line 58, in cutadapt
    _ = subprocess.check_call(cmd)
  File "/home/gromstole/miniconda3/envs/freyja/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cutadapt', '-a', 'AGATCGGAAGAGC', '-A', 'AGATCGGAAGAGC', '-o', '/tmp/tmp7w5uc4x0', '-p', '/tmp/tmp_4qdzgpi', '-j', '2', '-m', '10', '--quiet', '/home/wastewater/uploads/waterloo/run172/A3-1698-V5SP_S120_L001_R1_001.fastq.gz', '/home/wastewater/uploads/waterloo/run172/A3-1698-V5SP_S120_L001_R2_001.fastq.gz']' returned non-zero exit status 1.

$ gzip -t /home/wastewater/uploads/waterloo/run172/A3-1698-V5SP_S120_L001_R1_001.fastq.gz

gzip: /home/wastewater/uploads/waterloo/run172/A3-1698-V5SP_S120_L001_R1_001.fastq.gz: unexpected end of file
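Truncated uploads like this could be caught before the MPI job is launched by doing roughly what `gzip -t` does for each input. A minimal sketch using only the standard library (`gzip_is_intact` is a hypothetical helper name); it decompresses each file to the end, so it costs one full read per file:

```python
import gzip
import zlib


def gzip_is_intact(path_or_file, chunk_size=1 << 20):
    """Return True if the gzip stream decompresses cleanly to its end-of-stream marker.

    Accepts a filename or a binary file object, like gzip.open().
    """
    try:
        with gzip.open(path_or_file, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream ("Compressed file ended before the
        # end-of-stream marker was reached"); gzip.BadGzipFile (an OSError
        # subclass): bad header or CRC; zlib.error: corrupt deflate data.
        return False
    return True
```

In process.py this could run over each R1/R2 pair before the mpirun command is built, skipping or reporting bad files. How process.py collects its inputs isn't shown in the excerpt, so where exactly to call it is an assumption.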

But that doesn't explain why the child processes didn't terminate gracefully. We are catching the subprocess.CalledProcessError.

@GopiGugan
Collaborator

I wonder if this might be the reason:

By default, this error handler aborts the MPI job, except for I/O function errors
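That default handler only applies to errors returned by MPI calls. The failure in the traceback above is an ordinary Python exception (the CalledProcessError raised from `cutadapt()` at trim.py line 278), which the MPI error handler never sees. That rank just exits, and depending on the MPI implementation the remaining ranks may keep running or block waiting for it. Assuming trim.py uses mpi4py (not confirmed in this thread), one option is to wrap each rank's work so that any exception aborts the whole job. `run_rank` is a hypothetical helper; the `abort` callable is injected so the sketch runs without MPI:

```python
import sys
import traceback


def run_rank(work, abort, errcode=1):
    """Run this rank's work; on any uncaught exception, abort the whole MPI job.

    `abort` would be MPI.COMM_WORLD.Abort under mpi4py (assumed wiring);
    it is a parameter here so the logic can be exercised without MPI.
    """
    try:
        return work()
    except Exception:
        # Report first: Abort() may terminate this process before Python's
        # normal exception reporting gets a chance to run.
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
        abort(errcode)
        raise  # only reached if abort() returns (e.g. a test stub)
```

With mpi4py this would be called as `run_rank(main, MPI.COMM_WORLD.Abort)`. mpi4py's documentation also describes launching scripts as `python -m mpi4py trim.py ...`, which calls MPI_Abort() on an unhandled exception; that may be the smaller change, but it is worth confirming against the installed version. Either way, aborting stops the ranks that were processing good samples too; catching the error inside the per-sample loop and skipping the bad sample would let them finish instead.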
