
Concurrent Runs / Run while data files are being uploaded #58

Open
GopiGugan opened this issue Jun 2, 2022 · 5 comments
@GopiGugan (Collaborator)

  1. The autoprocess.py script sometimes runs twice, producing duplicate *.mapped.csv and *.coverage.csv files. This happens because a second instance of autoprocess.py starts before the first one terminates (one possible guard is sketched after the traceback below).
  2. The script starts while data files are still being uploaded, resulting in the following error:
ERROR: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/cutadapt-1.18-py3.6-linux-x86_64.egg/cutadapt/pipeline.py", line 399, in reader_process
    for chunk_index, (chunk1, chunk2) in enumerate(read_paired_chunks(f, f2, buffer_size)):
  File "/usr/local/lib/python3.6/dist-packages/cutadapt-1.18-py3.6-linux-x86_64.egg/cutadapt/seqio.py", line 890, in read_paired_chunks
    bufend2 = f2.readinto(memoryview(buf2)[start2:]) + start2
  File "/usr/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
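
For the first point, a minimal sketch of one way to keep a second instance from starting while the first is still running, using an exclusive lock file (the lock path and helper name are hypothetical, not part of autoprocess.py):

```python
import fcntl
import sys

LOCK_PATH = "/tmp/autoprocess.lock"  # hypothetical location; any fixed writable path works

def acquire_lock():
    """Return an open handle holding an exclusive lock, or None if another instance has it."""
    handle = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock: raises immediately if another
        # autoprocess.py instance already holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle
    except BlockingIOError:
        handle.close()
        return None

if __name__ == "__main__":
    lock = acquire_lock()
    if lock is None:
        print("Another instance of autoprocess.py is still running; exiting.")
        sys.exit(0)
    # ... run the pipeline; the lock is released when this process exits.
```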
@ArtPoon (Contributor) commented Jun 7, 2022

Is it possible to use file modification dates to check whether a new upload has started after the script was initialized?
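
One way to read this, sketched under the assumption that a file whose modification time is still changing is part of an upload in progress (the helper name and 60-second window are hypothetical):

```python
import os
import time

def recently_modified(path, window=60):
    """Return True if the file changed within the last `window` seconds,
    suggesting an upload may still be in progress."""
    return (time.time() - os.path.getmtime(path)) < window

# Example: leave recently modified FASTQ files for the next run.
# for path in fastq_files:
#     if recently_modified(path):
#         continue
```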

@ArtPoon (Contributor) commented Oct 19, 2022

We still need a solution for cases where the pipeline is run while data files are being uploaded.

@ArtPoon (Contributor) commented Oct 19, 2022

Would a possible fix be to skip fastq.gz files that are incomplete?
There is also the edge case where an R1 file is present but the corresponding R2 file is absent (a sketch of both checks follows).
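
A sketch of both checks, assuming a truncated upload shows up as an incomplete gzip stream and that mates follow the usual _R1_/_R2_ naming (both are assumptions, not confirmed for this pipeline):

```python
import gzip
import os

def gzip_is_complete(path):
    """Return False if the gzip stream is truncated, e.g. a partial upload."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(1 << 20):  # decompress to the end in 1 MB chunks
                pass
        return True
    except (EOFError, OSError):
        return False

def has_mate(r1_path):
    """Check that the matching R2 file exists (assumes _R1_/_R2_ naming)."""
    r2_path = r1_path.replace("_R1_", "_R2_")
    return r2_path != r1_path and os.path.exists(r2_path)

# Example: only queue read pairs that are complete and fully uploaded.
# if has_mate(r1) and gzip_is_complete(r1) and gzip_is_complete(r1.replace("_R1_", "_R2_")):
#     queue_for_processing(r1)
```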

@ArtPoon (Contributor) commented Mar 7, 2023

There is still no solution for the case where the pipeline is run while a user is uploading new data.

@ArtPoon (Contributor) commented Apr 9, 2024

  • The remaining issue is that if a lab uploads new data to the server while the pipeline is running, the pipeline will terminate when it attempts to read an incomplete file (a partial upload).
  • Can Python detect when a file is being written to by another process? If the pipeline encounters a file that is locked for writing by another process, it should delay reading it; if the file is still locked after a reasonable delay (10 minutes?), the pipeline should exit with an error.
  • @GopiGugan described another approach: use the run summary file (the manifest of FASTQ files in the run) to determine which files to look for; the pipeline would catch the exception raised for an incomplete file and not write that file to the database, so that when the pipeline runs again, the file would be flagged as new for processing (see the sketch after this list).
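
A sketch of the delay-then-fail idea; since POSIX does not portably expose whether another process has a file open for writing, "still being written" is approximated here by a changing size/mtime. The 10-minute limit comes from the comment above; the helper names and polling interval are hypothetical:

```python
import os
import time

def wait_until_stable(path, poll=30, timeout=600):
    """Block until the file's size and mtime stop changing; raise after `timeout` seconds."""
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        stat = os.stat(path)
        current = (stat.st_size, stat.st_mtime)
        if current == last:
            return  # unchanged over one polling interval: assume the upload finished
        last = current
        time.sleep(poll)
    raise RuntimeError(f"{path} still changing after {timeout} s; aborting this run")

# Manifest-driven variant: catch the failure, skip the file so nothing is written
# to the database, and let the next pipeline run pick it up as new.
# try:
#     wait_until_stable(path)
#     process(path)
# except (RuntimeError, EOFError):
#     pass  # leave this file for the next run
```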
