Releases: molgenis/NGS_Automated

4.3.0

13 Sep 12:52
548e669
  • included RNA pipeline

  • included nanopore pipeline

    • copyRawNanoporeDataToPrm.sh (copy fastQ and pod5 files to prm)
    • copyRawNanoporeDataToTmp.sh (copy raw data to tmp)
    • calculateMd5NanoporeFastQ.sh (calculates checksums on nanopore machine)
    • startNanoporePipeline.sh (starts vip nextflow pipeline)
  • added constraint to prevent inhouse pipeline running on dragen node and vice versa
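The checksum step listed for calculateMd5NanoporeFastQ.sh could look roughly like the sketch below. This is not the content of that script: the helper name `write_fastq_checksums`, the `*.fastq.gz` pattern and the idea of writing one `.md5` file next to each FastQ file are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write one .md5 file next to every FastQ file below a run directory.
write_fastq_checksums() {
    local run_dir="${1}"
    local fastq
    while IFS= read -r -d '' fastq; do
        # Run md5sum from inside the file's own directory so the .md5 file holds a
        # relative file name and can be verified with 'md5sum -c' after copying.
        (
            cd "$(dirname "${fastq}")"
            md5sum "$(basename "${fastq}")" > "$(basename "${fastq}").md5"
        )
    done < <(find "${run_dir}" -type f -name '*.fastq.gz' -print0)
}

# Example call (the path is a placeholder):
#   write_fastq_checksums /path/to/nanopore/run
```

After the data has been copied, running `md5sum -c <file>.md5` in the destination directory would confirm that the copy is intact.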

bugfixes:

4.2.0

25 Mar 14:16
a087eeb

New

  • includes the new nf_ngs_dna WGS pipeline (nextflow)
    • new script startNextflowDragenPipeline.sh
  • new script that (re)moves the sequencing data from the sequencers_incoming to the sequencers folder
    • data will be removed 2 days after .transferCompleted file is created
  • added build column in the samplesheet when not present (in moveAndCheckSamplesheet)
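The two-day retention rule for sequencers_incoming can be sketched as follows. The helper name `list_expired_runs` and the layout (one sub-folder per run, each holding its own .transferCompleted marker) are assumptions, and the marker's modification time is used as a stand-in for its creation time. The sketch only lists candidates; it does not delete anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print run directories whose .transferCompleted marker
# is older than the given number of days (default: 2).
list_expired_runs() {
    local incoming_dir="${1}"
    local max_age_days="${2:-2}"
    # -mmin is used instead of -mtime because -mtime rounds ages down to whole days.
    # -printf '%h\n' (GNU find) prints the directory that contains the marker.
    find "${incoming_dir}" -mindepth 2 -maxdepth 2 -type f -name '.transferCompleted' \
        -mmin "+$(( max_age_days * 24 * 60 ))" -printf '%h\n'
}

# Example call (the path is a placeholder):
#   list_expired_runs /path/to/sequencers_incoming 2
```

Keeping the listing separate from the removal makes it easy to review which runs would be affected before anything is deleted.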

Edits

  • genomescan will now write UMCG_CSV samplesheet in the root of the batch (instead of in the Raw_data folder)
  • tar gzipped jobs file is now without complete data structure, only contains "basename"

Bugfixes

  • when a piped head finishes before the whole file has been read (which leads to an error); tail is used now instead
  • when Genomescan uses sequencers with longer names
  • in demultiplexing-only mode a finished state was never reached (removed a misplaced continue command)
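The head-related bugfix above is easiest to understand with `pipefail` enabled. The release notes do not show the affected command, so the sketch below uses `seq` as a stand-in for the file reader; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -uo pipefail

# 'head' exits after the first line; the writer then fails on its next write
# (typically killed by SIGPIPE), and with pipefail the whole pipeline counts as failed.
first_line_with_head() {
    seq 1 1000000 | head -n 1
}

# 'tail' reads its input to the end, so the writer always finishes normally.
last_line_with_tail() {
    seq 1 1000000 | tail -n 1
}

first_line_with_head > /dev/null
echo "head pipeline exit status: $?"
last_line_with_tail > /dev/null
echo "tail pipeline exit status: $?"
```

The first status is non-zero whenever the writer produces more data than fits in the pipe buffer, which is why the failure may only show up with larger files.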

4.1.0

08 Dec 12:22
76a6e10

bugfixes:

  • moveAndCheckSamplesheet converts from dos/mac to unix format with sed commands (instead of mac2unix/dos2unix, which do not exist on the new machines)
  • moveAndCheckSamplesheet: in a genomescan samplesheet certain columns (required for in house) are not filled; for GS the check will be skipped
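A sed-based line-ending conversion of the kind mentioned above might look like this. The exact sed commands used in moveAndCheckSamplesheet are not given in these notes; the helper name `to_unix_line_endings` is hypothetical and the expressions assume GNU sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: normalise DOS (CRLF) and classic Mac (CR) line endings to Unix (LF).
# Relies on GNU sed, which understands \r in the pattern and \n in the replacement.
to_unix_line_endings() {
    # First drop a carriage return at the end of a line (DOS),
    # then turn any remaining carriage returns into newlines (classic Mac).
    sed -e 's/\r$//' -e 's/\r/\n/g' "$@"
}

# Example calls (file names are placeholders):
#   to_unix_line_endings samplesheet.csv > samplesheet.unix.csv
#   printf 'a,b\r\nc,d\r\n' | to_unix_line_endings
```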

added:

  • capturingkit name to the projectname in DRAGEN data (PullAndProcessAnalysis.sh)

  • cleanup script that will clean up data on tmp

  • cleaning up samplesheets on tmp (is being distributed over all clusters)

  • check in copyProjectDataToPrm that checks whether the rawdata is copied to prm

  • check in copyRawDataToPrm that will skip the splitting of the samplesheet per project when it is demultiplexing only

  • parsing the metrics file for the trendanalysis

  • new group config: umcg-pr.cfg

  • .discarde in atd and gd config, notification for a failed demultiplexing will be mailed

  • cleaning up code in startPipeline

  • updated some configs

4.0.0

31 Aug 13:50
8b4c725

Introduction of the bucket system.
The analysis column (for NGS_DNA and GAP) in the samplesheet determines which pipelines will be run. Multiple pipelines can be selected with a + sign (e.g. NGS_Demultiplexing+NGS_DNA)
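Splitting such an analysis value into individual pipeline names can be sketched in bash as shown here. The helper name `split_analysis` is hypothetical and this is not taken from the NGS_Automated scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each pipeline named in an analysis value on its own line.
split_analysis() {
    local analysis="${1}"
    local -a pipelines
    # Split on '+' only for this read; the global IFS stays untouched.
    IFS='+' read -r -a pipelines <<< "${analysis}"
    printf '%s\n' "${pipelines[@]}"
}

split_analysis 'NGS_Demultiplexing+NGS_DNA'
# prints:
#   NGS_Demultiplexing
#   NGS_DNA
```

The first name printed corresponds to the first step of the pipeline, which is what decides the bucket a samplesheet is sent to.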

  • bucket system for pipelines (on tmp0X)

    • folders: tmp, Samplesheets, projects and runs have an additional subfolder with the pipeline name, e.g. projects/NGS_DNA/ProjectXX and tmp/GAP/GSA_v3-XXX
    • logs folder and prm remains the same as before
  • new script; moveAndCheckSamplesheets.sh (originated from MoveSamplesheets):

    • -d dat_dir, to overwrite the server.cfg/sharedConfig variable: DAT_ROOT_DIR
    • parsing the samplesheet to see what the first step of the pipeline is (and send it to the correct bucket/samplesheets folder)
      • the value in the samplesheet is hardcoded for now, until the Darwin team makes this variable (e.g. for the NGS_DNA pipeline it is by default NGS_Demultiplexing+NGS_DNA)
  • new script; splitAndMoveSamplesheetPerProject.sh, that handles the splitting of the samplesheet into projects and moving (if required) from NGS_Demultiplexing to NGS_DNA Samplesheets folder

  • new script; copyRawDataToTmp.sh, which runs on the chaperone machines (where the prm storage is mounted) and copies the rawdata to tmp (since the introduction of the new diagnostic clusters it is no longer possible to pull data).
    This step is required when the rawdata is no longer available on the diagnostic cluster

    • the script will scan the logs directory of tmp0X on the diagnostic clusters and search for ${project}.data.requested files. These files are created by the NGS_DNA/GAP pipeline and are already in the correct format to be used directly by the rsync command.
  • Dragen pipeline is also part of the NGS_Automated

    • Merged PullRawDataFromDS.sh and processGsRawData.sh into one file: PullAndProcessGsRawData.sh
      • the raw data from Genomescan is now one directory level deeper (e.g. ${gsBatch}/Raw_data/123.fastq.gz instead of in 104832-062/123.fastq.gz)
    • Created new file for pulling and processing Dragen data: PullAndProcessGsAnalysisData.sh
      • analysis data (such as bams, gvcf and vcf files) are in the ${gsBatch}/Analysis/ folder, per Sample there is one folder. (e.g. 104832-062/Analysis/sample1/sample1.gvcf.gz)
      • The script will merge all the A,B,C etc samplesheets (e.g. GS_118A, GS_118B) in one samplesheet without the suffix (e.g. GS_118.csv)
    • new script that runs the Dragen pipeline: startDragenPipeline.sh
      • it is executed in the umcg-genomescan group or in its test group (umcg-gst)
      • it will execute the NGS_DNA pipeline with the workflow_DRAGEN.csv workflow (also part of the NGS_DNA)
  • copyProjectDataToPrm.sh has extra arguments

    • -d dat_dir, to overwrite the server.cfg/sharedConfig variable: DAT_ROOT_DIR
    • -p in which samplesheets folder (which pipeline) should the script search (e.g. NGS_DNA, GAP)
    • rawdata will be processed on the same machine as where the NGS_DNA and GAP pipelines run; a check is built in to see whether the rawdata has been copied to prm yet, and project data will not be copied until it has.
  • copyRawDataToPrm.sh has extra arguments:

    • -p in which samplesheets folder (which pipeline) should the script search (e.g. NGS_Demultiplexing, DRAGEN, AGCT)
    • -f, for when the user executing the script pulls rawdata that does not come from the inhouse demultiplexing but from a different group (in case of genomescan/dragen the option looks like this: -f run01.processGsRawData.finished). This will overwrite the RAWDATAPROCESSINGFINISHED parameter in the group.csv file and will also set the mergedSamplesheet variable to true (see below for more info about the mergedSamplesheet)
    • if the data is copied there will be a message to the diagnostic cluster that the rawdata is finished (this is needed as explained above in the copyProjectDataToPrm part)
      • for regular inhouse data it will create per project in the logsfolder on the diagnostic cluster: run01.rawDataCopiedToPrm.finished
      • in case of genomescan/dragen, the mergedSamplesheet variable is true, then the run01.rawDataCopiedToPrm.finished will only be created in the merged projectfolder name (e.g. GS_118)
  • new Trendanalysis scripts

    • copyQcDataToTmp.sh (copies all the data from a chaperone to a diagnostic cluster)
    • trendanalyse.sh (runs the actual trendanalysis)
    • copyTrendAnalysisDataToPrm.sh (copies the reports back to prm)
  • notification script now sends messages to MS teams instead of mailing

    • message sent --> .channelsnotified logfile when a message has been sent to MS teams
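The samplesheet merge described for PullAndProcessGsAnalysisData.sh (GS_118A, GS_118B into GS_118.csv) can be sketched as below. Both helper names are hypothetical, and the sketch assumes that every input samplesheet has the same single header line and ends with a newline; the real script may handle this differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: strip one trailing capital letter from a batch name (GS_118A -> GS_118).
merged_batch_name() {
    local batch="${1}"
    printf '%s\n' "${batch%[[:upper:]]}"
}

# Hypothetical helper: concatenate CSV samplesheets, keeping only the header of the first file.
merge_samplesheets() {
    # FNR is the line number within the current file, NR the line number overall,
    # so 'FNR == 1 && NR != 1' matches the header of every file except the first.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Example call (file names are placeholders):
#   merge_samplesheets GS_118A.csv GS_118B.csv > "$(merged_batch_name GS_118A).csv"
```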

3.8.0

11 May 14:09
79d8284

notifications.sh:

  • added argument for debugging (-s phase:state); this option lets you run/generate output for solely that specific combination
  • new method for timing scripts (max duration can be set via the group cfg files)

copyProjectDataToPrm.sh:

  • removing samplesheet

array added:

  • gendercheck
  • missingsamples check
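Parsing a `phase:state` value such as the one accepted by the -s debugging option can be sketched with bash parameter expansion. The helper name `parse_phase_state` and the example value are illustrative only; the actual option handling in notifications.sh is not shown in these notes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split a 'phase:state' argument and print both parts separated by a space.
parse_phase_state() {
    local combination="${1}"
    if [[ "${combination}" != *:* ]]; then
        echo "ERROR: expected phase:state, got '${combination}'" >&2
        return 1
    fi
    local phase="${combination%%:*}"   # everything before the first colon
    local state="${combination#*:}"    # everything after the first colon
    printf '%s %s\n' "${phase}" "${state}"
}

parse_phase_state 'copyRawDataToPrm:failed'
# prints: copyRawDataToPrm failed
```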

NGS_Automated-3.7.1

14 Oct 12:25
a62c529
Merge pull request #223 from RoanKanninga/master

essential bugfix for track and trace

NGS_Automated-3.7.0

28 Sep 10:57
f247fe4
  • added trendAnalysis.sh to monitor lab, NGS and array results over time.
  • minor changes to pipelineTiming.sh
  • minor changes to copyProjectdataToPrm.sh

NGS_Automated-3.6.3

08 Jul 07:41
b23c38f
  • fixing wrong logdir
  • no longer creating old symlinks for the concordance check (this caused an error due to a removed variable)

NGS_Automated-3.6.2

23 Jun 06:57
47e89b6
  • Bugfix for critical error message to report a mismatch in number of samples in samplesheets
  • Create an extra log file with a list of files at the moment we signal that an upload is complete, in an attempt to figure out what causes the "No GS samplesheet present" error.
  • updated datastaging server for GS (from cher-ami to medgendataxfer)

NGS_Automated-3.6.1

17 May 07:19
29c9393

Minor bugfix for missing track and trace in archive mode for copyRawDataToPrm.sh.