Changes

LIST OF CHANGES
---------------

 - Changed/extended tests for npg_pipeline::function::autoqc to ensure that
   the tests work with changes in https://github.com/wtsi-npg/npg_qc/pull/895
   (study_specific assessment is disabled for lanes). Tested that a pool that
   has samples from multiple studies does not cause problems.

release 68.7.0 (2024-12-02)
 - npg_pipeline::function::autoqc
   - Simplified the flow of the code.
   - Clarified the logic for choosing QC checks to run, added comments.
   - Allowed for running the review QC check on lanes.
 - Move from Miniconda to Miniforge

release 68.6.0 (2024-10-24)
 - The runfolder_path attribute is passed to the constructor of the review
   autoqc check object when deciding whether this check should run.

release 68.5.1 (2024-10-04)
 - Added .github/dependabot.yml file to auto-update GitHub actions
 - Following a release on 07/09/2024, see https://metacpan.org/dist/App-perlbrew/changes,
   the checksum of the script served by https://install.perlbrew.pl had changed.
   https://install.perlbrew.pl is a redirect to raw
   https://github.com/gugod/App-perlbrew/blob/master/perlbrew-install, so
   the change originates from GitHub and can be trusted. Our CI flow compares
   the checksum of the downloaded script to the expected value. We now store
   an updated expected checksum value, which corresponds to the latest release.
 - GitHub CI - updated deprecated v2 runner to v3
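
The checksum comparison described for this release can be sketched as below.
This is an illustration only: the hash algorithm (SHA-256) and the function
names are assumptions and are not taken from the actual CI workflow.

```python
import hashlib


def script_checksum(content: bytes) -> str:
    """Return the hex digest of the downloaded script (SHA-256 is assumed)."""
    return hashlib.sha256(content).hexdigest()


def checksum_matches(content: bytes, expected_hex: str) -> bool:
    """Compare the computed digest with the stored expected value.

    Surrounding whitespace and letter case in the stored value are ignored.
    """
    return script_checksum(content) == expected_hex.strip().lower()
```

When upstream publishes a new release of the installer, a check like this
fails until the stored expected value is updated.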

release 68.5.0 (2024-09-04)
 - The runfolder_path argument is added to the command for the autoqc review
   check. See https://github.com/wtsi-npg/npg_qc/pull/869

release 68.4.0 (2024-08-06)
 - Ensured mark duplicate method can be inferred for a product with multiple
   studies (tag zero).
 - Upgrading tests
   - Use contemporary run folders for tests (NovaSeqX)
   - Clean fixtures
   - Prevent tests from accessing live databases (reset HOME)

release 68.3.0 (2024-05-24)
 - Removing Tidyp dependency from CI
 - Added 'SampleSheet.csv' file from the top level of the run folder to
   a list of archived run-level Illumina data. This file is only present
   in MiSeq run folders.

release 68.2.0
 - Added '--process_separately_lanes' to the pipeline to explicitly exclude
   multiple lanes from a merge.
 - Generalised dir_path method in npg_pipeline::product. This fixed a bug in
   npg_run_is_deletable, which manifested in wrong expectations about the
   directory tree for partially merged run data. 
 - Dropped a check for DRAGEN analysis data from npg_run_is_deletable.
 - Removed unnecessary tests from t/10-pluggable-central.t

release 68.1.0
 - Apply changes to the code and tests, which follow from removing some
   functionality from npg_tracking::illumina::runfolder, see
   https://github.com/wtsi-npg/npg_tracking/pull/807. The pipeline retains
   all its previous functionality.

release 68.0.0
 - Use st::api::lims->aggregate_libraries() method for both 'merge_lanes' and
   'merge_by_library' pipeline options. This is a breaking change as far as
   archival and deletion of NovaSeq Standard workflow data is concerned.
   The key change in lane merging for this data is that tag 0 and tag 888 will
   not be merged across the lanes.
   The NovaSeq Standard workflow, where there is only one input port, is
   different to the more general merging across lanes where the (claimed) same
   library has been sequenced. But this is not a valid reason to maintain separate
   code.
 - Deletable shadow folders are detected, but not considered as deletable
   for now. They are flagged in the log of the 'npg_run_is_deletable' script.
 - Removed all code that was used for the UKB project and the upload of Heron
   project data to CLIMB. Updated the archival pipeline function graph and its
   graphical representation.
    - Removed 'cache_merge_component' and 'archive_to_s3' pipeline functions.
    - Removed all functions for retrieving the QC state of the product
      from 'npg_pipeline::product'.
    - Dropped npg_pipeline::base dependency on QC database ('qc_schema'
      attribute for 'npg_qc::Schema'). Removed test fixtures for the QC database.
    - Deleted 'npg_receipt4run_is_deletable' script.
    - Dropped checks for files upload to the third-party cloud locations when
      deciding whether the run folder is deletable.
    - Updated examples in POD in 'npg_pipeline::product::release'.
    - Excluded redundant settings from 'product_release.yml' files used in
      unit tests.

release 67.1.1
 - Fixed the pp collection root for MiSeq

release 67.1.0
 - Fix typo in analysis specific overrides for bwa_als_se mapping to bwa0_6
 - Add in use of autosome target regions for BGE libraries in seq_alignment
 - Add 'merge_by_library' pipeline boolean option. This option is automatically
   activated for NovaSeqX platform. It triggers a discovery of sets of data
   that belong to the same libraries. If cases like this are found, the pipeline
   is instructed, at the secondary analysis stage, to process this data as a
   single entity. In practice, if the same pool is sequenced in more than one
   lane of the run, sample data for the pool are merged across these lanes.
   The 'discovery' part of the algorithm is implemented in
   https://github.com/wtsi-npg/npg_tracking/pull/772
 - Removed provisions for inline indexes
 - Removed a check for rapid runs when deciding whether to merge
 - Stop warnings about an undefined value when writing to the log from
   npg_pipeline::function::seq_alignment
 - Some tests were creating test data in the package's source tree. These
   activities are redirected to temporary files and directories in /tmp
 - Removed listing of non-existing files from MANIFEST
 - Removed superfluous dependency on now removed st::api::request
 - Added a test to expose a problem with ref cache, which is resolved by
   https://github.com/wtsi-npg/npg_tracking/pull/761

release 67.0.0
 - Turn off spatial filter QC check for NovaSeqX
 - Switch to Perlbrew to obtain multiple Perl versions
 - Remove npg_ml_warehouse dependency
 - Enhance README with more context
 - Improve Markdown format consistency
 - Add images of DAGs, add links, fix a typo
 - Add info on data intensive P4 based steps

release 66.0.0
 - small tweak to seq_alignment so GbS samples with no study ref do not fail
 - switch off spatial filter for NovaSeqX
 - for NovaSeqX, default RNA analysis should be STAR
 
release 65.1.0
 - ensure per-product archival for NovaSeqX data
 - runs with data analysed on-board are not deletable
 - ensure per product archival impacts hierarchy
 - per product publish doc and variable name fixup
 - when platform_NovaSeqX is detected, set p4 parameter i2b_nocall_qual_switch
   to "on"
 - avoid archiving "Analysis" hierarchy with XML and InterOp

release 65.0.0
 - remove wr limit of p4s1 to specific flavor

release 64.0.1
 - set p4 parameter to fix bug in bwa-mem2 + non-consented human split

release 64.0.0
 - add bwa_mem2 flag to options.pm to allow override of default bwa analyses at pipeline invocation
 - update seq_alignment to recognise the bwa_mem2 flag and also default to bwa_mem2 for NovaSeqX platform

release 63.3.0
 - force no_target_alignment for haplotag libraries

release 63.2.0
 - Adapt the pipeline to load a syslog dispatcher file for npg_publish_tree

release 63.1.0
 - add configuration and functionality to support filtering and appending
   irods error log messages to syslog.

release 63.0.0
 - removed pp_archiver from the function graph.
 - removed scripts for CLIMB data maintenance
 - add haplotag QC check

release 62.2.0
 - CI
   - update version of github actions
   - change CI runner from Ubuntu 18.04 to ubuntu-latest
 - run deletion:
     accounted for a potential human split for GBS runs and lanes where
       a primer panel is set,
     when pp files are missing on staging, error message extended with info
       about a product

release 62.1.0
 - allow non consented human split for GBS runs
 - allow analysis of single-end runs with non-consented human split

release 62.0.4
 - fix tag 888 bug in force markdup_method:samtools for single-end runs

release 62.0.3
 - always use markdup_method:samtools in stage2 analysis for single-end runs

release 62.0.2
 - Adjust bam_flagstats QC check invocation in seq_alignment for nonconsented
   human split and XA/Y human splits
     nchs: always use --skip_markdup_metrics
     XA/Y splits: use --skip_markdup_metrics in the same way as the target subset

release 62.0.1
 - Prevent the file glob expansion by the shell when calling a loader
   for the run parameters XML file. A non-existing file might be passed
   to the loader.

release 62.0.0
 - Add substitution metrics to the seq alignment command.
 - Removed provisions for loading the old warehouse, the function
   for loading the old warehouse was removed some time ago.
 - Extended the ml warehouse loader job. For all warehouse loader jobs
   that are run prior to setting the 'qc complete' run status an extra
   script is invoked; it loads the content of RunParameters.xml Illumina
   file to the warehouse to facilitate automatic billing.  

release 61.2.0
 - Added a new pipeline function - archive_irods_locations_to_ml_warehouse

release 61.1.0
 - Heron artic full primer version will be uploaded to Majora.

release 61.0.0
 - Reverse logic for SamHaplotag in the seq. alignment function:
   --revcomp flag - should be used when is_i5_opposite attribute is false.
 - When loading main pipeline product to iRODS, added an option to create
   files recording the location of these products.  In future these files
   will be used to load the iRODS locations to the ml warehouse.
 - Purge Net::AMQP::RabbitMQ and message notification system used for UKB
   Vanguard

release 60.6.0
 - Fixed bugs in autoqc checks validation for the npg_run_is_deletable
   script. Previously the runs where libraries had the primer panel set
   were not autoqc-deletable since the insert_size check results, which
   are not run for such products, were absent in the database, but were
   expected by the code. A similar problem is fixed for ref_match check
   results for GBS runs.

release 60.5.0
 - Add Targeted NanoSeq Pulldown to Duplex-Seq analysis library_type list
 - Refactored existing code to create new public functions, which will be
   used in the extension of npg_run_deletable script dealing with decisions
   about correctness of archival of portable pipelines output.
 - Extended npg_run_deletable script to deal with the output of portable
   pipelines for a case when this output is archived to iRODS.

release 60.4.0
 - Removed co-location check (host and staging area) from the pipeline
   daemon code. This check is no longer relevant, the staging servers
   listed in the code are no longer in use. 
 - do haplotag processing for appropriate library types
 - switch off spatial filtering application and QC check for NovaSeq platform
 - Remove pipeline extensions for the npg_tracking daemon monitor,
   they are no longer in use.

release 60.3.0
 - Extended i5opposite pad to 5 bases.
 - Check for pp archive existence prior to job creation.
   pp archival could not cope with data produced by non-current
   pp versions. When the version of an artic pp is updated, the
   pp archival jobs that are scheduled after the update, but have to
   archive data, which was produced prior to the update, fail.
   The solution is to check that the pp staging archive for the 'current'
   pp version exists. If it does not exist, but there is an archive for
   some other version, data from that staging archive should be archived.
   Staging archives for multiple pp versions cause the code to error.
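
The selection rule for the pp staging archive described above can be sketched
as follows. The function name and the use of plain version strings are
hypothetical. The sketch also assumes that more than one staging archive is
an error even when the current version is among them, which the entry does not
spell out.

```python
from typing import Iterable, Optional


def select_pp_archive(existing_versions: Iterable[str],
                      current_version: str) -> Optional[str]:
    """Pick the pp version whose staging archive should be archived.

    Returns None when no staging archive exists, so no job is created.
    """
    versions = set(existing_versions)
    if len(versions) > 1:
        # Assumption: any ambiguity between pp versions is an error.
        raise ValueError(
            f"Multiple pp staging archives found: {sorted(versions)}")
    if current_version in versions:
        return current_version
    # Either the single archive from another pp version, or nothing at all.
    return next(iter(versions), None)
```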

release 60.2.0
 - When generating lane taglist files truncate and pad i7 and i5 tags independently
 - Support no_auto, no_auto_archive and no_auto_analysis tags for daemons
 - npg_run_is_deletable - delete runs with status 'run cancelled' and
   'data discarded' after 14 days of the status date irrespective of
   what study the samples belong to. The samplesheet is not available for
   these runs. Prior to this change the code was accessing an external
   source of LIMS data (XML LIMS API) to get information about the study.
   This change also helps to work around very long deletion times which
   are set for some studies.
 - Create class/package level methods in npg_pipeline::product::release::irods
   to allow for reusing the logic about run and product iRODS collections
   paths in the code outside of this package. 
 - Stop loading data to the old warehouse. Remove
   the update_ml_warehouse_post_qc_complete function from the archival
   pipeline graph.
 - Use CRAM files as input for the pulldown metrics autoqc job.
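
The npg_run_is_deletable rule for cancelled and discarded runs in this release
can be sketched as below. The function and constant names are hypothetical,
and treating exactly 14 days as already deletable is an assumption.

```python
from datetime import datetime, timedelta

# Run statuses that are handled without consulting study information.
DISCARDED_STATUSES = {"run cancelled", "data discarded"}
MIN_DELAY = timedelta(days=14)


def discarded_run_is_deletable(status: str, status_date: datetime,
                               now: datetime) -> bool:
    """True if a cancelled or discarded run has waited long enough."""
    if status not in DISCARDED_STATUSES:
        # Other statuses follow the study-based rules, not sketched here.
        return False
    return now - status_date >= MIN_DELAY
```

No LIMS lookup is needed for this decision, which matches the motivation given
in the entry: the samplesheet is not available for these runs.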

release 60.1.0
  - GBS - block tag 0 from GBS pipeline and extra tests.
  - Update docs for samplesheet generation
  - Function graph for a reduced Heron pipeline

release 60.0.0

 - When HiC library type detected in seq_alignment, set appropriate flag values
   for bwa mem alignment.

 - Change seq_alignment to observe gbs pipeline allowed or not in product 
   release study config.

 - Configure status change jobs to optionally save statuses to a database.
   Use an updated script name in the job - npg_status_save.
   Saving to the database is disabled when either the 'local' or
   'no_db_status_update' attributes are set to true. The description of the new
   option appears in help. Using --local option is recommended in SOP when
   testing the pipeline, which will automatically prevent the new version of
   the job from saving statuses to the database. --local flag ensures that the
   directory with the new analysis is not visible to the production daemons,
   including the status daemon. Thus, before this change one of the
   consequences of using the --local flag was the new statuses not appearing
   in the database; this will continue to be the case.

 - Pass the portable pipeline repository URL to the autoqc generic job.
   pp_repo_url parameter is read from the study configuration for a
   portable pipeline and, if defined, is passed to the autoqc generic job
   so that the URL can be captured in the information about the autoqc
   generic check, which is saved by the generic result object.

 - Reimplement management of resources in the pipeline.
   -- Declaration of resources is moved to the input JSON function graphs, see
      https://github.com/wtsi-npg/npg_seq_pipeline/blob/devel/data/config_files/README.md
      for details.
   -- All pre-existing definition of resources (number of CPUs, memory, etc),
      either in other configuration files or hardcoded in the code, are removed.
   -- A new parent class npg_pipeline::base_resource for classes in the
      npg_pipeline::function namespace, which are responsible for job
      definition generation. The new class has a factory function
      create_definition() for generating npg_pipeline::function::definition
      type objects. This class is also responsible for correct interpretation
      of resources specified in the input JSON function graph.
   -- Extension of the npg_pipeline::pluggable class to parse resource
      definitions from the JSON input graph and to pass this to the function
      implementor.
   -- Functions, which can be executed by using the --function_order pipeline
      argument, are restricted to the ones that are defined in the JSON
      function graph, which is used by the pipeline.

 - Remove provisions for generate_compositions function as it is no longer
   used.

 - Remove provisions for the upstream_tags autoqc check as it is no longer run
   in production as part of the analysis pipeline.

release 59.2.0
 - iRODS connections to be opened on-demand for validation
 - Added an option of having a new boolean flag 'accept_undef_qc_outcome'
   in the study configuration for a product for a particular archiver.
   If this flag is set to a true value, the has_qc_for_release method
   might return true in cases where previously it would have returned
   false. This is done in order to allow for archival of products which
   either pass QC or have never been through manual or robo QC. The data
   retention policy implementation is changed to take into account the
   new flag.
 - npg_majora_for_mlwh - dry_run option goes further in the processes,
   but avoids writes/updates
 - Use GitHub actions for CI in place of Travis-CI
 - Unused JSON files for function graphs are removed.
 - Per-study product configuration file product_release.yml, which is used
   when creating jobs, is copied to the analysis directory to preserve
   run conditions
 - CI: replace Travis-CI with GitHub Actions

release 59.1.0
 - More options for defining wr limit groups: allow an exact match
   to the last component of pipeline function class name.
 - An additional wr limit group - s3 - is configured.
 - The wait4path pipeline job does not have a log. To enable wr to
   recognise that these jobs are unique, echoing a random string
   is added to the shell command.
 - Remove the limit on the number of NovaSeq runs being archived at
   the same time. Introduce a limit on the number of runs,
   irrespective of the instrument type, which are moved to archival
   within the last hour.
 - Include pipeline version and name when sending sequencing run metadata
   to the majora service.
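
The hourly limit on runs moved to archival, introduced in this release, can be
sketched as below. The function name and the max_per_hour parameter are
hypothetical; the actual limit value is not stated in this changelog.

```python
from datetime import datetime, timedelta
from typing import Iterable


def can_move_to_archival(recent_starts: Iterable[datetime], now: datetime,
                         max_per_hour: int) -> bool:
    """True if fewer than max_per_hour runs started archival in the last hour.

    The instrument type of the runs is deliberately not considered.
    """
    window_start = now - timedelta(hours=1)
    started = sum(1 for t in recent_starts if window_start <= t <= now)
    return started < max_per_hour
```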


release 59.0.0
 - Following a decision to send data to CLIMB regardless of artic
   QC status, the file glob of the data to upload is changed to
   locations where both passed and failed data are available.
 - Code for the archival and analysis daemons refactored:
     1. Removed provisions for the access to configuration files
        which have never been used.
     2. The analysis daemon is no longer responsible for marking
        runs as QC runs.
     3. Removed access to ml warehouse for retrieval of LIMs data
        since this information is no longer required.
 - The qc_run pipeline option is removed, it was supporting a way
   of setting up LIMs data for MiSeq runs which is no longer used.
 - The lims_driver_type pipeline option is removed, it has never
   been used, the pipeline will use the ml_warehouse driver by default
   when creating a samplesheet. Internally this option is available
   in some classes of the pipeline code base to indicate what driver
   should be used by the jobs, this functionality remains intact.
 - A simpler implementation of wr's limit groups to allow for setting
   a persistent limit globally and for using limit groups that map
   directly to accessors of the function definition object.
 - Code in npg_pipeline::product::heron::majora is reimplemented as
   a Moose class, most of the code of the npg_majora_for_mlwh script
   is moved to this class, a logger is introduced. Improved a way
   of matching a library type to majora metadata.

release 58.3.0
 - bugfix in the code for interaction with the Majora/COG-UK API:
   cope with no iseq_flowcell entry for resultset
 - function graph for the analysis pipeline - add early archival of
   the artic pp output to iRODS

release 58.2.0
 - enhancement of code for interaction with the Majora/COG-UK API
 - added analysis for Duplex-Seq libraries

release 58.1.0
 - added npg_climb2mlwh to update warehouse from uploaded
   climb data
 - added ability to use custom locations and/or names for the
   main log of the pipeline script
 - the main log of the pipeline script is copied to the analysis directory
 - add script for updating MLWH with state of Majora/COG-UK metadata

release 58.0.0
 - a class for generating job definitions for autoqc generic
   checks
 - implementation of job generation for autoqc generic checks
   for artic and ampliconstats
 - generation of the autoqc generic result for artic and the
   review result is removed from the stage2pp job for artic
 - generation of the autoqc generic result for ampliconstats is
   removed from the stage2App job for ampliconstats

release 57.17.0
 - a generic way to specify constructor options in the function
   listing in a registry and its implementation to iRODS archival
   jobs and a stage2pp job
 - implementation for the ampliconstats portable pipeline
 - new pipeline function - stage2App - and its mapping to the
   npg_pipeline::function::stage2pp class
 - a new portable pipeline to produce ampliconstats data and its
   mapping to the stage2App pipeline function

release 57.16.0
 - a new function for archival of pp data to iRODS
 - stage2pp function implementation is refactored to create common functions
   and attributes, which in future could be used by additional portable
   pipelines

release 57.15.0
 - append autoqc generic result generation at the end of ncov2019_artic_nf
   portable pipeline
 - tests update following a change in the default behaviour of the
   add_object method in WTSI::NPG::iRODS

release 57.14.0
 - switch to sample control flag when determining eligibility for
   pp data archival

release 57.13.1
 - made run deletion policy consistent with a change to eligibility
   for iRODS archival (see commit 457da605c9f7fe97f82954ffe7155ca96e034753),
   under which non-products (tag zero and spiked PhiX tag) are not archived
   to iRODS if none of the lane products are archived to iRODS

release 57.13.0
 - a new script - npg_upload2climb - to perform the upload, which is
   specified in the definition generated by the pp_archiver function
 - extended the spiked phix i5 tag (SPIKED_PHIX_TAG2) to 10-bases
 - required arguments are passed to the npg_upload2climb script when
   the pp_archiver function job description is generated
 - the pp_archiver function is added to the archival pipeline graph
 - archival to CLIMB is skipped for samples with withdrawn consent
 - library type and primer panel are added to the CLIMB archival
   manifest
 - simplification of dependencies representation for LSF jobs in private
   functions of the LSF executor class, which fixes the little-understood
   problem of disappearing dependencies for seq_alignment jobs when they
   are split between multiple LSF job arrays

release 57.12.0
 - a new function definition class npg_pipeline::function::pp_archiver,
   implementing two new pipeline functions - 'pp_archiver' and
   'pp_archiver_manifest'

release 57.11.0
 - a generic API for sequencing data metadata upload to a third party
   and a script for uploading metadata for Illumina
   sequencing platform
 - product-specific primer panel bed file in seq_alignment
 - simple robo QC step added straight after running the ncov2019-artic-nf
   portable pipeline; the step creates a utility (user) QC outcome

release 57.10.0
 - new function, stage2pp, for running portable pipelines straight
   after stage1 in parallel to seq_alignment

release 57.9.0
 - small change to seq_alignment.pm so it does not error if
   gbs_plex_name (primer_panel) is set but lib type incompatible
   with gbs analysis
 - when markdup_method is "none", add skip_markdup_metrics flag
   to bam_flagstats qc command

release 57.8.0
 - ability to apply limits to wr groups of jobs and a limit for
   all iRODS jobs
 - function creating definitions for autoqc jobs - when evaluating
   whether the autoqc check should be run:
     reduce run time by passing to the autoqc class instance,
     where appropriate, a lims object and fastq reference path;
     explicitly pass product_conf_file_path to this instance
 - iRODS archival of non-products is driven by settings of products,
   i.e if all products in the lane should not be archived to iRODS,
   non-products (tag zero and spiked PhiX tag) will not be archived
   either
 - remove the old warehouse loader from the analysis function graph
 - remove function for illumina qc analysis archival (old way of
   saving InterOp data to QC database)

release 57.7.0
 - cluster count check and p4stage1 functions use new class
   (npg_qc::illumina::interop::parser) to parse Illumina InterOp files
 - change npg_pipeline::product::release to use tertiary config
 - new qc_interop function to run interop autoqc check
 - simplification of the analysis function graph: number of mlwh
   updates is reduced to two, one after stage 1 and interop autoqc
   check and another towards the end of the flow

release 57.6.0
 - only set p4 parameter values for markdup_method and
   markdup_optical_distance_value when do_target_alignment is true;
   this also stops an error being thrown if the entity (for example,
   tag zero product) has multiple studies and references
 - fix haplotype caller check for a PCR free library type to be case
   insensitive
 - increase memory for bqsr and haplotype caller jobs
 - make test CRAM files compliant with samtools v.1.10.0,
   which gives an error if no header is present in a file

release 57.5.1
 - bug fix - correct node id in splice (for GbS)

release 57.5.0
 - add BWA MEM2 support to seq_alignment function
 - bug fix: add -f to rm command removing intermediate files (to
     avoid error when no intermediate files are present)
 - allow selection of duplicate marking method (biobambam,samtools
     or picard) in seq_alignment via product_release.yml
 - detect flowcell type and set uses_patterned_flowcell attribute
     to allow setting of optical duplicate region size
 - add ability to select bwakit postalt processing (if reference has
     alternate haplotypes) in seq_alignment via product_release.yml

release 57.4.0
 - change genotype qc check to cram input
 - LSF array indexes fix for jobs dealing with chunked data
   (multiple jobs per product)
 - designate no_archive directory for files for chunked entities,
   which are not end products
 - haplotype caller function: early detection of products that are
   not for release (tag zero and control)

release 57.3.0
 - add chromium libs (forced no target alignment) to bam prune skip
   list in seq_alignment
 - archival pipeline function for deletion of intermediate files
 - script to generate receipts files to be used by npg_run_is_deletable
   script for one of the studies

release 57.2.0
 - prune bam generation for most products with no alignment and
   change bam_flagstats command in seq_alignment to crams
 - skip markdup step in seq_alignment for spike tag
 - add haplotypecaller to function list
 - use only public run folder methods
 - path logic improvements

release 57.1.1
 - all components of npg_run_is_deletable script to use samplesheet
   as a source of LIMS data

release 57.1.0
 - configurable study-level qc criteria for archival and for minimum
   delay for run folder deletion

release 57.0.3
 - add missing indexing step to merge_recompress

release 57.0.2
 - fix logic in WR dependencies where pipeline converges

release 57.0.1
 - fix where new code was not taking NPG_REPOSITORY_ROOT and add
   duplicated code to ref cache.

release 57.0.0
 - supply MD5 in bucket file upload if available in sibling md5 file
 - add function to support GATK HaplotypeCaller and apply BQSR
 - add function to concat and recompress gVCFs
 - add function to calculate BQSR table
 - cram files as input to the adapter autoqc check
 - make list of files due to be archived dependent on alignment
    configuration of the study
 - run folders for test data restructured to reflect new-style
    product hierarchy and not to use outdated path component
    names (bustard, etc)

release 56.1.0
 - move reference cache from seq_alignment to own singleton class
 - remove provisions for old-style run folders
 - qc_review function added
 - provisions for splitting a product into chunks
 - to be forward compatible with changes in tracking, remove direct
   dependency of the pipeline daemon on the short_info and location
   tracking roles
 - ability to run the pipeline for individual products; some archival
   pipeline functions updated to enable this ability on their level
 - autoqc adapter check - give cram files as input

release 56.0.1
 - ensure that the paths derived from the archive directory in
   different parts of the run_is_deletable utility are consistent.
 - add autosome stats file to product release
 - add missing bait prune to seq_alignment

release 56.0.0
 - add autosome target to seq_alignment
 - pipeline configuration module and product release configuration
   accessors are moved to npg_tracking package in order for the product
   configuration be accessible from other packages, code in this
   package refactored to accommodate the change
 - conform to bambi's v 0.12.0 file and directory naming schema for
   tileviz data
 - add facility to do LSF 1:1 job index dependencies on array jobs
 - when validating run folder for deletion, ensure linked directories
   and files are recognised

release 55.2
 - switched from S3 to Google Compute Storage
 - change bcfstats qc job to use CRAM instead of BAM file as input

release 55.1
 - added configuration option to change the S3 endpoint URL

release 55.0.1
 - bug fix for invocation of the generate() function in the
   seq_alignment function module following an addition of the
   generate_composition function

release 55.0
 - additional 'GnT MDA' library type added to allowed types for gbs analysis
 - a new archival pipeline function, cache_merge_component, for caching merge
   candidates as a part of the archival pipeline
 - no overwriting existing tileviz files when scaffolding the runfolder
 - a new function, generate_compositions, for generating composition JSON files
 - npg_run_is_deletable:
    cross-checks for all file archival destinations to ensure that each
      product is archived in at least one destination;
    full logic for validating correctness of s3 archival

release 54.1.2
 - set explicit umask for wr jobs to guarantee that output is group-writable

release 54.1.1
 - bug fix in command generation for iRODS data archival from old-style
   run folders

release 54.1
 - minor speed-up in seq_alignment function due to caching of
   unsuccessfully retrieved references
 - npg_run_is_deletable understands per-product iRODS collections and
   makes runs that have products archivable to s3 not deletable
 - function for saving fastqcheck files is removed from the archival
   pipeline function graph, implementation of this function is deleted
 - changes of p4_stage1 and seq_alignment functions to accommodate
   removal of fastqcheck files generation in respective p4 templates

release 54.0
 - archival function graph includes publishing both to s3 and iRODS
 - a function graph for post 'run archived' small pipeline
 - no_s3_archival flag to switch off archival to s3 and notification by
   a message, false by default, is automatically set to true if the
   local flag is set to true
 - per-product restart file for iRODS publisher
 - function definition for a job to wait to move from the analysis
   to the outgoing directory
 - wr job log file to be appended to if the job is retried
 - propagation of the iRODS settings to wr jobs
 - persistent mode for RabbitMQ message delivery

release 53.1
 - publishing of seq data to iRODS:
     make product destination aware;
     iRODS directories hierarchy for NovaSeq runs to mirror product
     archive directories hierarchy
 - run data validation (npg_run_is_deletable script) reimplemented to provide
   support for new style of run folder and merged entities.

release 53.0
 - a wrapper object npg_pipeline::product to represent a product
 - use products attribute to drive p4_stage1, seq_alignment and autoqc
 - create composition.json files to guide archiving
 - p4 params files for seq_alignment moved from no_cal/laneN to no_cal
     (changes run folder structure when merging lanes)
 - cluster_count and seqchksum_comparator checks now done at run level instead
     of lane level
 - upfront definition of all products
 - generic runfolder scaffolding for any products
 - since the top-level qc directory is no longer required, the tileviz
   directory is moved to the analysis directory
 - reshuffle of roles in npg_pipeline::roles:
     npg_pipeline::roles::business::base merged into npg_pipeline::base;
     npg_pipeline::roles::business::flag_options moved to
     npg_pipeline::base::options, a number of pipeline options from other
     modules moved to this role;
     npg_pipeline::roles::accessors moved to npg_pipeline::base::config;
     helper functions moved to a new role - npg_pipeline::function::util
 - ref_adapter_pre_exec_string method renamed to repos_pre_exec_string
 - metadata_cache_dir method, formerly in npg_pipeline::roles::business::base,
   removed; npg_pipeline::function::p4_stage1_analysis module, the only user
   of this function, switched to use the relevant accessor from the
   npg_pipeline::runfolder_scaffold role
 - minor changes for bcfstats qc check
 - executor type (lsf or wr) can be specified in the configuration file
 - wr executor:
     set per-job priority;
     increase priority for p4 stage 1 job and its predecessors;
     set priority of status and start-stop jobs to zero so that
     they are executed immediately, but still within dependencies
     and memory constraints;
     map queues to arbitrary wr options, in particular, a special queue
     for p4_stage1 maps to a specific cloud host flavour
 - correction of build method for rpt_list attribute in product
 - make bam_cluster_count_check pipeline job dependent on
     qc_spatial_filter (in function_list_central.json)
 - archival daemon - limit number of simultaneously archived NovaSeq runs
 - wr executor - explicitly propagate pipeline's environment to jobs
 - illumina archiver job:
     exclude discontinued verbose attribute and paths that are not needed
     for the minimal work this loader is doing now;
     remove LSF preexec requesting that the job is a unique runner since
     db queries are much simpler now
 - change signature of the autoqc archival job in line with extended
   functionality of the autoqc db loader (ability to find JSON files
   in the run folder)
 - change components_as_products method of npg_pipeline::product to
   return a list with one item when there is only one component in
   the composition (instead of an empty list)
 - tileviz index file with links to lane-level tileviz reports is created
 - seq_alignment supports HISAT2 aligner for RNA libraries
 - explicit iRODS destination collection is set for iRODS loaders,
   /seq/illumina/runs/RUN_ID for NovaSeq runs and /seq/RUN_ID
   for the rest
 - explicitly use iRODS loader from an 'old' dated directory for
   old style runfolders
 - a new function, archive_run_data_to_irods, to publish run-level non-product data to iRODS
 - modify run_data_to_irods_archiver module to ensure the interop files go to a dedicated directory
 - additional tags for NovaSeq in dbic_fixtures

release 52.1
 - bug fix in job names, which should include the pipeline
   name: pipeline name is now propagated from the pluggable module
   to the function module; bug manifestation - job names contained
   function module name instead of the pipeline name, for
   example prod_pipeline_end_26263_start_stop instead of
   prod_lsf_start_26263_central
 - pipeline name attribute is derived from the script name that
   invoked the pipeline, making it unnecessary to explicitly pass
   the function list name in the archival pipeline script
 - fix for seq_alignment so specified rna aligners do rna analysis
 - added (samtools) target stats to stage2 analysis
 - correct p4 prunes for samtools stats (target/baits)

release 52.0.5
 - bug fix in npg_run_is_deletable: stop using unsupported options
   for npg_pipeline::cache
 - npg_run_is_deletable should not expect adapter qc results for a
   pool, the source files do not exist since release 52.0
 - add log archiver to the end of the archival pipeline
 - use outgoing paths for jobs which are run after the run_qc_complete
   function; this patch also fixes the log file path for lsf_end job of
   the archival pipeline, which previously was always in outgoing

release 52.0.4
 - bug fix: change path for a file with LSF commands to a path in
   outgoing for jobs that run after the run was moved to the outgoing
   directory

release 52.0.3
 - bug fix: use analysis_path instead of bam_basecall_path in a method
   that is used by both analysis and archival pipelines; the value of
   bam_basecall_path is available only when explicitly set, ie only
   in the analysis pipeline

release 52.0.2
 - allocate more memory to sequence_error and insert_size autoqc
   checks since they now use newer bwa, which creates twice larger
   reference index

release 52.0.1
 - alignment of tag#0 not done by default (align_tag0 flag added)

release 52.0
 - remove dependency of tests on LIMs XML, use samplesheet instead
 - remove dependency on tracking XML feeds
 - update p4 stage1 default values in general_values.ini
     restored p4_stage1_split_threads_count=4
 - removed illumina_basecall_stats function and associated code
 - remove generation of empty fastq and fastqcheck files
 - removed bam2fastqcheck_and_cached_fastq function
 - removed create_archive_directory function, scaffolding the runfolder
   is called in the beginning of the pipeline within the 'prepare'
   method of the analysis pipeline
 - increased number of threads for p4 stage1 (newer bambi version required)
 - added LSF-independent evaluation for number of threads
 - removed redundant dependency on illumina2bam jars
 - stopped forcing ownership and permissions when creating
   new directories
 - single log directory for all jobs with per-function subdirectories
 - added wr executor
 - new modules to execute submission of definitions to LSF
 - captured dependencies between pipeline steps in a directed acyclic graph
 - moved flags, attributes and method related to the overall
   pipeline logic to npg_pipeline::pluggable
 - flattened directory structure for modules implementing functions,
   they all now belong to npg_pipeline::function namespace
 - removed methods representing functions, created mapping of
   functions to modules, methods and options in
   npg_pipeline::pluggable::registry
 - removed ::harold:: component from pipelines' namespace
 - removed post_qc_review pipeline module
 - added npg_pipeline_ prefix to this package's script names if
   it was not already part of their name
 - removed unused module for fixing Illumina config files
 - removed unused module for LSF job creation for tag deplexing -
   this is now done within p4 stage 1
 - removed unused implementation for function copy_interop_files_to_irods
 - removed unused spatial_filter, fix_broken_files and force_phix_split flags
 - removed a number of unused methods in npg_pipeline::base
 - no lane-level bam files are produced by p4 stage1 for pools - do not run
   the adapter check in these cases
 - adapterfind flag added to switch adapterfind on/off (default: on)
 - scaffolding of runfolder includes .npg_cache_10000 directory creation (lane and plex)
 - stage1 analysis: parse interop data for cluster count calculation (used for 10K subsampling)
 - seq_alignment reads tag_metrics files to calculate fraction for 10K subsampling
 - seqchksum_comparator function now uses seqchksum files from analyses (no regeneration)
 - QC spatial_filter now run as standard QC check
 - add p4s2_aligner_intfile flag to force temporary file production in stage2 alignment
 - p4 stage1 splice/prune directives moved from vtfp command line to params file
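
   Illustration for the "directed acyclic graph" item above: a minimal
   Python sketch of ordering pipeline functions by their dependencies.
   It is not the pipeline's own (Perl) implementation. The graph below is
   hypothetical; the function names are borrowed from this file, and only
   the qc_spatial_filter -> bam_cluster_counter_check style of dependency
   is mentioned in these notes.

```python
from graphlib import TopologicalSorter

# Hypothetical function graph: each key maps to the set of functions
# that must finish before it can start. Edges are illustrative only.
FUNCTION_GRAPH = {
    "p4_stage1_analysis": set(),
    "seq_alignment": {"p4_stage1_analysis"},
    "qc_spatial_filter": {"p4_stage1_analysis"},
    "bam_cluster_counter_check": {"qc_spatial_filter"},
    "seqchksum_comparator": {"seq_alignment"},
}


def execution_order(graph):
    """Return one valid execution order for the graph.

    TopologicalSorter raises graphlib.CycleError if the graph has a
    cycle, so a definition that is not acyclic fails early.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in execution_order(FUNCTION_GRAPH):
        print(name)
```

   Any order returned places a function after all of its predecessors,
   which is the property a job scheduler needs when translating the graph
   into job dependencies.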

release 51.12.2
 - fixed lane taglist files for TraDIS libraries
     no longer pad spiked phix tag, simply add missing i5 tag for dual index runs
 - update p4 stage default values in general_values.ini
     p4_stage1_memory=20000, +p4_stage1_slots=8, +p4_stage1_i2b_thread_count=8

release 51.12.1
 - tweak to GbS library type check in seq_alignment.pm, as the type arrived as GBS (check is now case-insensitive).

release 51.12.0
 - Travis CI build - add iRODS test server
 - run_is_deletable script moved to this package from data_handling,
     custom conversion between run id and run folder path refactored to use
     npg_tracking::illumina::runfolder,
     lims-driver-type argument is added to reset the default samplesheet driver type,
     iRODS build is added to Travis CI configuration to enable all new tests to run,
     Log::Log4perl is used for logging
 - added support for GbS processing
 - travis build tweak for npg_qc

release 51.11.3
 - seq_alignment: fixes for no target alignment and no target alignment+non-consented human split

release 51.11.2
 - use align_intfile_opt=1 when aligning with star to produce intermediate bam file
 - by default, force bambi i2b to single-threading (general_values parameter available for override)

release 51.11.1
 - Handle dual indexes (create new format lane tag files)
 - remove remaining broken provisions for xml LIMs driver
 - use the new log publisher
 - now allows XA/Y-split with no target alignment

release 51.11.0
 - added support for RNA analysis/quantification using STAR and salmon
 - STAR alignment jobs get more memory using bmod after seq_alignment jobs have been submitted.
 - removed unneeded coordinate sort and duplicate marking when there is no alignment to a target reference

release 51.10.3
 - no alignments for chromium libraries
 - seq_alignment to do_rna analysis regardless of the organism specified (other conditions stay in place)

release 51.10.2
 - use bwa aln for human split with tophat target alignment

release 51.10.1
 - Modified qc run function list, removed copy_interop and switched archive_to_irods to samplesheet

release 51.10
 - Chained execution of RNA-SeQC to the vtfp/viv alignment cmd for RNA-Seq libraries only:
     entries for qc check rna_seqc removed from central function and parallelisation.
     code that created rna_seqc-specific directories has been removed as this is
     now handled by the check itself using qc_out arg.
 - remove GCLP-specific code and configuration files
 - remove unused force_p4 attribute
 - OLB analysis removed
 - recalibration removed
 - pb_cal_path and dif_files_path accessors disabled
 - allow p4 stage 1 to analyse runs with different length reads
 - illumina2bam function removed
 - update p4 stage 2 (seq_alignment) warn rather than croak if multiple references for tag 0
 - update p4 stage 2 (seq_alignment) to use bambi chrsplit instead of SplitBamByChromosomes.jar for Y-split runs
 - pipeline scripts - redirect stderr output to the log to capture output from all
   NPG and CPAN modules in one place

release 51.9
 - p4stage2 speed-up by caching references
 - p4stage2 errors in getting a reference made fatal
 - iRODS publish script new options: (1) --restart_file to pin the script's
     process file name to a particular LSF job, (2) --max_errors to force the script to
     fail after certain number of errors (10 specified in the configuration file)
 - seqchksum_comparator test fixed for gseq by generating a cram file with a header
   that lists a reference available on gseq and suppressing an outside search by
   setting REF_PATH to an invalid value; the test will continue to work on hosts
   where REF_PATH is set and available
 - consistent computation of absolute path, which takes account of substitution

release 51.8
 - when comparing checksums, generate seqchksums for each cram file and merge
   the results rather than merging the cram files and generating seqchksum

release 51.7
 - replaces the original log role with the one from DNAP utilities,
   which provides a Log4perl logger and some convenience methods.
 - new signature for the sequencescape warehouse loader so that it uses
   samplesheet LIMs driver at the analysis stage and ml_warehouse_fc_cache
   LIMs driver at the archival stage

release 51.6
 - test and code fixes to ensure problem-free tests under Perl 5.22.2
 - tweak to qc_report_dir in bsub command for one library per lane case
 - fix convert-low-quality flag in bambi decode command; set bid_implementation always to bambi

release 51.5
 - update p4 stage 2 (seq_alignment) to handle all cases (e.g. no target alignment, spike tag)
 - support generation of targeted stats files in seq_alignment.pm with p4
 - qc jobs creation, can_run, check object instantiation:
     do not supply path/qc_in, which is now optional
     do not set attributes that the object does not have
 - allow specification of implementation (java or bambi) of illumina2bam and bamindexdecoder
    in p4 stage 1 via general_values.ini
 - add Broad Institute's RNA-SeQC to list of autoqc checks
 - run bam_flagstats autoqc check via the qc script
 - tweak for targeted stats files and also human split

release 51.2
 - patch to script_must_be_unique_runner - only ignore exact matches to the job id
 - change function order to run p4 stage 1 analysis by default

release 51.1.1
 - extended is_hiseqx_run to detect HiSeq 4000 runs
 - samtools1 cat .. doesn't work with different references, replaced by samtools1 merge ..

release 51.1
 - replaced bamcat .. by samtools1 cat .. in seqchksum comparison,
   previous command line was too long for large pools
 - changes for pools with >999 samples, LSF job array index now 5 digits
   modified tests
 - added lims_driver_type cli option

release 51.0
 - use npg_irods npg_publish_illumina_run.pl in place of data_handling irods_bam_loader.pl
 - provide appropriately changed second index read tags for ordered flowcell
   instruments (typically rev. complement) e.g. HiSeqX
 - in both the analysis and archival function order have an extra
   ml warehouse loader job to set the stage for loading to iRODS
 - warehouse loaders that are run after setting qc complete date
   wait for the runfolder to be moved to outgoing, their log location
   is updated accordingly

release 50.3
 - use 'purpose' field to decide if qc_run
 - study-specific software stack for the analysis pipeline
 - added new module for p4 stage1 analysis

release 50.2
 - names of the pipeline daemon modules and scripts harmonised
 - the daemon module does not inherit from the pipeline base class thus
   reducing the number of command line script options
 - common code moved from the daemon scripts to the daemon module
 - a new role for common accessors

release 50.1
 - bug fix to allow archival daemon to work (restore availability of run folder
   finding method)

release 50.0
 - purge carriage returns (as well as line feeds) from study descriptions for RG header
   records (xml lims driver has previously done this as part of XML parsing)
 - require minimum version 5.10 for perl
 - add study analysis configuration accessor
 - simpler name for the archival daemon module
 - parent class for pipeline daemons
 - dry_run option for daemons
 - consistent behaviour of the archival and analysis daemons
   when LIMs data are not available in the ml warehouse and
   the run is not a QC run, the run is skipped
 - the pipeline daemons define the type of the pipeline to run
   (default, gclp, qc) and set appropriate backward-compatible
   options for the pipeline script
 - Log::Log4perl logger is used in pipeline daemons
 - cached samplesheet generation - use ml warehouse for all
   runs except QC runs, for which the old warehouse is still used
 - add npg_pipeline_job_env_to_threads script (to avoid excessive repeated perl one-
   liners in command arguments).
 - archival of logs should run after an asynchronous move to outgoing (performed by the
   staging daemon) - paths adjusted and job preexec checking for the existence of the
   runfolder in outgoing is added
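
   Illustration for the "purge carriage returns" item above: a minimal
   Python sketch of cleaning a study description before it is written to
   an RG header record. The function name is hypothetical, and replacing
   each run of line breaks with a single space is an assumption; the
   pipeline may simply delete the characters.

```python
import re


def clean_rg_description(description: str) -> str:
    """Replace each run of CR/LF characters with one space and trim the ends.

    A header record must stay on one line, so embedded line breaks in
    free-text fields have to go before the record is assembled.
    """
    return re.sub(r"[\r\n]+", " ", description).strip()


if __name__ == "__main__":
    print(clean_rg_description("Study A\r\nsecond line\n"))
```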

release 49.8
 - seq-alignment now uses bwa_aln_se for single read runs
 - disable log archival pending enhancements

release 49.7
 - seq-alignment - P4 and new bwa for older chemistries & forcing mem for alt references
 - bug fix after passing RG parameter to illumina2bam: change SplitBamByReadGroup options:
     do not set OUTPUT_COMMON_RG_HEAD_TO_TRIM, strip last component (runid_lane) from
     OUTPUT_PREFIX
 - use threading for bam and cram creation in seq_alignment
   references and GCLP
 - add attribute "gclp" common to analysis and archival scripts
 - function list config files always contain pipeline module name e.g. central

release 49.6
 - pass RG parameter to illumina2bam
 - add archive::file::logs

release 49.5
 - correctly determine path to SplitBamByChromosomes.jar
 - drop redundant do_markduplicates and not_strip_bam_tag options from args list
   of the old-style bam alignment script
 - factor out generation of bam_flagstats metrics into a method
 - call new bam_flagstats execute method instead of invoking individual parsers explicitly -
   forward compatibility
 - npg_pipeline::cache - reuse_cache_only option added RT#486264
 - check for an inline index when calculating index_length

release 49.4
 - LSF job creation for autoqc checks - use qc check objects directly when
   testing whether to create a job

release 49.3
 - error if padding for spiked Phix index sequence is not long enough
 - kill unwanted jobs efficiently (one command for all ids and -b option)
 - call warehouse loaders with verbose option
 - call ml warehouse loader at the end of the analysis pipeline so that the product
   table is loaded by the time the run goes into QC thus allowing to query this
   warehouse using run id
 - simplified name generation for fastq files
 - use 'subset' option of the bam_flagstats autoqc result instead
   of the 'human_split' option
 - allow p4 to be used where no alignment is specified for target but human
     split (contains_nonconsented_human) is
 - new tests for various p4 analysis options in seq_alignment
     (20-archive_file_generation-seq_alignment.t)

release 49.2.1
 - to avoid deprecation warnings in Config::Any,
   ensure XS extensions are available for YAML and JSON

release 49.2
 - test updates only

release 49.1
 - remove 'move_to_outgoing' step from function list for qc runs

release 49.0
 - run the archival pipeline entirely in the directory where it was started, ie
   do not move the runfolder to outgoing; this will be done by the staging
   monitor

release 48.9
 - run illumina analysis loader in a lowload lsf queue

release 48.8
 - pipeline daemon - when calling the pipeline, do not use paths that are local to the host

release 48.7
 - generate fastqcheck files for empty fastq files explicitly without
   running fastqcheck executable, which is not available on gseq cluster
 - daemon to process a run if machine location is unknown

release 48.6
 - force gclp analysis along the p4 route

release 48.5
 - archive to a "gclp" iRODS if function_list looks like gclp variant
 - use low_load queue for upstream_tags qc (accesses tracking and qc DBs)

release 48.4
 - Make group to change analysis directories to optional
 - use LSB_BIND_CPU_LIST over LSB_MCPU_HOSTS to determine number of threads to use
   within a job (cope with hyperthreading where LSF gives one slot to what is presented
   as two cpu - this will try to make use of apparent CPUs)
 - make number of slots used by seq_alignment configurable
 - allow running on file server by
   + use npg_tracking::util::abs_path to patch absolute paths
   + avoid perl chdir to give job working dir
 - drop not_strip_bam_tag option, explicitly disable bam tag stripping in seq_alignment
 - add daemon.ini and optionally add command_prefix to commands
 - use lowload lsf queue
 - seqchksum_comparator to cope with higher plexing (with a chdir)
 - force HiSeq rapid run V2 flowcells (BCXX suffix) to use p4
 - enable p4 single-end processing (bwa mem only, not RNA or non-consented human split)
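
   Illustration for the LSB_BIND_CPU_LIST item above: a minimal Python
   sketch of deriving a thread count from the LSF job environment. It is
   not the pipeline's own code. The assumed formats are a comma-separated
   list of CPU ids for LSB_BIND_CPU_LIST and "host slots host slots ..."
   for LSB_MCPU_HOSTS; the function name and the default of one thread
   are hypothetical.

```python
import os


def threads_from_lsf_env(env=None, default=1):
    """Pick a thread count, preferring LSB_BIND_CPU_LIST over LSB_MCPU_HOSTS."""
    env = os.environ if env is None else env

    # Preferred source: count the CPUs the job is bound to. With
    # hyperthreading this can exceed the number of slots LSF granted.
    cpu_ids = [c for c in env.get("LSB_BIND_CPU_LIST", "").split(",") if c.strip()]
    if cpu_ids:
        return len(cpu_ids)

    # Fallback: the slot count listed for the first host.
    fields = env.get("LSB_MCPU_HOSTS", "").split()
    if len(fields) >= 2 and fields[1].isdigit():
        return int(fields[1])

    # Outside LSF neither variable is set.
    return default


if __name__ == "__main__":
    print(threads_from_lsf_env())
```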

release 48.3
 - get information about a spike directly from lims
 - use old bam_alignment.pl, not P4, if omission of alignments requested
 - option (default on) to force analyses to assume phix spike
 - force P4 and so bwa mem for runs with reads > 100bp
 - enable human split for P4 in seq_alignment (using bwa aln, adapter trimming)
 - run old warehouse loader live in order to pick up pool-level information
     that is currently needed in SeqQC
 - gclp-specific function list for archival

release 48.2
 - added update_ml_warehouse
 - removed sf48 from list of staging areas in green room

release 48.1
 - pipeline-specific function lists
 - setting olb or qc_run flags to true results in olb or qc_run function lists used
 - qc_run flag on its own does not cause a change of lims driver
 - unused pipeline flags and options removed
 - for gclp runs, the analysis daemon to pass gclp function list to analysis pipeline

release 48.0
 - if LIMS cached data creation fails, pipeline script fails before submitting jobs:
     removed spider function from function order for both analysis and archival pipelines;
     introduced spider boolean flag that defaults to true;
     spider is run within prepare() method before the functions are executed
 - removed test and live section in configuration files
 - removed configuration for external script names
 - 'PB_cal_bam' analysis pipeline renamed to 'central'

release 47.9.1
 - Fixed split sanity checks

release 47.9
 - analysis daemon patch: ensure runfolder glob expressions are used when finding the runfolder path
 - use samtools1 (rather than samtools1_1) for samtools in archival and P4 pipelines

release 47.8.1
 - workaround bug: daemon missing new warehouse access

release 47.8
 - GCLP compliance-related daemon changes:
     runs will not be progressed to analysis/archival unless the
     flowcell barcode is set in the npg_tracking database;
     runs are not going to be progressed if it's impossible to fetch LIMs data for a flowcell;
     both analysis and archival daemon to pass runfolder path to the
     pipeline script;
     if batch_id is available, pass it to the analysis pipeline script
 - use appropriate driver for samplesheet generation (xml, warehouse, ml_warehouse)
 - GCLP compliance-related pipeline changes:
     get the flowcell barcode needed for
     accessing LIMs information from runfolder path/content;
     use batch id if provided by the caller;
     derive the run id from runfolder path/content
 - check for RTA run tag is dropped in the analysis daemon - all current runs are RTA
 - use alt_process flag when archiving qc runs

release 47.7
 - switch seqchksum_comparator to cram, add new test data and updated tests
   convert all cram files to bam as current version of bamcat will not read cram
   dropped one test as converting an empty bam file to cram produces a valid cram file
 - More sanity checks: use of y split and nonconsented X and autosome split only with Homo
   sapiens reference, use of nonconsented human split only with non Homo sapiens reference.
 - always apply sanity checks (even when not running P4 based pipelines).
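
   Illustration for the sanity checks listed above: a minimal Python
   sketch of the two rules as stated (Y split and nonconsented X and
   autosome split require a Homo sapiens reference; nonconsented human
   split requires a non Homo sapiens reference). The function and
   argument names are hypothetical, as is the way the organism name is
   normalised.

```python
def split_sanity_errors(reference_organism, y_split=False,
                        xahuman_split=False, human_split=False):
    """Return a list of error messages; an empty list means the checks pass."""
    # Accept both "Homo sapiens" and "Homo_sapiens" spellings (assumption).
    organism = reference_organism.strip().lower().replace("_", " ")
    is_human = organism == "homo sapiens"

    errors = []
    if y_split and not is_human:
        errors.append("Y split requires a Homo sapiens reference")
    if xahuman_split and not is_human:
        errors.append("nonconsented X and autosome split requires "
                      "a Homo sapiens reference")
    if human_split and is_human:
        errors.append("nonconsented human split requires "
                      "a non Homo sapiens reference")
    return errors


if __name__ == "__main__":
    print(split_sanity_errors("Plasmodium falciparum", y_split=True))
```

   Returning all failures at once, rather than stopping at the first,
   makes it easier to report every misconfiguration of a lane in one go.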

release 47.6
 - reenable archival of index files for CRAM/BAM files to iRODS

release 47.5
 - turn off BAM archival to iRODS

release 47.4
 - multiple TraDIS library types now in use, all assumed to start with TraDIS

release 47.3
 - P4 can process nonconsented X and autosome human split, and separate Y chromosome data
 - tidy of conditions for selection of p4 processing, and new force_p4 flag to override them

release 47.2
 - don't process phix using the p4 pipeline

release 47.1
 - when using and copying an existing cache directory copy everything (instead of
   restricting to npg directory)
 - try to create samplesheet if it does not exist, even if copying cache

release 47.0
 - code moved to git repository

release 46.3
 - test fix

release 46.2
 - use samtools1.1 in seq_alignment P4 based analyses
 - externally specified webcache and samplesheet are copied to the default location
     inside the analysis folder
 - seqchksum primary data comparison between final product and post illumina2bam
 - run seqchksum and cluster count check at same time as post bam qc
 - tag list files creation:
     remove dedicated function
     call the code from illumina2bam function
     refactor into a stand-alone per-lane module
     create these files in the metadata cache directory
 - allow ref_match qc jobs to run 8 at a time (patched Bowtie ameliorates Lustre problem)
 - use subtemplate/library base templates rather than monolith ones for seq_alignment P4
 - do not run lane level pulldown_metrics job for a pool
 - extra seq_alignment check that run is compatible with available P4 pipelines
 - limit bam split by tag to lanes requested (fix)

release 46.1
 - for V4 HiSeq runs without a reference use bam_alignment.pl

release 46.0
 - remove redundant npg_pipeline::archive and npg_pipeline::roles::business::file_constructs modules
 - remove dependency on tag files npg_common::roles::run::lane::tag_info role
 - remove a callback for a phix flavour of the sequence error check (function is not in use)
 - move options for the adapter detection job to where the job is generated
 - error is thrown for non-existing qc check
 - remove generation of tag files (tag list files remain)
 - always create new tag list files
 - remove unused configuration options
 - remove --lane pipeline option (was used only in tests)
 - move generation of the bam2fastqcheck_and_cached_fastq job out of a module for
     job generation for autoqc functions
 - remove unused scripts

release 45.6
 - remove mostly redundant npg_pipeline::roles::business::internal_info, move
     tradis flag to the module creating illumina2bam job
 - remove redundant npg_pipeline::roles::business::bustard_lsf_reqs

release 45.5
 - remove unused callbacks for old-style run and lane status updates
 - remove a callback for lane completion files
 - remove --no_status_updates pipeline option
 - remove a prereq. script for checking for existence of files

release 45.4
 - omit PhiX sample name and study from strings generated for illumina2bam
   BAM RG record generation (lane/pool level)
 - ensure ref_match qc jobs run serially (to try to alleviate Lustre slow io
   on simultaneous file read bug)

release 45.3
 - write log files for status change to qc complete to outgoing

release 45.2
 - remove the following functions from function order:
     status updates that do not create status files
     redundant touch_completed_lane

release 45.1
 - bug fixes and code improvements in a callback for file-based statuses
 - file-based status updates added to function order
 - ensure P4 pipelines in seq_alignment are aborted if required analysis is
   not yet supported
 - remove bam_alignment and rna_seq_alignment steps (having been replaced
   by seq_alignment)
 - run bwa mem P4 alignment pipeline for V4 HiSeq runs as well as HiSeqX runs

release 45.0
 - Use P4 based BWA Mem analysis in seq_alignment if HiSeqX run
 - Use variable number of CPU slots for seq_alignment jobs (12 to 16)

release 44.15
 - Use soft filtering instead of hard filtering for spatial_filter
 - generate bam_alignment autoqc json for RNAseq analyses
 - don't try per plex "seq_alignment" analysis or upstream tag qc
   if no indexing read
 - callbacks for functions saving run and lane statuses to file

release 44.14
 - switched cluster count check to InterOp files
 - reinstated bam_cluster_counter_check for HiSeqX

release 44.13
 - seq_alignment refinements:
  + to RNAseq p4 script pass:
   - library_type, fr-unstranded or fr-firststrand if library is dUTP
   - AlignmentFilter.jar location
   - real PhiX fasta location
  + add autoqc bam_flagstat json generation

release 44.12
 - increase nfs resources for seq_alignment to 4
 - drop localscratch requirement for seq_alignment
 - amended generation of rna_seq alignment commands to use new parameters, and amended corresponding tests

release 44.11
 - set PU option when calling Illumina2bam
 - HiSeqX run: skip bam_cluster_counter_check

release 44.10
 - update p4 vtfp template location in seq_alignment
 - do not run illumina_basecall_stats step for HiSeqX data

release 44.9
 - parallelise seq_alignment function

release 44.8
 - use seq_alignment module to replace bam_alignment to produce
   production output files and get rna analysis into production

release 44.7
 - use analysis_path as a location for the cached data directory
 - allow for flattened runfolder directory structure, ie do not
   insist on Illumina RTA directory structure

release 44.6
 - remove unused test data
 - use more up-to-date runfolder directory structure in tests
 - remove unused methods from test utility module
 - do not use analysis_path either in tests or in the code - this option is not
   being used
 - remove unused analysis_type option to the latest summary link creation job

release 44.5
 - Add qc_verify_bam_id to list of qc functions

release 44.4
 - pool-level asset ids are not loaded from a samplesheet, creating problems in SeqQC;
   warehouse loader not to take lims data from a cached samplesheet

release 44.3
 - pipeline's unused no_spider flag removed
 - 'spider' function re-implemented to create a cache suitable for
   samplesheet-based lims objects
 - cache directory moved down to the bam basecall directory
 - 'create_webcache_softlink' function removed since the location
   of the cache is now unambiguous
 - stand-alone module npg_pipeline::cache for generating a cache
 - unused functions for handling emails in tests removed
 - 'no_recalibration' flag replaced by 'recalibration' flag that
   defaults to false
 - analysis pipeline is started without explicitly setting the
   'recalibration' flag
 - use Biobambam based adapter detection instead of illumina2bam's

release 44.2
 - autoqc data retriever has changed, update the constructor's attributes

release 44.1
 - added Illumina2bam.jar options for runs with an inline index to put the
   tags on the read with the inline index and modified tests accordingly

release 44.0
 - location of the log for the archival warehouse loader job changed
   from the analysis directory to outgoing
 - reduce dependency on npg_common:
     parse configuration files directly;
     move functionality of the npg_common::roles::run::fs_resource
     role into the base class;
 - remove run_conf and add_to_run_conf attr/method from the base class

release 43.4
 - extended inline index to read 2
 - create tileviz directory when archive and qc directories are created

release 43.3
 - remove tileviz function from the pipeline
 - test update following changes to ping procedure for npg daemons

release 43.2
 - added --tileviz option to spatial filter parameters in pb_cal_align command
 - changed default region size to 200 and dropped region_min_count
 - added a check for pools with only one non-phix tag and extended tests

release 43.1
 - remove unused cram file generation function
 - remove provisions for running daemons on the old lenny farm
 - daemon to start analysis and archival for run folders that are co-located with
   daemon's host
 - update hosts in daemon utility that runs the pipeline daemons
 - upgrade bam alignment job memory requirement to 16GB
 - drop appending boost library path to LD_LIBRARY_PATH - not needed
   on Ubuntu precise
 - rely on path to find tophat2 executable

release 43.0
 - remove analysis and archival daemons dependency on npg::api, replace by
     direct calls to tracking database
 - remove analysis daemon dependency on staging area globbing
 - increase job priority of HiSeq2500 runs
 - fix a bug in calculating jobs priority - priority should be set regardless
     of whether a priority value is set in the tracking database
 - if the run does not have RTA tag, daemon will try again later
 - analysis daemon to call analysis on a whole run rather than listing
     lanes explicitly
 - remove analysis daemon dependency on LIMs data
 - remove definition of lsf_resource_select from the config file;
     both daemons to supply this option as "lenny" if running on the old farm

release 42.7
 - analysis daemon to submit runs from a subset of staging areas depending
     on the cluster name it is running on
 - redundant tests and test code removed

release 42.6
 - turn off default PhiX based Qval calibration
 - hard code the memory rather than look in config file
 - remove mocked objects for qc tests and add explicit test for qc_adapter
 - separate test executables for lsadmin

release 42.5
 - improvements to lsf_job module and its tests
 - to be able to submit tileviz job across farms,
     drop request for particular nodes from the tileviz lsf job spec

release 42.4
 - new module lsf_job to generate LSF7/9 memory limit strings
 - remove /software/bin/perl in local LSF commands

release 42.1
 - add upstream_tags qc check

release 42.0
 - to ensure host-specific path to script is used, use bare script name in daemon definitions

release 41.10
 - job that updates run status to 'run archived' should write log to analysis not outgoing
 - production environment is now set in login shell - no need to set npg
   and npg cpan lib location through PERL5LIB;

release 41.9
 - patch for adaptor autoqc jobs memory requirements; mismatch in
   requested and allowed causes an error in job submission on farm2

release 41.8
 - patch to script_must_be_unique_runner - it's job id we need

release 41.7
 - patch to script_must_be_unique_runner - having a pipe expands the value
   of $LSB_JOB_INDEX prematurely

release 41.6
 - archival to irods jobs for the same run are prevented from running concurrently
 - qc_adapter uses explicit span hosts in bsub command

release 41.5
 - npg_pipeline/analysis::harold_calibration_bam, alignment_script invocation: removed
    deprecated --intensity_dir flag; added --bam_join_jar flag which uses CLASSPATH
    to locate its argument
 - use one nfs slot for rna alignment since all compute is done locally on a node
 - finding mountpoint for directories in /tmp does not go well under perl 5.16.3
   on farm-precise-dev64; explicitly set TEST_FS_RESOURCE in the tests for
   rna alignment farm job creation

release 41.4
 - OLB can use more CPUs - from 8 to 16 (changed from fixed 8)
 - --no_recalibration option no longer stops PhiX alignments
 - amended npg_pipeline/archive/folder/WebCache.pm and lib/npg_pipeline/pluggable/harold.pm
      to use script_name attribute instead of $PROGRAM_NAME

release 41.3
 - ensure irods archival does not block setting qc review pending status -
   bug fix in the parallelisation configuration

release 41.2
 - changes to tophat alignment script to cope with single reads
 - qc_adapter uses 2 cpu 1500M, illumina2bam uses 2 cpu 4000M
 - move to qc review pending state does not depend on the outcome
   of archival to irods to allow for manual qc to proceed regardless
   of the state of the IRODs repository; failed archival to
   irods jobs will still show in stuck jobs list
 - fixed a test that accessed xml feeds from live url

release 41.1
 - rna-seq alignment added

release 41.0
 - pipeline daemons settings to work under perl 5.14
   this set up requires that '/etc/bashrc' is sourced in the user's
   .bashrc
 - perlcritic policy name printed when perlcritic errors are displayed

release 40.6
 - return archival of bam files to irods to the analysis pipeline

release 40.5
 - local analysis and archival daemon definitions (moved from instrument handling)
 - take inline index end from st::api::lims module

release 40.4
 - exclude archival of bam files to irods from the analysis pipeline -
   temporary measure to tide over a whole day of irods maintenance

release 40.3
 - following controlcentre-less deployment of daemons, irods path is lost
   in daemons; replace sourcing lsf configuration with sourcing /etc/bashrc
   which should take care of lsf configuration and add whatever is defined
   in .softwarerc (eg irods)

release 40.2
 - limit max threading for phiX bwa sampe (so we don't break memory limit)

release 40.1
 - bugfix - fixed cycle_start1 in pb_calibration jobs for '3 prime poly-A pulldown' lanes

release 40.0
 - dependency on /software removed, filesystem_locations.ini configuration file removed
 - production pipeline daemons to use standard error for logging, which will go to a file
   designated when setting the daemon
 - no fall back onto stored tag files, LIMS should provide information about tags
 - pb calibration tools are located dynamically
 - a full path to OLB in the configuration file
 - unused npg_pipeline_preexec_lsf_resource_max script removed
 - allow alignment jobs to use 6 to 12 processor slots
 - removed type==X86_64 lsf select option that was used for some autoqc lsf jobs
   since it does not combine well with select[lenny] or similar
 - added lsf_resource_select pipeline option that is set by default to lenny in the config file;
   if set, this option is added as -R select[xxx] to all lsf job submissions;
   if set for pipeline daemons, gets appended to pipeline script options
 - installing data/config_files added to the build's install target
 - lsf configuration file path is stored in the pipeline configuration files
 - threading in sam{se,pe} and bump bam_alignment memory up to 13200MB

release 39.7
 - removed hardcoding illumina2bam location from the pipeline
 - job creation fails if necessary jar files not found
 - outdated special test for bam alignment tradis removed
 - change pb calibration version to 10.6
 - removed hardcoding a full path to the bam alignment script

release 39.6
 - spatial filter - default to removing failed reads and use V10.5

release 39.5
 - bump BAM creation memory from 8G to 10G

release 39.4
 - fix running of pb_align for single read data
 - loosen REs looking for cycles to use for index in 3' pulldown sample descriptions

release 39.3
 - fix misleading cluster counter check fail message
 - disable cram creation until tools supporting 1.1 are released

release 39.2
 - only apply spatial_filter if a spatial_filter filter file exists
 - use lower case for homebrew BAM tags used for random bases in 3' pulldown library illumina2bam usage
 - extend 3' pulldown "jecfoo" to cope with index starting at different cycles and different read lengths
 - use spatial filter numbers when checking cluster counts add up
 - remove unnecessary export of npg_common modules
 - reflect the move of some npg_common modules to the npg_tracking namespace

release 39.1
 - bugfix - use bash for PB_score bsub command (as it uses a bash'ism)

release 39.0
 - hardcoded list of special-index library types should be compatible with tracking
 - modified illumina2bam to handle the "jecfoo" special read 1 index
 - reflect the fact the run, lane and tag roles moved from npg_common to npg_tracking namespace

release 38.3
 - avoid pointless compression in data pipe out of spatial_filter in PB score jobs

release 38.2
 - switched to v10.1 of pb_calibration

release 38.1
 - bugfix spatial filter takes raw intensity path, not dif file location
 - bump filesystem resources used at calibration table generation step

release 38.0
 - daemons updated, outdated code removed
 - spatial filter added
 - calibration code changed to V9 - one table calibration (includes both reads)

release 37.2
 - analysis daemon patch to exclude an option deleted earlier

release 37.1
 - redundant modules removed
 - default for the local flag reflects the value of no_bsub flag
 - spatial_filter flag added and propagated to illumina2bam job
 - bump OLB to 1.9.4
 - setting nonconsented human and spiked phix flags for a lane - dependency on npg tracking database removed

release 37.0
 - don't stop spiked phiX "harold" alignment if no_recalibration is set - we still need to filter any PhiX out.
 - Add adapter detection code after Illumina2bam for paired read data (data in "a3" and "ah" tags)
 - TraDIS tags tr and tq no longer stripped by BamTagStripper wrapper so turn back on strip BAM tags by default for TraDIS

release 36.13
 - tileviz job creation improvements in order to address lsf job failures due to problems with the nfs file system:
     preexec to check for the existence of the qc directory,
     pipeline script to create the tileviz directory if it does not exist

release 36.12
 - bug fix; stopped using roles that were removed in the previous point release

release 36.11
 - change function_order for tileviz and log name
 - corrected syntax error in construction of job command in cram.pm
 - added tests for cram.pm

release 36.10
 - bug fix for tileviz and calibration_table jobs

release 36.9
 - npg_pipeline::analysis::harold_calibration module deprecated, its test removed
 - attributes/methods for retrieving control species reference simplified
 - removed npg_pipeline::pre_exec::references_adapters and refactored bin/npg_pipeline_preexec_references to use reference finder roles directly
 - removed npg_pipeline::roles::business::pre_exec_strings, its code integrated into npg_pipeline::roles::business::base
 - fixed files_present_pre_exec_string function; preexec script should ensure that all lanes (not only the ones currently processed) are ready for the state change
 - removed npg_pipeline::pre_exec::FilesPresent; replaced its code with simple code snippet in npg_pipeline_files_present
 - added tileviz function

release 36.8
 - amended cram.pm to supply archive_path to the cram_generation module to avoid lookup at cram creation time; reenabled cpu_limit addition to job name; removed unused code

release 36.7
 - added cram.pm module to create lsf submissions for bam to cram conversion; added cram_generation method to PB_cal_bam.pm

release 36.6
 - pass is_paired_read flag to bam_alignment scripts
 - bam pipeline diagram

release 36.5
 - convert illumina2bam path to absolute path and pass the result to bam_alignment script as well

release 36.4
 - add pulldown metrics autoqc check
 - change illumina2bam jar file name to Illumina2bam.jar

release 36.3
 - ensure crashing analysis launch for one run does not affect others launching

release 36.2
 - use index_length method inherited from long_info role in create_lane_tag_file to avoid rebuilding it

release 36.1
 - remove archive lane directory and bustard directory for auto qc results loading

release 36.0
 - Added copy_interop_to_irods

release 35.3
 - npg tracking db fixtures fixed to reflect db schema changes in pending npg-tracking release 68.2

release 35.2
 - ensure use_bases is passed in bsub of old deplex jobs

release 35.1
 - touch complete lane job output name fixed; its command line generated listing only the necessary options

release 35.0
 - roll back NoGetopt metaclass in lsf queues attrs

release 34.3
 - bin/npg_pipeline_check_lsf_jobs now contains David's script copied from ~dj3/team117/npgsj
 - npg_pipeline::ConfigReader removed since this was just an object wrapper around a role
 - instrument_type and instrument_model pipeline options removed, not needed as options
 - old pipeline option to switch OLB preprocessing, defaults to false
 - npg_pipeline::analysis::bustard4pbcb - new module for preprocessing with OLB
 - npg_pipeline::pluggable::harold::PB_cal_bam_qc (with tests and a script) removed, its functionality moved to npg_pipeline::pluggable::harold::PB_cal_bam
 - execution of spidering moved to npg_pipeline::roles::business::base, special module for it removed
 - unused no_cached_webservice_data and webservice_cache_dir options removed
 - sourcing lsf and oracle conf files removed from the daemons, role and config file for this removed
 - archive_to_irods step called at the end of analysis, no_irods_archival option to toggle this
 - references config file removed
 - phix snip file location in config file changes from the master ref repository to lustre mirror
 - some unused attr of the base object and roles removed
 - some options and attributes marked as not available for setting from the command line
 - add a not_strip_bam_tag flag to pass to bam_alignment script to keep tags like OQ, ci etc. in final bam file
 - leave bam_alignment.pl script to figure out number of threads to use itself (from LSB_MCPU_HOSTS environment variable)
 - set bwa aln threads using LSB_MCPU_HOSTS environment variable in PB align commands
 - no_sf_resource flag added to allow for running on an LSF cluster where sf and irods resource is not defined
 - npg_pipeline::roles::business::databaseConnection role removed; the code switched to using npg_tracking schema attribute that is inherited from npg_common::...path modules
 - npg_pipeline::pre_exec::connectDBIxTracking module removed; bin/npg_pipeline_prexec_connect_dbix_tracking uses the npg_tracking_schema attribute of npg_pipeline::base directly

release 34.2
 - unused npg_pipeline::reorder_fastq and npg_pipeline::Checker removed

release 34.0
 - spidering simplified and speeded up
 - tests that request NPG and Sequencescape XML feeds use webcache and nothing else
 - tests that post to NPG XML removed
 - tests that request connection to external live databases removed
 - propagation of the ref repository location to the can_run part of npg_pipeline::archive::file::qc
 - tests that access reference repository use test repository
 - unused test data (xml files) removed

release 33.7
 - Add option to BamIndexDecoder to convert low quality bases in barcode read to Ns and increase MAX_NO_CALLs from 4 to 6 for single plex lane

release 33.6
 - use lsf irods resource for bam loading job

release 33.5
 - a script to compute pipeline performance

release 33.4
 - bug fix: naming of empty placeholder fastq files for a single run RT#245665
 - unused scripts deleted from external_script_names.ini

release 33.3
 - set recalibrated path for create summary link lsf job

release 33.2
 - code from module npg_pipeline::pluggable::harold::qc moved to npg_pipeline::pluggable::harold
 - modules that inherited from npg_pipeline::pluggable::harold::qc now inherit directly from npg_pipeline::pluggable::harold
 - module npg_pipeline::pluggable::harold::qc and its tests removed
 - script npg_pipeline_qc removed
 - a new update_warehouse function created
 - create_webcache_softlink function added to the pb_cal_bam_qc function order

release 33.1
 - module npg_pipeline::run removed, its children inherit directly from npg_pipeline::base
 - npg_pipeline::run::folder::move refactored to cope with moving folders whose names do not conform to standard RT#244644
 - npg_pipeline::run::folder::link refactored to pass options to a script explicitly
 - lsf job for creating the summary link moved to the small queue
 - nfs resources string removed from creating the summary link and moving runfolder lsf jobs
 - removed unused test data directory t/fuse
 - all job scheduling code moved to npg_pipeline::pluggable
 - a list of submitted job ids returned unsorted to maintain the actual order the jobs were submitted in
 - for bsub command, npg_pipeline::base->submit_bsub_command returns job number as integer
 - standard bsub return value (job number as a phrase) for the test bsub script
 - function create_webcache_softlink moved up to npg_pipeline::pluggable::harold to make it available for the bam pipeline

release-33.0
 - either launch all jobs or none, ie if cannot launch all jobs, kill the ones that have been launched; RT#243670
 - token lsf job at the end to make last job failure trackable RT#244290
 - methods and accessors moved between base, pluggable and harold to achieve a more logical split between these three modules
 - functions and accessors for inferring function order simplified
 - use of 'NoGetopt' metaclass for attributes that do not have to be script options
 - Keep calibrated qualities score when merging with original phix alignment
 - No longer vary filesystem resources used for BAM and fastq split by tag depending on number of plex (Sanger ISG have upgraded NFS server kernel to avoid XFS/NFS fragmentation bug)

release-32.2
 - fix missing phasing numbers in BustardSummary.xml - illumina_basecall_stats

release-32.1
 - move forward qc_tag_metrics just after illumina2bam
 - parallelise illumina_basecall_stats with illumina2bam
 - croak when no expected tag sequence or index given in sequencescape and don't create one based on the standard illumina or sanger tag list
 - make sure harold_recalibration job submitted to create bam file soft link when no_recalibration flag given
 - explicit arguments are given to the illumina analysis archival scripts to prevent future failures when rearrangements are made in the parents, see RT#243937

release-32.0
 - change to new BAM based pipeline for default analysis
 - suspended start of the pipeline
 - config path is built relative to the bin directory; if running from the local directory, local configuration is going to be used
 - spidering methods list reduced
 - increase max_no_calls for decoding if only one plex for a lane

release-31.1
 - special case of bam input for the adapter check
 - unused scripts scripts/functional_script.pl scripts/spider_batch removed
 - extra function in the function order for pb_cal_bam, bam_cluster_counter_check

release-31.0
 - don't try to return control recalibration table if no position given
 - skip non spiked-phix lane for recalibration in PB_cal_bam
 - create lane and lane qc directory in create_archive_directory function order
 - add create_archive_directory in PB_cal_bam
 - add bam_alignment function_order in PB_cal_bam, for bam alignment, filtering, sorting and markduplicates
 - increase bwa aln threads from 4 to 6 in PB_cal_bam, and get their value from config file
 - separate module for autoqc functions npg_pipeline::pluggable::harold::qc
 - pb_cal_bam_qc combined module integrates pb_cal_bam with qc and post_qc_review
 - hardcoded description of a study for control entity
 - npg_pipeline::roles::business::base clean-up - redundant test-related code deleted
 - npg_pipeline::daemons::harold_analysis_runner - calls to batch->lanes replaced with
   calls to the new st::api::lims module
 - npg_pipeline::pluggable::harold::post_qc_review - migration pipeline related functions deleted
 - npg_pipeline::archive::mpsa_to_irods_migration_helper and its tests removed - not needed
 - function for creating empty fastq files and cached short fastq files from a bam file
 - function to run tag_metrics autoqc check
 - qc script to take --qc_in and --qc_out arguments instead of --archive_path to allow for
   being flexible about qc input directories
 - parallelisation of archive_to_irods and archive_to_sra removed since archive_to_sra is not used any more
 - npg_pipeline::roles::business::base->batch method removed
 - some changes to the npg::api::lane object usage to stop using deprecated methods, especially getting hardcoded references

 - introduction of webcache in tests to cope with npg::api::lane methods using st::api::lims object and the latest batch xml
 - explicit spidering of st modules from npg::api modules discontinued; getting references and insert sizes explicitly in spidering discontinued since this should be covered by spidering the lims objects

 - redundant code removed:
     npg_pipeline::pluggable->alert accessor
     npg_pipeline::dispatch_tree::evaluator module removed

release-30.4
 - schema info picks up from a local directory before the users directory. Move
   run_lane_status changes to be run from the runfolder, and this should avoid this
   problem
 - don't try to run any recalibration on a lane marked control through PB_cal

release-30.3
 - only run PB_cal recalibration on a lane which is either phix or a control

release-30.2
 - enable genotype qc check

release-30.1
 - ensure that the tag files can be created if the tag on a spike is longer than the tags on the plexes

release-30.0
 - pb_cal_bam: turn off compression for pb_predictor bam output and merge phix alignment into this output
 - split_bam_by_tag function order added for pb_cal_bam
 - increase illumina2bam fs resources from 1 to default 4
 - increase the number of log files for check_lsf_jobs to look through
 - finish added to function order by default
 - update run_lane status, which will handle updating runs to analysis complete/qc review pending

release-29.3
 - make sure the same study not stored more than once for a multiplexed lane

release-29.2
 - phasing for lane changed to lane from auto
 - remove code and files that existed from initial ideas about the pipeline setup

release-29.1
 - bug fix about file name for lane tag

release-29.0
 - pipeline to use single access point lims module
 - pipeline to take actual positions from batch xml in preference to the run-lane information from the tracking database
 - tag lists generated only for the lanes the pipeline has to deal with
 - vary filesystem resources used for fastq split by tag depending on number of plex

release-28.10
 - bump up filesystem resources used for BAM mark duplicates

release-28.9
 - decrease cpu requirement for non-consented and spiked phix fastq splitting from 12 to 8

release-28.8
 - switched to branch 7.0 of PB_cal (detect bad tile/cycles + new caltable format)
 - bump up filesystem resources used for creating fastq files

release-28.7
 - bug fix to catch the command line for recalibration_alignment for schema information

release-28.6
 - bam & markduplicate files need creating for removed phix, even if the lane is multiplexed

release-28.5
 - daemon only submits for lanes present in batch xml
 - spidering only spiders lanes which are declared from the positions method

release-28.4
 - get study from lane entity directly instead of from sample

release-28.3
 - use study publishable name for bam header, and add study description as well

release-28.2
 - increase cpu requirement for non-consented and spiked phix fastq splitting from 4 to 12

release-28.1
 - OLB version upped to 1.9.3
 - bugfix - BamIndexDecoder not merged by subversion for release-28.0 - corrected
 - bugfix - Jobs which are launched by a job which is part of an array, take the array index as part of the job requirement

release-28.0
 - lane based post_qseq launched
 - analysis complete and qc review pending moved to primary analysis pipeline
 - analysis complete dependency on post qseq, plus pre-exec test for existence of
   files in archive directory which will be written at end of each lane's secondary
   analysis
 - create a lane taglist file with tag sequence, name and library, sample and study name
 - pipe illumina2bam output to BamIndexDecoder if multiplexed run

release-27.3
 - patch to fix only getting stats for lane demultiplex if the lane is multiplexed
 - bustard and gerald lane based option
 - pass through bustard.py parameters with override_all_bustard_options
 - bugfix - ensure that spidering gets assets for plexes

release-27.2
 - increase memory requirement for illumina2bam job to make sure JVM can start

release-27.1
 - new function_order illumina2bam to convert bcl files to bam
 - reduced spidering to only do functions we know are called
 - dependency on current running job added to any jobs with no dependency
   if the job launching is a job itself
 - remove dependency on MooseX::InsideOut (which is broken in Moose2.0)

release-27.0
 - increase number of job io slots for pb_cal jobs to 4
 - increase number of processors to 4 for score and alignment
 - push number of processors through to bwa with aln_parms
 - stop bam production for full lanes which are multiplexed
 - stop the subsequent mark duplicate jobs for those lanes
 - stop the subsequent gc_bias qc check for those lanes
 - in scripts a helper script for checking spidering. This is always subject to change!

release-26.4
 - increase memory for bam_markduplicates to cope with higher density coverage

release-26.3
 - spidering improvements to capture more data to ensure we get full requests to get the correct study for the sequencing request
 - all runs without a control lane will run with the PB_cal no recalibration option

release-26.2
 - job_priority option, to enable a user to determine the job priority to be used for all jobs, regardless of the queue to which the job would be submitted
 - remove bam_index function order
 - fix launching the post_qc_review in pipeline migration helper to include user defined requirements
 - spider calls studies on lane to ensure that we get the studies
 - move run folder drops the log file into a log directory within the folder moved to, rather than just the folder,
   so that any wildcard matching won't find that by accident

release-26.1
 - spidering improved to pick up more relating to samples on multiplex lanes, but not to go down the children, as this might end up pulling the entirety of Sequencescape
 - munge summary files removed from function_orders.yml as no longer necessary to run
 - bugfix: correct cal table name should be generated for all functions/scripts which need access to it
 - references_adapters directory check: must only replace the word references with adapters, not if it is part of a larger word or construct

release-26.0
 - removal of the munge filenames/readnames that were needed for old version of GERALD recalibration
 - add in job_name_prefix (both in general_values.ini and as command option) to add a prefix to all job names, for ease of seeing them in lsf
 - no_secondcall flag to ensure switch off generating second basecalls and storing them in the bam files
 - update live versions of OLB (for bcl2qseq) and CASAVA

release-25.1
 - addition of code to check the lsf queues for stuck runs and let you know which jobs failed

release-25.0
 - complete removal of srf generation and code for gcfreq creation
 - removal of illumina_pipeline_bin and its conf key/value pairs as legacy which is not needed
 - switched order that fastq2bam jobs are launched, so that full lanes bams are generated in preference
   to plex ones. This may give a slight time improvement due to more efficient resource allocation
 - move as much qc to run in parallel with fastq2bam as possible, to endeavour to reduce the time these
   extend the pipeline by

release-24.4
 - daemon for launching analysis to check if the run has spiked lanes, and then launch PB_cal for spiked, with a
   no_cal/bustard&gerald to run alongside

release-24.3
 - add bam cluster count checking based on bam flag stats

release-24.2
 - pre-exec references job added to fastq2bam jobs as these need to query the references repository during running
 - pre-exec references job added to alignment and calibration jobs in PB_cal
 - spiked calibration tables need to know the snp files for phix
 - phix snp file added to pb_cal_pipeline.ini, so as to be retrieved for calibration job

release-24.1
 - turn off running full PB_cal pipeline as standard
 - lsf_queue and small_lsf_queue to be used to determine queues, so now functional command line options

release-24.0
 - a function to perform the ref_match autoqc check
 - no_srf_generation stops trying to split and index srf files
 - bugfix - is_spiked_phix is boolean
 - a function to split nonconsented if not done yet
 - if the run is performed on a HiSeq, then do not create srf files
 - a function to perform a sequence error check for spiked phix part
 - an ability to pass extra options from functions to an autoqc script

release-23.0
 - bug fix to make sure correct basecalling software returned from bustard config xml file
 - bugfix - drop tag from readname in srfs for lanes on multiplexed runs which are not multiplexed
 - deal with spiked in phix for pb_cal calibration
 - patch - give Instrument generated BaseCalls directory to setupBcl2Qseq.py
 - tests: now use domain test in the conf files, so that consistency can be kept in tests, whilst live info may change

release-22.0
 - spiked phix splitting will generate fastqcheck and md5 for the newly generated fastq files
 - remove the file renaming part from spiked phix splitting command for schema information
 - consider spiked phix part for cluster count checking
 - strip out all dependencies on srpipe::util
 - strip out dependencies on srpipe::config::constants and srpipe::config::instruments
 - spidering now calls the required_fragment_size on an asset to ensure that any further xml is loaded

release-21.1
 - fastq generation only adds the --index flag on lanes which are multiplexed, not all lanes on a run
   which have the indexing cycles performed

release-21.0
 - the pipeline to spider over requested lanes only
 - moved lane_tile_clustercount to npg_common::roles::run::long_info
 - demultiplexing only tiles which it can find in the Data/Intensities/config.xml file (since HiSeqs do not have consecutively numbered tiles)
 - phasing and matrix options moved to conf file, and phasing, prephasing and matrix now options to be passed on analyse_RTA's command line
 - using reference finder for human reference location for creating nonconsented human part bam file
 - bjobs for splitting spiked phix out, renaming srf and fastq files, and generating bam file
 - npg_pipeline::analysis::FixConfigFiles - go through the config.xml files in Data/Intensities and Data/Intensities/BaseCalls, and check that
  1) Last Cycle numbers and number of reads in each are the same (croak if not)
  2) the runfolder (on fs) has the same name as in NPG tracking (croak if not)
  3) Assuming 1 + 2 OK, then ensure that both config files have the correct Instrument name, runfolder name and id_run (fixing if needed)
 - add nonconsented splitting command line and spiked phix splitting program to schema information

release-20.1
 - MPSA archived HiSeq runs no longer visible by default

release-20.0
 - bcl2seq can have separate path from OLB

release-19.0
 - a post_qseq function to generate a stand-alone archive directory for old-runs2irods pipeline
 - a post_qseq function to call the post qc review script
 - SecondBaseCall - pb_cal needs to start producing second base calls
 - check recalibrated qseq files are original. If not, pass the original qseq files to bam generation

release-18.4
 - moving a run folder no longer requires using srpipe::util, so should be adaptable for anything, although, it no longer
   copes with a paired run
 - SchemaInformation now has improved data set, for preparation to go into the Bam headers

release-18.3
 - qc_cache_reads - additional step which will cache reads for qc's to use, saving processor time
 - archive_to_irods now in parallel with archive_to_sra
 - id_run now to be passed to the qc tests in the command line
 - srfs - index has increased memory for HiSeq benefits. srf_creation has increased memory so that not too many get launched
   on the same node

release-18.2
 - patch - webcache now checks the program name as a pattern match rather than an equality, in case the program name contains a path
 - move runfolder bug - remove the name check against the runfolder as name will not have any padding 0's
 - md5's - don't record the md5's of full fastq/srfs where the lane should have split non-consented data
 - improvements to schema_information to make DS a more accurate description, and bugfix where we weren't getting the correct PB_cal information

release-18.1
 - Patch to SchemaInformation to cope with some changes to the way npg_common operates
 - parallelise hash now in a config file
 - removed some now unnecessary hashes/arrays of function orders and parallelise
 - test for qseq2fastq.pl will check to see if there is an md5sum and fastqcheck which can run on your system
   and modify/skip tests that rely on checking this functionality. Note: If this is the case, then you will
   need to modify the production code accordingly

release-18.0
 - Dependency on srpipe::config::instruments reduced - should use runfolder_path first, and fall back to this
   expect full removal within a few releases
 - fastq generation - md5 and fastqcheck creation options turned on
 - fastq generation - unique option turned on so that testing for uniqueness of readnames out of qseq files is done
 - md5 creation - script runs so that in process decisions can be made about the existence of files, and if md5s have
   already been created
 - schema_information now created before bam's are created
 - schema_information always gets the RTA value and sticks at the top
 - DS tag in schema_information for those which are probably appropriate to go into a Bam Header

release-17.1
 - pass archive_path to irods_bam_loader instead of id_run
 - bugfixes
 - srf and qc_contamination parallelised with bam_generation in order to pull together rate determining steps
 - cluster counts do not check in the fastq files, the value is allegedly guaranteed in the fastqcheck file
 - analysis runner daemon now looks at the instrument type and determines which analysis to launch
 - PB_cal not recalibrated for HiSeq as production
 - PB_cal analysis now generates their own folder structures within the flag-waver (may need to be moved)
 - analyse_RTA no longer launches PB_cal
 - PB_cal - no_recalibration works how it should, so small_pb_cal now removed

release-17.0
 - schema information files generated which should report on how the run progressed, and various program versions used
 - patch to cluster counts to be happy if there are no srf's found for the lane
 - bam indexing done
 - softlink webcache used for the analysis into the archive directory for that analysis, and the post_qc_review can use that

release-16.5
 - pb_cal needs new directory paths so that multiple ones can be run
 - small_pb_cal for running a concurrent job which does no recalibration steps (HS benefit at current time)
 - pass archive path to bam generation, so that this is less dependent on working out paths itself

release-16.4
 - check for cif/dif/stats files and flag option to recreate 'dummy' cif/dif files
 - use a log folder to catch lsf log/out files in the various directories, instead of dumping directly into it
 - archival to irods

release-16.3
 - instrument name now taken either from short_info or, if it must be calculated, using prefixes
   from general_values.ini

release-16.2
 - PB_cal has moved to v6.0, with extra step and using bam files
 - run_conf file generated in runfolder directory. Currently for bustard and gerald directory propagation throughout
   bustard/gerald primary pipeline
 - no_bsub flag, so instead of submitting a job to LSF, just logs the command. returns 50 as a potential job id when used
 - HiSeq archival to sra - no SRF files, and fastq's are either group unconsented (if unconsented), or hiseq (to be hidden
   but archived, as users should be utilising bam files from iRods)
 - no-eamss flag turned on for bcl2qseq in PB_cal pipeline
 - change production version of OLB to OLB-1.8.1a2

release-16.1
 - cluster count checking - fastq, fastqcheck and srfs get checked to see if they contain the same number of pf reads
   as the pf cluster count from BustardSummary.xml
 - moved most values, external script names and pathways to config files

release-16
 - PB_cal score has additional options to take a control calibration table, and use this if it can't legitimately use
   a lane's own cal table (see srl as to why)
 - touch all mp fastq files at start of post_qseq to ensure that they are found for all jobs, even if there are no PF reads
 - point fastq2bam to latest (v37) human reference

release-15.3
 - pre-exec script testing that the references and adapter directories can be found, before a job actually starts
   on the farm, so that there is less chance it will croak out if some repositories go away

release-15.2
 - all possible arrayed submissions have been done
 - PB_cal control lane now uses mode 2

release-15.1
 - bugfix multiplex fastqcheck generation
 - patch to pseudo down qseq files from BaseCalls for PB_cal pipeline

release 15.0
 - All post_qseq dispatches are submitted as job arrays, chaining if needed via -w'done(1234[*])' -J job_name[1-8]
 - PB_cal runs setupBcl2Qseq and makes the qseq files in the basecalls directory, and then runs off these
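
   Illustration only (not the pipeline's own code, which is Perl): a minimal
   Python sketch of how the job-array options quoted in the first entry above
   could be assembled. The helper name array_bsub_args and its parameters are
   hypothetical; only the -w'done(1234[*])' and -J job_name[1-8] forms come
   from this entry. In LSF the [*] suffix is understood to tie each array
   element to the matching element of the parent array.

```python
from typing import Optional


def array_bsub_args(job_name: str, first: int, last: int,
                    depends_on: Optional[int] = None) -> str:
    """Build the job-array part of a bsub command line (hypothetical helper)."""
    parts = []
    if depends_on is not None:
        # Chain onto an earlier job array by its LSF job id.
        parts.append(f"-w'done({depends_on}[*])'")
    # Name the array and give its index range, e.g. one element per lane.
    parts.append(f"-J {job_name}[{first}-{last}]")
    return " ".join(parts)


print(array_bsub_args("job_name", 1, 8, depends_on=1234))
# prints: -w'done(1234[*])' -J job_name[1-8]
```

   Omitting depends_on gives the options for the first array in a chain, which
   has nothing to wait for.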

release 14.2
 - load fastqcheck files into qc database added to post_qc_review

release 14.1
 - pass index tag fastq file to bam generation to add them in bam file

release 14.0
 - require 4 processors for human splitting bjobs
 - dependency on multiplex qc jobs added
 - turn off EAMSS (Killer B's)
 - mpsa upload now loads missing files for a lane, rather than assumes that because it has 1, it will have all
   - downside, you must ensure that this is not running for a run, if you want to manually do it
 - no archival of srf or fastq files for phix control lanes to fuse

release 13.0
 - PB_cal separated into independent pipeline launched at the end of the illumina analysis pipeline
 - PB_cal pipeline kicks off a post_qseq pipeline on it by default (but with no Latest_Summary, run_status updates or munge Summary xml)
 - sf_resource now put into rusage requirement, to try to limit the amount of IO on the nfs staging partitions

release 12.1
 - bug fix bam submissions
   - typo which stopped non split full lanes getting submitted
   - flagstats json file to go to qc directory
   - fastq files needed full path
   - strange typo of ; switched for . meant incorrect name and path generation

release 12.0
 - move bam_flagstats json file for each plex into lane qc directory
 - turn off creation of sig2 files
 - "analysis complete" moved after all qc and alignments to before "qc review pending", "secondary analysis in progress" now where "analysis complete" was (after Illumina pipeline is done).

release 11.0
 - job array for qc on plexed samples
 - require 4 cpus for bam creation because bwa alignments using threading now
 - pass human_split type and tag_index for bam generation if available
 - patch for ensuring a few extra cached webpages are there

release 10.4
 - swapped the order of set_run_status_run_archived and move_to_outgoing
 - for post qc review script, unset an env variable that gives the cache location

release 10.2
 - bump default OLB to OLB-1.8.1a1

release 10.1
 - no_munge_summary flag added so that post_qseq can run on PB_cal directory
 - new postqseq function and module bam_markduplicate
 - add archive lane path into autoqc data loading list to pick up bam flagstats json file

release 10.0
 - gc-bias check is done
 - create lane tag file using expected tag sequence from sequencescape if available
 - caching of webservice responses in order to enable
   - fewer dependencies outside of the filesystem once jobs have been submitted
   - reduce load on webservers from multiple parallel jobs (in particular when we have a multiplex run)
   - find out up front if webservices are unavailable, and don't do anything

release 9.0
 - bam generation with second base call
 - launch harold recalibration
 - generate md5 for correct files after human splitting
 - patch to ensure that manually created lane tag files are not overwritten
 - update copyright
 - bam generation even if no reference
 - split multiplex fastq by tag
 - bam generation for multiplex split fastq
 - fastqcheck for multiplex split fastq
 - moved qseq2fastq.pl from the sanger-pipeline project to this project for better maintenance and deployment
 - only decode lanes which are multiplexed on the flowcell
 - generate tag decoding stats as json file for each lane

release 8.0
 - drop gcfreq creation
 - bam file generation
 - multiplex lane specific tag file support
 - qc_gc_fraction test
 - munge Summary.xml/htm in GERALD after a multiplex run has gone through GERALD processing
 - processor_fork_number moved from bustard_lsf_reqs to business/base, so it can be passed to post_qseq
 - summary data loading removed, as done within illumina summary loading
 - softlinked qseq_custom files will now use relative path instead of absolute path
 - Drop dependencies to use SangerPaths
 - $VERSION now on the same line as use Readonly; so that new Build.PL won't carp.
 - Build.PL, dependencies updated, but dependencies on other internal packages commented out until their $VERSION lines can be corrected like those in this package
 - md5s checked of the files once uploaded to mpsa, and compared to that in the md5 file

release 7.0
 - more launches optimised with MooseX::AttributeCloner generating command line options
 - srpipe::analysisrunfolder removed with the consumption of npg_common::roles::run::long_info
 - move to coping with the new way that GERALD deals with needing file structure for multiplexed samples
 - split srfs will now have srf_index_hash run on them once created
 - split_nonconsented memory allocation upped to 8G

release 6.1
 - moving to usage of MooseX::AttributeCloner to generate command line options
 - Bustard defaults now able to be overridden on the command line, so if the number of processors, etc need changing, then this can be done here without code changes
 - Bustard and GERALD steps now separated for easier launching of different analysis versions via the command line
 - 1.6 pipeline set to default
 - flag_options role, so that we can universally disable functions or pass info (such as no_control_lane) via a flag on the command line
 - change to use the new illumina_analysis_loader, which should accept particular paths from a command line, and so launch it with paths populated as the launcher object has
 - change to Bustard to do its best to accept and force a control_lane determined by the user, but still relies on some info from Sequencescape to be present and correct
 - tag_file for indexed runs can be manually set rather than relying on choosing a tag_file from the defaults

release 6.0
 - major refactor of code to allow command line variables to be propagated through to objects using MooseX::Getopt and MooseX::AttributeCloner
 - separation of pipelines into separate npg_pipeline::pluggable::harold::<type> in order to make code management easier
 - due to MooseX::AttributeCloner, less code in npg_pipeline::pluggable::harold::<type> methods
 - increased test and POD coverage
 - create_pseudo_qseq_custom.pm - if there has been no recalibration, softlink in GERALD the qseq files in Bustard with qseq_custom names
 - fix_qseqs.pm - ensure that the id_run is correct in the qseq_custom files

release 5.0
 - npg_pipeline::base class with common methods used by many of the npg_pipeline modules
 - try to submit a job to lsf up to 5 times, croaking if unsuccessful after that
 - daemon runner and pipeline for RTA analysis
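
   Illustration only (the pipeline itself is Perl): a minimal Python sketch of
   the "try up to 5 times, then fail" submission behaviour described in this
   release. The names submit_with_retries and MAX_TRIES are hypothetical, and
   the submit callable stands in for whatever actually calls LSF; raising
   RuntimeError plays the role of croaking.

```python
from typing import Callable, Optional

MAX_TRIES = 5  # number of attempts stated in the release note


def submit_with_retries(submit: Callable[[], int],
                        max_tries: int = MAX_TRIES) -> int:
    """Call submit() until it returns a job id or max_tries is used up."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_tries):
        try:
            return submit()  # success: hand back the job id
        except RuntimeError as err:
            last_error = err  # remember the failure and try again
    # Every attempt failed: give up loudly.
    raise RuntimeError(
        f"Failed to submit after {max_tries} attempts: {last_error}"
    )
```

   A real implementation would probably also pause between attempts; that
   detail is not stated in the release note, so it is left out here.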

release 4.7
 - preexecute script to ensure that only 1 of a type of job could run at a time, and applied to npg_qc_api.pl (illumina analysis) loader
 - option to include tag files to split non_consented
 - default queue in archival runner - srpipeline
 - script to resort fastq files if they get created in the wrong order

release 4.6
 - archival runner
 - code for fixing broken fastqs
 - correct order of qseq_custom files to be alphanumerical when submitted for jobs

release 4.5
 - patch release

release 4.0
 - option/support to only run on a single lane
 - post_qc_review pipeline
 - refactor of code from srpipe::archive.pm to npg_pipeline::archive::file::to_sra so that this can be done from post_qc_review pipeline
 - post_qc_review can submit to lsf jobs to upload illumina_analysis, illumina_summary and auto_qc

release 3.0
 - bug fix - id_run not passed through to qc checks
 - adapt to use new qseq2fastq with changes for multiplex runs
 - croak out of post_qseq if it is not a full analysis

release 2.0
 - add qc_insert_size
 - srf_creation also does indexing script
 - log files are created
 - in log file, json string of dispatch tree
 - parse of the json string, which interrogates LSF for info
 - Latest Summary link now relative path
 - illumina2srf always runs out of 1.4 pipeline
 - archive dir is now set to be archive dir, not archive_test

release 1.0
 - set up
 - create npg_pipeline namespace
 - generate replacement to cleanup_run