This repository has been archived by the owner on Apr 15, 2021. It is now read-only.
Version 1.7.0 (13 January 2021)
-------------
* Changed the assignment of sampled reads to genomes when a read aligns equally
well, i.e. with the fewest mismatches, to more than one genome. In such cases
the read is preferentially assigned to (one of) the expected species within
that dataset and, if still tied, to the genome that has the highest number of
best alignments for all other sampled reads. This change was motivated by an
update to the NCBI viruses collection, which is used at CRUK CI as a single
genomic reference and now contains several genomic sequences that share
considerable sequence similarity with the PhiX control added to every lane of
sequencing carried out at the CRUK CI Genomics facility. Reads aligning to the
PhiX control are now preferentially assigned to the PhiX reference genome over
the virus genome collection, without the need to exclude homologous viral
genomes from the virus collection.
* Fixed the formatting of the trim start and length in the report; these were
shown as floating point numbers instead of whole numbers.
* Updated adapter contaminant list using contaminant sequences taken from FastQC
v0.11.9.
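
The tie-break described in the first item above can be sketched as follows.
This is a minimal illustration in Python, not the MGA implementation (MGA is a
Java application). The function name, the data layout and the final fallback
to alphabetical order are assumptions made for the example.

```python
from collections import Counter


def assign_reads(alignments, expected_genomes=frozenset()):
    """Assign each read to one genome.

    alignments maps a read id to {genome: mismatches} for every genome the
    read aligned to (hypothetical layout chosen for this sketch).
    """
    # Best hits: the genomes a read aligns to with the fewest mismatches.
    best_hits = {}
    for read_id, hits in alignments.items():
        if hits:
            fewest = min(hits.values())
            best_hits[read_id] = sorted(
                genome for genome, mm in hits.items() if mm == fewest
            )

    # Number of best alignments per genome over all sampled reads. Every
    # tied candidate of a read receives exactly one count from that read, so
    # ranking by this total matches ranking by the counts from other reads.
    support = Counter(g for genomes in best_hits.values() for g in genomes)

    assignments = {}
    for read_id, candidates in best_hits.items():
        if len(candidates) > 1:
            # First tie-break: prefer the expected species for the dataset.
            expected = [g for g in candidates if g in expected_genomes]
            if expected:
                candidates = expected
        # Second tie-break: the genome with the most best alignments.
        # Remaining ties fall back to alphabetical order (assumption).
        assignments[read_id] = max(candidates, key=lambda g: support[g])
    return assignments


reads = {
    "r1": {"phix": 0, "viruses": 0},  # tied, no expected species involved
    "r2": {"phix": 0},
    "r3": {"phix": 1, "viruses": 2},
    "r4": {"human": 0, "mouse": 0},   # tied, human is expected
    "r5": {"mouse": 1},
}
print(assign_reads(reads, expected_genomes={"human"}))
```

In this example r1 goes to phix because phix has more best alignments than
viruses, while r4 goes to human because it is the expected species, even
though mouse has more best alignments.
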
Version 1.6.0 (5 August 2020)
-------------
* Updated to use workflow manager v1.8.
* Now uses OpenCSV to read the sample sheet.
Version 1.5.0 (5 August 2020)
-------------
* Updated to use workflow manager v1.7.
* Various code improvements/refactoring.
Version 1.4.1 (22 May 2018)
-------------
* Fixed issue arising with sort order of alignments for sampled reads that can
occur when using numeric dataset identifiers. Sampled reads from multiple
datasets are collated in a different order to that expected by a check made
when reading alignments during the report generation step. The workflow
manager uses a numeric-aware comparator that recognizes the numeric components
of filenames when sorting. The comparator used when reading alignments has
been changed to use the same numeric-aware comparator.
(https://github.com/crukci-bioinformatics/MGA/issues/3)
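
The difference between a plain string sort and a numeric-aware sort can be
illustrated with a short Python sketch. It is not the comparator used by MGA
or the workflow manager; the function name and the sample identifiers are
hypothetical.

```python
import re


def numeric_aware_key(name):
    """Sort key that compares runs of digits by numeric value."""
    key = []
    # re.split with a capturing group alternates text and digit runs, so
    # the digit runs are always at odd positions in the result.
    for index, part in enumerate(re.split(r"(\d+)", name)):
        if part == "":
            continue
        if index % 2 == 1:
            key.append((0, int(part)))   # numeric component
        else:
            key.append((1, part))        # text component
    return key


dataset_ids = ["10", "2", "1"]
print(sorted(dataset_ids))                          # plain string order
print(sorted(dataset_ids, key=numeric_aware_key))   # numeric-aware order
```

A plain string sort places "10" before "2", whereas the numeric-aware key
places it after. A step that checks the order of its input has to use the same
ordering as the step that produced it, which is the mismatch this release
addressed.
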
Version 1.4 (27 February 2018)
-----------
* Added details to the results table of uniquely aligning reads and of
preferentially aligning reads, i.e. reads that align best (with the fewest
mismatches) to the given reference genome, possibly tied with matches to other
genomes.
* Changed the logic for assigning reads to reference genomes; reads are now
assigned to the reference genome to which they align with the smallest number
of mismatches. In the event of a tie where the read aligns to more than one
genome with the same number of mismatches, the reference genome with the
largest number of preferentially aligned reads (best hits) is chosen.
* Colour highlighting now based on assigned reads and mismatch rates for
assigned reads.
* Added --trim-start option to prepare-pipeline to specify the position within
sequences from which to start trimming for subsequent alignment; any bases
before this position will be trimmed. This can be used to remove molecular
barcodes within the first 10 - 15 bases of reads.
* Run ID property in sample sheet is no longer required; the run identifier can
alternatively be specified using the --run-id option when running
prepare-pipeline.
* Added support for large reference genomes requiring use of large bowtie
indexes.
* Fixed problem with report generation not being able to run on hosts without
an X11 environment.
* Improvements to the report, including a change to the criteria for including
a species as a separate entry in the table of potential contaminants (only the
species most likely to be contaminants are now included).
* Added prototype MGA wrapper for Illumina BaseSpace.
* Fixed problem with separate report per dataset containing tables for all
datasets.
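
The effect of a trim start position, as described for the --trim-start option
above, can be sketched in Python as follows. This is an illustration only and
not MGA code. It assumes that the position is 1-based and that a trim length
may also be given; the function and parameter names are hypothetical.

```python
def trim_sequence(sequence, quality, trim_start=1, trim_length=None):
    """Return the part of a read that would be kept for alignment.

    trim_start is assumed to be a 1-based position; bases before it are
    removed. trim_length, if given, limits the number of bases kept.
    """
    if trim_start < 1:
        raise ValueError("trim_start must be 1 or greater")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    begin = trim_start - 1
    end = None if trim_length is None else begin + trim_length
    return sequence[begin:end], quality[begin:end]


# Skip a 4-base molecular barcode, then keep the next 4 bases.
print(trim_sequence("ACGTACGTAC", "IIIIIHHHHH", trim_start=5, trim_length=4))
```
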
Version 1.3 (10 January 2014)
-----------
* Major reworking of the pipeline to combine FASTQ records from multiple
datasets and split the sequence data into chunks with a specified number of
records, with corresponding changes to the report generation code.
* FASTQ sampling rewritten to use a modified version of the reservoir sampling
method.
* Added option for setting the maximum number of FASTQ records to sample from.
* Added option to create separate reports for each dataset in addition to the
report for all.
* Added option to specify the minimum number of sequences to display on the
x-axis of the stacked bar plot in the report.
* Improved scaling for larger image sizes for the stacked bar plot and added
anti-aliasing for text.
* Removed utility for creating a configuration file from the Solexa LIMS (no
longer in use).
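
For readers unfamiliar with reservoir sampling, the following Python sketch
shows the textbook form (Algorithm R), which draws a fixed-size uniform sample
in a single pass without knowing the number of records in advance. It does not
reproduce the modifications made in MGA, which are not described here. The
max_records parameter is a hypothetical stand-in for the option that limits
the number of records to sample from.

```python
import random


def reservoir_sample(records, sample_size, max_records=None, rng=None):
    """Uniformly sample up to sample_size items from an iterable."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for seen, record in enumerate(records, start=1):
        if max_records is not None and seen > max_records:
            break  # only sample from the first max_records records
        if seen <= sample_size:
            # Fill the reservoir with the first sample_size records.
            reservoir.append(record)
        else:
            # Keep the new record with probability sample_size / seen,
            # replacing a uniformly chosen existing entry.
            slot = rng.randrange(seen)
            if slot < sample_size:
                reservoir[slot] = record
    return reservoir


sample = reservoir_sample(range(100000), 5, rng=random.Random(42))
print(len(sample))
```
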
Version 1.2 (4 February 2013)
-----------
* Fix for failing validation of XML schema definitions.
* New utility for creating the run metadata file by retrieving information
from the Solexa LIMS via the SOAP API.
Version 1.1 (26 September 2012)
-----------
* Summary report contains new graphical representation of the alignment results
and different highlighting for spike-in/control samples.
* Fixed bug in which MGA fails if any of the FASTQ sequence data files have no
sequences.
* Bowtie indexes no longer need to have '.bowtie' preceding the usual suffixes
(1.ebwt, 2.ebwt, 3.ebwt, 4.ebwt, rev.1.ebwt, rev.2.ebwt).
* Base qualities in the FASTQ sequence data files are now expected to be on the
Sanger scale, i.e. Phred+33.
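
As a reminder of what the Sanger (Phred+33) encoding means, the short Python
sketch below decodes a quality string into Phred scores and converts a score
into an error probability. The function names are chosen for this example and
are not part of MGA.

```python
def phred33_to_scores(quality):
    """Decode a Phred+33 quality string into a list of Phred scores."""
    scores = [ord(char) - 33 for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below '!' is not valid Phred+33")
    return scores


def error_probability(score):
    """Probability that a base call is wrong: 10^(-Q/10)."""
    return 10 ** (-score / 10)


print(phred33_to_scores("!5I"))
print(error_probability(20))
```

The character '!' (ASCII 33) encodes a score of 0, '5' a score of 20 and 'I' a
score of 40.
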
Version 1.0 (13 July 2011)
-----------
* First release using the workflow system also developed at the CRI and
deployed as a Java application, replacing the initial prototype developed in
Ruby.