Releases: EI-CoreBioinformatics/mikado
Version 2.2.0
Removed Cython from the requirements.txt file. This allows the tests to run correctly in a Conda environment (as Conda disallows installing Cython as part of a distributed package).
As a result of this change, the preferred installation procedure from source has to be slightly amended (see the sketch below):
- either build and install a wheel with `pip wheel -w dist . && pip install dist/Mikado*whl`
- or run `python setup.py bdist_wheel` after having explicitly installed Cython beforehand, e.g. with `pip install Cython`.
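For reference, the two routes look roughly as follows; this is a sketch, and the exact wheel filename will depend on the version being built:

```bash
# Route 1: build a wheel with pip (build dependencies are resolved by pip)
# and then install it
pip wheel -w dist .
pip install dist/Mikado*whl

# Route 2: install Cython explicitly, then build the wheel with setuptools
pip install Cython
python setup.py bdist_wheel
pip install dist/Mikado*whl
```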
Other changes:
- Fix #381: Mikado will now correctly guess the input file format, instead of relying on the file name extension or the user's settings. Sniffing is disabled, though, for files provided as a stream.
- Fix #382: Mikado can now accept generic BED12 files as input junctions, not just Portcullis junctions. This allows e.g. a user to provide a set of gene models in BED12 format as a source of valid junctions (see the sketch after this list).
- Fix #384: Mikado convert now deals properly with unsorted GTFs/GFFs.
- Fix #386: better handling of unsorted GFFs/GTFs in the stats utility.
- Fix #387: Mikado will now always use a static seed, rather than generating a new one per call, unless specifically instructed to do so. The old behaviour can still be replicated either by setting the `seed` parameter to `null` (i.e. `None`) in the configuration file, or by specifying `--random-seed` during the command invocation (see the sketch after this list).
- General increase in code unit-test coverage; in particular:
  - Slightly increased the unit-test coverage for the locus classes, e.g. properly covering the `as_dict` and `load_dict` methods. Minor bugfixes related to the introduction of these unit tests.
- `Mikado.parsers.to_gff` has been renamed to `Mikado.parsers.parser_factory`.
- The code related to the transcript padding has been moved to the submodule `Mikado.transcripts.pad`, rather than being part of the `Mikado.loci.locus` submodule.
- Mikado will error informatively if the scoring configuration file is malformed.
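A sketch of how a generic BED12 file could be passed as the source of reliable junctions (#382); the option names and file names below are illustrative and should be checked against `mikado serialise --help`:

```bash
# Hypothetical invocation: use a BED12 file of gene models as the set of
# trusted junctions, instead of a Portcullis junctions file.
mikado serialise --json-conf configuration.toml \
    --junctions gene_models.bed12 \
    --transcripts mikado_prepared.fasta
```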
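And a sketch of the new seed behaviour (#387); `--random-seed` is the switch described above, while the configuration file name is a placeholder:

```bash
# Default since 2.2.0: a static seed, so repeated runs are reproducible.
mikado pick --json-conf configuration.toml

# Old behaviour: draw a fresh random seed for this invocation
# (equivalently, set the `seed` parameter to null in the configuration file).
mikado pick --random-seed --json-conf configuration.toml
```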
Patch release
Hotfix release:
- IMPORTANT: Mikado now correctly uses the scores associated with a given source.
- IMPORTANT: Mikado was not forwarding the original source to transcripts derived from chimera splitting. This compounded the issue above.
- Corrected the root cause of the issues above, i.e. transcripts were not dumping and reloading all relevant fields. This is now implemented properly and tested with specific new routines.
- Corrected an issue that caused Mikado to erroneously calculate the metrics and scores of loci twice, thereby reporting some wrong values in the output files.
  - Affected metrics were e.g. `selected_cds_intron_fraction` and `combined_cds_intron_fraction`.
- Removed `quicksect` from the requirements.
v2.1.0: Issue 375 (#379)
Bugfix and speed improvement release.
- Fix a bug that prevented Mikado from reporting the correct metrics/scores in the output loci files. This bug only affected reporting, not the results themselves. See issue #376.
- Fix a bug in printing out the statistics for an annotation file with `mikado util stats` (issue #378).
- When serialising, Mikado will now by default drop and reload everything. The previous default behaviour resulted in hard-to-parse errors and is not what is usually desired anyway.
- Improved the performance of `mikado pick` in multiple ways (issue #375):
  - Only external metrics that are requested in the scoring file will now be printed out in the final metrics files. This reduces runtime in e.g. Minos. The new CLI switch `--report-all-external-metrics` (available in both `configure` and `pick`) can be used to revert to the old behaviour (see the sketch after this list).
  - The `external` table in the Mikado database is now indexed properly, increasing speed.
  - Results are batched and compressed before being sent through a queue (@ljyanesm).
  - @brentp enhanced the bcbio `intervaltree.pyx` into `quicksect`. This new version of the interval tree has been copied and adapted for Mikado.
  - Using SQLAlchemy bakeries for the SQLite queries, as well as LRU caches in various parts of Mikado.
  - Removed excessive copying in multiple parts of the program, especially regarding the configuration objects and during padding.
  - Using `operator.attrgetter` instead of a custom (and slower) recursive `getattr` function.
- Removed unsafe calls to `tempfile.mktemp` and the like, for increased security according to CodeQL.
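A sketch of the new switch in use; file names are placeholders, and the exact set of companion options should be checked against `mikado configure --help` and `mikado pick --help`:

```bash
# Restore the old behaviour and report every external metric,
# either when generating the configuration...
mikado configure --report-all-external-metrics \
    --list input_list.txt --reference genome.fasta configuration.toml

# ...or directly at pick time:
mikado pick --report-all-external-metrics --json-conf configuration.toml
```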
2.0.2
Bugfix release.
- Fix an infinite recursion bug when trying to recover lost transcripts.
- Fix a performance regression by passing the configuration to Excluded locus objects.
Marshmallow mate
- Fixed a bug that caused Mikado configure (but not daijin configure, or "mikado configure --daijin") to print out invalid configuration files.
- Restored the functionality of "--full": Mikado can now print out either partial (but still valid) or fully-fledged configuration files (see the sketch after this list).
- Also ported the scoring configuration to Marshmallow dataclasses. As a direct result, removed `jsonschema` from the dependencies.
- Configured bumpversion.
- Corrected a small bug in parsing EnsEMBL GFF3
- Addressed some deprecation warnings from marshmallow and numpy.
- Small bug fix in the CLIs of mikado/daijin configure.
- The default value of the seed is now 0 (i.e. undefined: a random one will be selected). Only integers are allowed as values.
- Small bugfixes/extensions in the test suite.
- Minor code reorganisation, without changes to the API.
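A sketch of the two configuration modes mentioned above; the input list, reference and output file names are placeholders, and the exact option set should be checked against `mikado configure --help`:

```bash
# Print a partial, but still valid, configuration file (default)
mikado configure --list input_list.txt --reference genome.fasta configuration.toml

# Print a fully-fledged configuration file with every option spelled out
mikado configure --full --list input_list.txt --reference genome.fasta configuration.toml
```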
Mikado version 2
Official second release of Mikado. All users are advised to update as soon as possible.
See https://github.com/EI-CoreBioinformatics/mikado/milestone/22?closed=1 for a non-comprehensive list of all the issues closed in
relation to this release.
Mikado 2, public release candidate 2
Minor amendments to 2.0rc1 - in order to get Mikado to install properly in BioConda.
Mikado 2, public release candidate 1
This version of Mikado is finally ready to go into Conda, DockerHub, PyPI and Singularity Hub.
Many thanks to @ljyanesm, whose work has made Mikado much more performant.
Most notable changes:
- Mikado serialise will now accept tabular BLAST files (with the extra columns `ppos` and `btop`); see the sketch after this list. Both XML and TSV loading have parts written in Cython. Thank you to @srividya22 for first asking about improvements in this area. #280
- Mikado prepare will now remove redundancies based on intron chains, rather than on perfect to-the-base identity. This should massively reduce the input data. The redundancy filter can be controlled per source: i.e., Mikado is able to keep all transcripts from certain input files (reference annotations, ab initio predictions, transcript assemblies, etc.) while removing any redundant transcript from others (e.g. long-read alignments). Thanks to @lijing28101. #270
- Mikado prepare will now try to split transcripts with very long introns, rather than discarding them outright.
- Mikado pick will now operate in stringent mode by default (i.e. it will only split transcripts when there is strong evidence of them being chimeras, as per the BLAST data).
- Mikado now uses TOML as default configuration language, as it is much more human-readable than either YAML or JSON (#239).
- Various bugfixes.
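For reference, a tabular BLAST file with the two extra columns can be produced along these lines; this is a sketch, with placeholder file and database names, and the exact column order expected by Mikado should be checked in its documentation:

```bash
# Standard BLAST outfmt 6 columns, plus ppos and btop at the end
blastx -query mikado_prepared.fasta -db proteins_db -max_target_seqs 5 \
    -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore ppos btop" \
    -out mikado_prepared.blast.tsv
```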
Version 2.0, release candidate 6
- #216: `mikado prepare` will now explicitly tell users to use the `mikado_prepared.fasta` file for the `serialise` step. Moreover, `mikado serialise` will informatively crash if users try to do something different (a common mistake seems to be to use a FASTA file derived directly from the input assemblies). See the sketch after this list.
- #220: fixed a bug in `mikado serialise`.
- #222: `daijin` will now make `prodigal` or `TransDecoder` use alternative genetic codes, upon request. IMPORTANT: `TransDecoder` does not support all of the known genetic codes listed by NCBI.
- #223: fixed the start-adjustment method in the ORF module.
- #226: `mikado compare`, `mikado util stats` and `mikado util grep` are now compatible with non-standard NCBI GFF3 files (having e.g. `pseudogene` features without any associated transcript but with associated exons, or `rRNA` transcript features without any parent gene).
- #227: `mikado compare` will now always consider valid transcripts, even if they are multiexonic yet missing a defined strand orientation.
- #229: `mikado pick` will now:
  - report the padding as INFO, not as WARNING
  - report on finishing the analysis of a chromosome, not the parsing
  - report the temporary analysis directory
  - provide `--max-intron-length` as a command line option
  - fixed a small bug in `mikado serialise`
  - fixed a bug in the ORF module that caused a crash when the sequence was not completely uppercase
- #230: fixed some bugs related to the `daijin` conda environments and to upstream changes in the `snakemake` code.
- Fixed a small bug in reference_gene.py and transcript.py, related to `sys.intern`.
- #232: fixed a typo in the help for `mikado serialise`.
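A sketch of the workflow point behind #216; the `--transcripts` option name and the configuration file name are assumptions to be checked against `mikado serialise --help`:

```bash
# mikado prepare writes mikado_prepared.fasta (and mikado_prepared.gtf);
# that prepared FASTA, not one derived from the raw input assemblies,
# is the file to pass to the serialise step.
mikado prepare --json-conf configuration.toml
mikado serialise --json-conf configuration.toml --transcripts mikado_prepared.fasta
```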
Version 2.0, release candidate 5
- Switched from `ujson` to `rapidjson` (actively maintained and just as performant).
- Fix #209: `daijin` has been debugged and is now properly tested. Also, when using `daijin mikado`, the number of XMLs will be equal to or greater than the number of requested threads.
- #177: `mikado serialise` is now completely parallelised. This allows for very significant speed-ups, especially when loading a large number of ORFs.
- Speed-ups for `mikado pick`: the GTF will now be parsed much more quickly, by avoiding the creation of a full GTF line object for each line during parsing (which was extremely slow).
- `daijin` can now optionally use `conda` environments, through the `conda` directive of `snakemake`.
- Speed-up in `mikado pick`: everything is now written to databases (#218). This allows for cleaner temporary directories and parsing of the partial outputs.
- `mikado pick` will now not print out the subloci file by default.
- Speed-up in `mikado pick`: now using a lightweight graph also for the splicing.
- Amend #134: the minimum CDS overlap is now 50%, not 75%.
- Fixed a bug for `mikado compare` in multiprocessing mode.
- Fixed a bug in `mikado configure`: the scoring file will not be embedded within the printed file (otherwise it would be impossible to change it dynamically).