Mango cannot load fasta files with pipes in the contigName #417

Open
benwbooth opened this issue Aug 30, 2018 · 2 comments

@benwbooth

If I run mango-submit with a .fasta file as the reference:

/users/bbooth/src/mango/bin/mango-submit /data/seqdata/analysis/fakereads/S288C_reference_genome_R64-2-1_20150113/S288C_reference_sequence_R64-2-1_20150113.fsa.fasta -features /data/seqdata/analysis/fakereads//data/seqdata/analysis/fakereads/S288C_reference_genome_R64-2-1_20150113/saccharomyces_cerevisiae_R64-2-1_20150113.genes.gff3

The fasta file has chromosome names with pipes, e.g.:

ref|NC_001141|
ref|NC_001136|
ref|NC_001135|
ref|NC_001144|

..., etc.

Then I get this error:

Command body threw exception:
java.lang.AssertionError: assertion failed: SequenceRecord.name is null or empty
Exception in thread "main" java.lang.AssertionError: assertion failed: SequenceRecord.name is null or empty
        at scala.Predef$.assert(Predef.scala:170)
        at org.bdgenomics.adam.models.SequenceRecord.<init>(SequenceDictionary.scala:287)
        at org.bdgenomics.adam.models.SequenceRecord$.apply(SequenceDictionary.scala:403)
        at org.bdgenomics.adam.util.ReferenceContigMap$$anonfun$1.apply(ReferenceContigMap.scala:51)
        at org.bdgenomics.adam.util.ReferenceContigMap$$anonfun$1.apply(ReferenceContigMap.scala:50)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.bdgenomics.adam.util.ReferenceContigMap.<init>(ReferenceContigMap.scala:50)
        at org.bdgenomics.adam.util.ReferenceContigMap$.apply(ReferenceContigMap.scala:107)
        at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadReferenceFile$1.apply(ADAMContext.scala:3010)
        at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadReferenceFile$1.apply(ADAMContext.scala:3007)
        at scala.Option.fold(Option.scala:158)
        at org.apache.spark.rdd.Timer.time(Timer.scala:48)
        at org.bdgenomics.adam.rdd.ADAMContext.loadReferenceFile(ADAMContext.scala:3005)
        at org.bdgenomics.mango.models.AnnotationMaterialization.<init>(AnnotationMaterialization.scala:42)
        at org.bdgenomics.mango.cli.VizReads.initAnnotations(VizReads.scala:638)
        at org.bdgenomics.mango.cli.VizReads.run(VizReads.scala:586)
        at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
        at org.bdgenomics.mango.cli.VizReads.run(VizReads.scala:579)
        at org.bdgenomics.utils.cli.BDGCommandCompanion$class.main(BDGCommand.scala:33)
        at org.bdgenomics.mango.cli.VizReads$.main(VizReads.scala:69)
        at org.bdgenomics.mango.cli.VizReads.main(VizReads.scala)

This is due to the following lines in FastaConverter.parseDescriptionLine:

          // is this description metadata or not? if it is metadata, it will contain "|"
          if (split._1.contains('|')) {
            (None, Some(dL.stripPrefix(">").trim))

If a pipe character appears in the contig name, the NucleotideFragment gets no name at all; the whole header line, name included, is stored only as its description. This seems counterintuitive.

Without a contigName, Mango cannot handle the record: the SequenceRecord is built with an empty name, which triggers the assertion above. A sequence loaded from a fasta file should always get a contigName, even if the name contains a pipe character.
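
To illustrate the behavior I would expect, here is a minimal sketch of a parser that always yields a name. It is not ADAM's actual code: `FastaHeaderSketch` and its `parseDescriptionLine` are hypothetical names of my own, the bracketed description in the second example is made up for illustration, and the rule "name is everything up to the first whitespace" is an assumption about how the header should be split.

```scala
// Hypothetical sketch, not ADAM's implementation: always derive a contig
// name from a FASTA description line, even when the name contains '|'.
object FastaHeaderSketch {

  /** Returns (contigName, description) for a line such as ">ref|NC_001141|". */
  def parseDescriptionLine(line: String): (Option[String], Option[String]) = {
    require(line.startsWith(">"), s"Not a FASTA description line: $line")
    val body = line.stripPrefix(">").trim
    // Assumption: the name is everything up to the first whitespace,
    // and whatever follows is the free-text description.
    val splitIndex = body.indexWhere(_.isWhitespace)
    if (body.isEmpty) {
      (None, None)
    } else if (splitIndex < 0) {
      // No whitespace: the whole header is the name, pipes included.
      (Some(body), None)
    } else {
      val (name, rest) = body.splitAt(splitIndex)
      (Some(name), Some(rest.trim))
    }
  }

  def main(args: Array[String]): Unit = {
    println(parseDescriptionLine(">ref|NC_001141|"))
    println(parseDescriptionLine(">ref|NC_001141| [illustrative description]"))
  }
}
```

With this splitting rule, `ref|NC_001141|` comes back as the name in both cases, so a SequenceRecord built from it would not hit the empty-name assertion.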

@benwbooth
Author

Converting the fasta file to two-bit format works as a workaround for this case.

@akmorrow13
Contributor

Hi @benwbooth, thanks for the catch! This looks like a bug in ADAM's FastaConverter, not Mango. Can you open an issue there so we can track it?

In general, twoBit files are a little nicer to work with for the browser, because they are smaller and more responsive.
