getting error #9

Open
shanmugavadivelps opened this issue Jul 8, 2022 · 7 comments

@shanmugavadivelps

shanmugavadivelps commented Jul 8, 2022

java -Xmx30G -jar /home/iipruser/VirGenA_v1.4/VirGenA.jar assemble -c /home/iipruser/VirGenA_v1.4/config_test_linux.xml
java.io.IOException: File /media/iipruser/shanmu_data/Sanjay_Viral_whole_genome/denovo_with_reference_alignment_27th_Nov_2021/ALL_NPV/AllNPV_samtools_reads_1.96m_reads/all_npv_samtools_R1_paired.fastq.gz have incorrect sequence identifier string
at DataReader.readFilesWithReads(DataReader.java:142)
at DataReader.readData(DataReader.java:41)
at DataReader.<init>(DataReader.java:75)
at DataReader.getInstance(DataReader.java:102)
at KMerCounter.<init>(KMerCounter.java:17)
at KMerCounter.getInstance(KMerCounter.java:59)
at Mapper.<init>(Mapper.java:29)
at ConsensusBuilderSimple.<init>(ConsensusBuilderSimple.java:23)
at ConsensusBuilderWithReassembling.<init>(ConsensusBuilderWithReassembling.java:41)
at RefBasedAssembler.run(RefBasedAssembler.java:665)
at VirGenA.main(VirGenA.java:34)
java.lang.NullPointerException
at KMerCounter.<init>(KMerCounter.java:40)
at KMerCounter.getInstance(KMerCounter.java:59)
at Mapper.<init>(Mapper.java:29)
at ConsensusBuilderSimple.<init>(ConsensusBuilderSimple.java:23)
at ConsensusBuilderWithReassembling.<init>(ConsensusBuilderWithReassembling.java:41)
at RefBasedAssembler.run(RefBasedAssembler.java:665)
at VirGenA.main(VirGenA.java:34)
java.io.IOException: File /media/iipruser/shanmu_data/Sanjay_Viral_whole_genome/denovo_with_reference_alignment_27th_Nov_2021/ALL_NPV/AllNPV_samtools_reads_1.96m_reads/all_npv_samtools_R1_paired.fastq.gz have incorrect sequence identifier string
at DataReader.readFilesWithReads(DataReader.java:142)
at DataReader.readData(DataReader.java:41)
at DataReader.<init>(DataReader.java:75)
at DataReader.getInstance(DataReader.java:102)
at ConsensusBuilderWithReassembling.assemble(ConsensusBuilderWithReassembling.java:762)
at RefBasedAssembler.run(RefBasedAssembler.java:666)
at VirGenA.main(VirGenA.java:34)
java.lang.NullPointerException
at ConsensusBuilderWithReassembling.assemble(ConsensusBuilderWithReassembling.java:764)
at RefBasedAssembler.run(RefBasedAssembler.java:666)
at VirGenA.main(VirGenA.java:34)

I am using the same reads for de novo assembly with SPAdes, and that works fine, but I get this error here.

@gFedonin
Owner

gFedonin commented Jul 8, 2022

Hello!

It says that the files "have incorrect sequence identifier string". VirGenA assumes that the names of paired reads in the two files coincide. Because it allows for some reads having no mate, it matches read pairs by name, not by their order in the files. Other assemblers may ignore naming issues. Could you share the first few lines of all_npv_samtools_R1_paired.fastq.gz and all_npv_samtools_R2_paired.fastq.gz?

Sincerely yours,
Gennady.

@shanmugavadivelps
Author

shanmugavadivelps commented Jul 9, 2022 via email

@gFedonin
Owner

gFedonin commented Jul 9, 2022

I can't see the attachment.

@shanmugavadivelps
Author

shanmugavadivelps commented Jul 10, 2022 via email

@gFedonin
Owner

The names are expected either to end with /1 or /2 or to contain a space character (' '). The pair ID is taken to be the substring from the first character after @ up to the '/' or ' '. Here are examples of Illumina read names from Wikipedia:

"@HWUSI-EAS100R:6:73:941:1973#0/1"

"@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"

"@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
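
To make the rule above concrete, here is a minimal Python sketch of how a pair ID could be derived from a header line. It is an illustration only, not VirGenA's actual code: the function name `pair_id` is hypothetical, and cutting at the first space before falling back to a trailing /1 or /2 is an assumption about how the two cases combine.

```python
def pair_id(header):
    """Return the pair ID of a FASTQ header line, or None if the name
    matches neither expected form (assumed reading of the rule above)."""
    name = header.rstrip("\r\n")
    if not name.startswith("@"):
        return None
    name = name[1:]  # drop the leading '@'

    # Case 1: the name contains a space; the ID is everything before it.
    space = name.find(" ")
    if space > 0:
        return name[:space]

    # Case 2: the name ends with /1 or /2; the ID is everything before it.
    if len(name) > 2 and name[-2:] in ("/1", "/2"):
        return name[:-2]

    # Neither form: no pair ID can be derived from this name.
    return None
```

Under this reading, the three Wikipedia examples give the IDs `HWUSI-EAS100R:6:73:941:1973#0`, `EAS139:136:FC706VJ:2:2104:15343:197393` and `SRR001666.1`, while a name with neither a space nor a /1 or /2 suffix gives no ID at all.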

It looks like your reads were preprocessed and the names were trimmed, or perhaps even the read order was changed. If you are sure the order is correct, you can try renaming all the reads to "SampleID_1 1" ... "SampleID_10000 1" in the first file and "SampleID_1 2" ... "SampleID_10000 2" in the second. But are you really sure that the reads you printed form correct pairs?
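
The renaming suggested above could be done with a short script along these lines. This is a rough sketch under stated assumptions: the files are gzip-compressed FASTQ with exactly four lines per record (no wrapped sequences), and the function name `rename_reads`, the output paths and the `SampleID` prefix are all placeholders. It is only meaningful if the two files really list mates in the same order.

```python
import gzip


def rename_reads(src, dst, sample_id, mate):
    """Rewrite every header in a gzipped FASTQ file as '@<sample_id>_<n> <mate>'.

    Assumes four lines per record; returns the number of records written.
    """
    count = 0
    line_no = 0  # counts non-blank lines only
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip stray blank lines, e.g. at the end of the file
            if line_no % 4 == 0:
                # First line of a record: replace the original name.
                count += 1
                fout.write(f"@{sample_id}_{count} {mate}\n")
            else:
                # Sequence, '+' separator and quality lines are copied as-is.
                fout.write(line)
            line_no += 1
    return count
```

It would be called once per file, for example `rename_reads("R1.fastq.gz", "R1_renamed.fastq.gz", "SampleID", 1)` and then the same with the second file and `2`. If the two calls return different counts, the files cannot be paired by order and this approach should not be used.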

@shanmugavadivelps
Author

shanmugavadivelps commented Jul 10, 2022 via email

@gFedonin
Owner

This looks really strange: the names you posted last time are all correct, since they all end with '/1' or '/2'. Are you sure these are the reads you are giving to the program? It should work... Maybe a few of the reads still lack '/1' or '/2' at the end?
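
One way to check for such stragglers is to scan every header in the file rather than only the first few. The sketch below is a hypothetical helper, not part of VirGenA: it assumes gzipped FASTQ with four lines per record and treats a header as acceptable if it starts with @ and its name either contains a space or ends with /1 or /2.

```python
import gzip


def has_pair_id(header):
    """True if the name contains a space or ends with /1 or /2."""
    name = header.rstrip("\r\n")[1:]  # drop the leading '@'
    return " " in name or name.endswith(("/1", "/2"))


def find_bad_headers(path, limit=10):
    """Return up to `limit` (record_number, header) tuples for headers
    that do not match the expected naming forms."""
    bad = []
    with gzip.open(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:
                continue  # only the first line of each record is a header
            if not line.startswith("@") or not has_pair_id(line):
                bad.append((line_no // 4 + 1, line.rstrip("\r\n")))
                if len(bad) >= limit:
                    break
    return bad
```

Running `find_bad_headers` on the exact R1 and R2 paths given in the config file would show whether any record deviates. An empty list for both files would suggest the problem lies elsewhere, for example in which files the config actually points to.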
