ERROR: number of input seqs differ (aa: 1; nuc: 2)!! #32

Open
madzafv opened this issue Jan 25, 2022 · 1 comment


madzafv commented Jan 25, 2022

Hello, I'm running dNdS() on the CDS of 2 species containing 13486 orthologous pairs, but the calculation completes for only 1754 genes. The rest run into this error:

ERROR: number of input seqs differ (aa: 1; nuc: 2)!!

I'm running the program as follows:

rm(list=ls())
library(orthologr);
getwd();
workingDir = "/users/mfariasv/data/mfariasv/aligned_newBFV2/dNdS/"
setwd(workingDir);
args = commandArgs(trailingOnly=TRUE)
query = args[1]
subject = args[2]
res <- dNdS( query , subject ,
                 ortho_detection = "RBH",
                 seq_type = "cds",
                 aa_aln_type     = "multiple",
                 aa_aln_tool     = "clustalo",
                 codon_aln_tool  = "pal2nal",
                 dnds_est.method = "YN",
                 comp_cores      = 1,
                 store_locally = TRUE)
write.csv(res, gsub(".fa","ZF.dNdS", basename(args[2])))

The program runs:

Starting orthology inference (RBH) and dNdS estimation (YN) using the follwing parameters:
query = 'ZFcdsorth.fa'
subject = 'BFcdsorth.fa'
seq_type = 'cds'
e-value: 1E-5
aa_aln_type = 'multiple'
aa_aln_tool = 'clustalo'
comp_cores = '1'


Creating folder 'orthologr_alignment_files' to store alignment files ...
Starting Orthology Inference ...
Running blastp: 2.9.0+ ...
There seem to be 6 coding sequences in your input dataset which cannot be properly divided in base triplets, because their sequence length cannot be divided by 3.
A fasta file storing all corrupted coding sequences for inspection was generated and stored at '/gpfs/data/ehuertas/mfariasv/aligned_newBFV2/dNdS/ZFcdsorth.fa_corrupted_cds_seqs.fasta'.


You chose option 'delete_corrupt_cds = FALSE', thus corrupted coding sequences were retained for subsequent analyses.
The following modifications were made to the CDS sequences that were not divisible by 3:
- If the sequence had 1 residue nucleotide then the last nucleotide of the sequence was removed.
- If the sequence had 2 residue nucleotides then the last two nucleotides of the sequence were removed.
If after consulting the file 'ZFcdsorth.fa_corrupted_cds_seqs.fasta' you wish to remove all corrupted coding sequences please specify the argument 'delete_corrupt_cds = TRUE'.
All corrupted CDS were trimmed.


Building a new DB, current time: 01/24/2022 19:25:22
New DB name:   /tmp/RtmpUFtcuE/_blast_db/blastdb_BFcdsorth.fa_protein.fasta
New DB title:  blastdb_BFcdsorth.fa_protein.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 13486 sequences in 0.380335 seconds.
Running blastp: 2.9.0+ ...
There seem to be 6 coding sequences in your input dataset which cannot be properly divided in base triplets, because their sequence length cannot be divided by 3.
A fasta file storing all corrupted coding sequences for inspection was generated and stored at '/gpfs/data/ehuertas/mfariasv/aligned_newBFV2/dNdS/ZFcdsorth.fa_corrupted_cds_seqs.fasta'.


You chose option 'delete_corrupt_cds = FALSE', thus corrupted coding sequences were retained for subsequent analyses.
The following modifications were made to the CDS sequences that were not divisible by 3:
- If the sequence had 1 residue nucleotide then the last nucleotide of the sequence was removed.
- If the sequence had 2 residue nucleotides then the last two nucleotides of the sequence were removed.
If after consulting the file 'ZFcdsorth.fa_corrupted_cds_seqs.fasta' you wish to remove all corrupted coding sequences please specify the argument 'delete_corrupt_cds = TRUE'.
All corrupted CDS were trimmed.


Building a new DB, current time: 01/24/2022 20:21:27
New DB name:   /tmp/RtmpUFtcuE/_blast_db/blastdb_ZFcdsorth.fa_protein.fasta
New DB title:  blastdb_ZFcdsorth.fa_protein.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 13486 sequences in 0.404176 seconds.
There seem to be 6 coding sequences in your input dataset which cannot be properly divided in base triplets, because their sequence length cannot be divided by 3.
A fasta file storing all corrupted coding sequences for inspection was generated and stored at '/gpfs/data/ehuertas/mfariasv/aligned_newBFV2/dNdS/ZFcdsorth.fa_corrupted_cds_seqs.fasta'.


You chose option 'delete_corrupt_cds = FALSE', thus corrupted coding sequences were retained for subsequent analyses.
The following modifications were made to the CDS sequences that were not divisible by 3:
- If the sequence had 1 residue nucleotide then the last nucleotide of the sequence was removed.
- If the sequence had 2 residue nucleotides then the last two nucleotides of the sequence were removed.
If after consulting the file 'ZFcdsorth.fa_corrupted_cds_seqs.fasta' you wish to remove all corrupted coding sequences please specify the argument 'delete_corrupt_cds = TRUE'.
All corrupted CDS were trimmed.
Orthology Inference Completed.
Starting dN/dS Estimation ...

ERROR: number of input seqs differ (aa: 1;  nuc: 2)!!

   aa  'A1CF'
   nuc 'A1CF A1CF'
*****************************************************************
Function: Parse fasta file with aligned pairwise sequences into AXT file
Reference: Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J: KaKs Calculator: Calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 2006 , 4:259-263.
Web Link: Documentation, example and updates at <http://code.google.com/p/kaks-calculator>
*****************************************************************

I noticed that all the orthologous pairs for which the error does NOT occur have different names in the two species:

[mfariasv@login005 dNdS]$ head BFcdsorthZF.dNdS
"","query_id","subject_id","dN","dS","dNdS","method","perc_identity","num_ident_matches","alig_length","mismatches","gap_openings","n_gaps","pos_match","ppos","q_start","q_end","q_len","qcov","qcovhsp","s_start","s_end","s_len","evalue","bit_score","score_raw"
"1","ABCF2","LOC110475106",0.000859565,0.0378507,0.0227094,"YN",99.801,501,502,1,0,0,501,99.8,52,553,553,100,91,123,624,624,0,1051,2719
[mfariasv@login005 dNdS]$ sed -n -e '/ABCF2/,/>/ p' ZFcdsorth.fa
>ABCF2
ATGCCCTCCGACCTGGCCAAAAAGAAGGCGGCCAAGAAGAAGGAGGCGGCCAAGGCCCGG
CAGCGGCCGCGCCGGGTCCCGGACGAGAACGGTGATGCCGGGACGGAGCCGCAGGAAGTC
CGGTCCCCGGAGGCCAACGGGACGGTGCTGCCAGGGAAATCCATGCTTTTGTCAGCTATT
GGGAAGCGAGAAGTGCCTATCCCAGAGCACATTGACATCTATCACCTGACCCGAGAGATG
CCTCCCAGTGACAAGACCCCTCTGCAGTGTGTGATGGAAGTGGATACAGAGAGGGCCATG
TTGGAGCGAGAAGCGGAACGTTTAGCTCATGAAGATGCGGAATGTGAGAAACTCCTGGAG
TTATATGAACGCCTGGAGGAGCTGGATGCTGATAAGGCAGAAGCACGAGCCTCACGTATC
CTTCACGGCTTGGGGTTCACACCGGCCATGCAGAGGAAGAAGCTGAAGGACTTCAGTGGT
GGCTGGCGAATGAGGGTGGCCCTTGCCAGAGCGCTCTTCATTCGGCCTTTCATGCTGCTG
CTTGATGAGCCCACAAACCACCTTGACCTGGATGCCTGTGTGTGGTTGGAGGAAGAGCTG
AAAACGTTCAAGCGGATTCTTGTGCTGATATCCCACTCCCAGGACTTCCTGAATGGCGTC
TGCACCAACATCATCCACATGCACAACCGCAAACTTAAGTACTACACGGGAAATTATGAT
CAGTATGTAAAGACTCGCTTAGAACTAGAAGAAAATCAAATGAAGCGATTCCACTGGGAG
CAAGATCAGATTGCTCATATGAAGAATTACATTGCACGATTTGGCCATGGTAGTGCGAAG
CTGGCCAGGCAAGCTCAGAGCAAGGAGAAGACCCTTCAAAAAATGATGGCTTCTGGCTTG
ACAGAGAGAGTTGTGAATGATAAGACTTTATCATTCTACTTTCCACCCTGTGGGAAAATT
CCCCCTCCTGTCATCATGGTGCAGAATGTCAGCTTCAGATACACCAAGGATGGGCCATGG
ATCTATAATAACCTGGAGTTTGGGATTGATCTGGATACTCGTGTAGCTCTTGTTGGACCC
AATGGAGCTGGAAAGTCAACACTGCTGAAACTGCTCACAGGAGAGCTGCTGCCCACAGAT
GGGATGATTCGCAAGCACTCACATGTGAAGATCGGTAGATACCACCAGCACTTGCAAGAG
CAGTTGGACTTAGACCTCTCACCATTGGAGTACATGCTGAAATGCTACCCAGAGATCAAG
GAGAAGGAGGAGATGAGGAAAATCATTGGCAGATACGGTTTGACAGGGAAGCAGCAGGTG
AGCCCCATCAGGAACCTCTCTGATGGGCAGAAGTGCCGTGTGTGCTTTGCCTGGCTGGCC
TGGCAGAACCCTCACATGCTCTTCCTGGACGAGCCCACCAACCACCTGGACATAGAAACC
ATAGATGCACTGGCAGATGCTATCAATGAGTTCGAGGGAGGAATGATGCTTGTCAGCCAT
GACTTCAGACTCATCCAACAGGTTGCACAGGAAATCTGGGTCTGTGAGAAGCAGACAATC
GCCAAGTGGCAAGGGGACATCCTTGCCTACAAGGAGCATCTCAAGTCGAAGCTGGTGGAT
GAGGACCCGCAGCTCACCAAACGGACCCACAATGTGTGA
>ABCG1
[mfariasv@login005 dNdS]$ sed -n -e '/LOC110475106/,/>/ p' BFcdsorth.fa
>LOC110475106
ATGCCCTCCGACCTGGCCAAGAAGAAGGCGGCCAAGAAGAAGGAGGCGGCCAAGGCCCGG
CAGCGGCCGCGCCGGGTCCCGGACGAGAACGGTGATGCCGGGACGGAGCCGCAGGAAGTC
CGGTCCCCGGAGGCCAACGGGACGGTGCTACCAGAGGTGGATGCTCTTACAAAGGAGCTG
GAGGATTTTGAGTTAAAGAAAGCTGCTGCCCGAGCCGTGACAGGAGTGCTGGCCTCCCAC
CCCAACAGCACTGATGTGCATATCATCAACCTCTCACTGACCTTTCATGGCCAGGAGCTG
CTGAGTGACACCAAACTGGAGCTGAACTCTGGGAGACGCTATGGCCTGATTGGACTCAAT
GGGATTGGGAAATCCATGCTTTTGTCAGCTATTGGGAAGCGAGAAGTGCCCATCCCAGAG
CACATTGACATCTATCACCTGACCCGAGAGATGCCTCCCAGTGACAAGACCCCTCTGCAG
TGTGTGATGGAAGTGGATACAGAGAGGGCCATGTTGGAGCGAGAAGCGGAACGTTTAGCT
CATGAAGATGCGGAATGTGAGAAACTCCTGGAGTTATATGAACGCCTGGAGGAGCTGGAT
GCTGATAAGGCAGAAGCACGAGCCTCACGTATCCTTCATGGCTTGGGGTTCACGCCGGCC
ATGCAGAGGAAGAAGCTGAAGGACTTCAGTGGTGGCTGGCGAATGAGGGTGGCCCTTGCC
AGAGCGCTCTTCATTCGGCCTTTCATGCTGCTGCTCGATGAGCCCACAAACCACCTTGAC
CTGGATGCCTGTGTGTGGTTGGAGGAAGAGCTGAAAACGTTCAAGCGGATTCTTGTGCTG
ATATCCCACTCCCAGGACTTCCTGAATGGTGTCTGCACCAACATCATCCACATGCACAAC
CGCAAACTTAAGTACTACACGGGAAATTATGATCAGTATGTAAAGACACGCTTAGAACTA
GAAGAAAATCAAATGAAGCGATTCCACTGGGAGCAAGATCAGATTGCTCATATGAAGAAT
TACATTGCACGATTTGGCCATGGTAGTGCGAAGCTGGCCAGGCAAGCTCAGAGCAAGGAG
AAGACCCTTCAAAAAATGATGGCTTCTGGCTTGACAGAGAGAGTTGTGAATGATAAGACT
TTATCATTCTACTTTCCACCCTGTGGGAAAATTCCCCCTCCTGTCATCATGGTGCAGAAT
GTCAGCTTCAGATACACCAAGGATGGGCCATGGATCTATAATAACCTGGAGTTTGGGATT
GACCTGGATACTCGTGTAGCTCTTGTTGGACCCAATGGAGCTGGAAAGTCAACCCTGCTG
AAACTGCTCACAGGAGAGCTGCTGCCCACAGATGGGATGATTCGCAAGCACTCGCATGTG
AAGATCGGTAGATACCACCAGCACTTGCAAGAGCAGTTGGACTTAGACCTCTCACCATTA
GAGTACATGCTGAAATGCTACCCAGAGATCAAGGAGAAGGAGGAGATGAGGAAAATCATT
GGCAGATACGGTTTGACAGGGAAGCAGCAGGTGAGTCCCATCAGGAACCTCTCTGATGGA
CAGAAGTGCCGTGTGTGCTTTGCCTGGCTGGCCTGGCAGAACCCTCACATGCTCTTCCTG
GATGAGCCCACCAACCACCTGGACATAGAAACTATAGATGCACTGGCAGATGCTATCAAT
GAGTTTGAGGGAGGAATGATGCTTGTCAGCCATGACTTCAGACTCATCCAACAGGTTGCA
CAGGAAATCTGGGTCTGTGAGAAGCAGACAATCACCAAGTGGCAAGGGGACATCCTTGCC
TACAAGGAGCATCTCAAGTCGAAGCTGGTGGATGAGGACCCGCAGCTCACCAAACGGACC
CACAACGTGTGA
>LOC110475116

In contrast, all genes for which the error occurs and dNdS is not calculated have the same name in both species.

For example, ABCG1:


[mfariasv@login005 dNdS]$ sed -n -e '/ABCG1/,/>/ p' ZFcdsorth.fa
>ABCG1
ATGGCATGTCTGATGGCGGCTTTCTCCCTGGGCAGCGCTTTGGGTGGCAGCAGTTCTGGT
TGCACCATGGCCGAGCCAAAGTCTGTGTGTGTTTCTGTGGACGAGGTGGTCTCCAATGGC
ACAGACACCCAGGACATCCGACTCATCAATGGACACTTAAAAAAAGTGGACAATGCTCTG
ACAGAAGCCCACAGGTTCTCCTACCTGCCCCGCAGGCCAGCTGTGAACATTGAGTTTAAA
GAACTGTCCTACTCTATCCAGGAAGGGCCATGGTGGAGAAAGAAAGGTTATAAGACCCTT
TTGAAAGGAATTTCAGGGAAATTCAGCAGTGGAGAACTCGTTGCAATTATGGGACCTTCA
GGAGCTGGGAAGTCAACGCTTATGAATATTCTGGCAGGATACAGAGAGACGGGGATGAAA
GGAGAAATCCTCATCAACGGGCAGCCCCGCGACCTGCGCTCCTTCCGCAAGGTCTCCTGC
TACATCATGCAGGATGACATGCTCCTTCCTCACCTCACTGTCCAGGAAGCTATGATGGTA
TCTGCTCATCTGAAACTTCAAGAGAAAGATGAAGGGAGGAGAGAAATGGTGAAGGAAATC
CTGACAGCCCTTGGTTTGCTGGCGTGTGCCAACACCAGGACTGGGAGTCTCTCAGGAGGC
CAGAGGAAGCGCCTCGCCATCGCTCTGGAGCTGGTGAACAACCCTCCTGTCATGTTCTTC
GATGAACCAACCAGTGGCTTGGACAGTGCATCATGTTTCCAGGTGGTCTCTCTGATGAAG
GCTTTGGCCCAGGGTGGCAGATCCATCATCTGCACGATTCACCAGCCCAGTGCAAAACTG
TTTGAGCTCTTTGACCAGCTCTATGTTCTAAGTCAAGGTCAGTGCATTTACCGTGGGAAG
GTGACAAACCTCGTCCCTTACTTGAGAGATTTGGGGTTGAATTGTCCAACCTACCACAAC
CCAGCAGATTTTGTGATGGAAGTGGCCTCGGGTGAGTACGGGGACCAGAACAGCCGCCTG
GTCAGGGCTGTGAGGGAGAGGATTTGTGACACAGACTACAAGAGAGACGTGGTTGGGGAG
CACGAGCTGAACCCCTTCCTCTGGCACCGGCCCTCTGAAGAGGACTCATCCTCCACAGAA
GGCTGCCACAGCTTCTCTGCCAGCTGCCTAACCCAGTTCTGCATCCTCTTCAAAAGAACT
TTCCTCACCATCATGAGAGACTCGGTCCTGACACACTTGAGGATCACCTCACACATTGGC
ATTGGGCTGCTCATTGGATTGCTCTACTTGGGCATTGGCAATGAAGCCAAGAAAGTCCTC
AGCAACTCGGGGTTCCTCTTCTTCTCCATGTTGTTCCTCATGTTTGCTGCGCTCATGCCG
ACCGTCCTCACCTTTCCCCTTGAGATGGGAGTGTTTCTCAGAGAGCACCTGAACTACTGG
TACAGCCTAAAAGCCTATTACCTCGCCAAAACCATGGCTGATGTTCCTTTCCAGATCATG
TTCCCTGTGGCTTACTGCAGCATCGTGTACTGGATGACTTCCCAGCCCTCCGACGCGCTC
CGCTTCGTCCTCTTCGCAGCCCTGGGGACCATGACATCCCTGGTGGCTCAGTCACTGGGC
CTGCTCATAGGTGCAGCCTCCACATCCCTCCAGGTGGCAACTTTTGTGGGCCCAGTTACT
GCCATCCCAGTCCTCCTGTTCTCTGGGTTTTTTGTCAGCTTTGACACCATCCCAACATAC
CTCCAGTGGATGTCCTACATTTCCTATGTCAGATATGGGTTCGAAGGAGTCATCCTCTCC
ATCTACGGACTGGATCGAGAAGATCTGCATTGTGACAAAGATGACACCTGCCACTTCCAA
AAATCAGAGGCCATCCTGAAAGAACTGGATGTAGAAAATGCCAAACTTTACCTGGACTTC
ATTGTTCTTGGGATTTTCTTCTTCTCTCTGCGCCTGATTGCCTATTTTGTCCTCAGATAC
AAAATCCGAGCGGAGAGGTAA
>ABCG2
[mfariasv@login005 dNdS]$ sed -n -e '/ABCG1/,/>/ p' BFcdsorth.fa
>ABCG1
ATGGCATGTCTGATGGCGGCTTTCTCCCTGGGCAGCGCTTCGGGTGGCAGCAGTTCTGGT
TGCACCATGGCCGAGCCAAAGTCTGTGTGTGTTTCTGTGGACGAGGTGGTCTCCAATGGC
ACAGACACCCAGGACATCCGACTCATCAATGGACACTTAAAAAAAGTGGACAATGCTCTG
ACAGAAGCTCACAGGTTCTCCTACCTGCCCCGCAGGCCAGCTGTGAACATTGAGTTTAAA
GAACTCTCCTACTCTATCCAGGAAGGGCCATGGTGGAGAAAGAAAGGTTATAAAACCCTT
TTGAAAGGAATTTCAGGGAAGTTCAGCAGTGGAGAGCTCGTTGCAATTATGGGACCTTCA
GGAGCTGGGAAGTCAACGCTTATGAATATTCTGGCAGGATACAGAGAGACGGGGATGAAA
GGAGAAATCCTCATCAACGGGCAGCCCCGCGACCTGCGCTCCTTCCGCAAGGTCTCCTGC
TACATCATGCAGGATGACATGCTCCTTCCTCACCTCACTGTCCAGGAAGCTATGATGGTA
TCTGCTCATCTGAAACTTCAAGAGAAAGATGAAGGGAGGAGAGAAATGGTGAAGGAAATC
CTGACAGCCCTTGGTTTGCTGGCCTGTGCCAACACCAGGACTGGGAGCCTCTCAGGAGGC
CAGAGGAAGCGCCTCGCCATCGCTCTGGAGCTGGTGAACAACCCTCCTGTCATGTTCTTC
GATGAACCAACCAGTGGCTTGGACAGTGCATCATGTTTTCAGGTGGTCTCTCTGATGAAG
GCTTTGGCCCAGGGTGGCAGATCCATCATCTGCACAATTCACCAGCCCAGTGCAAAACTG
TTTGAGCTCTTTGACCAGCTCTATGTTCTAAGTCAAGGTCAGTGCATTTACCGTGGGAAG
GTGACAAACCTTGTCCCTTACTTGAGAGATTTGGGGTTGAATTGTCCAACCTACCACAAC
CCAGCAGATTTTGTAATGGAAGTGGCCTCGGGTGAGTACGGGGACCAGAACAGCCGCCTG
GTCAGGGCTGTGAGAGAGAGGATTTGTGACACAGACTACAAGAGAGACGTGGCTGGGGAG
CACGAGCTGAACCCCTTCCTCTGGCACCGGCCCTCTGAAGAGGATTCCTCCTCCACAGAA
GGATGCCACAGCTTCTCTGCCAGCTGCCTAACCCAGTTCTGCATCCTCTTCAAAAGAACT
TTCCTCACCATCATGAGGGACTCGGTCCTGACACACTTGAGGATCACCTCACACATTGGC
ATTGGGCTGCTCATTGGACTGCTCTACTTGGGCATTGGCAATGAAGCCAAGAAAGTCCTC
AGCAACTCAGGGTTCCTCTTCTTCTCCATGTTGTTCCTCATGTTTGCTGCACTCATGCCG
ACCGTCCTCACCTTTCCCCTTGAGATGGGAGTGTTTCTCAGAGAGCATCTGAACTACTGG
TACAGCCTGAAAGCCTATTACCTCGCCAAAACCATGGCTGATGTTCCTTTTCAGATCATG
TTCCCTGTGGCTTACTGCAGCATCGTGTACTGGATGACTTCCCAGCCCTCCGACGCGCTC
CGCTTCGTCCTCTTCGCAGCCCTGGGGACCATGACATCCCTGGTGGCTCAGTCACTGGGC
CTGCTCATAGGTGCAGCCTCCACATCCCTCCAGGTGGCAACTTTTGTGGGCCCAGTTACT
GCCATCCCAGTCCTCCTGTTCTCTGGGTTTTTTGTCAGCTTTGACACCATCCCAACATAC
CTCCAGTGGATGTCCTACATTTCCTATGTCAGATACGGGTTCGAAGGAGTCATCCTCTCC
ATCTACGGACTGGATCGAGAAGATCTGCATTGTGACAAAGATGACACCTGCCACTTCCAA
AAATCAGAGGCCATCCTGAAAGAACTGGATGTAGAAAATGCCAAACTCTACCTGGACTTC
ATCGTTCTTGGGATTTTCTTCTTCTCTCTGCGCCTGATTGCCTATTTTGTCCTCAGATAC
AAAATCCGAGCGGAGAGGTAA
>ABCG2

But the sequences are indeed different:

diff ABCG1_BF ABCG1_ZF
2c2
< ATGGCATGTCTGATGGCGGCTTTCTCCCTGGGCAGCGCTTCGGGTGGCAGCAGTTCTGGT
---
> ATGGCATGTCTGATGGCGGCTTTCTCCCTGGGCAGCGCTTTGGGTGGCAGCAGTTCTGGT
5,7c5,7
< ACAGAAGCTCACAGGTTCTCCTACCTGCCCCGCAGGCCAGCTGTGAACATTGAGTTTAAA
< GAACTCTCCTACTCTATCCAGGAAGGGCCATGGTGGAGAAAGAAAGGTTATAAAACCCTT
< TTGAAAGGAATTTCAGGGAAGTTCAGCAGTGGAGAGCTCGTTGCAATTATGGGACCTTCA
---
> ACAGAAGCCCACAGGTTCTCCTACCTGCCCCGCAGGCCAGCTGTGAACATTGAGTTTAAA
> GAACTGTCCTACTCTATCCAGGAAGGGCCATGGTGGAGAAAGAAAGGTTATAAGACCCTT
> TTGAAAGGAATTTCAGGGAAATTCAGCAGTGGAGAACTCGTTGCAATTATGGGACCTTCA
12c12
< CTGACAGCCCTTGGTTTGCTGGCCTGTGCCAACACCAGGACTGGGAGCCTCTCAGGAGGC
---
> CTGACAGCCCTTGGTTTGCTGGCGTGTGCCAACACCAGGACTGGGAGTCTCTCAGGAGGC
14,15c14,15
< GATGAACCAACCAGTGGCTTGGACAGTGCATCATGTTTTCAGGTGGTCTCTCTGATGAAG
< GCTTTGGCCCAGGGTGGCAGATCCATCATCTGCACAATTCACCAGCCCAGTGCAAAACTG
---
> GATGAACCAACCAGTGGCTTGGACAGTGCATCATGTTTCCAGGTGGTCTCTCTGATGAAG
> GCTTTGGCCCAGGGTGGCAGATCCATCATCTGCACGATTCACCAGCCCAGTGCAAAACTG
17,26c17,26
< GTGACAAACCTTGTCCCTTACTTGAGAGATTTGGGGTTGAATTGTCCAACCTACCACAAC
< CCAGCAGATTTTGTAATGGAAGTGGCCTCGGGTGAGTACGGGGACCAGAACAGCCGCCTG
< GTCAGGGCTGTGAGAGAGAGGATTTGTGACACAGACTACAAGAGAGACGTGGCTGGGGAG
< CACGAGCTGAACCCCTTCCTCTGGCACCGGCCCTCTGAAGAGGATTCCTCCTCCACAGAA
< GGATGCCACAGCTTCTCTGCCAGCTGCCTAACCCAGTTCTGCATCCTCTTCAAAAGAACT
< TTCCTCACCATCATGAGGGACTCGGTCCTGACACACTTGAGGATCACCTCACACATTGGC
< ATTGGGCTGCTCATTGGACTGCTCTACTTGGGCATTGGCAATGAAGCCAAGAAAGTCCTC
< AGCAACTCAGGGTTCCTCTTCTTCTCCATGTTGTTCCTCATGTTTGCTGCACTCATGCCG
< ACCGTCCTCACCTTTCCCCTTGAGATGGGAGTGTTTCTCAGAGAGCATCTGAACTACTGG
< TACAGCCTGAAAGCCTATTACCTCGCCAAAACCATGGCTGATGTTCCTTTTCAGATCATG
---
> GTGACAAACCTCGTCCCTTACTTGAGAGATTTGGGGTTGAATTGTCCAACCTACCACAAC
> CCAGCAGATTTTGTGATGGAAGTGGCCTCGGGTGAGTACGGGGACCAGAACAGCCGCCTG
> GTCAGGGCTGTGAGGGAGAGGATTTGTGACACAGACTACAAGAGAGACGTGGTTGGGGAG
> CACGAGCTGAACCCCTTCCTCTGGCACCGGCCCTCTGAAGAGGACTCATCCTCCACAGAA
> GGCTGCCACAGCTTCTCTGCCAGCTGCCTAACCCAGTTCTGCATCCTCTTCAAAAGAACT
> TTCCTCACCATCATGAGAGACTCGGTCCTGACACACTTGAGGATCACCTCACACATTGGC
> ATTGGGCTGCTCATTGGATTGCTCTACTTGGGCATTGGCAATGAAGCCAAGAAAGTCCTC
> AGCAACTCGGGGTTCCTCTTCTTCTCCATGTTGTTCCTCATGTTTGCTGCGCTCATGCCG
> ACCGTCCTCACCTTTCCCCTTGAGATGGGAGTGTTTCTCAGAGAGCACCTGAACTACTGG
> TACAGCCTAAAAGCCTATTACCTCGCCAAAACCATGGCTGATGTTCCTTTCCAGATCATG
31c31
< CTCCAGTGGATGTCCTACATTTCCTATGTCAGATACGGGTTCGAAGGAGTCATCCTCTCC
---
> CTCCAGTGGATGTCCTACATTTCCTATGTCAGATATGGGTTCGAAGGAGTCATCCTCTCC
33,34c33,34
< AAATCAGAGGCCATCCTGAAAGAACTGGATGTAGAAAATGCCAAACTCTACCTGGACTTC
< ATCGTTCTTGGGATTTTCTTCTTCTCTCTGCGCCTGATTGCCTATTTTGTCCTCAGATAC
---
> AAATCAGAGGCCATCCTGAAAGAACTGGATGTAGAAAATGCCAAACTTTACCTGGACTTC
> ATTGTTCTTGGGATTTTCTTCTTCTCTCTGCGCCTGATTGCCTATTTTGTCCTCAGATAC



HajkD (Member) commented Feb 22, 2022

Hi Madza,

Thank you very much for making me aware of this.

Did I understand the issue correctly that you have the same header names in two different fasta files (representing two different species), but behind each header name lies a different coding sequence? Can we assume that headers with the same name in two different species are supposed to be orthologous genes?
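
One way to check how many header names the two files share is sketched below. This is only an illustration: the helper names `fasta_ids` and `shared_ids` are hypothetical and not part of orthologr, and the sketch assumes that the gene name is the first token directly after `>` in each header.

```shell
# Hypothetical helper: print the sorted, unique header names of a FASTA file
# (the text after '>' up to the first whitespace character).
fasta_ids() {
  grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | LC_ALL=C sort -u
}

# Hypothetical helper: print the header names that occur in both FASTA files.
shared_ids() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
  fasta_ids "$1" > "$tmp_a"
  fasta_ids "$2" > "$tmp_b"
  # comm -12 keeps only the lines common to both sorted inputs.
  LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}

# Example with the file names from this issue:
# shared_ids ZFcdsorth.fa BFcdsorth.fa | wc -l
```

If the count of shared names roughly matches the number of genes that fail, that would support the idea that identical headers are involved.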

If I understood correctly, then it seems to me that internally the wrong header name is selected when computing dNdS. Did you try renaming the headers from >ABCG1 to, e.g., >ABCG1_BF and >ABCG1_ZF? If so, does the issue persist?
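
A minimal sketch of such a renaming step, assuming the gene name directly follows `>` in every header. The function name `add_header_suffix` and the output file names in the usage comment are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: append a suffix to the first token of every FASTA
# header line; sequence lines are printed unchanged.
# $1 = input FASTA file, $2 = suffix (e.g. _ZF)
add_header_suffix() {
  awk -v suffix="$2" '/^>/ { $1 = $1 suffix } { print }' "$1"
}

# Example with the file names from this issue (output names are made up):
# add_header_suffix ZFcdsorth.fa _ZF > ZFcdsorth_renamed.fa
# add_header_suffix BFcdsorth.fa _BF > BFcdsorth_renamed.fa
```

Note that the `query_id` and `subject_id` columns of the result would then carry the suffixes as well, so they may need to be stripped again afterwards.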

Would it be possible to construct a small example run with only a few sequences so that I can reproduce this issue and troubleshoot at each analysis step?
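
Such a subset could be pulled out of the full files along these lines. This is a sketch, not a tested recipe for this dataset: `extract_records`, `ids.txt`, and the output file names are hypothetical, and the header name is again assumed to be the first token after `>`.

```shell
# Hypothetical helper: print only the FASTA records whose header name
# (first token after '>') is listed in an ID file.
# $1 = file with one ID per line, $2 = FASTA file
extract_records() {
  awk 'FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
       /^>/ { id = substr($1, 2); keep = (id in want) }
       keep { print }' "$1" "$2"
}

# Example with gene names mentioned in this issue (output names are made up):
# printf 'A1CF\nABCG1\nABCF2\nLOC110475106\n' > ids.txt
# extract_records ids.txt ZFcdsorth.fa > ZF_small.fa
# extract_records ids.txt BFcdsorth.fa > BF_small.fa
```

IDs that are absent from a given file are simply ignored, so one ID list can be used for both species.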

I hope this helps.

Cheers,
Hajk
