diff --git a/prediction_methods_description.txt b/prediction_methods_description.txt
deleted file mode 100644
index 41b9362..0000000
--- a/prediction_methods_description.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-
-create_nr_homolog_hits_file params:
- sim_threshold_percent
- cmscore_tr
- cm_threshold_percent
-
- # 'rfam_rnafoldc' find the related RFAM model (cmsearch), align to it(cmalign), then take conserved bp as constrains in RNAfold -c
- None params for cmscan, which are set to -g (global) to penalize for incomplete alignment,
- None params for cmalign (set to default) - to much params to optimize alignment
- # 'rfam_subopt' find the related RFAM model (cmsearch), extract reference structure, run mfold for suboptimals and select the most similar one to rfam model
- "mfold" (params for mfold, tuple of (P, W, M))
- # 'subopt_rapidshapes' compute reference structure with common suboptimal structure from homologous sequences, then extract shape of reference and use in for rapidshapes
- create_nr_homolog_hits_file
- "mfold"
- "shape_level" (1-5)
- "rapidshapes" specific params (--allowLP 1 set as default)
- # 'rfam_rapidshapes' find the related RFAM model (cmsearch), extract reference structure, then extract shape of reference and use in for rapidshapes
- None params for cmscan, which are set to -g (global) to penalize for incomplete alignment
- "shape_level" (1-5)
- "rapidshapes" specific params (--allowLP 1 set as default)
- # 'clustalo_alifold_rapidshapes' select homologous subset, compute alignment with clustalo in the homologous subset, compute reference structure with alifold, then extract shape of reference and use in for rapidshapes
- create_nr_homolog_hits_file
- "clustalo" specific params
- "alifold" specific params
- "shape_level" specific params
- "rapidshapes" specific params (now allowLP as default)
- # 'muscle_alifold_rapidshapes' select homologous subset, compute alignment with muscle in the homologous subset, compute reference structure with alifold, then extract shape of reference and use in for rapidshapes
- ...create_nr_homolog_hits_file
- "muscle" specific params
- "alifold" specific params
- "shape_level" specific params
- "rapidshapes" specific params (now allowLP as default)
- # 'rcoffee_alifold_rapidshapes' select homologous subset, compute alignment with rcoffee in the homologous subset, compute reference structure with alifold, then extract shape of reference and use in for rapidshapes
- ...create_nr_homolog_hits_file
- "rcoffee" specific params
- "alifold" specific params
- "shape_level" specific params
- "rapidshapes" specific params (now allowLP as default)
- # 'rnafold' predict all structures with RNAFold
- ..."rnafold" specific params
- # 'subopt_fold_query' predict structure of query with RNAFold and take it as reference, then predict suboptimal structures with mfold and select structure most similar to predicted structure of query
- ..."rnafold" specific params for folding query sequence
- "mfold" (params for mfold, tuple of (P, W, M))
- # 'subopt_fold_clustal_alifold' select homologous sequences, align with clustal, reference structure with alifold, suboptimal structures with mfold and select structure most similar to predicted structure of query
- ...create_nr_homolog_hits_file_MSA_safe
- "clustalo" specific params
- "alifold" specific params
- "mfold" (params for mfold, tuple of (P, W, M))
- # 'subopt_fold_muscle_alifold' select homologous sequences, align with muscle, reference structure with alifold, suboptimal structures with mfold and select structure most similar to predicted structure of query
- ...create_nr_homolog_hits_file_MSA_safe
- "muscle" specific params
- "alifold" specific params
- "mfold" (params for mfold, tuple of (P, W, M))
- # 'alifold_refold' select homologous subset, compute alignment with clustalo in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, run refold on the result
- ...create_nr_homolog_hits_file
- "clustalo" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- # 'alifold_refold_rnafold_c' select homologous subset, compute alignment with clustalo in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, run refold and RNAFold -c on the result
- ...create_nr_homolog_hits_file
- "clustalo" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- # 'alifold_unpaired_conserved_refold' select homologous subset, compute alignment with clustalo in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, select conserved parts of alignment that have singlestrand consensus structure annotation and use them as constrains for RNAFold -c
- ...create_nr_homolog_hits_file
- "clustalo" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- "repred_unpaired_tr" how much MSA must be conserved to denote the position as singlestrand for RNAfold -c
- "conseq_conserved" how many bases in a row must be conserved to denote the position as singlestrand for RNAfold -c
- # 'muscle_alifold_refold' select homologous subset, compute alignment with muscle in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, run refold on the result
- ...create_nr_homolog_hits_file
- "muscle" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- # 'muscle_alifold_refold_rnafold_c' select homologous subset, compute alignment with muscle in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, run refold and RNAFold -c on the result
- ...create_nr_homolog_hits_file
- "muscle" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- # 'muscle_alifold_unpaired_conserved_refold'...select homologous subset, compute alignment with muscle in the homologous subset, compute reference structure with alifold, compute profile alignment of selected sequences with all sequences, match the reference structure to all sequence from profile alignment, select conserved parts of alignment that have singlestrand consensus structure annotation and use them as constrains for RNAFold -c
- ...create_nr_homolog_hits_file
- "muscle" specific params
- "alifold" specific params
- "clustalo_profile" (params applied to clustalo in the phase of profile alignment)
- "repred_unpaired_tr" how much MSA must be conserved to denote the position as singlestrand for RNAfold -c
- "conseq_conserved" how many bases in a row must be conserved to denote the position as singlestrand for RNAfold -c
- # 'dh_tcoffee_alifold_refold' select homologous subset, compute alignment with a, compute consensus with alifold,
- # 'dh_tcoffee_alifold_refold_rnafoldc'
- # 'dh_tcoffee_alifold_conserved_ss_rnafoldc'
- # 'dh_clustal_alifold_refold'
- # 'dh_clustal_alifold_refold_rnafoldc'
- # 'dh_clustal_alifold_conserved_ss_rnafoldc'
- # 'pairwise_centroid_homfold'
- # 'TurboFold_conservative'
- # 'TurboFold'
- # 'tcoffee_rcoffee_alifold_refold'
- # 'tcoffee_rcoffee_alifold_refold_rnafoldc'
- # 'tcoffee_rcoffee_alifold_conserved_ss_rnafoldc'
\ No newline at end of file
diff --git a/prediction_methods_nr_hom.txt b/prediction_methods_nr_hom.txt
deleted file mode 100644
index a75ded6..0000000
--- a/prediction_methods_nr_hom.txt
+++ /dev/null
@@ -1,84 +0,0 @@
-prediction method nr ambig_crash function and comment
-================================================================================
-rfam_rnafoldc No - No nt -
-rfam_subopt No - No nt -
-subopt_rapidshapes Yes - Yes nt - safe - need multiple homologous sequences (min 2) to predict common structure using suboptimal structures, if 1, this will degrade to MFE or MEA prediction (not implemented)
-rfam_rapidshapes No - Yes nt -
-clustalo_alifold_rapidshapes Yes - Yes nt - safe - need multiple homologous sequences (min 2) to run clustal and alifold fo consensus prediction, if 1, this will degrade to MFE or MEA with rapidshapes (not implemented)
-muscle_alifold_rapidshapes Yes - Yes nt - safe - see above (muscle)
-rcoffee_alifold_rapidshapes Yes - Yes nt - safe - see above (rcoffee)
-alifold_refold Yes - No nt - safe - need multiple for clustal and alifold for consensus prediction, if 1, the profile alignment would fail (maybe mafft could mitigate that, but consensus pred is still a problem)
-muscle_alifold_refold Yes - No nt - safe - see above (muscle)
-rnafold No - No nt -
-subopt_fold_query No - No nt -
-subopt_fold_clustal_alifold Yes - No nt - safe - for clustal and alifold
-subopt_fold_muscle_alifold Yes - No nt - safe - for muscle and alifold
-alifold_refold_rnafold_c Yes - No nt - safe - for clustal and alifold
-muscle_alifold_refold_rnafold_c Yes - No nt - safe - for muscle and alifold
-alifold_unpaired_conserved_refold Yes - No nt - safe - for clustal and alifold
-muscle_alifold_unpaired_conserved_refold Yes - No nt - safe - for muscle and alifold
-dh_tcoffee_alifold_refold
-dh_tcoffee_alifold_refold_rnafoldc
-dh_tcoffee_alifold_conserved_ss_rnafoldc
-dh_clustal_alifold_refold
-dh_clustal_alifold_refold_rnafoldc
-dh_clustal_alifold_conserved_ss_rnafoldc
-pairwise_centroid_homfold Yes - No - (one homologous sequence suffice - this we always have)
-TurboFold_conservative Yes - Yes - unsafe - (one homologous sequence suffice)
-TurboFold Yes - Yes - nr safe
-tcoffee_rcoffee_alifold_refold Yes - No nt - safe
-tcoffee_rcoffee_alifold_refold_rnafoldc Yes - No nt - safe
-tcoffee_rcoffee_alifold_conserved_ss_rnafoldc Yes - No nt - safe
-
-* safe - minimum of one homologous sequences (other then query) is needed for prediction method
- unsafe - none (zero) homologous sequences are needed for the prediction (query sequences suffice)
-
-
-
-need to decide if it is better to raise an error if too few sequences pass the filtering, or to do some mitigation
-
-cases
-- no homologs
-- no unique homologs (with query)
-
-decouple homologs
-cases
-- no homologs
-- no unique homologs
-- no non homologs
-- only one non homologs
-
-================================================
-mitigation option for safe version:
-case:
-- only one unique sequence after similarity filtering
- solution: add the most disimilar sequence to the list
-
-- no predicted homologous sequences at all
- solution:
- 1) duplicate the query sequence to fool the 3rd party programs
- 2) return no structures for that procedure (or raise exception), run other procedures if requested (for me, this is preferable behaviour)
-
-
-decouple homologs
-selecting homologous sequences is same as in other versions
-
-inside there is special stage when the sequences are divided to homologous and nonhomologous
-that must be delt with separetly
-
-ambiguos base crash
-3rd party
-locarna OK
-centroidhomfold OK
-rapidshapes crash
-clustalo OK
-muscle allseqtypes OK
-RNAfold OK
-RNAalifold OK
-cmbuild OK - ambiguos base not propagated to cm model
-cmalign OK - aling with model build from seq with ambiguos and align CM to seqs containing ambiguos
-refold.pl OK - ambig in seqs, even if in consensus
-TurboFold crash
-t-coffee OK
-r-coffee OK
-mfold (hybrid-ss-min) OK