MIG-Phylogenomics

To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.

Read data

The read data for this analysis are in the SRA under BioProject accession number PRJNA340324.

Raw paired read libraries

Expect read-1 and read-2 fastq files for each of the following libraries. File paths in the notebooks may need to be adjusted depending on where you place the files on your machine (large data files are usually kept outside the working drive, and their paths are system specific).

Sample	Library	Used in analysis
MincA14	150715_D00248_0103_AC75KUANXX_4_IL-TP-021	+
MareHarA	150403_D00261_0236_AC6E37ANXX_8_IL-TP-021	+
MareHarA	150403_D00261_0236_AC6E37ANXX_8_IL-TP-023
MareHarA	150521_D00200_0260_AC6V40ANXX_2_IL-TP-021
MareHarA	150521_D00200_0260_AC6V40ANXX_2_IL-TP-023
MjavLD15	150715_D00248_0103_AC75KUANXX_4_IL-TP-010	+
MincL19	150715_D00248_0103_AC75KUANXX_4_IL-TP-011	+
MareL32	150715_D00248_0103_AC75KUANXX_4_IL-TP-022	+
MareL28	150715_D00248_0103_AC75KUANXX_4_IL-TP-008	+
MjavL57	150715_D00248_0103_AC75KUANXX_4_IL-TP-001	+
MjavVW4	mjavanicaVW4_500	+
MjavVW4	mjavanicaVW4_300
MincW1	150212_D00261_0225_AC6EKCANXX_1_IL-TP-013	+
MincW1	150212_D00261_0225_AC6EKCANXX_1_IL-TP-005
MincVW6	150212_D00261_0225_AC6EKCANXX_1_IL-TP-007	+
MincVW6	150212_D00261_0225_AC6EKCANXX_1_IL-TP-002
MincHarC	150212_D00261_0225_AC6EKCANXX_1_IL-TP-012	+
MincHarC	150212_D00261_0225_AC6EKCANXX_1_IL-TP-004
Minc557R	150212_D00261_0225_AC6EKCANXX_1_IL-TP-006	+
MincL9	150715_D00248_0103_AC75KUANXX_4_IL-TP-009	+
MincL27	150715_D00248_0103_AC75KUANXX_4_IL-TP-020	+
MjavLD17	150715_D00248_0103_AC75KUANXX_4_IL-TP-003	+
MentL30	150716_D00248_0104_BC75KYANXX_3_IL-TP-005	+
MentL30	150716_D00248_0104_BC75KYANXX_3_IL-TP-019
MfloSJF1	160425_E00397_0014_AHLYG7CCXX_1_TP-D7-003	+
MfloSJF1	160426_K00166_0058_AH7WLVBBXX_8_TP-D7-005_TP-D5-003
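As a rough illustration of the path adjustment mentioned above, the following sketch builds the expected read-1 and read-2 paths for a library under a data directory of your choice and reports which files are missing. The `_1.fastq.gz` and `_2.fastq.gz` suffixes and the function names are assumptions made for this example, not the naming used in the notebooks. Only three of the libraries marked with + are listed, for brevity.

```python
from pathlib import Path

# A few of the libraries marked "+" in the table above (not the full list).
USED_LIBRARIES = {
    "MincA14": "150715_D00248_0103_AC75KUANXX_4_IL-TP-021",
    "MjavVW4": "mjavanicaVW4_500",
    "Minc557R": "150212_D00261_0225_AC6EKCANXX_1_IL-TP-006",
}


def expected_read_files(reads_dir, library, suffixes=("_1.fastq.gz", "_2.fastq.gz")):
    """Return the (read-1, read-2) paths expected for one library.

    The suffixes are hypothetical; change them to match your file names.
    """
    reads_dir = Path(reads_dir)
    return tuple(reads_dir / f"{library}{suffix}" for suffix in suffixes)


def missing_read_files(reads_dir, libraries=USED_LIBRARIES):
    """List the expected read files that are not present under reads_dir."""
    missing = []
    for library in libraries.values():
        for path in expected_read_files(reads_dir, library):
            if not path.exists():
                missing.append(path)
    return missing
```

Calling `missing_read_files("/path/to/reads")` before running a notebook gives a quick check that the system-specific path is set correctly.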

Genome assembly scripts

Genome assembly scripts by Dr. Laura Salazar are available here. The genome assembly files are in this repository.

Quality trimmed paired read file

These were used for mapping of genes and of contig pairs, and are based on the raw read libraries marked with + in the table above. They are available in this location until 25/6/2018. Alternatively, they can be created in notebook 2.

25M read subset of the first trimmed read file

These were used for mitochondrial genome assembly and are based on the first trimmed read file. Where a link is provided instead of a file, the first trimmed read file had fewer than 25 M reads and was itself used as the subset. The links will need to be recreated on your system. These files are created in notebook 5.
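The subset-or-link rule can be sketched as below. This is a minimal example, not the code in notebook 5: taking the first reads of the file (rather than a random sample) is an assumption, and the function names are hypothetical.

```python
import gzip
import itertools
import os


def open_fastq(path, mode="rt"):
    """Open a FASTQ file, transparently handling .gz compression."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def count_reads(path):
    """Count reads, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4


def subset_or_link(trimmed, subset, max_reads=25_000_000):
    """Write the first max_reads reads to subset, or link if the file is small.

    Taking the *first* reads is an assumption made for this sketch.
    Returns "link" or "subset" to report what was done.
    """
    if count_reads(trimmed) < max_reads:
        # Fewer reads than requested: the trimmed file itself is the subset.
        os.symlink(os.path.abspath(trimmed), subset)
        return "link"
    with open_fastq(trimmed) as src, open_fastq(subset, "wt") as dst:
        dst.writelines(itertools.islice(src, 4 * max_reads))
    return "subset"
```

Because the links store absolute paths, they break when the repository is moved to another machine, which is why they have to be recreated locally.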

Notebooks and related files

0. Dependencies

Notebook file name: Dependencies.ipynb

1. CDSs and proteins from genome assemblies

Notebook file name: CDSs_and_proteins_from_genome_assemblies.ipynb

Related files:

meloidogyne_assemblies: contains fasta genome assemblies
annotation: contains gff files for the assemblies in meloidogyne_assemblies

dirs that start with None: genes, cdss or proteins without premature stop codon
dirs that start with stopped: genes, cdss or proteins with a premature stop codon
dirs that start with all: a merge of None and stopped
dirs that end with files: raw, as indicated in the gff
dirs that end with centroids: cds files that were reduced with a vsearch step
dirs that end with reviewed: final treated datasets (see notebook)
ref in all the dir names indicates that these files are derived from a genome assembly annotation.
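To make the None / stopped / all distinction concrete, here is a small sketch that splits coding sequences by whether an in-frame stop codon occurs before the last codon. It assumes the standard genetic code and sequences that start in frame; the notebook's actual criteria may differ, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def has_premature_stop(cds):
    """Return True if an in-frame stop codon occurs before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def split_by_stop(cdss):
    """Split {name: sequence} into the None, stopped and all groups."""
    groups = {"None": {}, "stopped": {}, "all": dict(cdss)}
    for name, seq in cdss.items():
        key = "stopped" if has_premature_stop(seq) else "None"
        groups[key][name] = seq
    return groups
```

For example, `has_premature_stop("ATGAAATAA")` is false because the only stop codon is the terminal one, while `has_premature_stop("ATGTAAAAATAA")` is true.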

2. Map-assemble genes from read data for samples without assemblies

Notebook file name: Map_assemble_genes.ipynb

Related files

<sample name>_bwa/<sample name>.nt.fasta: map-assembled gene files

<None | stopped | all>_<cdss | proteins | gffs>
with None indicating that no prefix is written (for example cdss, stopped_cdss and all_cdss).

dirs that start with None: gffs, cdss or proteins without premature stop codon
dirs that start with stopped: gffs, cdss or proteins with a premature stop codon
dirs that start with all: a merge of None and stopped
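The naming pattern above can be expressed as two small helpers. This sketch reads "None" as "no prefix is written", so the unprefixed directory is simply `cdss`, `proteins` or `gffs`; the helper names are hypothetical.

```python
import os


def gene_fasta_path(sample):
    """Path of the map-assembled gene file for one sample."""
    return os.path.join(f"{sample}_bwa", f"{sample}.nt.fasta")


def output_dir(kind, prefix=None):
    """Directory name for a kind (cdss, proteins, gffs) and optional prefix.

    prefix=None means no prefix is written, giving e.g. "cdss".
    """
    if kind not in {"cdss", "proteins", "gffs"}:
        raise ValueError(f"unknown kind: {kind}")
    if prefix not in {None, "stopped", "all"}:
        raise ValueError(f"unknown prefix: {prefix}")
    return kind if prefix is None else f"{prefix}_{kind}"
```

With these, `output_dir("gffs", "stopped")` gives `stopped_gffs` and `output_dir("cdss")` gives `cdss`.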

3. Orthology clustering

Notebook file name: Orthology_clustering.ipynb

Related files:

orthofinder/all_inputs/<sample name><None|_ref>.aa.fasta: links to protein sequences of all the samples. They will need to be regenerated locally (step included in the notebook).

orthofinder/all_inputs/Results_Jan16/<inflation value>_OrthologousGroup.csv: Orthology clusters, with <inflation value> representing the MCL inflation parameter, except for 0, which represents an inflation of 1.5, and 1, which represents an inflation of 1.1.
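Since the file name prefix does not always equal the inflation value, a lookup like the one below can help when iterating over these files. It assumes that prefixes other than 0 and 1 are the literal inflation value (for example 2); the function name is hypothetical.

```python
import os

# The two exceptions described above: prefix -> MCL inflation.
SPECIAL_PREFIXES = {"0": 1.5, "1": 1.1}


def inflation_from_filename(path):
    """Return the MCL inflation encoded in an OrthologousGroup csv name."""
    prefix = os.path.basename(path).split("_", 1)[0]
    if prefix in SPECIAL_PREFIXES:
        return SPECIAL_PREFIXES[prefix]
    # Assumption: any other prefix is the inflation value itself.
    return float(prefix)
```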

orthofinder/all_inputs/Results_Jan16/WorkingDirectory: OrthoFinder inputs and outputs of the Blast step.
orthofinder/all_inputs/Results_Jan16/OGs_I2_1-4.gb.gz: A genbank file with coding and protein sequences of orthology clusters with 1 to 4 gene copies for each reference sample.

orthofinder/all_inputs/Results_Jan16/OGs_I2_1-4.gb.loci.<csv|txt>: ReproPhylo formatted list of the loci that are in the genbank file.

orthofinder/all_inputs/Results_Jan16/rootknot_phylogenomics: Input and output files of the OC filtering and correction pipeline, with trimal settings of gt=0.7 and st=0.01.
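For readers who want to apply the same trimming thresholds outside the pipeline, the sketch below builds a trimAl command line with the gap threshold (`-gt`) and similarity threshold (`-st`) given above. The file names passed in are placeholders, and the notebook may invoke trimAl differently.

```python
import shutil
import subprocess


def trimal_command(aln_in, aln_out, gt=0.7, st=0.01):
    """Build the trimAl argument list for one alignment."""
    return ["trimal", "-in", aln_in, "-out", aln_out,
            "-gt", str(gt), "-st", str(st)]


def run_trimal(aln_in, aln_out, gt=0.7, st=0.01):
    """Run trimAl if it is installed; raise a clear error otherwise."""
    if shutil.which("trimal") is None:
        raise RuntimeError("trimal not found on PATH")
    subprocess.run(trimal_command(aln_in, aln_out, gt, st), check=True)
```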

orthofinder/all_inputs/Results_Jan16/I2_3X2_gt0.7_st_0.01_alns_<1-4 | all2 | flo2>: Sequence alignments of orthology clusters in which inparalogs are collapsed into a single sequence, OCs with fragmented orthologs are excluded and each genome copy contains up to one copy per sample.
1-4: all the orthology clusters in which there are at least 3 reference samples with 2 gene copies.
all2: a subset of 1-4 in which all the reference samples have two gene copies.
flo2: a subset of 1-4 in which MfloSJF1 has two gene copies.
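The three subsets can be read as nested filters on the per-sample gene copy numbers of each orthology cluster. The sketch below is one interpretation of the definitions above: the input dictionary is hypothetical, and the 1 to 4 copies range is taken from the description of the genbank file rather than from the notebook code.

```python
def oc_subsets(copies, flo_sample="MfloSJF1"):
    """Return the alignment subsets an orthology cluster belongs to.

    copies maps each reference sample to its gene copy number in the cluster.
    """
    subsets = set()
    in_range = all(1 <= n <= 4 for n in copies.values())
    n_two = sum(1 for n in copies.values() if n == 2)
    if in_range and n_two >= 3:
        subsets.add("1-4")
        # all2 and flo2 are defined as subsets of 1-4.
        if n_two == len(copies):
            subsets.add("all2")
        if copies.get(flo_sample) == 2:
            subsets.add("flo2")
    return subsets
```

A cluster with two copies in three samples and one copy in MfloSJF1 would fall in 1-4 only, while a cluster with two copies in every sample would fall in all three subsets.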

Figures

4. Nuclear phylogenomics

Notebook file name: Nuclear_phylogenomics.ipynb

Related files:

orthofinder/all_inputs/Results_Jul02/I2_3X2_gt0.7_st_0.01_alns_1-4/<astralshuffeled | raxmlshuffled>: randomization analyses in which homeolog 1 and homeolog 2 are randomly assigned for each gene.
astralshuffeled: 100 astral runs, in which hom 1 and 2 were randomly assigned for each gene.
raxmlshuffled: 100 raxml supermatrix trees, in which hom 1 and 2 were randomly assigned for each gene, prior to the concatenation of the supermatrix.
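The randomization step common to both analyses can be sketched as follows: for every gene the two homeolog copies are swapped with probability one half, and the procedure is repeated for 100 replicates. The data structure and function names are hypothetical, and the tree inference itself (astral or raxml) is not shown.

```python
import random


def shuffle_homeologs(gene_pairs, seed=None):
    """Randomly assign the labels hom1 and hom2 within each gene.

    gene_pairs maps a gene id to a (copy_a, copy_b) tuple.
    """
    rng = random.Random(seed)
    shuffled = {}
    for gene, (a, b) in gene_pairs.items():
        if rng.random() < 0.5:
            a, b = b, a  # swap the two copies for this gene only
        shuffled[gene] = {"hom1": a, "hom2": b}
    return shuffled


def replicates(gene_pairs, n=100, seed=0):
    """Build n independent randomized assignments."""
    return [shuffle_homeologs(gene_pairs, seed + i) for i in range(n)]
```

Each replicate would then feed either a set of gene trees or a concatenated supermatrix, depending on the analysis.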

orthofinder/all_inputs/Results_Jul02/I2_3X2_gt0.7_st_0.01_alns_1-4/trees.txt: a list of gene trees that were used for astral (non randomized)

orthofinder/all_inputs/Results_Jul02/I2_3X2_gt0.7_st_0.01_alns_1-4/raxmlshuffled/trees.txt: a list of randomized supermatrix trees.

orthofinder/all_inputs/Results_Jul02/I2_3X2_gt0.7_st_0.01_alns_1-4/RAxML_StrictConsensusTree<AstStrict | RaxStrict>: strict consensus trees that resulted from the two randomization analyses with astral and raxml.

orthofinder/all_inputs/Results_Jul02/I2_3X2_gt0.7_st_0.01_alns_1-4/RAxML_<>.merged_clusters_<>:
A thorough raxml tree reconstruction from a supermatrix of all the OCs, following a treeCL analysis confirming their shared phylogeny.

Figures

5. Mitochondrial genome assembly

Notebook file name: Mitochondrial_genome_assembly.ipynb

Related files:

<sample name>_mitobim: mitobim assembly based on mitochondrial gene seeds.
mito_references: reference mitochondrial genomes from ncbi.

6. Mitochondrial genome annotation

Notebook file name: Mitochondrial_genomes_annotation.ipynb

Related files:

mitochondrial_assemblies/<sample name>_genes.fasta: fasta files of mitochondrial genes as predicted by exonerate.

7. Mitochondrial genome phylogenomics

Notebook file name: Mitochondrial_genomes_tree.ipynb

Related files:

mitochondrial_assemblies/phylogenetic_analysis: all the files associated with the reprophylo pipeline.

Figures

8. Intra-genome identity among homeolog gene pairs

Notebook file name: Intra_genome_sequence_divergence.ipynb

Related files:

intrablast_p_ident_dict.pkl: pairwise homoeolog identity values for all the samples.

Figures:

9. Coverage ratio between homeolog contigs within a genome

Notebook file name: Median_ratio.ipynb

Related files:

<sample>_contig_pairs_bwa: read mapping to homoeolog contig pairs.
coverage_ratio_histograms: outputs.
genes_to_contigs.pkl: contig assignment of genes.
OG_contig_relationship.pkl: contig assignments of OCs.
contig_pairs_data: fasta files with contigs pairs.
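As a hedged illustration of what a coverage ratio between two homeolog contigs could look like, the sketch below divides the median per-base read depth of one contig by that of the other. The input format (lists of per-base depths), the direction of the ratio and the function name are assumptions; the notebook may compute the ratio differently.

```python
from statistics import median


def coverage_ratio(depth_a, depth_b):
    """Ratio of median per-base depth between the two contigs of a pair.

    Returns None when the second contig has a median depth of zero.
    """
    median_a = median(depth_a)
    median_b = median(depth_b)
    if median_b == 0:
        return None
    return median_a / median_b
```

For example, per-base depths of `[10, 12, 11]` and `[20, 22, 24]` have medians of 11 and 22, giving a ratio of 0.5.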

Figures

10. Gene Conversion

Notebook file name: GeneConversion.ipynb

Related files:

synteny: all the related files.

Figures

11. Transposable elements

Notebook file name: TE.ipynb

Related files:

TEs: all the related files.

Figures

12. Intra- and interspecific genetic diversity

Notebook file name: GeneticVariation.ipynb


HullUni-bioinformatics/MIG-Phylogenomics
