data_mining_influenza

A project for CS 5140/6140 Data Mining at the University of Utah.

Here we derive methods to discover hidden structure in influenza protein and DNA sequences. All of the data comes from the NCBI influenza database: https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database. We aim to use text analysis to discover relationships between influenza strains.
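As a rough illustration of what treating sequences as text can look like, the sketch below splits each sequence into overlapping k-mers (like word shingles in a document) and compares the resulting sets with Jaccard similarity. The function names, the choice of k = 3, and the toy sequences are assumptions made for this example; the notebooks may do this differently.

```python
def kmers(sequence: str, k: int = 3) -> set:
    """Return the set of overlapping length-k substrings (k-mers) of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken to be 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy sequences invented for illustration (not real influenza data).
seq_a = "ATGGCATTAG"
seq_b = "ATGGCTTTAG"  # differs from seq_a at one position
seq_c = "CCCGGGAAAT"  # unrelated to seq_a

sim_ab = jaccard_similarity(kmers(seq_a), kmers(seq_b))
sim_ac = jaccard_similarity(kmers(seq_a), kmers(seq_c))
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

Sequences that share many k-mers score closer to 1, and sequences with no k-mers in common score 0. The Jaccard distance is one minus the similarity.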

The influenza virus is a major public health issue. CDC models estimate that, on average per US flu season from 2010 to 2018, over 28 million people were sickened, over 470,000 were hospitalized, and 42,000 died (https://www.cdc.gov/flu/about/burden/index.html). Rapid DNA sequencing and characterization of influenza viruses are already used to track strains and to attempt to determine the origin of virus strains, particularly the most virulent ones (Garten et al. 2009). Rapid tracking and analysis facilitate the response to seasonal flu and potential flu epidemics. Computational analysis of genetic information is also being used to improve the yearly flu vaccine cocktail.

Project Structure

Most of this project lives in Jupyter notebooks, which contain all of the analysis. A couple of supporting .py files speed up time-consuming tasks and hold larger pieces of utility code.
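One such utility task is reading FASTA-formatted sequence files. The repository contains a FASTA_parse.py file, but its contents are not reproduced here, so the following is only a minimal sketch of what parsing FASTA records involves. The function name `parse_fasta`, the returned dictionary layout, and the example records are assumptions for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence.

    Sequence lines that belong to one record are joined together.
    If a header appears twice, the later record overwrites the earlier one.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the final record once the input is exhausted.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records, invented for this example.
example = """>seq1 hypothetical strain A
ATGGCA
TTAG
>seq2 hypothetical strain B
ATGGCTTTAG
""".splitlines()

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

To parse a file on disk, the same function can be given an open file handle, since it only needs an iterable of lines.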

Most of the images should be described inside of the notebooks where they are created.

-- Brian Avery & Lukas Gust

