Introduction to Data Science using Python

This repository contains teaching materials for a 3-day workshop on using Python for data science.

Computational environment

This workshop is run using the Anaconda Python distribution. A conda environment that can run all the materials can be created with:

conda create -n ds_python python=3.7 anaconda

The necessary packages are numpy, pandas, matplotlib, seaborn, scipy, sklearn (scikit-learn), biopython, and statsmodels. The packages plotly and altair are needed for one document but are otherwise not covered in the materials.
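After creating and activating the environment, a quick way to confirm that it can run the materials is to try importing each package. The script below is a minimal sketch, not part of the workshop materials. It assumes only the package list above and the standard library `importlib`. The one subtlety is that the import name can differ from the package name: scikit-learn is imported as `sklearn` and biopython as `Bio`.

```python
import importlib

# Package name -> import name for the required packages.
# Note: scikit-learn is imported as "sklearn" and biopython as "Bio".
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "biopython": "Bio",
    "statsmodels": "statsmodels",
}
# Only needed for a single document.
OPTIONAL = {"plotly": "plotly", "altair": "altair"}


def check_packages(packages):
    """Return a dict mapping package name -> version string, or None if missing."""
    found = {}
    for name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            found[name] = None
        else:
            # Most scientific packages expose __version__; fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for label, packages in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, version in check_packages(packages).items():
            status = version if version is not None else "MISSING"
            print(f"{label:8} {name:13} {status}")
```

Run it from the activated ds_python environment (for example, save it as a script or paste it into a notebook cell). Any line marked MISSING points to a package that still needs to be installed.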

Teaching materials and documents

All the data files used in the workshop are in the data folder. The homework assignments and solutions are in homeworks. The introduction document, all slides, a schedule for the three days, and administrative materials for running the workshop are in the workshop_documents folder. The docs folder contains book-like .pdf and .html versions of the material covered in the workshop.

The material covered during the workshop is in Jupyter notebooks, since they allow Markdown text and embedded figures. During the workshop, most material is covered via live coding in Spyder, Jupyter, or Google Colab. The other component of the class is screencast videos, which allow students to work independently.

The data and notebooks are also available in a Google Drive folder, which gives students who are unfamiliar with coding easy access to the notebooks and data. We found fewer issues with this approach than with other methods of providing the data and live notebooks.
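When the same notebook is opened in Google Colab (reading from the shared Drive folder) and in a local Jupyter or Spyder session, the data location differs. The helper below is a hypothetical sketch of one way to handle both cases. It uses Colab's `google.colab.drive.mount`, which is only importable inside Colab. The Drive folder name `DRIVE_FOLDER` is an assumption and must be adjusted to wherever the shared folder sits in your own Drive.

```python
from pathlib import Path

# Hypothetical location of the shared workshop folder inside Google Drive;
# adjust this to match where the folder was added in your own Drive.
DRIVE_FOLDER = "MyDrive/ds_python_workshop"


def data_root(local_dir="data", drive_folder=DRIVE_FOLDER):
    """Return the directory that holds the workshop data files.

    Inside Google Colab, mount Google Drive and point into the shared folder.
    Anywhere else (Spyder, a local Jupyter server), use the local data folder.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path(local_dir)
    drive.mount("/content/drive")
    return Path("/content/drive") / drive_folder / local_dir
```

Data files can then be opened relative to `data_root()`, so the rest of the notebook stays the same in either environment.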

TODO:

Clean up the data folder and make sure it only contains the material we actually use. The folder new_march has the data we will provide for the March workshop; I believe it has all of it.
Better file system/work setup explanation at the beginning - see day 0 slides
Adjust Genomics project work

Licenses

All software code in this repository is licensed under the MIT License (see LICENSE).

All textual and written material in this repository is licensed under a Creative Commons Attribution 4.0 International License.

ggerlach1/BIOF085