OEAS895: Advanced Data Science Techniques in Ocean, Earth and Environmental Sciences

3 credits, Spring 2023
Dr Sophie Clayton, [email protected]
Office hours: 13:00 - 15:00 M, OCNPS 423
Class times: 9:30 - 10:45 T/Th, OCNPS 403
Link to syllabus
Students will require a laptop. All computational tools used in the course are available for free.

Course description

The Ocean, Earth and Environmental Sciences are quickly moving from being data-poor to data-rich disciplines, with many scientific and industry-related applications enabled by the analysis, synthesis and statistical modeling of large and diverse environmental data sets.

This is an advanced computational analysis course designed to introduce students to data management and analysis methods commonly used in data science applications. The data analysis portion of the course will be primarily based on machine learning methods. The course will also give an overview of a selection of scientific databases which host freely available oceanographic data and output from numerical model simulations. This course is not discipline-specific and will be useful for any students who want to work with data efficiently and gain experience in managing data, developing analytical pipelines and applying machine learning to their research.

The class will meet two days a week, Tuesday and Thursday. Classes will consist of a combination of lectures, discussions and practical coding exercises where collaboration and teamwork will be encouraged. The outcome of the course will be an individual capstone project in which each student applies the techniques learned during the course to a data analysis project based on their own research interests. The project must use at least two different data sources from open scientific databases and may also include data that students have generated themselves. Students will be expected to publish the code and results of their project in a public GitHub repository.

Course schedule with links to notes

A PDF of the schedule can be found here.

Week	Topic	Notes and code	Homework
1 (1/10)	Open Science and FAIR data	Lecture slides
1 (1/12)	Version control, git, GitHub	Version control overview, Intro to git	HW1 (due January 25th 5pm)
2 (1/17)	Data science workflow and project organization	Data science workflow overview, Project organization
2 (1/19)	Intro to environments, Exploratory Data Analysis	conda notes, conda cheatsheet, EDA with pandas jupyter notebook
3 (1/24)	Plotting with seaborn, more pandas EDA		HW2 (due February 3rd 5pm)
3 (1/26)	Environmental databases and toolboxes, mapping with cartopy	List of databases, making a map with cartopy, plotting data on a map
4	NO CLASSES	--	--
5 (2/7)	Machine learning overview	Notes TBA
5 (2/9)	Intro to scikit-learn and Supervised Regression	Nitrate linear regression example	HW3 (due February 22nd 5pm)
6 (2/14)	Scaling, Neural Network Regressors, Nearest Neighbor Regressors, in class practice	Examples of different regression estimators
6 (2/16)	Supervised learning - classification, evaluation and error metrics	Notes TBA
7 (2/21)	In class practice		HW4 (due March 8th 5pm)
7 (2/23)	Supervised learning - KNN and MLPClassifier	iris dataset examples: KNN classifier, MLPClassifier and feature scaling
8 (2/28)	Unsupervised learning - KMeans	KMeans example using the seeds dataset
8 (3/2)	Unsupervised learning and Capstone Project Development		HW5 Capstone Proposal (due March 17th 5pm)
9	NO CLASSES	SPRING BREAK	--
10 (3/14)	Dimensionality Reduction, Feature Extraction with PCA	PCA feature extraction example using the wine dataset
11 (3/21)	Feature Selection methods
11 (3/23)	Thursday: Project work
12 (3/28)	Cross-validation for training models on small datasets
12 (3/30)	Thursday: Project work
13 (4/4)	Paper discussion
13 (4/6)	Thursday: Project work
14 (4/11)	TBD
14 (4/13)	Thursday: Project work
15 (4/18)	Project Presentations		Capstone Presentation Instructions
15 (4/20)	Project Presentations

Learning objectives

Understand FAIR data principles and how to apply them when generating, sharing and accessing data.
Develop a working knowledge of existing ocean and earth science databases and how to efficiently access data from them, including via APIs.
Develop a personal data analysis toolbox using, but not limited to, Python and shell scripts.
Understand and use version control (e.g. git), environments (e.g. conda) and code repositories (e.g. GitHub) to manage and share code.
Understand the underlying principles of machine learning techniques for regression and classification, including supervised and unsupervised learning, and apply them to a targeted research question.
Understand the process of model evaluation and optimization and commonly used metrics for reporting model performance.
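
As a rough illustration of the last two objectives, the sketch below scales features, fits a nearest-neighbor classifier and reports cross-validated accuracy on the iris dataset that ships with scikit-learn. It is not taken from the course materials: the choice of `n_neighbors=5` and `cv=5` is an assumption made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the iris dataset bundled with scikit-learn (150 samples, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Chain scaling and the classifier so the scaler is re-fit on the training
# portion of each fold, which avoids leaking information from the held-out fold.
# n_neighbors=5 is an illustrative choice, not a recommended setting.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validation; for classifiers the default score is accuracy.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {mean_accuracy:.3f}")
```

Wrapping the scaler and the estimator in one pipeline keeps the evaluation honest, since the held-out fold never influences the scaling parameters. The same pattern should carry over to other estimators by swapping the final pipeline step.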

Capstone Project

The goal of the final capstone project is to assess students' ability to combine and apply the skills learned in class in the context of a real-world research problem. The class will mostly focus on tools for data analysis, visualization, and developing and evaluating machine learning models, so this will be the focus of the capstone project. Students must have the dataset(s) and general scope of their capstone project approved by the instructor the week after spring break.

Detailed information on the capstone project is posted here.

sophieclayton/OEAS805_envdatasci
