sophieclayton/OEAS805_envdatasci

OEAS895: Advanced Data Science Techniques in Ocean, Earth and Environmental Sciences

3 credits, Spring 2023
Dr Sophie Clayton, [email protected]
Office hours: 13:00 - 15:00 M, OCNPS 423
Class times: 9:30 - 10:45 T/Th, OCNPS 403
Link to syllabus
Students will need a laptop. All computational tools used in the course are available for free.

Course description

The Ocean, Earth and Environmental Sciences are quickly moving from data-poor to data-rich disciplines, with many scientific and industry-related applications enabled by the analysis, synthesis and statistical modeling of large and diverse environmental data sets.

This is an advanced computational analysis course designed to introduce students to data management and analysis methods commonly used in data science applications. The data analysis portion of the course will be primarily based on machine learning methods. The course will also give an overview of a selection of scientific databases that host freely available oceanographic data and output from numerical model simulations. This course is not discipline-specific and will be useful for any student who wants to work with data efficiently and gain experience in data management, in sound techniques for developing analytical pipelines, and in applying machine learning to their research.

The class will meet two days a week, on Tuesday and Thursday. Classes will combine lectures, discussions and practical coding exercises, with collaboration and teamwork encouraged. The course culminates in an individual capstone project in which each student applies the techniques learned during the course to a data analysis project based on their own research interests. The project must use at least two different data sources from open scientific databases and may also include data that students have generated themselves. Students will be expected to publish the code and results of their project in a public GitHub repository.

Course schedule with links to notes

A pdf of the schedule can be found here

| Week | Topic | Notes and code | Homework |
| --- | --- | --- | --- |
| 1 (1/10) | Open Science and FAIR data | Lecture slides | |
| 1 (1/12) | Version control, git, GitHub | Version control overview, Intro to git | HW1 (due January 25th 5pm) |
| 2 (1/17) | Data science workflow and project organization | Data science workflow overview, Project organization | |
| 2 (1/19) | Intro to environments, Exploratory Data Analysis | conda notes, conda cheatsheet, EDA with pandas jupyter notebook | |
| 3 (1/24) | Plotting with seaborn, more pandas EDA | | HW2 (due February 3rd 5pm) |
| 3 (1/26) | Environmental databases and toolboxes, mapping with cartopy | List of databases, making a map with cartopy, plotting data on a map | |
| 4 | NO CLASSES | -- | -- |
| 5 (2/7) | Machine learning overview | Notes TBA | |
| 5 (2/9) | Intro to scikit-learn and Supervised Regression | Nitrate linear regression example | HW3 (due February 22nd 5pm) |
| 6 (2/14) | Scaling, Neural Network Regressors, Nearest Neighbor Regressors, in class practice | Examples of different regression estimators | |
| 6 (2/16) | Supervised learning - classification, evaluation and error metrics | Notes TBA | |
| 7 (2/21) | In class practice | | HW4 (due March 8th 5pm) |
| 7 (2/23) | Supervised learning - KNN and MLPClassifier | iris dataset examples: KNN classifier, MLPClassifier and feature scaling | |
| 8 (2/28) | Unsupervised learning - KMeans | KMeans example using the seeds dataset | |
| 8 (3/2) | Unsupervised learning and Capstone Project Development | | HW5 Capstone Proposal (due March 17th 5pm) |
| 9 | NO CLASSES | SPRING BREAK | -- |
| 10 (3/14) | Dimensionality Reduction, Feature Extraction with PCA | PCA feature extraction example using the wine dataset | |
| 11 (3/21) | Feature Selection methods | | |
| 11 (3/23) | Thursday: Project work | | |
| 12 (3/28) | Cross-validation for training models on small datasets | | |
| 12 (3/30) | Thursday: Project work | | |
| 13 (4/4) | Paper discussion | | |
| 13 (4/6) | Thursday: Project work | | |
| 14 (4/11) | TBD | | |
| 14 (4/13) | Thursday: Project work | | |
| 15 (4/18) | Project Presentations | Capstone Presentation Instructions | |
| 15 (4/20) | Project Presentations | | |

Learning objectives

  1. Understand FAIR data principles and how to apply them when generating, sharing and accessing data.
  2. Develop a working knowledge of existing ocean and earth science databases and how to efficiently access data from them, including via APIs.
  3. Develop a personal data analysis toolbox using tools including, but not limited to, Python and shell scripts.
  4. Understand and use version control (e.g. git), environments (e.g. conda) and code repositories (e.g. GitHub) to manage and share code.
  5. Understand the underlying principles of machine learning techniques for regression and classification, including supervised and unsupervised learning, and apply them to a targeted research question.
  6. Understand the process of model evaluation and optimization and commonly used metrics for reporting model performance.
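
As a rough illustration of objectives 5 and 6, the sketch below trains a nearest-neighbor classifier with feature scaling on the iris dataset and reports cross-validated accuracy. It is not taken from the course notes. The choice of estimator, `n_neighbors=5` and `cv=5` are assumptions made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Keeping the scaler inside the pipeline means it is refit on the
# training portion of every fold, so no information leaks from the
# held-out data into the model.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validation; for classifiers the default score is accuracy.
scores = cross_val_score(model, X, y, cv=5)

print("accuracy per fold:", scores.round(2))
print("mean accuracy:", round(scores.mean(), 2))
```

Reporting the per-fold scores alongside the mean gives a sense of how much the estimate varies, which matters most on small datasets.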

Capstone Project

The goal of the final capstone project is to assess students' ability to combine and apply the skills learned in class in the context of a real-world research problem. The class will mostly focus on tools for data analysis, visualization, and developing and evaluating machine learning models, so this will be the focus of the capstone project. Students must have the dataset(s) and general scope of their capstone project approved by the instructor in the week after spring break.

Detailed information on the capstone project is posted here.
