Revamp of lesson structure + content #40
"Episode 05 - Dimensionality reduction has been completed. figures pca.svg, tsne.svg, MnistExamples.png is added"
Add files via upload
Classification lesson 1st draft
Jens take on Regression
converted jupyter with jupytext to markdown
Update with new changes from Mikes repo
Toms tweaks to the lesson text
Tweaked JensRegression
Changes made to enable delivery of CeR - ML Carpentries - August 2024
Closing for now due to significant changes
Changes to classification
Reopening after a chat with Colin :) I'll go through and make a summary of all the changes we've made along the way: a combination of the initial changes we mentioned in the PR and all the additional changes we built on top of those. The new lesson can be previewed here: https://mike-ivs.github.io/machine-learning-novice-sklearn/
Overall structure
We've adjusted the overall structure of the lesson to give a broad overview of basic ML: what ML is (vs. DL and AI), supervised vs. unsupervised learning, regression, classification, clustering, dimensionality reduction, and ensemble learning. For each of those episodes we made sure to show and compare two different techniques to give a flavour of each topic.
We also tried to reduce the conceptual overhead of ML and to gradually introduce concepts as the lesson progressed.
We've tried to function'ise the code as much as possible: the idea is to slowly go through the process of creating reusable workflow functions before putting them into practice multiple times (new data, hyperparameter changes, etc.), i.e. teaching the underlying workflow and then practising it a few times. We've also tried to keep the datasets as "built-in" as possible to reduce any prep overhead prior to teaching a workshop.
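To make the idea concrete, here's a minimal sketch of what such a reusable workflow function might look like; the function name and defaults are illustrative assumptions, not taken from the lesson itself:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def fit_and_score(model, X, y, test_size=0.25, random_state=0):
    """Split the data, fit the model, and return its test accuracy.

    A hypothetical helper illustrating the 'reusable workflow' idea;
    the lesson's actual functions may differ.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))
```

Once defined, the same helper can be called repeatedly with new data, a different model, or changed hyperparameters, without repeating the split/fit/score boilerplate.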
Hi Team! (the repo looked a bit quiet... I hope this hasn't gone stale! <3 )
We recently ran a "carpentries style" Introduction to Python/ML/DL workshop for which we included this incubator lesson (over other pre-alpha/alpha carpentry incubators) alongside Novice-inflammation and Intro-to-Deep-learning (incubator in Beta).
We were a bit surprised that there is no formal "intro to ML" lesson in the carpentries, and so we decided (as others have, in #37 and here) to pick this incubator lesson as the most established and best suited, and to make a few further changes to content and structure before we delivered.
Now that we've made and delivered the first batch of these changes, we thought it would be useful to feed them back into the lesson and community to get some wider feedback and hopefully help the carpentries get an established "intro to ML" lesson.
I've submitted our changes all at once and will summarise them below in a bit more detail. I'm happy to re-submit them in smaller, by-episode chunks if that is easier for you.
Changes
Overall structure
We've adjusted the overall structure of the lesson to give a more balanced overview of supervised and unsupervised learning, with examples of regression, classification (new), clustering, and dimension reduction.
For each of those episodes we made sure to show and compare two different techniques to give a flavour of each topic.
We also tried to reduce the conceptual overhead of ML and to gradually introduce concepts as the lesson progressed.
We also made some tweaks across the whole lesson to improve text flow, clarity, and formatting, and added a few more figures and plotting code to reinforce the material visually.
Introduction
We overhauled the introduction to give a clearer explanation of what ML actually is, how it relates to DL and AI, and the difference between supervised and unsupervised learning.
We removed the "over-hyping" section: while it may be true that ML/AI is over-hyped, it felt like too negative a tone to take in an introduction to the topic.
Regression
We decided to remove the "create your own Python regression" lesson in favour of using sklearn throughout, combining the two regression lessons into one. We needed the extra time to teach classification, and while I understand the reasoning behind doing a manual regression before using sklearn, it felt like quite a time sink to not be using sklearn in a lesson about "ML with sklearn".
We added in a quick section to introduce supervised learning and sklearn before moving on to regression. We also used a small test dataset instead of the gapminder dataset (as done by #39) to try to reduce the burden on learners of having to understand a dataset while also learning ML for the first time (maybe it's too small a dataset...).
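For reference, the combined sklearn-only regression could be as minimal as the sketch below; the two small arrays are placeholder data standing in for the lesson's small test dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data standing in for the lesson's small test dataset
x = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)  # sklearn expects 2D features
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8, 12.1])

model = LinearRegression()
model.fit(x, y)
print(model.coef_, model.intercept_)  # fitted slope and intercept
print(model.predict([[7]]))           # predict for an unseen x value
```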
Classification
This one felt like it was missing from the original! We made a quick classification lesson based on the same penguin dataset as the "intro-to-DL" lesson. It steps up the coding complexity from a simple two-list dataset, but it feels like a nice intermediate between the regression lesson and the eventual "intro to DL" lesson.
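As a rough illustration of the kind of classification step involved (assuming the penguins data is loaded via seaborn, which may differ from how the lesson actually fetches it):

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Palmer penguins data and drop rows with missing values
penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm",
            "flipper_length_mm", "body_mass_g"]
X = penguins[features]
y = penguins["species"]

# Hold out a test set, fit a small decision tree, and report accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out set
```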
Clustering
We added in a section to explain the idea of unsupervised learning, touched a little on the concept of hyper-parameters, and broke up the code to make a few more plots, giving a bit more visualisation of the clustering process.
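A minimal sketch of that cluster-then-plot pattern, using synthetic blobs as stand-in data (the lesson's actual dataset and hyper-parameter choices may differ):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with four obvious groupings
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# n_clusters is the key hyper-parameter: k-means won't discover it for us
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Visualise the assigned clusters and their centres
plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
plt.scatter(*kmeans.cluster_centers_.T, marker="x", color="red")
plt.show()
```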
Dimension reduction
We expanded this section to try to give a better overview of the MNIST dataset and the high dimensionality of its images. We also tried to give a better explanation of PCA, though, having only just glanced through #39, it would be worth including some of those changes in the lesson!
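To give a flavour of the "images as high-dimensional points" framing, here's a minimal PCA sketch using sklearn's built-in 8x8 digits set as a lightweight stand-in for MNIST:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Each 8x8 image is a point in 64-dimensional space
digits = load_digits()
print(digits.data.shape)  # (1797, 64)

# Project the 64 dimensions down to the 2 main axes of variation
pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)

plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, cmap="tab10", s=10)
plt.colorbar(label="digit")
plt.show()
```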
Neural Networks
We left this section mostly unchanged (apart from minor grammar/flow changes). Given that we ran "intro to ML" AND "intro to DL", we left the NN section to the "intro to DL" part of our workshop, in favour of covering the classical side of ML.
My two cents on the direction of development
Given the advanced development of the "intro to DL" lesson, it might be worth dropping the NN section of this lesson and instead focusing on ensemble learning and/or reinforcement learning in future expansions; they seem to be the only big ML topics that aren't covered, whereas NNs are already a mandatory concept in "intro to DL".
Thanks for all the effort put in so far, and happy to discuss this PR :)