Revamp of lesson structure + content #40
"Episode 05 - Dimensionality reduction has been completed. figures pca.svg, tsne.svg, MnistExamples.png is added"
Add files via upload
Classification lesson 1st draft
Jens take on Regression
converted jupyter with jupytext to markdown
Update with new changes from Mikes repo
Toms tweaks to the lesson text
Tweaked JensRegression
Changes made to enable delivery of CeR - ML Carpentries - August 2024
Closing for now due to significant changes
Changes to classification
Reopening after a chat with Colin :) I'll go through and make a summary of all the changes we've made along the way: a combination of the initial changes we mentioned in the PR and all the additional changes we built on top of those. The new lesson can be previewed here: https://mike-ivs.github.io/machine-learning-novice-sklearn/
Overall structure
We've adjusted the overall structure of the lesson to give a broad overview of basic ML: what ML is (vs. DL and AI), supervised vs. unsupervised learning, regression, classification, clustering, dimensionality reduction, and ensemble learning. For each of those episodes we made sure to show and compare two different techniques to give a flavour of each topic.
We also tried to reduce the conceptual overhead of ML and to gradually introduce concepts as the lesson progressed.
We've tried to function'ise the code as much as possible: the idea is to slowly go through the process of creating reusable workflow functions before putting them into practice multiple times (new data, hyperparameter changes, etc.), i.e. teaching the underlying workflow and then practising it a few times. We've also tried to keep the datasets as "built-in" as possible to reduce any prep overhead prior to teaching a workshop.
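To make the idea concrete, here's a minimal sketch of what such a reusable workflow function might look like; the function name and defaults are illustrative assumptions, not taken from the lesson itself:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def fit_and_score(model, X, y, test_size=0.25, random_state=0):
    """Split the data, fit the model, and return its test accuracy.

    A hypothetical helper illustrating the 'reusable workflow' idea;
    the lesson's actual functions may differ.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))
```

Once defined, the same helper can be called repeatedly with new data, a different model, or changed hyperparameters, without repeating the split/fit/score boilerplate.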
Hi Team! (the repo looked a bit quiet... I hope this hasn't gone stale! <3 )
We recently ran a "carpentries style" Introduction to Python/ML/DL workshop for which we included this incubator lesson (over other pre-alpha/alpha carpentry incubators) alongside Novice-inflammation and Intro-to-Deep-learning (incubator in Beta).
We were a bit surprised that there is no formal "intro to ML" lesson in the carpentries, and so we decided (as others have, in #37 and here) to pick this incubator lesson as the most established and best suited, and to make a few further changes to content and structure before we delivered.
Now that we've made and delivered the first batch of these changes, we thought it would be useful to feed them back into the lesson and community to get some wider feedback and hopefully help the carpentries get an established "intro to ML" lesson.
I've submitted our changes all at once and will summarise them below in a bit more detail. I'm happy to re-submit them in smaller, by-episode chunks if that is easier for you.
Changes
Overall structure
We've adjusted the overall structure of the lesson to give a more balanced overview of supervised and unsupervised learning, with examples of regression, classification (new), clustering, and dimension reduction.
For each of those episodes we made sure to show and compare two different techniques to give a flavour of each topic.
We also tried to reduce the conceptual overhead of ML and to gradually introduce concepts as the lesson progressed.
We also made some tweaks across the whole lesson to improve text flow, clarity, and formatting, and added a few more figures and plotting code to reinforce the material visually.
Introduction
We overhauled the introduction to give a clearer explanation of what ML actually is, how it relates to DL and AI, and the difference between supervised and unsupervised learning.
We removed the "over-hyping" section: while it may be true that ML/AI is over-hyped, it felt like too negative a tone to take in an introduction to the topic.
Regression
We decided to remove the "create your own Python regression" lesson in favour of using sklearn throughout, combining the two regression lessons into one. We needed the extra time to teach classification, and while I understand the reasoning behind doing a manual regression before using sklearn, it felt like quite a time sink to not be using sklearn in a lesson about "ML with sklearn".
We added in a quick section to introduce supervised learning and sklearn before moving on to regression. We also used a small test dataset instead of the gapminder dataset (as done by #39) to try to reduce the burden on learners of having to understand a dataset while also learning ML for the first time (maybe it's too small a dataset...).
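For reference, the combined sklearn-only regression could be as minimal as the sketch below; the two small arrays are placeholder data standing in for the lesson's small test dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data standing in for the lesson's small test dataset
x = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)  # sklearn expects 2D features
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8, 12.1])

model = LinearRegression()
model.fit(x, y)
print(model.coef_, model.intercept_)  # fitted slope and intercept
print(model.predict([[7]]))           # predict for an unseen x value
```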
Classification
This one felt like it was missing from the original! We made a quick classification lesson based on the same penguin dataset as the "intro-to-DL" lesson. It steps up the coding complexity from a simple two-list dataset, but it feels like a nice intermediate between the regression lesson and the eventual "intro to DL" lesson.
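As a rough illustration of the kind of classification step involved (assuming the penguins data is loaded via seaborn, which may differ from how the lesson actually fetches it):

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Palmer penguins data and drop rows with missing values
penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm",
            "flipper_length_mm", "body_mass_g"]
X = penguins[features]
y = penguins["species"]

# Hold out a test set, fit a small decision tree, and report accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out set
```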
Clustering
We added in a section to explain the idea of unsupervised learning, touched a little on the concept of hyper-parameters, and broke up the code to make a few more plots, giving a bit more visualisation of the clustering process.
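A minimal sketch of that cluster-then-plot pattern, using synthetic blobs as stand-in data (the lesson's actual dataset and hyper-parameter choices may differ):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with four obvious groupings
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# n_clusters is the key hyper-parameter: k-means won't discover it for us
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Visualise the assigned clusters and their centres
plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
plt.scatter(*kmeans.cluster_centers_.T, marker="x", color="red")
plt.show()
```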
Dimension reduction
We expanded this section to try to give a better overview of the MNIST dataset and the high dimensionality of its images. We also tried to give a better explanation of PCA, though, having only just glanced through #39, it would be worth including some of those changes in the lesson!
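To give a flavour of the "images as high-dimensional points" framing, here's a minimal PCA sketch using sklearn's built-in 8x8 digits set as a lightweight stand-in for MNIST:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Each 8x8 image is a point in 64-dimensional space
digits = load_digits()
print(digits.data.shape)  # (1797, 64)

# Project the 64 dimensions down to the 2 main axes of variation
pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)

plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, cmap="tab10", s=10)
plt.colorbar(label="digit")
plt.show()
```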
Neural Networks
We left this section mostly unchanged (apart from minor grammar/flow changes). Given that we ran "intro to ML" AND "intro to DL", we left the NN section to the "intro to DL" part of our workshop, in favour of covering the classical side of ML.
My two cents on the direction of development
Given the advanced development of the "intro to DL" lesson, it might be worth dropping the NN section of this lesson and instead focusing on ensemble learning and/or reinforcement learning in future expansions; they seem to be the only big ML topics that aren't covered, whereas NNs are already a mandatory concept in "intro to DL".
Thanks for all the effort put in so far, and happy to discuss this PR :)