diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md index 4de21d6..165d4df 100644 --- a/_episodes/01-introduction.md +++ b/_episodes/01-introduction.md @@ -12,14 +12,14 @@ objectives: keypoints: - "Machine learning is a set of tools and techniques that use data to make predictions." -- "Artificial intelligence is a broader term that refers to making computers show human like intelligence." +- "Artificial intelligence is a broader term that refers to making computers show human-like intelligence." - "Deep learning is a subset of machine learning." - "All machine learning systems have limitations to be aware of." --- # What is machine learning? -Machine learning is a set of techniques that enable computers to use data to improve in their performance of a given task. This is similar in concept to how humans learn to make predictions based upon previous experience and knowledge. Machine learning encompasses a wide range of activities, but broadly speaking it can be used to: find trends in a dataset, classify data into groups or categories, make decisions and predictions based upon data, and even "learn" how to interact with an environment when provided with goals to achieve. +Machine learning is a set of techniques that enable computers to use data to improve their performance in a given task. This is similar in concept to how humans learn to make predictions based upon previous experience and knowledge. Machine learning encompasses a wide range of activities, but broadly speaking it can be used to: find trends in a dataset, classify data into groups or categories, make decisions and predictions based upon data, and even "learn" how to interact with an environment when provided with goals to achieve. ### Machine learning in our daily lives @@ -41,15 +41,15 @@ Machine learning has quickly become an important technology and is now frequentl > 4. learning to interact in an environment {: .challenge} -### Artificial Intelligence vs Machine Learning +### Artificial intelligence vs machine learning The term machine learning (ML) is often mentioned alongside artificial intelligence (AI) and deep learning (DL). Deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence. -AI is a broad term used to describe a system possessing a "general intelligence" that can be applied to solve a diverse range problems, often mimicking the behaviour of intelligent biological systems. Another definition of AI dates back to the 1950s and Alan Turing's "Immitation Game". Turing said we could consider a system intelligent when it could fool a human into thinking they were talking to another human when they were actually talking to a computer. Modern attempts are getting close to fooling humans, but although there have been great advances in AI and ML research, human-like intelligence is only possible in a few specialist areas. +AI is a broad term used to describe a system possessing a "general intelligence" that can be applied to solve a diverse range of problems, often mimicking the behaviour of intelligent biological systems. Another definition of AI dates back to the 1950s and Alan Turing's "Imitation Game". Turing said we could consider a system intelligent when it could fool a human into thinking they were talking to another human when they were actually talking to a computer. 
Modern attempts are getting close to fooling humans, but while there have been great advances in AI and ML research, human-like intelligence is only possible in a few specialist areas. -ML refers to techniques where a computer can "learn" patterns in data, usually by being shown many training examples. While ML-algorithms can learn to solve specific problems, or multiple similar problems, they are not considered to possess a general intelligence. ML-algorithms often need hundreds or thousands of examples to learn a task and are confined to tasks such as simple classifications. A human-like system could learn much quicker than this, and potentially learn from a single example by using it's knowledge of many other problems. +ML refers to techniques where a computer can "learn" patterns in data, usually by being shown many training examples. While ML algorithms can learn to solve specific problems, or multiple similar problems, they are not considered to possess a general intelligence. ML algorithms often need hundreds or thousands of examples to learn a task and are confined to activities such as simple classifications. A human-like system could learn much quicker than this, and potentially learn from a single example by using its knowledge of many other problems. -DL is a particular field of machine learning where algorithms called neural networks are used to create highly-complex systems. Large collections of neural networks are able to learn from vast quantities of data. Deep learning can be used to solve a wide range of problems, but it can also require huge amounts of input data and computational resources to train. +DL is a particular field of machine learning where algorithms called neural networks are used to create highly complex systems. Large collections of neural networks are able to learn from vast quantities of data. Deep learning can be used to solve a wide range of problems, but it can also require huge amounts of input data and computational resources to train. The image below shows the relationships between artificial intelligence, machine learning and deep learning. @@ -65,11 +65,11 @@ The image above is by Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, original > 4. Do you have any examples of the system failing? {: .challenge} -# What are some useful types of Machine Learning? +# Useful types of machine learning This lesson will introduce you to some of the key concepts and sub-domains of ML such as supervised learning, unsupervised learning, and neural networks. -The figure below provides a nice overview of some of the sub-domains of ML and the techniques used within each sub-domain. We recommend checking out the Scikit Learn [webpage](https://scikit-learn.org/stable/index.html) for additional examples of the topics we will cover in this lesson. We will cover topics highlighted in blue: classical learning techniques such as regression, classification, clustering, and dimension reduction, as well as a brief introduction to neural networks using perceptrons. +The figure below provides a nice overview of some of the sub-domains of ML and the techniques used within each sub-domain. We recommend checking out the Scikit-Learn [webpage](https://scikit-learn.org/stable/index.html) for additional examples of the topics we will cover in this lesson. We will cover topics highlighted in blue: classical learning techniques such as regression, classification, clustering, and dimension reduction, as well as a brief introduction to neural networks using perceptrons. 
![Types of Machine Learning](../fig/ML_summary.png) [Image from Vasily Zubarev via their blog](https://vas3k.com/blog/machine_learning/) with modifications in blue to denote lesson content. @@ -82,7 +82,7 @@ There is a classic expression in computer science, "garbage in = garbage out". T ### Biases due to training data -The performance of a ML system depends on the breadth and quality of input data used to train it. If the input data contains biases or blind spots then these will be reflected in the ML system. For example, if we collect data on public transport use only from high-socioeconomic areas, the resulting input data may be biased due to the likelihood of people from those areas to use private transport vs public options. +The performance of a ML system depends on the breadth and quality of input data used to train it. If the input data contains biases or blind spots then these will be reflected in the ML system. For example, if we collect data on public transport use from only high socioeconomic areas, the resulting input data may be biased due to a range of factors that may increase the likelihood of people from those areas using private transport vs public options. ### Extrapolation diff --git a/_episodes/02-regression.md b/_episodes/02-regression.md index 6f37915..d4ee14e 100644 --- a/_episodes/02-regression.md +++ b/_episodes/02-regression.md @@ -3,7 +3,7 @@ title: "Regression" teaching: 30 exercises: 20 questions: -- "What is Supervised Learning?" +- "What is supervised learning?" - "How can I model data and make predictions using regression?" objectives: - "Apply linear regression with Scikit-Learn to create a model." @@ -12,16 +12,16 @@ objectives: - "Understand how more complex models can be built with non-linear equations." - "Apply polynomial modelling to non-linear data using Scikit-Learn." keypoints: -- "Scikit Learn is a Python library with lots of useful machine learning functions." -- "Scikit Learn includes a linear regression function." -- "Scikit Learn includes a polynomial modelling function which is useful for modelling non-linear data." +- "Scikit-Learn is a Python library with lots of useful machine learning functions." +- "Scikit-Learn includes a linear regression function." +- "Scikit-Learn includes a polynomial modelling function which is useful for modelling non-linear data." --- -# Supervised Learning +# Supervised learning -Classical machine learning is often divided into two categories – Supervised and Unsupervised Learning. +Classical machine learning is often divided into two categories – supervised and unsupervised learning. -For the case of supervised learning we act as a "supervisor" or "teacher" for our ML-algorithms by providing the algorithm with "labelled data" that contains example answers of what we wish the algorithm to achieve. +For the case of supervised learning we act as a "supervisor" or "teacher" for our ML algorithms by providing the algorithm with "labelled data" that contains example answers of what we wish the algorithm to achieve. For instance, if we wish to train our algorithm to distinguish between images of cats and dogs, we would provide our algorithm with images that have already been labelled as "cat" or "dog" so that it can learn from these examples. If we wished to train our algorithm to predict house prices over time we would provide our algorithm with example data of datetime values that are "labelled" with house prices. 
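+For illustration, "labelled" training data for the house price example could be as simple as two lists: the independent variable (here, a year) paired with the label we want the model to learn (the price). The names and values below are purely illustrative.
+
+~~~
+# a minimal sketch of labelled data for supervised learning (illustrative values only)
+years = [2015, 2016, 2017, 2018, 2019, 2020]                 # independent variable
+prices = [180000, 195000, 210000, 230000, 245000, 260000]    # labels we want to predict
+
+# each independent value is paired with the label we would like the model to learn
+labelled_examples = list(zip(years, prices))
+print(labelled_examples[0])   # (2015, 180000)
+~~~
+{: .language-python}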
@@ -31,15 +31,15 @@ In this episode we will explore how we can use regression to build a "model" tha ## About Scikit-Learn -[Scikit-Learn](http://github.com/scikit-learn/scikit-learn) is a python package designed to give access to well-known machine learning algorithms within Python code, through a clean API. It has been built by hundreds of contributors from around the world, and is used across industry and academia. +[Scikit-Learn](http://github.com/scikit-learn/scikit-learn) is a Python package designed to give access to well-known machine learning algorithms within Python code, through a clean application programming interface (API). It has been built by hundreds of contributors from around the world, and is used across industry and academia. Scikit-Learn is built upon Python's [NumPy (Numerical Python)](http://numpy.org) and [SciPy (Scientific Python)](http://scipy.org) libraries, which enable efficient in-core numerical and scientific computation within Python. As such, Scikit-Learn is not specifically designed for extremely large datasets, though there is [some work](https://github.com/ogrisel/parallel_ml_tutorial) in this area. For this introduction to ML we are going to stick to processing small to medium datasets with Scikit-Learn, without the need for a graphical processing unit (GPU). # Regression -Regression is a statistical technique that relates a dependent variable (a label in ML terms) to one or more independent variables. A regression model attempts to describe this relation by fitting the data as closely as possible according to a mathematical criteria. This model can then be used to predict new labelled values by inputting the independent variables into it - if we create a house price model we can then input any datetime value we wish to predict a new house price value for that inputted datetime. +Regression is a statistical technique that relates a dependent variable (a label in ML terms) to one or more independent variables. A regression model attempts to describe this relation by fitting the data as closely as possible according to mathematical criteria. This model can then be used to predict new labelled values by inputting the independent variables into it. For example, if we create a house price model we can then feed in any datetime value we wish, and get a new house price value prediction. -Regression can be as simple as drawing a "line of best fit" through data points, known as Linear regression, or more complex models such as polynomial regression, and is used routinely around the world in both industry and research. You may have already used regression in the past without knowing that it is also considered a machine learning technique! +Regression can be as simple as drawing a "line of best fit" through data points, known as linear regression, or as complex as a polynomial regression model, and is used routinely around the world in both industry and research. You may have already used regression in the past without knowing that it is also considered a machine learning technique! ![Example of linear and polynomial regressions](../fig/regression_example.png) @@ -65,7 +65,7 @@ plt.show() ![Inspection of our dataset](../fig/regression_inspect.png) -Now lets import scikit-Learn and use it to create a linear regression model. The Scikit-Learn `regression` function that we will use is designed for datasets where multiple parameters are used and so it expects to be given multi-dimensional array data. 
To get it to accept our single dimension data, we need to convert the simple lists to numpy arrays with numpy's `reshape` function. +Now let's import Scikit-Learn and use it to create a linear regression model. The Scikit-Learn `regression` function that we will use is designed for datasets where multiple parameters are used and so it expects to be given multi-dimensional array data. To get it to accept our single dimension data, we need to convert the simple lists to numpy arrays with numpy's `reshape` function. ~~~ import sklearn.linear_model as skl_lin @@ -110,7 +110,7 @@ plt.show() ![Linear regression of our dataset](../fig/regression_linear.png) -This looks like a reasonably good fit to the data points, but rather than rely on our own judgement lets calculate the fit error instead. Scikit-Learn doesn't provide a root mean squared error function, but it does provide a mean squared error function. We can calculate the root mean squared error simply by taking the square root of the output of this function. The `mean_squared_error` function is part of the Scikit-Learn metrics module, so we'll have to add that to our imports as well as the `math` module: +This looks like a reasonably good fit to the data points, but rather than rely on our own judgement let's calculate the fit error instead. Scikit-Learn doesn't provide a root mean squared error function, but it does provide a mean squared error function. We can calculate the root mean squared error simply by taking the square root of the output of this function. The `mean_squared_error` function is part of the Scikit-Learn `metrics` module, so we'll have to add that to our imports as well as the `math` module: import math import sklearn.metrics as skl_metrics @@ -176,7 +176,7 @@ Comparing the plots and errors it seems like a polynomial regression of N=2 fits {: .challenge} -> ## Exercise: How do are models perform against new data? +> ## Exercise: How do models perform against new data? > We now have some more exam score data that we can use to evaluate our existing models: > ~~~ > x_new = [2.5, 4.5, 6.7, 8, 10, 11] # hours spent revising > {: .solution} {: .challenge} -When looking at our original dataset it seems the higher the degree of polynomial the better the fit as the curve hits all the points. But as soon as we input our new dataset we see that our models fail to predict the new results, and higher degree polynomials noticible perform worse than the original linear regression. This phenomena is known as overfitting - our original models have become too specific to our original data and now lack the generality we expect from a model. You could say that our models have learnt the answers but failed to understand the assignment! +When looking at our original dataset it seems the higher the degree of polynomial, the better the fit, as the curve hits all the points. But as soon as we input our new dataset we see that our models fail to predict the new results, and higher degree polynomials perform noticeably worse than the original linear regression. This phenomenon is known as overfitting - our original models have become too specific to our original data and now lack the generality we expect from a model. You could say that our models have learnt the answers but failed to understand the assignment! Remember: *Garbage in, Garbage out* and *correlation does not equal causation*. 
Just because almost every winner in the olympic games drank water, it doesn't mean that drinking heaps of water will make you an olympic winner. diff --git a/_episodes/03-classification.md b/_episodes/03-classification.md index 9a0da7a..8d5c965 100644 --- a/_episodes/03-classification.md +++ b/_episodes/03-classification.md @@ -6,7 +6,7 @@ questions: - "How can I classify data into known categories?" objectives: - "Use two different supervised methods to classify data." -- "Learn about the concept of Hyper-parameters." +- "Learn about the concept of hyper-parameters." - "Learn to validate and ?cross-validate? models" keypoints: - "Classification requires labelled data (is supervised)" @@ -18,8 +18,8 @@ Classification is a supervised method to recognise and group data objects into a In this lesson we are going to introduce the concept of supervised classification by classifying penguin data into different species of penguins using Scikit-Learn. -### The Penguin dataset -We're going to be using the penguins dataset of Allison Horst, published [here](https://github.com/allisonhorst/palmerpenguins) in 2020, which is comprised of 342 observations of three species of penguins: Adelie, Chinstrap & Gentoo. For each penguin we have measurements of its bill length and depth (mm), flipper length (mm) and body mass (g), as well as information on its species, island, and sex. +### The penguins dataset +We're going to be using the penguins dataset of Allison Horst, published [here](https://github.com/allisonhorst/palmerpenguins) in 2020, which is comprised of 342 observations of three species of penguins: Adelie, Chinstrap & Gentoo. For each penguin we have measurements of bill length and depth (mm), flipper length (mm), body mass (g), and information on species, island, and sex. ~~~ import seaborn as sns @@ -31,11 +31,11 @@ dataset.head() Our aim is to develop a classification model that will predict the species of a penguin based upon measurements of those variables. -As a rule of thumb for ML/DL modelling, it is best to start with a simple model and progressively add complexity to in order to meet our desired classification performance. +As a rule of thumb for ML/DL modelling, it is best to start with a simple model and progressively add complexity in order to meet our desired classification performance. For this lesson we will limit our dataset to only numerical values such as bill_length, bill_depth, flipper_length, and body_mass while we attempt to classify species. -The above table contains multiple categorical objects such as species, If we attempt to include the other categorical fields, island and sex, we hinder classification performance due to the complexity of the data. +The above table contains multiple categorical objects such as species. If we attempt to include the other categorical fields, island and sex, we hinder classification performance due to the complexity of the data. ### Training-testing split When undertaking any machine learning project, it's important to be able to evaluate how well your model works. In order to do this, we set aside some data (usually 20%) as a testing set, leaving the rest as your training dataset. @@ -85,14 +85,14 @@ plt.show() ~~~ {: .language-python} -We can see that penguins from each species form fairly distinct spatial clusters in these plots, so that you could draw lines between those clusters to delineate each species. 
This is effectively what many classification algorithms do - using the training data to delineate the observation space, in this case the 4 measurement dimensions, into classes. When given new observations the model then finds which of those class areas that observation falls in to. +We can see that penguins from each species form fairly distinct spatial clusters in these plots, so that you could draw lines between those clusters to delineate each species. This is effectively what many classification algorithms do. They use the training data to delineate the observation space, in this case the 4 measurement dimensions, into classes. When given a new observation, the model finds which of those class areas the new observation falls into. -## Classification using a Decision Tree -We'll first apply a decision tree classifier to the data. Decisions trees are conceptually similar to flow diagrams (or more precisely for the biologists: dichotomous keys) - they split the classification problem into a binary tree of comparisons, at each step comparing a measurement to a value, and moving left or right down the tree until a classification is reached. +## Classification using a decision tree +We'll first apply a decision tree classifier to the data. Decision trees are conceptually similar to flow diagrams (or more precisely for the biologists: dichotomous keys). They split the classification problem into a binary tree of comparisons, at each step comparing a measurement to a value, and moving left or right down the tree until a classification is reached. (figure) -Training and using a decision tree in scikit-learn is straightforward: +Training and using a decision tree in Scikit-Learn is straightforward: ~~~ from sklearn.tree import DecisionTreeClassifier, plot_tree @@ -121,12 +121,12 @@ plt.show() ~~~ {: .language-python} -We can see from this that there's some very tortuous logic being used to tease out every single observation in the training set - for example the single purple Gentoo node at the bottom of the tree. If we truncated that branch to the second level (Chinstrap), we'd have a little inaccuracy, 5 non-Chinstraps in with 47 Chinstraps, but a less convoluted model. +We can see from this that there's some very tortuous logic being used to tease out every single observation in the training set. For example, consider the single purple Gentoo node at the bottom of the tree. If we truncated that branch to the second level (Chinstrap), we'd have a little inaccuracy, 5 non-Chinstraps in with 47 Chinstraps, but a less convoluted model. -The tortuous logic, such as the bottom purple Gentoo node, is a clear indication that this model is over-fit - it has developed a very complex delineation of the classification space in order to match every single observation, which will likely lead to poor results for new observations. +The tortuous logic, such as the bottom purple Gentoo node, is a clear indication that this model has been over-fitted. It has developed a very complex delineation of the classification space in order to match every single observation, which will likely lead to poor results for new observations. 
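+A quick way to check for over-fitting like this is to compare the model's accuracy on the training data with its accuracy on the held-back test data. The short sketch below assumes the fitted classifier and the train/test splits from earlier are named `clf`, `train_data`/`train_labels` and `test_data`/`test_labels` - adjust the names to match your own code.
+
+~~~
+# accuracy on the data the tree was trained on
+print("Training accuracy:", clf.score(train_data, train_labels))
+
+# accuracy on data the tree has never seen before
+print("Test accuracy:", clf.score(test_data, test_labels))
+~~~
+{: .language-python}
+
+A near-perfect training score combined with a noticeably lower test score is a typical sign that the model has been over-fitted.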
### Visualising the classification space -We can visualise the delineation produced, but only for two parameters at a time, so the model produced here isn't exactly that same as that used above: +We can visualise the delineation produced, but only for two parameters at a time, so the model produced here isn't exactly the same as that used above: ~~~ from sklearn.inspection import DecisionBoundaryDisplay @@ -145,15 +145,15 @@ plt.show() ~~~ {: .language-python} -We can see that rather than clean lines between species, the decision tree produces orthogonal regions as each decision only considers a single parameter. Again, we can see that the model is overfit as the decision space is far more complex than needed, with regions that only select a single point. +We can see that rather than clean lines between species, the decision tree produces orthogonal regions as each decision only considers a single parameter. Again, we can see that the model is over-fitting as the decision space is far more complex than needed, with regions that only select a single point. -## Classification using SVM -Next, we'll look at another commonly used classification algorithm, and see how it compares. Support Vector Machines (SVM) work in a way that is conceptually similar to your own intuition when first looking at the data - they devise a set of hyperplanes that delineate the parameter space, such that each region contains ideally only observations from one class, and the boundaries fall between classes. +## Classification using support vector machines +Next, we'll look at another commonly used classification algorithm, and see how it compares. Support Vector Machines (SVM) work in a way that is conceptually similar to your own intuition when first looking at the data. They devise a set of hyperplanes that delineate the parameter space, such that each region contains ideally only observations from one class, and the boundaries fall between classes. ### Normalising data -Unlike decision trees, SVMs require an additional pre-processing step for our data - we need it to be normalised. Our raw data has parameters with different magnitudes - bill length measured in 10's mm's vs. body mass measured in 1000's of grams. If we trained an SVM directly on this data, it would only consider the parameter with the greatest variance - body mass. +Unlike decision trees, SVMs require an additional pre-processing step for our data. We need to normalise it. Our raw data has parameters with different magnitudes such as bill length measured in 10's of mm's, whereas body mass is measured in 1000's of grams. If we trained an SVM directly on this data, it would only consider the parameter with the greatest variance (body mass). -Normalising maps each parameter to a new range, so that it has a mean of 0, and a standard deviation of 1. +Normalising maps each parameter to a new range so that it has a mean of 0 and a standard deviation of 1. ~~~ from sklearn import preprocessing @@ -217,7 +217,7 @@ plt.ylabel('Accuracy') ~~~ {: .language-python} -Here we can see that a maximum depth of two performs just as well as our original model with a depth of five - in this example if even performs a little better. +Here we can see that a maximum depth of two performs just as well as our original model with a depth of five. In this example it even performs a little better. 
Reusing our visualisation code from above, we can inspect our simplified decision tree and decision space: @@ -255,7 +255,7 @@ We can see that both the tree and the decision space are much simpler, but still ### Note that care is needed when splitting data -- You generally want to ensure that each class is represented proportionately in both training + testing (beware just taking the first 80%) -- Sometimes you want to make sure a group is excluded from the train/test split, e.g.: when multiple samples come from one individual +- You generally want to ensure that each class is represented proportionately in both training and testing (beware of just taking the first 80%). +- Sometimes you want to make sure a group is excluded from the train/test split, e.g.: when multiple samples come from one individual. - This is often called stratification See [Scikit-Learn](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators) for more information. diff --git a/_episodes/04-clustering.md b/_episodes/04-clustering.md index f31d2d9..403e84b 100644 --- a/_episodes/04-clustering.md +++ b/_episodes/04-clustering.md @@ -3,7 +3,7 @@ title: "Clustering with Scikit-Learn" teaching: 15 exercises: 20 questions: -- "What is Unsupervised learning?" +- "What is unsupervised learning?" - "How can we use clustering to find data points with similar attributes?" objectives: - "Understand the difference between supervised and unsupervised learning" @@ -20,27 +20,27 @@ keypoints: - "Scikit-Learn has functions to create example data." --- -# Unsupervised Learning +# Unsupervised learning -In episode 2 we learnt about Supervised Learning. Now it is time to explore Unsupervised Learning. +In episode 2 we learnt about supervised learning. Now it is time to explore unsupervised learning. Sometimes we do not have the luxury of using labelled data. This could be for a number of reasons: -* We have labelled data, but not enough to accurately our train model +* We have labelled data, but not enough to accurately train our model * Our existing labelled data is low-quality or innacurate * It is too time-consuming to (manually) label more data * We have data, but no idea what correlations might exist that we could model! -In this case we need to use unsupervised learning. As the name suggests, this time we do not "supervise" the ML-algorithm by providing it labels, but instead we let it try to find its own patterns in the data and report back on any correlations that it might find. In a sense, you can think of unsupervised learning as a means of discovering labels from the data itself. +In this case we need to use unsupervised learning. As the name suggests, this time we do not "supervise" the ML algorithm by providing it labels, but instead we let it try to find its own patterns in the data and report back on any correlations that it might find. You can think of unsupervised learning as a way to discover labels from the data itself. # Clustering Clustering is the grouping of data points which are similar to each other. It can be a powerful technique for identifying patterns in data. -Clustering analysis does not usually require any training and is therefore known as an 'unsupervised' learning technique. Clustering can be applied quickly due to this lack of training. +Clustering analysis does not usually require any training and is therefore known as an unsupervised learning technique. Clustering can be applied quickly due to this lack of training. 
## Applications of clustering * Looking for trends in data -* Reducing the data around a point to just that point as a form of data compression (e.g. reducing colour depth in an image) +* Reducing the data around a point to just that point (e.g. reducing colour depth in an image) * Pattern recognition ## K-means clustering @@ -73,7 +73,7 @@ data, cluster_id = skl_datasets.make_blobs(n_samples=400, cluster_std=0.75, cent ~~~ {: .language-python} -Now that we have some data we can try to identify the clusters using k-means. First, we need to initialise the KMeans module and tell it how many clusters to look for. Next, we supply it with some data via the `fit` function, in much the same we did with the regression functions earlier on. Finally, we run the predict function to find the clusters. +Now that we have some data we can try to identify the clusters using k-means. First, we need to initialise the KMeans module and tell it how many clusters to look for. Next, we supply it with some data via the `fit` function, in much the same way we did with the regression functions earlier on. Finally, we run the predict function to find the clusters. ~~~ Kmean = skl_cluster.KMeans(n_clusters=4) @@ -82,7 +82,7 @@ clusters = Kmean.predict(data) ~~~ {: .language-python} -The data can now be plotted to show all the points we randomly generated. To make it clearer which cluster points have been classified we can set the colours (the c parameter) to use the `clusters` list that was returned by the predict function. The Kmeans algorithm also lets us know where it identified the centre of each cluster. These are stored as a list called 'cluster_centers_' inside the `Kmean` object. Let's plot the points from the clusters, colouring them by the output from the K-means algorithm, and also plot the centres of each cluster as a red X. +The data can now be plotted to show all the points we randomly generated. To make it clearer which cluster points have been classified we can set the colours (the c parameter) to use the `clusters` list that was returned by the `predict` function. The Kmeans algorithm also lets us know where it identified the centre of each cluster. These are stored as a list called 'cluster_centers_' inside the `Kmean` object. Let's plot the points from the clusters, colouring them by the output from the K-means algorithm, and also plot the centres of each cluster as a red X. ~~~ import matplotlib.pyplot as plt diff --git a/_episodes/05-dimensionality-reduction.md b/_episodes/05-dimensionality-reduction.md index a2113c8..4448470 100644 --- a/_episodes/05-dimensionality-reduction.md +++ b/_episodes/05-dimensionality-reduction.md @@ -16,7 +16,7 @@ keypoints: # Dimensionality reduction -As seen in the last episode, general clustering algorithms work well with low-dimensional data. In this episode we will work with higher-dimension data such as images of handwritten text or numbers. The dataset we will be using is the Modified National Institute of Standards and Technology (MNIST) dataset. The MNIST dataset contains 60,000 handwritten labelled images from 0-9. An illustration of the dataset is presented below. +As seen in the last episode, general clustering algorithms work well with low-dimensional data. In this episode we will work with higher-dimension data such as images of handwritten text or numbers. The dataset we will be using is the Modified National Institute of Standards and Technology (MNIST) dataset. 
The MNIST dataset contains 70,000 images of handwritten numbers from 0-9, labelled with the number they contain. An illustration of the dataset is presented below. TODO EXPLAIN THE 8x8 64 dimensions @@ -65,7 +65,7 @@ The process of reducing dimensionality in PCA is as follows, Minimizing the eigen values closer to zero implies that the dataset has been successfully decomposed into it's respective principal components. -Utilizing Scikit-Learn makes applying PCA very easy. Lets code and apply PCA to the MNIST dataset. +Scikit-Learn lets us apply PCA in a relatively simple way. Let's code and apply PCA to the MNIST dataset. ~~~ # PCA @@ -93,7 +93,7 @@ plt.savefig("pca.svg") As illustrated in the figure above, PCA does not handle outlier data well, primarily due to global preservation of structural information. Pre-determining the principal components also has some of the same drawbacks as k-means clustering approaches. ### t-distributed Stochastic Neighbor Embedding (t-SNE) -t-SNE is a non-deterministic non-linear technique which involves several optional hyperparameters such as perplexity, learning rate, and number of steps. While the t-SNE algorithm is complex to explain, it works on the principle of preserving local similarities by minimizing the pairwise gaussian distance between two or more points in high-dimensional space. The versatility of the algorithm in transforming the underlying structural information into lower-order projections makes t-SNE applicable to a wide range of research domains. +t-SNE is a non-deterministic non-linear technique which involves several optional hyper-parameters such as perplexity, learning rate, and number of steps. While the t-SNE algorithm is complex to explain, it works on the principle of preserving local similarities by minimizing the pairwise gaussian distance between two or more points in high-dimensional space. The versatility of the algorithm in transforming the underlying structural information into lower-order projections makes t-SNE applicable to a wide range of research domains. Scikit-Learn allows us to apply t-SNE in a relatively simple way. Lets code and apply t-SNE to the MNIST dataset. @@ -112,7 +112,7 @@ plt.savefig("tsne.svg") ![Reduction using t-SNE](../fig/tsne.svg) -The major drawback of applying t-SNE to datasets is the large computational requirement. Furthermore, hyperparameter tuning of t-SNE usually requires some trial-and-error to perfect. In the above figure, the algorithm still has trouble in separating all the classes perfectly. To account for even higher-order input data, neural networks were developed to more accurately extract feature information. +The major drawback of applying t-SNE to datasets is the large computational requirement. Furthermore, hyper-parameter tuning of t-SNE usually requires some trial and error to perfect. In the above figure, the algorithm still has trouble separating all the classes perfectly. To account for even higher-order input data, neural networks were developed to more accurately extract feature information. > ## Exercise: Working in three dimensions diff --git a/_episodes/06-neural-networks.md b/_episodes/06-neural-networks.md index 510c63b..6a1ecc2 100644 --- a/_episodes/06-neural-networks.md +++ b/_episodes/06-neural-networks.md @@ -28,8 +28,7 @@ keypoints: # Neural networks -Neural networks are a machine learning method inspired by how the human brain works. They are particularly good at doing pattern recognition and classification tasks, often using images as inputs. 
They are a well-established machine learning technique, having been around since the 1950s, but they've gone through several iterations to overcome limitations in previous generations. Using state-of-the-art neural networks is often referred to as 'deep learning'. - +Neural networks are a machine learning method inspired by how the human brain works. They are particularly good at pattern recognition and classification tasks, often using images as inputs. They are a well-established machine learning technique, having been around since the 1950s, but they've gone through several iterations to overcome limitations in previous generations. Using state-of-the-art neural networks is often referred to as 'deep learning'. ## Perceptrons @@ -41,7 +40,7 @@ Perceptrons are the building blocks of neural networks. They are an artificial v Below is an example of a perceptron written as a Python function. The function takes three parameters: `Inputs` is a list of input values, `Weights` is a list of weight values and `Threshold` is the activation threshold. -First let us multiply each input by the corresponding weight. To do this quickly and concisely, we will use the numpy multiply function which can multiply each item in a list by a corresponding item in another list. +First we multiply each input by the corresponding weight. To do this quickly and concisely, we will use the numpy multiply function which can multiply each item in a list by a corresponding item in another list. We then take the sum of all the inputs multiplied by their weights. Finally, if this value is less than the activation threshold, we output zero, otherwise we output a one. @@ -159,7 +158,7 @@ A single perceptron cannot be used to solve a non-linearly separable function. F Multi-layer perceptrons need to be trained by showing them a set of training data and measuring the error between the network's predicted output and the true value. Training takes an iterative approach that improves the network a little each time a new training example is presented. There are a number of training algorithms available for a neural network today, but we are going to use one of the best established and well known, the backpropagation algorithm. This algorithm is called back propagation because it takes the error calculated between an output of the network and the true value and takes it back through the network to update the weights. If you want to read more about back propagation, please see [this chapter](http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf) from the book "Neural Networks - A Systematic Introduction". -### Multi-layer perceptrons in scikit-learn +### Multi-layer perceptrons in Scikit-Learn We are going to build a multi-layer perceptron for recognising handwriting from images. Scikit-Learn includes some example handwriting data from the [MNIST data set](http://yann.lecun.com/exdb/mnist/), which is a dataset containing 70,000 images of hand-written digits. Each image is 28x28 pixels in size (784 pixels in total) and is represented in grayscale with values between zero for fully black and 255 for fully white. This means we will need 784 perceptrons in our input layer, each taking the input of one pixel and 10 perceptrons in our output layer to represent each digit we might classify. If trained correctly, only the perceptron in the output layer will "fire" to represent the contents of the image (but this is a massive oversimplification!). 
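+To make the link between an image and the 784 input perceptrons concrete, the sketch below uses a made-up array (not the real dataset) to show how a single 28x28 grayscale image can be flattened into one row of 784 pixel values and scaled into the 0-1 range:
+
+~~~
+import numpy as np
+
+# a stand-in for one 28x28 grayscale image, with values from 0 (black) to 255 (white)
+image = np.random.randint(0, 256, size=(28, 28))
+
+# flatten the image into a single row of 784 values, one per input perceptron
+flattened = image.reshape(1, 784)
+
+# scale the pixel values into the range 0-1, as we do for the real data below
+scaled = flattened / 255.0
+print(scaled.shape)   # (1, 784)
+~~~
+{: .language-python}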
@@ -182,7 +181,7 @@ data = data / 255.0 This is instead of writing a loop ourselves to divide every pixel by 255. Although the final result is the same and will take about the same amount of computation (possibly a little less, it might do some clever optimisations). -Now we need to initialise a neural network. Scikit-Learn has an entire library `sklearn.neural_network` for this and the `MLPClassifier` class handles multi-layer perceptrons. This network takes a few parameters including the size of the hidden layer, the maximum number of training iterations we're going to allow, the exact algorithm to use, whether or not we'd like verbose output about what the training is doing, and the initial state of the random number generator. +Now we need to initialise a neural network. Scikit-Learn has an entire library for this (`sklearn.neural_network`) and the `MLPClassifier` class handles multi-layer perceptrons. This network takes a few parameters including the size of the hidden layer, the maximum number of training iterations we're going to allow, the exact algorithm to use, whether or not we'd like verbose output about what the training is doing, and the initial state of the random number generator. In this example we specify a multi-layer perceptron with 50 hidden nodes, we allow a maximum of 50 iterations to train it, we turn on verbose output to see what's happening, and initialise the random state to 1 so that we always get the same behaviour. @@ -239,7 +238,7 @@ mlp.fit(data_train,labels_train) ~~~ {: .language-python} -Finally, let us score the accuracy of our network against both the original training data and the test data. If the training had converged to the point where each iteration of training was not improving the accuracy, then the accuracy of the training data should be 1.0 (100%). +Finally, we will score the accuracy of our network against both the original training data and the test data. If the training had converged to the point where each iteration of training was not improving the accuracy, then the accuracy of the training data should be 1.0 (100%). ~~~ print("Training set score", mlp.score(data_train, labels_train)) @@ -412,7 +411,7 @@ test = 1,2 The `sklearn.model_selection` module provides support for doing k-fold cross validation in Scikit-Learn. It can automatically partition our data for cross validation. -Let us import this and call it `skl_msel` +Import this and call it `skl_msel` ~~~ import sklearn.model_selection as skl_msel @@ -487,13 +486,13 @@ mlp.fit(data,labels) ## Deep learning -Deep learning usually refers to newer neural network architectures which use a special type of network known as a 'convolutional network'. Typically, these have many layers and thousands of neurons. They are very good at tasks such as image recognition but take a long time to train and run. They are often used with GPUs (Graphical Processing Units) which are good at executing multiple operations simultaneously. It is very common to use cloud computing or HPC systems with multiple GPUs attached. +Deep learning usually refers to newer neural network architectures which use a special type of network known as a 'convolutional network'. Typically, these have many layers and thousands of neurons. They are very good at tasks such as image recognition but take a long time to train and run. They are often used with GPUs (Graphical Processing Units) which are good at executing multiple operations simultaneously. 
It is very common to use cloud computing or high performance computing systems with multiple GPUs attached. Scikit-Learn is not really setup for deep learning. We will have to rely on other libraries. Common choices include Google's TensorFlow, Keras, (Py)Torch or Darknet. There is, however, an interface layer between sklearn and tensorflow called skflow. A short example of using this layer can be found at [https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html](https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html). ### Cloud APIs -Google, Microsoft, Amazon, and many others now have cloud based Application Programming Interfaces (APIs) where you can upload an image and have them return you the result. Most of these services rely on a large pre-trained (and often proprietary) neural network. +Google, Microsoft, Amazon, and many other companies now have cloud-based Application Programming Interfaces (APIs) where you can upload an image and have them return you the result. Most of these services rely on a large pre-trained (and often proprietary) neural network. > ## Exercise: Try cloud image classification > Take a photo with your phone camera or find an image online of a common daily scene. diff --git a/_episodes/07-ethics.md b/_episodes/07-ethics.md index 8396d70..04fa3b6 100644 --- a/_episodes/07-ethics.md +++ b/_episodes/07-ethics.md @@ -5,7 +5,7 @@ exercises: 5 questions: - "What are the ethical implications of using machine learning in research?" objectives: -- "Consider the ethical implications of machine learning in general and in research." +- "Consider the ethical implications of machine learning, both in general and in research." keypoints: - "The results of machine learning reflect biases in the training and input data." - "Many machine learning algorithms can't explain how they arrived at a decision." @@ -19,7 +19,7 @@ As machine learning has risen in visibility, so to have concerns around the ethi * The first death from a driverless car which failed to brake for a pedestrian.[\[1\]](https://www.forbes.com/sites/meriameberboucha/2018/05/28/uber-self-driving-car-crash-what-really-happened/) * Highly targetted advertising based around social media and internet usage. [\[2\]](https://www.wired.com/story/big-tech-can-use-ai-to-extract-many-more-ad-dollars-from-our-clicks/) -* The outcomes of elections and referenda being influenced by highly targetted social media posts. This is compunded by data being obtained without the users consent. [\[3\]](https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram) +* The outcomes of elections and referenda being influenced by highly targeted social media posts. This is compounded by data being obtained without the user's consent. [\[3\]](https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram) * The widespread use of facial recognition technologies. [\[4\]](https://www.bbc.co.uk/news/technology-44089161) * The potential for autonomous military robots to be deployed in combat. [\[5\]](https://www.theverge.com/2021/6/3/22462840/killer-robot-autonomous-drone-attack-libya-un-report-context) @@ -31,13 +31,13 @@ Machine learning systems are often argued to be be fairer and more impartial in Many machine learning systems (e.g. neural networks) can't really explain their decisions. 
Although the input and output are known, trying to explain why the training caused the network to behave in a certain way can be very difficult. When decisions are questioned by a human it's -difficult to provide any rationale as to how a decision was arrived at. +difficult to provide any rationale for how a decision was arrived at. ## Problems with accuracy No machine learning system is ever 100% accurate. Getting into the high 90s is usually considered good. But when we're evaluating millions of data items this can translate into 100s of thousands of mis-identifications. -This would be an unacceptable margin of error if the results were going to have major implications for people, such as being imprisoned or structuring debt repayments. +This would be an unacceptable margin of error if the results were going to have major implications for people, such as criminal sentencing decisions or structuring debt repayments. ## Energy use diff --git a/_episodes/08-learn-more.md b/_episodes/08-learn-more.md index 33cfee9..f9a770b 100644 --- a/_episodes/08-learn-more.md +++ b/_episodes/08-learn-more.md @@ -8,12 +8,12 @@ objectives: - "Know where to go to learn more about machine learning" keypoints: - "This course has only touched on a few areas of machine learning and is designed to teach you just enough to do something useful." -- "Machine learning is a rapidly developing field and new tools and techniques are constantly appearing." +- "Machine learning is a rapidly evolving field and new tools and techniques are constantly appearing." --- # Other algorithms -There are many other machine learning algorithms that might be suitable for helping to answer your research questions. +There are many other machine learning algorithms that might be suitable for helping you to answer your research questions. The Scikit-Learn [webpage](https://scikit-learn.org/stable/index.html) has a good overview of all the features available in the library.