bg | layout | mathjax | title |
---|---|---|---|
path.jpg | page | true | datacampTutorial |
- Norm
- Special matrices
- Eigendecomposition
- Singular value decomposition
- Pseudoinverse
- Trace operator
- The determinant
- PCA
This series of tutorials will get you through the main linear algebra concepts useful in deep learning, machine learning and data science in general. The goal is to get our hands dirty and bind code with mathematical concepts. There are no particular prerequisites, but if you are not sure what a matrix is or how to do the dot product, you can check the first posts (1 to 4) of my series on the deep learning book by Ian Goodfellow.
I think that having practical tutorials on theoretical topics like linear algebra can be useful because writing and reading code is a good way to truly understand mathematical concepts. And above all, I think it can be a lot of fun!
This first chapter concerns an important concept for machine learning and deep learning: the norm. The norm is extensively used in the context of supervised learning to evaluate the goodness of a model. This example will help you get a sense of what the norm represents. Imagine that you have a big dataset of songs with a lot of metadata and that you want to build a model that predicts the duration of a song from other features like the genre of music, the instrumentation, etc. You think that you have a decent model that you have trained on a part of your database. But now you want to know if your model is good at predicting the duration of a new song. One way to do so is to take some data not used for training your model and predict the song durations. Since you know the real duration of each song, you can then look at the difference between the real and predicted durations for each observation. Imagine that you have the following results in seconds:
errorModel1 = [22, -4, 2, 7, 6, -36, 12]
These differences can be thought of as the error of the model. A perfect model would have only 0's while a very bad model would have huge values.
Now imagine that you try another model and you end up with the following differences between predicted and real song durations:
errorModel2 = [14, 9, -13, 19, 8, -21, 4]
What can you do if you want to find the best model? A natural way to do it is to take the sum of the absolute values of these errors. The absolute value is used because a negative error (true duration smaller than predicted duration) is also an error. The model with the smaller total error is the better one:
# Imports used throughout this tutorial
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

totalErrorModel1 = np.sum(np.abs(errorModel1))
totalErrorModel1
89
totalErrorModel2 = np.sum(np.abs(errorModel2))
totalErrorModel2
88
The total errors of the two models are actually very close: model 2 is just slightly better than model 1.
Congratulations! What you have just done is calculate the norm of the vector of errors!
You can think of the norm as the length of the vector. To get an idea of its graphical representation, let's take our preceding example again. The error vectors are multidimensional: there is one dimension per observation. In our simple case, there are 7 observations, so there are 7 dimensions. It is still quite hard to represent 7 dimensions, so let's simplify the example again and keep only 2 observations:
errorModel1 = [22, -4]
errorModel2 = [14, 9]
Now we can represent these vectors, considering that the first element of the array is the x-coordinate and the second element is the y-coordinate. We will use a custom function, defined below, to plot the vectors.
plotVectors([errorModel1, errorModel2], [sns.color_palette()[0], sns.color_palette()[1]])
plt.xlim(-1, 25)
plt.ylim(-5, 10)
So we have two vectors, and the way to see which one is smaller (hence which model has the smaller error) is to take the sum of the absolute values of their coordinates. This is what we did earlier: the absolute value is needed because we don't want negative and positive coordinates to cancel out.
We just used a custom function to plot vectors in Python. Let's see how it works:
def plotVectors(vecs, cols, alpha=1):
    """
    Plot a set of vectors.

    Parameters
    ----------
    vecs : array-like
        Coordinates of the vectors to plot. Each vector is in an array. For
        instance: [[1, 3], [2, 2]] can be used to plot 2 vectors.
    cols : array-like
        Colors of the vectors. For instance: ['red', 'blue'] will display the
        first vector in red and the second in blue.
    alpha : float or list of floats
        Opacity of the vectors.
    """
    # Draw the axes
    plt.axvline(x=0, color='#A9A9A9', zorder=0)
    plt.axhline(y=0, color='#A9A9A9', zorder=0)

    for i in range(len(vecs)):
        # Per-vector opacity if a list is given
        if isinstance(alpha, list):
            alpha_i = alpha[i]
        else:
            alpha_i = alpha
        # 2 values: the vector starts at (0, 0); 4 values: (x_start, y_start, x_dir, y_dir)
        if len(vecs[i]) == 2:
            x = np.concatenate([[0, 0], vecs[i]])
        elif len(vecs[i]) == 4:
            x = vecs[i]
        plt.quiver([x[0]],
                   [x[1]],
                   [x[2]],
                   [x[3]],
                   angles='xy', scale_units='xy', scale=1, color=cols[i],
                   alpha=alpha_i)
It takes an array of vectors that we want to plot (`vecs`) and their colors (`cols`) as input. If only 2 values are specified for a vector, it starts at (0, 0). Under the hood, we iterate over this array of vectors and use `plt.quiver()` to plot them.
We saw one way to calculate the length of a vector, but a norm can be any function that maps a vector to a non-negative value. Different functions can be used, and we will see a few examples. Such functions are called norms if they satisfy the following properties:
- Norms are non-negative values. If you think of the norm as a length, you can easily see why it can't be negative.
- Norms are $0$ if and only if the vector is the zero vector.
- Norms respect the triangle inequality (see below).
- $\norm{k\cdot \bs{u}}=\vert k\vert\cdot\norm{\bs{u}}$: the norm of a vector multiplied by a scalar is equal to the absolute value of this scalar multiplied by the norm of the vector (see the quick check after this list).
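Before moving on, here is a quick numerical check of the last property with `np.linalg.norm()` (the vector and scalar below are just ones we pick for the check):

```python
import numpy as np

u = np.array([2, -3])
k = -3

np.linalg.norm(k * u)           # 10.8166...
np.abs(k) * np.linalg.norm(u)   # same value
```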
A norm is usually written with two horizontal bars: $\norm{\bs{u}}$.
The triangle inequality says that the norm of the sum of some vectors is less than or equal to the sum of the norms of these vectors: $\norm{\bs{u}+\bs{v}} \leq \norm{\bs{u}}+\norm{\bs{v}}$.
To show what this means, we will take two vectors, each containing two elements (useful because they can be represented as x and y coordinates). Our vectors are:
u = np.array([1, 6])
u
array([1, 6])
and
v = np.array([4, 2])
v
array([4, 2])
Let's compare $\norm{\bs{u}+\bs{v}}$ and $\norm{\bs{u}}+\norm{\bs{v}}$ using the `np.linalg.norm()` function:
np.linalg.norm(u+v)
9.4339811320566032
and
np.linalg.norm(u)+np.linalg.norm(v)
10.554898485297798
We can see that the triangle inequality is respected, since $9.43 \leq 10.55$.
Graphical explanation
You will see that the graphical representation of this theorem makes it quite trivial. We will plot the vectors $\bs{u}$, $\bs{v}$ and $\bs{u}+\bs{v}$:
u = np.array([0,0,1,6])
v = np.array([0,0,4,2])
w = u+v
u_bis = [u[2], u[3], v[2],v[3]]
plotVectors([u, u_bis, w],
[sns.color_palette()[0],
sns.color_palette()[1],
sns.color_palette()[2]])
plt.xlim(-2, 6)
plt.ylim(-2, 9)
plt.text(-1, 3.5, r'$||\vec{u}||$', color=sns.color_palette()[0], size=20)
plt.text(2.5, 7.5, r'$||\vec{v}||$', color=sns.color_palette()[1], size=20)
plt.text(2, 2, r'$||\vec{u}+\vec{v}||$', color=sns.color_palette()[2], size=20)
plt.show()
plt.close()
The length of $\bs{u}$ plus the length of $\bs{v}$ is larger than the length of the vector $\bs{u}+\bs{v}$: geometrically, this simply means that the shortest path between two points is a straight line.
We have seen the conditions required for a function to be called a norm. This means that there are multiple functions that can be used as norms. We will see later the pros and cons of these different norms. We call $p$-norms the following family of functions that depends on $p$: $\norm{\bs{x}}_p=\left(\sum_i \vert\bs{x}_i\vert^p\right)^{1/p}$
Let's dive into this equation step by step. We can see that there is a sum of elements, so we can think of it as an iteration over the elements of $\bs{x}$:
- $\vert\bs{x}_i\vert$: calculate the absolute value of the $i$th element
- $\vert\bs{x}_i\vert^p$: raise it to the power $p$
- $\sum_i\vert\bs{x}_i\vert^p$: sum all these powered absolute values
- $\left(\sum_i\vert\bs{x}_i\vert^p\right)^{1/p}$: take the power $\frac{1}{p}$ of this result (these steps are implemented in the short sketch below)
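These steps translate almost literally into code. Here is a minimal sketch of a $p$-norm function (the helper name `pNorm` is ours, not part of Numpy):

```python
import numpy as np

def pNorm(x, p):
    """Follow the steps above: absolute value, power p, sum, power 1/p."""
    return np.sum(np.abs(x)**p)**(1. / p)

pNorm(np.array([3, 4]), p=2)   # 5.0, same as np.linalg.norm([3, 4])
pNorm(np.array([3, 4]), p=1)   # 7.0
```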
This will be clear with examples using the most widely used values of $p$.
If $p=0$, the formula gives the so-called $L^0$ "norm". Let's see what it means: using the power $0$, $\vert\bs{x}_i\vert^0$ is $1$ for every non-zero element and $0$ for the zero elements (by convention). Therefore this norm corresponds to the number of non-zero elements in the vector. It is not really a norm because if you multiply the vector by a non-zero scalar, this count doesn't change, which violates the homogeneity property we saw above.
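As a small illustration of ours, this count of non-zero elements can be obtained with `np.count_nonzero()`, and `np.linalg.norm()` also accepts `ord=0` for vectors:

```python
import numpy as np

x = np.array([0, 2, 0, 7, 0])

np.count_nonzero(x)        # 2
np.linalg.norm(x, ord=0)   # 2.0
```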
The Euclidean norm is the $p$-norm with $p=2$, also known as the $L^2$ norm: $\norm{\bs{x}}_2=\left(\sum_i \bs{x}_i^2\right)^{1/2}=\sqrt{\sum_i \bs{x}_i^2}$. We can note that the absolute value is not needed anymore since squaring the elements already makes them non-negative. Let's see an example of this norm:
Graphically, the Euclidean norm corresponds to the length of the vector from the origin to the point defined by its coordinates (Pythagorean theorem). We will see an example in 2 dimensions with the vector $\bs{u}=\begin{bmatrix}3\\4\end{bmatrix}$. Let's start by calculating the norm with the formula: $\norm{\bs{u}}_2=\sqrt{3^2+4^2}=\sqrt{25}=5$. So the $L^2$ norm of $\bs{u}$ is $5$.
By the way, we can check the result with the `np.linalg.norm()` function from Numpy:
np.linalg.norm([3, 4])
5.0
Here is the graphical representation of the vector:
u = np.array([3, 4])
plt.ylim(-1, 5)
plt.xlim(-1, 5)
plotVectors([u], [sns.color_palette()[0]])
We can see that the vector goes from the origin (0, 0) to (3, 4) and that its length is 5.
In this case, the vector is in a 2-dimensional space, but this also holds for more dimensions.
The squared Euclidean norm (squared $L^2$ norm) is simply the Euclidean norm without the final square root: $\norm{\bs{x}}_2^2 = \sum_i \bs{x}_i^2$. The squared Euclidean norm is widely used in machine learning, partly because it can be calculated with the vector operation $\bs{x}^\text{T}\bs{x}$. We will see in this example that the squared Euclidean norm can be calculated with vectorized operations. Let's start with the vector $\bs{x}=\begin{bmatrix}2\\5\\3\\3\end{bmatrix}$.
Now let's take the transpose of this vector. This just converts the initial column vector into a row vector: $\bs{x}^\text{T}=\begin{bmatrix}2 & 5 & 3 & 3\end{bmatrix}$. The dot product of $\bs{x}^\text{T}$ and $\bs{x}$ is $\bs{x}^\text{T}\bs{x}=2^2+5^2+3^2+3^2=47$, that is, the sum of the squares of the elements of $\bs{x}$. This is exactly the definition of the squared Euclidean norm!
As usual, we will use code to check the process. First let's create our Numpy vector $\bs{x}$:
x = np.array([[2], [5], [3], [3]])
x
array([[2], [5], [3], [3]])
Then, we will calculate its transpose with the `T` method of Numpy objects and calculate its dot product with $\bs{x}$:
euclideanNorm = x.T.dot(x)
euclideanNorm
array([[47]])
It should be our squared Euclidean norm! Let's calculate it from the $L^2$ norm to check:
np.linalg.norm(x)**2
47.0
It works! The possibility of using a vectorized operation is a huge advantage over the other norms.
We have seen that the norms can be used to evaluate the goodness of a model by summarizing the vectors of errors (see above).
Now we want to go a step further and know how we can change the parameters of our model to reduce the overall error. To do that we can use a cost function that expresses the error of the model as a function of the parameter values. The gradient descent algorithm can be used to find the minimum of this function. Gradient descent works by calculating the derivative with respect to each parameter (the partial derivatives, forming the gradient). This is why it is crucial to be able to calculate these derivatives efficiently.
Indeed, a big advantage of the squared $L^2$ norm is that its partial derivatives are easily computed. We have seen that the squared $L^2$ norm of a vector $\bs{x}$ can be written $\norm{\bs{x}}_2^2 = x_1^2 + x_2^2 + \cdots + x_n^2$. To calculate the partial derivatives, we consider all other variables as constants. For instance, the partial derivative with respect to $x_1$ is $\frac{\partial \norm{\bs{x}}_2^2}{\partial x_1} = 2x_1$. What is great about the gradient of the squared $L^2$ norm is that each partial derivative depends only on the corresponding element of the vector.
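As a rough numerical check (this snippet is ours), we can compare the analytical gradient $2\bs{x}$ of the squared $L^2$ norm with a finite-difference approximation:

```python
import numpy as np

x = np.array([2., 5., 3., 3.])
eps = 1e-6

def squaredL2(v):
    return np.sum(v**2)

# Finite-difference approximation of each partial derivative of the squared L2 norm
numericalGrad = np.array([
    (squaredL2(x + eps * np.eye(len(x))[i]) - squaredL2(x)) / eps
    for i in range(len(x))
])

numericalGrad   # close to [ 4. 10.  6.  6.]
2 * x           # analytical gradient: [ 4. 10.  6.  6.]
```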
In the case of the (non-squared) $L^2$ norm, we have $\norm{\bs{x}}_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$. Let's calculate its derivative with respect to $x_1$: $\frac{\partial \norm{\bs{x}}_2}{\partial x_1} = \frac{x_1}{\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}}$. We can see that this partial derivative depends on every element of the vector and is thus more costly to compute.
The squared $L^2$ norm is convenient, but one problem with it is that it hardly discriminates between 0 and small values, because the function increases very slowly near the origin. We can see this by graphically comparing the squared $L^2$ norm with the $L^2$ norm: the squared $L^2$ norm is almost flat around the origin, while the $L^2$ norm keeps the same slope everywhere. These plots have been made with the help of this website. Go and plot these norms yourself if you need to move them around in order to grasp their shape.
The Frobenius norm is the equivalent of the $L^2$ norm for matrices. It is the square root of the sum of the squares of all the elements of the matrix: $\norm{\bs{A}}_F=\sqrt{\sum_{i,j}A_{i,j}^2}$. This is equivalent to taking the $L^2$ norm of the matrix flattened into a vector. The same Numpy function can be used:
A = np.array([[1, 2], [6, 4], [3, 2]])
A
array([[1, 2], [6, 4], [3, 2]])
np.linalg.norm(A)
8.3666002653407556
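We can also check the equivalence with the $L^2$ norm of the flattened matrix (a small verification we add here):

```python
import numpy as np

A = np.array([[1, 2], [6, 4], [3, 2]])

np.linalg.norm(A)            # 8.3666..., the Frobenius norm
np.linalg.norm(A.flatten())  # same value: L2 norm of the flattened matrix
np.sqrt(np.sum(A**2))        # same value, straight from the formula
```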
The dot product between two vectors $\bs{x}$ and $\bs{y}$ can be retrieved with the $L^2$ norms of these vectors and the angle $\theta$ between them: $\bs{x}^\text{T}\bs{y} = \norm{\bs{x}}_2\cdot\norm{\bs{y}}_2\cos\theta$. Let's take two vectors in 2 dimensions: $\bs{x}=\begin{bmatrix}0\\2\end{bmatrix}$ and $\bs{y}=\begin{bmatrix}2\\2\end{bmatrix}$.
The following plot shows their graphical representation:
x = [0,0,0,2]
y = [0,0,2,2]
plotVectors([x, y], [sns.color_palette()[0], sns.color_palette()[1]])
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.text(-0.5, 1, r'$\vec{x}$', size=18, color=sns.color_palette()[0])
plt.text(1.5, 0.5, r'$\vec{y}$', size=18, color=sns.color_palette()[1])
We took this example for its simplicity. As we can see, the angle $\theta$ between the two vectors is 45°.
First, let's calculate the dot product of the vectors:
x = np.array([0, 2])
y = np.array([2, 2])
x.dot(y)
4
Great! And now let's calculate their norms: $\norm{\bs{x}}_2=\sqrt{0^2+2^2}=2$ and $\norm{\bs{y}}_2=\sqrt{2^2+2^2}=\sqrt{8}$. So with the above formula, we have: $2\times\sqrt{8}\times\cos(45°)=4$.
This is the same result as with the dot product. Here are the operations using Numpy. Just note that we use the function `np.deg2rad()` because `np.cos()` takes the angle in radians, so we have to do the conversion.
# Note: np.cos takes the angle in radians
np.cos(np.deg2rad(45))*2*np.sqrt(8)
4.0000000000000009
Nice!
We will see interesting types of vectors and matrices in this chapter:
- Diagonal matrices
- Symmetric matrices
- Unit vectors
- Orthogonal vectors
- Orthogonal matrices
If you look at the above matrix, you can see that it contains two kinds of values: zeros in light blue and non-zeros in dark blue. A matrix $\bs{A}_{i,j}$ is diagonal if its entries are all zeros except on the diagonal (that is, when $i=j$). In this case the matrix is also square (there is the same number of rows and columns). We can see that all the non-diagonal values are $0$.
There can also be non-square diagonal matrices, for instance with more rows than columns or with more columns than rows; in both cases the non-zero values are still only at the positions where $i=j$.
A diagonal matrix can be denoted $diag(\bs{v})$, where $\bs{v}$ is the vector containing the diagonal values. In this matrix, $\bs{v}$ is the following vector: $\bs{v}=\begin{bmatrix}2\\4\\3\\1\end{bmatrix}$.
The Numpy function `diag()` can be used to create square diagonal matrices:
v = np.array([2, 4, 3, 1])
np.diag(v)
array([[2, 0, 0, 0], [0, 4, 0, 0], [0, 0, 3, 0], [0, 0, 0, 1]])
The multiplication between a diagonal matrix and a vector is thus just a weighting of each element of the vector by the corresponding diagonal value. If $\bs{D}=diag(\bs{v})$ and $\bs{x}$ is a vector of the same size, the dot product of $\bs{D}$ and $\bs{x}$ gives a vector whose $i$th element is $v_i x_i$. It corresponds to a weighting of the vector $\bs{x}$ by the diagonal values of $\bs{D}$.
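Here is a small numerical illustration (the vector `x` below is just one we pick for the example):

```python
import numpy as np

v = np.array([2, 4, 3, 1])
D = np.diag(v)
x = np.array([3, 2, 2, 7])   # a vector we pick for the example

D.dot(x)   # array([6, 8, 6, 7])
v * x      # same result: each element of x is weighted by the matching element of v
```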
Non-square diagonal matrices have the same property: the dot product with a vector still corresponds to a weighting of its elements by the diagonal values, although the result has a different number of dimensions.
If you need some details about the inversion of a matrix, check out my blog post here. The inverse of a square diagonal matrix exists if all entries of the diagonal are non-zero. If that is the case, the inverse is easy to find: it is the diagonal matrix containing the reciprocals of the diagonal values. Also, the inverse doesn't exist if the matrix is non-square. Take the following matrix $\bs{A}$:
A = np.array([[2, 0, 0, 0],
[0, 4, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 1]])
A
array([[2, 0, 0, 0], [0, 4, 0, 0], [0, 0, 3, 0], [0, 0, 0, 1]])
The inverse $\bs{A}^{-1}$ can be calculated with the `np.linalg.inv()` function from Numpy:
A_inv = np.linalg.inv(A)
A_inv
array([[ 0.5 , 0. , 0. , 0. ], [ 0. , 0.25 , 0. , 0. ], [ 0. , 0. , 0.33333333, 0. ], [ 0. , 0. , 0. , 1. ]])
We can check that the dot product of $\bs{A}$ and $\bs{A}^{-1}$ gives the identity matrix:
A.dot(A_inv)
array([[ 1., 0., 0., 0.], [ 0., 1., 0., 0.], [ 0., 0., 1., 0.], [ 0., 0., 0., 1.]])
Great!
Illustration of a symmetric matrix. The matrix $\bs{A}$ is symmetric if it is equal to its transpose: $\bs{A}=\bs{A}^\text{T}$.
This concerns only square matrices.
Here is an example of a symmetric matrix:
A = np.array([[2, 4, -1], [4, -8, 0], [-1, 0, 3]])
A
array([[ 2, 4, -1], [ 4, -8, 0], [-1, 0, 3]])
We will check that it corresponds to its transpose:
A.T
array([[ 2, 4, -1], [ 4, -8, 0], [-1, 0, 3]])
Hooray!
This one is simple! A unit vector is a vector of length equal to 1. It can be denoted by a letter with a hat, for instance $\hat{u}$.
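Any non-zero vector can be turned into a unit vector by dividing it by its norm. A quick illustration of ours:

```python
import numpy as np

u = np.array([3, 4])
u_hat = u / np.linalg.norm(u)   # divide by the norm to get a unit vector

u_hat                   # array([0.6, 0.8])
np.linalg.norm(u_hat)   # 1.0
```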
Two orthogonal vectors are separated by a 90° angle. The dot product of two orthogonal vectors gives 0.
x = [0,0,2,2]
y = [0,0,2,-2]
plt.quiver([x[0], y[0]],
[x[1], y[1]],
[x[2], y[2]],
[x[3], y[3]],
angles='xy', scale_units='xy', scale=1)
plt.xlim(-2, 4)
plt.ylim(-3, 3)
plt.axvline(x=0, color='grey')
plt.axhline(y=0, color='grey')
plt.text(1, 1.5, r'$\vec{u}$', size=18)
plt.text(1.5, -1, r'$\vec{v}$', size=18)
plt.show()
plt.close()
The vectors are $\bs{u}=\begin{bmatrix}2\\2\end{bmatrix}$ and $\bs{v}=\begin{bmatrix}2\\-2\end{bmatrix}$. Let's calculate their dot product: $\bs{u}^\text{T}\bs{v}=2\times2+2\times(-2)=0$.
In addition, when orthogonal vectors have a unit norm, they are called orthonormal.
It is impossible to have more than $n$ mutually orthogonal vectors in $\mathbb{R}^n$.
Orthogonal matrices are important because they have interesting properties. A matrix is orthogonal if its columns are mutually orthogonal and have a unit norm (they are orthonormal) and its rows are also mutually orthonormal.
An orthogonal matrix. For instance, let's have the following matrix: $\bs{A}=\begin{bmatrix}a & b\\c & d\end{bmatrix}$
It is orthogonal if the column vectors $\begin{bmatrix}a\\c\end{bmatrix}$ and $\begin{bmatrix}b\\d\end{bmatrix}$ are orthogonal and if the row vectors $\begin{bmatrix}a & b\end{bmatrix}$ and $\begin{bmatrix}c & d\end{bmatrix}$ are also orthogonal (cf. above for the definition of orthogonal vectors).
We will see that the dot product of an orthogonal matrix with its transpose gives the identity matrix. Let's see why!
Let's have the following matrix: $\bs{A}=\begin{bmatrix}a & b\\c & d\end{bmatrix}$. The transpose of this matrix is: $\bs{A}^\text{T}=\begin{bmatrix}a & c\\b & d\end{bmatrix}$. Let's do their product: $\bs{A}^\text{T}\bs{A}=\begin{bmatrix}a & c\\b & d\end{bmatrix}\begin{bmatrix}a & b\\c & d\end{bmatrix}=\begin{bmatrix}a^2+c^2 & ab+cd\\ab+cd & b^2+d^2\end{bmatrix}$.
We know from the norm chapter that the squared $L^2$ norm of the vector $\begin{bmatrix}a & c\end{bmatrix}$ is equal to $a^2+c^2$. In addition, we just saw that the columns of $\bs{A}$ have a unit norm because $\bs{A}$ is orthogonal, which means that $a^2+c^2=1$ and $b^2+d^2=1$. Also, the off-diagonal term $ab+cd$ corresponds to the dot product of the two columns, and we know that the columns are orthogonal, which means that $ab+cd=0$. We thus have the identity matrix: $\bs{A}^\text{T}\bs{A}=\begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}=\bs{I}$.
The inverse of an orthogonal matrix is equal to its transpose. We have just seen that $\bs{A}^\text{T}\bs{A}=\bs{I}$. The first step is to multiply each side of this equation by $\bs{A}^{-1}$: $(\bs{A}^\text{T}\bs{A})\bs{A}^{-1}=\bs{I}\bs{A}^{-1}$. Recall that a matrix or a vector doesn't change when it is multiplied by the identity matrix (see here), so we have: $(\bs{A}^\text{T}\bs{A})\bs{A}^{-1}=\bs{A}^{-1}$. We also know that matrix multiplication is associative, so we can remove the parentheses: $\bs{A}^\text{T}\bs{A}\bs{A}^{-1}=\bs{A}^{-1}$. We also know from above that $\bs{A}\bs{A}^{-1}=\bs{I}$, so: $\bs{A}^\text{T}\bs{I}=\bs{A}^{-1}$. This shows that $\bs{A}^\text{T}=\bs{A}^{-1}$: the transpose of an orthogonal matrix is equal to its inverse.
You can refer to this question for more details.
Sine and cosine are convenient to create orthogonal matrices. Let's create the following matrix for our example: $\bs{A}=\begin{bmatrix}\cos(50°) & -\sin(50°)\\ \sin(50°) & \cos(50°)\end{bmatrix}$
In Python, we will create the matrix like that:
angle = np.deg2rad(50)
A = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
A
array([[ 0.64278761, -0.76604444], [ 0.76604444, 0.64278761]])
Then, for simplicity we will create variables for each row and each column:
col0 = A[:, 0].reshape(A[:, 0].shape[0], 1)
col1 = A[:, 1].reshape(A[:, 1].shape[0], 1)
row0 = A[0, :].reshape(A[0, :].shape[0], 1)
row1 = A[1, :].reshape(A[1, :].shape[0], 1)
Let's now check that the rows and the columns are orthogonal. If it is the case, the dot product of one column with the other and the dot product of one row with the other have to be equal to $0$:
col0.T.dot(col1)
array([[ 0.]])
row0.T.dot(row1)
array([[ 0.]])
We can also check that $\bs{A}^\text{T}\bs{A}=\bs{I}$:
A.T.dot(A)
array([[ 1., 0.], [ 0., 1.]])
and that the transpose of $\bs{A}$ is equal to its inverse:
A.T
array([[ 0.64278761, 0.76604444], [-0.76604444, 0.64278761]])
np.linalg.inv(A)
array([[ 0.64278761, 0.76604444], [-0.76604444, 0.64278761]])
Everything is correct!
In this chapter we saw different types of matrices with specific properties. It is generally useful to recall these properties when dealing with these kinds of matrices, to simplify equations for instance.
In the next chapter we will see a central idea in linear algebra: the eigendecomposition. Keep reading!
We will see some major concepts of linear algebra in this chapter, so hang on, it's worth it! We will start with getting some ideas on eigenvectors and eigenvalues. We will develop the idea that a matrix can be seen as a linear transformation and that applying a matrix to its eigenvectors gives new vectors with the same direction. Then we will see how to express quadratic equations in matrix form. We will see that the eigendecomposition of the matrix corresponding to a quadratic equation can be used to find the minimum and maximum of this function. As a bonus, we will also see how to visualize linear transformations in Python!
The eigendecomposition is one form of matrix decomposition. Decomposing a matrix means that we want to find a product of matrices that is equal to the initial matrix. In the case of the eigendecomposition, we decompose the initial matrix into the product of its eigenvectors and eigenvalues. First of all, let's see what eigenvectors and eigenvalues are.
The main thing you have to understand here is that matrices can be seen as linear transformations. Some matrices will rotate your space, others will rescale it etc. So when we apply a matrix to a vector, we end up with a transformed version of the vector. When we say that we 'apply' the matrix to the vector it means that we calculate the dot product of the matrix with the vector. We will start with a basic example of this kind of transformation.
First, create the matrix $\bs{A}$:
A = np.array([[-1, 3], [2, -2]])
A
array([[-1, 3], [ 2, -2]])
and the vector $\bs{v}$:
v = np.array([[2], [1]])
v
array([[2], [1]])
Now, plot the vector $\bs{v}$:
plotVectors([v.flatten()], cols=['#1190FF'])
plt.ylim(-1, 4)
plt.xlim(-1, 4)
That's great! You have everything to see the effect of the matrix $\bs{A}$ on this vector. In order to apply the matrix $\bs{A}$ to the vector $\bs{v}$, calculate their dot product and plot the initial and the transformed vectors:
Av = A.dot(v)
print('Av:\n', Av)
plotVectors([v.flatten(), Av.flatten()], cols=['#1190FF', '#FF9A13'])
plt.ylim(-1, 4)
plt.xlim(-1, 4)
Av: [[1] [2]]
A simple vector and its transformation
We can see that applying the matrix $\bs{A}$ to the vector $\bs{v}$ changed its direction.
Now that you can think of matrices as linear transformation recipes, let's see the case of a very special type of vector: the eigenvector.
We have seen an example of a vector transformed by a matrix. Now imagine that the transformation of the initial vector gives us a new vector that has the exact same direction. The scale can be different but the direction is the same. Applying the matrix didn't change the direction of the vector. This special vector is called an eigenvector of the matrix. We will see that finding the eigenvectors of a matrix can be very useful for different purposes.
Imagine that the transformation of the initial vector by the matrix gives a new vector with the exact same direction. This vector is called an eigenvector of $\bs{A}$. This means that $\bs{v}$ is an eigenvector of $\bs{A}$ if $\bs{v}$ and $\bs{Av}$ have the same direction, that is: $\bs{Av}=\lambda\bs{v}$, where $\lambda$ is a scalar called the eigenvalue associated with the eigenvector.
Don't worry, we will see all of this in examples!
First, create a new matrix $\bs{A}$:
A = np.array([[5, 1], [3, 3]])
A
array([[5, 1], [3, 3]])
For this first try, I'll give you an eigenvector of $\bs{A}$: it is $\begin{bmatrix}1\\1\end{bmatrix}$. So you can create the vector:
v = np.array([[1], [1]])
v
array([[1], [1]])
Just in case I'm wrong, please check that $\bs{v}$ is an eigenvector by computing the dot product: $\bs{Av}=\begin{bmatrix}5 & 1\\3 & 3\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix}=\begin{bmatrix}6\\6\end{bmatrix}$, and since $6\begin{bmatrix}1\\1\end{bmatrix}=\begin{bmatrix}6\\6\end{bmatrix}$, we have $\bs{Av}=6\bs{v}$. Looks good! It corresponds to our formula $\bs{Av}=\lambda\bs{v}$. This means that $\bs{v}$ is an eigenvector of $\bs{A}$ and that the corresponding eigenvalue is $\lambda=6$.
We can represent $\bs{v}$ and $\bs{Av}$ graphically to check that they have the same direction:
Av = A.dot(v)
orange = '#FF9A13'
blue = '#1190FF'
plotVectors([Av.flatten(), v.flatten()], cols=[blue, orange])
plt.ylim(-1, 7)
plt.xlim(-1, 7)
We can see that their directions are the same!
Now, I have news for you: there is another eigenvector. Another eigenvector of $\bs{A}$ is $\bs{v}=\begin{bmatrix}1\\-3\end{bmatrix}$, because $\bs{Av}=\begin{bmatrix}5 & 1\\3 & 3\end{bmatrix}\begin{bmatrix}1\\-3\end{bmatrix}=\begin{bmatrix}2\\-6\end{bmatrix}$ and $2\begin{bmatrix}1\\-3\end{bmatrix}=\begin{bmatrix}2\\-6\end{bmatrix}$. We have $\bs{Av}=2\bs{v}$, so the corresponding eigenvalue is $\lambda=2$.
Let's create the vector:
v = np.array([[1], [-3]])
v
array([[ 1], [-3]])
And see its graphical representation:
Av = A.dot(v)
plotVectors([Av.flatten(), v.flatten()], cols=[blue, orange])
plt.ylim(-7, 1)
plt.xlim(-1, 3)
This example shows that the eigenvectors and their transformed versions are parallel: applying $\bs{A}$ to an eigenvector only rescales it by its eigenvalue.
Numpy provides a function returning eigenvectors and eigenvalues. Here is a demonstration with the preceding example:
A = np.array([[5, 1], [3, 3]])
A
array([[5, 1], [3, 3]])
np.linalg.eig(A)
(array([ 6., 2.]), array([[ 0.70710678, -0.31622777], [ 0.70710678, 0.9486833 ]]))
The first array corresponds to the eigenvalues and the second to the eigenvectors concatenated in columns.
We can see that the eigenvalues are the same as the ones we used before: 6 and 2 (first array). The eigenvectors correspond to the columns of the second array. This means that the eigenvector corresponding to $\lambda=6$ is $\begin{bmatrix}0.70710678\\0.70710678\end{bmatrix}$ and the eigenvector corresponding to $\lambda=2$ is $\begin{bmatrix}-0.31622777\\0.9486833\end{bmatrix}$.
You can notice that these vectors are not the ones we used in the example. They look different because they don't necessarily have the same scaling as the ones we gave. We can easily see that the first one corresponds to a scaled version of our $\begin{bmatrix}1\\1\end{bmatrix}$. But the same property stands: we still have $\bs{Av}=\lambda\bs{v}$.
With $\bs{v}=\begin{bmatrix}0.70710678\\0.70710678\end{bmatrix}$, the dot product $\bs{Av}$ still gives $6\bs{v}$.
For the second eigenvector we can check that it corresponds to a scaled version of $\begin{bmatrix} 1\\ -3 \end{bmatrix}$. We can draw these vectors and see if they are parallel.
v = np.array([[1], [-3]])
Av = A.dot(v)
v_np = [-0.31622777, 0.9486833]
plotVectors([Av.flatten(), v.flatten(), v_np], cols=[blue, orange, 'blue'])
plt.ylim(-7, 1)
plt.xlim(-1, 3)
We can see that the vector found with Numpy (in dark blue) is a scaled version of our preceding $\begin{bmatrix} 1\\ -3 \end{bmatrix}$.
As we saw with Numpy, if $\bs{v}$ is an eigenvector of $\bs{A}$, then any rescaled vector $s\bs{v}$ (with $s$ a non-zero scalar) is also an eigenvector of $\bs{A}$, with the same eigenvalue. Let's try to rescale $\bs{v}=\begin{bmatrix}1\\-3\end{bmatrix}$ from our preceding example. For instance, multiply it by $2$: $\bs{v}'=\begin{bmatrix}2\\-6\end{bmatrix}$. So we have: $\bs{Av}'=\begin{bmatrix}5 & 1\\3 & 3\end{bmatrix}\begin{bmatrix}2\\-6\end{bmatrix}=\begin{bmatrix}4\\-12\end{bmatrix}=2\bs{v}'$. We indeed have $\bs{Av}'=\lambda\bs{v}'$ with the same eigenvalue $\lambda=2$.
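We can check this rescaling property numerically (a short sketch of ours using the matrix from this example):

```python
import numpy as np

A = np.array([[5, 1], [3, 3]])
v = np.array([[1], [-3]])        # eigenvector with eigenvalue 2

for s in [1, 2, -5, 0.1]:
    # A(s*v) and 2*(s*v) should be identical for any non-zero scalar s
    print(A.dot(s * v).flatten(), (2 * s * v).flatten())
```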
Now that we have an idea of what eigenvectors and eigenvalues are, we can see how they can be used to decompose a matrix. Let's have a first look at the formula: $\bs{A}=\bs{V}\cdot diag(\bs{\lambda})\cdot\bs{V}^{-1}$
First, what is $\bs{V}$? It is the matrix containing the eigenvectors of $\bs{A}$ concatenated as columns (this is what the second output of `np.linalg.eig(A)` gives): $\bs{V}=\begin{bmatrix}1 & 1\\1 & -3\end{bmatrix}$. The first column $\begin{bmatrix}1\\1\end{bmatrix}$ is the eigenvector corresponding to $\lambda=6$ and the second column $\begin{bmatrix}1\\-3\end{bmatrix}$ is the eigenvector corresponding to $\lambda=2$. Second, the vector $\bs{\lambda}$ contains all the eigenvalues, in the same order as the columns of $\bs{V}$: $\bs{\lambda}=\begin{bmatrix}6\\2\end{bmatrix}$. So $diag(\bs{\lambda})$ is the diagonal matrix with the eigenvalues on its diagonal. Finally, $\bs{V}^{-1}$ is the inverse of the matrix of eigenvectors.
Let's now have a second look at the formula: $\bs{A}=\bs{V}\cdot diag(\bs{\lambda})\cdot\bs{V}^{-1}$
Now that we have seen the formula used to decompose the matrix, let's try to do it for $\bs{A}=\begin{bmatrix}5 & 1\\3 & 3\end{bmatrix}$. Let's create the matrix $\bs{V}$ containing the eigenvectors as columns:
V = np.array([[1, 1], [1, -3]])
V
array([[ 1, 1], [ 1, -3]])
The diagonal matrix is all zeros except for the diagonal, which contains our vector of eigenvalues $\begin{bmatrix}6\\2\end{bmatrix}$.
This can be created with the Numpy function `np.diag()`:
lambdas = np.diag([6, 2])
lambdas
array([[6, 0], [0, 2]])
We also need to calculate the inverse of the matrix $\bs{V}$. We will use the `np.linalg.inv()` function from Numpy:
V_inv = np.linalg.inv(V)
V_inv
array([[ 0.75, 0.25], [ 0.25, -0.25]])
We now have all the elements of the equation $\bs{A}=\bs{V}\cdot diag(\bs{\lambda})\cdot\bs{V}^{-1}$.
Before doing the matrix product in Python (it is a one-liner), we will have a look at the details. We will start by doing the dot product of the first two matrices: $\bs{V}\cdot diag(\bs{\lambda})=\begin{bmatrix}1 & 1\\1 & -3\end{bmatrix}\begin{bmatrix}6 & 0\\0 & 2\end{bmatrix}=\begin{bmatrix}6 & 2\\6 & -6\end{bmatrix}$. So if we replace this into the equation: $\begin{bmatrix}6 & 2\\6 & -6\end{bmatrix}\begin{bmatrix}0.75 & 0.25\\0.25 & -0.25\end{bmatrix}=\begin{bmatrix}5 & 1\\3 & 3\end{bmatrix}$.
With Python:
V.dot(lambdas).dot(V_inv)
array([[ 5., 1.], [ 3., 3.]])
Hooray! It is $\bs{A}$!
In the case of real symmetric matrices (more details about symmetric matrices in chapter 2), the eigendecomposition can be expressed as $\bs{A}=\bs{Q}\bs{\Lambda}\bs{Q}^\text{T}$, where $\bs{Q}$ is the matrix of eigenvectors and $\bs{\Lambda}$ is the diagonal matrix containing the eigenvalues.
The notation can be confusing because the name of the matrix of eigenvectors and the name of the matrix containing the eigenvalues have changed. However the idea is the same: the only difference is that instead of using the inverse of the matrix of eigenvectors we use its transpose.
Let's take an example of a symmetric matrix: $\bs{A}=\begin{bmatrix}6 & 2\\2 & 3\end{bmatrix}$. This matrix is symmetric (we can see that $\bs{A}=\bs{A}^\text{T}$):
A = np.array([[6, 2], [2, 3]])
A
array([[6, 2], [2, 3]])
Now we will calculate its eigenvectors. Remember that the function `np.linalg.eig()` outputs two arrays: the first one contains the eigenvalues and the second one the eigenvectors.
eigVals, eigVecs = np.linalg.eig(A)
eigVecs
array([[ 0.89442719, -0.4472136 ], [ 0.4472136 , 0.89442719]])
So the eigenvectors of $\bs{A}$ are $\begin{bmatrix}0.89442719\\0.4472136\end{bmatrix}$ and $\begin{bmatrix}-0.4472136\\0.89442719\end{bmatrix}$.
We will now put its eigenvalues in a diagonal matrix:
eigVals = np.diag(eigVals)
eigVals
array([[ 7., 0.], [ 0., 2.]])
It gives us the following matrix: $\bs{\Lambda}=\begin{bmatrix}7 & 0\\0 & 2\end{bmatrix}$.
We have everything needed to calculate $\bs{Q}\bs{\Lambda}\bs{Q}^\text{T}$ and check that it gives back $\bs{A}$. Let's calculate the transpose $\bs{Q}^\text{T}$ first:
eigVecs.T
array([[ 0.89442719, 0.4472136 ], [-0.4472136 , 0.89442719]])
So we have: $\bs{Q}\bs{\Lambda}\bs{Q}^\text{T}=\begin{bmatrix}0.89442719 & -0.4472136\\0.4472136 & 0.89442719\end{bmatrix}\begin{bmatrix}7 & 0\\0 & 2\end{bmatrix}\begin{bmatrix}0.89442719 & 0.4472136\\-0.4472136 & 0.89442719\end{bmatrix}$.
It is easier to do it in Python:
eigVecs.dot(eigVals).dot(eigVecs.T)
array([[ 6., 2.], [ 2., 3.]])
We can see that the result corresponds to our initial matrix.
A quadratic function is a function of the form $f(x)=ax^2+bx+c$ (in one variable). For example, let's plot $f(x)=x^2$:
x = np.arange(-11, 11)
y = x**2
plt.figure()
plt.plot(x, y)
plt.xlim(-10, 10)
plt.ylim(0, 100)
plt.show()
In this case the function is a univariate quadratic function, but there can be multivariate ones (with more than one variable). For instance, a bivariate quadratic function can look like a bowl in 3 dimensions.
We will see that eigendecomposition can be used to optimize quadratic functions, that is to say, to find the minimum or the maximum of the function. Indeed, when $\bs{x}$ is a unit eigenvector of the matrix associated with the quadratic form, $f(\bs{x})$ takes the value of the corresponding eigenvalue.
But the first thing to understand is the link between all the matrix manipulations we have done so far and these quadratic equations. The answer is that we can write quadratic functions under the matrix form!
Look at this matrix product: $\begin{bmatrix}x_1 & x_2\end{bmatrix}\begin{bmatrix}a & b\\c & d\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$. Note that there are 2 dimensions here ($x_1$ and $x_2$), but the same reasoning holds for more dimensions. If you develop the product, these matrices correspond to a quadratic function: $ax_1^2+(b+c)x_1x_2+dx_2^2$.
The matrix form is just another way of writing the function.
We can concatenate the variables $x_1$ and $x_2$ into a vector $\bs{x}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$, and let's call $\bs{A}$ the matrix of coefficients $\begin{bmatrix}a & b\\c & d\end{bmatrix}$. Now we can write: $f(\bs{x})=\bs{x}^\text{T}\bs{A}\bs{x}$.
This is a pretty sweet way of writing a quadratic function! We call it the matrix form. This form is useful to do various things with the quadratic equation, like constrained optimization (see below).
If you look at the relation between these forms, you can see that $a$ gives the coefficient of $x_1^2$, $d$ the coefficient of $x_2^2$, and $b+c$ the coefficient of the cross term $x_1x_2$.
In this example, we will see how two different matrices can lead to the same equation.
The reason is that only the sum $b+c$ appears in the quadratic form. Let's start with a matrix $\begin{bmatrix}a & b\\c & d\end{bmatrix}$: remember that you can find the corresponding function by developing the product $\bs{x}^\text{T}\bs{A}\bs{x}$. This gives the following quadratic form: $ax_1^2+(b+c)x_1x_2+dx_2^2$. Now we will try with another matrix that has the same diagonal and the same total for $b+c$, for instance $\begin{bmatrix}a & b+c\\0 & d\end{bmatrix}$: developing the product, we still have the same quadratic form.
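As a quick numerical check (the two matrices below are our own example, chosen with the same $a$, $d$ and the same total $b+c$), both matrices give the same value of $\bs{x}^\text{T}\bs{A}\bs{x}$:

```python
import numpy as np

A1 = np.array([[6, 2],
               [2, 3]])    # b + c = 4
A2 = np.array([[6, 4],
               [0, 3]])    # b + c = 4 as well

x = np.array([[2], [-1]])  # any test vector

x.T.dot(A1).dot(x)   # array([[19]])
x.T.dot(A2).dot(x)   # array([[19]])
```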
We can also try to find the quadratic form from the matrix form. For this example, we will go from the matrix form to the quadratic form using the symmetric matrix $\bs{A}=\begin{bmatrix}6 & 2\\2 & 3\end{bmatrix}$ and $\bs{x}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$. The equation can be retrieved by developing $\bs{x}^\text{T}\bs{A}\bs{x}$. Our quadratic equation is thus: $6x_1^2+4x_1x_2+3x_2^2$.
Note that if $\bs{A}$ is a diagonal matrix, there is no cross term. Take the following matrix form: $\bs{x}^\text{T}\begin{bmatrix}a & 0\\0 & d\end{bmatrix}\bs{x}$. If you develop the product, you get $ax_1^2+dx_2^2$: the cross term has disappeared.
Now that we have seen the link between the matrix form and the quadratic form of an equation, we will build on that and see how we can manipulate the equation in order to simplify it. The simplification we want to make is to get rid of the cross term of the equation (see above). Without the cross term, it will then be easier to characterize the function and eventually optimize it (i.e. find its maximum or minimum).
One way to get rid of the cross term of the equation is to use a change of variable. A change of variable (or linear substitution) simply means that we replace a variable with another one. Here, we want to find new variables $y_1$ and $y_2$ such that the function written in terms of them has no cross term.
Let's take again our previous quadratic form: $6x_1^2+4x_1x_2+3x_2^2$. The change of variable will concern $x_1$ and $x_2$: we will replace them with combinations of the two new variables $y_1$ and $y_2$.
You will see why it is useful to be able to go from the quadratic form to the matrix form: the matrix $\bs{A}$ associated with the form is what we will use to find the change of variable. Let's recall that the matrix form of our equation is $\bs{x}^\text{T}\bs{A}\bs{x}$ with $\bs{A}=\begin{bmatrix}6 & 2\\2 & 3\end{bmatrix}$ and $\bs{x}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$, and that the eigenvectors of $\bs{A}$ are $\begin{bmatrix}0.89442719\\0.4472136\end{bmatrix}$ and $\begin{bmatrix}-0.4472136\\0.89442719\end{bmatrix}$.
With the purpose of simplification, we can replace these values with $\frac{2}{\sqrt{5}}\approx 0.89442719$ and $\frac{1}{\sqrt{5}}\approx 0.4472136$. So our first eigenvector is $\frac{1}{\sqrt{5}}\begin{bmatrix}2\\1\end{bmatrix}$ and our second eigenvector is $\frac{1}{\sqrt{5}}\begin{bmatrix}-1\\2\end{bmatrix}$.
To do the change of variable, we apply the matrix of eigenvectors $\bs{P}=\frac{1}{\sqrt{5}}\begin{bmatrix}2 & -1\\1 & 2\end{bmatrix}$ to the new variables $\bs{y}=\begin{bmatrix}y_1\\y_2\end{bmatrix}$, so we have $\bs{x}=\bs{P}\bs{y}$, that is: $x_1=\frac{1}{\sqrt{5}}(2y_1-y_2)$ and $x_2=\frac{1}{\sqrt{5}}(y_1+2y_2)$. So far so good! To summarize, to get rid of the cross term $x_1x_2$ we replace $x_1$ and $x_2$ by these combinations of $y_1$ and $y_2$ in the quadratic form. Let's do that: $6x_1^2+4x_1x_2+3x_2^2=\frac{1}{5}\left(6(2y_1-y_2)^2+4(2y_1-y_2)(y_1+2y_2)+3(y_1+2y_2)^2\right)=7y_1^2+2y_2^2$.
That's great! Our new equation doesn't have any cross terms!
Actually, there is a simpler way to do the change of variable: we can stay in the matrix form. Recall that we start with the form $f(\bs{x})=\bs{x}^\text{T}\bs{A}\bs{x}$. The linear substitution can be written in these terms: we want to replace the vector of variables $\bs{x}$ by a new vector $\bs{y}$ such that $\bs{x}=\bs{P}\bs{y}$, where $\bs{P}$ is the matrix of eigenvectors of $\bs{A}$. Substituting, we get $f(\bs{x})=(\bs{P}\bs{y})^\text{T}\bs{A}(\bs{P}\bs{y})=\bs{y}^\text{T}\bs{P}^\text{T}\bs{A}\bs{P}\bs{y}$. Can you see how to transform the middle part ($\bs{P}^\text{T}\bs{A}\bs{P}$)? Since $\bs{A}$ is symmetric, its eigendecomposition is $\bs{A}=\bs{P}\bs{\Lambda}\bs{P}^\text{T}$ with $\bs{P}$ orthogonal, so $\bs{P}^\text{T}\bs{A}\bs{P}=\bs{\Lambda}$. All of this implies that we can write $f(\bs{y})=\bs{y}^\text{T}\bs{\Lambda}\bs{y}=7y_1^2+2y_2^2$.
That's nice! If you look back to the change of variable that we have done in the quadratic form, you will see that we have found the same values!
This form (without cross-term) is called the principal axes form.
To summarize, the principal axes form can be found with the eigendecomposition of $\bs{A}$: $\bs{x}^\text{T}\bs{A}\bs{x}=\bs{y}^\text{T}\bs{\Lambda}\bs{y}=\lambda_1 y_1^2+\lambda_2 y_2^2$, where $\bs{\Lambda}$ is the diagonal matrix containing the eigenvalues of $\bs{A}$ and where $\bs{y}=\bs{P}^\text{T}\bs{x}$, with $\bs{P}$ the matrix of eigenvectors. We will now see that there is a way to find the maximum and the minimum of this function when $\bs{x}$ is constrained to be a unit vector.
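We can check numerically that the matrix of eigenvectors diagonalizes $\bs{A}$ (a small sketch of ours with the matrix used in this section):

```python
import numpy as np

A = np.array([[6, 2], [2, 3]])
eigVals, P = np.linalg.eig(A)    # P holds the (orthonormal) eigenvectors as columns

np.round(P.T.dot(A).dot(P), 10)  # array([[7., 0.], [0., 2.]]): the diagonal matrix of eigenvalues
```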
Let's start from $f(\bs{x})=\bs{x}^\text{T}\bs{A}\bs{x}$. We know that if $\bs{x}$ is an eigenvector of $\bs{A}$ with eigenvalue $\lambda$, then $\bs{A}\bs{x}=\lambda\bs{x}$, and therefore $f(\bs{x})=\bs{x}^\text{T}\lambda\bs{x}=\lambda\bs{x}^\text{T}\bs{x}$. Since $\bs{x}$ is constrained to be a unit vector, $\bs{x}^\text{T}\bs{x}=1$ and thus $f(\bs{x})=\lambda$. This is a useful property: if $\bs{x}$ is a unit eigenvector of $\bs{A}$, the quadratic form takes the value of the corresponding eigenvalue.
This example will show that $f(\bs{x})=\lambda$ when $\bs{x}$ is a unit eigenvector. Recall that the eigenvectors of $\bs{A}=\begin{bmatrix}6 & 2\\2 & 3\end{bmatrix}$ are $\begin{bmatrix}0.89442719\\0.4472136\end{bmatrix}$ and $\begin{bmatrix}-0.4472136\\0.89442719\end{bmatrix}$, and the eigenvalues are $7$ and $2$. So if $\bs{x}=\begin{bmatrix}0.89442719\\0.4472136\end{bmatrix}$, then $f(\bs{x})=\bs{x}^\text{T}\bs{A}\bs{x}=7$. In the same way, if $\bs{x}=\begin{bmatrix}-0.4472136\\0.89442719\end{bmatrix}$, then $f(\bs{x})=2$.
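We can also verify this with code (a small check we add):

```python
import numpy as np

A = np.array([[6, 2], [2, 3]])
eigVals, eigVecs = np.linalg.eig(A)

x1 = eigVecs[:, [0]]     # unit eigenvector associated with lambda = 7
x2 = eigVecs[:, [1]]     # unit eigenvector associated with lambda = 2

x1.T.dot(A).dot(x1)      # array([[7.]])
x2.T.dot(A).dot(x2)      # array([[2.]])
```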
Depending on the context, optimizing a function means finding its maximum or its minimum. It is, for instance, widely used to minimize the error of cost functions in machine learning.
It is finally time to see why removing the cross terms of the quadratic equation makes the process of optimization easy. Using matrix form means that we can do the eigendecomposition of the matrix. Here we bind all our recently acquired knowledge and see how eigendecomposition can be used to optimize quadratic functions. We will use functions without cross-term to show why it is easy and why we learned to remove this term.
We will see a case of constrained optimization. This means that we want to find the minimum or the maximum of the function while the variables respect certain constraints. In our example we want to find the minimum and the maximum of the function $f(\bs{x})=\bs{x}^\text{T}\bs{A}\bs{x}$ under the constraint that $\bs{x}$ is a unit vector ($\norm{\bs{x}}_2=1$).
To do that we will start from our last example and use the transformation that removed the cross terms. You will see that it makes the process straightforward!
Here is the function that we want to optimize: $f(\bs{x})=6x_1^2+4x_1x_2+3x_2^2$ subject to $\norm{\bs{x}}_2=1$. After the change of variable we ended up with this equation (no cross term): $f(\bs{y})=7y_1^2+2y_2^2$. Furthermore, the constraint of the unit norm also applies to $\bs{y}$. You might think that it concerns $\bs{x}$ and not $\bs{y}$, but since the change of variable is done with an orthogonal matrix, the norm is preserved: $\norm{\bs{y}}=\norm{\bs{P}^\text{T}\bs{x}}=\norm{\bs{x}}=1$. So the constraint becomes $y_1^2+y_2^2=1$.
Great! We have everything we need to do the optimization. Here is the reasoning: since $y_1^2+y_2^2=1$ and $7 \geq 2$, then: $7y_1^2+2y_2^2 \leq 7y_1^2+7y_2^2 = 7(y_1^2+y_2^2)$. Hence: $f(\bs{y}) \leq 7$. This means that the maximum value of $f$ is $7$, reached when $y_1=1$ and $y_2=0$, that is, when $\bs{x}$ is the unit eigenvector associated with the eigenvalue $7$. The same reasoning can be used to find the minimum of $f$: $7y_1^2+2y_2^2 \geq 2y_1^2+2y_2^2 = 2$. So the minimum of $f$ is $2$, reached when $\bs{x}$ is the unit eigenvector associated with the eigenvalue $2$. We can note that, under the unit-norm constraint, the minimum of $f$ is the smallest eigenvalue of the corresponding matrix and the maximum is its largest eigenvalue.
We saw that quadratic functions can be written in the matrix form $\bs{x}^\text{T}\bs{A}\bs{x}$.
Graphically, these functions can take one of three general shapes (click on the links to go to the Surface Plotter and move the shapes):
1. Positive-definite form
2. Negative-definite form
3. Indefinite form
With the constraint that $\norm{\bs{x}}_2=1$, the sign of these forms is given by the signs of the eigenvalues of $\bs{A}$: all positive for the positive-definite form, all negative for the negative-definite form, and mixed for the indefinite form.
We have seen a lot of things in this chapter. We saw that linear algebra can be used to solve a variety of mathematical problems and, more specifically, that eigendecomposition is a powerful tool! I hope that you can now see a matrix as a linear transformation recipe. To easily visualize what kind of transformation each matrix does, have a look at the BONUS!
We can see the effect of eigenvectors and eigenvalues in a linear transformation. We will first see how a linear transformation works. A linear transformation is a mapping between an input vector and an output vector. Different operations, like projection or rotation, are linear transformations. Every linear transformation can be thought of as applying a matrix to the input vector. We will see the meaning of this graphically. For that purpose, let's start by drawing the set of unit vectors (all the vectors with a norm of 1).
t = np.linspace(0, 2*np.pi, 100)
x = np.cos(t)
y = np.sin(t)
plt.figure()
plt.plot(x, y)
plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)
plt.show()
Then, we will transform each of these points by applying a matrix $\bs{A}$. This is the goal of the function below, which plots:
- the original set of unit vectors
- the transformed set of unit vectors
- the eigenvectors scaled by their eigenvalues
def linearTransformation(transformMatrix):
    """
    Plot the transformation of the unit circle by a matrix, along with the
    eigenvectors scaled by their eigenvalues. Depends on the function plotVectors().

    Parameters
    ----------
    transformMatrix : array-like
        Matrix to apply to the unit circle. For instance: np.array([[1, 3], [2, 2]]).
    """
    # Create the original set of unit vectors (the unit circle)
    t = np.linspace(0, 2*np.pi, 100)
    x = np.cos(t)
    y = np.sin(t)

    # Calculate the eigenvectors and eigenvalues of the matrix
    eigVecs = np.linalg.eig(transformMatrix)[1]
    eigVals = np.diag(np.linalg.eig(transformMatrix)[0])

    # Create vectors of 0 to store the transformed values
    newX = np.zeros(len(x))
    newY = np.zeros(len(x))
    for i in range(len(x)):
        unitVector_i = np.array([x[i], y[i]])
        # Apply the matrix to the vector
        newXY = transformMatrix.dot(unitVector_i)
        newX[i] = newXY[0]
        newY[i] = newXY[1]

    # Plot the unit circle
    plt.plot(x, y)

    # Plot the eigenvectors rescaled by their respective eigenvalues
    plotVectors([eigVals[0, 0]*eigVecs[:, 0], eigVals[1, 1]*eigVecs[:, 1]],
                cols=[sns.color_palette()[1], sns.color_palette()[1]])

    # Plot the transformed circle
    plt.plot(newX, newY)
    plt.xlim(-5, 5)
    plt.ylim(-5, 5)
    plt.show()
A = np.array([[1,-1], [-1, 4]])
linearTransformation(A)
We can see the unit circle in dark blue, the transformed unit circle and the scaled eigenvectors in green.
It is worth noting that the eigenvectors are orthogonal here because the matrix is symmetric. Let's try with a non-symmetric matrix:
A = np.array([[1,1], [-1, 4]])
linearTransformation(A)
In this case, the eigenvectors are not orthogonal!