Mathematics for Machine Learning

004: Matrix Decomposition, Characteristic Equation

Matrix Decomposition

Matrix decomposition, also known as matrix factorization, is the process of breaking down a matrix into a product of simpler matrices. Matrix decomposition is an important concept in linear algebra and has many applications in areas such as signal processing, data analysis, and machine learning.

There are many different types of matrix decompositions, each with its own properties and applications. Here are a few examples:

  1. LU decomposition: A square matrix A can be decomposed into the product of a lower triangular matrix L and an upper triangular matrix U. LU decomposition is used to solve systems of linear equations and to compute the determinant and inverse of a matrix.
import scipy.linalg

# Define the matrix A
A = [[2, 3, 1],
     [6, 13, 5],
     [1, 3, 1]]

# Perform LU decomposition
P, L, U = scipy.linalg.lu(A)

# Print the results
print("Matrix A:")
print(A)
print("P:")
print(P)
print("L:")
print(L)
print("U:")
print(U)

In the code above, we first define the matrix A that we want to decompose. We then use the scipy.linalg.lu function to compute the LU decomposition, which returns the permutation matrix P, the lower triangular matrix L, and the upper triangular matrix U

  1. QR decomposition: A matrix A can be decomposed into the product of an orthogonal matrix Q and an upper triangular matrix R. The QR decomposition is used for least squares regression and eigenvalue computation.
import numpy as np

# Define the matrix A
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Compute QR decomposition
Q, R = np.linalg.qr(A)

# Print the results
print("Matrix A:")
print(A)
print("Q:")
print(Q)
print("R:")
print(R)
  1. Singular Value Decomposition (SVD): is a matrix factorization technique that decomposes a matrix An into the product of three matrices, A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix.

  2. The SVD can be computed using various numerical algorithms, such as the Jacobi algorithm, the Golub-Kahan algorithm, and the QR algorithm. In practice, the SVD is often computed using a divide-and-conquer algorithm that is based on the QR decomposition.

import numpy as np

# Define a matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Perform SVD decomposition
U, s, V = np.linalg.svd(A)

# Print the results
print("U:\n", U)
print("s:\n", s)
print("V:\n", V)
  1. Eigendecomposition: A square matrix A can be decomposed into the product of a diagonal matrix D and a matrix V, where the columns of V are the eigenvectors of A. The eigendecomposition is used to compute the eigenvalues and eigenvectors of a matrix.
import numpy as np

# Define a matrix
A = np.array([[1, 2], [2, 1]])

# Perform eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(A)

# Print the results
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

we define a 2x2 matrix A and then perform eigendecomposition using the numpy.linalg.eig() function.

Note that the eigenvectors returned by numpy.linalg.eig() are normalized to have unit length.

  1. Cholesky decomposition: also known as Cholesky factorization, is a matrix decomposition technique that decomposes a symmetric positive-definite matrix A into the product of a lower triangular matrix L and its transpose, A = LL^T.

Matrix decomposition is an important tool in many areas of mathematics, science, and engineering. By decomposing a matrix into simpler components, we can gain insight into its structure and properties, and use these insights to solve problems and develop new applications.

Laplace Expansion

The Laplace expansion, also known as the cofactor expansion, is a method used to calculate the determinant of a square matrix. The Laplace expansion can be used to expand the determinant of an n x n matrix along any row or column.

To apply the Laplace expansion along a row or column,

  1. We choose one element from that row or column and compute its corresponding cofactor. The cofactor is the determinant of the (n-1) x (n-1) matrix obtained by removing the row and column of the selected element.

  2. We then multiply the selected element by its corresponding cofactor and add the resulting terms. This process is repeated for each element in the row or column, and the resulting sum gives the determinant of the original matrix.

More formally, let A be an n x n matrix, and let aij be an element of A. The Laplace expansion along the ith row is given by:

det(A) = ∑j=1n (-1)i+j aij Mij

where Mij is the cofactor of aij, defined by:

Mij = (-1)i+j * det(Aij)

where Aij is the (n-1) x (n-1) matrix obtained by deleting the ith row and jth column of A.

The Laplace expansion along the jth column is defined similarly, except we sum over the rows instead of the columns:

det(A) = ∑i=1n (-1)i+j aij Mij

where Mij is defined as above.

Calculating Inverse using Minor and Cofactor

The minor of a matrix is the determinant of a square submatrix obtained by deleting some rows and columns of the original matrix.

More formally, given an n x n matrix A, the (i, j)-th minor of A, denoted by M_ij, is the determinant of the (n-1) x (n-1) submatrix obtained by deleting the ith row and jth column of A.

The minors of a matrix are important in the calculation of the cofactors of a matrix and its determinant. Specifically, the cofactor of the (i, j)-th element of A is given by C_ij = (-1)^(i+j) * M_ij, and the determinant of A can be expressed as the sum of the products of the elements of any row or column of A with their corresponding cofactors.

Trace of the Matrix

The trace of a square matrix is the sum of the elements on its main diagonal. In other words, for an n x n matrix A, the trace of A, denoted by tr(A), is given by:

tr(A) = ∑i=1 to n A_ii

where A_ii is the element in the i-th row and i-th column of A.

The trace of a matrix has several important properties. For example, the trace is invariant under cyclic permutations of the matrix product, meaning that for any n x n matrices A, B, and C, we have:

tr(ABC) = tr(CAB) = tr(BCA)

The trace also has a relationship with the determinant and the eigenvalues of a matrix. Specifically, for any n x n matrix A, we have:

tr(A) = λ_1 + λ_2 + ... + λ_n

Characteristic Polynomial

The characteristic polynomial is a polynomial associated with a square matrix. Given an n x n matrix A, the characteristic polynomial is defined as the polynomial det(A - λI), where det denotes the determinant, λ is a scalar, and I is the n x n identity matrix.

In other words, the characteristic polynomial is obtained by subtracting λ times the identity matrix from the given matrix A, computing the determinant of the resulting matrix, and expressing the result as a polynomial in λ.

The degree of the characteristic polynomial is n, the size of the matrix A. The roots of the characteristic polynomial are known as the eigenvalues of matrix A.

Let's take an example:

Suppose we have the matrix A = [2 1; 4 3]. To find the characteristic polynomial of A, we first subtract λ times the identity matrix from A:

A - λI = [2 - λ 1; 4 3 - λ]

Next, we compute the determinant of the resulting matrix:

det(A - λI) = (2 - λ)(3 - λ) - 4 = λ^2 - 5λ + 2

Therefore, the characteristic polynomial of A is given by λ^2 - 5λ + 2. The roots of this polynomial are the eigenvalues of the matrix A, which can be found by solving the equation λ^2 - 5λ + 2 = 0. Using the quadratic formula, we find that the eigenvalues are λ1 ≈ 4.303 and λ2 ≈ 0.697

Determinant and product of eigenvalues

The determinant of a matrix A ∈ Rn×n is the product of its eigenvalues. In other words, if λ1, λ2, ..., λn are the eigenvalues of A, then det(A) = λ1λ2...λn. Recall that the characteristic polynomial of A is defined as det(A - λI), where det denotes the determinant and I is the n x n identity matrix. By the fundamental theorem of algebra, the characteristic polynomial has n complex roots, which are the eigenvalues of A.

Expanding det(A - λI) as a polynomial in λ, we obtain:

det(A - λI) = (-1)^n λ^n + c1 λ^(n-1) + ... + cn-1 λ + cn

where c1, ..., cn are coefficients that depend on the entries of A. Setting λ = 0, we obtain det(A) = (-1)^n cn, which is the product of the eigenvalues of A (with appropriate signs).

Therefore, we conclude that the determinant of matrix A is the product of its eigenvalues.

Eigen Decomposition

Consider the following 2x2 matrix A:

 A= [ 3  1 ]
    [ 1  3 ]

To perform eigendecomposition, we first need to find the eigenvalues and eigenvectors of A.

The eigenvalues are the values λ that satisfy the equation det(A - λI) = 0, where I is the identity matrix. In this case, we have:

det(A - λI) = 
    | 31  |
    |  1   3-λ |  
= (3-λ)(3-λ) - 1 
= λ^2 - 6λ + 8 
= (λ - 4)(λ - 2) = 0

So the eigenvalues of A are λ1 = 2 and λ2 = 4

To find the eigenvectors, we need to solve the equation (A - λI)x = 0 for each eigenvalue λ. For λ1 = 2, we have:

(A - λ1I)x = 
    [ 3-λ1   1 ] [ x1 ]   [ 1 ]
    [  1   3-λ1 ] [ x2 ] = [ 1 ]
Simplifying this equation gives:

    [ 1   1 ] [ x1 ]   [ 0 ]
    [ 1   1 ] [ x2 ] = [ 0 ]

Subtracting the second equation from the first, we get x1 = -x2. Therefore, any vector of the form [x1, -x1] is an eigenvector of A corresponding to λ1 = 2. A possible choice is [1, -1].

For λ2 = 4, we have:
(A - λ2I)x = [ 3-λ2 1 ] [ x1 ] [ 1 ] [ 1 3-λ2 ] [ x2 ] = [ 1 ]

Simplifying this equation gives:

[-1   1 ] [ x1 ]   [ 0 ]
[ 1  -1 ] [ x2 ] = [ 0 ]

Adding the two equations, we get x1 = x2. Therefore, any vector of the form [x1, x1] is an eigenvector of A corresponding to λ2 = 4. A possible choice is [1, 1].

So we have found two eigenvectors for A: v1 = [1, -1] and v2 = [1, 1]. We can normalize these vectors to have unit length:

v1     =     [1/sqrt(2), -1/sqrt(2)]
v2     =     [1/sqrt(2), 1/sqrt(2)]

Now we can write the eigendecomposition of A as:

    [1/sqrt(2) -1/sqrt(2)] [ 2    0 ] [1/sqrt(2)  1/sqrt(2)]
A = [1/sqrt(2)  1/sqrt(2)] [ 0    4 ] [1/sqrt(2) -1/sqrt(2)]

This shows that A can be decomposed into the product of its eigenvectors and a diagonal matrix containing its eigenvalues. This form is useful for computing matrix powers, performing data compression, and other applications.