Understanding Singular Value Decomposition
Singular value decomposition expresses any real m × n matrix A as the product of three matrices: A = UΣVᵀ. Here, U and V are orthogonal matrices (their columns are mutually perpendicular unit vectors), and Σ is an m × n diagonal matrix containing the non-negative singular values in descending order.
The singular values represent the 'strengths' of directions in your data. Large singular values indicate dominant patterns, while small ones reflect noise or redundancy. This decomposition works for rectangular matrices of any shape, making it far more flexible than eigenvalue decomposition, which requires square matrices.
For complex matrices, replace the transpose Vᵀ with the conjugate transpose V*; U and V are then unitary rather than orthogonal. SVD appears across machine learning (principal component analysis), image processing, signal filtering, and solving ill-posed inverse problems.
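As a quick illustration of the definition, the sketch below uses NumPy (an assumption; the text names no library) on two arbitrary example matrices. `numpy.linalg.svd` returns the singular values as a 1-D array and the third factor already transposed, or conjugate-transposed for complex input.

```python
import numpy as np

# Real 2 x 3 example: U and V come back orthogonal, s sorted descending.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vh = np.linalg.svd(A)
U_orthogonal = np.allclose(U.T @ U, np.eye(2))
V_orthogonal = np.allclose(Vh @ Vh.T, np.eye(3))
descending = bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))

# Complex 2 x 2 example: the third factor is the conjugate transpose V*.
C = np.array([[1 + 2j, 0], [1j, 3]])
Uc, sc, Vch = np.linalg.svd(C)
V = Vch.conj().T                            # recover V from V*
rebuilt = Uc @ np.diag(sc) @ V.conj().T     # U Sigma V*
complex_ok = np.allclose(rebuilt, C)

print(U_orthogonal, V_orthogonal, descending, complex_ok)
```

For the real matrix, AAᵀ has eigenvalues 12 and 10, so the singular values are √12 and √10.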
The SVD Decomposition Formula
Any m × n matrix A can be factored as shown below, where U is m × m, Σ is m × n with singular values on the diagonal, and Vᵀ is n × n:
A = UΣVᵀ
The non-zero singular values σ₁, σ₂, …, σᵣ (where r is the rank of A, so r ≤ min(m, n)) are the square roots of the non-zero eigenvalues of AᵀA or, equivalently, of AAᵀ:
σᵢ = √(λᵢ(AᵀA))
- U: orthogonal m × m matrix whose columns are the left singular vectors (eigenvectors of AAᵀ)
- Σ: diagonal m × n matrix containing the singular values in descending order on the main diagonal
- Vᵀ: transpose of the orthogonal n × n matrix V, whose columns are the right singular vectors (eigenvectors of AᵀA)
- σᵢ: the i-th singular value; always non-negative and ordered σ₁ ≥ σ₂ ≥ … ≥ σᵣ ≥ 0
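To make the shapes and the eigenvalue relation concrete, here is a small NumPy sketch (NumPy and the 3 × 2 example matrix are assumptions chosen for illustration). It requests the full-size factors, builds the m × n matrix Σ from the returned 1-D array, and compares the singular values with the square roots of the eigenvalues of AᵀA.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 0.0]])                 # m = 3, n = 2
m, n = A.shape

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)          # U is m x m, Vt is n x n

# Place the singular values on the diagonal of an m x n Sigma.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# A^T A is symmetric, so eigvalsh applies; it returns ascending order.
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.sqrt(lam))                        # should match s
print(s)
```

Here AᵀA = [[25, 20], [20, 25]] has eigenvalues 45 and 5, so the singular values are √45 and √5.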
Computing SVD by Hand
To manually decompose a matrix, construct two auxiliary square matrices from your m × n input A:
- AᵀA (n × n matrix): its eigenvectors form the columns of V
- AAᵀ (m × m matrix): its eigenvectors form the columns of U
The process:
- Find the eigenvalues of AᵀA and take their square roots to obtain the singular values
- Calculate the corresponding unit eigenvectors of AᵀA and arrange them as columns of V in order of decreasing singular value
- Normalize the eigenvectors of AAᵀ and arrange them as columns of U in the same order, choosing each sign so that Avᵢ = σᵢuᵢ (for σᵢ > 0 you can compute uᵢ = Avᵢ/σᵢ directly)
- Construct Σ with the singular values on the diagonal and zeros elsewhere
- Verify by multiplying: UΣVᵀ should recover A (within numerical precision)
This method highlights why SVD relates fundamentally to eigendecomposition while handling non-square matrices elegantly.
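The hand procedure can be sketched in NumPy as follows. This is a minimal illustration, not a production routine: the function name `svd_via_eig` is hypothetical, it assumes A has full column rank (every σᵢ > 0), and it replaces the separate eigendecomposition of AAᵀ with uᵢ = Avᵢ/σᵢ so that the signs of U and V stay consistent.

```python
import numpy as np

def svd_via_eig(A):
    """Thin SVD of a full-column-rank matrix via the eigenvectors of A^T A."""
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]      # reorder to descending
    lam, V = lam[order], V[:, order]
    s = np.sqrt(lam)                   # singular values
    # u_i = A v_i / sigma_i ties the sign of each u_i to its v_i.
    U = (A @ V) / s                    # broadcasting divides column i by s[i]
    return U, s, V.T

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = svd_via_eig(A)
print(s)                                       # compare with np.linalg.svd(A)[1]
print(np.allclose(U @ np.diag(s) @ Vt, A))     # reconstruction check
```

Forming AᵀA explicitly squares the condition number of the problem, so library routines such as `numpy.linalg.svd` are preferable outside of small teaching examples.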
Special Cases and Properties
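Symmetric matrices: If A is symmetric and positive definite, its singular values equal its eigenvalues, and U = V, so SVD and eigendecomposition coincide. If a symmetric matrix has negative eigenvalues, its singular values are the absolute values of those eigenvalues.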
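A short NumPy check of the symmetric case, using two arbitrary 2 × 2 matrices chosen for illustration. The second half goes slightly beyond the positive definite case: for a symmetric matrix with a negative eigenvalue, the singular values are the absolute values of the eigenvalues.

```python
import numpy as np

# Symmetric positive definite: eigenvalues 3 and 1.
P = np.array([[2.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(P)
eig_desc = np.linalg.eigvalsh(P)[::-1]       # eigvalsh returns ascending order
print(s, eig_desc)                           # singular values match eigenvalues
print(np.allclose(U, Vt.T))                  # U and V agree

# Symmetric but indefinite: eigenvalues 3 and -1.
S = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(np.linalg.svd(S, compute_uv=False))    # singular values
print(np.linalg.eigvalsh(S))                 # eigenvalues, one of them negative
```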
Unitary matrices: For unitary matrices (where AA* = A*A = I), all singular values equal 1. One valid factorization is A = A × I × I, that is, U = A, Σ = I, and V = I.
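As a sanity check, the sketch below computes the singular values of a real rotation matrix and of a small complex unitary matrix. Both matrices and the angle are arbitrary choices for illustration.

```python
import numpy as np

theta = 0.7                                        # arbitrary angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation, so R R^T = I

W = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)               # complex unitary, W W* = I

s_R = np.linalg.svd(R, compute_uv=False)
s_W = np.linalg.svd(W, compute_uv=False)
print(np.allclose(W @ W.conj().T, np.eye(2)))      # confirms W is unitary
print(s_R, s_W)                                    # all equal to 1 up to rounding
```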
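Low-rank approximations: Keep only the largest k singular values and the corresponding columns of U and V to create a rank-k approximation. By the Eckart-Young theorem, no other rank-k matrix is closer to A in the spectral or Frobenius norm. This is the foundation of data compression and noise reduction.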
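A minimal sketch of truncation in NumPy, assuming a random 6 × 4 test matrix and a hypothetical helper named `rank_k_approx`. It also checks a standard fact: the spectral-norm error of the rank-k approximation equals the first discarded singular value.

```python
import numpy as np

def rank_k_approx(A, k):
    """Rank-k approximation built from the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k columns of U by their singular values, then project.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)               # fixed seed for repeatability
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)

A1 = rank_k_approx(A, 1)
print(np.linalg.matrix_rank(A1))             # rank of the approximation
print(np.linalg.norm(A - A1, 2), s[1])       # error vs. first discarded value
```

Storing the truncated factors takes k(m + n + 1) numbers instead of mn, which is where the compression comes from.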
Uniqueness: The singular values, and therefore Σ, are always unique once ordered in descending order. However, U and V are not unique: you can multiply a column of U and the matching column of V by −1, or rotate the singular vectors within the subspace of a repeated singular value, and still obtain a valid SVD.
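The sign ambiguity is easy to demonstrate. The sketch below (NumPy, with an arbitrary 2 × 2 example) flips the first left and right singular vectors together and confirms the product is unchanged, while flipping only one side breaks the factorization.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

# Flip the first column of U and the first row of V^T together.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1
same_product = np.allclose(U2 @ np.diag(s) @ Vt2, A)   # still a valid SVD
different_factors = not np.allclose(U2, U)

# Flipping only U changes the product, so the signs must move in pairs.
U3 = U.copy()
U3[:, 0] *= -1
broken = not np.allclose(U3 @ np.diag(s) @ Vt, A)

print(same_product, different_factors, broken)
```

This is why two libraries, or two runs on different hardware, can return singular vectors that differ in sign while both being correct.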
Practical Tips and Common Pitfalls
Avoid these common mistakes when working with SVD:
- Numerical precision matters: computed factors carry floating-point rounding error, which is most visible for ill-conditioned matrices (those whose smallest singular value is tiny relative to the largest). Forming AᵀA explicitly squares the condition number, so the hand method loses accuracy faster than a library SVD routine. Always check that the reconstruction UΣVᵀ matches your original matrix within an acceptable tolerance rather than expecting bit-for-bit equality.
- Don't confuse with eigendecomposition: eigendecomposition is defined only for square matrices, and only for diagonalizable ones at that. SVD exists for every matrix, square or rectangular, so it is the factorization to use when the matrix is non-square.
- Interpret singular values correctly: singular values are always non-negative and ordered largest to smallest. They quantify the 'importance' of each direction. A rapid drop-off suggests low effective rank and an opportunity for compression.
- Beware of repeated singular values: when two or more singular values are equal, the corresponding singular vectors are not unique; any orthonormal basis of the subspace they span is valid, provided U and V are changed consistently. This doesn't affect applications like compression but matters for reproducibility.
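The first and third tips can be sketched together in NumPy. The matrix below is an arbitrary nearly rank-1 example, and the 1e-6 relative cutoff is an illustrative choice rather than a universal constant.

```python
import numpy as np

# Nearly rank-1 matrix: the second row is almost twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-9]])
U, s, Vt = np.linalg.svd(A)

# Compare with a tolerance instead of exact equality.
reconstruction_ok = np.allclose(U @ np.diag(s) @ Vt, A)

# Condition number: ratio of the largest to the smallest singular value.
cond = s[0] / s[-1]

# Effective rank: count singular values above a relative threshold.
tol = 1e-6 * s[0]
effective_rank = int(np.sum(s > tol))

print(reconstruction_ok, cond, effective_rank)
```

The right threshold depends on the noise level of the data, so treat the cutoff as a parameter to tune rather than a fixed rule.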