Table of contents
(2024-02-14)
-
A derivative is the amount of change in a target quantity caused by a change in a variable.
-
A row of a matrix consists of the coefficients of each term in a linear equation. And based on the "sum rule" of derivatives ($(f+g)'=f'+g'$), the derivative of the linear equation w.r.t. a variable is the sum of the derivatives of each element in the row w.r.t. that variable.
d Ax
(2024-01-13)
Source video: Derivative of a Matrix : Data Science Basics - ritvikmath
Matrix 𝐀 stands for a linear transformation (a function), and only the derivative of the function 𝐀𝐱 makes sense.
- A matrix is a representation of a linear system.
The derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱 is 𝐀. This is analogous to the single-variable case $\frac{d(ax)}{dx} = a$.
A matrix $A$ by itself is a "constant": more concretely, a collection of scalars in a box.
Therefore, "the derivative of $A$" would be the derivative of a constant, which is 0. So, on its own it doesn't make any sense.
Thus, we are not calculating the derivative of a matrix, but the derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱.
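As a sanity check (not from the video; 𝐀 and 𝐱 are arbitrary example values I chose), a central-difference approximation confirms that the Jacobian of the map 𝐱 ↦ 𝐀𝐱 is 𝐀 itself:

```python
import numpy as np

# Example matrix; any values work since Ax is linear in x.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def f(x):
    return A @ x

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: J[i, j] = d f_i / d x_j."""
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x0 = np.array([1.0, -2.0])
print(np.allclose(numerical_jacobian(f, x0), A))  # True: d(Ax)/dx = A
```

Because 𝐀𝐱 is linear, the central difference is exact up to floating-point roundoff, at any 𝐱.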
d xᵀAx
$$ \begin{aligned} 𝐱ᵀ𝐀𝐱 &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁ & a₁₂ \\\ a₂₁ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\\ x₂ \end{bmatrix} \\\ &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁x₁+ a₁₂x₂ \\\ a₂₁x₁ + a₂₂x₂ \end{bmatrix} \\\ &= a₁₁x₁²+ a₁₂x₁x₂ + a₂₁x₁x₂ + a₂₂x₂² ⇒ f(x₁,x₂) \end{aligned} $$
Assume 𝐀 is a symmetric matrix, so a₁₂ = a₂₁. Then $𝐱ᵀ𝐀𝐱 = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² = f(x₁,x₂)$.
The derivative of the linear transformation 𝐱ᵀ𝐀𝐱:
$$ \begin{aligned} \frac{d𝐱ᵀ𝐀𝐱}{d𝐱} &= \begin{bmatrix} ∂f/∂x₁ \\\ ∂f/∂x₂ \end{bmatrix} \\\ &= \begin{bmatrix} 2a₁₁x₁+2a₁₂x₂ \\\ 2a₁₂x₁ + 2a₂₂x₂ \end{bmatrix} \\\ &= 2 \begin{bmatrix} a₁₁ & a₁₂ \\\ a₁₂ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\\ x₂ \end{bmatrix} \\\ &= 2𝐀𝐱 \end{aligned} $$
This is the matrix analog of the quadratic case $\frac{d(ax²)}{dx} = 2ax$.
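A quick numeric sketch of the identity above, assuming a symmetric 𝐀 (the values of 𝐀 and 𝐱 are arbitrary examples, not from the video):

```python
import numpy as np

# Symmetric example matrix: a12 = a21, as the derivation assumes.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def quad(x):
    return x @ A @ x   # the scalar xᵀAx

def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar-valued f."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x0 = np.array([1.0, -1.5])
print(np.allclose(numerical_grad(quad, x0), 2 * A @ x0))  # True: d(xᵀAx)/dx = 2Ax
```

For a non-symmetric 𝐀 the general result is $(𝐀 + 𝐀ᵀ)𝐱$, which reduces to $2𝐀𝐱$ in the symmetric case.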
3 cases
Source article: The derivative matrix - Math Insight
-
A matrix 𝐀 contains elements that are functions of a scalar x.
-
The $\frac{d𝐀}{dx}$ is a matrix of the same size as 𝐀.
Refer to Definition 5 in Matrix Differentiation - Department of Atmospheric Sciences
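A minimal sketch of this case with a made-up $𝐀(x)$ (my example, not from the article): $\frac{d𝐀}{dx}$ is obtained by differentiating each element, and a central difference confirms it.

```python
import numpy as np

# A(x): a matrix whose elements are functions of a scalar x.
def A(x):
    return np.array([[x**2,      np.sin(x)],
                     [np.exp(x), 1.0      ]])

# dA/dx: the same-size matrix of elementwise derivatives.
def dA_dx(x):
    return np.array([[2 * x,     np.cos(x)],
                     [np.exp(x), 0.0      ]])

x0, eps = 0.7, 1e-6
numeric = (A(x0 + eps) - A(x0 - eps)) / (2 * eps)
print(np.allclose(numeric, dA_dx(x0)))  # the elementwise rule holds
```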
-
-
The derivative of a multi-variable scalar-valued function $f$ is a row vector of the partial derivatives of $f$ with respect to each variable.
- Derivative of $f$ w.r.t. each coordinate axis.
- $\frac{df}{d𝐱} = [ \frac{∂f}{∂x₁}\ \frac{∂f}{∂x₂}\ ⋯ \ \frac{∂f}{∂xₙ} ]$
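A sketch of this case with a hypothetical $f$ (my example): the row of analytic partials matches a central-difference estimate.

```python
import numpy as np

# Scalar-valued f of two variables.
def f(x):
    return x[0]**2 + 3 * x[0] * x[1]

# Row vector of partials: [df/dx1, df/dx2].
def analytic_row(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def numeric_row(f, x, eps=1e-6):
    g = np.zeros(x.size)
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x0 = np.array([1.0, 2.0])
print(np.allclose(numeric_row(f, x0), analytic_row(x0)))
```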
-
A matrix 𝐀 contains elements that are functions of a vector 𝐱.
-
$𝐀(𝐱) = 𝐟(𝐱) = (f_1(𝐱),\ f_2(𝐱),\ ..., f_m(𝐱)) = \begin{bmatrix} f_1(𝐱) \\\ f_2(𝐱) \\\ ⋮ \\\ f_m(𝐱) \end{bmatrix}$
-
The $\frac{d𝐀}{d𝐱}$ is a matrix of size m×n:
$$ \frac{d𝐀}{d𝐱} = \begin{bmatrix} \frac{∂f_1}{∂x_1} & \frac{∂f_1}{∂x_2} & ⋯ & \frac{∂f_1}{∂xₙ} \\\ \frac{∂f_2}{∂x_1} & \frac{∂f_2}{∂x_2} & ⋯ & \frac{∂f_2}{∂xₙ} \\\ ⋮ & ⋮ & ⋱ & ⋮ \\\ \frac{∂f_m}{∂x_1} & \frac{∂f_m}{∂x_2} & ⋯ & \frac{∂f_m}{∂xₙ} \\\ \end{bmatrix} $$
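A sketch of this case with a made-up $𝐟: ℝ² → ℝ³$ (my example): the numeric Jacobian has shape m×n and matches the analytic partials.

```python
import numpy as np

# Vector-valued f: R^2 -> R^3, so the Jacobian is 3 x 2.
def f(x):
    return np.array([x[0] * x[1],
                     x[0]**2,
                     np.sin(x[1])])

def jacobian_numeric(f, x, eps=1e-6):
    """Central-difference Jacobian: J[i, j] = ∂f_i/∂x_j."""
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x0 = np.array([2.0, 0.5])
J_analytic = np.array([[x0[1],      x0[0]],
                       [2 * x0[0],  0.0],
                       [0.0,        np.cos(x0[1])]])
print(jacobian_numeric(f, x0).shape)                  # (3, 2): m x n
print(np.allclose(jacobian_numeric(f, x0), J_analytic))
```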
-
Matrix derivative
(2023-02-12)
A matrix derivative is taken with respect to the whole matrix at once, whereas partial derivatives of a matrix are taken with respect to each element individually.
Given a matrix $[^{a\ b}\_{d\ c}]$, the derivative of its inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ w.r.t. the original matrix is the "coefficient" in their relation:
$$ \underbrace{ \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} }\_{\text{Coefficient}} \begin{bmatrix} a & b \\\ d & c \end{bmatrix} = \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} $$
-
This transformation can be understood as follows: the original matrix is first multiplied by its inverse $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ to become the identity matrix $[^{1\ 0}_{0\ 1}]$, which is then multiplied by the inverse once more to yield the inverse matrix.
Therefore, the coefficient is:
$$ \frac{1}{(ac-bd)²} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} = \frac{1}{(ac-bd)²} \begin{bmatrix} c² + bd & -bc-ab \\\ -cd-ad & bd+a²\end{bmatrix} $$
In this case, is the optimizing objective the whole matrix $[^{a\ b}_{d\ c}]$, with its coefficient serving as the gradient?
On the other hand, the partial derivatives of the inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ with respect to each element $a,\ b,\ c,\ d$ can be conceptualized as:
how do changes in the 4 "variables" $a,\ b,\ c,\ d$ affect the matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$?
$$ \begin{aligned} \frac{ ∂\frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix}}{∂a} &= \begin{bmatrix} \frac{∂}{∂a} (\frac{c}{ac-bd} ) & \frac{∂}{∂a} (\frac{-b}{ac-bd}) \\\ \frac{∂}{∂a} (\frac{-d}{ac-bd}) & \frac{∂}{∂a} (\frac{a}{ac-bd} ) \\\ \end{bmatrix} \\\ &= \begin{bmatrix} \frac{-c²}{(ac-bd)²} & \frac{bc}{(ac-bd)²} \\\ \frac{dc}{(ac-bd)²} & \frac{-bd}{(ac-bd)²} \\\ \end{bmatrix} \end{aligned} $$
The total change of the matrix (summed over all its entries) caused by moving $a$ by one unit would be:
$$\frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂a} = \frac{-c² + bc + dc - bd}{(ac-bd)²} $$
- In particular, with this derivative, $a$ can be optimized via gradient descent.
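The hand-derived partials w.r.t. $a$ can be checked numerically (the values of $a, b, c, d$ below are arbitrary, chosen so the matrix is invertible):

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 3.0, 1.5   # ac - bd = 4.5, so [[a, b], [d, c]] is invertible

def inv(a, b, c, d):
    """Inverse of [[a, b], [d, c]] via the adjugate formula."""
    det = a * c - b * d
    return np.array([[c, -b], [-d, a]]) / det

# Central-difference derivative of the inverse w.r.t. a.
eps = 1e-6
numeric = (inv(a + eps, b, c, d) - inv(a - eps, b, c, d)) / (2 * eps)

det = a * c - b * d
analytic = np.array([[-c**2, b * c],
                     [d * c, -b * d]]) / det**2
print(np.allclose(numeric, analytic))
# The scalar "total change" is the sum of these entries:
print(np.isclose(numeric.sum(), (-c**2 + b * c + d * c - b * d) / det**2))
```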
Similarly, the partial derivatives of the matrix w.r.t. $b,\ c,\ d$ are:
$$ \begin{aligned} \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂b} &= \frac{cd-ac-d²+ad}{(ac-bd)²} \\\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂c} &= \frac{-bd+ba+da-a²}{(ac-bd)²} \\\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂d} &= \frac{cb-b²-ac+ab}{(ac-bd)²} \\\ \end{aligned} $$
(2024-02-13)
Matrix Derivatives: What’s up with all those transposes? - David Levin
Gradient: Matrix form -> indices form -> matrix form
XᵀwX
(2024-04-06)
Decompose into: a vector-valued function + a multivariable function
A basis of a space can consist of polynomial functions or power functions, so a linear equation can represent nonlinear functions
Source video: 【微积分和线性代数碰撞的数学盛宴:最小二乘法公式推导!】 ("A mathematical feast where calculus and linear algebra collide: deriving the least-squares formula!") - 晓之车高山老师 - bilibili
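The least-squares derivation referenced above can be made concrete with a sketch (setup assumed from the standard least-squares problem, with random example data, not taken from the video): the gradient of $‖𝐗𝐰 − 𝐲‖²$ is $2𝐗ᵀ(𝐗𝐰 − 𝐲)$, and setting it to zero gives the normal equations.

```python
import numpy as np

# Random example data for the least-squares problem min_w ||Xw - y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)

def loss(w):
    r = X @ w - y
    return r @ r                      # scalar: squared error

def grad_analytic(w):
    return 2 * X.T @ (X @ w - y)      # d/dw ||Xw - y||^2

def grad_numeric(w, eps=1e-6):
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

w0 = rng.normal(size=3)
print(np.allclose(grad_numeric(w0), grad_analytic(w0), atol=1e-4))
# Setting the gradient to zero gives the normal equations (XᵀX)w = Xᵀy:
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(grad_analytic(w_star), 0, atol=1e-8))
```

This is exactly the "vector-valued function + multivariable function" decomposition: $𝐰 ↦ 𝐗𝐰 − 𝐲$ is vector-valued, and the squared norm is a multivariable scalar function of its output.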
(2024-05-15)
(2024-07-22)
Source video: 手推机器学习1⃣️—矩阵求导 ("Machine Learning by Hand 1: Matrix Differentiation") - S-WangZ (2024-05-24)
-
Scalar-valued function $f: \\R^n → \\R$
-
Defined with field and vector space: Scalar-valued function definition - SE (Searched by “scalar function” in DDG)
-
A field $k$ comprises one set $k$ and two operations, addition and multiplication: $k = (k, +, ⋅)$
-
A vector space $V$ comprises two sets, $k$ and $V$, and two operations, addition and scalar multiplication: $V = (V, +, k, ⋅)$
- An element in the set k is a scalar. An element in the set V is a vector.
- A scalar-valued function $f$ maps a vector space to its field of scalars: $f: V → k$
-
-
“Scalar function is a function with one-dimensional scalar output” Scalar Function, Definition of Scalar - Statistics How To
-
“A scalar-value function is a function that takes one or more values and returns a single value.” World Web Math: Vector Calculus: Scalar Valued Functions - MIT