(2024-02-14)
- The derivative is the amount of change in a target quantity caused by a change in a variable.
- A row of a matrix consists of the coefficients of each term in a linear equation. By the "sum rule" of differentiation ($(f+g)’=f’+g’$), the derivative of the linear equation w.r.t. a variable is the sum of the derivatives of the row's elements w.r.t. that variable.
d Ax
(2024-01-13)
Source video: Derivative of a Matrix : Data Science Basics - ritvikmath
Matrix 𝐀 represents a linear transformation (a function), and only the derivative of that function, 𝐀𝐱, makes sense.
- Matrix is a representation of linear systems.
$$ \begin{aligned} f(x) &= 𝐀𝐱 \\ &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x₁ \\ x_2 \end{bmatrix} \\ &= \begin{bmatrix} x₁ + 2 x₂ \\ 3x₁ + 4x₂ \end{bmatrix} ⇒ \begin{bmatrix} f₁(x₁,x₂) \\ f₂(x₁,x₂) \end{bmatrix} \end{aligned} $$
$$ \frac{d𝐀𝐱}{d𝐱} = \begin{bmatrix} ∂f₁/∂x₁ & ∂f₁/∂x₂ \\ ∂f₂/∂x₁ & ∂f₂/∂x₂ \end{bmatrix}= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} $$
The derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱 is 𝐀. This is analogous to the single-variable case, where $\frac{d(ax)}{dx} = a$.
A matrix $𝐀$ by itself is a "scalar" in the sense that it is a collection of scalars in a box — a constant.
Therefore, "the derivative of 𝐀" would be the derivative of a constant, which is 0. On its own, it doesn't make any sense.
Thereby, we are not calculating the derivative of a matrix, but the derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱.
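As a sanity check (an assumed example, not from the source video), the Jacobian of $f(𝐱) = 𝐀𝐱$ can be approximated with central differences and compared against 𝐀:

```python
import numpy as np

# Numerically verify that the Jacobian of f(x) = A x w.r.t. x
# is the constant matrix A itself.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(x):
    return A @ x

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: J[i, j] = df_i/dx_j."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])           # arbitrary evaluation point
J = jacobian(f, x0)
print(np.allclose(J, A, atol=1e-6))  # True: d(Ax)/dx = A everywhere
```

Because $𝐀𝐱$ is linear, the Jacobian is the same at every evaluation point.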
d xᵀAx
$$ \begin{aligned} 𝐱ᵀ𝐀𝐱 &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁ & a₁₂ \\ a₂₁ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\ x₂ \end{bmatrix} \\ &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁x₁+ a₁₂x₂ \\ a₂₁x₁ + a₂₂x₂ \end{bmatrix} \\ &= a₁₁x₁²+ a₁₂x₁x₂ + a₂₁x₁x₂ + a₂₂x₂² ⇒ f(x₁,x₂) \end{aligned} $$
Assume 𝐀 is a symmetric matrix, so $a₁₂ = a₂₁$. Then $𝐱ᵀ𝐀𝐱 = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² = f(x₁,x₂)$.
The derivative of the linear transformation 𝐱ᵀ𝐀𝐱:
$$ \begin{aligned} \frac{d𝐱ᵀ𝐀𝐱}{d𝐱} &= \begin{bmatrix} ∂f/∂x₁ \\ ∂f/∂x₂ \end{bmatrix} \\ &= \begin{bmatrix} 2a₁₁x₁+2a₁₂x₂ \\ 2a₁₂x₁ + 2a₂₂x₂ \end{bmatrix} \\ &= 2 \begin{bmatrix} a₁₁ & a₁₂ \\ a₁₂ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\ x₂ \end{bmatrix} \\ &= 2𝐀𝐱 \end{aligned} $$
This is the matrix analog of the single-variable quadratic, where $\frac{d(ax²)}{dx} = 2ax$.
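The identity above can be checked numerically (an assumed example with an arbitrary symmetric 𝐀, not from the source video):

```python
import numpy as np

# For symmetric A, verify that the gradient of f(x) = x^T A x is 2 A x.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # symmetric: a12 == a21

def f(x):
    return x @ A @ x

def grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar-valued f."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x0 = np.array([0.5, -2.0])
print(np.allclose(grad(f, x0), 2 * A @ x0, atol=1e-5))  # True
```

For a non-symmetric 𝐀 the gradient would instead be $(𝐀 + 𝐀ᵀ)𝐱$, which reduces to $2𝐀𝐱$ in the symmetric case.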
3 cases
Source article: The derivative matrix - Math Insight
- A matrix 𝐀 contains elements that are functions of a scalar $x$.
- The derivative $\frac{d𝐀}{dx}$ is a matrix of the same size as 𝐀, whose elements are the element-wise derivatives.
Refer to Definition 5 in Matrix Differentiation - Department of Atmospheric Sciences
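A small numeric check of this case (an assumed example, not from the cited source): differentiate each element of an $𝐀(x)$ by hand and compare with a finite difference of the whole matrix.

```python
import numpy as np

# When every element of A is a function of a scalar x,
# dA/dx is the same-size matrix of element-wise derivatives.
def A(x):
    return np.array([[x**2,      np.sin(x)],
                     [np.exp(x), 1.0      ]])

def dA_dx(x):
    # Element-wise derivatives, computed by hand.
    return np.array([[2 * x,     np.cos(x)],
                     [np.exp(x), 0.0      ]])

x0 = 0.8
eps = 1e-6
numeric = (A(x0 + eps) - A(x0 - eps)) / (2 * eps)
print(np.allclose(numeric, dA_dx(x0), atol=1e-5))  # True
```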
- The derivative of a multivariable, scalar-valued function $f(𝐱)$ is a row vector of the partial derivatives of $f$ with respect to each variable.
- Derivative of $f$ w.r.t. each coordinate axis.
- $\frac{df}{d𝐱} = [ \frac{∂f}{∂x₁}\ \frac{∂f}{∂x₂}\ ⋯ \ \frac{∂f}{∂xₙ} ]$
- A matrix 𝐀 contains elements that are functions of a vector 𝐱.
- $𝐀(𝐱) = 𝐟(𝐱) = (f_1(𝐱),\ f_2(𝐱),\ …, f_m(𝐱)) = \begin{bmatrix} f_1(𝐱) \\ f_2(𝐱) \\ ⋮ \\ f_m(𝐱) \end{bmatrix}$
- The derivative $\frac{d𝐀}{d𝐱}$ is an m×n matrix:
$$ \frac{d𝐀}{d𝐱} = \begin{bmatrix} \frac{∂f_1}{∂x_1} & \frac{∂f_1}{∂x_2} & ⋯ & \frac{∂f_1}{∂xₙ} \\ \frac{∂f_2}{∂x_1} & \frac{∂f_2}{∂x_2} & ⋯ & \frac{∂f_2}{∂xₙ} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{∂f_m}{∂x_1} & \frac{∂f_m}{∂x_2} & ⋯ & \frac{∂f_m}{∂xₙ} \end{bmatrix} $$
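This general case can be illustrated with a nonlinear $𝐟$ (an assumed example, not from the cited source): with $𝐱 ∈ ℝ²$ and three component functions, the Jacobian is 3×2.

```python
import numpy as np

# A vector-valued f(x) = (f1(x), f2(x), f3(x)) with x in R^2,
# so the derivative is the 3x2 Jacobian of partials.
def f(x):
    x1, x2 = x
    return np.array([x1 * x2, np.sin(x1), x1 + x2**2])

def analytic_jacobian(x):
    # Partials of each f_i w.r.t. x1 and x2, computed by hand.
    x1, x2 = x
    return np.array([[x2,         x1    ],
                     [np.cos(x1), 0.0   ],
                     [1.0,        2 * x2]])

def jacobian(f, x, eps=1e-6):
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.column_stack(cols)

x0 = np.array([0.4, 1.1])
J = jacobian(f, x0)
print(J.shape)                                           # (3, 2), i.e. m x n
print(np.allclose(J, analytic_jacobian(x0), atol=1e-5))  # True
```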
Matrix derivative
(2023-02-12)
A matrix derivative is taken in terms of the whole matrix, whereas partial derivatives of a matrix are taken with respect to each individual element.
Given a matrix $[^{a\ b}_{d\ c}]$, the derivative of its inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ w.r.t. the original matrix is the "coefficient" in their relation:
$$ \underbrace{ \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} }_{\text{Coefficient}} \begin{bmatrix} a & b \\ d & c \end{bmatrix} = \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} $$
- This relation can be understood as follows: the original matrix is first multiplied by its inverse $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ to become the identity matrix $[^{1\ 0}_{0\ 1}]$, which is then multiplied by the inverse again to yield the inverse matrix.
Therefore, the coefficient is:
$$ \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} = \frac{1}{(ac-bd)²} \begin{bmatrix} c² + bd & -bc-ab \\ -cd-ad & bd+a²\end{bmatrix} $$
In this case, is the objective being optimized the whole matrix $[^{a\ b}_{d\ c}]$, with this coefficient serving as the gradient?
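The coefficient relation can be confirmed numerically (an assumed example with arbitrary values for $a, b, c, d$): the coefficient is the squared inverse, and multiplying it by the original matrix recovers the inverse.

```python
import numpy as np

# Check that the "coefficient" C = (M^-1)^2 satisfies C M = M^-1
# for the 2x2 matrix [[a, b], [d, c]].
a, b, d, c = 2.0, 1.0, 1.0, 3.0
M = np.array([[a, b],
              [d, c]])
det = a * c - b * d
M_inv = (1.0 / det) * np.array([[ c, -b],
                                [-d,  a]])
C = M_inv @ M_inv                            # the coefficient
print(np.allclose(C @ M, M_inv))             # True
print(np.allclose(M_inv, np.linalg.inv(M)))  # True: formula matches numpy
```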
On the other hand, the partial derivatives of the inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ with respect to each element $a,\ b,\ c,\ d$ can be conceptualized as:
how do changes in the 4 "variables" $a,\ b,\ c,\ d$ affect the matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$?
$$ \begin{aligned} \frac{ ∂\frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix}}{∂a} &= \begin{bmatrix} \frac{∂}{∂a} (\frac{c}{ac-bd} ) & \frac{∂}{∂a} (\frac{-b}{ac-bd}) \\ \frac{∂}{∂a} (\frac{-d}{ac-bd}) & \frac{∂}{∂a} (\frac{a}{ac-bd} ) \\ \end{bmatrix} \\ &= \begin{bmatrix} \frac{-c²}{(ac-bd)²} & \frac{bc}{(ac-bd)²} \\ \frac{dc}{(ac-bd)²} & \frac{-bd}{(ac-bd)²} \\ \end{bmatrix} \end{aligned} $$
Summing the element-wise derivatives, the total change of the inverse matrix caused by moving $a$ by one unit would be:
$$\frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂a} = \frac{-c² + bc + dc - bd}{(ac-bd)²} $$
- Particularly, with this derivative, $a$ can be optimized via gradient descent.
Similarly, the partial derivatives of the matrix w.r.t. $b,\ c,\ d$ are:
$$ \begin{aligned} \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂b} &= \frac{cd-ac-d²+ad}{(ac-bd)²} \\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂c} &= \frac{-bd+ba+da-a²}{(ac-bd)²} \\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂d} &= \frac{cb-b²-ac+ab}{(ac-bd)²} \\ \end{aligned} $$
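All four summed partials can be verified with finite differences (an assumed example with arbitrary values for $a, b, c, d$):

```python
import numpy as np

# Finite-difference check of the summed element-wise partials of
# M^-1 = (1/(ac-bd)) [[c, -b], [-d, a]] w.r.t. a, b, c, d.
def inv_sum(a, b, d, c):
    det = a * c - b * d
    return (c - b - d + a) / det      # sum of the inverse's four entries

a, b, d, c = 2.0, 1.0, 1.0, 3.0
det = a * c - b * d

# Closed forms from the notes (each over (ac - bd)^2):
expected = {
    "a": (-c**2 + b*c + d*c - b*d) / det**2,
    "b": (c*d - a*c - d**2 + a*d) / det**2,
    "c": (-b*d + b*a + d*a - a**2) / det**2,
    "d": (c*b - b**2 - a*c + a*b) / det**2,
}

eps = 1e-6
numeric = {
    "a": (inv_sum(a+eps, b, d, c) - inv_sum(a-eps, b, d, c)) / (2*eps),
    "b": (inv_sum(a, b+eps, d, c) - inv_sum(a, b-eps, d, c)) / (2*eps),
    "c": (inv_sum(a, b, d, c+eps) - inv_sum(a, b, d, c-eps)) / (2*eps),
    "d": (inv_sum(a, b, d+eps, c) - inv_sum(a, b, d-eps, c)) / (2*eps),
}
ok = all(abs(expected[k] - numeric[k]) < 1e-6 for k in "abcd")
print(ok)  # True
```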
(2024-02-13)
Matrix Derivatives: What’s up with all those transposes? - David Levin
Gradient: Matrix form -> indices form -> matrix form
XᵀwX
(2024-04-06)
Decompose into: a vector-valued function + a multivariable function.
The basis of a space can consist of polynomial functions or power functions, so a linear equation can represent nonlinear functions.
[A mathematical feast where calculus and linear algebra collide: deriving the least-squares formula!] - 晓之车高山老师 - bilibili
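The point about nonlinear bases can be sketched as follows (an assumed example, not taken from the video): with a polynomial basis $\{1, x, x²\}$, the model $𝐗𝐰$ is linear in $𝐰$ yet fits a nonlinear function of $x$, and least squares solves it via the normal equations $𝐰 = (𝐗ᵀ𝐗)^{-1}𝐗ᵀ𝐲$.

```python
import numpy as np

# Fit a quadratic with a *linear* model by using {1, x, x^2} as the basis.
x = np.linspace(-1, 1, 50)
y = 2 - 3 * x + 5 * x**2                         # nonlinear target, no noise

X = np.column_stack([np.ones_like(x), x, x**2])  # polynomial design matrix
w = np.linalg.solve(X.T @ X, X.T @ y)            # normal equations
print(np.allclose(w, [2, -3, 5]))                # True: recovers coefficients
```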
(2024-05-15)