memo: Calc | Derivative of "Matrix"


(2024-02-14)

  • The derivative is the amount of change in a target object caused by a change in a variable.

  • A row of a matrix consists of the coefficients of each term in a linear equation. And based on the “sum rule” of derivatives ($(f+g)'=f'+g'$), the derivative of the linear equation w.r.t. a variable is the sum of the derivatives of each element in the row w.r.t. that variable.


d Ax

(2024-01-13)

Source video: Derivative of a Matrix : Data Science Basics - ritvikmath

The matrix 𝐀 stands for a linear transformation (a function), and only the derivative of a function (𝐀𝐱) makes sense.

  • A matrix is a representation of a linear system.
$$ \begin{aligned} f(x) &= 𝐀𝐱 \\\ &= \begin{bmatrix} 1 & 2 \\\ 3 & 4 \end{bmatrix} \begin{bmatrix} x₁ \\\ x₂ \end{bmatrix} \\\ &= \begin{bmatrix} x₁ + 2x₂ \\\ 3x₁ + 4x₂ \end{bmatrix} ⇒ \begin{bmatrix} f₁(x₁,x₂) \\\ f₂(x₁,x₂) \end{bmatrix} \end{aligned} $$

$$ \frac{d𝐀𝐱}{d𝐱} = \begin{bmatrix} ∂f₁/∂x₁ & ∂f₁/∂x₂ \\\ ∂f₂/∂x₁ & ∂f₂/∂x₂ \end{bmatrix} = \begin{bmatrix} 1 & 2 \\\ 3 & 4 \end{bmatrix} $$

The derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱 is 𝐀. This is analogous to the single-variable function $f(x) = ax$, whose derivative is $a$.

A matrix $A$ by itself is like a “scalar”: more concretely, it’s a collection of scalars in a box.

Therefore, the derivative of $A$ alone would be the derivative of a constant, which is 0; on its own it doesn’t make any sense.

Thus, we are not calculating the derivative of a matrix, but the derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱.
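As a quick sanity check (a minimal numpy sketch, not from the source video): a finite-difference Jacobian of $f(𝐱) = 𝐀𝐱$ recovers 𝐀 itself, matching $\frac{d𝐀𝐱}{d𝐱} = 𝐀$.

```python
import numpy as np

# f(x) = A @ x: the linear transformation from the example above
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
f = lambda x: A @ x

# Finite-difference Jacobian: J[i, j] ≈ ∂f_i/∂x_j (central differences)
def jacobian(f, x, h=1e-6):
    J = np.zeros((len(f(x)), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x = np.array([0.7, -1.3])
print(jacobian(f, x))  # ≈ A, independent of where x is
```

The Jacobian is the same at every point 𝐱, which is exactly what “the derivative of a linear map is a constant matrix” means.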


d xᵀAx

$$ \begin{aligned} 𝐱ᵀ𝐀𝐱 &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁ & a₁₂ \\\ a₂₁ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\\ x₂ \end{bmatrix} \\\ &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁x₁+ a₁₂x₂ \\\ a₂₁x₁ + a₂₂x₂ \end{bmatrix} \\\ &= a₁₁x₁²+ a₁₂x₁x₂ + a₂₁x₁x₂ + a₂₂x₂² ⇒ f(x₁,x₂) \end{aligned} $$

Assume 𝐀 is a symmetric matrix, so a₁₂ = a₂₁. Then $𝐱ᵀ𝐀𝐱 = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² = f(x₁,x₂)$

The derivative of the quadratic form 𝐱ᵀ𝐀𝐱:

$$ \begin{aligned} \frac{d𝐱ᵀ𝐀𝐱}{d𝐱} &= \begin{bmatrix} ∂f/∂x₁ \\\ ∂f/∂x₂ \end{bmatrix} \\\ &= \begin{bmatrix} 2a₁₁x₁+2a₁₂x₂ \\\ 2a₁₂x₁ + 2a₂₂x₂ \end{bmatrix} \\\ &= 2 \begin{bmatrix} a₁₁ & a₁₂ \\\ a₁₂ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\\ x₂ \end{bmatrix} \\\ &= 2𝐀𝐱 \end{aligned} $$

This is analogous to the single-variable quadratic: $\frac{d}{dx}(ax²) = 2ax$.
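The identity $\frac{d𝐱ᵀ𝐀𝐱}{d𝐱} = 2𝐀𝐱$ (valid for symmetric 𝐀) can also be checked numerically; a sketch with an arbitrary symmetric matrix:

```python
import numpy as np

# A symmetric A, so the gradient of the quadratic form is 2 A x
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
f = lambda x: x @ A @ x  # f(x) = x^T A x (scalar)

# Finite-difference gradient: g[j] ≈ ∂f/∂x_j (central differences)
def grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.5, -2.0])
print(grad(f, x))   # ≈ 2 A x
print(2 * A @ x)
```

For a non-symmetric 𝐀 the gradient is instead $(𝐀 + 𝐀ᵀ)𝐱$, which reduces to $2𝐀𝐱$ in the symmetric case.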


3 cases

Source article: The derivative matrix - Math Insight

  1. A matrix 𝐀 contains elements that are functions of a scalar x.

  2. The derivative of a multi-variable scalar-valued function $f$ is a row vector of the partial derivatives of $f$ with respect to each variable.

    • Derivative of $f$ w.r.t. each coordinate axis.
    • $\frac{df}{d𝐱} = [ \frac{∂f}{∂x₁}\ \frac{∂f}{∂x₂}\ ⋯ \ \frac{∂f}{∂xₙ} ]$
  3. A matrix 𝐀 contains elements that are functions of a vector 𝐱.

    • $𝐀(𝐱) = 𝐟(𝐱) = (f_1(𝐱),\ f_2(𝐱),\ ..., f_m(𝐱)) = \begin{bmatrix} f_1(𝐱) \\\ f_2(𝐱) \\\ ⋮ \\\ f_m(𝐱) \end{bmatrix}$

    • Then $\frac{d𝐀}{d𝐱}$ is a matrix of size m×n:

      $$ \frac{d𝐀}{d𝐱} = \begin{bmatrix} \frac{∂f_1}{∂x_1} & \frac{∂f_1}{∂x_2} & ⋯ & \frac{∂f_1}{∂x_n} \\\ \frac{∂f_2}{∂x_1} & \frac{∂f_2}{∂x_2} & ⋯ & \frac{∂f_2}{∂x_n} \\\ ⋮ & ⋮ & ⋱ & ⋮ \\\ \frac{∂f_m}{∂x_1} & \frac{∂f_m}{∂x_2} & ⋯ & \frac{∂f_m}{∂x_n} \\\ \end{bmatrix} $$
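Case 3 can be illustrated numerically (a sketch; the particular component functions $f_i$ below are my own example, not from the source article): a vector-valued $𝐟: ℝ² → ℝ³$ has a 3×2 Jacobian, one row per $f_i$ and one column per $x_j$.

```python
import numpy as np

# f: R^2 -> R^3, so dA/dx is an m x n = 3 x 2 matrix
def f(x):
    x1, x2 = x
    return np.array([x1 * x2, x1 ** 2, np.sin(x2)])

# Finite-difference Jacobian: J[i, j] ≈ ∂f_i/∂x_j
def jacobian(f, x, h=1e-6):
    J = np.zeros((len(f(x)), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x = np.array([1.0, 2.0])
J = jacobian(f, x)
print(J.shape)  # (3, 2): one row per f_i, one column per x_j
```

At $x = (1, 2)$ the analytic Jacobian rows are $[x₂, x₁]$, $[2x₁, 0]$, $[0, \cos x₂]$, which the finite differences reproduce.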

Matrix derivative

(2023-02-12)

The matrix derivative is taken in terms of the whole matrix, instead of each element, whereas the partial derivatives of a matrix are taken with respect to each element individually.

Given a matrix $[^{a\ b}\_{d\ c}]$, the derivative of its inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ w.r.t. the original matrix is the “coefficient” in their relation:

$$ \underbrace{ \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} }\_{\text{Coefficient}} \begin{bmatrix} a & b \\\ d & c \end{bmatrix} = \frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} $$
  • This transformation can be understood as follows: the original matrix is first multiplied by its inverse $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ to become the identity matrix $[^{1\ 0}_{0\ 1}]$, which is then multiplied by the inverse $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ again to yield the inverse matrix.

    Therefore, the coefficient is:

    $$ \frac{1}{(ac-bd)²} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix} = \frac{1}{(ac-bd)²} \begin{bmatrix} c² + bd & -bc-ab \\\ -cd-ad & bd+a² \end{bmatrix} $$

    In this case, is the optimizing objective the whole matrix $[^{a\ b}_{d\ c}]$, with its coefficient serving as the gradient?

  • perplexity
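The coefficient relation can be checked numerically (a sketch, assuming the coefficient $C$ is defined by $C𝐀 = 𝐀^{-1}$, i.e. $C = 𝐀^{-1}𝐀^{-1}$; the concrete values are arbitrary):

```python
import numpy as np

# A with the memo's element layout [[a, b], [d, c]]
a, b, d, c = 2.0, 1.0, 1.0, 3.0
A = np.array([[a, b],
              [d, c]])

det = a * c - b * d            # determinant for this layout
M = np.array([[c, -b],
              [-d, a]])        # adjugate
A_inv = M / det                # inverse = adjugate / determinant
C = A_inv @ A_inv              # coefficient: C @ A = A_inv

print(np.allclose(C @ A, A_inv))          # True
print(np.allclose(C, (M @ M) / det**2))   # True: C = M^2 / det^2
```

So the “coefficient” is $𝐀^{-1}𝐀^{-1} = \frac{1}{(ac-bd)²}M²$, consistent with multiplying by the inverse twice.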


On the other hand, the partial derivatives of the inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$ with respect to each element a, b, c, d can be conceptualized as:

how do changes in the four “variables” $a,\ b,\ c,\ d$ affect the matrix $\frac{1}{ac-bd}[^{\ c\ -b}\_{-d\ a}]$?

$$ \begin{aligned} \frac{ ∂\frac{1}{ac-bd} \begin{bmatrix} c & -b \\\ -d & a \end{bmatrix}}{∂a} &= \begin{bmatrix} \frac{∂}{∂a} (\frac{c}{ac-bd} ) & \frac{∂}{∂a} (\frac{-b}{ac-bd}) \\\ \frac{∂}{∂a} (\frac{-d}{ac-bd}) & \frac{∂}{∂a} (\frac{a}{ac-bd} ) \\\ \end{bmatrix} \\\ &= \begin{bmatrix} \frac{-c²}{(ac-bd)²} & \frac{bc}{(ac-bd)²} \\\ \frac{dc}{(ac-bd)²} & \frac{-bd}{(ac-bd)²} \\\ \end{bmatrix} \end{aligned} $$

The total change of the matrix (the sum of the entrywise partials) caused by moving $a$ by one unit would be:

$$\frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂a} = \frac{-c² + bc + dc - bd}{(ac-bd)²} $$
  • Particularly, with this derivative, $a$ can be optimized via gradient descent.

Similarly, the partial derivatives of the matrix w.r.t. $b,\ c,\ d$ are:

$$ \begin{aligned} \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂b} &= \frac{cd-ac-d²+ad}{(ac-bd)²} \\\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂c} &= \frac{-bd+ba+da-a²}{(ac-bd)²} \\\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}\_{-d\ a}] )}{∂d} &= \frac{cb-b²-ac+ab}{(ac-bd)²} \\\ \end{aligned} $$
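These entrywise partials can be verified with finite differences; a numpy sketch using the memo’s element layout $[^{a\ b}\_{d\ c}]$ (the concrete values are arbitrary):

```python
import numpy as np

def inverse(a, b, d, c):
    # Inverse of [[a, b], [d, c]]: adjugate over determinant ac - bd
    return np.array([[c, -b], [-d, a]]) / (a * c - b * d)

a, b, d, c = 2.0, 1.0, 1.0, 3.0
det = a * c - b * d
h = 1e-6

# Central-difference partial of the inverse matrix w.r.t. a (entrywise)
num = (inverse(a + h, b, d, c) - inverse(a - h, b, d, c)) / (2 * h)

# Analytic entrywise partials from the derivation above
ana = np.array([[-c**2, b * c],
                [d * c, -b * d]]) / det**2

print(np.allclose(num, ana))  # True
print(num.sum())              # total change ≈ (-c² + bc + dc - bd)/(ac-bd)²
```

Summing `num` over all four entries reproduces the “total change” scalar, and the same check works for the partials w.r.t. $b$, $c$, $d$.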

(2024-02-13)

Matrix Derivatives: What’s up with all those transposes ? - David Levin

Gradient: Matrix form -> indices form -> matrix form


Matrix Calculus - Online


XᵀwX

(2024-04-06)

Decompose into: a vector-valued function + a multivariable function.

The basis of a space can consist of polynomial functions or power functions, so a linear equation can represent a nonlinear function.

Source video: [A mathematical feast where calculus and linear algebra collide: deriving the least-squares formula!] - 晓之车高山老师 - bilibili


(2024-05-15)


(2024-07-22)

Source video: Hand-deriving Machine Learning #1: Matrix Differentiation - S-WangZ (2024-05-24)

  • Scalar-valued function $f: \mathbb{R}^n → \mathbb{R}$
