memo: Calc | Derivative of "Matrix"

(2024-02-14)

  • The derivative is the amount of change in a target quantity caused by a change in a variable.

  • A row of a matrix consists of the coefficients of the terms in a linear equation. Based on the “sum rule” of derivatives ($(f+g)’=f’+g’$), the derivative of the linear equation w.r.t. a variable is the sum of the derivatives of the row’s terms w.r.t. that variable.


d Ax

(2024-01-13)

Source video: Derivative of a Matrix : Data Science Basics - ritvikmath

A matrix 𝐀 stands for a linear transformation (a function), and only the derivative of a function, i.e. of 𝐀𝐱, makes sense.

  • A matrix is a representation of a linear system.

$$ \begin{aligned} f(x) &= 𝐀𝐱 \\ &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x₁ \\ x₂ \end{bmatrix} \\ &= \begin{bmatrix} x₁ + 2 x₂ \\ 3x₁ + 4x₂ \end{bmatrix} ⇒ \begin{bmatrix} f₁(x₁,x₂) \\ f₂(x₁,x₂) \end{bmatrix} \end{aligned} $$

$$ \frac{d𝐀𝐱}{d𝐱} = \begin{bmatrix} ∂f₁/∂x₁ & ∂f₁/∂x₂ \\ ∂f₂/∂x₁ & ∂f₂/∂x₂ \end{bmatrix}= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} $$

The derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱 is 𝐀. This is analogous to the single-variable case, where $\frac{d(ax)}{dx} = a$.
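
A quick numerical check (a minimal sketch, not from the video; the `jacobian` helper and the test point are my own):

```python
import numpy as np

# The 2x2 example from above: f(x) = A x
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(x):
    return A @ x

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: column j holds ∂f/∂x_j."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x0 = np.array([0.7, -1.3])   # any point works: the Jacobian of Ax is constant
print(jacobian(f, x0))       # ≈ [[1, 2], [3, 4]] = A
```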

A matrix 𝐀 by itself is like a “scalar”. More concretely, it’s a collection of scalars in a box.

Therefore, the derivative of 𝐀 alone would be the derivative of a constant, which is 0, so it doesn’t make any sense.

Thus, we are not calculating the derivative of a matrix, but the derivative of the linear transformation 𝐀𝐱 w.r.t. 𝐱.


d xᵀAx

$$ \begin{aligned} 𝐱ᵀ𝐀𝐱 &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁ & a₁₂ \\ a₂₁ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\ x₂ \end{bmatrix} \\ &= \begin{bmatrix} x₁ & x₂ \end{bmatrix} \begin{bmatrix} a₁₁x₁+ a₁₂x₂ \\ a₂₁x₁ + a₂₂x₂ \end{bmatrix} \\ &= a₁₁x₁²+ a₁₂x₁x₂ + a₂₁x₁x₂ + a₂₂x₂² ⇒ f(x₁,x₂) \end{aligned} $$

Assume 𝐀 is a symmetric matrix, so a₁₂ = a₂₁. Then $𝐱ᵀ𝐀𝐱 = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² = f(x₁,x₂)$.

The derivative of the quadratic form 𝐱ᵀ𝐀𝐱:

$$ \begin{aligned} \frac{d𝐱ᵀ𝐀𝐱}{d𝐱} &= \begin{bmatrix} ∂f/∂x₁ \\ ∂f/∂x₂ \end{bmatrix} \\ &= \begin{bmatrix} 2a₁₁x₁+2a₁₂x₂ \\ 2a₁₂x₁ + 2a₂₂x₂ \end{bmatrix} \\ &= 2 \begin{bmatrix} a₁₁ & a₁₂ \\ a₁₂ & a₂₂ \end{bmatrix} \begin{bmatrix} x₁ \\ x₂ \end{bmatrix} \\ &= 2𝐀𝐱 \end{aligned} $$

This is the matrix analog of the single-variable quadratic, where $\frac{d(ax²)}{dx} = 2ax$.
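
The same finite-difference check for the quadratic form (a sketch; the symmetric matrix and the test point are arbitrary choices of mine):

```python
import numpy as np

# A symmetric A, as assumed above
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def f(x):
    return x @ A @ x          # the scalar xᵀAx

def grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar-valued f."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x0 = np.array([0.5, -2.0])
print(grad(f, x0))            # ≈ 2 A x0
print(2 * A @ x0)
```

For a non-symmetric 𝐀 the same check returns (𝐀 + 𝐀ᵀ)𝐱 instead of 2𝐀𝐱.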


3 cases

Source article: The derivative matrix - Math Insight

  1. A matrix 𝐀 contains elements that are functions of a scalar x.

  2. The derivative of a multi-variable scalar-valued function $f$ is a row vector of the partial derivatives of $f$ with respect to each variable.

    • The derivative of $f$ w.r.t. each coordinate axis.
    • $\frac{df}{d𝐱} = [ \frac{∂f}{∂x₁}\ \frac{∂f}{∂x₂}\ ⋯ \ \frac{∂f}{∂xₙ} ]$
  3. A matrix 𝐀 contains elements that are functions of a vector 𝐱.

    • $𝐀(𝐱) = 𝐟(𝐱) = (f_1(𝐱),\ f_2(𝐱),\ …, f_m(𝐱)) = \begin{bmatrix} f_1(𝐱) \\ f_2(𝐱) \\ ⋮ \\ f_m(𝐱) \end{bmatrix}$

    • The derivative $\frac{d𝐀}{d𝐱}$ is an $m×n$ matrix (see the sketch after this list):

      $$ \frac{d𝐀}{d𝐱} = \begin{bmatrix} \frac{∂f_1}{∂x_1} & \frac{∂f_1}{∂x_2} & ⋯ & \frac{∂f_1}{∂xₙ} \\ \frac{∂f_2}{∂x_1} & \frac{∂f_2}{∂x_2} & ⋯ & \frac{∂f_2}{∂xₙ} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{∂f_m}{∂x_1} & \frac{∂f_m}{∂x_2} & ⋯ & \frac{∂f_m}{∂xₙ} \\ \end{bmatrix} $$
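
A sketch of case 3 with a made-up 𝐟: ℝ² → ℝ³ (the component functions are my own examples), checking that the derivative comes out as an m×n matrix of partials:

```python
import numpy as np

def f(x):
    """A made-up vector-valued function f: R^2 -> R^3."""
    x1, x2 = x
    return np.array([x1 * x2,        # f1
                     x1**2 + x2,     # f2
                     np.sin(x1)])    # f3

def jacobian(f, x, eps=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))  # m x n: rows are f_i, columns are x_j
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x0 = np.array([1.0, 2.0])
print(jacobian(f, x0))
# analytic answer at (1, 2): [[x2, x1], [2*x1, 1], [cos(x1), 0]]
```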


Matrix derivative

(2023-02-12)

The matrix derivative is taken in terms of the whole matrix, instead of each element, whereas the partial derivatives of a matrix are taken with respect to its individual elements (both views are worked out below).

Given a matrix $[^{a\ b}_{d\ c}]$, the derivative of its inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ w.r.t. the original matrix is the “coefficient” in their relation:

$$ \underbrace{ \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} }_{\text{Coefficient}} \begin{bmatrix} a & b \\ d & c \end{bmatrix} = \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} $$

  • This relation can be read as follows: the original matrix is first multiplied by its inverse $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ to become the identity matrix $[^{1\ 0}_{0\ 1}]$, which is then multiplied by $[^{\ c\ -b}_{-d\ a}]$ to yield the right-hand side $[^{\ c\ -b}_{-d\ a}]$.

    Therefore, the coefficient is:

    $$ \frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix} = \frac{1}{ac-bd} \begin{bmatrix} c² + bd & -bc-ab \\ -cd-ad & bd+a²\end{bmatrix} $$

    In this case, is the optimization objective the whole matrix $[^{a\ b}_{d\ c}]$, with the coefficient serving as its gradient? (See the symbolic check after this list.)

  • perplexity
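
A symbolic check of the coefficient relation above (a minimal sympy sketch; the variable names mirror the note’s $[^{a\ b}_{d\ c}]$ layout):

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')

M   = sp.Matrix([[a, b], [d, c]])      # the original matrix [a b; d c]
adj = sp.Matrix([[c, -b], [-d, a]])    # numerator of its inverse
det = a * c - b * d

coeff = adj * adj / det                # the "coefficient" from the note

# coefficient * M reproduces the right-hand side [c -b; -d a]
print(sp.simplify(coeff * M - adj))    # -> zero matrix
# and adj / det really is the inverse of M
print(sp.simplify(M * adj / det))      # -> identity matrix
```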


On the other hand, the partial derivatives of the inverse matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$ with respect to each element a, b, c, d can be conceptualized as:

how do changes in the four “variables” $a,\ b,\ c,\ d$ affect the matrix $\frac{1}{ac-bd}[^{\ c\ -b}_{-d\ a}]$?

$$ \begin{aligned} \frac{ ∂\frac{1}{ac-bd} \begin{bmatrix} c & -b \\ -d & a \end{bmatrix}}{∂a} &= \begin{bmatrix} \frac{∂}{∂a} (\frac{c}{ac-bd} ) & \frac{∂}{∂a} (\frac{-b}{ac-bd}) \\ \frac{∂}{∂a} (\frac{-d}{ac-bd}) & \frac{∂}{∂a} (\frac{a}{ac-bd} ) \\ \end{bmatrix} \\ &= \begin{bmatrix} \frac{-c²}{(ac-bd)²} & \frac{bc}{(ac-bd)²} \\ \frac{dc}{(ac-bd)²} & \frac{-bd}{(ac-bd)²} \\ \end{bmatrix} \end{aligned} $$

Summing the four entries, the total change of the matrix caused by moving $a$ by one unit would be:

$$\frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂a} = \frac{-c² + bc + dc - bd}{(ac-bd)²} $$

  • In particular, with this derivative, $a$ can be optimized via gradient descent.

Similarly, the partial derivatives of the matrix w.r.t. $b,\ c,\ d$ are:

$$ \begin{aligned} \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂b} &= \frac{cd-ac-d²+ad}{(ac-bd)²} \\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂c} &= \frac{-bd+ba+da-a²}{(ac-bd)²} \\ \frac{∂ (\frac{1}{ac-bd} [^{\ c\ -b}_{-d\ a}] )}{∂d} &= \frac{cb-b²-ac+ab}{(ac-bd)²} \\ \end{aligned} $$
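
These sums can be reproduced symbolically (a minimal sympy sketch; `sum(dM)` adds the four entries of the element-wise derivative matrix, matching the “total change” above):

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
det  = a * c - b * d
Minv = sp.Matrix([[c, -b], [-d, a]]) / det   # inverse of [a b; d c]

for var in (a, b, c, d):
    dM = Minv.diff(var)                      # element-wise partial derivative matrix
    total = sp.simplify(sum(dM))             # sum of the four entries
    print(var, total)
```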


(2024-02-13)

Matrix Derivatives: What’s up with all those transposes? - David Levin

Gradient: matrix form -> index form -> matrix form
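
As a worked instance of that round trip (a standard identity, not taken from the video), differentiate 𝐱ᵀ𝐀𝐱 in index form and convert back; this also shows where a transpose appears when 𝐀 is not symmetric:

$$ \frac{∂}{∂x_k} \sum_i \sum_j x_i A_{ij} x_j = \sum_j A_{kj} x_j + \sum_i x_i A_{ik} = (𝐀𝐱)_k + (𝐀ᵀ𝐱)_k \;⇒\; \frac{d\,𝐱ᵀ𝐀𝐱}{d𝐱} = (𝐀 + 𝐀ᵀ)𝐱 $$

For symmetric 𝐀 this reduces to the 2𝐀𝐱 result above.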


Matrix Calculus - Online


XᵀwX

(2024-04-06)

Split it into: a vector-valued function + a multivariate function.

The basis of the space can be polynomial functions or power functions, so a linear equation can represent nonlinear functions.

【A mathematical feast where calculus and linear algebra collide: deriving the least-squares formula!】 - 晓之车高山老师 - bilibili
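
A hedged sketch of how that split might apply to the least-squares objective the video derives (my notation, not necessarily the video’s): treat $𝐗𝐰 - 𝐲$ as the vector-valued function and $𝐮 ↦ 𝐮ᵀ𝐮$ as the multivariate scalar function, then chain them:

$$ \frac{d}{d𝐰} ‖𝐗𝐰 - 𝐲‖² = \frac{d}{d𝐰} (𝐗𝐰 - 𝐲)ᵀ(𝐗𝐰 - 𝐲) = 2𝐗ᵀ(𝐗𝐰 - 𝐲) $$

Setting this gradient to zero gives the normal equations $𝐗ᵀ𝐗𝐰 = 𝐗ᵀ𝐲$.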


(2024-05-15)