watch: Nathan 09 | Tensor Decompositions

Course page: AM584

Discovered by YouTube search: CANDECOMP/PARAFAC (CP) decomposition

Tensor Decomposition

Source video: Applied Linear Algebra: Tensor Decompositions

A tensor is a generalization of a matrix. Tensor decomposition is like performing the SVD multiple times, once along each dimension, to extract the correlations in every direction.

One piece of data could be a matrix, e.g., the temperature distribution over a region indexed by latitude and longitude.

By stacking the data matrices from different time points, the resulting data forms a cube.
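A tiny illustration of this stacking (hypothetical shapes, in NumPy): stacking T matrices of shape (lat, lon) along a new axis gives a (lat, lon, T) cube.

    import numpy as np

    lat, lon, T = 4, 5, 3
    frames = [np.random.randn(lat, lon) for _ in range(T)]  # one data matrix per time point
    cube = np.stack(frames, axis=-1)                        # stack along a new time axis
    print(cube.shape)                                       # (4, 5, 3)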


Vectorizing

Vectorizing an (n-dimensional) array means reshaping it into a vector: stacking the columns on top of each other.
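A small sketch of this (a made-up 2×3 matrix, NumPy): column-major (MATLAB-style) flattening stacks the columns on top of each other.

    import numpy as np

    X = np.array([[1, 2, 3],
                  [4, 5, 6]])

    # Column-major (Fortran-order) flattening stacks the columns on top of each other
    x = X.flatten(order='F')
    print(x)                             # [1 4 2 5 3 6]

    # Reshaping with the same ordering recovers the original matrix
    print(x.reshape(2, 3, order='F'))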


Does vectorizing disturb the similarity between the samples? Perhaps not, because the total amount of difference is fixed? No matter which metric is used, the normalized distinctions between samples stay the same?
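At least part of this can be checked concretely (a NumPy sketch): in the Frobenius/Euclidean norm, flattening does not change pairwise distances, because the Frobenius distance between two matrices equals the Euclidean distance between their vectorizations.

    import numpy as np

    X1 = np.random.randn(4, 5)
    X2 = np.random.randn(4, 5)

    d_matrix = np.linalg.norm(X1 - X2)                      # Frobenius distance between the matrices
    d_vector = np.linalg.norm(X1.ravel() - X2.ravel())      # Euclidean distance after flattening
    print(np.isclose(d_matrix, d_vector))                   # True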

SVD

By flattening each sample and arranging the resulting vectors into a matrix, the SVD can be applied.

Can the matrix A be reconstructed with a low-rank structure?

𝐀 (m×n) = σ₁𝐮₁𝐯₁ᵀ + σ₂𝐮₂𝐯₂ᵀ + σ₃𝐮₃𝐯₃ᵀ + ⋯
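A toy version of this pipeline (made-up shapes, NumPy): flatten each sample into a column, collect the columns into a matrix, take the SVD, and keep the two dominant rank-1 terms.

    import numpy as np

    samples = np.random.randn(10, 4, 5)                  # 10 samples, each a 4x5 matrix
    Af = samples.reshape(10, -1).T                       # flatten each sample -> columns of a (20, 10) matrix
    U, s, Vt = np.linalg.svd(Af, full_matrices=False)

    # Rank-2 reconstruction: sum of the two dominant sigma_k * u_k v_k^T terms
    A2 = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])
    print(np.linalg.norm(Af - A2) / np.linalg.norm(Af))  # relative error of the rank-2 fit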

To avoid vectorization (flattening) mixing together the independent attributes (temperature, pressure, and humidity), and to keep them in their own separate dimensions, the SVD needs to be applied along every dimension.

Decompose a data cube 𝓜 by performing SVD in 3 directions:

𝓜 = σ₁ a₁∘b₁∘c₁ + σ₂ a₂∘b₂∘c₂ + σ₃ a₃∘b₃∘c₃ + ⋯

A dominant vector represents its homologous vectors: the vectors in the cube that run in the same direction and therefore have the same dimensionality.

For example, the 5-dimensional vector above stands for all the vectors parallel to it.

As the following hand illustration shows: align the fingers with the vector and sweep in the direction the palm is facing; the vectors along the path of the hand are the ones being represented.

(image comes from here)

Reconstruct

The rank-r approximation to the data cube 𝓜 is the sum of outer products of the dominant vectors in the 3 directions.

𝓜 = ∑ⱼ₌₁ʳ σⱼ aⱼ∘bⱼ∘cⱼ

All the individual dominant vectors aⱼ constitute 𝓐ᵣ; 𝓑ᵣ is the collection of all the bⱼ, and likewise 𝓒ᵣ for the cⱼ.

𝓜 = 𝓐ᵣ ∘ 𝓑ᵣ ∘ 𝓒ᵣ
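A minimal sketch of this reconstruction (PyTorch, with made-up factor matrices A, B, C holding the aⱼ, bⱼ, cⱼ as columns and a weight vector for the σⱼ):

    import torch

    r = 2
    A = torch.randn(4, r)    # columns are the a_j
    B = torch.randn(5, r)    # columns are the b_j
    C = torch.randn(6, r)    # columns are the c_j
    sigma = torch.randn(r)   # the weights sigma_j

    # M = sum_j sigma_j * (a_j outer b_j outer c_j), a sum of rank-1 tensors
    M = torch.einsum('j,ij,kj,lj->ikl', sigma, A, B, C)
    print(M.shape)           # torch.Size([4, 5, 6])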

Inner and Outer product:

Given two column vectors $𝐮 = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ and $𝐯 = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$,

  • Inner product (equals dot product):

    $$ \mathbf{u}^\mathsf{T}\mathbf{v} = (u_1, u_2, u_3) \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = u_1 v_1 + u_2 v_2 + u_3 v_3 = \text{scalar} $$

  • Outer product: $$ \mathbf{u} \otimes \mathbf{v} = \mathbf{u}\mathbf{v}^\mathsf{T} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} (v_1, v_2, v_3) = \begin{bmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \end{bmatrix} $$

    Any two vectors (even with different lengths) can form an outer product:

    $$ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \otimes \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{bmatrix} v_1 w_1 & v_1 w_2 & v_1 w_3 \\ v_2 w_1 & v_2 w_2 & v_2 w_3 \end{bmatrix} $$

    Each entry in the matrix is a scalar and can be indexed by: (𝐯 ⊗ 𝐰)ᵢⱼ= 𝐯ᵢ⋅𝐰ⱼ, the product of an element in the first vector with an element in the second vector.
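    For instance (a throwaway PyTorch example, not from the source), torch.outer builds exactly this matrix:

    >>> import torch
    >>> v = torch.tensor([1., 2.])
    >>> w = torch.tensor([3., 4., 5.])
    >>> torch.outer(v, w)            # (v ⊗ w)[i, j] = v[i] * w[j]
    tensor([[ 3.,  4.,  5.],
            [ 6.,  8., 10.]])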

  • Vector-matrix outer product

    >>> import torch
    >>> v = torch.tensor([0, 1])            # shape [2]
    >>> m = torch.tensor([[2, 0], [1, 3]])  # shape [2, 2]
    >>> torch.einsum('p,qr->pqr', v, m)     # result shape [2, 2, 2]
    tensor([[[0, 0],
             [0, 0]],

            [[2, 0],
             [1, 3]]])
    

    Pytorch batch matrix vector outer product - SO

  • The SVD decomposition can be interpreted as the sum of outer products of each pair of left (𝐮ₖ) and right (𝐯ₖ) singular vectors, scaled by the corresponding nonzero singular value σₖ (wiki):

    𝐀 = 𝐔 𝚺 𝐕ᵀ = $∑_{k=1}^{rank(A)}$ (𝐮ₖ ⊗ 𝐯ₖ)σₖ
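    A quick numerical check of this identity (a PyTorch sketch with a random matrix):

    import torch

    A = torch.randn(4, 3)
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)

    # Rebuild A as the sum of rank-1 outer products u_k ⊗ v_k, scaled by sigma_k
    A_rebuilt = sum(S[k] * torch.outer(U[:, k], Vh[k, :]) for k in range(S.numel()))
    print(torch.allclose(A, A_rebuilt, atol=1e-5))   # True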

Code Demo

Source video: Applied Linear Algebra: Implementing Tensor Decompositions

Dataset

The data cube is generated by a spatio-temporal function. It is like a video: it has x and y directions and changes in time.

F(x, y, t) = σ₁(t) · mode₁(x, y) + σ₂(t) · mode₂(x, y)

The function mixes two fundamental features (modes) together, so the decomposition method should pull the two features back out.

  • Feature 1: $f_1(x,y,t) = e^{-(x^2+0.5y^2)} \cos(2t)$; the spatial structure is an elliptically shaped Gaussian, and this bump oscillates up and down in time.
  • Feature 2: $f_2(x,y,t) = \mathrm{sech}(x)\tanh(x)\, e^{-0.2y^2} \sin(t)$; the x direction has the shape -∿-, the y direction is a Gaussian bell, and it oscillates as $\sin(t)$ in time.
  • Data cube: F = f₁ + f₂
clear all; close all; clc

% Basic domains (each frame is a rectangle)
x = -5:0.1:5; y = -6:0.1:6; t = 0:0.1:10*pi;
[X, Y, T] = meshgrid(x, y, t);  % Data cube

f1 = exp(-(X.^2 + 0.5*Y.^2)).*(cos(2*T));       % (Y,X,t): (121, 101, 315)
f2 = (sech(X).*tanh(X).*exp(-0.2*Y.^2)).*sin(T);
A = f1 + f2;  % Tensor

% Visualizing the shapes
for j = 1:length(t)
    surfl(x, y, f1(:,:,j));
    %surfl(x, y, f2(:,:,j));
    %axis([-5 5 -6 6 -1 1]);
    colormap(hot)
    shading interp; drawnow
end

% Vectorizing (for performing standard SVD)
nx = length(x); ny = length(y);
for j = 1:length(t)
    % Every flattened column vector is appended to the matrix Af
    Af(:,j) = reshape(A(:,:,j), nx*ny, 1);  % (nx*ny, t)
end

% Make matrix Af back to a tensor
for j = 1:length(t)
    At(:,:,j) = reshape(Af(:,j), ny, nx);  % undo the flattening (each frame is ny-by-nx)
end

% n-way array tensor decomposition (parallel factorization algorithm)
% Given a hypercube of data, it will find the dominant factors in each direction.
% Input tensor and rank (# of dominant components truncated)
model = parafac(A,2);

% Determine the factors in each direction from the low-rank projections.
% If A were a 10-dimensional tensor, factor matrices A1, A2, ..., A10 would be expected.
% Here A1 has 2 factors (columns), A2 has 2 factors, and so on.
[A1, A2, A3] = fac2let(model);  

% Plot the 2 factors (columns) along each direction
subplot(3,1,1), plot(y, A1, 'Linewidth', [2])   % y-direction factors
subplot(3,1,2), plot(x, A2, 'Linewidth', [2])   % x-direction factors
subplot(3,1,3), plot(t, A3, 'Linewidth', [2])   % t-direction factors
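For comparison, here is a rough Python sketch of the same decomposition step using TensorLy's parafac (my own addition, not from the video; the exact return value of parafac depends on the TensorLy version):

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    # Rebuild the same (y, x, t) data cube, mirroring the MATLAB meshgrid layout
    x = np.arange(-5, 5.05, 0.1)
    y = np.arange(-6, 6.05, 0.1)
    t = np.arange(0, 10 * np.pi, 0.1)
    X, Y, T = np.meshgrid(x, y, t)                 # shapes (len(y), len(x), len(t))

    f1 = np.exp(-(X**2 + 0.5 * Y**2)) * np.cos(2 * T)
    f2 = (1 / np.cosh(X)) * np.tanh(X) * np.exp(-0.2 * Y**2) * np.sin(T)
    A = tl.tensor(f1 + f2)

    # Rank-2 CP (CANDECOMP/PARAFAC) decomposition; recent TensorLy versions
    # return a (weights, factors) pair, with one factor matrix per direction
    weights, factors = parafac(A, rank=2)
    print([f.shape for f in factors])              # one (n_i, 2) matrix per direction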

In the y direction there are two Gaussians: $e^{-0.5y^2}$ and $e^{-0.2y^2}$.

In the x direction, there are a Gaussian $e^{-x^2}$ and $\mathrm{sech}(x)\tanh(x)$.

In the t direction, $\cos(2t)$ and $\sin(t)$ are pulled out.