memo: Vis | Visual Metrics

1 Peak signal-to-noise Ratio

Given two monochrome images of size m×n, the original image I and its approximation K, the mean squared error (MSE) between them is:

$$ MSE = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n\ [I(i,j) - K(i,j)]^2 $$

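The peak signal-to-noise ratio is then defined as:

$$ PSNR = 10 \log_{10} \left( \frac{MAX_I^2}{MSE} \right) $$

$MAX_I$ is the maximum possible pixel value of the original image; when a pixel is represented with 8 bits, $MAX_I = 255$.
Why is it defined this way?
It is commonly used to measure the reconstruction quality of lossy compression codecs. (Wikipedia)
The smaller the MSE, the larger the PSNR. When the two images have no error (MSE = 0), PSNR tends to infinity.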
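
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `psnr` and the `max_value` parameter are made up for this note, and the images are assumed to be same-sized single-channel arrays.

```python
import numpy as np

def psnr(original, approx, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two same-sized images."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    original = np.asarray(original, dtype=np.float64)
    approx = np.asarray(approx, dtype=np.float64)
    mse = np.mean((original - approx) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no error
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy 2x2 example: every pixel is off by exactly 1, so MSE = 1.
I = np.array([[10, 20], [30, 40]], dtype=np.uint8)
K = np.array([[11, 21], [31, 41]], dtype=np.uint8)
print(psnr(I, K))  # 10*log10(255^2), about 48.13 dB
print(psnr(I, I))  # inf
```

With MSE = 1 the result reduces to 20 log₁₀(255), which is the largest finite PSNR an 8-bit integer image pair can reach.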

2 Structural SIMilarity Index

Used to predict the perceived quality of digital images.

SSIM ² measures the degree of distortion of an image, or the similarity between two images, in terms of 3 aspects: luminance, contrast, and structure.

Ratio of luminance: l(X,Y) = (2μₓμᵧ+C₁)/(μₓ²+μᵧ²+C₁), where μₓ is the mean intensity of all pixels in image X: μₓ = (1/N) ∑ᵢ₌₁ᴺ Xᵢ, and C₁ prevents the denominator from being 0.
By the AM-GM inequality, (μₓ-μᵧ)² ≥ 0 gives 2μₓμᵧ ≤ μₓ²+μᵧ², so l(X,Y) ≤ 1, and l(X,Y)=1 if and only if μₓ=μᵧ.
Ratio of contrast: c(X,Y) = (2σₓσᵧ+C₂)/(σₓ²+σᵧ²+C₂), where σₓ is the (unbiased) standard deviation of the intensity of all pixels in image X: σₓ = (1/(N-1) ∑ᵢ₌₁ᴺ (Xᵢ-μₓ)²)¹ᐟ².
By the same inequality, c(X,Y)=1 if and only if σₓ=σᵧ.
Structure is reflected by the correlation coefficient: s(X,Y) = (σₓᵧ+C₃)/(σₓσᵧ+C₃), where σₓᵧ is the covariance of the intensities of the 2 images: σₓᵧ = 1/(N-1) ∑ᵢ₌₁ᴺ (Xᵢ-μₓ)(Yᵢ-μᵧ).

The intent of the paper may be to measure the information that remains aside from luminance and contrast ⁵, so the mean is subtracted from each pixel and the result is divided by the std-dev: (Xᵢ-μₓ)/σₓ and (Yᵢ-μᵧ)/σᵧ, i.e., standardization. The similarity between corresponding pixels of the 2 images is then computed by an inner product.

Thus, ignoring the stabilizing constant C₃, s(X,Y) is 1/(N-1) ∑ᵢ₌₁ᴺ [(Xᵢ-μₓ)/σₓ ⋅ (Yᵢ-μᵧ)/σᵧ ] = σₓᵧ/(σₓσᵧ)
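
The identity above can be checked numerically. The snippet below is only a sanity check on made-up data (the arrays `x` and `y` are random stand-ins for flattened images, not real pictures). It compares the inner product of the standardized pixels with the covariance-over-std-dev form.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=64)          # flattened pixels of image X
y = 0.5 * x + rng.normal(0, 10, size=64)  # a rescaled, noisy copy as image Y

# Standardize each image: subtract the mean, divide by the (N-1) std-dev.
x_hat = (x - x.mean()) / x.std(ddof=1)
y_hat = (y - y.mean()) / y.std(ddof=1)

# Inner product of the standardized pixels, divided by N-1.
s_inner = np.sum(x_hat * y_hat) / (x.size - 1)

# The same quantity written as covariance over the product of std-devs.
sigma_xy = np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)
s_ratio = sigma_xy / (x.std(ddof=1) * y.std(ddof=1))

print(s_inner, s_ratio)  # the two values agree up to rounding
```

Both values also match `np.corrcoef(x, y)[0, 1]`, i.e. the Pearson correlation coefficient, which is why s lies in [-1, 1].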

The final expression is the product of the above 3 features, each raised to a specified power (weight) α, β, γ:

SSIM(X,Y) = l(X,Y)ᵅ ⋅ c(X,Y)ᵝ ⋅ s(X,Y)ᵞ ∈ [-1,1]

where l(X,Y), c(X,Y), s(X,Y) ∈ [-1,1], and since brightness ≥ 0, the actual range of l and c is (0,1]. The 3 terms are all equal to 1 at the same time if and only if the images X and Y are the same.

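Let α = β = γ = 1; then SSIM(X,Y) = $\frac{ (2\mu_x\mu_y + C_1) (2\sigma_{xy} + C_2) }{ (\mu_x^2 + \mu_y^2 + C_1) (\sigma_x^2 + \sigma_y^2 + C_2) }$, where C₁ = (K₁L)², C₂ = (K₂L)², C₃ = C₂/2, and L is the dynamic range of the pixel values (L = 2^b - 1 for b bits per pixel, e.g. 255 for 8 bits). By rule of thumb, K₁ = 0.01 and K₂ = 0.03.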
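
As a rough check of this simplification, the sketch below computes SSIM over a whole image in two ways: as the product l ⋅ c ⋅ s, and with the combined formula obtained from C₃ = C₂/2. The function name `ssim_global` is hypothetical, and the code assumes L = 2^b - 1 (255 for 8-bit images), the usual dynamic-range convention.

```python
import numpy as np

def ssim_global(x, y, bits=8, k1=0.01, k2=0.03):
    """Whole-image SSIM with alpha = beta = gamma = 1, computed two ways."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    L = 2 ** bits - 1                       # assumed dynamic range
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2.0

    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)

    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)
    product = lum * con * struct

    # With C3 = C2/2 the contrast and structure terms merge into one factor.
    combined = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sd_x ** 2 + sd_y ** 2 + c2)
    )
    return product, combined

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
noisy = np.clip(img + rng.normal(0, 20, size=img.shape), 0, 255)
print(ssim_global(img, noisy))  # the two values agree up to rounding
print(ssim_global(img, img))    # both are 1 (up to rounding) for identical images
```

The merge works because (2σₓσᵧ + C₂) in the numerator of c cancels against 2(σₓσᵧ + C₃) in the denominator of s once C₃ = C₂/2.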

In practice, SSIM is not computed over the entire image at once. Instead, the mean and std-dev are calculated within a local sliding window (an 11x11 kernel with stddev=1.5 whose weights sum to 1), which represents a circular-symmetric Gaussian weighting function.
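
One way to build such a window is sketched below. The helper name `gaussian_window` is made up for this note. It forms the 2D kernel as the outer product of a 1D Gaussian with itself and normalizes the weights to sum to 1.

```python
import numpy as np

def gaussian_window(size=11, sigma=1.5):
    """size x size circular-symmetric Gaussian weights that sum to 1."""
    coords = np.arange(size) - (size - 1) / 2.0   # e.g. -5 .. 5 for size 11
    g = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    window = np.outer(g, g)                       # separable 2D Gaussian
    return window / window.sum()

w = gaussian_window()
print(w.shape)   # (11, 11)
print(w.sum())   # 1.0 up to rounding
```

The largest weight sits at the center pixel, so pixels near the window center dominate the local statistics.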

Hence, the local mean and std-dev are confined within the kernel:

μₓ = ∑ᵢ wᵢ Xᵢ
σₓ = (∑ᵢ wᵢ (Xᵢ-μₓ)²)¹ᐟ²
σₓᵧ= ∑ᵢ wᵢ (Xᵢ-μₓ)(Yᵢ-μᵧ)

where wᵢ are the weights of the Gaussian kernel. The kernel smooths the image, so that fine details are smeared and mainly the general features are compared.

At the end, the average of all M local SSIM values is M-SSIM(X,Y) = 1/M ⋅ ∑ⱼ₌₁ᴹ SSIM(Xⱼ, Yⱼ), where Xⱼ and Yⱼ are the contents of the j-th local window.
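
Putting the local statistics and the final average together, a minimal NumPy sketch could look like the following. The names `gaussian_window` and `mean_ssim` are hypothetical. The sketch evaluates only windows that lie fully inside the image (no padding) and takes `max_value` as the dynamic range L. Library implementations may differ in padding, data range handling and downsampling, so values need not match them exactly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    coords = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    window = np.outer(g, g)
    return window / window.sum()

def mean_ssim(x, y, max_value=255.0, k1=0.01, k2=0.03, size=11, sigma=1.5):
    """Average of the local SSIM values over all fully contained windows."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = gaussian_window(size, sigma)
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2

    # All size x size patches, shape (H-size+1, W-size+1, size, size).
    px = sliding_window_view(x, (size, size))
    py = sliding_window_view(y, (size, size))

    # Gaussian-weighted local mean, variance and covariance per window.
    mu_x = (w * px).sum(axis=(-2, -1))
    mu_y = (w * py).sum(axis=(-2, -1))
    dx = px - mu_x[..., None, None]
    dy = py - mu_y[..., None, None]
    var_x = (w * dx ** 2).sum(axis=(-2, -1))
    var_y = (w * dy ** 2).sum(axis=(-2, -1))
    cov_xy = (w * dx * dy).sum(axis=(-2, -1))

    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return ssim_map.mean()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = np.clip(img + rng.normal(0, 20, size=img.shape), 0, 255)
print(mean_ssim(img, img))    # 1.0 up to rounding
print(mean_ssim(img, noisy))  # strictly below 1
```

For a 32x32 input and an 11x11 window this averages M = 22 × 22 local values.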

Used as loss

Since SSIM measures the similarity between 2 images, it can be used in supervised training. The SSIM dissimilarity, SSIMD = (1-SSIM)/2 ∈ [0,1], serves as a kind of loss function.
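
The mapping from SSIM to the loss is a one-liner. The function name `ssim_dissimilarity` below is made up for illustration, and the SSIM value is assumed to come from some differentiable SSIM implementation when used for training.

```python
def ssim_dissimilarity(ssim_value):
    """Map SSIM in [-1, 1] to a loss in [0, 1]; 0 means identical images."""
    return (1.0 - ssim_value) / 2.0

for s in (1.0, 0.0, -1.0):
    print(s, ssim_dissimilarity(s))
# 1.0 0.0
# 0.0 0.5
# -1.0 1.0
```

Minimizing this loss pushes SSIM toward 1, i.e. toward identical images.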

Two basic mechanisms of lossy compression: (Wikipedia)

Lossy transform codecs: sample the image/sound,
Predictive codecs:

3 LPIPS

The motivation of Learned Perceptual Image Patch Similarity (LPIPS) ³ is that:
similarity judgements between 2 images derived from deep neural network features align with human perception, whereas structure-based metrics often give the opposite judgement on over-smoothed images ⁵. (2023-02-18)

LPIPS compares the intermediate convolutional feature vectors at different levels.

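lpips.LPIPS(net="alex") feeds the two images into a neural network (VGG, AlexNet) and compares their features at multiple levels. The feature map (H×W×C) output by each layer is normalized after activation, and the two feature maps are subtracted. Within each layer the differences are weighted and summed at every pixel, then spatially averaged (divided by the number of pixels). Finally, the per-layer differences are added up ⁴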
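
The aggregation steps described above can be mimicked in NumPy. This is only a sketch of the arithmetic, not the `lpips` library code. The feature maps are random stand-ins for real network activations, the per-channel weights are set to 1 instead of learned values, and the names `unit_normalize` and `lpips_like_distance` are hypothetical.

```python
import numpy as np

def unit_normalize(feat, eps=1e-10):
    """Scale each pixel's C-dim feature vector to unit length; feat is (H, W, C)."""
    norm = np.sqrt(np.sum(feat ** 2, axis=-1, keepdims=True))
    return feat / (norm + eps)

def lpips_like_distance(feats_x, feats_y, channel_weights):
    """Sum over layers of the spatially averaged, channel-weighted squared difference."""
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, channel_weights):
        diff = unit_normalize(fx) - unit_normalize(fy)   # (H, W, C)
        per_pixel = np.sum(w * diff ** 2, axis=-1)       # weighted sum over channels
        total += per_pixel.mean()                        # spatial average
    return total

# Hypothetical activations from 3 layers of some backbone (random stand-ins).
rng = np.random.default_rng(0)
shapes = [(16, 16, 8), (8, 8, 16), (4, 4, 32)]
feats_x = [rng.random(s) for s in shapes]
feats_y = [rng.random(s) for s in shapes]
weights = [np.ones(s[-1]) for s in shapes]   # placeholder for learned weights

print(lpips_like_distance(feats_x, feats_x, weights))  # 0.0 for identical features
print(lpips_like_distance(feats_x, feats_y, weights))  # positive for different features
```

For real images, the features would come from the pretrained backbone, and the channel weights from the calibration learned on human judgements.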

Ref