1 Peak Signal-to-Noise Ratio
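Given two m×n monochrome images, the original image I and its approximation K, their mean squared error (MSE) is:
$$ MSE = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n [I(i,j) - K(i,j)]^2 $$
The peak signal-to-noise ratio is then defined as:
$$ PSNR = 10 \log_{10}\left(\frac{MAX_I^2}{MSE}\right) $$
where $MAX_I$ is the maximum possible pixel value of the original image; when a pixel is represented with 8 bits, $MAX_I = 255$.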
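Why is it defined this way?

It is commonly used to measure the reconstruction quality of lossy compression codecs (Wikipedia).

The smaller the MSE, the larger the PSNR. When the two images have no error (MSE = 0), the PSNR tends to infinity.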
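The two formulas can be checked with a minimal NumPy sketch. It assumes same-shaped grayscale arrays and 8-bit pixels by default (`max_i=255`); the names `mse` and `psnr` are illustrative, not taken from a particular library.

```python
import numpy as np


def mse(img_i: np.ndarray, img_k: np.ndarray) -> float:
    """Mean squared error between two m x n monochrome images I and K."""
    if img_i.shape != img_k.shape:
        raise ValueError("I and K must have the same shape")
    # Cast to float before subtracting so uint8 differences cannot wrap around.
    diff = img_i.astype(np.float64) - img_k.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(img_i: np.ndarray, img_k: np.ndarray, max_i: float = 255.0) -> float:
    """PSNR in dB; max_i plays the role of MAX_I (255 for 8-bit pixels)."""
    err = mse(img_i, img_k)
    if err == 0.0:
        return float("inf")  # no error: PSNR tends to infinity
    return float(10.0 * np.log10(max_i ** 2 / err))


original = np.zeros((2, 2), dtype=np.uint8)
approx = original.copy()
approx[0, 0] = 20  # one pixel off by 20, so MSE = 20**2 / 4 = 100

print(mse(original, approx))     # 100.0
print(psnr(original, approx))    # about 28.13 (dB)
print(psnr(original, original))  # inf
```

Casting to floating point before subtracting matters: with `uint8` arrays, `0 - 255` wraps around to 1 and would silently corrupt the MSE.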
2 Structural SIMilarity Index
- Used to predict the perceived quality of digital images.

SSIM ² measures the degree of distortion of an image, or the similarity between two images, in terms of three aspects: luminance, contrast, and structure.
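Ratio of luminance: $l(X,Y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$, where $\mu_x$ is the mean intensity of all pixels in image X, $\mu_x = \frac{1}{N}\sum_{i=1}^N X_i$, and the small constant $C_1$ keeps the denominator from being 0.
By the AM-GM inequality, $\mu_x^2 + \mu_y^2 \ge 2\mu_x\mu_y$ (since $(\mu_x - \mu_y)^2 \ge 0$), so $l(X,Y) \le 1$, with $l(X,Y) = 1$ if and only if $\mu_x = \mu_y$.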
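Ratio of contrast: $c(X,Y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$, where $\sigma_x$ is the sample standard deviation (with the unbiased $1/(N-1)$ normalization) of the intensity of all pixels in image X: $\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^N (X_i - \mu_x)^2\right)^{1/2}$.
By the same argument, $c(X,Y) = 1$ if and only if $\sigma_x = \sigma_y$.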
Structure is reflected by the correlation coefficient: $s(X,Y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$, where $\sigma_{xy}$ is the covariance of the intensities of the two images: $\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \mu_x)(Y_i - \mu_y)$.
The intent of the paper may be to measure the information other than luminance and contrast ⁵: each pixel has the mean subtracted and is divided by the standard deviation, $(X_i - \mu_x)/\sigma_x$ and $(Y_i - \mu_y)/\sigma_y$ (i.e., normalization), and the similarity between corresponding pixels of the two normalized images is then measured by their inner product.
Thus, ignoring $C_3$, $s(X,Y) = \frac{1}{N-1}\sum_{i=1}^N \frac{X_i - \mu_x}{\sigma_x}\cdot\frac{Y_i - \mu_y}{\sigma_y} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$.
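A quick numerical check of this identity, on random test images (an assumption purely for illustration): dividing the inner product of the standardized images by N−1 reproduces the Pearson correlation returned by `np.corrcoef`. The helper `structure_term` is hypothetical and leaves out C₃.

```python
import numpy as np


def structure_term(x: np.ndarray, y: np.ndarray) -> float:
    """s(X, Y) with C3 = 0: inner product of the normalized images over N - 1."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    # ddof=1 gives the 1/(N-1) standard deviation used in the text.
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return float(np.dot(zx, zy) / (x.size - 1))


rng = np.random.default_rng(0)
img_x = rng.integers(0, 256, size=(8, 8))
img_y = img_x + rng.normal(0.0, 20.0, size=(8, 8))  # noisy copy of X

s = structure_term(img_x, img_y)
r = np.corrcoef(img_x.ravel(), img_y.ravel())[0, 1]  # Pearson correlation
print(s, r)  # equal up to floating-point rounding
```

An image compared with itself gives exactly the maximum structural similarity of 1, and with its negative gives −1.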
The final index is the product of the three terms above, each raised to a specified exponent (weight) α, β, γ:
$$ SSIM(X,Y) = l(X,Y)^\alpha \cdot c(X,Y)^\beta \cdot s(X,Y)^\gamma \in [-1, 1] $$
where $l(X,Y), c(X,Y), s(X,Y) \in [-1, 1]$. Since pixel intensities and standard deviations are non-negative, the actual range of l and c is (0, 1]. The three terms all equal 1 at the same time if and only if images X and Y are identical.
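With α = β = γ = 1, $SSIM(X,Y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$, where $C_1 = (K_1L)^2$, $C_2 = (K_2L)^2$, $C_3 = C_2/2$, and L is the dynamic range of the pixel values ($L = 2^b - 1$ for b bits per pixel, e.g. 255 for 8 bits). The choice $C_3 = C_2/2$ makes the factor $2\sigma_x\sigma_y + C_2$ cancel between c and s, which yields this compact form. The commonly used defaults are $K_1 = 0.01$ and $K_2 = 0.03$.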
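A single-window sketch that treats the whole image as one window, assuming 8-bit images so that the dynamic range is 255. It computes l, c and s term by term with C₃ = C₂/2 and returns their product alongside the closed form (2μₓμᵧ+C₁)(2σₓᵧ+C₂) / ((μₓ²+μᵧ²+C₁)(σₓ²+σᵧ²+C₂)), so the two can be compared numerically. `global_ssim` is a hypothetical name.

```python
import numpy as np

K1, K2 = 0.01, 0.03  # default constants from the text


def global_ssim(x, y, data_range: float = 255.0):
    """Single-window SSIM (alpha = beta = gamma = 1) over the whole image.

    Returns (l * c * s computed term by term, closed-form SSIM).
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    c1 = (K1 * data_range) ** 2
    c2 = (K2 * data_range) ** 2
    c3 = c2 / 2

    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)  # 1/(N-1) normalization
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)

    lum = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)

    closed = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (sd_x**2 + sd_y**2 + c2)
    )
    return float(lum * con * struct), float(closed)


rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(16, 16))
b = np.clip(a + rng.normal(0.0, 25.0, size=a.shape), 0, 255)

print(global_ssim(a, a))  # both values 1.0 up to rounding
print(global_ssim(a, b))  # the two values agree, and are below 1
```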
In practice, SSIM is not computed over the entire image at once. Instead, the means and standard deviations are computed in a local sliding window (kernel): an 11×11 circular-symmetric Gaussian weighting function with standard deviation 1.5, normalized so that its weights sum to 1.
Hence, the local mean, standard deviation and covariance are confined to the window:
- $\mu_x = \sum_i w_i X_i$
- $\sigma_x = \left(\sum_i w_i (X_i - \mu_x)^2\right)^{1/2}$
- $\sigma_{xy} = \sum_i w_i (X_i - \mu_x)(Y_i - \mu_y)$
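where $w_i$ are the weights of the Gaussian kernel. The Gaussian weighting smooths the image, so fine details are smeared and the comparison mainly reflects general features.
Finally, the local SSIM values are averaged: $\text{M-SSIM}(X,Y) = \frac{1}{M}\sum_{j=1}^M SSIM(x_j, y_j)$, where $x_j$ and $y_j$ are the image contents in the j-th local window and M is the number of windows.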
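A NumPy-only sketch of the windowed computation, under simplifying assumptions: 8-bit grayscale input, and only "valid" window positions (no padding at the borders). The weighted statistics follow the list of local formulas (no N−1 correction), and each local SSIM uses the closed form with α = β = γ = 1. Helper names are hypothetical; established implementations such as scikit-image's `structural_similarity` may handle borders and variance normalization differently, so their scores need not match this sketch exactly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(size: int = 11, sigma: float = 1.5) -> np.ndarray:
    """Circular-symmetric Gaussian weights w_i that sum to 1."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax**2) / (2 * sigma**2))
    w = np.outer(g, g)
    return w / w.sum()


def ssim_map(x, y, data_range=255.0, k1=0.01, k2=0.03, size=11, sigma=1.5):
    """Local SSIM at every fully contained window position ('valid' mode)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = gaussian_window(size, sigma)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2

    # Shape (H - size + 1, W - size + 1, size, size): one patch per position.
    px = sliding_window_view(x, (size, size))
    py = sliding_window_view(y, (size, size))

    def wsum(patches):  # sum_i w_i * a_i inside each window
        return np.einsum("ijkl,kl->ij", patches, w)

    mu_x, mu_y = wsum(px), wsum(py)
    dx = px - mu_x[..., None, None]
    dy = py - mu_y[..., None, None]
    var_x, var_y = wsum(dx**2), wsum(dy**2)  # local sigma_x^2, sigma_y^2
    cov_xy = wsum(dx * dy)                   # local sigma_xy

    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den


def mean_ssim(x, y, **kwargs) -> float:
    """M-SSIM: average of the local SSIM values over all M windows."""
    return float(np.mean(ssim_map(x, y, **kwargs)))


rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = np.clip(img + rng.normal(0.0, 30.0, size=img.shape), 0, 255)

print(ssim_map(img, noisy).shape)  # (22, 22)
print(mean_ssim(img, img))         # 1.0 up to rounding
print(mean_ssim(img, noisy))       # below 1
```

`sliding_window_view` avoids an explicit double loop over window positions; for large images a separable Gaussian filter would be cheaper, but the direct form mirrors the formulas more closely.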
Used as a loss
Since SSIM measures the similarity between two images, it can be used in supervised training. The structural dissimilarity, $DSSIM = (1 - SSIM)/2 \in [0, 1]$, is therefore a kind of loss function.
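As a minimal sketch, the mapping from SSIM to the dissimilarity loss is a one-liner. Here the SSIM value is assumed to be an already computed float; in actual training it would come from a differentiable SSIM implementation in the chosen framework. The name `dssim` is illustrative.

```python
def dssim(ssim_value: float) -> float:
    """Structural dissimilarity (1 - SSIM) / 2: maps SSIM in [-1, 1] to [0, 1]."""
    if not -1.0 <= ssim_value <= 1.0:
        raise ValueError("SSIM must lie in [-1, 1]")
    return (1.0 - ssim_value) / 2.0


# Identical images (SSIM = 1) give zero loss; SSIM = -1 gives the maximum loss of 1.
print(dssim(1.0), dssim(0.0), dssim(-1.0))  # 0.0 0.5 1.0
```

Minimizing DSSIM is equivalent to maximizing SSIM; the affine rescaling only makes the loss non-negative and bounded.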
Two basic mechanisms of lossy compression (Wikipedia):
- Lossy transform codecs: samples of the image/sound are taken,
- Predictive codecs:
3 LPIPS
The motivation of Learned Perceptual Image Patch Similarity (LPIPS) ³ is that
similarity judgments derived from deep neural network features are aligned with human perception,
whereas structure-based metrics usually give the opposite judgment on overly smoothed images ⁵. (2023-02-18)
LPIPS compares intermediate convolutional feature maps at multiple layers.
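`lpips.LPIPS(net="alex")` feeds the two images into a neural network (VGG, AlexNet) and compares their features at multiple levels.
The feature map (H×W×C) output by each layer is activated and then normalized (to unit length along the channel dimension); the two images' maps are subtracted, the squared differences are weighted per channel and summed at each pixel, averaged spatially (divided by the number of pixels), and finally the differences of all layers are added up ⁴.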
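A NumPy sketch of only the distance computation described above. It assumes the activated feature maps of each layer have already been extracted (random arrays stand in for them here) and that non-negative per-channel weights are given; in the real `lpips` package the features come from a pretrained network and the weights are learned. Applying the weights to the squared differences is assumed here to mirror the package's 1×1 linear layers. `unit_normalize` and `lpips_distance` are hypothetical names.

```python
import numpy as np


def unit_normalize(feat: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Scale each pixel's C-dimensional feature vector to unit length."""
    norm = np.sqrt(np.sum(feat**2, axis=-1, keepdims=True))
    return feat / (norm + eps)


def lpips_distance(feats_x, feats_y, weights) -> float:
    """Sum over layers of the spatially averaged, channel-weighted squared difference.

    feats_x, feats_y: lists of activated feature maps, each of shape (H, W, C).
    weights: list of non-negative per-channel weight vectors, each of shape (C,).
    """
    total = 0.0
    for fx_l, fy_l, w in zip(feats_x, feats_y, weights):
        diff = unit_normalize(fx_l) - unit_normalize(fy_l)  # (H, W, C)
        per_pixel = np.sum(w * diff**2, axis=-1)            # weighted sum over channels
        total += float(np.mean(per_pixel))                  # spatial average: / (H * W)
    return total


rng = np.random.default_rng(3)
shapes = [(16, 16, 8), (8, 8, 16)]  # two hypothetical layers
fx = [rng.normal(size=s) for s in shapes]
fy = [rng.normal(size=s) for s in shapes]
ws = [np.abs(rng.normal(size=s[-1])) for s in shapes]  # non-negative weights

print(lpips_distance(fx, fx, ws))  # 0.0
print(lpips_distance(fx, fy, ws))  # positive
```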