read: Wi-HSNN for dimension reduction

Wi-HSNN: A subnetwork-based encoding structure for dimension reduction and food classification via harnessing multi-CNN model high-level features (Neurocomputing)

Authors: Wandong Zhang, Jonathan Wu, Yimin Yang
Neurocomputing (2020-07-18)

Abstract

  • Wide hierarchical subnetwork-based neural network (Wi-HSNN)
  • Iterative training by adding subnetwork nodes into the model one by one
  • Batch-by-batch training instead of processing the entire dataset in one batch
    • Places365 has ~1.8 million samples

1 Introduction

3 Proposed Wi-HSNN

The loss function of an SLFN is the MSE: min J = ½ ‖𝐓 - g(𝐗, 𝐰₁, 𝐛)⋅𝐰₂‖²

The output weights can be obtained in closed form from the target data (letting 𝐇 denote g(𝐗, 𝐰₁, 𝐛)):

𝐰₂ = ( 𝐇ᵀ𝐇 + I/C )⁻¹ 𝐇ᵀ 𝐓
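
As a sanity check, here is a minimal NumPy sketch of this regularized pseudo-inverse solution; the sigmoid activation, layer sizes, and C value are my own choices, not settings from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def solve_output_weights(H, T, C=2**10):
    # w2 = (H^T H + I/C)^-1 H^T T  (regularized pseudo-inverse)
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(d) / C, H.T @ T)

# Toy SLFN: random hidden weights (w1, b), closed-form output weights w2.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))           # N x d input data
T = np.eye(3)[rng.integers(0, 3, 500)]       # N x m one-hot targets
w1, b = rng.standard_normal((20, 64)), rng.standard_normal(64)
H = sigmoid(X @ w1 + b)                      # hidden features g(X, w1, b)
w2 = solve_output_weights(H, T)
Y = H @ w2                                   # SLFN prediction
```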

In this paper, the input data 𝐗 is first transformed to 𝐅ₑₙ, which then passes through an SNN (SLFN, subnetwork node) and becomes g(𝐅ₑₙ⋅𝐚ₑₓ + bₑₓ). Next, the input 𝐗 is fed into multiple SNNs sequentially, and their outputs are accumulated with the weight 𝐚ₜ.

Therefore, if there are L SNNs, the loss function for this problem is:

min J = ½ ‖𝐓 - ∑ᵢ₌₁ᴸ g(𝐅ₑₙⁱ⋅𝐚ₑₓⁱ + bₑₓⁱ) ⋅ 𝐚ₜᴸ‖²
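
Once all weights are known, inference is just this accumulation. A minimal sketch, assuming sigmoid activations, that every SNN exit layer has the same width so the sum is well defined, and a `snns` tuple layout of my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wi_hsnn_forward(X, snns, a_t):
    # snns: list of (a_en, b_en, a_ex, b_ex) tuples, one per subnetwork node.
    F_t = 0.0
    for a_en, b_en, a_ex, b_ex in snns:
        F_en = sigmoid(X @ a_en + b_en)      # entrance layer
        F_ex = sigmoid(F_en @ a_ex + b_ex)   # exit layer output of this SNN
        F_t = F_t + F_ex                     # accumulate SNN outputs
    return F_t @ a_t                         # Y = F_t^L . a_t^L
```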

3.2 Training the Wi-HSNN

  1. Feed forward with randomly initialized (𝐚ₑₙ¹, bₑₙ¹) and (𝐚ₑₓ¹, bₑₓ¹), then calculate the optimal output weights via the regularized pseudo-inverse:

    𝐅ₑₙ¹ = g(𝐗⋅𝐚ₑₙ¹ + bₑₙ¹)
    𝐅ₜ¹ = 𝐅ₑₓ¹ = g(𝐅ₑₙ¹ ⋅𝐚ₑₓ¹ + bₑₓ¹)
    𝐚ₜ¹ = ((𝐅ₜ¹)ᵀ𝐅ₜ¹ + I/C)⁻¹ (𝐅ₜ¹)ᵀ 𝐓

  2. Obtain the feedback error (“feature H”) matrices 𝐏ₑₓ¹ and 𝐏ₑₙ¹ (with 𝐞¹ = 𝐓 - 𝐅ₜ¹⋅𝐚ₜ¹ the current residual) for solving the weights 𝐚ₑₓ (=𝐰₂) and 𝐚ₑₙ (=𝐰₁) of the next SNN.

    𝐏ₑₓ¹ = g⁻¹ (𝐞¹ ⋅ (I/C + (𝐚ₜ¹)ᵀ⋅𝐚ₜ¹)⁻¹ ⋅ (𝐚ₜ¹)ᵀ )
    𝐏ₑₙ¹ = g⁻¹ (𝐏ₑₓ¹ ⋅ (I/C + (𝐚ₑₓ¹)ᵀ ⋅ 𝐚ₑₓ¹ )⁻¹ ⋅ (𝐚ₑₓ¹)ᵀ )

  3. Calculate 𝐚ₑₙ and 𝐚ₑₓ for the next SNN:

    𝐚ₑₙⁱ = (𝐗ᵀ𝐗 + I/C )⁻¹ 𝐗ᵀ 𝐏ₑₙⁱ⁻¹ (entrance layer weights, fitting 𝐗⋅𝐚ₑₙⁱ ≈ 𝐏ₑₙⁱ⁻¹)
    𝐚ₑₓⁱ = ( (𝐅ₑₙⁱ)ᵀ𝐅ₑₙⁱ + I/C )⁻¹ (𝐅ₑₙⁱ)ᵀ 𝐏ₑₓⁱ⁻¹ (exit layer weights, fitting 𝐅ₑₙⁱ⋅𝐚ₑₓⁱ ≈ 𝐏ₑₓⁱ⁻¹)

  4. Sum the outputs 𝐅ₑₓᵏ of all existing SNNs into 𝐅ₜⁱ, and update the output weights 𝐚ₜⁱ.

    𝐅ₜⁱ = ∑ₖ₌₁ⁱ 𝐅ₑₓᵏ
    𝐚ₜⁱ = ( (𝐅ₜⁱ)ᵀ𝐅ₜⁱ + I/C )⁻¹ (𝐅ₜⁱ)ᵀ 𝐓

  5. Obtain the feedback error matrices 𝐏ₑₓⁱ and 𝐏ₑₙⁱ (with 𝐞ⁱ = 𝐓 - 𝐅ₜⁱ⋅𝐚ₜⁱ):

    𝐏ₑₓⁱ = g⁻¹ (𝐞ⁱ ⋅ (I/C + (𝐚ₜⁱ)ᵀ⋅𝐚ₜⁱ)⁻¹ ⋅ (𝐚ₜⁱ)ᵀ )
    𝐏ₑₙⁱ = g⁻¹ (𝐏ₑₓⁱ ⋅ (I/C + (𝐚ₑₓⁱ)ᵀ ⋅ 𝐚ₑₓⁱ )⁻¹ ⋅ (𝐚ₑₓⁱ)ᵀ )

  6. Repeat steps 3 to 5 until all L SNNs have been added. 𝐅ₜᴸ is the final encoding, and 𝐘 = 𝐅ₜᴸ⋅𝐚ₜᴸ is the final classification prediction (a code sketch of the whole procedure follows below).
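
A hedged end-to-end sketch of steps 1 to 6. Layer widths, the sigmoid activation, the clipping inside g⁻¹, and the zero biases of the analytically computed SNNs (the note does not say how later biases are set) are all my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_inv(y, eps=1e-6):
    # g^-1 for the logistic sigmoid; values are clipped into (0, 1) first
    # (how out-of-range values are handled is not stated in the note).
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def ridge(A, B, C):
    # (A^T A + I/C)^-1 A^T B, the regularized pseudo-inverse used throughout.
    return np.linalg.solve(A.T @ A + np.eye(A.shape[1]) / C, A.T @ B)

def pullback(E, W, C):
    # E . (I/C + W^T W)^-1 . W^T : maps a target/error back through weights W.
    return E @ np.linalg.solve(np.eye(W.shape[1]) / C + W.T @ W, W.T)

def train_wi_hsnn(X, T, L=5, n_en=64, n_ex=32, C=2**10, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: first SNN with random weights, closed-form output weights a_t.
    a_en = rng.uniform(-1, 1, (X.shape[1], n_en)); b_en = rng.uniform(-1, 1, n_en)
    a_ex = rng.uniform(-1, 1, (n_en, n_ex));       b_ex = rng.uniform(-1, 1, n_ex)
    F_en = sigmoid(X @ a_en + b_en)
    F_ex = sigmoid(F_en @ a_ex + b_ex)
    snns = [(a_en, b_en, a_ex, b_ex)]
    F_t = F_ex
    a_t = ridge(F_t, T, C)

    for _ in range(L - 1):
        # Steps 2/5: feedback error matrices from the residual e = T - F_t a_t,
        # pulled back through a_t (-> P_ex) and the previous a_ex (-> P_en).
        e = T - F_t @ a_t
        P_ex = sigmoid_inv(pullback(e, a_t, C))
        P_en = sigmoid_inv(pullback(P_ex, a_ex, C))

        # Step 3: entrance/exit weights of the next SNN (biases assumed 0 here).
        a_en = ridge(X, P_en, C)
        F_en = sigmoid(X @ a_en)
        a_ex = ridge(F_en, P_ex, C)
        F_ex = sigmoid(F_en @ a_ex)
        snns.append((a_en, np.zeros(n_en), a_ex, np.zeros(n_ex)))

        # Step 4: accumulate all SNN outputs and refresh the output weights.
        F_t = F_t + F_ex
        a_t = ridge(F_t, T, C)

    return snns, a_t, F_t                    # F_t is the final encoding F_t^L

# Toy usage on random data (3-class, one-hot targets).
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
T = np.eye(3)[rng.integers(0, 3, 300)]
snns, a_t, F_t = train_wi_hsnn(X, T)
Y = F_t @ a_t                                # final classification scores
```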

3.4 Batch-by-batch scheme with parallelism strategy

The entire feature set 𝐅 ∈ ℝᴺˣᵈ (i.e., 𝐇) and the target set 𝐓 are split into p subsets:

𝐇 = [𝐇(𝐱₁); …; 𝐇(𝐱ₚ)],  𝐓 = [𝐓(𝐱₁); …; 𝐓(𝐱ₚ)]  (stacked row-wise)

The desired weight matrix 𝐰₂ (i.e., “𝛃”), expressed with the regularized pseudo-inverse, becomes

𝐰₂ = (𝐇ᵀ𝐇 + I/C)⁻¹ 𝐇ᵀ⋅𝐓
𝐰₂ = ([𝐇(𝐱₁)ᵀ, …, 𝐇(𝐱ₚ)ᵀ]⋅[𝐇(𝐱₁); …; 𝐇(𝐱ₚ)] + I/C)⁻¹ ⋅ [𝐇(𝐱₁)ᵀ, …, 𝐇(𝐱ₚ)ᵀ]⋅[𝐓(𝐱₁); …; 𝐓(𝐱ₚ)]
𝐰₂ = ([𝐇(𝐱₁)ᵀ𝐇(𝐱₁) + … + 𝐇(𝐱ₚ)ᵀ𝐇(𝐱ₚ)] + I/C)⁻¹ ⋅ [𝐇(𝐱₁)ᵀ𝐓(𝐱₁) + … + 𝐇(𝐱ₚ)ᵀ𝐓(𝐱ₚ)]
𝐰₂ = (∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐇(𝐱ᵢ) + I/C)⁻¹ ⋅ ∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐓(𝐱ᵢ)
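
This decomposition is easy to verify numerically; a small NumPy check (batch count p, shapes, and C below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m, C, p = 1000, 32, 5, 2**10, 4
H = rng.standard_normal((N, d))
T = rng.standard_normal((N, m))

# Full-data regularized pseudo-inverse.
w2_full = np.linalg.solve(H.T @ H + np.eye(d) / C, H.T @ T)

# Same solution from p batch-wise partial sums H_i^T H_i and H_i^T T_i.
HtH, HtT = np.zeros((d, d)), np.zeros((d, m))
for Hi, Ti in zip(np.array_split(H, p), np.array_split(T, p)):
    HtH += Hi.T @ Hi
    HtT += Hi.T @ Ti
w2_batch = np.linalg.solve(HtH + np.eye(d) / C, HtT)

print(np.allclose(w2_full, w2_batch))   # True
```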

First, calculate the factor (∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐇(𝐱ᵢ) + I/C)⁻¹ (a combined code sketch of the batch-by-batch update follows after the 𝛃 derivation below).

  1. The matrices are accumulated batch by batch in the code. After the 1st batch, K = (𝐇₁ᵀ𝐇₁ + I/C)⁻¹ is obtained and returned.

  2. When the next batch 𝐇₂ is retrieved:

    K_new = (𝐇₂ᵀ𝐇₂ + 𝐇₁ᵀ𝐇₁ + I/C)⁻¹
    = (𝐇₂ᵀ𝐇₂ + K⁻¹)⁻¹ # analogous to (UBV + A)⁻¹, where U=𝐇₂ᵀ, V=𝐇₂, B=I, A=K⁻¹
    Woodbury identity: (A + UBV)⁻¹ = A⁻¹ - A⁻¹⋅U⋅(I + BV⋅A⁻¹⋅U)⁻¹⋅BV⋅A⁻¹
    = K - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ K
    = (I - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ ) ⋅ K

    Let Kₚ = (I - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ ). So K_new = Kₚ ⋅ K

Then, for the second term ∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐓(𝐱ᵢ):

  1. In the first batch, 𝛃=K⋅𝐇₁ᵀ⋅𝐓₁ is obtained and returned.

  2. When the second batch arrives, the accumulated term becomes (𝐇₁ᵀ⋅𝐓₁ + 𝐇₂ᵀ⋅𝐓₂):

    𝛃_new = K_new ⋅ (𝐇₁ᵀ⋅𝐓₁ + 𝐇₂ᵀ⋅𝐓₂)
    = Kₚ ⋅ K ⋅ 𝐇₁ᵀ⋅𝐓₁ + K_new ⋅ 𝐇₂ᵀ⋅𝐓₂
    = Kₚ ⋅ 𝛃 + K_new ⋅ 𝐇₂ᵀ⋅𝐓₂
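
A sketch of the resulting incremental solver; only the sequential batch accumulation is shown (not the paper's parallelism strategy), and the function names are mine:

```python
import numpy as np

def init_batch(H1, T1, C=2**10):
    # First batch: K = (H1^T H1 + I/C)^-1 and beta = K H1^T T1.
    d = H1.shape[1]
    K = np.linalg.inv(H1.T @ H1 + np.eye(d) / C)
    beta = K @ H1.T @ T1
    return K, beta

def update_batch(K, beta, H2, T2):
    # Woodbury step: Kp = I - K H2^T (I + H2 K H2^T)^-1 H2,
    # K_new = Kp K, beta_new = Kp beta + K_new H2^T T2.
    b = H2.shape[0]
    Kp = np.eye(K.shape[0]) - K @ H2.T @ np.linalg.solve(np.eye(b) + H2 @ K @ H2.T, H2)
    K_new = Kp @ K
    beta_new = Kp @ beta + K_new @ H2.T @ T2
    return K_new, beta_new

# Sanity check against the one-shot solution on random data.
rng = np.random.default_rng(0)
C = 2**10
H = rng.standard_normal((600, 16))
T = rng.standard_normal((600, 3))
batches = list(zip(np.array_split(H, 3), np.array_split(T, 3)))
K, beta = init_batch(*batches[0], C)
for Hi, Ti in batches[1:]:
    K, beta = update_batch(K, beta, Hi, Ti)
w2 = np.linalg.solve(H.T @ H + np.eye(16) / C, H.T @ T)
print(np.allclose(beta, w2))    # True
```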


Inverse of a sum of matrices

(DDG search: “矩阵之和的逆” [inverse of a sum of matrices]) “Inverting the sum of two matrices” - ~海棠依旧~ - CSDN

“A discussion of the inverse of a sum of two matrices” (关于两个矩阵之和逆阵的讨论) - docin

(Google search: “两矩阵和的逆” [inverse of a sum of two matrices]) “The inverse matrix of a sum of two matrices” (兩矩陣和的逆矩陣) - 線代啟示錄 - 周志成 (NYCU)


The inverse of a sum of matrices is not equal to the sum of the inverses:

(A+B)⁻¹ ≠ A⁻¹ + B⁻¹

∵ (A+B)(A⁻¹ + B⁻¹) = E + BA⁻¹ + AB⁻¹ + E ≠ E

The simplest example: take A = B = E; then (A+B)(A⁻¹ + B⁻¹) = E + BA⁻¹ + AB⁻¹ + E = 4E ≠ E. (“Are the inverse of a sum of matrices and the sum of the inverses equal?” - 百度知道 / Baidu Zhidao)
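
A quick NumPy confirmation (two arbitrary invertible matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [4.0, 5.0]])
print(np.allclose(np.linalg.inv(A + B), np.linalg.inv(A) + np.linalg.inv(B)))  # False
# Simplest case A = B = I: (2I)^-1 = 0.5*I, while A^-1 + B^-1 = 2I.
```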

Ref