read: Wi-HSNN for dimension reduction

Wi-HSNN: A subnetwork-based encoding structure for dimension reduction and food classification via harnessing multi-CNN model high-level features (Neurocomputing)

Authors: Wandong Zhang, Jonathan Wu, Yimin Yang
Neurocomputing (2020-07-18)

Abstract

  • Wide hierarchical subnetwork-based neural network (Wi-HSNN)
  • Iterative training by adding subnetwork nodes into the model one by one
  • Batch-by-batch training instead of processing the entire dataset in one batch
    • Places365 has ~1.8 million samples

1 Introduction

3 Proposed Wi-HSNN

The loss function of an SLFN is the MSE: min J = ½ ‖𝐓 - g(𝐗, 𝐰₁, 𝐛)⋅𝐰₂‖²

The output weights can be obtained in closed form from the target data (letting 𝐇 denote g(𝐗, 𝐰₁, 𝐛)):

𝐰₂ = ( 𝐇ᵀ𝐇 + I/C )⁻¹ 𝐇ᵀ 𝐓
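
As a sanity check, here is a minimal NumPy sketch of this regularized pseudo-inverse solution; the sigmoid activation, layer sizes, and C value are my own choices, not settings from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def solve_output_weights(H, T, C=2**10):
    # w2 = (H^T H + I/C)^-1 H^T T  (regularized pseudo-inverse)
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(d) / C, H.T @ T)

# Toy SLFN: random hidden weights (w1, b), closed-form output weights w2.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))           # N x d input data
T = np.eye(3)[rng.integers(0, 3, 500)]       # N x m one-hot targets
w1, b = rng.standard_normal((20, 64)), rng.standard_normal(64)
H = sigmoid(X @ w1 + b)                      # hidden features g(X, w1, b)
w2 = solve_output_weights(H, T)
Y = H @ w2                                   # SLFN prediction
```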

In this paper, the input data 𝐗 is first transformed to 𝐅ₑₙ, which then passes through an SNN (SLFN, subnetwork node) and becomes g(𝐅ₑₙ⋅𝐚ₑₓ + bₑₓ). Next, the input 𝐗 is fed into multiple SNNs sequentially, and their outputs are accumulated with the weight 𝐚ₜ.

Therefore, if there are L SNNs, the loss function for this problem is:

min J = ½ ‖𝐓 - ∑ᵢ₌₁ᴸ g(𝐅ₑₙⁱ⋅𝐚ₑₓⁱ + bₑₓⁱ) ⋅ 𝐚ₜᴸ‖²
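
Once all weights are known, inference is just this accumulation. A minimal sketch, assuming sigmoid activations, that every SNN exit layer has the same width so the sum is well defined, and a `snns` tuple layout of my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wi_hsnn_forward(X, snns, a_t):
    # snns: list of (a_en, b_en, a_ex, b_ex) tuples, one per subnetwork node.
    F_t = 0.0
    for a_en, b_en, a_ex, b_ex in snns:
        F_en = sigmoid(X @ a_en + b_en)      # entrance layer
        F_ex = sigmoid(F_en @ a_ex + b_ex)   # exit layer output of this SNN
        F_t = F_t + F_ex                     # accumulate SNN outputs
    return F_t @ a_t                         # Y = F_t^L . a_t^L
```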

3.2 Training the Wi-HSNN

  1. Feed forward with randomly initialized (𝐚ₑₙ¹, bₑₙ¹) and (𝐚ₑₓ¹, bₑₓ¹), then calculate the optimal output weights via the regularized pseudo-inverse:

    𝐅ₑₙ¹ = g(𝐗⋅𝐚ₑₙ¹ + bₑₙ¹)
    𝐅ₜ¹ = 𝐅ₑₓ¹ = g(𝐅ₑₙ¹ ⋅𝐚ₑₓ¹ + bₑₓ¹)
    𝐚ₜ¹ = ((𝐅ₜ¹)ᵀ𝐅ₜ¹ + I/C)⁻¹ (𝐅ₜ¹)ᵀ 𝐓

  2. Obtain the feedback error (“feature H”) matrices 𝐏ₑₓ¹ and 𝐏ₑₙ¹ (with 𝐞¹ = 𝐓 - 𝐅ₜ¹⋅𝐚ₜ¹ the current residual) for solving the weights 𝐚ₑₓ (=𝐰₂) and 𝐚ₑₙ (=𝐰₁) of the next SNN.

    𝐏ₑₓ¹ = g⁻¹ (𝐞¹ ⋅ (I/C + (𝐚ₜ¹)ᵀ⋅𝐚ₜ¹)⁻¹ ⋅ (𝐚ₜ¹)ᵀ )
    𝐏ₑₙ¹ = g⁻¹ (𝐏ₑₓ¹ ⋅ (I/C + (𝐚ₑₓ¹)ᵀ ⋅ 𝐚ₑₓ¹ )⁻¹ ⋅ (𝐚ₑₓ¹)ᵀ )

  3. Calculate 𝐚ₑₙ and 𝐚ₑₓ for the next SNN:

    𝐚ₑₙⁱ = (𝐗ᵀ𝐗 + I/C )⁻¹ 𝐗ᵀ 𝐏ₑₙⁱ⁻¹ (entrance layer weights, fitting 𝐗⋅𝐚ₑₙⁱ ≈ 𝐏ₑₙⁱ⁻¹)
    𝐚ₑₓⁱ = ( (𝐅ₑₙⁱ)ᵀ𝐅ₑₙⁱ + I/C )⁻¹ (𝐅ₑₙⁱ)ᵀ 𝐏ₑₓⁱ⁻¹ (exit layer weights, fitting 𝐅ₑₙⁱ⋅𝐚ₑₓⁱ ≈ 𝐏ₑₓⁱ⁻¹)

  4. Sum the outputs 𝐅ₑₓᵏ of all existing SNNs into 𝐅ₜⁱ, and update the output weights 𝐚ₜⁱ.

    𝐅ₜⁱ = ∑ₖ₌₁ⁱ 𝐅ₑₓᵏ
    𝐚ₜⁱ = ( (𝐅ₜⁱ)ᵀ𝐅ₜⁱ + I/C )⁻¹ (𝐅ₜⁱ)ᵀ 𝐓

  5. Obtain the feedback error matrices 𝐏ₑₓⁱ and 𝐏ₑₙⁱ (with 𝐞ⁱ = 𝐓 - 𝐅ₜⁱ⋅𝐚ₜⁱ):

    𝐏ₑₓⁱ = g⁻¹ (𝐞ⁱ ⋅ (I/C + (𝐚ₜⁱ)ᵀ⋅𝐚ₜⁱ)⁻¹ ⋅ (𝐚ₜⁱ)ᵀ )
    𝐏ₑₙⁱ = g⁻¹ (𝐏ₑₓⁱ ⋅ (I/C + (𝐚ₑₓⁱ)ᵀ ⋅ 𝐚ₑₓⁱ )⁻¹ ⋅ (𝐚ₑₓⁱ)ᵀ )

  6. Repeat steps 3 to 5 until all L SNNs have been added. 𝐅ₜᴸ is the final encoding, and 𝐘 = 𝐅ₜᴸ⋅𝐚ₜᴸ is the final classification prediction (a code sketch of the whole procedure follows below).
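
A hedged end-to-end sketch of steps 1 to 6. Layer widths, the sigmoid activation, the clipping inside g⁻¹, and the zero biases of the analytically computed SNNs (the note does not say how later biases are set) are all my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_inv(y, eps=1e-6):
    # g^-1 for the logistic sigmoid; values are clipped into (0, 1) first
    # (how out-of-range values are handled is not stated in the note).
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def ridge(A, B, C):
    # (A^T A + I/C)^-1 A^T B, the regularized pseudo-inverse used throughout.
    return np.linalg.solve(A.T @ A + np.eye(A.shape[1]) / C, A.T @ B)

def pullback(E, W, C):
    # E . (I/C + W^T W)^-1 . W^T : maps a target/error back through weights W.
    return E @ np.linalg.solve(np.eye(W.shape[1]) / C + W.T @ W, W.T)

def train_wi_hsnn(X, T, L=5, n_en=64, n_ex=32, C=2**10, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: first SNN with random weights, closed-form output weights a_t.
    a_en = rng.uniform(-1, 1, (X.shape[1], n_en)); b_en = rng.uniform(-1, 1, n_en)
    a_ex = rng.uniform(-1, 1, (n_en, n_ex));       b_ex = rng.uniform(-1, 1, n_ex)
    F_en = sigmoid(X @ a_en + b_en)
    F_ex = sigmoid(F_en @ a_ex + b_ex)
    snns = [(a_en, b_en, a_ex, b_ex)]
    F_t = F_ex
    a_t = ridge(F_t, T, C)

    for _ in range(L - 1):
        # Steps 2/5: feedback error matrices from the residual e = T - F_t a_t,
        # pulled back through a_t (-> P_ex) and the previous a_ex (-> P_en).
        e = T - F_t @ a_t
        P_ex = sigmoid_inv(pullback(e, a_t, C))
        P_en = sigmoid_inv(pullback(P_ex, a_ex, C))

        # Step 3: entrance/exit weights of the next SNN (biases assumed 0 here).
        a_en = ridge(X, P_en, C)
        F_en = sigmoid(X @ a_en)
        a_ex = ridge(F_en, P_ex, C)
        F_ex = sigmoid(F_en @ a_ex)
        snns.append((a_en, np.zeros(n_en), a_ex, np.zeros(n_ex)))

        # Step 4: accumulate all SNN outputs and refresh the output weights.
        F_t = F_t + F_ex
        a_t = ridge(F_t, T, C)

    return snns, a_t, F_t                    # F_t is the final encoding F_t^L

# Toy usage on random data (3-class, one-hot targets).
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
T = np.eye(3)[rng.integers(0, 3, 300)]
snns, a_t, F_t = train_wi_hsnn(X, T)
Y = F_t @ a_t                                # final classification scores
```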

3.4 Batch-by-batch scheme with parallelism strategy

The entire feature set 𝐅 ∈ ℝᴺˣᵈ (i.e., 𝐇) and the target set 𝐓 are split into p subsets:

𝐇 = [𝐇(𝐱₁); …; 𝐇(𝐱ₚ)],  𝐓 = [𝐓(𝐱₁); …; 𝐓(𝐱ₚ)]  (stacked row-wise)

The desired weight matrix 𝐰₂ (i.e., “𝛃”), expressed with the regularized pseudo-inverse, becomes

𝐰₂ = (𝐇ᵀ𝐇 + I/C)⁻¹ 𝐇ᵀ⋅𝐓
𝐰₂ = ([𝐇(𝐱₁)ᵀ, …, 𝐇(𝐱ₚ)ᵀ]⋅[𝐇(𝐱₁); …; 𝐇(𝐱ₚ)] + I/C)⁻¹ ⋅ [𝐇(𝐱₁)ᵀ, …, 𝐇(𝐱ₚ)ᵀ]⋅[𝐓(𝐱₁); …; 𝐓(𝐱ₚ)]
𝐰₂ = ([𝐇(𝐱₁)ᵀ𝐇(𝐱₁) + … + 𝐇(𝐱ₚ)ᵀ𝐇(𝐱ₚ)] + I/C)⁻¹ ⋅ [𝐇(𝐱₁)ᵀ𝐓(𝐱₁) + … + 𝐇(𝐱ₚ)ᵀ𝐓(𝐱ₚ)]
𝐰₂ = (∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐇(𝐱ᵢ) + I/C)⁻¹ ⋅ ∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐓(𝐱ᵢ)
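
This decomposition is easy to verify numerically; a small NumPy check (batch count p, shapes, and C below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m, C, p = 1000, 32, 5, 2**10, 4
H = rng.standard_normal((N, d))
T = rng.standard_normal((N, m))

# Full-data regularized pseudo-inverse.
w2_full = np.linalg.solve(H.T @ H + np.eye(d) / C, H.T @ T)

# Same solution from p batch-wise partial sums H_i^T H_i and H_i^T T_i.
HtH, HtT = np.zeros((d, d)), np.zeros((d, m))
for Hi, Ti in zip(np.array_split(H, p), np.array_split(T, p)):
    HtH += Hi.T @ Hi
    HtT += Hi.T @ Ti
w2_batch = np.linalg.solve(HtH + np.eye(d) / C, HtT)

print(np.allclose(w2_full, w2_batch))   # True
```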

First, calculate the factor (∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐇(𝐱ᵢ) + I/C)⁻¹ (a combined code sketch of the batch-by-batch update follows after the 𝛃 derivation below).

  1. The matrices are accumulated batch by batch in the code. After the 1st batch, K = (𝐇₁ᵀ𝐇₁ + I/C)⁻¹ is obtained and returned.

  2. When the next batch 𝐇₂ is retrieved:

    K_new = (𝐇₂ᵀ𝐇₂ + 𝐇₁ᵀ𝐇₁ + I/C)⁻¹
    = (𝐇₂ᵀ𝐇₂ + K⁻¹)⁻¹ # analogous to (UBV + A)⁻¹, where U=𝐇₂ᵀ, V=𝐇₂, B=I, A=K⁻¹
    Woodbury identity: (A + UBV)⁻¹ = A⁻¹ - A⁻¹⋅U⋅(I + BV⋅A⁻¹⋅U)⁻¹⋅BV⋅A⁻¹
    = K - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ K
    = (I - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ ) ⋅ K

    Let Kₚ = (I - K⋅𝐇₂ᵀ⋅(I+𝐇₂⋅K⋅𝐇₂ᵀ)⁻¹ 𝐇₂ ). So K_new = Kₚ ⋅ K

Then, for the second term ∑ᵢ₌₁ᵖ 𝐇(𝐱ᵢ)ᵀ𝐓(𝐱ᵢ):

  1. In the first batch, 𝛃=K⋅𝐇₁ᵀ⋅𝐓₁ is obtained and returned.

  2. When the second batch arrives, the accumulated term becomes (𝐇₁ᵀ⋅𝐓₁ + 𝐇₂ᵀ⋅𝐓₂):

    𝛃_new = K_new ⋅ (𝐇₁ᵀ⋅𝐓₁ + 𝐇₂ᵀ⋅𝐓₂)
    = Kₚ ⋅ K ⋅ 𝐇₁ᵀ⋅𝐓₁ + K_new ⋅ 𝐇₂ᵀ⋅𝐓₂
    = Kₚ ⋅ 𝛃 + K_new ⋅ 𝐇₂ᵀ⋅𝐓₂
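
A sketch of the resulting incremental solver; only the sequential batch accumulation is shown (not the paper's parallelism strategy), and the function names are mine:

```python
import numpy as np

def init_batch(H1, T1, C=2**10):
    # First batch: K = (H1^T H1 + I/C)^-1 and beta = K H1^T T1.
    d = H1.shape[1]
    K = np.linalg.inv(H1.T @ H1 + np.eye(d) / C)
    beta = K @ H1.T @ T1
    return K, beta

def update_batch(K, beta, H2, T2):
    # Woodbury step: Kp = I - K H2^T (I + H2 K H2^T)^-1 H2,
    # K_new = Kp K, beta_new = Kp beta + K_new H2^T T2.
    b = H2.shape[0]
    Kp = np.eye(K.shape[0]) - K @ H2.T @ np.linalg.solve(np.eye(b) + H2 @ K @ H2.T, H2)
    K_new = Kp @ K
    beta_new = Kp @ beta + K_new @ H2.T @ T2
    return K_new, beta_new

# Sanity check against the one-shot solution on random data.
rng = np.random.default_rng(0)
C = 2**10
H = rng.standard_normal((600, 16))
T = rng.standard_normal((600, 3))
batches = list(zip(np.array_split(H, 3), np.array_split(T, 3)))
K, beta = init_batch(*batches[0], C)
for Hi, Ti in batches[1:]:
    K, beta = update_batch(K, beta, Hi, Ti)
w2 = np.linalg.solve(H.T @ H + np.eye(16) / C, H.T @ T)
print(np.allclose(beta, w2))    # True
```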


Inverse of a sum of matrices

(DDG search: “矩阵之和的逆” [inverse of a sum of matrices]) “Inverting the sum of two matrices” - ~海棠依旧~ - CSDN

“A discussion of the inverse of a sum of two matrices” (关于两个矩阵之和逆阵的讨论) - docin

(Google search: “两矩阵和的逆” [inverse of a sum of two matrices]) “The inverse matrix of a sum of two matrices” (兩矩陣和的逆矩陣) - 線代啟示錄 - 周志成 (NYCU)


The inverse of a sum of matrices is not equal to the sum of the inverses:

(A+B)⁻¹ ≠ A⁻¹ + B⁻¹

∵ (A+B)(A⁻¹ + B⁻¹) = E + BA⁻¹ + AB⁻¹ + E ≠ E

The simplest example: take A = B = E; then (A+B)(A⁻¹ + B⁻¹) = E + BA⁻¹ + AB⁻¹ + E = 4E ≠ E. (“Are the inverse of a sum of matrices and the sum of the inverses equal?” - 百度知道 / Baidu Zhidao)
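
A quick NumPy confirmation (two arbitrary invertible matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [4.0, 5.0]])
print(np.allclose(np.linalg.inv(A + B), np.linalg.inv(A) + np.linalg.inv(B)))  # False
# Simplest case A = B = I: (2I)^-1 = 0.5*I, while A^-1 + B^-1 = 2I.
```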

Ref