Authors: Wandong Zhang et al. Publish date: 2020-03-30 (finished in 2019)
IEEE Trans. Industrial Informatics | G.Drive | G.Scholar
Summary attempt (2023-02-26):
- Different features are concatenated and then fed into an "I-ELM with subnetwork nodes".
- What is optimized is the combination weights; the feature vectors themselves are not changed.
- It is the weights (IW, š) that are refined.
- Specifically, each new R-SNN node is improved by adding a portion of unlearned weights acquired from the residual error of the previous node. That is, the weights are accumulated onto the newest node, so the final R-SNN node contains all the previous training outcomes. What is kept is only the last R-SNN node, i.e., an SLFN.
- Does that require the performance of the final R-SNN node to be the best among all the former nodes?
(In code) The update process of an SLFN is as follows:
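A minimal numpy sketch of that one-shot update, assuming the regularized Moore-Penrose form used later in the "Proposed Width-Growth Model" section (the variable names O, T, C, beta are mine, not the paper's):

```python
import numpy as np

def solve_output_weights(O, T, C=1e3):
    """One-shot least-squares refit of the output weights:
    beta = (I/C + O^T O)^(-1) O^T T  (no back-propagation)."""
    d = O.shape[1]
    return np.linalg.solve(np.eye(d) / C + O.T @ O, O.T @ T)

# toy usage: N samples, d-dimensional accumulated feature, m classes (one-hot targets)
N, d, m = 100, 32, 5
O = np.random.randn(N, d)                  # accumulated feature matrix
T = np.eye(m)[np.random.randint(0, m, N)]  # one-hot target matrix
beta = solve_output_weights(O, T)          # d x m output weights
residual = T - O @ beta                    # error term driving the next R-SNN node
```

The full width-growth loop that generates and accumulates the R-SNN features is sketched at the end of the "Proposed Width-Growth Model" section below.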
Abstract
- A supervised multi-layer subnetwork-based feature refinement and classification model for representation learning.
- Expands the width of a generalized hidden layer rather than stacking more layers to go deeper.
- A one-shot solution for finding a meaningful latent space to recognize objects, rather than searching separate spaces to find a generalized feature space.
- Multimodal fusion: various feature sources are fused into a superstate encoding, instead of the unimodal feature coding used in traditional feature representation methods.
ā . Introduction
(Task & application & list of related research fields & problem & brief overview of existing solutions)
- Task: high-dimensional data processing and learning
- Problem definition: selecting the optimal feature descriptors
- 2 branches of solutions: hand-crafted descriptors and deep-learning-based features.
(Criticize the former feature extraction solutions and introduce proposed method:)
- Features derived from approaches of those 2 categories are too inflexible to contribute to a robust model.
- This method "encodes and refines these? raw features from multiple sources to improve the classification performance".
For example: 4 extracted features (from AlexNet, ResNet, HMP, and SPF) are concatenated into 1 vector taken as the input to a "3-layer" model, where only a single "generalized" hidden layer (latent space) bridges the raw feature space (transformation ax+b) and the final target space (residual error).
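As a rough illustration of that preprocessing step (the feature dimensions below are placeholders I made up, not the paper's settings):

```python
import numpy as np

# Hypothetical pre-extracted features for a batch of N images;
# the paper's sources are AlexNet, ResNet, HMP and SPF descriptors.
N = 64
feat_alexnet = np.random.randn(N, 4096)
feat_resnet  = np.random.randn(N, 2048)
feat_hmp     = np.random.randn(N, 1000)
feat_spf     = np.random.randn(N, 500)

# Concatenate all sources into one "supervector" per sample; this matrix is
# the raw input of the 3-layer width-growth model.
X = np.concatenate([feat_alexnet, feat_resnet, feat_hmp, feat_spf], axis=1)
print(X.shape)   # (64, 7644)
```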
(Recap deep learning models and mention the theory base of this work)
- Deep networks often get "trapped in local minimum and are sensitive to the learning rate" because their training foundation is BP (back-propagation).
- Regression-based feature learning. Least-squares representation learning methods.
(Problems to be solved)
Drawbacks of regression-based approaches:
- “block” models? don’t perform one-shot training philosophy based on the relation between raw data and the target.
- A model trained by some “designed” process has a inferious generalizatio n capacity than the model derived from one-shot training strategy (least-squares).
Drawbacks of multilayer neural networks & solution
- Deeper layer-stacked models suffer from overfitting with limited training samples.
- A network-in-network structure enhances the network's generalization capacity for feature learning, e.g., ELM with subnetwork nodes.
- Contributions:
- Subnetwork neural nodes (SNN) realize multilayer representation learning. Unlike an ensembled network, each SNN is trained based on the error term.
- Feature-space transformation and classification are solved together by iteratively searching for the optimal encoding space (hidden layer).
- Concatenating multiple features results in more discriminative representations of the samples.
ā ”. Literature review
A. Conventional Feature Coding
" Supervised method of learning representaiton evaluates the importance of a specific feature through the correlation between features and categories."
Conventional feature coding of images depends on prior knowledge of the problem. Thus, the features are not complete representations.
This paper enhances the feature by fusing (discriminative) hand-crafted features and (class-specific) CNN-based features.
B. Least-Squares Encoding Methods
The least-squares approximation methods, such as random forest and alternating minimization, have been exhaustively investigated in single-layer neural networks.
Related works: Moore-Penrose inverse; universal approximation capacity of I-ELM; ELM autoencoder [14]; features combined with subnetwork nodes [18].
Each SNN is applied as a local feature descriptor. Hence, the subspace features can be extracted? from the original data independently, and the useful features are generated via the combination of these features.
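A loose sketch of that idea, assuming a sigmoid activation and random projections (both are my assumptions, not details from the paper):

```python
import numpy as np

def snn_feature(X, out_dim, rng):
    """One subnetwork node: a random nonlinear projection of the raw data,
    acting as an independent local (subspace) feature descriptor."""
    a = rng.standard_normal((X.shape[1], out_dim))
    b = rng.standard_normal(out_dim)
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))    # sigmoid activation (assumed)

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 300))              # raw data: 128 samples, 300-D
# each SNN extracts its own subspace feature independently;
# combining (here: summing) them yields the useful feature
combined = sum(snn_feature(X, 50, rng) for _ in range(5))
```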
ā ¢. Proposed Method
A. Algorithmic Summary
Two steps:
- Preprocessing: concatenate various feature vectors into a single "supervector".
- Train the width-growth model:
Terminology:

| Layer | Name | Marker | Params | In | Out |
| --- | --- | --- | --- | --- | --- |
| input | Entrance (feature) layer | š | šᵢᶠ, šᵢᶠ (random vct) | linear combination | š |
| hidden | Refinement layer / subspace | š | šᵢʳ, šᵢʳ (š, b) | š | partial feature Ψ |
| output | Least-square learning layer | š£ | šᵢᵇ (š) | Ψ | sum of all partial features: šŖ |

Residual error: š
(An entrance layer and a refinement layer are each an "SNN", and their combination is an "R-SNN".)
- Initialization: for the 1st R-SNN, š₁ᶠ and š₁ʳ are random, generating a "false" feature Ψ. Then the first least-squares step (pseudoinverse) is performed to calculate š₁ᵇ from the target š and Ψ (see the initialization sketch after the flowchart below).
- Iteratively add R-SNN nodes (2 ≤ i ≤ L), i.e., refinement subspaces, into the hidden layer (the optimal feature space):
```mermaid
flowchart TB
  subgraph In[input feature]
    x1((1)) & x2((2)) & xe(("⋮")) & xn((n))
  end
  EnW("Entrance layer\n šᵢᶠ, šᵢᶠ\n random")
  subgraph H["entrance feature š"]
    h1((1)) & h2((2)) & he(("⋮")) & hD((D))
  end
  RefineW("Refinement layer\n šᵢʳ, šᵢʳ")
  subgraph Psi[partial feature Ψ]
    Ψ1((1)) & Ψ2((2)) & Ψe(("⋮")) & Ψd((d))
  end
  OW("Output layer\n šᵢᵇ")
  subgraph Out["Output vector"]
    o1((1)) & o2((2)) & oe(("⋮")) & om((m))
  end
  x1 & x2 & xe & xn --> EnW --> h1 & h2 & he & hD --> RefineW --> Ψ1 & Ψ2 & Ψe & Ψd --> OW --> o1 & o2 & oe & om
  Out -->|"- šᵢ₋₁"| erri["šᵢ"]
  erri & OW -.->|pinv| newΨ("š \n yielding\n šᵢ")
  subgraph H1["entrance feature šᵢ₊₁"]
    h11((1)) & h12((2)) & h1e(("⋮")) & h1D((D))
  end
  In --> EnW1("Entrance layer\n šᵢ₊₁ᶠ, šᵢ₊₁ᶠ\n random") --> h11 & h12 & h1e & h1D
  H1 --> RefineW1("Refinement layer\n šᵢ₊₁ʳ, šᵢ₊₁ʳ")
  %% -.-|solved by P| newΨ
  newΨ -.-> RefineW1
  subgraph Psi1[partial feature Ψ]
    Ψ11((1)) & Ψ12((2)) & Ψ1e(("⋮")) & Ψ1d((d))
  end
  RefineW1 --> Ψ11 & Ψ12 & Ψ1e & Ψ1d --> OW1("Output layer\n šᵢ₊₁ᵇ")
  %% OW1 -.-|solved by| erri
  erri -.-> OW1
  subgraph Out1["Output vector"]
    o11((1)) & o12((2)) & o1e(("⋮")) & o1m((m))
  end
  OW1 --> o11 & o12 & o1e & o1m
  Out1 -->|"- šᵢ"| erri+1["šᵢ₊₁"] --> newP
```
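A sketch of that initialization for the first R-SNN node (the shapes, the sigmoid activation, and the names A1_f, A1_r, beta1 are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

N, n, D, d, m, C = 200, 300, 128, 64, 10, 1e3
X = rng.standard_normal((N, n))                  # concatenated supervectors
T = np.eye(m)[rng.integers(0, m, N)]             # one-hot targets

# 1st R-SNN: entrance and refinement weights are both random,
# so Psi_1 is still an uninformed ("false") feature.
A1_f, b1_f = rng.standard_normal((n, D)), rng.standard_normal(D)
A1_r, b1_r = rng.standard_normal((D, d)), rng.standard_normal(d)
H1   = sigmoid(X @ A1_f + b1_f)                  # entrance feature
Psi1 = sigmoid(H1 @ A1_r + b1_r)                 # partial feature
O    = Psi1                                      # accumulated (optimal) feature so far

# first least-squares step: regularized pseudoinverse of O against the target T
beta1 = np.linalg.solve(np.eye(d) / C + O.T @ O, O.T @ T)
e1 = T - O @ beta1                               # residual error that drives node 2
```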
B. Model Definition
- An SLFN solving the regression problem can be expressed as a single nonlinear projection of the input followed by a least-squares output mapping.
- An MLNN instead applies nested transformations, stacking several such projections.
- The proposed method is a generalized SLFN:
  minimize J = ½ ā€–š - f(šᵢᶠ, šᵢʳ, šᵢʳ) š_Lᵇ‖, where
  - f(šᵢᶠ, šᵢʳ, šᵢʳ) = Σᵢ₌₁ᓸ g(šᵢᶠšᵢʳ + šᵢʳ): the sum over all R-SNN nodes
  - šᵢᶠ = g(šᵢᶠ, šᵢᶠ, š): the entrance feature (= g(ššᵢᶠ + šᵢᶠ); see the width-growth steps below)
  - š ∈ ā^(NƗm): expected output (target feature)
  - š ∈ ā^(NƗn): input matrix
  - L: number of R-SNN nodes
  - g: activation function
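For readability, the same objective transcribed into plain LaTeX. The letters a, b, H, X, T, β are my normalization of the note's bold symbols, and placing β_L inside the norm follows my reading of the width-growth steps below (prediction = accumulated feature times output weights):

```latex
\min_{\boldsymbol{\beta}_L}\; J
  = \tfrac{1}{2}\,\bigl\| \mathbf{T}
      - f\!\left(\mathbf{a}_i^{f}, \mathbf{a}_i^{r}, \mathbf{b}_i^{r}\right)
        \boldsymbol{\beta}_L \bigr\|,
\qquad
f\!\left(\mathbf{a}_i^{f}, \mathbf{a}_i^{r}, \mathbf{b}_i^{r}\right)
  = \sum_{i=1}^{L} g\!\left(\mathbf{H}_i^{f} \mathbf{a}_i^{r} + \mathbf{b}_i^{r}\right),
\qquad
\mathbf{H}_i^{f} = g\!\left(\mathbf{X} \mathbf{a}_i^{f} + \mathbf{b}_i^{f}\right),
% with T in R^{N x m} the target, X in R^{N x n} the input,
% L the number of R-SNN nodes, and g the activation function.
```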
- 3 differences from other least-squares-based MLNNs:
  - An SNN combines each dimension of the feature vector and serves as a local feature descriptor, while the R-SNN is the basic unit for refining feature vectors.
  - The optimal feature is the aggregation of R-SNN nodes added one by one. Each R-SNN is densely connected to the input vector and the output layer, and contains two linear projections. Different R-SNNs are independent because they learn from different errors.
  - The latent space is the aggregation of all R-SNN subspaces, so parameter training involves no block-wise communication between different spaces; feature refinement and classification are therefore done together.
C. Proposed Width-Growth Model
1. Input weights and bias šᵢᶠ, šᵢᶠ: randomly initialized.
   Entrance feature: šᵢᶠ = g(ššᵢᶠ + šᵢᶠ).
   Refined partial feature: Ψᵢ = g(šᵢᶠšᵢʳ + šᵢʳ), where šᵢʳ is random.
2. Output weights: šᵢᵇ = (š/C + šŖįµ€šŖ)⁻¹šŖįµ€š,
   where š is the identity matrix, C is the regularization hyperparameter, and (š/C + šŖįµ€šŖ)⁻¹šŖįµ€ is the regularized pseudoinverse of the accumulated feature šŖ (the sum of the partial features so far).
   Error: šᵢ = š - šŖšᵢᵇ.
3. šᵢ is the desired matrix generating the error šᵢ through šᵢšᵢᵇ = šᵢ, so
   šᵢ = šᵢ(š/C + (šᵢᵇ)ᵀšᵢᵇ)⁻¹(šᵢᵇ)ᵀ.
4. Refinement-layer weights of the next R-SNN:
   šᵢ₊₁ʳ = (š/C + (šᵢ₊₁ᶠ)ᵀšᵢ₊₁ᶠ)⁻¹(šᵢ₊₁ᶠ)ᵀ g⁻¹(šᵢ),
   because g(šᵢ₊₁ᶠšᵢ₊₁ʳ + šᵢ₊₁ʳ) = šᵢ.
   Next partial feature: Ψᵢ₊₁ = g(šᵢ₊₁ᶠšᵢ₊₁ʳ + šᵢ₊₁ʳ).
5. Accumulate the partial feature into the optimal feature: šŖᵢ₊₁ = šŖᵢ + Ψᵢ₊₁.
6. Update the error: šᵢ₊₁ = šᵢ - šŖᵢšᵢᵇ.

Repeat steps 4-6 (L - 2) times; the final feature šŖ_L is the generalized feature corresponding to the best output parameter š_Lᵇ for classification.
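A consolidated numpy sketch of steps 1-6 as a training loop. The variable names, the sigmoid/logit pair for g and g⁻¹, the omission of the refinement bias in later nodes, and the refit of beta at every iteration are my simplifications, not the paper verbatim:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def logit(y):                                    # g^-1 for the sigmoid (assumed)
    y = np.clip(y, 1e-6, 1 - 1e-6)
    return np.log(y / (1 - y))

def reg_pinv(M, C):
    """Regularized Moore-Penrose inverse (I/C + M^T M)^-1 M^T."""
    return np.linalg.solve(np.eye(M.shape[1]) / C + M.T @ M, M.T)

def width_growth_train(X, T, L=10, D=128, d=64, C=1e3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # steps 1-2: first R-SNN is fully random, then one-shot output weights and error
    H = sigmoid(X @ rng.standard_normal((n, D)) + rng.standard_normal(D))
    O = sigmoid(H @ rng.standard_normal((D, d)) + rng.standard_normal(d))   # O_1 = Psi_1
    beta = reg_pinv(O, C) @ T
    e = T - O @ beta
    for _ in range(1, L):
        # step 3: desired matrix P with P @ beta ~= e
        P = e @ np.linalg.solve(np.eye(beta.shape[1]) / C + beta.T @ beta, beta.T)
        # step 4: new random entrance feature, refinement weights solved so that
        #         g(H_new @ A_r) ~= P, giving the next partial feature Psi
        H = sigmoid(X @ rng.standard_normal((n, D)) + rng.standard_normal(D))
        A_r = reg_pinv(H, C) @ logit(P)
        Psi = sigmoid(H @ A_r)
        # steps 5-6: accumulate the partial feature, refit beta, update the error
        O = O + Psi
        beta = reg_pinv(O, C) @ T
        e = T - O @ beta
    return O, beta            # generalized feature O_L and output weights beta_L
```

A real implementation would also store each node's entrance and refinement weights so the same accumulated feature can be rebuilt for test samples before applying beta; this sketch keeps only the training-side quantities.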