read: Nets - SLFN | B-ELM

Bidirectional Extreme Learning Machine for Regression Problem (TNNLS 2012)

Authors: Yimin Yang; Yaonan Wang; Xiaofang Yuan

Code | TNNLS (2012-06-20)

Trying to summarize:
(2023-02-15) B-ELM is a variant of I-ELM that divides the hidden nodes into two types, odd and even, which differ in whether the input parameters (๐š, b) are randomly generated or calculated by two inverse operations.

Specifically, the first node is solved from the target ๐ญ. Then for subsequent nodes, the even node is solved based on the residual error of its previous odd node.

  • The output weights of the odd nodes are calculated as: ๐‡โ‚‚โ‚™โ‚‹โ‚สณ + eโ‚‚โ‚™โ‚‹โ‚‚ (or ๐ญ) โž” ๐›ƒโ‚‚โ‚™โ‚‹โ‚ (โž”eโ‚‚โ‚™โ‚‹โ‚)
  • The input weights ๐š and bias b of even nodes are solved based on the residual error: eโ‚‚โ‚™โ‚‹โ‚ โž” ๐‡โ‚‚โ‚™แต‰ โž” ๐š,b (โž” ^๐‡โ‚‚โ‚™แต‰ โž” ๐›ƒโ‚‚โ‚™ โž” eโ‚‚โ‚™)
  • Note: the superscripts r and e stand for “random” and “error”, marking the source of ๐‡.

Abstract

This algorithm drives the network output error toward 0 by solving the input weights ๐š and bias b based on the network residual error. (In other words, the residual error is represented by the ๐š, b of subsequent nodes; the error is absorbed by parameters other than ๐›ƒ.)

โ… . Introduction

For an ELM with a fixed structure, the best number of hidden nodes needs to be found by trial and error, because the residual error does not always decrease as more hidden nodes are added to the ELM.

For incremental ELM, hidden nodes are added one by one, so the residual error keeps decreasing. But training takes many iterations, i.e., a new calculation (e.g., a matrix inverse for the output weight) is needed after each node is added.

Compared with other incremental ELMs, this method:

  1. is faster and uses fewer hidden nodes;
  2. shows the relationship between the network output residual error ๐ž and the output weights ๐›ƒ, named the “error–output weights ellipse”;
  3. demonstrates that a hidden layer whose input weights are determined (rather than random) reduces the error, i.e., improves the accuracy.

โ…ก. Preliminaries and Notation

A. Notations and Definitions

โŸจu,vโŸฉ = โˆซ_X u(๐ฑ)vโ€พ(๐ฑ)d๐ฑ is the (Lยฒ) inner product of the two functions u, v on X, where the overline denotes the complex conjugate.
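
For sampled training data this functional inner product reduces to a plain dot product over the N training points, which is how it is used in the formulas below. A tiny numpy illustration (variable names are my own, not the paper's):

```python
import numpy as np

# On N training samples, a function u(x) is represented by the length-N
# vector of its values, so the inner product and squared norm become:
u = np.array([0.2, 0.5, 0.9])    # e.g. one hidden node's outputs on 3 samples
v = np.array([1.0, 0.4, -0.3])   # e.g. the residual error on the same samples

inner = np.dot(u, v)             # <u, v>
norm_sq = np.dot(u, u)           # ||u||^2
```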

B. I-ELM

Lemma 1 (proved by Huang [8]) indicates that, for incremental ELM, the target function can be approximated by adding more nodes to the network, driving the residual error toward 0, where the ๐›ƒ of each new node is calculated from the error of the network's previous state eโ‚™โ‚‹โ‚ as:

$๐›ƒโ‚™ = \frac{โŸจeโ‚™โ‚‹โ‚, ๐‡โ‚™สณโŸฉ}{โ€–๐‡โ‚™สณโ€–ยฒ}$

The numerator is the inner product between the output ๐‡โ‚™สณ of the nth (random) hidden node to be added and the error of the network with n-1 hidden nodes; it measures how much of the remaining error the new node can account for.

So the contribution ๐›ƒโ‚™๐‡โ‚™สณ of each newly added hidden node gets smaller and smaller, because the residual error it learns from keeps shrinking.
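
A minimal numpy sketch of one I-ELM step (my own illustration, assuming a sigmoid activation and single-output regression; all names are mine): the new random node's ฮฒ is the projection of the current residual onto the node's output vector, so the residual norm can only shrink.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_random_node(X, e_prev, rng):
    """One I-ELM step (Lemma 1): random (a, b), then
    beta = <e_prev, H> / ||H||^2, and the residual is updated."""
    a = rng.uniform(-1.0, 1.0, size=X.shape[1])  # random input weights
    b = rng.uniform(-1.0, 1.0)                   # random bias
    H = sigmoid(X @ a + b)                       # node output on all samples
    beta = np.dot(e_prev, H) / np.dot(H, H)      # Lemma 1 / eq. (8)
    e_new = e_prev - beta * H                    # residual after adding the node
    return a, b, beta, e_new

# toy usage: the residual norm never increases
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 3))
t = np.sin(X).sum(axis=1)                        # toy regression target
e = t.copy()                                     # empty network: error = target
for _ in range(5):
    *_, e = add_random_node(X, e, rng)
    print(np.linalg.norm(e))
```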

โ…ข. Proposed Bidirectional ELM Method

A. Structure of the Proposed Bidirectional ELM Method

There are two types of nodes: a node with odd index (2n-1) has random ๐š, b, while the ๐š, b of a node with even index (2n) are calculated from the residual error left by the network after the previous odd node.

The output weights of both types are calculated based on Lemma 1. The ๐‡ of the even node comes from the residual error, not from random ๐š, b.

  • The odd node 2n-1 approximates the previous residual error (or the target for the first node) through ๐‡โ‚‚โ‚™โ‚‹โ‚สณฮฒโ‚‚โ‚™โ‚‹โ‚, where ๐‡โ‚‚โ‚™โ‚‹โ‚สณ is generated from randomly chosen ๐š, b;

  • The even node 2n approximates the residual error ๐žโ‚‚โ‚™โ‚‹โ‚ through ๐‡โ‚‚โ‚™แต‰ฮฒโ‚‚โ‚™ with calculated ๐š, b, where ๐‡โ‚‚โ‚™แต‰ is generated with the weights ๐š, b solved from ๐žโ‚‚โ‚™โ‚‹โ‚, the part of the target that the previous nodes have not yet accounted for.

Bidirectional means the approximation is learned from both the target and the error: the odd node's ฮฒโ‚‚โ‚™โ‚‹โ‚ is solved from the target, while the even node's ฮฒโ‚‚โ‚™ is calculated from the error. So a pair of one odd node and one even node forms a complete step toward the target.

Bidirectional also means ๐‡โ‚‚โ‚™แต‰ is first calculated from eq. (6); then it is recomputed as ^๐‡โ‚‚โ‚™แต‰ using ^๐š, ^b, which are solved from ๐‡โ‚‚โ‚™แต‰. The updated ^๐‡โ‚‚โ‚™แต‰ gives the output weight ฮฒโ‚‚โ‚™ via eq. (7), and the resulting residual eโ‚‚โ‚™ is then used, via Lemma 1, to calculate the output weight of the next random odd node.

```mermaid
flowchart LR
    subgraph in[inputs]
        x1 & xe["โ‹ฎ"] & xn
    end
    rand(("random\n ๐š,b"))
    calculated(("solved\n ๐š,b"))
    subgraph hid[hidden]
        H1 & he["โ‹ฎ"] & h2n-1 & h2n
    end
    x1 & xe & xn --> rand --> h2n-1["๐‡โ‚‚โ‚™โ‚‹โ‚สณ"] ---|Lemma 1| ฮฒ2n-1(("ฮฒโ‚‚โ‚™โ‚‹โ‚")) --> e2n-1["eโ‚‚โ‚™โ‚‹โ‚"]
    x1 & xe & xn --> calculated --> h2n["๐‡โ‚‚โ‚™แต‰"] ---|Lemma 1| ฮฒ2n(("ฮฒโ‚‚โ‚™")) --> e2n["eโ‚‚โ‚™"]
    h2n -.-|"โฌ… inv of\n ฮฒโ‚‚โ‚™โ‚‹โ‚"| e2n-1
    calculated -.-|"โฌ… inv of\n ๐ฑ"| h2n
```

The dotted lines represent the inverse calculations.
ฮฒโ‚‚โ‚™โ‚‹โ‚ is derived from the network residual error of the last state.


Block diagram:

```mermaid
flowchart TB
    init["Initialization:\n Given training set,\n expected accuracy ฮท and\n let #hidden nodes L=0"] --> incre[L = L+1] --> OdEv{"Is L\n odd or even?"}
    OdEv -->|L=2n+1| I-ELM --> calcE[Calculate\n residual error E]
    OdEv -->|L=2n| Theorem2 --> update["Update ^H_L using (^๐š_L, ^b_L)\n and calculate the output weight\n ฮฒ_L based on eq.(7)"] --> calcE
    subgraph Theorem2
        direction TB
        calcH["Calculate output matrix\n H_L based on eq.(6)"] --> calcab["Calculate hidden-node parameters\n (^๐š_L, ^b_L) based on eq.(14)"]
    end
    subgraph I-ELM
        direction TB
        rand["Randomly assign hidden-node\n parameters (๐š_L, b_L)\n and obtain\n output matrix H_L"] -->|Lemma 1| outW["Calculate the output weight\n ฮฒ_L according to eq.(8)"]
    end
    calcE --> thres{"โ€–Eโ€–<ฮท"} -->|Yes| END
    %% thres ---->|No| incre   %% messes up the chart
    incre o---o|No| thres
```
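
The block diagram can be condensed into a short training loop. Below is a minimal numpy sketch, not the paper's reference implementation: it assumes single-output regression and a sigmoid activation, and it replaces eq. (14) (not reproduced in these notes) with a plain least-squares fit of (^๐š, ^b) through the inverse sigmoid; all function and variable names are mine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inv_sigmoid(h, eps=1e-6):
    h = np.clip(h, eps, 1 - eps)                 # keep the logit finite
    return np.log(h / (1 - h))

def belm_train(X, t, max_nodes=20, eta=1e-3, seed=0):
    """Sketch of the block diagram for single-output regression.
    Odd nodes: random (a, b) + Lemma 1 (eq. 8).
    Even nodes: H from eq. (6), then (a_hat, b_hat) refit from H
    (a stand-in for eq. 14), updated H_hat, and beta from eq. (7)."""
    rng = np.random.default_rng(seed)
    nodes, e = [], t.copy()                      # empty network: error = target
    for L in range(1, max_nodes + 1):
        if L % 2 == 1:                           # odd node: I-ELM step
            a = rng.uniform(-1, 1, size=X.shape[1])
            b = rng.uniform(-1, 1)
            H = sigmoid(X @ a + b)
        else:                                    # even node
            H = e_prev_odd / beta_prev           # eq. (6): H = e_{2n-1} * beta_{2n-1}^-1
            z = inv_sigmoid(H)                   # pull back through the activation
            A = np.c_[X, np.ones(len(X))]        # least-squares fit of (a_hat, b_hat)
            w, *_ = np.linalg.lstsq(A, z, rcond=None)
            a, b = w[:-1], w[-1]
            H = sigmoid(X @ a + b)               # updated H_hat from (a_hat, b_hat)
        beta = np.dot(e, H) / np.dot(H, H)       # Lemma 1: eq. (7) / eq. (8)
        e = e - beta * H                         # new residual
        nodes.append((a, b, beta))
        if L % 2 == 1:                           # remember the odd step for eq. (6)
            e_prev_odd, beta_prev = e.copy(), beta
        if np.linalg.norm(e) < eta:              # ||E|| < eta -> stop
            break
    return nodes

# toy usage
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
t = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]
nodes = belm_train(X, t)
y_hat = sum(beta * sigmoid(X @ a + b) for a, b, beta in nodes)
print(len(nodes), "nodes, training error:", np.linalg.norm(t - y_hat))
```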

B. Bidirectional ELM method

Theorem 1 states that the target function ๐‘“ is approximated by the existing network ๐‘“โ‚‚โ‚™โ‚‹โ‚‚ plus the last two nodes, ๐‡โ‚‚โ‚™โ‚‹โ‚สณฮฒโ‚‚โ‚™โ‚‹โ‚ and ๐‡โ‚‚โ‚™แต‰ฮฒโ‚‚โ‚™, as nโ†’โˆž. In other words, the residual error of the network goes to 0:

$\lim_{nโ†’โˆž}โ€–๐‘“-(๐‘“โ‚‚โ‚™โ‚‹โ‚‚ + ๐‡โ‚‚โ‚™โ‚‹โ‚สณฮฒโ‚‚โ‚™โ‚‹โ‚ + ๐‡โ‚‚โ‚™แต‰ฮฒโ‚‚โ‚™)โ€– = 0$

where the sequence ๐‡โ‚‚โ‚™แต‰ (the output of the even node, calculated from the fed-back error) is determined by the inverse of the last output weight ฮฒโ‚‚โ‚™โ‚‹โ‚:

$๐‡โ‚‚โ‚™แต‰ = eโ‚‚โ‚™โ‚‹โ‚ โ‹…(ฮฒโ‚‚โ‚™โ‚‹โ‚)โปยน$ โ€ƒ (6)

That means $๐‡โ‚‚โ‚™แต‰ โ‹… ฮฒโ‚‚โ‚™โ‚‹โ‚ = eโ‚‚โ‚™โ‚‹โ‚$: this even node approaches the last residual error based on the known output weight ฮฒโ‚‚โ‚™โ‚‹โ‚.
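
In the single-output case ฮฒโ‚‚โ‚™โ‚‹โ‚ is a scalar (my simplification), so the "inverse" in eq. (6) is just a division, and the identity above can be checked directly:

```python
import numpy as np

e_prev = np.array([0.30, -0.12, 0.05])   # hypothetical residual e_{2n-1}
beta_prev = 0.6                          # hypothetical output weight beta_{2n-1}

H_even = e_prev / beta_prev              # eq. (6): H_{2n}^e = e_{2n-1} * beta_{2n-1}^-1
assert np.allclose(H_even * beta_prev, e_prev)   # H_{2n}^e * beta_{2n-1} = e_{2n-1}
```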

Then its output weight can be calculated based on Lemma 1 as:

$ฮฒโ‚‚โ‚™ = \frac{โŸจeโ‚‚โ‚™โ‚‹โ‚, ๐‡โ‚‚โ‚™แต‰โŸฉ}{โ€–๐‡โ‚‚โ‚™แต‰โ€–ยฒ}$ โ€ƒ (7)

Once this even node is determined, the corresponding residual error eโ‚‚โ‚™ can be generated, and the output weight of the next odd node (with ๐‡โ‚‚โ‚™โ‚Šโ‚สณ from random weights) can be calculated based on Lemma 1:

$ฮฒโ‚‚โ‚™โ‚Šโ‚ = \frac{โŸจeโ‚‚โ‚™, ๐‡โ‚‚โ‚™โ‚Šโ‚สณโŸฉ}{โ€–๐‡โ‚‚โ‚™โ‚Šโ‚สณโ€–ยฒ}$ โ€ƒ (8)

Theorem 2 further states that the updated ^๐‡โ‚‚โ‚™แต‰, calculated with the optimal ^๐š, ^b solved from ๐‡โ‚‚โ‚™แต‰ by least squares (pseudo-inverse), also lets the residual error converge to 0.
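
A small numerical sketch of this step (my own illustration; the exact form of eq. (14) is not reproduced in these notes, so the fit below uses an inverse sigmoid plus a least-squares / pseudo-inverse solve as a stand-in): the ideal ๐‡โ‚‚โ‚™แต‰ from eq. (6) is pulled back through the activation, (^๐š, ^b) are fitted, and the achievable ^๐‡โ‚‚โ‚™แต‰ is plugged into eq. (7). Because ฮฒโ‚‚โ‚™ is an optimal projection coefficient, the residual cannot grow even though ^๐‡โ‚‚โ‚™แต‰ differs from the ideal ๐‡โ‚‚โ‚™แต‰.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inv_sigmoid(h, eps=1e-6):
    h = np.clip(h, eps, 1 - eps)
    return np.log(h / (1 - h))

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(50, 2))
e = rng.normal(size=50)                    # pretend residual e_{2n-1}
beta_prev = 0.7                            # pretend output weight beta_{2n-1}

H_ideal = e / beta_prev                    # eq. (6): what the node "should" output
z = inv_sigmoid(H_ideal)                   # pull back through the activation
A = np.c_[X, np.ones(len(X))]
w, *_ = np.linalg.lstsq(A, z, rcond=None)  # least-squares (pseudo-inverse) fit
a_hat, b_hat = w[:-1], w[-1]
H_hat = sigmoid(X @ a_hat + b_hat)         # the achievable ^H of the even node

beta = np.dot(e, H_hat) / np.dot(H_hat, H_hat)   # eq. (7) with the updated ^H
e_new = e - beta * H_hat
print(np.linalg.norm(e), ">=", np.linalg.norm(e_new))  # the error never increases
```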

Remark 1 clarifies the difference between this method and I-ELM: the input weights and biases of the even nodes are calculated rather than randomly generated, while the output weights are set similarly, based on Lemma 1.

Based on eq. (6), ฮ” = โ€–eโ‚‚โ‚™โ‚‹โ‚โ€–ยฒ + โ€–eโ‚‚โ‚™โ€–ยฒ can be reformulated as an ellipse curve.
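
One way to see where an ellipse comes from (my own restatement using only Lemma 1; the paper's exact “error–output weights ellipse” expression may be written differently): substituting ฮฒโ‚‚โ‚™ from eq. (7) into eโ‚‚โ‚™ = eโ‚‚โ‚™โ‚‹โ‚ - ฮฒโ‚‚โ‚™๐‡โ‚‚โ‚™แต‰ gives

$โ€–eโ‚‚โ‚™โ€–ยฒ = โ€–eโ‚‚โ‚™โ‚‹โ‚โ€–ยฒ - ฮฒโ‚‚โ‚™ยฒโ€–๐‡โ‚‚โ‚™แต‰โ€–ยฒ$, i.e. $\frac{โ€–eโ‚‚โ‚™โ€–ยฒ}{โ€–eโ‚‚โ‚™โ‚‹โ‚โ€–ยฒ} + \frac{ฮฒโ‚‚โ‚™ยฒโ€–๐‡โ‚‚โ‚™แต‰โ€–ยฒ}{โ€–eโ‚‚โ‚™โ‚‹โ‚โ€–ยฒ} = 1$

so, for a fixed previous error, the pair (โ€–eโ‚‚โ‚™โ€–, ฮฒโ‚‚โ‚™) lies on an ellipse with semi-axes โ€–eโ‚‚โ‚™โ‚‹โ‚โ€– and โ€–eโ‚‚โ‚™โ‚‹โ‚โ€–/โ€–๐‡โ‚‚โ‚™แต‰โ€–.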

Code

Training needs to store the parameters (๐š, b, ฮฒ) of each node, and testing needs to evaluate each node sequentially.
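
A minimal inference sketch under the same single-output / sigmoid assumptions as the training sketches above (names are mine): every stored (๐š, b, ฮฒ) triple is evaluated in turn and the contributions are summed, regardless of whether the node was odd (random ๐š, b) or even (solved ๐š, b).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def belm_predict(X, nodes):
    """Network output = sum over stored nodes of beta_i * H_i(X)."""
    y = np.zeros(len(X))
    for a, b, beta in nodes:   # nodes: list of (a, b, beta) stored during training
        y += beta * sigmoid(X @ a + b)
    return y

# e.g. with the `nodes` list returned by the training sketch above:
# y_hat = belm_predict(X_test, nodes)
```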

Reference