Authors: Yimin Yang; Yaonan Wang; Xiaofang Yuan
Summary:
(2023-02-15) B-ELM is a variant of I-ELM that divides the hidden nodes into 2 types, odd and even,
which differ in whether the input parameters (a, b) are randomly generated or calculated through two inverse operations.
Specifically, the first node is solved from the target t. Then, for subsequent nodes, each even node is solved based on the residual error of its preceding odd node.
- The output weights of the odd nodes are calculated as: H₂ₙ₋₁ʳ + e₂ₙ₋₂ (or t) → β₂ₙ₋₁ (→ e₂ₙ₋₁)
- The input weights a and bias b of the even nodes are solved from the residual error: e₂ₙ₋₁ → H₂ₙᵉ → a, b (→ Ĥ₂ₙᵉ → β₂ₙ → e₂ₙ)
- Note: the superscripts r and e stand for "random" and "error", marking the source of H.
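Written out for one odd-even pair (my restatement of the arrows above, using the equations formalized in Sec. Ⅲ-B below; g denotes the activation function, and the â, b̂ step is the Theorem 2 refinement):
$\beta_{2n-1} = \frac{\langle e_{2n-2},\ H_{2n-1}^r\rangle}{\|H_{2n-1}^r\|^2}$,  $e_{2n-1} = e_{2n-2} - \beta_{2n-1}H_{2n-1}^r$
$H_{2n}^e = e_{2n-1}\cdot(\beta_{2n-1})^{-1}$,  then solve $\hat{a}, \hat{b}$ from $H_{2n}^e$ and set $\hat{H}_{2n}^e = g(\hat{a}\cdot x + \hat{b})$
$\beta_{2n} = \frac{\langle e_{2n-1},\ \hat{H}_{2n}^e\rangle}{\|\hat{H}_{2n}^e\|^2}$,  $e_{2n} = e_{2n-1} - \beta_{2n}\hat{H}_{2n}^e$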
Abstract
This algorithm tends to reduce the network output error to 0
by solving the input weights a and bias b based on the network residual error. (In other words, the residual error is re-expressed through the a, b of subsequent nodes; the error is absorbed by parameters other than β.)
Ⅰ. Introduction
For an ELM with a fixed structure, the best number of hidden nodes has to be found by trial and error, because the residual error does not always decrease when more hidden nodes are added to an ELM.
For incremental ELM, hidden nodes are added one by one, so the residual error keeps decreasing. But training takes many iterations, i.e., a matrix inverse has to be computed after each node is added.
Compared with other incremental ELMs, this method:
- is faster and needs fewer hidden nodes;
- reveals the relationship between the network output residual error e and the output weights β, which is named the "error-output weights ellipse";
- shows that a hidden layer whose input weights are determined (calculated) rather than random reduces the error, i.e., improves the accuracy.
Ⅱ. Preliminaries and Notation
A. Notations and Definitions
⟨u, v⟩ = ∫_X u(x)·v̄(x) dx is the inner product of u and v (the function-space analogue of the Frobenius inner product of matrices), where the overline denotes the complex conjugate.
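In the discrete setting of N training samples, this inner product reduces to a plain dot product over the samples, which is how the β formulas below are evaluated in practice; a minimal sketch (my own illustration):

```python
import numpy as np

def inner(u, v):
    """Discrete counterpart of <u, v>: sum of u(x_i) * conj(v(x_i)) over the N samples."""
    return np.dot(u, np.conj(v))

# the norm appearing in the denominators of the beta formulas: ||H||^2 = <H, H>
H = np.random.rand(100)                      # a hidden node's outputs on 100 samples
print(inner(H, H), np.linalg.norm(H) ** 2)   # identical up to floating-point error
```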
B. I-ELM
Lemma 1 (proved by Huang) indicates that, for the incremental ELM, the target function can be approximated by adding more nodes into the network, driving the residual error to 0, where the β of each new node is calculated from the residual error eₙ₋₁ of the network's previous state as:
$\beta_n = \frac{\langle e_{n-1},\ H_n^r\rangle}{\|H_n^r\|^2}$
The numerator is the inner product between the output Hₙʳ of the nth (random) hidden node to be added and the residual error of the network with n−1 hidden nodes; it measures how well the new node aligns with that error.
So the contributions βₙHₙʳ of the newly added hidden nodes get smaller and smaller, because each of them only learns from the (shrinking) residual error.
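A minimal sketch of one I-ELM growth step under Lemma 1 (my own toy example, assuming a sigmoid activation and single-output regression; the data and names are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))      # N=200 samples, 3 inputs
t = np.sin(X.sum(axis=1))                  # toy target function
e = t.copy()                               # e_0 = t (empty network)

for n in range(1, 11):                     # add 10 random hidden nodes one by one
    a = rng.uniform(-1, 1, size=3)         # random input weights of the new node
    b = rng.uniform(-1, 1)                 # random bias
    H = sigmoid(X @ a + b)                 # H_n^r: output of the new node on all samples
    beta = np.dot(e, H) / np.dot(H, H)     # Lemma 1: <e_{n-1}, H_n^r> / ||H_n^r||^2
    e = e - beta * H                       # residual error of the grown network
    print(n, np.linalg.norm(e))            # the error norm never increases
```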
Ⅲ. Proposed Bidirectional ELM Method
A. Structure of the Proposed Bidirectional ELM Method
There are two types of nodes: a node with odd index {2n-1} has random a, b, while the a, b of a node with even index {2n} are calculated from the residual error left by the network at its preceding odd-numbered state.
The output weights of both types are calculated based on Lemma 1; for the even node, H comes from the residual error, not from random a, b.
- The odd node 2n-1 approximates the previous residual error (the target itself, for the first node) through H₂ₙ₋₁ʳβ₂ₙ₋₁, where H₂ₙ₋₁ʳ is produced from randomly generated a, b;
- The even node 2n approximates the residual error e₂ₙ₋₁ through H₂ₙᵉβ₂ₙ with calculated a, b, where H₂ₙᵉ is produced from the weights a, b solved from e₂ₙ₋₁, the part of the target not yet fitted by all the previous nodes.
Bi-direction means the approximation is driven from both the target and the error:
the odd node (β₂ₙ₋₁) is solved from the target (through the current residual), while the even node (β₂ₙ) is calculated from the error.
So a pair of one odd node and one even node forms one complete step toward the target.
Bi-directional also means H₂ₙᵉ is calculated twice: first from eq. (6), then again as Ĥ₂ₙᵉ using â, b̂, which are solved from H₂ₙᵉ; the updated Ĥ₂ₙᵉ is then used to calculate the output weight β₂ₙ, and the resulting residual e₂ₙ feeds the output weight of the next random odd node via Lemma 1.
The dotted lines represent the inverse calculations.
β₂ₙ₋₁ is derived from the network residual error of the previous state.
Block diagram:
B. Bidirectional ELM Method
Theorem 1 states that the target function f is approximated by the existing network f₂ₙ₋₂ plus the last two nodes, H₂ₙ₋₁ʳβ₂ₙ₋₁ and H₂ₙᵉβ₂ₙ, as n→∞; in other words, the residual error of the network goes to 0:
$\lim_{n\to\infty}\|f-(f_{2n-2} + H_{2n-1}^r\beta_{2n-1} + H_{2n}^e\beta_{2n})\| = 0$
where the sequence H₂ₙᵉ (the output of the even node, calculated from the fed-back error) is determined by the inverse of the last output weight β₂ₙ₋₁:
$H_{2n}^e = e_{2n-1}\cdot(\beta_{2n-1})^{-1}$   (6)
That means $H_{2n}^e \cdot \beta_{2n-1} = e_{2n-1}$: this even node matches the last residual error, based on the known output weight β₂ₙ₋₁.
Then its output weight can be calculated based on Lemma 1 as:
$\beta_{2n} = \frac{\langle e_{2n-1},\ H_{2n}^e\rangle}{\|H_{2n}^e\|^2}$   (7)
Once this even node is determined, the corresponding residual error e₂ₙ can be generated, and the output weight of the next odd node (with H₂ₙ₊₁ʳ from random weights) can also be calculated based on Lemma 1:
$\beta_{2n+1} = \frac{\langle e_{2n},\ H_{2n+1}^r\rangle}{\|H_{2n+1}^r\|^2}$   (8)
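A quick sanity check (my own derivation, assuming β₂ₙ₋₁ ≠ 0 and the usual residual update eₙ = eₙ₋₁ - βₙHₙ): substituting eq. (6) into eq. (7) gives
$\beta_{2n} = \frac{\langle e_{2n-1},\ e_{2n-1}(\beta_{2n-1})^{-1}\rangle}{\|e_{2n-1}(\beta_{2n-1})^{-1}\|^2} = \beta_{2n-1}$, and hence $e_{2n} = e_{2n-1} - \beta_{2n}H_{2n}^e = 0$.
So if the exact H₂ₙᵉ of eq. (6) could actually be produced by a hidden node, the residual would vanish in a single step; in practice only the approximation Ĥ₂ₙᵉ of Theorem 2 (next) is realizable, which is why the error shrinks gradually instead.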
Theorem 2 further states that the updated Ĥ₂ₙᵉ, calculated with the optimal â, b̂ solved from H₂ₙᵉ by least squares (pseudo-inverse), also lets the residual error converge to 0.
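A minimal sketch of this â, b̂ step (my own simplification: sigmoid activation with its logit as the inverse, target activations clipped into (0, 1) so the inverse is defined; the paper's exact normalization is not captured in these notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_even_node(X, e_prev, beta_prev, eps=1e-6):
    """Even-node update: eq.(6) target activations, then a-hat, b-hat by least squares."""
    H_target = e_prev / beta_prev                     # eq.(6): H_2n^e = e_{2n-1} * (beta_{2n-1})^-1
    Hc = np.clip(H_target, eps, 1.0 - eps)            # keep inside the sigmoid's range (assumption)
    Z = np.log(Hc / (1.0 - Hc))                       # invert the activation: z = logit(H)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])     # append a column of ones for the bias
    w = np.linalg.pinv(Xb) @ Z                        # least squares / pseudo-inverse: [a_hat; b_hat]
    a_hat, b_hat = w[:-1], w[-1]
    H_hat = sigmoid(X @ a_hat + b_hat)                # updated H-hat_2n^e from the fitted a_hat, b_hat
    beta = np.dot(e_prev, H_hat) / np.dot(H_hat, H_hat)  # eq.(7), evaluated with H-hat
    e_new = e_prev - beta * H_hat                     # new residual error e_2n
    return a_hat, b_hat, beta, e_new
```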
Remark 1 clarifies the difference between this method and I-ELM: the input weights and bias of the even nodes are calculated, not randomly generated, while the output weights are still set from Lemma 1, as in I-ELM.
Based on eq. (6), the sum ‖e₂ₙ₋₁‖² + ‖e₂ₙ‖² can be reformulated as an ellipse curve (the "error-output weights ellipse" mentioned in the introduction).
Code
Training needs to store the parameters for each node, and testing needs to query each node sequentially.
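A minimal sketch of what that storage and sequential querying could look like (my own structure, not the authors' code; sigmoid activation assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BELMNetwork:
    """Stores (a, b, beta) for every grown hidden node; prediction sums their contributions."""

    def __init__(self):
        self.nodes = []                       # (a, b, beta) tuples, in the order the nodes were added

    def add_node(self, a, b, beta):
        self.nodes.append((np.asarray(a, dtype=float), float(b), float(beta)))

    def predict(self, X):
        y = np.zeros(X.shape[0])
        for a, b, beta in self.nodes:         # query every stored node sequentially
            y += beta * sigmoid(X @ a + b)    # accumulate beta_i * H_i(x)
        return y

# usage: during training call net.add_node(a, b, beta) after each node; then net.predict(X_test)
```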