- Wikipedia-ELM
- Controversy: RBF networks (1980s) already raised a similar idea to ELM.
- Dispute about the originality of ELM: Origins of ELM
- Portal of ELM
- python toolbox: hpelm
Facts
ELM is ā½Ā¹ā¾
- a type of single hidden layer feedforward neural network (SLFN).
- The parameters (š°, b) between the input layer and the hidden layer are set randomly.
  Thus, for N n-dimensional input samples and L hidden nodes, the output of the hidden layer is $š_{NĆL} = g(š_{NĆn} š_{nĆL} + š_{NĆL})$.
- Only the number of hidden nodes needs to be set manually; there are no other hyper-parameters.
- The output weights are not trained iteratively; they are solved in one shot via the pseudo-inverse of the hidden-layer output matrix.
- For an n-dimensional sample š±ā±¼ and its target šā±¼ = [tā±¼ā, tā±¼ā, …, tā±¼ā‚˜]įµ ā āįµ,
  the output of an ELM with L hidden nodes is šØā±¼ = āįµ¢āāᓸ šįµ¢ g(š°įµ¢ā š±ā±¼ + bįµ¢), where
  - g(ā ) is the activation function;
  - šįµ¢ is the output weight vector of the i-th hidden node: šįµ¢ = [βᵢā, βᵢā, …, βᵢₘ]įµ;
  - š°įµ¢ is the input weight vector of the i-th hidden node: š°įµ¢ = [wįµ¢ā, wįµ¢ā, …, wᵢₙ]įµ;
  - š±ā±¼ is an n-dimensional input: š±ā±¼ = [xā±¼ā, xā±¼ā, …, xā±¼ā‚™]įµ ā āāæ;
  - bįµ¢ is the bias of the i-th hidden node;
  - šØā±¼ is the m-dimensional output: šØā±¼ = [oā±¼ā, oā±¼ā, …, oā±¼ā‚˜]įµ ā āįµ.
- The ideal parameters (š°, b, š) should satisfy
  āįµ¢āāᓸ šįµ¢ g(š°įµ¢ā š±ā±¼ + bįµ¢) = šā±¼,  j = 1, …, N.
  For all N samples, this mapping can be reformulated with matrices: $š_{NĆL} \pmb\beta_{LĆm} = š_{NĆm}$, where
  - š is the output of the hidden layer for the N samples:
    $$š(š°ā,…,š°_L, bā,…,b_L, š±ā,…,š±_N) = \begin{bmatrix} g(š°āā š±ā+bā) & \dots & g(š°_Lā š±ā+b_L)\\ \vdots & \ddots & \vdots\\ g(š°āā š±_N+bā) & \dots & g(š°_Lā š±_N+b_L) \end{bmatrix}_{NĆL}$$
  - š is the output weight matrix: š = $\begin{bmatrix}šāįµ\\ \vdots \\š_Lįµ\end{bmatrix}_{LĆm}$
  - š is the target data matrix: š = $\begin{bmatrix}šāįµ\\ \vdots \\š_Nįµ\end{bmatrix}_{NĆm}$
- Generally, $š_{NĆL}$ is not a square matrix, so it is not invertible and š = šā»Ā¹š cannot be applied. Instead, the optimal š is obtained by minimizing the training error āā±¼āāį“ŗāšØā±¼ - šā±¼ā, a linear least-squares problem that ELM solves in closed form rather than iteratively.
- Best estimation: $\hat{š°}įµ¢, \hat{b}įµ¢, \hat{š}$ satisfy
  $\|š(\hat{š°}ā,…,\hat{š°}_L, \hat{b}ā,…,\hat{b}_L)\,\hat{š} - š\| = \min_{š°įµ¢,\,bįµ¢,\,š} \|š(š°ā,…,š°_L, bā,…,b_L)\,š - š\|$, where i = 1, …, L.
- Loss function: J = āā±¼āāį“ŗāāįµ¢āāᓸ šįµ¢ g(š°įµ¢ā š±ā±¼ + bįµ¢) - šā±¼āĀ²
- Solving āJ/āš = 0 gives the optimal output weights:
  $\hat{š} = š^ā š = (šįµš)ā»Ā¹šįµ š$ (the second form when šįµš is invertible),
  where $š^ā $ is the Moore-Penrose inverse (pseudo-inverse) of š.
  It can be proved that $\hat{š}$ is the unique minimum-norm least-squares solution (for a given set of random (š°įµ¢, bįµ¢)).
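Below is a minimal NumPy sketch of this training procedure, assuming a sigmoid activation and the row-wise layout $š_{NĆL} š_{LĆm} = š_{NĆm}$; the function names (`elm_fit`, `elm_predict`) and the toy data are illustrative only.

```python
import numpy as np

def elm_fit(X, T, L, rng=None):
    """Fit an ELM: random (W, b), then solve beta = pinv(H) @ T in one shot.
    X: (N, n) inputs, T: (N, m) targets, L: number of hidden nodes."""
    rng = np.random.default_rng(rng)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, L))   # random input weights, fixed (never trained)
    b = rng.uniform(-1.0, 1.0, size=(1, L))   # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output H_{NxL}, sigmoid activation
    beta = np.linalg.pinv(H) @ T              # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                           # O_{Nxm}

# toy regression check
X = np.linspace(-1, 1, 200).reshape(-1, 1)
T = np.sin(3 * X)
W, b, beta = elm_fit(X, T, L=50, rng=0)
print(np.mean((elm_predict(X, W, b, beta) - T) ** 2))  # small training MSE
```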
Moore-Penrose inverse
Also called pseudoinverse or generalized inverse ā½Ā²ā¾.
(bilibili search: "ä¼Ŗéē©éµ" [pseudo-inverse]) video: Deep Learning ("Flower Book") 0103: pseudo-inverse and least squares
(DDG search: "ä¼Ŗéē©éµ" [pseudo-inverse])
"What is the meaning of the pseudo-inverse and how is it computed?" - ē„ä¹ (Zhihu)
numpy.linalg.pinv()
- pinv(š) = (šįµ š)ā»Ā¹ šįµ (when š has full column rank, i.e. šįµš is invertible)
- pinv(š) š = š (the identity matrix), again for full-column-rank š
  Python/NumPy pseudo-inverse numpy.linalg.pinv - åč”ē¾č” - CSDN
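A quick NumPy check of these two identities on a random tall matrix (which almost surely has full column rank); purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                  # tall matrix, full column rank almost surely

P = np.linalg.pinv(X)                        # Moore-Penrose pseudo-inverse
P_explicit = np.linalg.inv(X.T @ X) @ X.T    # (X^T X)^{-1} X^T, valid since X^T X is invertible

print(np.allclose(P, P_explicit))            # True
print(np.allclose(P @ X, np.eye(3)))         # pinv(X) X = I for full-column-rank X
```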
Example Code
This MATLAB code ā½Ā¹ā¾ trains and tests an ELM on the NIR spectra dataset (regression) and the Iris dataset (classification).
- Note that each column is a sample, and each row is an attribute/feature.
Notations:
- Q: number of samples
- R: number of input features
- S: number of output features
- $P_{RĆQ}$: input pattern matrix
- $T_{SĆQ}$: target data matrix
- N: number of hidden nodes
- TF: transfer function
- $IW_{NĆR}$: input weights matrix
- $B_{NĆQ}$: bias matrix
- $LW_{SĆN}$: output weight matrix (the transpose of š above, i.e. šįµ)
Train (calculate the LW):
- $tempH_{NĆQ} = IW_{NĆR}ā P_{RĆQ} + B_{NĆQ}$
- $H_{NĆQ} = TF(tempH)$
- $LW_{SĆN} = T_{SĆQ} ā  \mathrm{pinv}(H_{NĆQ})$, based on: $LW_{SĆN}\, H_{NĆQ} = T_{SĆQ}$ (the transpose of šš = š above)
Test:
- $tempH_{NĆQ} = IW_{NĆR}ā P_{RĆQ} + B_{NĆQ}$
- $H_{NĆQ} = TF(tempH)$
- $Y_{SĆQ} = LW_{SĆN}ā H_{NĆQ}$
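For reference, a NumPy transcription of these train/test steps in the same column-wise convention (each column is a sample); the sigmoid transfer function and the uniform random initialization are assumptions, as the original MATLAB source is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(P, T, N, rng=None):
    """P: (R, Q) inputs, T: (S, Q) targets, N: hidden nodes. Returns IW, B0, LW."""
    rng = np.random.default_rng(rng)
    R, Q = P.shape
    IW = rng.uniform(-1.0, 1.0, size=(N, R))   # input weights, set once at random
    B0 = rng.uniform(-1.0, 1.0, size=(N, 1))   # one bias per hidden node
    H = sigmoid(IW @ P + B0)                   # (N, Q); B0 broadcasts over the Q columns
    LW = T @ np.linalg.pinv(H)                 # (S, N), from LW . H = T
    return IW, B0, LW

def elm_test(P, IW, B0, LW):
    H = sigmoid(IW @ P + B0)
    return LW @ H                              # (S, Q) predictions
```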
Example code (py)
Build an Extreme Learning Machine in Python | by Glenn Paul Gara … searched by DDG: “incremental elm python”
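A hedged usage sketch of the hpelm toolbox linked above; the `ELM(inputs, outputs)`, `add_neurons`, `train`, and `predict` calls are quoted from memory of hpelm's documentation and should be verified against the current docs.

```python
# Sketch only: hpelm API names are assumed from its documentation, not verified here.
import numpy as np
from hpelm import ELM

X = np.random.rand(100, 4)           # 100 samples, 4 input features
T = np.random.rand(100, 1)           # regression target

model = ELM(X.shape[1], T.shape[1])  # number of inputs, number of outputs
model.add_neurons(20, "sigm")        # 20 sigmoid hidden nodes (the only real hyper-parameter)
model.train(X, T)                    # solves the output weights in one shot
Y = model.predict(X)
```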
I-ELM
Incremental ELM: hidden nodes are added one at a time (the "incremental" part); each new node's output weight is computed analytically from the current residual error, and existing weights are not retrained.
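A rough sketch of that idea (a simplified I-ELM-style loop, not necessarily Huang's exact algorithm): each new random node's output weight is the least-squares projection of the current residual onto that node's activations.

```python
import numpy as np

def ielm_fit(X, T, max_nodes, rng=None):
    """Grow an ELM one hidden node at a time (simplified I-ELM-style sketch).
    X: (N, n), T: (N, m). Returns stacked weights, biases, and per-node betas."""
    rng = np.random.default_rng(rng)
    E = T.copy()                                  # residual error, starts as the target
    Ws, bs, betas = [], [], []
    for _ in range(max_nodes):
        w = rng.uniform(-1.0, 1.0, size=X.shape[1])
        b = rng.uniform(-1.0, 1.0)
        h = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # (N,) activations of the new node
        beta = (h @ E) / (h @ h)                  # (m,) least-squares weight for this node only
        E = E - np.outer(h, beta)                 # update the residual
        Ws.append(w); bs.append(b); betas.append(beta)
    return np.array(Ws), np.array(bs), np.array(betas)

# prediction with the grown network:
# O = sigmoid(X @ Ws.T + bs) @ betas
```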
OS-ELM
Online Sequential ELM: the output weights š are updated recursively (recursive least squares) as new samples or chunks of samples arrive, without retraining on the old data.
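A hedged sketch of the recursive update usually associated with OS-ELM (chunk-wise recursive least squares); the initialization chunk should contain at least as many samples as hidden nodes so that $H_0įµ H_0$ is invertible.

```python
import numpy as np

def hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))     # (chunk_size, L)

def oselm_init(X0, T0, W, b):
    """Initial batch: needs at least L samples so that H0^T H0 is invertible."""
    H0 = hidden(X0, W, b)
    P = np.linalg.inv(H0.T @ H0)                  # (L, L)
    beta = P @ H0.T @ T0                          # (L, m)
    return P, beta

def oselm_update(P, beta, Xk, Tk, W, b):
    """Recursive least-squares update for one new chunk (Xk, Tk); no old data needed."""
    Hk = hidden(Xk, W, b)
    K = np.linalg.inv(np.eye(Hk.shape[0]) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ K @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
    return P, beta
```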