Table of contents
- Wikipedia-ELM
- Controversy: RBF networks (1980s) raised a similar idea to ELM.
- Dispute about the originality of ELM: Origins of ELM
- Portal of ELM
- python toolbox: hpelm
Facts
ELM is ⁽¹⁾
- a type of single-hidden-layer feedforward neural network (SLFN).
- The parameters $(\mathbf{w}, b)$ between the input layer and the hidden layer are set randomly.
  Thus, for $N$ input $n$-dimensional samples and $L$ hidden nodes, the output of the hidden layer is $\mathbf{H}_{NƗL} = g(\mathbf{X}_{NƗn}\,\mathbf{W}_{nƗL} + \mathbf{B}_{NƗL})$.
- Only the number of hidden nodes needs to be chosen manually; there are no other hyper-parameters.
- The output weights are not trained iteratively; they are solved in one shot via the pseudo-inverse of the hidden-layer output matrix.
- For an $n$-dimensional sample $\mathbf{x}_j$ and its target $\mathbf{t}_j = [t_{j1}, t_{j2}, \dots, t_{jm}]^T \in \mathbb{R}^m$,
  the output of an ELM with $L$ hidden nodes is $\mathbf{o}_j = \sum_{i=1}^{L} \pmb\beta_i\, g(\mathbf{w}_i^T \mathbf{x}_j + b_i)$, where
  - $g(\cdot)$ is the activation function;
  - $\pmb\beta_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{im}]^T$ is the output weight vector of the $i$-th hidden node;
  - $\mathbf{w}_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the input weight vector of the $i$-th hidden node;
  - $\mathbf{x}_j = [x_{j1}, x_{j2}, \dots, x_{jn}]^T \in \mathbb{R}^n$ is an $n$-dimensional input;
  - $b_i$ is the bias of the $i$-th hidden node;
  - $\mathbf{o}_j = [o_{j1}, o_{j2}, \dots, o_{jm}]^T \in \mathbb{R}^m$ is the $m$-dimensional output.
- The ideal parameters $(\mathbf{w}, b, \pmb\beta)$ should satisfy, for every sample $j$:
  $$\sum_{i=1}^{L} \pmb\beta_i\, g(\mathbf{w}_i^T \mathbf{x}_j + b_i) = \mathbf{t}_j$$
  For all $N$ samples, this mapping can be reformulated with matrices as $\mathbf{H}_{NƗL}\, \pmb\beta_{LƗm} = \mathbf{T}_{NƗm}$, where
  - $\mathbf{H}$ is the output of the hidden layer for the $N$ samples:
    $$\mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_L, b_1, \dots, b_L, \mathbf{x}_1, \dots, \mathbf{x}_N) =
    \begin{bmatrix} g(\mathbf{w}_1^T \mathbf{x}_1 + b_1) & \dots & g(\mathbf{w}_L^T \mathbf{x}_1 + b_L)\\ \vdots & \ddots & \vdots\\ g(\mathbf{w}_1^T \mathbf{x}_N + b_1) & \dots & g(\mathbf{w}_L^T \mathbf{x}_N + b_L) \end{bmatrix}_{NƗL}$$
  - $\pmb\beta$ is the output weight matrix: $\pmb\beta = \begin{bmatrix} \pmb\beta_1^T \\ \vdots \\ \pmb\beta_L^T \end{bmatrix}_{LƗm}$
  - $\mathbf{T}$ is the target data matrix: $\mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{NƗm}$
- Generally, $\mathbf{H}_{NƗL}$ is not a square matrix and hence not invertible, so $\pmb\beta = \mathbf{H}^{-1}\mathbf{T}$ cannot be applied. Instead, the optimal $\pmb\beta$ is found by minimizing the training error $\sum_{j=1}^{N} \lVert \mathbf{o}_j - \mathbf{t}_j \rVert$.
- Best estimates $\hat{\mathbf{w}}_i$, $\hat{b}_i$, $\hat{\pmb\beta}$ satisfy:
  $\lVert \mathbf{H}(\hat{\mathbf{w}}_i, \hat{b}_i)\, \hat{\pmb\beta} - \mathbf{T} \rVert = \min_{\mathbf{w}_i,\, b_i,\, \pmb\beta} \lVert \mathbf{H}(\mathbf{w}_i, b_i)\, \pmb\beta - \mathbf{T} \rVert$, where $i = 1, \dots, L$
- Loss function: $J = \sum_{j=1}^{N} \Big\lVert \sum_{i=1}^{L} \pmb\beta_i\, g(\mathbf{w}_i^T \mathbf{x}_j + b_i) - \mathbf{t}_j \Big\rVert^2$
- Setting $\partial J / \partial \pmb\beta = 0$ and solving for $\pmb\beta$ gives the optimal output weights:
  $$\hat{\pmb\beta} = \mathbf{H}^{\dagger} \mathbf{T} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{T},$$
  where $\mathbf{H}^{\dagger}$ is the Moore-Penrose inverse (pseudo-inverse) of $\mathbf{H}$, and the explicit form holds when $\mathbf{H}$ has full column rank.
  It can be proved that $\hat{\pmb\beta}$ is the unique minimum-norm least-squares solution (for a given set of random $(\mathbf{w}_i, b_i)$). See the numpy sketch below.
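As a concrete illustration of the derivation above, here is a minimal numpy sketch of ELM training and prediction (a sketch only; the function names, the sigmoid activation, and the Gaussian initialization are my own choices, not prescribed by ELM):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L, seed=0):
    """X: N x n inputs, T: N x m targets, L: number of hidden nodes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.standard_normal((n, L))   # random input weights, never trained
    b = rng.standard_normal(L)        # random hidden biases, never trained
    H = sigmoid(X @ W + b)            # hidden-layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T      # one-shot minimum-norm least-squares solution, L x m
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta  # o = H beta

# Toy usage: fit y = sin(x) on [0, 2*pi].
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T, L=50)
print(np.mean((elm_predict(X, W, b, beta) - T) ** 2))  # small training MSE
```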
Moore-Penrose inverse
Also called pseudoinverse or generalized inverse ⁽²⁾.
(bilibili search: "伪逆ēŸ©é˜µ", i.e. "pseudo-inverse matrix") video: 深度学习-花书0103 伪逆ēŸ©é˜µ 最小二乘 (Deep Learning "flower book" 01-03: pseudo-inverse matrix, least squares)
(DDG search: "伪逆ēŸ©é˜µ")
伪逆ēŸ©é˜µēš„意义和求法? - 知乎 (The meaning of the pseudo-inverse matrix and how to compute it? - Zhihu)
numpy.linalg.pinv()
- pinv($\mathbf{A}$) $= (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T$ (when $\mathbf{A}$ has full column rank)
- pinv($\mathbf{A}$)$\,\mathbf{A} = \mathbf{I}$ (the identity matrix) in that case; see: python之numpy之伪逆 numpy.linalg.pinv - CSDN
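A quick numpy check of the two identities above (illustrative only; both hold when the matrix has full column rank, which a random tall matrix has almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))          # tall matrix, full column rank almost surely

P1 = np.linalg.pinv(A)                   # Moore-Penrose inverse (computed via SVD)
P2 = np.linalg.inv(A.T @ A) @ A.T        # (A^T A)^{-1} A^T

print(np.allclose(P1, P2))               # True
print(np.allclose(P1 @ A, np.eye(3)))    # True: pinv(A) A = I for full column rank
```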
Example Code
This MATLAB code ⁽¹⁾ trains and tests an ELM on the NIR spectra dataset (regression) and the Iris dataset (classification).
- Note that each column is a sample, and each row is an attribute/feature.
Notations:
- Q: number of samples
- R: number of input features
- S: number of output features
- $P_{RƗQ}$: input pattern matrix
- $T_{SƗQ}$: target data matrix
- N: number of hidden nodes
- TF: transfer function
- $IW_{NƗR}$: input weights matrix
- $B_{NƗQ}$: bias matrix (the NƗ1 bias vector repeated for each of the Q samples)
- $LW_{SƗN}$: output weights matrix (the transpose of $\pmb\beta$ above)
Train (calculate the LW):
- $tempH_{NƗQ} = IW_{NƗR} \cdot P_{RƗQ} + B_{NƗQ}$
- $H_{NƗQ} = TF(tempH)$
- $LW_{SƗN} = T_{SƗQ} \cdot \mathrm{pinv}(H)$, based on $LW_{SƗN}\, H_{NƗQ} = T_{SƗQ}$
Test:
- $tempH_{NƗQ} = IW_{NƗR} \cdot P_{RƗQ} + B_{NƗQ}$
- $H_{NƗQ} = TF(tempH)$
- $Y_{SƗQ} = LW_{SƗN} \cdot H_{NƗQ}$
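The MATLAB file itself is not reproduced here; the following is a rough numpy transcription of the train/test steps above, keeping the column-major convention (samples in columns). Function and variable names mirror the notation list and are otherwise my own.

```python
import numpy as np

def TF(x):
    """Transfer function; a sigmoid is assumed here."""
    return 1.0 / (1.0 + np.exp(-x))

def elm_train_cols(P, T, N, seed=0):
    """P: R x Q input patterns, T: S x Q targets, N: number of hidden nodes."""
    R, Q = P.shape
    rng = np.random.default_rng(seed)
    IW = rng.uniform(-1.0, 1.0, (N, R))   # input weights, N x R
    b = rng.uniform(-1.0, 1.0, (N, 1))    # hidden biases, N x 1
    B = np.tile(b, (1, Q))                # bias matrix, N x Q
    H = TF(IW @ P + B)                    # hidden output, N x Q
    LW = T @ np.linalg.pinv(H)            # S x N, from LW * H = T
    return IW, b, LW

def elm_test_cols(P, IW, b, LW):
    H = TF(IW @ P + b)                    # bias column broadcasts across test columns
    return LW @ H                         # Y: S x Q predictions
```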
Example code (py)
Build an Extreme Learning Machine in Python | by Glenn Paul Gara … searched by DDG: “incremental elm python”
I-ELM
Does "incremental" just mean adding hidden neurons one at a time? In I-ELM, yes: random hidden nodes are added one by one, each new node's output weight is fitted to the current residual error, and the previously added nodes are left unchanged.
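A rough sketch of that incremental scheme as I understand it (each new node's output weight is a one-dimensional least-squares fit to the residual; not taken from any particular reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ielm_train(X, T, L_max, seed=0):
    """Incremental ELM sketch. X: N x n inputs, T: N x m targets."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    E = np.array(T, dtype=float)            # residual error, starts as the targets
    W, b, beta = [], [], []
    for _ in range(L_max):
        w_i = rng.standard_normal(n)        # new random hidden node
        b_i = rng.standard_normal()
        h = sigmoid(X @ w_i + b_i)          # this node's output on all N samples
        beta_i = (h @ E) / (h @ h)          # least-squares fit of h to the residual, length m
        E = E - np.outer(h, beta_i)         # shrink residual; earlier nodes stay frozen
        W.append(w_i); b.append(b_i); beta.append(beta_i)
    return np.array(W).T, np.array(b), np.array(beta)   # shapes: n x L, L, L x m

def ielm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```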
OS-ELM
Online Sequential ELM: the output weights are updated sequentially (recursive least squares) as data arrive one by one or chunk by chunk, instead of recomputing the pseudo-inverse from scratch.