ELM

Facts

ELM is ⁽¹⁾

  • a type of single hidden layer feedforward neural network (SLFN).

  • The parameters (š°,b) between the input layer and the hidden layer are set randomly.
    Thus, for N n-dimensional input samples and L hidden nodes, the output of the hidden layer is $š‡\_{NƗL} = g(š—\_{NƗn} š–\_{nƗL}+š\_{NƗL})$, where š is the bias vector š› repeated for each of the N samples and g(ā‹…) is applied element-wise.

  • Only the number of hidden nodes needs to be set manually; there are no other hyper-parameters to tune.

  • The output weights are not learned iteratively: they are solved in one shot using the pseudo-inverse (Moore-Penrose inverse).

  • For an n-dimensional sample š±ā±¼ and its target š­ā±¼=[tā±¼ā‚, tā±¼ā‚‚, …, tā±¼ā‚˜]įµ€āˆˆ ā„įµ,
    the output of an ELM with L hidden nodes is šØā±¼ = āˆ‘įµ¢ā‚Œā‚į“ø š›ƒįµ¢ g(š°įµ¢įµ€ā‹…š±ā±¼ + bįµ¢), where

    • g(ā‹…) is the activation function;
    • š›ƒįµ¢ is the output weight vector of the ith hidden node: š›ƒįµ¢=[βᵢ₁, βᵢ₂, …, βᵢₘ]įµ€āˆˆ ā„įµ;
    • š°įµ¢ is the input weight vector of the ith hidden node: š°įµ¢=[wᵢ₁, wᵢ₂, …, wᵢₙ]įµ€;
    • š±ā±¼ is an n-dimensional input: š±ā±¼=[xā±¼ā‚, xā±¼ā‚‚, …, xā±¼ā‚™]įµ€āˆˆ ā„āæ;
    • bįµ¢ is the bias of the ith hidden unit;
    • šØā±¼ is an m-dimensional output vector: šØā±¼=[oā±¼ā‚, oā±¼ā‚‚, …, oā±¼ā‚˜]įµ€āˆˆ ā„įµ;
  • The ideal parameters (š°,b,š›ƒ) should satisfy, for every sample j=1,…,N:
    āˆ‘įµ¢ā‚Œā‚į“ø š›ƒįµ¢ g(š°įµ¢įµ€ā‹…š±ā±¼ + bįµ¢) = š­ā±¼
    For all N samples together, this mapping can be reformulated with matrices:
    $š‡\_{NƗL} \pmb\beta\_{LƗm} = š“\_{NƗm}$, where

    • š‡ is the output of the hidden layer for N samples:

      $$š‡(š°ā‚,...,š°_L, b₁,...,b_L, š±ā‚,...š±_L) = \\\ \begin{bmatrix} g(š°ā‚ā‹…š±ā‚+b₁) & \dots & g(š°_Lā‹…š±ā‚+b\_L)\\\ \vdots & \ddots & \vdots\\\ g(š°ā‚ā‹…š±_N+b₁) & \dots & g(š°_Lā‹…š±_N+b\_L) \end{bmatrix}_{NƗL}$$
    • š›ƒ is the output weights matrix:
      [ š›ƒā‚įµ€ ; … ; š›ƒ$\_Lįµ€ ]_{LƗm}$

    • Target data: š“ = $\begin{bmatrix} š­ā‚įµ€ \\ \vdots \\ š­\_Nįµ€ \end{bmatrix}\_{NƗm}$

  • Generally, $š‡\_{NƗm}$ is not a square matrix (not invertible). Hence, š›ƒ=š‡ā»Ā¹š“ cannot be applied. However, the optimal š›ƒ can be approached by minimizing the traning error iteratively: āˆ‘ā±¼ā‚Œā‚į“ŗā€–šØā±¼-š­ā±¼ā€–.

  • Best estimation: š°Ģ‚įµ¢, b̂ᵢ, and š›ƒĢ‚įµ¢ satisfy:
    ā€–š‡(š°Ģ‚įµ¢, b̂ᵢ)ā‹…š›ƒĢ‚įµ¢ - š“ā€– = min_{š°įµ¢, bįµ¢, š›ƒįµ¢} ā€–š‡(š°įµ¢, bįµ¢)ā‹…š›ƒįµ¢ - š“ā€–, where i=1,…,L

  • Loss function: J = āˆ‘ā±¼ā‚Œā‚į“ŗ ā€–āˆ‘įµ¢ā‚Œā‚į“ø š›ƒįµ¢ g(š°įµ¢įµ€ā‹…š±ā±¼ + bįµ¢) - š­ā±¼ā€–Ā²

  • Solve š›ƒ based on the āˆ‚J/āˆ‚š›ƒ=0, such that the optimal parameter is:
    ^š›ƒ = $š‡^† š“$ = (š‡įµ€š‡)ā»Ā¹š‡įµ€ š“,
    where $š‡^†$ is the Moore-Penrose inverse (Pseudo-inverse) of š‡.
    It can be proved that the norm of ^š›ƒ is the smallest and unique solution (for a set of random (š°įµ¢, bįµ¢)).
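
To make the one-shot solve concrete, here is a minimal NumPy sketch of the facts above: random (š°, b), hidden output š‡, then š›ƒ = š‡ā€ š“ via numpy.linalg.pinv. The sigmoid activation, the toy sine-regression data, and all function names are illustrative assumptions, not code from the cited references.

```python
import numpy as np

def sigmoid(z):
    # element-wise activation g(ā‹…)
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, L, rng):
    """X: (N, n) inputs, T: (N, m) targets, L: number of hidden nodes."""
    n = X.shape[1]
    W = rng.standard_normal((n, L))   # random input weights W, never trained
    b = rng.standard_normal(L)        # random hidden biases b, never trained
    H = sigmoid(X @ W + b)            # hidden-layer output H, shape (N, L)
    beta = np.linalg.pinv(H) @ T      # output weights: beta = H† T, one shot
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# toy regression check (illustrative data)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
T = np.sin(X.sum(axis=1, keepdims=True))      # targets, shape (N, 1)
W, b, beta = elm_fit(X, T, L=50, rng=rng)
print("train MSE:", np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```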

Moore-Penrose inverse

Also called pseudoinverse or generalized inverse ⁽²⁾.

(bilibili search: “ä¼Ŗé€†ēŸ©é˜µā€, i.e. “pseudo-inverse matrixā€) 深度学习-å•ƒčŠ±ä¹¦0103ä¼Ŗé€†ēŸ©é˜µęœ€å°äŗŒä¹˜ (Deep Learning, the “flower bookā€, 0103: pseudo-inverse matrix and least squares)

(DDG search: “ä¼Ŗé€†ēŸ©é˜µā€)

ä¼Ŗé€†ēŸ©é˜µēš„ę„ä¹‰åŠę±‚ę³•ļ¼Ÿ - ēŸ„ä¹Ž (What the pseudo-inverse matrix means and how to compute it - Zhihu)

numpy.linalg.pinv()
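
A quick check of what numpy.linalg.pinv returns: for a non-square matrix it gives the minimum-norm least-squares solution, the same one np.linalg.lstsq finds. The small matrix below is made up for the illustration.

```python
import numpy as np

# A (3x2) is tall and not square, so A⁻¹ does not exist,
# but pinv(A) still gives the least-squares solution of A Ā· beta ā‰ˆ t.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
t = np.array([1.0, 2.0, 3.0])

beta_pinv = np.linalg.pinv(A) @ t                    # Moore-Penrose solution
beta_lstsq, *_ = np.linalg.lstsq(A, t, rcond=None)   # NumPy's least-squares solver

print(np.allclose(beta_pinv, beta_lstsq))            # True
```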

Example Code

This MATLAB code ⁽¹⁾ trains and tests an ELM on the NIR spectra dataset (regression) and the Iris dataset (classification).

  • Note that each column is a sample, and each row is an attribute/feature.

Notations:

  • Q: number of samples
  • R: number of input features
  • S: number of output features
  • $P\_{RƗQ}$: input pattern matrix
  • $T\_{SƗQ}$: target data matrix
  • N: number of hidden nodes
  • TF: transfer function
  • $IW\_{NƗR}$: input weights matrix
  • $B\_{NƗQ}$: bias matrix
  • $LW\_{SƗN}$: output weights matrix (transposed š›ƒ)

Train (calculate the LW):

  • $tempH\_{NƗQ} = IW\_{NƗR}ā‹…P\_{RƗQ} + B\_{NƗQ}$
  • $H\_{NƗQ} = TF(tempH)$
  • $LW\_{SƗN} = T\_{SƗQ} ā‹… \mathrm{pinv}(H)$, based on: $š›ƒįµ€\_{SƗN} š‡\_{NƗQ} = š“\_{SƗQ}$ (transposed relative to the Facts section, since each column is a sample); see the Python sketch after the Test steps

Test:

  • $tempH\_{NƗQ} = IW\_{NƗR}ā‹…P\_{RƗQ} + B\_{NƗQ}$
  • $H\_{NƗQ} = TF(tempH)$
  • $Y\_{SƗQ} = LW\_{SƗN}ā‹…H\_{NƗQ}$
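
A hedged Python transcription of the Train and Test steps above, keeping the column-major convention (each column is a sample) and the notation list (IW, B, LW, TF). The sigmoid transfer function and the toy data are assumptions; this is a sketch, not the cited MATLAB code.

```python
import numpy as np

def tf(z):
    # assumed transfer function TF (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(P, T, N_hidden, rng):
    """P: (R, Q) input patterns, T: (S, Q) targets; returns IW, bias column, LW."""
    R, Q = P.shape
    IW = rng.uniform(-1.0, 1.0, (N_hidden, R))  # IW_{NxR}, random, fixed
    b = rng.uniform(-1.0, 1.0, (N_hidden, 1))   # one bias per hidden node
    tempH = IW @ P + b                          # B_{NxQ} = b repeated over Q columns
    H = tf(tempH)                               # H_{NxQ}
    LW = T @ np.linalg.pinv(H)                  # LW_{SxN} = T_{SxQ} Ā· pinv(H)
    return IW, b, LW

def elm_test(P, IW, b, LW):
    H = tf(IW @ P + b)                          # H_{NxQ}
    return LW @ H                               # Y_{SxQ}

# toy usage (illustrative): 1-D regression with samples as columns
rng = np.random.default_rng(0)
P = rng.uniform(-np.pi, np.pi, (1, 300))        # R=1, Q=300
T = np.sin(P)                                   # S=1
IW, b, LW = elm_train(P, T, N_hidden=40, rng=rng)
print("test MSE:", np.mean((elm_test(P, IW, b, LW) - T) ** 2))
```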

Example code (py)

Build an Extreme Learning Machine in Python | by Glenn Paul Gara … searched by DDG: “incremental elm python”

I-ELM

Does “incrementalā€ just mean adding hidden neurons one at a time?

github
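
As far as I understand, yes: I-ELM adds random hidden nodes one at a time, and each new node's output weight is fitted to the current residual error in closed form. A minimal sketch of that idea for single-output regression, with made-up data and names (not the code from the linked repo):

```python
import numpy as np

def i_elm(X, t, max_nodes, rng, tol=1e-3):
    """Incrementally add random hidden nodes; fit each one to the residual."""
    N, n = X.shape
    Ws, bs, betas = [], [], []
    residual = t.copy()                    # current error vector E, shape (N,)
    for _ in range(max_nodes):
        w = rng.standard_normal(n)         # random input weights of the new node
        b = rng.standard_normal()          # random bias of the new node
        h = np.tanh(X @ w + b)             # output of the new node, shape (N,)
        beta = (residual @ h) / (h @ h)    # closed-form output weight for this node
        residual = residual - beta * h     # update the residual error
        Ws.append(w); bs.append(b); betas.append(beta)
        if np.linalg.norm(residual) < tol:
            break
    return np.array(Ws), np.array(bs), np.array(betas)

def i_elm_predict(X, Ws, bs, betas):
    return np.tanh(X @ Ws.T + bs) @ betas

# toy usage (illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 2))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1]
Ws, bs, betas = i_elm(X, t, max_nodes=200, rng=rng)
print("residual MSE:", np.mean((i_elm_predict(X, Ws, bs, betas) - t) ** 2))
```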

OS-ELM

Online sequential ELM.

Deep incremental RVFL

Deep incremental random vector functional-link network: A non-iterative constructive sketch via greedy feature learning

Reference
