Illustration for SNN


Single-hidden-layer neural network:

Diagram: data $X$ → (input-layer weights $IW$) → hidden feature $H$ → (output-layer weights $\bm β$) → prediction $=$ target $T$

Optimization objective: refine the input-layer weights $IW$ so that the hidden feature $H$ gets refined (a runnable sketch of the whole loop follows the iteration list below).

  1. Iteration 1

    • Forward:

      $$ \begin{aligned} H &= IW ⋅ X \\ Y_{pred} &= \bm β ⋅ H \end{aligned} $$
    • Compute error E:

      $$ E = T - Y_{pred} $$
    • Suppose the error $E$ is attributed to an imaginary datum $P$ in the hidden space, formulated by the equation $E = \bm β ⋅ P$.

      Then $P$ can be solved for via the pseudo-inverse of $\bm β$:

      $$ P = \bm β⁺ ⋅ E $$
    • Use $P$ to solve for a “supplemental $IW$”, denoted $IW_{supp}$, via the relationship $P = IW_{supp} ⋅ X$:

      $$ IW_{supp} = X⁺ ⋅ P $$
    • Update the input-layer weights by adding the supplemental $IW_{supp}$ to the current $IW$:

      $$ IW = IW + IW_{supp} $$
    • Update $\bm β$ based on the updated $IW$ and the equation $T = \bm β ⋅ H$:

      $$ \begin{aligned} H &= IW ⋅ X \\ \bm β &= H⁺ ⋅ T \end{aligned} $$
    • Compute the current error:

      $$ \begin{aligned} Y_{pred} &= \bm β ⋅ H \\ E &= T - Y_{pred} \end{aligned} $$
  2. Iteration 2:

    • Compute $P$
    • Compute the supplemental $IW_{supp}$
    • Update $IW$
    • Update $\bm β$
    • Compute $E$
  3. Iteration 3:

    • Perform the same 5 steps
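
A minimal NumPy sketch of the loop above, assuming samples are stored as columns ($X$ is $d × n$, $T$ is $m × n$, $IW$ is $h × d$, $\bm β$ is $m × h$); the name `refine_snn` and the random initialization are illustrative, and with this column convention the least-squares fits come out as $IW_{supp} = P ⋅ X⁺$ and $\bm β = T ⋅ H⁺$ (the transposed ordering of the formulas above):

```python
import numpy as np

def refine_snn(X, T, hidden_dim, n_iters=3, seed=0):
    """Illustrative sketch: iteratively refine IW and re-solve beta."""
    rng = np.random.default_rng(seed)
    IW = rng.standard_normal((hidden_dim, X.shape[0]))    # input-layer weights
    beta = rng.standard_normal((T.shape[0], hidden_dim))  # output-layer weights

    for it in range(n_iters):
        # forward pass and current error
        H = IW @ X
        E = T - beta @ H
        # imaginary data P that explains the error: E = beta @ P
        P = np.linalg.pinv(beta) @ E
        # supplemental IW from P = IW_supp @ X
        IW_supp = P @ np.linalg.pinv(X)
        IW = IW + IW_supp
        # re-solve beta on the refined hidden feature: T = beta @ H
        H = IW @ X
        beta = T @ np.linalg.pinv(H)
        E = T - beta @ H
        print(f"iteration {it + 1}: residual norm = {np.linalg.norm(E):.4e}")

    return IW, beta
```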

The following may be wrong

This morning, I forgot that the model architecture consists of two weight matrices, so what I described this morning only refines a single weight matrix:

Given the equation $Y_{pred} = A ⋅ X$, one wants to find the coefficient matrix $A$.

\begin{algorithm}
\caption{SNN}
\begin{algorithmic}
\STATE \COMMENT{The optimal A solved by least squares (fit to the target T) with the Moore-Penrose inverse:}
\STATE $A = X⁺ ⋅ T$
\STATE \COMMENT {There are still some errors E:}
\STATE $E = T - Y_{pred}$
\STATE \COMMENT{By considering that the error E is attributed to an imaginary datum P, we have $E = A ⋅ P$}
\STATE \COMMENT {The data P can be solved as:}
\STATE $P = A⁺ ⋅ E$

\STATE \COMMENT{To fit the error E, we can use another coefficient matrix $A_2$ and the equation $E = A_2 ⋅ P$}

\STATE \COMMENT {So, the A₂ can be solved as:}
\STATE $A_2 = P⁺ ⋅ E$

\STATE \COMMENT {Update A:}
\STATE $A = A + A_2$

\STATE \COMMENT {Compute the new error:}
\STATE $E = T - A ⋅ X$
\STATE Go to line \#5.

\end{algorithmic}
\end{algorithm}
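
Under the same column-oriented assumption ($X$ is $d × n$, $T$ is $m × n$, $A$ is $m × d$), a minimal NumPy sketch of this single-matrix loop might look as follows; `residual_refit` is an illustrative name, and the initial fit and the $A_2$ step again place the pseudo-inverse on the right:

```python
import numpy as np

def residual_refit(X, T, n_iters=5):
    """Sketch of the single-coefficient-matrix loop described above."""
    # initial least-squares fit of A for T ≈ A @ X
    A = T @ np.linalg.pinv(X)
    for it in range(n_iters):
        E = T - A @ X                    # remaining error
        P = np.linalg.pinv(A) @ E        # imaginary data: E = A @ P
        A2 = E @ np.linalg.pinv(P)       # second coefficient matrix: E = A2 @ P
        A = A + A2                       # merge A2 back into A
        print(f"iteration {it + 1}: residual norm = {np.linalg.norm(T - A @ X):.4e}")
    return A
```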

(2024-09-30)

  • Training the parameters of a classic neural network relies on backpropagation and gradient descent, whereas training this sub-network keeps introducing new weights: each iteration uses the new weights to “explain” the residual, and in the end all the weights are merged together (a gradient-descent baseline for contrast is sketched right after this note).
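
For contrast, a plain gradient-descent fit of the same linear map $T ≈ A ⋅ X$ could look like the hypothetical baseline below: one fixed weight matrix updated by many small steps, rather than new weights introduced to explain the residual and then merged:

```python
import numpy as np

def gradient_descent_fit(X, T, lr=1e-3, n_steps=1000, seed=0):
    """Classic-style baseline: one weight matrix, many small gradient steps."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], X.shape[0]))
    for _ in range(n_steps):
        E = T - A @ X
        A += lr * E @ X.T   # gradient step on 0.5 * ||T - A @ X||_F^2
    return A
```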

Doubt Privilege

Question:

  1. What are the fundamental differences between deep neural networks and single-layer wide networks?

  2. Which one is better?

  3. Why do people all prefer deep nets rather than wide nets?


Speculations:

(2024-11-11)

  1. An accurate inverse matrix is not easy to obtain.

    • An SLFNN needs to calculate the pseudo-inverse of a giant matrix, which is difficult and numerically inaccurate.

      Also, the weight matrix could be singular. This may introduce errors when computing the inverse, although a singular matrix can be made invertible by adding a regularization term (see the sketch after this list).

      Hence, the precision of the optimized parameters could be low.
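
As a rough illustration of that last point, a ridge-style regularized pseudo-inverse $(MᵀM + λI)^{-1} Mᵀ$ remains computable even when $M$ is singular; `lam` below is a hypothetical regularization strength, not a value from the post:

```python
import numpy as np

def regularized_pinv(M, lam=1e-3):
    """Ridge-style stand-in for the Moore-Penrose inverse: (M^T M + lam*I)^-1 M^T."""
    d = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(d), M.T)

M = np.array([[1.0, 2.0],
              [2.0, 4.0]])               # rank-1, hence singular
print(np.linalg.pinv(M))                 # SVD-based pseudo-inverse still works
print(regularized_pinv(M, lam=1e-3))     # regularized version, slightly biased
```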
