Illustration for SNN


Single-hidden-layer neural network:

Diagram: data $X$ → (input-layer weights $IW$) → hidden feature $H$ → (output-layer weights $\bm β$) → prediction $=$ target $T$

Optimization objective: refine the input-layer weights $IW$ so that the hidden feature $H$ gets refined (a runnable sketch of the whole loop follows the iteration list below).

  1. Iteration 1

    • Forward:

      $$ \begin{aligned} H &= IW ⋅ X \\ Y_{pred} &= \bm β ⋅ H \end{aligned} $$
    • Compute error E:

      $$ E = T - Y_{pred} $$
    • Suppose the error $E$ is attributed to an imaginary datum $P$ in the hidden space, formulated by the equation $E = \bm β ⋅ P$.

      Then $P$ can be solved for via the pseudo-inverse of $\bm β$:

      $$ P = \bm β⁺ ⋅ E $$
    • Use $P$ to solve for a “supplemental $IW$”, denoted $IW_{supp}$, via the relationship $P = IW_{supp} ⋅ X$:

      $$ IW_{supp} = X⁺ ⋅ P $$
    • Update the input-layer weights by adding the supplemental $IW_{supp}$ to the current $IW$:

      $$ IW = IW + IW_{supp} $$
    • Update $\bm β$ based on the updated $IW$ and the equation $T = \bm β ⋅ H$:

      $$ \begin{aligned} H &= IW ⋅ X \\ \bm β &= H⁺ ⋅ T \end{aligned} $$
    • Compute the current error:

      $$ \begin{aligned} Y_{pred} &= \bm β ⋅ H \\ E &= T - Y_{pred} \end{aligned} $$
  2. Iteration 2:

    • Compute $P$
    • Compute the supplemental $IW_{supp}$
    • Update $IW$
    • Update $\bm β$
    • Compute $E$
  3. Iteration 3:

    • Perform the same 5 steps
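
A minimal NumPy sketch of the loop above, assuming samples are stored as columns ($X$ is $d × n$, $T$ is $m × n$, $IW$ is $h × d$, $\bm β$ is $m × h$); the name `refine_snn` and the random initialization are illustrative, and with this column convention the least-squares fits come out as $IW_{supp} = P ⋅ X⁺$ and $\bm β = T ⋅ H⁺$ (the transposed ordering of the formulas above):

```python
import numpy as np

def refine_snn(X, T, hidden_dim, n_iters=3, seed=0):
    """Illustrative sketch: iteratively refine IW and re-solve beta."""
    rng = np.random.default_rng(seed)
    IW = rng.standard_normal((hidden_dim, X.shape[0]))    # input-layer weights
    beta = rng.standard_normal((T.shape[0], hidden_dim))  # output-layer weights

    for it in range(n_iters):
        # forward pass and current error
        H = IW @ X
        E = T - beta @ H
        # imaginary data P that explains the error: E = beta @ P
        P = np.linalg.pinv(beta) @ E
        # supplemental IW from P = IW_supp @ X
        IW_supp = P @ np.linalg.pinv(X)
        IW = IW + IW_supp
        # re-solve beta on the refined hidden feature: T = beta @ H
        H = IW @ X
        beta = T @ np.linalg.pinv(H)
        E = T - beta @ H
        print(f"iteration {it + 1}: residual norm = {np.linalg.norm(E):.4e}")

    return IW, beta
```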

The following may be wrong

This morning, I forgot that the model architecture consists of two weight matrices, so what I described this morning only refines a single weight matrix:

Given the equation $Y_{pred} = A ⋅ X$, one wants to find the coefficient matrix $A$.

\begin{algorithm}
\caption{SNN}
\begin{algorithmic}
\STATE \COMMENT{The optimal A solved by least squares (fit to the target T) with the Moore-Penrose inverse:}
\STATE $A = X⁺ ⋅ T$
\STATE \COMMENT {There are still some errors E:}
\STATE $E = T - Y_{pred}$
\STATE \COMMENT{By considering that the error E is attributed to an imaginary datum P, we have $E = A ⋅ P$}
\STATE \COMMENT {The data P can be solved as:}
\STATE $P = A⁺ ⋅ E$

\STATE \COMMENT{To fit the error E, we can use another coefficient matrix $A_2$ and the equation $E = A_2 ⋅ P$}

\STATE \COMMENT {So, the A₂ can be solved as:}
\STATE $A_2 = P⁺ ⋅ E$

\STATE \COMMENT {Update A:}
\STATE $A = A + A_2$

\STATE \COMMENT {Compute the new error:}
\STATE $E = T - A ⋅ X$
\STATE Go to line \#5.

\end{algorithmic}
\end{algorithm}
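
Under the same column-oriented assumption ($X$ is $d × n$, $T$ is $m × n$, $A$ is $m × d$), a minimal NumPy sketch of this single-matrix loop might look as follows; `residual_refit` is an illustrative name, and the initial fit and the $A_2$ step again place the pseudo-inverse on the right:

```python
import numpy as np

def residual_refit(X, T, n_iters=5):
    """Sketch of the single-coefficient-matrix loop described above."""
    # initial least-squares fit of A for T ≈ A @ X
    A = T @ np.linalg.pinv(X)
    for it in range(n_iters):
        E = T - A @ X                    # remaining error
        P = np.linalg.pinv(A) @ E        # imaginary data: E = A @ P
        A2 = E @ np.linalg.pinv(P)       # second coefficient matrix: E = A2 @ P
        A = A + A2                       # merge A2 back into A
        print(f"iteration {it + 1}: residual norm = {np.linalg.norm(T - A @ X):.4e}")
    return A
```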

(2024-09-30)

  • Training the parameters of a classic neural network relies on backpropagation and gradient descent, whereas training this sub-network keeps introducing new weights: each iteration uses the new weights to “explain” the residual, and in the end all the weights are merged together (a gradient-descent baseline for contrast is sketched right after this note).
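
For contrast, a plain gradient-descent fit of the same linear map $T ≈ A ⋅ X$ could look like the hypothetical baseline below: one fixed weight matrix updated by many small steps, rather than new weights introduced to explain the residual and then merged:

```python
import numpy as np

def gradient_descent_fit(X, T, lr=1e-3, n_steps=1000, seed=0):
    """Classic-style baseline: one weight matrix, many small gradient steps."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], X.shape[0]))
    for _ in range(n_steps):
        E = T - A @ X
        A += lr * E @ X.T   # gradient step on 0.5 * ||T - A @ X||_F^2
    return A
```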

Doubt Privilege

Question:

  1. What are the fundamental differences between deep neural networks and single-layer wide networks?

  2. Which one is better?

  3. Why do people all prefer deep nets rather than wide nets?


Speculations:

(2024-11-11)

  1. An accurate inverse matrix is not easy to obtain.

    • An SLFNN needs to calculate the pseudo-inverse of a giant matrix, which is difficult and numerically inaccurate.

      Also, the weight matrix could be singular. This may introduce errors when computing the inverse, although a singular matrix can be made invertible by adding a regularization term (see the sketch after this list).

      Hence, the precision of the optimized parameters could be low.
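
As a rough illustration of that last point, a ridge-style regularized pseudo-inverse $(MᵀM + λI)^{-1} Mᵀ$ remains computable even when $M$ is singular; `lam` below is a hypothetical regularization strength, not a value from the post:

```python
import numpy as np

def regularized_pinv(M, lam=1e-3):
    """Ridge-style stand-in for the Moore-Penrose inverse: (M^T M + lam*I)^-1 M^T."""
    d = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(d), M.T)

M = np.array([[1.0, 2.0],
              [2.0, 4.0]])               # rank-1, hence singular
print(np.linalg.pinv(M))                 # SVD-based pseudo-inverse still works
print(regularized_pinv(M, lam=1e-3))     # regularized version, slightly biased
```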
