Ⅰ. Curve Fitting
Sec 4.1: Classic Curve Fitting and Least-Squares Regression
Regression → curve fitting → Ax = b (least-squares fitting), where A is the data, x is the parameters of the model, and b is the target.
Regression: Fitting a model to some data with some parameters
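As a minimal sketch (sample values are made up), polynomial curve fitting can be cast in the Ax = b form: each row of A holds the basis functions evaluated at one sample, x holds the unknown coefficients, and b holds the measured targets; NumPy's lstsq stands in here for a generic least-squares solver.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # sample locations (assumed)
b = np.array([1.1, 2.9, 9.2, 18.8, 33.1])    # measured targets (assumed)

# Design matrix for a quadratic model  b ~ x0 + x1*t + x2*t^2
A = np.column_stack([np.ones_like(t), t, t**2])

# Least-squares solution of the over-determined system A x = b
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)    # fitted coefficients [x0, x1, x2]
```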
Over- and Under-determined
Models could be Over- and Under-determined.
- An over-determined system normally has no solution.
There are many constraints (equations, samples) given, but the complexity (number of variables) of the system is not enough to describe all of the existing data.
An over-determined system can have "no solution" or "infinitely many solutions". (Over~ wiki)
- Homogeneous case (no bias term):
- The coefficient matrix is a tall, skinny matrix, and the all-zero solution always exists.
- If enough of the equations are dependent, so that after elimination #non-zero rows $<$ #cols in the row-echelon form of the coefficient matrix, this over-determined homogeneous system has "infinitely many solutions" (including the all-zero solution). Otherwise, the all-zero solution is the only one.
- Non-homogeneous case:
- If the last non-zero row of the augmented matrix in row-echelon form has only a constant entry in the last column (giving an equation 0 = c), the system has "no solution".
Since the matrix is tall, this case is likely to happen:
$$ \left[ \begin{array}{cc|c} 2 & 1 & -1 \\ -3 & 1 & -2 \\ -1 & 1 & 1 \\ \end{array} \right] \rightarrow \left[ \begin{array}{cc|c} 2 & 1 & -1 \\ 0 & 5/2 & -7/2 \\ 0 & 3/2 & 1/2 \\ \end{array} \right] \rightarrow \left[ \begin{array}{cc|c} 2 & 1 & -1 \\ 0 & 5/2 & -7/2 \\ 0 & 0 & 13/5 \\ \end{array} \right] $$
- Because the equations far outnumber the unknowns, a consistent system is the exception; even when solutions do exist, whether there is a single unique solution or infinitely many depends on whether #non-zero rows = #cols in the row-echelon form of the coefficient matrix.
- Only when enough rows can be eliminated (the equations overlap), the coefficient matrix has the same rank as the augmented matrix, and #non-zero rows = #cols in the coefficient matrix does this system have a single unique solution. (A numerical rank check of the elimination example above is sketched after this list.)
- (2024-05-31)
- If there are too many constraints, the system becomes "over-determined" and usually has no solution; at that point you need to introduce more "unknowns", turning it into an "under-determined" system, which then has infinitely many solutions.
- As noted before: when there are too many concerns to satisfy at once, there is no solution, so introduce more "variables": try more different things, add parameters and variables.
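A quick numerical check (a minimal sketch in NumPy) of the elimination example above: the coefficient matrix has rank 2 while the augmented matrix has rank 3, which is exactly the 0 = c situation, so the over-determined system is inconsistent.

```python
import numpy as np

A = np.array([[2.0, 1.0], [-3.0, 1.0], [-1.0, 1.0]])
b = np.array([-1.0, -2.0, 1.0])

rank_A  = np.linalg.matrix_rank(A)                        # 2
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))  # 3
print(rank_A, rank_Ab)   # rank_Ab > rank_A  ->  no exact solution
```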
- An under-determined system normally has infinitely many solutions.
Because there are more parameters than equations (samples), some variables are unconstrained, which causes the infinitude of solutions. In other words, there are not enough equations to uniquely determine each unknown, so the problem becomes choosing which solution is best.
An under-determined linear system has either no solution or infinitely many solutions. (Under~ wiki)
- Homogeneous case: The coefficient matrix is a short, fat matrix, #non-zero rows $<$ #cols. So the system always has “Infinitely many solutions” (including all-0 solution).
- Non-homogeneous case:
- Rank of augmented matrix > Rank of coefficient matrix: No solution
- Rank of augmented matrix = Rank of coefficient matrix: solutions must exist. But since #rows is already less than #cols in the coefficient matrix, a single unique solution is impossible, so the system has infinitely many solutions (see the sketch below). $$ \left[ \begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 3 \end{array} \right] \rightarrow \left[ \begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \end{array} \right] $$
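For the worked example above, a small sketch: both ranks equal 2, so solutions exist, and because the rank is smaller than the number of unknowns there are infinitely many of them; the pseudo-inverse picks the one with minimum L2 norm.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0]])
b = np.array([1.0, 3.0])

print(np.linalg.matrix_rank(A),
      np.linalg.matrix_rank(np.column_stack([A, b])))  # 2 2 -> consistent

x_min_norm = np.linalg.pinv(A) @ b        # [-0.5, -0.5, 2.0], the minimum-L2-norm solution
x_another  = np.array([-1.0, 0.0, 2.0])   # also satisfies Ax = b, but with a larger norm
print(A @ x_min_norm, A @ x_another)      # both reproduce b = [1, 3]
```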
How to solve
- If there are infinitely many candidate solutions, the best model should be selected according to additional criteria, i.e., regularizers g(x).
For instance, penalizing the L2 norm of the parameter vector, ‖x‖₂, leads to the minimum-norm (smallest-energy) solution, while penalizing the L1 norm, ‖x‖₁, leads to a solution with a sparse set of parameters.
Therefore, the objective is:
$argmin_x g(x)$ subject to ‖Ax-b‖₂ = 0.
If some error can be tolerated while still finding the model with the minimum-norm parameters, the constraint can be relaxed by a small error, like ‖Ax-b‖₂ ≤ ε.
- If there is no solution, the desired model should have the minimum error.
Therefore, the objective is: $argmin_x$ ‖Ax-b‖₂.
Adding regularizers can steer the optimizer toward the desired kind of model, so the objective can be
$argmin_x$ (‖Ax-b‖₂ + λg(x))
Ordinary least squares gives the approximate solution (when no exact solution exists) or the exact solution (when it exists) by minimizing the squared error:
$argmin_x$ ‖Ax-b‖₂²
Its solution is x = (AᵀA)⁻¹Aᵀb, and using a QR factorization of A to solve the least-squares problem achieves better numerical accuracy (see the sketch below).
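A short sketch (with assumed synthetic quadratic data) comparing the normal-equations formula x = (AᵀA)⁻¹Aᵀb with a QR-based solve; both minimize ‖Ax-b‖₂, and the QR route is the numerically better-conditioned of the two.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 4, 50)
A = np.column_stack([np.ones_like(t), t, t**2])
b = 1.0 + 2.0 * t + 3.0 * t**2 + 0.1 * rng.standard_normal(t.size)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations: (A^T A) x = A^T b
Q, R = np.linalg.qr(A)                        # A = QR, Q orthonormal, R upper triangular
x_qr = np.linalg.solve(R, Q.T @ b)            # solve R x = Q^T b instead
print(x_normal, x_qr)                         # agree up to round-off here
```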
More broadly, this generic architecture can be generalized to non-linear regression by replacing ‖Ax-b‖₂ with a non-linear error term f(A,x,b):
$$ argmin_x \left( f(A,x,b) + \lambda g(x) \right) \quad \text{(over-determined), or} \\ argmin_x\ g(x) \ \text{subject to } f(A,x,b) \leq \varepsilon \quad \text{(under-determined)} $$
An over-determined non-linear system can be solved by Gauss-Newton iteration, for example:
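A minimal Gauss-Newton sketch for an assumed exponential model y ≈ β1·exp(β2·t): at every iteration the residual is linearized around the current parameters and a linear least-squares problem is solved for the update.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 2, 40)
y = 2.5 * np.exp(1.3 * t) + 0.05 * rng.standard_normal(t.size)  # synthetic data

beta = np.array([2.0, 1.0])                 # initial guess for [beta1, beta2]
for _ in range(30):
    e = np.exp(beta[1] * t)
    r = beta[0] * e - y                     # residual vector of the current model
    J = np.column_stack([e, beta[0] * t * e])        # Jacobian of the residual
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)   # linear LS step: J*delta ~ -r
    beta = beta + delta
    if np.linalg.norm(delta) < 1e-10:
        break
print(beta)   # approaches the generating parameters [2.5, 1.3]
```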
Over- and Under-fitting
Two canonical issues:
- Over-fitting: as model complexity increases, the training error keeps dropping, but the evaluation error on withheld data goes up.
- Under-fitting: no matter how much you increase the model complexity (number of parameters), the training error does not drop; either the model is a poor choice or there is not enough data. (Both behaviors are sketched below.)
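An illustrative sketch (made-up parabola-plus-noise data): sweeping the polynomial degree shows the training error falling monotonically, while the error on withheld samples stops improving around the true degree and typically grows again for the highest degrees.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 20)
y = x**2 + 0.1 * rng.standard_normal(x.size)              # parabola + noise

train, test = np.arange(0, 20, 2), np.arange(1, 20, 2)    # interleaved 10/10 split
for degree in (0, 1, 2, 5, 9):
    p = np.polyfit(x[train], y[train], degree)            # fit on training points only
    err_train = np.linalg.norm(np.polyval(p, x[train]) - y[train])
    err_test  = np.linalg.norm(np.polyval(p, x[test])  - y[test])
    print(degree, round(err_train, 3), round(err_test, 3))
# degree 0 never fits (under-fitting); the high degrees drive the training error
# toward zero while the withheld-data error typically grows (over-fitting)
```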
Regression framework
Generic regression: Y = f(X, β); feed the input X into the model f to obtain the target Y.
Regression needs two things: selecting a model f and finding its parameters β that map X to Y.
Different norms can be used to measure the error: the L∞, L1, and L2 norms. The selected norm (error metric) has a big impact on the goodness of fit (see the sketch below).
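A tiny sketch (made-up residuals) evaluating the same error vector under the three norms; when these are used as the objective, minimizing the L∞ norm concentrates on the worst-case residual, while minimizing the L1 norm is more tolerant of outliers.

```python
import numpy as np

residual = np.array([0.1, -0.2, 0.05, 3.0, -0.1])  # one large outlier (assumed values)
print(np.linalg.norm(residual, 1))       # L1   norm: 3.45  (sum of absolute errors)
print(np.linalg.norm(residual, 2))       # L2   norm: ~3.01 (dominated by the squared outlier)
print(np.linalg.norm(residual, np.inf))  # Linf norm: 3.0   (worst single residual)
```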
Ⅱ. Nonlinear Regression
Sec 4.2: Nonlinear Regression and Gradient Descent
Ⅲ. Regularization
Sec 4.3: Over- and Under-determined Systems; Code demo
Ⅳ. Optimization
Sec 4.4: Optimization for regression
Optimization is the cornerstone of regression.
- Optimization is to minimize some objective function.
- Regression is finding the parameters of the given model that maps the input to the output.
Regularization is Critical
- Regularizers are what determine the type of solutions obtained.
Simple Example
The data are generated from a parabola plus noise, $f(x) = x^2 + \mathcal{N}(0,\sigma)$. In practice, however, the true model is unknown, and regularization can help find better models.
Given 100 realizations of the data, the model is represented as $Y = f(X, \beta)$.
Even small noise produces complex predicted models.
Different regression
| Method | Description |
| --- | --- |
| pinv() | Pseudo-inverse (least squares with the minimum-L2-norm solution) |
| \ (backslash) | QR decomposition |
| lasso() | L1-regularized (sparse) least squares |
| robustfit() | Robust regression |
The least-squares fit (pinv) produces a very different model for each small perturbation of the data, and all of the coefficients are active.
The uncertainty of this solver is high.
The high-degree terms are highly volatile, which makes the model unstable.
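A sketch of that observation under an assumed setup (degree-9 polynomial basis, noise level 0.05): fitting y = x² plus noise with the pseudo-inverse activates every coefficient, and the estimates shift from one noise realization to the next.

```python
import numpy as np

x = np.linspace(-2, 2, 100)
A = np.vander(x, 10)                     # degree-9 polynomial basis, columns x^9 ... x^0
coefs = []
for seed in range(100):                  # 100 noise realizations of the same parabola
    rng = np.random.default_rng(seed)
    y = x**2 + 0.05 * rng.standard_normal(x.size)
    coefs.append(np.linalg.pinv(A) @ y)  # least-squares fit via the pseudo-inverse
coefs = np.array(coefs)

print(coefs.mean(axis=0).round(3))       # on average only the x^2 coefficient is near 1
print(coefs.std(axis=0))                 # yet none of the coefficients is pinned at zero;
                                         # they all move with the noise realization
```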
Parsimony (Pareto front): keep adding higher-degree polynomial terms to the model, and at some point the error no longer decreases and may even rise.
Pareto front
Sec 4.7: The Pareto front and Lex Parsimoniae
A small number of parameters brings interpretability.