Video 9 - Error and noise 10-18-2021
Outline
- Error measures
- Noisy targets
- Preamble to the theory
Review of Lec 3
Linear Models
- Use the "signal" to both classify and regress
- Signal:
$$ \text{signal} = \sum_{i=0}^d w_i x_i = \mathbf{w}^T \mathbf{x} $$
- Linear classification: $h(\mathbf x) = \operatorname{sign}(\mathbf{w}^T \mathbf{x})$ (the signal is passed through a threshold; learned iteratively by PLA or Pocket)
- Linear regression: $h(\mathbf x) = \mathbf{w}^T \mathbf{x}$ (the signal is used directly without a threshold; one-shot learning via the pseudo-inverse)
$$ \mathbf w = (\mathrm X^T \mathrm X)^{-1} \mathrm X^T \mathbf y $$
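As a concrete illustration, here is a minimal NumPy sketch of both models; the dataset, dimensions, and weights are invented for illustration, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: N points in d dimensions, with x_0 = 1 prepended for the bias term.
N, d = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])   # shape (N, d+1)
w_true = np.array([0.5, -1.0, 2.0])                         # hypothetical target weights
y = X @ w_true + 0.1 * rng.normal(size=N)                   # noisy real-valued labels

# Linear regression, one-shot: w = (X^T X)^{-1} X^T y
# (np.linalg.solve is numerically safer than forming the inverse explicitly).
w = np.linalg.solve(X.T @ X, X.T @ y)

# Linear classification: pass the same signal through a threshold.
h = np.sign(X @ w)
```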
Error measures
- Quantify the dissimilarity between the output of hypothesis $h$ and the output of the unknown target function $f$.
- Almost all error measures are pointwise
Evaluate $h$ and $f$ on individual points $\mathbf x$ using a pointwise error $e(h(\mathbf x), f(\mathbf x))$:
Binary error: $e(h(\mathbf x), f(\mathbf x)) = [\![ h(\mathbf x) \neq f(\mathbf x) ]\!]$ (1 when they disagree, 0 when they agree) (classification)
Squared error: $e(h(\mathbf x), f(\mathbf x)) = (h(\mathbf x) - f(\mathbf x))^2$ (the actual distance between the outputs) (regression)
- In-sample error: the average pointwise error of $h$ against $f$ over the sample points
$$ E_{in}(h) = \frac{1}{N} \sum_{n=1}^N e(h(\mathbf x_n), f(\mathbf x_n)) $$
Out-of-sample error: the expected pointwise error of $h$ against $f$ over the whole input space
$$ E_{out}(h) = \mathbb E_{\mathbf x} \left[ e(h(\mathbf x), f(\mathbf x)) \right] $$
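The pointwise errors and $E_{in}$ translate directly into code; a minimal sketch, where the arrays `h_out`/`f_out` are invented placeholders:

```python
import numpy as np

def binary_error(h_x, f_x):
    # [[h(x) != f(x)]]: 1 where the outputs disagree, 0 where they agree
    return (h_x != f_x).astype(float)

def squared_error(h_x, f_x):
    # (h(x) - f(x))^2: squared distance between the two outputs
    return (h_x - f_x) ** 2

def in_sample_error(h_x, f_x, e):
    # E_in(h) = (1/N) * sum_n e(h(x_n), f(x_n))
    return float(np.mean(e(h_x, f_x)))

h_out = np.array([+1, -1, +1, +1])
f_out = np.array([+1, +1, +1, -1])
print(in_sample_error(h_out, f_out, binary_error))   # 0.5: half the points disagree
```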
- How to choose the error measure
False accept and false reject
Confusion matrix:
$$ \begin{array}{c|cc} h \backslash f\ (\text{unknown}) & +1 & -1 \\ \hline +1 & \text{no error} & \text{false accept} \\ -1 & \text{false reject} & \text{no error} \end{array} $$
The right error measure depends on the application, since different applications assign different penalties to the two error types: a supermarket discount system mostly penalizes false rejects (annoyed customers), while a security system mostly penalizes false accepts (intruders let in).
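A minimal sketch of an application-weighted error measure; the cost values are hypothetical, just to show how the two error types can be penalized differently:

```python
import numpy as np

def weighted_error(h_out, f_out, false_accept_cost=1.0, false_reject_cost=1.0):
    false_accept = (h_out == +1) & (f_out == -1)   # h says +1, truth is -1
    false_reject = (h_out == -1) & (f_out == +1)   # h says -1, truth is +1
    return float(np.mean(false_accept_cost * false_accept
                         + false_reject_cost * false_reject))

h_out = np.array([+1, -1, +1, -1])
f_out = np.array([-1, -1, +1, +1])
# A security-style setting: charge false accepts 1000x more than false rejects.
print(weighted_error(h_out, f_out, false_accept_cost=1000.0))  # 250.25
```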
Noisy targets
- A noisy target decomposes into a deterministic part $f(\mathbf x) = \mathbb E[y|\mathbf x]$ plus noise $y - f(\mathbf x)$.
- Sometimes the same input corresponds to different labels, so the underlying relation is not a "function" $y = f(\mathbf x)$ but a distribution $P(y|\mathbf x)$.
$\mathbf x$ is drawn from the space $\mathcal X$ according to some unknown distribution $P(\mathbf x)$, and the label $y$ follows the distribution $P(y|\mathbf x)$, so each example $(\mathbf x, y)$ is generated by the joint distribution $P(\mathbf x)\, P(y|\mathbf x) = P(\mathbf x, y)$.
A deterministic target is the special case of a noisy target where $P(y|\mathbf x)$ is zero except at $y = f(\mathbf x)$; the noise is then zero and $y = f(\mathbf x)$.
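A minimal sketch of sampling from a noisy target, first $\mathbf x \sim P(\mathbf x)$ and then $y \sim P(y|\mathbf x)$; the target and the 10% label-flip probability are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, flip_prob=0.1):
    x = rng.uniform(-1, 1, size=(n, 2))      # x drawn from an (assumed) P(x)
    f = np.sign(x[:, 0] + x[:, 1])           # noiseless labels: the deterministic part
    flip = rng.random(n) < flip_prob         # noise: label disagrees with the deterministic part
    y = np.where(flip, -f, f)                # y drawn from P(y|x)
    return x, y

x, y = sample(1000)
# flip_prob = 0 recovers the deterministic special case y = f(x).
```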
Preamble to the theory
- Learning is feasible in a probabilistic sense: $E_{out}(g) \approx E_{in}(g)$.
- We need $g \approx f$, which means $E_{out}(g) \approx 0$. This splits into two conditions:
- $E_{out}(g) \approx E_{in}(g)$ (guaranteed probabilistically by the Hoeffding inequality)
- $E_{in}(g) \approx 0$ (achieved by the learning algorithm: PLA, Pocket, linear classification/regression)
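To make the Hoeffding guarantee concrete, here is a minimal Monte Carlo sketch checking that $P[\,|E_{in} - E_{out}| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$ for a single hypothesis; the out-of-sample error `mu`, sample size `N`, and tolerance `eps` are invented values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.3, 500, 0.05, 10_000   # mu plays the role of E_out

# Each trial: draw N pointwise binary errors (1 where g errs), compute E_in,
# and record how far it deviates from the true out-of-sample error mu.
e_in = rng.random((trials, N)) < mu
deviations = np.abs(e_in.mean(axis=1) - mu)

empirical = np.mean(deviations > eps)
bound = 2 * np.exp(-2 * eps**2 * N)
print(f"empirical: {empirical:.4f}  Hoeffding bound: {bound:.4f}")
# The empirical frequency stays below the (loose) Hoeffding bound.
```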