watch: AML 04 | Error and Noise

Video 9 - Error and noise 10-18-2021

Outline

  1. Error measures
  2. Noisy targets
  3. Preamble to the theory

Review of Lec 3

Linear Models

  • Using “signal” to classify and regress

  • signal:

    $$ \sum_{i=0}^d w_i x_i = \mathbf{w^T x} $$
  • Linear Classification: $h(\mathbf x) = \rm sign(\mathbf{w^T x})$ (the signal is passed through a threshold; PLA, Pocket)

  • Linear Regression: $h(\mathbf x) = \mathbf{w^T x}$ (the signal is not thresholded; one-shot learning)

    $\mathbf w = (\mathbf X^T \mathbf X)^{-1} \mathbf X^T \mathbf y$, where $\mathbf X$ is the $N \times (d+1)$ data matrix and $\mathbf y$ the target vector
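The one-shot solution $\mathbf w = (\mathbf X^T \mathbf X)^{-1} \mathbf X^T \mathbf y$ can be sketched in NumPy. This is a minimal illustration on synthetic data (all names and numbers here are made up); `lstsq` is used instead of an explicit matrix inverse for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 2

# N x (d+1) data matrix; first column of ones supplies the bias term w_0
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + 0.1 * rng.normal(size=N)   # targets with a little noise

# w = (X^T X)^{-1} X^T y, solved as a least-squares problem
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)  # close to true_w
```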

Error measures

  • Quantify the dissimilarity between the output of hypothesis $h$ and the output of the unknown target function $f$.

  • Almost all error measures are pointwise

    Compute $h$ and $f$ on individual points $\mathbf x$ using a pointwise error $e(h(\mathbf x), f(\mathbf x))$:

    Binary error: $e(h(\mathbf x), f(\mathbf x))= [\![ h(\mathbf x) \neq f(\mathbf x) ]\!]$ (error is 1 if they disagree, 0 if they agree) (Classification)

    Squared error: $e(h(\mathbf x), f(\mathbf x)) = (h(\mathbf x) - f(\mathbf x))^2$ (the actual distance) (Regression)

  • In-sample error: the average discrepancy between $h(\mathbf x)$ and $f(\mathbf x)$ over the sample points

    $$ E_{in}(h) = \frac{1}{N} \sum_{n=1}^N e(h(\mathbf x_n), f(\mathbf x_n)) $$

  • Out-of-sample error: the expected discrepancy between $h(\mathbf x)$ and $f(\mathbf x)$ over the whole input space

    $$ E_{out}(h) = \mathbb E_x [e(h(\mathbf x), f(\mathbf x))] $$
  • How to choose the error measure

    False accept and False reject

    confusion matrix:

    $$ \begin{array}{c|cc} h \setminus f\ (\text{unknown}) & +1 & -1 \\ \hline +1 & \text{no error} & \text{false accept} \\ -1 & \text{false reject} & \text{no error} \end{array} $$

    Which error measure is appropriate depends on the application: each application assigns its own penalty to false accepts and false rejects.
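The pointwise errors and the in-sample average $E_{in}$ above can be sketched as follows (a toy illustration; `h_out` and `f_out` stand for $h(\mathbf x_n)$ and $f(\mathbf x_n)$ on made-up sample points):

```python
import numpy as np

# Binary error (classification): 1 where h(x) != f(x), else 0
h_out = np.array([+1, -1, +1, +1])   # hypothesis outputs
f_out = np.array([+1, +1, +1, -1])   # target outputs
binary_e = (h_out != f_out).astype(float)
E_in_binary = binary_e.mean()        # fraction of misclassified points

# Squared error (regression): (h(x) - f(x))^2
h_reg = np.array([0.9, 1.2, -0.5])
f_reg = np.array([1.0, 1.0, 0.0])
E_in_squared = ((h_reg - f_reg) ** 2).mean()

print(E_in_binary, E_in_squared)
```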

Noisy targets

  • A noisy target = a deterministic part $f(\mathbf x) = \mathbb E(y|\mathbf x)$ plus noise $y-f(\mathbf x)$

  • Sometimes the same input corresponds to different labels, so the underlying relationship is not a function $y=f(\mathbf x)$ but a distribution $P(y|\mathbf x)$

    $\mathbf x$ is drawn from the space $\mathcal X$ according to some unknown distribution $P(\mathbf x)$, and the label $y$ follows the distribution $P(y|\mathbf x)$, so each example $(\mathbf x,y)$ is generated by the joint distribution $P(\mathbf x) P(y|\mathbf x) = P(\mathbf x,y)$.

    A deterministic target is the special case of a noisy target where $P(y|\mathbf x)$ puts all its probability on $y = f(\mathbf x)$; the noise is then zero, i.e. $y=f(\mathbf x)$
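A noisy target can be simulated directly: draw $y$ from $P(y|\mathbf x)$ instead of computing it from a function, and check that the sample mean of $y$ recovers the deterministic part $f(\mathbf x) = \mathbb E(y|\mathbf x)$. This is a toy sketch; the sigmoid `p` is an arbitrary choice, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def p(x):
    """P(y = +1 | x): probability of the +1 label (a toy sigmoid)."""
    return 1.0 / (1.0 + np.exp(-x))

x = 0.0                  # at x = 0, p(x) = 0.5: maximally noisy labels
n = 100_000
# y in {+1, -1} drawn from P(y | x) rather than fixed by a function
y = np.where(rng.random(n) < p(x), 1.0, -1.0)

f_x = 2 * p(x) - 1       # deterministic part f(x) = E[y | x]
print(f_x, y.mean())     # y.mean() approximates f(x)
```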

Preamble to the theory

  • Learning is feasible in a probabilistic sense: $E_{out}(g) \approx E_{in}(g)$
  • We need $g\approx f$, which means $E_{out}(g) \approx 0$
    1. $E_{out}(g) \approx E_{in}(g)$ (Hoeffding Inequality)
    2. $E_{in}(g) \approx 0$ (PLA, Pocket, Linear classification/regression)
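Condition 1 can be checked numerically for a single fixed hypothesis: draw samples of size $N$ and verify that $E_{in}$ rarely strays far from $E_{out}$, as the Hoeffding inequality $P[|E_{in}-E_{out}| > \epsilon] \le 2e^{-2\epsilon^2 N}$ guarantees. The numbers below ($E_{out}=0.3$, etc.) are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
E_out, eps, N, trials = 0.3, 0.1, 500, 2000

# Each point is an error (1) with probability E_out; E_in is the sample mean.
# This models a *fixed* hypothesis, as the single-bin Hoeffding bound requires.
errors = rng.random((trials, N)) < E_out
E_in = errors.mean(axis=1)

bad = np.mean(np.abs(E_in - E_out) > eps)   # empirical violation rate
bound = 2 * np.exp(-2 * eps**2 * N)         # Hoeffding bound
print(bad, bound)                           # bad <= bound
```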