memo: Vis

Note

GL_PROJECTION matrix, in the order it is derived: camera-space point (xe, ye, ze) -> perspective projection onto the near plane: (xp, yp) -> linear scaling to NDC: (xn, yn, zn), each ranging from -1 to 1 -> clip coordinates (x_clip, y_clip, z_clip, w_clip), whose division by w_clip yields (xn, yn, zn).

Refer to OpenGL Projection Matrix - songho

Normalized Device Coordinates (NDC) are used to determine whether a 3D point can appear on the computer monitor: the visible volume is a cube with edge length 2, spanning [-1, 1] on each axis. The transformation from eye coordinates to NDC maps the truncated-pyramid frustum to this cube.

According to the perspective projection, the projection of a camera-space point (xₑ,yₑ,zₑ) onto the near plane is

$$ \begin{cases} x_p = \frac{n}{-z_e} x_e \\ y_p = \frac{n}{-z_e} y_e \end{cases} $$

(Camera coordinates form a right-handed system looking in the -z direction, while NDC forms a left-handed system looking in the +z direction.)

To represent the NDC transformation as a matrix, homogeneous coordinates are used so that the division can be deferred, and the transformation can be written as:
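The near-plane projection above can be sketched in a few lines of Python. This is only an illustration: the function name `project_to_near_plane` and the sample values are assumptions, not part of the referenced tutorial.

```python
def project_to_near_plane(x_e, y_e, z_e, n):
    """Project a camera-space point onto the near plane z = -n.

    Assumes the OpenGL convention used in this note: the camera looks
    down -z, so points in front of the camera have z_e < 0.
    """
    if z_e >= 0:
        raise ValueError("point must be in front of the camera (z_e < 0)")
    scale = n / -z_e  # similar triangles: x_p / n = x_e / (-z_e)
    return scale * x_e, scale * y_e


x_p, y_p = project_to_near_plane(2.0, 1.0, -4.0, n=1.0)
print(x_p, y_p)  # 0.5 0.25
```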

$$ \begin{pmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{pmatrix} = M_{projection} \cdot \begin{pmatrix} x_e \\ y_e \\ z_e \\ w_e \end{pmatrix} $$

(2024-02-15) The homogeneous coordinate of the input point is wₑ = 1. Because the 4-th row of the projection matrix is [0 0 -1 0], the multiplication stores the original camera-space depth in w_clip = -zₑ. The division by depth is deferred to the very last step, the perspective division, so that all intermediate steps remain linear operations.

Compared with merely projecting a 3D point onto the image plane after a w2c transform, the projection matrix additionally specifies how the z-axis of the NDC space (no longer the source camera space) behaves.

Therefore, the NDC is:

$$ \begin{pmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{pmatrix} = \begin{pmatrix} x_{clip} / w_{clip} \\ y_{clip}/w_{clip} \\ z_{clip}/w_{clip} \end{pmatrix} $$

Because $w_{clip}$ is the denominator, it should equal -zₑ. Hence, the fourth row of the matrix should be $[0\ 0\ -1\ 0]$.

Map [l, r] and [b, t] to [-1, 1] with a linear relationship. The two points (l,-1) and (r,1) determine the line:

$$ x_{NDC} = \frac{1-(-1)}{r-l} \cdot x_p + β $$

and then substitute (r,1) for $(x_p,x_{NDC})$ to solve β = -(r+l)/(r-l).

Therefore, $x_{NDC} = \frac{2}{r-l}x_p - \frac{r+l}{r-l}$.

Similarly, $y_{NDC} = \frac{2}{t-b} y_p- \frac{t+b}{t-b}$
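As a quick check of the linear mapping, here is a minimal sketch; the helper name `to_ndc` and the interval [-2, 6] are illustrative assumptions.

```python
def to_ndc(v, lo, hi):
    """Linearly map v from [lo, hi] to [-1, 1].

    Same form as x_NDC = 2/(r-l) * x_p - (r+l)/(r-l), with (lo, hi)
    standing in for (l, r) or (b, t).
    """
    return 2.0 / (hi - lo) * v - (hi + lo) / (hi - lo)


l, r = -2.0, 6.0
print(to_ndc(l, l, r))            # -1.0 (left edge)
print(to_ndc(r, l, r))            # 1.0 (right edge)
print(to_ndc((l + r) / 2, l, r))  # 0.0 (midpoint)
```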

Express $x_p$, $y_p$ in terms of xₑ, yₑ:

$$ \begin{aligned} x_{NDC} &= \left(\frac{2n}{r-l} \cdot x_e + \frac{r+l}{r-l} \cdot z_e\right) / (-z_e) \\ y_{NDC} &= \left(\frac{2n}{t-b} \cdot y_e + \frac{t+b}{t-b} \cdot z_e\right) / (-z_e) \end{aligned} $$

Therefore, the first two rows of the matrix can be read off from the numerators.

Suppose the third row is $[0\ 0\ A\ B]$ (the z value is independent of x and y), so:

$$ z_{NDC} = \frac{A z_e + B w_e}{-z_e} $$

Substitute the correspondences $z_e = -n \mapsto z_{NDC} = -1$ and $z_e = -f \mapsto z_{NDC} = 1$ into the above equation, with $w_e = 1$:

$$ \begin{cases} \frac{-A n + B}{n} = -1 \\ \frac{-A f + B}{f} = 1 \end{cases} $$

Therefore, A = -(f+n)/(f-n), and B = -2fn / (f-n)
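For completeness, the elimination behind these two values can be written out as follows (multiplying each equation through by its denominator, then subtracting the second from the first):

$$ \begin{aligned} -An + B &= -n \\ -Af + B &= f \end{aligned} \;\Rightarrow\; A(f-n) = -(f+n) \;\Rightarrow\; A = -\frac{f+n}{f-n}, \quad B = An - n = -\frac{2fn}{f-n} $$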

Finally, the matrix $M_{projection}$ is

$$ \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & \frac{-(f+n)}{f-n} & \frac{-2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix} $$
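The matrix can be sanity-checked numerically by confirming that frustum corners land on corners of the NDC cube. The following is a minimal pure-Python sketch; the function names `frustum_matrix` and `eye_to_ndc` and the frustum bounds are illustrative assumptions.

```python
def frustum_matrix(l, r, b, t, n, f):
    """Build the 4x4 perspective projection matrix derived above (row-major)."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def eye_to_ndc(M, p_eye):
    """Apply M to a camera-space point (w_e = 1), then do the perspective division."""
    v = (p_eye[0], p_eye[1], p_eye[2], 1.0)
    clip = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
    w_clip = clip[3]  # equals -z_e because the fourth row is [0 0 -1 0]
    return tuple(c / w_clip for c in clip[:3])


M = frustum_matrix(l=-1.0, r=3.0, b=-1.0, t=1.0, n=1.0, f=3.0)
# Left-bottom corner of the near plane: (l, b, -n)
print(eye_to_ndc(M, (-1.0, -1.0, -1.0)))  # (-1.0, -1.0, -1.0)
# Right-top corner of the far plane: (r*f/n, t*f/n, -f)
print(eye_to_ndc(M, (9.0, 3.0, -3.0)))    # (1.0, 1.0, 1.0)
```

An asymmetric frustum (l ≠ -r) is used on purpose, so that the third-column terms in the first row are exercised as well.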

Code

(2023-10-02)

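3D world points are transformed into the reference camera space and multiplied by the intrinsics matrix (not its inverse) to get pixel coordinates,
then the depth is scaled from [near, far] to [0,1].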

Code credits MatchNeRF

import torch

def get_coord_ref_ndc(extr_ref, intr_ref, pts_3D, inv_scale, near_far=None, lindisp=False):
  '''
  Warp the provided positions to the reference view, and normalize them to [0, 1] (NDC-like) coordinates.
  pts_3D: [batch, N_rays, N_samples, 3]
  Note: near_far is required in practice despite its None default.
  world2cam is a helper from the same code base (not shown here).
  '''

  bs, N_rays, N_samples, N_dim = pts_3D.shape # (bs=1, n_rays=1024, n_pts=128, n_dim=3)
  pts_3D = pts_3D.reshape(bs, -1, N_dim)  # (bs, n_rays*n_pts, n_dim)
  near, far = torch.split(near_far, [1, 1], dim=-1)   # (1,2) -> both are (1,1)

  # warp to ref view
  if extr_ref is not None:
    # 3D pts in world space -> camera space of the reference view
    pts_ref_world = world2cam(pts_3D, extr_ref)
  else:
    # assumption: without extrinsics, the points are already in the reference camera space
    pts_ref_world = pts_3D

  if intr_ref is not None:
    # using projection
    # pts in camera space -> image plane coords with z
    point_samples_pixel = pts_ref_world @ intr_ref.transpose(-1, -2)
    # divide by depth to get pixel coords, then normalize x, y to 0~1
    point_samples_pixel[..., :2] = (point_samples_pixel[..., :2] / point_samples_pixel[..., -1:] + 0.0) / inv_scale.reshape(bs, 1, 2)
    if not lindisp:
      # depth normalized linearly to 0~1
      point_samples_pixel[..., 2] = (point_samples_pixel[..., 2] - near) / (far - near)
    else:
      # depth normalized linearly in disparity (1/z) to 0~1
      point_samples_pixel[..., 2] = (1.0/point_samples_pixel[..., 2]-1.0/near)/(1.0/far - 1.0/near)
  else:
    # using bounding box: near and far are treated as its two 3D corners
    near, far = near.view(bs, 1, 3), far.view(bs, 1, 3)
    point_samples_pixel = (pts_ref_world - near) / (far - near)  # normalize to 0~1

  point_samples_pixel = point_samples_pixel.view(bs, N_rays, N_samples, 3)    # (bs, n_rays*n_pts, 3) -> (bs, n_rays, n_pts, 3)
  return point_samples_pixel

With this normalization, depths in [near, far] are mapped to [0,1].
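The two depth normalizations used in the function (linear in depth, and linear in disparity when `lindisp` is set) can be compared on scalars. This is a small standalone sketch rather than MatchNeRF code: the name `normalize_depth` is hypothetical, and depths are assumed to be positive distances along the viewing direction.

```python
def normalize_depth(z, near, far, lindisp=False):
    """Map a positive depth z in [near, far] to [0, 1].

    lindisp=False: linear in depth.
    lindisp=True:  linear in disparity (1/z), which spends more of the
                   [0, 1] range on depths close to the camera.
    """
    if lindisp:
        return (1.0 / z - 1.0 / near) / (1.0 / far - 1.0 / near)
    return (z - near) / (far - near)


near, far = 1.0, 4.0
for z in (1.0, 2.0, 4.0):
    print(z, normalize_depth(z, near, far), normalize_depth(z, near, far, lindisp=True))
```

Both variants send `near` to 0 and `far` to 1; they differ only in how the depths in between are spaced.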

(2024-01-16)

Projection Matrix Explained in Detail (Projection Matrix 详解) - article by 贰芍 - Zhihu