memo: algo | KNN

Open3D

(2024-05-13)

Just found that Open3D can also find nearest neighbors: "5-Step Guide to generate 3D meshes from point clouds with Python" (Medium)

import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("/home/yi/Downloads/CasMVSNet_pl-comments/results/dtu/image_ref/scan1/points3d.ply", format='ply')

distances = pcd.compute_nearest_neighbor_distance()  # distance from each point to its nearest neighbor
avg_dist = np.mean(distances)
radius = 3 * avg_dist
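
Besides the average spacing above, Open3D can also answer per-point KNN queries through its KD-tree wrapper. A minimal sketch continuing from the snippet above (querying the first point with K=3 is my own choice, not from the article):

kdtree = o3d.geometry.KDTreeFlann(pcd)              # build a KD-tree over the loaded point cloud
k, idx, dist2 = kdtree.search_knn_vector_3d(pcd.points[0], 4)  # point 0 itself + its 3 nearest neighbors
print(list(idx))    # neighbor indices; the first entry is point 0 itself
print(list(dist2))  # squared distances, the same convention simple-knn uses below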

CUDA Impl

(2024-03-05)

Example from simple-knn:

Use thread blocks to divide the sorted points

Given 100K points and a 1D grid of 1D blocks of length 1024 (SO), the number of blocks is $\left\lceil \frac{100000}{1024} \right\rceil = \left\lfloor \frac{100000 + 1024 - 1}{1024} \right\rfloor = 98$

Therefore, the kernel launch configuration is <<<98, 1024>>>

  1. Convert the 3D Cartesian coordinates to Morton codes
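
    The Morton code quantizes x, y, z inside the bounding box and interleaves their bits, so points that are close in 3D get numerically close codes. A rough Python sketch of the idea (the 10-bit quantization is my assumption for this sketch; the CUDA code does the interleaving with branch-free bit tricks):

    def interleave_bits(v):
        # Spread the 10 low bits of v so that consecutive bits end up 3 positions apart.
        out = 0
        for i in range(10):
            out |= ((v >> i) & 1) << (3 * i)
        return out

    def morton3d(p, mn, mx):
        # Quantize each coordinate to 10 bits inside the bounding box [mn, mx], then interleave:
        # x lands on bits 2, 5, 8, ..., y on bits 1, 4, 7, ..., z on bits 0, 3, 6, ...
        q = [int((p[d] - mn[d]) / (mx[d] - mn[d]) * 1023) for d in range(3)]
        return (interleave_bits(q[0]) << 2) | (interleave_bits(q[1]) << 1) | interleave_bits(q[2])

    Points that fall into the same cell of this implicit octree share the high bits of their codes, which is why sorting by Morton code groups nearby points.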

  2. Build key-value pairs, where the Morton code is the key and the point index is the value.

    After sorting by key, the point indices are reordered so that spatially close points end up next to each other.

    thrust::device_vector<uint32_t> indices(P);	// indices for points
    thrust::sequence(indices.begin(), indices.end());	// indices: 0, 1, 2, ..., P-1
    thrust::device_vector<uint32_t> indices_sorted(P);	// the sorted indices

    // First call with a null temp pointer only queries how many bytes of temporary storage the radix sort needs.
    cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
    temp_storage.resize(temp_storage_bytes);

    // Second call does the actual sort: keys = Morton codes, values = point indices.
    cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
    

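    In NumPy terms, this key-value sort is just an argsort of the Morton codes (a toy sketch with stand-in codes, not the CUDA path):

    import numpy as np

    P = 8
    morton = np.random.randint(0, 2**30, size=P, dtype=np.uint32)  # stand-in Morton codes
    indices = np.arange(P, dtype=np.uint32)

    order = np.argsort(morton)        # plays the role of cub::DeviceRadixSort::SortPairs
    morton_sorted = morton[order]
    indices_sorted = indices[order]   # point indices reordered along the z-order curve
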
    Once the sorted points are distributed over thread blocks, the points handled by one block are spatially close to each other.

  3. Find the min and max coordinates (an axis-aligned bounding box) for each block

    boxMinMax<<<num_boxes, BOX_SIZE>>>(P, points, indices_sorted.data().get(), boxes.data().get());
    

    Each thread handles one point index. The indices in indices_sorted are already ordered by Morton (z-order) code, so the points within a block are spatially close to each other and their pairwise distances are expected to be small.

    [Diagram: the grid consists of Block 0, Block 1, ..., Block 97, each with 1024 threads (one point index per thread); every block computes its own min and max coordinates.]

    The vector boxes records each block’s min and max coordinates:

    [Diagram: boxes holds one entry per block (blk0, blk1, blk2, ..., blkn), each storing two xyz triples: the block's min and max.]

    Based on the min and max coordinates, a point can quickly determine whether a block contains potential nearest neighbors.

    Specifically, if the distance from a point to a block's box (defined by its min and max) is larger than reject, the block is not searched for possible nearest neighbors.

    for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){
        MinMax box = boxes[b];
        float dist = distBoxPoint(box, point);
        // Skip blocks whose box is farther away than reject or than the current 3rd-best distance.
        if (dist > reject || dist > best[2])
            continue;
    

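    distBoxPoint is essentially the standard squared distance from a point to an axis-aligned box; a Python sketch of that idea (the function name and list-based box representation are mine):

    def dist_box_point(box_min, box_max, p):
        # Squared distance from p to the axis-aligned box [box_min, box_max];
        # zero if p lies inside the box along every axis.
        d2 = 0.0
        for lo, hi, x in zip(box_min, box_max, p):
            if x < lo:
                d2 += (lo - x) ** 2
            elif x > hi:
                d2 += (x - hi) ** 2
        return d2
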
    The reject value is the distance to the 3rd-best neighbor found in an initial pass over the sorted sequence (the ±3 positions around each point):

    int idx = cg::this_grid().thread_rank();	// global thread index = position in the sorted order
    float3 point = points[indices[idx]];	// indices is the Morton-sorted index array
    float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX };	// 3 smallest squared distances found so far

    // Seed the search with the 3 neighbors on each side in Morton order.
    for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++){
        if (i == idx)
            continue;
        updateKBest<3>(point, points[indices[i]], best);
    }
    

    In this way, a large share of the points is filtered out, which saves computation.

  4. Iterate over every point in the remaining candidate blocks

    for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){
        ...	// box rejection test from step 3
        for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++){
            if (i == idx)
                continue;
            updateKBest<3>(point, points[indices[i]], best);
        }
    }
    dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f;	// mean of the 3 smallest squared distances
    
  5. Compute the squared distance between two points and keep the K smallest:

    template<int K>
    __device__ void updateKBest(const float3& ref, const float3& point, float* knn)
    {
        float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };	// diff
        float dist = d.x * d.x + d.y * d.y + d.z * d.z;	// squared distance
        for (int j = 0; j < K; j++)	// knn[] is kept sorted in ascending order
        {
            if (knn[j] > dist)	// closer than the current j-th best
            {
                float t = knn[j];	// remember the previous j-th best
                knn[j] = dist;	// insert the new distance here
                dist = t;	// the displaced value competes for the remaining slots
            }
        }
    }
    

Test

(2024-03-06)

Ubuntu 20.04, 1050Ti, cuda-11.6 (nvcc -V)

  1. Clone: git clone https://gitlab.inria.fr/bkerbl/simple-knn.git

  2. Environment: conda env create -f environment.yml

    name: PCRefine
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - python=3.10
      - pip
      - pip:
        - torch==1.12.1+cu116
        - torchvision==0.13.1+cu116
        - --extra-index-url https://download.pytorch.org/whl/cu116
        - ipykernel
        - plyfile
    
  3. Compile based on the setup.py: pip install .

  4. Call from python:

    import numpy as np
    import torch
    from simple_knn._C import distCUDA2
    from utils import fetchPly
    
    pcd = fetchPly("/home/yi/Downloads/nerf/data/nerf_synthetic/lego/points3d.ply")
    meanDist = distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda())
    

    The output is the mean (squared) distance from each point to its 3 nearest neighbors:

    (PCRefine) yi@yi:~/Downloads/simple-knn-comments$ python test_knn.py 
    tensor([0.0013, 0.0020, 0.0008,  ..., 0.0023, 0.0017, 0.0007], device='cuda:0')
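
    For a small subset, this output can be sanity-checked with a brute-force torch computation (a sketch reusing pcd from the script above; the 4096-point subset and the helper name are mine, and the O(N²) distance matrix only fits for a few thousand points):

    import numpy as np
    import torch

    def mean_knn_dist_bruteforce(pts, k=3):
        # pts: (N, 3) float tensor; mean squared distance from each point to its k nearest neighbors
        d2 = torch.cdist(pts, pts).pow(2)             # (N, N) pairwise squared distances
        d2.fill_diagonal_(float('inf'))               # exclude the point itself
        knn_d2, _ = d2.topk(k, largest=False, dim=1)  # k smallest per row
        return knn_d2.mean(dim=1)

    pts = torch.from_numpy(np.asarray(pcd.points)).float().cuda()[:4096]
    print(mean_knn_dist_bruteforce(pts))   # should closely match distCUDA2(pts) on the same subset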
    

Indices

(2024-03-08)

Modify the original code to return the neighbors’ indices:

(casmvsnet_pl) yi@yi:~/Downloads/CasMVSNet_pl-comments/submodules/simple-knn-comments$ python test_knn.py 
tensor([[41057, 23085, 16403],
        [21674, 59181, 31901],
        [18699, 99481, 15716],
        ...,
        [17352, 51604, 48154],
        [90929, 45350, 94932],
        [ 6797, 62182, 78410]], device='cuda:0', dtype=torch.int32)
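
The gist of the change is to carry the neighbor index through the same insertion logic as the distance. A Python sketch of the idea (the names are mine, not the actual CUDA patch):

FLT_MAX = float('inf')

def update_k_best_with_idx(ref, point, point_idx, knn_d, knn_i):
    # Same insertion as updateKBest, but knn_i is kept in lockstep with knn_d.
    d = [point[c] - ref[c] for c in range(3)]
    dist = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    idx = point_idx
    for j in range(len(knn_d)):
        if knn_d[j] > dist:
            knn_d[j], dist = dist, knn_d[j]  # insert the distance, displace the old one
            knn_i[j], idx = idx, knn_i[j]    # move the matching index along with it

# Per query point: start from knn_d = [FLT_MAX] * 3, knn_i = [-1] * 3, call the
# function for every candidate neighbor, then write knn_i to the output.

In the CUDA kernel the same idea would mean an int array next to best[3] and an extra output buffer for the indices.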

pytorch3d

(2024-03-12)

Set up environment to compile pytorch3d:

conda create -n casmvsnet_pl python=3.8
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

pip install git+https://github.com/facebookresearch/pytorch3d.git
pip install -r requirements.txt

Example from SC-GS

nn_dist, nn_idxs, _ = pytorch3d.ops.knn_points(init_pcl[None], init_pcl[None], None, None, K=K+1)

Use pytorch3d to find KNN in the lego point cloud:

import numpy as np
from plyfile import PlyData
from typing import NamedTuple
import torch
import pytorch3d.ops

class BasicPointCloud(NamedTuple):
    points : np.array
    colors : np.array
    normals : np.array

# Code from: https://github.com/graphdeco-inria/gaussian-splatting
def fetchPly(path):
    plydata = PlyData.read(path)
    vertices = plydata['vertex']    # coordinates (x,y,z), normal (nx,ny,nz), r,g,b of 100K points
    positions = np.vstack([vertices['x'], vertices['y'], vertices['z']]).T  # xyz: (3, 100K) -> (100K,3)
    colors = np.vstack([vertices['red'], vertices['green'], vertices['blue']]).T / 255.0    # rgb: (3, 100K) -> (100K, 3)
    normals = np.vstack([vertices['nx'], vertices['ny'], vertices['nz']]).T # normal xyz: (100K,3)
    return BasicPointCloud(points=positions, colors=colors, normals=normals)    # NamedTuple

pcd = fetchPly("/home/yi/Downloads/nerf/data/nerf_synthetic/lego/points3d.ply")
init_pcl = torch.from_numpy(np.asarray(pcd.points)).float().cuda()
K=3
nn_dist, nn_idxs, _ = pytorch3d.ops.knn_points(init_pcl[None], init_pcl[None], None, None, K=K+1)  # Both are (1,100k,4)

print(nn_dist[0][:3])
print(nn_idxs[0][:3])

pytorch3d’s knn_points returns the nearest-neighbor distances and indices at the same time, and the results match the CUDA implementation above (column 0 is the query point itself, so the neighbor indices start from column 1):

tensor([[0.0000, 0.0008, 0.0016, 0.0017],
        [0.0000, 0.0012, 0.0021, 0.0026],
        [0.0000, 0.0006, 0.0008, 0.0009]], device='cuda:0')
tensor([[    0, 41057, 23085, 16403],
        [    1, 21674, 59181, 31901],
        [    2, 18699, 99481, 15716]], device='cuda:0')
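
A quick programmatic check of that match (a sketch; cuda_idxs is my placeholder for the (N, 3) int32 index tensor returned by the modified simple-knn above):

p3d_idxs = nn_idxs[0, :, 1:].to(torch.int32)   # drop column 0, the query point itself
print(torch.equal(p3d_idxs, cuda_idxs))        # True if both report the same 3 neighbors in the same order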

I also tested the point cloud predicted by casmvsnet_pl, but this time pytorch3d was much slower than the CUDA code (about 1 hour). Don't know why.

Results are the same:
scan1 contains 9.03 M points
# cuda result nn indices:
tensor([[2654648,  181848, 2654649],
        [1825888, 2654649,  182085],
        [1825889, 2025037,  182085],
        ...,
        [1788118, 5669911, 1996297],
        [2195795, 2814578, 1568896],
        [2814579, 2597015, 2195796]], device='cuda:0', dtype=torch.int32)

# pytorch3d result nn indices: (1, 9030200, 4)
tensor([[[      0, 2654648,  181848, 2654649],
         [      1, 1825888, 2654649,  182085],
         [      2, 1825889, 2025037,  182085],
         ...,
         [9030197, 1788118, 5669911, 1996297],
         [9030198, 2195795, 2814578, 1568896],
         [9030199, 2814579, 2597015, 2195796]]], device='cuda:0')

print(nn_dist)
tensor([[[0.0000, 0.0137, 0.0167, 0.3456],
         [0.0000, 0.2516, 0.3126, 0.3652],
         [0.0000, 0.0262, 0.0991, 0.2446],
         ...,
         [0.0000, 0.0775, 0.0991, 0.1228],
         [0.0000, 0.0577, 0.0749, 0.0777],
         [0.0000, 0.0449, 0.1124, 0.1350]]], device='cuda:0')

In simple-knn, the distance dist is a squared distance, and pytorch3d's knn_points also returns squared distances by default, which is why the two outputs are directly comparable:

float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };
float dist = d.x * d.x + d.y * d.y + d.z * d.z;
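
So the mean reported by distCUDA2 can be reproduced from the pytorch3d result by averaging the three non-self squared distances (reusing nn_dist from the lego example above):

meanDist_p3d = nn_dist[0, :, 1:].mean(dim=1)   # drop column 0 (distance to itself), average the 3 NN
print(meanDist_p3d)   # e.g. tensor([0.0013, 0.0020, 0.0008, ...]), matching the distCUDA2 output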