memo: algo | KNN

Open3D

(2024-05-13)

Just found that Open3D can also find nearest neighbors: "5-Step Guide to generate 3D meshes from point clouds with Python" (Medium)

import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("/home/yi/Downloads/CasMVSNet_pl-comments/results/dtu/image_ref/scan1/points3d.ply", format='ply')

distances = pcd.compute_nearest_neighbor_distance()  # distance from each point to its nearest neighbor
avg_dist = np.mean(distances)
radius = 3 * avg_dist
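
Besides the average spacing above, Open3D can also answer per-point KNN queries through its KD-tree wrapper. A minimal sketch continuing from the snippet above (querying the first point with K=3 is my own choice, not from the article):

kdtree = o3d.geometry.KDTreeFlann(pcd)              # build a KD-tree over the loaded point cloud
k, idx, dist2 = kdtree.search_knn_vector_3d(pcd.points[0], 4)  # point 0 itself + its 3 nearest neighbors
print(list(idx))    # neighbor indices; the first entry is point 0 itself
print(list(dist2))  # squared distances, the same convention simple-knn uses below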

CUDA Impl

(2024-03-05)

Example from simple-knn:

Use thread blocks to divide the sorted points

Given 100K points and a 1D grid of 1D blocks of length 1024 (SO), the number of blocks is $\left\lceil \frac{100000}{1024} \right\rceil = \left\lfloor \frac{100000 + 1024 - 1}{1024} \right\rfloor = 98$

Therefore, the kernel launch configuration is <<<98, 1024>>>

  1. Convert the 3D Cartesian coordinates to Morton codes
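
    The Morton code quantizes x, y, z inside the bounding box and interleaves their bits, so points that are close in 3D get numerically close codes. A rough Python sketch of the idea (the 10-bit quantization is my assumption for this sketch; the CUDA code does the interleaving with branch-free bit tricks):

    def interleave_bits(v):
        # Spread the 10 low bits of v so that consecutive bits end up 3 positions apart.
        out = 0
        for i in range(10):
            out |= ((v >> i) & 1) << (3 * i)
        return out

    def morton3d(p, mn, mx):
        # Quantize each coordinate to 10 bits inside the bounding box [mn, mx], then interleave:
        # x lands on bits 2, 5, 8, ..., y on bits 1, 4, 7, ..., z on bits 0, 3, 6, ...
        q = [int((p[d] - mn[d]) / (mx[d] - mn[d]) * 1023) for d in range(3)]
        return (interleave_bits(q[0]) << 2) | (interleave_bits(q[1]) << 1) | interleave_bits(q[2])

    Points that fall into the same cell of this implicit octree share the high bits of their codes, which is why sorting by Morton code groups nearby points.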

  2. Build key-value pairs, where the Morton code is the key and the point index is the value.

    After sorting by key, the point indices are reordered so that spatially close points end up next to each other.

    thrust::device_vector<uint32_t> indices(P);	// indices for points
    thrust::sequence(indices.begin(), indices.end());	// indices: 0, 1, 2, ..., P-1
    thrust::device_vector<uint32_t> indices_sorted(P);	// the sorted indices

    // First call with a null temp pointer only queries how many bytes of temporary storage the radix sort needs.
    cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
    temp_storage.resize(temp_storage_bytes);

    // Second call does the actual sort: keys = Morton codes, values = point indices.
    cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
    

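    In NumPy terms, this key-value sort is just an argsort of the Morton codes (a toy sketch with stand-in codes, not the CUDA path):

    import numpy as np

    P = 8
    morton = np.random.randint(0, 2**30, size=P, dtype=np.uint32)  # stand-in Morton codes
    indices = np.arange(P, dtype=np.uint32)

    order = np.argsort(morton)        # plays the role of cub::DeviceRadixSort::SortPairs
    morton_sorted = morton[order]
    indices_sorted = indices[order]   # point indices reordered along the z-order curve
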
    Once the sorted points are distributed over thread blocks, the points handled by one block are spatially close to each other.

  3. Find the min and max coordinates (an axis-aligned bounding box) for each block

    boxMinMax<<<num_boxes, BOX_SIZE>>>(P, points, indices_sorted.data().get(), boxes.data().get());
    

    Each thread handles one point index. The indices in indices_sorted are already ordered by Morton (z-order) code, so the points within a block are spatially close to each other and their pairwise distances are expected to be small.

    [Diagram: the grid consists of Block 0, Block 1, ..., Block 97, each with 1024 threads (one point index per thread); every block computes its own min and max coordinates.]

    The vector boxes records each block’s min and max coordinates:

    [Diagram: boxes holds one entry per block (blk0, blk1, blk2, ..., blkn), each storing two xyz triples: the block's min and max.]

    Based on the min and max coordinates, a point can quickly determine whether a block contains potential nearest neighbors.

    Specifically, if the distance from a point to a block's box (defined by its min and max) is larger than reject, the block is not searched for possible nearest neighbors.

    for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){
        MinMax box = boxes[b];
        float dist = distBoxPoint(box, point);
        // Skip blocks whose box is farther away than reject or than the current 3rd-best distance.
        if (dist > reject || dist > best[2])
            continue;
    

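    distBoxPoint is essentially the standard squared distance from a point to an axis-aligned box; a Python sketch of that idea (the function name and list-based box representation are mine):

    def dist_box_point(box_min, box_max, p):
        # Squared distance from p to the axis-aligned box [box_min, box_max];
        # zero if p lies inside the box along every axis.
        d2 = 0.0
        for lo, hi, x in zip(box_min, box_max, p):
            if x < lo:
                d2 += (lo - x) ** 2
            elif x > hi:
                d2 += (x - hi) ** 2
        return d2
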
    The reject value is the distance to the 3rd-best neighbor found in an initial pass over the sorted sequence (the ±3 positions around each point):

    int idx = cg::this_grid().thread_rank();	// global thread index = position in the sorted order
    float3 point = points[indices[idx]];	// indices is the Morton-sorted index array
    float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX };	// 3 smallest squared distances found so far

    // Seed the search with the 3 neighbors on each side in Morton order.
    for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++){
        if (i == idx)
            continue;
        updateKBest<3>(point, points[indices[i]], best);
    }
    

    In this way, a large share of the points is filtered out, which saves computation.

  4. Iterate over every point in the remaining candidate blocks

    for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){
        ...	// box rejection test from step 3
        for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++){
            if (i == idx)
                continue;
            updateKBest<3>(point, points[indices[i]], best);
        }
    }
    dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f;	// mean of the 3 smallest squared distances
    
  5. Compute the squared distance between two points and keep the K smallest:

    template<int K>
    __device__ void updateKBest(const float3& ref, const float3& point, float* knn)
    {
        float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };	// diff
        float dist = d.x * d.x + d.y * d.y + d.z * d.z;	// squared distance
        for (int j = 0; j < K; j++)	// knn[] is kept sorted in ascending order
        {
            if (knn[j] > dist)	// closer than the current j-th best
            {
                float t = knn[j];	// remember the previous j-th best
                knn[j] = dist;	// insert the new distance here
                dist = t;	// the displaced value competes for the remaining slots
            }
        }
    }
    

Test

(2024-03-06)

Ubuntu 20.04, 1050Ti, cuda-11.6 (nvcc -V)

  1. Clone: git clone https://gitlab.inria.fr/bkerbl/simple-knn.git

  2. Environment: conda env create -f environment.yml

    name: PCRefine
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - python=3.10
      - pip
      - pip:
        - torch==1.12.1+cu116
        - torchvision==0.13.1+cu116
        - --extra-index-url https://download.pytorch.org/whl/cu116
        - ipykernel
        - plyfile
    
  3. Compile based on the setup.py: pip install .

  4. Call from python:

    import numpy as np
    import torch
    from simple_knn._C import distCUDA2
    from utils import fetchPly
    
    pcd = fetchPly("/home/yi/Downloads/nerf/data/nerf_synthetic/lego/points3d.ply")
    meanDist = distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda())
    

    The output is the mean (squared) distance from each point to its 3 nearest neighbors:

    (PCRefine) yi@yi:~/Downloads/simple-knn-comments$ python test_knn.py 
    tensor([0.0013, 0.0020, 0.0008,  ..., 0.0023, 0.0017, 0.0007], device='cuda:0')
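
    For a small subset, this output can be sanity-checked with a brute-force torch computation (a sketch reusing pcd from the script above; the 4096-point subset and the helper name are mine, and the O(N²) distance matrix only fits for a few thousand points):

    import numpy as np
    import torch

    def mean_knn_dist_bruteforce(pts, k=3):
        # pts: (N, 3) float tensor; mean squared distance from each point to its k nearest neighbors
        d2 = torch.cdist(pts, pts).pow(2)             # (N, N) pairwise squared distances
        d2.fill_diagonal_(float('inf'))               # exclude the point itself
        knn_d2, _ = d2.topk(k, largest=False, dim=1)  # k smallest per row
        return knn_d2.mean(dim=1)

    pts = torch.from_numpy(np.asarray(pcd.points)).float().cuda()[:4096]
    print(mean_knn_dist_bruteforce(pts))   # should closely match distCUDA2(pts) on the same subset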
    

Indices

(2024-03-08)

Modify the original code to return the neighbors’ indices:

(casmvsnet_pl) yi@yi:~/Downloads/CasMVSNet_pl-comments/submodules/simple-knn-comments$ python test_knn.py 
tensor([[41057, 23085, 16403],
        [21674, 59181, 31901],
        [18699, 99481, 15716],
        ...,
        [17352, 51604, 48154],
        [90929, 45350, 94932],
        [ 6797, 62182, 78410]], device='cuda:0', dtype=torch.int32)
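
The gist of the change is to carry the neighbor index through the same insertion logic as the distance. A Python sketch of the idea (the names are mine, not the actual CUDA patch):

FLT_MAX = float('inf')

def update_k_best_with_idx(ref, point, point_idx, knn_d, knn_i):
    # Same insertion as updateKBest, but knn_i is kept in lockstep with knn_d.
    d = [point[c] - ref[c] for c in range(3)]
    dist = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    idx = point_idx
    for j in range(len(knn_d)):
        if knn_d[j] > dist:
            knn_d[j], dist = dist, knn_d[j]  # insert the distance, displace the old one
            knn_i[j], idx = idx, knn_i[j]    # move the matching index along with it

# Per query point: start from knn_d = [FLT_MAX] * 3, knn_i = [-1] * 3, call the
# function for every candidate neighbor, then write knn_i to the output.

In the CUDA kernel the same idea would mean an int array next to best[3] and an extra output buffer for the indices.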

pytorch3d

(2024-03-12)

Set up environment to compile pytorch3d:

conda create -n casmvsnet_pl python=3.8
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

pip install git+https://github.com/facebookresearch/pytorch3d.git
pip install -r requirements.txt

Example from SC-GS

nn_dist, nn_idxs, _ = pytorch3d.ops.knn_points(init_pcl[None], init_pcl[None], None, None, K=K+1)

Use pytorch3d to find KNN in the lego point cloud:

import numpy as np
from plyfile import PlyData
from typing import NamedTuple
import torch
import pytorch3d.ops

class BasicPointCloud(NamedTuple):
    points : np.array
    colors : np.array
    normals : np.array

# Code from: https://github.com/graphdeco-inria/gaussian-splatting
def fetchPly(path):
    plydata = PlyData.read(path)
    vertices = plydata['vertex']    # coordinates (x,y,z), normal (nx,ny,nz), r,g,b of 100K points
    positions = np.vstack([vertices['x'], vertices['y'], vertices['z']]).T  # xyz: (3, 100K) -> (100K,3)
    colors = np.vstack([vertices['red'], vertices['green'], vertices['blue']]).T / 255.0    # rgb: (3, 100K) -> (100K, 3)
    normals = np.vstack([vertices['nx'], vertices['ny'], vertices['nz']]).T # normal xyz: (100K,3)
    return BasicPointCloud(points=positions, colors=colors, normals=normals)    # NamedTuple

pcd = fetchPly("/home/yi/Downloads/nerf/data/nerf_synthetic/lego/points3d.ply")
init_pcl = torch.from_numpy(np.asarray(pcd.points)).float().cuda()
K=3
nn_dist, nn_idxs, _ = pytorch3d.ops.knn_points(init_pcl[None], init_pcl[None], None, None, K=K+1)  # Both are (1,100k,4)

print(nn_dist[0][:3])
print(nn_idxs[0][:3])

pytorch3d’s knn_points returns the nearest-neighbor distances and indices at the same time, and the results match the CUDA implementation above (column 0 is the query point itself, so the neighbor indices start from column 1):

tensor([[0.0000, 0.0008, 0.0016, 0.0017],
        [0.0000, 0.0012, 0.0021, 0.0026],
        [0.0000, 0.0006, 0.0008, 0.0009]], device='cuda:0')
tensor([[    0, 41057, 23085, 16403],
        [    1, 21674, 59181, 31901],
        [    2, 18699, 99481, 15716]], device='cuda:0')
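
A quick programmatic check of that match (a sketch; cuda_idxs is my placeholder for the (N, 3) int32 index tensor returned by the modified simple-knn above):

p3d_idxs = nn_idxs[0, :, 1:].to(torch.int32)   # drop column 0, the query point itself
print(torch.equal(p3d_idxs, cuda_idxs))        # True if both report the same 3 neighbors in the same order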

I also tested the point cloud predicted by casmvsnet_pl, but this time pytorch3d was much slower than the CUDA code (about 1 hour). Don't know why.

Results are the same:
scan1 contains 9.03 M points
# cuda result nn indices:
tensor([[2654648,  181848, 2654649],
        [1825888, 2654649,  182085],
        [1825889, 2025037,  182085],
        ...,
        [1788118, 5669911, 1996297],
        [2195795, 2814578, 1568896],
        [2814579, 2597015, 2195796]], device='cuda:0', dtype=torch.int32)

# pytorch3d result nn indices: (1, 9030200, 4)
tensor([[[      0, 2654648,  181848, 2654649],
         [      1, 1825888, 2654649,  182085],
         [      2, 1825889, 2025037,  182085],
         ...,
         [9030197, 1788118, 5669911, 1996297],
         [9030198, 2195795, 2814578, 1568896],
         [9030199, 2814579, 2597015, 2195796]]], device='cuda:0')

print(nn_dist)
tensor([[[0.0000, 0.0137, 0.0167, 0.3456],
         [0.0000, 0.2516, 0.3126, 0.3652],
         [0.0000, 0.0262, 0.0991, 0.2446],
         ...,
         [0.0000, 0.0775, 0.0991, 0.1228],
         [0.0000, 0.0577, 0.0749, 0.0777],
         [0.0000, 0.0449, 0.1124, 0.1350]]], device='cuda:0')

In simple-knn, the distance dist is a squared distance, and pytorch3d's knn_points also returns squared distances by default, which is why the two outputs are directly comparable:

float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };
float dist = d.x * d.x + d.y * d.y + d.z * d.z;
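
So the mean reported by distCUDA2 can be reproduced from the pytorch3d result by averaging the three non-self squared distances (reusing nn_dist from the lego example above):

meanDist_p3d = nn_dist[0, :, 1:].mean(dim=1)   # drop column 0 (distance to itself), average the 3 NN
print(meanDist_p3d)   # e.g. tensor([0.0013, 0.0020, 0.0008, ...]), matching the distCUDA2 output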