Open3D
(2024-05-13)
Just found Open3D also can find nearest neighbors: 5-Step Guide to generate 3D meshes from point clouds with Python - Medium
|
|
CUDA Impl
(2024-03-05)
Example from simple-knn:
Use thread block to divide the sorted points
Given 100K points, using only 1D grid and 1D block which length is 1024 (SO), the number of blocks is $\frac{100K + 1024-1}{1024} = 97$
- How many threads can a Nvidia GPU launch?, found by DDG
Therefore, the kernel launch configuration parameters are <<<97, 1024>>>
-
Convert 3D cartesian coordinates to Morton code
-
Make up key-value pairs, where Morton code is key and points indices are values.
Such that, the points (indices) are sorted and similar points will gather together.
1 2 3 4 5 6 7 8thrust::device_vector<uint32_t> indices(P); // indices for points thrust::sequence(indices.begin(), indices.end()); // indices: 0, 1, 2, 3. ... P thrust::device_vector<uint32_t> indices_sorted(P); // the sorted indices cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P); temp_storage.resize(temp_storage_bytes); cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);Once the points are spread onto blocks, points within a block are similar to each other.
-
Find the min and max coordinates combinations in each block
1boxMinMax << <num_boxes, BOX_SIZE >> > (P, points, indices_sorted.data().get(), boxes.data().get());Each thread has a point index. The indices
indices_sortedof points are already sorted based on the Morton (z-order) code. So points within a block are similar to each other, and their distances are supposed to be small.The vector
boxesrecords each block’s min and max coordinates:Based on the min and max coordinates, a point can quickly determine whether a block contains potential nearest neighbors.
Specifically, if the distance of a point to the min or max of a block is larger than
reject, the block won’t be searched for possible nearest neighbors.1 2 3 4 5for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){ MinMax box = boxes[b]; float dist = distBoxPoint(box, point); if (dist > reject || dist > best[2]) continue;The
rejectvalue is the distance of the 3-rd nearest neighbor based on the initially sorted points sequence.1 2 3 4 5 6 7 8 9int idx = cg::this_grid().thread_rank(); float3 point = points[indices[idx]]; float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX }; for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++){ if (i == idx) continue; updateKBest<3>(point, points[indices[i]], best); }In this way, quiet amount of points are filtered out to save computation.
-
Iterate every points in the potential block
1 2 3 4 5 6 7 8 9for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++){ ... for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++){ if (i == idx) continue; updateKBest<3>(point, points[indices[i]], best); } } dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f; -
Compute distance every two points:
1 2 3 4 5 6 7 8 9 10 11 12 13 14__device__ void updateKBest(const float3& ref, const float3& point, float* knn) { float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z }; // diff float dist = d.x * d.x + d.y * d.y + d.z * d.z; // sequare error for (int j = 0; j < K; j++) // K-nearest { if (knn[j] > dist) // closer than j { float t = knn[j]; // tmp for previous j knn[j] = dist; // replace the previous j dist = t; // the previous j is compared with remaining neighbors } } }
Test
(2024-03-06)
Ubuntu 20.04, 1050Ti, cuda-11.6 (nvcc -V)
-
Clone:
git clone https://gitlab.inria.fr/bkerbl/simple-knn.git -
Environment:
conda env create -f environment.yml1 2 3 4 5 6 7 8 9 10 11 12 13 14name: PCRefine channels: - pytorch - conda-forge - defaults dependencies: - python=3.10 - pip - pip: - torch==1.12.1+cu116 - torchvision==0.13.1+cu116 - --extra-index-url https://download.pytorch.org/whl/cu116 - ipykernel - plyfile -
Compile based on the setup.py:
pip install . -
Call from python:
1 2 3 4 5 6 7import numpy as np import torch from simple_knn._C import distCUDA2 from utils import fetchPly pcd = fetchPly("/home/yi/Downloads/nerf/data/nerf_synthetic/lego/points3d.ply") meanDist = distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda())Output the mean distance from neighbors to each point:
1 2(PCRefine) yi@yi:~/Downloads/simple-knn-comments$ python test_knn.py tensor([0.0013, 0.0020, 0.0008, ..., 0.0023, 0.0017, 0.0007], device='cuda:0')
Indices
(2024-03-08)
Modify the original code to return the neighbors’ indices:
|
|
pytorch3d
(2024-03-12)
Set up environment to compile pytorch3d:
|
|
Example from SC-GS
|
|
Use pytorch3d to find KNN in the lego point cloud:
|
|
pytorch3d’s function can return nearest neighbors distance and indices at the same time. And the results are the same as the above cuda implementation:
|
|
I also tested the point cloud predicted by casmvsnet_pl, but this time the pytorch3d is so much slower than the cuda code (1h). Dont’t know why.
Results are the same:
|
|
In simple-knn, the distance dist is a square error:
|
|