-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved concurrency for multi-threaded indexing #236
Conversation
Great contribution! Let me check this against valgrind and do some minimal benchmarking before merging. |
Happy indeed for a second opinion on this. |
I updated nanoflann benchmark to make some quick benchmarking with this PR. The results are good but, as usual, the optimal approach depends on the data distribution and dataset size: Build index profiling for the KITTI 04 sequence pointclouds:
Some graphs: For 4 threads vs 1 thread For 8 threads vs 1 thread The advantage of this PR is clearer for 4 threads than for 8 threads... for this particular pointcloud. Anyway, I feel positive about the logic of the change, and valgrind (including helgrind) is happy, so I'm merging it... |
Interesting results. Modest speedup for these (small?) point clouds. |
Yes, they are small in comparison. The horizontal axis is number of points... But the trend is clear in that multi threading is better for larger clouds. |
I'm a bit intrigued about four threads being faster than eight threads. |
@nigels-com The exact place to force a fixed number of threads in our benchmark is here: feel free of using that in your project/dataset just to see what happens... My guess is that results may depend a lot depending on how "random" the distribution of consecutive points are. In 3D LIDARs (as KITTI), point coordinates have a strong correlation with their neighbors in the sequence. |
Thanks for the details. Our points are probably less coherent because the Hovermap lidar is both spinning and moving through space. |
Concurrent tree building is faster on my 12 core laptop for ~130M point clouds, in this arrangement.
The intuition is to avoid having threads wait for both the left and right subtrees if they are both asynchronous.
In this arrangment we first recurse to left on the same thread (depth-first to the left) with an async recursion of the right subtree concurrently.
Once the left is all visited we'll recurse to the right with the current thread.
What this is aiming to avoid is having too many threads blocked (doing no work) waiting for async processing to complete on both left and right.
I see around a 2x speedup for 12 cores and ~130M, around 30 seconds rather than 60 seconds.