
Cuda acceleration #112

Open
andre-nguyen opened this issue Mar 10, 2016 · 13 comments · May be fixed by #257

Comments

@andre-nguyen

Hi @ahornung

I saw issue #29 and wasn't interested in the GPU-voxel approach. It is clear that many ROS applications use OctoMap as a standard, and we would benefit from parallelizing OctoMap itself. The advent of embedded GPUs such as the NVIDIA TK1 and TX1 makes this much more interesting for mobile robotics.

I would like to develop this slowly and incrementally by speeding up small parts of the code.

How feasible do you think this is, and do you have any pointers on where to start?

@ahornung
Member

Great to hear that you're interested in improving the performance! That definitely sounds feasible, and taking care of individual parts incrementally is probably the best way forward.

The critical functions would be computeUpdate(...) in OccupancyOcTreeBase and computeRayKeys(...) in OcTreeBaseImpl. You'll find that there are already conditional OpenMP parallelizations in place; these could give you some hints for a start.
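
Very roughly, the pattern of those sections looks something like the sketch below. This is only an illustrative, self-contained example, not the actual OctoMap code; traceRay() here is just a placeholder for the real per-ray work (computeRayKeys() plus collecting the keys):

    // Illustrative sketch only -- not OctoMap code. The pragma splits the
    // per-ray work across threads; without OpenMP it is ignored and the loop
    // simply runs serially.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Point3 { float x, y, z; };

    // Placeholder for the real per-ray work (ray traversal / key computation).
    static int traceRay(const Point3& origin, const Point3& end) {
        return static_cast<int>(end.x - origin.x) + static_cast<int>(end.y - origin.y);
    }

    int main() {
        const Point3 origin{0.0f, 0.0f, 0.0f};
        std::vector<Point3> scan(100000);
        for (std::size_t i = 0; i < scan.size(); ++i)
            scan[i] = Point3{float(i % 50), float(i % 30), float(i % 10)};

        long totalCells = 0;
        // Each thread handles a chunk of rays; the reduction avoids a shared counter.
        #pragma omp parallel for reduction(+:totalCells) schedule(dynamic, 256)
        for (long i = 0; i < static_cast<long>(scan.size()); ++i)
            totalCells += traceRay(origin, scan[i]);

        std::printf("traversed %ld cells in total\n", totalCells);
        return 0;
    }

The harder part in practice is the shared tree update afterwards, which would still need synchronization; the per-ray traversal is the natural place to parallelize first.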

@andre-nguyen
Author

Thanks, time for me to learn CUDA then :D

@ahornung
Member

Just in case you're generally looking for speedups and are not yet committed to CUDA: it's probably worth having a look at SIMD intrinsics (SSE) as well. These changes could be less intrusive than switching certain parts to CUDA.
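
Just to give a flavour of what that looks like, here is a tiny self-contained example of SSE intrinsics; it only illustrates the technique and has nothing to do with OctoMap internals:

    // Minimal illustration of SSE intrinsics: adding the same offset to four
    // floats with a single instruction.
    #include <immintrin.h>
    #include <cstdio>

    int main() {
        alignas(16) float xs[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
        alignas(16) float out[4];

        __m128 x = _mm_load_ps(xs);           // load 4 floats (16-byte aligned)
        __m128 t = _mm_set1_ps(0.5f);         // broadcast the offset
        _mm_store_ps(out, _mm_add_ps(x, t));  // 4 additions at once

        std::printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }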

@andre-nguyen
Author

Thanks for the tip, and sorry for the late response. Unfortunately I only recently received my hardware, but SSE would certainly be interesting; that way I could work from home without needing the TK1.

Please don't count on this too much, though; if it is ever ready, it will be toward the end of the summer.

@gsp-27

gsp-27 commented Aug 19, 2016

Hi, can you point me to some resources that would help me understand octrees more intuitively? I understand segment trees and am also familiar with lazy updates in 1D segment trees. Octrees are the 3-dimensional analogue of segment trees, but it is difficult for me to imagine lazy updates in them. I would like to contribute to this, and I am asking because I also plan on parallelising it, if that is even possible. Your help would be greatly appreciated.

@ahornung
Member

The best documentation will be Wikipedia, the OctoMap AuRo journal paper, and the code, in order of increasing depth on the topic.
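
For a rough intuition in the meantime: an octree node is simply a node with eight children, one per octant of its cube, much like a segment-tree node has two children, one per half of its interval, and children only exist where the space has actually been refined. A minimal conceptual sketch (this is not OctoMap's actual node class):

    // Conceptual sketch only -- not OctoMap's OcTreeNode. Each node covers a
    // cube and splits it into 8 child octants; children are created lazily.
    #include <array>
    #include <memory>

    struct SketchOctreeNode {
        float occupancyLogOdds = 0.0f;                            // per-node payload
        std::array<std::unique_ptr<SketchOctreeNode>, 8> child;   // one per octant

        bool isLeaf() const {
            for (const auto& c : child)
                if (c) return false;
            return true;
        }
    };

    int main() {
        SketchOctreeNode root;                                    // covers the whole mapped cube
        root.child[3] = std::make_unique<SketchOctreeNode>();     // refine only octant 3
        root.child[3]->occupancyLogOdds = 0.85f;
        return root.isLeaf() ? 1 : 0;
    }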

@dblanm

dblanm commented Jun 19, 2017

Hi @andre-nguyen ,

How is the CUDA implementation for OctoMap going? I am also planning to add CUDA support to OctoMap; maybe I could try to help you.

@gsp-27

gsp-27 commented Jun 19, 2017 via email

@andre-nguyen
Author

@dblanm @gsp-27 As with many projects, other tasks got out of hand and I didn't have time to get to this 😭 😭 😭

@sbaktha

sbaktha commented Jul 22, 2019

Hi, is there any update on the status of the CUDA implementation?

@saifullah3396

saifullah3396 commented Oct 15, 2019

Hi @ahornung, I have developed a CUDA-based replacement for computeUpdate() and computeRayKeys(). Could you please look at my fork https://github.com/saifullah3396/octomap and tell me whether it's good enough for a pull request? For now it does not conflict with the basic implementation. I'd really like further development on this to be done in this repository. The implementation can be tested by building the cuda-devel branch (add the CMake parameter -D__CUDA_SUPPORT__=ON) and running graph2tree as follows:
../bin/graph2tree -i ../octomap/share/data/spherical_scan.graph -o out.bt
I am still facing a few issues regarding speed. Right now a lot of data has to be copied to the GPU before each scan update. Maybe it would be better to copy the tree to the GPU once and then keep using it there, or to create the tree directly on the GPU? In any case, copying the tree to the GPU takes a lot of time.
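
To illustrate what I mean by copying once and keeping it resident: the general pattern I have in mind is allocating the device buffers once and only uploading the new scan each update, roughly like the sketch below (a generic example with made-up names, not the code in my fork):

    // Generic sketch of keeping device buffers resident across scans so that
    // each update only uploads the new points.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void processPoints(const float3* points, int n, unsigned int* processed) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // Placeholder for the real per-ray work (ray traversal / key computation).
        if (points[i].z > -1.0e6f)
            atomicAdd(processed, 1u);
    }

    int main() {
        const int maxPoints = 1 << 20;

        // Allocated once and reused for every scan, instead of once per update.
        float3* dPoints = nullptr;
        unsigned int* dProcessed = nullptr;
        cudaMalloc(&dPoints, maxPoints * sizeof(float3));
        cudaMalloc(&dProcessed, sizeof(unsigned int));

        std::vector<float3> scan(100000, float3{1.0f, 2.0f, 3.0f});
        for (int frame = 0; frame < 3; ++frame) {    // pretend scans arrive over time
            cudaMemset(dProcessed, 0, sizeof(unsigned int));
            cudaMemcpy(dPoints, scan.data(), scan.size() * sizeof(float3),
                       cudaMemcpyHostToDevice);      // only the new scan is copied
            int threads = 256;
            int blocks = (static_cast<int>(scan.size()) + threads - 1) / threads;
            processPoints<<<blocks, threads>>>(dPoints, static_cast<int>(scan.size()), dProcessed);
            unsigned int processed = 0;
            cudaMemcpy(&processed, dProcessed, sizeof(processed), cudaMemcpyDeviceToHost);
            std::printf("frame %d processed %u points\n", frame, processed);
        }

        cudaFree(dPoints);
        cudaFree(dProcessed);
        return 0;
    }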

@ahornung
Member

Thanks for your contribution @saifullah3396, that sounds really useful!

Do you have a first indication about processing times, ideally on the same benchmark data as used in the paper?

Unfortunately, I won't have time for an in-depth review, so best would be a cleaned up pull request that can be iteratively discussed and improved by the community.

@saifullah3396

@ahornung Well, in basic usage the current implementation is definitely faster, but before I produce results on the benchmark data I will work on the implementation a bit more to make it even faster. It might take me some time to add a CUDA-based hash map, but it will definitely increase performance. I will share the benchmark results once I'm finished and then send a PR! :)
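
For the hash map, the usual GPU-side building block is open addressing with atomicCAS on the keys; a minimal sketch of that general idea follows (just the insert path, not a design for the final implementation):

    // Minimal sketch of a GPU hash-set insert: open addressing with linear
    // probing and atomicCAS on 64-bit keys. A real voxel-key map would also
    // need values, deletion and resizing on top of this.
    #include <cuda_runtime.h>
    #include <cstdio>

    constexpr unsigned long long kEmpty = 0xFFFFFFFFFFFFFFFFull;

    __device__ unsigned int hashKey(unsigned long long k, unsigned int capacity) {
        k ^= k >> 33; k *= 0xff51afd7ed558ccdull; k ^= k >> 33;  // mix the bits
        return static_cast<unsigned int>(k) & (capacity - 1);    // capacity is a power of two
    }

    __global__ void insertKeys(unsigned long long* table, unsigned int capacity,
                               const unsigned long long* keys, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        unsigned long long key = keys[i];
        unsigned int slot = hashKey(key, capacity);
        while (true) {
            unsigned long long prev = atomicCAS(&table[slot], kEmpty, key);
            if (prev == kEmpty || prev == key) return;  // inserted, or already present
            slot = (slot + 1) & (capacity - 1);         // linear probing
        }
    }

    int main() {
        const unsigned int capacity = 1 << 20;          // power of two for the masking above
        unsigned long long* dTable = nullptr;
        cudaMalloc(&dTable, capacity * sizeof(unsigned long long));
        cudaMemset(dTable, 0xFF, capacity * sizeof(unsigned long long));  // all slots empty

        unsigned long long hKeys[4] = {42ull, 7ull, 42ull, 123456789ull};
        unsigned long long* dKeys = nullptr;
        cudaMalloc(&dKeys, sizeof(hKeys));
        cudaMemcpy(dKeys, hKeys, sizeof(hKeys), cudaMemcpyHostToDevice);

        insertKeys<<<1, 4>>>(dTable, capacity, dKeys, 4);
        cudaDeviceSynchronize();
        std::printf("inserted sample keys\n");

        cudaFree(dKeys);
        cudaFree(dTable);
        return 0;
    }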

@saifullah3396 linked a pull request on Oct 25, 2019 that will close this issue