
MAHOUT-1974 CUDA support #310

Open · wants to merge 14 commits into base: CUDA
Conversation

@nsakharnykh commented Apr 27, 2017

Initial PR for CUDA bindings support through JCuda

@andrewpalumbo (Member)

Tests pass on my system:

Mahout JVM Sparse multiplication time: 1914 ms.
Mahout JCuda Sparse multiplication time: 195 ms.
- sparse mmul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .2.  5 runs
Mahout JVM Sparse multiplication time: 43 ms.
Mahout JCuda Sparse multiplication time: 11 ms.
- sparse mmul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .02.  5 runs
Mahout JVM Sparse multiplication time: 2 ms.
Mahout JCuda Sparse multiplication time: 1 ms.
- sparse mmul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .002.  5 runs
UserSetCUDATestSuite:
Mahout JVM Sparse multiplication time: 45 ms.
Mahout JCuda Sparse multiplication time: 10 ms.
User Defined sparse mmul at geometry of 1000 x 1000 %*% 1000 x 1000 density = 0.02 3 runs : 10 ms
- User Defined sparse mmul at geometry of 1000 x 1000 %*% 1000 x 1000 density = 0.02 3 runs 

@andrewpalumbo (Member)

@nsakharnykh @rawkintrevo I intend to have dense hammered out on Sunday.

@andrewpalumbo (Member)

@nsakharnykh, @rawkintrevo, I ran out of time tonight to finish out dense %*% dense and dense %x% sparse; I went down a rabbit hole with the NVIDIA C API docs for cuSPARSE. I noticed that JCuda supports only a single dense-dense dgemm algorithm, with column-major matrices. Most Mahout matrices are row-major. I then started considering the dense-sparse multiplication and was slightly thrown off by what seems to be required CSR compression; it seems that sparse matrices should instead be compressed as CSC. Anyway, I ended up in the LAPACK Fortran; apologies for not finishing it up tonight, guys. I got off on a long tangent and ran out of time.
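For reference, the CSR layout in question can be shown with a tiny CPU-side sketch; this is plain Java, not the Mahout or JCuda API, and `denseToCsrIndices` is a hypothetical helper name. It builds the index arrays that a device-side dense-to-CSR conversion would produce:

```java
// Plain-Java illustration of CSR (compressed sparse row) storage.
// CSR keeps one pointer per row, so row slices are cheap; CSC is the
// mirror image, compressed per column instead.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsrSketch {
    // colInd[k] is the column of the k-th non-zero, scanned row by row;
    // rowPtr[i]..rowPtr[i+1] delimit row i's entries in that array.
    public static int[][] denseToCsrIndices(double[][] dense) {
        List<Integer> colInd = new ArrayList<>();
        int[] rowPtr = new int[dense.length + 1];
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++)
                if (dense[i][j] != 0.0) colInd.add(j);
            rowPtr[i + 1] = colInd.size();
        }
        int[] cols = colInd.stream().mapToInt(Integer::intValue).toArray();
        return new int[][] { rowPtr, cols };
    }

    public static void main(String[] args) {
        double[][] a = { {1, 0, 2}, {0, 0, 3}, {4, 5, 0} };
        int[][] csr = denseToCsrIndices(a);
        System.out.println(Arrays.toString(csr[0])); // rowPtr: [0, 2, 3, 5]
        System.out.println(Arrays.toString(csr[1])); // colInd: [0, 2, 2, 0, 1]
    }
}
```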

I pushed my beginning work up to my MAHOUT-1974 branch. Nothing really worth looking at right now, but I will make a PR against this when I get the dense work together.

Regardless, I should have at least a quick-and-dirty version ready to go soon while I work out what we'll need for experiments and benchmarking. We can still discuss and consider different Spark configurations tomorrow with our dense cases, but I'd of course like to get this right.

As I mentioned on the last call, we allow a "sparse" DRM's in-core components to be both sparse and dense. Currently the threshold for converting a DRM block from a sparse to a dense matrix is pretty high (25% non-zero estimate). In the future we will need to let the user set the sparsity threshold somehow.

FYI:
https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala#L431
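The conversion policy described above amounts to a density estimate plus a cutoff. A hedged sketch (the names `density` and `shouldDensify` are illustrative, not Mahout's API; the real logic is in the package.scala linked above):

```java
// Illustrative block-densification policy: estimate the fraction of
// non-zeros in an in-core block and densify only above a threshold
// (25% in the current Mahout code, per the comment above).
public class DensityPolicy {
    // fraction of non-zero entries in the block
    public static double density(double[][] block) {
        int nnz = 0, total = 0;
        for (double[] row : block) {
            total += row.length;
            for (double v : row) if (v != 0.0) nnz++;
        }
        return total == 0 ? 0.0 : (double) nnz / total;
    }

    public static boolean shouldDensify(double[][] block, double threshold) {
        return density(block) > threshold;
    }

    public static void main(String[] args) {
        double[][] b = { {1, 0, 0, 0}, {0, 2, 0, 0} };  // 2 of 8 entries non-zero
        System.out.println(density(b));                 // 0.25
        System.out.println(shouldDensify(b, 0.25));     // false: at, not above, the cutoff
    }
}
```

Making the threshold a user-settable parameter would then just mean plumbing `threshold` through instead of hard-coding it.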

@nsakharnykh (Author)

@andrewpalumbo Regarding column-major: yes, this is the default mode for cuBLAS; sorry, I think I didn't mention it in my original email. There are a couple of options we can exercise here:

1. We can use transposed versions of the gemm routines if the input matrices are row-major. I think the output matrix will always be column-major, so we'll have to transpose it using geam if we want to keep it in a different format.
2. We can keep the dense matrices in column-major format on the GPU and move between CSC and CSR formats for sparse matrices using cuSPARSE conversion routines like csr2csc. There are also existing API functions in cuSPARSE to convert sparse to dense (csr2dense) and the other way around (dense2csr).

I think we should use the available conversion APIs from cuSPARSE as much as possible to avoid writing this on our own.
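For the row-major case there is also the standard identity (A*B)^T = B^T * A^T: a row-major buffer handed to a column-major gemm is read as the transpose, so swapping the operands yields the product already in row-major, with no extra geam pass. A plain-Java sketch of the layout trick (no GPU; the naive `gemmColMajor` stands in for a column-major BLAS gemm such as cublasDgemm):

```java
// Demonstrates computing row-major A*B with a column-major gemm by
// swapping operands: gemm reads the row-major buffers as B^T and A^T,
// computes B^T * A^T = (A*B)^T in column-major, which is exactly A*B
// when the result buffer is read back as row-major.
import java.util.Arrays;

public class ColMajorTrick {
    // Naive column-major gemm: C(m x n) = A(m x k) * B(k x n),
    // leading dimension = number of rows, no transpose flags.
    public static double[] gemmColMajor(double[] a, double[] b, int m, int k, int n) {
        double[] c = new double[m * n];
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                for (int i = 0; i < m; i++)
                    c[i + j * m] += a[i + p * m] * b[p + j * k];
        return c;
    }

    public static void main(String[] args) {
        // A (2x3) and B (3x2), both stored row-major.
        double[] a = {1, 2, 3,
                      4, 5, 6};
        double[] b = {7,  8,
                      9,  10,
                      11, 12};
        // Column-major gemm sees b as B^T (2x3) and a as A^T (3x2);
        // the 2x2 result, read row-major, is A*B.
        double[] c = gemmColMajor(b, a, 2, 3, 2);
        System.out.println(Arrays.toString(c)); // [58.0, 64.0, 139.0, 154.0]
    }
}
```

The same operand swap works with the real device call, so only the sparse operands would need explicit CSR/CSC conversion.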

cuda/pom.xml Outdated
<parent>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout</artifactId>
<version>0.13.0-SNAPSHOT</version>
Member

needs to be 0.13.1-SNAPSHOT

Author

Done

@andrewpalumbo (Member)

@nsakharnykh I have my MAHOUT-1974 branch that is almost complete with dense, etc. (less the column-major issues). We'd discussed just making a PR against this, but it may be easiest if you just went ahead and pushed this to mahout/CUDA, and then I'll make a PR against that, which will be public so that others may comment on it.

@andrewpalumbo commented May 7, 2017

@nsakharnykh https://github.com/andrewpalumbo/mahout/tree/MAHOUT-1974/cuda ^^
P.S. This is still WIP, so there's a lot of garbage in it.

@nsakharnykh (Author)

@andrewpalumbo Ok, sounds good. I'll try to push what I have as soon as I have some time in front of my laptop. I'm currently at GTC so my schedule is a bit fragmented.

@andrewpalumbo (Member)

Great, thanks. I figured you were there and very busy. I'll keep working on my end, and there should be no (or few) conflicts. No rush, since my branch is based off of yours.

@rawkintrevo (Contributor)

looking awesome @nsakharnykh @andrewpalumbo

Before merging, don't forget to fill out
https://github.com/apache/mahout/blob/master/website/docs/native-solvers/cuda.md

@andrewpalumbo (Member)

@rawkintrevo I asked @nsakharnykh to just go ahead and push this to the mahout/CUDA branch, since he's already up at GTC and has spotty time to do this, and we're pushing this through as quickly as possible. I will immediately open up a [WIP] PR from my https://github.com/andrewpalumbo/mahout/tree/MAHOUT-1974/cuda branch (on top of his) and will fill out the md from there.

asfgit pushed a commit that referenced this pull request May 8, 2017
@balashashanka (Contributor)

Just checking whether we need to keep this PR open; I'm guessing this has already been merged into the feature branch: https://github.com/apache/mahout/tree/CUDA
