ANN bench options to specify CAGRA graph and dataset locations #1896
There is a request to provide an alternative method for allocation (without `mmap`) so that we can use Transparent Huge Pages. I will submit a follow-up PR with these changes. I am thinking about the following design: we would have a single custom `mr` class that we would configure according to the benchmark input parameter. Let me know what you think.
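A rough, hypothetical sketch of that design (not the snippet from the original comment) could look like the following: one allocator class whose behaviour is picked once from the benchmark parameter. The names `bench_memory_resource` and `bench_alloc_type` are invented for illustration, and the RMM plumbing (deriving from an RMM base class, stream arguments) is left out:

```cpp
#include <cuda_runtime_api.h>
#include <sys/mman.h>

#include <cstddef>
#include <new>

// Which kind of memory the benchmark asked for.
enum class bench_alloc_type { device, host_pinned, host_huge_page };

// One allocator, configured once from the benchmark input parameter.
struct bench_memory_resource {
  explicit bench_memory_resource(bench_alloc_type t) : type_{t} {}

  void* allocate(std::size_t bytes) const
  {
    void* ptr = nullptr;
    switch (type_) {
      case bench_alloc_type::device: cudaMalloc(&ptr, bytes); break;
      case bench_alloc_type::host_pinned: cudaMallocHost(&ptr, bytes); break;
      case bench_alloc_type::host_huge_page:
        // Anonymous mapping (no file involved), with Transparent Huge Pages requested.
        ptr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ptr == MAP_FAILED) { throw std::bad_alloc{}; }
        madvise(ptr, bytes, MADV_HUGEPAGE);
        break;
    }
    return ptr;
  }

  void deallocate(void* ptr, std::size_t bytes) const
  {
    switch (type_) {
      case bench_alloc_type::device: cudaFree(ptr); break;
      case bench_alloc_type::host_pinned: cudaFreeHost(ptr); break;
      case bench_alloc_type::host_huge_page: munmap(ptr, bytes); break;
    }
  }

 private:
  bench_alloc_type type_;
};
```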
Hmm, I'll have to think about this. I don't think I generally mind the approach of having a `composite_memory_resource` or `delegating_memory_resource`, but I think we should continue using RMM's abstraction layers everywhere possible instead of, for example, calling `cudaMallocHost` directly. My concern is a pretty easy fix, though, by just storing the corresponding `memory_resource` instances inside the `delegating_memory_resource` and delegating as necessary. That would maintain the compatibility w/ RMM (this could, for example, still be set as the `current_device_resource` if we so wanted).
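A minimal sketch of that suggestion, assuming only the `do_allocate`/`do_deallocate` virtuals of `rmm::mr::device_memory_resource` (any additional virtuals required by a given RMM version are omitted); the class name and the single boolean switch are placeholders:

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/host/pinned_memory_resource.hpp>

#include <cstddef>

class delegating_memory_resource : public rmm::mr::device_memory_resource {
 public:
  explicit delegating_memory_resource(bool use_pinned_host) : use_pinned_host_{use_pinned_host} {}

 private:
  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    // Delegate to the stored RMM resources instead of calling cudaMallocHost directly.
    return use_pinned_host_ ? pinned_mr_.allocate(bytes) : device_mr_.allocate(bytes, stream);
  }

  void do_deallocate(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    if (use_pinned_host_) {
      pinned_mr_.deallocate(ptr, bytes);
    } else {
      device_mr_.deallocate(ptr, bytes, stream);
    }
  }

  bool use_pinned_host_;
  rmm::mr::pinned_memory_resource pinned_mr_;  // host-accessible (pinned) memory
  rmm::mr::cuda_memory_resource device_mr_;    // plain device memory
};
```

Note that serving pinned host allocations through a `device_memory_resource` is exactly the mixing of memory kinds that a later comment in this thread pushes back on.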
Also cc'ing @harrism for his thoughts.
The class above is a subclass of `rmm::mr::device_memory_resource`. We are using it to allocate `mdarray`s that are accessible from both host and device. It is a workaround to unblock some tests.

I feel that RAFT's distinction between `host_mdarray` and `device_mdarray` stands in the way. What our algorithm needs is an `mdarray` which we can configure with any allocator (host or device). In a few places our algorithm does not care whether the input is a `host_mdarray` or a `device_mdarray`; it just passes the pointer to `cudaMemcpy` to fetch a chunk of the array we are working on.

How we allocate an `mdarray` is described by a `ContainerPolicy` class. So probably the only thing we need is to implement a container policy that could be initialized by both `device_memory_resource` and `host_memory_resource`. Afterwards we can simply utilize existing RMM memory resources. The only thing we are missing from the RMM side is a memory resource that uses `mmap` and `madvise`.
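Glossing over the exact `mdarray` `ContainerPolicy` interface, the core of that idea might look like the following hypothetical policy, which can be constructed from either kind of RMM resource and allocates through whichever one it holds (the name `any_mr_policy` is made up for this sketch):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/host/host_memory_resource.hpp>

#include <cstddef>
#include <variant>

class any_mr_policy {
 public:
  explicit any_mr_policy(rmm::mr::device_memory_resource* mr) : mr_{mr} {}
  explicit any_mr_policy(rmm::mr::host_memory_resource* mr) : mr_{mr} {}

  void* allocate(std::size_t bytes, rmm::cuda_stream_view stream) const
  {
    // Allocate through whichever RMM resource the policy was initialized with.
    if (std::holds_alternative<rmm::mr::device_memory_resource*>(mr_)) {
      return std::get<rmm::mr::device_memory_resource*>(mr_)->allocate(bytes, stream);
    }
    return std::get<rmm::mr::host_memory_resource*>(mr_)->allocate(bytes);
  }

  void deallocate(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream) const
  {
    if (std::holds_alternative<rmm::mr::device_memory_resource*>(mr_)) {
      std::get<rmm::mr::device_memory_resource*>(mr_)->deallocate(ptr, bytes, stream);
    } else {
      std::get<rmm::mr::host_memory_resource*>(mr_)->deallocate(ptr, bytes);
    }
  }

 private:
  std::variant<rmm::mr::device_memory_resource*, rmm::mr::host_memory_resource*> mr_;
};
```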
Look, it's fine to provide your own base-level `memory_resource` classes that derive from RMM resource base classes. The problem today is that there is no common base for host and device MRs. We are starting to refactor on top of `cuda::memory_resource`, which abstracts memory kind as a template parameter, and adds `resource_ref` and `async_resource_ref`, which have template parameters to specify memory kind. This means you can have a function that accepts any memory resource that is device accessible, host accessible, or both, or whatever kind you can define. And we can start to reuse all the MR and adaptor types in RMM for different kinds of memory.

The problem with @tfeher's suggestion at the top is that it appears to mix host- and device-accessible memory kinds under `device_memory_resource`, which is supposed to allocate only device-accessible memory.

As to @cjnolet's comment, I agree that you should have separate resources for different kinds of allocation, and then have a higher-level MR that takes pointers/refs to MRs for each kind you want to dispatch to, rather than hard-coding different allocation calls within a single MR.
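To make the property-based dispatch concrete, here is a small sketch assuming libcu++'s experimental `<cuda/memory_resource>` header (the exact spelling, namespaces, and opt-in macro can differ between versions); `fill_scratch` is a made-up function name:

```cpp
// Depending on the libcu++ version, the experimental machinery may need
// LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE to be defined before this include.
#include <cuda/memory_resource>

#include <cstddef>

// Accepts *any* resource that advertises device accessibility: plain device memory,
// pinned host memory, managed memory, and so on.
void fill_scratch(cuda::mr::resource_ref<cuda::mr::device_accessible> mr, std::size_t bytes)
{
  void* scratch = mr.allocate(bytes, alignof(std::max_align_t));
  // ... launch kernels that read/write `scratch` ...
  mr.deallocate(scratch, bytes, alignof(std::max_align_t));
}

// A resource that must be both host- and device-accessible (e.g. pinned memory)
// is expressed by listing both properties.
using pinned_like_ref =
  cuda::mr::resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>;
```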
It would be nice to see these new memory resources worked back into RMM eventually.
RMM has a pinned memory resource, but it is a subclass of `host_memory_resource`. The problem is on our side: our API for managing allocators for mdspan requires a `device_memory_resource`. We would need to enable a host allocator there, and then we could use `rmm::mr::host_memory_resource`.
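As a tiny illustration of that mismatch (assuming the current RMM class hierarchy):

```cpp
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/host/pinned_memory_resource.hpp>

rmm::mr::pinned_memory_resource pinned_mr;  // derives from rmm::mr::host_memory_resource

// Would not compile: host_memory_resource and device_memory_resource share no common base,
// so a pinned (host-side) resource cannot be passed where the mdarray allocator API expects
// a device_memory_resource*.
// rmm::mr::device_memory_resource* mr = &pinned_mr;
```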
I have a plan to add a device-accessible pinned memory resource, ideally using the new `cuda::memory_resource` refactoring.