
[FEA] Add host_buffer class #260

Open
jrhemstad opened this issue Jan 28, 2020 · 24 comments
Labels
0 - Backlog (In queue waiting for assignment) · feature request (New feature or request)

Comments

@jrhemstad (Contributor) commented Jan 28, 2020

Is your feature request related to a problem? Please describe.

It has come up in several independent conversations with @jakirkham and others that it would be nice for RMM to provide a counterpart to device_buffer for host-allocated memory. See the conversations in #141 and #216.

In short, it would be convenient to have a common abstraction for host memory allocations used by RAPIDS libraries. This would enable things like pinned host memory allocations for faster device-to-host memory spilling.

Describe the solution you'd like

Add an rmm::host_buffer class to act as the host memory counterpart to device_buffer, i.e., an untyped, uninitialized host memory allocation.
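
As a rough illustration (not actual RMM code, since the class doesn't exist yet), such a host_buffer might mirror the owning, move-only shape of device_buffer; all names below are illustrative:

```cpp
// Illustrative sketch only -- rmm::host_buffer does not exist yet.
#include <cstddef>

namespace rmm {

// Stand-in for the host_memory_resource base class proposed further below.
class host_memory_resource {
 public:
  virtual ~host_memory_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* p, std::size_t bytes) { do_deallocate(p, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes)           = 0;
  virtual void do_deallocate(void* p, std::size_t bytes) = 0;
};

// Untyped, uninitialized, owning host allocation: the host-side analog of
// device_buffer. Copying is disabled, mirroring device_buffer's ownership
// semantics.
class host_buffer {
 public:
  host_buffer(std::size_t size, host_memory_resource* mr)
    : data_{mr->allocate(size)}, size_{size}, mr_{mr} {}
  ~host_buffer() { mr_->deallocate(data_, size_); }

  host_buffer(host_buffer const&)            = delete;
  host_buffer& operator=(host_buffer const&) = delete;

  void* data() noexcept { return data_; }
  std::size_t size() const noexcept { return size_; }

 private:
  void* data_;
  std::size_t size_;
  host_memory_resource* mr_;
};

}  // namespace rmm
```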

Additional context

Note that this opens a few sizable cans of worms: questions that will eventually need to be answered. From my comment here:

  • Would rmm::alloc be renamed to rmm::device_alloc and we add a rmm::host_alloc?
  • Does a host memory resource accept streams for alloc/free? If not, then host/device_memory_resource cannot share the same base class.
  • Do we enforce using RMM host memory resources anywhere host memory is being allocated in the same way we do with device memory? (e.g., are we going to provide a rmm::host_vector to replace std::vector?)
  • Are there alignment requirements for host allocations?
  • Will there be a separate default memory resource for host allocations?
  • Do we need host memory pools?

Here's what I think is the simplest, least-effort path forward:

  • Provide a host_memory_resource base class mirrored from device_memory_resource.
  • Provide a host_buffer class that accepts a host_memory_resource* to use for allocation (a sketch of both follows this list).
  • Do NOT provide mirrors of the default device memory resource infrastructure (e.g., get_default_resource/set_default_resource())
  • Do NOT provide a mirror for rmm::alloc/free for host memory allocations
  • If a host memory pool is required, only support it in C++17 and beyond.
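
A minimal sketch of those first two pieces, assuming (one of the open questions above) that host resources do not take streams; pinned_memory_resource is a hypothetical concrete resource motivated by the spilling use case:

```cpp
// Sketch only, not actual RMM code: a host-side mirror of the
// device_memory_resource interface, minus streams.
#include <cstddef>
#include <new>

#include <cuda_runtime_api.h>

namespace rmm {

class host_memory_resource {
 public:
  virtual ~host_memory_resource() = default;

  void* allocate(std::size_t bytes,
                 std::size_t alignment = alignof(std::max_align_t)) {
    return do_allocate(bytes, alignment);
  }
  void deallocate(void* p, std::size_t bytes,
                  std::size_t alignment = alignof(std::max_align_t)) {
    do_deallocate(p, bytes, alignment);
  }

 private:
  virtual void* do_allocate(std::size_t bytes, std::size_t alignment) = 0;
  virtual void do_deallocate(void* p, std::size_t bytes,
                             std::size_t alignment) = 0;
};

// Hypothetical concrete resource: page-locked (pinned) host memory, the
// motivating use case for faster device-to-host spilling.
class pinned_memory_resource final : public host_memory_resource {
 private:
  void* do_allocate(std::size_t bytes, std::size_t /*alignment*/) override {
    void* p{nullptr};
    if (cudaMallocHost(&p, bytes) != cudaSuccess) { throw std::bad_alloc{}; }
    return p;
  }
  void do_deallocate(void* p, std::size_t /*bytes*/,
                     std::size_t /*alignment*/) override {
    cudaFreeHost(p);
  }
};

}  // namespace rmm
```

A host_buffer would then take a host_memory_resource* in its constructor, exactly as device_buffer takes a device_memory_resource*.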
@jrhemstad added the feature request label on Jan 28, 2020
@jakirkham (Member)

cc @kkraus14 @pentschev

@kkraus14 (Contributor)

> If a host memory pool is required, only support it in C++17 and beyond.

I assume this would be a crazy effort to backport to C++14? We typically get pushback from requiring newer compiler versions / system libraries.

@jrhemstad (Contributor, Author) commented Jan 28, 2020

> I assume this would be a crazy effort to backport to C++14? We typically get pushback from requiring newer compiler versions / system libraries.

Yeah, it's not really possible to backport. You'd be better off just re-implementing all of the memory pool logic or using some other open source host memory pool.

That said, we can probably insulate user libraries from needing C++17. We can wrap the C++17 bits in include guards and throw an error if someone tries to use the host memory pools pre-C++17.
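
For instance, the insulation could look something like the sketch below, assuming the pool were built on C++17's std::pmr; host_memory_pool is a hypothetical name, not RMM API:

```cpp
// Sketch of the insulation idea: C++17 consumers get the real pool, C++14
// consumers still compile but any attempt to use the pool throws.
#include <stdexcept>

#if __cplusplus >= 201703L
#include <memory_resource>

namespace rmm {
// Illustrative choice: alias the standard C++17 pool resource.
using host_memory_pool = std::pmr::unsynchronized_pool_resource;
}  // namespace rmm

#else

namespace rmm {
// Pre-C++17: the name exists so headers still parse, but construction fails.
struct host_memory_pool {
  host_memory_pool() {
    throw std::logic_error{"rmm::host_memory_pool requires C++17"};
  }
};
}  // namespace rmm

#endif
```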

Note that providing a host memory pool is orthogonal to providing a host_buffer.

@harrism (Member) commented Jan 28, 2020

I would leave host memory pools to future work.

@kkraus14 (Contributor)

> Yeah, it's not really possible to backport. You'd be better off just re-implementing all of the memory pool logic or using some other open source host memory pool.
>
> That said, we can probably insulate user libraries from needing C++17. We can wrap the C++17 bits in include guards and throw an error if someone tries to use the host memory pools pre-C++17.
>
> Note that providing a host memory pool is orthogonal to providing a host_buffer.

Yeah, I understand; it's just that there will likely be enterprise customers who want both the memory pool and C++14 support, but beggars can't be choosers 😄.

@jakirkham (Member)

Would using Boost help?

@harrism (Member) commented Jan 30, 2020

We don't want RMM to depend on Boost.

@jakirkham (Member)

Yeah, that makes sense. Just thinking about what alternatives we might have 🙂

@jakirkham (Member)

What if we take this as an opportunity to start using the Conda compilers? That would put us on GCC 7.3.0, which has C++17 support (unless I've missed something). Besides, this was something we were planning to do anyway. (rapidsai/cudf#1210)

@harrism (Member) commented Jan 30, 2020

I think we should just stick to C++14, and leave host memory pools for future work.

@kkraus14 (Contributor)

> What if we take this as an opportunity to start using the Conda compilers? That would put us on GCC 7.3.0, which has C++17 support (unless I've missed something). Besides, this was something we were planning to do anyway. (rapidsai/cudf#1210)

That's only for conda packages, though. We have people who want to build from source without conda, and we can't necessarily guarantee they'll have a new enough compiler for C++17. We've already had pushback on C++14 😄.

@jrhemstad (Contributor, Author) commented Jan 30, 2020

Agreed, the host memory pool is future work; I just wanted to bring it up since we're talking about adding host memory management to RMM.

@jakirkham (Member)

OK, makes sense. Thanks for the context, and thanks for putting up with suggestions you've likely already considered 🙂

@harrism (Member) commented Mar 12, 2020

Can this issue be closed now that host_memory_resource exists and rmm::alloc/free are going to be dropped?

@jrhemstad (Contributor, Author)

We still need a host_buffer analog to device_buffer.

@harrism (Member) commented Mar 12, 2020

Ah yes, I got that confused.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@randerzander

Still desired

@jakirkham (Member)

(For context, this came up today when discussing how to improve spilling and serialization performance.)

cc @quasiben (for awareness)

@jrhemstad added the 0 - Backlog label and removed the inactive-30d and inactive-90d labels on Jun 11, 2021
@jrhemstad (Contributor, Author)

There's unlikely to be any movement here until NVIDIA/libcudacxx#158 is complete in the next month or so.

@jakirkham (Member)

Thanks for the update Jake 🙂

@jrhemstad (Contributor, Author)

This conversation is quite stale now, so I'll give a quick update on the status quo:

  • The rmm::(host/device)_memory_resource base class interface is on its way out in favor of the cuda::mr functionality
  • cuda::mr should be thought of as taking what we learned over the last several years with RMM, generalizing it, and then centralizing it in libcu++
  • RMM is in the process of migrating to the new cuda::mr interface: Use cuda::mr::memory_resource instead of raw device_memory_resource #1095
  • Today, libcu++ only offers the interface. It doesn't offer any concrete implementations, nor any data structures or containers that use a memory resource for allocation. This is admittedly a bit limited today.
  • Our immediate next steps are to 1) add concrete memory resource implementations, and 2) create allocators, data structures, and containers that use a cuda::mr resource for allocation. Step 2 would ultimately satisfy the original request of having simple host_buffer and device_buffer classes, though instead of those we'd likely have cuda::buffer<device_accessible> and cuda::buffer<host_accessible> (rough sketch below).
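
As a rough sketch of that last idea (none of this is real libcu++ API; cuda::buffer doesn't exist yet, and the property tags and resource interface below are simplified stand-ins for cuda::mr):

```cpp
// Simplified stand-ins for the cuda::mr property mechanism; purely
// illustrative, not libcu++ code.
#include <cstddef>
#include <cstdlib>

namespace demo {

struct host_accessible {};
struct device_accessible {};

// A resource advertises where its allocations are accessible via a tag type.
template <typename Property>
class resource {
 public:
  virtual ~resource() = default;
  virtual void* allocate(std::size_t bytes)           = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// One buffer template replaces the host_buffer/device_buffer pair: the
// property parameter says where the memory is usable.
template <typename Property>
class buffer {
 public:
  buffer(std::size_t size, resource<Property>* mr)
    : data_{mr->allocate(size)}, size_{size}, mr_{mr} {}
  ~buffer() { mr_->deallocate(data_, size_); }

  buffer(buffer const&)            = delete;
  buffer& operator=(buffer const&) = delete;

  void* data() noexcept { return data_; }
  std::size_t size() const noexcept { return size_; }

 private:
  void* data_;
  std::size_t size_;
  resource<Property>* mr_;
};

// Trivial host-accessible resource to show usage:
//   demo::malloc_resource mr;
//   demo::buffer<demo::host_accessible> buf{1024, &mr};
class malloc_resource final : public resource<host_accessible> {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) override { std::free(p); }
};

}  // namespace demo
```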

@harrism (Member) commented Feb 26, 2024

Would you distinguish between asynchronous and synchronous host_buffer and device_buffer classes (so there would be four classes)? Or would you have two classes with both sync and async methods? Or would host_buffer always be synchronous and device_buffer always asynchronous?
