Add default pinned pool that falls back to new pinned allocations #15665

Merged: 44 commits, merged May 20, 2024. Changes shown from 37 commits.

Commits:
163ad97
pool with fallback
vuule May 6, 2024
1e850d6
don't use default pool
vuule May 6, 2024
3be42ba
fix allocator copy assignment
vuule May 6, 2024
395dcf1
fix ver2
vuule May 6, 2024
70ae74e
Merge branch 'branch-24.06' into bug-allocator-copy-wrong-stream
vuule May 6, 2024
503d170
Merge branch 'bug-allocator-copy-wrong-stream' into perf-defaul-piine…
vuule May 6, 2024
0873b1f
copyright
vuule May 6, 2024
ff18a21
fix operator==
vuule May 7, 2024
f5a735c
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 7, 2024
5766805
Merge branch 'bug-allocator-copy-wrong-stream' of https://github.com/…
vuule May 7, 2024
0bd92bf
Merge branch 'bug-allocator-copy-wrong-stream' into perf-defaul-piine…
vuule May 7, 2024
854c0ab
simplify pool creation
vuule May 7, 2024
5bf0ce4
namespace; comments
vuule May 7, 2024
284654d
Merge branch 'branch-24.06' into perf-defaul-piined-pool
vuule May 7, 2024
ff4d7f6
clean up
vuule May 7, 2024
f5b2c84
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 7, 2024
cf3f8a3
Merge branch 'perf-defaul-piined-pool' of https://github.com/vuule/cu…
vuule May 7, 2024
80b5963
mild polish
vuule May 7, 2024
d23684d
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 8, 2024
1828e05
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 9, 2024
0122038
remove inline
vuule May 9, 2024
60030da
scoped_lock
vuule May 9, 2024
a62377e
try
vuule May 9, 2024
6733c45
clean up
vuule May 9, 2024
abf40a8
clarify try-catch fallback
vuule May 9, 2024
fa7dce7
remove export
vuule May 9, 2024
a244d7c
non-indexed stream
vuule May 9, 2024
7076e73
Update cpp/src/io/utilities/config_utils.cpp
ttnghia May 9, 2024
0b8aa44
Merge branch 'branch-24.06' into perf-defaul-piined-pool
vuule May 9, 2024
3db44a3
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 13, 2024
27d30c8
make default_pinned_mr cheap to call multiple times
vuule May 13, 2024
224e68f
static_assert fixed_pinned_pool_memory_resource
vuule May 13, 2024
382e7b3
Merge branch 'perf-defaul-piined-pool' of https://github.com/vuule/cu…
vuule May 13, 2024
b2fd734
fix host_mr
vuule May 13, 2024
0eccf9a
add config function
vuule May 13, 2024
ecd6481
align config size; add missing header
vuule May 13, 2024
709123f
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 13, 2024
01b1bdb
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 14, 2024
ecb5f5a
CUDF_EXPORT
vuule May 14, 2024
f0d0bf0
fail config if resource is already created
vuule May 14, 2024
f989a56
fix config check
vuule May 15, 2024
d0e6dd7
Merge branch 'branch-24.06' of https://github.com/rapidsai/cudf into …
vuule May 15, 2024
2b4952a
docs
vuule May 15, 2024
fdcfad3
Merge branch 'branch-24.06' into perf-defaul-piined-pool
vuule May 16, 2024
7 changes: 6 additions & 1 deletion cpp/include/cudf/detail/utilities/stream_pool.hpp
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -81,6 +81,11 @@ class cuda_stream_pool {
*/
cuda_stream_pool* create_global_cuda_stream_pool();

/**
* @brief Get the global stream pool.
*/
cuda_stream_pool& global_cuda_stream_pool();
Contributor Author commented:
had to expose the pool to get a stream from it without forking


/**
* @brief Acquire a set of `cuda_stream_view` objects and synchronize them to an event on another
* stream.
9 changes: 9 additions & 0 deletions cpp/include/cudf/io/memory_resource.hpp
@@ -41,4 +41,13 @@ rmm::host_async_resource_ref set_host_memory_resource(rmm::host_async_resource_r
*/
rmm::host_async_resource_ref get_host_memory_resource();

/**
* @brief Configure the size of the default host memory resource.
*
* Must be called before any other function in this header.
ttnghia (Contributor) commented on May 13, 2024:

This is a dangerous requirement, and it may not be satisfied. How about making the static resource re-configurable?

ttnghia (Contributor) continued, May 13, 2024:

To do so:

  1. Static variable is declared outside of function scope (but in an anonymous namespace, so it is static inside just this TU). In addition, it can be a smart pointer.
  2. host_mr will initialize it with std::nullopt size if it is nullptr, otherwise just derefs the current pointer and returns.
  3. User can specify a size parameter to recompute and overwrite that static variable with a new mr.
  4. All these ops should be thread safe.
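The four steps above could be sketched roughly like this (a hypothetical stand-in type, not the real rmm resource; all names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>

// Hypothetical stand-in for the pooled host resource (not the real rmm type).
struct pooled_mr {
  std::size_t size;
};

namespace {
std::mutex mr_mutex;                     // step 4: serialize all accesses
std::unique_ptr<pooled_mr> mr_instance;  // step 1: TU-local static smart pointer
}  // namespace

// Step 2: a call with no size creates the resource with a default size on
// first use. Step 3: a caller-provided size recreates (overwrites) it.
pooled_mr& host_mr(std::optional<std::size_t> size = std::nullopt) {
  std::scoped_lock lock{mr_mutex};
  if (size.has_value()) {
    mr_instance = std::make_unique<pooled_mr>(pooled_mr{*size});
  } else if (!mr_instance) {
    mr_instance = std::make_unique<pooled_mr>(pooled_mr{1024});  // default size
  }
  return *mr_instance;
}
```

This is only a sketch of the suggested re-configurable design, not what the PR ultimately merged (the PR instead throws if config comes after creation).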

Contributor Author replied:

To clarify, the issue with calling config after get/set is that it would have no effect.

Allowing this opens another can of worms, e.g. what is the intended effect of calling config after set?

ttnghia (Contributor) replied on May 14, 2024:

If we don't allow this, let's add a validity check to prevent it from being accidentally misused. It sounds unsafe to just rely on an assumption.

Contributor Author replied:

@abellina what behavior do you suggest when config is called after the first resource use? I'm not sure if we should throw or just warn.

Contributor replied:

I think we should throw, I agree with @ttnghia that we should do something in that case.

Contributor Author replied:

Added a mechanism to throw if config is called after the default resource has already been created.
@abellina might be good to test your branch with this change.

*
* @param size The size of the default host memory resource
*/
void config_host_memory_resource(size_t size);

} // namespace cudf::io
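The call-order contract above (config before the first get/set, throw otherwise) can be modeled with a toy sketch; the names below are purely illustrative, not the actual cudf implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>

// Toy model of the call-order contract; all names are illustrative.
namespace toy {
std::optional<std::size_t> configured_size;
bool resource_created = false;

// Throws if called after the resource was already created, mirroring the
// "fail config if resource is already created" behavior added in this PR.
void config_host_memory_resource(std::size_t size) {
  if (resource_created) {
    throw std::logic_error("config called after the resource was created");
  }
  configured_size = size;
}

// Stands in for get/set: the first call fixes the configuration.
std::size_t get_host_memory_resource() {
  resource_created = true;
  return configured_size.value_or(1024);  // default size when unconfigured
}
}  // namespace toy
```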
183 changes: 168 additions & 15 deletions cpp/src/io/utilities/config_utils.cpp
@@ -16,10 +16,13 @@

#include "config_utils.hpp"

#include <cudf/detail/utilities/stream_pool.hpp>
#include <cudf/io/memory_resource.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/export.hpp>

#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>
#include <rmm/resource_ref.hpp>

@@ -87,38 +90,188 @@ bool is_stable_enabled() { return is_all_enabled() or get_env_policy() == usage_

} // namespace nvcomp_integration

-inline std::mutex& host_mr_lock()
} // namespace detail

namespace {
class fixed_pinned_pool_memory_resource {
using upstream_mr = rmm::mr::pinned_host_memory_resource;
using host_pooled_mr = rmm::mr::pool_memory_resource<upstream_mr>;

private:
upstream_mr upstream_mr_{};
size_t pool_size_{0};
// Raw pointer to avoid a segfault when the pool is destroyed on exit
host_pooled_mr* pool_{nullptr};
void* pool_begin_{nullptr};
void* pool_end_{nullptr};
cuda::stream_ref stream_{cudf::detail::global_cuda_stream_pool().get_stream().value()};

public:
fixed_pinned_pool_memory_resource(size_t size)
Contributor commented:

Do we need to check and throw if size == 0?

Contributor Author replied:

I don't think so; the resource works fine with a zero-capacity pool. I used this when benchmarking, basically to verify that the performance is the same as the non-pooled resource. So zero is a valid value for the size, IMO.

: pool_size_{size}, pool_{new host_pooled_mr(upstream_mr_, size, size)}
{
if (pool_size_ == 0) { return; }

// Allocate full size from the pinned pool to figure out the beginning and end address
pool_begin_ = pool_->allocate_async(pool_size_, stream_);
pool_end_ = static_cast<void*>(static_cast<uint8_t*>(pool_begin_) + pool_size_);
pool_->deallocate_async(pool_begin_, pool_size_, stream_);
}

void* do_allocate_async(std::size_t bytes, std::size_t alignment, cuda::stream_ref stream)
Member commented:

I'm late, but these do_ versions should probably be protected/private?

{
if (bytes <= pool_size_) {
try {
return pool_->allocate_async(bytes, alignment, stream);
} catch (...) {
// If the pool is exhausted, fall back to the upstream memory resource
}
}

return upstream_mr_.allocate_async(bytes, alignment, stream);
}

void do_deallocate_async(void* ptr,
std::size_t bytes,
std::size_t alignment,
cuda::stream_ref stream) noexcept
{
if (bytes <= pool_size_ && ptr >= pool_begin_ && ptr <= pool_end_) {
pool_->deallocate_async(ptr, bytes, alignment, stream);
} else {
upstream_mr_.deallocate_async(ptr, bytes, alignment, stream);
}
}

void* allocate_async(std::size_t bytes, cuda::stream_ref stream)
{
return do_allocate_async(bytes, rmm::RMM_DEFAULT_HOST_ALIGNMENT, stream);
}

void* allocate_async(std::size_t bytes, std::size_t alignment, cuda::stream_ref stream)
{
return do_allocate_async(bytes, alignment, stream);
}

void* allocate(std::size_t bytes, std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
{
auto const result = do_allocate_async(bytes, alignment, stream_);
stream_.wait();
return result;
}

void deallocate_async(void* ptr, std::size_t bytes, cuda::stream_ref stream) noexcept
{
return do_deallocate_async(ptr, bytes, rmm::RMM_DEFAULT_HOST_ALIGNMENT, stream);
}

void deallocate_async(void* ptr,
std::size_t bytes,
std::size_t alignment,
cuda::stream_ref stream) noexcept
{
return do_deallocate_async(ptr, bytes, alignment, stream);
}

void deallocate(void* ptr,
std::size_t bytes,
std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT) noexcept
{
deallocate_async(ptr, bytes, alignment, stream_);
stream_.wait();
}

bool operator==(fixed_pinned_pool_memory_resource const& other) const
{
return pool_ == other.pool_ and stream_ == other.stream_;
}

bool operator!=(fixed_pinned_pool_memory_resource const& other) const
{
return !operator==(other);
}

[[maybe_unused]] friend void get_property(fixed_pinned_pool_memory_resource const&,
cuda::mr::device_accessible) noexcept
{
}

[[maybe_unused]] friend void get_property(fixed_pinned_pool_memory_resource const&,
cuda::mr::host_accessible) noexcept
{
}
};

static_assert(cuda::mr::resource_with<fixed_pinned_pool_memory_resource,
cuda::mr::device_accessible,
cuda::mr::host_accessible>,
"");
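The allocate path of the resource above tries the fixed pool first and falls back to the upstream allocator when the pool is exhausted. A minimal plain-malloc sketch of the same pattern (hypothetical, not the real pinned resource):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal sketch of the pool-with-fallback pattern: try the fixed-size pool
// first; on exhaustion, fall back to the upstream allocator (plain malloc
// here, pinned host memory in the real resource).
class fallback_pool {
  std::vector<std::byte> pool_;  // the fixed-size pool buffer
  std::size_t used_{0};          // bump-allocator offset (no reuse, for brevity)

 public:
  explicit fallback_pool(std::size_t size) : pool_(size) {}

  void* allocate(std::size_t bytes) {
    if (used_ + bytes <= pool_.size()) {  // request fits in the pool
      void* p = pool_.data() + used_;
      used_ += bytes;
      return p;
    }
    return std::malloc(bytes);  // pool exhausted: upstream fallback
  }

  // Mirrors the pool_begin_/pool_end_ range check used for deallocation.
  bool from_pool(void const* p) const {
    auto const* b = static_cast<std::byte const*>(p);
    return b >= pool_.data() && b < pool_.data() + pool_.size();
  }
};
```

The range check is what lets deallocation route each pointer back to the allocator that produced it, exactly as `do_deallocate_async` does above.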

rmm::host_async_resource_ref make_default_pinned_mr(std::optional<size_t> config_size)
{
-static std::mutex map_lock;
-return map_lock;
static fixed_pinned_pool_memory_resource mr = [config_size]() {
auto const size = [&config_size]() -> size_t {
if (auto const env_val = getenv("LIBCUDF_PINNED_POOL_SIZE"); env_val != nullptr) {
return std::atol(env_val);
}

if (config_size.has_value()) { return *config_size; }

size_t free{}, total{};
CUDF_CUDA_TRY(cudaMemGetInfo(&free, &total));
// 0.5% of the total device memory, capped at 100MB
return std::min(total / 200, size_t{100} * 1024 * 1024);
Comment on lines +222 to +225
Member commented:

I'm late, but this should use rmm::percent_of_free_device_memory. That function only takes an integer percent. If you need a decimal percent, please file an issue. Or you can just use 1% and then divide by 2.

Member replied:

Or at least use rmm::available_device_memory().

Contributor Author replied:

Yes, we could be using rmm::available_device_memory() to get the memory capacity. I'll address this in 24.08.
If there's a plan to add percent_of_total_device_memory, that would be even better.

Member replied:

It already exists.

}();

// rmm requires the pool size to be a multiple of 256 bytes
auto const aligned_size = (size + 255) & ~255;
CUDF_LOG_INFO("Pinned pool size = {}", aligned_size);

// make the pool with max size equal to the initial size
return fixed_pinned_pool_memory_resource{aligned_size};
}();

return mr;
}
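The `(size + 255) & ~255` expression above rounds up to the next multiple of 256, which works because 256 is a power of two (`~255` clears the low 8 bits). A standalone check of the trick:

```cpp
#include <cassert>
#include <cstddef>

// Round up to the next multiple of 256 (the rmm pool size requirement).
// Valid because 256 is a power of two: ~255 clears the low 8 bits after
// adding 255, so any non-multiple is bumped to the next boundary.
constexpr std::size_t align_256(std::size_t size) {
  return (size + 255) & ~std::size_t{255};
}
```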

-inline rmm::host_async_resource_ref default_pinned_mr()
rmm::host_async_resource_ref make_host_mr(std::optional<size_t> size)
{
-static rmm::mr::pinned_host_memory_resource default_mr{};
-return default_mr;
static rmm::host_async_resource_ref mr_ref = make_default_pinned_mr(size);
Contributor commented:

Perhaps this was asked before but I'm curious when/how this object is destroyed?
Is it destroyed automatically when the process ends i.e. after main() completes?
Are there any CUDA API calls in the destructor(s)?
Maybe this is ok for host memory resources.

Contributor Author replied:

Great question. Currently the pool itself is not destroyed, as destroying it caused a segfault at the end of some tests; presumably because of the call to cudaFreeHost after main(). But this is something I should revisit to verify what exactly the issue was.

Contributor Author replied:

Yeah, can't destroy a static pool resource object. Open to suggestions to avoid the pool leak.

return mr_ref;
}
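The destruction-order problem discussed in this thread is often sidestepped by intentionally leaking the resource: hold it through a raw pointer, as the `host_pooled_mr*` member in this diff does, so no destructor (and no CUDA call) runs during static teardown. A minimal sketch with a hypothetical type:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical resource whose destructor would call into the CUDA runtime
// (think cudaFreeHost); running it during static teardown can crash.
struct pinned_pool {
  ~pinned_pool() { std::puts("releasing pinned memory"); }
};

// Intentionally leaked: the function-local static is a raw pointer that is
// never deleted, so the destructor never runs after main() ends. Same idiom
// as the raw host_pooled_mr* member in this diff.
pinned_pool& leaked_pool() {
  static pinned_pool* pool = new pinned_pool{};
  return *pool;
}
```

The leak is bounded (one pool per process) and the OS reclaims the pinned memory at process exit, which is why this trade-off is commonly accepted.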

-CUDF_EXPORT inline auto& host_mr()
std::mutex& host_mr_mutex()
{
-static rmm::host_async_resource_ref host_mr = default_pinned_mr();
-return host_mr;
static std::mutex map_lock;
return map_lock;
}

-} // namespace detail
rmm::host_async_resource_ref& host_mr()
{
static rmm::host_async_resource_ref mr_ref = make_host_mr(std::nullopt);
return mr_ref;
}

} // namespace

rmm::host_async_resource_ref set_host_memory_resource(rmm::host_async_resource_ref mr)
{
-std::lock_guard lock{detail::host_mr_lock()};
-auto last_mr = detail::host_mr();
-detail::host_mr() = mr;
std::scoped_lock lock{host_mr_mutex()};
auto last_mr = host_mr();
host_mr() = mr;
return last_mr;
}

rmm::host_async_resource_ref get_host_memory_resource()
{
-std::lock_guard lock{detail::host_mr_lock()};
-return detail::host_mr();
std::scoped_lock lock{host_mr_mutex()};
return host_mr();
}

void config_host_memory_resource(size_t size)
{
std::scoped_lock lock{host_mr_mutex()};
make_host_mr(size);
}

} // namespace cudf::io