Control whether a file data source memory-maps the file with an environment variable #17004

Merged 8 commits on Oct 18, 2024
37 changes: 29 additions & 8 deletions cpp/src/io/utilities/datasource.cpp
@@ -15,6 +15,7 @@
  */

 #include "file_io_utilities.hpp"
+#include "getenv_or.hpp"

 #include <cudf/detail/utilities/logger.hpp>
 #include <cudf/detail/utilities/vector_factories.hpp>
@@ -227,14 +228,27 @@ class memory_mapped_source : public file_source {
   }

  private:
+  [[nodiscard]] bool should_register_mmap_buffer()
+  {
+    if (_map_addr == nullptr) { return false; }

+    auto const policy = getenv_or("LIBCUDF_MMAP_REGISTER_ENABLED", std::string{"AUTO"});
Review comment (Contributor): As a possible future improvement, boolean handling could be made more flexible by accepting either letter case and common keywords such as "yes" and "true", similar to what KvikIO does here.

+    if (policy == "ALWAYS") { return true; }
+    if (policy == "AUTO") { return pageableMemoryAccessUsesHostPageTables(); }
+    if (policy == "OFF") { return false; }

+    CUDF_FAIL("Invalid LIBCUDF_MMAP_REGISTER_ENABLED value: " + policy);
+  }

   /**
    * @brief Page-locks (registers) the memory range of the mapped file.
    *
    * Fixes nvbugs/4215160
    */
   void register_mmap_buffer(size_t offset, size_t size)
   {
-    if (_map_addr == nullptr or not pageableMemoryAccessUsesHostPageTables()) { return; }
+    if (not should_register_mmap_buffer()) { return; }

     // Registered region must be within the mapped region
     _reg_offset = std::max(offset, _map_offset);
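
The three-way decision made by `should_register_mmap_buffer()` can be tried outside libcudf with a small stand-alone sketch. This is an illustration under stated assumptions, not the library's code: `env_or` is a hypothetical stand-in for `getenv_or`, `resolve_register_policy` is a made-up name, and the `uses_host_page_tables` argument replaces the call to `pageableMemoryAccessUsesHostPageTables()`. It throws `std::invalid_argument` where the diff uses `CUDF_FAIL`.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for libcudf's getenv_or: returns the variable's value,
// or `fallback` when the variable is not set.
inline std::string env_or(char const* name, std::string const& fallback)
{
  char const* value = std::getenv(name);
  return value != nullptr ? std::string{value} : fallback;
}

// Mirrors the ALWAYS / AUTO / OFF decision shown in the diff.
// `uses_host_page_tables` stands in for pageableMemoryAccessUsesHostPageTables().
inline bool resolve_register_policy(std::string const& policy, bool uses_host_page_tables)
{
  if (policy == "ALWAYS") { return true; }
  if (policy == "AUTO") { return uses_host_page_tables; }
  if (policy == "OFF") { return false; }
  // Anything else is rejected; the comparison is exact and case-sensitive.
  throw std::invalid_argument("Invalid mmap register policy value: " + policy);
}
```

A caller would combine the two helpers as `resolve_register_policy(env_or("LIBCUDF_MMAP_REGISTER_ENABLED", "AUTO"), uses_host_page_tables)`. Note that a lowercase value such as `auto` is treated as invalid here, just as in the diff.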
@@ -467,15 +481,22 @@ std::unique_ptr<datasource> datasource::create(std::string const& filepath,
   CUDF_EXPECTS(max_size_estimate == 0 or min_size_estimate <= max_size_estimate,
                "Invalid min/max size estimates for datasource creation");

-#ifdef CUFILE_FOUND
-  if (cufile_integration::is_always_enabled()) {
-    // avoid mmap as GDS is expected to be used for most reads
+  auto const use_memory_mapping = [] {
+    auto const policy = getenv_or("LIBCUDF_MMAP_ENABLED", std::string{"ON"});

+    if (policy == "ON") { return true; }
+    if (policy == "OFF") { return false; }

+    CUDF_FAIL("Invalid LIBCUDF_MMAP_ENABLED value: " + policy);
+  }();

+  if (use_memory_mapping) {
+    return std::make_unique<memory_mapped_source>(
+      filepath.c_str(), offset, max_size_estimate, min_size_estimate);
+  } else {
+    // `file_source` reads the file directly without memory mapping
     return std::make_unique<file_source>(filepath.c_str());
   }
-#endif
-  // Use our own memory mapping implementation for direct file reads
-  return std::make_unique<memory_mapped_source>(
-    filepath.c_str(), offset, max_size_estimate, min_size_estimate);
 }

 std::unique_ptr<datasource> datasource::create(host_buffer const& buffer)
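
The review comment on this PR suggests more flexible boolean handling for environment variables. For an ON/OFF variable such as `LIBCUDF_MMAP_ENABLED`, that could look roughly like the sketch below. It is not code from libcudf or KvikIO: `parse_bool_flag` is a hypothetical helper, and the accepted keyword set (`on`/`off`, `true`/`false`, `yes`/`no`, `1`/`0`) is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not part of libcudf or KvikIO): maps common spellings of
// a boolean flag to true/false, ignoring letter case. Returns std::nullopt for
// unrecognized input so the caller can decide how to report the error.
inline std::optional<bool> parse_bool_flag(std::string value)
{
  // Lower-case a copy of the input; the cast to unsigned char avoids undefined
  // behavior in std::tolower for negative char values.
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  if (value == "on" || value == "true" || value == "yes" || value == "1") { return true; }
  if (value == "off" || value == "false" || value == "no" || value == "0") { return false; }
  return std::nullopt;
}
```

Returning `std::optional<bool>` keeps the parsing separate from the error policy, so a caller could still fail loudly on an empty result, as the diff does with `CUDF_FAIL`.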