Fix ordering / heading levels in README.md and python example in guide.md (#1513)

This is a minor fix that reorders some of the C++ docs in README.md and fixes heading levels. It also adds `>>>` prompt characters to the PyTorch allocator example in the Python guide.md, for consistency with the other examples in that document.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1513
harrism authored Apr 10, 2024
1 parent cdf20a6 commit 6771b71
Showing 2 changed files with 54 additions and 54 deletions.
102 changes: 51 additions & 51 deletions README.md
````diff
@@ -207,38 +207,7 @@ alignment argument. All allocations are required to be aligned to at least 256B.
 `device_memory_resource` adds an additional `cuda_stream_view` argument to allow specifying the stream
 on which to perform the (de)allocation.

-## `cuda_stream_view` and `cuda_stream`
-
-`rmm::cuda_stream_view` is a simple non-owning wrapper around a CUDA `cudaStream_t`. This wrapper's
-purpose is to provide strong type safety for stream types. (`cudaStream_t` is an alias for a pointer,
-which can lead to ambiguity in APIs when it is assigned `0`.) All RMM stream-ordered APIs take a
-`rmm::cuda_stream_view` argument.
-
-`rmm::cuda_stream` is a simple owning wrapper around a CUDA `cudaStream_t`. This class provides
-RAII semantics (constructor creates the CUDA stream, destructor destroys it). An `rmm::cuda_stream`
-can never represent the CUDA default stream or per-thread default stream; it only ever represents
-a single non-default stream. `rmm::cuda_stream` cannot be copied, but can be moved.
-
-## `cuda_stream_pool`
-
-`rmm::cuda_stream_pool` provides fast access to a pool of CUDA streams. This class can be used to
-create a set of `cuda_stream` objects whose lifetime is equal to the `cuda_stream_pool`. Using the
-stream pool can be faster than creating the streams on the fly. The size of the pool is configurable.
-Depending on this size, multiple calls to `cuda_stream_pool::get_stream()` may return instances of
-`rmm::cuda_stream_view` that represent identical CUDA streams.
-
-### Thread Safety
-
-All current device memory resources are thread safe unless documented otherwise. More specifically,
-calls to memory resource `allocate()` and `deallocate()` methods are safe with respect to calls to
-either of these functions from other threads. They are _not_ thread safe with respect to
-construction and destruction of the memory resource object.
-
-Note that a class `thread_safe_resource_adapter` is provided which can be used to adapt a memory
-resource that is not thread safe to be thread safe (as described above). This adapter is not needed
-with any current RMM device memory resources.
-
-### Stream-ordered Memory Allocation
+## Stream-ordered Memory Allocation

 `rmm::mr::device_memory_resource` is a base class that provides stream-ordered memory allocation.
 This allows optimizations such as re-using memory deallocated on the same stream without the
````
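The thread-safety paragraph moved by this commit says that `allocate()`/`deallocate()` calls are safe across threads, and that `thread_safe_resource_adapter` can wrap a resource lacking that guarantee. The sketch below illustrates the idea on the host only. `simple_resource`, `counting_resource`, `locking_adapter`, and `hammer` are hypothetical names, not RMM's API (RMM's real interface is `rmm::mr::device_memory_resource`, whose calls also take a stream). The adapter serializes every call on the upstream through one `std::mutex`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical minimal resource interface (host memory, no streams).
struct simple_resource {
  virtual ~simple_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Deliberately NOT thread safe: its bookkeeping map is unsynchronized.
class counting_resource : public simple_resource {
 public:
  void* allocate(std::size_t bytes) override {
    void* p = ::operator new(bytes);
    live_[p] = bytes;  // would be a data race if called concurrently
    return p;
  }
  void deallocate(void* p, std::size_t) override {
    live_.erase(p);
    ::operator delete(p);
  }
  std::size_t live_count() const { return live_.size(); }

 private:
  std::unordered_map<void*, std::size_t> live_;
};

// Serializes allocate/deallocate on the upstream with a single mutex.
// Construction and destruction are still unsynchronized.
class locking_adapter : public simple_resource {
 public:
  explicit locking_adapter(simple_resource* upstream) : upstream_{upstream} {}
  void* allocate(std::size_t bytes) override {
    std::lock_guard<std::mutex> lock{mtx_};
    return upstream_->allocate(bytes);
  }
  void deallocate(void* p, std::size_t bytes) override {
    std::lock_guard<std::mutex> lock{mtx_};
    upstream_->deallocate(p, bytes);
  }

 private:
  simple_resource* upstream_;
  std::mutex mtx_;
};

// Runs `threads` workers that each allocate and free `iters` 64-byte blocks.
inline void hammer(simple_resource& mr, int threads, int iters) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&mr, iters] {
      for (int i = 0; i < iters; ++i) {
        void* p = mr.allocate(64);
        mr.deallocate(p, 64);
      }
    });
  }
  for (auto& w : workers) w.join();
}
```

A single lock is the simplest design. It makes the upstream safe for concurrent `allocate()`/`deallocate()` at the cost of serializing them, and, matching the caveat in the README, it does nothing to protect construction or destruction.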
````diff
@@ -270,39 +239,39 @@ For further information about stream-ordered memory allocation semantics, read
 Allocator](https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/)
 on the NVIDIA Developer Blog.

-### Available Resources
+## Available Device Resources

 RMM provides several `device_memory_resource` derived classes to satisfy various user requirements.
 For more detailed information about these resources, see their respective documentation.

-#### `cuda_memory_resource`
+### `cuda_memory_resource`

 Allocates and frees device memory using `cudaMalloc` and `cudaFree`.

-#### `managed_memory_resource`
+### `managed_memory_resource`

 Allocates and frees device memory using `cudaMallocManaged` and `cudaFree`.

 Note that `managed_memory_resource` cannot be used with NVIDIA Virtual GPU Software (vGPU, for use
 with virtual machines or hypervisors) because [NVIDIA CUDA Unified Memory is not supported by
 NVIDIA vGPU](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu).

-#### `pool_memory_resource`
+### `pool_memory_resource`

 A coalescing, best-fit pool sub-allocator.

-#### `fixed_size_memory_resource`
+### `fixed_size_memory_resource`

 A memory resource that can only allocate a single fixed size. Average allocation and deallocation
 cost is constant.

-#### `binning_memory_resource`
+### `binning_memory_resource`

 Configurable to use multiple upstream memory resources for allocations that fall within different
 bin sizes. Often configured with multiple bins backed by `fixed_size_memory_resource`s and a single
 `pool_memory_resource` for allocations larger than the largest bin size.

-### Default Resources and Per-device Resources
+## Default Resources and Per-device Resources

 RMM users commonly need to configure a `device_memory_resource` object to use for all allocations
 where another resource has not explicitly been provided. A common example is configuring a
````
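The `fixed_size_memory_resource` and `binning_memory_resource` descriptions in this hunk can be made concrete with a small host-memory model. The classes below are hypothetical and only mimic the routing idea; they are not RMM's implementations and manage host, not device, memory. Each fixed-size bin keeps a free list, so reusing a block is a constant-time push or pop, and the binning resource sends each request to the smallest bin that fits, falling back to a large-allocation resource that stands in for `pool_memory_resource`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <vector>

// Hypothetical host-memory stand-ins for RMM's device memory resources.
struct simple_resource {
  virtual ~simple_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Fallback for requests larger than every bin (plays the pool's role).
struct heap_resource : simple_resource {
  void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
  void deallocate(void* p, std::size_t) override { ::operator delete(p); }
};

// Serves a single block size; freed blocks go on a free list for O(1) reuse.
class fixed_size_resource : public simple_resource {
 public:
  explicit fixed_size_resource(std::size_t block) : block_{block} {}
  ~fixed_size_resource() override {
    for (void* p : free_) ::operator delete(p);
  }
  void* allocate(std::size_t bytes) override {
    assert(bytes <= block_);
    if (free_.empty()) return ::operator new(block_);
    void* p = free_.back();
    free_.pop_back();
    return p;
  }
  void deallocate(void* p, std::size_t) override { free_.push_back(p); }
  std::size_t block_size() const { return block_; }

 private:
  std::size_t block_;
  std::vector<void*> free_;
};

// Routes each request to the smallest bin whose block size covers it.
class binning_resource : public simple_resource {
 public:
  explicit binning_resource(simple_resource* large) : large_{large} {}
  void add_bin(fixed_size_resource* bin) { bins_[bin->block_size()] = bin; }
  simple_resource* pick(std::size_t bytes) {
    auto it = bins_.lower_bound(bytes);  // first bin with size >= bytes
    return it == bins_.end() ? large_ : it->second;
  }
  void* allocate(std::size_t bytes) override { return pick(bytes)->allocate(bytes); }
  void deallocate(void* p, std::size_t bytes) override { pick(bytes)->deallocate(p, bytes); }

 private:
  std::map<std::size_t, simple_resource*> bins_;
  simple_resource* large_;
};
```

With bins of 256 B and 4 KiB, a 100-byte request is served by the 256 B bin, a 1000-byte request by the 4 KiB bin, and a 1 MiB request by the fallback resource.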
````diff
@@ -327,7 +296,7 @@ Accessing and modifying the default resource is done through two functions:
 `get_current_device_resource()`
 - For more explicit control, you can use `set_per_device_resource()`, which takes a device ID.

-#### Example
+### Example

 ```c++
 rmm::mr::cuda_memory_resource cuda_mr;
````
````diff
@@ -339,7 +308,7 @@ rmm::mr::set_current_device_resource(&pool_mr); // Updates the current device re
 rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource(); // Points to `pool_mr`
 ```
-#### Multiple Devices
+### Multiple Devices
 A `device_memory_resource` should only be used when the active CUDA device is the same device
 that was active when the `device_memory_resource` was created. Otherwise behavior is undefined.
````
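The functions touched in these hunks keep one resource per device: `set_per_device_resource()` takes an explicit device ID, while the "current device" variants act on whichever device is active, and the Multiple Devices rule ties each resource to the device that was active when it was created. A minimal sketch of that bookkeeping follows. It assumes a hypothetical `per_device_registry` class and a plain `current_device` integer in place of the CUDA runtime's active-device query; it does not describe how RMM stores its table. In this sketch, the setters return the previously registered resource.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical stand-in for a device_memory_resource.
struct resource {
  const char* name;
};

class per_device_registry {
 public:
  explicit per_device_registry(resource* initial) : initial_{initial} {}

  // Explicit device ID, like set_per_device_resource(); returns the old entry.
  resource* set(int device, resource* mr) {
    std::lock_guard<std::mutex> lock{mtx_};
    resource* old = lookup(device);
    table_[device] = mr;
    return old;
  }
  resource* get(int device) {
    std::lock_guard<std::mutex> lock{mtx_};
    return lookup(device);
  }

  // "Current device" variants forward the active device ID.
  resource* set_current(resource* mr) { return set(current_device, mr); }
  resource* get_current() { return get(current_device); }

  int current_device = 0;  // assumption: would come from the CUDA runtime

 private:
  // Devices without an explicit entry fall back to the initial resource.
  resource* lookup(int device) {
    auto it = table_.find(device);
    return it == table_.end() ? initial_ : it->second;
  }
  resource* initial_;
  std::map<int, resource*> table_;
  std::mutex mtx_;
};
```

Setting a resource for device 0 leaves device 1 on the initial resource, which is why code that switches devices must also look up (or set) the resource registered for the newly active device.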
````diff
@@ -386,17 +355,48 @@ line of the error comment.
 }
 ```

-### Allocators
+## `cuda_stream_view` and `cuda_stream`
+
+`rmm::cuda_stream_view` is a simple non-owning wrapper around a CUDA `cudaStream_t`. This wrapper's
+purpose is to provide strong type safety for stream types. (`cudaStream_t` is an alias for a pointer,
+which can lead to ambiguity in APIs when it is assigned `0`.) All RMM stream-ordered APIs take a
+`rmm::cuda_stream_view` argument.
+
+`rmm::cuda_stream` is a simple owning wrapper around a CUDA `cudaStream_t`. This class provides
+RAII semantics (constructor creates the CUDA stream, destructor destroys it). An `rmm::cuda_stream`
+can never represent the CUDA default stream or per-thread default stream; it only ever represents
+a single non-default stream. `rmm::cuda_stream` cannot be copied, but can be moved.
+
+## `cuda_stream_pool`
+
+`rmm::cuda_stream_pool` provides fast access to a pool of CUDA streams. This class can be used to
+create a set of `cuda_stream` objects whose lifetime is equal to the `cuda_stream_pool`. Using the
+stream pool can be faster than creating the streams on the fly. The size of the pool is configurable.
+Depending on this size, multiple calls to `cuda_stream_pool::get_stream()` may return instances of
+`rmm::cuda_stream_view` that represent identical CUDA streams.
+
+## Thread Safety
+
+All current device memory resources are thread safe unless documented otherwise. More specifically,
+calls to memory resource `allocate()` and `deallocate()` methods are safe with respect to calls to
+either of these functions from other threads. They are _not_ thread safe with respect to
+construction and destruction of the memory resource object.
+
+Note that a class `thread_safe_resource_adapter` is provided which can be used to adapt a memory
+resource that is not thread safe to be thread safe (as described above). This adapter is not needed
+with any current RMM device memory resources.
+
+## Allocators

 C++ interfaces commonly allow customizable memory allocation through an [`Allocator`](https://en.cppreference.com/w/cpp/named_req/Allocator) object.
 RMM provides several `Allocator` and `Allocator`-like classes.

-#### `polymorphic_allocator`
+### `polymorphic_allocator`

 A [stream-ordered](#stream-ordered-memory-allocation) allocator similar to [`std::pmr::polymorphic_allocator`](https://en.cppreference.com/w/cpp/memory/polymorphic_allocator).
 Unlike the standard C++ `Allocator` interface, the `allocate` and `deallocate` functions take a `cuda_stream_view` indicating the stream on which the (de)allocation occurs.

-#### `stream_allocator_adaptor`
+### `stream_allocator_adaptor`

 `stream_allocator_adaptor` can be used to adapt a stream-ordered allocator to present a standard `Allocator` interface to consumers that may not be designed to work with a stream-ordered interface.
````
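The moved `cuda_stream_pool` paragraph notes that, depending on the pool size, repeated `get_stream()` calls may return views of the same CUDA stream. One policy that produces exactly that behavior is round-robin handout, sketched below with hypothetical host-only types: `fake_stream` stands in for an owning `rmm::cuda_stream` and `stream_view` for a non-owning `rmm::cuda_stream_view`. The README does not specify RMM's actual selection policy, so treat round-robin as an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: fake_stream plays an owning rmm::cuda_stream,
// stream_view a non-owning rmm::cuda_stream_view (it only points at a stream).
struct fake_stream {
  int id;
};
struct stream_view {
  const fake_stream* stream;
};

// Creates all streams up front, so their lifetime equals the pool's lifetime.
class stream_pool {
 public:
  explicit stream_pool(std::size_t size) {
    assert(size > 0);
    streams_.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
      streams_.push_back(fake_stream{static_cast<int>(i)});
    }
  }

  // Assumed round-robin policy: call k returns stream k % size.
  // The atomic counter lets several threads call get_stream() concurrently.
  stream_view get_stream() noexcept {
    std::size_t k = next_.fetch_add(1, std::memory_order_relaxed);
    return stream_view{&streams_[k % streams_.size()]};
  }

  std::size_t size() const noexcept { return streams_.size(); }

 private:
  std::vector<fake_stream> streams_;  // never resized after construction
  std::atomic<std::size_t> next_{0};
};
```

With a pool of two streams, the first and third calls return views of the same stream, which is the reuse the README warns about: callers that need truly independent streams should size the pool accordingly.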

````diff
@@ -415,7 +415,7 @@ auto p = adapted.allocate(100);
 adapted.deallocate(p,100);
 ```

-#### `thrust_allocator`
+### `thrust_allocator`

 `thrust_allocator` is a device memory allocator that uses the strongly typed `thrust::device_ptr`, making it usable with containers like `thrust::device_vector`.
````
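The adaptor example whose tail opens this hunk (`auto p = adapted.allocate(100);` followed by `adapted.deallocate(p,100);`) relies on `stream_allocator_adaptor` binding a stream so that callers see the standard `Allocator` shape. The sketch below models that binding with hypothetical types (`stream_id` for `cuda_stream_view`, `stream_ordered_alloc<T>` for a stream-ordered allocator such as `polymorphic_allocator`) using host memory, then plugs the adapted allocator into `std::vector`, a consumer that knows nothing about streams. It illustrates the pattern and is not RMM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for cuda_stream_view.
struct stream_id {
  int value;
};

// Hypothetical stream-ordered allocator: (de)allocate take a stream argument.
template <typename T>
struct stream_ordered_alloc {
  using value_type = T;
  T* allocate(std::size_t n, stream_id) {
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t, stream_id) { ::operator delete(p); }
};

// Binds one stream so callers see the standard allocate(n) / deallocate(p, n).
template <typename T>
class stream_adaptor {
 public:
  using value_type = T;
  explicit stream_adaptor(stream_id s) : stream_{s} {}
  // Rebinding constructor, needed by containers that allocate other types.
  template <typename U>
  stream_adaptor(const stream_adaptor<U>& other) : stream_{other.stream()} {}

  T* allocate(std::size_t n) { return inner_.allocate(n, stream_); }
  void deallocate(T* p, std::size_t n) { inner_.deallocate(p, n, stream_); }
  stream_id stream() const { return stream_; }

 private:
  stream_ordered_alloc<T> inner_{};
  stream_id stream_;
};

// Adaptors are interchangeable only if they use the same stream.
template <typename T, typename U>
bool operator==(const stream_adaptor<T>& a, const stream_adaptor<U>& b) {
  return a.stream().value == b.stream().value;
}
template <typename T, typename U>
bool operator!=(const stream_adaptor<T>& a, const stream_adaptor<U>& b) {
  return !(a == b);
}
```

Every allocation the vector makes through this adaptor is tagged with the bound stream, which is the property that lets a stream-unaware container use a stream-ordered allocator.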

````diff
@@ -497,13 +497,13 @@ Similar to `device_memory_resource`, it has two key functions for (de)allocation
 Unlike `device_memory_resource`, the `host_memory_resource` interface and behavior is identical to
 `std::pmr::memory_resource`.
-### Available Resources
+## Available Host Resources
-#### `new_delete_resource`
+### `new_delete_resource`
 Uses the global `operator new` and `operator delete` to allocate host memory.
-#### `pinned_memory_resource`
+### `pinned_memory_resource`
 Allocates "pinned" host memory using `cuda(Malloc/Free)Host`.
````
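Because the README states that `host_memory_resource` has the same interface and behavior as `std::pmr::memory_resource`, the standard library alone can show what that interface looks like. The sketch below is plain C++17 and does not use RMM: a hypothetical `tracking_resource` overrides the three private virtuals (`do_allocate`, `do_deallocate`, `do_is_equal`), forwards to `std::pmr::new_delete_resource()` (the standard counterpart of the `new_delete_resource` listed above), and counts outstanding bytes while a `std::pmr::vector` uses it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Wraps an upstream std::pmr::memory_resource and tracks outstanding bytes.
class tracking_resource : public std::pmr::memory_resource {
 public:
  explicit tracking_resource(
      std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
      : upstream_{upstream} {}
  std::size_t outstanding() const noexcept { return outstanding_; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    void* p = upstream_->allocate(bytes, alignment);
    outstanding_ += bytes;
    return p;
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
    upstream_->deallocate(p, bytes, alignment);
    outstanding_ -= bytes;
  }
  // Memory from one tracking_resource can only be freed by that same object.
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }

  std::pmr::memory_resource* upstream_;
  std::size_t outstanding_ = 0;
};
```

A `std::pmr::vector<int> v(&mr);` then routes all of its allocations through `mr`, and once the vector is destroyed the outstanding count returns to zero.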
````diff
@@ -611,7 +611,7 @@ resources are detectable with Compute Sanitizer Memcheck.
 It may be possible in the future to add support for memory bounds checking with other memory
 resources using NVTX APIs.
-## Using RMM in Python Code
+# Using RMM in Python
 There are two ways to use RMM in Python code:
````
````diff
@@ -622,7 +622,7 @@ There are two ways to use RMM in Python code:
 RMM provides a `MemoryResource` abstraction to control _how_ device
 memory is allocated in both the above uses.
-### DeviceBuffers
+## DeviceBuffer
 A DeviceBuffer represents an **untyped, uninitialized device memory
 allocation**. DeviceBuffers can be created by providing the
````
````diff
@@ -662,7 +662,7 @@ host:
 array([1., 2., 3.])
 ```

-### MemoryResource objects
+## MemoryResource objects

 `MemoryResource` objects are used to configure how device memory allocations are made by
 RMM.
````
6 changes: 3 additions & 3 deletions python/docs/guide.md
````diff
@@ -182,8 +182,8 @@ for memory allocations using their by configuring the current
 allocator.

 ```python
-from rmm.allocators.torch import rmm_torch_allocator
-import torch
+>>> from rmm.allocators.torch import rmm_torch_allocator
+>>> import torch

-torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
+>>> torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
 ```
````
