Celeritas
0.5.0-86+4a8eea4
Manage attributes of the GPU.
#include <Device.hh>
Public Types | |
Type aliases | |
using | MapStrInt = std::map< std::string, int > |
Public Member Functions | |
Device (int id) | |
Construct from a device ID. | |
int | device_id () const |
Get the CUDA device ID, if active. | |
operator bool () const | |
True if device is initialized. | |
std::string | name () const |
Device name. | |
std::size_t | total_global_mem () const |
Total memory capacity (bytes) | |
int | max_threads_per_block () const |
Maximum number of threads per block (for launch limits) | |
int | max_blocks_per_grid () const |
Maximum number of blocks per grid (for launch limits) | |
int | max_threads_per_cu () const |
Maximum number of concurrent threads per compute unit (for occupancy) | |
unsigned int | threads_per_warp () const |
Number of threads per warp. | |
bool | can_map_host_memory () const |
Whether the device supports mapped pinned memory. | |
unsigned int | eu_per_cu () const |
Number of execution units per compute unit (1 for NVIDIA, 4 for AMD) | |
unsigned int | capability () const |
CUDA/HIP capability: major * 10 + minor. | |
MapStrInt const & | extra () const |
Additional potentially interesting diagnostics. | |
StreamId::size_type | num_streams () const |
Number of streams allocated. | |
void | create_streams (unsigned int num_streams) const |
Allocate the given number of streams. | |
Stream & | stream (StreamId) const |
Access a stream. | |
Static Public Member Functions | |
static int | num_devices () |
Get the number of available devices. | |
static bool | debug () |
Whether verbose messages and error checking are enabled. | |
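The capability() encoding listed above (major * 10 + minor) can be packed and unpacked with plain integer arithmetic. The following is a minimal sketch, not Celeritas code: pack_capability, unpack_capability, and CapabilityParts are hypothetical names, and the sketch assumes the minor version is always below 10.

```cpp
// Hypothetical helpers (not part of Celeritas) illustrating the
// "major * 10 + minor" encoding. Assumes minor_version < 10.
struct CapabilityParts
{
    unsigned int major_version;
    unsigned int minor_version;
};

// Combine major and minor versions into a single integer
constexpr unsigned int
pack_capability(unsigned int major_version, unsigned int minor_version)
{
    return major_version * 10 + minor_version;
}

// Split the combined integer back into its two parts
constexpr CapabilityParts unpack_capability(unsigned int capability)
{
    return {capability / 10, capability % 10};
}
```

Under these assumptions, a device reporting major version 8 and minor version 6 would yield a capability of 86.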
Manage attributes of the GPU.
CUDA/HIP translation table:
CUDA/NVIDIA | HIP/AMD | Description |
---|---|---|
thread | work item | individual local work element |
warp | wavefront | "vectorized thread" operating in lockstep |
block | workgroup | group of threads able to sync |
multiprocessor | compute unit | hardware executing one or more blocks |
multiprocessor | execution unit | hardware executing one or more warps |
Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.
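To make the latency-hiding remark concrete, the values from threads_per_warp() and max_threads_per_cu() can be combined into a rough upper bound on how many blocks fit on one compute unit. This is a simplified sketch with hypothetical helper names: it considers only the thread-count limit and ignores register, shared-memory, and per-unit block-count limits that real hardware also imposes.

```cpp
#include <cassert>

// Hypothetical helper: number of warps (wavefronts) needed to cover one
// block, rounding up because a partially filled warp still occupies a slot.
constexpr int warps_per_block(int threads_per_block, int threads_per_warp)
{
    return (threads_per_block + threads_per_warp - 1) / threads_per_warp;
}

// Hypothetical helper: upper bound on blocks resident on one compute unit,
// based solely on the thread-count limit (an assumption of this sketch).
constexpr int max_resident_blocks(int max_threads_per_cu,
                                  int threads_per_block,
                                  int threads_per_warp)
{
    // Threads are allocated in whole warps, so round the block size up
    int const allocated_threads
        = warps_per_block(threads_per_block, threads_per_warp)
          * threads_per_warp;
    return max_threads_per_cu / allocated_threads;
}
```

For example, with an assumed limit of 2048 threads per compute unit and 32 threads per warp, a 256-thread block gives a bound of 8 resident blocks, while a 100-thread block is rounded up to 4 warps (128 threads) and gives a bound of 16.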
global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.
void celeritas::Device::create_streams (unsigned int num_streams) const
Allocate the given number of streams.
If no streams have been created, the default stream will be used.
static bool celeritas::Device::debug ()
Whether verbose messages and error checking are enabled.
This is true if CELERITAS_DEBUG is set or if the CELER_DEBUG_DEVICE environment variable exists and is not empty.
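The rule above can be sketched with the standard std::getenv call. This is an illustration rather than the library's implementation: env_is_nonempty and device_debug_enabled are hypothetical names, and CELERITAS_DEBUG is assumed here to be a 0/1 preprocessor macro (defaulted to 0 if the build does not define it).

```cpp
#include <cstdlib>

// Assumption: CELERITAS_DEBUG is a 0/1 preprocessor macro; default it to 0
// so this sketch compiles on its own.
#ifndef CELERITAS_DEBUG
#    define CELERITAS_DEBUG 0
#endif

// Hypothetical helper: true if the variable exists and is not empty
inline bool env_is_nonempty(char const* name)
{
    char const* value = std::getenv(name);
    return value != nullptr && value[0] != '\0';
}

// Hypothetical stand-in for the documented condition
inline bool device_debug_enabled()
{
    return CELERITAS_DEBUG || env_is_nonempty("CELER_DEBUG_DEVICE");
}
```

Note that an environment variable set to an empty string counts as disabled under this rule.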
static int celeritas::Device::num_devices ()
Get the number of available devices.
This is nonzero if and only if CUDA support is built in, at least one CUDA-capable device is present, and the CELER_DISABLE_DEVICE environment variable is not set.
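The three conditions combine as a logical AND. The sketch below expresses that as a pure function so each input can be varied independently; effective_num_devices and its parameter names are hypothetical and do not reflect how the library actually queries the build configuration, the runtime, or the environment.

```cpp
// Hypothetical pure function mirroring the documented conditions.
// All three inputs are supplied by the caller in this sketch.
inline int effective_num_devices(bool built_with_device_support,
                                 int detected_devices,
                                 bool disable_env_is_set)
{
    if (!built_with_device_support || disable_env_is_set)
    {
        // Either no device support was compiled in, or the user opted out
        return 0;
    }
    // Zero here means support is built in but no capable device was found
    return detected_devices;
}
```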
Stream & celeritas::Device::stream (StreamId) const
Access a stream.
This returns the default stream if no streams were allocated.
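The default-stream fallback described for create_streams and stream can be modeled with a small mock. This is a sketch of the documented behavior only: MockStream and MockStreamPool are hypothetical types, the id of -1 for the default stream is an arbitrary marker, and indexing uses a plain std::size_t rather than the library's StreamId.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device stream
struct MockStream
{
    int id;
};

// Hypothetical pool that falls back to a default stream when empty
class MockStreamPool
{
  public:
    // Allocate the given number of streams, replacing any existing ones
    void create_streams(unsigned int num_streams)
    {
        streams_.clear();
        for (unsigned int i = 0; i < num_streams; ++i)
        {
            streams_.push_back(MockStream{static_cast<int>(i)});
        }
    }

    // Number of streams allocated (zero until create_streams is called)
    std::size_t num_streams() const { return streams_.size(); }

    // Access a stream; return the default one if none were allocated
    MockStream& stream(std::size_t index)
    {
        if (streams_.empty())
        {
            return default_stream_;
        }
        return streams_.at(index);  // throws std::out_of_range if invalid
    }

  private:
    MockStream default_stream_{-1};  // -1 marks the default stream here
    std::vector<MockStream> streams_;
};
```

In this model, code that never calls create_streams still receives a usable stream, which matches the fallback the documentation describes.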