Celeritas 0.6.0-rc.2.10+develop.de0a3a05
Manage attributes of the GPU.
#include <Device.hh>
Public Types
Type aliases:
using MapStrInt = std::map<std::string, int>
Public Member Functions
Signature | Description |
---|---|
Device (int id) | Construct from a device ID. |
int device_id () const | Get the CUDA device ID, if active. |
operator bool () const | True if the device is initialized. |
std::string name () const | Device name. |
std::size_t total_global_mem () const | Total memory capacity (bytes). |
int max_threads_per_block () const | Maximum number of threads per block (for launch limits). |
int max_blocks_per_grid () const | Maximum number of blocks per grid (for launch limits). |
int max_threads_per_cu () const | Maximum number of concurrent threads per compute unit (for occupancy). |
unsigned int threads_per_warp () const | Number of threads per warp. |
bool can_map_host_memory () const | Whether the device supports mapped pinned memory. |
unsigned int eu_per_cu () const | Number of execution units per compute unit (1 for NVIDIA, 4 for AMD). |
unsigned int capability () const | CUDA/HIP capability: major * 10 + minor. |
MapStrInt const & extra () const | Additional potentially interesting diagnostics. |
StreamId::size_type num_streams () const | Number of streams allocated. |
void create_streams (unsigned int num_streams) const | Allocate the given number of streams. |
void destroy_streams () const | Deallocate all streams before shutting down CUDA. |
Stream & stream (StreamId) const | Access a stream after creating it. |
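The capability encoding (major * 10 + minor) and the two "for launch limits" accessors in the member list above can be illustrated with a small self-contained sketch. The DeviceLimits struct and the helper functions are hypothetical names for illustration only; the real Device queries these values from the CUDA/HIP runtime. The decoding helpers assume the minor version is a single digit.

```cpp
#include <cassert>

// Hypothetical plain-data snapshot of two Device accessors
struct DeviceLimits
{
    int max_threads_per_block;  // cf. Device::max_threads_per_block()
    int max_blocks_per_grid;    // cf. Device::max_blocks_per_grid()
};

// Encode a compute capability as major * 10 + minor (e.g. 8.6 -> 86)
constexpr unsigned int encode_capability(unsigned int major, unsigned int minor)
{
    return major * 10 + minor;
}

// Recover major/minor, assuming the minor version is a single digit
constexpr unsigned int capability_major(unsigned int cap) { return cap / 10; }
constexpr unsigned int capability_minor(unsigned int cap) { return cap % 10; }

// Check a proposed 1D launch configuration against the device limits
bool fits_launch_limits(DeviceLimits const& limits,
                        int threads_per_block,
                        int num_blocks)
{
    assert(threads_per_block > 0 && num_blocks > 0);
    return threads_per_block <= limits.max_threads_per_block
           && num_blocks <= limits.max_blocks_per_grid;
}
```

Packing the version into one integer makes "at least capability X" checks a single comparison, e.g. encode_capability(7, 0) <= cap.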
Static Public Member Functions
Signature | Description |
---|---|
static int num_devices () | Get the number of available devices. |
static bool debug () | Whether verbose messages and error checking are enabled. |
static bool async () | Whether asynchronous operations are supported. |
Manage attributes of the GPU.
CUDA/HIP translation table:
CUDA/NVIDIA | HIP/AMD | Description |
---|---|---|
thread | work item | individual local work element |
warp | wavefront | "vectorized thread" operating in lockstep |
block | workgroup | group of threads able to sync |
multiprocessor | compute unit | hardware executing one or more blocks |
multiprocessor | execution unit | hardware executing one or more warps |
Each block/workgroup runs on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the more blocks are resident, the more latency can be hidden.
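The relationship between blocks, warps, and compute units can be made concrete with a rough occupancy estimate built from threads_per_warp, max_threads_per_cu, and eu_per_cu. This is a minimal sketch that considers only the thread-count limit; real occupancy is also constrained by register and shared-memory usage and per-CU block limits, which are ignored here. The function names are hypothetical, and the test values are illustrative rather than taken from any specific GPU.

```cpp
#include <cassert>

// Ceiling division for positive integers
constexpr unsigned int ceil_div(unsigned int num, unsigned int denom)
{
    return (num + denom - 1) / denom;
}

// Warps (wavefronts) needed to hold one block (workgroup)
unsigned int warps_per_block(unsigned int threads_per_block,
                             unsigned int threads_per_warp)
{
    assert(threads_per_warp > 0);
    return ceil_div(threads_per_block, threads_per_warp);
}

// Upper bound on resident blocks per compute unit from the thread limit alone
unsigned int max_resident_blocks(unsigned int max_threads_per_cu,
                                 unsigned int threads_per_block)
{
    assert(threads_per_block > 0);
    return max_threads_per_cu / threads_per_block;
}

// Resident warps each execution unit must hold, rounded up
unsigned int warps_per_eu(unsigned int resident_warps, unsigned int eu_per_cu)
{
    assert(eu_per_cu > 0);
    return ceil_div(resident_warps, eu_per_cu);
}
```

A partially filled warp still occupies a whole warp, which is why warps_per_block rounds up: 100 threads with 32-thread warps need 4 warps.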
global_device is global (shared by all threads). CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.
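The threading requirement can be sketched with a mock: one shared device object, plus a per-thread "active device" that each worker must set itself. In the CUDA runtime, each host thread has its own current device (set with cudaSetDevice), so a selection made on one thread does not carry over to another. Everything below (SharedDevice, g_device, t_active_id, activate_on_this_thread, run_workers) is a hypothetical stand-in, not the Celeritas API; thread_local replaces the real runtime state so the example runs without a GPU.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the process-wide device (one shared instance)
struct SharedDevice
{
    int id;
};

SharedDevice const g_device{0};

// Hypothetical stand-in for the runtime's per-thread current device
thread_local int t_active_id = -1;

// Hypothetical analog of activating the device on the calling thread
void activate_on_this_thread(SharedDevice const& dev)
{
    // A real implementation would call cudaSetDevice(dev.id) here
    t_active_id = dev.id;
}

// Launch workers that each activate the same device before doing work;
// returns the device ID each worker saw as active
std::vector<int> run_workers(int num_threads)
{
    std::vector<int> seen(num_threads, -1);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i)
    {
        workers.emplace_back([i, &seen] {
            activate_on_this_thread(g_device);  // needed on every thread
            seen[i] = t_active_id;
        });
    }
    for (auto& t : workers)
    {
        t.join();
    }
    return seen;
}
```

Because the activation is per thread, the calling thread's own state is untouched by the workers; it would need its own activation call as well.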
static bool celeritas::Device::async ()
Whether asynchronous operations are supported.
This is true by default if CUDA or HIP (5.2 <= HIP_VERSION < 5.7) is in use, and can be disabled by setting the CELER_DEVICE_ASYNC environment variable.
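The default rule for async() can be written as a small predicate. This is a minimal sketch under stated assumptions: the Backend enum and function names are hypothetical, the HIP version is packed as major * 100 + minor (a simplified encoding chosen for this sketch, not the real HIP_VERSION macro's encoding), and the environment override is modeled as an already-parsed boolean because the exact parsing of CELER_DEVICE_ASYNC is not described here.

```cpp
#include <cassert>

// Hypothetical description of the build configuration
enum class Backend
{
    none,
    cuda,
    hip
};

// Default: on for CUDA, and for HIP with 5.2 <= version < 5.7
bool default_async(Backend backend, int hip_major = 0, int hip_minor = 0)
{
    switch (backend)
    {
        case Backend::cuda:
            return true;
        case Backend::hip: {
            assert(hip_minor >= 0 && hip_minor < 100);
            int const v = hip_major * 100 + hip_minor;  // e.g. 5.2 -> 502
            return v >= 502 && v < 507;
        }
        case Backend::none:
            break;
    }
    return false;
}

// Apply the (assumed, pre-parsed) environment override
bool effective_async(bool default_value, bool env_disables)
{
    return default_value && !env_disables;
}
```

The half-open interval [5.2, 5.7) means HIP 5.7 and later fall back to synchronous behavior by default.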
static bool celeritas::Device::debug ()
Whether verbose messages and error checking are enabled.
This is true if CELERITAS_DEBUG is set or if the CELER_DEBUG_DEVICE environment variable exists and is not empty.
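The debug() condition translates directly into code. In this sketch the CELERITAS_DEBUG build option is modeled as a boolean parameter, and the function names are hypothetical; std::getenv returns a null pointer when the variable does not exist, which covers the "exists" half of the rule.

```cpp
#include <cassert>
#include <cstdlib>

// On if the build enables CELERITAS_DEBUG, or if the environment value
// exists (non-null) and is not the empty string
bool device_debug(bool celeritas_debug_build, char const* env_value)
{
    bool const env_set = env_value != nullptr && env_value[0] != '\0';
    return celeritas_debug_build || env_set;
}

// Convenience wrapper that reads the actual environment variable
bool device_debug_from_env(bool celeritas_debug_build)
{
    return device_debug(celeritas_debug_build,
                        std::getenv("CELER_DEBUG_DEVICE"));
}
```

Note that under this rule any non-empty value, including "0", enables debugging.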
void celeritas::Device::destroy_streams () const
Deallocate all streams before shutting down CUDA.
Depending on initialization order, CUDA may be shut down (or shutting down) by the time the destructor for the global Device fires.
Note that this is used in the constructor to initialize a single global stream for the device. The streams_ vector is only empty when the device is false (i.e., not initialized).
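The stream lifecycle (create_streams, stream(id), num_streams, destroy_streams) can be modeled with owned handles whose destructors release resources. This is a hypothetical mock, not the Celeritas Stream class: MockStream only counts live instances where a real stream would wrap a cudaStream_t or hipStream_t. It shows why an explicit destroy_streams() is useful: it releases every stream at a chosen point while the runtime is known to be alive, instead of relying on a global destructor that may run after CUDA has shut down.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stream stand-in that tracks how many instances are alive
struct MockStream
{
    explicit MockStream(int& live_count) : live(live_count) { ++live; }
    ~MockStream() { --live; }
    MockStream(MockStream const&) = delete;
    MockStream& operator=(MockStream const&) = delete;
    int& live;
};

// Hypothetical owner mirroring the Device stream-management members
class StreamOwner
{
  public:
    explicit StreamOwner(int& live_count) : live_(live_count) {}

    void create_streams(unsigned int n)
    {
        streams_.clear();  // release any previously created streams
        for (unsigned int i = 0; i < n; ++i)
        {
            streams_.push_back(std::make_unique<MockStream>(live_));
        }
    }

    // Release all streams now, while the runtime is still alive
    void destroy_streams() { streams_.clear(); }

    std::size_t num_streams() const { return streams_.size(); }

    MockStream& stream(std::size_t id) const
    {
        assert(id < streams_.size());  // streams must be created first
        return *streams_[id];
    }

  private:
    int& live_;
    std::vector<std::unique_ptr<MockStream>> streams_;
};
```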
static int celeritas::Device::num_devices ()
Get the number of available devices.
This is nonzero if and only if CUDA support is built in, at least one CUDA-capable device is present, and the CELER_DISABLE_DEVICE environment variable is not set.
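The three conditions for a nonzero num_devices() combine as a simple predicate. In this sketch the runtime's device count (for example, what cudaGetDeviceCount would report) is passed in as a parameter, the build flag is a boolean, and "not set" is taken to mean that std::getenv returns a null pointer. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Zero unless support is compiled in and the disable variable is absent;
// otherwise the runtime count (which may itself be zero)
int visible_devices(bool built_with_device, int runtime_count, bool disable_env_set)
{
    assert(runtime_count >= 0);
    if (!built_with_device || disable_env_set)
    {
        return 0;
    }
    return runtime_count;
}

// Convenience wrapper that checks the actual environment variable
int visible_devices_from_env(bool built_with_device, int runtime_count)
{
    bool const disabled = std::getenv("CELER_DISABLE_DEVICE") != nullptr;
    return visible_devices(built_with_device, runtime_count, disabled);
}
```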