Celeritas  0.5.0-86+4a8eea4
celeritas::Device Class Reference

Manage attributes of the GPU.

#include <Device.hh>

Public Types

Type aliases
using MapStrInt = std::map< std::string, int >
 

Public Member Functions

 Device (int id)
 Construct from a device ID.
 
int device_id () const
 Get the CUDA device ID, if active.
 
 operator bool () const
 True if device is initialized.
 
std::string name () const
 Device name.
 
std::size_t total_global_mem () const
 Total memory capacity (bytes)
 
int max_threads_per_block () const
 Maximum number of threads per block (for launch limits)
 
int max_blocks_per_grid () const
 Maximum number of blocks per grid (for launch limits)
 
int max_threads_per_cu () const
 Maximum number of concurrent threads per compute unit (for occupancy)
 
unsigned int threads_per_warp () const
 Number of threads per warp.
 
bool can_map_host_memory () const
 Whether the device supports mapped pinned memory.
 
unsigned int eu_per_cu () const
 Number of execution units per compute unit (1 for NVIDIA, 4 for AMD)
 
unsigned int capability () const
 CUDA/HIP capability: major * 10 + minor.
 
MapStrInt const & extra () const
 Additional potentially interesting diagnostics.
 
StreamId::size_type num_streams () const
 Number of streams allocated.
 
void create_streams (unsigned int num_streams) const
 Allocate the given number of streams.
 
Stream & stream (StreamId) const
 Access a stream.
 

Static Public Member Functions

static int num_devices ()
 Get the number of available devices.
 
static bool debug ()
 Whether verbose messages and error checking are enabled.
 

Detailed Description

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA     HIP/AMD         Description
--------------  --------------  ------------------------------------------
thread          work item       individual local work element
warp            wavefront       "vectorized thread" operating in lockstep
block           workgroup       group of threads able to sync
multiprocessor  compute unit    hardware executing one or more blocks
multiprocessor  execution unit  hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.
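
The relationship between block size, warps, and resident blocks can be made concrete with a little arithmetic. The following is a minimal sketch, not Celeritas code: `DeviceLimits` is a hypothetical value type whose fields only mirror the `max_threads_per_block()`, `max_threads_per_cu()`, and `threads_per_warp()` accessors, and the estimate deliberately ignores register use, shared memory, and any per-compute-unit block cap, so it is an upper bound only.

```cpp
#include <cassert>

namespace sketch
{
// Hypothetical value type: fields mirror three Device accessors.
struct DeviceLimits
{
    int max_threads_per_block;
    int max_threads_per_cu;
    unsigned int threads_per_warp;
};

// Warps (wavefronts) occupied by one block; a partial warp counts as a full one.
unsigned int warps_per_block(DeviceLimits const& d, unsigned int block_size)
{
    assert(block_size > 0);
    assert(block_size <= static_cast<unsigned int>(d.max_threads_per_block));
    return (block_size + d.threads_per_warp - 1) / d.threads_per_warp;
}

// Upper bound on blocks resident on one compute unit, from thread count alone.
unsigned int max_resident_blocks(DeviceLimits const& d, unsigned int block_size)
{
    // Threads actually reserved by a block, after rounding up to whole warps
    unsigned int padded = warps_per_block(d, block_size) * d.threads_per_warp;
    return static_cast<unsigned int>(d.max_threads_per_cu) / padded;
}
}  // namespace sketch
```

With illustrative limits of 1024 threads per block, 2048 threads per compute unit, and 32 threads per warp, a 256-thread block occupies 8 warps and at most 8 such blocks fit on a compute unit. In real code the limits would be read from a constructed `Device` rather than filled in by hand.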

Warning
The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.
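
The reason every thread has to activate the device can be illustrated without a GPU. The sketch below is a stand-in under stated assumptions: the names `g_global_device_id`, `t_active_device_id`, `activate_global`, and `activate_local` are hypothetical and only imitate the split between a process-wide device and a thread-local active device; they are not the Celeritas functions named in the warning.

```cpp
#include <cassert>
#include <thread>

namespace sketch
{
// Hypothetical: one value for the whole process, like the global device.
int g_global_device_id = -1;
// Hypothetical: one value per thread, like the active CUDA device.
thread_local int t_active_device_id = -1;

// Choose the process-wide device and activate it on the calling thread.
void activate_global(int id)
{
    assert(id >= 0);
    g_global_device_id = id;
    t_active_device_id = id;
}

// Activate the already-chosen global device on the calling thread only.
void activate_local()
{
    assert(g_global_device_id >= 0);
    t_active_device_id = g_global_device_id;
}

// What a newly started thread sees before and after activating locally.
struct WorkerView
{
    int before;
    int after;
};

WorkerView observe_from_worker()
{
    WorkerView view{-2, -2};
    std::thread worker([&view] {
        view.before = t_active_device_id;  // still the per-thread default
        activate_local();
        view.after = t_active_device_id;  // now matches the global device
    });
    worker.join();
    return view;
}
}  // namespace sketch
```

After `sketch::activate_global(0)` on the main thread, a new worker still starts with the inactive value `-1` and only sees device `0` once it has activated locally, which is the behavior the warning describes.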

Member Function Documentation

◆ create_streams()

void celeritas::Device::create_streams ( unsigned int  num_streams) const

Allocate the given number of streams.

If no streams have been created, the default stream will be used.

◆ debug()

bool celeritas::Device::debug ( )
static

Whether verbose messages and error checking are enabled.

This is true if CELERITAS_DEBUG is set or if the CELER_DEBUG_DEVICE environment variable exists and is not empty.
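
The rule above ("exists and is not empty") can be sketched with the standard `std::getenv` call. This is an illustration rather than the library's implementation: `is_nonempty` and `debug_enabled` are hypothetical helper names, and the build-time `CELERITAS_DEBUG` setting is modeled as a plain `bool` argument because its exact form is not shown here.

```cpp
#include <cassert>
#include <cstdlib>

namespace sketch
{
// True if an environment value exists (non-null) and has at least one character.
bool is_nonempty(char const* value)
{
    return value != nullptr && value[0] != '\0';
}

// Hypothetical combination of the two conditions described in the text.
bool debug_enabled(bool built_with_debug)
{
    // std::getenv returns nullptr when the variable is not defined
    return built_with_debug || is_nonempty(std::getenv("CELER_DEBUG_DEVICE"));
}
}  // namespace sketch
```

Under this reading, exporting `CELER_DEBUG_DEVICE=1` turns the checks on in a non-debug build, while exporting it as an empty string has no effect.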

◆ num_devices()

int celeritas::Device::num_devices ( )
static

Get the number of available devices.

This is nonzero if and only if CUDA support is built in, at least one CUDA-capable device is present, and the CELER_DISABLE_DEVICE environment variable is not set.
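
The three conditions combine as a logical AND, which the following sketch spells out. It is a hypothetical model, not the library's code: the function name and its parameters are assumptions, and whether `CELER_DISABLE_DEVICE` was requested is passed in as a `bool` so that the sketch does not guess how the variable is parsed.

```cpp
#include <cassert>

namespace sketch
{
// Hypothetical model of the rule: any failed condition yields zero devices.
int num_devices(bool gpu_support_built, int detected_devices, bool disable_requested)
{
    assert(detected_devices >= 0);
    if (!gpu_support_built || disable_requested)
    {
        return 0;
    }
    return detected_devices;
}
}  // namespace sketch
```

A caller would typically test the result for being greater than zero before constructing a `Device` from an ID.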

◆ stream()

Stream & celeritas::Device::stream ( StreamId  id) const

Access a stream.

This returns the default stream if no streams were allocated.
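
The fallback behavior shared by `create_streams()` and `stream()` can be pictured with a small stand-in. Everything below is hypothetical: `sketch::Stream` is a tagged placeholder rather than a real GPU stream, `StreamPool` is not a Celeritas class, and the methods are non-const for simplicity even though the documented members are `const`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
// Placeholder for a GPU stream; tag -1 marks the default stream.
struct Stream
{
    int tag;
};

// Hypothetical pool imitating the documented fallback to the default stream.
class StreamPool
{
  public:
    // Replace any existing streams with n freshly numbered ones.
    void create_streams(unsigned int n)
    {
        streams_.clear();
        for (unsigned int i = 0; i < n; ++i)
        {
            streams_.push_back(Stream{static_cast<int>(i)});
        }
    }

    std::size_t num_streams() const { return streams_.size(); }

    // Return the default stream when nothing was allocated.
    Stream& stream(std::size_t id)
    {
        if (streams_.empty())
        {
            return default_;
        }
        assert(id < streams_.size());
        return streams_[id];
    }

  private:
    Stream default_{-1};
    std::vector<Stream> streams_;
};
}  // namespace sketch
```

Before `create_streams` is called, any ID maps to the default stream; afterwards each valid ID maps to its own stream. Whether the real class bounds-checks the ID in the same way is an assumption of this sketch.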

