Manage attributes of the GPU.

#include <Device.hh>

Public Types
Type aliases
using	MapStrInt = std::map< std::string, int >

Public Member Functions
	Device (int id)
	Construct from a device ID.

int	device_id () const
	Get the CUDA device ID, if active.

	operator bool () const
	True if device is initialized.

std::string	name () const
	Device name.

std::size_t	total_global_mem () const
	Total memory capacity (bytes)

int	max_threads_per_block () const
	Maximum number of threads per block (for launch limits)

int	max_blocks_per_grid () const
	Maximum number of blocks per grid (for launch limits)

int	max_threads_per_cu () const
	Maximum number of concurrent threads per compute unit (for occupancy)

unsigned int	threads_per_warp () const
	Number of threads per warp.

bool	can_map_host_memory () const
	Whether the device supports mapped pinned memory.

unsigned int	eu_per_cu () const
	Number of execution units per compute unit (1 for NVIDIA, 4 for AMD)

unsigned int	capability () const
	CUDA/HIP capability: major * 10 + minor.

MapStrInt const &	extra () const
	Additional potentially interesting diagnostics.

StreamId::size_type	num_streams () const
	Number of streams allocated.

void	create_streams (unsigned int num_streams) const
	Allocate the given number of streams.

void	destroy_streams () const
	Deallocate all streams before shutting down CUDA.

Stream &	stream (StreamId) const
	Access a stream after creating.
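
The value returned by capability() packs two numbers into one as major * 10 + minor. As a minimal sketch, the helper below splits such a value back into its parts. The Capability struct and decode_capability function are hypothetical names used only for illustration; they are not part of Device.

```cpp
// Hypothetical helper, not part of Device: split a packed capability
// value (major * 10 + minor) back into its two components.
struct Capability
{
    unsigned int major_version;
    unsigned int minor_version;
};

inline Capability decode_capability(unsigned int packed)
{
    // Integer division recovers the major part; the remainder is the minor
    return Capability{packed / 10, packed % 10};
}
```

For example, a packed value of 86 decodes to major 8 and minor 6.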

Static Public Member Functions
static int	num_devices ()
	Get the number of available devices.

static bool	debug ()
	Whether verbose messages and error checking are enabled.

static bool	async ()
	Whether asynchronous operations are supported.

Detailed Description

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA	HIP/AMD	Description
thread	work item	individual local work element
warp	wavefront	"vectorized thread" operating in lockstep
block	workgroup	group of threads able to sync
multiprocessor	compute unit	hardware executing one or more blocks
multiprocessor	execution unit	hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.
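
The relationship between block size and resident blocks can be illustrated with a rough estimate. The sketch below assumes the only limit is the thread count per compute unit; real occupancy also depends on registers, shared memory, and hardware block limits, none of which are modeled here. DeviceLimits, warps_per_block, and max_resident_blocks are hypothetical names, with fields standing in for the values reported by max_threads_per_cu() and threads_per_warp().

```cpp
#include <cassert>

// Hypothetical plain-data stand-in for values that Device reports.
struct DeviceLimits
{
    int max_threads_per_cu;         // as from Device::max_threads_per_cu()
    unsigned int threads_per_warp;  // as from Device::threads_per_warp()
};

// Warps (wavefronts) needed for one block. Round up, because a partly
// filled warp still occupies a whole warp.
inline unsigned int
warps_per_block(DeviceLimits const& d, unsigned int block_size)
{
    assert(block_size > 0 && d.threads_per_warp > 0);
    return (block_size + d.threads_per_warp - 1) / d.threads_per_warp;
}

// Upper bound on blocks resident on one compute unit, counting threads
// only (registers and shared memory are ignored in this sketch).
inline unsigned int
max_resident_blocks(DeviceLimits const& d, unsigned int block_size)
{
    unsigned int threads = warps_per_block(d, block_size) * d.threads_per_warp;
    return static_cast<unsigned int>(d.max_threads_per_cu) / threads;
}
```

With 2048 threads per compute unit and 32 threads per warp, a block of 256 threads allows at most 8 resident blocks under this estimate, while a block of 100 threads (rounded up to 4 warps) allows at most 16.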

Warning: The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property, but global_device is global. CUDA must therefore be activated on every thread, using the same device ID, by calling activate_device or activate_device_local.
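
Why every thread needs its own activation can be shown with a small mock in plain C++. This is only an illustration of thread-local versus global state: active_device, global_device_id, mock_activate, and observe_on_new_thread are hypothetical names, and no CUDA or Celeritas calls are made.

```cpp
#include <thread>

// Mock of the per-thread "active device" state (-1 means not activated).
// This is a stand-in for illustration, not the CUDA runtime.
thread_local int active_device = -1;

// Mock of a device ID shared by all threads.
int const global_device_id = 0;

// Mock activation: records the device ID for the calling thread only.
inline void mock_activate(int id)
{
    active_device = id;
}

// Start a new thread and report which device it sees as active,
// optionally activating on that thread first.
inline int observe_on_new_thread(bool activate_first)
{
    int seen = -2;
    std::thread worker([&seen, activate_first] {
        if (activate_first)
        {
            mock_activate(global_device_id);
        }
        seen = active_device;
    });
    worker.join();
    return seen;
}
```

Activating on the main thread does not carry over: a new thread that skips activation still observes -1, and it observes the shared device ID only after activating itself.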

Todo: Const correctness for streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on "move".

Member Function Documentation

◆ async()

bool celeritas::Device::async ( )

static

Whether asynchronous operations are supported.

This is true by default if CUDA or HIP (5.2 <= HIP_VERSION < 5.7) is in use, and can be disabled by setting the CELER_DEVICE_ASYNC environment variable.
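
The default described above can be written as a small predicate. This is a sketch under assumptions: the Backend enum and default_async function are hypothetical, the HIP version is passed as separate major and minor numbers rather than read from the HIP_VERSION macro, and the CELER_DEVICE_ASYNC override is not modeled because its accepted values are not stated here.

```cpp
// Hypothetical description of the device backend in use.
enum class Backend
{
    none,
    cuda,
    hip
};

// Default async setting: true for CUDA, and for HIP only when the
// version lies in the half-open range [5.2, 5.7).
inline bool
default_async(Backend backend, int hip_major = 0, int hip_minor = 0)
{
    if (backend == Backend::cuda)
    {
        return true;
    }
    if (backend == Backend::hip)
    {
        // Combine major and minor so that versions compare as integers
        int version = hip_major * 100 + hip_minor;
        return version >= 502 && version < 507;
    }
    return false;
}
```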

◆ debug()

bool celeritas::Device::debug ( )

static

Whether verbose messages and error checking are enabled.

This is true if CELERITAS_DEBUG is set or if the CELER_DEBUG_DEVICE environment variable exists and is not empty.
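
The "exists and is not empty" test can be sketched with std::getenv. The functions env_nonempty and debug_enabled are hypothetical, and the build_debug parameter is an assumed stand-in for the CELERITAS_DEBUG setting; the actual implementation may differ, for example by caching the result.

```cpp
#include <cstdlib>

// True if the named environment variable exists and is not empty.
inline bool env_nonempty(char const* name)
{
    char const* value = std::getenv(name);
    return value != nullptr && value[0] != '\0';
}

// build_debug is an assumed stand-in for the CELERITAS_DEBUG setting.
inline bool debug_enabled(bool build_debug)
{
    return build_debug || env_nonempty("CELER_DEBUG_DEVICE");
}
```

Under this sketch, a variable that is set to an empty string counts the same as one that is unset.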

◆ destroy_streams()

void celeritas::Device::destroy_streams ( ) const

Deallocate all streams before shutting down CUDA.

Depending on initialization order, CUDA may be shut down (or shutting down) by the time the destructor for the global Device fires.

Note that this is used in the constructor to initialize a single global stream for the device. The streams_ vector is only empty when the device is false.

Todo: Const correctness for create_ and destroy_ streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on "move".

◆ num_devices()

int celeritas::Device::num_devices ( )

static

Get the number of available devices.

This is nonzero if and only if CUDA support is built in, at least one CUDA-capable device is present, and the CELER_DISABLE_DEVICE environment variable is not set.
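
The three conditions combine as a simple conjunction. In the sketch below, effective_num_devices and its parameters are hypothetical: the caller is assumed to supply whether device support was compiled in, how many devices the runtime detected, and whether CELER_DISABLE_DEVICE is set.

```cpp
// Hypothetical helper: all three documented conditions must hold for
// the reported device count to be nonzero.
inline int effective_num_devices(bool built_with_device_support,
                                 int detected_devices,
                                 bool disable_env_set)
{
    if (!built_with_device_support || disable_env_set
        || detected_devices <= 0)
    {
        return 0;
    }
    return detected_devices;
}
```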

