System¶
The system subdirectory provides uniform interfaces to hardware and the operating system.
Configuration¶
The corecel/Config.hh configure file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. In addition, it defines external C strings with configuration options such as key dependent library versions.
Additionally, corecel/Version.hh defines version numbers as a preprocessor definition, a set of integers, and a descriptive string. The external API of Celeritas should depend almost exclusively on the version, not the configured options.
-
CELERITAS_VERSION¶
Celeritas version as a compile-time constant.
Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.
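The encoding formula above can be sketched as follows. This is a minimal illustration, not the contents of corecel/Version.hh: the helper names (encode_version, version_major, version_minor, version_patch) and the example version numbers are assumptions made for this sketch.

```cpp
#include <cstdint>

// Hypothetical helper: pack a version with one byte per component,
// following the documented formula (major * 256 + minor) * 256 + patch.
constexpr std::uint32_t
encode_version(unsigned major, unsigned minor, unsigned patch)
{
    return (major * 256u + minor) * 256u + patch;
}

// Hypothetical helpers: unpack the individual components again.
constexpr unsigned version_major(std::uint32_t v) { return (v >> 16) & 0xffu; }
constexpr unsigned version_minor(std::uint32_t v) { return (v >> 8) & 0xffu; }
constexpr unsigned version_patch(std::uint32_t v) { return v & 0xffu; }

// A version such as 0.5.1 would encode to 0x000501, so encoded versions
// compare correctly as plain integers.
static_assert(encode_version(0, 5, 1) == 0x000501u, "one byte per component");
static_assert(encode_version(0, 5, 1) < encode_version(0, 6, 0), "ordering");
```

Because the encoded value orders the same way as the (major, minor, patch) triple, a single integer comparison against CELERITAS_VERSION should be enough to guard version-dependent code.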
GPU management¶
-
class Device¶
Manage attributes of the GPU.
CUDA/HIP translation table:
CUDA/NVIDIA     HIP/AMD         Description
--------------  --------------  -----------------------------------------
thread          work item       individual local work element
warp            wavefront       “vectorized thread” operating in lockstep
block           workgroup       group of threads able to sync
multiprocessor  compute unit    hardware executing one or more blocks
multiprocessor  execution unit  hardware executing one or more warps
Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.
- Todo:
Const correctness for streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on “move”.
Warning
The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.
-
void celeritas::activate_device()¶
Initialize the first device if available, when not using MPI.
Platform portability macros¶
The Macros.hh file also defines language and compiler abstraction macro definitions. It includes cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.
-
CELER_FUNCTION¶
Decorate a function that works on both host and device, with and without NVCC.
The name of this function and its siblings is based on the Kokkos naming scheme.
-
CELER_CONSTEXPR_FUNCTION¶
Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.
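A common way to build such decorators is to expand to host/device attributes only when a device compiler is active. The sketch below uses hypothetical SKETCH_ macros as stand-ins; the real definitions in Macros.hh may differ, and this version omits the forced inlining that CELER_CONSTEXPR_FUNCTION is documented to request.

```cpp
// Hypothetical stand-ins for the portability macros (assumption: the real
// macros are conditioned on the device compiler in a similar way).
#if defined(__CUDACC__) || defined(__HIP__)
#    define SKETCH_FUNCTION __host__ __device__
#else
#    define SKETCH_FUNCTION
#endif
#define SKETCH_CONSTEXPR_FUNCTION constexpr SKETCH_FUNCTION

// Callable from host code, and from device code when compiled by NVCC/HIP.
SKETCH_FUNCTION inline double axpy(double a, double x, double y)
{
    return a * x + y;
}

// Additionally usable in constant expressions.
SKETCH_CONSTEXPR_FUNCTION int square(int x)
{
    return x * x;
}

static_assert(square(3) == 9, "evaluated at compile time");
```

With a plain C++ compiler the attribute macro expands to nothing, so the same header can be included from both .cc and .cu files.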
-
CELER_DEVICE_COMPILE¶
Defined and true if building device code in HIP or CUDA.
This is a generic replacement for __CUDA_ARCH__.
The DeviceRuntimeApi file, which must be included from all .cu files and any .cc files that make CUDA/HIP API calls (see Device compilation), provides cross-platform compatibility macros for building against CUDA and HIP.
-
CELER_DEVICE_API_SYMBOL(TOK)¶
Add a prefix “hip” or “cuda” to a code token.
An assertion macro in Assert.hh
checks the return result of CUDA/HIP API calls and throws a detailed exception if they fail:
-
CELER_DEVICE_API_CALL(STMT)¶
Safely and portably dispatch a CUDA/HIP API call.
When CUDA or HIP support is enabled, prepend “cuda” or “hip” to the wrapped statement, execute it, and throw a RuntimeError if it fails. If no device platform is enabled, throw an unconfigured assertion.
Example:
CELER_DEVICE_API_CALL(Malloc(&ptr_gpu, 100 * sizeof(float)));
CELER_DEVICE_API_CALL(DeviceSynchronize());
Note
A file that uses this macro must include corecel/DeviceRuntimeApi.hh. The CorecelDeviceRuntimeApiHh declaration enforces this when CUDA/HIP are disabled, and the absence of CELER_DEVICE_API_SYMBOL enforces it when enabled.
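The prefixing and error checking described above can be illustrated with token pasting. This is only a sketch under stated assumptions: the "fake" runtime API, the SKETCH_ macros, and the use of std::runtime_error are hypothetical stand-ins so the example compiles without CUDA or HIP; the real macros and exception type live in Celeritas.

```cpp
#include <stdexcept>

// Hypothetical runtime API standing in for the cuda*/hip* functions.
enum fakeError_t
{
    fakeSuccess = 0,
    fakeErrorInvalidValue = 1
};

inline fakeError_t fakeSetDevice(int id)
{
    return id >= 0 ? fakeSuccess : fakeErrorInvalidValue;
}

// Two-level concatenation so that SKETCH_PREFIX is expanded before pasting.
#define SKETCH_CAT_(A, B) A##B
#define SKETCH_CAT(A, B) SKETCH_CAT_(A, B)
#define SKETCH_PREFIX fake
#define SKETCH_DEVICE_API_SYMBOL(TOK) SKETCH_CAT(SKETCH_PREFIX, TOK)

// Prefix the call, execute it, and throw if it does not report success.
#define SKETCH_DEVICE_API_CALL(STMT)                                    \
    do                                                                  \
    {                                                                   \
        auto result_ = SKETCH_DEVICE_API_SYMBOL(STMT);                  \
        if (result_ != SKETCH_DEVICE_API_SYMBOL(Success))               \
        {                                                               \
            throw std::runtime_error("device API call failed: " #STMT); \
        }                                                               \
    } while (0)
```

Here SKETCH_DEVICE_API_CALL(SetDevice(0)) expands to a call to fakeSetDevice(0), mirroring how the documented example turns Malloc(...) into a platform-specific call.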
Environment variables¶
-
class Environment¶
Interrogate and extend environment variables.
This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.
Unlike the standard environment which returns a null pointer for an unset variable, this returns an empty string.
Note
This class is not thread-safe on its own. The celeritas::getenv free function however is safe, although it should only be used in setup (single-thread) steps.
Note
Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call celeritas::environment() = {}; but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.
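The insert-once behavior described above might look like the following sketch. SketchEnvironment is a hypothetical stand-in written for illustration; it is not the Celeritas Environment class, and its member names are assumptions.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for an insert-once environment map.
class SketchEnvironment
{
  public:
    // Insert a value only if the key is absent; return whether it was added.
    bool insert(std::string const& key, std::string const& value)
    {
        return vars_.emplace(key, value).second;
    }

    // Look up a key, loading it from the system environment on first access.
    std::string const& operator[](std::string const& key)
    {
        auto iter = vars_.find(key);
        if (iter == vars_.end())
        {
            char const* sys_value = std::getenv(key.c_str());
            // An unset variable is stored as an empty string rather than
            // being reported as a null pointer.
            iter = vars_.emplace(key, sys_value ? sys_value : "").first;
        }
        return iter->second;
    }

  private:
    std::map<std::string, std::string> vars_;
};
```

Because the first stored value wins, a value inserted up front (for example from JSON input) takes precedence over the system environment, and later lookups always see the same string.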
-
Environment &celeritas::environment()¶
Access a static global environment variable.
This static variable should be shared among Celeritas objects.
-
std::string const &celeritas::getenv(std::string const &key)¶
Thread-safe access to global modified environment variables.
This function will insert the current value of the key into the environment, which remains immutable over the lifetime of the program (allowing the use of
static const
data to be set from the environment).
-
GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)¶
Get a true/false flag with a default value.
The return value is a pair that has (1) the flag as determined by the environment variable or default value, and (2) an “insertion” flag specifying whether the default was used. The insertion result can be useful for providing a diagnostic message to the user.
As with the general Environment instance that this references, any already-set values (e.g., from JSON input) override whatever variables are in the system environment (e.g., from the shell script that invoked this executable).
- Allowed true values: "1", "t", "yes", "true", "True"
- Allowed false values: "0", "f", "no", "false", "False"
- Empty value returns the default
- Other value warns and returns the default
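The parsing rules listed above can be sketched as a small function. The name parse_flag and the std::pair return type are assumptions for illustration (the documented function returns a GetenvFlagResult), and this sketch silently falls back to the default where the real function also emits a warning.

```cpp
#include <initializer_list>
#include <string>
#include <utility>

// Hypothetical parser: returns (flag value, whether the default was used).
inline std::pair<bool, bool>
parse_flag(std::string const& value, bool default_val)
{
    for (char const* s : {"1", "t", "yes", "true", "True"})
    {
        if (value == s)
        {
            return {true, false};
        }
    }
    for (char const* s : {"0", "f", "no", "false", "False"})
    {
        if (value == s)
        {
            return {false, false};
        }
    }
    // Empty or unrecognized value: use the default (no warning in this sketch).
    return {default_val, true};
}
```

A caller could use the second element to decide whether to print a diagnostic saying that the default was applied.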
MPI support¶
-
class ScopedMpiInit¶
RAII class for initializing and finalizing MPI.
The CELER_DISABLE_PARALLEL environment variable can be used to turn off MPI calls when built with CELERITAS_USE_MPI.
- Todo:
Change to CELER_ENABLE_MPI.
Note
Unlike the MpiCommunicator and MpiOperations classes, it is not necessary to link against MPI to use this class.
-
class MpiCommunicator¶
Wrap an MPI communicator.
This class uses ScopedMpiInit to determine whether MPI is available and enabled. As many instances as desired can be created, but Celeritas by default will share the instance returned by comm_world, which defaults to MPI_COMM_WORLD if MPI has been initialized, or a “self” comm if it has not.
A “null” communicator (the default) does not use MPI calls and can be constructed without calling MPI_Init or having MPI compiled. It will act like MPI_COMM_SELF but will not actually use MPI calls.
Note
This does not perform any copying or freeing of MPI communicators.
Performance profiling¶
These classes generalize the different low-level profiling libraries, both device and host, described in Performance profiling.
-
class ScopedProfiling¶
Enable and annotate performance profiling during the lifetime of this class.
This RAII class annotates the profiling output so that, during its scope, events and timing are associated with the given name. For use cases inside separate begin/end functions of a class (often seen in Geant4), use std::optional to start and end the class lifetime.
This is useful for wrapping a specific code fragment in a range for profiling, e.g., ignoring VecGeom instantiation kernels, or profiling a specific action. It is very similar to the NVTX .
Example:
void do_program()
{
    do_setup();
    ScopedProfiling profile_this{"run"};
    do_run();
}
Caveats:
- The Nvidia/CUDA implementation of ScopedProfiling only does something when the application using Celeritas is run through a tool that supports NVTX, e.g., Nsight Compute with the --nvtx argument. If this is not the case, API calls to NVTX are no-ops.
- The HIP/AMD ROCTX implementation requires the roctx library, which may not be available on all systems.
- The CPU implementation requires Perfetto. It is not available when Celeritas is built with device support (CUDA/HIP).
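The std::optional pattern mentioned for separate begin/end functions can be sketched as below. SketchScopedRange and SketchRunAction are hypothetical stand-ins: the first mimics an RAII profiling range by recording events in a vector, and the second imitates a class with separate begin and end hooks. Neither is part of Celeritas or Geant4.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII range that records when it begins and ends.
class SketchScopedRange
{
  public:
    SketchScopedRange(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name))
    {
        log_.push_back("begin " + name_);
    }
    ~SketchScopedRange() { log_.push_back("end " + name_); }

    SketchScopedRange(SketchScopedRange const&) = delete;
    SketchScopedRange& operator=(SketchScopedRange const&) = delete;

  private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Hypothetical class whose begin/end hooks are called separately.
class SketchRunAction
{
  public:
    explicit SketchRunAction(std::vector<std::string>& log) : log_(log) {}

    // Construct the range in place: the profiled region starts here.
    void begin_run() { range_.emplace(log_, "run"); }

    // Destroy the range: the profiled region ends here.
    void end_run() { range_.reset(); }

  private:
    std::vector<std::string>& log_;
    std::optional<SketchScopedRange> range_;
};
```

Holding the RAII object in a std::optional member lets its lifetime span two member function calls without heap allocation, and reset() gives an explicit end point.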
I/O¶
These functions and classes are for communicating helpfully with the user.
-
CELER_LOG(LEVEL)¶
Return a LogMessage object for streaming into at the given level.
The regular CELER_LOG call is for code paths that happen uniformly in parallel, with approximately the same message from every thread and task.
The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.
CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();
-
CELER_LOG_LOCAL(LEVEL)¶
Like CELER_LOG but for code paths that may only happen on a single process or thread.
Use sparingly because this can be very verbose. This is typically used only for error messages coming from an event or track at runtime.
-
enum class celeritas::LogLevel
Enumeration for how important a log message is.
Values:
-
enumerator debug
Debugging messages.
-
enumerator diagnostic
Diagnostics about current program execution.
-
enumerator status
Program execution status (what stage is beginning)
-
enumerator info
Important informational messages.
-
enumerator warning
Warnings about unusual events.
-
enumerator error
Something went wrong, but execution can continue.
-
enumerator critical
Something went terribly wrong, should probably abort.
-
enumerator size_
Sentinel value for looping over log levels.
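The size_ sentinel makes it possible to iterate over every level without hard-coding the count. The sketch below mirrors the enumerators listed above in a hypothetical SketchLogLevel; the to_cstring and all_level_names helpers are assumptions written for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the documented enumerators.
enum class SketchLogLevel
{
    debug,
    diagnostic,
    status,
    info,
    warning,
    error,
    critical,
    size_  // sentinel: number of real levels
};

// Hypothetical helper: human-readable name for a level.
inline char const* to_cstring(SketchLogLevel lev)
{
    switch (lev)
    {
        case SketchLogLevel::debug: return "debug";
        case SketchLogLevel::diagnostic: return "diagnostic";
        case SketchLogLevel::status: return "status";
        case SketchLogLevel::info: return "info";
        case SketchLogLevel::warning: return "warning";
        case SketchLogLevel::error: return "error";
        case SketchLogLevel::critical: return "critical";
        default: return "";
    }
}

// Loop from the first enumerator up to (but not including) the sentinel.
inline std::vector<std::string> all_level_names()
{
    std::vector<std::string> names;
    for (int i = 0; i != static_cast<int>(SketchLogLevel::size_); ++i)
    {
        names.push_back(to_cstring(static_cast<SketchLogLevel>(i)));
    }
    return names;
}
```

Because the enumerators are ordered by importance, the same integer ordering can also be used to filter messages below a chosen verbosity threshold.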
-
class Logger¶
Create a log message to be printed based on output/verbosity settings.
This should generally be called by the world_logger and self_logger functions below. The call operator() returns an object that should be streamed into in order to create a log message.
This object is assignable, so to replace the default log handler with a different one, you can call
world_logger = Logger(my_handler);
When using with MPI, the world_logger global objects are different on each process: rank 0 will have a handler that outputs to screen, and the other ranks will have a “null” handler that suppresses all log output.
- Todo:
For v1.0, replace the back-end with spdlog to reduce maintenance burden and improve flexibility.
-
class ScopedSignalHandler¶
Catch the given signal type within the scope of the handler.
On instantiation with a non-empty argument, this class registers a signal handler for the given signal. A class instance is true if and only if the class is handling a signal. The instance’s “call” operator will check and return whether the assigned signal has been caught. The move-assign operator can be used to unregister the handler.
When the class exits scope, the signal for the active type will be cleared.
Signal handling can be disabled by setting the CELER_DISABLE_SIGNALS environment variable, but hopefully this will not be necessary because signal handling should be used sparingly.
#include <csignal>

int main()
{
    ScopedSignalHandler interrupted(SIGINT);
    while (true)
    {
        if (interrupted())
        {
            CELER_LOG(error) << "Interrupted";
            break;
        }
        if (stop_handling_for_whatever_reason())
        {
            // Clear handler
            interrupted = {};
        }
    }
    return interrupted() ? 1 : 0;
}
Warning
This class is not thread safe. If multiple threads have this in scope, only one active (and indeterminate!) thread will mark the flag as intercepted.