System

The system subdirectory provides uniform interfaces to hardware and the operating system.

Configuration

The corecel/Config.hh configure file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. In addition, it defines external C strings with configuration options such as key dependent library versions.
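As a minimal sketch of why 0/1 definitions are convenient, the following uses a hypothetical DEMO_USE_DEVICE macro (a stand-in, not an actual Celeritas option) inside if constexpr. Unlike an #ifdef block, both branches are still parsed and checked by the compiler, so the disabled code path cannot silently rot.

```cpp
#include <string>

// Hypothetical option standing in for a definition from a generated
// configuration header; the name is an assumption for this sketch
#define DEMO_USE_DEVICE 0

// Select a backend name at compile time without preprocessor conditionals
inline std::string backend_name()
{
    if constexpr (DEMO_USE_DEVICE)
    {
        return "device";
    }
    else
    {
        return "host";
    }
}
```

With the macro set to 0 as above, backend_name() takes the "host" branch.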

Additionally, corecel/Version.hh defines the version number as a preprocessor definition, a set of integers, and a descriptive string. The external API of Celeritas should depend almost exclusively on the version, not the configured options.

CELERITAS_VERSION

Celeritas version as a compile-time constant.

Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.
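The encoding formula can be sketched as follows. The helper names (encode_version, version_major, and so on) are hypothetical and exist only to illustrate the arithmetic; they are not part of the Celeritas API.

```cpp
#include <cstdint>

// Hypothetical helper: pack a version triplet, one byte per component
constexpr std::uint32_t
encode_version(std::uint32_t major, std::uint32_t minor, std::uint32_t patch)
{
    return (major * 256 + minor) * 256 + patch;
}

// Hypothetical helpers: recover each component from an encoded version
constexpr std::uint32_t version_major(std::uint32_t v)
{
    return (v >> 16) & 0xffu;
}
constexpr std::uint32_t version_minor(std::uint32_t v)
{
    return (v >> 8) & 0xffu;
}
constexpr std::uint32_t version_patch(std::uint32_t v)
{
    return v & 0xffu;
}

// Version 1.2.3 encodes to 0x010203
static_assert(encode_version(1, 2, 3) == 0x010203, "one byte per component");
// Encoded values sort in the same order as (major, minor, patch)
static_assert(encode_version(0, 5, 0) > encode_version(0, 4, 9), "ordering");
```

Because the encoded integers preserve version ordering, a single numeric comparison against CELERITAS_VERSION is enough to select code by release.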

GPU management

class Device

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA      HIP/AMD          Description
thread           work item        individual local work element
warp             wavefront        “vectorized thread” operating in lockstep
block            workgroup        group of threads able to sync
multiprocessor   compute unit     hardware executing one or more blocks
multiprocessor   execution unit   hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.

Todo:

Const correctness for streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on “move”.

Warning

The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.

Device const &celeritas::device()

Get the shared default device.

void celeritas::activate_device()

Initialize the first device if available, when not using MPI.

Platform portability macros

The Macros.hh file also defines language and compiler abstraction macro definitions. It includes cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.

CELER_FUNCTION

Decorate a function that works on both host and device, with and without NVCC.

The name of this function and its siblings is based on the Kokkos naming scheme.

CELER_CONSTEXPR_FUNCTION

Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.

CELER_DEVICE_COMPILE

Defined and true if building device code in HIP or CUDA.

This is a generic replacement for __CUDA_ARCH__ .
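The general pattern behind such macros can be sketched as below. The DEMO_ macro names are hypothetical and the definitions are an assumption about how this kind of abstraction is typically written, not the contents of Macros.hh. In particular, this sketch defines its compile flag as 0 on the host, which may differ from the real macro.

```cpp
// Hypothetical macros illustrating the pattern; not the Celeritas definitions
#if defined(__CUDACC__) || defined(__HIP__)
// CUDA or HIP compiler: generate both host and device versions
#    define DEMO_FUNCTION __host__ __device__
#else
// Plain C++ compiler: the decoration expands to nothing
#    define DEMO_FUNCTION
#endif

// True only during the device compilation pass of either platform
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#    define DEMO_DEVICE_COMPILE 1
#else
#    define DEMO_DEVICE_COMPILE 0
#endif

// A function callable from host code and, under CUDA/HIP, from kernels
DEMO_FUNCTION inline int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}

// Report whether the current compilation pass targets the device
DEMO_FUNCTION inline bool is_device_pass()
{
    return DEMO_DEVICE_COMPILE != 0;
}
```

The same source then builds unchanged with a host-only compiler, where the decoration disappears and is_device_pass() is false.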

The DeviceRuntimeApi.hh file, which must be included from all .cu files and any .cc files that make CUDA/HIP API calls (see Device compilation), provides cross-platform compatibility macros for building against CUDA and HIP.

CELER_DEVICE_API_SYMBOL(TOK)

Add a prefix “hip” or “cuda” to a code token.
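A prefixing macro of this kind is usually built on preprocessor token pasting. The sketch below assumes hypothetical DEMO_ names and a hard-coded platform switch; it only demonstrates the mechanism and turns the resulting token into a string so it can be inspected without a device runtime.

```cpp
#include <string>

// Hypothetical platform switch; a real build would take it from configuration
#define DEMO_PREFIX_IS_HIP 0

#if DEMO_PREFIX_IS_HIP
#    define DEMO_API_SYMBOL(TOK) hip##TOK
#else
#    define DEMO_API_SYMBOL(TOK) cuda##TOK
#endif

// Two-level stringification so the pasted token is expanded before quoting
#define DEMO_STRINGIFY_IMPL(X) #X
#define DEMO_STRINGIFY(X) DEMO_STRINGIFY_IMPL(X)

// Show the token produced for the suffix "Malloc"
inline std::string malloc_symbol_name()
{
    return DEMO_STRINGIFY(DEMO_API_SYMBOL(Malloc));
}
```

With the switch set to 0, the suffix Malloc becomes the token cudaMalloc; flipping it to 1 would yield hipMalloc.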

An assertion macro in Assert.hh checks the return result of CUDA/HIP API calls and throws a detailed exception if they fail:

CELER_DEVICE_API_CALL(STMT)

Safely and portably dispatch a CUDA/HIP API call.

When CUDA or HIP support is enabled, execute the wrapped statement with “cuda” or “hip” prepended to it, and throw a RuntimeError if it fails. If no device platform is enabled, throw an unconfigured assertion.

Example:

CELER_DEVICE_API_CALL(Malloc(&ptr_gpu, 100 * sizeof(float)));
CELER_DEVICE_API_CALL(DeviceSynchronize());

Note

A file that uses this macro must include corecel/DeviceRuntimeApi.hh . The CorecelDeviceRuntimeApiHh declaration enforces this when CUDA/HIP are disabled, and the absence of CELER_DEVICE_API_SYMBOL enforces it when they are enabled.
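The check-and-throw behavior can be illustrated without a GPU toolchain. In the sketch below, demoError_t, demoSetDevice, and DEMO_DEVICE_API_CALL are all hypothetical stand-ins for a device runtime and its wrapper macro, and std::runtime_error replaces the Celeritas exception type.

```cpp
#include <stdexcept>
#include <string>

// Stand-ins for a device runtime so the sketch builds without CUDA or HIP
enum demoError_t
{
    demoSuccess = 0,
    demoErrorInvalidValue = 1
};

inline demoError_t demoSetDevice(int id)
{
    return id >= 0 ? demoSuccess : demoErrorInvalidValue;
}

// Prefix the call with the platform name, run it, and throw on failure
#define DEMO_DEVICE_API_CALL(STMT)                                     \
    do                                                                 \
    {                                                                  \
        demoError_t demo_result_ = demo##STMT;                         \
        if (demo_result_ != demoSuccess)                               \
        {                                                              \
            throw std::runtime_error(                                  \
                std::string("demo" #STMT " failed with code ")         \
                + std::to_string(static_cast<int>(demo_result_)));     \
        }                                                              \
    } while (0)

// Convert the exception into a boolean for demonstration purposes
inline bool set_device_or_report(int id)
{
    try
    {
        DEMO_DEVICE_API_CALL(SetDevice(id));
    }
    catch (std::runtime_error const&)
    {
        return false;
    }
    return true;
}
```

The token paste applies to the first token of the statement, so SetDevice(id) becomes demoSetDevice(id), while the stringified statement gives the exception a readable description of the failing call.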

Environment variables

class Environment

Interrogate and extend environment variables.

This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.

Unlike the standard environment which returns a null pointer for an unset variable, this returns an empty string.

Note

This class is not thread-safe on its own. The celeritas::getenv free function however is safe, although it should only be used in setup (single-thread) steps.

Note

Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call celeritas::environment() = {}; but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.

Environment &celeritas::environment()

Access a static global environment variable.

This static variable should be shared among Celeritas objects.

std::string const &celeritas::getenv(std::string const &key)

Thread-safe access to global modified environment variables.

This function will insert the current value of the key into the environment, which remains immutable over the lifetime of the program (allowing the use of static const data to be set from the environment).
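The insert-once behavior described above can be sketched with a small map wrapper. DemoEnvironment is a hypothetical class written for this illustration; its mutex stands in for the locking that the free function provides, and it is not the Celeritas implementation.

```cpp
#include <cstdlib>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for an insert-once environment map
class DemoEnvironment
{
  public:
    // Store a value unless the key already exists; return whether it was added
    bool insert(std::string const& key, std::string const& value)
    {
        std::lock_guard<std::mutex> lock{mutex_};
        return vars_.emplace(key, value).second;
    }

    // Return the cached value, loading from the system environment on first use
    std::string const& get(std::string const& key)
    {
        std::lock_guard<std::mutex> lock{mutex_};
        auto iter = vars_.find(key);
        if (iter == vars_.end())
        {
            // An unset variable is cached as an empty string, not a null pointer
            char const* sys_value = std::getenv(key.c_str());
            iter = vars_.emplace(key, sys_value ? sys_value : "").first;
        }
        return iter->second;
    }

  private:
    std::mutex mutex_;
    std::map<std::string, std::string> vars_;
};
```

Because std::map never invalidates references to existing elements on insertion, the returned reference stays valid, which is what makes caching the result in a static const variable safe in this sketch.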

GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)

Get a true/false flag with a default value.

The return value is a pair that has (1) the flag as determined by the environment variable or default value, and (2) an “insertion” flag specifying whether the default was used. The insertion result can be useful for providing a diagnostic message to the user.

As with the general Environment instance that this references, any already-set values (e.g., from JSON input) override whatever variables are in the system environment (e.g., from the shell script that invoked this executable).

  • Allowed true values: "1", "t", "yes", "true", "True"

  • Allowed false values: "0", "f", "no", "false", "False"

  • Empty value returns the default

  • Other value warns and returns the default
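The parsing rules listed above can be sketched as a standalone function. The name parse_flag and the use of std::pair are assumptions for this illustration; the sketch also omits the warning that is emitted for unrecognized values.

```cpp
#include <initializer_list>
#include <string>
#include <utility>

// Hypothetical parser following the listed rules.
// Returns {flag, used_default}.
inline std::pair<bool, bool>
parse_flag(std::string const& value, bool default_val)
{
    for (char const* allowed : {"1", "t", "yes", "true", "True"})
    {
        if (value == allowed)
        {
            return {true, false};
        }
    }
    for (char const* allowed : {"0", "f", "no", "false", "False"})
    {
        if (value == allowed)
        {
            return {false, false};
        }
    }
    // Empty and unrecognized values both fall back to the default
    return {default_val, true};
}
```

A caller can inspect the second element to decide whether to tell the user that a default was applied.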

MPI support

class ScopedMpiInit

RAII class for initializing and finalizing MPI.

The CELER_DISABLE_PARALLEL environment variable can be used to turn off MPI calls when built with CELERITAS_USE_MPI .

Todo:

Change to CELER_ENABLE_MPI .

Note

Unlike the MpiCommunicator and MpiOperations classes, it is not necessary to link against MPI to use this class.

class MpiCommunicator

Wrap an MPI communicator.

This class uses ScopedMpiInit to determine whether MPI is available and enabled. As many instances as desired can be created, but Celeritas by default will share the instance returned by comm_world , which defaults to MPI_COMM_WORLD if MPI has been initialized, or a “self” comm if it has not.

A “null” communicator (the default) does not use MPI calls and can be constructed without calling MPI_Init or having MPI compiled. It will act like MPI_Comm_Self but will not actually use MPI calls.

Note

This does not perform any copying or freeing of MPI communicators.

Performance profiling

These classes generalize the different low-level profiling libraries, both device and host, described in Performance profiling.

class ScopedProfiling

Enable and annotate performance profiling during the lifetime of this class.

This RAII class annotates the profiling output so that, during its scope, events and timing are associated with the given name. For use cases inside separate begin/end functions of a class (often seen in Geant4), use std::optional to start and end the class lifetime.

This is useful for wrapping a specific code fragment in a range for profiling, e.g., ignoring VecGeom instantiation kernels, or profiling a specific action. It is very similar to the NVTX .

Example:

void do_program()
{
    do_setup();
    ScopedProfiling profile_this{"run"};
    do_run();
}
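For the separate begin/end use case mentioned above, std::optional can hold the RAII object so that its lifetime spans two member function calls. The sketch below uses a hypothetical DemoScopedRange that records events in a vector in place of a real profiling annotation; DemoRunAction is likewise an invented stand-in for a user action class.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII annotation that records begin/end events in a log
class DemoScopedRange
{
  public:
    DemoScopedRange(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name))
    {
        log_.push_back("begin " + name_);
    }
    ~DemoScopedRange() { log_.push_back("end " + name_); }

    DemoScopedRange(DemoScopedRange const&) = delete;
    DemoScopedRange& operator=(DemoScopedRange const&) = delete;

  private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Hypothetical class with separate begin/end hooks
class DemoRunAction
{
  public:
    explicit DemoRunAction(std::vector<std::string>& log) : log_(log) {}

    // Start the range by constructing the RAII object in place
    void begin_run() { range_.emplace(log_, "run"); }
    // End the range by destroying the RAII object
    void end_run() { range_.reset(); }

  private:
    std::vector<std::string>& log_;
    std::optional<DemoScopedRange> range_;
};
```

The emplace call works even though the annotation type is neither copyable nor movable, and reset ends the range at a point chosen by the caller rather than at the end of a scope.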

Caveats:

  • The Nvidia/CUDA implementation of ScopedProfiling only does something when the application using Celeritas is run through a tool that supports NVTX, e.g., Nsight Compute with the --nvtx argument. If this is not the case, API calls to NVTX are no-ops.

  • The HIP/AMD ROCTX implementation requires the roctx library, which may not be available on all systems.

  • The CPU implementation requires Perfetto. It is not available when Celeritas is built with device support (CUDA/HIP).

I/O

These functions and classes are for communicating helpfully with the user.

CELER_LOG(LEVEL)

Return a LogMessage object for streaming into at the given level.

The regular CELER_LOG call is for code paths that happen uniformly in parallel, approximately the same message from every thread and task.

The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.

CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();

CELER_LOG_LOCAL(LEVEL)

Like CELER_LOG but for code paths that may only happen on a single process or thread.

Use sparingly because this can be very verbose. This is typically used only for error messages coming from an event or track at runtime.

enum class celeritas::LogLevel

Enumeration for how important a log message is.

Values:

enumerator debug

Debugging messages.

enumerator diagnostic

Diagnostics about current program execution.

enumerator status

Program execution status (what stage is beginning)

enumerator info

Important informational messages.

enumerator warning

Warnings about unusual events.

enumerator error

Something went wrong, but execution can continue.

enumerator critical

Something went terribly wrong, should probably abort.

enumerator size_

Sentinel value for looping over log levels.
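The role of the size_ sentinel can be sketched with a mirror of the enumeration. DemoLogLevel, to_cstring, and levels_at_least are hypothetical names used only in this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the enumeration above, for illustration only
enum class DemoLogLevel
{
    debug,
    diagnostic,
    status,
    info,
    warning,
    error,
    critical,
    size_
};

// Map a level to a printable name
inline char const* to_cstring(DemoLogLevel lev)
{
    static char const* const names[] = {"debug",
                                        "diagnostic",
                                        "status",
                                        "info",
                                        "warning",
                                        "error",
                                        "critical"};
    return names[static_cast<int>(lev)];
}

// Collect the names of all levels at or above a threshold
inline std::vector<std::string> levels_at_least(DemoLogLevel threshold)
{
    std::vector<std::string> result;
    int const end = static_cast<int>(DemoLogLevel::size_);
    for (int i = static_cast<int>(threshold); i < end; ++i)
    {
        result.push_back(to_cstring(static_cast<DemoLogLevel>(i)));
    }
    return result;
}
```

Since the enumerators are ordered by importance, a verbosity threshold reduces to an integer comparison, and the sentinel provides the loop bound without hard-coding the number of levels.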

class Logger

Create a log message to be printed based on output/verbosity settings.

This should generally be called by the world_logger and self_logger functions below. The call operator() returns an object that should be streamed into in order to create a log message.

This object is assignable, so to replace the default log handler with a different one, you can call

world_logger = Logger(my_handler);

When using with MPI, the world_logger global objects are different on each process: rank 0 will have a handler that outputs to screen, and the other ranks will have a “null” handler that suppresses all log output.
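The handler replacement and the “null” handler behavior can be sketched with a simplified logger. All names below (DemoLogHandler, DemoLogMessage, DemoLogger) are hypothetical, the handler signature is an assumption, and log levels and source provenance are left out; the sketch requires C++17.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical handler signature: receives the finished message text
using DemoLogHandler = std::function<void(std::string const&)>;

// Temporary that accumulates streamed values and flushes on destruction
class DemoLogMessage
{
  public:
    explicit DemoLogMessage(DemoLogHandler const* handler) : handler_{handler}
    {
    }
    ~DemoLogMessage()
    {
        if (handler_)
        {
            (*handler_)(os_.str());
        }
    }

    // Skip formatting entirely when the message will be suppressed
    template<class T>
    DemoLogMessage& operator<<(T const& value)
    {
        if (handler_)
        {
            os_ << value;
        }
        return *this;
    }

  private:
    DemoLogHandler const* handler_;
    std::ostringstream os_;
};

// Assignable logger: a default-constructed one has a "null" handler
class DemoLogger
{
  public:
    DemoLogger() = default;
    explicit DemoLogger(DemoLogHandler handler) : handler_{std::move(handler)}
    {
    }

    // Return an object to stream into; it is flushed at the end of the statement
    DemoLogMessage operator()() const
    {
        return DemoLogMessage{handler_ ? &handler_ : nullptr};
    }

  private:
    DemoLogHandler handler_;
};
```

Assigning a default-constructed DemoLogger over an existing one silences all later messages, which mirrors how non-root ranks can be given a handler that suppresses output.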

Todo:

For v1.0, replace the back-end with spdlog to reduce maintenance burden and improve flexibility.

class ScopedSignalHandler

Catch the given signal type within the scope of the handler.

On instantiation with a non-empty argument, this class registers a signal handler for the given signal. A class instance is true if and only if the class is handling a signal. The instance’s “call” operator will check and return whether the assigned signal has been caught. The move-assign operator can be used to unregister the handler.

When the class exits scope, the signal for the active type will be cleared.

Signal handling can be disabled by setting the CELER_DISABLE_SIGNALS environment variable, but hopefully this will not be necessary because signal handling should be used sparingly.

#include <csignal>

int main()
{
   ScopedSignalHandler interrupted(SIGINT);

   while (true)
   {
       if (interrupted())
       {
           CELER_LOG(error) << "Interrupted";
           break;
       }

       if (stop_handling_for_whatever_reason())
       {
           // Clear handler
           interrupted = {};
       }
   }
   return interrupted() ? 1 : 0;
}

Warning

This class is not thread safe. If multiple threads have this in scope, only one active (and indeterminate!) thread will mark the flag as intercepted.
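The underlying mechanism, and the reason for the thread-safety warning, can be sketched with the standard std::signal facility. DemoScopedSignal and its helpers are hypothetical; the single global flag is an assumption that shows why concurrent instances would interfere with each other.

```cpp
#include <csignal>

// Flag written by the handler; sig_atomic_t is safe to set from a handler
static volatile std::sig_atomic_t demo_signal_caught = 0;

// Minimal handler: only records that the signal arrived
static void demo_handle_signal(int)
{
    demo_signal_caught = 1;
}

// Hypothetical RAII wrapper: install a handler and restore the old one on exit
class DemoScopedSignal
{
  public:
    explicit DemoScopedSignal(int signum)
        : signum_{signum}, previous_{std::signal(signum, demo_handle_signal)}
    {
        demo_signal_caught = 0;
    }

    ~DemoScopedSignal()
    {
        if (previous_ != SIG_ERR)
        {
            std::signal(signum_, previous_);
        }
        demo_signal_caught = 0;
    }

    DemoScopedSignal(DemoScopedSignal const&) = delete;
    DemoScopedSignal& operator=(DemoScopedSignal const&) = delete;

    // Whether the signal has been caught since construction
    bool operator()() const { return demo_signal_caught != 0; }

  private:
    int signum_;
    void (*previous_)(int);
};
```

Because the flag and the installed handler are process-wide state, two threads holding such an object at once would share and overwrite them.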