System¶

The system subdirectory provides uniform interfaces to hardware and the operating system.

Configuration¶

The corecel/Config.hh configuration file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. In addition, it defines external C strings with configuration options such as the versions of key dependent libraries.
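As a minimal sketch of why 0/1 defines are convenient, the snippet below branches on an option with if constexpr, so both branches are parsed by the compiler regardless of the configuration. The local definition of CELERITAS_USE_MPI is only a stand-in for what corecel/Config.hh would provide in a real build, and effective_num_ranks is a hypothetical helper, not part of Celeritas.

```cpp
#include <cassert>

// Stand-in for the definition that corecel/Config.hh would provide in a real
// build; it is defined locally only so that this sketch compiles by itself.
#ifndef CELERITAS_USE_MPI
#    define CELERITAS_USE_MPI 0
#endif

// Hypothetical helper: choose how many ranks to use given a requested count
inline int effective_num_ranks(int requested)
{
    assert(requested >= 1);
    if constexpr (CELERITAS_USE_MPI)
    {
        // MPI-enabled configuration: honor the request
        return requested;
    }
    else
    {
        // Serial configuration: always a single rank
        return 1;
    }
}
```

With an undefined-versus-defined macro this would need an #ifdef block, and the disabled branch would never be checked by the compiler.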

Additionally, corecel/Version.hh defines version numbers as a preprocessor definition, a set of integers, and a descriptive string. The external API of Celeritas should depend almost exclusively on the version, not the configured options.

CELERITAS_VERSION¶

Celeritas version as a compile-time constant.

Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.
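The encoding formula can be sketched as follows. DemoVersion, encode_version, and decode_version are hypothetical names used only for illustration; they are not part of corecel/Version.hh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of corecel/Version.hh
struct DemoVersion
{
    std::uint32_t vmajor;
    std::uint32_t vminor;
    std::uint32_t vpatch;
};

// Pack one byte per component: (major * 256 + minor) * 256 + patch
constexpr std::uint32_t encode_version(DemoVersion v)
{
    // Each component must fit in a single byte
    assert(v.vmajor < 256 && v.vminor < 256 && v.vpatch < 256);
    return (v.vmajor * 256 + v.vminor) * 256 + v.vpatch;
}

// Unpack the three bytes again
constexpr DemoVersion decode_version(std::uint32_t encoded)
{
    return {(encoded >> 16) & 0xffu, (encoded >> 8) & 0xffu, encoded & 0xffu};
}

// Version 0.5.1 becomes 0x000501, and encoded values sort in release order
static_assert(encode_version({0, 5, 1}) == 0x000501u, "one byte per component");
static_assert(encode_version({0, 5, 1}) < encode_version({0, 6, 0}), "ordering");
```

Because the most significant byte holds the major version, a plain integer comparison of two encoded values orders them the same way as comparing (major, minor, patch) component by component.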

GPU management¶

class Device¶

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA	HIP/AMD	Description
thread	work item	individual local work element
warp	wavefront	“vectorized thread” operating in lockstep
block	workgroup	group of threads able to sync
multiprocessor	compute unit	hardware executing one or more blocks
multiprocessor	execution unit	hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.

Todo:: Const correctness for streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on “move”.

Warning

The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.

Device const &celeritas::device()¶

void celeritas::activate_device()¶

Platform portability macros¶

The Macros.hh file also defines language and compiler abstraction macros. These include cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.

CELER_FUNCTION¶

Decorate a function that works on both host and device, with and without NVCC.

The name of this function and its siblings is based on the Kokkos naming scheme.

CELER_CONSTEXPR_FUNCTION¶: Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.

CELER_DEVICE_COMPILE¶

Defined and true if building device code in HIP or CUDA.

This is a generic replacement for __CUDA_ARCH__.
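The general technique behind such macros can be sketched as below. DEMO_FUNCTION and DEMO_DEVICE_COMPILE are hypothetical stand-ins, not the real definitions in Macros.hh; the sketch assumes the compiler-provided macros __CUDACC__, __HIPCC__, __CUDA_ARCH__, and __HIP_DEVICE_COMPILE__, and unlike the real CELER_DEVICE_COMPILE it always defines its flag as 0 or 1.

```cpp
#include <cassert>

// Hypothetical stand-in: add host/device attributes only when a device
// compiler is processing this file
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define DEMO_FUNCTION __host__ __device__
#else
#    define DEMO_FUNCTION
#endif

// Hypothetical stand-in: 1 during the device compilation pass, 0 otherwise
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#    define DEMO_DEVICE_COMPILE 1
#else
#    define DEMO_DEVICE_COMPILE 0
#endif

// The same source compiles for host and device
DEMO_FUNCTION inline double clamp_to_unit(double x)
{
#if !DEMO_DEVICE_COMPILE
    // Host pass only: reject NaN input
    assert(x == x);
#endif
    return x < 0 ? 0 : (x > 1 ? 1 : x);
}
```

With a plain host compiler both macros collapse to nothing (or zero), so the function is an ordinary inline C++ function.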

The DeviceRuntimeApi file, which must be included from all .cu files and any .cc files that make CUDA/HIP API calls (see Device compilation), provides cross-platform compatibility macros for building against CUDA and HIP.

CELER_DEVICE_API_SYMBOL(TOK)¶: Add a prefix “hip” or “cuda” to a code token.

An assertion macro in Assert.hh checks the return result of CUDA/HIP API calls and throws a detailed exception if they fail:

CELER_DEVICE_API_CALL(STMT)¶

Safely and portably dispatch a CUDA/HIP API call.

When CUDA or HIP support is enabled, prepend the argument with “cuda” or “hip”, execute the resulting statement, and throw a RuntimeError if it fails. If no device platform is enabled, throw an unconfigured assertion.

Example:

CELER_DEVICE_API_CALL(Malloc(&ptr_gpu, 100 * sizeof(float)));
CELER_DEVICE_API_CALL(DeviceSynchronize());

Note

A file that uses this macro must include corecel/DeviceRuntimeApi.hh. The CorecelDeviceRuntimeApiHh declaration enforces this when CUDA/HIP are disabled, and the absence of CELER_DEVICE_API_SYMBOL enforces it when they are enabled.
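The prefix-and-check pattern described above can be sketched without a GPU toolchain. Everything in this block is hypothetical: the "demo" prefix plays the role of "cuda" or "hip", demoSetDevice is a stub rather than a real runtime call, and std::runtime_error stands in for the Celeritas RuntimeError.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stub "runtime" standing in for a device API
enum DemoError
{
    demoSuccess = 0,
    demoErrorInvalidValue = 1
};

inline DemoError demoSetDevice(int id)
{
    return id >= 0 ? demoSuccess : demoErrorInvalidValue;
}

// Paste the platform prefix onto a token: Foo becomes demoFoo
#define DEMO_DEVICE_API_SYMBOL(TOK) demo##TOK

// Prefix the call, execute it, and throw if it does not report success
#define DEMO_DEVICE_API_CALL(STMT)                                  \
    do                                                              \
    {                                                               \
        DemoError result_ = DEMO_DEVICE_API_SYMBOL(STMT);           \
        if (result_ != DEMO_DEVICE_API_SYMBOL(Success))             \
        {                                                           \
            std::string msg_ = "device API call failed: demo";      \
            msg_ += #STMT;                                          \
            throw std::runtime_error(msg_);                         \
        }                                                           \
    } while (0)

// Usage helper: report whether the wrapped call succeeded
inline bool try_set_device(int id)
{
    try
    {
        DEMO_DEVICE_API_CALL(SetDevice(id));
    }
    catch (std::runtime_error const&)
    {
        return false;
    }
    return true;
}
```

Token pasting joins the prefix to the first token of the argument, so SetDevice(id) becomes demoSetDevice(id) while the argument list is passed through unchanged.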

Environment variables¶

class Environment¶

Interrogate and extend environment variables.

This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.

Unlike the standard environment lookup (std::getenv), which returns a null pointer for an unset variable, this returns an empty string.

Note

This class is not thread-safe on its own. The celeritas::getenv free function, however, is safe, although it should only be used in setup (single-thread) steps.

Note

Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call celeritas::environment() = {}; but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.
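The lookup semantics described above (empty string for unset variables, values fixed once inserted) can be sketched with a small cache around std::getenv. DemoEnvironment is a hypothetical miniature for illustration and does not reproduce the real class interface.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical miniature of an insert-once environment cache
class DemoEnvironment
{
  public:
    // Return the cached value, loading it from the system on first access.
    // Unset variables yield an empty string rather than a null pointer.
    std::string const& operator[](std::string const& key)
    {
        auto iter = vars_.find(key);
        if (iter == vars_.end())
        {
            char const* sys_value = std::getenv(key.c_str());
            iter = vars_.emplace(key, sys_value ? sys_value : "").first;
        }
        return iter->second;
    }

    // Insert a value only if the key is absent; return whether it was stored
    bool insert(std::string const& key, std::string const& value)
    {
        assert(!key.empty());
        return vars_.emplace(key, value).second;
    }

  private:
    std::unordered_map<std::string, std::string> vars_;
};
```

Because the first access stores the value, later lookups return the same string even if the system environment changes, which is what makes an end-of-run report of accessed variables consistent with what the code actually used.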

Environment &celeritas::environment()¶

std::string const &celeritas::getenv(std::string const &key)¶

GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)¶

MPI support¶

class ScopedMpiInit¶

RAII class for initializing and finalizing MPI.

The CELER_DISABLE_PARALLEL environment variable can be used to turn off MPI calls when built with CELERITAS_USE_MPI.

Todo:: Change to CELER_ENABLE_MPI.

Note

Unlike the MpiCommunicator and MpiOperations classes, it is not necessary to link against MPI to use this class.

class MpiCommunicator¶

Wrap an MPI communicator.

This class uses ScopedMpiInit to determine whether MPI is available and enabled. As many instances as desired can be created, but Celeritas by default will share the instance returned by comm_world, which defaults to MPI_COMM_WORLD if MPI has been initialized, or a “self” comm if it has not.

A “null” communicator (the default) does not use MPI calls and can be constructed without calling MPI_Init or having MPI compiled. It will act like MPI_COMM_SELF but will not actually use MPI calls.

Note

This does not perform any copying or freeing of MPI communicators.

Performance profiling¶

These classes generalize the different low-level profiling libraries, both device and host, described in Performance profiling.

class ScopedProfiling¶

Enable and annotate performance profiling during the lifetime of this class.

This RAII class annotates the profiling output so that, during its scope, events and timing are associated with the given name. For use cases inside separate begin/end functions of a class (often seen in Geant4), use std::optional to start and end the class lifetime.

This is useful for wrapping a specific code fragment in a range for profiling, e.g., ignoring VecGeom instantiation kernels, or profiling a specific action. It is very similar to the NVTX .

Profiling is off by default and must be enabled explicitly (in conjunction with other tools; see the profiling section for more details). The CELER_ENABLE_PROFILING environment variable is used to override the default. Profiling is never enabled if CUDA/ROC-TX/Perfetto are unavailable.

Example:

Profile only the run, not the setup.

void do_program()
{
    do_setup();
    ScopedProfiling profile_this{"run"};
    do_run();
}

Caveats:

The Nvidia/CUDA implementation of ScopedProfiling only does something when the application using Celeritas is run through a tool that supports NVTX, e.g., Nsight Compute with the --nvtx argument. If this is not the case, API calls to NVTX are no-ops.
The HIP/AMD ROCTX implementation requires the roctx library, which may not be available on all systems.
The CPU implementation requires Perfetto. It is not available when Celeritas is built with device support (CUDA/HIP).
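The std::optional approach mentioned above for separate begin/end functions can be sketched as follows. DemoScopedRange is a hypothetical RAII stand-in that records events in a vector instead of calling a profiling library, and DemoRunAction is a hypothetical class with begin/end hooks in the style of a Geant4 user action.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII stand-in for a profiling range: records "begin" and
// "end" events instead of calling a profiling library
class DemoScopedRange
{
  public:
    DemoScopedRange(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name))
    {
        log_.push_back("begin " + name_);
    }
    ~DemoScopedRange() { log_.push_back("end " + name_); }
    DemoScopedRange(DemoScopedRange const&) = delete;
    DemoScopedRange& operator=(DemoScopedRange const&) = delete;

  private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Hypothetical class whose begin and end hooks are separate functions
class DemoRunAction
{
  public:
    explicit DemoRunAction(std::vector<std::string>& log) : log_(log) {}

    // Start the range: construct the RAII object in place
    void begin_run()
    {
        assert(!range_);
        range_.emplace(log_, "run");
    }

    // End the range: destroy the RAII object
    void end_run() { range_.reset(); }

  private:
    std::vector<std::string>& log_;
    std::optional<DemoScopedRange> range_;
};
```

The optional member gives the RAII object a lifetime that spans two function calls while still guaranteeing that the range ends if the owning object is destroyed early.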

class TracingSession¶

Record Perfetto events during the lifetime of this object.

This RAII class manages a Perfetto tracing session. Only a single tracing mode is supported. If you are only interested in application-level events (ScopedProfiling and trace_counter), then the in-process mode is sufficient and is enabled by providing the trace data filename to the constructor. When using in-process tracing, the buffer size can be configured by setting CELER_PERFETTO_BUFFER_SIZE_MB.

If no filename is provided, start a system tracing session which records both application-level events and kernel events. Root privilege and Linux ftrace https://kernel.org/doc/Documentation/trace/ftrace.txt are required. To start the system daemons using the perfetto backend, see https://perfetto.dev/docs/quickstart/linux-tracing#capturing-a-trace

Note

Profiling is disabled unless the CELER_ENABLE_PROFILING environment variable is set; see celeritas::ScopedProfiling.

I/O¶

These functions and classes are for communicating helpfully with the user.

CELER_LOG(LEVEL)¶

Return a LogMessage object for streaming into at the given level.

The regular CELER_LOG call is for code paths that execute uniformly in parallel and produce approximately the same message from every thread and task.

The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.

CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();

CELER_LOG_LOCAL(LEVEL)¶

Like CELER_LOG but for code paths that may only happen on a single process or thread.

Use sparingly because this can be very verbose. This is typically used only for error messages coming from an event or track at runtime.

enum class celeritas::LogLevel

Enumeration for how important a log message is.

Values:

enumerator debug: Debugging messages.

enumerator diagnostic: Diagnostics about current program execution.

enumerator status: Program execution status (what stage is beginning)

enumerator info: Important informational messages.

enumerator warning: Warnings about unusual events.

enumerator error: Something went wrong, but execution can continue.

enumerator critical: Something went terribly wrong, should probably abort.

enumerator size_: Sentinel value for looping over log levels.
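A short sketch of how the size_ sentinel allows looping over all levels. The enum below is a local copy of the enumerators listed above so that the block is self-contained; to_label and all_labels are hypothetical helpers, not Celeritas functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local copy of the enumerators listed above
enum class LogLevel
{
    debug,
    diagnostic,
    status,
    info,
    warning,
    error,
    critical,
    size_
};

// Hypothetical helper: printable label for a level
inline char const* to_label(LogLevel lev)
{
    static char const* const labels[] = {"debug",
                                         "diagnostic",
                                         "status",
                                         "info",
                                         "warning",
                                         "error",
                                         "critical"};
    // The sentinel keeps the table in sync with the enumeration
    static_assert(sizeof(labels) / sizeof(labels[0])
                      == static_cast<std::size_t>(LogLevel::size_),
                  "label table must cover every level");
    assert(lev < LogLevel::size_);
    return labels[static_cast<std::size_t>(lev)];
}

// Hypothetical helper: loop from the first level up to the sentinel
inline std::vector<std::string> all_labels()
{
    std::vector<std::string> result;
    for (int i = 0; i < static_cast<int>(LogLevel::size_); ++i)
    {
        result.push_back(to_label(static_cast<LogLevel>(i)));
    }
    return result;
}
```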

class Logger¶

Create a log message to be printed based on output/verbosity settings.

This should generally be called by the world_logger and self_logger functions below. The call operator() returns an object that should be streamed into in order to create a log message.

This object is assignable, so to replace the default log handler with a different one, you can call

world_logger = Logger(my_handler);

When using with MPI, the world_logger global objects are different on each process: rank 0 will have a handler that outputs to screen, and the other ranks will have a “null” handler that suppresses all log output.
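The stream-into-a-message design and the assignable handler can be sketched as below. DemoLogger, DemoMessage, and DemoHandler are hypothetical miniatures that assume a message is delivered to the handler when the temporary message object is destroyed; the real Logger and LogMessage interfaces may differ.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical handler signature: receives the level label and the text
using DemoHandler
    = std::function<void(std::string const& level, std::string const& text)>;

// Temporary object that collects streamed values and delivers them to the
// handler when it goes out of scope
class DemoMessage
{
  public:
    DemoMessage(DemoHandler const* handle, std::string level)
        : handle_(handle), level_(std::move(level))
    {
        assert(handle_);
    }
    DemoMessage(DemoMessage const&) = delete;
    DemoMessage& operator=(DemoMessage const&) = delete;
    ~DemoMessage()
    {
        // An empty ("null") handler suppresses all output
        if (*handle_)
        {
            (*handle_)(level_, os_.str());
        }
    }

    template<class T>
    DemoMessage& operator<<(T const& value)
    {
        os_ << value;
        return *this;
    }

  private:
    DemoHandler const* handle_;
    std::string level_;
    std::ostringstream os_;
};

// Assignable logger: the call operator returns a message to stream into
class DemoLogger
{
  public:
    explicit DemoLogger(DemoHandler handle = nullptr)
        : handle_(std::move(handle))
    {
    }

    DemoMessage operator()(std::string level) const
    {
        return DemoMessage(&handle_, std::move(level));
    }

  private:
    DemoHandler handle_;
};
```

In this sketch a default-constructed logger behaves like the “null” handler on non-root ranks, and assigning a new DemoLogger mirrors the world_logger = Logger(my_handler) replacement shown above.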

Todo:: For v1.0, replace the back-end with spdlog to reduce maintenance burden and improve flexibility.

class ScopedSignalHandler¶

Catch the given signal type within the scope of the handler.

On instantiation with a non-empty argument, this class registers a signal handler for the given signal. A class instance is true if and only if the class is handling a signal. The instance’s “call” operator will check and return whether the assigned signal has been caught. The move-assign operator can be used to unregister the handler.

When the class exits scope, the signal for the active type will be cleared.

Signal handling can be disabled by setting the CELER_DISABLE_SIGNALS environment variable, but hopefully this will not be necessary because signal handling should be used sparingly.

#include <csignal>

int main()
{
   ScopedSignalHandler interrupted(SIGINT);

   while (true)
   {
       if (interrupted())
       {
           CELER_LOG(error) << "Interrupted";
           break;
       }

       if (stop_handling_for_whatever_reason())
       {
           // Clear handler
           interrupted = {};
       }
   }
   return interrupted() ? 1 : 0;
}

Warning

This class is not thread safe. If multiple threads have this in scope, only one active (and indeterminate!) thread will mark the flag as intercepted.
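For intuition about the thread-safety warning, here is a minimal sketch of a scoped handler built on std::signal and a single process-wide flag. DemoScopedSignal and its helpers are hypothetical; the sketch restores the previous handler on destruction and does not claim to match the real implementation.

```cpp
#include <cassert>
#include <csignal>

// Process-wide flag written by the handler; sig_atomic_t is the integer type
// that may be safely written from a signal handler
volatile std::sig_atomic_t g_demo_caught = 0;

extern "C" void demo_on_signal(int)
{
    g_demo_caught = 1;
}

// Hypothetical scoped handler: install on construction, restore on exit
class DemoScopedSignal
{
  public:
    explicit DemoScopedSignal(int signum) : signum_(signum)
    {
        g_demo_caught = 0;
        prev_ = std::signal(signum_, demo_on_signal);
        assert(prev_ != SIG_ERR);
    }
    ~DemoScopedSignal() { std::signal(signum_, prev_); }
    DemoScopedSignal(DemoScopedSignal const&) = delete;
    DemoScopedSignal& operator=(DemoScopedSignal const&) = delete;

    // Whether the signal has been caught since construction
    bool operator()() const { return g_demo_caught != 0; }

  private:
    int signum_;
    void (*prev_)(int);
};
```

Because the flag and the installed handler are shared by the whole process, two instances alive at once on different threads would interfere with each other, which illustrates why such a class cannot be thread safe without extra machinery.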