Core functionality¶

The corecel directory contains functionality shared by Celeritas and ORANGE primarily pertaining to GPU abstractions.

Configuration¶

The corecel/Config.hh configure file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. In addition, it defines external C strings with configuration options such as key dependent library versions. Finally, corecel/Version.hh defines version numbers as preprocessor definition, a set of integers, and a descriptive string. The external API of Celeritas should depend almost exclusively on the version, not the configured options.

The files celeritas_config,version,cmake_strings,sys_config.h and corecel/device_runtime_api.h are deprecated aliases for backward-compatibility. .. deprecated:: v0.5

These will be removed in v0.6.

CELERITAS_VERSION¶

Celeritas version as a compile-time constant.

Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.

Fundamentals¶

Several high-level types are defined as aliases since they may change based on configuration: for example, size_type is a 32-bit integer when building with device code enabled, but is a 64-bit integer on other 64-bit systems.

using celeritas::size_type = unsigned int¶: Standard type for container sizes, optimized for GPU use.

using celeritas::real_type = double¶: Numerical type for real numbers.

Debug assertions¶

Celeritas exception types and assertions are defined in corecel/Assert. Debug assertions (see Test thoroughly) are only enabled when the CELERITAS_DEBUG (host code) and CELERITAS_DEVICE_DEBUG (device code) CMake configuration options are set.

The assertion macros CELER_EXPECT, CELER_ASSERT, and CELER_ENSURE correspond to “precondition contract”, “internal assertion”, and “postcondition contract”.

CELER_EXPECT(COND)¶

Precondition debug assertion macro.

We “expect” that the input values or initial state satisfy a precondition, and we throw exception in debug mode if they do not.

CELER_ASSERT(COND)¶

Internal debug assertion macro.

This replaces standard assert usage.

CELER_ENSURE(COND)¶

Postcondition debug assertion macro.

Use to “ensure” that return values or side effects are as expected when leaving a function.

The following two macros will throw debug assertions or cause undefined behavior at runtime to allow compiler optimizations:

CELER_ASSERT_UNREACHABLE()¶

Throw an assertion if the code point is reached.

When debug assertions are turned off, this changes to a compiler hint that improves optimization (and may force the code to exit uncermoniously if the point is encountered, rather than continuing on with undefined behavior).

CELER_ASSUME(COND)¶

Always-on compiler assumption.

This should be used very rarely: you should make sure the resulting assembly is simplified in optimize mode from using the assumption. For example, sometimes informing the compiler of an assumption can reduce code bloat by skipping standard library exception handling code (e.g. in std::visit by assuming

!var_obj.valueless_by_exception() 

Finally, a few runtime macros will always throw helpful errors based on incorrect configuration or input values.

CELER_VALIDATE(COND, MSG)¶

Always-on runtime assertion macro.

This can check user input and input data consistency, and will raise RuntimeError on failure with a descriptive error message that is streamed as the second argument. This macro cannot be used in __device__ -annotated code.

The error message should read:

 "<PROBLEM> (<WHY IT'S A PROBLEM>) <SUGGESTION>?"

Examples with correct casing and punctuation:

”failed to open ‘{filename}’ (should contain relaxation data)”
”unexpected end of file ‘{filename}’ (data is inconsistent with boundaries)”
”MPI was not initialized (needed to construct a communicator). Maybe set the environment variable CELER_DISABLE_PARALLEL=1 to disable externally?”
”invalid min_range={opts.min_range} (must be positive)”

This looks in practice like:

CELER_VALIDATE(file_stream,
               << "failed to open '" << filename
               << "' (should contain relaxation data)");

An always-on debug-type assertion without a detailed message can be constructed by omitting the stream (but leaving the comma):

CELER_VALIDATE(file_stream,);

CELER_NOT_CONFIGURED(WHAT)¶

Assert if the code point is reached because an optional feature is disabled.

This generally should be used for the constructors of dummy class definitions in, e.g., Foo.nocuda.cc:

Foo::Foo()
{
    CELER_NOT_CONFIGURED("CUDA");
}

CELER_NOT_IMPLEMENTED(WHAT)¶

Assert if the code point is reached because a feature has yet to be fully implemented.

This placeholder is so that code paths can be “declared but not defined” and implementations safely postponed in a greppable manner. This should not be used to define “unused” overrides for virtual classes. A correct use case would be:

if (z > AtomicNumber{26})
{
    CELER_NOT_IMPLEMENTED("physics for heavy nuclides");
}

Utility macros¶

The corecel/Macros.hh file defines language and compiler abstraction macro definitions.

CELER_TRY_HANDLE(STATEMENT, HANDLE_EXCEPTION)¶: “Try” to execute the statement, and “handle” all thrown errors by calling the given function-like error handler with a std::exception_ptr object.

Note

A file that uses this macro must include the <exception> header (but since the HANDLE_EXCEPTION needs to take an exception pointer, it’s got to be included anyway).

CELER_TRY_HANDLE_CONTEXT(STATEMENT, HANDLE_EXCEPTION, CONTEXT_EXCEPTION)¶

Try the given statement, and if it fails, chain it into the given exception.

The given CONTEXT_EXCEPTION must be an expression that yields an rvalue to a std::exception subclass that isn’t final . The resulting chained exception will be passed into HANDLE_EXCEPTION for processing.

CELER_DEFAULT_COPY_MOVE(CLS)¶

Explicitly declare defaulted copy and move constructors and assignment operators.

Use this if the destructor is declared explicitly, or as part of the “protected” section of an interface class to prevent assignment between incompatible classes.

CELER_DELETE_COPY_MOVE(CLS)¶

Explicitly declare deleted copy and move constructors and assignment operators.

Use this for scoped RAII classes.

CELER_DEFAULT_MOVE_DELETE_COPY(CLS)¶

Explicitly declare defaulted copy and move constructors and assignment operators.

Use this if the destructor is declared explicitly.

CELER_DISCARD(CODE)¶

The argument is an unevaluated operand which will generate no code but force the expression to be used.

This is used in place of the

[[maybe_unused]] 

attribute, which actually generates warnings in older versions of GCC.

Platform portability macros¶

The Macros.hh file also defines language and compiler abstraction macro definitions. It includes cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.

CELER_FUNCTION¶

Decorate a function that works on both host and device, with and without NVCC.

The name of this function and its siblings is based on the Kokkos naming scheme.

CELER_CONSTEXPR_FUNCTION¶: Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.

CELER_DEVICE_COMPILE¶

Defined and true if building device code in HIP or CUDA.

This is a generic replacement for __CUDA_ARCH__ .

The DeviceRuntimeApi file, which must be included from all .cu files and .cc file which make CUDA/HIP API calls (see Device compilation), provides cross-platform compatibility macros for building against CUDA and HIP.

CELER_DEVICE_API_SYMBOL(TOK)¶: Add a prefix “hip” or “cuda” to a code token.

An assertion macro in Assert.hh checks the return result of CUDA/HIP API calls and throws a detailed exception if they fail:

CELER_DEVICE_API_CALL(STMT)¶

Safely and portably dispatch a CUDA/HIP API call.

When CUDA or HIP support is enabled, execute the wrapped statement prepend the argument with “cuda” or “hip” and throw a RuntimeError if it fails. If no device platform is enabled, throw an unconfigured assertion.

Example:

CELER_DEVICE_API_CALL(Malloc(&ptr_gpu, 100 * sizeof(float)));
CELER_DEVICE_API_CALL(DeviceSynchronize());

Note

A file that uses this macro must include corecel/DeviceRuntimeApi.hh . The CorecelDeviceRuntimeApiHh declaration enforces this when CUDA/HIP are disabled, and the absence of CELER_DEVICE_API_SYMBOL enforces when enabled.

System¶

The system subdirectory provides uniform interfaces to hardware and the operating system.

GPU management¶

class Device¶

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA	HIP/AMD	Description
thread	work item	individual local work element
warp	wavefront	“vectorized thread” operating in lockstep
block	workgroup	group of threads able to sync
multiprocessor	compute unit	hardware executing one or more blocks
multiprocessor	execution unit	hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.

Todo:: Const correctness for streams is wrong; we should probably make the global device non-const (and thread-local?) and then activate it on “move”.

Warning

The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.

Device const &celeritas::device()¶: Get the shared default device.

void celeritas::activate_device()¶: Initialize the first device if available, when not using MPI.

Environment variables¶

class Environment¶

Interrogate and extend environment variables.

This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.

Unlike the standard environment which returns a null pointer for an unset variable, this returns an empty string.

Note

This class is not thread-safe on its own. The celeritas::getenv free function however is safe, although it should only be used in setup (single-thread) steps.

Note

Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call celeritas::environment() = {}; but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.

Environment &celeritas::environment()¶

Access a static global environment variable.

This static variable should be shared among Celeritas objects.

std::string const &celeritas::getenv(std::string const &key)¶

Thread-safe access to global modified environment variables.

This function will insert the current value of the key into the environment, which remains immutable over the lifetime of the program (allowing the use of static const data to be set from the environment).

GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)¶

Get a true/false flag with a default value.

The return value is a pair that has (1) the flag as determined by the environment variable or default value, and (2) an “insertion” flag specifying whether the default was used. The insertion result can be useful for providing a diagnostic message to the user.

Allowed true values: "1", "t", "yes", "true", "True"
Allowed false values: "0", "f", "no", "false", "False"
Empty value returns the default
Other value warns and returns the default

MPI support¶

class ScopedMpiInit¶: RAII class for initializing and finalizing MPI.

Note

Unlike the MpiCommunicator and MpiOperations class, it is not necessary to link against MPI to use this class.

class MpiCommunicator¶

Wrap an MPI communicator.

This class uses ScopedMpiInit to determine whether MPI is available and enabled. As many instances as desired can be created, but Celeritas by default will share the instance returned by comm_world , which defaults to MPI_COMM_WORLD if MPI has been initialized, or a “self” comm if it has not.

A “null” communicator (the default) does not use MPI calls and can be constructed without calling MPI_Init or having MPI compiled. It will act like MPI_Comm_Self but will not actually use MPI calls.

Note

This does not perform any copying or freeing of MPI communiators.

Utility functions¶

These functions replace or extend those in the C++ standard library <utility> header but work in GPU code without the special --expt-relaxed-constexpr flag.

template<class T> T &&celeritas::forward(typename std::remove_reference<T>::type &v) noexcept¶: Implement perfect forwarding with device-friendly functions.

template<class T> auto celeritas::move(T &&v) noexcept -> typename std::remove_reference<T>::type&&¶: Cast a value as an rvalue reference to allow move construction.

template<class T> void celeritas::trivial_swap(T &a, T &b) noexcept¶: Support swapping of trivial types.

template<class T, class U = T> T celeritas::exchange(T &dst, U &&src)¶

Replace a value and return the original.

This has a similar signature to atomic updates.

Algorithms¶

These device-compatible functions replace or extend those in the C++ standard library <algorithm> header. The implementations of sort and other partitioning elements are derived from LLVM’s libc++.

template<class InputIt, class Predicate> inline bool celeritas::all_of(InputIt iter, InputIt last, Predicate p)¶: Whether the predicate is true for all items.

template<class InputIt, class Predicate> inline bool celeritas::any_of(InputIt iter, InputIt last, Predicate p)¶: Whether the predicate is true for any item.

template<class InputIt, class Predicate> inline bool celeritas::all_adjacent(InputIt iter, InputIt last, Predicate p)¶: Whether the predicate is true for pairs of consecutive items.

template<class ForwardIt, class T, class Compare> ForwardIt celeritas::lower_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶: Find the insertion point for a value in a sorted list using a binary search.

template<class ForwardIt, class T, class Compare> ForwardIt celeritas::lower_bound_linear(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶: Find the insertion point for a value in a sorted list using a linear search.

template<class ForwardIt, class T, class Compare> ForwardIt celeritas::upper_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶

Find the first element which is greater than

template<class ForwardIt, class T, class Compare> inline ForwardIt celeritas::find_sorted(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶: Find the given element in a sorted range.

template<class ForwardIt, class Predicate> ForwardIt celeritas::partition(ForwardIt first, ForwardIt last, Predicate pred)¶

Partition elements in the given range, “true” before “false”.

This is done by swapping elements until the range is partitioned.

template<class RandomAccessIt, class Compare> void celeritas::sort(RandomAccessIt first, RandomAccessIt last, Compare comp)¶

Sort an array on a single thread.

This implementation is not thread-safe nor cooperative, but it can be called from CUDA code.

template<class T, std::enable_if_t<!std::is_floating_point<T>::value, bool> = true> T const &celeritas::max(T const &a, T const &b) noexcept¶

Return the higher of two values.

This function is specialized so that floating point types use std::fmax for better performance on GPU and ARM.

template<class T, std::enable_if_t<!std::is_floating_point<T>::value, bool> = true> T const &celeritas::min(T const &a, T const &b) noexcept¶

Return the lower of two values.

This function is specialized so that floating point types use std::fmin for better performance on GPU and ARM.

template<class ForwardIt, class Compare> inline ForwardIt celeritas::min_element(ForwardIt iter, ForwardIt last, Compare comp)¶: Return an iterator to the lowest value in the range as defined by Compare.

Numerics¶

These functions replace or extend those in the C++ standard library <cmath> and <numeric> headers.

template<class T> inline T const &celeritas::clamp(T const &v, T const &lo, T const &hi)¶

Clamp the value between lo and hi values.

If the value is between lo and hi, return the value. Otherwise, return lo if it’s below it, or hi above it.

This replaces:

min(hi, max(lo, v))

max(v, min(v, lo))

assuming that the relationship between lo and hi holds.

This is constructed to propagate NaN.

template<class T> T celeritas::clamp_to_nonneg(T v) noexcept¶

Return the value or (if it’s negative) then zero.

This is constructed to propagate NaN.

template<unsigned int N, class T> T celeritas::ipow(T v) noexcept¶

Return a nonnegative integer power of the input value.

Example:

assert(9.0 == ipow<2>(3.0));
assert(256 == ipow<8>(2));
static_assert(256 == ipow<8>(2));

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true> inline T celeritas::fastpow(T a, T b)¶

Raise a number to a power with simplifying assumptions.

This should be faster than std::pow because we don’t worry about special cases for zeros, infinities, or negative values for a.

Example:

assert(9.0 == fastpow(3.0, 2.0));

double celeritas::rsqrt(double value)¶: Calculate an inverse square root.

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true> T celeritas::fma(T a, T b, T y)¶

Use fused multiply-add for generic calculations.

This provides a floating point specialization so that fma can be used in code that is accelerated for floating point calculations but still works correctly with integer arithmetic.

Because of the single template parameter, it may be easier to use std::fma directly in most cases.

template<class T> T celeritas::ceil_div(T top, T bottom)¶: Integer division, rounding up, for positive numbers.

template<class T> T celeritas::negate(T value)¶: Negation that won’t return signed zeros.

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true> T celeritas::eumod(T num, T denom)¶

Calculate the Euclidean modulus of two numbers.

num numerator
denom denominator

If both numbers are positive, this should be the same as fmod. If the sign of the remainder and denominator don’t match, the remainder will be remapped so that it is between zero and the denominator.

This function is useful for normalizing user-provided angles. Examples:

eumod(3, 2) == 1
eumod(-0.5, 2) == 1.5
eumod(-2, 2) == 0

template<class T> int celeritas::signum(T x)¶

Calculate the sign of a number.

Returns:: -1 if negative, 0 if exactly zero (or NaN), 1 if positive

void celeritas::sincos(double a, double *s, double *c)¶: Simultaneously evaluate the sine and cosine of a value.

void celeritas::sincospi(double a, double *s, double *c)¶: Simultaneously evaluate the sine and cosine of a value factored by pi.

template<class T> int celeritas::popcount(T x) noexcept¶: Count the number of set bits in an integer.

template<class Numeric> struct numeric_limits¶

Subset of numeric limits compatible with both host and device.

Deprecated:: The name of this class will change to NumericLimits to conform to the Celeritas naming system, since it is not a complete replacement for the std implementation.

Note

CUDART_NAN and CUDART_INF are not constexpr in CUDA 10 at least, so we have replaced those with compiler built-ins that work in GCC, Clang, and MSVC.

Atomics¶

These atomic functions are for use in kernel code (CUDA/HIP/OpenMP) that use track-level parallelism.

template<class T> T celeritas::atomic_add(T *address, T value)¶: Add to a value, returning the original value.

template<class T> T celeritas::atomic_min(T *address, T value)¶: Set the value to the minimum of the actual and given, returning old.

template<class T> T celeritas::atomic_max(T *address, T value)¶: Set the value to the maximum of the actual and given, returning old.

Array utilities¶

These operate on fixed-size arrays of data (see Containers), usually Real3 as a Cartesian spatial coordinate.

using celeritas::Real3 = Array<real_type, 3>¶: Three-dimensional cartesian coordinates.

template<class T, size_type N> inline void celeritas::axpy(T a, Array<T, N> const &x, Array<T, N> *y)¶

Increment a vector by another vector multiplied by a scalar.

\[ y \gets \alpha x + y \]

Note that this uses celeritas::fma which supports types other than floating point.

template<class T, size_type N> inline T celeritas::dot_product(Array<T, N> const &x, Array<T, N> const &y)¶

Dot product of two vectors.

Note that this uses celeritas::fma which supports types other than floating point.

template<class T> inline Array<T, 3> celeritas::cross_product(Array<T, 3> const &x, Array<T, 3> const &y)¶: Cross product of two space vectors.

template<class T, size_type N> inline T celeritas::norm(Array<T, N> const &v)¶: Calculate the Euclidean (2) norm of a vector.

template<class T, size_type N> inline Array<T, N> celeritas::make_unit_vector(Array<T, N> const &v)¶

Construct a unit vector.

Unit vectors have an Euclidean norm magnitude of 1.

template<class T, size_type N> inline T celeritas::distance(Array<T, N> const &x, Array<T, N> const &y)¶: Calculate the Euclidean (2) distance between two points.

template<class T> inline Array<T, 3> celeritas::from_spherical(T costheta, T phi)¶

Calculate a Cartesian vector from spherical coordinates.

Theta is the angle between the z axis and the outgoing vector, and phi is the angle between the x axis and the projection of the vector onto the x-y plane.

template<class T> inline Array<T, 3> celeritas::rotate(Array<T, 3> const &dir, Array<T, 3> const &rot)¶

Rotate a direction about the given scattering direction.

The equivalent to calling the Shift transport code’s

void cartesian_vector_transform(
    double      costheta,
    double      phi,
    Vector_View vector);

is the call

vector = rotate(from_spherical(costheta, phi), vector);

This code effectively decomposes the given rotation vector rot into two sequential transform matrices, one with an angle theta about the y axis and one about phi rotating around the z axis. These two angles are the spherical coordinate transform of the given rot cartesian direction vector.

There is some extra code in here to deal with loss of precision when the incident direction is along the z axis. As rot approaches z, the azimuthal angle \( \phi \) must be calculated carefully from both the x and y components of the vector, not independently. If rot actually equals z then the azimuthal angle is completely indeterminate so we arbitrarily choose \( \phi = 0 \).

This function is often used for calculating exiting scattering angles. In that case, dir is the exiting angle from the scattering calculation, and rot is the original direction of the particle. The direction vectors are defined as

\[ \vec{\Omega} = \sin\theta\cos\phi\vec{i} + \sin\theta\sin\phi\vec{j} + \cos\theta\vec{k} \,. \]

Soft equivalence¶

These utilities are used for comparing real-valued numbers to a given tolerance.

template<class RealType = ::celeritas::real_type> class SoftEqual¶

Functor for noninfinite floating point equality.

This function-like class considers an absolute tolerance for values near zero, and a relative tolerance for values far from zero. It correctly returns “false” if either value being compared is NaN. The call operator is commutative: eq(a,b) should always give the same as eq(b,a).

The actual comparison is:

\[ |a - b| < \max(\epsilon_r \max(|a|, |b|), \epsilon_a) \]

Note

The edge case where both values are infinite (with the same sign) returns false for equality, which could be considered reasonable because relative error is meaningless. To explicitly allow infinities to compare equal, you must test separately, e.g., a == b || soft_eq(a, b) using celeritas::EqualOr.

template<class RealType = ::celeritas::real_type> class SoftZero¶: Functor for comparing values to zero with an absolute precision.

template<class F> class EqualOr : public F ¶

Compare for equality before checking with the given functor.

This CRTP class allows SoftEqual to work for infinities.

template<class T = ::celeritas::real_type> class ArraySoftUnit¶

Test for being approximately a unit vector.

Consider a unit vector v with a small perturbation along a unit vector e :

\[ \vec v + \epsilon \vec e \]

The magnitude of this “nearly unit” is

\[ m^2 = (v + \epsilon e) \cdot (v + \epsilon e) = 1 + 2 (v \cdot e) \epsilon + \epsilon^2 \]

Since by the triangle inequality

\[ |v \cdot e| <= |v||e| = 1 \]

, then the magnitude squared of a perturbed unit vector is bounded

\[ m^2 = 1 \pm 2 \epsilon + \epsilon^2 \]

Instead of calculating the square of the tolerance we use \( \epsilon^2 < \epsilon \) to make the “soft unit vector” condition

\[ | \vec v \vd \vec v - 1 | < 3 \epsilon . \]

I/O¶

These functions and classes are for communicating helpfully with the user.

CELER_LOG(LEVEL)¶

Return a LogMessage object for streaming into at the given level.

The regular CELER_LOG call is for code paths that happen uniformly in parallel, approximately the same message from every thread and task.

The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.

CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();

CELER_LOG_LOCAL(LEVEL)¶

Like CELER_LOG but for code paths that may only happen on a single process or thread.

Use sparingly because this can be very verbose. This is typically used only for error messages coming from an a event or track at runtime.

enum class celeritas::LogLevel

Enumeration for how important a log message is.

Values:

enumerator debug: Debugging messages.

enumerator diagnostic: Diagnostics about current program execution.

enumerator status: Program execution status (what stage is beginning)

enumerator info: Important informational messages.

enumerator warning: Warnings about unusual events.

enumerator error: Something went wrong, but execution can continue.

enumerator critical: Something went terribly wrong, should probably abort.

enumerator size_: Sentinel value for looping over log levels.