Core functionality

The corecel directory contains functionality shared by Celeritas and ORANGE primarily pertaining to GPU abstractions.

Configuration

The corecel/Config.hh configure file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. In addition, it defines static C strings with configuration options such as key dependent library versions. Finally, corecel/Version.hh defines version numbers as preprocessor definition, a set of integers, and a descriptive string. celeritas_version.h, celeritas_cmake_strings.h, celeritas_sys_config.h, celeritas_config.h and corecel/device_runtime_api.h are deprecated and kept as aliases for backward-compatibility. They may be removed in an upcoming major version.

CELERITAS_VERSION

Celeritas version as a compile-time constant.

Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.

constexpr char celeritas_version[] = "0.5.0-85+b9c98bc"

Celeritas version string with git metadata.

Fundamentals

Several high-level types are defined as aliases since they may change based on configuration: for example, size_type is a 32-bit integer when building with device code enabled, but is a 64-bit integer on other 64-bit systems.

using celeritas::size_type = unsigned int

Standard type for container sizes, optimized for GPU use.

using celeritas::real_type = double

Numerical type for real numbers.

Macros

The Macros.hh file defines language and compiler abstraction macro definitions. It includes cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.

CELER_FUNCTION

Decorate a function that works on both host and device, with and without NVCC.

The name of this function and its siblings is based on the Kokkos naming scheme.

CELER_CONSTEXPR_FUNCTION

Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.

CELER_DEVICE_COMPILE

Defined and true if building device code in HIP or CUDA.

This is a generic replacement for .

CELER_TRY_HANDLE(STATEMENT, HANDLE_EXCEPTION)

“Try” to execute the statement, and “handle” all thrown errors by calling the given function-like error handler with a std::exception_ptr object.

Note

A file that uses this macro must include the <exception> header (but since the HANDLE_EXCEPTION needs to take an exception pointer, it’s got to be included anyway).

CELER_TRY_HANDLE_CONTEXT(STATEMENT, HANDLE_EXCEPTION, CONTEXT_EXCEPTION)

Try the given statement, and if it fails, chain it into the given exception.

The given CONTEXT_EXCEPTION must be an expression that yields an rvalue to a std::exception subclass that isn’t final . The resulting chained exception will be passed into HANDLE_EXCEPTION for processing.

CELER_DEFAULT_COPY_MOVE(CLS)

Explicitly declare defaulted copy and move constructors and assignment operators.

Use this if the destructor is declared explicitly, or as part of the “protected” section of an interface class to prevent assignment between incompatible classes.

CELER_DELETE_COPY_MOVE(CLS)

Explicitly declare deleted copy and move constructors and assignment operators.

Use this for scoped RAII classes.

CELER_DEFAULT_MOVE_DELETE_COPY(CLS)

Explicitly declare defaulted copy and move constructors and assignment operators.

Use this if the destructor is declared explicitly.

CELER_DISCARD(CODE)

The argument is an unevaluated operand which will generate no code but force the expression to be used.

This is used in place of the

[[maybe_unused]] 
attribute, which actually generates warnings in older versions of GCC.

Debug assertions

Celeritas debug assertions are only enabled when the CELERITAS_DEBUG configuration option is set. The macros CELER_EXPECT, CELER_ASSERT, and CELER_ENSURE correspond to “precondition contract”, “internal assertion”, and “postcondition contract”.

CELER_EXPECT(COND)

Precondition debug assertion macro.

We “expect” that the input values or initial state satisfy a precondition, and we throw exception in debug mode if they do not.

CELER_ASSERT(COND)

Internal debug assertion macro.

This replaces standard assert usage.

CELER_ENSURE(COND)

Postcondition debug assertion macro.

Use to “ensure” that return values or side effects are as expected when leaving a function.

The following two macros will throw debug assertions or cause undefined behavior at runtime:

CELER_ASSERT_UNREACHABLE()

Throw an assertion if the code point is reached.

When debug assertions are turned off, this changes to a compiler hint that improves optimization (and may force the coded to exit uncermoniously if the point is encountered, rather than continuing on with undefined behavior).

CELER_ASSUME(COND)

Always-on compiler assumption.

This should be used very rarely: you should make sure the resulting assembly is simplified in optimize mode from using the assumption. For example, sometimes informing the compiler of an assumption can reduce code bloat by skipping standard library exception handling code (e.g. in std::visit by assuming !var_obj.valueless_by_exception() ).

Finally, a few runtime macros will always throw helpful errors based on incorrect configuration or input values.

CELER_VALIDATE(COND, MSG)

Always-on runtime assertion macro.

This can check user input and input data consistency, and will raise RuntimeError on failure with a descriptive error message that is streamed as the second argument. This macro cannot be used in -annotated code.

The error message should read:

 "<PROBLEM> (<WHY IT'S A PROBLEM>) <SUGGESTION>?"

Examples with correct casing and punctuation:

  • ”failed to open ‘{filename}’ (should contain relaxation data)”

  • ”unexpected end of file ‘{filename}’ (data is inconsistent with

    boundaries)”

  • ”MPI was not initialized (needed to construct a communicator). Maybe set the environment variable CELER_DISABLE_PARALLEL=1 to disable externally?”

    - “invalid min_range={opts.min_range} (must be positive)”

This looks in pracice like:

CELER_VALIDATE(file_stream,
               << "failed to open '" << filename
               << "' (should contain relaxation data)");

An always-on debug-type assertion without a detailed message can be constructed by omitting the stream (but leaving the comma):

CELER_VALIDATE(file_stream,);

CELER_NOT_CONFIGURED(WHAT)

Assert if the code point is reached because an optional feature is disabled.

This generally should be used for the constructors of dummy class definitions in, e.g., Foo.nocuda.cc:

Foo::Foo()
{
    CELER_NOT_CONFIGURED("CUDA");
}

CELER_NOT_IMPLEMENTED(WHAT)

Assert if the code point is reached because a feature has yet to be fully implemented.

This placeholder is so that code paths can be “declared but not defined” and implementations safely postponed in a greppable manner. This should not be used to define “unused” overrides for virtual classes. A correct use case would be:

if (z > AtomicNumber{26})
{
    CELER_NOT_IMPLEMENTED("physics for heavy nuclides");
}

System

class Device

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA

HIP/AMD

Description

thread

work item

individual local work element

warp

wavefront

“vectorized thread” operating in lockstep

block

workgroup

group of threads able to sync

multiprocessor

compute unit

hardware executing one or more blocks

multiprocessor

execution unit

hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.

Warning

The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.

Device const &celeritas::device()

Get the shared default device.

void celeritas::activate_device()

Initialize the first device if available, when not using MPI.

class Environment

Interrogate and extend environment variables.

This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.

Unlike the standard environment which returns a null pointer for an unset variable, this returns an empty string.

Note

This class is not thread-safe on its own. The celeritas::getenv free function however is safe, although it should only be used in setup (single-thread) steps.

Note

Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call celeritas::environment() = {}; but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.

Environment &celeritas::environment()

Access a static global environment variable.

This static variable should be shared among Celeritas objects.

std::string const &celeritas::getenv(std::string const &key)

Thread-safe access to global modified environment variables.

This function will insert the current value of the key into the environment, which remains immutable over the lifetime of the program (allowing the use of static const data to be set from the environment).

GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)

Get a true/false flag with a default value.

The return value is a pair that has (1) the flag as determined by the environment variable or default value, and (2) an “insertion” flag specifying whether the default was used. The insertion result can be useful for providing a diagnostic message to the user.

  • Allowed true values: "1", "t", "yes", "true", "True"

  • Allowed false values: "0", "f", "no", "false", "False"

  • Empty value returns the default

  • Other value warns and returns the default

Utility functions

These functions replace or extend those in the C++ standard library <utility> header.

template<class T>
T &&celeritas::forward(typename std::remove_reference<T>::type &v) noexcept

Implement perfect forwarding with device-friendly functions.

template<class T>
auto celeritas::move(T &&v) noexcept -> typename std::remove_reference<T>::type&&

Cast a value as an rvalue reference to allow move construction.

template<class T>
void celeritas::trivial_swap(T &a, T &b) noexcept

Support swapping of trivial types.

template<class T, class U = T>
T celeritas::exchange(T &dst, U &&src)

Exchange values on host or device.

Algorithms

These functions replace or extend those in the C++ standard library <algorithm> header. The implementations of sort and other partitioning elements are derived from LLVM’s libc++.

template<class InputIt, class Predicate>
inline bool celeritas::all_of(InputIt iter, InputIt last, Predicate p)

Whether the predicate is true for all items.

template<class InputIt, class Predicate>
inline bool celeritas::any_of(InputIt iter, InputIt last, Predicate p)

Whether the predicate is true for any item.

template<class InputIt, class Predicate>
inline bool celeritas::all_adjacent(InputIt iter, InputIt last, Predicate p)

Whether the predicate is true for pairs of consecutive items.

template<class ForwardIt, class T, class Compare>
ForwardIt celeritas::lower_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)

Find the insertion point for a value in a sorted list using a binary search.

template<class ForwardIt, class T, class Compare>
ForwardIt celeritas::upper_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)

Find the first element which is greater than

template<class ForwardIt, class T, class Compare>
inline ForwardIt celeritas::find_sorted(ForwardIt first, ForwardIt last, T const &value, Compare comp)

Find the given element in a sorted range.

template<class ForwardIt, class Predicate>
ForwardIt celeritas::partition(ForwardIt first, ForwardIt last, Predicate pred)

Partition elements in the given range, “true” before “false”.

This is done by swapping elements until the range is partitioned.

template<class RandomAccessIt, class Compare>
void celeritas::sort(RandomAccessIt first, RandomAccessIt last, Compare comp)

Sort an array on a single thread.

This implementation is not thread-safe nor cooperative, but it can be called from CUDA code.

template<class T, std::enable_if_t<!std::is_floating_point<T>::value, bool> = true>
T const &celeritas::max(T const &a, T const &b) noexcept

Return the higher of two values.

This function is specialized when building CUDA device code, which has special intrinsics for max.

template<class T, std::enable_if_t<!std::is_floating_point<T>::value, bool> = true>
T const &celeritas::min(T const &a, T const &b) noexcept

Return the lower of two values.

This function is specialized when building CUDA device code, which has special intrinsics for min.

template<class ForwardIt, class Compare>
inline ForwardIt celeritas::min_element(ForwardIt iter, ForwardIt last, Compare comp)

Return an iterator to the lowest value in the range as defined by Compare.

Numerics

These functions replace or extend those in the C++ standard library <cmath> and <numeric> headers.

template<class T>
inline T const &celeritas::clamp(T const &v, T const &lo, T const &hi)

Clamp the value between lo and hi values.

If the value is between lo and hi, return the value. Otherwise, return lo if it’s below it, or hi above it.

This replaces:

min(hi, max(lo, v))
or
max(v, min(v, lo))
assuming that the relationship between lo and hi holds.

template<class T>
T celeritas::clamp_to_nonneg(T v) noexcept

Return the value or (if it’s negative) then zero.

This is constructed to correctly propagate NaN.

template<unsigned int N, class T>
T celeritas::ipow(T v) noexcept

Return an integer power of the input value.

Example:

assert(9.0 == ipow<2>(3.0));
assert(256 == ipow<8>(2));
static_assert(256 == ipow<8>(2));

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
inline T celeritas::fastpow(T a, T b)

Raise a number to a power with simplifying assumptions.

This should be faster than std::pow because we don’t worry about exceptions for zeros, infinities, or negative values for a.

Example:

assert(9.0 == fastpow(3.0, 2.0));

double celeritas::rsqrt(double value)

Calculate an inverse square root.

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
T celeritas::fma(T a, T b, T y)

Use fused multiply-add for generic calculations.

Provide an FMA-like interface for integers.

This provides a floating point specialization so that fma can be used in code that is accelerated for floating point calculations but still works correctly with integer arithmetic.

Because of the single template parameter, it may be easier to use std::fma directly in most cases.

template<class T>
T celeritas::ceil_div(T top, T bottom)

Integer division, rounding up, for positive numbers.

template<class T>
T celeritas::negate(T value)

Negation that won’t return signed zeros.

template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
T celeritas::eumod(T numer, T denom)

Calculate the Euclidian modulus of two numbers.

If both numbers are positive, this should be the same as fmod. If the sign of the remainder and denominator don’t match, the remainder will be remapped so that it is between zero and the denominator.

This function is useful for normalizing user-provided angles.

template<class T>
int celeritas::signum(T x)

Calculate the sign of a number.

Returns:

-1 if negative, 0 if exactly zero (or NaN), 1 if positive

void celeritas::sincos(double a, double *s, double *c)

Simultaneously evaluate the sine and cosine of a value

template<class Numeric>
struct numeric_limits

Subset of numeric limits compatible with both host and device.

Note

CUDART_NAN and CUDART_INF are not constexpr in CUDA 10 at least, so we have replaced those with compiler built-ins that work in GCC, Clang, and MSVC.

Atomics

These atomic functions are for use in kernel code (CUDA/HIP/OpenMP) that use track-level parallelism.

template<class T>
T celeritas::atomic_add(T *address, T value)

Add to a value, returning the original value.

Note that on CPU, this assumes the atomic add is being done in with track-level parallelism rather than event-level because these utilities are meant for “kernel” code.

Warning

Multiple events must not use this function to simultaneously modify shared data.

template<class T>
T celeritas::atomic_min(T *address, T value)

Set the value to the minimum of the actual and given, returning old.

template<class T>
T celeritas::atomic_max(T *address, T value)

Set the value to the maximum of the actual and given, returning old.

Array utilities

These operate on fixed-size arrays of data (see Containers), usually Real3 as a Cartesian spatial coordinate.

using celeritas::Real3 = Array<real_type, 3>

Three-dimensional cartesian coordinates.

template<class T, size_type N>
inline void celeritas::axpy(T a, Array<T, N> const &x, Array<T, N> *y)

Increment a vector by another vector multiplied by a scalar.

Note that this uses celeritas::fma which supports types other than floating point.

template<class T, size_type N>
inline T celeritas::dot_product(Array<T, N> const &x, Array<T, N> const &y)

Dot product of two vectors.

Note that this uses celeritas::fma which supports types other than floating point.

template<class T>
inline Array<T, 3> celeritas::cross_product(Array<T, 3> const &x, Array<T, 3> const &y)

Cross product of two space vectors.

template<class T, size_type N>
inline T celeritas::norm(Array<T, N> const &v)

Calculate the Euclidian (2) norm of a vector.

template<class T, size_type N>
inline Array<T, N> celeritas::make_unit_vector(Array<T, N> const &v)

Construct a unit vector.

Unit vectors have an Euclidian norm magnitude of 1.

template<class T, size_type N>
inline T celeritas::distance(Array<T, N> const &x, Array<T, N> const &y)

Calculate the Euclidian (2) distance between two points.

template<class T>
inline Array<T, 3> celeritas::from_spherical(T costheta, T phi)

Calculate a Cartesian vector from spherical coordinates.

Theta is the angle between the Z axis and the outgoing vector, and phi is the angle between the x axis and the projection of the vector onto the x-y plane.

template<class T>
inline Array<T, 3> celeritas::rotate(Array<T, 3> const &dir, Array<T, 3> const &rot)

Rotate the direction about the given Z-based scatter direction.

The equivalent to calling the Shift transport code’s

void cartesian_vector_transform(
    double      costheta,
    double      phi,
    Vector_View vector);
is the call
vector = rotate(from_spherical(costheta, phi), vector);

This code effectively decomposes the given rotation vector rot into two sequential transform matrices, one with an angle theta about the y axis and one about phi rotating around the z axis. These two angles are the spherical coordinate transform of the given rot cartesian direction vector.

There is some extra code in here to deal with loss of precision when the incident direction is along the z axis. As rot approaches z, the azimuthal angle phi must be calculated carefully from both the x and y components of the vector, not independently. If rot actually equals z then the azimuthal angle is completely indeterminate so we arbitrarily choose phi = 0.

This function is often used for calculating exiting scattering angles. In that case, dir is the exiting angle from the scattering calculation, and rot is the original direction of the particle. The direction vectors are defined as

\[ \Omega = \sin\theta\cos\phi\mathbf{i} + \sin\theta\sin\phi\mathbf{j} + \cos\theta\mathbf{k} \,. \]

Soft equivalence

These utilities are used for comparing real-valued numbers to a given tolerance.

template<class RealType = ::celeritas::real_type>
class SoftEqual

Functor for noninfinite floating point equality.

This function-like class considers an absolute tolerance for values near zero, and a relative tolerance for values far from zero. It correctly returns “false” if either value being compared is NaN. The call operator is commutative: eq(a,b) should always give the same as eq(b,a).

The actual comparison is:

\[ |a - b| < \max(\epsilon_r \max(|a|, |b|), \epsilon_a) \]

Note

The edge case where both values are infinite (with the same sign) returns false for equality, which could be considered reasonable because relative error is meaningless. To explicitly allow infinities to compare equal, you must test separately, e.g., a == b || soft_eq(a, b).

template<class RealType = ::celeritas::real_type>
class SoftZero

Functor for floating point equality.

template<class T = ::celeritas::real_type>
class ArraySoftUnit

Test for being approximately a unit vector.

Consider a unit vector v with a small perturbation along a unit vector e :

\[ \vec v + \epsilon \vec e \]
The magnitude squared is
\[ m^2 = (v + \epsilon e) \cdot (v + \epsilon e) = v \cdot v + 2 \epsilon v \cdot e + \epsilon^2 e \cdot e = 1 + 2 \epsilon v \cdot e + \epsilon^2 \]

Since

\[ |v \cdot e| <= |v||e| = 1 \]
by the triangle inequality, then the magnitude squared of a perturbed unit vector is bounded
\[ m^2 = 1 \pm 2 \epsilon + \epsilon^2 \]

Instead of calculating the square of the tolerance we loosely bound with another epsilon.

I/O

These functions and classes are for communicating helpfully with the user.

CELER_LOG(LEVEL)

Return a LogMessage object for streaming into at the given level.

The regular CELER_LOG call is for code paths that happen uniformly in parallel.

The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.

CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();
CELER_LOG_LOCAL(LEVEL)

Like CELER_LOG but for code paths that may only happen on a single process or thread.

Use sparingly.

enum class celeritas::LogLevel

Enumeration for how important a log message is.

Values:

enumerator debug

Debugging messages.

enumerator diagnostic

Diagnostics about current program execution.

enumerator status

Program execution status (what stage is beginning)

enumerator info

Important informational messages.

enumerator warning

Warnings about unusual events.

enumerator error

Something went wrong, but execution can continue.

enumerator critical

Something went terribly wrong, should probably abort.

enumerator size_

Sentinel value for looping over log levels.