Core functionality¶
The corecel
directory contains functionality shared by Celeritas and ORANGE
primarily pertaining to GPU abstractions.
Configuration¶
The corecel/Config.hh
configure file contains all-caps definitions of the
CMake configuration options as 0/1 defines so they can be used with if
constexpr
and other C++ expressions. In addition, it defines static C strings
with configuration options such as key dependent library versions.
Finally, corecel/Version.hh
defines version numbers as preprocessor definition,
a set of integers, and a descriptive string. celeritas_version.h
,
celeritas_cmake_strings.h
, celeritas_sys_config.h
, celeritas_config.h
and corecel/device_runtime_api.h
are deprecated and kept as aliases for
backward-compatibility. They may be removed in an upcoming major version.
-
CELERITAS_VERSION¶
Celeritas version as a compile-time constant.
Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.
-
constexpr char celeritas_version[] = "0.5.0-85+b9c98bc"¶
Celeritas version string with git metadata.
Fundamentals¶
Several high-level types are defined as aliases since they may change based on
configuration: for example, size_type
is a 32-bit integer when building
with device code enabled, but is a 64-bit integer on other 64-bit systems.
-
using celeritas::size_type = unsigned int¶
Standard type for container sizes, optimized for GPU use.
-
using celeritas::real_type = double¶
Numerical type for real numbers.
Macros¶
The Macros.hh
file defines language and compiler abstraction macro
definitions. It includes cross-platform (CUDA, C++, HIP) macros that expand to
attributes depending on the compiler and build configuration.
-
CELER_FUNCTION¶
Decorate a function that works on both host and device, with and without NVCC.
The name of this function and its siblings is based on the Kokkos naming scheme.
-
CELER_CONSTEXPR_FUNCTION¶
Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.
-
CELER_DEVICE_COMPILE¶
Defined and true if building device code in HIP or CUDA.
This is a generic replacement for
.
-
CELER_TRY_HANDLE(STATEMENT, HANDLE_EXCEPTION)¶
“Try” to execute the statement, and “handle” all thrown errors by calling the given function-like error handler with a
std::exception_ptr
object.Note
A file that uses this macro must include the
<exception>
header (but since theHANDLE_EXCEPTION
needs to take an exception pointer, it’s got to be included anyway).
-
CELER_TRY_HANDLE_CONTEXT(STATEMENT, HANDLE_EXCEPTION, CONTEXT_EXCEPTION)¶
Try the given statement, and if it fails, chain it into the given exception.
The given
CONTEXT_EXCEPTION
must be an expression that yields an rvalue to astd::exception
subclass that isn’tfinal
. The resulting chained exception will be passed intoHANDLE_EXCEPTION
for processing.
-
CELER_DEFAULT_COPY_MOVE(CLS)¶
Explicitly declare defaulted copy and move constructors and assignment operators.
Use this if the destructor is declared explicitly, or as part of the “protected” section of an interface class to prevent assignment between incompatible classes.
-
CELER_DELETE_COPY_MOVE(CLS)¶
Explicitly declare deleted copy and move constructors and assignment operators.
Use this for scoped RAII classes.
-
CELER_DEFAULT_MOVE_DELETE_COPY(CLS)¶
Explicitly declare defaulted copy and move constructors and assignment operators.
Use this if the destructor is declared explicitly.
-
CELER_DISCARD(CODE)¶
The argument is an unevaluated operand which will generate no code but force the expression to be used.
This is used in place of the
attribute, which actually generates warnings in older versions of GCC.[[maybe_unused]]
Debug assertions¶
Celeritas debug assertions are only enabled when the CELERITAS_DEBUG
configuration option is set. The macros CELER_EXPECT
, CELER_ASSERT
, and
CELER_ENSURE
correspond to “precondition contract”, “internal assertion”,
and “postcondition contract”.
-
CELER_EXPECT(COND)¶
Precondition debug assertion macro.
We “expect” that the input values or initial state satisfy a precondition, and we throw exception in debug mode if they do not.
-
CELER_ASSERT(COND)¶
Internal debug assertion macro.
This replaces standard
assert
usage.
-
CELER_ENSURE(COND)¶
Postcondition debug assertion macro.
Use to “ensure” that return values or side effects are as expected when leaving a function.
The following two macros will throw debug assertions or cause undefined behavior at runtime:
-
CELER_ASSERT_UNREACHABLE()¶
Throw an assertion if the code point is reached.
When debug assertions are turned off, this changes to a compiler hint that improves optimization (and may force the coded to exit uncermoniously if the point is encountered, rather than continuing on with undefined behavior).
-
CELER_ASSUME(COND)¶
Always-on compiler assumption.
This should be used very rarely: you should make sure the resulting assembly is simplified in optimize mode from using the assumption. For example, sometimes informing the compiler of an assumption can reduce code bloat by skipping standard library exception handling code (e.g. in
std::visit
by assuming!var_obj
.valueless_by_exception() ).
Finally, a few runtime macros will always throw helpful errors based on incorrect configuration or input values.
-
CELER_VALIDATE(COND, MSG)¶
Always-on runtime assertion macro.
This can check user input and input data consistency, and will raise RuntimeError on failure with a descriptive error message that is streamed as the second argument. This macro cannot be used in
-annotated code.
The error message should read:
"<PROBLEM> (<WHY IT'S A PROBLEM>) <SUGGESTION>?"
Examples with correct casing and punctuation:
”failed to open ‘{filename}’ (should contain relaxation data)”
”unexpected end of file ‘{filename}’ (data is inconsistent with
boundaries)”
”MPI was not initialized (needed to construct a communicator). Maybe set the environment variable CELER_DISABLE_PARALLEL=1 to disable externally?”
- “invalid min_range={opts.min_range} (must be positive)”
This looks in pracice like:
CELER_VALIDATE(file_stream, << "failed to open '" << filename << "' (should contain relaxation data)");
An always-on debug-type assertion without a detailed message can be constructed by omitting the stream (but leaving the comma):
CELER_VALIDATE(file_stream,);
-
CELER_NOT_CONFIGURED(WHAT)¶
Assert if the code point is reached because an optional feature is disabled.
This generally should be used for the constructors of dummy class definitions in, e.g.,
Foo.nocuda.cc
:Foo::Foo() { CELER_NOT_CONFIGURED("CUDA"); }
-
CELER_NOT_IMPLEMENTED(WHAT)¶
Assert if the code point is reached because a feature has yet to be fully implemented.
This placeholder is so that code paths can be “declared but not defined” and implementations safely postponed in a greppable manner. This should not be used to define “unused” overrides for virtual classes. A correct use case would be:
if (z > AtomicNumber{26}) { CELER_NOT_IMPLEMENTED("physics for heavy nuclides"); }
System¶
-
class Device¶
Manage attributes of the GPU.
CUDA/HIP translation table:
CUDA/NVIDIA
HIP/AMD
Description
thread
work item
individual local work element
warp
wavefront
“vectorized thread” operating in lockstep
block
workgroup
group of threads able to sync
multiprocessor
compute unit
hardware executing one or more blocks
multiprocessor
execution unit
hardware executing one or more warps
Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.
Warning
The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but
global_device
is global. CUDA needs to be activated usingactivate_device
oractivate_device_local
on every thread, using the same device ID.
-
void celeritas::activate_device()¶
Initialize the first device if available, when not using MPI.
-
class Environment¶
Interrogate and extend environment variables.
This makes it easier to generate reproducible runs, launch Celeritas remotely, or integrate with application drivers. The environment variables may be encoded as JSON input to supplement or override system environment variables, or set programmatically via this API call. Later the environment class can be interrogated to find which environment variables were accessed.
Unlike the standard environment which returns a null pointer for an unset variable, this returns an empty string.
Note
This class is not thread-safe on its own. The
celeritas::getenv
free function however is safe, although it should only be used in setup (single-thread) steps.Note
Once inserted into the environment map, values cannot be changed. Standard practice in the code is to evaluate the environment variable exactly once and cache the result as a static const variable. If you really wanted to, you could call
celeritas::environment() = {};
but that could result in the end-of-run diagnostic reporting different values than the ones actually used during the code’s setup.
-
Environment &celeritas::environment()¶
Access a static global environment variable.
This static variable should be shared among Celeritas objects.
-
std::string const &celeritas::getenv(std::string const &key)¶
Thread-safe access to global modified environment variables.
This function will insert the current value of the key into the environment, which remains immutable over the lifetime of the program (allowing the use of
static const
data to be set from the environment).
-
GetenvFlagResult celeritas::getenv_flag(std::string const &key, bool default_val)¶
Get a true/false flag with a default value.
The return value is a pair that has (1) the flag as determined by the environment variable or default value, and (2) an “insertion” flag specifying whether the default was used. The insertion result can be useful for providing a diagnostic message to the user.
Allowed true values:
"1", "t", "yes", "true", "True"
Allowed false values:
"0", "f", "no", "false", "False"
Empty value returns the default
Other value warns and returns the default
Utility functions¶
These functions replace or extend those in the C++ standard library
<utility>
header.
-
template<class T>
T &&celeritas::forward(typename std::remove_reference<T>::type &v) noexcept¶ Implement perfect forwarding with device-friendly functions.
-
template<class T>
auto celeritas::move(T &&v) noexcept -> typename std::remove_reference<T>::type&&¶ Cast a value as an rvalue reference to allow move construction.
Algorithms¶
These functions replace or extend those in the C++ standard library
<algorithm>
header. The implementations of sort
and other partitioning
elements are derived from LLVM’s libc++
.
-
template<class InputIt, class Predicate>
inline bool celeritas::all_of(InputIt iter, InputIt last, Predicate p)¶ Whether the predicate is true for all items.
-
template<class InputIt, class Predicate>
inline bool celeritas::any_of(InputIt iter, InputIt last, Predicate p)¶ Whether the predicate is true for any item.
-
template<class InputIt, class Predicate>
inline bool celeritas::all_adjacent(InputIt iter, InputIt last, Predicate p)¶ Whether the predicate is true for pairs of consecutive items.
-
template<class ForwardIt, class T, class Compare>
ForwardIt celeritas::lower_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶ Find the insertion point for a value in a sorted list using a binary search.
-
template<class ForwardIt, class T, class Compare>
ForwardIt celeritas::upper_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶ Find the first element which is greater than
-
template<class ForwardIt, class T, class Compare>
inline ForwardIt celeritas::find_sorted(ForwardIt first, ForwardIt last, T const &value, Compare comp)¶ Find the given element in a sorted range.
-
template<class ForwardIt, class Predicate>
ForwardIt celeritas::partition(ForwardIt first, ForwardIt last, Predicate pred)¶ Partition elements in the given range, “true” before “false”.
This is done by swapping elements until the range is partitioned.
-
template<class RandomAccessIt, class Compare>
void celeritas::sort(RandomAccessIt first, RandomAccessIt last, Compare comp)¶ Sort an array on a single thread.
This implementation is not thread-safe nor cooperative, but it can be called from CUDA code.
-
template<class T, std::enable_if_t<!std::is_floating_point<T>::value, bool> = true>
T const &celeritas::max(T const &a, T const &b) noexcept¶ Return the higher of two values.
This function is specialized when building CUDA device code, which has special intrinsics for max.
Numerics¶
These functions replace or extend those in the C++ standard library
<cmath>
and <numeric>
headers.
-
template<class T>
inline T const &celeritas::clamp(T const &v, T const &lo, T const &hi)¶ Clamp the value between lo and hi values.
If the value is between lo and hi, return the value. Otherwise, return lo if it’s below it, or hi above it.
This replaces:
ormin(hi, max(lo, v))
assuming that the relationship betweenmax(v, min(v, lo))
lo
andhi
holds.
-
template<class T>
T celeritas::clamp_to_nonneg(T v) noexcept¶ Return the value or (if it’s negative) then zero.
This is constructed to correctly propagate
NaN
.
-
template<unsigned int N, class T>
T celeritas::ipow(T v) noexcept¶ Return an integer power of the input value.
Example:
assert(9.0 == ipow<2>(3.0)); assert(256 == ipow<8>(2)); static_assert(256 == ipow<8>(2));
-
template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
inline T celeritas::fastpow(T a, T b)¶ Raise a number to a power with simplifying assumptions.
This should be faster than
std::pow
because we don’t worry about exceptions for zeros, infinities, or negative values for a.Example:
assert(9.0 == fastpow(3.0, 2.0));
-
double celeritas::rsqrt(double value)¶
Calculate an inverse square root.
-
template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
T celeritas::fma(T a, T b, T y)¶ Use fused multiply-add for generic calculations.
Provide an FMA-like interface for integers.
This provides a floating point specialization so that
fma
can be used in code that is accelerated for floating point calculations but still works correctly with integer arithmetic.Because of the single template parameter, it may be easier to use
std::fma
directly in most cases.
-
template<class T>
T celeritas::ceil_div(T top, T bottom)¶ Integer division, rounding up, for positive numbers.
-
template<class T, std::enable_if_t<std::is_floating_point<T>::value, bool> = true>
T celeritas::eumod(T numer, T denom)¶ Calculate the Euclidian modulus of two numbers.
If both numbers are positive, this should be the same as fmod. If the sign of the remainder and denominator don’t match, the remainder will be remapped so that it is between zero and the denominator.
This function is useful for normalizing user-provided angles.
-
template<class T>
int celeritas::signum(T x)¶ Calculate the sign of a number.
- Returns:
-1 if negative, 0 if exactly zero (or NaN), 1 if positive
-
void celeritas::sincos(double a, double *s, double *c)¶
Simultaneously evaluate the sine and cosine of a value
-
template<class Numeric>
struct numeric_limits¶ Subset of numeric limits compatible with both host and device.
Note
CUDART_NAN
andCUDART_INF
are notconstexpr
in CUDA 10 at least, so we have replaced those with compiler built-ins that work in GCC, Clang, and MSVC.
Atomics¶
These atomic functions are for use in kernel code (CUDA/HIP/OpenMP) that use track-level parallelism.
-
template<class T>
T celeritas::atomic_add(T *address, T value)¶ Add to a value, returning the original value.
Note that on CPU, this assumes the atomic add is being done in with track-level parallelism rather than event-level because these utilities are meant for “kernel” code.
Warning
Multiple events must not use this function to simultaneously modify shared data.
Array utilities¶
These operate on fixed-size arrays of data (see Containers), usually Real3
as a
Cartesian spatial coordinate.
-
template<class T, size_type N>
inline void celeritas::axpy(T a, Array<T, N> const &x, Array<T, N> *y)¶ Increment a vector by another vector multiplied by a scalar.
Note that this uses
celeritas::fma
which supports types other than floating point.
-
template<class T, size_type N>
inline T celeritas::dot_product(Array<T, N> const &x, Array<T, N> const &y)¶ Dot product of two vectors.
Note that this uses
celeritas::fma
which supports types other than floating point.
-
template<class T>
inline Array<T, 3> celeritas::cross_product(Array<T, 3> const &x, Array<T, 3> const &y)¶ Cross product of two space vectors.
-
template<class T, size_type N>
inline T celeritas::norm(Array<T, N> const &v)¶ Calculate the Euclidian (2) norm of a vector.
-
template<class T, size_type N>
inline Array<T, N> celeritas::make_unit_vector(Array<T, N> const &v)¶ Construct a unit vector.
Unit vectors have an Euclidian norm magnitude of 1.
-
template<class T, size_type N>
inline T celeritas::distance(Array<T, N> const &x, Array<T, N> const &y)¶ Calculate the Euclidian (2) distance between two points.
-
template<class T>
inline Array<T, 3> celeritas::from_spherical(T costheta, T phi)¶ Calculate a Cartesian vector from spherical coordinates.
Theta is the angle between the Z axis and the outgoing vector, and phi is the angle between the x axis and the projection of the vector onto the x-y plane.
-
template<class T>
inline Array<T, 3> celeritas::rotate(Array<T, 3> const &dir, Array<T, 3> const &rot)¶ Rotate the direction about the given Z-based scatter direction.
The equivalent to calling the Shift transport code’s
is the callvoid cartesian_vector_transform( double costheta, double phi, Vector_View vector);
vector = rotate(from_spherical(costheta, phi), vector);
This code effectively decomposes the given rotation vector
rot
into two sequential transform matrices, one with an angle theta about the y axis and one about phi rotating around the z axis. These two angles are the spherical coordinate transform of the givenrot
cartesian direction vector.There is some extra code in here to deal with loss of precision when the incident direction is along the z axis. As
rot
approaches z, the azimuthal angle phi must be calculated carefully from both the x and y components of the vector, not independently. Ifrot
actually equals z then the azimuthal angle is completely indeterminate so we arbitrarily choosephi
= 0.This function is often used for calculating exiting scattering angles. In that case,
dir
is the exiting angle from the scattering calculation, androt
is the original direction of the particle. The direction vectors are defined as\[ \Omega = \sin\theta\cos\phi\mathbf{i} + \sin\theta\sin\phi\mathbf{j} + \cos\theta\mathbf{k} \,. \]
Soft equivalence¶
These utilities are used for comparing real-valued numbers to a given tolerance.
-
template<class RealType = ::celeritas::real_type>
class SoftEqual¶ Functor for noninfinite floating point equality.
This function-like class considers an absolute tolerance for values near zero, and a relative tolerance for values far from zero. It correctly returns “false” if either value being compared is NaN. The call operator is commutative:
eq(a,b)
should always give the same aseq(b,a)
.The actual comparison is:
\[ |a - b| < \max(\epsilon_r \max(|a|, |b|), \epsilon_a) \]Note
The edge case where both values are infinite (with the same sign) returns false for equality, which could be considered reasonable because relative error is meaningless. To explicitly allow infinities to compare equal, you must test separately, e.g.,
a == b || soft_eq(a, b)
.
-
template<class RealType = ::celeritas::real_type>
class SoftZero¶ Functor for floating point equality.
-
template<class T = ::celeritas::real_type>
class ArraySoftUnit¶ Test for being approximately a unit vector.
Consider a unit vector v with a small perturbation along a unit vector e :
\[ \vec v + \epsilon \vec e \]\[ m^2 = (v + \epsilon e) \cdot (v + \epsilon e) = v \cdot v + 2 \epsilon v \cdot e + \epsilon^2 e \cdot e = 1 + 2 \epsilon v \cdot e + \epsilon^2 \]Since
\[ |v \cdot e| <= |v||e| = 1 \]\[ m^2 = 1 \pm 2 \epsilon + \epsilon^2 \]Instead of calculating the square of the tolerance we loosely bound with another epsilon.
I/O¶
These functions and classes are for communicating helpfully with the user.
-
CELER_LOG(LEVEL)¶
Return a LogMessage object for streaming into at the given level.
The regular
CELER_LOG
call is for code paths that happen uniformly in parallel.The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.
CELER_LOG(debug) << "Don't print this in general"; CELER_LOG(warning) << "You may want to reconsider your life choices"; CELER_LOG(critical) << "Caught a fatal exception: " << e.what();
-
CELER_LOG_LOCAL(LEVEL)¶
Like
CELER_LOG
but for code paths that may only happen on a single process or thread.Use sparingly.
-
enum class celeritas::LogLevel
Enumeration for how important a log message is.
Values:
-
enumerator debug
Debugging messages.
-
enumerator diagnostic
Diagnostics about current program execution.
-
enumerator status
Program execution status (what stage is beginning)
-
enumerator info
Important informational messages.
-
enumerator warning
Warnings about unusual events.
-
enumerator error
Something went wrong, but execution can continue.
-
enumerator critical
Something went terribly wrong, should probably abort.
-
enumerator size_
Sentinel value for looping over log levels.
-
enumerator debug