Core package#

The corecel directory contains functionality shared by Celeritas and ORANGE primarily pertaining to GPU abstractions.

Configuration#

The celeritas_config.h configure file contains all-caps definitions of the CMake configuration options as 0/1 defines so they can be used with if constexpr and other C++ expressions. The celeritas_cmake_strings.h file defines static C strings with configuration options such as key dependent library versions. Finally, celeritas_version.h defines version numbers as a preprocessor definition, a set of integers, and a descriptive string.

Defines

CELERITAS_VERSION#

Celeritas version.

Encoded as a big-endian hexadecimal with one byte per component: (major * 256 + minor) * 256 + patch.
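As a worked illustration of the encoding above, the following sketch packs and unpacks the three components. The function names are hypothetical helpers and are not part of celeritas_version.h.

```cpp
#include <cassert>
#include <cstdint>

// Pack three version components into one integer, one byte per component.
// Assumes each component is in the range [0, 255].
constexpr std::uint32_t
encode_version(std::uint32_t vmajor, std::uint32_t vminor, std::uint32_t vpatch)
{
    return (vmajor * 256u + vminor) * 256u + vpatch;
}

// Recover the components by shifting and masking.
constexpr std::uint32_t version_major(std::uint32_t v)
{
    return (v >> 16) & 0xffu;
}
constexpr std::uint32_t version_minor(std::uint32_t v)
{
    return (v >> 8) & 0xffu;
}
constexpr std::uint32_t version_patch(std::uint32_t v)
{
    return v & 0xffu;
}

// Version 0.4.2 encodes to 0x000402.
static_assert(encode_version(0, 4, 2) == 0x000402u, "unexpected encoding");
```

Because the major component occupies the most significant byte, comparing two encoded values orders them the same way as comparing (major, minor, patch) lexicographically, which is what makes a single preprocessor comparison against CELERITAS_VERSION practical.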

Variables

static const char celeritas_version[] = "0.4.2"#: Celeritas version string with git metadata.

static const int celeritas_version_major = 0#: Celeritas major version.

static const int celeritas_version_minor = 4#: Celeritas minor version.

static const int celeritas_version_patch = 2#: Celeritas patch version.

Fundamentals#

Language and compiler abstraction macro definitions.

The Macros file defines cross-platform (CUDA, C++, HIP) macros that expand to attributes depending on the compiler and build configuration.

Defines

CELER_FUNCTION#

Decorate a function that works on both host and device, with and without NVCC.

The name of this macro and its siblings is based on the Kokkos naming scheme.

CELER_FORCEINLINE#

CELER_FORCEINLINE_FUNCTION#

Like CELER_FUNCTION but forces inlining.

Compiler optimizers can usually tell what needs optimizing, but this macro can provide speedups (and smaller sampling profiles) when inlining optimizations are not enabled. It should be used sparingly.

CELER_CONSTEXPR_FUNCTION#: Decorate a function that works on both host and device, with and without NVCC, can be evaluated at compile time, and should be forcibly inlined.

CELER_UNLIKELY(COND)#

Mark the result of this condition to be “unlikely”.

This asks the compiler to move the section of code to a “cold” part of the instructions, improving instruction locality. It should be used primarily for error checking conditions.

CELER_UNREACHABLE#

Mark a point in code as being impossible to reach in normal execution.

See https://clang.llvm.org/docs/LanguageExtensions.html#builtin-unreachable or https://msdn.microsoft.com/en-us/library/1b3fsfxw.aspx or https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#__builtin_unreachable

(The “unreachable” and “assume” compiler optimizations for CUDA are only available in API version 11.3 or higher, which is encoded as major*1000 + minor*10.)

Note

This macro should not generally be used; instead, use the macro CELER_ASSERT_UNREACHABLE() defined in base/Assert.hh, which provides a more detailed error message if the point is reached.

CELER_USE_DEVICE#: True if HIP or CUDA are enabled, false otherwise.

CELER_DEVICE_SOURCE#

Defined and true if building a HIP or CUDA source file.

This is a generic replacement for __CUDACC__ .

CELER_DEVICE_COMPILE#

Defined and true if building device code in HIP or CUDA.

This is a generic replacement for __CUDA_ARCH__ .

CELER_DEVICE_PREFIX(TOK)#: Add a prefix “hip” or “cuda” to a code token.

CELER_TRY_HANDLE(STATEMENT, HANDLE_EXCEPTION)#: “Try” to execute the statement, and “handle” all thrown errors by calling the given function-like error handler with a std::exception_ptr object.

Note

A file that uses this macro must include the <exception> header (but since the HANDLE_EXCEPTION needs to take an exception pointer, it’s got to be included anyway).
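The try/handle pattern described above can be sketched as follows. SKETCH_TRY_HANDLE, FirstErrorStore, message_of, and fail_with are hypothetical names for illustration; the actual CELER_TRY_HANDLE definition may differ.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CELER_TRY_HANDLE: run the statement and pass any
// thrown exception to a function-like handler as a std::exception_ptr.
#define SKETCH_TRY_HANDLE(STATEMENT, HANDLE_EXCEPTION)  \
    do                                                  \
    {                                                   \
        try                                             \
        {                                               \
            STATEMENT;                                  \
        }                                               \
        catch (...)                                     \
        {                                               \
            HANDLE_EXCEPTION(std::current_exception()); \
        }                                               \
    } while (0)

// Example handler: remembers only the first exception it is given.
struct FirstErrorStore
{
    std::exception_ptr first;

    void operator()(std::exception_ptr p)
    {
        if (!first)
        {
            first = p;
        }
    }
};

// Rethrow a stored exception and return its message (empty if none).
inline std::string message_of(std::exception_ptr p)
{
    if (!p)
    {
        return {};
    }
    try
    {
        std::rethrow_exception(p);
    }
    catch (std::exception const& e)
    {
        return e.what();
    }
    catch (...)
    {
        return "unknown";
    }
}

// Helper that always throws, used to exercise the macro.
inline void fail_with(char const* msg)
{
    throw std::runtime_error(msg);
}
```

A handler of this shape is one way to collect a failure inside a region that must not let exceptions escape (for example a parallel loop body) and rethrow it afterwards.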

CELER_TRY_HANDLE_CONTEXT(STATEMENT, HANDLE_EXCEPTION, CONTEXT_EXCEPTION)#

Try the given statement, and if it fails, chain it into the given exception.

The given CONTEXT_EXCEPTION must be an expression that yields an rvalue to a std::exception subclass that isn’t final . The resulting chained exception will be passed into HANDLE_EXCEPTION for processing.

CELER_DEFAULT_COPY_MOVE(CLS)#

Explicitly declare defaulted copy and move constructors and assignment operators.

Use this if the destructor is declared explicitly.

CELER_DELETE_COPY_MOVE(CLS)#

Explicitly declare deleted copy and move constructors and assignment operators.

Use this for scoped RAII classes.
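A minimal sketch of the scoped RAII use case, assuming the macro expands to the four deleted special member functions. SKETCH_DELETE_COPY_MOVE and ScopedCounter are hypothetical names, not part of corecel.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for CELER_DELETE_COPY_MOVE.
#define SKETCH_DELETE_COPY_MOVE(CLS)     \
    CLS(CLS const&) = delete;            \
    CLS& operator=(CLS const&) = delete; \
    CLS(CLS&&) = delete;                 \
    CLS& operator=(CLS&&) = delete

// Scoped guard: increments a counter on entry and decrements it on exit.
class ScopedCounter
{
  public:
    explicit ScopedCounter(int* count) : count_(count) { ++(*count_); }
    ~ScopedCounter() { --(*count_); }
    SKETCH_DELETE_COPY_MOVE(ScopedCounter);

  private:
    int* count_;
};

// Copying or moving the guard would decrement the counter twice, so all
// four operations are rejected at compile time.
static_assert(!std::is_copy_constructible<ScopedCounter>::value, "copy ctor");
static_assert(!std::is_move_constructible<ScopedCounter>::value, "move ctor");
static_assert(!std::is_copy_assignable<ScopedCounter>::value, "copy assign");
static_assert(!std::is_move_assignable<ScopedCounter>::value, "move assign");
```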

CELER_DISCARD(CODE)#

The argument is an unevaluated operand which will generate no code but force the expression to be used.

This is used in place of the [[maybe_unused]] attribute, which actually generates warnings in older versions of GCC.

Celeritas assertions are only enabled when the CELERITAS_DEBUG configuration option is set. The macros CELER_EXPECT, CELER_ASSERT, and CELER_ENSURE correspond to “precondition contract”, “internal assertion”, and “postcondition contract”.

Macros, exceptions, and helpers for assertions and error handling.

This defines host- and device-compatible assertion macros that are toggled on the CELERITAS_DEBUG configure macro.

Defines

CELER_DEBUG_FAIL(MSG, WHICH)#

Throw a debug assertion regardless of the CELERITAS_DEBUG setting.

This is used internally but is also useful for catching subtle programming errors in downstream code.

CELER_EXPECT(COND)#

Precondition debug assertion macro.

We “expect” that the input values or initial state satisfy a precondition, and we throw an exception in debug mode if they do not.

CELER_ASSERT(COND)#

Internal debug assertion macro.

This replaces standard assert usage.

CELER_ENSURE(COND)#

Postcondition debug assertion macro.

Use to “ensure” that return values or side effects are as expected when leaving a function.
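The three contract macros can be illustrated on a small function. This is a sketch under stated assumptions: the SKETCH_* macros are hypothetical, are always enabled, and throw std::logic_error, whereas the real macros are toggled by CELERITAS_DEBUG and throw DebugError.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for CELER_EXPECT / CELER_ASSERT / CELER_ENSURE.
#define SKETCH_CHECK(COND, KIND)                            \
    do                                                      \
    {                                                       \
        if (!(COND))                                        \
        {                                                   \
            throw std::logic_error(KIND " failed: " #COND); \
        }                                                   \
    } while (0)
#define SKETCH_EXPECT(COND) SKETCH_CHECK(COND, "precondition")
#define SKETCH_ASSERT(COND) SKETCH_CHECK(COND, "internal assertion")
#define SKETCH_ENSURE(COND) SKETCH_CHECK(COND, "postcondition")

// Solve a*x + b = 0 for x.
inline double solve_linear(double a, double b)
{
    // Precondition: the caller must not pass a degenerate equation
    SKETCH_EXPECT(a != 0);

    double x = -b / a;
    // Internal check on intermediate state
    SKETCH_ASSERT(std::isfinite(x));

    // Postcondition: the result satisfies the equation to rounding error
    SKETCH_ENSURE(std::fabs(a * x + b) <= 1e-12 * (std::fabs(b) + 1));
    return x;
}
```

Separating the three kinds means a failure message already says whether the caller, the function body, or the result is at fault.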

CELER_ASSUME(COND)#

Always-on compiler assumption.

This should be used very rarely: you should make sure the resulting assembly is simplified in optimize mode from using the assumption. For example, sometimes informing the compiler of an assumption can reduce code bloat by skipping standard library exception handling code (e.g. in std::visit by assuming !var_obj.valueless_by_exception() ).

CELER_ASSERT_UNREACHABLE()#

Throw an assertion if the code point is reached.

When debug assertions are turned off, this changes to a compiler hint that improves optimization (and may force the code to exit unceremoniously if the point is encountered, rather than continuing on with undefined behavior).

CELER_VALIDATE(COND, MSG)#

Always-on runtime assertion macro.

This can check user input and input data consistency, and will raise RuntimeError on failure with a descriptive error message that is streamed as the second argument. This macro cannot be used in __device__ -annotated code.

The error message should read:

 "<PROBLEM> (<WHY IT'S A PROBLEM>) <SUGGESTION>?"

Examples with correct casing and punctuation:

“failed to open ‘{filename}’ (should contain relaxation data)”
“unexpected end of file ‘{filename}’ (data is inconsistent with boundaries)”
“MPI was not initialized (needed to construct a communicator). Maybe set the environment variable CELER_DISABLE_PARALLEL=1 to disable externally?”
“invalid min_range={opts.min_range} (must be positive)”

This looks in practice like:

CELER_VALIDATE(file_stream,
               << "failed to open '" << filename
               << "' (should contain relaxation data)");

An always-on debug-type assertion without a detailed message can be constructed by omitting the stream (but leaving the comma):

CELER_VALIDATE(file_stream,);
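To show why the message argument starts with a stream operator and why it may be left empty, here is a minimal sketch of one way such a macro could be written. SKETCH_VALIDATE, Options, and check_options are hypothetical; the real macro raises celeritas::RuntimeError with file and line details rather than std::runtime_error.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CELER_VALIDATE. MSG begins with "<<" so that it
// can be pasted directly after a local output stream; an empty MSG leaves
// just the stream name, which is still a valid statement.
#define SKETCH_VALIDATE(COND, MSG)                       \
    do                                                   \
    {                                                    \
        if (!(COND))                                     \
        {                                                \
            std::ostringstream sketch_msg_;              \
            sketch_msg_ MSG;                             \
            throw std::runtime_error(sketch_msg_.str()); \
        }                                                \
    } while (0)

// Illustrative user input structure.
struct Options
{
    double min_range{};
};

// Validate user input, following the "<problem> (<why>)" message format.
inline void check_options(Options const& opts)
{
    SKETCH_VALIDATE(opts.min_range > 0,
                    << "invalid min_range=" << opts.min_range
                    << " (must be positive)");
}
```

The message is only formatted when the condition fails, so a passing check costs a single branch.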

CELER_NOT_CONFIGURED(WHAT)#

Assert if the code point is reached because an optional feature is disabled.

This generally should be used for the constructors of dummy class definitions in, e.g., Foo.nocuda.cc:

Foo::Foo()
{
    CELER_NOT_CONFIGURED("CUDA");
}

CELER_NOT_IMPLEMENTED(WHAT)#

Assert if the code point is reached because a feature has yet to be fully implemented.

This placeholder is so that code paths can be “declared but not defined” and implementations safely postponed in a greppable manner. This should not be used to define “unused” overrides for virtual classes. A correct use case would be:

if (z > AtomicNumber{26})
{
    CELER_NOT_IMPLEMENTED("physics for heavy nuclides");
}

CELER_CUDA_CALL(STATEMENT)#

When CUDA support is enabled, execute the wrapped statement and throw a RuntimeError if it fails.

If CUDA is disabled, throw an unconfigured assertion.

If it fails, we call cudaGetLastError to clear the error code. Note that this will not clear the code in a few fatal error cases (kernel assertion failure, invalid memory access) and all subsequent CUDA calls will fail.

CELER_CUDA_CALL(cudaMalloc(&ptr_gpu, 100 * sizeof(float)));
CELER_CUDA_CALL(cudaDeviceSynchronize());

Note

A file that uses this macro must include corecel/device_runtime_api.h .

CELER_HIP_CALL(STATEMENT)#

When HIP support is enabled, execute the wrapped statement and throw a RuntimeError if it fails.

If HIP is disabled, throw an unconfigured assertion.

If it fails, we call hipGetLastError to clear the error code.

CELER_HIP_CALL(hipMalloc(&ptr_gpu, 100 * sizeof(float)));
CELER_HIP_CALL(hipDeviceSynchronize());

Note

A file that uses this macro must include corecel/device_runtime_api.h . The celeritas_device_runtime_api_h_ declaration enforces this when HIP is disabled.

CELER_DEVICE_CALL_PREFIX(STMT)#

Prepend the argument with “cuda” or “hip” and call with the appropriate checking statement as above.

Example:

CELER_DEVICE_CALL_PREFIX(Malloc(&ptr_gpu, 100 * sizeof(float)));
CELER_DEVICE_CALL_PREFIX(DeviceSynchronize());

Note

A file that uses this macro must include corecel/device_runtime_api.h . The celeritas_device_runtime_api_h_ declaration enforces this when CUDA/HIP are disabled.

CELER_DEVICE_CHECK_ERROR()#

After a kernel launch or other call, check that no CUDA errors have occurred.

This is also useful for checking success after external CUDA libraries have been called.

CELER_MPI_CALL(STATEMENT)#

When MPI support is enabled, execute the wrapped statement and throw a RuntimeError if it fails.

If MPI is disabled, throw an unconfigured assertion.

Note

A file that uses this macro must include mpi.h.

namespace celeritas

Enums

enum class DebugErrorType#

Values:

enumerator precondition#: Precondition contract violation.

enumerator internal#: Internal assertion check failure.

enumerator unreachable#: Internal assertion: unreachable code path.

enumerator unconfigured#: Internal assertion: required feature not enabled.

enumerator unimplemented#: Internal assertion: not yet implemented.

enumerator postcondition#: Postcondition contract violation.

enumerator assumption#: “Assume” violation

enum class RuntimeErrorType#

Values:

enumerator validate#: Celeritas runtime error.

enumerator device#: CUDA or HIP.

enumerator mpi#: Coarse-grain parallelism.

enumerator geant#: Error from Geant4.

enumerator root#: Error from ROOT.

Functions

inline void unreachable()#: Invoke undefined behavior.

class DebugError : public std::logic_error#

#include <Assert.hh>

Error thrown by Celeritas assertions.

Public Functions

explicit DebugError(DebugErrorDetails)#: Construct a debug exception from detailed attributes.

inline DebugErrorDetails const &details() const#: Access the debug data.

Private Members

DebugErrorDetails details_#

struct DebugErrorDetails#

#include <Assert.hh>

Detailed properties of a debug assertion failure.

Public Members

DebugErrorType which#

char const *condition#

char const *file#

int line#

class RichContextException : public std::exception#

#include <Assert.hh>

Base class for writing arbitrary exception context to JSON.

This can be overridden in higher-level parts of the code for specific needs (e.g., writing thread, event, and track contexts in Celeritas solver kernels). Note that in order for derived classes to work with std::throw_with_nested, they MUST NOT be final.

Subclassed by celeritas::IPAContextException, celeritas::KernelContextException

Public Functions

virtual void output(JsonPimpl*) const = 0#: Write output to the given JSON object.

virtual char const *type() const = 0#: Provide the name for this exception class.

class RuntimeError : public std::runtime_error#

#include <Assert.hh>

Error thrown by working code from unexpected runtime conditions.

Public Functions

explicit RuntimeError(RuntimeErrorDetails)#: Construct a runtime error from detailed descriptions.

inline RuntimeErrorDetails const &details() const#: Access detailed information.

Public Static Functions

static RuntimeError from_validate(std::string msg, char const *code, char const *file, int line)#: Construct a runtime exception from a validation failure.

static RuntimeError from_device_call(char const *error_string, char const *code, char const *file, int line)#: Construct a runtime exception from a CUDA/HIP runtime failure.

static RuntimeError from_mpi_call(int errorcode, char const *code, char const *file, int line)#: Construct a message and throw an error from a runtime MPI failure.

static RuntimeError from_geant_exception(char const *origin, char const *code, char const *desc)#

Construct an error message from a Geant4 exception.

Parameters:

origin – Usually the function that throws
code – A computery error code
desc – Description of the failure

static RuntimeError from_root_error(char const *origin, char const *msg)#

Construct an error message from a ROOT error.

Parameters:

origin – Usually the function that throws
msg – Description of the failure

Private Members

RuntimeErrorDetails details_#

struct RuntimeErrorDetails#

#include <Assert.hh>

Detailed properties of a runtime error.

Public Members

RuntimeErrorType which = {RuntimeErrorType::validate}#

std::string what = {}#

std::string condition = {}#

std::string file = {}#

int line = {0}#

Type definitions for common Celeritas functionality.

namespace celeritas

Convenience typedefs for params and states.

template<template<Ownership, MemSpace> class P> using HostVal = P<Ownership::value, MemSpace::host>#: Managed host memory.

template<template<Ownership, MemSpace> class P> using HostCRef = P<Ownership::const_reference, MemSpace::host>#: Immutable reference to host memory.

template<template<Ownership, MemSpace> class S> using HostRef = S<Ownership::reference, MemSpace::host>#: Mutable reference to host memory.

template<template<Ownership, MemSpace> class P> using DeviceCRef = P<Ownership::const_reference, MemSpace::device>#: Immutable reference to device memory.

template<template<Ownership, MemSpace> class S> using DeviceRef = S<Ownership::reference, MemSpace::device>#: Mutable reference to device memory.

template<template<Ownership, MemSpace> class P> using NativeCRef = P<Ownership::const_reference, MemSpace::native>#: Immutable reference to native memory.

template<template<Ownership, MemSpace> class S> using NativeRef = S<Ownership::reference, MemSpace::native>#: Mutable reference to native memory.

template<template<Ownership, MemSpace> class P, MemSpace M> using CRefPtr = ObserverPtr<P<Ownership::const_reference, M> const, M>#: Pointer to same-memory const collection group.

template<template<Ownership, MemSpace> class S, MemSpace M> using RefPtr = ObserverPtr<S<Ownership::reference, M>, M>#: Pointer to same-memory mutable collection group.
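The template-template aliases above can be read as "fill in the ownership and memory space for me". The sketch below mirrors a few of them in a standalone namespace; the enums are reduced copies, and FooParamsData is a hypothetical data group invented for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
// Reduced mirrors of the enums documented in this section.
enum class MemSpace
{
    host,
    device
};
enum class Ownership
{
    value,
    reference,
    const_reference
};

// Aliases with the same shape as the ones listed above.
template<template<Ownership, MemSpace> class P>
using HostVal = P<Ownership::value, MemSpace::host>;
template<template<Ownership, MemSpace> class P>
using HostCRef = P<Ownership::const_reference, MemSpace::host>;
template<template<Ownership, MemSpace> class P>
using DeviceCRef = P<Ownership::const_reference, MemSpace::device>;

// Hypothetical data group: one class template covers every combination.
template<Ownership W, MemSpace M>
struct FooParamsData
{
    static constexpr Ownership ownership = W;
    static constexpr MemSpace memspace = M;
    int num_items{0};
};

// The alias is just shorthand for the fully spelled-out instantiation.
static_assert(
    std::is_same<HostVal<FooParamsData>,
                 FooParamsData<Ownership::value, MemSpace::host>>::value,
    "HostVal should select value ownership on host");
}  // namespace sketch
```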

Typedefs

using size_type = unsigned int#: Standard type for container sizes, optimized for GPU use.

using real_type = double#: Numerical type for real numbers.

using ull_int = unsigned long long int#: Equivalent to std::size_t but compatible with CUDA atomics.

Enums

enum class MemSpace#

Memory location of data.

Values:

enumerator host#: CPU memory.

enumerator device#: GPU memory.

enumerator mapped#: Unified virtual address space (both host and device)

enumerator native#: When included by a CUDA/HIP file; else ‘host’.

enum class Ownership#

Data ownership flag.

Values:

enumerator value#: Ownership of the data, only on host.

enumerator reference#: Mutable reference to the data.

enumerator const_reference#: Immutable reference to the data.

Functions

inline constexpr char const *to_cstring(MemSpace m)#: Get a string corresponding to a memory space.

template<class ValueT, class SizeT = ::celeritas::size_type> class OpaqueId#

Type-safe index for accessing an array.

This allows type-safe, read-only indexing/access for a class. The value is ‘true’ if it’s assigned, ‘false’ if invalid.

Template Parameters:

ValueT – Type of each item in the array.
SizeT – Integer index

Public Functions

inline OpaqueId()#: Default to invalid state.

inline explicit OpaqueId(size_type index)#: Construct explicitly with stored value.

inline explicit operator bool() const#: Whether this ID is in a valid (assigned) state.

inline OpaqueId &operator++()#: Pre-increment of the ID.

inline OpaqueId operator++(int)#: Post-increment of the ID.

inline size_type get() const#: Get the ID’s value.

inline size_type unchecked_get() const#: Get the value without checking for validity (atypical)
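The following simplified sketch shows how a tag type makes indices type-safe. It assumes the invalid state is stored as the largest representable index, which is an assumption for illustration; the Material and Particle tags are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
using size_type = unsigned int;

// Simplified type-safe index with an "unassigned" state.
template<class ValueT, class SizeT = size_type>
class OpaqueId
{
  public:
    // Default to the invalid state
    OpaqueId() = default;
    // Construct explicitly from a stored index
    explicit OpaqueId(SizeT index) : value_(index) {}

    // Whether the ID has been assigned
    explicit operator bool() const { return value_ != invalid_value(); }

    // Get the value without checking validity
    SizeT unchecked_get() const { return value_; }

    // Get the value, checking validity first
    SizeT get() const
    {
        assert(*this);
        return value_;
    }

    // Pre-increment
    OpaqueId& operator++()
    {
        assert(*this);
        ++value_;
        return *this;
    }

  private:
    static constexpr SizeT invalid_value() { return static_cast<SizeT>(-1); }
    SizeT value_{invalid_value()};
};

// Tag types are never defined: they only distinguish the ID types.
struct Material;
struct Particle;
using MaterialId = OpaqueId<Material>;
using ParticleId = OpaqueId<Particle>;

// Different IDs cannot be mixed up, and raw integers do not convert silently.
static_assert(!std::is_convertible<MaterialId, ParticleId>::value,
              "IDs of different kinds must not interconvert");
static_assert(!std::is_convertible<unsigned int, MaterialId>::value,
              "construction from an integer must be explicit");
}  // namespace sketch
```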

System#

namespace celeritas

class Device#

#include <Device.hh>

Manage attributes of the GPU.

CUDA/HIP translation table:

CUDA/NVIDIA	HIP/AMD	Description
thread	work item	individual local work element
warp	wavefront	“vectorized thread” operating in lockstep
block	workgroup	group of threads able to sync
multiprocessor	compute unit	hardware executing one or more blocks
multiprocessor	execution unit	hardware executing one or more warps

Each block/workgroup operates on the same hardware (compute unit) until completion. Similarly, a warp/wavefront is tied to a single execution unit. Each compute unit can execute one or more blocks: the higher the number of blocks resident, the more latency can be hidden.

Warning

The current multithreading/multiprocess model is intended to have one GPU serving multiple CPU threads simultaneously, and one MPI process per GPU. The active CUDA device is a static thread-local property but global_device is global. CUDA needs to be activated using activate_device or activate_device_local on every thread, using the same device ID.

Type aliases

using MapStrInt = std::map<std::string, int>#

Public Functions

Device() = default#

explicit Device(int id)#: Construct from a device ID.

inline int device_id() const#: Get the CUDA device ID, if active.

inline explicit operator bool() const#: True if device is initialized.

inline std::string name() const#: Device name.

inline std::size_t total_global_mem() const#: Total memory capacity (bytes)

inline int max_threads_per_block() const#: Maximum number of threads per block (for launch limits)

inline int max_blocks_per_grid() const#: Maximum number of blocks per grid (for launch limits)

inline int max_threads_per_cu() const#: Maximum number of concurrent threads per compute unit (for occupancy)

inline unsigned int threads_per_warp() const#: Number of threads per warp.

inline bool can_map_host_memory() const#: Whether the device supports mapped pinned memory.

inline unsigned int eu_per_cu() const#: Number of execution units per compute unit (1 for NVIDIA, 4 for AMD)

inline MapStrInt const &extra() const#: Additional potentially interesting diagnostics.

StreamId::size_type num_streams() const#: Number of streams allocated.

void create_streams(unsigned int num_streams) const#

Allocate the given number of streams.

If no streams have been created, the default stream will be used.

Stream &stream(StreamId) const#

Access a stream.

This returns the default stream if no streams were allocated.

Public Static Functions

static int num_devices()#

Get the number of available devices.

This is nonzero if and only if CUDA support is built-in, if at least one CUDA-capable device is present, and if the CELER_DISABLE_DEVICE environment variable is not set.

static bool debug()#

Whether verbose messages and error checking are enabled.

This is true if CELERITAS_DEBUG is set or if the CELER_DEBUG_DEVICE environment variable exists and is not empty.

Private Types

using UPStreamStorage = std::unique_ptr<detail::StreamStorage, StreamStorageDeleter>#

Private Members

int id_ = {-1}#

std::string name_ = {"<DISABLED>"}#

std::size_t total_global_mem_ = {}#

int max_threads_per_block_ = {}#

int max_blocks_per_grid_ = {}#

int max_threads_per_cu_ = {}#

unsigned int threads_per_warp_ = {}#

bool can_map_host_memory_ = {}#

unsigned int eu_per_cu_ = {}#

MapStrInt extra_#

UPStreamStorage streams_#

struct StreamStorageDeleter#

Public Functions

void operator()(detail::StreamStorage*) noexcept#

External deleter for stream storage.

This is for the PIMPL idiom with unique pointers.

namespace detail#

Containers#

template<class T, ::celeritas::size_type N> struct Array#

Fixed-size simple array for storage.

This isn’t fully standards-compliant with std::array: there’s no support for N=0 for example. Additionally it uses the native celeritas size_type, even though this has no effect on generated code for values of N inside the range of size_type.

Note

For supplementary functionality, include:

corecel/math/ArrayUtils.hh for real-number vector/matrix applications
corecel/math/ArrayOperators.hh for mathematical operators
ArrayIO.hh for streaming and string conversion
ArrayIO.json.hh for JSON input and output

Type aliases

using value_type = T #

using size_type = ::celeritas::size_type #

using pointer = value_type*#

using const_pointer = value_type const*#

using reference = value_type&#

using const_reference = value_type const&#

using iterator = pointer #

using const_iterator = const_pointer #

Element access

inline const_reference operator[](size_type i) const#

inline reference operator[](size_type i)#

inline const_reference front() const#

inline reference front()#

inline const_reference back() const#

inline reference back()#

inline const_pointer data() const#

inline pointer data()#

Iterators

inline iterator begin()#

inline iterator end()#

inline const_iterator begin() const#

inline const_iterator end() const#

inline const_iterator cbegin() const#

inline const_iterator cend() const#

Capacity

inline bool empty() const#

static inline size_type size()#

Operations

inline void fill(const_reference value)#: Fill the array with a constant value.

Public Members

T data_[N]#: Storage.

template<class T, std::size_t Extent = dynamic_extent> class Span#

Modified backport of C++20 span.

Like the celeritas::Array , this class isn’t 100% compatible with the std::span class (partly of course because language features are missing from C++14). The hope is that it will be complete and correct for the use cases needed by Celeritas (and, as a bonus, it will be device-compatible).

Notably, only a subset of the functions (those having to do with size) are constexpr. This is to allow debug assertions.

Span can be instantiated with the special marker type LdgValue<T> to optimize reading constant data on device memory. In that case, data returned by front, back, operator[] and the begin / end iterators use value semantics instead of reference semantics. data still returns a pointer to the data and can be used to bypass LdgIterator.

Template Parameters:

T – value type
Extent – fixed size; defaults to dynamic.

Public Functions

constexpr Span() = default#: Construct with default null pointer and size zero.

inline Span(pointer d, size_type s)#: Construct from data and size.

template<class Iter> inline Span(Iter first, Iter last)#: Construct from two contiguous random-access iterators.

template<std::size_t N> inline Span(element_type (&arr)[N])#: Construct from a C array.

template<class U, std::size_t N> inline Span(Span<U, N> const &other)#: Construct from another span.

Span(Span const&) noexcept = default#: Copy constructor (same template parameters)

Span &operator=(Span const&) noexcept = default#: Assignment (same template parameters)

Public Static Attributes

static constexpr std::size_t extent = Extent #: Size (may be dynamic)

Math, numerics, and algorithms#

Defines

CELER_WRAP_MATH_FLOAT_DBL_1(PREFIX, FUNC)#: Generate overloads for a single-argument math function.

CELER_WRAP_MATH_FLOAT_DBL_PTR_2(PREFIX, FUNC)#

namespace celeritas

Unnamed Group

void sincos(float a, float *s, float *c)#

void sincos(double a, double *s, double *c)#

void sincospi(float a, float *s, float *c)#

void sincospi(double a, double *s, double *c)#

Functions

template<class T> T &&forward(typename std::remove_reference<T>::type &v) noexcept#: Implement perfect forwarding with device-friendly functions.

template<class T> auto move(T &&v) noexcept -> typename std::remove_reference<T>::type&&#: Cast a value as an rvalue reference to allow move construction.

template<class T> void trivial_swap(T &a, T &b) noexcept#: Support swapping of trivial types.

template<class T, class U = T> T exchange(T &dst, U &&src)#: Exchange values on host or device.

template<class InputIt, class Predicate> inline bool all_of(InputIt iter, InputIt last, Predicate p)#: Whether the predicate is true for all items.

template<class InputIt, class Predicate> inline bool any_of(InputIt iter, InputIt last, Predicate p)#: Whether the predicate is true for any item.

template<class T> inline T const &clamp(T const &v, T const &lo, T const &hi)#

Clamp the value between lo and hi values.

If the value is between lo and hi, return the value. Otherwise, return lo if it’s below it, or hi above it.

This replaces:

min(hi, max(lo, v))

or

max(lo, min(hi, v))

assuming that lo <= hi.
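For reference, a host-only sketch of clamping written as the min/max composition it replaces. clamp_sketch is a hypothetical name, and the result is only meaningful when lo does not exceed hi.

```cpp
#include <algorithm>
#include <cassert>

// Clamp v to the closed interval [lo, hi] using min(hi, max(lo, v)).
// Assumes lo <= hi. The returned reference aliases one of the arguments.
template<class T>
T const& clamp_sketch(T const& v, T const& lo, T const& hi)
{
    return std::min(hi, std::max(lo, v));
}
```

Because a reference to one of the arguments is returned, the result should be used or copied before any temporary passed in goes out of scope.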

template<class T> T clamp_to_nonneg(T v) noexcept#

Return the value, or zero if it is negative.

This is constructed to correctly propagate NaN.
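One way to obtain the NaN-propagating behavior is to phrase the test as "is the value negative?", since every ordered comparison with NaN is false. This is a sketch with a hypothetical name and may not match the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Return zero for negative input, otherwise return the input unchanged.
// For NaN input the comparison is false, so NaN falls through and is returned.
template<class T>
T clamp_to_nonneg_sketch(T v) noexcept
{
    return (v < T{0}) ? T{0} : v;
}
```

The order of the comparison matters: with common standard library implementations, an expression such as std::max(0.0, v) would typically return 0.0 for a NaN input and silently hide the error.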

template<class ForwardIt, class T, class Compare> ForwardIt lower_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)#: Find the insertion point for a value in a sorted list using a binary search.

template<class ForwardIt, class T> ForwardIt lower_bound(ForwardIt first, ForwardIt last, T const &value)#: Find the insertion point for a value in a sorted list using a binary search.

template<class ForwardIt, class T, class Compare> ForwardIt lower_bound_linear(ForwardIt first, ForwardIt last, T const &value, Compare comp)#: Find the insertion point for a value in a sorted list using a linear search.

template<class ForwardIt, class T> ForwardIt lower_bound_linear(ForwardIt first, ForwardIt last, T const &value)#: Find the insertion point for a value in a sorted list using a linear search.

template<class ForwardIt, class T, class Compare> ForwardIt upper_bound(ForwardIt first, ForwardIt last, T const &value, Compare comp)#: Find the first element which is greater than the given value.

template<class ForwardIt, class T> ForwardIt upper_bound(ForwardIt first, ForwardIt last, T const &value)#: Find the first element which is greater than the given value.

template<class ForwardIt, class Predicate> ForwardIt partition(ForwardIt first, ForwardIt last, Predicate pred)#

Partition elements in the given range, “true” before “false”.

This is done by swapping elements until the range is partitioned.

template<class RandomAccessIt, class Compare> void sort(RandomAccessIt first, RandomAccessIt last, Compare comp)#

Sort an array on a single thread.

This implementation is neither thread-safe nor cooperative, but it can be called from CUDA code.

template<class RandomAccessIt> void sort(RandomAccessIt first, RandomAccessIt last)#: Sort an array on a single thread.

template<class T> T const &max(T const &a, T const &b) noexcept#

Return the higher of two values.

This function is specialized when building CUDA device code, which has special intrinsics for max.

template<class T> T const &min(T const &a, T const &b) noexcept#

Return the lower of two values.

This function is specialized when building CUDA device code, which has special intrinsics for min.

template<class ForwardIt, class Compare> inline ForwardIt min_element(ForwardIt iter, ForwardIt last, Compare comp)#: Return an iterator to the lowest value in the range as defined by Compare.

template<class ForwardIt> ForwardIt min_element(ForwardIt first, ForwardIt last)#: Return an iterator to the lowest value in the range.

template<unsigned int N, class T> T ipow(T v) noexcept#

Return an integer power of the input value.

Example:

assert(9.0 == ipow<2>(3.0));
assert(256 == ipow<8>(2));
static_assert(256 == ipow<8>(2));
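A compile-time integer power of this kind can be sketched with exponentiation by squaring. The version below requires C++17 for if constexpr, uses a hypothetical name, and is not necessarily how the library implements ipow.

```cpp
#include <cassert>

// Compute v^N with the exponent fixed at compile time.
template<unsigned int N, class T>
constexpr T ipow_sketch(T v) noexcept
{
    if constexpr (N == 0)
    {
        // v^0 = 1
        return T{1};
    }
    else if constexpr (N % 2 == 0)
    {
        // Even exponent: v^N = (v^(N/2))^2
        T half = ipow_sketch<N / 2>(v);
        return half * half;
    }
    else
    {
        // Odd exponent: v^N = v * v^(N-1)
        return v * ipow_sketch<N - 1>(v);
    }
}

static_assert(ipow_sketch<8>(2) == 256, "2^8");
static_assert(ipow_sketch<3>(3) == 27, "3^3");
static_assert(ipow_sketch<0>(5) == 1, "5^0");
```

Since the exponent is a template parameter, the recursion is resolved during compilation and only the multiplications remain at run time.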

template<class T, typename = std::enable_if_t<std::is_floating_point<T>::value>> inline T fastpow(T a, T b)#

Raise a number to a power with simplifying assumptions.

This should be faster than std::pow because we don’t worry about exceptions for zeros, infinities, or negative values for a.

Example:

assert(9.0 == fastpow(3.0, 2.0));

inline double rsqrt(double value)#: Calculate an inverse square root.

inline float rsqrt(float value)#: Calculate an inverse square root.

template<class T> T ceil_div(T top, T bottom)#: Integer division, rounding up, for positive numbers.
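Rounding-up division is what determines how many fixed-size blocks are needed to cover a given number of work items. The sketch below uses hypothetical names and writes the result as a quotient plus a remainder check so that no intermediate sum can overflow; the library's implementation may differ.

```cpp
#include <cassert>

// Integer division rounding up, for positive operands.
template<class T>
constexpr T ceil_div_sketch(T top, T bottom)
{
    return (top / bottom) + (top % bottom != 0 ? T{1} : T{0});
}

// Hypothetical use: number of blocks needed to cover num_threads work items.
constexpr unsigned int
num_blocks(unsigned int num_threads, unsigned int block_size)
{
    return ceil_div_sketch(num_threads, block_size);
}

static_assert(num_blocks(1000, 256) == 4, "partial last block");
static_assert(num_blocks(1024, 256) == 4, "exact fit");
static_assert(num_blocks(1025, 256) == 5, "one extra item needs a new block");
```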

template<class T> T negate(T value)#: Negation that won’t return signed zeros.

template<class T> T diffsq(T a, T b)#

Calculate the difference of squares \( a^2 - b^2 \).

The expression \( a^2 - b^2 \) exhibits catastrophic cancellation when \( a \) and \( b \) are close in magnitude. This can be avoided by rearranging the formula as \( (a - b)(a + b) \).
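A small single-precision example makes the rearrangement concrete. With a = 4097 and b = 4096, the exact square 4097^2 = 16785409 does not fit in a 24-bit significand, so computing a*a first may already lose the last digit before the subtraction happens, whereas a - b = 1 and a + b = 8193 are both exact. The function name below is hypothetical.

```cpp
#include <cassert>

// Difference of squares in factored form: (a - b)(a + b).
template<class T>
constexpr T diffsq_sketch(T a, T b)
{
    return (a - b) * (a + b);
}
```

In the example above the factored form returns exactly 8193, the true value of 4097^2 - 4096^2.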

Variables

constexpr double m_pi = detail::m_pi#

Double-precision math constant (POSIX derivative).

These should be used in host or type-dependent circumstances because, if using CELERITAS_REAL_TYPE=float, this could have more accuracy than celeritas::constants::pi .

template<class T = void> struct Less#

#include <Algorithms.hh>

Evaluator for the first argument being less than the second.

Public Functions

inline auto operator()(T const &lhs, T const &rhs) const noexcept -> decltype(auto)#

template<> struct Less<void>#

#include <Algorithms.hh>

Specialization of less with template deduction.

Public Functions

template<class T, class U> inline auto operator()(T &&lhs, U &&rhs) const -> decltype(auto)#

Math functions using celeritas::Array.

namespace celeritas

Functions

template<class T, size_type N> inline void axpy(T a, Array<T, N> const &x, Array<T, N> *y)#: Increment a vector by another vector multiplied by a scalar.

template<class T, size_type N> inline T dot_product(Array<T, N> const &x, Array<T, N> const &y)#: Dot product of two vectors.

template<class T> inline Array<T, 3> cross_product(Array<T, 3> const &x, Array<T, 3> const &y)#: Cross product of two space vectors.

template<class T, size_type N> inline T norm(Array<T, N> const &v)#: Calculate the Euclidean (2) norm of a vector.

template<class T, size_type N> inline Array<T, N> make_unit_vector(Array<T, N> const &v)#

Construct a unit vector.

Unit vectors have a Euclidean norm magnitude of 1.

template<class T, size_type N> inline T distance(Array<T, N> const &x, Array<T, N> const &y)#: Calculate the Euclidean (2) distance between two points.

template<class T> inline Array<T, 3> from_spherical(T costheta, T phi)#

Calculate a Cartesian vector from spherical coordinates.

Theta is the angle between the z axis and the outgoing vector, and phi is the angle between the x axis and the projection of the vector onto the x-y plane.
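The conversion can be sketched as follows, with std::array standing in for celeritas::Array and hypothetical function names. It takes the cosine of theta directly, as the documented signature does, and recovers the sine from it.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Cartesian unit vector from the polar cosine and the azimuthal angle.
inline std::array<double, 3> from_spherical_sketch(double costheta, double phi)
{
    // sin(theta) is nonnegative for theta in [0, pi]
    double const sintheta = std::sqrt(1 - costheta * costheta);
    return {sintheta * std::cos(phi), sintheta * std::sin(phi), costheta};
}

// Euclidean norm, used here only to check the result.
inline double norm_sketch(std::array<double, 3> const& v)
{
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}
```

Passing the cosine instead of the angle avoids a trigonometric evaluation when, as is common in scattering calculations, the cosine is the quantity that is sampled.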

template<class T> inline Array<T, 3> rotate(Array<T, 3> const &dir, Array<T, 3> const &rot)#

Rotate the direction about the given Z-based scatter direction.

The equivalent to calling the Shift transport code’s

void cartesian_vector_transform(
    double      costheta,
    double      phi,
    Vector_View vector);

is the call

vector = rotate(from_spherical(costheta, phi), vector);

This code effectively decomposes the given rotation vector rot into two sequential transform matrices, one rotating by an angle theta about the y axis and one rotating by an angle phi about the z axis. These two angles are the spherical coordinate transform of the given rot Cartesian direction vector.

There is some extra code in here to deal with loss of precision when the incident direction is along the z axis. As rot approaches z, the azimuthal angle phi must be calculated carefully from both the x and y components of the vector, not independently. If rot actually equals z then the azimuthal angle is completely indeterminate so we arbitrarily choose phi = 0.

This function is often used for calculating exiting scattering angles. In that case, dir is the exiting angle from the scattering calculation, and rot is the original direction of the particle. The direction vectors are defined as

\[ \Omega = \sin\theta\cos\phi\mathbf{i} + \sin\theta\sin\phi\mathbf{j} + \cos\theta\mathbf{k} \,. \]
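The two-step decomposition described above might look like the following sketch. It is not the Celeritas implementation: rotate_sketch and Real3Sketch are hypothetical names, std::array replaces Array, and the near-z handling is simplified to a single std::hypot call rather than the library's own precision logic.

```cpp
#include <array>
#include <cmath>

using Real3Sketch = std::array<double, 3>;

// Illustrative sketch: apply R_z(phi) * R_y(theta) to dir, where
// (theta, phi) are the spherical angles of the unit vector rot.
inline Real3Sketch rotate_sketch(Real3Sketch const& dir, Real3Sketch const& rot)
{
    double costheta = rot[2];
    // hypot combines the x and y components instead of using them independently
    double sintheta = std::hypot(rot[0], rot[1]);
    // phi is indeterminate when rot is exactly along z: choose phi = 0
    double cosphi = 1;
    double sinphi = 0;
    if (sintheta > 0)
    {
        cosphi = rot[0] / sintheta;
        sinphi = rot[1] / sintheta;
    }
    // First rotation: angle theta about the y axis
    double x = dir[0] * costheta + dir[2] * sintheta;
    double y = dir[1];
    double z = -dir[0] * sintheta + dir[2] * costheta;
    // Second rotation: angle phi about the z axis
    return {x * cosphi - y * sinphi, x * sinphi + y * cosphi, z};
}
```

Two sanity checks follow from the definition: rotating the z axis gives back rot, and rotating about rot = z leaves dir unchanged.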

template<class T, size_type N> inline bool is_soft_unit_vector(Array<T, N> const &v)#

Test for being approximately a unit vector.

Consider a unit vector v with a small perturbation along a unit vector e :

\[ \vec v + \epsilon \vec e \]

The magnitude squared is

\[ m^2 = (v + \epsilon e) \cdot (v + \epsilon e) = v \cdot v + 2 \epsilon v \cdot e + \epsilon^2 e \cdot e = 1 + 2 \epsilon v \cdot e + \epsilon^2 \]

Since

\[ |v \cdot e| \le |v||e| = 1 \]

by the Cauchy-Schwarz inequality, the magnitude squared of a perturbed unit vector is bounded by

\[ 1 - 2 \epsilon + \epsilon^2 \le m^2 \le 1 + 2 \epsilon + \epsilon^2 \]

Instead of calculating the square of the tolerance we loosely bound with another epsilon.

Example:

CELER_EXPECT(is_soft_unit_vector(v));
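As a rough sketch of the bound derived above: for epsilon no larger than 1, the deviation of m^2 from 1 is at most 2 epsilon + epsilon^2, which is itself at most 3 epsilon. The function name, the use of std::array, the default tolerance, and the factor of 3 are all assumptions made for illustration; the tolerance that Celeritas actually applies may differ.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical check (illustration only): accept v when |v.v - 1| <= 3 eps.
// The default eps is an assumed value, not taken from the library.
template<std::size_t N>
inline bool is_soft_unit_vector_sketch(std::array<double, N> const& v,
                                       double eps = 1e-12)
{
    double mag_sq = 0;
    for (double x : v)
    {
        mag_sq += x * x;
    }
    // 2 eps + eps^2 <= 3 eps whenever eps <= 1, so no squaring is needed
    return std::fabs(mag_sq - 1) <= 3 * eps;
}
```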

Atomics for use in kernel code (CUDA/HIP/OpenMP).

namespace celeritas

Functions

template<class T> T atomic_add(T *address, T value)#: Add to a value, returning the original value.

template<class T> T atomic_min(T *address, T value)#: Set the value to the minimum of the actual and given, returning old.

template<class T> T atomic_max(T *address, T value)#: Set the value to the maximum of the actual and given, returning old.
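The "return the original value" contract of these functions can be illustrated on the host with std::atomic. This is only an analogue for integer types: the names below are hypothetical, and the Celeritas functions take a raw pointer and dispatch to CUDA, HIP, or OpenMP primitives instead.

```cpp
#include <atomic>

// Host-only analogue (illustration): add and return the previous value.
template<class T>
T atomic_add_sketch(std::atomic<T>& target, T value)
{
    return target.fetch_add(value);
}

// Store min(current, value) and return the previous value.
template<class T>
T atomic_min_sketch(std::atomic<T>& target, T value)
{
    T old = target.load();
    // On failure compare_exchange_weak reloads 'old', so the loop retries
    // until the store succeeds or the stored value is already small enough.
    while (value < old && !target.compare_exchange_weak(old, value))
    {
    }
    return old;
}

// Store max(current, value) and return the previous value.
template<class T>
T atomic_max_sketch(std::atomic<T>& target, T value)
{
    T old = target.load();
    while (old < value && !target.compare_exchange_weak(old, value))
    {
    }
    return old;
}
```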

template<class Numeric> struct numeric_limits#: Subset of numeric limits compatible with both host and device.

Note

CUDART_NAN and CUDART_INF are not constexpr in CUDA 10 at least, so we have replaced those with compiler built-ins that work in GCC, Clang, and MSVC.

Unnamed Group

CELER_DEFINE_QACCESS(FUNC, QUAL)#: Access the underlying numeric value, discarding units

namespace celeritas

Functions

auto zero_quantity() noexcept#: Get a zero quantity (analogous to nullptr).

auto max_quantity() noexcept#: Get a quantity greater than any other numeric quantity.

auto neg_max_quantity() noexcept#: Get a quantity less than any other numeric quantity.

template<class U, class V> void swap(Quantity<U, V> &a, Quantity<U, V> &b) noexcept#: Swap two Quantities.

template<class UnitT, class ValueT> auto native_value_from(Quantity<UnitT, ValueT> quant) noexcept -> decltype(auto)#

Convert the given quantity into the native Celeritas unit system.

assert(native_value_from(Quantity<CLight>{1})
       == 2.998e10 * centimeter / second);

template<class Q> Q native_value_to(typename Q::value_type value) noexcept#

Create a quantity from a value in the Celeritas unit system.

This function can be used for defining a constant for use in another unit system (typically a “natural” unit system for use in physics kernels).

constexpr LightSpeed c = native_value_to<LightSpeed>(constants::c_light);
assert(c.value() == 1);

template<class Q, class SrcUnitT, class ValueT> auto value_as(Quantity<SrcUnitT, ValueT> quant) noexcept -> ValueT #

Use the value of a Quantity.

The redundant unit type in the function signature is to make coupling safer across different parts of the code and to make the user code more readable.

assert(value_as<LightSpeed>(LightSpeed{1}) == 1);

template<class T> inline char const *accessor_unit_label()#

Get the label for a unit returned from a class accessor.

Example:

cout << accessor_unit_label<&ParticleView::mass>() << endl;

template<class UnitT, class ValueT = decltype(UnitT::value())> class Quantity#

#include <Quantity.hh>

A numerical value tagged with a unit.

A quantity is a value expressed in terms of the given unit. Storing values in a different unit system can help with some calculations (e.g. operating in natural unit systems) by avoiding numerical multiplications and divisions by large constants. It can also make debugging easier (numeric values are obvious).

Example usage by a physics class, where charge is in units of q_e+, and mass and momentum are expressed in atomic natural units (where m_e = 1 and c = 1):

using MevEnergy   = Quantity<Mev>;
using MevMass     = Quantity<UnitDivide<Mev, CLightSq>>;
using MevMomentum = Quantity<UnitDivide<Mev, CLight>>;

A relativistic equation that operates on these quantities can do so without unnecessary floating point operations involving the speed of light:

real_type eval = value_as<MevEnergy>(energy); // Natural units
MevMomentum momentum{std::sqrt(eval * eval
                               + 2 * value_as<MevMass>(mass) * eval)};

The resulting quantity can be converted to the native Celeritas unit system with native_value_from, which multiplies in the constant value of ElMomentumUnit:

real_type mom = native_value_from(momentum);
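The relativistic relation used above follows from (E_k + m)^2 = p^2 + m^2 with c = 1, which gives p^2 = E_k^2 + 2 m E_k. A plain-double sketch, with a hypothetical helper name and no Quantity types, makes the arithmetic explicit:

```cpp
#include <cmath>

// Hypothetical helper (illustration only): momentum in MeV/c from a kinetic
// energy in MeV and a mass in MeV/c^2, with no factors of c in the arithmetic.
inline double momentum_mev_per_c(double kinetic_mev, double mass_mev_per_csq)
{
    // (E_k + m)^2 = p^2 + m^2  =>  p^2 = E_k^2 + 2 m E_k
    return std::sqrt(kinetic_mev * kinetic_mev
                     + 2 * mass_mev_per_csq * kinetic_mev);
}
```

For example, a kinetic energy of 1 MeV with a mass of 4 MeV/c^2 gives sqrt(1 + 8) = 3 MeV/c, and a massless particle has momentum equal to its energy.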

When using a Quantity from another part of the code, e.g. an imported unit system, use the value_as free function rather than .value() in order to guarantee consistency of units between source and destination.

An example unit class would be:

struct DozenUnit
{
    static constexpr int value() { return 12; }
    static constexpr char const* label() { return "dozen"; }
};

The label is used solely for outputting to JSON.

Note

The Quantity is designed to be a simple “strong type” class, not a complex mathematical class. To operate on quantities, you must use value_as (to operate within the Quantity’s unit system) or native_value_from (to operate in the Celeritas native unit system), use the resulting numeric values in your mathematical expressions, then return a new Quantity class with the resulting value and correct type.
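The strong-type pattern described in this note can be sketched in a few lines. Every name ending in _sketch or Sketch below is hypothetical and only mirrors the documented interface; the real class also carries host/device decorations and more constructors.

```cpp
#include <type_traits>

// Hypothetical stand-in for Quantity (illustration only).
template<class UnitT, class ValueT = decltype(UnitT::value())>
class QuantitySketch
{
  public:
    using value_type = ValueT;
    using unit_type = UnitT;

    constexpr QuantitySketch() = default;
    constexpr explicit QuantitySketch(value_type value) : value_(value) {}
    constexpr value_type value() const { return value_; }

  private:
    value_type value_{};
};

// Unit tag from the text: one "dozen" is 12 native units.
struct DozenUnit
{
    static constexpr int value() { return 12; }
    static constexpr char const* label() { return "dozen"; }
};

// Multiply by the unit's value to obtain a number in native units.
template<class UnitT, class ValueT>
constexpr auto native_value_from_sketch(QuantitySketch<UnitT, ValueT> q)
    -> decltype(q.value() * UnitT::value())
{
    return q.value() * UnitT::value();
}

// Divide a native value by the unit's value to build a quantity.
template<class Q>
constexpr Q native_value_to_sketch(typename Q::value_type value)
{
    return Q{value / Q::unit_type::value()};
}

// Read the stored value; the explicit Q makes a unit mismatch a compile error.
template<class Q, class SrcUnitT, class ValueT>
constexpr ValueT value_as_sketch(QuantitySketch<SrcUnitT, ValueT> q)
{
    static_assert(std::is_same<typename Q::unit_type, SrcUnitT>::value,
                  "quantity units do not match");
    return q.value();
}
```

With `using Dozens = QuantitySketch<DozenUnit, double>`, the native value of `Dozens{2.5}` is 30, and converting the native value 36 back yields a quantity whose stored value is 3.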

Template Parameters:

UnitT – unit tag class
ValueT – value type

Type aliases

using value_type = ValueT #

using unit_type = UnitT #

Unnamed Group

inline value_type &value() & noexcept#

inline value_type const &value() const & noexcept#

Public Functions

constexpr Quantity() = default#: Construct with default (zero)

inline explicit Quantity(value_type value) noexcept#: Construct with value in celeritas native units.

template<detail::QConstant QC> inline Quantity(detail::UnitlessQuantity<QC>) noexcept#: Construct implicitly from a unitless quantity.

Private Members

value_type value_ = {}#

namespace detail

Variables

template<class T> constexpr bool is_quantity_v = IsQuantity<T>::value#

template<class T> struct IsQuantity : public std::false_type#: #include <Quantity.hh>

Template matching to determine if T is a Quantity.

template<class V, class S> struct IsQuantity<Quantity<V, S>> : public std::true_type#

template<class V, class S> struct IsQuantity<Quantity<V, S> const> : public std::true_type#

namespace celeritas

Functions

real_type sqrt_tol()#

Square root of the soft equivalence tolerance for Celeritas.

This tolerance is needed for operations where the accuracy is limited by the square root of machine precision.

Todo:: Move orange tolerance and related operations into corecel/math alongside this, revisit ArrayUtils soft comparisons.

template<class T> SoftEqual(T) -> SoftEqual<T>#

template<class T> SoftEqual(T, T) -> SoftEqual<T>#

template<class F> EqualOr(F&&) -> EqualOr<F>#

template<class RealType> inline bool soft_equal(RealType expected, RealType actual)#: Soft equivalence with default tolerance.

template<class RealType> inline bool soft_near(RealType expected, RealType actual, RealType rel_error)#: Soft equivalence with relative error.

template<class RealType> inline bool soft_zero(RealType actual)#: Soft equivalence to zero, with default tolerance.

template<class RealType> inline bool soft_mod(RealType dividend, RealType divisor)#: Soft modulo operator.

template<class F> class EqualOr : public F

#include <SoftEqual.hh>

Compare for equality before checking with the given functor.

This CRTP class allows SoftEqual to work for infinities.

template<class RealType = ::celeritas::real_type> class SoftEqual#

#include <SoftEqual.hh>

Functor for noninfinite floating point equality.

This function-like class considers an absolute tolerance for values near zero, and a relative tolerance for values far from zero. It correctly returns “false” if either value being compared is NaN. The call operator is commutative: eq(a,b) should always give the same as eq(b,a).

The actual comparison is:

\[ |a - b| < \max(\epsilon_r \max(|a|, |b|), \epsilon_a) \]

Note

The edge case where both values are infinite (with the same sign) returns false for equality, which could be considered reasonable because relative error is meaningless. To explicitly allow infinities to compare equal, you must test separately, e.g., a == b || soft_eq(a, b).
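The comparison formula and the infinity edge case can be sketched directly. The function names are hypothetical, and the default tolerances are the documented defaults for doubles (1.0e-12 relative, 1.0e-14 absolute); this is a free-function illustration rather than the functor itself.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Illustration of |a - b| < max(rel * max(|a|, |b|), abs).
inline bool soft_equal_sketch(double a,
                              double b,
                              double rel = 1e-12,
                              double abs_tol = 1e-14)
{
    double scale = std::max(std::fabs(a), std::fabs(b));
    // Any NaN operand makes the comparison false, as does inf - inf
    return std::fabs(a - b) < std::max(rel * scale, abs_tol);
}

// Exact equality first, so that equal infinities compare as equal.
inline bool equal_or_soft_sketch(double a, double b)
{
    return a == b || soft_equal_sketch(a, b);
}
```

The relative term dominates for values far from zero, and the absolute term takes over near zero, where a purely relative test would reject almost everything.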

Type aliases

using value_type = RealType #

Public Functions

SoftEqual()#: Construct with default relative/absolute precision.

explicit SoftEqual(value_type rel)#: Construct with scaled absolute precision.

SoftEqual(value_type rel, value_type abs)#

Construct with both relative and absolute precision.

Parameters:

rel – tolerance of relative error (default 1.0e-12 for doubles)
abs – threshold for absolute error when comparing small quantities (default 1.0e-14 for doubles)

bool operator()(value_type a, value_type b) const#: Compare two values, implicitly casting arguments.

inline value_type rel() const#: Relative allowable error.

inline value_type abs() const#: Absolute tolerance.

Private Types

using SETraits = detail::SoftEqualTraits<value_type>#

Private Members

value_type rel_#

value_type abs_#

template<class RealType = ::celeritas::real_type> class SoftZero#

#include <SoftEqual.hh>

Functor for floating point equality to zero within an absolute tolerance.

Type aliases

using argument_type = RealType #

using value_type = RealType #

Public Functions

SoftZero()#: Construct with default absolute precision.

explicit SoftZero(value_type abs)#

Construct with specified precision.

Parameters:: abs – threshold for absolute error (default 1.0e-14 for doubles)

inline bool operator()(value_type actual) const#

See if the value is within absolute tolerance of zero.

Parameters:: actual – scalar floating point value

inline value_type abs() const#: Absolute tolerance.

Private Types

using SETraits = detail::SoftEqualTraits<value_type>#

Private Members

value_type abs_#

I/O#

Defines

CELER_CODE_PROVENANCE#: Inject the source code provenance (current file and line)

CELER_LOG(LEVEL)#

Return a LogMessage object for streaming into at the given level.

The regular CELER_LOG call is for code paths that happen uniformly in parallel.

The logger will only format and print messages. It is not responsible for cleaning up the state or exiting an app.

CELER_LOG(debug) << "Don't print this in general";
CELER_LOG(warning) << "You may want to reconsider your life choices";
CELER_LOG(critical) << "Caught a fatal exception: " << e.what();

CELER_LOG_LOCAL(LEVEL)#

Like CELER_LOG but for code paths that may only happen on a single process.

Use sparingly.

namespace celeritas

class Logger#

#include <Logger.hh>

Manage logging in serial and parallel.

This should generally be accessed through the world_logger and self_logger functions below. The call operator() returns an object that should be streamed into in order to create a log message.

This object is assignable, so to replace the default log handler with a different one, you can call

world_logger = Logger(MpiCommunicator::comm_world(), my_handler);

Type aliases

using Message = detail::LoggerMessage#

Public Functions

explicit Logger(LogHandler handle)#: Construct with default communicator and handler.

Logger(MpiCommunicator const &comm, LogHandler handle)#: Construct with communicator (only rank zero is active) and handler.

inline Message operator()(LogProvenance &&prov, LogLevel lev)#

Create a log message that flushes its contents when it is destroyed.

It’s assumed that log messages will be relatively unlikely (and expensive anyway), so we mark as CELER_UNLIKELY to optimize for the no-logging case.
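The flush-on-destruction pattern can be sketched as below. All names are hypothetical, the provenance argument and MPI communicator are omitted, and the default level of info is an assumption; the sketch only shows how a temporary message object can collect streamed text and hand it to a handler when it goes out of scope.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <utility>

enum class LogLevelSketch { debug, info, warning, error, critical };
using LogHandlerSketch
    = std::function<void(LogLevelSketch, std::string const&)>;

// Collects streamed text and passes it to the handler on destruction.
class LogMessageSketch
{
  public:
    LogMessageSketch(LogHandlerSketch const* handle, LogLevelSketch lev)
        : handle_(handle), lev_(lev)
    {
    }
    LogMessageSketch(LogMessageSketch&& other)
        : handle_(other.handle_), lev_(other.lev_), os_(std::move(other.os_))
    {
        other.handle_ = nullptr;  // the moved-from message must not flush
    }
    ~LogMessageSketch()
    {
        if (handle_)
            (*handle_)(lev_, os_.str());
    }

    template<class T>
    LogMessageSketch& operator<<(T const& value)
    {
        if (handle_)  // skip formatting when the message is filtered out
            os_ << value;
        return *this;
    }

  private:
    LogHandlerSketch const* handle_;
    LogLevelSketch lev_;
    std::ostringstream os_;
};

class LoggerSketch
{
  public:
    explicit LoggerSketch(LogHandlerSketch handle) : handle_(std::move(handle))
    {
    }
    void level(LogLevelSketch lev) { min_level_ = lev; }
    LogLevelSketch level() const { return min_level_; }

    // Messages below the minimum level get a null handler and do nothing
    LogMessageSketch operator()(LogLevelSketch lev)
    {
        return LogMessageSketch(lev >= min_level_ ? &handle_ : nullptr, lev);
    }

  private:
    LogHandlerSketch handle_;
    LogLevelSketch min_level_ = LogLevelSketch::info;  // assumed default
};
```

A statement such as `log(LogLevelSketch::critical) << "code " << 42;` creates a temporary message, streams into it, and invokes the handler once with the full text at the end of the statement.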

inline void level(LogLevel lev)#: Set the minimum logging verbosity.

inline LogLevel level() const#: Get the current logging verbosity.

Public Static Functions

static inline constexpr LogLevel default_level()#: Get the default log level.

Private Members

LogHandler handle_#

LogLevel min_level_ = {default_level()}#

class OutputInterface#

Pure abstract interface for writing metadata output to JSON.

At the end of the program/run, the OutputRegistry will call the “output” method on all interfaces.

Todo:: Perhaps another output method for saving a schema?

Subclassed by celeritas::ActionDiagnostic, celeritas::ActionRegistryOutput, celeritas::BuildOutput, celeritas::ExceptionOutput, celeritas::GeantSimpleCalo, celeritas::GeantStepDiagnostic, celeritas::GeoParamsOutput, celeritas::MaterialParamsOutput, celeritas::OrangeParamsOutput, celeritas::OutputInterfaceAdapter< T >, celeritas::ParticleParamsOutput, celeritas::PhysicsParamsOutput, celeritas::SimpleCalo, celeritas::StepDiagnostic, celeritas::VecgeomParamsOutput

Public Types

enum class Category#

Output category (TODO: could replace with string/cstring?)

Values:

enumerator input#

enumerator result#

enumerator system#

enumerator internal#

enumerator size_#

Public Functions

virtual Category category() const = 0#: Category of data to write.

virtual std::string label() const = 0#: Key for the entry inside the category.