Data model

Data storage must be isolated from data use for any code that is to run on the device. This allows low-level physics classes to operate on references to data using the exact same device/host code. Furthermore, state data (one per track) and shared data (definitions, persistent data, model data) should be separately allocated and managed.

Params

Provide a host-side interface to manage and provide access to constant shared GPU data, usually model parameters or the like. The Params class itself can only be accessed via host code. A params class can contain metadata (string names, etc.) suitable for host-side debug output and for helping related classes convert from user-friendly input (e.g. particle name) to device-friendly IDs (e.g., particle ID). These classes should inherit from the ParamsDataInterface class to define uniform helper methods and types and will often implement the data storage by using ParamsDataStore.

State

Thread-local data specifying the state of a single particle track with respect to a corresponding params class (FooParams). In the main Celeritas stepping loop, all state data is managed via the CoreState class.

View

Device-friendly class that provides read and/or write access to shared and local state data. The name is in the spirit of std::string_view, which adds functionality to non-owned data. It combines the state variables and model parameters into a single class. The constructor always takes const references to ParamsData and StateData as well as the track slot ID. It encapsulates the storage/layout of the state and parameters, as well as what (if any) data is cached in the state.

Hint

Consider the following example.

All SM physics particles share a common set of properties such as mass and charge, and each particle instance has its own set of associated variables such as kinetic energy. The shared data (SM parameters) reside in ParticleParams, and the particle track properties are stored as part of the core state.

A separate class, the ParticleTrackView, is instantiated with a specific thread ID so that it acts as an accessor to the stored data for a particular track. It can calculate properties that depend on both the state and parameters. For example, momentum depends on both the mass of a particle (constant, set by the model) and the speed (variable, depends on particle track state).
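The split between params, state, and view can be sketched in standalone C++. This is an illustration only, not the Celeritas implementation: the struct layouts, the use of std::vector, the plain integer slot index, and the non-const state reference are all simplifying assumptions. The momentum is computed from the stored kinetic energy T and rest mass m with p² = T² + 2mT (natural units).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the Celeritas types.
struct ParticleRecord
{
    double mass;    // rest mass [MeV/c^2]
    double charge;  // charge [e]
};

// Shared, read-only data: one record per particle type
struct ParticleParamsData
{
    std::vector<ParticleRecord> particles;
};

// Per-track data: one entry per track slot
struct ParticleStateData
{
    std::vector<std::size_t> particle_id;
    std::vector<double> kinetic_energy;  // [MeV]
};

// Non-owning accessor that combines params and state for one track slot
class ParticleTrackView
{
  public:
    ParticleTrackView(ParticleParamsData const& params,
                      ParticleStateData& state,
                      std::size_t slot)
        : params_(params), state_(state), slot_(slot)
    {
        assert(slot_ < state_.kinetic_energy.size());
    }

    // Constant property looked up from the shared params
    double mass() const
    {
        return params_.particles[state_.particle_id[slot_]].mass;
    }

    // Variable property stored in the track state
    double energy() const { return state_.kinetic_energy[slot_]; }
    void energy(double e) { state_.kinetic_energy[slot_] = e; }

    // Derived property that needs both: p^2 = T^2 + 2 m T
    double momentum() const
    {
        double const t = this->energy();
        return std::sqrt(t * t + 2 * this->mass() * t);
    }

  private:
    ParticleParamsData const& params_;
    ParticleStateData& state_;
    std::size_t slot_;
};
```

Because the view owns nothing, constructing one per thread is cheap, and the same class body works no matter where the referenced data lives.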

OpaqueId

The OpaqueId class is fundamental to Celeritas’ platform portability and software infrastructure.

template<class TagT, class IndexT = ::celeritas::size_type>
class OpaqueId

Type-safe “optional” index for accessing an array or collection of data.

Indexing into arrays with integers, rather than storing pointers, is key to easy and safe data management across host/device boundaries. Pointers in C++ can act as a reference to an array or element of data, and they also have a type, which not only gives the stride width in bytes but also prevents accidental aliasing.

The OpaqueId class is an attempt to model integer indexing (device-friendly) with pointer semantics (type-safe). Annotating index offsets with a type gives the offsets a semantic meaning, and it gives the developer compile-time type safety. As an example, it prevents index arguments in a function call from being provided out of order.

An OpaqueId is typically used like a std::optional<IndexT>. The default-constructed value, nullid, cannot be used to index into an array, nor does it represent a valid element. An OpaqueId object evaluates to true if it has a value (OpaqueId{3}) or false if it does not (OpaqueId{}). The invalid state is usually referred to in the codebase as a “null ID”. Analogous to std::nullopt for std::optional, nullid can be used for comparison, assignment, and construction.

The OpaqueId is hashable, sortable, and printable. It can be loaded via cached device memory using ldg.

Construction

  • Default: result is nullid

  • Implicitly from nullid

  • Explicitly from a compatible unsigned integer

  • Via id_cast for safe construction from general (or differently sized) integers

Usage

  • Check for nullity with bool or by comparing with nullid

  • Check for validity as a container index with id < vec.size()

  • Access value with operator* : vec[*id]

  • Access data with Collection::operator[]

  • Loop over consecutive IDs with range

  • Pre- and post- increment and decrement

  • Subtract two IDs to get a difference
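The construction and usage rules above can be illustrated with a deliberately minimal stand-in. SketchId, unchecked_get, and lookup_mass are hypothetical names invented for this sketch; the real OpaqueId has more functionality (nullid, id_cast, increments, hashing) and may store its null state differently. The sketch assumes the null sentinel is the largest representable index, which is what makes the "id < vec.size()" check reject null IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// Minimal stand-in for illustration only; NOT the Celeritas OpaqueId.
template<class TagT, class IndexT = unsigned int>
class SketchId
{
  public:
    SketchId() = default;  // default-constructed IDs are null
    explicit SketchId(IndexT value) : value_(value) {}

    // True if the ID holds a value
    explicit operator bool() const { return value_ != null_value(); }

    // Access the index; only valid for non-null IDs
    IndexT operator*() const
    {
        assert(*this);
        return value_;
    }

    // Raw stored value, including the null sentinel
    IndexT unchecked_get() const { return value_; }

  private:
    static constexpr IndexT null_value()
    {
        return std::numeric_limits<IndexT>::max();
    }
    IndexT value_{null_value()};
};

// The null sentinel is the largest index, so null IDs fail this check
template<class TagT, class IndexT>
bool operator<(SketchId<TagT, IndexT> id, std::size_t size)
{
    return id.unchecked_get() < size;
}

using ParticleId = SketchId<struct Particle_>;
using MaterialId = SketchId<struct Material_>;

// Distinct tags make distinct types: swapping IDs is a compile error
static_assert(!std::is_convertible<MaterialId, ParticleId>::value,
              "IDs with different tags must not convert");
static_assert(!std::is_convertible<unsigned int, ParticleId>::value,
              "construction from integers must be explicit");

double lookup_mass(std::vector<double> const& masses, ParticleId id)
{
    assert(id < masses.size());
    return masses[*id];
}
```

Passing a MaterialId to lookup_mass fails to compile, which is the compile-time safety a raw integer index cannot provide.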

Related helper functions and types

  • nullid is an instance of nullid_t that compares to any OpaqueId as its “null” value.

  • is_opaque_id_v allows checking for generic types

  • MakeSize_t is a descriptive type alias to get the unsigned integer value_type of an opaque ID, used for container capacities.

  • id_cast safely converts integers to OpaqueId.

About the TagT template parameter

If this class is used for indexing into an array, then the TagT argument should usually be the value type of the array:

FooRecord operator[](OpaqueId<FooRecord>);

Otherwise, the convention is to use an anonymous tag:

using FooId = OpaqueId<struct Foo_>;

Note

A valid ID will always compare less than a null ID: you can use std::partition and erase to remove null IDs from a vector.
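A sketch of the partition-and-erase idiom mentioned in this note follows. SketchId and remove_null_ids are hypothetical names, and the sketch assumes the null state is stored as the maximum unsigned value so that valid IDs compare less than null.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for an OpaqueId: the null sentinel is the max value
struct SketchId
{
    unsigned int value = std::numeric_limits<unsigned int>::max();

    explicit operator bool() const
    {
        return value != std::numeric_limits<unsigned int>::max();
    }
};

// Because null is stored as the largest value, valid IDs sort before it
inline bool operator<(SketchId a, SketchId b)
{
    return a.value < b.value;
}

// Move valid IDs to the front, then erase the trailing null IDs
std::vector<SketchId> remove_null_ids(std::vector<SketchId> ids)
{
    auto first_null = std::partition(
        ids.begin(), ids.end(), [](SketchId id) {
            return static_cast<bool>(id);
        });
    ids.erase(first_null, ids.end());
    return ids;
}
```

Note that std::partition does not preserve relative order; std::stable_partition or std::sort can be substituted when the order of the remaining IDs matters.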

Note

Comparators are defined as inline friend functions to allow ADL-assisted conversion, including from LdgWrapper.

Template Parameters:
  • TagT – Type of an item at the index corresponding to this ID

  • IndexT – Unsigned integer acting as the stored value

Deprecated access

Deprecated:

Remove in v1.0

Pointer-like arithmetic

Get the distance between two opaque IDs

Compare with unsigned int

This allows size checking for containers

template<class IdT, class J>
inline auto celeritas::id_cast(J value) noexcept(!0) -> std::enable_if_t<is_opaque_id_v<IdT> && std::is_integral_v<J>, IdT>

Safely create an OpaqueId from an integer of any type.

This asserts that the integer is in the valid range of the target ID type, and casts to it.

Note

The value cannot be the underlying “null” value; i.e. static_cast<FooId>(*FooId{}) will not work.
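The range check described here can be sketched as follows, assuming C++17. SketchId and sketch_id_cast are hypothetical stand-ins, and the assumption that the null value is the maximum of the index type is made only for this illustration. The check is strictly "less than max" because the maximum is reserved for the null ID.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for an OpaqueId with a null sentinel
template<class TagT, class IndexT = unsigned int>
struct SketchId
{
    using size_type = IndexT;
    IndexT value = std::numeric_limits<IndexT>::max();
};

// Range-checked conversion from any (non-bool) integer type
template<class IdT, class J>
IdT sketch_id_cast(J value)
{
    static_assert(std::is_integral<J>::value && !std::is_same<J, bool>::value,
                  "value must be an integer");
    using IndexT = typename IdT::size_type;
    using UnsignedJ = typename std::make_unsigned<J>::type;

    if constexpr (std::is_signed<J>::value)
    {
        // Negative values can never be valid indices
        assert(value >= 0);
    }
    // Strictly less than max: the max value is reserved for the null ID
    assert(static_cast<UnsignedJ>(value) < std::numeric_limits<IndexT>::max());
    return IdT{static_cast<IndexT>(value)};
}

using FooId = SketchId<struct Foo_>;
```

With assertions enabled, a call such as sketch_id_cast<FooId>(-1) aborts rather than silently wrapping to a large index.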

template<class T>
using celeritas::MakeSize_t = typename MakeSize<std::remove_cv_t<T>>::type

Get the unsigned integer corresponding to an ID’s capacity.

Storage

Collection: a data portability class

The Collection manages data allocation and transfer between CPU and GPU.

Its primary design goal is facilitating construction of deeply hierarchical data on host at setup time and seamlessly copying to device. The templated T must be trivially copyable and destructible: either a fundamental data type or a struct of such types. (Some classes in external libraries, such as rocrand’s state types and VecGeom’s NavTuple types, are essentially trivial but implement null-op destructors or optimized copy constructors, so we allow specialization through the celeritas::IsTriviallyCopyable class.)

An individual item in a Collection<T> can be accessed with ItemId<T>, a contiguous subset of items is accessed with ItemRange<T>, and the entirety of the data is accessed with all_items. All three of these classes are trivially copyable, so they can be embedded in structs that can be managed by a Collection. A group of Collections, one for each data type, can therefore be trivially copied to the GPU to enable arbitrarily deep and complex data hierarchies.

By convention, groups of Collections comprising the data for a single class or subsystem (such as RayleighInteractor or Physics) are stored in a helper struct suffixed with Data. For cases where there is both persistent data (problem-specific parameters) and transient data (track-specific states), the collections must be grouped into two separate classes. StateData is meant to be mutable and never directly copied between host and device; its data collections are typically accessed by thread ID. ParamsData is immutable and always “mirrored” on both host and device. Sometimes it’s sensible to partition ParamsData into discrete helper structs (stored by value), each with a group of collections, and perhaps another struct that has non-templated scalars (since the default assignment operator is less work than manually copying scalars in a templated assignment operator).

A collection group has the following requirements to be compatible with the ParamsDataStore (for “params” collection groups), StateDataStore (for “state” collection groups), and other such helper classes:

  • Be a struct templated with template<Ownership W, MemSpace M>

  • Contain only Collection objects and trivially copyable structs

  • Define an operator bool that is true if and only if the class data is assigned and consistent

  • Define a templated assignment operator on “other” Ownership and MemSpace which assigns every member to the right-hand-side’s member
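The four requirements above can be made concrete with a host-only sketch. SketchCollection and FooParamsData are hypothetical stand-ins written for this illustration: the real Collection also supports device memory, mutable references, and ID-based indexing, none of which is modeled here. What the sketch does show is the shape of a collection group: templated on ownership and memory space, holding only collections, with an operator bool and a templated assignment that assigns member by member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, host-only stand-ins for illustration; NOT the Celeritas types.
enum class Ownership { value, const_reference };
enum class MemSpace { host };

template<class T, Ownership W, MemSpace M>
class SketchCollection;

// Owning storage
template<class T, MemSpace M>
class SketchCollection<T, Ownership::value, M>
{
  public:
    void push_back(T const& v) { data_.push_back(v); }
    std::size_t size() const { return data_.size(); }
    T const* data() const { return data_.data(); }

  private:
    std::vector<T> data_;
};

// Non-owning read-only view
template<class T, MemSpace M>
class SketchCollection<T, Ownership::const_reference, M>
{
  public:
    // Reference another collection's storage without copying it
    template<Ownership W2, MemSpace M2>
    SketchCollection& operator=(SketchCollection<T, W2, M2> const& other)
    {
        data_ = other.data();
        size_ = other.size();
        return *this;
    }
    std::size_t size() const { return size_; }
    T const* data() const { return data_; }
    T const& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }

  private:
    T const* data_ = nullptr;
    std::size_t size_ = 0;
};

// Collection group: templated on ownership and memory space
template<Ownership W, MemSpace M>
struct FooParamsData
{
    SketchCollection<double, W, M> energies;
    SketchCollection<double, W, M> xs;

    // True only if the data is assigned and consistent
    explicit operator bool() const
    {
        return energies.size() > 0 && energies.size() == xs.size();
    }

    // Assign every member from the right-hand side's member
    template<Ownership W2, MemSpace M2>
    FooParamsData& operator=(FooParamsData<W2, M2> const& other)
    {
        assert(other);
        energies = other.energies;
        xs = other.xs;
        return *this;
    }
};
```

Keeping the assignment purely member-wise is what lets a generic data store build a reference group from an owning group without knowing anything about the members.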

Additionally, a StateData collection group must define

  • A member function size() returning the number of entries (i.e. number of threads)

  • A free function resize with one of three signatures:

    void resize(
        StateData<Ownership::value, M>* data,
        HostCRef<ParamsData> const&     params,
        StreamId                        stream,
        size_type                       size);
    // or...
    void resize(
        StateData<Ownership::value, M>* data,
        const HostCRef<ParamsData>&     params,
        size_type                       size);
    // or...
    void resize(
        StateData<Ownership::value, M>* data,
        size_type                       size);
    

By convention, related groups of collections are stored in a header file named *Data.hh.

See ParticleParamsData and ParticleStateData for minimal examples of using collections. The MaterialParamsData demonstrates additional complexity by having a multi-level data hierarchy, and MaterialStateData has a resize function that uses params data. PhysicsParamsData is a very complex example, and VecgeomParamsData demonstrates how to use template specialization to adapt Collections to another codebase with a different convention for host-device portability.

A common paradigm for managing host-device data is to have a small fixed-size POD struct called a record that contains attributes about an item. These often need to reference a variable-sized range of data and do so by storing an ItemRange or ItemMap. These two types are offsets into “backend” data stored by a collection group.

Finally, note that the templated type aliases HostVal, HostCRef, DeviceRef, etc. are useful for functions that are specialized on MemSpace.

enum class celeritas::MemSpace

Memory location of data.

Values:

enumerator host

CPU memory.

enumerator device

GPU memory.

enumerator mapped

Unified virtual address space (both host and device)

enumerator size_
enumerator native

When compiling CUDA files, device else host.

enum class celeritas::Ownership

Data ownership flag.

Values:

enumerator value

The collection owns the data.

enumerator reference

Mutable reference to data.

enumerator const_reference

Immutable reference to data.

template<class T, class U = size_type>
using celeritas::ItemId = OpaqueId<T, U>

Opaque ID representing a single element of a container.

template<class T, class Size = size_type>
using celeritas::ItemRange = Range<OpaqueId<T, Size>>

Reference a contiguous range of IDs corresponding to a slice of items.

An ItemRange is a range of OpaqueId<T> that reference a range of values of type T in a Collection. The ItemRange acts like a slice object in Python when used on a Collection, returning a Span<T> of the underlying data.

An ItemRange is only meaningful in connection with a particular Collection of type T. It doesn’t have any persistent connection to its associated collection and thus must be used carefully.

struct MyMaterial
{
    real_type number_density;
    ItemRange<ElementComponents> components;
};

template<Ownership W, MemSpace M>
struct MyData
{
    Collection<ElementComponents, W, M> components;
    Collection<MyMaterial, W, M> materials;
};

Template Parameters:

T – The value type of items to represent.

template<class T1, class T2>
class ItemMap

Access data in a Range<T2> with an index of type T1.

Here, T1 and T2 are expected to be OpaqueId types. This is simply a type-safe “offset” with range checking.

Example:

using ElComponentId = OpaqueId<struct ElComp_>;
using MatId = OpaqueId<struct MaterialRecord>;

// POD struct (record) describing a material
struct MaterialRecord
{
  using DoubleId = ItemId<double>; // same as OpaqueId
  ItemMap<ElComponentId, DoubleId> components;
};

template<Ownership W, MemSpace M>
struct MatParamsData
{
  Collection<MaterialRecord, W, M> materials;
  Collection<double, W, M> doubles; // Backend storage
  // ...
};

Here, components semantically refers to a contiguous range of real values in the doubles collection, where ElComponentId{0} is the first value in that range. Dereferencing the value requires using the map alongside the backend storage:

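double get_value(
    MatParamsData<Ownership::const_reference, MemSpace::native> const& params,
    MatId m,
    ElComponentId ec)
{
  // First indirection: look up the material record
  MaterialRecord const& mat = params.materials[m];
  // Integer arithmetic only: map the component index to a storage index
  ItemId<double> dbl_id = mat.components[ec];
  // Second indirection: read the value from backend storage
  return params.doubles[dbl_id];
}

Note that this access requires only two indirections, as ItemMap is merely performing integer arithmetic.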
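To show why the map lookup itself touches no memory, here is a minimal sketch of a typed offset map. ComponentIndex, StorageIndex, and SketchItemMap are hypothetical names, and the begin-plus-size representation is an assumption made for this illustration rather than a description of ItemMap's actual layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical typed indices standing in for OpaqueId types
struct ComponentIndex
{
    std::size_t value;
};
struct StorageIndex
{
    std::size_t value;
};

// Typed offset into a contiguous range of backend storage
class SketchItemMap
{
  public:
    SketchItemMap(StorageIndex begin, std::size_t size)
        : begin_(begin.value), size_(size)
    {
    }

    // Range-checked integer arithmetic: no memory is read here
    StorageIndex operator[](ComponentIndex i) const
    {
        assert(i.value < size_);
        return StorageIndex{begin_ + i.value};
    }

    std::size_t size() const { return size_; }

  private:
    std::size_t begin_;
    std::size_t size_;
};
```

The result is only an index; reading the value still requires the backend collection that the map was built against.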

template<class T, MemSpace M = MemSpace::native>
constexpr detail::AllItems_t<T, M> celeritas::all_items

Memspace-safe sentinel for obtaining a span of a collection.

template<class T, Ownership W, MemSpace M, class I = ItemId<T>>
class Collection

Manage generic array-like data ownership and transfer from host to device.

Data are constructed incrementally on the host, then copied (along with their associated ItemRange) to device. A Collection can act as a std::vector<T>, DeviceVector<T>, Span<T>, or Span<const T>. The Spans can point to host or device memory, but the MemSpace template argument protects against accidental accesses from the wrong memory space.

Each Collection object is usually accessed with an ItemRange, which references a contiguous set of elements in the Collection. For example, setup code on the host would extend the Collection with a series of vectors, the addition of which returns an ItemRange that returns the equivalent data on host or device. This methodology allows complex nested data structures to be built up quickly at setup time without knowing the size requirements beforehand.
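The incremental build pattern can be sketched with a small host-only builder. SketchRange, SketchBuilder, insert_back, and slice are hypothetical names for this illustration, and plain integer offsets stand in for typed ItemRange objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an ItemRange: a half-open range of offsets
struct SketchRange
{
    std::size_t begin = 0;
    std::size_t end = 0;

    std::size_t size() const { return end - begin; }
};

// Hypothetical host-side builder over flat backend storage
template<class T>
class SketchBuilder
{
  public:
    // Append values and return the range of offsets they occupy
    SketchRange insert_back(std::vector<T> const& values)
    {
        SketchRange result{storage_.size(), storage_.size() + values.size()};
        storage_.insert(storage_.end(), values.begin(), values.end());
        return result;
    }

    // Pointer to the first element of a range created by this builder
    T const* slice(SketchRange r) const
    {
        assert(r.begin <= r.end && r.end <= storage_.size());
        return storage_.data() + r.begin;
    }

    std::size_t size() const { return storage_.size(); }

  private:
    std::vector<T> storage_;
};
```

Because the returned range holds offsets rather than pointers, it stays valid when the storage reallocates during setup and when the storage is later copied to a different memory space.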

Host-device functions and classes should use Collection with a reference or const_reference Ownership, and the MemSpace::native type, which expects device memory when compiled inside a CUDA file and host memory when used inside a C++ source or test. This design choice prevents a single CUDA file from compiling separate host-compatible and device-compatible compute kernels. (In the case of Celeritas this situation won’t arise, because we always want to build host code in C++ files for development ease and to allow testing when CUDA is disabled.)

A MemSpace::mapped collection will be accessible on both host and device. Unified addressing must be supported by the current device, or an exception will be thrown when initializing the collection. Memory pages will reside in “pinned” memory on host, and each access from device code to a changed page will require a slow memory transfer. Allocating pinned memory is slow and reduces the memory available to the system, so allocate only the smallest amount needed with the longest possible lifetime. Frequently accessing data from device code will result in low performance. Use cases for mapped memory are:

Accessing a const_reference collection in device memory will return a wrapper container that accesses the low-level data through the celeritas::ldg wrapper function, which can accelerate random access on GPU by telling the compiler the memory will not be changed during the lifetime of the kernel. Therefore it is important to only use const Collections for shared, immutable-after-creation “params” data.

Accessing a reference collection returns mutable pointers, even when given the collection as a const reference.

template<template<Ownership, MemSpace> class P>
class ParamsDataStore : public celeritas::ParamsDataInterface<P>

Store and reference persistent collection groups on host and device.

This should generally be an implementation detail of Params classes, which are constructed on host and must have the same data both on host and device. The template P must be a FooData class that:

  • Is templated on ownership and memory space

  • Has a templated assignment operator to copy from one space to another

  • Has a boolean operator returning whether it’s in a valid state.

On assignment, it will copy the data to the device if the GPU is enabled.

Example:
class FooParams
{
  public:
    using CollectionDeviceRef = FooData<Ownership::const_reference,
                                        MemSpace::device>;

    const CollectionDeviceRef& device_ref() const
    {
        return data_.device_ref();
    }
  private:
    ParamsDataStore<FooData> data_;
};

Template Parameters:

P – Params data collection group

template<template<Ownership, MemSpace> class S, MemSpace M>
class StateDataStore

Store and reference stateful collection groups on host and device.

This can be used for unit tests (MemSpace is host) as well as production code. States generally shouldn’t be copied between host and device, so the only “production use case” construction argument is the size. Other constructors are implemented for convenience in unit tests.

The State class must be templated on ownership and memory space, and additionally must have an operator bool(), a templated operator=, and a size() accessor. It must also define a free function “resize” that takes:

  • REQUIRED: a pointer to the state with Ownership::value semantics

  • OPTIONAL: a Ownership::const_reference instance of MemSpace::host params data

  • OPTIONAL: a StreamId for setting up thread/task-local data

  • REQUIRED: a size_type for specifying the size of the new state.

Example:
StateDataStore<ParticleStateData, MemSpace::device> pstates(
    *particle_params, num_tracks);
state_data.particle = pstates.ref();

Template Parameters:

S – State data collection group

Optimized device data access

Cached device loading

On GPUs, reading from global memory through the L1/texture cache can improve throughput when many threads access the same address.

The __ldg intrinsic (CUDA/HIP) performs such a cached load, using L1/texture memory rather than the ordinary data cache. The hardware contract is that the pointed-to memory must be read-only for the lifetime of the kernel; this is generally true for Params data (physics tables, geometry) but not for State data. On the host the ldg family of functions falls back to a plain dereference, so no special-casing is needed in caller code. Because ldg is valid only for read-only addresses, it accepts only pointers to const types.
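The host fallback can be sketched with a preprocessor switch. SKETCH_FUNCTION and sketch_ldg are hypothetical names; the real celeritas::ldg additionally dispatches through ldg_data to support non-arithmetic types, which this sketch omits. The sketch assumes T is a type that the __ldg intrinsic accepts directly.

```cpp
#include <cassert>

// Decorate for host and device when compiling with CUDA or HIP
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define SKETCH_FUNCTION __host__ __device__
#else
#    define SKETCH_FUNCTION
#endif

// Cached read-only load on device; plain dereference on host
template<class T>
SKETCH_FUNCTION inline T sketch_ldg(T const* ptr)
{
    assert(ptr);
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
    return __ldg(ptr);
#else
    return *ptr;
#endif
}
```

Taking a pointer to const and returning by value mirrors the read-only contract: the caller never receives a reference through which the data could be modified.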

Scalar values

Pass a const pointer to any supported type to the one-argument ldg:
real_type energy = ldg(&record.energy);
MaterialId mat   = ldg(&record.material);  // OpaqueId supported

Struct members

Load a single member without reading the whole struct using the two-argument overload or the storable LdgMember projector:
// Immediate two-argument form
BIHNodeId parent = ldg(node, &BIHLeafNode::parent);

// Storable callable -- useful with algorithms
auto load_parent = LdgMember{&BIHLeafNode::parent};
BIHNodeId parent = load_parent(node);

Spans and collections

LdgSpan<T const> (from corecel/cont/LdgSpan.hh) is an alias for Span whose iterator triggers __ldg on every element access. Use it as you would any ordinary span:
LdgSpan<real_type const> energies = params.get_energies();
for (real_type e : energies)   // each read uses __ldg
    process(e);

Collection<T, Ownership::const_reference, MemSpace::device> returns LdgSpan automatically when the element type supports ldg, so View classes built on device const_reference collections benefit without any extra work.

Extending ldg to a new type

ldg dispatches through the customization point ldg_data, found by argument-dependent lookup (ADL). To support a new type, define a free function in its namespace that returns a const pointer to an arithmetic type. For a wrapper struct holding a single int member:
namespace myns
{
struct MyCount { int value; };

CELER_CONSTEXPR_FUNCTION int const* ldg_data(MyCount const* p) noexcept
{
    return &p->value;
}
}  // namespace myns

Built-in overloads cover:

  • arithmetic types (identity, the default),

  • enum types (reinterpret-cast to the underlying integer),

  • OpaqueId<I,T> (pointer to the underlying index T), and

  • Quantity<U,T> (pointer to the underlying value T).
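The ADL dispatch can be sketched on the host as follows. The sketch namespace and load_underlying are hypothetical, and for simplicity the function returns the underlying arithmetic value instead of reconstructing the wrapper type as the real ldg does.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
// Default customization point: arithmetic types are their own data
template<class T,
         typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
T const* ldg_data(T const* p) noexcept
{
    return p;
}

// Load the underlying arithmetic value. The unqualified ldg_data call
// finds the default above plus, through ADL, overloads in T's namespace.
// On device, a cached-load intrinsic would be applied to this pointer.
template<class T>
auto load_underlying(T const* ptr)
{
    assert(ptr);
    return *ldg_data(ptr);
}
}  // namespace sketch

namespace myns
{
struct MyCount
{
    int value;
};

// Found by ADL because MyCount lives in myns
inline int const* ldg_data(MyCount const* p) noexcept
{
    return &p->value;
}
}  // namespace myns
```

The overload for MyCount can be declared after load_underlying because dependent unqualified calls are resolved by ADL at the point of instantiation.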

template<class T>
T celeritas::ldg(T const *ptr)

Wrap the low-level CUDA/HIP “load read-only global memory” function.

This relies on ldg_data found by ADL to obtain a pointer to the underlying arithmetic type; see Cached device loading for usage and extension examples.

On CUDA the load is cached in L1/texture memory, improving performance when data is repeatedly read by many threads in a kernel.

Warning

The target address must be read-only for the lifetime of the kernel. This is generally true for Params data but not State data.

template<class T, std::size_t Extent = dynamic_extent>
using celeritas::LdgSpan = Span<detail::LdgWrapper<T>, Extent>

Alias for a Span iterating over device const values read using ldg.

This instantiates Span with a special wrapper class to optimize constant data access in global device memory. In that case, data returned by front, back, operator[] and begin / end iterators use value semantics instead of reference.

Containers

These are containers and container-like objects used throughout Celeritas.

template<class T, std::size_t N>
class Array

Fixed-size simple array for storage.

The Array class is primarily used for point coordinates (e.g., Real3) but is also used for other fixed-size data structures.

This is not fully compatible with std::array:

  • no support for N=0

  • zero-initialized by default

Note

For supplementary functionality, include:

  • corecel/math/ArrayUtils.hh for real-number vector/matrix applications

  • corecel/math/ArrayOperators.hh for mathematical operators

  • ArrayIO.json.hh for JSON input and output

template<class E, class T>
class EnumArray

Thin wrapper for an array of enums for accessing by enum instead of int.

The enum must be a zero-indexed contiguous enumeration with a size_ enumeration as its last value.
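A minimal sketch of the idea, using hypothetical names (Axis, SketchEnumArray) and relying on the size_ convention described above to determine the array length:

```cpp
#include <cassert>
#include <cstddef>

// Zero-indexed contiguous enum with size_ as its last value
enum class Axis
{
    x,
    y,
    z,
    size_
};

// Hypothetical stand-in: an aggregate indexed by enum instead of int
template<class E, class T>
struct SketchEnumArray
{
    static constexpr std::size_t size()
    {
        return static_cast<std::size_t>(E::size_);
    }

    T data_[static_cast<std::size_t>(E::size_)];

    T& operator[](E e)
    {
        assert(e < E::size_);
        return data_[static_cast<std::size_t>(e)];
    }
    T const& operator[](E e) const
    {
        assert(e < E::size_);
        return data_[static_cast<std::size_t>(e)];
    }
};
```

Indexing with a plain integer or with a different enum type fails to compile, which is the main benefit over a raw array.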

Todo:

The template parameters are reversed!!!

template<class T>
class Range

Proxy container for iterating over a range of integral values.

Here, T can be any of:

  • an integer,

  • an enum that has contiguous zero-indexed values and a “size_” enumeration value indicating how many, or

  • an OpaqueId.

It is OK to dereference the end iterator! The result should just be the off-the-end value for the range, e.g. FooEnum::size_ or bar.size().
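The reason the end iterator is safe to dereference is that the iterator stores the value itself rather than a pointer to it. A minimal integer-only sketch, with hypothetical names CountIter, SketchRange, and sum:

```cpp
#include <cassert>

// Hypothetical counting iterator: dereferencing yields the counter itself
template<class T>
struct CountIter
{
    T value;

    T operator*() const { return value; }
    CountIter& operator++()
    {
        ++value;
        return *this;
    }
    bool operator!=(CountIter const& other) const
    {
        return value != other.value;
    }
};

// Hypothetical half-open range [begin_, end_) of integral values
template<class T>
struct SketchRange
{
    T begin_;
    T end_;

    CountIter<T> begin() const { return {begin_}; }
    CountIter<T> end() const { return {end_}; }
};

// Sum the values produced by iterating over the range
inline int sum(SketchRange<int> r)
{
    int total = 0;
    for (int i : r)
    {
        total += i;
    }
    return total;
}
```

No memory is addressed by the iterator, so reading the end value is well defined and simply returns the off-the-end count.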

template<class T, std::size_t Extent = dynamic_extent>
class Span

Non-owning device-compatible reference to a contiguous span of data.

A Span, like std::string_view, provides access to externally managed data. In Celeritas, this class is typically used as a return result from accessing a range of elements in a Collection.

This implementation is a nonconforming backport of the C++20 std::span. Improvements for standards compatibility are welcome as long as they retain the same behavior in device code. Important differences from the standard std::span include:

  • Supports a special marker/tag type LdgValue<T> which causes element accessors and iterators to use value-semantics loads (optimized device loads) instead of references.

  • Uses a restricted constructor for iterators: instead of two separate iterator/end types, it uses only one.

  • Provides additional free helpers tailored to Celeritas: make_span overloads for Array<T,N>, C arrays, and generic containers, plus to_array() convenience and a host-only operator<< using StreamableContainer.

  • All public methods are decorated with CELER_CONSTEXPR_FUNCTION for host/device compatibility.

  • Some subview helpers use CELER_EXPECT to check for bounds validation in debug builds.

  • Dynamic-to-fixed conversion performs runtime checks when CELERITAS_DEBUG is on.

Construction

  • Default: empty span

  • Implicit: pointer and size

  • Implicit: two contiguous_iterator (first,last)

  • Implicit: C arrays and celeritas::Array when extents are compatible

  • Implicit: fixed-to-dynamic Span

Data access

  • Element access: operator[], front(), back()

  • Observers: data(), size(), size_bytes(), empty()

  • Iteration: begin(), end()

Subviews and utilities

  • first<Count>(), first(count), last<Count>(), last(count)

  • subspan<Offset,Count>() and subspan(offset,count) for compile-time and runtime subviews

  • Deduction guides for pointer+size, iterator pairs, C arrays, and Array

  • Free functions make_span(...) and to_array(...)

Template Parameters:
  • T – value type

  • Extent – fixed size; defaults to dynamic.

Auxiliary storage

Users and other parts of the code can add their own shared and stream-local (i.e., thread-local) data to Celeritas using the celeritas::AuxParamsInterface and celeritas::AuxStateInterface classes, accessed through the celeritas::AuxParamsRegistry and celeritas::AuxStateVec classes, respectively.

class AuxParamsInterface

Base class for extensible shared data that has associated state.

Auxiliary data can be added to an AuxParamsInterface at runtime to be passed among multiple classes, and then dynamic_cast to the expected type. It needs to supply a factory function for creating a state instance for multithreaded data on a particular stream and a given memory space. Classes can inherit both from AuxParamsInterface and other ActionInterface classes.
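The factory-plus-dynamic_cast pattern can be sketched in standalone C++. Every name here (SketchAuxState, SketchAuxParams, CounterParams, CounterState, get_counter_state) and the create_state signature are hypothetical and chosen for this illustration; the real interfaces have different signatures and also take a memory space.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Polymorphic base for per-stream state (hypothetical stand-in)
struct SketchAuxState
{
    virtual ~SketchAuxState() = default;
};

// Polymorphic base for shared data that can create its own state
struct SketchAuxParams
{
    virtual ~SketchAuxParams() = default;
    virtual std::string label() const = 0;
    // Factory: build the state for one stream with the given size
    virtual std::unique_ptr<SketchAuxState>
    create_state(std::size_t stream_id, std::size_t size) const = 0;
};

// Example user-defined state: one counter per track slot
struct CounterState : SketchAuxState
{
    std::vector<int> counts;
};

// Example user-defined params that produce CounterState instances
struct CounterParams : SketchAuxParams
{
    std::string label() const override { return "counter"; }

    std::unique_ptr<SketchAuxState>
    create_state(std::size_t, std::size_t size) const override
    {
        auto state = std::make_unique<CounterState>();
        state->counts.assign(size, 0);
        return state;
    }
};

// Recover the concrete type from the type-erased state
CounterState& get_counter_state(SketchAuxState& state)
{
    auto* result = dynamic_cast<CounterState*>(&state);
    assert(result);
    return *result;
}
```

The framework only ever sees the base classes; the code that registered the params is the only place that needs to know the concrete state type.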

Subclassed by celeritas::AuxParams< StatusCheckParamsData, StatusCheckStateData >, celeritas::ActionTimes, celeritas::AuxParams< P, S >, celeritas::ExtendFromPrimariesAction, celeritas::OffloadGatherAction< S >, celeritas::SlotDiagnostic, celeritas::optical::GeneratorBase

class AuxStateInterface

Auxiliary state data owned by a single stream.

This interface class is strictly to allow polymorphism and dynamic casting. It does not include attributes like size or memspace, because not all use cases require it.

Subclassed by celeritas::ActionTimesState, celeritas::AuxState< S, M >, celeritas::GeneratorStateBase, celeritas::PrimaryStateData< M >, celeritas::optical::CoreStateInterface

class AuxParamsRegistry

Manage auxiliary parameter classes.

This class keeps track of AuxParamsInterface classes.

class AuxStateVec

Manage single-stream auxiliary state data.

This class is constructed from an AuxParamsRegistry after the params are completely added and while the state is being constructed (with its size, etc.). The AuxId for an element of this class corresponds to the AuxParamsRegistry.

This class can be empty either by default or if the given auxiliary registry doesn’t have any entries.

Auxiliary collection groups

template<template<Ownership, MemSpace> class P, template<Ownership, MemSpace> class S>
class AuxParams : public celeritas::AuxParamsInterface, public celeritas::ParamsDataInterface<P>

Construct and manage portable dynamic data.

This generalization of the Celeritas data model manages some of the boilerplate code for the common use case of having portable “params” data (e.g., model data) and “state” data (e.g., temporary values used across multiple kernels or processed into user space). Each state/stream will have an instance of AuxState accessible by this class. An instance of this class can be shared among multiple actions, or an action could inherit from it.

Example:

The StepParams inherits from this class to provide access to host and state data. The execution inside StepGatherAction provides views to both the params and state data classes:

// Extract the local step state data
auto const& step_params = params_->ref<MemSpace::native>();
auto& step_state = params_->ref<MemSpace::native>(state.aux());

// Run the action
auto execute = TrackExecutor{
    params.ptr<MemSpace::native>(),
    state.ptr(),
    detail::StepGatherExecutor<P>{step_params, step_state}};

Note

For the case where the aux state data contains host-side classes and data (e.g., an open file handle) you must manually set up the params/state data using AuxStateInterface and AuxParamsInterface.

Template Parameters:
  • P – Params collection group

  • S – State collection group