Data model

Data storage must be isolated from data use for any code that is to run on the device. This allows low-level physics classes to operate on references to data using the exact same device/host code. Furthermore, state data (one per track) and shared data (definitions, persistent data, model data) should be separately allocated and managed.

Params

Provide a CPU-based interface for managing and accessing constant shared GPU data, usually model parameters or the like. The Params class itself can only be accessed via host code. A Params class can contain metadata (string names, etc.) suitable for host-side debug output and for helping related classes convert from user-friendly input (e.g., particle name) to device-friendly IDs (e.g., particle ID). These classes should inherit from the ParamsDataInterface class to define uniform helper methods and types, and they will often implement the data storage using a CollectionMirror.

State

Thread-local data specifying the state of a single particle track with respect to a corresponding params class (FooParams). In the main Celeritas stepping loop, all state data is managed via the CoreState class.

View

Device-friendly class that provides read and/or write access to shared and local state data. The name is in the spirit of std::string_view, which adds functionality to non-owned data. It combines the state variables and model parameters into a single class. The constructor always takes const references to ParamsData and StateData as well as the track slot ID. It encapsulates the storage/layout of the state and parameters, as well as what (if any) data is cached in the state.
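
As an illustration, a view might look like the following sketch. All class and member names here are hypothetical, not actual Celeritas code; Ownership and MemSpace are described under Storage below, and CELER_FUNCTION marks code compiled for both host and device.

// Hypothetical data groups; see "Storage" below for the real conventions
template<Ownership W, MemSpace M>
struct FooParamsData
{
    real_type scale{1};  // shared model parameter
};

template<Ownership W, MemSpace M>
struct FooStateData
{
    // One entry per track, indexed by track slot
    Collection<real_type, W, M, TrackSlotId> value;
};

class FooTrackView
{
  public:
    using ParamsRef
        = FooParamsData<Ownership::const_reference, MemSpace::native>;
    using StateRef = FooStateData<Ownership::reference, MemSpace::native>;

    // Bind shared "params" data and mutable state data to one track slot
    inline CELER_FUNCTION
    FooTrackView(ParamsRef const& params, StateRef const& state, TrackSlotId tid)
        : params_(params), state_(state), track_slot_(tid)
    {
    }

    // Derived quantity combining a model parameter with per-track state
    inline CELER_FUNCTION real_type scaled_value() const
    {
        return params_.scale * state_.value[track_slot_];
    }

  private:
    ParamsRef const& params_;
    StateRef const& state_;
    TrackSlotId track_slot_;
};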

Hint

Consider the following example.

All SM physics particles share a common set of properties such as mass and charge, and each particle instance has a particular set of associated variables such as kinetic energy. The shared data (SM parameters) reside in ParticleParams, and the per-track properties are managed by a ParticleStateStore class.

A separate class, the ParticleTrackView, is instantiated with a specific thread ID so that it acts as an accessor to the stored data for a particular track. It can calculate properties that depend on both the state and parameters. For example, momentum depends on both the mass of a particle (constant, set by the model) and the speed (variable, depends on particle track state).
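
In device or host stepping code, the view would be used roughly as follows. This is a hedged sketch: the reference aliases and the momentum and value accessors are assumptions based on the description above.

using ParamsRef
    = ParticleParamsData<Ownership::const_reference, MemSpace::native>;
using StateRef = ParticleStateData<Ownership::reference, MemSpace::native>;

inline CELER_FUNCTION real_type
calc_momentum(ParamsRef const& params, StateRef const& states, TrackSlotId tid)
{
    // Bind the shared particle definitions to one track's state
    ParticleTrackView particle(params, states, tid);
    // Momentum combines the shared mass (params) with the track's kinetic
    // energy (state)
    return particle.momentum().value();
}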

Storage

Collections

The Collection manages data allocation and transfer between CPU and GPU.

Its primary design goal is to facilitate construction of deeply hierarchical data on the host at setup time and to copy it seamlessly to the device. The templated type T must be trivially copyable: either a fundamental data type or a struct of such types.

An individual item in a Collection<T> can be accessed with ItemId<T>, a contiguous subset of items is accessed with ItemRange<T>, and the entirety of the data is accessed with AllItems<T>. All three of these classes are trivially copyable, so they can be embedded in structs that can be managed by a Collection. A group of Collections, one for each data type, can therefore be trivially copied to the GPU to enable arbitrarily deep and complex data hierarchies.

By convention, groups of Collections comprising the data for a single class or subsystem (such as RayleighInteractor or Physics) are stored in a helper struct suffixed with Data. For cases where there is both persistent data (problem-specific parameters) and transient data (track-specific states), the collections must be grouped into two separate classes. StateData are meant to be mutable and are never directly copied between host and device; their data collections are typically accessed by thread ID. ParamsData are immutable and always “mirrored” on both host and device. Sometimes it’s sensible to partition ParamsData into discrete helper structs (stored by value), each with a group of collections, and perhaps another struct that holds non-templated scalars (since the default assignment operator is less work than manually copying scalars in a templated assignment operator).

A collection group has the following requirements to be compatible with the CollectionMirror, CollectionStateStore, and other such helper classes:

  • Be a struct templated with template<Ownership W, MemSpace M>

  • Contain only Collection objects and trivially copyable structs

  • Define an operator bool that is true if and only if the class data is assigned and consistent

  • Define a templated assignment operator on “other” Ownership and MemSpace which assigns every member to the right-hand-side’s member

Additionally, a StateData collection group must define

  • A member function size() returning the number of entries (i.e. number of threads)

  • A free function resize with one of the following signatures:

    void resize(
        StateData<Ownership::value, M>* data,
        HostCRef<ParamsData> const&     params,
        StreamId                        stream,
        size_type                       size);
    // or...
    void resize(
        StateData<Ownership::value, M>* data,
        HostCRef<ParamsData> const&     params,
        size_type                       size);
    // or...
    void resize(
        StateData<Ownership::value, M>* data,
        size_type                       size);
    
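A minimal state group satisfying these requirements might look like the following sketch, which expands the hypothetical FooStateData from the view example above. The per-collection resize helper called inside the free resize function is assumed to be provided by the collection headers.

template<Ownership W, MemSpace M>
struct FooStateData
{
    //// DATA ////

    // One entry per track, indexed by track slot
    Collection<real_type, W, M, TrackSlotId> value;

    //// METHODS ////

    //! True if and only if the data are assigned
    explicit CELER_FUNCTION operator bool() const { return !value.empty(); }

    //! Number of state entries (i.e. the number of track slots)
    CELER_FUNCTION size_type size() const { return value.size(); }

    //! Assign from data in another ownership/memory space
    template<Ownership W2, MemSpace M2>
    FooStateData& operator=(FooStateData<W2, M2>& other)
    {
        CELER_EXPECT(other);
        value = other.value;
        return *this;
    }
};

//! Allocate state data for the given number of track slots
template<MemSpace M>
void resize(FooStateData<Ownership::value, M>* data, size_type size)
{
    CELER_EXPECT(size > 0);
    resize(&data->value, size);  // per-collection resize helper (assumed)
    CELER_ENSURE(*data);
}

A corresponding FooParamsData group follows the same pattern but takes the other data by const reference in its templated assignment operator and needs neither size() nor resize.
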

By convention, related groups of collections are stored in a header file whose name ends in Data.hh.

See ParticleParamsData and ParticleStateData for minimal examples of using collections. MaterialParamsData demonstrates additional complexity by having a multi-level data hierarchy, and MaterialStateData has a resize function that uses params data. PhysicsParamsData is a very complex example, and GeoParamsData demonstrates how to use template specialization to adapt Collections to another codebase with a different convention for host-device portability.

enum class celeritas::MemSpace

Memory location of data.

Values:

enumerator host

CPU memory.

enumerator device

GPU memory.

enumerator mapped

Unified virtual address space (both host and device).

enumerator size_

enumerator native

Equivalent to device when compiled inside a CUDA/HIP file, and host otherwise.

enum class celeritas::Ownership

Data ownership flag.

Values:

enumerator value

Ownership of the data, only on host.

enumerator reference

Mutable reference to the data.

enumerator const_reference

Immutable reference to the data.

template<class ValueT, class SizeT = ::celeritas::size_type>
class OpaqueId

Type-safe index for accessing an array or collection of data.

It’s common for classes and functions to take multiple indices, especially for O(1) indexing for performance. By annotating these values with a type, we give them semantic meaning, and we gain compile-time type safety.

If this class is used for indexing into an array, then the ValueT argument should be the value type of the array: Foo operator[](OpaqueId<Foo>)

An OpaqueId object evaluates to true if it has a value, or false if it does not (i.e. it has an “invalid” value).

See also id_cast below for checked construction of OpaqueIds from generic integer values (avoiding compile-time warnings or errors from signed or truncated integers).

Template Parameters:
  • ValueT – Type of each item in an array

  • SizeT – Unsigned integer index
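
For example (the tag type, alias, and accessor name here are illustrative, not actual Celeritas definitions):

// A type-safe index into element data
struct Element;
using ElementId = OpaqueId<Element>;

void opaque_id_demo()
{
    ElementId el{3};   // refers to the element at index 3
    CELER_ASSERT(el);  // true: the ID holds a value

    ElementId invalid;       // default-constructed: no value
    CELER_ASSERT(!invalid);  // evaluates to false

    size_type index = el.unchecked_get();  // raw integer value (name assumed)
    (void)index;
}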

template<class T>
using celeritas::ItemId = OpaqueId<T, size_type>

Opaque ID representing a single element of a container.

template<class T, class Size = size_type>
using celeritas::ItemRange = Range<OpaqueId<T, Size>>

Reference a contiguous range of IDs corresponding to a slice of items.

An ItemRange is a range of OpaqueId<T> that reference a range of values of type T in a Collection . The ItemRange acts like a slice object in Python when used on a Collection, returning a Span<T> of the underlying data.

An ItemRange is only meaningful in connection with a particular Collection of type T. It doesn’t have any persistent connection to its associated collection and thus must be used carefully.

struct MyMaterial
{
    real_type number_density;
    ItemRange<ElementComponents> components;
};

template<Ownership W, MemSpace M>
struct MyData
{
    Collection<ElementComponents, W, M> components;
    Collection<MyMaterial, W, M> materials;
};
Template Parameters:
  • T – The value type of items to represent.

template<class T1, class T2>
class ItemMap

Access data in a Range<T2> with an index of type T1.

Here, T1 and T2 are expected to be OpaqueId types. This is simply a type-safe “offset” with range checking.

template<class T, Ownership W, MemSpace M, class I = ItemId<T>>
class Collection

Manage generic array-like data ownership and transfer from host to device.

Data are constructed incrementally on the host, then copied (along with their associated ItemRange) to device. A Collection can act as a std::vector<T>, DeviceVector<T>, Span<T>, or Span<const T>. The Spans can point to host or device memory, but the MemSpace template argument protects against accidental accesses from the wrong memory space.

Each Collection object is usually accessed with an ItemRange, which references a contiguous set of elements in the Collection. For example, setup code on the host would extend the Collection with a series of vectors, each addition returning an ItemRange that can later be used to access the equivalent data on host or device. This methodology allows complex nested data structures to be built up quickly at setup time without knowing the size requirements beforehand.

Host-device functions and classes should use Collection with a reference or const_reference Ownership, and the MemSpace::native type, which expects device memory when compiled inside a CUDA file and host memory when used inside a C++ source or test. (This design choice prevents a single CUDA file from compiling separate host-compatible and device-compatible compute kernels, but in the case of Celeritas this situation won’t arise, because we always want to build host code in C++ files for development ease and to allow testing when CUDA is disabled.)

A MemSpace::mapped collection is accessible on both the host and the device. Unified addressing must be supported by the current device, or an exception will be thrown when using the collection. Mapped pinned memory (i.e., zero-copy memory) is allocated: the pages always reside in host memory, and each access from device code requires a slow memory transfer. Allocating pinned memory is slow and reduces the memory available to the system, so allocate only the smallest amount needed, with the longest possible lifetime. Frequently accessing such data from device code will result in low performance. Use cases for this memory space are: as a source or destination memory space for asynchronous operations, on integrated GPU architectures, and for a single coalesced read or write from device code. See https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#zero-copy

Accessing a const_reference collection in device memory will return a wrapper container that accesses the low-level data through the __ldg primitive, which can accelerate random access by telling the compiler the memory will not be changed during the lifetime of the kernel. Therefore it is important to only use Collections for shared, constant “params” data.
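
As a sketch of the setup-time workflow using the MyMaterial/MyData structs from the ItemRange example above: the CollectionBuilder helper and its insert_back/push_back methods are assumptions about the construction API, and the exact names may differ.

void build_demo(std::vector<ElementComponents> const& comps)
{
    MyData<Ownership::value, MemSpace::host> host_data;

    // Append the components; the returned ItemRange refers to the new slice
    ItemRange<ElementComponents> comp_range
        = CollectionBuilder{&host_data.components}.insert_back(comps.begin(),
                                                               comps.end());

    // Store a material that references that slice
    ItemId<MyMaterial> mat_id
        = CollectionBuilder{&host_data.materials}.push_back(
            MyMaterial{1.0, comp_range});

    // After mirroring to device (see CollectionMirror below), device code can
    // traverse the same hierarchy through a const_reference data group:
    //   MyMaterial const& mat = data.materials[mat_id];
    //   auto components = data.components[mat.components];  // Span of items
    (void)mat_id;
}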

template<template<Ownership, MemSpace> class P>
class CollectionMirror : public celeritas::ParamsDataInterface<P>

Helper class for copying setup-time Collection groups to host and device.

This should generally be an implementation detail of Params classes, which are constructed on host and must have the same data both on host and device. The template P must be a FooData class that:

  • Is templated on ownership and memory space

  • Has a templated assignment operator to copy from one space to another

  • Has a boolean operator returning whether it’s in a valid state.

On assignment, it will copy the data to the device if the GPU is enabled.

Example:

class FooParams
{
  public:
    using CollectionDeviceRef = FooData<Ownership::const_reference,
                                        MemSpace::device>;

    const CollectionDeviceRef& device_ref() const
    {
        return data_.device_ref();
    }
  private:
    CollectionMirror<FooData> data_;
};

Containers

template<class T, ::celeritas::size_type N>
struct Array

Fixed-size simple array for storage.

The Array class is primarily used for point coordinates (e.g., Real3) but is also used for other fixed-size data structures.

This isn’t fully standards-compliant with std::array: there’s no support for N=0 for example. Additionally it uses the native celeritas size_type, even though this has no effect on generated code for values of N inside the range of size_type.

Note

For supplementary functionality, include:

  • corecel/math/ArrayUtils.hh for real-number vector/matrix applications

  • corecel/math/ArrayOperators.hh for mathematical operators

  • ArrayIO.hh for streaming and string conversion

  • ArrayIO.json.hh for JSON input and output

Operations

Fill the array with a constant value
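
A brief usage sketch (Real3, an Array<real_type, 3> alias used for coordinates, is assumed here):

void array_demo()
{
    celeritas::Array<int, 3> counts{1, 2, 3};
    counts.fill(0);  // set every element to zero

    // Point coordinates and directions use the Real3 alias
    celeritas::Real3 pos{0.0, 0.0, 1.0};
    pos[2] = -1.0;
}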

template<class T, std::size_t Extent = dynamic_extent>
class Span

Non-owning reference to a contiguous span of data.

This Span class is a modified backport of the C++20 std::span. In Celeritas, it is often used as a return value from accessing elements in a Collection.

Like the celeritas::Array , this class isn’t 100% compatible with the std::span class (partly of course because language features are missing from C++14). The hope is that it will be complete and correct for the use cases needed by Celeritas (and, as a bonus, it will be device-compatible).

Notably, only a subset of the functions (those having to do with size) are constexpr. This is to allow debug assertions.

Span can be instantiated with the special marker type LdgValue<T> to optimize reading constant data on device memory. In that case, data returned by front, back, operator[], and the begin/end iterators use value semantics instead of reference semantics. data still returns a pointer to the underlying data and can be used to bypass the LdgIterator.

Template Parameters:
  • T – value type

  • Extent – fixed size; defaults to dynamic.
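
A short sketch of typical use (constructor forms mirror std::span):

void span_demo()
{
    celeritas::real_type values[] = {1.0, 2.0, 3.0};

    // Non-owning, writable view over existing contiguous storage
    celeritas::Span<celeritas::real_type> s{values, 3};
    CELER_ASSERT(s.size() == 3);

    celeritas::real_type total = 0;
    for (auto v : s)
    {
        total += v;
    }
    s[0] = total;  // writable because the element type is non-const
}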

Auxiliary user data

Users and other parts of the code can add their own shared and stream-local (i.e., thread-local) data to Celeritas using the AuxParamsInterface and AuxStateInterface classes, accessed through the AuxParamsRegistry and AuxStateVec classes, respectively.

class AuxParamsInterface

Base class for extensible shared data that has associated state.

Auxiliary data can be added to an AuxParamsRegistry at runtime to be passed among multiple classes, and then dynamic_cast to the expected type. It needs to supply a factory function for creating a state instance for multithreaded data on a particular stream and in a given memory space.

Subclassed by celeritas::ExtendFromPrimariesAction, celeritas::SlotDiagnostic, celeritas::StatusChecker

class AuxParamsRegistry

Manage auxiliary-added parameter classes.

An instance of this class can be added to shared problem data so that users (and other parts of Celeritas) can share arbitrary information between parts of the code and create independent state data for each stream.

class AuxStateVec

Manage single-stream auxiliary state data.

This class is constructed from an AuxParamsRegistry after the params are completely added and while the state is being constructed (with its size, etc.). The AuxId for an element of this class corresponds to its index in the AuxParamsRegistry.

This class can be empty either by default or if the given auxiliary registry doesn’t have any entries.