Dynamically allocate arbitrary data on a stack.

#include <StackAllocator.hh>

Public Types
Type aliases
using	value_type = T

using	result_type = value_type *

using	Data = StackAllocatorData< T, Ownership::reference, MemSpace::native >

Public Member Functions
CELER_FUNCTION	StackAllocator (Data const &data)
	Construct with defaults.

CELER_FUNCTION size_type	capacity () const
	Get the maximum number of values that can be allocated.

CELER_FUNCTION void	clear ()
	Clear the stack allocator.

CELER_FUNCTION result_type	operator() (size_type count)
	Allocate space for a given number of items.

CELER_FUNCTION size_type	size () const
	Get the number of items currently present.

CELER_FUNCTION Span< value_type >	get ()
	View all allocated data.

CELER_FUNCTION Span< value_type const >	get () const
	View all allocated data (const).

Detailed Description

template<class T>
class celeritas::StackAllocator< T >

Dynamically allocate arbitrary data on a stack.

The stack allocator view acts as a functor and accessor to the allocated data. It enables very fast on-device dynamic allocation of data, such as secondaries or detector hits. As an example, inside a hypothetical physics Interactor class, you could create two particles with the following code:

struct Interactor
{
   StackAllocator<Secondary> allocate;
 
   // Sample an interaction
   template<class Engine>
   Interaction operator()(Engine&)
   {
      // Create 2 secondary particles
      Secondary* allocated = this->allocate(2);
      if (!allocated)
      {
          return Interaction::from_failure();
      }
      Interaction result;
      result.secondaries = Span<Secondary>{allocated, 2};
      return result;
   }
};

A later kernel could then iterate over the secondaries to apply cutoffs:

using SecondaryRef
    = StackAllocatorData<Secondary, Ownership::reference, MemSpace::device>;
 
__global__ void apply_cutoff(const SecondaryRef ptrs)
{
    auto thread_idx = celeritas::KernelParamCalculator::thread_id().get();
    StackAllocator<Secondary> allocate(ptrs);
    auto secondaries = allocate.get();
    if (thread_idx < secondaries.size())
    {
        Secondary& sec = secondaries[thread_idx];
        if (sec.energy < 100 * units::kilo_electron_volts)
        {
            sec.energy = 0;
        }
    }
}

You cannot safely access the current size of the stack in the same kernel that's modifying it – if the stack attempts to allocate beyond the end, then the size() call will reflect that overflowed state, rather than the corrected size reflecting the failed allocation.
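The overshoot described above can be illustrated with a host-only sketch. It assumes an allocation works in two steps: an atomic bump of a shared counter, followed by a rollback if the request does not fit. `TwoStepStack`, `reserve`, and `finish` are hypothetical names chosen for illustration and are not part of the Celeritas API; the real implementation may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in, NOT the Celeritas implementation.
// The allocation is split into two explicit steps so that the intermediate
// (overflowed) state of the shared counter can be observed.
template<class T>
class TwoStepStack
{
  public:
    explicit TwoStepStack(std::size_t capacity)
        : storage_(capacity), size_(0)
    {
    }

    // Step 1: bump the shared counter unconditionally and return the
    // starting index of the requested range.
    std::size_t reserve(std::size_t count) { return size_.fetch_add(count); }

    // Step 2: if the range does not fit, undo the bump and report failure.
    T* finish(std::size_t start, std::size_t count)
    {
        assert(count > 0);
        if (start + count > storage_.size())
        {
            size_.fetch_sub(count);
            return nullptr;
        }
        return storage_.data() + start;
    }

    std::size_t size() const { return size_.load(); }
    std::size_t capacity() const { return storage_.size(); }

  private:
    std::vector<T> storage_;
    std::atomic<std::size_t> size_;
};
```

Between `reserve` and `finish`, another thread reading `size()` would see a value larger than `capacity()`. Under this assumption, that window is why the size is only trustworthy after the allocating kernel has completed.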

A third kernel with a single thread would then be responsible for clearing the data:

__global__ void clear_stack(const SecondaryRef ptrs)
{
    StackAllocator<Secondary> allocate(ptrs);
    auto thread_idx = celeritas::KernelParamCalculator::thread_id().get();
    if (thread_idx == 0)
    {
        allocate.clear();
    }
}

These separate kernel launches are needed as grid-level synchronization points.
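The allocate, view, and clear phases can be mimicked on the host as three sequential functions, each standing in for one kernel launch. This is a minimal single-threaded sketch: `MockSecondary`, `MockStack`, and the `launch_*` functions are hypothetical names for illustration only, and the energies are arbitrary numbers rather than Celeritas quantities.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-ins; none of these names are Celeritas API.
struct MockSecondary
{
    double energy = 0;
};

struct MockStack
{
    std::vector<MockSecondary> storage;
    std::size_t size = 0;

    explicit MockStack(std::size_t capacity) : storage(capacity) {}

    // Hand out `count` contiguous slots, or return nullptr if they don't fit.
    MockSecondary* allocate(std::size_t count)
    {
        assert(count > 0);
        if (size + count > storage.size())
        {
            return nullptr;
        }
        MockSecondary* result = storage.data() + size;
        size += count;
        return result;
    }
};

// "Kernel" 1: each simulated thread tries to create two secondaries.
// Returns the number of threads whose allocation failed.
inline std::size_t launch_interact(MockStack& stack, std::size_t num_threads)
{
    std::size_t failures = 0;
    for (std::size_t tid = 0; tid < num_threads; ++tid)
    {
        MockSecondary* allocated = stack.allocate(2);
        if (!allocated)
        {
            ++failures;
            continue;
        }
        allocated[0].energy = 50.0 * static_cast<double>(tid + 1);
        allocated[1].energy = 200.0;
    }
    return failures;
}

// "Kernel" 2: view the allocated items and zero those below the cutoff.
inline void launch_apply_cutoff(MockStack& stack, double cutoff)
{
    for (std::size_t i = 0; i < stack.size; ++i)
    {
        if (stack.storage[i].energy < cutoff)
        {
            stack.storage[i].energy = 0;
        }
    }
}

// "Kernel" 3: reset the size so the storage can be reused.
inline void launch_clear(MockStack& stack)
{
    stack.size = 0;
}
```

Calling the three functions in order corresponds to the three launches: the size is only read after all allocations have finished, and it is only reset after all reads have finished.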

Todo: Instead of returning a pointer, return IdRange<T>. Rename StackAllocatorData to StackAllocation and have it look like a collection so that it will provide access to the data. Better yet, have a StackAllocation that can be a const_reference to the StackAllocatorData. Then the rule will be "you can't create a StackAllocator in the same kernel that you directly access a StackAllocation".

Member Function Documentation

◆ clear()

template<class T >

CELER_FUNCTION void celeritas::StackAllocator< T >::clear ( )

inline

Clear the stack allocator.

This sets the size to zero. It should ideally be called by only a single thread (though multiple threads resetting it should also be OK), but it cannot be used in the same kernel that is allocating or viewing the data. This is because the timing of accesses by different threads or thread blocks is indeterminate within a single kernel.

◆ get() [1/2]

template<class T >

CELER_FUNCTION auto celeritas::StackAllocator< T >::get ( )

inline

View all allocated data.

This cannot be called while any running kernel could be modifying the size.

◆ get() [2/2]

template<class T >

CELER_FUNCTION auto celeritas::StackAllocator< T >::get ( ) const

inline

View all allocated data (const).

This cannot be called while any running kernel could be modifying the size.

◆ operator()()

template<class T >

CELER_FUNCTION auto celeritas::StackAllocator< T >::operator() ( size_type count )

inline

Allocate space for a given number of items.

Returns NULL if allocation failed due to out-of-memory. Ensures that the shared size reflects the amount of data allocated.

Todo: It might be useful to set an "out of memory" flag to make it easier for host code to detect whether a failure occurred, rather than looping through primaries and testing for failure.

◆ size()

template<class T >

CELER_FUNCTION auto celeritas::StackAllocator< T >::size ( ) const

inline

Get the number of items currently present.

This value may not be meaningful (may be less than "actual" size) if called in the same kernel as other threads that are allocating.

The documentation for this class was generated from the following file:

StackAllocator.hh
