Celeritas  0.5.0-86+4a8eea4
celeritas::KernelLauncher< F > Class Template Reference

Profile and launch Celeritas kernels.

#include <KernelLauncher.device.hh>

[Figure: inheritance diagram for celeritas::KernelLauncher< F >]

Public Member Functions

 KernelLauncher (std::string_view name)
 Create a launcher from a label.
 
void operator() (Range< ThreadId > threads, StreamId stream_id, F const &execute_thread) const
 Launch a kernel for a thread range.
 
void operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const
 Launch a kernel with a custom number of threads.
 

Detailed Description

template<class F>
class celeritas::KernelLauncher< F >

Profile and launch Celeritas kernels.

The template argument F may define a member type named Applier. F::Applier may in turn define up to two static constexpr int variables, max_block_size and min_warps_per_eu. If either is present, the kernel is compiled with the corresponding __launch_bounds__. If F::Applier::min_warps_per_eu is defined, F::Applier::max_block_size must be defined as well; otherwise compilation fails.
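The optional-member rule above can be illustrated with a small host-only sketch. It is not the Celeritas implementation: the detection traits, the executor types, and the function num_launch_bounds_args are all hypothetical names introduced here, and the executors take a plain int where real code would take a ThreadId.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical applier that advertises both launch-bound hints.
struct TunedApplier
{
    static constexpr int max_block_size = 256;
    static constexpr int min_warps_per_eu = 2;
};

// Hypothetical applier that only limits the block size.
struct BlockOnlyApplier
{
    static constexpr int max_block_size = 128;
};

// Hypothetical executors: two with an Applier, one without.
struct TunedExecutor
{
    using Applier = TunedApplier;
    void operator()(int /* thread_id */) const {}
};
struct BlockOnlyExecutor
{
    using Applier = BlockOnlyApplier;
    void operator()(int /* thread_id */) const {}
};
struct PlainExecutor
{
    void operator()(int /* thread_id */) const {}
};

// Detect F::Applier::max_block_size (false if F has no Applier at all).
template<class F, class = void>
struct has_max_block_size : std::false_type {};
template<class F>
struct has_max_block_size<F, std::void_t<decltype(F::Applier::max_block_size)>>
    : std::true_type {};

// Detect F::Applier::min_warps_per_eu.
template<class F, class = void>
struct has_min_warps_per_eu : std::false_type {};
template<class F>
struct has_min_warps_per_eu<F, std::void_t<decltype(F::Applier::min_warps_per_eu)>>
    : std::true_type {};

// Number of launch-bounds arguments this sketch would apply for F:
// 0 (no bounds), 1 (block size only), or 2 (block size and warps per EU).
template<class F>
constexpr int num_launch_bounds_args()
{
    if constexpr (has_min_warps_per_eu<F>::value)
    {
        // Mirrors the documented rule: the second hint requires the first.
        static_assert(has_max_block_size<F>::value,
                      "min_warps_per_eu requires max_block_size");
        return 2;
    }
    else if constexpr (has_max_block_size<F>::value)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

static_assert(num_launch_bounds_args<TunedExecutor>() == 2);
static_assert(num_launch_bounds_args<BlockOnlyExecutor>() == 1);
static_assert(num_launch_bounds_args<PlainExecutor>() == 0);
```

An applier that defined only min_warps_per_eu would trip the static_assert in this sketch, which is one way the documented compile error could arise.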

The semantics of the second __launch_bounds__ argument differ between CUDA and HIP. KernelLauncher expects HIP semantics (minimum warps per execution unit). When Celeritas is built for CUDA, the value is automatically converted to CUDA semantics (minimum blocks per multiprocessor).
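As a rough illustration of such a conversion, the sketch below inverts the relation given in the HIP porting documentation, min_warps_per_eu = (min_blocks_per_sm * max_block_size) / 32. The function name to_cuda_min_blocks, the fixed warp width of 32, and the choice to round up are assumptions of this sketch; the exact formula Celeritas applies may differ.

```cpp
#include <cassert>

// Assumed warp width for the conversion (32 on NVIDIA hardware).
constexpr int assumed_warp_size = 32;

// Convert a HIP-style hint (minimum warps per execution unit) into a
// CUDA-style hint (minimum blocks per multiprocessor). The division is
// rounded up so the result is never zero; that is a choice of this sketch.
constexpr int to_cuda_min_blocks(int max_block_size, int min_warps_per_eu)
{
    return (min_warps_per_eu * assumed_warp_size + max_block_size - 1)
           / max_block_size;
}

// 16 warps of 32 threads are 512 threads, i.e. two blocks of 256.
static_assert(to_cuda_min_blocks(256, 16) == 2);
// 8 warps are 256 threads, i.e. exactly one block of 256.
static_assert(to_cuda_min_blocks(256, 8) == 1);
// Fewer threads than one block still requests one resident block.
static_assert(to_cuda_min_blocks(1024, 8) == 1);
```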

The CUDA-specific third argument, maxBlocksPerCluster, is not supported.

Example:

void FooAction::launch_kernel(size_type count) const
{
    auto execute_thread = make_blah_executor(blah);
    static KernelLauncher<decltype(execute_thread)> const
        launch_kernel("blah");
    // stream_id is assumed to be available in the enclosing scope
    launch_kernel(count, stream_id, execute_thread);
}

Member Function Documentation

◆ operator()()

template<class F>
void celeritas::KernelLauncher< F >::operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const [inline]

Launch a kernel with a custom number of threads.

The launch arguments have the same ordering as CUDA/HIP kernel launch arguments.

Parameters
    num_threads     Total number of active consecutive threads
    stream_id       Execute the kernel on this device stream
    execute_thread  Call the given functor with the thread ID
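To make the roles of num_threads and execute_thread concrete, here is a serial, host-only emulation of a bounds-checked launch. It is a sketch under stated assumptions rather than the library's device code: emulate_launch, ceil_div, count_visits, and the explicit block_size parameter are hypothetical, size_type is a local stand-in, the thread ID is a plain integer, and the stream argument is omitted because nothing here runs asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for the library's size_type (assumption of this sketch).
using size_type = std::size_t;

// Number of blocks needed to cover n threads with blocks of size d.
constexpr size_type ceil_div(size_type n, size_type d)
{
    return (n + d - 1) / d;
}

// Serially walk the grid a device launch would cover and call the functor
// once per active thread ID. Lanes past num_threads in the last block are
// skipped, as a device kernel's bounds check would do.
template<class F>
void emulate_launch(size_type num_threads,
                    size_type block_size,
                    F const& execute_thread)
{
    size_type const num_blocks = ceil_div(num_threads, block_size);
    for (size_type block = 0; block < num_blocks; ++block)
    {
        for (size_type lane = 0; lane < block_size; ++lane)
        {
            size_type const tid = block * block_size + lane;
            if (tid < num_threads)
            {
                execute_thread(tid);
            }
        }
    }
}

// Usage helper: record how many times each thread ID was visited.
inline std::vector<int> count_visits(size_type num_threads,
                                     size_type block_size)
{
    std::vector<int> visits(num_threads, 0);
    emulate_launch(num_threads, block_size, [&visits](size_type tid) {
        visits[tid] += 1;
    });
    return visits;
}
```

With 10 threads and a block size of 4, the emulation covers three blocks (12 lanes) but invokes the functor only for IDs 0 through 9, each exactly once.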

The documentation for this class was generated from the following file: