Celeritas
0.5.0-86+4a8eea4
Profile and launch Celeritas kernels.
#include <KernelLauncher.device.hh>
Public Member Functions
KernelLauncher (std::string_view name)
    Create a launcher from a label.
void operator() (Range< ThreadId > threads, StreamId stream_id, F const &execute_thread) const
    Launch a kernel for a thread range.
void operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const
    Launch a kernel with a custom number of threads.
Profile and launch Celeritas kernels.
The template argument F may define a member type named Applier. F::Applier may declare up to two static constexpr int variables named max_block_size and/or min_warps_per_eu. If present, the kernel will use the appropriate launch_bounds. If F::Applier::min_warps_per_eu exists, then F::Applier::max_block_size must also be present; otherwise compilation fails.
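The rule above can be illustrated with a small compile-time check. This is only a sketch in standard C++17, not the Celeritas implementation: the trait names (has_max_block_size, has_min_warps_per_eu, valid_launch_bounds) and the example executor types are hypothetical, and no Celeritas headers are used.

```cpp
#include <type_traits>

// Hypothetical trait: true if F::Applier::max_block_size is a valid expression.
template<class F, class = void>
struct has_max_block_size : std::false_type {};
template<class F>
struct has_max_block_size<F, std::void_t<decltype(F::Applier::max_block_size)>>
    : std::true_type {};

// Hypothetical trait: true if F::Applier::min_warps_per_eu is a valid expression.
template<class F, class = void>
struct has_min_warps_per_eu : std::false_type {};
template<class F>
struct has_min_warps_per_eu<F, std::void_t<decltype(F::Applier::min_warps_per_eu)>>
    : std::true_type {};

// Mirrors the documented rule: min_warps_per_eu requires max_block_size.
template<class F>
constexpr bool valid_launch_bounds
    = has_max_block_size<F>::value || !has_min_warps_per_eu<F>::value;

// Executor without an Applier: no launch bounds are requested.
struct PlainExecutor {};

// Executor whose Applier declares both values.
struct BoundedApplier
{
    static constexpr int max_block_size = 256;
    static constexpr int min_warps_per_eu = 8;
};
struct BoundedExecutor
{
    using Applier = BoundedApplier;
};

// Executor whose Applier declares only min_warps_per_eu: rejected by the rule.
struct BadApplier
{
    static constexpr int min_warps_per_eu = 8;
};
struct BadExecutor
{
    using Applier = BadApplier;
};

static_assert(valid_launch_bounds<PlainExecutor>, "no Applier is allowed");
static_assert(valid_launch_bounds<BoundedExecutor>, "both values is allowed");
static_assert(!valid_launch_bounds<BadExecutor>, "min_warps_per_eu alone is not");
```

In this sketch the invalid combination is only reported as a boolean; a launcher could turn it into a compile error with a static_assert on valid_launch_bounds<F>.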
The semantics of the second launch_bounds argument differ between CUDA and HIP. KernelLauncher expects HIP semantics. If Celeritas is built targeting CUDA, it automatically converts that argument to match CUDA semantics. The CUDA-specific third argument, maxBlocksPerCluster, is not supported.
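As a rough illustration of what such a conversion can look like, the sketch below turns a HIP-style "minimum warps per execution unit" value into a CUDA-style "minimum blocks per multiprocessor" value. It assumes the porting relation given in the HIP documentation (min warps = min blocks × max threads per block / 32) and a warp size of 32; the helper name min_blocks_per_sm and the round-up choice are assumptions, and the formula Celeritas actually applies may differ.

```cpp
// Assumption: CUDA warp size of 32 threads.
constexpr int cuda_warp_size = 32;

// Hypothetical helper (not a Celeritas function): invert the HIP porting
// relation to estimate the CUDA second launch-bounds argument.
// Rounds up so that the requested number of warps is still covered.
constexpr int min_blocks_per_sm(int max_block_size, int min_warps_per_eu)
{
    int const min_threads = min_warps_per_eu * cuda_warp_size;
    return (min_threads + max_block_size - 1) / max_block_size;
}

// 8 warps = 256 threads: one 256-thread block, or two 128-thread blocks.
static_assert(min_blocks_per_sm(256, 8) == 1, "one full block covers 8 warps");
static_assert(min_blocks_per_sm(128, 8) == 2, "two blocks of 4 warps each");
// 256 threads do not divide evenly into 96-thread blocks: round up to 3.
static_assert(min_blocks_per_sm(96, 8) == 3, "rounded up");
```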
Example:
void operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const [inline]
Launch a kernel with a custom number of threads.
The launch arguments have the same ordering as CUDA/HIP kernel launch arguments.
Parameters
    num_threads: Total number of active consecutive threads
    stream_id: Execute the kernel on this device stream
    execute_thread: Call the given functor with the thread ID
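The calling convention described by these parameters can be mimicked on the host. The following is a serial stand-in, not Celeritas code: SerialLauncher, the simplified ThreadId and StreamId structs, DoubleValues, and run_double_values are all hypothetical, and nothing runs on a device. It only shows the argument order (thread count, then stream, then functor) and that the functor is invoked once per consecutive thread ID.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the ID types named in the signature.
struct ThreadId
{
    std::size_t value;
};
struct StreamId
{
    int value;
};

// Host-only mock launcher: calls the functor serially for thread IDs
// 0, 1, ..., num_threads - 1. The stream ID is accepted but unused here.
template<class F>
struct SerialLauncher
{
    void operator()(std::size_t num_threads,
                    StreamId,
                    F const& execute_thread) const
    {
        for (std::size_t i = 0; i < num_threads; ++i)
        {
            execute_thread(ThreadId{i});
        }
    }
};

// Example functor: each "thread" doubles one element.
struct DoubleValues
{
    double* data;

    void operator()(ThreadId tid) const { data[tid.value] *= 2; }
};

// Launch over the first three of four elements; the last is left untouched.
inline std::vector<double> run_double_values()
{
    std::vector<double> values{1.0, 2.0, 3.0, 4.0};
    DoubleValues execute_thread{values.data()};
    SerialLauncher<DoubleValues> launch;
    launch(3, StreamId{0}, execute_thread);
    return values;
}
```

Because only three threads are launched over four elements, the final element keeps its original value, which matches the description of num_threads as the number of active consecutive threads.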