Celeritas 0.7.0-dev.179+develop.ec08af0c9
Profile and launch Celeritas kernels.
#include <KernelLauncher.device.hh>

Public Member Functions
| Return type | Member | Description |
|---|---|---|
| | KernelLauncher (std::string_view name) | Create a launcher from a label. |
| void | operator() (Range<ThreadId> threads, StreamId stream_id, F const &execute_thread) const | Launch a kernel for a thread range. |
| void | operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const | Launch a kernel with a custom number of threads. |
The template argument F may define a member type named Applier. F::Applier may declare up to two static constexpr int variables, max_block_size and min_warps_per_eu. If either is present, the kernel is compiled with the corresponding __launch_bounds__. If F::Applier::min_warps_per_eu is defined, F::Applier::max_block_size must be defined as well; otherwise compilation fails.
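The Applier contract above can be illustrated with a standalone C++17 sketch of how optional static members might be detected at compile time. The trait names (`has_applier`, `has_max_block_size`, `has_min_warps_per_eu`, `launch_bounds_arity`) and the example functors are hypothetical; Celeritas's own detection machinery may be structured differently. To compile without a GPU toolchain, the sketch only reports which `__launch_bounds__` form would apply instead of declaring a kernel.

```cpp
#include <type_traits>

// Hypothetical detection traits (not Celeritas API).
template<class F, class = void>
struct has_applier : std::false_type {};
template<class F>
struct has_applier<F, std::void_t<typename F::Applier>> : std::true_type {};

template<class A, class = void>
struct has_max_block_size : std::false_type {};
template<class A>
struct has_max_block_size<A, std::void_t<decltype(A::max_block_size)>>
    : std::true_type {};

template<class A, class = void>
struct has_min_warps_per_eu : std::false_type {};
template<class A>
struct has_min_warps_per_eu<A, std::void_t<decltype(A::min_warps_per_eu)>>
    : std::true_type {};

// Number of __launch_bounds__ arguments a kernel for F would receive:
// 0 = no bounds, 1 = max_block_size only, 2 = both arguments.
template<class F>
constexpr int launch_bounds_arity()
{
    if constexpr (!has_applier<F>::value)
    {
        return 0;
    }
    else
    {
        using A = typename F::Applier;
        // Mirrors the documented rule: min_warps_per_eu without
        // max_block_size is a compile-time error.
        static_assert(!has_min_warps_per_eu<A>::value
                          || has_max_block_size<A>::value,
                      "min_warps_per_eu requires max_block_size");
        if constexpr (has_min_warps_per_eu<A>::value)
            return 2;
        else if constexpr (has_max_block_size<A>::value)
            return 1;
        else
            return 0;
    }
}

// Hypothetical functors exercising each case
struct PlainExecutor {};

struct BoundedExecutor
{
    struct Applier
    {
        static constexpr int max_block_size = 256;
    };
};

struct OccupancyExecutor
{
    struct Applier
    {
        static constexpr int max_block_size = 256;
        static constexpr int min_warps_per_eu = 8;
    };
};
```

An Applier that declared only `min_warps_per_eu` would trip the `static_assert` when `launch_bounds_arity` is instantiated, matching the documented compile error.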
The semantics of the second __launch_bounds__ argument differ between CUDA and HIP. HIP interprets it as the minimum number of warps per execution unit, whereas CUDA interprets it as the minimum number of resident blocks per multiprocessor. KernelLauncher expects HIP semantics; when Celeritas is built for CUDA, it automatically converts that argument to CUDA semantics.
The CUDA-specific third argument, maxBlocksPerCluster, is not supported.
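As a rough illustration of converting the HIP-style argument for CUDA, the sketch below assumes that requiring `min_warps_per_eu` resident warps of 32 threads translates into enough blocks of `max_block_size` threads to hold them, rounded up. The helper names and this formula are assumptions for illustration only; the conversion Celeritas actually performs may differ.

```cpp
// Assumed CUDA warp size for this sketch.
constexpr int cuda_warp_size = 32;

// Integer division rounding up; assumes positive arguments.
constexpr int ceil_div(int num, int den) { return (num + den - 1) / den; }

// Hypothetical helper: map a HIP-style min_warps_per_eu onto a CUDA-style
// minBlocksPerMultiprocessor by counting how many blocks of max_block_size
// threads are needed to hold the requested warps (always at least 1).
constexpr int hip_to_cuda_min_blocks(int min_warps_per_eu, int max_block_size)
{
    return ceil_div(min_warps_per_eu * cuda_warp_size, max_block_size);
}
```

Rounding up keeps the CUDA value at least 1 whenever `min_warps_per_eu` is positive; truncating division would yield 0 for small warp counts and large blocks.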
Example:
void operator() (size_type num_threads, StreamId stream_id, F const &execute_thread) const [inline]
Launch a kernel with a custom number of threads.
The launch arguments follow the same ordering as a CUDA/HIP kernel launch configuration: the thread count comes first, then the stream.
Parameters:
| Parameter | Description |
|---|---|
| num_threads | Total number of active consecutive threads |
| stream_id | Device stream on which to execute the kernel |
| execute_thread | Functor called with each thread ID |
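To make the calling convention concrete, here is a host-only, serial emulation of the two launch overloads: the functor is invoked once per thread ID, either for `[begin, end)` or for IDs `0` through `num_threads - 1`. `MockKernelLauncher`, `ThreadRange`, and the simplified `ThreadId`, `StreamId`, and `size_type` are hypothetical stand-ins, not Celeritas types. The stream argument is ignored, whereas a real launch distributes these IDs across GPU blocks and threads on that stream.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Simplified placeholders for Celeritas types (hypothetical).
using size_type = unsigned int;
struct ThreadId { size_type value; };
struct StreamId { size_type value; };
struct ThreadRange { size_type begin; size_type end; };  // half-open [begin, end)

// Serial host emulation of the documented launch semantics.
class MockKernelLauncher
{
  public:
    explicit MockKernelLauncher(std::string_view name) : name_(name) {}

    // Call execute_thread for every ID in [threads.begin, threads.end)
    template<class F>
    void operator()(ThreadRange threads, StreamId, F const& execute_thread) const
    {
        assert(threads.begin <= threads.end);
        for (size_type i = threads.begin; i != threads.end; ++i)
        {
            execute_thread(ThreadId{i});
        }
    }

    // Call execute_thread for IDs 0 .. num_threads - 1
    template<class F>
    void operator()(size_type num_threads,
                    StreamId stream_id,
                    F const& execute_thread) const
    {
        (*this)(ThreadRange{0, num_threads}, stream_id, execute_thread);
    }

    std::string const& name() const { return name_; }

  private:
    std::string name_;
};
```

Implementing the thread-count overload by forwarding to the range overload keeps a single loop that defines which IDs the functor sees.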