Celeritas 0.7+c0d26b9
On GPUs, reading from global memory through the L1/texture cache can improve throughput when many threads access the same address.
The __ldg intrinsic (CUDA/HIP) performs such a cached load, going through the read-only L1/texture cache rather than the ordinary global-memory load path. The hardware contract is that the pointed-to memory must be read-only for the lifetime of the kernel; this is generally true for Params data (physics tables, geometry) but not for State data. On the host, the ldg family of functions falls back to a plain dereference, so caller code needs no special-casing. Because ldg is valid only for read-only addresses, every pointer argument must point to a const type.
Pass a const pointer to any supported type to the one-argument ldg:
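The sketch below illustrates the one-argument form together with the host fallback described above. It is not the library's implementation: the `sketch` namespace, the `SKETCH_FUNCTION` macro, and the `interpolate` caller are hypothetical names introduced here, and the stand-in does not reproduce the library's const-only enforcement.

```cpp
#include <cassert>

// Assumption: a local decorator macro so the same code compiles on host and device
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define SKETCH_FUNCTION __host__ __device__
#else
#    define SKETCH_FUNCTION
#endif

namespace sketch
{
// Hypothetical stand-in for a one-argument ldg.
// Device pass: cached load via the __ldg intrinsic (only for types it supports).
// Host pass: plain dereference.
template<class T>
SKETCH_FUNCTION inline T ldg(T const* ptr)
{
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
    return __ldg(ptr);
#else
    return *ptr;
#endif
}
}  // namespace sketch

// Hypothetical caller: linear interpolation between two adjacent entries of a
// read-only table, written identically for host and device
SKETCH_FUNCTION inline double
interpolate(double const* table, int i, double frac)
{
    assert(frac >= 0 && frac <= 1);
    double lo = sketch::ldg(table + i);
    double hi = sketch::ldg(table + i + 1);
    return lo + frac * (hi - lo);
}
```

The caller never branches on host versus device; the choice is confined to the body of the load function.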
Load a single member without reading the whole struct using the two-argument overload or the storable LdgMember projector:
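As a rough illustration of member loading, the host-only sketch below uses a pointer-to-member to read one field. The signatures of the library's two-argument overload and of LdgMember are not shown in this text, so everything in the `sketch` namespace, as well as the `Material` struct and `total_density` function, is an assumption made for the example.

```cpp
#include <cassert>

namespace sketch
{
// Host-only stand-in: a real device build would use __ldg here
template<class T>
inline T ldg(T const* ptr)
{
    return *ptr;
}

// Hypothetical two-argument form: load only the selected member
template<class S, class M>
inline M ldg(S const* ptr, M S::* member)
{
    // Take the address of the single member, then load just that value
    return ldg(&(ptr->*member));
}

// Hypothetical storable projector: remembers which member to load
template<class S, class M>
struct LdgMember
{
    M S::* member;

    M operator()(S const* ptr) const { return ldg(ptr, member); }
};
}  // namespace sketch

// Hypothetical read-only record with several fields
struct Material
{
    double density;
    double temperature;
    int num_elements;
};

// Sum one field over an array without copying whole structs
inline double total_density(Material const* mats, int count)
{
    assert(count >= 0);
    sketch::LdgMember<Material, double> get_density{&Material::density};
    double total = 0;
    for (int i = 0; i < count; ++i)
    {
        total += get_density(mats + i);
    }
    return total;
}
```

A stored projector is convenient when the same member is read in a loop or passed to a generic algorithm, while the two-argument call suits a one-off access.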
LdgSpan<T const> (from corecel/cont/LdgSpan.hh) is an alias for Span whose iterator triggers __ldg on every element access. Use it as you would any ordinary span:
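To show what "use it as an ordinary span" can look like at a call site, here is a host-only sketch with a minimal stand-in class. The `sketch::LdgSpan` below is a simplified assumption written for this example (it is a standalone class, not an alias of the library's Span), and `sum_weights` is a hypothetical caller.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch
{
// Host-only stand-in: a real device build would use __ldg here
template<class T>
inline T ldg(T const* ptr)
{
    return *ptr;
}

// Minimal stand-in for a span whose element access goes through ldg
template<class T>
class LdgSpan
{
    static_assert(std::is_const<T>::value, "element type must be const");

  public:
    using value_type = typename std::remove_const<T>::type;

    class iterator
    {
      public:
        explicit iterator(T* ptr) : ptr_(ptr) {}
        // Dereference returns a value loaded through ldg, not a reference
        value_type operator*() const { return ldg(ptr_); }
        iterator& operator++()
        {
            ++ptr_;
            return *this;
        }
        bool operator!=(iterator const& other) const
        {
            return ptr_ != other.ptr_;
        }

      private:
        T* ptr_;
    };

    LdgSpan(T* data, std::size_t size) : data_(data), size_(size) {}
    iterator begin() const { return iterator(data_); }
    iterator end() const { return iterator(data_ + size_); }
    std::size_t size() const { return size_; }
    value_type operator[](std::size_t i) const
    {
        assert(i < size_);
        return ldg(data_ + i);
    }

  private:
    T* data_;
    std::size_t size_;
};
}  // namespace sketch

// Hypothetical caller: an ordinary range-for loop, with no ldg in sight
inline double sum_weights(sketch::LdgSpan<double const> weights)
{
    double total = 0;
    for (double w : weights)
    {
        total += w;
    }
    return total;
}
```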
Collection<T, Ownership::const_reference, MemSpace::device> returns LdgSpan automatically when the element type supports ldg, so View classes built on device const_reference collections benefit without any extra work.
ldg dispatches through the customization point ldg_data, found by argument-dependent lookup (ADL). To support a new type, define a free function in its namespace that returns a const pointer to an arithmetic type. For a wrapper struct holding a single int member:
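A host-only sketch of this ADL pattern follows. The `sketch` namespace stands in for the library, `app::CellIndex` is a hypothetical user type, and rebuilding the wrapper by brace-initialization from the loaded value is an assumption made for the example rather than documented library behavior.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
// Stand-in built-in overload: arithmetic types are their own underlying data
template<class T, std::enable_if_t<std::is_arithmetic<T>::value, int> = 0>
inline T const* ldg_data(T const* ptr)
{
    return ptr;
}

// Stand-in ldg: the unqualified call to ldg_data is resolved when the
// template is instantiated, so ADL also finds overloads declared in the
// argument's own namespace
template<class T>
inline T ldg(T const* ptr)
{
    assert(ptr);
    // Host fallback: plain dereference of the underlying arithmetic value.
    // Assumption: T can be brace-initialized from that value.
    return T{*ldg_data(ptr)};
}
}  // namespace sketch

namespace app
{
// Hypothetical wrapper struct holding a single int member
struct CellIndex
{
    int value;
};

// Customization point: a free function in the type's namespace that returns
// a const pointer to the underlying arithmetic value
inline int const* ldg_data(CellIndex const* ptr)
{
    return &ptr->value;
}
}  // namespace app
```

Because the overload lives next to the type, supporting a new wrapper requires no change to the namespace that defines the load function.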
Built-in overloads cover:
OpaqueId<I,T> (pointer to the underlying index T), and
Quantity<U,T> (pointer to the underlying value T).
detail::LdgWrapper<T const> is a thin proxy (similar to std::reference_wrapper) that stores a const pointer and implicitly converts to the value type by calling ldg. The result is always a value, not a reference, and the load goes through __ldg on device.
detail::LdgIterator<T const> is a random-access iterator whose operator* returns an LdgWrapper. Wrapping it in Span yields LdgSpan: range-for loops and standard algorithms transparently trigger __ldg on every element access without requiring any change at the call site.
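The proxy-plus-iterator arrangement described above can be sketched on the host as follows. These classes are simplified stand-ins written for illustration: they are parameterized on the value type for brevity, implement only a subset of the random-access interface, and are not the library's detail classes. The `sum` function is a hypothetical caller.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

namespace sketch
{
// Host-only stand-in: a real device build would use __ldg here
template<class T>
inline T ldg(T const* ptr)
{
    return *ptr;
}

// Proxy that stores a const pointer and converts to a value on demand
template<class T>
class LdgWrapper
{
  public:
    explicit LdgWrapper(T const* ptr) : ptr_(ptr) {}
    operator T() const { return ldg(ptr_); }

  private:
    T const* ptr_;
};

// Iterator whose dereference yields the proxy instead of a reference
template<class T>
class LdgIterator
{
  public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T const*;
    using reference = LdgWrapper<T>;

    explicit LdgIterator(T const* ptr = nullptr) : ptr_(ptr) {}

    reference operator*() const { return reference(ptr_); }
    reference operator[](difference_type n) const
    {
        return reference(ptr_ + n);
    }
    LdgIterator& operator++()
    {
        ++ptr_;
        return *this;
    }
    LdgIterator& operator+=(difference_type n)
    {
        ptr_ += n;
        return *this;
    }
    friend LdgIterator operator+(LdgIterator it, difference_type n)
    {
        return it += n;
    }
    friend difference_type operator-(LdgIterator a, LdgIterator b)
    {
        return a.ptr_ - b.ptr_;
    }
    friend bool operator==(LdgIterator a, LdgIterator b)
    {
        return a.ptr_ == b.ptr_;
    }
    friend bool operator!=(LdgIterator a, LdgIterator b)
    {
        return a.ptr_ != b.ptr_;
    }

  private:
    T const* ptr_;
};
}  // namespace sketch

// Hypothetical caller: a standard algorithm works unchanged because each
// dereferenced proxy converts implicitly to double
inline double sum(double const* data, std::size_t n)
{
    assert(data || n == 0);
    sketch::LdgIterator<double> first(data);
    sketch::LdgIterator<double> last(data + n);
    return std::accumulate(first, last, 0.0);
}
```

Returning a proxy rather than a reference is what guarantees every read goes through the load function, at the cost of making elements non-assignable through the iterator.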