On GPUs, reading from global memory through the L1/texture cache can improve throughput when many threads access the same address.

The __ldg intrinsic (CUDA/HIP) performs such a load through the read-only L1/texture cache path instead of the ordinary data cache. The hardware contract is that the pointed-to memory must be read-only for the lifetime of the kernel; this is generally true for Params data (physics tables, geometry) but not for State data. On the host, the ldg family of functions falls back to a plain dereference, so caller code needs no special-casing. Because ldg is valid only for read-only addresses, every pointer argument must point to a const type.
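
The device/host split described above can be pictured with a minimal sketch. The name ldg_sketch is hypothetical, and the preprocessor test is an assumption about how such a wrapper might be written, not the library's actual implementation; only __ldg, __CUDACC__, and __CUDA_ARCH__ are real CUDA names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the one-argument ldg (not the library's code).
// Compiles as ordinary C++; the device branch is only seen by a CUDA compiler.
template<class T>
inline
#if defined(__CUDACC__)
    __host__ __device__
#endif
    T
    ldg_sketch(T const* ptr)
{
    static_assert(std::is_arithmetic<T>::value,
                  "this sketch handles arithmetic types only");
    assert(ptr != nullptr);
#if defined(__CUDA_ARCH__)
    // Device pass: read-only cached load (for types that __ldg overloads)
    return __ldg(ptr);
#else
    // Host pass: plain dereference
    return *ptr;
#endif
}
```

Because the parameter is T const*, the function accepts pointers to both const and non-const objects, but it can never write through the pointer, which matches the read-only contract.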

Scalar values

Pass a const pointer to any supported type to the one-argument ldg:

real_type energy = ldg(&record.energy);

MaterialId mat = ldg(&record.material); // OpaqueId supported

Struct members

To load a single member without reading the whole struct, use either the two-argument overload or the storable LdgMember projector:

// Immediate two-argument form
BIHNodeId parent = ldg(node, &BIHLeafNode::parent);
 
// Storable callable -- useful with algorithms
auto load_parent = LdgMember{&BIHLeafNode::parent};
BIHNodeId same_parent = load_parent(node);  // equivalent to the immediate form
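
As a rough illustration of how the two forms relate, the sketch below builds both on a pointer-to-member. NodeSketch, load_member, MemberLoader, and make_member_loader are hypothetical names introduced here, and passing the struct by pointer is an assumption; the real overloads may differ in signature.

```cpp
#include <cassert>

// Hypothetical node type standing in for BIHLeafNode
struct NodeSketch
{
    int parent;
    double payload[8];
};

// Immediate form: read one member through a pointer-to-member.
// Only the member's address is touched, not the rest of the struct.
template<class S, class M>
inline M load_member(S const* s, M S::*member)
{
    assert(s != nullptr);
    // A device build would pass the member's address to the cached load
    return (*s).*member;
}

// Storable form: remembers which member to load
template<class S, class M>
struct MemberLoader
{
    M S::*member;

    M operator()(S const* s) const { return load_member(s, member); }
};

// Helper that deduces the struct and member types
template<class S, class M>
inline MemberLoader<S, M> make_member_loader(M S::*member)
{
    return MemberLoader<S, M>{member};
}
```

Storing the projector separates the choice of member from the object being read, which is what makes it convenient to hand to an algorithm as a callable.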

Spans and collections

LdgSpan<T const> (from corecel/cont/LdgSpan.hh) is an alias for Span whose iterator triggers __ldg on every element access. Use it as you would any ordinary span:

LdgSpan<real_type const> energies = params.get_energies();
for (real_type e : energies)   // each read uses __ldg
    process(e);

Collection<T, Ownership::const_reference, MemSpace::device> returns LdgSpan automatically when the element type supports ldg, so View classes built on device const_reference collections benefit without any extra work.

Extending ldg to a new type

ldg dispatches through the customization point ldg_data, found by argument-dependent lookup (ADL). To support a new type, define a free function in its namespace that returns a const pointer to an arithmetic type. For a wrapper struct holding a single int member:

namespace myns
{
struct MyCount { int value; };
 
CELER_CONSTEXPR_FUNCTION int const* ldg_data(MyCount const* p) noexcept
{
    return &p->value;
}
}  // namespace myns

Built-in overloads cover:

arithmetic types (identity, the default),
enum types (reinterpret-cast to the underlying integer),
OpaqueId<I,T> (pointer to the underlying index T), and
Quantity<U,T> (pointer to the underlying value T).
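
To make the dispatch concrete, here is a minimal sketch of a loader that finds ldg_data by an unqualified call. The sketch namespace and the load function are hypothetical, and rebuilding the wrapper with T{value} is an assumption made for illustration; the library may reconstruct the value differently.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
// Default customization: an arithmetic type is its own loadable data
template<class T,
         typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
inline T const* ldg_data(T const* p) noexcept
{
    return p;
}

// Hypothetical loader that dispatches through ldg_data
template<class T>
inline T load(T const* p)
{
    assert(p != nullptr);
    // Unqualified call: ADL searches T's namespace, while ordinary lookup
    // finds the arithmetic default declared above
    auto const* data = ldg_data(p);
    using DataT = typename std::remove_cv<
        typename std::remove_pointer<decltype(data)>::type>::type;
    static_assert(sizeof(DataT) == sizeof(T),
                  "wrapper must be exactly as large as its data");
    DataT value = *data;  // a device build would use __ldg(data) here
    return T{value};  // assumption: T can be brace-initialized from its data
}
}  // namespace sketch

namespace myns
{
struct MyCount
{
    int value;
};

// Found by ADL because it lives in the same namespace as MyCount
inline int const* ldg_data(MyCount const* p) noexcept
{
    return &p->value;
}
}  // namespace myns
```

The loader never names myns explicitly; adding support for another type only requires placing an ldg_data overload next to that type.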

Implementation details

detail::LdgWrapper<T const> is a thin proxy (similar to std::reference_wrapper) that stores a const pointer and implicitly converts to the value type by calling ldg. The result is always a value, not a reference, and the load goes through __ldg on device.

detail::LdgIterator<T const> is a random-access iterator whose operator* returns an LdgWrapper. Wrapping it in Span yields LdgSpan: range-for loops and standard algorithms transparently trigger __ldg on every element access without requiring any change at the call site.
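
The proxy-and-iterator arrangement can be sketched in portable C++ as follows. LoadProxy, LoadIterator, LoadSpan, and sum are hypothetical names, the load is a plain dereference standing in for __ldg, and the iterator implements only the operations needed here; treat it as an illustration under those assumptions rather than the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

namespace sketch
{
// Proxy: stores a const pointer and yields a value on conversion
template<class T>
class LoadProxy
{
  public:
    explicit LoadProxy(T const* ptr) : ptr_(ptr) { assert(ptr_ != nullptr); }
    operator T() const { return *ptr_; }  // device build: __ldg(ptr_)

  private:
    T const* ptr_;
};

// Iterator whose dereference returns the proxy rather than a reference
template<class T>
class LoadIterator
{
  public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T const*;
    using reference = LoadProxy<T>;

    explicit LoadIterator(T const* ptr) : ptr_(ptr) {}

    reference operator*() const { return reference{ptr_}; }
    reference operator[](difference_type i) const { return reference{ptr_ + i}; }
    LoadIterator& operator++() { ++ptr_; return *this; }
    LoadIterator& operator+=(difference_type n) { ptr_ += n; return *this; }
    friend LoadIterator operator+(LoadIterator it, difference_type n) { return it += n; }
    friend difference_type operator-(LoadIterator a, LoadIterator b) { return a.ptr_ - b.ptr_; }
    friend bool operator==(LoadIterator a, LoadIterator b) { return a.ptr_ == b.ptr_; }
    friend bool operator!=(LoadIterator a, LoadIterator b) { return a.ptr_ != b.ptr_; }

  private:
    T const* ptr_;
};

// Minimal span built on the iterator
template<class T>
class LoadSpan
{
  public:
    using iterator = LoadIterator<T>;

    LoadSpan(T const* data, std::size_t size) : data_(data), size_(size) {}
    iterator begin() const { return iterator{data_}; }
    iterator end() const { return iterator{data_ + size_}; }
    std::size_t size() const { return size_; }
    LoadProxy<T> operator[](std::size_t i) const { return LoadProxy<T>{data_ + i}; }

  private:
    T const* data_;
    std::size_t size_;
};

// Call site looks like ordinary span code; each read goes through the proxy
template<class T>
T sum(LoadSpan<T> values)
{
    T total{};
    for (T v : values)
    {
        total += v;
    }
    return total;
}
}  // namespace sketch
```

Returning a proxy by value is what keeps callers from holding a reference into the read-only data; the trade-off is that code which needs a true T const& must copy the value first.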