Using Celeritas#
Celeritas includes a core set of libraries for internal and external use, as well as several helper applications and front ends.
Software library#
At present, the most stable part of Celeritas is the high-level program interface to the Acceleritas code library. Many other components of the API are also stable and are documented in the API section.
CMake integration#
The Celeritas library is most easily used when your downstream app is built with CMake. It should require a single line to initialize:
find_package(Celeritas REQUIRED CONFIG)
and, if VecGeom or CUDA is disabled, a single line to link:
target_link_libraries(mycode PUBLIC Celeritas::celeritas)
Because of complexities involving CUDA Relocatable Device Code, consuming Celeritas with CUDA and VecGeom support requires an additional include and the use of wrappers around CMake’s target commands:
include(CeleritasLibrary)
celeritas_add_executable(mycode ...)
celeritas_target_link_libraries(mycode PUBLIC Celeritas::celeritas)
Because the celeritas_... functions decay to the wrapped CMake commands if CUDA and VecGeom are disabled, you can use them to safely build and link nearly all targets consuming Celeritas in your project. They track the appropriate linking sequence for the final application whether or not it uses CUDA code, and whether Celeritas is CPU-only or CUDA-enabled:
celeritas_add_library(myconsumer SHARED ...)
celeritas_target_link_libraries(myconsumer PUBLIC Celeritas::celeritas)
celeritas_add_executable(myapplication ...)
celeritas_target_link_libraries(myapplication PRIVATE myconsumer)
If your project builds shared libraries that are intended to be loaded at application runtime (e.g. via dlopen), you should prefer the CMake MODULE target type:
celeritas_add_library(myplugin MODULE ...)
celeritas_target_link_libraries(myplugin PRIVATE Celeritas::celeritas)
This is recommended because celeritas_target_link_libraries treats a MODULE library as a final target for which all device symbols must be resolved. If you are forced to use the SHARED target type for plugin libraries (e.g. via your project’s own wrapper functions), declare them with the CMake or project-specific commands and link to both the primary Celeritas target and its device code counterpart:
add_library(mybadplugin SHARED ...)
# ... or myproject_add_library(mybadplugin ...)
target_link_libraries(mybadplugin PRIVATE Celeritas::celeritas $<TARGET_NAME_IF_EXISTS:Celeritas::celeritas_final>)
# ... or otherwise declare the plugin as requiring linking to the two targets
The name of a Celeritas device code counterpart target is always the name of the primary target with _final appended. These targets are present only if Celeritas was built with CUDA support, so it is recommended to use the CMake generator expression above to support CUDA and CPU-only builds transparently.
The Minimal Celeritas usage example demonstrates how to use Celeritas as a library with a short standalone CMake project.
Standalone simulation app (celer-sim)#
The celer-sim application is the primary means of running EM test problems for independent validation and performance analysis.
Usage:
usage: celer-sim {input}.json
celer-sim [--help|-h]
celer-sim --version
celer-sim --config
celer-sim --dump-default
- input.json is the path to the input file, or - to read the JSON from stdin.
- The --config option prints the contents of the ["system"]["build"] diagnostic output. It includes configuration options and the version number.
- The --dump-default option prints the default options for the execution. Not all variables will be shown, because some are conditional on others.
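One possible workflow is to treat the --dump-default output as a template, change a few values, and pass the result back as the input file. The sketch below shows only the merging step, under the assumption that the options form nested JSON objects. The key names (`example_group`, `example_option`, `other_option`) are placeholders invented for illustration and are not real celer-sim options.

```python
import copy


def deep_update(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them key by key
            merged[key] = deep_update(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder data standing in for parsed `celer-sim --dump-default` output
defaults = {"example_group": {"example_option": 1, "other_option": 2}}
overrides = {"example_group": {"example_option": 10}}
print(deep_update(defaults, overrides))
```

Because some options are conditional on others and may not appear in the dump, a merge like this can add keys that were absent from the defaults.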
Input#
In addition to these input parameters, Environment variables can be specified to change the program behavior.
Output#
The primary output from celer-sim is a JSON object that includes several levels of diagnostic and result data (see I/O). The JSON output should be the only data sent to stdout, so it should be suitable for piping directly into other executables such as Python or jq.
Additional user-oriented output is sent to stderr via the Logger facility (see Logging).
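Because the JSON result goes to stdout and the log goes to stderr, a driver script can capture the two streams separately. The following is a minimal sketch using only the Python standard library. The helper name `run_json_app` is hypothetical, and a small stand-in command replaces celer-sim so that the example runs without a Celeritas installation.

```python
import json
import os
import subprocess
import sys


def run_json_app(command, input_data, extra_env=None):
    """Run a command that reads JSON on stdin and writes JSON to stdout.

    Returns the parsed stdout and the raw stderr text.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    proc = subprocess.run(
        command,
        input=json.dumps(input_data),
        capture_output=True,
        text=True,
        env=env,
        check=True,  # raise CalledProcessError on a nonzero exit code
    )
    return json.loads(proc.stdout), proc.stderr


# Stand-in for celer-sim: echoes stdin to stdout and writes a log line to stderr
fake_app = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('status: done\\n'); "
    "sys.stdout.write(sys.stdin.read())",
]
result, log = run_json_app(fake_app, {"system": {"build": {}}}, {"CELER_LOG": "debug"})

# The ["system"]["build"] entry is the data that --config prints
print(result.get("system", {}).get("build"))
```

With a real installation, the command would presumably be `["celer-sim", "-"]`, where `-` tells celer-sim to read the JSON input from stdin.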
Integrated Geant4 application (celer-g4)#
The celer-g4 app is a Geant4 application that offloads EM tracks to Celeritas. It takes as input a GDML file with the detector description and sensitive detectors marked via an auxiliary annotation. The input particles must be specified with a HepMC3-compatible file or with a JSON-specified “particle gun.”
Usage:
celer-g4 {input}.json
celer-g4 {commands}.mac
celer-g4 --interactive
celer-g4 --dump-default
Input#
Physics is set up using the top-level physics_option key in the JSON input, corresponding to Geant4 physics options. The magnetic field is specified with a combination of the field_type, field, and field_file keys, and detailed field driver configuration options are set with field_options corresponding to the FieldOptions class in Field data input and options.
Note
The macro file usage is in the process of being replaced by JSON input for improved automation.
The input is a Geant4 macro file for executing the program. Celeritas defines several macros in the /celer and (if CUDA is available) /celer/cuda/ directories: see High-level interface for a listing.
The celer-g4 app defines several additional configuration commands under /celerg4:
| Command | Description |
|---|---|
| geometryFile | Filename of the GDML detector geometry |
| eventFile | Filename of the event input read by HepMC3 |
| rootBufferSize | Buffer size of output ROOT file [bytes] |
| writeSDHits | Write a ROOT output file with hits from the SDs |
| stepDiagnostic | Collect the distribution of steps per Geant4 track |
| stepDiagnosticBins | Number of bins for the Geant4 step diagnostic |
| fieldType | Select the field type [rzmap\|uniform] |
| fieldFile | Filename of the rz-map loaded by RZMapFieldInput |
| magFieldZ | Set Z-axis magnetic field strength (T) |
In addition to these input parameters, Environment variables can be specified to change the program behavior.
Output#
The ROOT “MC truth” output file, if enabled with the command above, contains hits from all the sensitive detectors.
Additional utilities#
The Celeritas installation includes additional utilities for inspecting input and output.
celer-export-geant#
This utility exports the physics and geometry data needed to run Celeritas without directly calling Geant4 for an independent run. Since it isolates Celeritas from any existing Geant4 installation it can also be a means of debugging whether a behavior change is due to a code change in Celeritas or (for example) a change in cross sections from Geant4.
Usage:
celer-export-geant {input}.gdml [{options}.json, -, ''] {output}.root
celer-export-geant --dump-default
- input
Detector definition file
- options
An optional argument for specifying a JSON file with Geant4 setup options corresponding to the Geant4 physics options struct.
- output
A ROOT output file with the exported Physics data.
The --dump-default usage renders the default options.
celer-dump-data#
This utility prints an RST-formatted high-level dump of physics data exported via celer-export-geant.
Usage:
celer-dump-data {output}.root
- output
A ROOT file containing exported Physics data.
orange-update#
Read an ORANGE JSON input file and write it out again. This is used for updating from an older version of the input to a newer version.
Usage:
orange-update {input}.org.json {output}.org.json
Either of the filenames can be replaced by - to read from stdin or write to stdout.
Environment variables#
Some pieces of core Celeritas code interrogate the environment for variables to change system- or output-level behavior. These variables are checked once per execution, and checking them inserts the key and user-defined value (or empty) into a diagnostic database saved to Celeritas’ JSON output, so the user can tell what variables are in use or may be useful.
| Variable | Component | Brief description |
|---|---|---|
| CELER_COLOR | corecel | Enable/disable ANSI color logging |
| CELER_DEBUG_DEVICE | corecel | Increase device error checking and output |
| CELER_DISABLE_DEVICE | corecel | Disable CUDA/HIP support |
| CELER_DISABLE_PARALLEL | corecel | Disable MPI support |
| CELER_DISABLE_ROOT | corecel | Disable ROOT I/O calls |
| CELER_ENABLE_PROFILING | corecel | Set up NVTX/ROCTX profiling ranges |
| CELER_LOG | corecel | Set the “global” logger verbosity |
| CELER_LOG_LOCAL | corecel | Set the “local” logger verbosity |
| CELER_MEMPOOL… [1] | celeritas | Change |
| CELER_PROFILE_DEVICE | corecel | Record extra kernel launch information |
| CUDA_HEAP_SIZE | celeritas | Change |
| CUDA_STACK_SIZE | celeritas | Change |
| G4VG_COMPARE_VOLUMES | celeritas | Check G4VG volume capacity when converting |
| HEPMC3_VERBOSE | celeritas | HepMC3 debug verbosity |
| VECGEOM_VERBOSE | celeritas | VecGeom CUDA verbosity |
| CELER_DISABLE | accel | Disable Celeritas offloading entirely |
| CELER_KILL_OFFLOAD | accel | Kill Celeritas-supported tracks in Geant4 |
| CELER_STRIP_SOURCEDIR | accel | Strip directories from exception output |
Environment variables from external libraries can also be referenced by Celeritas or its apps:
| Variable | Library | Brief description |
|---|---|---|
| CUDA_VISIBLE_DEVICES | CUDA | Set the active CUDA device |
| HIP_VISIBLE_DEVICES | HIP | Set the active HIP device |
| G4LEDATA | Geant4 | Path to low-energy EM data |
| G4FORCE_RUN_MANAGER_TYPE | Geant4 | Use MT or Serial thread layout |
| G4FORCENUMBEROFTHREADS | Geant4 | Set CPU worker thread count |
| OMP_NUM_THREADS | OpenMP | Number of threads per process |
Logging#
The Celeritas library writes informational messages to stderr. The given levels can be used with the CELER_LOG and CELER_LOG_LOCAL environment variables to suppress or increase the output. The default is to print diagnostic messages and higher.
| Level | Description |
|---|---|
| debug | Low-level debugging messages |
| diagnostic | Diagnostics about current program execution |
| status | Program execution status (what stage is beginning) |
| info | Important informational messages |
| warning | Warnings about unusual events |
| error | Something went wrong, but execution can continue |
| critical | Something went terribly wrong, program termination imminent |
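The effect of a verbosity threshold can be modeled as a comparison of positions in the level list. This sketch is only an illustration of the filtering rule, not the Celeritas logger implementation. It assumes that the levels are ordered from least to most severe as listed above, and the function name `is_shown` is hypothetical.

```python
# Level names in the order listed in the table (assumed least to most severe)
LEVELS = ["debug", "diagnostic", "status", "info", "warning", "error", "critical"]


def is_shown(message_level, threshold="diagnostic"):
    """Return True if a message at `message_level` passes the threshold.

    The default threshold mirrors the documented default of printing
    diagnostic messages and higher.
    """
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


# Show which levels would pass if the threshold were raised to "warning"
for level in LEVELS:
    print(level, is_shown(level, threshold="warning"))
```

Under this model, setting CELER_LOG to debug would show every message, while setting it to error would hide everything below error.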
Profiling#
Since the primary motivator of Celeritas is performance on GPU hardware, profiling is a necessity. Celeritas uses NVTX (or ROCTX when using AMD HIP) to annotate the different sections of the code, allowing for fine-grained profiling and improved visualization.
Timelines#
A detailed timeline of the Celeritas construction, steps, and kernel launches can be gathered using NVIDIA Nsight Systems.
Here is an example using the celer-sim app to generate a timeline:
1$ CELER_ENABLE_PROFILING=1 \
2> nsys profile \
3> -c nvtx --trace=cuda,nvtx,osrt \
4> -p celer-sim@celeritas \
5> --osrt-backtrace-stack-size=16384 --backtrace=fp \
6> -f true -o report.qdrep \
7> celer-sim inp.json
To use the NVTX ranges, you must enable the CELER_ENABLE_PROFILING variable and use the NVTX “capture” option (lines 1 and 3). The celer-sim range in the celeritas domain (line 4) enables profiling over the whole application. Additional system backtracing is specified in line 5; line 6 writes (and overwrites) to a particular output file; the final line invokes the application.
Timelines can also be generated on AMD hardware using the ROCProfiler applications. Here’s an example that writes out timeline information:
1$ CELER_ENABLE_PROFILING=1 \
2> rocprof \
3> --roctx-trace \
4> --hip-trace \
5> celer-sim inp.json
It will output a results.json file that contains profiling data for both the Celeritas annotations (line 3) and HIP function calls (line 4) in a “trace event format” which can be viewed in the Perfetto data visualization tool.
Kernel profiling#
Detailed kernel diagnostics including occupancy and memory bandwidth can be gathered with the NVIDIA Nsight Compute profiler (ncu).
This example gathers kernel statistics for 10 “propagate” kernels (for both charged and uncharged particles) starting with the 300th launch.
1$ CELER_ENABLE_PROFILING=1 \
2> ncu \
3> --nvtx --nvtx-include "celeritas@celer-sim/step/*/propagate" \
4> --launch-skip 300 --launch-count 10 \
5> -f -o propagate \
6> celer-sim inp.json
It will write to the propagate.ncu-rep output file. Note that the domain and range are flipped compared to nsys since the kernel profiling allows detailed top-down stack specification.