In situ Visualization with Catalyst 2
Catalyst 2 (hereafter referred to simply as Catalyst) is a lightweight in situ visualization and analysis framework API developed for simulations and other scientific data producers. It provides a lightweight stub implementation and an SDK for developing custom implementations of Catalyst. ParaView comes with its own implementation (known as ParaView Catalyst) for leveraging ParaView's visualization and analysis capabilities, which is what this document will focus on.
Enabling Catalyst
In order to use Catalyst with WarpX, we need to ensure that the same version of Conduit is used across all libraries, i.e., Catalyst, AMReX, and ParaView. One way to achieve this is to build Conduit externally and use it when compiling all of the above packages. This ensures compatibility when passing Conduit nodes between WarpX and ParaView.
First, we build Conduit and then build Catalyst 2 using the Conduit library created in the previous step. The latter can be achieved by adding the installation path of Conduit to the environment variable CMAKE_PREFIX_PATH and setting CATALYST_WITH_EXTERNAL_CONDUIT=ON during the configuration step of Catalyst.
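A minimal sketch of these two builds might look as follows; the source checkouts (./conduit, ./catalyst) and install prefixes under $HOME/sw are placeholders to adapt to your environment:

# hypothetical paths; adjust to your environment
# (Conduit's top-level CMakeLists.txt lives in its src/ subdirectory)
cmake -S conduit/src -B conduit-build -DCMAKE_INSTALL_PREFIX=$HOME/sw/conduit
cmake --build conduit-build --target install

# make the Conduit install visible to Catalyst's configure step
export CMAKE_PREFIX_PATH=$HOME/sw/conduit:$CMAKE_PREFIX_PATH
cmake -S catalyst -B catalyst-build \
      -DCATALYST_WITH_EXTERNAL_CONDUIT=ON \
      -DCMAKE_INSTALL_PREFIX=$HOME/sw/catalyst
cmake --build catalyst-build --target install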
Then we build ParaView master (on a commit after 2024.07.01; tested on 4ef351a54ff747ef7169e2e52e77d9703a9dfa77) following the developer instructions provided here. A representative set of options for a headless ParaView installation is provided here.
Afterward, WarpX must be built with WarpX_CATALYST=ON. Also, make sure to provide the installation paths of Conduit and Catalyst via CMAKE_PREFIX_PATH before configuring WarpX.
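A minimal configure sketch, reusing the hypothetical install prefixes from above:

export CMAKE_PREFIX_PATH=$HOME/sw/conduit:$HOME/sw/catalyst:$CMAKE_PREFIX_PATH
cmake -S warpx -B warpx-build -DWarpX_CATALYST=ON
cmake --build warpx-build -j 8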
Inputs File Configuration
Once WarpX has been compiled with Catalyst support, it needs to be enabled and configured at runtime. This is done using our usual inputs file (read with amrex::ParmParse). The supported parameters are part of the FullDiagnostics, with the <diag_name>.format parameter set to catalyst.
In addition to configuring the diagnostics, the following parameters must be included:

- catalyst.script_paths: The locations of the pipeline scripts, separated by either a colon or a semicolon (e.g. /path/to/script1.py;/path/to/script2.py).
- catalyst.implementation (default: paraview): The name of the implementation being used (case sensitive).
- catalyst.implementation_search_paths: The locations to search for the given implementation. The specific file being searched for will be catalyst_{implementation}.so.
The latter two can also be given via the environment variables CATALYST_IMPLEMENTATION_NAME and CATALYST_IMPLEMENTATION_PATHS, respectively.
Because the scripts and implementations are global, Catalyst neither benefits from nor differentiates between multiple diagnostics.
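As an illustrative sketch, a minimal inputs-file excerpt enabling a Catalyst diagnostic might look like the following; the diagnostic name diag1 and all paths are placeholders:

# hypothetical inputs-file excerpt
diagnostics.diags_names = diag1
diag1.diag_type = Full
diag1.intervals = 100
diag1.format = catalyst

catalyst.script_paths = /path/to/simple_catalyst_pipeline.py
catalyst.implementation = paraview
catalyst.implementation_search_paths = /path/to/paraview/install/lib/catalyst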
Visualization/Analysis Pipeline Configuration
Catalyst uses the files specified in catalyst.script_paths to run all analyses.
The following script, simple_catalyst_pipeline.py (included in the WarpX repository as Docs/source/dataanalysis/catalyst/catalyst_simple_pipeline.py), automatically detects the type of data for both the mesh and particles, then creates an extractor for them. In most cases, these will be saved as .VTPC files, which can be read with the XML Partitioned Dataset Collection Reader.
from paraview import catalyst
from paraview.simple import *  # noqa: F403


# Helper function
def create_data_extractor(data_node, filename="Dataset"):
    """Creates a data extractor that saves `data_node` to a datafile named `filename`.
    The filetype is chosen based on the type of `data_node`.

    Note: no rendering is performed by such an extractor. The data are
    written directly to a file via VTK.
    """
    VTK_TYPES = [
        "vtkImageData",
        "vtkRectilinearGrid",
        "vtkStructuredGrid",
        "vtkPolyData",
        "vtkUnstructuredGrid",
        "vtkUniformGridAMR",
        "vtkMultiBlockDataSet",
        "vtkPartitionedDataSet",
        "vtkPartitionedDataSetCollection",
        "vtkHyperTreeGrid",
    ]
    FILE_ASSOCIATIONS = [
        "VTI",
        "VTR",
        "VTS",
        "VTP",
        "VTU",
        "VTH",
        "VTM",
        "VTPD",
        "VTPC",
        "HTG",
    ]
    clientside_data = data_node.GetClientSideObject().GetOutputDataObject(
        0
    )  # Gets the dataobject from the default output port

    # Loop is required because .IsA() detects valid classes that inherit from the VTK_TYPES
    for i, vtk_type in enumerate(VTK_TYPES):
        if clientside_data.IsA(vtk_type):
            filetype = FILE_ASSOCIATIONS[i]
            extractor = CreateExtractor(
                filetype, data_node, registrationName=f"_{filetype}"
            )
            extractor.Writer.FileName = filename + "_{timestep:}" + f".{filetype}"
            return extractor

    raise RuntimeError(f"Unsupported data type: {clientside_data.GetClassName()}")


# Camera settings
paraview.simple._DisableFirstRenderCameraReset()  # Prevents the camera from being reset on the first render

# Options
options = catalyst.Options()
options.CatalystLiveTrigger = "TimeStep"  # "Python", "TimeStep", "TimeValue"
options.EnableCatalystLive = 0  # 0 (disabled), 1 (enabled)
if options.EnableCatalystLive == 1:
    options.CatalystLiveURL = "localhost:22222"  # localhost:22222 is default
options.ExtractsOutputDirectory = "datasets"  # Base directory where all files are saved
options.GenerateCinemaSpecification = 0  # 0 (disabled), 1 (enabled), generates additional descriptor files for cinema exports
options.GlobalTrigger = "TimeStep"  # "Python", "TimeStep", "TimeValue"

meshSource = PVTrivialProducer(
    registrationName="mesh"
)  # "mesh" is the node where the mesh data is stored
create_data_extractor(meshSource, filename="meshdata")

particleSource = PVTrivialProducer(
    registrationName="particles"
)  # "particles" is the node where particle data is stored
create_data_extractor(particleSource, filename="particledata")


# Called on catalyst initialize (after Cxx side initialize)
def catalyst_initialize():
    return


# Called on catalyst execute (after Cxx side update)
def catalyst_execute(info):
    print(f"Time: {info.time}, Timestep: {info.timestep}, Cycle: {info.cycle}")
    return


# Callback if global trigger is set to "Python"
def is_activated(controller):
    return True


# Called on catalyst finalize (after Cxx side finalize)
def catalyst_finalize():
    return


if __name__ == "__main__":
    paraview.simple.SaveExtractsUsingCatalystOptions(options)
For the case of ParaView Catalyst, pipelines are run with ParaView's included pvbatch executable and use the paraview library to modify the data. While pipeline scripts could be written manually, this is not advised for anything beyond the script above. It is much more practical to use ParaView's built-in Save Catalyst State button.
The process for creating a pipeline is as follows:

1. Run at least one step of the simulation and save the data in a ParaView-compatible format, then open it in ParaView.
2. Set up the desired scene, including filters, camera and views, and extractors.
3. Press Save Catalyst State, or the multicolored flask icon in the top left corner, and save it to a desired location.
4. Open the script and replace the used producer with PVTrivialProducer, setting the registrationName to either mesh or particles based on what data is used.
As an example for step four, here are a few lines from a script directly exported from ParaView:
# create a new 'XML Image Data Reader'
meshdatavti = XMLImageDataReader(registrationName='meshdata.vti', FileName=['/path/to/meshdata.vti'])
meshdatavti.CellArrayStatus = ['Bx', 'By', 'Bz', 'Ex', 'Ey', 'Ez']
meshdatavti.TimeArray = 'None'
# Calculator sample filter
calculator1 = Calculator(registrationName='Calculator1', Input=meshdatavti)
calculator1.AttributeType = 'Cell Data'
calculator1.ResultArrayName = 'BField'
calculator1.Function = 'sqrt(Bx^2 + By^2 + Bz^2)'
In order to use it with the mesh data coming from the simulation, the above code would be changed to:
# create the producer
meshdata = PVTrivialProducer(registrationName='mesh')
meshdata.CellArrayStatus = ['Bx', 'By', 'Bz', 'Ex', 'Ey', 'Ez']
meshdata.TimeArray = 'None'
# Calculator sample filter
calculator1 = Calculator(registrationName='Calculator1', Input=meshdata)
calculator1.AttributeType = 'Cell Data'
calculator1.ResultArrayName = 'BField'
calculator1.Function = 'sqrt(Bx^2 + By^2 + Bz^2)'
Step one is advised so that proper scaling and framing can be done; however, in certain cases it may not be possible. If this is the case, a dummy object can be used instead (such as a wavelet or geometric shape scaled appropriately), and the rest of the steps can be followed as usual.
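As a minimal sketch of such a stand-in, assuming a paraview.simple session (the extents below are placeholders chosen to roughly match the simulation domain):

# hypothetical placeholder source standing in for simulation data
dummy = Wavelet(registrationName='Wavelet1')
dummy.WholeExtent = [0, 63, 0, 63, 0, 63]
Show(dummy)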
Replay
Catalyst 2.0 supports generating binary data dumps of the conduit nodes passed to each catalyst_ call at each iteration. This allows one to debug and adapt Catalyst scripts without having to rerun the simulation each time.
To generate the data dumps, one must first set the environment variable CATALYST_DATA_DUMP_DIRECTORY to the path where the dumps should be saved. Then, run the simulation as normal but set catalyst.implementation=stub, either in the WarpX inputs file or as an additional command-line argument.
This will run the simulation and write the conduit nodes under CATALYST_DATA_DUMP_DIRECTORY.
Afterward, one can replay the generated nodes by setting up the CATALYST_IMPLEMENTATION_* variables appropriately for the catalyst_replay executable (which can be found in the Catalyst build directory). For example:
# dump conduit nodes
export CATALYST_DATA_DUMP_DIRECTORY=./raw_data
mpiexec -n N <WarpX build directory>/bin/warpx.2d ./inputs_2d catalyst.script_paths=catalyst_pipeline.py catalyst.implementation="stub"
# validate that files have been written
ls ./raw_data/
... many files of the format XXXX.conduit_bin.Y.Z
# replay them
export CATALYST_IMPLEMENTATION_NAME=paraview
export CATALYST_IMPLEMENTATION_PATHS=<paraview install path>/lib/catalyst
export CATALYST_IMPLEMENTATION_PREFER_ENV=YES
export CATALYST_DEBUG=1 # optional but helps to make sure the right paths are used
export PYTHONPATH=${PYTHONPATH}:$(pwd) # or the path containing catalyst_pipeline.py in general
# N needs to be the same as when we generated the dump
mpiexec -n N <catalyst install path>/bin/catalyst_replay ./raw_data
# check the extractor output, e.g.
ls ./datasets/
For more information, see the documentation for catalyst_replay here.