Checksums on Tests
When running an automated test, we often compare the data of the final time step of the test with expected values to catch accidental changes. Instead of relying on reference files that we would have to store at full size, we calculate an aggregate checksum.
For this purpose, the checksum Python module computes one aggregated number per field (e.g., the sum of the absolute values of the array elements) and compares it to a reference value (benchmark). This should be sensitive enough to make the test fail if your PR causes a significant difference, print meaningful error messages, and give you a chance to fix a bug or reset the benchmark if needed.
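As a rough illustration only (this is not the actual WarpX implementation; the field names, array values, and tolerance are made up), an aggregate checksum of this kind could be computed and compared like this:

import numpy as np

def aggregate_checksum(field_array):
    # one aggregated number per field: here, the sum of the absolute values
    return np.sum(np.abs(field_array))

# hypothetical output data and benchmark values
fields = {"Ex": np.ones((64, 64)), "Ey": np.full((64, 64), -0.5)}
benchmark = {"Ex": 4096.0, "Ey": 2048.0}

for name, array in fields.items():
    checksum = aggregate_checksum(array)
    # fail with a meaningful message if the checksum deviates from the benchmark
    assert np.isclose(checksum, benchmark[name], rtol=1e-9), (
        f"checksum mismatch for field '{name}': {checksum} vs {benchmark[name]}"
    )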
The checksum module is located in Regression/Checksum/, and the benchmarks are stored as human-readable JSON files in Regression/Checksum/benchmarks_json/, with one file per benchmark (for example, the test test_2d_langmuir_multi has a corresponding benchmark Regression/Checksum/benchmarks_json/test_2d_langmuir_multi.json).
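For orientation, a benchmark file is a JSON dictionary of aggregated checksum values, typically grouped by field level and particle species. The snippet below is only a made-up illustration of the general shape (hypothetical keys and numbers):

{
  "lev=0": {
    "Ex": 1.234567890123456e+09,
    "Ez": 9.876543210987654e+08,
    "jz": 1.0e+05
  },
  "electrons": {
    "particle_position_x": 6.02e-03,
    "particle_momentum_z": 1.6e-19
  }
}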
For more details, please refer to the Python implementation in Regression/Checksum/.
From a user point of view, you should only need to use checksumAPI.py, which contains Python functions that can be imported and used from an analysis Python script, or executed directly as a Python script.
How to compare checksums in your analysis script
This relies on the function evaluate_checksum:
- checksumAPI.evaluate_checksum(test_name, output_file, output_format='plotfile', rtol=1e-09, atol=1e-40, do_fields=True, do_particles=True)
Compare output file checksum with benchmark. Read checksum from output file, read benchmark corresponding to test_name, and assert their equality.
If the environment variable CHECKSUM_RESET is set while this function is run, the evaluation will be replaced with a call to reset_benchmark (see below).
- Parameters:
test_name (string) – Name of test, as found between [] in .ini file.
output_file (string) – Output file from which the checksum is computed.
output_format (string) – Format of the output file (plotfile, openpmd).
rtol (float, default=1.e-9) – Relative tolerance for the comparison.
atol (float, default=1.e-40) – Absolute tolerance for the comparison.
do_fields (bool, default=True) – Whether to compare fields in the checksum.
do_particles (bool, default=True) – Whether to compare particles in the checksum.
Here’s an example:
#!/usr/bin/env python3

import os
import sys

sys.path.insert(1, "../../../../warpx/Regression/Checksum/")

from checksumAPI import evaluate_checksum

# compare checksums
evaluate_checksum(
    test_name=os.path.split(os.getcwd())[1],
    output_file=sys.argv[1],
    rtol=1e-2,
)
This can also be included as part of an existing analysis script.
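If the test writes openPMD output, or if only the fields should be compared, the same call can be adapted with the parameters documented above (this snippet assumes the same imports as in the example above; the tolerance is a placeholder):

# compare only the field checksums of an openPMD output
evaluate_checksum(
    test_name=os.path.split(os.getcwd())[1],
    output_file=sys.argv[1],
    output_format="openpmd",
    do_particles=False,
    rtol=1e-6,
)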
How to evaluate checksums from the command line
You can execute checksumAPI.py as a Python script for that, and pass the output file (plotfile or openPMD) that you want to evaluate, as well as the test name (so the script knows which benchmark to compare it to).
./checksumAPI.py --evaluate --output-file <path/to/plotfile> --output-format <'openpmd' or 'plotfile'> --test-name <test name>
See the additional options:
- --skip-fields: if you don't want the fields to be compared (in that case, the benchmark must not have fields)
- --skip-particles: same thing for particles
- --rtol: relative tolerance for the comparison
- --atol: absolute tolerance for the comparison (a sum of both is used by numpy.isclose())
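For example, a fields-only evaluation with a custom relative tolerance might look like this (the output path and test name are placeholders):

./checksumAPI.py --evaluate --output-file diags/diag1000100 --output-format plotfile --test-name test_2d_langmuir_multi --skip-particles --rtol 1e-6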
How to create or reset checksums with local benchmark values
This also uses checksumAPI.py as a Python script.
./checksumAPI.py --reset-benchmark --output-file <path/to/plotfile> --output-format <'openpmd' or 'plotfile'> --test-name <test name>
See the additional options:
- --skip-fields: if you don't want the benchmark to have fields
- --skip-particles: same thing for particles
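For instance, to (re)create a benchmark from a plotfile without particle data, the call might look like this (placeholder path and test name):

./checksumAPI.py --reset-benchmark --output-file diags/diag1000100 --output-format plotfile --test-name test_2d_langmuir_multi --skip-particles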
Since this will automatically change the JSON file stored in the repository, make a separate commit just for this file, and if possible commit it under the Tools name:
git add <test name>.json
git commit -m "reset benchmark for <test name> because ..." --author="Tools <warpx@lbl.gov>"
How to reset checksums for a list of tests with local benchmark values
If you set the environment variable CHECKSUM_RESET=ON (e.g., with export CHECKSUM_RESET=ON) before running tests that are compared against existing benchmarks, the test analysis will reset the benchmarks to the new values, skipping the comparison.
With CTest (coming soon), select the test(s) to reset by name or label.
# regex filter: matched names
CHECKSUM_RESET=ON ctest --test-dir build -R "Langmuir_multi|LaserAcceleration"
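# label filter: matched labels ("checksum" is an illustrative label, not necessarily one WarpX defines)
CHECKSUM_RESET=ON ctest --test-dir build -L "checksum"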
# ... check and commit changes ...
How to reset checksums for a list of tests with benchmark values from the Azure pipeline output
Alternatively, the benchmarks can be reset using the output of the Azure continuous integration (CI) tests on GitHub. The output can be accessed by following the steps below:
On the GitHub page of the pull request, find (one of) the pipeline(s) failing due to benchmarks that need to be updated and click on “Details”.
Click on “View more details on Azure pipelines”.
Click on “Build & test”.
From this output, there are two options to reset the benchmarks:
- For each of the tests failing due to benchmark changes, the output contains the content of the new benchmark file. This content can be copied and pasted into the corresponding benchmark file. For instance, if the failing test is LaserAcceleration_BTD, this content can be pasted into the file Regression/Checksum/benchmarks_json/LaserAcceleration_BTD.json.
- If there are many tests failing in a single Azure pipeline, it might become more convenient to update the benchmarks automatically. WarpX provides a script for this, located in Tools/DevUtils/update_benchmarks_from_azure_output.py. This script can be used by following the steps below:
  - From the Azure output, click on “View raw log” and save the raw log as a text file on your local computer.
  - On your local computer, go to the WarpX folder and cd to the Tools/DevUtils folder.
  - Run the command python update_benchmarks_from_azure_output.py /path/to/azure_output.txt. The benchmarks included in that Azure output should now be updated.
  - Repeat this for every Azure pipeline (e.g. cartesian2d, cartesian3d, qed) that contains benchmarks that need to be updated.