Checksums on Tests
When running an automated test, we often compare the data of the final time step of the test with expected values to catch accidental changes. Instead of relying on reference files that we would have to store at full size, we calculate an aggregate checksum.
For this purpose, the checksum Python module computes one aggregated number per field (e.g., the sum of the absolute values of the array elements) and compares it to a reference value (benchmark). This should be sensitive enough to make the test fail if your PR causes a significant difference, print meaningful error messages, and give you a chance to fix a bug or reset the benchmark if needed.
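As a rough illustration only (this is not the actual WarpX implementation; the field names, array values, and tolerance are made up), an aggregate checksum of this kind could be computed and compared like this:

import numpy as np

def aggregate_checksum(field_array):
    # one aggregated number per field: here, the sum of the absolute values
    return np.sum(np.abs(field_array))

# hypothetical output data and benchmark values
fields = {"Ex": np.ones((64, 64)), "Ey": np.full((64, 64), -0.5)}
benchmark = {"Ex": 4096.0, "Ey": 2048.0}

for name, array in fields.items():
    checksum = aggregate_checksum(array)
    # fail with a meaningful message if the checksum deviates from the benchmark
    assert np.isclose(checksum, benchmark[name], rtol=1e-9), (
        f"checksum mismatch for field '{name}': {checksum} vs {benchmark[name]}"
    )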
The checksum module is located in Regression/Checksum/, and the benchmarks are stored as human-readable JSON files in Regression/Checksum/benchmarks_json/, with one file per benchmark (for example, the test test_2d_langmuir_multi has a corresponding benchmark Regression/Checksum/benchmarks_json/test_2d_langmuir_multi.json).
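For orientation, a benchmark file is a JSON dictionary of aggregated checksum values, typically grouped by field level and particle species. The snippet below is only a made-up illustration of the general shape (hypothetical keys and numbers):

{
  "lev=0": {
    "Ex": 1.234567890123456e+09,
    "Ez": 9.876543210987654e+08,
    "jz": 1.0e+05
  },
  "electrons": {
    "particle_position_x": 6.02e-03,
    "particle_momentum_z": 1.6e-19
  }
}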
For more details, please refer to the Python implementation in Regression/Checksum/.
From a user point of view, you should only need to use checksumAPI.py, which contains Python functions that can be imported and used from an analysis Python script, or executed directly as a Python script.
How to compare checksums in your analysis script
This relies on the function evaluate_checksum:
- checksumAPI.evaluate_checksum(test_name, output_file, output_format='plotfile', rtol=1e-09, atol=1e-40, do_fields=True, do_particles=True)
Compare output file checksum with benchmark. Read checksum from output file, read benchmark corresponding to test_name, and assert their equality.
If the environment variable CHECKSUM_RESET is set while this function is run, the evaluation will be replaced with a call to reset_benchmark (see below).
- Parameters:
test_name (string) – Name of test, as found between [] in .ini file.
output_file (string) – Output file from which the checksum is computed.
output_format (string) – Format of the output file (plotfile, openpmd).
rtol (float, default=1.e-9) – Relative tolerance for the comparison.
atol (float, default=1.e-40) – Absolute tolerance for the comparison.
do_fields (bool, default=True) – Whether to compare fields in the checksum.
do_particles (bool, default=True) – Whether to compare particles in the checksum.
Here’s an example:
#!/usr/bin/env python3

import os
import sys

sys.path.insert(1, "../../../../warpx/Regression/Checksum/")

from checksumAPI import evaluate_checksum

# compare checksums
evaluate_checksum(
    test_name=os.path.split(os.getcwd())[1],
    output_file=sys.argv[1],
    rtol=1e-2,
)
This can also be included as part of an existing analysis script.
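If the test writes openPMD output, or if only the fields should be compared, the same call can be adapted with the parameters documented above (this snippet assumes the same imports as in the example above; the tolerance is a placeholder):

# compare only the field checksums of an openPMD output
evaluate_checksum(
    test_name=os.path.split(os.getcwd())[1],
    output_file=sys.argv[1],
    output_format="openpmd",
    do_particles=False,
    rtol=1e-6,
)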
How to evaluate checksums from the command line
You can execute checksumAPI.py as a Python script for that, and pass the output file (plotfile or openPMD) that you want to evaluate, as well as the test name (so the script knows which benchmark to compare it to).
./checksumAPI.py --evaluate --output-file <path/to/plotfile> --output-format <'openpmd' or 'plotfile'> --test-name <test name>
See the additional options:
- --skip-fields: if you don't want the fields to be compared (in that case, the benchmark must not have fields)
- --skip-particles: same thing for particles
- --rtol: relative tolerance for the comparison
- --atol: absolute tolerance for the comparison (a sum of both is used by numpy.isclose())
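For example, a fields-only evaluation with a custom relative tolerance might look like this (the output path and test name are placeholders):

./checksumAPI.py --evaluate --output-file diags/diag1000100 --output-format plotfile --test-name test_2d_langmuir_multi --skip-particles --rtol 1e-6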
How to create or reset checksums with local benchmark values
This also uses checksumAPI.py as a Python script.
./checksumAPI.py --reset-benchmark --output-file <path/to/plotfile> --output-format <'openpmd' or 'plotfile'> --test-name <test name>
See the additional options:
- --skip-fields: if you don't want the benchmark to have fields
- --skip-particles: same thing for particles
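For instance, to (re)create a benchmark from a plotfile without particle data, the call might look like this (placeholder path and test name):

./checksumAPI.py --reset-benchmark --output-file diags/diag1000100 --output-format plotfile --test-name test_2d_langmuir_multi --skip-particles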
Since this will automatically change the JSON file stored in the repository, make a separate commit just for this file, and if possible commit it under the Tools name:
git add <test name>.json
git commit -m "reset benchmark for <test name> because ..." --author="Tools <warpx@lbl.gov>"
How to reset checksums for a list of tests with local benchmark values
If you set the environment variable CHECKSUM_RESET=ON (e.g., with export CHECKSUM_RESET=ON) before running tests that are compared against existing benchmarks, the test analysis will reset the benchmarks to the new values, skipping the comparison.
With CTest (coming soon), select the test(s) to reset by name or label.
# regex filter: matched names
CHECKSUM_RESET=ON ctest --test-dir build -R "Langmuir_multi|LaserAcceleration"
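# label filter: matched labels ("checksum" is an illustrative label, not necessarily one WarpX defines)
CHECKSUM_RESET=ON ctest --test-dir build -L "checksum"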
# ... check and commit changes ...
How to reset checksums for a list of tests with benchmark values from the Azure pipeline output
Alternatively, the benchmarks can be reset using the output of the Azure continuous integration (CI) tests on GitHub. The output can be accessed by following the steps below:
On the GitHub page of the pull request, find (one of) the pipeline(s) failing due to benchmarks that need to be updated and click on “Details”.
Click on “View more details on Azure pipelines”.
Click on “Build & test”.
From this output, there are two options to reset the benchmarks:
- For each of the tests failing due to benchmark changes, the output contains the content of the new benchmark file. This content can be copied and pasted into the corresponding benchmark file. For instance, if the failing test is LaserAcceleration_BTD, this content can be pasted into the file Regression/Checksum/benchmarks_json/LaserAcceleration_BTD.json.
- If there are many tests failing in a single Azure pipeline, it might become more convenient to update the benchmarks automatically. WarpX provides a script for this, located in Tools/DevUtils/update_benchmarks_from_azure_output.py. This script can be used by following the steps below:
  - From the Azure output, click on “View raw log” and save the raw log as a text file on your local computer.
  - On your local computer, go to the WarpX folder and cd to the Tools/DevUtils folder.
  - Run the command python update_benchmarks_from_azure_output.py /path/to/azure_output.txt. The benchmarks included in that Azure output should now be updated.
  - Repeat this for every Azure pipeline (e.g. cartesian2d, cartesian3d, qed) that contains benchmarks that need to be updated.