Here we explain how to reproduce the results of the TACAS 2021 paper dtControl 2.0: Explainable Strategy Representation via Decision Tree Learning Steered by Experts. More concretely, we show how to install dtControl, verify that the installation works, and reproduce the tables and figures of the paper.
More detailed instructions follow. Note that we always assume you are inside our unzipped artifact folder and give paths relative to it. E.g. just executing ls should result in seeing the folders installation, examples and results as well as 9 Python scripts.
This artifact has passed the TACAS 2021 artifact evaluation and is presented here for archival purposes.
Recent advances have shown how decision trees are apt data structures for concisely representing strategies (or controllers) satisfying various objectives. Moreover, they also make the strategy more explainable. The recent tool dtControl provided pipelines with tools supporting strategy synthesis for hybrid systems, such as SCOTS and Uppaal Stratego. We present dtControl 2.0, a new version with several fundamentally novel features. Most importantly, the user can now provide domain knowledge to be exploited in the decision tree learning process and can also interactively steer the process based on dynamically provided information. To this end, we also provide a graphical user interface. It allows for inspection and re-computation of parts of the result, suggesting as well as receiving advice on predicates, and visual simulation of the decision-making process. In addition, we interface with model checkers of probabilistic systems, namely STORM and PRISM, and provide dedicated support for categorical enumeration-type state variables. Consequently, the controllers are more explainable and smaller.
This artifact contains the following files and folders:
- tacas21-dtcontrol-artifact, containing the installation files, examples and run scripts
- License.txt, containing the license text
- Readme.txt, containing instructions on how to reproduce the tables and figures of the paper.
We tested the artifact on a PC with a 64-bit Windows 10 operating system, 24 GB RAM and a 3.00 GHz AMD Phenom II X6 1075T processor. We ran the virtual machine with 12 GB of RAM and 2 cores of the processor, using VirtualBox 6.0.16. The case studies have a size of ~4.5 GB, so the experiments require ~5 GB of hard drive space. The experiments in the paper were conducted on an Ubuntu 19.10 server with considerably more power, namely 250 GB RAM and a 2.2 GHz Intel Xeon E5-2630 v4 CPU, with multiple experiments running in parallel. More concrete details on the scripts are given in Section “Reproducing the results of the paper”.
To run dtControl, you need Python 3.7.9 or higher (which is installed automatically on the TACAS 21 VM) as well as several libraries which are included in this artifact. The web-based graphical user interface was tested on Firefox 80, which comes bundled with the VM.
We assume that you start on a clean version of the TACAS 2021 VM and use it only for this evaluation, hence we do not bother with virtual environments. We assume that python3 and pip3 are already installed.
First, we have to add ~/.local/bin to the PATH so that dtcontrol can easily be found after installation:
$ echo "export PATH=\"$PATH:/home/tacas21/.local/bin\"" >> ~/.bashrc
$ source ~/.bashrc
To install dtControl with all its dependencies, go to the installation folder of our artifact and execute
$ cd installation && pip3 install *
This installs dtControl in the .local/bin directory. You can check that it worked by typing dtcontrol, which should print the help text of dtControl.
The case studies are located in the folder examples. To unpack them, first execute
$ cd examples
$ find . -name "*.zip" | while read filename; do unzip -o -d "`dirname "$filename"`" "$filename"; done;
to extract the zipped examples within all the subfolders (e.g. examples/storm). The examples in the csv folder are not zipped as they are very small.
We want to verify that three things work properly: the command line interface (CLI), the graphical user interface (GUI) and the Python bindings which we use for the large scripts. To do so, first execute
$ dtcontrol
which should print the help text. If it doesn’t, make sure you have set the path correctly. Then execute
$ dtcontrol-frontend
and open Firefox and go to http://127.0.0.1:5000. You should see the GUI (and not a 404 Not Found). Finally, in the top level of our artifact folder, execute the script quickcheck.py by typing
$ python quickcheck.py
This executes three different algorithms on two case studies and finally opens a results page in Firefox.
We offer eight scripts (using the dtControl Python bindings) to reproduce the two tables of the paper. Every script is named after the schema <models>-<representation>-<set>.py, where
- <models> is either ‘cps’ or ‘mdp’. CPS (cyber-physical systems) are the case studies for Table 1 and MDP (Markov decision processes) are the case studies for Table 2.
- <representation> is either ‘bdd’ (binary decision diagram) or ‘dt’ (decision tree), the two ways of representing the controllers that we compare in the tables.
- <set> is either ‘subset’ or ‘all’. Running all experiments is very memory-intensive (e.g. constructing a BDD for the larger case studies can take more than 11 GB of RAM) and time-intensive (the larger case studies take more than 2 hours, and for BDDs every experiment is repeated 20 times). Thus, we offer a meaningful subset of the experiments, repeating BDDs only twice and excluding the largest case studies.
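Concretely, this schema yields the eight scripts cps-dt-subset.py, cps-dt-all.py, cps-bdd-subset.py, cps-bdd-all.py, mdp-dt-subset.py, mdp-dt-all.py, mdp-bdd-subset.py and mdp-bdd-all.py.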
Note for the interested: you can modify the scripts yourself to adjust the subset of experiments, e.g. to include a large case study for BDDs, but not repeat it 20 times. For this, here is a quick description of the two important parameters in the scripts (see also the sketch after this list):
- suite.add_datasets(..., include=[...]): in this ‘include’ list, all case studies are explicitly listed, and you can remove or add them as you deem fit.
- classifiers = [...], where all algorithms for representation are listed. By commenting out some of the classifiers, you can omit them from the computation (for an example, compare the cps-bdd-all.py script: two case studies were excluded and all but two classifiers have been commented out).
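For illustration, these two parameters look roughly as follows inside a script (an illustrative excerpt, not a complete script: classifier_a and classifier_b are placeholders for the algorithm objects defined in the scripts, while ‘cartpole’ and ‘10rooms’ are example case-study names from this artifact):

suite.add_datasets(..., include=[   # '...' stands for the dataset paths given in the script
    "cartpole",                     # keep this case study in the run
    # "10rooms",                    # commented out: excluded from the run
])
classifiers = [
    classifier_a,                   # keep this construction algorithm
    # classifier_b,                 # commented out: omitted from the computation
]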
By modifying the scripts, you can reproduce as many of the results as your computational resources and time allow.
Here we report the time and memory requirements of the different scripts. Note that we are not benchmarking the memory or time requirements, hence these are ballpark figures to help the reviewer choose which experiments to reproduce.
We mark some suggested experiments in bold. OOM indicates that the script ran out of memory on our VM.
| Script Name | Max RAM | Time on TACAS VM | Theoretical Maximum |
Provided that you have a good setup, you can also run the scripts in parallel to speed things up (as we did when running them for the paper).
Expected error: We have repeatedly observed the following behaviour of the BDD scripts: after completing several instances, the script terminates with an error related to garbage collection. In all cases, it was sufficient to just restart the script; it then continued without this error, picking up from the point where it failed.
To reproduce this table, we suggest running the scripts
$ python cps-dt-subset.py
$ python cps-bdd-subset.py
The results are saved in multiple formats and folders; for reproducing the table, only the results folder is relevant. In it, you find an HTML and a JSON file that report all the statistics of every combination of algorithm and case study. You may open the results/DT-CPS.html and results/BDD-CPS.html files in a browser to compare the results with the paper. We report the minimum-sized dtControl1, dtControl2 and BDD results over the different DT algorithms and the multiple BDD repetitions, respectively.
To reproduce this table, we suggest running the scripts
$ python mdp-dt-subset.py
$ python mdp-bdd-subset.py
The results of Table 2 may be compiled as described for Table 1, from the results/DT-MDP.html and results/BDD-MDP.html files generated by the above commands.
As previously mentioned, we only report the minimum-sized results of the different algorithms in the paper. In order to automatically extract the minimum values given within Table 1 and Table 2, we provide a helper script that reads the corresponding JSON files. The helper script process-results.py can be found in the results/ folder (where the DT-CPS.json, BDD-CPS.json, DT-MDP.json and BDD-MDP.json files are generated).
It can be executed by running
$ python process-results.py
A respective CSV file will be created for each of them, which can be opened using standard spreadsheet software (OpenOffice/LibreOffice, Excel, Google Sheets).
These figures show small illustrative examples that need not be reproduced via the tool.
This figure shows a screenshot of the new GUI. To start the GUI, run dtcontrol-frontend and navigate to http://127.0.0.1:5000 in the browser. The sidebar on the left allows the selection of the case study and the preset. By clicking the browse button, you can select the case study. Controllers such as cartpole.scs are located in <our-artifact-folder>/examples/cps, while firewire_abst.prism is in <our-artifact-folder>/examples/prism. The dropdown menu allows you to select the preset, i.e. the exact DT-construction algorithm. Every preset induces a choice of hyper-parameters. These hyper-parameters can also be tuned by hand, but to recreate the screenshot, you do not need to use the advanced options.
Adding the case studies with their respective presets as in the screenshot, and then clicking the run button on the right for the first two, recreates the figure.
This figure shows the workflow and software architecture of dtControl 2.0. We suggest reading the description of the figure in the paper before continuing here.
Controller: The CPS scripts use controllers from SCOTS and UPPAAL, the MDP scripts use controllers from Storm (and hence also PRISM models). Moreover, in the description of Figure 3, we use a PRISM model. We also include an example CSV controller; see our documentation for a description of the CSV format.
Predicate Domain: All these options are offered in the advanced options menu. When selecting algebraic predicates, you are also required to enter domain-knowledge predicates. The domain knowledge has to follow the structure presented in the user manual. In general, x-variables are used to refer to features; for example, x_0 references the first feature (whereas c-variables describe coefficients). In order to apply the predicate x₀ - log(x₁) >= 150, you only have to enter x_0 - log(x_1) >= 150 in the input field. For a list of provided functions, see the SymPy documentation. Multiple predicates can be separated by newlines.
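For example, entering the following two lines defines two domain-knowledge predicates (the second, which reappears in the Figure 9 walkthrough below, contains a coefficient c_0 that dtControl instantiates during learning):

x_0 - log(x_1) >= 150
x_0 + 3*x_1 <= c_0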
Determinizer: Note that this box merges several things, namely pre- and post-processing as well as safe early stopping. All of these are offered in the advanced options menu. (Note that MLE is considered an impurity measure, but it requires the determinizing early-stopping option to be enabled; for more details, read Section 7 of the paper.)
User Choice and Domain Knowledge: These were described as part of the Predicate Domain and the Predicate Selector.
Decision Tree: If you have executed any experiment, then the folder decision_trees has been created in our artifact folder. Inside it, ordered by preset and case study, you find the C, JSON and DOT files for every DT. The interactive graph is what you see when clicking the view button in the frontend, as e.g. in the description of the interactive predicate selection.
Simulation: Enter the dynamics of the system (see examples/cps/cartpole_dynamics.txt for reference), then select some initial values (or press the randomize button), and click submit to start the simulation. A simulation trace can be sampled using the Play/Next buttons. The decision path is highlighted in each step. Further, charts displaying the progress of the simulation are displayed at the bottom of the page.
You can play around with the CLI and GUI, trying any combination of hyper-parameters on any case study. You can also download further case studies from the Quantitative Verification Benchmark Set or from our examples repository. You can explore DTs in the GUI and try to get a better understanding, or see whether you can come up with useful predicates.
To run dtControl on a single case study, execute the following (assuming that you have installed dtControl and unzipped the examples):
$ dtcontrol --input examples/<location_of_controller_file> --use-preset <preset>
where <location_of_controller_file> is the path to an example, e.g. cps/cartpole.scs, and <preset> is one of the available presets (run dtcontrol preset --list to list all available presets; we recommend avg for MDPs and mlentropy for the CPS benchmarks).
Note that, for the avg preset, a <controller_name>_config.json file containing the metadata needs to be present in the same location as the controller file.
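For instance, following this naming rule, the firewire_abst.prism controller would be accompanied by a metadata file as follows (the second file name is derived from the rule and shown for illustration; check the examples folder for the actual file):

examples/prism/firewire_abst.prism
examples/prism/firewire_abst_config.json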
The output tree (in JSON, C and DOT formats) can be found inside the decision_trees/<preset_name>/<controller_name>/ folder, and a summary will be available in the benchmarks.html file in the folder from which dtcontrol was run.
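For example, assuming the examples have been unzipped as described above, the following invocation (using the cart-pole controller and the preset recommended for CPS benchmarks) should work:
$ dtcontrol --input examples/cps/cartpole.scs --use-preset mlentropy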
To reproduce Figure 9 of the appendix, one may follow these steps:
1. Run dtcontrol-frontend to start the GUI and navigate to http://127.0.0.1:5000 in the browser.
2. Add the examples/cps/10rooms.scs controller along with its metadata file.
3. Select the mlentropy preset and run the experiment.
4. Open the resulting tree via the view button, select the x <= 20.625 predicate in the tree by clicking on it, and press the “Start interactive tree builder …” button in the left sidebar.
5. Enter the predicate x_0 + 3*x_1 <= c_0 and wait until the Instantiated Predicate table reloads.
In case of problems with the evaluation scripts, you may try deleting the *.json files in the .benchmark_suite hidden folder inside each of the examples folders.
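For example, a command in the same style as the unzip loop above removes these cached files (a sketch; double-check the matched paths before deleting anything):
$ find examples -type d -name ".benchmark_suite" | while read dirname; do rm -f "$dirname"/*.json; done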
In case the dtControl CLI is not working properly, you can clean up the cache by deleting the *.json files in the folder from which dtcontrol was run, as well as the .benchmark_suite hidden folder inside each of the examples folders.
In case the dtControl GUI is malfunctioning, you may try restarting it: press Ctrl+C in the terminal to kill dtcontrol-frontend, then start it again.
More information on dtControl and its functionality can be found in the user and developer manuals at dtcontrol.readthedocs.io.
dtControl is developed as open-source software, with the source available on the LRZ GitLab.