MÆSTRO with a Monte Carlo simulator
Overview
In this tutorial, we describe how to run the MÆSTRO algorithm over a Monte Carlo simulator to generate optimal parameter tunes. We do this using the following steps:
Test the install
Set the algorithm parameters
Set the configuration inputs
Run MÆSTRO
Understand the output
Getting started
To install MÆSTRO, execute the following commands:
git clone git@github.com:HEPonHPC/maestro.git
cd maestro
pip install .
Then, test the installation as described in the test installation documentation.
Setting the algorithm parameters
Next, we need to select the algorithm parameters. More details about the
parameters expected, their data types, and examples can be found in the
algorithm parameters documentation.
Below is an example JSON file with algorithm parameters for a minified problem
with a small subset of observables generated using the Pythia 8 Monte Carlo event generator.
The file is located at parameter_config_backup/miniapp/algoparams.json.
parameter_config_backup/miniapp/algoparams.json
{
  "tr": {
    "radius": 2.5,
    "max_radius": 3,
    "min_radius": 1e-5,
    "center": [
      1.3006076218684384,
      1.820541,
      0.28972391035308026
    ],
    "mu": 0.01,
    "eta": 0.01
  },
  "param_names": [ "x", "y", "z" ],
  "param_bounds": [ [0,2], [0.2,2], [0,1] ],
  "kappa": 100,
  "max_fidelity": 100000,
  "usefixedfidelity": false,
  "N_p": 6,
  "dim": 3,
  "theta": 0.01,
  "thetaprime": 0.0001,
  "fidelity": 1000,
  "max_iteration": 50,
  "max_fidelity_iteration": 5,
  "min_gradient_norm": 0.00001,
  "max_simulation_budget": 100000000000000000,
  "output_level": 10
}
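Before launching a run, it can help to sanity-check the parameter file. The following is a minimal sketch and not part of MÆSTRO: the helper name check_algoparams and the checks themselves are assumptions inferred from the key names in the example above (that param_names, param_bounds, and the trust-region center each have dim entries, that the radius lies between min_radius and max_radius, and that the center lies inside the bounds).

```python
import json

# Abbreviated copy of the example file; in practice, use json.load on the file.
ALGOPARAMS_JSON = """
{
  "tr": {"radius": 2.5, "max_radius": 3, "min_radius": 1e-5,
         "center": [1.3006076218684384, 1.820541, 0.28972391035308026]},
  "param_names": ["x", "y", "z"],
  "param_bounds": [[0, 2], [0.2, 2], [0, 1]],
  "dim": 3
}
"""


def check_algoparams(params):
    """Return a list of problems found (empty if none). Hypothetical helper."""
    problems = []
    dim = params["dim"]
    tr = params["tr"]

    # Every per-parameter array is assumed to have exactly `dim` entries.
    for key, value in (
        ("param_names", params["param_names"]),
        ("param_bounds", params["param_bounds"]),
        ("tr.center", tr["center"]),
    ):
        if len(value) != dim:
            problems.append(f"{key} has {len(value)} entries, expected {dim}")

    # The initial radius is assumed to lie between the two radius limits.
    if not tr["min_radius"] <= tr["radius"] <= tr["max_radius"]:
        problems.append("tr.radius is outside [min_radius, max_radius]")

    # The start point is assumed to lie inside the parameter bounds.
    for name, (low, high), c in zip(
        params["param_names"], params["param_bounds"], tr["center"]
    ):
        if not low <= c <= high:
            problems.append(f"center for {name} is outside [{low}, {high}]")
    return problems


params = json.loads(ALGOPARAMS_JSON)
print(check_algoparams(params))
```

An empty list means that none of the assumed consistency checks failed; MÆSTRO itself may accept or reject inputs according to different rules.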
Setting up the Monte Carlo simulator
The next step is to set up the Monte Carlo simulator. The simulator can be run using a function call, by executing a script, or in a decaf - henson workflow.
Setting up the Monte Carlo simulator using a function call
To run the Monte Carlo simulator using a function call, write a class that is
inherited from the MC task base class MCTask. In this class, you first define
the MC call function as run_mc(self):. Then, define the other inherited but abstract
functions of MCTask in your own class and override any functions defined in MCTask.
More information about MCTask is provided in the
MC Task description. Finally, you set your class along with the
relevant parameters in the mc object configuration.
As an example, the MC call function for
miniapp within maestro/mc/miniapp.py is shown below.
import os
from subprocess import PIPE, Popen

import numpy as np
from mpi4py import MPI  # assumed source of comm and rank

from maestro.mc import MCTask  # assumed import path

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# MiniApp should inherit MCTask
class MiniApp(MCTask):
    def run_mc(self):
        # In this tutorial, we demonstrate how to run miniapp MC in serial. If you
        # want to run miniapp MC in parallel, see the run_mc function
        # in maestro/mc/miniapp.py

        # Get the list of parameter directories (defined in superclass MCTask)
        dirlist = self.get_param_directory_array(self.mc_run_folder)
        for dno, d in enumerate(dirlist):
            # Get the parameter from the directory (defined in superclass MCTask)
            param = self.get_param_from_directory(d)
            # Get the fidelity from the directory (defined in superclass MCTask)
            run_fidelity = self.get_fidelity_from_directory(d)

            if run_fidelity != 0:
                # Set the output file path
                outfile = os.path.join(d, "out_curr{}.yoda".format(rank))
                # Execute the miniapp MC command.
                # mc_location is defined in the mc object configuration
                # (see the mc_location key in the mc object configuration JSON below)
                p = Popen(
                    [
                        self.mc_parmeters['mc_location'],
                        str(param[0]), str(param[1]), str(param[2]),
                        str(run_fidelity), str(np.random.randint(1, 9999999)),
                        "0", "1", outfile
                    ],
                    stdin=PIPE, stdout=PIPE, stderr=PIPE)
                p.communicate(b"input data that is passed to subprocess' stdin")
        # Wait for all ranks before returning
        comm.barrier()
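To see exactly what the Popen call in run_mc hands to the executable, the argument list can be assembled on its own. The following is a minimal sketch under stated assumptions: build_miniapp_command is a hypothetical helper, the executable path and parameter values are placeholders, and the two literal arguments "0" and "1" are passed through unchanged because the tutorial does not explain them.

```python
import os


def build_miniapp_command(mc_location, param, run_fidelity, seed, outfile):
    """Assemble the argument list for the miniapp executable (sketch)."""
    return [
        mc_location,
        *(str(p) for p in param),  # one command-line argument per parameter
        str(run_fidelity),         # fidelity read from the run directory
        str(seed),                 # random seed for this run
        "0", "1",                  # literal arguments copied from the tutorial
        outfile,                   # output file path
    ]


cmd = build_miniapp_command(
    mc_location="/path/to/miniapp",  # hypothetical location
    param=[1.3, 1.8, 0.29],
    run_fidelity=1000,
    seed=12345,
    outfile=os.path.join("log", "MC_RUN", "out_curr0.yoda"),
)
print(cmd)
```

Building the list separately makes it easy to log or inspect the command before it is executed; every entry must be a string, which is why the numeric values are converted with str.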
To select this MC call function as the one to run within the MC task, define the mc object configuration as shown below:
"mc":{
    "caller_type":"function call",
    "class_str":"MiniApp",
    "parameters":{
        "mc_location":"<location of miniapp MC executable>"
    }
}
In this mc object configuration, set caller_type to function call and
class_str to the name of the class defined above, MiniApp. Also, add all the parameters
that need to be sent to the MC task within parameters.
Setting up the Monte Carlo simulator by executing a script
To run the Monte Carlo simulator using a script call, a helper script is provided that will interleave
the calls to the optimization task and the MC task until the end of the
MÆSTRO algorithm. The MC task can be a script that calls the run_mc
function described in the subsection above, or the MC task can directly call an MC executable.
These two approaches are described in detail below.
Calling the MC task with a script that calls the run_mc function
First, create an enclosing script that calls the run_mc function. An example script
for miniapp that calls the run_mc function described above (see maestro/mc/bin/miniapp.py)
is shown below.
import argparse
import json

from maestro.mc import MiniApp

if __name__ == "__main__":
    # Parse the MC directory and the configuration file location
    parser = argparse.ArgumentParser(description='Run miniapp')
    parser.add_argument("-d", dest="MCDIR", type=str, default="log/MC_RUN",
                        help="MC directory")
    parser.add_argument("-c", dest="CONFIG", type=str, default=None,
                        help="Config file location")
    args = parser.parse_args()

    # Read the parameters of the mc object from the configuration file
    with open(args.CONFIG, 'r') as f:
        ds = json.load(f)
    mc_parameters = ds['mc']['parameters']

    # Create the MC task and run the MC
    mctask = MiniApp(args.MCDIR, mc_parameters)
    mctask.run_mc()
Next, set the appropriate mc configuration object for the script run:
"mc":{
    "caller_type":"script run",
    "class_str":"MiniApp",
    "commands":[
        "<location of enclosing script> <location of MC directory> <location of config file>"
    ],
    "parameters":{

    }
}
In the mc configuration object, set caller_type to script run and
class_str to the name of your MC task class, e.g., MiniApp. Also, add all the parameters
that need to be sent to the MC task within parameters. Finally, add the
enclosing script call command within the commands array. This command will be used by
the interleaving helper script to call the MC task.
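How the helper script actually launches the entries of the commands array is not shown in this tutorial. As a rough illustration only, the sketch below runs each command string in order and stops at the first failure; run_mc_commands is a hypothetical function, and the demo command is a stand-in that simply prints a message.

```python
import shlex
import subprocess
import sys


def run_mc_commands(commands):
    """Run each command string in order; stop at the first failure (sketch)."""
    return_codes = []
    for command in commands:
        # Split the string into an argument list without invoking a shell.
        completed = subprocess.run(shlex.split(command))
        return_codes.append(completed.returncode)
        if completed.returncode != 0:
            break
    return return_codes


# Stand-in for "<location of enclosing script> <MC directory> <config file>".
demo_commands = [f"{shlex.quote(sys.executable)} -c \"print('MC task ran')\""]
print(run_mc_commands(demo_commands))
```

Splitting with shlex keeps quoted arguments together, so paths that contain spaces should be quoted inside the command string.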
Calling the MC task by running the MC executable command
To call the MC task by running the MC executable command directly, set the mc configuration object for script run as shown below.
"mc":{
    "caller_type":"script run",
    "class_str":"MiniApp",
    "commands":[
        "<location of MC executable> <arguments to the MC executable>"
    ],
    "parameters":{

    }
}
An example mc configuration object for this kind of MC task can be found in
parameter_config_backup/a14app/config.json.
Setting up the Monte Carlo simulator in a decaf - henson workflow
To run the Monte Carlo simulator within the decaf - henson workflow, a JSON object
with the task commands needs to be defined. As an example, such a JSON object for
miniapp within workflow/miniapp/decaf-henson.json is shown below.
{
    "workflow": {
        "filter_level": "NONE",
        "nodes": [
            {
                "start_proc": 0,
                "nprocs": "<number of ranks>",
                "cmdline": "<project location>/maestro/optimization-task.py -a <project location>/parameter_config_backup/miniapp/algoparams.json -c <project location>/parameter_config_backup/miniapp/config.json -d <working directory location>",
                "func": "opt_task_py",
                "inports": [],
                "outports": []
            },
            {
                "start_proc": 0,
                "nprocs": "<number of ranks>",
                "cmdline": "<MC task command>",
                "func": "mc_task_py",
                "inports": [],
                "outports": []
            }
        ],
        "edges": [
        ]
    }
}
In the JSON object above, <MC task command> is either the script that calls
the run_mc function or the MC executable command, as shown in the commands
array in setting MC simulator by executing a script.
Also, <number of ranks> is the integer number of ranks to use to run the
optimization task and the MC task, <project location> is the location of the MÆSTRO project,
and <working directory location> is the location of the working directory for this run.
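Filling in the placeholders by hand is error-prone, so the workflow object can also be generated. The sketch below is one possible way to do this and is not part of MÆSTRO: make_workflow is a hypothetical helper, and the paths and the MC command passed to it are placeholders.

```python
import json


def make_workflow(nprocs, project, workdir, mc_command):
    """Build the decaf - henson workflow object for miniapp (sketch)."""
    opt_cmdline = " ".join([
        f"{project}/maestro/optimization-task.py",
        f"-a {project}/parameter_config_backup/miniapp/algoparams.json",
        f"-c {project}/parameter_config_backup/miniapp/config.json",
        f"-d {workdir}",
    ])

    def node(cmdline, func):
        # Both tasks start at rank 0 and use the same number of ranks.
        return {
            "start_proc": 0,
            "nprocs": nprocs,
            "cmdline": cmdline,
            "func": func,
            "inports": [],
            "outports": [],
        }

    return {
        "workflow": {
            "filter_level": "NONE",
            "nodes": [
                node(opt_cmdline, "opt_task_py"),
                node(mc_command, "mc_task_py"),
            ],
            "edges": [],
        }
    }


workflow = make_workflow(
    nprocs=4,
    project="/path/to/project",        # hypothetical project location
    workdir="/path/to/workdir",        # hypothetical working directory
    mc_command="/path/to/mc-command",  # hypothetical MC task command
)
print(json.dumps(workflow, indent=2))
```

Serializing with json.dumps guarantees that each cmdline value ends up as a single valid JSON string.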
To call the MC task as a task of the workflow, set the mc configuration object for miniapp as shown below.
"mc":{
    "caller_type":"workflow",
    "class_str":"MiniApp",
    "parameters":{

    }
}
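The three caller_type values used in this section (function call, script run, and workflow) come with slightly different keys. The sketch below checks an mc configuration object against what the examples above show; check_mc_config is a hypothetical helper, and the rule that script run needs a non-empty commands array is inferred from the examples rather than taken from the MÆSTRO code.

```python
VALID_CALLER_TYPES = ("function call", "script run", "workflow")


def check_mc_config(mc):
    """Validate an mc configuration object and return its caller_type (sketch)."""
    caller_type = mc.get("caller_type")
    if caller_type not in VALID_CALLER_TYPES:
        raise ValueError(f"caller_type must be one of {VALID_CALLER_TYPES}")
    if "class_str" not in mc:
        raise ValueError("class_str is missing")
    # Inferred from the examples: a script run lists the commands to execute.
    if caller_type == "script run" and not mc.get("commands"):
        raise ValueError("script run needs a non-empty commands array")
    return caller_type


function_call = {
    "caller_type": "function call",
    "class_str": "MiniApp",
    "parameters": {"mc_location": "/path/to/miniapp"},  # hypothetical path
}
workflow_mc = {"caller_type": "workflow", "class_str": "MiniApp", "parameters": {}}

print(check_mc_config(function_call))
print(check_mc_config(workflow_mc))
```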
Selecting a surrogate model function
It is possible to select a predefined function or to create your own function in
maestro/model.py to construct surrogate models.
Detailed instructions for selecting the appropriate function can be found in:
reuse a predefined model function
create your own model function
For this tutorial, we will construct the surrogate model using the appr_pa_m_construct function with the following model object configuration:
"model":{
    "function_str":{
        "MC":"appr_pa_m_construct",
        "DMC":"appr_pa_m_construct"
    },
    "parameters":{
        "MC":{"m":2},
        "DMC":{"m":1}
    }
}
Selecting the function structure
It is possible to select a predefined function or to create your own function in
maestro/fstructure.py to get a f_structure object.
Detailed instructions for selecting the appropriate function can be found in:
reuse a predefined f_structure object function
create your own f_structure object function
For this tutorial, we will get the f_structure object using the appr_tuning_objective function with the following f_structure object configuration:
"f_structure":{
    "parameters":{
        "optimization":{
            "nstart":5,
            "nrestart":10,
            "saddle_point_check":false,
            "minimize":true,
            "use_mpi":true
        }
    },
    "function_str":"appr_tuning_objective"
}
Note that if the data and weights keys are not specified in the parameters object
of the f_structure configuration, then a data value of [1,0] and a weight of 1 are
assumed for each term of appr_tuning_objective.
If you want to specify your own data and weights, then assign the complete paths of the
data and weights files to the data and weights keys, respectively, in
the parameters object of the f_structure configuration.
Example data and weights files for this tutorial can be found in
parameter_config_backup/miniapp/data.json and parameter_config_backup/miniapp/weights,
respectively.
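As a small illustration of the last point, the sketch below adds the data and weights keys to the parameters object of the f_structure configuration. The helper with_data_and_weights is hypothetical, and converting the paths with os.path.abspath is just one way to obtain complete paths; it resolves them relative to the current working directory.

```python
import copy
import os

f_structure = {
    "parameters": {
        "optimization": {
            "nstart": 5,
            "nrestart": 10,
            "saddle_point_check": False,
            "minimize": True,
            "use_mpi": True,
        }
    },
    "function_str": "appr_tuning_objective",
}


def with_data_and_weights(config, data_path, weights_path):
    """Return a copy of config whose parameters point to the given files."""
    updated = copy.deepcopy(config)  # leave the original object untouched
    updated["parameters"]["data"] = os.path.abspath(data_path)
    updated["parameters"]["weights"] = os.path.abspath(weights_path)
    return updated


updated = with_data_and_weights(
    f_structure,
    "parameter_config_backup/miniapp/data.json",
    "parameter_config_backup/miniapp/weights",
)
print(updated["parameters"]["data"])
print(updated["parameters"]["weights"])
```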
Setting the configuration inputs
The configuration input consists of the objects from the last three steps. So the configuration input for this tutorial is:
{
    "mc":"appropriate mc configuration object depending on whether the caller_type is function call, script run, or workflow",
    "model":{
        "function_str":{
            "MC":"appr_pa_m_construct",
            "DMC":"appr_pa_m_construct"
        },
        "parameters":{
            "MC":{"m":2},
            "DMC":{"m":1}
        }
    },
    "f_structure":{
        "parameters":{
            "optimization":{
                "nstart":5,
                "nrestart":10,
                "saddle_point_check":false,
                "minimize":true,
                "use_mpi":true
            }
        },
        "function_str":"appr_tuning_objective"
    }
}
More information about the keys expected, their definitions, their data types, and examples can be found in the configuration input documentation.
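The three objects can also be combined and written to a file programmatically. The following is a minimal sketch: the mc object uses the function call variant with a placeholder executable path, and the file is written to a temporary directory so that the example does not touch the project tree.

```python
import json
import os
import tempfile

mc = {
    "caller_type": "function call",
    "class_str": "MiniApp",
    "parameters": {"mc_location": "/path/to/miniapp"},  # hypothetical path
}
model = {
    "function_str": {"MC": "appr_pa_m_construct", "DMC": "appr_pa_m_construct"},
    "parameters": {"MC": {"m": 2}, "DMC": {"m": 1}},
}
f_structure = {
    "parameters": {
        "optimization": {
            "nstart": 5,
            "nrestart": 10,
            "saddle_point_check": False,
            "minimize": True,
            "use_mpi": True,
        }
    },
    "function_str": "appr_tuning_objective",
}

# The configuration input is the three objects under their own keys.
config = {"mc": mc, "model": model, "f_structure": f_structure}

config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Read the file back to confirm that it is valid JSON.
with open(config_path) as f:
    reloaded = json.load(f)
print(sorted(reloaded))
```

The path stored in config_path is what would be passed as the configuration input JSON location when running MÆSTRO.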
Running MÆSTRO on your problem
Here, we will assume that the dependencies and apprentice are installed correctly as described in the initial installation test. Then, we install the MÆSTRO code by typing the following commands:
cd maestro
pip install .
Then, depending on the caller_type used, try the MÆSTRO algorithm on miniapp
using the commands below.
When caller_type is function call
optimization-task
-a <algorithm_parameters_JSON_location>
-c <configuration_input_JSON_location>
-d ../log/workflow/miniapp/<working_dir_name>
Here, replace <algorithm_parameters_JSON_location> and <configuration_input_JSON_location>
with the correct location and assign an appropriate name in <working_dir_name>.
When caller_type is script run
maestro-run
-a <algorithm_parameters_JSON_location>
-c <configuration_input_JSON_location>
-f <parameter_config_backup_location with data, weights, and other settings
e.g., parameter_config_backup/miniapp>
-d ../log/workflow/miniapp/<working_dir_name>
-h <optional hostfile location>
-n <total number of ranks to use (integer)>
Here, replace <algorithm_parameters_JSON_location> and <configuration_input_JSON_location>
with the correct location and assign an appropriate name in <working_dir_name>.
The optional hostfile contains a list of nodes and the number of ranks to use on each node.
The total number of ranks is the number of ranks to use as numProcs in the mpirun calls of the
interleaving optimization and MC tasks.
If a hostfile is specified, the total number of ranks to use should be the sum of
all the ranks used across all nodes.
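The value to pass with -n can be computed from the hostfile. The sketch below assumes an Open MPI style hostfile in which each line names a node and may carry a slots=N entry; hostfile syntax differs between MPI implementations, so this is an assumption and not something the tutorial specifies. The node names are made up, and counting a line without slots= as one rank is a simplification of this sketch.

```python
def total_ranks(hostfile_text):
    """Sum the slots over all nodes of an Open MPI style hostfile (sketch)."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        slots = 1  # simplification: count a node without slots= as one rank
        for token in line.split()[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        total += slots
    return total


example_hostfile = """
# hypothetical node names
node01 slots=4
node02 slots=4
"""
print(total_ranks(example_hostfile))
```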
When caller_type is workflow
cd <location of decaf-henson JSON file>
mpirun -np <number of ranks to use (integer)>
<location of decaf-henson_python executable>/decaf-henson_python
The number of ranks to use should be equal to or greater than the value set in the nprocs
key in the decaf-henson JSON file, as shown in the section on
setting MC simulator in decaf-henson workflow.
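This requirement can be checked before calling mpirun. The sketch below takes the largest nprocs value over all nodes of the workflow object as the minimum; min_ranks_required is a hypothetical helper, and the abbreviated workflow dictionary stands in for the contents of the decaf-henson JSON file.

```python
def min_ranks_required(workflow):
    """Return the largest nprocs value over all workflow nodes (sketch)."""
    return max(int(node["nprocs"]) for node in workflow["workflow"]["nodes"])


# Abbreviated stand-in for the decaf-henson JSON file with nprocs filled in.
example_workflow = {
    "workflow": {
        "nodes": [
            {"start_proc": 0, "nprocs": 4, "func": "opt_task_py"},
            {"start_proc": 0, "nprocs": 4, "func": "mc_task_py"},
        ]
    }
}

requested_ranks = 4  # value that would be passed to mpirun -np
required = min_ranks_required(example_workflow)
if requested_ranks < required:
    raise SystemExit(f"need at least {required} ranks, got {requested_ranks}")
print(required)
```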
To run this command with a hostfile:
cd <location of decaf-henson JSON file>
mpirun -hostfile <hostfile location> -np <number of ranks to use (integer)>
<location of decaf-henson_python executable>/decaf-henson_python
The hostfile contains a list of nodes and the number of ranks to use on each node. Also, the number of ranks to use should be the sum of all the ranks used across all nodes.
Understanding the output
If everything runs as expected, then since \(output\_level \ge 10\) in the algorithm parameter input, the output should contain a one-line summary of each iteration of the MÆSTRO algorithm run, as described in the one line output documentation.