Surrogate Models and Function Structures

Surrogate Models

The surrogate models in MÆSTRO are provided by the functions in maestro.ModelConstruction, which are defined in maestro/model.py.

Available surrogate model functions

At this time, the following functions are available in maestro.ModelConstruction:

def appr_pa_m_construct:
def appr_ra_m_n_construct:
def appr_ra_m_1_construct:

All of these functions use apprentice to construct polynomial or rational approximation models.

appr_pa_m_construct

This function constructs a polynomial approximation model of order m. Use the following model object in the configuration input, where MC is the name of the data generated by the Monte Carlo simulator:

"model":{
  "function_str":{
    "MC":"appr_pa_m_construct"
  },
  "parameters":{
    "MC":{"m":2}
  }
}
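
To illustrate what a polynomial approximation of order m means, here is a minimal one-dimensional sketch in plain Python. It is not apprentice's implementation: the helper names (solve_linear_system, fit_polynomial, eval_polynomial) are hypothetical, and the least-squares fit via normal equations is an assumption chosen for brevity.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_polynomial(xs, ys, m):
    """Least-squares coefficients c_0..c_m of sum_k c_k * x**k."""
    basis = [[x ** k for k in range(m + 1)] for x in xs]
    # Normal equations: (B^T B) c = B^T y
    BtB = [[sum(row[i] * row[j] for row in basis) for j in range(m + 1)]
           for i in range(m + 1)]
    Bty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(m + 1)]
    return solve_linear_system(BtB, Bty)


def eval_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Samples of 1 + 2x + 3x^2; an order-2 fit should recover the coefficients.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, m=2)
```

In this sketch the order m plays the same role as the "m" entry under "parameters" in the model object above.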

appr_ra_m_n_construct

This function constructs a rational approximation model of numerator order m and denominator order n. Use the following model object in the configuration input, where MC is the name of the data generated by the Monte Carlo simulator:

"model":{
  "function_str":{
    "MC":"appr_ra_m_n_construct"
  },
  "parameters":{
    "MC":{"m":2,"n":2}
  }
}
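
As a rough illustration of a rational approximation with numerator order m and denominator order n, the following one-dimensional sketch fits p(x)/q(x) by a linearized least-squares problem with the constant term of q fixed to 1. This is an assumption made for illustration and not apprentice's algorithm; all helper names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rational(xs, ys, m, n):
    """Fit (sum_k a_k x^k) / (1 + sum_j b_j x^j) by linearized least squares."""
    # Each sample gives one linear equation in a_0..a_m and b_1..b_n:
    #   sum_k a_k x^k - y * sum_j b_j x^j = y
    rows = [[x ** k for k in range(m + 1)] + [-y * x ** j for j in range(1, n + 1)]
            for x, y in zip(xs, ys)]
    size = m + 1 + n
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    Aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    sol = solve_linear_system(AtA, Aty)
    return sol[:m + 1], [1.0] + sol[m + 1:]


def eval_rational(num, den, x):
    """Evaluate numerator / denominator at x."""
    p = sum(a * x ** k for k, a in enumerate(num))
    q = sum(b * x ** j for j, b in enumerate(den))
    return p / q


# Samples of (1 + 2x) / (1 + 0.5x); a fit with m=1, n=1 should recover it.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [(1.0 + 2.0 * x) / (1.0 + 0.5 * x) for x in xs]
num, den = fit_rational(xs, ys, m=1, n=1)
```

The linearized form is simple but does nothing to keep the denominator away from zero inside the parameter domain, which a production implementation would need to address.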

appr_ra_m_1_construct

This function constructs a rational approximation model of numerator order m and denominator order 1. Use the following model object in the configuration input, where MC is the name of the data generated by the Monte Carlo simulator:

"model":{
  "function_str":{
    "MC":"appr_ra_m_1_construct"
  },
  "parameters":{
    "MC":{"m":2,"n":1}
  }
}

Creating your own surrogate model function

To create your own surrogate model function, start from the template below; its inline comments explain the individual steps:

def my_appx_construct(self,data_name):
  """
  In maestro/model.py, create a function with two arguments: self and data_name.
  data_name is the name of the data generated by the Monte Carlo simulator; it
  is passed in by self.consturct_models (maestro.ModelConstruction.consturct_models).
  The simulator data is contained in self.mc_data_df, a pandas data frame with
  the following structure:
                        MC                          ...
  term1.P        [[1., 2.],[4., 8.],[12.,9],...]
  term1.V        [19., 18., 17.,...]                ...
  term2.P        [[1., 2.],[4., 8.],[12.,9],...]
  term2.V        [29., 28., 27.,...]
  ...            ...                                ...

  """
  app = {}
  appscaled = {}
  columnnames = list(self.mc_data_df.index)

  import apprentice
  Sclocal = apprentice.Scaler(self.mc_data_df[data_name]['{}'.format(columnnames[0])],
                             pnames=self.state.param_names)
  self.state.set_tr_center_scaled(Sclocal.scale(self.state.tr_center).tolist())
  self.state.set_scaled_min_max_parameter_bounds(Sclocal.box_scaled[:,0].tolist(),Sclocal.box_scaled[:,1].tolist())

  # For each term e.g., term1, term2, ...
  for cnum in range(0,len(columnnames),2):
     X = self.mc_data_df[data_name]['{}'.format(columnnames[cnum])]
     Xscaled = [Sclocal.scale(x) for x in X]
     Y = self.mc_data_df[data_name]['{}'.format(columnnames[cnum+1])]
     model_parameters = self.state.model_parameters[data_name]
     """
     CONSTRUCT MODELS
        This is where your surrogate model construction code should be called, i.e.,
        Use X, Y and model_parameters to construct surrogate models for
        unscaled data and store in unscaled_model_out <any>
        Use Xscaled, Y and model_parameters to construct surrogate models
        for scaled data and store in scaled_model_out <any>
     """

     # Save the surrogate models

     scaled_val_out_file = self.state.working_directory.get_log_path(
          "{}_model_scaled_k{}.<ext>".format(data_name,self.state.k))
     """
     STORE scaled_model_out into scaled_val_out_file
     """
     self.state.update_f_structure_model_parameters('model_scaled',{data_name:scaled_val_out_file})

     unscaled_val_out_file = self.state.working_directory.get_log_path(
          "{}_model_unscaled_k{}.<ext>".format(data_name,self.state.k))
     """
     STORE unscaled_model_out into unscaled_val_out_file
     """
     self.state.update_f_structure_model_parameters('model',{data_name:unscaled_val_out_file})
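
As one possible way to fill in the CONSTRUCT MODELS and STORE placeholders of the template above, the following standalone sketch builds a deliberately simple nearest-neighbour "surrogate" and stores it as JSON. The helper names, the nearest-neighbour model, and the .json extension are assumptions for illustration only; the sample values are taken from the term1.P and term1.V rows shown in the docstring.

```python
import json
import math
import os
import tempfile


def construct_nearest_neighbour_model(X, Y):
    """Placeholder surrogate: keep the samples, predict from the nearest one."""
    return {"X": [[float(v) for v in x] for x in X], "Y": [float(y) for y in Y]}


def store_model(model, path):
    """STORE step: write the model to path as JSON and return the path."""
    with open(path, "w") as handle:
        json.dump(model, handle)
    return path


def predict(model, p):
    """Return the Y value of the stored sample closest to parameter p."""
    distances = [math.dist(p, x) for x in model["X"]]
    return model["Y"][distances.index(min(distances))]


# Parameter points (term1.P) and simulator values (term1.V).
X = [[1.0, 2.0], [4.0, 8.0], [12.0, 9.0]]
Y = [19.0, 18.0, 17.0]

unscaled_model_out = construct_nearest_neighbour_model(X, Y)
out_file = store_model(
    unscaled_model_out,
    os.path.join(tempfile.mkdtemp(), "MC_model_unscaled_k0.json"),
)

# Reload the stored model, as a later stage reading the file would.
with open(out_file) as handle:
    reloaded = json.load(handle)
```

Inside the real template, the path would come from self.state.working_directory.get_log_path rather than from tempfile, and the same two steps would be repeated with Xscaled for the scaled model.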

Note that you need to replace the CONSTRUCT MODELS and STORE sections in the code above to complete the model construction function. Install the code by typing the following commands:

cd maestro
pip install .

Then use the following model object in the configuration input, where MC is the name of the data generated by the Monte Carlo simulator:

"model":{
  "function_str":{
    "MC":"my_appx_construct"
  },
  "parameters":{
    "MC":{"key-value pairs required as model_parameter in this model function"}
  }
}

If you want to make your model function publicly available with MÆSTRO, consider submitting a pull request.

Function Structure

The f_structure functions in MÆSTRO are provided by the functions in maestro.Fstructure, which are defined in maestro/fstructure.py.

Available f_structure functions

At this time, the following functions are available in maestro.Fstructure:

def appr_tuning_objective:
def appr_tuning_objective_without_error_vals:

All of these functions use apprentice to construct f_structure function objects.

appr_tuning_objective

The objective function in this object calculates the least-squares objective using the error values generated by the simulator. Specifically, the objective function in this object is:

\[L_2(p) = \sum_{t=0}^{N_t} w_t \frac{ (M_t(p)-D_t)^2 }{\widetilde{M_t}(p)^2 + \widetilde{D_t}^2}\]

where

  • \(N_t\): number of terms, e.g., term1, term2, …

  • \(w_t\): weight for term t

  • \(M_t(p)\): surrogate model of mean value or the MC mean value for term t evaluated at parameter value p

  • \(D_t\): data (mean) value for term t

  • \(\widetilde{M_t}(p)\): surrogate model of error value or the MC error value for term t evaluated at parameter value p

  • \(\widetilde{D_t}\): data error for term t
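
The formula above can be read as the following minimal Python sketch. The function name tuning_objective and the dictionary layout of each term are hypothetical; the surrogate models are stood in for by plain lambdas.

```python
def tuning_objective(p, terms):
    """Weighted least-squares objective with model and data errors."""
    total = 0.0
    for term in terms:
        residual = term["model"](p) - term["data"]
        # Denominator: squared model error plus squared data error.
        variance = term["model_error"](p) ** 2 + term["data_error"] ** 2
        total += term["weight"] * residual ** 2 / variance
    return total


terms = [
    {"weight": 1.0, "data": 1.0, "data_error": 1.0,
     "model": lambda p: 2.0 * p[0], "model_error": lambda p: 0.0},
    {"weight": 2.0, "data": 0.0, "data_error": 1.0,
     "model": lambda p: p[0] + p[1], "model_error": lambda p: 1.0},
]

# Term 1: 1 * (2 - 1)^2 / (0 + 1) = 1; term 2: 2 * (2 - 0)^2 / (1 + 1) = 4.
value = tuning_objective([1.0, 1.0], terms)  # 5.0
```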

Use the following f_structure object in the configuration input:

"f_structure":{
  "parameters":{
    "data":"<Path of the data file, see below>",
    "weights":"<Path of the weight file, see below>",
    "optimization":{
      "nstart":5,
      "nrestart":10,
      "saddle_point_check":false,
      "minimize":true,
      "use_mpi":true
    }
  },
  "function_str":"appr_tuning_objective"
}

Data File

The data file is a JSON file whose keys are the term names and whose values are arrays [\(D_t,\widetilde{D_t}\)] for the corresponding term \(t\). If the key data is not specified in the f_structure object, then \(D_t=0\) and \(\widetilde{D_t}=1\) are assumed for each term \(t\). An example data file is given below:

{
    "Term1": [
            0.0,
            1.0
    ],
    "Term2": [
            0.0,
            1.0
    ],
    "Term3": [
            0.0,
            1.0
    ]
}
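
A data file in this layout could be read as in the sketch below. The function name load_term_data is hypothetical, and the handling of a term that is missing from the file (a KeyError) is an assumption, since only the case of an absent data key is described here.

```python
import json
import os
import tempfile


def load_term_data(term_names, data_path=None):
    """Return {term: (D_t, data error)} for every term name."""
    if data_path is None:
        # No "data" key in the f_structure object: D_t = 0 and error = 1.
        return {term: (0.0, 1.0) for term in term_names}
    with open(data_path) as handle:
        raw = json.load(handle)
    # Assumption: every term must appear in the file, else KeyError.
    return {term: (float(raw[term][0]), float(raw[term][1]))
            for term in term_names}


# Write a small data file and read it back.
example = {"Term1": [0.5, 0.1], "Term2": [2.0, 0.4]}
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as handle:
    json.dump(example, handle)

from_file = load_term_data(["Term1", "Term2"], path)
defaults = load_term_data(["Term1", "Term2"])
```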

Weight File

The weight file is a tab-delimited file in which the first column contains the term names and the second column contains the weight \(w_t\) of the corresponding term \(t\). If the key weights is not specified in the f_structure object, then \(w_t=1\) is assumed for each term \(t\). An example weight file is given below:

Term1 1.0
Term2 1.0
Term3 1.0
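
A tab-delimited weight file of this form could be parsed with the standard csv module, as in the sketch below. The function name load_weights is hypothetical, and the sketch assumes exactly two columns per non-empty line.

```python
import csv
import os
import tempfile


def load_weights(term_names, weights_path=None):
    """Return {term: w_t}; every weight is 1 when no file is given."""
    if weights_path is None:
        return {term: 1.0 for term in term_names}
    with open(weights_path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    # Assumption: each row is exactly (term name, weight).
    return {name: float(weight) for name, weight in rows}


# Write a small weight file and read it back.
path = os.path.join(tempfile.mkdtemp(), "weights.txt")
with open(path, "w") as handle:
    handle.write("Term1\t1.0\nTerm2\t0.5\nTerm3\t2.0\n")

weights = load_weights(["Term1", "Term2", "Term3"], path)
default_weights = load_weights(["Term1", "Term2", "Term3"])
```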

appr_tuning_objective_without_error_vals

The objective function in this object calculates the least-squares objective without the error values generated by the simulator. Specifically, the objective function in this object is:

\[L_2(p) = \sum_{t=0}^{N_t} w_t \frac{ (M_t(p)-D_t)^2 }{\widetilde{D_t}^2}\]

where

  • \(N_t\): number of terms, e.g., term1, term2, …

  • \(w_t\): weight for term t

  • \(M_t(p)\): surrogate model of mean value or the MC mean value for term t evaluated at parameter value p

  • \(D_t\): data (mean) value for term t

  • \(\widetilde{D_t}\): data error for term t
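
In code, this variant divides each squared residual by the squared data error only. The sketch below uses a hypothetical function name and term layout, with lambdas standing in for the surrogate models.

```python
def tuning_objective_without_error_vals(p, terms):
    """Weighted least-squares objective that uses only the data errors."""
    return sum(
        t["weight"] * (t["model"](p) - t["data"]) ** 2 / t["data_error"] ** 2
        for t in terms
    )


terms = [
    {"weight": 1.0, "data": 1.0, "data_error": 2.0,
     "model": lambda p: 2.0 * p[0]},
    {"weight": 2.0, "data": 0.0, "data_error": 1.0,
     "model": lambda p: p[0] + p[1]},
]

# Term 1: 1 * (2 - 1)^2 / 2^2 = 0.25; term 2: 2 * (2 - 0)^2 / 1^2 = 8.
value = tuning_objective_without_error_vals([1.0, 1.0], terms)  # 8.25
```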

Use the following f_structure object in the configuration input:

"f_structure":{
  "parameters":{
    "data":"<Path of the data file, see below>",
    "weights":"<Path of the weight file, see below>",
    "optimization":{
      "nstart":5,
      "nrestart":10,
      "saddle_point_check":false,
      "minimize":true,
      "use_mpi":true
    }
  },
  "function_str":"appr_tuning_objective_without_error_vals"
}

Data File

The data file is a JSON file whose keys are the term names and whose values are arrays [\(D_t,\widetilde{D_t}\)] for the corresponding term \(t\). If the key data is not specified in the f_structure object, then \(D_t=0\) and \(\widetilde{D_t}=1\) are assumed for each term \(t\). An example data file is given below:

{
    "Term1": [
            0.0,
            1.0
    ],
    "Term2": [
            0.0,
            1.0
    ],
    "Term3": [
            0.0,
            1.0
    ]
}

Weight File

The weight file is a tab-delimited file in which the first column contains the term names and the second column contains the weight \(w_t\) of the corresponding term \(t\). If the key weights is not specified in the f_structure object, then \(w_t=1\) is assumed for each term \(t\). An example weight file is given below:

Term1 1.0
Term2 1.0
Term3 1.0

Creating your own f_structure function

To create your own f_structure function, start from the template below; its inline comments explain the individual steps:

def my_f_structure_function(self, parameter=None, use_scaled=False):
  """
  In maestro/fstructure.py, create a function with three arguments: self,
  parameter, and use_scaled. parameter is optional; pass it when the
  recurrence of the function needs to be set for faster computation.
  use_scaled specifies whether the f_structure function uses the scaled or
  the unscaled surrogate models.
  """
  m_type = 'model_scaled' if use_scaled else 'model'

  # get the f_structure parameters
  f_structure_parameters = self.state.f_structure_parameters

  # get the models
  models = [self.state.f_structure_parameters[m_type][self.state.data_names[i]]
              for i in range(len(self.state.data_names))]

  # CONSTRUCT FUNCTION STRUCTURE OBJECT
  SP = f(models, f_structure_parameters)

  return SP

Note that you need to replace the CONSTRUCT FUNCTION STRUCTURE OBJECT section in the code above to complete the f_structure object construction function. Also, the following methods should be callable on SP:

# returns the objective function value using surrogates evaluated at parameter p
SP.objective(p)

# returns the objective function value using MC simulator values obtained at parameter p,
# the MC simulator values are passed as a pandas_dataframe with the following
# structure:
#
#      MC                          ...
#    term1.P        [[1., 2.],[4., 8.],[12.,9],...]
#    term1.V        [19., 18., 17.,...]                ...
#    term2.P        [[1., 2.],[4., 8.],[12.,9],...]
#    term2.V        [29., 28., 27.,...]
#    ...            ...                                ...
SP.objective_without_surrograte_values(pandas_dataframe)

# returns the gradient of the f_structure function at parameter p
SP.gradient(p)

# runs optimization and returns result where
# result['x'] is the optimal parameter (argmin) and
# result['fun'] is the minimum objective function value (min)
SP.minimize(**self.state.f_structure_parameters['optimization'])
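
To make this interface concrete, here is a toy object that exposes the four methods listed above. Everything in it is an assumption for illustration: the class name QuadraticSP, the quadratic objective, the plain list accepted in place of the pandas data frame, and the way nstart and nrestart are interpreted (their real semantics may differ).

```python
import random


class QuadraticSP:
    """Toy f_structure object: sum_t w_t * (M_t(p) - D_t)^2 with M_t(p) = p[t]."""

    def __init__(self, targets, weights, step=0.1):
        self.targets = targets
        self.weights = weights
        self.step = step

    def objective(self, p):
        return sum(w * (x - d) ** 2
                   for w, x, d in zip(self.weights, p, self.targets))

    def objective_without_surrograte_values(self, simulated_values):
        # Stand-in: takes a list of simulator values, not a data frame.
        return sum(w * (v - d) ** 2
                   for w, v, d in zip(self.weights, simulated_values, self.targets))

    def gradient(self, p):
        return [2.0 * w * (x - d)
                for w, x, d in zip(self.weights, p, self.targets)]

    def minimize(self, nstart=5, nrestart=10, **other_options):
        # Assumed meaning: each of nrestart runs starts from the best of
        # nstart random points and performs plain gradient descent.
        rng = random.Random(0)
        best = None
        for _ in range(nrestart):
            candidates = [[rng.uniform(-5.0, 5.0) for _ in self.targets]
                          for _ in range(nstart)]
            p = min(candidates, key=self.objective)
            for _ in range(200):
                p = [x - self.step * g for x, g in zip(p, self.gradient(p))]
            fun = self.objective(p)
            if best is None or fun < best["fun"]:
                best = {"x": p, "fun": fun}
        return best


SP = QuadraticSP(targets=[1.0, -2.0], weights=[1.0, 2.0])
options = {"nstart": 5, "nrestart": 10, "saddle_point_check": False,
           "minimize": True, "use_mpi": True}
result = SP.minimize(**options)
```

Options that the toy optimizer does not understand are accepted through other_options and ignored, so the "optimization" dictionary from the configuration can be passed through unchanged.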

Install the code by typing the following commands:

cd maestro
pip install .

Then use the following f_structure object in the configuration input:

"f_structure":{
  "parameters":{
    "key-value pairs required as f_structure_parameters in this f_structure function",
    "optimization":{
      "key-value pairs required by the minimize function"
    }
  },
  "function_str":"my_f_structure_function"
}

If you want to make your f_structure function publicly available with MÆSTRO, consider submitting a pull request.