Comparing Inverse Optimization and Machine Learning Methods for Imputing a Convex Objective Function
Abstract
Inverse optimization (IO) aims to determine optimization model parameters from observed decisions. However, IO is not part of a data scientist’s toolkit in practice, especially as many general-purpose machine learning packages are widely available as an alternative. When encountering IO, practitioners face the question of when, or even whether, investing in developing IO methods is worthwhile. Our paper provides a starting point toward answering these questions, focusing on the problem of imputing the objective function of a parametric convex optimization problem. We compare the predictive performance of three standard supervised machine learning (ML) algorithms (random forest, support vector regression and Gaussian process regression) to the performance of the IO model of Keshavarz et al. (2011). While the IO literature focuses on the development of methods tailored to particular problem classes, our goal is to evaluate general ‘out-of-the-box’ approaches. Our experiments demonstrate that determining whether to use an ML or IO approach requires considering (i) the training set size, (ii) the dependence of the optimization problem on external parameters, (iii) the level of confidence in the correctness of the optimization prior, and (iv) the number of critical regions in the solution space.
keywords:
inverse optimization; machine learning; data efficiency; parametric optimization; multi-parametric programming
1 Introduction
Inverse optimization (IO) imputes missing optimization model parameters from data representing minimally sub-optimal solutions of an unknown optimization problem (e.g., past decisions of an optimizing agent). In the academic literature, IO has been applied successfully to a variety of problems, including medical decision making (Chan et al., 2014), electricity demand forecasting (Saez-Gallego & Morales, 2017), the household activity pattern problem (Chow & Recker, 2012) and economic lot-sizing (Egri et al., 2014). However, IO is rarely used by practitioners.
Previous work has established an analogy between IO and regression (Chan et al., 2019) and provided a statistical inference perspective on IO (Aswani et al., 2018). More generally, imputing parameter values from data can be viewed as a machine learning problem (Tan et al., 2019), a perspective that we adopt and explore in this paper. Viewing an inverse (parametric) optimization problem as a learning problem raises questions that have not previously been explored in the literature and that are of practical interest. First, how well would classic machine learning methods perform on such problems? Second, what characteristics would make a problem challenging for classic machine learning methods and thus require the more specialized methods of IO? Our interactions with the data science teams of two large companies have led us to believe that answering these questions is essential if a data scientist wants to consider adding IO to their toolkit.
While most of the IO literature develops specialized approaches, our goal is to compare fairly general methods, as we believe these are more likely to be used by practitioners. After establishing our perspective on inverse parametric optimization as a learning problem, we experimentally compare the performance of a classic IO method with three out-of-the-box machine learning (ML) methods, namely Gaussian process regression, random forest and support vector regression. We use the IO method of Keshavarz et al. (2011) as it is well suited and easy to apply to inverse parametric convex optimization problems. We show that the chosen ML methods can indeed perform well on a problem that has been well-studied in the IO literature. To identify the characteristics that would make an IO problem challenging for ML (highlighting the need for sophisticated IO models), we conduct experiments on random parametric optimization problems (POPs) generated using the parametric optimization toolbox of Oberdieck et al. (2016). To the best of our knowledge, no previous papers have compared IO and ML methods on this class of problems. Our experiments with randomly generated POPs demonstrate that the choice of an ML or IO approach should depend on (i) the size of the training set, (ii) the nature of the dependence of the optimization problem on external parameters, (iii) the level of confidence in the correctness of the optimization prior, and (iv) the number of critical regions in the solution space of the POP.
2 Background and Related Work
To define an IO problem, we first need to postulate a forward optimization problem, i.e., the problem whose solutions, perhaps corrupted by noise, we observe. The initial focus of the IO literature was to impute coefficients of forward optimization problems that do not have any dependence on an external parameter (non-parametric). Burton and Toint (1992, 1994) study the inverse shortest path problem in which the goal is to find a minimal perturbation of the arc costs to make an observed set of paths optimal. Given an observed value of the decision variables, Ahuja and Orlin (2001) find a cost vector of minimal distance to a known prior vector that would make the observation optimal. Chan et al. (2014) and Chan et al. (2019) study the inverse linear programming problem when the observed data is noisy and no prior is given. Tavaslıoğlu et al. (2018) characterize the inverse feasible region: the set of objectives that would make a given set of feasible solutions to a linear program optimal. Shahmoradi and Lee (2020) introduce a way to measure how sensitive the set of optimal cost vectors is to changes in a given data set.
A parametric optimization problem (also known as a multi-parametric programming problem) is a family of optimization problems parametrized by an independent parameter; the forward parametric problem involves finding a function that maps values of the independent parameter to optimal solutions (Pistikopoulos et al., 2011). As an example, a parametric linear program may have independent parameters entering its objective function and right-hand side; instantiating those parameters leads to a standard (non-parametric) linear program. There is substantial interest in the inverse parametric optimization problem (Keshavarz et al., 2011; Chow & Recker, 2012; Aswani et al., 2018; Esfahani et al., 2018; Saez-Gallego & Morales, 2017; Kovács, 2019; Tan et al., 2019, 2020). In such a problem, pairs (u_n, x_n) are given, where u_n is an independent parameter value for observation n and x_n is the corresponding observed value of the decision variables; the goal is to impute objective function and/or constraint coefficients that would make the observed points optimal or near-optimal solutions of the given parametric forward optimization problem.
Classic IO methods (Burton & Toint, 1992; Ahuja & Orlin, 2001; Keshavarz et al., 2011) rely on some form of optimality conditions, such as the Karush-Kuhn-Tucker (KKT) conditions. More recently, methods based on a combination of ML and IO have emerged (Tan et al., 2019, 2020; Babier et al., 2021). Aswani et al. (2019) and Fernández-Blanco et al. (2019) include comparisons of an IO approach to several ML algorithms in the context of particular applications. Related work in the ML literature includes, most notably, structured prediction (Taskar et al., 2005). Despite this related ML-based work, a comparison of classic methods on general parametric IO problems has not previously been done, and the questions posed in the introduction have not been answered.
3 Problem Definition
We study IO for parametric optimization problems (POPs), which, unlike non-parametric optimization problems, are ‘compatible’ with an ML perspective. POPs are mathematical programs in which the objective function and constraints depend on one or more external parameters (Pistikopoulos et al., 2011; Oberdieck et al., 2016). The forward parametric optimization problem (FOP) of interest to us is:
(1)    min_x  f(x, u; c)
       s.t.   g_i(x, u) ≤ 0,  i ∈ I
where x are the decision variables, u are the independent parameters (‘features’ in ML parlance), and c are the coefficients of the objective function f. We assume f and each g_i are differentiable and convex in x for each value of u. We let J denote the index set of the variables and I the index set of the constraints; we let 0 denote the column vector of zeros of the required dimension.
In an inverse parametric optimization problem, observations of pairs (u_n, x_n), n = 1, …, N, are given; the goal is to find a c that would make each x_n minimally sub-optimal for the corresponding u_n:

(2)    min_c  Σ_{n=1}^{N} ℓ(x_n, x̂_n)   s.t.   x̂_n ∈ argmin { f(x, u_n; c) : x ∈ X(u_n) },  n = 1, …, N,

where ℓ is any given loss function, x̂_n is an optimal solution to the fitted forward optimization model instantiated at u_n, and X(u_n) is the feasible region corresponding to u_n. There are various ways to reformulate (2); the particular model we use here is the one proposed by Keshavarz et al. (2011), which formulates the optimality criterion via the KKT optimality conditions (see appendix).
The goal of IO could be explanatory, i.e., we aim to characterize the process that generated the data, or predictive, i.e., we want to fit a model that predicts the actions of an optimizing agent/process. That is, having imputed c and given a new parameter value u, we could solve the fitted forward problem to predict x. Fitting a model to data can also result in better prescriptive performance, i.e., if instead of predicting what an optimizing agent/process would do, we want to prescribe an optimal decision under new circumstances (this latter perspective being more consistent with classical optimization).
4 Inverse Optimization Problem as a Learning Problem
Learning is the capability to acquire knowledge by extracting patterns from raw data (Goodfellow et al., 2016); in this sense, IO fits within the broader conceptual framework of learning, i.e., we are acquiring knowledge about an optimization process that generated the data we observe. Applied to the IO problem described in Section 3, supervised ML would aim to find a vector-valued function from u (input variable or feature) to x (target variable) such that a loss function is minimized (Russell & Norvig, 2016). Once this function is learned, it can be used for prediction given new feature observations. Figure 1 illustrates this observation: in the left panel, we show that IO takes pairs (u, x) (i.e., data) as input, imputes the value of c (i.e., knowledge) and then, given a new u and the imputed c, makes a prediction; in the right panel, we show that the same happens in a typical supervised ML framework, with the difference being the actual models and methods used to impute the unknown parameters and, consequently, the learned model that is used for prediction.
[Figure 1: IO (left) and supervised ML (right) as data-to-knowledge-to-prediction pipelines.]
The fundamental assumption in IO is that the observations are optimal or near-optimal solutions to an optimization problem, whereas general ML methods require no such assumption. An IO problem can thus be seen as part of the same conceptual framework as machine learning; viewed another way, IO assumes a strong prior that the input data comes from an optimization process, and IO methods leverage this prior knowledge.
[Figure 2: observed points A–D and the parametric linear program that generated them.]
Consider Figure 2 (a): points A to D are given and our goal is to predict the next observation, an unseen test point for which we want to evaluate the quality of prediction. Given no additional information, a natural idea would be to fit a curve using simple linear or polynomial regression. The training error for these models may be low but the test error would be high, which would normally signal over-fitting to the training data. However, here the problem is not over-fitting; rather, the issue is that the data does not come from a linear or polynomial relationship between the coordinates of the observations, nor between a feature and each coordinate. In reality, points A to D are optimal solutions to a parametric optimization problem over the convex hull of a set of points, shown in Figure 2 (d), for four particular values of the parameter. This example was generated using the Wolfram Parametric Linear Programming app (Bunduchi & Mandric, 2011).
Just as we fit a linear regression model when we conjecture that the relationship is linear, or a polynomial one when we conjecture it is polynomial, we can fit an optimization model when we think the data comes from an optimization problem. Prior knowledge of the process that generated the data can have a substantial impact on the ability to fit a good model; the above example specifically illustrates the effect of incorporating an optimization-model prior. Next, we compare the predictive performance of IO methods, which assume an optimization prior, and ML methods, which do not, when data is generated from parametric convex optimization problems.
5 Experiments
5.1 Preliminaries
Our goal is to evaluate ‘out-of-the-box’ approaches that can be easily implemented in practice: a KKT-based IO model (Keshavarz et al., 2011), random forest (Breiman, 2001), support vector regression (Drucker et al., 1997) and Gaussian process regression (Rasmussen, 2004). We use the scikit-learn (Pedregosa et al., 2011) implementation of the ML methods. The KKT-based IO model of Keshavarz et al. (2011) is solved using Concert Technology of IBM ILOG CPLEX v12.7.0. All experiments are run on an HP server with 20 Intel Xeon E5-2687W v3 @ 3.10GHz processors and 512 GB RAM under Linux.
To show the value of prior knowledge for IO, we consider two types of IO models. In IO-perfect, we have perfect information about the form of the unknown objective function (but do not know its parameters). In IO-imperfect, the objective function is misspecified, and so differs from the true one due to modeling error.
(u, x) pairs are generated using Algorithm 1 and divided into a training set and a test set. We use leave-one-out cross-validation on the training set for the ML methods: we pick one point as our validation set, fit a model to the remaining data and evaluate the error on the single held-out point. A validation error estimate is obtained by repeating this procedure for each of the training points and averaging the results. Doing so for several hyperparameter settings enables us to choose the setting with the lowest average cross-validation error. All ML results presented below show the test-set error for the best hyperparameter setting found during leave-one-out cross-validation.
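As an illustration of the leave-one-out procedure described above, the following sketch performs leave-one-out cross-validation for a simple ridge-regression predictor from u to x. The predictor, function name and candidate penalties are our illustrative assumptions, standing in for the ML methods and hyperparameters actually used:

```python
import numpy as np

def loo_cv_ridge(U, X, alphas):
    """Pick a ridge penalty by leave-one-out cross-validation.

    For each candidate alpha: hold out one observation at a time, fit the
    linear map W on the rest (closed-form ridge), and average the squared
    prediction error on the held-out point; return the best alpha.
    """
    U, X = np.asarray(U, float), np.asarray(X, float)
    n = len(U)
    best_alpha, best_err = None, np.inf
    for alpha in alphas:
        errs = []
        for i in range(n):
            mask = np.arange(n) != i
            A, B = U[mask], X[mask]
            # closed-form ridge fit: W = (A'A + alpha*I)^{-1} A'B
            W = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ B)
            errs.append(np.sum((U[i] @ W - X[i]) ** 2))
        err = float(np.mean(errs))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err
```

On data that is exactly linear in u, the procedure correctly prefers the weaker penalty.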
We measure the deviation between the predicted and observed values of x over the test set (of size T) using the mean relative error (MRE):

(3)    MRE = (1/T) Σ_{n=1}^{T} ||x̂_n − x_n|| / ||x_n||

where x̂_n is the prediction and x_n the observed value.
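A minimal implementation of this metric, assuming the Euclidean norm (the choice of norm is our assumption):

```python
import numpy as np

def mean_relative_error(X_pred, X_true):
    """Mean relative error between predicted and observed decision vectors.

    Rows are test observations; the Euclidean norm is assumed.
    """
    X_pred = np.asarray(X_pred, dtype=float)
    X_true = np.asarray(X_true, dtype=float)
    rel = np.linalg.norm(X_pred - X_true, axis=1) / np.linalg.norm(X_true, axis=1)
    return float(rel.mean())
```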
5.2 Experiment 1: Utility Function Estimation
Consider imputing a customer’s unknown utility function given observations of their past purchases (Keshavarz et al., 2011; Bärmann et al., 2017). We assume that customers solve an internal optimization problem when purchasing goods so as to maximize their utility:
(4)    min_x  p^T x − U(x)
       s.t.   x ≥ 0
where x and p are the vectors of consumer demand and the associated prices, respectively. Similarly to Keshavarz et al. (2011), we assume that U is a concave utility function that is non-decreasing over the range from zero to the maximum demand level found in past observations. Since U is concave, the objective function of (4) is convex.
We generate price–demand observations from this model. The true utility function used to generate the data is also the one used in implementing IO-perfect (i.e., the model is correctly specified, so all observations can be made optimal solutions for their corresponding prices). For IO-imperfect, we use a misspecified utility function.
Based on the results of our hyperparameter search during leave-one-out cross-validation, we use the scikit-learn implementations of random forest with n_estimators=50, max_depth=None, max_features=None; support vector regression with C=0.1, gamma="auto", kernel="rbf"; and Gaussian process regression with kernel=RBF, alpha=1e-10, normalize_y=False, optimizer="fmin_l_bfgs_b" and n_restarts_optimizer=10.
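For concreteness, models with these hyperparameter values can be instantiated as follows (a sketch against the scikit-learn API; note that SVR is single-output, so for vector-valued x it would need to be wrapped, e.g., in MultiOutputRegressor):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hyperparameter values reported above; scikit-learn defaults elsewhere.
rf = RandomForestRegressor(n_estimators=50, max_depth=None, max_features=None)
svr = SVR(C=0.1, gamma="auto", kernel="rbf")
gp = GaussianProcessRegressor(
    kernel=RBF(),
    alpha=1e-10,
    normalize_y=False,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
)
```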
Figure 3 shows how the relative test error of IO-perfect, IO-imperfect, RF, SVR and GP changes as the training set size increases. Whereas RF, SVR and GP performance improves, the performance of IO-imperfect and IO-perfect is unaffected. GP achieves performance similar to that of IO-imperfect with only 20 observations, while RF and SVR get close to IO-imperfect at 60 and 100 points, respectively. The ML methods perform well because the underlying optimization process (resulting from model (4)) is constrained only by non-negativity constraints and is parametric only in the objective function – the relationship between the input and output of this process can be learned without knowledge of the underlying optimization model. In the next experiment, we aim to identify IO problems that are more difficult for these ML methods and require knowledge of the underlying optimization structure to make good predictions.
[Figure 3: relative test error vs. training set size for IO-perfect, IO-imperfect, RF, SVR and GP.]
5.3 Experiment 2: Randomly-Generated Parametric Optimization Problems
Using the POP toolbox (Oberdieck et al., 2016), we generate POPs of the following form:
(5)    min_x  ½ x^T Q x + (c + H u)^T x
       s.t.   A x ≤ b + F u
              CR_A u ≤ CR_b

where Q ∈ R^{n×n}, c ∈ R^n and H ∈ R^{n×q}. In the inequality constraints, A ∈ R^{m×n}, b ∈ R^m and F ∈ R^{m×q}. The last constraint of (5) restricts the parameter space, i.e., the admissible values of the external parameter (feature) u, where CR_A ∈ R^{2q×q} and CR_b ∈ R^{2q} allow for representations of constraints on u (in particular, lower and upper bounds – hence the number of rows is 2q), and q denotes the dimensionality of u. We assume a two-dimensional parameter vector (q = 2) in our experiments.
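A minimal sketch of generating (u, x) training pairs from a problem of this form, simplified by dropping the inequality constraints so that the minimizer has a closed form (the POP toolbox handles the fully constrained case; all dimensions and distributions below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, N = 3, 2, 100  # decision dim, feature dim, number of samples (illustrative)

# Random strictly convex quadratic objective 0.5*x'Qx + (c + H u)'x.
M = rng.standard_normal((n, n))
Q = M.T @ M + n * np.eye(n)  # symmetric positive definite
c = rng.standard_normal(n)
H = rng.standard_normal((n, q))

# Sample features u and record the corresponding minimizer. With the
# inequality constraints dropped, KKT stationarity gives the closed form
#   x*(u) = -Q^{-1} (c + H u).
U = rng.uniform(-1.0, 1.0, size=(N, q))
X = np.array([-np.linalg.solve(Q, c + H @ u) for u in U])
```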
To study (forward) POPs, it is standard to view the space of feasible parameters as partitioned into polyhedra called critical regions, each uniquely defined by a set of optimally active constraints (Ahmadi-Moshkenani et al., 2018; Oberdieck et al., 2016). It is known that if Q is positive definite and the problem is convex, the optimal objective function is continuous and piecewise quadratic, while if the problem is reduced to a multi-parametric linear program, then the optimal objective function is continuous, convex, and piecewise affine (see Oberdieck et al. (2016) and the references therein). Figure 4 shows an example of one of the instances we generate using the POP toolbox of Oberdieck et al. (2016) with 21 critical regions and a two-dimensional u; the critical regions are shown in the left panel while the corresponding optimal value function is shown on the right.
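A one-dimensional toy example (our construction, not an instance from the paper) makes the notion of critical regions concrete: for min_x ½x² − ux subject to 0 ≤ x ≤ 1, the optimizer x*(u) = clip(u, 0, 1) is piecewise affine in u, with three critical regions corresponding to the three possible active sets:

```python
import numpy as np

# Toy 1-D POP:  min_x 0.5*x^2 - u*x   s.t.  0 <= x <= 1.
# The optimizer is x*(u) = clip(u, 0, 1): piecewise affine in u, with three
# critical regions (lower bound active, no bound active, upper bound active).
def x_star(u):
    return float(np.clip(u, 0.0, 1.0))

u_grid = np.linspace(-1.0, 2.0, 301)

# Identify each u by which bound is active at the optimum.
active_sets = {(bool(u <= 0.0), bool(u >= 1.0)) for u in u_grid}
```

Sweeping u across the grid encounters exactly three distinct active sets, one per critical region.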
To be able to control the number of critical regions in our experiments, we manually pre-process the instances generated from the toolbox to define problems over smaller regions of the parameter space with a particular number of critical regions. For instance, from the example in Figure 4, we can generate one-region, three-region and five-region problems by selecting the regions indicated by the black rectangle borders. Throughout this process, care is taken to select intervals in which each critical region has close to a “fair share” of the area; in preliminary experiments, we noted that the difficulty of learning over a parameter-space interval that covers, say, five regions but is dominated by one particular region is similar to that of the one-critical-region case (more discussion of the effect of critical regions on learning performance appears later in this section).
[Figure 4: critical regions (left) and the corresponding optimal value function (right) of a sample instance with 21 critical regions.]
Varying the Training Set Size (Data Efficiency)
We compare the predictive performance of IO-perfect, IO-imperfect, SVR, RF and GP, varying the training set size from 1 to 500, on 20 randomly generated POPs of type (5) with a two-dimensional parameter u and with 1 and 3 critical regions. For IO-imperfect, we replace the parametric term in the objective function with its value at the mean of the selected intervals of u.
Figures 5(a) and 5(b) illustrate the predictive (i.e., test set) performance of these methods (averaged over 20 instances) given data sets from one critical region and three critical regions, respectively. As expected, ML performance improves with increasing training set size in both cases. In contrast, the IO methods are unaffected: IO-perfect has nearly zero error while IO-imperfect incurs around 8% error, regardless of the training set size. Since IO is able to achieve low prediction error with a much smaller training set, we see that IO methods with correctly specified priors can be more data efficient than ML algorithms for problems where data is generated by an optimization process.
Because the IO approach we chose solves an optimization model, no matter how many points are given, IO-perfect is able to recover parameters that make the observations optimal, while IO-imperfect maintains the same level of misspecification. Insensitivity to the training set size is consistent with the results of Aswani et al. (2018), which showed that the method of Keshavarz et al. (2011) is in general not risk consistent, i.e., not guaranteed to asymptotically provide the best possible predictions. However, we emphasize that despite being risk inconsistent, given the right prior, the Keshavarz et al. (2011) approach performs well and can be the method of choice due to its data efficiency.
[Figure 5: test error vs. training set size over (a) one critical region and (b) three critical regions.]
The test error of the ML methods increases for all training set sizes as we move from one critical region (Figure 5(a)) to three critical regions (Figure 5(b)). Comparing the two figures, the curves for GP and RF in 5(a) intersect IO-imperfect after 200 points, but in 5(b) they require more points to become comparable to IO-imperfect. This observation suggests that the number of critical regions in the parameter space is a factor that makes these problems more difficult for standard ML approaches, motivating the next experimental comparison.
Increasing the Number of Critical Regions
Figure 6 shows the average predictive performance of IO-imperfect and the three ML methods over 20 randomly generated POPs given a training set of 200 observations. For each of the 20 problem instances, we choose one interval of u with one critical region, one with three critical regions and one with five, keeping the chosen range of each component of u equal to two, as in Figure 4. We see that IO-imperfect is not sensitive to the number of critical regions, while for all of the ML methods the test error increases with the number of critical regions. As mentioned above, every critical region is a polyhedron of parameter values over which the optimal objective function is quadratic (a quadratic ‘piece’ of the overall piecewise quadratic function) while the optimizer function is piecewise affine (Oberdieck et al., 2016). It is not surprising that some of the ML methods perform well with 200 points over one critical region, as they only have to learn one quadratic function. When the number of critical regions increases, the task for the chosen ML methods becomes more challenging: they have to learn multiple quadratic functions without any prior knowledge of where the transition from one region to another occurs. We do not rule out the possibility of creating new, custom learning algorithms to overcome this issue, but doing so is beyond the scope of this paper.
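The difficulty described above can be reproduced in a toy setting (our construction, not an experiment from the paper): a single smooth model fit across several critical regions incurs much larger error than the same model fit within one region. Here the "optimizer function" is the piecewise-affine x*(u) = clip(u, 0, 1) of a 1-D POP, and the "model" is a single polynomial:

```python
import numpy as np

# Piecewise-affine optimizer of a toy 1-D POP with three critical regions
# on [-1, 2]: x* = 0, then x* = u, then x* = 1.
def x_star(u):
    return np.clip(u, 0.0, 1.0)

def max_fit_error(u_lo, u_hi, degree=3, n_pts=200):
    """Fit a single polynomial of the given degree on [u_lo, u_hi] and
    return its worst absolute error against x*(u) on that interval."""
    u = np.linspace(u_lo, u_hi, n_pts)
    coeffs = np.polyfit(u, x_star(u), degree)
    return float(np.max(np.abs(np.polyval(coeffs, u) - x_star(u))))

err_one_region = max_fit_error(0.2, 0.8)      # interval inside one region
err_three_regions = max_fit_error(-1.0, 2.0)  # interval spanning all three
```

Within one region the target is exactly affine and the fit is essentially perfect; across three regions the single polynomial cannot track the kinks.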
[Figure 6: test error of IO-imperfect, GP, RF and SVR vs. number of critical regions.]
Varying the Functional Dependence of the Objective Function on u
Next, we investigate how the relationship between the TPOP objective function and the feature u affects the performance of ML and IO. We investigate this question in the context of one randomly generated POP (see Figure 7):
(6)
[Figure 7: the randomly generated POP instance used in this experiment.]
We consider three objective functions for problem (6), defined through two functions of u that set the coefficients of the decision variables:
(7) |
We start with a linear dependence on u, then change it to a multivariate rational function, and subsequently increase the degree of the polynomials forming the rational function to create a relationship that could lead to a more challenging learning problem.
As shown in Table 1, the test error of GP, SVR and RF increases as we move from the ‘simple’ (linear) dependence on u to making the coefficients of the decision variables multivariate rational functions of u. Since ML methods aim to find a mapping from u to x, a more complex mapping indeed makes learning more difficult. On the other hand, IO is not sensitive to the nature of this dependence, as these functions simply determine the coefficients of x in the IO mathematical model.
True Objective Function | Method | MRE % | MRE %
Linear in u | IO-perfect | |
| IO-imperfect | 3.85 | 3.79
| GP | 9.75 | 6.03
| RF | 6.75 | 5.74
| SVR | 19.03 | 18.04
Rational in u | IO-perfect | |
| IO-imperfect | 7.80 | 7.11
| GP | 10.85 | 8.59
| RF | 9.77 | 7.07
| SVR | 24.94 | 23.30
Higher-degree rational in u | IO-perfect | |
| IO-imperfect | 7.78 | 7.62
| GP | 15.28 | 10.29
| RF | 22.24 | 17.75
| SVR | 38.76 | 34.80
Investigating the Impact of Correctness of Prior Knowledge
Above, we showed that IO-perfect and IO-imperfect are insensitive to problem characteristics that significantly impact the performance of ML (training set size, number of critical regions and the nature of the dependence on u). However, a difference between IO-perfect and IO-imperfect was observable. To investigate the impact of the correctness of the objective-function prior, we consider five objective function choices for problem (6), encoding different degrees of correctness (or misspecification) of the prior.
In Table 2, we aim to impute the objective coefficients of the decision variables. The first objective function is the exact objective function of problem (6), i.e., the IO-perfect method. The subsequent functions deviate from the true function in those coefficients (but the goal is still to impute them from the data generated by the true objective function of problem (6)). Besides parametric objective functions (i.e., objective functions defined over u), we also consider two non-parametric choices. In the first non-parametric objective, we replace the parametric coefficients with their means over the given training set; in the second, we completely eliminate the linear terms.
We see that even slight differences in the assumed objective function increase the IO test error, although for the parametric functions the difference is small. The non-parametric mean-based function also does not lead to a significant increase, but this could change if a larger range of u were considered. Removing any encoding of the parametric parts of the objective function drastically increases the prediction error, from 3.88% to 52.28%. These observations confirm the sensitivity of IO to the correctness of the objective-function prior.
Type | Objective Function | MRE %
Parametric | true objective of (6) (IO-perfect) | 0
| perturbed coefficients (i) | 0.38
| perturbed coefficients (ii) | 3.56
Non-Parametric | training-set means as coefficients | 3.88
| linear terms removed | 52.28
5.4 Insights
Our experiments with randomly generated POPs demonstrate the importance of four characteristics as determinants of the difficulty of objective function learning: (i) the size of the training set, (ii) the nature of the dependence of the optimization problem on the external parameters, (iii) the level of confidence in the correctness of the optimization prior, and (iv) the number of critical regions in the parameter space of a POP.
Increasing the size of the training set favours the ML methods but not necessarily the IO methods. The contrast in performance between the problem in Experiment 1 and the randomly generated POPs suggests that ML methods are more likely to succeed on problems with no or few (parametric) constraints. Increasing the complexity of the relationship between u and the objective function also makes the problem more difficult for ML. On the other hand, the performance of IO strongly depends on the correctness of the objective-function prior.
Perhaps the most important characteristic determining the difficulty of the objective function learning task is the number of critical regions. Over 20 randomly generated POPs, we found a substantial increase in the relative test error for all three ML methods as the number of critical regions increased. In fact, we conjecture that the underlying reason why adding parametric constraints and/or increasing the complexity of the dependence on u made learning more difficult is that doing so also increased the number of critical regions.
Given these observations, we summarize our recommendations on when to use ML or IO for learning the coefficients of a convex POP in Table 3. For each combination of a criterion and its value (low/small or high/large), we state whether we expect the classic ML methods or an IO method to be the better choice (or at least the better starting point) for solving the learning problem, holding the other criteria fixed. For example, when the training set is small, we expect IO to be the better option due to its data efficiency.
Criterion | Low/Small | High/Large
Size of Training Set | IO | ML |
Dependence on External Parameter | ML | IO |
Confidence in Correctness of Prior | ML | IO |
Number of Critical Regions | ML | IO |
The relative performance of the methods considered may not be the same for other types of learning problems where data was generated by an optimization problem (e.g., learning constraints, learning in discrete optimization problems). However, we conjecture that the analysis of the underlying structure of the value function or the optimizer function in terms of the external parameters (i.e., analysis of the problem in terms of the critical regions) will be important both in gaining additional understanding of the challenges of learning from optimization data and in developing more sophisticated learning methods for such problems.
6 Conclusion
In this paper, we view inverse optimization as a problem of learning from decisions that are made through an unknown optimization process. We specifically focus on the problem of learning a convex objective function of a parametric optimization problem. We experimentally compare the predictive performance of an inverse optimization method with perfect and imperfect priors with three well-known machine learning algorithms: support vector regression, random forest and Gaussian processes. While we show that some inverse optimization problems can be tackled through classic machine learning approaches, we highlight the need for sophisticated inverse optimization models for problems where at least one of the following characteristics holds: (i) the size of the training set is small, (ii) both the constraints and the objective function of the problem in question are dependent on an external parameter (feature), particularly when that dependence is non-linear, (iii) we have high confidence that our knowledge of the parametric nature of the constraints and objective is correct, and (iv) the number of critical regions of the POP is large with no one region dominating. We believe that these observations provide practitioners with guidance on when to consider employing inverse optimization instead of, or in addition to, classical machine learning.
Appendix: KKT-based Inverse Optimization Model
The Keshavarz et al. (2011) IO model for (1) is

min_{c, λ}  Σ_{n=1}^{N} ( ||r_n^comp|| + ||r_n^stat|| )
s.t.   r_{n,i}^comp = λ_{n,i} g_i(x_n, u_n),   i ∈ I, n = 1, …, N
       r_n^stat = ∇_x f(x_n, u_n; c) + Σ_{i ∈ I} λ_{n,i} ∇_x g_i(x_n, u_n),   n = 1, …, N
       λ_n ≥ 0,   n = 1, …, N

where ||·|| is some norm, r^comp and r^stat represent the complementary slackness and stationarity residuals, respectively, of the KKT conditions, and λ is the dual variable. The missing objective function coefficient vector c is a decision variable and the pairs (u_n, x_n) are the inputs.
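To make the residual-minimization idea concrete, the following sketch (ours, not from the paper) specializes it to an unconstrained quadratic forward problem, where minimizing the squared stationarity residuals over c has a closed form; the general model above requires an optimization solver:

```python
import numpy as np

def impute_c(Q, H, U, X):
    """Impute the constant objective coefficient c of the forward problem
        min_x 0.5 * x'Qx + (c + H u)'x      (unconstrained, for illustration)
    by minimizing the squared KKT stationarity residuals
        r_n = Q x_n + c + H u_n
    over observations (u_n, x_n); the least-squares minimizer of sum ||r_n||^2
    is the negative mean of Q x_n + H u_n.
    """
    R = np.array([Q @ x + H @ u for u, x in zip(U, X)])
    return -R.mean(axis=0)
```

On noiseless data generated from the same model, this recovers c exactly, mirroring the behaviour of IO-perfect.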
Acknowledgement(s)
The authors are supported by the Natural Sciences and Engineering Research Council of Canada.
References
- Ahmadi-Moshkenani, P., Johansen, T. A., & Olaru, S. (2018). Combinatorial approach toward multiparametric quadratic programming based on characterizing adjacent critical regions. IEEE Transactions on Automatic Control, 63(10), 3221–3231.
- Ahuja, R. K., & Orlin, J. B. (2001). Inverse optimization. Operations Research, 49(5), 771–783.
- Aswani, A., Kaminsky, P., Mintz, Y., Flowers, E., & Fukuoka, Y. (2019). Behavioral modeling in weight loss interventions. European Journal of Operational Research, 272(3), 1058–1072.
- Aswani, A., Shen, Z.-J., & Siddiq, A. (2018). Inverse optimization with noisy data. Operations Research, 66(3), 870–892.
- Babier, A., Chan, T. C. Y., Lee, T., Mahmood, R., & Terekhov, D. (2021). An ensemble learning framework for model fitting and evaluation in inverse linear optimization. INFORMS Journal on Optimization, Articles in Advance, 1–20. https://doi.org/10.1287/ijoo.2019.0045
- Bärmann, A., Pokutta, S., & Schneider, O. (2017). Emulating the expert: Inverse optimization through online learning. In International Conference on Machine Learning (pp. 400–410).
- Breiman (\APACyear2001) \APACinsertmetastarbreiman2001random{APACrefauthors}Breiman, L. \APACrefYearMonthDay2001. \BBOQ\APACrefatitleRandom forests Random forests.\BBCQ \APACjournalVolNumPagesMachine Learning4515–32. \PrintBackRefs\CurrentBib
- Bunduchi \BBA Mandric (\APACyear2011) \APACinsertmetastarMathematica{APACrefauthors}Bunduchi, E.\BCBT \BBA Mandric, I. \APACrefYearMonthDay2011. \APACrefbtitleParametric Linear Programming. Parametric linear programming. {APACrefURL} http://demonstrations.wolfram.com/ParametricLinearProgramming/ \PrintBackRefs\CurrentBib
- Burton \BBA Toint (\APACyear1992) \APACinsertmetastarburton1992instance{APACrefauthors}Burton, D.\BCBT \BBA Toint, P\BPBIL. \APACrefYearMonthDay1992. \BBOQ\APACrefatitleOn an instance of the inverse shortest paths problem On an instance of the inverse shortest paths problem.\BBCQ \APACjournalVolNumPagesMathematical Programming531-345–61. \PrintBackRefs\CurrentBib
- Burton \BBA Toint (\APACyear1994) \APACinsertmetastarburton_correlated94{APACrefauthors}Burton, D.\BCBT \BBA Toint, P\BPBIL. \APACrefYearMonthDay1994. \BBOQ\APACrefatitleOn the use of an inverse shortest paths algorithm for recovering linearly correlated costs On the use of an inverse shortest paths algorithm for recovering linearly correlated costs.\BBCQ \APACjournalVolNumPagesMathematical Programming631-31–22. \PrintBackRefs\CurrentBib
- Chan \BOthers. (\APACyear2014) \APACinsertmetastarchan2014generalized{APACrefauthors}Chan, T\BPBIC., Craig, T., Lee, T.\BCBL \BBA Sharpe, M\BPBIB. \APACrefYearMonthDay2014. \BBOQ\APACrefatitleGeneralized inverse multiobjective optimization with application to cancer therapy Generalized inverse multiobjective optimization with application to cancer therapy.\BBCQ \APACjournalVolNumPagesOperations Research623680–695. \PrintBackRefs\CurrentBib
- Chan \BOthers. (\APACyear2019) \APACinsertmetastarGIO{APACrefauthors}Chan, T\BPBIC., Lee, T.\BCBL \BBA Terekhov, D. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleInverse optimization: Closed-form solutions, geometry, and goodness of fit Inverse optimization: Closed-form solutions, geometry, and goodness of fit.\BBCQ \APACjournalVolNumPagesManagement Science6531115–1135. \PrintBackRefs\CurrentBib
- Chow \BBA Recker (\APACyear2012) \APACinsertmetastarchow2012inverse{APACrefauthors}Chow, J.\BCBT \BBA Recker, W. \APACrefYearMonthDay2012. \BBOQ\APACrefatitleInverse optimization with endogenous arrival time constraints to calibrate the household activity pattern problem Inverse optimization with endogenous arrival time constraints to calibrate the household activity pattern problem.\BBCQ \APACjournalVolNumPagesTransportation Research Part B: Methodological463463–479. \PrintBackRefs\CurrentBib
- Drucker \BOthers. (\APACyear1997) \APACinsertmetastarvapnik95{APACrefauthors}Drucker, H., Burges, C., Kaufman, L., Smola, A.\BCBL \BBA Vapnik, V. \APACrefYearMonthDay1997. \BBOQ\APACrefatitleSupport vector regression machines Support vector regression machines.\BBCQ \BIn \APACrefbtitleAdvances in Neural Information Processing Systems Advances in neural information processing systems (\BPGS 155–161). \PrintBackRefs\CurrentBib
- Egri \BOthers. (\APACyear2014) \APACinsertmetastaregri2014inverse{APACrefauthors}Egri, P., Kis, T., Kovács, A.\BCBL \BBA Váncza, J. \APACrefYearMonthDay2014. \BBOQ\APACrefatitleAn inverse economic lot-sizing approach to eliciting supplier cost parameters An inverse economic lot-sizing approach to eliciting supplier cost parameters.\BBCQ \APACjournalVolNumPagesInternational Journal of Production Economics14980–88. \PrintBackRefs\CurrentBib
- Esfahani \BOthers. (\APACyear2018) \APACinsertmetastaresfahani2018data{APACrefauthors}Esfahani, P., Shafieezadeh-Abadeh, S., Hanasusanto, G.\BCBL \BBA Kuhn, D. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleData-driven inverse optimization with imperfect information Data-driven inverse optimization with imperfect information.\BBCQ \APACjournalVolNumPagesMathematical Programming1671191–234. \PrintBackRefs\CurrentBib
- Fernández-Blanco \BOthers. (\APACyear2019) \APACinsertmetastarfernandez2019ev{APACrefauthors}Fernández-Blanco, R., Morales, J\BPBIM., Pineda, S.\BCBL \BBA Porras, Á. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleEV-Fleet Power Forecasting via Kernel-Based Inverse Optimization Ev-fleet power forecasting via kernel-based inverse optimization.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:1908.00399. \PrintBackRefs\CurrentBib
- Goodfellow \BOthers. (\APACyear2016) \APACinsertmetastarGoodfellow-et-al-2016{APACrefauthors}Goodfellow, I., Bengio, Y.\BCBL \BBA Courville, A. \APACrefYear2016. \APACrefbtitleDeep Learning Deep learning. \APACaddressPublisherMIT Press. \APACrefnotehttp://www.deeplearningbook.org \PrintBackRefs\CurrentBib
- Keshavarz \BOthers. (\APACyear2011) \APACinsertmetastarKeshavarz11{APACrefauthors}Keshavarz, A., Wang, Y.\BCBL \BBA Boyd, S. \APACrefYearMonthDay2011. \BBOQ\APACrefatitleImputing a convex objective function Imputing a convex objective function.\BBCQ \BIn \APACrefbtitleIntelligent Control (ISIC), 2011 IEEE International Symposium on Intelligent control (isic), 2011 ieee international symposium on (\BPGS 613–619). \PrintBackRefs\CurrentBib
- Kovács (\APACyear2019) \APACinsertmetastarkovacs2019parameter{APACrefauthors}Kovács, A. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleParameter Elicitation for Consumer Models in Demand Response Management Parameter elicitation for consumer models in demand response management.\BBCQ \BIn \APACrefbtitle2019 1st Global Power, Energy and Communication Conference (GPECOM) 2019 1st global power, energy and communication conference (gpecom) (\BPGS 456–460). \PrintBackRefs\CurrentBib
- Oberdieck \BOthers. (\APACyear2016) \APACinsertmetastarPOP{APACrefauthors}Oberdieck, R., Diangelakis, N., Papathanasiou, M., Nascu, I.\BCBL \BBA Pistikopoulos, E. \APACrefYearMonthDay2016. \BBOQ\APACrefatitlePOP – Parametric Optimization Toolbox POP – Parametric Optimization Toolbox.\BBCQ \APACjournalVolNumPagesIndustrial & Engineering Chemistry Research55338979-8991. {APACrefURL} https://doi.org/10.1021/acs.iecr.6b01913 {APACrefDOI} 10.1021/acs.iecr.6b01913 \PrintBackRefs\CurrentBib
- Pedregosa \BOthers. (\APACyear2011) \APACinsertmetastarscikit{APACrefauthors}Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.\BDBLDuchesnay, E. \APACrefYearMonthDay2011\APACmonth11. \BBOQ\APACrefatitleScikit-Learn: Machine Learning in Python Scikit-learn: Machine learning in python.\BBCQ \APACjournalVolNumPagesJ. Mach. Learn. Res.12null2825–2830. \PrintBackRefs\CurrentBib
- Pistikopoulos \BOthers. (\APACyear2011) \APACinsertmetastarpistikopoulos2011multi{APACrefauthors}Pistikopoulos, E\BPBIN., Georgiadis, M\BPBIC.\BCBL \BBA Dua, V. \APACrefYear2011. \APACrefbtitleMulti-Parametric Programming Multi-parametric programming (\BVOL 1). \PrintBackRefs\CurrentBib
- Rasmussen (\APACyear2004) \APACinsertmetastarrasmussen2004gaussian{APACrefauthors}Rasmussen, C. \APACrefYearMonthDay2004. \BBOQ\APACrefatitleGaussian processes in machine learning Gaussian processes in machine learning.\BBCQ \BIn \APACrefbtitleAdvanced Lectures on Machine Learning Advanced lectures on machine learning (\BPGS 63–71). \APACaddressPublisherSpringer. \PrintBackRefs\CurrentBib
- Russell \BBA Norvig (\APACyear2016) \APACinsertmetastarrussell2016artificial{APACrefauthors}Russell, S\BPBIJ.\BCBT \BBA Norvig, P. \APACrefYear2016. \APACrefbtitleArtificial intelligence: a modern approach Artificial intelligence: a modern approach. \APACaddressPublisherMalaysia; Pearson Education Limited,. \PrintBackRefs\CurrentBib
- Saez-Gallego \BBA Morales (\APACyear2017) \APACinsertmetastarGallego{APACrefauthors}Saez-Gallego, J.\BCBT \BBA Morales, J\BPBIM. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleShort-term forecasting of price-responsive loads using inverse optimization Short-term forecasting of price-responsive loads using inverse optimization.\BBCQ \APACjournalVolNumPagesIEEE Transactions on Smart Grid954805–4814. \PrintBackRefs\CurrentBib
- Shahmoradi \BBA Lee (\APACyear2020) \APACinsertmetastarshahmoradi2020{APACrefauthors}Shahmoradi, Z.\BCBT \BBA Lee, T. \APACrefYearMonthDay2020. \APACrefbtitleQuantile Inverse Optimization: Improving Stability in Inverse Linear Programming. Quantile inverse optimization: Improving stability in inverse linear programming. \PrintBackRefs\CurrentBib
- Tan \BOthers. (\APACyear2019) \APACinsertmetastardeepIO{APACrefauthors}Tan, Y., Delong, A.\BCBL \BBA Terekhov, D. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleDeep Inverse Optimization Deep inverse optimization.\BBCQ \BIn \APACrefbtitleInternational Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research International conference on integration of constraint programming, artificial intelligence, and operations research (\BPGS 540–556). \PrintBackRefs\CurrentBib
- Tan \BOthers. (\APACyear2020) \APACinsertmetastartan2020{APACrefauthors}Tan, Y., Terekhov, D.\BCBL \BBA Delong, A. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleLearning linear programs from optimal decisions Learning linear programs from optimal decisions.\BBCQ \BIn \APACrefbtitleAdvances in Neural Information Processing Systems Advances in neural information processing systems (\BVOL 33). \PrintBackRefs\CurrentBib
- Taskar \BOthers. (\APACyear2005) \APACinsertmetastartaskar2005learning{APACrefauthors}Taskar, B., Chatalbashev, V., Koller, D.\BCBL \BBA Guestrin, C. \APACrefYearMonthDay2005. \BBOQ\APACrefatitleLearning Structured Prediction Models: A Large Margin Approach Learning structured prediction models: A large margin approach.\BBCQ \BIn \APACrefbtitleProceedings of the 22nd International Conference on Machine Learning Proceedings of the 22nd international conference on machine learning (\BPGS 896–903). \PrintBackRefs\CurrentBib
- Tavaslıoğlu \BOthers. (\APACyear2018) \APACinsertmetastartavasliouglu2018{APACrefauthors}Tavaslıoğlu, O., Lee, T., Valeva, S.\BCBL \BBA Schaefer, A\BPBIJ. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleOn the structure of the inverse-feasible region of a linear program On the structure of the inverse-feasible region of a linear program.\BBCQ \APACjournalVolNumPagesOperations Research Letters461147–152. \PrintBackRefs\CurrentBib