Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile Regression
Abstract
Predicting the minimum operating voltage ($V_{min}$) of chips is an important technique for improving the manufacturing testing flow and for ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variation. Some existing techniques offer region predictions, but they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free $V_{min}$ interval estimation methodology with a theoretical coverage guarantee. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors can significantly reduce the interval length for $V_{min}$ prediction.
Index Terms:
chip performance prediction, on-chip monitors, conformal prediction, quantile regression
I Introduction
Measurement of the minimum operating voltage ($V_{min}$) is one of the important testing procedures to determine chip performance. It facilitates the detection of inferior products, the reduction of power consumption, and the indication of potential early-life failures. As technology nodes keep scaling, $V_{min}$ tests via structural test patterns (e.g., SCAN) become increasingly crucial for screening out tiny flaws and defects [1] inside chips.
Conventional $V_{min}$ measurements involve testing chips at a high operating voltage and decreasing it step by step until they fail, which is time-consuming. Moreover, such a strategy is applicable only in the manufacturing test process, not in in-field systems. To this end, researchers have proposed machine-learning-based $V_{min}$ predictors that use low-cost features, such as parametric testing data from the production test flow and on-chip monitor data for in-field prediction [2, 3, 4, 5]. Many regression models have been explored recently, including linear regression [4], Gaussian Process (GP) [3], and Neural Network (NN) [5]. For instance, Chen et al. demonstrated a low-cost approach to predict the system $F_{max}$ (the maximum operating frequency) using the structural $F_{max}$ of flip-flops [3] via a GP model, whose kernel length-scale hyperparameters are used as indicators of feature significance. Yin et al. adopted a constrained NN to capture the monotonicity between RO delay and $V_{min}$ degradation [5]. Although these methods provide promising point estimates of $V_{min}$, additional techniques are still required to construct prediction intervals that cover the true $V_{min}$ with high probability and account for the uncertainties due to variations of process, voltage, temperature, operating frequency, application mode, etc.
Uncertainty Quantification (UQ) for machine learning provides confidence intervals for a model's predictions. Commonly employed UQ methods include 1) Bayesian approaches such as GP [6] and Bayesian neural networks [7], 2) neural network ensembles [8], and 3) Quantile Regression (QR) [9]. While these methods excel at estimating uncertainty within the training data distribution, their prediction intervals often lack a reliable coverage guarantee for new testing data. Consequently, none of these approaches fully meets the stringent demands of the silicon industry for generating robust intervals to ensure high reliability.
TABLE I
Property | Bayesian | Ensemble | QR | CP | CQR |
Distribution-free | ✗ | ✓ | ✓ | ✓ | ✓ |
Model-agnostic | ✗ | ✗ | ✓ | ✓ | ✓ |
Coverage guarantee for test data | ✗ | ✗ | ✗ | ✓ | ✓ |
Adaptation to heteroscedasticity | ✓ | ✓ | ✓ | ✗ | ✓ |
Computational efficiency | ✗ | ✗ | ✓ | ✓ | ✓ |
Conformal Prediction (CP) [10] emerges as a promising distribution-free UQ method for constructing intervals based on any point predictor while offering a nonasymptotic coverage guarantee. CP leverages a calibration dataset to assess the uncertainty associated with a fitted regression model by analyzing its prediction residuals. However, vanilla CP has limitations as a region predictor: it constructs intervals of the same length for all testing samples, potentially leading to excessive margins for normal chips and inadequate coverage for anomalous ones.
To this end, we propose a distribution-free $V_{min}$ interval prediction framework with a theoretical coverage guarantee. Our approach leverages Conformalized Quantile Regression (CQR) [11] and on-chip monitors to construct prediction intervals. Our primary contributions are outlined as follows:
- We conduct a comprehensive comparison among various $V_{min}$ point predictors on our industrial dataset. We find that while no single model outperforms the others in all scenarios, the prediction accuracy of linear regression is competitive overall. Moreover, on-chip monitors are capable of predicting future $V_{min}$ degradation.
- We introduce CQR to the context of $V_{min}$ interval estimation, showcasing its better performance in terms of coverage rate and interval length compared to alternative UQ models.
- Through empirical analysis, we demonstrate that the inclusion of on-chip monitor data yields substantial improvements in the precision of interval predictions.
II Preliminaries
II-A Point Prediction
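For the task of $V_{min}$ point estimation, in both the production flow and in-field scenarios, the objective remains consistent: utilizing a set of features to predict a single $V_{min}$ value. We denote these features as a $d$-dimensional vector $x \in \mathbb{R}^{d}$, the $V_{min}$ as a real number $y \in \mathbb{R}$, and the point predictor as $f_\theta$, parameterized by $\theta$. Given a training dataset of $n$ tested chips $D = \{(x_i, y_i)\}_{i=1}^{n}$, the predictor is optimized by minimizing the mean of a loss function $\ell$:
$$\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) \tag{1}$$
where the rows $x_i$ form the matrix of inputs $X \in \mathbb{R}^{n \times d}$, and the entries $y_i$ form the vector $Y \in \mathbb{R}^{n}$ of true $V_{min}$.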
II-B Region Prediction
In manufacturing test processes, engineers often face risks of over-kill or under-kill when relying solely on point predictions to identify abnormal products due to process variations. In in-field scenarios, point estimation can be highly unreliable due to the presence of numerous environmental uncertainties. Consequently, the utilization of prediction intervals becomes essential for effectively detecting outliers and identifying potential failures.
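Unlike point estimation, which only generates a single value for an input example, region prediction provides an interval prediction. A region regressor $C$, consisting of a pair $\hat{l}$ and $\hat{u}$ of lower and upper bound functions, maps a sample $x$ to a closed region $C(x)$:
$$C(x) = [\hat{l}(x), \; \hat{u}(x)] \tag{2}$$
Given a coverage rate $1 - \alpha$ where $\alpha \in (0, 1)$ and the training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, the prediction intervals of a region regressor should cover at least a $1 - \alpha$ fraction of the labels:
$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\big[y_i \in C(x_i)\big] \ge 1 - \alpha \tag{3}$$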
We introduce two well-known region regression methods satisfying Eq. 3: Gaussian process and quantile regression. Their theoretical traits are summarized in Table I.
II-B1 Gaussian Process (GP)
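GP is a non-parametric Bayesian method that provides a posterior Gaussian distribution for any testing point [6]. Suppose the posterior mean is $\mu(x)$ and the posterior variance is $\sigma^2(x)$ for sample $x$; we can then construct an interval satisfying Eq. 3:
$$C(x) = [\mu(x) + \Phi^{-1}(\alpha_{lo})\,\sigma(x), \; \mu(x) + \Phi^{-1}(\alpha_{hi})\,\sigma(x)] \tag{4}$$
where $\alpha_{lo} = \alpha/2$, $\alpha_{hi} = 1 - \alpha/2$, and $\Phi$ is the cumulative distribution function of the standard Gaussian distribution.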
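As a minimal sketch of Eq. 4, assuming the GP posterior mean and standard deviation of a chip are already available, the two bounds follow directly from the standard Gaussian quantile function. The function name `gaussian_interval` and the numeric inputs are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def gaussian_interval(mu, sigma, alpha=0.1):
    """Central (1 - alpha) interval of a Gaussian posterior N(mu, sigma^2), following Eq. 4."""
    if sigma < 0 or not 0.0 < alpha < 1.0:
        raise ValueError("sigma must be non-negative and alpha must lie in (0, 1)")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_lo = standard_normal.inv_cdf(alpha / 2)      # Phi^{-1}(alpha_lo), a negative number
    z_hi = standard_normal.inv_cdf(1 - alpha / 2)  # Phi^{-1}(alpha_hi), a positive number
    return mu + z_lo * sigma, mu + z_hi * sigma


# Hypothetical posterior for one chip (illustrative values only).
lower, upper = gaussian_interval(mu=600.0, sigma=10.0, alpha=0.1)
```

The interval is symmetric around the posterior mean, and its length scales linearly with the posterior standard deviation, so its reliability depends entirely on how well the Gaussian assumption holds.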
II-B2 Quantile Regression (QR)
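Unlike traditional regression analysis with the Mean Square Error (MSE) loss, which estimates the conditional mean of $y$, QR estimates a conditional quantile of $y$ [9]. Given a quantile level $\gamma \in (0, 1)$, a QR model is trained by minimizing the quantile loss [9], also known as the pinball loss, in place of $\ell$ in Eq. 1:
$$\ell_\gamma(\hat{y}_\gamma, y) = \max\big(\gamma\,(y - \hat{y}_\gamma), \; (\gamma - 1)(y - \hat{y}_\gamma)\big) \tag{5}$$
where $\hat{y}_\gamma$ is the prediction of the $\gamma$ quantile.
By selecting two different quantile levels $\alpha_{lo} = \alpha/2$ and $\alpha_{hi} = 1 - \alpha/2$, we can train two quantile regressors, the interval between which achieves the coverage $1 - \alpha$ in Eq. 3.
QR can be applied to any point regressor that is trained by minimizing the MSE loss, simply by substituting the pinball loss for it.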
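To make the asymmetry of the pinball loss concrete, the following sketch evaluates it for a single prediction. The function names and the numeric values are hypothetical; the argument order (prediction first, then the true value) mirrors the loss in Eq. 1.

```python
def pinball_loss(y_pred, y_true, gamma):
    """Quantile (pinball) loss for one sample and a quantile level gamma in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by gamma, over-prediction by (1 - gamma).
    return max(gamma * diff, (gamma - 1.0) * diff)


def mean_pinball_loss(preds, targets, gamma):
    """Average pinball loss over a dataset, i.e., the training objective of a quantile regressor."""
    return sum(pinball_loss(p, t, gamma) for p, t in zip(preds, targets)) / len(targets)


# For a high quantile level, predicting too low costs far more than predicting too high.
under = pinball_loss(y_pred=590.0, y_true=600.0, gamma=0.95)  # approximately 9.5
over = pinball_loss(y_pred=610.0, y_true=600.0, gamma=0.95)   # approximately 0.5
```

This asymmetry is what pushes a regressor trained with $\gamma = 0.95$ toward the upper end of the conditional distribution rather than toward its mean.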
III Methodology
III-A Overview of $V_{min}$ Prediction
Our $V_{min}$ prediction framework is depicted in Fig. 1, where four stress read points are drawn for illustration. $V_{min}$ at each stress read point will be predicted. The horizontal dashed line (min_spec) stands for the product specification of the minimum operating voltage, i.e., a device with $V_{min}$ higher than that threshold violates the specification and is likely to become a failure.
We utilize low-cost parametric data and on-chip data to predict $V_{min}$ at time zero and at subsequent read points during stress that simulates in-field life. Note that stress is done at an elevated voltage such that a much shorter stress duration is equivalent to a much longer in-field life. Specifically, two kinds of prediction scenarios are considered: $V_{min}$ prediction in the production test flow, and $V_{min}$ prediction in the in-field deployment, which is simulated by accelerated stress. In the first case, both production parametric test data and on-chip data are included to build $V_{min}$ predictors. In the second case, however, we make $V_{min}$ degradation predictions based on all features accessible before the test timestamp, including production parametric test data at time zero and on-chip data measured at all previous read points during stress. In our industrial dataset, both $V_{min}$ and on-chip data are collected at the same read points, and the total number of read points is relatively small, i.e., less than 10. In this case, time series methods would suffer from over-fitting. Thus, we treat on-chip data at different read points as different features, and apply CQR to predict $V_{min}$ intervals.
Since CQR originates from CP, we first briefly summarize how CP works, and then present CQR for $V_{min}$ interval prediction.
III-B Conformal Prediction (CP)
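Even though the coverage of prediction intervals is guaranteed on the training dataset for GP and QR, this property does not necessarily hold for a new test instance $(x_{test}, y_{test})$, i.e., the following coverage guarantee is not ensured:
$$\mathbb{P}\big(y_{test} \in C(x_{test})\big) \ge 1 - \alpha \tag{6}$$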
The adoption of the aforementioned two region predictors for new examples is risky without the coverage guarantee.
In the semiconductor industry, all chips can be viewed as examples from a hidden distribution: the pairs $(x_i, y_i)$ are sampled i.i.d. from a distribution $\mathcal{P}$. CP can help to calibrate any heuristic interval to meet the coverage guarantee in Eq. 6 [10]. CP has two main versions: full CP and split CP. In regression tasks, full CP requires refitting the model for every candidate label value, i.e., infinitely many model fits, which makes it impractical. In contrast, split CP is far more computationally efficient, at the cost of splitting the training dataset.
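We outline how split CP utilizes a point predictor $f_\theta$ to generate a $1 - \alpha$ interval for a test input $x_{test}$:
1) Split the training dataset $D$ into a new training dataset $D_{train}$ and a small calibration dataset $D_{cal}$ such that $D_{train} \cup D_{cal} = D$ and $D_{train} \cap D_{cal} = \emptyset$.
2) Fit the point regressor $f_\theta$ on $D_{train}$.
3) Compute $\hat{q}$ as the $\lceil (1 - \alpha)(m + 1) \rceil / m$ quantile of the conformal score function $s$ of absolute residuals in the calibration set $D_{cal}$:
$$s(x, y) = |y - f_\theta(x)| \tag{7}$$
where $m$ is the number of examples in $D_{cal}$.
4) Construct the interval for $x_{test}$, satisfying Eq. 6:
$$C(x_{test}) = [f_\theta(x_{test}) - \hat{q}, \; f_\theta(x_{test}) + \hat{q}] \tag{8}$$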
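The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in our experiments: the data are synthetic, the point regressor is a single-feature least-squares line, and all names (`fit_line`, `conformal_quantile`, etc.) are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_quantile(scores, alpha):
    """The ceil((1 - alpha)(m + 1))-th smallest score; infinite if m is too small."""
    m = len(scores)
    k = math.ceil((1 - alpha) * (m + 1))
    return sorted(scores)[k - 1] if k <= m else math.inf


# Synthetic stand-in data (assumption): one feature, linear trend, Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [500.0 + 100.0 * x + rng.gauss(0.0, 5.0) for x in xs]

# Step 1: split into a training part and a calibration part.
train_x, train_y = xs[:150], ys[:150]
cal_x, cal_y = xs[150:], ys[150:]

# Step 2: fit the point regressor on the training part only.
a, b = fit_line(train_x, train_y)

# Step 3: conformal scores are the absolute residuals on the calibration part.
scores = [abs(y - (a * x + b)) for x, y in zip(cal_x, cal_y)]
q_hat = conformal_quantile(scores, alpha=0.1)

# Step 4: every test input receives an interval of the same length 2 * q_hat.
x_test = 0.5
interval = (a * x_test + b - q_hat, a * x_test + b + q_hat)
```

The quantile is computed by sorting rather than by a library call so that the finite-sample correction $\lceil (1 - \alpha)(m + 1) \rceil$ is explicit.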
III-C Conformalized Quantile Regression (CQR)
While split CP satisfies the coverage guarantee, the length of its predicted intervals is $2\hat{q}$, which stays fixed across inputs. This property may incur overkill for good products and underkill for defective ones. CQR [11] addresses this by combining CP with QR, so that the interval length adapts to each input.
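We describe the procedure of split CQR:
1) Split the training dataset $D$ into $D_{train}$ and $D_{cal}$, as in split CP.
2) Fit the quantile regressors $\hat{l}$ and $\hat{u}$, for the lower and upper quantile levels $\alpha/2$ and $1 - \alpha/2$, on $D_{train}$.
3) Compute $\hat{q}$ as the $\lceil (1 - \alpha)(m + 1) \rceil / m$ quantile of the conformal score function $s$ in $D_{cal}$, where $m$ is the number of examples in $D_{cal}$ and
$$s(x, y) = \max\big(\hat{l}(x) - y, \; y - \hat{u}(x)\big) \tag{9}$$
4) Construct the interval for $x_{test}$ satisfying Eq. 6:
$$C(x_{test}) = [\hat{l}(x_{test}) - \hat{q}, \; \hat{u}(x_{test}) + \hat{q}] \tag{10}$$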
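The calibration step of split CQR can be sketched independently of the underlying quantile regressors. In the illustration below, `lower_model` and `upper_model` are hypothetical stand-ins for fitted quantile regressors (deliberately too narrow), and the calibration data are synthetic; none of these come from our experiments.

```python
import math
import random


def cqr_calibrate(lo_cal, hi_cal, y_cal, alpha):
    """Return q_hat computed from the CQR conformal scores on the calibration set."""
    # The score is positive when y falls outside [lo, hi] and negative when it is inside.
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lo_cal, hi_cal, y_cal))
    m = len(scores)
    k = math.ceil((1 - alpha) * (m + 1))
    return scores[k - 1] if k <= m else math.inf


def cqr_interval(lo, hi, q_hat):
    """Widen (or shrink, if q_hat is negative) the heuristic interval by q_hat on both sides."""
    return lo - q_hat, hi + q_hat


# Hypothetical quantile regressors whose band grows with x but is too narrow.
def lower_model(x):
    return 500.0 + 100.0 * x - 2.0 * (1.0 + x)


def upper_model(x):
    return 500.0 + 100.0 * x + 2.0 * (1.0 + x)


# Synthetic heteroscedastic calibration data (assumption): noise grows with x.
rng = random.Random(1)
cal_x = [rng.uniform(0.0, 1.0) for _ in range(60)]
cal_y = [500.0 + 100.0 * x + rng.gauss(0.0, 5.0 * (1.0 + x)) for x in cal_x]

lo_cal = [lower_model(x) for x in cal_x]
hi_cal = [upper_model(x) for x in cal_x]
q_hat = cqr_calibrate(lo_cal, hi_cal, cal_y, alpha=0.1)

# Unlike split CP, the calibrated intervals keep their input-dependent length.
narrow = cqr_interval(lower_model(0.1), upper_model(0.1), q_hat)
wide = cqr_interval(lower_model(0.9), upper_model(0.9), q_hat)
```

Because the same $\hat{q}$ is added to both bounds, the calibrated interval inherits whatever input dependence the quantile regressors have learned, which is how CQR adapts to heteroscedasticity.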
IV Experimental Results
IV-A Industrial Dataset
TABLE II
Attribute | Parametric | On-chip (ROD) | On-chip (CPD) |
Quantity | 1800 | 168 | 10 |
Temperature (°C) | -45, 25, 125 | 25 | 80 |
Read point (hour) | 0 | 0, 24, 48, 168, 504, 1008 | 0, 24, 48, 168, 504, 1008 |
Our experiments use 156 automotive chips in 5nm technology to demonstrate the effectiveness of the proposed $V_{min}$ prediction framework. As shown in Fig. 1, parametric data and on-chip monitor data are considered for $V_{min}$ prediction. We describe how the input features and the output $V_{min}$ are collected.
All 156 chips go through dynamic Dhrystone stress at elevated voltage in a Burn-In (BI) oven for 1008 hours to simulate in-field long-term aging degradation. At specific stress read points, i.e., 0, 24, 48, 168, 504, and 1008 hours, we pause the stress process and 1) test SCAN $V_{min}$, 2) perform the parametric tests, and 3) collect on-chip monitor data. SCAN $V_{min}$ is tested on Automatic Test Equipment (ATE) at temperatures of -45°C, 25°C, and 125°C. The parametric tests, including IDDQ, trip IDD, leakage, etc., are also performed on ATE across all three temperatures. The chip has two types of on-chip monitors: domain sensors which include Ring Oscillator Delay (ROD) sensors and in-situ Critical Path Delay (CPD) sensors. In our experiment, due to hardware and logistic process limitations, ROD is measured on ATE at room temperature (25°C) only, while CPD is measured in-situ in the BI oven at 80°C. We summarize the traits of input features in Table II.
IV-B Experimental Settings
We describe the features used for $V_{min}$ prediction at each read point and the evaluation metrics for point prediction and interval regression. As shown in Fig. 1, for the prediction of $V_{min}$ at time 0, both parametric test data and on-chip monitor data collected at time 0 are utilized. For the prediction of $V_{min}$ at the subsequent read points, which enables in-field failure prediction, we use on-chip monitor data collected at all previous read points and parametric data collected at time 0, because parametric tests are no longer possible once chips are shipped to customers and deployed in the field.
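The feature assembly described above can be sketched as follows. The function `build_features` and the toy values are hypothetical, and the sketch assumes that "previous read points" includes time 0 and excludes the read point being predicted.

```python
READ_POINTS = [0, 24, 48, 168, 504, 1008]  # stress read points in hours


def build_features(parametric_t0, onchip_by_read_point, target_hour):
    """Concatenate time-0 parametric data with the on-chip data usable at target_hour."""
    if target_hour not in READ_POINTS:
        raise ValueError(f"unknown read point: {target_hour}")
    if target_hour == 0:
        usable_hours = [0]  # production test: on-chip data measured at time 0
    else:
        # In-field prediction: only on-chip data from earlier read points (assumption).
        usable_hours = [h for h in READ_POINTS if h < target_hour]
    features = list(parametric_t0)
    for hour in usable_hours:
        # On-chip data at different read points are treated as different features.
        features.extend(onchip_by_read_point[hour])
    return features


# Toy chip with 3 parametric values and 2 on-chip readings per read point.
parametric = [0.1, 0.2, 0.3]
onchip = {h: [float(h), float(h) + 0.5] for h in READ_POINTS}
features_t0 = build_features(parametric, onchip, target_hour=0)
features_t48 = build_features(parametric, onchip, target_hour=48)
```

The feature vector therefore grows with the read point, which is one reason a small, well-selected feature set matters for a dataset of this size.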
For point prediction, the performance criteria are the coefficient of determination ($R^2$) and the Root Mean Square Error (RMSE). For region prediction, the metrics are the average interval length and the coverage of the true $V_{min}$ on the testing data.
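For reference, the four metrics can be written out as below. The function names and the toy numbers are hypothetical; the definitions are the standard ones.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of point predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def coverage_rate(y_true, lowers, uppers):
    """Fraction of true values that fall inside their prediction interval."""
    hits = sum(1 for t, lo, hi in zip(y_true, lowers, uppers) if lo <= t <= hi)
    return hits / len(y_true)


def average_length(lowers, uppers):
    """Mean length of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lowers, uppers)) / len(lowers)


# Toy values for illustration only.
y_true = [600.0, 610.0, 620.0, 630.0]
y_pred = [602.0, 608.0, 621.0, 629.0]
lowers = [p - 1.5 for p in y_pred]
uppers = [p + 1.5 for p in y_pred]
```

Coverage and average length must be read together: an interval predictor can trivially reach high coverage with very long intervals, or very short intervals with poor coverage.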
To reduce the influence of randomness, 4-fold cross-validation is adopted. We report the average score of each metric across the 4 testing folds. In CQR, 75% of the training data are used to train predictors while the remaining 25% of chips are held out for calibration. To ensure a fair comparison, we use the same random seed for all interval predictors.
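One possible way to organize these splits is sketched below. The helper `cqr_cv_splits` is hypothetical, and the sketch assumes that the 75%/25% training/calibration split is taken within the three training folds of each cross-validation round.

```python
import random


def cqr_cv_splits(n_chips, n_folds=4, cal_fraction=0.25, seed=0):
    """Yield (train, calibration, test) index lists for each cross-validation fold."""
    indices = list(range(n_chips))
    random.Random(seed).shuffle(indices)  # a fixed seed gives every predictor the same splits
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_cal = round(cal_fraction * len(rest))
        yield rest[n_cal:], rest[:n_cal], test


splits = list(cqr_cv_splits(n_chips=156))
```

With 156 chips, each round under these assumptions uses 88 chips for training, 29 for calibration, and 39 for testing.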
IV-C Descriptions of Point Regressors
ML models with fewer learnable parameters and simpler structures are more favorable for our high-dimensional small data scenario. Moreover, feature selection is an essential dimension reduction technique for some ML models to avoid overfitting problems.
First, we demonstrate model selection for $V_{min}$ point prediction. Five regressors are considered: Linear Regression (LR), Gaussian Process (GP) [6], XGBoost [12], CatBoost [13], and a 2-layer Neural Network (NN). The detailed configurations of each regressor except LR are provided below:
IV-C1 Gaussian Process
GP utilizes a radial basis function kernel, whose parameters are optimized to maximize the likelihood of training data.
IV-C2 XGBoost
We utilize the default hyperparameters in the XGBoost Python package.
IV-C3 CatBoost
We utilize the default hyperparameters in the CatBoost Python package except for one: the number of boosting trees. The default number is 1000, which seems too large for our small dataset of 156 chips and potentially causes over-fitting. Therefore, we reduce it to 100.
IV-C4 Neural Network
We consider a shallow fully-connected multilayer perceptron (MLP) with one hidden layer of 16 neurons and Rectified Linear Unit (ReLU) [14] activation functions. The optimizer is Adam [15] with a learning rate of 0.01, the number of epochs is 3000, and the penalty weight is 0.1. These configurations are the same as in [5].
Then, we discuss how to select a small set of informative features from thousands of input features. For XGBoost and CatBoost, which have an intrinsic feature selection mechanism, all raw data are directly fed to the regressors. For the remaining three methods, we apply Correlation Feature Selection (CFS) [16] with the Pearson correlation to pick 1 to 10 features as input data and report the best testing scores.
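As a simplified illustration of correlation-based selection, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top $k$. It is an assumption-laden simplification: the full CFS procedure in [16] also accounts for redundancy among the selected features, and the names `pearson` and `top_k_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_features(columns, target, k):
    """Indices of the k feature columns with the largest |Pearson correlation| to the target."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], target)),
                    reverse=True)
    return ranked[:k]


# Toy data: feature 0 is perfectly correlated, feature 1 is anti-correlated, feature 2 is constant.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
selected = top_k_features(columns, target, k=2)
```

Using the absolute value keeps strongly anti-correlated features, which are as informative to a linear model as positively correlated ones.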
IV-D Point Prediction Results
The $R^2$ of point predictions of the regression models is depicted in Fig. 2. For SCAN $V_{min}$ tested at time 0, CatBoost is the best method across all three temperatures, while linear regression also performs well, with an $R^2$ drop of less than 0.03. For all methods except GP, the RMSE of point predictions stays within one common range across all scenarios (GP falls in a separate range), and the comparison among models mirrors that of $R^2$, i.e., CatBoost performs best for time 0 prediction while linear regression performs reasonably well overall. As linear regression is straightforward to implement in either software or hardware, it is a sufficiently good option for time 0 $V_{min}$ prediction in industrial production tests.
For $V_{min}$ degradation prediction, no regression model outperforms the rest across all temperatures and stress read points in terms of $R^2$ and RMSE. We note that linear regression still performs reasonably well, and is even the best model for predicting SCAN $V_{min}$ at 25°C and 125°C, for both $R^2$ and RMSE. Given its simplicity, implementing a linear regression model with an on-chip hardware accelerator seems to be a viable option for in-field $V_{min}$ degradation prediction.
In addition, an interesting observation is that there is no clear reduction of $R^2$, i.e., of SCAN $V_{min}$ degradation prediction accuracy, from 0 to 1008 hours. This suggests that our on-chip monitors capture informative gate-level features that correlate strongly with system-level $V_{min}$.
TABLE III
Stress Time (Hour) | Method | -45°C Length (mV) | -45°C Coverage (%) | 25°C Length (mV) | 25°C Coverage (%) | 125°C Length (mV) | 125°C Coverage (%) |
0 | GP | 61.96 | 85.9 | 48.56 | 93.59 | 51.88 | 89.1 |
0 | QR Linear Regression | 51.0 | 91.03 | 14.14 | 83.33 | 15.98 | 83.33 |
0 | QR Neural Network | 30.44 | 66.84 | 18.28 | 53.91 | 21.33 | 52.83 |
0 | QR XGBoost | 50.31 | 51.28 | 28.22 | 89.1 | 30.96 | 82.05 |
0 | QR CatBoost | 2.48 | 10.26 | 0.98 | 14.1 | 1.37 | 24.36 |
0 | CQR Linear Regression | 53.76 | 92.95 | 17.37 | 95.51 | 19.39 | 91.03 |
0 | CQR Neural Network | 114.3 | 94.81 | 52.75 | 93.11 | 77.54 | 94.01 |
0 | CQR XGBoost | 60.84 | 95.51 | 31.91 | 92.95 | 48.48 | 98.72 |
0 | CQR CatBoost | 24.11 | 91.67 | 13.94 | 92.95 | 12.72 | 91.67 |
24 | GP | 56.76 | 84.93 | 48.64 | 94.87 | 50.53 | 87.74 |
24 | QR Linear Regression | 26.7 | 85.62 | 18.3 | 80.13 | 13.28 | 85.16 |
24 | QR Neural Network | 24.19 | 68.67 | 16.33 | 49.52 | 19.78 | 53.68 |
24 | QR XGBoost | 43.27 | 39.04 | 32.64 | 87.18 | 30.28 | 86.45 |
24 | QR CatBoost | 1.54 | 3.42 | 1.38 | 19.87 | 1.77 | 20.65 |
24 | CQR Linear Regression | 43.1 | 99.32 | 20.68 | 89.74 | 17.07 | 95.48 |
24 | CQR Neural Network | 117.82 | 97.01 | 53.66 | 93.34 | 84.99 | 95.45 |
24 | CQR XGBoost | 65.3 | 99.32 | 43.5 | 92.95 | 42.41 | 92.9 |
24 | CQR CatBoost | 27.1 | 97.95 | 16.58 | 94.87 | 15.34 | 93.55 |
48 | GP | 56.83 | 81.13 | 49.84 | 89.72 | 53.84 | 82.24 |
48 | QR Linear Regression | 29.77 | 84.91 | 20.03 | 81.31 | 13.98 | 82.24 |
48 | QR Neural Network | 29.66 | 68.04 | 44.71 | 92.05 | 26.14 | 50.79 |
48 | QR XGBoost | 45.43 | 45.28 | 35.78 | 85.98 | 48.6 | 84.11 |
48 | QR CatBoost | 1.64 | 11.32 | 1.07 | 16.82 | 1.79 | 19.63 |
48 | CQR Linear Regression | 36.92 | 93.4 | 29.34 | 94.39 | 20.61 | 93.46 |
48 | CQR Neural Network | 100.62 | 95.59 | 58.75 | 95.62 | 80.64 | 95.07 |
48 | CQR XGBoost | 62.81 | 98.11 | 49.82 | 94.39 | 55.12 | 95.33 |
48 | CQR CatBoost | 24.3 | 95.28 | 29.61 | 96.26 | 19.23 | 89.72 |
168 | GP | 54.45 | 79.81 | 50.43 | 84.91 | 54.42 | 85.58 |
168 | QR Linear Regression | 26.05 | 81.73 | 44.0 | 89.62 | 12.27 | 81.73 |
168 | QR Neural Network | 27.74 | 72.68 | 43.56 | 84.12 | 26.03 | 48.32 |
168 | QR XGBoost | 38.27 | 75.96 | 39.89 | 84.91 | 49.65 | 85.58 |
168 | QR CatBoost | 1.81 | 19.23 | 0.71 | 13.21 | 1.78 | 20.19 |
168 | CQR Linear Regression | 36.28 | 92.31 | 51.35 | 94.34 | 17.09 | 89.42 |
168 | CQR Neural Network | 82.98 | 95.33 | 60.16 | 95.48 | 80.99 | 95.42 |
168 | CQR XGBoost | 56.65 | 96.15 | 48.61 | 94.34 | 57.75 | 92.31 |
168 | CQR CatBoost | 28.71 | 93.27 | 20.49 | 91.51 | 20.49 | 92.31 |
504 | GP | 52.61 | 77.0 | 52.63 | 88.46 | 54.23 | 79.61 |
504 | QR Linear Regression | 25.46 | 83.0 | 37.71 | 88.46 | 26.14 | 88.35 |
504 | QR Neural Network | 25.51 | 70.39 | 46.33 | 92.16 | 48.65 | 83.49 |
504 | QR XGBoost | 35.9 | 78.0 | 43.14 | 84.62 | 47.71 | 83.5 |
504 | QR CatBoost | 1.43 | 12.0 | 1.54 | 18.27 | 2.24 | 20.39 |
504 | CQR Linear Regression | 31.2 | 91.0 | 45.21 | 93.27 | 32.05 | 94.17 |
504 | CQR Neural Network | 66.13 | 93.37 | 53.44 | 92.79 | 72.25 | 94.76 |
504 | CQR XGBoost | 46.81 | 93.0 | 46.83 | 87.5 | 58.74 | 96.12 |
504 | CQR CatBoost | 21.17 | 96.0 | 19.01 | 92.31 | 16.15 | 94.17 |
1008 | GP | 53.18 | 78.12 | 52.45 | 91.84 | 53.22 | 82.65 |
1008 | QR Linear Regression | 29.75 | 88.54 | 42.63 | 88.78 | 32.28 | 80.61 |
1008 | QR Neural Network | 20.2 | 50.3 | 19.89 | 39.14 | 31.47 | 51.9 |
1008 | QR XGBoost | 37.18 | 79.17 | 45.19 | 84.69 | 46.0 | 82.65 |
1008 | QR CatBoost | 1.72 | 17.71 | 1.64 | 13.27 | 1.89 | 24.49 |
1008 | CQR Linear Regression | 32.3 | 89.58 | 47.25 | 94.9 | 36.53 | 91.84 |
1008 | CQR Neural Network | 78.55 | 98.2 | 66.8 | 93.08 | 65.86 | 92.25 |
1008 | CQR XGBoost | 44.14 | 89.58 | 47.11 | 91.84 | 51.44 | 96.94 |
1008 | CQR CatBoost | 17.64 | 93.75 | 18.7 | 94.9 | 14.68 | 89.8 |
IV-E Descriptions of Region Regressors
We consider three interval prediction methods: GP, QR, and CQR. QR and CQR are built on 4 point regressors: LR, NN, XGBoost, and CatBoost. The configurations of these models are the same as those in Section IV-C. We set $\alpha = 0.1$, so that the predictors target a 90% interval spanning the 5% to 95% quantiles.
IV-F Region Prediction Results
The average length of prediction intervals of SCAN $V_{min}$ and the coverage rates are shown in Table III. Both GP and QR frequently underestimate the interval for testing chips, failing to meet the target coverage rate in most settings. CQR, in contrast, successfully calibrates the undercovered interval predictions of QR across all stress read points and temperatures, underscoring the importance of applying conformal prediction for reliable region predictions.
CQR performs differently with different point regression models. The best variant is CQR CatBoost, which achieves the shortest intervals with a coverage rate of around 90%. While LR is competitive for point prediction in Section IV-D, its CQR version predicts larger intervals than CQR CatBoost, especially for SCAN $V_{min}$ at -45°C and 25°C.
TABLE IV
Feature type | Avg Interval Length (mV), -45°C | Avg Interval Length (mV), 25°C | Avg Interval Length (mV), 125°C | Avg Interval Length (mV), Average |
Parametric | 29.44 | 24.38 | 22.14 | 25.32 |
On-chip | 29.32 | 22.22 | 19.44 | 23.66 |
On-chip and Parametric | 23.84 | 19.72 | 16.43 | 20.00 |
On-chip monitor gain | 19.02% | 19.11% | 25.79% | 21.01% |
IV-G Benefits of On-chip Monitors
We present evidence supporting the value of on-chip monitor data in the prediction of $V_{min}$ intervals. Fig. 3 illustrates the interval length of CQR CatBoost with three types of feature sets: 1) parametric test data and on-chip monitor data (the same as in Section IV-F), 2) parametric test data only, and 3) on-chip monitor data only. In addition, Table IV summarizes the average length across all read points of SCAN $V_{min}$ during stress.
Compared to utilizing parametric data only, the inclusion of on-chip monitor data results in a reduction of 21.01% in the average interval length. Intriguingly, a CQR CatBoost model relying solely on on-chip monitor data outperforms the same model using only parametric test data, despite the much larger number of parametric features (Table II). This implies that the on-chip monitor data could contain more information that facilitates $V_{min}$ estimation.
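The gain figures in Table IV can be reproduced as the relative reduction of the interval length when on-chip monitor data are added to the parametric data. The short check below uses the values from Table IV; the function and dictionary names are hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent when going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# Average interval lengths from Table IV.
parametric_only = {"-45C": 29.44, "25C": 24.38, "125C": 22.14, "average": 25.32}
onchip_and_parametric = {"-45C": 23.84, "25C": 19.72, "125C": 16.43, "average": 20.00}

gains = {
    key: round(relative_reduction(parametric_only[key], onchip_and_parametric[key]), 2)
    for key in parametric_only
}
```

For the average column, for instance, $(25.32 - 20.00) / 25.32 \approx 21.01\%$, which matches the "On-chip monitor gain" row.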
V Conclusion
We propose a distribution-free $V_{min}$ interval estimation framework with a statistical coverage guarantee. By harnessing CQR in conjunction with on-chip monitor data, our approach achieves an average interval length of 20.00 mV (Table IV) with a 90% coverage rate for true $V_{min}$ values on our industrial dataset. In the future, we will explore how to embed the proposed method 1) in the production test flow to accelerate the test and enhance the yield while screening out outliers, and 2) in in-field systems to secure long-term reliability and safety.
Acknowledgment
The content of this paper has been developed with the support of Grant No. 1956313 from the National Science Foundation (NSF) and has also received partial funding from a Long Term University (LTU) grant provided by NXP.
References
- [1] C. He and Y. Yu, “Wafer level stress: Enabling zero defect quality for automotive microcontrollers without package burn-in,” in 2020 IEEE International Test Conference (ITC), 2020, pp. 1–10.
- [2] T.-B. Chan, P. Gupta, A. B. Kahng, and L. Lai, “DDRO: A novel performance monitoring methodology based on design-dependent ring oscillators,” in Thirteenth International Symposium on Quality Electronic Design (ISQED), 2012, pp. 633–640.
- [3] J. Chen, J. Zeng, L.-C. Wang, J. Rearick, and M. Mateja, “Selecting the most relevant structural Fmax for system Fmax correlation,” in 2010 28th VLSI Test Symposium (VTS), 2010, pp. 99–104.
- [4] W.-C. Lin, C. Chen, C.-H. Hsieh, J. C.-M. Li, E. J.-W. Fang, and S. S.-Y. Hsueh, “ML-assisted Vmin binning with multiple guard bands for low power consumption,” in 2022 IEEE International Test Conference (ITC), 2022, pp. 213–218.
- [5] Y. Yin, R. Chen, C. He, and P. Li, “Domain-specific machine learning based minimum operating voltage prediction using on-chip monitor data,” in 2023 IEEE International Test Conference (ITC), 2023, pp. 99–104.
- [6] D. J. MacKay, Information theory, inference and learning algorithms. Cambridge University Press, 2003.
- [7] L. V. Jospin, H. Laga, F. Boussaid, W. Buntine, and M. Bennamoun, “Hands-on bayesian neural networks—a tutorial for deep learning users,” IEEE Computational Intelligence Magazine, vol. 17, no. 2, pp. 29–48, 2022.
- [8] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” Advances in neural information processing systems, vol. 30, 2017.
- [9] R. Koenker and G. Bassett Jr, “Regression quantiles,” Econometrica: journal of the Econometric Society, pp. 33–50, 1978.
- [10] G. Shafer and V. Vovk, “A tutorial on conformal prediction.” Journal of Machine Learning Research, vol. 9, no. 3, 2008.
- [11] Y. Romano, E. Patterson, and E. Candes, “Conformalized quantile regression,” Advances in neural information processing systems, vol. 32, 2019.
- [12] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.
- [13] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “CatBoost: unbiased boosting with categorical features,” Advances in neural information processing systems, vol. 31, 2018.
- [14] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning, ser. ICML’10. Madison, WI, USA: Omnipress, 2010, pp. 807–814.
- [15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- [16] M. A. Hall, “Correlation-based feature selection for machine learning,” Ph.D. dissertation, The University of Waikato, 1999.