Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile Regression

Yuxuan Yin Dept. of ECE
University of California
Santa Barbara, USA
[email protected]
   Xiaoxiao Wang NXP Semiconductors
Austin, USA
[email protected]
   Rebecca Chen NXP Semiconductors
Austin, USA
[email protected]
   Chen He NXP Semiconductors
Austin, USA
[email protected]
   Peng Li Dept. of ECE
University of California
Santa Barbara, USA
[email protected]
Abstract

Predicting the minimum operating voltage ($V_{min}$) of chips is an important technique for improving the manufacturing test flow, as well as for ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variation. While some existing techniques offer region predictions, they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free $V_{min}$ interval estimation methodology possessing a theoretical guarantee of coverage. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors significantly reduces the interval length for $V_{min}$ prediction.

Index Terms:
chip performance prediction, on-chip monitors, conformal prediction, quantile regression

I Introduction

Measurement of the minimum operating voltage ($V_{min}$) is one of the important testing procedures for determining chip performance. It facilitates the detection of inferior products, the reduction of power consumption, and the indication of potential early-life failures. As technology nodes keep scaling, $V_{min}$ tests via structural test patterns (e.g., SCAN) become increasingly crucial for screening out tiny flaws and defects [1] inside chips.

Conventional $V_{min}$ measurements involve testing chips at a high operating voltage and decreasing it step by step until they fail, which is time-consuming. Moreover, such a strategy is applicable only in the manufacturing test process, not in in-field systems. To this end, researchers have proposed building machine-learning-based $V_{min}$ predictors utilizing low-cost features, such as parametric testing data from the production test flow and on-chip monitor data for in-field prediction [2, 3, 4, 5]. Many regression models have been explored recently, including linear regression [4], Gaussian Process (GP) [3], and Neural Network (NN) [5]. For instance, Chen et al. demonstrated a low-cost approach to predict the system $F_{max}$ (the maximum operating frequency) from the structural $F_{max}$ of flip-flops [3] via a GP model, whose kernel length-scale hyperparameters are used as indicators of feature significance. Yin et al. adopted a constrained NN to capture the monotonicity between RO delay and $V_{min}$ degradation [5]. Although these methods provide promising point estimates of $V_{min}$, additional techniques are still required to construct prediction intervals that cover the true $V_{min}$ with high probability, accounting for uncertainties due to variations of process, voltage, temperature, operating frequency, application mode, etc.

Uncertainty Quantification (UQ) for machine learning provides the model’s confidence interval. Commonly employed UQ methods include 1) Bayesian approaches such as GP [6] and Bayesian neural networks [7], 2) neural network ensembles [8], and 3) Quantile Regression (QR) [9]. While these methods excel at estimating uncertainty within the training data distribution, their prediction intervals often lack a reliable coverage guarantee for new testing data. Consequently, none of these approaches fully meet the stringent demands of the silicon industry for generating robust $V_{min}$ intervals to ensure high reliability.

Table I: Comparison of uncertainty quantification methods
Property Bayesian Ensemble QR CP CQR
Distribution-free
Model-agnostic
Coverage guarantee for test data
Adaptation to heteroscedasticity
Computational efficiency

Conformal Prediction (CP) [10] emerges as a promising distribution-free UQ method for constructing intervals based on any point predictor while offering a nonasymptotic coverage guarantee. CP leverages a calibration dataset to assess the uncertainty associated with a fitted regression model by analyzing its prediction residuals. However, vanilla CP exhibits limitations as a $V_{min}$ region predictor, as it constructs constant-width intervals for all testing samples, potentially leading to excessive margins for normal chips and inadequate coverage for anomalous ones.

To this end, we propose a distribution-free $V_{min}$ interval prediction framework with a theoretical coverage guarantee. Our approach leverages Conformalized Quantile Regression (CQR) and on-chip monitors to construct prediction intervals. Our primary contributions are outlined as follows:

• We conduct a comprehensive comparison among various $V_{min}$ point predictors for our industrial dataset. We discover that while no golden model outperforms the others in all scenarios, the prediction accuracy of linear regression is competitive overall. Moreover, on-chip monitors are capable of predicting future $V_{min}$ degradation.

• We introduce CQR to the context of $V_{min}$ interval estimation, showcasing its better performance in terms of coverage rate and interval length when compared to alternative UQ models.

• Through empirical analysis, we demonstrate that the inclusion of on-chip monitor data yields substantial improvements in the precision of interval predictions.

Figure 1: $V_{min}$ prediction flow

II Preliminaries

II-A Point Prediction

For the task of $V_{min}$ point estimation, in both the production test flow and in-field scenarios, the objective remains consistent: utilizing a set of features to predict a single value. We denote these features as a $D$-dimensional vector $\bm{\mathrm{x}}\in\mathbb{R}^{D}$, the $V_{min}$ as a real number $\mathrm{y}\in\mathbb{R}$, and the point predictor as $g_{p}(\cdot;\bm{\theta}):\mathbb{R}^{D}\to\mathbb{R}$, parameterized by $\bm{\theta}$. Given a training dataset of $N$ tested chips $\mathcal{D}=\{(\bm{\mathrm{x}}_{i},\mathrm{y}_{i})\}_{i=1}^{N}$, the predictor is optimized by minimizing the mean of a loss function $\mathcal{L}_{p}$:

$$\bm{\theta}^{*}=\operatorname*{arg\,min}_{\bm{\theta}}\;\mathcal{L}_{p}\big(g_{p}(\bm{\mathrm{X}};\bm{\theta}),\bm{\mathrm{y}}\big), \quad (1)$$

where $\bm{\mathrm{X}}=[\bm{\mathrm{x}}_{1},\cdots,\bm{\mathrm{x}}_{N}]^{T}\in\mathbb{R}^{N\times D}$ is the matrix of inputs and $\bm{\mathrm{y}}=[\mathrm{y}_{1},\cdots,\mathrm{y}_{N}]^{T}\in\mathbb{R}^{N}$ is the vector of true $V_{min}$ values.
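As a concrete instance of Eq. (1), the following is a minimal sketch of fitting a linear $V_{min}$ point predictor with the MSE loss; the feature and label arrays are hypothetical placeholders, not the actual dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholders: N chips with D low-cost features each,
# and the measured SCAN Vmin (in volts) as the label.
rng = np.random.default_rng(0)
X_train = rng.random((100, 20))               # stand-in for parametric/on-chip features
y_train = 0.55 + 0.05 * rng.random(100)       # stand-in for measured Vmin

# Fit g_p(.; theta) by minimizing the mean squared error of Eq. (1).
point_model = LinearRegression().fit(X_train, y_train)
vmin_hat = point_model.predict(X_train[:5])   # point predictions for five chips
```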

II-B Region Prediction

In manufacturing test processes, engineers often face risks of over-kill or under-kill when relying solely on $V_{min}$ point predictions to identify abnormal products due to process variations. In in-field scenarios, point estimation can be highly unreliable due to the presence of numerous environmental uncertainties. Consequently, the utilization of prediction intervals becomes essential for effectively detecting outliers and identifying potential failures.

Unlike point estimation, which only generates a single value for an input example, region prediction provides an interval. A region regressor $g_{r}(\cdot;\bm{\theta}_{lo},\bm{\theta}_{hi}):\mathbb{R}^{D}\to\mathbb{R}^{2}$, consisting of a pair of lower and upper bound functions $g_{p}(\cdot;\bm{\theta}_{lo}):\mathbb{R}^{D}\to\mathbb{R}$ and $g_{p}(\cdot;\bm{\theta}_{hi}):\mathbb{R}^{D}\to\mathbb{R}$, maps a sample $\bm{\mathrm{x}}$ to a closed region $C(\bm{\mathrm{x}})$:

$$C(\bm{\mathrm{x}})=\big[g_{p}(\bm{\mathrm{x}};\bm{\theta}_{lo}),\; g_{p}(\bm{\mathrm{x}};\bm{\theta}_{hi})\big]. \quad (2)$$

Given a coverage rate $1-\alpha$, where $\alpha\in[0,1]$, and the training dataset $\mathcal{D}$, the prediction intervals of a region regressor $g_{r}$ should cover at least a $1-\alpha$ fraction of the labels:

$$\mathbb{P}\big\{\mathrm{y}\in C(\bm{\mathrm{x}})\,\big|\,(\bm{\mathrm{x}},\mathrm{y})\in\mathcal{D}\big\}\geq 1-\alpha. \quad (3)$$

We introduce two well-known region regression methods satisfying Eq. 3: Gaussian process and quantile regression. Their theoretical traits are summarized in Table I.

II-B1 Gaussian Process (GP)

GP is a non-parametric Bayesian method that provides a posterior Gaussian distribution for any testing point [6]. Suppose the posterior mean is $\mu(\bm{\mathrm{x}})\in\mathbb{R}$ and the posterior variance is $\sigma^{2}(\bm{\mathrm{x}})\geq 0$ for sample $\bm{\mathrm{x}}$; we can then construct an interval $C(\bm{\mathrm{x}})$ satisfying Eq. 3:

$$C(\bm{\mathrm{x}})=\big[\mu(\bm{\mathrm{x}})+K_{lo}\sigma(\bm{\mathrm{x}}),\;\mu(\bm{\mathrm{x}})+K_{hi}\sigma(\bm{\mathrm{x}})\big], \quad (4)$$

where $K_{lo}=\Phi^{-1}(\alpha/2)<0$, $K_{hi}=\Phi^{-1}(1-\alpha/2)>0$, and $\Phi$ is the cumulative distribution function of the standard Gaussian distribution.
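A minimal sketch of the Gaussian interval in Eq. (4), assuming a scikit-learn GP with an RBF kernel and hypothetical data; the constants $K_{lo}$ and $K_{hi}$ come from the standard Gaussian quantile function.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

alpha = 0.1                                    # 1 - alpha = 90% target coverage
rng = np.random.default_rng(0)
X_train = rng.random((100, 5))                 # hypothetical features
y_train = 0.55 + 0.05 * rng.random(100)        # hypothetical Vmin labels
X_test = rng.random((10, 5))

gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_train, y_train)
mu, sigma = gp.predict(X_test, return_std=True)             # posterior mean and std per chip

k_lo, k_hi = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)   # Eq. (4) constants
lower, upper = mu + k_lo * sigma, mu + k_hi * sigma         # interval C(x) per chip
```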

II-B2 Quantile Regression (QR)

Unlike traditional regression analysis with the Mean Square Error (MSE) loss, which estimates the conditional mean of $V_{min}$, QR estimates a conditional quantile [9]. Given a quantile $q\in[0,1]$, a QR model is trained to minimize the quantile loss [9] $\mathcal{L}_{q}$ in Eq. 1:

$$\mathcal{L}_{q}(\mathrm{y},\hat{\mathrm{y}}):=\max\big\{q(\mathrm{y}-\hat{\mathrm{y}}),\,(1-q)(\hat{\mathrm{y}}-\mathrm{y})\big\}, \quad (5)$$

where $\hat{\mathrm{y}}=g_{p}(\bm{\mathrm{x}};\bm{\theta})$ is the predicted quantile of $V_{min}$.

By selecting two different quantiles $q_{lo}=\alpha/2$ and $q_{hi}=1-\alpha/2$, we can train two quantile regressors, the interval between which achieves the coverage in Eq. 3.

QR can easily be added to any point regressor whose objective is to minimize the MSE loss, by applying the pinball loss instead.
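A minimal sketch of such a lower/upper quantile pair, assuming gradient-boosted quantile regressors from scikit-learn (any regressor trained with the pinball loss of Eq. (5) would serve) and hypothetical data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

alpha = 0.1
q_lo, q_hi = alpha / 2, 1 - alpha / 2          # 5% and 95% quantiles

rng = np.random.default_rng(0)
X_train = rng.random((100, 5))                 # hypothetical features
y_train = 0.55 + 0.05 * rng.random(100)        # hypothetical Vmin labels

# Two regressors trained with the pinball (quantile) loss of Eq. (5).
qr_lo = GradientBoostingRegressor(loss="quantile", alpha=q_lo).fit(X_train, y_train)
qr_hi = GradientBoostingRegressor(loss="quantile", alpha=q_hi).fit(X_train, y_train)

X_test = rng.random((10, 5))
lower, upper = qr_lo.predict(X_test), qr_hi.predict(X_test)   # C(x) = [lower, upper]
```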

III Methodology

III-A Overview of $V_{min}$ Prediction

Our $V_{min}$ prediction framework is depicted in Fig. 1, where four stress read points are drawn for illustration. $V_{min}$ at each stress read point is predicted. The horizontal dashed line (min_spec) represents the product specification of the minimum operating voltage; a device with $V_{min}$ higher than this threshold violates the specification and is likely to become a failure.

We utilize low-cost parametric data and on-chip data to predict $V_{min}$ at time zero and at subsequent read points during stress that simulates in-field life. Note that stress is done at an elevated voltage so that a much shorter stress duration is equivalent to a much longer in-field lifetime. Specifically, two kinds of $V_{min}$ prediction scenarios are considered: the production test flow, and in-field deployment simulated by accelerated stress. In the first case, both production parametric test data and on-chip data are included to build $V_{min}$ predictors. In the second case, however, we predict $V_{min}$ degradation based on all features accessible before the $V_{min}$ test timestamp, including production parametric test data at time zero and on-chip data measured at all previous read points during stress. In our industrial dataset, both $V_{min}$ and on-chip data are collected at the same read points, and the total number of read points is relatively small, i.e., fewer than 10. In this case, time-series methods would suffer from over-fitting. Thus, we treat on-chip data at different read points as different features, and apply CQR to predict $V_{min}$ intervals.
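To make this feature construction concrete, here is a minimal sketch of assembling the inputs for predicting $V_{min}$ at a given read point; the array shapes and names are hypothetical placeholders chosen only to mirror the description above (time-0 parametric data plus on-chip data from read points up to the prediction point, with each read point treated as a separate set of features).

```python
import numpy as np

# Hypothetical shapes: 156 chips, 1800 parametric features at time 0,
# and 178 on-chip monitor values (ROD + CPD) at each of the 6 read points.
N, P, M = 156, 1800, 178
read_points = [0, 24, 48, 168, 504, 1008]             # stress read points (hours)
rng = np.random.default_rng(0)
parametric_t0 = rng.random((N, P))                    # placeholder parametric test data
on_chip = rng.random((N, len(read_points), M))        # placeholder on-chip monitor data

def features_for_read_point(t_idx):
    """Stack time-0 parametric data with on-chip data from read points 0..t_idx,
    treating each read point's monitor values as separate features
    (tighten the slice if only strictly earlier read points are available)."""
    history = on_chip[:, :t_idx + 1, :].reshape(N, -1)   # flatten the read-point axis
    return np.hstack([parametric_t0, history])

X_48h = features_for_read_point(read_points.index(48))  # inputs for Vmin at 48 hours
```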

Since CQR originates from CP, we first briefly summarize how CP works, and then present CQR for $V_{min}$ interval prediction.

III-B Conformal Prediction (CP)

Even though the coverage of prediction intervals is guaranteed on the training dataset $\mathcal{D}$ for GP and QR, this property is not guaranteed to hold for a testing instance $(\bm{\mathrm{x}}_{N+1},\mathrm{y}_{N+1})$:

$$\mathbb{P}\big\{\mathrm{y}_{N+1}\in C(\bm{\mathrm{x}}_{N+1})\big\}\geq 1-\alpha. \quad (6)$$

The adoption of the aforementioned two region predictors for new examples is risky without the coverage guarantee.

In the semiconductor industry, all chips can be viewed as examples from a hidden distribution: $\{(\bm{\mathrm{x}}_{i},\mathrm{y}_{i})\}_{i=1}^{N+1}$ are sampled i.i.d. from a distribution $P_{XY}$. CP can calibrate any heuristic interval to meet the coverage guarantee in Eq. 6 [10]. CP has two main versions: full CP and split CP. In regression tasks, full CP requires an infinite number of model fits, rendering it impractical. In contrast, split CP is more computationally efficient, at the cost of splitting the training dataset.

We outline how split CP utilizes a $V_{min}$ point predictor $g_{p}$ to generate an interval $C(\bm{\mathrm{x}})$ for $\bm{\mathrm{x}}$:

• Split the training dataset $\mathcal{D}$ into a new training dataset $\mathcal{D}_{tr}$ and a small calibration dataset $\mathcal{D}_{ca}$ such that $\mathcal{D}_{tr}\cup\mathcal{D}_{ca}=\mathcal{D}$ and $\mathcal{D}_{tr}\cap\mathcal{D}_{ca}=\emptyset$.

• Fit the point regressor $g_{p}$ on $\mathcal{D}_{tr}$.

• Compute $\hat{q}$ as the $\lceil(M+1)(1-\alpha)\rceil/M$-th quantile of the conformal score $s(\bm{\mathrm{x}},\mathrm{y})$, the absolute residual, over the calibration set $\mathcal{D}_{ca}$:

$$s(\bm{\mathrm{x}},\mathrm{y})=|\mathrm{y}-g_{p}(\bm{\mathrm{x}};\bm{\theta})|, \quad (7)$$

where $M$ is the number of examples in $\mathcal{D}_{ca}$.

• Construct the interval for $\bm{\mathrm{x}}_{N+1}$, satisfying Eq. 6:

$$C(\bm{\mathrm{x}}_{N+1})=\big[g_{p}(\bm{\mathrm{x}}_{N+1};\bm{\theta})-\hat{q},\; g_{p}(\bm{\mathrm{x}}_{N+1};\bm{\theta})+\hat{q}\big]. \quad (8)$$
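A minimal sketch of these split-CP steps, wrapping any fitted point regressor; the array shapes and helper name are hypothetical.

```python
import numpy as np

def split_cp_interval(point_model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction around a fitted point regressor (Eqs. (7)-(8))."""
    # Conformal scores: absolute residuals on the held-out calibration set, Eq. (7).
    scores = np.abs(y_cal - point_model.predict(X_cal))
    m = len(scores)
    # ceil((M+1)(1-alpha))/M-th empirical quantile of the calibration scores.
    level = min(np.ceil((m + 1) * (1 - alpha)) / m, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    # Constant-width band around the point prediction, Eq. (8).
    pred = point_model.predict(X_test)
    return pred - q_hat, pred + q_hat
```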

III-C Conformalized Quantile Regression (CQR)

While split CP satisfies the coverage guarantee, the length of its predicted intervals is $2\hat{q}$, which remains fixed across different inputs. This property may incur overkill of good products and underkill of defective ones. CQR, in contrast, is an interval prediction method that combines CP and QR.

We describe the procedure of split CQR:

• Split the training dataset $\mathcal{D}$ into $\mathcal{D}_{tr}$ and $\mathcal{D}_{ca}$ as in split CP.

• Fit the quantile regressor $g_{r}$ on $\mathcal{D}_{tr}$.

• Compute $\hat{q}$ as the $\lceil(M+1)(1-\alpha)\rceil/M$-th quantile of the conformal score $s(\bm{\mathrm{x}},\mathrm{y})$ over $\mathcal{D}_{ca}$, where

$$s(\bm{\mathrm{x}},\mathrm{y})=\max\big\{g_{p}(\bm{\mathrm{x}};\bm{\theta}_{lo})-\mathrm{y},\; \mathrm{y}-g_{p}(\bm{\mathrm{x}};\bm{\theta}_{hi})\big\}. \quad (9)$$

• Construct the interval for $\bm{\mathrm{x}}_{N+1}$ satisfying Eq. 6:

$$C(\bm{\mathrm{x}}_{N+1})=\big[g_{p}(\bm{\mathrm{x}}_{N+1};\bm{\theta}_{lo})-\hat{q},\; g_{p}(\bm{\mathrm{x}}_{N+1};\bm{\theta}_{hi})+\hat{q}\big]. \quad (10)$$
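A minimal sketch of this CQR calibration step, reusing the two pinball-loss quantile regressors sketched in Section II-B2; the helper name is hypothetical.

```python
import numpy as np

def split_cqr_interval(qr_lo, qr_hi, X_cal, y_cal, X_test, alpha=0.1):
    """Conformalized quantile regression (Eqs. (9)-(10))."""
    # Conformal scores: how far the true Vmin falls outside the QR band, Eq. (9).
    scores = np.maximum(qr_lo.predict(X_cal) - y_cal,
                        y_cal - qr_hi.predict(X_cal))
    m = len(scores)
    level = min(np.ceil((m + 1) * (1 - alpha)) / m, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    # Widen (or shrink, if q_hat < 0) the QR band by q_hat on both sides, Eq. (10).
    return qr_lo.predict(X_test) - q_hat, qr_hi.predict(X_test) + q_hat
```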

CQR inherits the good properties of both CP and QR, as shown in Table I. It has been shown empirically to achieve shorter interval lengths than CP and QR across 11 datasets while maintaining the designed coverage rate [11]. Herein, we adopt it for reliable $V_{min}$ interval prediction.

IV Experimental Results

IV-A Industrial Dataset

Table II: Input feature description

| Attribute | Parametric | On-chip (ROD) | On-chip (CPD) |
| Quantity | 1800 | 168 | 10 |
| Temperature (°C) | -45, 25, 125 | 25 | 80 |
| Read point (hour) | 0 | 0, 24, 48, 168, 504, 1008 | 0, 24, 48, 168, 504, 1008 |

Our experiments use 156 5nm automotive chips to demonstrate the effectiveness of the proposed $V_{min}$ prediction framework. As shown in Fig. 1, parametric data and on-chip monitor data are considered for $V_{min}$ prediction. We describe how the input features and the output $V_{min}$ are collected.

All 156 chips go through dynamic Dhrystone stress at elevated voltage in a Burn-In (BI) oven for 1008 hours to simulate in-field long-term aging degradation. At specific stress read points, i.e., 0, 24, 48, 168, 504, and 1008 hours, we pause the stress process and 1) test SCAN $V_{min}$, 2) perform the parametric tests, and 3) collect on-chip monitor data. SCAN $V_{min}$ is tested on an Automatic Test Equipment (ATE) tester at temperatures of -45°C, 25°C, and 125°C. The parametric tests are also performed on the ATE tester, including IDDQ, trip IDD, leakage, etc., across all three temperatures. The chip has two types of on-chip monitors: Ring Oscillator Delay (ROD) domain sensors and in-situ Critical Path Delay (CPD) sensors. In our experiment, due to hardware and logistic process limitations, ROD is measured on ATE at room temperature (25°C) only, while CPD is measured in-situ in the BI oven at 80°C. We summarize the traits of the input features in Table II.

Figure 2: SCAN $V_{min}$ point prediction

IV-B Experimental Settings

We describe the features used for $V_{min}$ prediction at each read point and the evaluation metrics for point prediction and interval regression. As shown in Fig. 1, for the prediction of $V_{min}$ at time 0, both parametric test data and on-chip monitor data collected at time 0 are utilized. For the prediction of $V_{min}$ at the subsequent read points, which enables in-field failure prediction, we use on-chip monitor data collected at all previous read points together with parametric data collected at time 0, because parametric tests are no longer possible once chips are shipped to customers and deployed in the field.

For $V_{min}$ point prediction, the performance criteria are the coefficient of determination ($R^{2}$) and the Root Mean Square Error (RMSE); for $V_{min}$ region prediction, the metrics are the average interval length and the coverage of the true $V_{min}$ of the testing data.

To reduce the influence of randomization, 4-fold cross-validation is adopted. We report the average score of each metric across the 4 testing folds. In CQR, 75% of the training data are used to train predictors while the remaining 25% of chips are held out for calibration. To ensure a fair comparison, we use the same random seed for all $V_{min}$ interval predictors.
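A minimal sketch of this evaluation protocol (4 folds, a 75/25 train/calibration split inside each training fold, and the coverage/length metrics), assuming the split_cqr_interval helper sketched in Section III-C and hypothetical data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, train_test_split

alpha, seed = 0.1, 0
rng = np.random.default_rng(seed)
X = rng.random((156, 30))                     # placeholder: 156 chips, selected features
y = 0.55 + 0.05 * rng.random(156)             # placeholder: measured SCAN Vmin

lengths, coverages = [], []
for tr_idx, te_idx in KFold(n_splits=4, shuffle=True, random_state=seed).split(X):
    # 75% of the training fold fits the quantile regressors, 25% calibrates them.
    X_tr, X_cal, y_tr, y_cal = train_test_split(
        X[tr_idx], y[tr_idx], test_size=0.25, random_state=seed)
    qr_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
    qr_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)
    lo, hi = split_cqr_interval(qr_lo, qr_hi, X_cal, y_cal, X[te_idx], alpha)
    lengths.append(np.mean(hi - lo))                                   # average length
    coverages.append(np.mean((y[te_idx] >= lo) & (y[te_idx] <= hi)))   # coverage rate

print(f"avg length: {np.mean(lengths):.4f}, avg coverage: {np.mean(coverages):.3f}")
```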

IV-C Descriptions of $V_{min}$ Point Regressors

ML models with fewer learnable parameters and simpler structures are more favorable for our high-dimensional small data scenario. Moreover, feature selection is an essential dimension reduction technique for some ML models to avoid overfitting problems.

First, we demonstrate model selection for $V_{min}$ point prediction. Five regressors are considered: Linear Regression (LR), Gaussian Process (GP) [6], XGBoost [12], CatBoost [13], and a 2-layer Neural Network (NN). The detailed configurations of each regressor except LR are provided below:

IV-C1 Gaussian Process

GP utilizes a radial basis function kernel, whose parameters are optimized to maximize the likelihood of training data.

IV-C2 XGBoost

We utilize the default hyper-parameters in the XGBoost Python package.

IV-C3 CatBoost

We utilize the default hyperparameters in the CatBoost Python package except for one: the number of boosting trees. The default of 1000 seems too large for our small dataset of 156 chips and potentially causes over-fitting; therefore, we reduce it to 100.

IV-C4 Neural Network

We consider a shallow fully-connected multilayer perceptron (MLP) with one hidden layer containing 16 neurons with Rectified Linear Unit (ReLU) [14] activation functions. The optimizer is Adam [15] with a learning rate of 0.01, the number of epochs is 3000, and the weight of the $L_{2}$ penalty is 0.1. These configurations are the same as in [5].
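For reference, a rough scikit-learn equivalent of this configuration is sketched below; the original work [5] may have used a different framework, so treat the exact class and arguments as an approximation.

```python
from sklearn.neural_network import MLPRegressor

# Approximate equivalent of the stated setup: one hidden layer of 16 ReLU units,
# Adam with learning rate 0.01, 3000 epochs, and an L2 penalty weight of 0.1.
nn_model = MLPRegressor(hidden_layer_sizes=(16,),
                        activation="relu",
                        solver="adam",
                        learning_rate_init=0.01,
                        max_iter=3000,
                        alpha=0.1)
```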

Next, we discuss how to select a small set of informative features from thousands of raw inputs. For XGBoost and CatBoost, which have an intrinsic feature selection mechanism, all raw data are fed directly to the regressors. For the remaining three methods, we apply Correlation-based Feature Selection (CFS) [16] with the Pearson correlation to pick 1 to 10 features as inputs, and we report the best testing scores.
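CFS [16] scores a candidate feature subset by balancing feature-target correlation against feature-feature redundancy. The following is a simplified greedy sketch of that idea using Pearson correlations; the original CFS uses a best-first subset search, so this is an approximation, and the function name is hypothetical.

```python
import numpy as np

def cfs_greedy(X, y, k=10):
    """Greedy forward selection with a CFS-style merit score (simplified sketch).

    Merit of a size-m subset S: m * mean|corr(f, y)| / sqrt(m + m*(m-1)*mean|corr(f, f')|).
    """
    n_features = X.shape[1]
    f2y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    f2f = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature correlations
    selected = []
    while len(selected) < k:
        best_j, best_merit = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            subset = selected + [j]
            m = len(subset)
            rcf = f2y[subset].mean()                  # mean feature-target correlation
            rff = 0.0 if m == 1 else np.mean(
                [f2f[a, b] for a in subset for b in subset if a != b])
            merit = m * rcf / np.sqrt(m + m * (m - 1) * rff)
            if merit > best_merit:
                best_j, best_merit = j, merit
        selected.append(best_j)
    return selected                                   # indices of the chosen features
```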

IV-D $V_{min}$ Point Prediction Results

The $R^{2}$ of the $V_{min}$ point predictions of the regression models is depicted in Fig. 2. For SCAN $V_{min}$ tested at time 0, while CatBoost is the best method across all three temperatures, linear regression also performs well with a drop in $R^{2}$ of less than 0.03. For all methods except GP, the RMSE of the $V_{min}$ point predictions is within 2.5 mV to 7 mV (12 mV to 22 mV for GP) for all scenarios, and the models compare similarly under RMSE as under $R^{2}$, i.e., CatBoost performs best for time-0 prediction while linear regression performs reasonably well overall. As linear regression is straightforward to implement in either software or hardware, it is a sufficiently good option for $V_{min}$ time-0 prediction in industrial production tests.

For $V_{min}$ degradation prediction, no regression model outperforms the rest across all temperatures and stress read points in terms of $R^{2}$ and RMSE. We note that linear regression still performs reasonably well, and is even the best model for predicting SCAN $V_{min}$ at 25°C and 125°C, for both $R^{2}$ and RMSE. Given its simplicity, implementing a linear regression model with an on-chip hardware accelerator seems to be a viable option for in-field $V_{min}$ degradation prediction.

In addition, an interesting observation is that there is no clear reduction in the $R^{2}$ of SCAN $V_{min}$ degradation prediction from 0 to 1008 hours. This demonstrates that our design of on-chip monitors captures informative gate-level features that exhibit a strong correlation with system-level $V_{min}$.

Table III: Average length and coverage of prediction intervals for SCAN $V_{min}$ across 156 chips

| Stress Time (Hour) | Method | -45°C Length (mV) | -45°C Coverage (%) | 25°C Length (mV) | 25°C Coverage (%) | 125°C Length (mV) | 125°C Coverage (%) |
| 0 | GP | 61.96 | 85.9 | 48.56 | 93.59 | 51.88 | 89.1 |
| | QR Linear Regression | 51.0 | 91.03 | 14.14 | 83.33 | 15.98 | 83.33 |
| | QR Neural Network | 30.44 | 66.84 | 18.28 | 53.91 | 21.33 | 52.83 |
| | QR XGBoost | 50.31 | 51.28 | 28.22 | 89.1 | 30.96 | 82.05 |
| | QR CatBoost | 2.48 | 10.26 | 0.98 | 14.1 | 1.37 | 24.36 |
| | CQR Linear Regression | 53.76 | 92.95 | 17.37 | 95.51 | 19.39 | 91.03 |
| | CQR Neural Network | 114.3 | 94.81 | 52.75 | 93.11 | 77.54 | 94.01 |
| | CQR XGBoost | 60.84 | 95.51 | 31.91 | 92.95 | 48.48 | 98.72 |
| | CQR CatBoost | 24.11 | 91.67 | 13.94 | 92.95 | 12.72 | 91.67 |
| 24 | GP | 56.76 | 84.93 | 48.64 | 94.87 | 50.53 | 87.74 |
| | QR Linear Regression | 26.7 | 85.62 | 18.3 | 80.13 | 13.28 | 85.16 |
| | QR Neural Network | 24.19 | 68.67 | 16.33 | 49.52 | 19.78 | 53.68 |
| | QR XGBoost | 43.27 | 39.04 | 32.64 | 87.18 | 30.28 | 86.45 |
| | QR CatBoost | 1.54 | 3.42 | 1.38 | 19.87 | 1.77 | 20.65 |
| | CQR Linear Regression | 43.1 | 99.32 | 20.68 | 89.74 | 17.07 | 95.48 |
| | CQR Neural Network | 117.82 | 97.01 | 53.66 | 93.34 | 84.99 | 95.45 |
| | CQR XGBoost | 65.3 | 99.32 | 43.5 | 92.95 | 42.41 | 92.9 |
| | CQR CatBoost | 27.1 | 97.95 | 16.58 | 94.87 | 15.34 | 93.55 |
| 48 | GP | 56.83 | 81.13 | 49.84 | 89.72 | 53.84 | 82.24 |
| | QR Linear Regression | 29.77 | 84.91 | 20.03 | 81.31 | 13.98 | 82.24 |
| | QR Neural Network | 29.66 | 68.04 | 44.71 | 92.05 | 26.14 | 50.79 |
| | QR XGBoost | 45.43 | 45.28 | 35.78 | 85.98 | 48.6 | 84.11 |
| | QR CatBoost | 1.64 | 11.32 | 1.07 | 16.82 | 1.79 | 19.63 |
| | CQR Linear Regression | 36.92 | 93.4 | 29.34 | 94.39 | 20.61 | 93.46 |
| | CQR Neural Network | 100.62 | 95.59 | 58.75 | 95.62 | 80.64 | 95.07 |
| | CQR XGBoost | 62.81 | 98.11 | 49.82 | 94.39 | 55.12 | 95.33 |
| | CQR CatBoost | 24.3 | 95.28 | 29.61 | 96.26 | 19.23 | 89.72 |
| 168 | GP | 54.45 | 79.81 | 50.43 | 84.91 | 54.42 | 85.58 |
| | QR Linear Regression | 26.05 | 81.73 | 44.0 | 89.62 | 12.27 | 81.73 |
| | QR Neural Network | 27.74 | 72.68 | 43.56 | 84.12 | 26.03 | 48.32 |
| | QR XGBoost | 38.27 | 75.96 | 39.89 | 84.91 | 49.65 | 85.58 |
| | QR CatBoost | 1.81 | 19.23 | 0.71 | 13.21 | 1.78 | 20.19 |
| | CQR Linear Regression | 36.28 | 92.31 | 51.35 | 94.34 | 17.09 | 89.42 |
| | CQR Neural Network | 82.98 | 95.33 | 60.16 | 95.48 | 80.99 | 95.42 |
| | CQR XGBoost | 56.65 | 96.15 | 48.61 | 94.34 | 57.75 | 92.31 |
| | CQR CatBoost | 28.71 | 93.27 | 20.49 | 91.51 | 20.49 | 92.31 |
| 504 | GP | 52.61 | 77.0 | 52.63 | 88.46 | 54.23 | 79.61 |
| | QR Linear Regression | 25.46 | 83.0 | 37.71 | 88.46 | 26.14 | 88.35 |
| | QR Neural Network | 25.51 | 70.39 | 46.33 | 92.16 | 48.65 | 83.49 |
| | QR XGBoost | 35.9 | 78.0 | 43.14 | 84.62 | 47.71 | 83.5 |
| | QR CatBoost | 1.43 | 12.0 | 1.54 | 18.27 | 2.24 | 20.39 |
| | CQR Linear Regression | 31.2 | 91.0 | 45.21 | 93.27 | 32.05 | 94.17 |
| | CQR Neural Network | 66.13 | 93.37 | 53.44 | 92.79 | 72.25 | 94.76 |
| | CQR XGBoost | 46.81 | 93.0 | 46.83 | 87.5 | 58.74 | 96.12 |
| | CQR CatBoost | 21.17 | 96.0 | 19.01 | 92.31 | 16.15 | 94.17 |
| 1008 | GP | 53.18 | 78.12 | 52.45 | 91.84 | 53.22 | 82.65 |
| | QR Linear Regression | 29.75 | 88.54 | 42.63 | 88.78 | 32.28 | 80.61 |
| | QR Neural Network | 20.2 | 50.3 | 19.89 | 39.14 | 31.47 | 51.9 |
| | QR XGBoost | 37.18 | 79.17 | 45.19 | 84.69 | 46.0 | 82.65 |
| | QR CatBoost | 1.72 | 17.71 | 1.64 | 13.27 | 1.89 | 24.49 |
| | CQR Linear Regression | 32.3 | 89.58 | 47.25 | 94.9 | 36.53 | 91.84 |
| | CQR Neural Network | 78.55 | 98.2 | 66.8 | 93.08 | 65.86 | 92.25 |
| | CQR XGBoost | 44.14 | 89.58 | 47.11 | 91.84 | 51.44 | 96.94 |
| | CQR CatBoost | 17.64 | 93.75 | 18.7 | 94.9 | 14.68 | 89.8 |

IV-E Descriptions of $V_{min}$ Region Regressors

We consider three interval prediction methods: GP, QR, and CQR. QR and CQR are built on 4 point regressors: LR, NN, XGBoost, and CatBoost. The configurations of these models are the same as those in Section IV-C. We set $\alpha=0.1$, i.e., predictors generate intervals between the 5% and 95% quantiles, targeting 90% coverage.

IV-F $V_{min}$ Region Prediction Results

Figure 3: The average interval length of CQR CatBoost for SCAN $V_{min}$ prediction

The average lengths of the SCAN $V_{min}$ prediction intervals and the coverage rates are shown in Table III. Both GP and QR underestimate the intervals for testing chips, failing to meet the designed coverage rate. CQR, in contrast, successfully calibrates the undercovered interval predictions of QR across all stress read points and temperatures, underscoring the importance of applying conformal prediction for reliable region predictions.

CQR performs differently with different point regression models. The best variant is CQR CatBoost, achieving the shortest intervals with around a 90% coverage rate. While LR is competitive for point prediction in Section IV-D, its CQR version predicts larger intervals than CQR CatBoost, especially for SCAN $V_{min}$ at -45°C and 25°C.

Table IV: SCAN $V_{min}$ interval prediction via CQR CatBoost, averaged across all stress read points (average interval length in mV)

| Feature type | -45°C | 25°C | 125°C | Average |
| Parametric | 29.44 | 24.38 | 22.14 | 25.32 |
| On-chip | 29.32 | 22.22 | 19.44 | 23.66 |
| On-chip and Parametric | 23.84 | 19.72 | 16.43 | 20.00 |
| On-chip monitor gain | 19.02% | 19.11% | 25.79% | 21.01% |

IV-G Benefits of On-chip Monitors

We present evidence supporting the value of on-chip monitor data in the prediction of $V_{min}$ intervals. Fig. 3 illustrates the interval length of CQR CatBoost with three types of feature sets: 1) parametric test data and on-chip monitor data (the same as in Section IV-F), 2) parametric test data only, and 3) on-chip monitor data only. In addition, Table IV summarizes the average length across all read points of SCAN $V_{min}$ during stress.

Compared to utilizing parametric data only, the inclusion of on-chip monitor data results in a 21.01% reduction in the average interval length. Intriguingly, a CQR CatBoost model relying solely on on-chip monitor data outperforms the same model using only parametric test data, despite the much larger number of parametric features (Table II). This implies that on-chip monitor data may contain more information that facilitates $V_{min}$ estimation.

V Conclusion

We propose a distribution-free $V_{min}$ interval estimation framework possessing a statistical coverage guarantee. By harnessing CQR in conjunction with on-chip monitor data, our approach achieves an average interval length of 20 mV with a 90% coverage rate for true $V_{min}$ values on our industrial dataset. In the future, we will explore how to embed the proposed method 1) in the production test flow to accelerate the $V_{min}$ test and enhance the yield while screening out outliers, and 2) in in-field systems to secure long-term reliability and safety.

Acknowledgment

The content of this paper has been developed with the support of Grant No. 1956313 from the National Science Foundation (NSF) and has also received partial funding from a Long Term University (LTU) grant provided by NXP.

References

  • [1] C. He and Y. Yu, “Wafer level stress: Enabling zero defect quality for automotive microcontrollers without package burn-in,” in 2020 IEEE International Test Conference (ITC), 2020, pp. 1–10.
  • [2] T.-B. Chan, P. Gupta, A. B. Kahng, and L. Lai, “Ddro: A novel performance monitoring methodology based on design-dependent ring oscillators,” in Thirteenth International Symposium on Quality Electronic Design (ISQED), 2012, pp. 633–640.
  • [3] J. Chen, J. Zeng, L.-C. Wang, J. Rearick, and M. Mateja, “Selecting the most relevant structural fmax for system fmax correlation,” in 2010 28th VLSI Test Symposium (VTS), 2010, pp. 99–104.
  • [4] W.-C. Lin, C. Chen, C.-H. Hsieh, J. C.-M. Li, E. J.-W. Fang, and S. S.-Y. Hsueh, “Ml-assisted vminbinning with multiple guard bands for low power consumption,” in 2022 IEEE International Test Conference (ITC), 2022, pp. 213–218.
  • [5] Y. Yin, R. Chen, C. He, and P. Li, “Domain-specific machine learning based minimum operating voltage prediction using on-chip monitor data,” in 2023 IEEE International Test Conference (ITC), 2023, pp. 99–104.
  • [6] D. J. MacKay, Information theory, inference and learning algorithms.   Cambridge university press, 2003.
  • [7] L. V. Jospin, H. Laga, F. Boussaid, W. Buntine, and M. Bennamoun, “Hands-on bayesian neural networks—a tutorial for deep learning users,” IEEE Computational Intelligence Magazine, vol. 17, no. 2, pp. 29–48, 2022.
  • [8] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” Advances in neural information processing systems, vol. 30, 2017.
  • [9] R. Koenker and G. Bassett Jr, “Regression quantiles,” Econometrica: journal of the Econometric Society, pp. 33–50, 1978.
  • [10] G. Shafer and V. Vovk, “A tutorial on conformal prediction.” Journal of Machine Learning Research, vol. 9, no. 3, 2008.
  • [11] Y. Romano, E. Patterson, and E. Candes, “Conformalized quantile regression,” Advances in neural information processing systems, vol. 32, 2019.
  • [12] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.
  • [13] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “Catboost: unbiased boosting with categorical features,” Advances in neural information processing systems, vol. 31, 2018.
  • [14] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML’10.   Madison, WI, USA: Omnipress, 2010, p. 807–814.
  • [15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [16] M. A. Hall, “Correlation-based feature selection for machine learning,” Ph.D. dissertation, The University of Waikato, 1999.