
Quantitative Prediction of Fracture Toughness ($K_{{\rm I}c}$) of Polymer by Fractography Using Deep Neural Networks

Y. Mototake$^{a}$, K. Ito$^{b}$, and M. Demura$^{b}$
$^{a}$The Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562, Japan.
$^{b}$Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan.
Abstract

Fracture surfaces provide various types of information about fracture. The fracture toughness $K_{{\rm I}c}$, which represents the resistance to fracture, can be estimated using the three-dimensional (3D) information of a fracture surface, i.e., its roughness. However, it is time-consuming and expensive to obtain the 3D information of a fracture surface; thus, it is desirable to estimate $K_{{\rm I}c}$ from a two-dimensional (2D) image, which can be easily obtained. In recent years, methods of estimating a 3D structure from its 2D image using deep learning have been developed rapidly. In this study, we propose a framework for fractography that directly estimates $K_{{\rm I}c}$ from a 2D fracture surface image using deep neural networks (DNNs). Typically, image recognition using a DNN requires a tremendous amount of image data, which is difficult to acquire for fractography owing to the high experimental cost. To compensate for the limited data, we used the transfer learning (TL) method and constructed high-performance prediction models even with a small dataset by transferring machine learning models trained on other large datasets. We found that the regression model obtained using our proposed framework can predict $K_{{\rm I}c}$ in the range of approximately 1–5 MPa$\sqrt{\rm m}$ with a standard deviation of the estimation error of approximately $\pm$0.37 MPa$\sqrt{\rm m}$. The present results demonstrate that a DNN trained with TL opens a new route for quantitative fractography, by which parameters of the fracture process can be estimated from a fracture surface even with a small dataset. The proposed framework also enables the building of regression models in a few hours. Therefore, our framework enables us to screen the large number of image datasets available in the field of materials science and find candidates that are worth expensive machine learning analysis.

keywords:
fractography, deep neural networks, transfer learning

1 Introduction

Fracture surfaces provide various types of information on fracture. For example, a dimple-like pattern indicates the occurrence of plastic deformation during fracture and a consequent requirement of a large amount of energy to fracture the material[1]. In contrast, smooth fracture surfaces, as seen in cleavage fractures, are suggestive of brittle fracture[1]. From a fracture surface, quantitative parameters of the fracture process, such as the fracture toughness $K_{{\rm I}c}$ or the Charpy impact absorption energy, can be estimated. In previous studies, such parameters have been estimated using the three-dimensional (3D) information of a fracture surface, i.e., its roughness. For example, Mandelbrot et al. obtained the 3D landscape of a fracture surface by observing its structural changes while continuously slicing the sample; by fractal analysis of this contour structure, they found a correspondence between the Charpy impact absorption energy and the fractal dimension[2]. Another study revealed that $K_{{\rm I}c}$ can be estimated by extracting the roughness of a fracture surface measured by stereo imaging using an electron microscope[2, 3]. From an engineering perspective, it is time-consuming and expensive to obtain the 3D information of a fracture surface; thus, it is desirable to estimate parameters of the fracture process, such as $K_{{\rm I}c}$, from two-dimensional (2D) images, which can be acquired at a relatively low cost. The results of some previous studies have suggested that the contrast in 2D images contains 3D roughness information and that the identification of the fracture type, such as ductile or brittle fracture, is possible[1, 4, 5]. However, this contrast is also affected by the angle of the measurement target, the type of observation probe (such as light or electrons), and the interaction between the probe and the measured sample. Therefore, extracting a 3D structure from contrast is, in general, difficult in principle. Thus, compared with previous methods that rely on 3D information, estimating parameters of the fracture process, such as $K_{{\rm I}c}$, from 2D images is difficult.

In this study, we establish a framework for fractography by which we can directly estimate $K_{{\rm I}c}$ from 2D fracture surface images using a deep neural network (DNN) with the transfer learning (TL) method. In recent years, deep learning methods for estimating the depth of an object from a single 2D image have been developed rapidly[6, 7]. Thus, deep learning can extract the features of an uneven structure from the contrast of its 2D images even under the effects of the above-mentioned complex factors. These recent advances in deep learning research suggest that $K_{{\rm I}c}$ can be directly estimated from a 2D fracture surface image by extracting its features through deep learning. Typically, image recognition using a DNN requires a tremendous amount of image data, but it is difficult to acquire such a large amount of image data for fractography owing to the high cost of experimentation involving specimen preparation, observation, and the measurement of fracture properties. The dataset utilized in this study contains 770 2D fracture surface images with corresponding $K_{{\rm I}c}$ values; this is a large number for such types of data but quite small as a training dataset for a regular DNN, as subsequently described in detail. To compensate for the limited data, we used the TL method, which allowed us to construct high-performance prediction models even with a small dataset by transferring machine learning models trained on other large datasets. TL has already been used to identify fracture surfaces with DNNs[8, 9, 10, 11, 12]. Thus, it is expected that, by using a DNN and TL, a framework for directly estimating $K_{{\rm I}c}$ can be established. Concretely, in this study, we examined whether a DNN can extract the features required for predicting $K_{{\rm I}c}$ from images of fracture surfaces of polymers.
The present fracture dataset of polymers, which was collected by the National Institute of Technology and Evaluation[13], includes 770 data items, each being a set of one fracture surface image and its $K_{{\rm I}c}$. This dataset contains macroscale fracture surface data generated by various fracture processes, such as brittle and ductile fracture. The results obtained in this study show that, even with such limited data, $K_{{\rm I}c}$ can be estimated from 2D fracture surface images using the established DNN and TL framework.

The remainder of this paper is organized as follows. First, in Sec. 2, we describe the fracture toughness test data of the polymeric materials used in this study. In Sec. 3, the DNN model used for TL and the regression model are explained. In Sec. 4, the results of the analysis are presented and discussed, and in Sec. 5, the study is summarized.

2 Database of polymer fracture toughness test

Figure 1: Upper figure: schematic of the fracture toughness test. A notched sample is fractured by applying force, and the fracture surface is observed. Lower figures: examples of fracture surface images obtained in the test[13].

In this study, the “Polymer Fracture Database”[13] provided by the National Institute of Technology and Evaluation (NITE) is used. The development of this database began in 1996 as a collaborative effort of NITE, Yamagata University, Osaka Municipal Technical Research Institute, and Meiji University, in a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO). Since 2020, the database has been available on the Materials Data Repository (MDR) managed by the National Institute for Materials Science (NIMS).

This database was established to develop a widely used technological infrastructure for enhancing the safety of materials, such as technologies for investigating the causes of accidents arising from the fracture or failure of plastic products. The database contains test data for various polymeric materials used in society, obtained from fracture toughness, tensile, fatigue, creep, essential work of fracture (EWF), and scratch tests. In this paper, data from the fracture toughness test were used because such data (upper figure of Fig. 1) are the most comprehensive. For the fracture toughness test, data were obtained for various types of samples with different thicknesses, compositions, grades, molding methods, mold temperatures, injection speeds, cylinder temperatures, and with/without welds, at different test temperatures and speeds. The test temperatures were centered at room temperature and generally ranged from $-40$ °C to $+60$ °C. The database contains macro-photographs, each covering about a 10 mm square of the fracture surface, for each set of the above conditions (lower figures of Fig. 1). One fracture surface image was included for each of the 1248 test conditions. Among the parameters in the database that are expected to be estimable, as mentioned in the Introduction, $K_{{\rm I}c}$ for plane strain was selected in this study because it is the main parameter that one would aim to obtain in a fracture toughness test, and it is recorded for many of the tests in the database.

The database includes comma-separated values (CSV) files listing information for all tests, PDF files containing fracture surface photos for each set of test conditions, and tab-separated values (TSV) files containing load–displacement curve data. Each test was assigned an identification code. Since the fracture surface images were recorded in the lossy compressed JPEG format in the PDF file of each test and bitmap data without degradation could not be obtained, the image data were extracted as-is in the JPEG format using the pdfimages[14] command. Note that the extracted image files had their left and right sides reversed relative to the displayed PDF, so they were flipped back using the convert command of ImageMagick[15]. From among all test data, 770 data records were obtained by extracting tests in which $K_{{\rm I}c}$ was recorded and which included fracture surface data. These 770 records were used as the dataset for the analysis in this study.
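As an illustration, the extraction and flipping steps above can be sketched as follows; `extract_fracture_images` is a hypothetical wrapper (the paper only names the `pdfimages` and `convert` commands, which are assumed here to be on the PATH), and `flip_left_right` shows the same horizontal flip applied to a plain row-major pixel array:

```python
import subprocess
from pathlib import Path

def extract_fracture_images(pdf_path, out_prefix):
    # Copy the embedded JPEG streams out of the report PDF without
    # re-encoding them (-j writes the original lossy JPEG data as-is,
    # so no further generation loss is added).
    subprocess.run(["pdfimages", "-j", str(pdf_path), out_prefix], check=True)
    # The extracted images are mirrored left-to-right relative to the
    # rendered PDF, so flip each one back with ImageMagick's -flop.
    for jpg in sorted(Path(".").glob(out_prefix + "-*.jpg")):
        subprocess.run(["convert", str(jpg), "-flop", str(jpg)], check=True)

def flip_left_right(rows):
    # Pure-Python equivalent of the horizontal flip, applied to a
    # row-major pixel array (illustration only).
    return [list(reversed(row)) for row in rows]
```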

3 Method

Figure 2: Conceptual diagram of the proposed transfer learning framework for estimating $K_{{\rm I}c}$.

In this section, we describe the method adopted for building a predictive model of $K_{{\rm I}c}$ using feature-extraction-type transfer learning[16], which combines a DNN trained on natural images with Gaussian process regression[17]. Specifically, we developed a model that uses a DNN as a feature extractor and regresses $K_{{\rm I}c}$ on its features by Gaussian process regression.

The DNN model used in this study was the vgg16 model[18], which identifies natural images such as animals and vehicles. vgg16 is composed of convolution, pooling, and fully connected layers. A convolution layer convolves the input space using a convolution filter. Convolution filters of DNNs trained on natural images frequently behave similarly to Gabor filters. A Gabor filter can extract image features such as the edges of graphic structures in images. In vgg16, a structure comprising a stack of convolution layers alternated with pooling layers is formed, and the pooling layers compress the dimensions by replacing certain regions of the convolution layers with representative values such as the maximum or average of the region. Pooling can be viewed as a transformation that coarse-grains the input space. Thus, after training on natural images, vgg16 is expected to extract features useful on various scales for natural images. In the prediction of $K_{{\rm I}c}$ using fracture surface images, the evaluation of the edges of the graphic structures in the images is also considered important. Depth estimation research using images has shown that using a DNN pretrained with the natural image dataset called ImageNet[19] as a feature extractor improves the accuracy of depth estimation after fine-tuning, compared with the method without pretraining[7]. This finding suggests that DNNs trained on ImageNet can extract the image features necessary for depth estimation. As noted in the Introduction, information on depth is useful in predicting $K_{{\rm I}c}$. Thus, the image features extracted at each convolution layer by vgg16 trained on natural images are considered to yield useful features for predicting $K_{{\rm I}c}$. Therefore, we employed feature-extraction-type transfer learning[16], which uses vgg16 trained on natural images as an extractor of image features to construct a prediction model of $K_{{\rm I}c}$.
With such transfer learning, a framework can be constructed for learning from a large amount of data at a lower cost, because fine-tuning of the DNN is not required. However, in the absence of fine-tuning, if certain image features required for the prediction of $K_{{\rm I}c}$ are not included in the set of image features required for the classification of ImageNet, they remain missing. In this study, we examined whether the prediction of $K_{{\rm I}c}$ is still possible under such conditions.
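To make the Gabor-filter analogy concrete, the sketch below builds the real (cosine-phase) part of a Gabor kernel and evaluates it near a vertical step edge; the function names and parameter values are ours, chosen only for illustration, not taken from vgg16:

```python
import math

def gabor_kernel(size, sigma, theta, lam):
    # Real part of a Gabor filter: a Gaussian envelope multiplied by a
    # cosine carrier of wavelength `lam`, oriented at angle `theta`.
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xp = x * math.cos(theta) + y * math.sin(theta)
            yp = -x * math.sin(theta) + y * math.cos(theta)
            row.append(math.exp(-(xp * xp + yp * yp) / (2.0 * sigma ** 2))
                       * math.cos(2.0 * math.pi * xp / lam))
        kernel.append(row)
    return kernel

def convolve_at(img, kernel, cy, cx):
    # Filter response at pixel (cy, cx) of a row-major grayscale image.
    half = len(kernel) // 2
    return sum(kernel[dy + half][dx + half] * img[cy + dy][cx + dx]
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1))
```

Running the filter on a synthetic image with a vertical step edge gives a response at the edge that differs markedly from the response in a uniform region, which is the sense in which such filters "extract edges".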

The predictive model of $K_{{\rm I}c}$ was constructed as follows. First, we prepared vgg16 (top figure of Fig. 2) trained on the ImageNet dataset. In this study, publicly available learned weights[20] were used as the trained vgg16. In addition, a library[21] using TensorFlow was adopted as a prototype for the vgg16 implementation. When an image is input to the pretrained vgg16, the features are obtained as the outputs of the five pooling and three fully connected layers, referred to as Layer 1–Layer 5 and Fc 1–Fc 3, respectively. We denote one of these DNN features as $m\in\{{\rm Layer\ 1},{\rm Layer\ 2},{\rm Layer\ 3},{\rm Layer\ 4},{\rm Layer\ 5},{\rm Fc\ 1},{\rm Fc\ 2},{\rm Fc\ 3}\}$. Let $\mathbf{h}^{i}_{m}$ be a $d_{m}$-dimensional DNN feature for a certain fracture image $i$. Because the number of data $N=770$ was much smaller than $d_{m}$, $\mathbf{h}^{i}_{m}$ was compressed to $N$ dimensions by principal component analysis (PCA) to reduce the computational complexity. Let $\mathbf{x}^{i}_{m}$ be the compressed DNN feature. Note that in a PCA that compresses data into the dimension of the sample size, information on the covariance of the dataset is not lost; therefore, the regression model learned by Gaussian process regression to predict $K_{{\rm I}c}$ is perfectly consistent with that in the uncompressed case. Each fracture surface image, which was approximately $400\times 400$ pixels in size, was prescaled using zero-order spline interpolation[22] to ensure that it had the same number of pixels as the natural images ($224\times 224$). A predictive model of $K_{{\rm I}c}$ was constructed by Gaussian process regression using the dimension-compressed DNN feature $\mathbf{x}^{i}_{m}$ as the explanatory variable (bottom figure of Fig. 2).
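A minimal sketch of this preprocessing is shown below. The zero-order (nearest-neighbor) rescaling is implemented directly; the vgg16 feature-extraction step is indicated only in comments, assuming a Keras-style API with the standard ImageNet weights (the paper's actual implementation[20, 21] may differ):

```python
def nearest_neighbor_resize(img, out_h, out_w):
    # Zero-order (nearest-neighbor) rescaling of a row-major image,
    # e.g. from roughly 400x400 fracture photographs down to the
    # 224x224 input size that vgg16 expects.
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

# Feature extraction with the pretrained network (sketch, not executed;
# assumes a Keras-style vgg16 with ImageNet weights):
#
# from tensorflow.keras.applications import VGG16
# from tensorflow.keras.models import Model
# base = VGG16(weights="imagenet")
# layer3 = Model(base.input, base.get_layer("block3_pool").output)
# h = layer3.predict(images)   # d_m-dimensional DNN features h_m^i
#
# The d_m-dimensional features would then be compressed to N = 770
# dimensions by PCA before Gaussian process regression.
```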
Note that in the entire regression model, consisting of vgg16 and Gaussian process regression, only the regression parameters of the latter were trained on the fracture surface image data; the DNN used the pretrained parameters as-is. Gaussian process regression allows the flexible construction of regression models by appropriately choosing a kernel $k(\cdot,\cdot)$. In this study, the rational quadratic kernel

$k(\mathbf{x}^{i}_{m},\mathbf{x}^{j}_{m}):=\theta_{0}^{2}\left(1+\frac{\left|\mathbf{x}^{i}_{m}-\mathbf{x}^{j}_{m}\right|_{2}^{2}}{2\theta_{1}\theta_{2}^{2}}\right)^{-\theta_{1}}$ (1)

and the Gaussian kernel

$k(\mathbf{x}^{i}_{m},\mathbf{x}^{j}_{m})=\theta_{0}^{2}\exp\left[-\frac{1}{2\theta_{1}^{2}}\left\lVert\mathbf{x}^{i}_{m}-\mathbf{x}^{j}_{m}\right\rVert^{2}\right]$ (2)

were the candidates for $k(\cdot,\cdot)$. Here, $\theta_{k}$, which determines the shape of the kernel, denotes the hyperparameters of the Gaussian process regression.
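For concreteness, Eqs. (1) and (2) can be written directly as functions (our own sketch; the variable names are ours). Note that the rational quadratic kernel approaches a Gaussian kernel as $\theta_{1}\to\infty$:

```python
import math

def rational_quadratic(xi, xj, th0, th1, th2):
    # Eq. (1): k = th0^2 * (1 + |xi - xj|^2 / (2*th1*th2^2))^(-th1)
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return th0 ** 2 * (1.0 + sq / (2.0 * th1 * th2 ** 2)) ** (-th1)

def gaussian(xi, xj, th0, th1):
    # Eq. (2): k = th0^2 * exp(-|xi - xj|^2 / (2*th1^2))
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return th0 ** 2 * math.exp(-sq / (2.0 * th1 ** 2))
```

Both kernels attain their maximum value $\theta_{0}^{2}$ when the two feature vectors coincide and decay as the vectors move apart, which is the "similarity" interpretation used in Appendix A.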

To construct such a regression model, the DNN feature $m$, the kernel function type $k$, and the kernel hyperparameters $\theta_{k}$ must be properly selected. In this study, these parameters are selected on the basis of a Bayesian inference framework[23]. For the selection of $m$ and $k$, the Bayesian model selection framework is employed: on the basis of an indicator called the Bayesian free energy[24], it selects the model defined by discrete-valued hyperparameters such as $m$ or $k$ (see Appendix A). In general, when using machine learning to build predictive models, it is necessary to prevent over-fitting, where the model over-fits the training data and loses predictive performance. Using the Bayesian free energy as an indicator for model selection, it is possible to select a model in which over-fitting does not occur. The continuous-valued hyperparameters $\theta_{k}$ can also be estimated on the basis of the Bayesian free energy (see Appendix A). Thus, the model and hyperparameters are selected so as to have a lower Bayesian free energy. $\theta_{k}$ was optimized using the L-BFGS-B algorithm[25], a type of quasi-Newton method. Such methods sometimes converge to a local solution depending on the initial values of the parameters to be optimized. Therefore, in this study, we searched for initial values that minimize the Bayesian free energy in the ranges shown in Table 1. The parenthesized pair to the right of each parameter in Table 1 indicates how the grid is set: the first element is the scale of the space to be gridded, and the second is the number of grid points.

For comparison with the case where DNN features are not used, a regression of $K_{{\rm I}c}$ was performed directly from the fracture surface images by Gaussian process regression with the same procedure as that used for the DNN features. Hereafter, the regression model thus obtained is referred to as the “direct regression” model.

Table 1: Search range for hyperparameters of each kernel. The pair in parentheses gives the grid scale and the number of grid points.

Rational quadratic kernel:
$\theta_{0}$ (linear, 5 grids): 0.01–0.2
$\theta_{1}$ (log, 5 grids): $10^{-2.0}$–$10^{2.0}$
$\theta_{2}$ (log, 10 grids): $10^{1.0}$–$10^{1.2}$

Gaussian kernel:
$\theta_{0}$ (linear, 20 grids): 0.01–0.2
$\theta_{1}$ (log, 20 grids): $10^{1.0}$–$10^{3.0}$
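The initial-value grid of Table 1 can be generated as follows (an illustrative helper of our own; in the actual procedure, each grid point would seed an L-BFGS-B run, e.g. via SciPy's `minimize(..., method="L-BFGS-B")`, and the solution with the lowest Bayesian free energy would be kept):

```python
import itertools
import math

def initial_value_grid(ranges):
    # `ranges` is an ordered list of (low, high, scale, n_grid) tuples,
    # one per hyperparameter; 'log' axes are gridded uniformly in
    # exponent space, matching the scale column of Table 1.
    axes = []
    for low, high, scale, n in ranges:
        if scale == "log":
            lo, hi = math.log10(low), math.log10(high)
            pts = [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]
        else:
            pts = [low + i * (high - low) / (n - 1) for i in range(n)]
        axes.append(pts)
    # Every combination of grid points is one optimizer starting point.
    return list(itertools.product(*axes))

# Grid for the rational quadratic kernel, per Table 1:
rq_grid = initial_value_grid([
    (0.01, 0.2, "linear", 5),            # theta_0
    (10 ** -2.0, 10 ** 2.0, "log", 5),   # theta_1
    (10 ** 1.0, 10 ** 1.2, "log", 10),   # theta_2
])
```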

4 Results and Discussion

Figure 3: Results of Bayesian model selection for DNN features and kernel types.

As a result of Bayesian model selection, Layer 3 with a rational quadratic kernel is the optimal model for predicting $K_{{\rm I}c}$, where the Bayesian free energy is minimum (Fig. 3). For all DNN features, the Bayesian free energy with the rational quadratic kernel is always lower than that with the Gaussian kernel. This indicates that the rational quadratic kernel is always the more plausible kernel type for any DNN feature. In terms of optimal DNN features, Layers 3 and 4 were confirmed to be plausible features, with a specifically lower Bayesian free energy than the others, regardless of the kernel type. The most plausible regression model, with the minimum Bayesian free energy, was obtained when the rational quadratic kernel was selected as the kernel function and Layer 3 as the DNN feature. The difference in the Bayesian free energy between the regression models using Layer 3 and Layer 4 features with the rational quadratic kernel was only about 3.36. However, since the Bayesian free energy is defined as the negative logarithm of the likelihood function expressing the plausibility of the model (Appendix A), the features of Layer 3 are more than $\exp(3.36)\approx 28$ times more likely than those of Layer 4 when compared in the original linear space. Therefore, a model that performs regression using a rational quadratic kernel with Layer 3 features is selected here. This selected regression model has a lower Bayesian free energy than direct regression. It is noteworthy that, under the same kernel function, some DNN features yield regression models with lower performance in terms of Bayesian free energy than direct regression. For example, with the rational quadratic kernel, only Layers 3 and 4 had a lower Bayesian free energy than direct regression; the other DNN features, including the Fc layers, produced regression models with a higher Bayesian free energy than direct regression.
In feature-extraction-type transfer learning, features from layers higher than the last convolution layer[16, 26, 27, 28, 29] (Layer 5 in our model) are often used; in this study, however, they were not selected. In other words, depending on the material to be analyzed and the mechanical property to be estimated, transfer learning would not be effective unless the appropriate layer is selected. The selected Layer 3 corresponds to a resolution of about 10% of the input image, which means that the spatial scale of the best DNN feature is relatively coarse-grained. This suggests that a complex feature, different from a simple Gabor filter, is useful for $K_{{\rm I}c}$ estimation.

Refer to caption
Figure 4: Integrated prediction results for test data in all test folds, which are data not used for training. (a) Result of direct regression from fracture surface images using Gaussian process regression with rational quadratic kernel. (b) Result of regression from Layer 3 feature using Gaussian process regression with rational quadratic kernel.

The regression of $K_{{\rm I}c}$ was performed using the DNN feature, the kernel type, and the kernel hyperparameters $\theta_{k}$ selected in terms of the Bayesian free energy. Figure 4 shows the regression results. To use all the data effectively, we estimated the prediction performance on the basis of 20-fold cross-validation. The black dots are the integrated prediction results for the test data in all test folds, i.e., data not used for training. The horizontal axis is the measured value, and the vertical axis is the value predicted by the constructed model. Figure 4(a) shows the result of predicting $K_{{\rm I}c}$ directly from the fracture image (direct regression model). The direct regression returns a constant of around 3 in most cases, and there is virtually no correlation between the predicted and measured values. This indicates that the direct regression method cannot extract enough information to predict $K_{{\rm I}c}$. In contrast, the regression model using DNN feature extraction with Layer 3 has high accuracy, as shown in Fig. 4(b). The $R^{2}$ value of the prediction model using Layer 3 features was 0.65 (the correlation coefficient is 0.81). We also found that our model can predict $K_{{\rm I}c}$ with a standard deviation of the estimation error of about $\pm$0.37 MPa$\sqrt{\rm m}$ in the range of around 1–5 MPa$\sqrt{\rm m}$ (Fig. 4). This accuracy demonstrates that it is possible to predict $K_{{\rm I}c}$ using only the 2D fracture image, without 3D height information, and suggests that the morphology of the fracture surface contains information about resistance to fracture.
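The evaluation protocol can be sketched with standard-library code (our own illustration; the paper does not specify how the 770 records were assigned to the 20 folds, so contiguous folds are assumed here):

```python
import statistics

def kfold_splits(n, k=20):
    # Yield (train_idx, test_idx) pairs for k-fold cross-validation,
    # assigning contiguous blocks of indices to the test folds.
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def r_squared(y_true, y_pred):
    # Coefficient of determination R^2 of predictions on held-out data.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def error_std(y_true, y_pred):
    # Standard deviation of the estimation error (residuals).
    return statistics.stdev(t - p for t, p in zip(y_true, y_pred))
```

Predictions pooled over all 20 test folds would then be scored with `r_squared` and `error_std`, as in Fig. 4.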

Thus, the comparison with the poor result of the direct prediction method proves that transfer learning, using a DNN trained on natural images as a feature extractor, is crucial for building a $K_{{\rm I}c}$ predictor with high estimation accuracy. The results also reveal the importance of the proper use of DNN information, i.e., the selective use of Layer 3 in this case.

5 Summary

The present results demonstrate that a DNN trained with TL opens the way to quantitative fractography, in which one can estimate parameters of the fracture process from a fracture surface image even with a small dataset. In this study, the DNN pretrained on the ImageNet dataset was used directly as a feature extractor. Fine-tuning the DNN with fracture surface image data is expected to improve the performance of the model obtained in this study. A higher estimation accuracy of $K_{{\rm I}c}$ could also be achieved by using not only the fracture surface image but also additional explanatory variables describing the target material, such as the production conditions or the operating environment.

The proposed framework, which enables the construction of regression models in a few hours, can be used for the low-cost screening of profitable candidates on which to apply machine learning analysis to estimate values representing mechanical properties from the large number of image datasets available in the field of materials science. Therefore, if the use of the proposed framework expands, machine learning methods in materials science are expected to be used more broadly, quickly, and efficiently.

Acknowledgements

This work was supported by the Council for Science, Technology and Innovation (CSTI), Cross-Ministerial Strategic Innovation Promotion Program (SIP), “Structural Materials for Innovation” and “Materials Integration for Revolutionary Design System of Structural Materials” (funding agency: JST), JST PRESTO Grant Number JPMJPR212A, and JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas “Discrete Geometry for Exploring Next-Generation Materials” 20H04648 and 22K13979.

Appendix A Gaussian process regression and Bayesian free energy

Let $\mathbf{x}^{i}_{m}$ be the $N$-dimensional PCA-compressed output of the $m$-th layer when a certain fracture surface image $i$ is input to the pretrained DNN. Also, let $y^{i}$ be the $K_{{\rm I}c}$ value of fracture surface image $i$. Suppose there are $N$ samples of $\mathbf{x}^{i}_{m}$ and $y^{i}$, and let them form the matrix $\mathbf{X}:=(\mathbf{x}^{1}_{m},\mathbf{x}^{2}_{m},\cdots,\mathbf{x}^{N}_{m})$ and the vector $\mathbf{Y}:=(y^{1},y^{2},\cdots,y^{N})^{T}$.

Unlike regression based on the maximum likelihood method, Gaussian process regression does not involve the estimation of parameters of the regression function, such as the regression coefficients in linear regression. In Gaussian process regression, the regression function $\mathbf{F}:=\mathbf{F}(\mathbf{X})=(f(\mathbf{x}^{1}_{m}),f(\mathbf{x}^{2}_{m}),\cdots,f(\mathbf{x}^{N}_{m}))$ itself is the target of estimation, and a likelihood function in which $\mathbf{F}$ is the random variable is introduced as

${\rm P}(\mathbf{Y}|\mathbf{F})=\frac{1}{Z}\exp\left[-\frac{1}{2}\left(\mathbf{Y}-\mathbf{F}\right)^{T}\beta\mathbf{I}\left(\mathbf{Y}-\mathbf{F}\right)\right],$ (3)

where $Z$ is the normalization constant, $\beta$ is the precision (inverse variance) of the observation noise, and $\mathbf{I}$ is the identity matrix. From Bayes' theorem, ${\rm P}(B|A)=\frac{{\rm P}(A|B){\rm P}(B)}{{\rm P}(A)}$, the regression function $\mathbf{F}$ is estimated as the posterior probability distribution ${\rm P}(\mathbf{F}|\mathbf{Y})$:

${\rm P}(\mathbf{F}|\mathbf{Y})=\frac{{\rm P}(\mathbf{Y}|\mathbf{F})\,{\rm P}(\mathbf{F})}{{\rm P}(\mathbf{Y})}$ (4)
$\phantom{{\rm P}(\mathbf{F}|\mathbf{Y})}=\frac{\frac{1}{Z}\exp\left[-\frac{\beta}{2}\left(\mathbf{Y}-\mathbf{F}\right)^{T}\mathbf{I}\left(\mathbf{Y}-\mathbf{F}\right)\right]\frac{1}{Z_{2}}\exp\left(-\frac{1}{2}\mathbf{F}^{T}\mathbf{K}^{-1}\mathbf{F}\right)}{{\rm P}(\mathbf{Y})},$ (5)
$\mathbf{K}:=\mathbf{K}(\mathbf{X},\mathbf{X})=\begin{pmatrix}k(\mathbf{x}^{1}_{m},\mathbf{x}^{1}_{m})&\cdots&k(\mathbf{x}^{1}_{m},\mathbf{x}^{i}_{m})&\cdots&k(\mathbf{x}^{1}_{m},\mathbf{x}^{N}_{m})\\ \vdots&\ddots&&&\vdots\\ k(\mathbf{x}^{i}_{m},\mathbf{x}^{1}_{m})&&k(\mathbf{x}^{i}_{m},\mathbf{x}^{i}_{m})&&k(\mathbf{x}^{i}_{m},\mathbf{x}^{N}_{m})\\ \vdots&&&\ddots&\vdots\\ k(\mathbf{x}^{N}_{m},\mathbf{x}^{1}_{m})&\cdots&k(\mathbf{x}^{N}_{m},\mathbf{x}^{i}_{m})&\cdots&k(\mathbf{x}^{N}_{m},\mathbf{x}^{N}_{m})\end{pmatrix},$ (6)

where the prior distribution ${\rm P}(\mathbf{F})$ is assumed to be an $N$-dimensional normal distribution with mean $\mathbf{0}$ and variance–covariance matrix $\mathbf{K}$, and $k(\mathbf{x}^{i}_{m},\mathbf{x}^{j}_{m})$ is a kernel function that indicates the similarity between $\mathbf{x}^{i}_{m}$ and $\mathbf{x}^{j}_{m}$. In this study, the rational quadratic kernel

$k(\mathbf{x}^{i}_{m},\mathbf{x}^{j}_{m}):=\theta_{0}^{2}\left(1+\frac{\left|\mathbf{x}^{i}_{m}-\mathbf{x}^{j}_{m}\right|_{2}^{2}}{2\theta_{1}\theta_{2}^{2}}\right)^{-\theta_{1}}$ (7)

and the Gaussian kernel

$k(\mathbf{x}^{i}_{m},\mathbf{x}^{j}_{m})=\theta_{0}^{2}\exp\left[-\frac{1}{2\theta_{1}^{2}}\left\lVert\mathbf{x}^{i}_{m}-\mathbf{x}^{j}_{m}\right\rVert^{2}\right]$ (8)

are employed as candidates for the kernel function, as in Eqs. (1) and (2). The hyperparameters of each kernel type are $\theta_{\rm Rational}=\{\theta_{0},\theta_{1},\theta_{2}\}$ and $\theta_{\rm Gauss}=\{\theta_{0},\theta_{1}\}$. By reorganizing Eq. (5), it is revealed that the posterior distribution is a normal distribution with respect to $\mathbf{F}$:

${\rm P}(\mathbf{F}|\mathbf{Y})=\frac{1}{Z^{\prime}}\exp\left\{-\frac{1}{2}\left[\mathbf{F}-\left(\mathbf{K}+\beta^{-1}\mathbf{I}\right)^{-1}\mathbf{K}\mathbf{Y}\right]^{T}\left[\mathbf{K}^{-1}+\beta\mathbf{I}\right]\left[\mathbf{F}-\left(\mathbf{K}+\beta^{-1}\mathbf{I}\right)^{-1}\mathbf{K}\mathbf{Y}\right]\right\},$ (9)

where ZZ^{\prime} is the normalization constant. This type of regression in which the prior and posterior distributions are normal distributions is referred to as Gaussian process regression. Using the obtained regression model P(𝐅|𝐘){\rm P}(\mathbf{F}|\mathbf{Y}) [Eq. (9)], the predicted value ynewy^{\rm new} at the unknown point 𝐱mnew\mathbf{x}_{m}^{\rm new} is given as the probability P(ynew|𝐘,m,k,θk){\rm P}(y^{\rm new}|\mathbf{Y},m,k,\theta_{k}):

${\rm P}(y^{\rm new}|\mathbf{Y},m,k,\theta_{k})=\frac{\int_{-\infty}^{\infty}d\mathbf{F}\,{\rm P}(y^{\rm new}|\mathbf{F},\theta_{k},m,k)\,{\rm P}(\mathbf{Y}|\mathbf{F},\theta_{k},m,k)\,{\rm P}(\mathbf{F}|\theta_{k},m,k)\,{\rm P}(\theta_{k},m,k)}{{\rm P}(\mathbf{Y}|\theta_{k},m,k)}$ (10)
$\phantom{{\rm P}(y^{\rm new}|\mathbf{Y},m,k,\theta_{k})}\propto\exp\left[-\frac{1}{2}\left(y^{\rm new}-\Sigma_{22}^{-1}\Sigma_{21}\mathbf{Y}\right)\Sigma_{22}\left(y^{\rm new}-\Sigma_{22}^{-1}\Sigma_{21}\mathbf{Y}\right)\right],$ (11)
$\Sigma_{22}=\left(\Lambda_{22}-\Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1},\quad\Sigma_{21}=\left(\Lambda_{22}-\Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\Lambda_{21}\Lambda_{11}^{-1},$ (12)
$\mathbf{\Lambda}=\begin{pmatrix}\Lambda_{11}&\Lambda_{12}\\ \Lambda_{21}&\Lambda_{22}\end{pmatrix}=\begin{pmatrix}\boldsymbol{K}(\mathbf{X},\mathbf{X})+\beta^{-1}\mathbf{I}&\boldsymbol{K}(\mathbf{X},\mathbf{x}^{\rm new}_{m})\\ \boldsymbol{K}(\mathbf{x}^{\rm new}_{m},\mathbf{X})&k(\mathbf{x}^{\rm new}_{m},\mathbf{x}^{\rm new}_{m})+\beta^{-1}\end{pmatrix},$ (13)
$\boldsymbol{K}(\mathbf{x}^{\rm new}_{m},\mathbf{X})=\boldsymbol{K}(\mathbf{X},\mathbf{x}^{\rm new}_{m})^{T}=\left(k(\mathbf{x}^{\rm new}_{m},\mathbf{x}^{1}_{m}),\cdots,k(\mathbf{x}^{\rm new}_{m},\mathbf{x}^{N}_{m})\right),$ (14)

where the prior distribution $\mathrm{P}(m,k)$ is assumed to be uniform.
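The predictive distribution of Eqs. (10)–(14) can be computed without forming the full block matrix $\mathbf{\Lambda}$, using the standard identities mean $= \boldsymbol{K}(\mathbf{x}^{\rm new}, \mathbf{X})\,\Lambda_{11}^{-1}\mathbf{Y}$ and variance $= k(\mathbf{x}^{\rm new},\mathbf{x}^{\rm new}) + \beta^{-1} - \boldsymbol{K}(\mathbf{x}^{\rm new},\mathbf{X})\,\Lambda_{11}^{-1}\boldsymbol{K}(\mathbf{X},\mathbf{x}^{\rm new})$, which follow from the block expressions for $\Sigma_{22}$ and $\Sigma_{21}$. A minimal NumPy sketch is given below; the RBF kernel and all function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel; one common choice for k in Eq. (13)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, Y, x_new, beta=100.0):
    """Predictive mean and variance at x_new, following Eqs. (10)-(14)."""
    K = rbf_kernel(X, X)                          # K(X, X)
    ks = rbf_kernel(X, x_new[None, :])[:, 0]      # K(X, x^new)
    kss = rbf_kernel(x_new[None, :], x_new[None, :])[0, 0]
    Lambda11 = K + np.eye(len(X)) / beta          # Lambda_11 = K(X, X) + beta^{-1} I
    mean = ks @ np.linalg.solve(Lambda11, Y)      # Sigma_22^{-1} Sigma_21^T Y
    var = kss + 1.0 / beta - ks @ np.linalg.solve(Lambda11, ks)
    return mean, var
```

Solving the linear systems with `np.linalg.solve` (rather than explicitly inverting $\Lambda_{11}$) is the numerically preferred way to evaluate these expressions.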

In this study, on the basis of the framework of Bayesian model selection and the empirical Bayes method [23], the DNN feature $m$, the kernel function type $k$, and its hyperparameters $\theta_{k}$ were determined by maximizing the following marginal likelihood function, called the Bayesian evidence:

\mathrm{P}(m,k,\theta_{k}|\mathbf{Y}) \propto \mathrm{P}(\mathbf{Y}|m,k,\theta_{k}) = \int d\mathbf{F}\,\mathrm{P}(\mathbf{Y}|\mathbf{F},m,k,\theta_{k})\,\mathrm{P}(\mathbf{F}|m,k,\theta_{k}) \qquad (15)
= \exp\left\{\frac{\beta}{2}\boldsymbol{\mu}^{\prime T}\Lambda_{11}^{-1}\boldsymbol{\mu}^{\prime}-\frac{\beta}{2}\mathbf{Y}^{T}\mathbf{Y}\right\}\left(\frac{\beta}{2\pi}\right)^{N/2}\left(\frac{1}{2\pi}\right)^{d/2} \qquad (16)
\times \int d\mathbf{F}\exp\left\{-\frac{\beta}{2}\left(\mathbf{F}-\Lambda_{11}^{-1}\mathbf{K}\mathbf{Y}\right)^{T}\left(\mathbf{K}^{-1}+\beta\mathbf{I}\right)\left(\mathbf{F}-\Lambda_{11}^{-1}\mathbf{K}\mathbf{Y}\right)\right\} \qquad (17)
= \exp\left\{\frac{\beta}{2}\boldsymbol{\mu}^{\prime T}\Lambda_{11}^{-1}\boldsymbol{\mu}^{\prime}-\frac{\beta}{2}\mathbf{Y}^{T}\mathbf{Y}\right\}\left(\frac{\beta}{2\pi}\right)^{N/2}\left|\Lambda_{11}^{-1}\right|^{1/2}. \qquad (18)

In the Bayesian model selection framework, models and hyperparameters are often compared not via the marginal likelihood itself but via its negative logarithm, called the Bayesian free energy [24],

F(m,k,\theta_{k}) = -\log\left[\mathrm{P}(m,k,\theta_{k}|\mathbf{Y})\right], \qquad (19)

where $k$ denotes the kernel type. This Bayesian free energy is also employed in this study.
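In practice, the free energy of Eq. (19) is usually evaluated in the equivalent standard Gaussian-process form $F = \frac{1}{2}\mathbf{Y}^{T}\Lambda_{11}^{-1}\mathbf{Y} + \frac{1}{2}\log|\Lambda_{11}| + \frac{N}{2}\log 2\pi$, which equals the negative logarithm of Eq. (18) up to an additive constant. A short sketch, with illustrative names not taken from the original code:

```python
import numpy as np

def free_energy(K, Y, beta):
    """Bayesian free energy F = -log P(Y | m, k, theta_k), Eq. (19),
    in the standard negative log marginal likelihood form."""
    N = len(Y)
    Lambda11 = K + np.eye(N) / beta               # Lambda_11 = K + beta^{-1} I
    _, logdet = np.linalg.slogdet(Lambda11)       # stable log-determinant
    quad = Y @ np.linalg.solve(Lambda11, Y)       # Y^T Lambda_11^{-1} Y
    return 0.5 * quad + 0.5 * logdet + 0.5 * N * np.log(2.0 * np.pi)
```

Using `np.linalg.slogdet` avoids the overflow and underflow that a direct determinant would cause for large $N$.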

Since the kernel hyperparameters $\theta_{k}$ take continuous values, the value that minimizes the Bayesian free energy cannot be found by a grid search, unlike $m$ or $k$. Therefore, in this study, the L-BFGS-B algorithm [25], a quasi-Newton method, is used to find the $\theta_{k}$ that minimizes the Bayesian free energy. Quasi-Newton methods can converge to a local minimum depending on the initial values of $\theta_{k}$; hence, we searched the range shown in Table 1 for the initial values that minimize $F(m,k,\theta_{k})$. The model parameters $k$ and $m$ were then determined with $\theta_{k}$ optimized in this manner.
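The multi-start minimization described above can be sketched with SciPy's L-BFGS-B implementation. The log-parametrized RBF hyperparameters and the grid of initial values below stand in for Table 1 and are illustrative assumptions, not the authors' actual settings:

```python
import numpy as np
from scipy.optimize import minimize

def fit_hyperparameters(X, Y, beta, initial_grid, bounds):
    """Multi-start L-BFGS-B minimization of the Bayesian free energy.

    initial_grid: starting values for theta_k = (log length scale,
    log variance), restarted from each entry to mitigate local minima.
    """
    def objective(theta):
        ell, var = np.exp(theta)                       # positivity via log-parametrization
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = var * np.exp(-0.5 * d2 / ell ** 2)         # RBF kernel matrix
        L = K + np.eye(len(Y)) / beta                  # Lambda_11
        _, logdet = np.linalg.slogdet(L)
        return 0.5 * Y @ np.linalg.solve(L, Y) + 0.5 * logdet

    best = None
    for theta0 in initial_grid:                        # restart from each initial value
        res = minimize(objective, np.asarray(theta0, dtype=float),
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return np.exp(best.x), best.fun                    # optimized theta_k, free energy
```

Keeping the best result over all restarts is a simple way to reduce the risk of reporting a poor local minimum.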

References

  • [1] Subra Suresh. Fatigue of Materials. Cambridge University Press, 1998.
  • [2] Benoit B Mandelbrot, Dann E Passoja, and Alvin J Paullay. Fractal character of fracture surfaces of metals. Nature, Vol. 308, No. 5961, pp. 721–722, 1984.
  • [3] Yali Barak, Ankit Srivastava, and Shmuel Osovski. Correlating fracture toughness and fracture surface roughness via correlation length scale. International Journal of Fracture, Vol. 219, No. 1, pp. 19–30, 2019.
  • [4] Ervin E. Underwood. Quantitative Fractography, pp. 101–122. Springer US, Boston, MA, 1986.
  • [5] Maria-Ximena Bastidas-Rodriguez, Flavio Prieto-Ortiz, and Edgar Espejo. Fractographic classification in metallic materials by using computer vision. Engineering Failure Analysis, Vol. 59, pp. 237–252, 2016.
  • [6] Faisal Khan, Saqib Salahuddin, and Hossein Javidnia. Deep learning-based monocular depth estimation methods―a state-of-the-art review. Sensors, Vol. 20, No. 8, p. 2272, 2020.
  • [7] Ibraheem Alhashim and Peter Wonka. High quality monocular depth estimation via transfer learning. arXiv preprint arXiv:1812.11941, 2018.
  • [8] Maria-Ximena Bastidas-Rodriguez, Luisa Polania, Adrien Gruson, and Flavio Prieto-Ortiz. Deep learning for fractographic classification in metallic materials. Engineering Failure Analysis, Vol. 113, p. 104532, 2020.
  • [9] Stylianos Tsopanidis, Raúl Herrero Moreno, and Shmuel Osovski. Toward quantitative fractography using convolutional neural networks. Engineering Fracture Mechanics, Vol. 231, p. 106992, 2020.
  • [10] Maria-Ximena Bastidas-Rodríguez, Flavio-Augusto Prieto-Ortiz, and Luisa F Polanía. A textural deep neural network combined with handcrafted features for mechanical failure classification. In 2019 IEEE International Conference on Industrial Technology (ICIT), pp. 847–852. IEEE, 2019.
  • [11] Ihor Konovalenko, Pavlo Maruschak, Olegas Prentkovskis, and Raimundas Junevičius. Investigation of the rupture surface of the titanium alloy using convolutional neural networks. Materials, Vol. 11, No. 12, p. 2467, 2018.
  • [12] Stylianos Tsopanidis, et al. Unsupervised machine learning in fractography: Evaluation and interpretation.
  • [13] National Institute of Technology and Evaluation. NITE Polymer Fracture Database, 1996. doi:10.34968/NIMS.1433.
  • [14] pdfimages. https://poppler.freedesktop.org/.
  • [15] ImageMagick. https://imagemagick.org/.
  • [16] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, No. 12, pp. 2935–2947, 2017.
  • [17] Christopher K Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, Vol. 2. MIT press Cambridge, MA, 2006.
  • [18] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [19] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
  • [20] ILSVRC-2014 model (VGG team) with 16 weight layers. https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md.
  • [21] VGG in TensorFlow. https://www.cs.toronto.edu/~frossard/post/vgg16/.
  • [22] Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu, and the scikit-image contributors. scikit-image: image processing in Python. PeerJ, Vol. 2, p. e453, 6 2014.
  • [23] Christopher M Bishop and Nasser M Nasrabadi. Pattern Recognition and Machine Learning, Vol. 4. Springer, 2006.
  • [24] Sumio Watanabe. Mathematical Theory of Bayesian Statistics. Chapman and Hall/CRC, 2018.
  • [25] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, Vol. 45, pp. 503–528, 1989.
  • [26] Erkan Deniz, Abdulkadir Şengür, Zehra Kadiroğlu, Yanhui Guo, Varun Bajaj, and Ümit Budak. Transfer learning based histopathologic image classification for breast cancer detection. Health Information Science and Systems, Vol. 6, No. 1, pp. 1–7, 2018.
  • [27] Sandy Napel, Wei Mu, Bruna V Jardim-Perassi, Hugo JWL Aerts, and Robert J Gillies. Quantitative imaging of cancer in the postgenomic era: Radio (geno) mics, deep learning, and habitats. Cancer, Vol. 124, No. 24, pp. 4633–4649, 2018.
  • [28] Taranjit Kaur and Tapan Kumar Gandhi. Deep convolutional neural networks with transfer learning for automated brain image classification. Machine Vision and Applications, Vol. 31, No. 3, pp. 1–16, 2020.
  • [29] Kasthurirangan Gopalakrishnan, Siddhartha K Khaitan, Alok Choudhary, and Ankit Agrawal. Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construction and Building Materials, Vol. 157, pp. 322–330, 2017.