A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification
Abstract
Clustering techniques have proved highly successful for Takagi-Sugeno (T-S) fuzzy model identification. In particular, fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data, but its performance degrades on sparse data. In this paper, an innovative architecture for the fuzzy c-regression model is presented, and a novel Student-t distribution based membership function is designed for sparse data modelling. To avoid overfitting, we adopt a Bayesian approach that places a Gaussian prior on the regression coefficients. An additional novelty of our approach lies in type reduction, where the final output is computed using the Karnik-Mendel algorithm and the consequent parameters of the model are optimized using the stochastic gradient descent method. Detailed experiments show that the proposed approach outperforms various state-of-the-art methods on standard datasets.
Index Terms:
TSK Model, Fuzzy c-Regression, Student-t distribution
I Introduction
Numerous studies have focused on modeling non-linear systems through their input-output mapping. In particular, fuzzy logic based approaches have been very successful in modeling non-linear dynamics in the presence of uncertainties [1]. Type-1 fuzzy logic enables system identification and modeling by virtue of numerous linguistic rules. Although this approach performs well, its ability to handle uncertainty in the data is limited because its membership values are crisp. Therefore, in order to model data uncertainty, type-2 fuzzy logic was proposed, in which the membership value of each data point is itself fuzzy. Type-2 fuzzy logic has been remarkably successful owing to its robustness in the presence of imprecise and noisy data [2, 3].
The basic steps in building a fuzzy inference system are structure identification and parameter identification of the model. Structure identification concerns the selection of the number of rules, the input features and the partition of the input-output space, while parameter identification computes the antecedent and consequent parameters of the model. In the literature, fuzzy clustering has been widely used for fuzzy space partitioning. Since a T-S fuzzy model is composed of several locally weighted linear regression models, hyperplane-based clustering is a natural and effective choice for structure identification. In particular, fuzzy c-regression clustering, which produces hyperplane-shaped clusters, has become popular [4, 5]. These algorithms are robust in partitioning the data space, estimating outputs from the inputs and determining an optimum fit for the regression model. Techniques such as the fuzzy c-regression model (FCRM) and fuzzy c-means (FCM) have also been developed for the type-2 fuzzy logic framework, where upper and lower membership values are determined by simultaneously optimizing two objective functions. The interval type-2 (IT2) FCRM, recently presented for the T-S regression framework, has shown significantly better performance than type-1 fuzzy logic in terms of error minimization and robustness [4, 5, 6].
In this paper, we combine Gaussian and Student-t density functions into a membership function for an IT2 FCRM framework. This is a hyperplane-shaped membership function with two differently weighted terms. The Student-t density part is weighted more heavily if the data being modeled are sparse. The Student-t distribution is a popular prior for sparse data modelling in Bayesian inference, which motivates its use in our model. The stochastic gradient descent (SGD) technique is used to optimize the consequent parameters, and the Karnik-Mendel (KM) algorithm is applied for type reduction of the estimated output. We regularize the regression coefficients in the IT2 fuzzy c-means clustering used for identification of the antecedent parameters. As demonstrated in the results section, the regularization protects our model against overfitting the training data and improves generalization to unseen data. In addition, an innovative scheme for optimizing the consequent parameters is presented, wherein we do not perform type reduction of the type-1 weight sets prior to output estimation. Instead, we apply the KM algorithm to the output to infer an optimal interval type-1 fuzzy set, and the set boundaries are optimized by the SGD method.
The rest of the paper is organized as follows. In Section II, we discuss the TSK fuzzy model, IT2-FCM and IT2-FCR. In Section III, we describe the proposed approach. In Section IV, we demonstrate the efficacy of the proposed approach through experiments. Finally, Section V concludes the paper.
II Preliminaries
II-A TSK Fuzzy Model
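The TSK fuzzy model provides a rule-based structure for modeling a complex non-linear system. Consider a system to be identified, where $\mathbf{x} = [x_1, \dots, x_n]^T \in \mathbb{R}^n$ is the input vector and $y \in \mathbb{R}$ is the output. Then, the $i$-th rule is written as
Rule $i$: IF $x_1$ is $A_1^i$ and $\dots$ and $x_n$ is $A_n^i$ THEN
$$y^i = \theta_0^i + \theta_1^i x_1 + \dots + \theta_n^i x_n, \qquad i = 1, \dots, c \tag{1}$$
where $c$ is the number of fuzzy rules and $y^i$ is the output of the $i$-th rule. Using these rules, we can infer the final model output as follows:
$$\hat{y} = \frac{\sum_{i=1}^{c} w^i y^i}{\sum_{i=1}^{c} w^i} \tag{2}$$
where $w^i$ denotes the overall firing strength of the $i$-th rule.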
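The inference step in (1) and (2) can be illustrated with a minimal Python sketch. The function name `tsk_output` and the layout of the coefficient lists (bias first, then one coefficient per input feature) are assumptions made for this illustration and do not come from the original.

```python
from typing import Sequence


def tsk_output(x: Sequence[float],
               thetas: Sequence[Sequence[float]],
               weights: Sequence[float]) -> float:
    """Weighted average of the linear rule consequents, as in (1) and (2).

    thetas[i] holds the coefficients of rule i as [bias, coeff_1, ..., coeff_n];
    weights[i] is the firing strength of rule i (assumed layout).
    """
    if len(thetas) != len(weights):
        raise ValueError("one firing strength is needed per rule")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one rule must fire")
    # Consequent of each rule: bias plus the dot product with the input.
    rule_outputs = [
        theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
        for theta in thetas
    ]
    return sum(w * y for w, y in zip(weights, rule_outputs)) / total


# Two rules on a single input: y1 = 1 + 2x and y2 = -1 + 0.5x, evaluated at x = 2.
y_hat = tsk_output([2.0], [[1.0, 2.0], [-1.0, 0.5]], [0.75, 0.25])
```

Here the rule outputs are 5 and 0, so the firing strengths 0.75 and 0.25 give a model output of 3.75.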
II-B Interval Type-2 FCM (IT2-FCM)
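In the interval type-2 FCM, two objective functions that differ in their degree of fuzziness are optimized simultaneously using Lagrange multipliers to obtain the upper and lower membership functions [3]. Let $m_1$ and $m_2$ be the two degrees of fuzziness; then the two objective functions are described as
$$J_{m_1}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_1} d_{ik}^{2}, \qquad J_{m_2}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_2} d_{ik}^{2} \tag{3}$$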
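As a hedged illustration of how two degrees of fuzziness yield an interval membership, the sketch below computes the standard type-1 FCM membership for each degree and takes the element-wise maximum and minimum as upper and lower memberships, which is the construction usually associated with [3]. The function names and the choice to pass precomputed distances are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def fcm_membership(distances: Sequence[float], m: float) -> List[float]:
    """Type-1 FCM memberships of one point, given its distance to every cluster."""
    if m <= 1.0:
        raise ValueError("the degree of fuzziness must exceed 1")
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        # The point coincides with a prototype: full membership there
        # (shared equally if several distances are zero).
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(distances))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in distances) for di in distances]


def interval_membership(distances: Sequence[float],
                        m1: float,
                        m2: float) -> Tuple[List[float], List[float]]:
    """Lower and upper memberships from two degrees of fuzziness."""
    u1 = fcm_membership(distances, m1)
    u2 = fcm_membership(distances, m2)
    lower = [min(a, b) for a, b in zip(u1, u2)]
    upper = [max(a, b) for a, b in zip(u1, u2)]
    return lower, upper


# One point at distances 1 and 2 from two clusters, with m1 = 2 and m2 = 3.
lower, upper = interval_membership([1.0, 2.0], 2.0, 3.0)
```

For this point the type-1 memberships are (0.8, 0.2) for the first degree and (2/3, 1/3) for the second, so the interval for the first cluster is [2/3, 0.8].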
II-C Inter Type-2 Fuzzy c-Regression Algorithm (IT2-FCR)
The main objective of the inter type-2 fuzzy c-regression algorithm is to partition the set of data points into c clusters. The data points in every cluster can be described by a regression model as
$$y^i = \theta_0^i + \theta_1^i x_1 + \dots + \theta_n^i x_n = [1, \mathbf{x}^T]\, \boldsymbol{\theta}^i, \qquad i = 1, \dots, c \tag{4}$$
where $\mathbf{x} = [x_1, \dots, x_n]^T$ is the input vector, $n$ is the number of features, $c$ is the number of clusters and $\boldsymbol{\theta}^i$ is the coefficient vector of the $i$-th cluster. In [4], the coefficient vectors are optimized by the weighted least squares method, whereas in our approach we use SGD. The primary reason for using SGD is to keep the algorithm robust even in cases where the matrix inverted in the weighted least squares solution becomes singular [6].
III Proposed Methodology
In this paper, we present a new framework for FCRM with an innovative Student-t distribution based membership function (MF) for sparse data modelling [6, 7, 8]. The approach is described in the following subsections.
III-A Fuzzy-Space Partitioning
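First, we formulate the task of fuzzy-space partitioning as a maximum a posteriori (MAP) estimation over a squared error function [8]. Using Bayes' rule, the MAP estimator is defined as
$$\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}} P(\boldsymbol{\theta} \mid \mathcal{D}) = \arg\max_{\boldsymbol{\theta}} P(\mathcal{D} \mid \boldsymbol{\theta})\, P(\boldsymbol{\theta}) \tag{5}$$
where $P(\boldsymbol{\theta} \mid \mathcal{D})$ is the posterior, $P(\mathcal{D} \mid \boldsymbol{\theta})$ is the likelihood and $P(\boldsymbol{\theta})$ is the prior distribution. Using the above equation, the MAP estimator is expressed in terms of a regression problem as
$$\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} = \arg\min_{\boldsymbol{\theta}} \sum_{k=1}^{N} \left(y_k - \boldsymbol{\theta}^{T} \mathbf{x}_k\right)^{2} + \lambda \lVert \boldsymbol{\theta} \rVert_2^{2} \tag{6}$$
where $\hat{\boldsymbol{\theta}}_{\mathrm{MAP}}$ is the MAP estimator, the squared-error sum corresponds to the likelihood, $\lVert \boldsymbol{\theta} \rVert_2^{2}$ corresponds to the prior, also called the regularizer, which is equivalent to the Bayesian notion of having a prior on the regression weights for each cluster, and $\lambda$ is the regularizer control parameter. The regularizer reduces overfitting in cluster assignment by constraining the regression weights to be small.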
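The step from the Bayesian formulation to a regularized regression problem can be made explicit with a short derivation. It is a sketch under assumed notation that does not come from the original: Gaussian residual noise with variance $\sigma^2$, a zero-mean Gaussian prior with variance $\tau^2$ on the regression coefficients, and data pairs $(\mathbf{x}_k, y_k)$, $k = 1, \dots, N$.

```latex
\begin{aligned}
p(y_k \mid \mathbf{x}_k, \boldsymbol{\theta}) &= \mathcal{N}\!\left(y_k \mid \boldsymbol{\theta}^{T}\mathbf{x}_k,\ \sigma^{2}\right),
\qquad
p(\boldsymbol{\theta}) = \mathcal{N}\!\left(\boldsymbol{\theta} \mid \mathbf{0},\ \tau^{2}\mathbf{I}\right) \\
-\log p(\boldsymbol{\theta} \mid \mathcal{D}) &= \frac{1}{2\sigma^{2}} \sum_{k=1}^{N} \left(y_k - \boldsymbol{\theta}^{T}\mathbf{x}_k\right)^{2}
+ \frac{1}{2\tau^{2}} \lVert \boldsymbol{\theta} \rVert_2^{2} + \text{const} \\
\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} &= \arg\min_{\boldsymbol{\theta}} \sum_{k=1}^{N} \left(y_k - \boldsymbol{\theta}^{T}\mathbf{x}_k\right)^{2}
+ \lambda \lVert \boldsymbol{\theta} \rVert_2^{2},
\qquad \lambda = \frac{\sigma^{2}}{\tau^{2}}
\end{aligned}
```

Under these assumptions the regularizer control parameter is the ratio of the noise variance to the prior variance, so a tighter prior (smaller $\tau^2$) shrinks the regression weights more strongly.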
In the proposed approach, we first define two degrees of fuzziness and , and initialize the number of clusters and a termination threshold . We also initialize the parameters and , which are the upper and lower regression coefficient vectors of cluster. Then, equation (6) is written in terms of the upper and lower error-function MAP estimators as follows:
(7) |
To reduce the complexity of the system, a weighted-average type reduction technique is applied to obtain as
(8) |
Through the MAP estimate on the posterior of the defuzzified error function , the upper and lower membership functions for every data point in each cluster are obtained in a manner similar to [3], and they are given as follows:
(9) |
The above equation can be interpreted as the solution of the MAP problem formulated in (8). To estimate the parameters and , we formulate the problem as a locally weighted linear regression with the objective function:
(10) |
Here, denotes the membership value of data point in the cluster. The parameters and are estimated by SGD on the above objective function by appropriately finding and . The regression coefficients () are then obtained by a type reduction technique as follows:
(11) |
The steps in this subsection are repeated, and the parameters are updated, until the convergence of . This yields the optimal value of the regression coefficients, as briefly described in Algorithm 1.
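The SGD estimation of the coefficients for one cluster can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 1: it assumes the objective is a membership-weighted squared error plus an L2 penalty, and the function name, learning rate, epoch count and the way the penalty is spread over the samples are all choices made for this sketch.

```python
import random
from typing import List, Sequence


def sgd_weighted_ridge(xs: Sequence[Sequence[float]],
                       ys: Sequence[float],
                       memberships: Sequence[float],
                       lam: float = 0.01,
                       lr: float = 0.05,
                       epochs: int = 200,
                       seed: int = 0) -> List[float]:
    """Minimise sum_k u_k (y_k - theta^T [1, x_k])^2 + lam * ||theta||^2 by SGD."""
    rng = random.Random(seed)
    n = len(xs)
    theta = [0.0] * (len(xs[0]) + 1)  # bias first, then one weight per feature
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for k in order:
            phi = [1.0] + list(xs[k])
            err = ys[k] - sum(t * p for t, p in zip(theta, phi))
            # Per-sample gradient step; the penalty is shared equally by the n samples.
            theta = [t + lr * (2.0 * memberships[k] * err * p - 2.0 * lam * t / n)
                     for t, p in zip(theta, phi)]
    return theta


# Noise-free data from y = 1 + 2x, with every point fully in the cluster.
xs = [[i / 10.0] for i in range(11)]
ys = [1.0 + 2.0 * x[0] for x in xs]
u = [1.0] * len(xs)
theta_map = sgd_weighted_ridge(xs, ys, u, lam=0.01)
```

Because no matrix is inverted, the update remains well defined even when the weighted design matrix is rank deficient, which is the robustness argument made for SGD in Section II-C.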
III-B Identification of Antecedent Parameters
The MF developed in [5] is hyperplane shaped and cannot fully incorporate the relevant information about the data distributions within different clusters. To overcome this issue, we propose a modified Gaussian-based MF combined with a Student-t density function. The Student-t distribution is widely used as a prior in Bayesian inference for sparse data modelling [7]. Here, we weight the Gaussian and the Student-t parts by a hyper-parameter . If the data being modelled are very sparse, should be set very low so as to give more weight to the Student-t density membership value.
(12) |
(13) |
In the above, is the distance between the input vector and the cluster hyperplane.
(14) |
where is the maximum distance of the input vector from the cluster, and and denote the average distance and the variance of the data points from the cluster hyperplane, respectively.
(15) |
The lower MF () and upper MF () serve as the weights of the TSK fuzzy model corresponding to the input belonging to the cluster.
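The idea of blending a Gaussian term with a heavier-tailed Student-t term can be illustrated with the sketch below. It does not reproduce the MF of (12) and (13): the convex combination with weight `alpha`, the unnormalized kernels, the scale `sigma`, the degrees of freedom `nu` and the perpendicular point-to-hyperplane distance are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def hyperplane_distance(x: Sequence[float], y: float, theta: Sequence[float]) -> float:
    """Perpendicular distance of the point (x, y) from the hyperplane y = theta^T [1, x]."""
    residual = y - (theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))
    # The hyperplane normal is (theta_1, ..., theta_n, -1).
    normal_norm = math.sqrt(1.0 + sum(t * t for t in theta[1:]))
    return abs(residual) / normal_norm


def mixed_membership(d: float, alpha: float, sigma: float = 1.0, nu: float = 3.0) -> float:
    """Hypothetical blend: alpha * Gaussian kernel + (1 - alpha) * Student-t kernel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    gaussian = math.exp(-0.5 * (d / sigma) ** 2)
    student_t = (1.0 + d * d / (nu * sigma * sigma)) ** (-0.5 * (nu + 1.0))
    return alpha * gaussian + (1.0 - alpha) * student_t


# A point four scale units from the hyperplane keeps far more membership
# when the Student-t part dominates (small alpha).
far_sparse = mixed_membership(4.0, alpha=0.1)
far_dense = mixed_membership(4.0, alpha=0.9)
```

Both kernels equal one at zero distance, so the blend changes only how quickly membership decays. The slower decay of the Student-t kernel is what lets distant points in sparse regions retain a non-negligible weight.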
III-C Identification of Consequent Parameters
In most of the literature, the weights are defuzzified before the model output is determined. The problem with these approaches is that they do not consider the effect of the model output, which affects the overall performance of the model. To overcome this problem, we evaluate and corresponding to and using the KM algorithm [2]. The values of and are optimized in parallel until convergence. Another advantage of this approach is that it becomes more robust in handling noise and provides a confidence interval for every output data point. The model outputs and corresponding to the weights and are calculated using (1) and (2) as follows:
(16) |
(17) |
where and are the switching points computed by the KM algorithm. We repeat the above steps until the convergence of and . Finally, the model output is determined by applying a type reduction technique as
(18) |
Model | LR | RR | RBFNN | ITFRCM [9] | TIFNN [9] | RIT2FC [9] | Proposed |
MSE | 0.06 | 0.06 | 0.049 | 0.019 | 0.045 | 0.035 | 0.008 |
Coefficient of Determination | 0.68 | 0.69 | 0.67 | 0.73 | 0.77 | 0.79 | 0.85 |
Median Absolute Error | 0.71 | 0.73 | 0.73 | 0.75 | 0.80 | 0.81 | 0.85 |
LR: Logistic Regression, RR: Ridge Regression, RBFNN: Radial Basis Function Neural Network, ITFRCM: Interval Type-2 Fuzzy c-Means, TIFNN: Type-1 Set-Based Fuzzy Neural Network, RIT2FC: Reinforced Interval Type-2 FCM-Based Fuzzy Classifier
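The KM iteration that locates the switching points and the interval endpoints of the output can be sketched as follows. This is the standard iterative form of the algorithm surveyed in [2], written without explicit sorting. The function names are illustrative, and taking the midpoint of the interval as the final crisp output is a common choice assumed here rather than the form of (18).

```python
from typing import Sequence, Tuple


def _km_endpoint(ys: Sequence[float],
                 w_lower: Sequence[float],
                 w_upper: Sequence[float],
                 left: bool,
                 max_iter: int = 100,
                 tol: float = 1e-12) -> float:
    """Iterate the KM update until the endpoint estimate stops changing."""
    weights = [(lo + up) / 2.0 for lo, up in zip(w_lower, w_upper)]
    y = sum(w * v for w, v in zip(weights, ys)) / sum(weights)
    for _ in range(max_iter):
        if left:
            # Left endpoint: upper weights below the switching point, lower above.
            weights = [up if v <= y else lo
                       for v, lo, up in zip(ys, w_lower, w_upper)]
        else:
            # Right endpoint: lower weights below the switching point, upper above.
            weights = [lo if v <= y else up
                       for v, lo, up in zip(ys, w_lower, w_upper)]
        y_new = sum(w * v for w, v in zip(weights, ys)) / sum(weights)
        if abs(y_new - y) <= tol:
            return y_new
        y = y_new
    return y


def km_type_reduce(ys: Sequence[float],
                   w_lower: Sequence[float],
                   w_upper: Sequence[float]) -> Tuple[float, float]:
    """Return (y_left, y_right) for rule outputs ys with interval firing strengths."""
    if any(lo < 0.0 or up <= 0.0 or lo > up for lo, up in zip(w_lower, w_upper)):
        raise ValueError("require 0 <= lower <= upper and upper > 0 for every rule")
    return (_km_endpoint(ys, w_lower, w_upper, left=True),
            _km_endpoint(ys, w_lower, w_upper, left=False))


# Three rule outputs with identical interval weights [0.2, 0.8].
y_left, y_right = km_type_reduce([1.0, 2.0, 3.0], [0.2, 0.2, 0.2], [0.8, 0.8, 0.8])
y_crisp = 0.5 * (y_left + y_right)  # assumed midpoint defuzzification
```

For this symmetric example the endpoints are 1.5 and 2.5, so the interval is centred on 2.0 and its width gives a simple uncertainty band around the crisp output.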
IV Results & Discussion
IV-A House Prices Dataset
The house prices dataset (https://www.kaggle.com/lespin/house-prices-dataset) is used to predict the sale price of a particular property. Through experiments, we demonstrate the robustness of the proposed method on this sparse data. The dataset is divided into training (70%) and testing (30%) sets, and five-fold cross-validation is used during training. The hyper-parameters of the model are initially set as: , , , , and . It should be noted that the value of is small because the dataset is sparse. Hence, in the MF, the contribution of the Student-t function should be high, which is ensured by a smaller value of , i.e., a larger value of , as defined by (12) and (13). The mean square error (MSE) is on the test data, which is lower than that of state-of-the-art methods, as shown in Table I. The absolute error shown in Fig. 2 is also small compared to the absolute house prices shown in Fig. 1. We postulate that this is due to the Student-t MF used in our model, which helps in robustly quantifying the effects of sparse data. The higher test accuracy is also due to the greater generalization owing to the regularizer used in our model. The coefficient of determination, which is the ratio of explained variance to total variance, is very high (). This suggests that our model captures variations in the data robustly and is not susceptible to faulty performance in the presence of outliers.
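The three scores reported in Table I can be computed as in the following sketch. The helper names are illustrative, the sample values are made up for the example and are not taken from the experiments, and the coefficient of determination uses the common form of one minus the ratio of residual to total sum of squares.

```python
from statistics import mean, median
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared prediction errors."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Median of the absolute prediction errors (robust to a few large misses)."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (mse(y_true, y_pred), r2_score(y_true, y_pred),
          median_absolute_error(y_true, y_pred))
```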


IV-B Non-Linear Plant Modeling
The second-order non-linear difference equation given in (19) is used to draw a comparison with the other benchmark models listed in Table II.
(19) |
where is the input for validation of the model, is the model output, and , and are the model inputs. The hyper-parameters are tuned by grid search and finally set as: , , and . The MSE of the model on the test data points is using only four rules, which is much smaller than that of the other models. Through simulations, we have shown that the proposed model outperforms other state-of-the-art models. Fig. 3 shows that our model output closely tracks the actual output at every time step. As observed in Fig. 4, the error fluctuates across data points, but the absolute error is consistently less than 0.1, with no rapid surge at stationary points of the time series. This is a crucial requirement for a stable system. Therefore, we conclude that our algorithm yields a dynamically stable model.
State-of-the-Art | Rules | MSE |
Li et al. [1] | 4 | |
Fazel Zarandi [4] | 4 | |
Li et al. [5] | 4 | |
MIT2 FCRM [6] | 4 | |
Proposed | 4 |


IV-C A sinc Function in One Dimension
In this subsection, a non-linear sinc function is used to demonstrate the effectiveness of the proposed model:
(20) |
where . We sample 121 data points uniformly from this one-dimensional function. As in the previous case study, the number of rules is set to four. The hyper-parameters are tuned through grid search and finally fixed as: , and . The MSE of the proposed model is , which is lower than that of the modified inter type-2 FRCM (MIT2-FCRM) [6], which is , on the test data of 121 samples. Table III provides a detailed comparison of performance with state-of-the-art methods.
V Conclusion
In this paper, we have illustrated the efficacy of the proposed Bayesian type-2 fuzzy regression approach using a Student-t distribution based MF. The proposed MF is useful for fuzzy c-mean regression models, as demonstrated in Section IV. When the number of features is small compared to the number of samples, clustering of the input-output space proves very effective for identifying the rules of the fuzzy system. In addition, we have demonstrated that, instead of direct defuzzification of the weights before computation of the final output, continuous defuzzification and optimization gives better results.
References
- [1] C. Li, et al., “T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm,” Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 646–653, 2009.
- [2] J. Mendel, “On KM algorithms for solving type-2 fuzzy set problems,” IEEE Trans. on Fuzzy Syst., vol. 21, no. 3, pp. 426–446, 2013.
- [3] C. Hwang and F. C. H. Rhee, “Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means,” IEEE Trans. on Fuzzy Syst., vol. 15, no. 1, pp. 107–120, 2007.
- [4] M. H. F. Zarandi, R. Gamasaee, and I. B. Turksen, “A type-2 fuzzy c-regression clustering algorithm for Takagi-Sugeno system identification and its application in the steel industry,” Information Sciences, vol. 187, pp. 179–203, 2012.
- [5] C. Li, et al., “T-S fuzzy model identification with a gravitational search-based hyperplane clustering algorithm,” IEEE Trans. on Fuzzy Syst., vol. 20, no. 2, pp. 305–317, 2012.
- [6] W. Zou, C. Li, and N. Zhang, “A T-S fuzzy model identification approach based on a modified inter type-2 FRCM algorithm,” IEEE Trans. on Fuzzy Syst., vol. 26, no. 3, pp. 1104–1113, 2017.
- [7] V. E. Bening and V. Y. Korolev, “On an application of the Student distribution in the theory of probability and mathematical statistics,” Theory of Probability & Its Appl., vol. 49, no. 3, pp. 377–391, 2005.
- [8] R. Gribonval, “Should penalized least squares regression be interpreted as maximum a posteriori estimation?” IEEE Trans. on Signal Process., vol. 59, no. 5, pp. 2405–2410, 2011.
- [9] E. H. Kim, S. K. Oh, and W. Pedrycz, “Design of reinforced interval type-2 fuzzy c-means-based fuzzy classifier,” IEEE Trans. on Fuzzy Syst., vol. 26, no. 5, pp. 3054–3068, 2017.
- [10] M. S. Chen and S. W. Wang, “Fuzzy clustering analysis for optimizing fuzzy membership functions,” Fuzzy Sets and Systems, vol. 103, no. 2, pp. 239–254, 1999.