
A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification

Vikas Singh, Homanga Bharadhwaj, and Nishchal K. Verma are with the Department of Electrical Engineering, IIT Kanpur, India (e-mail: [email protected], [email protected], [email protected])
Abstract

Clustering techniques have proved highly successful for Takagi-Sugeno (T-S) fuzzy model identification. In particular, fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data, but its performance degrades on sparse data. In this paper, an innovative architecture for the fuzzy c-regression model is presented, and a novel student-t distribution based membership function is designed for sparse data modelling. To avoid overfitting, we adopt a Bayesian approach that places a Gaussian prior on the regression coefficients. Additional novelty of our approach lies in the type-reduction step, where the final output is computed using the Karnik-Mendel algorithm and the consequent parameters of the model are optimized using stochastic gradient descent. Detailed experiments show that the proposed approach outperforms various state-of-the-art methods on standard datasets.

Index Terms:
TSK Model, Fuzzy c-Regression, Student-t distribution

I Introduction

There has been extensive research on modeling non-linear systems through their input-output mappings. In particular, fuzzy logic based approaches have been very successful in modeling non-linear dynamics in the presence of uncertainties [1]. Type-1 fuzzy logic enables system identification and modeling by virtue of numerous linguistic rules. Although this approach performs well, its crisp membership values limit its ability to handle uncertainty in the data. Therefore, in order to successfully model data uncertainty, type-2 fuzzy logic was proposed, in which the membership values of the data points are themselves fuzzy. Type-2 fuzzy logic has been remarkably successful owing to its robustness in the presence of imprecise and noisy data [2, 3].

The basic steps of building a fuzzy inference system are structure identification and parameter identification. Structure identification concerns selecting the number of rules, the input features, and the partition of the input-output space, while parameter identification computes the antecedent and consequent parameters of the model. In the literature, fuzzy clustering has been widely used for fuzzy space partitioning: since a T-S fuzzy model is composed of locally weighted linear regression models, hyperplane-based clustering is a natural and effective fit for structure identification. In particular, fuzzy c-regression clustering, which produces hyperplane-shaped clusters, has become popular [4, 5]. These algorithms are robust in partitioning the data space, inferring output estimates from the inputs, and determining an optimal fit for the regression model. Earlier techniques such as the fuzzy c-regression model (FCRM) and fuzzy c-means (FCM) have been extended to the type-2 fuzzy logic framework, where upper and lower membership values are determined by simultaneously optimizing two objective functions. Interval type-2 (IT2) FCRM, recently presented for the T-S regression framework, has shown significantly better performance than type-1 fuzzy logic in terms of error minimization and robustness [4, 5, 6].

In this paper, we combine Gaussian and student-t density type membership functions within an IT2 FCRM framework. The result is a hyperplane-shaped membership function with two differently weighted terms; the student-t density part is weighted more heavily when the data being modeled is sparse. The student-t distribution is a popular prior for sparse data modelling in Bayesian inference, which motivates its use in our model. Stochastic gradient descent (SGD) is used to optimize the consequent parameters, and the Karnik-Mendel (KM) algorithm is applied for type reduction of the estimated output. We use $L_{2}$ regularization of the regression coefficients in the IT2 fuzzy c-means clustering for identification of the antecedent parameters. As demonstrated in the results section, the regularization protects our model against overfitting the training data and improves generalization to unseen data. In addition, an innovative scheme for optimizing the consequent parameters is presented, wherein we do not type-reduce the type-1 weight sets before output estimation. Instead, we use the KM algorithm to infer an optimal interval type-1 fuzzy set for the output, and the set boundaries are optimized by SGD.

The rest of the paper is organized as follows: Section II discusses the TSK fuzzy model, IT2-FCM, and IT2-FCR. Section III describes the proposed approach. Section IV presents the efficacy of the proposed approach through experimentation. Finally, Section V concludes the paper.

II Preliminaries

II-A TSK Fuzzy Model

The TSK fuzzy model provides a rule-based structure for modeling a complex non-linear system. Let $G(\mathbf{x},y)$ be the system to be identified, where $\mathbf{x}\in R^{n}$ is the input vector and $y\in R$ is the output. Then, the $i^{th}$ rule is written as

Rule $i$: IF $x_{1}$ is $A^{i}_{1}$ and $\cdots$ and $x_{m}$ is $A^{i}_{m}$ THEN

$y^{i}=\theta_{0}^{i}+\theta_{1}^{i}x_{1}+\cdots+\theta_{m}^{i}x_{m}$   (1)

where $i=1,\cdots,c$ indexes the fuzzy rules ($c$ is the number of rules) and $y^{i}$ is the output of the $i^{th}$ rule. Using these rules, the final model output is inferred as follows:

$y=\dfrac{\sum^{c}_{i=1}w^{i}y^{i}}{\sum^{c}_{i=1}w^{i}},\;\;\;w^{i}=\prod^{m}_{j=1}\mu_{A^{i}_{j}}(x_{j})$   (2)

where $w^{i}$ denotes the overall firing strength of the $i^{th}$ rule.
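As a concrete illustration of (1)-(2), the following minimal Python/NumPy sketch computes the TSK output for one input vector; all numbers and the membership values are placeholder assumptions, not taken from the paper:

```python
import numpy as np

def tsk_output(x, theta, mu):
    """Infer the TSK model output for one input vector x (eqs. (1)-(2)).

    x     : (m,) input vector
    theta : (c, m+1) consequent coefficients [theta_0^i, theta_1^i, ..., theta_m^i]
    mu    : (c, m) membership values mu_{A_j^i}(x_j) of each feature in each rule
    """
    # Rule outputs y^i = theta_0^i + theta_1^i x_1 + ... + theta_m^i x_m  (eq. (1))
    y_rule = theta[:, 0] + theta[:, 1:] @ x
    # Firing strength w^i = prod_j mu_{A_j^i}(x_j)                        (eq. (2))
    w = np.prod(mu, axis=1)
    # Firing-strength-weighted average of the rule outputs                (eq. (2))
    return np.sum(w * y_rule) / np.sum(w)

# Toy usage with arbitrary placeholder numbers: c = 2 rules, m = 2 features
x = np.array([0.5, -1.2])
theta = np.array([[0.1, 1.0, -0.3], [0.4, 0.2, 0.7]])
mu = np.array([[0.9, 0.6], [0.3, 0.8]])
print(tsk_output(x, theta, mu))
```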

II-B Interval Type-2 FCM (IT2-FCM)

In interval type-2 FCM, two objective functions that differ in their degrees of fuzziness are optimized simultaneously using Lagrange multipliers to obtain the upper and lower membership functions [3]. Let $m_{1}$ and $m_{2}$ be the two degrees of fuzziness; the two objective functions are

$Q_{m_{1}}(U,v)=\sum_{k=1}^{N}\sum_{i=1}^{c}\mu_{i}(\mathbf{x}_{k})^{m_{1}}E_{ik}(\zeta_{i})^{2}$
$Q_{m_{2}}(U,v)=\sum_{k=1}^{N}\sum_{i=1}^{c}\mu_{i}(\mathbf{x}_{k})^{m_{2}}E_{ik}(\zeta_{i})^{2}$   (3)
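For readers implementing this, a short sketch (assuming the memberships and errors are already stored as NumPy arrays) shows that the two objectives in (3) differ only in the fuzzifier:

```python
import numpy as np

def fcm_objective(mu, E, m):
    """Q_m(U, v) = sum_k sum_i mu_i(x_k)^m * E_ik^2  (eq. (3)).

    mu : (c, N) membership of each of N points in each of c clusters
    E  : (c, N) error of each point with respect to each cluster prototype
    m  : scalar degree of fuzziness
    """
    return np.sum(mu**m * E**2)

# The two objectives share U and E and differ only in m1 vs m2 (toy values).
mu = np.random.rand(3, 10)
E = np.random.rand(3, 10)
Q_m1 = fcm_objective(mu, E, m=1.6)
Q_m2 = fcm_objective(mu, E, m=4.7)
```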

II-C Interval Type-2 Fuzzy c-Regression Algorithm (IT2-FCR)

The main motivation of the interval type-2 fuzzy c-regression algorithm is to partition a set of $n$ data points $(\mathbf{x}_{k},y_{k})$, $k=1,\cdots,n$, into $c$ clusters. The data points in every cluster $i$ can be described by a regression model as

$\hat{y}_{k}=g^{i}(\mathbf{x}_{k},\zeta_{i})=b^{i}_{1}x_{k1}+\cdots+b^{i}_{m}x_{km}+b^{i}_{0}=[\mathbf{x}_{k}\;1]\zeta_{i}^{T}$   (4)

where $\mathbf{x}_{k}=[x_{k1},\cdots,x_{km}]$ is the $k^{th}$ input vector, $j=1,\cdots,m$ indexes the features, $i=1,\cdots,c$ indexes the clusters, and $\zeta_{i}=[b^{i}_{1},\cdots,b^{i}_{m},b^{i}_{0}]$ is the coefficient vector of the $i^{th}$ cluster. In [4], the coefficient vectors are optimized by the weighted least squares method, whereas in our approach we use SGD. The primary reason for using SGD is to keep the algorithm robust even in cases where $[\mathbf{x}^{T}\mathbf{P}_{i}\mathbf{x}]$ becomes singular [6].

III Proposed Methodology

In this paper we present a new framework for FCRM with an innovative student-$t$ distribution based membership function (MF) for sparse data modelling [6, 7, 8]. The approach is described in the following subsections.

III-A Fuzzy-Space Partitioning

Firstly, we formulate the task of fuzzy-space partitioning as Maximum A-Posteriori (MAP) estimation over a squared error function [8]. Using Bayes' rule, the MAP estimator is defined as

$\phi(y)=\arg\max_{x\in R^{n}}p(x/y)=\arg\max_{x\in R^{n}}p(y/x)p(x)$   (5)

where $p(x/y)$ is the posterior, $p(y/x)$ is the likelihood and $p(x)$ is the prior distribution. Using the above equation, the MAP estimator is expressed in terms of the regression problem as

$E_{ik}(\zeta_{i})=(y_{k}-g^{i}(\mathbf{x}_{k},\zeta_{i}))^{2}+\lambda\sum_{p=0}^{m}(b^{i}_{p})^{2}$   (6)

where $E_{ik}(\zeta_{i})$ is the MAP estimator, $(y_{k}-g^{i}(\mathbf{x}_{k},\zeta_{i}))^{2}$ is the likelihood term, and $\sum_{p=0}^{m}(b^{i}_{p})^{2}$ is the prior term, i.e., a regularizer, equivalent to the Bayesian notion of placing a Gaussian prior on the regression weights $b^{i}$ of each cluster $i$; $\lambda$ controls the strength of the regularizer. The regularizer reduces overfitting in the cluster assignment by constraining the regression weights to remain small.
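A minimal sketch of (6), assuming the linear model $g^{i}$ of (4) with the coefficient ordering $[b^{i}_{1},\cdots,b^{i}_{m},b^{i}_{0}]$:

```python
import numpy as np

def map_error(x_k, y_k, zeta_i, lam):
    """E_ik(zeta_i) from eq. (6): squared residual plus an L2 (Gaussian-prior) penalty.

    x_k    : (m,) input vector
    y_k    : scalar target
    zeta_i : (m+1,) coefficients [b_1^i, ..., b_m^i, b_0^i] as in eq. (4)
    lam    : regularization strength lambda
    """
    y_hat = np.append(x_k, 1.0) @ zeta_i          # g^i(x_k, zeta_i), eq. (4)
    return (y_k - y_hat)**2 + lam * np.sum(zeta_i**2)
```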

In the proposed approach, we first fix two degrees of fuzziness $m_{1}$ and $m_{2}$, the number of clusters $c$, and a termination threshold $\epsilon$. We also initialize the parameters $\overline{\zeta_{i}}$ and $\underline{\zeta_{i}}$, the upper and lower regression coefficient vectors of the $i^{th}$ cluster. Then, equation (6) is rewritten in terms of upper and lower MAP error functions as follows:

$E_{ik}(\overline{\zeta}_{i})=(y_{k}-g^{i}(\mathbf{x}_{k},\overline{\zeta}_{i}))^{2}+\lambda\sum_{p=0}^{m}(b^{i}_{p})^{2}$
$E_{ik}(\underline{\zeta}_{i})=(y_{k}-g^{i}(\mathbf{x}_{k},\underline{\zeta}_{i}))^{2}+\lambda\sum_{p=0}^{m}(b^{i}_{p})^{2}$   (7)

To reduce the complexity of the system, a weighted average type reduction is applied to obtain $E_{ik}(\zeta_{i})$ as

$E_{ik}(\zeta_{i})=\dfrac{E_{ik}(\overline{\zeta}_{i})+E_{ik}(\underline{\zeta}_{i})}{2}$   (8)

Through the MAP estimate on the posterior of the defuzzified error function $E_{ik}(\zeta_{i})$, the upper and lower membership functions of every data point in each cluster are obtained, similarly to [3], as follows:

$\overline{u}_{ik}=\begin{cases}\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)^{\frac{2}{m_{1}-1}}}, & \text{if}\;\;\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)}<\dfrac{1}{c}\\[2ex]\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)^{\frac{2}{m_{2}-1}}}, & \text{otherwise}\end{cases}$

$\underline{u}_{ik}=\begin{cases}\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)^{\frac{2}{m_{1}-1}}}, & \text{if}\;\;\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)}\geq\dfrac{1}{c}\\[2ex]\dfrac{1}{\sum_{r=1}^{c}\left(\frac{E_{ik}(\zeta_{i})}{E_{rk}(\zeta_{r})}\right)^{\frac{2}{m_{2}-1}}}, & \text{otherwise}\end{cases}$   (9)
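The update (9) can be vectorized directly; below is a sketch assuming the type-reduced errors from (8) are stored in a $(c\times N)$ NumPy array:

```python
import numpy as np

def it2_memberships(E, m1, m2):
    """Upper and lower memberships per eq. (9).

    E : (c, N) array of type-reduced errors E_ik from eq. (8)
    """
    c = E.shape[0]
    ratio = E[:, None, :] / E[None, :, :]            # ratio[i, r, k] = E_ik / E_rk
    base = 1.0 / np.sum(ratio, axis=1)               # 1 / sum_r (E_ik / E_rk)
    u_m1 = 1.0 / np.sum(ratio**(2.0 / (m1 - 1)), axis=1)
    u_m2 = 1.0 / np.sum(ratio**(2.0 / (m2 - 1)), axis=1)
    upper = np.where(base < 1.0 / c, u_m1, u_m2)     # first case of eq. (9)
    lower = np.where(base >= 1.0 / c, u_m1, u_m2)    # second case of eq. (9)
    return upper, lower
```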

The update in (9) can be interpreted as solving the MAP problem formulated via (8). To estimate the parameters $\overline{\zeta_{i}}$ and $\underline{\zeta_{i}}$, we pose a locally weighted linear regression with the objective function:

$J(\zeta_{i})=\frac{1}{2}\sum_{k=1}^{n}u_{ik}([\mathbf{x}_{k}\;1]\zeta_{i}^{T}-y_{k})^{2}$   (10)

Here, $u_{ik}$ denotes the membership value of the $k^{th}$ data point in the $i^{th}$ cluster. The parameters $\overline{\zeta_{i}}$ and $\underline{\zeta_{i}}$ are estimated by SGD on the above objective, using $\overline{u}_{ik}$ and $\underline{u}_{ik}$ respectively as the weights. The regression coefficients $\zeta_{i}$ are then obtained by a type reduction as follows:

$\zeta_{i}=\dfrac{\overline{\zeta}_{i}+\underline{\zeta}_{i}}{2}$   (11)

The steps in this subsection are repeated, updating the parameters while $||\zeta_{i}^{current}-\zeta_{i}^{previous}||\geq\epsilon$, to obtain the optimal regression coefficients, as summarized in Algorithm 1.
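A condensed sketch of this loop for one cluster, assuming a plain per-sample SGD pass with an arbitrary learning rate (the paper does not specify these details):

```python
import numpy as np

def fit_zeta(X, y, u, zeta, lr=0.01, eps=1e-4, max_iter=1000):
    """SGD on J(zeta_i) = 1/2 sum_k u_ik ([x_k 1] zeta_i^T - y_k)^2  (eq. (10)).

    X : (n, m) inputs; y : (n,) targets
    u : (n,) memberships u_ik of all points in this cluster
    zeta : (m+1,) initial coefficient vector for this cluster
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])    # augmented inputs [x_k 1]
    for _ in range(max_iter):
        zeta_prev = zeta.copy()
        for k in np.random.permutation(len(y)):      # one stochastic pass
            grad = u[k] * (Xa[k] @ zeta - y[k]) * Xa[k]
            zeta = zeta - lr * grad
        if np.linalg.norm(zeta - zeta_prev) < eps:   # convergence test, Sec. III-A
            break
    return zeta

# Fit zeta_bar with the upper memberships and zeta_under with the lower ones,
# then type-reduce: zeta_i = (zeta_bar + zeta_under) / 2  (eq. (11)).
```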

III-B Identification of Antecedent Parameters

The MF developed in [5] is hyperplane shaped and cannot fully capture the information of the data distributions within different clusters. To overcome this issue, we propose a modified Gaussian based MF combined with a student-t density function. The student-t distribution is widely used as a prior in Bayesian inference for sparse data modelling [7]. Here, we weight the Gaussian and student-t parts by a hyper-parameter $\alpha$: if the data being modeled is very sparse, $\alpha$ should be set low so as to give more weight to the student-t membership value.

$\overline{\mu}_{A_{i}}(\mathbf{x}_{k})=\alpha\exp\left(-\eta\frac{(d_{ik}(\overline{\zeta}_{i})-v_{i}(\overline{\zeta}_{i}))^{2}}{\sigma^{2}_{i}(\overline{\zeta}_{i})}\right)+(1-\alpha)\left(1+\frac{d_{ik}^{2}(\overline{\zeta}_{i})}{r}\right)^{-\frac{r+1}{2}}$   (12)

$\underline{\mu}_{A_{i}}(\mathbf{x}_{k})=\alpha\exp\left(-\eta\frac{(d_{ik}(\underline{\zeta}_{i})-v_{i}(\underline{\zeta}_{i}))^{2}}{\sigma^{2}_{i}(\underline{\zeta}_{i})}\right)+(1-\alpha)\left(1+\frac{d_{ik}^{2}(\underline{\zeta}_{i})}{r}\right)^{-\frac{r+1}{2}}$   (13)

In the above, $d_{ik}$ is the distance between the $k^{th}$ input vector and the $i^{th}$ cluster hyperplane:

$d_{ik}(\overline{\zeta}_{i})=\dfrac{|\mathbf{x}_{k}\cdot\overline{\zeta}_{i}|}{||\overline{\zeta}_{i}||};\;\;\;\;d_{ik}(\underline{\zeta}_{i})=\dfrac{|\mathbf{x}_{k}\cdot\underline{\zeta}_{i}|}{||\underline{\zeta}_{i}||}$   (14)

where $r=\max\{d_{ik}(\overline{\zeta}_{i}),\;i=1,\cdots,c\}$ is the maximum distance of the $k^{th}$ input vector over the cluster hyperplanes, and $v_{i}$ and $\sigma_{i}$ denote the mean and the variance, respectively, of the distances of the data points from the $i^{th}$ cluster hyperplane:

$v_{i}(\zeta_{i})=\dfrac{\sum^{n}_{k=1}d_{ik}(\zeta_{i})}{n};\;\;\;\;\sigma_{i}(\zeta_{i})=\dfrac{\sum^{n}_{k=1}(d_{ik}(\zeta_{i})-v_{i}(\zeta_{i}))^{2}}{n}$   (15)

The lower MF $\underline{\mu}_{A_{i}}(\mathbf{x}_{k})$ and upper MF $\overline{\mu}_{A_{i}}(\mathbf{x}_{k})$ act as the weights of the TSK fuzzy model for the $k^{th}$ input belonging to the $i^{th}$ cluster.
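A sketch of the blended MF (12) follows; (13) is identical with the lower coefficient vector. Passing the student-t degrees of freedom as a fixed scalar `df` is a simplification of the per-point $r$ defined after (14):

```python
import numpy as np

def t2_membership(X, zeta, alpha, eta, df):
    """Blended Gaussian / student-t membership, eq. (12) or (13), for one cluster.

    X    : (n, m+1) augmented inputs [x_k 1]
    zeta : (m+1,) upper or lower regression coefficients of the cluster
    df   : student-t degrees of freedom (simplified stand-in for the per-point r)
    """
    d = np.abs(X @ zeta) / np.linalg.norm(zeta)       # distances, eq. (14)
    v = d.mean()                                      # mean distance, eq. (15)
    sigma2 = np.mean((d - v)**2)                      # variance, eq. (15)
    gauss = np.exp(-eta * (d - v)**2 / sigma2)        # Gaussian part
    student = (1 + d**2 / df)**(-(df + 1) / 2)        # student-t part
    return alpha * gauss + (1 - alpha) * student      # blend, eq. (12)/(13)
```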

Algorithm 1 The Proposed Approach
1: Begin
2: for i = 1 to c do
3:     Estimate the upper and lower regression vectors $\overline{\zeta_{i}}$ and $\underline{\zeta_{i}}$ by SGD on (10)
4:     Calculate the errors $E_{ik}(\overline{\zeta}_{i})$, $E_{ik}(\underline{\zeta}_{i})$ using (7)
5:     Calculate the upper and lower memberships $\overline{u}_{ik}$, $\underline{u}_{ik}$ using (9)
6: end
7: The above identifies optimal $\overline{\zeta_{i}}$ and $\underline{\zeta_{i}}$ $\forall\;i\in[1,c]$
8: for i = 1 to c do
9:     Compute the input MFs using (12) and (13)
10:    Compute the interval type-2 output $y_{k}$ using (16)-(18)
11: end
12: End

III-C Identification of Consequent Parameters

In most of the literature, the defuzzification of the weights is computed before determining the model output $\hat{y}_{k}$. The problem with these approaches is that they ignore the effect of this early defuzzification on the model output, which degrades the overall performance of the model. To overcome this, we evaluate $\underline{y}_{k}$ and $\overline{y}_{k}$ corresponding to $\underline{\mu}_{A_{i}}(\mathbf{x}_{k})$ and $\overline{\mu}_{A_{i}}(\mathbf{x}_{k})$ using the KM algorithm [2]. The values of $\underline{y}_{k}$ and $\overline{y}_{k}$ are optimized in parallel until convergence. Another advantage of this approach is that it is more robust to noise and provides a confidence interval for every output data point. The model outputs $\underline{y}_{k}$ and $\overline{y}_{k}$ corresponding to the weights $\underline{\mu}_{A_{i}}(\mathbf{x}_{k})$ and $\overline{\mu}_{A_{i}}(\mathbf{x}_{k})$ are calculated using (1) and (2) as follows:

$\underline{y}_{k}=\dfrac{\sum^{p}_{i=1}\overline{\mu}_{A_{i}}(\mathbf{x}_{k})\,(\theta_{0}^{i}+\theta_{1}^{i}x_{k1}+\cdots+\theta_{M}^{i}x_{kM})+\sum^{c}_{i=p+1}\underline{\mu}_{A_{i}}(\mathbf{x}_{k})\,(\theta_{0}^{i}+\theta_{1}^{i}x_{k1}+\cdots+\theta_{M}^{i}x_{kM})}{\sum^{p}_{i=1}\overline{\mu}_{A_{i}}(\mathbf{x}_{k})+\sum^{c}_{i=p+1}\underline{\mu}_{A_{i}}(\mathbf{x}_{k})}$   (16)

$\overline{y}_{k}=\dfrac{\sum^{q}_{i=1}\underline{\mu}_{A_{i}}(\mathbf{x}_{k})\,(\theta_{0}^{i}+\theta_{1}^{i}x_{k1}+\cdots+\theta_{M}^{i}x_{kM})+\sum^{c}_{i=q+1}\overline{\mu}_{A_{i}}(\mathbf{x}_{k})\,(\theta_{0}^{i}+\theta_{1}^{i}x_{k1}+\cdots+\theta_{M}^{i}x_{kM})}{\sum^{q}_{i=1}\underline{\mu}_{A_{i}}(\mathbf{x}_{k})+\sum^{c}_{i=q+1}\overline{\mu}_{A_{i}}(\mathbf{x}_{k})}$   (17)
Table I: Comparison of performance on house prices dataset
Model LR RR RBFNN ITFRCM [9] TIFNN [9] RIT2FC [9] Proposed
MSE 0.06 0.06 0.049 0.019 0.045 0.035 0.008
Coefficient of Determination 0.68 0.69 0.67 0.73 0.77 0.79 0.85
Median Absolute Error 0.71 0.73 0.73 0.75 0.80 0.81 0.85

LR: Logistic Regression, RR: Ridge Regression, RBFNN: Radial Basis Function Neural Network, ITFRCM: Interval Type-2 Fuzzy c-Means, TIFNN: Type-1 Set-Based Fuzzy Neural Network, RIT2FC: Reinforced Interval Type-2 FCM-Based Fuzzy Classifier

where $p$ and $q$ are switching points computed by the KM algorithm. We run the above steps until convergence of $\overline{y}_{k}$ and $\underline{y}_{k}$. Finally, the model output is determined by applying a type reduction as

$y_{k}=\dfrac{\overline{y}_{k}+\underline{y}_{k}}{2}$   (18)
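A minimal sketch of (16)-(18), assuming the switching points $p$ and $q$ have already been located by the KM iterations [2] and the rule outputs are sorted in ascending order (the KM search itself is omitted):

```python
import numpy as np

def type_reduced_output(y_rule, mu_lo, mu_hi, p, q):
    """Combine rule outputs into y_k via eqs. (16)-(18), given KM switching points.

    y_rule : (c,) consequent outputs theta_0^i + theta^i . x_k, sorted ascending
    mu_lo, mu_hi : (c,) lower/upper memberships of x_k in each rule
    p, q   : switching points returned by the Karnik-Mendel iterations [2]
    """
    # Lower bound: upper memberships up to p, lower memberships after  (eq. (16))
    w_lo = np.concatenate([mu_hi[:p], mu_lo[p:]])
    y_lower = w_lo @ y_rule / w_lo.sum()
    # Upper bound: lower memberships up to q, upper memberships after  (eq. (17))
    w_hi = np.concatenate([mu_lo[:q], mu_hi[q:]])
    y_upper = w_hi @ y_rule / w_hi.sum()
    return (y_lower + y_upper) / 2                    # type reduction, eq. (18)
```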

IV Results & Discussion

IV-A House Prices Dataset

The house prices dataset (https://www.kaggle.com/lespin/house-prices-dataset) is used to predict the sale price of a property. Through experimentation, we demonstrate the robustness of the proposed method on this sparse data. The dataset is divided into training (70%) and testing (30%) sets, and five-fold cross-validation is used during training. The hyper-parameters of the model are initially set as: $c=3$, $m_{1}=1.6$, $m_{2}=4.7$, $\lambda=0.3$, $\alpha=0.15$ and $\eta=3.7$. Note that $\alpha=0.15$ is small because the dataset is sparse, so the contribution of the student-$t$ part of the MF should be high, which is ensured by a smaller $\alpha$, i.e., a larger $1-\alpha$ in (12) and (13). The mean square error (MSE) on the test data is $0.008$, lower than the state-of-the-art methods shown in Table I. The absolute error shown in Fig. 2 is also small compared to the absolute house prices shown in Fig. 1. We postulate that this is due to the student-$t$ MF used in our model, which helps in robustly quantifying the effects of sparse data. The higher test accuracy is also due to the greater generalization afforded by the $L_{2}$ regularizer. The coefficient of determination, the ratio of explained variance to total variance, is very high ($0.85$). This suggests that our model captures variations in the data robustly and is not susceptible to faulty performance in the presence of outliers.

Figure 1: Performance comparison of model output with actual output
Figure 2: Plot of test error for house prices dataset

IV-B Non-Linear Plant Modeling

The second-order non-linear difference equation given in (19) is used to draw comparisons with the benchmark models listed in Table II.

$z(k)=\dfrac{(z(k-1)+2.5)z(k-1)z(k-2)}{1+z^{2}(k-1)+z^{2}(k-2)}+v(k)$   (19)

where $v(k)=\sin(2k/25)$ is the input used for model validation, $z(k)$ is the model output, and $z(k-1)$ and $z(k-2)$ are the model inputs. The hyper-parameters are tuned by grid search and finally set as: $c=4$, $m_{1}=1.5$, $m_{2}=7$ and $\eta=3.14$. The MSE of the model on $500$ test data points is $7.2\times 10^{-5}$ using only four rules, which is much smaller than that of the other models. Through simulations, we show that the proposed model outperforms other state-of-the-art models. Fig. 3 shows that our model output closely tracks the actual output at every time step. As observed in Fig. 4, the error fluctuates across data points, but the absolute error stays below 0.1 with no rapid surge at stationary points of the time series. This is a crucial requirement for a stable system, so we conclude that our algorithm yields a dynamically stable model.
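For reproducibility, a sketch generating this benchmark series is given below; the zero initial conditions are an assumption, as the paper does not state them:

```python
import numpy as np

# Generate the benchmark series of eq. (19); z(0) = z(1) = 0 is an assumption.
N = 500
z = np.zeros(N)
for k in range(2, N):
    v = np.sin(2 * k / 25)
    z[k] = ((z[k-1] + 2.5) * z[k-1] * z[k-2]) / (1 + z[k-1]**2 + z[k-2]**2) + v

# Model inputs are z(k-1) and z(k-2); the target is z(k)
X = np.column_stack([z[1:-1], z[:-2]])
y = z[2:]
```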

Table II: Performance on non-linear time series problem
State-of-the-Art   Rules   MSE
Li et al. [1]   4   $1.49\times 10^{-2}$
Fazel Zarandi [4]   4   $5.4\times 10^{-3}$
Li et al. [5]   4   $1.02\times 10^{-2}$
MIT2 FCRM [6]   4   $1.02\times 10^{-4}$
Proposed   4   $\mathbf{7.2\times 10^{-5}}$
Figure 3: Performance comparison of model output and actual output
Figure 4: Plot of test error on time series data

IV-C A sinc Function in one Dimension

In this subsection, a non-linear sinc function is used to demonstrate the effectiveness of the proposed model:

$y=\dfrac{\sin(x)}{x}$   (20)

where $x\in[-40,0)\cup(0,40]$. We sample 121 data points uniformly for this one-dimensional function. As in the previous case study, the number of rules is set to four. The hyper-parameters are tuned through grid search and finally fixed as: $m_{1}=1.5$, $m_{2}=7$ and $\eta=3.14$. The MSE of the proposed model on the 121 test samples is $2.4\times 10^{-3}$, lower than the $7.7\times 10^{-3}$ of the modified interval type-2 FCRM (MIT2-FCRM) [6]. Table III provides a detailed comparison with state-of-the-art methods.
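A sketch of the test data for (20) is given below; the exact uniform grid is not specified in the paper, so this 121-point sampling is an assumption:

```python
import numpy as np

# 122 uniform points on [-40, 40]; none falls exactly at x = 0, and dropping
# one point near the origin leaves the assumed 121 samples for eq. (20).
x = np.delete(np.linspace(-40, 40, 122), 61)
y = np.sin(x) / x
```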

Table III: Performance on sinc function
State-of-the-Art   Rules   MSE
SCM [10]   2   $4.47\times 10^{-2}$
EUM [10]   2   $4.50\times 10^{-2}$
EFCM [10]   2   $8.9\times 10^{-3}$
Fazel Zarandi [4]   4   $2.385\times 10^{-2}$
MIT2 FCRM [6]   4   $7.7\times 10^{-3}$
Proposed   4   $\mathbf{2.4\times 10^{-3}}$

V Conclusion

In this paper, we have illustrated the efficacy of the proposed Bayesian type-2 fuzzy regression approach with a student-t distribution based MF. The proposed MF is useful for fuzzy c-regression models, as demonstrated in Section IV. When the number of features is small compared to the number of samples, clustering of the input-output space proves very effective for identifying the rules of the fuzzy system. In addition, we have demonstrated that, instead of directly defuzzifying the weights before computing the final output, continued defuzzification and optimization of the interval outputs gives better results.

References

  • [1] C. Li, et al., “T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm,” Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 646–653, 2009.
  • [2] J. Mendel, “On KM algorithms for solving type-2 fuzzy set problems,” IEEE Trans. on Fuzzy Syst., vol. 21, no. 3, pp. 426–446, 2013.
  • [3] C. Hwang and F. C. H. Rhee, “Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means,” IEEE Trans. on Fuzzy Syst., vol. 15, no. 1, pp. 107–120, 2007.
  • [4] M. H. F. Zarandi, R. Gamasaee, and I. B. Turksen, “A type-2 fuzzy c-regression clustering algorithm for Takagi-Sugeno system identification and its application in the steel industry,” Information Sciences, vol. 187, pp. 179–203, 2012.
  • [5] C. Li, et al., “T-S fuzzy model identification with a gravitational search-based hyperplane clustering algorithm,” IEEE Trans. on Fuzzy Syst., vol. 20, no. 2, pp. 305–317, 2012.
  • [6] W. Zou, C. Li, and N. Zhang, “A T-S fuzzy model identification approach based on a modified inter type-2 FRCM algorithm,” IEEE Trans. on Fuzzy Syst., vol. 26, no. 3, pp. 1104–1113, 2017.
  • [7] V. E. Bening and V. Y. Korolev, “On an application of the Student distribution in the theory of probability and mathematical statistics,” Theory of Probability & Its Applications, vol. 49, no. 3, pp. 377–391, 2005.
  • [8] R. Gribonval, “Should penalized least squares regression be interpreted as maximum a posteriori estimation?” IEEE Trans. on Signal Process., vol. 59, no. 5, pp. 2405–2410, 2011.
  • [9] E. H. Kim, S. K. Oh, and W. Pedrycz, “Design of reinforced interval type-2 fuzzy c-means-based fuzzy classifier,” IEEE Trans. on Fuzzy Syst., vol. 26, no. 5, pp. 3054–3068, 2017.
  • [10] M. S. Chen and S. W. Wang, “Fuzzy clustering analysis for optimizing fuzzy membership functions,” Fuzzy Sets and Systems, vol. 103, no. 2, pp. 239–254, 1999.