Granular-Ball Fuzzy Set and Its Implementation in SVM

Shuyin Xia, Xiaoyu Lian, Guoyin Wang*, Xinbo Gao, Yabin Shao S. Xia, X. Lian, G. Wang, X. Gao and Y. Shao are with the Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Telecommunications and Posts, 400065, Chongqing, China. E-mail: [email protected], [email protected], [email protected].

Abstract

Most existing fuzzy set methods use points as their input, which is the finest granularity from the perspective of granular computing. Consequently, these methods are neither efficient nor robust to label noise. Therefore, we propose a frame-work called granular-ball fuzzy set by introducing granular-ball computing into fuzzy set. The computational framework is based on the granular-balls input rather than points; therefore, it is more efficient and robust than traditional fuzzy methods, and can be used in various fields of fuzzy data processing according to its extensibility. Furthermore, the framework is extended to the classifier fuzzy support vector machine (FSVM), to derive the granular ball fuzzy SVM (GBFSVM). The experimental results demonstrate the effectiveness and efficiency of GBFSVM. The source codes and data sets are available on the public link: http://www.cquptshuyinxia.com/GBFSVM.html.

Index Terms:

Fuzzy set, granular-ball, SVM, granular computing, label noise.

I Introduction

In the practical world, there are numerous fuzzy phenomena or concepts in the objective world, such as big and small, light and heavy, fast and slow, dynamic and static, deep and shallow, beauty and ugliness, etc., which cannot be clearly and completely distinguished. In fact, fuzzy information is also reliable information. In order to quantitatively describe the objective laws of fuzzy concepts and fuzzy phenomena, Professor L.A. Zadeh, an American computer and cybernetics expert, put forward the important concept of fuzzy set [1] in 1965. He used membership functions to represent fuzzy sets, which are functions of [0,1] closed intervals, to describe the degree to which elements belong to fuzzy sets. The greater the function value, the greater the degree of membership. Since Zadeh introduced fuzzy sets [2], it has been applied to various fields such as control systems, pattern recognition, machine learning, etc, and its another branch, fuzzy rough set, has also been developed rapidly. Several scholars have conducted in-depth research in the direction of feature selection [3, 4, 5, 6, 7, 8], clustering [9], decision making [10, 11] , classification [12] and so on.

Refer to caption — Figure 1: Human cognition the coarse-grained large range is preferred.

Considering the classification problem of fuzzy data sets, Lin et al. [1] proposed a fuzzy support vector machine (FSVM) model by applying fuzzy membership to each input point. The model can make full use of the sample information, however, the complexity of the training stage is still high for a large number of data classification problems. For the research on fuzzy set classification tasks in the field of machine learning, Aydogan et al. [13] proposed a hybrid heuristic method based on the genetic algorithm (GA) and integer programming formula (IPF) to solve the high-dimensional classification problem in the classification system of linguistic fuzzy rules. The method can find accurate and concise classification rules, but can not flexibly consider the number of rule sets generated in the classification. Sanz et al. [14] directly learned interval-valued fuzzy rules by defining a packaging method to obtain a classification system based on the interval valued fuzzy principle. Compared with the existing algorithm at that time, the accuracy of this method has been significantly improved, but the unbalanced classification problem can not be well tested. The algorithm is inefficient owing to its two evolutionary processes. Li et al. [15] proposed an interval extreme learning machine for interval fuzzy set classification of continuous-valued attributes, in which the discretization of conditional attributes and fuzzification of class labels are considered. Recently, an associative fuzzy classifier called CFM-BD [16] was been developed, which has shown robust predictive performance against more complex algorithms such as fuzzy decision trees [17]. To simplify the rule set, Aghaeipoor et al. [18] proposed a new scalable fuzzy classifier for big data, namely Chi-BD-DRF, which added the method of "dynamic rule filtering (DRF)" to supplement fuzzy big data learning.

The aforementioned mentioned processing methods are based on the finest granularity from the perspective of granular computing [19, 20], as shown in Fig. 2(a), therefore, it is not efficient and robust. Human cognition has the rule of "large scope first," and the visual system is particularly sensitive to the global topological characteristics, from large to small, from coarse-grained to fine-grained as shown in Fig. 1 [21]. In granular computing, the larger the granularity size, the higher the efficiency and the better the robustness to noise. However, this is also more likely to lead to a lack of detail and loss of accuracy. Smaller granularity allows more attention to detail, but may reduce the efficiency and robustness to label noise. In the past decades, scholars worldwide have been constantly studying [22, 23, 24, 25], who granulate huge amounts of data and knowledge into different granularities according to different tasks. The relationship between these granularities was then used to solve this problem [26, 27, 28, 29]. Selecting different granularities according to various scenarios can improve the performance of multi-granularity learning methods and solve practical problems [30, 31, 32]. Therefore, Xia et al.[33] proposed granular-ball classifiers using some hyper-balls to granulate the dataset into different sizes of granular-balls [34]. The granular-ball support vector machine (GBSVM) [35] is further proposed, and exhibits higher accuracy and efficiency than the traditional SVM. In order to improve the efficiency of fuzzy data processing, the idea of granular-ball computing can be introduced into fuzzy data processing by defining the fuzzy granular-ball, as shown in 2(b). The concept of fuzzy granular-balls was briefly proposed in our previous work [36], but its algorithm is not designed; besides, its SVM model is incorrect [36, 37], too complex and not consistent with the SVM. In order to improve the efficiency and robustness of fuzzy classifiers by combining granular-ball computing, the main contributions of the paper are as follows:

$\bullet$

We propose a framework called the granular-ball fuzzy set by introducing the concept of the fuzzy granular-ball. It is different from the traditional fuzzy data processing method.
$\bullet$

GBFSVM is proposed based on the fuzzy granular-ball framework. The framework uses granular-balls as the basic analysis unit instead of data points.
$\bullet$

Considering the classification problem with the characteristics of triangular fuzzy numbers, the GBFSVM based on triangular fuzzy numbers is derived in detail using the possibility measure theory.
$\bullet$

Particle swarm optimization (PSO) is used to solve the dual model of GBSVM. Experimental results indicate that GBFSVM performs better than the traditional SVM and FSVM both in robustness and effectiveness.

The rest of this paper is organized as follows: we introduce the concepts of fuzzy sets and the work related to granular-ball computing II. Section III details the granular-ball fuzzy set framework and the definition of fuzzy granular-ball. Section IV introduces the application of granular-ball fuzzy set in fuzzy support vector machines and support vector machines based on triangular fuzzy numbers. The experimental results and analysis are presented in Section V. Finally, some concluding remarks are given in Section VI.

II Related Work

II-A Related concepts of fuzzy sets

With the development of modern science and technology, the system we are facing is becoming more and more complex. For complex problems in the fields of humanities, social sciences and other "soft sciences," it is often difficult to provide an accurate evaluation owing to insufficient cognition or information content in the decision-making process. For multi-attribute decision making without specific decision information, it is difficult for decision makers to accurately evaluate the scheme, thus, the concept of the fuzzy set is generated. This concept is as follows:

Definition 1.

(Fuzzy set [38]) If $X$ is a collection of objects denoted generically by $x$ , then a fuzzy set $\tilde{A}$ in $X$ is a set of ordered pairs:

\tilde{A}=\left\{x,\mu_{\tilde{A}}(x)|x\in X\right\}.

(1)

$\mu_{\tilde{A}}(x)$ is called the membership function (generalized characteristic function) which maps $X$ to the membership space $M$ . Generally speaking, the range of the membership function is $[0,1]$ .

The most important role of fuzzy sets is to represent various uncertainties in the data and data processing. In particular, the introduction of fuzzy sets in big data improves the representation ability of the information samples.

In particular, the triangular fuzzy number is the concept of the fuzzy set proposed by Professor Lotfi A. Zadeh in 1965 in order to solve these problems in an uncertain environment. The concept of triangular fuzzy number is as follows:

Definition 2.

(Triangular fuzzy number [39]) Suppose $\tilde{a}$ is a triangular fuzzy number, when its membership function is expressed as follows:

\mu_{\tilde{a}}(x)=\left\{\begin{aligned} &\frac{x-r_{1}}{r_{2}-r_{1}},\quad\quad r_{1}\leq x\textless r_{2},\\ &1,\quad\quad\quad\quad\quad x=a,\\ &\frac{x-r_{3}}{r_{2}-r_{3}},\quad\quad r_{2}\textless x\leq r_{3},\\ \end{aligned}\right.

(2)

where $r_{1}\leq r_{2}\leq r_{3},r_{j}\in\mathbb{R}(j=1,2,3)$ and $\tilde{a}$ is called a triangular fuzzy number, denoted by $\tilde{a}=(r_{1},r_{2},r_{3})$ . The real numbers $r_{2}$ , $r_{1}$ and $r_{3}$ are called the center, left and right endpoints of the triangular fuzzy number $\tilde{a}$ , respectively. The center reflects the main position of the triangular fuzzy number, and the real number $a$ can be expressed as a special triangular fuzzy number $a=(a,a,a)$ .

The probability of occurrence of a fuzzy event ${\tilde{a}}$ can be measured using the possibility measure. This possibility measure was proposed by Professor Lotfi A. Zadeh in 1978. It is defined as follows:

Definition 3.

(Possibility measure [40]) Let ( $\Gamma$ , $\mathcal{A}$ ) be a backup domain space, and $\rm{Pos}$ is a set function defined on the backup domain $\mathcal{A}$ . If $\rm{Pos}$ satisfies the following conditions:
(1) ${\rm{Pos}}(\emptyset)=0$ , and ${\rm{Pos}}(\Gamma)=1$ ;
(2) For any subclass $\left\{A_{i}|i\in I\right\}$ of $\mathcal{A}$ , there are ${\rm{Pos}}\left(\underset{i\in I}{\bigcup}A_{i}\right)=\underset{i\in I}{\sup}\ {\rm{Pos}}(A_{i})$ . This is called a possibility measure, and the triples ( $\Gamma,\mathcal{A},{\rm{Pos}}$ ) are called possibility spaces.

When the triangular fuzzy number is used to represent the fuzzy event $a$ , the likelihood of the fuzzy event is measured as follows:

Definition 4.

[41] Let ${\tilde{a}}=(r_{1},r_{2},r_{3})$ be a triangular fuzzy number, then:

{\rm{Pos}}({\tilde{a}}\leq 0)=\left\{\begin{aligned} &1,\quad\quad\quad\quad\quad r_{2}\leq 0,\\ &\frac{r_{1}}{r_{1}-r_{2}},\quad\quad r_{1}\leq 0,r_{2}\textgreater 0,\\ &0,\quad\quad\quad\quad\quad r_{1}\textgreater 0.\\ \end{aligned}\right.

(3)

Lemma 1.

[42] Let ${\tilde{a}}=(r_{1},r_{2},r_{3})$ be a triangular fuzzy number, then for any given confidence level $\lambda\left(0\textless\lambda\leq 1\right)$ , we have:

{\rm{Pos}}\left\{{\tilde{a}}\leq 0\right\}\geq\lambda\Leftrightarrow(1-\lambda)r_{1}+\lambda r_{2}\leq 0.

(4)

SVM is a powerful tool for solving classification problems, however, this theory still has some limitations. All training points of the same class are uniformly treated using SVM theory. In various real-world applications, the effects of training points are different, and all have an ambiguous membership relationship. Specifically, each training point no longer belongs to one of these two classes entirely. Whereas the parameter $\xi$ is a measure of the error in SVM, and FSVM considers adding different weights(i.e., membership degree ${\delta}_{i}$ ) to the error [1], and its model is as follows:

\left\{\begin{aligned} &{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{l}{\delta}_{i}\xi_{i},\\ &s.t.\quad y_{i}(wx_{i}+b)\textgreater 1-\xi_{i},\\ &\quad\quad\ \ \xi_{i}\geq 0,i=1,2,...,n.\end{aligned}\right.

(5)

II-B Granular-ball Computing

Granular-ball computing is a big data processing method proposed by Wang and Xia to meet the scalability of high-dimensional data [33]. The core idea of granular-ball computing involves using the hyper ball to cover all or part of the sample space, and use "granular-ball" as the input to represent the sample space, so as to achieve multi-granularity learning characteristics and accurate characterization of the sample space. The great advantage of this method is that it only needs two data representations of center and radius in any dimension.

In any dimensional space $R^{d}$ , each granular-ball can be described by two parameters i.e. center $c$ and radius $r$ . The detailed definition is as follows:

Definition 5.

[33] Given a data set $D=\{x_{1},x_{2},...,x_{n}\}\in\mathbb{R}^{d}$ , the center $c$ of a granular-ball is the center of gravity for all sample points in the ball, and $r$ is equal to the average distance from all points in the granular-ball to $c$ . Specifically, we have:

c=\frac{1}{n}\sum_{i=1}^{n}x_{i};\ \ \ r=\frac{1}{n}\sum_{i=1}^{n}\left|x_{i}-c\right|.

The radius $r$ is defined as the average distance rather than the maximum distance. The balls generated with the average distance are not easily affected by the outlier sample and better fit the data distribution. The label of a granular-ball is defined as the label with the most appearances in a granular-ball. To quantitatively analyze the mass of the split granular-ball, the concept of "purity threshold" is proposed, and it is defined as the percentage of majority samples with the same label in a granular-ball.

Given the training set $D=\{x_{i},i=1,2,...,n\}$ , taking the reciprocal of the granular-ball covering is taken to optimize its minimum value. The optimization goal of the granular-balls can be expressed as:

\left\{\begin{aligned} &{\mathop{\min}}\ \ {\lambda_{1}}*\frac{n}{\sum_{j=1}^{m}\left|GB_{j}\right|}+{\lambda_{2}}*m,\\ &s.t.\quad quality(GB_{j})\geq T,j=1,2,...,m,\\ \end{aligned}\right.

(6)

where $\lambda_{1}$ and $\lambda_{2}$ are the corresponding weight coefficients and $T$ is the purity threshold. $m$ represents the number of granular-balls, $GB$ , and $m<n$ [21]. The existing granular-ball splitting method currently uses the efficient $k$ -means method ( $k$ is the number of labels in a certain ball) to ensure the efficiency of the granular-ball classification process. Fig. 3 is a heuristic algorithm to solve the model (6). The dataset as a whole can be considered as a granular-ball at the beginning, as shown in Fig. 4(a). At this instant, the quality of the granular-ball is the worst, and it cannot describe the distribution characteristics of the data. Therefore the granular-ball needs to be further split, and for each split, it was necessary to count the number of labels for the different categories in the granular-balls. The ball will continue to be divided and the purity of the granular-ball increases, as shown in Fig. 4 (b)-(d). When the purity of all the granular-balls meets the requirement, the algorithm converges result as shown in Fig. 4 (e). The resulting granular-balls are shown in Fig. 4 (f), which shows that granular-ball computing can well describe the distribution of data.

II-C Granular-ball SVM

SVM is one of the most classical and popular classification algorithms in machine learning in recent decades. It is a generalized linear classifier, which classifies data using supervised learning. The SVM classifier takes points as input and divides the data into two categories by finding a classification line (two-dimensional is a straight line, three-dimensional is a plane, and multi-dimensional is a hyperplane). If the points are taken as the input, there will be a large amount of calculation and low robustness. Considering granule balls as input, the basic model of granular ball support vector machine (GBSVM) is derived [35], and its main principle is shown in Fig. 3. Given a dataset $D=\left\{(x_{i},y_{i}),i=1,2,...,n\right\}$ , where $y_{i}\in\left\{+1,-1\right\}$ denotes the label of $x_{i}$ . In Fig. 5, the two colors represent the two types of granular-balls, which are labeled "+1" and "-1". The set of generated granular-balls is denoted by $G=\{((c_{i},r_{i}),y_{i}),i=1,2,...,n\}$ as the input, where $c$ and $r$ represent the center and radius, respectively.

Taking the ball as input has two main advantages: Firstly, taking the ball as input significantly reduces the number of samples input, so as to improve the efficiency of training. Secondly, as shown in Fig. 5, since the overall label of a granular ball is the label with the most appearances in that granular ball, a small number of samples do not affect the label of the whole granular-ball, so the granular-ball algorithm is more robust.

By maximizing the interval and formalizing it into convex quadratic programming, the objective function of the separable GBSVM in separable classification can be obtained as follows:

\left\{\begin{aligned} &{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2},\\ &s.t.\quad y_{i}(wc_{i}+b)-\|w\|{r_{i}}\textgreater 1,i=1,2,...,n,\\ \end{aligned}\right.

(7)

where $b$ is the bias of the decision plane and $w$ denotes the normal vector of the decision plane.

In the separable GBSVM, all supported granular-balls must satisfy the constraints. However, it is necessary to introduce the slack variable $\xi$ and the penalty coefficient $C$ , since some granular-balls do not satisfy the constraint in most cases. Therefore, the inseparable GBSVM model can be expressed as:

\left\{\begin{aligned} &{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{l}\xi_{i},\\ &s.t.\quad y_{i}(wc_{i}+b)-\|w\|{r_{i}}\textgreater 1-\xi_{i},\\ &\quad\quad\ \ \xi_{i}\geq 0,i=1,2,...,n.\end{aligned}\right.

(8)

GBSVM can be robust to noise without using any other techniques, since the coarse granularity of a granular-ball can eliminate the effect of the minimum granularity of the label noise points as shown in Fig. 5.

III Granular-ball fuzzy set

III-A Motivation

Most of the existing data processing methods use the finest information granularity as input, which is computationally inefficient. GBSVM combined with granular-ball computing and obtained more efficient and robust results. However, this method was not introduced to the fuzzy set. Classical FSVM was proposed by introducing the fuzzy membership of each point in the training set. The fuzzy membership $\delta_{i}$ is the attitude of the corresponding point $x_{i}$ toward one class and the parameter $\xi_{i}$ is a measure of error. FSVM was considered to add different membership weights to the loss terms, but it still used the most fine-grained sample points as input [1], as shown in Fig. 6(a). The time cost is high for large sample data. In addition, FSVM that takes points as input is not robust to label noise. Granular-ball computing can be considered in the fuzzy classifiers, and the simple principle is shown in Fig. 6(b). In this study, we propose a framework called granular-ball fuzzy set to address these issues, and the framework can be applied to all directions of fuzzy data processing.

III-B Fuzzy granular-ball computing framework

Inspired by the granular-ball computing, we believe that the traditional fuzzy data processing structure with points as input can be converted to a new structure incorporating granular-ball fuzzy set. Given a fuzzy dataset $D=\{(x_{1},\delta_{1}),(x_{2},\delta_{2}),...,(x_{n},\delta_{n})\}$ , where $\delta_{i}$ is the degree of membership. In the new structure, different sizes of granular-balls are generated computationally, so that the dataset is reformulated as:

G=\{(GB_{1},\delta_{GB_{1}}),(GB_{2},\delta_{GB_{2}}),...,(GB_{l},\delta_{GB_{l}})\},

where $l$ and $\delta_{GB_{1}}$ represent the number and membership degree of fuzzy granular-ball. Each granular-ball, $GB_{i}$ , can be represented by its center $c_{i}$ and radius $r_{i}$ . Assuming that the fuzzy data processing method is a function $f(x_{i},\delta_{i})$ , the computing framework of fuzzy granular-ball is to transform the points into fuzzy granular-balls as input, which is expressed as:

f(x_{i},\delta_{i})\rightarrow f(GB_{i},\delta_{GB_{i}}).

In this computational framework, the input becomes a general description of each sub-dataset of different sizes. Taking balls as input reduces the number of input samples, thus greatly improving the efficiency of training. In addition, the size of fuzzy granular-ball can be adjusted according to the specific situation. Coarse granularity balls may not lead to lower accuracy, because larger coarse granularity balls can reduce the impact of noisy data and increase robustness to noisy. The framework called granular-ball fuzzy set can be extended to various scenarios in the field of fuzzy data processing. Therefore, the framework has three advantages: high efficiency, robustness and scalability.

III-C Definition of the fuzzy granular-ball

Taking the field of fuzzy data classification as an example, most of the existing processing methods all take sample points as input, and the computation process has a high time cost and is not robust to label noise. In our previous work, we initially provided the definition of a fuzzy granular-ball with a known the membership degree of each sample point as follows [36]:

Definition 6.

If the membership degree ${\delta}_{i}$ of each sample in the dataset is known, fuzzy granular-balls are obtained by the $k$ -means algorithm. For each fuzzy granular-ball, $GB_{i}=\left\{(x_{1},{\delta}_{1}),(x_{2},{\delta}_{2}),...,(x_{l},{\delta}_{l})\right\}$ , where $l$ represents the number of samples in each granular-ball. The membership degree ${\delta}_{GB}$ of the ball is obtained as follows:

{\delta}_{GB_{i}}=\frac{1}{l}\sum_{t=1}^{l}{\delta}_{t}.

However, the above definition does used a general framework, its GBSVM algorithm is not designed, and its GBSVM model is incorrect, too complex and inconsistent with the FSVM. However, in practice, most of the sample points of fuzzy data sets do not contain membership degrees, so we define the fuzzy granular-ball by designing the membership function as follows:

Definition 7.

The set of fuzzy granular-balls, $D=\{\left((c_{1},r_{1}),y_{1}\right),\left((c_{2},r_{2}),y_{2}\right),...,\left((c_{n},r_{n}),y_{n}\right)\}$ is generated by $k$ -means algorithm ( $n$ represents the number of balls). The membership degree of the fuzzy granular-ball ${GB_{i}}$ , ${\delta}_{GB_{i}}$ , is defined using the membership function $\mu(x)$ as follows:

{\delta}_{GB_{i}}=\mu(c_{i}).

In view of Definition 7 of fuzzy granular-ball, on the one hand, fuzzy granular-ball is used instead of sample points as input to improve the computational efficiency when the membership function of training samples is known. On the other hand, it is necessary to calculate the corresponding center point of each fuzzy granular-ball as the membership degree of the ball to participate in training, instead of calculating the membership degree of each sample, which significantly reduces the operation cost and improves computational efficiency.

In addition, the overall label of the fuzzy granular-ball is defined as the label that appears most frequently in the ball. In general, a larger fuzzy granular-ball leads to greater efficiency and lower accuracy. However, since the overall label of a granular-ball is the label that appears most in the ball, the influence of noisy data in each fuzzy granular-ball can be eliminated, which makes the fuzzy granular-ball algorithm more robust.

III-D Fuzzy granular-ball generation method

The generated granular-balls may also contain samples of different classes. To determine whether the fuzzy granular-ball should be further divided, we also used the concept of "purity," which is defined as the percentage of the majority sample in the fuzzy granular-ball. When the purity value of the granular-ball is too low, the further division is required to obtain a high-quality granular-ball. For example, if a granular-ball contains 20 positive samples and 80 negative samples, its purity is equal to 0.8. If the purity threshold is set to 0.9, then fuzzy granular-ball must be segmented. As is known to all, the $k$ -means clustering algorithm is more suitable for generating spherical data clustering to achieve good efficiency, so the method of fuzzy granular-balls generation is designed as shown in Algorithm 1.

Algorithm 1 Generation of fuzzy granular-balls

Input: Fuzzy training set $D$ , the purity threshold $p$ ;
Output: Fuzzy granular-ball $GB$ ;

1: The fuzzy training set

D

is split into two balls

GB_{1}^{1}

and

GB_{2}^{1}

by using 2-means. The number of iteration steps is denoted as

s

and initialized to 1.

2: For each

GB_{j}^{s}

3: Calculate the center

c_{j}^{s}=\frac{1}{n_{j}^{s}}\sum_{i=1}^{n_{j}^{s}}x_{i}

GB_{j}^{s}

, where

x_{i}\in GB_{j}^{s},i=1,...,n_{j}^{s}

;

4: Calculate the radius

r_{j}^{s}=\frac{1}{n_{j}^{s}}\sum_{i=1}^{n_{j}^{s}}\left|x_{i}-c_{j}^{s}\right|

;

5: The purity degrees

p_{j}^{s}

is equal to the percentage of majority samples in

GB_{j}^{s}

;

6: If the value

p_{j}^{s}<p

Then

GB_{j}^{s}

is split by 2-means clustering algorithm.

8: End if

9: End for

10: If the purity of each granular-ball is higher than

p

Then

11: The membership degree of fuzzy granular-ball can be obtained by

\delta_{GB_{j}^{s}}=\mu(c_{j}^{s})

12: Else

13:

s=s+1

, return to step 2;

14: End if

As with granular-ball computing, the learning process of the fuzzy granular-ball classifier includes fuzzy granular-ball generation and its computational learning. Since the number of fuzzy granular-balls generated after splitting a large dataset can almost be considered as a small constant, the training time of the classification or regression process can be ignored. Therefore, the time cost of the fuzzy granular-ball classifier is mainly caused by the time of the fuzzy ball generation process. It is commendable that the fuzzy granular-ball classifier has good robustness and low time complexity [33], which does not necessarily result in a loss of accuracy.

IV The application of granular-ball fuzzy set

IV-A Granular-ball fuzzy support vector machine

In practical problems, there are sample points that may not completely belong to a certain class and some sample points that are affected by noise and have no significance for classification. For example, when turning the steering wheel, the left turn can be 20% or 100%, but in both cases the corresponding label is left. However, standard support vector machine treats all training samples equally, so it is very sensitive to the noise and outlier samples mixed in another class, which reduces the generalization ability of the classifier. In view of this situation, Lin et al. [1] proposed FSVM model by applying the fuzzy technology to SVM. According to the different contributions of different input samples to classification, the corresponding membership degree was assigned, so as to reduce the influence of noise and outlier samples and improve the classification performance of SVM. However, FSVM spends a lot of time solving the problem of fuzzy data classification with a large number of samples. We apply granular-ball fuzzy set to FSVM and use fuzzy balls instead of sample points as input to improve the computational efficiency. Suppose that we have a series of training points:

\left\{\left((c_{1},r_{1}),\delta_{1},y_{1}\right),\left((c_{2},r_{2}),\delta_{2},y_{2}\right),...,\left((c_{l},r_{l}),\delta_{l},y_{l}\right)\right\},

(9)

where $c$ , $r$ and $\delta$ represent the center, radius and membership of the granular-ball, respectively.

For the membership degree $\delta$ , there are two cases in which the membership degree is known and the membership degree is unknown. When the membership degree is known, the membership degree of the fuzzy granular-balls can be obtained by Definition 6, that is, the average membership degree of all sample points in each fuzzy granular-ball. For most existing data sets, membership information is not included. Therefore, it is necessary to construct membership functions to generate membership information, and the membership of fuzzy granular-ball can be obtained by Definition 7.

At present, the method for constructing the membership function is primarily measured by calculating the distance from the sample to the class center. The closer the distance, the greater the degree of membership. The farther the distance is, the smaller the membership degree is.

Denote the mean of class +1 as $x_{+}$ and that of class -1 as $x_{-}$ . Let the radius of class +1 and the radius of class -1 be:

r_{+}=\underset{x_{i}:y_{i}=1}{\rm{max}}\left|x_{i}-x_{+}\right|,

(10)

r_{-}=\underset{x_{i}:y_{i}=-1}{\rm{max}}\left|x_{i}-x_{-}\right|,

(11)

Let the fuzzy membership $\delta_{i}$ be a function of the mean and radius of each class:

\mu(x_{i})=\left\{\begin{aligned} &1-\left|x_{i}-x_{+}\right|/(r_{+}+\epsilon),\quad if\quad y_{i}=1,\\ &1-\left|x_{i}-x_{-}\right|/(r_{-}+\epsilon),\quad if\quad y_{i}=-1,\\ \end{aligned}\right.

(12)

where $\epsilon\textgreater 0$ is used to avoid the case $\mu(\delta_{i})=0$ . There is no general rule to follow for the determination of membership function, which needs to be described according to the characteristics of different data sets.

For the fuzzy classifier problem, FSVM considers adding different membership weights to the lost items to make full use of the information of the samples. Similarly, we define the fuzzy granular-ball, and introduce the membership degree of each ball into GBSVM, and the GBFSVM model can be expressed as:

\left\{\begin{aligned} &\underset{w,b,\xi_{i}}{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{l}\delta_{i}\xi_{i},\\ &s.t.\quad y_{i}(wc_{i}+b)-\|w\|{r_{i}}\geq 1-\xi_{i},i=1,2,...,n,\\ &\quad\quad\ \ \xi_{i}\geq 0.\end{aligned}\right.

(13)

After introducing the Lagrange multiplier $\alpha_{i}$ as the inequality constraint, the Lagrange function of Eq. (13) can be expressed as:

		$\displaystyle\mathcal{L}(w,b,\xi,\alpha,\mu,\delta)=$		(14)
		$\displaystyle\ \ \ \frac{1}{2}\\|w\\|^{2}-\sum_{i=1}^{n}\alpha_{i}(y_{i}(wc_{i}+b)-\\|w\\|r_{i}-1+\xi_{i})$
		$\displaystyle\ \ \ +C\sum_{i=1}^{n}\delta_{i}\xi_{i}-\sum_{i=1}^{n}\mu_{i}\xi_{i}.$

Let $\mathcal{L}(w,b,\xi,\alpha,\mu,\delta)$ on $w$ , $b$ and $\xi$ be partial derivatives equal to 0, we obtain

	$\displaystyle\frac{\partial{\mathcal{L}}}{\partial w}=w-\sum_{i=1}^{n}\alpha_{i}y_{i}c_{i}+\sum_{i=1}^{n}\alpha_{i}r_{i}\frac{w}{\\|w\\|}=0,$		(15)
	$\displaystyle\frac{\partial{\mathcal{L}}}{\partial b}=-\sum_{i=1}^{n}\alpha_{i}y_{i}=0,\quad\quad\quad\quad\quad\quad\quad\quad\$		(16)
	$\displaystyle\frac{\partial{\mathcal{L}}}{\partial\xi}=C-\alpha_{i}-\mu_{i}=0.\quad\quad\quad\quad\quad\quad\quad\$		(17)

Eq. (15) can be rewritten as:

w=\frac{\left\|w\right\|\sum_{i=1}^{n}\alpha_{i}y_{i}c_{i}}{\left\|w\right\|+\sum_{i=1}^{n}\alpha_{i}r_{i}}.

(18)

Square both sides of Equation (18) and take the square root to obtain:

\left\|\sum\limits_{i=1}^{n}\alpha_{i}y_{i}c_{i}\right\|=\left|\left\|w\right\|+\sum\limits_{i=1}^{n}\alpha_{i}r_{i}\right|.

(19)

Since $\left\|w\right\|>0$ , $\alpha_{i}>0$ and $r_{i}>0$ , $w$ can be rewritten as:

\left\|w\right\|=\left\|\sum\limits_{i=1}^{n}\alpha_{i}y_{i}c_{i}\right\|-\sum\limits_{i=1}^{n}\alpha_{i}r_{i}=\left\|A\right\|-B,

(20)

w=\frac{\left(\left\|\sum\limits_{i=1}^{n}\alpha_{i}y_{i}c_{i}\right\|-\sum\limits_{i=1}^{n}\alpha_{i}r_{i}\right)\sum\limits_{i=1}^{n}\alpha_{i}y_{i}c_{i}}{\sum\limits_{i=1}^{n}\alpha_{i}y_{i}c_{i}}=\frac{(\left\|A\right\|-B)A}{\left\|A\right\|},

(21)

where,

\displaystyle A=\sum_{i=1}^{n}\alpha_{i}y_{i}c_{i},\ B=\sum_{i=1}^{n}\alpha_{i}r_{i}.

When $0<\alpha<\delta C$ , $b$ can be obtained as

b=\frac{1+\left\|w\right\|r_{i}}{y_{i}}-wc_{i}.

(22)

Putting Eqs. (15), (16) and (17) into Eq. (14), the dual model of the inseparable GBFSVM is:

	$\displaystyle\underset{\alpha}{\rm{max}}\quad-\frac{1}{2}A^{2}-\frac{1}{2}B^{2}+\\|A\\|B+\sum_{i=1}^{n}\alpha_{i},$
	$\displaystyle\ s.t.\quad\quad\sum_{i=1}^{n}\alpha_{i}y_{i}=0,$
	$\displaystyle\quad\quad\quad\quad 0\leq\alpha_{i}\leq\delta_{i}C,i=1,2,...,n.$		(23)

Eq. (IV-A) can be expressed as:

	$\displaystyle\underset{\alpha}{\rm{max}}\quad-\frac{1}{2}\\|w\\|^{2}+\sum_{i=1}^{n}\alpha_{i},$
	$\displaystyle\ s.t.\quad\quad\sum_{i=1}^{n}\alpha_{i}y_{i}=0,$
	$\displaystyle\quad\quad\quad\quad 0\leq\alpha_{i}\leq\delta_{i}C,i=1,2,...,n,$		(24)

where $\|w\|$ is given by Eq. (20).

The interesting phenomenon is that the obtained dual model (IV-A) is consistent with the original model, and it is also corresponding to the FSVM and GBSVM models.

Algorithm 2 Granular-ball fuzzy support vector machine

Input: The fuzzy granular-ball set was $D=\left\{\left((c_{1},r_{1}),\delta_{1},y_{1}\right),\left((c_{2},r_{2}),\delta_{2},y_{2}\right),...,\left((c_{l},r_{l}),\delta_{l},y_{l}\right)\right\}$ .
Output: $y$

1: According to Eq. (IV-A), defines the fitness function

f=-(-\frac{1}{2}A^{2}-\frac{1}{2}B^{2}+\|A\|B+\sum_{i=1}^{n}\alpha_{i})

, where

\alpha_{i}

is a Lagrange multiplier. That is to say, each

\alpha_{i}

corresponds to a granular-ball

GB_{i}

, then,

\alpha=[\alpha_{1},\alpha_{2},\dots,\alpha_{n}]

2: Initialize the number of the particles

pop

, the maximum number of iterations

max\_iter

, the lower boundary

lb

, the upper boundary

\delta C

, the inertia factor

w

and the learning factors

c_{1}

and

c_{2}

;

3: Use PSO to looking for the optimal

\alpha=[\alpha_{1},\alpha_{2},\dots,\alpha_{n}]

to satisfy

\begin{subarray}{c}\min\\ \alpha\end{subarray}-f

;

4: According to Eq. (21), calculate

\omega

;

5: According to Eq. (22), we calculated

b

IV-B Granular-ball fuzzy support vector machine based on triangular fuzzy number

In practice, there are various examples of fuzzy classification boundaries, particularly in medical diagnoses. The fuzzy number has a very important significance in a fuzzy system. Xue et al. [36] derived the GBSVM model based on triangular fuzzy numbers. However, GBSVM model, which is used by them has an error that the support vector is not contained in the constraint, its form is not very uniform with the traditional model. Therefore, we have corrected it in the following proposed fuzzy SVM based on triangular fuzzy numbers to classify the fuzzy data model. For a given fuzzy dataset, the fuzzy granular-ball training set can be obtained by using the granular-ball fuzzy set as follows:

D_{1}=\left\{((c_{1},r_{1}),{\tilde{y}}_{1}),((c_{2},r_{2}),{\tilde{y}}_{2}),...,((c_{l},r_{l}),{\tilde{y}}_{l})\right\},

(25)

where the fuzzy number ${\tilde{y}}_{j}$ reflects the fuzzy category. The fuzzy classification problem involves finding a rule to infer the fuzzy number ${\tilde{y}}_{j}$ corresponding to any $x$ and thus reflects its fuzzy category [40]. For fuzzy information, there are generally three types of fuzzy characteristics: fuzzy positive class, i.e., the membership degree of the sample points belonging to the positive class is greater than that belonging to the negative class; fuzzy negative class, i.e., the membership degree of the sample points belonging to the negative class is greater than that belonging to the positive class; center, i.e., the membership degree of the sample points belonging to the positive class and the negative class is equal.

For convenience, we introduce the membership of the positive class fuzzy granular-ball $\delta_{+}\in[0.5,1]$ and the membership of the negative class fuzzy granular-ball $\delta_{-}\in[-1,-0.5]$ . Therefore, for the membership degree of the fuzzy granular-ball $\delta$ , the three fuzzy features can be represented using special triangular fuzzy numbers as follows:

\begin{array}[]{*{20}{l}}{{{\tilde{y}}_{j}}={\left({a_{1},a_{2},a_{3}}\right)}}\\ {={\left\{{\begin{array}[]{*{20}{l}}{\text{}{\left({\frac{{2\delta\mathop{{}}\nolimits^{{2}}+\delta-2}}{{\delta}},2\delta-1,\frac{{2\delta\mathop{{}}\nolimits^{{2}}-3\delta+2}}{{\delta}}}\right)},0.5\leq\delta\leq 1},\\ {\text{}{\left({\frac{{2\delta\mathop{{}}\nolimits^{{2}}+3\delta+2}}{\delta},2\delta+1,\frac{{2\delta\mathop{{}}\nolimits^{{2}}-\delta-2}}{\delta}}\right)},-1\leq\delta\leq-0.5.}\end{array}}\right.}}\end{array}

(26)

To solve the research problem, the fuzzy granular-ball training points in the fuzzy granular-ball set $D_{1}$ are reordered; that is, the positive class fuzzy granular-balls are ranked in the front and the negative class fuzzy granular-balls are ranked in the back, so as to obtain the fuzzy training set in the following form:

		$\displaystyle\ \ \ D_{2}=\{((c_{1},r_{1}),{\tilde{y}}_{1}),...,((c_{p},r_{p}),{\tilde{y}}_{p})$		(27)
		$\displaystyle\ \ \ ((c_{p+1},r_{p+1}),{\tilde{y}}_{p+1}),...,((c_{l},r_{l}),{\tilde{y}}_{l})\},$		(27)

where $((c_{t},r_{t}),{\tilde{y}}_{t})$ is the fuzzy positive class ball $t=1,...,p$ and $((c_{i},r_{i}),{\tilde{y}}_{i})$ is the fuzzy negative class ball $i=p+1,...,l$ .

Considering the fuzzy linear separable problem corresponding to the fuzzy training set $D_{2}$ , at this time, under the confidence level $\lambda\in[0,1]$ , the fuzzy classification problem is transformed into the following fuzzy chance constrained programming problem with $(w,b)^{T}$ as the decision variable:

\left\{\begin{aligned} &\underset{w,b}{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2},\\ &s.t.\quad{\rm{Pos}}\{{\tilde{y}}_{i}(w\cdot c_{i}+b)-\|w\|{r_{i}}\geq 1\}\geq\lambda,{i=1,2,...,l}.\\ \end{aligned}\right.

(28)

Theorem 1.

Let $\tilde{y}$ in fuzzy chance constrained programming Eq. (28) is a triangular fuzzy number, i.e. $\tilde{y}=(a_{1},a_{2},a_{3})$ . Under the confidence level $\lambda$ , the clear equivalent programming of granular fuzzy chance constrained programming (28) is the following quadratic programming:

\left\{\begin{aligned} &\underset{w,b}{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2},\\ &s.t.\quad((1-\lambda)a_{t3}+\lambda a_{t2})\cdot(wc_{i}+b)-\|w\|{r_{t}}\geq 1,\quad\quad\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ t=1,2,...,p,\\ &\ \quad\quad((1-\lambda)a_{j1}+\lambda a_{j2})\cdot(wc_{i}+b)-\|w\|{r_{j}}\geq 1,\quad\quad\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ j=p+1,...,l.\\ \end{aligned}\right.

(29)

Proof.

From the properties of triangular fuzzy number operation, if $\tilde{y_{i}}=(a_{i1},a_{i2},a_{i3})$ is a triangular fuzzy number, then ${-\tilde{y}}_{i}(w\cdot c_{i}+b)+\|w\|{r_{i}}+1$ is also a triangular fuzzy number.

Let

h_{i}^{+}(w,b)=\left\{\begin{aligned} &0,\ \ \quad\quad\quad\quad\quad h_{i}(w,b)\geq 0,\\ &-(wc_{i}+b),\quad h_{i}(w,b)\textless 0,\\ \end{aligned}\right.

(30)

h_{i}^{-}(w,b)=\left\{\begin{aligned} &(wc_{i}+b),\ \ \quad h_{i}(w,b)\geq 0,\\ &0,\quad\quad\quad\quad\quad h_{i}(w,b)\textless 0.\\ \end{aligned}\right.

(31)

Then $h_{i}^{+}(w,b)$ and $h_{i}^{-}(w,b)$ are nonnegative numbers and $h_{i}(w,b)=h_{i}^{+}(w,b)-h_{i}^{-}(w,b)$ . That is,

	$\displaystyle{\rm{Pos}}\{{\tilde{y}}_{i}(w\cdot c_{i}+b)-\\|w\\|{r_{i}}\geq 1}\}\geq\lambda,{i=1,2,...,l$
	$\displaystyle={\rm{Pos}}\{1-{\tilde{y}}_{i}(w\cdot c_{i}+b)+\\|w\\|{r_{i}}\leq 0}\}\geq\lambda,{i=1,2,...,l.$

Therefore,

	$\displaystyle h_{i}(w,b)\cdot\tilde{y_{i}}+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)-h_{i}^{-}(w,b))\cdot\tilde{y_{i}}+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)-h_{i}^{-}(w,b))\cdot(a_{i1},a_{i2},a_{i3})+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)a_{i1}-h_{i}^{-}(w,b)a_{i3}+\\|w\\|{r_{i}}+1,$		(32)
	$\displaystyle\ \ \ \ h_{i}(w,b)a_{i2}+\\|w\\|{r_{i}}+1,h_{i}^{+}(w,b)a_{i3}-$
	$\displaystyle\ \ \ \ h_{i}^{-}(w,b)a_{i1}+\\|w\\|.{r_{i}}+1).$

According to Lemma 1, the clear equivalent class of Eq. (IV-B) is obtained by

\begin{split}(1-\lambda)((h_{i}^{+}(w,b)a_{i1}-h_{i}^{-}(w,b)a_{i3})+\lambda(h_{i}(w,b)a_{i2})\\ +\|w\|{r_{i}}+1\leq 0,i=1,2,...,l.\end{split}

(33)

At the confidence level $\lambda(0\textless\lambda\leq 1)$ , Eq. (33) can be expressed as

\left\{\begin{aligned} &((1-\lambda)a_{i3}+\lambda a_{i2})\cdot(wc_{i}+b)-\|w\|{r_{i}}\geq 1,wc_{i}+b\textgreater 0,\\ &((1-\lambda)a_{i1}+\lambda a_{i2})\cdot(wc_{i}+b)-\|w\|{r_{i}}\geq 1,wc_{i}+b\leq 0.\\ \end{aligned}\right.

(34)

In summary, Eq. (28) can be expressed as:

\left\{\begin{aligned} &\underset{w,b}{\rm{min}}\ \ \ \frac{1}{2}\|w\|^{2},\\ &s.t.\quad((1-\lambda)a_{t3}+\lambda a_{t2})\cdot(wc_{t}+b)-\|w\|{r_{t}}\geq 1,\quad\quad\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ t=1,2,...,p,\\ &\ \quad\quad((1-\lambda)a_{j1}+\lambda a_{j2})\cdot(wc_{j}+b)-\|w\|{r_{j}}\geq 1,\quad\quad\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ j=p+1,...,l.\end{aligned}\right.

(35)

∎

By proving Theorem 1, the problem (28) is equivalent to (35). After introducing the Lagrange multipliers $\beta_{t}$ and $\alpha_{j}$ , the corresponding augmented Lagrange function of Eq. (35) can be expressed as:

$\displaystyle\mathcal{L}$	$\displaystyle(w,b,\beta,\alpha)=\frac{1}{2}\\|w\\|^{2}-\sum_{t=1}^{p}\beta_{t}\cdot(((1-\lambda)a_{t3}+\lambda a_{t2})\cdot(wc_{t}+b)$
	$\displaystyle-\\|w\\|{r_{t}}-1)-\sum_{j={p+1}}^{l}\alpha_{j}\cdot(((1-\lambda)a_{j1}+\lambda a_{j2})\cdot(wc_{j}+b)$
	$\displaystyle-\ \\|w\\|{r_{j}}-1),$	(36)

where, $\beta=(\beta_{1},\beta_{2},...,\beta_{p})^{T}\in\mathbb{R}^{p},\ \alpha=(\alpha_{p+1},...,\alpha_{l})^{T}\in\mathbb{R}^{l-p}$ .

Let $\mathcal{L}(w,b,\beta,\alpha)$ on $w$ and $b$ partial derivatives be equal to zero, and we obtain

\displaystyle\frac{\partial{\mathcal{L}}}{\partial w}

\displaystyle=w-M-N+\sum_{t=1}^{p}\beta_{t}\cdot r_{t}\frac{w}{\|w\|}+\sum_{j={p+1}}^{l}\alpha_{j}\cdot r_{j}\frac{w}{\|w\|}=0,\quad\quad\quad\quad\quad\quad

(37)

	$\displaystyle\frac{\partial{\mathcal{L}}}{\partial b}=-{\sum_{t=1}^{p}\beta_{t}((1-\lambda)a_{t3}+\lambda a_{t2})}\quad\quad\quad\quad\quad\quad\quad$
	$\displaystyle-{\sum_{j={p+1}}^{l}\alpha_{j}((1-\lambda)a_{j1}+\lambda a_{j2})}=0,\ \ \quad\quad\quad$		(38)

where,

	$\displaystyle M={\sum_{t=1}^{p}\beta_{t}((1-\lambda)a_{t3}+\lambda a_{t2})\cdot c_{t}},$
	$\displaystyle N={\sum_{j={p+1}}^{l}\alpha_{j}((1-\lambda)a_{j1}+\lambda a_{j2})\cdot c_{j}}.$		(39)

By simplifying Eq. (37) and Eq. (IV-B), we can obtain:

w=\frac{\|w\|(M+N)}{\|w\|+{\sum\limits_{t=1}^{p}\beta_{t}r_{t}}+{\sum\limits_{j=p+1}^{l}\alpha_{j}r_{j}}}.

(40)

By further simplifying Eq. (40), we can get:

{\left\|{M+N}\right\|}^{2}=\left(\|w\|+{\sum\limits_{t=1}^{p}\beta_{t}r_{t}}+{\sum\limits_{j=p+1}^{l}\alpha_{j}r_{j}}\right)^{2}.

(41)

Taking the square root of Eq. (41), since ${\left\|{w}\right\|},\beta,\alpha,r>0$ , ${\left\|{w}\right\|}$ can be obtain as follows:

\begin{array}[]{*{20}{l}}{{\left\|{w}\right\|}={{\left\|{M+N}\right\|}}{-{\mathop{\sum}\limits_{t=1}^{p}{\beta_{t}r_{t}}}-{\mathop{\sum}\limits_{j=p+1}^{l}{\alpha_{j}r_{j}}}}}\end{array}.

(42)

Substituting Eq. (42) into Eq. (40), $w$ can be described as follows:

w=\frac{(\|E\|-F)\cdot E}{\|E\|},

(43)

where,

\displaystyle E=M+N;\ \ \ F={\sum_{t=1}^{p}\beta_{t}r_{t}+\sum_{j=p+1}^{l}\alpha_{j}r_{j}}.

(44)

Put Eq. (IV-B) and (40) into Eq. (IV-B), the dual function of Eq. (35) is

\left\{\begin{aligned} &\underset{\alpha,\beta}{\rm{max}}\quad-\frac{1}{2}E^{2}-\frac{1}{2}F^{2}+\|E\|F+\sum_{t=1}^{p}\beta_{t}+\sum_{j=p+1}^{l}\alpha_{j},\\ &s.t.\quad\sum_{t=1}^{p}\beta_{t}((1-\lambda)r_{t3}+\lambda r_{t2})+\\ &\ \ \ \ \ \ \sum_{j=p+1}^{l}\alpha_{j}((1-\lambda)r_{j1}+\lambda r_{j2})=0.\end{aligned}\right.

(45)

Eq. (45) can be transformed into the same function as the original model of SVM:

\left\{\begin{aligned} &\underset{\alpha,\beta}{\rm{max}}\quad-\frac{1}{2}\|w\|^{2}+\sum_{t=1}^{p}\beta_{t}+\sum_{j=p+1}^{l}\alpha_{j},\\ &s.t.\quad\sum_{t=1}^{p}\beta_{t}((1-\lambda)a_{t3}+\lambda a_{t2})+\\ &\ \ \ \ \ \ \sum_{j=p+1}^{l}\alpha_{j}((1-\lambda)a_{j1}+\lambda a_{j2})=0.\\ \end{aligned}\right.

(46)

V Experiment

In this section, since it is difficult to find suitable data with triangular fuzzy number characteristics, we only conducted experiments using the FSVM method for fuzzy granular-ball extension applications. The feasibility and efficiency of GBFSVM are verified by comparison with the SVM and FSVM methods. In the experiment, our hardware experimental environment is AMD Ryzen Threadripper PRO 5975WX 32-Cores 3.60 GHz, and the software experiment environment is Python 3.9. The six benchmark datasets used in the experiment are all from the public dataset UCI, and their corresponding dataset names, sample numbers and dimensions are listed in Table I. To ensure the fairness of the experiment, all models are optimized using the particle swarm optimization (PSO) algorithm with the same parameters.

TABLE I: Dataset Information

Dataset	Samples	Dimensionality
Fourclass	862	2
Haberman	306	3
Heart1	294	13
Titanic	2201	2
BreastCancer	683	9
Credit	690	15

In this study, the data set shown in Table I was selected for experimental comparison, and the parameter $C=10$ was set. The parameters of the PSO algorithm used in the optimization process are set as follows: the dimension $dim$ denotes the number of independent variables of the objective function and $pop$ denotes the number of particles; the maximum number of iterations $max_{iter}$ is set to $1050$ ; the inertia coefficient $w$ is set to $0.5$ ; learning factor $c_{1}=c_{2}$ is equal to $1.6$ . Owing to the randomization of the test seed and the fact that all the results have to meet the constraints, the triangle optimization rule is destroyed. Therefore, the experimental results included the maximum values of SVM, FSVM and GBSVM in four runs on each dataset. In the experiment, we tested each dataset with different percentages of label noise, including 0%, 5%, 10%, 15%, 20%, 25% and 30%.

TABLE II: Comparsion of Accuracy between posSVM, FSVM and GBFSVM with different levels of class noise

Dataset		Fourclass	Haberman	Heart1	Titanic	BreastCancer	Credit
0%	SVM	0.6705	0.3548	0.6441	0.3628	0.6350	0.5072
	FSVM	0.6532	0.3065	0.6949	0.6757	0.8686	0.5217
	GBFSVM	0.7803	0.7903	0.7627	0.7778	0.9927	0.8261
5%	SVM	0.3815	0.3387	0.5254	0.3220	0.7080	0.5870
	FSVM	0.6647	0.4032	0.5932	0.3628	0.7810	0.5290
	GBFSVM	0.7514	0.8226	0.7458	0.6825	0.9927	0.6739
10%	SVM	0.6763	0.2903	0.5932	0.3492	0.7299	0.5870
	FSVM	0.6821	0.6290	0.6780	0.3741	0.7080	0.6159
	GBFSVM	0.8266	0.8065	0.6441	0.6646	1.0000	0.8768
15%	SVM	0.6532	0.3226	0.6271	0.3605	0.6861	0.6014
	FSVM	0.7110	0.3065	0.7288	0.3515	0.6277	0.4783
	GBFSVM	0.8208	0.8387	0.7457	0.7846	0.9927	0.8043
20%	SVM	0.7110	0.3226	0.6271	0.3379	0.6788	0.4928
	FSVM	0.6647	0.3710	0.6949	0.6871	0.7080	0.5435
	GBFSVM	0.7919	0.8387	0.6610	0.8163	0.9854	0.8188
25%	SVM	0.3815	0.2903	0.4407	0.3333	0.6788	0.5870
	FSVM	0.6416	0.3387	0.7119	0.3447	0.6423	0.4565
	GBFSVM	0.7514	0.7903	0.7288	0.7800	0.9854	0.7971
30%	SVM	0.3931	0.2903	0.6610	0.6599	0.6496	0.5290
	FSVM	0.6012	0.3065	0.6271	0.6780	0.7080	0.5942
	GBFSVM	0.7803	0.7903	0.6680	0.7007	0.9927	0.8116

The classification accuracies of SVM, FSVM and GBFSVM under different noise levels are listed in the Table. II. As shown in Table II, GBFSVM obtains higher classification accuracies than the SVM and FSVM in most instances. In addition, the GBFSVM also consistently achieves the best results at high noise levels. This is because the label of a granular-ball is defined as the label with the most appearances in a granular-ball. Further, taking granular-balls as input can reduce the effect of label noise, which has been described in detail in Section II(B).

To further verify the high efficiency of granular-ball fuzzy set, we provide the running time results for SVM, FSVM and GBFSVM solved by the PSO algorithm in Table III. The bold numbers indicate the best results of the three methods. Here, the time of the granular-ball purity threshold optimization was not considered, and the average running time of GBFSVM was used. In fact, the work of granular-ball calculation has been able to achieve the purity threshold adaptation [21]. Owing to the length of this paper and the complexity of the problem be conducted in the future. It is obvious that GBSVM is much faster than the other two methods. The reason is that the application of granular-balls instead of points as input greatly reduces the number of training samples, which trains the data faster. A detailed theoretical analysis is presented in Section II. In summary, the GBFSVM is much more efficient than SVM and FSVM.

TABLE III: Comparison of the running time between SVM, FSVM and GBFSVM

dataset	SVM	FSVM	GBFSVM
Fourclass	1740.4913	1492.9857	11.0211
Haberman	207.2167	192.7256	79.3770
Heart1	206.6333	229.6339	167.05811
Titanic	11425.4193	11942.9475	628.1647
BreastCancer	886.2465	1294.1307	4.7140
Credit	1191.9513	1063.8960	730.7979

VI Conclusions

This paper provides a scalable, efficient and robust algorithm framework for fuzzy big data processing by systematically defining the concept of the fuzzy granular-ball, which is available for all fuzzy data processing method. Moreover, the application of this framework reduces the time and space complexity of existing classifiers. This study extends the granular-ball fuzzy set framework to SVM classification computing and proposes the GBFSVM model and GBFSVM model based on the triangular fuzzy number. As shown in the experimental results of GBFSVM, the running time of using fuzzy granular-balls as input is much less than that of points, and its classification accuracy is higher. For datasets with different noise levels, the GBFSVM outperformed SVM in terms of robustness and effectiveness.

Despite the above advantages, there still exist some disadvantages in this paper despite of above advantages. The PSO algorithm cannot ensure a global optimal solution. Due to the limited length of this study and the complexity of the considered problem, we did not use the gradient descent method to optimize the dual model of GBFSVM. Therefore, in the future, we will research how to use the gradient descent method to solve the dual model. Moreover, GBFSVM based on the triangular fuzzy number has not been applied to solve the problem, and more reliable membership functions can be designed for more effective classification.

VI-A Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62222601, 62221005 and 62176033, Key Cooperation Project of Chongqing Municipal Education Commission under Grant No. HZ2021008, and Natural Science Foundation of Chongqing under Grant No.cstc2019jcyj-cxttX0002.

References

[1] C.-F. Lin,and S.-D. Wang, “Fuzzy support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464-471, 2002.
[2] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3 338–353, 1965.
[3] Q. Hu, W. Pan, L. Zhang, Y. Song, M. Guo, and D. Yu, “Feature selection for monotonic classification,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 1, pp. 69–81, 2011.
[4] Y. Lin, Q. Hu, J. Liu, J. Li, and X. Wu, “Streaming feature selection for multilabel learning based on fuzzy mutual information,” IEEE Transactions on Fuzzy Systems, vol. 25, no. 6, pp. 1491–1507, 2017.
[5] L. Sun, L. Wang, W. Ding, Y. Qian, and J. Xu, “Feature selection using fuzzy neighborhood entropy-based uncertainty measures for fuzzy neighborhood multigranulation rough sets,” IEEE Transactions on Fuzzy Systems, vol. 29, no. 1, pp. 19–33, 2020.
[6] A. Tan, W.-Z. Wu, Y. Qian, J. Liang, J. Chen, and J. Li, “Intuitionistic fuzzy rough set-based granular structures and attribute subset selection,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 3, pp. 527–539, 2018.
[7] C. Wang, Y. Qi, M. Shao, Q. Hu, D. Chen, Y. Qian, and Y. Lin, “A fitting model for feature selection with fuzzy rough sets,” IEEE Transactions on Fuzzy Systems, vol. 25, no. 4, pp. 741–753, 2016.
[8] C. Wang, Y. Qian, W. Ding, and X. Fan, “Feature selection with fuzzy-rough minimum classification error criterion,” IEEE Transactions on Fuzzy Systems, vol. 30, no. 8, pp. 2930–2942, 2021.
[9] W. Ding, S. Chakraborty, K. Mali, S. Chatterjee, “An unsupervised fuzzy clustering approach for early screening of COVID-19 from radiological images,” IEEE Transactions on Fuzzy Systems, 2021.
[10] G. Selvachandran, S. G. Quek, L. T. H. Lan, N. L. Giang, W. Ding, M. Abdel-Basset, V. H. C. De Albuquerque, et al, “A new design of mamdani complex fuzzy inference system for multiattribute decision making problems,” IEEE Transactions on Fuzzy Systems, vol. 29, no. 4, pp. 716–730, 2019.
[11] J. Zhan, J. Ye, W. Ding, and P. Liu, “A novel three-way decision model based on utility theory in incomplete fuzzy decision systems,” IEEE Transactions on Fuzzy Systems, 2021.
[12] Q. Hu, D. Yu, Z. Xie, and J. Liu, “Fuzzy probabilistic approximation spaces and their information measures,” IEEE Transactions on Fuzzy Systems, vol. 14, no. 2, pp. 191–201, 2006.
[13] E. K. Aydogan, I. Karaoglan, and P. M. Pardalos, “hGA: Hybrid genetic algorithm in fuzzy rule-based classification systems for high-dimensional problems,” Applied Soft Computing, vol. 12, no. 2, pp. 800–806, 2012.
[14] J. A. Sanz and H. Bustince, “A wrapper methodology to learn interval-valued fuzzy rule-based classification systems,” Applied Soft Computing, vol. 104, pp. 107249, 2021.
[15] Y. Li, R. Wang, and S. C. Shiu, “Interval extreme learning machine for big data based on uncertainty reduction,” Journal of Intelligent & Fuzzy Systems, vol. 28, no. 5, pp. 2391–2403, 2015.
[16] S. Ramachandramurthy, S. Subramaniam, and C. Ramasamy, “Distilling big data: refining quality information in the era of yottabytes,” The Scientific World Journal, vol. 2015, 2015.
[17] A. Segatori, F. Marcelloni, and W. Pedrycz, “On distributed fuzzy decision trees for big data,” IEEE Transactions on Fuzzy Systems, vol. 26, no. 1, pp. 174–192, 2017.
[18] F. Aghaeipoor, M. M. Javidi, I. Triguero, and A. Fernández, “Chi-BD-DRF: Design of scalable fuzzy classifiers for big data via a dynamic rule filtering approach,” 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7, 2020.
[19] L. A. Zadeh, ”Fuzzy sets and information granularity,” Advances in Fuzzy Set Theory and Applications, vol. 11, pp. 3–18 1979.
[20] L. A. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets and Systems, vol. 20, no. 9, pp. 111–127, 1997.
[21] S. Xia, X. Dai, G. Wang, X. Gao, and E. Giem, “An Efficient and Adaptive Granular-ball Generation Method in Classification Problem,” arXiv preprint arXiv:2201.04343, 2022.
[22] T. Y. Lin, “Granular computing on binary relations II: Rough set representations and belief functions,” Rough Sets in Knowledge Discovery, vol. 1, pp. 122–140, 1998.
[23] X. Zhang, H. Gou, Z. Lv, and D. Miao, “Double-quantitative distance measurement and classification learning based on the tri-level granular structure of neighborhood system,” Knowledge-Based Systems, vol. 217, pp. 106799, 2021.
[24] W. Pedrycz, “Allocation of information granularity in optimization and decision-making models: towards building the foundations of granular computing,” European Journal of Operational Research, vol. 232, no. 1, pp. 137–145, 2014.
[25] W. Pedrycz and K.-C. Kwak, “The development of incremental models,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 3, pp. 507-518, 2007.
[26] D.-B. Bu, S. Bai, and G.-J. Li, “Principle of granularity in clustering and classification,” Chinese Journal of Computers-Chinese Edition-, vol. 25, no. 8, pp. 810–816, 2002.
[27] S. Hu, D. Miao, and W. Pedrycz, “Multi granularity based label propagation with active learning for semi-supervised classification,” Expert Systems with Applications, vol. 192, pp. 116276, 2022.
[28] W. Stach, L. Kurgan, W. Pedrycz, and M. Reformat, “Genetic learning of fuzzy cognitive maps,” Fuzzy Sets and Systems, vol. 153, no. 3, pp. 371–401, 2005.
[29] T. Yang, X. Zhong, G. Lang, Y. Qian, and J. Dai, “Granular matrix: A new approach for granular structure reduction and redundancy evaluation,” IEEE Transactions on Fuzzy Systems, vol. 28, no. 12, pp. 3133–3144, 2020.
[30] H. Yu, G. Wang, B. Hu, X. Jia, H. Li, T. Li, D. Liang, J. Liang, B. Liu, D. Liu, et al, “Methods and practices of three-way decisions for complex problem solving,” International Conference on Rough Sets and Knowledge Technology, Springer, Cham: pp. 255–265, 2015.
[31] W. Pedrycz, “Identification in fuzzy systems,” IEEE Transactions on Systems, Man, and Cybernetics, no. 2, 361–366, 1984.
[32] A. Song, G. Wu, W. Pedrycz, and L. Wang, “Integrating variable reduction strategy with evolutionary algorithms for solving nonlinear equations systems,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 1, pp. 75-89, 2021.
[33] S. Xia, Y. Liu, X. Ding, G. Wang, H. Yu, and Y. Luo, “Granular ball computing classifiers for efficient, scalable and robust learning,” Information Sciences, vol. 483, pp. 136–152, 2019.
[34] S. Xia, D. Peng, D. Meng, C. Zhang, G. Wang, E. Giem, W. Wei, and Z. Chen, “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[35] S. Xia, G. Wang, X. Gao, and X. Peng, “GBSVM: Granular-ball Support Vector Machine,” arXiv preprint arXiv:2210.03120, 2022.
[36] Y. S. Yanfei Xue and H. Xia, “GBFSVM: A Robust Classification Learning Method," Scientific Journal of Intelligent Systems Research, vol. 4, no. 1, pp. 1–8, 2022.
[37] Y. Xue, Y. Shao, S. Xia, and G. Wang, "The Dual Model of Support Vector Machine Based on Granular Ball Computing," 2021 3rd International Conference on Applied Machine Learning (ICAML), pp. 43–47, 2021.
[38] H.-J. Zimmermann, “Fuzzy set theory,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 3 pp. 317–332, 2010.
[39] K. K. Yen, S. Ghoshray, and G. Roig, “A linear regression model using triangular fuzzy number coefficients,” Fuzzy Sets and Systems, vol. 106, no. 2, pp. 167-177, 1999.
[40] Y. Zhimin and L. Guangli, “Principle and Application of Uncertain Support Vector Machine,” Beijing: Science Press, 2007.
[41] B. Liu and B. Liu, “Theory and practice of uncertain programming,” Berlin: Springer, 2009.
[42] A. Ji, J. Pang, and H. Qiu, “Support vector machine for classification based on fuzzy training data,” Expert Systems with Applications, vol. 37, no. 4, pp. 3495–3498, 2010.

	$\displaystyle h_{i}(w,b)\cdot\tilde{y_{i}}+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)-h_{i}^{-}(w,b))\cdot\tilde{y_{i}}+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)-h_{i}^{-}(w,b))\cdot(a_{i1},a_{i2},a_{i3})+\\|w\\|{r_{i}}+1$
	$\displaystyle=(h_{i}^{+}(w,b)a_{i1}-h_{i}^{-}(w,b)a_{i3}+\\|w\\|{r_{i}}+1,$		(32)
	$\displaystyle\ \ \ \ h_{i}(w,b)a_{i2}+\\|w\\|{r_{i}}+1,h_{i}^{+}(w,b)a_{i3}-$
	$\displaystyle\ \ \ \ h_{i}^{-}(w,b)a_{i1}+\\|w\\|.{r_{i}}+1).$

$\displaystyle\mathcal{L}$	$\displaystyle(w,b,\beta,\alpha)=\frac{1}{2}\\|w\\|^{2}-\sum_{t=1}^{p}\beta_{t}\cdot(((1-\lambda)a_{t3}+\lambda a_{t2})\cdot(wc_{t}+b)$
	$\displaystyle-\\|w\\|{r_{t}}-1)-\sum_{j={p+1}}^{l}\alpha_{j}\cdot(((1-\lambda)a_{j1}+\lambda a_{j2})\cdot(wc_{j}+b)$
	$\displaystyle-\ \\|w\\|{r_{j}}-1),$	(36)