

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. 10.1109/ACCESS.2017.DOI


Corresponding author(s): [email protected], [email protected]

Multiclass Model for Agriculture development using Multivariate Statistical method

N DEEPA1, MOHAMMAD ZUBAIR KHAN2, PRABADEVI B1, DURAI RAJ VINCENT P M1, PRAVEEN KUMAR REDDY MADDIKUNTA1, THIPPA REDDY GADEKALLU1
1 School of Information Technology and Engineering, VIT - Vellore, Tamilnadu, India
2 Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia
Abstract

The Mahalanobis Taguchi System (MTS) is a multivariate statistical method extensively used for feature selection and binary classification problems. The calculation of the orthogonal array and signal-to-noise ratio in MTS makes the algorithm complicated when a large number of factors are involved in the classification problem. Also, the decision is based on the accuracy of normal and abnormal observations of the dataset. In this paper, a multiclass model using an Improved Mahalanobis Taguchi System (IMTS) is proposed for agriculture development, based on normal observations and the Mahalanobis distance. Twenty-six input factors relevant to crop cultivation have been identified and clustered into six main factors for the development of the model. The multiclass model is developed with consideration of the relative importance of the factors. An objective function is defined for the classification of three crops, namely paddy, sugarcane and groundnut. The classification results are verified against the results obtained from agriculture experts working in the field. The proposed classifier provides 100% accuracy, recall and precision and a 0% error rate when compared with other traditional classifier models.

Index Terms:
Agriculture, multiclass, Mahalanobis Taguchi System (MTS), Grey correlation method, Objective function

I Introduction

Agriculture is a major boon to India and a primary source of income. Though 60% of the land is cultivable, only 43% is used for crop production. Farmers in developing countries like India lack proper education and awareness about the technical aspects of agricultural land cultivation, crop yield improvement, and soil fertility enhancement. They cultivate their lands based on experience handed down from their ancestors and on their own field experience. However, agricultural land quality parameters have been changing due to drastic changes in weather conditions. Also, soil fertility is degraded due to the scarcity of water and rainfall [1].

Due to a lack of awareness of crop cultivation and yield, farmers who toil during the entire cultivation period are paid less because of mediators (agents for bargaining). If adequate training or assistance on crop cultivation, pricing of yield and selling of crops is provided to the farmers, their hard work will not go in vain. If not, crop production is reduced, which in turn affects the economy of the country. Therefore, if the government provides adequate support on these skills, beyond the traditional way of doing agriculture, the economy of the country will significantly improve [2].

Many countries understand the value of agriculture and have started to shift their focus towards it [3]. They have started to develop innovations in almost all aspects of agriculture, such as land suitability analysis, soil health monitoring, fertilizer recommendation, good quality seeds, modern farming techniques, advanced irrigation techniques, natural manure production, crop recommendation systems, yield prediction, and market price prediction [4]. Hence, the government should take proper initiatives (if not in all the areas mentioned above) to inculcate the importance of agriculture in young minds from their schooling. In turn, the full experience of our farmers will be transformed into proper techniques and can be utilized for the better health of the forthcoming generation.

In order to provide better recommendations, from land suitability analysis to yield prediction, multiple criteria about each area should be considered [5, 6]. For instance, apart from the major factors such as soil, water, fertilizer and seasonal changes, other factors such as the distance from the agricultural land to research institutes, extension centres, markets, agro centres, roads, and seed processing plants should be taken into account. When these factors are considered for better decisions, a suitable technique must also be chosen for the prediction [7]. Hence, multi-criteria decision-making (MCDM) models prove to be well suited to making decisions that draw on such varied criteria [8].

The main contributions of this work are:

  1. An improved version of the Mahalanobis Taguchi System is proposed in this work for the multiclass classification problem; it considers only normal observations of the data and applies the Mahalanobis distance for classification.

  2. The multiclass IMTS model is built by considering the relative importance of the factors, and the Grey correlation method is applied to calculate the weights. An objective function is defined to construct the decision matrix for multiclass classification.

  3. The final ranking score matrix is obtained from the objective function to perform multiclass classification of three agriculture crops, and the results are compared with the results of other classifiers such as Naïve Bayes, Decision Tree, Random Forest, AdaBoost, J48, SVM and PART.

II Related work

MCDM approaches consider the relative importance of criteria for taking appropriate decisions. The relative importance (weight) of these criteria plays a significant role in reaching accurate decisions, and many weight calculation methods are used with MCDM approaches. A decision model was developed for agriculture development in which the Analytical Hierarchy Process (AHP), Rank sum method, Criteria Importance Through Inter-criteria Correlation (CRITIC) and Standard Deviation (SD) methods were used for calculating the integrated weights of the criteria. In that model, AHP and Rank sum are subjective weight assignment methods that calculate weights based on expert input, whereas CRITIC and SD are objective weight calculation methods that determine weights through mathematical analysis [6]. A model was developed for susceptibility mapping of floods using a Geographical Information System and MCDM approaches. In this model, AHP was used for calculating the weights of the eight criteria identified for the development of the model [9].

The dominance-based rough set approach was specifically used for solving decision-making problems and applied to the development of a decision tool for agriculture development [10]. A model was developed to select materials for the manufacturing and design of engineering products using a multi-attribute decision making (MADM) approach, namely MULTIMOORA, and the Shannon entropy method was applied to calculate the relative importance of the parameters identified for the material selection process [11]. An integrated model was developed using the technique for order of preference by similarity to ideal solution (TOPSIS), Shannon Entropy and Delphi methods for the identification of environmental risk in Iran. The Shannon Entropy method was used for the calculation of the weights of the criteria [12].

A hybrid decision-making model was developed for the selection of materials for dam construction by integrating the step-wise weight assessment ratio analysis (SWARA) and combinative distance-based assessment (CODAS) methods. SWARA is a subjective weight calculation method used to calculate the weights of parameters for the development of the model [13, 14]. An MCDM model was developed to monitor the time and attendance of employees in companies. The CRITIC method was applied to calculate the weights of the criteria, and the alternatives were ranked using an MCDM method, namely the Weighted Aggregated Sum Product Assessment (WASPAS) method [13].

A decision model was built for ranking journals using the TOPSIS method by applying two weight calculation methods, namely the Rough set approach [15] and the Grey correlation method [16]. Grey Relational Analysis (GRA) is a popular MCDM method used for decision making when multiple conflicting criteria are involved. A ranking model was developed to evaluate the energy consumption in 47 official buildings using an MCDM approach, namely GRA. GRA is specifically used to handle uncertain data and the relationships between the multiple criteria considered for the problem [17]. In the present work, the Grey correlation method is employed to calculate the weights of the factors identified for the development of the multiclass model.

The Mahalanobis Taguchi System (MTS) is a multivariate statistical method gaining popularity in decision-making problems. It is based on the Mahalanobis Distance (MD), introduced by Prof. Mahalanobis in 1930 to identify a sample from a given set of samples [18]. MTS is now used to select a useful set of variables from the available set of identified variables for decision-making problems [19, 20]. A disease classification model was built using MTS, a Fuzzy approach and the C-Means clustering algorithm. MTS was applied for the selection of attributes from the dynamically selected features of the Electrocardiogram (ECG) [21].

Some metric-learning based methods also use MD to solve complex decision problems. To avoid ill-conditioned formulations in hyperspectral images (HSI), distance metric learning is used for dimensionality reduction of the HSI images. A discriminative local metric learning method was developed in [22] to attain a global metric learning method for dimensionality reduction of HSI. Similarly, deep distance metric learning was proposed in [23] using a convolutional neural network for image classification. L2-normalization with cosine similarity was employed to improve the performance of the model.

An optimized binary classification was developed using a modified MTS(MMTS) method. The MMTS showed better results compared with the results obtained from Support Vector Machine(SVM), Probabilistic MTS, Naive Bayes, Hidden Naive Bayes, Kernel Boundary Alignment, Adaptive Conformal Transformation and Synthetic Minority Oversampling Technique methods[24, 25, 26]. A novel method was developed for the identification of conditions of roads using MTS, where it was applied to classify the quality of roads in cities[27]. A novel decision model was built using MTS for the classification of gait patterns for the patients treated for ligament reconstruction. Here MTS was used for both feature selection and classification purposes[28].

An evaluation model was developed for ranking dangerous chemicals using the Mahalanobis Taguchi System method. A multivariate analysis was done on the dataset, and the correlation among the criteria was considered for ranking the given set of alternatives [29]. An evaluation model was built to rank the performance of energy security [30, 31] in China using 14 factors that are relevant to the problem. Mahalanobis Taguchi Gram Schmidt was applied for the calculation of the weights of the identified factors, and TOPSIS was used to rank the given set of alternatives [32]. MTS has proved successful in binary classification, and it has been improved further for the classification of multiclass data as well. Several models have been developed for the detection of faults in various devices and equipment in the mechanical domain [33], [34].

An adaptive multiclass MTS model was developed for the identification of faults in bearings [35]. Furthermore, various emerging models were used for classification on larger image datasets. A linear classification system was developed in [36] using maximum a posteriori estimation on a face recognition dataset. The data were compressed using dimensionality reduction techniques to enhance the performance of the model. The model achieved low computational complexity (reduced training and testing time) and a better accuracy of 97.61% when compared to existing conventional methods. Similarly, a feature learning model was developed in [37] for hyperspectral images (HSI) containing a vast amount of spatial-spectral information. The feature learning model, which uses spatial-spectral information, hypergraph learning and discriminant analysis, improves classification performance to a greater extent than conventional methods. Also, a dimensionality reduction technique with discriminant learning for enhancing the classification accuracy of HSI was developed in [38]. The model outperforms the other dimensionality reduction techniques by exhibiting the complicated intrinsic relationships of HSI.

A multi-objective firework algorithm was proposed for automatic clustering and classification; it contains a dynamic searching feature, a remodelled objective function for clustering, modified mutual data and an automatic clustering capability [39]. A dividing-based objective evolutionary algorithm was proposed for feature selection on huge datasets. Two wrapper and filter methods were designed to obtain high accuracy at low computation cost, and two recombination techniques were presented to obtain fast convergence. A triangular decision-making approach using the Manhattan distance metric was also proposed to assist the users' knowledge [40]. A feature-based data exchange facility was proposed for the cloud-based design and manufacturing domain [41].

In this paper, an Improved Mahalanobis Taguchi method is applied to develop a multiclass model for the classification of three agriculture crops. MTS is conventionally used for classification by considering both normal and abnormal observations relevant to the problem. In this multiclass model, abnormal observations are not required for classification, and the proposed Improved MTS method is simple and requires a limited number of calculations to perform classification. Also, the use of a Mahalanobis distance value for each crop improves the distinguishability among them.

The rest of the paper is organized as follows: Section III discusses the proposed model, identified factors, study area and dataset used in the paper. Section IV explains the results of the Grey correlation method, objective function and Improved MTS. The paper is concluded in Section V.

III Material and methods

III-A Proposed Model

The architecture of the proposed multiclass IMTS model is shown in Figure 1.

Figure 1: Schematic diagram of the proposed multiclass IMTS model.

The proposed multiclass model is segregated into six stages:

  1. Selection of experimental land.

  2. Identification of relevant factors and crops pertaining to the given problem.

  3. Construction of decision matrix for each main factor for the given crops.

  4. Computation of weights of sub-factors under each main factor using Grey correlation method.

  5. Generation of evaluation scores by applying objective function which transforms sub-factor sequences to main factor sequence matrix.

  6. Classification of main factor sequence matrix for three crops using Improved MTS.

A multiclass model for decision making on crop selection for a given agricultural site with influential parameters is proposed to assist farmers in gaining the utmost profit by maximizing their yield. Agricultural land is selected where paddy, groundnut and sugarcane are cultivated as major crops. Though this classification model can support decisions on any crop selection, here three crops, viz., paddy, groundnut and sugarcane, are chosen, for which the twenty-six input factors are obtained from the identified experimental land and through a survey. As there are many factors, they are clustered into six main factors, viz., soil (mf1), water (mf2), season (mf3), fertilizer-input (mf4), support (mf5) and amenities (mf6). A decision matrix is constructed for each main factor, in which all the sub-factor values are included.

As weight calculation plays an important role in decision making, weights are computed for the sub-factors in each main factor using the Grey correlation method. The collected agriculture site dataset consists of values measured on different scales, and therefore data normalization is performed. An objective function is defined to generate the evaluation scores, which are normalized values of the raw data collected. The objective function also transforms the sub-factor values into main factor values using the weights of the sub-factors and the sub-factor decision matrix. The evaluation scores of the three crops, namely paddy, groundnut and sugarcane, are passed to the Improved Mahalanobis Taguchi method for classification. The proposed method determines the suitability of a crop for cultivation in the given agriculture site, and the performance of the model is validated against the classification results produced by agriculture field experts for the same dataset.

III-B Identification of factors and sub-factors

Based on the opinion of agriculture field experts and an analysis of the literature, 26 factors were identified for the development of the proposed IMTS model. These 26 factors were then clustered into six main factors, each of which has its own sub-factors, viz., soil (11 sub-factors), water (2 sub-factors), season (no sub-factor), fertilizer-input (6 sub-factors), support (2 sub-factors) and amenities (3 sub-factors), as shown in Figure 2.

Figure 2: Main factors and corresponding sub-factors identified for the multiclass model.

III-C Study Area and Data sets

The study area was Tiruvannamalai district in the state of Tamil Nadu, India. As mentioned above, the three crops, namely paddy, groundnut and sugarcane, are chosen for experimental purposes and are considered major economic crops in the geographical area at latitude $12^{\circ}15'$N and longitude $79^{\circ}07'$E. Agricultural sites from various village panchayats of Tiruvannamalai block, namely Melkachirapattu, Thalayampallam, Andampallam, Allikondapattu, Devanur and Perumanam, were chosen randomly for collecting the dataset for the study. Data associated with the chosen main factors, namely soil (mf1), water (mf2), season (mf3), fertilizer-input (mf4), support (mf5) and amenities (mf6), were collected for the three chosen crops for the development of the multiclass model. Out of the 15 sites, five sites pertain to the paddy crop, five sites to the sugarcane crop and the remaining five sites to the groundnut crop, as reflected in Table VI. Thus a decision matrix comprising the sub-factor values under each main factor for each crop is constructed from the raw data for the development of the model.

IV Results and Discussions

IV-A Calculation of weights of sub-factors using Grey correlation method

The Grey correlation method is applied to each sub-factor decision matrix, which consists of raw data, to calculate the relative weights. The first step in the Grey correlation method is the generation of comparability sequences. As the raw data span different ranges of values, it is advisable to normalize them to a common scale. The comparability sequence consists of the normalized values of the original decision matrix, which are calculated using the formula given as follows:

$$Y_{ij}=\left(X_{ij}-\min\left(X_{ij}\right)\right)/\left(\max\left(X_{ij}\right)-\min\left(X_{ij}\right)\right)$$ (1)

Where $X_{ij}$ is the sub-factor matrix, $i=1,2,3,\ldots,m$, $j=1,2,3,\ldots,n$, $m$ is the number of alternatives (agriculture site dataset) and $n$ is the number of sub-factors in the given main factor. The comparability sequence matrix for the sub-factors under the soil main factor is shown in Table I.

TABLE I: Comparability Sequence matrix of sub-factors under soil main factor.
Sf1 Sf2 Sf3 Sf4 Sf5 Sf6 Sf7 Sf8 Sf9 Sf10 Sf11
0.00 1.00 0.0000 0.7778 0.0000 0.0000 0.0000 0.0000 0.7101 0.00 1.00
0.5 0.50 1.0000 1.0000 0.7368 0.8182 0.0606 0.2837 1.0000 1.00 0.00
0.00 1.00 1.0000 0.0000 0.9474 1.0000 0.0000 0.4382 1.0000 0.00 1.00
1.00 0.00 0.2609 0.0556 1.0000 0.0000 1.0000 1.0000 0.0000 1.00 0.00
0.00 1.00 0.2609 0.0556 1.0000 0.0000 1.0000 1.0000 0.0000 0.00 1.00
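The normalization in Equation 1 can be illustrated with a short Python sketch. It assumes that the minimum and maximum are taken per sub-factor column, which is what the 0 and 1 entries in each column of Table I suggest. The function name, the handling of constant columns and the raw values are hypothetical and are not taken from the study dataset.

```python
def comparability_sequence(matrix):
    """Column-wise min-max normalization of a decision matrix (Equation 1)."""
    normalized_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumption: a constant column carries no ranking information,
        # so it is mapped to 0.0 instead of dividing by zero.
        normalized_columns.append([(x - lo) / span if span else 0.0 for x in col])
    # Transpose back so that rows are sites and columns are sub-factors.
    return [list(row) for row in zip(*normalized_columns)]


# Hypothetical raw values: 3 sites (rows) x 2 sub-factors (columns).
raw = [[6.5, 120.0], [7.0, 200.0], [7.5, 160.0]]
Y = comparability_sequence(raw)
```

Each column of `Y` then lies in the interval [0, 1], so sub-factors measured in different units become comparable.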

The reference sequence is defined as

$$Y_{0j}=[1,1,1,\ldots,1]$$ (2)

which is the ideal solution for the given alternatives. The next step is the computation of grey relational degree, which gives the distance between the ideal solution and the comparability sequence. Grey relational degree is calculated using the formula given as follows:

$$\delta_{ij}=\left|Y_{0j}-Y_{ij}\right|$$ (3)

Grey relational degree values for sub-factors of soil main factor are presented in Table II.

TABLE II: Grey relational degree values of sub-factors under soil main factor.
Sf1 Sf2 Sf3 Sf4 Sf5 Sf6 Sf7 Sf8 Sf9 Sf10 Sf11
0.00 1.00 0.00 0.7778 0.0000 0.0000 0.0000 0.0000 0.7101 0.00 1.00
0.50 0.50 1.00 1.0000 0.7368 0.8182 0.0606 0.2837 1.0000 1.00 0.00
0.00 1.00 1.00 0.0000 0.9474 1.0000 0.0000 0.4382 1.0000 0.00 1.00
1.00 0.00 0.2609 0.0556 1.0000 0.0000 1.0000 1.0000 0.0000 1.00 0.00
0.00 1.00 0.2609 0.0556 1.0000 0.0000 1.0000 1.0000 0.0000 0.00 1.00

Next Grey coefficient values are calculated using the equation given as follows:

$$C_{ij}=\left(\delta_{\min}+th\cdot\delta_{\max}\right)/\left(\delta_{ij}+th\cdot\delta_{\max}\right)$$ (4)

Where $\delta_{\max}=\max\left(\delta_{ij}\right)$, $\delta_{\min}=\min\left(\delta_{ij}\right)$, and $th$ is a threshold coefficient that lies between 0 and 1. The threshold value is set to 0.5 for most problems in MCDM [42]. The Grey relational coefficient values calculated for each alternative for the soil main factor are shown in Table III.

TABLE III: Grey coefficient values of sub-factors under soil main factor.
Sf1 Sf2 Sf3 Sf4 Sf5 Sf6 Sf7 Sf8 Sf9 Sf10 Sf11
1.00 0.00 1.0000 0.2222 1.0000 1.0000 1.0000 1.0000 0.2899 1.00 0.00
0.50 0.50 0.0000 0.0000 0.2632 0.1818 0.9394 0.7163 0.0000 0.00 1.00
1.00 0.00 0.0000 1.0000 0.0526 0.0000 1.0000 0.5618 0.0000 1.00 0.00
0.00 1.00 0.7391 0.9444 0.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00
1.00 0.00 0.7391 0.9444 0.0000 1.0000 0.0000 0.0000 1.0000 1.00 0.00
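As a rough illustration of Equations 3 and 4, the sketch below computes the deviation from the all-ones reference sequence and then the Grey coefficient with the threshold 0.5 mentioned in the text. It assumes that the minimum and maximum deviations are taken over the whole matrix. The function name and the guard for an all-ideal matrix are hypothetical additions.

```python
def grey_coefficients(Y, th=0.5):
    """Grey coefficients of a normalized matrix Y against the reference [1, ..., 1]."""
    # Equation 3: absolute deviation of each entry from the ideal value 1.
    delta = [[abs(1.0 - y) for y in row] for row in Y]
    flat = [d for row in delta for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Assumption: if every entry already equals the ideal, return all ones.
        return [[1.0 for _ in row] for row in delta]
    # Equation 4: (delta_min + th * delta_max) / (delta_ij + th * delta_max).
    return [[(d_min + th * d_max) / (d + th * d_max) for d in row] for row in delta]


# Small matrix mixing illustrative entries with the value 0.7778 from Table I.
Y_example = [[0.0, 1.0], [0.7778, 0.5]]
C_example = grey_coefficients(Y_example)
```

With these inputs, the normalized value 0.7778 has a deviation of 0.2222 and a coefficient of 0.5 / 0.7222, so entries closer to the ideal value receive coefficients closer to 1.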

Correlation degree values are calculated for the alternatives of sub-factors under each main factor for the identified crops using the formula given as follows:

$$C_{j}=\frac{1}{m}\sum_{i=1}^{m}C_{ij}$$ (5)

The correlation degree values obtained for each alternative for soil main factor are shown in Table IV.

TABLE IV: Correlation degree values of sub-factors under soil main factor.
Sf1 Sf2 Sf3 Sf4 Sf5 Sf6 Sf7 Sf8 Sf9 Sf10 Sf11
0.3333 1.0000 0.3333 0.6923 0.3333 0.3333 0.3333 0.3333 0.6330 0.3333 1.0000
0.5000 0.5000 1.0000 1.0000 0.6552 0.7333 0.3474 0.4111 1.0000 1.0000 0.3333
0.3333 1.0000 1.0000 0.3333 0.9048 1.0000 0.3333 0.4709 1.0000 0.3333 1.0000
1.0000 0.3333 0.4035 0.3462 1.0000 0.3333 1.0000 1.0000 0.3333 1.0000 0.3333
0.3333 1.0000 0.4035 0.3462 1.0000 0.3333 1.0000 1.0000 0.3333 0.3333 1.0000

The relative weights of sub-factors are obtained by normalizing the correlation degree values using the formula

$$w_{j}=\frac{C_{j}}{\sum_{j=1}^{n}C_{j}}$$ (6)

Thus the relative weights of sub-factors under each main factor are tabulated and shown in Table V.

TABLE V: Weights of sub-factors under each main factor.
mf1 weights mf2 weights mf4 weights mf5 weights mf6 weights
sf1 0.0714 wf1 0.4973 ff1 0.2022 suf1 0.5345 af1 0.3483
sf2 0.1095 wf2 0.5027 ff2 0.1816 suf2 0.4655 af2 0.3033
sf3 0.0897 - - ff3 0.1847 - - af3 0.3483
sf4 0.0776 - - ff4 0.1846 - - - -
sf5 0.1112 - - ff5 0.1156 - - - -
sf6 0.0781 - - ff6 0.1313 - - - -
sf7 0.0861 - - - - - - - -
sf8 0.0918 - - - - - - - -
sf9 0.0942 - - - - - - - -
sf10 0.0857 - - - - - - - -
sf11 0.1047 - - - - - - - -
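The last two steps of the weight calculation (Equations 5 and 6) amount to averaging the Grey coefficients of each sub-factor over the alternatives and then normalizing the averages so that they sum to 1. The following Python sketch shows this under the assumption that rows are alternatives and columns are sub-factors. The function name and the coefficient values are hypothetical.

```python
def grey_weights(C):
    """Relative weights of sub-factors from a Grey coefficient matrix C."""
    m = len(C)  # number of alternatives (rows)
    # Equation 5: correlation degree of each sub-factor = column mean.
    degrees = [sum(col) / m for col in zip(*C)]
    # Equation 6: normalize the degrees so that the weights sum to 1.
    total = sum(degrees)
    return [d / total for d in degrees]


# Hypothetical coefficients: 2 alternatives x 2 sub-factors.
C_demo = [[1.0, 0.5], [0.5, 0.5]]
w_demo = grey_weights(C_demo)
```

Here the column means are 0.75 and 0.5, so the first sub-factor receives the larger weight. The same normalization explains why the weights within each main factor in Table V sum to approximately 1.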

IV-B Construction of objective function

The input sub-factor matrix under each main factor and their relative weights are applied to objective function in order to rank the given set of alternatives with respect to their main factors. In other words, sub-factor sequence values can be combined to form main factor matrix values in the form of ranking scores assigned for each alternative using the objective function. Thus an objective function is defined using the sub-factor matrix values and relative weights of the sub-factors obtained using the Grey correlation method.

$$\textit{Objfn}_{i}=\sum_{j}D_{ij}w_{j}$$ (7)

Where $D_{ij}$ is the sub-factor decision matrix obtained from the experimental dataset for the identified crops and $w_{j}$ is the weight of sub-factor $j$ under each main factor, with $i=1,2,3,\ldots,m$ and $j=1,2,3,\ldots,n$; $m$ is the number of alternatives and $n$ is the number of factors. Each sub-factor matrix, along with its weights, is applied to the objective function, and the final ranking scores are obtained. As the season main factor (mf3) does not have sub-factors, the corresponding alternative values are normalized and included directly in the ranking score decision matrix. The main factor matrix thus obtained for the 3 identified crops, namely paddy, sugarcane and groundnut, is shown in Table VI.

TABLE VI: Final ranking scores of alternatives for 3 crops.
mf1 mf2 mf3 mf4 mf5 mf6 decision class
0.5901 0.716 0.9 1.4668 0.5725 0.8268 paddy
1.1521 0.6835 0.9021 1.2722 0.1352 0.3435 paddy
1.1704 0.5707 0.9021 1.2654 0.1418 0.225 paddy
0.7887 0.4867 0.9 1.5224 0.1176 0.3238 paddy
0.7887 0.4426 0.9021 1.5224 0.5725 0.8268 paddy
0.34 0.6916 0.2919 1.6126 0.9583 2.0364 sugarcane
0.868 0.3832 0.2919 1.6467 0.6667 1.3818 sugarcane
0.616 0.7105 0.3 1.1793 0.0417 0.1455 sugarcane
0.568 0.4295 0.2919 1.1022 0.5417 1.1636 sugarcane
0.816 0.7368 0.2919 2.6178 0.9583 2.0364 sugarcane
0.4852 0.6156 1.259 0.4769 1.0752 1.8172 groundnut
0.5074 0.2208 0.7834 0.3527 0.1466 0.2796 groundnut
0.387 0.5481 1.259 0.4967 0.4887 0.9086 groundnut
0.587 0.6156 0.7834 0.4363 0.6842 1.1882 groundnut
0.5981 0.2208 0.7834 0.3527 0.3421 0.5591 groundnut
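The objective function in Equation 7 is a weighted sum of the sub-factor values of each alternative. A minimal Python sketch is given below. It uses the two water sub-factor weights reported in Table V, while the sub-factor values in `D_water` and the function name are hypothetical and serve only to show the mechanics.

```python
def objective_scores(D, w):
    """Equation 7: ranking score of each alternative as a weighted sum of its sub-factors."""
    return [sum(d_ij * w_j for d_ij, w_j in zip(row, w)) for row in D]


# Weights of the water sub-factors wf1 and wf2 as listed in Table V.
w_water = [0.4973, 0.5027]
# Hypothetical sub-factor values for two sites (not taken from the dataset).
D_water = [[1.0, 0.0], [0.5, 0.5]]
mf2_scores = objective_scores(D_water, w_water)
```

Applying the function once per main factor yields one column of the main factor matrix per call, which is how the sub-factor sequences collapse into the six columns of Table VI.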

IV-C Classification using Improved Mahalanobis Taguchi System

The Mahalanobis Taguchi System is a statistical method used for classification. It considers normal and abnormal observations relevant to the problem. In this problem, normal observations are the agriculture site data suitable for crop cultivation, and abnormal observations are the agriculture sites which are not suitable for cultivation. In the Improved MTS method, abnormal observations are not required for classification. The proposed Improved MTS method is simple and requires a limited number of calculations to perform classification. It uses a Mahalanobis distance value for each crop to distinguish among them.

The obtained final ranking scores of 15 alternatives for 3 crops, namely paddy, sugarcane and groundnut are applied to Improved MTS algorithm for classification of 3 crops. The steps in Improved MTS are as follows:

Calculation of Mahalanobis distance

The initial step is to obtain a measurement scale from the normal observations (alternatives). Here the normal observations denote the agriculture dataset suitable for crop cultivation (Table VI). The normal observations are normalized using their mean and standard deviation, and the inverse of the correlation matrix of the normal observations is calculated to obtain the Mahalanobis Distance (MD). The MD corresponding to the dataset is computed using Equation 8 [43].

$$\mathrm{MD}=\sqrt{\frac{1}{k}Z_{ij}^{T}C^{-1}Z_{ij}}$$ (8)

where $k$ is the number of factors, $i=1,2,\ldots,m$ indexes the alternatives, $j=1,2,\ldots,k$ indexes the factors, $C^{-1}$ is the inverse of the correlation matrix of the normal observations, and $Z_{ij}$ is the normalized matrix calculated using the mean and standard deviation as follows:

$$Z_{ij}=\frac{X_{ij}-\bar{X}_{j}}{S_{j}}$$ (9)

where $X$ is the matrix of normal observations, $X_{ij}$ is the $j^{th}$ characteristic of the $i^{th}$ observation, and $\bar{X}_{j}$ is the mean value of factor $j$ over the normal observations, calculated using the formula

$$\bar{X}_{j}=\frac{\sum_{i=1}^{m}X_{ij}}{m}$$ (10)

$S_{j}$ denotes the standard deviation of factor $j$ in the normal observations and is obtained using the formula

$$S_{j}=\sqrt{\frac{\sum_{i=1}^{m}\left(X_{ij}-\bar{X}_{j}\right)^{2}}{m-1}}$$ (11)
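The Mahalanobis distance computation described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the normal group below is a hypothetical set of five sites with two factors, the helper names are invented for this example, and the matrix inverse is a simple Gauss-Jordan routine rather than whatever tool the authors used. Note that the correlation matrix is invertible only when the normal group contains more observations than factors and no factor is a linear combination of the others.

```python
import math


def column_stats(X):
    """Mean and sample standard deviation (m - 1 denominator) of each factor column."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in col) / (m - 1))
            for col, mu in zip(zip(*X), means)]
    return means, stds


def standardize(X, means, stds):
    """Standardize each row with the mean and standard deviation of the normal group."""
    return [[(x - mu) / s for x, mu, s in zip(row, means, stds)] for row in X]


def correlation_matrix(Z):
    """Correlation matrix of the standardized normal observations."""
    m, k = len(Z), len(Z[0])
    return [[sum(row[a] * row[b] for row in Z) / (m - 1) for b in range(k)]
            for a in range(k)]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting; raises if A is singular."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def mahalanobis(z, C_inv):
    """Scaled Mahalanobis distance: sqrt((1/k) * z^T C^-1 z)."""
    k = len(z)
    q = sum(z[a] * C_inv[a][b] * z[b] for a in range(k) for b in range(k))
    return math.sqrt(q / k)


# Hypothetical normal group: 5 sites (rows) x 2 factors (columns).
normal = [[0.59, 0.72], [1.15, 0.68], [1.17, 0.57], [0.79, 0.49], [0.79, 0.44]]
means, stds = column_stats(normal)
Z = standardize(normal, means, stds)
C_inv = invert(correlation_matrix(Z))
md_normal = [mahalanobis(z, C_inv) for z in Z]
```

A useful sanity check on such an implementation is that the squared distances of the normal group sum to the number of observations minus one, so the normal observations sit at a distance of roughly 1 from their own centre.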

Crop classification

The appropriate site relevant to each crop is classified using the measurement scale obtained from the normal observations. The data for every single crop variety (Y) are obtained from the ranking scores in Table VI and standardized using the mean and standard deviation of the normal observations:

$$Y_{ij}=\frac{Y_{ij}-\overline{X_{j}}}{S_{j}}$$ (12)

where $\overline{X_{j}}$ is the mean of column $j$ in $X$, and the relative MD is computed using the formula

$$\mathrm{MD}=\sqrt{\frac{1}{k}Y_{ij}^{T}C^{-1}Y_{ij}}$$ (13)

The Mahalanobis distance calculated for the 3 crops as per the given site dataset is presented in Table VII.

TABLE VII: Improved MTS classification results.
Site p1 p2 p3 p4 p5
$MD_{p}$ 3.018445 1.880574 0.795746 2.249553 2.517626
$MD_{s}$ 47098.43 47355.18 45465.46 45209.61 47113.72
$MD_{g}$ 15794.56 16139.26 2058.6 1848.254 2026.686
Site s1 s2 s3 s4 s5
$MD_{p}$ 4667.18 4512.648 4556.755 4635.9 4683.685
$MD_{s}$ 1.251757 0.703181 4.731859 0.542237 2.505792
$MD_{g}$ 11852.05 2861.864 11718.75 2983.291 2869.019
Site g1 g2 g3 g4 g5
$MD_{p}$ 32.06872 5.504255 6.165809 25.13068 29.92342
$MD_{s}$ 66.40954 70.91642 20.43124 19.10753 223.5876
$MD_{g}$ 4.670055 3.597042 2.135998 0.454961 3.077487

Table VII shows the classification results of the IMTS model for the 3 crops, namely paddy, sugarcane and groundnut. In Table VII, p1, p2, …, p5 represent agriculture sites where the paddy crop is grown, s1, s2, …, s5 denote sugarcane crop agriculture sites, and g1, g2, …, g5 represent the agriculture dataset related to the groundnut crop. The Mahalanobis Distances $MD_{p}$, $MD_{s}$ and $MD_{g}$ are computed for the crops paddy, sugarcane and groundnut, respectively, using the formula given in Equation 13. The rule of least MD is the basis for the classification of any agriculture site Z: the site is assigned to the crop with the smallest MD value. For example, if $MD_{p} < MD_{s} < MD_{g}$, the site dataset Z is interpreted as belonging to the paddy crop.

In Table VII, for the agriculture sites p1, p2, …, p5, the least MD values are 3.01, 1.88, 0.79, 2.24 and 2.51, which pertain to the paddy crop. Similarly, for the sugarcane sites s1, s2, …, s5, the least MD values 1.25, 0.70, 4.73, 0.54 and 2.50 denote the sugarcane crop. Finally, the least MD values of the groundnut sites g1, g2, …, g5 are 4.67, 3.59, 2.13, 0.45 and 3.07, which show the classified crop as groundnut. The same dataset was given to agriculture experts for classification. The results obtained from the developed model agreed 100% with the results obtained from the experts. Thus the developed multiclass model is a feasible tool for classification problems.
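The rule of least MD can be expressed in a few lines of Python. The sketch below applies it to the MD values reported in Table VII for the sites p1, s1 and g1. The function name and the dictionary layout are illustrative choices and are not part of the original model.

```python
def classify_site(distances):
    """Rule of least MD: return the crop whose Mahalanobis distance is smallest."""
    return min(distances, key=distances.get)


# MD values for the first site of each crop group, as listed in Table VII.
sites = {
    "p1": {"paddy": 3.018445, "sugarcane": 47098.43, "groundnut": 15794.56},
    "s1": {"paddy": 4667.18, "sugarcane": 1.251757, "groundnut": 11852.05},
    "g1": {"paddy": 32.06872, "sugarcane": 66.40954, "groundnut": 4.670055},
}
predicted = {site: classify_site(md) for site, md in sites.items()}
```

For these three sites the smallest distance in each case belongs to the crop actually grown at the site, which matches the classification reported in the text.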

The multiclass model classifies the three crops based on the MD calculated for each alternative in the dataset. In addition, the multiple factors considered for decision making are evaluated by taking into account the relative importance of each sub-factor, which reduces data inconsistencies. The results obtained through the multiclass model and the agricultural experts' opinion on the dataset are the same. Since the dataset is limited, the experts were able to give their opinions with certainty. Therefore, the developed multiclass model can be recommended as a feasible tool for classification problems with multiple decision criteria.

Comparative analysis of IMTS results with other classifiers

Further, the results of IMTS are assessed by comparing them with the results obtained from popular classifiers such as Naïve Bayes, Decision Table, Random Forest (bagging with 100 iterations), AdaBoost, J48 (pruned tree with three leaves and size 5), SVM and PART. The agriculture dataset is classified using these classifiers and the proposed multiclass model under the test mode of 10-fold cross-validation [44]. The execution time of every classifier is at most 0.05 seconds. Various performance measures are used to evaluate the accuracy and error rate of the selected classifiers. To validate the results of the proposed IMTS and the other classifiers, the classification accuracy, precision and recall are calculated using the following equations:

\begin{equation}
\operatorname{accuracy}=\frac{T_{P}+T_{N}}{T_{P}+T_{N}+F_{P}+F_{N}} \tag{14}
\end{equation}
\begin{equation}
\text{precision}=\frac{T_{P}}{T_{P}+F_{P}} \tag{15}
\end{equation}
\begin{equation}
\text{sensitivity}=\frac{T_{P}}{T_{P}+F_{N}} \tag{16}
\end{equation}

$T_{P}$, $T_{N}$, $F_{P}$ and $F_{N}$ are the true positive, true negative, false positive and false negative values respectively.

A true positive is a test instance that is predicted to be in the decision class and actually belongs to it. A true negative is a test instance that is not predicted to be in the decision class and does not belong to it. A false positive is a test instance that is predicted to be in the decision class but does not belong to it. A false negative is a test instance that is not predicted to be in the decision class but actually belongs to it. Accuracy is the percentage of all predictions that are correct. Precision is the percentage of positive predictions that are correct. Recall, given as sensitivity in Equation 16, is the percentage of actual positive observations that are predicted as positive.
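As a worked illustration of Equations 14 to 16 in a multiclass setting, the sketch below treats one crop at a time as the decision class (one-vs-rest) and counts the four outcomes defined above. The function names and the example predictions are hypothetical and are not taken from the paper's dataset. How the compared tools aggregate per-class values into the single figures reported in the paper is not stated, so no aggregation is attempted here.

```python
from typing import Dict, List


def one_vs_rest_counts(actual: List[str], predicted: List[str], cls: str) -> Dict[str, int]:
    """Count TP, TN, FP and FN with `cls` treated as the decision class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == cls and a == cls:
            counts["TP"] += 1
        elif p != cls and a != cls:
            counts["TN"] += 1
        elif p == cls and a != cls:
            counts["FP"] += 1
        else:  # p != cls and a == cls
            counts["FN"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """Equation 14: share of all instances that are classified correctly."""
    return (c["TP"] + c["TN"]) / (c["TP"] + c["TN"] + c["FP"] + c["FN"])


def precision(c: Dict[str, int]) -> float:
    """Equation 15: share of positive predictions that are correct."""
    denom = c["TP"] + c["FP"]
    return c["TP"] / denom if denom else 0.0


def recall(c: Dict[str, int]) -> float:
    """Equation 16: share of actual positives that are predicted as positive."""
    denom = c["TP"] + c["FN"]
    return c["TP"] / denom if denom else 0.0


# Hypothetical labels: five sites per crop, one groundnut site misclassified.
actual = ["paddy"] * 5 + ["sugarcane"] * 5 + ["groundnut"] * 5
predicted = ["paddy"] * 5 + ["sugarcane"] * 5 + ["groundnut"] * 4 + ["paddy"]

for crop in ("paddy", "sugarcane", "groundnut"):
    c = one_vs_rest_counts(actual, predicted, crop)
    print(crop, c, accuracy(c), precision(c), recall(c))
```

A classifier that labels every site correctly has no false positives or false negatives for any class, so all three measures equal 1 (100%).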

The performance of the classifiers is evaluated using accuracy, precision and recall values and shown in Table VIII. Accuracy, precision and recall scores of all the classifiers are represented in Figure 3 to Figure 5.

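TABLE VIII: Evaluation metrics of various classifiers.
\begin{tabular}{lccc}
\hline
Classifier & Accuracy (\%) & Precision (\%) & Recall (\%) \\
\hline
Naïve Bayes & 80 & 82 & 80 \\
Random Forest & 93.3 & 94 & 93 \\
J48 & 73.3 & 77.4 & 73.3 \\
PART & 73.3 & 77.4 & 73.3 \\
AdaBoost & 73.3 & 77.4 & 73.3 \\
Decision Table & 66.6 & 72.2 & 66.7 \\
IMTS & 100 & 100 & 100 \\
\hline
\end{tabular}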
Figure 3: Accuracy scores of all classifiers.
Figure 4: Precision scores of all classifiers.
Figure 5: Recall scores of all classifiers.
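TABLE IX: Evaluation of error rates of the classifiers.
\begin{tabular}{lcccc}
\hline
Classifier & MAE (\%) & RMSE (\%) & RAE (\%) & RRSE (\%) \\
\hline
Naïve Bayes & 19.8 & 24.9 & 42.9 & 50.8 \\
Random Forest & 21.15 & 25.46 & 45.7 & 51.8 \\
J48 & 17.7 & 42.1 & 38.4 & 85.8 \\
PART & 17.7 & 42.1 & 38.4 & 85.8 \\
AdaBoost & 17.7 & 42.1 & 38.4 & 85.8 \\
Decision Table & 31.3 & 37.7 & 67.7 & 75.5 \\
IMTS & 0 & 0 & 0 & 0 \\
\hline
\end{tabular}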
Figure 6: Mean absolute error rate of all the classifiers.

Mean Absolute Error (MAE) measures the average of the absolute differences between the predicted and actual values, with every difference given identical weight. Figure 6 shows that the MAE (i.e. the mean magnitude of the errors) is zero for IMTS, whereas Decision Table incurs an MAE of 31.3%, Random Forest 21.15% and Naïve Bayes 19.8%. J48, PART and AdaBoost share the same MAE of 17.7%. Therefore, the multiclass model matches the actual values exactly, with zero error.

Figure 7: Root mean squared error rate of all the classifiers.

Similar to MAE, the Root Mean Squared Error (RMSE) also measures the mean magnitude of the errors. RMSE is the square root of the mean of the squared deviations, so it penalizes large errors more heavily than MAE does. The proposed model has zero RMSE, implying a 100% accurate classification of the crops, whereas the RMSE of the remaining classifiers falls between 24.9% and 42.1%. The RMSE graph is shown in Figure 7.
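In standard notation, MAE and RMSE can be written as follows, where $p_{i}$ and $a_{i}$ denote the predicted and actual values of the $i$-th of $n$ test instances. These symbols are introduced here for illustration and are not taken from the paper.

\[
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|p_{i}-a_{i}\right|,
\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_{i}-a_{i}\right)^{2}}
\]

Because the errors are squared before averaging, RMSE is never smaller than MAE, and the two coincide only when every error has the same magnitude.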

Figure 8: Root relative squared error rate of all the classifiers.

Root Relative Squared Error (RRSE) expresses the squared error of the predictions relative to the squared error of a simple predictor (e.g. Naïve or ZeroR) that always predicts the mean of the actual values. The total squared error of the classifier is divided by the total squared error of the simple predictor, which normalizes the error, and taking the square root of this ratio brings it back to the scale of the predicted quantity. The proposed model attains an RRSE of 0%, as shown in Figure 8.

Figure 9: Relative absolute error rate of all the classifiers.

Relative Absolute Error (RAE) is similar to RRSE: it is calculated by dividing the MAE of the classifier by the MAE of the simple predictor. Hence, a smaller RAE value indicates a better prediction. Figure 9 shows that the proposed method attains an RAE of 0%, the ideal value.
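The four error measures can be sketched as follows, using their standard textbook definitions on numeric predictions. Two assumptions are made for illustration: the simple predictor always outputs the mean of the actual values, and the inputs are plain numeric values. How the compared tools derive numeric errors from class predictions is not described in the paper, so the values in Table IX are not reproduced here. All function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def rae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Relative absolute error: total absolute error divided by that of a
    simple predictor that always outputs the mean of `actual` (assumption)."""
    mean_a = sum(actual) / len(actual)
    baseline = sum(abs(mean_a - a) for a in actual)
    if baseline == 0:
        raise ValueError("simple predictor has zero error; RAE is undefined")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / baseline


def rrse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root relative squared error: square root of the total squared error
    divided by that of the same mean-based simple predictor."""
    mean_a = sum(actual) / len(actual)
    baseline = sum((mean_a - a) ** 2 for a in actual)
    if baseline == 0:
        raise ValueError("simple predictor has zero error; RRSE is undefined")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / baseline)


# Hypothetical values, for illustration only.
actual = [0.0, 0.0, 1.0, 1.0]
predicted = [0.0, 0.5, 1.0, 0.5]
print(mae(predicted, actual), rmse(predicted, actual))
print(rae(predicted, actual), rrse(predicted, actual))
```

When the predictions equal the actual values, every numerator is zero, so all four measures are zero, which is the case reported for IMTS.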

The error rate of each classifier is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE), as shown in Table IX. MAE is the average of the absolute errors, and RMSE is the square root of the average squared error. RAE is the total absolute error relative to that of a simple predictor, and RRSE is the square root of the relative squared error. The error rates obtained for the different classifiers are shown in Figure 6 to Figure 9. The proposed multiclass model for the classification of three different crops yields 100% accuracy, which is higher than that of the other methods. This accuracy is attributed to the model's consideration of the relative importance of each factor identified for the analysis. The Grey correlation method is used for calculating the relative weights of each subfactor, and in turn the main factors are evaluated using the constructed objective function. This reduces the inconsistencies in the data. The proposed model ranks the alternatives based on the least MD values. Thus, by alleviating the data inconsistencies, the proposed model achieves better accuracy than the other models on this dataset.

V Conclusions

A multiclass model is developed in this paper using the Improved Mahalanobis Taguchi System method for the classification of three crops, namely paddy, sugarcane and groundnut. Twenty-six factors are identified for the three crops and categorized into six main factors. As the relative importance of each factor plays a major role in decision making, the weights of the factors are calculated using the Grey correlation method. The sub-factor dataset matrix is converted to main factor data values using an objective function that applies the weights of the sub-factors. The resulting ranking score decision matrix is passed to the Improved MTS for the classification of the three crops. The Mahalanobis distance is calculated for every alternative of each crop, and an agriculture site is assigned to the crop with the least MD value. The classification results of the developed model are validated against the results obtained from the agriculture experts. The multiclass model gives 100% accuracy, recall and precision, higher than the other classifiers compared. Also, the error rates RMSE, RRSE, RAE and MAE are 0%, indicating a better prediction for the given dataset. A limitation of the model is that it can be applied only to decision problems with a limited number of alternatives and decision classes. Future research can extend the work by using deep neural network algorithms when high-dimensional datasets are involved. Feature selection methods can be applied to find a useful set of features for decision making. Other classification datasets can also be used to test the efficiency of the developed model.

References

  • [1] Thippa Reddy Gadekallu, Dharmendra Singh Rajput, M Praveen Kumar Reddy, Kuruva Lakshmanna, Sweta Bhattacharya, Saurabh Singh, Alireza Jolfaei, and Mamoun Alazab. A novel pca–whale optimization-based deep neural network model for classification of tomato plant diseases using gpu. Journal of Real-Time Image Processing, pages 1–14, 2020.
  • [2] Praveen Kumar Reddy Maddikunta, Saqib Hakak, Mamoun Alazab, Sweta Bhattacharya, Thippa Reddy Gadekallu, Wazir Zada Khan, and Quoc-Viet Pham. Unmanned aerial vehicles in smart agriculture: Applications, requirements and challenges. arXiv preprint arXiv:2007.12874, 2020.
  • [3] Anusha Vangala, Ashok Kumar Das, Neeraj Kumar, and Mamoun Alazab. Smart secure sensing for iot-based agriculture: Blockchain perspective. IEEE Sensors Journal, 2020.
  • [4] Giridhar Reddy Bojja and Loknath Sai Ambati. A novel framework for crop pests and disease identification using social media and ai.
  • [5] AA Mustafa, Man Singh, RN Sahoo, Nayan Ahmed, Manoj Khanna, A Sarangi, and AK Mishra. Land suitability analysis for different crops: a multi criteria decision making approach using remote sensing and gis. Researcher, 3(12):61–84, 2011.
  • [6] N Deepa, Kathiravan Srinivasan, Chuan-Yu Chang, Ali Kashif Bashir, et al. An efficient ensemble vtopes multi-criteria decision-making model for sustainable sugarcane farms. Sustainability, 11(16):4288, 2019.
  • [7] Praveen Kumar Reddy Maddikunta, Gautam Srivastava, Thippa Reddy Gadekallu, Natarajan Deepa, and Prabadevi Boopathy. Predictive model for battery life in iot networks. IET Intelligent Transport Systems, 2020.
  • [8] N Deepa and K Ganesan. Hybrid rough fuzzy soft classifier based multi-class classification model for agriculture crop selection. Soft computing, 23(21):10793–10809, 2019.
  • [9] Dhekra Souissi, Lahcen Zouhri, Salma Hammami, Mohamed Haythem Msaddek, Adel Zghibi, and Mahmoud Dlala. Gis-based mcdm–ahp modeling for flood susceptibility mapping of arid areas, southeastern tunisia. Geocarto International, pages 1–27, 2019.
  • [10] Roman Słowiński, Salvatore Greco, and Benedetto Matarazzo. Rough set analysis of preference-ordered data. In International Conference on Rough Sets and Current Trends in Computing, pages 44–59. Springer, 2002.
  • [11] Arian Hafezalkotob and Ashkan Hafezalkotob. Extended multimoora method based on shannon entropy weight for materials selection. Journal of Industrial Engineering International, 12(1):1–13, 2016.
  • [12] SA Jozi, M Shafiee, N MoradiMajd, and S Saffarian. An integrated shannon’s entropy–topsis methodology for environmental risk assessment of helleh protected area in iran. Environmental monitoring and assessment, 184(11):6913–6922, 2012.
  • [13] Ayşegül Tuş and Esra Aytaç Adalı. The new combination with critic and waspas methods for the time and attendance software selection problem. Opsearch, 56(2):528–538, 2019.
  • [14] Abteen Ijadi Maghsoodi, Arta Ijadi Maghsoodi, Parastou Poursoltan, Jurgita Antucheviciene, and Zenonas Turskis. Dam construction material selection by implementing the integrated swara—codas approach with target-based attributes. Archives of Civil and Mechanical Engineering, 19:1194–1210, 2019.
  • [15] Bala Krushna Tripathy, Anirban Mitra, and Jaladhar Ojha. On rough equalities and rough equivalences of sets. In International Conference on Rough Sets and Current Trends in Computing, pages 92–102. Springer, 2008.
  • [16] Mei-Jia Huang, Yuan-Biao Zhang, Jie-Huan Luo, and He Nie. Evaluation of economics journals based on reduction algorithm of rough set and grey correlation. J. Mgmt. & Sustainability, 5:140, 2015.
  • [17] Wen-Shing Lee and Yeong-Chuan Lin. Evaluating and ranking energy performance of office buildings using grey relational analysis. Energy, 36(5):2551–2556, 2011.
  • [18] Prasanta Chandra Mahalanobis. On the generalized distance in statistics. National Institute of Science of India, 1936.
  • [19] P Mahalakshmi and K Ganesan. Mahalanobis taguchi system based criteria selection for shrimp aquaculture development. Computers and electronics in agriculture, 65(2):192–197, 2009.
  • [20] N Deepa and K Ganesan. Mahalanobis taguchi system based criteria selection tool for agriculture crops. Sādhanā, 41(12):1407–1414, 2016.
  • [21] Nur Al Hasan Haldar, Farrukh Aslam Khan, Aftab Ali, and Haider Abbas. Arrhythmia classification using mahalanobis distance based improved fuzzy c-means clustering for mobile health monitoring systems. Neurocomputing, 220:221–235, 2017.
  • [22] Yanni Dong, Bo Du, Liangpei Zhang, and Lefei Zhang. Dimensionality reduction and classification of hyperspectral images using ensemble discriminative local metric learning. IEEE Transactions on Geoscience and Remote Sensing, 55(5):2509–2524, 2017.
  • [23] Xuefei Zhe, Shifeng Chen, and Hong Yan. Directional statistics-based deep metric learning for image classification and retrieval. Pattern Recognition, 93:113–123, 2019.
  • [24] Mahmoud El-Banna. Modified mahalanobis taguchi system for imbalance data classification. Computational Intelligence and Neuroscience, 2017, 2017.
  • [25] Anish Jindal, Neeraj Kumar, and Mukesh Singh. Internet of energy-based demand response management scheme for smart homes and phevs using svm. Future Generation Computer Systems, 108:1058–1068, 2020.
  • [26] Harshita Patel, Dharmendra Singh Rajput, G Thippa Reddy, Celestine Iwendi, Ali Kashif Bashir, and Ohyun Jo. A review on classification of imbalanced data for wireless sensor networks. International Journal of Distributed Sensor Networks, 16(4):1550147720916404, 2020.
  • [27] Huaijun Wang, Na Huo, Junhuai Li, Kan Wang, and Zhixiao Wang. A road quality detection method based on the mahalanobis-taguchi system. IEEE Access, 6:29078–29087, 2018.
  • [28] Hamzah Sakeran, Noor Azuan Abu Osman, and Mohd Shukry Abdul Majid. Gait classification using mahalanobis–taguchi system for health monitoring systems following anterior cruciate ligament reconstruction. Applied Sciences, 9(16):3306, 2019.
  • [29] Da-An Huh, Hong Lyuer Lim, Jong-Ryeul Sohn, Sang-Hoon Byeon, Soonyoung Jung, Woo-Kyun Lee, and Kyong Whan Moon. Development of a screening method for health hazard ranking and scoring of chemicals using the mahalanobis–taguchi system. International journal of environmental research and public health, 15(10):2208, 2018.
  • [30] Ahmad Azab, Robert Layton, Mamoun Alazab, and Jonathan Oliver. Mining malware to detect variants. In 2014 Fifth Cybercrime and Trustworthy Computing Conference, pages 44–53. IEEE, 2014.
  • [31] Mamoun Alazab, Robert Layton, Roderic Broadhurst, and Brigitte Bouhours. Malicious spam emails developments and authorship attribution. In 2013 Fourth Cybercrime and Trustworthy Computing Workshop, pages 58–68. IEEE, 2013.
  • [32] Jiahang Yuan and Xinggang Luo. Regional energy security performance evaluation in china using mtgs and spa-topsis. Science of The Total Environment, 696:133817, 2019.
  • [33] Yunkai Wu, Bin Jiang, and Ningyun Lu. A descriptor system approach for estimation of incipient faults with application to high-speed railway traction devices. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017.
  • [34] Yunkai Wu, Bin Jiang, and Yulong Wang. Incipient winding fault detection and diagnosis for squirrel-cage induction motors equipped on crh trains. ISA transactions, 99:488–495, 2020.
  • [35] Ning Wang, Zhipeng Wang, Limin Jia, Yong Qin, Xinan Chen, and Yakun Zuo. Adaptive multiclass mahalanobis taguchi system for bearing fault diagnosis under variable conditions. Sensors, 19(1):26, 2019.
  • [36] Tzu-Wei Tseng, Kai-Jiun Yang, C-C Jay Kuo, and Shang-Ho Tsai. An interpretable compression and classification system: Theory and applications. IEEE Access, 8:143962–143974, 2020.
  • [37] Fulin Luo, Bo Du, Liangpei Zhang, Lefei Zhang, and Dacheng Tao. Feature learning using spatial-spectral hypergraph discriminant analysis for hyperspectral image. IEEE transactions on cybernetics, 49(7):2406–2419, 2018.
  • [38] Fulin Luo, Liangpei Zhang, Bo Du, and Lefei Zhang. Dimensionality reduction with enhanced hybrid-graph discriminant learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 2020.
  • [39] Haoran Li, Fazhi He, and Yilin Chen. Learning dynamic simultaneous clustering and classification via automatic differential evolution and firework algorithm. Applied Soft Computing, 96:106593, 2020.
  • [40] Haoran Li, Fazhi He, Yaqian Liang, and Quan Quan. A dividing-based many-objective evolutionary algorithm for large-scale feature selection. Soft Computing, pages 1–20, 2019.
  • [41] Yiqi Wu, Fazhi He, Dejun Zhang, and Xiaoxia Li. Service-oriented feature-based data exchange for cloud-based design and manufacturing. IEEE Transactions on services computing, 11(2):341–353, 2015.
  • [42] Tuncay Özcan, Numan Çelebi, and Şakir Esnaf. Comparative analysis of multi-criteria decision making methodologies and implementation of a warehouse location selection problem. Expert Systems with Applications, 38(8):9773–9779, 2011.
  • [43] Jiangtao Ren, Yuanwen Cai, Xiaochen Xing, and Jing Chen. A method of multi-class faults classification based-on mahalanobis-taguchi system using vibration signals. In The Proceedings of 2011 9th International Conference on Reliability, Maintainability and Safety, pages 1015–1020. IEEE, 2011.
  • [44] Swarnalatha Purushotham and BK Tripathy. Evaluation of classifier models using stratified tenfold cross validation techniques. In International Conference on Computing and Communication Systems, pages 680–690. Springer, 2011.