
A Novel Nonlinear Non-parametric Correlation Measurement With A Case Study on Surface Roughness in Finish Turning

Ming Luo Srinivasan Radhakrishnan Sagar Kamarthi Northeastern University, 360 Huntington Avenue, Boston, Massachusetts, United States
Abstract

Estimating a correlation coefficient becomes a daunting task as dataset patterns grow more complex. One problem in manufacturing applications is the estimation of a critical process variable during a machining operation from directly measurable process variables, for example, the prediction of the surface roughness of a workpiece during finish turning. In this paper, we present an exhaustive study of the existing popular correlation coefficients: the Pearson correlation coefficient, Spearman's rank correlation coefficient, Kendall's tau correlation coefficient, the Fechner correlation coefficient, and the nonlinear correlation coefficient. None of them, however, can capture all linear and nonlinear correlations. We therefore present a universal nonlinear non-parametric correlation measurement, the g-correlation coefficient. Unlike other correlation measurements, g-correlation requires no assumptions and picks the dominating pattern of the dataset after examining all major patterns, whether linear or nonlinear. Tests on both linearly and nonlinearly correlated datasets, together with comparisons against the correlation coefficients introduced in the literature, show that g-correlation is robust on all linearly correlated datasets and outperforms the other measures on some nonlinearly correlated datasets. Results of applying the different correlation concepts to surface roughness assessment show that g-correlation plays a central role among all standard concepts of correlation.

keywords:
Association, predictability, Fechner correlation coefficient, pattern recognition, surface roughness

1 INTRODUCTION

A classical problem in statistics for ordered pairs of measurements (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) is whether, in general, knowing one of the values in an arbitrary pair (x, y) of measurements allows one to draw conclusions about the other value in this pair.

When speaking about correlation, the statistics literature mainly aims at finding a certain functional relationship (such as a straight line in linear regression) or a monotonic relationship between two numerical variables. In general, correlation analysis is a means of measuring the strength or ‘closeness’ of the relationship between two variables [2].

In section 2 we provide general definitions and clarifications of the terms correlation and predictability for pairs of random variables. The most important methods for measuring a correlation between two random variables are also presented and discussed in section 2. Two concepts for nonparametric correlation analysis and prediction -- a seldom used one and the recently introduced g-correlation -- which can detect correlations that are neither functional nor monotonic relationships are described, derived, and analyzed in sections 3 and 4. In section 5 it is shown how the g-correlation concept can be generalized to more than two variables. In section 6 the g-correlation is compared with all the correlation measures introduced in the paper on linearly and nonlinearly correlated datasets. In section 7 the g-correlation is compared with other important correlation measures on a problem of surface roughness prediction in finish turning, and the results are summarized in section 8.

2 THE CONCEPT OF CORRELATION BETWEEN TWO VARIABLES

When prediction is a concern in correlation analysis, statisticians make the following distinction [9]: correlation is a measure of the degree of dependence between two dependent random variables.

Definition 1 A dependent random variable Y is uncorrelated with respect to an independent random variable X if the range and/or frequency of the possible values for Y is constant for varying values of X. In mathematical terms this means that, for any two unique values x_i, x_j in X, F_{Y|X}(y|x_i) := P(Y\leq y|X=x_i) and F_{Y|X}(y|x_j) := P(Y\leq y|X=x_j) are equivalent for all y in Y.

Correlation therefore means that the values of the dependent variable Y do not always behave completely randomly. In fact, their distribution is influenced by and can be predicted from the corresponding value of X. It is shown with the following Lemma 2 that, under some technical assumptions, the correlation defined by Definition 1 is equivalent to the well-known concept of statistical dependence of two random variables -- one random variable has a certain impact on the other random variable and vice versa.

Lemma 2 Let X and Y be two random variables which are either a) both discrete or b) both continuous with a joint density function f. Under this assumption, X and Y are uncorrelated according to Definition 1 if and only if they are statistically independent.

Proof.

To prove the lemma, we show that the statement that X and Y are uncorrelated is both necessary and sufficient for the statement that X and Y are statistically independent, in both case a) and case b).


Case a: The discrete case means that X and Y take finitely or countably many values x_1, x_2, \dots, x_j, \dots, x_n and y_1, y_2, \dots, y_i, \dots, y_n, respectively, listed in increasing order. In this case statistical independence of X and Y is defined as

P_{Y,X}\{Y=y_i,~ X=x_j\} = P_Y\{Y=y_i\}\,P_X\{X=x_j\} (1)

for all indices i, j.

  1.

    Necessary: If X and Y are independent, then the conditional probability distribution function of Y given X = x, F_{Y|X}(y|x), is defined as

    F_{Y|X}(y|x) = P\{Y\leq y \mid X=x\} = \frac{P\{Y\leq y,~ X=x\}}{P\{X=x\}} (2)

    for all x \in \{x_1, x_2, \dots, x_j, \dots, x_n\} such that P\{X=x\} > 0. This can be further written as

    F_{Y|X}(y|x) = \frac{\sum_{y_k\leq y} P\{Y=y_k,~ X=x\}}{P\{X=x\}}, \quad y_k \in \{y_1, y_2, \dots, y_i, \dots, y_n\}, (3)

    and using equation (1) we further get

    F_{Y|X}(y|x) = \frac{\sum_{y_k\leq y} P\{Y=y_k\}\,P\{X=x\}}{P\{X=x\}} = \sum_{y_k\leq y} P\{Y=y_k\} = P\{Y\leq y\}. (4)

    This means that F_{Y|X}(y|x) is always the same function, independent of the value x, i.e. X and Y are uncorrelated.

  2.

    Sufficient: For the opposite direction,

    (a) we first consider the case where X takes only a single value x_1 with probability 1. Then, clearly,

      P\{Y=y_i,~ X=x_1\} = P\{Y=y_i\} = P\{Y=y_i\}\,P\{X=x_1\} (5)

      for every index i from 1 to n, which is just the independence of X and Y.

    (b) Secondly, assume that

      F_{Y|X}(y|x_k) = P\{Y\leq y \mid X=x_k\} = P\{Y\leq y \mid X=x_j\} = F_{Y|X}(y|x_j) (6)

      for every real value y and every pair of different indices k, j \in \{1, 2, 3, \dots, n\}.

    Equation (6) can be reformulated as

    \frac{P\{Y\leq y,~ X=x_k\}}{P\{X=x_k\}} = \frac{P\{Y\leq y,~ X=x_j\}}{P\{X=x_j\}}
    \Longleftrightarrow \frac{\sum_{y_l\leq y} P\{Y=y_l,~ X=x_k\}}{P\{X=x_k\}} = \frac{\sum_{y_l\leq y} P\{Y=y_l,~ X=x_j\}}{P\{X=x_j\}}. (7)

    Substituting y by the values y_i and y_{(i-1)} for an arbitrary index i leads to

    \frac{\sum_{y_l\leq y_{(i-1)}} P\{Y=y_l,~ X=x_k\}}{P\{X=x_k\}} = \frac{\sum_{y_l\leq y_{(i-1)}} P\{Y=y_l,~ X=x_j\}}{P\{X=x_j\}} (8)

    and

    \frac{\sum_{y_l\leq y_i} P\{Y=y_l,~ X=x_k\}}{P\{X=x_k\}} = \frac{\sum_{y_l\leq y_i} P\{Y=y_l,~ X=x_j\}}{P\{X=x_j\}}. (9)

    Subtracting equation (8) from (9) gives

    \frac{P\{Y=y_i,~ X=x_k\}}{P\{X=x_k\}} = \frac{P\{Y=y_i,~ X=x_j\}}{P\{X=x_j\}}, (10)

    which is equivalent to

    P\{Y=y_i,~ X=x_k\} = \frac{P\{Y=y_i,~ X=x_j\}}{P\{X=x_j\}}\cdot P\{X=x_k\}. (11)

    In addition, equation (11) holds for each index i, and consequently

    P\{Y=y_i\} = \sum_{k=1}^{n} P\{Y=y_i,~ X=x_k\}
    = \sum_{k=1}^{n}\frac{P\{Y=y_i,~ X=x_j\}}{P\{X=x_j\}}\cdot P\{X=x_k\}
    = \frac{P\{Y=y_i,~ X=x_j\}}{P\{X=x_j\}}\sum_{k=1}^{n} P\{X=x_k\}
    = \frac{P\{Y=y_i,~ X=x_j\}}{P\{X=x_j\}}. (12)

    This implies that

    P\{Y=y_i,~ X=x_j\} = P\{Y=y_i\}\,P\{X=x_j\} (13)

    for all i and j, which means that the random variables X and Y are independent.

Case b: In the continuous case, the statistical independence of X and Y is equivalent to the relation

f_{YX}(y,~ x) = f_Y(y)\,f_X(x) \quad \mbox{almost everywhere,} (14)

where f_Y and f_X are the marginal probability density functions of Y and X, respectively [7].

  1.

    Necessary: The conditional probability density function of Y given X = x is defined by

    f_{Y|X}(y|x) := \frac{f_{YX}(y,~ x)}{f_X(x)} (15)

    for every value x with f_X(x) > 0 [4]. If X and Y are independent, then it follows from equation (14) that

    f_{Y|X}(y|x) = f_Y(y) \quad \mbox{almost everywhere} (16)

    for every possible value x with f_X(x) > 0. Thus the conditional cumulative distribution functions of Y given X = x,

    F_{Y|X}(y|x) := P(Y\leq y \mid X=x) = \int_{-\infty}^{y} f_{Y|X}(u|x)\,\mbox{d}\mu(u),

    are all equal as x varies. Consequently, by Definition 1, X and Y are uncorrelated.

  2.

    Sufficient: Conversely, assume that

    F_{Y|X}(y|x_i) = F_{Y|X}(y|x_j) (17)

    for every pair of values x_i \neq x_j for which F_{Y|X} can be defined. Then we get

    f_{Y|X}(y|x_i) = F_{Y|X}'(y|x_i) = F_{Y|X}'(y|x_j) = f_{Y|X}(y|x_j) \quad \mbox{almost everywhere}. (18)

    If we fix x_i arbitrarily, then

    \frac{f_{YX}(y,~ x_i)}{f_X(x_i)} = \frac{f_{YX}(y,~ x_j)}{f_X(x_j)} \quad \mbox{almost everywhere} (19)
    \Longleftrightarrow f_{YX}(y,~ x_j) = \frac{f_{YX}(y,~ x_i)}{f_X(x_i)}\,f_X(x_j) \quad \mbox{almost everywhere}

    for every value x_j. Consequently,

    f_Y(y) = \int f_{YX}(y,~ x_j)\,\mbox{d}\mu(x_j) (20)
    = \int \frac{f_{YX}(y,~ x_i)}{f_X(x_i)}\,f_X(x_j)\,\mbox{d}\mu(x_j) (21)
    = \frac{f_{YX}(y,~ x_i)}{f_X(x_i)}\int f_X(x_j)\,\mbox{d}\mu(x_j) (22)
    = \frac{f_{YX}(y,~ x_i)}{f_X(x_i)} \quad \mbox{almost everywhere}, (23)

    which is equivalent to equation (14); this means that X and Y are statistically independent.
    If, on the other hand, x_i is such that f_X(x_i) = 0, then

    f_Y(y)\,f_X(x_i) = 0 (24)

    and

    0 = f_X(x_i) = \int f_{YX}(y,~ x_i)\,\mbox{d}\mu(y) (25)

    implies

    f_{YX}(y,~ x_i) = 0 (26)

    for almost every y because the integrand is nonnegative.

Figure 1 shows a case of no correlation between the variables X and Y. One of the reasons for the study of correlation between two variables is to seek a functional relationship between two random variables (see [1] and Figure 1 for examples). However, when it is not possible to establish a functional relationship between X and Y (see Figure 1 for an example), measuring correlation has not been sufficiently dealt with in the past. Such a situation occurs in the prediction of surface roughness in a turning operation. The development and application of a correlation concept for such a scenario is the objective of this paper. Existing correlation measures are briefly introduced in the next section. Let x_1, x_2, \dots, x_n and y_1, y_2, \dots, y_n be samples of two random variables X and Y, respectively.

2.1 Pearson Correlation Coefficient

The standard Pearson correlation coefficient [10] is defined as

r := \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}}, (27)

where \bar{x} and \bar{y} are the respective sample means. The range of the correlation coefficient r is [-1, 1]. The closer r is to 1 or -1, the stronger the correlation between the two random variables X and Y. The degree of linear dependency between X and Y can be measured through |r|: |r| = 1 if and only if the points (x_i, y_i) describe a straight line in \mathbb{R}^2 which is neither horizontal nor vertical.

For the random variables X and Y in Fig. 1, |r| = 0.136. Since the Pearson correlation detects only linear dependency, if X and Y are nonlinearly correlated the Pearson correlation can still be close to 0. For example, r for Fig. 1 is 0.015.
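Equation (27) is straightforward to compute directly; the following minimal Python sketch (function name ours, not from the paper) also illustrates the limitation just discussed:

```python
import math

def pearson(x, y):
    # Pearson r, equation (27): covariance over the product of the deviations' norms.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# A straight line gives |r| = 1; a symmetric parabola, a clearly nonlinear
# pattern, still gives r = 0 -- the limitation noted above.
print(pearson([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))   # 1.0
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```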

2.2 Spearman’s Rank Correlation Coefficient

We present Spearman's rank correlation, a nonparametric correlation coefficient for two numerical variables, denoted by \rho, which ranges from -1 to 1 [10]. The closer |\rho| is to 0, the weaker the association between X and Y. First, sort x_1, x_2, \dots, x_n and y_1, y_2, \dots, y_n in ascending order. Next, create a sequence \alpha_1, \alpha_2, \dots, \alpha_n in which \alpha_i is the position of the corresponding element x_i in the sorted sequence, i.e. \alpha_i = 1 if x_i is the smallest, \alpha_i = 2 if x_i is the second smallest, and so on. In a similar fashion, create a sequence \beta_1, \beta_2, \dots, \beta_n of ranks corresponding to the sequence y_1, y_2, \dots, y_n. Then \rho is defined as

\rho = 1 - \frac{6\sum_{i=1}^{n}(\alpha_i-\beta_i)^2}{n(n^2-1)}. (28)

Spearman's rank correlation coefficient |\rho| determines the degree to which a monotonic relationship exists between the two variables X and Y.

Fig. 1 has no apparent monotonic relationship, so its |\rho| = 0.108 is small. Although Fig. 1 shows a clear nonlinear relationship, due to the limitation of Spearman's coefficient its |\rho| is also very small, namely 0.034. Fig. 1 has a roughly monotonically increasing relationship, so its |\rho| = 0.889 is close to 1.
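The ranking procedure and equation (28) can be sketched in a few lines of Python (assuming no ties, as in the construction above; names are ours):

```python
def ranks(seq):
    # Position of each element in the ascending sort, 1 = smallest (no ties assumed).
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    r = [0] * len(seq)
    for pos, idx in enumerate(order, start=1):
        r[idx] = pos
    return r

def spearman(x, y):
    # Spearman's rho, equation (28), from the rank sequences alpha and beta.
    n = len(x)
    alpha, beta = ranks(x), ranks(y)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(alpha, beta)) / (n * (n * n - 1))

# A monotonic but nonlinear relationship still yields rho = 1.
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # 1.0
```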

2.3 Kendall’s Tau

The Kendall nonparametric measure \tau of correlation between X and Y [5] is defined as

\tau := \frac{2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\Delta_{ij}}{n(n-1)}, (29)

where

\Delta_{ij} := \begin{cases} 1 & \text{if } (x_j-x_i)(y_j-y_i) > 0 \\ 0 & \text{if } (x_j-x_i)(y_j-y_i) = 0 \\ -1 & \text{if } (x_j-x_i)(y_j-y_i) < 0 \end{cases} (30)

If all values in both of the sequences x_1, x_2, \dots, x_n and y_1, y_2, \dots, y_n are different, then Kendall's tau for these two sequences is equal to another correlation measure, called Goodman and Kruskal's gamma [3]. The range of \tau is also [-1, 1], as for the Pearson correlation coefficient and Spearman's rank correlation coefficient. If |\tau| is close to 0, only a very small ordinal association between X and Y is found. Conversely, when |\tau| is close to 1, X and Y have a strong ordinal association.

Fig. 1 does not have an obvious ordinal association between X and Y, so its |\tau| is close to 0, |\tau| = 0.070. Fig. 1 has an ordinal association between X and Y only for the first half of the observations and an inverse ordinal association for the second half, so its |\tau| is also close to 0, |\tau| = 0.033. However, Fig. 1 has an obvious ordinal association between X and Y, so its |\tau| is relatively large, namely 0.707.
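A direct O(n^2) Python sketch of equations (29)-(30) (implementation ours) makes the pairwise sign comparison explicit:

```python
def kendall_tau(x, y):
    # Kendall's tau, equations (29)-(30): sign agreement over all pairs i < j.
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)  # Delta_ij
    return 2 * s / (n * (n - 1))

# Concordant throughout -> tau = 1; a rise-then-fall pattern cancels out.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(kendall_tau([1, 2, 3, 4], [1, 2, 2, 1]))      # 0.0
```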

2.4 Nonlinear Correlation Coefficient

For two variables X, Y, we can estimate their correlation by calculating their mutual information after sorting and grouping their values [11]. Given the discrete variables X = \{x_1, x_2, \dots, x_{n-1}, x_n\}, Y = \{y_1, y_2, \dots, y_{n-1}, y_n\}:

Step 1  Sort \{x_i\}, \{y_i\}, i = 1, 2, \dots, n, in ascending order. We use X^s, Y^s to denote the ordered X, Y: X^s = \{x_{(1)}, x_{(2)}, \dots, x_{(n-1)}, x_{(n)}\}, Y^s = \{y_{(1)}, y_{(2)}, \dots, y_{(n-1)}, y_{(n)}\}, with x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)} and y_{(1)} \leq y_{(2)} \leq \dots \leq y_{(n)}.


Step 2  Split X^s, Y^s into b ranks, so that each rank contains \frac{n}{b} observations.


Step 3  For the pairs \{(x_{(i)}, y_{(i)})\}, i = 1, 2, \dots, n, split the plane into b^2 regions.


Step 4  Calculate the nonlinear correlation coefficient between X and Y using the mutual information based on the ranks.

The nonlinear correlation coefficient (NCC) is

NCC(X;Y) = H(X^s) + H(Y^s) - H(X^s, Y^s),

where H(X^s), H(Y^s) are the entropies of X^s, Y^s, calculated based on the b ranks, and H(X^s, Y^s) is the joint entropy, calculated based on the b^2 regions.

H(X^s) = -\sum_{i=1}^{b}\frac{n_i}{n}\log_b\frac{n_i}{n} = -\sum_{i=1}^{b}\frac{n/b}{n}\log_b\frac{n/b}{n} = 1,
H(Y^s) = -\sum_{j=1}^{b}\frac{n_j}{n}\log_b\frac{n_j}{n} = -\sum_{j=1}^{b}\frac{n/b}{n}\log_b\frac{n/b}{n} = 1,

and

H(X^s, Y^s) = -\sum_{i=1}^{b}\sum_{j=1}^{b}\frac{n_{ij}}{n}\log_b\frac{n_{ij}}{n},

where b is the base of the logarithm, n_i, n_j are the numbers of observations in the i-th and j-th ranks, and n_{ij} is the number of observations in the (i, j) region.

Thus,

NCC(X;Y) = 2 + \sum_{i=1}^{b}\sum_{j=1}^{b}\frac{n_{ij}}{n}\log_b\frac{n_{ij}}{n}.

The range of this nonlinear correlation coefficient is [0, 1]: a value of 1 means that a strong nonlinear relationship is detected, and 0 means that no nonlinear relationship is found.

We choose b = 10 for calculating the nonlinear correlation coefficient. If X and Y are not related, the NCC is small (see the example in Fig. 1, whose NCC is 0.239). The NCC values for Fig. 1 and Fig. 1 are 0.433 and 0.716, which are larger since these two figures have a clearer pattern than Fig. 1. In addition, for Fig. 2, NCC = 0.370, and for Fig. 3, NCC = 1. We can see from these results that when X and Y have a clearer nonlinear relationship, the NCC is relatively larger.
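The four steps above can be sketched as follows; this minimal Python version (ours) assumes n is divisible by b and that no ties straddle rank boundaries:

```python
import math

def ncc(x, y, b=10):
    # Nonlinear correlation coefficient: NCC = 2 + sum_ij (n_ij/n) log_b (n_ij/n).
    n = len(x)
    size = n // b  # observations per rank (Step 2)

    def rank_of(seq):
        # Step 1: sort; assign each observation its rank index 0..b-1.
        order = sorted(range(n), key=lambda i: seq[i])
        r = [0] * n
        for pos, idx in enumerate(order):
            r[idx] = pos // size
        return r

    rx, ry = rank_of(x), rank_of(y)
    counts = {}  # Step 3: n_ij over the b^2 regions
    for i in range(n):
        key = (rx[i], ry[i])
        counts[key] = counts.get(key, 0) + 1
    # Step 4: H(X^s) = H(Y^s) = 1 by construction, so NCC = 2 - H(X^s, Y^s).
    return 2 + sum((c / n) * math.log(c / n, b) for c in counts.values())

# A perfect monotonic relationship concentrates all mass on the diagonal
# regions, so the NCC is close to 1 (up to floating-point rounding).
print(ncc(list(range(100)), [t ** 3 for t in range(100)]))
```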

3 FECHNER CORRELATION COEFFICIENT

In this section the Fechner correlation coefficient [6], which is not widely known in the literature, is reviewed.

3.1 Definition and Interpretation

The Fechner correlation coefficient is defined as

\kappa := \frac{1}{n}\sum_{i=1}^{n}\mbox{sign}(x_i-\bar{x})\,\mbox{sign}(y_i-\bar{y}), (31)

where \bar{x} and \bar{y} are the sample means of the sequences (x_1, x_2, \dots, x_n) and (y_1, y_2, \dots, y_n), respectively, and

\mbox{sign}(u) := \begin{cases} 1 & \text{if } u \geq 0 \\ -1 & \text{if } u < 0 \end{cases} (32)

is the sign function. The Fechner correlation coefficient \kappa is calculated using the following scheme:

Step 1  The sequence ((x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)) is sorted in ascending order based on x_i. Let i_0 denote the largest index i with x_i < \bar{x}, 1 \leq i \leq n.


Step 2  The sequence from Step 1 is converted to a binary sequence b = (b_1, b_2, \dots, b_m) (m \leq n) by replacing an element (x_i, y_i) by 0 if y_i < \bar{y}, and by 1 if y_i \geq \bar{y}.


Step 3  The Fechner correlation coefficient is then calculated as

\kappa = \frac{1}{n}\left(\sum_{i=1}^{i_0}(1-2b_i) + \sum_{i=i_0+1}^{n}(2b_i-1)\right). (33)

Note that \kappa = 1 if the sequence b has the form b = (0, 0, \dots, 0, 1, 1, \dots, 1), with the jump from 0 to 1 occurring at index i_0+1. On the other hand, \kappa is equal to -1 if the sequence b has the form b = (1, 1, \dots, 1, 0, 0, \dots, 0), with the jump from 1 to 0 occurring again at index i_0+1. For Figure 2, \kappa = 0.907; for Figure 1, \kappa = -0.020; and for Figure 1, \kappa = 0.580.

As in the case of the straight line associated with the standard Pearson correlation coefficient, \kappa is related to a prediction model. For |\kappa| \approx 1 it provides a classification scheme for classifying the values y given the values x:

y < \bar{y} \quad \text{if} \quad (x-\bar{x})\,\mbox{sign}(\kappa) < 0,
y = \bar{y} \quad \text{if} \quad x = \bar{x}, (34)
y > \bar{y} \quad \text{if} \quad (x-\bar{x})\,\mbox{sign}(\kappa) > 0.
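Equation (31) can be computed directly from the data; a minimal Python sketch (ours) follows:

```python
def fechner(x, y):
    # Fechner kappa, equation (31): average sign agreement about the sample means.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sign = lambda u: 1 if u >= 0 else -1  # equation (32)
    return sum(sign(a - mx) * sign(b - my) for a, b in zip(x, y)) / n

# Any increasing straight line gives kappa = 1 (equation (37) with sign a = 1);
# a symmetric parabola gives a value near 0.
print(fechner([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))   # 1.0
print(fechner([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # -0.2
```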

3.2 Properties

One of the drawbacks of the Fechner correlation coefficient is that it does not provide any insight into the shape of the data \{(x_i, y_i) : i = 1, 2, \dots, n\}. However, as demonstrated in Figure 1 and Figure 2, due to the information reduction in Steps 1 and 2, \kappa permits the detection of correlations even when accurate predictions of Y are not possible. This can be a big advantage of the Fechner correlation coefficient, at least in certain cases.

Assume that the data points (x_i, y_i) lie on a straight line of the form y = ax + b.

If a = 0, then

\kappa = \frac{1}{n}\sum_{i=1}^{n}\mbox{sign}(x_i-\bar{x}) \simeq \frac{\frac{n}{2}+(-\frac{n}{2})}{n} = 0. (35)

For a \neq 0 and i = 1, 2, \dots, n, it follows that

\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i = \frac{1}{n}\Bigl(a\sum_{i=1}^{n}x_i + nb\Bigr) = a\bar{x} + b, (36)

and consequently the Fechner correlation coefficient becomes

\kappa = \frac{1}{n}\sum_{i=1}^{n}\mbox{sign}(x_i-\bar{x})\,\mbox{sign}[a(x_i-\bar{x})] = \frac{1}{n}\sum_{i=1}^{n}\mbox{sign}(a)\,(\mbox{sign}(x_i-\bar{x}))^2 = \mbox{sign}\,a. (37)

That is, |\kappa| = 1 if the points (x_i, y_i) for i = 1, 2, \dots, n lie on a non-horizontal straight line -- a property shared with the correlation coefficients discussed earlier.

However, data can well be a sample from a strictly monotonically increasing function for which \kappa nevertheless indicates near-zero (here slightly negative) correlation; see Figure 3. In the next section, the Fechner correlation coefficient is improved to handle such cases.

4 g-CORRELATION

As discussed earlier, the Fechner correlation coefficient \kappa need not detect monotonic relationships between X and Y, as opposed to the correlation measures presented in subsections 2.2 and 2.3. It is shown below that the Fechner correlation coefficient can be improved by splitting the data points by a vertical and a horizontal line in a more sensible way, instead of arbitrarily dividing the data into 4 classes based on the lines x = \bar{x} and y = \bar{y}.

4.1 Definition

As a first step, consider the line y = \tilde{y}, where \tilde{y} is the median of Y, to divide the space of measurements into the following two classes:

C_1 := \{(x,~ y) \in \mathbb{R}^2 \mid y > \tilde{y}\} \quad \mbox{and} \quad C_2 := \{(x,~ y) \in \mathbb{R}^2 \mid y < \tilde{y}\}. (38)

Assume that the distribution function of Y is continuous and that Y is strictly monotonically increasing or decreasing with respect to X.
Case 1: The number of observations of a given dataset, n, is even.

Due to the property of \tilde{y}, we have

P(C_1) = \frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}\{y_i > \tilde{y}\} = \frac{n/2}{n} = \frac{1}{2}, \quad 1 \leq i \leq n, (39)
P(C_2) = 1 - P(C_1) = \frac{1}{2},

where

\mathbbm{1}\{y > \tilde{y}\} := \begin{cases} 1 & \text{if } y > \tilde{y} \\ 0 & \text{if } y < \tilde{y} \end{cases}. (40)


Case 2: The number of observations of a given dataset, n, is odd.

This assumption means that there is exactly one data point (x_m, y_m), m \in [1, n], such that y_m = \tilde{y}.

Assuming n is large, due to the property of \tilde{y} we have

P(C_1) = \frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}\{y_i > \tilde{y}\} = \frac{(n-1)/2}{n} \simeq \frac{1}{2}, (41)
P(C_2) = 1 - P(C_1) - \frac{1}{n} = \frac{(n-1)/2}{n} \simeq \frac{1}{2}.

Thus, each of the classes C_1 and C_2 contains about half of the observations in the dataset, leading to an optimal separation.

Instead of choosing the fixed line x = \bar{x} for segmenting the plane formed by the dataset into the 4 classes C_1^+, C_1^-, C_2^+, C_2^-, defined as

C_1^+ := \{(x,~ y) \in \mathbb{R}^2 \mid x > c,~ y > \tilde{y}\} (42)
C_1^- := \{(x,~ y) \in \mathbb{R}^2 \mid x \leq c,~ y > \tilde{y}\}
C_2^+ := \{(x,~ y) \in \mathbb{R}^2 \mid x > c,~ y < \tilde{y}\}
C_2^- := \{(x,~ y) \in \mathbb{R}^2 \mid x \leq c,~ y < \tilde{y}\},

we will use an optimally chosen line x = c:


Definition 3 Two random variables X and Y are said to be correlated if there exists c \in \mathbb{R} such that the criterion

x = c (43)

assigns realizations (x_i, y_i) of (X, Y) to class C_1^+ or C_1^- if they are classified as C_1 based on equation (38), or to class C_2^+ or C_2^- if they are classified as C_2. For a given c, the classification probability is

g(c) := \max\{P(C_1^+) + P(C_2^-),~ P(C_1^-) + P(C_2^+)\}. (44)

The supremum (least upper bound) of all such classification probabilities g(c) obtained via the criterion (43) for different c is called the g-correlation coefficient of X and Y.
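Definition 3 suggests a simple search over candidate split points c. The sketch below (ours) takes every observed x value as a candidate split, and also drops points with y_i equal to the median, as described in section 4.2:

```python
def g_correlation(x, y):
    # g-correlation: best classification probability over vertical splits x = c
    # after a horizontal median split (equations (38), (42)-(44)).
    ys = sorted(y)
    n = len(ys)
    med = ys[n // 2] if n % 2 else (ys[n // 2 - 1] + ys[n // 2]) / 2
    pts = [(a, b) for a, b in zip(x, y) if b != med]  # drop y_i = median ties
    m = len(pts)
    best = 0.5
    for c in set(a for a, _ in pts):  # candidate split points
        c1p = sum(a > c and b > med for a, b in pts) / m   # P(C1+)
        c1m = sum(a <= c and b > med for a, b in pts) / m  # P(C1-)
        c2p = sum(a > c and b < med for a, b in pts) / m   # P(C2+)
        c2m = sum(a <= c and b < med for a, b in pts) / m  # P(C2-)
        best = max(best, c1p + c2m, c1m + c2p)
    return best

# Monotonic data is separated perfectly; a symmetric parabola is not, since
# its large-y points sit at both ends of the x range.
print(g_correlation([1, 2, 3, 4], [1, 2, 3, 4]))    # 1.0
print(g_correlation([-2, -1, 1, 2], [4, 1, 1, 4]))  # 0.75
```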

4.2 Properties

Lemma 4 The range of the g-correlation coefficient of X and Y is [0.5, 1].

Proof.

For any given c,

g(c) = \max\{P(C_1^+) + P(C_2^-),~ P(C_1^-) + P(C_2^+)\}, (45)

subject to the constraints

P(C_1^+) + P(C_1^-) + P(C_2^+) + P(C_2^-) = 1 (46)
P(C_1^+) + P(C_1^-) = P(C_1) = 0.5
P(C_2^+) + P(C_2^-) = P(C_2) = 0.5
P(C_1^+) = m\,P(C_2^+)
P(C_1^-) = n\,P(C_2^-)

In addition, m, n > 0, because when c changes, the numbers of observations on the same side of the vertical line x = c increase or decrease together.

g(c) = \max\{P(C_1^+) + P(C_2^-),~ P(C_1^-) + P(C_2^+)\} (47)
= \max\{P(C_1^+) + P(C_2^-),~ 1 - (P(C_1^+) + P(C_2^-))\}
= \max\{m\,P(C_2^+) + P(C_2^-),~ 1 - (P(C_1^+) + P(C_2^-))\}
= \max\{(m-1)\,P(C_2^+) + P(C_2^+) + P(C_2^-),~ 1 - (P(C_1^+) + P(C_2^-))\}
= \max\{(m-1)\,P(C_2^+) + 0.5,~ 1 - ((m-1)\,P(C_2^+) + 0.5)\}
= \max\{(m-1)\,P(C_2^+) + 0.5,~ (1-m)\,P(C_2^+) + 0.5\}.

In addition, since P(C_2^+) is a probability, P(C_2^+) \geq 0.

When m - 1 \geq 0,

(m-1)\,P(C_2^+) + 0.5 \geq 0.5, (48)
(1-m)\,P(C_2^+) + 0.5 \leq 0.5,

and equation (47) reduces to

g(c) = (m-1)\,P(C_2^+) + 0.5 \geq 0.5. (49)

Similarly, when m - 1 < 0 and m > 0, equation (47) reduces to

g(c) = (1-m)\,P(C_2^+) + 0.5 \geq 0.5. (50)

Therefore, g(c) \geq 0.5 always holds.

According to equation (46), the minimum g(c) = 0.5 is attained when, for example,

P(C_1^+) = 0

and

P(C_1^-) = P(C_2^-) = 0.5.

Moreover, from equation (46) we get

0 \leq P(C_1^+) + P(C_2^-) \leq 1, (51)

thus the maximum of equation (47) equals 1 when P(C_1^+) + P(C_2^-) = 1 and P(C_1^-) + P(C_2^+) = 0, or P(C_1^+) + P(C_2^-) = 0 and P(C_1^-) + P(C_2^+) = 1.

The range of g(c) is therefore [0.5, 1], and thus the g-correlation coefficient of X and Y ranges from 0.5 to 1.

When the g-correlation is 0.5, X and Y are not correlated; when the g-correlation is 1, X and Y are perfectly correlated. ∎

When we define gg-correlation in section 4.1, we assume that the distribution function of YY is continuous and strictly monotonically increasing or decreasing with respect to XX. But, if we loose the assumption, gg-correlation still works. However, we need to remove all the data points (xi,yi)(x_{i},y_{i}), who share the same trait: yi=y~y_{i}=\tilde{y}, after finding the y~\tilde{y} with original dataset. Then use the new modified dataset to calculate gg-correlation.

Note that if the modified dataset has 0 data points, meaning $Y$ is constant, we need not calculate the $g$-correlation, since $X$ and $Y$ are certainly uncorrelated by Definition 1 in Section 2. Similarly, if $X$ is constant and $Y$ varies, $X$ and $Y$ are uncorrelated as well.

The $g$-correlation coefficient $\omega$ is, in general, not symmetric, as shown in Figure 3; in that respect $\omega$ differs from the other correlation coefficients described earlier. For the $g$-correlation of $X$ and $Y$ in Figure 3, the set

$$\{(x_0,\,y_0): x_0>c,\ y_0>\tilde{y}\}, \qquad(52)$$

where $(x_0,\,y_0)$ are realizations of the random vector $(X,\,Y)$, contains 50% of all measurements on average, and the set

$$\{(x_0,\,y_0): x_0\leq c,\ y_0<\tilde{y}\} \qquad(53)$$

contains 25% of all measurements on average. Note that $c$ is optimal because moving the line $x=c$ to the left would only decrease the probability

$$\mathrm{P}(X\leq c,\ Y<\tilde{y}) \qquad(54)$$

and moving the line $x=c$ to the right would only decrease the probability

$$\mathrm{P}(X>c,\ Y>\tilde{y}). \qquad(55)$$

The following lemma establishes the main distinction between ω\omega and the Fechner correlation coefficient:


Lemma 6  Assume $Y=f(X)$ for a strictly monotonic continuous function $f(\cdot)$, and let $\tilde{y}$ be the median of $Y$. Then the $g$-correlation between $X$ and $Y$, obtained by setting $c=f^{-1}(\tilde{y})$, equals 1.


Proof.

Without loss of generality, let $f(\cdot)$ be a strictly monotonically increasing function. For $\epsilon>0$, define $\alpha_1$ and $\alpha_2$ by

$$\alpha_1:=\sup\{x\in\mathbb{R}: f(x)<\tilde{y}\}+\epsilon \quad\text{and}\quad \alpha_2:=\inf\{x\in\mathbb{R}: f(x)>\tilde{y}\}-\epsilon. \qquad(56)$$

Since $\sup\{x\in\mathbb{R}: f(x)<\tilde{y}\}$ is the largest value of $x$ such that $f(x)<\tilde{y}$,

$$f(x)\geq\tilde{y}\ \text{when}\ x\geq\alpha_1 \quad\text{and}\quad f(x)<\tilde{y}\ \text{when}\ x<\alpha_1. \qquad(57)$$

Similarly, $\inf\{x\in\mathbb{R}: f(x)>\tilde{y}\}$ is the smallest value of $x$ such that $f(x)>\tilde{y}$, so

$$f(x)>\tilde{y}\ \text{when}\ x>\alpha_2 \quad\text{and}\quad f(x)\leq\tilde{y}\ \text{when}\ x\leq\alpha_2. \qquad(58)$$

In addition, if $\alpha_1>\alpha_2$, then for every $\xi\in[\alpha_2,\,\alpha_1]$, equations (57) and (58) give both $f(\xi)>\tilde{y}$ and $f(\xi)<\tilde{y}$, which is a contradiction.

Hence $\alpha_1\leq\alpha_2$, and for every $\xi$ such that $\alpha_1\leq\xi\leq\alpha_2$ we have $\tilde{y}\leq f(\xi)\leq\tilde{y}$, that is, $f(\xi)=\tilde{y}$.

Moreover, since $Y=f(X)$ for a strictly monotonic function $f(\cdot)$, $x$ and $f(x)$ are in one-to-one correspondence, and there is a unique $x$ attaining the median $\tilde{y}$. Thus,

$$x=\alpha_1=\alpha_2 \quad\text{and}\quad f(\alpha_1)=f(\alpha_2)=\tilde{y}. \qquad(59)$$

We now show that the $g$-correlation between $X$ and $Y$ is 1 when we set $c=f^{-1}(\tilde{y})$, which is also $\alpha_1$.

Let us split the dataset into four classes based on equation (42) using the two lines $x=\alpha_1$ and $y=\tilde{y}$. From equations (59), (57), and (58), no point belongs to class $C_1^-$ or class $C_2^+$; every point belongs either to class $C_1^+$ or to class $C_2^-$. Thus, according to Definition 3,

$$g(c)=\max\{\mathrm{P}(C_1^+)+\mathrm{P}(C_2^-),\ \mathrm{P}(C_1^-)+\mathrm{P}(C_2^+)\}=\max\{1,\,0\}=1. \qquad(60)\ ∎$$
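Lemma 6 can also be checked numerically. The sketch below is ours (the sample grid and the choice $f=\exp$ are example assumptions, not from the paper): it verifies that splitting the data with $x=c=f^{-1}(\tilde{y})$ and $y=\tilde{y}$ places every point in $C_1^+$ or $C_2^-$.

```python
import numpy as np

# Numeric sanity check of Lemma 6 (a sketch; grid and f are example choices).
x = np.linspace(-3.0, 3.0, 100)
y = np.exp(x)                         # a strictly monotonically increasing f
y_med = np.median(y)
c = np.interp(y_med, y, x)            # invert f numerically: c ~= f^{-1}(y_med)

in_c1_plus = (x <= c) & (y < y_med)   # left of c and below the median
in_c2_minus = (x > c) & (y > y_med)   # right of c and above the median
g = (in_c1_plus.sum() + in_c2_minus.sum()) / len(x)
print(g)  # 1.0: every point is classified, as the lemma states
```

Every one of the 100 points falls into one of the two favorable classes, so the classification fraction, and hence the $g$-correlation at this $c$, is exactly 1.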

All correlation coefficients described in this paper are invariant to linear transformations of the form $w=av+d$ ($a>0$, $d\in\mathbb{R}$). For the Pearson correlation coefficient the proof is given in [10]; for the other correlation measures the proofs are straightforward and hence omitted.


4.3 Estimation of gg-Correlation

For a given set $\{(x_i,\,y_i): i=1,\,2,\,\dots,\,n\}$ of measurements, the $g$-correlation coefficient $\omega$ can only be estimated, as described next. Consider dividing the dataset into two subsets: a training set $T$ of size $q$ and an evaluation set $E$ of size $n-q$. First, estimates of the separating lines $y=\tilde{y}$ and $x=c$, with an appropriate value of $c$, are found from the training set $T$. For the median $\tilde{y}$ of $Y$, the sample median

$$\tilde{y}:=\begin{cases}y'_{(q+1)/2}&\text{for }q\text{ odd},\\ \tfrac{1}{2}\bigl(y'_{q/2}+y'_{q/2+1}\bigr)&\text{for }q\text{ even},\end{cases} \qquad(61)$$

where $(y'_1,\,y'_2,\,\dots,\,y'_q)$ denotes the sequence $(y_1,\,y_2,\,\dots,\,y_q)$ of the $y$-values of $T$ sorted in ascending order, is used.

The following algorithm computes a $c$ that gives an optimal classification of the training set $T$ of measurements with respect to the classes $C_1^+,\,C_1^-,\,C_2^+,\,C_2^-$ defined in (42); see [8] for an alternative method of finding a reasonably good value for $c$.


Step 1  Sort all pairs in the sequence $s=\bigl((x_1,\,y_1),\,(x_2,\,y_2),\,\dots,\,(x_q,\,y_q)\bigr)$, $q\geq 1$, in ascending order of their $x$-values.


Step 2  Consider the arithmetic means of the $x$-values of all successive pairs in $s$ as candidates for $c$. Start with the smallest candidate and proceed successively to the largest.


Step 3  For the first candidate for $c$, count the number $p_1$ of pairs $(x_i,\,y_i)$ in $s$ with $x_i\leq c$ and $y_i<\tilde{y}$, along with the number $p_2$ of pairs with $x_i>c$ and $y_i>\tilde{y}$. For every subsequent candidate, update $p_1$ and $p_2$ according to whether the pairs passed since the previous candidate belong to $C_1^+$ or $C_2^-$.


Step 4  Store the maximum classification percentage $\max\{p_1+p_2,\ q-p_1-p_2\}/q$ achieved so far, along with the corresponding candidate for $c$, and return to Step 2 for the next candidate.


Finally, $\omega$ is approximated from the calculated values $\tilde{y}$ and $c$ by applying Definition 3 to the evaluation set $E$.
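The four steps above can be sketched in NumPy as follows. This is a minimal illustration (the name `g_correlation` is ours): for simplicity it computes the coefficient on the full sample rather than with the training/evaluation split, and it recounts the classes for every candidate instead of using Step 3's incremental update (which would bring the cost down from $O(n^2)$ to $O(n\log n)$).

```python
import numpy as np

def g_correlation(x, y):
    """Sketch of the g-correlation estimate (Steps 1-4 of Section 4.3),
    computed on the full dataset for simplicity."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_med = np.median(y)
    keep = y != y_med                  # drop ties with the median (Section 4.2)
    x, y = x[keep], y[keep]
    if len(x) == 0:                    # Y is constant: uncorrelated by definition
        return 0.5
    order = np.argsort(x)              # Step 1: sort pairs by their x-values
    x, y = x[order], y[order]
    n, best = len(x), 0.5
    # Step 2: candidates for c are midpoints of successive x-values
    for c in (x[:-1] + x[1:]) / 2.0:
        # Step 3: count pairs in C1+ (x <= c, y < median) and C2- (x > c, y > median)
        p1 = int(np.sum((x <= c) & (y < y_med)))
        p2 = int(np.sum((x > c) & (y > y_med)))
        # Step 4: keep the best classification percentage over all candidates
        best = max(best, max(p1 + p2, n - p1 - p2) / n)
    return best
```

On a strictly monotone sample such as $y=x^3$ the function returns 1, matching Lemma 6; on a constant $X$ or $Y$ it returns 0.5, matching the conventions of Section 4.2.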

5 MULTIDIMENSIONAL gg-CORRELATION

The multidimensional correlation problem consists in determining whether there exists a correlation between a random vector $(X_1,\,X_2,\,\dots,\,X_M)$ of independent variables and a single dependent random variable $Y$. Of all the correlation coefficients described in this article, only the Pearson correlation coefficient [12] and the $g$-correlation coefficient $\omega$ can be generalized to the multidimensional situation.

When generalizing the $g$-correlation coefficient to $M$ independent variables, the line $y=\tilde{y}$ becomes a hyperplane, and the classes $C_1$ and $C_2$ become halfspaces. To separate the orthogonal projections $(x_1^i,\,x_2^i,\,\dots,\,x_M^i)$ of a set $\{(x_1^i,\,x_2^i,\,\dots,\,x_M^i,\,y^i): i=1,\,2,\,\dots,\,n\}$ of measurements onto the $M$-dimensional space of the independent variables, one cannot use a line such as $x=c$ as in equation (43). Instead, a hyperplane (a plane for $M=2$ and a straight line for $M=1$) is sought that separates the orthogonal projections $(x_1^i,\,x_2^i,\,\dots,\,x_M^i)$, $i=1,\,2,\,\dots,\,n$, with respect to the classes $C_1$ and $C_2$ to which the corresponding measurements belong. See [8] for further details on the multidimensional $g$-correlation and its practical application using Fisher linear discriminant functions. The multidimensional $g$-correlation coefficient is also directly related to a prediction model that allows inference of $Y$ from realizations of $X_1,\,X_2,\,\dots,\,X_M$.
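One way to sketch this construction (not necessarily the exact procedure of [8]; the helper names `fisher_direction` and `g_correlation_md` are ours) is to label each measurement by the side of the hyperplane $y=\tilde{y}$ it falls on, compute a Fisher linear discriminant direction for the projected independent variables, and then scan a one-dimensional threshold along that direction exactly as in the one-dimensional algorithm:

```python
import numpy as np

def fisher_direction(X, labels):
    """Fisher linear discriminant direction, proportional to Sw^{-1} (m1 - m0)."""
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix, lightly regularized for invertibility.
    Sw = (np.atleast_2d(np.cov(X0, rowvar=False, bias=True)) * len(X0)
          + np.atleast_2d(np.cov(X1, rowvar=False, bias=True)) * len(X1))
    return np.linalg.solve(Sw + 1e-9 * np.eye(X.shape[1]), m1 - m0)

def g_correlation_md(X, y):
    """Multidimensional g-correlation sketch: label points by the side of the
    hyperplane y = median(y), project onto the Fisher direction, then scan a
    1-D threshold as in the one-dimensional algorithm."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    y_med = np.median(y)
    keep = y != y_med                      # drop ties with the median
    X, labels = X[keep], (y[keep] > y_med).astype(int)
    z = X @ fisher_direction(X, labels)    # scalar projection of each point
    order = np.argsort(z)
    z, labels = z[order], labels[order]
    n, best = len(z), 0.5
    for c in (z[:-1] + z[1:]) / 2.0:       # midpoints of successive projections
        p1 = int(np.sum((z <= c) & (labels == 0)))
        p2 = int(np.sum((z > c) & (labels == 1)))
        best = max(best, max(p1 + p2, n - p1 - p2) / n)
    return best
```

For example, with random points in the unit square and $y=x_1+x_2$, the Fisher direction approximately recovers the separating direction and the estimate comes out close to 1.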

6 CORRELATION COEFFICIENTS COMPARISON

6.1 Comparison on linearly correlated datasets

So far, we have introduced 5 correlation coefficients from the literature and a new nonlinear non-parametric correlation measure: the $g$-correlation ($\omega$). We ran a comparison on 12 different 2-D simulated datasets with distinct features to assess the robustness of $\omega$.

The comparison is visualized in Fig. 4. The $x$-axes of the plots (from top to bottom) represent the Pearson correlation coefficient ($r$), Spearman's rank correlation coefficient ($\rho$), Kendall's Tau correlation coefficient ($\tau$), the Fechner correlation coefficient ($\kappa$), and the Nonlinear correlation coefficient ($NCC$), respectively. The $y$-axis of every plot is the $g$-correlation.

The top four plots share the same bowl shape: when the Pearson, Spearman's rank, Kendall's Tau, and Fechner correlation coefficients lie in $[-1,\,0]$, the $g$-correlation decreases as these four coefficients approach 0. This shows that the $g$-correlation is robust and correct, because as these coefficients approach 0 the relationship in the dataset transitions from negatively correlated to uncorrelated; accordingly, the $g$-correlation moves from its maximum, 1, to its minimum, 0.5.

Similarly, when the Pearson, Spearman's rank, Kendall's Tau, and Fechner correlation coefficients lie in $[0,\,1]$, the $g$-correlation increases as these four coefficients approach 1. Again, this supports the $g$-correlation's robustness: as these coefficients approach 1, the two variables in the dataset are increasingly positively correlated, so the $g$-correlation also approaches 1.

The last plot compares the Nonlinear correlation coefficient with the $g$-correlation. As mentioned in Section 2.4, the range of $NCC$ is $[0,\,1]$. The plot shows that $NCC$ and the $g$-correlation increase monotonically together. This also indicates that the $g$-correlation is correct, since a dataset with $NCC$ close to 1 should be nearly perfectly correlated, and its $g$-correlation should likewise be close to 1.

In summary, the $g$-correlation is robust, based on the results of the experiments with 12 datasets and the comparison with 5 existing correlation coefficient measurements, which range from linear to nonlinear and from parametric to non-parametric.

6.2 Comparison on nonlinearly correlated datasets

In Section 6.1, we saw that the $g$-correlation is consistent with all 5 existing correlation coefficients. In this section, we present two examples in which the $g$-correlation outperforms one or more of the other correlations in capturing the nonlinear relationship between variables $X$ and $Y$.

In Figure 5, we can see that variables $X$ and $Y$ are nonlinearly correlated; viewed as a time series, they are also auto-correlated, meaning the pattern of the correlation repeats over certain intervals. When we examine their correlation with the Pearson correlation coefficient, the result is $-0.058$, which contradicts our observation. Similarly, Spearman's rank correlation coefficient is $-0.061$, Kendall's Tau correlation coefficient is $-0.042$, and the Fechner correlation coefficient is 0; all of these give the inaccurate result that $X$ and $Y$ in Figure 5 are not correlated.

The $g$-correlation, however, shows that these two variables are definitely correlated: its coefficient is 0.71. By Lemma 4, the further the $g$-correlation coefficient is from 0.5, the more strongly nonlinearly correlated $X$ and $Y$ are. Figure 5 demonstrates the $g$-correlation: following the procedure in Section 4.3, we obtain it by splitting the dataset with $y=\tilde{y}$ and $x=c=2.85$.
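This failure of the classical coefficients on repeating patterns is easy to reproduce. The sketch below uses a synthetic periodic dataset of our own (an assumed stand-in, not the dataset of Figure 5): $y$ is a deterministic function of $x$, yet both the linear and the rank-based coefficients nearly vanish once several full periods are averaged over.

```python
import numpy as np

# Synthetic periodic data: y depends on x deterministically, but the
# dependence repeats, so linear and rank correlations wash out.
x = np.linspace(0.0, 16.0 * np.pi, 800)        # eight full periods
y = np.sin(x)

r = np.corrcoef(x, y)[0, 1]                    # Pearson r

def ranks(v):
    """Ranks of the entries of v (no ties occur in this example)."""
    out = np.empty(len(v))
    out[np.argsort(v)] = np.arange(len(v))
    return out

rho = np.corrcoef(ranks(x), ranks(y))[0, 1]    # Spearman's rho via ranks
print(round(r, 3), round(rho, 3))              # both are close to 0
```

Both magnitudes stay well below 0.2 here, even though $y$ is completely determined by $x$; this is exactly the situation in which the $g$-correlation remains informative.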

In Figure 6, we can see that variables $X$ and $Y$ have some nonlinear correlation, since the range of possible $y$-values varies as $x$ changes. Roughly speaking, the possible $Y$-values for $x<c$ are smaller than those for $x\geq c$. By Definition 1 in Section 2, such a dataset is correlated, and the $g$-correlation coefficient for this dataset is 1. Figure 6 demonstrates the $g$-correlation with the optimal $c=-12.366$.

However, $NCC$ here is 0.363, which means that $NCC$ did not detect the complete pattern.

In summary, the $g$-correlation captures nonlinear and complex relationships between variables better than the 5 correlation coefficients from the literature.

7 CORRELATION COEFFICIENTS AND SURFACE ROUGHNESS ASSESSMENT

Surface roughness is an important quality indicator for products machined by turning, milling, or grinding processes. An implementation of adaptive control schemes requires in-process assessment of surface roughness. Owing to the limitations of stylus profilometers, optical techniques, etc., surface roughness is generally measured using the following three parameters: arithmetic mean roughness ($R_a$), maximum peak-to-valley roughness ($R_{max}$), and mean roughness depth ($R_z$). We use all the correlation coefficients presented in this paper to determine the correlation between the average levels of the three surface roughness parameters, which act as the dependent variables, and the cutting speed and cutting feed, together with the average values of the statistics RMS, absolute energy, and ringdown counts of acoustic emission signals, which act as the independent variables.

Data from 50 experiments with 25 different operating conditions (varying speed and feed rates) were collected and processed. To compute the $g$-correlation coefficient, the 50 records were randomly divided 10,000 times into a training set $T$ of 30 records and an evaluation set $E$ of 20 records. The arithmetic mean of the $g$-correlations over the respective evaluation sets is taken as $\omega$.

Table 1 presents the correlation coefficients for the above datasets. For identical measurements, we used the average ranks of the equal values when computing Spearman's rank correlation coefficient $\rho$ [10]. Figure 7 shows a graphical representation of the results. Each color represents one correlation coefficient measure. Within each line, each marker is the absolute value of the correlation coefficient between one independent variable and one dependent variable. From left to right, the markers represent the correlation coefficients between cutting speed and $R_a$, $R_{max}$, and $R_z$; cutting feed and $R_a$, $R_{max}$, and $R_z$; the acoustic emission statistic RMS and $R_a$, $R_{max}$, and $R_z$; absolute energy and $R_a$, $R_{max}$, and $R_z$; and ringdown counts and $R_a$, $R_{max}$, and $R_z$.

From Figure 7 it is seen that $\omega$ has the same pattern as $|r|$, $|\rho|$, $|\tau|$, $|\kappa|$, and $NCC$. This result is consistent with the result obtained in Section 6.1 with simulated datasets. In Section 6.1 we used the full dataset to calculate the $g$-correlation, and the results showed that $\omega$ is consistent with $r$, $\rho$, $\tau$, $\kappa$, and $NCC$. Here, we use a real-world dataset to calculate the $g$-correlation by estimating the parameters $\tilde{y}$ and $c$ on the training set and validating them on the test set, and the result again shows that $\omega$ is consistent with $r$, $\rho$, $\tau$, $\kappa$, and $NCC$. This further indicates that the $g$-correlation is robust.

In Section 6.2, however, we showed that the $g$-correlation outperforms the others when there is a complicated nonlinear relationship between the independent and dependent variables. In Figure 7, the correlation coefficients between absolute energy and $R_a$, $R_{max}$, and $R_z$, as well as between the ringdown counts of the emission signals and $R_a$, $R_{max}$, and $R_z$, are close to 0 under all the correlation coefficient measurements except $NCC$ and $\omega$. This could be a case in which a hidden nonlinear relationship is captured by $NCC$ and $\omega$.

From the standpoint of surface roughness prediction in finish turning, the results imply that cutting feed is strongly correlated, and cutting speed and the RMS of the acoustic emission signals are moderately correlated, with the three roughness parameters. Absolute energy and ringdown counts have a nonlinear correlation with surface roughness.

8 CONCLUSIONS

Several correlation coefficients have been examined in this paper with regard to linearly and nonlinearly correlated datasets. We showed that when dealing with linearly correlated variables, the $g$-correlation coefficient $\omega$ is consistent with Pearson's $r$, Spearman's $\rho$, Kendall's $\tau$, the Nonlinear correlation coefficient $NCC$, and Fechner's $\kappa$. When examining more complicated nonlinear relationships, $\omega$ outperforms all 5 other measurements.

We also examined these correlation coefficients on a problem of surface roughness assessment in finish turning. It was possible to verify earlier results on surface roughness prediction, such as the usefulness of cutting feed, across the whole spectrum of correlation coefficients. In addition, the $g$-correlation is consistent with the other correlation measurement methods, and it can also detect complex nonlinear relationships that most of the other methods cannot.

In addition, properties of the $g$-correlation coefficient $\omega$ have been proven, and an algorithm for computing $\omega$ has been provided.

Moreover, applying $\omega$ requires no assumptions, which makes it a universal correlation measurement method for capturing either linear or nonlinear relationships. Together with the fact that it works beyond functional relationships between the data (no parameters need to be estimated), this allows the $g$-correlation coefficient $\omega$ to be applied in a wide range of areas.

Appendix A APPENDIX

Symbol Description
$X, Y$ Random variables
$r$ Pearson correlation coefficient
$\tau$ Kendall's Tau correlation coefficient
$\rho$ Spearman's rank correlation coefficient
$NCC$ Nonlinear correlation coefficient
$\kappa$ Fechner correlation coefficient
$\omega$ $g$-correlation coefficient
$|x|$ Absolute value of $x$
$\bar{x}$ Mean value of variable $X$
$\tilde{x}$ Median value of variable $X$
$\simeq$ Approximately equal to
$\epsilon$ An arbitrarily small real number
$\sup$ Supremum (least upper bound)
$\inf$ Infimum (greatest lower bound)
$:=$ Is defined to be equal to
$F_X, F_Y$ Cumulative distribution functions
$P\{X<b\}$ Probability that $X$ is strictly less than $b$
$P\{X|Y\}$ Probability of the event $X$ conditional on the event $Y$
$\sum_{i=1}^{n}a_i$ $a_1+a_2+a_3+\dots+a_n$
$T$ Training set
$E$ Evaluation set
$c, \alpha_1, \alpha_2, \xi$ Real numbers
$f(\cdot)$ A function written without naming its argument
$f^{-1}(X)$ The inverse of the function $f(X)$

References

  • [1] Frances Drake and Yacob Mulugetta. Assessment of solar and wind energy resources in ethiopia. i. solar energy. Solar energy, 57(3):205–217, 1996.
  • [2] Michael C. Fleming et al. Principles of Applied Statistics. Routledge, London, 1994.
  • [3] Linton C Freeman. Elementary applied statistics: for students in behavioral science. John Wiley & Sons, 1965.
  • [4] RV Hogg and EA Tanis. Sampling distribution theory. Probability and Statistical Inference. Prentice-Hall, Upper Saddle River, NJ, pages 262–3, 1997.
  • [5] Myles Hollander and Douglas A Wolfe. Nonparametric statistical methods. 1999.
  • [6] Andrey Latyshev and Petr Koldanov. Investigation of connections between pearson and fechner correlations in market network: Experimental study, Jan 1970.
  • [7] PE Pfeiffer. Probability for applications. 1990.
  • [8] Stefan Pittner, Sagar V Kamarthi, Piroj Wongsiripatanakul, and Naken Wongvasu. Correlation between acoustic emission statistics and surface roughness in finish turning. Citeseerx Google Scholar, 2000.
  • [9] William D Richards. The Zen of empirical research. Hampton Press, 1998.
  • [10] V. K. Rohatgi. An Introduction to Probability Theory and Mathematical Statistics. John Wiley & Sons, 1976.
  • [11] Zhiyuan Shen, Qiang Wang, and Yi Shen. A new non-linear correlation measure. In 2009 IEEE Youth Conference on Information, Computing and Telecommunication, pages 11–14, 2009.
  • [12] George O Wesolowsky. Multiple regression and analysis of variance: An introduction for computer users in management and economics. John Wiley & Sons, 1976.
Figure 1: (a) Uncorrelated measurements, (b) curvilinear correlation, (c) a correlation of unknown and coarse shape, which seems to allow only a coarse estimate of the dependent variable $Y$.
Figure 2: Demonstration of the Fechner correlation coefficient ($\kappa=0.907$). The data points are separated into four areas by the vertical line $x=\bar{x}$ and the horizontal line $y=\bar{y}$.
Figure 3: Data that lie on a strictly monotonically increasing function but are considered uncorrelated by the Fechner correlation coefficient, with $\kappa=0.016$.
Figure 4: Comparison of the five correlation coefficients from the literature with the $g$-correlation on 12 different 2-D datasets that are linearly correlated to different extents.
Figure 5: (a) Nonlinearly correlated random variables $X, Y$ with repeating patterns. (b) Demonstration of the $g$-correlation coefficient; $g$-correlation = 0.71.
Figure 6: (a) A nonlinear correlation example that is not successfully detected by $NCC$ ($NCC=0.363$). (b) Demonstration of the $g$-correlation coefficient; $g$-correlation = 1.
Figure 7: The absolute values of the correlation coefficients between the independent variables and the surface roughness parameters.

Table 1

Comparison of the correlation coefficients between each independent variable (cutting speed, cutting feed, or one of the acoustic emission statistics RMS, absolute energy, and ringdown counts) and each dependent variable (one of the surface roughness parameters $R_a$, $R_{max}$, and $R_z$).

$R_a$:

           r       ρ       τ       κ      NCC     ω
Speed   -0.118  -0.205  -0.180  -0.000  0.385  0.699
Feed     0.723   0.713   0.575   0.640  0.656  0.807
RMS     -0.580  -0.563  -0.411  -0.280  0.414  0.760
Energy  -0.129  -0.014  -0.011  -0.120  0.349  0.655
Counts   0.015   0.045   0.030   0.120  0.366  0.676

$R_{max}$:

           r       ρ       τ       κ      NCC     ω
Speed   -0.475  -0.512  -0.389  -0.240  0.438  0.741
Feed     0.557   0.557   0.468   0.560  0.593  0.797
RMS     -0.447  -0.374  -0.278  -0.200  0.325  0.723
Energy  -0.151  -0.042  -0.025  -0.040  0.390  0.661
Counts   0.108   0.123   0.068   0.120  0.390  0.673

$R_z$:

           r       ρ       τ       κ      NCC     ω
Speed   -0.365  -0.408  -0.311  -0.160  0.397  0.716
Feed     0.656   0.634   0.524   0.640  0.632  0.811
RMS     -0.515  -0.474  -0.345  -0.280  0.373  0.740
Energy  -0.124  -0.048  -0.020  -0.040  0.373  0.662
Counts   0.085   0.125   0.069   0.120  0.467  0.676