Nonparametric tests for interaction in two-way ANOVA with balanced replications

Bao Khue Tran
Department of Mathematics and Statistics
Kenyon College
Gambier, OH
Amy S. Wagaman
Department of Mathematics and Statistics
Amherst College
Amherst, MA
Andrew Nguyen
Department of Statistics and Operations Research
University of North Carolina at Chapel Hill
Chapel Hill, NC
David Jacobson
Department of Mathematics and Statistics
Amherst College
Amherst, MA
Bradley Hartlaub
Department of Mathematics and Statistics
Kenyon College
Gambier, OH
(October 6, 2024)
Abstract

Nonparametric procedures are often more powerful for detecting interaction in two-way ANOVA when the data are non-normal. In this paper, we compute null critical values for the aligned rank-based tests (APCSSA/APCSSM) where the number of levels of each factor is between 2 and 6. We compare the performance of these new procedures with the ANOVA F-test for interaction, the aligned rank transform test (ART), Conover’s rank transform procedure (RT), and a rank-based ANOVA test (raov) using Monte Carlo simulations. The new procedures APCSSA/APCSSM are comparable with existing competitors in all settings. Even though there is no single dominant test for detecting interaction effects in non-normal data, the nonparametric procedure APCSSM is the most highly recommended procedure for settings with Cauchy errors.

Keywords Aligned rank-based tests · Hypothesis testing · Non-normal errors · Nonparametric methods

1 Introduction

Factorial designs allow researchers to explore main effects and interactions. In particular, detecting interaction is crucial to data analysis because the presence of interaction can influence whether a factor appears significant. In a parametric setting, the usual test for interaction is an F-test (Montgomery, 2020). However, for data collected in various fields and industries, errors often do not follow the normal distribution. The use of tests with known issues provides continued motivation for developing appropriate nonparametric tests for interaction in the two-way layout with multiple replications per cell. We consider the model for the general two-way layout and some different types of interaction. We focus on a balanced design with an equal number of replications per cell.

In the case of a two-way layout with balanced replications per cell, the general model is

$$Y_{ijk}=\alpha_{i}+\beta_{j}+\gamma_{ij}+\varepsilon_{ijk},\quad i=1,\dots,I,\; j=1,\dots,J,\;\mbox{and}\; k=1,\dots,K, \tag{1}$$

where two factors $U$ and $V$, having $I$ and $J$ levels respectively, are being investigated and $K$ is the number of replications per cell. The $Y_{ijk}$’s are $IJK$ observations which are mutually independent random variables. The error terms, denoted by the $\varepsilon_{ijk}$’s, are assumed to have a common median $\theta$. For notation, $\alpha_{i}$ is the effect of the $i^{\text{th}}$ level of factor $U$, $\beta_{j}$ is the effect of the $j^{\text{th}}$ level of factor $V$, and $\gamma_{ij}$ is the effect of the interaction between the $i^{\text{th}}$ level of factor $U$ and the $j^{\text{th}}$ level of factor $V$.
The hypotheses for the typical test of interaction, with no restrictions on $\alpha_{i}$ and $\beta_{j}$, are

$$\begin{aligned}
H_{0} &: \gamma_{ij}=0,\; i=1,\dots,I \mbox{ and } j=1,\dots,J;\\
H_{A} &: \gamma_{ij}\mbox{'s not all zero.}
\end{aligned}$$

Stating that not all $\gamma_{ij}$ values are zero is only one way of stating that interaction is present. Multiple definitions of interaction have been developed, and some are only applicable to the location-family model, while others are more general.

In this paper, we propose new aligned rank-based tests (APCSSA/APCSSM) for interaction in the general two-way layout with balanced replications per cell. We start with a review of existing test procedures for interaction. Then, we set forth our procedures. Through Monte Carlo simulation power studies with seven competing tests for interaction, we demonstrate the consistent performance of APCSSA/APCSSM in various settings. We conclude with a discussion of our findings and thoughts on future work in this area.

2 Review of test procedures for interaction

Researchers have developed a variety of procedures to test for interaction in the general two-way layout. The classical test procedure for determining the presence of interaction (as defined by non-zero $\gamma_{ij}$’s) in a parametric setting is the standard ANOVA F-test (Montgomery, 2020). When the error terms in (1) are normally distributed, the two-way analysis of variance sum of squares identity partitions the variability and leads to an F-test for our null hypothesis of no interaction. However, the F-test assumptions of normality and common variance may not be met in all situations. When the assumptions necessary for performing an F-test are not met, a nonparametric test should be used.

Multiple nonparametric tests for interaction exist in the literature. Conover and Iman (1976) suggest using the rank transform approach (RT) in factorial settings. RT replaces the observations with their joint ranks (using average ranks to correct for ties) and then applies the parametric F-test to the ranks. Another nonparametric test for interaction based on the rank transform approach is the aligned rank transform (ART). With the ART procedure, an alignment is performed in the rows and columns before the joint ranking of the response variables. Wobbrock, Findlater, Gergle, and Higgins (2011) align the data before performing an RT-like analysis to study individual effects. Alignment is performed by computing the row and column averages and subtracting each from all corresponding entries in the individual rows and columns. After obtaining the joint ranks on the aligned data, the F-statistic is calculated on the joint ranks.
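
The ranking and alignment steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ARTool implementation: the function names are hypothetical, the data are assumed to be stored as nested lists `y[i][j][k]`, and the final F-test on the ranks is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = midrank
        start = end + 1
    return ranks


def rank_transform(y):
    """RT step: replace every observation y[i][j][k] by its joint rank."""
    flat = [obs for row in y for cell in row for obs in cell]
    ranks = iter(average_ranks(flat))
    return [[[next(ranks) for _ in cell] for cell in row] for row in y]


def align_rows_and_columns(y):
    """Alignment step: subtract the row average and the column average."""
    n_rows, n_cols = len(y), len(y[0])
    row_avg = [mean(obs for cell in y[i] for obs in cell) for i in range(n_rows)]
    col_avg = [mean(obs for i in range(n_rows) for obs in y[i][j])
               for j in range(n_cols)]
    return [[[obs - row_avg[i] - col_avg[j] for obs in y[i][j]]
             for j in range(n_cols)] for i in range(n_rows)]
```

Under these assumptions, `rank_transform(y)` gives the ranks used by RT, while `rank_transform(align_rows_and_columns(y))` gives joint ranks of the aligned data in the spirit of ART.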

The aligned rank transform has been found to perform better than the rank transform, but it still has problems with elevated Type I error rates and with nonnormal error terms (Richter, 1999; Luepsen, 2017). Mansouri and Chang (1995) performed a comparative analysis between the F-test, rank transform, and two versions of the aligned rank transform. They conclude that the aligned rank tests are preferred over the other test procedures. They also note that all the test procedures they considered performed poorly in the Monte Carlo power study with errors drawn from a Cauchy distribution. In our power studies, we use the art procedure from the ARTool package in R (Kay et al., 2021).

Salazar-Alvarez, Tercero-Gómez, Cordero-Franco, and Conover (2014) conduct a literature review and conclude that most of the techniques were based on RT. In order to provide a more complete comparison of nonparametric tests for interaction, we will consider methods that do not stem from Conover and Iman’s RT.

A different testing approach was proposed by De Kroon and Van Der Laan (1981), based on their definition of rank interaction. Their test statistic was developed to detect the presence of rank interaction in the two-way layout with multiple replications per cell. As noted by De Kroon and Van Der Laan (1981) and Hartlaub et al. (1999), the proposed statistic only works well in detecting a rank interaction of type $U^{*}(V)$ if no main effect for $U$ is present (and vice versa for detecting type $V^{*}(U)$, which works well only when no main effect for $V$ is present).

Another way to detect interaction in factorial designs is through the lens of linear regression, specifically rank-based linear models. Kloke and McKean (2012) proposed a rank-based analysis for all three hypotheses, covering both main effects and the interaction, based on a reduction of dispersion from the reduced to the full model. For the computations in this paper, we have utilized the raov function in the Rfit package to compute their test statistic for interaction (Kloke and McKean, 2012).

Lastly, Hartlaub, Dean, and Wolfe (1999) developed procedures to test for interaction in the two-way layout with one observation per cell. An invariance problem with their statistics was solved by Lehmann, Wolfe, Dean, and Hartlaub (Lehmann et al., 2001), who proposed symmetrized procedures. The symmetrized procedures, S-SA (symmetrized statistics aligned by averages) and S-SM (symmetrized statistics aligned by medians), are based on the statistics CRA and RCA, and CRM and RCM, respectively. In short, these procedures align within the rows or columns using averages or medians, and then rank within the other dimension. Reversing the aligning and ranking to create two statistics is similar to checking for both row and column concordance. The ranks are then combined in a cross-comparison framework to form appropriate statistics to test for interaction. A challenge for this procedure is that null means and variances for the statistics must be computed before the significance of the test statistic may be determined.

3 Nonparametric test for interaction proposal

Salazar-Alvarez, Tercero-Gómez, Cordero-Franco, and Conover (2014) recommend developing new nonparametric methods that are not based on RT to detect interaction. The test statistics proposed by Hartlaub, Dean, and Wolfe (1999) were found to perform well in the two-way layout with a single replication per cell. Thus, we propose an extension of their technique for the case of balanced replications per cell. Our proposed statistics use the technique of crossed comparisons (Tukey, 1991) to detect the interactions. Initial investigations summarizing cell information into a single statistic (such as a median or mean) and then applying the methods from Hartlaub, Dean, and Wolfe (1999) did not perform as well as these extended methods where all possible comparisons were examined.

Our proposed statistics generalize the comparison idea with all possible comparisons. We eliminate nuisance effects by aligning with averages or medians to remove one of the nuisance effects (row or column) and ranking within the columns or rows to remove the other. We call the proposed statistics APCSSA and APCSSM. The names of the statistics come from the idea that they are extensions of S-SA and S-SM from Hartlaub, Dean, and Wolfe (1999), where we have added APC to stand for all possible comparisons.

Next, we describe how our proposed statistics, APCSSA and APCSSM, are computed. We begin with APCSSA, the all possible comparisons extension of S-SA. APCSSA is the maximum of two standardized statistics, so we begin by outlining their calculation.

Step 1. Calculate APCCRA, which stands for all possible comparisons (APC), column alignment (C), row ranking (R), using averages for the alignment (A). That is, we align within the columns using column averages, rank within the rows, and then use all possible comparisons to create the statistic. We compute the $J(J-1)/2$ crossed comparisons, denoted $V_{jj'}$:

$$V_{jj'}=\sum_{1\leq i<i'\leq I}\;\sum^{K}_{k_{1}=1}\sum^{K}_{k_{2}=1}\sum^{K}_{k_{3}=1}\sum^{K}_{k_{4}=1}\left\{\left(r_{ijk_{1}}+r_{i'j'k_{2}}\right)-\left(r_{i'jk_{3}}+r_{ij'k_{4}}\right)\right\}^{2}. \tag{2}$$

Here $r_{ijk}$ denotes the rank, within row $i$, of the column-aligned observation $Y_{ijk}$. APCCRA is the maximum of the $V_{jj'}$’s.

Step 2. Calculate APCCRAD, which is just a scaled version of APCCRA. Divide APCCRA by $K^{4}I(I-1)/2$, the number of summands in each $V_{jj'}$, to obtain this scaled version of the crossed comparisons for the maximum column comparison. That is,

$$APCCRAD=\frac{APCCRA}{K^{4}I(I-1)/2}. \tag{3}$$

Step 3. Calculate APCRCA, which stands for all possible comparisons (APC), row alignment (R), column ranking (C), using averages for the alignment (A). Repeat Step 1, with alignment in the rows and ranking in the columns. APCRCA is computed by taking the maximum of the $I(I-1)/2$ possible row comparisons.

Step 4. Calculate APCRCAD. APCRCAD is computed by dividing APCRCA by $K^{4}J(J-1)/2$.

Step 5. Standardization. APCCRAD and APCRCAD are further standardized by subtracting the appropriate null mean and dividing by the appropriate null standard deviation. That is, one computes

$$APCCRAD^{*}=\frac{APCCRAD-E_{0}(APCCRAD)}{\sqrt{V_{0}(APCCRAD)}}, \tag{4}$$
$$\mbox{and}\;APCRCAD^{*}=\frac{APCRCAD-E_{0}(APCRCAD)}{\sqrt{V_{0}(APCRCAD)}}, \tag{5}$$

where $E_{0}(APCCRAD)$ and $E_{0}(APCRCAD)$ are the null means of APCCRAD and APCRCAD respectively, and $V_{0}(APCCRAD)$ and $V_{0}(APCRCAD)$ are the null variances of APCCRAD and APCRCAD respectively. These null means and variances are computed via simulation and their values are available on GitHub at https://github.com/tranbaokhue/Rank-based-InteractionTest. Note that if $I=J$, then by symmetry, the null means and null variances are equal and only one set needs to be computed.

Step 6. Calculate APCSSA. APCSSA is the maximum of $APCCRAD^{*}$ and $APCRCAD^{*}$.

Alternatively, using medians to align the data instead of averages in our procedure yields APCSSM. The common ties correction of using average ranks for ties should be used during the ranking process.
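
Steps 1 through 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the data are assumed to be nested lists `y[i][j][k]`, all function names are hypothetical, equation (2) is evaluated by brute-force enumeration, and the null means and variances must be supplied by the caller (for example from the simulated values in the GitHub repository). The row-aligned, column-ranked statistic is obtained by swapping the roles of the two factors.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, median


def average_ranks(values):
    """Midranks: tied values share the average of the ranks they occupy."""
    return [sum(u < v for u in values) + (sum(u == v for u in values) + 1) / 2
            for v in values]


def column_align_row_rank(y, center=mean):
    """Subtract each column's center, then rank within each row."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    col_center = [center([y[i][j][k] for i in range(I) for k in range(K)])
                  for j in range(J)]
    ranks = []
    for i in range(I):
        aligned = [y[i][j][k] - col_center[j] for j in range(J) for k in range(K)]
        row_ranks = average_ranks(aligned)
        ranks.append([row_ranks[j * K:(j + 1) * K] for j in range(J)])
    return ranks


def crossed_comparison(r, j, jp):
    """V_{jj'} of equation (2), enumerating every summand directly."""
    I, K = len(r), len(r[0][0])
    total = 0.0
    for i, ip in combinations(range(I), 2):
        for k1, k2, k3, k4 in product(range(K), repeat=4):
            diff = (r[i][j][k1] + r[ip][jp][k2]) - (r[ip][j][k3] + r[i][jp][k4])
            total += diff ** 2
    return total


def scaled_max_comparison(y, center=mean):
    """Steps 1-2: maximum V_{jj'} divided by K^4 * I * (I - 1) / 2."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    r = column_align_row_rank(y, center)
    largest = max(crossed_comparison(r, j, jp)
                  for j, jp in combinations(range(J), 2))
    return largest / (K ** 4 * I * (I - 1) / 2)


def transpose(y):
    """Swap the roles of the two factors so that rows become columns."""
    return [list(cells) for cells in zip(*y)]


def apc_statistic(y, null_col, null_row, center=mean):
    """Steps 1-6; null_col and null_row are (null mean, null variance) pairs."""
    apccrad = scaled_max_comparison(y, center)             # Steps 1-2
    apcrcad = scaled_max_comparison(transpose(y), center)  # Steps 3-4
    z_col = (apccrad - null_col[0]) / sqrt(null_col[1])    # Step 5
    z_row = (apcrcad - null_row[0]) / sqrt(null_row[1])
    return max(z_col, z_row)                               # Step 6
```

Passing `center=median` gives the median-aligned variant. Direct enumeration costs on the order of $K^{4}$ operations per pair of cells, which is one reason the statistics are computationally intensive; the sketch favors readability over speed.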

4 Simulations

4.1 Simulation settings and notes

With the increase in R packages used to analyze two-way ANOVA models, Feys (2016) cautions researchers against choosing tests for their p-values based on a given dataset. In order to compare the proposed statistics with existing competitors, we performed a simulation power study with the F, RT, ART, DEKR, and raov statistics, and our two proposed statistics APCSSA and APCSSM. Multiple settings, with factor levels from 2 to 6 and replications between 2 and 9, were investigated, and in this section we show selected results from four settings: $3\times 2\times 3$, $3\times 3\times 3$, $3\times 4\times 2$, and $4\times 6\times 2$, where the format $I\times J\times K$ gives the number of levels for each factor and the number of replications per cell.

The null distributions for APCSSA/APCSSM were derived for all settings with 2 to 6 levels in each main factor and the number of replications per cell ranging from 1 to 5 using Monte Carlo simulation. The null distributions were based on 100,000 computations of the statistics with no interaction or main effects present and using normal error terms. In cases where symmetrization was needed, e.g. APCSSA and APCSSM, two sets of 100,000 computations were used. The first set of 100,000 was used to determine expected values and variances for use in the symmetrizations, and the second set was used to determine the null distribution of the symmetrized statistics. During the ranking process, the common ties correction of using average ranks was employed for our newly proposed procedures, APCSSA and APCSSM. Expected values and variances as well as critical values for our proposed statistics are available through GitHub (https://github.com/tranbaokhue/Rank-based-InteractionTest).
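
The Monte Carlo recipe described above can be sketched as follows. This is an illustrative outline only: `toy_statistic` is a hypothetical stand-in for the proposed statistics, the function names are assumptions, and the rule used to read off the upper critical value is one reasonable convention rather than the one necessarily used for the published tables.

```python
import math
import random


def toy_statistic(y):
    """Hypothetical placeholder statistic: the range of the cell means."""
    cell_means = [sum(cell) / len(cell) for row in y for cell in row]
    return max(cell_means) - min(cell_means)


def simulate_null(statistic, I, J, K, n_sim, seed=0):
    """Evaluate `statistic` on n_sim layouts of pure standard normal noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        y = [[[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(J)]
             for _ in range(I)]
        draws.append(statistic(y))
    return draws


def null_moments(draws):
    """Sample mean and (n - 1)-denominator variance of the simulated values."""
    n = len(draws)
    center = sum(draws) / n
    variance = sum((d - center) ** 2 for d in draws) / (n - 1)
    return center, variance


def upper_critical_value(draws, alpha=0.05):
    """A simulated value with at most floor(alpha * n) draws strictly above it."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(draws)
    allowed_above = math.floor(alpha * len(ordered))
    return ordered[len(ordered) - 1 - allowed_above]
```

In the two-stage scheme, a first call to `simulate_null` on the unstandardized statistics would feed `null_moments`, and a second, independent call on the standardized statistic would feed `upper_critical_value`.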

We used Monte Carlo simulation to perform the power comparisons for all statistics at the 0.05 significance level. Multiple main effects were chosen for each factor, and two different types of interaction were studied: product interaction and specific interaction. Product interaction is interaction where the $\gamma_{ij}$’s are related to the main effects, simply as $\gamma_{ij}=\lambda\alpha_{i}\beta_{j}$, where $\lambda$ is a general scaling factor (Tukey, 1949). In specific interaction, $2\times 2$ matrices

$$\begin{bmatrix}c&-c\\ -c&c\end{bmatrix}$$

where $c\in\mathbb{R}$, are embedded in the rows or columns. Details on the main effects and interaction effects examined are available through GitHub at https://github.com/tranbaokhue/Rank-based-InteractionTest. We note that main effects are absent for each factor at factor level 1.
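
As a concrete illustration of the two interaction types, the following sketch builds the matrix of interaction effects for each. The function names are hypothetical, and placing the $2\times 2$ block in the first two rows and first two columns by default is an assumption made for the example; the exact placements used in the study are documented in the repository.

```python
def product_interaction(alpha, beta, lam):
    """Interaction effects lam * alpha_i * beta_j for every cell (i, j)."""
    return [[lam * a * b for b in beta] for a in alpha]


def specific_interaction(n_rows, n_cols, c, rows=(0, 1), cols=(0, 1)):
    """Embed the block [[c, -c], [-c, c]] in an otherwise zero matrix.

    `rows` and `cols` are zero-based positions of the block (assumed defaults).
    """
    gamma = [[0.0] * n_cols for _ in range(n_rows)]
    (i0, i1), (j0, j1) = rows, cols
    gamma[i0][j0] = c
    gamma[i0][j1] = -c
    gamma[i1][j0] = -c
    gamma[i1][j1] = c
    return gamma
```

For instance, with main effects $(-1,0,1)$ and $(-1,1)$ and $\lambda=0.5$, the product interaction has rows $(0.5,-0.5)$, $(0,0)$, and $(-0.5,0.5)$, so the interaction vanishes wherever a main effect is zero.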

When generating the data, error terms were drawn from the $\text{Normal}(0,1)$, $\text{Uniform}(-2,2)$, $\text{Exponential}(1)$, $\text{Double Exponential}\left(0,\frac{1}{\sqrt{2}}\right)$, or $\text{Cauchy}(0,1)$ distributions. For each combination of main effects, interaction effects, and error terms, we conducted 10,000 simulations for power comparisons. We used varying interaction effects, including no interaction, to obtain the full power curves for each procedure under each setting. All simulations were computed with R Statistical Software (v.4.3.1; R Core Team 2023).
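
The data-generating step and the estimation of a rejection rate can be sketched as below, again in Python with hypothetical function names. Two assumptions are worth flagging: the second parameter of the double exponential distribution is read as a scale of $1/\sqrt{2}$ (which gives unit variance), and a test is taken to reject when its statistic strictly exceeds the critical value.

```python
import math
import random


def draw_error(dist, rng):
    """Draw one error term from the named distribution."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "uniform":
        return rng.uniform(-2.0, 2.0)
    if dist == "exponential":
        return rng.expovariate(1.0)
    if dist == "double_exponential":
        # Assumption: 1/sqrt(2) is the Laplace scale, so the rate is sqrt(2).
        magnitude = rng.expovariate(math.sqrt(2.0))
        return magnitude if rng.random() < 0.5 else -magnitude
    if dist == "cauchy":
        # Inverse-CDF draw for the standard Cauchy distribution.
        return math.tan(math.pi * (rng.random() - 0.5))
    raise ValueError(f"unknown error distribution: {dist}")


def simulate_layout(alpha, beta, gamma, K, dist, rng):
    """Generate y[i][j][k] = alpha_i + beta_j + gamma_ij + error, as in (1)."""
    return [[[alpha[i] + beta[j] + gamma[i][j] + draw_error(dist, rng)
              for _ in range(K)]
             for j in range(len(beta))]
            for i in range(len(alpha))]


def rejection_rate(statistic, critical_value, alpha, beta, gamma, K, dist,
                   n_sim, seed=0):
    """Proportion of simulated layouts whose statistic exceeds the cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        y = simulate_layout(alpha, beta, gamma, K, dist, rng)
        if statistic(y) > critical_value:
            rejections += 1
    return rejections / n_sim
```

With all interaction effects set to zero, `rejection_rate` estimates the empirical Type I error rate; with nonzero interaction effects it estimates power, and evaluating it over a grid of interaction sizes traces out a power curve.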

4.2 Simulation results

Our selected power comparison results are shown via figures. Figures 1 and 4 are from the $3\times 4\times 2$ setting, while Figures 3 and 5 are from the $3\times 2\times 3$ and $3\times 3\times 3$ settings, respectively. Finally, Figures 2 and 6 are from the $4\times 6\times 2$ setting. While Figures 1, 3, 4, and 5 include power curves from settings with product interaction, Figures 2 and 6 present the powers of the tests when the $2\times 2$ matrices of specific interaction are embedded in the first two columns of the simulated data. Both interaction types as well as all five distributions of errors are shown in our selected results.

In Figure 1, with normal errors, even though ART has slightly higher power than the F-test, ART suffers from inflated Type I error. Thus, we still recommend the F-test for data with normal errors over the others (in order from second-best to worst: APCSSA, raov, APCSSM, RT, and DEKR). Our new proposed procedure APCSSA is comparable to the F-test in terms of power. We also quickly see that the DEKR statistic does not do well when both factors have main effects.

In Figure 2, looking at uniform errors, we can see that the main effects clearly nullified the ability of DEKR to detect interaction. APCSSA performs the best, followed closely by the F-test and the others.

Figure 1: Power curves in the $3\times 4\times 2$ setting with product interaction ($\gamma_{ij}=\lambda\alpha_{i}\beta_{j}$), $\bm{\alpha}=(-1,0,1)$, $\bm{\beta}=(-1,-0.5,0.5,1)$, and standard normal errors.
Figure 2: Power curves in the $4\times 6\times 2$ setting with specific interaction, $\bm{\alpha}=(-2,0,0.5,1.5)$, $\bm{\beta}=(-1.5,-1,-0.5,0.5,1,1.5)$, and uniform errors.

In Figure 3, when the errors follow an exponential distribution, raov is the most powerful but it has an extremely elevated Type I error rate. Following closely behind are ART and the proposed APCSSA. These two are powerful, with only slightly inflated significance levels. Even though APCSSA is comparable, we would generally recommend comparing with the F-test results. With double exponential errors, in Figure 4, we can see that ART is the most powerful, but APCSSA, raov, and the F-test all have roughly the same empirical powers. From both of these figures, it is clear that RT and DEKR are easily influenced by main effects.

Figure 3: Power curves in the $3\times 2\times 3$ setting with product interaction ($\gamma_{ij}=\lambda\alpha_{i}\beta_{j}$), $\bm{\alpha}=(-0.5,-0.5,1)$, $\bm{\beta}=(-1,1)$, and exponential errors.
Figure 4: Power curves in the $3\times 4\times 2$ setting with product interaction ($\gamma_{ij}=\lambda\alpha_{i}\beta_{j}$), $\bm{\alpha}=(-1,0,1)$, $\bm{\beta}=(-1,-0.5,0.5,1)$, and double exponential errors.

Finally, as seen in Figure 5 and Figure 6, when faced with the challenges of Cauchy errors, our proposed test APCSSM performs the best in various settings (both product interaction and specific interaction). Kloke and McKean’s raov might be more powerful, but when there is no interaction, its rejection rate remains well above 0.1. We can also see how the F-test, DEKR, APCSSA, and ART perform when the data have many outliers.

Figure 5: Power curves in the $3\times 3\times 3$ setting with product interaction ($\gamma_{ij}=\lambda\alpha_{i}\beta_{j}$), $\bm{\alpha}=(-1,0,1)$, $\bm{\beta}=(-1,0,1)$, and Cauchy errors.
Figure 6: Power curves in the $4\times 6\times 2$ setting with specific interaction, $\bm{\alpha}=(-2,0,0.5,1.5)$, $\bm{\beta}=(-1.5,-1,-0.5,0.5,1,1.5)$, and Cauchy errors.

Our results for other combinations of settings, main effects, interactions, and error terms are consistent with these selected results. In summary, the new proposed statistics perform very well in situations with Cauchy or double exponential errors, so we advocate their use to detect interaction in these settings.

5 Discussion and Future Work

Our simulation studies have confirmed previous findings that DEKR suffers from the introduction of row main effects and is not recommended for interaction detection in the balanced replications per cell setting. With Conover’s RT approach, the resulting statistic does not compete well with the F-test and suffers elevated Type I error rates when the error terms come from nonnormal distributions. Although Kloke and McKean’s raov and ART are much more powerful than DEKR and RT, their rejection rates in various settings can be twice the significance level of 0.05. Our proposed statistics APCSSA and APCSSM perform well in settings with exponential and Cauchy error terms, respectively. Additionally, when using APCSSA, not much power is lost compared to the F-test when error terms are from the normal or uniform distributions, and APCSSM was the best-performing procedure in our simulations for detecting interactions in data with Cauchy errors. In conclusion, we are able to recommend using APCSSA and APCSSM to detect interaction in the two-way layout with balanced replications per cell.

While we have demonstrated that our statistics work well to detect interaction in these settings, future work in this area remains. Further validation of these results with the power comparisons from settings with more replications per cell may be enlightening. Additionally, we have work in progress to develop R code so that the new statistics, which are computationally intensive, may be easily accessed by interested researchers. It is also natural to consider extending these techniques to the general two-way layout with an unequal number of replications per cell.

Acknowledgments

We are grateful to Jessica Jeong for her meticulous verification of R code and scripts. We would like to thank Amherst College and Kenyon College for supporting our summer research projects.

References

  • Conover and Iman [1976] W. Conover and R. Iman. On some alternative procedures using ranks for the analysis of experimental designs. Communications in Statistics - Theory and Methods, 5(14):1349–1368, Jan. 1976. ISSN 0361-0926, 1532-415X. doi:10.1080/03610927608827447. URL http://dx.doi.org/10.1080/03610927608827447.
  • De Kroon and Van Der Laan [1981] J. De Kroon and P. Van Der Laan. Distribution-free test procedures in two-way layouts; a concept of rank-interaction. Statistica Neerlandica, 35(4):189–213, Dec. 1981. ISSN 0039-0402, 1467-9574. doi:10.1111/j.1467-9574.1981.tb00730.x. URL http://dx.doi.org/10.1111/j.1467-9574.1981.tb00730.x.
  • Feys [2016] J. Feys. Nonparametric Tests for the Interaction in Two-way Factorial Designs Using R. The R Journal, 8(1):367, 2016. ISSN 2073-4859. doi:10.32614/RJ-2016-027.
  • Hartlaub et al. [1999] B. A. Hartlaub, A. M. Dean, and D. A. Wolfe. Rank-based test procedures for interaction in the two-way layout with one observation per cell. Canadian Journal of Statistics, 27(4):863–874, Dec. 1999. ISSN 0319-5724, 1708-945X. doi:10.2307/3316137.
  • Kay et al. [2021] M. Kay, L. A. Elkin, J. J. Higgins, and J. O. Wobbrock. Mjskay/ARTool: ARTool 0.11.0. Zenodo, Apr. 2021.
  • Kloke and McKean [2012] J. D. Kloke and J. W. McKean. Rfit: Rank-based Estimation for Linear Models. The R Journal, 4(2):57, 2012. ISSN 2073-4859. doi:10.32614/RJ-2012-014.
  • Lehmann et al. [2001] J. Lehmann, D. Wolfe, and B. A. Hartlaub. Rank-based procedures for analysis of factorial effects. Recent Advances in Experimental Designs and Related Topics, 2001.
  • Luepsen [2017] H. Luepsen. The aligned rank transform and discrete variables: A warning. Communications in Statistics - Simulation and Computation, 46(9):6923–6936, Oct. 2017. ISSN 0361-0918, 1532-4141. doi:10.1080/03610918.2016.1217014.
  • Mansouri and Chang [1995] H. Mansouri and G.-H. Chang. A comparative study of some rank tests for interaction. Computational Statistics & Data Analysis, 19(1):85–96, Jan. 1995. ISSN 01679473. doi:10.1016/0167-9473(93)E0045-6.
  • Montgomery [2020] D. C. Montgomery. Design and Analysis of Experiments. Wiley, Hoboken, NJ, tenth edition, 2020. ISBN 978-1-119-49247-4 978-1-119-49244-3.
  • Richter [1999] S. J. Richter. Nearly exact tests in factorial experiments using the aligned rank transform. Journal of Applied Statistics, 26(2):203–217, Feb. 1999. ISSN 0266-4763. doi:10.1080/02664769922548.
  • Salazar-Alvarez et al. [2014] M. I. Salazar-Alvarez, V. G. Tercero-Gómez, A. E. Cordero-Franco, and W. J. Conover. Nonparametric analysis of interactions: A review and gap analysis. IIE Annual Conference and Expo 2014, pages 2910–2917, 01 2014.
  • Tukey [1991] J. W. Tukey. The philosophy of multiple comparisons. Statistical Science, 6:100–116, 1991.
  • Wobbrock et al. [2011] J. O. Wobbrock, L. Findlater, D. Gergle, and J. J. Higgins. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 143–146, Vancouver BC Canada, May 2011. ACM. ISBN 978-1-4503-0228-9. doi:10.1145/1978942.1978963.