Finite size corrections relating to distributions of the length of longest increasing subsequences
Abstract.
Considered are the large , or large intensity, forms of the distribution of the length of the longest increasing subsequences for various models. Earlier work has established that after centring and scaling, the limit laws for these distributions relate to certain distribution functions at the hard edge known from random matrix theory. By analysing the hard to soft edge transition, we supplement and extend results of Baik and Jenkins for the Hammersley model and symmetrisations, which give that the leading correction is proportional to , where is the intensity of the Poisson rate, and provides a functional form as derivates of the limit law. Our methods give the functional form both in terms of Fredholm operator theoretic quantities, and in terms of Painlevé transcendents. For random permutations and their symmetrisations, numerical analysis of exact enumerations and simulations gives compelling evidence that the leading corrections are proportional to , and moreover provides an approximation to their graphical forms.
1. Introduction
Taking a viewpoint of random matrix theory in probability theory, it is very natural to ask about the rate of convergence to universal laws. Consider for example the spacing distribution, say, between consecutive eigenvalues in ensembles with unitary symmetry. Here the subscript is the Dyson index for unitary symmetry. The corresponding universal law, obtained by taking the large limit of an ensemble with unitary symmetry and scaling the mean spacing to unity, tells us that [16]
(1.1) |
where is the Fredholm determinant of the integral operator on with the so-called sine kernel
(1.2) |
An example of an ensemble with unitary symmetry is the set of unitary matrices chosen with Haar (uniform) measure. With denoting the spacing distribution between consecutive eigenvalues in this ensemble, the limit theorem relating to (1.1) is that
(1.3) |
The rate of convergence question may be posed by asking for a tight bound on
(1.4) |
Less ambitious, but more in keeping with an applied mathematics viewpoint on this aspect of random matrix theory, is to ask for the leading term in the large asymptotic expansion of the difference
(1.5) |
occurring in (1.4) for fixed. Indeed this question is central to probing the Keating–Snaith hypothesis [33] relating the statistical distribution of the eigenvalues of Haar distributed random unitary matrices to the statistical distribution of the zeros of the Riemann zeta function on the critical line [35, 10, 24, 12]. Here one uses Odlyzko’s data set [35] of over high precision consecutive zeros about a zero number near to obtain the empirical spacing distribution. From a graphical viewpoint this appears identical to . This is in keeping with the Montgomery–Odlyzko law (see e.g. [41]) equating the scaled local statistics of the Riemann zeros infinitely high up the critical line to the limiting bulk scaled eigenvalue statistics from any random matrix ensemble with unitary symmetry. However there are finite size effects — even though is huge on an absolute scale, it is the logarithm of the zero number which is the relevant measure of size. The extraordinary statistics provided by Odlyzko’s data set allows for the functional form of the analogue of the difference (1.5), where now is replaced by the empirically determined spacing distribution, to be accurately determined. The Keating–Snaith hypothesis predicts that this difference will be identical to the difference (1.5) for an appropriate value of and rescaling of . Hence the applied interest in (1.5) for fixed and large , a study of which was undertaken in [10, 24, 12].
Our interest in the present work relates to the finite size corrections of another applied problem from random matrix theory, this time in the field of combinatorics. Take the set of the first positive integers, and choose a permutation uniformly at random. The question is to specify the statistics of the longest increasing subsequence length, say, from the permutation in the large limit. To explain the notion of the longest increasing subsequence, suppose and the permutation is the ordered list . The longest subsequences of increasing numbers in this list have length 4: and . A result of Logan and Shepp [34] (see the Introduction in [36] for more context) gives that asymptotically the expected length is . The tie in with random matrix theory is the limit theorem [5]
(1.6) |
where is the probability that, after centring and scaling about the largest eigenvalue, the interval is free of eigenvalues in an Hermitian random matrix ensemble with unitary symmetry, or equivalently is the distribution of the scaled largest eigenvalue. This quantity has the exact evaluations [18, 43]
(1.7) |
In the first of these expressions, is the integral operator on with kernel
(1.8) |
where denotes the Airy function. In the second expression, is the solution of the particular Painlevé II equation satisfying the boundary condition as . Much more about the mathematics relating to the longest increasing subsequence problem for a random permutation can be found in [2] and [39].
In analogy with our discussion of studies in random matrix theory motivated by Odlyzko’s data for the Riemann zeros, an immediate question is to inquire about the large form of the difference
(1.9) |
There are various ways to generate exact and simulated data for this quantity; these are discussed in §4. In Figure 1, in the first panel we plot the histogram corresponding to the empirical cumulative distribution function (CDF) for , computed from the longest increasing subsequence length of random permutations. Also plotted, as black dots, is the exact CDF as a function of the positive integer , calculated as described in Section 4.2 below, and plotted as a red line is the quantity with varying continuously. In the second panel the difference (1.9) is displayed. In relation to the functional form in the second panel, we would like to know its dependence on , and its functional form at next-to-leading order. Our result of Conjecture 4.2 asserts the large asymptotic expansion
(1.10) |
where as known from (1.6), while the functional form remains unknown as an analytic function, but can be approximated graphically; see Figure 7.
There is a well known Poissonized form of the longest increasing subsequence problem known as the Hammersley process (see e.g. [22, §10.6]). Each permutation of length appears with probability and is represented by points in the unit square, where is the Poisson rate of the number of these points. The length of the longest increasing subsequence is now the maximum over the number of points that can be joined using straight line segments starting from the origin and finishing at , or equivalently the maximum number of segments in a path through the dots which always goes up and to the right. The example of the permutation for given in the second paragraph is illustrated in Figure 2 from this viewpoint. Define this length to be the random variable . From the definition,
(1.11) |
Analogous to (1.6), this quantity satisfies the limit law [5, 13]
(1.12) |
The study [17] showed the existence of an expansion analogous to (1.10) with replaced by , while most significantly from the present viewpoint, the subsequent work of Baik and Jenkins [6, Theorem 1.3] provided an explicit functional form of the analogue of .
The reasons why is more tractable than for asymptotic analysis are either of the formulas [30, 37]
(1.13) |
where is the average over the unitary group of degree with Haar measure, or [13]
(1.14) |
where is the hard edge scaled probability that in the Laguerre unitary ensemble with parameter (recall the Laguerre weight is ) the interval is free of eigenvalues. The first of these was used in [5] to prove (1.12), while an alternative proof of (1.12) using (1.14) was given in [13].
As with the work [6], the question of interest is now to quantify the large expansion of
(1.15) |
In this setting a precise statement can be made, supplementing the result known from [6]; in relation to the latter see §2.2. To state our result requires introducing the integral operator on with kernel
(1.16) |
Proposition 1.1.
With denoting the integer part of a positive real number , let
(1.17) |
For large we have
(1.18) |
where and
(1.19) |
In keeping with the two characterisations of given in (1), one as a Fredholm determinant and the other in terms of the Painlevé transcendent , the correction can alternatively be written in terms of a second order linear differential equation with a Painlevé transcendent relating to occurring in the coefficients; see Proposition 2.1 below. Moreover we show that this can be further simplified to give agreement with the result of Baik and Jenkins [6].
There are symmetrised versions of the longest increasing subsequence problem and the corresponding Hammersley process that permit analogues of the limit laws (1.6) and (1.12), and which also admit analogues of Proposition 1.1 [7, 8, 9]. To specify these, we first recall that a permutation of can be represented as an matrix, say, of zeros and ones with exactly ones, distributed so each row and column has a single one. For convenience, number the rows of starting from the bottom, and suppose that whenever there is an entry one in position , there is also an entry one in position . If furthermore there are no entries on the diagonal, must be even and the permutation consists entirely of two cycles. For such a permutation chosen uniformly at random, denote the random variable corresponding to the length of the longest increasing subsequence by . Now, in the corresponding two line presentation, suppose the order of the second line is reversed, which is equivalent to rotating by ninety degrees clockwise. Assuming again that the original permutation of two cycles was chosen uniformly at random, the random variable corresponding to the longest increasing subsequence length of this rotated permutation (equivalently, the longest decreasing subsequence of the original permutation) is to be denoted by .
The Hammersley model relating to has only the points in the unit square below the diagonal from to independent; the points above the diagonal are reflections of these points. The longest up/right path length in this setting will be denoted . Similarly, for the rotated Hammersley model relating to , only the points in the unit square below the diagonal from to are chosen independently, with the remaining points the reflection in this diagonal of those points, and we denote by the longest up/right path length.
The analogue of the limit laws (1.6) and (1.12) are now [9]
(1.20) |
and
(1.21) |
The quantities and denote the probability that upon a soft edge scaling in the neighbourhood of the largest eigenvalue in the Gaussian ensemble with and , the interval is free of eigenvalues. The tilde symbol on indicates a rescaling of the natural soft edge Gaussian ensemble variables; see [22, displayed equation below (9.139)]. As in the case of the limit laws associated with and , we seek corrections to these limit laws. Our results are contained in Conjecture 4.4 in relation to , and in Propositions 3.1, 3.4 and Corollaries 3.2, 3.6 for . Baik and Jenkins [6, Theorem 1.2] contains a result that can be interpreted as corresponding to Proposition 3.1, but with a different functional form for . In Section 3.2 we discuss the latter in the context of our Painlevé characterisation of .
2. Large expansion of
2.1. Proof of Proposition 1.1
Following [13] our strategy is to analyse the large form of the LHS of (1.18) by making use of (1.14). In the latter, is a simpler variable to work with than , so to begin we need to express in terms of . We see by matching the LHS of the respective equations that
(2.1) |
where denotes the integer part. Introduce
(2.2) |
It is convenient to consider (2.2) with replaced by so that . This function of and is uniquely determined by (2.2) and the requirement that , independent of , for large. Furthermore we introduce notation for the square of and note the large expansion of the latter
(2.3) |
We note furthermore that
(2.4) |
where in the first of these refers to (2.1) and is from (1.17), and we note too that is a decreasing function of .
The quantity in (1.14) permits a Fredholm determinant form analogous to the first line in (1) [18]
(2.5) |
where is the integral operator on with kernel
(2.6) |
and is the Bessel function of the first kind. It is a standard result in the theory of Fredholm integral equations [46] that the determinant in (2.5) can be expanded as a sum over -dimensional integrals, with the integrand a determinant with entries (2.6)
(2.7) |
Now require that and are related by (2.1). We next change variables in each integrand of the series in (2.7), . Taking into consideration (2.4) the integral in the -th term reads
(2.8) |
The point here is that it follows from asymptotic expansions associated with the functional form (2.6) that for large the integrand is of order unity in the neighbourhood of the lower terminal of integration only. These asymptotic expansions [1, (9.3.23) & (9.3.27)] give that for large
(2.9) |
for certain polynomials of increasing degree. Moreover these expansions are uniform for for any fixed . Specifically, upon inserting the explicit values of these polynomials for low order, and slightly changing the notation,
(2.10) |
uniformly valid for . Recalling the form of the numerator in (2.6), we see in particular that
(2.11) |
For applicability to (2.8), taking into consideration the second equation in (2.3) and (2.6), we see that we require in (2.10) that
(2.12) |
To account for this in (2.1) we must use the Taylor expansions with bounds on error terms valid for
(2.13) |
where we made use of the differential equation satisfied by the Airy function in deriving the second expression. We can now use (2.1) in (2.6) to conclude that for large
(2.14) |
Substituting in (2.8) with the upper terminals replaced by (this is permissible by the error bounds) gives that for large , and related to by (2.1),
(2.15) |
Recalling now (1.14), then rewriting the RHS of (2.15) as in the reverse of going from (2.5) to (2.7), this tells us that for large
(2.16) |
The stated result (1.18) now follows from [12, Lemma 1], and in the term proportional to replacing by , which is valid since they are equal to leading order in .
2.2. A differential equation characterisation of
As mentioned below (1.19), the quantity in (1.18), defined in terms of Fredholm integral operators in (1.19), also permits a characterisation as the solution of a particular second order linear differential equation, with coefficients given in terms of a particular ( form) Painlevé II transcendent. In preparation, we first recall that an alternative to the second expression in (1) is the evaluation [43] (see also [22, §8.3.2])
(2.17) |
where satisfies the particular -PII equation and boundary condition
(2.18) |
Proposition 2.1.
Proof.
We require knowledge [45] (see also [22, §8.3.3]) of an alternative to (2.5), telling us that
(2.23) |
where satisfies the particular -PIII′ equation and boundary condition
(2.24) |
With related to and by (2.1), and with given by (2.3), we can change variables in (2.23) to obtain
(2.25) |
where is defined in (1.17). To be consistent with (1.18) and (2.17) we must have that for large
(2.26) |
Rearranging this gives a functional form for , which is to be substituted in (2.24) after first changing variables . These steps are readily carried out using computer algebra. Equating terms at leading powers of gives the equation (2.18) for at order , and the differential equation (2.20) at order .
In relation to the boundary condition, we reconsider the above working, and the working which gave (2.16), in the case of continuous. The only difference is that the discrete variable should be replaced by the continuous variable . Taking the logarithmic derivative of the RHS of (2.16) in this setting gives
(2.27) |
On the other hand, it follows by substituting (2.26) in (2.25) that this same quantity is also equal to
(2.28) |
Comparing (2.27) and (2.28) at leading order gives , and thus the boundary condition for in (2.18). At this comparison gives . Taking the limit in (1.16) we then obtain (2.22) for .
As noted in the Introduction, earlier Baik and Jenkins [6] obtained an evaluation of relating to Painlevé transcendents. This is simpler than our (2.19) as it involves only ,
(2.29) |
Comparing with (2.19), it follows that we must have
(2.30) |
A similar circumstance arose in the study [25, discussion below (3.42)], which suggests how (2.30) can be verified from the characterisation of in Proposition 2.1.
First, we verify from the boundary condition of in (2.18), and that of in (2.22) that they are compatible with (2.30). It remains then to verify that (2.30) satisfies the differential equation (2.20). This can be done be direct substitution, then substituting for the third and fourth derivatives of . In relation to the latter, we note that differentiating the differential equation (2.18) for , and simplifying, gives us
(2.31) |
Again differentiating this equation, and making further use of (2.18), we can express the fourth derivative of in terms of and . Once the substitutions have been performed, the resulting equation only involves the second derivative of in the form of , which we eliminate in favour of and using (2.18). These steps, performed using computer algebra, verify that (2.30) solves (2.20), as required.
2.3. Comparison with numerical calculations
We numerically calculate the correction term in (1.18) using both the integral operator characterisation of Proposition 1.1 and the expression (2.19) in terms of the solution to a differential equation. We compare these to calculations of the difference
(2.32) |
for , where, for the purposes of comparison, we use the continuous variable .
To numerically evaluate the integral operators we use the Fredholm determinant Matlab toolbox by Folkmar Bornemann [11], and a Mathematica implementation by Allan Trinh, coauthor on some related works along the theme of finite size corrections to limit formulas in random matrix theory [26, 27, 28, 23]. For the DE solutions needed for (2.19) we use a sequence of Taylor series expanded about various points, beginning on the right (near , to match the DE boundary conditions) and proceeding to the left. For we use a sequence of 600 series of degree 11, while for we use a sequence of 500 series of degree 6. To calculate the finite correction (2.32) we also need a sequence of Taylor series solutions for from (2.24) — we use a sequence of 15,446 series of degree . The results are plotted in the left panel of Figure 3. In the right panel we plot a numerical estimate of the next order correction in (1.18) by calculating
(2.33) |
3. Large expansion of and
3.1. Fredholm determinant form
The analogues of (1.13) and (1.14) are the formulas
(3.1) | ||||
(3.2) |
The first equality in both is due to Rains [37], while the second were found in [29]. We note too that the validity of the second formula in (3.1) as derived in [29] is restricted to even. However, by the different strategy of expressing both the average over and in terms of a generalised hypergeometric function of variables based on zonal polynomials — see [31] in relation to the former and [19] in relation to the latter — the validity can be established independent of the parity of . The use of a tilde in the notation indicates the use of a rescaling of the natural hard edge Laguerre ensemble variables; see [22, second displayed equation §9.8] or (3.25) below. The analogues of (2.5) are the formulas [14, 21]
(3.3) | ||||
(3.4) |
where is the integral operator on with kernel
(3.5) |
In relation to the probabilities in (3.1) and (3.2) we have the known limit theorems (1.20) and (1.21) in terms of certain soft edge gap probabilities. As for the latter admit evaluations in terms of Fredholm determinants, and Painlevé transcendents. The Fredholm determinant forms read [42, 21]
(3.6) | ||||
(3.7) |
where is the integral operator on with kernel
(3.8) |
The forms in terms of Painlevé transcendents, assuming (2.17) or the second expression in (1), are [44]
(3.9) | ||||
(3.10) |
where satisfies the particular PII equation and boundary condition as noted below (1.8).
We will first consider the Fredholm determinant forms and obtain the analogues of Proposition 1.1.
Proposition 3.1.
Proof.
We start with the Fredholm determinant (3.3)
(3.14) |
In each variable in the integrand of the series we change variables , which transforms the corresponding multi-dimensional integral to read
(3.15) |
Introducing now as expanded for large according to (2.3) with shows that the argument of the Bessel function in (3.15) has the large expansion
(3.16) |
with
(3.17) |
which from the first formula in (2.10) gives the large behaviour of the Bessel function in (3.15) itself
(3.18) |
Now making further use of the large expansion of we see from this that the argument of the determinant in (3.15) has the large expansion
(3.19) |
Expanding the argument of the Airy function in the first term according to the first formula in (2.1) shows that this reduces to
(3.20) |
The result (3.11) now follows from [12, Lemma 1], where, as in the proof of Proposition 1.1, we replace by in the second-order term since they are of the same order in .
∎
We see from (3.4) that knowledge of the scaled asymptotics of the Fredholm determinant in (3.3) is sufficient to compute the same for the quantity and thus from (3.1) the scaled asymptotics of Pr.
Corollary 3.2.
For large we have
(3.21) |
where and
(3.22) |
Remark 3.3.
For general , define the Laguerre ensemble in terms of the eigenvalue PDF proportional to
(3.23) |
Let denote the probability that the interval has no eigenvalues in this ensemble. The corresponding hard edge scaled limit is specified by
(3.24) |
With this agrees with the meaning of and as appear above, while
(3.25) |
where refers to the ensemble with eigenvalue PDF proportional to (3.23) but with replaced by . As used in [15] in the context of the spectral density, specify the soft edge scaled limit by
(3.26) |
Note here that the quantity on the RHS is the probability that the interval at the far end of the spectrum contains no eigenvalues, and that the limit is independent of . The significance of the value is that this centres and scales the coordinates so that in the variable the largest eigenvalue is near the origin, and has spacing of order unity with its neighbours.
In this random matrix setting, our results suggest that in relation to the hard to soft edge transition, we have that for large
(3.27) |
with the main point being the order of the correction term. The limit law itself was established in [13] for and 4, and, using different techniques, for general in [38]. In relation to the correction, as already pointed out in [6] in the context of the Hammersley process corresponding to the case, the shift on the LHS is crucial for the optimal rate of convergence .
3.2. Differential equation form
The scaled asymptotics of were obtained in the proof of Proposition 2.1 in terms of the solution of a differential equation, starting from knowledge of the Painlevé transcendent evaluation (2.23). For and the analogues of (2.23) are [20]
(3.28) | ||||
(3.29) |
where satisfies the particular Painlevé III′ equation and boundary condition
(3.30) |
From Proposition 2.1 we already know how to express the leading two terms in the scaled limit of the factors in (3.28) and (3.29) in terms of quantities satisfying differential equations. Our primary task then is to do the same for the second factor in (3.28).
Proposition 3.4.
Proof.
Analogous to (2.25), with given by (2.3), we can change variables to obtain
(3.37) |
To be consistent with (3.11) and (3.9) we must have that for large
(3.38) |
Rearranging this gives a particular functional form for . This is to be substituted in the differential equation (3.30) with the change of variable , which we do using computer algebra. Equating terms at leading powers of gives the differential equation stated below (1.8) for at order , and equation (3.32) at order .
From the working of the proof of Proposition 2.1 we have
(3.39) |
The expansion (3.35) now follows from this, (3.31) and (3.28).
In relation to the boundary condition, we allow in both the above working, and that of the proof of Proposition 3.1 to be continuous. The effect is to replace the discrete variable by the continuous variable . The working of the proof of Proposition 3.1 then tells us
(3.40) |
From this latter formula, it follows that for large
(3.41) |
Differentiating (3.41), simplifying using the explicit forms of the kernels (3.8) and (3.13), then comparing to the logarithmic derivative of (3.35) with replaced by we obtain
(3.42) |
We already have the asymptotic behaviour for in (2.18) and in (2.22), from which we can check that and fall off faster than the RHS’s in (3.2). This implies the boundary conditions as stated below (1.8) for , and (3.34) for . ∎
Remark 3.5.
As for , the earlier work of [6] gives an expression for in terms of Painlevé transcendents simpler than (3.36). This reads [6, Th. 1.2]
(3.45) |
A direct verification of the consistency of (3.45) and (3.36) can, upon use of Remark 3.5, be attempted along the same lines of that used in relation to verifying the identity (2.30). However the implied identity for has extra terms and a more complex structure relative to (2.30), and this has not been carried through. We remark that graphical agreement is readily verified.
Knowledge of (3.35) and the structural relation between (3.28) and (3.29) allows for the differential equation companion to Corollary 3.2 to be presented.
Corollary 3.6.
Let be as in Proposition 3.4. For large we have
(3.46) |
3.3. Comparison with numerical calculations
Here we perform similar numerical calculations to those in Section 2.3 above, calculating the corrections and , from (3.11) and (3.21) respectively, using both the Fredholm determinant expressions and the expressions in terms of solutions to differential equations. We compare them to the differences
(3.47) |
and
(3.48) |
for .
For the Fredholm determinants expressions we again use the toolbox of [11]. For the differential equation solutions and we obtain a sequence of Taylor series solutions; series of degree for and series of degree for . Lastly, we compute a sequence of series of degree for . With these sequences, and the corresponding sequences of DE solutions from Section 2.3 we obtain the graphs in the left panels of Figure 4 for and of Figure 5 for . In the right panels of these figures we present plots of
(3.49) |
and
(3.50) |
as approximations to the higher order corrections in (3.11) and (3.21) respectively.
4. Large expansion of and symmetrised analogues
4.1. Relationship to large form of and conjecture
The longest increasing subsequence problem has been described in the paragraph including (1.6). Equating the latter with (1.12) shows the coincidence of limit laws with the maximal up/right path length in the Hammersley process,
(4.1) |
To understand why these two limits coincide, first recall from (1.11) that is an exponential generating function for . Furthermore the latter is a decreasing function of that takes values between and . In this general setting Johansson [32] proved what has been referred to as a de-Poissonisation lemma.
Proposition 4.1.
Let the sequence satisfy the bounds and be monotonically decreasing so that . Let
(4.2) |
and for given write
(4.3) |
One has
(4.4) |
for all , where is some positive constant.
Rewriting (1.11) so that it reads
(4.5) |
then applying Proposition 4.1 establishes the first equality in (4.1). From Proposition 1.1 we know details of further terms in the large expansion of , with the leading correction to the limit formula (1.12) being . However, this knowledge used in Proposition 4.1 does not give information on details of further terms in the large expansion of . In fact we don’t know of any analytic approach to this question. Nonetheless there are numerical methods that allow for data to be obtained leading to a conjecture.
Conjecture 4.2.
4.2. Data from the Painlevé characterisation
Use of (2.23) in (1.14) and recalling (1.11) tells us that
(4.7) |
where is the solution of the particular -PIII′ equation specified by (2.24). This shows that if we expand in powers of
(4.8) |
then we have a practical method to compute . Thus our approach is to use the characterisation (2.24) to carry out the series expansion
(4.9) |
up to some cutoff . We were able to carry out the computation for , allowing for the computation of the CDF for all up to . The data for this quantity can be stored as exact integers, by multiplying each of the probabilities by ; see [36, Table 2–4] for some examples, with the largest value of there being . Recall Figure 1, where on the left we displayed the data for the case in a graphical form — the histogram is the empirical CDF while the black dots are calculated using the from (4.8). On the right of the figure we plotted the difference (1.9). Multiplying this difference by the conjectured order of the correction term in (4.6) we obtain the scaled difference
(4.10) |
which will allow us to compare the data to other values of .
4.3. Data from simulations
For values of beyond data can be generated by Monte Carlo simulations. The C code used to generate samples of was given to the authors by Eric Rains, based on the code used for the simulations in [36], which uses the algorithm of [4]. The most expensive part of the code is the pseudo random number generation, for which Rains’ code uses a Marsaglia-style multiply-with-carry bit-shifting algorithm. To generate the value of from trials with took approximately seconds. Without this code, an alternative method to generate the simulation data is to simply use the inbuilt Mathematica command LongestOrderedSequence, which has comparable runtime for this value of .
In Figure 6 we display the data in a graphical form, along with the estimate of the scaled difference from (4.10), where is taken to be the empirical CDF. In Figure 7 we compare with and , where we have rescaled — the agreement in the plots suggests that is indeed the correct order of the next-to-leading term in (4.6).
4.4. Large expansion of the mean and variance of
We turn our attention now to the large form of the mean and variance. From the leading term in (4.6) it follows that for large [5]
(4.11) |
where, with ,
(4.12) |
and the numerical value follows from a computation based on (1) in [43]. On the other hand the correction term in (4.6) does not immediately reveal information about higher order terms in (4.11), the reason being that is a discrete quantity, while the right hand side of (4.6) corresponds to rescaling and smoothing of the discrete distribution. This is similarly true of the variance, for which the limit theorem (1.6) gives that
(4.13) |
with the nature of higher order terms not immediately determined by the correction term in (4.6). Our data can be used to investigate the corrections to and at a numerical level.
For this purpose, we note from elementary probability theory that
(4.14) |
and
(4.15) |
From the theory of Section 4.2, for up to we have exact knowledge of . Already a consequence of these distributions being discrete shows itself. Thus if we approximate the CDF by the limiting expression , and substitute this into (4.14) and (4.15) for the expected mean and variance, which we denote and respectively, then we obtain the numerical estimates
(4.16) | ||||
(4.17) |
We recognise the values and as the mean and variance of the continuous uniform distribution on .
Tabulating the quantities
(4.18) |
leads us to believe that for large both these quantities are of order unity. Making an ansatz and choosing between or as suggested by their appearance already in this problem, we found that the choice gives the better fit. Notice that the latter exponent is precisely the one appearing in Conjecture 4.2 for the CDF. Performing a least squares analysis from our tabulation with from 10 up to 700 then gives
(4.19) |
In keeping with (4.16), we expect the value in relation to is exactly . Note that the value , being distinct from in (4.17), can be understood as being due to the square of the mean occurring in (4.15). This gives a mechanism for the coupling of terms decaying in in the expansion of the mean, with terms that increase, which are not taken into consideration in deriving (4.17).
In Figures 8 and 9 we plot and respectively, along with the conjectures in (4.19). In the right panel of each we also plot the differences
(4.20) |
4.5. The quantities and
Analogous to (4.7) we have
(4.21) |
and
(4.22) |
These follow from (3.1), (3.2), (3.28) and (3.29); for example see [22, §10.7]. Hence
(4.23) |
and
(4.24) |
We now proceed as detailed in Section 4.2, which provides us with the exact values of and for up to . That is, we find a series solution of degree to the differential equation in (3.30), and use it (along with the from Section 4.2) to expand (4.21) and (4.22) in powers of .
In Figures 10 and 11 we display the cases in graphical form, along with the scaled differences
(4.25) | |||
(4.26) |
The exact values for and can be supplemented by simulations as in Section 4.3. For this we use the C++ code for generating self-inverse permutations from [3], which samples the permutations uniformly for very large . To find the longest increasing subsequence we use the C++ implementation from [40] of an optimal algorithm. Plotting the scaled differences (4.25) and (4.26) for and , with the horizontal axis rescaled by — see Figure 12 — gives evidence for the analogue of Conjecture 4.2.
Conjecture 4.4.
Finally, we comment on the analogues of (4.18) in relation to the large expansion of the mean and variance. We proceeded as for and postulated that the quantities each have a large expansion of the form . The exact data for available for up to 400 was then used to find best fits for the corresponding values of and . However, as distinct from our findings in the case of seen in Figures 8 and 9, when calculating the differences analogous to (4.20) a decrease to zero as increased was not observed. Hence, as yet we do not have convincing evidence for dependence of higher order terms in the large expansion of the mean and standard deviation of .
Acknowledgements
This research is part of the program of study supported by the Australian Research Council Centre of Excellence ACEMS and the Discovery Project grant DP210102887. We thank Eric Rains for providing the computer program as referenced in the text. We are most grateful to Jinho Baik for bringing to our attention the crucial reference [6], which unfortunately was missed when we prepared the first draft of this work. Also, we acknowledge the contribution of Allan Trinh in collaborating in the early stages of this project.
References
- [1] M. Abramowitz and I. A. Stegun, editors. Handbook of Mathematical Functions, Dover, New York (1972).
- [2] D. Aldous and P. Diaconis, Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem, Bull. Amer. Math. Soc. 36 (1999), 413–432.
- [3] J. Arndt, Matters Computational: Ideas, algorithms and source code, Springer, Heidelberg (2011).
- [4] R.M. Baer and P. Brock, Natural sorting over permutation spaces, Math. Comp. 22 (1968), 385–410.
- [5] J. Baik, P. Deift, and K. Johansson, On the distribution of the length of the longest increasing subsequence of random permutations, J. Amer. Math. Soc. 12 (1999), 1119–1178.
- [6] J. Baik and R. Jenkins, Limiting distribution of maximal crossing and nesting of Poissonized random matchings, Ann. Probab., 41 (2013), 4359–4406.
- [7] J. Baik and E.M. Rains, Symmetrized random permutations, Random matrix models and their applications (P.M. Bleher and A.R. Its, eds.), Mathematical Sciences Research Institute Publications, vol. 40, Cambridge University Press, Cambridge (2001), pp. 171–208.
- [8] J. Baik and E.M. Rains, Algebraic aspects of increasing subsequences, Duke Math. J. 109 (2001), 1–65.
- [9] J. Baik and E.M. Rains, The asymptotics of monotone subsequences of involutions, Duke Math. J. 109 (2001), 205–281.
- [10] E. Bogomolny, O. Bohigas, P. Leboeuf, and A.C. Monastra, On the spacing distribution of the Riemann zeros: corrections to the asymptotic result, J. Phys. A 39 (2006), 10743–10754.
- [11] F. Bornemann, On the numerical evaluation of distributions in random matrix theory: a review, Markov Processes Relat. Fields 16 (2010), 803–866.
- [12] F. Bornemann, P. Forrester and A. Mays. Finite size effects for spacing distributions in random matrix theory: circular ensembles and Riemann zeros. Stud. Appl. Math., 138 (2017), 401–437.
- [13] A. Borodin and P.J. Forrester, Increasing subsequences and the hard-to-soft transition in matrix ensembles, J.Phys. A 36 (2003), 2963–2981.
- [14] P. Desrosiers and P.J. Forrester, Relationships between -functions and Fredholm determinant expressions for gap probabilities in random matrix theory, Nonlinearity 19 (2006), 1643–1656.
- [15] P. Desrosiers and P.J. Forrester, Hermite and Laguerre -ensembles: asymptotic corrections to the eigenvalue density, Nucl. Phys. B 743 (2006), 307–332.
- [16] F.J. Dyson, Statistical theory of energy levels of complex systems III, J. Math. Phys. 3 (1962), 166–175.
- [17] P.L. Ferrari and R. Frings, Finite time corrections in KPZ growth models, J. Stat. Phys. 144 (2011), 1123–1150.
- [18] P.J. Forrester, The spectrum edge of random matrix ensembles, Nucl. Phys. B 402 (1993), 709–728.
- [19] P.J. Forrester, Exact results and universal asymptotics in the Laguerre random matrix ensemble, J. Math. Phys. 35 (1993), 2539–2551.
- [20] P.J. Forrester, Painlevé transcendent evaluation of the scaled distribution of the smallest eigenvalue in the Laguerre orthogonal and symplectic ensembles, arXiv:nlin.SI/0005064 (2000).
- [21] P.J. Forrester, Hard and soft edge spacing distributions for random matrix ensembles with orthogonal and symplectic symmetry, Nonlinearity 19 (2006), 2989–3002.
- [22] P.J. Forrester, Log-gases and random matrices, Princeton University Press, Princeton, NJ (2010).
- [23] P.J. Forrester, S.-H. Li and A.K. Trinh, Asymptotic correlations with corrections for the circular Jacobi -ensemble, J. Approximation Th. 271 (2021), 105633.
- [24] P.J. Forrester and A. Mays. Finite-size corrections in random matrix theory and Odlyzko’s dataset for the Riemann zeros. Proc. R. Soc. A, 471 (2015), 20150436.
- [25] P.J. Forrester, J.H.H. Perk, A.K. Trinh and N.S. Witte, Leading corrections to the scaling function on the diagonal for the two-dimensional Ising model, J. Stat. Mech. 2019 (2019), 023106.
- [26] P.J. Forrester and A.K. Trinh. Functional form for the leading correction to the distribution of the largest eigenvalue in the GUE and LUE. J. Math. Phys., 59(5) (2018), 053302.
- [27] P.J. Forrester and A.K. Trinh, Finite-size corrections at the hard edge for the Laguerre ensemble, Stud. Applied Math. 143 (2019), 315–336.
- [28] P.J. Forrester and A. Trinh. Optimal soft edge scaling variables for the Gaussian and Laguerre even ensembles. Nuclear Phys. B, 938 (2019), 621–639.
- [29] P.J. Forrester and N.S. Witte, Application of the -function theory of Painlevé equations to random matrices: PVI, the JUE,CyUE, cJUE and scaled limits, Nagoya Math. J. 174 (2004), 29–114.
- [30] I.M. Gessel, Symmetric functions and -recursiveness, J. Comb. Th. A 53 (1990), 257–285.
- [31] A.T. James, Distributions of matrix variate and latent roots derived from normal samples, Ann. Math. Statist. 35 (1964), 475–501.
- [32] K. Johansson, The longest increasing subsequence in a random permutation and a unitary random matrix model, Math. Research Lett. 5 (1998), 63–82.
- [33] J.P. Keating and N.C. Snaith, Random matrix theory and , Commun. Math. Phys. 214 (2001), 57–89.
- [34] B.F. Logan and L.A. Shepp, A variational problem for random Young tableaux, Advances in Math., 26 (1977), 206–222.
- [35] A.M. Odlyzko, The -nd zero of the Riemann zeta function, Dynamical, Spectral, and Arithmeitc Zeta Functions (M. van Frankenhuysen and M.L. Lapidus, eds.), Contemporary Math. 2001, Amer. Math. Soc, Providence, RI, (2001), 139–144.
- [36] A.M. Odlyzko and E.M. Rains, On Longest Increasing Subsequences in Random Permutations, Amer. Math. Soc., Contemp. Math. 251 (2000), 439–451.
- [37] E.M. Rains, Increasing subsequences and the classical groups, Elect. J. of Combinatorics 5 (1998), #R12.
- [38] J. Ramirez and B. Rider, Diffusion at the random matrix hard edge, Commun. Math. Phys. 288 (2009), 887–906.
- [39] D. Romik, The surprising mathematics of longest increasing subsequences, Institute of Mathematical Statistics Textbooks. Cambridge University Press (2015).
-
[40]
V. Sanaka, Longest Increasing Subsequence Size (N log N), Geeks for Geeks (2012), url:
www.geeksforgeeks.org/
longest-monotonically-increasing-subsequence-size-n-log-n/. - [41] P. Sarnak, Problems of the Millennium: The Riemann Hypothesis, Clay Mathematics Institute Annual Report (2004), 5–21.
- [42] T. Sasamoto, Spatial correlations of the 1D KPZ surface on a flat substrate, J. Phys. A 38 (2005), L549–L556.
- [43] C.A. Tracy and H. Widom, Level-spacing distributions and the Airy kernel, Commun. Math. Phys. 159 (1994), 151–174.
- [44] C.A. Tracy and H. Widom, On orthogonal and symplectic matrix ensembles, Commun. Math. Phys. 177 (1996), 727–754.
- [45] C.A. Tracy and H. Widom, Level-spacing distributions and the Bessel kernel, Commun. Math. Phys. 161 (1994), 289–309.
- [46] E.T. Whittaker and G.N. Watson, A course of modern analysis, 2nd ed., Cambridge University Press, Cambridge (1965).