On the $\alpha$-lazy version of Markov chains in estimation and testing problems
Abstract
Given access to a single long trajectory generated by an unknown irreducible Markov chain $P$, we simulate an $\alpha$-lazy version $L_\alpha(P)$ of $P$, which is ergodic. This enables us to generalize recent results on estimation and identity testing that were stated for ergodic Markov chains in a way that allows fully empirical inference. In particular, our approach shows that the pseudo spectral gap introduced by Paulin (2015) and defined for ergodic Markov chains may be given a meaning already in the case of irreducible but possibly periodic Markov chains.
1 Introduction
Estimating the parameters of a discrete distribution and distinguishing whether an unknown distribution is identical to a reference one or is $\varepsilon$-far from it under some notion of distance, assuming access to iid samples, are two classical and well studied problems in statistics and computer science (e.g. Han et al. (2015), Kamath et al. (2015), Orlitsky and Suresh (2015), Batu et al. (2001), Valiant and Valiant (2017) and the references therein). In contrast, research on the analogous problems, in which the samples are obtained from a single long trajectory generated by an unknown Markov chain, is rather young. This work is concerned with the latter setting and adds to the still rather short list of works on the subject: Wolfer and Kontorovich (2019, 2020, 2021), Cherapanamjeri and Bartlett (2019), Daskalakis et al. (2018), Chan et al. (2021), Fried and Wolfer (2021) and Hao et al. (2018). We review the results of these works in the Related literature section.
Before we rigorously present our main results, let us somewhat informally sketch their motivation: In Wolfer and Kontorovich (2021, Theorem 3.1) an estimator for the transition matrix $P$ of an unknown ergodic Markov chain was constructed, based on a single long trajectory generated by the Markov chain. In addition, they upper bounded the sample complexity of obtaining an estimation whose distance from $P$ under the infinity norm is less than a prescribed $\varepsilon > 0$ with high probability. The assumption that the unknown Markov chain is ergodic is crucial in their sample complexity analysis since it relies on concentration bounds (cf. Paulin (2015)) which assume ergodicity. We have noticed that it is possible to generalize their results to irreducible but possibly periodic Markov chains in the following way: Let $P$ be an irreducible Markov chain on $d$ states from which we can sample a single long trajectory of arbitrary length. Recall that if $\alpha \in (0,1)$ and $I$ denotes the identity matrix of size $d$, then $L_\alpha(P) = \alpha I + (1-\alpha)P$ is called the $\alpha$-lazy version of $P$. It is well known that the $\alpha$-lazy version of an irreducible Markov chain is ergodic. Crucially, it is possible to simulate $L_\alpha(P)$ even when one only has access (as is the case here) to a black box that generates a single long trajectory from $P$ (cf. (6) of Section 2.2). This simulation results in a trajectory sampled from the ergodic Markov chain $L_\alpha(P)$, on which we may apply the estimator mentioned above. Finally, we pull back the estimation of $L_\alpha(P)$ to an estimation of $P$.
Several questions immediately arise from this procedure:
(a) Can it be done with other estimation problems or even other statistical inference problems, e.g. identity testing?

(b) The $\alpha$-lazy version of a Markov chain is a special case of a matrix transformation. Can the procedure be carried out with other transformations?

(c) An upper bound for the sample complexity of the original (ergodic) estimation problem involves (the inverse of) the pseudo spectral gap $\gamma_{ps}(Q)$ of an ergodic Markov chain $Q$. It follows that the sample complexity in the irreducible case is upper bounded in terms of $\gamma_{ps}(L_\alpha(P))$ for an irreducible Markov chain $P$. Since $\gamma_{ps}(P) = 0$ for an irreducible but periodic Markov chain $P$, no meaningful comparison between $\gamma_{ps}(P)$ and $\gamma_{ps}(L_\alpha(P))$ is possible. Instead, we ask the following question: Suppose $P$ is ergodic. How are $\gamma_{ps}(P)$ and $\gamma_{ps}(L_\alpha(P))$ related? An answer to this question quantifies the “price” one has to pay for not being sure whether an unknown irreducible Markov chain is periodic or not.
Our first two main results address the questions in (a) and (b) that are related to Markov chains estimation problems, by which we mean a tuple $(\mathcal{C}, \vartheta, \rho)$: Here $\rho: \Theta \times \Theta \to [0,\infty)$ is a distance function on a set $\Theta$, i.e., for $u, v \in \Theta$ it holds that $\rho(u, v) = 0$ if and only if $u = v$ (notice that we neither assume that $\rho$ is symmetric nor that it satisfies the triangle inequality. Thus, in addition to metrics such as the total variation or the Hellinger distance, our machinery is applicable to, for example, the Kullback–Leibler divergence). The set $\mathcal{C}$ is a subset of the set of all transition matrices of irreducible Markov chains on $d$ states, which we denote by $\mathcal{M}_d^{irr}$. The set $\Theta$ consists of the values that the parameter to be estimated can take and is accompanied by a map $\vartheta: \mathcal{M}_d^{irr} \to \Theta$. The distance function $\rho$ is used to measure the distance between the true value of the parameter and its estimation. Following the works mentioned above, in the setting we consider we have access to a single long trajectory generated from an unknown Markov chain $P \in \mathcal{C}$, started from an arbitrary state, and the purpose is to construct an estimator for some parameter of interest such that the distance between the estimation and the true value of the parameter is less than a prescribed $\varepsilon > 0$. Along with the estimator we would like to know in advance how long the trajectory must be for the estimation to succeed. The minimal necessary trajectory length is referred to as the sample complexity. Since the estimation relies on a random trajectory, there is generally some probability that the trajectory obtained is not informative. Thus, in addition to $\varepsilon$, the sample complexity depends also on $\delta \in (0,1)$ such that the estimator is guaranteed to succeed with probability at least $1 - \delta$.
The following theorem provides a sufficient condition for the feasibility of the procedure we described above. It uses the notions of a solution to an estimation problem and of extendibility of solutions, which will be defined in Section 3.1:
Theorem 1.1.
Consider a Markov chains estimation problem $(\mathcal{C}, \vartheta, \rho)$. Let $\mathcal{C}' \subseteq \mathcal{M}_d^{irr}$, $\varphi: \mathcal{C}' \to \mathcal{C}$ and $\psi: \Theta \to \Theta$ be such that for every $P \in \mathcal{C}'$ and $u \in \Theta$ it holds
$$\rho\big(\psi(u), \vartheta(P)\big) \le \rho\big(u, \vartheta(\varphi(P))\big). \tag{1}$$
Then every solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $(\mathcal{C}', \vartheta, \rho)$ via $\varphi$.
Let us demonstrate a typical usage of Theorem 1.1 by applying it to the problem of estimating the transition matrix mentioned above: For $\mathcal{C}$ we take $\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$, which is defined as the set of all ergodic Markov chains on $d$ states whose minimum stationary probability and pseudo spectral gap are lower bounded by $\bar\pi$ and $\bar\gamma$, respectively. The set $\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$ is the default choice in this work since the sample complexity of the problems that we consider was shown to be upper bounded by expressions that involve the minimum stationary probability and the pseudo spectral gap. Taking $\varphi = L_\alpha$ and $\vartheta(P) = P$ for every $P$, a natural candidate for $\psi$ is $\psi(M) = \frac{1}{1-\alpha}(M - \alpha I)$, but it may well result in a matrix that is not irreducible. Even worse, we might end up with negative entries on the main diagonal, i.e., not a row-stochastic matrix. In order to guarantee an estimation which is an irreducible Markov chain, $\psi$ is followed by a projection on the set of all row-stochastic matrices of size $d$, which we denote by $\mathcal{M}_d$, which, in turn, is followed by an additional adjustment to guarantee irreducibility. We discuss this example in detail in Example 3.2(b).
Our second main result shows that there are cases in which extending solutions is hard:
Theorem 1.2.
Let $(\mathcal{C}, \vartheta, \rho)$ be a Markov chains estimation problem with a finite sample complexity and let $\varphi: \mathcal{C}' \to \mathcal{C}$ be such that there exist $P_1, P_2 \in \mathcal{C}'$ with $\vartheta(\varphi(P_1)) = \vartheta(\varphi(P_2))$ but $\vartheta(P_1) \neq \vartheta(P_2)$. Let $\psi: \Theta \to \Theta$ be such that for every $\varepsilon > 0$ there exists $\varepsilon' > 0$ with the following property: Whenever $u, v \in \Theta$ satisfy $\rho(u, v) \le \varepsilon'$ then $\rho(\psi(u), \psi(v)) \le \varepsilon$ (this holds, for example, if $\psi$ is uniformly continuous). Then no solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $(\mathcal{C}', \vartheta, \rho)$ via $\varphi$ and $\psi$.
In Example 3.4 we shall show that the assumptions of Theorem 1.2 are satisfied in the setting of Wolfer and Kontorovich (2019, Theorem 8.1), who constructed an estimator for the pseudo spectral gap (in absolute error).
Our third main result is a reformulation of Theorem 1.1 for Markov chains identity testing problems. In this setting one needs to decide, based on a single long trajectory generated from an unknown Markov chain, whether a certain parameter of the Markov chain equals a specific value or is $\varepsilon$-far from it under some notion of distance, for a prescribed $\varepsilon > 0$. We will apply the theorem to the Markov chains identity testing problem considered by Wolfer and Kontorovich (2020, Theorem 4.1), in which the parameter of interest is the transition matrix. The formal definition of Markov chains identity testing problems and of extendibility of their solutions is given in Section 3.2.
Theorem 1.3.
Let $(\mathcal{C}, \vartheta, \rho)$ be a Markov chains identity testing problem. Let $\mathcal{C}' \subseteq \mathcal{M}_d^{irr}$, $\varphi: \mathcal{C}' \to \mathcal{C}$ and $\kappa \in (0,1]$ be such that for every $P, \overline{P} \in \mathcal{C}'$ it holds
$$\vartheta(P) = \vartheta(\overline{P}) \iff \vartheta(\varphi(P)) = \vartheta(\varphi(\overline{P})), \qquad \rho\big(\vartheta(\varphi(P)), \vartheta(\varphi(\overline{P}))\big) \ge \kappa\,\rho\big(\vartheta(P), \vartheta(\overline{P})\big). \tag{2}$$
Then every solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $(\mathcal{C}', \vartheta, \rho)$ via $\varphi$.
Our last main result addresses the question in (c), namely, the connection between $\gamma_{ps}(P)$ and $\gamma_{ps}(L_\alpha(P))$ for an ergodic Markov chain $P$. Recall that the pseudo spectral gap captures the mixing time (cf. (5)) and that there exists a universal constant $c > 0$ such that
$$t_{mix}\big(L_\alpha(P)\big) \le \frac{c}{(1-\alpha)^{2}}\, t_{mix}(P)$$
(Lemma 4.5). We conjecture (Conjecture 4.6) that a similar inequality holds for the pseudo spectral gap. As a consequence of the following theorem we show that for an ergodic Markov chain $P$ such that the maximum in the definition of $\gamma_{ps}(P)$ is obtained at $k = 1$ (cf. (4)) it holds
$$\gamma_{ps}\big(L_\alpha(P)\big) \ge (1-\alpha)^{2}\,\gamma_{ps}(P). \tag{3}$$
This confirms the conjecture for this class of ergodic Markov chains, a class that contains all ergodic and reversible Markov chains.
In the following theorem $\gamma(\cdot)$ denotes the spectral gap of a reversible Markov chain and $M(P) = P^{*}P$ denotes the multiplicative reversibilization of an irreducible Markov chain $P$ (both are defined in Section 2.1):
Theorem 1.4.
Let $P$ be an ergodic Markov chain. Then
$$\gamma\big(M(L_\alpha(P))\big) \ge (1-\alpha)^{2}\,\gamma\big(M(P)\big).$$
The paper is structured as follows: After a short literature review, in Section 2 we recall definitions and well known facts regarding Markov chains and their $\alpha$-lazy versions that will be used throughout this work. Our main reference here is Levin and Peres (2017). Section 3, which contains our main results, formulates the framework and demonstrates its usage. Section 4 is devoted to the sample complexity aspect of moving to an $\alpha$-lazy version.
1.1 Related literature
In recent years there have been attempts to go beyond the iid assumption in statistical inference problems. The Markovian setting, in which every sample depends only on the sample immediately preceding it, is a natural weakening of the iid assumption. Works on the subject differ mainly in their choice of the distance notion between Markov chains: Wolfer and Kontorovich (2020, 2021) consider each row of the transition matrix separately, under the total variation distance between distributions. This leads to a distance notion based on the infinity norm. Their analysis relies on the concentration bounds of Paulin (2015) involving the pseudo spectral gap, which is defined only for ergodic Markov chains. Chan et al. (2021) generalized Wolfer and Kontorovich's results to all irreducible Markov chains by recognizing that the sample complexity under the infinity norm is characterized by what they refer to as the $k$-cover time, defined to be the expected length of a trajectory such that every state is visited at least $k$ times. Taking a different path, Daskalakis et al. (2018) and Cherapanamjeri and Bartlett (2019) use a distance notion that relies on the largest eigenvalue of the geometric mean of the transition matrices. They make the stringent assumption that the Markov chains are symmetric (Fried and Wolfer (2021) subsequently showed that, in essence, reversibility suffices).
Our work is especially related to Chan et al. (2021) since it follows from both theirs and ours that the results of Wolfer and Kontorovich mentioned above apply to all irreducible Markov chains: Chan et al. (2021) achieve this by considering the $k$-cover time, which is defined for every irreducible Markov chain, and we by moving to the $\alpha$-lazy version.
We argue that while the $k$-cover time provides a sharper sample complexity characterization of the problem of inference in the single long trajectory setting, compared to the lower and upper bounds of Wolfer and Kontorovich that are based on the mixing properties and exhibit a logarithmic gap, at the moment it is not clear how to exploit this sharpness. Indeed, first, there is currently no method to estimate the $k$-cover time from a single long trajectory directly. Second, all the upper bounds on the $k$-cover time in terms of well known quantities that Chan et al. (2021) provide (both for irreducible and for ergodic Markov chains) involve the minimum stationary probability. To the best of our knowledge, Example 3.2(a) (see also Lemma 4.2), which is a combination of our machinery and the estimator of Wolfer and Kontorovich (2019), provides the only fully empirical method for the estimation of the minimum stationary probability of all irreducible Markov chains from a single long trajectory. Furthermore, Wolfer and Kontorovich (2019) provide a similar estimator of the pseudo spectral gap. It seems that although considerable progress has been made regarding the computational problem of approximating the cover time (e.g. Feige and Rabinovich (2003), Bui and Sohier (2007), Feige and Zeitouni (2009) and Ding et al. (2011)), the statistical estimation of the cover time from a single trajectory remains a challenging open question.
2 Preliminaries
2.1 Markov chains
Let $\mathcal{X}$ be a finite state space, which we assume without loss of generality to be equal to $[d] = \{1, \dots, d\}$ where $d \in \mathbb{N}$. We denote by $\mathcal{M}_d$ the set of all row-stochastic $d \times d$ matrices and refer to elements of $\mathcal{M}_d$ as Markov chains. For a matrix $A$ of size $d$ and $i, j \in [d]$ we write $A(i,j)$ for the entry at the $i$th row and the $j$th column of $A$. Recall that a Markov chain $P$ is called irreducible if for every $i, j \in [d]$ there exists $t \in \mathbb{N}$ such that $P^{t}(i,j) > 0$. The period of a state $i \in [d]$ is defined as
$$\gcd\{t \ge 1 : P^{t}(i,i) > 0\}.$$
If a Markov chain $P$ is irreducible, all its states share the same period (Levin and Peres (2017, Lemma 1.6)). In this case, the periodicity of $P$ is defined as the period of any of its states. If the periodicity is $1$, then $P$ is said to be aperiodic and otherwise periodic. Irreducible and aperiodic Markov chains are called ergodic.
We denote by $\mathcal{M}_d^{irr}$ (resp. $\mathcal{M}_d^{erg}$) the subset of $\mathcal{M}_d$ consisting of all irreducible (resp. ergodic) Markov chains. Let $P \in \mathcal{M}_d$ and $\alpha \in (0,1)$ (to be used henceforth). Following Montenegro and Tetali (2006) and Hermon (2016), we call
$$L_\alpha(P) = \alpha I + (1-\alpha)P$$
the $\alpha$-lazy version of $P$, where $I$ is the identity matrix of size $d$. The simplex of all probability distributions over $[d]$ will be denoted by $\Delta_d$. If $\mu \in \Delta_d$ and $i \in [d]$ we write $\mu(i)$ for the $i$th entry of $\mu$. Vectors are assumed to be row-vectors.
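For concreteness, here is a minimal sketch (in Python with NumPy; the function name is ours) of the lazy transform applied to a periodic chain:

```python
import numpy as np

def lazy_version(P: np.ndarray, alpha: float) -> np.ndarray:
    """The alpha-lazy version L_alpha(P) = alpha * I + (1 - alpha) * P."""
    return alpha * np.eye(P.shape[0]) + (1 - alpha) * P

# A periodic (hence non-ergodic) irreducible chain on two states.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
L = lazy_version(P, alpha=0.5)
assert np.allclose(L.sum(axis=1), 1.0)   # rows still sum to 1
print(L)                                 # [[0.5 0.5], [0.5 0.5]] -- ergodic
```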
Let $P \in \mathcal{M}_d$ and $\mu \in \Delta_d$. By $(X_1, \dots, X_m) \sim (\mu, P)$ we mean
$$\mathbb{P}(X_1 = x_1, \dots, X_m = x_m) = \mu(x_1)\prod_{t=1}^{m-1} P(x_t, x_{t+1}), \qquad x_1, \dots, x_m \in [d].$$
When the initial distribution is irrelevant, we simply write $(X_1, \dots, X_m) \sim P$.
We say that $\pi \in \Delta_d$ is a stationary distribution of $P$ if $\pi P = \pi$. All Markov chains on a finite state space have at least one stationary distribution (Levin and Peres (2017, Section 1.7)) and irreducible Markov chains have unique stationary distributions (Levin and Peres (2017, Corollary 1.17)). Furthermore, if $\pi$ is the stationary distribution of an irreducible Markov chain $P$, then the minimum stationary probability of $P$, defined as
$$\pi_\star = \min_{i \in [d]} \pi(i),$$
satisfies $\pi_\star > 0$ (Levin and Peres (2017, Proposition 1.14)). Ergodic Markov chains converge to their stationary distribution in total variation (Levin and Peres (2017, Theorem 4.9)), i.e., if $P \in \mathcal{M}_d^{erg}$ and $\pi$ is its stationary distribution, then
$$\lim_{t \to \infty} \max_{i \in [d]} \big\| P^{t}(i, \cdot) - \pi \big\|_{TV} = 0,$$
where
$$\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{i \in [d]} |\mu(i) - \nu(i)|$$
(Levin and Peres (2017, Proposition 4.2)). For $t \in \mathbb{N}$ denote
$$d(t) = \max_{i \in [d]} \big\| P^{t}(i, \cdot) - \pi \big\|_{TV}$$
and define the mixing time of $P$ by
$$t_{mix}(\varepsilon) = \min\{ t \in \mathbb{N} : d(t) \le \varepsilon \}, \qquad t_{mix} = t_{mix}(1/4).$$
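As a small illustration of these definitions, the following sketch (ours; brute force, suitable for small $d$ only) computes $d(t)$ and $t_{mix}(\varepsilon)$:

```python
import numpy as np

def stationary(P):
    """Stationary distribution as the left eigenvector of eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def d_t(P, t, pi):
    """d(t) = max_i || P^t(i,.) - pi ||_TV, with TV = half the l1 norm."""
    return 0.5 * np.abs(np.linalg.matrix_power(P, t) - pi).sum(axis=1).max()

def t_mix(P, eps=0.25, t_max=100_000):
    pi = stationary(P)
    return next(t for t in range(1, t_max) if d_t(P, t, pi) <= eps)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(t_mix(P))  # mixing time of a small ergodic chain
```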
Let $P \in \mathcal{M}_d^{irr}$ with stationary distribution $\pi$. We write $Q = D_\pi P$ for the edge measure of $P$, where $D_\pi$ is the diagonal matrix whose entries correspond to $\pi$, and call $P$ reversible if $Q$ is symmetric. Suppose now $P$ is reversible. Then $1$ is an eigenvalue of $P$ with a one dimensional eigenspace and all eigenvalues of $P$ are real and lie in $[-1, 1]$ (Levin and Peres (2017, Lemmas 12.1 and 12.2)). They may therefore be ordered:
$$1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_d \ge -1.$$
The spectral gap and the absolute spectral gap of $P$ are defined by
$$\gamma = 1 - \lambda_2 \qquad \text{and} \qquad \gamma_\star = 1 - \max_{2 \le i \le d} |\lambda_i|,$$
respectively.
The absolute spectral gap and the minimum stationary probability of $P$ capture $t_{mix}$ (Levin and Peres (2017, Theorems 12.3 and 12.4)):
$$\Big(\frac{1}{\gamma_\star} - 1\Big)\ln\frac{1}{2\varepsilon} \;\le\; t_{mix}(\varepsilon) \;\le\; \frac{1}{\gamma_\star}\ln\frac{1}{\varepsilon\,\pi_\star}.$$
For ergodic Markov chains that are not necessarily reversible there exists an analogue of the absolute spectral gap, namely the pseudo spectral gap that was introduced by Paulin (2015): Let $P \in \mathcal{M}_d^{erg}$ with stationary distribution $\pi$. Define the time reversal of $P$ by
$$P^{*}(i,j) = \frac{\pi(j)P(j,i)}{\pi(i)}, \qquad i, j \in [d],$$
and the multiplicative reversibilization of $P$ by $M(P) = P^{*}P$; the chain $M(P)$ is reversible with respect to $\pi$. The pseudo spectral gap of $P$ is then defined to be
$$\gamma_{ps}(P) = \max_{k \ge 1} \frac{\gamma\big(M(P^{k})\big)}{k}. \tag{4}$$
By Paulin (2015, Proposition 3.4),
$$\frac{1}{2\,t_{mix}} \;\le\; \gamma_{ps}(P) \qquad \text{and} \qquad t_{mix}(\varepsilon) \;\le\; \frac{1}{\gamma_{ps}(P)}\Big(\ln\frac{1}{\pi_\star} + 2\ln 2 + \ln\frac{1}{\varepsilon}\Big). \tag{5}$$
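Numerically, $\gamma_{ps}$ can be lower bounded by truncating the maximum in (4) at some finite $K$; a sketch (ours, with the truncation as an explicit assumption):

```python
import numpy as np

def spectral_gap(M):
    """1 - lambda_2 for a chain M that is reversible w.r.t. its stationary
    distribution (its eigenvalues are then real)."""
    lam = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    return 1.0 - lam[1]

def pseudo_spectral_gap(P, pi, K=50):
    """max_{1 <= k <= K} gamma(M(P^k)) / k, where M(Q) = Q* Q and
    Q*(i,j) = pi(j) Q(j,i) / pi(i); a lower bound on gamma_ps due to K."""
    D, D_inv = np.diag(pi), np.diag(1.0 / pi)
    gaps = []
    for k in range(1, K + 1):
        Pk = np.linalg.matrix_power(P, k)
        Pk_star = D_inv @ Pk.T @ D       # time reversal of P^k
        gaps.append(spectral_gap(Pk_star @ Pk) / k)
    return max(gaps)
```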
For $\bar\pi, \bar\gamma \in (0,1)$ denote by $\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$ the set of all ergodic Markov chains whose minimum stationary probability and pseudo spectral gap are lower bounded by $\bar\pi$ and $\bar\gamma$, respectively.
If $\pi$ is the stationary distribution of some irreducible Markov chain and $f, g \in \mathbb{R}^{d}$, define
$$\langle f, g \rangle_\pi = \sum_{i \in [d]} f(i)\,g(i)\,\pi(i).$$
Finally, if $A$ is a square matrix of size $d$ then the infinity norm of $A$ is defined by
$$\|A\|_\infty = \max_{i \in [d]} \sum_{j \in [d]} |A(i,j)|.$$
2.2 The lazy version of a Markov chain
While Theorems 1.1, 1.2 and 1.3 are stated for arbitrary maps $\varphi$, we are mainly interested in the case where $\varphi$ is defined by
$$\varphi(P) = L_\alpha(P) = \alpha I + (1-\alpha)P.$$
Let us recall some of the properties of $L_\alpha$. To this end fix $P \in \mathcal{M}_d^{irr}$ with stationary distribution $\pi$:

(1) $L_\alpha(P)$ is ergodic, i.e., $L_\alpha(P) \in \mathcal{M}_d^{erg}$.

(2) $L_\alpha: \mathcal{M}_d \to \mathcal{M}_d$ is injective and is not surjective.

(3) The stationary distribution of $L_\alpha(P)$ is also $\pi$. In particular, $\pi_\star(L_\alpha(P)) = \pi_\star(P)$.

(4) There exist $Q \in \mathcal{M}_d^{erg}$ such that $Q \notin L_\alpha(\mathcal{M}_d)$: since $L_\alpha(P')(i,i) \ge \alpha$ for every $P' \in \mathcal{M}_d$ and $i \in [d]$, any ergodic $Q$ with $\min_{i \in [d]} Q(i,i) < \alpha$ is an example.

(5) If $P$ is reversible, then so is $L_\alpha(P)$.

(6) It is possible to simulate $L_\alpha(P)$ even when one only has access to a black box that generates a single long trajectory from $P$. Indeed, independently in each step, we toss a biased coin that comes up heads with probability $1 - \alpha$. If the coin shows heads, we move one step along the trajectory from $P$. Otherwise, we insert a duplicate of the current state at the position immediately after the current position and set it to be the new current position.
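A minimal sketch of the simulation in (6) (ours; the black box is modeled as an iterator over a trajectory of $P$):

```python
import random

def lazy_trajectory(traj_from_P, alpha, m):
    """Simulate m steps of L_alpha(P) from a single long trajectory of P,
    by duplicating the current state with probability alpha."""
    it = iter(traj_from_P)
    current = next(it)            # starting state of the given trajectory
    out = [current]
    while len(out) < m:
        if random.random() < 1 - alpha:   # heads: advance along P
            current = next(it)
        # tails: stay put, i.e. duplicate the current state
        out.append(current)
    return out
```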
3 Main results
Definition 3.1, together with its analogue, Definition 3.5, provides the basis for this work. The reader might find it useful to keep the following story in mind while reading them: Suppose we are interested in a certain parameter of irreducible Markov chains, e.g. their minimum stationary probability. Let $\Theta$ denote the set of all values that this parameter may take, e.g. $\Theta = (0,1)$. Denote by $\vartheta(P) \in \Theta$ the value that the parameter takes for $P \in \mathcal{M}_d^{irr}$. Let $\Xi$ be an estimator that, upon receiving a trajectory from an unknown $P$, outputs an estimation of $\vartheta(P)$ whose “quality” is measured by a distance function $\rho$. Suppose we know that for some class $\mathcal{C} \subseteq \mathcal{M}_d^{irr}$ the estimator needs not more than $m_0$ samples in order to output a “good” estimate with “high” probability. Now, suppose we want to use this estimator for Markov chains not belonging to $\mathcal{C}$. If there is a map $\varphi$ into $\mathcal{C}$, the following recipe might work: Given a trajectory from $P$, apply $\Xi$ on a trajectory from $\varphi(P)$ and then pull back the estimation of $\vartheta(\varphi(P))$ to an estimation of $\vartheta(P)$.
3.1 Estimation problems
In this section we define Markov chains estimation problems, their solutions and extendibility of solutions. We then prove Theorems 1.1 and 1.2 and apply them to the results of Wolfer and Kontorovich (2019, 2021).
Definition 3.1.
Let $\Theta$ be a set and $\vartheta: \mathcal{M}_d^{irr} \to \Theta$. Let $\rho: \Theta \times \Theta \to [0,\infty)$ be a distance function and $\mathcal{C} \subseteq \mathcal{M}_d^{irr}$. We refer to $(\mathcal{C}, \vartheta, \rho)$ as a Markov chains estimation problem. An estimator is a sequence $\Xi = (\Xi_m)_{m \in \mathbb{N}}$ such that $\Xi_m : [d]^{m} \to \Theta$ for every $m \in \mathbb{N}$. Now, let $\varepsilon, \delta \in (0,1)$ and $\Xi$ an estimator. We write $\Xi_m(X_1, \dots, X_m)$ for the output of the estimator upon receiving a sequence $(X_1, \dots, X_m) \sim P$. Define the minimax risk
$$\mathcal{R}_m(\Xi, \mathcal{C}, \varepsilon) = \sup_{P \in \mathcal{C}}\ \mathbb{P}_{(X_1, \dots, X_m) \sim P}\Big(\rho\big(\Xi_m(X_1, \dots, X_m), \vartheta(P)\big) > \varepsilon\Big)$$
and the sample complexity
$$m_0(\varepsilon, \delta) = \min\big\{ m \in \mathbb{N} : \mathcal{R}_{m'}(\Xi, \mathcal{C}, \varepsilon) \le \delta \text{ for every } m' \ge m \big\}.$$
An estimator $\Xi$ is a solution to $(\mathcal{C}, \vartheta, \rho)$ if $m_0(\varepsilon, \delta) < \infty$ for every $\varepsilon, \delta \in (0,1)$. Finally, let $\mathcal{C}' \subseteq \mathcal{M}_d^{irr}$ and $\varphi: \mathcal{C}' \to \mathcal{C}$. We say that a solution $\Xi$ to $(\mathcal{C}, \vartheta, \rho)$ extends to $(\mathcal{C}', \vartheta, \rho)$ via $\varphi$ if there exists $\psi: \Theta \to \Theta$ such that $\psi \circ \Xi = (\psi \circ \Xi_m)_{m \in \mathbb{N}}$, applied to trajectories generated from $\varphi(P)$, $P \in \mathcal{C}'$, is a solution to $(\mathcal{C}', \vartheta, \rho)$.
Proof of Theorem 1.1
Let $\Xi$ be a solution to $(\mathcal{C}, \vartheta, \rho)$. We need to show that $\psi \circ \Xi$ is a solution to $(\mathcal{C}', \vartheta, \rho)$. To this end, let $P \in \mathcal{C}'$ and $(X_1, \dots, X_m) \sim \varphi(P)$. For every such sequence it holds, by (1),
$$\rho\big(\psi(\Xi_m(X_1, \dots, X_m)), \vartheta(P)\big) \le \rho\big(\Xi_m(X_1, \dots, X_m), \vartheta(\varphi(P))\big).$$
It follows that
$$\mathbb{P}\Big(\rho\big(\psi(\Xi_m(X_1, \dots, X_m)), \vartheta(P)\big) > \varepsilon\Big) \le \mathbb{P}\Big(\rho\big(\Xi_m(X_1, \dots, X_m), \vartheta(\varphi(P))\big) > \varepsilon\Big).$$
Therefore, since the sample complexity of $\Xi$ over $\mathcal{C}$ is finite for every $\varepsilon, \delta \in (0,1)$, then also the sample complexity of $\psi \circ \Xi$ over $\mathcal{C}'$ is finite.
∎
Example 3.2.
(a) In Wolfer and Kontorovich (2019, Theorem 5.1) a procedure for estimating the minimum stationary probability of an ergodic Markov chain (in relative error) was constructed, thus, in our terminology, a solution to the following Markov chains estimation problem: Let $\Theta = (0,1)$ and $\rho(u, v) = \frac{|u - v|}{v}$. Define $\vartheta(P) = \pi_\star(P)$ and $\mathcal{C} = \mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$.
We shall now show that any solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $L_\alpha^{-1}(\mathcal{C})$. To this end, take $\varphi = L_\alpha$ and $\psi = \mathrm{id}_\Theta$. Then $\varphi$ and $\psi$ satisfy condition (1). Indeed, let $P \in L_\alpha^{-1}(\mathcal{C})$ and $u \in \Theta$. Since $\pi_\star(L_\alpha(P)) = \pi_\star(P)$, we also have $\rho\big(\psi(u), \vartheta(P)\big) = \rho\big(u, \vartheta(L_\alpha(P))\big)$.
-
(b)
In Wolfer and Kontorovich (2021, Theorem 3.1) a procedure for estimating the transition matrix of an ergodic Markov chain was constructed, thus, a solution to the following Markov chains estimation problem (actually, Wolfer and Kontorovich did not make sure that the output of their estimator is ergodic, or even irreducible, but only a row-stochastic matrix. Nevertheless, this is easily achieved as we shall see immediately): Let and . Define
We shall now show that any solution to extends to . Defining is straightforward:
To justify the definition of that we subsequently give, let us notice that for , a solution to gives an estimation of , and is a row-stochastic matrix. But it might happen that . Thus, applying to might lead to negative entries on the main diagonal and therefore to a matrix which is not row-stochastic. To mitigate this, one possibility is to project onto with respect to which is equivalent to projecting each of ’s rows on with respect to the norm. This leads to a convex optimization problem whose details we give in 5.1 in the appendix. It remains to make sure that the row-stochastic matrix is irreducible. To this end denote by the square matrix of size that has all entries equal to and for and let . Clearly, is an irreducible (even ergodic) Markov chain. Furthermore,
(6) Thus, for we define
where by we mean that the procedure from Lemma 5.2 is applied independently to each row of . Then and satisfy condition of (1). Indeed, let and such that . This inequality is equivalent to By Lemma 5.2, inequality (6) and the triangle inequality, .
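In code, the pull-back $\psi$ of this example may be sketched as follows (ours; project_row is a simple instance of Lemma 5.1, and $\tau$ is the tuning parameter of (6)):

```python
import numpy as np

def project_row(y):
    """An l1-projection of y onto the simplex (cf. Lemma 5.1): zero out the
    negative entries, then move the surplus or deficit of mass."""
    x = np.maximum(y, 0.0)
    s = x.sum()
    if s >= 1.0:                    # case (b): remove s - 1 units of mass,
        for i in np.argsort(x):     # here from the smallest entries first
            take = min(x[i], s - 1.0)
            x[i] -= take
            s -= take
    else:                           # case (c): add the missing mass
        x[np.argmax(x)] += 1.0 - s
    return x

def psi(M, alpha, tau):
    """Pull back an estimate M of L_alpha(P) to an estimate of P."""
    d = M.shape[0]
    Q = (M - alpha * np.eye(d)) / (1.0 - alpha)     # psi'(M)
    Q = np.apply_along_axis(project_row, 1, Q)      # rows back onto Delta_d
    return (1.0 - tau) * Q + tau * np.full((d, d), 1.0 / d)   # ergodicity
```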
Remark 3.3.
Our definition of a solution to a Markov chains estimation problem only assumes that the sample complexity is finite and therefore sheds no light on the extent to which the sample complexity might change while extending a solution. We postpone the treatment of this aspect to Section 4.
We now come to the
Proof of Theorem 1.2
Let $\Xi$ be a solution to $(\mathcal{C}, \vartheta, \rho)$ and let $P_1, P_2 \in \mathcal{C}'$ be as in the statement, i.e.,
$$\vartheta(\varphi(P_1)) = \vartheta(\varphi(P_2)) \qquad \text{but} \qquad \vartheta(P_1) \ne \vartheta(P_2).$$
Write $v = \psi\big(\vartheta(\varphi(P_1))\big) = \psi\big(\vartheta(\varphi(P_2))\big)$ and fix $\varepsilon, \delta \in (0,1)$. Then there exists $\varepsilon' > 0$ such that if $u, w \in \Theta$ satisfy $\rho(u, w) \le \varepsilon'$, then $\rho(\psi(u), \psi(w)) \le \varepsilon$. Assume, towards a contradiction, that $\psi \circ \Xi$ is a solution to $(\mathcal{C}', \vartheta, \rho)$. Thus, for $m$ large enough, we have, for $i \in \{1, 2\}$ and $(X_1, \dots, X_m) \sim \varphi(P_i)$,
$$\mathbb{P}\Big(\rho\big(\Xi_m(X_1, \dots, X_m), \vartheta(\varphi(P_i))\big) > \varepsilon'\Big) \le \delta.$$
It follows that, with probability at least $1 - \delta$,
$$\rho\big(\Xi_m(X_1, \dots, X_m), \vartheta(\varphi(P_i))\big) \le \varepsilon'.$$
By the assumption on $\psi$, with probability at least $1 - \delta$,
$$\rho\big(\psi(\Xi_m(X_1, \dots, X_m)), v\big) \le \varepsilon.$$
On the other hand, with probability at least $1 - \delta$, for $i \in \{1, 2\}$:
$$\rho\big(\psi(\Xi_m(X_1, \dots, X_m)), \vartheta(P_i)\big) \le \varepsilon.$$
This means that, with probability at least $1 - 2\delta$, for each $i \in \{1, 2\}$ there is a point of $\Theta$ which is $\varepsilon$-close both to $v$ and to $\vartheta(P_i)$. Since $\varepsilon$ was arbitrary and $\rho$ separates points, this forces $\vartheta(P_1) = v = \vartheta(P_2)$.
A contradiction. ∎
Example 3.4.
In Wolfer and Kontorovich (2019, Theorem 8.1) a procedure for estimating the pseudo spectral gap (in absolute error) of an ergodic Markov chain was constructed, thus, a solution to the following Markov chain estimation problem: Let $\Theta = [0, 1]$ and $\rho(u, v) = |u - v|$. Define $\vartheta(P) = \gamma_{ps}(P)$ and $\mathcal{C} = \mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$.
We claim that no solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $L_\alpha^{-1}(\mathcal{C})$ via $L_\alpha$ with continuous $\psi$ (together with the compactness of $[0, 1]$ this means that $\psi$ is uniformly continuous). Indeed, by Theorem 1.2, it suffices to find $P_1, P_2$ such that $\gamma_{ps}(L_\alpha(P_1)) = \gamma_{ps}(L_\alpha(P_2))$ but $\gamma_{ps}(P_1) \ne \gamma_{ps}(P_2)$. For concreteness, suppose $\alpha \in (0, 1/2)$ and consider the two-state chains
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad P_2 = \frac{1}{2}\begin{pmatrix} 1 + \lambda & 1 - \lambda \\ 1 - \lambda & 1 + \lambda \end{pmatrix} \quad \text{with } \lambda = \frac{1 - 3\alpha}{1 - \alpha}.$$
Then $\gamma_{ps}(L_\alpha(P_1)) = \gamma_{ps}(L_\alpha(P_2))$ but $\gamma_{ps}(P_1) \ne \gamma_{ps}(P_2)$. To see that, first notice that $P_1$ is periodic and therefore $\gamma_{ps}(P_1) = 0$ (cf. Paulin (2015, p. 11)). Let $\pi$ denote the common stationary distribution of the remaining chains. Since all the matrices involved are symmetric, it holds
$$\pi = (1/2, 1/2).$$
Thus, both $P_2$ and $L_\alpha(P_1)$ are reversible and therefore so are $L_\alpha(P_2)$ and the multiplicative reversibilizations of all three. By Wolfer and Kontorovich (2021, p. 18), for ergodic and reversible Markov chains, the maximum in the definition of the pseudo spectral gap is obtained at $k = 1$. Thus, for any ergodic and reversible two-state chain $Q$ with second eigenvalue $\lambda_2(Q)$:
$$\gamma_{ps}(Q) = \gamma(Q^{2}) = 1 - \lambda_2(Q)^{2}.$$
The claim follows now by calculating the second largest eigenvalue in each of the three cases: $\lambda_2(L_\alpha(P_1)) = 2\alpha - 1$, $\lambda_2(P_2) = \lambda$ and $\lambda_2(L_\alpha(P_2)) = \alpha + (1-\alpha)\lambda = 1 - 2\alpha$, whence $\gamma_{ps}(L_\alpha(P_1)) = \gamma_{ps}(L_\alpha(P_2)) = 1 - (1 - 2\alpha)^{2}$ while $\gamma_{ps}(P_2) = 1 - \lambda^{2} > 0 = \gamma_{ps}(P_1)$.
3.2 Identity testing problems
In this section we define Markov chains identity testing problems, their solutions and extendibility of solutions. We then apply Theorem 1.3 to the results of Wolfer and Kontorovich (2020, Theorem 4.1).
Definition 3.5.
Let $\Theta$ be a set and $\vartheta: \mathcal{M}_d^{irr} \to \Theta$. Let $\rho: \Theta \times \Theta \to [0,\infty)$ be a distance function and $\mathcal{C} \subseteq \mathcal{M}_d^{irr}$. We refer to $(\mathcal{C}, \vartheta, \rho)$ as a Markov chains identity testing problem. An identity tester is a sequence $\Xi = (\Xi_m)_{m \in \mathbb{N}}$ such that $\Xi_m : [d]^{m} \times \mathcal{C} \to \{0, 1\}$ for every $m \in \mathbb{N}$. Now, let $\varepsilon, \delta \in (0,1)$ and $\Xi$ an identity tester. We will write $\Xi_m(X_1, \dots, X_m; \overline{P})$ for the output of the identity tester upon receiving a sequence $(X_1, \dots, X_m) \sim P$ and a reference chain $\overline{P} \in \mathcal{C}$. Define the minimax risk
$$\mathcal{R}_m(\Xi, \mathcal{C}, \varepsilon) = \sup_{\overline{P} \in \mathcal{C}}\ \sup_{\substack{P \in \mathcal{C}:\ \vartheta(P) = \vartheta(\overline{P})\ \text{or} \\ \rho(\vartheta(P), \vartheta(\overline{P})) \ge \varepsilon}} \mathbb{P}_{(X_1, \dots, X_m) \sim P}\Big(\Xi_m(X_1, \dots, X_m; \overline{P}) \ne \mathbb{1}\big[\vartheta(P) = \vartheta(\overline{P})\big]\Big)$$
and the sample complexity
$$m_0(\varepsilon, \delta) = \min\big\{ m \in \mathbb{N} : \mathcal{R}_{m'}(\Xi, \mathcal{C}, \varepsilon) \le \delta \text{ for every } m' \ge m \big\}.$$
An identity tester $\Xi$ is a solution to $(\mathcal{C}, \vartheta, \rho)$ if
$$m_0(\varepsilon, \delta) < \infty$$
for every $\varepsilon, \delta \in (0,1)$.
Let $\mathcal{C}' \subseteq \mathcal{M}_d^{irr}$ and $\varphi: \mathcal{C}' \to \mathcal{C}$ be such that condition (2) holds. We say that a solution $\Xi$ to $(\mathcal{C}, \vartheta, \rho)$ extends to $(\mathcal{C}', \vartheta, \rho)$ via $\varphi$ if the tester that, given a reference $\overline{P} \in \mathcal{C}'$ and a trajectory from $P \in \mathcal{C}'$, applies $\Xi$ to the corresponding trajectory from $\varphi(P)$ and the reference $\varphi(\overline{P})$, is a solution to $(\mathcal{C}', \vartheta, \rho)$.
Example 3.6.
In Wolfer and Kontorovich (2020, Theorem 4.1) a procedure for identity testing of Markov chains transition matrices was constructed, thus, a solution to the following Markov chains identity testing problem: Let $\Theta = \mathcal{M}_d$ and $\rho(u, v) = \|u - v\|_\infty$. Define $\vartheta(P) = P$ and $\mathcal{C} = \mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)$.
Then any solution to $(\mathcal{C}, \vartheta, \rho)$ extends to $L_\alpha^{-1}(\mathcal{C})$. Indeed, condition (2) is satisfied with $\varphi = L_\alpha$ and $\kappa = 1 - \alpha$, given by
$$\|L_\alpha(P) - L_\alpha(\overline{P})\|_\infty = (1-\alpha)\,\|P - \overline{P}\|_\infty.$$
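In code, the extension amounts to lazifying both the trajectory and the reference chain and shrinking the proximity parameter by $1 - \alpha$; a sketch (ours; it reuses lazy_trajectory from Section 2.2 and assumes a tester with the interface shown):

```python
import numpy as np

def extended_tester(tester, traj_from_P, P_bar, alpha, eps):
    """Test 'P == P_bar' against '||P - P_bar||_inf >= eps' for an
    irreducible P by testing the ergodic lazy versions instead."""
    m = len(traj_from_P)
    lazy_traj = lazy_trajectory(traj_from_P, alpha, m)
    lazy_ref = alpha * np.eye(P_bar.shape[0]) + (1 - alpha) * P_bar
    return tester(lazy_traj, lazy_ref, (1 - alpha) * eps)
```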
4 The cost of moving to an $\alpha$-lazy version of a Markov chain
In the previous section we were solely interested in the feasibility of extending solutions to Markov chains estimation and testing problems. In this section we wish to highlight that extending a solution might result in an increase in the sample complexity of the problem. In particular, we give the exact statements of the problems whose solutions we extended.
Remark 4.1.
Let $m \in \mathbb{N}$. It follows from the way the Markov chain $L_\alpha(P)$ is simulated (cf. (6) of Section 2.2) that in order to obtain the actual number of samples coming from $P$, the length of the simulated Markov chain should be multiplied by approximately $1 - \alpha$. In order to make this observation more quantitative, let us denote by $N_m$ the number of samples coming from $P$. Thus, if we simulate $L_\alpha(P)$ for $m$ steps, then $N_m \sim \mathrm{Bin}(m, 1-\alpha)$. In particular, $\mathbb{E}[N_m] = (1-\alpha)m$ and for every $t > 0$, by Hoeffding's inequality,
$$\mathbb{P}\big(N_m \le (1-\alpha)m - t\big) \le e^{-2t^{2}/m}.$$
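For instance, to guarantee at least $n$ samples from $P$ with probability at least $1 - \delta$, it suffices to simulate $m$ steps with $(1-\alpha)m - \sqrt{m \ln(1/\delta)/2} \ge n$; a small sketch (ours) that finds such an $m$ by scanning:

```python
import math

def steps_needed(n, alpha, delta):
    """Smallest m with (1-alpha)*m - sqrt(m*ln(1/delta)/2) >= n; by the
    Hoeffding bound above, m lazy steps then yield at least n samples
    of P with probability at least 1 - delta."""
    m = n
    while (1 - alpha) * m - math.sqrt(m * math.log(1 / delta) / 2) < n:
        m += 1
    return m

print(steps_needed(n=10_000, alpha=0.5, delta=0.01))  # roughly 2n + o(n)
```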
4.1 The sample complexity of some extended solutions
The bounds given in this section are for the simulated $\alpha$-lazy Markov chain. The observations made in Remark 4.1 may be used to calculate the actual sample size from the original Markov chain.
Lemma 4.2 (Example 3.2(a)).
Let $P \in L_\alpha^{-1}\big(\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)\big)$ and $\varepsilon, \delta \in (0,1)$. Then there exist a universal constant $c > 0$ and an estimator $\hat\pi_\star$ for $\pi_\star(P)$ such that if
$$m \ge \frac{c}{\bar\gamma\,\bar\pi\,\varepsilon^{2}}\,\ln\frac{d}{\bar\pi\,\delta},$$
then, upon receiving a sequence $(X_1, \dots, X_m) \sim L_\alpha(P)$, it holds
$$\big|\hat\pi_\star - \pi_\star(P)\big| \le \varepsilon\,\pi_\star(P)$$
with probability at least $1 - \delta$.
Lemma 4.3 (Example 3.2(b)).
Let $P \in L_\alpha^{-1}\big(\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)\big)$ and $\varepsilon, \delta \in (0,1)$. Let $\pi$ be the stationary distribution of $L_\alpha(P)$ (which coincides with that of $P$). Then there exist a universal constant $c > 0$ and an estimator $\hat P$ for $P$ such that if
$$m \ge \frac{c}{\pi_\star}\left(\frac{d + \ln\frac{1}{\delta}}{\big((1-\alpha)\varepsilon\big)^{2}} + \frac{1}{\bar\gamma}\,\ln\frac{d}{\pi_\star\,\delta}\right),$$
then, upon receiving a sequence $(X_1, \dots, X_m) \sim L_\alpha(P)$, it holds
$$\|\hat P - P\|_\infty \le \varepsilon$$
with probability at least $1 - \delta$.
Lemma 4.4 (Example 3.6).
Let $\overline{P} \in L_\alpha^{-1}\big(\mathcal{M}_d^{erg}(\bar\pi, \bar\gamma)\big)$ and $\varepsilon, \delta \in (0,1)$. There exist a universal constant $c > 0$ and an identity tester such that if
$$m \ge \frac{c}{\bar\pi}\left(\frac{\sqrt{d}}{\big((1-\alpha)\varepsilon\big)^{2}} + \frac{1}{\bar\gamma}\right)\ln\frac{d}{\delta},$$
then, upon receiving a sequence $(X_1, \dots, X_m) \sim L_\alpha(P)$ and the reference $L_\alpha(\overline{P})$, the tester distinguishes between $P = \overline{P}$ and $\|P - \overline{P}\|_\infty \ge \varepsilon$
with probability at least $1 - \delta$.
4.2 Bounds on the mixing time and the pseudo spectral gap of the $\alpha$-lazy version
A natural question that arises is by how much the sample complexity can increase if we unnecessarily move to the $\alpha$-lazy version, i.e., if the Markov chain $P$ is already ergodic. The following lemma, which bounds the mixing time of the $\alpha$-lazy version in terms of $\alpha$ and the mixing time of the original Markov chain, seems to be folklore. We give its proof for completeness in Section 5.2 of the Appendix.
Lemma 4.5.
Let $P \in \mathcal{M}_d^{erg}$ and let $\varepsilon \in (0,1)$. Then
$$t_{mix}\big(L_\alpha(P), \varepsilon\big) \le \frac{2}{1-\alpha}\left(t_{mix}\big(P, \varepsilon/2\big) + \frac{2}{1-\alpha}\ln\frac{2}{\varepsilon}\right).$$
In particular, there exists a universal constant $c > 0$ such that
$$t_{mix}\big(L_\alpha(P)\big) \le \frac{c}{(1-\alpha)^{2}}\, t_{mix}(P).$$
A similar result to Lemma 4.5 seems to hold for the pseudo spectral gap:
Conjecture 4.6.
There exists a function $f : (0,1) \to (0, \infty)$ such that for every $P \in \mathcal{M}_d^{erg}$ it holds $\gamma_{ps}\big(L_\alpha(P)\big) \ge f(\alpha)\,\gamma_{ps}(P)$.
As a consequence of Theorem 1.4, whose proof we now give, we establish Conjecture 4.6 for a certain class of chains that contains all ergodic and reversible Markov chains.
Proof of Theorem 1.4
Let $\pi$ be the stationary distribution of $P$ and notice that $\pi$ is also the stationary distribution of $L_\alpha(P)$. Denote
$$R = \frac{P + P^{*}}{2}.$$
By Levin and Peres (2017, Lemmas 13.11 and 13.12) and noticing that $\big(L_\alpha(P)\big)^{*} = \alpha I + (1-\alpha)P^{*}$, we have
$$M\big(L_\alpha(P)\big) = \alpha^{2} I + 2\alpha(1-\alpha)R + (1-\alpha)^{2}M(P). \tag{7}$$
It is well known (e.g. Marshall et al. (1979, Theorem 9.F.4)) that for symmetric matrices $A$ and $B$,
$$\lambda_{2}(A + B) \le \lambda_{2}(A) + \lambda_{1}(B).$$
Thus, conjugating (7) by $D_\pi^{1/2}$, which makes all three summands symmetric while preserving their eigenvalues, and using $\lambda_1(R) \le 1$,
$$\lambda_{2}\big(M(L_\alpha(P))\big) \le (1-\alpha)^{2}\lambda_{2}\big(M(P)\big) + \alpha^{2} + 2\alpha(1-\alpha) = (1-\alpha)^{2}\lambda_{2}\big(M(P)\big) + 1 - (1-\alpha)^{2}.$$
It follows that
$$\gamma\big(M(L_\alpha(P))\big) = 1 - \lambda_{2}\big(M(L_\alpha(P))\big) \ge (1-\alpha)^{2}\Big(1 - \lambda_{2}\big(M(P)\big)\Big) = (1-\alpha)^{2}\,\gamma\big(M(P)\big).$$
∎
Corollary 4.7.
Let $P \in \mathcal{M}_d^{erg}$ be such that the maximum in (4) is obtained at $k = 1$. Then
$$\gamma_{ps}\big(L_\alpha(P)\big) \ge (1-\alpha)^{2}\,\gamma_{ps}(P).$$
Proof.
By (4) applied to $L_\alpha(P)$, Theorem 1.4 and the assumption on $P$,
$$\gamma_{ps}\big(L_\alpha(P)\big) \ge \gamma\big(M(L_\alpha(P))\big) \ge (1-\alpha)^{2}\,\gamma\big(M(P)\big) = (1-\alpha)^{2}\,\gamma_{ps}(P). \qquad\square$$
Remark 4.8.
The class
$$\big\{ P \in \mathcal{M}_d^{erg} : \text{the maximum in (4) is obtained at } k = 1 \big\},$$
of which ergodic and reversible Markov chains form a subclass, seems to be very large. Indeed, for $P$ to belong to it, it suffices that $\gamma\big(M(P)\big) \ge 1/2$. This follows from the more general observation that if $\gamma\big(M(P)\big) \ge 1/k$ for some $k \ge 2$, then the maximum in (4) is obtained at some $k' < k$ (since $\gamma\big(M(P^{k'})\big)/k' \le 1/k' \le 1/k \le \gamma\big(M(P)\big)$ for every $k' \ge k$).
5 Appendix
5.1 $\ell_1$-projection on $\Delta_d$
In Example 3.2(b) it was necessary to project matrices on $\mathcal{M}_d$ with respect to $\|\cdot\|_\infty$. Since in this norm each row is considered separately, this problem is equivalent to the problem of projecting each of the matrix's rows on $\Delta_d$ with respect to the $\ell_1$ norm. Thus, we need to solve (at most) $d$ optimization problems of the following form: For $y \in \mathbb{R}^{d}$ find an $\ell_1$-projection of $y$ on $\Delta_d$ (cf. Boyd et al. (2004, p. 397)):
$$\text{minimize } \|x - y\|_{1} \quad \text{subject to } x \in \Delta_d, \tag{8}$$
where
$$\Delta_d = \Big\{ x \in \mathbb{R}^{d} : x_i \ge 0 \text{ for every } i \in [d] \text{ and } \sum_{i \in [d]} x_i = 1 \Big\}.$$
Notice that, in contrast to the $\ell_p$-projection for $1 < p < \infty$, optimization problem (8) has, in general, infinitely many solutions and we will understand $\Pi_{\Delta_d}(y)$ as the set of all such solutions.
The following lemma seems to be well known but we were not able to find a reference. In it, only parts (a) and (b) are relevant for our needs.
Lemma 5.1.
Let $y \in \mathbb{R}^{d}$. Denote
$$\sigma = \sum_{i \in [d]} \max(y_i, 0).$$
If $\sigma = 0$ then choose any $x \in \Delta_d$. Otherwise, define $x$ as follows: Set $x_i = 0$ for every $i \in [d]$ with $y_i < 0$. Now, for every $i \in [d]$ with $y_i \ge 0$:

(a) If $\sigma = 1$ then set $x_i = y_i$.

(b) If $\sigma > 1$ then choose any $0 \le x_i \le y_i$ such that $\sum_{i \in [d]} x_i = 1$.

(c) If $\sigma < 1$ then choose any $x_i \ge y_i$ such that $\sum_{i \in [d]} x_i = 1$.

Then $x \in \Pi_{\Delta_d}(y)$.
Proof.
First assume $\sigma = 0$ and let $x \in \Delta_d$. For each $i \in [d]$ denote $c_i = x_i - y_i$. Notice that $c_i \ge 0$. It holds
$$\|x - y\|_{1} = \sum_{i \in [d]} c_i = 1 - \sum_{i \in [d]} y_i,$$
a quantity that does not depend on $x$. Hence every $x \in \Delta_d$ solves (8).
Now assume $\sigma > 0$. Let $z \in \Delta_d$ be arbitrary. We consider each of the three possibilities for $\sigma$ separately:

(a) Without loss of generality, there exists $l \in [d]$ such that $y_i \ge 0$ for every $i \le l$ and $y_i < 0$ for every $i > l$. It holds
$$\|x - y\|_{1} = \sum_{i > l} |y_i| \le \sum_{i > l} \big(z_i + |y_i|\big) \le \|z - y\|_{1}.$$

(b) Without loss of generality, there exists $l \in [d]$ such that $y_i \ge 0$ for every $i \le l$ and $y_i < 0$ for every $i > l$. Then,
$$\|x - y\|_{1} = \sum_{i \le l} (y_i - x_i) + \sum_{i > l} |y_i| = \sigma - 1 + \sum_{i > l} |y_i|. \tag{9}$$
Thus, it suffices to show that
$$\|z - y\|_{1} \ge \sigma - 1 + \sum_{i > l} |y_i|. \tag{10}$$
Indeed,
$$\|z - y\|_{1} = \sum_{i \le l} |z_i - y_i| + \sum_{i > l} |z_i - y_i| \ge \Big(\sigma - \sum_{i \le l} z_i\Big) + \sum_{i > l} \big(z_i + |y_i|\big) \ge \sigma - 1 + \sum_{i > l} |y_i|, \tag{11}$$
where the first inequality uses $\sum_{i \le l} |z_i - y_i| \ge \sum_{i \le l} (y_i - z_i)$ and, by assumption, $y_i < 0$ for every $i > l$, while the second uses $\sum_{i \le l} z_i + \sum_{i > l} z_i = 1$.

(c) Similar to the previous case.
∎
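A quick numerical sanity check of Lemma 5.1 (a sketch of ours): by the proof, the optimal cost of (8) equals $\sum_{i} \max(-y_i, 0) + |\sigma - 1|$, and no randomly sampled point of $\Delta_d$ should do better.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
for _ in range(100):
    y = rng.normal(size=d)
    sigma = np.maximum(y, 0.0).sum()
    # optimal cost of (8) according to the proof of Lemma 5.1
    predicted = np.maximum(-y, 0.0).sum() + abs(sigma - 1.0)
    # random points of the simplex must not beat it
    z = rng.dirichlet(np.ones(d), size=10_000)
    assert np.abs(z - y).sum(axis=1).min() >= predicted - 1e-9
```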
An ordinary triangle inequality trick would have introduced an additional factor of $2$ on the sample complexity in Example 3.2(b). The following lemma shows that this may be avoided:
Lemma 5.2.
Suppose $\mu \in \Delta_d$ and let $y \in \mathbb{R}^{d}$ be such that $\sum_{i \in [d]} y_i = 1$. Let $\varepsilon > 0$ and assume that $\|y - \mu\|_{1} \le \varepsilon$. Then there exists $x \in \Pi_{\Delta_d}(y)$ such that $\|x - \mu\|_{1} \le \varepsilon$.
Proof.
If $y \in \Delta_d$ then $\Pi_{\Delta_d}(y) = \{y\}$ and we may take $x = y$. Otherwise, assume without loss of generality that there is $l \in [d]$ such that $y_i \ge 0$ for $i \le l$ and $y_i < 0$ for $i > l$. Let $k \le l$ be minimal such that $\sum_{i \le k} y_i \ge 1$ (such $k$ must exist since $\sum_{i \le l} y_i \ge \sum_{i \in [d]} y_i = 1$). Define $x$ as follows: For $i \in [d]$ let
$$x_i = \begin{cases} y_i, & i < k,\\ 1 - \sum_{j < k} y_j, & i = k,\\ 0, & i > k. \end{cases}$$
By Lemma 5.1(b), $x \in \Pi_{\Delta_d}(y)$. It holds
$$\|x - \mu\|_{1} - \|y - \mu\|_{1} = \big(|x_k - \mu_k| - |y_k - \mu_k|\big) + \sum_{i > k}\big(\mu_i - |y_i - \mu_i|\big) \le (y_k - x_k) + \sum_{i > k} y_i = 0,$$
where we used $|x_k - \mu_k| \le |y_k - \mu_k| + (y_k - x_k)$, $\mu_i - |y_i - \mu_i| \le y_i$ and $\sum_{i > k} y_i = 1 - \sum_{i \le k} y_i = -(y_k - x_k)$. Hence $\|x - \mu\|_{1} \le \|y - \mu\|_{1} \le \varepsilon$.
∎
5.2 Proof of Lemma 4.5
Proof.
Let $\pi$ be the stationary distribution of $P$ and suppose $(X_s)_{s \ge 0} \sim P$. Let $(B_t)_{t \ge 1}$ be a sequence of i.i.d. random variables, independent from $(X_s)_{s \ge 0}$, such that $B_t \sim \mathrm{Ber}(1-\alpha)$ for each $t \ge 1$. For $t \ge 1$ define $N_t = \sum_{s = 1}^{t} B_s$ and $Y_t = X_{N_t}$ (with $Y_0 = X_0$). Clearly, $(Y_t)_{t \ge 0} \sim L_\alpha(P)$.
For $s, t \in \mathbb{N}$ to be determined later we have, for every $i \in [d]$:
$$\big\| L_\alpha(P)^{t}(i, \cdot) - \pi \big\|_{TV} \le \mathbb{P}(N_t < s) + \max_{s' \ge s} \big\| P^{s'}(i, \cdot) - \pi \big\|_{TV}.$$
Consider the second term in the last expression: Since the total variation distance to stationarity is monotone decreasing (cf. Lalley (2009, Proposition 7)), we have
$$\max_{s' \ge s} \big\| P^{s'}(i, \cdot) - \pi \big\|_{TV} = \big\| P^{s}(i, \cdot) - \pi \big\|_{TV}.$$
Thus, for $s = t_{mix}(P, \varepsilon/2)$ it holds
$$d_{L_\alpha(P)}(t) \le \mathbb{P}(N_t < s) + \frac{\varepsilon}{2}.$$
Now, by Hoeffding's inequality, if $(1-\alpha)t - s \ge \sqrt{t \ln(2/\varepsilon)/2}$ then
$$\mathbb{P}(N_t < s) \le e^{-2\left((1-\alpha)t - s\right)^{2}/t} \le \frac{\varepsilon}{2}.$$
Taking $t \ge \frac{2}{1-\alpha}\big(s + \frac{2}{1-\alpha}\ln\frac{2}{\varepsilon}\big)$ we obtain
$$\big((1-\alpha)t - s\big)^{2} \ge \Big(s + \frac{4}{1-\alpha}\ln\frac{2}{\varepsilon}\Big)^{2} \ge \frac{t}{2}\ln\frac{2}{\varepsilon}.$$
It follows that if $t \ge \frac{2}{1-\alpha}\big(t_{mix}(P, \varepsilon/2) + \frac{2}{1-\alpha}\ln\frac{2}{\varepsilon}\big)$ then $d_{L_\alpha(P)}(t) \le \varepsilon$. Thus, we may take
$$t_{mix}\big(L_\alpha(P), \varepsilon\big) \le \frac{2}{1-\alpha}\Big(t_{mix}\big(P, \varepsilon/2\big) + \frac{2}{1-\alpha}\ln\frac{2}{\varepsilon}\Big).$$
The last assertion follows from Levin and Peres (2017, (4.36) on p. 55). ∎
References
- Batu et al. (2001) T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, and P. White. Testing random variables for independence and identity. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 442–451. IEEE, 2001.
- Boyd et al. (2004) S. Boyd, S. P. Boyd, and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.
- Bui and Sohier (2007) A. Bui and D. Sohier. How to compute times of random walks based distributed algorithms. Fundamenta Informaticae, 80(4):363–378, 2007.
- Chan et al. (2021) S. O. Chan, Q. Ding, and S. H. Li. Learning and testing irreducible Markov chains via the $k$-cover time. In Algorithmic Learning Theory, pages 458–480. PMLR, 2021.
- Cherapanamjeri and Bartlett (2019) Y. Cherapanamjeri and P. L. Bartlett. Testing symmetric Markov chains without hitting. In Conference on Learning Theory, pages 758–785. PMLR, 2019.
- Daskalakis et al. (2018) C. Daskalakis, N. Dikkala, and N. Gravin. Testing symmetric Markov chains from a single trajectory. In Conference On Learning Theory, pages 385–409. PMLR, 2018.
- Ding et al. (2011) J. Ding, J. R. Lee, and Y. Peres. Cover times, blanket times, and majorizing measures. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 61–70, 2011.
- Feige and Rabinovich (2003) U. Feige and Y. Rabinovich. Deterministic approximation of the cover time. Random Structures & Algorithms, 23(1):1–22, 2003.
- Feige and Zeitouni (2009) U. Feige and O. Zeitouni. Deterministic approximation for the cover time of trees. arXiv preprint arXiv:0909.2005, 2009.
- Fried and Wolfer (2021) S. Fried and G. Wolfer. Identity testing of reversible Markov chains. arXiv preprint arXiv:2105.06347, 2021.
- Han et al. (2015) Y. Han, J. Jiao, and T. Weissman. Minimax estimation of discrete distributions. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 2291–2295. IEEE, 2015.
- Hao et al. (2018) Y. Hao, A. Orlitsky, and V. Pichapati. On learning Markov chains. arXiv preprint arXiv:1810.11754, 2018.
- Hermon (2016) J. Hermon. Maximal inequalities and mixing times. PhD thesis, UC Berkeley, 2016.
- Kamath et al. (2015) S. Kamath, A. Orlitsky, D. Pichapati, and A. T. Suresh. On learning distributions from their samples. In Conference on Learning Theory, pages 1066–1100. PMLR, 2015.
- Lalley (2009) S. P. Lalley. Convergence rates of Markov chains, 2009.
- Levin and Peres (2017) D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- Marshall et al. (1979) A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: theory of majorization and its applications, volume 143. Springer, 1979.
- Montenegro and Tetali (2006) R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Foundations and Trends in Theoretical Computer Science, 1(3):237–354, 2006.
- Orlitsky and Suresh (2015) A. Orlitsky and A. T. Suresh. Competitive distribution estimation: Why is Good-Turing good. In NIPS, pages 2143–2151, 2015.
- Paulin (2015) D. Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Probab., 20:32 pp., 2015. doi: 10.1214/EJP.v20-4039. URL https://doi.org/10.1214/EJP.v20-4039.
- Valiant and Valiant (2017) G. Valiant and P. Valiant. An automatic inequality prover and instance optimal identity testing. SIAM Journal on Computing, 46(1):429–455, 2017.
- Wolfer and Kontorovich (2019) G. Wolfer and A. Kontorovich. Estimating the mixing time of ergodic Markov chains. In A. Beygelzimer and D. Hsu, editors, Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 3120–3159, Phoenix, USA, 25–28 Jun 2019. PMLR. URL http://proceedings.mlr.press/v99/wolfer19a.html.
- Wolfer and Kontorovich (2020) G. Wolfer and A. Kontorovich. Minimax testing of identity to a reference ergodic Markov chain. In S. Chiappa and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 191–201, Online, 26–28 Aug 2020. PMLR. URL http://proceedings.mlr.press/v108/wolfer20a.html.
- Wolfer and Kontorovich (2021) G. Wolfer and A. Kontorovich. Statistical estimation of ergodic Markov chain kernel over discrete state space. Bernoulli, 27(1):532–553, 02 2021. doi: 10.3150/20-BEJ1248. URL https://doi.org/10.3150/20-BEJ1248.