Deciding Subspace reachability problems with application to Skolem’s Problem

Abstract.

The higher-dimensional version of Kannan and Lipton’s Orbit Problem asks whether it is decidable if a target vector space can be reached from a starting point under repeated application of a linear transformation. This problem has remained open since its formulation, and in fact generalizes Skolem’s Problem — a long-standing open problem concerning the existence of zeros in linear recurrence sequences. Both problems have traditionally been studied using algebraic and number theoretic machinery. In contrast, this paper reduces the Orbit Problem to an equivalent version in real projective space, introducing a basic geometric reference for examining and deciding problem instances. We find this geometric toolkit enables basic proofs of sweeping assertions concerning the decidability of certain problem classes, including results where the only other known proofs rely on sophisticated number-theoretic arguments.

Key words and phrases:

Orbit Problem, Skolem Problem, Reachability, Linear Dynamical Systems

2010 Mathematics Subject Classification:

11B37, 03D99, 68Q01

The author is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 2140001.

1. Introduction

In a pair of seminal papers [kannan1980orbit, kannan1986polynomial] Kannan and Lipton introduced the Orbit Problem, motivated by reachability questions of linear sequential machines raised by Harrison in 1969 [harrison1969lectures]. Given a linear transformation $A\in\mathbb{Q}^{d\times d}$ , and elements $x,y\in\mathbb{Q}^{d}$ , the Orbit Problem asks whether it can be decided if there exists an $n\in\mathbb{N}$ , such that $A^{n}x=y$ . Kannan and Lipton proved decidability of this point-to-point version of the problem, but remarked that when the target is a subspace $U$ of $\mathbb{Q}^{d}$ , the problem becomes far more difficult. Despite the importance of this problem due to its connection with Skolem’s Problem and areas of program verification, no progress was made on the higher-dimensional Orbit Problem until 2013, when Chonev, Ouaknine, and Worrell proved that the higher-dimensional Orbit Problem is decidable whenever the dimension of the target space is one, two, or three [chonev2013orbit, chonev2016complexity]. In following work, the same authors considered a version of the problem where the target space is not a subspace but rather a polytope [chonev2014polyhedron], proving decidability when the target is of dimension three or less, and hardness with respect to long-standing number theoretic open problems for higher dimensions.

Since this set of breakthroughs however, little progress has been made on the higher-dimensional Orbit Problem, due in large part to a severe lack of structure. There have been results establishing decidability for generalized cases in which the source and target sets are both either polytopes [almagorPolytopeCollision] or semi-algebraic sets [almagorSemialgebraicOrbit], however, such results continue to only apply to the case when the dimension is at most three. An alternative approach present in recent literature involves generating sets invariant under the action of the matrix $A$ that contain the starting point, and then proving such sets are disjoint from the target set [de2018left, fijalkow2019complete, almagor2022minimal]. However, the non-existence of an invariant set does not imply decidability, and these results either only apply to instances for which decidability is already known, or fail to capture natural classes of problem instances and are conditioned on long-standing number theoretic conjectures. This body of literature constitutes the state-of-the-art, speaking to the difficulty of the problem.

Our work advances the state-of-the-art by providing a geometric machinery used to determine a broad category of decidable instances of the higher-dimensional Orbit Problem, not by restricting the dimension of the target and ambient spaces, but rather the spectral structure of the matrix. The main result of this paper, stated below, concerns a generalization of the higher-dimensional Orbit Problem where the target space is a union of subspaces with convex polytopes.

Theorem 1.

Let $(A,x,U)$ be any non-degenerate¹¹1An instance is non-degenerate if the matrix $A$ has no two distinct eigenvalues whose quotient is a root of unity. It is well-known that any degenerate instance can be effectively reduced to solving a collection of non-degenerate instances. instance of the Orbit Problem, where $A\in\mathbb{Q}^{d\times d}$ , $x\in\mathbb{Q}^{d}$ is non-trivial²²2We say a point $x$ is non-trivial (w.r.t. a matrix $A$ ) if $x$ has a non-zero component with respect to the Jordan blocks of Jordan matrix $J(A)$ . The non-triviality condition on $x$ assures the problem does not collapse to a lower-dimension; if the condition is removed an essentially identical statement holds., and $U$ is a target set in $\mathbb{Q}^{d}$ composed of a finite union of subspaces with convex polytopes, described by rational parameters.

Suppose $A$ has $r$ distinct eigenvalues $\lambda_{1},...,\lambda_{r}$ of maximal modulus. Let $n_{i}$ denote the algebraic multiplicity of the root $\lambda_{i}$ , $i=1,...,r$ , in the minimal polynomial of $A$ . Suppose $n_{1}=\cdots=n_{p}$ , $p\leq r$ , are the largest such values. Then there exists a subspace $W$ computable from $A$ and $x$ of dimension $p$ , such that if $W\cap U=\{0\}$ , it is decidable whether there exists $n\in\mathbb{N}$ such that $A^{n}x\in U$ .

Intuitively, Theorem 1 can be thought of as solving “typical” instances of the Orbit Problem whenever $U$ is a subspace and $p+\text{dim}(U)\leq d$ . But the Theorem is to be primarily viewed as providing a tool for determining decidability of problem classes. For instance, one immediate application of Theorem 1 is in proving the result of Chonev, Ouaknine, and Worrell that the higher-dimensional Orbit Problem is decidable whenever the dimension of the target space is one [chonev2013orbit, chonev2016complexity]. Namely, we use Theorem 1 to give a simple geometric proof of the following

Theorem 2.

Let $(A,x,U)$ be any non-degenerate instance of the Orbit Problem, where $A\in\mathbb{Q}^{d\times d}$ is nonsingular, $x\in\mathbb{Q}^{d}$ , and $U=\emph{span}(v)$ for $v\in\mathbb{Q}^{d}\setminus\{0\}$ . Then it is decidable whether there exists an $n\in\mathbb{N}$ such that $A^{n}x\in U$ .

The use of Theorem 1 reaches beyond the Orbit Problem to Skolem’s Problem as well, which is known to reduce to the higher-dimensional Orbit Problem. Skolem’s Problem is a long-standing open question asking whether it is decidable if the set of zeros of a linear recurrence sequence is empty (see e.g. [ouaknine2015linear, halava2005skolem] for review), and is known to be an impenetrable problem where even simple cases escape our most advanced techniques. Indeed, Tao described the openness of this problem as “faintly outrageous,” as it indicates “we do not know how to decide the halting problem even for linear automata” [tao2008structure].

Decidability of Skolem’s Problem for linear recurrence sequences of dimension three and four was given in the 1980’s [shorey1984distance, vereshchagin1985occurrence], along with decidability when there are at most three dominating characteristic roots of the sequence (see [sha2019effective] for refinements of this work). The methods used to prove such results rely crucially on sophisticated results in transcendental number theory, particularly Baker’s lower bounds on the magnitudes of linear forms in logarithms of algebraic numbers, and van der Poorten’s results in the setting of $p$ -adic valuations. In contrast, we use Theorem 1 paired with a basic geometric insight to prove decidability of Skolem’s Problem for the case of arbitrarily many dominating roots, so long as the set of dominating roots contains a real root with largest algebraic multiplicity in the minimal polynomial.

Theorem 3.

Let $\{u_{n}\}_{n=0}^{\infty}$ be an order $d$ non-degenerate linear recurrence sequence, with $A$ the associated matrix, and suppose the initial terms form a vector $x\in\mathbb{Q}^{d}$ non-trivial with respect to the matrix $A$ . Suppose the linear recurrence sequence has $r\leq d$ distinct characteristic roots $\lambda_{1},...,\lambda_{r}$ of largest modulus. Let $n_{i}$ denote the algebraic multiplicity of the root $\lambda_{i}$ , $i=1,...,r$ , in the minimal polynomial of $A$ . Suppose $\lambda_{1}$ is real, and $n_{1}>n_{i}$ , $i=2,...,r$ . Then it is decidable whether the sequence has a zero term.

We emphasize that in both Theorem 1 and 3, we care only about the multiplicity of the roots in the minimal polynomial of $A$ , which need not coincide with the multiplicity of the roots of the characteristic polynomial. Moreover, in contrast to many of the results towards Skolem’s problem such as [sha2019effective, biluMFCSskolem], we highlight that Theorem 3 makes no assumptions about the simplicity (diagonalizability) of the linear recurrence sequence, nor about the order of the sequence.

1.1. Techniques

Traditionally, work toward determining the decidability of the Orbit Problem and related problems such as Skolem’s Problem has leveraged heavy machinery from various domains of algebraic geometry, mathematical logic, and number theory — particularly algebraic and transcendental number theory. In contrast, our approach is to develop a machinery suitable for application to the Orbit Problem by leveraging basic geometric and dynamical insight.

In essence, we consider not the dynamical system $(\mathbb{R}^{d},A)$ the Orbit Problem is formulated in, but rather the associated induced dynamical system over real projective space (i.e. after projecting onto $S^{d-1}$ and identifying opposite points). Working instead in $\mathbb{R}\mathbb{P}^{d-1}$ , we can ask an equivalent version of the Orbit Problem that is decidable if and only if the original formulation is decidable. We find that by compactifying the state space in such a way, the asymptotic behavior of the system becomes far more understandable: in projective space, we have asymptotically stable (converging) orbits, while no-such behavior is expressed in Euclidean space. As a consequence, we can use arguments unavailable in the original formulation. The central contribution of our paper can then be interpreted as a fine-grained examination of the $\omega$ -limit sets — or more generally the stable manifolds — of dynamical systems $(\mathbb{R}\mathbb{P}^{d-1},f)$ for $f\in PGL(\mathbb{R}^{d})$ , namely linear dynamical systems over real projective space, with this deepened understanding lending itself to deciding the higher-dimensional Orbit Problem.

This reduction is both surprising and noteworthy for a number of reasons. First, we note that although we lose an entire manifold dimension after reducing the problem to projective space, we maintain all the necessary information for deciding the Orbit Problem — indeed, the reduction can be seen as keeping only the “essential” aspects of the Orbit Problem. In addition, this reduction compactifies the state space, and as a consequence enables the application of general tools and intuition from the study of continuous dynamical systems over a compact space, for which there is far more structure and results than in the unbounded case, enabling the application of an entire area of mathematics not used to date.

To the best of our knowledge, no previous work on the Orbit Problem and related questions leverage the techniques introduced here. However, a few works come close in spirit to the asymptotic analysis used here, particularly work on termination of linear loops (see [akshay2022robustness, ouaknine2014positivity, braverman2006termination, hosseiniIntegerTermination, tiwari2004termination]), where the observation that the largest eigenvalues dominate the behavior of the program is used repeatedly. A similar technique is well-known in the study of linear recurrence sequences, whereby restricting the spectral structure of the problem and using the closed-form solution of an LRS, it can be clear how the characteristic roots of maximal modulus dominate the asymptotic behavior. Although this paper leverages the same basic principle, we take it much farther to obtain finer results.

1.2. Related Work

One of the original sources of motivation for studying the Orbit Problem was Skolem’s Problem, due to the fact that Skolem’s Problem reduces to the Orbit Problem when the target space has dimension $d-1$ . Although Skolem’s Problem is easy to describe, it is difficulty to prove deep theorems on account of a lack of machinery. Indeed, in terms of lower bounds, it is known that Skolem’s Problem is NP-hard [blondel2002presence] — which translates to the Orbit Problem when restriction is not placed on the dimension of the target subspace. Moreover, deep work has shown that further advances on the Skolem and Orbit Problem for higher dimensions are likely to be hard due to the fact that such advances in related problems would entail major advances in Diophantine approximation [ouaknine2014positivity]. Despite these difficulties, there continues to be notable advances toward Skolem’s problem (see e.g. [lipton2022skolem, biluMFCSskolem]), including recent promising new techniques by way of Universal Skolem Sets [kenison2020skolem, luca2021universal, lucaMFCSuniversal, luca2023skolem]. Nonetheless, despite the intense study on this problem, our understanding remains incomplete.

Another principle source of motivation in studying the Orbit Problem and related questions comes from the problem of program verification. Specifically, enormous effort has been dedicated to solving the “Termination Problem,” chiefly concerned with deciding whether a “linear while loop” will terminate (see e.g. [tiwari2004termination, braverman2006termination, hosseiniIntegerTermination, cook2006termination, cook2006terminator, bozga2012deciding, ben2013linear, bradley2005termination, ouaknine2014termination, karimov2022s, vyalyi2011orbits, luca2022algebraic]). In particular, we note that the Polytope Hitting Problem where the target is an intersection of half spaces, is closely connected to the higher-dimensional Orbit Problem, and more immediately translates to problems of program verification and termination of linear loops [chonev2014polyhedron].

Acknowledgements

The author is grateful to David Cash for the continued feedback and support.

2. Preliminaries

The purpose of this section is to establish notation, as well as results and machinery used for the proofs of Theorems 1, and 3 in Section 3. Sections 2.3 and 2.4 are the richest, where the key lemmas of this paper are formed.

Given an instance $(A,x,U)$ of the Orbit Problem, always take $A\in\mathbb{Q}^{d\times d}$ , $x\in\mathbb{Q}^{d}$ , and $U$ presented via sets of basis vectors from $\mathbb{Q}^{d}$ . With such a basis in hand it is clearly decidable whether a vector $A^{n}x\in\mathbb{Q}^{d}$ is in $U$ . For instance, to verify whether $A^{n}x\in U$ when $U$ is just a subspace, we compute the determinant $B^{T}B$ where $B$ is the matrix whose columns are $A^{n}x$ and the basis vectors specifying $U$ . Then $n$ is a witness to the problem if and only if this determinant is zero.

For the purposes of this paper, there is no harm in additionally supposing that the instances of the Orbit Problem are non-degenerate. An instance $(A,x,U)$ of the Orbit Problem is degenerate if there exists two distinct eigenvalues of $A$ whose quotient is a root of unity. If not, the instance is non-degenerate. Any instance of the Orbit Problem can be reduced to a finite set of non-degenerate instances.

Let $\lambda_{1},...,\lambda_{m}$ label the eigenvalues of $A\in\mathbb{Q}^{d\times d}$ , $m\leq d$ . We always assume the roots are labelled so that $|\lambda_{1}|\geq|\lambda_{2}|\geq\cdots\geq|\lambda_{m}|$ . In the case $|\lambda_{1}|=\cdots=|\lambda_{r}|>|\lambda_{r+1}|$ we say that $A$ has exactly $r$ roots of maximal modulus, or $r$ dominating roots.

In this paper we work in ambient Euclidean space $\mathbb{R}^{d}$ , with orbits $\{A^{n}x\}_{n=0}^{\infty}$ contained in $\mathbb{Q}^{d}$ . All standard algebraic operations, such as sums, products, root finding of polynomials, and computing Jordan canonical forms of rational matrices [cai1994computing] are well-known to be computable, and hence we do not discuss the details of such procedures (see Section 2.1 for details on the effective manipulation of algebraic numbers).

Recall every matrix $A\in\mathbb{R}^{d\times d}$ can be written in the form $A=QJQ^{-1}$ where $Q$ is nonsingular and $J$ is the Jordan canonical form of $A$ . Let $J=J(A)$ denote the Jordan matrix of $A$ . For the purpose of this paper, we use the real Jordan canonical form.

Lemma 1 (Real Jordan Canonical Form).

For any matrix $A\in\mathbb{R}^{d\times d}$ , there exists a basis in which the matrix of $A$ is a quasi-diagonal matrix $J=\text{Diag}(J_{1},...,J_{N})$ where each block $J_{i}$ is of form

\begin{pmatrix}\lambda_{t}&1&0&\cdots&0\\ 0&\lambda_{t}&1&\cdots&0\\ \vdots&&\ddots&\ddots\\ 0&0&\cdots&\lambda_{t}&1\\ 0&0&\cdots&&\lambda_{t}\end{pmatrix}\text{ or }\begin{pmatrix}\alpha_{l}&\beta_{l}&1&0&0&\cdots&&&&0\\ -\beta_{l}&\alpha_{l}&0&1&0&\cdots&&&&0\\ 0&0&\alpha_{l}&\beta_{l}&1&0\\ \vdots&\vdots&-\beta_{l}&\alpha_{l}&0&1\\ &&&&\alpha_{l}&\beta_{l}\\ &&&&-\beta_{l}&\alpha_{l}\\ &&&&&&\ddots\\ &&&&&&\alpha_{l}&\beta_{l}&1&0\\ \vdots&\vdots&&&&&-\beta_{l}&\alpha_{l}&0&1\\ 0&0&\cdots&&&&&&\alpha_{l}&\beta_{l}\\ 0&0&\cdots&&&&&&-\beta_{l}&\alpha_{l}\end{pmatrix},

where the $\lambda_{t}$ , $t=1,...,u$ are the real eigenvalues of $A$ , and the $\lambda_{l},\bar{\lambda}_{l}=\alpha_{l}\pm i\beta_{l}$ , $l=1,...,s$ , $\beta>0$ are the complex eigenvalues of $A$ . The sizes of the blocks are determined by the elementary divisors of $A$ .

As such, we have $A=QJQ^{-1}$ , and if $A$ is an algebraic matrix, then $J$ and $Q$ are also algebraic matrices and their entries can be computed from the entries of $A$ . The usefulness of the real Jordan canonical form follows from the fact that in the case the characteristic polynomial of $A$ has complex roots, the Jordan blocks of $J$ continue to have real entries. For further details on the real Jordan canonical form including proof of Lemma 1 and effective methods for computing $J$ , consult [shilov2012linear]. For the remainder of this paper, we suppose $J$ is the real Jordan canonical form of $A$ . In addition, suppose Jordan blocks $J_{i}$ , $i=1,...,N$ are of dimension $D_{i}$ , and associated with eigenvalues $\lambda_{i}$ . Of course, there may be many Jordan blocks associated with a single eigenvalue $\lambda_{i}$ , so $N\geq m$ , but we often abuse notation and say $J_{i}$ associates to $\lambda_{i}$ to speak of the collection of Jordan blocks associated with $\lambda_{i}$ — in the event more precise language is needed we use it.

Moreover, we generally assume initial points $x$ are non-trivial (with respect to a matrix $A$ ). That is, a point $x$ is non-trivial if in the Jordan basis of the matrix $A$ , $x$ has at least one-non-zero component with respect to each Jordan block composing $J(A)$ . This assumption prevents a “collapse” to lower order instances, and is not technically necessary in most cases — if it is removed then the statements are often very similar if not ultimately identical. Nonetheless, we keep the assumption as it simplifies matters.

We reduce the Orbit Problem to a simpler version where $A$ is taken to be a block-diagonal Jordan matrix by way of the following

Lemma 2.

Deciding instances $(A,x,U)$ of the Orbit Problem reduces to deciding instances $(J,x^{\prime},U^{\prime})$ where $J=J(A)$ is the Jordan matrix of $A$ .

Proof.

Let $A=QJQ^{-1}$ with $J$ the Jordan canonical form of $A$ , $Q$ nonsingular, and recall $A^{k}=QJ^{k}Q^{-1}$ . Then $A^{k}x\in U$ for some $k\in\mathbb{N}$ implies

QJ^{k}Q^{-1}x\in U,\text{ implying }J^{k}(Q^{-1}x)\in Q^{-1}U.

Letting $Q^{-1}x=x^{\prime}$ and $Q^{-1}U=U^{\prime}$ , it follows that $A^{k}x\in U$ if and only if $Jx^{\prime}\in U^{\prime}$ . ∎

As a consequence of Lemma 2, for the remainder of this paper when working with an instance $(A,x,U)$ of the Orbit Problem, we implicitly work with the equivalent instance $(J,x^{\prime},U^{\prime})$ .

2.1. Algebraic numbers and computing eigenvalues, eigenvectors

Let $\mathbb{A}$ denote the field of algebraic numbers; a complex number $\alpha$ is algebraic if it is a root of a single variable polynomial with integer coefficients. The matrix $A$ is taken to have rational entries, so all eigenvalues $\lambda$ of $A$ are in $\mathbb{A}$ . This paper requires we effectively perform various operations with the eigenvalues of $A$ , which may not lie in $\mathbb{Q}$ . Fortunately, there is a sizable literature concerning computation with algebraic numbers (see [AGAlgorithms, cohen2013course], or the appendix of [chonev2016complexity] for a useful review). Below, we summarize the basic properties of the effective manipulation of algebraic numbers pertaining to this paper.

For $\alpha\in\mathbb{A}$ , the defining polynomial of $\alpha$ , denoted $p_{\alpha}$ , is the unique polynomial of least degree vanishing at $\alpha$ , where the coefficients do not have common factors. Define the degree and height $H(p)$ of $\alpha$ to be the degree of $p$ , and the maximum of the absolute values of $p$ ’s coefficients, respectively. As such, standard finite encoding of an algebraic number $\alpha$ is as a tuple composed of its defining polynomial, and rational approximations of its real and imaginary parts of precision sufficient to distinguish $\alpha$ from the other roots of $p_{\alpha}$ . That is, $\alpha\in\mathbb{A}$ takes the representation $(p_{\alpha},a,b,r)\in\mathbb{Z}[x]\times\mathbb{Q}^{3}$ , when $\alpha$ is the unique root of $p_{\alpha}$ inside a circle in $\mathbb{C}$ of radius $r$ , centered at $a+bi$ . This representation of $\alpha$ is well-defined and computable in part thanks to a useful separation bound of Mignotte [mignotte1982some], which asserts that for distinct roots $\alpha,\beta$ of polynomial $p\in\mathbb{Z}[x]$

|\alpha-\beta|>\frac{\sqrt{6}}{d^{(d+1)/2}H^{d-1}},

where $d=deg(p)$ and $H=H(p)$ . As a consequence, when $r$ is less than the root separation bound, we have equality checking between algebraic numbers. Given distinct $\alpha,\beta\in\mathbb{A}$ these are roots of $p_{\alpha}p_{\beta}$ , which is of degree at most $deg(\alpha)+deg(\beta)$ , and of height at most $H(\alpha)H(\beta)$ . Then one may compute $\alpha+\beta$ , $\alpha\beta$ , $1/\alpha$ , $\overline{\alpha}$ , $|\alpha|$ , decide whether $\alpha>\beta$ for distinct algebraic $\alpha$ and $\beta$ , and more.

For the remainder of this paper, we do not make explicit use of these methods, implicitly assuming they are used in the following procedures for deciding instances of the higher-dimensional Orbit Problem. Indeed, it should be clear from the algorithms given in the sequel that the we never leave the field of algebraic numbers. Although, the techniques of this paper do not in fact require the precision the effective manipulation of algebraic numbers provides: all our techniques are quite amenable to the finite-precision setting. So long as precision can be increased as needed, for our purposes a finite approximation of a number paired with an error bound more than suffices. Nonetheless, for the sake of clarity we opt to compute with algebraic numbers as discussed above.

As a consequence of these observations, we see that we have effective techniques for computing the eigenvalues and eigenvectors of rational matrices $A$ .

2.2. Linear recurrence sequences

We now consider basic aspects of linear recurrence sequences. For a rich treatment see [everest2003recurrence]. A linear recurrence sequence (LRS) over $\mathbb{Q}$ is an infinite sequence $\{u_{n}\}_{n=0}^{\infty}$ of terms in $\mathbb{Q}$ satisfying the recurrence relation

u_{n+d}=a_{d-1}u_{n+d-1}+\cdots+a_{0}u_{n}

where $a_{0},...,a_{d-1}\in\mathbb{Q}$ , with $a_{0}\neq 0$ and $u_{j}\neq 0$ for at least one $j$ in the range $0\leq j\leq m-1$ . We say such an LRS has order $d$ . We call the $a_{0},...,a_{d-1}$ the coefficients of the sequence $\{u_{n}\}$ and the initial terms of $\{u_{n}\}$ are $u_{0},...,u_{d-1}$ . The characteristic polynomial of the sequence $\{u_{n}\}$ is

x^{d}-a_{d-1}x^{d-1}-\cdots-a_{0}=\prod_{i=1}^{k}(x-\lambda_{i})^{m_{i}},

with the $\lambda_{1},...,\lambda_{k}$ called the characteristic roots of the sequence $\{u_{n}\}$ .

We call the sequence simple if $k=d$ , so $m_{1}=\cdots=m_{k}=1$ , and non-degenerate if $\lambda_{i}/\lambda_{j}$ is not a root of unity for any distinct $i,j$ . The study of arbitrary LRS can be reduced effectively to that of non-degenerate LRS by partitioning the original LRS into finitely many non-degenerate subsequences.

Given an LRS, define

A=\begin{pmatrix}a_{d-1}&a_{d-2}&\cdots&a_{1}&a_{0}\\ 1&0&\cdots&0&0\\ 0&1&\cdots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&\cdots&1&0\end{pmatrix}

to be the companion matrix of the LRS, and take the vector $x\in\mathbb{Q}^{d}$ as $(u_{d-1},...,u_{0})^{T}$ of initial terms of the LRS. Then iteration of $A$ over $x$ acts as a “shift” on the entries of $x$ , shifting in the next term of the LRS and dropping the oldest term. In addition, the characteristic polynomial of the LRS is the characteristic polynomial of $A$ , and the characteristic roots of the LRS are the eigenvalues of $A$ .

2.3. Angles between flats

The algorithm presented in this paper requires the computation of angles between Euclidean subspaces. The subject of computing angles between Euclidean subspaces — also known as computing angles between flats — is a well-studied area and there are many effective and efficient techniques for doing so. See [BFb0062107, gunawan2005formula, bjorck1973numerical, jiang1996angles] for readable introductions to the subject, and a variety of effective techniques for computing angles between subspaces.

Let $U,W\subset\mathbb{R}^{d}$ be $k$ and $l$ dimensional subspaces, respectively, presented by orthogonal rational basis $\{u_{1},...,u_{k}\}$ and $\{w_{1},...,w_{l}\}$ . Note any linearly independent set of rational vectors can be effectively transformed into an orthogonal set of rational vectors via the Gram-Schmidt Process (before normalization). If instead it is desired that the basis be orthonormal, then the entries of the vectors will be algebraic after the renormalization step.

In this paper we are primarily concerned with determining the minimal angle $0\leq\theta_{1}\leq\pi/2$ between two subspaces, defined as the least angle between any pair of non-zero vectors from $U$ and $W$ . This is the first principle angle between subspaces, whose variational characterization is

(1)

\theta=\min\left\{\arccos\left(\frac{|\langle u,w\rangle|}{||u||~{}||w||}\right):u\in U,w\in W\right\},

where $\langle\cdot,\cdot\rangle$ is the standard inner (dot) product. We refer to the quantity expressed above in Equation 1 as the minimal angle between subspaces $U$ and $W$ . We always assume $U$ and $W$ have trivial intersection $U\cap W=\{0\}$ , so that the minimal angle is well-defined and non-zero.

It is well-known that the variational characterization of singular values implies a variational characterization of the angles between subspaces [golub2013matrix]. Indeed, as a consequence of the variational characterization of the singular value decomposition of a $m\times n$ matrix $A$ , we see that for $x\in\mathbb{C}^{m}$ , $y\in\mathbb{C}^{n}$ , then

\sigma_{1}=\max_{x\in\mathbb{C}^{m},y\in\mathbb{C}^{n}}\frac{|x^{*}Ay|}{||x||\ ||y||},

where $\sigma_{1}$ is the first singular value of $A$ . Dropping the $\arccos$ term in Equation 1, we obtain a new problem, namely maximizing the quantity

(2)

\sigma=\max\left\{\frac{|\langle u,w\rangle|}{||u||\ ||w||}:u\in U,w\in W\right\}.

Note then that $\sigma\leq 1$ . The pair of vectors $u,v$ maximizing the above quantity can be used to obtain the minimal angle between the subspaces $U$ and $W$ .

Combining the above observations, take matrices $X\in\mathbb{R}^{n\times k}$ and $Y\in\mathbb{R}^{n\times l}$ to have columns consisting of orthonormal bases $\{u_{1},...,u_{k}\}$ and $\{w_{1},...,w_{l}\}$ of $U$ and $W$ , respectively. The optimization problem in Equation 2 can then be written as

\sigma=\max_{x\in\mathbb{R}^{k},y\in\mathbb{R}^{l}}\frac{x^{\top}X^{\top}Yy}{||x||\ ||y||},

the solution to which is the largest singular value of $X^{\top}Y$ , by the variational characterization of the singular value decomposition. For more details and proof of the fact that cosines of principle angles come from the singular value decomposition see [knyazev2002principal], or refer to Chapter 5 of [meyer2023matrix] for more details on the variational characterization of singular value decomposition, and additional effective methods for computing the minimal angle between subspaces.

The value $\theta=\arccos\sigma$ is the minimal angle between subspaces $U,W$ . However, we need not learn $\theta$ — in fact $\theta$ need not be an algebraic number due to the $\arccos$ . Rather, it is sufficient for the purposes of this paper to instead work only with the cosine of the minimal angle, namely $\sigma$ . The algorithm presented in this paper only requires that we compare the minimal angle between subspaces, which we can accomplish equipped just with $\sigma$ : supposing $\sigma^{\prime}<\sigma$ , we know $\arccos(\sigma^{\prime})>\arccos(\sigma)$ , and hence the angle corresponding to $\sigma$ is smaller than the angle corresponding to $\sigma^{\prime}$ . In fact it is sufficient not to work with $\sigma$ , but $\sigma^{2}$ , which is the largest of the positive eigenvalues obtained in the singular value decomposition.

The singular value $\sigma$ is merely the square root of an eigenvalue computed in the singular value decomposition (or just the eigenvalue in the $\sigma^{2}$ case), and is hence algebraic, admitting a finite representation as per Section 2.1. As a consequence, we may effectively compare the minimal angles between different pairs of Euclidean subspaces described by rational basis vectors.

For the remainder of this paper, when we speak of angles between subspaces, we are implicitly speaking of the singular value $\sigma$ expressed above. Define the function

(3)

\Gamma(U,W)=\max_{x\in\mathbb{R}^{k},y\in\mathbb{R}^{l}}\frac{x^{\top}X^{\top}Yy}{||x||\ ||y||}=\sigma,

to be the function computing the minimal angle between subspaces $U$ and $W$ as above. Whether $\Gamma(U,W)$ returns $\sigma$ or $\sigma^{2}$ does not matter for the purposes of this paper — intuitively $\Gamma(U,W)$ computes the minimal angle between the subspaces ( $\arccos\Gamma$ is the actual minimal cosine angle), and everything remains computable: the columns of $U$ and $W$ will always be composed of rational (or possibly algebraic) numbers, and $\Gamma$ is effectively computable with such input.

2.4. Angle evolution, and the Orbit Problem in real projective space

Real projective space $\mathbb{R}\mathbb{P}^{d-1}$ is defined to be the quotient of $\mathbb{R}^{d}\setminus\{0\}$ by the equivalence relation $x\sim\alpha x$ where $\alpha\neq 0$ is real and $x\in\mathbb{R}^{d}$ . Clearly, it suffices to consider only vectors $x$ with Euclidean norm $||x||=1$ . Hence, geometrically, projective space is the space of lines through the origin, or alternatively the space obtained by identifying opposite points on the unit sphere $S^{d-1}$ . The projective linear group $PGL(\mathbb{R}^{d})=GL(\mathbb{R}^{d})/\alpha\cdot\text{Id}$ , $\alpha\neq 0$ , is the induced action of the general linear group on projective space. The induced action of non-invertible square matrices over elements of projective space is similarly defined.

Write $\mathbb{P}:\mathbb{R}^{d}\setminus\{0\}\rightarrow\mathbb{R}\mathbb{P}^{d-1}$ for the projection of elements from Euclidean space to real projective space, denoting elements of $\mathbb{R}\mathbb{P}^{d-1}$ by $p=\mathbb{P}x$ , where $0\neq x\in\mathbb{R}^{d}$ . A metric on $\mathbb{R}\mathbb{P}^{d-1}$ is given by

d(\mathbb{P}x,\mathbb{P}y)=\min\left(\left|\left|\frac{x}{||x||}-\frac{y}{||y||}\right|\right|,\left|\left|\frac{x}{||x||}-\frac{-y}{||y||}\right|\right|\right).

In the case that $W$ is a subspace of $\mathbb{R}^{d}$ , define

d(x,W\cap S^{d-1})=\inf_{y\in W\cap S^{d-1}}||x-y||=\min_{y\in W}d(\mathbb{P}x,\mathbb{P}y)=d(\mathbb{P}x,\mathbb{P}W).

Observe that these metrics are intimately related to the function $\Gamma$ defined in the previous subsection, as expressed in the following trivial, albiet useful lemma.

Lemma 3.

Let $U$ be a subspace of $\mathbb{Q}^{d}$ , $x\in\mathbb{Q}^{d}$ , and $A\in\mathbb{Q}^{d\times d}$ . Suppose $\Gamma(A^{n}x,U)\rightarrow 1$ as $n\rightarrow\infty$ . Then

\lim_{n\rightarrow\infty}d(\mathbb{P}A^{n}x,\mathbb{P}U)=0.

Proof.

We show the lemma for the case that $U$ is a one-dimensional subspace spanned by $y\in\mathbb{Q}^{d}$ , which is easily generalized to higher-dimensional $U$ .

We have

\Gamma(A^{n}x,U)=\frac{|A^{n}x\cdot y|}{||A^{n}x||\ ||y||},\ n\in\mathbb{N}.

Then clearly if

\frac{|A^{n}x\cdot y|}{||A^{n}x||\ ||y||}\rightarrow 1\text{ as }n\rightarrow\infty,

we have

d(\mathbb{P}A^{n}x,\mathbb{P}y)=\min\left(\left|\left|\frac{A^{n}x}{||A^{n}x||}-\frac{y}{||y||}\right|\right|,\left|\left|\frac{A^{n}x}{||A^{n}x||}-\frac{-y}{||y||}\right|\right|\right)\rightarrow 0\text{ as }n\rightarrow\infty.

This readily generalizes to the case when $U$ is of higher-dimension, as we sketch below.

The function $\Gamma$ returns the cosine of the minimal angle between two subspaces. Hence, recalling the variational characterization of the minimal angle in Equation 1 after dropping the $\arccos$ , we have

\Gamma(U,W)=\sigma=\min\left\{\left(\frac{|\langle u,w\rangle|}{||u||~{}||w||}\right):u\in U,w\in W\right\}.

So, abusing notation and letting $A^{n}x$ label the subspace spanned by $A^{n}x$ , we have

\Gamma(A^{n}x,W)=\sigma=\min\left\{\left(\frac{|\langle A^{n}x,w\rangle|}{||A^{n}x||~{}||w||}\right):w\in W\right\}.

Combining this with the fact that

d(\mathbb{P}A^{n}x,\mathbb{P}W)=\inf_{y\in W\cap S^{d-1}}||A^{n}x-y||=\min_{y\in W}d(\mathbb{P}A^{n}x,\mathbb{P}y),

the general statement is immediate upon considering the proof of the case when $U$ is one-dimensional, as shown above. ∎

Given $x\in\mathbb{R}^{d}$ , let $X$ denote the one-dimensional subspace (line) spanned by $x$ . Then, for a matrix $A\in\mathbb{R}^{d\times d}$ and subspace $U$ , we see that there exists $n\in\mathbb{N}$ such that $A^{n}x\in U$ if and only if $A^{n}X\subseteq U$ , where $A^{n}X$ is the subspace spanned by the vector $A^{n}x$ . Hence, given an instance $(A,x,U)$ of the Orbit Problem, it is immediate that there is an $n\in\mathbb{N}$ such that $A^{n}x\in U$ if and only if $\mathbb{P}A^{n}x\in\mathbb{P}U$ .

These considerations lead us to the simple yet powerful conclusion that by projecting the Orbit Problem into projective space we still retain the needed information to decide the Orbit Problem — indeed the formulations are equivalent — and yet in projective space the asymptotic behavior of orbits is far more manageable than in Euclidean space. In projective space, we have asymptotically stable (converging) orbits, while no-such behavior is expressed in Euclidean space. We abuse this structure in the sequel, applying the basic insight that if an orbit of a dynamical system monotonically converges to some set, and the attracting set is separated from the target set by some $\epsilon>0$ , then by running the system for finite time we can decide whether the orbit intersects the target set. See [colonius2014dynamical] for details on linear dynamical systems over real projective space.

We begin by stating the following well-known result about Jordan blocks, easily proved by induction.

Lemma 4 (Powers of Jordan blocks).

For a matrix $K=\text{Diag}(J_{1},...,J_{N})$ in Jordan canonical form, its $n$ th power is given by $J^{n}=\text{Diag}(J_{1}^{n},...,J_{N}^{n})$ . Where, for real eigenvalues $\lambda_{i}$ we have

J_{i}^{n}=\begin{pmatrix}\lambda_{i}^{n}&n\lambda_{i}^{n-1}&{n\choose 2}\lambda_{i}^{n-2}&\cdots&{n\choose D_{i}-1}\lambda_{i}^{n-(D_{i}-1)}\\ 0&\lambda_{i}^{n}&n\lambda_{i}^{n-1}&\cdots&{n\choose D_{i}-2}\lambda_{i}^{n-(D_{i}-2)}\\ \vdots&&\ddots&\ddots&\vdots\\ 0&0&\cdots&\lambda_{i}^{n}&n\lambda_{i}^{n-1}\\ 0&0&\cdots&0&\lambda_{i}^{n}\end{pmatrix}

where $D_{i}$ is the dimension of the block $J_{i}$ , and ${n\choose k}=0$ if $n<k$ . And when $J_{i}$ is a Jordan block associated with a complex conjugate eigenvalue pair expressed in the real canonical form as in Lemma 1, letting

R_{i}=\begin{pmatrix}\alpha_{i}&\beta_{i}\\ -\beta_{i}&\alpha_{i}\end{pmatrix},

we have

J_{i}^{n}=\begin{pmatrix}R_{i}^{n}&nR_{i}^{n-1}&{n\choose 2}R_{i}^{n-2}&\cdots&{n\choose D_{i}-1}R_{i}^{n-(D_{i}-1)}\\ 0&R_{i}^{n}&nR_{i}^{n-1}&\cdots&{n\choose D_{i}-2}R_{i}^{n-(D_{i}-2)}\\ \vdots&&\ddots&\ddots&\vdots\\ 0&0&\cdots&R_{i}^{n}&nR_{i}^{n-1}\\ 0&0&\cdots&0&R_{i}^{n}\end{pmatrix}

with $J_{i}$ of dimension $2D_{i}$ .

Note that that when $\lambda=\alpha\pm i\beta$ is complex $|\lambda|=\sqrt{\alpha^{2}+\beta^{2}}$ , hence with $\theta\in[0,2\pi)$ determined by $\cos\theta=\alpha/\sqrt{\alpha^{2}+\beta^{2}}$ , we can rewrite the matrix $R$ above as

|\lambda|R\text{ with }R=R_{i}=\begin{pmatrix}\cos\theta&\sin\theta\\ -\sin\theta&\cos\theta\end{pmatrix}.

Hence $R$ describes a rotation by the angle $\theta$ followed by multiplication by $|\lambda|$ . Letting $H$ denote the $2D_{i}$ dimensional nilpotent matrix of ones just above the principle diagonal, for Jordan block $J_{i}$ associated with a complex conjugate eigenvalue pair, we have

(4)

J_{i}^{n}x=\sum_{i=0}^{D_{i}-2}{n\choose i}|\lambda|^{n-i}\tilde{R}^{n-i}H^{i}x

where $\tilde{R}$ is the block diagonal matrix with blocks $R$ , and $n\geq D_{i}-2$ .

The following lemmas establish the asymptotic behavior of a non-zero vector under iteration of Jordan blocks. Tiwari and Braverman in [tiwari2004termination, braverman2006termination] touch upon a similar idea, although the outcome and use is different.

We also note that the following Lemmas 5, 6, and 7 are essentially folklore in dynamical systems; the asymptotic behavior of real projective transformations is well understood — see e.g. [colonius2014dynamical, ayala2014topological, KUIPER197613, he2017topological] which all contain relevant background and results related to the Lemmas below. However, we write the following Lemmas in a way that corresponds their computable nature, thereby translating them into a form suitable for the problem this paper is attacking.

Lemma 5.

Let $J_{i}$ label a Jordan block of form

J_{i}=\begin{pmatrix}D&I&0&\cdots&0\\ 0&D&I&\cdots&0\\ \vdots&&\ddots&\ddots\\ 0&0&\cdots&D&I\\ 0&0&\cdots&&D\end{pmatrix}

with respect to eigenvalue $\lambda_{i}$ , $|\lambda_{i}|>0$ , where $D=\lambda_{i}$ and $I=1$ if $\lambda_{i}$ is real. When $\lambda_{i},\overline{\lambda_{i}}$ are complex set $D=\begin{pmatrix}\alpha_{i}&\beta_{i}\\ -\beta_{i}&\alpha_{i}\end{pmatrix}$ , and let $I$ be the two-by-two identity matrix.

Take $J_{i}$ to have dimension $\delta$ , and restrict its action to $\mathbb{Q}^{\delta}$ , letting $x\neq 0\in\mathbb{Q}^{\delta}$ . Let $P$ denote the one (two) dimensional subspace invariant under $J_{i}$ when $\lambda_{i}$ is real (complex). Then

\lim_{n\rightarrow\infty}d(\mathbb{P}J_{i}^{n}x,\mathbb{P}P)=0,

where $d$ is the metric for real projective space defined previously.

Proof.

By Lemma 3, it suffices to show that $\Gamma(J_{i}^{n}x,P)\rightarrow 1$ as $n\rightarrow\infty$ . Recall that $\arccos(\Gamma(J_{i}^{n}x,P))$ is the (minimal) cosine angle between subspaces $J_{i}^{n}x$ and $P$ .

Using Lemma 4 we have

J_{i}^{n}=\begin{pmatrix}D^{n}&nD^{n-1}&{n\choose 2}D^{n-2}&\cdots&{n\choose\delta-1}D^{n-(\delta-1)}\\ 0&D^{n}&nD^{n-1}&\cdots&{n\choose\delta-2}D^{n-(\delta-2)}\\ \vdots&&\ddots&\ddots&\vdots\\ 0&0&\cdots&D^{n}&nD^{n-1}\\ 0&0&\cdots&0&D^{n}\end{pmatrix}

and thus

J_{i}^{n}x=\begin{pmatrix}D^{n}x_{1}+\cdots+{n\choose\delta-1}D^{n-(\delta-1)}x_{\delta}\\ D^{n}x_{2}+\cdots+{n\choose\delta-2}D^{n-(\delta-2)}x_{\delta}\\ \vdots\\ D^{n}x_{\delta-1}+nD^{n-1}x_{\delta}\\ D^{n}x_{\delta}\end{pmatrix},

where $x=(x_{1},...,x_{\delta})^{\top}$ , and the components $x_{i}$ are either $1\times 1$ or $2\times 1$ depending on whether $D$ is $1\times 1$ or $2\times 2$ .

Dividing the components of $J_{i}^{n}x$ by $(J_{i}^{n}x)_{1}$ (the first component), and taking a limit we see that

\lim_{n\rightarrow\infty}\left|\frac{J_{i}^{n}(x)}{(J_{i}^{n}x)_{1}}\right|=(1,0,...,0)^{\top},

as a consequence of the fact that the first term in the vector (corresponding to the subspace $P$ ) has the highest polynomial growth. This limit also indicates that when $D$ is $2\times 2$ , the vector approaches a two-dimensional subspace.

It is then immediate that

\Gamma(J_{i}^{n}x,P)\rightarrow 1\text{ as }n\rightarrow\infty,

from the definition of $\Gamma$ , and the statement follows. ∎

Remark 1.

The statement of Lemma 5 did not require $|\lambda_{i}|>1$ , because it holds if $|\lambda_{i}|=1$ or $|\lambda_{i}|<1$ : when $|\lambda_{i}|=1$ , the rate of growth of the first term of the vector is still higher than the others, and when $|\lambda_{i}|<1$ , the rate of decline of the first component of the vector is slowest, and hence the statement still holds. Furthermore, the statement continues to hold independent of whether $\lambda_{i}$ is positive or negative — in all cases the angle between the vector and the one or two-dimensional space invariant under $J_{i}$ approaches zero. Finally, we remark that so long as $x$ has at least one non-zero component the lemma holds: after a few iterations all the components of $x$ will be non-zero and the asymptotics kick in accordingly.

With Lemma 5 in hand, we obtain the following set of lemmas giving asymptotic behavior of orbits when there are multiple Jordan blocks.

The following lemma states that when there are many Jordan blocks all associated with eigenvectors of the same modulus, the largest block(s) dictate the asymptotic behavior of orbits.

Lemma 6.

Let $J=\text{Diag}(J_{1},...,J_{N})$ be a block diagonal matrix of Jordan blocks, where each $J_{i}$ is associated with a real or complex eigenvalues $\lambda_{i}$ , such that $|\lambda_{1}|=|\lambda_{2}|=\cdots=|\lambda_{N}|$ . Let $D_{i}$ denote the dimension of $J_{i}$ when $\lambda_{i}$ is real, and half the dimension of $J_{i}$ when $\lambda_{i}$ is complex ( $J_{i}$ is the block associated to a complex conjugate pair). Suppose $D_{1}=\cdots=D_{T}>D_{T+1}\geq\cdots\geq D_{N}$ , with $T\leq N$ .

Then there is a subspace $P$ of dimension $k$ , $1\leq k\leq 2T$ , determined by the eigenvalues and Jordan blocks, such that

\lim_{n\rightarrow\infty}d(\mathbb{P}J^{n}x,\mathbb{P}P)=0

for any $x$ with at least one non-zero component with respect to every Jordan block composing $J$ .

Proof.

We break this proof into two cases, both of which can be easily combined to obtain the statement. The first case is when $T=1$ , and the second case is when $T=N$ .

Case 1: T = 1. When $T=1$ we have a single Jordan block $J_{1}$ such that $D_{1}>D_{2}\geq\cdots\geq D_{N}$ . Hence, the iteration of $J$ induces the highest polynomial growth in the components corresponding to the one or two-dimensional invariant subspace under the first Jordan block $J_{1}$ , cf. the solution formula Lemma 4 above.

Following the proof of Lemma 5, it is clear that the limit of $|J^{n}x/(J^{n}x)_{1}|$ as $n\rightarrow\infty$ gives $(1,0,...,0)^{\top}$ . Thus, all elements $\mathbb{P}x$ where $x$ has at least one non-zero component with respect to $J_{1}$ , get taken to $\mathbb{P}P$ in the induced dynamical system in real projective space, where $P$ is either one or two-dimensional depending on whether $J_{1}$ is associated with a real or complex-conjugate pair of eigenvalues, cf. Lemma 5.

Case 2: T = N. In the case, all the $D_{i}$ are equal, and hence no subset of larger Jordan blocks composing $J$ dictate the asymptotic behavior. Let $P_{i}$ denote the one or two-dimensional invariant subspace under $J_{i}$ , depending on whether $\lambda_{i}$ is real or complex, respectively. If $x_{J_{i}}$ denotes the components of $x$ with respect to block $J_{i}$ , then we know by Lemma 5 that iteration of $J_{i}$ over $x_{J_{i}}$ brings $\mathbb{P}x_{J_{i}}$ to $\mathbb{P}P_{i}$ as $n\rightarrow\infty$ .

We now note that adding many Jordan blocks of the same dimension, all corresponding to the same eigenvalue does not increase the dimension of $P$ . This follows from the argument used in proving Lemma 5. Namely, suppose $J$ is a block diagonal matrix composed of $N$ repeated Jordan blocks $J_{i}$ , and let $x$ be some vector such that $x_{J_{i}}\neq 0$ for each $J_{i}$ . Then take the limit of $|J^{n}x/(J^{n}x)_{j}|$ as $n\rightarrow\infty$ , where $|(J^{n}x)_{j}|$ is the largest component of $J^{n}x$ . Then the limiting vector will have form $(c_{1},0,...,c_{i},0,...,1,0,...,c_{N},0,...0)^{\top}$ , where $0<c_{i}\leq 1$ , and the $1$ is in the $j$ th position, and the values of the $c_{i}$ can be effectively computed with the coordinates of $x$ . Given the above, following the argument of Lemma 5 further, we see that when the repeated Jordan blocks all correspond to a real eigenvalue, then the invariant subspace $P$ orbits approach after projecting to $\mathbb{R}\mathbb{P}^{d-1}$ is of dimension one, and when the repeated Jordan blocks (in real Jordan canonical form) correspond to a complex conjugate pair, $P$ is of dimension two: its elements can easily be written as a linear combination of two vectors.

Hence, in the $T=N$ case, without a loss of generality suppose each Jordan block $J_{i}$ and invariant space $P_{i}$ is paired with a distinct eigenvalue or complex conjugate pair (up to a $-1$ factor), with no Jordan block repeated more than once. Then, as an immediate consequence of the block-diagonal structure of $J$ , we have that

P=\bigoplus_{i}P_{i},

where the $P_{i}$ are the one or two-dimensional subspaces invariant under corresponding collections of Jordan blocks associated with the same eigenvalue(s), and $\oplus$ is the direct sum. And, bringing in Lemma 5, we have

\lim_{n\rightarrow\infty}d(\mathbb{P}J^{n}x,\mathbb{P}P)=0.

In addition, following from the above argument it is clear that $\text{dim}(P)=\sum_{i}\text{dim}(P_{i})$ , and hence $\text{dim}(P)=k$ with $1\leq k\leq N=T$ .

Collecting the arguments establishing Cases 1 and 2 above, the statement follows. ∎

Remark 2.

In Case 2 of the proof of Lemma 6, we underscore the fact that when there are many Jordan blocks of the same size but that do not correspond to a repeated eigenvalue, but rather the same eigenvalue up to a factor of $-1$ , then the orbit $\{\mathbb{P}J^{2n}x\}$ will converge to one subet $\mathbb{P}P$ in projective space, while the orbit $\{\mathbb{P}J^{2n+1}x\}$ will converge to another subset $\mathbb{P}P^{\prime}$ of projective space. This follows from the fact that certain components will change sign with every iteration. However, for the purpose this paper and the proofs of Theorems 1 and 3, we assume non-degeneracy so no two distinct eigenvalues of $A$ will be equal up to a $-1$ factor, and hence this aspect can be safely ignored.

The following lemma establishes the case when the Jordan blocks composing a block-diagonal Jordan matrix $J$ correspond to eigenvalues of different magnitude.

Lemma 7.

Let $J=\text{Diag}(J_{1},...,J_{N})$ be a block diagonal matrix, where each Jordan block $J_{i}$ is associated with a distinct real eigenvalue or complex conjugate pair. Moreover, suppose $|\lambda_{1}|>|\lambda_{2}|>\cdots>|\lambda_{N}|$ . Then there is a subspace $P$ of dimension one (two) whenever $\lambda_{1}$ is real (complex) invariant under $J_{1}$ , such that

\lim_{n\rightarrow\infty}d(\mathbb{P}J^{n}x,\mathbb{P}P)=0,

where $d$ is the metric over real projective space, and $x$ has at least one non-zero component with respect to block $J_{1}$ .

Proof.

It suffices to show that the Jordan block $J_{1}$ associated with the largest eigenvalue dominates the dynamics asymptotically, independent of the sizes of the $J_{i}$ .

That $J_{1}$ dominates the dynamics in the limit is a trivial consequence of the fact that

\lim_{n\rightarrow\infty}\frac{\sum_{i=0}^{l}p_{i}(n)|\mu|^{n-i}}{|\lambda|^{n}}=0,\text{ whenever }|\lambda|>|\mu|\geq 0,

the $p_{i}(n)$ are polynomials, $l$ is a positive integer, and $n-i=0$ when $i>n$ .

Pairing this fact with the statement and proofs of Lemma 5 and Lemma 6, it is clear that $|J^{n}x|/|(J^{n}x)_{1}|\rightarrow(1,0,...0)^{\top}$ as $n\rightarrow\infty$ , where $(J^{n}_{x})_{1}$ labels the first component of $J^{n}x$ . And thus, orbits under iteration of $J$ induce a sequence in real projective space converging to $\mathbb{P}P$ where $P$ is the one or two dimensional subspace invariant under $J_{1}$ , following the arguments of the proofs in Lemmas 5 and 6. ∎

With these lemmas in hand, we move to prove the theorems stated in the introduction of this paper.

3. Proofs of Theorems

We begin with the proof of Theorem 1.

Proof of Theorem 1.

Consider the following algorithm deciding instances $(A,x,U)$ of the Orbit Problem satisfying the assumptions of the Theorem. If the given instance is degenerate, reduce it to a finite set of non-degenerate instances and solve each independently using the following algorithm.

(1)

By assumption, the target subspace $U$ can be written as

$U=S_{1}\cup\cdots\cup S_{l}\cup H_{1}\cup\cdots\cdots H_{t},$

where the $S_{i}$ are subspaces of $\mathbb{Q}^{d}$ represented by sets of rational basis vectors. Similarly, the $H_{t}$ are convex polytopes, and hence are by definition intersections of half-spaces. Then we assume without loss of generality that the $H_{i}$ are presented as collections of hyperplanes (described by rational basis vectors) along with an inequality. We work with this presentation of $U$ as a set of rational bases.
(2)

Using Lemma 2, transform $A$ into its Jordan canonical form $J=J(A)$ , and use the invertible matrices $Q$ used in obtaining the Jordan canonical form to transfer $x$ and $U$ to the same basis. We now suppose $x$ and $U$ are presented in the new basis without updating their label.

During the computation of $J$ , collect the eigenvalues $\lambda_{1},...,\lambda_{m}$ of $A$ , organized so that $|\lambda_{1}|\geq\cdots\geq|\lambda_{m}|$ . Suppose $\lambda_{1},...,\lambda_{r}$ , $r\leq m$ are the eigenvalues of maximal modulus. Every eigenvalue $\lambda_{i}$ will correspond to one or more Jordan blocks composing $J$ , where the number of Jordan blocks associated with an eigenvalue is determined by the algebraic and geometric multiplicity of the eigenvalue, along with the elementary divisors of the characteristic polynomial of $A$ .

Let $P_{A}$ label the minimal polynomial of $A$ . Then note that the roots of $P_{A}$ are the eigenvalues of $A$ , and the algebraic multiplicity of each root in $P_{A}$ is the dimension of the largest Jordan block associated with the respective root in the Jordan canonical form of $A$ . This is a trivial consequence of the definition of the minimal polynomial of $A$ . However here we use the real Jordan canonical form, and thus the complex roots of the minimal polynomial will correspond to Jordan blocks of dimension twice their algebraic multiplicity.

Following Lemma 7, the Jordan blocks associated with the $r$ eigenvalues of largest modulus dictate the asymptotic behavior of the orbit $\{\mathbb{P}J^{n}x\}_{n=0}^{\infty}$ . More specifically, by Lemma 6, of those $r$ dominating eigenvalues, those corresponding to the Jordan blocks of largest dimension dominate the asymptotic behavior (up to a possible rescaling of $1/2$ for the complex eigenvalues due to the real Jordan canonical form). Precisely, by Lemma 6 the orbit $\{\mathbb{P}J^{n}x\}_{n=0}^{\infty}$ converges to a subset $\mathbb{P}P$ determined by the largest eigenvalues, and of those the Jordan blocks of highest dimension corresponding to such eigenvalues.

By assumption of Theorem 1, there are $p\leq r$ distinct eigenvalues of maximal modulus with highest algebraic multiplicity in $P_{A}$ . As such, by Lemma 6 and the non-degeneracy assumption, there is a single subspace $P$ of dimension $p$ such that $d(\mathbb{P}J^{n}x,\mathbb{P}P)\rightarrow 0$ as $n\rightarrow\infty$ whenever $x$ has non-zero components with respect to the Jordan blocks composing $J$ — which is ensured by the non-triviality assumption placed on $x$ in the theorem statement.

Return the subspace $P$ .
(3)

The subspace $P$ is taken to be presented as a set of rational basis vectors: it is trivially computable given $x$ , along with determining the position of the Jordan blocks of largest dimension corresponding to the eigenvalues of maximal modulus in the matrix $J$ . This fact can be readily seen through the proofs of Lemmas 5 and 6.

Recall $U$ is composed of subspaces $S_{1},...,S_{l}$ , and polytopes $H_{1},...,H_{t}$ represented using hyperplanes (subspaces), all taken to be presented with rational bases. Compute $P\cap U$ , which can be effectively computed by reducing to computation of the subcases $P\cap S_{i}$ and $P\cap H_{i}$ , where $P\cap H_{i}$ is further reduced to computing the intersection of $P$ with the half-spaces composing $H_{i}$ . If $P\cap U=\{0\}$ , continue. Otherwise halt and terminate execution of the algorithm; this instance of the Orbit Problem cannot be decided by this algorithm.
(4)

We have $P\cap U=\{0\}$ . Then using the minimal angle algorithm following from the singular value decomposition as discussed in Section 2.3, compute the quantity

$\sigma_{max}=\max\left\{\Gamma(P,Y_{i}):Y_{i}\in\{S_{1},...,S_{l},H_{1},...,H_{t}\}\right\},$

where the value $\Gamma(P,H_{i})$ corresponds to the cosine of the minimal angle between $P$ and each of the hyperplanes describing a half-space composing $H_{i}$ .

Since $P\cap U=\{0\}$ , it follows that $\sigma_{max}<1$ . Let $\zeta=1-\sigma_{max}$ . Then there must exist an $\epsilon=\epsilon(\zeta)>0$ such that $d(\mathbb{P}P,\mathbb{P}U)=\epsilon$ .
(5)

Begin iterating $J$ over $x$ . Every iteration, compute $\Gamma(J^{n}x,P)$ , and check if $J^{n}x\in U$ . If there is an $n$ such that $J^{n}x\in U$ , halt: the orbit intersects the target set in this instance of the Orbit Problem.

Otherwise, continue until $\Gamma(J^{n}x,P)>\sigma_{max}$ : it is an immediate consequence of the proofs of Lemmas 5, 6, 7 that $\Gamma(J^{n}x,P)\rightarrow 1$ monotonically as $n\rightarrow\infty$ , and hence by Lemma 3, $d(\mathbb{P}J^{n}x,\mathbb{P}P)\rightarrow 0$ as $n\rightarrow\infty$ . But there is an $\epsilon>0$ such that $d(\mathbb{P}P,\mathbb{P}U)=\epsilon$ , and hence there is an $N\in\mathbb{N}$ such that for all $n\geq N$ , $\mathbb{P}J^{n}x\not\in\mathbb{P}U$ . Hence, there is an $N^{\prime}=N^{\prime}(\sigma_{max})\in\mathbb{N}$ such that $J^{n}x\not\in U$ for all $n>N^{\prime}$ .

∎

We now prove Theorem 2.

Proof of Theorem 2.

Assume $(A,x,U)$ is a non-degenerate instance of the higher-dimensional Orbit Problem with $A$ nonsingular, and $U$ of dimension one, as in the conditions of the theorem. Immediate from Theorem 1 and its proof, if the orbit $\{A^{n}x\}_{n=0}^{\infty}$ approaches a subspace $W$ after projecting onto real projective space, and $W\cap U=\{0\}$ , then the instance is decidable.

Note the condition that $\text{dim}(U)=1$ enforces either $U\subseteq W$ or $U\cap W=\{0\}$ . Hence we are left to consider the case when $U\subseteq W$ . $A$ is taken to be nonsingular. As a consequence, if $x\not\in W$ , $A^{n}x\not\in W$ for all $n$ , and hence $A^{n}x\not\in U$ for all $n$ since $W$ is invariant under $A$ . If $x\in W$ , then consider the reduced, lower-dimensional system $A:W\rightarrow W$ and induct until one of the above conditions is met or $\text{dim}(W)=1$ . ∎

The dynamical and geometric nature of the techniques used here provides simple proofs of other old results, such as the following concerning the decidability of Skolem’s Problem in instances of a dominating real root. We use the proof of this result to then aid in proving Theorem 3.

Proposition 4.

Let $\{u_{n}\}_{n=0}^{\infty}$ be an order $d$ non-degenerate linear recurrence sequence with distinct characteristic roots $\lambda_{1},...,\lambda_{m}$ , $m\leq d$ , ordered so that $|\lambda_{1}|\geq\cdots\geq|\lambda_{m}|$ . Suppose the initial terms form a vector $x\in\mathbb{Q}^{d}$ non-trivial with respect to the companion matrix $A$ of the sequence. Then, if $|\lambda_{1}|>|\lambda_{2}|$ , it is decidable whether the sequence has a zero term.

Proof.

Let $A\in\mathbb{Q}^{d\times d}$ label the companion matrix of the LRS $\{u_{n}\}$ . Let $x$ denote the non-trivial $d$ dimensional vector of initial terms of the LRS. Then, as noted in Section 2.2, iteration of $A$ over $x$ “shifts” the elements of the LRS through $x$ , so that when $x=(u_{d},u_{d-1},...,u_{2},u_{1})^{\top}$ , $Ax=(u_{d+1},u_{d},...,u_{3},u_{2})^{\top}$ .

Let $E_{1},E_{2},...,E_{d}$ denote the $d$ coordinate subspaces of $\mathbb{R}^{d}$ where the elements of $E_{i}$ , $i=1,...,d$ , have a $0$ in their $i$ th component. Then $A^{j}E_{1}=E_{j+1}$ , $j=0,...,d-1$ . Moreover, $E_{1}\cap E_{2}\cap\cdots\cap E_{d}=\{0\}$ . Hence, for any $y\neq 0\in\mathbb{Q}^{d}$ , $y\not\in E_{1}\cap\cdots\cap E_{d}$ , implying there is an $E_{i}$ such that $y\not\in E_{i}$ .

The LRS $\{u_{n}\}$ is taken to have a single dominating real root $\lambda_{1}$ . Hence $A$ has a single dominating real root. Taking $A$ to its Jordan canonical form and back via change of basis, we see by consequence of Lemmas 5, 6, 7 that there is a line spanned by some $y\neq 0\in\mathbb{Q}^{d}$ , such that $d(\mathbb{P}A^{n}x,\mathbb{P}y)\rightarrow 0$ as $n\rightarrow\infty$ . But there is an $E_{i}$ such that $y\not\in E_{i}$ , i.e. $\text{span}(y)\cap E_{i}=\{0\}$ . Then, appealing to the statement and proof of Theorem 1, it can be decided in a finite number of iterations of $A$ over $x$ whether the LRS has a zero. ∎

Finally, we prove Theorem 3.

Proof of Theorem 3.

The proof is a trivial extension of the proof of Proposition 4 with Lemma 6. Let $A\in\mathbb{Q}^{d\times d}$ label the companion matrix of the LRS, and let $P_{A}$ denote the minimal polynomial of $A$ .

In Proposition 4, there is a single dominating characteristic root, while in the case of Theorem 3 there are $r\geq 1$ characteristic roots of maximal modulus. However, by assumption, there is a single real root, $\lambda_{1}$ , whose algebraic multiplicity in the minimal polynomial $P_{A}$ of $A$ is larger than the algebraic multiplicity of the other dominating roots. This implies that, of the $r$ dominating roots, the Jordan block of largest dimension which is associated with $\lambda_{1}$ dictates the asymptotic behavior. Then by Lemma 6, after projecting to real projective space, orbits approach the projection of a line, and hence must eventually be bounded away from at least one of the $E_{i}$ after a finite number of iterations, where the $E_{i}$ are as defined in the proof of Proposition 4. ∎

4. Concluding remarks

This paper does not come close to reaching the limits of the techniques presented here, nor do we apply this machinery to every problem vulnerable to our methods. Indeed, the results of this paper indicate that deepening our understanding of dynamical systems in real projective space $\mathbb{R}\mathbb{P}^{d-1}$ with maps induced by matrices in $GL(d,\mathbb{R})$ provides insight into the higher-dimensional Orbit Problem, and more generally termination problems, by way of the arguments given in this paper. And, possibly, the geometric and dynamical mechanisms penetrating the Orbit Problem may link to the algebraic and number-theoretic structures traditionally employed in nontrivial ways. In particular, we believe that mixing the general geometric and dynamical structures provided here with the finer algebraic and number-theoretic tools traditionally used can lead to additional breakthroughs in this area.

To this end, we identify a number of different directions that may be profitable to explore, given the methods presented here. We begin by considering the following: every linear system in $\mathbb{R}^{d}$ induces a dynamical system on the set of $k$ -dimensional subspaces of $\mathbb{R}^{d}$ — the Grassmannians. A possibly rewarding next step could be to generalize the results of this paper further by studying such induced dynamics on Grassmannians, where a more detailed understanding of the dynamics of this kind can result in stronger results toward the Orbit Problem.

Next, we reflect that the essential reason why we require that $W\cap U=\{0\}$ in order to have decidability in Theorem 1, follows from the main observation this paper makes, which is that orbits approach the projection of $W$ in real projective space, i.e. the angles between orbits and $W$ monotonically approaches zero, and as a consequence whenever $W\cap U=\{0\}$ orbits can be bounded away from $U$ in finite time. Fortunately, if $A,x$ , and $U$ are randomly generated by drawing entries at random from some finite set, when $A$ has $r$ dominating roots, $p$ of which have highest algebraic multiplicity in the minimal polynomial of $A$ , and $U$ dimension $\leq d-p$ , then with overwhelmingly high probability $W\cap U=\{0\}$ . Thus, arguing from this intuitive level, Theorem 1 decides a “large” class of instances.

Nonetheless, difficulty arises when the intersection of $W$ and $U$ is nontrivial. In such cases, the orbits approach $U$ , or periodically get arbitrarily close to $U$ . But this is when the basic argument employed in this paper can no longer be exploited. To overcome this barrier, finer methods must be used. In particular, if it is better understood how induced orbits in projective space approach attracting sets, then this will immediately translate to deciding the Orbit Problem. Fortunately, the attracting sets in projective space have a nice structure: they are either fixed points under the induced map or contained in closed loops. To this end, possibly more can be said in the continuous case, where we consider flows.

Although, we remark that when we have containment $W\subseteq U$ , we do obtain something close to decidability, since orbits $\{\mathbb{P}A^{n}x\}_{n=0}^{\infty}$ approach $\mathbb{P}U$ as $n\rightarrow\infty$ , leading to decidability when we work with notions such as “pseudo-orbits,” already explored in literature [akshay2022robustness, CostaPseudoReachabilityDiag, dcostaPseudoSkolem]. As such, when $W\subseteq U$ we obtain “pseudo-decidability:” orbits converge toward the target set. And, following the proof of Theorem 2, if $A$ is nonsingular then we obtain decidability when $x\not\in W$ . Indeed, in the context of program verification and the Termination Problem, this asymptotic behavior indicates such orbits could enter the error state with sufficiently large perturbation. To this end, perhaps progress can be made on deciding cases in which $W\subseteq U$ .

Deciding Subspace reachability problems with application to Skolem’s Problem

Abstract.

Key words and phrases:

2010 Mathematics Subject Classification:

1. Introduction

Theorem 1.

Theorem 2.

Theorem 3.

1.1. Techniques

1.2. Related Work

Acknowledgements

2. Preliminaries

Lemma 1 (Real Jordan Canonical Form).

Lemma 2.

Proof.

2.1. Algebraic numbers and computing eigenvalues, eigenvectors

2.2. Linear recurrence sequences

2.3. Angles between flats

2.4. Angle evolution, and the Orbit Problem in real projective space

Lemma 3.

Proof.

Lemma 4 (Powers of Jordan blocks).

Lemma 5.

Proof.

Remark 1.

Lemma 6.

Proof.

Remark 2.

Lemma 7.

Proof.

3. Proofs of Theorems

Proof of Theorem 1.

Proof of Theorem 2.

Proposition 4.

Proof.

Proof of Theorem 3.

4. Concluding remarks

References