
Equivalences for Linearizations of Matrix Polynomials

Robert M. Corless ([email protected]), David R. Cheriton School of Computer Science, University of Waterloo, Canada
Leili Rafiee Sevyeri ([email protected]), David R. Cheriton School of Computer Science, University of Waterloo, Canada
B. David Saunders ([email protected]), Department of Computer and Information Sciences, University of Delaware
Abstract

One useful standard method to compute eigenvalues of matrix polynomials $\boldsymbol{P}(z)\in\mathbb{C}^{n\times n}[z]$ of degree at most $\ell$ in $z$ (denoted of grade $\ell$, for short) is to first transform $\boldsymbol{P}(z)$ to an equivalent linear matrix polynomial $\boldsymbol{L}(z)=z\boldsymbol{B}-\boldsymbol{A}$, called a companion pencil, where $\boldsymbol{A}$ and $\boldsymbol{B}$ are usually of larger dimension than $\boldsymbol{P}(z)$ but $\boldsymbol{L}(z)$ is now only of grade $1$ in $z$. The eigenvalues and eigenvectors of $\boldsymbol{L}(z)$ can be computed numerically by, for instance, the QZ algorithm. The eigenvectors of $\boldsymbol{P}(z)$, including those for infinite eigenvalues, can also be recovered from eigenvectors of $\boldsymbol{L}(z)$ if $\boldsymbol{L}(z)$ is what is called a “strong linearization” of $\boldsymbol{P}(z)$. In this paper we show how to use algorithms for computing the Hermite normal form of a companion matrix for a scalar polynomial to direct the discovery of unimodular matrix polynomial cofactors $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ which, via the equation $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$, explicitly show the equivalence of $\boldsymbol{P}(z)$ and $\boldsymbol{L}(z)$. By this method we give new explicit constructions for several linearizations using different polynomial bases. We contrast these new unimodular pairs with those constructed by strict equivalence, some of which are also new to this paper. We discuss the limitations of this experimental, computational discovery method of finding unimodular cofactors.

Keywords— linearization, matrix polynomials, polynomial bases, equivalence, Hermite normal form, Smith normal form

2010 Mathematics Subject Classification— 65F15, 15A22, 65D05

1 Introduction

Given a field $\mathbb{F}$ and a set of polynomials $\phi_{k}(z)\in\mathbb{F}[z]$ for $0\leq k\leq\ell$ that define a basis for polynomials of grade $\ell$ (“grade” is short for “degree at most”), a square matrix polynomial $\boldsymbol{P}(z)$ is an element of $\mathbb{F}^{n\times n}[z]$ which we can write as

\boldsymbol{P}(z)=\sum_{k=0}^{\ell}\boldsymbol{A}_{k}\phi_{k}(z)\,.

The matrix coefficients $\boldsymbol{A}_{k}$ are elements of $\mathbb{F}^{n\times n}$. For concrete exposition, take $\mathbb{F}$ to be the field of complex numbers $\mathbb{C}$. The case when the “leading coefficient” is singular—meaning the matrix coefficient of $z^{\ell}$, once the polynomial is expressed in the monomial basis—can also be of special interest. Normally only the case of regular square matrix polynomials, for which there exists some $z^{*}\in\mathbb{F}$ with $\det\boldsymbol{P}(z^{*})\neq 0$, is considered. Although our intermediate results require certain nonsingularity conditions, our results using strict equivalence are ultimately valid even in the case when the determinant of $\boldsymbol{P}(z)$ is identically zero.

Matrix polynomials are of significant classical and current interest: see the surveys Güttel and Tisseur (2017) and Mackey et al. (2015) for more information about their theory and applications. In this present paper we use Maple to make small example computations in order to discover and prove certain facts about one method for finding eigenvalues of matrix polynomials, namely linearization, which means finding an equivalent grade $1$ matrix polynomial $\boldsymbol{L}(z)$ when given a higher-grade matrix polynomial $\boldsymbol{P}(z)$ to start.

The paper Amiraslani et al. (2008) was the first to systematically study matrix polynomials in alternative polynomial bases. See Cachadina et al. (2020) for a more up-to-date treatment.

2 Definitions and Notation

A companion pencil $\boldsymbol{L}(z)=z\boldsymbol{B}-\boldsymbol{A}$ for a matrix polynomial $\boldsymbol{P}(z)$ has the property that $\det\boldsymbol{L}(z)=\alpha\det\boldsymbol{P}(z)$ for some nonzero $\alpha\in\mathbb{F}$. This means that the eigenvalues of the companion pencil are the eigenvalues of the matrix polynomial.

A linearization $\boldsymbol{L}(z)$ of a matrix polynomial $\boldsymbol{P}(z)$ has a stronger requirement: a linearization is a pencil $\boldsymbol{L}(z)=z\boldsymbol{B}-\boldsymbol{A}$ which is equivalent to $\boldsymbol{P}(z)$ in the following sense: there exist unimodular matrix polynomial cofactors $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ (in this context, a matrix polynomial $\boldsymbol{A}(z)$ is unimodular if and only if $\det\boldsymbol{A}(z)\in\mathbb{F}\setminus\{0\}$ is a nonzero constant field element) which satisfy $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$. We write $\boldsymbol{I}_{n}$ for the $n\times n$ identity matrix here.

Linearizations preserve information not only about eigenvalues, but also about eigenvectors and invariant factors. Going further, a strong linearization is one for which the matrix pencil $-z\boldsymbol{L}(1/z)=z\boldsymbol{A}-\boldsymbol{B}$ is a linearization for the reversal of $\boldsymbol{P}(z)$, namely $z^{\ell}\boldsymbol{P}(1/z)$. Strong linearizations also preserve information about the eigenstructure at infinity.

The Hermite normal form of a matrix polynomial $\boldsymbol{L}$ is an upper triangular matrix polynomial $\boldsymbol{H}$ unimodularly row equivalent to $\boldsymbol{L}$, which is to say $\boldsymbol{L}=\boldsymbol{E}\boldsymbol{H}$, where $\boldsymbol{E}$ is a matrix polynomial with $\det(\boldsymbol{E})$ a nonzero constant. See Storjohann (1994) for properties and a description of an algorithm for computing the Hermite form. We will chiefly use the computation of the Hermite normal form as a step in the process of discovering unimodular matrix polynomials $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ that demonstrate that $\boldsymbol{L}(z)$ linearizes $\boldsymbol{P}(z)$.

Another technique, which gives a stronger result (when it can be done), is to prove strict equivalence. We say matrix pencils $\boldsymbol{L}_{1}(z)$ and $\boldsymbol{L}_{2}(z)$ are strictly equivalent if there exist constant unimodular matrices $\boldsymbol{U}$ and $\boldsymbol{W}$ with $\boldsymbol{U}\boldsymbol{L}_{1}(z)\boldsymbol{W}=\boldsymbol{L}_{2}(z)$. We will cite and show some strict equivalence results in this paper, and prove a new strict equivalence for some Lagrange basis linearizations.

The paper Dopico et al. (2020) introduces a new notion in their Definition 2, that of a local linearization of rational matrix functions: this definition allows $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ to be rational unimodular transformations, and defines a local linearization only on a subset $\Sigma$ of $\mathbb{F}$. Specifically, by this definition, two rational matrices $\boldsymbol{G}_{1}(z)$ and $\boldsymbol{G}_{2}(z)$ are locally equivalent if there exist rational matrices $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$, invertible for all $z\in\Sigma$, such that $\boldsymbol{G}_{1}(z)=\boldsymbol{E}(z)\boldsymbol{G}_{2}(z)\boldsymbol{F}(z)$ for all $z\in\Sigma$.

This allows exclusion of poles of $\boldsymbol{E}(z)$ or $\boldsymbol{F}(z)$, for instance. It turns out that several authors, including Amiraslani et al. (2008), had in effect been using rational matrices and local linearizations for matrix polynomials. For most purposes, in skilled hands, this notion provides all the analytical tools that one needs. However, as we will see, unimodular polynomial equivalence is superior in some ways. This paper is therefore motivated to see how far such local linearizations, in particular for the Bernstein basis and the Lagrange interpolational bases, can be strengthened to unimodular matrix polynomial equivalence. The point of this paper, as a symbolic computation paper, is to see how experiments with built-in code for the Hermite normal form can help us to discover such cofactors.

We write

B_{k}^{\ell}(z)=\binom{\ell}{k}z^{k}(1-z)^{\ell-k}\,,\qquad 0\leq k\leq\ell (2.1)

for the Bernstein polynomials of degree $\ell$. This set of $\ell+1$ polynomials forms a basis for polynomials of grade $\ell$.

We will use the convention of writing $\boldsymbol{A}_{k}$ for the coefficients when the matrix polynomial is expressed in the monomial basis or in another basis with a three-term recurrence relation among its elements, $\boldsymbol{Y}_{k}$ for the coefficients when the matrix polynomial is expressed in the Bernstein basis, and $\boldsymbol{P}_{k}$ for the coefficients when the matrix polynomial is expressed in a Lagrange basis. We will use lower-case letters for scalar polynomials. We may write the (matrix) coefficient of $\phi_{k}(z)$ for an arbitrary basis element as $[\phi_{k}(z)]\boldsymbol{P}(z)$; for instance, the “leading coefficient” of $\boldsymbol{P}(z)$, considered as a grade $\ell$ polynomial, is $[z^{\ell}]\boldsymbol{P}(z)$.

3 Explicit Cofactors

We find cofactors for orthogonal bases, the Bernstein basis, and Lagrange interpolational bases, all modulo certain exceptions. We believe all of these are new. These results are restricted to certain cases: for the Bernstein basis, for instance, we require that the coefficient $\boldsymbol{Y}_{\ell}=[B^{\ell}_{\ell}(z)]\boldsymbol{P}(z)$ (which is not the same as $[z^{\ell}]\boldsymbol{P}(z)$) be nonsingular. When that coefficient is singular, the method of this paper does not succeed in finding cofactors that demonstrate universal linearization. In fact, however, it is known that this Bernstein companion pencil is indeed a true linearization; we will give two references with two different proofs, and give our own proof in section 4.2.

Similarly, this method produces new cofactors for Lagrange interpolational bases, but in this case restricted to when the coefficients $\boldsymbol{P}_{k}$ (which are actually the values of the matrix polynomial at the interpolational nodes) are all nonsingular. We also have an algebraic proof of equivalence, which we have cut for space reasons; this gives another proof that the Lagrange interpolational basis pencils used here are, in fact, linearizations. In section 5.3 we prove strict equivalence.

3.1 Monomial Basis

Given a potential linearization $\boldsymbol{L}(z)=z\boldsymbol{C}_{1}-\boldsymbol{C}_{0}$, it is possible to discover the matrices $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ by computing the generic Hermite form with respect to the variable $z$, for instance by using a modestly sized example of $\boldsymbol{L}(z)$ and a symbolic computation system such as Maple. (The Smith form, which is related, is also useful here; but we found that the Maple implementation of the Hermite form Storjohann (1994) gave a simpler answer, although we had to compute the matrix $\boldsymbol{F}(z)$ separately ourselves because the Hermite form is upper triangular, not diagonal.) The generic form that we obtain is of course not correct on specialization of the symbolic coefficients, and in particular is incorrect if the leading coefficient is zero; but we may adjust this by hand to find the following construction, which we will not detail in this section but will in the next.

By using the five-by-five scalar case, with symbolic coefficients which we write as $a_{k}$ instead of $\boldsymbol{A}_{k}$ because they are scalar, we find that if

\boldsymbol{E}(z)=\begin{bmatrix}1&h_{4}&h_{3}&h_{2}&h_{1}\\0&0&0&0&-1\\0&0&0&-1&-z\\0&0&-1&-z&-z^{2}\\0&-1&-z&-z^{2}&-z^{3}\end{bmatrix} (3.2)

where $h_{5}=a_{5}$ and $h_{k}=a_{k}+z\,h_{k+1}$ for $k=4$, $3$, $2$, $1$ are the partial Horner evaluations of $p(z)=a_{5}z^{5}+\cdots+a_{0}$, and if

\boldsymbol{F}(z)=\begin{bmatrix}z^{4}&0&0&0&1\\z^{3}&0&0&1&0\\z^{2}&0&1&0&0\\z&1&0&0&0\\1&0&0&0&0\end{bmatrix}\,, (3.3)

and if

\boldsymbol{L}(z)=\begin{bmatrix}za_{5}+a_{4}&a_{3}&a_{2}&a_{1}&a_{0}\\-1&z&0&0&0\\0&-1&z&0&0\\0&0&-1&z&0\\0&0&0&-1&z\end{bmatrix}\,, (3.4)

then we have $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(p(z),1,1,1,1)$. Moreover, $\det\boldsymbol{E}(z)=\pm 1$ and $\det\boldsymbol{F}(z)=\pm 1$, depending only on dimension.

Once this form is known, it is easily verified for general grades and quickly generalizes to matrix polynomials, establishing (as is well-known) that this form (known as the second companion form) is a linearization.

Note that the polynomial coefficients $a_{k}$ appear linearly in $\boldsymbol{E}(z)$, and the unimodular matrix polynomials $\boldsymbol{E}$ and $\boldsymbol{F}$ are thus universally valid.
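This verification is easily scripted. The following Maple sketch (our own illustration, not taken from any package) checks the scalar grade-5 identity above symbolically:

with(LinearAlgebra):
# partial Horner evaluations h[k] of p(z) = a[5]*z^5 + ... + a[0]
h[5] := a[5]:
for k from 4 to 1 by -1 do h[k] := a[k] + z*h[k+1] end do:
E := Matrix([[1, h[4], h[3], h[2], h[1]],
             [0, 0, 0, 0, -1],
             [0, 0, 0, -1, -z],
             [0, 0, -1, -z, -z^2],
             [0, -1, -z, -z^2, -z^3]]):
F := Matrix([[z^4, 0, 0, 0, 1], [z^3, 0, 0, 1, 0], [z^2, 0, 1, 0, 0],
             [z, 1, 0, 0, 0], [1, 0, 0, 0, 0]]):
L := Matrix([[z*a[5] + a[4], a[3], a[2], a[1], a[0]],
             [-1, z, 0, 0, 0], [0, -1, z, 0, 0],
             [0, 0, -1, z, 0], [0, 0, 0, -1, z]]):
p := add(a[k]*z^k, k = 0 .. 5):
map(expand, E . L . F - DiagonalMatrix([p, 1, 1, 1, 1]));  # the zero matrix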

3.2 Three-term Recurrence Bases

The monomial basis, the shifted monomial basis, the Taylor basis, the Newton interpolational bases, and many common orthogonal polynomial bases all have three-term recurrence relations that, for k1k\geq 1, can be written

z\,\phi_{k}(z)=\alpha_{k}\phi_{k+1}(z)+\beta_{k}\phi_{k}(z)+\gamma_{k}\phi_{k-1}(z)\,. (3.5)

All such polynomial bases require $\alpha_{k}\neq 0$. For instance, the Chebyshev polynomial recurrence is usually written $T_{n+1}(z)=2zT_{n}(z)-T_{n-1}(z)$ but is easily rewritten in the above form by isolating $zT_{n}(z)$, and all Chebyshev $\alpha_{k}=1/2$ for $k\geq 1$. We refer the reader to section 18.9 of the Digital Library of Mathematical Functions (dlmf.nist.gov) for more. See also Gautschi (2016).
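For instance, the rewritten form of the Chebyshev recurrence, $zT_{k}(z)=\tfrac{1}{2}T_{k+1}(z)+\tfrac{1}{2}T_{k-1}(z)$ for $k\geq 1$, can be spot-checked in Maple:

seq(expand(z*ChebyshevT(k, z) - ChebyshevT(k+1, z)/2 - ChebyshevT(k-1, z)/2), k = 1 .. 6);
# 0, 0, 0, 0, 0, 0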

For all such bases, we have the linearization (for exposition, we follow Peter Lancaster’s dictum, namely that the $5\times 5$ case almost always gives the idea) $\boldsymbol{L}(z)=z\boldsymbol{C}_{1}-\boldsymbol{C}_{0}$ where

\boldsymbol{C}_{1}=\begin{bmatrix}\dfrac{a_{5}}{\alpha_{4}}&0\\0&\boldsymbol{I}_{4}\end{bmatrix} (3.6)

and

\boldsymbol{C}_{0}=\begin{bmatrix}-a_{4}+\dfrac{\beta_{4}}{\alpha_{4}}a_{5}&-a_{3}+\dfrac{\gamma_{4}}{\alpha_{4}}a_{5}&-a_{2}&-a_{1}&-a_{0}\\\alpha_{3}&\beta_{3}&\gamma_{3}&&\\&\alpha_{2}&\beta_{2}&\gamma_{2}&\\&&\alpha_{1}&\beta_{1}&\gamma_{1}\\&&&\alpha_{0}&\beta_{0}\end{bmatrix}\,. (3.7)

In Maple we compute the Hermite normal form of a grade 55 example with symbolic coefficients by the following commands:

with(LinearAlgebra):
m := 5:
poly := add(a[k]*ChebyshevT(k, z), k = 0 .. m):
(C0, C1) := CompanionMatrix(poly, z):

That procedure does not use the same convention we use here and so we apply the standard involutory permutation (SIP) matrix to it and transpose it to place the polynomial coefficients in the top row.

J := Matrix( m, m, (i, j) -> `if`( i + j = m + 1, 1, 0 ) ):
R := C1*z - C0:
JRJT := Transpose((J . R) . J):

Now compute the generic Hermite normal form:

(HH, UU) := HermiteForm( JRJT, z, output = ['H', 'U'] ):

That returns matrices such that $\boldsymbol{U}\boldsymbol{L}=\boldsymbol{H}$, or $\boldsymbol{L}=\boldsymbol{U}^{-1}\boldsymbol{H}$. We now look at their structure.

mask := proc(A::Matrix)
  map(t -> `if`(t = 0, 0, x), A);
end proc:
mask( HH );

This produces

\begin{bmatrix}x&0&0&0&x\\0&x&0&0&x\\0&0&x&0&x\\0&0&0&x&x\\0&0&0&0&x\end{bmatrix} (3.8)

and a separate investigation shows the diagonal entries are (as is correct in the generic case) all just $1$, until the lower corner entry, which is a monic version of the original polynomial. The matrix $\boldsymbol{U}$ has a simple enough shape in this case, but $\boldsymbol{U}^{-1}$ is more convenient:

mask( UU^(-1) );

produces

\begin{bmatrix}x&x&x&x&x\\x&x&x&0&0\\0&x&x&x&0\\0&0&x&x&0\\0&0&0&x&0\end{bmatrix}\,. (3.9)

Because the first $m-1$ columns of $\boldsymbol{H}$ are a subset of the columns of the identity matrix, the first $m-1$ columns of $\boldsymbol{U}^{-1}$ are the same as the first $m-1$ columns of $\boldsymbol{L}$. The final column of $\boldsymbol{U}^{-1}$ has only one nonzero entry, at the top; call that $u_{0}$. Call the entries in the final column of $\boldsymbol{H}$, in descending order, $h_{m-1}$, $h_{m-2}$, $\ldots$, $h_{1}$; let us suppose that the final entry is a multiple $c\,p(z)$ of the scalar polynomial.

Multiplying out $\boldsymbol{U}^{-1}\boldsymbol{H}$ and equating its final column to the final column of $\boldsymbol{L}(z)$ gives us a set of (triangular, as it happens) equations in the unknowns. These allow us to deduce a general form for $\boldsymbol{U}^{-1}(z)$, and to prove it is unimodular. Once we have $\boldsymbol{U}$ and $\boldsymbol{H}$ with $\boldsymbol{U}\boldsymbol{L}(z)=\boldsymbol{H}$, construction of unimodular $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ so that $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$ is straightforward.
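One way to carry out that last step (a sketch, in our notation): $\boldsymbol{H}$ is the identity except for its final column $(h_{m-1},\ldots,h_{1},c\,p(z))^{T}$, so $\boldsymbol{H}=\mathrm{diag}(1,\ldots,1,c\,p(z))\,\boldsymbol{G}(z)$, where $\boldsymbol{G}(z)$ is the identity with final column $(h_{m-1},\ldots,h_{1},1)^{T}$ and is obviously unimodular. Then $\boldsymbol{L}(z)=\boldsymbol{U}^{-1}\,\mathrm{diag}(1,\ldots,1,c\,p(z))\,\boldsymbol{G}(z)$, and conjugating the diagonal factor by the SIP matrix (and absorbing the nonzero constant $c$) puts the polynomial into the leading position, yielding the cofactors $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$.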

The results are slightly simpler in this case to present as inverses; here we show the Chebyshev scalar case.

\boldsymbol{E}^{-1}=\begin{bmatrix}1&a_{1}&a_{2}&a_{3}-a_{5}&2za_{5}+a_{4}\\0&0&-\frac{1}{2}&z&-\frac{1}{2}\\0&-\frac{1}{2}&z&-\frac{1}{2}&0\\0&z&-\frac{1}{2}&0&0\\0&-1&0&0&0\end{bmatrix} (3.10)

and

\boldsymbol{F}^{-1}=\begin{bmatrix}0&0&0&0&1\\0&0&0&1&-T_{1}(z)\\0&0&1&0&-T_{2}(z)\\0&1&0&0&-T_{3}(z)\\1&0&0&0&-T_{4}(z)\end{bmatrix}\,. (3.11)

It is not immediately obvious that 𝑬(z)\boldsymbol{E}(z) is unimodular, but it is. For 𝑭(z)\boldsymbol{F}(z) it is obvious.

Notice again that the result is linear in the unknown polynomial coefficients, and that we therefore have a universal equivalence (once generalized to the matrix polynomial case, and of arbitrary dimension, which is straightforward).
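As elsewhere in this paper, such claims can be checked mechanically. A Maple sketch for this scalar grade-5 Chebyshev case, with $\boldsymbol{L}(z)=z\boldsymbol{C}_{1}-\boldsymbol{C}_{0}$ assembled from equations (3.6)–(3.7), verifies $\boldsymbol{L}=\boldsymbol{E}^{-1}\,\mathrm{diag}(p,1,1,1,1)\,\boldsymbol{F}^{-1}$:

with(LinearAlgebra):
p := add(a[k]*ChebyshevT(k, z), k = 0 .. 5):
Einv := Matrix([[1, a[1], a[2], a[3] - a[5], 2*z*a[5] + a[4]],
                [0, 0, -1/2, z, -1/2],
                [0, -1/2, z, -1/2, 0],
                [0, z, -1/2, 0, 0],
                [0, -1, 0, 0, 0]]):
Finv := Matrix([[0, 0, 0, 0, 1],
                [0, 0, 0, 1, -ChebyshevT(1, z)],
                [0, 0, 1, 0, -ChebyshevT(2, z)],
                [0, 1, 0, 0, -ChebyshevT(3, z)],
                [1, 0, 0, 0, -ChebyshevT(4, z)]]):
L := Matrix([[2*a[5]*z + a[4], a[3] - a[5], a[2], a[1], a[0]],
             [-1/2, z, -1/2, 0, 0],
             [0, -1/2, z, -1/2, 0],
             [0, 0, -1/2, z, -1/2],
             [0, 0, 0, -1, z]]):
map(expand, Einv . DiagonalMatrix([p, 1, 1, 1, 1]) . Finv - L);  # the zero matrix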

4 Bernstein Basis

The set of Bernstein polynomials $B^{\ell}_{k}(z)$ in equation (2.1) is a set of $\ell+1$ polynomials, each of exact degree $\ell$, that together form a basis for polynomials of grade $\ell$ over fields $\mathbb{F}$ of characteristic zero. Bernstein polynomials have many applications, for example in Computer Aided Geometric Design (CAGD), and many important properties, including that of optimal condition number over all bases positive on $[0,1]$. They do not satisfy a simple three-term recurrence relation of the form discussed in Section 3.2, although they satisfy an interesting and useful “degree-elevation” recurrence, namely

(j+1)B_{j+1}^{n}(z)+(n-j)B_{j}^{n}(z)=n\,B_{j}^{n-1}(z)\,, (4.12)

which specifically demonstrates that a sum of Bernstein polynomials of degree $n$ might actually have degree strictly less than $n$. See Farouki (2012), Farouki and Goodman (1996), and Farouki and Rajan (1987) for more details of Bernstein bases.

A Bernstein linearization for $p_{5}(z)=\sum_{k=0}^{5}y_{k}B_{k}^{5}(z)$ is $\boldsymbol{L}(z)=$

\begin{bmatrix}\frac{1}{5}y_{5}z+y_{4}(1-z)&y_{3}(1-z)&y_{2}(1-z)&y_{1}(1-z)&y_{0}(1-z)\\z-1&\frac{2}{4}z&&&\\&z-1&\frac{3}{3}z&&\\&&z-1&\frac{4}{2}z&\\&&&z-1&\frac{5}{1}z\end{bmatrix}\,. (4.13)

The pattern on the diagonal is written in unreduced fractions so that the general pattern is more clear.
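A quick Maple check (a sketch of our own) confirms that the pencil (4.13) is a companion pencil with $\alpha=1$, that is, $\det\boldsymbol{L}(z)=p_{5}(z)$:

with(LinearAlgebra):
B := (k, n, z) -> binomial(n, k)*z^k*(1 - z)^(n - k):
p := add(y[k]*B(k, 5, z), k = 0 .. 5):
L := Matrix([[y[5]*z/5 + y[4]*(1 - z), y[3]*(1 - z), y[2]*(1 - z), y[1]*(1 - z), y[0]*(1 - z)],
             [z - 1, z/2, 0, 0, 0],
             [0, z - 1, z, 0, 0],
             [0, 0, z - 1, 2*z, 0],
             [0, 0, 0, z - 1, 5*z]]):
expand(Determinant(L) - p);   # 0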

For a construction of rational unimodular $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ that show that this is a local linearization, in the case that no eigenvalue occurs at the end of the Bernstein interval ($z=1$), see Amiraslani et al. (2008). The discussion in Mackey and Perović (2016) shows by strict equivalence that it is in fact a global strong linearization, independently of any singularities, or indeed independently of regularity of the matrix polynomial. We will give here only an explicit construction of unimodular polynomial matrices $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$, again except when $z=1$ is an eigenvalue, that is, requiring that the “leading” block $[B^{\ell}_{\ell}(z)]\boldsymbol{P}(z)=\boldsymbol{Y}_{\ell}$ be nonsingular.

The linearization used here was first analyzed in Jónsson (2001) and Jónsson and Vavasis (2004) (as a companion pencil for scalar polynomials) and has been further studied and generalized in Mackey and Perović (2016). One of the present authors independently invented and implemented a version of this linearization in the Maple command CompanionMatrix (except using $\boldsymbol{P}^{\mathrm{T}}(z)$, and flipped from the above form) in about 2004. For a review of Bernstein linearizations, see the aforementioned Mackey and Perović (2016). For a proof of their numerical stability, see the original thesis Jónsson (2001).

4.1 Hermite form of the linearization

We use the same approach as before, probing with a small example and computing its Hermite normal form explicitly to suggest the correct form.

The resulting suggested form for $\boldsymbol{U}^{-1}$ is

\begin{bmatrix}x&x&x&x&x\\x&x&0&0&x\\0&x&x&0&x\\0&0&x&x&x\\0&0&0&x&x\end{bmatrix} (4.14)

which is more complicated than before, because the final column is full (as before, the first $\ell-1$ columns merely copy the companion pencil $\boldsymbol{L}(z)$). Moreover, this time the entries in $\boldsymbol{U}^{-1}$ are not just linear in the $y_{k}$, $0\leq k<\ell$, but polynomial, and they involve negative powers of $y_{\ell}$ (this equivalence appears to be new: the treatment in Mackey and Perović (2016) is implicit, while the treatment in Amiraslani et al. (2008) only gives a local linearization). This means that in the case when $\boldsymbol{Y}_{\ell}$ is singular, $z=1$ is an eigenvalue and something else must be done to linearize the matrix polynomial.

We find a recurrence relation for the unknown blocks in both the purported Hermite normal form and in the inverse of the cofactor. Put $\boldsymbol{U}^{-1}=$

\begin{bmatrix}z\boldsymbol{Y}_{\ell}/\ell-(z-1)\boldsymbol{Y}_{\ell-1}&-(z-1)\boldsymbol{Y}_{\ell-2}&\cdots&-(z-1)\boldsymbol{Y}_{1}&\boldsymbol{W}_{\ell}\\-(z-1)\boldsymbol{I}_{n}&2z\boldsymbol{I}_{n}/(\ell-1)&&&\boldsymbol{W}_{\ell-1}\\&-(z-1)\boldsymbol{I}_{n}&\ddots&&\vdots\\&&&-(z-1)\boldsymbol{I}_{n}&\boldsymbol{W}_{0}\end{bmatrix} (4.15)

which, apart from the final column, is the same as the linearization $\boldsymbol{L}(z)$, and the Hermite normal form analogue as

\boldsymbol{H}=\begin{bmatrix}\boldsymbol{I}_{n}&&&&\boldsymbol{H}_{\ell-1}\\&\boldsymbol{I}_{n}&&&\boldsymbol{H}_{\ell-2}\\&&\ddots&&\vdots\\&&&\boldsymbol{I}_{n}&\boldsymbol{H}_{1}\\&&&&\boldsymbol{P}(z)\end{bmatrix}\,. (4.16)

Multiplying out $\boldsymbol{U}^{-1}\boldsymbol{H}$ we find that for the lower right corner block

-(z-1)\boldsymbol{H}_{1}+\boldsymbol{W}_{0}\boldsymbol{P}(z)=\ell z\boldsymbol{I}_{n}\,. (4.17)

A separate investigation shows that $\boldsymbol{W}_{0}$ must be a constant matrix. Evaluating that equation at $z=1$ shows that $\boldsymbol{W}_{0}=\ell\,\boldsymbol{P}^{-1}(1)$, so $\boldsymbol{P}(1)$ must be nonsingular for this form to hold. Once $\boldsymbol{W}_{0}$ is identified, equation (4.17) can be solved uniquely for the grade $\ell-1$ matrix polynomial $\boldsymbol{H}_{1}(z)$:

\boldsymbol{H}_{1}(z)=\frac{\ell}{z-1}\left(\boldsymbol{P}^{-1}(1)\boldsymbol{P}(z)-z\boldsymbol{I}_{n}\right)\,. (4.18)

Division is exact there, as can be checked by expanding the numerator of the right-hand side in Taylor series about z=1z=1.
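In the scalar case this exactness is easy to confirm in Maple (a sketch with our own names; note that $p(1)=y_{\ell}$ for a grade-$\ell$ Bernstein polynomial):

ell := 5:
p := add(y[k]*binomial(ell, k)*z^k*(1 - z)^(ell - k), k = 0 .. ell):
H1 := normal(ell*(p/eval(p, z = 1) - z)/(z - 1)):
degree(numer(H1), z);   # 4, i.e. grade ell - 1: the division by z - 1 is exact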

The other block entries in the multiplied-out equation give a triangular set of recurrences for the $\boldsymbol{W}_{k}$ and the $\boldsymbol{H}_{k}(z)$ that are solvable in the same way and under the same condition, namely that $\boldsymbol{P}(1)$ must be nonsingular.

Once we have $\boldsymbol{U}$ and $\boldsymbol{H}$ with $\boldsymbol{U}\boldsymbol{L}(z)=\boldsymbol{H}$, construction of unimodular $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ so that $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$ is straightforward.

4.2 Strict Equivalence

For the pencil in equation (4.13) we exhibit the strict equivalence by the unimodular matrices $\boldsymbol{U}$ and $\boldsymbol{W}$ giving $\boldsymbol{L}_{m}(z)=\boldsymbol{U}\boldsymbol{L}_{B}(z)\boldsymbol{W}$, below; for ease of understanding, we actually show $\boldsymbol{U}^{-1}$, which shows its relationship to $\boldsymbol{W}$ more clearly:

\boldsymbol{W}=\begin{bmatrix}5&0&0&0&0\\-10&10&0&0&0\\10&-20&10&0&0\\-5&15&-15&5&0\\1&-4&6&-4&1\end{bmatrix} (4.19)

and

\boldsymbol{U}^{-1}=\begin{bmatrix}1&h_{3}&h_{2}&h_{1}&-y_{0}\\0&5&0&0&0\\0&-10&10&0&0\\0&10&-20&10&0\\0&-5&15&-15&5\end{bmatrix}\,, (4.20)

where $h_{3}=-10y_{3}+20y_{2}-15y_{1}+4y_{0}$, $h_{2}=-10y_{2}+15y_{1}-6y_{0}$, and $h_{1}=-5y_{1}+4y_{0}$. Generalizing these to matrix polynomials is straightforward, as is generalizing these to arbitrary grade.
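These matrices are easily checked. The following Maple sketch (our own) verifies the scalar grade-5 identity in the equivalent form $\boldsymbol{L}_{B}(z)\boldsymbol{W}=\boldsymbol{U}^{-1}\boldsymbol{L}_{m}(z)$, extracting the monomial coefficients $a_{k}$ from the Bernstein form:

p := expand(add(y[k]*binomial(5, k)*z^k*(1 - z)^(5 - k), k = 0 .. 5)):
a := k -> coeff(p, z, k):
LM := Matrix([[z*a(5) + a(4), a(3), a(2), a(1), a(0)],
              [-1, z, 0, 0, 0], [0, -1, z, 0, 0],
              [0, 0, -1, z, 0], [0, 0, 0, -1, z]]):
LB := Matrix([[y[5]*z/5 + y[4]*(1 - z), y[3]*(1 - z), y[2]*(1 - z), y[1]*(1 - z), y[0]*(1 - z)],
              [z - 1, z/2, 0, 0, 0], [0, z - 1, z, 0, 0],
              [0, 0, z - 1, 2*z, 0], [0, 0, 0, z - 1, 5*z]]):
W := Matrix([[5, 0, 0, 0, 0], [-10, 10, 0, 0, 0], [10, -20, 10, 0, 0],
             [-5, 15, -15, 5, 0], [1, -4, 6, -4, 1]]):
h1 := -5*y[1] + 4*y[0]:  h2 := -10*y[2] + 15*y[1] - 6*y[0]:
h3 := -10*y[3] + 20*y[2] - 15*y[1] + 4*y[0]:
Uinv := Matrix([[1, h3, h2, h1, -y[0]],
                [0, 5, 0, 0, 0], [0, -10, 10, 0, 0],
                [0, 10, -20, 10, 0], [0, -5, 15, -15, 5]]):
map(expand, LB . W - Uinv . LM);   # the zero matrix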

The paper Mackey and Perović (2016) contains, in its equation (3.6), a five by five example including tensor products that shows how to find a strict equivalence of the pencil here to the strong linearizations they construct in their paper. Both the results of that paper and the construction given by example above demonstrate that the Bernstein linearization here is a strong linearization, independently of the singularity of 𝑷(1)\boldsymbol{P}(1) and indeed independently of the regularity of the matrix polynomial.

4.3 A new reversal

We here present a slightly different reversal, namely $\operatorname{rev}p(z)=(z+1)^{\ell}p(1/(z+1))$ of a polynomial of grade $\ell$ expressed in a Bernstein basis, instead of the standard reversal $z^{\ell}p(1/z)$. This new reversal has a slight numerical advantage if all the coefficients of $p(z)$ are the same sign. We also give a proof that the linearization of this reversal is the corresponding reversal of the linearization, thus giving a new independent proof that the linearization is a strong one. This provides the details of the entries in the unimodular matrix polynomials $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ with $\boldsymbol{E}(z)\boldsymbol{L}(z)\boldsymbol{F}(z)=\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$ constructed above.

A short computation shows that if

p(z)=\sum_{k=0}^{\ell}y_{k}B_{k}^{\ell}(z) (4.21)

then

\mathrm{rev}\,p(z)=(z+1)^{\ell}p\left(\frac{1}{z+1}\right)=\sum_{k=0}^{\ell}d_{k}B_{k}^{\ell}(z) (4.22)

where

d_{k}=\sum_{j=0}^{k}\binom{k}{j}y_{\ell-j}\,, (4.23)

whereas the coefficients of the standard reversal are, in contrast,

e_{k}=\sum_{m=0}^{\ell-k}(-1)^{m}\binom{\ell-k}{m}y_{\ell-m-k}\,, (4.24)

which has introduced sign changes, which may fail to preserve numerical stability if all the $y_{k}$ are of one sign. A further observation is that the coefficient $d_{0}$ only involves $y_{\ell}$, while $e_{0}$ involves all the $y_{k}$; $d_{1}$ involves $y_{\ell}$ and $y_{\ell-1}$ while $e_{1}$ involves all but $y_{0}$, and so on; in that sense, this new reversal behaves more analogously to the monomial basis reversal, which simply reverses the list of coefficients.
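Formula (4.23) is easy to confirm; a Maple sketch for grade $\ell=5$:

ell := 5:
B := (k, n, z) -> binomial(n, k)*z^k*(1 - z)^(n - k):
p := add(y[k]*B(k, ell, z), k = 0 .. ell):
revp := normal((z + 1)^ell * eval(p, z = 1/(z + 1))):
d := k -> add(binomial(k, j)*y[ell - j], j = 0 .. k):
expand(revp - add(d(k)*B(k, ell, z), k = 0 .. ell));   # 0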

For interest, we note that if $(\boldsymbol{A},\boldsymbol{B})$ is a linearization for $p(z)$ so that $p(z)=\det(z\boldsymbol{B}-\boldsymbol{A})$, then reversing the linearization by this transformation is not a matter of simply interchanging $\boldsymbol{B}$ and $\boldsymbol{A}$:

(z+1)^{\ell}p\left(\frac{1}{z+1}\right)=(z+1)^{\ell}\det\left(\frac{1}{z+1}\boldsymbol{B}-\boldsymbol{A}\right)=\det(\boldsymbol{B}-(z+1)\boldsymbol{A})=\det(\boldsymbol{B}-\boldsymbol{A}-z\boldsymbol{A}) (4.25)

and so the corresponding reversed linearization is $(\boldsymbol{A},\boldsymbol{B}-\boldsymbol{A})$. The sign change is of no importance.

Suppose that the Bernstein linearization of $p(z)$ is $(\boldsymbol{A},\boldsymbol{B})$ and that the Bernstein linearization of $\operatorname{rev}p(z)$ is $(\boldsymbol{A}_{R},\boldsymbol{B}_{R})$. That is, the matrices $\boldsymbol{A}_{R}$ and $\boldsymbol{B}_{R}$ have the same form as that of $\boldsymbol{A}$ and $\boldsymbol{B}$, but where $(\boldsymbol{A},\boldsymbol{B})$ contain $y_{k}$s the matrices $(\boldsymbol{A}_{R},\boldsymbol{B}_{R})$ contain $d_{k}$s. To give a new proof that the Bernstein linearization is actually a strong linearization, then, we must find a pair of unimodular matrices $(\boldsymbol{U},\boldsymbol{W})$ which have $\boldsymbol{U}\boldsymbol{A}_{R}\boldsymbol{W}=\boldsymbol{B}-\boldsymbol{A}$ and $\boldsymbol{U}\boldsymbol{B}_{R}\boldsymbol{W}=\boldsymbol{A}$, valid for all choices of coefficients $y_{k}$ (which determine the corresponding reversed coefficients $d_{k}$ by the formula above).

First, it simplifies matters to deal not with 𝑾\boldsymbol{W} but rather with 𝑾1\boldsymbol{W}^{-1}. Then, our defining conditions become

\boldsymbol{U}\boldsymbol{A}_{R}=\left(\boldsymbol{B}-\boldsymbol{A}\right)\boldsymbol{W}^{-1} (4.26)
\boldsymbol{U}\boldsymbol{B}_{R}=\boldsymbol{A}\boldsymbol{W}^{-1}\,, (4.27)

which are linear in the unknowns (the entries of $\boldsymbol{U}$ and of $\boldsymbol{W}^{-1}$). By inspection of the first few dimensions, we find that $\boldsymbol{U}$ and $\boldsymbol{W}^{-1}$ have the following form (using the six-by-six case, for variation, to demonstrate). The anti-diagonal of the general $\boldsymbol{U}$ has entries $-(\ell-i+1)/i$ for $i=1$, $2$, $\ldots$, $\ell$.

\boldsymbol{U}=\begin{bmatrix}0&0&0&0&0&-6\\0&0&0&0&-\frac{5}{2}&u_{2,6}\\0&0&0&-\frac{4}{3}&u_{3,5}&u_{3,6}\\0&0&-\frac{3}{4}&u_{4,4}&u_{4,5}&u_{4,6}\\0&-\frac{2}{5}&u_{5,3}&u_{5,4}&u_{5,5}&u_{5,6}\\-\frac{1}{6}&u_{6,2}&u_{6,3}&u_{6,4}&u_{6,5}&u_{6,6}\end{bmatrix} (4.28)

and

\boldsymbol{W}^{-1}=\begin{bmatrix}0&0&0&0&z_{1,5}&z_{1,6}\\0&0&0&z_{2,4}&z_{2,5}&z_{2,6}\\0&0&z_{3,3}&z_{3,4}&z_{3,5}&z_{3,6}\\0&z_{4,2}&z_{4,3}&z_{4,4}&z_{4,5}&z_{4,6}\\z_{5,1}&z_{5,2}&z_{5,3}&z_{5,4}&z_{5,5}&z_{5,6}\\0&0&0&0&0&1\end{bmatrix} (4.29)

One can then guess the explicit general formulae

u_{i,j}=-\frac{\ell-i+1}{i}\binom{i}{\ell+1-j}\,,\qquad 1\leq i,j\leq\ell (4.30)
z_{i,j}=-\frac{\ell-i}{j}\binom{i}{\ell-j}\,,\qquad 1\leq i,j\leq\ell-1 (4.31)
z_{i,\ell}=d_{i}-\frac{\ell-i}{\ell}\,y_{\ell}\,,\qquad 1\leq i\leq\ell-1 (4.32)

and then prove that these are not only necessary for the equations above, but also sufficient. The matrix $\boldsymbol{B}-\boldsymbol{A}$ is diagonal and gives a direct relationship between the triangular block in $\boldsymbol{W}^{-1}$ and a corresponding portion of $\boldsymbol{U}$; the other equation gives a recurrence relation for the entries of $\boldsymbol{U}$. Comparison of the final columns of the products gives an explicit formula for the final column of $\boldsymbol{W}^{-1}$ and an explicit formula for the entries of $\boldsymbol{U}$ by comparison of the coefficients of the symbols $y_{k}$; this formula can be seen to verify the recurrence relation found earlier, closing the circle and establishing sufficiency. Both matrices $\boldsymbol{U}$ and $\boldsymbol{W}$ have determinant $\pm 1$: $\lfloor\ell/2\rfloor$ row-permutations bring $\boldsymbol{U}$ to upper triangular form and the determinant $(-1)^{\ell}$ (times $(-1)^{\lfloor\ell/2\rfloor}$) can be read off as the product of the formerly anti-diagonal elements, and similarly for the $[1:\ell-1,1:\ell-1]$ block of $\boldsymbol{W}^{-1}$, which gives a sign $(-1)^{\ell-1+\lfloor(\ell-1)/2\rfloor}$.

5 Lagrange Interpolational Bases

A useful arrowhead companion matrix pencil for polynomials given in a Lagrange basis was given in Corless (2004) and Corless and Watt (2004). Later, Piers Lawrence recognized that the mathematically equivalent pencil, re-ordered by a similarity transformation with the standard involutory permutation (SIP) matrix so that the arrow points up and to the left, was numerically superior, in that one of the spurious infinite eigenvalues will be immediately deflated—without rounding errors—by the standard QZ algorithm Lawrence and Corless (2012). We shall use that variant in this paper.

For expository purposes, consider interpolating a scalar polynomial $p(z)$ on the four distinct nodes $\tau_{k}$ for $0\leq k\leq 3$. In the first barycentric form Berrut and Trefethen (2004) this is

p(z)=w(z)\sum_{k=0}^{3}\frac{\beta_{k}p_{k}}{z-\tau_{k}}=\sum_{k=0}^{3}p_{k}w_{k}(z)\,, (5.33)

where the node polynomial $w(z)=(z-\tau_{0})(z-\tau_{1})(z-\tau_{2})(z-\tau_{3})$, $w_{k}(z)=\beta_{k}w(z)/(z-\tau_{k})$, and the barycentric weights $\beta_{k}$ are

\beta_{k}=\prod_{\substack{j=0\\ j\neq k}}^{3}\frac{1}{\tau_{k}-\tau_{j}}\,. (5.34)

Then a Schur complement with respect to the bottom right $4\times 4$ block shows that if

L(z)=\begin{bmatrix}0&-p_{3}&-p_{2}&-p_{1}&-p_{0}\\\beta_{3}&z-\tau_{3}&0&0&0\\\beta_{2}&0&z-\tau_{2}&0&0\\\beta_{1}&0&0&z-\tau_{1}&0\\\beta_{0}&0&0&0&z-\tau_{0}\end{bmatrix} (5.35)

then $\det L(z)=p(z)$. By exhibiting a rational unimodular equivalence, the paper Amiraslani et al. (2008) showed that the general form of this was at least a local linearization for matrix polynomials $\boldsymbol{P}(z)$, and also demonstrated that this was true for the reversal as well, showing that it was a strong (local) linearization. They also gave indirect arguments, equivalent to the notion of patched local linearizations introduced in Dopico et al. (2020), showing that the construction gave a genuine strong linearization. Here, we wish to see if we can explicitly construct unimodular matrix polynomials $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ which show equivalence, directly demonstrating that this is a linearization.
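The scalar claim $\det L(z)=p(z)$ can be checked directly in Maple; the concrete nodes $0,1,2,3$ below are our own choice for illustration:

with(LinearAlgebra):
tau := [0, 1, 2, 3]:    # tau[k+1] holds the node tau_k
beta := [seq(1/mul(`if`(j = i, 1, tau[i] - tau[j]), j = 1 .. 4), i = 1 .. 4)]:
L := Matrix([[0, -p[3], -p[2], -p[1], -p[0]],
             [beta[4], z - tau[4], 0, 0, 0],
             [beta[3], 0, z - tau[3], 0, 0],
             [beta[2], 0, 0, z - tau[2], 0],
             [beta[1], 0, 0, 0, z - tau[1]]]):
interpolant := CurveFitting:-PolynomialInterpolation(tau, [p[0], p[1], p[2], p[3]], z):
expand(Determinant(L) - interpolant);   # 0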

5.1 Hermite form of the linearization

As we did for the monomial basis, we start with a scalar version. We compute the Hermite normal form $\boldsymbol{H}$, and the transformation matrix $\boldsymbol{U}$ so that $\boldsymbol{U}\boldsymbol{L}(z)=\boldsymbol{H}$, with Maple, to give clues to the general form. When we do this for the grade $3$ example above we find that the form of $\boldsymbol{U}$ is not helpful, but the forms of $\boldsymbol{U}^{-1}$ and $\boldsymbol{H}$ are:

L(z)=\begin{bmatrix}0&x&x&x&0\\x&x&0&0&x\\x&0&x&0&x\\x&0&0&x&x\\x&0&0&0&0\end{bmatrix}\begin{bmatrix}1&0&0&0&x\\0&1&0&0&x\\0&0&1&0&x\\0&0&0&1&x\\0&0&0&0&x\end{bmatrix}\,. (5.36)

As is usual, the generic Hermite normal form contains p(z)p(z) in the lower right corner; on specialization of the polynomial coefficients this can change, of course. We return to this point later.

This gives us enough information to conjecture the general form in the theorem below, and a proof follows quickly.

Theorem 5.1.

If

L(z)=\begin{bmatrix}0&-\boldsymbol{P}_{\ell}&-\boldsymbol{P}_{\ell-1}&\cdots&-\boldsymbol{P}_{0}\\\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}&0&0&0\\\beta_{\ell-1}\boldsymbol{I}_{n}&0&(z-\tau_{\ell-1})\boldsymbol{I}_{n}&0&0\\\vdots&&&\ddots&\\\beta_{0}\boldsymbol{I}_{n}&0&0&0&(z-\tau_{0})\boldsymbol{I}_{n}\end{bmatrix} (5.37)

and no $\boldsymbol{P}_{k}$ is singular, then $L(z)=\boldsymbol{U}^{-1}\boldsymbol{H}$ where $\boldsymbol{U}^{-1}=$

\begin{bmatrix}0&-\boldsymbol{P}_{\ell}&-\boldsymbol{P}_{\ell-1}&\cdots&-\boldsymbol{P}_{1}&0\\\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}&0&\cdots&0&\boldsymbol{U}_{\ell}\\\beta_{\ell-1}\boldsymbol{I}_{n}&0&(z-\tau_{\ell-1})\boldsymbol{I}_{n}&\cdots&0&\boldsymbol{U}_{\ell-1}\\\vdots&&&\ddots&&\vdots\\\beta_{1}\boldsymbol{I}_{n}&0&0&\cdots&(z-\tau_{1})\boldsymbol{I}_{n}&\boldsymbol{U}_{1}\\\beta_{0}\boldsymbol{I}_{n}&0&0&\cdots&0&0\end{bmatrix}\,,

and $\boldsymbol{H}$ is the identity matrix with its final column replaced with a block matrix that can be partitioned as

\boldsymbol{H}=\begin{bmatrix}\boldsymbol{I}_{n}&&&&&\boldsymbol{G}\\&\boldsymbol{I}_{n}&&&&\boldsymbol{H}_{\ell}\\&&\boldsymbol{I}_{n}&&&\boldsymbol{H}_{\ell-1}\\&&&\ddots&&\vdots\\&&&&\boldsymbol{I}_{n}&\boldsymbol{H}_{1}\\&&&&&\boldsymbol{P}(z)\end{bmatrix}\,.

Moreover,

\boldsymbol{G}=\frac{1}{\beta_{0}}(z-\tau_{0})\boldsymbol{I}_{n}\,, (5.38)
\boldsymbol{U}_{k}=-\frac{\beta_{k}}{\beta_{0}}(\tau_{k}-\tau_{0})\boldsymbol{P}_{k}^{-1}\,, (5.39)

for 1k1\leq k\leq\ell, and

\boldsymbol{H}_{k}=-\frac{1}{z-\tau_{k}}\left(\beta_{k}\boldsymbol{G}+\boldsymbol{U}_{k}\boldsymbol{P}(z)\right)\,. (5.40)

Note that the definition of $\boldsymbol{U}_{k}$ in equation (5.39) ensures that the division in equation (5.40) is exact, and therefore $\boldsymbol{H}_{k}$ is a matrix polynomial of grade $\ell-1$. Furthermore, $\boldsymbol{U}$ is unimodular.

Proof.

Block multiplication of the forms of $\boldsymbol{U}^{-1}$ and $\boldsymbol{H}$ gives $\ell+2$ block equations:

\sum_{k=1}^{\ell}\boldsymbol{P}_{k}\boldsymbol{H}_{k}=\boldsymbol{P}_{0}\,, (5.41)
\beta_{k}\boldsymbol{G}+(z-\tau_{k})\boldsymbol{H}_{k}+\boldsymbol{U}_{k}\boldsymbol{P}(z)=0\,,\qquad 1\leq k\leq\ell\,, (5.42)
\beta_{0}\boldsymbol{G}=(z-\tau_{0})\boldsymbol{I}_{n}\,. (5.43)

The last block equation identifies $\boldsymbol{G}$. Putting $z=\tau_{k}$ in each of the middle block equations identifies each $\boldsymbol{U}_{k}$. Once that has been done, the middle block equations define each $\boldsymbol{H}_{k}$. All that remains is to show that these purported solutions satisfy equation (5.41) and that the resulting matrix $\boldsymbol{U}$ is unimodular.

We will need the following facts about the node polynomial $w(z)=\prod_{k=0}^{\ell}(z-\tau_{k})$, the barycentric weights $\beta_{k}$, and the first barycentric form for $\boldsymbol{P}(z)$:

\frac{1}{w(z)}=\sum_{k=0}^{\ell}\frac{\beta_{k}}{z-\tau_{k}}\,, (5.44)
\frac{z}{w(z)}=\sum_{k=0}^{\ell}\frac{\beta_{k}\tau_{k}}{z-\tau_{k}}\,, (5.45)

and

\boldsymbol{P}(z)=w(z)\sum_{k=0}^{\ell}\frac{\beta_{k}}{z-\tau_{k}}\boldsymbol{P}_{k}\,. (5.46)

By substituting $z=0$ in equation (5.45) we find that the sum of the barycentric weights is zero. (This is true even if one of the $\tau_{k}$ is zero, inductively, because one factor of $z$ then cancels in the numerator and denominator on the left hand side and the result is one degree lower.)
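Identities (5.44) and (5.45) are easily confirmed symbolically; a Maple sketch with four symbolic nodes:

w := mul(z - tau[k], k = 0 .. 3):
b := k -> 1/mul(`if`(j = k, 1, tau[k] - tau[j]), j = 0 .. 3):
normal(1/w - add(b(k)/(z - tau[k]), k = 0 .. 3));        # 0
normal(z/w - add(b(k)*tau[k]/(z - tau[k]), k = 0 .. 3)); # 0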

We now substitute equations (5.39) and (5.40) into the left hand side of equation (5.41) to get

\sum_{k=1}^{\ell}\frac{1}{z-\tau_{k}}\boldsymbol{P}_{k}\left(-\frac{\beta_{k}}{\beta_{0}}(z-\tau_{0})+\frac{\beta_{k}}{\beta_{0}}(\tau_{k}-\tau_{0})\boldsymbol{P}_{k}^{-1}\boldsymbol{P}(z)\right)\,. (5.47)

Expanding this, we have

\mathrm{LHS}=-\frac{z-\tau_{0}}{\beta_{0}}\sum_{k=1}^{\ell}\frac{\beta_{k}}{z-\tau_{k}}\boldsymbol{P}_{k}+\sum_{k=1}^{\ell}\frac{\beta_{k}(\tau_{k}-\tau_{0})}{\beta_{0}(z-\tau_{k})}\boldsymbol{P}(z)
=-\frac{z-\tau_{0}}{\beta_{0}}\left[\frac{1}{w(z)}\boldsymbol{P}(z)-\frac{\beta_{0}}{z-\tau_{0}}\boldsymbol{P}_{0}\right]+\frac{1}{\beta_{0}}\left[\frac{z}{w(z)}-\frac{\beta_{0}\tau_{0}}{z-\tau_{0}}\right]\boldsymbol{P}(z)-\frac{\tau_{0}}{\beta_{0}}\left[\frac{1}{w(z)}-\frac{\beta_{0}}{z-\tau_{0}}\right]\boldsymbol{P}(z)
=\boldsymbol{P}_{0}\,. (5.48)

This shows that we have found a successful factoring analogous to the scalar Hermite normal form.

All that remains is to show that $\boldsymbol{U}(z)$ is unimodular. We use the Schur determinantal formula, and identify the $\beta_{k}(\tau_{k}-\tau_{0})$ as the barycentric weights on the nodes with $\tau_{0}$ removed, and we see that up to sign (depending on dimension) the determinant simplifies to $1$. This completes the proof. ∎

As a corollary, we can explicitly construct unimodular matrix polynomials $\boldsymbol{E}(z)$ and $\boldsymbol{F}(z)$ from these factors showing that, if each $\boldsymbol{P}_{k}$ is nonsingular, $L(z)$ is equivalent to $\mathrm{diag}(\boldsymbol{P}(z),\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$.

As in Amiraslani et al. (2008) we may use $LU$ factoring to show that $\boldsymbol{P}$ is also equivalent to $\boldsymbol{L}$ at the nodes, essentially using Proposition 2.1 of Dopico et al. (2020). We also have a purely algebraic proof based on local equivalence of Smith forms, not shown here.

5.2 Linearization equivalence via local Smith form

In this section we approach the equivalence of $\boldsymbol{P}$ to its linearization via localizations. The argument does not require that $\boldsymbol{P}$ or its evaluations $\boldsymbol{P}_{k}$ be nonsingular. However, this method is less explicit about the form of the unimodular cofactors. Also, a stronger result, strict equivalence, is shown in section 5.3. The line of argument here, that local information may be brought together to obtain a global equivalence, may be useful for other equivalence problems lacking strict equivalence.

The localizations here are in the algebraic sense. $\mathbb{F}[z]$ is a principal ideal ring, as is each localization $\mathbb{F}[z]_{f}$ at an irreducible $f\in\mathbb{F}[z]$, obtained by inverting $M_{f}$, the multiplicatively closed set of all $g\in\mathbb{F}[z]$ relatively prime to $f$. The ideals of $\mathbb{F}[z]_{f}$ are the multiples of a given power of $f$: $I_{k}=f^{k}\,\mathbb{F}[z]_{f}$. Working over such a localization, unimodular cofactors may have denominators $d(z)$ relatively prime to $f$. Then this algebraically local solution is a local solution in the analytic sense of section 2, being valid for all $z$ not a zero of $d$.

Lemma 1.

Matrices $A,B\in\mathbb{F}[z]^{n\times n}$ are equivalent if and only if they are equivalent over every localization $\mathbb{F}[z]_{f}$ at an irreducible $f(z)$.

Proof.

Let $\boldsymbol{S}=\mathrm{diag}(s_{1},\ldots,s_{n})$ be the Smith normal form of $\boldsymbol{A}$. There are unimodular $\boldsymbol{E},\boldsymbol{F}\in\mathbb{F}[z]^{n\times n}$ such that $\boldsymbol{A}=\boldsymbol{E}\boldsymbol{S}\boldsymbol{F}$. Let $f(z)$ be an irreducible and suppose $g_{k}\in\mathbb{F}[z]$ and $e_{k}$ are such that $s_{k}=f^{e_{k}}g_{k}$ with $g_{k}$ relatively prime to $f$. Then the matrix $\boldsymbol{U}=\mathrm{diag}(g_{1},\ldots,g_{n})$ is unimodular over $\mathbb{F}[z]_{f}$ and $\boldsymbol{A}=\boldsymbol{E}\boldsymbol{U}\boldsymbol{S}_{f}\boldsymbol{F}$, where $\boldsymbol{S}_{f}=\mathrm{diag}(f^{e_{1}},\ldots,f^{e_{n}})$ is in Smith normal form and is the unique Smith form of $\boldsymbol{A}$ locally at $f$. Thus the powers of $f$ in the Smith form over $\mathbb{F}[z]$ form precisely the Smith form locally over $\mathbb{F}[z]_{f}$. Matrices $\boldsymbol{A},\boldsymbol{B}$ are equivalent precisely when they have the same Smith form. As we’ve just shown, this happens if and only if they have the same Smith forms locally at each irreducible $f$. ∎

We remark that it would also work to approximate the localizations and do computations over $\mathbb{F}[z]/\langle f^{e}\rangle$ for sufficiently large $e$; $e=n\ell$ would do, since the degree of any minor of $\boldsymbol{P}$ is bounded by $n\ell$. For a thorough treatment of the local approach to the Smith form, see Wilkening and Yu (2011).

Assuming that $\boldsymbol{L}$ is equivalent to $\mathrm{diag}(\boldsymbol{P},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$ (which we show next), one can readily produce unimodular cofactors using a Smith form algorithm. Let $\boldsymbol{S}$ be the Smith form of $\boldsymbol{P}$; then the Smith form of $\boldsymbol{L}$ is $\mathrm{diag}(\boldsymbol{S},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$. Let the unimodular cofactors be $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$, $\boldsymbol{D}$ such that $\boldsymbol{P}=\boldsymbol{A}\boldsymbol{S}\boldsymbol{B}$ and $\boldsymbol{L}=\boldsymbol{C}\,\mathrm{diag}(\boldsymbol{S},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})\,\boldsymbol{D}$. Then, using the cofactors

\boldsymbol{E}=\boldsymbol{C}\,\mathrm{diag}(\boldsymbol{A}^{-1},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})\quad\text{and}\quad\boldsymbol{F}=\mathrm{diag}(\boldsymbol{B}^{-1},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})\,\boldsymbol{D}\,,

one has $\boldsymbol{L}=\boldsymbol{E}\,\mathrm{diag}(\boldsymbol{P},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})\,\boldsymbol{F}$.

Theorem 5.2.

Let $\tau$ be a list of $\ell+1$ distinct nodes in $\mathbb{F}$. Let $\boldsymbol{P}\in\mathbb{F}[z]^{n\times n}$ be of degree no larger than $\ell$. Then $\boldsymbol{L}$ is equivalent to $\mathrm{diag}(\boldsymbol{P},\boldsymbol{I}_{n},\ldots,\boldsymbol{I}_{n})$, in which $\boldsymbol{L}$ is the Lagrange interpolation linearization of $\boldsymbol{P}$ and there are $\ell+1$ identity blocks in the block diagonal equivalent.

Proof.

We’ll show the equivalence for each localization at an irreducible; from this the equivalence over $\mathbb{F}[z]$ follows. We show the equivalence locally at any $f$ relatively prime to $w_{\ell}$. This captures the general case, where $f$ is relatively prime to all $z-\tau_{k}$, as well as the special case $f=z-\tau_{\ell}$. The cases $f=z-\tau_{k}$, $k<\ell$, follow by symmetry.

Let $\boldsymbol{Z}=\mathrm{diag}((z-\tau_{\ell-1})\boldsymbol{I}_{n},\ldots,(z-\tau_{0})\boldsymbol{I}_{n})$, $\boldsymbol{R}=\begin{bmatrix}\boldsymbol{P}_{\ell-1}&\ldots&\boldsymbol{P}_{0}\end{bmatrix}$, and $\boldsymbol{B}=\begin{bmatrix}\beta_{\ell-1}\boldsymbol{I}_{n}&\ldots&\beta_{0}\boldsymbol{I}_{n}\end{bmatrix}^{T}$. Then we have the block form

\boldsymbol{L}=\begin{bmatrix}0&\boldsymbol{P}_{\ell}&\boldsymbol{R}\\-\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}&0\\-\boldsymbol{B}&0&\boldsymbol{Z}\end{bmatrix}\,.

Note that $w_{\ell}$ is a unit locally at $f$, so that $\boldsymbol{Z}$ is unimodular. Thus $\boldsymbol{L}$ is equivalent to $\boldsymbol{S}\oplus\boldsymbol{Z}$ and to $\boldsymbol{S}\oplus\boldsymbol{I}_{n\ell}$, where $\boldsymbol{S}$ is the following Schur complement of $\boldsymbol{Z}$ in $\boldsymbol{L}$:

\begin{bmatrix}0&\boldsymbol{P}_{\ell}\\-\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}\end{bmatrix}-\begin{bmatrix}\boldsymbol{R}\\0\end{bmatrix}\boldsymbol{Z}^{-1}\begin{bmatrix}-\boldsymbol{B}&0\end{bmatrix}=\begin{bmatrix}\boldsymbol{R}\boldsymbol{Z}^{-1}\boldsymbol{B}&\boldsymbol{P}_{\ell}\\-\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}\end{bmatrix}\,.

Then we may manipulate this block into the desired form $\mathrm{diag}(\boldsymbol{P},\boldsymbol{I}_{n})$:

\begin{bmatrix}-\boldsymbol{I}_{n}&-\beta_{\ell}^{-1}\boldsymbol{R}\boldsymbol{Z}^{-1}\boldsymbol{B}\\0&-\beta_{\ell}^{-1}\boldsymbol{I}_{n}\end{bmatrix}\begin{bmatrix}\boldsymbol{R}\boldsymbol{Z}^{-1}\boldsymbol{B}&\boldsymbol{P}_{\ell}\\-\beta_{\ell}\boldsymbol{I}_{n}&(z-\tau_{\ell})\boldsymbol{I}_{n}\end{bmatrix}\begin{bmatrix}-w\boldsymbol{I}_{n}&\boldsymbol{I}_{n}\\-w_{\ell}\boldsymbol{I}_{n}&0\end{bmatrix}=\begin{bmatrix}w\boldsymbol{R}\boldsymbol{Z}^{-1}\boldsymbol{B}+w_{\ell}\boldsymbol{P}_{\ell}&0\\0&\boldsymbol{I}_{n}\end{bmatrix}\,.

Expanding the leading block, we see that it is $\boldsymbol{P}$:

w\sum_{k=0}^{\ell-1}\boldsymbol{P}_{k}(z-\tau_{k})^{-1}\beta_{k}+w_{\ell}\boldsymbol{P}_{\ell}=\sum_{k=0}^{\ell}w_{k}\boldsymbol{P}_{k}=\boldsymbol{P}\,. ∎

5.3 Strict Equivalence

We prove the following theorem, establishing the strict equivalence of the companion pencil of equation (5.37), for a matrix polynomial $\boldsymbol{P}(z)$ determined to grade $\ell$ by interpolation at $\ell+1$ points, to the monomial basis linearization for $\boldsymbol{P}(z)=0\cdot z^{\ell+2}+0\cdot z^{\ell+1}+\sum_{k=0}^{\ell}\boldsymbol{A}_{k}z^{k}$ considered as a grade $\ell+2$ matrix polynomial. This establishes that the Lagrange basis pencil is in fact a linearization, indeed a strong linearization, independently of the singularity or not of any of the values of the matrix polynomial, and independently of the regularity of the matrix pencil.

Theorem 5.3.

Provided that the nodes $\tau_{k}$ are distinct, the Lagrange basis linearization in equation (5.37), namely $\boldsymbol{L}_{L}(z)=z\boldsymbol{C}_{L,1}-\boldsymbol{C}_{L,0}$, is strictly equivalent to the monomial basis linearization $\boldsymbol{L}_{M}(z)=z\boldsymbol{C}_{M,1}-\boldsymbol{C}_{M,0}$; in other words, there exist nonsingular constant matrices $\boldsymbol{U}$ and $\boldsymbol{W}$ such that both $\boldsymbol{U}\boldsymbol{C}_{L,1}\boldsymbol{W}=\boldsymbol{C}_{M,1}$ and $\boldsymbol{U}\boldsymbol{C}_{L,0}\boldsymbol{W}=\boldsymbol{C}_{M,0}$. Explicitly, if $\boldsymbol{V}$ is the Vandermonde matrix with $(i,j)$ entry $v_{i,j}=\tau_{j-1}^{\ell+1-i}$ for $1\leq i,j\leq\ell+1$, then we have

\boldsymbol{U}=\begin{bmatrix}\boldsymbol{I}_{n}&0\\0&\boldsymbol{V}\otimes\boldsymbol{I}_{n}\end{bmatrix} (5.49)

and, if the node polynomial $w(z)=\prod_{k=0}^{\ell}(z-\tau_{k})=z^{\ell+1}+q_{\ell}z^{\ell}+\cdots+q_{0}$ has coefficients $q_{k}$, which we place in a row vector $\boldsymbol{q}=[q_{\ell},q_{\ell-1},\ldots,q_{0}]$,

\boldsymbol{W}=\begin{bmatrix}\boldsymbol{I}_{n}&\boldsymbol{q}\\0&\boldsymbol{V}^{-1}\otimes\boldsymbol{I}_{n}\end{bmatrix}\,. (5.50)
Proof.

Notice first that $\det\boldsymbol{U}=\det(\boldsymbol{V}\otimes\boldsymbol{I}_{n})=\prod_{i<j}(\tau_{j}-\tau_{i})^{n}$ is not zero, and that $\det\boldsymbol{W}$ is the reciprocal of that. Both these matrices are therefore nonsingular if the nodes are distinct.

Next, it is straightforward to verify that $\boldsymbol{U}\boldsymbol{C}_{L,1}\boldsymbol{W}=\boldsymbol{C}_{M,1}$ by multiplying out, using the fact that the upper left corner block of each is the zero block, and that the rest is the identity because $(\boldsymbol{V}\otimes\boldsymbol{I}_{n})\cdot(\boldsymbol{V}^{-1}\otimes\boldsymbol{I}_{n})=\boldsymbol{I}_{n(\ell+1)}$.

The remainder of the proof consists of detailed examination of the consequences of multiplying out the more complicated $\boldsymbol{U}\boldsymbol{C}_{L,0}\boldsymbol{W}$. From now on we drop the $\otimes\boldsymbol{I}_{n}$ notation as clutter, and the proof is written only for the scalar case, but the tensor products should be kept in mind as the operations are read. Writing

\boldsymbol{C}_{L,0}=\begin{bmatrix}0&-\boldsymbol{P}\\\beta&\boldsymbol{D}\end{bmatrix} (5.51)

where $\boldsymbol{D}=\mathrm{diag}(\tau_{\ell},\tau_{\ell-1},\ldots,\tau_{0})$, we have (using $\boldsymbol{P}\boldsymbol{V}^{-1}=\boldsymbol{A}$)

\boldsymbol{C}_{L,0}\boldsymbol{W}=\begin{bmatrix}0&-\boldsymbol{A}\\\beta&\beta\boldsymbol{q}+\boldsymbol{D}\boldsymbol{V}^{-1}\end{bmatrix}\,. (5.52)

We seek to establish that

\boldsymbol{U}^{-1}\boldsymbol{C}_{M,0}=\begin{bmatrix}1&\\&\boldsymbol{V}^{-1}\end{bmatrix}\begin{bmatrix}0&-\boldsymbol{A}_{\ell}&\cdots&-\boldsymbol{A}_{1}&-\boldsymbol{A}_{0}\\1&0&&&\\&1&0&&\\&&\ddots&\ddots&\\&&&1&0\end{bmatrix} (5.53)

is the same matrix. This means establishing that

\begin{bmatrix}\beta&\beta\boldsymbol{q}+\boldsymbol{D}\boldsymbol{V}^{-1}\end{bmatrix}=\begin{bmatrix}\boldsymbol{V}^{-1}&0\end{bmatrix}\,. (5.54)

Begin by relating the monomial basis to the Lagrange basis:

\begin{bmatrix}z^{\ell}\\z^{\ell-1}\\\vdots\\z\\1\end{bmatrix}=\begin{bmatrix}\tau_{\ell}^{\ell}&\tau_{\ell-1}^{\ell}&\cdots&\tau_{0}^{\ell}\\\tau_{\ell}^{\ell-1}&\tau_{\ell-1}^{\ell-1}&\cdots&\tau_{0}^{\ell-1}\\\vdots&&&\vdots\\\tau_{\ell}&\tau_{\ell-1}&&\tau_{0}\\1&1&\cdots&1\end{bmatrix}\begin{bmatrix}w_{\ell}(z)\\w_{\ell-1}(z)\\\vdots\\w_{1}(z)\\w_{0}(z)\end{bmatrix}\,, (5.55)

where the $w_{k}(z)$ are the Lagrange basis polynomials $w_{k}(z)=\beta_{k}\prod_{j\neq k}(z-\tau_{j})$. The matrix with the powers of $\tau_{k}$ is, of course, just $\boldsymbol{V}$.

Name the columns of $\boldsymbol{V}^{-1}=[\Lambda_{\ell},\Lambda_{\ell-1},\ldots,\Lambda_{0}]$. We thus have to show that the final column of equation (5.54) is $0$ and that

\Lambda_{j-1}=\beta q_{j}+\boldsymbol{D}\Lambda_{j} (5.56)

for $j=1$, $2$, $\ldots$, $\ell$, and that $\Lambda_{\ell}=\beta$. But this is just a translation into matrix algebra of the $\ell+2$ polynomial coefficients of

\beta_{i}w(z)=(z-\tau_{i})\,w_{i}(z)\,,\qquad 0\leq i\leq\ell\,, (5.57)

or $z\,w_{i}(z)=\beta_{i}w(z)+\tau_{i}w_{i}(z)$. The coefficients of different powers of $z$ in these identities provide the columns in the matrix equation (5.54). This completes the proof. ∎
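A mechanical check is reassuring here, since the sign conventions are easy to garble. The following Maple sketch (scalar case, $\ell=3$, with concrete nodes of our own choosing, ordered descending as in equation (5.35)) verifies the pencil-level identity $\boldsymbol{U}\boldsymbol{L}_{L}(z)\boldsymbol{W}=\boldsymbol{L}_{M}(z)$; in our normalization the top block-row of $\boldsymbol{W}$ carries $-\boldsymbol{q}$:

with(LinearAlgebra):
tau := [3, 2, 1, 0]:    # tau[1] = tau_3, ..., tau[4] = tau_0 (descending)
beta := [seq(1/mul(`if`(j = i, 1, tau[i] - tau[j]), j = 1 .. 4), i = 1 .. 4)]:
LL := Matrix([[0, -p[3], -p[2], -p[1], -p[0]],
              [beta[1], z - tau[1], 0, 0, 0],
              [beta[2], 0, z - tau[2], 0, 0],
              [beta[3], 0, 0, z - tau[3], 0],
              [beta[4], 0, 0, 0, z - tau[4]]]):
V := Matrix(4, 4, (i, j) -> tau[j]^(4 - i)):   # Vandermonde, columns in node order
U := Matrix(5, 5):  U[1, 1] := 1:  U[2 .. 5, 2 .. 5] := V:
w := expand(mul(z - tau[k], k = 1 .. 4)):
q := [seq(coeff(w, z, 4 - k), k = 1 .. 4)]:    # [q3, q2, q1, q0]
W := Matrix(5, 5):  W[1, 1] := 1:
for k to 4 do W[1, k + 1] := -q[k] end do:      # note the sign choice
W[2 .. 5, 2 .. 5] := V^(-1):
ULW := map(expand, U . LL . W);
# ULW is node-free, with first row the (negated) monomial coefficients of the
# degree-3 interpolant and a companion structure below; its determinant is p(z):
interp := CurveFitting:-PolynomialInterpolation([seq(tau[i], i = 1 .. 4)],
                                                [p[3], p[2], p[1], p[0]], z):
expand(Determinant(ULW) - interp);   # 0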

6 Concluding Remarks

This paper shows how to use scalar matrix tools—specifically the Maple code for computing the Hermite normal form—and computation with low-dimensional examples to solve problems posed for matrix polynomials of arbitrary dimension and arbitrary block size. Current symbolic computation systems do not seem to offer facilities for direct computation with such objects.

The mathematical problems that we examined included the explicit construction of unimodular matrix polynomial cofactors 𝑬(z)\boldsymbol{E}(z) and 𝑭(z)\boldsymbol{F}(z) which would demonstrate that a given linearization 𝑳(z)\boldsymbol{L}(z) for a matrix polynomial 𝑷(z)\boldsymbol{P}(z) was, indeed, a linearization.

We showed that the approach discovered cofactors that were valid generically. This is because the primary tool used here, namely the Hermite normal form, is discontinuous at special values of the parameters (and indeed discovery of those places of discontinuity is the main purpose of the Hermite and Smith normal forms). Nonetheless, in the case of the monomial basis and others we were able to guess such a universal form (by replacing a leading coefficient block by the identity block) from the HNF. A separate investigation, based on the change-of-basis matrix but also inspired by experimental computation, found new explicit universal cofactors for all bases considered here. We also introduced a new reversal for polynomials expressed in the Bernstein basis which may have better numerical properties than the standard reversal does.

For the Bernstein basis, which is symmetric under $z\to 1-z$, we were initially puzzled that the Hermite analogue had a problem if $\boldsymbol{P}(1)$ was singular but not when $\boldsymbol{P}(0)$ was singular. This asymmetry disappears by considering instead a Hermite analogue of the form

\boldsymbol{H}=\begin{bmatrix}\boldsymbol{P}(z)&&&\\\boldsymbol{H}_{\ell-2}&\boldsymbol{I}_{n}&&\\\vdots&&\ddots&\\\boldsymbol{H}_{0}&&&\boldsymbol{I}_{n}\end{bmatrix}\,. (6.58)

This works only if $\boldsymbol{P}(0)^{-1}$ exists.

We plan to apply this method to Hermite interpolational bases, where (as previously for Lagrange interpolational bases) only local linearizations are currently known in the literature.

Acknowledgements 6.1.

RMC thanks Froilán Dopico for several very useful discussions on linearization, giving several references, and pointing out the difference between local linearization and linearization. The support of Western University’s New International Research Network grant is also gratefully acknowledged.

References

  • Amiraslani et al. [2008] Amir Amiraslani, Robert M. Corless, and Peter Lancaster. Linearization of matrix polynomials expressed in polynomial bases. IMA Journal of Numerical Analysis, 29(1):141–157, 2008.
  • Berrut and Trefethen [2004] Jean-Paul Berrut and Lloyd N. Trefethen. Barycentric Lagrange interpolation. SIAM Review, 46(3):501–517, 2004. doi: 10.1137/S0036144502417715. URL https://doi.org/10.1137/S0036144502417715.
  • Cachadina et al. [2020] Maria Isabel Bueno Cachadina, Javier Perez, Anthony Akshar, Daria Mileeva, and Remy Kassem. Linearizations for interpolatory bases - a comparison: New families of linearizations. The Electronic Journal of Linear Algebra, 36(36):799–833, December 2020. doi: 10.13001/ela.2020.5183. URL https://doi.org/10.13001/ela.2020.5183.
  • Corless [2004] Robert M Corless. Generalized companion matrices in the Lagrange basis. In Proceedings EACA, pages 317–322. Citeseer, 2004.
  • Corless and Watt [2004] Robert M. Corless and Stephen M. Watt. Bernstein bases are optimal, but, sometimes, Lagrange bases are better. In Proceedings of SYNASC, Timisoara, pages 141–153. MIRTON Press, 2004.
  • Dopico et al. [2020] Froilán M. Dopico, Silvia Marcaida, María C. Quintana, and Paul Van Dooren. Local linearizations of rational matrices with application to rational approximations of nonlinear eigenvalue problems. Linear Algebra and its Applications, 604:441–475, November 2020. doi: 10.1016/j.laa.2020.07.004. URL https://doi.org/10.1016/j.laa.2020.07.004.
  • Farouki [2012] Rida T. Farouki. The Bernstein polynomial basis: A centennial retrospective. Computer Aided Geometric Design, 29(6):379–419, 2012.
  • Farouki and Goodman [1996] Rida T. Farouki and T. Goodman. On the optimal stability of the Bernstein basis. Mathematics of Computation of the American Mathematical Society, 65(216):1553–1566, 1996.
  • Farouki and Rajan [1987] Rida T. Farouki and V. T. Rajan. On the numerical condition of polynomials in Bernstein form. Computer Aided Geometric Design, 4(3):191–216, 1987.
  • Gautschi [2016] Walter Gautschi. Orthogonal polynomials in MATLAB: Exercises and Solutions, volume 26. SIAM, 2016.
  • Güttel and Tisseur [2017] Stefan Güttel and Françoise Tisseur. The nonlinear eigenvalue problem. Acta Numerica, 26:1–94, 2017.
  • Jónsson [2001] Guðbjörn F. Jónsson. Eigenvalue methods for accurate solution of polynomial equations. PhD thesis, Center for Applied Mathematics, Cornell University, Ithaca, NY, 2001.
  • Jónsson and Vavasis [2004] Guðbjörn F. Jónsson and Stephen Vavasis. Solving polynomials with small leading coefficients. SIAM Journal on Matrix Analysis and Applications, 26(2):400–414, 2004.
  • Lawrence and Corless [2012] Piers W. Lawrence and Robert M. Corless. Numerical stability of barycentric Hermite root-finding. In Proceedings of the 2011 International Workshop on Symbolic-Numeric Computation, pages 147–148. ACM, 2012.
  • Mackey and Perović [2016] D. Steven Mackey and Vasilije Perović. Linearizations of matrix polynomials in Bernstein bases. Linear Algebra and its Applications, 501:162–197, July 2016. doi: 10.1016/j.laa.2016.03.019. URL https://doi.org/10.1016/j.laa.2016.03.019.
  • Mackey et al. [2015] D. Steven Mackey, Niloufer Mackey, and Françoise Tisseur. Polynomial eigenvalue problems: Theory, computation, and structure. In Numerical Algebra, Matrix Theory, Differential-Algebraic Equations and Control Theory, pages 319–348. Springer, 2015.
  • Storjohann [1994] Arne Storjohann. Computation of Hermite and Smith normal forms of matrices. Master’s thesis, University of Waterloo, 1994.
  • Wilkening and Yu [2011] Jon Wilkening and Jia Yu. A local construction of the Smith normal form of a matrix polynomial. Journal of Symbolic Computation, 46(1):1–22, 2011. ISSN 0747-7171. doi: 10.1016/j.jsc.2010.06.025. URL https://www.sciencedirect.com/science/article/pii/S0747717110001136.