
Cheriton School of Computer Science, University of Waterloo
Massachusetts Institute of Technology
Copyright: Bryce Sandlund and Yinzhan Xu
ACM Subject Classification: Theory of computation → Data structures design and analysis

Acknowledgements.
We would like to thank Virginia Vassilevska Williams for her valuable comments on a draft of this paper. We would also like to thank the anonymous reviewers for their helpful suggestions.

Faster Dynamic Range Mode

Bryce Sandlund    Yinzhan Xu
Abstract

In the dynamic range mode problem, we are given a sequence $a$ of length bounded by $N$ and asked to support element insertion, deletion, and queries for the most frequent element of a contiguous subsequence of $a$. In this work, we devise a deterministic data structure that handles each operation in worst-case $\tilde{O}(N^{0.655994})$ time, thus breaking the $O(N^{2/3})$ per-operation time barrier for this problem. The data structure is achieved by combining the ideas in Williams and Xu (SODA 2020) for batch range mode with a novel data structure variant of the Min-Plus product.

keywords:
Range Mode, Min-Plus Product

1 Introduction

Given a sequence of elements $a_1, a_2, \ldots, a_n$, the dynamic range mode problem asks to support queries for the most frequent element in a specified subsequence $a_l, a_{l+1}, \ldots, a_r$, while also supporting insertion or deletion of an element at a given index $i$. The mode of a sequence of elements is one of the most basic data statistics, along with the median and the mean. It is frequently computed in data mining, information retrieval, and data analytics.
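For concreteness, the three operations can be realized naively as follows. This is only an illustrative baseline (not the paper's data structure): updates and queries may scan a linear portion of the sequence, versus the worst-case $\tilde{O}(N^{0.655994})$ per operation achieved in this work.

```python
from collections import Counter

class NaiveRangeMode:
    """Baseline for dynamic range mode: each operation may scan the whole
    sequence, in contrast to the sublinear worst-case cost in this paper."""

    def __init__(self):
        self.a = []

    def insert(self, i, x):
        self.a.insert(i, x)   # x becomes the element at index i

    def delete(self, i):
        self.a.pop(i)         # remove the element at index i

    def query(self, l, r):
        # Most frequent element of a[l..r] (0-indexed, inclusive endpoints);
        # ties may be broken arbitrarily, matching the problem statement.
        return Counter(self.a[l:r + 1]).most_common(1)[0][0]
```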

The range mode problem seeks to answer multiple queries on distinct intervals of the data sequence without having to recompute each answer from scratch. Its study in the data structure community has shown that the mode is a much more challenging data statistic to maintain than other natural range queries: while range sum, min or max, median, and majority all support linear-space dynamic data structures with poly-logarithmic or better time per operation [22, 7, 16, 12, 10], the fastest dynamic range mode data structure prior to this paper requires a stubborn $O(n^{2/3})$ time per operation [9]. Indeed, range mode is one of the few remaining classical range queries for which our currently known algorithms may be far from optimal. As originally stated by Brodal et al. [4] and mentioned by Chan et al. [6] in 2011 and 2014, respectively, “The problem of finding the most frequent element within a given array range is still rather open.”

The current best conditional lower bound, by Chan et al. [6], reduces multiplication of two $\sqrt{n}\times\sqrt{n}$ boolean matrices to $n$ range mode queries on a fixed array of size $O(n)$. This indicates that if the current algorithm for boolean matrix multiplication is optimal, then answering $n$ range mode queries on an array of size $O(n)$ cannot be performed in $O(n^{3/2-\epsilon})$ time for any $\epsilon>0$ with combinatorial techniques, or in $O(n^{\omega/2-\epsilon})$ time for any $\epsilon>0$ in general, where $\omega<2.373$ [25, 13] is the square matrix multiplication exponent. This reduction can be strengthened for dynamic range mode by reducing from the online matrix-vector multiplication problem [17]. Using $O(n)$ dynamic range mode operations on a sequence of length $O(n)$, we can multiply a $\sqrt{n}\times\sqrt{n}$ boolean matrix with $\sqrt{n}$ boolean vectors given one at a time. This indicates that a dynamic range mode data structure taking $O(n^{1/2-\epsilon})$ time per operation for any $\epsilon>0$ is not possible with current knowledge.

Previous attempts indicate the higher $O(n^{2/3})$ per-operation cost as the bound to beat [6, 9]. Indeed, $\tilde{O}(n^{2/3})$ time per operation (we use the $\tilde{O}(\cdot)$ notation to hide poly-logarithmic factors) can be achieved with a variety of techniques, but crossing the $O(n^{2/3})$ barrier appears much harder.

Progress towards this goal has been established with the recent work of Williams and Xu [26]. They show that by appealing to the Min-Plus product of structured matrices, $n$ range mode queries on an array of size $n$ can be answered in $\tilde{O}(n^{1.4854})$ time, thus beating the combinatorial lower bound for batch range mode. This result also shows a separation between batch range mode and dynamic range mode: while batch range mode can be completed in $O(n^{1/2-\epsilon})$ time per operation, such a result for dynamic range mode would imply a breakthrough in the online matrix-vector multiplication problem.

Range mode is not the first problem shown to be closely related to the Min-Plus product problem. It is well known that the all-pairs shortest paths (APSP) problem is asymptotically equivalent to Min-Plus product [11], in the sense that a $T(n)$ time algorithm to compute the Min-Plus product of two $n\times n$ matrices implies an $O(T(n))$ time algorithm for APSP in $n$-node graphs and vice versa. Although it is not known how to compute the Min-Plus product of two arbitrary $n\times n$ matrices in $O(n^{3-\epsilon})$ time for any $\epsilon>0$, several problems reduce to Min-Plus products of matrices $A$ and $B$ which have nice structures that can be exploited. The simplest examples result from restricting edge weights in APSP problems [23, 24, 28, 5, 27]. Bringmann et al. [3] show that Language Edit Distance, RNA-folding, and Optimum Stack Generation can be reduced to a Min-Plus product where matrix $A$ has small differences between adjacent entries in each row and column. Finally, the recent work of Williams and Xu [26] reduces APSP in certain geometric graphs, batch range mode, and the maximum subarray problem with entries bounded by $O(n^{0.62})$ to a more general structured Min-Plus product, extending the result of Bringmann et al. All of the above structured Min-Plus products are solvable in truly subcubic $O(n^{3-\epsilon})$ time for some $\epsilon>0$, improving the algorithms for the problems reduced to said products.

The connection and upper bound established by Williams and Xu [26] relating batch range mode to the Min-Plus product suggest that other versions of the range mode problem may be amenable to similar improvements. In particular, the ability to efficiently compute a batch of range mode queries by reducing to a structured Min-Plus product suggests that one might be able to improve the update time of dynamic range mode in a similar way.

1.1 Our Results

In this paper, we break the $O(n^{2/3})$ time per operation barrier for dynamic range mode. We do so by adapting the result of Williams and Xu [26]. Specifically, we define the following new type of data structure problem on the Min-Plus product that can be applied to dynamic range mode, and which may be of independent interest. We then combine this data structure problem with the algorithm of Williams and Xu.

Problem (Min-Plus-Query problem).

During initialization, we are given two matrices $A, B$. For each query, we are given three parameters $i, j, S$, where $i, j$ are two integers and $S$ is a set of integers. The query asks for $\min_{k\notin S}\{A_{i,k}+B_{k,j}\}$.
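The semantics of the problem are captured by the naive solution, which simply scans every allowed index; for an $n\times n^s$ matrix $A$ this costs $O(n^s)$ per query, which is exactly what the data structures below improve on. A minimal sketch, with matrices as plain lists of lists:

```python
import math

def min_plus_query(A, B, i, j, S):
    """Naive Min-Plus-Query: min over k not in S of A[i][k] + B[k][j].
    Returns infinity if every index is forbidden."""
    return min((A[i][k] + B[k][j] for k in range(len(B)) if k not in S),
               default=math.inf)
```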

Our performance theorem is the following.

Theorem 1.1.

There exists a deterministic data structure for dynamic range mode on a sequence $a_1,\ldots,a_n$ that supports query, insertion, and deletion in worst-case $\tilde{O}(N^{0.655994})$ time per operation, where $N$ is the maximum size of the sequence at any point in time. The space complexity of the data structure is $\tilde{O}(N^{1.327997})$.

Our result shows yet another application of the Min-Plus product to an independently studied problem, ultimately showing a dependence of the complexity of dynamic range mode on the complexity of fast matrix multiplication. Further, in contrast to many other reductions to Min-Plus product, in which we must assume a structured input to the original problem [23, 24, 28, 5, 27, 26], our algorithm works for the fully general dynamic range mode problem. In this sense, our result is perhaps most directly comparable to the batch range mode reduction of Williams and Xu [26] and the Language Edit Distance, RNA-folding, and Optimum Stack Generation reductions of Bringmann et al. [3].

1.2 Discussion of Technical Difficulty

Despite the new $\tilde{O}(n^{1.4854})$ time algorithm for batch range mode [26], we cannot directly apply the result to dynamic range mode. The main issue is the element deletion operation. In the range mode algorithm of Williams and Xu (and in many other range mode algorithms), critical points are chosen evenly distributed in the array, and the algorithm precomputes the range mode of the interval between every pair of critical points. In [26], the improvement is achieved via a faster precomputation algorithm, which uses a Min-Plus product algorithm for structured matrices. However, if element deletion is allowed, the results stored in the precomputation will no longer be applicable. For example, an interval between two critical points could contain $x$ copies of element $a$, $x-1$ copies of element $b$, and many other elements with frequencies less than $x-1$. During precomputation, the range mode of this interval would be $a$. However, if we delete two copies of $a$, there is no easy way to determine that the mode of this interval has now changed to $b$.

We overcome this difficulty by introducing the Min-Plus-Query problem, as defined in Section 1.1. Intuitively, in the Min-Plus-Query problem, a large portion of the work of the Min-Plus product is put off until the query. It also supports more flexible queries. Using the Min-Plus-Query problem as a subroutine, we will be able to query the most frequent element excluding a set $S$ of forbidden elements. For instance, in the preceding example, we would be able to query the most frequent element that is not $a$. This is the main technical contribution of the paper.

Another major difference between our algorithm for dynamic range mode and the batch range mode algorithm of Williams and Xu [26] is the need for rectangular matrix multiplication. In our algorithm, we treat elements that appear more than about $N^{2/3}$ times differently from the rest (a similar treatment is given in the dynamic range mode algorithm of El-Zein et al. [9]). However, the number of critical points we use is about $N^{1/3}$; thus the number of critical points and the number of frequent elements differ. This contrasts with batch range mode, where elements that appear more than about $\sqrt{n}$ times are considered frequent and the number of critical points used coincides with the number of frequent elements. The consequence of this difference is that a rectangular matrix product is required for dynamic range mode, while a square matrix product sufficed in [26].

2 Related Work

The range mode problem was first studied formally by Krizanc et al. [18]. They study space-efficient data structures for static range mode, achieving a time-space tradeoff of $O(n^{2-2\epsilon})$ space and $O(n^{\epsilon}\log n)$ query time for any $0<\epsilon\leq 1/2$. They also give a solution occupying $O(n^2\log\log n/\log n)$ space with $O(1)$ time per query.

Chan et al. [6] also study static range mode, focusing on linear-space solutions. They achieve a linear-space data structure supporting queries in $O(\sqrt{n})$ time via clever use of arrays, which can be improved to $O(\sqrt{n/\log n})$ time via bit-packing tricks. Their paper also introduces the conditional lower bound which reduces multiplication of two $\sqrt{n}\times\sqrt{n}$ boolean matrices to $n$ range mode queries on an array of size $O(n)$. As mentioned, combined with the presumed hardness of the online matrix-vector problem [17], this result indicates that a dynamic range mode data structure cannot take $O(n^{1/2-\epsilon})$ time per operation for any $\epsilon>0$. Finally, Chan et al. [6] also give the first data structure for dynamic range mode. At linear space, their solution achieves $O(n^{3/4}\log n/\log\log n)$ worst-case time per query and $O(n^{3/4}\log\log n)$ amortized expected time per update; at $O(n^{4/3})$ space, their solution achieves $O(n^{2/3}\log n/\log\log n)$ worst-case query time and amortized expected update time.

Recently, El-Zein et al. [9] improved the runtime of dynamic range mode to worst-case $O(n^{2/3})$ time per operation while simultaneously improving the space usage to linear. Prior to this paper, this result was the fastest data structure for dynamic range mode.

A cell-probe lower bound for static range mode has been devised by Greve et al. [15]. Their result states that any range mode data structure that uses $S$ memory cells of $w$-bit words needs $\Omega\left(\frac{\log n}{\log(Sw/n)}\right)$ time to answer a query.

Via reduction to a structured Min-Plus product, Williams and Xu [26] recently showed that $n$ range mode queries on a fixed array of size $n$ can be answered in $\tilde{O}(n^{1.4854})$ time. Williams and Xu actually show how to compute the frequency of the mode for each query. We can adapt this method to find the element that is the mode using the following binary search. For a query $[l,r]$ whose mode has frequency $f$, we search for the smallest right endpoint $r'$ such that the mode of $[l,r']$ also has frequency $f$: we ask for the frequency of the mode in the range $[l,(l+r)/2]$; if it equals $f$, we repeat the search with right endpoint in the range $[l,(l+r)/2]$, and otherwise with right endpoint in the range $[(l+r)/2,r]$. The element $a_{r'}$ is then a mode of $[l,r]$, since appending it is what raises the mode frequency of the prefix to $f$. We thus find the element that is the mode in an additional $O(\log n)$ queries. The algorithm of Williams and Xu can also be used to speed up the preprocessing time of the $O(n)$ space, $O(\sqrt{n})$ query time static range mode data structure to $\tilde{O}(n^{1.4854})$ time.
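The binary search just described can be sketched with a brute-force frequency oracle standing in for the batch algorithm of Williams and Xu; `mode_freq` is a hypothetical stand-in for that oracle, and only the search logic reflects the text.

```python
from collections import Counter

def mode_freq(a, l, r):
    """Frequency of the most frequent element of a[l..r] (inclusive).
    Stand-in for a frequency-only range mode oracle."""
    return max(Counter(a[l:r + 1]).values())

def mode_element(a, l, r):
    """Recover a mode of a[l..r] using O(log n) frequency queries:
    binary search the smallest r' with mode_freq(a, l, r') equal to the
    full-range mode frequency; a[r'] is then a mode of [l, r]."""
    f = mode_freq(a, l, r)
    lo, hi = l, r
    while lo < hi:
        mid = (lo + hi) // 2
        if mode_freq(a, l, mid) == f:
            hi = mid          # frequency f already reached by endpoint mid
        else:
            lo = mid + 1      # need a larger right endpoint
    return a[lo]
```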

Both static and dynamic range mode have been studied in approximate settings [2, 15, 8].

3 Preliminaries

We formally define the Min-Plus product problem and the dynamic range mode problem.

Problem 3.1 (Min-Plus product).

The Min-Plus product of an $m\times n$ matrix $A$ and an $n\times p$ matrix $B$ is the $m\times p$ matrix $C=A\star B$ such that $C_{i,j}=\min_{k}\{A_{i,k}+B_{k,j}\}$.
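A direct implementation of this definition takes cubic time, which is the baseline that the structured Min-Plus algorithms discussed in this paper beat. A minimal sketch:

```python
def min_plus(A, B):
    """Cubic-time Min-Plus product C = A * B with C[i][j] = min_k A[i][k] + B[k][j].
    No truly subcubic algorithm is known for arbitrary matrices, which is why
    the paper exploits structure in A and B."""
    m, n, p = len(A), len(B), len(B[0])
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]
```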

Problem 3.2 (Dynamic Range Mode).

In the dynamic range mode problem, we are given an initially empty sequence and must support the following operations:

  • Insert an element at a given position of the sequence.

  • Delete one element of the sequence.

  • Query the most frequent element of any contiguous subsequence. If there are multiple answers, output any.

It is guaranteed that the size of the sequence does not exceed $N$ at any point in time.

We use $\omega$ to denote the square matrix multiplication exponent, i.e., the smallest real number such that two $n\times n$ matrices can be multiplied in $n^{\omega+o(1)}$ time. The current bound on $\omega$ is $2\leq\omega<2.373$ [13, 25]. In this work, we will use fast rectangular matrix multiplication. Analogous to the square case, we use $\omega(k)$ to denote the exponent of rectangular matrix multiplication, i.e., the smallest real number such that an $n\times n^k$ matrix and an $n^k\times n$ matrix can be multiplied in $n^{\omega(k)+o(1)}$ time. Le Gall and Urrutia [14] computed the best upper bounds to date for various values of $k$. In this work, we are mostly interested in the values of $\omega(k)$ listed in Figure 1.

$k$     Upper bound on $\omega(k)$
1.75    3.021591
2       3.251640
Figure 1: Upper bounds for the exponent of multiplying an $n\times n^k$ matrix and an $n^k\times n$ matrix [14].

It is known that the function $\omega(k)$ is convex for $k>0$ (see e.g. [19, 20]), so we can use the values of $\omega(p)$ and $\omega(q)$ to give upper bounds for $\omega(k)$ as long as $p\leq k\leq q$.

Fact 1.

When $0<p\leq k\leq q$, $\omega(k)\leq\frac{k-p}{q-p}\omega(q)+\frac{q-k}{q-p}\omega(p)$.

Combining Figure 1 and Fact 1, we obtain the following bound on $\omega(k)$ when $k\in[1.75,2]$.

Corollary 3.3.

When $1.75\leq k\leq 2$, $\omega(k)\leq 0.920196k+1.41125$.
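The corollary is Fact 1 instantiated with the two rows of Figure 1; the short check below recomputes the slope and intercept of the interpolating line from those table values.

```python
# Linear interpolation (Fact 1) between the Figure 1 bounds
# omega(1.75) <= 3.021591 and omega(2) <= 3.251640.
p, wp = 1.75, 3.021591
q, wq = 2.00, 3.251640

slope = (wq - wp) / (q - p)       # expected ~0.920196
intercept = wp - slope * p        # expected ~1.41125

def omega_bound(k):
    """Upper bound on omega(k) for 1.75 <= k <= 2, by convexity (Fact 1)."""
    assert p <= k <= q
    return ((k - p) * wq + (q - k) * wp) / (q - p)
```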

4 Main Algorithm

A main technical component of our dynamic range mode algorithm is the use of the Min-Plus-Query problem, which is formally defined in Section 1. We are given two matrices $A, B$. For each query, we are given three parameters $i, j, S$, and we need to compute $\min_{k\notin S}\{A_{i,k}+B_{k,j}\}$.

If we just use the Min-Plus-Query problem, we can only compute the frequency of the range mode. Although we could binary search for the most frequent element as described in Section 2, the Min-Plus-Query problem can also be extended to return a witness organically, via the following variant. This construction may be of independent interest.

Problem 4.1 (Min-Plus-Query-Witness problem).

During initialization, we are given two matrices $A, B$. For each query, we are given three parameters $i, j, S$, where $i, j$ are two integers and $S$ is a set of integers. We must output an index $k^*\notin S$ such that $A_{i,k^*}+B_{k^*,j}=\min_{k\notin S}\{A_{i,k}+B_{k,j}\}$.
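As with Min-Plus-Query, the naive witness query scans every allowed index. A minimal sketch, returning `None` when every index is forbidden:

```python
def min_plus_query_witness(A, B, i, j, S):
    """Naive Min-Plus-Query-Witness: an index k* not in S minimizing
    A[i][k] + B[k][j], or None if all indices are forbidden.
    The data structures in this section answer this in ~O(|S| + P) time
    after preprocessing."""
    best = None
    for k in range(len(B)):
        if k in S:
            continue
        if best is None or A[i][k] + B[k][j] < A[i][best] + B[best][j]:
            best = k
    return best
```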

If $A$ is an $n\times n^s$ matrix and $B$ is an $n^s\times n$ matrix, then the naive algorithm for Min-Plus-Query just enumerates all possible indices $k$ for each query, which takes $O(n^s)$ time per query. In order to get a faster algorithm for dynamic range mode, we need to achieve $\tilde{O}(n^{2+s-\epsilon})$ preprocessing time and $\tilde{O}(n^{s-\epsilon}+|S|)$ query time, for some $\epsilon>0$, where $A, B$ are certain special matrices generated by the range mode instance. Specifically, matrix $B$ satisfies the following two properties:

  1. Each row of $B$ is non-increasing;

  2. The difference between the sum of elements in the $j$-th column and the sum of elements in the $(j+1)$-th column is at most $n^s$, for any $j$.

Williams and Xu [26] give a faster algorithm for multiplying an arbitrary matrix $A$ with such a matrix $B$, which leads to a faster algorithm for static range mode. We will show that nontrivial data structures exist for the Min-Plus-Query problem on such input matrices $A$ and $B$. Such a data structure will lead to a faster algorithm for dynamic range mode.

In the following lemma, we show a data structure for the Min-Plus-Query problem when both input matrices have integer weights small in absolute value.

Lemma 4.2.

Let $s\geq 1$ be a constant. Let $A$ and $B$ be two integer matrices of dimension $n\times n^s$ and $n^s\times n$, respectively, with entries in $\{-W,\ldots,W\}\cup\{\infty\}$ for some $W\geq 1$. Then we can solve the Min-Plus-Query problem of $A$ and $B$ in $\tilde{O}(Wn^{\omega(s)})$ preprocessing time and $\tilde{O}(|S|)$ query time. The space complexity is $\tilde{O}(Wn^2+n^{1+s})$.

Proof 4.3.

The algorithm uses an idea of Alon, Galil and Margalit [1], which computes the Min-Plus product of $A, B$ in $\tilde{O}(Wn^{\omega(s)})$ time.

In their algorithm, they first construct a matrix $A'$ defined by

$$A'_{i,k}=\begin{cases}(n^s+1)^{A_{i,k}+W}&\text{if }A_{i,k}\neq\infty,\\ 0&\text{otherwise}.\end{cases}$$

We can define $B'$ similarly. Then the product $A'B'$ captures some useful information about the Min-Plus product of $A$ and $B$. Namely, we can uniquely write each entry $(A'B')_{i,j}$ as $\sum_{t\geq 0}r^{i,j}_t(n^s+1)^t$ for integers $0\leq r^{i,j}_t\leq n^s$. Note that $r^{i,j}_t$ exactly equals the number of indices $k$ such that $A_{i,k}+B_{k,j}=t-2W$. Thus, we can use $A'B'$ to compute the Min-Plus product of $A$ and $B$.

In our algorithm, we use a range tree to maintain the sequence $r^{i,j}_t$ for each pair $i,j$. The preprocessing takes $\tilde{O}(Wn^{\omega(s)})$ time, which is the time to compute $A'B'$ and the sequences $r^{i,j}_t$.

During each query, we are given $i, j, S$. We enumerate each $k\in S$, and decrement $r^{i,j}_{A_{i,k}+B_{k,j}+2W}$ in the range tree if $A_{i,k}+B_{k,j}<\infty$. After we do this for every $k\in S$, we query the range tree for the smallest $t$ such that $r^{i,j}_t\neq 0$; then $t-2W$ is the answer to the Min-Plus-Query query. After each query, we need to restore the values of $r^{i,j}$, which can also be done efficiently. The query time is $\tilde{O}(|S|)$, since each update and each query of the range tree takes $\tilde{O}(1)$ time. The space complexity is clear from the algorithm.
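The encoding in this proof can be demonstrated on small matrices. The sketch below forms $A'B'$ entry by entry with big-integer arithmetic and reads off the lowest nonzero base-$(n^s+1)$ digit; the actual algorithm instead computes $A'B'$ with fast matrix multiplication and keeps the digit sequences $r^{i,j}_t$ in range trees, so this is only an illustration of why the encoding works.

```python
import math

def minplus_via_counting(A, B, W):
    """Sketch of the Alon-Galil-Margalit encoding: finite entries of A, B
    (integers in [-W, W]) become powers of base = n^s + 1, and digit t of
    (A'B')[i][j] in that base counts the pairs k with A[i][k] + B[k][j] = t - 2W."""
    m, ns, p = len(A), len(B), len(B[0])
    base = ns + 1
    enc = lambda x: 0 if x == math.inf else base ** (x + W)
    C = [[math.inf] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            prod = sum(enc(A[i][k]) * enc(B[k][j]) for k in range(ns))
            # The smallest t with digit r_t != 0 gives the min-plus value t - 2W.
            t = 0
            while prod and prod % base == 0:
                prod //= base
                t += 1
            if prod:
                C[i][j] = t - 2 * W
    return C
```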

In the previous lemma, the data structure only answers the Min-Plus-Query problem. In all subsequent lemmas, the data structure will be able to handle the Min-Plus-Query-Witness problem.

In the next lemma, we use Lemma 4.2 as a subroutine to show a data structure for the Min-Plus-Query-Witness problem when only matrix $A$ has small integer weights in absolute value.

Lemma 4.4.

Let $s\geq 1$ be a constant. Let $A$ and $B$ be two integer matrices of dimension $n\times n^s$ and $n^s\times n$, respectively, where $A$ has entries in $\{-W,\ldots,W\}\cup\{\infty\}$ for some $W\geq 1$, and $B$ has arbitrary integer entries represented by $\operatorname{polylog} n$ bit numbers. Then for every integer $1\leq P\leq n^s$, we can solve the Min-Plus-Query-Witness problem of $A$ and $B$ in $O(\frac{n^s}{P}Wn^{\omega(s)})$ preprocessing time and $O(|S|+P)$ query time. The space complexity is $\tilde{O}(\frac{Wn^{2+s}}{P}+\frac{n^{1+2s}}{P})$.

Proof 4.5.

For simplicity, assume $P$ is a factor of $n^s$. We sort each column of matrix $B$ and put the entries whose rank is between $(\ell-1)P+1$ and $\ell P$ into the $\ell$-th bucket. We use $K_{j,\ell}$ to denote the set of row indices of the entries in the $\ell$-th bucket of column $j$. We use $L_{j,\ell}$ to denote the smallest entry value of the bucket $K_{j,\ell}$, and $H_{j,\ell}$ to denote the largest entry value. Formally,

$$L_{j,\ell}=\min_{k\in K_{j,\ell}}B_{k,j}\quad\text{and}\quad H_{j,\ell}=\max_{k\in K_{j,\ell}}B_{k,j}.$$

For each $\ell\in[n^s/P]$ (we use $[n]$, for an integer $n$, to denote the set $\{1,2,\ldots,n\}$), we do the following. We create an $n^s\times n$ matrix $B^\ell$ and initialize all its entries to $\infty$. Then for each column $j$, if $H_{j,\ell}-L_{j,\ell}\leq 2W$ (we will call such a bucket a small bucket), we set $B^\ell_{k,j}:=B_{k,j}-L_{j,\ell}-W$ for all $k\in K_{j,\ell}$. We will handle the case $H_{j,\ell}-L_{j,\ell}>2W$ (a large bucket) later. Clearly, all entries in $B^\ell$ have values in $\{-W,\ldots,W\}\cup\{\infty\}$, so we can use the algorithm in Lemma 4.2 to preprocess $A$ and $B^\ell$ and store the data structure in $D^\ell$. Also, for each pair $(i,j)$, we create a range tree $\mathcal{T}_{\text{small}}^{i,j}$ on the sequence $(A\star B^1)_{i,j},(A\star B^2)_{i,j},\ldots,(A\star B^{n^s/P})_{i,j}$, which stores the optimal Min-Plus values when $k$ is from a specific small bucket. This part takes $\tilde{O}(\frac{n^s}{P}Wn^{\omega(s)})$ time. The space complexity is $\frac{n^s}{P}$ times the space complexity of Lemma 4.2, so the space complexity of this part is $\tilde{O}(\frac{Wn^{2+s}}{P}+\frac{n^{1+2s}}{P})$.

We also do the following preprocessing for buckets where $H_{j,\ell}-L_{j,\ell}>2W$. We first create a $0/1$ matrix $\bar{A}$ where $\bar{A}_{i,k}=1$ if and only if $A_{i,k}\neq\infty$. Then for each $\ell\in[n^s/P]$, we create a $0/1$ matrix $\bar{B}^\ell$ such that $\bar{B}^\ell_{k,j}=1$ if and only if $k\in K_{j,\ell}$ and $H_{j,\ell}-L_{j,\ell}>2W$. Then we use fast matrix multiplication to compute the product $\bar{A}\bar{B}^\ell$. If $K_{j,\ell}$ is a large bucket, the $(i,j)$-th entry of $\bar{A}\bar{B}^\ell$ is the number of $k\in K_{j,\ell}$ such that $A_{i,k}<\infty$; if $K_{j,\ell}$ is a small bucket, the $(i,j)$-th entry is $0$. For each pair $(i,j)$, we create a range tree $\mathcal{T}_{\text{large}}^{i,j}$ on the sequence $(\bar{A}\bar{B}^1)_{i,j},(\bar{A}\bar{B}^2)_{i,j},\ldots,(\bar{A}\bar{B}^{n^s/P})_{i,j}$. This part takes $\tilde{O}(\frac{n^s}{P}n^{\omega(s)})$ time, which is dominated by the time for small buckets. The space complexity is also dominated by that of the data structures for small buckets.

Now we describe how to handle a query $(i,j,S)$. First consider small buckets. In $O(|S|)$ time, we can compute the set of small buckets $K_{j,\ell}$ that intersect $S$. For each such $K_{j,\ell}$, we can query the data structure $D^\ell$ with input $(i,j,S\cap K_{j,\ell})$ to get the optimum value when $k\in K_{j,\ell}$. For each small bucket that intersects $S$, we set its corresponding value in the range tree $\mathcal{T}_{\text{small}}^{i,j}$ to $\infty$; then we can compute the optimum value over all small buckets that do not intersect $S$ by querying the minimum value of the range tree $\mathcal{T}_{\text{small}}^{i,j}$. After this query, we need to restore all values in the range tree. It takes $\tilde{O}(|S|)$ time to handle small buckets during a query.

Now consider large buckets. Intuitively, we want to enumerate the indices in all large buckets $K_{j,\ell}$ such that there exists an index $k\in K_{j,\ell}\cap([n^s]\setminus S)$ where $A_{i,k}<\infty$. However, doing so would be prohibitively expensive. We will show that we only need two such buckets. Consider three large buckets $l_1<l_2<l_3$. Pick any $k_1\in K_{j,l_1}$ and $k_3\in K_{j,l_3}$ such that $A_{i,k_1},A_{i,k_3}<\infty$. Since

$$A_{i,k_1}+B_{k_1,j}\leq W+L_{j,l_2}<W+H_{j,l_2}-2W<A_{i,k_3}+B_{k_3,j},$$

$k_3$ can never be the optimum. Thus, it suffices to find the smallest two buckets such that there exists an index $k\in K_{j,\ell}\cap([n^s]\setminus S)$ where $A_{i,k}<\infty$, and then enumerate all indices in these two buckets. To find these two buckets, we enumerate over all indices $k\in S$, and if $A_{i,k}<\infty$ we decrement the corresponding value in the range tree $\mathcal{T}_{\text{large}}^{i,j}$. We can then compute the two smallest such buckets by querying the two earliest nonzero values in the range tree. We also need to restore the range tree after the query. The range tree part takes $\tilde{O}(|S|)$ time and scanning the two large buckets requires $O(P)$ time. Thus, this step takes $\tilde{O}(|S|+P)$ time.

At this point, we know the bucket that contains the optimum index $k^*$. Thus, we can iterate over all indices in this bucket to get the actual witness for the Min-Plus-Query-Witness query. It takes $O(P)$ time to do so.

In summary, the preprocessing time, query time, and space complexity match the bounds promised in the lemma statement.
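The bucketing at the heart of this proof, which sorts a column, cuts it into rank buckets of size $P$, and records each bucket's extremes $L_{j,\ell}$ and $H_{j,\ell}$, can be sketched as follows (`column_buckets` is an illustrative name, not part of the paper):

```python
def column_buckets(B, j, P):
    """Bucketing from the proof of Lemma 4.4: sort column j of B, group row
    indices into buckets K[l] of P consecutive ranks, and record the bucket
    minima L[l] and maxima H[l].  A bucket is "small" when H[l] - L[l] <= 2W
    and "large" otherwise."""
    order = sorted(range(len(B)), key=lambda k: B[k][j])
    K = [order[l:l + P] for l in range(0, len(order), P)]
    L = [min(B[k][j] for k in bucket) for bucket in K]
    H = [max(B[k][j] for k in bucket) for bucket in K]
    return K, L, H
```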

In the following lemma, we show a data structure for the Min-Plus-Query-Witness problem when the matrix $B$ has the bounded difference property, which means that nearby entries in each row have close values. The proof adapts the strategy of [26].

Lemma 4.6.

Let $s\geq 1$ be a constant. Let $A$ be an $n\times n^s$ integer matrix, and let $B$ be an $n^s\times n$ integer matrix. It is guaranteed that there exists $1\leq\Delta\leq\min\{n,W\}$ such that for every $k$, $|B_{k,j_1}-B_{k,j_2}|\leq W$ as long as $\lceil j_1/\Delta\rceil=\lceil j_2/\Delta\rceil$. Then for every $L=\Omega(\Delta)$, we can solve the Min-Plus-Query-Witness problem of $A$ and $B$ in $\tilde{O}(\Delta^2\frac{n^s}{L}Wn^{\omega(s)}+\frac{n^{2+s}}{\Delta})$ preprocessing time and $\tilde{O}(L)$ query time, when $|S|<L$. The space complexity is $\tilde{O}(\frac{\Delta^2Wn^{2+s}}{L}+\frac{\Delta^2n^{1+2s}}{L}+\frac{n^{2+s}}{\Delta})$.

Proof 4.7.

Preprocessing Step 1: Create an Estimation Matrix

First, we create a matrix $\hat{B}$, where $\hat{B}_{k,j}=B_{k,\lceil j/\Delta\rceil\Delta}$. By the property of matrix $B$, $|\hat{B}_{k,j}-B_{k,j}|\leq W$ for every $k,j$. For each pair $(i,j)$, we compute the $L$-th smallest value of $A_{i,k}+\hat{B}_{k,j}$ among all $1\leq k\leq n^s$, and denote this value by $\hat{C}^L_{i,j}$. Notice that $\hat{C}^L_{i,j}=\hat{C}^L_{i,\lceil j/\Delta\rceil\Delta}$, so it suffices to compute $\hat{C}^L_{i,j}$ when $j$ is a multiple of $\Delta$, and we can infer the other values correctly. It takes $O(n^s)$ time to compute each $\hat{C}^L_{i,j}$, so this step takes $O(n^{2+s}/\Delta)$ time.

If we similarly define $C^L_{i,j}$ as the $L$-th smallest value of $A_{i,k}+B_{k,j}$ among all $1\leq k\leq n^s$, then $|C^L_{i,j}-\hat{C}^L_{i,j}|\leq W$ by the following claim, whose proof is omitted due to space constraints.

Claim.

Given two sequences $(a_k)_{k=1}^m$ and $(b_k)_{k=1}^m$ such that $|a_k-b_k|\leq W$ for all $k$, the $L$-th smallest element of $a$ and the $L$-th smallest element of $b$ differ by at most $W$.
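The claim can be sanity-checked numerically: perturbing every entry of a sequence by at most $W$ moves its $L$-th smallest element by at most $W$. A randomized check, not a proof:

```python
import random

def kth_smallest(seq, L):
    """The L-th smallest element of seq (1-indexed)."""
    return sorted(seq)[L - 1]

# Randomized sanity check of the claim.
random.seed(0)
W = 5
for _ in range(1000):
    m = random.randint(1, 20)
    a = [random.randint(-50, 50) for _ in range(m)]
    b = [x + random.randint(-W, W) for x in a]   # entrywise perturbation by <= W
    L = random.randint(1, m)
    assert abs(kth_smallest(a, L) - kth_smallest(b, L)) <= W
```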

Also, in $\tilde{O}(n^{2+s}/\Delta)$ time, we can compute a sorted list $\mathcal{L}^{i,j}_{\text{small}}$ of the indices $k$ sorted by the value $A_{i,k}+\hat{B}_{k,j}-\hat{C}^L_{i,j}$, for every $i$ and every $j$ that is a multiple of $\Delta$.

The space complexity of this step does not dominate.

Preprocessing Step 2: Perform Calls to Lemma 4.4

For some integer $\rho\geq 1$, we will perform $\rho$ rounds of the following algorithm. In the $r$-th round, for $1\leq r\leq\rho$, we randomly sample $j^r\in[n]$, and let $A^r_{i,k}:=A_{i,k}+B_{k,j^r}-\hat{C}^L_{i,j^r}$ and $B^r_{k,j}:=B_{k,j}-B_{k,j^r}$. Clearly, $A^r_{i,k}+B^r_{k,j}=A_{i,k}+B_{k,j}-\hat{C}^L_{i,j^r}$. For each pair $(i,k)$, we find the smallest $r$ such that $|A^r_{i,k}|\leq 3W$. We keep these entries as they are and replace all other entries by $\infty$. For every $(i,k)$, there exists at most one $r$ such that $A^r_{i,k}\neq\infty$. Then we use Lemma 4.4 to preprocess $A^r$ and $B^r$ for every $1\leq r\leq\rho$. Thus, this part takes $O(\rho\frac{n^s}{P}Wn^{\omega(s)})$ time, for some integer $P$ to be determined later. Note that this parameter also affects the query time. This step stores $\rho$ copies of the data structure from Lemma 4.4, so the space complexity is $\tilde{O}(\rho\frac{Wn^{2+s}}{P}+\rho\frac{n^{1+2s}}{P})$.

Note that this step is the only step that uses randomization. We can use the method of [26], Appendix A, to derandomize it. We omit the details for simplicity.

Preprocessing Step 3: Handling Uncovered Pairs

For a pair (i,k)(i,k), if Ai,krA^{r}_{i,k}\neq\infty for any rr, we call (i,k)(i,k) covered; otherwise, we call the pair (i,k)(i,k) uncovered. For each pair (i,j)(i,j), we enumerate all kk such that |Ai,k+B^k,jC^i,jL|2W|A_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}|\leq 2W and (i,k)(i,k) is uncovered. Notice that since Ai,k+B^k,jC^i,jL=Ai,k+B^k,j/ΔΔC^i,j/ΔΔLA_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}=A_{i,k}+\hat{B}_{k,\lceil j/\Delta\rceil\Delta}-\hat{C}^{L}_{i,\lceil j/\Delta\rceil\Delta}, we only need to exhaustively enumerate all k[ns]k\in[n^{s}] when jj is a multiple of Δ\Delta. Thus, if the total number of (i,k,j)(i,k,j) where |Ai,k+B^k,jC^i,jL|2W|A_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}|\leq 2W and (i,k)(i,k) is uncovered is XX, then we can enumerate all such triples (i,k,j)(i,k,j) in O(X+n2+s/Δ)O(X+n^{2+s}/\Delta) time.

It remains to bound the total number of triples that satisfy the condition. Fix an arbitrary pair (i,k)(i,k), and suppose the number of jj such that |Ai,k+B^k,jC^i,jL|2W|A_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}|\leq 2W is at least (10+s)nlnn/ρ(10+s)n\ln n/\rho. Then with probability at least 1(1(10+s)lnnρ)ρ11n10+s1-(1-\frac{(10+s)\ln n}{\rho})^{\rho}\geq 1-\frac{1}{n^{10+s}}, we pick a jrj^{r} where |Ai,k+B^k,jrC^i,jrL|2W|A_{i,k}+\hat{B}_{k,j^{r}}-\hat{C}^{L}_{i,j^{r}}|\leq 2W. Therefore,

|Ai,kr|=|Ai,k+Bk,jrC^i,jrL||Ai,k+B^k,jrC^i,jrL|+|B^k,jrBk,jr|3W,|A^{r}_{i,k}|=\left|A_{i,k}+B_{k,j^{r}}-\hat{C}^{L}_{i,j^{r}}\right|\leq\left|A_{i,k}+\hat{B}_{k,j^{r}}-\hat{C}^{L}_{i,j^{r}}\right|+\left|\hat{B}_{k,j^{r}}-B_{k,j^{r}}\right|\leq 3W,

which means (i,k)(i,k) is covered. Therefore, with high probability, every pair (i,k)(i,k) for which the number of jj such that |Ai,k+B^k,jC^i,jL|2W|A_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}|\leq 2W is at least (10+s)nlnn/ρ(10+s)n\ln n/\rho will be covered. In other words, X=O(n1+snlnn/ρ)=O~(n2+s/ρ)X=O(n^{1+s}\cdot n\ln n/\rho)=\tilde{O}(n^{2+s}/\rho).

For each pair (i,j)(i,j), if we enumerate more than LL indices kk, we only keep the LL values of kk that give the smallest values of Ai,k+Bk,jA_{i,k}+B_{k,j}. We call this list triplei,j\mathcal{L}^{i,j}_{\text{triple}}. From the previous discussion, the time cost of this step is O~(n2+s/ρ+n2+s/Δ)\tilde{O}(n^{2+s}/\rho+n^{2+s}/\Delta). Since we need to store all the triples, the space complexity is O(n2+s/ρ)O(n^{2+s}/\rho).
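Trimming each candidate list to its LL best indices can be sketched as follows (trim_triples is our name for this step; the real algorithm works over the enumerated triples directly):

```python
import heapq

def trim_triples(candidates, A, B, L):
    """candidates maps (i, j) to the indices k enumerated for uncovered
    pairs; keep only the L indices minimizing A[i][k] + B[k][j]."""
    return {(i, j): heapq.nsmallest(L, ks, key=lambda k: A[i][k] + B[k][j])
            for (i, j), ks in candidates.items()}

# Tiny demo: for one cell, keep the 2 best of 4 candidate indices.
# Values A[0][k] + B[k][0] are 8, 10, 6, 7, so indices 2 and 3 survive.
A = [[3, 1, 4, 1]]
B = [[5], [9], [2], [6]]
kept = trim_triples({(0, 0): [0, 1, 2, 3]}, A, B, 2)
```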

Handling Queries

Now we discuss how to handle queries. For each query (S,i,j)(S,i,j), let k=argminkSAi,k+Bk,jk^{*}=\operatorname*{arg\,min}_{k\not\in S}A_{i,k}+B_{k,j} be the optimum index. Consider two cases:

  • (i,k)(i,k^{*}) is covered. By definition of being covered, there exists a round rr such that Ai,kr=Ai,k+Bk,jrC^i,jrLA^{r}_{i,k^{*}}=A_{i,k^{*}}+B_{k^{*},j^{r}}-\hat{C}^{L}_{i,j^{r}}, so Ai,kr+Bk,jr=Ai,k+Bk,jC^i,jrLA^{r}_{i,k^{*}}+B^{r}_{k^{*},j}=A_{i,k^{*}}+B_{k^{*},j}-\hat{C}^{L}_{i,j^{r}}. Therefore, we can query the data structure in Lemma 4.4 for every ArA^{r} and BrB^{r} and denote brb^{r} as the result. The answer is given by the smallest value of br+C^i,jrLb^{r}+\hat{C}^{L}_{i,j^{r}} over all rr. The witness is given by the data structure of Lemma 4.4.

Note that when querying ArA^{r} and BrB^{r}, we only need to pass the set {kS:Ai,kr}\{k\in S:A^{r}_{i,k}\neq\infty\}. For every kSk\in S, there is at most one rr such that Ai,krA^{r}_{i,k}\neq\infty, so the total size of the sets passed to the data structure of Lemma 4.4 is at most |S||S|. Thus, this case takes O(|S|+ρP)O(|S|+\rho P) time.

  • (i,k)(i,k^{*}) is uncovered. There are still two possibilities to consider in this case.

    • Possibility I: Ai,k+B^k,jC^i,jL<2WA_{i,k^{*}}+\hat{B}_{k^{*},j}-\hat{C}^{L}_{i,j}<-2W. In this case,

      Ai,k+Bk,jAi,k+B^k,j+W<C^i,jLW,A_{i,k^{*}}+B_{k^{*},j}\leq A_{i,k^{*}}+\hat{B}_{k^{*},j}+W<\hat{C}^{L}_{i,j}-W,

so the optimum value is smaller than C^i,jL\hat{C}^{L}_{i,j}. By reading the list smalli,j/ΔΔ\mathcal{L}^{i,\lceil j/\Delta\rceil\Delta}_{\text{small}}, we can efficiently find all such kk where Ai,k+B^k,jC^i,jL<2WA_{i,k}+\hat{B}_{k,j}-\hat{C}^{L}_{i,j}<-2W in time linear in the number of such kk. The number of such kk is at most LL, by the definition of C^i,jL\hat{C}^{L}_{i,j}. Thus, this part takes O(L)O(L) time.

    • Possibility II: Ai,k+B^k,jC^i,jL2WA_{i,k^{*}}+\hat{B}_{k^{*},j}-\hat{C}^{L}_{i,j}\geq-2W. In fact, in this case, we further have

      Ai,k+B^k,jC^i,jLAi,k+Bk,jCi,jL+2W2W,A_{i,k^{*}}+\hat{B}_{k^{*},j}-\hat{C}^{L}_{i,j}\leq A_{i,k^{*}}+B_{k^{*},j}-C^{L}_{i,j}+2W\leq 2W,

      where Ai,k+Bk,jCi,jL0A_{i,k^{*}}+B_{k^{*},j}-C^{L}_{i,j}\leq 0 because |S|<L|S|<L. Therefore, in this case, we have |Ai,k+B^k,jC^i,jL|2W|A_{i,k^{*}}+\hat{B}_{k^{*},j}-\hat{C}^{L}_{i,j}|\leq 2W, so we can enumerate all indices in triplei,j\mathcal{L}^{i,j}_{\text{triple}} and take the best choice. This takes O(L)O(L) time.

Time and Space Complexity

In summary, the preprocessing time is

O~(ρnsPWnω(s)+n2+s/Δ+n2+s/ρ),\tilde{O}\left(\rho\frac{n^{s}}{P}Wn^{\omega(s)}+n^{2+s}/\Delta+n^{2+s}/\rho\right),

and the query time is O~(L+ρP)\tilde{O}(L+\rho P). To balance the terms, we can set ρ=Δ\rho=\Delta and P=LΔP=\frac{L}{\Delta} to achieve O~(Δ2nsLWnω(s)+n2+sΔ)\tilde{O}(\Delta^{2}\frac{n^{s}}{L}Wn^{\omega(s)}+\frac{n^{2+s}}{\Delta}) preprocessing time and O~(L)\tilde{O}(L) query time. Note that since we need P1P\geq 1, we must have L=Ω(Δ)L=\Omega(\Delta).

From the preprocessing steps, the space complexity is O~(ρWn2+sP+ρn1+2sP+n2+s/ρ)\tilde{O}(\rho\frac{Wn^{2+s}}{P}+\rho\frac{n^{1+2s}}{P}+n^{2+s}/\rho). Plugging in ρ=Δ\rho=\Delta and P=LΔP=\frac{L}{\Delta} reduces this to

O~(Δ2Wn2+sL+Δ2n1+2sL+n2+sΔ),\tilde{O}\left(\frac{\Delta^{2}Wn^{2+s}}{L}+\frac{\Delta^{2}n^{1+2s}}{L}+\frac{n^{2+s}}{\Delta}\right),

as given in the statement of the lemma.

The next lemma gives our final data structure for the Min-Plus-Query-Witness problem.

Lemma 4.8.

Let s1s\geq 1 be a constant. Let AA be an n×nsn\times n^{s} integer matrix and BB be an ns×nn^{s}\times n integer matrix. Suppose matrix BB satisfies

  1. 1.

    Each row of BB is non-increasing;

  2. 2.

    The difference between the sum of elements in the jj-th column and the sum of elements in the (j+1)(j+1)-th column is at most nsn^{s}, for any jj.

Then for every positive integer L=Ω(nω(s)2)L=\Omega(n^{\omega(s)-2}), we can solve the Min-Plus-Query-Witness problem of AA and BB in O~(n85+s+15ω(s)L15)\tilde{O}(n^{\frac{8}{5}+s+\frac{1}{5}\omega(s)}L^{-\frac{1}{5}}) preprocessing time and O~(L)\tilde{O}(L) query time, when |S|<L|S|<L. The space complexity is O~(L15n185+s45ω(s)+L35n95+2s25ω(s)+L15n85+s+15ω(s)).\tilde{O}(L^{-\frac{1}{5}}n^{\frac{18}{5}+s-\frac{4}{5}\omega(s)}+L^{-\frac{3}{5}}n^{\frac{9}{5}+2s-\frac{2}{5}\omega(s)}+L^{-\frac{1}{5}}n^{\frac{8}{5}+s+\frac{1}{5}\omega(s)}).

Proof 4.9.

Let Δ,W1\Delta,W\geq 1 be small polynomials in nn to be fixed later. Define I(j)I(j) to be the interval [jΔ+1,j][j-\Delta+1,j].

Let jj^{\prime} be any multiple of Δ\Delta. By property 2 of matrix BB, k=1nsBk,jk=1nsBk,j+1ns\sum_{k=1}^{n^{s}}B_{k,j}-\sum_{k=1}^{n^{s}}B_{k,j+1}\leq n^{s} for any jI(j)j\in I(j^{\prime}). Thus, we have

k=1nsBk,jΔ+1k=1nsBk,jΔns.\sum_{k=1}^{n^{s}}B_{k,j^{\prime}-\Delta+1}-\sum_{k=1}^{n^{s}}B_{k,j^{\prime}}\leq\Delta n^{s}.

By averaging, there are at most Δns/W\Delta n^{s}/W indices k[ns]k\in[n^{s}] such that Bk,jΔ+1Bk,j>WB_{k,j^{\prime}-\Delta+1}-B_{k,j^{\prime}}>W. We create a new matrix B^\hat{B}, initially the same as matrix BB. For each kk such that Bk,jΔ+1Bk,j>WB_{k,j^{\prime}-\Delta+1}-B_{k,j^{\prime}}>W, and for each jI(j)j\in I(j^{\prime}), we set B^k,j\hat{B}_{k,j} to MM, where MM is some large enough integer. After this replacement, B^k,jΔ+1B^k,jW\hat{B}_{k,j^{\prime}-\Delta+1}-\hat{B}_{k,j^{\prime}}\leq W for any kk and any multiple jj^{\prime} of Δ\Delta. Also, since B^k,jΔ+1B^k,jB^k,j\hat{B}_{k,j^{\prime}-\Delta+1}\geq\hat{B}_{k,j}\geq\hat{B}_{k,j^{\prime}} for any jI(j)j\in I(j^{\prime}), we have |B^k,j1B^k,j2|W|\hat{B}_{k,j_{1}}-\hat{B}_{k,j_{2}}|\leq W as long as j1/Δ=j2/Δ\lceil j_{1}/\Delta\rceil=\lceil j_{2}/\Delta\rceil. Therefore, we can use Lemma 4.6 to preprocess AA and B^\hat{B} in O(Δ2nsLWnω(s)+n2+sΔ)O(\Delta^{2}\frac{n^{s}}{L}Wn^{\omega(s)}+\frac{n^{2+s}}{\Delta}) time. The space complexity is O~(Δ2Wn2+sL+Δ2n1+2sL+n2+sΔ)\tilde{O}(\frac{\Delta^{2}Wn^{2+s}}{L}+\frac{\Delta^{2}n^{1+2s}}{L}+\frac{n^{2+s}}{\Delta}).
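The construction of B^\hat{B} can be sketched as follows (build_b_hat is our name; indices are 0-based here, unlike the 1-based notation above):

```python
def build_b_hat(B, Delta, W, M):
    """Within each Delta-block of columns [j'-Delta+1, j'] (0-indexed below),
    replace the rows whose value drops by more than W across the block with
    the sentinel M; by the averaging argument, at most Delta * n^s / W rows
    per block are replaced."""
    n_mid, n_cols = len(B), len(B[0])
    B_hat = [row[:] for row in B]                # start as a copy of B
    for hi in range(Delta - 1, n_cols, Delta):   # hi is column j' - 1
        lo = hi - Delta + 1                      # column j' - Delta (0-indexed)
        for k in range(n_mid):
            if B[k][lo] - B[k][hi] > W:          # big drop within the block
                for j in range(lo, hi + 1):
                    B_hat[k][j] = M
    return B_hat

# Demo: non-increasing rows; with Delta = 2 and W = 1, row 1 is blanked
# out in the first block because it drops by 10 there.
B = [[5, 5, 4, 4],
     [20, 10, 9, 9]]
B_hat = build_b_hat(B, 2, 1, 10**9)
```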

On the other hand, note that B^\hat{B} differs from BB on at most n1+sΔ/Wn^{1+s}\Delta/W entries, so we need some extra preprocessing to handle those entries. For each pair (i,j)(i,j), we initialize a range tree 𝒯(i,j)\mathcal{T}^{(i,j)} whose elements are all \infty (it takes O~(1)\tilde{O}(1) time to initialize each range tree if we implement it carefully). Then for every kk such that Bk,jB^k,jB_{k,j}\neq\hat{B}_{k,j}, we set the kk-th element in 𝒯(i,j)\mathcal{T}^{(i,j)} to Ai,k+Bk,jA_{i,k}+B_{k,j}. The total number of operations we perform in all the range trees is O(n2+sΔ/W)O(n^{2+s}\Delta/W), so this part takes O~(n2+sΔ/W)\tilde{O}(n^{2+s}\Delta/W) time. The space complexity is also O~(n2+sΔ/W)\tilde{O}(n^{2+s}\Delta/W).
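A toy stand-in for the correction structure 𝒯(i,j)\mathcal{T}^{(i,j)} is sketched below (names are ours; a linear scan replaces the actual range tree, which answers the query in poly-logarithmic time per excluded index):

```python
class CorrectionIndex:
    """Toy stand-in for the range tree T^(i,j): stores A[i][k] + B[k][j]
    only for the few k where B and B_hat differ, and returns the minimum
    over stored k outside a forbidden set S."""
    def __init__(self):
        self.vals = {}                       # k -> A[i][k] + B[k][j]

    def add(self, k, value):
        self.vals[k] = value

    def query(self, S):
        """Return (k*, value) minimizing value over stored k not in S,
        or None if every stored index is forbidden."""
        best = None
        for k, v in self.vals.items():
            if k not in S and (best is None or v < best[1]):
                best = (k, v)
        return best

# Demo: corrections at k = 2 and k = 7; excluding k = 2 changes the witness.
t = CorrectionIndex()
t.add(2, 11)
t.add(7, 15)
```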

During a query (S,i,j)(S,i,j), we first query the data structure in Lemma 4.6 on matrix AA and B^\hat{B} with parameters (S,i,j)(S,i,j). Then we query the minimum value from the range tree 𝒯(i,j)\mathcal{T}^{(i,j)} after setting all Ai,k+Bk,jA_{i,k}+B_{k,j} as \infty for kSk\in S. Taking the minimum of these two queries gives the answer. The optimum index kk^{*} is either given by the data structure of Lemma 4.6 or can be obtained from the range tree.

Thus, the preprocessing time of the algorithm is

O~(Δ2nsLWnω(s)+n2+sΔ+n2+sΔ/W),\tilde{O}(\Delta^{2}\frac{n^{s}}{L}Wn^{\omega(s)}+\frac{n^{2+s}}{\Delta}+n^{2+s}\Delta/W),

and the query time is O~(L)\tilde{O}(L). We get the desired preprocessing time by setting Δ=L1/5n2515ω(s)\Delta=L^{1/5}n^{\frac{2}{5}-\frac{1}{5}\omega(s)} and W=Δ2W=\Delta^{2}. Since we need Δ1\Delta\geq 1, we require that L=Ω(nω(s)2)L=\Omega(n^{\omega(s)-2}). Lemma 4.6 also requires that L=Ω(Δ)L=\Omega(\Delta), but this is always true when L=Ω(nω(s)2)L=\Omega(n^{\omega(s)-2}).
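For completeness, substituting these values shows that the preprocessing terms indeed balance: with Δ=L1/5n(2ω(s))/5\Delta=L^{1/5}n^{(2-\omega(s))/5} and W=Δ2W=\Delta^{2},

```latex
\Delta^{2}\frac{n^{s}}{L}Wn^{\omega(s)}
  = \frac{\Delta^{4}}{L}\,n^{s+\omega(s)}
  = L^{-\frac{1}{5}}\,n^{\frac{8-4\omega(s)}{5}+s+\omega(s)}
  = L^{-\frac{1}{5}}\,n^{\frac{8}{5}+s+\frac{1}{5}\omega(s)},
\qquad
\frac{n^{2+s}}{\Delta}
  = \frac{n^{2+s}\Delta}{W}
  = L^{-\frac{1}{5}}\,n^{2+s-\frac{2-\omega(s)}{5}}
  = L^{-\frac{1}{5}}\,n^{\frac{8}{5}+s+\frac{1}{5}\omega(s)}.
```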

By previous discussion, the space complexity is O~(Δ2Wn2+sL+Δ2n1+2sL+n2+sΔ+n2+sΔ/W)\tilde{O}(\frac{\Delta^{2}Wn^{2+s}}{L}+\frac{\Delta^{2}n^{1+2s}}{L}+\frac{n^{2+s}}{\Delta}+n^{2+s}\Delta/W). Plugging in the value for Δ\Delta and WW simplifies the complexity to

O~(L1/5n18/5+s4ω(s)/5+L3/5n9/5+2s2ω(s)/5+L1/5n8/5+s+ω(s)/5).\tilde{O}(L^{-1/5}n^{18/5+s-4\omega(s)/5}+L^{-3/5}n^{9/5+2s-2\omega(s)/5}+L^{-1/5}n^{8/5+s+\omega(s)/5}).

Finally, we can apply the data structure of Lemma 4.8 to prove Theorem 1.1.

Proof 4.10 (Proof of Theorem 1.1).

For clarity, we will use element to refer to a specific item aia_{i} of the sequence and use value to refer to all elements of a given type. Given a pointer to an element of the sequence aia_{i}, we assume the ability to look up its index ii in the sequence in O~(1)\tilde{O}(1) time by storing all elements of the sequence in a balanced binary search tree with worst-case time guarantees (e.g. a red-black tree). Thus we can go from index ii to element aia_{i} and back via appropriate rank and select queries on the balanced binary search tree. We may also add or remove an element aia_{i} from the sequence, and thus the binary search tree, in O~(1)\tilde{O}(1) time.

Let T1,T2,T3T_{1},T_{2},T_{3} be three parameters of the algorithm. Parameter T1T_{1} is a threshold that controls the number of “frequent” colors, T2T_{2} controls how frequently the data structure is rebuilt, and T3T_{3} represents the size of blocks in the algorithm.

We call values that appear more than N/T1N/T_{1} times frequent and all other values infrequent. Thus, there are at most T1T_{1} frequent values at any point in time. Note that a fixed value can change from frequent to infrequent, or from infrequent to frequent, via a deletion or insertion.

Infrequent Values

First, we discuss how to handle infrequent values. We maintain NT1\frac{N}{T_{1}} balanced search trees 𝒮𝒯1,,𝒮𝒯NT1\mathcal{BST}_{1},\ldots,\mathcal{BST}_{\frac{N}{T_{1}}}. For balanced search tree 𝒮𝒯k\mathcal{BST}_{k}, we prepare the key/value pairs in the following way. Fix a given value of the sequence. Say all its occurrences are at indices i1,i2,,iti_{1},i_{2},\ldots,i_{t}. Then we insert the key/value pairs (ix,ix+k1)(i_{x},i_{x+k-1}) into 𝒮𝒯k\mathcal{BST}_{k} for every 1xtk+11\leq x\leq t-k+1. However, the indices themselves would need updating when sequence aa is updated. Instead of inserting raw indices, we insert corresponding pointers to the nodes of the binary search tree that holds sequence aa. That way we can perform all comparisons using binary search tree operations in O~(1)\tilde{O}(1) time, without needing to update indices when sequence aa changes. We also augment each balanced search tree 𝒮𝒯k\mathcal{BST}_{k} so that every subtree stores the smallest value yy of any pair (x,y)(x,y) in the subtree. After an insertion or deletion, we need to update a total of O((NT1)2)O((\frac{N}{T_{1}})^{2}) pairs. Thus, we can maintain these balanced search trees in O~((NT1)2)\tilde{O}((\frac{N}{T_{1}})^{2}) time per operation.

During a query [l,r][l,r], we iterate through all the balanced search trees 𝒮𝒯1,,𝒮𝒯NT1\mathcal{BST}_{1},\ldots,\mathcal{BST}_{\frac{N}{T_{1}}}. If there exists a pair (i1,i2)𝒮𝒯k(i_{1},i_{2})\in\mathcal{BST}_{k} such that li1i2rl\leq i_{1}\leq i_{2}\leq r, then the frequency of the range mode is at least kk. Thus, if the range mode is an infrequent value, we can find its frequency and a corresponding value by querying the balanced search trees. The query time is O~(NT1)\tilde{O}(\frac{N}{T_{1}}), which is not the dominating term.
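The infrequent-value structure and its query can be sketched as follows; this is a simplified Python illustration (our own names) with static lists in place of balanced search trees and pointers:

```python
from collections import defaultdict

def build_pair_lists(seq, max_k):
    """Toy stand-in for BST_1..BST_{N/T1}: for each k, collect the pairs
    (i_x, i_{x+k-1}) over consecutive occurrences of each value.  (The real
    structure stores pointers into a balanced BST over the sequence, so the
    pairs survive index shifts caused by insertions and deletions.)"""
    occ = defaultdict(list)
    for i, v in enumerate(seq):
        occ[v].append(i)
    pair_lists = {k: [] for k in range(1, max_k + 1)}
    for idxs in occ.values():
        for k in range(1, max_k + 1):
            for x in range(len(idxs) - k + 1):
                pair_lists[k].append((idxs[x], idxs[x + k - 1]))
    return pair_lists

def infrequent_mode_frequency(pair_lists, l, r):
    """Largest k admitting a pair (i1, i2) with l <= i1 <= i2 <= r."""
    return max((k for k, ps in pair_lists.items()
                if any(l <= i1 and i2 <= r for i1, i2 in ps)), default=0)

# Demo: value 1 occurs at indices 0, 2, 4, so its k = 3 pair is (0, 4).
pl = build_pair_lists([1, 2, 1, 2, 1, 3], 3)
```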

Newly Modified Values

We now consider how to handle frequent values. We handle newly modified values and unmodified values differently. We will rebuild our data structure after every T2T_{2} operations, and call values that are inserted or deleted after the last rebuild newly modified values.

For every value, we maintain a balanced search tree of occurrences of this value in the sequence. It takes O~(1)\tilde{O}(1) time per operation to maintain such balanced search trees. Thus, given an interval [l,r][l,r], it takes O~(1)\tilde{O}(1) time to query the number of occurrences of a particular value in the interval. We use this method to query the number of occurrences of each newly modified value. Since there can be at most T2T_{2} such values, this part takes O~(T2)\tilde{O}(T_{2}) time per operation.

Data Structure Rebuild

It remains to handle the frequent, not newly modified values during each rebuild. In this case, we will assume we can split the whole array roughly equally into a left half and a right half. We can recursively build the data structure on these two halves so that we may assume a range mode query interval has its left endpoint in the left half and its right endpoint in the right half. The recursive construction adds only a poly-logarithmic factor to the complexity.

We split the left half and the right half into consecutive segments of length at most T3T_{3}, so that there are O(N/T3)O(N/T_{3}) segments. We call the segments P1,P2,,PmP_{1},P_{2},\ldots,P_{m} in the left half and Q1,Q2,,QmQ_{1},Q_{2},\ldots,Q_{m} in the right half, where segments with a smaller index are closer to the middle of the sequence.

Let v1,v2,,vlv_{1},v_{2},\ldots,v_{l} be the frequent values during the rebuild. We create a matrix AA such that Ai,kA_{i,k} equals the negation of the number of occurrences of vkv_{k} in segments P1,,PiP_{1},\ldots,P_{i}; similarly, we create a matrix BB such that Bk,jB_{k,j} equals the negation of the number of occurrences of vkv_{k} in segments Q1,,QjQ_{1},\ldots,Q_{j}. Note that the negation of the value Ai,k+Bk,jA_{i,k}+B_{k,j} is the frequency of value vkv_{k} in the interval from PiP_{i} to QjQ_{j}. It is not hard to verify that matrix BB satisfies the requirement of Lemma 4.8. We take the negation here since Lemma 4.8 handles (min,+)(\min,+)-product instead of (max,+)(\max,+)-product. Then we use the preprocessing part of Lemma 4.8 with matrices A,BA,B, and L=T2L=T_{2}. If we let T1=Nt1,T2=Nt2,T3=Nt3T_{1}=N^{t_{1}},T_{2}=N^{t_{2}},T_{3}=N^{t_{3}}, then in the notation of Lemma 4.8, n=m=O(N/T3)=O(N1t3)n=m=O(N/T_{3})=O(N^{1-t_{3}}) and ns=O(T1)=O(Nt1)n^{s}=O(T_{1})=O(N^{t_{1}}), so s=t11t3s=\frac{t_{1}}{1-t_{3}} and L=Nt2L=N^{t_{2}}. Thus, by Lemma 4.8 the rebuild takes

O~(N(1t3)(85+t11t3+15ω(t11t3))15t2)\tilde{O}(N^{(1-t_{3})(\frac{8}{5}+\frac{t_{1}}{1-t_{3}}+\frac{1}{5}\omega(\frac{t_{1}}{1-t_{3}}))-\frac{1}{5}t_{2}})

time. Since we perform the rebuild every T2T_{2} operations, the amortized cost of rebuild is

O~(N(1t3)(85+t11t3+15ω(t11t3))65t2)\tilde{O}(N^{(1-t_{3})(\frac{8}{5}+\frac{t_{1}}{1-t_{3}}+\frac{1}{5}\omega(\frac{t_{1}}{1-t_{3}}))-\frac{6}{5}t_{2}})

per operation.
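The construction of the matrices AA and BB from segment occurrence counts can be sketched as follows (build_count_matrices is our name; segments are given directly as lists of values):

```python
def build_count_matrices(P_segments, Q_segments, frequent_vals):
    """A[i][k] = -(# occurrences of v_k in P_1..P_{i+1}) and
    B[k][j] = -(# occurrences of v_k in Q_1..Q_{j+1}) (0-indexed), so
    -(A[i][k] + B[k][j]) is the frequency of v_k between P_i and Q_j.
    Rows of B are non-increasing because prefix counts only grow."""
    vid = {v: k for k, v in enumerate(frequent_vals)}

    def neg_prefix_counts(segments):
        counts, rows = [0] * len(frequent_vals), []
        for seg in segments:
            for x in seg:
                if x in vid:
                    counts[vid[x]] += 1
            rows.append([-c for c in counts])
        return rows

    A = neg_prefix_counts(P_segments)                 # A[i][k]
    Bt = neg_prefix_counts(Q_segments)                # Bt[j][k]
    B = [list(col) for col in zip(*Bt)]               # transpose to B[k][j]
    return A, B

# Demo: two frequent values over two segments on each side; value 8
# occurs 4 times across all four segments, matching -(A[1][1] + B[1][1]).
A, B = build_count_matrices([[7, 8, 7], [8]], [[7], [8, 8]], [7, 8])
```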

Now we discuss how to handle queries for frequent, unmodified values. For a query interval [l,r][l,r], we find all the segments inside the interval [l,r][l,r]. The set of such segments must have the form P1PiQ1QjP_{1}\cup\cdots\cup P_{i}\cup Q_{1}\cup\cdots\cup Q_{j} for some i,ji,j. We scan through all elements in [l,r](P1PiQ1Qj)[l,r]\setminus(P_{1}\cup\cdots\cup P_{i}\cup Q_{1}\cup\cdots\cup Q_{j}), and use their frequencies to update the answer. Since the size of each segment is O(T3)O(T_{3}), the time complexity to do so is O~(T3)\tilde{O}(T_{3}).

For the segments P1,,Pi,Q1,,QjP_{1},\ldots,P_{i},Q_{1},\ldots,Q_{j}, we query the data structure in Lemma 4.8 with SS being the set of newly modified values. The answer will be the most frequent value in the interval from PiP_{i} to QjQ_{j} that is not newly modified. By Lemma 4.8 this takes O~(L)=O~(Nt2)\tilde{O}(L)=\tilde{O}(N^{t_{2}}) time per operation.

Time and Space Complexity

In summary, the amortized cost per operation is

O~(N22t1+Nt2+Nt3+N(1t3)(85+t11t3+15ω(t11t3))65t2).\tilde{O}(N^{2-2t_{1}}+N^{t_{2}}+N^{t_{3}}+N^{(1-t_{3})(\frac{8}{5}+\frac{t_{1}}{1-t_{3}}+\frac{1}{5}\omega(\frac{t_{1}}{1-t_{3}}))-\frac{6}{5}t_{2}}).

To balance the terms, we set t1=112t2t_{1}=1-\frac{1}{2}t_{2}, and t3=t2t_{3}=t_{2}. The time complexity thus becomes

O~(Nt2+N(1t2)(85+10.5t21t2+15ω(10.5t21t2))65t2).\tilde{O}(N^{t_{2}}+N^{(1-t_{2})(\frac{8}{5}+\frac{1-0.5t_{2}}{1-t_{2}}+\frac{1}{5}\omega(\frac{1-0.5t_{2}}{1-t_{2}}))-\frac{6}{5}t_{2}}).

One can check that the optimum value of 10.5t21t2\frac{1-0.5t_{2}}{1-t_{2}} lies in [1.75,2][1.75,2]. Thus, we can plug in Corollary 3.3 and use t2=0.655994t_{2}=0.655994 to balance the two terms. This gives an algorithm with O~(N0.655994)\tilde{O}(N^{0.655994}) amortized time per operation.

The space usage has two potential bottlenecks. The first is the space to store 𝒮𝒯1,,𝒮𝒯NT1\mathcal{BST}_{1},\ldots,\mathcal{BST}_{\frac{N}{T_{1}}} for handling infrequent values, which is O~(N2T1)\tilde{O}(\frac{N^{2}}{T_{1}}). The second is the space used by Lemma 4.8, which is

O~(Nt2/5N(1t3)(18/5+s4ω(s)/5)+N3t2/5N(1t3)(9/5+2s2ω(s)/5)+Nt2/5N(1t3)(8/5+s+ω(s)/5)).\tilde{O}(N^{-t_{2}/5}N^{(1-t_{3})(18/5+s-4\omega(s)/5)}+N^{-3t_{2}/5}N^{(1-t_{3})(9/5+2s-2\omega(s)/5)}+N^{-t_{2}/5}N^{(1-t_{3})(8/5+s+\omega(s)/5)}).

By plugging in the values for t2,t3t_{2},t_{3} and ss, the space complexity becomes O~(N1.327997)\tilde{O}(N^{1.327997}), with the O~(N2T1)\tilde{O}(\frac{N^{2}}{T_{1}}) term being the dominating term.

Worst-Case Time Complexity

By applying the global rebuilding technique of Overmars [21], we can achieve a worst-case time bound. The basic idea is that after T2T_{2} operations, we do not immediately rebuild the Min-Plus-Query-Witness data structure. Instead, we rebuild the data structure during the next T2T_{2} operations, spreading the work evenly over each operation. To answer queries during these T2T_{2} operations, we use the previous build of the Min-Plus-Query-Witness data structure. By this technique, the per-operation runtime can be made worst-case.
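The deamortization idea can be sketched as follows; this is a toy Python rendering (our own names), where build stands in for the expensive rebuild and one generator step per operation spreads the work:

```python
class DeamortizedRebuilder:
    """Sketch of Overmars-style global rebuilding: after every T2 operations,
    the next rebuild is spread over the following T2 operations while
    queries are served by the previous build."""
    def __init__(self, build, T2):
        self.build, self.T2 = build, T2
        self.current = build()            # structure answering queries now
        self.pending = None               # in-progress rebuild, if any
        self.ops = 0

    def step(self):
        """Charge one operation; do one slice of pending rebuild work."""
        self.ops += 1
        if self.pending is not None:
            try:
                next(self.pending)
            except StopIteration as done:
                self.current = done.value # swap in the fresh build
                self.pending = None
        if self.ops % self.T2 == 0 and self.pending is None:
            self.pending = self._rebuild()

    def _rebuild(self):
        for _ in range(self.T2 - 1):      # simulate T2 slices of work
            yield
        return self.build()

# Demo: count how many times build() runs; after 2 * T2 operations the
# first spread-out rebuild has completed and been swapped in.
builds = [0]
def build():
    builds[0] += 1
    return builds[0]

r = DeamortizedRebuilder(build, 3)
for _ in range(6):
    r.step()
```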

References

  • [1] Noga Alon, Zvi Galil, and Oded Margalit. On the exponent of the all pairs shortest path problem. J. Comput. Syst. Sci., 54(2):255–262, 1997.
  • [2] Prosenjit Bose, Evangelos Kranakis, Pat Morin, and Yihui Tang. Approximate range mode and range median queries. In Annual Symposium on Theoretical Aspects of Computer Science, 2005.
  • [3] Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly sub-cubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM Journal on Computing, 48, 07 2017. doi:10.1137/17M112720X.
  • [4] Gerth Stølting Brodal, Beat Gfeller, Allan Jørgensen, and Peter Sanders. Towards optimal range medians. In Theoretical Computer Science, 2011.
  • [5] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput., 39(5):2075–2089, 2010.
  • [6] Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T. Wilkinson. Linear-space data structures for range mode query in arrays. Theory of Computing Systems, 55:719–741, 2014.
  • [7] Timothy M Chan and Konstantinos Tsakalidis. Dynamic orthogonal range searching on the ram, revisited. In LIPIcs-Leibniz International Proceedings in Informatics, volume 77. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  • [8] Hicham El-Zein, Meng He, J. Ian Munro, Yakov Nekrich, and Bryce Sandlund. On approximate range mode and range selection. In 30th International Symposium on Algorithms and Computation (ISAAC 2019), 2019.
  • [9] Hicham El-Zein, Meng He, J. Ian Munro, and Bryce Sandlund. Improved time and space bounds for dynamic range mode. In Proceedings of the 26th Annual European Symposium on Algorithms, pages 25:1–25:13, 2018.
  • [10] Amr Elmasry, Meng He, J Ian Munro, and Patrick K Nicholson. Dynamic range majority data structures. In International Symposium on Algorithms and Computation, pages 150–159. Springer, 2011.
  • [11] Michael J Fischer and Albert R Meyer. Boolean matrix multiplication and transitive closure. In 12th Annual Symposium on Switching and Automata Theory, pages 129–131. IEEE, 1971.
  • [12] Travis Gagie, Meng He, and Gonzalo Navarro. Compressed dynamic range majority data structures. In Data Compression Conference (DCC), 2017, pages 260–269. IEEE, 2017.
  • [13] François Le Gall. Powers of tensors and fast matrix multiplication. In International Symposium on Symbolic and Algebraic Computation, ISSAC 2014, Kobe, Japan, July 23-25, 2014, pages 296–303, 2014.
  • [14] François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the coppersmith-winograd tensor. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1029–1046. SIAM, 2018.
  • [15] Mark Greve, Allan Grønlund Jørgensen, Kasper Dalgaard Larsen, and Jakob Truelsen. Cell probe lower bounds and approximations for range mode. In International Colloquium on Automata, Languages, and Programming, 2010.
  • [16] Meng He, J. Ian Munro, and Patrick K. Nicholson. Dynamic range selection in linear space. In International Symposium on Algorithms and Computation, 2011.
  • [17] Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, page 21–30, 2015.
  • [18] Danny Krizanc, Pat Morin, and Michiel Smid. Range mode and range median queries on lists and trees. Nordic Journal of Computing, 2005.
  • [19] François Le Gall. Faster algorithms for rectangular matrix multiplication. In 2012 IEEE 53rd annual symposium on foundations of computer science, pages 514–523. IEEE, 2012.
  • [20] Grazia Lotti and Francesco Romani. On the asymptotic complexity of rectangular matrix multiplication. Theoretical Computer Science, 23(2):171–185, 1983.
  • [21] Mark H Overmars. The design of dynamic data structures, volume 156. Springer Science & Business Media, 1983.
  • [22] Mihai Pătraşcu and Emanuele Viola. Cell-probe lower bounds for succinct partial sums. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 117–122. Society for Industrial and Applied Mathematics, 2010.
  • [23] Raimund Seidel. On the all-pairs-shortest-path problem in unweighted undirected graphs. J. Comput. Syst. Sci., 51(3):400–403, 1995.
  • [24] Avi Shoshan and Uri Zwick. All pairs shortest paths in undirected graphs with integer weights. In 40th Annual IEEE Symposium on Foundations of Computer Science, pages 605–615, 1999.
  • [25] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 887–898, 2012.
  • [26] Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, 2020.
  • [27] Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted apsp. In 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 950–957, 2009.
  • [28] Uri Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM, 49(3):289–317, 2002.