A -competitive algorithm for the Multilevel Aggregation Problem with Deadlines
Abstract
In this paper, we consider the multi-level aggregation problem with deadlines
(MLAPD) previously studied by Bienkowski et al. [8], Buchbinder et al. [14], and Azar and Touitou [3]. This is an online problem where the algorithm services requests arriving over time and can save costs by aggregating similar requests. Costs are structured in the form of a rooted tree. This problem has applications to many important areas such as multicasting, sensor networks, and supply-chain management. In particular, the TCP-acknowledgment problem, joint-replenishment problem, and assembly problem are all special cases of the delay version of the problem.
We present a -competitive algorithm for MLAPD. This beats the -competitive algorithm given in Buchbinder et al. [14]. Our approach illuminates key structural aspects of the problem and provides an algorithm that is simpler to implement than previous approaches. We also give improved competitive ratios for special cases of the problem.
1 Introduction
In many optimization contexts, aggregating tasks and performing them together can be more efficient. For example, consider sending multiple gifts to a relative. If there are multiple presents, it may be cheaper to bundle them all in one large box rather than sending each present individually. Although aggregation is useful in many contexts, it is often constrained. In the gift example, the postal service may have weight restrictions on packages, so bundling all the items together may be impossible. In such a situation, multiple smaller bundles may have to be constructed, resulting in a complex optimization problem. The type of aggregation constraints we are interested in forms a tree-like structure that commonly appears in communication networks.
Besides the aggregation constraints, another difficulty in these problems is the urgency of the tasks. A task might have a hard deadline by which it must be completed, or it may accrue a cost the longer it is delayed. At the same time, delaying a task allows more tasks to pile up, which better exploits the power of aggregation. This illustrates a natural trade-off between the aggregation benefits of delaying tasks and the cost of doing so. Generally, we will refer to strategies that mostly delay tasks to maximize aggregation benefits as lazy-aggregating, and to strategies that prioritize the urgency of the tasks as eager-aggregating.
To illustrate these ideas, consider the example of a communication tree of employees at a restaurant. A health inspector may continually issue fines to a host at a restaurant until some concern is resolved. Depending on the scope of the concern, it may indicate issues in the company more broadly and so would need to be delivered to a higher official than just the local manager. However, the cost of the fines might not be as expensive as the cost for the host to send a message up the chain of command. So, at any time, the host will have to decide whether to send the fine now or wait to receive more messages to send together. In fact, there are many problems of a similar flavor involving transmitting control packets in communication networks [4, 10, 19, 18]. Likewise, many optimization problems in supply-chain management face similar considerations [16, 20, 2, 26].
The online multi-level aggregation problem with delay (MLAP) introduced in [8] captures many of the scenarios mentioned above. In the deadline version (MLAPD) [14], which is a special case of the delay version, requests arrive over time and must be served between their time of arrival and the time they are due. The aggregation constraints are modeled using a node-weighted rooted tree and a service involves transmitting a subtree that contains the root. All requests in a service subtree are satisfied to model aggregation. Then, the goal is to minimize the total cost of all services. MLAP encapsulates many classic problems such as the TCP-acknowledgment Problem, the Joint Replenishment Problem (JRP), and the Assembly Problem.
1.1 Our Contributions
In this paper, we present a -competitive algorithm for MLAPD. This beats the previous best algorithm constructed in [14], which was roughly -competitive. Also, the proposed algorithm attacks the problem directly rather than relying on a reduction to -decreasing trees. This illuminates key structural aspects of the problem and results in the algorithm being simpler to implement than previous approaches.
The main analysis is based on the framework of critically overdue nodes introduced in [8], but this framework is paired with a generalization of the phases introduced in [7] to achieve a more refined analysis. In addition, using the ideas of late and early nodes within the phases context provides additional structure that can be exploited. Overall, the analysis uses a charging strategy built on the concept of node investments, which charges nodes proportionally to their impact on a service. The strategy exploits both top-down and bottom-up views of a service in order to maximize the precision of the charges. Investments also allow charging future nodes without causing complications. This is in contrast to past approaches that only considered nodes available at the current moment or earlier.
Additionally, we give improved competitive algorithms for special cases of MLAPD. In particular, we present an algorithm with competitive ratio strictly less than for paths. This shows that the known lower bound of for infinite graphs does not hold for finite graphs. The main analysis exploits a notion of phases that utilizes the special structure of path instances. This approach involves looking at nested subpaths and then charging several such subpaths to a single path of the optimal schedule.
1.2 Related Work
MLAP and MLAPD were originally formulated by [8]. The offline versions of MLAP and MLAPD have been well studied. Even for , MLAPD, and hence MLAP, is known to be APX-hard [6, 23, 2]. When the tree is a path, MLAP can be solved efficiently [9]. For MLAPD, the best approximation factor is [5, 8]. Using an algorithm for the assembly problem from [21], [24] constructs a -approximation for MLAP where is arbitrary.
For the online variant, the best competitive ratio for MLAP is due to an algorithm constructed in [3]. For MLAPD, the best competitive ratio known was due to an algorithm developed in [14]. The best known lower bound for MLAPD and MLAP for finite depth trees is , which comes from the lower bound of on JRP with deadlines (JRPD) given in [7]. If the depth of the tree is unbounded, the best lower bound is as shown in [8] by constructing a hard instance where the tree is a path of arbitrary length. This improves on the lower bound of presented in [9] for trees of unbounded depth. Additionally, [8] shows that is in fact the optimal competitive ratio for paths of unbounded length improving on the results in [9, 11].
There is even more known about MLAP for small . For , MLAP is equivalent to the TCP-acknowledgment problem which involves sending control messages in a single packet in a network. The offline version of the problem is equivalent to the lot sizing problem [26] and can be solved efficiently [1]. For the online variant, it is known that the optimal deterministic competitive ratio is [17]. The optimal competitive ratio when randomness can be used is known to be [19, 25, 12, 13]. For the deadline case, a simple greedy algorithm is exactly -competitive.
For , MLAP is equivalent to JRP. JRP tries to optimize the total shipping cost from suppliers to retailers when a warehouse is used as an intermediary place for storage. Although JRPD, and hence JRP, is APX-hard [6, 23, 2], JRP permits an -approximation [7] which improves on the results in [22]. In addition, JRPD permits a -approximation [6]. In the online variant, [15] gives a -competitive algorithm for JRP which improves upon the -competitive algorithm given in [11]. This competitive ratio is nearly tight as the best known lower bound for the competitive ratio of JRP is [7]. For JRPD, the optimal competitive ratio is exactly [7].
2 Preliminaries
Multi-level aggregation problem.
The multi-level aggregation problem with deadlines (MLAPD) is defined by a pair, , where is a node-weighted tree rooted at a node and is a set of requests that arrive at the nodes of over some time period. A request is specified by an arrival time , a deadline , and a node at which the request is issued, . Without loss of generality, we can assume that the deadlines of each request are distinct and that the cost of every node is positive. We define to be the height of the tree plus one or, equivalently, the maximum number of nodes present in a path in . We call the depth or number of levels of the tree. Then, we define the level of a node, , to be the number of nodes on the unique to path in .
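As a small illustration of these definitions (the dictionary encoding and the function names below are our own, not from the paper), the level of a node and the depth can be computed by walking parent pointers:

```python
# Toy tree encoded as child -> parent; the root has parent None.
# This encoding and the function names are illustrative only.
parent = {"r": None, "a": "r", "b": "r", "c": "a"}

def level(v):
    """Number of nodes on the unique root-to-v path (the root has level 1)."""
    return 1 if parent[v] is None else 1 + level(parent[v])

def depth(tree_parent):
    """Depth D: the maximum number of nodes on any root-to-leaf path."""
    return max(level(v) for v in tree_parent)

print(level("c"), depth(parent))  # node c sits at level 3; D = 3 here
```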
A service is a pair where is a sub-tree of containing the root and is the time that is transmitted. Notice the definition of a service implies that if and is an ancestor of , then . We refer to this fact as the subtree property of services. We sometimes refer to the subtree as the service when the time of the service is clear or irrelevant to the given context. A request is satisfied by a service, , if and . A schedule is a set of services . A schedule is a solution to MLAPD if every request in is satisfied by some service in the schedule. The cost of a schedule is where is the total cost of nodes in the service. Note that this node-weighted definition is equivalent to the original definition of the problem as shown by [14].
In the online setting, is known to the algorithm but the requests arrive in an online fashion. In particular, if is a request, then the algorithm receives this request at time . At time , the algorithm must decide whether to transmit a service and, if so, decide which nodes to include in the service given only knowledge of requests whose arrival times are at or before time . The algorithms we consider only keep track of active requests, i.e., requests that have arrived but have not yet been served.
For a node , we let denote the subtree of rooted at . Given a request and a time , we use the notation to mean that was issued at a node in by time and has not yet been satisfied. Then, if , we define to be the nodes on the path in from to the node at which is issued. For the root, we use the special notation to be . If is any subtree, then we define to be the nodes on the path to in that are not already in . We also define to be the cost of the nodes of the path to that are not in .
Critically overdue.
It is known that there is always an optimal schedule that only transmits at times that are deadlines of requests. Given such an optimal schedule and a schedule of some algorithm, we can define a notion of a node being late. Suppose is a service of the algorithm in the given schedule and that . Then, we define to be the deadline of the earliest request in , which we call the deadline of or the urgency of . Also, we define to be the time of the next service of the optimal schedule that includes and is transmitted at a time strictly after time . Note, we sometimes also use to reference this service of the optimal schedule instead of the time at which it was transmitted. If no other services of the optimal schedule include at time or later, then we define . Now, we say that a node is late at time if . This is so called since the optimal schedule must have already satisfied the earliest due request in as the next time this schedule includes this request will have already been due. Next, we say is critically overdue at time if is late at time and if there is no service of the algorithm’s schedule including that is transmitted at a time in [8]. We often will say the pair is critically overdue to mean is critically overdue at time .
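The two tests just defined can be sketched in code (a hedged illustration; the data layout is ours, and we assume the elided interval in the critically-overdue condition is the open interval between the current time and OPT's next service containing the node):

```python
# Hedged sketch of the "late" and "critically overdue" tests for a fixed
# node u. Times are plain numbers; opt_times / alg_times list the times of
# services containing u. The interval convention is our assumption.
def next_opt(opt_times, t):
    """Time of OPT's next service containing u strictly after t (inf if none)."""
    later = [s for s in opt_times if s > t]
    return min(later) if later else float("inf")

def is_late(deadline_u_t, opt_times, t):
    # u is late at t if OPT's next service containing u comes only after the
    # deadline of the earliest-due request pending in u's subtree
    return next_opt(opt_times, t) > deadline_u_t

def critically_overdue(deadline_u_t, opt_times, alg_times, t):
    if not is_late(deadline_u_t, opt_times, t):
        return False
    horizon = next_opt(opt_times, t)
    # critically overdue: no ALG service containing u strictly between
    # t and OPT's next service containing u (assumed interval)
    return not any(t < s < horizon for s in alg_times)

print(critically_overdue(4, [2, 9], [3, 5], 3))  # ALG serves u again at 5 < 9
```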
3 Special Tree Classes
In this section we present new upper bounds on the competitive ratio for MLAPD when restricted to special classes of graphs. The arguments in this section serve as a warm-up for the main argument in the next section. Also, the new upper bound for path instances shows that the known lower bound of on the competitive ratio does not hold for finite graphs.
3.1 Increasing Trees
An increasing tree is a tree where weights continually increase as one travels away from the root down any path in the tree. We consider an algorithm that serves only the path to a due request. In particular, whenever a request becomes due, the algorithm transmits exactly the path to the due request and nothing else. Hence, every service is of the form . We call this algorithm Noadd since it never adds any nodes to a service other than the bare minimum.
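The behavior just described can be sketched as follows (a minimal illustration; the event handling and data layout are our own, not the paper's pseudocode). Whenever a request reaches its deadline unserved, Noadd transmits exactly the root-to-node path, which satisfies every active request issued at a node on that path:

```python
# Hedged sketch of Noadd on an increasing tree (encoding is ours).
parent = {"r": None, "a": "r", "b": "a"}
cost = {"r": 1, "a": 2, "b": 4}          # costs increase away from the root

def path_to_root(v):
    out = []
    while v is not None:
        out.append(v)
        v = parent[v]
    return out

def noadd(requests):
    """requests: list of (arrival, deadline, node). Returns (total cost, services)."""
    pending = sorted(requests, key=lambda q: q[1])   # process by deadline
    total, services = 0, []
    while pending:
        arr, dline, node = pending[0]                # most urgent active request
        service = set(path_to_root(node))            # exactly the path, nothing else
        total += sum(cost[u] for u in service)
        services.append((dline, sorted(service)))
        # the service at time dline satisfies active requests issued on the path
        pending = [q for q in pending
                   if not (q[2] in service and q[0] <= dline)]
    return total, services

tot, svc = noadd([(0, 5, "b"), (1, 9, "a"), (2, 7, "r")])
print(tot)   # one service {r, a, b} at t = 5 satisfies all three: cost 7
```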
Theorem 3.1.
Noadd is -competitive for any increasing tree of depth .
Proof.
We use a simple charging argument. Suppose is a request that causes a service of ALG at time . Further, suppose that , the node at which was issued, is at level and are the nodes on the path . Then, by definition, we know that at the time of its satisfaction by ALG, is due and so must have been served by OPT by this time. Suppose was the service of OPT that satisfied . We charge the entire cost of this service of ALG just to the occurrence of appearing in . We note that since the tree is increasing. Then, the charge to is exactly . Since , the total charge to is at most . Also, since all the requests at are cleared after this transmission, the next time a request issued at becomes due must be strictly after the current time . By the same argument, at that time a new service of OPT must have been transmitted that contains and it will be this occurrence of that is charged next. Thus, all the charges are disjoint and so the charge to any node of OPT is at most . Thus, ALG is -competitive.
∎
Note that the argument above is actually proving is critically overdue at time . Then, the charging scheme boils down to charging critically overdue nodes. We can use the same ideas to get an even better result for -increasing trees. -increasing trees simply satisfy the property that if is an ancestor of then .
Corollary 3.1.1.
Noadd is -competitive for any -increasing tree of depth .
Proof.
We use the same argument as for increasing trees but refine the charge to any node. To avoid confusion we use to mean the level and to be the factor of increase. With the same setup as before, the only change is the total charge to . Note that since the tree is -increasing. In particular, . Then,
So, the total charge to is . The rest of the argument goes through as before. Thus, ALG is -competitive.
∎
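The displayed sum in the proof above was lost in extraction; a hedged reconstruction of the geometric bound, writing $c$ for the cost function and $v_1,\dots,v_D$ for the nodes on the charged path (symbols ours, under the stated $d$-increasing assumption), reads:

```latex
% Assuming c(v_i) \le c(v_D)/d^{\,D-i} on a d-increasing tree:
\sum_{i=1}^{D} c(v_i)
  \;\le\; c(v_D) \sum_{i=1}^{D} d^{-(D-i)}
  \;=\; c(v_D) \sum_{j=0}^{D-1} d^{-j}
  \;<\; \frac{d}{d-1}\, c(v_D).
```

Summing the geometric series is what replaces the factor of the depth from the increasing-tree argument with a constant depending only on $d$.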
3.2 Paths
In our context, we consider a path to simply be a path graph where one of the endpoints is the root of the tree. Consider the algorithm, Double, that upon a request becoming due, first adds the subpath from the root to the node at which was issued, . Then, the algorithm continually tries to extend the current subpath containing the root by adding other subpaths that connect the last node in the current subpath to nodes holding urgent requests. If adding a subpath would cause the resultant service to cost more than , then the path is not added and the algorithm transmits the current service. This iterative procedure is described by Algorithm 1.
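One service of Double can be sketched as follows (a hedged illustration: the budget threshold constant is elided in the text above, so we ASSUME the service may grow until its cost would exceed twice the cost of the mandatory root-to-node path; the array encoding is ours):

```python
# Illustrative sketch of Double on a path: node i is the i-th node from the
# root (node 0), so deeper nodes have larger indices.
cost = [1, 1, 1, 1, 5]                   # cost[i] = cost of the i-th path node

def double_service(due_node, urgent_nodes):
    """Build one service triggered by a request due at due_node.
    urgent_nodes: indices of nodes holding pending urgent requests.
    Returns (deepest node included, total service cost)."""
    base = sum(cost[: due_node + 1])     # mandatory root-to-due_node subpath
    budget = 2 * base                    # ASSUMED threshold (see lead-in)
    last, total = due_node, base
    for v in sorted(u for u in urgent_nodes if u > due_node):
        extension = sum(cost[last + 1 : v + 1])   # subpath connecting last to v
        if total + extension > budget:
            break                        # would exceed the budget: transmit now
        total += extension
        last = v
    return last, total

print(double_service(1, {3, 4}))  # node 3 fits the assumed budget; node 4 would not
```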
Theorem 3.2.
Double is -competitive for any path with depth .
Proof.
We will use a charging argument that charges the cost of several services of ALG to a single service of OPT. We visualize each subpath as an interval going from the root to another node. Then, this strategy boils down to charging sequences of nested intervals of ALG to a single interval of OPT.
Let be a service of ALG. Since is due at time , OPT must have already satisfied at some earlier time . If is the service of OPT satisfying , then we know that by the subtree property of services. Also, the subtree property and the fact that the tree is a path implies that either or . Since each cost is positive, this corresponds to in the first case and in the second case.
•
Suppose that so that ALG’s interval contains OPT’s interval. By definition of ALG, we have that . Thus, we charge to at a total charge of
Here, we used the fact that and so . We claim that no other service of ALG satisfying will be charged to . For the sake of contradiction, suppose that and are both charged to . WLOG, we assume that . Note, it must be the case that and both arrive before time for to satisfy them. Thus, when was constructed, had already arrived. Since triggered a service, we know it was not satisfied by . Hence, the stopping conditions of ALG imply that . But, containing then implies that so could not have satisfied , a contradiction. Thus, only one service of ALG with larger cost may be charged to any service of OPT.
•
Suppose that so that ALG’s interval is strictly contained in OPT’s interval. We will pay for all such at once. Suppose that are requests in order of increasing deadline that are all satisfied by and that each trigger a service of ALG having smaller service cost than . Let be the services of ALG triggered by . By Lemma 3.3 we know that
and so the charge to for all these intervals is .
Overall, we see any service of OPT is charged at most larger services of ALG and at most for all the smaller services of ALG. Thus, the total charge made to any service of OPT is at most . Hence, the algorithm is -competitive, where is the maximal number of smaller services that we charge to any service of OPT. A trivial upper bound for is as each interval must differ by at least one node. Thus, ALG is -competitive.
∎
Lemma 3.3.
Suppose that is a service of OPT and are requests in order of increasing deadline that are satisfied by and that trigger services of ALG each having smaller service cost than . Then,
We defer the proof of Lemma 3.3 to section A of the appendix. In fact, the analysis above can be refined to give a -competitive ratio. The proof of this stronger result can also be found in section A of the appendix.
Corollary 3.3.1.
The competitive ratio lower bound of does not hold for finite paths. Thus, the true competitive ratio for MLAPD for path instances is either at most or non-integer.
Proof.
The claims are immediate as for any finite . ∎
4 Arbitrary Trees
In this section, we present a -competitive algorithm for MLAPD on arbitrary trees of depth .
Intuition.
Since the path to a due request must always be served, the main task is to decide what other nodes will be added to the service. The main observation is that several of these nodes have been included in ALG unnecessarily in the past. So, we need to justify their inclusion in the current service. For a node , we do this by satisfying requests in using a procedure called a -fall. This way if was indeed serviced too many times before, we have other nodes that can effectively be used to pay for ’s cost in the analysis. Repeating this logic, we need to recursively justify all the nodes added by the -fall, which will be nodes of higher level than . Eventually, we find good nodes of low level, or this process reaches the leaves, which will always have the desired properties.
Clearly, for this to work the fall will have to include enough new nodes to pay for the cost of the given node. At the same time, adding too many nodes would run the risk of too much eager aggregation, which would just cause more of the same problem we seek to avoid. To deal with this, we use a fixed budget that determines how many nodes can be added by the fall. However, the path to a request in may have cost far greater than ’s budget. To avoid ’s budget going to waste, we discount the cost of this path so that it becomes easier for other falls to include it. We treat the amount discounted as an investment that can be used to justify ’s inclusion. To implement this, we keep track of a price, , for every node that represents how much of is left to be paid for. Then, a fall need only pay for the price of a path in order to include it in a service, rather than its full cost.
Overall, the fall will add paths to requests in increasing order of deadline. Naturally, this ordering ensures we satisfy many urgent requests. Still, some requests that are not urgent may be cheap to add and so should be considered as well. However, the fact that every included node triggers its own fall ensures that requests that are not globally urgent but are inexpensive to include are still added to the service.
Falls.
Formally, given a tentative service tree at time , we define a -fall to be an iterative procedure that considers requests in for addition to the service. We define to be the set of nodes, , constructed by the -fall at time . We will write to mean , where is the tentative service tree right before this path was added to the fall . If the nodes on a path to a request that are not already included in the service, , can be included without the total price exceeding ’s budget, then the nodes are added to the service and the budget is reduced by the corresponding price of these nodes. This procedure continues until there are no pending requests in or until the addition of a path to a request would exceed the remaining budget. If a path exceeds the remaining budget, then we reduce the prices of the nodes on this path by the remaining budget. In particular, we divide the remaining budget amongst the prices on the path, reducing each node’s price proportionally to its contribution to the total price of the path. We refer to these nodes as overflow nodes for . Whenever a fall causes some node’s price to change, we think of this fall as investing in that node. This entire process is described in Algorithm 4. Note we refer to the entire process (Algorithm 2, Algorithm 3, Algorithm 4), as well as just Algorithm 3, as Waterfall.
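The core loop of a fall can be sketched as follows (a hedged illustration with our own data layout, not the paper's Algorithm 4): paths are tried in deadline order, a path's unserved nodes are added while their total price fits the remaining budget, and the first path that does not fit has its prices reduced proportionally, making its nodes the overflow nodes.

```python
# Hedged sketch of a v-fall. `paths` lists, in increasing deadline order, the
# not-yet-included nodes on the path to each pending request in v's subtree.
# `price` maps node -> remaining price and is mutated in place.
def fall(budget, paths, price):
    """Return the nodes added by the fall; reduce overflow prices in place."""
    added = []
    for path in paths:
        new_nodes = [u for u in path if u not in added]
        p = sum(price[u] for u in new_nodes)
        if p <= budget:
            budget -= p
            for u in new_nodes:
                price[u] = 0            # included: price fully paid
            added.extend(new_nodes)
        else:
            # Overflow: spread the leftover budget over the path's prices,
            # proportionally to each node's share of the total price.
            for u in new_nodes:
                price[u] -= budget * price[u] / p
            break                        # the fall ends at the first overflow
    return added

price = {"x": 2.0, "y": 2.0, "z": 6.0}
print(fall(5.0, [["x"], ["y", "z"]], price), price)
# x is added; y and z become overflow nodes with reduced prices
```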
Waterfall Interpretation.
Algorithm 3 behaves like a flow in a flow network defined by . The sources are the root and the other nodes on the path to the due request, and the sinks are generally the leaves. Roughly, each -fall is a way of selecting which children of in the network the flow will continue through. Recursively, each such node also sends the amount of flow it received down its subtree, and we continue the process until we reach the leaves, which cannot send their flow elsewhere. Nodes may have excesses if they do not add enough in a fall, but any excess is invested into the overflow nodes for their eventual inclusion. So, if we take future inclusions into account, then conservation is preserved. With this flow interpretation, the algorithm behaves like a waterfall where water runs over ledges that act as containers. Whenever enough water fills up on a ledge, it spills over, causing a smaller waterfall.
Theorem 4.1.
Waterfall is -competitive on trees of depth .
The main approach to analyzing Waterfall will be a charging argument. Whenever a node is included in a service of ALG, its cost must be paid for by charging to a service of OPT. We think of the amount invested in a node at some time as the total cost it is responsible for. If the node is critically overdue, then it can pay for its own cost and the amount invested in it. Otherwise, a fall investing a node’s budget to its descendants can be interpreted as the node delegating the payment of its cost to its descendants. Then, these descendants will either be able to pay for their own costs as well as the ancestors’, or will further transfer responsibility of payment to their descendants. Continuing this process, nodes that can pay off the costs will eventually be reached. In particular, the leaves of the tree will eventually be reached and these will be critically overdue.
However, nodes in the subtree of a critically overdue node may be early and so cannot pay for themselves. The cost of such nodes will also be charged to their critically overdue ancestor. Thus, a critically overdue node will have to pay both for its ancestors (including itself) and for its early descendants. This charging idea can be visualized as the evolution of debt in a family tree. The family may continually pass down debt from generation to generation. At some point, the descendants will have accumulated enough wealth to pay off the debt of their ancestors. Though these descendants might also have children who will accumulate further debt. These unlucky descendants would then be tasked with paying off debts from both sides, ancestors and descendants.
4.1 ALG’s Structure
Phases
It will be helpful to break services of ALG into different phases. Suppose that are all the services of OPT that include the node in chronological order of transmission time. Then, for some suppose are all the services of ALG that include and satisfy where . We call these services of ALG a -phase and say that starts the -th -phase. For , a -phase consists of all the services of ALG containing that are transmitted at time or later. Note that if a request becomes due in the -th -phase, then must be satisfied by one of for . Consequently, if a request, , arrives in during a -phase, it cannot become due before the next -phase since no service of OPT includes during a -phase and so cannot satisfy . Thus, if is late at time during a phase, it must have been satisfied earlier. Also, for any service in the th -phase, we have that and or if .
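The grouping into phases can be sketched as follows (a hedged illustration; the boundary convention is elided in the text above, so we ASSUME the k-th phase collects ALG's services at times in the half-open interval between consecutive OPT services containing the node, with the last phase unbounded):

```python
import bisect

# Hedged sketch: group ALG's services containing u into u-phases keyed by
# the transmission times of OPT's services containing u.
def phases(opt_times, alg_times):
    """opt_times: sorted times of OPT services containing u.
    alg_times: sorted times of ALG services containing u.
    The k-th u-phase holds ALG services at times t with
    opt_times[k] <= t < opt_times[k+1]; the last phase is unbounded above."""
    groups = [[] for _ in range(len(opt_times))]
    for t in alg_times:
        k = bisect.bisect_right(opt_times, t) - 1
        if k >= 0:                       # services before OPT's first service
            groups[k].append(t)          # containing u fall in no phase here
    return groups

print(phases([1, 5, 9], [2, 3, 6, 10, 12]))  # three phases of sizes 2, 1, 2
```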
Suppose is late at the time when is transmitted in the phase. Then, we have is late at all previous services in the phase. This is because ALG always considers the earliest due requests in for satisfaction first. Specifically, since late requests cannot arrive in the middle of a phase, we know the request causing ALG to enter , , must have been available in the previous services of ALG. Hence, the earliest due request in for , , had an earlier deadline so is also overdue. Formally,
Thus, we can partition any phase into two pieces: an initial segment where is late and a later segment where is not late. We call the former services late with respect to the phase, and the latter early. Note it could be that a phase has only early or only late services. However, -phases have only late services since transmissions of the algorithm only happen when a request becomes due.
Let be a node included in a service of ALG at time . Recall, we say is critically overdue at time if is late at time and if for any service of the algorithm’s schedule including at a time we have that is not late at time . In terms of phases, is critically overdue at time if the service of ALG at time is the last late service in its respective -phase. We “charge” to critically overdue nodes, , by charging to the occurrence of that appears in . We have that charging to critically overdue nodes is injective since only one service in any phase may be critically overdue.
Next, we discuss structural properties of late and critically overdue nodes. First, we note that if is late and the -fall satisfies all requests in , then must be critically overdue. This is clear since a new request must arrive after time in order for to become late again and a new service of OPT must be transmitted to satisfy this new request.
Lemma 4.2.
Suppose that is late at time and that is the earliest due request in . Then, for all , is late at time . Furthermore, if is late at time and is early at time , then is critically overdue at time .
Proof.
If is late at time then, by definition, we know that . Since is the earliest due request in , we also know that . If for a request was due earlier than , would also be an earlier due request in . Hence, we have is the earliest due request in , so . Now, by the subtree property of services, if OPT includes in a service it must also include in that service implying that . Thus, we see that and so is late at time .
Now, further suppose that is early at time . Also, suppose that ALG next includes at time . If is early at time , then either this service at time is in a new -phase or this service is an early service in the same -phase. In either case, the partition of phases into earlier and late parts implies that the service at time is the last late service for the -phase. Hence, is critically overdue at time .
Next, suppose that is late at time . Again, if the service at time is in a new -phase, is critically overdue at time . If the service at time is in the same -phase as the service at time , we show that at time was actually late. In particular, the services at times and being in the same -phase implies that and . We claim that the earliest due request, , in was issued by time . If not, we have that . This means that arrived and became due between two services of OPT and so OPT never satisfied this request. This is impossible by feasibility of OPT and so . Since was included in the -fall at time , but was not satisfied by this -fall, we know that . But, since and is late at time , we have
Finally, applying the subtree property at time , we have that and so is late at time . This contradicts that is early and hence must be critically overdue at time .
∎
Direct Investments.
Suppose that is a service of ALG with . We say that the pair directly invests in the pair if and or if is an overflow node for and is the next time after that is included in a service of ALG. The investment made, , is , ’s price during the construction of , in the former case. In the latter case, the investment made is where is the remaining budget, , at the end of the -fall’s construction and is the earliest request not satisfied by this fall. If , then we do not consider any investments and say there are no overflow nodes for the fall. We imagine a fall including a node as reducing its price to before resetting its price to its cost. Then, we see the direct investment in any node is exactly the amount by which the fall reduces the node’s price.
Lemma 4.3.
The total direct investment made by any pair is at most and is strictly less than only if satisfied all requests in . In addition, the total direct investments made to a pair is at most and is strictly less than exactly when is on the path to the request that triggers the service of ALG at time .
Proof.
The total direct investment made by is exactly the total price of all nodes included in plus the investments made to the overflow nodes. Note that the budget starts at and is reduced by the price of each node added to the fall in each iteration. Hence, the total investment made by to nodes included in is exactly . If all requests in are satisfied by , then this is all the direct investments made and this is clearly at most . If there are overflow nodes, each overflow node, , is invested exactly . Thus, the total investment made by to overflow nodes is exactly
So, the total direct investments made by is . Hence, as long as there are overflow nodes, the total direct investment will be exactly . So, the only way the total direct investment can be strictly less than is if there are no overflow nodes, which can only happen if satisfied all requests in by definition.
For the second claim, first note that every direct investment made to will be when is an overflow node, except possibly the very last direct investment, since will be included in the service at that point and then cannot be invested in further by definition. Now, the direct investments made to are exactly the amounts by which the price of the node is reduced. Since the price of the node begins at its cost and is reduced until the price becomes or until the node is included in a service due to a request becoming due, we see the total direct investment made to is at most . If is added by a fall, then the last investment was its final price and so the total direct investment is simply the final price plus the amounts the price was reduced by, which is in total. Hence, we see the total direct investment in is less than exactly when it is not added by some fall. This happens if and only if was on a path to a request due at time , and so the initial part of Algorithm 3 added rather than a fall.
∎
Investments.
We say invests in if either directly invests in or if invests in and directly invests in . Equivalently, the latter case can be defined so that directly invests in and invests in and we will use both versions interchangeably. The equivalence follows similarly to how we can define a walk in a graph inductively by either looking at the last step or the first step in the walk. In the inductive case, we define where one of the investments will be direct and the other will be recursively constructed depending on which definition we use. Intuitively, this is just the investment made to by scaled by the relative reduction in caused by ’s investment to .
We define the total investment in to be . Note that we can calculate the total investment by focusing on direct investments. In particular, , which follows since any pair that invests in must invest in some pair directly investing in . To streamline our arguments, we will consider as investing in and the investment made is . Note that only ancestors of can invest in a pair .
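The recursive structure of indirect investment can be sketched concretely. The ledger below is purely illustrative: the pair names, the amounts, and the exact scaling rule (here, the investor's share of the intermediate pair's total funding) are assumptions standing in for the paper's definitions, not the paper's actual quantities.

```python
# Hypothetical ledger of direct investments: direct[u][v] = amount u put into v.
# All names and amounts are illustrative; the scaling rule below is an
# assumed stand-in for "scaled by the relative reduction caused by u's
# investment".
direct = {
    "u": {"v": 2.0},
    "v": {"w": 4.0},
}

def received(v):
    """Total direct investment received by v."""
    return sum(ledger.get(v, 0.0) for ledger in direct.values())

def investment(u, w):
    """Investment of u in w: the direct amount, plus indirect investments
    routed through intermediate pairs v, scaled by u's share of v's funding."""
    total = direct.get(u, {}).get(w, 0.0)
    for v, amt in direct.get(u, {}).items():
        if v == w:
            continue
        share = amt / received(v) if received(v) else 0.0
        total += share * investment(v, w)
    return total

# u funds all of v, so u inherits all of v's investment into w.
print(investment("u", "w"))  # 4.0
```

Since the investment relation only runs from ancestors to descendants, the recursion always terminates on a tree.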
Lemma 4.4.
For any pair , we have .
Proof.
We proceed by induction on . For the base case, we have , so . The only ancestor of is itself, so the only investment made is from itself. Hence, the total investment to is .
Suppose that , so that is a proper descendant of . If no ancestors of directly invested in , then again . Suppose that directly invested in . Note that each must be a proper ancestor of and so . The induction hypothesis implies that for each . Excluding the investment from to itself, the total investment in is
Lastly, we use the fact from Lemma 4.3 that the total direct investment into is at most , so this final sum is at most . Thus, we have that implying that .
∎
We will not only want to look at the amount invested in a pair, but also how much a pair invests in other pairs. To this end, we define to be the total investment made by to its proper descendants. Again, we can focus on the pairs that are directly invested in as follows: .
Lemma 4.5.
For any pair , .
Proof.
We proceed by induction on . For the base case, , so is a leaf. In this case, has no proper descendants and so .
Now, suppose that so that is not a leaf. If does not directly invest in any pair we are done, so suppose that directly invests in . By Lemma 4.3, we know the total direct investment made by is at most . In symbols, . Also, each is a proper descendant of and so the induction hypothesis implies that for all . Using these two facts gives,
Above, we have used the fact that , since descendants have higher level and so .
∎
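The geometric decay behind Lemma 4.5 can be checked numerically. The sketch below assumes, purely for illustration, a halving factor per level of descendants; under that assumption the total forwarded downward never exceeds the node's own cost.

```python
# Sketch of a geometric out-investment bound: if each deeper level of
# descendants can absorb at most half the amount forwarded to the level
# above it, the total forwarded downward stays below the node's own cost.
# The halving factor per level is an illustrative assumption.
cost = 8.0
forwarded = cost
total_out = 0.0
for _ in range(50):          # depth 50 is plenty for the series to converge
    forwarded /= 2           # next level receives at most half
    total_out += forwarded   # accumulate investments made to that level

assert total_out < cost      # geometric series cost/2 + cost/4 + ... < cost
print(total_out)
```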
4.2 Charging Scheme
Now, we begin the construction of the charges we will make. For a set of pairs, , we let be the total investment made to the pairs in .
Lemma 4.6.
Let be a service of ALG and suppose is late at time . Then, there exists a set of critically overdue nodes such that .
Proof.
We proceed by induction on .
For the base case, , so is a leaf. In this case, we claim is critically overdue at time and hence satisfies the requirements as . Suppose that there was another service of ALG including at a time . Further, suppose that is late at time . Now, since all requests at are satisfied after time by ’s inclusion, it must be that some other request arrived at after time . However, late at time implies . This implies becomes due before OPT could have satisfied it, contradicting feasibility of OPT. Thus, is critically overdue.
Now, suppose that , so is not a leaf. If is critically overdue, we again have satisfies the requirements. Otherwise, suppose that is not critically overdue at time . Then, by definition, there must be a time at which ALG transmits a service, and this service contains , which is late at time . By the partitioning of phases into early and late parts, we can assume that time is the very next time that ALG includes after time , since at time , must be late if it is at an even later time in the phase.
Consider the request whose satisfaction caused to be included at time and the last request considered by . We claim that . If , then it must be the case that . However, the definition of -falls implies they consider requests in earliest deadline order. Thus, for to be considered before , it must be the case that arrived after time . But this implies is in another phase, and so was critically overdue at time , a contradiction. Hence, .
Then, Lemma 4.2 implies all the nodes in are late at time , as is the earliest request in . Also, the contrapositive of the last statement in the lemma implies that any is late at time , since is not critically overdue. Thus, all of the nodes in and the overflow nodes are higher-level nodes than that are late, so the induction hypothesis applies to these nodes. In particular, if are the nodes that directly invested in, then we have that for each there exists a set of critically overdue nodes, , satisfying . Since is not critically overdue, it must be the case that did not satisfy every request in , as mentioned previously. Thus, by Lemma 4.3, we have that the total direct investment made by is exactly . Hence, we have that . Defining , we then have,
Also, contains only nodes that are critically overdue.
∎
Fix any service of ALG. Lemma 4.6 implies that for any late node , we can pay for the in this service by charging the critically overdue nodes in the amount invested in them. These will be the charges we make for late nodes. Note that at this point, the total charge to any critically overdue node, , will be at most . This follows since we charge a node at most the total investment made in it, and this is at most by Lemma 4.4.
We also need to pay for the early nodes of a service, which we do next. The analysis above essentially covers the cost of all nodes “above” a node , i.e. its ancestors. Now, we want to bound the cost of all nodes “below” a node , i.e. its descendants. To continue the charging strategy, we will charge every critically overdue node, , for all of the early nodes that it invests in. This is at most the totality of all investments made by , so this adds a charge of at most to according to Lemma 4.5. Thus, each node is charged for nodes above it and for nodes below it, for a total of . We just need to argue that every cost is in fact covered by this strategy.
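The two charge types can be tracked with simple bookkeeping. The toy ledger below (node names and amounts are made up) shows how each critically overdue node accumulates an “above” charge, paying for late nodes via investments received, and a “below” charge for the early descendants it invested in:

```python
# Toy ledger for the charging scheme: each critically overdue node is
# charged (i) the investment made to it, which pays for late nodes, and
# (ii) the investment it made to early descendants. Names and amounts
# are illustrative only.
charge_above = {"p": 3.0, "q": 1.0}   # investments received (late-node charges)
charge_below = {"p": 2.0, "q": 0.5}   # investments made to early descendants

# Per-node total charge is the sum of the two budgets, each bounded
# separately (by Lemmas 4.4 and 4.5 in the text).
total_charge = {v: charge_above[v] + charge_below[v] for v in charge_above}
print(total_charge)  # {'p': 5.0, 'q': 1.5}
```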
Lemma 4.7.
Let be a service of ALG. Then, the charging strategy pays for all the costs of nodes in .
Proof.
Any late node is clearly paid for by the definition of the charging scheme and Lemma 4.6. Next, we argue all early nodes are also paid for. Note any node on a path to a due request cannot be early by Lemma 4.2. Thus, if a node is early it cannot be on the path to a request that triggers a service of ALG. Consequently, Lemma 4.3 implies the total investments to an early node must exactly equal its cost.
We proceed by induction on to show that if is early at time , then the cost of is exactly paid for by charges to some critically overdue nodes, as dictated by the charging scheme. For the base case, and we have that . We know is never early, by definition of ALG only including when a request becomes due, and so this case holds vacuously.
Suppose that . Let be a pair that directly invested in . If is critically overdue, then we know by the charging scheme that pays for the investment that it made to . We just need to ensure that investments of the other pairs that are not critically overdue were paid for. First, we claim all non-critically overdue pairs that invested in are early. For the sake of contradiction, suppose that directly invested in and is late but not critically overdue.
•
If is early at time , then Lemma 4.2 implies that is critically overdue, a contradiction.
•
If does not add at time , then was an overflow node for . Since is late but not critically overdue, we know the next time that ALG includes after time is in the same -phase as time . However, since was an overflow node at time , we know that the earliest request in that is not satisfied is in .
–
If the next time we enter the path includes , then this happens at time and so . Then, Lemma 4.2 implies is late, since it is on the path to the earliest due request in .
–
If the next time we enter the path does not include , it must be that a new request arrived after time , causing to be late. However, late requests cannot arrive in the middle of a phase, as such a request would become due before OPT includes again.
In all cases, we reach a contradiction, and so if is not critically overdue then it must be early.
Now, for each that is early we have that is an ancestor of so the induction hypothesis implies that some charges to critically overdue nodes paid for . In particular, by our charging scheme this means there were critically overdue nodes satisfying . However, since is an early descendant of , is also an early descendant of each . Hence, our charging scheme would also ensure that paid for its investment to since it paid its investment to . By definition of the investments, the total investment made to by these critically overdue nodes is exactly
Now, since this holds true for all early pairs investing in , we have that all direct investments to are paid for by our charging scheme. But the total direct investment for an early node is its cost, and so is paid for by the charges. Thus, the charging scheme pays for all of the costs of the service .
∎
Summarizing the charging strategy above gives a proof of Theorem 4.1.
Proof.
For any service of ALG, we apply Lemma 4.6 to create sets of critically overdue nodes to charge. In particular, we pay for each node that is late at time by charging each pair in the investment made to each. This pays for every late node as . By Lemma 4.4, the total investment made to is at most , and so the charge to any critically overdue node to pay for late nodes through investments is at most . Then, we charge to each critically overdue node the total amount of investments it made to early nodes. These charges add a total charge of at most to any node, as the total investment made by is at most by Lemma 4.5. Hence, the total charge to any critically overdue node is at most . Lastly, Lemma 4.7 implies every node in a service of ALG is paid for by this charging. Thus, the total cost of ALG’s schedule is at most times the total cost of OPT’s schedule. Consequently, ALG is -competitive.
∎
One thing to note about the above analysis is that it is basically tight. Most critically overdue nodes are charged exactly . This is the best charging possible if we restrict to charging only nodes OPT has included prior to a fixed time. The only way to improve upon this analysis would be to allow charging early nodes as well. However, recursive instances give evidence that no such approach would be possible, as the adversary could change the instance if it sees ALG early-aggregate many nodes.
At a higher level, there is a clear trade-off between the eager and lazy aggregation costs, and neither cost seems to dominate the other. So, perfectly balancing both would likely give the best competitive ratio for the problem, and this is exactly what Waterfall does. Specifically, the maximum early aggregation results from the early aggregation made by the root, and the maximum lazy aggregation comes from the leaves, which will also have a charge of . More generally, the sum of the early charges and the late charges is , as the analysis shows. Although they may not always be exactly each, the early and lazy aggregation costs are always in equilibrium with each other, balancing out to the fixed number . Depending on the tree structure and the requests received, strictly more eager aggregation or strictly more lazy aggregation may be more beneficial, and so this more general type of balancing seems necessary. This further indicates that may be the best competitive ratio possible for MLAPD.
5 Conclusions
One prominent open question left by this paper is whether is the true competitive ratio for MLAPD. In particular, a critical result would be new lower bounds for MLAPD that improve upon the lower bound for the case. In addition, determining the difference in complexity between the delay and deadline cases is a critical open problem.
For , we know the optimal competitive ratio for MLAPD is and that for MLAP it is . For , we know that the optimal competitive ratio for MLAPD is and for MLAP it is essentially (improving the known lower bound of to is still unresolved). Unless the problems for larger differ greatly from those for smaller , this pattern suggests that the optimal competitive ratio for MLAP is exactly more than that for MLAPD.
In fact, [15] essentially took an algorithm for JRPD and converted it into an algorithm for JRP using a primal-dual framework. The resultant algorithm had a competitive ratio more than that of the algorithm for JRPD. If this technique could be extended for , then using Waterfall as the MLAPD algorithm could result in a competitive algorithm for MLAP, which would improve on the best known -competitive algorithm. This is especially promising since the JRPD algorithm used for that result behaves almost exactly as Waterfall does on trees of depth .
Another possible approach would be to extend Waterfall to work for the delay case directly. By simply replacing deadlines with maturation times and making minor adjustments, the modified Waterfall may in fact be -competitive. However, unlike the nice structural properties of phases being partitioned into late and early parts, no such structure is clear for the delay version. Thus, a different analysis strategy must be employed absent new structural insights.
References
- [1] Alok Aggarwal and James K. Park “Improved Algorithms for Economic Lot Size Problems” In Operations Research 41.3, 1993, pp. 549–571 DOI: 10.1287/opre.41.3.549
- [2] Esther Arkin, Dev Joneja and Robin Roundy “Computational complexity of uncapacitated multi-echelon production planning problems” In Operations Research Letters 8.2, 1989, pp. 61–66 DOI: https://doi.org/10.1016/0167-6377(89)90001-1
- [3] Yossi Azar and Noam Touitou “General Framework for Metric Optimization Problems with Delay or with Deadlines” In CoRR abs/1904.07131, 2019 arXiv: http://arxiv.org/abs/1904.07131
- [4] B.R. Badrinath and P. Sudame “Gathercast: the design and implementation of a programmable aggregation mechanism for the Internet” In Proceedings Ninth International Conference on Computer Communications and Networks (Cat.No.00EX440), 2000, pp. 206–213 DOI: 10.1109/ICCCN.2000.885492
- [5] Luca Becchetti et al. “Latency-Constrained Aggregation in Sensor Networks” In ACM Trans. Algorithms 6.1 New York, NY, USA: Association for Computing Machinery, 2010 DOI: 10.1145/1644015.1644028
- [6] Marcin Bienkowski et al. “Approximation algorithms for the joint replenishment problem with deadlines” In Journal of Scheduling 18.6 Springer Science+Business Media LLC, 2014, pp. 545–560 DOI: 10.1007/s10951-014-0392-y
- [7] Marcin Bienkowski et al. “Better Approximation Bounds for the Joint Replenishment Problem” In CoRR abs/1307.2531, 2013 arXiv: http://arxiv.org/abs/1307.2531
- [8] Marcin Bienkowski et al. “Online Algorithms for Multi-Level Aggregation” In CoRR abs/1507.02378, 2015 arXiv: http://arxiv.org/abs/1507.02378
- [9] Marcin Bienkowski et al. “Online Control Message Aggregation in Chain Networks” In Proceedings of the 13th International Conference on Algorithms and Data Structures, WADS’13 London, ON, Canada: Springer-Verlag, 2013, pp. 133–145 DOI: 10.1007/978-3-642-40104-6_12
- [10] Edward Bortnikov and Reuven Cohen “Schemes for scheduling control messages by hierarchical protocols” In Comput. Commun. 24.7-8, 2001, pp. 731–743 DOI: 10.1016/S0140-3664(00)00276-0
- [11] Carlos Fisch Brito, Elias Koutsoupias and Shailesh Vaya “Competitive Analysis of Organization Networks or Multicast Acknowledgment: How Much to Wait?” In Algorithmica 64.4, 2012, pp. 584–605
- [12] Niv Buchbinder and Joseph (Seffi) Naor “The Design of Competitive Online Algorithms via a Primal-Dual Approach” In Found. Trends Theor. Comput. Sci. 3.2–3 Hanover, MA, USA: Now Publishers Inc., 2009, pp. 93–263 DOI: 10.1561/0400000024
- [13] Niv Buchbinder, Kamal Jain and Joseph Seffi Naor “Online Primal-Dual Algorithms for Maximizing Ad-Auctions Revenue” In Proceedings of the 15th Annual European Conference on Algorithms, ESA’07 Eilat, Israel: Springer-Verlag, 2007, pp. 253–264
- [14] Niv Buchbinder, Moran Feldman, Joseph Naor and Ohad Talmon “O(depth)-Competitive Algorithm for Online Multi-level Aggregation” In CoRR abs/1701.01936, 2017 arXiv: http://arxiv.org/abs/1701.01936
- [15] Niv Buchbinder et al. “Online make-to-order joint replenishment model: Primal-dual competitive algorithms” In Operations Research 61.4 INFORMS Inst.for Operations Res.and the Management Sciences, 2013, pp. 1014–1029 DOI: 10.1287/opre.2013.1188
- [16] Wallace B. Crowston and Michael H. Wagner “Dynamic Lot Size Models for Multi-Stage Assembly Systems” In Management Science 20.1, 1973, pp. 14–21 DOI: 10.1287/mnsc.20.1.14
- [17] Daniel R. Dooly, Sally A. Goldman and Stephen D. Scott “On-Line Analysis of the TCP Acknowledgment Delay Problem” In J. ACM 48.2 New York, NY, USA: Association for Computing Machinery, 2001, pp. 243–273 DOI: 10.1145/375827.375843
- [18] F. Hu, X. Cao and C. May “Optimized scheduling for data aggregation in wireless sensor networks” In International Conference on Information Technology: Coding and Computing (ITCC’05) - Volume II 2, 2005, pp. 557–561 Vol. 2 DOI: 10.1109/ITCC.2005.219
- [19] Anna R. Karlin, Claire Kenyon and Dana Randall “Dynamic TCP Acknowledgement and Other Stories about e/(e-1)” In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, STOC ’01 Hersonissos, Greece: Association for Computing Machinery, 2001, pp. 502–509 DOI: 10.1145/380752.380845
- [20] Alf Kimms “Multi-level lot sizing and scheduling: methods for capacitated, dynamic, and deterministic models” Springer Science & Business Media, 2012
- [21] Retsef Levi, Robin Roundy and David Shmoys “Primal-Dual Algorithms for Deterministic Inventory Problems.” In Math. Oper. Res. 31, 2006, pp. 267–284 DOI: 10.1145/1007352.1007410
- [22] Retsef Levi and Maxim Sviridenko “Improved Approximation Algorithm for the One-Warehouse Multi-Retailer Problem” In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 188–199
- [23] Tim Nonner and Alexander Souza “A 5/3-Approximation Algorithm for Joint Replenishment with Deadlines” In Proceedings of the 3rd International Conference on Combinatorial Optimization and Applications, COCOA ’09 Huangshan, China: Springer-Verlag, 2009, pp. 24–35 DOI: 10.1007/978-3-642-02026-1_3
- [24] L… Pedrosa, Private communication reported by Bienkowski et al., 2013
- [25] Steven S. Seiden “A Guessing Game and Randomized Online Algorithms” In Proceedings of the thirty-second annual ACM symposium on theory of computing (STOC), 2000, 2000, pp. 592–601
- [26] Harvey M. Wagner and Thomson M. Whitin “Dynamic Version of the Economic Lot Size Model” In Management Science 5.1, 1958, pp. 89–96 DOI: 10.1287/mnsc.5.1.89
Appendix A Proofs for Section 3.2
We begin with the proof of Lemma 3.3.
Proof.
In symbols, the assumptions state that , , and for all . We will show that each interval transmitted by ALG has cost at least double the previously transmitted interval, and use this to show that, in reverse order, each interval must be at least a power of two smaller than the cost of .
Let be the nodes that need to be added to in order to reach the node at which was issued. Since triggered a service of ALG, we know that was not satisfied by previous services of ALG. Then, we know that the nodes in were not added to the service. By definition of ALG, this implies that . Thus,
In other words, the intervals of the requests are at least doubling each time a request is not satisfied. This implies that holds for all . As , an easy induction shows that holds for all . The definition of ALG then implies that .
Suppose is the earliest request due at the node that is the furthest node from the root contained in . As was satisfied by by assumption, and , it must be the case that the request arrived before the deadline of , and so had been issued at the time of ’s construction. Then, we know that was not satisfied by , since otherwise being the last node in ’s interval would imply . Thus, the same arguments from before imply that . Thus, for all we have that
The total cost of these intervals ALG transmitted is
∎
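The geometric sum that closes this proof can be sketched numerically. The interval costs below are illustrative; only the at-least-doubling property established above matters.

```python
# If each interval ALG transmits costs at least double the previous one,
# the total cost of all intervals is less than twice the cost of the
# final (largest) interval: summing in reverse gives a geometric series.
costs = [1.0]
for _ in range(10):
    costs.append(2 * costs[-1])   # each interval at least doubles in cost

total = sum(costs)                # 2^11 - 1 = 2047 for these values
assert total < 2 * costs[-1]      # 2047 < 2 * 1024 = 2048
print(total, costs[-1])
```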
Next, we show the improved competitive ratio for Double.
Proof.
We simply improve the upper bound on from to to achieve the claimed competitiveness. Notice that the worst charge to a service of OPT happens when a larger service and smaller services of ALG both charge to it. In this case, to achieve a charge from the larger service, it must be that at least two nodes are not included by the last small service (one that is due and another that is added too early). So, we know that . Now, in one situation we could have every request at its own node with each node having very large cost, so that . However, this will not achieve the largest charge. In particular, this means for each , and so instead of we have , which overall yields a decrease of at least in the charge.
In order for to be very close to the , which happens in the worst case, we need at least one additional node between the node at which is issued and the last node served by . This pushes the total cost of from closer to , in exchange for one fewer interval than could be transmitted in this range. However, the overall increase in cost is positive. In particular, even if we added infinitely many intervals, the total cost would be , whereas just losing one later interval also pushes the cost to , so there is no improvement in choosing more intervals over a longer initial interval. In fact, if the path is finite, the longer interval is a strict improvement. Similarly, for any request where we would want to exchange a longer service for fewer total intervals, this is an improvement, as , which is exactly the largest we could make by sacrificing one smaller interval. Hence, the worst case occurs when each smaller interval is as large as possible, which requires at least one node to exist between any two request nodes. Thus, we remove half of the nodes from consideration, giving that . Thus, ALG is -competitive. ∎
We note that the analysis above actually depends not only on , but also on the minimal difference, , between intervals charged to a longer service. The most accurate competitive ratio that could be achieved by the above analysis is likely closer to , obtained by taking the minimal differences into account appropriately. This indicates an even better competitive ratio may be possible for path instances.