
A Pseudopolynomial Algorithm to Minimize Maximum Lateness on Multiple Related Machines

Elbert Du, Harvard University    Stan Zhang, Massachusetts Institute of Technology
Abstract

In this paper, we give a pseudopolynomial algorithm to solve $Qm\mid\mid L_{\max}$ and then prove that it is impossible to get any constant-factor approximation in polynomial time, and thus also impossible to have a PTAS for this problem. We also show that the problem without a fixed number of machines, $P\mid\mid L_{\max}$, is strongly NP-hard.

1 Introduction

Since there are many job scheduling problems, we use standard shorthand notation to denote them. When we write $A\mid B\mid C$, $A$ refers to the machine model, $B$ refers to any constraints, and $C$ refers to the objective we wish to minimize/maximize. We will now introduce the specific models, constraints, and objectives that will be used in this paper.

The objective function we will be optimizing is $L_{\max}$, the maximum lateness, which we wish to minimize. The other objective function that will be mentioned is $C_{\max}$, the maximum completion time, which we also wish to minimize. The models we will be using are $P$ and $Q$. $P$ stands for parallel machines, all of which run at the same speed and in parallel. $Q$ refers to related machines, which also run in parallel but may run at different speeds; however, the ratio between the times it takes to run two jobs $a$ and $b$ is the same on every machine. A number after the model letter denotes the number of machines, and when we write $m$ machines we assume $m$ is a constant. We will not be adding any constraints.

It is known that $P2\mid\mid C_{\max}$ is NP-hard [1]. Since $P2\mid\mid L_{\max}$ generalizes $P2\mid\mid C_{\max}$ (set every deadline to 0, so the maximum lateness equals the maximum completion time), $P2\mid\mid L_{\max}$ is NP-hard as well. Furthermore, if we add some constraints to $P2\mid\mid L_{\max}$, it becomes strongly NP-hard [2, 3].

2 Results

In this paper we provide a pseudopolynomial algorithm solving the job scheduling problem $Qm\mid\mid L_{\max}$. This is a generalization of $P2\mid\mid L_{\max}$, and thus fills the gap between the above results by showing that $P2\mid\mid L_{\max}$ is in $P_{pseudo}$.

Firstly, we will reduce computing $\min L_{\max}$ to checking feasibility of a set of jobs.

Theorem 1.

Given an algorithm that checks in time $T(n)$ whether a set of jobs can all be completed on time, and letting $T$ be the maximum time it takes to finish any job, we can compute $\min L_{\max}$ in time $O(\log(nT))\cdot T(n)$.

Proof.

Note that if we have an algorithm which checks in time $T(n)$ whether a set of jobs is feasible, we can check whether a maximum lateness of $x$ is achievable by adding $x$ to all of the deadlines and checking whether the resulting set of jobs is feasible.

By doing this, we can binary search for the minimum maximum lateness. If we assume the times are all integers, we can binary search between a lower bound of 0 and an upper bound of $nT$, where $n$ is the number of jobs and $T$ is the maximum time it takes to run any one job. This gives us a runtime of $O(\log(nT))\cdot T(n)$ to compute $\min L_{\max}$. ∎
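As an illustration, here is a minimal sketch of this binary search in Python. The oracle `feasible` is hypothetical and stands in for any feasibility checker (such as the DP developed below); the bounds are those from the proof.

```python
# Sketch of Theorem 1: reduce computing min L_max to a feasibility oracle.
# `feasible(jobs)` is assumed to return True iff every job (t_i, d_i) in the
# list can be completed by its deadline on the available machines.

def min_max_lateness(jobs, feasible):
    """jobs: list of (t_i, d_i) pairs with integer times and deadlines."""
    n = len(jobs)
    T = max(t for t, _ in jobs)
    lo, hi = 0, n * T  # bounds on the answer, as in the proof above

    def lateness_at_most(x):
        # Maximum lateness <= x iff the instance with deadlines shifted by x is feasible.
        return feasible([(t, d + x) for t, d in jobs])

    while lo < hi:
        mid = (lo + hi) // 2
        if lateness_at_most(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```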

2.1 Solving $P2\mid\mid L_{\max}$

Firstly, we note that on any given machine, any feasible ordering of the jobs assigned to that machine remains feasible if we instead process those jobs in order of non-decreasing deadline (if 2 consecutive jobs are out of order, we can swap them without making either late) [4]. We may therefore assume each machine processes its jobs in deadline order. Also note that we can scale the deadlines and the processing times by the same amount without changing the problem, so we can assume these numbers are integers.
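The exchange argument above gives a simple per-machine check, sketched here under the assumption of integer processing times and deadlines.

```python
# A small sketch of the observation above: the jobs assigned to a single
# machine can all meet their deadlines iff processing them in non-decreasing
# deadline order (earliest deadline first) meets every deadline.

def edd_feasible_on_one_machine(assigned):
    """assigned: list of (t_i, d_i) pairs placed on this machine."""
    finish = 0
    for t, d in sorted(assigned, key=lambda job: job[1]):  # earliest deadline first
        finish += t
        if finish > d:
            return False
    return True
```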

Now, given two machines, we can compute feasibility as follows:

We DP on time and number of jobs seen: for a time $t$ and the first $i$ jobs (in deadline order), we set $D_{t,i}=1$ iff there is a feasible scheduling of the first $i$ jobs in which one of the machines finishes at exactly time $t$ while all of the first $i$ jobs complete on time, and $D_{t,i}=0$ otherwise.

To determine the value of $D_{t,i}$, note that in any such scheduling the $i^{th}$ job is either at the end of the schedule of the machine that completes at time $t$ or on the other machine. To check both cases, we look at $D_{t-t_{i},i-1}$ and $D_{t,i-1}$. If $D_{t-t_{i},i-1}=1$ and $t\leq d_{i}$, then we can add the $i^{th}$ job to the end of that machine and get a feasible scheduling of the first $i$ jobs; if not, adding job $i$ to the end of that machine does not give a feasible scheduling. Otherwise, if $D_{t,i-1}=1$, then we know that the completion time of the other machine on the first $i-1$ jobs is

$$\left(\sum_{j=1}^{i-1}t_{j}\right)-t$$

and that the other machine completes all of these jobs feasibly. If this value is at most $d_{i}-t_{i}$, then we can add the $i^{th}$ job to the end of it to get a valid scheduling.

Thus, we let $D_{t,i}=1$ if either of the above conditions holds and $D_{t,i}=0$ otherwise. This is correct since there are only two machines, and one of them has to complete at time $t$ for the value to be 1; the above checks cover both possibilities, so if neither works, then no such scheduling exists.

Now, to check feasibility, we just need to look at the $n^{th}$ row and see whether any of its values is 1. If so, we return 1; otherwise, we return 0.
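A minimal sketch of this two-machine DP in Python, assuming integer processing times and deadlines and jobs pre-sorted by deadline (names are illustrative):

```python
# Sketch of the two-machine feasibility DP described above. D[t] plays the
# role of D_{t,i}: after processing the first i jobs, D[t] is True iff some
# feasible schedule of those jobs has one machine finishing at exactly time t.

def feasible_two_machines(jobs):
    """jobs: list of (t_i, d_i) pairs, sorted by non-decreasing deadline."""
    total = sum(t for t, _ in jobs)
    D = [False] * (total + 1)
    D[0] = True          # zero jobs scheduled: one machine finishes at time 0
    prefix = 0           # sum of t_1, ..., t_{i-1}
    for t_i, d_i in jobs:
        new_D = [False] * (total + 1)
        for t in range(total + 1):
            # Case 1: job i ends the schedule of the machine finishing at time t.
            if t >= t_i and t <= d_i and D[t - t_i]:
                new_D[t] = True
            # Case 2: job i goes on the other machine, which previously finished
            # at time prefix - t and now finishes at prefix - t + t_i <= d_i.
            elif D[t] and prefix - t + t_i <= d_i:
                new_D[t] = True
        prefix += t_i
        D = new_D
    return any(D)
```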

The runtime of this algorithm is as follows:

We know that the time dimension of the DP table only needs to go up to $\sum_{i=1}^{n}t_{i}\leq nT$, where $T=\max_{i}t_{i}$, so we can make our table of size $n\times nT$.

Now, to fill out the $i^{th}$ row, we first compute $\sum_{j=1}^{i-1}t_{j}$ in $O(n)$ time. There are $n$ rows, so all of these together take $O(n^{2})$ time. Then, to fill in the value at each cell in that row, we check two values in the row above and sometimes check whether $\left(\sum_{j=1}^{i-1}t_{j}\right)-t\leq d_{i}-t_{i}$. Since we already know the value of $\sum_{j=1}^{i-1}t_{j}$, each of these checks takes $O(1)$ time, so we spend $O(1)$ time on each cell. There are $n^{2}T$ cells, so this takes $O(n^{2}T)$ time.

Thus, the total runtime to check feasibility is $O(n^{2}T)$, and the runtime to solve $P2\mid\mid L_{\max}$ is $O(n^{2}T\log(nT))$.

2.2 Generalization to $Pm\mid\mid L_{\max}$

To generalize to $m$ machines, we number the machines $1,2,\dots,m$ and let $DP[i][(x_{1},x_{2},\cdots,x_{m-1})]$ indicate whether there is a feasible scheduling of the first $i$ jobs such that machine $j$ completes all jobs assigned to it in exactly $x_{j}$ time for $1\leq j\leq m-1$ (this uniquely determines the completion time of machine $m$, since it is just the remaining processing time not accounted for by the first $m-1$ machines).

We again let $t_{i}$ be the amount of time it takes to complete job $i$ and $d_{i}$ the deadline of job $i$. To compute $DP[i][(x_{1},x_{2},\cdots,x_{m-1})]$: if $DP[i-1][(x_{1},x_{2},\cdots,x_{j}-t_{i},\cdots,x_{m-1})]=1$ for some $j$ such that $x_{j}\leq d_{i}$, then there is a feasible scheduling of the first $i-1$ jobs such that adding the $i^{th}$ job to the end of the schedule of the $j^{th}$ machine gives the desired completion times for all of the machines and remains feasible.

In the other case, we add the $i^{th}$ job to the last machine. Then we need $DP[i-1][(x_{1},x_{2},\cdots,x_{m-1})]=1$, so that there is a feasible scheduling of the first $i-1$ jobs with the same completion times on the first $m-1$ machines. For this scheduling we also need to make sure that adding the $i^{th}$ job to the last machine does not miss that job's deadline. The last machine then finishes at time $\sum_{k=1}^{i}t_{k}-\sum_{k=1}^{m-1}x_{k}$, so we can make this transition as long as $\sum_{k=1}^{i}t_{k}-\sum_{k=1}^{m-1}x_{k}\leq d_{i}$.

Again, we can compute and store $\sum_{k=1}^{i}t_{k}$ as we reach each new row, taking $O(n^{2})$ time in total. Each of the $m-1$ time dimensions is at most $nT$, where $T$ is the maximum time it takes to complete a job, so there are at most $n^{m}T^{m-1}$ states, and computing the value of a state takes $O(m)$ time (as we need to check a transition for every machine and potentially add up $m$ numbers). The overall runtime is therefore $O(mn^{m}T^{m-1})$, which is pseudopolynomial when $m$ is a constant.
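To make these transitions concrete, here is a minimal sketch of the $m$-machine DP in Python. It stores, for the current row, the set of reachable tuples $(x_1,\dots,x_{m-1})$ rather than a full 0/1 table (the two views are equivalent), and assumes integer $t_i$, $d_i$ with jobs sorted by non-decreasing deadline:

```python
# Sketch of the Pm feasibility DP. A state is the tuple of completion times
# (x_1, ..., x_{m-1}) of the first m-1 machines; machine m's completion time
# is the prefix sum of processing times minus the sum of the state.

def feasible_m_machines(jobs, m):
    """jobs: list of (t_i, d_i) pairs, sorted by non-decreasing deadline."""
    reachable = {tuple([0] * (m - 1))}   # before any job, every machine is at time 0
    prefix = 0
    for t_i, d_i in jobs:
        prefix += t_i                    # sum of t_1, ..., t_i
        new_reachable = set()
        for state in reachable:
            # Put job i at the end of machine j for some j < m.
            for j in range(m - 1):
                x_j = state[j] + t_i
                if x_j <= d_i:
                    new_reachable.add(state[:j] + (x_j,) + state[j + 1:])
            # Put job i at the end of machine m, which then finishes at
            # prefix - sum(state); this must be at most d_i.
            if prefix - sum(state) <= d_i:
                new_reachable.add(state)
        reachable = new_reachable
    return bool(reachable)
```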

2.3 Generalization to $Qm\mid\mid L_{\max}$

This time, we are given for machine $i$ a rate parameter $\lambda_{i}$, the number of seconds it takes that machine to do 1 unit of work. Just as above, job $i$ takes $t_{i}$ units of work and has deadline $d_{i}$. We first scale the amounts of work up, and the rate parameters down, by the same factor until the work of every job is an integer; this leaves every processing time unchanged. Then, since scaling the rate parameters and the deadlines by a common factor also does not change the problem, we may assume these numbers are all integers as well, so our algorithm is pseudopolynomial in both the job lengths and the machine rates.

Again, we let $DP[i][(x_{1},x_{2},\cdots,x_{m-1})]$ indicate whether there is a feasible scheduling of the first $i$ jobs such that machine $j$ runs for exactly $x_{j}$ time for $1\leq j\leq m-1$, but here we compute the value of $DP[i][(x_{1},x_{2},\cdots,x_{m-1})]$ as follows:

If $DP[i-1][(x_{1},x_{2},\cdots,x_{j}-t_{i}\lambda_{j},\cdots,x_{m-1})]=1$ for some $1\leq j\leq m-1$ such that $x_{j}\leq d_{i}$, we can add the $i^{th}$ job to the end of the schedule of the $j^{th}$ machine and get a feasible scheduling, just as above. Otherwise, if we add it to the $m^{th}$ machine, we know that machine runs for time

$$\lambda_{m}\left(\sum_{k=1}^{i}t_{k}-\sum_{k=1}^{m-1}\frac{x_{k}}{\lambda_{k}}\right)$$

since, by how we defined rates, if a machine has rate $\lambda_{k}$ and runs for $x_{k}$ time, then the total amount of work it does is $\frac{x_{k}}{\lambda_{k}}$. The total amount of work in the first $i$ jobs is $\sum_{k=1}^{i}t_{k}$, so we can subtract to get the amount of work the last machine does and multiply by its rate to find the time it takes to complete its jobs. Then we just need to check whether this value is at most $d_{i}$ to see if this is a valid scheduling, since we know by induction that the scheduling of the first $i-1$ jobs is feasible. As there are the same number of states as in the $Pm\mid\mid L_{\max}$ case and each state still takes $O(m)$ time to compute, the runtime is still $O(mn^{m}T^{m-1})$ and thus pseudopolynomial.
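The same sketch adapts to related machines with only the transitions changed. Here `lam[j]` is the (integer) rate of machine $j$, so job $i$ occupies machine $j$ for `lam[j] * t_i` time, and exact fractions are used to convert the tracked times back into work for the last machine:

```python
# Sketch of the Qm feasibility DP. States are again tuples of running times
# (x_1, ..., x_{m-1}); machine m's running time is computed from the total
# work minus the work done by the other machines, as in the formula above.

from fractions import Fraction

def feasible_related_machines(jobs, lam):
    """jobs: list of (work t_i, deadline d_i); lam: list of m integer rates."""
    m = len(lam)
    reachable = {tuple([0] * (m - 1))}
    total_work = 0
    for t_i, d_i in jobs:
        total_work += t_i
        new_reachable = set()
        for state in reachable:
            # Job i on machine j < m: it runs lam[j] * t_i longer.
            for j in range(m - 1):
                x_j = state[j] + lam[j] * t_i
                if x_j <= d_i:
                    new_reachable.add(state[:j] + (x_j,) + state[j + 1:])
            # Job i on machine m: running time lam[m-1] * (total work minus
            # the work x_k / lam[k] done by each of the first m-1 machines).
            work_elsewhere = sum(Fraction(x, r) for x, r in zip(state, lam))
            if lam[m - 1] * (total_work - work_elsewhere) <= d_i:
                new_reachable.add(state)
        reachable = new_reachable
    return bool(reachable)
```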

2.4 Hardness of $P\mid\mid L_{\max}$ and constant-factor approximations

Theorem 2.

Bin Packing reduces to $P\mid\mid L_{\max}$.

Proof.

Given $n$ items, we can scale all of the item sizes and the bin size up until all of the numbers are integers (since arbitrary-precision real numbers cannot even be stored, this is always possible). Suppose that after scaling the bin size is $b$, and suppose we have an algorithm solving $P\mid\mid L_{\max}$ in time $T(n)$. This algorithm can check feasibility of a set of jobs, since that is equivalent to checking whether $L_{\max}\leq 0$.

We can now note that if we have $m$ machines and $n$ jobs whose lengths equal the (scaled) sizes of the items we wish to pack, all with deadline $b$, then checking feasibility of these $n$ jobs is equivalent to checking whether it is possible to pack all of the items into $m$ bins.

Now, we can just find the correct number of bins as follows:
We start with a lower bound of 1 bin and an upper bound of $n$ bins, since placing each item in its own bin always works. Then we binary search for the minimum number of bins. For each value $m$ that we check, we check feasibility of scheduling the jobs described above on $m$ machines. If it is not feasible, then $m$ is a strict lower bound on the number of bins necessary; if it is feasible, then $m$ is an upper bound on the number of bins necessary.

Each feasibility check takes $T(n)$ time and the binary search makes $O(\log n)$ checks, so the total runtime is $O(\log n)\cdot T(n)=Poly(n)\cdot T(n)$. Thus, Bin Packing reduces to $P\mid\mid L_{\max}$. ∎
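A minimal sketch of this reduction, assuming a hypothetical solver `solve_P_Lmax(jobs, m)` that returns $\min L_{\max}$ for the given jobs on $m$ identical machines, and items already scaled so that sizes and the bin size $b$ are integers with every item fitting in a bin:

```python
# Sketch of Theorem 2: decide the minimum number of size-b bins by binary
# searching over m and checking feasibility of the corresponding scheduling
# instance (feasible on m machines <=> the items pack into m bins).

def min_bins(items, b, solve_P_Lmax):
    """items: list of integer item sizes, each at most b."""
    jobs = [(size, b) for size in items]   # processing time = size, deadline = b
    lo, hi = 1, len(items)                 # n bins always suffice (one item each)
    while lo < hi:
        m = (lo + hi) // 2
        if solve_P_Lmax(jobs, m) <= 0:     # L_max <= 0 means every job meets deadline b
            hi = m
        else:
            lo = m + 1
    return lo
```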

Since Bin Packing is strongly NP-hard, $P\mid\mid L_{\max}$ is also strongly NP-hard, so if we do not assume a constant number of machines, we cannot even get a pseudopolynomial algorithm unless $P=NP$.

Theorem 3.

There is no polynomial-time algorithm which gives a constant-factor approximation to $P2\mid\mid L_{\max}$ unless $P=NP$. Thus there is also no PTAS for $P2\mid\mid L_{\max}$ unless $P=NP$.

Proof.

Firstly, if there were a PTAS for $P2\mid\mid L_{\max}$, then for any fixed $\epsilon>0$ it would give a polynomial-time $(1+\epsilon)$-approximation, which is a constant-factor approximation, so the nonexistence of a PTAS follows directly from the nonexistence of a constant-factor approximation.

To see that there is no constant-factor approximation, note that if we had a $(1+\epsilon)$-approximation, then whenever the optimal value is 0 we must get an answer of 0 from our approximation algorithm, since $(1+\epsilon)\cdot 0=0$. This means that a polynomial-time constant-factor approximation would let us decide in polynomial time whether a set of jobs can be scheduled feasibly. However, as we saw above, solving the problem exactly reduces in polynomial time to this decision problem, so it would also be possible to solve $P2\mid\mid L_{\max}$ exactly in polynomial time. Since $P2\mid\mid L_{\max}$ is NP-hard, this would imply $P=NP$.

Thus, unless $P=NP$ there is no constant-factor approximation or PTAS for $P2\mid\mid L_{\max}$. ∎

3 Conclusion

Previously, while it was known that all of the above problems were NP-hard, and that a slightly harder version of the easiest problem above was strongly NP-hard, it was not known whether $P2\mid\mid L_{\max}$ is in $P_{pseudo}$ or strongly NP-hard. In this paper, we have resolved this by showing that it is in $P_{pseudo}$. We further generalized this to show that $Qm\mid\mid L_{\max}$ is also in $P_{pseudo}$.

We also showed some hardness results: we showed that $P\mid\mid L_{\max}$, a different way of making $P2\mid\mid L_{\max}$ harder than those considered before, is also strongly NP-hard, and that no constant-factor approximation scheme for these problems can exist unless $P=NP$.

As such, a pseudopolynomial algorithm for $Qm\mid\mid L_{\max}$ is in a sense optimal among deterministic algorithms for this problem unless $P=NP$, as we cannot get any polynomial-time constant-factor approximation. Furthermore, many natural extensions of this problem turn out to be strongly NP-hard, so this problem is in a sense "close" to the boundary between weakly NP-hard and strongly NP-hard problems.

References

  • [1] J.K. Lenstra, A.H.G. Rinnooy Kan, and P. Brucker. Complexity of machine scheduling problems. Ann. of Discrete Math., 1:343–362, 1977.
  • [2] J.A. Hoogeveen, S.L. van de Velde, and B. Veltman. Complexity of scheduling multiprocessor tasks with prespecified processor allocations. Discrete Appl. Math., 55(3):259–272, 1994.
  • [3] C.-Y. Lee and X. Cai. Scheduling one and two-processor tasks on two parallel processors. IIE Transactions on Scheduling and Logistics, 31:445–455, 1999.
  • [4] J. R. Jackson. Scheduling a production line to minimize maximum tardiness. Management Sciences Research Project, 1955.