Dynamic Programming and Linear Programming
for Odds Problem
This work was supported by JSPS KAKENHI Grant Numbers JP26285045, JP26242027, JP20K04973. We thank Katsunori Ano, Naoto Miyoshi, and Akifumi Kira for extensive discussions.
Abstract
This paper discusses the odds problem, proposed by Bruss in 2000, and its variants. A recurrence relation called a dynamic programming (DP) equation is used to find an optimal stopping policy of the odds problem and its variants. In 2013, Buchbinder, Jain, and Singh proposed a linear programming (LP) formulation for finding an optimal stopping policy of the classical secretary problem, which is a special case of the odds problem. The proposed linear programming problem, which maximizes the probability of a win, differs from the long-known DP equations. This paper shows that an ordinary DP equation can be obtained as a modification of the dual of a linear programming problem that includes, as a special case, the LP formulation proposed by Buchbinder, Jain, and Singh.
Keywords: Odds problem, dynamic programming, linear programming, Markov decision process
2010 Mathematics Subject Classification: Primary 60G40, Secondary 62L15
1 Introduction
The odds problem is an optimal stopping problem proposed by Bruss [1], which includes the classical secretary problem as a special case. Some variants of the odds problem are discussed in [2, 3, 4, 5, 6, 7, 8]. Although the odds theorem shown in [1] and [9] gives an optimal policy directly, it is well known that a backward induction method finds an optimal policy for most finite optimal stopping problems [10]. The backward induction method was used for the first time by Cayley [11, 12, 13]. A recurrence relation appearing in the backward induction method is called a dynamic programming (DP) equation. Recently, Buchbinder, Jain, and Singh [14] proposed a linear programming (LP) formulation for the classical secretary problem. Their LP formulation does not depend on the backward induction method and is essentially different from the DP equation. The purpose of this paper is to clarify the relation between these two types of programming.
The classical secretary problem and its variants are discussed as finite-horizon Markov decision processes and solved through backward induction methods (e.g., see Gilbert and Mosteller [10], Ross [15] Section I.5, Puterman [16] Section 4.6.4, and Bertsekas [17] Section 3.4 of Vol. I). Variations of infinite-horizon optimal stopping problems are discussed as stationary, infinite-horizon Markov decision processes (e.g., see [15] Section III.2, [16] Section 7.2.8, and [18] Section 4.4 of Vol. II). An LP reformulation is a popular method for stationary, infinite-horizon Markov decision process models [19, 20]. By contrast, an LP formulation is seldom used for finite-horizon Markov decision process models (and/or transient Markov policies) owing to the ease of solving the DP equation. Theoretically, Derman [21, 22] showed that we can deal with a finite-horizon Markov decision process model as a special case of an infinite-horizon Markov decision process model. Some recent results on LP formulations for finite-horizon Markov decision process models appear in [23, 24], for example. In Section 3, we propose a linear programming problem whose unique optimal solution is the solution of the DP equation. We give a direct proof of a transformation (from the DP equation to an LP formulation) in a self-contained manner by restricting to the odds problem.
In Section 4, we describe the odds problem as a finite-horizon Markov decision process and discuss a problem of finding an optimal policy (stopping rule). A straightforward formulation, which is independent of the DP equation, produces a non-linear programming problem. We propose a transformation that converts the non-linear programming problem into a linear programming problem, called a flow formulation. By restricting to the classical secretary problem, our flow formulation gives the LP formulation proposed in [14]. As a main result, we show that our flow formulation is the dual linear programming problem of the LP formulation obtained in Section 3.
The remainder of this paper is organized as follows. In the next section, we provide detailed descriptions of the odds problem and its variants. Section 3 shows that the DP equation gives a unique optimal solution of a certain linear programming problem. In Section 4, we describe a linear programming formulation for determining an optimal policy of the odds problem. We also show the duality of a DP equation and our linear programming formulation.
2 Odds Problem and its Variations
Let $X_1, X_2, \ldots, X_n$ denote a sequence of independent Bernoulli random variables. If $X_i = 1$, we say that the outcome of random variable $X_i$ is a success. Otherwise ($X_i = 0$), we say that the outcome of $X_i$ is a failure. For each $i \in \{1, 2, \ldots, n\}$, the success probability $p_i = \Pr[X_i = 1]$ is given. We denote the failure probability by $q_i = 1 - p_i$ and the odds by $r_i = p_i / q_i$. A player observes these random variables sequentially one by one and is allowed to select the variable when observing a success. The odds problem, proposed by Bruss [1], maximizes the probability of selecting the last success.
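For illustration, the following sketch implements the sum-the-odds rule of [1] mentioned in the introduction (the function name and the example instance are ours, and the code assumes $p_i < 1$ so that every odds value is finite):

def odds_theorem(p):
    """Bruss's sum-the-odds rule: stop at the first success X_i with i >= s,
    where s is the largest index such that r_s + ... + r_n >= 1
    (s = 1 if the total sum of odds is below 1).  Returns (s, win_probability).

    p : success probabilities p_1, ..., p_n with 0 <= p_i < 1.
    """
    n = len(p)
    q = [1.0 - pi for pi in p]                   # failure probabilities
    r = [pi / qi for pi, qi in zip(p, q)]        # odds r_i = p_i / q_i
    odds_sum, s = 0.0, 1
    for i in range(n, 0, -1):                    # i = n, n-1, ..., 1
        odds_sum += r[i - 1]
        if odds_sum >= 1.0:
            s = i
            break
    prod_q, sum_r = 1.0, 0.0
    for j in range(s, n + 1):
        prod_q *= q[j - 1]
        sum_r += r[j - 1]
    return s, prod_q * sum_r                     # win prob = (prod q_j) * (sum r_j)

print(odds_theorem([0.2] * 12))                  # threshold s = 9, win probability 0.4096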
In this paper, we discuss the odds problem and some variations (see [8] for a detailed discussion). We assume that a sequence of rewards $(a_1, a_2, \ldots, a_n)$ is given and a player receives reward $a_i$ when selecting a success $X_i$. The aim of the player is to maximize the expected reward. The odds problem proposed by Bruss [1] is obtained by setting the reward to the probability that a success $X_i$ is the last success, i.e., $a_i = q_{i+1} q_{i+2} \cdots q_n$. The classical secretary problem is obtained by setting $p_i = 1/i$ and $a_i = i/n$ for each $i \in \{1, 2, \ldots, n\}$.
Bruss and Paindaveine [3] discussed a model in which a player wants to select the $k$-th last success. This problem is obtained by setting the reward $a_i$ to the probability that exactly $k-1$ successes appear among $X_{i+1}, X_{i+2}, \ldots, X_n$.
Tamaki [5] discussed a problem of selecting any of the last $m$ successes. We can express this problem by setting the reward $a_i$ to the probability that at most $m-1$ successes appear among $X_{i+1}, X_{i+2}, \ldots, X_n$.
Matsui and Ano [8] discussed a problem of selecting one of the $k$-th through $\ell$-th last successes, where $1 \le k \le \ell$. Their model includes the above models as special cases. The model is obtained by setting the reward $a_i$ to the probability that at least $k-1$ and at most $\ell-1$ successes appear among $X_{i+1}, X_{i+2}, \ldots, X_n$.
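Under the reward convention above, these variants differ only in the choice of $a_i$. The following sketch (a rough illustration under our reading of the models; the function names are ours) computes $a_i$ by conditioning on the number of successes among $X_{i+1}, \ldots, X_n$:

def tail_success_distribution(p, i):
    """d[m] = probability that exactly m successes occur among X_{i+1}, ..., X_n."""
    d = [1.0]
    for pj in p[i:]:                         # success probabilities p_{i+1}, ..., p_n
        new = [0.0] * (len(d) + 1)
        for m, prob in enumerate(d):
            new[m] += prob * (1.0 - pj)      # X_j is a failure
            new[m + 1] += prob * pj          # X_j is a success
        d = new
    return d

def rewards(p, k, ell):
    """a_i = probability that between k-1 and ell-1 successes occur after time i.

    k = ell = 1 recovers the rewards of Bruss's odds problem, k = 1 those of
    Tamaki's model, and k = ell those of Bruss and Paindaveine's model.
    """
    a = []
    for i in range(1, len(p) + 1):
        d = tail_success_distribution(p, i)
        a.append(sum(d[m] for m in range(k - 1, min(ell, len(d)))))
    return a

# Secretary-type instance (p_i = 1/i) with n = 5 and k = ell = 1:
# a_i = q_{i+1} * ... * q_n = i/n.
print(rewards([1.0 / i for i in range(1, 6)], k=1, ell=1))   # approx. [0.2, 0.4, 0.6, 0.8, 1.0]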
3 Linear Programming Problem for Solving DP Equation
This section describes a DP equation for finding an optimal stopping rule of our problem. We propose a linear programming problem whose unique optimal solution gives the solution to the DP equation.
Let $v_i$ be the expected reward under the condition in which a player observes variables $X_1, X_2, \ldots, X_i$; does not select any of them; and afterward adopts an optimal stopping rule. Then, $v_0$ denotes the maximum expected reward when a player adopts an optimal stopping rule. The following recurrence relation,
$$ v_n = 0, \qquad v_{i-1} = p_i \max\{a_i,\, v_i\} + q_i v_i \quad (i = n, n-1, \ldots, 1), \qquad (1) $$
is called a DP equation; it yields an optimal stopping rule through backward induction (see [10], for example).
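For concreteness, a minimal backward-induction sketch of Equation (1) follows (the function name is ours); it also records the indices at which selecting a success is optimal.

def solve_dp(p, a):
    """Backward induction for DP Equation (1).

    Returns (v, stop): v[i] is the optimal expected reward after observing
    X_1, ..., X_i without selecting; stop[i] is True if selecting the
    success X_i is optimal (a_i >= v_i).
    """
    n = len(p)
    v = [0.0] * (n + 1)                      # v[n] = 0
    stop = [False] * (n + 1)
    for i in range(n, 0, -1):                # i = n, n-1, ..., 1
        stop[i] = a[i - 1] >= v[i]
        v[i - 1] = p[i - 1] * max(a[i - 1], v[i]) + (1.0 - p[i - 1]) * v[i]
    return v, stop

# Classical secretary problem with n = 10 (p_i = 1/i, a_i = i/n):
# v[0] is the optimal probability of selecting the best candidate.
n = 10
p = [1.0 / i for i in range(1, n + 1)]
a = [i / n for i in range(1, n + 1)]
v, stop = solve_dp(p, a)
print(v[0])                                  # approx. 0.3987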
Now, we describe a linear programming problem that finds a solution to the above DP equation.
Theorem 3.1.
A linear programming problem
has a unique optimal solution that satisfies DP Equation (1).
Proof: Because problem P has a feasible solution and its objective function is non-negative over the feasible region, an optimal solution exists. Let be an optimal solution of P. Obviously, satisfies
We prove that satisfies DP Equation (1) by contradiction. Assume that there exists an index that satisfies
We introduce a sufficiently small positive number and a solution defined by
In the following, we show that is feasible to P.
For each the definition of directly implies that
When , satisfies
as is a sufficiently small positive number and . For each , the inequality implies the following:
From the above, is feasible to problem P. Obviously, we have , which contradicts the optimality of .
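To illustrate how a DP equation such as (1) can be recovered from a linear program, the following sketch minimizes $\sum_{i=0}^{n} v_i$ subject to the linearized constraints $v_{i-1} \ge p_i a_i + q_i v_i$ and $v_{i-1} \ge v_i$ with $v_n \ge 0$; this is one standard way of writing such an LP and is offered only as an assumed form for illustration, not as the exact statement of problem P. Its optimal solution can be compared with backward induction.

import numpy as np
from scipy.optimize import linprog

def solve_dp_lp(p, a):
    """Solve an LP whose optimum reproduces the values of DP Equation (1)
    (an assumed form; see the remark above):
        minimize v_0 + ... + v_n
        subject to v_{i-1} >= p_i a_i + q_i v_i and v_{i-1} >= v_i (i = 1, ..., n),
                   v_n >= 0.
    """
    n = len(p)
    c = np.ones(n + 1)
    A_ub, b_ub = [], []
    for i in range(1, n + 1):
        q_i = 1.0 - p[i - 1]
        row = np.zeros(n + 1)
        row[i - 1], row[i] = -1.0, q_i       # -v_{i-1} + q_i v_i <= -p_i a_i
        A_ub.append(row)
        b_ub.append(-p[i - 1] * a[i - 1])
        row = np.zeros(n + 1)
        row[i - 1], row[i] = -1.0, 1.0       # -v_{i-1} + v_i <= 0
        A_ub.append(row)
        b_ub.append(0.0)
    bounds = [(None, None)] * n + [(0.0, None)]   # only v_n is sign-constrained
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    return res.x

n = 10
p = [1.0 / i for i in range(1, n + 1)]
a = [i / n for i in range(1, n + 1)]
print(solve_dp_lp(p, a)[0])                  # approx. 0.3987, matching backward induction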
4 Flow Formulation and its Duality
4.1 Flow Formulation
In this subsection, we describe the odds problem as a finite-horizon Markov decision process and give a straightforward non-linear programming formulation of a problem for finding an optimal policy. We propose a transformation that converts the problem into a linear programming problem.
In this paper, we represent the record of a game using a sequence of realization values of observed random variables. For example, the record indicates that the player observed four random variables and selected the success . When a player observes all random variables and fails to select one, we define a record of the game through a sequence of realization values with an additional last component 1, i.e., . Then, the set of all possible records, denoted by , becomes
Next, we introduce a finite-horizon Markov decision process that formulates our problem. Let be a state space defined by , where and . The state is the initial state of our process. The state is an absorbing state called a terminal state. For each state , we define a transition probability from to a state by
For each state in , we associate action space . If an action stop is selected at state , the process moves to the terminal state and generates a reward . Otherwise, an action cont is selected and the process moves from to without a reward. A (probabilistic) policy is an -dimensional vector , where denotes a probability that an action cont is selected at state . Our goal is to find an optimal policy such that the expected reward is maximized.
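A small Monte Carlo sketch of one play of this process under a given policy follows (the helper name and the threshold-policy example are ours); at a success at time $i$ the player continues with probability $\pi_i$ and otherwise stops and receives $a_i$.

import random

def simulate(p, a, pi):
    """One play: X_i = 1 with probability p_i; at a success the player
    continues with probability pi[i-1] and otherwise stops, receiving a_i."""
    for i in range(1, len(p) + 1):
        if random.random() < p[i - 1]:           # success at time i
            if random.random() >= pi[i - 1]:     # action stop is chosen
                return a[i - 1]
    return 0.0                                   # no selection was made

# Threshold policy for the secretary instance (skip the first 3 positions,
# then stop at the next success); the estimate is close to 0.3987.
n = 10
p = [1.0 / i for i in range(1, n + 1)]
a = [i / n for i in range(1, n + 1)]
pi = [1.0 if i <= 3 else 0.0 for i in range(1, n + 1)]
samples = 200_000
print(sum(simulate(p, a, pi) for _ in range(samples)) / samples)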
Given a policy , the occurrence probability of a record , denoted by , satisfies
where
and
For each , we define
Obviously, the occurrence probabilities satisfy
for each . The expected reward with respect to is equal to
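The same quantities can be computed exactly: writing $u_i$ for the probability of reaching time $i$ without a selection, the probability of selecting the success $X_i$ is $u_i\, p_i\, (1 - \pi_i)$ and $u_{i+1} = u_i\,(1 - p_i(1 - \pi_i))$. A short sketch (function name ours):

def expected_reward(p, a, pi):
    """Exact expected reward of policy pi, together with the selection
    probabilities x[i-1] = Pr[the player selects the success X_i]."""
    u, total, x = 1.0, 0.0, []
    for p_i, a_i, pi_i in zip(p, a, pi):
        x_i = u * p_i * (1.0 - pi_i)         # select the success at this time
        x.append(x_i)
        total += a_i * x_i
        u *= 1.0 - p_i * (1.0 - pi_i)        # still unselected afterwards
    return total, x

# The threshold policy above attains about 0.3987 on the secretary instance,
# in agreement with the Monte Carlo estimate and the DP value.
n = 10
p = [1.0 / i for i in range(1, n + 1)]
a = [i / n for i in range(1, n + 1)]
pi = [1.0 if i <= 3 else 0.0 for i in range(1, n + 1)]
print(expected_reward(p, a, pi)[0])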
We can then formulate the problem of maximizing the expected reward as
In the following, we transform the above problem into a linear programming problem. The following theorem provides the key idea of our transformation.
Theorem 4.1.
Let be an -dimensional real vector. The following two conditions are equivalent.
- (c1)
-
There exists such that is feasible for problem Q.
- (c2)
-
There exists such that satisfies the following linear inequality system:
(2)
Proof of (c1) $\Rightarrow$ (c2). Let be a feasible solution to problem Q. Obviously, we have . Apply and . We then obtain for each
and
Thus, satisfies the inequality system (2).
Proof of (c2) $\Rightarrow$ (c1). Assume that satisfies the inequality system (2). First, we show by induction on . Clearly, , and implies that . With , we define
(3)
We then have the following inequalities:
It is easy to see that for each ,
and thus,
From the above, is feasible to problem Q.
By employing the above theorem, we can transform problem Q into the following linear programming problem:
FF: max.
s.t.
(4)
(5)
We call the above a flow formulation. When we have an optimal solution to FF, Equation (3) provides a policy that is optimal for problem Q.
In the following, we interpret the FF problem as a flow problem on a digraph. Let be a digraph with a vertex set and a directed edge set . For each , we associate variables and with edges and , respectively. If we regard these variables as a flow on a corresponding directed edge, constraint (4) represents a flow conservation law at vertex , and constraint (5) states that a flow of volume 1 emanates from vertex . Because , vertex corresponds to an event in which a player selects a success .
4.2 Flow Formulation for Classical Secretary Problem
Buchbinder, Jain, and Singh [14] proposed the following LP formulation for the classical secretary problem:
$$ \mbox{max.} \ \ \sum_{i=1}^{n} \frac{i}{n}\, x_i \quad \mbox{s.t.} \quad i\, x_i \le 1 - \sum_{j=1}^{i-1} x_j \ \mbox{ and } \ x_i \ge 0 \quad (\forall i \in \{1, 2, \ldots, n\}). \qquad (6) $$
Their formulation is obtained from the FF by eliminating the variables through substitution and setting $p_i = 1/i$ and $a_i = i/n$ for each $i \in \{1, 2, \ldots, n\}$.
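A numerical sketch of LP (6) using scipy follows (the helper name is ours); by [14], its optimal value coincides with the best achievable probability of selecting the best candidate, which approaches $1/e$ as $n$ grows.

import numpy as np
from scipy.optimize import linprog

def secretary_lp(n):
    """Solve LP (6): max sum_i (i/n) x_i  s.t.  i x_i <= 1 - sum_{j<i} x_j,  x >= 0."""
    c = -np.arange(1, n + 1) / n             # linprog minimizes, so negate the objective
    A_ub = np.zeros((n, n))
    for i in range(1, n + 1):
        A_ub[i - 1, :i - 1] = 1.0            # sum_{j < i} x_j
        A_ub[i - 1, i - 1] = float(i)        # + i x_i        <= 1
    res = linprog(c, A_ub=A_ub, b_ub=np.ones(n), bounds=[(0, None)] * n)
    return -res.fun, res.x

value, x = secretary_lp(100)
print(value)                                 # approx. 0.371, close to 1/e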
4.3 Dual of Flow Formulation
5 Conclusion
This paper described a linear programming problem whose unique optimal solution is the solution to the DP equation for the odds problem. We proposed a flow formulation that maximizes the expected reward of the odds problem based on a finite-horizon Markov decision process. Furthermore, we showed the duality of these two formulations.
References
- [1] F. T. Bruss, Sum the odds to one and stop, Annals of Probability 28 (3) (2000) 1384–1391.
- [2] K. Ano, H. Kakinuma, N. Miyoshi, Odds theorem with multiple selection chances, Journal of Applied Probability 47 (4) (2010) 1093–1104.
- [3] F. T. Bruss, D. Paindaveine, Selecting a sequence of last successes in independent trials, Journal of Applied Probability 37 (2) (2000) 389–399.
- [4] A. Kurushima, K. Ano, Multiple stopping odds problem in Bernoulli trials with random number of observations, Mathematica Applicanda 44 (1) (2016) 209–220.
- [5] M. Tamaki, Sum the multiplicative odds to one and stop, Journal of Applied Probability 47 (3) (2010) 761–777.
- [6] T. Matsui, K. Ano, A note on a lower bound for the multiplicative odds theorem of optimal stopping, Journal of Applied Probability 51 (3) (2014) 885–889.
- [7] T. Matsui, K. Ano, Lower bounds for Bruss’ odds problem with multiple stoppings, Mathematics of Operations Research 41 (2) (2016) 700–714.
- [8] T. Matsui, K. Ano, Compare the ratio of symmetric polynomials of odds to one and stop, Journal of Applied Probability 54 (1) (2017) 12.
- [9] F. T. Bruss, A note on bounds for the odds theorem of optimal stopping, Annals of Probability 31 (4) (2003) 1859–1861.
- [10] J. P. Gilbert, F. Mosteller, Recognizing the maximum of a sequence, Journal of the American Statistical Association 61 (313) (1966) 35–73.
- [11] A. Cayley, Mathematical questions with their solutions, The Educational Times 23 (1875) 18–19.
- [12] L. Moser, On a problem of Cayley, Scripta Math 22 (1956) 289–292.
- [13] T. S. Ferguson, Who solved the secretary problem?, Statistical Science 4 (3) (1989) 282–296.
- [14] N. Buchbinder, K. Jain, M. Singh, Secretary problems via linear programming, Mathematics of Operations Research 39 (1) (2014) 190–206.
- [15] S. M. Ross, Introduction to stochastic dynamic programming, Academic Press, 2014.
- [16] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, John Wiley & Sons, 2014.
- [17] D. P. Bertsekas, Dynamic programming and optimal control 4th edition, volume I, Athena Scientific, 2017.
- [18] D. P. Bertsekas, Dynamic programming and optimal control 4th edition, volume II, Athena Scientific, 2012.
- [19] A. S. Manne, Linear programming and sequential decisions, Management Science 6 (3) (1960) 259–267.
- [20] L. C. M. Kallenberg, Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory, Zeitschrift für Operations Research 40 (1) (1994) 1–42.
- [21] C. Derman, On sequential decisions and Markov chains, Management Science 9 (1) (1962) 16–24.
- [22] C. Derman, M. Klein, Some remarks on finite horizon Markovian decision models, Operations Research 13 (2) (1965) 272–278.
- [23] A. Bhattacharya, J. P. Kharoufeh, Linear programming formulation for non-stationary, finite-horizon Markov decision process models, Operations Research Letters 45 (6) (2017) 570–574.
- [24] M. Mundhenk, J. Goldsmith, C. Lusena, E. Allender, Complexity of finite-horizon Markov decision process problems, Journal of the ACM (JACM) 47 (4) (2000) 681–720.