
Predicting Online Item-choice Behavior:
A Shape-restricted Regression Approach

Naoki Nishimura, Noriyoshi Sukegawa, Yuichi Takano, and Jiro Iwanaga

N. Nishimura was with the Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki 305-8577, Japan. He is now with Recruit Lifestyle Co., Ltd., Tokyo 100-6640, Japan (e-mail: [email protected]).
N. Sukegawa is with the Department of Information and Computer Technology, Tokyo University of Science, Tokyo 125-8585, Japan (e-mail: [email protected]).
Y. Takano is with the Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki 305-8577, Japan (e-mail: [email protected]).
J. Iwanaga is with the Doctoral Program in Policy and Planning Sciences, University of Tsukuba, Ibaraki 305-8577, Japan, and also with Erdos Inc., Kanagawa 222-0033, Japan (e-mail: [email protected]).
Abstract

This paper examines the relationship between user pageview (PV) histories and their item-choice behavior on an e-commerce website. We focus on PV sequences, which represent time series of the number of PVs for each user–item pair. We propose a shape-restricted optimization model that accurately estimates item-choice probabilities for all possible PV sequences. This model imposes monotonicity constraints on item-choice probabilities by exploiting partial orders for PV sequences, according to the recency and frequency of a user’s previous PVs. To improve the computational efficiency of our optimization model, we devise efficient algorithms for eliminating all redundant constraints according to the transitivity of the partial orders. Experimental results using real-world clickstream data demonstrate that our method achieves higher prediction performance than that of a state-of-the-art optimization model and common machine learning methods.

Index Terms:
Consumer behavior, electronic commerce, graph theory, predictive models, quadratic programming

I Introduction

A growing number of companies are now operating e-commerce websites that allow users to browse and purchase a variety of items [45]. Within this context, there is great potential value in analyzing users’ item-choice behavior from clickstream data, which is a record of user pageview (PV) histories on an e-commerce website. By grasping users’ purchase intention as revealed by PV histories, we can lead users to target pages or design special sales promotions, providing companies with opportunities to build profitable relationships with users [22, 33]. Companies can also use clickstream data to improve the quality of operational forecasting and inventory management [18]. Meanwhile, users often find it difficult to select an appropriate item from among the plethora of choices presented by e-commerce websites [1]. Analyzing item-choice behavior can improve the performance of recommendation systems that help users discover items of interest [20]. For these reasons, a number of prior studies have investigated clickstream data from various perspectives [7]. In this study, we focused on closely examining the relationship between PV histories and item-choice behavior on an e-commerce website.

It has been demonstrated that the recency and frequency of a user’s past purchases are critical indicators for purchase prediction [13, 46] and sequential pattern mining [9]. Accordingly, Iwanaga et al. [19] developed a shape-restricted optimization model for estimating item-choice probabilities from the recency and frequency of a user’s previous PVs. Their method creates a two-dimensional probability table consisting of item-choice probabilities for all recency–frequency combinations in a user’s previous PVs. Nishimura et al. [32] employed latent-class modeling to integrate item heterogeneity into a two-dimensional probability table. These prior studies demonstrated experimentally that higher prediction performance was achieved with the two-dimensional probability table than with common machine learning methods, namely, logistic regression, kernel-based support vector machines, artificial neural networks, and random forests. Notably, however, reducing PV histories to two dimensions (recency and frequency) can markedly decrease the amount of information contained in PV histories reflecting item-choice behavior.

This study focused on PV sequences, which represent time series of the number of PVs for each user–item pair. In contrast to the two-dimensional probability table, PV sequences allow us to retain detailed information contained in the PV history. However, the huge number of possible PV sequences makes it extremely difficult to accurately estimate item-choice probabilities for all of them. To overcome this difficulty, we propose a shape-restricted optimization model that imposes monotonicity constraints on item-choice probabilities based on a partially ordered set (poset) for PV sequences. While this optimization model contains a huge number of constraints, all redundant constraints can be eliminated according to the transitivity of partial order. To accomplish this, we compute a transitive reduction [2] of a directed graph representing the poset. We demonstrate the effectiveness of our method through experiments using real-world clickstream data.

The main contributions of this paper are as follows:

  • We propose a shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence. This PV sequence model exploits the monotonicity constraints to precisely estimate item-choice probabilities.

  • We derive two types of PV sequence posets according to the recency and frequency of a user's previous PVs. Experimental results show that the monotonicity constraints based on these posets greatly enhance the prediction performance of our PV sequence model.

  • We devise constructive algorithms for transitive reduction specific to these posets. The time complexity of our algorithms is much smaller than that of general-purpose algorithms. Experimental results reveal that transitive reduction improves efficiency in terms of both the computation time and memory usage of our PV sequence model.

  • We verify experimentally that higher prediction performance is achieved with our method than with the two-dimensional probability table and common machine learning methods, namely, logistic regression, artificial neural networks, and random forests.

The remainder of this paper is organized as follows. Section II gives a brief review of related work. Section III describes the two-dimensional probability table [19], and Section IV presents our PV sequence model. Section V describes our constructive algorithms for transitive reduction. Section VI evaluates the effectiveness of our method based on experimental results. Section VII concludes with a brief summary of this work and a discussion of future research directions.

II Related work

This section briefly surveys methods for predicting online user behavior and discusses some related work on shape-restricted regression.

II-A Prediction of online user behavior

A number of prior studies have aimed at predicting users’ purchase behavior on e-commerce websites [10]. Mainstream research has applied stochastic or statistical models for predicting purchase sessions [5, 23, 30, 31, 36, 41, 46], but these approaches do not consider which items users choose.

Various machine learning methods have been used to predict online item-choice behavior, including logistic regression [12, 53], association rule mining [37], support vector machines [38, 53], ensemble learning methods [25, 26, 39, 52, 54], and artificial neural networks [21, 47, 50]. Tailored statistical models have also been proposed. For instance, Moe [29] devised a two-stage multinomial logit model that separates the decision-making process into item views and purchase decisions. Yao et al. [51] proposed a joint framework consisting of user-level factor estimation and item-level factor aggregation based on the buyer decision process. Borges and Levene [6] used Markov chain models to estimate the probability of the next link choice of a user.

These prior studies effectively utilized clickstream data in various prediction methods and showed that consideration of time-evolving user behavior is crucial for precise prediction of online item-choice behavior. We therefore focused on user PV sequences to estimate item-choice probabilities on e-commerce websites. Moreover, we evaluated the prediction performance of our method by comparison with machine learning methods that have commonly been used in prior studies.

II-B Shape-restricted regression

In many practical situations, prior information is known about the relationship between explanatory and response variables. For instance, utility functions can be assumed to be increasing and concave according to economic theory [28], and option pricing functions to be monotone and convex according to finance theory [3]. Shape-restricted regression fits a nonparametric function to a set of given observations under shape restrictions such as monotonicity, convexity, concavity, or unimodality [8, 15, 16, 48].

Isotonic regression is the most common method for shape-restricted regression. In general, isotonic regression is the problem of estimating a real-valued monotone (non-decreasing or non-increasing) function with respect to a given partial order of observations [35]. Some regularization techniques [14, 44] and estimation algorithms [17, 35, 43] have been proposed for isotonic regression.

One of the greatest advantages of shape-restricted regression is that it mitigates overfitting, thereby improving prediction performance of regression models [4]. To utilize this advantage, Iwanaga et al. [19] devised a shape-restricted optimization model for estimating item-choice probabilities on e-commerce websites. Along similar lines, we propose a shape-restricted optimization model based on order relations of PV sequences to improve prediction performance.

III Two-dimensional probability table

This section briefly reviews the two-dimensional probability table proposed by Iwanaga et al. [19].

III-A Empirical probability table

TABLE I: Pageview History of Six User–Item Pairs

                       #PVs                Choice
User    Item    Apr 1   Apr 2   Apr 3      Apr 4     $(r,f)$    $(v_1,v_2,v_3)$
$u_1$   $i_2$     1       0       1          0       $(3,2)$    $(1,0,1)$
$u_1$   $i_4$     0       1       0          1       $(2,1)$    $(0,1,0)$
$u_2$   $i_1$     3       0       0          0       $(1,3)$    $(0,0,3)$
$u_2$   $i_3$     0       0       3          1       $(3,3)$    $(3,0,0)$
$u_2$   $i_4$     1       1       1          0       $(3,3)$    $(1,1,1)$
$u_3$   $i_2$     2       0       1          0       $(3,3)$    $(1,0,2)$

Table I gives an example of a PV history for six user–item pairs. For instance, user $u_1$ viewed the page for item $i_2$ once on each of April 1 and 3. We focus on user choices (e.g., revisit and purchase) on April 4, which we call the base date. For instance, user $u_1$ chose item $i_4$ rather than item $i_2$ on the base date. We suppose for each user–item pair that recency and frequency are characterized by the last PV day and the total number of PVs, respectively. As shown in the table, the PV history can be summarized by the recency–frequency combination $(r,f)\in R\times F$, where $R$ and $F$ are index sets representing recency and frequency, respectively.

Let $n_{rf}$ be the number of user–item pairs having $(r,f)\in R\times F$, and let $q_{rf}$ be the number of choices made on the base date by user–item pairs having $(r,f)\in R\times F$. In the case of Table I, the empirical probability table is calculated as

$$\left(\hat{x}_{rf}:=\frac{q_{rf}}{n_{rf}}\right)_{(r,f)\in R\times F}=\left(\begin{array}{ccc}0/0&0/0&0/1\\ 1/1&0/0&0/0\\ 0/0&0/1&1/3\end{array}\right)\approx\left(\begin{array}{ccc}0.00&0.00&0.00\\ 1.00&0.00&0.00\\ 0.00&0.00&0.33\end{array}\right),\qquad(8)$$

where, for reasons of expediency, $\hat{x}_{rf}:=0$ for $(r,f)\in R\times F$ with $n_{rf}=0$.

III-B Two-dimensional monotonicity model

It is reasonable to assume that the recency and frequency of user–item pairs are positively associated with item-choice probabilities. To estimate item-choice probabilities $x_{rf}$ for all recency–frequency combinations $(r,f)\in R\times F$, the two-dimensional monotonicity model [19] minimizes the weighted sum of squared errors under monotonicity constraints with respect to recency and frequency:

$$\begin{aligned}\mathop{\mbox{minimize}}_{(x_{rf})_{(r,f)\in R\times F}}\quad&\sum_{(r,f)\in R\times F}n_{rf}(x_{rf}-\hat{x}_{rf})^{2}&&(9)\\ \mbox{subject to}\quad&x_{rf}\leq x_{r+1,f}\quad((r,f)\in R\times F),&&(10)\\ &x_{rf}\leq x_{r,f+1}\quad((r,f)\in R\times F),&&(11)\\ &0\leq x_{rf}\leq 1\quad((r,f)\in R\times F).&&(12)\end{aligned}$$

Note, however, that PV histories are often indistinguishable according to recency and frequency. A typical example is the set of user–item pairs $(u_2,i_3)$, $(u_2,i_4)$, and $(u_3,i_2)$ in Table I; although their PV histories are actually different, they have the same recency–frequency combination $(r,f)=(3,3)$. As described in the next section, we exploit the PV sequence to distinguish between such PV histories.

IV PV sequence model

This section presents our shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence.

IV-A PV sequence

The PV sequence for each user–item pair represents a time series of the number of PVs, and is written as

$$\bm{v}:=(v_{1},v_{2},\ldots,v_{n}),$$

where $v_j$ is the number of PVs $j$ periods earlier for $j=1,2,\ldots,n$ (see Table I). Note that sequence terms are arranged in reverse chronological order, so $v_j$ moves into the past as index $j$ increases.

Throughout the paper, we express sets of consecutive integers as

$$[m_{1},m_{2}]:=\{m_{1},m_{1}+1,\ldots,m_{2}\}\subseteq\mathbb{Z},$$

where $[m_{1},m_{2}]=\emptyset$ when $m_{1}>m_{2}$. The set of possible PV sequences is defined as

$$\Gamma:=[0,m]^{n}=\{0,1,\ldots,m\}^{n},$$

where $m$ is the maximum number of PVs in each period, and $n$ is the number of periods considered.

Our objective is to estimate item-choice probabilities $x_{\bm{v}}$ for all PV sequences $\bm{v}\in\Gamma$. However, the huge number of PV sequences makes it extremely difficult to accurately estimate such probabilities. In the case of $(n,m)=(|R|,|F|)=(5,6)$, for instance, the number of different PV sequences is $(m+1)^{n}=16{,}807$, whereas the number of recency–frequency combinations is only $|R|\cdot|F|=30$. To avoid this difficulty, we effectively utilize monotonicity constraints on item-choice probabilities as in the optimization model (9)–(12). In the next section, we introduce three operations underlying the development of the monotonicity constraints.

IV-B Operations based on recency and frequency

From the perspective of frequency, it is reasonable to assume that item-choice probability increases as the number of PVs in a particular period increases. To formulate this, we define the following operation:

Definition 1 (Up).

On the domain

$$\mathcal{D}_{\texttt{U}}:=\{(\bm{v},s)\in\Gamma\times[1,n]\mid v_{s}\leq m-1\},$$

the function $\texttt{Up}:\mathcal{D}_{\texttt{U}}\to\Gamma$ is defined as

$$((\ldots,v_{s},\ldots),s)\mapsto(\ldots,v_{s}+1,\ldots).$$

For instance, we have $\texttt{Up}((0,1,1),1)=(1,1,1)$ and $\texttt{Up}((1,1,1),2)=(1,2,1)$. Since this operation increases PV frequencies, the monotonicity constraint $x_{(0,1,1)}\leq x_{(1,1,1)}\leq x_{(1,2,1)}$ should be satisfied by item-choice probabilities.

From the perspective of recency, we assume that more-recent PVs have a larger effect on increasing item-choice probability. To formulate this, we consider the following operation for moving one PV from an old period to a new period:

Definition 2 (Move).

On the domain

$$\mathcal{D}_{\texttt{M}}:=\{(\bm{v},s,t)\in\Gamma\times[1,n]\times[1,n]\mid v_{s}\leq m-1,~v_{t}\geq 1,~s<t\},$$

the function $\texttt{Move}:\mathcal{D}_{\texttt{M}}\to\Gamma$ is defined as

$$((\ldots,v_{s},\ldots,v_{t},\ldots),s,t)\mapsto(\ldots,v_{s}+1,\ldots,v_{t}-1,\ldots).$$

For instance, we have $\texttt{Move}((1,1,1),2,3)=(1,2,0)$ and $\texttt{Move}((1,2,0),1,2)=(2,1,0)$. Because this operation increases the number of recent PVs, item-choice probabilities should satisfy the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}\leq x_{(2,1,0)}$.

The PV sequence $\bm{v}=(1,1,1)$ represents a user's continued interest in a certain item over three periods. In contrast, the PV sequence $\bm{v}=(1,2,0)$ implies that a user's interest decreased over the two most recent periods. In this sense, the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}$ may not be justified. Accordingly, we define the following alternative operation, which exchanges numbers of PVs to increase the number of recent PVs:

Definition 3 (Swap).

On the domain

$$\mathcal{D}_{\texttt{S}}:=\{(\bm{v},s,t)\in\Gamma\times[1,n]\times[1,n]\mid v_{s}<v_{t},~s<t\},$$

the function $\texttt{Swap}:\mathcal{D}_{\texttt{S}}\to\Gamma$ is defined as

$$((\ldots,v_{s},\ldots,v_{t},\ldots),s,t)\mapsto(\ldots,v_{t},\ldots,v_{s},\ldots).$$

We thus have $\texttt{Swap}((1,0,2),2,3)=(1,2,0)$ because $v_{2}<v_{3}$, and $\texttt{Swap}((1,2,0),1,2)=(2,1,0)$ because $v_{1}<v_{2}$. Since this operation increases the number of recent PVs, item-choice probabilities should satisfy the monotonicity constraint $x_{(1,0,2)}\leq x_{(1,2,0)}\leq x_{(2,1,0)}$. Note that the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}$ is not implied by this operation.
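For concreteness, the three operations of Definitions 1–3 can be sketched in Python with PV sequences represented as tuples; the helper names and the explicit m argument are illustrative conventions only, not notation from the model.

# A minimal sketch of Definitions 1-3 on tuples (1-indexed periods as in the text).
def up(v, s, m):
    # Up: add one PV in period s, provided v_s <= m - 1.
    assert v[s - 1] <= m - 1
    return v[:s - 1] + (v[s - 1] + 1,) + v[s:]

def move(v, s, t, m):
    # Move: shift one PV from the older period t to the newer period s (s < t).
    assert s < t and v[s - 1] <= m - 1 and v[t - 1] >= 1
    w = list(v)
    w[s - 1] += 1
    w[t - 1] -= 1
    return tuple(w)

def swap(v, s, t):
    # Swap: exchange the PV counts of periods s and t (s < t, v_s < v_t).
    assert s < t and v[s - 1] < v[t - 1]
    w = list(v)
    w[s - 1], w[t - 1] = w[t - 1], w[s - 1]
    return tuple(w)

# The examples from the text:
assert up((0, 1, 1), 1, m=2) == (1, 1, 1)
assert move((1, 1, 1), 2, 3, m=2) == (1, 2, 0)
assert swap((1, 0, 2), 2, 3) == (1, 2, 0)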

IV-C Partially ordered sets

Let $U\subseteq\Gamma$ be a subset of PV sequences. The image of each operation is then defined as

$$\begin{aligned}\texttt{Up}(U)&=\{\texttt{Up}(\bm{u},s)\mid\bm{u}\in U,~(\bm{u},s)\in\mathcal{D}_{\texttt{U}}\},\\ \texttt{Move}(U)&=\{\texttt{Move}(\bm{u},s,t)\mid\bm{u}\in U,~(\bm{u},s,t)\in\mathcal{D}_{\texttt{M}}\},\\ \texttt{Swap}(U)&=\{\texttt{Swap}(\bm{u},s,t)\mid\bm{u}\in U,~(\bm{u},s,t)\in\mathcal{D}_{\texttt{S}}\}.\end{aligned}$$

We define $\texttt{UM}(U):=\texttt{Up}(U)\cup\texttt{Move}(U)$ for $U\subseteq\Gamma$. The following definition states that the binary relation $\bm{u}\prec_{\texttt{UM}}\bm{v}$ holds when $\bm{u}$ can be transformed into $\bm{v}$ by repeated application of Up and Move:

Definition 4 ($\preceq_{\texttt{UM}}$).

Suppose $\bm{u},\bm{v}\in\Gamma$. We write $\bm{u}\prec_{\texttt{UM}}\bm{v}$ if and only if there exists $k\geq 1$ such that

$$\bm{v}\in\texttt{UM}^{k}(\{\bm{u}\})=\underbrace{\texttt{UM}\circ\cdots\circ\texttt{UM}\circ\texttt{UM}}_{k~\mathrm{compositions}}(\{\bm{u}\}).$$

We also write $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ if $\bm{u}\prec_{\texttt{UM}}\bm{v}$ or $\bm{u}=\bm{v}$.

Similarly, we define $\texttt{US}(U):=\texttt{Up}(U)\cup\texttt{Swap}(U)$ for $U\subseteq\Gamma$. Then, the binary relation $\bm{u}\prec_{\texttt{US}}\bm{v}$ holds when $\bm{u}$ can be transformed into $\bm{v}$ by repeated application of Up and Swap.

Definition 5 ($\preceq_{\texttt{US}}$).

Suppose $\bm{u},\bm{v}\in\Gamma$. We write $\bm{u}\prec_{\texttt{US}}\bm{v}$ if and only if there exists $k\geq 1$ such that

$$\bm{v}\in\texttt{US}^{k}(\{\bm{u}\})=\underbrace{\texttt{US}\circ\cdots\circ\texttt{US}\circ\texttt{US}}_{k~\mathrm{compositions}}(\{\bm{u}\}).$$

We also write $\bm{u}\preceq_{\texttt{US}}\bm{v}$ if $\bm{u}\prec_{\texttt{US}}\bm{v}$ or $\bm{u}=\bm{v}$.

To prove properties of these binary relations, we can use the lexicographic order, which is a well-known linear order [40]:

Definition 6 ($\preceq_{\texttt{lex}}$).

Suppose $\bm{u},\bm{v}\in\Gamma$. We write $\bm{u}\prec_{\texttt{lex}}\bm{v}$ if and only if there exists $s\in[1,n]$ such that $u_{s}<v_{s}$ and $u_{j}=v_{j}$ for $j\in[1,s-1]$. We also write $\bm{u}\preceq_{\texttt{lex}}\bm{v}$ if $\bm{u}\prec_{\texttt{lex}}\bm{v}$ or $\bm{u}=\bm{v}$.
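Incidentally, Definition 6 coincides with the built-in comparison of Python tuples, which is lexicographic; this gives a direct way to check $\preceq_{\texttt{lex}}$ in the sketches below:

# Python compares tuples lexicographically, matching Definition 6.
assert (0, 1, 1) <= (1, 1, 1) <= (1, 2, 1)
assert not (1, 2, 0) <= (1, 1, 1)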

Each application of Up, Move, and Swap makes a PV sequence greater in the lexicographic order. Therefore, we can obtain the following lemma:

Lemma 1.

Suppose $\bm{u},\bm{v}\in\Gamma$. If $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ or $\bm{u}\preceq_{\texttt{US}}\bm{v}$, then $\bm{u}\preceq_{\texttt{lex}}\bm{v}$.

The following theorem states that a partial order of PV sequences is derived by operations Up and Move.

Theorem 1.

The pair $(\Gamma,\preceq_{\texttt{UM}})$ is a poset.

Proof.

From Definition 4, the relation $\preceq_{\texttt{UM}}$ is reflexive and transitive. Suppose $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ and $\bm{v}\preceq_{\texttt{UM}}\bm{u}$. It follows from Lemma 1 that $\bm{u}\preceq_{\texttt{lex}}\bm{v}$ and $\bm{v}\preceq_{\texttt{lex}}\bm{u}$. Since the relation $\preceq_{\texttt{lex}}$ is antisymmetric, we have $\bm{u}=\bm{v}$, thus proving that the relation $\preceq_{\texttt{UM}}$ is also antisymmetric. ∎

We can similarly prove the following theorem for operations Up and Swap:

Theorem 2.

The pair $(\Gamma,\preceq_{\texttt{US}})$ is a poset.

IV-D Shape-restricted optimization model

Let $n_{\bm{v}}$ be the number of user–item pairs having the PV sequence $\bm{v}\in\Gamma$, and let $q_{\bm{v}}$ be the number of choices arising on the base date from user–item pairs having $\bm{v}\in\Gamma$. Similarly to Eq. (8), we can calculate empirical item-choice probabilities as

$$\hat{x}_{\bm{v}}:=\frac{q_{\bm{v}}}{n_{\bm{v}}}\qquad(\bm{v}\in\Gamma).\qquad(13)$$

Our shape-restricted optimization model minimizes the weighted sum of squared errors subject to the monotonicity constraint:

$$\begin{aligned}\mathop{\mbox{minimize}}_{(x_{\bm{v}})_{\bm{v}\in\Gamma}}\quad&\sum_{\bm{v}\in\Gamma}n_{\bm{v}}(x_{\bm{v}}-\hat{x}_{\bm{v}})^{2}&&(14)\\ \mbox{subject to}\quad&x_{\bm{u}}\leq x_{\bm{v}}\quad(\bm{u},\bm{v}\in\Gamma\mbox{ with }\bm{u}\prec\bm{v}),&&(15)\\ &0\leq x_{\bm{v}}\leq 1\quad(\bm{v}\in\Gamma),&&(16)\end{aligned}$$

where $\bm{u}\prec\bm{v}$ in Eq. (15) is defined by one of the partial orders $\prec_{\texttt{UM}}$ or $\prec_{\texttt{US}}$.

The monotonicity constraint (15) enhances the estimation accuracy of item-choice probabilities. In addition, our shape-restricted optimization model can be used in a post-processing step to improve the prediction performance of other machine learning methods. Specifically, we first compute item-choice probabilities using a machine learning method and then substitute the computed values into $(\hat{x}_{\bm{v}})_{\bm{v}\in\Gamma}$ to solve the optimization model (14)–(16). Consequently, we can obtain item-choice probabilities corrected by the monotonicity constraint (15). Section VI-D illustrates the usefulness of this approach.

However, since $|\Gamma|=(m+1)^{n}$, the number of constraints in Eq. (15) is $\mathcal{O}((m+1)^{2n})$, which can be extremely large. When $(n,m)=(5,6)$, for instance, we have $(m+1)^{2n}=282{,}475{,}249$. The next section describes how we mitigate this difficulty by removing redundant constraints in Eq. (15).
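For illustration, the following Python sketch assembles model (14)–(16) as a convex quadratic program for OSQP, the solver used in Section VI. The array names and the edge-list input (e.g., the transitive reduction computed in Section V) are illustrative assumptions, not part of the model itself.

# A minimal sketch of model (14)-(16) as a QP for OSQP. `weights` holds n_v,
# `probs` holds the empirical probabilities, and `edges` lists index pairs
# (i, j) with v_i ≺ v_j over sequences indexed 0..K-1.
import numpy as np
import scipy.sparse as sp
import osqp

def solve_pv_sequence_model(weights, probs, edges):
    K = len(weights)
    # Objective (14): sum_v n_v (x_v - xhat_v)^2 = (1/2) x'Px + q'x + const.
    P = sp.diags(2.0 * weights).tocsc()
    q = -2.0 * weights * probs
    # Monotonicity constraints (15): x_i - x_j <= 0 for every edge (i, j).
    E = len(edges)
    rows = np.arange(E).repeat(2)
    cols = np.asarray(edges, dtype=int).ravel()
    vals = np.tile([1.0, -1.0], E)
    A_mono = sp.csc_matrix((vals, (rows, cols)), shape=(E, K))
    # Stack with the box constraints (16): 0 <= x_v <= 1.
    A = sp.vstack([A_mono, sp.eye(K)]).tocsc()
    l = np.concatenate([np.full(E, -np.inf), np.zeros(K)])
    u = np.concatenate([np.zeros(E), np.ones(K)])
    solver = osqp.OSQP()
    solver.setup(P, q, A, l, u, verbose=False)
    return solver.solve().x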

Figure 1: Directed graph representations of the poset $(\Gamma,\preceq_{\texttt{UM}})$ with $(n,m)=(3,2)$: (a) operation-based graph; (b) transitive reduction.

Figure 2: Directed graph representations of the poset $(\Gamma,\preceq_{\texttt{US}})$ with $(n,m)=(3,2)$: (a) operation-based graph; (b) transitive reduction.

V Algorithms for transitive reduction

This section describes our constructive algorithms for transitive reduction to decrease the problem size in our shape-restricted optimization model.

V-A Transitive reduction

A poset $(\Gamma,\preceq)$ can be represented by a directed graph $(\Gamma,E)$, where $\Gamma$ and $E\subseteq\Gamma\times\Gamma$ are the sets of nodes and directed edges, respectively. Each directed edge $(\bm{u},\bm{v})\in E$ in this graph corresponds to the order relation $\bm{u}\prec\bm{v}$, so the number of directed edges coincides with the number of constraints in Eq. (15).

Figs. 1 and 2 show directed graph representations of the posets $(\Gamma,\preceq_{\texttt{UM}})$ and $(\Gamma,\preceq_{\texttt{US}})$, respectively. Each edge in Figs. 1(a) and 2(a) corresponds to one of the operations Up, Move, or Swap. Edge $(\bm{u},\bm{v})$ is red if $\bm{v}\in\texttt{Up}(\{\bm{u}\})$ and black if $\bm{v}\in\texttt{Move}(\{\bm{u}\})$ or $\bm{v}\in\texttt{Swap}(\{\bm{u}\})$. The directed graphs in Figs. 1(a) and 2(a) can be easily created.

Suppose there are three edges

$$(\bm{u},\bm{w}),~(\bm{w},\bm{v}),~(\bm{u},\bm{v})\in E.$$

In this case, edge $(\bm{u},\bm{v})$ is implied by the other two edges due to the transitivity of the partial order,

$$\langle\bm{u}\prec\bm{w},~\bm{w}\prec\bm{v}\rangle~\Rightarrow~\bm{u}\prec\bm{v},$$

or, equivalently,

$$\langle x_{\bm{u}}\leq x_{\bm{w}},~x_{\bm{w}}\leq x_{\bm{v}}\rangle~\Rightarrow~x_{\bm{u}}\leq x_{\bm{v}}.$$

As a result, edge $(\bm{u},\bm{v})$ is redundant and can be removed from the directed graph.

A transitive reduction, also known as a Hasse diagram, of a directed graph $(\Gamma,E)$ is its subgraph $(\Gamma,E^{*})$ in which all redundant edges are removed using the transitivity of the partial order [2]. Figs. 1(b) and 2(b) show transitive reductions of the directed graphs shown in Figs. 1(a) and 2(a), respectively. By computing transitive reductions, the number of edges is reduced from 90 to 42 in Fig. 1, and from 81 to 46 in Fig. 2. The transitive reduction is known to be unique [2].

V-B General-purpose algorithms

The transitive reduction $(\Gamma,E^{*})$ is characterized by the following lemma [40]:

Lemma 2.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$. Then, $(\bm{u},\bm{v})\in E^{*}$ holds if and only if both of the following conditions are fulfilled:

  • (C1) $\bm{u}\prec\bm{v}$, and

  • (C2) if $\bm{w}\in\Gamma$ satisfies $\bm{u}\preceq\bm{w}\preceq\bm{v}$, then $\bm{w}\in\{\bm{u},\bm{v}\}$.

The basic strategy in general-purpose algorithms for transitive reduction involves the following steps:

  • Step 1: An exhaustive directed graph $(\Gamma,E)$ is generated from a given poset $(\Gamma,\preceq)$.

  • Step 2: The transitive reduction $(\Gamma,E^{*})$ is computed from the directed graph $(\Gamma,E)$ using Lemma 2.

Various algorithms for speeding up the computation in Step 2 have been proposed. Recall that $|\Gamma|=(m+1)^{n}$ in our situation. Warshall's algorithm [49] has time complexity $\mathcal{O}((m+1)^{3n})$ for completing Step 2 [40]. By using a sophisticated algorithm for fast matrix multiplication, this time complexity can be reduced to $\mathcal{O}((m+1)^{2.3729n})$ [24].
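As an aside, Step 2 can also be carried out with an off-the-shelf routine; the following tiny sketch uses networkx's transitive_reduction (our choice for illustration), which requires the full edge set $E$ as input:

# General-purpose transitive reduction of a small DAG with networkx.
# Note that our Section V-C algorithms avoid ever materializing E at all.
import networkx as nx

G = nx.DiGraph([("u", "w"), ("w", "v"), ("u", "v")])  # (u, v) is redundant
R = nx.transitive_reduction(G)
print(sorted(R.edges()))  # [('u', 'w'), ('w', 'v')]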

However, such general-purpose algorithms are clearly inefficient, especially when $n$ is very large, and Step 1 alone requires a huge number of computations. To resolve this difficulty, we devised specialized algorithms for directly constructing a transitive reduction.

V-C Constructive algorithms

Let $(\Gamma,E^{*}_{\texttt{UM}})$ be the transitive reduction of the directed graph $(\Gamma,E_{\texttt{UM}})$ representing the poset $(\Gamma,\preceq_{\texttt{UM}})$. The transitive reduction can then be characterized by the following theorem:

Theorem 3.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$. Then, $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ holds if and only if one of the following conditions is fulfilled:

  • (UM1) $\bm{v}=\texttt{Up}(\bm{u},n)$, or

  • (UM2) $\exists s\in[1,n]$ such that $\bm{v}=\texttt{Move}(\bm{u},s,s+1)$.

Proof.

See Appendix A-A. ∎

Theorem 3 gives a constructive algorithm that directly computes the transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$ without generating an exhaustive directed graph $(\Gamma,E)$. Our algorithm is based on breadth-first search [11]. Specifically, we start with a node list $L=\{(0,0,\ldots,0)\}\subseteq\Gamma$. At each iteration of the algorithm, we choose $\bm{u}\in L$, enumerate the nodes $\bm{v}\in\Gamma$ such that $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$, and add these nodes to $L$.

Table II shows this enumeration process for $\bm{u}=(0,2,1)$ with $(n,m)=(3,2)$. The operations Up and Move generate

$$\bm{v}\in\{(1,2,1),~(0,2,2),~(1,1,1),~(1,2,0)\},$$

which amounts to searching the edges $(\bm{u},\bm{v})$ in Fig. 1(a). We next check whether each $\bm{v}$ satisfies condition (UM1) or (UM2) in Theorem 3. As shown in Table II, we choose

$$\bm{v}\in\{(0,2,2),~(1,1,1)\}$$

and add these nodes to list $L$; this amounts to enumerating the edges $(\bm{u},\bm{v})$ in Fig. 1(b).

TABLE II: Process of enumerating $\bm{v}\in\Gamma$ such that $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$

$\bm{u}$     Operation                     $\bm{v}$     (UM1)   (UM2)
$(0,2,1)$    $\texttt{Up}(\bm{u},1)$       $(1,2,1)$      ×
             $\texttt{Up}(\bm{u},3)$       $(0,2,2)$      ✓
             $\texttt{Move}(\bm{u},1,2)$   $(1,1,1)$              ✓
             $\texttt{Move}(\bm{u},1,3)$   $(1,2,0)$              ×

Appendix B-A presents pseudocode for our constructive algorithm (Algorithm 1). Recalling the time complexity analysis of breadth-first search [11], one readily sees that the time complexity of Algorithm 1 is $\mathcal{O}(n(m+1)^{n})$, which is much smaller than the $\mathcal{O}((m+1)^{2.3729n})$ achieved by the general-purpose algorithm [24], especially when $n$ is very large.
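For reference, the following Python sketch mirrors Algorithm 1, again using tuples for PV sequences; the visited set plays the role of the membership test on the list $L$, so each node is expanded exactly once.

# Breadth-first construction of (Gamma, E*_UM) via Theorem 3 (a sketch of
# Algorithm 1 in Appendix B-A; tuple indices are 0-based).
from collections import deque

def build_E_UM(n, m):
    zero = (0,) * n
    visited, edges = {zero}, []
    queue = deque([zero])
    while queue:
        u = queue.popleft()
        succs = []
        if u[-1] <= m - 1:                       # (UM1): v = Up(u, n)
            succs.append(u[:-1] + (u[-1] + 1,))
        for s in range(n - 1):                   # (UM2): v = Move(u, s+1, s+2)
            if u[s] <= m - 1 and u[s + 1] >= 1:
                w = list(u)
                w[s] += 1
                w[s + 1] -= 1
                succs.append(tuple(w))
        for v in succs:
            edges.append((u, v))                 # every such (u, v) is in E*_UM
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return visited, edges

# For (n, m) = (3, 2): 27 nodes and 42 edges, matching Fig. 1(b).
nodes, edges = build_E_UM(3, 2)
assert len(nodes) == 27 and len(edges) == 42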

Next, we focus on the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$ of the directed graph $(\Gamma,E_{\texttt{US}})$ representing the poset $(\Gamma,\preceq_{\texttt{US}})$. This transitive reduction can be characterized by the following theorem:

Theorem 4.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$. Then, $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ holds if and only if one of the following conditions is fulfilled:

  • (US1) $\exists s\in[1,n]$ such that $\bm{v}=\texttt{Up}(\bm{u},s)$ and $u_{j}\not\in\{u_{s},u_{s}+1\}$ for all $j\in[s+1,n]$, or

  • (US2) $\exists(s,t)\in[1,n]\times[1,n]$ such that $\bm{v}=\texttt{Swap}(\bm{u},s,t)$ and $u_{j}\not\in[u_{s},u_{t}]$ for all $j\in[s+1,t-1]$.

Proof.

See Appendix A-B. ∎

Theorem 4 also gives a constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$. Let us again consider $\bm{u}=(0,2,1)$ as an example with $(n,m)=(3,2)$. As shown in Table III, the operations Up and Swap generate

$$\bm{v}\in\{(1,2,1),~(0,2,2),~(2,0,1),~(1,2,0)\},$$

and we choose

$$\bm{v}\in\{(0,2,2),~(2,0,1),~(1,2,0)\}$$

(see also Figs. 2(a) and 2(b)).

TABLE III: Process of enumerating $\bm{v}\in\Gamma$ such that $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$

$\bm{u}$     Operation                     $\bm{v}$     (US1)   (US2)
$(0,2,1)$    $\texttt{Up}(\bm{u},1)$       $(1,2,1)$      ×
             $\texttt{Up}(\bm{u},3)$       $(0,2,2)$      ✓
             $\texttt{Swap}(\bm{u},1,2)$   $(2,0,1)$              ✓
             $\texttt{Swap}(\bm{u},1,3)$   $(1,2,0)$              ✓

Appendix B-B presents pseudocode for our constructive algorithm (Algorithm 2). Its time complexity is estimated to be $\mathcal{O}(n^{2}(m+1)^{n})$, which is larger than that of Algorithm 1 but much smaller than that of the general-purpose algorithm [24], especially when $n$ is very large.
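Analogously, the successor enumeration at the heart of Algorithm 2 can be sketched as follows; this is an illustrative Python rendering of conditions (US1) and (US2) in Theorem 4, not the pseudocode itself.

# Enumerate v with (u, v) in E*_US via Theorem 4 (tuple indices are 0-based).
def successors_US(u, m):
    n = len(u)
    succs = []
    for s in range(n):                           # (US1): v = Up(u, s+1)
        if u[s] <= m - 1 and all(u[j] not in (u[s], u[s] + 1)
                                 for j in range(s + 1, n)):
            succs.append(u[:s] + (u[s] + 1,) + u[s + 1:])
    for s in range(n):                           # (US2): v = Swap(u, s+1, t+1)
        for t in range(s + 1, n):
            if u[s] < u[t] and all(not (u[s] <= u[j] <= u[t])
                                   for j in range(s + 1, t)):
                w = list(u)
                w[s], w[t] = w[t], w[s]
                succs.append(tuple(w))
    return succs

# The example from Table III, with u = (0, 2, 1) and m = 2:
assert set(successors_US((0, 2, 1), 2)) == {(0, 2, 2), (2, 0, 1), (1, 2, 0)}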

VI Experiments

The experimental results reported in this section evaluate the effectiveness of our method for estimating item-choice probabilities.

We used real-world clickstream data collected from Tmall (https://tianchi.aliyun.com/dataset/), a Chinese e-commerce website. We used a dataset (https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0) preprocessed by Ludewig and Jannach [27]. Each record corresponds to one PV and contains information such as user ID, item ID, and a timestamp. The dataset includes 28,316,459 unique user–item pairs composed of 422,282 users and 624,221 items.

VI-A Methods for comparison

TABLE IV: Methods for comparison

Abbr.       Method
2dimEmp     Empirical probability table (8) [19]
2dimMono    Two-dimensional monotonicity model (9)–(12) [19]
SeqEmp      Empirical probabilities (13) for PV sequences
SeqUM       Our PV sequence model (14)–(16) using $(\Gamma,\preceq_{\texttt{UM}})$
SeqUS       Our PV sequence model (14)–(16) using $(\Gamma,\preceq_{\texttt{US}})$
LR          $L_2$-regularized logistic regression
ANN         Artificial neural networks for regression
RF          Random forest of regression trees

We compared the performance of the methods listed in Table IV. All computations were performed on an Apple MacBook Pro computer with an Intel Core i7-5557U CPU (3.10 GHz) and 16 GB of memory.

The optimization models (9)–(12) and (14)–(16) were solved using OSQP (https://osqp.org/docs/index.html) [42], a numerical optimization package for solving convex quadratic optimization problems. As in Table I, daily PV sequences were calculated for each user–item pair, where $m$ is the maximum number of daily PVs and $n$ is the number of terms (past days) in the PV sequence. In this process, all PVs from more than $n$ days earlier were added to the number of PVs $n$ days earlier, and numbers of daily PVs exceeding $m$ were rounded down to $m$. Similarly, the recency–frequency combinations $(r,f)\in R\times F$ were calculated using daily PVs as in Table I, where $(|R|,|F|)=(n,m)$.

Other machine learning methods (LR, ANN, and RF) were implemented using the LogisticRegressionCV, MLPRegressor, and RandomForestRegressor functions, respectively, in scikit-learn, a Python library of machine learning tools. Related hyperparameters were tuned through 3-fold cross-validation according to the parameter settings in a benchmark study [34]. These machine learning methods employed the PV sequence $(v_1,v_2,\ldots,v_n)$ as $n$ input variables for computing item-choice probabilities. We standardized each input variable and performed undersampling to improve prediction performance.
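For illustration, the baselines can be instantiated as follows; this is a minimal sketch in which the hyperparameter grids from the benchmark study [34] are omitted, so the settings shown are illustrative defaults only.

# A minimal sketch of the three scikit-learn baselines named above.
from sklearn.linear_model import LogisticRegressionCV
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

models = {
    # LR yields choice probabilities via predict_proba on 0/1 choice labels.
    "LR": make_pipeline(StandardScaler(), LogisticRegressionCV(cv=3)),
    "ANN": make_pipeline(StandardScaler(), MLPRegressor()),
    "RF": RandomForestRegressor(),
}
# Each model maps the n input variables (v_1, ..., v_n) to an item-choice score.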

VI-B Performance evaluation methodology

There are five pairs of training and validation sets of clickstream data in the preprocessed dataset [27]. As shown in Table V, each training period is 90 days, and the next day is the validation period. The first four pairs of training and validation sets, which we call the training set, were used for model estimation, and the fifth pair was used for performance evaluation. To examine how sample size affects prediction performance, we prepared small-sample training sets by randomly choosing user–item pairs from the original training set. Here, the sampling rates are 0.1%, 1%, and 10%, and the original training set is referred to as the full sample. Note that the results were averaged over 10 trials for the sampled training sets.

TABLE V: Training and validation periods

Pair ID   Training start   Training end        Validation
1         21 May 2015      18 August 2015      19 August 2015
2         31 May 2015      28 August 2015      29 August 2015
3         10 June 2015     7 September 2015    8 September 2015
4         20 June 2015     17 September 2015   18 September 2015
5         30 June 2015     27 September 2015   28 September 2015
TABLE VI: Problem size of our PV sequence model (14)–(16)

                        #Cons in Eq. (15)
                 Enumeration               Operation            Reduction
n   m   #Vars    SeqUM        SeqUS        SeqUM     SeqUS      SeqUM    SeqUS
5   1   32       430          430          160       160        48       48
5   2   243      21,383       17,945       1,890     1,620      594      634
5   3   1,024    346,374      255,260      9,600     7,680      3,072    3,546
5   4   3,125    3,045,422    2,038,236    32,500    25,000     10,500   12,898
5   5   7,776    18,136,645   11,282,058   86,400    64,800     28,080   36,174
5   6   16,807   82,390,140   48,407,475   195,510   144,060    63,798   85,272
1   6   7        21           21           6         6          6        6
2   6   49       1,001        861          120       105        78       93
3   6   343      42,903       32,067       1,638     1,323      798      1,018
4   6   2,401    1,860,622    1,224,030    18,816    14,406     7,350    9,675
5   6   16,807   82,390,140   48,407,475   195,510   144,060    63,798   85,272

We considered the top-$N$ selection task to evaluate prediction performance. Specifically, we focused on items that were viewed by a particular user during a training period. From among these items, we selected $I_{\mathrm{sel}}$, the set of top-$N$ items for the user according to estimated item-choice probabilities. The most recently viewed items were selected when two or more items had the same choice probability. Let $I_{\mathrm{view}}$ be the set of items viewed by the user in the validation period. The F1 score is then defined as the harmonic mean of $\mathrm{Recall}:=|I_{\mathrm{sel}}\cap I_{\mathrm{view}}|/|I_{\mathrm{view}}|$ and $\mathrm{Precision}:=|I_{\mathrm{sel}}\cap I_{\mathrm{view}}|/|I_{\mathrm{sel}}|$:

$$\mathrm{F1~score}:=\frac{2\cdot\mathrm{Recall}\cdot\mathrm{Precision}}{\mathrm{Recall}+\mathrm{Precision}}.$$

In the following sections, we examine F1 scores averaged over all users. The percentage of user–item pairs leading to item choices is only 0.16%.
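For concreteness, the metric can be computed as follows; the item sets in the example are hypothetical.

# Worked example of the top-N evaluation metric defined above.
def f1_score_topN(I_sel, I_view):
    hits = len(I_sel & I_view)
    if hits == 0:
        return 0.0
    recall = hits / len(I_view)
    precision = hits / len(I_sel)
    return 2 * recall * precision / (recall + precision)

# E.g., N = 3 selected items, two of which were actually viewed:
assert abs(f1_score_topN({1, 2, 3}, {2, 3, 9, 11}) - 0.571) < 1e-3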

VI-C Effects of the transitive reduction

We generated constraints in Eq. (15) based on the following three directed graphs:

  • Case 1 (Enumeration): All edges $(\bm{u},\bm{v})$ satisfying $\bm{u}\prec\bm{v}$ were enumerated.

  • Case 2 (Operation): Edges corresponding to operations Up, Move, and Swap were generated as in Figs. 1(a) and 2(a).

  • Case 3 (Reduction): Transitive reduction was computed using our algorithms as in Figs. 1(b) and 2(b).

TABLE VII: Computation times for our PV sequence model (14)–(16)

                          Time [s]
                 Enumeration          Operation           Reduction
n   m   #Vars    SeqUM     SeqUS     SeqUM     SeqUS     SeqUM    SeqUS
5   1   32       0.00      0.01      0.00      0.00      0.00     0.00
5   2   243      2.32      1.66      0.09      0.07      0.03     0.02
5   3   1,024    558.22    64.35     3.41      0.71      0.13     0.26
5   4   3,125    OM        OM        24.07     13.86     1.72     5.80
5   5   7,776    OM        OM        180.53    67.34     9.71     36.94
5   6   16,807   OM        OM        906.76    522.84    86.02    286.30
1   6   7        0.00      0.00      0.00      0.00      0.00     0.00
2   6   49       0.03      0.01      0.01      0.00      0.00     0.00
3   6   343      12.80     1.68      0.20      0.03      0.05     0.02
4   6   2,401    OM        OM        8.07      4.09      2.12     2.87
5   6   16,807   OM        OM        906.76    522.84    86.02    286.30
TABLE VIII: Computational performance of our PV sequence model (14)–(16)

                 #Cons in Eq. (15)     Time [s]           F1 score [%], N=3
n   m   #Vars    SeqUM     SeqUS      SeqUM    SeqUS     SeqEmp   SeqUM   SeqUS
3   30  29,791   84,630    118,850    86.72    241.46    12.25    12.40   12.40
4   12  28,561   99,372    142,800    198.82   539.76    12.68    12.93   12.95
5   6   16,807   63,798    85,272     86.02    286.30    12.90    13.18   13.18
6   4   15,625   62,500    76,506     62.92    209.67    13.14    13.49   13.48
7   3   16,384   67,584    76,818     96.08    254.31    13.23    13.52   13.53
8   2   6,561    24,786    25,879     19.35    17.22     13.11    13.37   13.35
9   2   19,683   83,106    86,386     244.15   256.42    13.07    13.40   13.37

Table VI shows the problem size of our PV sequence model (14)–(16) for several $(n,m)$ settings of the PV sequence. Here, the "#Vars" column shows the number of decision variables (i.e., $(m+1)^n$), and the subsequent columns show the number of constraints in Eq. (15) for the three cases mentioned above.

The number of constraints grew rapidly as $n$ and $m$ increased in the enumeration case. In contrast, the reduction case always had the smallest number of constraints among the three cases. When $(n,m)=(5,6)$, for instance, transitive reduction reduced the number of constraints in the operation case to $63{,}798/195{,}510\approx 32.6\%$ for SeqUM and $85{,}272/144{,}060\approx 59.2\%$ for SeqUS.

The number of constraints was larger for SeqUM than for SeqUS in the enumeration and operation cases. In contrast, the number of constraints was often smaller for SeqUM than for SeqUS in the reduction case. Thus, the transitive reduction had a greater impact on SeqUM than on SeqUS in terms of the number of constraints.

Table VII lists the computation times required to solve the optimization problem (14)–(16) for several $(n,m)$ settings of the PV sequence. Here, "OM" indicates that computation was aborted due to a lack of memory. The enumeration case often caused out-of-memory errors because of the huge number of constraints (see Table VI), but the operation and reduction cases completed the computations for all $(n,m)$ settings of the PV sequence. Moreover, the transitive reduction made computations faster. A notable example is SeqUM with $(n,m)=(5,6)$, for which the computation time in the reduction case (86.02 s) was only about one-tenth of that in the operation case (906.76 s). These results demonstrate that transitive reduction improves efficiency in terms of both computation time and memory usage.

Table VIII shows the computational performance of our optimization model (14)–(16) for several $(n,m)$ settings of the PV sequence. Here, for each $n\in\{3,4,\ldots,9\}$, the largest $m$ was chosen such that the computation finished within 30 min. Both SeqUM and SeqUS always delivered higher F1 scores than SeqEmp did. This indicates that our monotonicity constraint (15) works well for improving prediction performance. The F1 scores provided by SeqUM and SeqUS were very similar and largest with $(n,m)=(7,3)$. In light of these results, we use the settings $(n,m)\in\{(7,3),(5,6)\}$ in the following sections.

VI-D Prediction performance of our PV sequence model

Figure 3: Comparison of prediction performance versus the two-dimensional probability table. Panels: (a) $N=3$, $(n,m)=(7,3)$; (b) $N=5$, $(n,m)=(7,3)$; (c) $N=10$, $(n,m)=(7,3)$; (d) $N=3$, $(n,m)=(5,6)$; (e) $N=5$, $(n,m)=(5,6)$; (f) $N=10$, $(n,m)=(5,6)$.
Figure 4: Comparison of prediction performance versus machine learning methods. Panels: (a) $N=3$, $(n,m)=(7,3)$; (b) $N=5$, $(n,m)=(7,3)$; (c) $N=10$, $(n,m)=(7,3)$; (d) $N=3$, $(n,m)=(5,6)$; (e) $N=5$, $(n,m)=(5,6)$; (f) $N=10$, $(n,m)=(5,6)$.
Figure 5: Item-choice probabilities estimated from the full-sample training set with $(n,m)=(5,6)$. Panels: (a)–(c) SeqEmp with $v_3=0,1,2$; (d)–(f) SeqUM with $v_3=0,1,2$; (g)–(i) SeqUS with $v_3=0,1,2$.

Fig. 3 shows F1 scores of the two-dimensional probability table and our PV sequence model using the sampled training sets, where the number of selected items is $N\in\{3,5,10\}$ and the PV sequence setting is $(n,m)\in\{(7,3),(5,6)\}$.

When the full-sample training set was used, SeqUM and SeqUS always delivered better prediction performance than the other methods did. When the 1%- and 10%-sampled training sets were used, the prediction performance of SeqUS decreased slightly, whereas SeqUM still performed best among all the methods. When the 0.1%-sampled training set was used, 2dimMono always performed better than SeqUS did, and 2dimMono also had the best prediction performance in the case $(n,m)=(5,6)$. These results suggest that our PV sequence model performs very well, especially when the sample size is sufficiently large.

The prediction performance of SeqEmp deteriorated rapidly as the sampling rate decreased, and this performance was always much worse than that of 2dimEmp. Meanwhile, SeqUM and SeqUS maintained high prediction performance even when the 0.1%-sampled training set was used. This suggests that the monotonicity constraint (15) in our PV sequence model is more effective than the monotonicity constraints (10)–(11) in the two-dimensional monotonicity model.

Fig. 4 shows F1 scores for the machine learning methods (LR, ANN, and RF) and our PV sequence model (SeqUM) using the full-sample training set, where the number of selected items is $N\in\{3,5,10\}$ and the PV sequence setting is $(n,m)\in\{(7,3),(5,6)\}$. Note that in this figure, SeqUM(*) represents the optimization model (14)–(16) in which the item-choice probabilities computed by each machine learning method were substituted into $(\hat{x}_{\bm{v}})_{\bm{v}\in\Gamma}$ (see Section IV-D).

Prediction performance was better for SeqUM than for all the machine learning methods, except in the case of Fig. 4(f), where LR showed better prediction performance. Moreover, SeqUM(*) improved the prediction performance of the machine learning methods, particularly for ANN and RF. This suggests that our monotonicity constraint (15) is also very helpful in correcting prediction values from other machine learning methods.

VI-E Analysis of estimated item-choice probabilities

Fig. 5 shows item-choice probabilities estimated by our PV sequence model using the full-sample training set, where the PV sequence setting is $(n,m)=(5,6)$. Here, we focus on PV sequences of the form $\bm{v}=(v_1,v_2,v_3,0,0)\in\Gamma$ and depict estimated item-choice probabilities on $(v_1,v_2)\in[0,m]\times[0,m]$ for each $v_3\in\{0,1,2\}$. Note also that the number of associated user–item pairs decreased as the value of $v_3$ increased.

Because SeqEmp does not consider the monotonicity constraint (15), item-choice probabilities estimated by SeqEmp have irregular shapes for $v_3\in\{1,2\}$. In contrast, item-choice probabilities estimated with the monotonicity constraint (15) are relatively smooth. Because of the Up operation, item-choice probabilities estimated by SeqUM and SeqUS increase as $(v_1,v_2)$ moves from $(0,0)$ to $(6,6)$. Because of the Move operation, item-choice probabilities estimated by SeqUM also increase as $(v_1,v_2)$ moves from $(0,6)$ to $(6,0)$. Item-choice probabilities estimated by SeqUS are relatively high around $(v_1,v_2)=(3,3)$. This highlights the difference in the monotonicity constraint (15) between the posets $(\Gamma,\preceq_{\texttt{UM}})$ and $(\Gamma,\preceq_{\texttt{US}})$.

Fig. 6 shows item-choice probabilities estimated by our PV sequence model using the 10%-sampled training set, where the PV sequence setting is $(n,m)=(5,6)$. Since the sample size was reduced in this case, item-choice probabilities estimated by SeqEmp are highly unstable. In particular, item-choice probabilities were estimated to be zero for all $(v_1,v_2)$ with $v_1\geq 3$ in Fig. 6(c), but this is unreasonable from the perspective of frequency. In contrast, SeqUM and SeqUS estimated item-choice probabilities that increase monotonically with respect to $(v_1,v_2)$.

Figure 6: Item-choice probabilities estimated from the 10%-sampled training set with $(n,m)=(5,6)$. Panels: (a)–(c) SeqEmp with $v_3=0,1,2$; (d)–(f) SeqUM with $v_3=0,1,2$; (g)–(i) SeqUS with $v_3=0,1,2$.

VII Conclusion

We presented a shape-restricted optimization model for estimating item-choice probabilities on an e-commerce website. Our monotonicity constraints, based on tailored order relations, enabled accurate estimation of item-choice probabilities for all possible PV sequences. To improve the computational efficiency of our optimization model, we devised constructive algorithms for transitive reduction that remove all redundant constraints from the optimization model.

We assessed the effectiveness of our method through experiments using real-world clickstream data. Experimental results demonstrated that transitive reduction enhanced the efficiency of our optimization model in terms of both computation time and memory usage. In addition, our method delivered better prediction performance than did the two-dimensional monotonicity model [19] and common machine learning methods. Our method was also helpful in correcting prediction values computed by other machine learning methods.

This study made three main contributions. First, we derived two types of posets by exploiting the properties of recency and frequency of a user’s previous PVs. These posets allow us to place appropriate monotonicity constraints on item-choice probabilities. Next, we developed algorithms for transitive reduction of our posets. These algorithms are more efficient than general-purpose algorithms in terms of time complexity for transitive reduction. Finally, our method further expanded the potential of shape-restricted regression for predicting user behavior on e-commerce websites.

Once the optimization model for estimating item-choice probabilities has been solved, the obtained results can easily be put into practical use on e-commerce websites. Accurate estimates of item-choice probabilities will be useful for customizing sales promotions according to the needs of a particular user. In addition, our method can estimate user preferences from clickstream data, therefore aiding creation of high-quality user–item rating matrices for recommendation algorithms [20].

In future studies, we will develop new posets that further improve the prediction performance of our PV sequence model. Another direction of future research will be to incorporate user–item heterogeneity into our optimization model, as in the case of latent class modeling with a two-dimensional probability table [32].

Appendix A Proofs

A-A Proof of Theorem 3

The “only if” part

Suppose $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$. We then have $\bm{v}\in\texttt{UM}(\{\bm{u}\})$ from Definition 4 and Lemma 2. We therefore consider the following two cases:

Case 1: $\bm{v}=\texttt{Up}(\bm{u},s)$ for some $s\in[1,n]$.
For the sake of contradiction, assume that $s\neq n$ (i.e., $s\leq n-1$). Then there exists an index $j$ such that $s<j\leq n$. If $u_j>0$, then $\bm{w}=\texttt{Move}(\bm{u},s,j)$ and $\bm{v}=\texttt{Up}(\bm{w},j)$. If $u_j=0$, then $\bm{w}=\texttt{Up}(\bm{u},j)$ and $\bm{v}=\texttt{Move}(\bm{w},s,j)$. This implies that $\bm{u}\prec_{\texttt{UM}}\bm{w}\prec_{\texttt{UM}}\bm{v}$, which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ due to condition (C2) of Lemma 2.

Case 2: $\bm{v}=\texttt{Move}(\bm{u},s,t)$ for some $(s,t)\in[1,n]\times[1,n]$.
For the sake of contradiction, assume that $t\neq s+1$ (i.e., $t\geq s+2$). Then there exists an index $j$ such that $s<j<t$. If $u_j>0$, then $\bm{w}=\texttt{Move}(\bm{u},s,j)$ and $\bm{v}=\texttt{Move}(\bm{w},j,t)$. If $u_j=0$, then $\bm{w}=\texttt{Move}(\bm{u},j,t)$ and $\bm{v}=\texttt{Move}(\bm{w},s,j)$. This implies that $\bm{u}\prec_{\texttt{UM}}\bm{w}\prec_{\texttt{UM}}\bm{v}$, which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ due to condition (C2) of Lemma 2.

The “if” part

Next, we show that $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ in the following two cases:

Case 1: Condition (UM1) is fulfilled.
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{UM}}\bm{w}\preceq_{\texttt{UM}}\bm{v}$. From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$. Since $\bm{u}$ is adjacent to $\bm{v}$ in the lexicographic order, we have $\bm{w}\in\{\bm{u},\bm{v}\}$.

Case 2: Condition (UM2) is fulfilled.
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{UM}}\bm{w}\preceq_{\texttt{UM}}\bm{v}$. From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$, which implies that $w_j=u_j$ for all $j\in[1,s-1]$. Therefore, we cannot apply any operations to $w_j$ for $j\in[1,s-1]$ in the process of transforming $\bm{u}$ into $\bm{v}$ through $\bm{w}$. To keep the value of $\sum_{j=1}^{n}w_j$ constant, we can apply only the Move operation. However, once the Move operation is applied to $w_j$ for $j\in[s+2,n]$, the resultant sequence cannot be converted into $\bm{v}$. As a result, only $\texttt{Move}(\,\cdot\,,s,s+1)$ can be performed, and therefore $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Move}(\bm{u},s,s+1)=\bm{v}$.

A-B Proof of Theorem 4

The “only if” part

Suppose that $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$. We then have $\bm{v}\in\texttt{US}(\{\bm{u}\})$ from Definition 5 and Lemma 2. Thus, we consider the following two cases:

Case 1: $\bm{v}=\texttt{Up}(\bm{u},s)$ for some $s\in[1,n]$.
For the sake of contradiction, assume $u_j\in\{u_s,u_s+1\}$ for some $j\in[s+1,n]$. If $u_j=u_s$, then $\bm{w}=\texttt{Up}(\bm{u},j)$ and $\bm{v}=\texttt{Swap}(\bm{w},s,j)$. If $u_j=u_s+1$, then $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ and $\bm{v}=\texttt{Up}(\bm{w},j)$. This implies that $\bm{u}\prec_{\texttt{US}}\bm{w}\prec_{\texttt{US}}\bm{v}$, which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ due to condition (C2) of Lemma 2.

Case 2: $\bm{v}=\texttt{Swap}(\bm{u},s,t)$ for some $(s,t)\in[1,n]\times[1,n]$.
For the sake of contradiction, assume $u_j\in[u_s,u_t]$ for some $j\in[s+1,t-1]$. If $u_s<u_j<u_t$, then $\bm{w}_1=\texttt{Swap}(\bm{u},j,t)$, $\bm{w}_2=\texttt{Swap}(\bm{w}_1,s,j)$, and $\bm{v}=\texttt{Swap}(\bm{w}_2,j,t)$. If $u_j=u_s$, then $\bm{w}=\texttt{Swap}(\bm{u},j,t)$ and $\bm{v}=\texttt{Swap}(\bm{w},s,j)$. If $u_j=u_t$, then $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ and $\bm{v}=\texttt{Swap}(\bm{w},j,t)$. Each of these results contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ due to condition (C2) of Lemma 2.

The “if” part

Next, we show that $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ in the following two cases:

Case 1: Condition (US1) is fulfilled.
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{US}}\bm{w}\preceq_{\texttt{US}}\bm{v}$. From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$, implying that $w_j=u_j$ for all $j\in[1,s-1]$. Therefore, we cannot apply any operations to $w_j$ for $j\in[1,s-1]$ in the process of transforming $\bm{u}$ into $\bm{v}$ through $\bm{w}$. We must apply the Up operation exactly once, because the value of $\sum_{j=1}^{n}w_j$ remains the same after the Swap operation. Condition (US1) guarantees that for all $j\in[s+1,n]$, $w_j$ does not coincide with $u_s+1$ even if $\texttt{Up}(\,\cdot\,,j)$ is applied. Therefore, $\texttt{Swap}(\,\cdot\,,s,j)$ for $j\in[s+1,n]$ never leads to $w_s=u_s+1$. As a result, $\texttt{Up}(\,\cdot\,,s)$ must be performed. Other applicable Swap operations produce a sequence that cannot be converted into $\bm{v}$. This means that $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Up}(\bm{u},s)=\bm{v}$.

Case 2: Condition (US2) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{US}}\bm{w}\preceq_{\texttt{US}}\bm{v}$. From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$, which implies that $w_{j}=u_{j}$ for all $j\in[1,s-1]$ and that $w_{s}\in[u_{s},u_{t}]$. Therefore, no operation can be applied to $w_{j}$ for $j\in[1,s-1]$ in the process of transforming $\bm{u}$ into $\bm{v}$ via $\bm{w}$. To keep the value of $\sum_{j=1}^{n}w_{j}$ constant, we can apply only the Swap operation. However, once the Swap operation is applied to $w_{j}$ for $j\in[t+1,n]$, the resultant sequence cannot be converted into $\bm{v}$. We cannot adopt $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ for $j\in[s+1,t-1]$ due to condition (US2). If we adopt $\bm{w}=\texttt{Swap}(\bm{u},j,t)$ for $j\in[s+1,t-1]$, then $w_{t}\leq u_{s}-1$ due to condition (US2), so applying $\texttt{Swap}(\,\cdot\,,t,j)$ for some $j\in[t+1,n]$ becomes unavoidable. As a result, $\texttt{Swap}(\,\cdot\,,s,t)$ must be performed. Any other applicable Swap operation produces a sequence that cannot be converted into $\bm{v}$. This means that $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Swap}(\bm{u},s,t)=\bm{v}$.

Appendix B Pseudocode

B-A Constructive algorithm for $(\Gamma,E^{*}_{\texttt{UM}})$

Algorithm 1 Constructive algorithm for $(\Gamma,E^{*}_{\texttt{UM}})$

Input: a pair $(n,m)$ of positive integers
      Output: the transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$

1: procedure
2:     $L\leftarrow$ list consisting of $(0,0,\ldots,0)$ $\triangleright$ returns $\Gamma$
3:     $E\leftarrow$ empty list $\triangleright$ returns $E^{*}_{\texttt{UM}}$
4:     $Q\leftarrow$ queue consisting of $(0,0,\ldots,0)$
5:     while $Q$ is not empty do
6:         $\bm{u}\leftarrow\textsc{dequeue}(Q)$
7:         if $(\bm{u},n)\in\mathcal{D}_{\texttt{U}}$ then $\triangleright$ for (UM1)
8:             $\bm{v}\leftarrow\texttt{Up}(\bm{u},n)$
9:             $\textsc{append}(L,\bm{v})$, $\textsc{append}(E,(\bm{u},\bm{v}))$
10:            $\textsc{enqueue}(Q,\bm{v})$
11:        for $s\in[1,n-1]$ do $\triangleright$ for (UM2)
12:            if $(\bm{u},s,s+1)\in\mathcal{D}_{\texttt{M}}$ then
13:                $\bm{v}\leftarrow\texttt{Move}(\bm{u},s,s+1)$
14:                $\textsc{append}(L,\bm{v})$, $\textsc{append}(E,(\bm{u},\bm{v}))$
15:                $\textsc{enqueue}(Q,\bm{v})$

The nodes and directed edges of the graph $(\Gamma,E^{*}_{\texttt{UM}})$ are enumerated in a breadth-first search and are stored in two lists $L$ and $E$, respectively. We use $\textsc{append}(L,\bm{v})$, which appends a node $\bm{v}$ to the end of $L$. We similarly use $\textsc{append}(E,(\bm{u},\bm{v}))$.

A queue $Q$ is used to store nodes of $L$ whose successors are under investigation (the “frontier” of $L$). The nodes in $Q$ are listed in ascending order of depth, where the depth of $\bm{v}$ is the shortest-path length from $(0,0,\ldots,0)$ to $\bm{v}$. We use $\textsc{dequeue}(Q)$, which returns and deletes the first element of $Q$, and $\textsc{enqueue}(Q,\bm{v})$, which appends $\bm{v}$ to the end of $Q$.

Algorithm 1 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$. For a given node $\bm{u}$ in line 6, we find all nodes $\bm{v}$ satisfying condition (UM1) in lines 7–10 and those satisfying condition (UM2) in lines 11–15.

By definition, the membership tests for $\mathcal{D}_{\texttt{U}}$ and $\mathcal{D}_{\texttt{M}}$ can be performed in $\mathcal{O}(1)$ time. Recall that dequeue, enqueue, and append can also be performed in $\mathcal{O}(1)$ time. The for loop in lines 11–15 therefore executes in $\mathcal{O}(n)$ time. Recalling that $|\Gamma|=(m+1)^{n}$, we see that Algorithm 1 runs in $\mathcal{O}(n(m+1)^{n})$ time.
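To make the bookkeeping concrete, the following is a minimal Python sketch of Algorithm 1. Only the breadth-first enumeration itself is spelled out: the operations Up and Move and the membership tests for $\mathcal{D}_{\texttt{U}}$ and $\mathcal{D}_{\texttt{M}}$ are defined in the main text (outside this appendix), so the sketch takes them as caller-supplied functions rather than restating them.

from collections import deque

def build_um_reduction(n, m, up, move, in_D_U, in_D_M):
    # Breadth-first construction mirroring Algorithm 1. The callables
    # `up`, `move`, `in_D_U`, and `in_D_M` implement Up, Move, and the
    # membership tests for D_U and D_M from the main text; sequences are
    # tuples and positions are 1-indexed, as in the pseudocode.
    root = (0,) * n
    L = [root]                # node list; returns Gamma
    E = []                    # edge list; returns E*_UM
    Q = deque([root])         # frontier, in ascending order of depth
    while Q:
        u = Q.popleft()                   # dequeue(Q)
        if in_D_U(u, n):                  # condition (UM1)
            v = up(u, n)
            L.append(v); E.append((u, v)); Q.append(v)
        for s in range(1, n):             # condition (UM2): s = 1, ..., n-1
            if in_D_M(u, s, s + 1):
                v = move(u, s, s + 1)
                L.append(v); E.append((u, v)); Q.append(v)
    return L, E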

B-B Constructive algorithm for $(\Gamma,E^{*}_{\texttt{US}})$

Algorithm 2 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$. The difference from Algorithm 1 lies in the method for finding nodes $\bm{v}$ satisfying condition (US1) or (US2). For a given node $\bm{u}$ in line 6, we find all nodes $\bm{v}$ satisfying condition (US1) in lines 7–16 and those satisfying condition (US2) in lines 17–26. The following describes the latter part.

Algorithm 2 Constructive algorithm for $(\Gamma,E^{*}_{\texttt{US}})$

Input: a pair $(n,m)$ of positive integers
      Output: the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$

1: procedure
2:     $L\leftarrow$ list consisting of $(0,0,\ldots,0)$ $\triangleright$ returns $\Gamma$
3:     $E\leftarrow$ empty list $\triangleright$ returns $E^{*}_{\texttt{US}}$
4:     $Q\leftarrow$ queue consisting of $(0,0,\ldots,0)$
5:     while $Q$ is not empty do
6:         $\bm{u}\leftarrow\textsc{dequeue}(Q)$
7:         for $s\in[1,n]$ do $\triangleright$ for (US1)
8:             if $(\bm{u},s)\in\mathcal{D}_{\texttt{U}}$ then
9:                 $\textit{flag}\leftarrow\textit{True}$
10:                for $j\in[s+1,n]$ do
11:                    if $u_{j}\in\{u_{s},u_{s}+1\}$ then
12:                        $\textit{flag}\leftarrow\textit{False}$, break
13:                if $\textit{flag}=\textit{True}$ then
14:                    $\bm{v}\leftarrow\texttt{Up}(\bm{u},s)$
15:                    $\textsc{append}(L,\bm{v})$, $\textsc{append}(E,(\bm{u},\bm{v}))$
16:                    $\textsc{enqueue}(Q,\bm{v})$
17:        for $s\in[1,n-1]$ do $\triangleright$ for (US2)
18:            $b\leftarrow m+1$
19:            for $t\in[s+1,n]$ do
20:                if $(\bm{u},s,t)\in\mathcal{D}_{\texttt{S}}$ and $u_{t}<b$ then
21:                    $\bm{v}\leftarrow\texttt{Swap}(\bm{u},s,t)$
22:                    $\textsc{append}(L,\bm{v})$, $\textsc{append}(E,(\bm{u},\bm{v}))$
23:                    $\textsc{enqueue}(Q,\bm{v})$
24:                    $b\leftarrow u_{t}$
25:                else if $u_{t}=u_{s}$ then
26:                    break

Let $(\bm{u},\bm{v})$ be a directed edge added to $E$ in line 22, and let $(\bar{s},\bar{t})$ be such that $\bm{v}=\texttt{Swap}(\bm{u},\bar{s},\bar{t})$. From line 20, we have $u_{\bar{s}}<u_{\bar{t}}<b$. Note that for each $t$ in line 19, the value $b$ equals the smallest value of $u_{j}$ with $u_{j}>u_{\bar{s}}$ over $j\in[\bar{s}+1,t-1]$. Moreover, owing to lines 25–26, $u_{j}\not=u_{\bar{s}}$ for $j\in[\bar{s}+1,\bar{t}-1]$. Combining these observations, we see that for $j\in[\bar{s}+1,\bar{t}-1]$,

$u_{j}<u_{\bar{s}}\quad\text{or}\quad u_{j}\geq b>u_{\bar{t}}\qquad(\text{meaning }u_{j}\notin[u_{\bar{s}},u_{\bar{t}}]).$

Therefore, the pair $(\bm{u},\bm{v})$ satisfies condition (US2). It is easy to verify that this process finds all nodes $\bm{v}$ satisfying condition (US2).

Since both of the double for loops in lines 7–16 and 17–26 execute in $\mathcal{O}(n^{2})$ time, Algorithm 2 runs in $\mathcal{O}(n^{2}(m+1)^{n})$ time.
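As a complement to the argument above, the following is a minimal Python sketch of the (US2) scan in lines 17–26 for a fixed $s$. Swap simply exchanges two entries; the sketch additionally assumes that $(\bm{u},s,t)\in\mathcal{D}_{\texttt{S}}$ means $u_{s}<u_{t}$, which is consistent with the inequality $u_{\bar{s}}<u_{\bar{t}}$ derived above but is our reading rather than a restatement of the definition.

def swap(u, s, t):
    # Return Swap(u, s, t): exchange the s-th and t-th entries (1-indexed).
    v = list(u)
    v[s - 1], v[t - 1] = v[t - 1], v[s - 1]
    return tuple(v)

def us2_successors(u, s, m):
    # Collect every v = Swap(u, s, t) accepted by lines 17-26 of
    # Algorithm 2 for this fixed s (edge/queue bookkeeping omitted).
    succ = []
    b = m + 1                              # line 18: running bound
    for t in range(s + 1, len(u) + 1):     # line 19
        if u[s - 1] < u[t - 1] < b:        # line 20: assumed D_S test and u_t < b
            succ.append(swap(u, s, t))     # lines 21-23
            b = u[t - 1]                   # line 24
        elif u[t - 1] == u[s - 1]:         # lines 25-26
            break
    return succ

For example, with $m=2$, $\bm{u}=(0,2,1,0)$, and $s=1$, the scan accepts $t=2$ and $t=3$, yielding $(2,0,1,0)$ and $(1,2,0,0)$, and stops at $t=4$ because $u_{4}=u_{1}$.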

References

[1] C. C. Aggarwal, Recommender Systems, Cham: Springer International Publishing, 2016.
[2] A. V. Aho, M. R. Garey, and J. D. Ullman, “The transitive reduction of a directed graph,” SIAM J. Comput., vol. 1, no. 2, pp. 131–137, 1972.
[3] Y. Aït-Sahalia and J. Duarte, “Nonparametric option pricing under shape restrictions,” J. Econom., vol. 116, no. 1–2, pp. 9–47, 2003.
[4] E. E. Altendorf, A. C. Restificar, and T. G. Dietterich, “Learning from sparse data by exploiting monotonicity constraints,” in Proc. UAI, 2005, pp. 18–26.
[5] A. Baumann, J. Haupt, F. Gebert, and S. Lessmann, “Changing perspectives: Using graph metrics to predict purchase probabilities,” Expert Syst. Appl., vol. 94, pp. 137–148, 2018.
[6] J. Borges and M. Levene, “Evaluating variable-length Markov chain models for analysis of user web navigation sessions,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 4, pp. 441–452, 2007.
[7] R. E. Bucklin and C. Sismeiro, “Click here for Internet insight: Advances in clickstream data analysis in marketing,” J. Interact. Mark., vol. 23, no. 1, pp. 35–48, 2009.
[8] S. Chatterjee, A. Guntuboyina, and B. Sen, “On risk bounds in isotonic and other shape restricted regression problems,” Ann. Stat., vol. 43, no. 4, pp. 1774–1800, 2015.
[9] Y. L. Chen, M. H. Kuo, S. Y. Wu, and K. Tang, “Discovering recency, frequency, and monetary (RFM) sequential patterns from customers’ purchasing data,” Electron. Commer. Res. Appl., vol. 8, no. 5, pp. 241–251, 2009.
[10] D. Cirqueira, M. Hofer, D. Nedbal, M. Helfert, and M. Bezbradica, “Customer purchase behavior prediction in e-commerce: Current tasks, applications and methodologies,” in Inter. Workshop NFMCP, 2019.
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT Press, 2009.
[12] Y. Dong and W. Jiang, “Brand purchase prediction based on time-evolving user behaviors in e-commerce,” Concurr. Comput., vol. 31, no. 1, Art. no. e4882, 2019.
[13] P. S. Fader, B. G. Hardie, and K. L. Lee, “RFM and CLV: Using iso-value curves for customer base analysis,” J. Mark. Res., vol. 42, no. 4, pp. 415–430, 2005.
[14] B. R. Gaines, J. Kim, and H. Zhou, “Algorithms for fitting the constrained lasso,” J. Comput. Graph. Stat., vol. 27, no. 4, pp. 861–871, 2018.
[15] P. Groeneboom and G. Jongbloed, Nonparametric Estimation Under Shape Constraints (vol. 38), Cambridge University Press, 2014.
[16] A. Guntuboyina and B. Sen, “Nonparametric shape-restricted regression,” Stat. Sci., vol. 33, no. 4, pp. 568–594, 2018.
[17] Q. Han, T. Wang, S. Chatterjee, and R. J. Samworth, “Isotonic regression in general dimensions,” Ann. Stat., vol. 47, no. 5, pp. 2440–2471, 2019.
[18] T. Huang and J. A. Van Mieghem, “Clickstream data and inventory management: Model and empirical analysis,” Prod. Oper. Manag., vol. 23, no. 3, pp. 333–347, 2014.
[19] J. Iwanaga, N. Nishimura, N. Sukegawa, and Y. Takano, “Estimating product-choice probabilities from recency and frequency of page views,” Knowl. Based Syst., vol. 99, pp. 157–167, 2016.
[20] J. Iwanaga, N. Nishimura, N. Sukegawa, and Y. Takano, “Improving collaborative filtering recommendations by estimating user preferences from clickstream data,” Electron. Commer. Res. Appl., vol. 37, Art. no. 100877, 2019.
[21] D. Jannach, M. Ludewig, and L. Lerche, “Session-based item recommendation in e-commerce: On short-term intents, reminders, trends and discounts,” User Model. User-adapt. Interact., vol. 27, no. 3–5, pp. 351–392, 2017.
[22] P. K. Kannan, “Digital marketing: A framework, review and research agenda,” Int. J. Res. Mark., vol. 34, no. 1, pp. 22–45, 2017.
[23] D. Koehn, S. Lessmann, and M. Schaal, “Predicting online shopping behaviour from clickstream data using deep learning,” Expert Syst. Appl., vol. 150, Art. no. 113342, 2020.
[24] F. Le Gall, “Powers of tensors and fast matrix multiplication,” in Proc. ISSAC, 2014, pp. 296–303.
[25] Q. Li, M. Gu, K. Zhou, and X. Sun, “Multi-classes feature engineering with sliding window for purchase prediction in mobile commerce,” in IEEE ICDM Workshop, 2015, pp. 1048–1054.
[26] D. Li, G. Zhao, Z. Wang, W. Ma, and Y. Liu, “A method of purchase prediction based on user behavior log,” in IEEE ICDM Workshop, 2015, pp. 1031–1039.
[27] M. Ludewig and D. Jannach, “Evaluation of session-based recommendation algorithms,” User Model. User-adapt. Interact., vol. 28, no. 4–5, pp. 331–390, 2018.
[28] R. L. Matzkin, “Semiparametric estimation of monotone and concave utility functions for polychotomous choice models,” Econometrica, vol. 59, no. 5, pp. 1315–1327, 1991.
[29] W. W. Moe, “An empirical two-stage choice model with varying decision rules applied to internet clickstream data,” J. Mark. Res., vol. 43, no. 4, pp. 680–692, 2006.
[30] W. W. Moe and P. S. Fader, “Dynamic conversion behavior at e-commerce sites,” Manage. Sci., vol. 50, no. 3, pp. 326–335, 2004.
[31] A. L. Montgomery, S. Li, K. Srinivasan, and J. C. Liechty, “Modeling online browsing and path analysis using clickstream data,” Mark. Sci., vol. 23, no. 4, pp. 579–595, 2004.
[32] N. Nishimura, N. Sukegawa, Y. Takano, and J. Iwanaga, “A latent-class model for estimating product-choice probabilities from clickstream data,” Inf. Sci., vol. 429, pp. 406–420, 2018.
[33] E. W. Ngai, L. Xiu, and D. C. Chau, “Application of data mining techniques in customer relationship management: A literature review and classification,” Expert Syst. Appl., vol. 36, no. 2, pp. 2592–2602, 2009.
[34] P. Orzechowski, W. La Cava, and J. H. Moore, “Where are we now? A large benchmark study of recent symbolic regression methods,” in Proc. GECCO, 2018, pp. 1183–1190.
[35] P. M. Pardalos and G. Xue, “Algorithms for a class of isotonic regression problems,” Algorithmica, vol. 23, no. 3, pp. 211–222, 1999.
[36] C. H. Park and Y. H. Park, “Investigating purchase conversion by uncovering online visit patterns,” Mark. Sci., vol. 35, no. 6, pp. 894–914, 2016.
[37] A. Pitman and M. Zanker, “Insights from applying sequential pattern mining to e-commerce click stream data,” in IEEE ICDM Workshop, 2010, pp. 967–975.
[38] J. Qiu, Z. Lin, and Y. Li, “Predicting customer purchase behavior in the e-commerce context,” Electron. Commer. Res., vol. 15, no. 4, pp. 427–452, 2015.
[39] P. Romov and E. Sokolov, “RecSys Challenge 2015: Ensemble learning with categorical features,” in Proc. Inter. ACM RecSys Challenge, 2015, pp. 1–4.
[40] B. Schröder, Ordered Sets: An Introduction with Connections from Combinatorics to Topology, Birkhäuser, 2016.
[41] C. Sismeiro and R. E. Bucklin, “Modeling purchase behavior at an e-commerce web site: A task-completion approach,” J. Mark. Res., vol. 41, no. 3, pp. 306–323, 2004.
[42] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Math. Program. Comput., in press.
[43] Q. F. Stout, “Isotonic regression for multiple independent variables,” Algorithmica, vol. 71, no. 2, pp. 450–470, 2015.
[44] R. J. Tibshirani, H. Hoefling, and R. Tibshirani, “Nearly-isotonic regression,” Technometrics, vol. 53, no. 1, pp. 54–61, 2011.
[45] E. Turban, J. Outland, D. King, J. K. Lee, T. P. Liang, and D. C. Turban, Electronic Commerce 2018: A Managerial and Social Networks Perspective, Springer, 2017.
[46] D. Van den Poel and W. Buckinx, “Predicting online-purchasing behaviour,” Eur. J. Oper. Res., vol. 166, no. 2, pp. 557–575, 2005.
[47] A. Vieira, “Predicting online user behaviour using deep learning algorithms,” arXiv preprint arXiv:1511.06247, 2015.
[48] J. Wang and S. K. Ghosh, “Shape restricted nonparametric regression with Bernstein polynomials,” Comput. Stat. Data Anal., vol. 56, no. 9, pp. 2729–2741, 2012.
[49] S. Warshall, “A theorem on Boolean matrices,” J. ACM, vol. 9, no. 1, pp. 11–12, 1962.
[50] Z. Wu, B. H. Tan, R. Duan, Y. Liu, and R. S. Mong Goh, “Neural modeling of buying behaviour for e-commerce from clicking patterns,” in Proc. Inter. ACM RecSys Challenge, 2015, pp. 1–4.
[51] J. Yeo, S. Kim, E. Koh, S. W. Hwang, and N. Lipka, “Predicting online purchase conversion for retargeting,” in Proc. ACM Inter. Conf. WSDM, 2017, pp. 591–600.
[52] Z. Yi, D. Wang, K. Hu, and Q. Li, “Purchase behavior prediction in m-commerce with an optimized sampling methods,” in IEEE ICDM Workshop, 2015, pp. 1085–1092.
[53] Y. Zhang and M. Pennacchiotti, “Predicting purchase behaviors from social media,” in Proc. Inter. Conf. WWW, 2013, pp. 1521–1532.
[54] Y. Zhao, L. Yao, and Y. Zhang, “Purchase prediction using Tmall-specific features,” Concurr. Comput., vol. 28, no. 14, pp. 3879–3894, 2016.