Predicting Online Item-choice Behavior:
A Shape-restricted Regression Approach

Naoki Nishimura, Noriyoshi Sukegawa, Yuichi Takano, and Jiro Iwanaga N. Nishimura was with the Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki 305-8577, Japan. He is now with Recruit Lifestyle Co., Ltd., Tokyo 100-6640, Japan (e-mail: [email protected]).N. Sukegawa is with the Department of Information and Computer Technology, Tokyo University of Science, Tokyo 125-8585, Japan (e-mail: [email protected]).Y. Takano is with the Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki 305-8577, Japan (e-mail: [email protected]).J. Iwanaga is with the Doctoral Program in Policy and Planning Sciences, University of Tsukuba, Ibaraki 305-8577, Japan, and also with Erdos Inc., Kanagawa 222-0033, Japan (email: [email protected]).

Abstract

This paper examines the relationship between user pageview (PV) histories and their item-choice behavior on an e-commerce website. We focus on PV sequences, which represent time series of the number of PVs for each user–item pair. We propose a shape-restricted optimization model that accurately estimates item-choice probabilities for all possible PV sequences. This model imposes monotonicity constraints on item-choice probabilities by exploiting partial orders for PV sequences, according to the recency and frequency of a user’s previous PVs. To improve the computational efficiency of our optimization model, we devise efficient algorithms for eliminating all redundant constraints according to the transitivity of the partial orders. Experimental results using real-world clickstream data demonstrate that our method achieves higher prediction performance than that of a state-of-the-art optimization model and common machine learning methods.

Index Terms:

Consumer behavior, electronic commerce, graph theory, predictive models, quadratic programming

I Introduction

A growing number of companies are now operating e-commerce websites that allow users to browse and purchase a variety of items [45]. Within this context, there is great potential value in analyzing users’ item-choice behavior from clickstream data, which is a record of user pageview (PV) histories on an e-commerce website. By grasping users’ purchase intention as revealed by PV histories, we can lead users to target pages or design special sales promotions, providing companies with opportunities to build profitable relationships with users [22, 33]. Companies can also use clickstream data to improve the quality of operational forecasting and inventory management [18]. Meanwhile, users often find it difficult to select an appropriate item from among the plethora of choices presented by e-commerce websites [1]. Analyzing item-choice behavior can improve the performance of recommendation systems that help users discover items of interest [20]. For these reasons, a number of prior studies have investigated clickstream data from various perspectives [7]. In this study, we focused on closely examining the relationship between PV histories and item-choice behavior on an e-commerce website.

It has been demonstrated that the recency and frequency of a user’s past purchases are critical indicators for purchase prediction [13, 46] and sequential pattern mining [9]. Accordingly, Iwanaga et al. [19] developed a shape-restricted optimization model for estimating item-choice probabilities from the recency and frequency of a user’s previous PVs. Their method creates a two-dimensional probability table consisting of item-choice probabilities for all recency–frequency combinations in a user’s previous PVs. Nishimura et al. [32] employed latent-class modeling to integrate item heterogeneity into a two-dimensional probability table. These prior studies demonstrated experimentally that higher prediction performance was achieved with the two-dimensional probability table than with common machine learning methods, namely, logistic regression, kernel-based support vector machines, artificial neural networks, and random forests. Notably, however, reducing PV histories to two dimensions (recency and frequency) can markedly decrease the amount of information contained in PV histories reflecting item-choice behavior.

This study focused on PV sequences, which represent time series of the number of PVs for each user–item pair. In contrast to the two-dimensional probability table, PV sequences allow us to retain detailed information contained in the PV history. However, the huge number of possible PV sequences makes it extremely difficult to accurately estimate item-choice probabilities for all of them. To overcome this difficulty, we propose a shape-restricted optimization model that imposes monotonicity constraints on item-choice probabilities based on a partially ordered set (poset) for PV sequences. While this optimization model contains a huge number of constraints, all redundant constraints can be eliminated according to the transitivity of partial order. To accomplish this, we compute a transitivity reduction [2] of a directed graph representing the poset. We demonstrate the effectiveness of our method through experiments using real-world clickstream data.

The main contributions of this paper are as follows:

•

We propose a shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence. This PV sequence model exploits the monotonicity constraints to precisely estimate item-choice probabilities.
•

We derive two types of PV sequence posets according to the recency and frequency of a user’s previous PVs. Experimental results show that the monotonicity constraints based on these posets greatly enhances the prediction performance of our PV sequence model.
•

We devise constructive algorithms for transitive reduction specific to these posets. The time complexity of our algorithms is much smaller than that of general-purpose algorithms. Experimental results reveal that transitive reduction improves efficiency in terms of both the computation time and memory usage of our PV sequence model.
•

We verify experimentally that higher prediction performance is achieved with our method than with the two-dimensional probability table and common machine learning methods, namely, logistic regression, artificial neural networks, and random forests.

The remainder of this paper is organized as follows. Section II gives a brief review of related work. Section III describes the two-dimensional probability table [19], and Section 4 presents our PV sequence model. Section IV describes our constructive algorithms for transitive reduction. Section V evaluates the effectiveness of our method based on experimental results. Section VI concludes with a brief summary of this work and a discussion of future research directions.

II Related work

This section briefly surveys methods for predicting online user behavior and discusses some related work on shape-restricted regression.

II-A Prediction of online user behavior

A number of prior studies have aimed at predicting users’ purchase behavior on e-commerce websites [10]. Mainstream research has applied stochastic or statistical models for predicting purchase sessions [5, 23, 30, 31, 36, 41, 46], but these approaches do not consider which items users choose.

Various machine learning methods have been used to predict online item-choice behavior, including logistic regression [12, 53], association rule mining [37], support vector machines [38, 53], ensemble learning methods [25, 26, 39, 52, 54], and artificial neural networks [21, 47, 50]. Tailored statistical models have also been proposed. For instance, Moe [29] devised a two-stage multinomial logit model that separates the decision-making process into item views and purchase decisions. Yao et al. [51] proposed a joint framework consisting of user-level factor estimation and item-level factor aggregation based on the buyer decision process. Borges and Levener [6] used Markov chain models to estimate the probability of the next link choice of a user.

These prior studies effectively utilized clickstream data in various prediction methods and showed that consideration of time-evolving user behavior is crucial for precise prediction of online item-choice behavior. We therefore focused on user PV sequences to estimate item-choice probabilities on e-commerce websites. Moreover, we evaluated the prediction performance of our method by comparison with machine learning methods that have commonly been used in prior studies.

II-B Shape-restricted regression

In many practical situations, prior information is known about the relationship between explanatory and response variables. For instance, utility functions can be assumed to be increasing and concave according to economic theory [28], and option pricing functions to be monotone and convex according to finance theory [3]. Shape-restricted regression fits a nonparametric function to a set of given observations under shape restrictions such as monotonicity, convexity, concavity, or unimodality [8, 15, 16, 48].

Isotonic regression is the most common method for shape-restricted regression. In general, isotonic regression is the problem of estimating a real-valued monotone (non-decreasing or non-increasing) function with respect to a given partial order of observations [35]. Some regularization techniques [14, 44] and estimation algorithms [17, 35, 43] have been proposed for isotonic regression.

One of the greatest advantages of shape-restricted regression is that it mitigates overfitting, thereby improving prediction performance of regression models [4]. To utilize this advantage, Iwanaga et al. [19] devised a shape-restricted optimization model for estimating item-choice probabilities on e-commerce websites. Along similar lines, we propose a shape-restricted optimization model based on order relations of PV sequences to improve prediction performance.

III Two-dimensional probability table

This section briefly reviews the two-dimensional probability table proposed by Iwanaga et al. [19].

III-A Empirical probability table

TABLE I: Pageview History of Six User–Item Pairs

User	Item	Apr 1	Apr 2	Apr 3	Apr 4	$(r,f)$	$(v_{1},v_{2},v_{3})$
		#PVs			Choice
$u_{1}$	$i_{2}$	1	0	1	0	$(3,2)$	$(1,0,1)$
$u_{1}$	$i_{4}$	0	1	0	1	$(2,1)$	$(0,1,0)$
$u_{2}$	$i_{1}$	3	0	0	0	$(1,3)$	$(0,0,3)$
$u_{2}$	$i_{3}$	0	0	3	1	$(3,3)$	$(3,0,0)$
$u_{2}$	$i_{4}$	1	1	1	0	$(3,3)$	$(1,1,1)$
$u_{3}$	$i_{2}$	2	0	1	0	$(3,3)$	$(1,0,2)$

Table I gives an example of a PV history for six user–item pairs. For instance, user $u_{1}$ viewed the page for item $i_{2}$ once on each of April 1 and 3. We focus on user choices (e.g., revisit and purchase) on April 4, which we call the base date. For instance, user $u_{1}$ chose item $i_{4}$ rather than item $i_{2}$ on the base date. We suppose for each user–item pair that recency and frequency are characterized by the last PV day and the total number of PVs, respectively. As shown in the table, the PV history can be summarized by the recency–frequency combination $(r,f)\in R\times F$ , where $R$ and $F$ are index sets representing recency and frequency, respectively.

Let $n_{rf}$ be the number of user–item pairs having $(r,f)\in R\times F$ , and set $q_{rf}$ to the number of choices occurring by user–item pairs that have $(r,f)\in R\times F$ on the base date. In the case of Table I, the empirical probability table is calculated as

	$\displaystyle\left(\hat{x}_{rf}:=\frac{q_{rf}}{n_{rf}}\right)_{\!\!(r,f)\in R\times F}$	$\displaystyle=\left(\begin{array}[]{ccc}0/0&0/0&0/1\\ 1/1&0/0&0/0\\ 0/0&0/1&1/3\\ \end{array}\right)$		(4)
		$\displaystyle\approx\left(\begin{array}[]{lll}0.00&0.00&0.00\\ 1.00&0.00&0.00\\ 0.00&0.00&0.33\\ \end{array}\right),$		(8)

where, for reasons of expediency, $\hat{x}_{rf}:=0$ for $(r,f)\in R\times F$ with $n_{rf}=0$ .

III-B Two-dimensional monotonicity model

It is reasonable to assume that the recency and frequency of user–item pairs are positively associated with user item-choice probabilities. To estimate user item-choice probabilities $x_{rf}$ for all recency–frequency combinations $(r,f)\in R\times F$ , the two-dimensional monotonicity model [19] minimizes the weighted sum of squared errors under monotonicity constraints with respect to recency and frequency.

$\displaystyle\mathop{\mbox{minimize}}_{(x_{rf})_{(r,f)\in R\times F}}$	$\displaystyle\sum_{(r,f)\in R\times F}n_{rf}(x_{rf}-\hat{x}_{rf})^{2}$	(9)
subject to	$\displaystyle x_{rf}\leq x_{r+1,f}~{}~{}~{}((r,f)\in R\times F),$	(10)
	$\displaystyle x_{rf}\leq x_{r,f+1}~{}~{}~{}((r,f)\in R\times F),$	(11)
	$\displaystyle 0\leq x_{rf}\leq 1~{}~{}~{}~{}\,((r,f)\in R\times F).$	(12)

Note, however, that PV histories are often indistinguishable according to recency and frequency. A typical example is the set of user–item pairs $(u_{2},i_{3})$ , $(u_{2},i_{4})$ , and $(u_{3},i_{2})$ in Table I; although their PV histories are actually different, they have the same value $(r,f)=(3,3)$ for recency–frequency combinations. As described in the next section, we exploit the PV sequence to distinguish between such PV histories.

IV PV sequence model

This section presents our shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence.

IV-A PV sequence

The PV sequence for each user–item pair represents a time series of the number of PVs, and is written as

\bm{v}:=(v_{1},v_{2},\ldots,v_{n}),

where $v_{j}$ is the number of PVs $j$ periods earlier for $j=1,2,\ldots,n$ (see Table I). Note that sequence terms are arranged in reverse chronological order, so $v_{j}$ moves into the past as index $j$ increases.

Throughout the paper, we express sets of consecutive integers as

[m_{1},m_{2}]:=\{m_{1},m_{1}+1,\ldots,m_{2}\}\subseteq\mathbb{Z},

where $[m_{1},m_{2}]=\emptyset$ when $m_{1}>m_{2}$ . The set of possible PV sequences is defined as

\Gamma:=[0,m]^{n}=\{0,1,\ldots,m\}^{n},

where $m$ is the maximum number of PVs in each period, and $n$ is the number of periods considered.

Our objective is to estimate item-choice probabilities $x_{\bm{v}}$ for all PV sequences $\bm{v}\in\Gamma$ . However, the huge number of PV sequences makes it extremely difficult to accurately estimate such probabilities. In the case of $(n,m)=(|R|,|F|)=(5,6)$ for instance, the number of different PV sequences is $(m+1)^{n}=16{,}807$ , whereas the number of recency–frequency combinations is only $|R|\cdot|F|=30$ . To avoid this difficulty, we effectively utilize monotonicity constraints on item-choice probabilities as in the optimization model (9)–(12). In the next section, we introduce three operations underlying the development of the monotonicity constraints.

IV-B Operations based on recency and frequency

From the perspective of frequency, it is reasonable to assume that item-choice probability increases as the number of PVs in a particular period increases. To formulate this, we define the following operation:

Definition 1 (Up).

On the domain

\mathcal{D}_{\texttt{U}}:=\{(\bm{v},s)\in\Gamma\times[1,n]\mid v_{s}\leq m-1\},

the function ${\texttt{Up}}:\mathcal{D}_{\texttt{U}}\to\Gamma$ is defined as

((\ldots,v_{s},\ldots),s)\mapsto(\ldots,v_{s}+1,\ldots).

For instance, we have $\texttt{Up}((0,1,1),1)=(1,1,1)$ , and $\texttt{Up}((1,1,1),2)=(1,2,1)$ . Since this operation increases PV frequencies, the monotonicity constraint $x_{(0,1,1)}\leq x_{(1,1,1)}\leq x_{(1,2,1)}$ should be satisfied by item-choice probabilities.

From the perspective of recency, we assume that more-recent PVs have a larger effect on increasing item-choice probability. To formulate this, we consider the following operation for moving one PV from an old period to a new period:

Definition 2 (Move).

On the domain

\begin{array}[]{l}\mathcal{D}_{\texttt{M}}:=\{(\bm{v},s,t)\in\Gamma\times[1,n]\times[1,n]\\ \qquad\qquad\qquad\qquad\mid v_{s}\leq m-1,~{}v_{t}\geq 1,~{}s<t\},\end{array}

the function ${\texttt{Move}}:\mathcal{D}_{\texttt{M}}\to\Gamma$ is defined as

((\ldots,v_{s},\ldots,v_{t},\ldots),s,t)\mapsto(\ldots,v_{s}+1,\ldots,v_{t}-1,\ldots).

For instance, we have $\texttt{Move}((1,1,1),2,3)=(1,2,0)$ , and $\texttt{Move}((1,2,0),1,2)=(2,1,0)$ . Because this operation increases the number of recent PVs, item-choice probabilities should satisfy the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}\leq x_{(2,1,0)}$ .

The PV sequence $\bm{v}=(1,1,1)$ represents a user’s continued interest in a certain item over three periods. In contrast, the PV sequence $\bm{v}=(1,2,0)$ implies that a user’s interest decreased over the two most-recent periods. In this sense, the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}$ may not be validated. Accordingly, we define the following alternative operation, which exchanges numbers of PVs to increase the number of recent PVs:

Definition 3 (Swap).

On the domain

\mathcal{D}_{\texttt{S}}:=\{(\bm{v},s,t)\in\Gamma\times[1,n]\times[1,n]\mid v_{s}<v_{t},~{}s<t\},

the function ${\texttt{Swap}}:\mathcal{D}_{\texttt{S}}\to\Gamma$ is defined as

((\ldots,v_{s},\ldots,v_{t},\ldots),s,t)\mapsto(\ldots,v_{t},\ldots,v_{s},\ldots).

We thus have $\texttt{Swap}((1,0,2),2,3)=(1,2,0)$ because $v_{2}<v_{3}$ , and $\texttt{Swap}((1,2,0),1,2)=(2,1,0)$ because $v_{1}<v_{2}$ . Since this operation increases the number of recent PVs, item-choice probabilities should satisfy the monotonicity constraint $x_{(1,0,2)}\leq x_{(1,2,0)}\leq x_{(2,1,0)}$ . Note that the monotonicity constraint $x_{(1,1,1)}\leq x_{(1,2,0)}$ is not implied by this operation.

IV-C Partially ordered sets

Let $U\subseteq\Gamma$ be a subset of PV sequences. The image of each operation is then defined as

	$\displaystyle\texttt{Up}(U)$	$\displaystyle=\{\texttt{Up}(\bm{u},s)\mid\bm{u}\in U,~{}(\bm{u},s)\in\mathcal{D}_{\texttt{U}}\},$
	$\displaystyle\texttt{Move}(U)$	$\displaystyle=\{\texttt{Move}(\bm{u},s,t)\mid\bm{u}\in U,~{}(\bm{u},s,t)\in\mathcal{D}_{\texttt{M}}\},$
	$\displaystyle\texttt{Swap}(U)$	$\displaystyle=\{\texttt{Swap}(\bm{u},s,t)\mid\bm{u}\in U,~{}(\bm{u},s,t)\in\mathcal{D}_{\texttt{S}}\}.$

We define $\texttt{UM}(U):=\texttt{Up}(U)\cup\texttt{Move}(U)$ for $U\subseteq\Gamma$ . The following definition states that the binary relation $\bm{u}\prec_{\texttt{UM}}\bm{v}$ holds when $\bm{u}$ can be transformed into $\bm{v}$ by repeated application of Up and Move:

Definition 4 ( $\preceq_{\texttt{UM}}$ ).

Suppose $\bm{u},\bm{v}\in\Gamma$ . We write $\bm{u}\prec_{\texttt{UM}}\bm{v}$ if and only if there exists $k\geq 1$ such that

\bm{v}\in{\texttt{UM}}^{k}(\{\bm{u}\})=\underbrace{{\texttt{UM}}\circ\cdots\circ{\texttt{UM}}\circ{\texttt{UM}}}_{k~{}\mathrm{compositions}}(\{\bm{u}\}).

We also write $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ if $\bm{u}\prec_{\texttt{UM}}\bm{v}$ or $\bm{u}=\bm{v}$ .

Similarly, we define $\texttt{US}(U):=\texttt{Up}(U)\cup\texttt{Swap}(U)$ for $U\subseteq\Gamma$ . Then, the binary relation $\bm{u}\prec_{\texttt{US}}\bm{v}$ holds when $\bm{u}$ can be transformed into $\bm{v}$ by repeated application of Up and Swap.

Definition 5 ( $\preceq_{\texttt{US}}$ ).

Suppose $\bm{u},\bm{v}\in\Gamma$ . We write $\bm{u}\prec_{\texttt{US}}\bm{v}$ if and only if there exists $k\geq 1$ such that

\bm{v}\in{\texttt{US}}^{k}(\{\bm{u}\})=\underbrace{{\texttt{US}}\circ\cdots\circ{\texttt{US}}\circ{\texttt{US}}}_{k~{}\mathrm{compositions}}(\{\bm{u}\}).

We also write $\bm{u}\preceq_{\texttt{US}}\bm{v}$ if $\bm{u}\prec_{\texttt{US}}\bm{v}$ or $\bm{u}=\bm{v}$ .

To prove properties of these binary relations, we can use the lexicographic order, which is a well-known linear order [40]:

Definition 6 ( $\preceq_{\texttt{lex}}$ ).

Suppose $\bm{u},\bm{v}\in\Gamma$ . We write $\bm{u}\prec_{\texttt{lex}}\bm{v}$ if and only if there exists $s\in[1,n]$ such that $u_{s}<v_{s}$ and $u_{j}=v_{j}$ for $j\in[1,s-1]$ . We also write $\bm{u}\preceq_{\texttt{lex}}\bm{v}$ if $\bm{u}\prec_{\texttt{lex}}\bm{v}$ or $\bm{u}=\bm{v}$ .

Each application of Up, Move, and Swap makes a PV sequence greater in the lexicographic order. Therefore, we can obtain the following lemma:

Lemma 1.

Suppose $\bm{u},\bm{v}\in\Gamma$ . If $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ or $\bm{u}\preceq_{\texttt{US}}\bm{v}$ , then $\bm{u}\preceq_{\texttt{lex}}\bm{v}$ .

The following theorem states that a partial order of PV sequences is derived by operations Up and Move.

Theorem 1.

The pair $(\Gamma,\preceq_{\texttt{UM}})$ is a poset.

Proof.

From Definition 4, the relation $\preceq_{\texttt{UM}}$ is reflexive and transitive. Suppose $\bm{u}\preceq_{\texttt{UM}}\bm{v}$ and $\bm{v}\preceq_{\texttt{UM}}\bm{u}$ . It follows from Lemma 1 that $\bm{u}\preceq_{\texttt{lex}}\bm{v}$ and $\bm{v}\preceq_{\texttt{lex}}\bm{u}$ . Since the relation $\preceq_{\texttt{lex}}$ is antisymmetric, we have $\bm{u}=\bm{v}$ , thus proving that the relation $\preceq_{\texttt{UM}}$ is also antisymmetric. ∎

We can similarly prove the following theorem for operations Up and Swap:

Theorem 2.

The pair $(\Gamma,\preceq_{\texttt{US}})$ is a poset.

IV-D Shape-restricted optimization model

Let $n_{\bm{v}}$ be the number of user–item pairs that have the PV sequence $\bm{v}\in\Gamma$ . Also, $q_{\bm{v}}$ is the number of choices arising from user–item pairs having $\bm{v}\in\Gamma$ on the base date. Similarly to Eq. (8), we can calculate empirical item-choice probabilities as

\displaystyle\hat{x}_{\bm{v}}:=\frac{q_{\bm{v}}}{n_{\bm{v}}}\qquad(\bm{v}\in\Gamma).

(13)

Our shape-restricted optimization model minimizes the weighted sum of squared errors subject to the monotonicity constraint:

$\displaystyle\mathop{\mbox{minimize}}_{(x_{\bm{v}})_{\bm{v}\in\Gamma}}$	$\displaystyle\quad\sum_{\bm{v}\in\Gamma}n_{\bm{v}}(x_{\bm{v}}-\hat{x}_{\bm{v}})^{2}$	(14)
subject to	$\displaystyle\quad x_{\bm{u}}\leq x_{\bm{v}}\qquad(\bm{u},\bm{v}\in\Gamma\mbox{~{}with~{}}\bm{u}\prec\bm{v}),$	(15)
	$\displaystyle\quad 0\leq x_{\bm{v}}\leq 1~{}~{}~{}(\bm{v}\in\Gamma),$	(16)

where $\bm{u}\prec\bm{v}$ in Eq. (15) is defined by one of the partial orders $\prec_{\texttt{UM}}$ or $\prec_{\texttt{US}}$ .

The monotonicity constraint (15) enhances the estimation accuracy of item-choice probabilities. In addition, our shape-restricted optimization model can be used in a post-processing step to improve prediction performance of other machine learning methods. Specifically, we first compute item-choice probabilities using a machine learning method and then substitute the computed values into $(\hat{x}_{\bm{v}})_{\bm{v}\in\Gamma}$ to solve the optimization model (14)–(16). Consequently, we can obtain item-choice probabilities corrected by the monotonicity constraint (15). Section VI-D illustrates the usefulness of this approach.

However, since $|\Gamma|=(m+1)^{n}$ , it follows that the number of constraints in Eq. (15) is $\mathcal{O}((m+1)^{2n})$ , which can be extremely large. When $(n,m)=(5,6)$ , for instance, we have $(m+1)^{2n}=282{,}475{,}249$ . The next section describes how we mitigate this difficulty by removing redundant constraints in Eq. (15).

Refer to caption — Figure 1: Directed graph representations of the poset $(\Gamma,\preceq_{\texttt{UM}})$ with $(n,m)=(3,2)$ .

V Algorithms for transitive reduction

This section describes our constructive algorithms for transitive reduction to decrease the problem size in our shape-restricted optimization model.

V-A Transitive reduction

A poset $(\Gamma,\preceq)$ can be represented by a directed graph $(\Gamma,E)$ , where $\Gamma$ and $E\subseteq\Gamma\times\Gamma$ are sets of nodes and directed edges, respectively. Each directed edge $(\bm{u},\bm{v})\in E$ in this graph corresponds to the order relation $\bm{u}\prec\bm{v}$ , so the number of directed edges coincides with the number of constraints in Eq. (15).

Figs. 1 and 2 show directed graph representations of posets $(\Gamma,\preceq_{\texttt{UM}})$ and $(\Gamma,\preceq_{\texttt{US}})$ , respectively. Each edge in Figs. 1(a) and 2(a) corresponds to one of the operations Up, Move, or Swap. Edge $(\bm{u},\bm{v})$ is red if $\bm{v}\in\texttt{Up}(\{\bm{u}\})$ and black if $\bm{v}\in\texttt{Move}(\{\bm{u}\})$ or $\bm{v}\in\texttt{Swap}(\{\bm{u}\})$ . The directed graphs in Figs. 1(a) and 2(a) can be easily created.

Suppose there are three edges

(\bm{u},\bm{w}),(\bm{w},\bm{v}),(\bm{u},\bm{v})\in E.

In this case, edge $(\bm{u},\bm{v})$ is implied by the other edges due to the transitivity of partial order

\langle\bm{u}\prec\bm{w},~{}\bm{w}\prec\bm{v}\rangle~{}\Rightarrow~{}\bm{u}\prec\bm{v},

or, equivalently,

\langle x_{\bm{u}}\leq x_{\bm{w}},~{}x_{\bm{w}}\leq x_{\bm{v}}\rangle~{}\Rightarrow~{}x_{\bm{u}}\leq x_{\bm{v}}.

As a result, edge $(\bm{u},\bm{v})$ is redundant and can be removed from the directed graph.

A transitive reduction, also known as a Hasse diagram, of a directed graph $(\Gamma,E)$ is its subgraph $(\Gamma,E^{*})$ such that all redundant edges are removed using the transitivity of partial order [2]. Figs. 1(b) and 2(b) show transitive reductions of the directed graphs shown in Figs. 1(a) and 2(a), respectively. By computing transitive reductions, the number of edges is reduced from 90 to 42 in Fig. 1, and from 81 to 46 in Fig. 2. This transitive reduction is known to be unique [2].

V-B General-purpose algorithms

The transitive reduction $(\Gamma,E^{*})$ is characterized by the following lemma [40]:

Lemma 2.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$ . Then, $(\bm{u},\bm{v})\in E^{*}$ holds if and only if both of the following conditions are fulfilled:

(C1) $\bm{u}\prec\bm{v}$ , and
(C2) if $\bm{w}\in\Gamma$ satisfies $\bm{u}\preceq\bm{w}\preceq\bm{v}$ , then $\bm{w}\in\{\bm{u},\bm{v}\}$ .

The basic strategy in general-purpose algorithms for transitive reduction involves the following steps:

•

Step 1: An exhaustive directed graph $(\Gamma,E)$ is generated from a given poset $(\Gamma,\preceq)$ .
•

Step 2: The transitive reduction $(\Gamma,E^{*})$ is computed from the directed graph $(\Gamma,E)$ using Lemma 2.

Various algorithms for speeding up the computation in Step 2 have been proposed. Recall that $|\Gamma|=(m+1)^{n}$ in our situation. Warshall’s algorithm [49] has time complexity $\mathcal{O}((m+1)^{3n})$ for completing Step 2 [40]. By using a sophisticated algorithm for fast matrix multiplication, this time complexity can be reduced to $\mathcal{O}((m+1)^{2.3729n})$ [24].

However, such general-purpose algorithms are clearly inefficient, especially when $n$ is very large, and Step 1 requires a huge number of computations. To resolve this difficulty, we devised specialized algorithms for directly constructing a transitive reduction.

V-C Constructive algorithms

Let $(\Gamma,E^{*}_{\texttt{UM}})$ be a transitive reduction of a directed graph $(\Gamma,E_{\texttt{UM}})$ representing the poset $(\Gamma,\preceq_{\texttt{UM}})$ . Then, the transitive reduction can be characterized by the following theorem:

Theorem 3.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$ . Then, $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ holds if and only if any one of the following conditions is fulfilled:

(UM1) $\bm{v}={\texttt{Up}}(\bm{u},n)$ , or
(UM2) $\exists s\in[1,n]$ such that $\bm{v}={\texttt{Move}}(\bm{u},s,s+1)$ .

Proof.

See Appendix A-A. ∎

Theorem 3 gives a constructive algorithm that directly computes the transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$ without generating an exhaustive directed graph $(\Gamma,E)$ . Our algorithm is based on the breadth-first search [11]. Specifically, we start with a node list $L=\{(0,0,\ldots,0)\}\subseteq\Gamma$ . At each iteration of the algorithm, we choose $\bm{u}\in L$ , enumerate $\bm{v}\in\Gamma$ such that $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ , and add these nodes to $L$ .

Table II shows this enumeration process for $\bm{u}=(0,2,1)$ with $(n,m)=(3,2)$ . The operations Up and Move generate

\bm{v}\in\{(1,2,1),(0,2,2),(1,1,1),(1,2,0)\},

which amounts to searching edges $(\bm{u},\bm{v})$ in Fig. 1(a). We next check whether each $\bm{v}$ satisfies conditions (UM1) or (UM2) in Theorem 3. As shown in Table II, we choose

\bm{v}\in\{(0,2,2),(1,1,1)\}

and add them to list $L$ ; this amounts to enumerating edges $(\bm{u},\bm{v})$ in Fig. 1(b).

TABLE II: Process of enumerating

\bm{v}\in\Gamma

such that

(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}

$\bm{u}$	Operation	$\bm{v}$	(UM1)	(UM2)
$(0,2,1)$	$\texttt{Up}(\bm{u},1)$	$(1,2,1)$	$\times$	—
	$\texttt{Up}(\bm{u},3)$	$(0,2,2)$	$\surd$	—
	$\texttt{Move}(\bm{u},1,2)$	$(1,1,1)$	—	$\surd$
	$\texttt{Move}(\bm{u},1,3)$	$(1,2,0)$	—	$\times$

Appendix B-A presents pseudocode for our constructive algorithm (Algorithm 1). Recalling the time complexity analysis of the breadth-first search [11], one readily sees that the time complexity of Algorithm 1 is $\mathcal{O}(n(m+1)^{n})$ , which is much smaller than $\mathcal{O}((m+1)^{2.3729n})$ as achieved by the general-purpose algorithm [24], especially when $n$ is very large.

Next, we focus on the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$ of a directed graph $(\Gamma,E_{\texttt{US}})$ representing the poset $(\Gamma,\preceq_{\texttt{US}})$ . The transitive reduction can then be characterized by the following theorem:

Theorem 4.

Suppose $(\bm{u},\bm{v})\in\Gamma\times\Gamma$ . Then, $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ holds if and only if any one of the following conditions is fulfilled:

(US1) $\exists s\in[1,n]$ such that $\bm{v}={\texttt{Up}}(\bm{u},s)$ and $u_{j}\not\in\{u_{s},u_{s}+1\}$ for all $j\in[s+1,n]$ , or
(US2) $\exists(s,t)\in[1,n]\times[1,n]$ such that $\bm{v}={\texttt{Swap}}(\bm{u},s,t)$ and $u_{j}\not\in[u_{s},u_{t}]$ for all $j\in[s+1,t-1]$ .

Proof.

See Appendix A-B. ∎

Theorem 4 also gives a constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$ . Let us again consider $\bm{u}=(0,2,1)$ as an example with $(n,m)=(3,2)$ . As shown in Table III, operations Up and Swap generate

\bm{v}\in\{(1,2,1),(0,2,2),(2,0,1),(1,2,0)\},

and we choose

\bm{v}\in\{(0,2,2),(2,0,1),(1,2,0)\}

(see also Figs. 2(a) and 2(b)).

TABLE III: Process of enumerating

\bm{v}\in\Gamma

such that

(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}

$\bm{u}$	Operation	$\bm{v}$	(US1)	(US2)
$(0,2,1)$	$\texttt{Up}(\bm{u},1)$	$(1,2,1)$	$\times$	—
	$\texttt{Up}(\bm{u},3)$	$(0,2,2)$	$\surd$	—
	$\texttt{Swap}(\bm{u},1,2)$	$(2,0,1)$	—	$\surd$
	$\texttt{Swap}(\bm{u},1,3)$	$(1,2,0)$	—	$\surd$

Appendix B-B presents pseudocode for our constructive algorithm (Algorithm 2). Its time complexity is estimated to be $\mathcal{O}(n^{2}(m+1)^{n})$ , which is larger than that of Algorithm 1 but much smaller than that of the general-purpose algorithm [24], especially when $n$ is very large.

VI Experiments

The experimental results reported in this section evaluate the effectiveness of our method for estimating item-choice probabilities.

We used real-world clickstream data collected from a Chinese e-commerce website, Tmall¹¹1https://tianchi.aliyun.com/dataset/. We used a dataset²²2https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0 preprocessed by Ludewig and Jannach [27]. Each record corresponds to one PV and contains information such as user ID, item ID, and a timestamp. The dataset includes 28,316,459 unique user–item pairs composed from 422,282 users and 624,221 items.

VI-A Methods for comparison

TABLE IV: Methods for comparison

Abbr.	Method
2dimEmp	Empirical probability table (8) [19]
2dimMono	Two-dimensional monotonicity model (9)–(12) [19]
SeqEmp	Empirical probabilities (13) for PV sequences
SeqUM	Our PV sequence model (14)–(16) using $(\Gamma,\preceq_{\texttt{UM}})$
SeqUS	Our PV sequence model (14)–(16) using $(\Gamma,\preceq_{\texttt{US}})$
LR	$L_{2}$ -regularized logistic regression
ANN	Artificial neural networks for regression
RF	Random forest of regression trees

We compared the performance of the methods listed in Table IV. All computations were performed on an Apple MacBook Pro computer with an Intel Core i7-5557U CPU (3.10 GHz) and 16 GB of memory.

The optimization models (9)–(12) and (14)–(16) were solved using OSQP³³3https://osqp.org/docs/index.html [42], a numerical optimization package for solving convex quadratic optimization problems. As in Table I, daily-PV sequences were calculated for each user–item pair, where $m$ is the maximum number of daily PVs and $n$ is the number of terms (past days) in the PV sequence. In this process, all PVs from more than $n$ days earlier were added to the number of PVs $n$ days earlier, and numbers of daily PVs exceeding $m$ were rounded down to $m$ . Similarly, the recency–frequency combinations $(r,f)\in R\times F$ were calculated using daily PVs as in Table I, where $(|R|,|F|)=(n,m)$ .

Other machine learning methods (LR, ANN, and RF) were respectively implemented using the LogisticRegressionCV, MLPRegressor, and RandomForestRegressor functions in scikit-learn, a Python library of machine learning tools. Related hyperparameters were tuned through 3-fold cross-validation according to the parameter settings in a benchmark study [34]. These machine learning methods employed the PV sequence $(v_{1},v_{2},\ldots,v_{n})$ as $n$ input variables for computing item-choice probabilities. We standardized each input variable and performed undersampling to improve prediction performance.

VI-B Performance evaluation methodology

There are five pairs of training and validation sets of clickstream data in the preprocessed dataset [27]. As shown in Table V, each training period is 90 days, and the next day is the validation period. The first four pairs of training and validation sets, which we call the training set, were used for model estimation, and the fifth pair was used for performance evaluation. To examine how sample size affects prediction performance, we prepared small-sample training sets by randomly choosing user–item pairs from the original training set. Here, the sampling rates are 0.1%, 1%, and 10%, and the original training set is referred to as the full sample. Note that the results were averaged over 10 trials for the sampled training sets.

TABLE V: Training and validation periods

Pair ID	Start	End	Validation
	Training
1	21 May 2015	18 August 2015	19 August 2015
2	31 May 2015	28 August 2015	29 August 2015
3	10 June 2015	7 September 2015	8 September 2015
4	20 June 2015	17 September 2015	18 September 2015
5	30 June 2015	27 September 2015	28 September 2015

TABLE VI: Problem size of our PV sequence model (14)–(16)

			#Cons in Eq. (15)
			Enumeration		Operation		Reduction
$n$	$m$	#Vars	SeqUM	SeqUS	SeqUM	SeqUS	SeqUM	SeqUS
5	1	32	430	430	160	160	48	48
5	2	243	21,383	17,945	1,890	1,620	594	634
5	3	1,024	346,374	255,260	9,600	7,680	3,072	3,546
5	4	3,125	3,045,422	2,038,236	32,500	25,000	10,500	12,898
5	5	7,776	18,136,645	11,282,058	86,400	64,800	28,080	36,174
5	6	16,807	82,390,140	48,407,475	195,510	144,060	63,798	85,272
1	6	7	21	21	6	6	6	6
2	6	49	1,001	861	120	105	78	93
3	6	343	42,903	32,067	1,638	1,323	798	1,018
4	6	2,401	1,860,622	1,224,030	18,816	14,406	7,350	9,675
5	6	16,807	82,390,140	48,407,475	195,510	144,060	63,798	85,272

We considered the top- $N$ selection task to evaluate prediction performance. Specifically, we focused on items that were viewed by a particular user during a training period. From among these items, we selected $I_{\rm sel}$ , a set of top- $N$ items for the user according to estimated item-choice probabilities. The most-recently viewed items were selected when two or more items had the same choice probability. Let $I_{\rm view}$ be the set of items viewed by the user in the validation period. Then, the F1 score is defined by the harmonic average of $\mbox{\emph{Recall}}:=|I_{\rm sel}\cap I_{\rm view}|/|I_{\rm view}|$ and $\mbox{\emph{Precision}}:=|I_{\rm sel}\cap I_{\rm view}|/|I_{\rm sel}|$ as

\mathrm{F1~{}score}:=\frac{2\cdot\mathrm{Recall}\cdot\mathrm{Precision}}{\mathrm{Recall}+\mathrm{Precision}}.

In the following sections, we examine F1 scores averaged over all users. The percentage of user–item pairs leading to item choices is only 0.16%.

VI-C Effects of the transitive reduction

We generated constraints in Eq. (15) based on the following three directed graphs:

•

Case 1 (Enumeration): All edges $(\bm{u},\bm{v})$ satisfying $\bm{u}\prec\bm{v}$ were enumerated.
•

Case 2 (Operation): Edges corresponding to operations Up, Move, and Swap were generated as in Figs. 1(a) and 2(a).
•

Case 3 (Reduction): Transitive reduction was computed using our algorithms as in Figs. 1(b) and 2(b).

TABLE VII: Computation times for our PV sequence model (14)–(16)

			Time [s]
			Enumeration		Operation		Reduction
$n$	$m$	#Vars	SeqUM	SeqUS	SeqUM	SeqUS	SeqUM	SeqUS
5	1	32	0.00	0.01	0.00	0.00	0.00	0.00
5	2	243	2.32	1.66	0.09	0.07	0.03	0.02
5	3	1,024	558.22	64.35	3.41	0.71	0.13	0.26
5	4	3,125	OM	OM	24.07	13.86	1.72	5.80
5	5	7,776	OM	OM	180.53	67.34	9.71	36.94
5	6	16,807	OM	OM	906.76	522.84	86.02	286.30
1	6	7	0.00	0.00	0.00	0.00	0.00	0.00
2	6	49	0.03	0.01	0.01	0.00	0.00	0.00
3	6	343	12.80	1.68	0.20	0.03	0.05	0.02
4	6	2,401	OM	OM	8.07	4.09	2.12	2.87
5	6	16,807	OM	OM	906.76	522.84	86.02	286.30

TABLE VIII: Computational performance of our PV sequence model (14)–(16)

			#Cons in Eq. (15)		Time [s]		F1 score [%], $N=3$
$n$	$m$	#Vars	SeqUM	SeqUS	SeqUM	SeqUS	SeqEmp	SeqUM	SeqUS
3	30	29,791	84,630	118,850	86.72	241.46	12.25	12.40	12.40
4	12	28,561	99,372	142,800	198.82	539.76	12.68	12.93	12.95
5	6	16,807	63,798	85,272	86.02	286.30	12.90	13.18	13.18
6	4	15,625	62,500	76,506	62.92	209.67	13.14	13.49	13.48
7	3	16,384	67,584	76,818	96.08	254.31	13.23	13.52	13.53
8	2	6,561	24,786	25,879	19.35	17.22	13.11	13.37	13.35
9	2	19,683	83,106	86,386	244.15	256.42	13.07	13.40	13.37

Table VI shows the problem size of our PV sequence model (14)–(16) for some $(n,m)$ settings of the PV sequence. Here, the “#Vars” column shows the number of decision variables (i.e., $(m+1)^{n}$ ), and the subsequent columns show the number of constraints in Eq. (15) for the three cases mentioned above.

The number of constraints grew rapidly as $n$ and $m$ increased in the enumeration case. In contrast, the number of constraints was always kept smallest by the transitive reduction among the three cases. When $(n,m)=(5,6)$ , for instance, transitive reductions reduced the number of constraints in the operation case to $63798/195510\approx 32.6\%$ for SeqUM and $85272/144060\approx 59.2\%$ for SeqUS.

The number of constraints was larger for SeqUM than for SeqUS in the enumeration and operation cases. In contrast, the number of constraints was often smaller for SeqUM than for SeqUS in the reduction case. Thus, the transitive reduction had a greater impact on SeqUM than on SeqUS in terms of the number of constraints.

Table VII lists the computation times required for solving the optimization problem (14)–(16) for some $(n,m)$ settings of the PV sequence. Here, “OM” indicates that computation was aborted due to a lack of memory. The enumeration case often caused out-of-memory errors because of the huge number of constraints (see Table VI), but the operation and reduction cases completed the computations for all $(n,m)$ settings for the PV sequence. Moreover, the transitive reduction made computations faster. A notable example is SeqUM with $(n,m)=(5,6)$ , for which the computation time in the reduction case (86.02 s) was only one-tenth of that in the operation case (906.76 s). These results demonstrate that transitive reduction improves efficiency in terms of both computation time and memory usage.

Table VIII shows the computational performance of our optimization model (14)–(16) for some $(n,m)$ settings of PV sequences. Here, for each $n\in\{3,4\ldots,9\}$ , the largest $m$ was chosen such that the computation finished within 30 min. Both SeqUM and SeqUS always delivered higher F1 scores than SeqEmp did. This indicates that our monotonicity constraint (15) works well for improving prediction performance. The F1 scores provided by SeqUM and SeqUS were very similar and largest with $(n,m)=(7,3)$ . In light of these results, we use the setting $(n,m)\in\{(7,3),(5,6)\}$ in the following sections.

VI-D Prediction performance of our PV sequence model

Fig. 3 shows F1 scores of the two-dimensional probability table and our PV sequence model using the sampled training sets, where the number of selected items is $N\in\{3,5,10\}$ , and the setting of the PV sequence is $(n,m)\in\{(7,3),(5,6)\}$ .

When the full-sample training set was used, SeqUM and SeqUS always delivered better prediction performance than the other methods did. When the 1%- and 10%-sampled training sets were used, the prediction performance of SeqUS decreased slightly, whereas SeqUM still performed best among all the methods. When the 0.1%-sampled training set was used, 2dimMono always performed better than SeqUS did, and 2dimMono also had the best prediction performance in the case $(n,m)=(5,6)$ . These results suggest that our PV sequence model performs very well, especially when the sample size is sufficiently large.

The prediction performance of SeqEmp deteriorated rapidly as the sampling rate decreased, and this performance was always much worse than that of 2dimEmp. Meanwhile, SeqUM and SeqUS maintained high prediction performance even when the 0.1%-sampled training set was used. This suggests that the monotonicity constraint (15) in our PV sequence model is more effective than the monotonicity constraints (10)–(11) in the two-dimensional monotonicity model.

Fig. 4 shows F1 scores for the machine learning methods (LR, ANN, and RF) and our PV sequence model (SeqUM) using the full-sample training set, where the number of selected items is $N\in\{3,5,10\}$ , and the PV sequence setting is $(n,m)\in\{(7,3),(5,6)\}$ . Note that in this figure, SeqUM( $\ast$ ) represents the optimization model (14)–(16), where the item-choice probabilities computed by each machine learning method were substituted into $(\hat{x}_{\bm{v}})_{\bm{v}\in\Gamma}$ (see Section 4.4).

Prediction performance was better for SeqUM than for all the machine learning methods, except in the case of Fig. 4(f), where LR showed better prediction performance. Moreover, SeqUM( $\ast$ ) improved the prediction performance of the machine leaning methods, particularly for ANN and RF. This suggests that our monotonicity constraint (15) is also very helpful in correcting prediction values from other machine learning methods.

VI-E Analysis of estimated item-choice probabilities

Fig. 5 shows item-choice probabilities estimated by our PV sequence model using the full-sample training set, where the PV sequence setting is $(n,m)=(5,6)$ . Here, we focus on PV sequences in the form $\bm{v}=(v_{1},v_{2},v_{3},0,0)\in\Gamma$ and depict estimates of item-choice probabilities on $(v_{1},v_{2})\in[0,m]\times[0,m]$ for each $v_{3}\in\{0,1,2\}$ . Note also that the number of associated user–item pairs decreased as the value of $v_{3}$ increased.

Because SeqEmp does not consider the monotonicity constraint (15), item-choice probabilities estimated by SeqEmp have irregular shapes for $v_{3}\in\{1,2\}$ . In contrast, item-choice probabilities estimated with the monotonicity constraint (15) are relatively smooth. Because of the Up operation, item-choice probabilities estimated by SeqUM and SeqUS increase as $(v_{1},v_{2})$ moves from $(0,0)$ to $(6,6)$ . Because of the Move operation, item-choice probabilities estimated by SeqUM also increase as $(v_{1},v_{2})$ moves from $(0,6)$ to $(6,0)$ . Item-choice probabilities estimated by SeqUS are relatively high around $(v_{1},v_{2})=(3,3)$ . This highlights the difference in the monotonicity constraint (15) between posets $(\Gamma,\preceq_{\texttt{UM}})$ and $(\Gamma,\preceq_{\texttt{US}})$ .

Fig. 6 shows item-choice probabilities estimated by our PV sequence model using the 10%-sampled training set, where the PV sequence setting is $(n,m)=(5,6)$ . Since the sample size was reduced in this case, item-choice probabilities estimated by SeqEmp are highly unstable. In particular, item-choice probabilities were estimated to be zero for all $(v_{1},v_{2})$ with $v_{1}\geq 3$ in Fig. 6(c), but this is unreasonable from the perspective of frequency. In contrast, SeqUM and SeqUS estimated item-choice probabilities that increase monotonically with respect to $(v_{1},v_{2})$ .

VII Conclusion

We presented a shape-restricted optimization model for estimating item-choice probabilities on an e-commerce website. Our monotonicity constraints based on tailored order relations could better estimate item-choice probabilities for all possible PV sequences. To improve computational efficiency of our optimization model, we devised constructive algorithms for transitive reduction that remove all redundant constraints from the optimization model.

We assessed the effectiveness of our method through experiments using real-world clickstream data. Experimental results demonstrated that transitive reduction enhanced the efficiency of our optimization model in terms of both computation time and memory usage. In addition, our method delivered better prediction performance than did the two-dimensional monotonicity model [19] and common machine learning methods. Our method was also helpful in correcting prediction values computed by other machine learning methods.

This study made three main contributions. First, we derived two types of posets by exploiting the properties of recency and frequency of a user’s previous PVs. These posets allow us to place appropriate monotonicity constraints on item-choice probabilities. Next, we developed algorithms for transitive reduction of our posets. These algorithms are more efficient than general-purpose algorithms in terms of time complexity for transitive reduction. Finally, our method further expanded the potential of shape-restricted regression for predicting user behavior on e-commerce websites.

Once the optimization model for estimating item-choice probabilities has been solved, the obtained results can easily be put into practical use on e-commerce websites. Accurate estimates of item-choice probabilities will be useful for customizing sales promotions according to the needs of a particular user. In addition, our method can estimate user preferences from clickstream data, therefore aiding creation of high-quality user–item rating matrices for recommendation algorithms [20].

In future studies, we will develop new posets that further improve the prediction performance of our PV sequence model. Another direction of future research will be to incorporate user–item heterogeneity into our optimization model, as in the case of latent class modeling with a two-dimensional probability table [32].

Appendix A Proofs

A-A Proof of Theorem 3

The “only if” part

Suppose $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ . We then have $\bm{v}\in\texttt{UM}(\{\bm{u}\})$ from Definition 4 and Lemma 2. We therefore consider the following two cases:

Case 1: $\bm{v}=\texttt{Up}(\bm{u},s)$ for some $s\in[1,n]$
For the sake of contradiction, assume that $s\not=n$ (i.e., $s\leq n-1$ ). Then there exists an index $j$ such that $s<j\leq n$ . If $u_{j}>0$ , then $\bm{w}=\texttt{Move}(\bm{u},s,j)$ and $\bm{v}=\texttt{Up}(\bm{w},j)$ . If $u_{j}=0$ , then $\bm{w}=\texttt{Up}(\bm{u},j)$ and $\bm{v}=\texttt{Move}(\bm{w},s,j)$ . This implies that $\bm{u}\prec_{\texttt{UM}}\bm{w}\prec_{\texttt{UM}}\bm{v}$ , which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ due to condition (C2) of Lemma 2.

Case 2: $\bm{v}=\texttt{Move}(\bm{u},s,t)$ for some $(s,t)\in[1,n]\times[1,n]$
For the sake of contradiction, assume that $t\not=s+1$ (i.e., $t\geq s+2$ ). Then there exists an index $j$ such that $s<j<t$ . If $u_{j}>0$ , then $\bm{w}=\texttt{Move}(\bm{u},s,j)$ and $\bm{v}=\texttt{Move}(\bm{w},j,t)$ . If $u_{j}=0$ , then $\bm{w}=\texttt{Move}(\bm{u},j,t)$ and $\bm{v}=\texttt{Move}(\bm{w},s,j)$ . This implies that $\bm{u}\prec_{\texttt{UM}}\bm{w}\prec_{\texttt{UM}}\bm{v}$ , which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ due to condition (C2) of Lemma 2.

The “if” part

Next, we show that $(\bm{u},\bm{v})\in E^{*}_{\texttt{UM}}$ in the following two cases:

Case 1: Condition (UM1) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To satisfy condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{UM}}\bm{w}\preceq_{\texttt{UM}}\bm{v}$ . From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$ . Since $\bm{u}$ is next to $\bm{v}$ in the lexicographic order, we have $\bm{w}\in\{\bm{u},\bm{v}\}$ .

Case 2: Condition (UM2) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To satisfy condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{UM}}\bm{w}\preceq_{\texttt{UM}}\bm{v}$ . From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$ , which implies that $w_{j}=u_{j}$ for all $j\in[1,s-1]$ . Therefore, we cannot apply any operations to $w_{j}$ for $j\in[1,s-1]$ in the process of transforming $\bm{w}$ from $\bm{u}$ into $\bm{v}$ . To keep the value of $\sum_{j=1}^{n}w_{j}$ constant, we can apply only the Move operation. However, once the Move operation is applied to $w_{j}$ for $j\in[s+2,n]$ , the resultant sequence cannot be converted into $\bm{v}$ . As a result, only $\texttt{Move}(\,\cdot\,,s,s+1)$ can be performed, and therefore $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Move}(\bm{u},s,s+1)=\bm{v}$ .

A-B Proof of Theorem 4

The “only if” part

Suppose that $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ . We then have $\bm{v}\in\texttt{US}(\{\bm{u}\})$ from Definition 5 and Lemma 2. Thus, we consider the following two cases:

Case 1: $\bm{v}=\texttt{Up}(\bm{u},s)$ for some $s\in[1,n]$
For the sake of contradiction, assume $u_{j}\in\{u_{s},u_{s}+1\}$ for some $j\in[s+1,n]$ . If $u_{j}=u_{s}$ , then $\bm{w}=\texttt{Up}(\bm{u},j)$ and $\bm{v}=\texttt{Swap}(\bm{w},s,j)$ . If $u_{j}=u_{s}+1$ , then $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ and $\bm{v}=\texttt{Up}(\bm{w},j)$ . This implies that $\bm{u}\prec_{\texttt{US}}\bm{w}\prec_{\texttt{US}}\bm{v}$ , which contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ due to condition (C2) of Lemma 2.

Case 2: $\bm{v}=\texttt{Swap}(\bm{u},s,t)$ for some $(s,t)\in[1,n]\times[1,n]$
For the sake of contradiction, assume $u_{j}\in[u_{s},u_{t}]$ for some $j\in[s+1,t-1]$ . If $u_{s}<u_{j}<u_{t}$ , then $\bm{w}_{1}=\texttt{Swap}(\bm{u},j,t)$ , $\bm{w}_{2}=\texttt{Swap}(\bm{w}_{1},s,j)$ , and $\bm{v}=\texttt{Swap}(\bm{w}_{2},j,t)$ . If $u_{j}=u_{s}$ , then $\bm{w}=\texttt{Swap}(\bm{u},j,t)$ and $\bm{v}=\texttt{Swap}(\bm{w},s,j)$ . If $u_{j}=u_{t}$ , then $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ and $\bm{v}=\texttt{Swap}(\bm{w},j,t)$ . Each of these results contradicts $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ due to condition (C2) of Lemma 2.

The “if” part

Next, we show that $(\bm{u},\bm{v})\in E^{*}_{\texttt{US}}$ in the following two cases:

Case 1: Condition (US1) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To satisfy condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{US}}\bm{w}\preceq_{\texttt{US}}\bm{v}$ . From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$ , implying that $w_{j}=u_{j}$ for all $j\in[1,s-1]$ . Therefore, we cannot apply any operations to $w_{j}$ for $j\in[1,s-1]$ in the process of transforming $\bm{w}$ from $\bm{u}$ into $\bm{v}$ . We must apply the Up operation only once, because the value of $\sum_{j=1}^{n}w_{j}$ remains the same after the Swap operation. Condition (US1) guarantees that for all $j\in[s+1,n]$ , $w_{j}$ does not coincide with $u_{s}+1$ even if $\texttt{Up}(\,\cdot\,,j)$ is applied. Therefore, $\texttt{Swap}(\,\cdot\,,s,j)$ for $j\in[s+1,n]$ never leads to $w_{s}=u_{s}+1$ . As a result, $\texttt{Up}(\,\cdot\,,s)$ must be performed. Other applicable Swap operations produce a sequence that cannot be converted into $\bm{v}$ . This means that $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Up}(\bm{u},s)=\bm{v}$ .

Case 2: Condition (US2) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To satisfy condition (C2), we consider $\bm{w}\in\Gamma$ such that $\bm{u}\preceq_{\texttt{US}}\bm{w}\preceq_{\texttt{US}}\bm{v}$ . From Lemma 1, we have $\bm{u}\preceq_{\texttt{lex}}\bm{w}\preceq_{\texttt{lex}}\bm{v}$ . This implies that $w_{j}=u_{j}$ for all $j\in[1,s-1]$ , and that $w_{s}\in[u_{s},u_{t}]$ . Therefore, we cannot apply any operations to $w_{j}$ for $j\in[1,s-1]$ in the process of transforming $\bm{w}$ from $\bm{u}$ into $\bm{v}$ . To keep the value of $\sum_{j=1}^{n}w_{j}$ constant, we can apply only the Swap operation. However, once the Swap operation is applied to $w_{j}$ for $j\in[t+1,n]$ , the resultant sequence cannot be converted into $\bm{v}$ . We cannot adopt $\bm{w}=\texttt{Swap}(\bm{u},s,j)$ for $j\in[s+1,t-1]$ due to condition (US2). If we adopt $\bm{w}=\texttt{Swap}(\bm{u},j,t)$ for $j\in[s+1,t-1]$ , we have $w_{t}\leq u_{s}-1$ due to condition (US2), so the application of $\texttt{Swap}(\,\cdot\,,t,j)$ is unavoidable for $j\in[t+1,n]$ . As a result, $\texttt{Swap}(\,\cdot\,,s,t)$ must be performed. Other applicable Swap operations produce a sequence that cannot be converted into $\bm{v}$ . This means that $\bm{w}=\bm{u}$ or $\bm{w}=\texttt{Swap}(\bm{u},s,t)=\bm{v}$ .

Appendix B Pseudocode

B-A Constructive algorithm for $(\Gamma,E^{*}_{\texttt{UM}})$

Algorithm 1 Constructive algorithm for

(\Gamma,E^{*}_{\texttt{UM}})

Input a pair $(n,m)$ of positive integers
Output transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$

1:procedure

L\leftarrow\mbox{list consisting of }(0,0,\ldots,0)

\triangleright

returns

\Gamma

E\leftarrow\mbox{empty list}

\triangleright

returns

E^{*}_{\texttt{UM}}

Q\leftarrow\mbox{queue consisting of }(0,0,\ldots,0)

5: while

Q

is not empty do

\bm{u}\leftarrow\textsc{dequeue}(Q)

7: if

(\bm{u},n)\in\mathcal{D}_{\texttt{U}}

then

\triangleright

for (UM1)

\bm{v}\leftarrow\texttt{Up}(\bm{u},n)

\textsc{append}(L,\bm{v})

\textsc{append}(E,(\bm{u},\bm{v}))

10:

\textsc{enqueue}(Q,\bm{v})

11: for

s\in[1,n-1]

\triangleright

for (UM2)

12: if

(\bm{u},s,s+1)\in\mathcal{D}_{\texttt{M}}

then

13:

\bm{v}\leftarrow\texttt{Move}(\bm{u},s,s+1)

14:

\textsc{append}(L,\bm{v})

\textsc{append}(E,(\bm{u},\bm{v}))

15:

\textsc{enqueue}(Q,\bm{v})

The nodes and directed edges of graph $(\Gamma,E^{*}_{\texttt{UM}})$ are enumerated in a breadth-first search and are stored in two lists $L$ and $E$ , respectively. We use $\textsc{append}(L,\bm{v})$ , which appends a vertex $\bm{v}$ to the end of $L$ . We similarly use $\textsc{append}(E,(\bm{u},\bm{v}))$ .

A queue $Q$ is used to store nodes of $L$ whose successors are under investigation (the “frontier” of $L$ ). The nodes in $Q$ are listed in ascending order of depth, where the depth of $\bm{v}$ is the shortest-path length from $(0,0,\ldots,0)$ to $\bm{v}$ . We use $\textsc{dequeue}(Q)$ , which returns and deletes the first element in $Q$ , and $\textsc{enqueue}(Q,\bm{v})$ , which appends $\bm{v}$ to the end of $Q$ .

Algorithm 1 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{UM}})$ . For a given node $\bm{u}$ in line 6, we find all nodes $\bm{v}$ satisfying condition (UM1) in lines 7–10 and those satisfying condition (UM2) in lines 11–15.

By definition, the membership test for $\mathcal{D}_{\texttt{U}}$ and $\mathcal{D}_{\texttt{M}}$ can be performed in $\mathcal{O}(1)$ time. Recall that dequeue, enqueue, and append can be performed in $\mathcal{O}(1)$ time. The FOR loop in lines 11–15 executes in $\mathcal{O}(n)$ time. Therefore, recalling that $|\Gamma|=(m+1)^{n}$ , we see that Algorithm 1 runs in $\mathcal{O}(n(m+1)^{n})$ time.

B-B Constructive algorithm for $(\Gamma,E^{*}_{\texttt{US}})$

Algorithm 2 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$ . Here, the difference from Algorithm 1 is the method for finding nodes $\bm{v}$ satisfying conditions (US1) or (US2). For a given node $\bm{u}$ in line 6, we find all nodes $\bm{v}$ satisfying condition (US1) in lines 7–16, and those satisfying condition (US2) in lines 17–26. The following describes the latter part.

Algorithm 2 Constructive algorithm for

(\Gamma,E^{*}_{\texttt{US}})

Input: a pair $(n,m)$ of positive integers
Output: the transitive reduction $(\Gamma,E^{*}_{\texttt{US}})$

1:procedure

L\leftarrow\mbox{list consisting of }(0,0,\ldots,0)

\triangleright

returns

\Gamma

E\leftarrow\mbox{empty list}

\triangleright

returns

E^{*}_{\texttt{US}}

Q\leftarrow\mbox{queue consisting of }(0,0,\ldots,0)

5: while

Q

is not empty do

\bm{u}\leftarrow\textsc{dequeue}(Q)

7: for

s\in[1,n]

\triangleright

for (US1)

8: if

(\bm{u},s)\in\mathcal{D}_{\texttt{U}}

then

\textit{flag}\leftarrow\textit{True}

10: for

j\in[s+1,n]

11: if

u_{j}\in\{u_{s},u_{s}+1\}

then

12:

\textit{flag}\leftarrow\textit{False}

, break

13: if

\textit{flag}=\textit{True}

then

14:

\bm{v}\leftarrow\texttt{Up}(\bm{u},s)

15:

\textsc{append}(L,\bm{v})

\textsc{append}(E,(\bm{u},\bm{v}))

16:

\textsc{enqueue}(Q,\bm{v})

17: for

s\in[1,n-1]

\triangleright

for (US2)

18:

b\leftarrow m+1

19: for

t\in[s+1,n]

20: if

(\bm{u},s,t)\in\mathcal{D}_{\texttt{S}}

and

u_{t}<b

then

21:

\bm{v}\leftarrow\texttt{Swap}(\bm{u},s,t)

22:

\textsc{append}(L,\bm{v})

\textsc{append}(E,(\bm{u},\bm{v}))

23:

\textsc{enqueue}(Q,\bm{v})

24:

b\leftarrow u_{t}

25: else if

u_{t}=u_{s}

then

26: break

Let $(\bm{u},\bm{v})$ be a directed edge added to $E$ in line 22. Let $(\bar{s},\bar{t})$ be such that $\bm{v}=\texttt{Swap}(\bm{u},\bar{s},\bar{t})$ . From line 20, we have $u_{\bar{s}}<u_{\bar{t}}<b$ . Note that for each $t$ in line 19, value $b$ gives the smallest value of $u_{j}$ with $u_{j}>u_{\bar{s}}$ for $j\in[\bar{s}+1,t-1]$ . Also, due to lines 25–26, $u_{j}\not=u_{\bar{s}}$ for $j\in[\bar{s}+1,\bar{t}-1]$ . Combining these observations, we see that for $j\in[\bar{s}+1,\bar{t}-1]$ ,

u_{j}<u_{\bar{s}}~{}~{}\mbox{or}~{}~{}u_{j}\geq b>u_{\bar{t}}~{}~{}\mbox{(meaning $u_{j}\notin[u_{\bar{s}},u_{\bar{t}}]$)}.

Therefore, the pair $(\bm{u},\bm{v})$ satisfies condition (US2). It is easy to verify that this process finds all vertices $\bm{v}$ satisfying condition (US2).

Since both of the double FOR loops at lines 7–16 and 17–26 execute in $\mathcal{O}(n^{2})$ time, Algorithm 2 runs in $\mathcal{O}(n^{2}(m+1)^{n})$ time.

References

[1] C. C. Aggarwal, Recommender systems, Cham: Springer International Publishing, 2016
[2] A. V. Aho, M. R. Garey, and J. D. Ullman, “The transitive reduction of a directed graph,” SIAM J. Comput., vol. 1, no. 2, pp. 131–137, 1972.
[3] Y. Aıt-Sahalia and J. Duarte, “Nonparametric option pricing under shape restrictions,” J. Econom., vol. 116, no. 1–2, pp. 9–47, 2003.
[4] E. E. Altendorf, A. C. Restificar, and T. G. Dietterich, “Learning from sparse data by exploiting monotonicity constraints,” in Proc. UAI, 2005, pp. 18–26.
[5] A. Baumann, J. Haupt, F. Gebert, and S. Lessmann, “Changing perspectives: Using graph metrics to predict purchase probabilities,” Expert Syst. Appl., vol. 94, pp. 137–148, 2018.
[6] J. Borges and M. Levene, “Evaluating variable-length markov chain models for analysis of user web navigation sessions,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 4, pp. 441–452, 2007.
[7] R. E. Bucklin and C. Sismeiro, “Click here for Internet insight: Advances in clickstream data analysis in marketing,” J. Interact. Mark., vol. 23, no. 1, pp. 35–48, 2009.
[8] S. Chatterjee, A. Guntuboyina, and B. Sen, “On risk bounds in isotonic and other shape restricted regression problems,” Ann. Stat., vol. 43, no. 4, pp. 1774–1800, 2015.
[9] Y. L. Chen, M. H. Kuo, S. Y. Wu, and K. Tang, “Discovering recency, frequency, and monetary (RFM) sequential patterns from customers’ purchasing data,” Electron. Commer. Res. Appl., vol. 8, no. 5, pp. 241–251, 2009.
[10] D. Cirqueira, M. Hofer, D. Nedbal, M. Helfert, and M. Bezbradica, “Customer Purchase Behavior Prediction in E-commerce: Current Tasks, Applications and Methodologies,” in Inter. Workshop NFMCP, 2019.
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, MIT press, 2009.
[12] Y. Dong and W. Jiang, “Brand purchase prediction based on time-evolving user behaviors in e-commerce,” Concurr. Comput., vol. 31, no. 1, pp. e4882, 2019.
[13] P. S. Fader, B. G. Hardie, and K. L. Lee, “RFM and CLV: Using iso-value curves for customer base analysis,” J. Mark. Res., vol. 42, no. 4, pp. 415–430, 2005.
[14] B. R. Gaines, J. Kim, and H. Zhou, “Algorithms for fitting the constrained lasso,” J. Comput. Graph. Stat., vol. 27, no. 4, pp. 861–871, 2018.
[15] P. Groeneboom and G. Jongbloed, Nonparametric estimation under shape constraints (Vol. 38), Cambridge University Press, 2014.
[16] A. Guntuboyina and B. Sen, “Nonparametric shape-restricted regression,” Stat. Sci., vol. 33, no. 4, pp. 568–594, 2018.
[17] Q. Han, T. Wang, S. Chatterjee, and R. J. Samworth, “Isotonic regression in general dimensions,” Ann. Stat., vol. 47, no. 5, pp. 2440–2471, 2019.
[18] T. Huang and J. A. Van Mieghem, “Clickstream data and inventory management: Model and empirical analysis,” Prod. Oper. Manag., vol. 23, no. 3, pp. 333–347, 2014.
[19] J. Iwanaga, N. Nishimura, N. Sukegawa, and Y. Takano, “Estimating product-choice probabilities from recency and frequency of page views,” Knowl. Based Syst., vol. 99, pp. 157–167, 2016.
[20] J. Iwanaga, N. Nishimura, N. Sukegawa, and Y. Takano, “Improving collaborative filtering recommendations by estimating user preferences from clickstream data,” Electron. Commer. Res. Appl., vol. 37, 100877, 2019.
[21] D. Jannach, M. Ludewig, and L. Lerche, “Session-based item recommendation in e-commerce: on short-term intents, reminders, trends and discounts,” User Model. User-adapt. Interact., vol. 27, no. 3–5, pp. 351–392, 2017.
[22] P. K. Kannan, “Digital marketing: A framework, review and research agenda,” Int. J. Res. Mark., vol. 34, no. 1, pp. 22–45, 2017.
[23] D. Koehn, S. Lessmann, and M. Schaal, “Predicting online shopping behaviour from clickstream data using deep learning,” Expert Syst. Appl., vol. 150, no. 15, 113342, 2020.
[24] F. Le Gall, “Powers of tensors and fast matrix multiplication,” in Proc. ISSAC, 2014, pp. 296–303, 2014.
[25] Q. Li, M. Gu, K. Zhou, and X. Sun, “Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce,” in IEEE ICDM Workshop, 2015, pp. 1048–1054.
[26] D. Li, G. Zhao, Z. Wang, W. Ma, and Y. Liu, “A method of purchase prediction based on user behavior log,” in IEEE ICDM Workshop, 2015, pp. 1031–1039.
[27] M. Ludewig and D. Jannach, “Evaluation of session-based recommendation algorithms,” User Model. User-adapt. Interact., vol. 28, no. 4–5, pp. 331–390, 2018.
[28] R. L. Matzkin, “Semiparametric estimation of monotone and concave utility functions for polychotomous choice models,” Econometrica, vol. 59, no. 5, pp. 1315–1327, 1991.
[29] W. W. Moe, “An empirical two-stage choice model with varying decision rules applied to internet clickstream data,” J. Mark. Res., vol. 43, no. 4, pp. 680–692, 2006.
[30] W. W. Moe and P. S. Fader, “Dynamic conversion behavior at e-commerce sites,” Manage. Sci., vol. 50, no. 3, pp. 326–335, 2004.
[31] A. L. Montgomery, S. Li, K. Srinivasan, and J. C. Liechty, “Modeling online browsing and path analysis using clickstream data,” Mark. Sci., vol. 23, no. 4, pp. 579–595, 2004.
[32] N. Nishimura, N. Sukegawa, Y. Takano, and J. Iwanaga, “A latent-class model for estimating product-choice probabilities from clickstream data,” Inf. Sci., vol. 429,pp. 406–420, 2018.
[33] E. W. Ngai, L. Xiu, and D. C. Chau, “Application of data mining techniques in customer relationship management: A literature review and classification,” Expert Syst. Appl., vol. 36, no. 2, pp. 2592–2602, 2009.
[34] P. Orzechowski, W. La Cava, and J. H. Moore, “Where are we now? A large benchmark study of recent symbolic regression methods,” in Proc. GECCO, 2018, pp. 1183–1190.
[35] P. M. Pardalos and G. Xue, “Algorithms for a class of isotonic regression problems,” Algorithmica, vol. 23, no. 3, pp. 211–222, 1999.
[36] C. H. Park and Y. H. Park, “Investigating purchase conversion by uncovering online visit patterns,” Mark. Sci., vol. 35, no. 6, pp. 894–914, 2016.
[37] A. Pitman and M. Zanker, “Insights from applying sequential pattern mining to e-commerce click stream data,” in IEEE ICDM Workshop, 2010, pp. 967–975.
[38] J. Qiu, Z. Lin, and Y. Li, “Predicting customer purchase behavior in the e-commerce context,” Electron. Commer. Res., vol. 15, no. 4, pp. 427–452, 2015.
[39] P. Romov and E. Sokolov, “Recsys challenge 2015: ensemble learning with categorical features,” in Proc. Inter. ACM RecSys Challenge, 2015, pp. 1–4.
[40] B. Schröder, Ordered Sets: An Introduction with Connections from Combinatorics to Topology, Birkhäuser, 2016.
[41] C. Sismeiro and R. E. Bucklin, “Modeling purchase behavior at an e-commerce web site: A task-completion approach,” J. Mark. Res., vol. 41, no. 3, pp. 306–323, 2004.
[42] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Math. Program. Comput., in press.
[43] Q. F. Stout, “Isotonic regression for multiple independent variables,” Algorithmica, vol. 71, no. 2, pp. 450–470, 2015.
[44] R. J. Tibshirani, H. Hoefling, and R. Tibshirani, “Nearly-isotonic regression,” Technometrics, vol. 53, no. 1, pp. 54–61, 2011.
[45] E. Turban, J. Outland, D. King, J. K. Lee, T. P. Liang, and D. C. Turban, Electronic Commerce 2018: A Managerial and Social Networks Perspective, Springer, 2017.
[46] D. Van den Poel and W. Buckinx, “Predicting online-purchasing behaviour,” Eur. J. Oper. Res., vol. 166, no. 2, pp. 557–575, 2005.
[47] A. Vieira, “Predicting online user behaviour using deep learning algorithms,” arXiv preprint arXiv:1511.06247, 2015.
[48] J. Wang and S. K. Ghosh, “Shape restricted nonparametric regression with Bernstein polynomials,” Comput. Stat. Data Anal., vol. 56, no. 9, pp. 2729–2741, 2012.
[49] S. Warshall, “A theorem on boolean matrices,” J. ACM, vol. 9, no. 1, pp. 11–12, 1962.
[50] Z. Wu, B. H. Tan, R. Duan, Y. Liu, and R. S. Mong Goh, “Neural modeling of buying behaviour for e-commerce from clicking patterns,” in Proc. Inter. ACM RecSys Challenge, 2015, pp. 1–4.
[51] J. Yeo, S. Kim, E. Koh, S. W. Hwang, and N. Lipka, “Predicting online purchase conversion for retargeting,” in Proc. ACM Inter. Conf. WSDM, 2017, pp. 591–600.
[52] Z. Yi, D. Wang, K. Hu, and Q. Li, “Purchase behavior prediction in m-commerce with an optimized sampling methods,” in IEEE ICDMW, 2015, pp. 1085–1092.
[53] Y. Zhang and M. Pennacchiotti, “Predicting purchase behaviors from social media,” in Proc. Inter. Conf. WWW, 2013, pp. 1521–1532.
[54] Y. Zhao, L. Yao, and Y. Zhang, “Purchase prediction using Tmall-specific features,” Concurr. Comput., vol. 28, no. 14, pp. 3879–3894, 2016.


(a) $N=3$ , $(n,m)=(7,3)$	(b) $N=5$ , $(n,m)=(7,3)$	(c) $N=10$ , $(n,m)=(7,3)$

(d) $N=3$ , $(n,m)=(5,6)$	(e) $N=5$ , $(n,m)=(5,6)$	(f) $N=10$ , $(n,m)=(5,6)$


(a) $N=3$ , $(n,m)=(7,3)$	(b) $N=5$ , $(n,m)=(7,3)$	(c) $N=10$ , $(n,m)=(7,3)$

(d) $N=3$ , $(n,m)=(5,6)$	(e) $N=5$ , $(n,m)=(5,6)$	(f) $N=10$ , $(n,m)=(5,6)$


(a) SeqEmp, $v_{3}=0$	(b) SeqEmp, $v_{3}=1$	(c) SeqEmp, $v_{3}=2$

(d) SeqUM, $v_{3}=0$	(e) SeqUM, $v_{3}=1$	(f) SeqUM, $v_{3}=2$

(g) SeqUS, $v_{3}=0$	(h) SeqUS, $v_{3}=1$	(i) SeqUS, $v_{3}=2$


(a) SeqEmp, $v_{3}=0$	(b) SeqEmp, $v_{3}=1$	(c) SeqEmp, $v_{3}=2$

(d) SeqUM, $v_{3}=0$	(e) SeqUM, $v_{3}=1$	(f) SeqUM, $v_{3}=2$

(g) SeqUS, $v_{3}=0$	(h) SeqUS, $v_{3}=1$	(i) SeqUS, $v_{3}=2$

Predicting Online Item-choice Behavior: A Shape-restricted Regression Approach

Abstract

Index Terms:

I Introduction

II Related work

II-A Prediction of online user behavior

II-B Shape-restricted regression

III Two-dimensional probability table

III-A Empirical probability table

III-B Two-dimensional monotonicity model

IV PV sequence model

IV-A PV sequence

IV-B Operations based on recency and frequency

Definition 1 (Up).

Definition 2 (Move).

Definition 3 (Swap).

IV-C Partially ordered sets

Definition 4 (⪯UM\preceq_{\texttt{UM}}).

Definition 5 (⪯US\preceq_{\texttt{US}}).

Definition 6 (⪯lex\preceq_{\texttt{lex}}).

Lemma 1.

Theorem 1.

Proof.

Theorem 2.

IV-D Shape-restricted optimization model

V Algorithms for transitive reduction

V-A Transitive reduction

V-B General-purpose algorithms

Lemma 2.

V-C Constructive algorithms

Theorem 3.

Proof.

Theorem 4.

Proof.

VI Experiments

VI-A Methods for comparison

VI-B Performance evaluation methodology

VI-C Effects of the transitive reduction

VI-D Prediction performance of our PV sequence model

VI-E Analysis of estimated item-choice probabilities

VII Conclusion

Appendix A Proofs

A-A Proof of Theorem 3

The “only if” part

The “if” part

A-B Proof of Theorem 4

The “only if” part

The “if” part

Appendix B Pseudocode

B-A Constructive algorithm for (Γ,EUM∗)(\Gamma,E^{*}_{\texttt{UM}})

B-B Constructive algorithm for (Γ,EUS∗)(\Gamma,E^{*}_{\texttt{US}})

References

Predicting Online Item-choice Behavior:
A Shape-restricted Regression Approach

Definition 4 ( $\preceq_{\texttt{UM}}$ ).

Definition 5 ( $\preceq_{\texttt{US}}$ ).

Definition 6 ( $\preceq_{\texttt{lex}}$ ).

B-A Constructive algorithm for $(\Gamma,E^{*}_{\texttt{UM}})$

B-B Constructive algorithm for $(\Gamma,E^{*}_{\texttt{US}})$