Data-driven Preference Learning Methods
for Sorting Problems with Multiple Temporal Criteria

Yijun Li ¹¹1Both authors contributed equally to this work. School of Data Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Kowloon, Hong Kong SAR Mengzhuo Guo ²²2Both authors contributed equally to this work. Business School, Sichuan University, No.24 South Section 1, Yihuan Road, Chengdu, China, 610065 Miłosz Kadziński Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland Qingpeng Zhang Musketeers Foundation Institute of Data Science and LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China

Abstract

We present novel preference learning approaches for sorting problems with multiple temporal criteria. They leverage an additive value function as the basic preference model, adapted for accommodating time series data. Given assignment examples concerning reference alternatives, we learn such a model using convex quadratic programming. It is characterized by fixed-time discount factors and operates within a regularization framework. This approach enables the consideration of temporal interdependencies between timestamps while mitigating the risk of overfitting. To enhance scalability and accommodate learnable time discount factors, we introduce a novel monotonic Recurrent Neural Network (mRNN). It captures the evolving dynamics of preferences over time, while upholding critical properties inherent to MCS problems. These include criteria monotonicity, preference independence, and the natural ordering of classes. The proposed mRNN can describe the preference dynamics by depicting piecewise linear marginal value functions and personalized time discount factors along with time. Thus, it effectively combines the interpretability of traditional sorting methods with the predictive potential offered by deep preference learning models. We comprehensively assess the proposed models on synthetic data scenarios and a real-case study centered on classifying valuable users within a mobile gaming app based on their historical in-game behavioral sequences. Empirical findings underscore the notable performance improvements achieved by the proposed models when compared to a spectrum of baseline methods, spanning machine learning, deep learning, and conventional multiple criteria sorting approaches.

keywords:

Preference learning , Decision analysis , Sorting , Additive value function , Deep learning , Recurrent neural networks , Temporal criteria

1 Introduction

The realm of Multiple Criteria Sorting (MCS) problems, also known as ordinal classification, pertains to evaluating alternatives based on their performance across multiple criteria. Subsequently, these alternatives are categorized into pre-defined classes that inherently embody a preference order [44]. This field has emerged as pivotal within Multiple Criteria Decision Aiding (MCDA) [16, 47]. It finds widespread use across a diverse spectrum of applications, including nanotechnology assessments [31], consumer preference analysis [24], credit rating [15], and estimating migration potential [3].

A contemporary trend in sorting problems has emerged, necessitating the Decision Makers’ (DMs’) involvement in expressing preferences through assignment examples. Subsequently, such holistic preferences are utilized to construct a compatible preference model within a preference disaggregation framework, as outlined in, e.g., [14, 52]. Thus inferred model can be subsequently employed to assign non-reference alternatives using a value-driven threshold-based sorting procedure.

The underlying idea in data-driven preference discovery or learning is similar. That is, a preference model is autonomously constructed using a batch of historical decisions (training samples) to predict outcomes for new data samples [17]. However, diverging from traditional MCDA methods for preference learning, these approaches leverage the capabilities of information technology to manage high-dimensional data. Consequently, their primary focus revolves around the ability to capture intricate patterns between input and output data and to excel in the realm of scalable decision-making [12]. Notably, these methods accommodate non-monotonic criteria [18, 23, 41], consider interactions between criteria [25, 26, 40], and are adaptable to extracting criteria from user-generated content [24, 38, 49]. This flexibility renders them well-suited to capture the complexities inherent in real-world decision-making scenarios.

While data-driven preference learning methods have made significant strides in various decision problems, they face limitations in handling scenarios where the decision criteria involve time series data. Incorporating temporal criteria enables DMs to discern preference-changing patterns over time, a crucial aspect in numerous real-world applications. For instance, in credit risk evaluation, a DM may need to assess consumer default risk based on historical behavioral data encompassing purchase and loan repayment records [4, 37]. Also, time series contribute to developing robust long-term electricity demand forecasts in energy forecasting, as exemplified in [1]. Further, continuous recording of a patient’s health status in the healthcare sector generates a temporal data series. This chronicle of information unveils patterns in readmission and elucidates the behavioral trajectories associated with chronic diseases [36]. The temporal dimension in these instances proves instrumental in extracting valuable insights and enhancing predictive modeling.

To preserve temporal information in MCDA problems, one approach is to apply traditional preference learning models directly to time series. For example, Angelopoulos et al. [1] assume each time series has a quantitative impact on the decision and evaluates time series by a value-based disaggregation model. However, this approach neglects crucial temporal information, such as the fact that yesterday’s behaviors may be more relevant to the current decision than those from a week ago. Alternatively, fixed time discount factors or some statistics, such as the time series average, tendency, and seasonality, are introduced to account for the varying importance of nearby behaviors. For instance, Campello et al. [7] use descriptive statistics to study the temporal criteria impacts on a multi-objective preference learning model.

This paper tackles the MCS problems where the criteria are allowed to be time series, hence further bridging the gap between MCDA and temporal criteria. We aim to address the flaws of the existing approaches in this stream by treating performance for each timestamp as a criterion associated with a time discount factor. The adopted preference model comprises a collection of additive piecewise linear value functions. However, using such functions to approximate each timestamp’s contribution to the alternative’s comprehensive value would result in a highly complex preference model that would be difficult to interpret. Moreover, while excelling in replicating the provided assignment examples, it would exhibit poor predictive performance when applied to non-reference alternatives. This phenomenon is commonly referred to as over-fitting [41]. Also, as data volume and time series length increase, along with the inclusion of learnable time discount factors, such a model would demand substantial storage space and computational resources. This resource requirement can become particularly prohibitive when dealing with large datasets and long time series. Therefore, to facilitate modeling performance values at each timestamp of the criteria, we partition them into multiple sub-intervals delineated by characteristic points and propose dedicated preference learning algorithms.

Our methodological contribution is two-fold. First, we introduce a convex quadratic programming model, named the Temporal Preference Learning (TPL) model, to approximate additive piecewise linear functions for each timestamp of the criteria. This model operates within a regularization framework, effectively addressing the challenge of over-fitting. It applies a fixed discount factor to consecutive timestamps, accounting for the importance of nearby timestamps. While the optimization problem at hand can be directly solved, it bears the risk of incurring substantial computational costs as the scale of the data increases.

Second, we extend the proposed mathematical programming model by modifying the a monotonic recurrent neural network (mRNN) to suit the MCS problem context with preference-ordered classes and accommodate learnable discount factors. The resulting model can adaptively learn personalized time discount factors based on individual data sample characteristics, hence capturing the dynamics of changing preferences over time. Furthermore, the mRNN retains the intuitive and comprehensible nature of MCDA approaches. It accomplishes this by elucidating changes in time discount factors and by portraying obtained marginal value functions that adhere to pre-defined properties like monotonicity and preference independence.

Our most essential application-oriented contribution consists of using the proposed methods in a real-world problem in the gaming industry. Specifically, we distinguish high- and low-value mobile game users based on their historical behavior sequences, i.e., in-game purchases and action records. The study provides an in-depth analysis of individual users’ evolving preferences and the learned time discount factors. In this way, we confirm that the proposed methodologies facilitate a deeper understanding of users’ evolving behavior patterns for the DM.

We also rigorously evaluate the performance of the proposed models and a spectrum of baseline models in terms of their predictive performance. The considered approaches involve machine learning, deep learning, and conventional MCDA approaches. The results – quantified in terms of accuracy, precision, recall, and F-score – are further verified on synthetic data with various complexities of time series, different characteristics of data generators, and assumptions underlying an assumed value-based model.

2 Related Work

2.1 Generalizations of Value-based Preference Learning Methods

Value-based preference learning methods have gained prominence due to their ability to disaggregate indirect preferences, reducing the cognitive burden on the DM, and using an intuitive preference model [12]. Extensive research in this domain has addressed various aspects, including robust recommendations [22, 32], representative preference model selection [20], and contingent preference disaggregation [33, 39].

Typically, these methods assumed that criteria are static, monotonic, and preference-independent, simplifying the description of decision problems. Recent advancements have focused on relaxing these premises to extend the applicability of value-based MCDA methods. One avenue of research involves allowing the marginal value functions to be non-monotonic. For instance, Ghaderi et al. [18] introduced a linear fractional programming model to account for preference inflections. Guo et al. [23] utilized a progressive algorithm to adaptively adjust non-monotonic marginal value functions by constraining the variations of their slopes. Researchers have also incorporated regularization frameworks to automatically smooth marginal value functions for approximating piecewise linear functions using efficient learning algorithms [41]. Additionally, Guo et al. [26] proposed a hybrid model that permits the preference model to be expressed as polynomials, enabling a machine learning-enabled, non-monotonic, and expressive MCDA approach for complex real-world applications.

In scenarios involving interacting criteria, Angilella et al. [2] considered both positive and negative interactions, including an additional term to the comprehensive value. Within the framework of robust ordinal regression, bonus and penalty values were introduced for the same purpose [21]. Guo et al. [25] developed a model based on factorization machines to manually model criteria interactions, proving effective in capturing flexible marginal value function patterns. Recent work addressed the challenges posed by data-intensive tasks with interacting criteria by incorporating a regularization scheme into value-based MCDA methods [40].

The advanced models mentioned above have been designed for static criteria, where performance values remain constant over time. In contrast, handling time series criteria introduces temporal dynamics into decision-making. Some efforts have been made in this direction, such as Yan et al. [50] analyzing trends in alternatives by aggregating data across periods and Banamar [5] extending the PROMETHEE method to incorporate the importance of temporality. Further, Angelopoulos et al. [1] have assumed that each time series can output a quantitative measure to the decision and used a preference disaggregation framework to obtain robust forecasts for long-term electricity demand. Thesari et al. [46] have collected statistical data about public resource distribution to determine a timeline of budget data and employed a classic MCDA approach to analyze long-term manager behaviors. Such forecasting methods have also been employed to predict future rankings under uncertainty [8]. A recent approach uses descriptive statistics measuring the time series trend and seasonality to help understand the DM’s preference in the long term under a preference disaggregation scheme [7]. Finally, a variant of the TOPSIS method with a tensorial formulation that embeds time as the third dimension has been proposed to capture temporal information in criteria [9].

In summary, the developments mentioned above either treat the time series directly as the criteria values or use some manual statistics to describe their temporal characteristics. The first stream of methods may underscore the temporal relationship. In turn, the other stream may fail to obtain explicit time-dependent patterns of preferences, which is a crucial advantage of value function-based MCDA methods [22]. We aim to address these shortcomings in the paper.

2.2 Deep Preference Learning in Multiple Criteria Decision Aiding

Artificial neural networks (ANN) and their efficient learning algorithms, known as deep learning (DL), have found increasing adoption in business analytics and operational research due to their remarkable performance in predicting high-dimensional and scalable data [34]. For instance, in the context of MCDA, researchers have explored the application of feedforward neural networks in addressing decision problems within a supervised learning framework [48]. These studies have introduced various desirable properties related to multiattribute preference models. In other instances, rather than adhering to multi-attribute value theory, Hu [30] designed an outranking-based single-layer perceptron for bankruptcy prediction. They employed a genetic algorithm to determine the parameters of the preference model based on indirect pairwise comparisons among the alternatives.

However, one of the primary challenges in deploying ANN or DL in MCDA problems lies in ensuring the interpretability of the models. The DM often requires a deeper understanding of the decision process and the relationships between criteria and recommended decisions. To address this concern, Guo et al. [26] proposed a hybrid DL-based model. It seeks to enhance predictive performance while also delivering interpretable results. More recent research systematically integrated DL into six classic MCDA models, introducing a monotonic block to ensure that criteria assumptions are satisfied [42]. The authors aimed to bridge the gap between DL and MCDA by incorporating novel network structures that cater to the interpretability requirements of MCDA.

In summary, deep preference learning in MCDA has primarily focused on enhancing interpretability in decision models [26] or adapting neural networks to meet specific decision-making requirements [42]. However, these efforts have not extended to decision contexts involving criteria with temporal information. Addressing this temporal dimension in decision-making remains an area for development.

3 The Proposed Temporal Preference Learning Model

3.1 Problem Description

Let us the following notation to describe MCS problems in the presence of temporal criteria:

1.

$A^{R}=\{a,b,\ldots\}$ – a set of reference alternatives. These alternatives can be regarded as training samples, and their classifications are known.
2.

$A=\{a_{1},a_{2},\ldots\}$ – a set of non-reference alternatives to be classified.
3.

$Cl=\{Cl_{1},\ldots,Cl_{H}\}$ – a set of pre-defined decision classes exhibiting a natural ordering, $Cl_{h}\succ Cl_{h-1},h=2,\dots,H$ , indicating that the alternatives in class $Cl_{h}$ are preferred to those in class $Cl_{h-1}$ . In the threshold-based MCDA methods, class $Cl_{h}$ is bounded by real-valued thresholds $\theta_{h}$ and $\theta_{h-1}$ .
4.

$G=\{g_{1},\ldots,g_{j},\ldots,g_{m}\}$ – a family of $m$ temporal criteria, and $\mathbf{g}_{j}(a)$ is a time series containing $T_{j}$ timestamps, i.e., $\mathbf{g}_{j}(a)=\left(g^{1}_{j}(a),\ldots,g^{t}_{j}(a),\ldots,g^{T_{j}}_{j}(a)\right)\in\mathbb{R}^{T_{j}},a\in A\cup A^{R}$ , where $g^{t}_{j}(a)$ is the performance value of alternative $a$ on criterion $g_{j}$ at $t$ -th timestamp.

We aim to learn a preference model from the training samples $a\in A^{R}$ given their assignments $Cl(a)\in Cl$ and then deduce the classification of non-reference alternatives $a^{\prime}\in A$ . A widely used preference model is an additive value function $U(\cdot)$ aggregating the performances of each alternative $a\in A\cup A^{R}$ on all criteria into a comprehensive score $U(a)$ :

U(a)=\sum_{j=1}^{m}u_{j}^{T_{j}}(\mathbf{g}_{j}(a)),

(1)

where $u_{j}^{T_{j}}(\mathbf{g}_{j}(a))$ is the marginal value on criterion $g_{j}$ at the last timestamp, i.e., the ${T_{j}}$ -th timestamp of the $j$ -th time series. In the value-based MCDA methods, the criteria are assumed to be preferentially independent and monotonic [22]. We define the monotonicity of temporal criteria as follows:

Definition 1.

A temporal criterion $g_{j}$ is monotonic if $u_{j}^{T_{j}}(\mathbf{g}_{j}(a))>u_{j}^{T_{j}}(\mathbf{g}_{j}(b))$ , given $g_{j}^{t^{\prime}}(a)>g_{j}^{t^{\prime}}(b),\forall t^{\prime}\in T^{c}$ at some timestamps, and $g_{j}^{t}(a)=g_{j}^{t}(b),\forall t\in T$ at the other timestamps, where $T\cap T^{c}=\varnothing,T\cup T^{c}=\{1,\ldots,T_{j}\}$ .

Definition 1 establishes that in the context of two time series featuring the $j$ -th criterion, diverging only at a singular performance value occurring at the $t$ -th timestamp, a preference is accorded to the marginal value associated with the superior performance. As stated in Figure 1, the marginal value is defined as the last timestamp’s value, which is related to each timestamp’s performance. Then, the comprehensive value aggregates the marginal values associated with the last timestamps’ for all criteria. Therefore, alternative $a$ is at least as good as alternative $b$ on a given criterion if only one timestamp’s value of $a$ is greater or equal to that for $b$ , with all values at other timestamps equal. This preference aligns with the monotonicity principles delineated in static criteria, as expounded upon in [22].

Refer to caption — Figure 1: Threshold-based sorting procedure with sub-marginal value functions. The sub-marginal value functions $f_{j}^{t}(\cdot)$ of timestamps aggregate the marginal value functions $u_{j}^{T_{j}}(\cdot)$ considering time discount factors. Note that the time discount factor $\tau_{j,t}$ can be fixed to $\tau$ (Section 3) or learnable across criteria and timestamps (Section 4), and the length of the time series can be fixed or not. The comprehensive value of an alternative is determined by $m$ marginal values, and the assignment is decided by comparing the comprehensive value to the delimiting class thresholds.

We approximate marginal value function $u_{j}^{T_{j}}(\cdot)$ using piecewise linear one, aggregating a set of functions $u_{j}^{t}(\cdot)$ corresponding to each timestamp $t$ . Let $\alpha_{j}^{t}=\min_{a\in\{A\cup A^{R}\}}g_{j}^{t}(a)$ and $\beta_{j}^{t}=\max_{a\in\{A\cup A^{R}\}}g_{j}^{t}(a)$ be the worst and best performance on criterion $g_{j}$ at timestamp $t$ . We then divide $\left[\alpha_{j}^{t},\beta_{j}^{t}\right]$ into $\gamma_{j}^{t}\geq 1$ sub-intervals, i.e., $\left[x_{j}^{t,0},x_{j}^{t,1}\right]$ , $\left[x_{j}^{t,1},x_{j}^{t,2}\right],\ldots,$ $\left[x_{j}^{t,\gamma_{j}^{t}-1},x_{j}^{t,\gamma_{j}^{t}}\right]$ , where $x_{j}^{t,k}=\alpha_{j}^{t}+\frac{k}{\gamma_{j}^{t}}(\beta_{j}^{t}-\alpha_{j}^{t}),k=0,1,\ldots,\gamma_{j}^{t}$ . We define a sub-marginal value $f_{j}^{t}(a)$ of alternative $a$ on criterion $g_{j}$ at $t$ -th timestamp:

f_{j}^{t}(g_{j}^{t}(a))=f_{j}^{t}(x_{j}^{t,k_{j}})+\frac{g^{t}_{j}(a)-x^{t,k_{j}}_{j}}{x^{t,k_{j}+1}_{j}-x^{t,k_{j}}_{j}}\left(f_{j}^{t}(x_{j}^{t,k_{j}+1})-f_{j}^{t}(x_{j}^{t,k_{j}})\right),\quad\mbox{if}\quad g_{j}^{t}(a)\in\left[x_{j}^{t,k_{j}},x_{j}^{t,k_{j}+1}\right].

(2)

Note that $u_{j}^{t}(\cdot)$ is related to $f_{j}^{t}(\cdot)$ at $t$ -th timestamp and the values before $t$ , but the correlation should be decayed along with the time. We assume there is a time discount factor $\tau_{j,t}$ for $j$ -th criterion at $t$ -th timestamp, which is set to be fixed in this section, i.e., $\tau=\tau_{1,1}=\tau_{1,2}=\ldots=\tau_{m,T_{m}}$ . The $t$ -th marginal value is only associated with the $(t-1)$ -th timestamp:

	$\displaystyle u_{j}^{1}(g_{j}^{1}(a))$	$\displaystyle=f_{j}^{1}(g_{j}^{1}(a)),$
		$\displaystyle\cdots$
	$\displaystyle u_{j}^{t}(g_{j}^{t}(a))$	$\displaystyle=f_{j}^{t}(g_{j}^{t}(a))+\tau\cdot u_{j}^{t-1}(g_{j}^{t-1}(a))=f_{j}^{t}(g_{j}^{t}(a))+\sum_{k=1}^{t-1}\tau^{t-k}f_{j}^{k}(g_{j}^{k}(a)),\quad\forall t\in\{2,\ldots,T_{j}\},$

where $\tau\in\left[0,1\right]$ indicates how much information should be remembered at the last timestamp. Note that when $\tau=0$ , the temporal information is not considered. Then, the comprehensive value of alternative $a$ is:

\displaystyle U(a)

\displaystyle=\sum_{j=1}^{m}u_{j}^{T_{j}}(\mathbf{g}_{j}(a))

\displaystyle=\sum_{j=1}^{m}\left[f_{j}^{T_{j}}(g_{j}^{T_{j}}(a))+\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}f_{j}^{t}(g_{j}^{t}(a))\right],

(3)

where $f_{j}^{t}(\cdot)$ is the sub-marginal value function since it does not consider the time discount factor. Denote $\Delta f_{j}^{t,k}=f_{j}^{t}(x_{j}^{t,k})-f_{j}^{t}(x_{j}^{t,k-1}),t=1,\ldots,T_{j},k=1,\ldots,\gamma_{j}^{t}$ , then Eq. (2) can be rewritten as:

f_{j}^{t}(g_{j}^{t}(a))=\sum_{k=1}^{k_{j}}\Delta f_{j}^{t,k}+\frac{g^{t}_{j}(a)-x^{t,k_{j}}_{j}}{x^{t,k_{j}+1}_{j}-x^{t,k_{j}}_{j}}\Delta f_{j}^{t,k_{j}+1}\quad if\quad g_{j}^{t}(a)\in\left[x_{j}^{t,k_{j}},x_{j}^{t,k_{j}+1}\right].

(4)

Let $\mathbf{f}_{j}^{t}=(\Delta f_{j}^{t,1},\ldots,\Delta f_{j}^{t,\gamma_{j}^{t}})^{\prime}$ and $\mathbf{v}_{j}^{t}(a)=(v_{j}^{t,1}(a),\ldots,v_{j}^{t,\gamma_{j}^{t}}(a))^{\prime}$ be the sub-marginal value and performance value vectors for criterion $g_{j}$ at $t$ -th timestamp, respectively, where

v_{j}^{t,k}(a)={\left\{\begin{array}[]{*{20}{l}}{1,\quad if\quad g_{j}^{t}(a)>x_{j}^{t,k},}\\ {\frac{g_{j}^{t}(a)-x_{j}^{t,k-1}}{x_{j}^{t,k}-x_{j}^{t,k-1}},\quad if\quad x_{j}^{t,k-1}\leq g_{j}^{t}(a)\leq x_{j}^{t,k},}\\ {0,\quad if\quad g_{j}^{t}(a)<x_{j}^{t,k-1}.}\end{array}\right.}

(5)

Hence, Eq. (3) be expressed as:

\displaystyle U(a)

\displaystyle=\mathbf{u}^{\prime}\mathbf{v}(a)=\sum_{j=1}^{m}\mathbf{u}_{j}^{\prime}\Lambda_{j}\mathbf{v}_{j}(a),

(6)

where $\mathbf{u}=(\mathbf{u}_{1}^{\prime},\ldots,\mathbf{u}_{m}^{\prime})^{\prime}$ and $\mathbf{v}(a)=(\mathbf{v}_{1}(a)^{\prime}\Lambda_{1},\ldots,\mathbf{v}_{m}(a)^{\prime}\Lambda_{m})^{\prime}$ . In particular, $\Lambda_{j}={\left[{{\begin{array}[]{*{20}{c}}{{\left({{\tau}}\right)}\mathop{{}}\nolimits^{{T_{j}-1}}}&{\cdots}&{0}\\ {\vdots}&{\ddots}&{\vdots}\\ {0}&{\cdots}&{\left(\tau\left)\mathop{{}}\nolimits^{{0}}\right.\right.}\end{array}}}\right]}$ , $\mathbf{u}_{j}=((\mathbf{f}_{j}^{1})^{\prime},\ldots,(\mathbf{f}_{j}^{T_{j}})^{\prime})^{\prime}$ , and $\mathbf{v}_{j}(a)=(\mathbf{v}_{j}^{1}(a)^{\prime},\ldots,\mathbf{v}_{j}^{T_{j}}(a)^{\prime})^{\prime}$ . Thus computed comprehensive value $U(a)$ is incorporated into a threshold-based sorting procedure, where the alternative’s assignment is decided based on the comparison of its score with the delimiting class thresholds (see Figure 1).

3.2 Temporal Preference Learning Model

The class assignments for reference alternatives can be translated into pairwise comparisons so that $a$ is preferred to $b$ ( $a\succ b$ ) if the desired class of $a$ is more preferred than the desired class of $b$ . Suppose there are total $N$ pairs of alternatives in training set $\mathcal{D}$ . Denote each pair as $\{(a,b),y_{i}\}_{i=1}^{N}$ , where $y_{i}=1$ if $a\succ b$ and $y_{i}=-1$ if $b\succ a$ . For $i$ -th pair $(a,b)\in A^{R}\times A^{R}$ such that $y_{i}=1$ , we require that the desired relation is reproduced as follows:

U(a)>U(b)\Leftrightarrow\mathbf{u}^{\prime}\mathbf{v}(a)>\mathbf{u}^{\prime}\mathbf{v}(b)\Leftrightarrow\mathbf{u}^{\prime}\mathbf{v}_{i}\geq-\xi_{i},

(7)

where $\xi_{i}\geq 0$ , $\mathbf{v}_{i}=\mathbf{v}(a)-\mathbf{v}(b)$ , and putting the minimization of $\sum_{i}^{N}\xi_{i}$ to the objective function leads to a model with the minimum inconsistency. Since we need to approximate a sub-marginal value function $f_{j}^{t}(\cdot)$ for each timestamp, there are total $\sum\nolimits_{j=1}^{m}\gamma_{m}$ functions. To avoid over-fitting when $\gamma_{j}$ increases [41], we propose a regularized variant of the Temporal Preference Learning (TPL) model, aiming to minimize the total inconsistency:

$\displaystyle(P0)\quad$	$\displaystyle\min\frac{1}{2}\left\\|{\mathbf{u}}\right\\|^{2}_{2}+C\cdot\sum\nolimits_{i=1}^{N}\xi_{i}$
$\displaystyle s.t.:\quad$	$\displaystyle y_{i}\mathbf{u}^{\prime}\mathbf{v}_{i}\geq 1-\xi_{i},$	(8)
	$\displaystyle\xi_{i}\geq 0,i=1,\ldots,N,$	(9)
	$\displaystyle\Delta f_{j}^{t,k}\geq 0,j=1,\ldots,m,t=1,\ldots,T_{j},k=1,\ldots,\gamma_{j}^{t}.$	(10)

There are $N+\sum\nolimits_{j=1}^{m}\sum\nolimits_{t=1}^{T_{j}}\gamma_{j}^{t}$ constraints in the optimization problem $P0$ . Such a high number can prevent obtaining a solution effectively when there are too many pairwise alternatives, i.e., $N\gg\sum\nolimits_{j=1}^{m}\sum\nolimits_{t=1}^{T_{j}}\gamma_{j}^{t}$ . To this end, we transform $P0$ to the following optimization problem:

$\displaystyle{(P1)}\quad$	$\displaystyle\min\frac{1}{2}\left\\|{\mathbf{u}}\right\\|^{2}_{2}-\sum\nolimits_{i=1}^{N}\mu_{i}$
$\displaystyle s.t.:\quad$	$\displaystyle\Delta f_{j}^{t,k}-\sum\nolimits_{i}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}\geq 0,j=1,\ldots,m,t=1,\ldots,T_{j},k=1,\ldots,\gamma_{j}^{t},$	(11)
	$\displaystyle C\geq\mu_{i}\geq 0,i=1,\ldots,N.$	(12)

The proof of the transformation is provided in A. Denote the optimal solution of problem $P1$ by $\mathbf{u}^{*}$ . In the threshold-based value-driven methods, a non-reference alternative $a$ is assigned to class $Cl_{h}$ if $\theta_{h-1}\leq U(a)<\theta_{h},h=1,\dots,H$ . Following [27], we set threshold $\theta_{h}$ between $Cl_{h+1}$ and $Cl_{h}$ in the middle between comprehensive values of two alternatives whose distance is the shortest among all pairs accurately assigned to the respective classes:

\theta_{h}=\frac{{\mathbf{u}^{*}}^{\prime}\mathbf{v}(a)+{\mathbf{u}^{*}}^{\prime}\mathbf{v}(b)}{2},

(13)

where

(a,b)=\mathop{\arg\min}\limits_{{{{{\left\{{\text{(}a,b\text{)}\in\mathop{{A}}\nolimits^{{R}}\times\mathop{{A}}\nolimits^{{R}}\left|{\mathbf{u}^{*}}^{\prime}\mathbf{v}\left(a\left)\geq{\mathbf{u}^{*}}^{\prime}\mathbf{v}\left(b\left),\;a\in A\mathop{{}}\nolimits_{{R}}^{{h+1}},\;b\in A\mathop{{}}\nolimits_{{R}}^{{h}},h=1,\ldots,H-1\right.\right.\right.\right.\right.}\right\}}}}}}{\mathbf{u}^{*}}^{\prime}(\mathbf{v}(a)-\mathbf{v}(b)),

(14)

where $A\mathop{{}}\nolimits_{{R}}^{{h}}\subset A^{R}$ , $h=1,\ldots,H$ is the subset of reference alternatives assigned to class $C_{h}$ . The Hessian matrix of the objective function in $P1$ is positive semidefinite, and the constraint functions are also convex. Thus, $P1$ is a convex quadratic programming problem, which can be solved by standard software packages, such as Cplex³³3https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer, python CVXOPT package⁴⁴4https://cvxopt.org/, and R⁵⁵5https://cran.r-project.org/web/packages/e1071/index.html.

In some decision problems, to interpret and explain the trade-offs between time series., we can normalize the sub-marginal value functions of all timestamps. Since the temporal criteria are assumed to be monotonic and preference-independent, we apply the following transformations:

\displaystyle f_{j}^{t}(x_{j}^{t,k_{j}})^{\prime}

\displaystyle=\frac{f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}\times\frac{f_{j}^{t}(x_{j}^{t,k_{j}})-f_{j}^{t}(x_{j}^{t,0})}{f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0})}=\frac{f_{j}^{t}(x_{j}^{t,k_{j}})-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}.

(15)

Eq. (15) ensures the minimal sub-marginal value, i.e., $f_{j}^{t}(x_{j}^{t,0})^{\prime}$ , is zero, and the sum of maximal sub-marginal values, i.e., $\sum\nolimits_{j,t}f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})$ , is one. Meanwhile, the obtained thresholds also need to be transformed:

Proposition 1.

If alternative $a$ is assigned to class $Cl_{h}$ , i.e., $\theta_{h-1}\leq U(a)<\theta_{h}$ , then the transformed threshold should satisfy $\theta_{h}^{\prime}=\frac{\theta_{h}-\sum_{j=1}^{m}{\left[f_{j}^{T_{j}}(x_{j}^{T_{j},0})+\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}f_{j}^{t}(x_{j}^{t,0})\right]}}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$ to ensure $\theta_{h-1}^{\prime}\leq U^{\prime}(a)<\theta_{h}^{\prime}$ , where $U^{\prime}(\cdot)$ is the comprehensive value computed based on the transformed sub-marginal value functions.

The proof of Proposition 1 is available in B. If we normalized the sub-marginal value functions, the optimized thresholds should also be transformed to ensure the classifications are consistent. Note that the thresholds are related to the maximal/minimal marginal values and the time discount factor.

4 Deep Preference Learning with Monotonic Recurrent Neural Network

The solution of the quadratic optimization-based TPL model introduced in Section 3.2 can require high computational effort. It may be too challenging for the existing optimization techniques when the number of assignment examples and timestamps in the temporal criteria scale up and even become unsolvable if the time discount factors are not fixed. Accumulating vast data, it is reasonable to foster the intersection of MCDA and deep learning to help the DM deal with large-scale decision problems [26, 42]. To this end, we formulate the standard RNNs to ensure the monotonicity in Definition 1 and propose a novel monotonic recurrent neural network (mRNN) for MCS problems with temporal criteria.

4.1 Recurrent Neural Network

RNN is an artificial neural network designed to process sequential data and time-series information due to the ability to capture temporal dependencies and patterns within sequences effectively. In contrast to traditional feed-forward neural networks, RNN introduces recurrent connections, allowing information to persist across time, which endows them with the capacity to maintain an internal state or memory. This dynamic state enables RNN to contextualize current inputs based on the history of previously encountered elements in the sequence, effectively modeling long-range dependencies.

The computation at each timestamp involves a non-linear activation function ( $\tanh(\cdot)$ ) applied to a combination of the current input and the previous hidden state., i.e.,

\displaystyle\mathbf{h}^{t}(a)=\tanh(\mathbf{W}_{g}\mathbf{g}^{t}(a)+\mathbf{W}_{h}\mathbf{h}^{t-1}(a)+\mathbf{b}),

(16)

In Eq. (16), $\mathbf{h}^{t}(a)\in\mathbb{R}^{c\times 1}$ is the hidden state at $t$ -th timestamp with pre-defined dimension $c$ , which comprises three parts. The first part comes from the input vector $\mathbf{g}^{t}(a)=(g_{1}^{t}(a),g_{2}^{t}(a),\cdots,g_{m}^{t}(a))^{\prime}\in\mathbb{R}^{m\times 1}$ containing $m$ temporal criteria at $t$ -th timestamp, and $\mathbf{W}_{g}\in\mathbb{R}^{c\times m}$ is the input-to-hidden weight matrix. The second part contains the information from the last timestamp $\mathbf{h}^{t-1}(a)$ , and $\mathbf{W}_{h}\in\mathbb{R}^{c\times c}$ is the hidden-to-hidden weight matrix that captures the hidden state transitions within the recurrent connections in the RNN, allowing the network to retain information from the past timestamps and model sequential dependencies. The last part $\mathbf{b}\in\mathbb{R}^{c}$ is the bias vector associated with the hidden state of the RNN, enabling the model to introduce shifts in the learned representations and can have a significant impact on the network’s ability to fit the data and generalize to test samples. In this way, the hidden state $\mathbf{h}^{t}(a)$ retains information about criteria from the current and previous timestamps, including $\mathbf{g}^{1}(a),\cdots,\mathbf{g}^{t}(a)$ , and carries it forward to influence future predictions.

The parameters in Eq. (16) can be optimized by standard gradient descent algorithms. The hyperbolic tangent activation function, $\tanh(\cdot)$ , squashes the input values into the range $(-1,1)$ , ensuring non-linear transformations, which is crucial for the RNN to learn complex patterns and dependencies in sequential data [35].

Standard RNN cannot be directly applied to MCS problems due to the following three issues. First, the non-linear activation functions cannot ensure the monotonic relationship as illustrated in Definition 1. It is essential in most MCDA problems since the preferences are usually stable and have an obvious positive/negative impact on the decisions. Second, RNN inherently processes all criteria jointly, assuming full correlation among them, which, however, contradicts the essence of preference independence assumption, where criteria are processed independently to help the DM simplify the decision problem and comprehend it better. Third, the output of standard RNN for classification problems is usually a probability for assignments, indicating no inherent superiority or inferiority between classes. Consequently, the cross-entropy loss function – the standard choice for classification problems – is not suitable for capturing the natural ordering in classes for the MCS problems. Moreover, as stated in Eq. (13), the optimized thresholds are outside the proposed TPL model, which may raise issues if one sets them in the same range, but better results would be obtained close to either class. Although this determination of thresholds is used in [27], the proposed mRNN can also handle such a problem.

4.2 Ensuring Monotonicity and Preference Independence

The sub-marginal value vector $\mathbf{f}^{t}(a)=(f_{1}^{t}(a),f_{2}^{t}(a),\cdots,f_{m}^{t}(a))^{\prime}\in\mathbb{R}^{m\times 1}$ for all $m$ criteria at the $t$ -th timestamp can be derived as a linear mapping from the hidden state $\mathbf{h}^{t}(a)$ :

\displaystyle\mathbf{f}^{t}(a)=\mathbf{W}_{f}\mathbf{h}^{t}(a),

(17)

where $\mathbf{W}_{f}\in\mathbb{R}^{m\times c}$ is a learnable weight matrix. The derivative of the sub-marginal value functions on all criteria at the $t$ -th timestamp should satisfy the following condition:

\displaystyle\frac{\partial\mathbf{f}^{t}(a)}{\partial\mathbf{g}^{t}(a)}

\displaystyle=\frac{\partial\mathbf{f}^{t}(a)}{\partial\mathbf{h}^{t}(a)}\frac{\partial\mathbf{h}^{t}(a)}{\partial\mathbf{g}^{t}(a)}=\mathbf{W}_{f}\tanh^{{}^{\prime}}(\cdot)\mathbf{W}_{g}\geq 0.

(18)

Given Eq. (18), since the derivative of $\tanh(\cdot)$ is strictly positive for all inputs, one only needs to impose a constraint during the training process to prohibit the training parameters, $\mathbf{W}_{f}$ and $\mathbf{W}_{g}$ , from negative values. Specifically, we bypass $\mathbf{W}_{f}$ and $\mathbf{W}_{g}$ through an activation function $\phi(\cdot)$ that guarantees the non-negativity of the output. This study employs the Rectified Linear Unit (ReLU), which sets all negative input values to zero and leaves positive input values unchanged, i.e., $\phi(x)=\max(x,0)$ . The advantages of adopting ReLU are its simplicity and computational efficiency and its ability to mitigate the vanishing gradient problem encountered during backpropagation in ANN [19]. In this way, we can guarantee that the sub-marginal value functions are monotonically correlated with the temporal criteria values.

To cater to the need for preference independence, we develop a novel architecture that processes each criterion in a parallel manner. Instead of sharing a common hidden state across all criteria, we allocate a unique one to each criterion whose parameters are trained independently. By adopting this approach, the hidden state of one criterion remains isolated from the others throughout the entire training procedure, ensuring that the obtained sub-marginal value function is not related to others. For simplicity, we assume the numbers of sub-intervals and timestamps are fixed, i.e., $\gamma_{j}^{t}=\gamma$ and $T_{j}=T,j=1,\ldots,m.$ The parameter update process for the proposed mRNN is as follows:

\displaystyle\mathbf{H}^{t}(a)=\tanh(\phi(\mathcal{W}_{v})\circledast\mathbf{V}^{t}(a)+\mathcal{W}_{h}\circledast\mathbf{H}^{t-1}(a)+\mathbf{B}),

(19)

where $\mathbf{V}^{t}(a)=[\mathbf{v}^{t}_{1}(a),\mathbf{v}^{t}_{2}(a),\cdots,\mathbf{v}^{t}_{m}(a)]\in\mathbb{R}^{m\times\gamma\times 1}$ are the transformed performance values on all criteria at timestamp $t$ . In this way, $\mathbf{H}^{t}(a)=[\mathbf{h}_{1}^{t}(a),\cdots,\mathbf{h}_{j}^{t}(a),\cdots,\mathbf{h}_{m}^{t}(a)]\in\mathbb{R}^{m\times s\times 1}$ considers both the performance values at the current $t$ -th timestamp and the historical information of $\mathbf{H}^{t-1}(a)$ . The operation $\circledast$ is defined as the tensor-wise multiplication:

	$\displaystyle\phi(\mathcal{W}_{v})\circledast\mathbf{V}^{t}(a)$	$\displaystyle=[\phi(\mathbf{W}_{v1})\mathbf{v}_{1}^{t}(a),\cdots,\phi(\mathbf{W}_{vj})\mathbf{v}_{j}^{t}(a),\cdots,\phi(\mathbf{W}_{vm})\mathbf{v}_{m}^{t}(a)]\in\mathbb{R}^{m\times s\times 1},$		(20)
	$\displaystyle\mathcal{W}_{h}\circledast\mathbf{H}^{t-1}(a)$	$\displaystyle=[\mathbf{W}_{h1}\mathbf{h}_{1}^{t-1}(a),\cdots,\mathbf{W}_{hj}\mathbf{h}_{j}^{t-1}(a),\cdots,\mathbf{W}_{hm}\mathbf{h}_{m}^{t-1}(a)]\in\mathbb{R}^{m\times s\times 1}.$		(21)

The parameters in $\phi(\mathcal{W}_{v}),\mathcal{W}_{h}$ and $\mathbf{B}$ are learnable. In particular, $\phi(\mathcal{W}_{v})=[\phi(\mathbf{W}_{v1}),\cdots,\phi(\mathbf{W}_{vj}),\cdots,\phi(\mathbf{W}_{vm})]\in\mathbb{R}^{m\times s\times\gamma}$ contains $m$ matrices with the size of $s\times\gamma$ , where $s$ is the hidden unit’s size and $\gamma$ is the pre-defined number of sub-intervals. Each matrix $\mathbf{W}_{vj}\in\mathbb{R}^{s\times\gamma}$ represents the sub-marginal values of the characteristic points in $\mathbf{v}^{t}_{j}(a)$ at $t$ -th timestamp. Moreover, $\mathcal{W}_{h}=[\mathbf{W}_{h1},\cdots,\mathbf{W}_{hj},\cdots,\mathbf{W}_{hm}]\in\mathbb{R}^{m\times s\times s}$ considers each criterion’s historical information of the previous $t-1$ timestamps in a preference independence manner by updating $\mathbf{h}_{j}^{t}(a)\in\mathbb{R}^{s\times 1}$ separately. At last, $\mathbf{B}=[\mathbf{b}_{1},\cdots,\mathbf{b}_{j},\cdots,\mathbf{b}_{m}]\in\mathbb{R}^{m\times s\times 1}$ is the matrix of bias. Figure 2 presents the training process of the proposed mRNN structure.

4.3 Learning Sub-marginal and Marginal Value Functions

This section discusses how to depict the DM’s preferences by the proposed mRNN structure. In Figure 3, we illustrate the aggregation of the sub-marginal value functions and the time discount factors. Recall that $\mathbf{f}^{t}(a)=[f^{t}_{1}(a),f^{t}_{2}(a),\cdots,f^{t}_{m}(a)]$ contains the sub-marginal values of all criteria at timestamp $t$ :

	$\displaystyle\mathbf{f}^{t}(a)$	$\displaystyle=\phi(\mathcal{W}_{f}^{t})\circledast\mathbf{H}^{t}(a)$		(22)
		$\displaystyle=[\phi(\mathbf{W}_{f1}^{t})\mathbf{h}^{t}_{1}(a),\phi(\mathbf{W}_{f2}^{t})\mathbf{h}^{t}_{2}(a),\cdots,\phi(\mathbf{W}_{fm}^{t})\mathbf{h}^{t}_{m}(a)],$		(23)

where $\phi(\mathcal{W}_{f}^{t})=[\phi(\mathbf{W}_{f1}^{t}),\phi(\mathbf{W}_{f2}^{t}),\cdots,\phi(\mathbf{W}_{fm}^{t})]\in\mathbb{R}^{m\times s\times 1}$ are learnable parameters and $\phi(\cdot)$ is the RuLU activation function to ensure the monotonicity. We ensure that each criterion’s sub-marginal value function and time discount factors are also learned independently.

We assume that the marginal value function $\mathbf{u}^{t}(a)=[u_{1}^{t}(a),u_{2}^{t}(a),\cdots,u_{m}^{t}(a)]$ at any given timestamp $t$ is composed of two distinct components. The first one comes from the discounted marginal value functions at timestamp $t-1$ , represented by $\boldsymbol{\tau}^{t-1}\odot\mathbf{u}^{t-1}(a)$ , where $\boldsymbol{\tau}^{t-1}=(\tau_{1,t-1},\ldots,$ $\tau_{j,t-1},\ldots$ , $\tau_{m,t-1})^{\prime}$ is a column vector of size $m$ for $(t-1)$ -th timestamp, and $\odot$ is the element-wise product. The other component is the sub-marginal value function $\mathbf{f}^{t}(a)$ at the current timestamp. Instead of using a fixed pre-defined time discount factor, mRNN allows $\boldsymbol{\tau}^{t-1}$ to be learnable:

\displaystyle\boldsymbol{\tau}^{t-1}=Q(\mathbf{H}^{t-1}(a)),

(24)

where $Q(\cdot)$ is a feed-forward network with a smaller size. In this way, our model can adaptively and dynamically adjust the impact of the sub-marginal values in the past, and the time discount factor is related to the input of criteria values. The marginal value function $\mathbf{u}^{t}(a)$ at timestamp $t$ can be obtained by:

\displaystyle\mathbf{u}^{t}(a)=\mathbf{f}^{t}(a)+\boldsymbol{\tau}^{t-1}\odot\mathbf{u}^{t-1}(a).

(25)

The comprehensive value function $U(a)$ can also be derived from the marginal value functions at the last timestamp $T$ :

$\displaystyle U(a)$	$\displaystyle=\sum_{j=1}^{m}u_{j}^{T}(g_{j}^{T}(a))$
	$\displaystyle=\sum_{j=1}^{m}f_{j}^{T}(g_{j}^{T}(a))+\tau_{j,T-1}u_{j}^{T-1}(g_{j}^{T-1}(a))$	(26)
	$\displaystyle=\sum_{j=1}^{m}f_{j}^{T}(g_{j}^{T}(a))+\tau_{j,T-1}f_{j}^{T-1}(g_{j}^{T-1}(a))+\cdots+\tau_{j,T-1}\cdots\tau_{j,1}f_{j}^{1}(g_{j}^{1}(a)).$

4.4 Loss Function with Ordinal Thresholds

Unlike in MCDA, the classification task in machine learning typically involves categories without inherent superiority or inferiority between them. Consequently, applying the cross-entropy loss function cannot capture the inherent orders of classes in MCDA problems, thus making it inefficient for decision-making.

An alternative solution is setting a decision boundary manually. For instance, such a boundary is often set to $0.5$ in a binary classification scenario, as the predicted probabilities typically range from 0 to 1. However, this configuration may not suit multiple classes, resulting in a sub-optimal outcome. Real-world decision-making problems require adaptive decision boundaries that can better accommodate the inherent complexities and variations in the data distribution.

To address this limitation, we present a novel loss function with ordinal thresholds, allowing the model to learn the classification threshold directly from the data and enabling the proposed mRNN structure to act like the threshold-based MCDA method. Note that the assignment of alternative $a$ to class $\hat{Cl}(a)$ can be considered by comparing its comprehensive value $U(a)$ to the thresholds $\theta_{0},\theta_{1},\cdots,\theta_{H}$ :

\hat{Cl}(a)=\begin{cases}Cl_{1},&\text{ if }\theta_{0}<U(a)\leq\theta_{1}\\ Cl_{2},&\text{ if }\theta_{1}<U(a)\leq\theta_{2}\\ \vdots&\\ Cl_{H},&\text{ if }\theta_{H-1}<U(a)\leq\theta_{H}\end{cases},

(27)

where $\theta_{0}$ and $\theta_{H}$ can be set as infinitely small and large numbers. Thus, we can derive the distribution of $\hat{Cl}(a)$ as:

\displaystyle P(\hat{Cl}(a)=Cl_{h}\mid\mathbf{v}(a))=P\left(\theta_{h-1}<U(a)\leq\theta_{k}\right)=\Phi\left(\theta_{h}-U(a)\right)-\Phi\left(\theta_{h-1}-U(a)\right),

(28)

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution that can be approximated by the Sigmoid function [6]. A loss function can then be constructed to minimize the negative log-likelihood of the predicted results compared to the true labels. The loss function for the proposed mRNN structure is:

\displaystyle\mathcal{L}

\displaystyle=\sum\nolimits_{a\in A^{R}}\mathcal{L}(a)=-\sum\nolimits_{a\in A^{R}}\sum\nolimits_{h=1}^{H}\mathbf{1}_{Cl(a)=Cl_{h}}\cdot\log[\Phi\left(\theta_{h}-U(a)\right)-\Phi\left(\theta_{h-1}-U(a)\right)],

(29)

where $Cl(a)$ is the true class of alternative $a\in A^{R}$ , and the thresholds are learned by the neural networks. In C, we present a numerical example illustrating how to utilize the ordinal thresholds in Eq. (29). Still, the recommended class assignments are established by comparing the values of learned $U(\cdot)$ with class thresholds rather than suggesting a class with the greatest probability defined above.

5 Experiments on Real-world Data Concerning Gaming Industry

In this section, we showcase the effectiveness of the proposed quadratic optimization models and the mRNN in addressing MCS problems with temporal criteria through a real case study. The objective is to distinguish between high- and low-value users for a Multiplayer Online Battle Arena (MOBA) mobile game app. Our experiments aim to provide insights into the following key questions:

1.

To what extent does modeling temporal relationships contribute to improved performance in solving MCS problems with time series criteria?
2.

To what degree are the proposed optimization and deep learning-based models superior to conventional machine learning and MCDA methods?
3.

How does the performance of the proposed models change if we neglect the initial periods and consider only later timestamps?
4.

Can we extract meaningful interpretations from the predictions and user preferences, especially concerning the time discount factor and marginal value functions?

5.1 Data Description

The dataset utilized in this study is sourced from one of China’s most prominent global Internet companies. It comprises four distinct time series spanning from April 30, 2023, to May 30, 2023. These time series pertain to various aspects of user behavior within a MOBA mobile game app. Specifically, the four-time series encompass the following dimensions: (a) Purchase Amounts (CNY): This sequence tracks the total payment made by each user at each timestamp, denominated in Chinese Yuan (CNY). It provides insights into users’ spending behavior within the game. (b) Purchase Frequency: This sequence records how frequently a user makes purchases within the game at each timestamp. It quantifies the user’s transactional frequency. (c) Time Spent (hours): This sequence captures the total time each user actively engages in the game. The time is measured in hours and reflects users’ engagement levels. (d) Log-in Frequency: This sequence documents the frequency with which a user logs into the game. It indicates the user’s interaction frequency with the game.

The dataset encompasses a total of 3,080 users, each of whom is categorized as either “high-value” or “low-value”. This classification is determined based on the sum of each user’s in-game purchases over seven days, spanning from May 24, 2023, to May 30, 2023, which is a commonly used time window in user lifetime value evaluation [51]. Users with a cumulative purchase amount greater than zero during this period are designated as high-value, constituting 47% of the total user population. The remaining users are classified as low-value.

Table 1: The descriptive statistics of different types of users.

Type of Users	Criteria	Mean	Standard Deviation	Min	Max
All Users	Purchase amounts	12.82	76.66	0	3,686
	Purchase frequency	0.47	1.32	0	38
	Time spent	5.50	5.50	0	72.59
	Log-in frequency	2.67	2.15	0	22
High-Value Users	Purchase amounts	20.00	99.87	0	3,686
	Purchase frequency	0.70	1.62	0	38
	Time spent	7.35	6.15	0	72.56
	Log-in frequency	3.45	2.37	0	22
Low-Value Users	Purchase amounts	5.25	37.93	0	1,958
	Purchase frequency	0.22	0.84	0	27
	Time spent	3.55	3.85	0	36.49
	Log-in frequency	1.86	1.51	0	14

The descriptive statistics across all timestamps are provided in Table 1. We have filtered out the users who never logged in, i.e., those with criteria values at any timestamp being zero. The visual representations in Figure 4 underscore significant differences between high-value and low-value users in terms of their historical in-game purchases and time spent within the app. Specifically, high-value users exhibit statistically higher engagement levels in these aspects than their low-value counterparts. These insights are the foundation for the subsequent benchmark models and their respective implementation details, which will be elaborated upon in the relevant subsections.

It is imperative to underscore that the data depicted in Figure 4 pertains to average values derived from distinct user groups instead of individualized patterns. Consequently, certain criteria may exhibit apparent correlations. However, it is essential to disentangle the misconception that increased user payment frequency inevitably corresponds to higher aggregate expenditure. This is exemplified in Figure 6, which delineates the nuanced fluctuations in individual criteria values, demonstrating their non-interdependence. Notably, in mobile gaming, expenditure patterns manifest divergently, with some users making substantial one-time payments and others opting for recurrent, albeit smaller, transactions. Consequently, the statistical averages of purchase amounts and frequency are interlinked, not due to the intrinsic nature of individual behaviors but rather due to the heterogeneous spending practices across user cohorts. It is pivotal to emphasize that the purview of this study centers on individual behaviors, as their impacts can exhibit considerable heterogeneity. Thus, the criteria under consideration are far from redundant within this analytical framework.

The $F-score$ , defined as the harmonic mean of a system’s precision and recall values, is used as the primary performance metric in evaluation:

F-score_{i}=\frac{(1+\beta^{2})\cdot recall_{i}\cdot precision_{i}}{\beta^{2}\cdot recall_{i}+precision_{i}}

(30)

where $recall_{i}=\frac{m_{ii}}{\sum\nolimits_{j=1}^{H}m_{ij}}$ , $precision_{i}=\frac{m_{ii}}{\sum\nolimits_{j=1}^{H}m_{ji}}$ , and $m_{ij},i,j=1,\ldots,H$ is the number of alternatives whose real class is $Cl_{i}$ with prediction being $Cl_{j}$ . The parameter $\beta$ controls the trade-off between recall and precision ( $\beta=1$ in most cases). The final $F-score=\frac{\sum\nolimits_{i=1}^{H}F-score_{i}}{H}$ is computed as the average $F-score$ across all classes, providing a comprehensive evaluation metric for the models’ performance. Apart from the $F$ measure, precision, and recall, we also report $accuracy=\frac{\sum\nolimits_{i=1}^{H}m_{ii}}{|A|}$ as the most intuitive measure reflecting the performance of predictive models.

All studies and experiments were run on Dell Precision 7920 Workstations with Intel(R) Xeon(R) Gold 6256 CPU at 3.60GHz, and NVIDIA Quadro GV100 GPUs. All models were implemented in Python 3.8. The versions of the main packages of our code are Pytorch 1.8.1+cu102, Scikit-Learn: 0.23.2, Numpy: 1.19.2, Pandas: 1.1.3, Matplotlib: 3.3.2. In addition, Pulp⁶⁶6https://coin-or.github.io/pulp/ and Mosek⁷⁷7https://www.mosek.com/ libraries were used to implement the optimization-based models.

5.2 Comparison with Baseline Models

We will first verify if conventional machine learning models and standard MCDA methods can successfully discriminate between high- and low-value without considering the time discount factor. This will let us understand the roles played by modeling temporal relationships. In this regard, we will compare the following methods:

1.

Temporal preference learning (TPL) model: This model corresponds to the optimization-based approach introduced in problem statement $P1$ . It optimizes the model parameters using all available data simultaneously. The optimization problem is solved using the CVXOPT Python package with the Mosek optimizer.
2.

The UTilités Additives DIScriminantes (UTADIS) method [52]: It is a classic approach for MCS problems. It utilizes linear programming to minimize misclassification errors, defined as the average distance of comprehensive values and thresholds corresponding to the desired classes. In this study, each timestamp is treated as an individual criterion in UTADIS, and a threshold-based preference model in the form of piecewise linear functions with equal segments is inferred using the Pulp Python package.
3.

UTADIS-Tensorial (UTADIS-T): It is adapted for MCS problems based on [7]. Specifically, we manually derive two descriptive statistics, including mean and slope coefficients, as illustrated in [7]. The first indicator can measure the average criteria value in the time series, while the second indicator can describe the volatility of each time series.
4.

Logistic regression (LR) [29]: It is a linear model aggregating input attributes (criteria in this study) using weighted sums. It transforms the output using a function determining the probability of assigning an alternative to a desired class. The hyper-parameter $C$ controls the degree of regularization applied to the model.
5.

Support vector machines (SVM) [13]: It is a supervised learning algorithm that maps training examples to points in space to maximize the margin between two classes. This study uses the criteria values for all timestamps as inputs to SVM. A Radial Basis Function kernel is employed to learn a non-linear classifier. The hyper-parameter $C$ controls the trade-off between maximizing the margin and minimizing classification errors on the training data.
6.

Random forests (RF) [28]: It is an ensemble learning method that aggregates multiple decision trees during training. It is often used as a black-box model because it can provide predictions across a wide range of data with minimal configuration. Hyper-parameters such as n_estimators, max_features, and max_depth play critical roles in controlling the behavior and performance of the ensemble of decision trees in RF.
7.

Extreme gradient boosting (XGB) [10]: It is a sequential model based on a gradient-boosting algorithm. It differs from bagging algorithms and can be parallelized. XGB incorporates regularization techniques to generate smaller trees, mitigating overfitting issues. Hyper-parameters like n_estimators and max_depth are crucial for controlling the behavior and performance of the ensemble of decision trees in XGB.
8.

Explainable Ordinal Factorization Model (XOFM) [25]: It is derived from the factorization machine and also uses piecewise linear functions to decipher the criteria impacts on the ordinal regression problems. Moreover, XOFM can account for low-order interacting criteria, which preserves comparable performance.

Note that we treat the classic UTADIS as a benchmark MCDA model, serving as a reference point for evaluating the performance of the proposed approaches. UTADIS-T is its recently proposed counterpart considering the temporal information in MCS. In turn, LR, SVM, RF, and XGB are widely used machine learning models for classification tasks where classes may not exhibit a natural order. They are trained to predict the probability of high-value using cross-entropy loss. Finally, XOFM is designed for ordinal classification problems.

We will also check if the proposed model performs better in user value evaluation problems than standard deep learning methods. The following methods will be compared in the experiments:

1.

Monotonic recurrent neural network (mRNN): It is a novel deep preference learning model designed for temporal criteria with learnable time discount factors. Several adaptations are made to ensure the criteria monotonicity, preference independence, and natural order between classes assumptions are satisfied. This model captures the temporal dynamics in the criteria while maintaining these critical properties.
2.

Multi-layer perceptron (MLP) [43]: It is a fully connected ANN. It can capture non-linear transformations and interactions between criteria, making it suitable for distinguishing data that is not linearly separable. This study employs a three-layered MLP as a deep learning-based baseline, with criteria values from all timestamps directly used as input.
3.

Recurrent neural network (RNN) [45]: It is a type of sequential ANN that considers internal states to process sequential data. The output from the current state can influence subsequent inputs. The proposed mRNN is adapted from the standard RNN architecture to better describe and address MCS problems with temporal criteria.
4.

Gated recurrent unit (GRU) [11]: It is a RNN variant incorporating a gating mechanism. It features a forget gate, allowing only some information from previous states to pass to the subsequent states. GRU is designed to capture long-range dependencies in sequential data efficiently.

The DL-based models, such as MLP, RNN, and GRU, are known for their efficiency in handling large-scale and high-dimensional data [35]. The hyper-parameters and DL model structures are tuned using the ten-fold cross-validation. The best parameters are presented in D.

Table 2 presents the values of all four measures averaged across ten-fold cross-validation. First, despite not explicitly learning temporal information, conventional machine learning models demonstrate comparable performance to the newly introduced, optimization-based TPL model. They treat each timestamp as an independent criterion, leveraging their robust predictive capabilities for handling high-dimensional data. Notably, these models mitigate data overfitting issues associated with increasing data dimensions. The UTADIS method, being a classic MCDA approach for MCS problems, exhibits the best precision performance but the worst recall performance. This indicates that most assignment results may go into the same class. It is less suited for scenarios with many criteria, particularly when each timestamp is treated separately. Consequently, it exhibits the weakest performance when temporal information is not considered. Additionally, it is noteworthy that the two state-of-the-art ordinal regression models, namely XOFM and UTADIS-T, exhibit superior performance solely in the realm of precision when compared with the proposed TPL and mRNN. This distinction arises because XOFM fails to consider temporal relationships inherent in time series data. In parallel, UTADIS-T relies solely on descriptive statistics to encapsulate preference-changing patterns, which proves less efficacious in mobile gaming scenarios characterized by highly stochastic in-game behaviors among users [51].

Table 2: Experimental results for user value evaluation across four metrics. Means and standard deviations are calculated via ten-fold cross-validation. Bold highlights the top-performing results, while stars (^∗) denote models significantly lower than the best-performing models in each metric (Wilcoxon statistic test

{}^{***}p<0.01,^{**}p<0.05,^{*}p<0.1

Method	Precision	Recall	F-score	Accuracy
SVM	0.811 $\pm$ 0.026^∗∗∗	0.746 $\pm$ 0.011^∗∗∗	0.777 $\pm$ 0.016^∗∗∗	0.781 $\pm$ 0.012^∗∗∗
Logistic	0.822 $\pm$ 0.028	0.722 $\pm$ 0.016^∗∗∗	0.768 $\pm$ 0.008^∗∗∗	0.777 $\pm$ 0.009^∗∗∗
RF	0.815 $\pm$ 0.032	0.754 $\pm$ 0.008^∗∗	0.783 $\pm$ 0.017^∗∗∗	0.786 $\pm$ 0.016^∗∗
XGB	0.788 $\pm$ 0.031^∗∗∗	0.762 $\pm$ 0.016^∗∗	0.774 $\pm$ 0.017^∗∗∗	0.773 $\pm$ 0.013^∗∗∗
MLP	0.819 $\pm$ 0.030	0.744 $\pm$ 0.012^∗∗∗	0.779 $\pm$ 0.012^∗∗∗	0.784 $\pm$ 0.013^∗∗∗
RNN	0.820 $\pm$ 0.023	0.753 $\pm$ 0.013^∗∗∗	0.785 $\pm$ 0.013^∗∗∗	0.789 $\pm$ 0.010^∗∗
GRU	0.830 $\pm$ 0.031	0.752 $\pm$ 0.015^∗∗∗	0.788 $\pm$ 0.014^∗∗	0.793 $\pm$ 0.012
UTADIS	0.831 $\pm$ 0.028	0.436 $\pm$ 0.027^∗∗∗	0.571 $\pm$ 0.023^∗∗∗	0.665 $\pm$ 0.015^∗∗∗
UTADIS-T	0.823 $\pm$ 0.025	0.533 $\pm$ 0.019^∗∗∗	0.647 $\pm$ 0.022^∗∗∗	0.716 $\pm$ 0.018^∗∗∗
XOFM	0.813 $\pm$ 0.022	0.625 $\pm$ 0.015^∗∗∗	0.707 $\pm$ 0.015^∗∗∗	0.727 $\pm$ 0.022^∗∗∗
TPL	0.757 $\pm$ 0.092	0.784 $\pm$ 0.098	0.770 $\pm$ 0.023^∗∗∗	0.745 $\pm$ 0.052^∗∗∗
mRNN $(\gamma=2$ )	0.804 $\pm$ 0.024^∗∗	0.784 $\pm$ 0.034	0.794 $\pm$ 0.024	0.792 $\pm$ 0.018
mRNN $(\gamma=4$ )	0.804 $\pm$ 0.029^∗∗∗	0.783 $\pm$ 0.017	0.793 $\pm$ 0.0183^∗	0.791 $\pm$ 0.018^∗∗
mRNN $(\gamma=6$ )	0.802 $\pm$ 0.042^∗∗	0.789 $\pm$ 0.032	0.794 $\pm$ 0.020	0.791 $\pm$ 0.021
mRNN $(\gamma=8$ )	0.814 $\pm$ 0.023^∗∗∗	0.781 $\pm$ 0.028	0.797 $\pm$ 0.019	0.796 $\pm$ 0.016

Regarding the performance of DL-based methods, the findings highlight several important observations. First, traditional MCDA methods like UTADIS and standard machine learning techniques demonstrate comparatively lower performance. Their limitations in effectively capturing and utilizing temporal information contribute to this disparity. Second, DL-based models, in general, perform better in handling temporal criteria in MCS problems. The model equipped with the proposed mRNN structure stands out as the top performer in three measures. Its effectiveness can be attributed to integrating prior knowledge about the monotonic preference direction and incorporating both sub-marginal value functions and loss functions tailored for ordinal classes. It is worth noting that increasing the pre-defined number of sub-intervals in the mRNN model does not consistently enhance its efficacy. In fact, there is a risk of overfitting when using a more significant number of sub-intervals. This explains why the mRNN model with $\gamma=6$ outperforms the one with $\gamma=8$ , underscoring the importance of carefully selecting hyperparameters to avoid overfitting.

5.3 Predictive Performance When Considering Different Numbers of Timestamps

In this section, we report the performance of the mRNN method when limiting its use to various numbers of timestamps. Specifically, its analysis is limited to the last 7 and 14 timestamps (hence omitting the initial period from consideration) or utilizing the entire time series (i.e., 24 timestamps). Table 3 presents the outcomes derived from this perspective.

Table 3: Results accounting for the last 7 (a week) and 14 (half of a month) timestamps and the entire time series (i.e., 24 timestamps).

Method	Precision	Recall	F-score	Accuracy
		$T=7$
mRNN $(\gamma=2$ )	0.7908 $\pm$ 0.0287^∗∗∗	0.7483 $\pm$ 0.0236^∗∗	0.7684 $\pm$ 0.0156^∗∗∗	0.7692 $\pm$ 0.0129^∗∗∗
mRNN $(\gamma=4$ )	0.7950 $\pm$ 0.0232^∗∗∗	0.7516 $\pm$ 0.0165^∗∗∗	0.7724 $\pm$ 0.0130^∗∗∗	0.7732 $\pm$ 0.0104^∗∗∗
mRNN $(\gamma=6$ )	0.7897 $\pm$ 0.0286^∗∗∗	0.7627 $\pm$ 0.0244^∗∗	0.7755 $\pm$ 0.0185^∗∗∗	0.7742 $\pm$ 0.0138^∗∗∗
mRNN $(\gamma=8$ )	0.7975 $\pm$ 0.0282^∗∗∗	0.7542 $\pm$ 0.0247^∗∗	0.7748 $\pm$ 0.0178^∗∗∗	0.7756 $\pm$ 0.0143^∗∗∗
		$T=14$
mRNN $(\gamma=2$ )	0.8066 $\pm$ 0.0257^∗	0.7720 $\pm$ 0.0166^∗	0.7885 $\pm$ 0.0113^∗∗	0.7878 $\pm$ 0.0118^∗∗∗
mRNN $(\gamma=4$ )	0.8071 $\pm$ 0.0193	0.7697 $\pm$ 0.0235^∗∗	0.7876 $\pm$ 0.0134^∗∗∗	0.7875 $\pm$ 0.0110^∗∗
mRNN $(\gamma=6$ )	0.8092 $\pm$ 0.0281	0.7720 $\pm$ 0.0272	0.7894 $\pm$ 0.0137^∗∗	0.7891 $\pm$ 0.0136^∗∗
mRNN $(\gamma=8$ )	0.8145 $\pm$ 0.0280	0.7689 $\pm$ 0.0215^∗	0.7905 $\pm$ 0.0133	0.7914 $\pm$ 0.0114
		$T=24$
mRNN $(\gamma=2$ )	0.8042 $\pm$ 0.0236	0.7839 $\pm$ 0.0339	0.7935 $\pm$ 0.0236	0.7917 $\pm$ 0.0183
mRNN $(\gamma=4$ )	0.8035 $\pm$ 0.0290	0.7834 $\pm$ 0.0171	0.7930 $\pm$ 0.0175^∗	0.7906 $\pm$ 0.0176^∗∗
mRNN $(\gamma=6$ )	0.8023 $\pm$ 0.0422	0.7890 $\pm$ 0.0321	0.7943 $\pm$ 0.0200	0.7907 $\pm$ 0.0205
mRNN $(\gamma=8$ )	0.8137 $\pm$ 0.0225	0.7809 $\pm$ 0.0283	0.7966 $\pm$ 0.0192	0.7961 $\pm$ 0.0163

Our findings reveal the optimal performance of mRNN when leveraging the entire time series for analysis. With an augmentation in the time series length, the model gains a more comprehensive understanding of users’ in-game purchase patterns, thereby enhancing predictive capabilities. Notably, the performance improvement becomes relatively marginal beyond $T=14$ . This can be attributed to the inherent randomness characterizing online gaming users’ behaviors, resulting in sparse records dominated by zero data points [51]. While longer sequences prove advantageous for deep learning-based models [35], their impact on the current user value is less pronounced compared to recent behaviors. Consequently, the observed improvements in predictions remain marginal.

5.4 Interpreting User Preferences

In this section, we analyze users’ preferences, focusing on two distinct perspectives. The primary objective is to explore the user’s marginal value function at each temporal timestamp, revealing its dynamic evolution over time. In F, we present a representative user’s sub-marginal values, time discount factors, and marginal values at each timestamp. This way, we show how to obtain comprehensive values of alternatives and then determine the classification.

Figure 5(a) presents the results delivered by UTADIS. Notably, it exhibits a counterfactual pattern, with the marginal values consistently remaining at zero for most timestamps. Only a limited number of timestamps, specifically in the purchase amount criterion, appear to impact user value significantly. This unconventional pattern raises questions about the UTADIS method’s ability to capture nuanced temporal information when analyzing user preferences effectively.

In contrast, the marginal value functions derived from the proposed TPL method display a more coherent and dynamic pattern (see Figure 5(b)). It assigns greater weight to the criteria purchase amount and purchase frequency, with most timestamps affecting discriminating user value. This aligns with the intuitive expectation that users who have spent more in terms of amount and frequency in the past are more likely to pay more in the future. However, these models neglect the impact of in-game activities, such as time spent in the game and log-in frequency.

Figure 5(c) represents the marginal value functions learned by the proposed mRNN. It confirms a more balanced weight to criteria such as in-game time spent and log-in frequency. Also, the marginal value functions do not approach zero as closely as those shown in Figure 5(b). The flexibility introduced by the learnable time discount factors in the mRNN model allows for the adjustment of relationships across timestamps, taking into account the specific characteristics of each user. This adaptability enables the mRNN model to capture the influence of in-game activities on user preferences, even when such activities do not directly contribute to monetary value.

The analysis underscores the limitations of UTADIS in capturing temporal preferences. Moreover, it highlights the advantages of the proposed TPL and mRNN methods in modeling complex temporal relationships in user preferences. In particular, mRNN demonstrates the capability to consider a broader impact of criteria and timestamps, making it a valuable tool for understanding and predicting user behavior.

The second aspect of our analysis involves an exploration of learnable time discount factors, affording us a deeper comprehension of users’ temporal behaviors. Notably, this investigation is exclusively addressed only by the proposed mRNN model. As elucidated in Figure 6, we present a visual representation of the learned discount factors, considering distinct users as examples. Meanwhile, we also offer graphical depictions of the normalized purchase amount, purchase frequency, time spent, and log-in frequency to understand the implications of the learned time discount factors.

Specifically, Figure 6(a) shows a high-valued user who frequently makes in-game purchases and log-in activities. In this scenario, the learned time discount factors for the purchase frequency criterion consistently maintain elevated values, reflecting the model’s imperative to retain a comprehensive memory of valuable historical information. In contrast, Figure 6(b) portrays a low-valued user whose involvement in in-game purchases and log-in activities is notably sparse. Here, the learned time discount factors rapidly decline, signifying the scarcity of valuable historical data. This observation is further substantiated by the purchase frequency criterion, which exhibits a discernible fluctuating pattern of decrease, increase, and subsequent decrease, mirroring a different trend observed in Figure 6(a).

5.5 Managerial Implications

Given the ubiquity of precision marketing, identifying high-value users is a pivotal reference for DM. Notably, prevailing MCDA methods exhibit proficiency in efficiently handling static criteria. However, challenges emerge as the data scale expands and becomes dynamic. In this context, user sequences, encapsulating personal behavioral patterns over time, emerge as valuable repositories of long-term insights for each user. The proposed models excel in offering precise predictions and interpretable outcomes pertaining to evolving user preference patterns over time. This capability proves particularly valuable in the following two applications.

Given the temporal analytical capabilities of the proposed models, a notable avenue of enhancement lies in alleviating the constraints associated with fixed recommended displays, thereby enabling what is termed “popup recommendation”. In practical terms, this implies the flexibility to expose recommendations at any point in time. In the context of our mobile gaming scenario, a pivotal task involves determining the optimal timing and target user for recommendation triggering. Real-time execution of our model allows for identifying users with higher comprehensive values, signifying heightened purchase intentions and, consequently, an augmentation of in-game revenue. Furthermore, the analysis of sub-marginal value functions facilitates an in-depth exploration of the nuanced impacts of diverse historical behaviors on the differentiation of high-value users. This nuanced understanding serves as a valuable reference for DM in crafting personalized and effective marketing strategies.

In mobile gaming, another impactful application emerges in the form of churn intervention. The key to this intervention lies in targeting users predicted to be of high value, yet exhibiting a rapid decline in their time discount factors over the recent period. Such a decline signifies either inactivity or a heightened risk of user departure. Beyond merely discriminating high-value users, the visual representation of learnable time discount factors, exemplified in Figure 6, offers a novel avenue for DM to identify those prone to churn. To proactively address this, the DM can devise custom tasks designed to incentivize continuous log-ins, subsequently rewarding users upon task completion. This strategic approach not only aids in user retention but also aligns with a personalized and targeted intervention strategy informed by predictive analytics.

6 Simulation Experiments

To further demonstrate the efficacy of the proposed models, in this section, we report the results of simulation experiments on four synthetic data with different complexity. We aim to answer the following questions:

1.

Do the proposed models perform better than conventional machine learning and MCDA methods when the time series become complex?
2.

Are the learned marginal value functions able to describe the characteristics of the original data generators?
3.

How does the proposed mRNN perform concerning different pre-defined numbers of sub-intervals?

6.1 Data Generation Process and Experimental Settings

The data generation process (DGP) employed in these experiments utilizes the Fourier decomposition method. This well-established technique allows for generating time series data with varying complexities and temporal patterns. It is based on the fundamental principle that any time series can be represented as a linear combination of sine or cosine basis functions.

Specifically, we create a set of nine basis functions, each corresponding to a sine function with a distinct frequency. These frequencies range from 0.1 to 0.5 with an increment of 0.05, resulting in a set ${\sin(\omega_{1}\pi t),\cdots,\sin(\omega_{9}\pi t)}$ (see Figure 7(a)). The performance value at each timestamp is generated by a linear combination of these basis functions, with weights randomly sampled from a Dirichlet distribution, ensuring that they sum to one:

\displaystyle g_{j}^{t}(a)

\displaystyle=\beta_{1}\sin(\omega_{1}\pi t)+\cdots+\beta_{9}\sin(\omega_{9}\pi t),\quad\mbox{s.t.}\quad\sum\nolimits_{i=1}^{9}\beta_{i}=1.

(31)

This way, we can generate diverse time series data with different frequencies and temporal patterns, which are essential for verifying the proposed models’ capacity to handle temporal information:

The DGP provides a versatile framework for constructing time series with varying frequencies and temporal patterns. We pre-define sub-marginal value functions as the ground model and then define different procedures for generating marginal values and reference assignment examples given these sub-marginal values. The aim is to use the proposed models to restore the simulated assignment examples in different settings and verify whether they can capture the characteristics of the DGP settings. The latter include the time discount factor distributions and the monotonicity of the sub-marginal value functions. To achieve this goal, we have designed four distinct experiments, each catering to specific temporal characteristics:

1.

Basic experiment: It represents the most straightforward scenario, establishing the simplest DGP for time series data. In this case, the sub-marginal value function, denoted as $f_{j}^{t}(a)$ , is a monotonically transformed function of the performance value $g_{j}^{t}(a)$ at the current timestamp. Specifically, we define it as $f_{j}^{t}(a)=\tanh(g_{j}^{t}(a))$ . .
2.

Non-Markovian experiment: It introduces temporal dependencies into the sub-marginal value function, which not only depends on the performance value at the current timestamp but also incorporates the performance value from the previous timestamp. Mathematically, it is defined as $f_{j}^{t}(a)=\tanh\left(g_{j}^{t}(a)+g^{t-1}_{j}(a)\right)$ .
3.

Non-monotonic experiment: It incorporates sub-marginal value functions deliberately made non-monotonic. Instead of a simple transformation, the function for criterion $j$ at timestamp $t$ is determined by a sine function: $f_{j}^{t}(a)=\sin\left(2^{j-1}\pi g_{j}^{t}(a)\right)$ . This experiment explores the models’ ability to handle non-monotonic relationships in the data.
4.

Non-independent experiment: It represents the most complex scenario, where the assumption of independence between criteria is relaxed. To achieve this, we modify the DGP by introducing an interaction term into the sub-marginal value function. Specifically, it is defined as $f^{t}(a)=\tanh\left(g_{1}^{t}(a)+g_{2}^{t}(a)\right)+\tanh\left(g_{3}^{t}(a)+g_{4}^{t}(a)\right)$ . This experiment assesses the models’ performance when criteria interact, making the data more intricate.

In each experiment, $u_{j}^{t}(a)$ is determined by iteratively applying a time discount factor $\tau_{j,t-1}$ to the previous marginal value $u_{j}^{t-1}(a)$ , along with the current sub-marginal value $f_{j}^{t}(a)$ , i.e., $u_{j}^{t}(a)=\tau_{j,t-1}u_{j}^{t-1}(a)+(1-\tau_{j,t-1})f_{j}^{t}(a)$ , where $\tau_{j,t-1}=\frac{1}{1+\exp(-u_{j}^{t-2}(a))}$ , indicating that a higher marginal value exerts a more significant influence on the subsequent period.

Deriving comprehensive values from the temporal marginal value functions involves summing up all the values of marginal functions at the last timestamp. Such a comprehensive value is then transformed to the range between 0 and 1 using a Sigmoid function. The distribution of these comprehensive values for all the data points in the Basic experiment is depicted in Figure 7(b). To determine class labels, samples with a comprehensive value less than 0.5 are labeled as negative samples, while those with a comprehensive value at least as good as 0.5 are labeled as positive samples. This data generation process ensures that the time series data exhibits dynamic temporal patterns, making it well-suited for training and evaluating models like mRNN.

For each of the four experiments, we generated a total of 3,000 data samples. Each data sample consists of four distinct time series, and all time series have a fixed length of 20 timestamps. Figure 7(c) presents an example data sample. We adopt a robust ten-fold cross-validation approach for data splitting to ensure an unbiased and reliable evaluation of the model’s performance. In each fold, the data is further partitioned into three distinct subsets: the training set, the validation set, and the test set, with proportions of 60%, 20%, and 20%, respectively. This systematic approach allows us to assess the models’ generalization performance effectively.

6.2 Experimental Results

The experimental results, shown in Table 4, are averaged across the ten folds to derive the $F-score$ , comprehensively evaluating the models’ generalization capabilities. The detailed results regarding $precision$ , $recall$ , and $accuracy$ measures are presented in E.

Table 4: Experimental results (

F-score

) for simulation experiment. Means and standard deviations are calculated via ten-fold cross-validation. Bold highlights the top-performing methods, while stars (^∗) denote models significantly worse than the best-performing models (

{}^{***}p<0.01,^{**}p<0.05,^{*}p<0.1

Method	Basic	Non-Markovian	Non-monotonic	Non-independent
SVM	0.913 $\pm$ 0.009^∗∗∗	0.874 $\pm$ 0.016^∗∗∗	0.588 $\pm$ 0.028^∗∗∗	0.899 $\pm$ 0.016^∗∗∗
Logistic	0.906 $\pm$ 0.008^∗∗∗	0.881 $\pm$ 0.009^∗∗∗	0.620 $\pm$ 0.024^∗∗∗	0.925 $\pm$ 0.009^∗∗∗
RF	0.924 $\pm$ 0.007^∗∗∗	0.886 $\pm$ 0.019^∗∗∗	0.623 $\pm$ 0.027^∗∗∗	0.839 $\pm$ 0.017^∗∗∗
XGB	0.935 $\pm$ 0.009^∗∗∗	0.910 $\pm$ 0.007^∗∗∗	0.661 $\pm$ 0.024	0.887 $\pm$ 0.011^∗∗∗
MLP	0.914 $\pm$ 0.006^∗∗∗	0.878 $\pm$ 0.009^∗∗∗	0.608 $\pm$ 0.021^∗∗∗	0.920 $\pm$ 0.012^∗∗∗
RNN	0.912 $\pm$ 0.008^∗∗∗	0.882 $\pm$ 0.013^∗∗∗	0.610 $\pm$ 0.024^∗∗∗	0.953 $\pm$ 0.008
GRU	0.912 $\pm$ 0.008^∗∗∗	0.884 $\pm$ 0.009^∗∗∗	0.600 $\pm$ 0.019^∗∗∗	0.952 $\pm$ 0.009
UTADIS	0.891 $\pm$ 0.015^∗∗∗	0.892 $\pm$ 0.011^∗∗∗	0.585 $\pm$ 0.025^∗∗∗	0.857 $\pm$ 0.013^∗∗∗
UTADIS-T	0.897 $\pm$ 0.018^∗∗∗	0.900 $\pm$ 0.016^∗∗∗	0.600 $\pm$ 0.025^∗∗∗	0.865 $\pm$ 0.011^∗∗∗
XOFM	0.904 $\pm$ 0.019^∗∗∗	0.911 $\pm$ 0.010^∗∗∗	0.600 $\pm$ 0.029^∗∗∗	0.875 $\pm$ 0.021^∗∗∗
TPL	0.893 $\pm$ 0.018^∗∗∗	0.909 $\pm$ 0.030^∗∗∗	0.606 $\pm$ 0.048^∗∗	0.894 $\pm$ 0.020^∗∗∗
mRNN $(\gamma=2$ )	0.943 $\pm$ 0.011^∗∗∗	0.920 $\pm$ 0.013^∗∗∗	0.634 $\pm$ 0.029^∗∗	0.906 $\pm$ 0.011^∗∗∗
mRNN $(\gamma=4$ )	0.952 $\pm$ 0.017^∗	0.940 $\pm$ 0.016^∗∗	0.636 $\pm$ 0.025^∗∗	0.901 $\pm$ 0.014^∗∗∗
mRNN $(\gamma=6$ )	0.961 $\pm$ 0.005	0.948 $\pm$ 0.012	0.638 $\pm$ 0.024^∗∗	0.902 $\pm$ 0.015^∗∗∗
mRNN $(\gamma=8$ )	0.959 $\pm$ 0.008	0.952 $\pm$ 0.006	0.624 $\pm$ 0.031^∗∗∗	0.899 $\pm$ 0.016^∗∗∗

The superior performance of the proposed mRNN model in the basic and non-Markovian experiments across various values of $\gamma$ can be attributed to its ability to overcome intrinsic limitations present in other baseline models. Conventional machine learning models are not well-equipped to effectively capture the temporal dependencies inherent in sequential data, which leads to suboptimal results in scenarios with time series criteria. In turn, traditional deep learning models like RNN and GRU can model temporal dependencies but typically compute the comprehensive value of each sample only at the final timestamp, potentially discarding crucial information from earlier timestamps. Additionally, these models output probabilities for each category, necessitating a manual decision on classification by, e.g., fixing a threshold at 0.5 for binary classification tasks. Nevertheless, such pre-determined thresholds may not be universally applicable, thus potentially yielding suboptimal performance in some cases.

Given these limitations, the proposed mRNN model is a promising solution that effectively addresses the abovementioned challenges. The proposed adaptive loss function also empowers the model to dynamically learn and determine classification thresholds based on the underlying data distribution, imparting flexibility and precision in decision-making. The outcomes of the Basic experiment affirm these assertions. Additionally, the non-Markovian experiment serves to reinforce these findings, as it elucidates that the performance of mRNN remains unaltered in the presence of a non-Markovian sub-marginal value function. In contrast, the performance of all comparative baselines experiences a decline.

Recall that the proposed mRNN satisfies the preference independence and monotonicity assumptions, which may decline the model performance when the data become complex. This is confirmed by the results of Non-monotonic and Non-independent experiments (see Table 4). When the sub-marginal value functions are not monotonic, our model’s performance is not the highest. However, it closely follows XGB, which is ranked as the top-performing method. It demonstrates that although we have constrained the monotonicity of the sub-marginal value function, the proposed model can provide remarkable robustness in non-monotonic cases, achieving comparable performance with the baseline methods. As for the preference-dependent case, GRU and RNN models outperform all other methods, highlighting the superior performance of deep learning-based sequence models in capturing interactive information between criteria. In contrast, the proposed model assigns an independent hidden state to each criterion and utilizes separate parameters for training. While this design choice ensures complete independence between criteria, it also leads to a potential loss of interactive information, deteriorating our model’s effectiveness.

6.3 Ability for Reconstructing Sub-marginal Value Functions

In addition to demonstrating superior classification performance, the methods proposed in this paper offer the advantage of interpretability owing to the incorporation of prior knowledge regarding monotonic constraints. During DGP, we impose a $tanh$ function as the sub-marginal value function for all time series. We can use this prior knowledge to visualize and analyze the learned sub-marginal value functions at different timestamps. The results, depicted in Figures 8(b) and 8(c), reveal that all the learned sub-marginal value functions from proposed TPL and mRNN models closely resemble the $tanh$ function’s characteristic shape. This observation confirms that the model effectively captures and learns the inherent monotonic constraints presented in the data.

Furthermore, the analysis of Figures 8(b) and 8(c) indicates that the values from the marginal value function at later timestamps consistently exhibit higher values than those from the function at preceding timestamps. This observation implies that the later time points exert a pronounced influence on the classification outcomes, which align with the underlying DGP, wherein the marginal values associated with earlier timestamps progressively diminish over time, resulting in a reduced impact on classification decisions when compared to more recent timestamps. In contrast, the conventional UTADIS approach, lacking the capacity to capture temporal effects, manifests a static marginal value function across all timestamps.

6.4 Model Performance for Different Numbers of Sub-intervals

The performance of the mRNN model in the simulation experiments is influenced by the number of pre-defined sub-intervals, as shown in Table 4. Interestingly, increasing the number of sub-intervals does not consistently lead to improved mRNN performance. In the first two simulation experiments, the mRNN model achieves the best results when $\gamma=6$ , and smaller numbers of sub-intervals yield better performance. However, in the last two simulation experiments, where the data complexity is higher due to non-monotonic and non-independent time series, mRNN performs better with smaller values of $\gamma$ .

This behavior can be explained by considering the trade-off between model complexity and overfitting. More sub-intervals lead to a more complex model, which has the potential to capture intricate patterns in the data. However, this increased complexity also raises the risk of overfitting, especially when the dataset size is limited. In contrast, a simpler model with fewer sub-intervals may generalize better when the data is noisy or exhibits non-linear patterns.

The observation that a smaller pre-defined number of sub-intervals and a slightly higher $\gamma$ can be effective aligns with the findings from the real-world case study, where $\gamma=6$ achieved the best average $F-score$ . This suggests that in practice, it may be advisable to start with a small number of pre-defined sub-intervals and adjust $\gamma$ based on the specific requirements and characteristics of the decision problem and dataset. This approach allows for flexibility in model complexity and can help avoid overfitting while still meeting the DM’s needs.

7 Conclusions

This study considered the sorting problems in the presence of temporal criteria. Such criteria involve time series data that may represent preference patterns changing over time. The capability to handle such non-static features is essential for solving numerous real-world problems in economy, energy, and medicine.

We proposed two value-based approaches for handling such problems, employing additive piecewise linear functions as the basic preference model for each timestamp. On the one hand, we formulated a convex quadratic programming model with a fixed time discount factor under a regularization framework. On the one hand, we proposed a deep preference learning model that allows the time discount factors to be learnable, hence investigating the dynamics of changing preferences over time. It satisfies monotonicity and preference independence conditions and captures natural order between classes by adapting the recurrent neural network.

We applied the proposed models to a real-world decision-making problem where the mobile game users are assigned to different classes based on their historical behavioral sequences. The study provides an in-depth analysis of users’ evolving preferences captured via marginal value functions at all timestamps as well as the learned time discount factors, affording a more profound comprehension of temporal behaviors. These outcomes may be used for recommendation triggering and churn intervention, which is essential in the mobile gaming industry.

We also tested the proposed methods on synthetic data with various temporal characteristics. We demonstrated the superior performances of the proposed optimization- and neural network-based models over the baseline multiple criteria decision aiding, machine learning, and deep learning approaches. Among the novel methods, the one based on suitably adapted recurrent networks proved more computationally efficient and expressive, leading to better predictive performance than the optimization model.

Future work can focus on more complex time series. For instance, the interval of timestamps can be irregularly spaced, which cannot be directly handled by the standard RNN model. Another direction is relaxing the monotonicity and preference independence in temporal criteria, which can describe a more complicated decision scenario while simultaneously making the preference model highly complex and difficult to optimize. At last, it is interesting to analyze time series where the performances on some criteria exhibit various types of concept drifts.

Acknowledgments

Miłosz Kadziński acknowledges support from the Polish National Science Center under the SONATA BIS project (grant no. DEC-2019/34/E/HS4/00045).

References

Angelopoulos et al. [2019] Angelopoulos, D., Siskos, Y., Psarras, J., 2019. Disaggregating time series on multiple criteria for robust forecasting: The case of long-term electricity demand in greece. European Journal of Operational Research 275, 252–265.
Angilella et al. [2014] Angilella, S., Corrente, S., Greco, S., Słowiński, R., 2014. Musa-int: Multicriteria customer satisfaction analysis with interacting criteria. Omega 42, 189–200.
Arandarenko et al. [2020] Arandarenko, M., Corrente, S., Jandrić, M., Stamenković, M., 2020. Multiple criteria decision aiding as a prediction tool for migration potential of regions. European Journal of Operational Research 284, 1154–1166.
Babaev et al. [2019] Babaev, D., Savchenko, M., Tuzhilin, A., Umerenkov, D., 2019. Et-rnn: Applying deep learning to credit loan applications, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 2183–2190.
Banamar [2019] Banamar, I., 2019. An interpolation-based method for the time weighed vector elicitation in temporal promethee ii applications. International Journal of Multicriteria Decision Making 8, 84–103.
Bowling et al. [2009] Bowling, S.R., Khasawneh, M.T., Kaewkuekool, S., Cho, B.R., 2009. A logistic approximation to the cumulative normal distribution. Journal of industrial engineering and management 2, 114–127.
Campello et al. [2023a] Campello, B.S., BenAmor, S., Duarte, L.T., Romano, J.M.T., 2023a. Improving preference disaggregation in multicriteria decision making: incorporating time series analysis and a multi-objective approach. arXiv preprint arXiv:2308.05259 .
Campello et al. [2022] Campello, B.S.C., Duarte, L.T., Romano, J.M.T., 2022. Dealing with multi-criteria decision analysis in time-evolving approach using a probabilistic prediction method. Engineering Applications of Artificial Intelligence 116, 105462.
Campello et al. [2023b] Campello, B.S.C., Duarte, L.T., Romano, J.M.T., 2023b. Exploiting temporal features in multicriteria decision analysis by means of a tensorial formulation of the topsis method. Computers & Industrial Engineering 175, 108915.
Chen and Guestrin [2016] Chen, T., Guestrin, C., 2016. Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794.
Cho et al. [2014] Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning phrase representations using rnn encoder–decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics. p. 1724.
Corrente et al. [2013] Corrente, S., Greco, S., Kadziński, M., Słowiński, R., 2013. Robust ordinal regression in preference learning and ranking. Machine Learning 93, 381–422.
Cortes and Vapnik [1995] Cortes, C., Vapnik, V., 1995. Support-vector networks. Machine learning 20, 273–297.
Devaud et al. [1980] Devaud, J., Groussaud, G., Jacquet-Lagreze, E., 1980. Utadis: Une méthode de construction de fonctions d’utilité additives rendant compte de jugements globaux. European Working Group on Multicriteria Decision Aid, Bochum 94.
Doumpos and Figueira [2019] Doumpos, M., Figueira, J.R., 2019. A multicriteria outranking approach for modeling corporate credit ratings: An application of the electre tri-nc method. Omega 82, 166–180.
Dyer et al. [1992] Dyer, J.S., Fishburn, P.C., Steuer, R.E., Wallenius, J., Zionts, S., 1992. Multiple criteria decision making, multiattribute utility theory: the next ten years. Management science 38, 645–654.
Fürnkranz and Hüllermeier [2010] Fürnkranz, J., Hüllermeier, E., 2010. Preference learning and ranking by pairwise comparison, in: Preference learning. Springer, pp. 65–82.
Ghaderi et al. [2017] Ghaderi, M., Ruiz, F., Agell, N., 2017. A linear programming approach for learning non-monotonic additive value functions in multiple criteria decision aiding. European Journal of Operational Research 259, 1073–1084.
Glorot et al. [2011] Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Proceedings of the fourteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings. pp. 315–323.
Greco et al. [2011] Greco, S., Kadziński, M., SŁowiński, R., 2011. Selection of a representative value function in robust multiple criteria sorting. Computers & Operations Research 38, 1620–1637.
Greco et al. [2014] Greco, S., Mousseau, V., Słowiński, R., 2014. Robust ordinal regression for value functions handling interacting criteria. European Journal of Operational Research 239, 711–730.
Greco et al. [2010] Greco, S., Mousseau, V., Słowiński, R., 2010. Multiple criteria sorting with a set of additive value functions. European Journal of Operational Research 207, 1455–1470. URL: https://www.sciencedirect.com/science/article/pii/S0377221710003772, doi:https://doi.org/10.1016/j.ejor.2010.05.021.
Guo et al. [2019] Guo, M., Liao, X., Liu, J., 2019. A progressive sorting approach for multiple criteria decision aiding in the presence of non-monotonic preferences. Expert Systems with Applications 123, 1–17.
Guo et al. [2020] Guo, M., Liao, X., Liu, J., Zhang, Q., 2020. Consumer preference analysis: A data-driven multiple criteria approach integrating online information. Omega 96, 102074.
Guo et al. [2021a] Guo, M., Xu, Z., Zhang, Q., Liao, X., Liu, J., 2021a. Deciphering feature effects on decision-making in ordinal regression problems: an explainable ordinal factorization model. ACM Transactions on Knowledge Discovery from Data (TKDD) 16, 1–26.
Guo et al. [2021b] Guo, M., Zhang, Q., Liao, X., Chen, F.Y., Zeng, D.D., 2021b. A hybrid machine learning framework for analyzing human decision-making through learning preferences. Omega 101, 102263.
Herbrich et al. [1999] Herbrich, R., Graepel, T., Obermayer, K., 1999. Support vector learning for ordinal regression, in: In International Conference on Artificial Neural Networks, pp. 97–102.
Ho [1995] Ho, T.K., 1995. Random decision forests, in: Proceedings of 3rd international conference on document analysis and recognition, IEEE. pp. 278–282.
Hosmer Jr et al. [2013] Hosmer Jr, D.W., Lemeshow, S., Sturdivant, R.X., 2013. Applied logistic regression. volume 398. John Wiley & Sons.
Hu [2009] Hu, Y.C., 2009. Bankruptcy prediction using electre-based single-layer perceptron. Neurocomputing 72, 3150–3157.
Kadziński et al. [2018] Kadziński, M., Cinelli, M., Ciomek, K., Coles, S.R., Nadagouda, M.N., Varma, R.S., Kirwan, K., 2018. Co-constructive development of a green chemistry-based model for the assessment of nanoparticles synthesis. European Journal of Operational Research 264, 472–490.
Kadziński et al. [2014] Kadziński, M., Corrente, S., Greco, S., Słowiński, R., 2014. Preferential reducts and constructs in robust multiple criteria ranking and sorting. OR Spectrum 36, 1021–1053.
Kadziński et al. [2020] Kadziński, M., Ghaderi, M., Dąbrowski, M., 2020. Contingent preference disaggregation model for multiple criteria sorting problem. European Journal of Operational Research 281, 369–387.
Kraus et al. [2020] Kraus, M., Feuerriegel, S., Oztekin, A., 2020. Deep learning in business analytics and operations research: Models, applications and managerial implications. European Journal of Operational Research 281, 628–641.
LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436–444.
Lekwijit et al. [2023] Lekwijit, S., Terwiesch, C., Asch, D.A., Volpp, K.G., 2023. Evaluating the efficacy of connected healthcare: An empirical examination of patient engagement approaches and their impact on readmission. Management Science .
Lin et al. [2021] Lin, W., Sun, L., Zhong, Q., Liu, C., Feng, J., Ao, X., Yang, H., 2021. Online credit payment fraud detection via structure-aware hierarchical recurrent neural network., in: IJCAI, pp. 3670–3676.
Liu et al. [2023a] Liu, F., Liao, H., Al-Barakati, A., 2023a. Physician selection based on user-generated content considering interactive criteria and risk preferences of patients. Omega 115, 102784.
Liu et al. [2023b] Liu, J., Kadziński, M., Liao, X., 2023b. Modeling contingent decision behavior: A bayesian nonparametric preference-learning approach. INFORMS Journal on Computing .
Liu et al. [2021] Liu, J., Kadziński, M., Liao, X., Mao, X., 2021. Data-driven preference learning methods for value-driven multiple criteria sorting with interacting criteria. INFORMS Journal on Computing 33, 586–606.
Liu et al. [2019] Liu, J., Liao, X., Kadziński, M., Słowiński, R., 2019. Preference disaggregation within the regularization framework for sorting problems with multiple potentially non-monotonic criteria. European Journal of Operational Research 276, 1071–1089.
Martyn and Kadziński [2023] Martyn, K., Kadziński, M., 2023. Deep preference learning for multiple criteria decision analysis. European Journal of Operational Research 305, 781–805.
Rosenblatt [1958] Rosenblatt, F., 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review 65, 386.
Roy [1996] Roy, B., 1996. Multicriteria methodology for decision aiding. volume 12. Springer Science & Business Media.
Schuster and Paliwal [1997] Schuster, M., Paliwal, K.K., 1997. Bidirectional recurrent neural networks. IEEE transactions on Signal Processing 45, 2673–2681.
Thesari et al. [2019] Thesari, S.S., Trojan, F., Batistus, D.R., 2019. A decision model for municipal resources management. Management Decision 57, 3015–3034.
Wallenius et al. [2008] Wallenius, J., Dyer, J.S., Fishburn, P.C., Steuer, R.E., Zionts, S., Deb, K., 2008. Multiple criteria decision making, multiattribute utility theory: Recent accomplishments and what lies ahead. Management science 54, 1336–1349.
Wang and Malakooti [1992] Wang, J., Malakooti, B., 1992. A feedforward neural network for multiple criteria decision making. Computers & Operations Research 19, 151–167.
Wu and Liao [2021] Wu, X., Liao, H., 2021. Modeling personalized cognition of customers in online shopping. Omega 104, 102471.
Yan et al. [2015] Yan, S., Liu, S., Liu, J., Wu, L., 2015. Dynamic grey target decision making method with grey numbers based on existing state and future development trend of alternatives. Journal of Intelligent & Fuzzy Systems 28, 2159–2168.
Zhao et al. [2023] Zhao, S., Wu, R., Tao, J., Qu, M., Zhao, M., Fan, C., Zhao, H., 2023. percltv: A general system for personalized customer lifetime value prediction in online games. ACM Transactions on Information Systems 41, 1–29.
Zopounidis and Doumpos [2002] Zopounidis, C., Doumpos, M., 2002. Multicriteria classification and sorting methods: A literature review. European Journal of Operational Research 138, 229–246.

Appendix A Proof of the Transformation from $P0$ to $P1$

Proof.

The Lagrange function of $P0$ is:

\mathcal{L}(\mathbf{u},\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})=\frac{1}{2}\left\|\mathbf{u}\right\|^{2}_{2}+C\cdot\sum_{i=1}^{N}\xi_{i}-\sum_{i=1}^{N}\mu_{i}\left[y_{i}\mathbf{u}^{\prime}\mathbf{v}_{i}+\xi_{i}-1\right]-\sum_{i=1}^{N}\eta_{i}\xi_{i}-\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}\cdot\sigma_{j}^{t,k}

(32)

where $\boldsymbol{\xi}=(\xi_{1},\ldots,\xi_{N})^{\prime},\boldsymbol{\mu}=(\mu_{1},\ldots,\mu_{N})^{\prime},\boldsymbol{\eta}=(\eta_{1},\ldots,\eta_{N})^{\prime},\boldsymbol{\sigma}=(\underbrace{\sigma_{1}^{1,1},\ldots,\sigma_{1}^{1,\gamma_{1}^{1}}}_{\text{1st timestamp on criterion }g_{1}}\ldots,\sigma_{m}^{T_{m},\gamma_{m}^{T_{m}}})^{\prime}$ , and $\mathbf{v}_{i}=(V_{i,1}^{1,1},\ldots,V_{i,j}^{t,k},\ldots,V_{i,m}^{T_{m},\gamma_{m}^{T_{m}}})^{\prime}$ . We set derivatives of $\mathcal{L}(\mathbf{u},\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})$ w.r.t. $\mathbf{u}$ and $\boldsymbol{\xi}$ to be zero, respectively:

	$\displaystyle\frac{\partial\mathcal{L}(\mathbf{u},\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})}{\partial\Delta f_{j}^{t,k}}=\Delta f_{j}^{t,k}-\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}-\sigma_{j}^{t,k}=0$		(33)
	$\displaystyle\Rightarrow\Delta f_{j}^{t,k}=\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}+\sigma_{j}^{t,k}$		(34)
	$\displaystyle\frac{\partial\mathcal{L}(\mathbf{u},\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})}{\partial\xi_{i}}=C-\mu_{i}-\eta_{i}=0$		(35)
	$\displaystyle\Rightarrow C\cdot\sum_{i=1}^{N}\xi_{i}=\sum_{i=1}^{N}\mu_{i}\xi_{i}-\sum_{i=1}^{N}\eta_{i}\xi_{i}$		(36)

Given Eq. (36), the Lagrange function can be simplified:

\mathcal{L}(\mathbf{u},\boldsymbol{\mu},\boldsymbol{\sigma})=\frac{1}{2}\left\|\mathbf{u}\right\|^{2}_{2}-\sum_{i=1}^{N}\mu_{i}\left(y_{i}\mathbf{u}^{\prime}\mathbf{v}_{i}-1\right)-\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}\cdot\sigma_{j}^{t,k}

(37)

Given Eq. (34), the last term in Eq. (37) can be further expressed as:

$\displaystyle\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}\cdot\sigma_{j}^{t,k}$	$\displaystyle=\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}\left(\Delta f_{j}^{t,k}-\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}\right)$	(38)
	$\displaystyle=\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}(\Delta f_{j}^{t,k})^{2}-\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}(\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k})$	(39)
	$\displaystyle=\left\\|\mathbf{u}\right\\|^{2}_{2}-\sum_{i=1}^{N}y_{i}\mu_{i}\left(\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\Delta f_{j}^{t,k}V_{i,j}^{t,k}\right)$	(40)
	$\displaystyle=\left\\|\mathbf{u}\right\\|^{2}_{2}-\sum_{i=1}^{N}y_{i}\mu_{i}\mathbf{u}^{\prime}\mathbf{v}_{i}$	(41)

Hence, Eq. (37) can be rewritten as:

\mathcal{L}(\boldsymbol{\mu},\boldsymbol{\sigma})=\sum_{i=1}^{N}\mu_{i}-\frac{1}{2}\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\left(\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}+\sigma_{j}^{t,k}\right)

(42)

The dual problem of P0 can be:

$\displaystyle\max g(\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})$	$\displaystyle=\inf_{\mathbf{u},\boldsymbol{\xi}}\mathcal{L}(\mathbf{u},\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\eta},\boldsymbol{\sigma})$	(43)
	$\displaystyle=\sum_{i=1}^{N}\mu_{i}-\frac{1}{2}\sum_{j=1}^{m}\sum_{t=1}^{T_{j}}\sum_{k=1}^{\gamma_{j}^{t}}\left(\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}+\sigma_{j}^{t,k}\right)$	(44)
$\displaystyle s.t.:\quad$	$\displaystyle\boldsymbol{\sigma}\geq{\mathbf{0}}$	(45)
	$\displaystyle C\mathbf{e}\geq\boldsymbol{\mu}\geq\mathbf{0}$	(46)

From Eq. (34), we can replace $\sigma_{j}^{t,k}$ by $\Delta f_{j}^{t,k}-\sum_{i=1}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}$ and obtain P1:

(P1)	$\displaystyle\min\frac{1}{2}\left\\|{\mathbf{u}}\right\\|^{2}_{2}-\sum\nolimits_{i=1}^{N}\mu_{i}$
$\displaystyle s.t.:\quad$	$\displaystyle\Delta f_{j}^{t,k}-\sum\nolimits_{i}^{N}y_{i}\mu_{i}V_{i,j}^{t,k}\geq 0,j=1,\ldots,m,t=1,\ldots,T_{j},k=1,\ldots,\gamma_{j}^{t}$	(47)
	$\displaystyle C\geq\mu_{i}\geq 0,i=1,\ldots,N$	(48)

∎

Appendix B Proof of Proposition 1

Proof.

Recall the definition of the sub-marginal values of alternative $a$ in Eq. (4), we can rewrite the sub-marginal value at $t$ -th timestamp in $j$ -th criterion:

\displaystyle f_{j}^{t}(g_{j}^{t}(a))=f_{j}^{t}(x_{j}^{t,k})+\frac{g_{j}^{t}(a)-x_{j}^{t,k}}{x_{j}^{t,k+1}-x_{j}^{t,k}}\left(f_{j}^{t}(x_{j}^{t,k+1})-f_{j}^{t}(x_{j}^{t,k})\right)

(49)

Following the transformation in Eq. (15), we can easily get:

$\displaystyle f_{j}^{t}(g_{j}^{t}(a))^{\prime}$	$\displaystyle=\frac{f_{j}^{t}(x_{j}^{t,k})-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}+\frac{g_{j}^{t}(a)-x_{j}^{t,k}}{x_{j}^{t,k+1}-x_{j}^{t,k}}\left[\frac{f_{j}^{t}(x_{j}^{t,k+1})-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}-\frac{f_{j}^{t}(x_{j}^{t,k})-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}\right]$	(50)
	$\displaystyle=\frac{f_{j}^{t}(x_{j}^{t,k})+\frac{g_{j}^{t}(a)-x_{j}^{t,k}}{x_{j}^{t,k+1}-x_{j}^{t,k}}\left(f_{j}^{t}(x_{j}^{t,k+1})-f_{j}^{t}(x_{j}^{t,k})\right)-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(51)
	$\displaystyle=\frac{f_{j}^{t}(g_{j}^{t}(a))-f_{j}^{t}(x_{j}^{t,0})}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(52)

Using Eq. (52) to express the transformed comprehensive value $U^{\prime}(a)$ , we can get:

$\displaystyle U^{\prime}(a)$	$\displaystyle=\frac{\sum_{j=1}^{m}\left[f_{j}^{T_{j}}(g_{j}^{T_{j}}(a))-f_{j}^{T_{j}}(x_{j}^{T_{j},0})\right]}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}+\frac{\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}\left[f_{j}^{t}(g_{j}^{t}(a))-f_{j}^{t}(x_{j}^{t,0})\right]}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(53)
	$\displaystyle=\frac{\sum_{j+1}^{m}\left[f_{j}^{T_{j}}(g_{j}^{T_{j}}(a))+\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}f_{j}^{t}(g_{j}^{t}(a))\right]}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(54)
	$\displaystyle\quad+\frac{\sum_{j=1}^{m}{\left[-f_{j}^{T_{j}}(x_{j}^{T_{j},0})-\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}f_{j}^{t}(x_{j}^{t,0})\right]}}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(55)
	$\displaystyle=\frac{U(a)-\sum_{j=1}^{m}{\left[f_{j}^{T_{j}}(x_{j}^{T_{j},0})+\sum_{t=1}^{T_{j}-1}\tau^{T_{j}-t}f_{j}^{t}(x_{j}^{t,0})\right]}}{\sum\nolimits_{j,t}(f_{j}^{t}(x_{j}^{t,\gamma_{j}^{t}})-f_{j}^{t}(x_{j}^{t,0}))}$	(56)

Thus, the transformation in Proposition 1 can ensure classifications are not changed when the sub-marginal value functions are normalized. ∎

Appendix C A Dummy Example of Loss Function with Ordinal Thresholds

Assume that in the context of an ordinal classification problem, the objective is to categorize alternatives into three classes: $Cl_{1},Cl_{2}$ , and $Cl_{3}$ , with $Cl_{3}$ representing the most preferred category. The task involves learning ordinal thresholds during training to maximize the probability of alternatives remaining in their true classes.

Consider a hypothetical scenario with three alternatives: $a_{1},a_{2}$ , and $a_{3}$ , assigned to $Cl_{1},Cl_{2}$ , and $Cl_{3}$ respectively. The computed comprehensive values for these alternatives are 0.5, 1.5, and 5. In Table 5, various values of thresholds $\theta_{1}$ and $\theta_{2}$ are presented, along with the corresponding probabilities of each alternative being assigned to their respective classes and the resulting loss. For instance, when thresholds are set to $\theta_{1}=1$ and $\theta_{2}=4$ , we can compute the probabilities for $a_{1}$ belonging to each class using Eq. (28):

$\displaystyle P(Cl(a)=1)$	$\displaystyle=\Phi(1-0.5)-\Phi(-\infty-0.5)=0.6225-0=0.6225,$	(57)
$\displaystyle P(Cl(a)=2)$	$\displaystyle=\Phi(4-0.5)-\Phi(1-0.5)=0.9707-0.6225=0.3482,$
$\displaystyle P(Cl(a)=3)$	$\displaystyle=\Phi(\infty-0.5)-\Phi(4-0.5)=1-0.9707=0.0293.$

Given that the ground truth for $a_{1}$ is $Cl_{1}$ , we can calculate the likelihood loss for $a_{1}$ using Eq. (29):

\mathcal{L}=-\left(1\cdot\ln(0.6225)+0\cdot\ln(0.3482)+0\cdot\ln(0.0293)\right)=0.4741.

(58)

Subsequently, optimization techniques like stochastic gradient descent are employed to minimize the loss and update the training thresholds. When compared to other threshold combinations in Table 5, it is evident that $\theta_{1}=1$ and $\theta_{2}=4$ provide a more suitable configuration.

Table 5: The example for the computation of ordinal thresholds and loss function.

$\theta_{1}$	Alternative	Assignment	$U(a)$	$P(Cl=1)$	$P(Cl=2)$	$P(Cl=3)$	$\mathcal{L}_{a}$
1	$a_{1}$	1	0.5	0.6225	0.3482	0.0293	0.4741
$\theta_{2}$	$a_{2}$	2	1.5	0.3775	0.5466	0.0759	0.6040
4	$a_{3}$	3	5	0.0180	0.2510	0.7311	0.3133
Average Loss ( $\mathcal{L}$ )							0.4638
$\theta_{1}$	Alternative	Assignment	$U(a)$	$P(Cl=1)$	$P(Cl=2)$	$P(Cl=3)$	$\mathcal{L}_{a}$
0.1	$a_{1}$	1	0.5	0.4013	0.5694	0.0293	0.9130
$\theta_{2}$	$a_{2}$	2	1.5	0.1978	0.7263	0.0759	0.3198
4	$a_{3}$	3	5	0.0074	0.2615	0.7311	0.3133
Average Loss ( $\mathcal{L}$ )							0.5153
$\theta_{1}$	Alternative	Assignment	$U(a)$	$P(Cl=1)$	$P(Cl=2)$	$P(Cl=3)$	$\mathcal{L}_{a}$
2	$a_{1}$	1	0.5	0.8176	0.1531	0.0293	0.2014
$\theta_{2}$	$a_{2}$	2	1.5	0.6225	0.3017	0.0759	1.1984
4	$a_{3}$	3	5	0.0474	0.2215	0.7311	0.3133
Average Loss ( $\mathcal{L}$ )							0.5710
$\theta_{1}$	Alternative	Assignment	$U(a)$	$P(Cl=1)$	$P(Cl=2)$	$P(Cl=3)$	$\mathcal{L}_{a}$
1	$a_{1}$	1	0.5	0.6225	0.1951	0.1824	0.4741
$\theta_{2}$	$a_{2}$	2	1.5	0.3775	0.2449	0.3775	1.4068
2	$a_{3}$	3	5	0.0180	0.0294	0.9526	0.0486
Average Loss ( $\mathcal{L}$ )							0.6432
$\theta_{1}$	Alternative	Assignment	$U(a)$	$P(Cl=1)$	$P(Cl=2)$	$P(Cl=3)$	$\mathcal{L}_{a}$
1	$a_{1}$	1	0.5	0.6225	0.3735	0.0041	0.4741
$\theta_{2}$	$a_{2}$	2	1.5	0.3775	0.6115	0.0110	0.4919
6	$a_{3}$	3	5	0.0180	0.7131	0.2689	1.3133
Average Loss ( $\mathcal{L}$ )							0.7597

Appendix D Best Model Settings in Experiments

We conduct ten-fold cross-validation in the real-case experiment and four simulations to determine the best parameters for the baselines and the proposed models. The best parameters are presented in Table 6.

Table 6: Best parameters in real-case and simulation experiments.

Experiment type	TPL	mRNN	UTADIS UTADIS-T	XOFM	Logistic	SVM	RF	XGB	GRU	RNN	MLP
Real-case	$\gamma=4$ $C=0.01$	Dim $c=$ 40 #para=3,162	$\gamma=4$	$\gamma=3$ $K=3$	$C=0.01$	$C=1$	n_est=100 max_feat=6 max_depth=10	n_est=60 max_depth=2	Dim $c=$ 32 #para=4,737	Dim $c=$ 40 #para=3,521	96-32-8-1 #para=3,377
Basic	$\gamma=4$ $C=0.001$	Dim $c=$ 40 #para=3,162	$\gamma=4$	$\gamma=3$ $K=2$	$C=0.1$	$C=1$	n_est=80 max_feat=8 max_depth=10	n_est=80 max_depth=4	Dim $c=$ 32 #para=4,737	Dim $c=$ 40 #para=3,521	80-32-8-1 #para=2,865
Non-Markovian	$\gamma=3$ $C=0.001$	Dim $c=$ 40 #para=3,162	$\gamma=3$	$\gamma=3$ $K=2$	$C=0.1$	$C=1$	n_est=60 max_feat=8 max_depth=10	n_est=80 max_depth=2	Dim $c=$ 32 #para=4,737	Dim $c=$ 40 #para=3,521	80-32-8-1 #para=2,865
Non-monotonic	$\gamma=4$ $C=0.01$	Dim $c=$ 40 #para=3,162	$\gamma=4$	$\gamma=3$ $K=3$	$C=0.1$	$C=1$	n_est=60 max_feat=10 max_depth=6	n_est=80 max_depth=2	Dim $c=$ 32 #para=4,737	Dim $c=$ 40 #para=3,521	80-32-8-1 #para=2,865
Non-independent	$\gamma=4$ $C=0.001$	Dim $c=$ 40 #para=3,162	$\gamma=3$	$\gamma=3$ $K=4$	$C=1$	$C=1$	n_est=80 max_feat=10 max_depth=8	n_est=80 max_depth=4	Dim $c=$ 32 #para=4,737	Dim $c=$ 40 #para=3,521	80-32-8-1 #para=2,865

Appendix E Completed Results for Simulation Experiments

We have provided all experimental results in simulation in Tables 7 to 10.

Table 7: Experimental results for Basic experiment across four metrics. Means and standard deviations are calculated via ten-fold cross-validation. Bold type highlights the top-performing results, while stars (^∗) denote that the performance is significantly lower than the best (

{}^{***}p<0.01,^{**}p<0.05,^{*}p<0.1

Method	Precision	Recall	F-score	Accuracy
SVM	0.909 $\pm$ 0.013^∗∗∗	0.916 $\pm$ 0.015^∗∗∗	0.913 $\pm$ 0.009^∗∗∗	0.906 $\pm$ 0.008^∗∗∗
Logistic	0.906 $\pm$ 0.010^∗∗∗	0.905 $\pm$ 0.011^∗∗∗	0.906 $\pm$ 0.008^∗∗∗	0.899 $\pm$ 0.008^∗∗∗
RF	0.892 $\pm$ 0.017^∗∗∗	0.958 $\pm$ 0.013^∗∗	0.924 $\pm$ 0.007^∗∗∗	0.915 $\pm$ 0.009^∗∗∗
XGB	0.920 $\pm$ 0.014^∗∗∗	0.951 $\pm$ 0.012^∗∗∗	0.935 $\pm$ 0.009^∗∗∗	0.929 $\pm$ 0.010^∗∗∗
MLP	0.910 $\pm$ 0.013^∗∗∗	0.917 $\pm$ 0.010^∗∗∗	0.914 $\pm$ 0.006^∗∗∗	0.906 $\pm$ 0.006^∗∗∗
RNN	0.922 $\pm$ 0.010^∗∗∗	0.903 $\pm$ 0.011^∗∗∗	0.912 $\pm$ 0.008^∗∗∗	0.907 $\pm$ 0.008^∗∗∗
GRU	0.917 $\pm$ 0.014^∗∗∗	0.908 $\pm$ 0.010^∗∗∗	0.912 $\pm$ 0.008^∗∗∗	0.906 $\pm$ 0.009^∗∗∗
UTADIS	0.889 $\pm$ 0.012^∗∗∗	0.894 $\pm$ 0.023^∗∗∗	0.891 $\pm$ 0.015^∗∗∗	0.902 $\pm$ 0.015^∗∗∗
UTADIS-T	0.886 $\pm$ 0.015^∗∗∗	0.908 $\pm$ 0.022^∗∗∗	0.897 $\pm$ 0.018^∗∗∗	0.906 $\pm$ 0.011^∗∗∗
XOFM	0.893 $\pm$ 0.013^∗∗∗	0.915 $\pm$ 0.021^∗∗∗	0.904 $\pm$ 0.019^∗∗∗	0.909 $\pm$ 0.009^∗∗∗
TPL	0.877 $\pm$ 0.079^∗∗	0.923 $\pm$ 0.073	0.893 $\pm$ 0.018^∗∗∗	0.900 $\pm$ 0.023^∗∗∗
mRNN $(\gamma=2$ )	0.930 $\pm$ 0.013^∗∗∗	0.958 $\pm$ 0.018^∗∗	0.943 $\pm$ 0.011^∗∗∗	0.938 $\pm$ 0.011^∗∗∗
mRNN $(\gamma=4$ )	0.949 $\pm$ 0.013	0.961 $\pm$ 0.022^∗	0.952 $\pm$ 0.017^∗	0.948 $\pm$ 0.020
mRNN $(\gamma=6$ )	0.949 $\pm$ 0.012	0.973 $\pm$ 0.008	0.961 $\pm$ 0.005	0.957 $\pm$ 0.007
mRNN $(\gamma=8$ )	0.949 $\pm$ 0.018	0.969 $\pm$ 0.012	0.959 $\pm$ 0.008	0.955 $\pm$ 0.010

Table 8: Experimental results for Non-Markovian experiment across four metrics.

Method	Precision	Recall	F-score	Accuracy
SVM	0.884 $\pm$ 0.010^∗∗∗	0.864 $\pm$ 0.028^∗∗∗	0.874 $\pm$ 0.016^∗∗∗	0.888 $\pm$ 0.013^∗∗∗
Logistic	0.889 $\pm$ 0.013^∗∗∗	0.873 $\pm$ 0.017^∗∗∗	0.881 $\pm$ 0.009^∗∗∗	0.894 $\pm$ 0.008^∗∗∗
RF	0.918 $\pm$ 0.018^∗∗∗	0.856 $\pm$ 0.027^∗∗∗	0.886 $\pm$ 0.019^∗∗∗	0.901 $\pm$ 0.015^∗∗∗
XGB	0.924 $\pm$ 0.014^∗∗∗	0.896 $\pm$ 0.011^∗∗∗	0.910 $\pm$ 0.007^∗∗∗	0.920 $\pm$ 0.007^∗∗∗
MLP	0.878 $\pm$ 0.014^∗∗∗	0.877 $\pm$ 0.018^∗∗∗	0.878 $\pm$ 0.009^∗∗∗	0.890 $\pm$ 0.009^∗∗∗
RNN	0.884 $\pm$ 0.016^∗∗∗	0.880 $\pm$ 0.019^∗∗∗	0.882 $\pm$ 0.013^∗∗∗	0.894 $\pm$ 0.012^∗∗∗
GRU	0.881 $\pm$ 0.011^∗∗∗	0.888 $\pm$ 0.013^∗∗∗	0.884 $\pm$ 0.009^∗∗∗	0.896 $\pm$ 0.009^∗∗∗
UTADIS	0.889 $\pm$ 0.014^∗∗∗	0.896 $\pm$ 0.020^∗∗∗	0.892 $\pm$ 0.011^∗∗∗	0.903 $\pm$ 0.013^∗∗∗
UTADIS-T	0.896 $\pm$ 0.015^∗∗∗	0.904 $\pm$ 0.021^∗∗∗	0.900 $\pm$ 0.016^∗∗∗	0.906 $\pm$ 0.017^∗∗∗
XOFM	0.905 $\pm$ 0.013^∗∗∗	0.918 $\pm$ 0.014^∗∗∗	0.911 $\pm$ 0.010^∗∗∗	0.911 $\pm$ 0.015^∗∗∗
TPL	0.919 $\pm$ 0.062^∗	0.898 $\pm$ 0.078	0.909 $\pm$ 0.030^∗∗∗	0.914 $\pm$ 0.025^∗∗∗
mRNN $(\gamma=2$ )	0.928 $\pm$ 0.014^∗∗∗	0.912 $\pm$ 0.024^∗∗	0.920 $\pm$ 0.013^∗∗∗	0.929 $\pm$ 0.011^∗∗∗
mRNN $(\gamma=4$ )	0.959 $\pm$ 0.014	0.923 $\pm$ 0.025^∗∗	0.940 $\pm$ 0.016^∗∗	0.947 $\pm$ 0.016^∗∗
mRNN $(\gamma=6$ )	0.965 $\pm$ 0.014	0.931 $\pm$ 0.020	0.948 $\pm$ 0.012	0.954 $\pm$ 0.010
mRNN $(\gamma=8$ )	0.960 $\pm$ 0.013	0.945 $\pm$ 0.012	0.952 $\pm$ 0.006	0.958 $\pm$ 0.006

Table 9: Experimental results for Non-monotonic experiment across four metrics.

Method	Precision	Recall	F-score	Accuracy
SVM	0.623 $\pm$ 0.034^∗∗∗	0.559 $\pm$ 0.044^∗∗∗	0.588 $\pm$ 0.028^∗∗∗	0.645 $\pm$ 0.025^∗∗∗
Logistic	0.652 $\pm$ 0.022^∗∗∗	0.592 $\pm$ 0.035^∗∗∗	0.620 $\pm$ 0.024^∗∗∗	0.671 $\pm$ 0.018^∗∗∗
RF	0.672 $\pm$ 0.027	0.582 $\pm$ 0.035^∗∗∗	0.623 $\pm$ 0.027^∗∗∗	0.681 $\pm$ 0.022^∗∗
XGB	0.683 $\pm$ 0.030	0.642 $\pm$ 0.032	0.661 $\pm$ 0.024	0.702 $\pm$ 0.017
MLP	0.647 $\pm$ 0.024^∗∗∗	0.574 $\pm$ 0.034^∗∗∗	0.608 $\pm$ 0.021^∗∗∗	0.664 $\pm$ 0.017^∗∗∗
RNN	0.665 $\pm$ 0.037	0.565 $\pm$ 0.033^∗∗∗	0.610 $\pm$ 0.024^∗∗∗	0.672 $\pm$ 0.019^∗∗∗
GRU	0.662 $\pm$ 0.034^∗	0.551 $\pm$ 0.031^∗∗∗	0.600 $\pm$ 0.019^∗∗∗	0.667 $\pm$ 0.017^∗∗∗
UTADIS	0.558 $\pm$ 0.029^∗∗∗	0.616 $\pm$ 0.038	0.585 $\pm$ 0.025^∗∗∗	0.603 $\pm$ 0.021^∗∗∗
UTADIS-T	0.577 $\pm$ 0.033^∗∗∗	0.624 $\pm$ 0.039	0.600 $\pm$ 0.029^∗∗∗	0.632 $\pm$ 0.019^∗∗∗
XOFM	0.583 $\pm$ 0.029^∗∗∗	0.619 $\pm$ 0.045	0.600 $\pm$ 0.027^∗∗∗	0.637 $\pm$ 0.016^∗∗∗
TPL	0.642 $\pm$ 0.066	0.607 $\pm$ 0.147	0.606 $\pm$ 0.048^∗∗	0.652 $\pm$ 0.033^∗∗∗
mRNN $(\gamma=2$ )	0.644 $\pm$ 0.030^∗∗∗	0.625 $\pm$ 0.043	0.634 $\pm$ 0.029^∗∗	0.672 $\pm$ 0.021^∗∗∗
mRNN $(\gamma=4$ )	0.649 $\pm$ 0.026^∗∗∗	0.624 $\pm$ 0.039	0.636 $\pm$ 0.025^∗∗	0.675 $\pm$ 0.022^∗∗
mRNN $(\gamma=6$ )	0.658 $\pm$ 0.034^∗∗	0.623 $\pm$ 0.044	0.638 $\pm$ 0.024^∗∗	0.680 $\pm$ 0.020^∗∗
mRNN $(\gamma=8$ )	0.660 $\pm$ 0.033^∗∗	0.596 $\pm$ 0.055^∗∗	0.624 $\pm$ 0.031^∗∗∗	0.675 $\pm$ 0.021^∗∗∗

Table 10: Experimental results for Non-independent experiment across four metrics.

Method	Precision	Recall	F-score	Accuracy
SVM	0.911 $\pm$ 0.014^∗∗∗	0.888 $\pm$ 0.026^∗∗∗	0.899 $\pm$ 0.016^∗∗∗	0.922 $\pm$ 0.014^∗∗∗
Logistic	0.932 $\pm$ 0.009^∗∗∗	0.917 $\pm$ 0.016^∗∗	0.925 $\pm$ 0.009^∗∗∗	0.941 $\pm$ 0.008^∗∗∗
RF	0.917 $\pm$ 0.021^∗∗∗	0.774 $\pm$ 0.026^∗∗∗	0.839 $\pm$ 0.017^∗∗∗	0.883 $\pm$ 0.014^∗∗∗
XGB	0.905 $\pm$ 0.013^∗∗∗	0.871 $\pm$ 0.024^∗∗∗	0.887 $\pm$ 0.011^∗∗∗	0.913 $\pm$ 0.009^∗∗∗
MLP	0.924 $\pm$ 0.017^∗∗∗	0.916 $\pm$ 0.022^∗∗	0.920 $\pm$ 0.012^∗∗∗	0.937 $\pm$ 0.011^∗∗∗
RNN	0.964 $\pm$ 0.006	0.942 $\pm$ 0.015	0.953 $\pm$ 0.008	0.963 $\pm$ 0.006
GRU	0.964 $\pm$ 0.010	0.940 $\pm$ 0.015	0.952 $\pm$ 0.009	0.963 $\pm$ 0.007
UTADIS	0.850 $\pm$ 0.016^∗∗∗	0.866 $\pm$ 0.029^∗∗∗	0.857 $\pm$ 0.013^∗∗∗	0.887 $\pm$ 0.012^∗∗∗
UTADIS-T	0.862 $\pm$ 0.019^∗∗∗	0.869 $\pm$ 0.025^∗∗∗	0.865 $\pm$ 0.011^∗∗∗	0.894 $\pm$ 0.010^∗∗∗
XOFM	0.874 $\pm$ 0.023^∗∗∗	0.877 $\pm$ 0.031^∗∗∗	0.875 $\pm$ 0.021^∗∗∗	0.902 $\pm$ 0.011^∗∗∗
TPL	0.878 $\pm$ 0.067^∗∗	0.910 $\pm$ 0.070	0.894 $\pm$ 0.020^∗∗∗	0.910 $\pm$ 0.017^∗∗∗
mRNN $(\gamma=2$ )	0.950 $\pm$ 0.021^∗	0.868 $\pm$ 0.026^∗∗∗	0.906 $\pm$ 0.011^∗∗∗	0.929 $\pm$ 0.008^∗∗∗
mRNN $(\gamma=4$ )	0.932 $\pm$ 0.023^∗∗∗	0.872 $\pm$ 0.030^∗∗∗	0.901 $\pm$ 0.014^∗∗∗	0.924 $\pm$ 0.012^∗∗∗
mRNN $(\gamma=6$ )	0.935 $\pm$ 0.021^∗∗∗	0.871 $\pm$ 0.026^∗∗∗	0.902 $\pm$ 0.015^∗∗∗	0.925 $\pm$ 0.011^∗∗∗
mRNN $(\gamma=8$ )	0.933 $\pm$ 0.021^∗∗∗	0.868 $\pm$ 0.025^∗∗∗	0.899 $\pm$ 0.016^∗∗∗	0.923 $\pm$ 0.012^∗∗∗

Appendix F Example of Computing Comprehensive Value

Table 11: The example for using sub-marginal values and time discount factors to aggregate marginal and comprehensive values. Note that the last day’s time discount factors are not used in practice.

Day	$g_{1}$	$g_{2}$	$g_{3}$	$g_{4}$	$\tau_{1}$	$\tau_{2}$	$\tau_{3}$	$\tau_{4}$	$f_{1}$	$f_{2}$	$f_{3}$	$f_{4}$	$u_{1}$	$u_{2}$	$u_{3}$	$u_{4}$
1	861.36	2	15.62	8	0.10	0.55	0.60	0.87	1.99	0.54	0.10	0.13	1.99	0.54	0.10	0.38
2	532.00	4	12.20	6	0.33	0.77	0.73	0.95	0.23	1.02	0.21	0.08	0.43	1.32	0.27	0.40
3	0.00	0	4.20	4	0.46	0.90	0.80	0.95	0.00	0.00	0.06	0.05	0.14	1.02	0.26	0.43
4	6.00	1	6.39	5	0.45	0.86	0.83	0.96	0.00	0.92	0.15	0.04	0.06	1.84	0.35	0.45
5	0.00	0	9.03	7	0.44	0.89	0.85	0.97	0.00	0.00	0.16	0.10	0.03	1.58	0.45	0.53
6	0.00	0	3.12	4	0.43	0.86	0.83	0.97	0.00	0.00	0.04	0.09	0.01	1.39	0.42	0.60
7	2642.00	17	9.14	4	0.50	0.94	0.83	0.96	1.93	1.38	0.00	0.12	1.94	2.58	0.35	0.70
8	0.00	0	5.90	6	0.74	0.97	0.81	0.96	0.00	0.06	0.00	0.22	0.96	2.48	0.29	0.88
9	0.00	0	10.85	4	0.75	0.91	0.84	0.96	0.00	0.00	0.29	0.14	0.71	2.42	0.53	0.99
10	0.00	0	11.39	4	0.75	0.84	0.88	0.95	0.00	0.00	0.40	0.22	0.54	2.20	0.84	1.17
11	80.00	3	5.56	3	0.80	0.91	0.89	0.94	0.00	0.00	0.41	0.23	0.40	1.84	1.15	1.34
12	0.00	0	10.97	6	0.83	0.96	0.91	0.95	0.00	0.00	1.05	0.39	0.32	1.67	2.07	1.66
13	68.00	1	10.54	4	0.86	0.91	0.92	0.96	0.00	0.12	1.38	0.41	0.26	1.73	3.27	2.00
14	80.00	3	8.29	6	0.86	0.93	0.93	0.96	0.00	0.15	1.16	0.29	0.23	1.72	4.18	2.21
15	19.00	4	3.22	4	0.89	0.96	0.90	0.95	0.00	0.08	0.92	0.38	0.20	1.68	4.79	2.50
16	99.00	8	8.59	5	0.90	0.97	0.89	0.95	0.01	0.15	1.01	0.26	0.18	1.78	5.34	2.63
17	1519.50	5	6.17	4	0.96	0.98	0.87	0.95	1.13	0.15	0.68	0.31	1.29	1.87	5.44	2.81
18	198.00	1	7.89	5	0.96	0.97	0.84	0.95	0.00	0.15	0.59	0.31	1.25	1.98	5.30	2.98
19	0.00	0	11.26	6	0.98	0.96	0.81	0.96	0.00	0.09	0.42	0.15	1.20	2.01	4.84	2.99
20	0.00	0	6.97	4	0.96	0.92	0.77	0.96	0.00	0.04	0.18	0.11	1.17	1.96	4.12	2.99
21	75.00	3	4.49	5	0.90	0.91	0.70	0.96	0.00	0.19	0.01	0.11	1.12	2.01	3.18	2.98
22	94.00	2	11.21	5	0.97	0.94	0.71	0.96	0.00	0.32	0.17	0.13	1.01	2.14	2.42	2.98
23	0.00	0	14.81	6	0.93	0.95	0.77	0.96	0.00	0.08	0.20	0.11	0.98	2.10	1.91	2.98
24	74.00	2	7.74	4	-	-	-	-	0.00	0.58	0.28	0.17	0.91	2.57	1.76	3.04
Comprehensive value	(Threshold)														8.28	(1.64)

In Table 11, we delve into an illustrative example extracted from the user value evaluation experiment to explicate the computation of three pivotal functions: the sub-marginal value function, the marginal value function, and the comprehensive value function.

Commencing with the sub-marginal value vector, we employ Eq. (23) in conjunction with an mRNN. On the initial day (Day 1), the sub-marginal values corresponding to the four criteria stand at 1.99, 0.54, 0.10, and 0.13, respectively. Consequently, these values serve as the marginal values for each criterion on Day 1.

Progressing to Day 2, we posit that the marginal value vector is a composite of two components: the sub-marginal value vector for Day 2 and the discounted marginal value vector from the preceding timestamp (Day 1). The calculation of the discount factor for each of the four criteria is determined by Eq. (25), yielding factors of 0.10, 0.55, 0.60, and 0.87. Armed with these factors, the computation of the marginal value vector for Day 2 unfolds as follows:

$\displaystyle u_{1}^{2}$	$\displaystyle=f_{1}^{2}+\tau_{1}^{1}\cdot u_{1}^{1}=0.23+0.10\cdot 1.99=0.43,$	(59)
$\displaystyle u_{2}^{2}$	$\displaystyle=f_{2}^{2}+\tau_{2}^{1}\cdot u_{2}^{1}=1.02+0.55\cdot 0.54=1.32,$
$\displaystyle u_{3}^{2}$	$\displaystyle=f_{3}^{2}+\tau_{3}^{1}\cdot u_{3}^{1}=0.21+0.60\cdot 0.10=0.27,$
$\displaystyle u_{4}^{2}$	$\displaystyle=f_{4}^{2}+\tau_{4}^{1}\cdot u_{4}^{1}=0.08+0.87\cdot 0.13=0.40.$

This iterative process extends until the final timestamp, culminating in a marginal value vector of 0.91, 2.57, 1.76, and 3.04 for the respective criteria. Subsequently, these individual marginal values are aggregated to compute the comprehensive marginal value, resulting in a total of 8.28. Note that we can normalize the obtained sub-marginal value functions to re-scale the comprehensive values into range $[0,1]$ , following the Eq. (15), which is required in some decision scenarios. To render a classification determination, we ascertain the optimized threshold ( $\theta=1.64$ ) through Eqs. (28) and (29). In this specific scenario, given that the calculated marginal value of 8.28 surpasses the threshold of 1.64, the alternative is classified into the high-value category.

Data-driven Preference Learning Methods for Sorting Problems with Multiple Temporal Criteria

Abstract

keywords:

1 Introduction

2 Related Work

2.1 Generalizations of Value-based Preference Learning Methods

2.2 Deep Preference Learning in Multiple Criteria Decision Aiding

3 The Proposed Temporal Preference Learning Model

3.1 Problem Description

Definition 1.

3.2 Temporal Preference Learning Model

Proposition 1.

4 Deep Preference Learning with Monotonic Recurrent Neural Network

4.1 Recurrent Neural Network

4.2 Ensuring Monotonicity and Preference Independence

4.3 Learning Sub-marginal and Marginal Value Functions

4.4 Loss Function with Ordinal Thresholds

5 Experiments on Real-world Data Concerning Gaming Industry

5.1 Data Description

5.2 Comparison with Baseline Models

5.3 Predictive Performance When Considering Different Numbers of Timestamps

5.4 Interpreting User Preferences

5.5 Managerial Implications

6 Simulation Experiments

6.1 Data Generation Process and Experimental Settings

6.2 Experimental Results

6.3 Ability for Reconstructing Sub-marginal Value Functions

6.4 Model Performance for Different Numbers of Sub-intervals

7 Conclusions

Acknowledgments

References

Appendix A Proof of the Transformation from P​0P0 to P​1P1

Proof.

Appendix B Proof of Proposition 1

Proof.

Appendix C A Dummy Example of Loss Function with Ordinal Thresholds

Appendix D Best Model Settings in Experiments

Appendix E Completed Results for Simulation Experiments

Appendix F Example of Computing Comprehensive Value

Data-driven Preference Learning Methods
for Sorting Problems with Multiple Temporal Criteria

Appendix A Proof of the Transformation from $P0$ to $P1$