COUPA: An Industrial Recommender System for Online to Offline Service Platforms
Abstract.
Online to Offline (O2O) service platforms, which help users discover local retail services (e.g., entertainment and dining), have become popular in recent years and pose great challenges to current recommender systems. Using real data from Alipay, a feeds-like scenario for O2O services, we find that recurrence based temporal patterns and position biases commonly exist in our scenarios and seriously threaten recommendation effectiveness. To this end, we propose COUPA, an industrial system that characterizes user preference with the following two considerations: (1) Time aware preference: we employ a continuous time aware point process equipped with an attention mechanism to fully capture temporal patterns for recommendation. (2) Position aware preference: a position selector component equipped with a position personalization module is designed to mitigate position bias in a personalized manner. Finally, we implement and deploy COUPA on Alipay through the cooperation of edge, streaming and batch computing, together with a two-stage online serving mode, to support several popular recommendation scenarios. Extensive experiments demonstrate that COUPA consistently achieves superior performance and has the potential to provide intuitive evidence for recommendation.
1. Introduction
Recommender systems, which aim at matching user interests, play a vital role in various online websites and mobile applications, including Taobao (Gong et al., 2020), YouTube (Covington et al., 2016) and Google Play (Cheng et al., 2016). To fully mine user preference in online services, numerous methods have been proposed to learn an effective matching function between target users and items with subtle feature engineering (Xiao et al., 2017; Lian et al., 2018; Guo et al., 2017) or abundant historical records (Zhou et al., 2018, 2019; Hidasi et al., 2016; Kang and McAuley, 2018; Hou et al., 2022). Although these methods achieve strong performance to some extent, we argue that they are unsuitable for recommendation scenarios on Online to Offline (O2O) service platforms (e.g., Meituan (https://about.meituan.com/en), Grubhub (https://www.grubhub.com/) and Uber Eats (https://www.ubereats.com/)), where recurrence based temporal patterns are ubiquitous.

In the case of Alipay (https://www.alipay.com/), whose goal is to guide users to discover local retail services including entertainment, travelling, delivery and other services, there is a clear and strong need to capture dynamic preference over time for recommendation. Concretely, we illustrate an example in Figure 1. There are a number of channels (e.g., the “Food Delivery” and “Travel” channels) in our scenarios, each of which represents a certain business field. Besides, several items are displayed under each channel, which are called the contents of the channel. Taking the contents of the “Food Delivery” channel as an example (left part of Figure 1; similar findings also hold for other channels, e.g., the “Travel” channel), users may be interested in daily necessities (e.g., “Coffee”) or weekly intents (e.g., “HotPot”), leading to different click demands at different times. On the other hand, as shown in the right part of Figure 1, since feeds-like styles are widely employed in the recommendation scenarios of O2O service platforms, users typically scan the mobile phone screen from top to bottom, while their attention distributions are distinct. That is, users may prefer clicking items at certain positions regardless of relevance, which introduces so-called personalized position biases that hurt recommendation effectiveness.
In light of the above observations, we believe it is of crucial importance to design an industrial recommender system for O2O service platforms that characterizes user preference from time aware and position bias aware perspectives simultaneously. The idea is appealing, but the solution is challenging in real recommendation scenarios, as summarized below.
(C1) Time aware preference modelling: how to effectively capture recurrence based preferences for users over continuous time? Recurrence based temporal patterns are ubiquitous in recommendation scenarios for local services. For example, a certain user may periodically visit specific malls and enjoy specific delicacies (e.g., Hot Pot) in a regular period (e.g., the weekend). Therefore, explicitly capturing such temporal patterns over continuous time has the potential to benefit preference characterization in real recommendation scenarios.
(C2) Position aware preference modelling: how to deal with users’ preferences for positions and build a debiasing model in a personalized way? On account of the feeds-like exhibition in our scenarios and users’ browsing habits, position biases widely exist in the behavior data. Moreover, users have different preferences towards each position. For example, most users prefer clicking items ranked first, whereas some users prefer other positions (e.g., the second or even the last position). Blindly fitting the observational behavior data without considering the inherent position biases may seriously deteriorate recommendation effectiveness.
(C3) System design: how to devise the base system to support the complicated temporal interaction scenarios? Our temporal interaction scenario places an urgent requirement of low latency on the system design, since our approach is expected to be aware of real-time user behaviors for online serving. Moreover, building the complete chain to balance the trade-off between data storage and timely feedback, as well as guaranteeing low latency for online serving, also remains challenging.
To this end, we propose COUPA, a novel COntinuoUs time and Position bias Aware recommender system targeted at O2O service platforms (e.g., Alipay). Specifically, we borrow the idea of functional time encoding, a promising way to embed timestamps into a vector space, and propose a novel continuous time aware point process with an attention mechanism to explore the excitation and inhibition effects of previous interaction records in continuous time (C1). On the other hand, inspired by the idea of knowledge transfer, a position selector component, combined with a position personalization module, is designed to perform personalized position debiasing in our scenarios (C2). To satisfy the requirements of online serving, the implementation and deployment are carefully designed not only to collect user behaviors and the corresponding positions in real time, but also to perform efficient online inference in a two-stage mode (C3).
In sum, we make the following contributions:
- We analyze and highlight the importance of capturing time and position aware preferences in Alipay, which can potentially be generalized to recommendation scenarios for O2O service platforms in general (e.g., Meituan, Grubhub and Uber Eats).
- We propose COUPA, which is capable not only of modeling continuous time aware preference for recommendation, but also of handling position bias in a personalized manner.
- We describe the system deployment and the corresponding implementation details, where edge, streaming and batch computing are jointly employed to capture real-time user behaviors, and a two-stage online serving mode is carefully designed.
- We conduct extensive offline and online experiments, demonstrating that COUPA consistently achieves superior performance and provides intuitive evidence for recommendation.
2. Observations on Real Data
In this section, we give an intuitive analysis of user preference w.r.t. the temporal patterns and position biases that exist in recommendation scenarios for O2O platforms, with Alipay as an example. All of the real data are collected within one month and consist of user interactions with channels and with the contents under each channel.





Temporal Pattern. As mentioned above, recurrence based temporal patterns widely exist in recommendation scenarios for O2O services. Since users may be interested in daily necessities or weekly intents, the reasons for which users access an O2O service platform (e.g., Alipay) are periodic. With the real data, we aim at uncovering the willingness of users to purchase items under the same category w.r.t. the time interval.
We select the categories “Hot Pot” and “Coffee” and plot the corresponding intensities (i.e., the number of purchases) against time intervals in Figure 2 (a) and (b), respectively. Here, the time interval is measured from the timestamp of the first purchase. From the results in Figure 2 (a), we observe that the intensity reaches a local peak whenever the interval is a multiple of one day. Moreover, among these peaks, the intensity is larger when the interval is a multiple of one week. These observations show that the intention of purchasing “Hot Pot” has strong daily and weekly periodicity. In contrast, the intensity for “Coffee” reaches a peak within 24 hours after the first purchase, and then sharply declines as time passes. Both cases reveal that user behaviors have strong temporal patterns that differ across item categories in our recommendation scenarios, which is vital information that should be fully considered in our approach.
Position bias. Click distributions over positions differ across users in the feeds-like scenarios of O2O platforms. Because user feedback data is observational rather than experimental, this introduces a personalized bias into modelling.
We conduct a micro-view analysis based on 3 types of users with different click distributions over positions in Figure 2 (c), (d), (e). Specifically, “User type 1”, whose behavior is the most common case, prefers clicking items at higher positions (i.e., positions “0” and “1”), whereas “User type 2” is more likely to click positions “2” and “3”. “User type 3” treats each position almost equally, so the preferences of these users are more stable. In sum, we conclude that position bias is not only position dependent, but also personalized, which has not been sufficiently explored in previous studies.
3. Preliminary
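Functional time encoding aims to find a mapping $\Phi: T \rightarrow \mathbb{R}^{d}$ from the time domain $T$ to a $d$-dimensional vector space that preserves the evolving nature of user interest/intent. Intuitively, the temporal pattern related to the timespan between any two timestamps $t_1$ and $t_2$ can be expressed as the inner product between their functional encodings, i.e., $\langle \Phi(t_1), \Phi(t_2) \rangle$. Therefore, we formulate the above temporal patterns with a translation-invariant kernel $\mathcal{K}$, with $\Phi$ as the mapping function associated with $\mathcal{K}$.
As suggested by Mercer’s Theorem (Minh et al., 2006; Xu et al., 2019a; Hu et al., 2022), we formulate the mapping function with frequency parameter $\omega$ as follows:
$$\Phi_{\omega}(t) = \left[\sqrt{c_1}, \ldots, \sqrt{c_{2j}}\cos\!\left(\frac{j\pi t}{\omega}\right), \sqrt{c_{2j+1}}\sin\!\left(\frac{j\pi t}{\omega}\right), \ldots\right] \tag{1}$$
where $c_1, c_2, \ldots$ are the corresponding Fourier coefficients. Thanks to the truncation properties of this Fourier series-like form, we truncate the above mapping function as $\Phi_{\omega,d}(t)$. Subsequently, by concatenating multiple truncated periodic mapping functions with the frequency set $\{\omega_1, \ldots, \omega_k\}$, we represent time with the functional encoding:
$$\Phi_{d}(t) = \left[\Phi_{\omega_1,d}(t), \ldots, \Phi_{\omega_k,d}(t)\right]^{\top} \tag{2}$$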
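The encoding described above can be illustrated with a minimal Python sketch. It is not the deployed implementation: the function names `time_encoding` and `time_kernel` are hypothetical, all coefficients are fixed to 1 (in a trained model they would presumably be learned), and the frequencies of 24 and 168 hours are chosen only for illustration.

```python
import math

def time_encoding(t, omegas, d):
    """Concatenate truncated Fourier-style encodings of timestamp t, one block per frequency.

    Assumption: every coefficient is 1.0; a real model would learn them.
    """
    features = []
    for omega in omegas:
        block = [1.0]  # constant term of the truncated series
        for j in range(1, d + 1):
            block.append(math.cos(j * math.pi * t / omega))
            block.append(math.sin(j * math.pi * t / omega))
        features.extend(block)
    return features

def time_kernel(t1, t2, omegas, d):
    """Inner product of two encodings; it depends only on the time difference t1 - t2."""
    e1 = time_encoding(t1, omegas, d)
    e2 = time_encoding(t2, omegas, d)
    return sum(a * b for a, b in zip(e1, e2))

omegas = [24.0, 168.0]  # illustrative frequencies, in hours
shifted_equal = abs(time_kernel(3.0, 10.0, omegas, 3) - time_kernel(103.0, 110.0, omegas, 3)) < 1e-9
```

Because cos(a)cos(b) + sin(a)sin(b) = cos(a - b), shifting both timestamps by the same amount leaves the kernel value unchanged, which is the translation invariance the text relies on.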
A temporal point process, which has been commonly adopted to model dynamics in sequences, is a stochastic process that generates a list of discrete events at different times, denoted as $\{(i_j, t_j)\}_{j=1}^{n}$, where $i_j$ is the type (i.e., the item in our study) of the $j$-th event and $t_j$ is the timestamp of the $j$-th event. The process is characterized by the conditional intensity function $\lambda^{*}(t) = \lambda(t \mid \mathcal{H}_t)$, which predicts the next arrival time based on the history $\mathcal{H}_t$. Here, the symbol $*$ reminds us of the dependence on $\mathcal{H}_t$. The core of a temporal point process is the design of the conditional intensity function for capturing various interests. As a consequence, a series of typical point processes equipped with well-designed intensity functions have been proposed, including the Poisson process, the Hawkes process and the recently proposed fully neural network based point process (FullyNN).
Maximum likelihood estimation (MLE) is commonly adopted to learn the parameters of a temporal point process, and the corresponding log-likelihood over a time interval $[0, T]$ is given by:
$$\mathcal{L} = \sum_{j=1}^{n} \log \lambda^{*}(t_j) - \int_{0}^{T} \lambda^{*}(\tau)\, d\tau \tag{3}$$
where the first term is the sum of the log-intensities of past events and the second term is the log-likelihood of infinitely many non-events, for which a negative sampling strategy is usually adopted.
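To make the two terms of the likelihood concrete, the following sketch evaluates it for a Hawkes-style intensity. The intensity form, its parameters (`mu`, `alpha`, `beta`) and the function names are assumptions made for illustration, and the integral is approximated with the trapezoidal rule instead of the negative sampling mentioned in the text.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.5, beta=1.0):
    """Illustrative conditional intensity: base rate plus exponentially decaying excitation."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)

def log_likelihood(events, horizon, intensity, grid=2000):
    """Log-likelihood of event times on [0, horizon] under a conditional intensity."""
    # First term: sum of log-intensities at the observed event times.
    event_term = sum(math.log(intensity(t, events)) for t in events)
    # Second term: integral of the intensity over the window (trapezoidal rule).
    step = horizon / grid
    values = [intensity(k * step, events) for k in range(grid + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return event_term - integral

events = [1.0, 2.0, 3.0]
ll_hawkes = log_likelihood(events, 5.0, hawkes_intensity)
# Sanity check with a constant (Poisson) intensity of 2: the closed form is 3*log(2) - 2*5.
ll_poisson = log_likelihood(events, 5.0, lambda t, history: 2.0)
```

For the constant-rate case the numerical value can be checked against the closed form, which is a quick way to validate an implementation before swapping in a learned intensity.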
4. The Proposed COUPA
In this section, we present COUPA, a novel COntinuoUs time and Position bias Aware recommender system for O2O platforms, whose model part mainly focuses on user preferences towards temporal patterns (i.e., time aware preference modelling) and positions (i.e., position bias aware modelling). The overall architecture of COUPA is illustrated in Figure 3.
Before elaborating the model design of COUPA, we briefly introduce the inputs, which consist of position related features (e.g., the sequence of user click positions and the corresponding item ids), the user profile, context features and user behaviors (including item ids and category ids), shown from left to right in Figure 3. Following common strategies adopted in (Guo et al., 2017; Lian et al., 2018), we transform the involved features into low-dimensional representations (called embeddings) by look-up operation, concatenation and a multi-layer perceptron, successively. Finally, given a user $u$ and an item $i$, we obtain the embeddings of the position related features, the user profile, the context features and the item profile (i.e., item id and category id), respectively.

4.1. Time aware Preference Modelling
Generally, a user’s preferences are affected by the items he/she has already interacted with (e.g., clicked or purchased), and these effects evolve as time passes. Since modelling such preferences is highly time sensitive, we first map the timestamps into a continuous, higher dimensional space that preserves the temporal patterns with the help of functional time encoding. Subsequently, we propose a continuous time aware point process module to summarize the influence of historical interaction records for estimating the likelihood of the target item, where a novel continuous time aware attention mechanism is introduced to locate important/relevant items in continuous time.
4.1.1. Continuous Time aware Attention Mechanism.
To effectively model the target user’s periodic preferences derived from his/her historical behaviors over continuous time, we introduce the continuous time aware attention mechanism to chronologically weigh the underlying preferences behind historical interactions w.r.t. the target item. We build it upon the masked self-attention architecture (Vaswani et al., 2017) to learn adaptive weights over continuous time conditioned on the involved user behaviors and the target item. Formally, given a triplet $(u, i, t)$, we denote the historical interaction records before time $t$ as $\mathcal{H}_{u}^{t} = \{(i_j, t_j) \mid t_j < t\}_{j=1}^{n}$. Following the original self-attention mechanism (Vaswani et al., 2017), we obtain the temporal sequence matrix $Z_{u,i}^{t}$ at time $t$ to take account of the relationships between the user behaviors and the target item $i$:
$$Z_{u,i}^{t} = \left[\, e_{i_1} \,\|\, \Phi_{d}(t_1), \;\ldots,\; e_{i_n} \,\|\, \Phi_{d}(t_n), \; e_{i} \,\|\, \Phi_{d}(t) \,\right]^{\top} \tag{4}$$
where “$\|$” denotes the concatenation operation, $n$ is the length of user behaviors and $e$ is the original embedding for items. Then, we produce the final representation $h_{u,i}^{t}$ that summarizes the influence of user behaviors using scaled dot-product attention, which is formulated as:
$$h_{u,i}^{t} = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{5}$$
where $Q$, $K$ and $V$ respectively denote the “query”, “key” and “value” matrices w.r.t. user $u$ and item $i$ at time $t$, and $d_k$ is the dimension of the key vectors. They are linear projections of the temporal sequence matrix $Z_{u,i}^{t}$: $Q = Z_{u,i}^{t} W_Q$, $K = Z_{u,i}^{t} W_K$, $V = Z_{u,i}^{t} W_V$, where $W_Q$, $W_K$ and $W_V$ are weight matrices for linear projection.
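As a rough illustration of how time encodings let attention pick up periodic behavior, the sketch below concatenates item embeddings with a small cos/sin time encoding and computes scaled dot-product attention for the target item only (the last row, which in a masked setting may attend to all earlier rows). It assumes identity projection matrices and a single illustrative frequency; `encode_time` and `time_aware_attention` are hypothetical names.

```python
import math

def encode_time(t, omegas):
    """Minimal time encoding: one cos/sin pair per frequency (illustrative only)."""
    out = []
    for w in omegas:
        out += [math.cos(math.pi * t / w), math.sin(math.pi * t / w)]
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def time_aware_attention(history, target, omegas):
    """history: list of (item_embedding, timestamp); target: (item_embedding, timestamp).

    Assumption: query, key and value projections are identity matrices.
    """
    rows = [emb + encode_time(t, omegas) for emb, t in history + [target]]
    query = rows[-1]  # the target item at the current time
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, row)) / scale for row in rows]
    weights = softmax(scores)
    summary = [sum(w * row[c] for w, row in zip(weights, rows)) for c in range(len(query))]
    return weights, summary

# Same item clicked 168 hours and 84 hours before the target time; with omega = 84 the
# encoding has a period of 168 hours, so the click one full period ago is weighted higher.
history = [([1.0, 0.0], 0.0), ([1.0, 0.0], 84.0)]
weights, summary = time_aware_attention(history, ([1.0, 0.0], 168.0), [84.0])
```

In this toy setup the two historical clicks have identical item embeddings, so any difference in their attention weights comes from the time encoding alone.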
4.1.2. Continuous time aware point process.
Here, we aim to estimate the likelihood of the target item based on the historical interaction records through a continuous time aware point process, whose major role is to construct the conditional intensity function $\lambda^{*}(t)$. For convenience, we rewrite the above intensity function as $\lambda^{*}(t) = f(\cdot)$, where $f$ is a non-negative function, commonly implemented as the exponential function in previous works. Such a specific functional form only models exponential effects (decrease or increase) of historical behaviors on the target item. Instead, following the idea in (Omi et al., 2019), we adopt a more flexible formulation to enhance the model capacity. That is, we directly model the cumulative intensity function $\Lambda$, which can be differentiated to obtain the final intensity function:
$$\Lambda(\tau) = \int_{0}^{\tau} \lambda^{*}(t_n + s)\, ds \tag{6}$$
where $t_n$ is the timestamp of the last interaction and $\tau = t - t_n$ denotes the interval since the last interaction. In other words, we adopt an intensity-free formulation to model the user’s time aware preference towards target items, which is more suitable for complex scenarios in real-world applications. Owing to its ability to model non-linear functions, we implement the cumulative intensity function with a feed-forward neural network as follows:
(7) |
Here, is the user embedding derived from his/her original profile features and is the ReLU activation function. At last, we obtain our final intensity function as follows,
$$\lambda^{*}(t) = \frac{\partial \Lambda(\tau)}{\partial \tau} \tag{8}$$
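The idea of modelling the cumulative intensity with a network and differentiating it can be sketched as follows. This is only an illustration in the spirit of Omi et al. (2019), not the network used in COUPA: the layer sizes, the parameter names and the use of tanh (chosen here instead of ReLU so that the toy intensity is smooth and strictly positive) are all assumptions, and the derivative is taken by central finite differences instead of automatic differentiation.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def cumulative_intensity(tau, context, w_tau, w_ctx, bias, w_out):
    """Toy one-hidden-layer network that is monotonically increasing in tau.

    Monotonicity holds because w_tau and w_out are non-negative and both
    tanh and softplus are increasing functions.
    """
    hidden = [
        math.tanh(wt * tau + sum(wc * c for wc, c in zip(row, context)) + b)
        for wt, row, b in zip(w_tau, w_ctx, bias)
    ]
    return softplus(sum(wo * h for wo, h in zip(w_out, hidden)))

def intensity(tau, context, params, eps=1e-5):
    """Intensity as the derivative of the cumulative intensity w.r.t. tau."""
    upper = cumulative_intensity(tau + eps, context, *params)
    lower = cumulative_intensity(tau - eps, context, *params)
    return (upper - lower) / (2 * eps)

# Hypothetical parameters: (w_tau, w_ctx, bias, w_out) for two hidden units.
params = ([0.5, 1.0], [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1], [1.0, 0.5])
context = [1.0, 2.0]  # stands in for user and history representations
rates = [intensity(t, context, params) for t in (0.5, 1.0, 2.0, 5.0)]
```

Constraining the weights on the elapsed time to be non-negative is what guarantees a non-negative intensity without having to choose a parametric form for it.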
4.2. Position Bias aware Modelling
As shown by the above analysis, a user’s preference for clicking at certain positions may introduce personalized position bias, which deteriorates recommendation effectiveness if left unaddressed. Modelling users’ preferences for positions is important for debiasing, whereas previous works usually make the simple assumption that the bias depends only on the position itself, which ignores the strong effects of personalized features related to users. To fill this gap, we aim at performing position debiasing in a personalized manner, for which a position selector component equipped with a position personalization module is designed.
4.2.1. Position Personalization Module.
Since positions differ from each other depending on the features of specific users and items, the position uplifts at different positions can be regarded as multiple tasks, whose differences and relations can be naturally captured by the Multi-gate Mixture-of-Experts (MMoE) (Ma et al., 2018). Specifically, following the idea of (Ma et al., 2018), we formulate the so-called $k$-th position uplift for user $u$ and item $i$ as follows:
(9) |
where refers to the position uplift in position , refers to the output of -th experts, is the number of experts, is a trainable matrix for the -th position, and are also trainable parameters and is generated by a 3-layer fully connected neural network with the input of profile embedding for target user (i.e., ), context embedding for target user and items (i.e., ) and continuous time aware embedding for target user and item at current time (i.e., ).
To further enhance personalization, we incorporate a few position related features (e.g., the user click position sequence and the item id) and design a Gated Linear Unit (GLU) (Dauphin et al., 2017) block to control the information passed from these features and the position uplift from the MMoE block. Specifically, a linear transformation is applied to the position related features to obtain the gate units for the $k$-th position. Then, the final position uplift can be calculated as:
(10) |
Here, is a gated control unit, implemented by the classical sigmoid function, which controls how much information can be passed.
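The combination of a mixture-of-experts block with a sigmoid gate can be sketched as below. Since the exact equations are not recoverable here, this is only a generic illustration under stated assumptions: single linear-plus-ReLU experts, one softmax gate and one linear tower per position, and a scalar sigmoid gate computed from the position related features. All names (`mmoe_glu_uplifts`, `experts`, `gates`, `towers`, `glu`) are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def mmoe_glu_uplifts(x, p, experts, gates, towers, glu):
    """Return (raw_uplift, gated_uplift) per position.

    x: shared input vector; p: position related feature vector.
    experts: one weight matrix per expert; gates, towers, glu: one entry per position.
    """
    expert_out = [[max(0.0, h) for h in matvec(w, x)] for w in experts]
    results = []
    for gate_w, tower_w, glu_w in zip(gates, towers, glu):
        mix = softmax(matvec(gate_w, x))  # one mixing weight per expert
        mixed = [sum(g * e[c] for g, e in zip(mix, expert_out))
                 for c in range(len(expert_out[0]))]
        raw = sum(t * m for t, m in zip(tower_w, mixed))
        gate = sigmoid(sum(a * b for a, b in zip(glu_w, p)))  # controls how much passes
        results.append((raw, gate * raw))
    return results

x, p = [1.0, 2.0], [0.5]
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
gates = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
towers = [[1.0, 1.0], [1.0, -1.0]]
glu = [[0.0], [2.0]]
uplifts = mmoe_glu_uplifts(x, p, experts, gates, towers, glu)
```

Sharing the experts across positions while keeping a separate gate per position is what lets related positions reuse information without being forced to behave identically.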
4.2.2. Position Selector.
Knowledge transfer (or parameter sharing) has proved to be effective for facilitating model learning. Inspired by this, we obtain the output $o_k$ of the $k$-th position by summing the uplifts $\Delta_j$ of the subsequent positions (i.e., $j \ge k$) as follows:
$$o_k = \sum_{j \ge k} \Delta_j \tag{11}$$
Clearly, the uplift for the $j$-th position is involved in all samples with position $k \le j$, so the learned information is shared among these samples.
To simplify the formulation and speed up the numerical computation, we construct a position matrix, denoted as $M_k$ for position $k$, whose $j$-th entry is 1 if $j \ge k$ and 0 otherwise. Then, we rewrite Eq. 11 as follows:
$$o_k = M_k \Delta \tag{12}$$
where $\Delta$ is the position uplift vector. Finally, given a user $u$ and an item $i$ displayed at position $k$, we estimate the click likelihood as follows:
(13) |
Remark 1.
In general, the displayed position of an item is posterior information that is not available at online inference time. Therefore, in practice, we only utilize the position data to perform position debiasing in the training stage. On top of the well-trained model, we set all positions to 0 by default for online serving.
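The summation over subsequent positions and its mask-based form can be sketched in a few lines. The helper names `position_mask` and `position_output` and the uplift values are hypothetical; positions are 0-indexed, as in the analysis of Section 2.

```python
def position_mask(k, num_positions):
    """Row k of an upper-triangular mask: 1.0 for positions j >= k, else 0.0."""
    return [1.0 if j >= k else 0.0 for j in range(num_positions)]

def position_output(k, uplifts):
    """Output for position k as the masked sum of the per-position uplifts."""
    mask = position_mask(k, len(uplifts))
    return sum(m * u for m, u in zip(mask, uplifts))

uplifts = [0.3, 0.2, 0.1]  # illustrative uplift values for positions 0, 1, 2
training_output = position_output(2, uplifts)  # sample displayed at position 2
serving_output = position_output(0, uplifts)   # position fixed to 0 at serving time
```

With the position fixed to 0 at serving time, every candidate receives the same full sum of uplifts, so the position term no longer favors any particular candidate.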
4.3. Model learning
In COUPA, we intend to maximize the following posterior probability of the model parameters given the set $\mathcal{D}$ of observed interaction records, each involving a target user $u$ and item $i$ with the corresponding position $k$ and click timestamp $t$:
$$P(\Theta \mid \mathcal{D}) \propto P(\Theta)\, P(\mathcal{D} \mid \Theta) \tag{14}$$
where $\Theta$ is the parameter set of COUPA. Here, the first term $P(\Theta)$ is the prior probability of the model parameters, which can be regarded as a regularizer to avoid overfitting. We then mainly focus on the estimation of the second term $P(\mathcal{D} \mid \Theta)$, which is optimized by minimizing its negative logarithm:
(15) |
Here, is the cross entropy function and is the noise distribution for the generator of negative samples.
5. Online Deployment of COUPA

In this section, we introduce the online deployment of the proposed COUPA in Alipay (note that COUPA can also be adapted to arbitrary recommendation scenarios on O2O service platforms), where it serves ranking tasks for channels and for the contents of each channel simultaneously. The system design remains challenging. Most importantly, our temporal sequential model requires low latency when collecting real-time sequential data. In addition, COUPA is more complicated than traditional methods (e.g., DeepFM), which drives us to minimize the latency of online inference while supporting high QPS (queries per second). To tackle these challenges, we carefully design the industrial system, which comprises three main modules: data processing, two-stage online serving and offline training.
Data processing. User feedback data (i.e., exposures and clicks) are necessary to train the recommender system, and the trade-off between data storage and timely feedback needs to be balanced. Hence, our principle is to store the 90-day click data, which is huge in volume, in cheap storage that only guarantees a latency of days, and to keep the timely feedback data, which is tiny in volume, on edges that guarantee a latency of seconds. As such, we design the following batch, streaming and edge computing to work in a cooperative manner, followed by a sequence fusion procedure.
Batch computing. Based on the powerful MaxCompute (https://www.alibabacloud.com/product/maxcompute), an offline and low-cost computing platform developed by Alicloud (https://www.alibabacloud.com/), we perform batch computing to generate user click sequences over the last 90 days, involving hundreds of billions of interaction records, with a latency of days.
Streaming computing. Streaming computing is mainly based on the frigate system in the Ant Group (similar to Kafka (https://kafka.apache.org/) and Storm (https://storm.apache.org/)), which generates user click sequences over the last 2 days. Due to the effects of log backflow and real-time reporting, this part of the data has a latency of tens of seconds.
Edge computing. Due to the urgent real-time demand of COUPA, we employ edge computing to collect click sequences on the client in real time, with a latency of only a few seconds. Specifically, user click data is captured by the client as soon as it is reported and recorded into a local database. As soon as a user issues a request, the client processes his/her click data from the last 3 hours and sends it to the online servers.
Sequence fusion. Sequence fusion is applied to fuse user click sequences over different time spans. It stitches together the sequence data generated by batch, streaming and edge computing in chronological order and eliminates duplicates using timestamps. Subsequently, a complete sequence covering the last 90 days is passed to COUPA as input.
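The fusion step described above can be sketched as a merge, deduplicate and window operation. This is a simplified illustration, not the production logic: it assumes each click is a `(timestamp, item_id)` pair, treats two clicks as duplicates when both fields match, and uses a hypothetical function name `fuse_sequences`.

```python
def fuse_sequences(batch, streaming, edge, now, window):
    """Merge click lists from the three sources into one chronological sequence.

    Each input is a list of (timestamp, item_id) pairs. Clicks older than
    `now - window` are dropped, and exact duplicates are kept only once.
    """
    seen = set()
    fused = []
    for event in sorted(batch + streaming + edge):
        timestamp, _ = event
        if timestamp < now - window or event in seen:
            continue
        seen.add(event)
        fused.append(event)
    return fused

# The three sources overlap in time, so the same click can arrive more than once.
batch = [(10, "x"), (100, "a"), (200, "b")]
streaming = [(200, "b"), (300, "c")]
edge = [(300, "c"), (400, "d")]
fused = fuse_sequences(batch, streaming, edge, now=1000, window=950)
```

Overlapping the time spans of the three sources (90 days, 2 days, 3 hours) means no click is lost at the boundaries, at the cost of having to deduplicate during fusion.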
Two-stage online serving. This module is designed to reduce the latency of the online system for COUPA through two main stages. Stage I: the Channel Ranking Worker performs coarse-grained ranking on channels, with dozens of candidates, based on COUPA, ignoring the features of the contents under the channels. Then only the top-ranked channels (e.g., the “Food Delivery” and “Travel” channels) are selected to request their corresponding content recommender systems, with tens of millions of candidates, in the Content Recommendation Worker, whose ranking model is also COUPA. Stage II: once the contents under the channels have been determined in Stage I and the content features have been collected, the Channel Ranking Worker performs fine-grained ranking on the top channels, taking content features into account, to decide their display order. In this way, system latency is greatly reduced and the stability of COUPA for online inference is ensured.
Offline training. This module trains COUPA based on users’ historical click records and online feature logs. With the help of MaxCompute and the PAI platform (https://www.alibabacloud.com/product/machine-learning), we are capable of training our model on tens of billions of samples; the details of the model are given in Section 4.
6. Experiments
Table 1. Statistics of the datasets.
Statistic | Food Delivery | Travel | Channel |
---|---|---|---|
# Users | 1,388,754 | 1,323,131 | 8,396,541 |
# Items | 1,937,950 | 17,859 | 17 |
# Train samples | 4,351,640 | 3,839,993 | 31,672,510 |
# Validation samples | 200,000 | 200,000 | 200,000 |
# Test samples | 3,385,818 | 3,186,217 | 1,682,579 |
In this section, we conduct a series of offline and online experiments to demonstrate the effectiveness of COUPA. In short, we aim at answering the following three research questions:
- RQ1: Does our proposed model outperform other state-of-the-art methods on both the temporal recommendation and position debiasing tasks?
- RQ2: Does our proposed model achieve superior performance in real-world scenarios?
- RQ3: Does our proposed model intuitively provide convincing evidence for recommendation?
6.1. Experimental Setup
6.1.1. Dataset
We collect three real-world datasets for offline evaluation from Alipay, namely the Food Delivery, Travel and Channel datasets. Specifically, we extract the interaction records from “2021-01-07” to “2021-01-13” for training and reserve the records on “2021-01-13” for testing. Moreover, we hold out 200,000 samples of the training data as the validation set for parameter tuning. For each user in the extracted dataset, we collect his/her interaction records over the last three months to construct recent behaviors. We only keep the users who have more than 10 behaviors and keep the ratio of positive to negative samples at 1 : 6. In addition, abundant features are extracted to effectively characterize users, items and associated scenarios, as presented in Section 4. Statistics of the datasets are shown in Table 1.
6.1.2. Baselines
In our experiments, we compare our approach with several state-of-the-art baselines, developed for sequential recommendation (i.e., DeepFM (Guo et al., 2017), GRU4Rec (Hidasi et al., 2016), DIN (Zhou et al., 2018), DIEN (Zhou et al., 2019), SASRec (Kang and McAuley, 2018), Time-LSTM (Zhu et al., 2017)) and position debiasing (i.e., YoutubeRank (Zhao et al., 2019) and PAL (Guo et al., 2019)), respectively. Considering the industrial settings, the selected baselines have potential for scaling up to huge volume of datasets.
- DeepFM (Guo et al., 2017): It is a typical feature based recommendation method consisting of a factorization machine (FM) component and a deep neural network (DNN) component.
- GRU4Rec (Hidasi et al., 2016): It is a sequential recommendation method with an RNN structured GRU part for modelling user behaviors.
- DIN (Zhou et al., 2018): It is a sequential recommendation method with an attention mechanism to exploit related user behaviors.
- DIEN (Zhou et al., 2019): It is an improved version of DIN that considers the interest evolving process through a GRU with an attention update gate.
- SASRec (Kang and McAuley, 2018): It is a representative sequential recommendation method using a left-to-right Transformer module to capture user behaviors.
- Time-LSTM (Zhu et al., 2017): It is a temporal recommendation method that equips an LSTM structure with time gates for time interval modelling.
- YoutubeRank (Zhao et al., 2019): It is an industrial video ranking system for YouTube, which employs the Multi-gate Mixture-of-Experts to optimize multiple ranking objectives and adopts a Wide&Deep framework to mitigate selection bias.
- PAL (Guo et al., 2019): It is a position bias aware learning framework for CTR prediction, which jointly optimizes the probability that a user sees a target item and the probability that the user clicks it.
Moreover, COUPA has two variants:
- COUPA$_T$: It is a variant of COUPA that only models the user’s temporal preference over continuous time.
- COUPA$_P$: It is another variant of COUPA that only targets the position debiasing task.
6.1.3. Implementation Details
We implement all methods on parameter server based distributed learning systems (Zhou et al., 2017) with TensorFlow 1.13. All parameters for the baselines are tuned on the validation set mentioned above, and we briefly present the optimal parameters used in the following experiments for reproducibility. For all methods, we use Adam for optimization with a learning rate of 1e-4. We set the embedding size of each feature to 8 and the architecture of the MLP to [256, 128, 64]. For sequential methods, the maximum sequence length is set to 50. For SASRec, two self-attention blocks are used. We run each method ten times and average the results as the final performance.
6.1.4. Evaluation Metrics
In our experiments, we employ several widely used metrics to evaluate the offline and online performance of all approaches, respectively. To evaluate the offline performance (i.e., Section 6.2.1), we adopt the group weighted area under curve (GAUC), which is a more fine-grained metric in industrial settings since it is more relevant to online performance. Formally, we calculate the GAUC as follows:
$$\mathrm{GAUC} = \frac{\sum_{g} w_g \cdot \mathrm{AUC}_g}{\sum_{g} w_g} \tag{16}$$
where $w_g$ is the weight of group $g$ (i.e., the number of samples in group $g$) and $\mathrm{AUC}_g$ is the AUC for group $g$. Moreover, we also report the relative improvement (RI) w.r.t. GAUC of our approach over the compared baselines, which is defined as:
$$\mathrm{RI} = \frac{|X_{\mathrm{ours}} - X_{\mathrm{base}}|}{X_{\mathrm{base}}} \times 100\% \tag{17}$$
where $|\cdot|$ is the absolute value, and $X_{\mathrm{ours}}$ and $X_{\mathrm{base}}$ refer to the performance of our approach and of the corresponding baseline, respectively. Note that in our industrial scenarios (especially the channel ranking scenario), even a 0.1% improvement in GAUC is considered remarkable. To evaluate the online performance (i.e., Sections 6.2.2 and 6.3), we use the IPV and CTR metrics, which are commonly adopted in various industrial online systems.
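The two offline metrics can be sketched in plain Python as follows. The function names are hypothetical, the grouping (for example, one group per user) is an assumption, and groups that contain only one class are skipped because their AUC is undefined, which is a common practice but not stated in the text.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined for single-class groups
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gauc(groups):
    """groups: dict mapping a group id to (labels, scores); weight = number of samples."""
    weighted, total = 0.0, 0
    for labels, scores in groups.values():
        value = auc(labels, scores)
        if value is None:
            continue
        weighted += len(labels) * value
        total += len(labels)
    return weighted / total

def relative_improvement(ours, baseline):
    """Relative improvement in percent."""
    return abs(ours - baseline) / baseline * 100.0

groups = {
    "u1": ([1, 0, 0], [0.9, 0.1, 0.2]),  # perfectly ranked, weight 3
    "u2": ([1, 0], [0.3, 0.6]),          # inverted ranking, weight 2
}
example_gauc = gauc(groups)
example_ri = relative_improvement(0.7784, 0.7534)  # GAUC values taken from Table 2
```

In the toy example the first group contributes an AUC of 1 with weight 3 and the second an AUC of 0 with weight 2, so the GAUC is 0.6. The relative improvement for the Food Delivery GAUC values of COUPA$_T$ and DeepFM comes out at roughly 3.3%.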
Table 2. Performance comparison for temporal recommendation on the three offline datasets.
Method | Food Delivery GAUC | Food Delivery RI | Travel GAUC | Travel RI | Channel GAUC | Channel RI |
---|---|---|---|---|---|---|
DeepFM | 0.7534 | +3.31% | 0.7198 | +4.86% | 0.8837 | +0.26% |
GRU4Rec | 0.7682 | +1.32% | 0.7373 | +2.37% | 0.8839 | +0.15% |
DIN | 0.7682 | +1.32% | 0.7371 | +2.40% | 0.8846 | +0.16% |
DIEN | 0.7701 | +1.08% | 0.7421 | +1.71% | 0.8849 | +0.12% |
SASRec | 0.7688 | +1.25% | 0.7362 | +2.52% | 0.8843 | +0.19% |
Time-LSTM | 0.7698 | +1.12% | 0.7399 | +2.01% | 0.8842 | +0.20% |
COUPA$_T$ | 0.7784** | - | 0.7548** | - | 0.8860* | - |
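Table 3. Online performance comparison for position debiasing (relative improvement w.r.t. YoutubeRank).
Scenario | Method | CTR | IPV |
---|---|---|---|
Food Delivery | YoutubeRank | - | - |
Food Delivery | PAL | +1.07% | +1.03% |
Food Delivery | COUPA$_P$ | +1.44% | +1.28% |
Travel | YoutubeRank | - | - |
Travel | PAL | +0.78% | +0.84% |
Travel | COUPA$_P$ | +1.22% | +1.17% |
Channel | YoutubeRank | - | - |
Channel | PAL | +1.26% | +1.50% |
Channel | COUPA$_P$ | +2.04% | +2.18% |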



6.2. Performance Comparison (RQ1)
6.2.1. Temporal Recommendation
We evaluate the performance of temporal recommendation on three offline datasets. Since the performance of position debiasing cannot be evaluated offline, we only show the performance of the variant COUPA$_T$ and select DeepFM, GRU4Rec, DIN, DIEN, SASRec and Time-LSTM as baselines for comparison.
We report the comparison results on the three offline datasets in Table 2 and summarize the major findings as follows. (i) COUPA$_T$ is consistently better than all the baselines with statistical significance, demonstrating its effectiveness in learning time aware preference for recommendation. Note that the improvement on the Channel dataset is relatively small since there are extremely few items in this dataset. Nevertheless, such a slight gain is still remarkable for online performance, as will be shown in Section 6.3. (ii) Among the baselines, the performance improvement of GRU4Rec, DIN and SASRec over DeepFM reveals the effectiveness of users’ historical behaviors for inferring their preferences, while the improvements of DIEN over DIN and of Time-LSTM over GRU4Rec imply the importance of modelling the evolution of user interest over time. (iii) COUPA$_T$ still significantly outperforms both state-of-the-art sequential and temporal recommendation methods, which we attribute to its ability to summarize the influence of historical interactions through the continuous time aware attention mechanism and point process. Overall, COUPA$_T$ improves over the best baseline by 1.08%, 1.71% and 0.12% on the three offline datasets, respectively.
6.2.2. Position Debiasing
We examine the performance of position debiasing in three online scenarios (i.e., Food delivery, Travel and Channels). For a fair comparison, we report only the performance of COUPAP and select YoutubeRank and PAL as baselines. For convenience, we report relative improvement ratios w.r.t. YoutubeRank. We run the evaluation over 7 days and report the average results for each method in Table 3.
From the results, we observe that COUPAP achieves the best performance with statistical significance in all scenarios in terms of both IPV and CTR, clearly demonstrating the advantage of COUPAP in performing position debiasing in a personalized manner. It is noteworthy that COUPAP works especially well in the channel rank scenario. An intuitive explanation is that position bias is more serious in this scenario, since each channel occupies a large area and the corresponding click distributions are highly skewed, as analyzed in Section 2. YoutubeRank and PAL are unable to handle these issues well.
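The relative improvement ratios reported for the online comparison can be computed as in the following minimal sketch. It assumes that the lift is computed per day against the baseline and then averaged over the evaluation window; the text does not state whether averaging happens before or after taking the ratio, so this order is an assumption. The function names and the daily values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def relative_improvement(treatment: float, baseline: float) -> float:
    """Relative lift of `treatment` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    return (treatment - baseline) / baseline * 100.0


def average_daily_lift(treatment_daily, baseline_daily) -> float:
    """Mean of the per-day relative lifts over an evaluation window.

    Assumption: one metric value (e.g., CTR) per day for each method.
    """
    if len(treatment_daily) != len(baseline_daily):
        raise ValueError("both series must cover the same days")
    return mean(
        relative_improvement(t, b)
        for t, b in zip(treatment_daily, baseline_daily)
    )


if __name__ == "__main__":
    # Hypothetical daily CTR values for a 7-day window (not from the paper).
    baseline_ctr = [0.100, 0.102, 0.098, 0.101, 0.099, 0.100, 0.100]
    model_ctr = [0.101, 0.104, 0.099, 0.102, 0.101, 0.101, 0.102]
    print(f"average lift: {average_daily_lift(model_ctr, baseline_ctr):+.2f}%")
```

Averaging per-day lifts weights every day equally; pooling impressions over all days before taking the ratio would instead weight days by traffic.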
6.3. Online Performance (RQ2)
To further verify the effectiveness of the proposed COUPA in real-world settings, we conduct a series of online experiments. As before, we deploy COUPA in three scenarios (i.e., Food delivery, Travel and Channels) in Alipay and compare it with the existing baseline deployed in our real system. Specifically, for each scenario, we conduct online bucket testing (i.e., A/B testing) to evaluate users’ responses to COUPA against the baseline, where we select one bucket for COUPA and another for the baseline. We perform the evaluation from “2020-01-13” to “2020-01-20” and present the performance comparison in Figure 5. For convenience, we report relative improvement ratios w.r.t. the selected baseline.
From the results, we observe that, compared to the best baseline used in our real system, COUPA consistently and significantly improves performance by a large margin in all three scenarios and across all online metrics. This further demonstrates the ability of COUPA to capture user preferences over time and to mitigate position bias in real-world applications. Overall, across the three real scenarios, COUPA achieves average improvements of 4.06%, 6.37% and 3.11% for IPV, and 3.73%, 5.63% and 2.47% for CTR, respectively.
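Bucket testing of this kind is commonly implemented by deterministically hashing each user into a fixed number of buckets and reserving one bucket per variant. The sketch below illustrates that general idea only; the hashing scheme, the number of buckets and all names are assumptions and do not describe the mechanism actually used in Alipay.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket in [0, num_buckets).

    Salting the hash with the (hypothetical) experiment name keeps the
    assignments of different experiments independent of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % num_buckets


def variant_for(user_id: str, experiment: str,
                treatment_bucket: int = 0, control_bucket: int = 1) -> str:
    """One bucket serves the new model, another the baseline."""
    bucket = assign_bucket(user_id, experiment)
    if bucket == treatment_bucket:
        return "treatment"
    if bucket == control_bucket:
        return "control"
    return "not_in_experiment"


if __name__ == "__main__":
    # Hypothetical user and experiment identifiers.
    print(variant_for("user_42", "coupa_food_delivery"))
```

Because the assignment depends only on the user identifier and the experiment name, a user sees the same variant for the whole evaluation period.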


6.4. Case Study (RQ3)
Finally, we conduct case studies to show how COUPA intuitively provides reasonable evidence for recommendation in terms of continuous time and position bias, respectively.
From the temporal perspective, we select a user from the Food delivery dataset and plot his/her attention weights w.r.t. the time interval of historical behaviors in Figure 6 (a). We make the following two observations: (i) As the time interval increases, the corresponding attention weight decreases, demonstrating that the user’s current preference is strongly affected by recent behaviors. (ii) The attention weight changes with a periodic trend, which is consistent with our findings in Section 2. In sum, both observations intuitively verify that COUPA is capable of summarizing the influence of historical interactions for recommendation.
From the perspective of position bias, we select two typical users and plot the average prediction score for each position (i.e., according to Eq. 11) as well as their actual click trends in Figure 6 (b). We only present the results for the top 4 positions (i.e., “0” to “3”), since they receive most of the user impressions in the channel rank scenario of Alipay. As the position changes, we find distinct trends in the average prediction score for the two selected users (i.e., increasing for “User 1” and decreasing for “User 2”). Not surprisingly, the score distributions predicted by COUPA are consistent with the users' actual click trends w.r.t. each position. The above analyses further verify the strong capability of COUPA to capture personalized bias based on the real distribution over positions.
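The per-position comparison described above amounts to grouping one user's impressions by position and comparing the mean predicted score with the empirical click rate. A minimal sketch follows, assuming each impression is available as a `(position, predicted_score, clicked)` tuple; this record layout and the sample values are hypothetical, and the scores stand in for the output of Eq. 11.

```python
from collections import defaultdict


def per_position_stats(records):
    """Return {position: (mean predicted score, empirical click rate)}.

    `records` is an iterable of (position, predicted_score, clicked)
    tuples for a single user; `clicked` is 0 or 1.
    """
    score_sum = defaultdict(float)
    clicks = defaultdict(int)
    count = defaultdict(int)
    for position, score, clicked in records:
        score_sum[position] += score
        clicks[position] += int(clicked)
        count[position] += 1
    return {
        pos: (score_sum[pos] / count[pos], clicks[pos] / count[pos])
        for pos in sorted(count)
    }


if __name__ == "__main__":
    # Hypothetical impressions of one user (not data from the paper).
    impressions = [(0, 0.25, 0), (0, 0.75, 1), (1, 0.5, 1)]
    for pos, (avg_score, ctr) in per_position_stats(impressions).items():
        print(f"position {pos}: avg score {avg_score:.2f}, click rate {ctr:.2f}")
```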
7. Related Work
Recent years have witnessed the great success of sequential recommendation methods, which are devoted to predicting user interests based on historical behavior sequences. Early works mainly focus on modelling behavior sequences with Markov chains (Rendle et al., 2010; He and McAuley, 2016; Zimdars et al., 2001). Due to its ability to capture complex feature interactions, the deep neural network (DNN) has yielded excellent performance in various applications. To combine the advantages of DNNs and sequential recommendation, a series of methods attempt to extract user preferences from previous behaviors through recurrent neural network (RNN) (Hidasi et al., 2016; Tan et al., 2016), convolutional neural network (CNN) (Xu et al., 2019b; Tang and Wang, 2018; Yan et al., 2019) and memory network (Chen et al., 2018) based structures. Meanwhile, the recently emerging neural attention mechanism (Vaswani et al., 2017) is commonly adopted to characterize long- and short-term preferences in a more fine-grained manner (Zhou et al., 2018, 2019; Kang and McAuley, 2018; Luo et al., 2020). Furthermore, many recent studies have shown that temporal information, which reflects users’ underlying reasons to click or purchase a target item, is of crucial importance for recommendation. Therefore, a few works incorporate such temporal information into sequential recommendation with an extended LSTM structure (Zhu et al., 2017), a temporal point process (Bai et al., 2019) and a time aware neural network (Ye et al., 2020). We refer the reader to recent surveys (Fang et al., 2020; Wu et al., 2021) for a thorough review. Unfortunately, due to the time sensitivity of user preferences, how to fully exploit the influences derived from historical interaction records remains challenging. Moreover, in industrial settings, automatically collecting real-time sequential data and guaranteeing low latency for online serving also need to be carefully considered.
On the other hand, recent years have seen a surge of efforts to explore the impact of biases and to perform the necessary debiasing in recommender systems. In our paper, we focus on the widely studied position bias, which denotes that users are more likely to click higher ranked items regardless of their relevance. To address position bias, a number of works have been proposed, which roughly fall into two lines. The first line makes some basic hypotheses about user browsing behaviors. Specifically, these methods maximize the likelihood of the observed interaction data for the true relevance feedback with self-defined examination hypotheses (Craswell et al., 2008; Dupret and Piwowarski, 2008; Chapelle and Zhang, 2009), personalized transition probability based cascade models (Guo et al., 2009; Zhu et al., 2010) and a state-of-the-art deep recurrent survival model (Jin et al., 2020). The other line is based on the recently emerging counterfactual learning (Joachims et al., 2017; Wang et al., 2018), which weights each sample with a position aware value and collects the user feedback with an inverse propensity score (Agarwal et al., 2019). Following this line, some recent works (Qin et al., 2020; Ai et al., 2018) further adopt dual learning to jointly learn a propensity model and a recommendation model with well-designed EM algorithms. For more work on debiasing in recommender systems, please refer to the detailed survey (Chen et al., 2020). Nevertheless, these methods mainly assume the independence of positions, while the personalization of position bias is still unexplored.
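To make the inverse propensity score idea concrete, the following sketch shows a simplified pointwise variant: each clicked sample contributes a log loss weighted by the inverse of the examination propensity assumed for its position, so that clicks at rarely examined positions count more. It is only an illustration of the weighting principle, not the exact objective of any cited method; the propensity values, the clipping threshold and the function name are assumptions.

```python
import math


def ips_weighted_click_loss(samples, propensity, clip=1e-3, eps=1e-12):
    """IPS-weighted log loss over clicked samples, averaged over all samples.

    samples:    iterable of (position, clicked, predicted_relevance)
    propensity: dict mapping position -> assumed examination probability
    clip:       lower bound on the propensity to limit the variance of weights
    """
    total, n = 0.0, 0
    for position, clicked, p in samples:
        n += 1
        if not clicked:
            # Non-clicks are ambiguous (not examined vs. not relevant), so this
            # simplified variant lets only clicks contribute to the loss.
            continue
        p = min(max(p, eps), 1.0 - eps)
        weight = 1.0 / max(propensity[position], clip)
        total += -weight * math.log(p)
    return total / n if n else 0.0


if __name__ == "__main__":
    # Hypothetical propensities and samples, for illustration only.
    theta = {0: 1.0, 1: 0.5, 2: 0.25}
    data = [(0, 1, 0.5), (1, 1, 0.5), (2, 0, 0.9)]
    print(f"loss: {ips_weighted_click_loss(data, theta):.4f}")
```

Clipping the propensity from below is a common way to keep a few very unlikely positions from dominating the loss.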
8. Conclusion and Future Work
In this paper, we proposed COUPA, a novel continuous time and position aware recommender system for O2O service platforms (e.g., Alipay), which comprehensively takes user preferences towards temporal patterns and position biases into consideration. To better learn temporal patterns in real-world applications, we develop a continuous time aware point process equipped with a continuous time aware attention mechanism to chronologically summarize the influences derived from historical behaviors for recommendation. Moreover, a position selector module, working together with a Multi-gate Mixture-of-Experts (MMoE) block and a Gated Linear Units (GLU) block, is introduced to mitigate position bias in a personalized manner. Finally, we devoted considerable effort to the design and implementation of the whole system, which jointly employs edge, streaming and batch computing to capture real-time user behaviors and adopts a two-stage mode for efficient online serving. Extensive offline and online experiments demonstrated the superiority of the proposed COUPA.
Currently, our approach is able to effectively extract user preferences from historical interaction records for recommendation. However, real-world scenarios contain a number of users and items with sparse interactions, for which the current approach fails to learn high-quality representations. As future work, we will consider how to incorporate graph structured data (e.g., social networks and knowledge graphs) to compensate for these cold-start users and items. In addition, we will also consider adapting our approach to handle geographical information, which plays an important role in O2O service platforms.
References
- Agarwal et al. (2019) Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, and Thorsten Joachims. 2019. A general framework for counterfactual learning-to-rank. In SIGIR. 5–14.
- Ai et al. (2018) Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018. Unbiased learning to rank with unbiased propensity estimation. In SIGIR. 385–394.
- Bai et al. (2019) Ting Bai, Lixin Zou, Wayne Xin Zhao, Pan Du, Weidong Liu, Jian-Yun Nie, and Ji-Rong Wen. 2019. Ctrec: A long-short demands evolution model for continuous-time recommendation. In SIGIR. 675–684.
- Chapelle and Zhang (2009) Olivier Chapelle and Ya Zhang. 2009. A dynamic bayesian network click model for web search ranking. In WWW. 1–10.
- Chen et al. (2020) Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. arXiv preprint arXiv:2010.03240 (2020).
- Chen et al. (2018) Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential recommendation with user memory networks. In WSDM. 108–116.
- Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In RecSys Workshop. 7–10.
- Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In RecSys. 191–198.
- Craswell et al. (2008) Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In WSDM. 87–94.
- Dauphin et al. (2017) Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language modeling with gated convolutional networks. In ICML. 933–941.
- Dupret and Piwowarski (2008) Georges E Dupret and Benjamin Piwowarski. 2008. A user browsing model to predict search engine click data from past observations. In SIGIR. 331–338.
- Fang et al. (2020) Hui Fang, Danning Zhang, Yiheng Shu, and Guibing Guo. 2020. Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations. ACM Transactions on Information Systems 39, 1 (2020), 1–42.
- Gong et al. (2020) Yu Gong, Ziwen Jiang, Yufei Feng, Binbin Hu, Kaiqi Zhao, Qingwen Liu, and Wenwu Ou. 2020. EdgeRec: Recommender System on Edge in Mobile Taobao. In CIKM. 2477–2484.
- Guo et al. (2009) Fan Guo, Chao Liu, Anitha Kannan, Tom Minka, Michael Taylor, Yi-Min Wang, and Christos Faloutsos. 2009. Click chain model in web search. In WWW. 11–20.
- Guo et al. (2017) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In IJCAI. 1725–1731.
- Guo et al. (2019) Huifeng Guo, Jinkai Yu, Qing Liu, Ruiming Tang, and Yuzhou Zhang. 2019. PAL: a position-bias aware learning framework for CTR prediction in live recommender systems. In RecSys. 452–456.
- He and McAuley (2016) Ruining He and Julian McAuley. 2016. Fusing similarity models with markov chains for sparse sequential recommendation. In ICDM. 191–200.
- Hidasi et al. (2016) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In ICLR.
- Hou et al. (2022) Yupeng Hou, Binbin Hu, Zhiqiang Zhang, and Wayne Xin Zhao. 2022. Core: simple and effective session-based recommendation within consistent representation space. In SIGIR. 1796–1801.
- Hu et al. (2022) Binbin Hu, Zhengwei Wu, Jun Zhou, Ziqi Liu, Zhigang Huangfu, Zhiqiang Zhang, and Chaochao Chen. 2022. MERIT: Learning Multi-level Representations on Temporal Graphs. In IJCAI.
- Jin et al. (2020) Jiarui Jin, Yuchen Fang, Weinan Zhang, Kan Ren, Guorui Zhou, Jian Xu, Yong Yu, Jun Wang, Xiaoqiang Zhu, and Kun Gai. 2020. A deep recurrent survival model for unbiased ranking. In SIGIR. 29–38.
- Joachims et al. (2017) Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In WSDM. 781–789.
- Kang and McAuley (2018) Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In ICDM. 197–206.
- Lian et al. (2018) Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xdeepfm: Combining explicit and implicit feature interactions for recommender systems. In SIGKDD. 1754–1763.
- Luo et al. (2020) Anjing Luo, Pengpeng Zhao, Yanchi Liu, Fuzhen Zhuang, Deqing Wang, Jiajie Xu, Junhua Fang, and Victor S Sheng. 2020. Collaborative Self-Attention Network for Session-based Recommendation. In IJCAI. 2591–2597.
- Ma et al. (2018) Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In SIGKDD. 1930–1939.
- Minh et al. (2006) Ha Quang Minh, Partha Niyogi, and Yuan Yao. 2006. Mercer’s theorem, feature maps, and smoothing. In COLT. 154–168.
- Omi et al. (2019) Takahiro Omi, Naonori Ueda, and Kazuyuki Aihara. 2019. Fully neural network based model for general temporal point processes. In NIPS. 2120–2129.
- Qin et al. (2020) Zhen Qin, Suming J Chen, Donald Metzler, Yongwoo Noh, Jingzheng Qin, and Xuanhui Wang. 2020. Attribute-based propensity for unbiased learning in recommender systems: Algorithm and case studies. In SIGKDD. 2359–2367.
- Rendle et al. (2010) Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In WWW. 811–820.
- Tan et al. (2016) Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved recurrent neural networks for session-based recommendations. In RecSys Workshop. 17–22.
- Tang and Wang (2018) Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In WSDM. 565–573.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS. 5998–6008.
- Wang et al. (2018) Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position bias estimation for unbiased learning to rank in personal search. In WSDM. 610–618.
- Wu et al. (2021) Le Wu, Xiangnan He, Xiang Wang, Kun Zhang, and Meng Wang. 2021. A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation. arXiv preprint arXiv:2104.13030 (2021).
- Xiao et al. (2017) Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In IJCAI. 3119–3125.
- Xu et al. (2019b) Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Jiajie Xu, Victor S Sheng, Zhiming Cui, Xiaofang Zhou, and Hui Xiong. 2019b. Recurrent convolutional neural network for sequential recommendation. In WWW. 3398–3404.
- Xu et al. (2019a) Da Xu, Chuanwei Ruan, Sushant Kumar, Evren Korpeoglu, and Kannan Achan. 2019a. Self-attention with functional time representation learning. In NIPS. 15915–15925.
- Yan et al. (2019) An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, and Julian McAuley. 2019. CosRec: 2D convolutional neural networks for sequential recommendation. In CIKM. 2173–2176.
- Ye et al. (2020) Wenwen Ye, Shuaiqiang Wang, Xu Chen, Xuepeng Wang, Zheng Qin, and Dawei Yin. 2020. Time Matters: Sequential Recommendation with Complex Temporal Information. In SIGIR. 1459–1468.
- Zhao et al. (2019) Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi. 2019. Recommending what video to watch next: a multitask ranking system. In RecSys. 43–51.
- Zhou et al. (2019) Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In AAAI. 5941–5948.
- Zhou et al. (2018) Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In SIGKDD. 1059–1068.
- Zhou et al. (2017) Jun Zhou, Xiaolong Li, Peilin Zhao, Chaochao Chen, Longfei Li, Xinxing Yang, Qing Cui, Jin Yu, Xu Chen, Yi Ding, et al. 2017. Kunpeng: Parameter server based distributed learning systems and its applications in alibaba and ant financial. In SIGKDD. 1693–1702.
- Zhu et al. (2017) Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. 2017. What to Do Next: Modeling User Behaviors by Time-LSTM. In IJCAI. 3602–3608.
- Zhu et al. (2010) Zeyuan Allen Zhu, Weizhu Chen, Tom Minka, Chenguang Zhu, and Zheng Chen. 2010. A novel click model and its applications to online advertising. In WSDM. 321–330.
- Zimdars et al. (2001) Andrew Zimdars, David Maxwell Chickering, and Christopher Meek. 2001. Using temporal data for making recommendations. In UAI. 580–588.