Assessing the Impact of Upselling in Online Fantasy Sports

Aayush Chaudhary, Dream11, Mumbai, India

(2024)

Abstract.

This study explores the impact of upselling on user engagement. First, we model users’ deposit behaviour on the fantasy sports platform Dream11. Subsequently, we develop an experimental framework to evaluate the effect of upselling using an intensity parameter. Our live experiments on user deposit behaviour reveal that user recall decreases as upselling intensity increases: higher intensity improves user deposit metrics while diminishing user satisfaction and conversion rates. We conduct robust counterfactual analysis and train causal meta-learners to personalise each user’s upselling intensity level and reach an optimal trade-off point.

Behavioural sciences, Online Upselling, Causal Machine Learning, Price Sensitivity, Meta-learners

The Fourth International Conference on Artificial Intelligence and Machine Learning Systems (AIMLSystems 2024), October 25–28, 2024, Bangalore, India. ISBN: 979-8-4007-1649-2/23/10. Copyright: ACM licensed.

1. Introduction

Upselling or cross-selling refers to offering users additional products or services that are higher-end or complementary to their initial purchase. This strategy is widely employed across industries to enhance revenue and customer value. According to Schiffman (2005), “Upselling is the initiative to encourage customers who have already purchased to buy more of the same product or additional products”, whereas Kubiak and Weichbroth (2010) describe upselling as advancing to a more expensive product or service.

Upselling, and its impact on customer satisfaction and revenue metrics, has been studied extensively across sectors such as airlines, hospitality, retail, and insurance. For instance, in the airline industry, upselling involves flight seat upgrades (Thirumuruganathan et al., 2023); in the hospitality industry, it involves luxury room upgrades (Guillet, 2020); and in the insurance industry, personalised treatment models are used to cross-sell and upsell (Guelman et al., 2014).

In this paper, we study user behaviour under upselling on Dream11, a fantasy sports platform with over 200M users. Further, to optimise the upselling policy, we developed a system for a problem that existing methods relying on traditional machine learning techniques have not addressed effectively.

In this work, we make the following contributions to the field of user behaviour modelling in upselling on online platforms:

• Using a multi-class supervised model, we predict the deposit propensity of a user and evaluate it against supervised regression, classification and heuristic baseline methods. During online experiments, we analyse the trade-offs between recall and revenue uplift.
• We evaluate causal models to address these trade-offs and arrive at an optimal upselling policy from real-world deposit data collected using grid experiments across the spectrum of intensity values.

By accurately predicting deposit amounts and tailoring upselling suggestions, we aim to support a seamless payment system, user engagement, fairness, and the platform’s revenue optimisation. We conducted several online grid experiments across intensity gradients to upsell users and found that increasing intensity decreases user recall. Recall dropped because some users disliked upselling, and many new users dropped off the conversion funnel. A personalised upselling policy was therefore needed. The proposed approach builds upon previous work in uplift modelling to estimate heterogeneous treatment effects (Zhao and Harinen, 2019).

2. Problem Formulation and Algorithms

We develop a supervised model to predict the expected deposit amount for each user. This model will feed the upselling strategy by generating a list of suggested deposit amounts tailored to individual users.

2.1. Supervised Model Formulation

Let $\mathbf{X}=\{x_{1},x_{2},\ldots,x_{n}\}$ denote the set of user-specific and deposit-specific features, including demographic information, historical transaction data, and engagement metrics. Let $F(\mathbf{X})$ represent the predicted distribution over deposit-amount classes for a user, obtained from a multi-class classification model. The problem involves two key components: predicting the expected deposit amount for each user and computing the recommended list of suggested deposit amounts. We want to learn the function $F$ from historical deposit data so that $F(\mathbf{X})$ predicts the expected deposit amounts.

2.1.1. Evaluation and Learnings

We evaluate our classification models with metrics that reflect their accuracy and reliability. The dataset is imbalanced towards lower deposit amounts, and these low-value transactions place a significant load on the payment systems. We therefore selected the weighted F1 score as our evaluation criterion, because the model needs to perform well across all classes. Focal loss and L2 regularization proved pivotal for the accuracy of the final model.

Model	F1 Score (Weighted)
Heuristic (Median rolling 10 Txn)	0.736
LGBM Regressor	0.758
LGBM Classifier	0.804
LGBM Classifier (Focal Loss)	0.852

Table 1. Model Performance Comparison
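To make the evaluation criterion concrete, the following is a minimal sketch of a support-weighted F1 score and a per-example multi-class focal loss. It is an illustration only, not the training code used for Table 1: the class labels, the probabilities, and the choice of `gamma=2.0` are assumptions made for the example.

```python
import math
from collections import Counter


def focal_loss(probs, true_class, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[true_class]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy deposit-amount classes (hypothetical), skewed towards the lowest bucket.
y_true = ["low", "low", "low", "low", "mid", "high"]
y_pred = ["low", "low", "low", "mid", "mid", "high"]
score = weighted_f1(y_true, y_pred)

# A confident correct prediction contributes far less loss than a hard one.
easy = focal_loss({"low": 0.9, "mid": 0.1}, "low")
hard = focal_loss({"low": 0.3, "mid": 0.7}, "low")
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger values shift the training signal towards examples the model currently gets wrong, which is why it is often used on imbalanced class distributions.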

3. Experimentation

3.1. Upselling Experiment

The payments dataset used in this research comes from millions of payment transactions conducted on the Dream11 fantasy sports gaming platform. Whenever a user adds cash to their account, we recommend a set of three values, as shown in Table 2. These values are subject to maximum limits and UX constraints. We arrived at them by analyzing historical deposit amounts and identifying the most frequent deposit amounts and their multiples. This allowed us to cover a range of common deposit behaviours and encourage higher deposits without overwhelming the user.

Amount to add
Prefill value	Option 1	Option 2
50	200	500

Table 2. Recommended Deposit Values

3.1.1. Objective:

The aim was to evaluate the impact of upselling using an intensity factor ($a$). We sampled deposit amounts from the user-specific distribution $F(X_{i})$ and adjusted them using $a$. Users were randomly assigned to one of the four target groups (TGs) listed in Table 3.

Target Group	Treatment
TG1	1x
TG2	1.25x
TG3	1.5x
TG4	2x

Table 3. Intensity Treatment Groups
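The adjustment by the intensity factor can be sketched as follows. This is a hedged illustration rather than the production policy: the candidate amounts, the probabilities, the rounding step, and the cap (`max_amount`) are all hypothetical values chosen for the example; only the intensity levels come from Table 3.

```python
import math
import random

# Intensity factor per target group, as listed in Table 3.
INTENSITIES = {"TG1": 1.0, "TG2": 1.25, "TG3": 1.5, "TG4": 2.0}


def upsell_amount(amounts, probs, intensity, rng, max_amount=25000, step=10):
    """Sample a deposit amount from the predicted distribution and scale it."""
    base = rng.choices(amounts, weights=probs, k=1)[0]
    scaled = base * intensity
    # Round up to a display-friendly step and respect the cap (both assumed).
    rounded = math.ceil(scaled / step) * step
    return int(min(rounded, max_amount))


rng = random.Random(7)
amounts = [50, 200, 500]      # hypothetical class representatives
probs = [0.6, 0.3, 0.1]       # hypothetical model output for one user
group = rng.choice(sorted(INTENSITIES))  # random assignment to a target group
suggestion = upsell_amount(amounts, probs, INTENSITIES[group], rng)
```

At intensity 1x the sampled amount is shown unchanged, so TG1 isolates the effect of personalisation from the effect of scaling.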

3.1.2. Guardrails

We monitored the following metrics to ensure upselling didn’t negatively affect user experience:

• Conversion Funnel: Ratio of successful transactions to attempted transactions.
• User Engagement: Recall on the recommendations, time spent on the platform and changes in transaction frequency.
• Revenue Uplift: Comparison of average deposit amounts across target groups.

3.2. Online Results

The control group received a default recommendation, whereas the target groups received personalized amounts scaled by one of the discrete intensity values [1x, 1.25x, 1.5x, 2x]. The experiment ran for eight weeks, from January 1, 2024, to February 28, 2024. We observed an immediate impact of about +5.6% on the per-transaction value. Users also made fewer transactions on the platform (which reduces infrastructure cost), with a 4.8% fall in the most aggressive variant. Figure 1 illustrates the long-term sustained impact, measured by avg_deposit_week_8, as the uplift across the experimental variants listed in Table 3.

Figure 1. Recall and Revenue Uplift Across Variants

Upselling also affected the experiment guardrails, that is, the user engagement metrics. The time taken to complete a successful payment increased, and user conversion dropped by almost 2.1%, mainly among new users.

4. Causal Uplift Modelling

During data analysis of the upselling experiment, we saw that the challenge was to estimate how much to upsell a user without a trade-off in user satisfaction. A popular approach for efficiently assigning treatments to a subset of customers for targeted personalization is using causal machine learning models (Gutierrez and Gérardy, 2017; Diemert, 2018).

4.1. Dataset Preparation

For causal uplift modeling, the dataset consists of three key components: Treatment (T), Response (Y), and Features/Covariates (X). These are defined as follows:

• Treatment (T_i): The intensity of upselling, represented by four levels: {1.0, 1.25, 1.50, 2.0}.
• Response (Y): The primary outcome of interest, measured as the total deposit amount by each user.
• Features (X_i): User-specific characteristics, including demographics, gameplay behavior, and historical transaction data.

To arrive at an optimal upselling policy, we personalize the intensity factor for each user. We estimate the incremental uplift by applying the intensity treatments (T_i) uniformly across all users. This dataset will then be used to model and optimize upselling strategies.

4.2. Model Training

We estimated the Conditional Average Treatment Effect (CATE) $Y(x)$ , which represents the change in the probability of completing a deposit $\Pr(Y=1)$ due to varying upselling intensity, based on user features $X$ . To achieve this, we tested four algorithms: S-learner, T-learner, X-learner, and R-learner, implemented using Python’s causalml library (Chen et al., 2020), with XGBRegressor as the base model and a fixed propensity score of 0.5.
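As an illustration of what a meta-learner estimates, the sketch below implements the T-learner idea in plain Python: one outcome model per arm, with the CATE taken as the difference between the treated-arm and control-arm predictions. It is not the causalml implementation used in this work. The base learner here is a deliberately crude stand-in (a per-segment mean instead of XGBRegressor), and the segments and outcomes are made-up values.

```python
from collections import defaultdict
from statistics import mean


def fit_arm_means(rows, arm):
    """Base learner stand-in: mean outcome per user segment within one arm."""
    by_segment = defaultdict(list)
    for segment, treatment, outcome in rows:
        if treatment == arm:
            by_segment[segment].append(outcome)
    return {seg: mean(vals) for seg, vals in by_segment.items()}


def t_learner_cate(rows, treatment_arm, control_arm="CG"):
    """CATE(x | T=t) = mu_t(x) - mu_CG(x), with one outcome model per arm."""
    mu_t = fit_arm_means(rows, treatment_arm)
    mu_c = fit_arm_means(rows, control_arm)
    return {seg: mu_t[seg] - mu_c[seg] for seg in mu_t if seg in mu_c}


# (segment, assigned arm, observed outcome); all values are hypothetical.
rows = [
    ("new", "CG", 100.0), ("new", "CG", 120.0),
    ("new", "TG4", 90.0), ("new", "TG4", 100.0),
    ("loyal", "CG", 300.0), ("loyal", "CG", 340.0),
    ("loyal", "TG4", 400.0), ("loyal", "TG4", 420.0),
]
cate_tg4 = t_learner_cate(rows, "TG4")
```

In this toy data the 2x intensity has a negative estimated effect for new users and a positive one for loyal users, which is the kind of heterogeneity a personalised policy is meant to exploit.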

For model training and optimization, we used the hyperopt package (Bergstra et al., 2013) with SparkTrials to distribute workloads, and MLflow (Zaharia et al., 2018) to manage model runs. The parameter search was guided by the Tree-structured Parzen Estimator (TPE), with the objective function defined by the following steps:

• Define the parameter search space using causalml.
• Train models on the training dataset.
• Apply trained models to both training and test sets.
• Optimize by calculating the mean AUUC (Area Under the Uplift Curve) score on the test set.
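The final step relies on the AUUC score. The sketch below shows one simple way to compute it for a single treatment against control: users are ranked by predicted uplift, and the cumulative gain at each rank is the difference in mean outcome between treated and control users seen so far, scaled by the number of users. This is an assumed, simplified definition for illustration; library implementations such as causalml's may normalise the curve differently. All data values are hypothetical.

```python
def uplift_curve(scores, treated, outcomes):
    """Cumulative incremental gain after targeting the top-k ranked users."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    sum_t = sum_c = 0.0
    curve = []
    for k, i in enumerate(order, start=1):
        if treated[i]:
            n_t += 1
            sum_t += outcomes[i]
        else:
            n_c += 1
            sum_c += outcomes[i]
        # The gain is undefined until both arms are observed; use 0 until then.
        if n_t and n_c:
            curve.append((sum_t / n_t - sum_c / n_c) * k)
        else:
            curve.append(0.0)
    return curve


def auuc(scores, treated, outcomes):
    """Area under the uplift curve, taken here as the mean curve height."""
    curve = uplift_curve(scores, treated, outcomes)
    return sum(curve) / len(curve)


treated = [1, 0, 1, 0]
outcomes = [10.0, 2.0, 3.0, 3.0]
good_scores = [0.9, 0.8, 0.2, 0.1]   # ranks the responsive user first
bad_scores = [0.1, 0.2, 0.8, 0.9]    # reversed ranking

auuc_good = auuc(good_scores, treated, outcomes)
auuc_bad = auuc(bad_scores, treated, outcomes)
```

Both rankings end at the same total gain, but the better ranking accumulates it earlier, which yields the larger area.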

We ran 1000 random trials, logging the AUUC for each treatment. Two model configurations were tested:

• Global configuration: Selects the model with the lowest test set loss across all treatments.
• Local configuration: Optimizes the CATE for each treatment group (TG) using the highest test set AUUC.

Table 4 summarizes the XGBRegressor configurations used in the SparkTrials. Each treatment group (TG1, TG2, TG3, TG4) has its own local configuration parameters, while the global model (TG_global) is optimized for overall performance. A treatment is assigned based on the following rule:

$\pi(x)=\begin{cases}\operatorname{argmax}\limits_{t\neq\text{CG}}\text{CATE}(x\mid T=t)&\text{if}\;\max\limits_{t\neq\text{CG}}\text{CATE}(x\mid T=t)>0\\ \text{CG}&\text{otherwise}\end{cases}$
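The assignment rule translates directly into code. The sketch below applies it to per-user CATE estimates and then tabulates the share of users assigned to each arm; the user identifiers and CATE values are hypothetical.

```python
from collections import Counter


def assign_treatment(cate_by_arm, control="CG"):
    """Pick the arm with the largest CATE, or control if none is positive."""
    best_arm = max(cate_by_arm, key=cate_by_arm.get)
    return best_arm if cate_by_arm[best_arm] > 0 else control


def percent_treated(assignments):
    """Percentage of users assigned to each arm under a policy."""
    counts = Counter(assignments)
    total = len(assignments)
    return {arm: 100.0 * n / total for arm, n in counts.items()}


# Estimated CATE per non-control arm for four hypothetical users.
cate = {
    "u1": {"TG1": -2.0, "TG2": 5.0, "TG3": 3.0, "TG4": -1.0},
    "u2": {"TG1": -0.5, "TG2": -1.0, "TG3": -4.0, "TG4": -6.0},
    "u3": {"TG1": 1.0, "TG2": 2.0, "TG3": 4.0, "TG4": 7.5},
    "u4": {"TG1": 0.2, "TG2": 0.9, "TG3": 0.1, "TG4": -3.0},
}
policy = {user: assign_treatment(arms) for user, arms in cate.items()}
shares = percent_treated(list(policy.values()))
```

Users for whom every intensity has a non-positive estimated effect fall back to the control recommendation.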

Parameter	TG1	TG2	TG3	TG4	TG_global
meta_learner	r_learner	x_learner	r_learner	s_learner	r_learner
gamma	1.24	8.40	7.95	4.60	5.67
n_estimators	59	153	102	125	200
colsample_bytree	0.82	0.56	0.93	0.88	0.75
max_depth	7	10	15	8	12
min_child_weight	5	3	6	4	2
reg_lambda	0.36	0.14	0.57	0.94	0.50
reg_alpha	77	115	50	163	90
Configuration	Local config	Local config	Local config	Local config	Global config

Table 4. XGBRegressor Configurations for Each Treatment Group

4.3. Evaluation

Evaluating uplift models is difficult because no unbiased ground-truth metric is directly available: a user’s response under treatment and their natural response without it cannot be observed simultaneously, which prevents explicit labelling of uplift responses. We therefore carried out rigorous evaluations on both the business and model sides to establish confidence in our analysis.

• Percent Treated: Percentage of users assigned to each possible treatment value, including control. In Table 5, the grid-experiment policy is uniformly distributed, while the CATE policy assigns a treatment only when the estimated CATE is positive.

Percent treated by policy (%)
Treatment	Grid exp policy	CATE policy ($\pi$)
CG	19.92	3.16
TG1	20.12	12.71
TG2	20.05	31.86
TG3	20.13	37.42
TG4	19.78	14.86

Table 5. Policy Assignment Table
• ERUPT (Expected Response Under Proposed Treatments) (Zhao et al., 2017; Hitsch and Misra, 2018): It quantifies a model’s performance at the downstream task of predicting the optimal treatment to assign to each member of the population. It is defined as follows:

$\text{ERUPT}=\mathbb{E}(Y\cdot\mathbb{I}(\pi(x)=t)\mid T=t,\pi)$ (1)

We calculate the expected bootstrapped revenue using a sample size of 5000. After model inference, the CATE policies for both the test ($\pi_{\text{test}}$) and train ($\pi_{\text{train}}$) datasets are derived, with the intensity factor yielding the highest uplift being selected. As shown in Figure 2, the CATE policy results in a significant 10.7% uplift across both the training and test datasets.

Figure 2. ERUPT revenue distribution for grid experiment policy vs CATE policy across test and train datasets.
• AUUC (Area Under the Uplift Curve): The uplift chart consists of an uplift curve and a random baseline curve for each treatment. The charts demonstrate a strong offline fit of the underlying estimators. The cumulative gains are plotted by ranking users based on their inferred uplift scores, showing that most gains are concentrated within the top-ranked users.

Figure 3. AUUC curves across treatments. The y-axis represents the cumulative incremental gains, and the x-axis is the proportion of the population targeted.
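The ERUPT metric in Equation (1) can be estimated from randomised experiment data by averaging the observed response over users whose randomly assigned treatment happens to match the policy's recommendation. The sketch below shows this estimator with a bootstrap around it. It is a simplified illustration that assumes uniform random assignment across arms. The data are hypothetical, and treating 5000 as the size of each bootstrap resample is our reading of the text rather than a confirmed detail.

```python
import random
from statistics import mean


def erupt(policy, assigned, outcomes):
    """Mean observed outcome over users whose assigned arm matches the policy."""
    matched = [y for p, t, y in zip(policy, assigned, outcomes) if p == t]
    return mean(matched) if matched else float("nan")


def bootstrap_erupt(policy, assigned, outcomes, n_boot=200, sample_size=5000, seed=0):
    """ERUPT estimates over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(sample_size)]
        estimates.append(
            erupt(
                [policy[i] for i in idx],
                [assigned[i] for i in idx],
                [outcomes[i] for i in idx],
            )
        )
    return estimates


# Policy recommendation, randomly assigned arm, and observed deposit per user.
policy = ["TG2", "CG", "TG2", "TG1"]
assigned = ["TG2", "TG1", "CG", "TG1"]
outcomes = [100.0, 50.0, 70.0, 30.0]

point_estimate = erupt(policy, assigned, outcomes)
estimates = bootstrap_erupt(policy, assigned, outcomes, n_boot=50, sample_size=200)
```

Comparing the bootstrap distribution for the CATE policy with the one for the uniform grid policy gives the kind of revenue comparison summarised in Figure 2.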


5. Conclusion and Future Work

5.1. Conclusion

Our research highlights the intricate trade-offs between upselling intensity and user satisfaction. While increasing upselling intensity leads to immediate revenue gains, it also results in a decline in user recall and satisfaction. This effect was most pronounced in the TG1 group, where a bi-modal distribution of responses revealed polarized reactions to upselling strategies. These findings emphasize the need to carefully balance upselling intensity with user experience to maintain both engagement and revenue growth.

The causal uplift modelling framework allows us to tailor the intensity for each user, effectively managing these trade-offs and ensuring fairness in recommendations. Results from both the test and training datasets show that taking these models online holds significant potential, with an estimated revenue uplift of 10.7%.

Further analysis, as shown in Figure 1, revealed that the guardrails were mainly impacted for new users. The offline CATE policy applied a 1x intensity for 80% of new users, resulting in a 2.1% improvement in conversion rates. While these offline results are promising, further validation is required in an online experimental setting to confirm the optimal trade-offs between revenue and user experience.

5.2. Future Work

Our future work will focus on testing the models online, exploring additional user features, and extending the experimentation to incentive-based upselling on the platform. Further, developing personalized upselling strategies that dynamically adjust intensity levels based on user behaviour and feedback could enhance both revenue and user satisfaction.

6. Appendix

6.1. Platform Implementation

The “Deposit Amount Upselling” platform shown in Figure 4 integrates several key components to predict deposit amounts and provide personalized upselling recommendations. Below is a summary of the implementation steps:

• Data Ingestion: The platform ingests data from two primary sources: streaming events and backend data stored in the warehouse.
• Feature Aggregation/Pre-processing: We prepare the data through label creation, gradient feature calculation, and outlier removal, and then feed it to the Deposit Amount Predictor model, ensuring the model leverages both real-time and historical data for accurate predictions.
• Deposit Amount Predictor: A machine learning model predicts the likely deposit amounts for users. The model outputs deposit amount classes (e.g., 0-20, 20-40, 1000-25000) and corresponding probability scores.
• Upselling Recommendation: The predicted amounts and scores are processed by an upselling policy, which employs an intensity layer to experiment using the model’s predictions.

6.2. Technical Stack

• Data Ingestion: Real-time data processing is performed using Kafka on our in-house stream processing platform (Gaikwad, 2023). Backend data is retrieved from SQL databases like Redshift.
• Feature Store: Processed features are stored efficiently in a feature store, supporting high throughput and low-latency retrieval. Delta Lake is used for offline storage, while Redis is employed for online storage.
• Model Training and Prediction: Causal machine learning models are developed in Python using the causalml library (Chen et al., 2020). Airflow manages the scheduling for both batch and real-time inference scenarios.
• Model Management: MLflow (Zaharia et al., 2018) provides seamless model tracking and registry management.
• Deployment and Inference: We power and monitor our inference jobs on our in-house machine-learning platform, Darwin.
• Backend Integration: Recommendations are served into backend systems for delivery to end-user devices.

References

Bergstra et al. (2013) James Bergstra, Daniel Yamins, and David D. Cox. 2013. Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms. In SciPy. https://api.semanticscholar.org/CorpusID:52000504
Chen et al. (2020) Huigang Chen, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. 2020. CausalML: Python Package for Causal Machine Learning. arXiv:2002.11631 [cs.CY]
Diemert (2018) E. Diemert. 2018. A Large Scale Benchmark for Uplift Modeling. https://api.semanticscholar.org/CorpusID:235079097
Gaikwad (2023) Ruturaj Gaikwad. 2023. streamverse. https://tech.dream11.in/blog/navigating-the-streamverse-a-technical-odyssey-into-advanced-stream-processing-at-dream11 Accessed on Jan 01, 2023.
Guelman et al. (2014) Leo Guelman, Montserrat Guillén, and Ana Maria Pérez-Marín. 2014. A survey of personalized treatment models for pricing strategies in insurance. Insurance Mathematics & Economics 58 (2014), 68–76. https://api.semanticscholar.org/CorpusID:18464332
Guillet (2020) Basak Denizci Guillet. 2020. Online upselling: Moving beyond offline upselling in the hotel industry. International Journal of Hospitality Management 84 (2020), 102322.
Gutierrez and Gérardy (2017) Pierre Gutierrez and Jean-Yves Gérardy. 2017. Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive APIs and Apps. https://api.semanticscholar.org/CorpusID:6970463
Hitsch and Misra (2018) Günter Hitsch and Sanjog Misra. 2018. Heterogeneous Treatment Effects and Optimal Targeting Policy Evaluation. SSRN Electronic Journal (01 2018). https://doi.org/10.2139/ssrn.3111957
Kubiak and Weichbroth (2010) Bernard Kubiak and Paweł Weichbroth. 2010. Cross- And Up-selling Techniques In E-Commerce Activities. Journal of Internet Banking and Commerce 15 (12 2010).
Schiffman (2005) Stephan Schiffman. 2005. Upselling Techniques: That Really Work! Simon and Schuster.
Thirumuruganathan et al. (2023) Saravanan Thirumuruganathan, Noora Al Emadi, Soon-gyo Jung, Joni Salminen, Dianne Ramirez Robillos, and Bernard J Jansen. 2023. Will they take this offer? A machine learning price elasticity model for predicting upselling acceptance of premium airline seating. Information & Management 60, 3 (2023), 103759.
Zaharia et al. (2018) Matei A. Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. 2018. Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Eng. Bull. 41 (2018), 39–45. https://api.semanticscholar.org/CorpusID:83459546
Zhao et al. (2017) Yan Zhao, Xiao Fang, and David Simchi-Levi. 2017. Uplift Modeling with Multiple Treatments and General Response Types. ArXiv abs/1705.08492 (2017). https://api.semanticscholar.org/CorpusID:7823562
Zhao and Harinen (2019) Zhenyu Zhao and Totte Harinen. 2019. Uplift Modeling for Multiple Treatments with Cost Optimization. 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (2019), 422–431. https://api.semanticscholar.org/CorpusID:199668829
