Email: jiangwei.zhang, li.zhang, vigneshwaran.raveendran, ziv.benzuk, [email protected]
PriceAggregator: An Intelligent System for Hotel Price Fetching
Abstract
This paper describes PriceAggregator, the hotel price aggregation system deployed at Agoda, a global online travel agency for hotels, vacation rentals, flights and airport transfers. Agoda aggregates non-direct suppliers’ hotel rooms to ensure that Agoda’s customers always have the widest selection of hotels, room types and packages. As of today, Agoda aggregates millions of hotels.
The major challenge is that each supplier only allows Agoda to fetch hotel prices at a limited number of Queries Per Second (QPS). Due to the sheer volume of Agoda’s user search traffic, this limited QPS is never enough to cover all user searches. Inevitably, many user searches have to be ignored, and hence bookings are lost.
To overcome this challenge, we built PriceAggregator. PriceAggregator intelligently determines when, how and what requests to send to the suppliers to fetch prices. In this paper, we not only prove that PriceAggregator is optimal theoretically, but also demonstrate that it performs well in practice. PriceAggregator has been deployed at Agoda, and extensive online A/B experimentation has shown that it increases Agoda’s bookings significantly.
Keywords:
Optimization · Dynamic Caching · Inventory Management

1 Introduction
Agoda (Agoda.com) is a global online travel agency for hotels, vacation rentals, flights and airport transfers. Millions of guests find their accommodations and millions of accommodation providers list their properties on Agoda. Among the millions of properties listed on Agoda, many prices are fetched through third-party suppliers.
These third-party suppliers do not synchronize their hotel prices to Agoda. To get a hotel price from such a supplier, Agoda needs to make one HTTP request to the supplier each time to fetch the corresponding price. However, due to the sheer volume of search requests received from users, it is impossible to forward every request to the supplier. Hence, a cache database which temporarily stores the hotel prices is built. For each hotel price received from the supplier, Agoda stores it in this cache database for some amount of time and evicts the price from the cache once it expires. Figure 1 abstracts the system flow.
Every time a user searches for a hotel on Agoda, Agoda first reads from the cache. If there is a hotel price for this search in the cache, it is a ’hit’ and we serve the user with the cached price. Otherwise, it is a ’miss’ and the user will not see a price for that hotel. For every ’miss’, Agoda sends a request to the supplier to get the price for that hotel and puts the returned price into the cache, so that subsequent users can benefit from the cached price. However, every supplier limits the number of requests we can send each second; once we reach the limit, subsequent requests are dropped. This poses four challenges.
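To make this flow concrete, here is a minimal Python sketch of the passive read-through cache just described. The key scheme, the in-process dict, and the stubbed supplier call are our illustrative assumptions, not Agoda’s actual implementation:

```python
# A minimal sketch (not Agoda's production code) of the passive
# read-through flow: serve from the cache on a 'hit', otherwise spend
# one supplier request on the 'miss' and cache the result under a TTL.
import time

cache: dict[str, tuple[float, float]] = {}  # itinerary key -> (price, expiry time)

def fetch_from_supplier(key: str) -> float:
    # Stand-in for the single HTTP request to the supplier,
    # which counts against the supplier's QPS limit.
    return 100.0

def get_price(key: str, ttl_seconds: float) -> float:
    now = time.time()
    entry = cache.get(key)
    if entry is not None and entry[1] > now:   # 'hit': serve the cached price
        return entry[0]
    price = fetch_from_supplier(key)           # 'miss': one unit of QPS spent
    cache[key] = (price, now + ttl_seconds)    # implicitly evicted once expired
    return price
```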

Challenge 1: Time-to-live (TTL) determination.
For a hotel price fetched from the supplier, how long should we keep it in the cache before expiring it? We call this duration the time-to-live (TTL). The larger the TTL, the longer the hotel prices stay in the cache database. As presented in Figure 2, the TTL plays three roles:

• Cache Hit. With a larger TTL, hotel prices are cached in the database for a longer period of time and hence more hotel prices remain in the database. When we receive a search from our users, there is a higher chance of getting a hit in the database. This enhances our ability to serve our users with more hotel prices from the third-party suppliers.
• QPS. As we have limited QPS to each supplier, a larger TTL allows more hotel prices to be cached in the database. Instead of spending QPS on repeated queries, we can better utilise the QPS to serve a wider range of user requests.
• Price Accuracy. As the hotel prices from suppliers change from time to time, a larger TTL means that the hotel prices in our cache database are more likely to be inaccurate. Hence, we will not be able to serve users with the most up-to-date hotel prices.
There is a trade-off between cache hit and price accuracy, and we need to choose a TTL that caters to both. To the best of our knowledge, most Online Travel Agencies (OTAs) typically pick a small TTL ranging from 15 to 30 minutes. However, this is not optimal.
Challenge 2: Cross data centre QPS management.

Agoda has several data centres globally to handle the user requests. For each supplier, we need to set a maximum number of QPS that each data centre is allowed to send. However, each data centre has its own traffic pattern.
Figure 3 presents an example of the QPS sent to a supplier from two data centres, A and B. Data centre A’s traffic peaks around 18:00, while data centre B’s peaks around 04:00. If we split the allowed QPS evenly between data centre A and data centre B, then we are not fully utilizing it. But if we allocate more than half of the allowed QPS to each data centre, how can we make sure that data centre A and data centre B never exceed the limit in total? Note that the impact of breaching the QPS limit could be catastrophic for the supplier, potentially bringing the supplier offline.
Challenge 3: Single data centre QPS utilization.

As mentioned in the previous section, each data centre has its own traffic pattern: there are peak periods when we send the most requests to the supplier, and non-peak periods when we send far fewer. As demonstrated in Figure 4, this data centre sends well below the allowed QPS to the supplier around 08:00. Hence, similar to the example above, part of the allowed QPS of this data centre goes unutilized.
Challenge 4: Cache hit ceiling.
The passive system flow presented in Figure 1 has an intrinsic limitation on improving the cache hit. Note that this design sends a request to the supplier to fetch a price only if there is a miss. This is passive! Hence, a cache hit only happens if the same hotel search happened previously and the TTL is larger than the time difference between the current and the previous search.
Note that we cannot set the TTL arbitrarily large, as this lowers price accuracy as explained in Challenge 1. As long as the TTL of a specific search is not arbitrarily large, it will expire and the next request of this search will be a miss. Even if we could set the TTL arbitrarily large, hotel searches that never happened before will always be misses. For example, if more than 20% of the requests are new hotel searches, then the cache hit can never exceed 80%, regardless of how large the TTL is set.
To overcome the 4 challenges mentioned above, we propose PriceAggregator, an intelligent system for hotel price fetching. As presented in Figure 5, before every price is written to the cache (Price DB), it goes through a TTL service, which assigns different TTLs to different hotel searches. This TTL service is built on historical data to optimize the trade-off between cache hit and price accuracy, which addresses Challenge 1.
Apart from passively sending requests to suppliers to fetch prices, PriceAggregator re-invents the process by adding an aggressive service which pro-actively sends requests to suppliers at a constant QPS. By having a constant QPS, Challenge 2 and Challenge 3 can be addressed easily. Moreover, this aggressive service does not wait for a hotel search to appear before sending requests to the supplier. Therefore, it can increase the cache hit and hence addresses Challenge 4.
In summary, we make the following contributions in the paper:
1. We propose PriceAggregator, an intelligent system which maximizes bookings for a limited QPS. To the best of our knowledge, this is the first productionised intelligent system which optimises the utilization of QPS.
2. We present a TTL service, SmartTTL, which optimizes the trade-off between cache hit and price accuracy.
3. Extensive A/B experiments were conducted to show that PriceAggregator is effective and increases Agoda’s revenue significantly.

The rest of the paper is organized as follows. Section 2 presents the necessary definitions before we present the TTL service, SmartTTL, in Section 3. In Section 4, we present the aggressive model. In Section 5, we present the experiment results and analysis. Section 6 presents the related work before we conclude the paper in Section 7.
2 Preliminary and Definition
In this section, we make the necessary definitions. Figure 6 presents the major steps in the hotel booking process. In stage 1, a user requests a hotel price. In stage 2, if the hotel price already exists in the cache, the user is presented with the cached price; otherwise, the user will not see a price for that hotel. In stage 3, if the user is happy with the hotel price, the user makes a booking attempt. In stage 4, Agoda confirms with the hotel whether the price is eligible to sell. If it is, Agoda confirms the booking in stage 5.

Definition 1
Let $U$ be the set of users requesting hotels on Agoda. Let $H$ be the set of hotels that Agoda has. Let $C$ be the set of search criteria that Agoda receives, where each $c \in C$ is of the form (checkin, checkout, adults, children, rooms).
In the definition above, $U$ and $H$ are self-explanatory. For $C$, $c =$ (2020-05-01, 2020-05-02, 2, 0, 1) means a search criterion with checkin date 2020-05-01, checkout date 2020-05-02, 2 adults, 0 children and 1 room. Therefore, we can define the itinerary request and the user search as follows.
Definition 2
Let $I \subseteq H \times C$ be the set of itinerary requests that Agoda sends to the suppliers. Let $S \subseteq U \times H \times C$ be the multiset of searches that Agoda receives from users.
For example, an itinerary request $i =$ (Hilton Amsterdam, 2020-06-01, 2020-06-02, 1, 0, 1) means Agoda sends a request to the supplier to fetch the price for hotel Hilton Amsterdam with checkin=2020-06-01, checkout=2020-06-02, adults=1, children=0, rooms=1.
Similarly, a user search $s =$ (Alex, Hilton Amsterdam, 2020-05-01, 2020-05-02, 2, 0, 1) means Alex searched hotel Hilton Amsterdam for a price with checkin=2020-05-01, checkout=2020-05-02, adults=2, children=0, rooms=1. Note that if Alex makes the same search on (Hilton Amsterdam, 2020-05-01, 2020-05-02, 2, 0, 1) multiple times in a day, these count as multiple user searches; therefore, $S$ here is a multi-set.
Definition 3
$P_{hit}(s)$ is the probability that a user search $s$ hits on the hotel prices in the cache.
For example, if Alex makes 10 searches on (Hilton Amsterdam, 2020-05-01, 2020-05-02, 2, 0, 1) and 8 out of these 10 searches hit on the cached price, then $P_{hit}(s) = 8/10 = 0.8$.
Definition 4
$P_{book}(s)$ is the probability that a user search $s$ ends up with a booking attempt, given that the hotel price is in the cache.
Following the above example, for $s =$ (Alex, Hilton Amsterdam, 2020-05-01, 2020-05-02, 2, 0, 1), Alex has 8 searches that returned prices, and out of these 8 searches Alex makes 2 booking attempts. Then $P_{book}(s) = 2/8 = 0.25$.
Definition 5
$P_{acc}(s)$ is the probability that the hotel price is accurate after a user makes a booking attempt on search $s$.
Continuing the example above, out of the 2 booking attempts, 1 succeeds. Hence $P_{acc}(s) = 1/2 = 0.5$. Therefore, we can formulate the expected number of bookings as follows.
Definition 6
The expected number of bookings $B$ is
$$B = \sum_{s \in S} P_{hit}(s) \times P_{book}(s) \times P_{acc}(s) \qquad (1)$$
Therefore, our goal is to optimise $B$. To optimize $B$, we would like $P_{hit}(s)$, $P_{book}(s)$ and $P_{acc}(s)$ to be as high as possible. $P_{book}(s)$ reflects user behaviour; as a hotel price fetching system, we cannot control it, but we can learn it from historical data. However, $P_{hit}(s)$ and $P_{acc}(s)$ can be tuned by adjusting the TTL. As illustrated by Figure 2, to increase $P_{hit}(s)$, one can simply increase the TTL; similarly, to increase $P_{acc}(s)$, one just needs to decrease the TTL. We discuss how to set the TTL to optimize bookings in Section 3.
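To make Equation 1 concrete, here is a minimal Python sketch (our illustration, not Agoda’s production code) that evaluates it over per-search probability estimates; the data structure is an assumption:

```python
# A minimal sketch of Equation 1: expected bookings as the sum over user
# searches of P_hit(s) * P_book(s) * P_acc(s). Names are illustrative.
from dataclasses import dataclass

@dataclass
class SearchStats:
    p_hit: float   # P_hit(s): probability the search hits the cache
    p_book: float  # P_book(s): probability of a booking attempt given a hit
    p_acc: float   # P_acc(s): probability the price is accurate at booking

def expected_bookings(searches: list[SearchStats]) -> float:
    """Equation 1, evaluated over a multiset of user searches."""
    return sum(s.p_hit * s.p_book * s.p_acc for s in searches)

# The running Alex example: P_hit = 0.8, P_book = 0.25, P_acc = 0.5.
print(expected_bookings([SearchStats(0.8, 0.25, 0.5)]))  # 0.1
```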
3 SmartTTL
In this section, we explain how we build a smart TTL service which assigns itinerary-request-specific TTLs to optimize bookings. There are three major steps: price-duration extraction, price-duration clustering and TTL assignment.
3.1 Price-Duration Extraction
Price-duration refers to how long each price stays unchanged. This is approximated by the time difference between two consecutive requests for the same itinerary that Agoda sends to the supplier. Figure 7 presents an example of extracting the price-duration distribution from empirical data for hotel Hilton Amsterdam under a fixed search criterion.
Agoda first sends a request to the supplier at 13:00 to fetch the price; since this is the first time we fetch the price for this itinerary, there is no price change and no price-duration is extracted. Later, at 13:31, Agoda sends the second request to the supplier and observes that the price has changed. Hence, the price-duration of the previous price is 31 minutes (the time difference between 13:00 and 13:31). Similarly, at 14:03, Agoda sends the third request to the supplier and again observes that the price has changed. Hence, the price-duration of the second price is 32 minutes. In this way, for each search criterion, we can extract the empirical price-duration distribution.
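This extraction step can be sketched as follows; the fetch-log layout and the example prices are assumptions for illustration (only the timestamps come from the Figure 7 walkthrough):

```python
# A hedged sketch of price-duration extraction for one itinerary: scan the
# timestamped fetch log and record how long each price lasted before a
# change was observed.
from datetime import datetime

def price_durations(fetches: list[tuple[datetime, float]]) -> list[float]:
    """Return the price-durations (in minutes) observed in one fetch log."""
    durations = []
    start, price = fetches[0]
    for t, p in fetches[1:]:
        if p != price:  # price changed: the previous price lasted t - start
            durations.append((t - start).total_seconds() / 60)
            start, price = t, p
    return durations

# Fetches at 13:00, 13:31 and 14:03, each observing a new (made-up) price.
log = [(datetime(2020, 5, 1, 13, 0), 100.0),
       (datetime(2020, 5, 1, 13, 31), 105.0),
       (datetime(2020, 5, 1, 14, 3), 102.0)]
print(price_durations(log))  # [31.0, 32.0]
```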

3.2 Price-Duration Clustering
At Agoda, we receive billions of such user searches every day. It is practically intractable, and unnecessary, to store a TTL for every search criterion in an in-memory cache, e.g. Redis or Memcached. Therefore, we need to reduce the cardinality of the user searches, and we do so through clustering.
Figure 8 presents the price-duration clustering process. We cluster the user searches to reduce the cardinality. In PriceAggregator, we used XGBoost [1] for clustering feature ranking, and the significant features are checkin and price_availability. We observe that itinerary requests with the same checkin and price_availability (whether the hotel is sold out or not) have similar price-duration distributions. Hence, we group all supplier requests with the same checkin and price_availability into the same cluster, and use the aggregated price-duration distribution to represent the cluster. By doing this, we dramatically reduce the cardinality to a size that can easily be stored in any in-memory data structure.
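A minimal sketch of this grouping, assuming a flat list of (checkin, price_availability, duration) samples; the layout is our illustrative assumption:

```python
# Group price-duration samples by the two significant features
# (checkin, price_availability) and keep one aggregated distribution
# per cluster, instead of one entry per itinerary.
from collections import defaultdict

def cluster_price_durations(samples):
    """samples: iterable of (checkin, is_available, duration_minutes) tuples."""
    clusters = defaultdict(list)
    for checkin, is_available, duration in samples:
        clusters[(checkin, is_available)].append(duration)
    return dict(clusters)  # cluster key -> aggregated price-duration distribution

clusters = cluster_price_durations([
    ("2020-06-01", True, 31.0),
    ("2020-06-01", True, 32.0),
    ("2020-06-02", False, 240.0),
])
print(len(clusters))  # 2 clusters, regardless of how many itineraries fed in
```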

3.3 TTL Assignment
In the section above, we finished the clustering. Next, we need to assign a TTL to each cluster. Note that we want to optimize the bookings as expressed in Equation 1, and the TTL affects both the cache hit ($P_{hit}$ in Equation 1) and the accuracy of the booking price ($P_{acc}$ in Equation 1). Hence, we want to assign to each cluster the TTL for which Equation 1 is optimised.
For the cache hit, we can easily approximate the cache miss ratio curve [2] using the Cumulative Distribution Function (CDF) of the gap time (the time difference between the current request and the previous request for the same itinerary search). Figure 9 presents the CDF of the gap time, where the x-axis is the gap time and the y-axis is the portion of requests whose gap time is at most that value. For example, if Figure 9 shows that a given portion of the requests have a gap time of at most 120 minutes, then by setting the TTL to 120 minutes we achieve exactly that portion as the cache hit rate. Therefore, the cache miss ratio curve as a function of the TTL can easily be found, and we know the approximate cache hit rate for each TTL we choose.
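A small sketch of this approximation, where the empirical CDF of gap times evaluated at a candidate TTL gives the estimated hit rate:

```python
# The empirical CDF of gap times, evaluated at a candidate TTL, estimates
# the fraction of repeat requests that would hit the cache under that TTL.
def hit_rate_for_ttl(gap_times: list[float], ttl: float) -> float:
    """Empirical CDF of gap times at `ttl` (both in minutes)."""
    return sum(1 for g in gap_times if g <= ttl) / len(gap_times)

# E.g. if 8 of 10 observed gap times are within 120 minutes, a 120-minute
# TTL yields an estimated 80% cache hit rate.
print(hit_rate_for_ttl([10, 25, 40, 55, 70, 90, 100, 115, 200, 300], 120))  # 0.8
```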
For the booking accuracy of a cluster, let $D$ be the multiset of price-durations observed in the cluster and $t$ the assigned TTL. The accuracy can be approximated by
$$\text{accuracy}(D, t) = \frac{\sum_{d \in D} \min(d, t)}{\sum_{d \in D} d}.$$
For example, in a specific cluster, if the empirical price-durations observed are $d_1$ and $d_2$ minutes and we assign a TTL of $t$ minutes, then we serve the accurate price for $\min(d_1, t)$ and $\min(d_2, t)$ minutes respectively. Hence, the accuracy is $\big(\min(d_1, t) + \min(d_2, t)\big) / (d_1 + d_2)$.
Hence, to optimize the bookings as expressed in Equation 1, we just need to enumerate the candidate TTLs in each cluster and pick the one that optimises Equation 1, as sketched below.
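Putting the two estimates together, here is a hedged sketch of per-cluster TTL selection; the 15-minute candidate grid is our assumption:

```python
# Enumerate candidate TTLs for one cluster and keep the one maximizing
# estimated hit rate * accuracy. P_book is user behaviour and does not
# depend on the TTL, so within a cluster maximizing this product
# maximizes Equation 1.
def best_ttl(gap_times: list[float], durations: list[float],
             candidates=range(15, 1441, 15)) -> int:
    def hit(t):  # estimated P_hit under TTL t (empirical CDF of gap times)
        return sum(1 for g in gap_times if g <= t) / len(gap_times)

    def acc(t):  # estimated P_acc under TTL t (accurate minutes / total minutes)
        return sum(min(d, t) for d in durations) / sum(durations)

    return max(candidates, key=lambda t: hit(t) * acc(t))
```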
So far, we have completed the major steps in SmartTTL.

4 From Passive Model to Aggressive Model
As mentioned in Section 1, SmartTTL addresses Challenge 1; three more challenges remain. For Challenge 2 and Challenge 3, we can resolve them by guaranteeing that each data centre sends a constant rate of requests to the suppliers: every second the passive model sends $q$ requests to a supplier, where $q \le Q$ and $Q$ is the allowed QPS, we proactively send $Q - q$ extra requests to that supplier. The question is how to generate these requests. Next, we present one way of generating them.
4.1 Aggressive Model with LRU Cache
In this section, we describe an aggressive model which proactively sends requests to the supplier to fetch hotel prices. These requests are generated from an auxiliary cache $Z$. There are two major steps:
Cache building. The auxiliary cache $Z$ is built from historical user searches. Every user search is admitted into $Z$. Once $Z$ reaches its specified maximum capacity, $Z$ evicts the user search which is Least Recently Used (LRU).
Request pulling. At every second, the passive model needs to send $q$ requests to the supplier, and the supplier allows us to send $Q$ requests per second. Hence, the aggressive model will send $Q - q$ requests to the supplier. To generate these requests, Agoda pulls from $Z$ the entries which are going to expire, starting from those closest to expiry, until the $Q - q$ budget is used up.
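A hedged sketch of this auxiliary cache, using Python’s OrderedDict for the LRU bookkeeping; the expiry tracking and method names are illustrative assumptions:

```python
# `Z` admits every user search, evicts the least recently used entry at
# capacity, and each second hands out the (Q - q) entries closest to
# expiry for proactive fetching.
from collections import OrderedDict

class AuxiliaryLRU:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.expiry: "OrderedDict[str, float]" = OrderedDict()  # itinerary -> expiry time

    def admit(self, itinerary: str, expires_at: float) -> None:
        if itinerary in self.expiry:
            self.expiry.move_to_end(itinerary)      # refresh recency
        self.expiry[itinerary] = expires_at
        if len(self.expiry) > self.capacity:
            self.expiry.popitem(last=False)         # evict least recently used

    def pull(self, budget: int) -> list[str]:
        """Up to `budget` itineraries, closest to expiry first."""
        return sorted(self.expiry, key=self.expiry.get)[:budget]

# Each second, with the passive model using q of the allowed Q requests:
#   for itinerary in Z.pull(Q - q): refresh its price from the supplier.
```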
It is obvious that the above approach solves Challenge 2 and Challenge 3. Moreover, it can also improve the cache hit by requesting hotel prices before a user searches for them.
However, this is not optimal. For example, a specific hotel could be very popular; yet if the hotel is not price competitive, then Agoda does not need to waste QPS pulling that hotel’s price from the supplier. In the next section, we introduce an aggressive model which optimizes the bookings.
4.2 Aggressive Model with SmartScheduler
As mentioned, the aggressive model with LRU cache is not optimal. Moreover, in the previous design the passive model always has the highest priority, meaning the aggressive model only sends requests to the supplier if there is extra QPS left. This, again, is not optimal. In this section, we present an aggressive model which optimizes the bookings. It has 5 major steps.
Itinerary frequency calculation. This determines how many times an itinerary needs to be requested to ensure it is always available in the database. For a high cache hit rate, we want an itinerary to always be available in the database, which means the itinerary must be re-fetched before its cached price expires. Moreover, for each itinerary $i$, we have the TTL $t_i$ (in minutes) generated by SmartTTL. Hence, to make sure itinerary $i$ is always available in the database for 24 hours (1440 minutes), we need to send $f_i$ requests to the supplier per day, where
$$f_i = \left\lceil \frac{1440}{t_i} \right\rceil \qquad (2)$$
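Equation 2 in code form, with the 6-hour TTL example used later in this section:

```python
# Equation 2: the number of fetches per day needed to keep an itinerary
# continuously cached, given its SmartTTL in minutes.
import math

def daily_frequency(ttl_minutes: float) -> int:
    return math.ceil(1440 / ttl_minutes)

print(daily_frequency(360))  # a 6-hour TTL requires 4 fetches per day
```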
Itinerary value evaluation. This evaluates the value of an itinerary by the expected bookings from it. With the itinerary frequency calculation above, we can assume an itinerary is always a ’hit’ in the database. Hence, in this step, we evaluate the itinerary value given that the itinerary is always available in our Price DB. That is, for every user search $s$ on the same itinerary $i$, the search is always a cache hit, i.e. $P_{hit}(s) = 1$. Recalling Equation 1, for each itinerary $i$ we then have the expected number of bookings
$$B_i = \sum_{s \in S_i} P_{book}(s) \times P_{acc}(s) \qquad (3)$$
where $S_i$ is the multiset of user searches on itinerary $i$.
Request value evaluation. This evaluates the value of a single supplier request by the expected bookings it yields. Combining Equation 3 with Equation 2, the expected bookings per supplier request for itinerary $i$ are
$$v_i = \frac{B_i}{f_i} \qquad (4)$$
Top request generation. This generates the top requests we want to select according to their values. Within a day, for a specific supplier with QPS limit $Q$, we are allowed to send at most $86400 \times Q$ requests. Therefore, by Equation 4, we can order the itinerary requests by value and pick the most valuable ones until this daily budget is used up, as sketched below.
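A minimal sketch of this greedy selection under the daily budget; the input layout is an assumption for illustration:

```python
# Rank itineraries by value v_i = B_i / f_i (Equation 4) and greedily take
# them while the daily budget of 86400 * Q supplier requests lasts.
def select_top_itineraries(itineraries, qps_limit: int):
    """itineraries: list of (itinerary_id, expected_bookings_B, frequency_f)."""
    budget = 86400 * qps_limit  # total requests allowed per day
    chosen = []
    for itin_id, b, f in sorted(itineraries, key=lambda x: x[1] / x[2], reverse=True):
        if f <= budget:         # keeping it cached all day costs f requests
            chosen.append(itin_id)
            budget -= f
    return chosen
```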
Top request scheduling. This describes how to schedule the pulling of the selected top requests. Given the requests that need to be sent to the supplier, we need to make sure that 1. each request is sent to the supplier before its previously fetched price expires, and 2. at every second, we send exactly $Q$ requests to the supplier.
For all itinerary requests, we group the itinerary requests by their frequency: a group contains $k$ itinerary requests $i_1, \ldots, i_k$ that all have the same frequency $f$ and hence the same period $T = 86400 / f$ seconds. This means every request in the group is scheduled to be sent $f$ times a day, and all $k$ itinerary requests are to be sent within each period of $T$ seconds. To ensure every one of these $k$ itinerary requests is sent within a period of $T$, we can simply distribute the requests evenly over each second in $T$: we schedule $k / T$ requests each second within $T$, then send the same set of requests every period and repeat this process $f$ times a day. For example, suppose we have 43200 itinerary requests with frequency $f = 4$ and period $T = 6$ hours, which is 21600 seconds. That means, in every 6 hours, we need to schedule 43200 itinerary requests, which is $43200 / 21600 = 2$ requests per second. That is, if we do not consider any ordering of the 43200 requests, we send $i_1$ and $i_2$ in the 1st second, $i_3$ and $i_4$ in the 2nd second, and so on until $i_{43199}$ and $i_{43200}$ in the 21600th second. In the 21601st second, $i_1$ and $i_2$ are sent again, and so on. These 43200 itinerary requests are thus each sent 4 times in a single day.
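A minimal sketch of this even per-second spreading for one frequency group, matching the worked example above:

```python
# The k itineraries with frequency f share a period of 86400 / f seconds,
# are spread evenly across that period, and the cycle repeats f times a day.
def schedule_group(itineraries: list[str], frequency: int):
    """Yield (second_of_day, itinerary_id) pairs for one frequency group."""
    period = 86400 // frequency              # e.g. 21600 s for f = 4
    per_second = len(itineraries) / period   # e.g. 43200 / 21600 = 2 req/s
    for cycle in range(frequency):
        for idx, itinerary in enumerate(itineraries):
            yield (cycle * period + int(idx / per_second), itinerary)
```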
With the above 5 steps, the most valuable requests are sent by SmartScheduler, which maximizes the bookings.
5 Experiment and Analysis
The aggressive model with SmartScheduler has been deployed in production at Agoda. The deployment has yielded significant gains in bookings and other system metrics. Before the deployment, we conducted extensive online A/B experiments to evaluate the effectiveness of the model.
In the following sections, we present the experiments conducted in 2019. As Agoda is a publicly listed company, we are sorry that we cannot reveal the exact number of bookings due to data sensitivity, but we will try to be as informative as possible. Overall, the aggressive model with SmartScheduler beats the other baseline algorithms by a significant relative margin.
5.1 Experimentation suppliers
There are two types of suppliers Agoda has experimented with:
1. Retailers. Retailers are suppliers that deal with hotels directly (via each OTA’s market managers) and sell hotel rooms online.
2. Wholesalers. Wholesalers are suppliers that sell hotel rooms and other products in large quantities at lower (package) rates, selling mainly to B2B partners or retailers rather than directly to consumers.
In this paper, we present the results from 5 suppliers.
Supplier A is a wholesaler operating in Europe.
Supplier B is a wholesaler operating worldwide.
Supplier C is a wholesaler operating worldwide.
Supplier D is a retailer operating in Japan.
Supplier E is a retailer operating in Korea.
In this section, all the experiments were conducted as online A/B experiments over 14 days, where half of the allocated users experience algorithm A and the other half experience algorithm B. Moreover, for all the plots in this section,
• the x-axis is the nth day of the experiment;
• the bar plots represent bookings and the line plots represent cache hit.
5.2 Fixed TTL vs. SmartTTL
In this section, we compare the performance of the passive model with Fixed TTL (A) and the passive model with SmartTTL (B).

Figure 10 presents the results on Supplier A: the B variant wins over the A variant by a small margin, on both cache hit and bookings. This is expected, as SmartTTL only addresses Challenge 1.
5.3 SmartTTL vs. Aggressive Model with SmartScheduler
In this section, we compare the performance of the passive model with SmartTTL (A) and the aggressive model with SmartScheduler (B). We present the A/B experiment results on Supplier C and Supplier E.
Figure 11 presents the results on Supplier C: the B variant wins over the A variant significantly in terms of bookings and cache hit ratio, and it wins consistently on both metrics.


Figure 12 presents the results on Supplier E: the B variant again wins significantly in terms of bookings and cache hit ratio. For cache hit, the B variant wins consistently. For bookings, B never loses to A on any single day.
5.4 Aggressive Model with LRU Cache vs. Aggressive Model with SmartScheduler
In this section, we compare the performance of the aggressive model with LRU cache (A) and the aggressive model with SmartScheduler (B). We present the A/B experiment results on Supplier B and Supplier D.

Figure 13 presents the results on Supplier B: the B variant wins over the A variant significantly in terms of bookings and cache hit ratio, and for cache hit it wins consistently. It is worth noting that the overall bookings decline along the x-axis; this could be caused by many factors, such as promotions from competitors, seasonality, etc. Nevertheless, the B variant still wins over the A variant with a consistent trend.

Figure 14 presents the results on Supplier D: the B variant wins over the A variant significantly in terms of bookings and cache hit ratio. For cache hit, the B variant wins consistently. For bookings, B consistently wins over A by a clear margin, and on certain days, e.g. day 5, the gap is even larger.
6 Related Work
The growth of the travel industry has attracted substantial academic attention [3, 4, 5]. To increase revenue, much effort has been spent on enhancing pricing strategies.
Aziz et al. proposed a revenue management system framework based on price decisions which optimizes revenue [6]. The authors of [3] proposed Smart Price, which improves room bookings by guiding hosts on pricing their rooms on Airbnb. As long-term stays become more common, Ling et al. [7] derived the optimal pricing strategy for long-term stays, which benefits the hotel as well as its customers. Similar efforts to increase revenue through pricing strategies appear in [8, 9].
Apart from pricing strategy, some effort has been spent on overbooking [10, 11]. For example, Antonio et al. [12] built prediction models for booking cancellations to mitigate the revenue loss derived from them.
Nevertheless, none of the existing work has studied hotel price fetching strategies. To the best of our knowledge, we are the first to deploy an optimized price fetching strategy, and it increases revenue by a large margin.
7 Conclusion and Future Work
In this paper, we presented PriceAggregator, an intelligent hotel price fetching system which optimizes bookings. To the best of our knowledge, PriceAggregator is the first productionized system which addresses the 4 challenges mentioned in Section 1. It differs from most existing OTA systems by having SmartTTL, which determines itinerary-specific TTLs. Moreover, instead of passively sending requests to suppliers, PriceAggregator aggressively fetches the most valuable hotel prices from suppliers, which optimizes the bookings. Extensive online experiments show that PriceAggregator is not only effective in improving system metrics like cache hit, but also grows the company's revenue significantly. We believe that PriceAggregator is a rewarding direction for the application of data science in OTAs.
One of the factors which drives bookings is pricing. In the future, we will explore how to optimize bookings through a hybrid of pricing strategy and price fetching strategy.
References
- [1] Tianqi Chen and Carlos Guestrin “XGBoost: A scalable tree boosting system” In KDD 2016, 2016, pp. 785–794
- [2] J.W. Zhang and Y.C. Tay “PG2S+: Stack distance construction using popularity, gap and machine learning” In WWW 20, 2020
- [3] Peng Ye et al. “Customized Regression Model for Airbnb Dynamic Pricing” In KDD 2018, 2018, pp. 932–940
- [4] Lucas Bernardi, Themistoklis Mavridis and Pablo Estevez “150 Successful Machine Learning Models: 6 Lessons Learned at Booking.Com” In KDD 2019, 2019, pp. 1743–1751
- [5] Graziano Abrate, Giovanni Fraquelli and Giampaolo Viglia “Dynamic pricing strategies: Evidence from European hotels” In International Journal of Hospitality Management 31.1 Elsevier, 2012, pp. 160–168
- [6] Heba Abdel Aziz, Mohamed Saleh, Mohamed H Rasmy and Hisham ElShishiny “Dynamic room pricing model for hotel revenue management systems” In Egyptian Informatics Journal 12.3 Elsevier, 2011, pp. 177–183
- [7] Liuyi Ling, Xiaolong Guo and Lina He “Optimal pricing strategy of hotel for long-term stay” In International Journal of Services Technology and Management 17.1 Inderscience Publishers Ltd, 2012, pp. 72–86
- [8] Breffni M Noone “Pricing for hotel revenue management: Evolution in an era of price transparency” In Journal of Revenue and Pricing Management 15.3-4 Springer, 2016, pp. 264–269
- [9] Abdelmoniem Bayoumi, Mohamed Saleh, Amir Atiya and Heba Habib “Dynamic pricing for hotel revenue management using price multipliers” In Journal of Revenue and Pricing Management 12, 2013, pp. 271–285
- [10] Rex S Toh and Frederick Dekay “Hotel room-inventory management: an overbooking model” In Cornell Hotel and Restaurant Administration Quarterly 43.4 Sage Publications, 2002, pp. 79–90
- [11] Takeshi Koide and Hiroaki Ishii “The hotel yield management with two types of room prices, overbooking and cancellations” In International Journal of Production Economics 93-94, 2005, pp. 417–428
- [12] Nuno Antonio, Ana de Almeida and Luis Nunes “Predicting hotel booking cancellations to decrease uncertainty and increase revenue” In Tourism and Management Studies 13 scielopt, 2017, pp. 25–39