Diversifying Multi-aspect Search Results Using Simpson’s Diversity Index

Jianghong Zhou Emory University201 Dowman DriveAtlantaUnited StatesGa. 30322 [email protected] , Eugene Agichtein Emory UniversityAtlantaUnited States [email protected] and Surya Kallumadi The Home DepotAtlantaUnited States [email protected]

(2020)

Abstract.

In search and recommendation, diversifying the multi-aspect search results could help with reducing redundancy, and promoting results that might not be shown otherwise. Many previous methods have been proposed for this task. However, previous methods do not explicitly consider the uniformity of the number of the items’ classes, or evenness, which could degrade the search and recommendation quality. To address this problem, we introduce a novel method by adapting the Simpson’s Diversity Index from biology, which enables a more effective and efficient quadratic search result diversification algorithm. We also extend the method to balance the diversity between multiple aspects through weighted factors and further improve computational complexity by developing a fast approximation algorithm. We demonstrate the feasibility of the proposed method using the openly available Kaggle shoes competition dataset. Our experimental results show that our approach outperforms previous state of the art diversification methods, while reducing computational complexity.

search result diversification, Simpson’s diversity index for search, efficient search diversification computation

^†^†journalyear: 2020^†^†copyright: acmcopyright^†^†conference: Proceedings of the 29th ACM International Conference on Information and Knowledge Management; October 19–23, 2020; Virtual Event, Ireland^†^†booktitle: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), October 19–23, 2020, Virtual Event, Ireland^†^†price: 15.00^†^†doi: 10.1145/3340531.3412163^†^†isbn: 978-1-4503-6859-9/20/10

1. Introduction

Diversifying search results is of vital importance in many search systems due to the ambiguity of search queries and the complexity of users’ intents. In multi-aspect search, a diverse set of search results are more likely to satisfy the users’ needs and improve the online shopping experience. Therefore, many approaches for diversifying results in information retrieval and recommender systems have been proposed (Zhou and Agichtein, 2020), among which the algorithms based on DPPs (Determinantal Point Processes) are considered to be state of the art methods. DPPs aim to balance diversity and relevance, and are based on the geometric properties of vectors associated with a set of items (Kulesza et al., 2012). DPPs define the diversity of a set by the volume of a parallelepiped spanned by the vectors of this set. The volume is determined by the vectors’ length and their cosine similarity, which can represent the relevance and similarity respectively. Specifically, DPPs compute the determinant of the product of the selected items’ matrix and its transpose to identify a diverse subset of results.

Diversity is characterised by two aspects 1) richness and 2) evenness (Simpson, 1949). Richness quantifies the number of different classes elements in a set, while evenness considers the uniformity of the distribution of these classes. However, DPPs only consider the richness aspect (Xie et al., 2017). It narrows the definition of diversity, which may degrade actual diverse search results. Besides, evenness can affect richness, especially the item has multiple aspects (Stirling and Wilsey, 2001).

In addition to not considering “evenness”, DPPs also suffer from high computation complexity, which is NP hard. In its greedy formulation, the computation complexity is $O(N^{4})$ , in which $N$ is the scale of search space. Therefore, many variations are developed to improve its computational efficiency (Chen et al., 2017)(Chen et al., 2018)(Kulesza and Taskar, 2011).

Due to the aforementioned difficulties, we consider another model to measure diversity, which is Simpson’s diversity index (Simpson, 1949). It measures the probability that two items are chosen at random and independently from a set is the same. Both Shannon entropy and Simpson’s diversity index measure richness and evenness. Notwithstanding, Simpson’s diversity index is a pairwise approach. Through some transformations and improvements, we develop it to a binary quadratic program, with predicable approximation bounds.

We evaluate our approach on a standard benchmark dataset from Kaggle shoes’ price competition (https://www.kaggle.com/datafiniti/womens-shoes-prices, https://www.kaggle.com/datafiniti/mens-shoe-prices). The experimental results show that the proposed method outperforms state of the art DPP algorithms and extensions. In summary, our contributions are threefold:

1. Introducing Simpson’s diversity index (SDI) to search result diversification domain.

2. We operationalize SDI computation as a quadratic program, which can be solved efficiently.

3. We show that the proposed method outperforms state of the art methods on a standard e-commerce search benchmark.

Next, we motivate our method and place it in the context of related prior work.

2. Related Work

In this part, we emphasize the importance of evenness, brief the major baseline method $--$ DPPs, and introduce SDI as a diversity model in multi-aspect search problems.

2.1. Richness and Evenness

Prior research on search diversification focused on the richness of the results (Simpson, 1949). Richness quantifies the number of different types the dataset of interest contains, while evenness considers the uniformity of the distribution of these types. However, evenness is also important for search diversification. Importantly for the e-commerce setting, it can affect users’ online shopping experience. For example, consider a product catalog with 1000 A items, 500 B items, and 500 C items. Suppose, we attempt to retrieve 12 results as diverse as possible to fit on the first page of the results. Without considering evenness, the expected search results might be 6 A items, 3 B items, and 3 C items. While considering evenness, the expected results are all 4. The richness of the two versions of the results is identical. However, the information entropy of the second set of results is higher, meaning that higher evenness could increase the information that the users receive in a limited space. In multi-aspect search, evenness is even more important. In multi-aspect search, the search results are the combination of many independent single-aspect searches. In prior research, (Simpson, 1949) shows that the combinations of the even results are the most diverse. Therefore, evenness affects the richness of multi-aspect search as well.

The above discussion shows that evenness is significant for diversity analysis. However, vast majority of the diversity models for search do not consider results’ evenness. Therefore, we propose to use a model considering evenness to enrich the diversity model for search. We propose to adopt the Simpson’s diversity index (SDI) method from biology for this task (Simpson, 1949). But first, we formally introduce the DPP-based algorithms for result diversification and then introduce our new SDI method.

2.2. Determinantal Point Processes for Search Diversification

Determinantal Point Processes (DPPs) are useful in domains where repulsive effects or diversity are important aspects to model. In recommender systems and information retrieval domains, where the discovery of documents and items are significant, DPPs can be used to diversify results. In search problems, DPPs are a subset selection problem. We define the search space as a set $S$ , whose size is $N$ , and the search results are a subset $S^{\prime}$ , whose size is $k$ . $B_{S^{\prime}}$ represents the selected items in $S^{\prime}$ . DPPs can be formulated as:

$X=\underset{S^{\prime}\subseteq S}{\operatorname{argmax}}$ det $({B^{T}_{S^{\prime}}B_{S^{\prime}}}))$

$X$ is the volume of a parallelepiped spanned by the vectors of $B$ . It is determined by the vectors’ length (relevance) and their cosine similarity (similarity). By maximizing $X$ , we can obtain the best subset of $S$ , considering both diversity and relevance.

However, DPPs are a richness-greedy diversification method, because $X$ does not change when we apply linear transformations to revise the evenness of $B$ . Therefore, the evenness of the obtained subset of each aspect is not promised. In multi-aspect search, the results are the combination of every aspect. As we mentioned in Section 2.1, this can lead to weaker diversity of the combination.

2.3. Simpson’s diversity index

Simpson’s Diversity Index (SDI) was originally introduced in biology to measure both the evenness and richness of a habitat (Simpson, 1949), and aims to estimate the probability that two randomly selected individuals from a sample will be the same. Apparently, if the habitat has more species (Diversity), this probability is smaller. Based on Jensen’s inequality, if the total number of the animals in the habitat is a constant, this probability is also smaller when the numbers of each species are the same (Evenness).

More formally, SDI is defined as the probability that two items are chosen at random and independently from a set is the same:

$D=\frac{\sum_{i=1}^{R}{(n_{i}(n_{i}-1))}}{N(N-1)}$

Where $R$ is the number of classes. $n_{i}$ is the number of items of the $i$ th class. $N$ is the total number of all items. Smaller $D$ means higher diversity. SDI considers both richness and evenness, according to Section 2.1, it is more suitable to diversify search results over multi-aspect items.

3. Diversifying Multi-aspect Search Results Using SDI

In this section, we present a method that diversifies multi-aspect search results using Simpson’s diversity index. We first present an important variation of Simpson’s diversity index and then explain how we apply it to search. We conclude by constructing a quadratic program to compute SDI more efficiently.

3.1. Diversifying Multi-aspect Search Results using Simpson’s diversity index

In multi-aspect search, items are labeled by many aspects. We first consider the SDI of an aspect. In addition to the equation in Section 2.2, we have another way to calculate SDI.

In a set, we denote the number of items in this set as $N$ . $R$ is the number of items’ classes. $n_{i}$ is the number of items of the $i$ th class.

When we randomly select two items as a pair, there are $\frac{1}{2}N(N-1)$ pairs. For each pair, they belong to the same class or not. The total number of pairs belonging to the same class is $\sum_{i<j\leq N}{(a_{i}\&a_{j})}$ . Where $a_{i}$ is the Item $i$ . If two items are the same $(a_{i}\&a_{j})=1$ , else $(a_{i}\&a_{j})=0$ . If we also count the item itself, we have:

(1)

D(p,S^{\prime})=\frac{2\sum_{i,j\in S^{\prime},i\leq j}{(a_{p,i}\&a_{p,j})}}{|S^{\prime}|(|S^{\prime}|+1)}

Where $p$ is the aspect, $P$ is the aspect set, $S^{\prime}$ is the selected set, $S$ is the whole set, $|S|$ is the size of $S$ . $a_{p,i}$ is the Aspect $p$ of Item $i$ .

If we consider all the aspects at the same time, we have:

$H(S^{\prime})=\sum_{p\in P}{\omega_{p}\frac{D(p,S^{\prime})}{D(p,S)}}$

In this equation, we first normalize each aspect’s SDI, because different aspects’ SDI may have different scales. Then, we weigh every aspect with $\omega_{p}$ . If we only consider diversification, $\omega_{p}=1$ . However, in the search problem, relevance is also very important. When the item is highly relevant to some aspects. For example, when users search ’blue shoe’, it is not necessary to present other colors’ shoes anymore. Therefore, the ’color’ aspect should be penalized. In this paper, $\omega_{p}=1-\frac{k_{p}-1}{|S^{\prime}|-1}$ , Where $k_{p}$ is the number of the most common aspect of Aspect $p$ in top $|S^{\prime}|$ relevant results. For convenience, we let $\phi_{p}=\frac{\omega_{p}}{D(p,S)}$ .

This transformation of SDI helps us organize it to a more efficient form to be optimized $--$ the binary quadratic program (BQP).

3.2. The Binary Quadratic Program

In this section, we transfer the SDI optimization to a BQP problem, which has thorough previous research to systematically be approximated and optimized (Ma, 2010)(Clarkson, 2010)(Springer, 2002). Since $H(S^{\prime})$ is pairwise, it can be computed in an easy way. Let’s expand $H(S^{\prime})$ .

$H(S^{\prime})=\sum_{i,j\in S^{\prime},i\leq j}{\frac{2}{|S^{\prime}|(|S^{\prime}|+1)}\sum_{p\in P}{\phi_{p}(a_{p,i}\&a_{p,j})}}$

Let $Q_{i,j}={\frac{2}{|S^{\prime}|(|S^{\prime}|+1)}\sum_{p\in P}{\phi_{p}(a_{p,i}\&a_{p,j})}}$ , we can reformulate $H(S^{\prime})$ as $H(S^{\prime})=\frac{1}{2}x^{T}Qx$ , $x$ is a vector representing $S^{\prime}$ . The length of $x$ equals to $|S|$ . Each entry is an item of $S$ . The entry of the chosen item is 1, or it is 0. If we further consider the relevance, we have:

(2)

\min_{x}T(S^{\prime})=\min_{x}H(S^{\prime})+R(S^{\prime})=\min_{x}\frac{1}{2}x^{T}Qx+b^{T}x

Where $b$ is the relevance vector. Consequently, it becomes a binary quadratic program problem, whose constraints are: s.t. $x_{i}\in\{0,1\},\sum_{k=1}^{|S|}{x_{k}}=|S^{\prime}|$ , where $x_{i}$ is the entry of $x$ .

Equation 2 is a classical BQP problem. In this paper, we first relax it to a convex optimization problem (Ma, 2010). Then we use the Frank–Wolfe algorithm to optimize it (Clarkson, 2010). Finally, we use Goemans and Williamson procedure for the approximate solutions (Springer, 2002). Such a procedure guarantees that the solution is a close approximation that $T\in[T_{min},\frac{\pi}{2}T_{min}]$ .

4. Experimental Setting and Results

We first introduce the experimental datasets, including the queries generation and items’ collection. Then, we describe the evaluation methods, initial relevance ranking, and the baseline methods used to report the experimental results.

4.1. Datasets

We experiment with two datasets: a publicly available Kaggle benchmark for the searching items dataset, and a complementary dataset constructed using publicly available Amazon Web Search engine interface for the queries dataset.

Kaggle Shoes Price Competition Dataset (Kaggle): Kaggle shoe price dataset is a list of shoes and their associated information, containing 19046 women shoes and 19387 men shoes. However, the items of this dataset also contain clothes, gloves, jewelry, and so on. Some items do not have features. To focus on the diversity problem, we remove items without features, clothes items, jewelry items, gloves items, trousers items, and toys items. At last, we have 14609 items. Besides, even though the items have many aspects, they do not have the values of many aspects. Therefore, we only choose 21 most commonly used features: ’Season’, ’Material’, ’Gender’, ’Shoe Size’, ’Color’, ’Brand’, ’Age Group’, ’Heel Height’, ’Fabric Material’, ’Shoe Width’, ’Occasion’, ’Shoe Category’, ’Casual & Dress Shoe Style’, ’Shoe Closure’, ’Assembled Product Dimensions (L x W x H)’, ’Fabric Content’, ’Shipping Weight (in pounds)’, ’prices.offer’, ’prices.amountMin’, ’prices.amountMax’ and ’prices.isSale’.

Amazon Queries Suggestions Dataset (AQS): To complement the Kaggle queries dataset, we created a new dataset automatically constructed for the “shoes” category using the query suggestions provided by the Amazon Web Search engine. Specifically, we collected the queries suggestions for the top-level category ’shoes’. This process resulted in 384 queries, such as ’fur lined winter coat women’. The AQS dataset is available at the URL https://docs.google.com/spreadsheets/d/17DMU5pMSiNxi05yu5pdEvrUf3b0mIAF1rYn2ypruS7A/edit?usp=sharing.

4.2. Baselines and Experimental Setting

4.2.1. Setting

In the experiments, we search for queries and every model returns the top 10 results. We used the averaged results of all queries as the data points.

All the diversification methods considered, and our proposed method, require the relevance vectors and the similarity matrices. In this paper, BM25(Robertson et al., 2009) score of each item with respect to the query. For the item-item similarity matrix, the number of identical aspects between the items is used, which could be improved in future work.

To trade off the relevance and diversity, we introduce a parameter $\theta$ . $\theta$ adjusts the relation between diversity and relevance in the following way: $(1-\theta)H(S^{\prime})+\theta R(S^{\prime})$ . For the proposed method, the approximate rate $\epsilon$ is 0.0001.

4.2.2. Baselines

We use three baselines, including Greedy version DPPs and two recent and influential variations. 1) DPPs Greedy: Greedily selecting items to maximize the determinant (Borodin, 2009). 2) k-DPPs: It is a sample-based DPPs, which can accelerate the Greedy DPPs. However, since it localizes the optimization, the performance may be unstable (Kulesza and Taskar, 2011). 3) Fast MAP DPPs: A novel approach improves the computation complexity of the maximum a posteriori (MAP) inference DPPs, which is the state of art DPPs algorithm (Chen et al., 2018).

4.3. Evaluation Metrics

We now summarize the metrics used to compare the methods.

Coverage Rate(CR): The coverage rate measures the diversity of the subset (Castells et al., 2011). For each aspect of the item, they have at most 10 features or the number of this aspect’s features in this set. The coverage rate is a diversity metric.

$CR(S^{\prime})=\frac{1}{|P|}\sum_{p\in P}{\frac{|A^{\prime}_{p}|}{\min(|S^{\prime}|,|A_{p}|)}}$

Where $S^{\prime}$ is the selected set, $P$ is the set of all aspects, $A^{\prime}_{p}$ is the features of Aspect $p$ in the selected set and $A_{p}$ is the features of Aspect $p$ .

NDCG@10: We use normalized discounted cumulative gain (NDCG) to measure ranking results of the relevance. Since both results of DPPs and the proposed method do not have the order, we rank the selected subset based on the items’ relevance before computing the NDCG@10 (Castells et al., 2011).

$\alpha$ -NDCG: A variations of NDCG, which can measure both relevance and diversity (Castells et al., 2011).

Variance: We use averaged variance of every aspect’s number of features to measure the evenness (Ricotta, 2017).

4.4. Results

The results of diversity, evenness, and relevance are presented in Figure 1. The result of the computing time is illustrated in Table 1. In Figure 1, the trade-off parameter $\theta$ ranges from 0 to 1, whose step size is 0.1. For CR(a), NDCG@10(b), and $\alpha$ -NDCG(c), the larger their value, the better the performance of the model is. For variance(d), a lower line means the model is evener. In Table 1, less computing time means the model is faster. Generally, the proposed method outperforms the baselines in terms of relevance, diversity, and evenness, especially when $\theta$ ranges from 0.3 to 0.7. The improvement is up to more than 15%. In terms of computing time, the proposed method spends less time than most of the baselines.

We use CR to measure the richness of the multi-aspect results in Figure 1 (a), which shows the proposed method outperforms the baselines when $\theta$ ranges from 0.2 to 0.8. In terms of relevance, the NDCG@10 scores from Figure 1 (b) show that the proposed method has closed performances. We further use $\alpha$ -NDCG to measure the relevance and diversity at the same time in Figure 1 (c), which demonstrates that the proposed approach has the best performance while $\theta$ ranges from 0.4 to 0.7. Figure 1 (d) further applies the Variance to measure the evenness of the returned results. In this figure, we find that when the model emphasizes diversity ( $\theta$ closes to 0), the variance of SDI is smaller than all the baselines. We further test the computing time of the proposed method and summarize the results in Table 1, revealing that the proposed method is slower than k-DPPs but faster than other baselines.

Refer to caption — Figure 1. Comparison of trade-off performance between relevance and diversity under different choices of trade-off parameters $\theta$ on Amazon Queries Suggestions dataset (a, c) and Kaggle Shoes Price Competition Dataset (b, d).

Table 1. Comparison of average running time (in milliseconds)

Model	SDI	DPPs-Greedy	k-DPPs	Fast-MAP-DPPs
Computing Time	45.32	1093.48	17.86	98.45

5. Analysis & Discussion

We now discuss the proposed method’s advantages and drawbacks based on the experimental results above. Figure 1 (a) (b) (c) show that our SDI method outperforms all the baselines while we consider diversity and relevance with similar or closed weights. Simultaneously, Figure 1 (d) shows that the evenness of SDI is substantially higher than the baselines’, supporting the analysis in Section 2.

Table 1 reports the computation time of 4 methods, which aligns with their computation complexity. Greedy DPPs’ is $O(N^{4})$ . Fast-MAP-DPPs’ is closed to $O(N^{3})$ . k-DPPs’ is $O(Nk^{2})$ . The proposed method has $O(1/\epsilon)$ iterations, and each iteration is just linear programming. Although the proposed method is slower than k-DPPs, k-DPPs have the worst general performance in Figure 1, which still supports the conclusion that SDI is a competitive diversity model.

6. Conclusions

In this paper, we presented a novel approach to multi-aspect search results diversification, by adapting the Simpson’s Diversity Index (SDI) from biology for this task. Unlike important previous SOTA family of methods for diversification, our proposed method considers both evenness and richness. Based on theoretical analysis and experimental evaluation, we show that the evenness is significant for multi-aspect search, which is common in e-commercial product search. By defining the relevance and diversity in different ways, SDI can be further applied to other search and recommendation methods based on learning-to-rank or neural search and recommendation algorithms (Zoph and Le, 2016). In the future, we plan to explore incorporating our SDI-based approach directly into learning-to-rank models, to further improve search and recommendation performance for e-commerce search and recommendation.

References

(1)
Borodin (2009) Alexei Borodin. 2009. Determinantal point processes. arXiv preprint arXiv:0911.1153 (2009).
Castells et al. (2011) Pablo Castells, Saúl Vargas, and Jun Wang. 2011. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. (2011).
Chen et al. (2018) Laming Chen, Guoxin Zhang, and Eric Zhou. 2018. Fast greedy map inference for determinantal point process to improve recommendation diversity. In Advances in Neural Information Processing Systems. 5622–5633.
Chen et al. (2017) Laming Chen, Guoxin Zhang, and Hanning Zhou. 2017. Improving the diversity of top-N recommendation via determinantal point process. In Large Scale Recommendation Systems Workshop.
Clarkson (2010) Kenneth L Clarkson. 2010. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG) 6, 4 (2010), 1–30.
Kulesza and Taskar (2011) Alex Kulesza and Ben Taskar. 2011. k-DPPs: Fixed-size determinantal point processes. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 1193–1200.
Kulesza et al. (2012) Alex Kulesza, Ben Taskar, et al. 2012. Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning 5, 2–3 (2012), 123–286.
Ma (2010) Wing-Kin Ken Ma. 2010. Semidefinite relaxation of quadratic optimization problems and applications. IEEE Signal Processing Magazine 1053, 5888/10 (2010).
Ricotta (2017) Carlo Ricotta. 2017. Of beta diversity, variance, evenness, and dissimilarity. Ecology and evolution 7, 13 (2017), 4835–4843.
Robertson et al. (2009) Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389.
Simpson (1949) Edward H Simpson. 1949. Measurement of diversity. Nature 163, 4148 (1949), 688.
Springer (2002) Springer 2002. Improved rounding techniques for the MAX 2-SAT and MAX DI-CUT problems. Springer.
Stirling and Wilsey (2001) Gray Stirling and Brian Wilsey. 2001. Empirical relationships between species richness, evenness, and proportional diversity. The American Naturalist 158, 3 (2001), 286–299.
Xie et al. (2017) Pengtao Xie, Aarti Singh, and Eric P Xing. 2017. Uncorrelation and evenness: a new diversity-promoting regularizer. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 3811–3820.
Zhou and Agichtein (2020) Jianghong Zhou and Eugene Agichtein. 2020. RLIRank: Learning to Rank with Reinforcement Learning for Dynamic Search. In Proceedings of The Web Conference 2020. 2842–2848.
Zoph and Le (2016) Barret Zoph and Quoc V Le. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016).