Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users

Kuntal Kumar Pal (Arizona State University, Tempe, Arizona, United States) and Michael Gamon, Nirupama Chandrasekaran, Silviu Cucerzan (Microsoft Research, Redmond, Washington, United States)

(2018)

Abstract.

In community question-answering platforms, tags play essential roles in effective information organization and retrieval, better question routing, faster response to questions, and assessment of topic popularity. Hence, automatic assistance for predicting and suggesting tags for posts is of high utility to users of such platforms. To develop better tag prediction across diverse communities and domains, we performed a thorough analysis of users’ tagging behavior in 17 StackExchange communities. We found various common inherent properties of this behavior on those diverse domains. We used the findings to develop a flexible neural tag prediction architecture, which predicts both popular tags and more granular tags for each question. Our extensive experiments and obtained performance show the effectiveness of our model.

Keywords: text mining, question tagging, community question answering, tag prediction, transformers, stack exchange, tagging behavior modeling

CCS Concepts: Information systems → Recommender systems; Information systems → Question answering; Information systems → Document topic models; Computing methodologies → Natural language generation; Computing methodologies → Information extraction

1. Introduction

Community Question Answering (CQA) platforms have become an important online source of information for Web users. On these platforms, information seeking takes the form of questions and answers in communities formed around common domains of interest. StackExchange, Quora, AnswerBag, Question2Answer, Reddit (stackexchange.com, quora.com, answerbag.com, question2answer.org, reddit.com), and Biostars (Parnell et al., 2011) are some of the most popular public CQA platforms. Many enterprises offer similar private platforms for their employees. Over time, these communities have amassed large online information repositories, with high numbers of daily active users. Thus, there is a need to organize and retrieve information efficiently, as well as to route questions to interested and qualified experts, in order to provide a seamless user experience. Semantic tagging of questions plays an important role in this context.

Most CQA platforms require users to assign tags to their questions. Tags are keywords representative of the topics covered by those questions. They help communities to (1) categorize and organize information; (2) retrieve existing answers for users looking for information, which in turn reduces duplicate question creation; (3) route questions to topic experts, which improves query response time and answer quality; (4) provide tag-based notifications, which allow knowledgeable community members to answer questions in their areas of expertise and gain reputation; and (5) assess the popularity of various areas and topics in the targeted domain.

Asking users to annotate their questions with tags without providing adequate support poses several challenges, in particular with respect to novice users and to the lack of knowledge about tag usage in a community, which may lead to the creation of various tags with the same meaning, as well as different orthographic forms of those tags. This makes question routing difficult (for tag-based subscription platforms), delays response time, and leads to poor information organization. In turn, addressing these issues would require community administrators to constantly work on identifying and merging near-duplicate tags. Additionally, lack of support in suggesting adequate tags may inhibit novice users from asking questions and/or lead to questions being mistagged and not answered. These challenges may become more severe in enterprise CQA platforms due to community size and topic sparsity.

Against this background, tag prediction becomes an important yet challenging task for both public and private CQA platforms. In this investigation, our first goal was to understand the commonalities in users’ tagging behavior through a large-scale analysis of 17 diverse domains in StackExchange (Section 3). Our analysis revealed that while these domains are quite diverse in terms of the volume of questions, users, and tags, they share common distributional properties for tag and tag-pair usage. Also, there is a large lexical overlap between the tags and user texts in every domain, and the post coverage of tags is high in all domains. Tags also show positional stability, and tag pairs show particular ordering preferences, forming a soft hierarchy among tags.

Figure 1. Community Diversity in terms of Volume

We incorporate the findings to develop a neural model with two tag-prediction heads: one trained to predict existing popular tags, such as the names of important topics in a domain (e.g., "harry-potter", "doctor-who", and "star-wars" in the scifi domain) and frequently used meta-tags (e.g., "video-games", "books", and "short-stories" in scifi), and another trained to generate finer-grained tags, which may have been used rarely on previous questions or are new. Typically, the former category of tags represents the main topic area of a question, while the latter helps in further scoping down and clarifying it. Both types of tags are equally important in identifying the question, and hence tag prediction systems need to predict not only the main generic tags but also the refined ones.

Our experiments show that the proposed approach significantly outperforms baseline methods in prediction of both generic tags and finer-grained tags. We also investigate and show the effect of reducing the pre-defined vocabulary size, as well as the contributions of each prediction head. Our main contributions in this work are:

• We present an in-depth analysis of the tagging behaviors of the users of a CQA platform (StackExchange) on 17 diverse domains. We present our findings of question tag analysis across four dimensions: tag space, tag co-occurrence, tag pair ordering, and tag positional stability.
• We propose a tag prediction architecture for both predicting popular tags from a pre-defined vocabulary and generating refined tags not present in the vocabulary.
• We perform comprehensive experiments on the 17 domains and show effects of each model component under various experimental settings.

2. Dataset Preparation

We collected data from 17 communities of StackExchange that correspond to a diverse set of domains. We use the StackExchange data dump of 2021-03-01 (https://archive.org/details/stackexchange_20210301) for our analysis and model. We find that the Posts.xml file is sufficient for our tag analysis and predictions. From the dataset, we consider only the posts that are either questions or answers (PostTypeId), and we reject posts with no owner (OwnerUserId, OwnerDisplayName). As imposed by StackExchange, the minimum and maximum numbers of tags assigned to each post are one and five, respectively, and all the posts in this dataset are dated prior to March 2021. We chose several domains from each of the following StackExchange categories (https://stackexchange.com/sites#): Technology, Culture & recreation, Life & arts, Science, and Professional. Each selected domain has at least a decade of posts. We do not include the stackoverflow domain because of its enormous volume and because a random sample might not be representative of the full data of this domain. Instead, we consider askubuntu, which is also a representative community of the Technology category.
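
The filtering described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the paper: the function names are hypothetical, and it assumes that each post is a `row` element, that PostTypeId values "1" and "2" denote questions and answers, and that the Tags attribute stores tags as `<tag1><tag2>`.

```python
import io
import xml.etree.ElementTree as ET

def parse_tags(raw):
    """Split a dump-style tag string such as '<baking><bread>' into a list."""
    return raw.replace("><", " ").strip("<>").split()

def iter_posts(source):
    """Yield (post_id, post_type, tags) for questions and answers that have an owner."""
    for _, row in ET.iterparse(source, events=("end",)):
        if row.tag == "row":
            has_owner = row.get("OwnerUserId") or row.get("OwnerDisplayName")
            # Assumption: "1" marks a question and "2" marks an answer.
            if row.get("PostTypeId") in {"1", "2"} and has_owner:
                yield row.get("Id"), row.get("PostTypeId"), parse_tags(row.get("Tags", ""))
        row.clear()  # drop attributes of elements that were already handled

# Tiny in-memory stand-in for Posts.xml (illustrative data only).
SAMPLE = b"""<posts>
<row Id="1" PostTypeId="1" OwnerUserId="7" Tags="&lt;baking&gt;&lt;bread&gt;" />
<row Id="2" PostTypeId="2" OwnerUserId="9" />
<row Id="3" PostTypeId="1" Tags="&lt;oven&gt;" />
<row Id="4" PostTypeId="5" OwnerUserId="7" />
</posts>"""

posts = list(iter_posts(io.BytesIO(SAMPLE)))
```

In this toy input, post 3 is dropped because it has no owner and post 4 because it is neither a question nor an answer.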

3. Tagging Behavior Analysis

To understand the user behavior of question tagging and to identify the inherent commonalities, we analyze ten years of data from these 17 domains.

Mathematical Notation: Without loss of generality, let $D$ denote one of the domains (out of 17) being investigated, $P$ the set of posts in the data for this domain, and $T=\{t_{1},t_{2},\dots,t_{|T|}\}$ the set of all tags used in domain $D$ . Each post $p_{j}\in P$ has an associated sequence of tags $S(p_{j})=\big(t_{(1)},t_{(2)},\dots,t_{(l)}\big)$ , $1\leq l\leq 5$ , where $t_{(i)}$ denotes the tag at position $i$ in that sequence. We employ parentheses to distinguish between the positional information of a tag in a sequence and the indexes that identify elements $t_{i}$ of the tag set $T$ observed for domain $D$ .

3.1. Community Diversity

We observed a high degree of variability among the selected domains in terms of question volume, tag space, and asker volume. Figure 1 shows a visual comparison of this variability, while Table 1 shows general statistics for each domain. In terms of the amount of information created over a decade, only four domains have over 100K posted questions, while politics and history have merely 12K each. In terms of the number of unique tags (#T), the movies domain ranks highest, as new movie titles are added to the tag set on a weekly basis. To quantify tag re-use in each domain, we define posts-per-tag (PPT) as the average number of posts per unique tag (#Q/#T). Physics, askubuntu, and chemistry are the domains with the most tag re-use (PPT of roughly 100 or more), while the movies domain (PPT $<$ 5) frequently introduces new tags. The number of posts having views over 100 (V $>$ 100) can be used to infer the popularity of posts in each domain. From the average number of tags per post (AvgT), we can infer the need for detailed tagging in each domain. In travel, physics, and money, AvgT $>$ 3 indicates that users feel the need to assign more than three tags to clarify their questions. In contrast, the movies domain has the lowest AvgT (2.09), showing that about two tags are sufficient on average. Some domains, such as aviation, philosophy, history, movies, and politics, are less popular (#A $<$ 10K in a decade). More statistics are in Appendix Table 10.

Table 1. Community Diversity. V:Views, PPT:Posts/Tag, QPA:#Q/#A, AvgT: Average #T per Q, #T: Unique Tags

Domain	#Q	#T	PPT	AvgT	V $>$ 100	#A	QPA
askubuntu	371800	3121	119.13	2.78	1093	201912	1.84
aviation	20345	1002	20.30	2.56	12	7066	2.88
biology	25671	739	34.74	2.58	11	12089	2.12
chemistry	37476	375	99.94	2.37	7	17202	2.18
cooking	24513	833	29.43	2.30	13	12413	1.97
electronics	152980	2226	68.72	2.77	36	61869	2.47
history	12562	813	15.45	2.84	19	5296	2.37
money	32648	995	32.81	3.11	37	18010	1.81
movies	20749	4348	4.77	2.09	30	6931	2.99
music	20925	512	40.87	2.52	5	10447	2.00
philosophy	15624	559	27.95	2.40	6	6640	2.35
physics	180166	893	201.75	3.17	131	59774	3.01
politics	12416	739	16.80	2.90	27	3970	3.13
rpg	42693	1195	35.73	2.91	56	11541	3.70
scifi	62987	3433	18.35	2.25	153	22717	2.77
serverfault	299895	3814	78.63	2.90	327	130214	2.30
travel	42201	1891	22.32	3.28	42	24895	1.70
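
As a quick sanity check, the PPT column of Table 1 can be reproduced from the #Q and #T columns. The short sketch below assumes PPT is the ratio #Q/#T rounded to two decimals; the helper name is hypothetical.

```python
def posts_per_tag(num_questions, num_unique_tags):
    """PPT: average number of posts per unique tag, rounded as in Table 1."""
    return round(num_questions / num_unique_tags, 2)

# (#Q, #T) pairs copied from Table 1.
table1 = {
    "askubuntu": (371800, 3121),
    "movies": (20749, 4348),
    "physics": (180166, 893),
}
ppt = {domain: posts_per_tag(q, t) for domain, (q, t) in table1.items()}
```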

3.2. Tag-Space Analysis

We analyzed each domain’s tag space along four aspects: (1) General Tag Statistics, (2) Tag Distributions, (3) Tag-Post Coverage, and (4) Tag-Post Overlap.

General Tag Statistics: The shortest tag in every domain is merely 1-3 characters long (e.g., c, air, 3g), while the longest tag is 22-35 characters long (e.g., valerian-city-of-a-thousand-planets, neurodegenerative-disorders). The askubuntu domain has the lowest average tag length (8.17), while movies has the highest (13.66). We believe this is because tags in askubuntu are short technical terms for a subtopic, whereas movie names tend to be quite long and are often used as part of a tag in the movies domain. Table 2 shows the distribution of tags by their number of words. With the exception of movies, rpg, and scifi, the majority of tags in all the domains consist of three or fewer words. The shortest and longest tags for each domain are presented in Appendix Table 9.

Table 2. Tag % based on the Number of Words in the Tag

Domain	1	2	3	4	5	>5
askubuntu	80.83	18.73	0.37	0.07	0	0
aviation	49.74	43.86	6.34	0.05	0	0
biology	69.30	29.95	0.75	0	0	0
chemistry	47.17	50.36	2.31	0.16	0	0
cooking	78.53	21.11	0.36	0.01	0	0
electronics	74.23	23.9	1.33	0.54	0	0
history	56.86	36.1	7.01	0.03	0	0
money	50.00	45.51	4.05	0.45	0	0
movies	32.81	41.58	16.32	5.61	2.57	1.1
music	77.74	21.07	1.17	0.02	0	0
philosophy	69.42	14.02	16.29	0.27	0.01	0
physics	41.37	49.31	9.02	0.3	0	0
politics	51.26	45.05	3.59	0.08	0.02	0
rpg	42.43	51.39	4.82	1.11	0.16	0.09
scifi	31.04	49.23	13.19	3.23	2.1	1.2
serverfault	67.91	23.09	7.62	1.32	0.06	0
travel	65.78	26.5	6.87	0.85	0	0

Tag Distributions: The distribution of tags has a long tail in every domain (Figure 2). We observe that (1) most larger domains with high tag re-use, such as askubuntu, electronics, and biology, have smoother tag distributions, and (2) in some smaller domains, such as scifi, movies, and rpg, the most frequent tag dominates the distribution. The rest of the distributions are shown in Appendix Figure 12. Also, Table 3 shows that the 100 most frequent tags (100Tag%) constitute a very small portion of the tag space for large domains.

Table 3. Top-n Tag’s Post Coverage. #T:#distinct tags, 100Tag%:Frequent 100 tag % among whole tag-space.

Domain	#T	100Tag%	Top1	Top10	Top100
askubuntu	3121	3.20	5.67	40.21	82.68
aviation	1002	9.98	11.05	45.93	89.43
biology	739	13.53	9.22	55.05	91.76
chemistry	375	26.67	23.05	61.38	95.35
cooking	833	12.00	9.55	38.99	85.19
electronics	2226	4.49	4.94	32.81	81.98
history	813	12.30	10.86	45.91	89.95
money	995	10.05	37.04	68.52	94.18
movies	4348	2.30	36.93	66.84	85.88
music	512	19.53	14.93	58.04	94.54
philosophy	559	17.89	19.39	63.30	93.77
physics	893	11.20	12.70	55.10	91.68
politics	739	13.53	46.00	66.41	94.95
rpg	1195	8.37	42.50	79.75	92.66
scifi	3433	2.91	27.86	70.67	85.04
serverfault	3814	2.62	11.92	42.76	82.86
travel	1891	5.29	22.20	58.34	92.36

Post Coverage by Tags: We consider a tag to cover a post if it is present in the tag sequence of the post. Table 3 shows the percentage of total posts that are covered by the top $n$ most frequent tags in each domain. We observe that the most frequent tag (Top1) covers less than 10% of posts in the electronics, askubuntu, cooking, and biology domains but more than 40% in the politics and rpg domains. More than 81% of all posts in each domain are covered by the 100 most frequent tags.
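
The coverage statistic can be illustrated with a short sketch. It assumes that a post counts as covered when at least one of the top $n$ tags is in its tag sequence; the function name and the toy posts are hypothetical.

```python
from collections import Counter

def top_n_coverage(post_tags, n):
    """Percentage of posts containing at least one of the n most frequent tags."""
    counts = Counter(tag for tags in post_tags for tag in set(tags))
    top = {tag for tag, _ in counts.most_common(n)}
    covered = sum(1 for tags in post_tags if top.intersection(tags))
    return 100.0 * covered / len(post_tags)

# Toy tag sequences (illustrative only, not taken from the dataset).
toy_posts = [
    ["dnd-5e", "spells"],
    ["dnd-5e"],
    ["pathfinder-1e"],
    ["dnd-5e", "magic-items"],
]
top1 = top_n_coverage(toy_posts, 1)
top4 = top_n_coverage(toy_posts, 4)
```

Here the single most frequent tag already covers three of the four toy posts, mirroring the Top1 spike seen in domains such as rpg.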

Tag-Post Overlap: Figure 3 shows whether the tags appear in user contents (question title / question body / answers) using two metrics: (1) single-word tag exact match (EMS) and (2) both single-word and multi-word tag exact match (EMM). We observe that in 8/17 domains, tags appear in more than 50% of post titles. The movies domain has more multi-worded tags than single-worded tags (9.49% compared to 34.51%). Two science domains, biology and chemistry, have the lowest tag overlap ( $<$ 30%) with the question title (T-EMS). When we include the question body, we observe that in 9/17 domains, question tags appear in more than 70% of posts. Finally, if we include every answer for each question, all the domains (except chemistry and biology) have their tags appear in more than 70% of the posts. The three larger domains (askubuntu, serverfault, and electronics) have more than 90% overlap. The overlap is lowest (56%) for the chemistry and biology domains.
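
One plausible way to compute such an overlap is sketched below. The matching rules are assumptions made for illustration: text is lowercased, hyphens inside a tag may match spaces or hyphens in the text, and a post counts as overlapping when at least one of its tags matches. The exact matching procedure behind Figure 3 may differ.

```python
import re

def tag_in_text(tag, text):
    """Exact word-level match of a tag in text (hyphens may appear as spaces)."""
    words = tag.lower().split("-")
    pattern = r"(?<!\w)" + r"[\s\-]+".join(map(re.escape, words)) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None

def overlap(posts, multiword=True):
    """% of posts with at least one matching tag; multiword=False mimics EMS."""
    hits = 0
    for text, tags in posts:
        candidates = [t for t in tags if multiword or "-" not in t]
        hits += any(tag_in_text(t, text) for t in candidates)
    return 100.0 * hits / len(posts)

# Toy (title, tags) pairs, illustrative only.
toy = [
    ("How do I bake bread without an oven?", ["baking", "bread"]),
    ("Plot explanation for the ending", ["plot-explanation", "inception"]),
    ("Why is the sky blue?", ["optics"]),
]
ems = overlap(toy, multiword=False)
emm = overlap(toy, multiword=True)
```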

Table 4. Tag Pairs Post Coverage : % posts covered by top-k tag pairs. Single: % of posts with single tag

Domain	Top-1	Top-3	Top-5	Top-10	Top-50	Top-100	Single
askubuntu	1.57	2.89	5.33	9.43	17.97	23.45	17.70
aviation	2.05	3.49	4.78	6.99	17.00	23.81	19.27
biology	2.85	4.90	7.41	11.39	25.85	33.34	20.67
chemistry	4.33	7.62	9.99	14.56	29.82	36.95	23.89
cooking	1.60	3.45	4.34	5.89	13.51	18.54	25.81
electronics	0.76	2.16	3.20	5.08	13.03	18.62	18.31
history	2.37	4.86	6.09	9.93	20.97	27.58	15.34
money	10.39	17.13	18.52	24.16	39.92	46.49	10.51
movies	2.50	6.28	7.81	10.93	20.29	25.30	21.98
music	2.48	5.32	7.56	13.52	31.20	38.17	20.49
philosophy	1.74	4.97	7.32	11.08	26.54	33.79	27.85
physics	2.32	5.29	7.10	11.07	28.54	37.46	11.39
politics	4.59	11.98	17.60	27.24	43.27	49.24	10.40
rpg	12.48	17.23	22.27	28.13	43.54	52.73	9.96
scifi	5.58	12.09	17.92	26.12	43.57	49.29	25.86
serverfault	1.09	2.84	4.16	6.23	16.07	22.29	13.03
travel	5.17	12.01	14.64	18.04	31.01	38.43	6.45

3.3. Tag Co-Occurrence Analysis

For a post $p_{k}$ , we define tag co-occurrence $C_{ij}=\{\{t_{i},t_{j}\}:t_{i},t_{j}\in S(p_{k}),t_{i}\neq t_{j}\}$ as a pair of tags $\{t_{i},t_{j}\}$ appearing in a post together irrespective of their positions.

Soft Tag Hierarchy: From the tag co-occurrence analysis in the 17 domains, we find that there exists a soft hierarchy among the tag pairs. One of the tags indicates the main topic or area of the question, and the other tag is often fine-grained, which makes the question more specific. In the following examples, the second tag is a sub-category of the first: (baking, bread) in cooking, (dnd-5e, spells) in rpg, and (aircraft-design, wing) in aviation. In the science domains, similar examples of topic-subtopic relationships are (organic-chemistry, carbonyl-compounds) in chemistry and (hilbert-space, quantum-mechanics) in physics. The most frequently occurring tag pair for each domain is shown in Table 5; a more comprehensive set of the top-5 most frequent pairs per domain is shown in Appendix Table 11.

Table 5. Most Frequently Co-Occurring Tag-Pairs

Domain	Top Pair	Post-Count
askubuntu	(’boot’, ’grub2’)	5845
aviation	(’aerodynamics’, ’aircraft-design’)	417
biology	(’entomology’, ’species-identification’)	731
chemistry	(’organic-chemistry’, ’reaction-mechanism’)	1621
cooking	(’baking’, ’bread’)	393
electronics	(’current’, ’voltage’)	1161
history	(’nazi-germany’, ’world-war-two’)	298
money	(’taxes’, ’united-states’)	3393
movies	(’character’, ’plot-explanation’)	518
music	(’chords’, ’theory’)	519
philosophy	(’logic’, ’philosophy-of-mathematics’)	272
physics	(’homework-and-exercises’, ’newtonian-mechanics’)	4182
politics	(’donald-trump’, ’united-states’)	570
rpg	(’dnd-5e’, ’spells’)	5330
scifi	(’short-stories’, ’story-identification’)	3514
serverfault	(’linux’, ’ubuntu’)	3261
travel	(’uk’, ’visas’)	2181

Tag Pair Post Coverage: We consider a tag pair ({ $t_{i}$ , $t_{j}$ }) to cover a post if both tags occur in the sequence of tags for that post, in any positions. Table 4 shows the tag pair post coverage across the domains. Between roughly 6% and 28% of posts have only a single tag, with most domains falling in the 10-20% range. The 100 most frequent pairs cover 18-53% of posts. Also, the most frequent tag pair covers more than 10% of posts in the money and rpg domains, which shows that this tag pair is essential for these two domains.
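
The co-occurrence counts and the pair coverage can be sketched together. The sketch assumes unordered pairs (each pair is stored in sorted order) and counts a post as covered when it contains at least one of the top-$k$ pairs; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(post_tags):
    """Count unordered tag pairs that co-occur in a post."""
    counts = Counter()
    for tags in post_tags:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts

def pair_coverage(post_tags, k):
    """Percentage of posts containing at least one of the k most frequent pairs."""
    top = {pair for pair, _ in pair_counts(post_tags).most_common(k)}
    covered = sum(
        1 for tags in post_tags
        if any(pair in top for pair in combinations(sorted(set(tags)), 2))
    )
    return 100.0 * covered / len(post_tags)

# Toy tag sequences (illustrative only).
toy_posts = [["baking", "bread"], ["bread", "baking", "oven"], ["oven"]]
counts = pair_counts(toy_posts)
coverage_top1 = pair_coverage(toy_posts, 1)
```

Single-tag posts, such as the third toy post, can never be covered by a pair, which is why Table 4 reports them separately.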

Tag Pair Distribution: On analyzing the distribution of the top-50 most frequently occurring tag pairs in each domain, we observe three patterns: (1) a smooth distribution, (2) a spike at the top-1 pair, and (3) spikes in the top few pairs. Larger domains (askubuntu, serverfault, electronics) have smooth distributions. In smaller domains (movies, scifi, travel), a few tag pairs dominate the distributions, indicating their popularity. More details are available in Appendix Section C and Figures 10 and 11.

3.4. Tag Pair Ordering

We analyze the top-10 most frequent tag pairs in each domain to identify users’ ordering preferences for tags. For a post $p_{k}$ , $O_{ij}=(t_{(m)},t_{(n)})$ (and $O_{ji}$ ) denotes the ordering of the tag pair $t_{i}$ and $t_{j}$ , where $m$ and $n$ are the positions of $t_{i}$ and $t_{j}$ , respectively, in the tag sequence $S(p_{k})$ . By analyzing the occurrences of $O_{ij}$ and $O_{ji}$ , we find that in each domain community users tend to assign the more generic tags before the specific ones. For example, aircraft-design appears before wings in all 221 posts where they appear together in aviation, united-states appears before income-tax in 99.95% of the 3393 posts where they appear together in the money domain, and dnd-5e appears before magic-items in all 1367 posts where they appear together in rpg. More examples are in Appendix F.
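
The ordering preference for a given pair can be measured with a few lines of code. This is a sketch under the assumption that each post's tags are available as an ordered list; the function name and the toy posts are hypothetical.

```python
def order_preference(post_tags, first, second):
    """Return (% of co-occurrences where `first` precedes `second`, co-occurrence count)."""
    before = total = 0
    for tags in post_tags:
        if first in tags and second in tags:
            total += 1
            before += tags.index(first) < tags.index(second)
    share = 100.0 * before / total if total else 0.0
    return share, total

# Toy ordered tag sequences (illustrative only).
toy_posts = [
    ["dnd-5e", "magic-items"],
    ["dnd-5e", "spells", "magic-items"],
    ["magic-items", "dnd-5e"],
    ["spells"],
]
share, total = order_preference(toy_posts, "dnd-5e", "magic-items")
```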

3.5. Tag Position Stability

Table 6. Sets of five randomly picked stable tags for positions 1,2 and five for positions 3, 4, 5, respectively, across 17 domains.

Domain	Position 1, 2	Position 3, 4, 5
askubuntu	[’software-installation’,’server’,’community’,’locoteams’,’10.04’]	[’multiple-workstations’,’equalizer’,’speakers’,’workflow’,’flicker’]
aviation	[’air-traffic-control’,’radio-communications’,’airspace’,’flight-planning’,’faa-regulations’]	[’rotary-wing’,’rvsm’,’sfo’,’dash-8’,’special-vfr’]
biology	[’biochemistry’,’immunology’,’cell-biology’,’dna’,’molecular-biology’]	[’ribosome’,’binding-sites’,’exons’,’dendritic-spines’,’rna-interference’]
chemistry	[’crystal-structure’,’equilibrium’,’organic-chemistry’,’thermodynamics’,’inorganic-chemistry’]	[’nitro-compounds’,’bent-bond’,’phenols’,’organosulfur-compounds’,’reaction-coordinate’]
cooking	[’baking’,’oven’,’eggs’,’substitutions’,’sauce’]	[’oregano’,’condensed-milk’,’chopping’,’blind-baking’,’scottish-cuisine’]
electronics	[’arduino’,’motor’,’soldering’,’ethernet’,’avr’]	[’basic-stamp’,’debugwire’,’sinking’,’nxp’,’fuse-bits’]
history	[’20th-century’,’world-war-one’,’language’,’china’,’political-history’]	[’proof’,’dday’,’crusaders’,’templars’,’republic-of-ireland’]
money	[’investing’,’united-states’,’canada’,’taxes’,’credit-card’]	[’pension-plan’,’contractor’,’contribution’,’limits’,’debt-reduction’]
movies	[’wedding-crashers’,’analysis’,’star-wars’,’comedy’,’the-pink-panther’]	[’manichitrathazhu’,’chandramukhi’,’bhool-bhulaiyaa’,’clint-eastwood’,’for-a-few-dollars-more’]
music	[’learning’,’voice’,’theory’,’tuning’,’scales’]	[’stick-control’,’archeterie’,’instrumentation’,’rsi’,’rock-n-roll’]
philosophy	[’epistemology’,’philosophy-of-mathematics’,’ethics’,’existentialism’,’logic’]	[’dreams’,’plantinga’,’rationalism’,’rule-ethics’,’arithmetic’]
physics	[’quantum-mechanics’,’particle-physics’,’string-theory’,’acoustics’,’experimental-physics’]	[’action’,’faq’,’stability’,’wavefunction-collapse’,’coriolis-effect’]
politics	[’election’,’political-theory’,’democracy’,’united-kingdom’,’israel’]	[’first-past-the-post’,’checks-and-balances’,’redistricting’,’faithless-elector’,’puerto-rico’]
rpg	[’pathfinder-1e’,’dnd-3.5e’,’game-recommendation’,’dungeons-and-dragons’,’dogs-in-the-vineyard’]	[’feywild’,’group-scaling’,’round-robin-gming’,’romance’,’charmed’]
scifi	[’novel’,’vorkosigan-saga’,’total-recall-2070’,’star-trek’,’the-road’]	[’star-trek-data’,’3001-the-final-odyssey’,’rama-revealed’,’star-trek-emh’,’skylark-series’]
serverfault	[’sql-server’,’backup’,’sql-server-2008’,’raid’,’windows’]	[’tempdb’,’fakeraid’,’tuning’,’su’,’debian-etch’]
travel	[’loyalty-programs’,’transportation’,’public-transport’,’sightseeing’,’safety’]	[’amazon-river’,’amazon-jungle’,’singapore-airlines’,’sin’,’trans-siberian’]

We study the positional stability of tags i.e., whether some tags frequently appear in any particular position among the five allowed by StackExchange. We consider $\phi_{x}(t)$ as the percentage of occurrence of a tag ( $t$ ) in any position $x$ , given by,

(1)		$\displaystyle \phi_{x}(t)=\frac{c(t_{(x)})}{\sum_{k=1}^{5}{c(t_{(k)})}}\%$

where $c(t_{(x)})$ denotes the count of tag $t$ in position $x$ . We consider three stability thresholds ( $\delta$ ): 80%, 90%, and 99% (Figures 4 and 5). For a tag $t$ and position $x$ , $\phi_{x}(t)\geq\delta$ indicates that the tag is stable at that position.

(2)		$\displaystyle Q_{X}=\{t\in T:\sum_{x\in X}\phi_{x}(t)\geq\delta\}$
(3)		$\displaystyle ST_{X}=\frac{|Q_{X}|}{|T|}\%$

where $Q_{X}$ is the set of tags whose combined share of occurrences over the positions in $X$ is at least $\delta$ , and $ST_{X}$ is the percentage of tags in a domain that are stable at positions $X$ . In Figure 4 (rpg domain), for $\delta=99\%$ , we find $ST_{1,2}=13.81$ , i.e., 13.81% of all tags in rpg are stable in positions 1 and 2 combined, and $ST_{3,4,5}=15.06$ , i.e., 15.06% are stable in positions 3, 4, and 5 combined. The rest of the tags are unstable. Also, the stable tags ( $Q_{3,4,5}$ ) appearing in positions 3, 4, and 5 are finer-grained (or refined) tags that support the stable tags present in positions 1 and 2 ( $Q_{1,2}$ ).

The travel domain has the highest number of stable tags appearing in positions 3, 4, and 5 ( $Q_{3,4,5}$ ) at all three thresholds ( $\delta=80\%,90\%,99\%$ ), showing that more than one refined tag is needed to make a question specific in this domain. We find no conclusive evidence of stability at individual positions, neither within positions 1 and 2 (i.e., $Q_{1}$ and $Q_{2}$ ) nor within positions 3, 4, and 5 (i.e., $Q_{3}$ , $Q_{4}$ , and $Q_{5}$ ).

Table 6 shows five randomly selected examples of position-stable tags in 17 domains. These positions account for more than 99% of the occurrences of these tags in their respective domains.
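
Equations (1)-(3) translate directly into code. The following sketch uses 1-based positions as in the text and treats a tag as stable when its combined share reaches the threshold, following Equation (2); the function names and the toy posts are hypothetical.

```python
from collections import defaultdict

def position_shares(post_tags):
    """phi[t][x-1]: percentage of the occurrences of tag t found at position x (Eq. 1)."""
    counts = defaultdict(lambda: [0] * 5)
    for tags in post_tags:
        for x, tag in enumerate(tags[:5]):
            counts[tag][x] += 1
    return {t: [100.0 * c / sum(cs) for c in cs] for t, cs in counts.items()}

def stable_tags(phi, positions, delta):
    """Q_X: tags whose combined share over the positions in X is at least delta (Eq. 2)."""
    return {t for t, shares in phi.items()
            if sum(shares[x - 1] for x in positions) >= delta}

def stability(phi, positions, delta):
    """ST_X: percentage of all tags that are stable at positions X (Eq. 3)."""
    return 100.0 * len(stable_tags(phi, positions, delta)) / len(phi)

# Toy ordered tag sequences (illustrative only).
toy_posts = [
    ["dnd-5e", "spells", "feywild"],
    ["dnd-5e", "magic-items", "spells", "feywild"],
    ["spells", "dnd-5e"],
]
phi = position_shares(toy_posts)
q_12 = stable_tags(phi, (1, 2), 99.0)
q_345 = stable_tags(phi, (3, 4, 5), 99.0)
```

In the toy data, spells is spread over positions 1 to 3 and is therefore unstable for both position sets.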

4. Modeling Tag Prediction

Based on the observations from our tagging behavior analysis (Section 3), we develop an automated generic tag prediction approach for CQA platforms that predicts both generic and refined tags. The inherent commonalities in community diversity influence our decision to develop a common tag generation framework. The long tail in tag-space analysis guided us to develop a predictive-generative hybrid model. Tag co-occurrence analysis, tag-pair ordering, and tag-positional analysis on these domains led us to generate $n$ tags from a common vocabulary of popular tags at certain positions and $m$ related granular tags at the remaining positions.

4.1. Majority Baseline

The five most frequent tags per domain in the training data are used, in order, as the Top1-Top5 predictions for every post in the test data (Hit@1 to Hit@5). We introduce this baseline because the top few tags cover a large number of posts in each domain (Table 3).
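
A minimal sketch of this baseline is shown below. It assumes that the training tags are available as one list per post; the function name and the toy data are hypothetical.

```python
from collections import Counter

def majority_predictions(train_post_tags, n=5):
    """Return the n most frequent training tags, most frequent first."""
    counts = Counter(tag for tags in train_post_tags for tag in tags)
    return [tag for tag, _ in counts.most_common(n)]

# Toy training tags (illustrative only).
train = [["taxes", "united-states"], ["taxes"], ["taxes", "united-states", "investing"]]
baseline = majority_predictions(train)

# The same ranked list is used as the prediction for every test post.
test_predictions = [baseline for _ in range(2)]
```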

4.2. Feature-Based Models

We use linear multi-label classifiers with the one-vs-all strategy, using tf-idf and bag-of-words features, as two baselines, since most feature-based tag prediction models use one of these as features. We hypothesize that these models can leverage the high amount of tag-post overlap (Figure 3). Here we train the models for each domain with classes corresponding to all the unique tags.

4.3. MetaTag Predictor Model (MP)

In this model (Figure 6), we first select a vocabulary (MetaTag) of tags based on a frequency analysis of the tag’s post coverage per domain. Here, we consider popular tags as meta tags. We formulate this multi-label classification task as a language model mask-filling task using pre-trained roberta-base (Liu et al., 2019) as the base of this model. We train separately for each domain.
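
The selection procedure is not spelled out in detail here. One plausible reading, sketched below as an assumption, is to add tags in decreasing order of frequency until the selected tags cover a target percentage of posts (for example, the 90% tag-post coverage used in Table 7). The function name and the toy data are hypothetical.

```python
from collections import Counter

def build_metatag_vocab(post_tags, target_coverage=90.0):
    """Greedily add the most frequent tags until target_coverage % of posts are covered."""
    counts = Counter(tag for tags in post_tags for tag in set(tags))
    vocab, covered = [], set()
    for tag, _ in counts.most_common():
        if 100.0 * len(covered) / len(post_tags) >= target_coverage:
            break
        vocab.append(tag)
        covered.update(i for i, tags in enumerate(post_tags) if tag in tags)
    return vocab

# Toy tag sequences (illustrative only).
toy_posts = [["a"], ["a"], ["a", "b"], ["b"], ["c"]]
vocab_80 = build_metatag_vocab(toy_posts, target_coverage=80.0)
vocab_all = build_metatag_vocab(toy_posts, target_coverage=100.0)
```

With an 80% target, the rare tag c stays outside the vocabulary, which is the kind of tag the generation head of the MRPG model is meant to handle.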

Training: We tokenize the question title ( $Q_{T}$ ) and body ( $Q_{B}$ ) and replace each of the post’s tags that belongs to the MetaTag vocabulary with a mask token, $\langle$ mask $\rangle$ . These are concatenated and provided as input to the model.

$Q_{T}$ + $Q_{B}$ + $\langle$ mask $\rangle$ … $\langle$ mask $\rangle$

The model is trained to predict the masked tags by minimizing the prediction loss ( $\mathcal{L}_{P}$ ) over all masked tokens, so the total loss is $\mathcal{L}=\mathcal{L}_{P}$ . The number of mask tokens varies with the post, as indicated by the ellipsis above.

Inference: We tokenize $Q_{T}$ and $Q_{B}$ and append five $\langle$ mask $\rangle$ tokens at the end, forcing the model to predict exactly five tags for the post (the most probable tag for each position). We use five because StackExchange allows a maximum of five tags to be associated with a question. This ensures that the model predicts the tags from the MetaTag vocabulary.

$Q_{T}$ + $Q_{B}$ + $\langle$ mask $\rangle$ $\langle$ mask $\rangle$ $\langle$ mask $\rangle$ $\langle$ mask $\rangle$ $\langle$ mask $\rangle$
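
The two input layouts can be illustrated at the string level. This is only a sketch: it assumes that every MetaTag entry corresponds to exactly one mask slot, it joins the pieces with spaces instead of calling a real tokenizer, and the helper names are hypothetical.

```python
MASK = "<mask>"

def mp_training_example(title, body, tags, metatag_vocab):
    """Build the training input and its targets: one mask per in-vocabulary tag."""
    targets = [t for t in tags if t in metatag_vocab]
    text = " ".join([title, body] + [MASK] * len(targets))
    return text, targets

def mp_inference_input(title, body, num_tags=5):
    """Build the inference input with a fixed number of mask slots."""
    return " ".join([title, body] + [MASK] * num_tags)

# Toy example (illustrative only).
text, targets = mp_training_example(
    "Sourdough rise", "My loaf is flat.",
    ["baking", "sourdough-starter", "bread"],
    {"baking", "bread"},
)
query = mp_inference_input("Sourdough rise", "My loaf is flat.")
```

The tag sourdough-starter is outside the toy vocabulary, so it contributes no mask slot and no target in this setting.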

4.4. Meta Refined Tag Predictor Generator Model (MRPG)

This model (Figure 6) is similar to the MP model, with the additional ability to generate tags not present in the MetaTag vocabulary (OOV). In a more general sense, here the motivation is to develop a model capable of predicting tags from a predefined set and generating novel tags as well.

Training: Similar to the MP model, we tokenize $Q_{T}$ and $Q_{B}$ and replace the tags present in the MetaTag vocabulary with the $\langle$ mask $\rangle$ token. The rest of the tags (out-of-vocabulary, or OOV) are tokenized, and each token is replaced with a separate mask token, $\langle$ maskref $\rangle$ . A $\langle$ tagsep $\rangle$ token is added to mark the boundaries (start and end) of these OOV tag tokens. The model is trained on the joint loss ( $\mathcal{L}$ ) of the meta tag prediction head loss ( $\mathcal{L}_{P}$ ) and the refined tag generation head loss ( $\mathcal{L}_{G}$ ), given by $\mathcal{L}=\mathcal{L}_{P}+\mathcal{L}_{G}$ .

Inference: Our goal is to encourage the model to generate a combination of meta and refined tags. Based on our tag-stability analysis (Section 3.5), tag pair ordering analysis (Section 3.4), and soft tag-hierarchy findings (Section 3.3), we have the MRPG model predict the first two tags from the MetaTag vocabulary and generate the remaining three tags based on the user texts. We append two $\langle$ mask $\rangle$ tokens and a parameterized number of $\langle$ maskref $\rangle$ tokens to the tokenized $Q_{T}$ and $Q_{B}$ .

$Q_{T}$ + $Q_{B}$ + $\langle$ mask $\rangle$ $\langle$ mask $\rangle$ $\langle$ maskref $\rangle$ … $\langle$ maskref $\rangle$

Tag Generation: For each $\langle$ maskref $\rangle$ token, MRPG generates one token from the tokenizer vocabulary, following a greedy approach that selects the most probable token. We concatenate the generated tokens between two $\langle$ tagsep $\rangle$ tokens to form a tag. We choose the three most probable generated refined tags, based on our earlier data analysis and the StackExchange tag limit. However, when applying this model to any other CQA platform, this number can be increased or decreased through the above-mentioned parameter. Also, nothing in the model prevents it from generating tags with more than three words, but such tags are rare in most of the domains, as can be seen from Table 2. More details are in Appendix Section H.
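
The grouping step can be sketched as follows. The sketch makes several simplifying assumptions: generated tokens are plain string pieces that can be joined directly, tokens that are not closed by a separator are discarded, and the first `max_tags` tags are kept instead of ranking tags by probability. The function name is hypothetical.

```python
TAGSEP = "<tagsep>"

def split_generated_tags(tokens, max_tags=3):
    """Join the tokens found between consecutive <tagsep> markers into tags."""
    tags, current = [], []
    for token in tokens:
        if token == TAGSEP:
            if current:
                tags.append("".join(current))
            current = []
        else:
            current.append(token)
    # Tokens after the last separator are dropped because no closing marker follows.
    return tags[:max_tags]

# Toy greedy output (illustrative only).
generated = [TAGSEP, "sour", "dough", TAGSEP, "starter", TAGSEP, "dangling"]
refined = split_generated_tags(generated)
```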

5. Experiments

5.1. Settings

We split our dataset into train, dev, and test sets in the ratio 70:10:20 using a fixed random seed. In our experiments, we build our model on top of the base version (125M parameters) of the pre-trained RoBERTa language model. We remove HTML tags (which are unrelated to StackExchange tags) from the user contents (question title and body) before tag prediction. We ran all experiments on 4 NVIDIA RTX A6000 GPUs (48GB GPU memory) with a batch size of 60 and an input length of 256. We use the AdamW optimizer (Loshchilov and Hutter, 2017), a linear warmup scheduler, and a learning rate of 5e-5.
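
A seeded 70:10:20 split can be sketched as below. The seed value and the function name are placeholders, since the seed that was actually used is not stated here.

```python
import random

def split_dataset(items, seed=42, percentages=(70, 10, 20)):
    """Shuffle with a fixed seed and cut into train/dev/test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seed=42 is a placeholder value
    n_train = len(items) * percentages[0] // 100
    n_dev = len(items) * percentages[1] // 100
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

train, dev, test = split_dataset(range(100))
```

Integer arithmetic is used for the cut points so that the part sizes do not depend on floating-point rounding.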

5.2. Metrics

We define Hit@k (where $k=1,\dots,5$ ) as the percentage of posts for which at least one of the top $k$ predicted tags matches one of the actual tags. We generate at most 5 tag predictions, in line with StackExchange’s upper limit on tags. This metric aligns with our motivation of maximizing the probability that a user will be able to find at least one suitable tag among a fixed number of recommended tags. Hence, we do not consider other metrics such as precision and recall.
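
The metric can be computed with a few lines of code. The sketch assumes ranked prediction lists aligned with the lists of actual tags; the function name and the toy data are hypothetical.

```python
def hit_at_k(predictions, gold, k):
    """Percentage of posts where any of the top-k predicted tags is an actual tag."""
    hits = sum(
        1 for predicted, actual in zip(predictions, gold)
        if set(predicted[:k]) & set(actual)
    )
    return 100.0 * hits / len(gold)

# Toy ranked predictions and actual tags (illustrative only).
predictions = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
gold = [["b"], ["z", "q"], ["k"]]
hits = [hit_at_k(predictions, gold, k) for k in (1, 2, 3)]
```

Because the top-$k$ list grows with $k$, Hit@k can never decrease as $k$ increases.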

5.3. Performance Analysis

5.3.1. Baseline vs MP vs MRPG

In Table 7, we compare our models with the baselines (mean of five different runs). The feature-based models (bag-of-words and tf-idf) achieve good performance in those domains where we found a high overlap between user texts and tags. Our MP model improves over the majority baseline and the feature-based models by a substantial margin (p-values $<$ 0.05 on the Wilcoxon test) in Hit@5 performance. The MRPG model outperforms the other methods in almost all the domains (significant improvements in 12 out of 17 domains). This is because it is able to generate tags outside the MetaTag vocabulary. In the biology domain, the MP model performs better than MRPG. This might be because of the high tag re-use in this domain. All the model performance numbers (Hit@k for $k=1\dots 5$ ) are presented in Appendix Table 20. In that table, we observe that for Hit@1 the MRPG model is always better than the MP model.

Table 7. Performance Hit@5, 90% Tag-Post Coverage. Significant improvements (p-values $<$ 0.05) of MRPG over MP are in bold. P-values are in Appendix J.

Domain	Majority	TF-IDF	Bag-of-Words	MP	MRPG
askubuntu	24.84	59.76 $\pm$ 0.06	71.25 $\pm$ 0.56	80.44 $\pm$ 0.11	82.94 $\pm$ 0.15
aviation	35.05	55.12 $\pm$ 0.29	65.58 $\pm$ 0.64	77.09 $\pm$ 0.44	77.63 $\pm$ 0.56
biology	37.94	54.91 $\pm$ 0.16	64.79 $\pm$ 0.50	78.96 $\pm$ 0.34	77.55 $\pm$ 0.41
chemistry	48.89	58.76 $\pm$ 0.17	68.09 $\pm$ 0.46	77.66 $\pm$ 0.10	79.17 $\pm$ 0.45
cooking	29.04	70.28 $\pm$ 0.19	71.69 $\pm$ 0.34	80.86 $\pm$ 0.42	85.18 $\pm$ 0.29
electronics	20.68	57.80 $\pm$ 0.11	70.12 $\pm$ 0.13	77.51 $\pm$ 0.26	81.30 $\pm$ 0.53
history	34.67	58.93 $\pm$ 0.32	59.29 $\pm$ 0.36	80.45 $\pm$ 0.09	81.23 $\pm$ 1.00
money	55.96	75.54 $\pm$ 0.19	79.70 $\pm$ 0.30	84.15 $\pm$ 0.23	87.94 $\pm$ 0.42
movies	54.99	60.80 $\pm$ 0.14	64.57 $\pm$ 0.24	82.91 $\pm$ 0.55	83.25 $\pm$ 0.99
music	47.91	68.15 $\pm$ 0.15	74.26 $\pm$ 0.42	82.66 $\pm$ 0.26	83.71 $\pm$ 0.51
philosophy	48.93	62.71 $\pm$ 0.10	64.06 $\pm$ 0.34	79.45 $\pm$ 0.20	79.49 $\pm$ 0.56
physics	39.98	66.81 $\pm$ 0.16	79.59 $\pm$ 0.17	81.12 $\pm$ 0.22	86.34 $\pm$ 0.37
politics	64.16	81.50 $\pm$ 0.21	83.37 $\pm$ 0.73	86.29 $\pm$ 0.25	90.98 $\pm$ 0.46
rpg	76.66	75.79 $\pm$ 0.23	82.71 $\pm$ 0.24	83.31 $\pm$ 0.33	89.09 $\pm$ 0.16
scifi	62.24	80.48 $\pm$ 0.10	85.88 $\pm$ 0.21	85.91 $\pm$ 0.11	91.53 $\pm$ 0.32
serverfault	29.84	62.83 $\pm$ 0.06	73.07 $\pm$ 0.20	81.66 $\pm$ 0.16	85.82 $\pm$ 0.26
travel	48.31	76.82 $\pm$ 0.48	83.73 $\pm$ 0.27	83.96 $\pm$ 0.12	89.50 $\pm$ 0.30

5.3.2. Effects of Vocabulary Size Reduction

We build the MetaTag vocab with 85% post-coverage by tags ( $\downarrow$ 5%) and show the impact in Figure 7. We observe that the performance gap between MP and MRPG at 90% (Table 7) reduces as vocab size decreases by 5% (Figure 7) across all domains.

This is because the MP model suffers the most (2-5%) from this reduction. This is expected since MP’s performance (by P-head) depends on how big the MetaTag vocabulary is. The MRPG model, however, is robust to this vocabulary reduction, i.e., the performance (Hit@5) only changes in the range 0-1.13%, with the exception of the askubuntu domain (2.26%). Details are in Appendix Table 18. Also, with the reduced vocabulary, the maximum performance difference is 9.12% (travel) since it has more refined tags (Section 3.5). The minimum difference is 1.06% (biology). Here the MRPG model could not take much advantage over MP because of high tag reusability and fewer refined tags.
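To make the link between tag-post coverage and vocabulary size concrete, the sketch below builds a vocabulary for a given coverage target. It is only an illustration under a stated assumption: tags are added in decreasing order of frequency until the target fraction of posts contains at least one vocabulary tag. The function name `build_vocab` and the toy posts are hypothetical, and the procedure is a simplification of the MetaTag vocabulary construction.

```python
from collections import Counter


def build_vocab(post_tags, coverage=0.90):
    """Add frequent tags until `coverage` of posts has a vocab tag.

    Assumption: a post counts as covered once any of its tags is in
    the vocabulary; ties in frequency are broken alphabetically.
    """
    freq = Counter(tag for tags in post_tags for tag in set(tags))
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    uncovered = [set(tags) for tags in post_tags]
    vocab = set()
    covered = 0
    for tag, _ in ranked:
        if covered / len(post_tags) >= coverage:
            break
        vocab.add(tag)
        still_uncovered = []
        for tags in uncovered:
            if tag in tags:
                covered += 1
            else:
                still_uncovered.append(tags)
        uncovered = still_uncovered
    return vocab


# Ten toy posts with their tags.
post_tags = [
    ["boot", "grub2"], ["boot", "dual-boot"], ["bash"], ["boot"],
    ["apt"], ["bash", "scripts"], ["boot", "partitioning"],
    ["nvidia"], ["boot"], ["bash"],
]
print(sorted(build_vocab(post_tags, coverage=0.80)))
print(sorted(build_vocab(post_tags, coverage=0.90)))
```

On this toy data, lowering the coverage target drops the least frequent vocabulary tag, which is the kind of tag a purely vocabulary-based predictor can then no longer output.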

5.3.3. Head Contribution of MRPG

Figure 8 shows the contribution of the P-Head and the G-Head to the prediction performance (Hit@5 with the 90% coverage vocabulary). We compute the percentage of posts for which (1) only the P-Head correctly predicted at least one tag and (2) only the G-Head correctly predicted at least one tag. The P-Head’s contributions were highest (45-74%) since the MetaTag vocabulary is created using popular tags in each domain. The G-Head was able to predict at least one tag correctly for an extra 4-13% of the posts. The effect of decreasing and increasing the MetaTag vocabulary size through a 5% change in tag-post coverage is shown in Appendix Table 19. We observe that the G-Head’s contribution increases by up to 4% (on vocabulary size decrease) and decreases by up to 5% (on vocabulary size increase). We also find that the two heads combined were able to suggest some non-overlapping tags in up to 33% of the posts.
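The per-head breakdown can be computed along the following lines. This is a minimal sketch that assumes the predictions of each head are available as separate tag lists per post; the function name `head_contributions` and the toy data are illustrative.

```python
def head_contributions(p_preds, g_preds, gold):
    """Percentage of posts hit only by P-Head, only by G-Head, or by both."""
    only_p = only_g = both = 0
    for p, g, tags in zip(p_preds, g_preds, gold):
        gold_set = set(tags)
        p_hit = bool(set(p) & gold_set)
        g_hit = bool(set(g) & gold_set)
        if p_hit and g_hit:
            both += 1
        elif p_hit:
            only_p += 1
        elif g_hit:
            only_g += 1
    n = len(gold)
    return {
        "only_p": 100.0 * only_p / n,
        "only_g": 100.0 * only_g / n,
        "both": 100.0 * both / n,
    }


# Toy predictions from each head for four posts.
p_preds = [["piano"], ["visas"], ["evolution"], ["europe"]]
g_preds = [["tuning"], ["uk"], ["immunity"], ["sources"]]
gold = [
    ["piano", "tuning"],
    ["uk", "visa-refusals"],
    ["evolution", "microbiology"],
    ["middle-ages", "crusades"],
]
print(head_contributions(p_preds, g_preds, gold))
```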

5.3.4. Out-of-Vocabulary Tags Generation %

Table 8 shows MRPG’s performance in predicting tags outside the MetaTag vocabulary for 90% Tag-Post Coverage. % Posts shows the percentage of posts where MRPG correctly predicted at least one OOV tag. This percentage is lowest in two domains, movies (13.88%) and scifi (17.01%). % ALL Tags and % OOV Tags show that MRPG was able to correctly predict a considerable number of OOV tags because of the generative head.

Table 8. MRPG’s Out-of-Vocab (OOV) Tag Prediction Match on Test. % Posts: % total posts where MRPG correctly predicted at least one OOV tag. % ALL Tags: % correctly predicted OOV tags out of total #gold tags. % OOV Tags: % correctly predicted OOV tags out of total #OOV gold tags.

Domains	askubuntu	aviation	biology	chemistry	cooking	electronics	history	money	movies	music	philosophy	physics	politics	rpg	scifi	serverfault	travel
% Posts	31.38	23.10	22.09	24.68	29.21	27.32	22.37	35.8	13.88	25.19	19.10	41.93	34.35	36.49	17.01	34.69	43.59
% ALL Tags	12.74	9.49	8.98	11.31	13.65	10.73	8.17	12.74	6.85	10.60	8.55	15.32	13.18	13.92	8.10	13.62	15.66
% OOV Tags	41.92	28.03	27.34	37.84	47.81	31.39	22.78	33.21	22.76	36.87	28.55	36.55	35.37	43.04	38.96	36.38	38.80
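The three quantities reported in Table 8 can be computed as in the following minimal sketch, which follows the definitions in the table caption. The function name `oov_metrics`, the toy vocabulary, and the toy posts are assumptions made for illustration.

```python
def oov_metrics(predictions, gold, vocab):
    """Return (% Posts, % ALL Tags, % OOV Tags) for out-of-vocabulary tags."""
    posts_with_hit = correct_oov = total_gold = total_oov_gold = 0
    for preds, tags in zip(predictions, gold):
        gold_set = set(tags)
        oov_gold = {t for t in gold_set if t not in vocab}
        matched = oov_gold & set(preds)  # correctly predicted OOV tags
        posts_with_hit += bool(matched)
        correct_oov += len(matched)
        total_gold += len(gold_set)
        total_oov_gold += len(oov_gold)
    pct_posts = 100.0 * posts_with_hit / len(gold)
    pct_all = 100.0 * correct_oov / total_gold
    pct_oov = 100.0 * correct_oov / total_oov_gold
    return pct_posts, pct_all, pct_oov


# Toy vocabulary and three posts.
vocab = {"visas", "piano", "evolution"}
predictions = [
    ["uk", "visa-refusals", "nigerian-citizens"],
    ["piano", "tuning"],
    ["evolution", "immunity"],
]
gold = [
    ["uk", "visa-refusals", "nigerian-citizens"],
    ["piano", "tuning"],
    ["evolution", "microbiology"],
]
pct_posts, pct_all, pct_oov = oov_metrics(predictions, gold, vocab)
print(round(pct_posts, 2), round(pct_all, 2), round(pct_oov, 2))
```

Note that % OOV Tags is always at least as large as % ALL Tags because its denominator counts only the gold tags that lie outside the vocabulary.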

5.4. Case Studies

We compare the tag predictions of our methods in Figure 9. MRPG was able to generate two more refined tags than MP in the askubuntu domain and was able to predict four out of five tags in the physics domain. Additional examples from five domains are included below.

Domain: Physics
Title: Does matter become energy at the speed of light?
Gold: special-relativity, speed-of-light, mass-energy, matter
MP: special-relativity, energy, speed-of-light, mass
MRPG: special-relativity, speed-of-light, mass-energy, matter

Domain: Travel
Title: Nigerian citizen (university student) was refused a UK visit visa due to lack of funds and connection to school - how to resolve?
Gold: UK, visa-refusals, nigerian-citizens
MP: visas, customs-and-immigration, visa-refusals, paperwork, standard-visitor-visas
MRPG: uk, visa-refusals, nigerian-citizens

Domain: Music
Title: Piano tuning just under the absolute pitch
Gold: piano, tuning
MP: piano, tuning, maintenance
MRPG: piano, tuning, alternative-tunings, pitch, relative-pitch

Domain: Biology
Title: Why aren’t all infections immune-system resistant?
Gold: evolution, microbiology, immunology, bacteriology
MP: evolution, microbiology, bacteriology, bacteriology, immune-system
MRPG: evolution, bacteriology, immunity, antibiotic-resistance

Domain: History
Title: Where to find a list of participants in The Crusades?
Gold: middle-ages, crusades
MP: middle-ages, middle-ages, europe, historiography
MRPG: middle-ages, sources, crusades

5.5. Adaptability of the MP & MRPG Architectures

Both the MP and MRPG models can be adapted for use in other domains or in different public and private CQA platforms with specific tag-space restrictions. This can help in efficient question routing to area experts for faster response time, especially in private CQA platforms where the motivation of the community authority is to get queries resolved faster. Such adaptations can be done by customizing the MetaTag vocabulary based on prior behavioral analysis. Additionally, the number of meta and refined tags can be controlled (through a parameter) based on the domain and platform requirements, without changes to the architecture. The MRPG model can also be used in platforms where a soft hierarchy of tags is known and routing requires the prediction of both top-level tags and leaf tags. In such a scenario, the MetaTag vocabulary could be populated with only top-level tags, allowing the model to generate lower-level tags (from the tail of the tag distribution) based on user texts. With the combination of both types of tags, a query can be routed to a specific sub-area expert without overwhelming all the experts in a specific topic.

6. Related Work

Community QA platform analysis: There have been several studies on Folksonomy (Vander Wal, 2007), the practice of associating custom tags to questions in a social environment. Some of the prior works are: a large-scale analysis of tags and their correlation with other tags (Fu et al., 2020), tag-distribution and tag-occurrence of 168 SE communities (Fu et al., 2020), quality analysis of SO (Singh et al., 2015). User behavior analysis was done on Quora (Wang et al., 2013), Yahoo Answers (Adamic et al., 2008), Google Answers (Chen et al., 2010) and StackOverflow (Anderson et al., 2013). However, here we perform a large-scale study of tags, tag occurrences, and tag relation for 17 domains to understand how they have some common properties in spite of being quite diverse, an observation similar to a prior work (Fu et al., 2020).

Community QA NLP Tasks: As the use of community QA platforms increased, and with it the volume of community-created data, various NLP approaches were used to address some of the issues of each platform and also to understand the behavior of users. Various insights have been gathered through analysis of such communities. Some of the interesting tasks are similar question identification (Zhang et al., 2017, 2018; Vanam and Pulipati, 2021; Kumar and Chauhan, 2022), similar tag identification (Beyer and Pinzger, 2015; Chen et al., 2019), tag popularity prediction (Fu et al., 2017), popular question prediction (Zhao et al., 2021), tag prediction (Lipczak, 2008; Lipczak and Milios, 2010; Wang et al., 2015; Wu et al., 2016; Sonam et al., 2019; Tang et al., 2019; Wankerl et al., 2020; Venktesh et al., 2021), detecting anomalous tag combinations (Banerjee et al., 2019), CQA entity linking (Li et al., 2022), expert recommendation (Tondulkar et al., 2018; Lv et al., 2021; Menaha et al., 2021; Anandhan et al., 2022; Krishna and Antulov-Fantulin, 2022; Askari et al., 2022; Liu et al., 2022), question routing (Krishna et al., 2022), identifying unclear questions (Trienes and Balog, 2019), automatic identification of best answers (Burel et al., 2012), and tag-hierarchy prediction (Chen et al., 2019). We perform a large-scale analysis with data spanning over 10 years and across 17 diverse communities. We focus only on the tag-prediction NLP task for CQA platforms.

Text Tagging: There are some feature-based machine learning approaches (Wang et al., 2015; Charte et al., 2015; Sonam et al., 2019; Zangerle et al., 2011; Sigurbjörnsson and Van Zwol, 2008; Lipczak and Milios, 2010; Wu et al., 2016) and some deep learning approaches (Tang et al., 2019; Li et al., 2020; Wankerl et al., 2020) for tag prediction. Tagcombine (Wang et al., 2015) uses software object similarity, while TagStack (Sonam et al., 2019) uses tf-idf features with a Naive Bayes classifier on StackOverflow texts. QUINTA (Charte et al., 2015) works on 6 StackExchange domains using KNN, Zangerle et al. (2011) work on microblogging sites (Twitter) based on tweet similarity, Tag2word (Wu et al., 2016) works on the math and StackOverflow domains using an LDA variant, and Lipczak (2008) and Lipczak and Milios (2010) work on BibSonomy and StackOverflow datasets based on tag co-occurrence and user preference. Among the deep learning methods, F2Tag (Wankerl et al., 2020) works on math domains based on visual and textual formula representation, ITAG (Tang et al., 2019) works on the math domain using an RNN, and TagDC (Li et al., 2020) is based on software object similarity using an LSTM. Unlike the above-mentioned methods, we predict a soft hierarchy of tags (both meta and fine-grained tags).

7. Conclusion

We perform an in-depth analysis of 17 domains in a popular CQA platform, StackExchange, focusing on various aspects of question tagging such as domain diversity analysis, tag-space analysis, tag co-occurrence analysis, tag order, and tag positional stability. We present multiple insights into user behavior in assigning tags to the questions they post. Based on these findings, we develop a tag prediction architecture that generates rarer and finer-grained tags in addition to popular tags from a pre-selected vocabulary. Our approach significantly outperforms feature-based baselines and also shows significant improvement in 12 domains when compared with the vocabulary-based approach.

8. Limitations

The analysis and its findings presented here are limited to 17 selected StackExchange domains considering their diversity. However, they may vary for the remaining 150 domains. Some of the findings (e.g. tag’s positional stability) may vary for other CQA platforms which do not have any bounds on the number of tags. We use roberta-base and a smaller input size (256 tokens) for our experiments. With larger models and more context, the performance is expected to increase since more context usually leads to better learning by larger parameterized models. We have ignored the answers in StackExchange for model training. We believe that indiscriminately selecting all answers as context for a question could be too noisy and if we were to select one or more appropriate answers, this would add complexity in choosing between the fastest answer, best answer, accepted answers, etc. We consider this as a separate area of research and future work. We randomly sampled the data for each domain to create the train and test split to show that our MRPG model is capable of both predicting and generating tags. Splitting with respect to timestamp would require tag temporal analysis and tag-evolution which we consider as a future area of research.

9. Ethical Statement

This work analyzes various aspects of aggregate tagging behavior of users on a popular community question-answering platform StackExchange. The data is publicly provided by StackExchange as an anonymized dump of all user-contributed content on the Stack Exchange network. The data is cc-by-sa 4.0 licensed, and intended to be shared and remixed. No specific user has been identified and no user-level information (user name etc.) has been used for this work. We only used the Post.xml extracted from the StackExchange dumps and do not use any user profile statistics. The aggregate user behavior has been analyzed with respect to tagging and user-generated questions. Based on these findings a tag predictor model has been developed. The data has not been modified or redistributed as part of this research.

References

Adamic et al. (2008) Lada A Adamic, Jun Zhang, Eytan Bakshy, and Mark S Ackerman. 2008. Knowledge sharing and yahoo answers: everyone knows something. In Proceedings of the 17th international conference on World Wide Web. 665–674.
Anandhan et al. (2022) Anitha Anandhan, Maizatul Akmar Ismail, and Liyana Shuib. 2022. Expert Recommendation through Tag Relationship in Community Question Answering. Malaysian Journal of Computer Science 35, 3 (2022), 201–221.
Anderson et al. (2013) Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering user behavior with badges. In Proceedings of the 22nd international conference on World Wide Web. 95–106.
Askari et al. (2022) Arian Askari, Suzan Verberne, and Gabriella Pasi. 2022. Expert Finding in Legal Community Question Answering. In European Conference on Information Retrieval. Springer, 22–30.
Banerjee et al. (2019) Rohan Banerjee, Sailaja Rajanala, and Manish Singh. 2019. Evaluating the Choice of Tags in CQA Sites. In International Conference on Database Systems for Advanced Applications. Springer, 625–640.
Beyer and Pinzger (2015) Stefanie Beyer and Martin Pinzger. 2015. Synonym suggestion for tags on stack overflow. In 2015 IEEE 23rd International Conference on Program Comprehension. IEEE, 94–103.
Burel et al. (2012) Grégoire Burel, Yulan He, and Harith Alani. 2012. Automatic identification of best answers in online enquiry communities. In Extended Semantic Web Conference. Springer, 514–529.
Charte et al. (2015) Francisco Charte, Antonio J Rivera, María J del Jesus, and Francisco Herrera. 2015. QUINTA: A question tagging assistant to improve the answering ratio in electronic forums. In IEEE EUROCON 2015 - International Conference on Computer as a Tool (EUROCON). IEEE, 1–6.
Chen et al. (2019) Hui Chen, John Coogle, and Kostadin Damevski. 2019. Modeling stack overflow tags and topics as a hierarchy of concepts. Journal of Systems and Software 156 (2019), 283–299.
Chen et al. (2010) Yan Chen, Teck-Hua Ho, and Yong-mi Kim. 2010. Knowledge market design: A field experiment at Google Answers. Journal of Public Economic Theory 12, 4 (2010), 641–664.
Fu et al. (2017) Chenbo Fu, Yongli Zheng, Shidi Li, Qi Xuan, and Zhongyuan Ruan. 2017. Predicting the popularity of tags in StackExchange QA communities. In 2017 International Workshop on Complex Systems and Networks (IWCSN). IEEE, 90–95.
Fu et al. (2020) Xiang Fu, Shangdi Yu, and Austin R Benson. 2020. Modelling and analysis of tagging networks in Stack Exchange communities. Journal of Complex Networks 8, 5 (2020), cnz045.
Hollander et al. (2013) Myles Hollander, Douglas A Wolfe, and Eric Chicken. 2013. Nonparametric statistical methods. John Wiley & Sons.
Krishna and Antulov-Fantulin (2022) Vaibhav Krishna and Nino Antulov-Fantulin. 2022. Simplifying Sparse Expert Recommendation by Revisiting Graph Diffusion. arXiv preprint arXiv:2208.02438 (2022).
Krishna et al. (2022) Vaibhav Krishna, Vaiva Vasiliauskaite, and Nino Antulov-Fantulin. 2022. Topic Community Based Temporal Expertise for Question Routing. arXiv preprint arXiv:2207.01753 (2022).
Kumar and Chauhan (2022) Shobhan Kumar and Arun Chauhan. 2022. A Transformer Based Encodings for Detection of Semantically Equivalent Questions in cQA. Comput. J. (2022).
Li et al. (2020) Can Li, Ling Xu, Meng Yan, and Yan Lei. 2020. TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering. Journal of Systems and Software 170 (2020), 110783.
Li et al. (2022) Yuhan Li, Wei Shen, Jianbo Gao, and Yadong Wang. 2022. Community Question Answering Entity Linking via Leveraging Auxiliary Data. arXiv preprint arXiv:2205.11917 (2022).
Lipczak (2008) Marek Lipczak. 2008. Tag recommendation for folksonomies oriented towards individual users. ECML PKDD discovery challenge 84 (2008), 2008.
Lipczak and Milios (2010) Marek Lipczak and Evangelos Milios. 2010. Learning in efficient tag recommendation. In Proceedings of the fourth ACM conference on Recommender systems. 167–174.
Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
Liu et al. (2022) Yue Liu, Weize Tang, Zitu Liu, Lin Ding, and Aihua Tang. 2022. High-quality domain expert finding method in CQA based on multi-granularity semantic analysis and interest drift. Information Sciences 596 (2022), 395–413.
Loshchilov and Hutter (2017) Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
Lv et al. (2021) Xiaoqi Lv, Ke Ji, Zhenxiang Chen, Kun Ma, Jun Wu, Yidong Li, and Guandong Xu. 2021. Expert Recommendations with Temporal Dynamics of User Interest in CQA. In International Conference on Web Information Systems Engineering. Springer, 645–652.
Menaha et al. (2021) R Menaha, VE Jayanthi, N Krishnaraj, et al. 2021. A Cluster-based Approach for Finding Domain wise Experts in Community Question Answering System. In Journal of Physics: Conference Series, Vol. 1767. IOP Publishing, 012035.
Parnell et al. (2011) Laurence D Parnell, Pierre Lindenbaum, Khader Shameer, Giovanni Marco Dall’Olio, Daniel C Swan, Lars Juhl Jensen, Simon J Cockell, Brent S Pedersen, Mary E Mangan, Christopher A Miller, et al. 2011. BioStar: an online question & answer resource for the bioinformatics community. PLoS computational biology 7, 10 (2011), e1002216.
Sigurbjörnsson and Van Zwol (2008) Börkur Sigurbjörnsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th international conference on World Wide Web. 327–336.
Singh et al. (2015) Sanjay Singh et al. 2015. Is Stack Overflow Overflowing With Questions and Tags. arXiv preprint arXiv:1508.03601 (2015).
Sonam et al. (2019) Sonam Sonam, Ayushi Verma, Sangeeta Lal, and Neetu Sardana. 2019. TagStack: Automated system for predicting tags in stackoverflow. In 2019 International Conference on Signal Processing and Communication (ICSC). IEEE, 223–228.
Tang et al. (2019) Shijie Tang, Yuan Yao, Suwei Zhang, Feng Xu, Tianxiao Gu, Hanghang Tong, Xiaohui Yan, and Jian Lu. 2019. An integral tag recommendation model for textual content. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5109–5116.
Tondulkar et al. (2018) Rohan Tondulkar, Manisha Dubey, and Maunendra Sankar Desarkar. 2018. Get me the best: predicting best answerers in community question answering sites. In Proceedings of the 12th ACM Conference on Recommender Systems. 251–259.
Trienes and Balog (2019) Jan Trienes and Krisztian Balog. 2019. Identifying unclear questions in community question answering websites. In European conference on information retrieval. Springer, 276–289.
Vanam and Pulipati (2021) Divya Vanam and Venkateswara Rao Pulipati. 2021. Identifying Duplicate Questions in Community Question Answering Forums Using Machine Learning Approaches. In Machine Learning Technologies and Applications. Springer, 131–140.
Vander Wal (2007) Thomas Vander Wal. 2007. Folksonomy.
Venktesh et al. (2021) V Venktesh, Mukesh Mohania, and Vikram Goyal. 2021. TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 381–396.
Wang et al. (2013) Gang Wang, Konark Gill, Manish Mohanlal, Haitao Zheng, and Ben Y Zhao. 2013. Wisdom in the social crowd: an analysis of quora. In Proceedings of the 22nd international conference on World Wide Web. 1341–1352.
Wang et al. (2015) Xin-Yu Wang, Xin Xia, and David Lo. 2015. Tagcombine: Recommending tags to contents in software information sites. Journal of Computer Science and Technology 30, 5 (2015), 1017–1035.
Wankerl et al. (2020) Sebastian Wankerl, Gerhard Götz, and Andreas Hotho. 2020. f2tag—Can Tags be Predicted Using Formulas?. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 565–571.
Wu et al. (2016) Yong Wu, Yuan Yao, Feng Xu, Hanghang Tong, and Jian Lu. 2016. Tag2word: Using tags to generate words for content based tag recommendation. In Proceedings of the 25th ACM international on conference on information and knowledge management. 2287–2292.
Zangerle et al. (2011) Eva Zangerle, Wolfgang Gassler, and Günther Specht. 2011. Using tag recommendations to homogenize folksonomies in microblogging environments. In International conference on social informatics. Springer, 113–126.
Zhang et al. (2017) Wei Emma Zhang, Quan Z Sheng, Jey Han Lau, and Ermyas Abebe. 2017. Detecting duplicate posts in programming QA communities via latent semantics and association rules. In Proceedings of the 26th International Conference on World Wide Web. 1221–1229.
Zhang et al. (2018) Wei Emma Zhang, Quan Z Sheng, Jey Han Lau, Ermyas Abebe, and Wenjie Ruan. 2018. Duplicate detection in programming question answering communities. ACM Transactions on Internet Technology (TOIT) 18, 3 (2018), 1–21.
Zhao et al. (2021) Li Xian Zhao, Li Zhang, and Jing Jiang. 2021. Hot question prediction in Stack Overflow. IET Software 15, 1 (2021), 90–106.

Appendix A Domain Statistics

Table 10 shows more details about domain diversity beyond those mentioned in Section 3.1. We can see that music, rpg, and cooking are the domains with the fewest unanswered questions ( $<$ 5%), which indicates that the experts in these domains are very active. The science domains have more than 15% of questions with no answers, which suggests that special knowledge is required to answer such questions. MAXVIEW and MAXANS show the maximum number of views and the maximum number of answers received by a single question. NO ACCEPT ANS shows the percentage of questions for which the asker has not accepted any answer. This gives an indication of whether askers are active and also whether the answers are satisfactory.

Table 9. Tag Statistics: AvgTLen - Average Tag Length

Domains	Longest Tag	Size (characters)	Shortest Tag	Size (characters)	AvgTLen
askubuntu	windows-subsystem-for-linux	27	c	1	8.17
aviation	performance-based-navigation	28	cg	2	10.53
biology	neurodegenerative-disorders	27	ph	2	10.97
chemistry	differential-scanning-calorimetry	33	ph	2	12.75
cooking	please-remove-this-tag	22	ue	2	8.56
electronics	semiconductor-process-technology	32	c	1	8.80
history	articles-of-confederation	25	art	3	9.83
money	health-reimbursement-arrangement	32	w9	2	10.87
movies	valerian-city-of-a-thousand-planets	35	m	1	13.66
music	solid-body-electric-guitars	27	dj	2	9.62
philosophy	philosophy-of-political-science	31	art	3	11.17
physics	heisenberg-uncertainty-principle	32	air	3	13.39
politics	immigration-customs-enforcement	31	alp	3	11.12
rpg	werewolf-the-apocalypse-2nd-edition	35	e6	2	12.29
scifi	the-hitchhikers-guide-to-the-galaxy	35	dc	2	13.31
serverfault	google-cloud-internal-load-balancer	35	3g	2	8.87
travel	new-zealand-permanent-resident	30	eu	2	9.39

Appendix B Tag Length Analysis

Table 9 shows the longest and shortest tags in each domain. We also see that the average tag lengths of the movies and physics domains are the highest. We find that movie names and physics topics are often longer than three words, leading to an increase in average tag length.

Table 10. Domain Statistics

Domain	Q	T	Q/T	AVGT	NOANS (%)	NOSCORES (%)	NO ACCEPT ANS (%)	MAXANS	MAXVIEW	VIEWGT100	#ASKERS
askubuntu	371800	3121	119.13	2.78	23.47	37.21	66.99	82	5409384	1093	201912
aviation	20345	1002	20.3	2.56	7.02	9.82	46.66	18	219002	12	7066
biology	25671	739	34.74	2.58	20.73	15.16	56.13	11	445257	11	12089
chemistry	37476	375	99.94	2.37	19.36	16.86	59.12	11	1077991	7	17202
cooking	24513	833	29.43	2.3	4.6	11.79	50.57	85	1619295	13	12413
electronics	152980	2226	68.72	2.77	9.13	40.87	50.94	38	591616	36	61869
history	12562	813	15.45	2.84	9.85	4.6	49.94	34	994376	19	5296
money	32648	995	32.81	3.11	7.91	17.73	54.56	25	821144	37	18010
movies	20749	4348	4.77	2.09	9.48	2.96	39.08	19	1183407	30	6931
music	20925	512	40.87	2.52	3.24	10.96	49.62	25	611990	5	10447
philosophy	15624	559	27.95	2.4	11	15.27	63.73	31	250018	6	6640
physics	180166	893	201.75	3.17	17.49	29.54	57.08	49	847876	131	59774
politics	12416	739	16.8	2.9	6.81	5.55	48.72	27	833812	27	3970
rpg	42693	1195	35.73	2.91	4.41	2.26	32.39	44	865197	56	11541
scifi	62987	3433	18.35	2.25	10.62	2.45	42.87	34	1430390	153	22717
serverfault	299895	3814	78.63	2.9	11.68	37.05	51.93	160	2478923	327	130214
travel	42201	1891	22.32	3.28	11.2	8.19	59.48	30	430504	42	24895

Appendix C Tag Co-Occurrence Distribution Analysis

We analyzed the distribution of the top-50 most frequently occurring tag pairs in each domain (Figures 10 and 11). We observe three main patterns: (1) a smooth distribution, (2) a spike at the top-1 pair, and (3) spikes in the top few pairs. Larger domains like askubuntu, serverfault, electronics, and physics have smooth distributions. Some of the smaller domains like politics, philosophy, and music also show this behavior, which we believe is because the questions in these domains have fine-grained topics. In domains like rpg, money, history, aviation, biology, and chemistry, the most frequent tag pair stands out sharply from the rest, and its tags are generic in nature. Finally, in domains like movies, scifi, cooking, and travel, a few tag pairs dominate the distributions, indicating their popularity in such smaller domains.
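The pair counts behind such distributions can be obtained as in the following minimal sketch. It assumes each post is given as a list of tags and treats a pair as unordered; the function name `top_tag_pairs` and the toy posts are illustrative.

```python
from collections import Counter
from itertools import combinations


def top_tag_pairs(post_tags, n=50):
    """Return the n most frequent unordered tag pairs with their counts."""
    counts = Counter()
    for tags in post_tags:
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Toy posts from a cooking-like domain.
post_tags = [
    ["baking", "bread"],
    ["bread", "baking", "dough"],
    ["baking", "cake"],
    ["dough", "bread"],
]
for pair, count in top_tag_pairs(post_tags, n=2):
    print(pair, count)
```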

Appendix D Tag Co-Occurrence Examples

Table 11 shows the most frequent tag pairs that appear in each domain.

Table 11. Top-5 Most Frequent Tag Pairs

Domain	Top-5 Most Frequent Tag Pairs
askubuntu	(boot, grub2), (boot, dual-boot), (dual-boot, grub2), (bash, command-line), (apt, package-management)
aviation	(aerodynamics, aircraft-design), (aircraft-design, wing), (aerodynamics, wing), (aircraft-design, aircraft-performance), (air-traffic-control, faa-regulations)
biology	(entomology, species-identification), (species-identification, zoology), (botany, species-identification), (neurophysiology, neuroscience), (biochemistry, molecular-biology)
chemistry	(organic-chemistry, reaction-mechanism), (physical-chemistry, thermodynamics), (aromatic-compounds, organic-chemistry), (nomenclature, organic-chemistry), (carbonyl-compounds, organic-chemistry)
cooking	(baking, bread), (baking, cake), (baking, cookies), (baking, substitutions), (bread, dough)
electronics	(current, voltage), (pcb, pcb-design), (power, power-supply), (batteries, battery-charging), (microcontroller, pic)
history	(nazi-germany, world-war-two), (united-states, world-war-two), (europe, middle-ages), (japan, world-war-two), (military, world-war-two)
money	(taxes, united-states), (income-tax, united-states), (401k, united-states), (income-tax, taxes), (tax-deduction, united-states)
movies	(character, plot-explanation), (marvel-cinematic-universe, plot-explanation), (game-of-thrones, plot-explanation), (analysis, plot-explanation), (avengers-infinity-war, marvel-cinematic-universe)
music	(chords, theory), (chord-theory, chords), (harmony, theory), (scales, theory), (chord-theory, theory)
philosophy	(logic, philosophy-of-mathematics), (epistemology, philosophy-of-science), (fallacies, logic), (logic, symbolic-logic), (metaphysics, ontology)
physics	(homework-and-exercises, newtonian-mechanics), (forces, newtonian-mechanics), (hilbert-space, quantum-mechanics), (operators, quantum-mechanics), (quantum-mechanics, wavefunction)
politics	(donald-trump, united-states), (president, united-states), (presidential-election, united-states), (congress, united-states), (election, united-states)
rpg	(dnd-5e, spells), (dnd-5e, magic-items), (class-feature, dnd-5e), (dnd-5e, monsters), (pathfinder-1e, spells)
scifi	(short-stories, story-identification), (marvel, marvel-cinematic-universe), (books, story-identification), (the-lord-of-the-rings, tolkiens-legendarium), (novel, story-identification)
serverfault	(linux, ubuntu), (centos, linux), (amazon-ec2, amazon-web-services), (linux, networking), (apache-2.2, php)
travel	(uk, visas), (schengen, visas), (usa, visas), (customs-and-immigration, usa), (indian-citizens, visas)

Appendix E Tag Distributions

Figure 12 shows the distribution of top-100 most frequent tags in each domain.

Appendix F Tag Ordering Example

Tables 12, 13, and 14 show the 10 most frequently occurring tag pairs in each domain, together with the percentage of posts in which each of the two possible orderings of the pair occurs. On manual analysis, we found that in most cases the meta-tag appears before the refined tags.
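The ordering percentages for a given pair can be computed as in the following minimal sketch, which assumes each post stores its tags in the order assigned by the user. The function name `order_stats` and the toy posts are illustrative.

```python
def order_stats(post_tags, a, b):
    """Return (total, % posts with a before b, % posts with b before a)."""
    a_first = b_first = 0
    for tags in post_tags:
        if a in tags and b in tags:
            if tags.index(a) < tags.index(b):
                a_first += 1
            else:
                b_first += 1
    total = a_first + b_first
    if total == 0:
        return 0, 0.0, 0.0
    return total, 100.0 * a_first / total, 100.0 * b_first / total


# Toy posts with tags in user-assigned order.
post_tags = [
    ["boot", "grub2"],
    ["boot", "dual-boot", "grub2"],
    ["grub2", "boot"],
    ["boot", "grub2", "uefi"],
    ["bash"],
]
print(order_stats(post_tags, "boot", "grub2"))
```

A strongly skewed split between the two orderings, as seen for most pairs in the tables, is what suggests that one tag of the pair consistently plays the meta-tag role.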

Table 12. Tag Ordering Statistics - First 6 domains

Domain	Total	Order-1	%	Order-2	%
askubuntu	5845	(boot,grub2)	99.93	(grub2,boot)	0.07
askubuntu	5174	(boot,dual-boot)	99.96	(dual-boot,boot)	0.04
askubuntu	5104	(dual-boot,grub2)	91.12	(grub2,dual-boot)	8.88
askubuntu	4552	(bash,command-line)	1.89	(command-line,bash)	98.11
askubuntu	4547	(apt,package-management)	98.53	(package-management,apt)	1.47
askubuntu	4304	(networking,wireless)	70.07	(wireless,networking)	29.93
askubuntu	4178	(dual-boot,partitioning)	97.75	(partitioning,dual-boot)	2.25
askubuntu	4128	(drivers,nvidia)	99.93	(nvidia,drivers)	0.07
askubuntu	3257	(networking,server)	97.97	(server,networking)	2.03
askubuntu	3003	(bash,scripts)	99.9	(scripts,bash)	0.1
aviation	417	(aerodynamics,aircraft-design)	1.68	(aircraft-design,aerodynamics)	98.32
aviation	221	(aircraft-design,wing)	100	(wing,aircraft-design)	0
aviation	221	(aerodynamics,wing)	100	(wing,aerodynamics)	0
aviation	183	(aircraft-design,aircraft-performance)	100	(aircraft-performance,aircraft-design)	0
aviation	138	(air-traffic-control,faa-regulations)	0	(faa-regulations,air-traffic-control)	100
aviation	136	(faa-regulations,instrument-flight-rules)	100	(instrument-flight-rules,faa-regulations)	0
aviation	127	(aerodynamics,lift)	100	(lift,aerodynamics)	0
aviation	125	(aerodynamics,airfoil)	100	(airfoil,aerodynamics)	0
aviation	124	(air-traffic-control,radio-communications)	100	(radio-communications,air-traffic-control)	0
aviation	124	(aerodynamics,aircraft-performance)	100	(aircraft-performance,aerodynamics)	0
biology	731	(entomology,species-identification)	10.81	(species-identification,entomology)	89.19
biology	361	(species-identification,zoology)	76.18	(zoology,species-identification)	23.82
biology	350	(botany,species-identification)	44.29	(species-identification,botany)	55.71
biology	322	(neurophysiology,neuroscience)	0	(neuroscience,neurophysiology)	100
biology	321	(biochemistry,molecular-biology)	99.69	(molecular-biology,biochemistry)	0.31
biology	274	(dna,genetics)	4.38	(genetics,dna)	95.62
biology	272	(evolution,genetics)	37.5	(genetics,evolution)	62.5
biology	256	(botany,plant-physiology)	98.05	(plant-physiology,botany)	1.95
biology	251	(entomology,zoology)	0.8	(zoology,entomology)	99.2
biology	247	(cell-biology,molecular-biology)	1.21	(molecular-biology,cell-biology)	98.79
chemistry	1621	(organic-chemistry,reaction-mechanism)	100	(reaction-mechanism,organic-chemistry)	0
chemistry	703	(physical-chemistry,thermodynamics)	99.43	(thermodynamics,physical-chemistry)	0.57
chemistry	648	(aromatic-compounds,organic-chemistry)	0	(organic-chemistry,aromatic-compounds)	100
chemistry	585	(nomenclature,organic-chemistry)	0	(organic-chemistry,nomenclature)	100
chemistry	529	(carbonyl-compounds,organic-chemistry)	0	(organic-chemistry,carbonyl-compounds)	100
chemistry	526	(acid-base,organic-chemistry)	0	(organic-chemistry,acid-base)	100
chemistry	457	(organic-chemistry,synthesis)	100	(synthesis,organic-chemistry)	0
chemistry	429	(organic-chemistry,stereochemistry)	100	(stereochemistry,organic-chemistry)	0
chemistry	420	(acid-base,ph)	100	(ph,acid-base)	0
chemistry	348	(acid-base,inorganic-chemistry)	0	(inorganic-chemistry,acid-base)	100
cooking	393	(baking,bread)	99.75	(bread,baking)	0.25
cooking	290	(baking,cake)	100	(cake,baking)	0
cooking	180	(baking,cookies)	100	(cookies,baking)	0
cooking	179	(baking,substitutions)	91.06	(substitutions,baking)	8.94
cooking	137	(bread,dough)	91.24	(dough,bread)	8.76
cooking	131	(bread,sourdough)	100	(sourdough,bread)	0
cooking	124	(baking,dough)	100	(dough,baking)	0
cooking	122	(bread,yeast)	96.72	(yeast,bread)	3.28
cooking	116	(baking,oven)	100	(oven,baking)	0
cooking	111	(dough,pizza)	91.89	(pizza,dough)	8.11
electronics	1161	(current,voltage)	0.6	(voltage,current)	99.4
electronics	1138	(pcb,pcb-design)	100	(pcb-design,pcb)	0
electronics	1043	(power,power-supply)	0.48	(power-supply,power)	99.52
electronics	844	(batteries,battery-charging)	100	(battery-charging,batteries)	0
electronics	775	(microcontroller,pic)	98.58	(pic,microcontroller)	1.42
electronics	620	(amplifier,operational-amplifier)	3.87	(operational-amplifier,amplifier)	96.13
electronics	619	(power-supply,switch-mode-power-supply)	100	(switch-mode-power-supply,power-supply)	0
electronics	612	(bjt,transistors)	0.49	(transistors,bjt)	99.51
electronics	598	(mosfet,transistors)	0.17	(transistors,mosfet)	99.83
electronics	587	(arduino,microcontroller)	86.03	(microcontroller,arduino)	13.97

Table 13. Tag Ordering Statistics - Second 6 domains

Domain	Total	Order-1	%	Order-2	%
history	298	(nazi-germany,world-war-two)	0.67	(world-war-two,nazi-germany)	99.33
history	179	(united-states,world-war-two)	94.41	(world-war-two,united-states)	5.59
history	153	(europe,middle-ages)	0	(middle-ages,europe)	100
history	141	(japan,world-war-two)	0	(world-war-two,japan)	100
history	138	(military,world-war-two)	0.72	(world-war-two,military)	99.28
history	136	(19th-century,united-states)	0	(united-states,19th-century)	100
history	134	(soviet-union,world-war-two)	0.75	(world-war-two,soviet-union)	99.25
history	117	(20th-century,united-states)	8.55	(united-states,20th-century)	91.45
history	106	(ancient-rome,roman-empire)	84.91	(roman-empire,ancient-rome)	15.09
history	105	(ancient-history,ancient-rome)	99.05	(ancient-rome,ancient-history)	0.95
money	3393	(taxes,united-states)	0.03	(united-states,taxes)	99.97
money	2087	(income-tax,united-states)	0.05	(united-states,income-tax)	99.95
money	883	(401k,united-states)	0	(united-states,401k)	100
money	839	(income-tax,taxes)	3.81	(taxes,income-tax)	96.19
money	662	(tax-deduction,united-states)	0.15	(united-states,tax-deduction)	99.85
money	638	(investing,stocks)	16.3	(stocks,investing)	83.7
money	613	(ira,united-states)	0	(united-states,ira)	100
money	604	(investing,united-states)	0	(united-states,investing)	100
money	554	(mortgage,united-states)	0	(united-states,mortgage)	100
money	541	(roth-ira,united-states)	0	(united-states,roth-ira)	100
movies	518	(character,plot-explanation)	2.9	(plot-explanation,character)	97.1
movies	509	(marvel-cinematic-universe,plot-explanation)	0.2	(plot-explanation,marvel-cinematic-universe)	99.8
movies	367	(game-of-thrones,plot-explanation)	0.82	(plot-explanation,game-of-thrones)	99.18
movies	242	(analysis,plot-explanation)	7.85	(plot-explanation,analysis)	92.15
movies	233	(avengers-infinity-war,marvel-cinematic-universe)	0	(marvel-cinematic-universe,avengers-infinity-war)	100
movies	205	(character,marvel-cinematic-universe)	100	(marvel-cinematic-universe,character)	0
movies	199	(avengers-endgame,marvel-cinematic-universe)	0	(marvel-cinematic-universe,avengers-endgame)	100
movies	184	(analysis,character)	29.89	(character,analysis)	70.11
movies	179	(dialogue,plot-explanation)	5.59	(plot-explanation,dialogue)	94.41
movies	143	(ending,plot-explanation)	2.8	(plot-explanation,ending)	97.2
music	519	(chords,theory)	0	(theory,chords)	100
music	490	(chord-theory,chords)	0	(chords,chord-theory)	100
music	435	(harmony,theory)	0	(theory,harmony)	100
music	410	(scales,theory)	0	(theory,scales)	100
music	404	(chord-theory,theory)	0	(theory,chord-theory)	100
music	363	(electric-guitar,guitar)	0	(guitar,electric-guitar)	100
music	337	(notation,sheet-music)	99.41	(sheet-music,notation)	0.59
music	329	(chords,guitar)	0	(guitar,chords)	100
music	328	(chord-progressions,theory)	0	(theory,chord-progressions)	100
music	306	(notation,piano)	0	(piano,notation)	100
philosophy	272	(logic,philosophy-of-mathematics)	100	(philosophy-of-mathematics,logic)	0
philosophy	266	(epistemology,philosophy-of-science)	94.36	(philosophy-of-science,epistemology)	5.64
philosophy	246	(fallacies,logic)	0.41	(logic,fallacies)	99.59
philosophy	193	(logic,symbolic-logic)	100	(symbolic-logic,logic)	0
philosophy	186	(metaphysics,ontology)	100	(ontology,metaphysics)	0
philosophy	186	(logic,philosophy-of-logic)	100	(philosophy-of-logic,logic)	0
philosophy	183	(argumentation,logic)	0.55	(logic,argumentation)	99.45
philosophy	179	(epistemology,metaphysics)	100	(metaphysics,epistemology)	0
philosophy	179	(epistemology,logic)	1.68	(logic,epistemology)	98.32
philosophy	178	(logic,proof)	100	(proof,logic)	0
physics	4182	(homework-and-exercises,newtonian-mechanics)	99.74	(newtonian-mechanics,homework-and-exercises)	0.26
physics	3658	(forces,newtonian-mechanics)	0.52	(newtonian-mechanics,forces)	99.48
physics	2565	(hilbert-space,quantum-mechanics)	0	(quantum-mechanics,hilbert-space)	100
physics	2360	(operators,quantum-mechanics)	0	(quantum-mechanics,operators)	100
physics	2337	(quantum-mechanics,wavefunction)	100	(wavefunction,quantum-mechanics)	0
physics	2238	(electromagnetism,magnetic-fields)	99.82	(magnetic-fields,electromagnetism)	0.18
physics	2196	(homework-and-exercises,quantum-mechanics)	0	(quantum-mechanics,homework-and-exercises)	100
physics	1988	(newtonian-gravity,newtonian-mechanics)	0	(newtonian-mechanics,newtonian-gravity)	100
physics	1767	(quantum-mechanics,schroedinger-equation)	100	(schroedinger-equation,quantum-mechanics)	0
physics	1704	(black-holes,general-relativity)	0	(general-relativity,black-holes)	100

Table 14. Tag Ordering Statistics - Last 5 domains

Domain	Total	Order-1	%	Order-2	%
politics	570	(donald-trump,united-states)	0	(united-states,donald-trump)	100
politics	557	(president,united-states)	0	(united-states,president)	100
politics	523	(presidential-election,united-states)	0	(united-states,presidential-election)	100
politics	478	(congress,united-states)	0	(united-states,congress)	100
politics	475	(election,united-states)	0.63	(united-states,election)	99.37
politics	467	(brexit,united-kingdom)	0	(united-kingdom,brexit)	100
politics	328	(constitution,united-states)	0	(united-states,constitution)	100
politics	282	(law,united-states)	0.35	(united-states,law)	99.65
politics	279	(senate,united-states)	0.36	(united-states,senate)	99.64
politics	254	(united-states,voting)	100	(voting,united-states)	0
rpg	5330	(dnd-5e,spells)	99.21	(spells,dnd-5e)	0.79
rpg	1367	(dnd-5e,magic-items)	100	(magic-items,dnd-5e)	0
rpg	1212	(class-feature,dnd-5e)	0	(dnd-5e,class-feature)	100
rpg	1204	(dnd-5e,monsters)	99.83	(monsters,dnd-5e)	0.17
rpg	1188	(pathfinder-1e,spells)	90.24	(spells,pathfinder-1e)	9.76
rpg	959	(dnd-3.5e,spells)	72.78	(spells,dnd-3.5e)	27.22
rpg	676	(dnd-5e,feats)	99.85	(feats,dnd-5e)	0.15
rpg	632	(dnd-5e,warlock)	100	(warlock,dnd-5e)	0
rpg	607	(balance,dnd-5e)	0.16	(dnd-5e,balance)	99.84
rpg	567	(combat,dnd-5e)	0.53	(dnd-5e,combat)	99.47
scifi	3514	(short-stories,story-identification)	1.05	(story-identification,short-stories)	98.95
scifi	2109	(marvel,marvel-cinematic-universe)	76.67	(marvel-cinematic-universe,marvel)	23.33
scifi	2029	(books,story-identification)	0.74	(story-identification,books)	99.26
scifi	1922	(the-lord-of-the-rings,tolkiens-legendarium)	52.76	(tolkiens-legendarium,the-lord-of-the-rings)	47.24
scifi	1859	(novel,story-identification)	1.02	(story-identification,novel)	98.98
scifi	1638	(movie,story-identification)	1.47	(story-identification,movie)	98.53
scifi	1497	(star-trek,star-trek-tng)	99.67	(star-trek-tng,star-trek)	0.33
scifi	1077	(aliens,story-identification)	2.04	(story-identification,aliens)	97.96
scifi	866	(a-song-of-ice-and-fire,game-of-thrones)	6.24	(game-of-thrones,a-song-of-ice-and-fire)	93.76
scifi	723	(star-wars,star-wars-legends)	100	(star-wars-legends,star-wars)	0
serverfault	3261	(linux,ubuntu)	98.13	(ubuntu,linux)	1.87
serverfault	2865	(centos,linux)	1.33	(linux,centos)	98.67
serverfault	2498	(amazon-ec2,amazon-web-services)	76.7	(amazon-web-services,amazon-ec2)	23.3
serverfault	2452	(linux,networking)	99.14	(networking,linux)	0.86
serverfault	1912	(apache-2.2,php)	86.72	(php,apache-2.2)	13.28
serverfault	1803	(debian,linux)	1.5	(linux,debian)	98.5
serverfault	1716	(linux,ssh)	98.19	(ssh,linux)	1.81
serverfault	1643	(apache-2.2,linux)	2.01	(linux,apache-2.2)	97.99
serverfault	1560	(iptables,linux)	1.15	(linux,iptables)	98.85
serverfault	1466	(apache-2.2,virtualhost)	96.18	(virtualhost,apache-2.2)	3.82
travel	2181	(uk,visas)	0.05	(visas,uk)	99.95
travel	1779	(schengen,visas)	0.06	(visas,schengen)	99.94
travel	1340	(usa,visas)	2.24	(visas,usa)	97.76
travel	871	(customs-and-immigration,usa)	0	(usa,customs-and-immigration)	100
travel	795	(indian-citizens,visas)	0	(visas,indian-citizens)	100
travel	727	(transit,visas)	0	(visas,transit)	100
travel	726	(customs-and-immigration,visas)	0	(visas,customs-and-immigration)	100
travel	643	(standard-visitor-visas,uk)	0	(uk,standard-visitor-visas)	100
travel	566	(uk,visa-refusals)	100	(visa-refusals,uk)	0
travel	511	(visa-refusals,visas)	0	(visas,visa-refusals)	100

Table 15. Tag-Post Overlap: % of posts where at least one tag appears in the user text. EMS%: exact match of single-word tags; EMM%: exact match of single- and multi-word tags

Domain	Title		Title+Body		Title+Body+Answer
Domain	EMS	EMM	EMS	EMM	EMS	EMM
askubuntu	56.94	71.64	77.29	88.67	81.46	91.15
aviation	29.09	49.63	47.53	66.11	58.98	75.76
biology	17.95	29.68	33.70	47.17	42.92	56.97
chemistry	19.72	29.26	32.81	46.44	40.99	56.20
cooking	53.51	71.04	72.75	82.92	80.87	88.40
electronics	55.24	71.11	75.21	86.69	80.6	89.99
history	22.23	44.34	43.76	67.23	59.66	79.51
money	34.47	57.89	56.32	78.66	66.62	86.23
movies	9.49	34.51	17.69	72.92	24.06	77.32
music	37.42	53.76	60.81	75.32	75.02	85.78
philosophy	23.85	43.24	44.85	63.77	59.81	75.54
physics	24.34	40.95	40.25	61.48	48.24	70.20
politics	28.37	54.11	52.64	77.52	67.41	88.21
rpg	23.69	41.14	44.72	65.13	57.52	76.20
scifi	17.76	38.31	28.38	60.49	34.63	70.64
serverfault	58.67	74.50	76.70	89.34	80.02	91.51
travel	45.93	63.66	65.88	79.76	75.11	86.38

Appendix G Tag-Post Overlap: Full Table

Table 15 shows the tag-post overlap in tabular form similar to Figure 3 in Section 3.
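A minimal sketch of how such overlap percentages could be computed is given below. It rests on assumptions that this appendix does not state: text and tags are lowercased and split into alphanumeric words, a multi-word tag such as dual-boot matches the word sequence "dual boot", and a tag counts as single-word when it consists of one alphanumeric run. The function name `overlap_percent` is hypothetical.

```python
import re


def _words(s):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())


def overlap_percent(posts, multiword=False):
    """Percentage of posts whose text contains at least one of their tags.

    posts: list of (text, tags) pairs.
    multiword=False approximates EMS (single-word tags only);
    multiword=True approximates EMM (single- and multi-word tags).
    """
    hits = 0
    for text, tags in posts:
        padded = " " + " ".join(_words(text)) + " "
        for tag in tags:
            parts = _words(tag)
            if not parts or (len(parts) > 1 and not multiword):
                continue
            # Padding with spaces enforces whole-word matching.
            if " " + " ".join(parts) + " " in padded:
                hits += 1
                break
    return 100.0 * hits / len(posts)


# Toy posts (made up for illustration).
toy = [
    ("How do I fix dual boot with grub2?", ["dual-boot", "grub2"]),
    ("Wing lift question", ["aerodynamics"]),
    ("My power supply hums", ["power-supply"]),
    ("Baking bread at home", ["bread"]),
]
ems = overlap_percent(toy, multiword=False)
emm = overlap_percent(toy, multiword=True)
```

The Title, Title+Body, and Title+Body+Answer columns would then correspond to passing progressively longer text for each post.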

Appendix H Decoding Phase of the MRPG Model

We let the model generate tag tokens up to the maximum output length given as an input parameter, and then apply a few heuristics to filter the tag tokens and choose the top-k tags. The heuristics encode prior knowledge about what a well-formed tag looks like: (1) a tag cannot start or end with a ’-’; (2) punctuation tokens are skipped; (3) adjacent repeated tags are ignored. We then combine the tag tokens between two $\langle$ tagsep $\rangle$ tokens to form the final tag. Finally, we select the top-k ( $k=1\dots 5$ ) most probable tags based on the combined probability scores of their tag tokens.
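The decoding heuristics can be illustrated with the sketch below. Several details are assumptions rather than the authors' implementation: the separator is spelled `<tagsep>`, tag tokens are joined by plain concatenation, a leading or trailing hyphen is stripped rather than causing the tag to be discarded, a lone hyphen token is kept because it joins multi-word tags, and the "combined probability score" is taken to be the product of token probabilities (summed in log space). The function name `decode_tags` is hypothetical.

```python
import math
import string

TAGSEP = "<tagsep>"  # assumed spelling of the separator token


def decode_tags(tokens, probs, k=5):
    """Return up to k tags, most probable first.

    tokens: generated tokens; probs: probability of each token.
    """
    candidates = []  # (tag, summed log-probability) in generation order
    pieces, logp = [], 0.0
    # A trailing separator flushes the last tag.
    for tok, p in list(zip(tokens, probs)) + [(TAGSEP, 1.0)]:
        if tok == TAGSEP:
            tag = "".join(pieces).strip("-")  # heuristic 1
            if tag and (not candidates or candidates[-1][0] != tag):
                candidates.append((tag, logp))  # heuristic 3: no adjacent repeats
            pieces, logp = [], 0.0
        elif tok != "-" and all(ch in string.punctuation for ch in tok):
            continue  # heuristic 2: skip punctuation tokens
        else:
            pieces.append(tok)
            logp += math.log(p)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]


# Toy generation (made-up tokens and probabilities).
toy_tokens = ["dual", "-", "boot", TAGSEP, "dual", "-", "boot", TAGSEP,
              "grub", "2", "-", TAGSEP, ",", "boot"]
toy_probs = [0.9, 0.9, 0.9, 1.0, 0.7, 0.7, 0.7, 1.0,
             0.8, 0.5, 0.6, 1.0, 0.3, 0.95]
top2 = decode_tags(toy_tokens, toy_probs, k=2)
```

In the toy run, the repeated dual-boot is dropped, the trailing hyphen of "grub2-" is stripped, the comma is skipped, and the two highest-scoring tags are returned.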

Appendix I Feature-based Model Configurations:

For both the tf-idf and bag-of-words features, we use unigrams and bigrams with a minimum document frequency of 0.00009 and a maximum of 200,000 features. We train a one-versus-rest classifier with stochastic gradient descent using log loss, and search the hyper-parameter space alpha = [0.0001, 0.001, 0.00001] and penalty = [ $l_{1}$ , $l_{2}$ ]. For both models, we find that the $l_{2}$ penalty with alpha = 0.00001 yields the best performance.
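The configuration above can be written down as plain data and expanded into the six candidate settings, as in the sketch below. The dictionary keys are illustrative names chosen to resemble common machine-learning library parameters; the source does not say which toolkit was used, so nothing here should be read as a specific library's API.

```python
from itertools import product

# Feature extraction settings shared by the tf-idf and bag-of-words models.
FEATURES = {
    "ngram_range": (1, 2),      # unigrams and bigrams
    "min_df": 0.00009,          # minimum document frequency
    "max_features": 200_000,    # cap on the number of features
}

# Classifier settings: one-versus-rest, SGD training, log loss.
CLASSIFIER = {"strategy": "one-vs-rest", "optimizer": "sgd", "loss": "log"}

# Hyper-parameter search space reported in the text.
SEARCH_SPACE = {
    "alpha": [0.0001, 0.001, 0.00001],
    "penalty": ["l1", "l2"],
}

# Best setting reported for both models.
BEST = {"alpha": 0.00001, "penalty": "l2"}


def grid(space):
    """Expand a search space into a list of concrete settings."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[key] for key in keys))]


settings = grid(SEARCH_SPACE)
```

Each entry of `settings` would be trained and evaluated in turn, with the best-performing one retained.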

Appendix J P-values for Hit@5

Table 16 shows the p-values obtained when the MRPG model’s Hit@5 is compared with that of the MP model. Significance is assessed with a one-sided Wilcoxon test (Hollander et al., 2013). For k=1,2,3,4, the MRPG model’s Hit@k shows significant improvements over the MP model. The MRPG model also significantly outperforms all other baselines in Hit@k for each value of k.

Table 16. MRPG vs MP: p-values for Hit@5 calculated by a one-sided Wilcoxon test (Hollander et al., 2013). The improvement of MRPG is considered significant if the p-value is less than 0.05

Domains	P-Values	Is Significant
askubuntu	0.03125	Yes
aviation	0.15625	No
biology	1.00000	No
chemistry	0.03125	Yes
cooking	0.03125	Yes
electronics	0.03125	Yes
history	0.09375	No
money	0.03125	Yes
movies	0.40625	No
music	0.03125	Yes
philosophy	0.50000	No
physics	0.03125	Yes
politics	0.03125	Yes
rpg	0.03125	Yes
scifi	0.03125	Yes
serverfault	0.03125	Yes
travel	0.03125	Yes
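For reference, an exact one-sided Wilcoxon signed-rank p-value can be computed by enumerating all sign assignments, as in the sketch below. This is an illustration under assumptions, not the authors' procedure: it assumes paired per-run scores and the signed-rank variant of the test. Every p-value in Table 16 is a multiple of 1/32, which would be consistent with five paired runs per domain, although the source does not state the number of runs. The function name and the scores are hypothetical.

```python
from itertools import product


def wilcoxon_one_sided_exact(x, y):
    """Exact p-value for the alternative that x tends to exceed y (paired samples)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1

    # Observed statistic: sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null, each difference is positive or negative with equal probability.
    at_least = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if w >= w_plus:
            at_least += 1
    return at_least / 2 ** n


# Hypothetical Hit@5 scores from five paired runs (not the paper's raw data).
mrpg_runs = [82.9, 83.1, 82.8, 83.0, 82.7]
mp_runs = [80.4, 80.5, 80.3, 80.6, 80.2]
p_value = wilcoxon_one_sided_exact(mrpg_runs, mp_runs)
```

When all five differences favor the first sample, only one of the 32 sign assignments is as extreme as the observed one, giving p = 1/32 = 0.03125, the smallest value such a test can produce with five pairs.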

Appendix K Detailed Tag-Post Coverage %

Table 17 shows detailed tag-post coverage.

Table 17. Post coverage (%) of the top-n tags. #T: number of distinct tags; 100T%: share of the whole tag space represented by the top-100 tags.

Domain	#T	Top1	Top3	Top5	Top10	Top50	Top100	100T%
askubuntu	3121	5.67	15.87	24.81	40.21	71.84	82.68	3.2
aviation	1002	11.05	25.81	33.87	45.93	79.13	89.43	9.98
biology	739	9.22	23.91	37.84	55.05	84.39	91.76	13.53
chemistry	375	23.05	42.61	48.62	61.38	87.69	95.35	26.67
cooking	833	9.55	22.45	29.55	38.99	71.45	85.19	12
electronics	2226	4.94	13.84	20.88	32.81	68.96	81.98	4.49
history	813	10.86	25.08	35.27	45.91	80.82	89.95	12.3
money	995	37.04	49.69	56.62	68.52	88.33	94.18	10.05
movies	4348	36.93	49.59	56.36	66.84	81.59	85.88	2.3
music	512	14.93	39.08	47.59	58.04	87.42	94.54	19.53
philosophy	559	19.39	37.1	48.56	63.3	87.29	93.77	17.89
physics	893	12.7	28.35	39.99	55.1	83.98	91.68	11.2
politics	739	46	59.16	63.64	66.41	89.63	94.95	13.53
rpg	1195	42.5	61.23	76.9	79.75	88.01	92.66	8.37
scifi	3433	27.86	47.75	62.03	70.67	81.32	85.04	2.91
serverfault	3814	11.92	22.16	29.97	42.76	72.8	82.86	2.62
travel	1891	22.2	36.03	48.34	58.34	84.39	92.36	5.29
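The coverage figures can be sketched as follows, assuming that coverage means the percentage of posts carrying at least one of the n most frequent tags, a definition this appendix does not spell out. The 100T% column is reproduced as 100 divided by the number of distinct tags, which matches the table (for example, 100/3121 is about 3.2% for askubuntu). The function names are hypothetical.

```python
from collections import Counter


def top_n_coverage(posts, n):
    """Percentage of posts that carry at least one of the n most frequent tags.

    posts: list of tag lists.
    """
    # Count each tag once per post.
    freq = Counter(tag for tags in posts for tag in set(tags))
    top = {tag for tag, _ in freq.most_common(n)}
    covered = sum(1 for tags in posts if top & set(tags))
    return 100.0 * covered / len(posts)


def top_tag_share(num_distinct_tags, n=100):
    """Share (%) of the tag space taken up by the top-n tags."""
    return 100.0 * min(n, num_distinct_tags) / num_distinct_tags


# Toy posts (made up for illustration).
toy_posts = [["a", "b"], ["a"], ["a", "c"], ["b"], ["d"]]
cov_top1 = top_n_coverage(toy_posts, 1)
cov_top2 = top_n_coverage(toy_posts, 2)
share_askubuntu = round(top_tag_share(3121), 2)
```

In the toy data, the most frequent tag covers 3 of 5 posts and the top two tags cover 4 of 5.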

Appendix L Effect of Using Answers

Answers can be used in domains or organizations where some answers have already been posted and the tag-prediction approach is deployed later. The motivation for using answers comes directly from our tag-post overlap analysis in Table 15: once answers are included, at least one tag appears in the text of at least 70% of posts in 15 of the 17 domains, the exceptions being chemistry and biology. Even in these two domains, including answers increases the overlap by around 9-10%. In some domains, the overlap rises to 91%.

Table 18. Effect of Vocabulary Size Reduction on Individual Models Hit@5 Metric

Domain	MP		MRPG
Domain	90	85	90	85
askubuntu	80.42	75.73 $\Delta$ -4.69	83.18	80.92 $\Delta$ -2.26
aviation	77.12	73.21 $\Delta$ -3.91	77.64	77.68 $\Delta$ 0.04
biology	79.31	76.35 $\Delta$ -2.96	78.03	77.41 $\Delta$ -0.62
chemistry	77.77	75.62 $\Delta$ -2.15	79.51	79.63 $\Delta$ 0.12
cooking	80.42	76.81 $\Delta$ -3.61	85.38	85.29 $\Delta$ -0.09
electronics	77.92	73.69 $\Delta$ -4.23	81.62	80.56 $\Delta$ -1.06
history	80.57	77.59 $\Delta$ -2.98	82.29	81.21 $\Delta$ -1.08
money	84.46	80.38 $\Delta$ -4.08	88.19	87.9 $\Delta$ -0.29
movies	83.54	78.6 $\Delta$ -4.94	82.77	82.8 $\Delta$ 0.03
music	82.72	78.73 $\Delta$ -3.99	84.37	84.18 $\Delta$ -0.19
philosophy	79.17	74.4 $\Delta$ -4.77	79.58	79.1 $\Delta$ -0.48
physics	81.49	77.3 $\Delta$ -4.19	86.48	85.78 $\Delta$ -0.7
politics	86.43	82.4 $\Delta$ -4.03	91.38	90.74 $\Delta$ -0.64
rpg	83.71	79.41 $\Delta$ -4.3	89.23	88.1 $\Delta$ -1.13
scifi	85.81	82.22 $\Delta$ -3.59	91.55	90.72 $\Delta$ -0.83
serverfault	81.87	77.26 $\Delta$ -4.61	85.9	85.04 $\Delta$ -0.86
travel	84.09	79.41 $\Delta$ -4.68	89.47	88.53 $\Delta$ -0.94

Table 19. Head Contributions for 90%, 85% and 95% Post Coverage by Tags: P represents total correct predictions % by P-Head only and G represents total correct prediction % by G-Head only

Domain	90		85		95
Domain	P	G	P	G	P	G
askubuntu	49.66	11.85	46.5(-3.16)	14.3(2.45)	59.27(9.61)	7.54(-4.31)
aviation	53.8	10.42	47.78(-6.02)	13.76(3.34)	61.54(7.74)	5.41(-5.01)
biology	53.86	9.88	49.32(-4.54)	11.47(1.59)	61.1(7.24)	5.8(-4.08)
chemistry	55.18	10.61	50.89(-4.29)	12.96(2.35)	63.87(8.69)	6.31(-4.3)
cooking	55.03	10.95	47.15(-7.88)	14.93(3.98)	64.45(9.42)	6.24(-4.71)
electronics	52.07	11.28	46.28(-5.79)	14.39(3.11)	59.52(7.45)	7.24(-4.04)
history	55.06	9.91	52.15(-2.91)	10.51(0.6)	65.25(10.19)	3.94(-5.97)
money	50.17	10.23	43.74(-6.43)	12.82(2.59)	61.49(11.32)	5.99(-4.24)
movies	68.51	4.55	58.63(-9.88)	7.4(2.85)	74.17(5.66)	1.61(-2.94)
music	56.46	10.18	50.32(-6.14)	13.19(3.01)	64.44(7.98)	6.69(-3.49)
philosophy	58.98	7.81	53.02(-5.96)	10.85(3.04)	64.9(5.92)	4.61(-3.2)
physics	45.01	12	39.84(-5.17)	14.73(2.73)	55.04(10.03)	8.41(-3.59)
politics	52.6	9.14	44.87(-7.73)	12.81(3.67)	67.38(14.78)	4.51(-4.63)
rpg	51.89	7.68	39.03(-12.86)	11.34(3.66)	68.68(16.79)	3.97(-3.71)
scifi	74.64	5.11	62.79(-11.85)	8.37(3.26)	84.69(10.05)	1.6(-3.51)
serverfault	50.63	11.9	42.96(-7.67)	15.97(4.07)	58.55(7.92)	8.45(-3.45)
travel	46.26	12.39	40.6(-5.66)	15.94(3.55)	53.59(7.33)	8.19(-4.2)

Table 20. Model Performance (Hit@k) for MP and MRPG models

Domain	MP					MRPG
Domain	Hit@1	Hit@2	Hit@3	Hit@4	Hit@5	Hit@1	Hit@2	Hit@3	Hit@4	Hit@5
askubuntu	31.59 $\pm$ 0.14	50.89 $\pm$ 0.21	64.85 $\pm$ 0.14	74.23 $\pm$ 0.31	80.44 $\pm$ 0.11	50.86 $\pm$ 0.13	72.72 $\pm$ 0.1	80.19 $\pm$ 0.09	81.71 $\pm$ 0.17	82.94 $\pm$ 0.15
aviation	30.72 $\pm$ 0.95	48.39 $\pm$ 0.93	62.05 $\pm$ 0.55	72.23 $\pm$ 0.66	77.09 $\pm$ 0.44	47.28 $\pm$ 0.19	67.37 $\pm$ 0.36	75.21 $\pm$ 0.47	76.02 $\pm$ 0.45	77.63 $\pm$ 0.56
biology	34.66 $\pm$ 0.64	50.81 $\pm$ 1.01	63.77 $\pm$ 0.31	73.96 $\pm$ 0.32	78.96 $\pm$ 0.34	49.67 $\pm$ 0.7	68.8 $\pm$ 0.37	75.7 $\pm$ 0.42	76.51 $\pm$ 0.44	77.55 $\pm$ 0.41
chemistry	38.83 $\pm$ 0.51	54.77 $\pm$ 0.36	65.84 $\pm$ 0.72	73.35 $\pm$ 0.34	77.66 $\pm$ 0.1	50.28 $\pm$ 0.15	69.81 $\pm$ 0.33	76.43 $\pm$ 0.3	77.23 $\pm$ 0.38	79.17 $\pm$ 0.45
cooking	35.46 $\pm$ 0.28	56.17 $\pm$ 0.79	67.2 $\pm$ 0.82	76.21 $\pm$ 0.95	80.86 $\pm$ 0.42	52.43 $\pm$ 0.68	75.61 $\pm$ 0.15	82.73 $\pm$ 0.25	83.74 $\pm$ 0.31	85.18 $\pm$ 0.29
electronics	28.67 $\pm$ 0.43	47.28 $\pm$ 0.73	61.78 $\pm$ 0.54	70.97 $\pm$ 0.24	77.51 $\pm$ 0.26	49.64 $\pm$ 0.63	71.4 $\pm$ 0.53	78.92 $\pm$ 0.46	80.06 $\pm$ 0.47	81.3 $\pm$ 0.53
history	34.18 $\pm$ 1.21	54.16 $\pm$ 1.24	66.47 $\pm$ 0.97	76.22 $\pm$ 0.53	80.45 $\pm$ 0.09	54.07 $\pm$ 0.1	73.56 $\pm$ 0.62	79.5 $\pm$ 0.82	80.29 $\pm$ 0.88	81.23 $\pm$ 1
money	51.05 $\pm$ 0.44	66.28 $\pm$ 0.98	75.43 $\pm$ 0.43	81.01 $\pm$ 0.24	84.15 $\pm$ 0.23	60.59 $\pm$ 0.47	78.89 $\pm$ 0.14	86.01 $\pm$ 0.4	86.75 $\pm$ 0.41	87.94 $\pm$ 0.42
movies	50.06 $\pm$ 0.5	64.28 $\pm$ 1.53	73.58 $\pm$ 0.88	79.48 $\pm$ 0.7	82.91 $\pm$ 0.55	57.33 $\pm$ 0.44	78.41 $\pm$ 0.28	82.03 $\pm$ 0.69	83.05 $\pm$ 0.9	83.25 $\pm$ 0.99
music	37.03 $\pm$ 0.41	57.17 $\pm$ 0.62	68.79 $\pm$ 0.27	77.76 $\pm$ 0.43	82.66 $\pm$ 0.26	53.17 $\pm$ 0.69	75.17 $\pm$ 0.35	81.46 $\pm$ 0.52	82.28 $\pm$ 0.49	83.71 $\pm$ 0.51
philosophy	34.9 $\pm$ 0.8	52.94 $\pm$ 0.26	66.03 $\pm$ 0.77	75.46 $\pm$ 0.53	79.45 $\pm$ 0.2	53.09 $\pm$ 0.95	72.01 $\pm$ 0.27	78.03 $\pm$ 0.59	78.76 $\pm$ 0.52	79.49 $\pm$ 0.56
physics	41.27 $\pm$ 0.46	60.63 $\pm$ 0.49	70.59 $\pm$ 0.18	77.39 $\pm$ 0.31	81.12 $\pm$ 0.22	57.96 $\pm$ 0.28	75.49 $\pm$ 0.28	83.5 $\pm$ 0.38	84.47 $\pm$ 0.41	86.34 $\pm$ 0.37
politics	65.26 $\pm$ 1.76	73.87 $\pm$ 1.18	78.43 $\pm$ 0.86	84.04 $\pm$ 0.42	86.29 $\pm$ 0.25	71.61 $\pm$ 0.5	82.92 $\pm$ 0.31	88.97 $\pm$ 0.46	89.91 $\pm$ 0.42	90.98 $\pm$ 0.46
rpg	68.68 $\pm$ 0.29	75.93 $\pm$ 0.31	79.43 $\pm$ 0.37	81.66 $\pm$ 0.36	83.31 $\pm$ 0.33	72.85 $\pm$ 0.4	81.91 $\pm$ 0.17	87.59 $\pm$ 0.12	88.28 $\pm$ 0.16	89.09 $\pm$ 0.16
scifi	76.65 $\pm$ 0.34	80.64 $\pm$ 0.42	82.69 $\pm$ 0.36	84.89 $\pm$ 0.15	85.91 $\pm$ 0.11	81.99 $\pm$ 0.12	86.2 $\pm$ 0.21	90.41 $\pm$ 0.23	90.85 $\pm$ 0.28	91.53 $\pm$ 0.32
serverfault	25.9 $\pm$ 0.12	50.99 $\pm$ 0.59	65.52 $\pm$ 0.25	75.24 $\pm$ 0.34	81.66 $\pm$ 0.16	53.05 $\pm$ 0.38	75.36 $\pm$ 0.18	82.56 $\pm$ 0.17	84.16 $\pm$ 0.29	85.82 $\pm$ 0.26
travel	45.04 $\pm$ 0.51	63.62 $\pm$ 0.14	72.52 $\pm$ 0.41	79.96 $\pm$ 0.24	83.96 $\pm$ 0.12	58.7 $\pm$ 0.34	78.57 $\pm$ 0.2	86.75 $\pm$ 0.25	87.78 $\pm$ 0.28	89.5 $\pm$ 0.3