
Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Rima Hazra, Agnik Saha, Somnath Banerjee, Animesh Mukherjee
Indian Institute of Technology Kharagpur
[email protected]
[email protected], [email protected]
Abstract

Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries. The swiftness of these responses is contingent on a mixture of query-specific and user-related elements. This paper scrutinizes these contributing factors within the context of six highly popular CQA platforms, identified through their standout “answering speed”. Our investigation reveals a correlation between the time taken to yield the first response to a question and several variables: the metadata, the formulation of the questions, and the level of interaction among users. Additionally, by employing conventional machine learning models to analyze this metadata and these patterns of user interaction, we endeavor to predict which queries will receive their initial responses promptly.

1 Introduction

Community question-answering platforms have been steadily evolving over the past decade. Such portals allow users to post their queries and to answer the questions of others. These CQA platforms help experts share their knowledge and new users solve their queries. Initially, community question-answering platforms focused on providing the most relevant answers to users’ queries [14][2]. As these platforms keep growing, they focus not only on providing relevant answers but also on delivering them quickly [12][9]. The growth and maturity of such platforms is contingent on a number of variables, including the “quality of questions and answers”, the “response time of queries”, “user involvement”, and “user activity” [8]. Users with expertise in specific domains significantly help the platform’s growth. Based on their activeness on the platform and the importance of their posts, users are rewarded with reputation scores, badges, and additional privileges. In most such CQA platforms driven by a reputation system, the platform’s popularity depends on the response time [3] of a question (i.e., the time to receive the first answer). Users who post a query may therefore also want to know by when they can expect a response. StackExchange (https://stackexchange.com/) is one such CQA platform where users can ask questions and receive answers from other users. An analysis of such a platform’s dynamics helps the community maintainers adapt and design the framework to address various CQA-related problems [1][7][13][11][6][5][4].

In this paper, we examine the dynamics of different CQA platforms, relating the usual response time to other factors of the portals. We conduct an empirical study on six CQA platforms – Mathematics, Software Engineering, English, Game Development, Chemistry, and Money. To analyze the dynamics of these diverse platforms, we look into the following – (1) We characterize the questions in terms of textual information, linguistic characteristics, question tags, votes received by the questions etc., and obtain correlations of these features with the time required to respond to a question. (2) We characterize the users of the platforms in terms of their reputation and activity. Further, we build an asker-answerer graph (AAG) whose nodes are the users, with a link created when a question from one user is answered by another. Subsequently, we conduct an in-depth analysis of the characteristics of the users common to all the platforms. (3) For all the platforms, we use question metadata, question structure and user interaction as features to build standard machine learning models for classifying whether a newly posted question will receive a fast answer.

The majority of preceding studies [10] scrutinize the correlation between diverse question-related factors and response time. The novelty of our current research resides in our endeavor to comprehend how various inherent network-related features (reflected in the asker-answerer graph) coincide with question-related features in relation to response time. In this study, we make the following observations – (1) Short questions get faster responses. Chemistry and Software Engineering questions are harder to read. More tags mean slower replies. Mathematics, Software Engineering, and English answers come within an hour, others within a day. In Software Engineering and English, many questions get ≥ 3 answers. (2) Most users (~60%-70%) on all platforms stay inactive. Smaller domains like Money, Game Development, and Chemistry nevertheless have comparatively large fractions of active users. (3) XGBoost performs best at classifying new questions on most platforms, with the random forest model usually second best.

2 Dataset

We study six CQA platforms from StackExchange, with data up to March 2022. We pick the top three and bottom three platforms in answering speed from an initial group that includes Physics, Chemistry, Mathematics, English, Software Engineering, Game Development, AskUbuntu, Mathematica, Travel, and Money. Answering speed, distinct from response time, is the percentage of questions answered within time t (t = 10 mins, 20 mins, etc.). Mathematics, English, and Software Engineering are at the top in answering speed; Money, Game Development, and Chemistry are at the bottom. Each dataset includes the question title, body, tags, posting time, answers with timestamps, votes, and user reputations. Table 1 presents the dataset statistics.
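Written out explicitly (in our notation, consistent with the definition above), the answering speed of a platform at threshold t is

AS(t) = \frac{|\{\, q \in Q : r(q) \leq t \,\}|}{|Q|} \times 100,

where Q is the set of questions posted on the platform and r(q) is the response time of question q, i.e., the delay until its first answer arrives.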

Datasets | # Questions | # Unique tags | # Unique users | Age (in yr.)
SE | 60,801 | 1667 | 3,46,642 | 12 (2010)
GD | 53,427 | 1094 | 1,22,553 | 12 (2010)
MA | 14,79,363 | 1898 | 8,88,141 | 12 (2010)
EN | 1,24,373 | 981 | 3,50,766 | 12 (2010)
CH | 40,742 | 368 | 88,677 | 10 (2012)
MO | 35,494 | 1005 | 84,650 | 13 (2009)
Table 1: Basic statistics of the six community question answering platforms (SE: Software Engineering, GD: Game Development, MA: Mathematics, EN: English, CH: Chemistry, MO: Money).
Datasets | Average number of tags | Total number of answers | % of answered questions | % of questions having accepted answer | Average number of votes
SE | 2.802 | 1,72,275 | 95.44 | 57.78 | 6.6139
GD | 2.795 | 77,996 | 86.31 | 52.80 | 2.567
MA | 2.369 | 19,76,51 | 82.32 | 52.71 | 2.11
EN | 2.08 | 2,79,346 | 92.12 | 48.1 | 3.36
CH | 2.396 | 47,420 | 79.6 | 40.63 | 3.42
MO | 3.11 | 66,917 | 91.19 | 45.32 | 4.71
Table 2: Basic statistics about the question structure.

3 Empirical analysis

In order to compare the platforms we consider two different factors – (i) the characteristic features of the questions posted in a platform, and (ii) the interaction behaviour of users on a platform. In each case we correlate these factors with the response time, which is the time required to obtain the first answer after a question is posted.

3.1 Characterizing the questions

In order to characterize the questions on the different platforms, we explore various structural properties of the questions: the length of the title, the length of the body, and the linguistic characteristics of the question title and body. Table 2 reports the average number of tags per question, the total number of answers, the percentage of questions with at least one answer, the percentage of questions with an accepted answer, and the average number of votes received per question. From Table 2, we observe that 90%-95% of questions have received at least one answer on the SE, EN and MO platforms, while the CH platform has the lowest percentage (~79%) of answered questions. Further, SE, GD and MA have higher percentages of questions with accepted answers than the other platforms. The average number of votes per question is highest on the SE platform.
Length of the title and body: We compute the length of the title and of the body as the number of words present in each. We then take the top 20% of questions (Q_top) and the bottom 20% of questions (Q_bottom) based on response time: Q_top corresponds to the lowest response times and Q_bottom to the highest. On MA, EN and SE, the average body lengths of Q_top questions are 69.06, 71.86 and 150.70 respectively, while those of Q_bottom questions are 117.48, 102.55 and 189.70 respectively. Thus shorter questions get faster answers. The same trend is observed for MO, GD and CH, where the average body lengths for Q_top are 121.56, 133.63 and 81.197, and for Q_bottom 138.48, 166.15 and 106.117 respectively. We did not find much difference in title length between the Q_top and Q_bottom questions.
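The paper releases no code; the following minimal pandas sketch shows how the Q_top/Q_bottom split and the body-length statistic can be computed. The DataFrame schema (columns response_time and body) is our assumption, not the authors'.

import pandas as pd

def split_top_bottom(df: pd.DataFrame, frac: float = 0.2):
    # Sort by time-to-first-answer; the fastest-answered `frac` of questions
    # forms Q_top, the slowest-answered `frac` forms Q_bottom.
    df_sorted = df.sort_values("response_time")
    k = int(len(df_sorted) * frac)
    return df_sorted.head(k), df_sorted.tail(k)

def avg_body_length(questions: pd.DataFrame) -> float:
    # Length is the number of whitespace-separated words in the body.
    return questions["body"].str.split().str.len().mean()

# Toy usage with a synthetic frame:
df = pd.DataFrame({
    "body": ["short question"] * 5 + ["a much longer question body with many words"] * 5,
    "response_time": range(10),
})
q_top, q_bottom = split_top_bottom(df)
print(avg_body_length(q_top), avg_body_length(q_bottom))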
Linguistic characteristics: To understand the linguistic characteristics, we use three measures – the Flesch reading ease test (https://simple.wikipedia.org/wiki/Flesch_Reading_Ease; higher is better), the Coleman-Liau index (https://en.wikipedia.org/wiki/Coleman-Liau_index; lower is better) and the Automated Readability Index (https://en.wikipedia.org/wiki/Automated_readability_index; lower is better). We compare the readability scores of Q_top and Q_bottom questions. For the Flesch reading ease test (see Figure 1), the Q_top questions of MA, EN and SE have average scores of 68.53, 71.71 and 63.31 respectively, while the Q_bottom questions have average scores of 60.59, 68.86 and 58.14 respectively. For MO, GD and CH, the average scores of Q_top questions are 71.42, 62.75 and 63.22, while those of Q_bottom questions are 68.83, 55.26 and 58.87 respectively. Thus readability is better (easier) for Q_top questions across all platforms, irrespective of their popularity in terms of answering speed. This observation holds for the other two readability metrics as well (see Figure 1).
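One way to reproduce these readability scores is with the textstat Python package; the paper does not name its implementation, so this choice is our assumption.

import textstat

def readability(text: str) -> dict:
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),  # higher = easier
        "coleman_liau_index": textstat.coleman_liau_index(text),  # lower = easier
        "automated_readability_index": textstat.automated_readability_index(text),  # lower = easier
    }

def avg_flesch(bodies) -> float:
    # Average Flesch reading ease over a set of question bodies, e.g. those in Q_top.
    bodies = list(bodies)
    return sum(textstat.flesch_reading_ease(b) for b in bodies) / len(bodies)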

Figure 1: (a) Flesch Reading Ease, (b) Coleman-Liau index, and (c) Automated Readability Index scores for Q_top and Q_bottom questions; (d) average clustering coefficient and (e) average degree of the Q_top and Q_bottom AAGs.

Tag related analysis: For each platform, we obtain the 50 most frequently occurring tags across the posted questions. For SE, Java, Design and C# are the most popular tags; for GD, Unity, C++, C# and OpenGL are the most frequent; and for MO, United States and taxes are the most popular. For MA, EN and CH, we do not observe any particularly dominant set of tags. Further, for each portal, we calculate the average number of tags in Q_top and Q_bottom questions. On MA, EN and SE, the average numbers of tags in Q_top are 2.08, 2.03 and 2.53, and in Q_bottom 2.66, 2.17 and 3.02 respectively. On GD, CH and MO, the average numbers of tags in Q_top are 2.58, 2.28 and 3.00, and in Q_bottom 2.94, 2.54 and 3.23 respectively. We observe that Q_top questions carry fewer tags than Q_bottom questions. This possibly indicates that fewer tags make a question topically less confusing and route it more effectively to the people who can actually answer it.
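A small sketch of these tag statistics (we assume the per-question tag lists are already parsed, e.g. ["java", "design"]):

from collections import Counter

def most_frequent_tags(tag_lists, k=50):
    # Flatten the per-question tag lists and count tag occurrences.
    counts = Counter(tag for tags in tag_lists for tag in tags)
    return counts.most_common(k)

def avg_tag_count(tag_lists) -> float:
    tag_lists = list(tag_lists)
    return sum(len(tags) for tags in tag_lists) / len(tag_lists)

# e.g. compare avg_tag_count(q_top_tags) with avg_tag_count(q_bottom_tags)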

Votes received by the question: To understand the temporal pattern of votes received by the questions, we count the fraction of questions receiving 1 upvote, 2 upvotes and ≥ 3 upvotes on three consecutive days, i.e., the day the question is posted and the two subsequent days. On the same day, ~30% of SE questions receive more than 3 votes, whereas on the other platforms most questions receive one vote. On the second day, around 20% of questions receive one more vote on the SE, MO and CH platforms, and on SE and MO about 10% of questions receive at least three votes. On the third day, very few questions receive votes on any platform. This indicates that on all platforms the attention span of users for a particular question dies down within 48 hours of posting. For MA, EN and SE, the average votes for the questions in Q_top are 1.99, 5.31 and 15.82 respectively, and for the questions in Q_bottom 2.75, 2.60 and 4.44 respectively. For MO, GD and CH, the average numbers of votes in Q_top are 6.46, 4.63 and 4.21 respectively, while those in Q_bottom are 2.96, 2.08 and 5.17 respectively. Hence questions with lower response times typically receive more upvotes, irrespective of the popularity of the platform.
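The day-wise vote bucketing can be sketched as follows; the vote and question schemas (columns qid, ts, posted) are illustrative assumptions.

import pandas as pd

def daily_vote_buckets(votes: pd.DataFrame, questions: pd.DataFrame) -> pd.Series:
    # votes: one row per upvote, with question id 'qid' and timestamp 'ts';
    # questions: 'qid' and posting timestamp 'posted'.
    m = votes.merge(questions, on="qid")
    m["day"] = (m["ts"].dt.normalize() - m["posted"].dt.normalize()).dt.days
    m = m[m["day"].between(0, 2)]  # posting day and the two subsequent days
    per_q = m.groupby(["day", "qid"]).size().clip(upper=3)  # 3 encodes ">= 3"
    # Fraction of questions receiving 1, 2 or >=3 upvotes on each day.
    return per_q.groupby(level="day").value_counts(normalize=True)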

3.2 Characterizing the users

In this section we analyse the user behaviour on each of these platforms. The numbers of users on MA, SE and EN are approximately 888K, 346K and 350K, much larger than on the other platforms; MO, GD and CH have 84K, 122K and 88K users respectively. Table 3 shows basic statistics about these unique users: the percentage of users only posting questions, the percentage only posting answers, the percentage posting both questions and answers, and the percentage of users with at least one answer accepted by an asker.

Datasets | % users posting only questions | % users posting only answers | % users posting both Q&A | % users with at least one accepted answer
SE | 8.9 | 7.5 | 2 | 1.9
GD | 19 | 12.7 | 5.3 | 4.9
MA | 33 | 10 | 5 | 4
EN | 17 | 12.3 | 2.6 | 2.3
CH | 20 | 7 | 2.6 | 2.1
MO | 22.2 | 9.8 | 2.7 | 2.2
Table 3: Basic statistics of the user interaction.

Around 60%-70% of users remain inactive on all the platforms; inactive users neither post anything nor interact with anyone on the platform. The MA, SE and EN platforms have approximately 37.38%, 14.14% and 26.73% of users who have posted at least one question or one answer (active users); for MO, GD and CH, the percentages of active users are 29.38%, 26.47% and 25.29% respectively. According to Table 3, MA has the largest fraction of users (33%) posting only questions. In terms of writing answers, around 12% of users post only answers on the EN and GD platforms, more than on the other platforms. GD has the largest fraction of users (~5.3%) who have posted both questions and answers, and, once again, the largest fraction (~4.9%) with at least one accepted answer. Further, we conduct an in-depth analysis of the users who asked or answered questions in Q_top and Q_bottom respectively. Table 5 shows the numbers of askers and answerers and their overlaps for Q_top and Q_bottom. Below, we analyse various factors of the askers and answerers of the questions in Q_top and Q_bottom.

Reputation score of users: We separately observe the reputation scores of the askers and answerers of Q_top and Q_bottom questions. The asker is the user who asked the question on the portal and the answerer is the user who answered it. On MA, EN and SE, the average reputation scores of askers in Q_top are 559.07, 546.27 and 1001.96 respectively, while those of askers in Q_bottom are 705.52, 522.78 and 644.14 respectively. On MO, GD and CH, the average reputation scores of askers in Q_top are 585.71, 401.53 and 424.68 respectively, against 540.71, 307.14 and 561.10 in Q_bottom. For answerers, the average reputations in Q_top are 4063.39, 2147.42 and 3209.59 for MA, EN and SE respectively, and those in Q_bottom are 2001.24, 1622.79 and 1846.81 respectively. For MO, GD and CH, the average reputations of answerers in Q_top are 4017.05, 1117.52 and 1948.33 respectively, while those in Q_bottom are 1965.42, 552.84 and 1397.38 respectively. We observe three things here – (i) across all platforms, answerer reputation scores are consistently higher for Q_top questions, and asker reputations are typically higher as well; (ii) on the popular platforms the reputation scores of both askers and answerers are generally higher; and (iii) answerer reputations are several times higher than those of the askers.

Asker-Answerer graph (AAG): To understand the user interaction patterns on a platform, we build an undirected graph whose nodes are the users, with an edge between two users if one (the answerer) answers a question posted by the other (the asker). For every platform, we build two such AAGs, for the users who asked or answered the questions in Q_top and Q_bottom respectively (see Table 4). We then calculate the average degree and the average clustering coefficient of the whole network. In Figure 1 (e), we observe that the average degree of the Q_top AAG is higher than that of the Q_bottom AAG for all platforms except CH. For all the platforms, the average clustering coefficient of the Q_top AAG is higher than that of the Q_bottom AAG (see Figure 1 (d)). This indicates a higher density of interactions among askers and answerers in the Q_top AAG (exceptionally so on the SE, GD and MO platforms).
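A minimal networkx sketch of this construction and of the statistics reported in Figure 1 (d, e) and Table 4; extracting the (asker, answerer) pairs from the data dump is assumed to be done already.

import networkx as nx

def build_aag(pairs):
    # pairs: iterable of (asker_id, answerer_id) tuples, one per answered question.
    g = nx.Graph()
    g.add_edges_from(pairs)
    return g

def aag_stats(g: nx.Graph):
    n = g.number_of_nodes()
    avg_degree = 2 * g.number_of_edges() / n           # Figure 1 (e)
    avg_clustering = nx.average_clustering(g)          # Figure 1 (d)
    giant = max(nx.connected_components(g), key=len)   # Table 4, last column
    return avg_degree, avg_clustering, len(giant) / n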

Datasets | # nodes | # edges | fraction of nodes in largest connected component
SE (Q_top, Q_bottom) | 10319, 12756 | 11179, 11234 | 0.73, 0.53
GD (Q_top, Q_bottom) | 8011, 9453 | 8730, 8547 | 0.72, 0.42
MA (Q_top, Q_bottom) | 106948, 110284 | 218119, 210865 | 0.92, 0.865
EN (Q_top, Q_bottom) | 19609, 20672 | 21499, 21800 | 0.702, 0.71
CH (Q_top, Q_bottom) | 5350, 5336 | 5953, 6069 | 0.79, 0.73
MO (Q_top, Q_bottom) | 5093, 6041 | 5966, 6096 | 0.87, 0.69
Table 4: Comparison of network properties of the Q_top & Q_bottom AAGs.
Datasets | # askers in Q_top | # answerers in Q_top | # askers in Q_bottom | # answerers in Q_bottom | overlap of askers | overlap of answerers
SE | 7697 | 2861 | 8933 | 4701 | 797 | 878
GD | 5959 | 2249 | 6417 | 4941 | 843 | 1905
MA | 89557 | 14072 | 88772 | 39909 | 6891 | 18397
EN | 14533 | 4542 | 15564 | 6349 | 1148 | 1241
CH | 4408 | 1180 | 4021 | 1860 | 427 | 545
MO | 4497 | 762 | 4724 | 1789 | 256 | 472
Table 5: Asker and answerer behaviour in Q_top & Q_bottom.

Cross platform users: We find around 2739 users who are present on all six platforms. The fraction of these common users who posted at least one question is around 29%-30% for MA and EN and 18% for SE; for the CH, GD and MO platforms this fraction is comparatively lower (≤ 15%). The observations are similar for the number of common users posting an answer – the popular platforms have far larger fractions compared to the less popular ones. Next we compute the number of common users who asked/answered a question in Q_top and Q_bottom. The percentages of common users asking one or more questions on MA, EN and SE are 19.82%, 17.30% and 9.71% in Q_top and 20.0%, 16.24% and 9.49% in Q_bottom respectively. The percentages of common users on these three platforms who answered one or more questions in Q_top are 9.3%, 8.72% and 6.17% respectively; in Q_bottom these percentages are 12.92%, 7.7% and 6.4% respectively. For MO, GD and CH, the percentages of common users acting as askers in Q_top are 7.84%, 6.38% and 5.69% respectively, and in Q_bottom 8.06%, 6.09% and 5.69% respectively. For these three platforms, the percentages of common users acting as answerers in Q_top are 2.7%, 4.3% and 1.8% respectively, and in Q_bottom 4.2%, 4.9% and 2.1% respectively.
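These overlap computations reduce to set intersections; a short sketch under an assumed schema (platform name mapped to a set of user ids):

def common_user_fraction(platform_users: dict, group: set) -> float:
    # Users present on every platform who also appear in 'group'
    # (e.g. the askers of Q_top questions on one platform).
    common = set.intersection(*platform_users.values())
    return len(common & group) / len(common)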

Engagement of common users: To investigate the involvement of the common users (present in Q_top and Q_bottom as answerers) in the overall platform, we compute the average number of questions/answers posted by them at an aggregate level. For MA, EN and SE, the average numbers of questions posted by common users in Q_top are 22.68, 8.5 and 3.04, and in Q_bottom 21.60, 8.74 and 3.57 respectively. For these three platforms, the average numbers of answers posted by common users in Q_top are 228.83, 47.32 and 61.01, and in Q_bottom 166.7, 52.20 and 56.74 respectively. For MO, GD and CH, the average numbers of questions posted by common users in Q_top are 11.56, 5.1 and 5.61 respectively, and in Q_bottom 9.02, 5.45 and 7.55 respectively. For these three platforms, the average numbers of answers posted by common users in Q_top are 49.81, 24.74 and 15.55 respectively, and in Q_bottom 35.11, 21.43 and 14.22 respectively. Thus (i) on most platforms, common users engage more, especially in answering, in Q_top than in Q_bottom, and (ii) quite interestingly, the common users post far more answers than questions in both Q_top and Q_bottom, which indicates that they are more engaged in answering than in asking.

4 Question Category Prediction

4.1 Classification model

We consider the previously discussed properties of the questions as features and predict whether a question will belong to Q_top or Q_bottom. The features are – the length of the question body, the Flesch reading ease, the Coleman-Liau index, the automated readability index, the number of tags, the reputation of the asker, and the clustering coefficient of the asker in the asker-answerer graph. For every dataset, we randomly divide the data into three parts – training (70%), validation (10%) and test (20%); the number of instances in each split is given in Table 6, and a sketch of this setup follows the table. For this experiment, we use four standard machine learning classifiers – (a) logistic regression (LR), (b) support vector classifier (SVC), (c) random forest (RF) and (d) XGBoost. For evaluation, we use the overall precision, recall and macro F1 score.

Datasets | Train size | Valid size | Test size
SE | 15848 | 2268 | 4525
GD | 12660 | 1812 | 3615
MA | 325153 | 46543 | 92809
EN | 31119 | 4454 | 8884
CH | 8844 | 1266 | 2525
MO | 8875 | 1270 | 2534
Table 6: Number of instances present in training, validation and testing for all the datasets.
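A sketch of the feature matrix and the 70/10/20 split referenced above; the DataFrame column names are our assumptions, mirroring the feature list in Section 4.1.

from sklearn.model_selection import train_test_split

FEATURES = ["body_length", "flesch_reading_ease", "coleman_liau_index",
            "automated_readability_index", "num_tags",
            "asker_reputation", "asker_clustering_coef"]

def make_splits(df, seed=42):
    # label: 1 for Q_top, 0 for Q_bottom.
    X, y = df[FEATURES], df["label"]
    # Hold out 30%, then split it into 1/3 validation (10%) and 2/3 test (20%).
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.30, random_state=seed)
    X_va, X_te, y_va, y_te = train_test_split(X_rest, y_rest, test_size=2 / 3, random_state=seed)
    return (X_tr, y_tr), (X_va, y_va), (X_te, y_te)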

Parameter settings: The parameter settings for each model are given below; a sketch of the resulting search grids follows the per-model settings. For SVC, we consider values of the parameter C in the range 1-3 and the kernels poly, rbf and sigmoid. For the random forest classifier, the number of estimators ranges from 160 to 200 and the criterion is gini or entropy. For XGBoost, the number of estimators ranges from 160 to 200 and the learning rate from 0.1 to 0.3.

Logistic regression (LR): For all datasets, we found the default parameters to be the best.
Support vector classifier (SVC): For CH, the best value of C is 3 and the kernel is poly. For the other datasets, the best kernel is rbf; for EN and GD, C is set to 2 and 1 respectively, and for the remaining datasets C is 3.
Random forest (RF): For CH, EN, MO and MA, the best criterion is entropy; for GD and SE, it is gini. For MO, the number of estimators is 180; for the other datasets, it is 200.
XGBoost: For all the datasets, the best learning rate is 0.1. For CH, GD and MO, the number of estimators is 160; for EN and SE, it is 180; for MA, it is 200.
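Putting these settings together, a hedged sketch of the candidate grids and validation-based selection (the exact training code is not released with the paper):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

# Candidate grids reconstructed from the parameter ranges above.
CANDIDATES = {
    "LR": [LogisticRegression(max_iter=1000)],  # default parameters
    "SVC": [SVC(C=c, kernel=k)
            for c in (1, 2, 3) for k in ("poly", "rbf", "sigmoid")],
    "RF": [RandomForestClassifier(n_estimators=n, criterion=c)
           for n in (160, 180, 200) for c in ("gini", "entropy")],
    "XGBoost": [XGBClassifier(n_estimators=n, learning_rate=lr)
                for n in (160, 180, 200) for lr in (0.1, 0.2, 0.3)],
}

def select_best(models, X_tr, y_tr, X_va, y_va):
    # Pick the candidate with the highest macro F1 on the validation split.
    def score(m):
        return f1_score(y_va, m.fit(X_tr, y_tr).predict(X_va), average="macro")
    return max(models, key=score)

# Usage: best_rf = select_best(CANDIDATES["RF"], X_tr, y_tr, X_va, y_va)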

4.2 Results

Datasets | LR (P / R / macro F1) | SVC (P / R / macro F1) | RF (P / R / macro F1) | XGBoost (P / R / macro F1)
SE | 0.67 / 0.67 / 0.67 | 0.69 / 0.69 / 0.69 | 0.71 / 0.71 / 0.71 | 0.72 / 0.72 / 0.72
GD | 0.62 / 0.61 / 0.61 | 0.63 / 0.62 / 0.62 | 0.62 / 0.62 / 0.62 | 0.63 / 0.63 / 0.63
MA | 0.69 / 0.69 / 0.69 | 0.70 / 0.70 / 0.70 | 0.69 / 0.69 / 0.69 | 0.71 / 0.71 / 0.71
EN | 0.61 / 0.61 / 0.60 | 0.61 / 0.61 / 0.61 | 0.63 / 0.63 / 0.63 | 0.64 / 0.64 / 0.64
CH | 0.59 / 0.58 / 0.58 | 0.59 / 0.58 / 0.57 | 0.59 / 0.58 / 0.59 | 0.58 / 0.58 / 0.58
MO | 0.56 / 0.56 / 0.56 | 0.56 / 0.56 / 0.56 | 0.56 / 0.56 / 0.56 | 0.57 / 0.57 / 0.57
Table 7: Overall precision (P), recall (R) and macro F1 of the question classification models for all the datasets.

In Table 7, we present the overall precision, recall and macro F1 score for all the datasets. For SE, RF and XGBoost perform better than the other models. For GD, all the models perform similarly (0.61-0.63 macro F1). For MA, XGBoost performs best (0.71 macro F1). For EN, RF and XGBoost attain better F1 scores than the other models. For CH, RF performs slightly better (0.59 macro F1) than the other models. For MO, all the models attain similar performance (0.56-0.57 macro F1).

5 Conclusion

In this paper, we study question-answering trends across six diverse CQA platforms. We find that metadata, question structure and user interaction patterns all affect response time: shorter, clearer questions with fewer tags are answered faster, and high-reputation users engage more with quickly answered questions. These question and asker features can predict fast responses, and we use standard machine learning models to classify new questions based on them. In future work, we plan to incorporate these factors into deep learning models that predict response times for new questions using text, metadata and asker features.

References

  • Anderson et al. [2012] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. Discovering value from community activity on focused question answering sites: A case study of stack overflow. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, page 850–858, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450314626. doi: 10.1145/2339530.2339665. URL https://doi.org/10.1145/2339530.2339665.
  • Bachschi et al. [2020] Timur Bachschi, Aniko Hannak, Florian Lemmerich, and Johannes Wachs. From asking to answering: Getting more involved on stack overflow, 2020.
  • Bhat et al. [2014] Vasudev Bhat, Adheesh Gokhale, Ravi Jadhav, Jagat Pudipeddi, and Leman Akoglu. Min(e)d your tags: Analysis of question response time in stackoverflow. In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), pages 328–335, 2014. doi: 10.1109/ASONAM.2014.6921605.
  • Hazra et al. [2021] Rima Hazra, Hardik Aggarwal, Pawan Goyal, Animesh Mukherjee, and Soumen Chakrabarti. Joint autoregressive and graph models for software and developer social networks. In Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, and Fabrizio Sebastiani, editors, Advances in Information Retrieval, pages 224–237, Cham, 2021. Springer International Publishing. ISBN 978-3-030-72113-8.
  • Hazra et al. [2023a] Rima Hazra, Arpit Dwivedi, and Animesh Mukherjee. Is this bug severe? a text-cum-graph based model for bug severity prediction. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part VI, page 236–252, Berlin, Heidelberg, 2023a. Springer-Verlag. ISBN 978-3-031-26421-4. doi: 10.1007/978-3-031-26422-1_15. URL https://doi.org/10.1007/978-3-031-26422-1_15.
  • Hazra et al. [2023b] Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, and Animesh Mukherjee. Duplicate question retrieval and confirmation time prediction in software communities. CoRR, abs/2309.05035, 2023b. doi: 10.48550/ARXIV.2309.05035. URL https://doi.org/10.48550/arXiv.2309.05035.
  • Mondal et al. [2021] Saikat Mondal, C M Khaled Saifullah, Avijit Bhattacharjee, Mohammad Masudur Rahman, and Chanchal K. Roy. Early detection and guidelines to improve unanswered questions on stack overflow. In 14th Innovations in Software Engineering Conference (Formerly Known as India Software Engineering Conference), ISEC 2021, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450390460. doi: 10.1145/3452383.3452392. URL https://doi.org/10.1145/3452383.3452392.
  • Moutidis and Williams [2021] Iraklis Moutidis and Hywel TP Williams. Community evolution on stack overflow. PLOS ONE, 16(6):e0253010, 2021.
  • Wang et al. [2013] Shaowei Wang, David Lo, and Lingxiao Jiang. An empirical study on developer interactions in stackoverflow. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, page 1019–1024, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450316569. doi: 10.1145/2480362.2480557. URL https://doi.org/10.1145/2480362.2480557.
  • Wang et al. [2018a] Shaowei Wang, Tse-Hsun Chen, and Ahmed E. Hassan. Understanding the factors for fast answers in technical q&a websites: An empirical study of four stack exchange websites. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 884–884, 2018a. doi: 10.1145/3180155.3182521.
  • Wang et al. [2018b] Shaowei Wang, Tse-Hsun Chen, and Ahmed E. Hassan. Understanding the factors for fast answers in technical q&a websites: An empirical study of four stack exchange websites. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 884–884, 2018b. doi: 10.1145/3180155.3182521.
  • Yazdaninia et al. [2021] Mohamad Yazdaninia, David Lo, and Ashkan Sami. Characterization and prediction of questions without accepted answers on stack overflow, 2021.
  • Zhang et al. [2021a] H. Zhang, S. Wang, T. Chen, Y. Zou, and A. E. Hassan. An empirical study of obsolete answers on stack overflow. IEEE Transactions on Software Engineering, 47(04):850–862, apr 2021a. ISSN 1939-3520. doi: 10.1109/TSE.2019.2906315.
  • Zhang et al. [2021b] Haoxiang Zhang, Shaowei Wang, Tse-Hsun (Peter) Chen, and Ahmed E. Hassan. Are comments on stack overflow well organized for easy retrieval by developers? ACM Trans. Softw. Eng. Methodol., 30(2), feb 2021b. ISSN 1049-331X. doi: 10.1145/3434279. URL https://doi.org/10.1145/3434279.