Socio-Technical Grounded Theory
for Software Engineering
Abstract
Grounded Theory (GT), a sociological research method designed to study social phenomena, is increasingly being used to investigate the human and social aspects of software engineering (SE). However, being written by and for sociologists, GT is often challenging for a majority of SE researchers to understand and apply. Additionally, SE researchers attempting ad hoc adaptations of traditional GT guidelines for modern socio-technical (ST) contexts often struggle in the absence of clear and relevant guidelines to do so, resulting in poor quality studies. To overcome these research community challenges and leverage modern research opportunities, this paper presents Socio-Technical Grounded Theory (STGT) designed to ease application and achieve quality outcomes. It defines what exactly is meant by an ST research context and presents the STGT guidelines that expand GT’s philosophical foundations, provide increased clarity and flexibility in its methodological steps and procedures, define possible scope and contexts of application, encourage frequent reporting of a variety of interim, preliminary, and mature outcomes, and introduce nuanced evaluation guidelines for different outcomes. It is hoped that the SE research community and related ST disciplines such as computer science, data science, artificial intelligence, information systems, human computer/robot/AI interaction, human-centered emerging technologies (and increasingly other disciplines being transformed by rapid digitalisation and AI-based augmentation), will benefit from applying STGT to conduct quality research studies and systematically produce rich findings and mature theories with confidence.
Index Terms:
Socio-technical grounded theory, STGT, grounded theory, GT, software engineering, research method, theory, theory development, qualitative research, data analysis, guidelines, evaluation
1 Introduction
Grounded theory is a complete research method that enables systematic and evidence-based development of theory. The emergence of the Grounded Theory (GT) method challenged the dominance of quantitative research in sociology in the mid-1960s and marked an emphatic resurgence of qualitative research. Since its introduction (Glaser and Strauss, 1967), GT has evolved into three versions: the Classic or Glaserian version by one of the founding fathers, Barney Glaser, who continued to support the original method (Glaser and Strauss, 1967; Glaser, 1992); the Strauss-Corbinian version introduced by the other founding father, Anselm Strauss, in collaboration with Juliet Corbin (Strauss and Corbin, 1990); and the Constructivist GT version, introduced by Kathy Charmaz nearly two decades ago (Charmaz, 2006).
GT is a deep dive into an area of interest or phenomenon. It is particularly suited to understudied areas, phenomena with gaps between research and practice, and for addressing complex and deep questions of how and why in addition to the what of the phenomenon being studied. While GT is said to be a general method open to quantitative data, it is predominantly applied as a qualitative method through traditional data collection techniques such as interviews and observations. GT’s unique features include rigorous qualitative data analysis procedures such as open coding and constant comparison and systematic theory development. Using an iterative, incremental, and interleaved approach to data collection and analysis, GT is one of the most agile research methods (Hoda et al., 2012a). When applied well, GT enables the researcher to undertake an up-close and in-depth exploration of a practical phenomenon and emerge with original, relevant, and parsimonious theories, distinguishing it from other qualitative methods such as case studies (Runeson and Höst, 2009), ethnography (Sharp et al., 2016), survey research, and interview based studies.
GT continues to be popular beyond its native field of sociology and is widely applied to study social phenomena in areas such as psychology (Rennie et al., 1988; Fassinger, 2005), nursing (Schreiber et al., 2001), and medical education (Kennedy and Lingard, 2006; Watling and Lingard, 2012). In the last decade, GT has become increasingly popular in software engineering (SE) to investigate its human and social aspects. It has been applied successfully to explain SE phenomena such as software process improvement (Coleman and O’Connor, 2007), self-organization (Hoda et al., 2010, 2012a), agile architecture (Waterman et al., 2015), design problems (Sousa et al., 2018), and self-assignment (Masood et al., 2020a). Table I presents a list of exemplar GT in SE studies.
However, a majority of SE researchers struggle with applying GT, as evident from a critical review of GT in SE studies over 20 years which identified major quality concerns such as studies not acknowledging the version of GT being applied, combining guidelines without rationales, and applying GT practices à la carte with no reference to particular guidelines (Stol et al., 2016). The review reported other concerns, such as only 15% of studies providing detailed accounts of their research procedures and fewer still making theoretical contributions. Such a poor showing on quality is indicative of researcher apprehension, misunderstanding, poor presentations, and extreme evaluations, in addition to possible method misuse and abuse.
And yet, these findings are not surprising. Glaser and Strauss would have hardly had the SE researcher in mind when introducing GT. The traditional GT guidelines are spread across the fundamental texts of the three GT versions (Glaser and Strauss, 1967; Glaser, 1978; Strauss and Corbin, 1990; Charmaz, 2006, 2014). They declare theory development to be exclusively interesting to and achievable by “only sociologists [who] are trained to want it, look for it, and to generate it” (Glaser and Strauss, 1967, 2017). They were written by and for sociologists using language, format, and examples native to them, making them difficult for SE researchers to access and understand. At the same time, decontextualising GT is acknowledged to lead to naïve applications (Martin, 2019), pointing to the need for contextualised GT guidelines for SE research. While exemplar studies (Table I) and dedicated papers describing GT guidelines for SE research provide good starting points (Hoda et al., 2012a; Adolph et al., 2011; Stol et al., 2016), they do not provide enough independent guidance for novice SE researchers on adapting and applying GT in SE contexts. Currently, achieving high quality applications and outcomes relies on first understanding the traditional GT guidelines and then effectively selecting and applying one of the three versions.
GT is acknowledged to be “fundamentally aimed at explaining and rendering convincing portrayals of social processes” (Timonen et al., 2018). Software engineering, on the other hand, is a fundamentally socio-technical domain (Whitworth, 2011; Storey et al., 2020). SE abounds with phenomena that are neither exclusively social nor purely technical but predominantly socio-technical (ST), where the social and technical aspects are interwoven such that studying one without due consideration of the other makes for an incomplete investigation and understanding. Requirements elicitation, user-centred design, collective estimations, pair programming, daily standups, multi-team coordination, acceptance testing, maintenance, and DevOps are examples of ST phenomena, where a complete research investigation needs to consider the full ST context. Using GT’s social science approach to studying socio-technical phenomena in SE will only go so far before adaptations and updates are required.
Indeed the most successful GT in SE studies have been those that have overcome the entry barriers of understanding traditional GT guidelines and applied them, often with implicit adaptations, to work in ST contexts. However, as evident from the GT in SE quality review, they are exceptions not the norm (Stol et al., 2016). How can SE researchers, a majority of whom struggle with basic understanding of traditional GT guidelines, be expected to select one of three versions and adapt and apply them effectively without any explicit philosophical and methodological guidance?
Critically, there are no comprehensive guidelines available for SE researchers to adapt and apply GT simply and effectively in SE’s modern ST research contexts. Reviewer proclamations of “this is not GT!” and recommendations to retrospectively rebrand attempted GT applications and adaptations as interview-based or mixed methods studies do not fix the underlying misalignment and guidance gap.
Finally, SE is the birthplace of modern data collection and analysis tools and techniques that are revolutionising research. Examples include publicly available data on social media (Storey et al., 2010) and open source repositories, and modern data collection and analysis techniques such as mining software repositories, natural language processing including sentiment analysis (Novielli et al., 2018), and artificial intelligence based tools. While traditional GT guidelines do not explicitly overrule such advancements, they do not address them either, leaving individual SE researchers struggling with ad hoc adaptations (Stol et al., 2016). SE is particularly well positioned to leverage modern software-driven tools and techniques to enhance GT practice within and beyond SE research, pointing to the need for a contemporary adaptation and update of the traditional GT guidelines so practitioners can systematically apply them.
Adaptations and updates to traditional research methods from time to time are not uncommon. For example, contextualised guidelines for other sociological research methods such as Case Studies (Runeson and Höst, 2009) and Ethnography (Sharp et al., 2016) have made it easier for SE researchers to apply and succeed with these methods. Modern updates to traditional methods, such as Virtual ethnography (Hine, 2008) and Visual ethnography (Schwartz, 1989; Pink, 2020) have provided the necessary guidance to harness modern technological advancements while leveraging the best of traditional methods. Constructivist GT provided a constructivist update to traditional GT guidelines (Charmaz, 2006).
Similarly, motivated by the needs of, and opportunities afforded by, the SE research community, I present Socio-Technical Grounded Theory that aims to ease accessibility and improve GT quality outcomes in SE by:
• defining the socio-technical research context as it applies to SE research,
• presenting simple and contextualised guidelines for conducting and evaluating STGT research,
• explicitly listing the fundamental knowledge and skills required to achieve quality outcomes,
• expanding its philosophical foundations to accommodate a range of paradigms and perspectives,
• offering the option to apply the full STGT method or STGT for Data Analysis within other methods,
• offering guidelines on harnessing modern data types, sources, collection tools, and analysis techniques,
• offering the option to select from emergent or structured theory development modes later in the process, once experience with basic data collection and analysis is gained,
• encouraging frequent reporting of a variety of interim, preliminary, and mature outcomes,
• presenting nuanced evaluation guidelines for different types of applications and outcomes.
As with any guidelines, practical applications will serve to provide feedback for further improvements.
2 Grounded Theory – Social Traditions
2.1 The Origins
Grounded Theory (GT) emerged in the sociological research scene at a time when quantitative research with its focus on objectivity, accuracy, verification, and generalisability had taken a strong foothold following similar trends in the wider research communities of the natural sciences disciplines. Qualitative research, on the other hand, was being seen as lacking scientific rigour and relying on anecdotal evidence. With its systematic and rigorous techniques and procedures, the introduction of GT challenged the status quo and served to re-establish the value of qualitative research in sociology. It challenged the overemphasis on theory verification predominant at the time and argued for theory development as a fundamental research objective.
GT originated from the research of sociologists Barney G. Glaser and Anselm L. Strauss as they studied the awareness of dying patients and the role of their social status in the level of nursing care received, reported across three texts: Awareness of Dying (Glaser and Strauss, 1966), Time for Dying (Glaser and Strauss, 1980), and Status Passage (Glaser and Strauss, 2011). The procedures and strategies employed in their qualitative research project led to the formulation of the Grounded Theory method, documented in their book “The Discovery of Grounded Theory – strategies for qualitative research” (Glaser and Strauss, 1967).
2.2 The Method
Grounded theory is a rigorous research method that enables systematic and evidence-based development of theory. GT is a complete research method, covering data collection, analysis, and more advanced steps of theory development. GT enables the investigation of social phenomena using a social lens native to sociology. The name grounded theory is derived from its focus on theory development firmly grounded in evidence collected from real-world practice.
Distinguishing features of GT include its unique focus on inductive theory development through iterative and interleaved rounds of data collection and analysis. While literature review is not strictly forbidden in GT, it is encouraged in “unrelated fields” so as not to bias theory development in the area being studied (Glaser, 1992, p. 35). A GT study begins with evidence collection from the substantive field, typically through semi-structured interviews and observations. GT offers a set of data analysis procedures such as open coding, constant comparison, and selective coding that lead to the inductive identification of key patterns, referred to as concepts and categories. Rigorous application of the GT method leads to the development of theories which encapsulate the key patterns and relationships between them.
Theory development in GT is facilitated by researcher traits such as theoretical sensitivity (Glaser, 1978; Strauss and Corbin, 1990) and procedures such as memoing and theoretical sampling, sorting, coding, and saturation. Theoretical sensitivity is the propensity of the researcher to develop theoretical codes, structures, relationships, and eventually theories. Unlike field and observational notes, which focus on documenting full accounts of observed practice in a single instance, memoing enables the researcher to explore emerging concepts and potential relationships between concepts compared across instances, and to identify gaps in the emerging theoretical structures.
Guided by the theoretical gaps, researchers can decide where and from whom to collect data in the next collection-analysis cycle and refine collection procedures such as interview questions accordingly, a process known as theoretical sampling. Theoretical sorting involves examining, arranging, and rearranging theoretical memos to enable story lines or theoretical structures to emerge. These emerging structures can be presented using best matching predefined theory templates, an optional procedure called theoretical coding. Once collecting more data reaches a point of diminishing returns, the study is said to have reached theoretical saturation. These traditional GT strategies, techniques, and procedures are described at length in the traditional texts (Glaser and Strauss, 1967; Glaser, 1978, 1998; Strauss and Corbin, 1990, 1994; Charmaz, 2006, 2014).
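The iterative collection-analysis cycle and the "diminishing returns" reading of theoretical saturation described above can be sketched in code. This is only an illustrative sketch and not part of any GT guideline: the function name `run_until_saturation`, the `collect_round` callback, and the numeric stopping rule (`min_new_codes`, `patience`) are all assumptions introduced here. In practice, saturation is a researcher judgement rather than a fixed threshold.

```python
from typing import Callable, Iterable, Set, Tuple


def run_until_saturation(
    collect_round: Callable[[int], Iterable[str]],
    min_new_codes: int = 1,
    patience: int = 2,
    max_rounds: int = 20,
) -> Tuple[int, Set[str]]:
    """Interleave collection and analysis until new codes dry up.

    collect_round(i) is a hypothetical stand-in for one
    collection-analysis cycle; it returns the codes identified in
    round i. The study is treated as saturated once `patience`
    consecutive rounds each add fewer than `min_new_codes` unseen codes.
    """
    seen: Set[str] = set()
    quiet_rounds = 0
    for round_no in range(1, max_rounds + 1):
        codes = set(collect_round(round_no))
        new_codes = codes - seen  # what this round added to the analysis
        seen |= codes
        if len(new_codes) < min_new_codes:
            quiet_rounds += 1
        else:
            quiet_rounds = 0
        if quiet_rounds >= patience:
            return round_no, seen
    return max_rounds, seen


# Made-up codes per round, purely for illustration.
rounds = {
    1: ["self-assignment", "manager-led allocation"],
    2: ["self-assignment", "volunteering"],
    3: ["volunteering"],
    4: ["self-assignment"],
}
stopped_at, codes = run_until_saturation(lambda i: rounds.get(i, []))
```

With these made-up rounds, rounds 3 and 4 add no unseen codes, so the loop stops after round 4. Theoretical sampling would sit inside `collect_round`, where the gaps found so far decide whom to approach next and which questions to refine.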
The resulting theory derived from the application of GT is said to be a mid-range theory, limited in its applicability to the contexts studied (Glaser and Strauss, 1967). However, it remains open to future modifications in light of new evidence gained from further investigations.
Table I: Exemplar GT studies in SE
Study | Title and venue |
---|---|
Coleman and O’Connor (2007) | Using grounded theory to understand software process improvement… Information and Software Technology. |
Hoda et al. (2010) | Organizing self-organizing teams, IEEE International Conference on Software Engineering. |
Dagenais et al. (2010) | Moving into a new software project landscape, IEEE International Conference on Software Engineering. |
Hoda et al. (2011) | The impact of inadequate customer collaboration on self-organizing Agile teams, Information and Software Technology. |
Hoda et al. (2012b) | Self-organizing roles on agile software development teams, IEEE Transactions on Software Engineering. |
Adolph et al. (2012) | Reconciling perspectives: A grounded theory of how people manage the process of software development, Journal of Systems and Software. |
Maglyas et al. (2013) | What are the roles of software product managers? An empirical investigation, Journal of Systems and Software. |
Jantunen and Gause (2014) | Using a grounded theory approach for exploring software product management challenges, Journal of Systems and Software. |
∗Waterman et al. (2015) | How much up-front? A grounded theory of agile architecture, IEEE International Conference on Software Engineering. |
Stray et al. (2016) | The daily stand-up meeting: A grounded theory study, Journal of Systems and Software. |
∗Hoda and Noble (2017) | Becoming agile: A grounded theory of agile transitions in practice, IEEE International Conference on Software Engineering. |
Sedano et al. (2017) | Software development waste, IEEE International Conference on Software Engineering. |
∗Sousa et al. (2018) | Identifying design problems in the source code: A grounded theory, IEEE International Conference on Software Engineering. |
Masood et al. (2020a) | How agile teams make self-assignment work: a grounded theory study, Empirical Software Engineering. |
Masood et al. (2020b) | Real world scrum a grounded theory of variations in practice, IEEE Transactions on Software Engineering. |
Shastri et al. (2021a) | The role of the project manager in agile software development projects, Journal of Systems and Software. |
Shastri et al. (2021b) | Spearheading agile: the role of the scrum master in agile projects, Empirical Software Engineering. |
2.3 The Evolution
Since its introduction in 1967, the GT method has evolved – been adapted and has changed forms over the years – through the introduction of two other versions. Nearly two decades later, one of the two founding fathers, Anselm Strauss along with Juliet Corbin proposed the first variation of GT in their book “Basics of Qualitative Research”, which came to be known as Strauss-Corbinian GT (Strauss and Corbin, 1990). The original GT method supported by Glaser is since referred to as Classic or Glaserian GT. Kathy Charmaz introduced a third version with her book “Constructing Grounded Theory” (Charmaz, 2006) a decade and a half later.
While Glaser maintained that theory emerges naturally from the underlying data (Glaser, 1978), Strauss and Corbin introduced a more prescriptive and structured way to derive theory (Strauss and Corbin, 1990). A distinguishing feature of the Strauss-Corbinian approach was the introduction of axial coding, a “process of systematically relating categories and sub-categories”. For this, they recommended the use of analytical tools such as the coding paradigm, a sort of template structure of a theory with consideration of the context, causal conditions, intervening conditions, action-interaction strategies, and consequences related to the phenomenon. Using such tools (e.g. coding paradigm or conditional matrix) is likely to result in the theory being structured sooner in the study as compared to Classic GT, and verified through deductive approaches in the later parts of the study. Its structured approach is a reason why some researchers prefer Strauss-Corbinian GT (Masood et al., 2020a, b).
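One way to picture the coding paradigm as a "template structure" is as a record with one slot per component named above. The sketch below is an illustration only and is not taken from Strauss and Corbin: the class name `CodingParadigm`, the helper `unfilled_components`, and the example categories are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CodingParadigm:
    """Template for one phenomenon, with one slot per paradigm component."""

    phenomenon: str
    context: List[str] = field(default_factory=list)
    causal_conditions: List[str] = field(default_factory=list)
    intervening_conditions: List[str] = field(default_factory=list)
    action_interaction_strategies: List[str] = field(default_factory=list)
    consequences: List[str] = field(default_factory=list)

    def unfilled_components(self) -> List[str]:
        """Names of components that have no categories assigned yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "phenomenon" and not getattr(self, f.name)
        ]


# Hypothetical categories, for illustration only.
paradigm = CodingParadigm(
    phenomenon="self-assignment",
    context=["agile teams"],
    action_interaction_strategies=["volunteering for tasks"],
)
gaps = paradigm.unfilled_components()
```

Here `gaps` lists the causal conditions, intervening conditions, and consequences as still empty. This makes visible why the template tends to structure the theory earlier: the empty slots prompt the researcher to relate categories to them.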
However, Glaser strongly objected to this variation as a “whole different method” and to the resulting theory as “forced” rather than emergent, a position captured in his rejoinder “Basics of Grounded Theory Analysis: Emergence vs. Forcing” (Glaser, 1992). Meanwhile, after Strauss passed away in 1996, Corbin continued to publish their co-written second edition and a third edition while maintaining their joint position.
The next chapter in the evolution of GT was marked by a third version formulated by Kathy Charmaz in her book “Constructing Grounded Theory” (Charmaz, 2006). Charmaz described the research paradigms underlying Glaserian and Strauss-Corbinian GT as primarily positivist and the latter with some pragmatic influences. Charmaz proposed Constructivist GT, a constructivist approach to GT, highlighting the role of researcher in subjectively interpreting and constructing reality as opposed to an objectivist stance where the researcher is seen as neutral, objectively observing and rendering reality (Charmaz, 2014). Charmaz’s plenary presentation in 1993 “caused quite a stir” with opinions prominently divided along gender lines. Though initially supported mainly by women (Charmaz, 2014), Constructivist GT subsequently had many takers, becoming a prominent version of GT. Dedicated articles discussing the differences between the versions in-depth can be consulted in (Babchuk, 1996; Heath and Cowley, 2004; Kenny and Fourie, 2014).
3 GT in Software Engineering
3.1 The Promise
Theories have the potential to be useful to both research and industry (Sjøberg et al., 2008). SE researchers interested in theory development will benefit from using GT, which promises theory development as its core feature. GT can help researchers lay down the theoretical foundations of SE, the youngest engineering discipline (Ebert, 2018), explaining its unique landscape, practices, challenges, and strategies.
Being an empirical method, GT promises to combine research rigour with practical relevance, described as a ‘grand challenge’ (Gregory et al., 2016). Theories developed using GT have the potential to lay the foundations for better understanding SE phenomena, present recommendations and guidelines, and motivate tool development. For example, a GT study on the role of ethics in artificial intelligence can lead to theories that explain how developers perceive ethics, the socio-technical barriers to embedding ethics in AI, and enabling strategies. Practical applications can then be drawn in terms of practitioner guidelines for embedding ethics in AI. Similarly, grounded theories can establish rich theoretical foundations of the problem space, leading to development of software tools and techniques.
Like most methods, GT has a sweet spot – an ideal context where it is most likely to succeed and indeed where it makes most sense to use. The application of GT promises to benefit SE studies that fall within the traditional GT sweet spot.
• Human and social aspects focus. Originating from sociology, it is no surprise that GT is aptly suited to studying human and social aspects of SE.
• Theory development. GT is particularly suited to studying areas lacking existing theories, those with research gaps, and where existing research fails to resonate with practice and can benefit from new theories grounded in empirical evidence.
• Practice-based topics. If the research topic is practice-based or industry relevant, it is more likely to lead to a successful GT because it increases the possibility of finding enough evidence from practice and for the researcher to conduct field observations.
• Complex and deep questions. While GT can be employed to answer the what type questions, it is best used to explore and answer more complex and deep issues through the how and why type questions.
• Primarily qualitative studies. While GT is described as a general research method (Glaser and Strauss, 1967) capable of incorporating both qualitative and quantitative data, there is limited guidance on achieving this. Practically, GT is most often applied as a qualitative research method due to its rigorous qualitative data analysis procedures and results in what Storey et al. (2020) refer to as descriptive knowledge using their design science approach to categorising research outcomes.
• Data collected through interviews and observations. While GT is open to including all types of data, interviews and observations form the most commonly used data sources in a traditional GT study.
On the other hand, research contexts relying primarily on quantitative data, exploring simple questions that can be answered using descriptive analysis, and those not interested in deriving theoretical frameworks, theories, or models, do not require a GT approach. Unsurprisingly, a majority of GT studies in software engineering in the last decade fall within the traditional GT sweet spot.
Traditional GT Context and Limitations | Associated SE Challenges | Socio-Technical GT Advantages |
---|---|---|
Traditional GT guidelines were written by and for sociologists. | #EagerButNotEquipped | STGT method is written by and for ST researchers using format, language, and examples native to SE. |
Traditional GT guidelines are spread across multiple books and three versions: Glaserian, Strauss-Corbinian, Constructivist. | #NoVersionControl | STGT guidelines present one research method, offering flexibility in its application. |
Traditional GT guidelines expect the researcher to understand all three versions and select one upfront. | #NoVersionControl | STGT delays the decision about theory development modes, emergent or structured, after some experience is gained. |
Traditional GT guidelines assume that the researcher is aware of and possesses the requisite knowledge and skills for conducting successful GT studies. | #EagerButNotEquipped #PoorPresentation #ExtremeReviews | STGT guidelines explicitly spell out the fundamental knowledge and skills for conducting successful STGT studies. |
Traditional GT guidelines focus on theory generation. | #ScopeConfusion #ExtremeReviews | A full STGT study can be used to generate theories, and STGT for Data Analysis can be used for data analysis only within other research frameworks, e.g. case study. |
Traditional GT methods were designed to study social phenomena. | #DIYGoneWrong #ExtremeReviews | STGT method enables the investigation of socio-technical phenomena and domains. |
Traditional GT guidelines were designed to use traditional data types, sources, and collection techniques. | #DIYGoneWrong #ExtremeReviews | STGT method is designed to leverage traditional and modern data types, sources, and collection techniques. |
Glaserian and Strauss-Corbinian GT were designed to be standalone methods. | #DIYGoneWrong #ExtremeReviews | STGT method can be applied standalone or in combination with other methods and techniques. |
Traditional GT guidelines did not have to consider the need to share detailed evidence. | #EvidenceGateKeeping #TheFacade | STGT method provides guidelines on presenting sufficient sanitised evidence and examples to establish credibility and enable proper evaluation. |
3.2 The Progress
Both the SE discipline and the GT research method are about half a century old and still maturing in practice. The two have continued to progress in parallel with their paths crossing over as SE researchers discovered and attempted GT. The use of GT has gained steady acceptance in SE research, particularly in the last decade (Stol et al., 2016).
The growing appeal of GT in SE research is being propelled on the one hand by theory development efforts (Stol et al., 2016) and on the other, by the rise of human and social aspects in SE (Prikladnicki et al., 2013) and agile software development (Hoda et al., 2018). GT has been used to study a variety of human and social aspects of SE. Some exemplar GT studies in software engineering that describe both their method application and the resulting theory in sufficient detail are listed in Table I. Dedicated SE papers providing guidance on traditional GT methods include Adolph et al. (2011), Hoda et al. (2012a), and Stol et al. (2016).
3.3 The Challenges
A critical review of GT in SE studies identified increasing adoption over the last two decades but raised serious quality concerns (Stol et al., 2016). Of the final 98 journal papers analysed, half explicitly claimed to perform GT while only a third provided details of their method application. These were seen to apply guidelines primarily from the Glaserian or Strauss-Corbinian versions. A majority of studies did not acknowledge the version of GT being employed. Some combined guidelines from different versions without rationale. Several were seen to apply practices à la carte with no reference to any guidelines. Only five individual studies were found to be comprehensive and detailed in their GT application description, serving as exemplars: Coleman and O’Connor (2007), Hoda et al. (2011), Adolph et al. (2012), Hoda et al. (2012b), and Jantunen and Gause (2014).
Over fifteen years of experience in conducting, supervising, reviewing, and editing GT in SE studies suggests these widespread issues are not always intentional method misuse or abuse. Method slurring, that is, making false claims of applying method guidelines (Baker et al., 1992), is but one reason quality issues arise. Many genuine attempts fail or lead to poor results and harsh reviews, indicating that researcher apprehensions, misunderstandings, poor presentations, and extreme evaluations underlie the poor quality. These are captured in the following patterns of GT in SE challenges.
#EagerButNotEquipped – When software engineering researchers are interested in using GT but struggle to understand and apply it. The traditional GT texts are written by and for sociologists and are naturally challenging for SE researchers, many of whom have little or no sociological or qualitative research background. Often the traditional texts start to make sense after preliminary data collection and analysis, as researchers connect the guidelines with their lived experiences. While the exemplar GT studies in SE provide application details (Table I), they are not complete methodological guidelines for the novice researcher. For the vast majority of SE researchers, GT remains largely inaccessible and challenging. Not all SE researchers need to or may want to develop theories, but for those who are interested, GT should be accessible.
#NoVersionControl – When software engineering researchers are unaware or unclear about the different versions of traditional GT and how to apply them in practice. Despite in-depth attempts to delineate between the different versions of GT (Babchuk, 1996; Heath and Cowley, 2004; Kenny and Fourie, 2014), researchers continue to apply traditional GT versions poorly (Stol et al., 2016). For the novice GT researcher, understanding any one version is challenging enough. Expecting an understanding of all three, the differences between them, and selecting one upfront, further raises the entry barrier.
#ScopeConfusion – When software engineering researchers find it difficult to distinguish between conducting a full GT study and using GT data analysis techniques alone. GT’s data analysis procedures of open coding, constant comparison, and selective coding are popular in qualitative research. Sometimes SE researchers use these GT data analysis techniques within some other research framework and present the result in a way that claims a full GT study, while the other key GT procedures of iterative and interleaved data collection and analysis, memoing, theoretical sampling, sorting, coding, and theoretical saturation are clearly missing.
#EvidenceGateKeeping – When software engineering researchers find it difficult to support their GT findings with adequate evidence without compromising ethics. There is an increasing push for open science and data transparency in the SE research community. Typically, the raw data underlying a GT study, such as interviews and observational field notes, cannot be shared outside of the core research group because of the confidentiality and anonymity clauses of the governing human ethics. Many GT studies err on the side of providing no evidence of raw data and analysis, leading to credibility concerns.
#TheFacade – When software engineering researchers do not apply GT carefully but claim to do so. Sometimes, SE researchers claim to apply GT but their descriptions of methodology and outcomes suggest poor and rushed applications. The increasing pressure to publish, both system permeated (e.g. tenure or promotion driven) and self-inflicted (e.g. peer pressure, international rankings, quasi-gamification of researcher profiles), manifests as increasing trends toward achieving publication quantity over quality. While not exclusive to GT, such trends encourage shallow research studies focusing on simple problem spaces and low-hanging fruit, instead of addressing complex problems through deep and meaningful work that necessitates careful method application. So long as the research community does not acknowledge and reverse these trends, method misuse and abuse will likely continue.
#PoorPresentation – When software engineering researchers struggle to present GT studies to a high standard. In contrast to #TheFacade, this pattern captures scenarios where the method was well applied but the outcome is poorly presented. Tedious presentation of the findings reading like endless walls of text and lack of clear summaries of the theoretical and practical contributions, pertinent quotes, and evaluation make for poor presentations.
#DIYGoneWrong – When software engineering researchers attempt to adapt traditional GT but lack guidelines to do so. SE researchers often attempt to combine steps and procedures across GT versions. Such ad hoc variations are often poorly executed. While some of these are rightly labelled method slurring, other well-intended adaptations are reviewed harshly because of the absence of philosophical and methodological guidelines for adaptations.
#ExtremeReviews – When software engineering reviewers are extreme, overly trusting or harsh, in their review of GT studies. Reviewers can be overly trusting based on some checklist that authors can demonstrate having satisfied and a seemingly reasonable looking theory. Passing poorly done GT studies as acceptable does not help improve quality. More often, reviewers are overly harsh in their evaluations because they do not fully understand GT, evaluate it using criteria fit for quantitative methods and positivist research paradigms (e.g. reproducibility, replicability), are unconvinced about the need for adaptations, and expect textbook applications of traditional GT methods using a checklist approach. Dismissing attempts at applying GT as “this is not GT” or recommending a post facto re-branding to case study or interview-based study is not constructive and is detrimental to future GT attempts.
3.4 Misalignment and Need for Evolution
“You can’t use an old map to explore a new world” – Albert Einstein
What makes successful GT practice so challenging in SE? At the heart of these challenges lies the fundamental misalignment of traditional, sociological GT guidelines to study software engineering’s socio-technical research context.
While traditional GT methods have served well to study the human and social aspects of SE, successful GT studies continue to be exceptions, not the norm in SE research (Stol et al., 2016). Designed in the mid 1960’s by and for sociologists to study social phenomena, understandably, traditional GT methods are not well equipped to fully address SE’s socio-technical research context. As a first step to adapting GT for SE, the key limitations of traditional GT methods and the new opportunities afforded by SE need to be acknowledged. For example, traditional GT guidelines are spread across multiple books and three different GT versions, leading to the #NoVersionControl challenge. SE researchers will benefit from having a single GT method, offering flexibility in its application. Similarly, SE researchers are keen to explore modern research techniques such as sentiment analysis within GT, with some success (Madampe et al., 2020), but it can easily lead to #DIYGoneWrong. SE researchers will benefit from modern GT guidelines that explain the use of modern data types, sources, collection, and analysis techniques. Table II provides a set of such contextual limitations of traditional GT guidelines, their mapping to the associated SE challenge patterns, and how these are addressed in the STGT guidelines (described in the later sections).
Traditional social science research methods have been successfully ‘translated’ and explained for the SE research community. Runeson and Höst (2009) provided guidelines for conducting and reporting Case Studies in SE research, while Sharp et al. (2016) explained the role of Ethnography for SE contexts, making it easier for SE researchers to apply social science methods to studying SE phenomena. Other efforts, outside of SE, to update traditional social science methods for modern times include Virtual ethnography, a modern update to traditional ethnography to cater to “novel social spaces” (Hine, 2008) and Visual ethnography (Schwartz, 1989; Pink, 2020).
Similarly, a modern, contextualised update to the traditional GT guidelines will enable SE researchers to conduct socio-technical GT research with confidence, harnessing the best of traditional and modern research opportunities, tools, and techniques. The first step in this direction is to define the new world – the socio-technical research context (section 4), followed by describing the new map – the socio-technical grounded theory (STGT) method (sections 5 to 7).
4 Socio-Technical Grounded Theory
Socio-Technical Grounded Theory (STGT) is an iterative and incremental research method for conducting socio-technical research using traditional and modern research techniques to generate novel, useful, parsimonious, and modifiable theories. Distinguishing features: interleaved rounds of basic data collection and analysis, emergent or structured mode of theory development through advanced data collection, analysis, and theory development procedures, using primarily inductive but also deductive and abductive reasoning.
Motivated by the common challenges of the software engineering research community in understanding and practicing traditional GT guidelines (described in section 3.3) and with the aim to update traditional GT guidelines for conducting socio-technical research (defined next, in section 4.1), I present Socio-Technical Grounded Theory.
STGT is an iterative and incremental research method for conducting socio-technical research using traditional and modern research techniques. It is iterative because it involves interleaved cycles of data collection and analysis that inform and support each other. It is incremental because each cycle progresses the research project toward theoretical outcomes.
The STGT guidelines include its socio-technical research context (section 4.1), philosophical foundations (section 4.2), methodological steps and procedures (sections 5 and 6), its application, outcomes, and reporting (section 7), and evaluation (section 8). Table IV presents an overview of the STGT guidelines while comparing them to traditional GT guidelines.
4.1 Socio-Technical Research
To understand the underlying foundations of STGT, it is important to first define what is meant by socio-technical research. Figure 1 shows the socio-technical research framework for grounded theory, with its four dimensions:
• Phenomenon – what is being studied? what is the level of interplay between the social and technical aspects?
• Domain and actors – in which field or discipline does the phenomenon occur? who are the actors?
• Researcher – who is conducting the research? what are their knowledge and skill sets?
• Data, tools, and techniques – what is the nature of data being collected? what tools and techniques are being used?
These dimensions capture a research landscape or context that is fundamentally different to the native contexts for which traditional GT methods were originally designed.
Socio-Technical Phenomenon
Software engineering abounds with socio-technical phenomena, where human and technological interactions are so tightly coupled that studying one without the other makes for an incomplete investigation and understanding. For example, the core SE practice of programming is not strictly technical; rather, it is socio-technical, involving intensive human-human collaboration and coordination and human-technology interactions. Similarly, most SE practices, such as pair programming, designing prototypes, customer collaboration, software deployment, maintenance, and DevOps, are socio-technical in nature.
Research communities such as human-computer interaction (HCI), computer-supported cooperative work (CSCW), and cooperative and human aspects of software engineering (CHASE) have been at the forefront of exploring the human and social aspects. Most topics covered by these areas involve socio-technical, rather than strictly social, phenomena. More recently, with the rise of artificial intelligence (AI), there is renewed interest in gaining in-depth understanding of human factors (Hidellaarachchi et al., 2021), human values (Perera et al., 2020; Hussain et al., 2020), and the human-in-the-loop in order to build more realistic, responsible, and usable AI-enabled systems.
Some contemporary phenomena and future topics that are particularly suited for STGT studies in SE/AI and other socio-technical domains include:
• understanding human aspects (e.g. personality, gender, age, emotions, motivation, etc.) in SE/AI,
• understanding human values (e.g. ethics, privacy, safety, security, morality, etc.) in SE/AI,
• understanding social aspects (e.g. collaboration, coordination, formation etc.) in SE/AI,
• role of technology (e.g. social media, virtual/remote collaboration tools) in crisis management, remote work, online education, political campaigns, gaming, etc.,
• human- and social-centered design of inclusive and responsible SE/AI systems,
• understanding socio-technical aspects of specific SE practices, e.g. requirements elicitation, user-centered design, estimation, pair/mob programming, etc.
Some meta-research topics that will benefit from STGT studies include:
• nature of human reality in the digital/AI age (ontology),
• role of researcher in studying ST research (epistemology),
• ethics in SE research,
• credibility in SE research,
• practical/industrial relevance in SE research,
• bias in SE reviewing (e.g. applying inappropriate evaluation criteria: quantitative vs qualitative, research paradigm),
• enabling interdisciplinary research teams.
Acknowledging these phenomena as socio-technical, in SE and other domains, is the first step to enabling more complete and comprehensive interpretations, constructions, and rendering of modern human realities.

Socio-Technical Domain and Actors
With the increasing proliferation of technology and digitalisation, most domains are becoming socio-technical to varying degrees, e.g. banking, medicine, education, and retail. Domains such as SE, computer science, human computer interaction (HCI), artificial intelligence, and information systems are inherently and inextricably socio-technical, with tight coupling between their social and technical aspects. Domains such as SE do not simply represent a user base of technology; they are the birthing cradle of information technology and systems. Technology is not only a prominent enabler or feature in these domains, it is the core raison d’être for organisations, e.g. for software companies.
Actors, people who play key roles in the phenomenon, in these domains are not regular users of technology. For example, software engineers, data scientists, computer scientists – the actors in the SE domain – are the creators and maintainers of software technology, tools, and platforms that are increasingly becoming indispensable to all domains. Often referred to as ‘geeks’, they display their own unique language, interactions, culture, worldviews, and social norms.
Software engineers are particularly interesting to study, as they are both the producers and users of technology. They are not only the architects of the software platforms that enable multiple realities, they too function and interact within combinations of these worlds. For example, they use a combination of real and digital workspaces, artefacts, and communication mechanisms, such as GitHub, physical Scrum task boards, online tools such as Trello or JIRA, physical and virtual pair programming, Slack channels for team communication, and face-to-face and Zoom meetings. All these involve intensively socio-technical interactions.
More generally, depending on the domain and the phenomenon, the actors being studied may be software engineers including developers, business analysts, testers, senior managers, or software users, including classroom teachers, spacecraft designers, satellite developers, government officials, doctors, nurses, psychologists, artists. Additionally, STGT studies may involve actors that are human, virtual avatars, or AI agents, depending on the nature of the realities where phenomena occur and are studied.
Socio-Technical Researcher
A socio-technical researcher has the requisite knowledge and skills from social sciences, philosophy, qualitative research, and theory development as well as technical background and/or experience applicable to the domain. Lack of sociological background is seen as hampering grounded theories (Martin, 2019; Gibson and Hartman, 2013). This does not imply that an ST researcher needs to be an expert in each of these areas. Neither do they have to be one person. To conduct quality STGT studies, a socio-technical researcher, or an interdisciplinary team of researchers, should typically possess a core repertoire of knowledge and skills:
• philosophical foundations – understanding fundamental philosophical concepts such as types of reasoning, ontology, epistemology, and research paradigm. A basic overview of these concepts is presented in section 4.2.
• qualitative research – designing research protocols, ethics assessment/approval, collecting and analysing qualitative data.
• theoretical sensitivity – affinity for theoretical abstraction, conceptualisation, and theory development.
• socio-technical sensitivity – an affinity for exploring and understanding the intricacies of human interactions and socio-technical aspects of phenomena.
• domain knowledge – an understanding of the research domain, typically provided by a subject matter expert.
When equipped with relevant background in social sciences and theory development, SE researchers become socio-technical researchers. Those successful in conducting quality GT studies in SE so far (Table I) have applied their repertoire of knowledge and skills gained from social sciences, philosophy, theory development, and SE. Researchers lacking the requisite knowledge and skills struggle to achieve quality research outcomes (Storey et al., 2020).
Similarly, sociologists with the required technical background are well placed to be socio-technical researchers, conducting STGT studies of phenomena in which technology plays a prominent and interwoven role. An inter-disciplinary team of software engineering researchers or technologists, sociologists, and domain experts (e.g. banking and finance researchers for studies in the banking domain) can also make for an effective cross-functional socio-technical research team to study an increasingly socio-technical world. The importance of inter-disciplinary and trans-disciplinary research teams is not only relevant to STGT studies, but is also being acknowledged in empirical software engineering in general (Fernández and Passoth, 2019).
Socio-Technical Data, Tools, and Techniques
Socio-technical researchers studying socio-technical phenomena, domains, and actors can leverage a range of traditional and modern research data, tools, and techniques.
Traditional data types, sources and collection techniques such as interviews, observations, and field trips offer rich data and insights. However, since the introduction of GT about half a century ago, a number of modern research data, tools, and techniques have emerged. While traditional GT guidelines are not averse or impervious to them, they understandably do not provide explicit guidance or examples of how best to harness them.
SE researchers are at the forefront of the technological revolution in research. Modern research data, tools, and techniques have given rise to entirely new research communities, e.g. mining software repositories. Modern types and sources of data, such as publicly available software code repositories with commit messages, source code, documentation, and wikis regularly used in SE research, make for potent new avenues for theory development. Especially when combined with modern techniques such as data mining and natural language processing, they can be used to extract large amounts of data, hitherto impossible with manual techniques. However, easy access to publicly available datasets and powerful search tools can make it tempting to practise ‘data-driven research’, where easy access to large datasets precedes and motivates research questions and studies that are often trivial. In contrast, the use of publicly available, large datasets in STGT is envisioned as a mechanism to scale theories to cover wider contexts of use, while still being firmly grounded in empirical evidence and following systematic and rigorous STGT steps and procedures to iteratively and incrementally derive key patterns from underlying data.
Additional sources of data such as research literature and grey literature are also being explored for theory development (Martin, 2019). In fact, Wolfswinkel et al. (2013) have presented a GT-based approach to conducting literature reviews, called the grounded theory literature review (GTLR). Future STGT studies in SE can explore these avenues to create modern renditions of grounded theories.
Furthermore, traditional GT analysis techniques, such as open coding and constant comparison, can be complemented with modern analysis techniques such as sentiment analysis (Calefato et al., 2018), e.g. to study emotions in software engineering. Currently, such innovations are limited to individual studies (Madampe et al., 2020). In order to influence improvements in GT practice in SE research at scale, the use of modern data, tools, and techniques needs to be acknowledged and documented as method guidelines.
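As a rough illustration of how sentiment scores might sit alongside manually assigned open codes, consider the minimal Python sketch below. It is not the technique used in the cited studies: the word lists, the example excerpts, and the function names `polarity` and `sentiment_by_code` are all hypothetical, and a real study would presumably rely on a validated, SE-specific sentiment tool rather than a toy lexicon.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"enjoy", "helpful", "confident", "smooth"}
NEGATIVE = {"frustrated", "stressful", "confusing", "blocked"}


def polarity(text: str) -> int:
    """Return (number of positive hits) minus (number of negative hits)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_code(excerpts):
    """Group polarity scores under the open codes assigned by the researcher."""
    scores = defaultdict(list)
    for code, text in excerpts:
        scores[code].append(polarity(text))
    return dict(scores)


# Hypothetical (open code, interview excerpt) pairs.
EXCERPTS = [
    ("requirements change", "I felt frustrated and blocked when the scope moved again."),
    ("requirements change", "The late change was stressful."),
    ("pair programming", "Pairing was helpful and I felt confident afterwards."),
]

RESULT = sentiment_by_code(EXCERPTS)
print(RESULT)
```

The scores do not replace coding; in this sketch they merely give the researcher an additional signal to compare against the codes during constant comparison.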
Future STGT studies can explore the use of cutting edge tools and techniques such as artificial, virtual, augmented, and mixed reality devices and platforms to achieve immersive and experiential data collection in natural and alternative realities. The improving efficacy of automation tools promises to ease and augment many research steps such as transcriptions, data collection, and even analysis.
TYPES OF REASONING (AKA INFERENCE OR LOGIC)

| Reasoning | Description | Example | Manifestation in STGT |
|---|---|---|---|
| Induction | A bottom-up approach of drawing general conclusions from patterns evidenced across specific instances. Strength of evidence improves certainty of the conclusion that is localised not universal. | A sparrow in a city is observed with brown-black-white feathers, another sparrow is observed with similar coloured feathers, and another, until the observer finds a pattern amongst a number of sparrows and concludes that all sparrows in the city are likely to have brown-black-white feathers. | Open coding, targeted coding, and constant comparison. |
| Deduction | A top-down approach of drawing conclusions about specific instances from general facts or theories. The conclusion can be verified through repeated testing, giving rise to the term hypothetico-deductive approach, and is guaranteed certainty until proven false. | All birds have beaks, a Pigeon is a bird, so a Pigeon has a beak. | Theoretical sampling. |
| Abduction | A ‘detour’ approach of proposing the best possible explanation based on all the evidence, albeit incomplete, available. Abduction plays an important role in theory development. | If bird feathers are spotted on the doormat and the family cat is seen with some feathers on their paws, the most likely explanation is that the cat ate the bird. | Memoing, theoretical structuring, theoretical integration. |

ONTOLOGY, EPISTEMOLOGY, & RESEARCH PARADIGMS

| Concept | Definition | Type & Example | Manifestation in STGT |
|---|---|---|---|
| Ontology | Ontology refers to what we believe exists or what we perceive as reality, in a research context. | Only directly observable material bodies are real, or both material and immaterial forms are real, or reality includes socially constructed concepts such as identities and beliefs. | STGT enables the study of physical, virtual, combined, and simulated realities. |
| Epistemology | Epistemology refers to what we treat as knowledge and how we can know about what we perceive as reality, in a research context. | A subjectivist epistemology holds that knowledge about actors & phenomena is subjectively constructed through inputs from the participants by non-interchangeable researchers. An objectivist epistemology holds that knowledge about objects & phenomena is derived through neutral measurements or observations by interchangeable researchers. | Subjective epistemology for physical, virtual, and combined contexts. Objective epistemology possible for some simulated contexts. |
| Research Paradigm | Research paradigm refers to the combination of ontological and epistemological stances that informs the researcher’s worldview. | Constructivism aligns with the concept of a socially constructed reality that is subjectively co-constructed. Positivism aligns with the concept of a standalone external reality that can be objectively observed. | Constructivism with Symbolic interactionism in physical, virtual, & combined contexts. Positivism in simulated contexts. Combinations possible (e.g. constructivist-feminist GT) |
4.2 STGT – Philosophical Foundations
This section presents the fundamental philosophical concepts requisite to conducting quality STGT studies, including types of reasoning, ontology, epistemology, and research paradigms. These are based on Peircean views (Peirce, 1960; Burch, 2018; Douven, 2017), more modern philosophical views (Guba and Lincoln, 1994), and relevant software engineering literature (Russo and Nuseibeh, 2001; Easterbrook et al., 2008; Wohlin and Aurum, 2015; Seaman, 1999), and are presented in an accessible way for researchers non-native to social science. Table III presents an overview of these key concepts including simple definitions, examples, and how they are manifested in research. Table IV presents a comparison of STGT with traditional Glaserian, Strauss-Corbinian, and Constructivist GT, across their philosophical foundations (second block). Understanding these fundamental philosophical concepts is critical to performing quality research studies and evaluations, especially STGT studies.
Types of Reasoning
GT challenged the top-down hypothetico-deductive approach to verifying existing theories dominating the sociological research scene in the mid-1960’s and steered it toward a bottom-up inductive approach to generating new and original theories. In a bid to delineate its stance, traditional GT overemphasises the role of induction but classifying GT as purely inductive is naïve (Haig, 1995). Different types of reasoning, inductive, deductive, and abductive are seen to apply in theory development. Please refer to Table III for their simple definitions and examples.
The role of deduction in guiding theoretical sampling is acknowledged in Glaserian GT (Glaser and Strauss, 2017, chapter II). Strauss-Corbinian GT suggests a greater role of deduction in the later stages of a GT study as the emergent theory is systematically verified against the data (Strauss and Corbin, 1990) while Glaserian GT is strongly opposed to this approach, claiming it as “forcing” rather than emergence of theory (Glaser, 1992).
On the other hand, the role of sudden or slow-dawning “sensitive insights of the observer” is acknowledged as “the root sources of all significant theorizing” (Glaser and Strauss, 2017, chapter XI). While the descriptions of these “insights” are close to the definition of abduction by Charles Sanders Peirce who defines it as “a process of forming an explanatory hypothesis” (Douven, 2017; Haig, 1995), Glaserian GT does not explicitly acknowledge the role of abduction in theory development. Strauss-Corbinian GT is said to have been influenced by abduction but its reference remains limited to a footnote in Strauss’ book (Charmaz, 2014; Strauss, 1987).
Charmaz acknowledges abduction as a type of “inferential leap” required of the GT researcher when they arrive at a surprising finding that does not fit the emergent patterns (Richardson and Kramer, 2006; Charmaz, 2014). Using this “imaginative way” of reasoning, the researcher comes up with useful explanations (theoretical conjectures or inferences) in an attempt to account for the surprising finding, then returns to re-examine or collect more data to check them (Charmaz, 2014).
The role of abduction in developing grounded theories remains to be fully explained (Bruscaglioni, 2016; Martin, 2019). Understanding different abductive reasoning modes such as reason or hunch, clue, metaphor or analogy, symptom, pattern, and explanation (Shank, 1998) opens a variety of avenues for creative thinking, which is core to theorising and theory development (Martin, 2019). For example, abduction is the basis for diagnosis in medical sciences, where plausible hypotheses are explored and then accepted or discarded based on how well they ‘fit’, i.e. explain the symptoms being observed. Criminal investigations also offer examples of the three types of reasoning, particularly abduction in action, most prominently demonstrated in the engaging pursuits of the fictional character Sherlock Holmes, but also referred to by Peirce from a personal ‘detective’ experience (Reichertz, 2007). Similarly, abduction is applied in research for theory development to suggest hypotheses based on evidence. Through iterative data collection, analysis, and memoing, these hypotheses fall through or become established over the course of the study. Additionally, coming up with robust hypotheses about a socio-technical phenomenon requires a full understanding of the socio-technical context.
STGT, like traditional GT, follows a predominantly inductive overarching approach, drawing generalised conclusions from patterns evidenced across specific instances. Induction in STGT is most clearly manifested in its open and targeted coding and constant comparison procedures, where evidence is repeatedly raised in levels of abstraction toward a theory. STGT acknowledges and explains the role of deduction as applied in theoretical sampling and, unlike traditional GT (see Table IV), the role of abduction as applied in memoing, theoretical structuring, and theoretical integration (procedures described later).
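For readers more at home with code than with philosophy, the three types of reasoning can be loosely mimicked with the bird examples from Table III. The Python sketch below is a toy analogy only, not an implementation of any STGT procedure; the function names `induce`, `deduce`, and `abduce`, the data, and the second hypothesis are hypothetical.

```python
from collections import Counter


def induce(observations):
    """Induction: generalise the most common pattern across specific instances.

    Returns the pattern and the share of observations supporting it.
    """
    pattern, count = Counter(observations).most_common(1)[0]
    return pattern, count / len(observations)


def deduce(rule, fact):
    """Deduction: apply a general rule (category -> property) to one instance."""
    rule_category, prop = rule
    instance, category = fact
    return (instance, prop) if category == rule_category else None


def abduce(evidence, hypotheses):
    """Abduction: pick the hypothesis that explains the most available evidence."""
    return max(hypotheses, key=lambda h: len(evidence & hypotheses[h]))


# Induction: repeated sparrow sightings suggest a localised generalisation.
INDUCED = induce(["brown-black-white"] * 5)

# Deduction: all birds have beaks, a pigeon is a bird, so a pigeon has a beak.
DEDUCED = deduce(("bird", "has a beak"), ("pigeon", "bird"))

# Abduction: best available explanation for incomplete evidence.
EVIDENCE = {"feathers on doormat", "feathers on cat's paws"}
HYPOTHESES = {
    "the cat ate the bird": {"feathers on doormat", "feathers on cat's paws"},
    "a bird moulted nearby": {"feathers on doormat"},  # hypothetical alternative
}
ABDUCED = abduce(EVIDENCE, HYPOTHESES)

print(INDUCED, DEDUCED, ABDUCED)
```

Note that only `deduce` yields a conclusion that follows with certainty from its inputs; `induce` and `abduce` return the best-supported candidate, which further data may overturn.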
Ontology
Ontology refers to what we believe exists or what we perceive as reality, in a research context. As researchers, we hold different views about what constitutes reality in the research context, referred to as our ontological stance. For example, we may believe only directly observable material bodies in the natural world constitute reality that is certain or that reality includes socially constructed concepts such as identity, culture, and beliefs, and combinations thereof. Researchers inclined to study the former (material reality) are often seen to prefer what are typically referred to as ‘hard’ sciences such as physics and maths, where the reality of the research context includes relatively stable phenomena (e.g. acceleration, gravity) studied through observing or measuring actions on interchangeable objects (e.g. apples), typically using quantitative research (numbers, accuracy, precision). Researchers inclined to study the latter (socially constructed concepts as reality) are often seen to prefer what are traditionally referred to as ‘soft’ sciences such as psychology and sociology, where the reality of the research context includes phenomena that are full of variations (e.g. driving, cooking, programming) studied through interactions with non-interchangeable subjects (e.g. people), typically using qualitative research.
Software engineering research predominantly involves studying socio-technical phenomena that unfold in multiple realities, beyond the traditional physical reality, e.g. in artificial, virtual, augmented, and combinations of realities. For example, studying a typical software development team involves physical contexts (e.g. physical scrum boards) and physical interactions (e.g. physical daily standups) as well as virtual contexts (e.g. project management on JIRA or Trello), virtual interactions (e.g. video calls, emails, virtual chat, online discussion forums), and even combined interactions (e.g. two developers collaborating in-person over a virtual task board self-assigning tasks using symbols such as their online avatars). Contexts such as remote work are primarily virtual, offering different affordances, norms, and symbolism (e.g. “you are on mute” makes no sense in a physical meeting). Studying social interactions between real people using their online profiles and avatars on online social media platforms such as Twitter and Facebook presents yet another type of reality as real versus fake identities come into play (e.g. symbols such as the ‘blue tick’ on Twitter enabling authentication). These multiple realities are defined by intermixing operational contexts, symbols, languages, norms, and guidelines of both physical and virtual worlds, offering the same, if not more, complexity than the physical world. While traditional GT guidelines address physical realities, STGT acknowledges the diverse and combined ontological frames prevalent in SE and provides guidelines to enable the study of physical, virtual, combined, and simulated realities.
Epistemology
Epistemology refers to what can be treated as knowledge and how that knowledge is gained, in a research context. As researchers, we hold different views about how we can gain knowledge about the reality of our research context, referred to as our epistemological stance. For example, we may believe that we can learn about the research world (objects and phenomena therein) through neutral measurements or observations where exchanging an observer with another will yield the same results. This is referred to as an objectivist epistemology. On the other hand, we may believe that we can learn about the research world through subjective interpretations of observations where exchanging the observer with another will likely yield different findings. This is referred to as a subjectivist epistemology.
STGT highlights that a researcher’s epistemological stance is typically determined by two aspects, the nature of the reality being studied, e.g. physical, virtual, augmented, combined, and the researcher’s own preferences. For example, STGT studies of virtual game worlds can be designed with an objectivist approach where the game analytics are used to collect objective metrics about the game (e.g. lives lost, levels successfully cleared, time taken per level, choice of actions and interactions). Similarly, STGT studies of combined realities, such as development team environments, can apply a mix of objective metrics and observations that are interpreted subjectively.
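To make the objectivist end of this spectrum concrete, the Python sketch below aggregates the kind of game analytics mentioned above. The `Session` record, its field names, and the sample values are hypothetical; the point is only that such a summary does not depend on which researcher computes it, whereas interpreting what the numbers mean for players remains a subjective step.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One play session as logged by a (hypothetical) game analytics export."""
    player: str
    lives_lost: int
    levels_cleared: int
    seconds_played: float


def summarise(sessions):
    """Compute observer-independent metrics from logged sessions."""
    total_levels = sum(s.levels_cleared for s in sessions)
    total_seconds = sum(s.seconds_played for s in sessions)
    return {
        "players": len({s.player for s in sessions}),
        "mean_lives_lost": mean(s.lives_lost for s in sessions),
        # Guard against division by zero when no level was cleared.
        "seconds_per_level": total_seconds / total_levels if total_levels else None,
    }


# Hypothetical sample data.
SESSIONS = [
    Session("p1", lives_lost=2, levels_cleared=4, seconds_played=600.0),
    Session("p2", lives_lost=0, levels_cleared=6, seconds_played=540.0),
    Session("p1", lives_lost=4, levels_cleared=2, seconds_played=300.0),
]

SUMMARY = summarise(SESSIONS)
print(SUMMARY)
```

In a study of combined realities, a summary like this could sit next to field notes or interview excerpts, with the researcher's memos recording how the two are interpreted together.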
Research Paradigm
A combination of ontological and epistemological stances forms a research worldview. Research paradigms formally capture the researcher’s worldview about what they believe is reality (ontology) and how knowledge about that reality can be gained (epistemology), in a research context. Popular paradigms include positivism, post-positivism, constructivism, symbolic interactionism, and pragmatism. Let us consider the prominent paradigms associated with GT below, compared under philosophical foundations in Table IV.
• Positivism refers to a definite, assured, and certain nature of reality that can be studied through direct observations, and excludes traditional metaphysics and theology. The ontological lens used in positivism suggests reality is external and standalone, irrespective of human existence or interference, and ready to be discovered through (only) what can be observed and measured. The observer’s job is to discover and render it accurately. The epistemology native to positivism is objectivist which suggests knowledge about reality is discovered through objective observations and measurements by neutral and interchangeable observers who do not influence the reality by their presence or measurements. Consequently, positivists emphasise verification through testing of hypotheses or existing theories against observable facts and value reproducibility, replicability, refutability, generalisability, and statistical significance. Research methods typically following a positivist paradigm include controlled and quasi-experiments, although survey and case study research is also known to be conducted with a positivist approach (Easterbrook et al., 2008).
Positivism is naturally associated with quantitative data and deductive reasoning, drawing specific conclusions from general facts through repeated testing and proving/falsification of hypotheses. Historically, quantitative researchers with positivist inclinations have been staunch critics of qualitative research often conducted with interpretive approaches (Charmaz, 2014). Post-positivism, on the other hand, refocuses attention on falsification of hypotheses, increasing confidence in proposed theories with every failed attempt at falsification (Easterbrook et al., 2008).
Classic or Glaserian GT has been associated with a positivist approach (Charmaz, 2006).
• Constructivism supports the ontology of a reality that is socially constructed (as opposed to being an external, standalone reality) and an epistemology that supports the view that the researchers’ and participants’ presence and interactions construct the reality that is studied (as opposed to some objective and neutral discovery of an external reality). In other words, “knowledge and truth are created, not discovered by mind” (Schwandt et al., 1994). Because of its native subjective stance, researchers are not considered neutral or interchangeable, but rather mediums through which observations are made and interpreted in unique ways. Consequently, findings from a research study conducted using a constructivist approach are acknowledged to be limited to the same or similar contexts in which they were conducted and not universally generalisable.
Constructivists emphasise generation of new knowledge or theories over verifying existing knowledge or theories, and value originality, novelty, usefulness, and real-world significance. Research methods most closely aligned with a constructivist approach include ethnography and case studies (Easterbrook et al., 2008).
The primary premise of introducing a new version of GT by Charmaz was to introduce and apply a constructivist research approach to conducting GT (Charmaz, 2006).
RESEARCH CONTEXT

Research Method | Phenomena | Domain | Data, Tools & Techniques | Researcher |
---|---|---|---|---|
Socio-technical GT (Hoda, 2020) | Socio-technical, with interwoven social and technical aspects (e.g. ethics in AI). | Socio-technical, software engineering (original), computer science, artificial intelligence, human computer interaction, information systems, and other domains with varying levels of social and technical interplay. | Qual data (primary), Quant data (supplementary). Traditional e.g. interviews, observations, and Modern, e.g. data mining, NLP, sentiment analysis, immersive techniques. | Socio-technical researcher/team, with technical/domain knowledge & relevant skills in social sciences, philosophy, qualitative research, & theory development. |
Traditional GT methods (Classic, Strauss-Corbinian, Constructivist) | Social, primary focus on humans and social interactions (e.g. social loss). | Various, medicine (original), nursing, psychology, education, attempted in software engineering with varying success. | Qual data (primary), Quant data (supplementary). Traditional, e.g. interviews and observations. | Sociologist/team, with knowledge and skills in social sciences, philosophy, qualitative research, and theory development. |

PHILOSOPHICAL FOUNDATIONS

Research Method | Reasoning | Ontology | Epistemology | Research Paradigm |
---|---|---|---|---|
Socio-Technical GT (Hoda, 2020) | Inductive, deductive & abductive | Context-specific, e.g. physical, virtual, combined, simulated. | Context specific, e.g. subjective in natural settings, objective in simulated world. | Context specific, e.g. constructivism in natural and virtual worlds, positivism in simulated worlds. Combinations possible. |
Constructivist GT (Charmaz, 2006) | Inductive & abductive | Physical (original) | Subjective | Constructivism |
Strauss-Corbinian GT (Strauss & Corbin, 1990) | Inductive & deductive | Physical (original) | Interpretive | Symbolic interactionism |
Glaserian GT (Glaser & Strauss, 1967) | Inductive (& hints of deductive) | Physical (original) | Objective (original) | Positivism |

METHODOLOGICAL STEPS AND PROCEDURES

Research Method | Literature Review | Data Collection | Data Analysis & Theory Development |
---|---|---|---|
Socio-Technical GT (Hoda, 2020) | Lean literature review, Targeted literature review. | All is data, with care – qual. primary (quant. supplementary) from credible sources with ethical considerations, using traditional & modern approaches via different sampling initially & theoretical sampling later. | Basic Stage – open coding, constant comparison, basic memoing. Advanced Stage – advanced memoing and option (1) Emergent mode (targeted coding, constant comparison, theoretical structuring) or option (2) Structured mode (structured coding, constant comparison, theoretical integration). |
Constructivist GT (Charmaz, 2006) | Customised literature review. | Variety of elicited and extant qual. data via initial & theoretical sampling. | Initial coding, focused coding, theoretical coding. |
Strauss-Corbinian GT (Strauss & Corbin, 1990) | Multiple & extensive use of literature review. | Variety of primarily qual. data via theoretical (open, relational, discriminate) sampling. | Open coding, constant comparison, axial & selective coding, memoing, sorting. |
Glaserian GT (Glaser & Strauss, 1967) | Avoid literature reviews in same area. | All is data – qual. primary (quant. possible) using interviews & observations via theoretical sampling. | Open coding, constant comparison, selective coding, memoing, theoretical sorting, theoretical coding. |

EVALUATION GUIDELINES

Research Method | Evaluating Method Application | Evaluating Partial Findings | Evaluating Mature Theories |
---|---|---|---|
Socio-Technical GT (Hoda, 2020) | Credibility, rigour | Originality, relevance, density | Novelty, usefulness, parsimony, modifiability |
Constructivist GT (Charmaz, 2006) | - | - | Credibility, originality, resonance, usefulness |
Strauss-Corbinian GT (Strauss & Corbin, 1990) | Sampling, major categories, indicators, theoretical sampling, hypotheses, discrepancies, core category | - | Concepts, categories, and links generation, variation, process reliability, significance |
Glaserian GT (Glaser & Strauss, 1967) | - | - | Fit, work, relevance, modifiability |
• Other paradigms, perspectives, theories include pragmatism (James and Burkhardt, 1975), which aligns with the ontology that sees reality as what is of practical use and an epistemology that is based on understanding reality in ways that address what practical difference it might make to the observer. Symbolic interactionism, often referred to as a theory rather than a paradigm (Schwandt et al., 1994), is guided by the ontological view that reality is what is socially understood and the epistemological view that reality is perceived through social interactions that define and redefine symbolic meaning of concepts. Strauss-Corbinian GT is characterised as building on pragmatism and symbolic interactionism underpinnings (Strauss and Corbin, 1990).
In addition to research paradigms, some unique perspectives can be applied to research. Feminism is concerned with the representation of females in the studied phenomena and so applies that lens to all aspects of research including research design, data collection, analysis, presentation, and evaluation. Critical theory is based on the belief that research serves a political purpose and can be used to empower specific groups, especially minority groups, through raising awareness and recommending change.
While no one research paradigm or worldview is correct or wrong, it is important for researchers to be aware of and declare their philosophical stance because it influences how we conduct studies, present findings, and most prominently, how we evaluate research. For example, a positivist reviewer evaluating a research study conducted using a constructivist approach is likely to – consciously or unconsciously – apply positivist evaluation criteria, inappropriate for the study, e.g. look for generalisability instead of novelty or expect reproducibility and replicability instead of credibility and rigour, and vice-versa. Further information on research paradigms and their impact on research design can be consulted in (Seaman, 1999; Easterbrook et al., 2008; Wohlin and Aurum, 2015; Russo and Nuseibeh, 2001). The use of research paradigms in SE research has been explored in (Melegati and Wang, 2021).
An interesting point to note is that an individual’s approach to life in general, i.e. their general worldview, while likely to influence their research worldview, need not be the same. For example, an individual with a pragmatic approach to making their life choices may consciously choose to apply a positivist approach to their research study.
More often, researchers prefer one research paradigm over another and resort to applying their preferred paradigm for a majority of their research studies. However, researchers need not be bound to any one research paradigm for all their studies and can choose to conduct different studies using paradigms best suited to the nature of the study. Such flexibility requires a mature understanding of philosophical foundations and high levels of researcher self-awareness and reflection. Similarly, research paradigms and perspectives can be combined, e.g. critical positivist or feminist-constructivist, to achieve unique research aims and flavours.
STGT Research Paradigms
Traditional GT methods have been associated with particular paradigms, Glaserian GT with positivism (Charmaz, 2006), Strauss-Corbinian as building on pragmatism and symbolic interactionism (Strauss and Corbin, 1990), and Charmaz’s GT with constructivism (Charmaz, 2006).
STGT, as the name suggests, applies a socio-technical worldview to conducting GT research, which is open to selecting and applying specific ontological and epistemological stances as best suited to the research context. In other words, STGT can be applied using different research worldviews based on the ontology applicable to the study and on the researcher’s preference. This does not equate to being paradigm ignorant or agnostic, rather it requires a careful consideration of the research context to ascertain the choice of paradigm to apply.
For novice researchers, some suggestions are provided. For example, STGT studies of physical and/or combined realities, such as offered by software team environments, may be conducted using a subjective approach where the nature of the findings is dependent on the researcher conducting the data collection and analysis. On the other hand, STGT studies of simulated and virtual realities, such as in computer games, may apply an objective approach, leveraging in-game analytics. Studying end-user interaction with AI manifested as chatbots or robots offers similar contexts. Such simulated and combined realities can be designed to be finite and deterministic, manifesting closely the idea of an external standalone reality with limited possible interpretations, best captured by an objectivist stance.
Some research areas can be studied using different worldviews depending on the study focus and objectives. For example, a study on test case selection can be conducted using an objective approach if focusing purely on the technical tools and their performance, validating and verifying outcomes against goals, while a constructivist approach can be applied to the same research context if considering the tester’s motivations, rationales, and behaviours with regard to test case selection. STGT studies can also combine a selected epistemology with established theories and perspectives used as an additional lens, e.g. a constructivist-feminist STGT study of female contributors to open source software, their experiences, challenges, and strategies.
By acknowledging the research paradigms and perspectives, researchers can remain aware and consistent in their application, and actively set the expectations for their readers and reviewers so that the findings can be understood and evaluated in commensurate ways.

5 STGT Method – Overview and Basic Stage
These next sections present an overview of the Socio-Technical Grounded Theory method and describe its methodological steps and procedures. Further details of the STGT method introduced in this article can be found in the author’s forthcoming book. Example applications of specific steps and procedures (e.g. open coding, memoing) are referenced throughout.
5.1 STGT – An Overview
The STGT method comprises steps and procedures adapted from the three different versions of traditional GT – Glaserian, Strauss-Corbinian, and Constructivist – in a way that eases entry into an STGT study through a basic data collection and analysis stage. It offers flexibility through two options in the advanced theory development stage, following either an emergent approach to theory development, similar to Glaserian GT, or a structured approach, similar to Strauss-Corbinian GT. It also offers flexibility in the choice of ontology, epistemology, and therefore, research paradigm applied as best suited to the study context, including the constructivist paradigm as applied in Constructivist GT.
Table IV provides a comparison of STGT with the traditional GT methods, including similarities and adaptations. For example, STGT’s approach to literature review is closer to the Constructivist GT approach, looking for a balance between being sufficiently informed versus overly influenced by existing works. STGT’s approach to progressively narrow down the scope of the data collection and analysis to focus on the key emerging categories over time reflects the same fundamental trajectory of all traditional GT methods as they move from open to some form of targeted data collection and analysis. Like Strauss-Corbinian GT, STGT distinguishes between evaluating the research method application and outcomes. Additionally, STGT also acknowledges partial findings and lists criteria to evaluate them differently to mature findings.
Figure 2 presents a visual model or diagram of the socio-technical grounded theory method including its key stages, depicted within two large blocks of basic and advanced stages (labelled at the top), steps and procedures (middle row) including options of emergent or structured modes of theory development, and outcomes progressively emerging from the steps (bottom row).
Basic stage – Data Collection and Analysis (DCA), involving the steps of lean literature review, study preparation and piloting, and iterations of basic data collection and analysis. Initial concepts and categories start to emerge from the piloting (see bottom row of the model diagram). The development of a few strong categories and preliminary hypotheses mark the end of the basic stage.
Advanced stage – Theory Development, involving targeted literature review as and when needed, and selecting from two modes of theory development:
• Emergent mode, enabling the emergence of theory through the iterative targeted data collection and analysis step, which ends with theoretical saturation and results in a mature theory that is integrated. The theory can be further structured using theoretical presentation templates during theoretical structuring.
• Structured mode, enabling a structured development of theory through structured data collection and analysis, ending with theoretical saturation and resulting in a mature theory that is structured and can be further integrated through theoretical integration.
The term development here is used for theory generation or creation. Emerging categories and hypotheses can be reported as interim and mature theory as final findings.
5.2 Literature Review
STGT offers two types of literature reviews, lean literature review and targeted literature review, at the basic and advanced stages of the research project respectively.
• Lean literature review (LLR), a lightweight and high-level review performed early in the study, during the basic stage, to identify research gaps and motivate the need for a study. An LLR may identify that the topic is relatively nascent, with no or few existing theories, and can benefit from an original theory in the area. Alternatively, it may identify that the topic is extensively studied but lacks theories, or the existing theories do not resonate with practical experience (from reading practitioner literature or personal experience as a practitioner in the field). A GT study in the latter case should be attempted by experienced theorists. Where an LLR identifies robust existing theories on the topic, changing or refining the topic is suggested.
• Targeted literature review (TLR), an in-depth review of literature targeting relevance to the emerging/emergent categories and hypotheses, performed periodically during the advanced stage. A TLR helps compare the emergent original findings with existing work and situate them in the wider research landscape, filling research gaps.
STGT offers three unique advantages of literature reviews.
Improving theoretical sensitivity. Learning about theories and theory development outside of the study domain is highly recommended to improve theoretical sensitivity. For example, SE researchers who do not possess relevant training in sociology or theorising will benefit from exploring theories and theory development in other fields such as sociology, psychology, and nursing.
Improving socio-technical sensitivity and domain knowledge. Reading practitioner and research literature within the study domain will improve domain knowledge and socio-technical sensitivity – the ability of the researcher, e.g. non-native or relatively new to the domain, to understand its terminology, concepts, and associated jargon. Domain knowledge and socio-technical sensitivity enable useful and effective data collection, analysis, and reporting.
Refining or advancing the state of theory. This is an advanced practice. With experience in the study domain and in theorising, researchers are likely to become aware of the relevant theories in their field. This knowledge can motivate the need for a fresh look at familiar problems for which a number of solutions, models, or theories may exist, in an attempt to refine or advance the state of theory. At the same time, they will need to ensure that thorough knowledge or reviews of relevant literature do not adversely influence their ability to develop original theories. The theory of becoming agile (Hoda and Noble, 2017) is an example of this principle in action, advancing the state of theory on agile transformations in software teams.
Systematic Reviews. Formal and comprehensive literature reviews popular in SE, such as systematic literature reviews (SLR) and mapping studies (Kitchenham, 2004), are not recommended for novice researchers, those new to research in general or those new to GT. However, these are not ruled out for experienced researchers. Systematic reviews and mapping studies primarily apply a positivist paradigm, aiming for completeness, replication of the process, and reproducibility of the results. Where a systematic review is performed in/with an STGT study, the onus is on the researcher(s) to be aware of, manage, and explain the interplay between an SLR’s inherent positivist stance and the stance adopted for the STGT study. They also need to explain the relationship between the findings of the SLR and the subsequent original STGT findings. For example, if applying an emergent mode of theory development, it is important that the researcher(s) explain the role of the systematic review with regard to (a) the mechanisms used to avoid being biased by the review findings, if using a positivist approach, or (b) how it informed the construction of the theory, if using a constructivist approach. On the other hand, the themes or taxonomies derived from the systematic review can be used to drive a structured mode of theory development.
Grounded Theory Literature Reviews. As mentioned earlier under additional sources of data in section 4.1, a GT approach to literature review is also possible. Future STGT studies in SE can attempt the Grounded Theory Literature Review method as proposed by Wolfswinkel et al. (2013). Due to its methodological alignment, a GTLR may be better suited to accompany an STGT study as compared to an SLR.
5.3 Study Preparation and Piloting
In preparation for an STGT study, as with most other studies, the researcher(s) needs to select a research topic and a research team, prepare the study protocols, conduct pilot data collection and analysis, and apply necessary refinements. The research team will benefit from including an experienced theorist as a core member or advisor but can also comprise members new to theory development, following these STGT guidelines.
Depending on the data collection techniques envisioned, an initial set of research protocols in the form of recruitment strategies, semi-structured interview questions, online surveys or questionnaires, observation protocols, and modern data mining techniques can be designed. An ethical assessment should be conducted, and if applicable, approved by an ethics committee. Once approved, a handful of pilot data collection and analysis instances can be carried out. For example, new researchers will benefit from one or two pilots to test the study protocols followed by necessary refinements, and two or three others to enable familiarity with data collection and analysis procedures. Experienced researchers will also gain from piloting as each topic presents new contexts and challenges.
Data collected as part of piloting can be used in the study unless the piloting was done in contexts different to the actual study (e.g. piloting a survey with research students where the study focuses on industry practitioners) or the protocols needed to be completely redesigned, including changing the research topic, as can happen in rare instances.
5.4 Basic Data Collection and Analysis
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/c22ff87d-acf0-467f-84e4-b6feedd3953f/BasicDCA.png)
Basic data collection and analysis (DCA) is an iterative step involving the procedures of (theoretical) sampling, basic data collection, basic data analysis, and basic memoing.
(Theoretical) Sampling
STGT supports different sampling techniques to get data collection underway, e.g. convenience, purposive, random, or representative sampling, as applicable to the project context and constraints. Once the iterations of basic data collection and analysis start yielding concepts and categories, theoretical sampling can be employed. Theoretical sampling is the ongoing process of assessing the emerging codes, concepts, (sub)categories, and hypotheses, and targeting specific data sources and types for collection in the upcoming iterations that are likely to help identify, develop, and saturate them while filling any theoretical gaps.
Theoretical sampling has been described as a “complex form of sampling” (Coyne, 1997). While not including theoretical sampling explicitly, traditional views classify all sampling in qualitative research as purposeful, intentionally selecting data sources based on their specific characteristics (Patton, 1990; Sandelowski, 1995). In this sense, theoretical sampling has been classified as purposeful (Coyne, 1997), driven by the emerging theory, while its dynamic nature can also lend itself to the snowballing strategy. A distinguishing feature of theoretical sampling is that it is not pre-determined, but rather ongoing, and drives interleaved data collection and analysis.
Because it involves assessing high-level findings to direct data collection in specific instances (moving from general to specific), theoretical sampling can be said to apply deductive reasoning (see Table III) (Becker, 1993). For example, applying theoretical sampling to the emergent self-organising roles on agile teams based on data collected from relatively new agile teams in the early stages of the study (Hoda et al., 2010) led to identifying theoretical gaps around the maturity of the roles. The later parts of the study employed theoretical sampling to target data collection from mature agile teams leading to the subsequent mature and saturated definitions of these roles (Hoda et al., 2012b). Theoretical sampling includes grooming and refinement of the research protocols from time to time, e.g. refining interview questions or keywords for mining public repositories to progressively focus on emerging concepts.
Being aware of the full socio-technical context of the study domain will mean researchers can apply sampling effectively to identify data sources such as participants (e.g. software developers, users, stakeholders) and documentation (e.g. app reviews, GitHub wikis) and apply appropriate data collection techniques such as physical or virtual interviews and observations and automated or manual mining of software repositories and online forums.
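As an illustration only, the bookkeeping behind theoretical sampling could be sketched as below. This is a minimal sketch under stated assumptions, not part of the STGT guidelines: the `Category` and `DataSource` structures, the `judged_saturated` flag, and the `rank_sources` function are all hypothetical names introduced here. Deciding whether a category is saturated, and which sources may inform it, remains the researcher's interpretive judgement; the code merely records those judgements and orders candidate sources by how many open theoretical gaps they might help fill.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """An emerging category as tracked by the researcher (hypothetical structure)."""
    name: str
    open_questions: list[str] = field(default_factory=list)  # theoretical gaps noted in memos
    judged_saturated: bool = False  # the researcher's judgement, not a computed value


@dataclass
class DataSource:
    """A candidate data source and the categories it is expected to inform."""
    name: str
    informs: set[str]


def categories_needing_data(categories):
    """Return names of categories with open gaps or not yet judged saturated."""
    return {c.name for c in categories if c.open_questions or not c.judged_saturated}


def rank_sources(categories, candidates):
    """Order candidate sources by how many unsaturated categories they may inform."""
    gaps = categories_needing_data(categories)
    scored = [(len(src.informs & gaps), src.name) for src in candidates]
    # Most relevant first, ties broken alphabetically; sources informing no gap are dropped.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


# Illustrative data loosely modelled on the agile roles example above (invented values).
categories = [
    Category("role maturity", ["How do the roles change as teams mature?"]),
    Category("role definitions", judged_saturated=True),
]
candidates = [
    DataSource("relatively new agile teams", {"role definitions"}),
    DataSource("mature agile teams", {"role maturity", "role definitions"}),
]
print(rank_sources(categories, candidates))
```

With these invented inputs, only the source expected to inform the unsaturated category is retained for the next iteration, mirroring the move from general findings to specific data collection targets.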
Basic Data Collection
STGT accepts a variety of data sources as well as collection approaches. The modern socio-technical world with its multiple realities offers new public sources (e.g. publicly available practitioner blogs, presentations and talks as YouTube videos, and opinion trends on Twitter) and domain-specific sources (e.g. digital project management boards such as Trello, open source software (OSS) repositories such as GitHub, communication channels such as Slack) and modern collection approaches (e.g. immersive and experiential approaches on online social media or in virtual game worlds.)
Data collected from traditional sources such as semi-structured interviews and observations continue to be important, providing rich information, facial and voice cues, and opportunities for follow-up questioning. They offer real examples verifying participant statements and adding necessary contextual background, but are limited in scale due to the manual effort required. Modern publicly available data (e.g. online forums and OSS repositories) and modern techniques such as data mining, on the other hand, offer unique opportunities to collect large amounts of qualitative data. However, these data are not custom elicited and will likely require more effort in selecting, filtering, and cleaning before they can be used for analysis.
A combination of traditional and modern approaches is recommended to achieve best results, where applicable. For example, an STGT study can use traditional data collection techniques such as interviews to elicit an initial set of rich and highly relevant concepts and (sub)categories during the basic stage. These can be used as seed terms to perform large-scale targeted data collection on public and open source data. Without the initial traditional data collection, it would be challenging to identify, select, and filter from the large number of massive public and open datasets. Without modern data sources and techniques, it would be nearly impossible to manually scale the theoretical outputs, e.g. to be more robust and include a wider set of applicable contexts. A combination of both offers unprecedented relevance, robustness, and scale in theory development.
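The idea of using concepts from early interviews as seed terms can be sketched as a simple text filter. The following is a hedged illustration, not a prescribed STGT procedure: the function names `build_seed_pattern` and `filter_by_seed_terms`, the seed terms, and the sample posts are all invented for this example, and a real study would likely need richer matching (synonyms, stemming) as well as the credibility and ethics checks discussed in this paper.

```python
import re


def build_seed_pattern(seed_terms):
    """Compile one case-insensitive pattern matching any seed term as a whole phrase."""
    cleaned = [term.strip() for term in seed_terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty seed term is required")
    # Longer phrases first so they are preferred over shorter overlapping ones.
    escaped = sorted((re.escape(term) for term in cleaned), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_by_seed_terms(documents, seed_terms):
    """Keep documents mentioning at least one seed term, with the terms that matched."""
    pattern = build_seed_pattern(seed_terms)
    kept = []
    for doc in documents:
        hits = sorted({match.group(0).lower() for match in pattern.finditer(doc)})
        if hits:
            kept.append((doc, hits))
    return kept


# Invented seed terms (as if drawn from interview-derived concepts) and invented posts.
seeds = ["low fidelity prototypes", "self-organising"]
posts = [
    "We keep debating low fidelity prototypes before every release.",
    "The cafeteria menu changed again.",
    "Our self-organising team rotates roles each sprint.",
]
for post, matched in filter_by_seed_terms(posts, seeds):
    print(matched, "<-", post)
```

Recording which seed terms matched each document keeps the link between the large-scale data and the concepts that motivated its collection, which may help later constant comparison.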
The open nature of modern data sources, where anyone can add any information with minimal authentication and scrutiny but with mass reach, poses new challenges for researchers. While traditional Glaserian GT accepts “all is data” (Glaser and Strauss, 1967), STGT nuances the original stance with the necessary caution required to assess modern data sources and accepts credible data. It also highlights the importance of ethical considerations, especially associated with modern data collection and usage (upcoming in section 7.3). That is, for STGT studies, all is data, with care.
Basic Data Analysis
Basic data analysis includes open coding and constant comparison. Coding is the process of representing textual raw data into condensed formats that best capture its essence and meaning. Using a socio-technical approach to coding enables the researcher to reach beyond analysing the social context and meaning. It ensures the technical aspects are neither ignored nor treated as a token or decorative element (e.g. like a black box), rather they are considered together with the social aspects to capture a more comprehensive socio-technical essence and meaning of the raw data.
Open coding refers to the coding applied in the early stages of the research study where the researcher remains open to any and all codes arising, ensuring a comprehensive coverage. Because of its open nature, open coding is likely to result in a large number of codes. Constant comparison is the process of constantly comparing derived codes within the same source and across sources to identify key patterns in the data. Constant comparison concretely manifests an inductive approach, leading from specific instances toward general patterns, within the study context. Constant comparison is applied at increasing levels of abstraction to raise the codes to the concept level, concepts to sub-category (where applicable) and category levels.
Application of a traditional sociological approach to GT data analysis in a software engineering research context is likely to identify social concepts and come up with rich theories comprising them. For example, applying open coding the traditional way can lead to the identification of concepts that capture how software development teams regularly engage in social activities of brainstorming, discussing, debating, and reconciling choices. However, software engineering is not simply a mixture of social and technical aspects studied in isolation. Socio-technical data analysis in STGT refers to analysing the complete socio-technical context. For example, applying STGT in the same context is likely to uncover and explain the full socio-technical nature of the same activities as brainstorming software requirements, discussing software architecture, debating low fidelity prototypes, and reconciling choices of diverse technologies and platforms. These concepts encapsulate more comprehensive socio-technical meanings and essences of the activities being observed and are more likely to be relevant to and benefit practitioners, a key aim of GT studies. Researchers applying a socio-technical approach to open coding are likely to apply high levels of socio-technical sensitivity, picking up on the nuances of different socio-technical practices involving seemingly similar social activities, e.g. debating low fidelity prototypes as opposed to debating design patterns. Developers may be biased toward the prototypes they create themselves and therefore more defensive in their debating strategies in the former case, and more open to logical argumentation when debating established design patterns in the latter case.
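To make the levels of abstraction concrete, the sketch below shows one possible way to keep track of codes being raised to concepts during constant comparison. It is an assumption-laden illustration: the function names, the participant labels, and all codes and concepts are hypothetical. The mapping from codes to concepts is the researcher's interpretive work and is supplied by hand; the code only groups the evidence and reports which concepts recur across sources.

```python
from collections import defaultdict


def raise_codes_to_concepts(coded_excerpts, code_to_concept):
    """Group (source, code) pairs under researcher-assigned concepts.

    Returns the evidence per concept and the codes not yet assigned to any concept.
    """
    evidence = defaultdict(lambda: {"codes": set(), "sources": set()})
    unassigned = []
    for source, code in coded_excerpts:
        concept = code_to_concept.get(code)
        if concept is None:
            unassigned.append((source, code))  # still awaiting comparison
            continue
        evidence[concept]["codes"].add(code)
        evidence[concept]["sources"].add(source)
    return dict(evidence), unassigned


def recurring_concepts(evidence, min_sources=2):
    """Concepts supported by at least min_sources distinct sources (threshold is arbitrary)."""
    return sorted(c for c, e in evidence.items() if len(e["sources"]) >= min_sources)


# Invented open codes from three hypothetical participants.
coded_excerpts = [
    ("P1", "debating paper prototypes"),
    ("P2", "debating wireframe mock-ups"),
    ("P2", "brainstorming user stories"),
    ("P3", "pairing on tests"),
]
# Hand-made mapping reflecting the researcher's comparisons so far.
code_to_concept = {
    "debating paper prototypes": "debating low fidelity prototypes",
    "debating wireframe mock-ups": "debating low fidelity prototypes",
    "brainstorming user stories": "brainstorming software requirements",
}
evidence, unassigned = raise_codes_to_concepts(coded_excerpts, code_to_concept)
print(recurring_concepts(evidence))
print(unassigned)
```

The same grouping step could be repeated with a concept-to-category mapping to reach the next level of abstraction.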
Interestingly, most quality GT studies in software engineering seem to ‘naturally’ apply a socio-technical adaptation to data analysis, although rarely acknowledged and hitherto undefined and unexplained, leading to the formulation of socio-technical grounded theories, e.g., theories of reconciling agile and software architecture (Waterman et al., 2015), becoming agile (Hoda and Noble, 2017), and variations in scrum practice (Masood et al., 2020b). More examples of applying a socio-technical approach to open coding can be found in (Hoda et al., 2011, 2012b; Hoda and Noble, 2017; Masood et al., 2020b, a; Shastri et al., 2021b, a).
Explicitly acknowledging and explaining the data’s socio-technical nature will enable aspiring researchers to conduct quality STGT data analysis. Additionally, while it was considered “very shortsighted to believe that computers are capable of gleaning the meaning embedded in qualitative data” (Becker, 1993), rapid technological advancements in natural language processing, sentiment analysis, and other AI-based analysis techniques promise to further ease and augment manual analysis, enabling unprecedented improvements and scaling of qualitative data analysis and theory development in the future.
Basic Memoing
Basic memoing is the process of documenting the researcher’s thoughts, ideas, and reflections on emerging concepts and (sub)categories and evidence-based conjectures on possible links between them. Memos can be in the form of written notes, sketches, annotated images or photographs. While voice and video recording can also be used, it is recommended these are transcribed for ease of further comparison and analysis in the advanced stage.
Memoing is an imperative procedure that distinguishes STGT from other qualitative research methods. It is the mechanism through which researcher reflection is systematically documented and used for theory development. Examples of memos can be found in (Hoda et al., 2012a) and (Masood et al., 2020a).
The basic data analysis stage enables a progressive condensation of the large number of initial codes generated during open coding, using constant comparison and memoing, into concepts and categories, and narrows the focus of analysis. As key concepts, (sub)categories, and preliminary hypotheses start to emerge, the researcher is ready for the advanced theory development stage. Practically, this is manifested in a few key categories being strong and detailed enough to be submitted for peer review and, subsequently, presented at workshops or published in conferences or journals.
6 STGT Method – Advanced Theory Development
STGT acknowledges the challenges of the novice socio-technical researcher in understanding each of the traditional sociological GT versions (Glaserian, Strauss-Corbinian, Constructivist), differences between them, and selecting one version upfront with no room for an easy switch or combining of procedures from different versions later on. STGT delays the decision making to an advanced stage where the researcher has experienced basic data analysis procedures and is in a position to understand and decide which mode of advanced data analysis and theory development best applies to their emergent findings.
In the advanced stage, STGT offers a choice of two modes of theory development, emergent and structured, resulting in mature, structured and integrated theories as outcomes. Table V provides an overview of their differences and commonalities. Both modes include targeted literature review and theoretical sampling, as described earlier. Both modes also have advanced memoing and theoretical saturation in common, described next.
Advanced Memoing
Advanced memoing is common to both emergent and structured modes of theory development and involves careful revisiting, reassessing, refining, and comparing of memos. Advanced memoing serves to confirm and strengthen existing relationships, reject unsupported indicative relationships, and establish new links between (sub)categories. (Footnote 1: Advanced memoing proposed by STGT replaces theoretical sorting in Glaserian and Strauss-Corbinian GT (Glaser and Strauss, 1967; Strauss and Corbin, 1990), acknowledging hypotheses development and structuring as iterative and incremental rather than a one-time activity.)
Reflections captured in memos help develop and solidify emerging concepts and (sub)categories. When memos are compared and related to one another, insights in the form of possible links between different concepts and (sub)categories start to emerge. Exploring these potential links (aka hypotheses) requires abductive reasoning – proposing the best possible explanation based on all available evidence (see Table III – types of reasoning). As more evidence is collected, it adds to the strength of the proposed explanations and emerging links. Applying a socio-technical approach to memoing ensures that the underlying data encapsulates the full socio-technical research context and the researcher can propose explanations based on socio-technical knowledge and understanding. For example, a memo may capture the varying levels of scrum master involvement in team practices across multiple teams observed and propose a relationship with the agile maturity of the team where novice teams show more scrum master involvement as compared to experienced agile teams. Such hypothesis development requires not only an understanding of the social relationships between the scrum master and the team but also an understanding of the team’s socio-technical practices, e.g. daily standup and pair programming, to grasp the level of scrum master involvement to be expected as compared to what is observed.
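To make the scrum master example more concrete, the following Python sketch tallies which memos are consistent with the proposed relationship and which are not. It is a simplified illustration only: the `Memo` fields, team names, and the two-level "high"/"low" coding of involvement are hypothetical, and a contradicting memo is a prompt to revisit the data and the hypothesis, not an automatic rejection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memo:
    """A hypothetical, heavily simplified memo about one observed team."""
    team: str
    maturity: str        # "novice" or "experienced"
    sm_involvement: str  # "high" or "low"


# Invented observations, for illustration only.
memos = [
    Memo("team_a", "novice", "high"),
    Memo("team_b", "novice", "high"),
    Memo("team_c", "experienced", "low"),
    Memo("team_d", "experienced", "high"),
]


def supports(memo):
    """True if the memo fits the hypothesis that novice teams show more
    scrum master involvement than experienced agile teams."""
    expected = "high" if memo.maturity == "novice" else "low"
    return memo.sm_involvement == expected


def tally(all_memos):
    """Split team names into those supporting and those contradicting the hypothesis."""
    supporting = [m.team for m in all_memos if supports(m)]
    contradicting = [m.team for m in all_memos if not supports(m)]
    return supporting, contradicting


supporting, contradicting = tally(memos)
print("supporting:", supporting)
print("contradicting:", contradicting)
```

In this invented data set, `team_d` would be the case to revisit, for instance by checking whether the team's socio-technical practices explain the unexpectedly high involvement.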
Aspect | Emergent mode | Structured mode |
---|---|---|
Context of Use | Broad phenomenon (e.g. exploring agile project management). Open coding applied widely on broad phenomenon. Basic stage ended with the emergence of some strong and some not so strong (sub)categories and indicative relationships. | Narrow phenomenon (e.g. investigating self-assignment in practice). Open coding applied on narrow phenomenon. Basic stage ended with the emergence of clear set of (sub)categories that form an overarching structure. |
Distinct Steps | Iterations of targeted data collection and analysis, targeting most significant (sub)categories and continuing to establish relationships between them. Theoretical structuring. | Iterations of structured data collection and analysis, focusing on saturating individual (sub)categories and firming up relationships between them. Theoretical integration. |
Common Steps | Targeted literature review, theoretical sampling, advanced memoing, theoretical saturation. | Targeted literature review, theoretical sampling, advanced memoing, theoretical saturation. |
Outcome | Structured and integrated theory or theories. | Structured and integrated theory or theories. |
Theoretical Saturation
When data collection reaches a point of diminishing returns where further collection does not generate new or significantly add to existing concepts, categories, or insights, the study has reached theoretical saturation. Practically, this is manifested as the last few data collection attempts (e.g. last 3-4 interviews) serving to verify the theory. The theory is considered mature as it comprises strongly supported and dense categories and hypotheses.
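A rough way to picture the "diminishing returns" signal is to track which concepts each data collection round contributes for the first time. The Python sketch below is an assumption-laden illustration: the interview data and the `window` of three rounds are hypothetical, and the check only covers the appearance of new concepts, not whether a round significantly adds to existing ones, which still calls for researcher judgement.

```python
def new_concepts_per_round(rounds):
    """For each round, return the set of concepts not seen in any earlier round."""
    seen = set()
    novel = []
    for concepts in rounds:
        fresh = set(concepts) - seen
        novel.append(fresh)
        seen |= fresh
    return novel


def is_saturated(rounds, window=3):
    """True when each of the last `window` rounds contributed no new concepts."""
    if len(rounds) < window:
        return False
    novel = new_concepts_per_round(rounds)
    return all(len(fresh) == 0 for fresh in novel[-window:])


# Invented concept sets, one per interview, for illustration only.
interviews = [
    {"self-assignment", "task visibility"},
    {"self-assignment", "manager involvement"},
    {"task visibility", "skill matching"},
    {"self-assignment", "skill matching"},
    {"manager involvement"},
    {"task visibility", "self-assignment"},
]

print(is_saturated(interviews[:5]))  # the third interview still added a concept
print(is_saturated(interviews))      # the last three interviews added nothing new
```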
6.1 Emergent Mode of Theory Development (Option 1)
The emergent mode allows the researcher to conduct further rounds of iterative and interleaved data collection and analysis in a targeted manner. The emergent mode enables an organic emergence of theory as recommended by Glaserian GT (Glaser and Strauss, 1967). The emergent mode includes iterations of targeted data collection and analysis ending with theoretical saturation and followed by theoretical structuring to derive mature theory.
Context of Use
Where the initial research phenomenon or topic was broad (e.g. exploring agile project management (Hoda et al., 2012a)) and open coding was applied extensively on the broad level, the basic stage will likely lead to the emergence of a number of (sub)categories, some stronger than others, with indicative relationships emerging between them. In this case, where some strong categories and initial relationships have emerged but a clear theoretical structure is not evident, the researcher can decide to proceed with an emergent mode of theory development which relies on the theoretical structure emerging progressively over time.
Targeted Data Collection
Data collection in the emergent mode is driven by theoretical sampling as described earlier, targeting data sources most likely to help fill theoretical gaps and strengthen key categories, and including refinements of the data collection protocols to progressively focus on the key categories.
Targeted Data Analysis
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/c22ff87d-acf0-467f-84e4-b6feedd3953f/TargetedDCA.png)
Targeted data analysis involves targeted coding, which is the same procedure as open coding but applied in a targeted manner, and constant comparison. Unlike open coding, targeted coding limits analysis to only the most significant concepts and categories arising from the basic stage. (Footnote 2: Targeted coding is the same as selective coding in Glaserian GT but STGT avoids using the term selective coding since it refers to two very different procedures in Glaserian GT and Strauss-Corbinian GT. To avoid confusion, STGT uses the term targeted coding.) However, some new categories can still emerge during targeted data analysis. Advanced memoing ensures the links between (sub)categories are strengthened.
Theoretical Structuring
Once theoretical saturation is achieved, the researcher can decide to present the theory as is, using the emergent structure. However, STGT recommends that the researcher perform theoretical structuring by (a) exploring and identifying the genre of theories that best fits, e.g. process, taxonomy, degree, strategies (Glaser, 1978), and (b) exploring if the emergent theoretical structure naturally maps to a pre-existing theoretical template, e.g. six C’s model, process template with defined stages and entry/exit indicators, and, in case of a good fit, using the template to further structure their theory. Theoretical structuring is an optional but highly recommended procedure. (Footnote 3: Theoretical structuring in essence is the same as theoretical coding in Glaserian GT but STGT avoids using that term since the procedure does not involve any actual coding, rather structuring or organising of the emergent theory.)
The theory of becoming agile is an example of presenting a mature theory in its emergent structure, i.e. a bespoke process model (Hoda and Noble, 2017). The theory was classified under the genre of process theory since it explained the process of software teams becoming agile teams but no explicit pre-defined structure was adopted. On the other hand, in case of the theory of inadequate customer collaboration (Hoda et al., 2011), the pre-defined six C’s theoretical template was adopted at a late stage as a result of exploring Glaser’s theoretical coding families and discovering a good fit with the six C’s template (Glaser, 1978).
6.2 Structured Mode of Theory Development (Option 2)
The structured mode allows the researcher to conduct further rounds of iterative and interleaved data collection and analysis in a structured manner. The structured mode enables structured development of theory as captured by the early forms of Strauss-Corbinian GT (Strauss and Corbin, 1990). The structured mode includes iterations of structured data collection and analysis ending with theoretical saturation and followed by theoretical integration to derive mature theory.
In the structured mode, the researcher can explore possible fits with pre-defined theoretical templates early, coming out of the basic stage. Where fitting, a pre-defined theoretical template can be adopted and used to guide the remaining data collection, analysis, and theory development. For example, templates such as the coding paradigm (e.g. (Masood et al., 2020a)), conditional matrix (Strauss and Corbin, 1990), or the Glaserian six C’s template can be used. Alternatively, the researcher can use their own theoretical structure emerging out of the basic stage to guide further data collection, analysis, and theory development, e.g. (Masood et al., 2020b). In either case, the advanced stage is guided by a theoretical structure already in place. The guidance of Sjøberg et al. (2008) on defining constructs and propositions provides an early example of using a “grounded theory” based approach to structured theory development within an exploratory case study.
Context of Use
Where the initial research topic was relatively narrow and well defined (e.g. investigating self-assignment practice (Masood et al., 2020a)) and the data collection was focused on well defined facets of the topic, the basic stage can lead to the emergence of a set of (sub)categories and a relatively clear theoretical structure. In this case, it makes sense for the researcher to proceed with a structured mode of theory development.
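The two contexts of use can be summarised as a small decision helper. The Python sketch below is only a reading aid under stated assumptions: the function name, its boolean inputs, and the choice to return `None` for mixed situations are hypothetical, since the text describes only the two clear-cut cases and leaves the decision with the researcher.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    EMERGENT = "emergent"
    STRUCTURED = "structured"


def suggest_mode(topic_is_broad: bool, clear_structure_emerged: bool) -> Optional[Mode]:
    """Suggest a mode of theory development from the outcome of the basic stage."""
    if topic_is_broad and not clear_structure_emerged:
        # Broad topic, some strong categories, no evident theoretical structure.
        return Mode.EMERGENT
    if not topic_is_broad and clear_structure_emerged:
        # Narrow topic with a relatively clear theoretical structure.
        return Mode.STRUCTURED
    # Mixed situations are not covered by the two contexts of use described
    # in the text, so no suggestion is made (an assumption of this sketch).
    return None


print(suggest_mode(topic_is_broad=True, clear_structure_emerged=False))
print(suggest_mode(topic_is_broad=False, clear_structure_emerged=True))
```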
Structured Data Collection
Data collection in the structured mode is driven by theoretical sampling, collecting data to progressively fit, add, strengthen, and saturate the concepts and (sub)categories already present in the guiding theoretical structure. It includes refinements of the data collection protocols to progressively saturate existing categories and occasionally identify new concepts that appear within the theoretical structure.
Structured Data Analysis
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/c22ff87d-acf0-467f-84e4-b6feedd3953f/StructuredDCA.png)
Data analysis in the structured mode involves structured coding and constant comparison. Structured coding is similar to the original axial coding introduced in (Strauss and Corbin, 1990) in that it applies data analysis using a guiding theoretical structure and aims to identify and strengthen the relationships between the key categories. Advanced memoing ensures the links between (sub)categories are strengthened.

Theoretical Integration
Once theoretical saturation is reached in the structured mode, the theory should comprise dense and strongly supported categories and hypotheses and is considered both structured and mature. However, it may not be fully integrated. Theoretical integration is the procedure of ensuring the (sub)categories are fully integrated into the overall theoretical structure. Practically, this involves finalising the story-line of the theory, answering the question, what is this theory about? or what phenomenon is captured and explained by this theory? For example, in case of (Masood et al., 2020a), theoretical integration toward the later stages involved discussing what basic phenomenon the categories and relationships were best capturing. How agile teams make self-assignment work was decided as the overall story-line or phenomenon that the categories and relationships best helped explain.
Sometimes, integration can result in the emergence of an additional theoretical layer that serves to support and enhance the existing structure. For example, the theory of Scrum variations was structured around three categories, variations in Scrum roles, variations in Scrum practices, and variations in Scrum artefacts (Masood et al., 2020b). In the final stages of integrating the theory, a nuanced classification system emerged which served to further classify the variations across the three categories in terms of standard, necessary, and contextual variations, and clear deviations.
7 STGT – Application, Outcomes, Reporting
Socio-technical grounded theory can be applied in two ways. First, it can be applied as a full STGT method, resulting in novel, useful, parsimonious, and modifiable theories grounded in evidence. Second, it can be applied in a limited capacity as STGT for data analysis, by applying its basic data analysis techniques within or alongside other research methods, producing outcomes of original and relevant descriptive findings in the form of key categories or preliminary hypotheses and propositions. Figure 3 shows types of STGT applications and associated possible outcomes. It is important that researchers understand, select, and clearly present their application approach and claim the appropriate outcomes, and that reviewers evaluate relevant outcomes accordingly, using the STGT Evaluation Guidelines (section 8).
7.1 Full STGT Method
Full STGT method refers to the application of the wider STGT method guidelines beyond basic data collection and analysis and including advanced stages of theory development, ultimately resulting in mature theories (see Figure 2). Full applications can be both standalone and as part of combined studies within larger mixed methods frameworks that apply other research methods, e.g. action research or survey research, and/or are conducted as the research component of research and development (R&D) projects.
Standalone STGT Studies
Standalone STGT studies refer to a full STGT method application in a standalone capacity and do not apply other research methods such as survey research to further validate the theory. Mixed data, both qualitative and quantitative, can be used within standalone studies. Standalone applications result in what Storey et al. (2020) refer to as descriptive knowledge, with potentially limited prescriptive knowledge in the form of practical guidelines and recommendations (e.g. as seen in Masood et al. (2020a), Masood et al. (2020b), and Shastri et al. (2021b)).
Standalone STGT studies can vary in scale. They can take the form of small studies that address narrow research topics, e.g. impact of human values on software design decisions, mob programming techniques, or the use of coupling and cohesion in microservices design, the last derived from a Twitter post by Grady Booch. They are conducted over a relatively small duration of time, e.g. six to eighteen months, typically by a single researcher or a team of researchers. Small standalone STGT studies are likely to result in one mature theory, usually presented in a single publication, although preliminary and partial results can be reported along the way.
They can also take the form of large scale studies that address broad research topics, e.g. role of ethics in artificial intelligence, future of work in a post pandemic world, and digital equity for the elderly. A comprehensive treatment of such topics requires several years, typically by a team of researchers. A standalone STGT study conducted as a part of a PhD program, typically three to four years in length, is an example of a large-scale STGT study. These are likely to result in multiple theories, usually published as multiple articles. How the multiple theories ‘hang together’ is usually detailed in the thesis report and should be briefly explained in individual articles with cross-reference to related articles.
Combined STGT Studies
There is an increasing trend of performing mixed methods, combined studies, and R&D projects in SE. In such cases, a full STGT study serves as a robust mechanism to research the problem space, leading to the development of highly relevant and customised solutions. For example, a small scale STGT study can be conducted as a first step to establish theoretical foundations, explain the problem in depth, and/or devise taxonomies followed by wider validation and/or tools development. Outcomes in these cases are expected to be what Storey et al. (2020) refer to as descriptive knowledge from the STGT research part that then ideally feed into or lead to the development of solutions, e.g. tools, techniques, platforms, from the application of other methods or in the development component of the R&D project.
7.2 STGT for Data Analysis
A limited application of STGT is possible through using its basic data analysis procedures of open coding and constant comparison, including memoing, within a different research method or framework. For example, the qualitative data collected as part of a case study, survey study, ethnography, action research, interview-based study, or even experiments, can be analysed using STGT’s basic data analysis procedures. These are likely to produce descriptive findings that encapsulate and describe dense codes, concepts, and categories. Using memoing, the beginnings of relationships between the concepts and (sub)categories may become evident. These can be presented as preliminary hypotheses or propositions (e.g. use of GT for data analysis within a case study (Bick et al., 2017) or within a mixed method research study (Madampe et al., 2020)).
In the absence of applying the remaining STGT steps and advanced data analysis and theory development procedures, the outcomes cannot and should not be claimed as a mature theory. Why is this so? Where both STGT’s basic and advanced data analysis procedures are applied within other methods, the study enters a murky area where it becomes difficult to justify outcomes as grounded theories at par with those produced through the application of the full STGT method propelled by theoretical sampling and iterative and interleaved data collection and analysis. In such cases, reviewers may question the motivations for the mixed approach along with asking why the full STGT method was not applied given its core data analysis steps – often regarded as the most rigorous part – were applied. In these cases, researchers are strongly recommended to consider applying the full STGT method.
7.3 Ethical Concerns
Conducting STGT studies relies on collecting and analysing data, publicly available and custom collected for the study. An early ethical assessment should be conducted to consider potential ethical concerns with the type and means of data collection and reporting. Where applicable, approval should be gained from the relevant ethical committee.
Given the wide range of data sources and techniques STGT studies can leverage, it is critical that ethical concerns are explicitly addressed. For example, for data that is custom collected with the informed consent of participants – such as through interviews, focus groups, observations, surveys, questionnaires – conditions of participant confidentiality and anonymity can be maintained by obscuring the identifiable details and sharing selected sanitised parts of the underlying data when reporting findings.
However, this is challenging when working with public data, typically available through modern, digital sources such as project repositories, social media posts, and YouTube videos. Where possible, STGT researchers should seek explicit consent (e.g. requesting a speaker to use their interview or talk video on YouTube for research purposes) or obscure identifiable information, especially in case of potential harm arising from reported research (e.g. a study using public data as examples of ‘bad’ programming practices with traceable individual or organisational identities).
A number of ethics resources can be consulted. For example, (Singer and Vinson, 2002) and the articles contained in a special issue on research ethics (Singer and Vinson, 2001) provide numerous examples of ethical issues and practice recommendations specific to SE research such as informed consent, beneficence to individual and organisation (maximizing benefits and minimizing harm to society and participants), and confidentiality. Guidelines on ethical elicitation and responsible use of public data, e.g. social media ethics framework (Townsend and Wallace, 2021), ethical decision-making in internet research (of Internet Research, 2012) and more are also available. The Menlo report (Dittrich et al., 2012) lays down the ethical principles guiding information and communication technology research as respect for persons, beneficence, justice, respect for law, and public interest. It also shares helpful approaches to identifying and balancing potential benefits and risks. Gold and Krinke (2020) used the Menlo framework to review and discuss the ethical implications of mining software repository (MSR) research across sources such as IDE events, version control data, build logs, Stack Overflow, issue trackers, and mailing lists. They identified a need for open discussion of ethical issues in the MSR community. Similar efforts have been made in the open source community, e.g. (El-Emam, 2001). STGT researchers should consult and apply ethical research frameworks and guides applicable to the data sources, types, and collection techniques they intend to use.
7.4 Reporting STGT
Where a researcher has applied the full STGT method rigorously and produced outcomes in the form of dense categories and supported relationships between them, they can and should claim and present it as a mature theory. The mature theory, whether developed using an emergent or structured mode, is ready for reporting. Because of the size and density of a mature theory, it is best presented in a journal, thesis, or book format which allows for enough room to explain all its constituent parts in detail. However, a mature theory can also be presented in a shorter format such as a conference paper in a highly condensed form with an overarching view of the phenomenon (e.g. theory of becoming agile (Hoda and Noble, 2017)).
Feedback on emerging concepts, categories, and hypotheses is also important, both from the practitioner and research communities to assess and improve relevance and rigour respectively. STGT researchers can submit partial results in the form of an emerging or saturated (sub)category or categories, or preliminary theory to research workshops, conferences, and journals. In doing so, they should note the paper reports on a specific (sub)category or preliminary theory. Where a number of papers are prepared resulting from the same underlying study, which can easily happen with large STGT studies, they should present a high level overview of the larger study and how the different papers relate to each other. Similarly, the overarching STGT application, e.g. as a full STGT study, should be explained while clarifying parts being reported. STGT researchers are strongly recommended to present their emerging findings to practitioner groups, including both their participants and others not related to the study, to receive feedback on how well they resonate with practice. Peer review and practitioner feedback help identify theoretical gaps and can further guide theory development.
8 STGT – Evaluation Guidelines
Evaluating an STGT manuscript (report, article, or thesis) involves evaluating both the method application and outcomes. Below I present a set of criteria for evaluating STGT studies, including applications and outcomes. While general quality criteria expected of most research studies apply, I focus on specific criteria that apply to STGT studies. These do not aim to constitute a checklist approach and are deliberately nuanced to distinguish between method application and outcomes, and further between preliminary (or partial) and mature outcomes. Authors can use these as guidelines to ensure they present strong manuscripts and reviewers can use these as evaluation criteria. Table VI summarises the evaluation criteria for the method application and outcomes.
Method Application | Partial findings or Preliminary theory | Mature theory |
---|---|---|
Credibility, Rigour | Originality, Relevance, Density | Novelty, Usefulness, Parsimony, Modifiability |
8.1 Evaluating STGT Method Application
The research method is the bridge that connects the study aims to its outcomes. When a manuscript is submitted for review, it invites the reviewers to cross that bridge. If the methodology bridge is assessed as shaky or unreliable, it will raise concerns about the robustness of the study and trustworthiness of its outcomes. Conversely, exemplar studies include sufficient method application details (see Table I). A well done STGT study will display and can be evaluated on two overarching criteria – credibility and rigour.
• Credibility.
Method application details provide evidence of clear understanding and effective application. Referring to or restating method guidelines is not enough. How those guidelines were practically applied in the study should be described, preferably including examples of application, to establish credibility.
– How were participants recruited?
– What initial sampling technique was applied?
– How was iterative and interleaved data collection and analysis ensured?
– How were memos written and used?
For manuscripts presenting mature theories, additional information should be provided.
– How was theoretical sampling applied?
– How were the research protocols refined through the iterations?
– Which mode of theory development was applied, emergent or structured?
– How was theoretical saturation achieved?
– What ontological and epistemological stances were used and why?
Depending on the ontological and epistemological stances, credibility may be assessed through additional criteria, e.g. replicability and reproducibility for positivist studies.
Lack of space in the manuscript is often cited as a justification for missing application details. Prioritising and treating the research methodology section as a first class citizen in the manuscript will enable enough room to discuss these aspects. Alternatively, supplementary materials can be used to add to the basic details covered in the manuscript.
• Rigour.
Sharing sufficient evidence of the underlying raw data, the data analysis procedures, and research artefacts, including evidence of theory development where applicable, within the manuscript is both important and possible. Providing enough and strong evidence helps establish both credibility and rigour of application.
– Example of basic coding. The methodology section should include at least one working example of the basic data analysis stage, covering open coding and constant comparison, that demonstrates how raw data was analysed to produce codes, concepts, and (sub)category (or categories, if presenting more than one category).
– Embedding of sanitised evidence. Pertinent quotations and excerpts from the raw underlying data should be carefully interwoven with the description of the findings, ensuring all identifying details are removed or replaced with pseudo-names, to improve strength of evidence and add contextual depth and flavour. Similarly, photos from observations can be included after obfuscating faces and other identifying details.
– Evidence of theory development. For manuscripts presenting mature theories, evidence of conducting the advanced theory development stage in emergent or structured mode should be supplied. For example, which coding structure was employed, where applicable.
Confidentiality and privacy concerns are often cited as justifications for not sharing evidence in qualitative studies. While these are very real ethical concerns, it does not equate to providing no evidence or weak evidence. Between complete disclosure that is impractical for qualitative studies and complete withholding of evidence, there is a balanced approach to establishing robustness through strength of evidence within the manuscript as described above. Evidence of memoing, additional working examples of the basic and advanced data analysis stages, descriptions of how a theoretical structure emerged or was applied can be included in supplementary materials.
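Replacing identifying details with pseudo-names before embedding quotations can be partly supported by a small script. The Python sketch below is a minimal illustration under stated assumptions: the names, the quotation, and the pseudonym mapping are all invented, and plain string replacement will not catch indirect identifiers (job titles, project names, distinctive phrasing), so manual review of every excerpt is still needed.

```python
import re


def sanitise(quote, pseudonyms):
    """Replace each identifying name with its pseudonym.

    Matching is on whole words and ignores case. Longer names are replaced
    first so that "Alice Smith" is handled before "Alice".
    """
    for name in sorted(pseudonyms, key=len, reverse=True):
        replacement = pseudonyms[name]
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so that the pseudonym is
        # inserted literally, without backslash interpretation.
        quote = pattern.sub(lambda _match, r=replacement: r, quote)
    return quote


# Hypothetical mapping and quotation, for illustration only.
pseudonyms = {"Alice Smith": "P1", "Alice": "P1", "Acme Corp": "Company A"}
raw_quote = "Alice said Acme Corp never lets the team self-assign; ask Alice Smith."

clean_quote = sanitise(raw_quote, pseudonyms)
print(clean_quote)
```

Keeping the pseudonym mapping in one place also helps apply the same participant labels consistently across the manuscript and any supplementary materials.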
8.2 Evaluating STGT Outcomes
STGT outcomes vary in nature. Partial findings (e.g. one category being reported in-depth) or preliminary theories (e.g. emerging hypotheses, propositions) result from the basic stage of a full STGT study or the limited application of STGT for data analysis with other methods. Mature theories result as final outcomes of full STGT studies. Different outcomes cannot be evaluated using a one-size-fits-all checklist.
Partial outcomes and preliminary theories should exhibit originality, relevance, and density.
• Originality.
Because an STGT study is purpose designed and executed, it is highly likely to produce original outcomes. Where originality is questioned, say because of how the outcomes map to or resemble existing research theories or practitioner models, originality should be demonstrated through: (a) evidence of a purpose-designed and executed STGT study and practical steps taken to avoid biases arising from personal researcher backgrounds and explaining possible interplay between the resulting theory and related practitioner literature and pre-existing theories, in case of a positivist stance or (b) discussion of potential biases and particular perspectives and their role in the construction of the theory, in case of a constructivist approach.
• Relevance.
Because an STGT study is empirically based, its findings should have relevance in the same or similar contexts. Relevance is said to be achieved when feedback from participants and other practitioners serves to validate the findings. Reviewers can also independently assess relevance based on their wider knowledge of the domain. A side-effect of achieving relevance is that the findings can come across as “unsurprising” or “intuitive” to the experienced reader. A robust STGT application will ensure the findings, albeit intuitive, are never trivial or lacking depth.
• Density.
Density refers to the depth or richness of categories. Density is said to be achieved when a category is supported by multiple concepts and properties that capture a range of contexts and nuances, and when the descriptions of the categories include multiple pertinent examples and evidence from the underlying data. Categories derived from the application of STGT are generally denser than those from descriptive or thematic analysis, reflecting the rigour of its data analysis procedures.
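The structure behind the density criterion (a category supported by concepts, each with properties and evidence from the data) can be pictured with a small data model. The following is only an illustrative sketch: the class names `Concept` and `Category`, the function `density_summary`, and the example data are all hypothetical and are not part of the STGT guidelines. The counts it produces are a bookkeeping aid and not a substitute for the qualitative judgement of depth and richness described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept supporting a category (hypothetical model, for illustration)."""
    name: str
    properties: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # quotes or excerpts from the data


@dataclass
class Category:
    """A category described by the concepts that support it."""
    name: str
    concepts: list[Concept] = field(default_factory=list)


def density_summary(category: Category) -> dict[str, int]:
    """Count the concepts, properties, and evidence items backing a category."""
    return {
        "concepts": len(category.concepts),
        "properties": sum(len(c.properties) for c in category.concepts),
        "evidence": sum(len(c.evidence) for c in category.concepts),
    }


# Hypothetical example data, invented purely to exercise the sketch.
example = Category(
    name="hypothetical category",
    concepts=[
        Concept("concept A", properties=["p1", "p2"], evidence=["quote 1"]),
        Concept("concept B", properties=["p3"], evidence=["quote 2", "quote 3"]),
    ],
)
print(density_summary(example))  # {'concepts': 2, 'properties': 3, 'evidence': 3}
```

A summary like this might help a researcher spot a category resting on a single concept or on little evidence, but whether a category is dense enough remains a matter of interpretation.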
Mature theories resulting from full STGT studies need to be evaluated to higher standards, achieving the traits of novelty (beyond originality), usefulness (beyond relevance), parsimony (beyond density), and modifiability.
• Novelty.
Since the main aim of STGT is to develop new theories instead of verifying existing ones, novelty is a key criterion for evaluating a mature theory outcome. Novelty is said to be achieved when the theory presents unprecedented insights on a relatively under-researched phenomenon (e.g. Masood et al., 2020a) or a fresh approach to understanding a well-studied phenomenon (e.g. Hoda and Noble, 2017). In fact, novelty can be assumed by default once credibility and rigour of application are established. If a theory is seen to closely resemble existing models from research or practitioner sources, its novelty and the role of relevant models and literature can be further scrutinised.
• Usefulness.
Mature theories go beyond being relevant and resonating with practice. They should provide actionable insights and recommendations for practice and preferably for research. Usefulness is said to be achieved when the possible applications and implications for research and practice are shared (e.g. Hoda and Noble, 2017; Masood et al., 2020a).
• Parsimony.
Parsimony refers to the compactness of the theory, such that it exhibits conceptual density and explains potentially complex phenomena in simple and elegant ways. Parsimony is said to be achieved when the theory can be described in a compact way, in a few sentences or a succinct paragraph that captures all the key categories and the relationships between them (e.g. see the theory of Scrum variations (Masood et al., 2020b)), and is preferably depicted in an easy-to-understand visual format (e.g. Hoda et al., 2011; Hoda and Noble, 2017; Masood et al., 2020a). At the same time, its full description demonstrates rich, nuanced, and often multi-faceted or multi-layered findings that are difficult to achieve with other methods.
• Modifiability.
Theories derived from an STGT method application represent the state of practice at a moment in time and are locally generalisable to the studied and similar contexts. In this sense, a credible and rigorously derived STGT theory can never be falsified. A robust theory stands the test of time through its ability to be modified in light of new evidence and new contexts. The generalisability of the theory improves as it evolves to accommodate data from varying contexts. Possibilities of deriving strong theories applicable in a wide set of contexts are explored next.
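The point that different outcomes call for different criteria, rather than a one-size-fits-all checklist, can be summarised as a simple lookup. The sketch below is a minimal illustration of the mapping stated in this section; the names `Outcome`, `BASIC_CRITERIA`, `MATURE_CRITERIA`, and `criteria_for` are hypothetical and introduced only for this example.

```python
from enum import Enum


class Outcome(Enum):
    """Types of STGT outcome named in this section."""
    PARTIAL = "partial findings"
    PRELIMINARY = "preliminary theory"
    MATURE = "mature theory"


# Criteria for partial outcomes and preliminary theories.
BASIC_CRITERIA = ("originality", "relevance", "density")

# Higher standards applied to mature theories from full STGT studies.
MATURE_CRITERIA = ("novelty", "usefulness", "parsimony", "modifiability")


def criteria_for(outcome: Outcome) -> tuple[str, ...]:
    """Return the evaluation criteria that apply to the given outcome type."""
    if outcome is Outcome.MATURE:
        return MATURE_CRITERIA
    return BASIC_CRITERIA


for outcome in Outcome:
    print(f"{outcome.value}: {', '.join(criteria_for(outcome))}")
```

A reviewer could use such a lookup as a reminder of which traits to look for, with the assessment of each trait still resting on the descriptions given above.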
9 Vision and New Frontiers
This article is an attempt to acknowledge, highlight, and label the unique socio-technical research context of software engineering and introduce Socio-Technical Grounded Theory (STGT). The philosophical, methodological, and evaluation guidelines of STGT presented in this article will be particularly valuable to researchers who are new to theory development, struggling to understand and apply traditional GT methods, unaware or unclear about the different traditional GT versions and how to select one, unsure if they want to conduct a full GT study or only use its data analysis techniques, and unsure how to present and evaluate preliminary, partial, and mature findings of GT studies to a high standard. As shown in Table II, if applied well, STGT can help address each of these challenges.
Conversely, STGT may not be applicable in all research contexts. For example, where researchers are interested in studying the social nature of a phenomenon and not its technical aspects, or where the phenomenon itself is primarily social, the traditional sociological GT methods may be more applicable.
In summary, the key contributions of this article are to:
• Present Socio-Technical Grounded Theory (STGT) guidelines, including its unique socio-technical research context (as above), philosophical foundations (section 4.2 and Table III), methodological steps and procedures (detailed in sections 5 and 6, and summarised in ‘ 2 and Tables IV and V), guidelines on application, outcomes, and reporting (detailed in section 7 and captured in Figure 3), and evaluation guidelines (section 8 and Table VI).
The concepts introduced in this article push research boundaries into new frontiers in many ways.
• Beyond software engineering: STGT has the potential to propel theory development in all contexts where the phenomenon can be best understood by analysing primarily qualitative data and in domains where socio-technical context plays a key role, e.g. SE and related socio-technical disciplines such as information systems, computer science, human computer interaction, artificial intelligence, and user experience research. With the increasing proliferation of technology and digitalisation, it may also be useful for classic disciplines such as the social sciences, medicine, and education.
• Beyond grounded theory: As the world becomes increasingly socio-technical, the framework of socio-technical research as defined in this article holds relevance beyond STGT, for most research methods such as socio-technical case studies, survey research, and ethnography.
• Beyond theory development: STGT encourages different levels of application, including full STGT study for theory development and STGT for data analysis, applied in a standalone capacity or in combination with other methods, as part of R&D studies or large programs.
• Beyond local theories: Grounded theories are widely acknowledged to be local in application. While universality is not an aim or quality criterion of STGT studies, the relevance and generalisability of theories can be improved to cover a greater variety of underlying contexts, up to a point without losing contextual nuances. This can be done through the use of modern sources of data (e.g. GitHub, StackOverflow, Twitter) and modern approaches to data collection (e.g. mining software repositories) and analysis (e.g. natural language processing, sentiment analysis), which make it possible to strengthen, deepen, and expand the scale of grounded theories.
• Beyond manual theory development: Further advancements in technology and artificial intelligence offer unexplored potential in supplementing, augmenting, and automating parts of qualitative data analysis to ease human effort and improve both the quality and scale of theory development.
• Beyond human learning: Developing theories helps build collective human knowledge and propel human learning. However, the fundamental systematic and rigorous steps and procedures of the STGT method, i.e. identifying patterns, insights, and relationships through constant comparison across datasets, can also be potentially harnessed as approaches to guide machine and deep learning in AI-based systems.
Acknowledgements
I sincerely thank Margaret-Anne Storey, Philippe Kruchten, John Grundy, Klaas-Jan Stol, Christoph Treude, Zainab Masood, Steve Adolph, Ingo Mueller, and Johannes Berglind Söderqvist for providing their thoughtful and invaluable feedback on the drafts. I am also grateful to James Noble and George Allan for their guidance in the early days of my GT journey. I acknowledge the invaluable contributions of Barney Glaser, Anselm Strauss, Juliet Corbin, and Kathy Charmaz to the Grounded Theory research community.
References
- Adolph et al. (2011) Adolph, S., Hall, W. and Kruchten, P. (2011). Using grounded theory to study the experience of software development, Empirical Software Engineering 16(4): 487–513.
- Adolph et al. (2012) Adolph, S., Kruchten, P. and Hall, W. (2012). Reconciling perspectives: A grounded theory of how people manage the process of software development, Journal of Systems and Software 85(6): 1269–1286.
- Babchuk (1996) Babchuk, W. A. (1996). Glaser or Strauss? Grounded theory and adult education, Proceedings of the 15th Annual Midwest Research-to-Practice Conference in Adult, Continuing, and Community Education, ERIC, pp. 1–6.
- Baker et al. (1992) Baker, C., Wuest, J. and Stern, P. N. (1992). Method slurring: The grounded theory/phenomenology example, Journal of advanced nursing 17(11): 1355–1360.
- Becker (1993) Becker, P. H. (1993). Common pitfalls in published grounded theory research, Qualitative health research 3(2): 254–260.
- Bick et al. (2017) Bick, S., Spohrer, K., Hoda, R., Scheerer, A. and Heinzl, A. (2017). Coordination challenges in large-scale software development: a case study of planning misalignment in hybrid settings, IEEE Transactions on Software Engineering 44(10): 932–950.
- Bruscaglioni (2016) Bruscaglioni, L. (2016). Theorizing in grounded theory and creative abduction, Quality & Quantity 50(5): 2009–2024.
- Burch (2018) Burch, R. (2018). Charles Sanders Peirce, in E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy, winter 2018 edn, Metaphysics Research Lab, Stanford University.
- Calefato et al. (2018) Calefato, F., Lanubile, F., Maiorano, F. and Novielli, N. (2018). Sentiment polarity detection for software development, Empirical Software Engineering 23(3): 1352–1382.
- Charmaz (2006) Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis, Sage.
- Charmaz (2014) Charmaz, K. (2014). Constructing grounded theory, Sage.
- Coleman and O’Connor (2007) Coleman, G. and O’Connor, R. (2007). Using grounded theory to understand software process improvement: A study of Irish software product companies, Information and Software Technology 49(6): 654–667.
- Coyne (1997) Coyne, I. T. (1997). Sampling in qualitative research. purposeful and theoretical sampling; merging or clear boundaries?, Journal of advanced nursing 26(3): 623–630.
- Dagenais et al. (2010) Dagenais, B., Ossher, H., Bellamy, R. K., Robillard, M. P. and De Vries, J. P. (2010). Moving into a new software project landscape, Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pp. 275–284.
- Dittrich et al. (2012) Dittrich, D., Kenneally, E. et al. (2012). The Menlo report: Ethical principles guiding information and communication technology research, Technical report, US Department of Homeland Security.
- Douven (2017) Douven, I. (2017). Abduction, in E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy, summer 2017 edn, Metaphysics Research Lab, Stanford University.
- Easterbrook et al. (2008) Easterbrook, S., Singer, J., Storey, M.-A. and Damian, D. (2008). Selecting empirical methods for software engineering research, Guide to advanced empirical software engineering, Springer, pp. 285–311.
- Ebert (2018) Ebert, C. (2018). 50 years of software engineering: Progress and perils, IEEE Software 35(5): 94–101.
- El-Emam (2001) El-Emam, K. (2001). Ethics and open source, Empirical Software Engineering 6(4): 291.
- Fassinger (2005) Fassinger, R. E. (2005). Paradigms, praxis, problems, and promise: Grounded theory in counseling psychology research., Journal of counseling psychology 52(2): 156.
- Fernández and Passoth (2019) Fernández, D. M. and Passoth, J.-H. (2019). Empirical software engineering: from discipline to interdiscipline, Journal of Systems and Software 148: 170–179.
- Gibson and Hartman (2013) Gibson, B. and Hartman, J. (2013). Rediscovering grounded theory, Sage.
- Glaser (1978) Glaser, B. (1978). Theoretical sensitivity, Advances in the methodology of grounded theory .
- Glaser (1992) Glaser, B. G. (1992). Basics of grounded theory analysis: Emergence vs forcing, Sociology Press.
- Glaser (1998) Glaser, B. G. (1998). Doing grounded theory: Issues and discussions, Sociology Press.
- Glaser and Strauss (1966) Glaser, B. G. and Strauss, A. L. (1966). Awareness of dying, Transaction Publishers.
- Glaser and Strauss (1967) Glaser, B. G. and Strauss, A. L. (1967). Discovery of grounded theory: Strategies for qualitative research, Aldine.
- Glaser and Strauss (1980) Glaser, B. G. and Strauss, A. L. (1980). Time for dying, Aldine.
- Glaser and Strauss (2011) Glaser, B. G. and Strauss, A. L. (2011). Status passage, Transaction Publishers.
- Glaser and Strauss (2017) Glaser, B. G. and Strauss, A. L. (2017). Discovery of grounded theory: Strategies for qualitative research, Routledge.
- Gold and Krinke (2020) Gold, N. E. and Krinke, J. (2020). Ethical mining: A case study on MSR mining challenges, Proceedings of the 17th International Conference on Mining Software Repositories, pp. 265–276.
- Gregory et al. (2016) Gregory, P., Barroca, L., Sharp, H., Deshpande, A. and Taylor, K. (2016). The challenges that challenge: Engaging with agile practitioners’ concerns, Information and Software Technology 77: 92–104.
- Guba and Lincoln (1994) Guba, E. G. and Lincoln, Y. S. (1994). Competing paradigms in qualitative research, Handbook of qualitative research 2(163-194): 105.
- Haig (1995) Haig, B. D. (1995). Grounded theory as scientific method, Philosophy of Education 28(1): 1–11.
- Heath and Cowley (2004) Heath, H. and Cowley, S. (2004). Developing a grounded theory approach: a comparison of Glaser and Strauss, International Journal of Nursing Studies 41(2): 141–150.
- Hidellaarachchi et al. (2021) Hidellaarachchi, D., Grundy, J., Hoda, R. and Madampe, K. (2021). The effects of human aspects on the requirements engineering process: A systematic literature review, IEEE Transactions on Software Engineering .
- Hine (2008) Hine, C. (2008). Virtual ethnography: Modes, varieties, affordances, The SAGE handbook of online research methods pp. 257–270.
- Hoda and Noble (2017) Hoda, R. and Noble, J. (2017). Becoming agile: a grounded theory of agile transitions in practice, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), IEEE, pp. 141–151.
- Hoda et al. (2010) Hoda, R., Noble, J. and Marshall, S. (2010). Organizing self-organizing teams, Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pp. 285–294.
- Hoda et al. (2011) Hoda, R., Noble, J. and Marshall, S. (2011). The impact of inadequate customer collaboration on self-organizing agile teams, Information and Software Technology 53(5): 521–534.
- Hoda et al. (2012a) Hoda, R., Noble, J. and Marshall, S. (2012a). Developing a grounded theory to explain the practices of self-organizing agile teams, Empirical Software Engineering 17(6): 609–639.
- Hoda et al. (2012b) Hoda, R., Noble, J. and Marshall, S. (2012b). Self-organizing roles on agile software development teams, IEEE Transactions on Software Engineering 39(3): 422–444.
- Hoda et al. (2018) Hoda, R., Salleh, N. and Grundy, J. (2018). The rise and evolution of agile software development, IEEE Software 35(5): 58–63.
- Hussain et al. (2020) Hussain, W., Perera, H., Whittle, J., Nurwidyantoro, A., Hoda, R., Shams, R. A. and Oliver, G. (2020). Human values in software engineering: Contrasting case studies of practice, IEEE Transactions on Software Engineering .
- James and Burkhardt (1975) James, W. and Burkhardt, F. (1975). Pragmatism, Vol. 1, Harvard University Press.
- Jantunen and Gause (2014) Jantunen, S. and Gause, D. C. (2014). Using a grounded theory approach for exploring software product management challenges, Journal of Systems and Software 95: 32–51.
- Kennedy and Lingard (2006) Kennedy, T. J. and Lingard, L. A. (2006). Making sense of grounded theory in medical education, Medical Education 40(2): 101–108.
- Kenny and Fourie (2014) Kenny, M. and Fourie, R. (2014). Tracing the history of grounded theory methodology: From formation to fragmentation, The Qualitative Report 19(52): 1.
- Kitchenham (2004) Kitchenham, B. (2004). Procedures for performing systematic reviews, Keele, UK, Keele University 33(2004): 1–26.
- Madampe et al. (2020) Madampe, K., Hoda, R. and Singh, P. (2020). Towards understanding emotional response to requirements changes in agile teams, New Ideas and Emerging Results track of the 42nd IEEE/ACM International Conference on Software Engineering, ICSE2020.
- Maglyas et al. (2013) Maglyas, A., Nikula, U. and Smolander, K. (2013). What are the roles of software product managers? an empirical investigation, Journal of Systems and Software 86(12): 3071–3090.
- Martin (2019) Martin, V. B. (2019). Using popular and academic literature as data for formal grounded theory, The Sage Handbook of Current Developments in Grounded Theory. Sage pp. 222–243.
- Masood et al. (2020a) Masood, Z., Hoda, R. and Blincoe, K. (2020a). How agile teams make self-assignment work: a grounded theory study, Empirical Software Engineering 25(6): 4962–5005.
- Masood et al. (2020b) Masood, Z., Hoda, R. and Blincoe, K. (2020b). Real world scrum a grounded theory of variations in practice, IEEE Transactions on Software Engineering .
- Melegati and Wang (2021) Melegati, J. and Wang, X. (2021). Surfacing paradigms underneath research on human and social aspects of software engineering, Proceedings of the 14th International Conference on Cooperative and Human Aspects of Software Engineering.
- Novielli et al. (2018) Novielli, N., Girardi, D. and Lanubile, F. (2018). A benchmark study on sentiment analysis for software engineering research, 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), IEEE, pp. 364–375.
- Association of Internet Researchers (2012) Association of Internet Researchers (2012). Ethical decision-making and internet research, http://aoir.org/reports/ethics2.pdf.
- Patton (1990) Patton, M. Q. (1990). Qualitative evaluation and research methods, Sage.
- Peirce (1960) Peirce, C. S. (1960). Collected papers of Charles Sanders Peirce, Vol. 2, Harvard University Press.
- Perera et al. (2020) Perera, H., Hussain, W., Whittle, J., Nurwidyantoro, A., Mougouei, D., Shams, R. A. and Oliver, G. (2020). A study on the prevalence of human values in software engineering publications, 2015-2018, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), IEEE, pp. 409–420.
- Pink (2020) Pink, S. (2020). Doing visual ethnography, Sage.
- Prikladnicki et al. (2013) Prikladnicki, R., Dittrich, Y., Sharp, H., De Souza, C., Cataldo, M. and Hoda, R. (2013). Cooperative and human aspects of software engineering: Chase 2013, ACM SIGSOFT Software Engineering Notes 38(5): 34–37.
- Reichertz (2007) Reichertz, J. (2007). Abduction: The logic of discovery of grounded theory, Sage.
- Rennie et al. (1988) Rennie, D. L., Phillips, J. R. and Quartaro, G. K. (1988). Grounded theory: A promising approach to conceptualization in psychology?, Canadian Psychology/Psychologie canadienne 29(2): 139.
- Richardson and Kramer (2006) Richardson, R. and Kramer, E. H. (2006). Abduction as the type of inference that characterizes the development of a grounded theory, Qualitative Research 6(4): 497–513.
- Runeson and Höst (2009) Runeson, P. and Höst, M. (2009). Guidelines for conducting and reporting case study research in software engineering, Empirical Software Engineering 14(2): 131–164.
- Russo and Nuseibeh (2001) Russo, A. and Nuseibeh, B. (2001). On the use of logical abduction in software engineering, Handbook of Software Engineering and Knowledge Engineering: Volume I: Fundamentals, World Scientific, pp. 889–914.
- Sandelowski (1995) Sandelowski, M. (1995). Sample size in qualitative research, Research in Nursing & Health 18(2): 179–183.
- Schreiber et al. (2001) Schreiber, R. S., Stern, P. N. et al. (2001). Using grounded theory in nursing, Springer.
- Schwandt et al. (1994) Schwandt, T. A. et al. (1994). Constructivist, interpretivist approaches to human inquiry, Handbook of qualitative research 1: 118–137.
- Schwartz (1989) Schwartz, D. (1989). Visual ethnography: Using photography in qualitative research, Qualitative sociology 12(2): 119–154.
- Seaman (1999) Seaman, C. B. (1999). Qualitative methods in empirical studies of software engineering, IEEE Transactions on Software Engineering 25(4): 557–572.
- Sedano et al. (2017) Sedano, T., Ralph, P. and Péraire, C. (2017). Software development waste, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), IEEE, pp. 130–140.
- Shank (1998) Shank, G. (1998). The extraordinary ordinary powers of abductive reasoning, Theory & Psychology 8(6): 841–860.
- Sharp et al. (2016) Sharp, H., Dittrich, Y. and De Souza, C. R. (2016). The role of ethnographic studies in empirical software engineering, IEEE Transactions on Software Engineering 42(8): 786–804.
- Shastri et al. (2021a) Shastri, Y., Hoda, R. and Amor, R. (2021a). The role of the project manager in agile software development projects, Journal of Systems and Software 173: 110871.
- Shastri et al. (2021b) Shastri, Y., Hoda, R. and Amor, R. (2021b). Spearheading agile: the role of the scrum master in agile projects, Empirical Software Engineering 26(1): 1–31.
- Singer and Vinson (2001) Singer, J. and Vinson, N. G. (2001). Why and how research ethics matters to you, yes you!, Empirical Software Engineering 6(4): 287–290.
- Singer and Vinson (2002) Singer, J. and Vinson, N. G. (2002). Ethical issues in empirical studies of software engineering, IEEE Transactions on Software Engineering 28(12): 1171–1180.
- Sjøberg et al. (2008) Sjøberg, D. I., Dybå, T., Anda, B. C. and Hannay, J. E. (2008). Building theories in software engineering, Guide to advanced empirical software engineering, Springer, pp. 312–336.
- Sousa et al. (2018) Sousa, L., Oliveira, A., Oizumi, W., Barbosa, S., Garcia, A., Lee, J., Kalinowski, M., de Mello, R., Fonseca, B., Oliveira, R. et al. (2018). Identifying design problems in the source code: A grounded theory, Proceedings of the 40th International Conference on Software Engineering, pp. 921–931.
- Stol et al. (2016) Stol, K.-J., Ralph, P. and Fitzgerald, B. (2016). Grounded theory in software engineering research: a critical review and guidelines, Proceedings of the 38th International Conference on Software Engineering, pp. 120–131.
- Storey et al. (2020) Storey, M.-A., Ernst, N. A., Williams, C. and Kalliamvakou, E. (2020). The who, what, how of software engineering research: a socio-technical framework, Empirical Software Engineering 25(5): 4097–4129.
- Storey et al. (2010) Storey, M.-A., Treude, C., van Deursen, A. and Cheng, L.-T. (2010). The impact of social media on software engineering practices and tools, Proceedings of the FSE/SDP workshop on Future of software engineering research, pp. 359–364.
- Strauss and Corbin (1990) Strauss, A. and Corbin, J. (1990). Basics of qualitative research, Sage.
- Strauss and Corbin (1994) Strauss, A. and Corbin, J. (1994). Grounded theory methodology, Handbook of qualitative research 17(1): 273–285.
- Strauss (1987) Strauss, A. L. (1987). Qualitative analysis for social scientists, Cambridge University Press.
- Stray et al. (2016) Stray, V., Sjøberg, D. I. and Dybå, T. (2016). The daily stand-up meeting: A grounded theory study, Journal of Systems and Software 114: 101–124.
- Timonen et al. (2018) Timonen, V., Foley, G. and Conlon, C. (2018). Challenges when using grounded theory: A pragmatic introduction to doing GT research, International Journal of Qualitative Methods 17(1): 1609406918758086.
- Townsend and Wallace (2021) Townsend, L. and Wallace, C. (2021). Social media research: A guide to ethics, https://www.gla.ac.uk/media/Media_487729_smxx.pdf.
- Waterman et al. (2015) Waterman, M., Noble, J. and Allan, G. (2015). How much up-front? a grounded theory of agile architecture, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1, IEEE, pp. 347–357.
- Watling and Lingard (2012) Watling, C. J. and Lingard, L. (2012). Grounded theory in medical education research: AMEE guide no. 70, Medical Teacher 34(10): 850–861.
- Whitworth (2011) Whitworth, B. (2011). The social requirements of technical systems, Virtual Communities: Concepts, Methodologies, Tools and Applications, IGI Global, pp. 1461–1481.
- Wohlin and Aurum (2015) Wohlin, C. and Aurum, A. (2015). Towards a decision-making structure for selecting a research design in empirical software engineering, Empirical Software Engineering 20(6): 1427–1455.
- Wolfswinkel et al. (2013) Wolfswinkel, J. F., Furtmueller, E. and Wilderom, C. P. (2013). Using grounded theory as a method for rigorously reviewing literature, European journal of information systems 22(1): 45–55.
Rashina Hoda is an Associate Professor of Software Engineering in the Faculty of Information Technology at Monash University, Melbourne. Rashina specialises in the use of Grounded Theory in Software Engineering. Rashina received a distinguished paper award for her grounded theory of becoming agile at the IEEE International Conference on Software Engineering, ICSE2017. She serves on the Review Board of the IEEE Transactions on Software Engineering and the Advisory Board of IEEE Software. She presented a Technical Briefing on “decoding grounded theory for software engineering” at ICSE2021. Rashina is currently writing a book detailing the Socio-Technical Grounded Theory method introduced in this article. For more information visit: www.rashina.com