Coevolution of theoretical and applied research: a case study of graphene research by temporal and geographic analysis
Abstract
As part of science of science (SciSci) research, the evolution of scientific disciplines has recently attracted a great deal of attention. Discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. In this paper we focus on graphene research, a fast-growing field that covers both theoretical and applied study. Using a co-clustering method, we split the graphene literature into two groups and confirm that one group corresponds to theoretical research (T) and the other to applied research (A). We analyze the T/A proportion and find that applied research has become increasingly popular since 2007. Geographical analysis demonstrates that countries have different T/A preferences and have reacted differently to this research trend. Analysis of the interaction between the two groups shows that T relies overwhelmingly on T and A relies heavily on A; this dependency is very stable for T but has changed markedly for A. No geographic difference is found in the interaction dynamics. Our results give a comprehensive picture of the evolution of graphene research and provide a general framework that can be applied to other disciplines.
1 Introduction
As an emerging field, science of science (SciSci) has recently attracted a great deal of attention (Zeng et al., 2017; Fortunato et al., 2018; Wang and Barabási, 2021). In SciSci, science is treated as a complex system that includes ideas, papers, scientists, funding agencies and, more importantly, the connections among them. Using methods from complex systems and complex networks, researchers have uncovered many interesting patterns of science from data, including the universal citation distribution (Radicchi et al., 2008), scientists' career dynamics (Petersen et al., 2012) and the role of teams in science (Wuchty et al., 2007), to name just a few.
As a complex system, the science ecosystem has an obvious hierarchical structure. Roughly speaking, science includes physical science and life science; physical science includes physics, astronomy, chemistry and earth science; physics includes mechanics, electromagnetism, thermodynamics, relativity, quantum physics and so on; and electromagnetism itself has its own internal fine structure. At each hierarchical level, scientific elements (people, ideas, papers) are closely connected within disciplines and loosely connected between disciplines. This naturally existing modular structure is convenient for SciSci research, since we can concentrate on a particular discipline by assuming that the influence of other disciplines is negligible. Discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. As a result, interest in discipline analysis has been growing rapidly.
One of the earliest extensive studies on this subject was done by Bettencourt and Kaur (2011), who focused on the evolution of sustainability science, a discipline that emerged in the 1980s. By analyzing a large corpus of relevant publications, they found that sustainability science has grown explosively since its advent. The discipline has an unusual spatial distribution of contributions: they are widely distributed across both developed and developing countries, and the collaboration network has strong roots in national capitals rather than in traditionally more academic cities. To capture the main themes that define the field, the authors decomposed the corpus into traditional disciplines and found that these themes concern the integrated management of human, social and ecological systems from an engineering and policy perspective. Using collaboration network analysis, Bettencourt and Kaur (2011) also showed that sustainability science became unified around the year 2000.
A more recent discipline analysis looks at artificial intelligence (AI) research (Frank et al., 2019). The authors asked whether AI research and the relevant social science fields keep pace with each other. To answer this question, they used citations to track communication between AI research and other fields. Analyzing citation flows from 1950 to 2018, they found that these flows are neither constant nor symmetric. In its early years AI research cited philosophy, geography and art heavily, whereas current AI research cites mathematics and computer science most strongly. Conversely, other fields have not cited AI research in proportion to its growing number of publications, revealing an attention gap between AI research and social science.
Here, we conduct a discipline analysis of graphene research, a relatively young field focused on a single layer of carbon atoms arranged in a two-dimensional honeycomb lattice. Since the seminal work of Novoselov et al. (2004), which helped Andre Geim and Konstantin Novoselov win the 2010 Nobel Prize in Physics, interest in graphene has grown explosively, leading to the European Commission's €1 billion Graphene Flagship (homepage, 2021a) and the creation of the National Graphene Institute in the UK (homepage, 2021b). It is therefore important and beneficial to understand graphene research as an emerging research discipline. Doing so will not only increase our knowledge of the scientific enterprise at the discipline level, but also provide useful information about the whole discipline to PhD students, graphene scientists, universities and funding agencies.
Broadly speaking, two things attract researchers to graphene: 1) graphene is unusual from a theoretical perspective. It is the first example of a two-dimensional atomic crystal, a kind of structure previously thought to be thermodynamically unstable, and its electronic properties are peculiar: electrons behave as massless relativistic particles. 2) Graphene is very attractive from an applied perspective. Many of its properties surpass those of all other materials, so graphene has enormous potential to improve applications ranging from electronics to medicine. These two motivations lead researchers in different directions and form two branches: a theoretical branch and an applied branch. Although this division is well known in the graphene research community, there is a lack of quantitative analysis from this perspective (Barth and Marx, 2008; Lv et al., 2011; Nguyen et al., 2020). To fill this gap, we collect the graphene research literature and conduct a systematic analysis. First, we identify the two branches by their word frequencies and then validate the division with keywords. We find that the theoretical branch includes around one third of all papers and the applied branch the remaining two thirds. We then examine how the two branches developed temporally and geographically. The yearly proportion of the applied branch decreased from 2004 to 2007 and has kept increasing since then. Research contributions come mainly from a few regions, and the geographic distributions differ considerably between theoretical and applied research. Finally, we study the interaction between the two branches and find that theoretical research relies overwhelmingly on past theoretical research, while applied research relies heavily on past applied research. Temporal analysis suggests that this self-dependence was very stable for theoretical research but grew significantly for applied research, and this trend holds for all main regions of graphene research.
This paper is organized as follows. In Sec. 2 we describe our dataset and the framework used to quantify and analyze internal coevolution. In Sec. 3 we present our main results, namely the internal structure of graphene research and its linguistic validation, the spatial and temporal analysis of graphene publications and their interdependence, and we also discuss limitations and implications. Sec. 4 summarizes our findings and suggests future perspectives.
2 Methodology
2.1 Dataset
To build the dataset, we searched the Web of Science Core Collection with 'graphene' as the topic keyword and obtained bibliographic records of 135,617 graphene-related journal papers in August 2018. These records were also used in another of our papers (Nguyen et al., 2020), where interested readers can find more information about them. Web of Science may not cover every journal publication on this topic; however, given the wide coverage of the Science Citation Index, most mainstream graphene papers should be included in our dataset.
The 135,617 records include various document types: articles, proceedings papers, reviews, meeting abstracts, etc. Since the primary focus of this paper is the coevolution of the theoretical and applied branches of graphene research, we include only research articles with a DOI and a publication year. This leaves 115,988 records, and all analyses in this paper use them unless otherwise mentioned.
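A minimal Python sketch of the record filter described above, assuming each Web of Science record is available as a dictionary keyed by the export field tags `DT` (document type), `DI` (DOI) and `PY` (publication year). The function name `keep_research_articles` is hypothetical, and the exact-match rule on the document type is an assumption rather than the authors' documented procedure.

```python
def keep_research_articles(records):
    """Keep research articles that have both a DOI and a publication year.

    Assumption: each record is a dict keyed by Web of Science export tags,
    'DT' = document type, 'DI' = DOI, 'PY' = publication year.
    """
    kept = []
    for rec in records:
        doc_type = str(rec.get("DT", "")).strip().lower()
        doi = str(rec.get("DI", "")).strip()
        year = str(rec.get("PY", "")).strip()
        # Exact match on "article" drops reviews, proceedings papers, abstracts, ...
        if doc_type == "article" and doi and year.isdigit():
            kept.append(rec)
    return kept


records = [
    {"DT": "Article", "DI": "10.1000/a", "PY": "2010"},
    {"DT": "Review", "DI": "10.1000/b", "PY": "2011"},
    {"DT": "Article", "DI": "", "PY": "2012"},       # no DOI
    {"DT": "Article", "DI": "10.1000/c", "PY": ""},  # no publication year
]
print(len(keep_research_articles(records)))  # 1
```

Records typed as a combination (for example an article that is also a proceedings paper) are dropped by the exact match; whether the original selection treated such records the same way is not stated in the text.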
2.2 Co-clustering
To group papers into clusters by research topic, we applied the block diagonal co-clustering algorithm introduced by Ailem et al. (2015, 2016), namely CoClus, which divides papers into a number of non-overlapping clusters together with their characteristic words. Papers on different research topics tend to have different word frequency profiles, and CoClus has been shown to co-cluster document-word matrices effectively (Ailem et al., 2016). In this paper, we assume (a) the linguistic content of the title and abstract is sufficient to determine the topic of each article, and (b) words that appear in fewer than 0.01% of all records have an insignificant impact on the clustering process. In general, the CoClus algorithm partitions the object set $I$ of size $P$ and the corresponding attribute set $J$ of size $N$ into non-overlapping clusters with high in-cluster density and low cross-cluster density. In our case, the goal is to split our data collection (papers and their words) into groups, each consisting of a set of papers and a set of words, such that the words of a group are used more frequently in papers of the same group and less frequently in papers of other groups.
First, every article $i$ is represented by an $N$-dimensional vector:

$$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iN}), \tag{1}$$

where $N$ is the number of feature words considered in this study and $x_{ij}$ is the count of word $j$ in article $i$. By stacking all $P$ paper vectors, we construct a paper-word matrix $A$:

$$A = (a_{ij})_{P \times N}, \qquad a_{ij} = x_{ij}, \qquad a_{i\cdot} = \sum_{j=1}^{N} a_{ij}, \tag{2}$$

where $a_{i\cdot}$ represents the total number of feature words in the title and abstract of article $i$. The algorithm then clusters the matrix by introducing a block seriation $C$:

$$C = (c_{ij})_{P \times N}, \qquad c_{ij} \in \{0, 1\}, \tag{3}$$

where $c_{ij} = 1$ if object $i$ and attribute $j$ belong to the same cluster and $c_{ij} = 0$ otherwise. Ailem et al. (2015) introduced a reformulated modularity:

$$Q(A, C) = \frac{1}{a_{\cdot\cdot}} \sum_{i=1}^{P} \sum_{j=1}^{N} \left( a_{ij} - \frac{a_{i\cdot}\, a_{\cdot j}}{a_{\cdot\cdot}} \right) c_{ij}, \tag{4}$$

where $a_{\cdot j} = \sum_{i} a_{ij}$ and $a_{\cdot\cdot} = \sum_{i,j} a_{ij}$. A partition with high in-cluster density and low cross-cluster density corresponds to a high $Q(A, C)$, so the $C$ that maximizes $Q(A, C)$ is the partition we seek. Because the reformulated modularity depends linearly on $C$, the co-clustering task can be regarded as an integer linear programming problem. We used the Python package CoClust (Role et al., 2019) to find the partition that maximizes $Q(A, C)$. Each paper or word receives a cluster label in $\{1, \ldots, k\}$, where $k$ is the number of clusters, and any cluster can be constructed by grouping the papers or words with the same label. This is the main idea of the CoClus algorithm; we refer interested readers to Ailem et al. (2015, 2016) for details.
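To make Eq. (4) concrete, the NumPy sketch below computes $Q(A, C)$ for given paper and word labels and runs a simple alternating update in which every paper, then every word, is reassigned to the cluster that maximizes its contribution to $Q$ given the other side's labels. This is a simplified illustration in the spirit of the direct modularity maximization of Ailem et al., not the CoClust package used in the study; the function names and the explicit initial word labels `init_cols` are hypothetical, and random restarts and convergence tolerances are omitted.

```python
import numpy as np


def modularity_matrix(A):
    """Return B with B_ij = a_ij - a_i. * a_.j / a.. and the total a.. (Eq. 4)."""
    A = np.asarray(A, dtype=float)
    total = A.sum()
    row_sums = A.sum(axis=1, keepdims=True)  # a_i.
    col_sums = A.sum(axis=0, keepdims=True)  # a_.j
    return A - row_sums @ col_sums / total, total


def modularity(A, row_labels, col_labels):
    """Reformulated modularity Q(A, C) of a given co-clustering."""
    B, total = modularity_matrix(A)
    # c_ij = 1 when paper i and word j carry the same cluster label
    C = np.asarray(row_labels)[:, None] == np.asarray(col_labels)[None, :]
    return float((B * C).sum() / total)


def coclus_sketch(A, k, init_cols, max_iter=20):
    """Alternately relabel papers and words; each step cannot decrease Q."""
    B, _ = modularity_matrix(A)
    col_labels = np.asarray(init_cols)
    for _ in range(max_iter):
        W = np.eye(k)[col_labels]            # N x k word-cluster indicator
        row_labels = (B @ W).argmax(axis=1)  # best cluster for each paper
        Z = np.eye(k)[row_labels]            # P x k paper-cluster indicator
        new_cols = (B.T @ Z).argmax(axis=1)  # best cluster for each word
        if np.array_equal(new_cols, col_labels):
            break
        col_labels = new_cols
    return row_labels, col_labels, modularity(A, row_labels, col_labels)
```

On a toy block-diagonal matrix `[[1,1,0,0],[1,1,0,0],[0,0,1,1],[0,0,1,1]]` with $k = 2$ and one misplaced word in `init_cols=[0, 0, 1, 0]`, the updates recover the two blocks (paper and word labels `[0, 0, 1, 1]`) with $Q = 0.5$. Because each step is optimal given the other side's labels, $Q$ never decreases, but the result can depend on the initialization; this is why practical implementations, such as the modularity-based model in CoClust, support several random starts.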
2.3 Keywords by Z-score
The co-clustering method divides papers and words into several groups, but it is not a panacea. Beyond technical considerations, its result depends heavily on how the research problem is abstracted into mathematical form; if the abstraction is unreasonable, co-clustering may not return meaningful results. It is therefore important to validate the co-clustering result from different perspectives. In this study, we built a keyword list for each cluster. If co-clustering works well, one keyword list should contain more "theoretical" words and the other more "applied" words, which anyone with basic scientific training can easily check. Beyond validation, these keywords also give a comprehensible overview of graphene research.
There are many ways to extract keywords from a corpus (Berry and Kogan, 2010). The most straightforward option is to count all words in each cluster and pick the most frequent ones as keywords. However, this naive method tends to select general words such as "research", "study", "found", "result" or even "graphene", which are used extensively in most papers and are therefore not informative as keywords for graphene research topics. Instead, we use the statistical significance of word occurrence to measure the correlation between words and clusters; the words with the strongest correlation are the keywords of that cluster.
To measure the correlation between a word $w$ and a cluster $c$ containing $n_c$ papers, we first count the number of papers in $c$ that contain $w$ in their titles and abstracts and denote it $n_{w,c}$. We also measure the frequency of $w$ in the whole data collection, i.e., the fraction of all papers that contain $w$, and denote it $p_w$. Assuming that $n_{w,c}$ follows a binomial distribution, the expected mean and variance of the number of records in $c$ that contain $w$ are:

$$\mu_{w,c} = n_c\, p_w, \tag{5}$$

$$\sigma_{w,c}^2 = n_c\, p_w (1 - p_w). \tag{6}$$

The z-score of word $w$ in cluster $c$ is then defined as:

$$z_{w,c} = \frac{n_{w,c} - \mu_{w,c}}{\sigma_{w,c}}. \tag{7}$$

A word that appears more often in one cluster than in others will have a high z-score in that cluster. Conversely, a word with a high z-score need not appear often, as long as its frequency in the cluster is higher than expected. The top z-score words should therefore illustrate the topic of each cluster, and we use them as its keywords.
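Eqs. (5)–(7) translate directly into code. The sketch below, with hypothetical function names, scores every word for one cluster and returns the highest-ranked ones. It assumes document-level counts (a paper either contains a word or not), as in the text, and skips words present in all or in none of the papers, for which $\sigma_{w,c} = 0$.

```python
import math


def z_score(n_wc, n_c, p_w):
    """z-score of word w in cluster c (Eqs. 5-7).

    n_wc: number of papers in cluster c whose title/abstract contains w
    n_c:  number of papers in cluster c
    p_w:  fraction of all papers that contain w
    """
    mu = n_c * p_w                             # binomial mean, Eq. (5)
    sigma = math.sqrt(n_c * p_w * (1 - p_w))   # binomial std, from Eq. (6)
    return (n_wc - mu) / sigma


def top_keywords(cluster_counts, n_c, doc_freq, n_total, top=25):
    """Rank words by z-score for one cluster.

    cluster_counts: word -> number of papers in the cluster containing it
    doc_freq:       word -> number of papers in the whole corpus containing it
    """
    scores = {
        w: z_score(cluster_counts.get(w, 0), n_c, doc_freq[w] / n_total)
        for w in doc_freq
        if 0 < doc_freq[w] < n_total  # keeps sigma > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

In a toy corpus of 1,000 papers with a 100-paper cluster, a ubiquitous word such as "graphene" (900 papers overall, 92 in the cluster) scores only $z \approx 0.67$, while a rarer word concentrated in the cluster, such as "dirac" (50 overall, 30 in the cluster), scores $z \approx 11.5$. This is the behavior that lets the z-score discard general words that a raw frequency count would keep.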
2.4 Regional credit
Graphene is a highly competitive field, and many regions give it a prominent place in their research programmes. For example, the European Commission funds the Graphene Flagship with a €1 billion budget (homepage, 2021a), the UK government funded the National Graphene Institute with £38 million (homepage, 2021b), and the South Korean government announced the "Technology Roadmap for Promoting Commercialization of graphene (2015–2020)" in 2015. Since regions have different research traditions, resources and targets, it is not surprising that they may have different aims and preferences in graphene research. To examine this hypothesis, we counted the authors' affiliations of each paper and calculated regional credit accordingly. Specifically, if all authors of paper $p$ have affiliations only in region $r$, then $r$ receives the full credit for $p$, that is, 1. If $p$ results from international collaboration, we first split the credit evenly among all authors. Then, for each author, we split his/her credit uniformly among his/her affiliations. Finally, the credits associated with each affiliation are added up by region to obtain the regional credits for that paper, which can be fractions. For instance, suppose paper $p$ has three authors $a_1$, $a_2$ and $a_3$. Author $a_1$ has two affiliations, one in country $X$ and the other in country $Y$; $a_2$ has only one affiliation, in country $X$; and $a_3$ has two affiliations, both in country $Z$. Then $X$ has credit $1/6 + 1/3 = 1/2$, $Y$ has credit $1/6$ and $Z$ has credit $1/3$.
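The credit-splitting rule above, and the summation into regional shares used later for the geographic analysis, can be sketched with exact fractions. This is a minimal illustration assuming each author is given as a list of the regions of his/her affiliations (one entry per affiliation); the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def regional_credit(authors):
    """Split one unit of credit for a single paper among regions.

    authors: list of authors, each a list of the regions of his/her
             affiliations (one entry per affiliation).
    """
    credit = defaultdict(Fraction)
    author_share = Fraction(1, len(authors))          # even split among authors
    for affiliations in authors:
        aff_share = author_share / len(affiliations)  # even split among affiliations
        for region in affiliations:
            credit[region] += aff_share
    return dict(credit)


def total_shares(papers):
    """Sum per-paper credits over a collection and normalize to shares."""
    totals = defaultdict(Fraction)
    for authors in papers:
        for region, c in regional_credit(authors).items():
            totals[region] += c
    n = sum(totals.values())  # equals the number of papers
    return {region: c / n for region, c in totals.items()}
```

For a three-author configuration like the one described above (author 1 with affiliations in X and Y, author 2 in X, author 3 with two affiliations in Z), `regional_credit([["X", "Y"], ["X"], ["Z", "Z"]])` gives X 1/2, Y 1/6 and Z 1/3, and a paper whose authors are all in one region gives that region a credit of exactly 1. Exact fractions avoid rounding drift when millions of small shares are summed; floats would serve equally well in practice.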
2.5 T/A dependency
Once the graphene literature is divided into theoretical and applied branches, a question naturally emerges: how have the two branches influenced each other and shaped current graphene research? Although the existence of interplay between them is apparent, its extent and strength are far from obvious. Generally, theoretical research provides guidance for applied research, and applied research provides a proving ground for theoretical research. Meanwhile, theoretical research inspires subsequent discussion of theoretical questions, and applied research stimulates further applied study of common interest. Given these complicated interactions, a quantitative method is needed to measure the influence between theoretical and applied research; only then can we answer this question from our dataset.
In this study, we use citations to capture influence: if paper A cites only theoretical papers, only theoretical research influences A; if paper B cites only applied papers, only applied research influences B; if paper C cites both theoretical and applied papers, both influence C. At first glance, it is tempting to simply use the proportion of references to measure influence: if paper D cites 3 theoretical papers and 7 applied papers, then theoretical research has 30% influence and applied research 70% influence over D. However, this oversimplifies the process of knowledge accumulation: the 3 theoretical papers in D's references may cite some applied papers, and the 7 applied papers may depend heavily on earlier theoretical work. Merely counting references misses this information and therefore cannot accurately reflect the interaction between T and A.
Inspired by the persistent influence of della Briotta Parolo et al. (2020), we introduce a dependency factor pair $(d_T, d_A)$ for each paper to describe its dependency on theoretical research ($d_T$) and applied research ($d_A$). A paper with the pair $(0.7, 0.3)$ receives 70% of its influence from theoretical research and 30% from applied research. For obvious reasons, the pair satisfies $0 \le d_T \le 1$, $0 \le d_A \le 1$ and $d_T + d_A = 1$. To avoid the oversimplification described above, we split the overall dependency into two parts: direct dependency $(d_T^{\mathrm{dir}}, d_A^{\mathrm{dir}})$ and indirect dependency $(d_T^{\mathrm{ind}}, d_A^{\mathrm{ind}})$. The direct dependency describes the proportions of T and A among a paper's references, while the indirect dependency captures those references' own dependency. The direct dependency is straightforward: if $m$ of the $n$ references of a paper are theoretical papers, then $d_T^{\mathrm{dir}} = m/n$ and $d_A^{\mathrm{dir}} = 1 - m/n$. The indirect dependency reflects implicit influence: paper A may cite only applied papers, but if those applied papers cite many theoretical papers, A benefits from theoretical research indirectly, and this influence is captured by $d_T^{\mathrm{ind}}$. For a paper $p$ with reference set $R_p$, the indirect dependency is defined as the average of the dependency factors of its references:

$$d_T^{\mathrm{ind}}(p) = \frac{1}{|R_p|} \sum_{q \in R_p} d_T(q), \tag{8a}$$

$$d_A^{\mathrm{ind}}(p) = \frac{1}{|R_p|} \sum_{q \in R_p} d_A(q). \tag{8b}$$

By combining direct and indirect dependency, our definition reflects a paper's reliance on theoretical and applied research more accurately and comprehensively.
After obtaining $d^{\mathrm{dir}}$ and $d^{\mathrm{ind}}$, the direct and indirect dependency of a paper on T and A, the overall dependencies $d_T$ and $d_A$ are expressed as:

$$d_T(p) = \alpha\, d_T^{\mathrm{dir}}(p) + (1 - \alpha)\, d_T^{\mathrm{ind}}(p), \tag{9a}$$

$$d_A(p) = \alpha\, d_A^{\mathrm{dir}}(p) + (1 - \alpha)\, d_A^{\mathrm{ind}}(p), \tag{9b}$$

where $\alpha$ is a control parameter, constrained to $0 \le \alpha \le 1$, that sets the mix of direct and indirect dependency. With $\alpha = 1$ the dependency factor reduces to the direct dependency alone, i.e., it reflects only the T/A proportion of a paper's references. With $\alpha = 0$ it reduces to the indirect dependency alone, and the T/A proportion of references has no explicit effect. These extreme cases are the oversimplifications discussed in the previous paragraphs; introducing $\alpha$ allows us to avoid them without loss of generality. The effect of $\alpha$ is shown in Fig. 1 and its numerical effect is discussed in the SI; unless otherwise mentioned, a single fixed value of $\alpha$ is used throughout this study. Note that if a paper does not cite any paper in our dataset (it is only cited), its dependency factor cannot be defined. We call such papers "root papers" and discuss them in the SI.
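Because Eqs. (8)–(9) are recursive, a paper's factor can be computed once the factors of the papers it cites are known. The sketch below processes papers in an order where cited papers come first (for example, by publication date); it is a minimal illustration, not the authors' code, and $\alpha$ is left as a parameter. Two choices in it are assumptions not taken from the text: references whose factor is undefined (such as root papers) are skipped in the indirect average, and if no reference has a defined factor, the indirect part falls back to the direct part.

```python
def dependency_factors(papers, labels, alpha):
    """Compute d_T for every non-root paper (d_A = 1 - d_T), Eqs. (8)-(9).

    papers: list of (paper_id, cited_ids), ordered so that cited papers
            come before the papers citing them.
    labels: dict paper_id -> "T" or "A" for every paper in the dataset.
    alpha:  weight of the direct dependency, 0 <= alpha <= 1.
    """
    d_T = {}
    for pid, cited in papers:
        refs = [r for r in cited if r in labels]  # references inside the dataset
        if not refs:
            continue  # root paper: dependency factor undefined
        direct = sum(labels[r] == "T" for r in refs) / len(refs)
        # Indirect part: mean d_T of references with a defined factor
        # (skipping undefined ones is an assumption).
        known = [d_T[r] for r in refs if r in d_T]
        indirect = sum(known) / len(known) if known else direct  # fallback: assumption
        d_T[pid] = alpha * direct + (1 - alpha) * indirect
    return d_T
```

In a toy example, `p1` (an A paper) cites one T root paper and one A root paper, so $d_T = 0.5$. A paper `p2` that cites only `p1` has a direct part of 0, yet with $\alpha = 0.5$ it still inherits $d_T = 0.5 \times 0 + 0.5 \times 0.5 = 0.25$ through `p1`. This is exactly the implicit influence the indirect dependency is meant to capture.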

3 Results and discussion
We begin by identifying theoretical and applied papers in our data collection. Given the complexity of research, this dichotomy may lose some information, since some papers are hard to categorize. However, the classification is well accepted by the graphene research community, and our results agree well with this convention. Dividing the graphene literature into two branches captures the most significant heterogeneity within graphene research. We therefore use the theoretical/applied dichotomy throughout this paper and refer to the two branches as T and A for short.
Word frequency is well known to vary significantly from field to field. We assume that theoretical graphene papers frequently use one set of words and applied graphene papers frequently use another, distinct set. Based on this assumption, we first counted all words in the titles and abstracts of the 115,988 papers using standard natural language processing techniques (tokenization, stopword filtering, stemming and so on). Words with frequency larger than 0.01% were selected as feature words, giving 12,328 of them. Every article is thus represented by a 12,328-dimensional vector $\mathbf{x}_i$, where $x_{ij}$ is the count of word $j$ in that article, and combining all paper vectors gives the paper-word matrix. The CoClus method of Sec. 2.2 was then applied to extract non-overlapping clusters. CoClus requires the number of clusters $k$ to be specified in advance and finds the best partition for that $k$. Since the number of clusters is usually unknown at the outset, the common protocol is to repeat the procedure for different values of $k$ and choose the $k$ with the highest modularity. We ran the CoClus algorithm for a range of $k$ values and found that the modularity peaks at two values, $k = 4$ and $k = 6$; see the supplementary information for details.
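The preprocessing and matrix construction can be sketched in pure Python. This is a minimal illustration with its assumptions flagged: the regex tokenizer and the tiny `STOPWORDS` set are placeholders for a real NLP pipeline, stemming is omitted, and the 0.01% threshold is read as the fraction of papers containing a word (document frequency), which is our interpretation of the text. The names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real pipeline would use a full list.
STOPWORDS = {"the", "of", "and", "a", "in", "we", "is", "for", "to", "with"}


def tokenize(text):
    """Lower-case alphabetic tokens with stopwords removed (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def paper_word_matrix(docs, min_frac=1e-4):
    """Build the paper-word count matrix A of Eq. (2) as nested lists.

    A word is kept when the fraction of documents containing it exceeds
    min_frac (0.01% by default); this reading of the threshold is an assumption.
    """
    tokens = [tokenize(d) for d in docs]
    doc_freq = Counter(w for toks in tokens for w in set(toks))
    vocab = sorted(w for w, n in doc_freq.items() if n / len(docs) > min_frac)
    index = {w: j for j, w in enumerate(vocab)}
    A = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokens):
        for w, n in Counter(toks).items():
            if w in index:
                A[i][index[w]] = n
    return A, vocab
```

For example, with the three titles "Dirac cones in graphene", "Graphene electrode for supercapacitors" and "Graphene electrode cycles" and `min_frac=0.5`, only `electrode` and `graphene` survive, giving rows `[0, 1]`, `[1, 1]` and `[1, 1]`. At the scale of 115,988 papers and 12,328 words, a sparse matrix (for instance from `scipy.sparse`) would be the natural container instead of nested lists.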
However, the cluster structure of the paper-word network is quite fuzzy, and more effort is needed to obtain a reasonable partition. Comparing the best partitions under different $k$, we noticed a common feature: one group recurs in most partitions while the other groups are unstable. This suggests a distinct boundary between that group and the rest, while the other detected structures are more or less "overfitting". To show the stability of this "hyperstructure", we compare the two partitions with the highest modularity, namely $k = 4$ and $k = 6$. For $k = 4$, we name the stable group I and the other groups II, III and IV. Similarly, for $k = 6$, we name the stable group 1 and the other groups 2, 3, 4, 5 and 6. Since the union of groups II, III and IV is the complement of group I, we call it group I', and likewise call the union of groups 2, 3, 4, 5 and 6 group 1'. Group I has 43,323 papers, group 1 has 42,798 papers, and they have 40,406 papers in common; group I' has 72,665 papers, group 1' has 73,190 papers, and they have 70,293 in common. The situation is very similar for other values of $k$. Therefore, to avoid "overfitting", we use the "hyperstructure" with $k = 4$, namely groups I and I', as the partition result.
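The stability check above compares the stable group across two partitions. A minimal sketch, assuming each partition is stored as a mapping from paper id to cluster label; the helper names are hypothetical. Besides the raw intersection reported in the text, it also returns the Jaccard index, a common normalized overlap measure that the text itself does not use.

```python
def members(labels, keep):
    """Ids of papers whose cluster label is in `keep`."""
    return {pid for pid, lab in labels.items() if lab in keep}


def overlap(group_a, group_b):
    """Size of the intersection and Jaccard index of two sets of paper ids."""
    a, b = set(group_a), set(group_b)
    common = len(a & b)
    return common, common / len(a | b)


# Toy partitions (illustrative only): k = 4 uses labels I-IV, k = 6 uses 1-6.
k4 = {"p1": "I", "p2": "I", "p3": "II", "p4": "III", "p5": "IV"}
k6 = {"p1": 1, "p2": 2, "p3": 2, "p4": 3, "p5": 6}
print(overlap(members(k4, {"I"}), members(k6, {1})))  # (1, 0.5)
```

The complements I' and 1' are obtained the same way by passing the remaining labels to `members`.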
Using the keyword extraction method, we found that the stable group concerns theoretical research, and we call it T. The other three groups under $k = 4$ concern (1) synthesis and functionalization, (2) supercapacitors and (3) sensors. This also explains the fuzzy boundaries between them, since they are closer to each other than to theoretical research. Because they are application-oriented compared with T, we merged them into one cluster, A. Group A contains 72,665 papers and 8,162 words, and group T contains 43,323 papers and 4,166 words. Fig. 2 visualizes the partition: paper vectors in T are colored blue and paper vectors in A red. The columns are sorted so that the first 4,166 columns represent the words of group T and the remaining 8,162 columns the words of group A. As Fig. 2 illustrates, the first 4,166 words are used more frequently in group T and the other words more frequently in group A, since these blocks are darker than the adjacent ones. This is exactly the aim of co-clustering: high in-cluster density and low cross-cluster density. Note also the three subclusters inside A (blocks darker than the rest of the red area), which correspond to the more specific topics (1) synthesis and functionalization, (2) supercapacitors and (3) sensors.

To validate the clustering result and obtain an intuitive picture of the research topics, we built a keyword list for each cluster. To avoid statistical fluctuations, we considered only words in the top 2% by frequency; the results are not sensitive to this quantile as long as rare words (which fluctuate strongly) are dropped. The keywords are ranked by their z-scores, and the top 25 keywords for T and A are shown in Fig. 3. The lists are very informative: list A covers hot terms in applied graphene research, such as electrochemical, cycles, supercapacitors, batteries, electrode and lithium, while list T covers key concepts in theoretical graphene research, such as Dirac, gap, spin, calculations, band, point and states. We can therefore conclude with confidence that the co-clustering method successfully divides the graphene literature into a theoretical branch and an applied branch. Two more extensive word clouds for T and A are given in the SI.

Based on the co-clustering result, we first analyze the proportions of T and A. Of all graphene research papers, 62.6% belong to group A and 37.4% to group T, suggesting that graphene research attracts attention from both theoretical and applied perspectives and that both have made indispensable contributions to the field. Moreover, this ratio is not constant over time. As shown in Fig. 4, the proportion of T increased from 70% to around 90% during 2004–2007 and then gradually decreased to below 30% in 2017. (Our temporal analysis focuses on papers published since 2004, the year graphene gained global attention; papers about graphene before 2004 are rare, only about 0.5% of our dataset.) These curves indicate that in the early stage of graphene research the theoretical branch played a dominant role and gained even more popularity until 2007. After that, the applied branch grew relatively faster and became the majority after 2012, a trend that continued until 2017. The reason behind this process is unclear. Our speculation is as follows. After the seminal paper of Novoselov et al. (2004), graphene research attracted a great deal of attention. At that time the area was still in its infancy and many theoretical questions remained open, while preparing graphene was difficult and expensive, so applied research was limited to a few labs. Researchers therefore published more theoretical than applied papers. As time went on, understanding of graphene grew, the low-hanging fruit was picked, and theoretical questions became more difficult and time-consuming. Meanwhile, technical advances made graphene easier to prepare, lowering the barrier to research and allowing more scientists to join applied research. A better understanding of graphene also inspired more application scenarios, which in turn motivated more applied research. Together, these factors shape the curves in Fig. 4.
Although this hypothesis remains to be validated, our finding that the proportion of group A has increased steadily since 2007 provides a big picture of the graphene research ecosystem for researchers, companies and funding agencies.

Besides temporal evolution, we also studied the geographic distribution of graphene research. As a highly active field with enormous economic potential, it attracts scientists and engineers all over the world. Influenced by tradition, manpower and funding policy, regions may have different aims and preferences in graphene research. To test this conjecture, we calculated regional credit using the method of Sec. 2.4. Summing the credit distributions of all papers, we measured each region's contribution to the whole field. As shown in Fig. 5, Mainland China is the top player in both theoretical and applied research, with a 22.4% share in T and a 51.8% share in A. Mainland China's contribution to applied research thus exceeds the sum of all other regions. The United States is also a big player, with an 18.9% share in T and a 7.5% share in A. In contrast to Mainland China, the US has a larger share in T than in A. This suggests that Mainland China focuses more on applied graphene research, while the United States takes a more balanced position. Furthermore, the compositions of Fig. 5 (a) and (b) reveal a subtle difference between theoretical and applied research: 13 regions contribute at least 2% in T, but only 7 such regions in A, suggesting that applied research is less geographically diverse than theoretical research. The reasons are complicated and may include economic factors, funding policy, research culture and so on; we leave them for future research.

So far we have studied graphene research temporally and geographically. Combining these two dimensions, we can analyze the evolution of graphene research in particular regions. Concretely, for a given region in a given year, we calculate its credit in T ($C_T$) and in A ($C_A$) with the method of Sec. 2.4, using only publications from that year. The yearly proportions of T and A for that region are then $C_T/(C_T + C_A)$ and $C_A/(C_T + C_A)$. Fig. 6 plots the yearly T/A proportions for six regions. These six regions each have at least a 2% share in both theoretical and applied research (see Fig. 5) and are therefore considered the top players in graphene research. The curves in Fig. 6 can be thought of as regional versions, or components, of Fig. 4, and they show that all six regions follow the general trend of Fig. 4: a gradual decrease of theoretical research and increase of applied research. However, the shift occurred more slowly in the United States than in other regions, suggesting that the US research community still devotes considerable resources to theoretical research. This finding is important for understanding competition among regions in graphene research and for making funding policy.
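The regional yearly proportion $C_T/(C_T + C_A)$ is a simple aggregation. A minimal sketch, assuming the per-paper regional credits of Sec. 2.4 have already been computed and are given as `(year, region, group, credit)` tuples, a hypothetical input format.

```python
from collections import defaultdict


def yearly_t_proportion(records):
    """Yearly share of theoretical research per region.

    records: iterable of (year, region, group, credit), where credit is the
             region's credit for one paper and group is "T" or "A".
    Returns {(region, year): C_T / (C_T + C_A)}.
    """
    sums = defaultdict(lambda: {"T": 0.0, "A": 0.0})
    for year, region, group, credit in records:
        sums[(region, year)][group] += credit
    return {
        key: s["T"] / (s["T"] + s["A"])
        for key, s in sums.items()
        if s["T"] + s["A"] > 0
    }
```

The proportion of A for the same region and year is simply one minus the returned value.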

As in most, if not all, research fields, both the theoretical and the applied branch are indispensable parts of graphene research. Each has its own mission and focus, and at the same time both rely on communication with each other: theoretical research gets feedback from applied research, and applied research receives guidance from theoretical research. This internal coevolution is extremely important for any research field. If the coevolution mechanism does not work well, the field will have difficulty moving forward, as with Aristotelian physics or science in ancient non-western civilizations. Given this importance, we quantified the internal coevolution of graphene research in terms of the interplay between T and A. Dependency factors were computed for all papers and then averaged over the papers in group T and in group A, respectively. As shown in Table 1, both T and A rely more on themselves than on each other. The difference, however, is obvious: T depends critically on T (90%), while A depends on A to a lesser degree (69%). This illustrates a remarkable difference between T and A: theoretical research is mostly driven by other theoretical research, whereas applied research pays fairly close attention to theoretical research. One possible interpretation is that theoretical graphene research has made significant progress and its current efforts are beyond existing technology, while applied research benefits greatly from theoretical research, so A cites T a lot, directly and indirectly. More evidence is needed to verify this explanation; however, given that in the history of science theory has usually been ahead of application, we believe it is plausible.
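Table 1. Average dependency factors (columns: group of the citing papers; rows: branch depended on).
Dependency on | Papers in T | Papers in A |
---|---|---|
Theoretical (T) | 0.90 | 0.31 |
Applied (A) | 0.10 | 0.69 |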
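The entries of Table 1, and the yearly curves of Fig. 7, are group averages of per-paper dependency factors. A minimal sketch of that aggregation, assuming the per-paper $d_T$ values have already been computed (root papers excluded) and stored in a dictionary; the function name is hypothetical and the numbers in the check are toy values, not results from the study.

```python
from collections import defaultdict


def average_dependency(d_T, labels, years=None):
    """Mean (dependency on T, dependency on A) over papers of each group.

    d_T:    dict paper_id -> d_T (root papers absent)
    labels: dict paper_id -> "T" or "A"
    years:  optional dict paper_id -> year; if given, averages are
            computed per (group, year) instead of per group.
    """
    groups = defaultdict(list)
    for pid, value in d_T.items():
        key = labels[pid] if years is None else (labels[pid], years[pid])
        groups[key].append(value)
    result = {}
    for key, values in groups.items():
        mean_t = sum(values) / len(values)
        result[key] = (mean_t, 1.0 - mean_t)  # d_A = 1 - d_T for every paper
    return result
```

Because $d_T + d_A = 1$ for every paper, each column of the resulting table also sums to 1, as in Table 1.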
The numbers in Table 1 aggregate papers published in all years. Although they provide many insights, the temporal information is lost. To better understand the coevolution process, we calculated the average T/A dependency of T and A papers in each year and plot them in Fig. 7. As the figure shows, the dependency curves behave very differently in the theoretical and applied groups: the curves for T are very stable, while those for A underwent dramatic changes. From 2004 to 2006, A's dependency on T increased from 0.3 to 0.7 and then gradually decreased. That is, applied graphene research was mainly driven by theoretical research at an early stage and has become increasingly driven by itself over time. This result suggests that internal coevolution is not a "static equilibrium" but a "dynamic process": not only do the proportions of T and A change with time (Fig. 4), but so does the interaction between them. More work is needed to fully understand the reasons behind these changes, and our findings can serve as a basis for further research.

We have already shown geographic differences in Fig. 6. Does a similar effect appear in the T/A dependency? To answer this question, we grouped papers by the region of the authors’ affiliations and calculated, for each region, the dependency of its T and A papers on T and A. We only consider papers whose authors are all in the same region, since it is unclear how to attribute the dependency of internationally collaborative papers to individual regions. However, we found that single-region papers are very similar to internationally collaborative papers in terms of dependency evolution, so the results here would not change much if internationally collaborative papers were included (see SI for details). The results for the top six regions are shown in Fig. 8. Unlike in Fig. 6, all regions in Fig. 8 show roughly the same behavior, even Mainland China and the United States. This suggests that the trend found in Fig. 7 is a universal phenomenon in graphene research and is insensitive to geographic factors.
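A minimal sketch of the single-region filter and the regional averaging described above, assuming each paper comes with a list of its authors' region codes (using the codes of Appendix A) and with precomputed per-paper dependency factors. The helper names `single_region` and `regional_dependency` are hypothetical. Papers whose authors span more than one region are simply skipped, mirroring the restriction to single-region papers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GROUPS = ("T", "A")


def single_region(affiliations: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each paper whose authors all share one region code to that region."""
    return {p: regions[0] for p, regions in affiliations.items() if len(set(regions)) == 1}


def regional_dependency(
    factors: Dict[str, Dict[str, float]],
    group: Dict[str, str],
    year: Dict[str, int],
    affiliations: Dict[str, List[str]],
) -> Dict[Tuple[str, int, str], Dict[str, float]]:
    """Mean dependency on T and A per (region, year, paper group), single-region papers only."""
    region_of = single_region(affiliations)
    sums = defaultdict(lambda: {t: 0.0 for t in GROUPS})
    counts = defaultdict(int)
    for paper, f in factors.items():
        region = region_of.get(paper)
        if region is None:
            continue  # internationally collaborative (or unaffiliated) paper: excluded
        key = (region, year[paper], group[paper])
        counts[key] += 1
        for t in GROUPS:
            sums[key][t] += f[t]
    return {key: {t: sums[key][t] / counts[key] for t in GROUPS} for key in sorted(counts)}


# Invented example: p3 has authors in two regions and is dropped.
factors = {"p1": {"T": 0.2, "A": 0.8}, "p2": {"T": 0.4, "A": 0.6}, "p3": {"T": 0.5, "A": 0.5}}
group = {"p1": "A", "p2": "A", "p3": "A"}
year = {"p1": 2010, "p2": 2010, "p3": 2010}
affiliations = {"p1": ["CN", "CN"], "p2": ["US"], "p3": ["CN", "US"]}
by_region = regional_dependency(factors, group, year, affiliations)
```

Comparing the resulting per-region series with the global series is one way to check whether the trend is insensitive to geography, as reported for Fig. 8.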

4 Conclusion
Discipline-level analysis gives insight into the scientific enterprise and also offers an important reference for practical purposes such as career decisions, hiring decisions, and funding policy. Important work has been done within the SciSci paradigm, for example on the evolution of sustainability science, the structure and evolution of physics, and the development of artificial intelligence (Bettencourt and Kaur, 2011; Sinatra et al., 2015; Frank et al., 2019).
In this study, we complemented previous studies by investigating the evolution of graphene research in terms of its two main components: the theoretical branch and the applied branch. Using a co-clustering method, we divided the graphene publication collection into two groups. By extracting each group’s keywords, we confirmed that one group corresponds to theoretical research and the other to applied research. Overall, 37.4% of papers belong to T and the remaining 62.6% to A. However, these ratios are not constant over time: the proportion of T increased to around 90% in 2007 and gradually decreased afterward, while the proportion of A did the opposite. This suggests that applied research grew faster than theoretical research and that the attention of the graphene research community shifted gradually from theory to application. By analyzing authors’ affiliations, we computed the region credit of every paper and the total contribution of each region. The distribution of contributions differs greatly between T and A: many regions made significant contributions to T, while Mainland China dominates A. The evolution curves also differ markedly among regions. Using the dependency factor we introduced, we quantified the reliance between theoretical and applied research. We found that this dependency is asymmetric: theoretical research is overwhelmingly influenced by itself, while applied research draws on T and A in a more balanced way. The dependency relation is very stable for theoretical research but changed significantly for applied research. We also found this phenomenon to be insensitive to geographic factors, which suggests it is a universal process.
Although we observed many interesting findings, several important questions remain to be answered. For instance, in this study each graphene paper was classified as either theoretical or applied research. This dichotomy may fail for some papers, however, since collaboration between theorists and experimentalists has become common and it is inaccurate to place such papers in either T or A alone. In other words, an overlapping-cluster picture of graphene research may describe reality better. Future work should take overlap into account, given the importance of collaboration in the modern scientific enterprise. In addition, all graphene papers are treated as equally important in this study. This choice simplifies our analysis but departs from reality, in which a few papers receive most of the attention. Incorporating this fact into our framework would be beneficial and may give a better picture of graphene research.
5 Acknowledgements
This research is supported by the Singapore Ministry of Education Academic Research Fund, under the grant number MOE2017-T2-2-075.
6 Author Contributions
Conceived and designed the analysis: Wenyuan Liu.
Collected the data: Ai Linh Nguyen.
Performed the analysis: Ai Linh Nguyen.
Wrote the paper: Ai Linh Nguyen, Wenyuan Liu and Siew Ann Cheong.
7 Appendix A. Region code
CN: Mainland China. US: The United States. JP: Japan. DE: Germany. KR: South Korea. IR: Iran. IN: India. RU: Russia. GB: The United Kingdom. FR: France. ES: Spain. IT: Italy. SG: Singapore. TW: Taiwan.
References
- Ailem et al. (2015) Melissa Ailem, François Role, and Mohamed Nadif. Co-clustering document-term matrices by direct maximization of graph modularity. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 1807–1810, Melbourne, Australia, October 2015. ACM. ISBN 9781450337946. doi: 10.1145/2806416.2806639. URL https://dl.acm.org/doi/10.1145/2806416.2806639.
- Ailem et al. (2016) Melissa Ailem, François Role, and Mohamed Nadif. Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 109:160–173, October 2016. ISSN 09507051. doi: 10.1016/j.knosys.2016.07.002. URL https://linkinghub.elsevier.com/retrieve/pii/S0950705116302064.
- Barth and Marx (2008) Andreas Barth and Werner Marx. Graphene - A rising star in view of scientometrics. arXiv:0808.3320 [cond-mat, physics:physics], September 2008. URL http://arxiv.org/abs/0808.3320.
- Berry and Kogan (2010) Michael W Berry and Jacob Kogan. Text mining applications and theory. 2010. ISBN 9780470689653 9780470749821 9780470689646. URL https://nbn-resolving.org/urn:nbn:de:101:1-201501026976. OCLC: 719451203.
- Bettencourt and Kaur (2011) L. M. A. Bettencourt and J. Kaur. Evolution and structure of sustainability science. Proceedings of the National Academy of Sciences, 108(49):19540–19545, December 2011. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1102712108. URL http://www.pnas.org/cgi/doi/10.1073/pnas.1102712108.
- della Briotta Parolo et al. (2020) Pietro della Briotta Parolo, Rainer Kujala, Kimmo Kaski, and Mikko Kivelä. Tracking the cumulative knowledge spreading in a comprehensive citation network. Physical Review Research, 2(1):013181, February 2020. ISSN 2643-1564. doi: 10.1103/PhysRevResearch.2.013181. URL https://link.aps.org/doi/10.1103/PhysRevResearch.2.013181.
- Fortunato et al. (2018) Santo Fortunato, Carl T. Bergstrom, Katy Börner, James A. Evans, Dirk Helbing, Staša Milojević, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, and Albert-László Barabási. Science of science. Science, 359(6379):eaao0185, March 2018. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.aao0185. URL https://www.sciencemag.org/lookup/doi/10.1126/science.aao0185.
- Frank et al. (2019) Morgan R. Frank, Dashun Wang, Manuel Cebrian, and Iyad Rahwan. The evolution of citation graphs in artificial intelligence research. Nature Machine Intelligence, 1(2):79–85, February 2019. ISSN 2522-5839. doi: 10.1038/s42256-019-0024-5. URL http://www.nature.com/articles/s42256-019-0024-5.
- homepage (2021a) Graphene Flagship homepage, 2021a. URL https://graphene-flagship.eu/.
- homepage (2021b) National Graphene Institute homepage, 2021b. URL https://www.graphene.manchester.ac.uk/about/ngi/.
- Lv et al. (2011) Peng Hui Lv, Gui-Fang Wang, Yong Wan, Jia Liu, Qing Liu, and Fei-cheng Ma. Bibliometric trend analysis on global graphene research. Scientometrics, 88(2):399–419, August 2011. ISSN 1588-2861. doi: 10.1007/s11192-011-0386-x. URL https://doi.org/10.1007/s11192-011-0386-x.
- Nguyen et al. (2020) Ai Linh Nguyen, Wenyuan Liu, Khiam Aik Khor, Andrea Nanetti, and Siew Ann Cheong. The golden eras of graphene science and technology: Bibliographic evidences from journal and patent publications. Journal of Informetrics, 14(4):101067, November 2020. ISSN 17511577. doi: 10.1016/j.joi.2020.101067. URL https://linkinghub.elsevier.com/retrieve/pii/S1751157719303542.
- Novoselov et al. (2004) K. S. Novoselov, A. K. Geim, S. V. Morozov, D. Jiang, Y. Zhang, S. V. Dubonos, I. V. Grigorieva, and A. A. Firsov. Electric field effect in atomically thin carbon films. Science, 306(5696):666–669, October 2004. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1102896. URL https://www.sciencemag.org/lookup/doi/10.1126/science.1102896.
- Petersen et al. (2012) A. M. Petersen, M. Riccaboni, H. E. Stanley, and F. Pammolli. Persistence and uncertainty in the academic career. Proceedings of the National Academy of Sciences, 109(14):5213–5218, April 2012. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1121429109. URL http://www.pnas.org/cgi/doi/10.1073/pnas.1121429109.
- Radicchi et al. (2008) F. Radicchi, S. Fortunato, and C. Castellano. Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105(45):17268–17272, November 2008. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.0806977105. URL http://www.pnas.org/cgi/doi/10.1073/pnas.0806977105.
- Role et al. (2019) François Role, Stanislas Morbieu, and Mohamed Nadif. coclust: A Python package for co-clustering. Journal of Statistical Software, 88(7), 2019. ISSN 1548-7660. doi: 10.18637/jss.v088.i07. URL http://www.jstatsoft.org/v88/i07/.
- Sinatra et al. (2015) Roberta Sinatra, Pierre Deville, Michael Szell, Dashun Wang, and Albert-László Barabási. A century of physics. Nature Physics, 11(10):791–796, October 2015. ISSN 1745-2473, 1745-2481. doi: 10.1038/nphys3494. URL http://www.nature.com/articles/nphys3494.
- Wang and Barabási (2021) Dashun Wang and Albert-László Barabási. The science of science. Cambridge University Press, Cambridge, 2021. ISBN 9781108492669. URL https://www.cambridge.org/core/books/science-of-science/572A745A6F97B55A263F5E86225E3F70.
- Wuchty et al. (2007) S. Wuchty, B. F. Jones, and B. Uzzi. The increasing dominance of teams in production of knowledge. Science, 316(5827):1036–1039, May 2007. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1136099. URL https://www.sciencemag.org/lookup/doi/10.1126/science.1136099.
- Zeng et al. (2017) An Zeng, Zhesi Shen, Jianlin Zhou, Jinshan Wu, Ying Fan, Yougui Wang, and H. Eugene Stanley. The science of science: From the perspective of complex systems. Physics Reports, 714-715:1–73, November 2017. ISSN 03701573. doi: 10.1016/j.physrep.2017.10.001. URL https://linkinghub.elsevier.com/retrieve/pii/S0370157317303289.