Large-Scale Assessment of Labour Market Dynamics in China during the COVID-19 Pandemic

Ying Sun1, Hengshu Zhu2, Hui Xiong1 1The thrust of Artificial Intelligence, The Hong Kong University of Science and Technology, Guangzhou, 511455, China
2Baidu Talent Intelligence Center, Baidu Inc., Beijing, 100085, China
[email protected], [email protected], [email protected]
Abstract.

The outbreak of the COVID-19 pandemic has had an unprecedented impact on China’s labour market, and has largely changed the structure of labour supply and demand in different regions. It becomes critical for policy makers to understand the emerging dynamics of the post-pandemic labour market and provide the right policies for supporting the sustainable development of regional economies. To this end, in this paper, we provide a data-driven approach to assess and understand the evolving dynamics in regions’ labour markets with large-scale online job search queries and job postings. In particular, we model the spatial-temporal patterns of labour flow and labour demand which reflect the attractiveness of regional labour markets. Our analysis shows that regional labour markets suffered from dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered with a trend of migrating from large to small cities and from northern to southern regions, respectively. Meanwhile, due to the pandemic, the demand of blue-collar workers has been substantially reduced compared to that of white-collar workers. In addition, the demand structure of blue-collar jobs also changed from manufacturing to service industries. Our findings reveal that the pandemic can cause varied impacts on regions with different structures of labour demand and control policies. This analysis provides timely information for both individuals and organizations in confronting the dynamic change in job markets during the extreme events, such as pandemics. Also, the governments can be better assisted for providing the right policies on job markets in facilitating the sustainable development of regions’ economies.

1. Introduction

In the past decades, the rapid growth of the Chinese economy has benefited tremendously from its prosperous labour market (Fang and Dewen, 2008). Indeed, the sustainable development of regional economics has been largely determined by the level of talent attraction in the region (Hanushek and Kimko, 2000; Hendricks, 2002; Bonaventura et al., 2021). However, the labour market is dynamic in nature and is vulnerable and sensitive to the policy change and the complicated social environment (Pries and Rogerson, 2005; Guirao et al., 2017). The outbreak of the COVID-19 pandemic has had an unprecedented impact on the labour markets worldwide (Albanesi and Kim, 2021; Brodeur et al., 2021; Lee, 2021). The structure of labour supply and demand in different regions of China has also been drastically changed by this pandemic. To this end, in this paper, we provide a data-driven fine-grained analysis to assess the regional labour market dynamics in China during the COVID-19 pandemic, which can benefit both individuals and organizations in confronting the uncertainty brought by the change of policy and society in the post-pandemic times. Meanwhile, this analysis can provide timely information to governments for better real-time policy-making in the macro regulation of labour market sustainable development.

Most existing studies of labour markets are based on census and survey data (Boustan et al., 2010; Schwandt and Von Wachter, 2019; Fortin et al., 2021). However, due to the high cost of data collection, these data cannot support fine-grained and timely analysis of regional labour market dynamics. For example, it is infeasible to collect enough regular data to capture monthly or quarterly city-level labour migration during the pandemic. Fortunately, with the rapid spread of online search engines and recruitment services, it is possible to study this problem with alternative large-scale data sources. Search engine queries naturally reflect users’ demands and intentions (Ginsberg et al., 2009; Cousins et al., 2020) and are widely used for crowd behavior profiling. The rationale is that when people intend to change jobs, many of them look for job information through online search engines. Therefore, by mining cross-city job seeking behaviors in search engine queries, we can collect labour flow intention data in a cost-effective way. Meanwhile, the massive job posting data in online recruitment services can be used to understand the fine-grained structure and distribution of labour demand in the labour markets of different regions.

The data used in this paper contain more than 2 billion online job search queries and 400 million job postings in China, across a time span of 21 months from October 2019 to June 2021 (see details in Methods). By combining several data mining approaches, we model the spatial-temporal patterns of labour flow and labour demand, which reflect the attractiveness of regional labour markets. Specifically, we first build a city-level labour flow intention graph by mining the spatial-temporal information in job search queries. Based on this graph, we discover the overall labour agglomeration patterns in China, such as communities, black holes, and volcanoes (Hong et al., 2015; Li et al., 2010), using link analysis and modularity-based clustering algorithms (Newman, 2006). Then, based on the Hyperlink-Induced Topic Search (HITS) algorithm (Kleinberg et al., 1998), we capture the temporal changes of regional labour market attractiveness from the labour flow intention during different periods of the pandemic, including lockdown (2020Q1), wide reopening (2020Q2), and recovery under regular pandemic control (after 2020Q2). Our analysis shows that the regional labour markets underwent dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered, with a trend of shifting from large to small cities and from northern to southern regions. Furthermore, with a text-clustering method (Beil et al., 2002), we classify job postings published in different regions into representative labour demands (e.g., blue collar and white collar) and analyze the change of regional labour demand during the pandemic. We observe that the pandemic sharply decreased the demand for blue collar workers relative to that for white collar workers. In particular, the demand structure of blue collar jobs also shifted from manufacturing to service industries.
Our findings reveal that the pandemic might cause varied impacts on regions with different structures of labour demand and control policies.

2. Results

To capture the labour market dynamics during the pandemic, we construct a labour flow intention graph, where nodes denote cities and edges indicate labour flow intention between the corresponding cities. Specifically, we use four classic link analysis metrics to measure the labour market dynamics: two local metrics, Inflow and Outflow, and two global metrics, Authority and Hub, obtained by the HITS algorithm. Both Inflow and Authority indicate the strength of labour importation, while both Outflow and Hub indicate the strength of labour exportation. (Footnote 1: Additional maps, terms, and figures can be found in the supplementary information.)

2.1. The overview of labour flow intention

Based on the labour flow intention graph, we first sketch the characteristics of regional labour markets in China from 2019Q4 to 2021Q2. Generally, our analysis reveals that the labour flow intention is highly correlated with regional economic development.

First, developed cities have more active labour flow intentions. Specifically, all the metric scores are distributed unevenly among cities and are strongly correlated with regional GDP (as shown in Table 1; GDP data source: https://www.ceicdata.com/). This indicates that developed cities usually have higher labour importation and exportation than other cities. Consequently, the distribution of labour flow intention is extremely imbalanced between eastern and western China. In western cities, few Inflow or Outflow intentions exist, implying inactive labour flow caused by their low development level and inconvenient transportation.

Second, regional labour agglomeration exists in China. Specifically, the modularity-based city clustering results (see Methods for details) indicate that cities have formed hierarchical communities of labour flow intention. Indeed, some clusters are provinces (e.g., Shandong province), reflecting inner-province labour exchange, while some clusters consist of multiple provinces (e.g., Jiangsu-Anhui-Zhejiang-Shanghai area), reflecting inter-province labour exchanges. Indeed, by adjusting the resolution, the detected communities can be further split or merged, reflecting the hierarchy of labour flow intention. In each community, most cities are Volcanoes (i.e., the Outflow score is larger than Inflow score), while only a few cities are Black holes (i.e., the Inflow score is larger than Outflow score), implying labours were hierarchically gathering in a few big cities.

Table 1. The correlation between labour flow intention and regional GDP in 2020. For each metric, we calculated the Pearson, Spearman, and Kendall correlation coefficients.
Score      Pearson  p-value  Spearman  p-value  Kendall  p-value
Inflow     0.950    <0.001   0.896     <0.001   0.737    <0.001
Outflow    0.916    <0.001   0.915     <0.001   0.753    <0.001
Hub        0.819    <0.001   0.900     <0.001   0.738    <0.001
Authority  0.911    <0.001   0.882     <0.001   0.717    <0.001

Third, the labour flow intention implies potential regional brain drain. All the cities in Heilongjiang and Jilin provinces are Volcanoes, while other provinces usually have some Black holes. Worse still, their capital cities (i.e., Harbin and Changchun) are among their largest Volcanoes, implying significant brain drain in Northeast China (Zhou et al., 2018). Nevertheless, Liaoning province, another province in Northeast China, has two Black holes (i.e., Shenyang and Dalian) that gather labour. Meanwhile, we find that cities in big urban agglomerations might also face brain drain. Specifically, the Yangtze River Delta Urban Agglomeration (YRDUA), Pearl River Delta Urban Agglomeration (PRDUA), and Beijing-Tianjin-Hebei Urban Agglomeration (BTHUA) are three major urban agglomerations in China (Wei, 2009). While Black holes are usually spatially scattered in China, these urban agglomerations have Black hole groups with 4, 3, and 2 Black holes, respectively. In these areas, a few large Volcanoes (e.g., Shaoxing in YRDUA, Dongguan in PRDUA, and Langfang in BTHUA) may supply labour to many Black holes and easily suffer a brain drain. In addition, the Chengdu-Chongqing area forms a Black hole group in Southwest China, which is known as the Chengyu urban agglomeration (Chen et al., 2020). As this area develops further, nearby cities should take care to prevent potential brain drain, especially Leshan, which is currently the largest Volcano in the area.

2.2. Assessing the dynamics of labour flow intention during the pandemic

Here, we use Authority and Hub as the major dynamic indicators to analyze the change of labour flow intention during the pandemic. The reason is that the absolute Inflow and Outflow scores can be influenced by many factors (e.g., promotion of the search engine) and cannot reflect the real trend of labour flow intention. In contrast, Authority and Hub scores are based on normalized global link analysis and robustly reveal the periodicity of labour flow intention. Specifically, for new tier 1 and tier 2 cities, Authority scores increased both from 2019Q4 to 2020Q1 and from 2020Q4 to 2021Q1, and decreased both from 2020Q1 to 2020Q2 and from 2021Q1 to 2021Q2, while the Hub scores show the opposite trend. For lower tiers, the periodicity has the reverse shape. Indeed, as workers return to their hometowns for the Spring Festival in Q1, flow intentions from small cities to big cities increase, implying a large labour flow after the holiday. Based on the periodicity, we can capture the changes of regional labour attractiveness during different periods of the pandemic, including lockdown (2020Q1), wide reopening (2020Q2), and recovery under regular pandemic control (after 2020Q2). Our analysis shows that the regional labour markets underwent dramatic changes and demonstrated unusual signs of recovery during the pandemic.

First, the labour flow intention shifted from large cities to small cities during the pandemic. Over the year, the Authority scores of tier 1 cities decreased while those of tier 3 to tier 5 cities increased. Considering the periodicity, we further assess the increase ratio of the Authority and Hub scores from 2019Q4 to 2020Q4. Specifically, the Authority scores decreased by a median of 10% in tier 1 cities. In contrast, they increased significantly in cities below tier 2. For new tier 1 and tier 2 cities, although the median change is near 0, the distribution of changes is generally positive. Unlike Authority, Hub scores show no obvious annual change, although they fluctuated sharply during the peak of the pandemic (i.e., 2020Q1). In particular, the increase ratios from 2019Q4 to 2020Q4 are distributed around 0. This implies that, although dramatically influenced by the pandemic in the short term, labour exportation had generally recovered by the end of 2020. Nevertheless, labour flow intention from higher tiers to lower tiers significantly increased.

Second, the labour flow intention deviated towards south during the pandemic. We can observe that labour flow intention generally decreased in North China. Especially, the Authority scores of all the major cities in Bohai rim (Dang et al., 2020) decreased, including Beijing (20.5%), Qingdao (10.9%), Yantai (7.9%), Dalian (7%), Shijiazhuang (6.9%), Tianjin (6.8%), Jinan (3.0%), and Shenyang (2.0%). Moreover, the Hub scores decreased throughout North China. Considering the correlation between labour flow intention and regional development, this may not be a good sign for economic development in Northern China. In contrast, labour flow intention increased in South China. In particular, apart from the small cities, the Authority scores of new tier 1 cities also increased, such as Changsha (16.9%), Chengdu (10.5%), and Foshan (7.1%). Moreover, the Hub scores increased in South China, which even radiated northwestern areas (e.g., cities in Shaanxi province). These observations imply the general increase of labour flow activeness in South China.

Third, the labour flow intention sharply fluctuated during the peak of the pandemic, while high flow intention appeared from small cities to nearby big cities. During 2020Q1, since the Chinese government took a strict pandemic control policy, the labour market has been dramatically influenced. As labour flow intention in most cities except for tier 1 largely recovered by the end of 2020, we compare the labour flow intention in 2020Q1 with 2021Q1. In small cities (i.e., below tier 2), Hub scores increased while Authority scores decreased. In contrast, in big cities, Authority scores increased while Hub scores decreased. Indeed, compared with 2021Q1, flow intention from small cities (i.e., lower than tier 2) to large cities significantly increased in 2020Q1. Especially, new tier 1 suffered the sharpest fluctuation in Authority scores (i.e., 10% higher than 2021Q1). In contrast, the Authority scores of tier 1 did not increase much from 2019Q4 to 2020Q1, implying labours paid extra attention to their nearby big cities instead of the super cities.

Fourth, the labour flow intention quickly recovered from 2020Q2, and faster in South China than in North China. During 2020Q2, the Chinese government gradually lifted the lockdown policy as the pandemic was brought under control. As with 2020Q1, we assessed the deviation of labour flow intention in 2020Q2 by comparing it with the same quarter of the next year (i.e., 2021Q2). Specifically, the median deviation of Authority scores quickly dropped to near 0 during this period, especially for new tier 1 and tier 2 cities. The Hub scores were still significantly lower for new tier 1 (by more than 10%) and tier 2 (by more than 15%) cities, although the gap was smaller than in 2020Q1, implying that labour exportation needed a longer time to recover. Notably, labour flow intention recovered faster in southern areas than in northern areas of China. Specifically, the increase of Authority scores was centred on PRDUA and gradually decayed with distance, causing a general Authority increase in Guangdong province, including both typical big cities such as Shenzhen (40%), Guangzhou (40%), Dongguan (46.5%), and Foshan (60%), and small cities such as Zhongshan (47.9%), Huizhou (38.5%), and Qingyuan (39.0%). Along with the increase of Authority, Hub scores also surged in the southern areas, implying strong labour exportation. In contrast, the Authority and Hub scores were still low in North China, implying a slower recovery of labour flow intention. In particular, while Authority scores increased annually in some northern cities, these cities still showed a general decrease of Authority in 2020Q2.

2.3. Assessing the dynamics of labour demand during the pandemic

Here, we analyze the temporal changes of labour demand in different cities, through counting the number of job postings published in different regional labour markets. To avoid the semantic gap in different textual descriptions, we first classified job postings into representative labour demands (see details in Methods), such as blue and white collar. We can observe that regional labour demand structure has largely changed during the pandemic.

First, the pandemic sharply decreased the demand for blue collar workers relative to that for white collar workers. From 2019Q4 to 2020Q4, job postings for white collar workers increased by 18.3% for the whole country, 13.5% for tier 1, 29.5% for new tier 1, and 23.0% for tier 2. Indeed, even during the lockdown period (i.e., 2020Q1), the demand for white collar workers still increased. Compared with white collar workers, who have stable jobs and can work from home, blue collar workers were hit harder. Specifically, from 2019Q4 to 2020Q4, the demand for blue collar workers decreased by 60% for the whole country, 29.1% for tier 1, 44.4% for new tier 1, 55.8% for tier 2, 59.6% for tier 3, 63.3% for tier 4, and 67.9% for tier 5.

Second, the demand structure of blue collar jobs shifted from manufacturing to service industries during the pandemic. Specifically, we find that manufacturing related labour demand was hit hardest and decreased continuously throughout the year. From 2019Q4 to 2020Q4, the demand decreased by 80.1% in the whole country, 64.9% for tier 1, 77.3% for new tier 1, 82.1% for tier 2, 80.5% for tier 3, 80.7% for tier 4, and 82.3% for tier 5. The decrease from 2020Q1 to 2020Q2 was the fastest, with a ratio of 60% for the whole country. Indeed, many factories shut down during the pandemic in China. Although the decrease slowed down after 2020Q3, the demand remained low and showed no significant increase until 2021Q2. This implies that the Chinese manufacturing industry needs a longer time to recover after the pandemic. In contrast, express related labour demand surged with the reopening policy and changed less on an annual basis. As a life-service industry, express delivery quickly recovered after the lockdown period, especially in big cities, which led to the surge of express related labour demand. Specifically, from 2020Q1 to 2020Q2, the express related labour demand increased by 106% for the whole country, 172% for tier 1, 179% for new tier 1, 51.8% for tier 2, 52.0% for tier 3, 57.5% for tier 4, and 46.7% for tier 5. From 2019Q4 to 2020Q4, the express related labour demand decreased by only 17.4% for the whole country, 25.2% for tier 1, 31.4% for new tier 1, 13.6% for tier 2, 13.7% for tier 3, 9.6% for tier 4, and 4.1% for tier 5. Unlike other blue collar demands, passenger transport related demand increased during the pandemic. From 2019Q4 to 2021Q2, this demand increased by 181.2% for tier 1, 129.3% for new tier 1, 139.4% for tier 2, 171.2% for tier 3, 151.2% for tier 4, 98.3% for tier 5, and 141.0% for the whole country. This largely raised the overall blue collar demand, especially for small cities.
Indeed, the blue collar related labour demand started recovering in 2021 (with 57.1% in the whole country till 2020Q2), where about 50% were covered by passenger transport-related jobs. As a result, proportion of manufacturing continuously decreased in blue collar demands, while ratio of service continuously increased. Specifically, in 2019Q4, the manufacture, express, and passenger transport jobs occupied 72.7%, 14.2%, and 13.1% of the blue collar demands in the whole country, respectively. In 2021Q2, they became 29.4%, 20.3%, and 50.3%, respectively.

Third, the pandemic had different impacts on cities with different industrial structures and blue collar demands. Blue collar demand in big cities recovered fast with the reopening policy. Specifically, from 2020Q1 to 2020Q2, it increased by 151.0% and 74.7% for tier 1 and new tier 1 cities, respectively. In contrast, it decreased by 17.1%, 30.9%, 38.6%, and 46.3% for tier 2, tier 3, tier 4, and tier 5 cities, respectively. The reason is that bigger cities usually have a higher proportion of express jobs in their blue collar demands. Indeed, according to our analysis, in 2019Q4, express demand accounted for 69.1%, 36.2%, and 17.3% of the blue collar demands of tier 1, new tier 1, and tier 2 cities, but less than 10% for lower tier cities. In contrast, small cities had a higher proportion (above 70% in 2019Q4) of manufacturing related labour demand, significantly higher than tier 1 (23.2%) and new tier 1 cities (49.8%). Therefore, the pandemic’s heavy impact on the manufacturing industry made blue collar demand difficult to recover in small cities. As a result, in the short term, big cities accounted for a larger proportion of blue collar demand. Specifically, in 2020Q2, tier 2 and above cities accounted for 14.6% of blue collar demand, nearly four times their share in 2019Q4 (3.7%). During this period, big cities quickly increased their attractiveness for blue collar workers. Nevertheless, with the decrease of express related labour demand in big cities from 2020Q3 and the increase of passenger transport related labour demand in small cities from 2021Q1, the proportion of different tiers in blue collar demand continued to recover.

Fourth, the distribution of labour demand in China showed geographical deviation. From 2019Q4 to 2020Q4, white collar demands deviated to South China while blue collar demands deviated to North China. In particular, the white collar demands in the Bohai rim (especially BTHUA) significantly decreased, especially in Beijing (15.2%). In contrast, the southern urban agglomerations (i.e., PRDUA and YRDUA) increased their white collar demands. In these areas, the demand of many smaller cities increased more than that of super cities. For example, Suzhou (71.7%), Wuxi (84.3%), Dongguan (126%), and Foshan (95%) increased more than Shanghai (28.1%), Guangzhou (22.1%), and Shenzhen (29.8%). This caused South China to occupy a higher ratio of white collar demands. Although blue collar demands decreased all over China, it is slower in North China than in South China, especially Henan, Shandong, Heilongjiang, and Hebei provinces. Then, North China occupies a higher ratio of blue collar demands.

3. Discussion

To get more insights into the geographical change of labour flow intention, we further discuss the labour demand change in major urban agglomerations during the pandemic.

On white collar labour demands, PRDUA and YRDUA had a similar trend, with two peaks in 2020Q1 and 2020Q3. Indeed, Q1 and Q3 are usually white collar recruitment seasons. This implies that, during the peak of the pandemic, companies in southern urban agglomerations were still working well and had regular recruitment. However, BTHUA only had one peak in 2020Q3.

This implies that BTHUA missed the spring recruitment season in 2020Q1 because of its tight lockdown policy, which might have caused white collar workers to flow to YRDUA and PRDUA. On blue collar demand, YRDUA suffered the most severe impact. Among the three agglomerations, YRDUA has the most blue collar demand, which was more than twice that of either BTHUA or PRDUA in 2019Q4. Worse still, according to our analysis, nearly 70% of its blue collar demand was in manufacturing. Indeed, as the core base of the manufacturing industry in China, YRDUA has a long history of developing heavy production entailing metals, automobiles, electrical equipment, and machinery (Feng et al., 2019). With the pandemic’s heavy influence on manufacturing, there was a dramatic drop in blue collar demand in YRDUA. As a result, YRDUA largely lost its previous high attractiveness for blue collar workers. In contrast, blue collar demand in PRDUA depends less on manufacturing (i.e., less than 55% in 2019Q4), and a large portion of it (i.e., above 35% in 2019Q4) is express. In 2020Q2, the surge of express demand made up for the loss of blue collar demand in manufacturing: manufacturing workers who lost their jobs could find jobs in the express industry. Moreover, potentially related to its less strict pandemic control policy, PRDUA saw a much larger surge in express demand than the others. Specifically, PRDUA originally had a similar level of express demand to BTHUA. However, from 2019Q4 to 2020Q2, express demand in PRDUA increased by over four times, while that in BTHUA and YRDUA only doubled. This caused the surge of blue collar demand in PRDUA, which thus attracted the flow intention of a large number of blue collar workers. According to the above observations, PRDUA suffered the least impact on both white collar and blue collar demands, owing to both policy reasons and its industrial structure. As a result, PRDUA gained much attractiveness for labour in the southern and western regions of China.
This can be an important reason for the long-term geographical deviation of labour flow intention in China.

In summary, our findings revealed that the pandemic might cause varied impacts on regions with different structures of labour demand and control policies. This analysis can provide timely information for both individuals and organizations in confronting the uncertainty brought by the change of policy and society in the post-pandemic times, and meanwhile benefits governments for better real-time policy-making in the macro regulation of labour market sustainable development.

4. Methods

In this part, we introduce the methods used in this paper.

4.1. Data

The data used in this paper mainly contain two parts: (1) Online search query logs. We use anonymous query logs collected from the most popular search engine in China, across a time span ranging from October 2019 to June 2021. Each record consists of timestamp, location coordinate, query text, and url of the clicked web page. Through the url, we collected the titles of the clicked web pages. (2) Job posting data. The job posting data are collected from major Chinese online recruitment platforms. The time span ranges from October, 2019 to June, 2021. Each record consists of publish timestamp, working city, title, and job description.

4.1.1. Data Pre-Processing

We filter job search queries and recognize their origin, destination, and time information as follows:

Job Search Query Filtering. We keep only the queries whose contents or clicked titles contain the Chinese words for “recruitment” or “job hunting”, and then delete duplicated queries.

Search Location Mapping. To recognize the origin city of the flow intention, we map the geographic coordinates to cities. Specifically, we regard cities as polygons and find the one that contains the coordinate. We test whether a point lies in a polygon with the computational geometry algorithms in the Shapely package (Gillies et al., 07).
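As an illustration of the point-in-polygon step, the following minimal sketch uses a plain ray-casting test written in pure Python as a stand-in for the Shapely routines mentioned above; it is not the code used in this paper. The function names (`point_in_polygon`, `locate_city`) and the square toy boundaries are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate_city(point: Point, city_polygons: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first city whose polygon contains the point, or None."""
    for city, polygon in city_polygons.items():
        if point_in_polygon(point, polygon):
            return city
    return None


# Hypothetical toy boundaries (axis-aligned squares), not real city shapes.
toy_cities = {
    "city_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "city_b": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

print(locate_city((1.0, 1.0), toy_cities))
print(locate_city((5.0, 5.0), toy_cities))
```

A linear scan over all polygons is used only for brevity; at the scale described in this paper a spatial index would normally be placed in front of the containment test.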

Destination Mapping. To find the destination cities, we recognize the places that users mentioned or clicked. Specifically, we first build a dictionary containing the official names and aliases of Chinese provinces, cities, and districts. Then, we match the texts against the words in the dictionary using the Aho-Corasick algorithm (Aho and Corasick, 1975). This algorithm supports fast multi-pattern string matching with $O(N)$ time complexity, where $N$ denotes the length of the text. Some words may refer to multiple places. For example, Chaoyang might mean Chaoyang district in Beijing or Chaoyang city in Liaoning province. Therefore, we set three rules to decide the priority of places for each query: (1) A place located in the same province as the origin has the highest priority. (2) A place with a higher administrative level has the second-highest priority. (3) A place that is closer to the origin has the third-highest priority. We check the three rules in order until the priority can be decided. After assigning each matched word to a place, if there are still multiple mentioned places, we choose the one with the lowest administrative level.
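The three priority rules can be sketched as a lexicographic sort key. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the `Place` record, the `ADMIN_RANK` encoding, the `resolve` helper, the rough coordinates, and the use of plain Euclidean distance on degrees are all hypothetical simplifications.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# Assumed encoding: a smaller rank means a higher administrative level.
ADMIN_RANK = {"province": 0, "city": 1, "district": 2}


@dataclass(frozen=True)
class Place:
    name: str
    province: str
    level: str                  # "province", "city", or "district"
    coord: Tuple[float, float]  # rough (lon, lat), for illustration only


def resolve(candidates: List[Place], origin_province: str,
            origin_coord: Tuple[float, float]) -> Place:
    """Pick the candidate place for one ambiguous word."""
    def priority(p: Place):
        same_province = 0 if p.province == origin_province else 1  # rule (1)
        admin = ADMIN_RANK[p.level]                                 # rule (2)
        distance = hypot(p.coord[0] - origin_coord[0],
                         p.coord[1] - origin_coord[1])              # rule (3)
        # Tuples compare element by element, so a later rule is consulted
        # only when all earlier rules tie.
        return (same_province, admin, distance)
    return min(candidates, key=priority)


# The two readings of "Chaoyang" from the text, with approximate coordinates.
CHAOYANG = [
    Place("Chaoyang", "Beijing", "district", (116.44, 39.92)),
    Place("Chaoyang", "Liaoning", "city", (120.45, 41.57)),
]

print(resolve(CHAOYANG, "Beijing", (116.40, 39.90)))
print(resolve(CHAOYANG, "Shanghai", (121.47, 31.23)))
```

For an origin in Beijing, rule (1) selects the district; for an origin in a third province, rule (1) ties and rule (2) selects the city-level place.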

Cross-City Query Filtering. To recognize cross-city labour flow intentions, we drop queries that cannot indicate city-level destination. In particular, district-level places are mapped into their cities. Then, the queries whose origins and destinations are different indicate cross-city labour flow intentions.

Because of the large scale of the data, we implemented these operations with the Apache Spark (Meng et al., 2016) distributed computing framework. After all the above pre-processing steps, we obtained over 200 million cross-city flow intention records in total.

4.2. Job Posting Clustering

We recognize the labour demand of the job postings with a text-based clustering method.

4.2.1. Keyword Dictionary Generation.

We use jieba (Sun, 2012), a well-known Chinese word segmentation tool, to split job titles into words. After dropping descriptive words and single-character words, we drop low-frequency words (fewer than 1,000 occurrences) and high-frequency words (the top 50). After manually deleting words that are irrelevant to jobs, we obtain a dictionary with 1,159 words.

4.2.2. Keyword-based Clustering.

For each job posting, we build a vector $x^{d}\in\mathbb{R}^{N}$ to represent the $d$-th job posting, where $N$ is the size of the dictionary, the element $x^{d}_{i}=\frac{f^{d}_{i}}{\sum_{k}f^{d}_{k}}$, and $f^{d}_{i}$ indicates the frequency of the $i$-th keyword in its title. Then, we use KMeans (Bahmani et al., 2012; Hämäläinen et al., 2021), a commonly used clustering algorithm, to cluster the vectors. In particular, we use its Spark MLlib (Meng et al., 2016) implementation for parallelized clustering.
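The normalized keyword vector can be sketched as follows. This is a minimal illustration that assumes the title has already been segmented into words; the English toy dictionary, the `keyword_vector` helper, and the choice to return a zero vector when no dictionary keyword appears are hypothetical and not taken from this paper. The clustering step itself is not shown.

```python
from collections import Counter
from typing import Dict, List


def keyword_vector(title_words: List[str], dictionary: List[str]) -> List[float]:
    """Build x^d: keyword frequencies in one title, normalized to sum to 1."""
    index: Dict[str, int] = {word: i for i, word in enumerate(dictionary)}
    # f^d_i: frequency of each dictionary keyword in this title.
    counts = Counter(word for word in title_words if word in index)
    total = sum(counts.values())
    vector = [0.0] * len(dictionary)
    if total == 0:
        # Assumed handling: no dictionary keyword in the title.
        return vector
    for word, count in counts.items():
        vector[index[word]] = count / total
    return vector


# Hypothetical four-word dictionary; the dictionary in this paper has 1,159 words.
toy_dictionary = ["driver", "courier", "engineer", "software"]

print(keyword_vector(["senior", "software", "engineer"], toy_dictionary))
```

Because each vector sums to 1, titles of different lengths become comparable before clustering.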

4.3. Measurements of Labour Flow Intention

We measure labour flow intention on labour flow intention graph with degree centrality and HITS (Kleinberg et al., 1998).

4.3.1. Labour Flow Intention Graph Formulation

In each quarter $t$, we build a labour flow graph $G^{t}=\langle V^{t},E^{t},W^{t}\rangle$, where each node $v\in V^{t}$ denotes a city and each directed edge $\langle o,d\rangle\in E^{t}$ denotes labour flow intention from origin $o$ to destination $d$. In $W^{t}$, we weight each edge by the number of flow intention records between this origin-destination (OD) pair. Furthermore, we represent the graph with an adjacency matrix $F^{t}\in\mathbb{R}^{N\times N}$. If $\langle i,j\rangle\in E^{t}$, $F^{t}_{i,j}$ equals the edge weight of OD pair $\langle i,j\rangle$; otherwise $F^{t}_{i,j}=0$.

4.3.2. Degree-based Centrality

Based on the labour flow intention graph, we calculate the Inflow and Outflow of a city $i$ in quarter $t$ as $in^{t}(i)=\sum_{k}F^{t}_{k,i}$ and $out^{t}(i)=\sum_{k}F^{t}_{i,k}$, respectively, since $F^{t}_{k,i}$ is the flow intention from origin $k$ to destination $i$. To better reveal labour gathering, we mine Black holes and Volcanoes (Hong et al., 2015; Li et al., 2010) on the labour flow intention graph. In particular, Black holes are cities with high net Inflow $nin^{t}(i)=in^{t}(i)-out^{t}(i)$. In contrast, Volcanoes are cities with high net Outflow $nout^{t}(i)=-nin^{t}(i)$.
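The degree-based measures can be sketched in a few lines. The example assumes, as in Section 4.3.1, that an edge runs from origin to destination, so the Inflow of a city counts records whose destination is that city. The function names, the nested-dictionary representation, and the toy city labels are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def flow_matrix(records):
    """Count flow intention records per (origin, destination) pair."""
    F = defaultdict(lambda: defaultdict(int))
    for origin, destination in records:
        F[origin][destination] += 1
    return F


def net_inflow(F):
    """Return net Inflow (Inflow minus Outflow) for every city in F."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for origin, row in F.items():
        for destination, weight in row.items():
            outflow[origin] += weight      # records leaving the origin
            inflow[destination] += weight  # records arriving at the destination
    cities = set(inflow) | set(outflow)
    return {city: inflow[city] - outflow[city] for city in cities}


# Toy records: (origin, destination) pairs for one quarter.
toy_records = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "A")]
toy_net = net_inflow(flow_matrix(toy_records))
# Cities with large positive values are Black hole candidates;
# cities with large negative values are Volcano candidates.
```

Because every record leaves one city and enters another, the net Inflow values sum to zero across all cities.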

4.3.3. HITS

Multi-hop labour flow may exist across cities. For example, workers from a small city may first flow to medium-sized cities before flowing on to big cities. Therefore, we use HITS (Kleinberg et al., 1998), a widely adopted link-based algorithm for node importance analysis, to measure labour flow intention while taking multi-hop transitions into account. Specifically, HITS iteratively measures two traits, namely Authority and Hub, with the basic idea that (1) nodes with high Hub link to nodes with high Authority; and (2) nodes with high Authority are linked by nodes with high Hub. In this work, Authority reflects the strength of labour attraction from important cities, while Hub reflects the strength of labour exporting to important cities. Formally, we normalize $F^{t}$ by row and form an adjacency matrix $P\in\mathbb{R}^{N\times N}$, where $P_{i,j}=\frac{F^{t}_{i,j}}{\sum^{N}_{k=1}F^{t}_{i,k}}$. Each row of $P$ can be regarded as a city's flow-out distribution to other cities. The Authority $A\in\mathbb{R}^{N}$ and Hub $H\in\mathbb{R}^{N}$ should satisfy $H=PA$ and $A=P^{\mathrm{T}}H$.
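A minimal power-iteration sketch of the two update rules $A=P^{\mathrm{T}}H$ and $H=PA$ is given below. The function name `hits`, the fixed iteration count, and the rescaling of both vectors to unit sum after each step are assumptions made for illustration, since the text specifies only the relations that Authority and Hub should satisfy.

```python
def hits(F, iterations=100):
    """Return (authority, hub) scores for a dense flow matrix F (list of rows)."""
    n = len(F)
    # Row-normalize F into P; a city with no outflow keeps an all-zero row.
    P = []
    for row in F:
        row_sum = sum(row)
        P.append([w / row_sum if row_sum else 0.0 for w in row])
    authority = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # A = P^T H: a city gains Authority from the hubs that flow into it.
        authority = [sum(P[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # H = P A: a city gains Hub from the authorities it flows out to.
        hub = [sum(P[i][j] * authority[j] for j in range(n)) for i in range(n)]
        # Assumption: rescale to unit sum so the scores stay bounded.
        a_sum, h_sum = sum(authority), sum(hub)
        if a_sum == 0 or h_sum == 0:
            break
        authority = [a / a_sum for a in authority]
        hub = [h / h_sum for h in hub]
    return authority, hub


# Toy graph: city 0 -> city 1 (3 records), city 0 -> city 2 (1), city 1 -> city 2 (2).
toy_F = [[0, 3, 1], [0, 0, 2], [0, 0, 0]]
toy_authority, toy_hub = hits(toy_F)
```

In this toy graph, city 2 only receives flow and city 0 only sends it, which is the kind of multi-hop chain the Authority and Hub scores are meant to capture.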

4.4. Modularity-based Clustering

Graphs with high modularity have dense intra-cluster node connections but sparse inter-cluster connections. Specifically, modularity measures the concentration of intra-cluster edges compared with a random distribution of edges that ignores the clusters. Formally, given a cluster assignment $\mathcal{C}$ of a graph $F$, modularity is defined as $Q=\frac{1}{2m}\sum_{i,j}\left(F_{i,j}-\frac{k_{i}k_{j}}{2m}\right)\mathbb{I}\{\mathcal{C}(i)=\mathcal{C}(j)\}$, where $m=\sum_{i,j}F_{i,j}$ denotes the sum of edge weights, $k_{i}=\sum_{j}(F_{i,j}+F_{j,i})$ denotes the degree of the $i$-th node, and $\mathcal{C}(i)$ indicates the cluster of node $i$. Modularity-based clustering (Newman, 2006) aims to maximize $Q$, so that nodes in the same cluster are densely connected. In this paper, we use a widely adopted greedy algorithm, Louvain (Blondel et al., 2008), for the optimization. Louvain iterates two steps to increase modularity. The first step finds local communities by selectively merging neighboring nodes whenever the merge increases modularity. The second step updates the network by aggregating each community into a single node. In particular, a hyperparameter named resolution controls the scale of the clusters.
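To make the objective concrete, the sketch below evaluates $Q$ for a given cluster assignment. It is a literal transcription of the definition above, including its conventions for $m$ and $k_{i}$. The Louvain search itself and the resolution parameter are not reproduced, and the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(F, clusters):
    """Evaluate Q for a dense weight matrix F and one cluster label per node."""
    n = len(F)
    m = sum(sum(row) for row in F)  # m: sum of all edge weights
    if m == 0:
        return 0.0  # assumption: an empty graph has zero modularity
    # k_i: total weight of edges leaving and entering node i.
    k = [sum(F[i][j] + F[j][i] for j in range(n)) for i in range(n)]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if clusters[i] == clusters[j]:
                q += F[i][j] - k[i] * k[j] / (2 * m)
    return q / (2 * m)


# Toy graph with two separate flows: node 0 -> node 1 and node 2 -> node 3.
toy_graph = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
q_matching = modularity(toy_graph, [0, 0, 1, 1])   # clusters follow the flows
q_crossing = modularity(toy_graph, [0, 1, 0, 1])   # clusters cut across the flows
```

Comparing the two assignments shows the intended behaviour: grouping the connected pairs together scores higher than splitting them.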

References

  • Aho and Corasick (1975) Alfred V Aho and Margaret J Corasick. 1975. Efficient string matching: an aid to bibliographic search. Commun. ACM 18, 6 (1975), 333–340.
  • Albanesi and Kim (2021) Stefania Albanesi and Jiyeon Kim. 2021. Effects of the COVID-19 Recession on the US Labor Market: Occupation, Family, and Gender. Journal of Economic Perspectives 35, 3 (2021), 3–24.
  • Bahmani et al. (2012) Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Scalable k-means++. arXiv preprint arXiv:1203.6402 (2012).
  • Beil et al. (2002) Florian Beil, Martin Ester, and Xiaowei Xu. 2002. Frequent term-based text clustering. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 436–442.
  • Blondel et al. (2008) Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, 10 (2008), P10008.
  • Bonaventura et al. (2021) Moreno Bonaventura, Luca Maria Aiello, Daniele Quercia, and Vito Latora. 2021. Predicting urban innovation from the US Workforce Mobility Network. Humanities and Social Sciences Communications 8, 1 (2021), 1–9.
  • Boustan et al. (2010) Leah Platt Boustan, Price V Fishback, and Shawn Kantor. 2010. The effect of internal migration on local labor markets: American cities during the Great Depression. Journal of Labor Economics 28, 4 (2010), 719–746.
  • Brodeur et al. (2021) Abel Brodeur, David Gray, Anik Islam, and Suraiya Bhuiyan. 2021. A literature review of the economics of COVID-19. Journal of Economic Surveys 35, 4 (2021), 1007–1044.
  • Chen et al. (2020) Yizhong Chen, Hongwei Lu, Jing Li, and Jun Xia. 2020. Effects of land use cover change on carbon emissions and ecosystem services in Chengyu urban agglomeration, China. Stochastic Environmental Research and Risk Assessment 34 (2020), 1197–1215.
  • Cousins et al. (2020) Henry C Cousins, Clara C Cousins, Alon Harris, and Louis R Pasquale. 2020. Regional infoveillance of COVID-19 case rates: analysis of search-engine query patterns. Journal of medical internet research 22, 7 (2020), e19483.
  • Dang et al. (2020) Yunxiao Dang, Li Chen, Wenzhong Zhang, Dan Zheng, and Dongsheng Zhan. 2020. How does growing city size affect residents’ happiness in urban China? A case study of the Bohai rim area. Habitat International 97 (2020), 102120.
  • Fang and Dewen (2008) Cai Fang and Wang Dewen. 2008. Impacts of internal migration on economic growth and urban development in China. In Migration and Development Within and Across Borders. 245.
  • Feng et al. (2019) Delian Feng, Qun Chen, Malin Song, and Lianbiao Cui. 2019. Relationship between the degree of internationalization and performance in manufacturing enterprises of the Yangtze river delta region. Emerging Markets Finance and Trade 55, 7 (2019), 1455–1471.
  • Fortin et al. (2021) Nicole M Fortin, Thomas Lemieux, and Neil Lloyd. 2021. Labor market institutions and the distribution of wages: The role of spillover effects. Journal of Labor Economics 39, S2 (2021), S369–S412.
  • Gillies et al. (2007–) Sean Gillies et al. 2007–. Shapely: manipulation and analysis of geometric objects. https://github.com/Toblerity/Shapely
  • Ginsberg et al. (2009) Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel, Lynnette Brammer, Mark S Smolinski, and Larry Brilliant. 2009. Detecting influenza epidemics using search engine query data. Nature 457, 7232 (2009), 1012–1014.
  • Guirao et al. (2017) Begoña Guirao, Antonio Lara-Galera, and Juan Luis Campa. 2017. High Speed Rail commuting impacts on labour migration: The case of the concentration of metropolis in the Madrid functional area. Land Use Policy 66 (2017), 131–140.
  • Hämäläinen et al. (2021) Joonas Hämäläinen, Tommi Kärkkäinen, and Tuomo Rossi. 2021. Improving scalable K-means++. Algorithms 14, 1 (2021), 6.
  • Hanushek and Kimko (2000) Eric A Hanushek and Dennis D Kimko. 2000. Schooling, labor-force quality, and the growth of nations. American economic review 90, 5 (2000), 1184–1208.
  • Hendricks (2002) Lutz Hendricks. 2002. How important is human capital for development? Evidence from immigrant earnings. American Economic Review 92, 1 (2002), 198–219.
  • Hong et al. (2015) Liang Hong, Yu Zheng, Duncan Yung, Jingbo Shang, and Lei Zou. 2015. Detecting urban black holes based on human mobility data. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. 1–10.
  • Kleinberg et al. (1998) Jon M Kleinberg et al. 1998. Authoritative sources in a hyperlinked environment. In SODA, Vol. 98. Citeseer, 668–677.
  • Lee (2021) Yong-Kwan Lee. 2021. The Impact of COVID-19 on the Working Conditions of Wage Workers-Focusing on Differences by Employment Types. Journal of Labour Economics 44, 2 (2021), 71–90.
  • Li et al. (2010) Zhongmou Li, Hui Xiong, Yanchi Liu, and Aoying Zhou. 2010. Detecting blackhole and volcano patterns in directed networks. In 2010 IEEE International Conference on Data Mining. IEEE, 294–303.
  • Meng et al. (2016) Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, et al. 2016. MLlib: Machine learning in Apache Spark. The Journal of Machine Learning Research 17, 1 (2016), 1235–1241.
  • Newman (2006) Mark EJ Newman. 2006. Modularity and community structure in networks. Proceedings of the national academy of sciences 103, 23 (2006), 8577–8582.
  • Pries and Rogerson (2005) Michael Pries and Richard Rogerson. 2005. Hiring policies, labor market institutions, and labor market flows. Journal of Political Economy 113, 4 (2005), 811–839.
  • Schwandt and Von Wachter (2019) Hannes Schwandt and Till Von Wachter. 2019. Unlucky cohorts: Estimating the long-term effects of entering the labor market in a recession in large cross-sectional data sets. Journal of Labor Economics 37, S1 (2019), S161–S198.
  • Sun (2012) J Sun. 2012. Jieba chinese word segmentation tool. Accessed: Jun 25 (2012), 2018.
  • Wei (2009) Wang Wei. 2009. A comparative study on eco-spatial morphological features of the three major urban agglomerations in China. In Urban Planning Forum, Vol. 1. 46–53.
  • Zhou et al. (2018) Yang Zhou, Yuanzhi Guo, and Yansui Liu. 2018. High-level talent flow and its influence on regional unbalanced development in China. Applied geography 91 (2018), 89–98.