
¹ School of Science, Beijing Jiaotong University, China. ² Guanghua School of Management, Peking University, China. Marui Du (email: [email protected])

A Meta Path Based Evaluation Method for Enterprise Credit Risk

Marui Du ¹ Yue Ma ² Zuoquan Zhang ¹


Abstract

Small and medium-sized enterprises (SMEs) have become an essential part of the national economy. As their number grows, evaluating their credit risk has become a pressing issue. Unlike large enterprises, which offer massive data to analyze, small enterprises rarely provide enough information to assess their financial status. Given this lack of primary data, how to infer a small enterprise’s credit risk from secondary data, such as information about its upstream, downstream, parent, and subsidiary enterprises, has attracted considerable attention from industry and academia. To evaluate SME credit risk accurately, in this paper we exploit the representational power of information networks gupta2017heteclass over various kinds of SME entities and SME relationships. A novel feature, named the meta path feature, is proposed to measure credit risk, which enables us to evaluate the financial status of SMEs from various perspectives. Experiments show that our method is effective at identifying SMEs with credit risk.

Keywords:

Credit Risk Detection · Heterogeneous Information Network · Meta Path · Enterprise Evaluation

1 Introduction

Small and medium-sized enterprises (SMEs) are one of the backbones of the national economy, and their development directly affects it. However, because SMEs often have incomplete management systems and lack appropriate financial indicators, their credit risk assessment is usually time-consuming and the evaluation results are often of low accuracy. In this paper, we therefore propose a credit risk assessment method that targets this problem.

Industry and academia have long focused on how to measure enterprise credit risk. Conventional assessment approaches mainly extract enterprise-related features, such as financial indicators, to predict enterprise solvency. However, with the expansion of the global market in recent years, conventional approaches have lost their discriminative power in situations where relations and interactions between SMEs are numerous and complicated. An SME’s financial status can easily be affected by the actions of related SMEs. For example, contagion risk, which is caused by associated credit entities, exposes many SMEs to the risk of default even when they are in good financial condition. Therefore, rather than financial indicators alone, the relations and interactions between SMEs deserve more attention in the study of SME credit risk.

To model these relations and interactions, various entities and their relationships can be represented in an information network gupta2017heteclass. Previously, most researchers studied this problem with a homogeneous information network sun2013mining, which consists of only one relation type and one entity type. In the SME setting, however, the structure of a homogeneous information network may be too simple to explain the relationships between SMEs. To avoid losing important information, a heterogeneous information network 2016A, with its richer graph structure, is more suitable for studying the interactions between SMEs. In a heterogeneous information network, meta paths (MP) 2016A are taken as a fundamental data structure to capture semantic relationships between entities. Through MPs, complicated relationships between entities can be defined systematically and concisely. A path provides a clear view of how entities interact in the information network. In this paper, to assess the status of SMEs, we exploit the power of meta paths to study how influence spreads among financial entities in the information network of SMEs.

In our method, we first build a heterogeneous information network of SMEs to describe the interactive relationships between the different entities associated with an SME. Figure 1 is a toy example of an Alibaba heterogeneous information network, which shows some possible connections between Alibaba and its related entities. For example, the path “$Alibaba\xrightarrow{subsidiary}Lazada$” represents the information that Lazada is a subsidiary of Alibaba; the path “$Alibaba\xrightarrow{CEO}{Bob}\xrightarrow{controller}{Taobao}$” represents the information that Alibaba’s CEO, Bob, is also Taobao’s controller; and the path “$Alibaba\xrightarrow{control}{YouKu}\xrightarrow{report}{news}$” represents the information that YouKu, an enterprise controlled by Alibaba, is criticized by the newspaper. Through information networks, the interrelations between entities can be obtained easily. By building an information network of SMEs, we obtain not only the information about the target enterprise itself but also the interactive information associated with it.

Figure 1: Alibaba heterogeneous information network example. There exist multiple types of nodes in the network, such as *enterprise* (Alibaba, Lazada, YouKu, Heineken), *person* (Bob, Alex, Jack), *commodity* (Taobao, Alipay) and *news* (newspaper). Links between nodes represent the relations that connect entities; for example, Jack is the *founder* of Alibaba, Heineken is the *supplier* of Taobao, and the newspaper *reports* a piece of news about YouKu, a company that Alibaba *controls*.

Given the information network of SMEs, we propose a novel feature, the meta path feature, to measure the impact transmitted through meta paths from one financial entity to another. Unlike conventional financial indicators, the meta path feature can be defined and applied very flexibly. This flexibility enables us to evaluate the credit status of SMEs more comprehensively and from various perspectives. The proposed meta path feature also shows explicitly how much one entity can be affected through a specific logical path, which gives banks, lenders, and relevant experts an intuitive view of the credit risk faced by SMEs. In this way, SME default can be identified effectively.

In the rest of this paper, Section 2 introduces SME credit risk evaluation methods and applications of information networks. Section 3 builds a model of the SME heterogeneous information network and introduces the meta path. In Section 4, three features based on meta paths are proposed by considering their ability to identify risk. Section 5 presents experiments on three real-world datasets and Section 6 concludes the paper.

2 Related work

The credit risk evaluation model of SMEs was first established by Edmister edmister1972empirical in 1972, leading to the emergence of a large number of credit risk measurement index systems. Most early credit evaluation models for SMEs, at home and abroad, follow the index system of credit evaluation models for large enterprises, that is, they extract key financial indicators from enterprise financial statements. Among these key financial indicators, profitability indicators cultrera2016bankruptcy, such as the operating profit ratio and the ratio of profits to cost, and solvency indicators tian2015variable, such as the current ratio and the quick ratio, are used the most. In addition, operational capacity indicators bauer2014hazard, development capacity indicators sermpinis2018modelling, and liquidity indicators sermpinis2018modelling are added in many studies. Since financial indicators alone cannot delineate the complete picture of an enterprise, non-financial indicators such as managers’ background 2010Evaluation, working experience 2013Feature, and enterprise internal structure 2010Predicting moro2013loan are added for evaluation. However, financial and non-financial indicators cannot capture the contagion of credit risk among financial entities, since they are treated independently and do not consider the causal chain.

Recently, with the rapid improvement of computing capacity and the development of data mining technology, information networks have gained much attention from researchers and have produced excellent results in clustering sun2009ranking sun2012relation, classification ji2010graph wang2016text, relation prediction sun2012will popescul2003statistical and recommendation shi2015semantic ma2009learning. Researchers commonly use two kinds of information networks: the homogeneous information network and the heterogeneous information network. A homogeneous information network is built from objects and links of a single type. For example, Jamali jamali2010matrix builds a social network for user recommendation based on user ratings; Ma ma2008sorec builds a friend relationship prediction network based on personal relations. These homogeneous information networks ignore the differences between object types and relation types, which causes the loss of important information. The concept of the heterogeneous information network was first proposed by Sun 2016A in 2009. It combines more information and contains the logical semantics of different object types and link types. For example, Wang wang2018shine proposes a Signed Heterogeneous Information Network Embedding to capture the sentiment links of online social information by considering users with sentiment and social relations; Hosseini hosseini2018heteromed uses a heterogeneous information network with high-dimensional data and rich relationships for medical diagnosis. The heterogeneous information network is usually used to capture complicated semantic and logical relationships among different entities.

In the field of SME credit risk evaluation, a large amount of enterprise-related data has been accumulated, such as upstream and downstream enterprise information and relevant news. The heterogeneous relationships between different entities have also given researchers new ideas for finding SME credit risk factors. For example, Tsai tsai2017risk studies the impact of enterprise-related news on the credit risk of SMEs; Yin yin2020evaluating utilizes legal judgments to evaluate the credit risk of SMEs; Moro moro2013loan considers the impact of the trust relationship between SMEs and bank managers on enterprise credit risk; Tobback 2017Bankruptcy uses inter-enterprise relationship data for feature selection to measure the credit risk of SMEs; and Kou 2020Bankruptcy focuses on enterprise payment and transaction data to measure the credit risk of SMEs. However, these works are all built on homogeneous information networks, and most of them do not consider heterogeneous information. Therefore, in this paper, we build a heterogeneous information network for SMEs to evaluate SME credit risk more effectively, considering both the heterogeneous information of SMEs and the semantic information carried by different SME entities.

3 Model of SME Credit Risk

To evaluate SME credit risk, the conventional methods adopted by experts usually base their judgments only on features that directly affect SME default, such as the asset-liability ratio, current ratio, and turnover rate, and not on the logical relationships between SMEs, such as parent and subsidiary relationships, upstream and downstream relationships, and relationships through enterprise directors or high-level managers. For example, when a parent company defaults, the solvency of its subsidiaries will also be affected. If the influence exerted by the parent company is neglected, the solvency of its subsidiaries will be overestimated. Therefore, apart from the features that directly affect default, the logical relationships between SMEs should also be considered when evaluating the status of SMEs. Paying attention to the different connections between SMEs can improve both the reliability and the interpretability of the evaluation. This section gives a model of SME credit risk that incorporates these logical relationships.

3.1 SME Heterogeneous Information Network

A heterogeneous information network 2016A is a classical data structure used to model objects and relations in a directed graph. This graph structure has shown its superiority in representing and storing knowledge about the natural world for many applications 2013Modeling 2013Recommendation 2019Cash. Given the different objects in an information network, logical connections can be constructed effectively and semantic relationships can be captured easily. Hence, we also build our model on an information network, which is defined as follows:

Definition 1

With a schema $S=(\mathcal{A},\mathcal{R})$, an information network is defined as a directed graph $G=(\mathcal{V},\mathcal{E})$ with an object type function $\tau:\mathcal{V}\rightarrow\mathcal{A}$ and a relation type function $\phi:\mathcal{E}\rightarrow\mathcal{R}$, where each object $v\in\mathcal{V}$ belongs to the object type $\tau(v)\in\mathcal{A}$ and each link $e\in\mathcal{E}$ belongs to the relation type $\phi(e)\in\mathcal{R}$.
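Definition 1 can be mirrored by a small data structure. The following is a minimal Python sketch and not the authors' implementation: the class name `HIN`, its method names and the string labels used for types are hypothetical choices made for illustration, and the direction of link `e1` is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class HIN:
    """Directed graph G=(V,E) with type maps tau (objects) and phi (links)."""
    object_types: set                           # A in the schema S=(A,R)
    relation_types: set                         # R in the schema S=(A,R)
    tau: dict = field(default_factory=dict)     # object -> object type
    edges: dict = field(default_factory=dict)   # link id -> (source, target)
    phi: dict = field(default_factory=dict)     # link id -> relation type

    def add_object(self, v, obj_type):
        if obj_type not in self.object_types:
            raise ValueError(f"unknown object type: {obj_type}")
        self.tau[v] = obj_type

    def add_link(self, e, src, dst, rel_type):
        if rel_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {rel_type}")
        if src not in self.tau or dst not in self.tau:
            raise KeyError("both endpoints must be added before the link")
        self.edges[e] = (src, dst)
        self.phi[e] = rel_type


# Hypothetical usage with two enterprises joined by a parent link.
g = HIN({"enterprise", "commodity", "person", "news"},
        {"parent", "report", "control", "employee", "relate"})
g.add_object("v1", "enterprise")
g.add_object("v2", "enterprise")
g.add_link("e1", "v1", "v2", "parent")
```

Keeping $\tau$ and $\phi$ as explicit dictionaries makes the schema check in `add_object` and `add_link` a direct restatement of $\tau(v)\in\mathcal{A}$ and $\phi(e)\in\mathcal{R}$.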

In this paper, our model is built as a heterogeneous information network of SMEs. The SME schema is shown in Figure 2.

In our model, enterprise ( $\mathcal{A}_{e}$ ) , commodity ( $\mathcal{A}_{c}$ ), person ( $\mathcal{A}_{p}$ ) and news ( $\mathcal{A}_{n}$ ) are four fundamental object types in studying SME credit risk. The studied relation types are summarized from public enterprise information and objective facts, such as the shareholder relation between enterprise and person, the produce relation between enterprise and commodity, and the report relation between enterprise and news. The types mentioned in this paper are listed in Table 1.

Table 1: Object type and relation type notations

Notation	Description
$\mathcal{A}_{e}$	the object type of enterprise
$\mathcal{A}_{c}$	the object type of commodity
$\mathcal{A}_{p}$	the object type of person
$\mathcal{A}_{n}$	the object type of news
$\mathcal{R}_{parent}$	the relation type of parent between enterprises
$\mathcal{R}_{subsidiary}$	the relation type of subsidiary between enterprises
$\mathcal{R}_{supply}$	the relation type of supply between enterprises
$\mathcal{R}_{saler}$	the relation type of sales between enterprises
$\mathcal{R}_{control}$	the relation type of controller between enterprise and person
$\mathcal{R}_{shareholder}$	the relation type of shareholder between enterprise and person
$\mathcal{R}_{manager}$	the relation type of manager between enterprise and person
$\mathcal{R}_{employee}$	the relation type of employee between enterprise and person
$\mathcal{R}_{produce}$	the relation type of produce between enterprise and commodity
$\mathcal{R}_{report}$	the relation type of report between enterprise and news
$\mathcal{R}_{relate}$	the relation type of relate between persons

With the SME schema defined, an example of SME heterogeneous information network is shown in Figure 3.

In Figure 3, $v_{1}$, $v_{2}$ and $v_{7}$ are enterprises, so $\tau(v_{1})=\tau(v_{2})=\tau(v_{7})=\mathcal{A}_{e}$. The objects $v_{6}$ and $v_{9}$ are commodities, so $\tau(v_{6})=\tau(v_{9})=\mathcal{A}_{c}$. The objects $v_{5}$, $v_{10}$, $v_{11}$, $v_{12}$ and $v_{13}$ are news, so $\tau(v_{5})=\tau(v_{10})=\tau(v_{11})=\tau(v_{12})=\tau(v_{13})=\mathcal{A}_{n}$. The objects $v_{3}$, $v_{4}$ and $v_{8}$ are persons, so $\tau(v_{3})=\tau(v_{4})=\tau(v_{8})=\mathcal{A}_{p}$. For the links, $e_{5}$ and $e_{8}$ are produce relations, so $\phi(e_{5})=\phi(e_{8})=\mathcal{R}_{produce}$; $e_{4}$, $e_{9}$, $e_{10}$, $e_{11}$ and $e_{12}$ are report relations, so $\phi(e_{4})=\phi(e_{9})=\phi(e_{10})=\phi(e_{11})=\phi(e_{12})=\mathcal{R}_{report}$; $e_{6}$ is a supply relation and $e_{1}$ is a parent relation, so $\phi(e_{6})=\mathcal{R}_{supply}$ and $\phi(e_{1})=\mathcal{R}_{parent}$; $e_{2}$ and $e_{7}$ are controller relations and $e_{3}$ is an employee relation, so $\phi(e_{2})=\phi(e_{7})=\mathcal{R}_{control}$ and $\phi(e_{3})=\mathcal{R}_{employee}$; finally, $e_{13}$ is a relate relation, so $\phi(e_{13})=\mathcal{R}_{relate}$.

3.2 SME Meta Path

In the SME network graph built in Section 3.1, a graph edge represents the relationship between two objects. Because an edge connects only two objects, it can express only simple relationships, which are insufficient to describe the relationships involved in SME credit risk. To model complicated relationships, in this section we introduce another data structure, the meta path (MP), to represent complicated and implicit relations in the SME network.

Definition 2

With a schema $S=(\mathcal{A},\mathcal{R})$, a meta path $P$ is a path of the form $\mathcal{A}_{1}\xrightarrow{\mathcal{R}_{1}}\mathcal{A}_{2}\xrightarrow{\mathcal{R}_{2}}\cdots\xrightarrow{\mathcal{R}_{n}}\mathcal{A}_{n+1}$, which defines a composite relation $\mathcal{R}=\mathcal{R}_{1}\circ\mathcal{R}_{2}\circ\cdots\circ\mathcal{R}_{n}$ between $\mathcal{A}_{1}$ and $\mathcal{A}_{n+1}$, where $\circ$ denotes the composition operator on relations.

For simplicity, we use the names of object types and relation types to denote the MP: $P=\mathcal{A}_{1}{\cdot}\mathcal{R}_{1}{\cdot}\mathcal{A}_{2}\ldots\mathcal{R}_{n}{\cdot}\mathcal{A}_{n+1}$. With this definition, a path $p=v_{1}{\cdot}e_{1}{\cdot}v_{2}\ldots e_{n}{\cdot}v_{n+1}$ in graph $G$ follows a meta path $P$ if, for every $i$, the edge $e_{i}\in\mathcal{E}$ links $v_{i}$ and $v_{i+1}$, $\tau(v_{i})={\mathcal{A}_{i}}$ and $\phi(e_{i})={\mathcal{R}_{i}}$. We also call $p$ a path instance of $P$ and write $p\in P$.

According to the definition, some examples of meta paths can be seen in Figure 2. $P=\mathcal{A}_{e}{\cdot}\mathcal{R}_{parent}{\cdot}\mathcal{A}_{e}{\cdot}\mathcal{R}_{report}{\cdot}\mathcal{A}_{n}$ is an MP, which represents the information that a piece of news has been reported about the SME’s parent enterprise. In Figure 3, $p=v_{1}{\cdot}e_{1}{\cdot}v_{2}{\cdot}e_{9}{\cdot}v_{10}$ is a path instance of MP $P$, because $\tau(v_{1})=\mathcal{A}_{e}$, $\tau(v_{2})=\mathcal{A}_{e}$, $\tau(v_{10})=\mathcal{A}_{n}$, $\phi(e_{1})=\mathcal{R}_{parent}$ and $\phi(e_{9})=\mathcal{R}_{report}$.
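The membership test $p\in P$ can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: only the objects and links named in the example above are encoded, the link directions (from $v_{1}$ to $v_{2}$ and from $v_{2}$ to $v_{10}$) are assumed, and the function name `follows` and the string labels are hypothetical.

```python
# Partial encoding of Figure 3: only the objects and links named in the text.
TAU = {"v1": "enterprise", "v2": "enterprise", "v10": "news"}
EDGES = {"e1": ("v1", "v2"), "e9": ("v2", "v10")}   # link -> (source, target), assumed
PHI = {"e1": "parent", "e9": "report"}


def follows(path, meta_path):
    """Return True if path v1,e1,v2,...,en,v(n+1) is an instance of meta_path."""
    if len(path) != len(meta_path) or len(path) % 2 == 0:
        return False
    nodes, links = path[0::2], path[1::2]
    node_types, link_types = meta_path[0::2], meta_path[1::2]
    # Every object must have the object type required at its position.
    if any(TAU.get(v) != a for v, a in zip(nodes, node_types)):
        return False
    # Every link must join consecutive objects and have the required relation type.
    for i, (e, r) in enumerate(zip(links, link_types)):
        if EDGES.get(e) != (nodes[i], nodes[i + 1]) or PHI.get(e) != r:
            return False
    return True


P = ["enterprise", "parent", "enterprise", "report", "news"]
p = ["v1", "e1", "v2", "e9", "v10"]
print(follows(p, P))  # True
```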

The MP definition gives structure to the logical connections between objects, making our model more expressive and interpretable. It can not only show explicitly which factors affect the credit risk of SMEs, but also explain the implicit logic of correlations between objects that have no direct link in the SME information network.

Compared with the information carried by objects, the information carried by meta paths is more critical in evaluating the credit risk of SMEs, because meta paths are more expressive. Through different meta paths, the same financial object may affect another financial object in significantly different ways. For instance, in Figure 3 there exist two paths between enterprise $v_{1}$ and person $v_{4}$. The first is $p=v_{1}{\cdot}e_{3}{\cdot}v_{4}$, following meta path $P=\mathcal{A}_{e}{\cdot}\mathcal{R}_{employee}{\cdot}\mathcal{A}_{p}$, and the second is $p=v_{1}{\cdot}e_{2}{\cdot}v_{3}{\cdot}e_{13}{\cdot}v_{4}$, following meta path $P=\mathcal{A}_{e}{\cdot}\mathcal{R}_{control}{\cdot}\mathcal{A}_{p}{\cdot}\mathcal{R}_{relate}{\cdot}\mathcal{A}_{p}$. Along the first path, a bribery scandal involving an outsourcing employee $v_{4}$ may do limited harm to enterprise $v_{1}$, since $v_{1}$ may have many other outsourcing employees who can take over the role of $v_{4}$. Along the second path, however, the same scandal may do significant harm to enterprise $v_{1}$, since $v_{4}$ has a domestic relation with $v_{3}$, who directs enterprise $v_{1}$. Therefore, instead of inspecting each object’s direct impact, our model regards a whole logical path consisting of objects and relations as one factor in evaluating the credit risk of SMEs.
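To make the two-path example concrete, the sketch below enumerates the path instances that start at $v_{1}$ and follow each meta path. It is a minimal illustration, not the authors' code: only the three links named in the example are encoded, their directions are assumed, and `path_instances` is a hypothetical helper name.

```python
from collections import defaultdict

TAU = {"v1": "enterprise", "v3": "person", "v4": "person"}
# (source, relation, target) triples named in the text; directions are assumed.
LINKS = [("v1", "employee", "v4"), ("v1", "control", "v3"), ("v3", "relate", "v4")]

OUT = defaultdict(list)
for src, rel, dst in LINKS:
    OUT[src].append((rel, dst))


def path_instances(start, meta_path):
    """Enumerate object sequences starting at `start` that follow meta_path."""
    if TAU.get(start) != meta_path[0]:
        return []
    paths = [[start]]
    # Extend every partial path by one (relation, object type) step at a time.
    for rel, obj_type in zip(meta_path[1::2], meta_path[2::2]):
        paths = [p + [dst]
                 for p in paths
                 for r, dst in OUT[p[-1]]
                 if r == rel and TAU.get(dst) == obj_type]
    return paths


direct = path_instances("v1", ["enterprise", "employee", "person"])
indirect = path_instances("v1", ["enterprise", "control", "person", "relate", "person"])
print(direct)    # [['v1', 'v4']]
print(indirect)  # [['v1', 'v3', 'v4']]
```

Both meta paths end at the same person $v_{4}$, yet they are kept as separate factors because each carries a different semantic relation to $v_{1}$.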

4 Meta Path Impact On SME

Above, we defined the MP, a well-patterned structure for representing the various semantics related to SME credit risk. We have shown that, even without a direct link, negative information about one object may heavily affect others through meta paths. For example, a piece of negative news about an enterprise director may lead to a bad reputation for his enterprise; a low-quality product of a parent enterprise may cause its subsidiary enterprises to lose competitiveness. The potential risk carried by such paths is usually too significant to be neglected when an SME is evaluated, but how to formulate it remains an open question. To answer this question, in this section we propose several novel features, named meta path features, to represent this risk.

4.1 Risk Inference from Object

Before introducing meta path features, we first give a method to identify whether potential risk exists in the financial objects themselves. Among the object types studied in Section 3.1, the news object is used to provide negative or positive information; of the remaining types, a commodity object is regarded as potentially risky if its quality is not reliable, a person object is regarded as potentially risky if his capability is not qualified, and an enterprise object is regarded as potentially risky if it lacks credibility. In this paper, for the sake of applicability and generality, we use a Naive Bayes model to infer whether these objects are risky. Our probabilistic model is learned from public historical data, such as financial statements, annual reports, and online public news. Our Naive Bayes inference model is defined as follows:

Definition 3

With the assumption that each attribute feature of an object is independent of each other, we define an inference function $\Gamma(x)$ to evaluate if object $x$ is risky based on the probability $\mathbb{P}(y=1|x)$ learned from Naive Bayes model.

$$\begin{aligned}\Gamma(x)&=\begin{cases}1&\mathbb{P}(y=1\mid x)>0.5\\ 0&\text{otherwise}\end{cases}\\ \mathbb{P}(y=1\mid x)&=\frac{\prod_{i=1}^{n}\mathbb{P}(x^{(i)}\mid y=1)\,\mathbb{P}(y=1)}{\prod_{i=1}^{n}\mathbb{P}(x^{(i)}\mid y=1)\,\mathbb{P}(y=1)+\prod_{i=1}^{n}\mathbb{P}(x^{(i)}\mid y=0)\,\mathbb{P}(y=0)}\end{aligned}\tag{1}$$

where $x^{(i)}$ is the $i$ th attribute feature of object $x$ , $n$ is the number of all attributes, $y=1$ indicates risky object and $y=0$ indicates non-risky object.
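As an illustration of Definition 3, the following Python sketch estimates the priors and conditional probabilities from categorical attributes and applies the 0.5 threshold of Equation (1). Several details are assumptions that the definition leaves open: the attributes are treated as categorical, Laplace smoothing with parameter `alpha` is added to avoid zero counts, and the function names and the toy training rows are hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(y) and P(x_i | y) from categorical rows (alpha: assumed smoothing)."""
    n_features = len(rows[0])
    class_count = Counter(labels)
    value_count = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = [set() for _ in range(n_features)]
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_count[(i, y)][v] += 1
            values[i].add(v)

    def likelihood(i, v, y):
        return (value_count[(i, y)][v] + alpha) / (class_count[y] + alpha * len(values[i]))

    def posterior_risky(x):
        score = {}
        for y in (0, 1):
            s = class_count[y] / len(labels)      # prior P(y)
            for i, v in enumerate(x):
                s *= likelihood(i, v, y)          # attributes assumed independent
            score[y] = s
        return score[1] / (score[0] + score[1])   # P(y=1 | x) as in Equation (1)

    return posterior_risky


def gamma(x, posterior_risky, threshold=0.5):
    """Inference function: 1 if the object is inferred to be risky, else 0."""
    return 1 if posterior_risky(x) > threshold else 0


# Hypothetical commodity objects: (sales volume, repair rate, refund rate).
rows = [("low", "high", "high"), ("low", "high", "low"),
        ("high", "low", "low"), ("high", "low", "high")]
labels = [1, 1, 0, 0]
posterior = fit_naive_bayes(rows, labels)
print(gamma(("low", "high", "high"), posterior))  # 1
print(gamma(("high", "low", "low"), posterior))   # 0
```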

With the inference function, we are able to identify the risk of a financial object from its own information. For instance, a commodity object with low sales volume, a high repair rate, and a high refund rate will be inferred to be risky; a person object with an irrelevant education background, irrelevant working experience, and few working years will be inferred to be risky; an enterprise object with a low ROE ratio, a low quick ratio, and a high asset-liability ratio will be inferred to be risky. Next, we study how to infer potential risk at the MP level.

4.2 Risk Inference from Meta Path

In an SME information network, an enterprise may have many paths linking to other financial objects, as shown in Figure 4. We can see enterprise $J$ has $5$ path instances for meta path $P$ = $\mathcal{A}_{e}{\cdot}\mathcal{R}_{control}{\cdot}\mathcal{A}_{p}{\cdot}\mathcal{R}_{shareholder}{\cdot}\mathcal{A}_{e}$ and enterprise $K$ has $4$ path instances for MP $P$ .

With the inference function defined above, we are able to identify whether the objects in the above information network are risky. Thus, for a specific MP and the objects linked by its path instances, it is natural to infer that an enterprise is most likely to be risky if potential risks exist in most of its linked objects. Based on this straightforward intuition, we next present several features that quantify such risk from meta paths.

4.2.1 Meta Path Feature

Given an enterprise $x$, the proportion of risky objects among the objects connected to it by an MP $P$ is taken as an indicator of the impact of meta path $P$ on the target enterprise $x$. The larger the indicator, the higher the potential risk. Formally, we call this indicator the naive MP feature and define it as follows.

Definition 4

Naive MP feature $N_{P}(x)$ is an indicator to reveal the impact of meta path $P$ on enterprise $x$ :

$$N_{P}(x)=\frac{|\{x^{\prime}\in D\mid\exists p_{x\rightsquigarrow x^{\prime}}\in P,\ \Gamma(x^{\prime})=1\}|}{|\{x^{\prime}\in D\mid\exists p_{x\rightsquigarrow x^{\prime}}\in P\}|}\tag{2}$$

where $D$ is an SME object collection, $p_{x\rightsquigarrow x^{\prime}}$ is a path instance from object $x$ to object $x^{\prime}$ and $\Gamma(x)$ is the inference function defined in Section 4.1.

In Figure 4, if $Q_{2}$ , $Q_{3}$ and $Q_{4}$ are risky objects, then we have $N_{P}(J)$ = $3/5$ = $0.6$ , $N_{P}(K)$ = $3/4$ = $0.75$ .
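The worked values above can be reproduced with a short Python sketch of Equation (2). The adjacency used here is hypothetical: Figure 4 is only known through the counts quoted in the text (five linked objects for $J$, four for $K$), so the concrete object names assigned to each enterprise are assumptions, as are the function name and the convention of returning zero when no path instance exists.

```python
def naive_mp_feature(x, reachable, is_risky):
    """N_P(x): share of objects reachable from x via meta path P that are risky."""
    linked = reachable.get(x, set())
    if not linked:
        return 0.0  # assumption: no path instance means no evidence of risk
    return sum(1 for obj in linked if is_risky(obj)) / len(linked)


# Hypothetical end objects of P's path instances; the text fixes only the counts.
reachable = {"J": {"Q1", "Q2", "Q3", "Q4", "Q5"},
             "K": {"Q2", "Q3", "Q4", "Q5"}}
risky = {"Q2", "Q3", "Q4"}

print(naive_mp_feature("J", reachable, risky.__contains__))  # 0.6
print(naive_mp_feature("K", reachable, risky.__contains__))  # 0.75
```

In a full pipeline, `is_risky` would be the inference function $\Gamma$ of Section 4.1 rather than a fixed set.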

4.2.2 Weighted Meta Path Feature

Although the above meta path feature can effectively indicate the impact of an MP, it may be argued that different objects on the same MP should not have the same impact. Among all the objects in the network, irrelevant objects may have little effect, while relevant ones may matter greatly. For an SME in particular, its parent company should influence it more deeply than an enterprise that has cooperated with it only once. Therefore, instead of treating all objects equally, it is more reasonable to treat them differently according to their relevance to the target SME. Next, considering the relevance between objects, we give a relevance-weighted version of the meta path feature.

Usually, relevance is used to measure how close two objects are to each other. As there is no unified definition of relevance, different applications adopt their own relevance measures. In the SME setting, it is common that an enterprise in good financial condition still defaults because of negative influence propagated from its related upstream and downstream enterprises. Therefore, to measure the relevance between SME objects, a measure based on logical structure is better suited than one based on textual context.

A straightforward idea is that, for any object pair, the more paths connect the two objects, the more relevant they are. Following this idea, we introduce a path-count version of the weighted MP feature as follows.

Definition 5

CountSim MP weight feature $C_{P}(x)$ is an indicator to reveal the structure relevance impact of meta path P on enterprise $x$ . We call it CountSim MP feature.

$$C_{P}(x)=\frac{|\{x^{\prime}\in D\mid\exists p_{x\rightsquigarrow x^{\prime}}\in P\}|}{|\{x\in S\}|+|\{x^{\prime}\in S^{\prime}\}|}\tag{3}$$

where $S$ is the collection of SME objects reached by the links from $x$, $S^{\prime}$ is the collection of SME objects with links to $x^{\prime}$, and $D$ is the collection of all SME objects.

The path-count version is simple to apply, but it makes little use of the graph structure. In an SME heterogeneous information network, the logical relationships between objects are captured by the structure of graph paths. Hence, compared with other measures, a path-based measure of relevance is more appropriate for our model. We therefore apply HeteSim shi2014hetesim, an effective path-based similarity, to evaluate the relevance between objects.

Definition 6

HeteSim MP weight feature $H_{P}(x)$ takes HeteSim as the similarity measure to reveal the path relevance impact of meta path P on enterprise $x$ . We call it HeteSim MP feature.

$$H_{P}(x)=\frac{\sum\nolimits_{x^{\prime}\in\{x^{\prime}\mid\exists p_{x\rightsquigarrow x^{\prime}}\in P,\ \Gamma(x^{\prime})=1\}}HeteSim(x,x^{\prime})}{\sum\nolimits_{x^{\prime}\in\{x^{\prime}\mid\exists p_{x\rightsquigarrow x^{\prime}}\in P\}}HeteSim(x,x^{\prime})}\tag{4}$$

where $p_{x\rightsquigarrow x^{\prime}}$ is a path instance from object $x$ to object $x^{\prime}$, $HeteSim(x,x^{\prime})$ is the relevance between object $x$ and object $x^{\prime}$ under HeteSim, and $\Gamma(x)$ is the inference function defined in Section 4.1.
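Equation (4) is a relevance-weighted ratio, which the Python sketch below makes explicit. It does not implement HeteSim itself: the relevance scores are hypothetical numbers passed in through a `sim` callable, and the object names, the function name and the zero fallback for an empty denominator are likewise assumptions.

```python
def hetesim_mp_feature(x, reachable, is_risky, sim):
    """H_P(x): relevance-weighted share of risky objects reachable from x via P."""
    linked = reachable.get(x, set())
    total = sum(sim(x, obj) for obj in linked)
    if total == 0:
        return 0.0  # assumption: an undefined ratio is reported as zero
    risky_total = sum(sim(x, obj) for obj in linked if is_risky(obj))
    return risky_total / total


# Hypothetical relevance scores standing in for HeteSim(x, x') values.
scores = {("J", "Q1"): 0.5, ("J", "Q2"): 0.25, ("J", "Q3"): 0.25}
reachable = {"J": {"Q1", "Q2", "Q3"}}
risky = {"Q1"}

h = hetesim_mp_feature("J", reachable, risky.__contains__,
                       lambda a, b: scores[(a, b)])
print(h)  # 0.5
```

With these numbers the unweighted feature of Definition 4 would be 1/3, while the weighted feature is 0.5, because the single risky object is also the most relevant one.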

5 Experiments

In this section, we investigate the effectiveness of meta path features. We conduct experiments on three real-world SME datasets and detail the results and their explanation below.

5.1 Data and settings

In our experiments, three datasets recording enterprise statistics are used for comparison. The GEM (Growth Enterprise Market of the Shenzhen Stock Exchange) and STAR (Science and Technology Innovation Board of the Shanghai Stock Exchange) datasets cover high-technology SMEs, and the SB (Small and Medium-sized Enterprise Board of the Shenzhen Stock Exchange) dataset covers traditional enterprises. All the datasets can be downloaded from CSMAR (https://www.gtarsc.com). As this paper considers only four types of financial entities (person, commodity, enterprise and news), our experiments are performed only on enterprises that are related to at least one person, one commodity, one other enterprise and one piece of news.

The risk information, that is, whether an enterprise lacks credibility, a person lacks qualifications, or a commodity lacks reliability, is obtained from CSMAR and cninfo (http://www.cninfo.com.cn), which provide authoritative and professional assessments of these entities. The news information is collected from China Judgements Online (https://wenshu.court.gov.cn). The final details of the datasets are shown in Table 2.

Table 2: Dataset information

	GEM	STAR	SB
number of enterprises	528	297	722
related enterprise information	58478	26554	80729
related person information	360462	38663	515504
related news information	13026	3748	24718
related commodities information	17450	8987	36735

As the gathered risk information may not be complete, for some important but unknown entities, we use the model in Section 4.1 to infer their risk. If an entity’s inferred probability is larger than $0.75$ , it is deemed as risky.

Since the impact of a meta path decreases as its length increases, we only consider meta paths with length less than 6. Meta paths that do not start from the SME (enterprise) type are not selected for our experiments. We test the performance of the proposed MP features using a default prediction model that learns the weights associated with those features. A logistic regression model, optimized by MLE (Maximum Likelihood Estimation), is taken as the prediction model.

In this section, all experiments were performed using Python 2.7.17 on Windows 8.1+ with an i5-9300+ CPU and 8 GB+ RAM.

5.2 Selection of Meta Path Features

Even limited by the length constraint, there may still exist numerous meta paths. Among all possible meta path features, which ones are the most valuable ones? In this section, we will run experiments to show the importance of meta path features.

We first generate 40 meta path features according to Definition 4, chosen for its simplicity. Each feature is then subjected to a Wald test, and the resulting $p$-value is used to evaluate the feature’s importance. The test is performed on all three datasets. From the results, we list the top 20 significant meta path features for each dataset in Tables 3 to 5 and the bottom 20 in Tables 6 to 8. For all three datasets, the controller’s ability, the parent enterprise’s financial status, and the news reported about the enterprise play very significant roles in determining SME status. However, the longer the relation chain, the worse the MP feature performs. This may be because longer chains carry less valuable information, or because they contain more distracting and inaccurate information. Looking into the details, we find that for the GEM and STAR datasets, the MP features containing personnel relations are the most significant, while those containing enterprise relations are the least significant. For the SB dataset, the opposite is true. This is reasonable: conventional SMEs, given their resource constraints, pay more attention to relationships with stakeholders in order to ensure stable development, whereas high-technology SMEs mainly focus on technology research and development, so the ability of their personnel has a significant impact on the enterprise.

Table 3: Top-20 significant meta path features for the GEM dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	3.7876e-46	****
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}$	5.3500e-37	****
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	1.7758e-32	****
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	1.0156e-32	****
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	3.9645e-29	****
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	8.3629e-26	****
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	2.2358e-26	****
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{boardmember}\cdot\mathcal{A}_{p}$	6.1598e-23	****
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	2.4664e-15	****
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	1.6067e-9	****
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	3.7876e-6	****
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}$	5.3500e-5	****
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00121	***
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{relate}\cdot\mathcal{A}_{p}$	0.00160	***
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00236	***
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	0.00246	***
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.00396	***
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00615	***
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.00758	***
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}$	0.00823	***

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

Table 4: Top-20 significant meta path features for the STAR dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	7.4107e-44	****
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}$	3.3610e-37	****
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	1.8247e-29	****
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	1.8709e-22	****
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}$	1.925e-17	****
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	2.7723e-11	****
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{boardmember}\cdot\mathcal{A}_{p}$	9.2910e-8	****
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	2.8380e-4	****
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.000929	****
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.00175	***
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	0.00277	***
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00283	***
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{boardmember}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00341	***
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{p}$	0.0044	***
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.00455	***
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}$	0.00476	***
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	0.00496	***
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00510	***
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00528	***
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00741	***

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

Table 5: Top-20 significant meta path features for the SB dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	1.2831e-48	****
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}$	3.0306e-45	****
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	1.5510e-36	****
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}$	6.5260e-35	****
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	3.7263e-35	****
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	4.4973e-33	****
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	2.3524e-33	****
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}$	1.1475e-28	****
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{boardmember}\cdot\mathcal{A}_{p}$	6.8367e-27	****
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	5.2674e-13	****
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	1.2831e-11	****
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{boardmember}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	3.0306e-9	****
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	1.5510e-8	****
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	6.5260e-6	****
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}$	3.7263e-5	****
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{p}$	4.4973e-4	****
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	2.3524e-4	****
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00114	***
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00526	***
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.00683	***

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

Table 6: Bottom-20 significant meta path features for the GEM dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}$	0.0783	*
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0778	*
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	0.0788	*
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	0.0832	*
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	0.0854	*
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0861	*
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0874	*
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	0.0889	*
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0893	*
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	0.0896	*
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0899	*
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0932	*
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	0.1775	-
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	2.4662	-
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{p}$	3.9645	-
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	6.1598	-
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	7.4662	-
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	10.6710	-
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{e}$	12.4639	-
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	16.0762	-

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

Table 7: Bottom-20 significant meta path features for the STAR dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{relate}\cdot\mathcal{A}_{p}$	0.0538	*
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	0.0598	*
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	0.0641	*
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0870	*
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.0873	*
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	0.0881	*
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.0886	*
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0928	*
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0941	*
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	0.0951	*
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	0.0974	*
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.0976	*
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0982	*
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.0987	*
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{e}$	4.6731	-
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	7.7232	-
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	9.2910	-
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{e}$	12.8380	-
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	14.4176	-
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{e}$	17.5919	-

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

Table 8: Bottom-20 significant meta path features for the SB dataset

	Meta path feature	P-value	Significance level¹
1	$\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0714	*
2	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{report}\cdot\mathcal{A}_{n}$	0.0730	*
3	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{relate}\cdot\mathcal{A}_{p}$	0.07551	*
4	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.07652	*
5	$\mathcal{A}_{e}\cdot\mathcal{R}_{parent}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	0.08352	*
6	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.08497	*
7	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.08632	*
8	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{produce}\cdot\mathcal{A}_{c}$	0.08756	*
9	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.09367	*
10	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{e}$	0.09526	*
11	$\mathcal{A}_{e}\cdot\mathcal{R}_{subsidiary}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	0.09831	*
12	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	0.09836	*
13	$\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	5.5101	-
14	$\mathcal{A}_{e}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{e}$	6.5260	-
15	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}$	9.7263	-
16	$\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{p}$	14.4973	-
17	$\mathcal{A}_{e}\cdot\mathcal{R}_{sales}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	23.5246	-
18	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{shareholder}\cdot\mathcal{A}_{p}$	27.7731	-
19	$\mathcal{A}_{e}\cdot\mathcal{R}_{control}\cdot\mathcal{A}_{p}\cdot\mathcal{R}_{employee}\cdot\mathcal{A}_{p}$	28.3672	-
20	$\mathcal{A}_{e}\cdot\mathcal{R}_{supply}\cdot\mathcal{A}_{e}\cdot\mathcal{R}_{manager}\cdot\mathcal{A}_{p}$	31.5267	-

¹ *: p<0.1, **: p<0.05, ***: p<0.01, ****: p<0.001

5.3 Overall Comparison of MP Features

In this section, we compare our three kinds of MP features with two other kinds of features proposed for evaluating SME credit risk. The first is conventional features jrfm12010030 , comprising 16 financial indicators, such as current liquidity, quick ratio, and assets turnover, and 5 non-financial indicators, such as the age of the enterprise and employment. In our experiments, we call them SME CV. The second is the homogeneous path feature 2017Bankruptcy , which is modeled from homogeneous information networks. Such a network contains only one object type and only one relation type; for example, two SMEs are related if they share a high-level manager. In our experiments, we call it SME HPF. For our MP features, we select the Naive MP features, CountSim MP features, and HeteSim MP features according to the ranking results in Section 5.2 as the candidate features for comparison. All comparisons are conducted on the same three datasets. To compare the methods, we first select the top-10 performing features of each method and then use their average AUC score as the overall score of that method. The comparison results are summarized in Figures 5(a)-5(c) and Table 9.
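The scoring protocol described above (average AUC over the top-10 features of a method) can be sketched as below. This is an illustration under assumptions, not the paper's code: each feature is scored on its own, features are assumed to be oriented so that larger values indicate higher default risk, and the ranking is by single-feature AUC. The function names are hypothetical.

```python
import numpy as np


def auc_score(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    (default) sample scores higher than a randomly chosen negative one,
    counting ties as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def top_k_average_auc(labels, feature_matrix, k=10):
    """Average the AUC of the k best-performing feature columns."""
    feature_matrix = np.asarray(feature_matrix, dtype=float)
    aucs = [auc_score(labels, feature_matrix[:, j])
            for j in range(feature_matrix.shape[1])]
    return float(np.mean(sorted(aucs, reverse=True)[:k]))
```

Because a one-feature logistic regression with a positive coefficient is a monotone transform of the feature, its AUC equals the AUC of the raw feature, which is why the sketch skips the model fit.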

We can see that the heterogeneous MP features beat the SME CV features and the SME HPF features on all three datasets. For the proposed MP features, it turns out that: (1) all the MP features show better classification performance than the SME conventional features and the homogeneous path features; (2) the CountSim MP features and the HeteSim MP features outperform the Naive MP features; (3) the classification performance of the CountSim MP features and the HeteSim MP features is similar, with HeteSim slightly ahead on every dataset. These results demonstrate the effectiveness of our proposed features in classifying default SMEs.

Table 9: Average AUC score comparison on the three datasets

	SME CV	SME HPF	Naive MP	CountSim MP	HeteSim MP
GEM	0.716	0.728	0.747	0.771	0.774
STAR	0.654	0.707	0.759	0.767	0.791
SB	0.721	0.733	0.752	0.756	0.783

5.4 Discussion

In this section, we discuss an interesting observation from our experiments. In general, prediction accuracy increases as the data size increases. However, we found that for SMEs the impact of data size depends on the timestamp of the data. Below, we describe this effect and discuss how it arises. Figures 6(a)-6(c) show the classification accuracy of meta path features under different timestamps.

Interestingly, when we extend the SME data used in our model with data from the most recent year, the accuracy of the model increases. But if we extend it with data from before the last year, the accuracy shows a declining trend. This may be because additional data that is still within its valid duration lets our model learn more fully within the life cycle of the enterprise, whereas additional data beyond its valid duration makes the model learn from outside the life cycle and lose its effectiveness. For example, an employee turnover rate from more than two years ago cannot reflect the current situation of the target enterprise.
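One simple way to act on this observation is to restrict the training data to records that are still within a validity window. The sketch below is only an illustration: the record layout, the field name `observed_on`, and the one-year default window are assumptions, not details taken from the experiments.

```python
from datetime import date, timedelta


def within_valid_window(records, as_of, max_age_days=365):
    """Keep records observed at most `max_age_days` before `as_of`
    (and not after it). Each record is a dict with a hypothetical
    `observed_on` field holding a datetime.date."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r for r in records if cutoff <= r["observed_on"] <= as_of]
```

The appropriate window length would likely differ by data type and would need to be chosen empirically, for example by comparing accuracy across several candidate windows.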

6 Conclusion

This paper proposes a meta path based SME credit risk evaluation method that models SME-related information as a heterogeneous information network. Specifically, we first build an SME heterogeneous information network based on four entity types and ten relation types. This network captures the relationships among related enterprises and provides more comprehensive and reliable information for measuring the credit risk of SMEs. Then, we extract meta path features associated with SMEs based on the information network schema, which represent the credit risk situation of SMEs. Finally, we develop three kinds of features to evaluate the effect of meta paths on SME credit risk. The experimental results show that our proposed SME credit risk measuring method achieves higher average AUC scores than the conventional features and the homogeneous features.

Acknowledgements.

This work is supported by the Project of Science and Technology Research and Development of China State Railway Group Co., Ltd. under Grant K2020Z002.

References

(1) Bauer, J., Agarwal, V.: Are hazard models superior to traditional bankruptcy prediction approaches? A comprehensive test. Journal of Banking & Finance 40, 432–442 (2014)
(2) Cultrera, L., Brédart, X.: Bankruptcy prediction: the case of Belgian SMEs. Review of Accounting and Finance (2016)
(3) Edmister, R.O.: An empirical test of financial ratio analysis for small business failure prediction. Journal of Financial and Quantitative Analysis 7(2), 1477–1493 (1972)
(4) Kou, G., Xu, Y., Peng, Y., Shen, F., Chen, Y., Chang, K., Kou, S.: Bankruptcy prediction for SMEs using transactional data and two-stage multiobjective feature selection. Decision Support Systems 140 (2020)
(5) Gupta, M., Kumar, P., Bhasker, B.: Heteclass: A meta-path based framework for transductive classification of objects in heterogeneous information networks. Expert Systems with Applications 68, 106–122 (2017)
(6) Hajek, P., Michalak, K.: Feature selection in corporate credit rating prediction. Knowledge-Based Systems 51, 72–84 (2013)
(7) Hosseini, A., Chen, T., Wu, W., Sun, Y., Sarrafzadeh, M.: Heteromed: Heterogeneous information network for medical diagnosis. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 763–772 (2018)
(8) Hu, B., Zhang, Z., Shi, C., Zhou, J., Qi, Y.: Cash-out user detection based on attributed heterogeneous information network with a hierarchical attention mechanism. Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019)
(9) Jamali, M., Ester, M.: A matrix factorization technique with trust propagation for recommendation in social networks. In: Proceedings of the fourth ACM conference on Recommender systems, pp. 135–142 (2010)
(10) Ji, M., Sun, Y., Danilevsky, M., Han, J., Gao, J.: Graph regularized transductive classification on heterogeneous information networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 570–586. Springer (2010)
(11) Lugovskaya, L.: Predicting default of Russian SMEs on the basis of financial and non-financial variables. Journal of Financial Services Marketing 14(4), 301–313 (2010)
(12) Ma, H., King, I., Lyu, M.R.: Learning to recommend with social trust ensemble. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 203–210 (2009)
(13) Ma, H., Yang, H., Lyu, M.R., King, I.: Sorec: social recommendation using probabilistic matrix factorization. In: Proceedings of the 17th ACM conference on Information and knowledge management, pp. 931–940 (2008)
(14) Moro, A., Fink, M.: Loan managers’ trust and credit access for SMEs. Journal of Banking & Finance 37(3), 927–936 (2013)
(15) Popescul, A., Ungar, L.H.: Statistical relational learning for link prediction. In: IJCAI workshop on learning statistical models from relational data, vol. 2003. Citeseer (2003)
(16) Psillaki, M., Tsolas, I.E., Margaritis, D.: Evaluation of credit risk based on firm performance. European Journal of Operational Research 201(3), 873–881 (2010)
(17) Ptak-Chmielewska, A.: Predicting micro-enterprise failures using data mining techniques. Journal of Risk and Financial Management 12(1) (2019). DOI 10.3390/jrfm12010030. URL https://www.mdpi.com/1911-8074/12/1/30
(18) Sermpinis, G., Tsoukas, S., Zhang, P.: Modelling market implied ratings using lasso variable selection techniques. Journal of Empirical Finance 48, 19–35 (2018)
(19) Shi, C., Kong, X., Huang, Y., Yu, P.S., Wu, B.: Hetesim: A general framework for relevance measure in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering 26(10), 2479–2492 (2014)
(20) Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering 29(1), 17–37 (2016)
(21) Shi, C., Zhang, Z., Luo, P., Yu, P.S., Yue, Y., Wu, B.: Semantic path based personalized recommendation on weighted heterogeneous information networks. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 453–462 (2015)
(22) Sun, Y., Aggarwal, C.C., Han, J.: Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. arXiv preprint arXiv:1201.6563 (2012)
(23) Sun, Y., Han, J.: Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explorations Newsletter 14(2), 20–28 (2013)
(24) Sun, Y., Han, J., Aggarwal, C.C., Chawla, N.V.: When will it happen? relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on Web search and data mining, pp. 663–672 (2012)
(25) Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 797–806 (2009)
(26) Tian, S., Yu, Y., Guo, H.: Variable selection and corporate bankruptcy forecasts. Journal of Banking & Finance 52, 89–100 (2015)
(27) Tobback, E., Bellotti, T., Moeyersoms, J., Stankova, M., Martens, D.: Bankruptcy prediction for SMEs using relational data. Decision Support Systems 102, 69–81 (2017)
(28) Tsai, M.F., Wang, C.J.: On the risk prediction and analysis of soft information in finance reports. European Journal of Operational Research 257(1), 243–250 (2017)
(29) Wang, C., Song, Y., Li, H., Zhang, M., Han, J.: Text classification with heterogeneous information network kernels. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
(30) Wang, H., Zhang, F., Hou, M., Xie, X., Guo, M., Liu, Q.: Shine: Signed heterogeneous information network embedding for sentiment link prediction. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 592–600 (2018)
(31) Xiao, Y., Xiang, R., Sun, Y., Sturt, B., Han, J.: Recommendation in heterogeneous information networks with implicit user feedback. In: ACM Conference on Recommender Systems (2013)
(32) Yin, C., Jiang, C., Jain, H.K., Wang, Z.: Evaluating the credit risk of SMEs using legal judgments. Decision Support Systems 136, 113364 (2020)
(33) Zhong, E., Fan, W., Zhu, Y., Yang, Q.: Modeling the dynamics of composite social networks. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (2013)