Abstract
Small and medium-sized enterprises (SMEs) have become an essential part of the national economy. As their number grows, evaluating their credit risk has become a pressing issue. Unlike large enterprises, which offer massive data to analyse, SMEs rarely provide enough primary information to assess their financial status, which makes credit risk evaluation less accurate. Because primary data are scarce, inferring an SME’s credit risk from secondary data, such as information about its upstream, downstream, parent, and subsidiary enterprises, has attracted considerable attention from industry and academia. To evaluate SME credit risk accurately, this study exploits the representational power of information networks over the various kinds of SME entities and relationships. We build a heterogeneous information network of SMEs to mine an enterprise’s secondary information. Furthermore, we propose a novel feature, the meta-path feature, to measure credit risk, which allows us to evaluate the financial status of SMEs from various perspectives. Experiments show that the proposed meta-path feature is effective in identifying SMEs with credit risks.
1. Introduction
Small and medium-sized enterprises (SMEs) are one of the backbones of the national economy, and their development directly affects it. However, because SMEs often have incomplete management systems and lack appropriate financial indicators, assessing their credit risk is usually time-consuming, and the evaluation result is often of low accuracy. Therefore, in this study, we propose a credit risk assessment method that targets this problem.
Industry and academia have long focused on how to measure enterprise credit risk. Conventional assessment approaches mainly extract enterprise-related features, such as financial indicators, to predict enterprise solvency. However, with the expansion of the global market in recent years, conventional approaches have lost their discriminative power in situations where relations and interactions between SMEs are numerous and complicated. An SME’s financial status can easily be affected by the actions of its related SMEs. For example, contagion risk, which is caused by associated credit entities, exposes many SMEs to the risk of default even when they are in good financial condition. Therefore, rather than isolated financial indicators, the relations and interactions between SMEs deserve more attention in the study of SME credit risk.
To model these relations and interactions, various entities and their relationships can be represented in an information network [1]. In previous work, most researchers studied this problem with a homogeneous information network [2], which consists of only one relation type and one entity type. However, in the SME setting, the structure of a homogeneous information network is too simple to explain the relationships between SMEs. To avoid losing important information, a heterogeneous information network [3], with its richer graph structure, is more suitable for studying the interactions between SMEs. In a heterogeneous information network, meta-paths (MP) [3] are taken as a fundamental data structure to capture semantic relationships between entities. Through MPs, complicated relationships between entities can be defined systematically and concisely. A path provides a clear view of how entities interact in the information network. In this study, to assess the status of SMEs, we exploit meta-paths to study how influence among financial entities spreads in the information network of SMEs.
In our method, we first build a heterogeneous information network of SMEs to describe the interactive relationships between the different entities associated with an SME. Figure 1 is a toy example of the Alibaba heterogeneous information network, which demonstrates some possible connections between Alibaba and its related entities. For example, path “” represents the information that Lazada is a subsidiary of Alibaba; path “” represents the information that Alibaba’s CEO, Bob, is also Taobao’s controller; and path “” represents the information that YouKu, an enterprise controlled by Alibaba, is criticized by the newspaper. Through information networks, the relations between entities can be easily obtained. By building the information network of SMEs, we obtain not only the information about the target enterprise itself but also the interactive information associated with it.

With the given information network of SMEs, we propose a novel feature, the meta-path feature, to measure the impact that one financial entity has on another through meta-paths. Unlike conventional financial indicators, the meta-path feature can be defined and applied very flexibly. This flexibility allows us to evaluate the credit status of SMEs more comprehensively and from various perspectives. The proposed meta-path feature also shows explicitly how much one entity can be affected through a specific logical path, which gives banks, lenders, and relevant experts an intuitive view of the credit risk faced by SMEs. In this way, SME default can be effectively identified.
The main contributions of this study are as follows:
(i) Because conventional approaches capture relationships poorly, we build a heterogeneous information network of SMEs to describe the interactive relationships between the different entities associated with an SME.
(ii) We propose three meta-path features that measure, from different angles, the impact that one financial entity has on another through meta-paths.
(iii) Our proposed meta-path features improve the performance of SME credit risk evaluation. We compared state-of-the-art SME credit risk evaluation features with our three meta-path features, and the meta-path features achieved better results.
In the rest of this paper, Section 2 introduces the SME credit risk evaluation method and the application of information networks. Section 3 builds a model of SMEs’ heterogeneous information network and proposes the meta-path feature. In Section 4, by considering the ability of risk identification, three features are proposed based on the meta-path. Section 5 presents the experiment on three real-world datasets, and Section 6 concludes the study.
2. Related Works
In this section, we will review the related studies from the following perspectives: SME credit risk evaluation methods and information network applications.
2.1. SME Credit Risk Evaluation Methods
The credit risk evaluation model of SMEs was first established by Edmister [4] in 1972, leading to the emergence of a large number of credit risk measurement index systems. Most of the early credit evaluation models for SMEs, at home and abroad, follow the index system of the credit evaluation model for large enterprises, that is, the extraction of key financial indicators from enterprise financial statements. Among these key financial indicators, profitability indicators [5, 6], such as the operating profit ratio and the ratio of profits to cost, and solvency indicators [4, 5, 7], such as the current ratio and quick ratio, are used the most. Besides, operational capacity indicators [8], development capacity indicators [9], and liquidity indicators [9] are added in many studies. Since financial indicators alone cannot delineate the complete picture of an enterprise, nonfinancial indicators such as managers’ background [10, 11], working experience [6], and enterprise internal structure [12, 13] are added for evaluation. However, financial and nonfinancial indicators cannot capture the contagion of credit risk among financial entities, since they treat each entity independently and do not consider the causal chain.
With the development of big data technology, a large amount of unstructured data related to enterprises has been accumulated, such as enterprise news, enterprise transaction data, and enterprise relational data. The information used for SME credit risk evaluation has been extended accordingly. Natural language processing techniques make it possible to extract meaningful information from text. For example, Mosteller and Wallace [14] proposed an approach to analyse the Federalist papers; Spafford and Weeber [15] proposed an approach to software forensics; Akram et al. [16] proposed a short text clustering technique using a deep learning model; and Abbasi [17] proposed a framework to extract author-related information from unstructured text. In the field of SME credit risk evaluation, Tsai and Wang [18] proposed a method to extract enterprise-related news information and used it to support credit risk evaluation; Yin et al. [19] utilized legal judgments to support the evaluation of SME credit risk. Other than textual information, different kinds of relational information were also used for SME credit risk evaluation. Letizia and Lillo [20] used payment relations between enterprises; Tobback et al. [21] used common shareholder and common director relations to extract interenterprise information; Kou et al. [22] focused on three kinds of enterprise information, namely, basic enterprise information, manager/shareholder information, and payment and transactional information. A summary of the different types of features used in SME credit risk evaluation is listed in Table 1.
However, these works model the information homogeneously, and most of them do not consider heterogeneous information.
2.2. Information Network Applications
Recently, with the rapid improvement of computing capacity and the development of data mining technology, the information network has gained much attention from researchers and has performed well in clustering [27–29], classification [30, 31], relation prediction [32, 33], and recommendation [34, 35]. Researchers often use two kinds of information networks, namely, the homogeneous information network and the heterogeneous information network. A homogeneous information network is built from a single type of object and a single type of link. For example, Jamali and Ester [36] built a social network for user recommendation based on user ratings; Ma et al. [37] built a friend relationship prediction network based on personal relations. These homogeneous information networks ignore the differences between object types and relation types, which causes the loss of important information. The concept of the heterogeneous information network was first proposed by Shi et al. [3] in 2009. It combines more information and contains the logical semantics of different object types and link types. For example, Wang et al. [38] proposed a signed heterogeneous information network embedding to capture the sentiment links of online social information by considering users with sentiment and social relations; Hosseini et al. [39] used a heterogeneous information network with high-dimensional data and rich relationships for medical diagnosis. The heterogeneous information network is usually used to capture complicated semantic and logical relationships among different entities.
2.3. Heterogeneous Information Network for SMEs
In the related work discussed above, state-of-the-art SME credit risk evaluation is built on homogeneous information networks. Such a network captures only one type of entity and one type of relation, which makes it hard to represent the complicated relations of SMEs. Since massive data have been accumulated and many data analysis methods have been proposed, it is now possible to build a richer network that captures more information about SMEs. The heterogeneous information network can represent a more complicated graph structure, which is more suitable for SMEs. Therefore, in this study, we build a heterogeneous information network for SMEs that considers both the heterogeneous information of SMEs and the semantic information carried by different SME entities. In this way, we capture more information to evaluate the credit risk of SMEs accurately.
3. Model of SME Credit Risk
To evaluate SME credit risk, experts using conventional methods usually base their judgments only on the features that directly affect SME default, such as the asset-liability ratio, current ratio, and turnover rate, and not on the logical relationships between SMEs, such as parent and subsidiary relations, upstream and downstream relations, and relations through enterprise directors and high-level managers. For example, when a parent company defaults, the solvency of its subsidiaries will also be affected. If the influence exerted by the parent company is neglected, the credit standing of its subsidiaries will be overestimated. Therefore, apart from the features directly affecting default, the logical relationships between SMEs should also be considered in evaluating SMEs’ status. Paying attention to the different connections between SMEs can improve both the reliability and the interpretability of the evaluation. This section gives a model of SME credit risk that incorporates these logical relationships.
3.1. SME Heterogeneous Information Network
A heterogeneous information network [3] is a classical data structure used to model objects and relations in a directed graph. This graph structure has shown its superiority in representing and storing knowledge about the natural world for many applications [40–42]. Given different objects in information networks, logical connections can be effectively constructed, and semantic relationships can be easily captured. Hence, we also build our model in an information network which is defined as follows:
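Definition 1. With a schema $S = (\mathcal{A}, \mathcal{R})$, an information network is defined as a directed graph $G = (V, E)$ with object type function $\phi: V \rightarrow \mathcal{A}$ and relation type function $\psi: E \rightarrow \mathcal{R}$, where each object $v \in V$ belongs to object type $\phi(v) \in \mathcal{A}$ and each link $e \in E$ belongs to relation type $\psi(e) \in \mathcal{R}$.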
In this study, our model is built as a heterogeneous information network of SMEs. The SME schema is shown in Figure 2.
In our model, enterprise, commodity, person, and news are four fundamental object types in studying SME credit risk. The studied relation types are summarized from public enterprise information and objective facts, such as the shareholder relation between enterprise and person, the produce relation between enterprise and commodity, and the report relation between enterprise and news. The types mentioned in this study are listed in Table 2.
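As an illustration of the schema described above, the following minimal Python sketch stores a typed, directed graph. The class name `HeteroNetwork`, the node identifiers, and the direction chosen for each relation are assumptions made for this example only; the object types and relation names are taken from the text.

```python
from collections import defaultdict


class HeteroNetwork:
    """Directed graph whose nodes and edges each carry a type label."""

    def __init__(self):
        self.node_type = {}                 # node id -> object type
        self.out_edges = defaultdict(list)  # node id -> [(relation type, target id)]

    def add_node(self, node_id, object_type):
        self.node_type[node_id] = object_type

    def add_edge(self, source, relation_type, target):
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[source].append((relation_type, target))

    def neighbors(self, source, relation_type):
        """Targets reachable from `source` through one edge of the given relation type."""
        return [t for r, t in self.out_edges[source] if r == relation_type]


# Hypothetical toy network; identifiers and edge directions are illustrative.
net = HeteroNetwork()
net.add_node("e1", "enterprise")
net.add_node("e2", "enterprise")
net.add_node("c1", "commodity")
net.add_node("p1", "person")
net.add_node("n1", "news")
net.add_edge("e1", "parent", "e2")       # assumed reading: e2 is the parent of e1
net.add_edge("e1", "produce", "c1")
net.add_edge("p1", "shareholder", "e1")
net.add_edge("n1", "report", "e1")

print(net.neighbors("e1", "produce"))
```

Keeping the type of every node and edge explicit is what later allows paths to be filtered by a sequence of object and relation types.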
With the SME schema defined, an example of SME heterogeneous information network is shown in Figure 3. We can see that , , and are the enterprises, that we have , the same as and are. The and are the commodities, that we have , the same as . The , , , , and are news, that we have , the same as , , , and are. The , , and are persons, that we have , the same as and are. The and are the relation of produces, that we have , the same as . The , , , and are the relation of reports, that we have , the same as , , , and are. The is the relation of supply, is the relation of parent, that we have and . The and are the relations of controller and is the relation of employee, that we have , the same as , and . The is the relation of relate, that we have .


3.2. SME Meta-Path
In the SME network graph built in Section 3.1, a graph edge is used to represent the relationship between two objects. Limited by the definition of an edge, the represented relationships can only be simple ones, which are insufficient to describe the relationships involved in SME credit risk. In order to model complicated relationships, in this section, we introduce another data structure, the meta-path (MP), to represent complicated and implicit relations in the SME network.
Definition 2. With a schema $S = (\mathcal{A}, \mathcal{R})$, a meta-path $P$ is a path in the form $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which defines a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between $A_1$ and $A_{l+1}$, where $\circ$ denotes the composition operator on relations.
For simplicity, we use the names of object types and relation types to denote the MP: $P = (A_1 R_1 A_2 R_2 \cdots R_l A_{l+1})$. With the definition of meta-path, a path $p = (a_1 a_2 \cdots a_{l+1})$ in graph $G$ follows a meta-path $P$ if, for any vertex $a_i$ and any edge $e_i$, the edge $e_i$ is between $a_i$ and $a_{i+1}$, $\phi(a_i) = A_i$, and $\psi(e_i) = R_i$, where $\phi$ and $\psi$ are the object type and relation type functions of Definition 1. We also call $p$ a path instance of $P$, denoted $p \in P$.
According to the definition, some examples of meta-paths can be seen in Figure 2. is a MP, which represents the information that the SME’s parent enterprise has reported a news. According to Figure 3, there is a path instance of MP . Because , , , , and .
The given MP definition structures logical connections between objects, making our model more expressive and interpretable. It not only can show explicit reasons for factors affecting SMEs on credit risk but also can explain implicit logics of correlation between objects having no direct links in the SME information network.
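The notion of a path instance following a meta-path can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `path_instances`, the triple-based edge list, and the toy identifiers are assumptions, and the meta-path used (an enterprise whose parent enterprise is reported in a news item) mirrors the example described in the text with an assumed edge direction.

```python
def path_instances(node_type, edges, start, meta_path):
    """Enumerate the paths from `start` that follow `meta_path`.

    node_type: dict mapping node id -> object type
    edges:     list of (source, relation type, target) triples
    meta_path: alternating list [A1, R1, A2, ..., Rl, A(l+1)]
    """
    object_types = meta_path[0::2]
    relation_types = meta_path[1::2]
    if node_type.get(start) != object_types[0]:
        return []
    paths = [[start]]
    # Extend every partial path by one typed edge per meta-path step.
    for relation, next_type in zip(relation_types, object_types[1:]):
        extended = []
        for path in paths:
            for source, rel, target in edges:
                if (source == path[-1] and rel == relation
                        and node_type[target] == next_type):
                    extended.append(path + [target])
        paths = extended
    return paths


# Hypothetical toy data.
node_type = {"e1": "enterprise", "e2": "enterprise", "n1": "news", "n2": "news"}
edges = [
    ("e1", "parent", "e2"),
    ("e2", "report", "n1"),
    ("e2", "report", "n2"),
    ("e1", "report", "n1"),
]
meta_path = ["enterprise", "parent", "enterprise", "report", "news"]
instances = path_instances(node_type, edges, "e1", meta_path)
print(instances)
```

The direct edge from `e1` to `n1` is not returned, because it matches neither the relation sequence nor the length of the meta-path.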
Compared to the information carried by objects, the information carried by meta-path is more critical in evaluating the credit risk of SMEs. The reason is that the expression ability of meta-path is stronger. Through different meta-paths, the same financial object may affect another financial object significantly differently. For instance, in Figure 3, we can see that there exist two paths from person to enterprise . The first one is following meta-path and the second one is following meta-path . From the first path, the bribery scandal of an outsourcing employee may do limited harm to the enterprise since may have many other outsourcing employees to replace the role of . However, from the second path, the bribery scandal of the outsourcing employee may do significant harm to enterprise since has a domestic relation with who directs enterprise . Therefore, instead of inspecting each object’s direct impact, our model regards a whole logical path consisting of objects and relations as a factor, in evaluating the credit risk of SMEs.
4. Meta-Path Impact on SME
In the above section, we have given the definition of the MP, a well-patterned structure to represent various semantics relating to SME credit risk. It has been shown that, even with no direct link given, negative information about one SME may heavily affect others through meta-paths. For example, a piece of negative news about an enterprise director may lead to a bad reputation for his enterprise; a low-quality product of a parent enterprise may cause a loss of competitiveness for its subsidiary enterprises. The potential risks carried along such paths are usually too significant to be neglected when an SME is evaluated, but how to formulate them remains an open question. To answer this question, in this section, we propose several novel features, named meta-path features, to represent the risk.
4.1. Risk Inference from Object
Before introducing meta-path features, we first give a method to identify if there exists potential risk in financial objects themselves. According to the object types studied in Section 3.1, except the news object which is used to provide negative or positive information, a commodity object is regarded with potential risks if its quality is not reliable; a person object is regarded with potential risks if his capability is not qualified; an enterprise object is regarded with potential risks if it lacks credibility. In this study, in order to infer if potential risks exist, considering applicability and generality, we use the Naive Bayes model to infer if the mentioned objects are risky or not. Our probabilistic model is learnt from public historical data, such as financial statements, annual reports, and online public news. The definition of our Naive Bayes inference model is given as the following:
Definition 3. With the assumption that each attribute feature of an object is independent of each other, we define an inference function to evaluate if object is risky based on the probability learnt from the Naive Bayes model.where is the th attribute feature of object , is the number of all attributes, indicates the risky object, and indicates the nonrisky object.
With the inference function, we are able to identify the risk of a financial object from its own information. For instance, a commodity object with low sales volume, a high repair rate, and a high refund rate will be inferred as risky; a person object with an irrelevant education background, irrelevant working experience, and few working years will be inferred as risky; an enterprise object with a low ROE ratio, a low quick ratio, and a high asset-liability ratio will be inferred as risky. In the next section, we will study how to infer the potential risk at the MP level.
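The Naive Bayes inference described in this section can be illustrated with a small Python 3 sketch. It is a sketch under stated assumptions: attributes are treated as categorical, add-one (Laplace) smoothing is used, and the function names and the toy commodity records (sales volume, repair rate, refund rate) are hypothetical. The 0.75 threshold is the one stated in Section 5.1.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Fit a categorical Naive Bayes model; labels are 1 (risky) or 0 (nonrisky)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value -> count
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def risk_probability(model, features):
    """Posterior P(risky | features) under the attribute-independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = class_counts[label] / total  # class prior
        for i, value in enumerate(features):
            # Add-one smoothing avoids zero likelihoods (an assumption of this sketch).
            numerator = value_counts[(label, i)][value] + 1
            denominator = class_counts[label] + len(values_seen[i])
            score *= numerator / denominator
        scores[label] = score
    return scores[1] / (scores[0] + scores[1])


# Hypothetical commodity records: (sales volume, repair rate, refund rate).
samples = [
    ("low", "high", "high"), ("low", "high", "low"), ("low", "low", "high"),
    ("high", "low", "low"), ("high", "low", "high"), ("high", "high", "low"),
]
labels = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(samples, labels)

probability = risk_probability(model, ("low", "high", "high"))
is_risky = probability > 0.75  # threshold taken from Section 5.1
print(probability, is_risky)
```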
4.2. Risk Inference from Meta-Path
In an SME information network, an enterprise may have many paths linking to other financial objects, as shown in Figure 4. We can see enterprise has 5 path instances for meta-path = and enterprise has 4 path instances for MP .

With the inference function defined above, we are able to identify whether the objects in the above information network are risky or not. Thus, for a specific MP, given the objects linked by its path instances, it is natural to infer that an enterprise is likely to be risky if potential risks exist in most of its linked objects. Based on this straightforward intuition, we next present several features that quantify such risk from meta-paths.
4.2.1. Meta-Path Feature
Given a target enterprise, the number of risky objects connected to it by an MP is taken as an indicator of the impact of that meta-path on the enterprise. The larger the indicator, the higher the potential risk. Formally, we call this indicator the naive MP feature and define it as follows:
Definition 4. Naive MP feature is an indicator to reveal the impact of meta-path on enterprise :where is an SME object collection, is a path instance from object to object , and is the inference function defined in Section 4.1.
In Figure 4, if , , and are the risky objects, then we have = = 0.6, = = 0.75.
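The values 0.6 and 0.75 in the example above are consistent with reading the naive MP feature as the share of an enterprise's path instances that end at a risky object (3 of 5 and 3 of 4). The Python 3 sketch below implements that reading only; it is an assumption drawn from the worked example, and the function name and object identifiers are hypothetical.

```python
def naive_mp_feature(path_end_objects, is_risky):
    """Share of the path instances of one meta-path that end at a risky object.

    path_end_objects: the end object of each path instance starting at the enterprise
    is_risky:         callable returning True for a risky object
    """
    if not path_end_objects:
        return 0.0
    risky = sum(1 for obj in path_end_objects if is_risky(obj))
    return risky / len(path_end_objects)


# Hypothetical identifiers; which objects the two enterprises share is not assumed.
risky_objects = {"o1", "o2", "o3"}
ends_first = ["o1", "o2", "o3", "o4", "o5"]  # enterprise with 5 path instances
ends_second = ["o1", "o2", "o3", "o6"]       # enterprise with 4 path instances

feature_first = naive_mp_feature(ends_first, lambda o: o in risky_objects)
feature_second = naive_mp_feature(ends_second, lambda o: o in risky_objects)
print(feature_first, feature_second)
```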
4.2.2. Weighted Meta-Path Feature
Although the abovementioned meta-path feature can effectively indicate the impact of an MP, it may be argued that different objects on the same MP should not have the same impact. Among all the objects in the network, irrelevant objects may have little effect, while relevant ones may matter greatly. For an SME in particular, its parent company should influence it more deeply than an enterprise that has cooperated with it only once. Therefore, instead of treating all objects equally, it is more reasonable to treat them differently according to their relevance to the target SME. Next, considering the relevance between objects, we give a relevance-weighted version of the meta-path feature.
Usually, relevance is used to measure how close two objects are to each other. As there is no unified definition of relevance, different applications adopt their own appropriate relevance measures. In the SME application, it is common for an enterprise in good financial condition to default nonetheless because of negative influence propagated from its related upstream and downstream enterprises. Therefore, to measure the relevance between SME objects, a relevance measure based on logical structure is better than one based on textual context.
A straightforward idea is that for any object pair, the two which have more paths should be more relevant. From this idea, we simply introduce a path count version of MP-weighted feature as follows:
Definition 5. CountSim MP weight feature is an indicator to reveal the structure relevance impact of meta-path P on enterprise . We call it CountSim MP feature.where and are the SME object collections where all links from and to , respectively. is another SME object collection which contains all objects.
The path count version is simple to apply, but it makes little use of the graph structure. In the SME heterogeneous information network, logical relationships between objects are captured by the structure of graph paths. Hence, compared to other measures, a path-based measure of relevance is more appropriate for our model. We therefore also apply HeteSim [43], an effective path-based similarity, to evaluate the relevance between objects.
Definition 6. HeteSim MP weight feature takes HeteSim as the similarity measure to reveal the path relevance impact of meta-path P on enterprise . We call it HeteSim MP feature.where is a path instance from object to object , is the relevance between object and object under HeteSim, and is the estimating function defined in Section 4.1.
5. Experiments
In this section, we are going to investigate the effectiveness of meta-path features. We conduct experiments on three real-world SME datasets. The result and explanation are detailed in this part.
5.1. Data and Settings
In our experiments, three datasets recording enterprises’ statistics are used for comparison. The GEM (Growth Enterprise Market of the Shenzhen Stock Exchange) and STAR (Science and Technology Innovation Board of the Shanghai Stock Exchange) datasets cover high-technology SMEs, and the SB (Small and Medium-Sized Enterprise Board of the Shenzhen Stock Exchange) dataset covers traditional enterprises. All the datasets can be downloaded from CSMAR (https://www.gtarsc.com). As this study only considers four types of financial entities (person, commodity, enterprise, and news), our experiments are only performed on the enterprises that relate to at least one person, one commodity, one other enterprise, and one piece of news.
The risk information about whether an enterprise lacks credibility, a person lacks qualifications, or a commodity lacks reliability is obtained from CSMAR and CNINF (http://www.cninfo.com.cn), which provide authoritative and professional assessments of the entities. The news information is collected from China Judgements Online (https://wenshu.court.gov.cn). The final details of the datasets are shown in Table 3. As the gathered risk information may be incomplete, for some important entities whose risk is unknown, we use the model in Section 4.1 to infer their risk. If an entity’s inferred probability is larger than 0.75, it is deemed risky.
Since the impact of a meta-path decreases as its length increases, we only consider meta-paths of length less than 6. Meta-paths that do not start with the SME type are not selected for our experiments. We test the proposed MP features with a default prediction model that learns the weights associated with those features. A logistic regression model, optimized by maximum likelihood estimation (MLE), is taken as the prediction model.
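To make the prediction step concrete, the following Python 3 sketch fits a logistic regression by maximising the log-likelihood with batch gradient ascent. It is a minimal illustration rather than the experimental code: the optimiser, learning rate, iteration count, function names, and the toy feature values are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, learning_rate=0.5, iterations=2000):
    """Maximise the average log-likelihood by batch gradient ascent.

    Returns [bias, w1, ..., wk], one weight per MP feature.
    """
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        gradient = [0.0] * (n_features + 1)
        for features, label in zip(rows, labels):
            x = [1.0] + list(features)
            error = label - sigmoid(sum(w * v for w, v in zip(weights, x)))
            for j, v in enumerate(x):
                gradient[j] += error * v
        weights = [w + learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


def predict_default(weights, features):
    """Predicted default probability for one enterprise."""
    x = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


# Hypothetical data: one MP feature per enterprise, label 1 = default.
rows = [(0.1,), (0.2,), (0.3,), (0.6,), (0.7,), (0.8,), (0.75,), (0.25,)]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
weights = fit_logistic(rows, labels)
print(weights, predict_default(weights, (0.9,)))
```

A positive learnt weight means that a larger share of risky objects along the meta-path raises the predicted default probability.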
In this section, all experiments were performed using Python 2.7.17 in Win with CPU processor and RAM.
5.2. Selection of Meta-Path Features
Even limited by the length constraint, there may still exist numerous meta-paths. Among all possible meta-path features, which ones are the most valuable ones? In this section, we will run experiments to show the importance of meta-path features.
For simplicity, we first generate 40 meta-path features according to Definition 4. Then, each feature is tested under the Wald test, and the resulting significance value of the feature associated with its meta-path is used to evaluate the feature’s importance. The test is performed on all three datasets. Tables 4–6 list the top 20 significant meta-path features for each dataset, and Tables 7–9 list the bottom 20. From Tables 4–6, we can see that for all three datasets, the controller’s ability, the parent enterprise’s financial status, and the news reported about the enterprise play very significant roles in determining SME status. From Tables 7–9, however, there is a trend that the longer the relation chain, the worse the performance of the MP feature. A likely reason is that a longer relation chain means a more distant relationship with the enterprise, so it carries less valuable information and more distracting and inaccurate information.
Looking into the details, we find that for the GEM and STAR datasets (high-technology SMEs), the MP features containing personnel relations are the most significant, while those containing enterprise relations are the least significant. For the SB dataset (conventional SMEs), the opposite is true. This is reasonable: conventional SMEs, constrained by their own resources, pay more attention to their relationships with stakeholders in order to ensure stable development, whereas high-technology SMEs mainly focus on technology research and development, so the ability of personnel has a significant impact on the enterprise.
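The Wald test for a single feature can be sketched as follows in Python. This is an illustrative sketch under assumptions, not the procedure used to produce the tables: it fits a one-feature logistic regression with Newton's method, takes the standard error of the slope from the inverse information matrix, and reports a two-sided p-value from the normal distribution. The function names and toy data are hypothetical.

```python
import math


def _score_and_information(b0, b1, x_values, labels):
    """Gradient of the log-likelihood and the 2x2 information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(x_values, labels):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def wald_test(x_values, labels, iterations=25):
    """Fit y ~ sigmoid(b0 + b1 * x) and test the null hypothesis b1 = 0.

    Returns (slope, z statistic, two-sided p-value).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x_values, labels)
        det = h00 * h11 - h01 * h01
        # Newton step: (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_information(b0, b1, x_values, labels)
    det = h00 * h11 - h01 * h01
    variance_b1 = h00 / det  # slope entry of the inverse information matrix
    z = b1 / math.sqrt(variance_b1)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b1, z, p_value


# Hypothetical, non-separable data: MP feature value and default label.
xs = [0.0, 0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
slope, z_value, p_value = wald_test(xs, ys)
print(slope, z_value, p_value)
```

Ranking features by such a p-value, smallest first, is one plausible way to obtain the "top" and "bottom" lists; the data must not be perfectly separable, otherwise the maximum likelihood estimate does not exist.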
5.3. Overall Comparisons of MP Feature
In this section, we compare our three kinds of MP features with four kinds of state-of-the-art features proposed for evaluating SME credit risk. The first kind is conventional features [44], which comprise 16 financial indicators, such as current liquidity, quick ratio, and asset turnover, and 5 nonfinancial indicators, such as the age of the enterprise and employment. In our experiments, we call it SME CV. The second kind is the textual feature [19], which is modeled from unstructured textual information. It contains not only basic financial and nonfinancial enterprise information but also enterprise legal information. In our experiments, we call it SME TF. The third kind is the homogeneous path feature [21], which is modeled from a homogeneous information network. It contains only one object type and one relation type; for example, two SMEs are related if they share a high-level manager. In our experiments, we call it SME HPF. The last kind is the multiple homogeneous path feature [22], which is modeled from more than one homogeneous information network. It contains not only basic enterprise information but also three kinds of homogeneous path features, namely, manager network-based features, shareholder network-based features, and payment network-based features. In our experiments, we call it SME MHPF. For our MP features, we select the Naive MP features, CountSim MP features, and HeteSim MP features, respectively, according to the ranking result in Section 5.2 as the candidate features for comparison. All the comparisons are conducted on the three datasets mentioned above. To compare the methods, we first select the 10 best-performing features of each method. Then, we use their average AUC score as the overall score of each method. The comparison results are summarized in Table 10.
We can see that the heterogeneous MP features outperform all the comparison features on all three datasets. For the proposed MP features, it turns out that (1) all the MP features show better classification performance than the SME conventional features, textual features, and homogeneous path features; (2) the CountSim MP features and the HeteSim MP features perform better than the Naive MP features; (3) the classification performance of the CountSim MP features and that of the HeteSim MP features are similar. The above results demonstrate the effectiveness of our proposed features in classifying default SMEs.
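The scoring rule described above (average AUC of a method's 10 best features) can be sketched in Python 3 as follows. The text does not specify how each feature's AUC is obtained, so this sketch assumes that every feature is scored on its raw values against the default labels; the function names and toy values are hypothetical.

```python
def auc_score(scores, labels):
    """Probability that a random defaulter is scored above a random non-defaulter.

    Ties count as one half. Labels are 1 (default) or 0 (no default).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def method_score(feature_columns, labels, top_k=10):
    """Average AUC of the top_k best individual features of one method."""
    aucs = sorted((auc_score(column, labels) for column in feature_columns),
                  reverse=True)
    best = aucs[:top_k]
    return sum(best) / len(best)


# Hypothetical feature columns for four enterprises.
labels = [0, 0, 1, 1]
column_a = [0.1, 0.4, 0.35, 0.8]
column_b = [0.1, 0.2, 0.3, 0.4]
print(method_score([column_a, column_b], labels, top_k=2))
```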
5.4. Discussion
In this section, we discuss an interesting point that we found in our experiments. In general, prediction accuracy increases as the data size increases. However, we found that for SMEs, the impact of data size depends on the timestamp of the data. Next, we detail and discuss how this effect arises. Figures 5(a)–5(c) show the classification accuracy of meta-path features under different timestamps.

Figure 5: Classification accuracy of meta-path features under different timestamps, panels (a), (b), and (c).
Interestingly, when we extend the SME data used in our model with data from the most recent year, the accuracy of the model increases for all three datasets. But if we extend it with data from before the last year, the accuracy of the model shows a declining trend. A possible explanation is that if the additional data are still within their valid duration, the model can be learnt more fully within the life cycle of the enterprise, whereas if the additional data are outside their valid duration, the model is learnt beyond that life cycle and loses its effectiveness. For example, an employee turnover rate that is more than two years old cannot reflect the current state of the target enterprise, and the number of cooperating enterprises may have changed over two years.
6. Conclusion
This study proposes a meta-path-based SME credit risk evaluation method that models SME-related information as a heterogeneous information network. In detail, we first build an SME heterogeneous information network based on four entity types and ten relation types. The heterogeneous information network of SMEs can capture the relationships among related enterprises and provide more comprehensive and reliable information for measuring the credit risk of SMEs. Then, we extract meta-path features associated with SMEs based on the information network schema, which represent the credit risk situation of an SME. Finally, we develop three features to evaluate the effect of meta-paths on SME credit risk. The experimental results show that our proposed meta-path features perform better than the state-of-the-art features.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Disclosure
A preprint of this manuscript is available on arXiv [45].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Project of Science and Technology Research and Development of China State Railway Group Co., Ltd. (K2020Z002).