#### Abstract

Factor models provide a cornerstone for understanding financial asset pricing; however, research on China’s stock market risk premia is still limited. Motivated by this, this paper proposes a four-factor model for China’s stock market that includes a market factor, a size factor, a value factor, and a liquidity factor. We compare our four-factor model with a set of prominent factor models based on newly developed likelihood-ratio tests and Bayesian methods. Along with the comparison, we also find supporting evidence for the alternative t-distribution assumption for empirical asset pricing studies. Our results show the following: (1) distributional tests suggest that the returns of factors and stock return anomalies are fat-tailed and therefore are better captured by t-distributions than by normality; (2) under t-distribution assumptions, our four-factor model outperforms a set of prominent factor models in terms of explaining the factors in each other, pricing a comprehensive list of stock return anomalies, and Bayesian marginal likelihoods; (3) model comparison results vary across normality and t-distribution assumptions, which suggests that distributional assumptions matter for asset pricing studies. This paper contributes to the literature by proposing an effective asset pricing factor model and providing factor model comparison tests under non-normal distributional assumptions in the context of China.

#### 1. Introduction

Factor models play a fundamental role in explaining the risk premia on financial assets and serve as the benchmark for constructing investment portfolios. China has the world’s second-largest stock market, but the determinants of risk premia on stocks in China remain largely unknown to both researchers and practitioners. Most research on China’s stock market still follows the tradition in the US stock market by applying the three-factor model of Fama and French (FF3) [1] as the benchmark [2, 3]. This practice may be biased considering China’s unique economic and institutional environment and its separation from developed markets. Liu et al. [4] reveal that the returns of the smallest stocks in China are largely related to the potential of being shells in reverse mergers instead of their fundamental values. After excluding the smallest 30% of stocks, which are subject to shell-value contamination, they propose an adjusted three-factor model (CH3) that uses the earnings-to-price ratio to construct the value factor. They show that the CH3 model dominates the FF3 model and is able to explain many of the financial anomalies in the Chinese stock market. However, the CH3 model still fails to price some important anomalies, which motivates further exploration.

In this paper, we extend the CH3 model of Liu et al. [4] by introducing a liquidity factor. Existing literature suggests that liquidity risk is a greater concern in China than in developed stock markets [5–7]. First, unlike developed markets that are quote-driven, China’s stock market is an order-driven market without market makers. Under this circumstance, stock returns are sensitive to liquidity-related trading costs (e.g., price impact). In addition, China’s stock market is also known for its large fraction of inexperienced retail investors, who contributed 82.01% of the total trading volume in 2018 according to the annual statistics of the Shanghai Stock Exchange. Retail investors favor speculative trading with short holding periods and, as a result, cannot provide stable liquidity. Moreover, government interventions may hurt market liquidity due to the transaction constraints on state-owned shares and the government’s policy intentions for stock market stability [7, 8]. This evidence points to the important role of stock liquidity. Motivated by these facts, we extend the CH3 model by introducing a liquidity factor based on the illiquidity measure of Amihud [9], which is a prevalent proxy for price impact in emerging markets like China [10]. We show that our four-factor model outperforms the CH3 model, together with a set of prominent factor models that are widely recognized in developed countries.

When implementing the factor model comparison, we also note that inappropriate distributional assumptions can induce substantial bias in the results [11]. Since Fama [12], an extensive literature has revealed that the distributions of stock returns have fat tails and that the normality assumption is not supported by the data. The normality assumption, however, has been widely applied in asset pricing inference because of its elegant statistical properties. In this paper, we therefore adopt the t-distribution as an alternative distributional assumption. Specifically, the t-distribution is employed because it is the most representative fat-tailed distribution, and normality-based asset pricing theories tend to be applicable under t-distributions [11]. We first test whether the factors and anomalies can be described by t-distributions rather than normal distributions based on skewness and kurtosis [13]. We show that normality assumptions are strongly rejected because of the fat tails, and that t-distributions capture this salient feature well.

In the spirit of this, we conduct the model evaluation under the alternative t-distributions. Consistent with tests under normal distribution, we first compare our model with a set of candidate factor models by examining their ability to explain each other’s factors. Utilizing the likelihood-ratio test proposed by Kan and Zhou [11], we find that our four-factor model successfully explains all factors in the other models, while none of the existing models can subsume our four-factor model. Using the same method, we further test the candidate models’ ability to price a large set of return anomalies and find that our four-factor model also outperforms the other models. Lastly, considering that the foregoing results are based on frequentist approaches, we complement the empirical analysis using a newly developed Bayesian methodology under t-distributions [14]. The results confirm that our four-factor model dominates. In addition, we also conduct the empirical tests under normality assumptions. The results show that distributional assumptions may cause substantial changes to the empirical results, but our model still outperforms.

Our contributions to the literature are mainly twofold. First, we achieve a significant improvement on the factor models in China by introducing an effective liquidity factor. Through rigorous model comparison procedures, we show that our four-factor model outperforms the prominent factor models in the literature. In particular, by surveying the most comprehensive list of anomalies for China, we confirm that our proposed factor model has superior explanatory power. Second, we contribute to the factor model comparison literature by providing evidence under alternative distributions instead of normality. Our results suggest that distributional assumptions can cause substantial changes to empirical results and reveal the necessity of considering alternative distributions in factor model studies. To the best of our knowledge, we are the first to evaluate and compare factor models under non-normal distributions for China’s stock market.

#### 2. Literature Review

##### 2.1. Asset Pricing Factor Models in China

Early research on factor models in China mainly replicates the three-factor model (FF3) of Fama and French [1] and the five-factor model (FF5) of Fama and French [15]. These studies apply the Fama–French models in China and examine their performances. Yang and Chen [16], for instance, reveal the existence of size and value effects in A-share markets of China and find that the FF3 model applies to China’s stock markets by fully pricing the excess returns of 25 portfolios formed based on size and book-to-market ratios. Fan and Shan [17] further verify the ability of FF3 to explain a set of anomalies including market capitalization, trading volume, book-to-market, and the ratio of A-shares to total shares. However, these studies are carried out using data of early 2000s when China’s stock market was at its early stage. Hu et al. [18] point out that value effect is actually not robust in China using a longer and more recent sample. They argue that the contradiction stems from several outliers in the early years of the market and their impacts fade away in a longer sample. There are also disagreements on the advisability and effectiveness of the FF5 model in China. Zhao et al. [19] find that profitability and investment factors are not significant in China, so FF3 fits China’s stock market better than FF5. However, Lin [20] points out that the profitability factor contributes significant explanatory power and FF5 consistently outperforms FF3 in China. Li et al. [21] further reveal that profitability and investment effects are both significant in China’s stock markets and FF5 dominates FF3.

Some researchers attempt to construct alternative China-specific factor models instead of simply following the US models proposed by Fama and French [1, 15]. Pan and Xu [22] find that the price-to-earnings ratio has better predictability for China’s stock markets than the price-to-book ratio. Based on this result, they propose an adjusted FF3 model using the price-to-earnings ratio as the proxy for value. Recently, Liu et al. [4] further take the influence of shell-value contamination into consideration. They point out that the value of the smallest stocks in the Chinese market can be largely attributed to the potential of being shells in reverse mergers instead of their fundamental values. This is due to China’s stringent regulations on the initial public offering (IPO) market and the high IPO costs. They find that the market values of the smallest 30% of stocks are significantly exposed to shell-value contamination. After excluding the smallest 30% of stocks, they propose an adjusted three-factor model that uses the earnings-to-price ratio to construct the value factor and show that the proposed model performs well in China.

##### 2.2. Model Evaluation under Nonnormal Distributions

Existing research mainly evaluates factor models based on normality assumption [1, 15, 23]. Only a few have explored the implications of alternative distributions and the results are mixed.

Some early studies find no impact of distributional assumptions on empirical conclusions. Harvey and Zhou [24] investigate the mean-variance efficiency of international portfolios and find that the results are the same across normality and t-distribution assumptions. Groenewold and Fraser [25] examine the sensitivity of mean-variance analysis to iid-normality assumption using Australian data. Although the iid-normality assumption is rejected by the data, they find no difference between the results under different distribution assumptions for the unconditional CAPM, the conditional CAPM, and the APT model.

However, Zhou [26] points out that the mean-variance framework of factor models makes sense if and only if the ellipticity assumption of returns holds. The empirical results will be biased when the ellipticity assumption is maintained while normality assumption is not consistent with the data. Beaulieu et al. [27] examine several factor models’ mean-variance efficiencies and ability to explain return anomalies under normal distribution and t-distribution, respectively. Their results suggest that incorporating fat tails improves the models’ explanatory power in terms of pricing anomalies and the corresponding mean-variance efficiencies are less rejected. Kan and Zhou [11] provide a tractable method to estimate and evaluate factor models. They show that the normal assumption is violated for most financial assets in the US, due to the salient fat tails of their return distributions. They further find that model evaluation results can have drastic changes when switching the distributional assumption from normality to t-distribution.

#### 3. Methodology

##### 3.1. Distributional Tests

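Let $R_t$ be the $N \times 1$ vector of asset returns at time $t$ ($t = 1, \ldots, T$) for an investment portfolio of $N$ assets. To test whether $R_t$ can be described by a normal distribution, we conduct tests based on the skewness $D_1$ and kurtosis $D_2$:

$$D_1 = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\left[(R_t-\hat{\mu})'\hat{V}^{-1}(R_s-\hat{\mu})\right]^3, \tag{1}$$

$$D_2 = \frac{1}{T}\sum_{t=1}^{T}\left[(R_t-\hat{\mu})'\hat{V}^{-1}(R_t-\hat{\mu})\right]^2, \tag{2}$$

where $\hat{\mu}$ and $\hat{V}$ are the sample mean and the covariance matrix of $R_t$, respectively. According to Kan and Zhou [11] and Mardia [13], if $R_t$ obeys a normal distribution, the expectations of $D_1$ and $D_2$ in the univariate case would be zero and three, respectively, and linear transformations of $R_t$ do not change the values of $D_1$ and $D_2$. Therefore, to test the normality of $R_t$, we can assume that its true distribution is a standard normal distribution. Then, we derive the empirical distributions of $D_1$ and $D_2$ by simulating 100,000 draws from the standard normal distribution using Markov Chain Monte Carlo (MCMC) methods and test whether the sample $D_1$ and $D_2$ are consistent with a normal distribution accordingly.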

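This methodology also applies to testing whether $R_t$ can be described by a t-distribution with $\nu$ degrees of freedom. In this case, the density function is

$$f(R_t) = \frac{\Gamma\left(\frac{\nu+N}{2}\right)}{(\pi\nu)^{N/2}\,\Gamma\left(\frac{\nu}{2}\right)|\Psi|^{1/2}}\left[1+\frac{(R_t-\mu)'\Psi^{-1}(R_t-\mu)}{\nu}\right]^{-\frac{\nu+N}{2}}, \tag{3}$$

where $N$ is the dimension of $R_t$, $\mu$ is the mean vector, $\Psi = \frac{\nu-2}{\nu}V$, and $V$ is the covariance matrix.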
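The simulation-based test described above can be sketched for a single return series as follows. This is a minimal illustration rather than the procedure actually used in the paper: the function names, the two-sided p-value convention, and the default of 2,000 simulations (instead of 100,000) are assumptions made here for brevity. Because skewness and kurtosis are invariant to location and scale, draws from the standard normal or standard t-distribution suffice.

```python
import random


def skew_kurt(x):
    """Sample skewness and (non-excess) kurtosis of a list of returns."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def draw_sample(n, rng, df=None):
    """Draw n i.i.d. values: standard normal if df is None, else Student-t."""
    if df is None:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    # A t variate is a normal divided by the root of a mean-one gamma variable.
    return [rng.gauss(0.0, 1.0) / rng.gammavariate(df / 2.0, 2.0 / df) ** 0.5
            for _ in range(n)]


def simulated_p_values(returns, df=None, n_sims=2000, seed=0):
    """Simulated p-values for skewness and kurtosis under the null.

    The two-sided convention used here is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(returns)
    obs_skew, obs_kurt = skew_kurt(returns)
    sims = [skew_kurt(draw_sample(n, rng, df)) for _ in range(n_sims)]
    p_skew = sum(abs(s) >= abs(obs_skew) for s, _ in sims) / n_sims
    upper = sum(k >= obs_kurt for _, k in sims) / n_sims
    lower = sum(k <= obs_kurt for _, k in sims) / n_sims
    p_kurt = min(1.0, 2.0 * min(upper, lower))
    return p_skew, p_kurt
```

Calling `simulated_p_values(returns)` tests normality, while `simulated_p_values(returns, df=3.0)` tests a t-distribution with three degrees of freedom.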

##### 3.2. The Likelihood-Ratio Method for Asset Pricing Tests under t-Distributions

Let $f_t$ be a return vector for $K$ factors and $r_t$ the excess returns for $N$ testing portfolios, $t = 1, \ldots, T$. To examine whether model $f$ can explain $r_t$, asset pricing studies generally run the following regression:

$$r_t = \alpha + \beta f_t + \varepsilon_t, \tag{4}$$

and test the parametric restrictions for the alphas:

$$H_0: \alpha = 0_N. \tag{5}$$

If the alphas are not significantly different from zero, then we say $r_t$ can be explained by model $f$. Note that $r_t$ can be either factors from other models or return anomalies.

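Assuming that $(f_t', r_t')'$ follow a t-distribution with unknown degrees of freedom, Kan and Zhou [11] propose a likelihood-ratio (LR) test for (5). They show that, under the null hypothesis in (5), the corresponding LR statistic follows a chi-square distribution asymptotically:

$$\mathrm{LR} = 2\left[\log L(\hat{\mu}, \hat{\Psi}, \hat{\nu}) - \log L(\tilde{\mu}, \tilde{\Psi}, \tilde{\nu})\right] \sim \chi^2_N, \tag{6}$$

where $L$ is the likelihood function under t-distributions as given in (3), and $(\hat{\mu}, \hat{\Psi}, \hat{\nu})$ and $(\tilde{\mu}, \tilde{\Psi}, \tilde{\nu})$ are the estimates of the means, covariance matrices, and degrees of freedom without and with the restriction in (5), respectively, based on the Expectation-Maximization (EM) algorithm in Kan and Zhou [11]. Notice that $\hat{\Psi}$ and $\tilde{\Psi}$ are the scaled covariance matrices with $\hat{\Psi} = \frac{\hat{\nu}-2}{\hat{\nu}}\hat{V}$ and $\tilde{\Psi} = \frac{\tilde{\nu}-2}{\tilde{\nu}}\tilde{V}$, where $\hat{V}$ and $\tilde{V}$ are the corresponding covariance matrix estimates.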

It is noteworthy that the degrees of freedom are assumed to be unknown in the above procedures. In fact, the method also applies when we specify the corresponding degrees of freedom a priori.
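To give a flavor of the EM step when the degrees of freedom are specified a priori, the sketch below fits the location and scale of a univariate t-distribution. It is a one-dimensional simplification written for illustration, not the multivariate algorithm of Kan and Zhou [11]; the function name and the fixed iteration count are assumptions of this sketch.

```python
def fit_t_location_scale(x, df, n_iter=200):
    """EM estimates (mu, scale2) of a univariate Student-t with known df.

    scale2 is the squared scale parameter; for df > 2 the implied
    variance is df / (df - 2) * scale2.
    """
    n = len(x)
    mu = sum(x) / n
    scale2 = sum((v - mu) ** 2 for v in x) / n
    for _ in range(n_iter):
        # E-step: expected latent precision weight of each observation.
        # Observations far from mu receive small weights.
        w = [(df + 1.0) / (df + (v - mu) ** 2 / scale2) for v in x]
        # M-step: weighted mean and weighted squared scale.
        mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        scale2 = sum(wi * (v - mu) ** 2 for wi, v in zip(w, x)) / n
    return mu, scale2
```

The weights in the E-step show why fat-tailed assumptions change inference: extreme returns are down-weighted, whereas under normality every observation counts equally.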

##### 3.3. Bayesian Model Comparison under t-Distributions

Different from the frequentist approaches, Bayesian methods compare competing models in terms of their marginal likelihoods with solid theoretical basis. Higher marginal likelihoods indicate better performances in fitting real data. We adopt a recent Bayesian method of Chib and Zeng [14], which features little user-intervention and therefore helps avoid data mining.

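In the Bayesian method of Chib and Zeng [14], test assets are irrelevant and we only need to focus on factors. Let $f_t = (x_t', w_t')'$ be the returns for $K$ traded factors at time $t$, where $x_t$ collects the $K_x$ factors included in the pricing model and $w_t$ collects the remaining $K_w$ factors, $K = K_x + K_w$. We assume that all factors here follow a multivariate t-distribution:

$$f_t \sim t_K(\mu, \Omega, \nu), \tag{7}$$

where $\mu$ is the mean vector, $\Omega$ is the variance and covariance matrix, and $\nu$ is the degree of freedom for the distribution. Equation (7) can be represented in the form of a gamma-scale mixture of normal distributions:

$$f_t \mid \tau_t \sim N_K\left(\mu, \tau_t^{-1}\Omega\right), \qquad \tau_t \sim \mathrm{Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right). \tag{8}$$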
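The gamma-scale mixture representation can be illustrated by simulation. The sketch below handles a single factor and treats `omega` as the squared scale (dispersion) parameter, so that for more than two degrees of freedom the implied variance is `df / (df - 2) * omega`; this parameterization and the function name are assumptions of the sketch rather than details taken from Chib and Zeng [14].

```python
import random


def draw_t_via_mixture(mu, omega, df, n, seed=0):
    """Draw n Student-t variates as a gamma-scale mixture of normals."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Latent scale with shape df/2 and rate df/2, hence mean one.
        # random.gammavariate takes (shape, scale), so scale = 2/df.
        tau = rng.gammavariate(df / 2.0, 2.0 / df)
        # Conditional on tau, the factor is normal with variance omega/tau.
        draws.append(rng.gauss(mu, (omega / tau) ** 0.5))
    return draws
```

Conditional normality given the latent scales is what makes posterior sampling tractable: once the scales are drawn, the remaining updates take the familiar Gaussian form.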

Define the stochastic discount factor as , and because of the restriction that , we havewhere , .

When conducting Bayesian model comparison, we need to sample the posterior distributions of . To achieve this goal, we firstly need to set the prior distributions of the parameters above. However, in fact includes parameters. When the number of candidate factors is large, it is difficult to set the priors, and the results will be highly sensitive to the choice of priors. Therefore, Chib and Zeng [14] adopt the commonly used multivariate t-distributions and inverse Wishart distribution as the priors of and , respectively. Under this circumstance, we can obtain the prior distributions of by specifying only three parameters and selecting a subsample of as the training sample.

Specifically, let the prior of be a multivariate t-distribution, which can be represented as a scale mixture of normal distribution:where are the sample means of the training sample, is the dispersion, is the number of degrees of freedom, and are the latent scale random-variables. As for the prior of , we employ an inverse Wishart distribution:where is the number of degrees of freedom and is the sample variance-covariance matrix of the training sample of scaled by .

Based on the foregoing setup, we only need to specify the training sample of and the values of to sample the prior distributions of and . Then, we can employ the Markov Chain Monte Carlo (MCMC) method to obtain the posterior probability of :where is the corresponding likelihood function. Finally, after deriving the posterior distributions, we can compute the marginal likelihood of a given model . We do not report the details of MCMC sampling here and refer the interested readers to the original paper.

Following Chib and Zeng [14], we let be equal to , equal to 0.0025, and equal to 2.1. When sampling the priors, we repeat the sampling steps 80,000 times and discard the first 40,000 burn-in draws. The posterior distributions are obtained based on 10,000 MCMC draws beyond a burn-in of 40,000.

#### 4. Data

##### 4.1. Sample

The sample period is from January 2000 to December 2019. The trading and financial data in our paper all come from CSMAR database, except for the factors of Liu et al. [4], which are obtained from CRSP.

We include all A-share stocks on the Shanghai and Shenzhen stock exchanges, that is, stocks whose unique six-digit identifier begins with 00, 30, or 60. We further impose some filters considering the economic and political background of China’s stock markets following Liu et al. [4]. We exclude stock observations that (i) became public within the past six months, (ii) have fewer than 120 daily trading records during the past 12 months, or (iii) have fewer than 15 daily trading records in the last month (except for the months of the Spring Festival). These filters are set to limit the influence of IPOs and long trading suspensions. Moreover, we eliminate the stocks that are in the bottom 30% size group because of the shell-value contamination following Liu et al. [4]. The proxy for size is total market capitalization including nontradable shares. We choose the one-year deposit rate as the risk-free rate.
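The sample filters can be expressed as a simple predicate on a stock-month observation. The sketch below is illustrative only: the record layout, the field names, and the treatment of boundary cases (for example, exactly six months since listing) are assumptions, and the size percentile is taken as precomputed from total market capitalization.

```python
from dataclasses import dataclass


@dataclass
class StockMonth:
    """One stock-month observation (hypothetical field names)."""
    code: str                      # six-digit stock identifier
    months_since_ipo: int
    trading_days_12m: int          # daily records over the past 12 months
    trading_days_last_month: int
    is_spring_festival_month: bool
    size_percentile: float         # 0.0 = smallest total market cap


def passes_filters(obs: StockMonth) -> bool:
    """Return True if the observation survives all sample filters."""
    if obs.code[:2] not in {"00", "30", "60"}:
        return False               # not an A-share on the two exchanges
    if obs.months_since_ipo < 6:
        return False               # (i) recently listed
    if obs.trading_days_12m < 120:
        return False               # (ii) long suspension in the past year
    if obs.trading_days_last_month < 15 and not obs.is_spring_festival_month:
        return False               # (iii) thin trading last month
    return obs.size_percentile >= 0.30   # drop the bottom 30% by size
```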

##### 4.2. Models and Factors

The model we propose is built on the CH3 model of Liu et al. [4], and we introduce a new liquidity factor. The liquidity factor, ILLIQ in this paper, is constructed as follows. In each month, we independently sort the sample stocks into two groups based on their market value, small (*S*) and big (*B*). We also sort stocks into three illiquidity groups based on the illiquidity measure of Amihud [9]. The most liquid 30% are defined as group *L*, the most illiquid 30% are sorted into group *H*, and the rest are in group *M*. We then form six value-weighted portfolios: *S*/*H*, *S*/*M*, *S*/*L*, *B*/*H*, *B*/*M*, and *B*/*L*. The ILLIQ factor is defined as

$$\mathrm{ILLIQ} = \frac{1}{2}\left(S/H + B/H\right) - \frac{1}{2}\left(S/L + B/L\right).$$

We therefore call this new four-factor model CH3 + ILLIQ as an abbreviation.
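The construction of the liquidity factor can be sketched in two steps: compute the Amihud [9] measure per stock, then combine the six portfolio returns. The sketch assumes that the factor is the illiquid-minus-liquid return spread averaged across the two size groups, in the style of the Fama–French value factor, and that trading volume is measured in currency units; the function names and dictionary keys are hypothetical.

```python
def amihud_illiquidity(daily_returns, daily_volumes):
    """Average of |return| / trading volume over days with positive volume."""
    ratios = [abs(r) / v for r, v in zip(daily_returns, daily_volumes) if v > 0]
    return sum(ratios) / len(ratios)


def illiq_factor(port):
    """Illiquid-minus-liquid spread averaged over the two size groups.

    port maps portfolio labels such as "S/H" to value-weighted
    monthly returns; the middle group M does not enter the factor.
    """
    high = 0.5 * (port["S/H"] + port["B/H"])
    low = 0.5 * (port["S/L"] + port["B/L"])
    return high - low
```

For instance, with monthly returns of 3% (*S*/*H*), 1% (*B*/*H*), 1% (*S*/*L*), and 0% (*B*/*L*), the factor return for that month would be 2% minus 0.5%, or 1.5%.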

The finance literature has proposed various factor models while most of them have been ignored in the studies of China’s stock markets. To better examine the performance of our model CH3 + ILLIQ, we compare it with several prominent factor models that are widely recognized along with the China-specific three-factor model (CH3).

Our first choice contains well-recognized models inspired by economic and finance theory, including the five-factor model (FF5) of Fama and French [15] and the Q-factor model of Hou et al. [28]. We also choose an alternative five-factor model (FF5CP) of Fama and French [29] that replaces the accruals-based profitability factor in FF5 with a cash profitability factor. Another important criterion for choosing candidate models and factors is empirical evidence. In addition to liquidity, momentum is also one of the most studied capital market phenomena. Although the literature points out that momentum strategies generally fail in China [30, 31], it is still worthwhile to explore momentum in the stock universe without the smallest 30% of stocks. We therefore include the models that combine the abovementioned Fama–French models with momentum, including the model of Carhart [32] and the two six-factor models (FF6 and FF6CP) of Fama and French [29]. Lastly, we also incorporate the four-factor model (CH3 + PMO) of Liu et al. [4], who add a sentiment factor PMO to CH3 and show that CH3 + PMO outperforms CH3.

Table 1 presents all the candidate models and their respective component factors. We replicate the factors following the corresponding literature except for the factors (SMB-CH and VMG) of CH3, which are directly obtained from the CRSP database. Specifically, the Fama–French factors (MKT, SMB, HML, RMW, CMA) are constructed following Fama and French [15]. RMWCP of Fama and French [29] is sorted by cash profitability, as opposed to RMW, which is based on accruals-based operating profitability. HMLm, as opposed to HML, uses the market cap of the last month and is rebalanced monthly. Factors (ME, IA, ROE) from Hou et al. [28] are 2 × 3 × 3 sorted. ME, sorted by market cap, and IA, sorted by the annual growth rate of total assets, are rebalanced annually. ROE is the latest reported net profit divided by the book equity of the last quarter. UMD is sorted by cumulative returns from month $t-12$ to month $t-2$ and is rebalanced monthly. PMO from Liu et al. [4] is sorted by abnormal turnover and rebalanced monthly. All factors are value-weighted monthly long-short returns, except that UMD is equal-weighted.

##### 4.3. Anomalies

Literature has extensively explored the anomalies in the US stock market [33], whereas related research is limited in China. Considering the importance of anomalies in evaluating asset pricing models and constructing portfolios, it is of great interest to investigate anomalies in China. Jiang et al. [34] survey a large list of stock characteristics and examine their predictability for Chinese stocks. Following their list, for each characteristic we sort the stocks into ten deciles, and the corresponding anomaly is the difference between the value-weighted returns of the highest and lowest portfolios. After excluding the characteristics that cannot be used to construct anomalies using the above procedure, we construct 65 return anomalies, and the details are provided in Table 2.
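The decile-sort construction of an anomaly return can be sketched for one month as follows. This is a simplified illustration: the input layout, the rank-based decile assignment, and the handling of ties are assumptions of the sketch.

```python
def decile_long_short(stocks):
    """High-minus-low decile return for one month.

    stocks is a list of (characteristic, market_cap, next_month_return)
    tuples; portfolios are value-weighted by market_cap.
    """
    ranked = sorted(stocks, key=lambda s: s[0])
    n = len(ranked)
    # Rank-based deciles: position i falls in decile i * 10 // n (0 to 9).
    low = [s for i, s in enumerate(ranked) if i * 10 // n == 0]
    high = [s for i, s in enumerate(ranked) if i * 10 // n == 9]

    def value_weighted(group):
        total_cap = sum(cap for _, cap, _ in group)
        return sum(cap * ret for _, cap, ret in group) / total_cap

    return value_weighted(high) - value_weighted(low)
```

Repeating the calculation month by month yields the time series of the anomaly return used in the asset pricing tests.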

It is worth mentioning that not all the anomalies constructed will be used for model comparison. In the spirit of related literature, we only employ the anomalies that cannot be priced by CAPM under proper distributional assumptions. Since we survey the most comprehensive list of anomalies in model comparison studies for China, this also adds to the contribution to related literature. Finally, we derive a list of 15 significant anomalies that fall into four categories according to Jiang et al. [34]: (1) value-versus-growth: earnings-to-price (EP) and sales-to-price (SP); (2) profitability: return on equity (ROE), gross profitability ratio (GP), and Zscore (Z); (3) momentum: one-month reversal (REV) and change in six-month momentum (CHMOM); (4) trading frictions: market capitalization (MV), one-month abnormal turnover (ABTURN), one-month volatility (VOL), turnover (TURN), idiosyncratic return volatility (IVOL), maximum daily returns (MAX), price delay (PRCDEL), and market beta (BETA). The details are provided in Table 2 along with other anomalies that are explained by CAPM.

#### 5. Empirical Results

In this section, we first present distributional tests on both risk factors and return anomalies. Then, under proper distributional assumptions, we report model comparison results using three approaches based on the candidate models: (1) ability to explain the factors in each other, (2) ability to price return anomalies, and (3) Bayesian marginal likelihoods.

##### 5.1. Distributional Tests

First, we use the method in Section 3.1 to test whether the factors and anomalies can be described by normality or t-distributions based on skewness and kurtosis. It is noteworthy that we need to specify the degrees of freedom (d.f.), which must be larger than 2, for the t-distributions. The larger the d.f., the closer the t-distribution is to normality. For robustness, we set the d.f. to 3, 4, 5, 8, 16, and 32, and we find that t-distributions with small d.f. tend to fit the data better. For conciseness, we only report the results corresponding to a d.f. of 3 in this section. Our results are robust to appropriate alternative parameter values.
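A standard property of the univariate Student-t distribution illustrates how the d.f. governs the distance from normality. The kurtosis is finite only when the d.f. exceed four, so the values below cover only the larger d.f. considered here; for d.f. of 3 and 4 the fourth moment does not exist, which is the extreme fat-tailed case.

```latex
\begin{align*}
\operatorname{Kurt}(\nu) &= 3 + \frac{6}{\nu - 4}, \qquad \nu > 4, \\
\operatorname{Kurt}(5) &= 9, \quad
\operatorname{Kurt}(8) = 4.5, \quad
\operatorname{Kurt}(16) = 3.5, \quad
\operatorname{Kurt}(32) \approx 3.21, \\
\lim_{\nu \to \infty} \operatorname{Kurt}(\nu) &= 3 .
\end{align*}
```

As the d.f. grow, the kurtosis approaches the normal value of three, which is why small d.f. are needed to match strongly fat-tailed returns.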

We first test the distributional hypothesis of each factor, respectively. The univariate tests of the 14 factors are presented in Table 3. We can see that each factor has a kurtosis that is higher than 3, which indicates a fat-tail distribution. The statistical tests for kurtosis show that the normality assumptions are all rejected at the 1% significance level, while none of the t-distribution assumptions is rejected. In terms of the skewness tests, we can see that all the factors can be described by t-distributions whereas only half of the factors satisfy normality assumptions. We also conduct the same tests for the return anomalies and find similar results. (We do not report the univariate tests of the anomalies out of readability concerns since we have 65 anomalies in total. The results are available upon request.)

Considering that related empirical analysis mainly relies on the joint distributions of factors and anomalies, it is necessary to test the multivariate kurtosis and skewness as well. Note that, following literature, we only employ the anomalies that cannot be priced by CAPM at the 5% significance level for empirical analysis. Based on the results above, we test whether the anomalies can be priced by CAPM under t-distributions, of which the degrees of freedom are assumed to be unknown and estimated based on the methods as illustrated in Section 3.2.

Table 4 presents the multivariate tests for the candidate factors and the 15 anomalies that cannot be explained by CAPM. Consistent with the univariate tests, we also set the degrees of freedom to 3. Given the results of the univariate tests, it is not surprising to see the strong rejection of normality assumptions in terms of both skewness and kurtosis. On the other hand, the joint distributions of both the factors and the anomalies support the use of t-distributions.

In sum, fat tails are a nonnegligible feature of factors and anomalies and deserve serious investigations in related studies. We therefore choose t-distributions as the alternative distributional assumption for normal distributions in the following model comparison analysis.

##### 5.2. Model Comparison

As discussed earlier, we propose a four-factor model (CH3 + ILLIQ) that combines the three-factor model (CH3) of Liu et al. [4] and a liquidity factor (ILLIQ) that is constructed based on the illiquidity measure of Amihud [9]. A main goal of this paper is to examine whether the CH3 + ILLIQ model outperforms existing factor models. In this section, we compare the candidate models based on their ability to explain the factors in each other and the anomalies, as well as on Bayesian marginal likelihoods.

All the empirical tests are carried out under t-distributions and, therefore, we need to specify the d.f. for each test. To ensure that our results are consistent with each other, we use the optimal d.f. of the joint distribution of the 14 candidate factors and 15 anomalies throughout the empirical analysis. Based on the Expectation-Maximization method of Kan and Zhou [11], the corresponding optimal d.f. is estimated to be 3.2462.

###### 5.2.1. Redundancy Tests for the Factors in CH3 + ILLIQ

Before any formal comparison, we first need to check the redundancy of the ILLIQ factor in the CH3 + ILLIQ model, i.e., whether ILLIQ can be explained by CH3. To this end, we use the method of Section 3.2 to test whether the intercept of ILLIQ with respect to CH3 is significant. As shown in column (1) of Table 5, the corresponding intercept is significantly different from zero. This suggests that the new factor ILLIQ is not redundant and brings additional information to the baseline CH3 model. Likewise, we also test the redundancies of the remaining three factors, MKT, SMB-CH, and VMG, and the results in columns (2) to (4) show that none of them is redundant.

The redundancy tests above also help justify the validity of CH3 + ILLIQ and confirm that CH3 + ILLIQ outperforms CH3 in the spirit of Fama and French [15, 29] and Hou et al. [23, 28].

###### 5.2.2. Explaining Factors

We proceed to compare the candidate models based on their ability to explain the factors in each other. Specifically, to test whether model A can explain model B, we regress the exclusive factors in B on A and test whether the intercepts are jointly zero using the LR method in Section 3.2 under t-distributions. A is considered to outperform B if the corresponding intercepts are not significant. Table 6 gives the results using CH3 + ILLIQ as the benchmark. (It is worth mentioning that this section involves pairwise model comparison, and the LR statistics therefore may not be comparable across model pairs due to the different number of factors in each regression.) We can see that all the other candidate models fail to explain CH3 + ILLIQ with -values smaller than . In contrast, CH3 + ILLIQ can explain all the other candidate models in the sense that none of their intercepts with respect to CH3 + ILLIQ are statistically significant at the 10% significance level. In particular, while Liu et al. [4] show that their four-factor model CH3 + PMO outperforms CH3, our results show that CH3 + ILLIQ dominates CH3 + PMO significantly. In sum, these results indicate that CH3 + ILLIQ dominates other candidate models in terms of explaining the factors in each other.

###### 5.2.3. Explaining Anomalies

We further evaluate the performances of the factor models in terms of their ability to explain the 15 significant anomalies, which are presented in Section 4.3. The left panel of Table 7 presents the LR tests for the intercepts of regressing the anomalies jointly on the candidate models using the method in Section 3.2 under t-distributions. We can see that only CH3 + ILLIQ succeeds in explaining all 15 anomalies, whereas all the other candidate models fail to do so at the 5% significance level. To examine whether the t-distribution assumption is essential, we also conduct analogous tests under normality assumptions using the classical GRS test of Gibbons et al. [78]. The estimation results are reported on the right panel of Table 7. We find that the normal assumption may have an inclination to overestimate the explanatory power of models: both CH3 + ILLIQ and CH3 + PMO can explain the candidate anomalies jointly. These results indicate that distributional assumptions can have a large impact on model comparison. Since the ability to explain anomalies is one of the most important criteria for factor model evaluation, this further verifies the necessity of using proper distributional assumptions in related studies.

To further explore the details, we present the results of univariate analysis by regressing each anomaly on the candidate models, respectively. In Table 8, we only report each anomaly’s alphas with respect to CH3 + ILLIQ, CH3 + PMO, and CH3, which are the top three candidates according to Table 7. We can see that the t-distribution seems to provide stricter anomaly tests. Under normality assumptions, CH3 + PMO only fails to explain MV at the 10% significance level; however, it fails to price three different anomalies under t-distributions. As for CH3, the unexplained anomalies remain the same across distributional assumptions, but CH3’s performance is evidently weaker under t-distributions. These results further underscore the importance of using proper distributional assumptions.

In sum, the results above show that CH3 + ILLIQ outperforms other candidate models in terms of explaining return anomalies.

###### 5.2.4. Bayesian Model Comparison

Lastly, since the above results are based on frequentist approaches, we complement the empirical analysis from a Bayesian perspective using the newly developed method of Chib and Zeng [14]. The Bayesian method enables us to rank a set of candidate models directly according to their marginal likelihoods. A higher marginal likelihood indicates better performance in terms of explanatory power and fit.

We conduct the tests under each of the two alternative distributions. The left panel of Table 9 gives the log marginal likelihoods of the candidate models under t-distribution assumptions. To be consistent with the above analysis, the degrees of freedom in the Bayesian analysis are also set to 3.2464. The results show that CH3 + ILLIQ produces the highest marginal likelihood and outperforms all other candidate models. Note that even though the log differences between CH3 + ILLIQ and the other models appear small, the implied gaps are large. For example, a difference of 6.09 between the log marginal likelihoods of CH3 + ILLIQ and CH3 + PMO implies that the marginal likelihood of CH3 + ILLIQ is roughly 440 times that of CH3 + PMO (exp(6.09) ≈ 441). We also present the Bayesian model comparison results under normality assumptions in the right panel of Table 9 and find that CH3 + ILLIQ still dominates the other models with the highest marginal likelihood. Moreover, the marginal likelihoods are comparable across distributional assumptions, and the marginal likelihood under normality is far smaller than that under t-distributions for all models. From a Bayesian perspective, this also justifies the use of t-distributions as the alternative to normality in factor model studies.
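
The conversion from a log marginal likelihood gap to a ratio of marginal likelihoods can be checked with a short sketch. The helper names `bayes_factor` and `posterior_model_probs` are hypothetical, and the equal prior model probabilities used in the second helper are an illustrative assumption rather than a setting reported in the paper. Only the gap of 6.09 comes from the text; the absolute log levels used below are placeholders.

```python
import math


def bayes_factor(log_ml_a, log_ml_b):
    """Ratio of marginal likelihoods of model A to model B from their logs."""
    return math.exp(log_ml_a - log_ml_b)


def posterior_model_probs(log_mls):
    """Posterior model probabilities, assuming equal prior probabilities.

    Subtracting the maximum before exponentiating avoids overflow or
    underflow when the log marginal likelihoods are large in magnitude.
    """
    peak = max(log_mls)
    weights = [math.exp(value - peak) for value in log_mls]
    total = sum(weights)
    return [weight / total for weight in weights]


# Only the difference of 6.09 is taken from the text; the levels 0.0 and
# -6.09 are placeholders, since the ratio depends on the difference alone.
print(bayes_factor(0.0, -6.09))
print(posterior_model_probs([0.0, -6.09]))
```

Because only differences in log marginal likelihoods enter these calculations, the ranking of models is unaffected by any constant common to all of them.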

In sum, our four-factor model CH3 + ILLIQ dominates the other candidate models in the sense of Bayesian marginal likelihoods.

#### 6. Conclusions

Motivated by the fact that liquidity plays an important role in the Chinese stock market, we propose a four-factor model that extends the three-factor model (CH3) of Liu et al. [4] by introducing a liquidity factor. As suggested by distributional tests, we take into account the fat-tailed features of stock returns and compare our four-factor model with a set of prominent factor models under t-distributions using newly developed likelihood-ratio tests and Bayesian methods. Under t-distributions, our model comparison results show that our four-factor model significantly outperforms the competing models in terms of explaining the factors in the other models, explaining anomalies, and Bayesian marginal likelihoods. Our results also indicate that distributional assumptions can substantially change model comparison results, and we therefore advocate the use of proper distributions instead of normality.

Our paper contributes to the literature by proposing a more effective four-factor model for China. Through rigorous model comparisons, we find that our four-factor model outperforms the prominent factor models from the existing literature. In particular, by surveying the most comprehensive list of anomalies for China in the related literature, we find that our factor model has strong explanatory power for return anomalies. In addition, we provide supporting evidence for the t-distribution assumption in factor model comparisons. We show that different distributional assumptions can cause substantial changes to empirical results, which reveals the necessity of considering alternative distributions in factor model studies.

#### Data Availability

The data used to support the findings of this study are available upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was funded by the National Natural Science Foundation of China (grant no. 71672079).