Abstract

Data insufficiency has become the primary factor limiting research on income disparity in China. To address this issue, this paper explores Chinese income distribution and income inequality using distribution functions. First, it examines 20 sets of grouped data on family income between 2005 and 2012 from the China Yearbook of Household Survey, 2013, and compares the fitting effects of eight distribution functions. The results show that the generalized beta distribution of the second kind fits the income distribution of urban and rural residents in China well. Next, these results are used to calculate the Chinese Gini ratio, which is then compared with the findings of relevant studies. Finally, this paper discusses the influence of urbanization on income inequality in China and suggests that accelerating urbanization can play an important role in narrowing the income gap of Chinese residents.

1. Introduction

Several conflicts exist over the calculation of China’s Gini coefficient. A literature review reveals over 30 different estimations of the Chinese Gini coefficient, with no consensus among them. The estimations of the Gini coefficient of 1995 best exemplify this situation. While Chen [1] estimated a Gini coefficient of 0.365, in 2002, Chen and Zhou [2] used two different methods and obtained results of 0.38392 and 0.41914. The latter result was similar to Chen’s [3] and Ravallion and Chen’s [4] estimates of 0.4169 and 0.415; however, these similar results were obtained using entirely different methods. Similarly, Xiang [5], Chotikapanich et al. [6], and Xu and Zhang [7] used different methods and derived similar results: 0.3515, 0.3506, and 0.3591, respectively. Hu et al. [8] and Hu [9] adopted the method of distribution function and obtained the values of 0.3761 and 0.3691. Zhao et al. [10] estimated a value of 0.445. The highest value, 0.452, was provided by Khan and Riskin [11], which is 28.9% higher than the 0.3506 by Chotikapanich et al. [6]. Since the National Bureau of Statistics’ original household survey data have not been made public, major disagreements on the calculation of a Gini coefficient for resident income in China have persisted since the reform and opening up. The Gini coefficient is used as basic statistical data to analyze income inequality. Thus, the discrepancy in estimation methods, combined with data insufficiency, has largely limited research on China’s income inequality [12, 13]. The following is a brief introduction to the existing data sources and this study’s primary data source.

According to the China Statistical Yearbook (2012), in 2011, the household survey teams of the National Bureau of Statistics (NBS) conducted a survey of 66,000 urban households and 75,000 rural households in 7,100 villages. Thus, in comparison with other sources, NBS is a far superior source of data on resident income and regions. However, the yearbook has also been criticized. For instance, Khan and Riskin [11] stated that the statistical data in the yearbook were too aggregated to conduct a careful and deep analysis of income inequality. Similarly, Fang et al. [14] believed that the income disparity within each group had been ignored owing to the aggregated data and the results were not accurate enough. In addition, disputes exist over the yearbook’s income standard. According to Li and Luo [13], the hidden subsidies of urban residents were much higher than those of rural residents. In other words, the actual income gap between urban and rural residents was larger than that depicted by the yearbook. Sutherland and Yao [15] pointed out that NBS did not fully consider factors such as welfare disparity between urban and rural residents, cost of living in different regions, rapid expansion of urban areas, and the large number of migrant workers. To this effect, various scholars hold different opinions on the income gap between urban and rural areas. While Sicular et al. [16] estimated a small income gap, Li and Luo [17] found a dramatic and widening gap.

The Chinese Household Income Projects (CHIPs) measure income distribution using data from surveys conducted on households in selected provinces and cities for 1988, 1995, 2002, and 2007. The data were gathered by the research group at the Institute of Economic Research of Chinese Academy of Social Science. Using the adjusted data from the 1988 and 1995 surveys, Khan and Riskin [11] and Zhao et al. [10] analyzed the income distribution of residents in China. Wang [18] used the data from the 1988 and 1995 surveys to examine income mobility. Research on Income Distribution in China, edited by Li et al. [19], was based on the 2002 survey, while Shi Li and Sui Yang referenced the 2007 data, whose conclusions attracted much attention. However, the sample size of CHIPs is much smaller than that of NBS; for example, CHIPs 1995 samples included 14,929 households from 19 provinces, while NBS was 10,286 households for the same year.

The China Health and Nutrition Survey (CHNS) is a collaborative project by the Population Research Centre at the University of North Carolina, the National Institute of Nutrition and Food Safety, and the China Disease Prevention and Control Centre. In addition to income, the survey uses data on residents, nutrition, health, adults, children, and communities, among others, to analyze income distribution. Shi et al. [20] examined CHNS’ 1997 data on rural and urban income and indicated that NBS provided marginally higher income distribution than that in CHNS. Wei [21] analyzed 1993 data for rural areas to explore factors influencing nonagricultural employment and salary. Wang [22] used 1989–1997 data to study income mobility [22] and 1989–2006 data to examine fairness in income opportunities (2012). Moreover, using the 1989–1997 data, Zhang et al. [23] analyzed changes in income distribution, and Zhu and Luo [24] studied the relationships between income inequality, poverty, and economic growth in China based on the same data sources. Between 1989 and 2009, CHNS was conducted eight times using the multistage and grouped sampling method. After 1997, the surveys were conducted in nine provinces and autonomous regions. In 2009, the rural and urban samples included 8,028 persons and 3,456 persons. During 1997–2009, the sample size of each survey was almost the same and far lower than that in NBS.

In 2011, the Chinese Household Finance Survey (CHFS) by the Southwestern University of Finance and Economics adopted a hierarchical, three-step sampling design. The sample data included 2,585 counties and/or cities in 25 provinces and autonomous regions and 8,438 households. However, Shi and Wang [25] questioned the representativeness of the sample.

Other data sources include the China Health and Retirement Longitudinal Survey (CHARLS) and Chinese Family Panel Studies (CFPS). CHARLS is a large-scale project organized by the research center at China Economy of Peking University. It was designed to provide basic data for academic research on China’s aging population as well as to help formulate and improve China’s social security policy. In 2008, a preliminary survey was conducted in both the urban and rural areas of the Gansu and Zhejiang Provinces. Next, CFPS was designed by those at the research center of Peking University to trace and gather three levels of data: individual, family, and community. In 2007, two test surveys with 140 households were completed in Beijing, Hebei, and Shanghai. In 2008, the exploratory studies were conducted in Beijing, Shanghai, and Guangdong and, in the following year, the instrumental test panel studies were performed in the same three places.

In sum, the China Statistics Yearbook has several issues, while NBS has a large number of samples that have wide coverage and can be traced back to the beginning of China’s reform and opening up. Thus, NBS would be an ideal data source to analyze residents’ income distribution in China. Unfortunately, data provided by NBS are only from grouped households; in other words, the current grouped data include mean values that neglect income inequality within the group and consequently underestimate income disparity [11]. Moreover, the NBS survey data should be accessible to all so that great strides can be made in studies on income inequality in China.

Without access to NBS’ original data, some studies calculated the Gini coefficient using China’s per capita GDP [26] or per capita NI [27]. Even if per capita GDP and NI were closely related with per capita income, the Gini coefficients calculated from GDP or NI were incomparable with that obtained from income. Kanbur and Zhang [28] calculated the Gini coefficient using provincial per capita consumption and emphasized the limitations of data in measuring income inequality. Some scholars have even tried to break through the data insufficiency bottleneck by, for example, employing new statistical methods to recover missing information. Using China Statistics Yearbook’s grouped data, Wu and Perloff [29] restored income distribution for all residents for 1985–2001 and revealed an income inequality in China. Using the same method, Wang [30] studied income distribution in China’s urban and rural areas and Chi et al. [31] analyzed the income distribution in these areas from 1987 to 2004. In short, these studies employed nonparametric estimation methods. However, Zhang et al. [32] showed that the error in nonparametric estimation may be much larger than that in the parameter estimation to fit income distribution. Therefore, to study income inequality in China, it is of great significance to explore a breakthrough in methodology and estimate overall income distribution using NBS’ grouped data.

In addition, it is imperative to examine the income distribution function when estimating overall income distribution using grouped data for resident income. Studies on income distribution functions have a long history. Over a century ago, Pareto [33] proposed the Pareto distribution, which earned him the same standing as Lorenz’s study on income distribution [34]. Gibrat [35] stated that log-normal distribution could be a good fit for income distribution. However, successive research indicated that this distribution would underestimate the income of the high-income group [36]. Distribution functions offer a host of analytical tools for studies on income inequality and have promoted remarkable development. Subsequent studies made significant contributions to the field [37–44].

In China, research on income distribution function began much later. Wang [45] used the Pareto distribution to fit income data for China from 1988 to 1995. Mao et al. [46] adopted gamma distribution to fit the income of China’s urban households from 2005 to 2007. Using per capita disposable income, Wang [47] studied the income distribution of rural residents and concluded that log-normal distribution had the best fitting effect. Duan and Chen [48] suggested that the national and regional per capita income of urban and rural families obeyed the mixed distribution of the Pareto distribution, normal distribution, and exponential distribution. Huang and Liu [49] adopted the nonparametric method to fit China’s income distribution. Using the same method, Wang [30] explored income distribution from 1985 to 2009. Hu et al. [8] introduced different fitting methods and fit the income of rural and urban residents with the Weibull distribution, log-normal distribution, and beta distribution of the second kind (B2). The empirical results showed that B2 enjoyed the best fitting effect, in view of which the Gini coefficient of China’s resident income was calculated. Hu [9] hypothesized that the generalized beta distribution of the second kind (GB2) had the best fitting effect and estimated China’s Gini coefficient from 1985 to 2009. Chen et al. [50] focused on the numerical feature of the distribution function and its application to income inequality. Zhang et al. [32] compared the fitting effects of the different distribution functions and the nonparametric estimation method and showed that a three-parameter distribution function was superior to a two-parameter distribution function, and GB2 with four parameters had the best fitting effect in the income distribution of urban residents in the Anhui Province. 
In addition, they believed that when analyzing a distribution with complications arising from limited parameters, the parametric estimation method was clearly less capable than the nonparametric one in adjusting the distribution shape. However, an analysis of the smooth unimodal density distribution revealed that too many parameters in the nonparametric methods produced redundant information, in other words, “noise” that influenced the fitting effect. Thus, for such distributions, the parametric method is superior to the nonparametric one.

Various distribution functions were employed in examining the income distribution to determine the distribution function with the highest goodness of fit. This is due to the dissimilar features of income distribution in various countries and regions during different periods; that is, no distribution function was universal. For example, Tachibanaki et al. [51] employed six commonly used distribution functions to research the income distribution of residents in Japan. McDonald [42] and McDonald and Xu [52] analyzed US household income for 1970, 1975, 1980, and 1985 and compared the fitting effects of 11 distribution functions. McDonald and Mantrala [53] adopted 15 types of distribution functions to analyze US household income for 1970, 1975, 1980, 1985, 1990, and 1995. Hu et al. [8] used grouped data and compared the fitting effects of three distribution functions. Using microeconomic survey data on urban households and three nonparametric methods, Zhang et al. [32] compared nine types of distribution functions to analyze the goodness of fit.

Drawing on the above, this study uses 20 sets of grouped income data on urban and rural residents by the China Yearbook of Household Survey, 2013, and eight types of distribution functions to fit the income distribution. Accordingly, we calculated the Gini coefficient in China and its changing tendency. This study makes the following theoretical and practical contributions.

(1) The China Yearbook of Household Survey, 2013, has published 20 sets of grouped income data, which is the most scientific and accurate compared to its previous publications. The 20 sets of grouped data provide a larger amount of information than the previous seven sets of grouped data and a wider income range of rural residents than the China Rural Survey Yearbook. Thus, estimating the income distribution fit using this data source is more reliable.

(2) The paper compares eight distribution functions, of which GB2, B2, Singh-Maddala (SM), and Dagum distribution are multiple parameter distribution functions. In comparison with the two-parameter distribution function, the multiple parameter distribution function has more parameters that have a greater effect on a function’s shape; thus, the latter has stronger control over distribution shape and enjoys a better fitting effect. Moreover, to the authors’ knowledge, no study has applied GB2, B2, and Dagum distribution to study the income distribution of urban and rural residents in China. We attach more importance to the fitting effect of GB2 because the goodness of fit is very high when GB2 is adopted to examine the income distribution of overseas residents. To this effect, Zhang et al. [32] show that GB2 has the best fitting effect among nine distribution functions and three nonparametric methods. However, the question remains whether GB2 is a good fit to the income distribution of Chinese rural and urban residents.

(3) When the distribution function fits the actual income distribution, the fitting effect of the two ends of distribution—low- and high-income group—is not good enough. To this effect, we focus on the goodness of fit of the two ends when comparing the goodness of fit of the different distribution functions.

(4) Against the background of urbanization, we fit different urbanization rates using 2012 data and estimate the contribution rate of the intrarural and intraurban Gini coefficients and the Gini between rural and urban areas to that of the overall residents in China; this can help in framing policies aimed at narrowing the income gap in China.

The remainder of this paper is organized as follows. Section 2 introduces the data and method. Section 3 compares the fitting effects of the eight distribution functions on income distribution. Section 4 explores income distribution, China’s Gini coefficient, and the influence of urbanization on the Gini coefficient. Section 5 provides suggestions and concludes the paper.

2. Data and Method

2.1. Data

As mentioned, the overaggregated statistical data in the Chinese Statistical Yearbook and China Yearbook of Rural Household Survey hinder careful and in-depth analyses of income inequality. Whereas the Chinese Statistical Yearbook provides seven sets of grouped income data for urban residents, the China Yearbook of Rural Household Survey offers 20 sets of grouped income data for rural residents; however, the grouping of high-income groups is extensive. For instance, the China Yearbook of Rural Household Survey, 2011, presented 20 sets of grouped data in which rural households with per capita net income exceeding RMB5,000 accounted for 52.41%; but the yearbook did not subdivide such households.

Furthermore, the 20 sets of grouped income data by the China Urban Life and Price Yearbook have only been updated until 2010. Moreover, the data were based on per capita income and not on income per household. Since 2012, the China Yearbook of Rural Household Survey and China Urban Life and Price Yearbook merged into the China Yearbook of Household Survey.

In 2012, the China Yearbook of Household Survey issued 20 new sets of grouped income data on rural residents. The grouping of annual household income, which previously ran from “less than RMB100” to “more than RMB5,000,” was extended to run from “less than RMB2,000” to “more than RMB20,000.” The yearbook also published 20 sets of grouped income data on urban residents, and these data were based on the income per household. Thus, the new grouped data provided a more detailed income distribution of the high-income group in rural areas. In addition, the grouped data for urban households by the China Yearbook of Household Survey provided more information than the Chinese Statistical Yearbook. Thus, the data can be used to more accurately fit the income distribution of rural and urban residents. However, the latest China Yearbook of Household Survey, 2013, only provided data for 2005, 2009, 2010, 2011, and 2012. As opposed to Hu et al. [8], this study analyzes urban and rural income distribution using income per household rather than per capita income per household.

We compared the 20 sets of grouped income data of rural residents between the China Yearbook of Household Survey and Chinese Statistical Yearbook and found that the former were given up to two decimal places, while the latter were given one. However, after fitting the same distribution function, the parameters obtained were not the same. Thus, on account of the former providing fuller data, grouped data by the China Yearbook of Household Survey were used.

2.2. Eight Distribution Functions

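This paper adopts eight distribution functions; their probability density functions for income $x > 0$ are as follows, where $\Gamma(\cdot)$ is the gamma function and $B(p,q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$ is the beta function.
(1) Log-normal distribution is $f(x;\mu,\sigma) = \dfrac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\dfrac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right]$
(2) Gamma distribution is $f(x;b,p) = \dfrac{x^{p-1} e^{-x/b}}{b^{p}\,\Gamma(p)}$
(3) Log-logistic distribution is $f(x;a,b) = \dfrac{a x^{a-1}}{b^{a}\left[1+(x/b)^{a}\right]^{2}}$
(4) Weibull distribution is $f(x;a,b) = \dfrac{a x^{a-1}}{b^{a}} \exp\left[-(x/b)^{a}\right]$
(5) Singh-Maddala distribution is $f(x;a,b,q) = \dfrac{a q x^{a-1}}{b^{a}\left[1+(x/b)^{a}\right]^{q+1}}$
(6) Dagum distribution is $f(x;a,b,p) = \dfrac{a p x^{ap-1}}{b^{ap}\left[1+(x/b)^{a}\right]^{p+1}}$
(7) Beta distribution of the second kind (B2) is $f(x;b,p,q) = \dfrac{x^{p-1}}{b^{p} B(p,q)\left[1+x/b\right]^{p+q}}$
(8) Generalized beta distribution of the second kind (GB2) is $f(x;a,b,p,q) = \dfrac{a x^{ap-1}}{b^{ap} B(p,q)\left[1+(x/b)^{a}\right]^{p+q}}$

The log-normal, gamma, log-logistic, and Weibull distributions are two-parameter distribution functions; the two parameters are scale and shape. This paper focuses on the parameters and interrelationships of multiple parameter distributions. $b$ is a scale parameter in GB2 and $a$, $p$, and $q$ are shape parameters. In SM, $b$ is a scale parameter, and $a$ and $q$ are shape parameters. $q$ only affects the right end and $a$ affects both. When $p = 1$, GB2 yields the SM distribution. In the Dagum distribution, $b$ is a scale parameter and $a$ and $p$ are shape parameters. When $q = 1$, GB2 becomes the Dagum distribution. In the B2 distribution, $b$ is a scale parameter, and $p$ and $q$ are shape parameters. When $a = 1$, GB2 yields the B2 distribution.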
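The nesting relationships among the multiple parameter distributions can be checked numerically. The following is a minimal sketch, assuming the standard GB2 parametrization with scale $b$ and shapes $a$, $p$, $q$ (as in McDonald [42]); the function names `gb2_pdf`, `sm_pdf`, `dagum_pdf`, and `b2_pdf` and the parameter values are illustrative and not taken from the paper.

```python
import math


def gb2_pdf(x, a, b, p, q):
    """GB2 density, evaluated on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_f = (math.log(a) + (a * p - 1) * math.log(x) - a * p * math.log(b)
             - log_beta - (p + q) * math.log1p((x / b) ** a))
    return math.exp(log_f)


def sm_pdf(x, a, b, q):
    """Singh-Maddala density: the GB2 special case p = 1."""
    return gb2_pdf(x, a, b, 1.0, q)


def dagum_pdf(x, a, b, p):
    """Dagum density: the GB2 special case q = 1."""
    return gb2_pdf(x, a, b, p, 1.0)


def b2_pdf(x, b, p, q):
    """B2 density: the GB2 special case a = 1."""
    return gb2_pdf(x, 1.0, b, p, q)


# Illustrative (hypothetical) values: income 30,000 with scale 40,000.
x, a, b, q = 30000.0, 2.5, 40000.0, 1.5
sm_closed_form = a * q * x ** (a - 1) / (b ** a * (1 + (x / b) ** a) ** (q + 1))
print(sm_pdf(x, a, b, q), sm_closed_form)
```

The two printed values should agree up to floating-point rounding, which illustrates that setting $p = 1$ in GB2 reproduces the Singh-Maddala density.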

Zhang et al. [32] compared the Lomax distribution with that of Pareto. More precisely, the Lomax distribution applies to income distributions with a significantly wide income gap (the Gini coefficient is at least more than 0.5) [43]. Pareto [33] believed that all income distributions obeyed the Pareto distribution. However, empirical studies state that the Pareto distribution was not a good fit for the distribution of overall income groups but only applied to the income distribution of 1–3% of the highest income group [54]. Therefore, this study adopts B2 and not these two types of distribution. Similarly, Chotikapanich et al. [6] and Hu et al. [8] adopted B2 to analyze the income distribution of urban and rural residents and obtained a good fitting effect. However, Hu [9] posited that GB2 enjoyed the best fitting effect. Thus, this study not only compares the goodness of fit of B2 to that of the other seven distribution functions but also empirically examines whether GB2 is a good fit for the income distribution of urban and rural residents.

2.3. Parameter Estimation of Distribution Functions

The two most popular methods for estimating the parameters of distribution functions are maximum likelihood estimation (MLE) and moment method estimation (MME). To apply MME, we must have complete individual data; however, since the yearbook only provides the maximum and minimum of the grouped income rather than the average, MLE is more appropriate than MME, in line with Chotikapanich et al. [44] and Hu et al. [8].

To find the maximum of the likelihood function, one may use the first order condition; that is, we set the partial derivatives to zero and solve. This is feasible if the likelihood function is simple; however, the likelihood functions we encounter are oftentimes too complex to be differentiated analytically. Even if the partial derivatives can be calculated, the separate problem of solving for the zeros may still be too formidable a task. This issue has been addressed by some research for particular distributions, but the problem considered in this paper is more complex.

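This study shows that the higher the number of parameters, the greater the difficulty of estimation. Here, taking GB2 as an example (see formula (8)), the log-likelihood function with $n$ random samples $x_1, \ldots, x_n$ is expressed as
$$\ell(a,b,p,q) = n\ln a + (ap-1)\sum_{i=1}^{n}\ln x_i - nap\ln b - n\ln B(p,q) - (p+q)\sum_{i=1}^{n}\ln\left[1+\left(\frac{x_i}{b}\right)^{a}\right],$$
where $\partial\ell/\partial a$, $\partial\ell/\partial b$, $\partial\ell/\partial p$, and $\partial\ell/\partial q$ are the partial derivatives, and thus
$$\frac{\partial\ell}{\partial a} = \frac{n}{a} + p\sum_{i=1}^{n}\ln\frac{x_i}{b} - (p+q)\sum_{i=1}^{n}\frac{(x_i/b)^{a}\ln(x_i/b)}{1+(x_i/b)^{a}} = 0,$$
$$\frac{\partial\ell}{\partial b} = -\frac{nap}{b} + \frac{a(p+q)}{b}\sum_{i=1}^{n}\frac{(x_i/b)^{a}}{1+(x_i/b)^{a}} = 0,$$
$$\frac{\partial\ell}{\partial p} = a\sum_{i=1}^{n}\ln\frac{x_i}{b} + n\left[\psi(p+q)-\psi(p)\right] - \sum_{i=1}^{n}\ln\left[1+\left(\frac{x_i}{b}\right)^{a}\right] = 0,$$
$$\frac{\partial\ell}{\partial q} = n\left[\psi(p+q)-\psi(q)\right] - \sum_{i=1}^{n}\ln\left[1+\left(\frac{x_i}{b}\right)^{a}\right] = 0,$$
where $\psi$ is the derivative of the gamma function’s natural logarithm and can be obtained as
$$\psi(z) = \frac{d\ln\Gamma(z)}{dz} = \frac{\Gamma'(z)}{\Gamma(z)}.$$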

Formulae (13) and (15) can be calculated as

By substituting formulae (15) in formula (10), and can be expressed by including and . and are then used in formulae (12) and (13) and and are estimated using the Newton-Raphson iteration.

The above method is based on the assumption that we have complete individual data. The data of the present research is grouped data, so the use of this approach would not be appropriate.
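For grouped data, a common alternative is the multinomial likelihood built from group frequencies. The expression below is a sketch under that assumption, since the text does not spell out the exact form used; the notation is illustrative: $n_k$ is the number of households in income group $k$, $l_k$ and $u_k$ are the lower and upper bounds of that group, $K$ is the number of groups, and $F(\cdot;\theta)$ is the cumulative distribution function of the candidate distribution with parameter vector $\theta$.

```latex
\ell(\theta) = \sum_{k=1}^{K} n_k \ln\bigl[ F(u_k;\theta) - F(l_k;\theta) \bigr] + \text{const},
\qquad F(l_1;\theta) = 0, \quad F(u_K;\theta) = 1 .
```

Only the group bounds and the group counts enter this expression, which is why it remains usable when the yearbook reports the limits of each income group but no individual records.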

To approach the technical issue of finding the parameters which maximize the likelihood function, we have several options. The most direct approach is that after defining the likelihood function, we use available software to find the parameters that maximize the value of this function. Most software such as MATLAB has a built-in command to implement this search. Our numerical experiments suggest that this method is quite efficient and robust when we have no more than 3 parameters; however, with more parameters, the likelihood function could be very irregular. To the authors’ knowledge, most software uses a recursive method to search for the maximum value, which depends heavily on the initial point. On the other hand, since the 4-parameter distributions are so complicated, it is often impossible for us to choose a starting point that is “reasonably” close to the true maximum. If we fail to have a good initial point to start with, we may end up with two results: the software may fail to find a point before it runs out of the maximum number of iterations, or we obtain a local maximum instead of the global maximum.

To avoid this, we utilized the grid algorithm to find the maximum of the likelihood function. This method does not require a good starting point and has proven to be quite efficient in our analysis. The errors between the theoretical distribution and empirical distribution, which are measured by the chi-square value, are rather small, and Table 1 shows that the results are quite acceptable. It should also be mentioned that, in household income analysis, it is most important to correctly estimate the distribution of the incomes in the lowest and the highest group; for example, setting the poverty line is based on the distribution of the lowest income group while the personal taxation system design follows the distribution of the highest income group.

The basic idea of parameter estimation is explained by considering an example with a three-parameter distribution function whose parameters are denoted by $\theta_1$, $\theta_2$, and $\theta_3$. First, using the relevant literature, the value range of each parameter $\theta_i$ is estimated as $[l_i, u_i]$; thus, the extreme value point is in the cuboid $[l_1, u_1] \times [l_2, u_2] \times [l_3, u_3]$. Next, we divide the value range of each parameter into 10 parts; that is, the cuboid is divided into 1000 small cuboids. For each small cuboid, we calculate the average of the likelihood function at its eight vertices. Then, we identify the small cuboid with the maximum average value of the likelihood function. Provided that the likelihood function is smooth enough, we have reason to believe that the extreme value point is in this small cuboid. We continue the process by further dividing the small cuboid into smaller ones and repeating the previous steps to find the smaller cuboid with the maximum average value of the likelihood function. This procedure is repeated until the dimensions of the cuboid reach our accuracy requirement. Finally, we take the center of the final small cuboid as the extreme value point.
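The grid procedure described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the names `grid_search`, `sm_cdf`, and `grouped_loglik` are hypothetical, the Singh-Maddala distribution is chosen only because its distribution function has a closed form, the grouped (multinomial) log-likelihood is an assumed objective, and the group bounds, parameter ranges, and counts are synthetic values generated for the demonstration rather than yearbook data.

```python
import itertools
import math


def grid_search(f, bounds, parts=10, rel_tol=1e-6):
    """Maximize f over a box by repeatedly zooming into the best small cuboid."""
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    width0 = [h - l for l, h in zip(lo, hi)]
    dim = len(bounds)
    corners = list(itertools.product((0, 1), repeat=dim))
    while any(h - l > rel_tol * w for l, h, w in zip(lo, hi, width0)):
        step = [(h - l) / parts for l, h in zip(lo, hi)]
        # Evaluate f once per grid node; neighbouring cuboids share nodes.
        node = {
            idx: f(*[l + i * s for l, i, s in zip(lo, idx, step)])
            for idx in itertools.product(range(parts + 1), repeat=dim)
        }
        best_idx, best_avg = None, -math.inf
        for idx in itertools.product(range(parts), repeat=dim):
            # Average of f over the 2**dim vertices of this small cuboid.
            avg = sum(node[tuple(i + c for i, c in zip(idx, corner))]
                      for corner in corners) / len(corners)
            if avg > best_avg:
                best_idx, best_avg = idx, avg
        new_lo = [l + i * s for l, i, s in zip(lo, best_idx, step)]
        hi = [l + s for l, s in zip(new_lo, step)]
        lo = new_lo
    # Report the centre of the final small cuboid.
    return [(l + h) / 2 for l, h in zip(lo, hi)]


def sm_cdf(x, a, b, q):
    """Singh-Maddala CDF: F(x) = 1 - [1 + (x/b)^a]^(-q)."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 1.0 - (1.0 + (x / b) ** a) ** (-q)


def grouped_loglik(edges, counts, a, b, q):
    """Multinomial log-likelihood of group counts (assumed objective)."""
    ll = 0.0
    for lower, upper, n in zip(edges[:-1], edges[1:], counts):
        prob = sm_cdf(upper, a, b, q) - sm_cdf(lower, a, b, q)
        ll += n * math.log(max(prob, 1e-300))  # guard against log(0)
    return ll


# Synthetic demo: expected counts under known, hypothetical SM parameters.
edges = [0.0, 10000.0, 20000.0, 30000.0, 40000.0, 60000.0, 80000.0,
         120000.0, math.inf]
true_params = (2.5, 40000.0, 1.5)
counts = [10000 * (sm_cdf(u, *true_params) - sm_cdf(l, *true_params))
          for l, u in zip(edges[:-1], edges[1:])]
estimate = grid_search(
    lambda a, b, q: grouped_loglik(edges, counts, a, b, q),
    [(1.0, 5.0), (10000.0, 80000.0), (0.5, 4.0)],
)
print(estimate)
```

Each refinement level costs $11^3 = 1331$ likelihood evaluations because the vertex values are cached and shared between neighbouring cuboids. Note that once a small cuboid is selected, the search never leaves it, so the result depends on the smoothness assumption stated in the text.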

Using the grid method, we estimate the parameters of the distribution function. We judge the goodness of fit using the chi-square (chi2) value. The chi2 test is commonly used to test the degree of fit: the domain is divided into several intervals, and the theoretical and actual observed values in each interval are compared to check whether the error is higher than the threshold value, which determines the acceptance or rejection of the distribution function. We select the chi2 test because the data are grouped. As mentioned above, when fitting the actual income distribution with the distribution function, we often do not get a good enough fitting effect on the two ends of the distribution. To test the fitting effect of different distribution functions on the two ends, we compare the actual share of population in the two ends with the theoretical one, calculated using the distribution function. The smaller the absolute value of the difference, the better the fitting effect. To the best of the authors’ knowledge, no similar approach has yet been found in the literature. Because this research focuses on the distribution of low- and high-income groups, this method is more appropriate to study income disparity.
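The two fit measures described above can be written compactly. The sketch below assumes the usual Pearson form of the chi2 statistic and takes the "two ends" to be the first and last income groups; the function names `chi_square` and `tail_errors` and the numbers in the example are illustrative, not values from Table 1.

```python
def chi_square(observed, probs):
    """Pearson chi-square between observed group counts and fitted probabilities."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, probs))


def tail_errors(observed, probs):
    """Absolute gap between actual and theoretical population shares
    in the lowest and the highest income group."""
    total = sum(observed)
    low = abs(observed[0] / total - probs[0])
    high = abs(observed[-1] / total - probs[-1])
    return low, high


# Hypothetical example: four income groups and fitted group probabilities.
observed = [12, 18, 30, 40]
probs = [0.1, 0.2, 0.3, 0.4]
print(chi_square(observed, probs))
print(tail_errors(observed, probs))
```

In this toy example the expected counts are 10, 20, 30, and 40, so the statistic is $4/10 + 4/20 = 0.6$, the lowest group is off by 0.02 in population share, and the highest group matches.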

3. Actual Fitting Effect of the Eight Distribution Functions

Using the China Yearbook of Household Survey, 2013, data, we calculate the parameters and goodness of fit of the eight distribution functions for the income distribution of rural and urban residents in China (Table 1).

Table 1 shows that the fitting effect of the three-parameter distribution functions is better than that of the two-parameter distribution functions, and GB2 with four parameters has the best fitting effect. Among the two-parameter distribution functions, the fitting effect of the log-logistic distribution is better than that of the others: in most years, its chi2 value and its fitting errors at the two ends are smaller than those of the other two-parameter distributions. However, some scholars prefer the log-normal distribution; for instance, Souma [55] examined resident income in Japan and indicated that the log-normal distribution is a universal structure of income distribution. By contrast, Battistin et al. [56] stated that income includes permanent income and a small amount of random income and that while the former obeys the log-normal distribution, the latter does not always do so. To this effect, the log-normal distribution fails to effectively describe income distribution.

To understand the differences among the log-normal distribution, the log-logistic distribution, and the actual income distribution, we adopt survey data on about 12,000 urban households in the Sichuan Province for 2008 and compare the differences between the theoretical and actual values for the two distribution functions mentioned above.

In Figure 1, the $x$-axis denotes the income quantile and the $y$-axis is the difference between the theoretical and actual value (estimation error). If the difference is close to 0, the fitting effect is good. On the left is the log-normal distribution and on the right is the log-logistic distribution.

Figure 1 demonstrates that the differences between the theoretical and actual values fitted by the log-normal distribution are concentrated in the high-income group, and the negative differences show that the log-normal distribution function underestimates the income of this group, which is similar to the findings of Zhang et al. [32]. They believed that the fast convergence rate of the ends of the log-normal distribution often led to the underestimation of the high-income group. By contrast, the fitting effect of the log-logistic distribution is better; that is, the difference between the theoretical and actual values is small, which shows that there is no significant underestimation of income in the high-income group.

Among the three-parameter distributions, the fitting effect of the Dagum distribution on the low-income group is better than that of the SM distribution; however, its fitting effect on the high-income group is often not as good as that of the SM distribution. For the urban residents, the Dagum distribution has a better fitting effect than the SM distribution, whereas the opposite holds true for the rural residents. Overall, these two distributions are comparable. There is no major difference in the fitting effect between the B2 distribution function and the SM and Dagum distributions when B2 fits the income distribution of the urban residents; however, B2 is more effective than the latter two for rural residents.

As for GB2, its chi2 value and fitting errors at the two ends are smaller than those of the other distributions, which is similar to the conclusions of McDonald [42] and Zhang et al. [32]. As noted above, the GB2 distribution has four parameters, of which three are shape parameters. Therefore, GB2 has a stronger ability to adjust and control shape than the two- and three-parameter distribution functions. Thus, GB2 enjoys the best fitting effect on the income distribution of rural and urban residents among the eight distribution functions. Next, we compare the Gini coefficient calculated using different distribution functions. Here, we take the example of urban residents’ income distribution in 2012.

Table 2 shows that the Gini coefficient using GB2 is 0.3151, which is regarded as the standard because GB2 has the best fitting effect. In comparison, we find that the Gini coefficient estimated by the gamma distribution is the largest (14% higher than the standard) and that by the Weibull distribution is the smallest (7% lower than the standard). This shows that the choice of distribution function has a significant influence on the calculation of the Gini coefficient.

4. Income Distribution and Gini Coefficient of Residents in China

The Gini coefficient of national residents can be calculated using the above results. Based on the fitted parameters of the income distributions and the ratio of urban to rural population from the China Statistical Yearbook, random numbers are generated using the Monte Carlo method. These random numbers represent the actual income distribution of national residents and are used to estimate the Gini coefficient of residents in China. In this section, we explore the Gini coefficients of residents in China.

4.1. Income Distribution of Residents in China

We use the GB2 distribution function and the rural to urban population ratio from the China Statistical Yearbook to obtain the income distribution of rural, urban, and national residents between 2005 and 2012 (Figure 2).

In Figure 2, the income coordinates for each year are consistent. It is evident that, year after year, the income distribution becomes increasingly dense and the peaks shift toward the upper right. Because the shape of the national income distribution changes, the fitting effects of the eight distribution functions become noticeably weaker (Table 3). In terms of the chi2 value, GB2 has the best fitting effect among the eight functions. Among the two-parameter distribution functions, the log-logistic and Weibull distributions have a better fitting effect than the others, while among the three-parameter distribution functions, B2 has the best fitting effect. We also find that the fitting effect of the three-parameter distribution functions is superior to that of the two-parameter functions. Next, we calculate the Gini coefficient of residents in China from 2005 to 2012.

4.2. Gini Coefficient of Chinese Residents

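In line with studies on GB2 [42], the Gini coefficient can be derived directly in closed form. The expression is built from the beta function evaluated at the GB2 parameters and from the generalized hypergeometric function, which is defined as a power series whose coefficients are ratios of rising factorials of its arguments.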

Since calculating the Gini coefficient analytically can be complicated, we use the distribution function with the known parameters to generate random numbers using the Monte Carlo method. The random numbers are drawn from the urban and rural distributions according to the ratio of urban to national population (Table 4), after which the Gini coefficient is calculated using $G = \frac{1}{2n^{2}\mu}\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert y_i - y_j\rvert$, where $n$ is the total number of households, $\mu$ is the average income, and $y_i$ and $y_j$, respectively, represent the $i$th and $j$th family income. Each random number obeying a specific distribution can be viewed as the family income data. Further experiments show that, as the number of simulations increases, the result gradually converges to a specific value. Finally, after 100,000 simulations, we obtain the income Gini coefficients of urban, rural, and national residents for 2005–2012 (Table 4).
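The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: GB2 variates are drawn through the standard beta transformation (if U follows Beta(p, q), then b(U/(1−U))^(1/a) follows GB2(a, b, p, q)), the pairwise mean-difference formula is evaluated through its equivalent sorted form, and the parameter tuples `URBAN_PARAMS` and `RURAL_PARAMS` as well as the urban share are hypothetical placeholders rather than fitted values.

```python
import random

def sample_gb2(a, b, p, q, rng):
    """Draw one GB2(a, b, p, q) variate via the beta transformation."""
    u = rng.betavariate(p, q)
    return b * (u / (1.0 - u)) ** (1.0 / a)

def gini(incomes):
    """Gini coefficient, equal to sum_ij |y_i - y_j| / (2 n^2 mu).

    Uses the equivalent sorted form sum_i (2i - n - 1) y_(i) / (n * sum(y)),
    which avoids the O(n^2) double sum.
    """
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    weighted = sum((2 * i - n - 1) * y for i, y in enumerate(ys, start=1))
    return weighted / (n * total)

def simulate_national_gini(urban_params, rural_params, urban_share, n, seed=0):
    """Mix urban and rural draws by population share and return the Gini."""
    rng = random.Random(seed)
    incomes = []
    for _ in range(n):
        params = urban_params if rng.random() < urban_share else rural_params
        incomes.append(sample_gb2(*params, rng))
    return gini(incomes)

# Hypothetical (a, b, p, q) values, for illustration only.
URBAN_PARAMS = (3.0, 25000.0, 1.0, 1.2)
RURAL_PARAMS = (2.5, 7000.0, 1.0, 1.5)

for n in (1000, 10000, 100000):
    print(n, simulate_national_gini(URBAN_PARAMS, RURAL_PARAMS, 0.5, n))
```

Printing the estimate for increasing sample sizes gives a rough check of the convergence behaviour mentioned in the text.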

Table 4 indicates that the Gini coefficient of urban and national residents has been declining from 2005 to 2012, while that of rural residents is on the rise. Compared to the Gini coefficient by NBS, our results are marginally lower but show the same trend.

The Gini coefficient for 2003–2012 by NBS was calculated using new standards, new caliber, and old data for the following reasons: (1) to determine the indicators of disposable income after the integrated household survey system was conducted, (2) to resolve the problem of rich households’ income being estimated lower than actual income, and (3) to adjust the classification of rural migrant workers. As for (1), the indicator was set in accordance with the National Economic Accounting, 2008, by the UN Statistical Commission and the Canberra Group Handbook of Household Income Statistics (2011) by the UN Economic Commission for Europe. More precisely, the index of the per capita net income of rural residents is adjusted to the per capita disposable income; that is, social security expenditure is deducted from and interest is added to the income of rural migrant workers. Moreover, standardizing the caliber of per capita disposable income means further deducting all transfer spending (excluding social security expenditure, income tax, and property expenditure, mainly mortgage interest) and adding net rental and in-kind income. Next, to solve the issue discussed in (2), NBS studied the calibration methods adopted in domestic research. By a comparison with personal income tax, NBS calculated the weights of high-income samples, adjustment coefficient, and per capita income for the smooth calibration of the high-income group. Finally, to adjust the classification mentioned in (3), migrant workers employed for more than half a year are reclassified from the current rural population to the urban population to maintain consistency with the demographic classification. Next, the weights of the grouped data from the urban and rural household survey are merged in the light of the urban to national population ratio (Source: “the director of household survey office in the bureau of statistics provided the Gini coefficient of resident income,” China News Service, 19:42, Feb. 1, 2013 (http://news.sohu.com/20130201/n365335330.shtml)).

From the above, we believe that two factors make the Gini coefficient by NBS marginally higher than our estimation. First, NBS adjusted the income distribution of the high-income group: it tried various internationally standard calibration methods, selected the method with detailed data sources and the maximum value of calibration, and finally relied on individual income tax data. Second, according to the existing survey, the average income of migrant workers is much higher than that of rural residents. From a statistical viewpoint, reclassifying migrant workers who move to urban areas for employment as urban residents therefore widens the rural-urban income gap. Undoubtedly, the income gap between rural and urban areas plays an important role in the income disparity of residents in China. Therefore, adjusting the statistical standard also leads to a marginally higher Gini coefficient.

NBS emphasizes that the adjusted result is internationally comparable; however, we notice that the household survey in China adopts a method that differs widely from that of other countries. The Chinese respondents are required to record their spending throughout the year, while in many other countries, a survey is administered to the respondents on income and expenditure on a weekly, biweekly, or monthly basis. The data are then multiplied by 52, 26, or 12 to obtain annual data [57]. By contrast, the data averaged over the year in China's annual surveys reduce the volatility of income and expenditure. For example, a sudden income decrease in one month offsets an unexpected increase in another. Compared with other countries, China’s Gini coefficient can therefore be underestimated because of the difference in statistical method [58]. We use monthly survey data of urban residents in Anhui Province to measure the monthly and annual Gini coefficient for 2008, which shows that the annual value (112%) is lower than the monthly one (116%). Thus, from the perspective of international comparison, China’s Gini coefficient has been underestimated because of the use of different statistical methods. If the annual statistical data by NBS are accurate, the current Gini coefficient is underestimated. Correspondingly, based on the household survey data in Anhui Province, the Gini coefficient of Chinese residents may exceed 0.52. Owing to the detailed data source and calibration method with a maximum calibration quantity, the Gini coefficient of Chinese residents is less than 0.6, even when the underestimation of annual statistical data is considered.
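The smoothing effect of annual records can be illustrated with a toy simulation. The sketch below is not based on the Anhui data: it assumes, purely for illustration, that each household has a log-normal permanent income and independent log-normal monthly shocks, and it compares the Gini coefficient of a single month scaled by 12 with that of the twelve-month total. All parameter values are hypothetical.

```python
import random

def gini(incomes):
    """Gini coefficient via the sorted form of the mean-difference formula."""
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    weighted = sum((2 * i - n - 1) * y for i, y in enumerate(ys, start=1))
    return weighted / (n * total)

def compare_survey_designs(n_households=2000, sigma_perm=0.5,
                           sigma_month=0.5, seed=0):
    """Return (Gini of one month x 12, Gini of the 12-month total)."""
    rng = random.Random(seed)
    scaled_month, annual = [], []
    for _ in range(n_households):
        # Hypothetical income process: permanent level times monthly shock.
        permanent = rng.lognormvariate(0.0, sigma_perm)
        months = [permanent * rng.lognormvariate(0.0, sigma_month)
                  for _ in range(12)]
        scaled_month.append(12.0 * months[0])  # one-month survey, annualized
        annual.append(sum(months))             # full-year record
    return gini(scaled_month), gini(annual)

g_month, g_annual = compare_survey_designs()
print("annualized single month:", g_month)
print("twelve-month total:     ", g_annual)
```

In this toy setting the twelve-month total averages out the transitory shocks, so its Gini coefficient is lower than that of the annualized single month, which is the direction of bias discussed above.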

Hu [9] adopted a similar method to calculate the Gini coefficient of Chinese residents for 1985–2009. We compared the estimates for 2005 and 2009 and found that, for the same years, his parameters of the GB2 distribution function are not similar to ours when fitting the income of rural and urban residents. The significant differences are in the estimation of income distribution parameters for rural residents. Although the data structures are not identical, they are all derived from the urban and rural household survey by NBS. When we plug in his parameter estimates, we reproduce his results, so the discrepancy stems from the parameter estimates themselves. In view of this, his parameters are only a local optimum and not the global optimum, which leads to a large difference in the intrarural Gini coefficient. While Hu [9] calculated the intrarural Gini coefficient as 0.4183 for 2005 and 0.6738 for 2009, our estimations are 0.3718 and 0.3812. We question Hu’s result because the same Gini coefficient estimations by the Department of Employment and Income Distribution at the National Development and Reform Commission are 0.38 and 0.39 (see “Yearly Report on Resident Income in China” (page 254, 2010), edited by the Department of Employment and Income Distribution at the National Development and Reform Commission, Economic Science Press). Owing to the overestimated Gini coefficient for rural residents, that of national residents in 2009 by Hu [9] is higher than 0.54. Hu et al. [8] also used NBS data and distribution functions for 1985–2008 to estimate the Gini coefficient for urban, rural, and national residents and derived an intrarural Gini of only 0.3370. In addition, we compare the intrarural, intraurban, and national Gini by Hu et al. [8] and Chotikapanich et al. [6] and identify a major difference among them. While the Gini coefficient of national residents by Hu [9] is marginally high, that by Chotikapanich et al. [6] is low; Chotikapanich et al. estimated a value of only 0.2827 for 1985. Thus, choosing different distribution functions can produce prominent calculation errors. In particular, even the same distribution function can produce a significant deviation if the estimated parameters are only a local optimum.

4.3. Influence of Urbanization on the Income Gap of Residents in China

According to Chen [12], urbanization can diminish the income gap in China. We examine 2012 data to verify the influence of urbanization on China’s income gap. Supposing that the income distribution of rural and urban residents is fixed in 2012 but there are variations in urban to national population ratio ranging from 0% to 100%, the Gini coefficient will change from 0.3808 to 0.3150.
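This thought experiment can be sketched by holding two simulated income samples fixed and reweighting them as the urban share varies. The code below is a minimal illustration, assuming GB2 samples drawn through the beta transformation and a weighted Gini computed from the Lorenz curve; the parameter tuples `URBAN` and `RURAL` are hypothetical and do not reproduce the 2012 estimates.

```python
import random

def sample_gb2(a, b, p, q, rng):
    """Draw one GB2(a, b, p, q) variate via the beta transformation."""
    u = rng.betavariate(p, q)
    return b * (u / (1.0 - u)) ** (1.0 / a)

def weighted_gini(values, weights):
    """Gini = 1 - sum_i w_i (L_{i-1} + L_i), with L the cumulative income share."""
    pairs = sorted(zip(values, weights))
    total_w = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    cum_income = 0.0
    area = 0.0
    for y, w in pairs:
        prev = cum_income
        cum_income += y * w
        area += (w / total_w) * (prev + cum_income) / total_income
    return 1.0 - area

def gini_by_urban_share(urban, rural, share):
    """National Gini when a fraction `share` of the population is urban."""
    values = urban + rural
    weights = ([share / len(urban)] * len(urban)
               + [(1.0 - share) / len(rural)] * len(rural))
    return weighted_gini(values, weights)

# Hypothetical (a, b, p, q) values, for illustration only.
URBAN = (3.0, 25000.0, 1.0, 1.2)
RURAL = (2.5, 7000.0, 1.0, 1.5)

rng = random.Random(0)
urban = [sample_gb2(*URBAN, rng) for _ in range(5000)]
rural = [sample_gb2(*RURAL, rng) for _ in range(5000)]

for pct in range(0, 101, 10):
    print(pct, gini_by_urban_share(urban, rural, pct / 100.0))
```

At a share of 0% the result reduces to the rural-only Gini and at 100% to the urban-only Gini, which matches the endpoints described in the text.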

Using the decomposition method [59–61], the national Gini coefficient ($G$) can be decomposed into four parts: intrarural Gini ($G_r$), intraurban Gini ($G_u$), the Gini coefficient between the urban and rural area ($G_b$), and the cross term of resident income in urban and rural areas ($R$): $G = G_b + \theta_r G_r + \theta_u G_u + R$, where $\theta_u$ and $\theta_r$ denote the share of urban and rural population multiplied by the income proportion of the urban and rural areas, that is, $\theta_k = p_k s_k$ ($k = u, r$), with $p_k$ the population share and $s_k$ the income share. The decomposed national Gini includes four segments: $G_b$, $G_r$, $G_u$, and $R$. Their relevant coefficients are 1, $\theta_r$, $\theta_u$, and 1, respectively. This formula clearly shows the contributions of each part to the national Gini coefficient.
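The four-part decomposition can be sketched numerically. In the minimal example below, the between-group term is computed by assigning every person the mean income of their group, each within-group Gini is weighted by population share times income share, and the cross term is obtained as the remainder. The function names and the toy income lists are ours and serve only as an illustration.

```python
def gini(incomes):
    """Gini coefficient via the sorted form of the mean-difference formula."""
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    weighted = sum((2 * i - n - 1) * y for i, y in enumerate(ys, start=1))
    return weighted / (n * total)

def decompose_gini(urban, rural):
    """Split the national Gini into between, within (urban, rural), and cross terms."""
    everyone = urban + rural
    n = len(everyone)
    total = sum(everyone)
    within = {}
    for name, group in (("urban", urban), ("rural", rural)):
        # Weight = population share * income share of the group.
        theta = (len(group) / n) * (sum(group) / total)
        within[name] = theta * gini(group)
    mean_u = sum(urban) / len(urban)
    mean_r = sum(rural) / len(rural)
    # Between-group Gini: everyone receives the mean income of their group.
    between = gini([mean_u] * len(urban) + [mean_r] * len(rural))
    overall = gini(everyone)
    cross = overall - between - within["urban"] - within["rural"]
    return {"overall": overall, "between": between,
            "within_urban": within["urban"], "within_rural": within["rural"],
            "cross": cross}

# Toy incomes: the two groups overlap, so the cross term is positive.
print(decompose_gini(urban=[2.0, 5.0], rural=[1.0, 4.0]))
# Toy incomes: no overlap between groups, so the cross term vanishes.
print(decompose_gini(urban=[3.0, 4.0], rural=[1.0, 2.0]))
```

The second toy case illustrates why the cross term is tied to the overlap of the two income distributions: when every urban income exceeds every rural income, the remainder is zero.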

The cross term emanates from the income overlap of urban and rural residents. Although previous studies say little about its influence, it was often referred to as the residual or cross term of income, also termed the “income concentration zone” by Bhattacharya and Mahalanobis [62]. In addition, Mookherjee and Shorrocks [63] stated that it was a frustrating interaction term that was impossible to calculate accurately. Lambert and Aronson [61] argued that the residual was produced by the calculations both within and between groups and reflected the degree of overlap of the income distributions of different groups, which was similar to Cowell’s [64] view. Lambert and Decoster [65] gave a mathematical expression for the cross term in terms of the income distribution functions of urban and rural residents and the average income of residents across the country.

Using formulae (18) and (19), we calculate the changes in the national Gini coefficient and in its four components because of urbanization (Figure 3; the horizontal axis shows the urbanization ratio).

In Figure 3, the upper right panel shows the changes in the cross term with urbanization. Because its effect is extremely small, its changes are difficult to identify in the bottom left and bottom right panels. The bottom left panel shows the contributions of several factors to national income disparity: income inequality in rural areas (the curve that steadily declines to the bottom right), income inequality in urban areas (the curve that steadily rises to the upper right), the income difference between urban and rural areas (the inverted U-shaped curve), and the cross term (the inverted U-shaped curve close to the axis). The bottom right panel combines the upper left and bottom left panels.

From Figure 3, we conclude that urbanization can help narrow the income disparity of residents in China. With urbanization, the contribution of the cross term and those of income disparities in rural areas and between rural and urban areas to national income inequality will decline continuously, while the contribution of income disparity in urban areas will rise. To conduct an in-depth analysis of the influence of the factors mentioned above on the national Gini coefficient, we calculate the contribution rate of each factor to the national Gini coefficient (Figure 4).

Figure 4 demonstrates that at the beginning of urbanization the contribution rate of the intrarural Gini coefficient is the largest; thereafter, the income disparity between rural and urban areas makes the largest contribution. Once urbanization has been achieved, the intraurban Gini coefficient plays a more important role than the others. With the 2012 income distributions of urban and rural residents held fixed, when the urbanization rate reaches 15.3%, the intrarural Gini coefficient and the Gini coefficient between urban and rural areas make the same contribution to the overall Gini coefficient. At the 2012 urbanization rate, the Gini coefficient between urban and rural areas accounts for 58% of the overall Gini. If the average income ratio of urban and rural residents remains unchanged, the contribution rate of the Gini coefficient between rural and urban areas to the national one reaches 61.6% when the urbanization rate is 38%. If the urbanization rate is 68.3%, the intraurban Gini coefficient and the Gini coefficient between urban and rural areas make the same contribution to the national Gini coefficient. Currently, the income inequality between the rural and urban areas is a key factor influencing the income disparity of residents in China. Thus, accelerating urbanization can play a major role in diminishing the income inequality of residents in China.

5. Conclusions and Suggestions

Using the 20 sets of grouped rural and urban family data for 2005–2012 from the China Yearbook of Household Survey, this paper adopts eight common distribution functions to fit the income distribution of urban and rural residents. The results show that GB2 has the best fitting effect among the eight distribution functions. In addition, the fitting effect of the three-parameter distribution functions is superior to that of the two-parameter distribution functions. Among the two-parameter distribution functions, the log-logistic distribution function has the best fitting effect. We also compare the fitting effects of the log-logistic and log-normal distributions and explain the reasons for the differences in fitting effect.

By referencing previous research, we describe the income distribution of residents in China. We calculate the Gini coefficient for rural and urban income as well as the overall Gini coefficient. The results show that the national Gini coefficient declined from 2005 to 2012, which also holds true for urban residents. By contrast, the Gini coefficient of rural residents has been rising. In addition, we analyze the Gini coefficient for national residents by NBS and estimate the range of the national Gini coefficient.

To discuss the effects of urbanization on the income gap of residents in China, the parameters of income distribution function for 2012 are kept constant and the urbanization rate is assumed to range from 0% to 100%. We explore the degree of influence that the intrarural Gini, intraurban Gini, and Gini coefficient between rural and urban areas have on the overall Gini coefficient. We conclude that the Gini coefficient between rural and urban areas makes the biggest contribution to the national Gini, which, however, appears to be on a downward trend because of urbanization. We predict that the intraurban Gini coefficient will make increasingly larger contributions and play a decisive role in the future.

In line with the above results, we make the following three suggestions. First, urbanization plays a key role in narrowing the income inequality in China and should thus be promoted. Second, greater importance should be attached to intraurban income disparity. With urbanization, the intraurban income inequality will make a larger contribution to the overall income disparity in China. Finally, we hope that NBS will release fuller and more accurate data on resident income, which would strongly promote research on China’s income inequality.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is funded by the National Natural Science Foundation of China (Grant nos. 71473203 and 11426181), the Innovation Team Project for Basic Research Funding by the Central University (JBK120502), and the Important Basic Theory Research of Southwestern University of Finance and Economics (JBK141102).