Journal of Applied Mathematics

Volume 2015, Article ID 939020, 16 pages

http://dx.doi.org/10.1155/2015/939020

## Chinese Gini Coefficient from 2005 to 2012, Based on 20 Grouped Income Data Sets of Urban and Rural Residents

^{1}School of Public Finance and Taxation, Collaborative Innovation Center for Financial Security, Southwestern University of Finance and Economics, Chengdu 611130, China

^{2}School of Economics, Renmin University of China, Beijing 100086, China

^{3}University of Edinburgh Business School, University of Edinburgh, Edinburgh EH8 9JS, UK

^{4}School of Economics and Mathematics, Southwestern University of Finance and Economics, Chengdu 611130, China

^{5}School of Insurance, Southwestern University of Finance and Economics, Chengdu 611130, China

^{6}Anhui University of Finance and Economics, Bengbu 233030, China

Received 9 November 2014; Revised 26 January 2015; Accepted 26 January 2015

Academic Editor: Jinyun Yuan

Copyright © 2015 Jiandong Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Data insufficiency has become the primary factor limiting research on income disparity in China. To resolve this issue, this paper explores Chinese income distribution and income inequality using distribution functions. First, it examines 20 sets of grouped data on family income between 2005 and 2012 from the *China Yearbook of Household Surveys, 2013*, and compares the fitting effects of eight distribution functions. The results show that the generalized beta distribution of the second kind fits the income distribution of urban and rural residents in China well. Next, these results are used to calculate the Chinese Gini ratio, which is then compared with the findings of relevant studies. Finally, this paper discusses the influence of urbanization on income inequality in China and suggests that accelerating urbanization can play an important role in narrowing the income gap of Chinese residents.

#### 1. Introduction

Several conflicts exist over the calculation of China’s Gini coefficient. A literature review reveals over 30 different estimations of the Chinese Gini coefficient, all lacking a consensus. The estimations of the Gini coefficient of 1995 best exemplify this situation. While Chen [1] estimated a Gini coefficient of 0.365, in 2002, Chen and Zhou [2] used two different methods and obtained a result of 0.38392 and 0.41914. The latter results were similar to Chen’s [3] and Ravallion and Chen’s [4] 0.4169 and 0.415; however, these similar results were obtained using entirely different methods. Similarly, Xiang [5], Chotikapanich et al. [6], and Xu and Zhang [7] used different methods and derived similar results: 0.3515, 0.3506, and 0.3591, respectively. Hu et al. [8] and Hu [9] adopted the method of distribution function and obtained the values of 0.3761 and 0.3691. Zhao et al. [10] estimated a value of 0.445. The highest value, 0.452, was provided by Khan and Riskin [11], which is 28.9% higher than the 0.3506 by Chotikapanich et al. [6]. Since the National Bureau of Statistics’ survey data on original households has not been made public, major disagreements on the calculation of a Gini coefficient for resident income in China have persisted since the reform and opening up. The Gini coefficient is used as basic statistical data to analyze income inequality. Thus, the discrepancy in estimation methods, combined with data insufficiency, has largely limited research on China’s income inequality [12, 13]. Following is a brief introduction to the existing data sources and this study’s primary data source.

According to the *China Statistical Yearbook* (2012), in 2011, the household survey teams of the National Bureau of Statistics (NBS) conducted a survey on 66,000 urban households and 75,000 rural households in 7,100 villages. Thus, in comparison with other sources, NBS is a far superior source of data on resident income and regions. However, the yearbook has also been criticized. For instance, Khan and Riskin [11] stated that the statistical data in the yearbook were too aggregated to support a careful and deep analysis of income inequality. Similarly, Fang et al. [14] believed that the income disparity within each group had been ignored owing to the aggregated data and that the results were not accurate enough. In addition, disputes exist over the yearbook’s income standard. According to Li and Luo [13], the hidden subsidies of urban residents were much higher than those of rural residents. In other words, the actual income gap between urban and rural residents was larger than that depicted by the yearbook. Sutherland and Yao [15] pointed out that NBS did not fully consider factors such as welfare disparity between urban and rural residents, cost of living in different regions, rapid expansion of urban areas, and the large number of migrant workers. To this effect, various scholars hold different opinions on the income gap between urban and rural areas. While Sicular et al. [16] estimated a small income gap, Li and Luo [17] found a dramatic and widening gap.

The Chinese Household Income Projects (CHIPs) measure income distribution using data from surveys conducted on households in selected provinces and cities for 1988, 1995, 2002, and 2007. The data were gathered by the research group at the Institute of Economic Research of the Chinese Academy of Social Science. Using the adjusted data from the 1988 and 1995 surveys, Khan and Riskin [11] and Zhao et al. [10] analyzed the income distribution of residents in China. Wang [18] used the data from the 1988 and 1995 surveys to examine income mobility. *Research on Income Distribution in China*, edited by Li et al. [19], was based on the 2002 survey, while Shi Li and Sui Yang drew on the 2007 data, and their conclusions attracted much attention. However, the sample size of CHIPs is much smaller than that of NBS; for example, CHIPs 1995 samples included 14,929 households from 19 provinces, while the NBS figure was 10,286 households for the same year.

The China Health and Nutrition Survey (CHNS) is a collaborative project by the Population Research Centre at the University of North Carolina, the National Institute of Nutrition and Food Safety, and the China Disease Prevention and Control Centre. In addition to income, the survey collects data on residents, nutrition, health, adults, children, and communities, among others, which can be used to analyze income distribution. Shi et al. [20] examined CHNS’ 1997 data on rural and urban income and indicated that NBS provided marginally higher income distribution than that in CHNS. Wei [21] analyzed 1993 data for rural areas to explore factors influencing nonagricultural employment and salary. Wang [22] used 1989–1997 data to study income mobility [22] and 1989–2006 data to examine fairness in income opportunities (2012). Moreover, using the 1989–1997 data, Zhang et al. [23] analyzed changes in income distribution, and Zhu and Luo [24] studied the relationships between income inequality, poverty, and economic growth in China based on the same data sources. Between 1989 and 2009, CHNS was conducted eight times using the multistage and grouped sampling method. After 1997, the surveys were conducted in nine provinces and autonomous regions. In 2009, the rural and urban samples included 8,028 persons and 3,456 persons, respectively. During 1997–2009, the sample size of each survey was roughly the same and far lower than that of NBS.

In 2011, the Chinese Household Finance Survey (CHFS) by the Southwestern University of Finance and Economics adopted a hierarchical, three-step sampling design. The sample data covered 2,585 counties and/or cities in 25 provinces and autonomous regions and 8,438 households. However, Shi and Wang [25] questioned the representativeness of the sample.

Other data sources include the China Health and Retirement Longitudinal Survey (CHARLS) and Chinese Family Panel Studies (CFPS). CHARLS is a large-scale project organized by the research center at China Economy of Peking University. It was designed to provide basic data for academic research on China’s aging population as well as to help formulate and improve China’s social security policy. In 2008, a preliminary survey was conducted in both the urban and rural areas of Gansu and Zhejiang Provinces. CFPS, in turn, was designed by the research center of Peking University to trace and gather three levels of data: individual, family, and community. In 2007, two test surveys with 140 households were completed in Beijing, Hebei, and Shanghai. In 2008, exploratory studies were conducted in Beijing, Shanghai, and Guangdong, and in the following year, the instrumental test panel studies were performed in the same three locations.

In sum, the *China Statistics Yearbook* has several issues, while NBS has a large number of samples that have wide coverage and can be traced back to the beginning of China’s reform and opening up. Thus, NBS would be an ideal data source to analyze residents’ income distribution in China. Unfortunately, NBS publishes only grouped household data; in other words, the current grouped data report group means, which neglect income inequality within each group and consequently underestimate income disparity [11]. Moreover, the NBS survey data should be accessible to all so that great strides can be made in studies on income inequality in China.

Without access to NBS’ original data, some studies calculated the Gini coefficient using China’s per capita GDP [26] or per capita NI [27]. Even if per capita GDP and NI were closely related to per capita income, the Gini coefficients calculated from GDP or NI are not comparable with those obtained from income. Kanbur and Zhang [28] calculated the Gini coefficient using provincial per capita consumption and emphasized the limitations of data in measuring income inequality. Some scholars have even tried to break through the data insufficiency bottleneck by, for example, employing new statistical methods to recover missing information. Using the grouped data of the *China Statistics Yearbook*, Wu and Perloff [29] restored the income distribution for all residents for 1985–2001 and used it to measure income inequality in China. Using the same method, Wang [30] studied income distribution in China’s urban and rural areas and Chi et al. [31] analyzed the income distribution in these areas from 1987 to 2004. In short, these studies employed nonparametric estimation methods. However, Zhang et al. [32] showed that the error in nonparametric estimation may be much larger than that in parametric estimation when fitting income distribution. Therefore, to study income inequality in China, it is of great significance to explore a breakthrough in methodology and estimate overall income distribution using NBS’ grouped data.

In addition, it is imperative to examine the income distribution function when estimating overall income distribution using grouped data for resident income. Studies on income distribution functions have a long history. Over a century ago, Pareto [33] proposed the Pareto distribution, which earned him the same standing as Lorenz’s study on income distribution [34]. Gibrat [35] stated that the log-normal distribution could be a good fit for income distribution. However, subsequent research indicated that this distribution would underestimate the income of the high-income group [36]. Distribution functions offer a host of analytical tools for studies on income inequality and have promoted remarkable development in the field. Subsequent studies made significant contributions [37–44].

In China, research on income distribution function began much later. Wang [45] used the Pareto distribution to fit income data for China from 1988 to 1995. Mao et al. [46] adopted gamma distribution to fit the income of China’s urban households from 2005 to 2007. Using per capita disposable income, Wang [47] studied the income distribution of rural residents and concluded that log-normal distribution had the best fitting effect. Duan and Chen [48] suggested that the national and regional per capita income of urban and rural families obeyed the mixed distribution of the Pareto distribution, normal distribution, and exponential distribution. Huang and Liu [49] adopted the nonparametric method to fit China’s income distribution. Using the same method, Wang [30] explored income distribution from 1985 to 2009. Hu et al. [8] introduced different fitting methods and fit the income of rural and urban residents with the Weibull distribution, log-normal distribution, and beta distribution of the second kind (B2). The empirical results showed that B2 enjoyed the best fitting effect, in view of which the Gini coefficient of China’s resident income was calculated. Hu [9] hypothesized that the generalized beta distribution of the second kind (GB2) had the best fitting effect and estimated China’s Gini coefficient from 1985 to 2009. Chen et al. [50] focused on the numerical feature of the distribution function and its application to income inequality. Zhang et al. [32] compared the fitting effects of the different distribution functions and the nonparametric estimation method and showed that a three-parameter distribution function was superior to a two-parameter distribution function, and GB2 with four parameters had the best fitting effect in the income distribution of urban residents in the Anhui Province. 
In addition, they believed that when the distribution is complicated, parametric estimation, with its limited number of parameters, is clearly less capable than nonparametric estimation of adapting to the shape of the distribution. However, an analysis of the smooth unimodal density distribution revealed that the excess flexibility of the nonparametric methods produced redundant information, in other words, “noise” that degraded the fitting effect. Thus, for such densities, the parametric method is superior to the nonparametric one.

Various distribution functions were employed in examining the income distribution to determine the distribution function with the highest goodness of fit. This is due to the dissimilar features of income distribution in various countries and regions during different periods; that is, no distribution function was universal. For example, Tachibanaki et al. [51] employed six commonly used distribution functions to research the income distribution of residents in Japan. McDonald [42] and McDonald and Xu [52] analyzed US household income for 1970, 1975, 1980, and 1985 and compared the fitting effects of 11 distribution functions. McDonald and Mantrala [53] adopted 15 types of distribution functions to analyze US household income for 1970, 1975, 1980, 1985, 1990, and 1995. Hu et al. [8] used grouped data and compared the fitting effects of three distribution functions. Using microeconomic survey data on urban households and three nonparametric methods, Zhang et al. [32] compared nine types of distribution functions to analyze the goodness of fit.

Drawing on the above, this study uses 20 sets of grouped income data on urban and rural residents from the *China Yearbook of Household Survey, 2013*, and eight types of distribution functions to fit the income distribution. Accordingly, we calculate the Gini coefficient in China and its changing tendency. This study makes the following theoretical and practical contributions.

(1) The *China Yearbook of Household Survey, 2013*, has published 20 sets of grouped income data, which are more scientific and accurate than those in its previous publications. The 20 sets of grouped data provide a larger amount of information than the previous seven sets of grouped data and a wider income range of rural residents than the *China Rural Survey Yearbook*. Thus, estimating the income distribution fit using this data source is more reliable.

(2) The paper compares eight distribution functions, of which GB2, B2, Singh-Maddala (SM), and Dagum distribution are multiple parameter distribution functions. In comparison with the two-parameter distribution function, the multiple parameter distribution function has more parameters that affect the function’s shape; thus, it has stronger control over distribution shape and enjoys a better fitting effect. Moreover, to the authors’ knowledge, no study has applied GB2, B2, and Dagum distribution to study the income distribution of urban and rural residents in China. We attach more importance to the fitting effect of GB2 because the goodness of fit is very high when GB2 is adopted to examine the income distribution of overseas residents. To this effect, Zhang et al. [32] show that GB2 has the best fitting effect among nine distribution functions and three nonparametric methods. However, the question remains whether GB2 is a good fit to the income distribution of Chinese rural and urban residents.

(3) When a distribution function is fitted to the actual income distribution, the fit at the two ends of the distribution (the low- and high-income groups) is often not good enough. To this effect, we focus on the goodness of fit at the two ends when comparing the goodness of fit of the different distribution functions.

(4) Against the background of urbanization, we fit different urbanization rates using 2012 data and estimate the contribution rate of the intrarural and intraurban Gini coefficients and the Gini between rural and urban areas to that of the overall residents in China; this can help in framing policies aimed at narrowing the income gap in China.

The remainder of this paper is organized as follows. Section 2 introduces the data and method. Section 3 compares the fitting effects of the eight distribution functions on income distribution. Section 4 explores income distribution, China’s Gini coefficient, and the influence of urbanization on the Gini coefficient. Section 5 provides suggestions and concludes the paper.

#### 2. Data and Method

##### 2.1. Data

As mentioned, the overaggregated statistical data in the *Chinese Statistical Yearbook* and *China Yearbook of Rural Household Survey* hinder careful and in-depth analyses of income inequality. Whereas the *Chinese Statistical Yearbook* provides seven sets of grouped income data for urban residents, the *China Yearbook of Rural Household Survey* offers 20 sets of grouped income data for rural residents; however, the high-income groups are grouped too coarsely. For instance, the *China Yearbook of Rural Household Survey, 2011*, presented 20 sets of grouped data in which rural households with per capita net income exceeding RMB5,000 accounted for 52.41%, but the yearbook did not subdivide such households.

Furthermore, the 20 sets of grouped income data in the *China Urban Life and Price Yearbook* have only been updated until 2010. Moreover, the data were based on per capita income and not on income per household. In 2012, the *China Yearbook of Rural Household Survey* and *China Urban Life and Price Yearbook* were merged into the *China Yearbook of Household Survey*.

In 2012, the *China Yearbook of Household Survey* issued 20 new sets of grouped income data on rural residents. The income brackets, which previously ran from less than RMB100 to more than RMB5,000, were extended to run from less than RMB2,000 to more than RMB20,000. The yearbook also published 20 sets of grouped income data on urban residents, and these data were based on the income per household. Thus, the new grouped data provided a more detailed income distribution of the high-income group in rural areas. In addition, the grouped data for urban households in the *China Yearbook of Household Survey* provided more information than the *Chinese Statistical Yearbook*. Thus, the data can be used to more accurately fit the income distribution of rural and urban residents. However, the latest *China Yearbook of Household Survey, 2013*, only provided data for 2005, 2009, 2010, 2011, and 2012. Unlike Hu et al. [8], this study analyzes urban and rural income distribution using income per household rather than per capita income per household.

We compared the 20 sets of grouped income data of rural residents between the *China Yearbook of Household Survey* and *Chinese Statistical Yearbook* and found that the former were given up to two decimal places, while the latter were given to one. As a result, after fitting the same distribution function to both, the parameters obtained were not the same. Thus, because the former provides fuller data, grouped data from the *China Yearbook of Household Survey* were used.

##### 2.2. Eight Distribution Functions

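This paper adopts eight distribution functions; their probability density functions, for $x > 0$, are as follows.

(1) Log-normal distribution is

$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right]. \tag{1}$$

(2) Gamma distribution is

$$f(x; b, p) = \frac{x^{p-1} e^{-x/b}}{b^{p}\, \Gamma(p)}. \tag{2}$$

(3) Log-logistic distribution is

$$f(x; a, b) = \frac{a x^{a-1}}{b^{a} \left[1 + (x/b)^{a}\right]^{2}}. \tag{3}$$

(4) Weibull distribution is

$$f(x; a, b) = \frac{a x^{a-1}}{b^{a}} \exp\left[-\left(\frac{x}{b}\right)^{a}\right]. \tag{4}$$

(5) Singh-Maddala distribution is

$$f(x; a, b, q) = \frac{a q x^{a-1}}{b^{a} \left[1 + (x/b)^{a}\right]^{1+q}}. \tag{5}$$

(6) Dagum distribution is

$$f(x; a, b, p) = \frac{a p x^{ap-1}}{b^{ap} \left[1 + (x/b)^{a}\right]^{p+1}}. \tag{6}$$

(7) Beta distribution of the second kind (B2) is

$$f(x; b, p, q) = \frac{x^{p-1}}{b^{p}\, B(p, q) \left[1 + x/b\right]^{p+q}}. \tag{7}$$

(8) Generalized beta distribution of the second kind (GB2) is

$$f(x; a, b, p, q) = \frac{a x^{ap-1}}{b^{ap}\, B(p, q) \left[1 + (x/b)^{a}\right]^{p+q}}. \tag{8}$$

Here $\Gamma(\cdot)$ is the gamma function and $B(p, q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$ is the beta function.

The log-normal, gamma, log-logistic, and Weibull distributions are two-parameter distribution functions; the two parameters are scale and shape. This paper focuses on the parameters and interrelationships of multiple parameter distributions. $b$ is a scale parameter in GB2 and $a$, $p$, and $q$ are shape parameters. In SM, $b$ is a scale parameter, and $a$ and $q$ are shape parameters. $q$ only affects the right end and $a$ affects both. When $p = 1$, GB2 yields the SM distribution. In the Dagum distribution, $b$ is a scale parameter and $a$ and $p$ are shape parameters. When $q = 1$, GB2 becomes the Dagum distribution. In the B2 distribution, $b$ is a scale parameter, and $p$ and $q$ are shape parameters. When $a = 1$, GB2 yields the B2 distribution.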
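The nesting relations described above can be checked numerically. The following is a minimal sketch, assuming the standard McDonald parameterization of GB2 with scale $b$ and shapes $a$, $p$, $q$ (the symbol names and the parameter values are assumptions for illustration, not estimates from this paper). It evaluates the GB2 density on the log scale for stability and confirms that fixing $p = 1$ or $q = 1$ reproduces the Singh-Maddala and Dagum densities.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betaln


def gb2_pdf(x, a, b, p, q):
    """GB2 density a x^(ap-1) / (b^(ap) B(p,q) [1 + (x/b)^a]^(p+q)) for x > 0."""
    x = np.asarray(x, dtype=float)
    log_u = a * (np.log(x) - np.log(b))  # log of (x/b)^a
    log_pdf = (np.log(a) + (a * p - 1.0) * np.log(x) - a * p * np.log(b)
               - betaln(p, q) - (p + q) * np.logaddexp(0.0, log_u))
    return np.exp(log_pdf)


def sm_pdf(x, a, b, q):
    """Singh-Maddala density written out directly (GB2 with p = 1)."""
    x = np.asarray(x, dtype=float)
    return a * q * x ** (a - 1) / (b ** a * (1 + (x / b) ** a) ** (1 + q))


def dagum_pdf(x, a, b, p):
    """Dagum density written out directly (GB2 with q = 1)."""
    x = np.asarray(x, dtype=float)
    return a * p * x ** (a * p - 1) / (b ** (a * p) * (1 + (x / b) ** a) ** (p + 1))


# Illustrative parameter values only; they are not taken from the paper.
x = np.linspace(0.1, 10.0, 50)
a, b, p, q = 2.5, 3.0, 1.4, 0.9

print(np.allclose(gb2_pdf(x, a, b, 1.0, q), sm_pdf(x, a, b, q)))
print(np.allclose(gb2_pdf(x, a, b, p, 1.0), dagum_pdf(x, a, b, p)))

# The density should integrate to one over (0, infinity).
total, _ = quad(gb2_pdf, 0.0, np.inf, args=(a, b, p, q))
print(total)
```

Working with the logarithm of the density avoids overflow in $b^{ap}$ and $B(p, q)$ when the shape parameters are large, which matters once the density is evaluated repeatedly inside an optimizer.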

Zhang et al. [32] compared the Lomax distribution with that of Pareto. More precisely, the Lomax distribution applies to income distributions with a significantly wide income gap (the Gini coefficient is at least more than 0.5) [43]. Pareto [33] believed that all income distributions obeyed the Pareto distribution. However, empirical studies state that the Pareto distribution was not a good fit for the distribution of overall income groups but only applied to the income distribution of 1–3% of the highest income group [54]. Therefore, this study adopts B2 and not these two types of distribution. Similarly, Chotikapanich et al. [6] and Hu et al. [8] adopted B2 to analyze the income distribution of urban and rural residents and obtained a good fitting effect. However, Hu [9] posited that GB2 enjoyed the best fitting effect. Thus, this study not only compares the goodness of fit of B2 to that of the other seven distribution functions but also empirically examines whether GB2 is a good fit for the income distribution of urban and rural residents.

##### 2.3. Parameter Estimation of Distribution Functions

The two most popular methods for estimating parameters of distribution functions are maximum likelihood estimation (MLE) and moment method estimation (MME). To apply MME, we must have complete individual data or at least the group averages; however, following Chotikapanich et al. [44] and Hu et al. [8], since the yearbook only provides the maximum and minimum of the grouped income rather than the average, MLE is more appropriate than MME.

To find the maximum of the likelihood function, one may use the first order condition; that is, we set the partial derivatives to zero and solve. This is feasible if the likelihood function is simple; however, the likelihood functions we encounter are oftentimes too complex to be differentiated analytically. Even if the partial derivatives can be calculated, the separate problem of solving for the zeros may still be too formidable a task. This issue has been addressed by some research for particular distributions, but the problem considered in this paper is more complex.

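This study shows that the higher the number of parameters, the greater the difficulty to estimate. Here, taking GB2 as an example (see formula (8)), the log-likelihood function with $n$ random samples $x_1, \ldots, x_n$ is expressed as

$$\ln L = n \ln a + (ap - 1) \sum_{i=1}^{n} \ln x_i - n a p \ln b - n \ln B(p, q) - (p + q) \sum_{i=1}^{n} \ln\left[1 + \left(\frac{x_i}{b}\right)^{a}\right], \tag{9}$$

where the partial derivatives with respect to $a$, $b$, $p$, and $q$ are set to zero, and thus

$$\frac{\partial \ln L}{\partial a} = \frac{n}{a} + p \sum_{i=1}^{n} \ln\frac{x_i}{b} - (p + q) \sum_{i=1}^{n} \frac{(x_i/b)^{a} \ln(x_i/b)}{1 + (x_i/b)^{a}} = 0, \tag{10}$$

$$\frac{\partial \ln L}{\partial b} = -\frac{n a p}{b} + \frac{a (p + q)}{b} \sum_{i=1}^{n} \frac{(x_i/b)^{a}}{1 + (x_i/b)^{a}} = 0, \tag{11}$$

$$\frac{\partial \ln L}{\partial p} = a \sum_{i=1}^{n} \ln\frac{x_i}{b} - n\left[\psi(p) - \psi(p + q)\right] - \sum_{i=1}^{n} \ln\left[1 + \left(\frac{x_i}{b}\right)^{a}\right] = 0, \tag{12}$$

$$\frac{\partial \ln L}{\partial q} = -n\left[\psi(q) - \psi(p + q)\right] - \sum_{i=1}^{n} \ln\left[1 + \left(\frac{x_i}{b}\right)^{a}\right] = 0, \tag{13}$$

where $\psi(\cdot)$ is the derivative of the gamma function’s natural logarithm and can be obtained as

$$\psi(z) = \frac{d \ln \Gamma(z)}{dz} = \frac{\Gamma'(z)}{\Gamma(z)}. \tag{14}$$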

Formulae (13) and (15) can be calculated as

By substituting formulae (15) in formula (10), and can be expressed by including and . and are then used in formulae (12) and (13) and and are estimated using the Newton-Raphson iteration.
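Because the first-order conditions for GB2 are easy to get wrong by hand, it can help to check them against a finite-difference gradient before feeding them to a Newton-Raphson routine. The sketch below assumes the standard GB2 parameterization with parameters $(a, b, p, q)$ and individual observations; the synthetic data, the evaluation point, and all function names are hypothetical and serve only to exercise the formulas.

```python
import numpy as np
from scipy.special import betaln, digamma


def gb2_loglik(theta, x):
    """Individual-data GB2 log-likelihood for theta = (a, b, p, q)."""
    a, b, p, q = theta
    n = x.size
    log_u = a * np.log(x / b)  # log of (x_i/b)^a
    return (n * np.log(a) + (a * p - 1) * np.log(x).sum() - n * a * p * np.log(b)
            - n * betaln(p, q) - (p + q) * np.logaddexp(0.0, log_u).sum())


def gb2_score(theta, x):
    """Analytic partial derivatives of the log-likelihood w.r.t. (a, b, p, q)."""
    a, b, p, q = theta
    n = x.size
    lx = np.log(x / b)
    u = (x / b) ** a
    w = u / (1 + u)
    log1pu = np.log1p(u)
    d_a = n / a + p * lx.sum() - (p + q) * (w * lx).sum()
    d_b = (a / b) * ((p + q) * w.sum() - n * p)
    d_p = a * lx.sum() - n * (digamma(p) - digamma(p + q)) - log1pu.sum()
    d_q = -n * (digamma(q) - digamma(p + q)) - log1pu.sum()
    return np.array([d_a, d_b, d_p, d_q])


def numeric_grad(f, theta, x, h=1e-5):
    """Central finite-difference gradient of f(theta, x) in theta."""
    grad = np.zeros(len(theta))
    for k in range(len(theta)):
        step = np.zeros(len(theta))
        step[k] = h
        grad[k] = (f(theta + step, x) - f(theta - step, x)) / (2 * h)
    return grad


# Synthetic incomes and an arbitrary evaluation point (both hypothetical).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.5, size=200)
theta = np.array([2.0, 3.0, 1.5, 1.2])

gap = np.max(np.abs(gb2_score(theta, x) - numeric_grad(gb2_loglik, theta, x)))
print(gap)  # should be small relative to the size of the gradient
```

The check only validates the derivatives; it says nothing about whether a Newton-Raphson iteration started from a given point will converge to the global maximum.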

The above method is based on the assumption that we have complete individual data. The data of the present research is grouped data, so the use of this approach would not be appropriate.
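For grouped data, where only the bracket bounds and the number (or share) of households in each bracket are known, a common choice is the multinomial likelihood built from differences of the cumulative distribution function at the bracket bounds. The sketch below illustrates that construction for GB2; it is an assumption about the form of the grouped likelihood rather than the authors' exact implementation, and the bracket bounds, parameter values, and counts are hypothetical. It relies on the fact that the GB2 cumulative distribution function equals the regularized incomplete beta function $I_z(p, q)$ with $z = (x/b)^a / [1 + (x/b)^a]$.

```python
import numpy as np
from scipy.special import betainc


def gb2_cdf(x, a, b, p, q):
    """GB2 CDF via the regularized incomplete beta function I_z(p, q)."""
    x = np.asarray(x, dtype=float)
    u = (x / b) ** a
    z = u / (1.0 + u)
    return betainc(p, q, z)


def grouped_loglik(theta, edges, counts):
    """Multinomial log-likelihood (up to a constant) for bracketed incomes.

    edges  : increasing interior bracket bounds; the first bracket is
             (0, edges[0]] and the last is (edges[-1], infinity).
    counts : number of households observed in each of len(edges) + 1 brackets.
    """
    a, b, p, q = theta
    cdf = np.concatenate(([0.0], gb2_cdf(edges, a, b, p, q), [1.0]))
    probs = np.clip(np.diff(cdf), 1e-300, None)  # guard against log(0)
    return float(np.sum(counts * np.log(probs)))


# Hypothetical bracket bounds and parameters, for illustration only.
edges = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
true_theta = (2.0, 20.0, 1.2, 1.5)
cdf_true = np.concatenate(([0.0], gb2_cdf(edges, *true_theta), [1.0]))
counts = 10_000 * np.diff(cdf_true)  # expected counts under true_theta

ll_true = grouped_loglik(true_theta, edges, counts)
ll_other = grouped_loglik((2.0, 25.0, 1.2, 1.5), edges, counts)
print(ll_true > ll_other)
```

Because the counts here are the exact expected counts under `true_theta`, the likelihood is necessarily highest at those parameters, which makes the example a convenient sanity check before real yearbook brackets are substituted.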

To approach the technical issue of finding the parameters which maximize the likelihood function, we have several options. The most direct approach is that after defining the likelihood function, we use available software to find the parameters that maximize the value of this function. Most software such as MATLAB has a built-in command to implement this search. Our numerical experiments suggest that this method is quite efficient and robust when we have no more than 3 parameters; however, with more parameters, the likelihood function could be very irregular. To the authors’ knowledge, most software uses an iterative method to search for the maximum value, which depends heavily on the initial point. On the other hand, since the 4-parameter distributions are so complicated, it is often impossible for us to choose a starting point that is “reasonably” close to the true maximum. If we fail to have a good initial point to start with, we may end up with one of two outcomes: the software may fail to find a point before it reaches the maximum number of iterations, or we obtain a local maximum instead of the global maximum.

To avoid this, we utilized the grid algorithm to find the maximum of the likelihood function. This method does not require us to give a good starting point and has proved to be quite efficient in our analysis. The errors between the theoretical distribution and empirical distribution, which are measured by the chi-square value, are rather small. From Table 1, we see that the result is quite acceptable. It should also be mentioned that, in household income analysis, it is most important to correctly estimate the distribution of the incomes in the lowest and the highest group; for example, setting the poverty line is based on the distribution of the lowest income group while the personal taxation system design follows the distribution of the highest income group.
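The paper does not spell out its grid algorithm, so the following is only a minimal sketch of the general idea: evaluate the grouped log-likelihood at every point of a parameter grid, keep the best point, and then measure the fit with a chi-square statistic. To keep the example short it uses the two-parameter log-logistic distribution, whose cumulative distribution function has a closed form; the bracket bounds, grids, and counts are hypothetical.

```python
import itertools

import numpy as np


def loglogistic_cdf(x, a, b):
    """Log-logistic CDF with shape a and scale b, for x > 0."""
    return 1.0 / (1.0 + (np.asarray(x, dtype=float) / b) ** (-a))


def bracket_probs(edges, a, b):
    """Probability of each bracket: (0, e0], (e0, e1], ..., (e_last, inf)."""
    cdf = np.concatenate(([0.0], loglogistic_cdf(edges, a, b), [1.0]))
    return np.clip(np.diff(cdf), 1e-300, None)


def grid_search(edges, counts, a_grid, b_grid):
    """Exhaustively search the grid for the largest grouped log-likelihood."""
    best_ll, best_theta = -np.inf, None
    for a, b in itertools.product(a_grid, b_grid):
        ll = np.sum(counts * np.log(bracket_probs(edges, a, b)))
        if ll > best_ll:
            best_ll, best_theta = ll, (a, b)
    return best_theta, best_ll


def chi_square(edges, counts, a, b):
    """Pearson chi-square distance between observed and fitted counts."""
    expected = counts.sum() * bracket_probs(edges, a, b)
    return float(np.sum((counts - expected) ** 2 / expected))


# Hypothetical brackets and counts generated from known parameters.
edges = np.array([5.0, 10.0, 15.0, 20.0, 30.0, 50.0])
counts = 10_000 * bracket_probs(edges, 2.5, 15.0)

a_grid = np.linspace(1.0, 4.0, 31)   # step 0.1
b_grid = np.linspace(5.0, 25.0, 41)  # step 0.5
(a_hat, b_hat), ll_hat = grid_search(edges, counts, a_grid, b_grid)
print(a_hat, b_hat, chi_square(edges, counts, a_hat, b_hat))
```

An exhaustive grid needs no starting point, which is the property the text emphasizes, but its cost grows exponentially with the number of parameters. For a four-parameter family such as GB2, a coarse grid followed by refinement around the best cell is one way to keep the number of evaluations manageable.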