Abstract

Zhaoqing City is located in Guangdong Province in southern China. The total administrative area of the city is 14,891 square kilometers. According to China’s seventh population census in 2020, the permanent resident population of Zhaoqing City is 4,413,594. Zhaoqing is also one of the cities in the Guangdong-Hong Kong-Macao Greater Bay Area. House price analysis and prediction for Zhaoqing City can guide the local government in formulating policies, consumers in investing in or purchasing homes, and enterprises in judging house price trends and making business decisions. Using machine learning and statistical theory, this paper studies house prices in Zhaoqing City from 2010 to 2020 and constructs a house price prediction model with several variables, including GDP, proportion of tertiary industry, income of urban residents, fiscal revenue, land price, investment in real estate development, permanent resident population, population density, net migration, and proportion of urban population. First, correlation analysis is used to select the variables that are highly correlated with house prices, based on their correlation coefficients. Then, a prediction model is constructed by multiple linear regression on the selected variables. Finally, the model is adjusted step by step using variables selected at different correlation thresholds, to improve both fitting and prediction and to select the best prediction model. With this model, the house prices of Zhaoqing City in 2021 and beyond can be predicted, with good fitting and prediction performance.

1. Introduction

1.1. China’s Real Estate Development

The real estate industry is one of the pillar industries of China’s economy and has a significant impact on the national economy. The real estate industry can directly drive the growth of gross domestic product and contribute to China’s fiscal revenue. At the same time, the real estate industry also drives the development of upstream and downstream industries such as building materials, machinery manufacturing, finance, and home appliances.

1.1.1. Development of China’s Residential Real Estate

The growth of the real estate business can improve people’s lives. Land and real estate have always been valued in China. People need a fixed dwelling, so they are eager to buy property. Housing is not simply a symbol of home but is often also regarded as a requirement for marriage. As a result, some people work for a lifetime to buy a home. Real estate development can better meet people’s material needs and improve their quality of life.

Since the founding of the People’s Republic of China, real estate has developed gradually. Prior to reform and opening up, China had a planned economy and welfare housing distribution. A building typically contained several single rooms, and welfare housing in those days had a high population density and a small dwelling area. China’s reform and opening-up era began in 1978. Since then, apartments and villas have replaced tile-roofed dwellings and bungalows. From 1980 to 1991, China’s housing conditions improved, as did housing quality and facilities, and welfare housing was steadily supplanted by commercial housing for personal use. In 1992 and 1993, China’s real estate market overheated. From 1993 to 1997, China lowered property prices, and the 1997 economic crisis lowered them further. Since 1998, China’s economy and real estate market have both grown rapidly, and property values have soared. At the same time, China’s housing market has changed: environmentally friendly housing has gradually become the preferred choice of Chinese youth, increasing their sense of contentment.

1.1.2. Development of China’s Real Estate Investment

Real estate investment includes land and property development, property management, and property purchase. To realize the return on investment, it must mobilize many social resources, including capital, land, materials, labor, technology, and information. China’s real estate investment can be divided into an early stage (1978-1998) and a growth stage (1999-present).

In 1978, the Eleventh Central Committee of the Communist Party of China decided to implement reform and opening up, which boosted the development of the real estate business. First, China’s exploration of a virtuous circle of housing funds modified the national policy of unified building and distribution, introduced public housing sales and rent reform, and opened up the banks’ real estate credit operations. Second, China adopted policies such as public-private collaborative construction, the establishment of housing management cooperatives, and the rental or sale of land with compensation, which increased real estate development. From 1992 to 1998, there was irrational real estate speculation, and property values soared. In response, the state changed its regulations and created the Housing Provident Fund System. This system strengthened the real estate investment system and laid the groundwork for the growth of policy-based real estate investment under a market-oriented system. From 1999 to 2015, diversified capital chains, new regulations, and real estate securitization improved real estate investment. At the same time, land availability increased while demand soared, resulting in price-raising behaviors such as real estate speculation.

“Houses are for living in, not for speculation,” stated a Chinese economic conference in December 2016. To accelerate the creation of a long-term mechanism suited to the market laws of the real estate industry, the Chinese government has used prudent financial, land, fiscal, taxation, investment, and legislative measures to contain the real estate bubble and keep prices stable. The ratio of sale prices to rents is excessively high in China’s first-tier cities. Due to high housing costs and the pressure of life in first-tier cities, many new residents are unable to afford a home. Due to population decline, house values in second- and third-tier cities tend to be steady or falling.

2. Brief Introduction of Zhaoqing’s Real Estate Market

House prices in China have risen over the last 20 years and have a large impact on people’s lives. The disparity in house prices between Chinese cities is large. Guangdong Province, located in the south of China and home to cities including Guangzhou, Foshan, Zhaoqing, Shenzhen, Dongguan, Huizhou, Zhuhai, Zhongshan, and Jiangmen, is an example. The first-tier cities, represented by Shenzhen and Guangzhou, have very high house prices that rank among the highest in the world. According to CBRE’s Global Living Report 2020, Shenzhen is in fifth place globally with an average house price of $738 per square foot.

Although Zhaoqing belongs to the same province as Shenzhen and Guangzhou, its house prices are lower than the provincial average. As shown in Figure 1, Zhaoqing is located in the central and western part of Guangdong Province. The whole territory lies between 22°47′ and 24°24′ N latitude and between 111°21′ and 112°52′ E longitude, with a subtropical monsoon climate, and the city is one of the important node cities in the Guangdong-Hong Kong-Macao Greater Bay Area. Zhaoqing City has jurisdiction over Duanzhou District, Dinghu District, Gaoyao District, etc., and has economic functional areas such as Zhaoqing High-tech Zone. The total area of the city’s administrative region is 14,891 square kilometers [1]. According to the 7th China Census 2020, the resident population of Zhaoqing is 4,413,594. The Guangdong-Hong Kong-Macao Greater Bay Area is located in the south of China and includes Hong Kong, Macau, and the cities of Guangzhou, Shenzhen, Foshan, Dongguan, Zhuhai, Jiangmen, Zhaoqing, Huizhou, and Zhongshan in Guangdong Province, with a total area of 56,000 km² and a total population of about 71,159,800.

Figure 2, plotted from National Bureau of Statistics data, shows that from 2010 to 2020 the house price in Zhaoqing was lower than the average house price in China and significantly lower than the average house price in Guangdong Province.

The house price trend in Zhaoqing is a microcosm of those Chinese cities with a lower economic level but greater potential for economic development. According to the China City Statistical Yearbook for the relevant years, from 2010 to 2020 house prices in Zhaoqing rose and then fell, while total residential sales kept increasing. Figure 3 shows the trend of residential sales and house prices in Zhaoqing from 2010 to 2020.

The residential sales price per square meter in Zhaoqing is influenced by many factors. This research uses correlation analysis and linear regression models, together with political, social, and locational factors, to establish a prediction model for Zhaoqing house prices based on the relationship between house prices and data on several influencing factors. Such a prediction model can guide the government in making policies, consumers in investing in or purchasing residential properties, enterprises in judging house price trends, and upstream and downstream manufacturers in making inventory and business decisions.

The house price prediction model in Zhaoqing is a guide for the Zhaoqing government to introduce policies to control house prices. If house prices are too high and growing too fast, the government can take a series of measures to control them; if house prices continue to fall, the local economy is likely to be negatively affected, and the government can prepare for a reduction in real estate tax revenue. At the same time, the house price prediction model can make the government aware of the gap between house prices and residents’ income, so that it can vigorously develop the economy to close this gap and improve residents’ quality of life. Stable house prices can make Zhaoqing’s economy run more smoothly.

In addition, home price prediction models guide consumers in purchasing or investing in homes. The house price prediction model has implications for consumers, who can determine whether house prices will rise or fall in the future and the magnitude of the change in house prices, etc., and calculate the rate of return on investment in real estate. Consumers can also determine whether it is the right time to buy a home as an immediate need and whether the market is at a lower price at this time, and residents can decide whether to buy a local home in that year based on this model.

From the industrial perspective, for companies, advance judgment of house price trends and house price forecasts can bring some value, and upstream and downstream manufacturers can prepare inventory and make business decisions. Companies can judge rental costs based on the house price forecast model, and investment companies can judge future earnings based on this model. Industries that are highly correlated with real estate prices need to determine real estate prices and rents in advance, such as manufacturing and retail, and the house price prediction model enables companies to make better decisions.

3. Literature Review

There are many classic theories affecting house prices. This paper introduces location theory, land rent and land price theory, and land return decline theory.

The neoclassical economist Alfred Marshall (2009) put forward the location theory to analyze the factors such as economy, society, politics, climate, and location, which reflects the correlation degree of the mutual operation relationship of location subjects in spatial location. Comprehensive location factors have a certain impact on house prices, and location theory can better analyze house prices [2]. Petty put forward the theory of land rent and land price. Then, there is the emergence of Western neoclassical urban land rent theory, which analyzes the spatial distribution characteristics of urban land prices. This theory can better analyze and compare the house prices in different regions of the city [1]. Malthus summarized the theory of diminishing land returns and analyzed the view that the annual output increased according to the proportion of farming progress must gradually and invariably decrease compared with the previous average increase. This theory plays a guiding role in real estate investment and development [3].

In recent years, with the development of statistics and machine learning, many researchers have carried out quantitative analysis of the variables that affect housing prices and have used it to predict housing prices. Some researchers have used the same multivariable linear regression model as this paper to predict housing prices in different cities, with good results. Other researchers have used machine learning and data mining models, including the ANN model, SVM, ARIMA model, Neural Network, C&R Tree, Multilayer Perceptron model, GM (1,1) model, and Hedonic model, to predict housing prices in some cities in the Guangdong-Hong Kong-Macao Greater Bay Area of China, also with good results.

In recent years, scholars have used linear regression models to predict housing prices in different cities around the world. Ghosalkar and Dhage selected the data of three factors affecting house prices, including physical conditions, concepts, and locations. Without any expectation of market price and cost increment, they used linear regression to predict house prices in Mumbai, India [4]. Kaushal et al. compared the accuracy of some machine learning models, such as multiple linear regression, Lasso, ridge, and decision tree regression models, to find out which one has the best performance. The results showed that the multiple linear regression model selected in this paper has the highest accuracy compared with other machine learning models [5]. Amri and Tularam established multiple regression, linear regression, and nonlinear models to compare their effects on predicting house prices in Bathurst, Australia. The results show that the improved linear MR performed almost as well as the nonlinear NN. In most situations, linear approaches appear to be equally as exact as the more time-consuming nonlinear methods for accounting for variances and variation [6].

In addition, some scholars have used other machine learning models to predict housing prices in cities in the Guangdong-Hong Kong-Macao Greater Bay Area, such as Hong Kong, Macau, Shenzhen, and Guangzhou. Abidoye et al. collected variables affecting house prices in Hong Kong, such as interest rate, unemployment rate, and family size, and fitted them with ARIMA, ANN, and SVM models to generate forecasts of real estate prices [7]. Fong and Wah gathered multiattribute datasets from the Macao SAR Government’s Statistics and Census Service and used several data mining tools and algorithms, such as SVM, Neural Network, C&R Tree, Weka, SPSS, and the Multilayer Perceptron model, to forecast Macao home prices [8]. Hongbin constructed ARIMA and GM (1,1) models of house prices in Shenzhen, compared their prediction accuracy, and judged which model is better at predicting the future trend of house prices [9]. Bilei used the Hedonic model to analyze the relationship between honorary titles and house prices in Guangzhou communities, and the model test results were good [10].

Previous scholars have used linear regression and other machine learning models to analyze and predict housing prices in many cities. However, there is currently no research on housing price prediction for Zhaoqing City in the Guangdong-Hong Kong-Macao Greater Bay Area of China. Based on correlation analysis and multiple linear regression analysis, this study provides a methodology and a model for predicting housing prices in Zhaoqing. The model can guide the Zhaoqing government in formulating policies, consumers in investing in or purchasing houses, and enterprises in judging housing price trends.

4. Research Hypothesis and Analysis of Factors Affecting the House Price of Zhaoqing

4.1. Research Hypothesis

We hypothesize that collecting the data of multiple factors affecting the house price in Zhaoqing can predict the house price. Moreover, the house price of Zhaoqing can be expressed as a formula by multiple influencing factors.

The factors affecting the house price in Zhaoqing are divided into quantifiable and nonquantifiable factors. Quantifiable factors, such as GDP and population size, can be converted into data that visually reflect the variable and form a formula to predict the house price. The nonquantifiable factors, such as policy factors, which can also be used to predict the house price in Zhaoqing, need to be considered outside the model.

4.2. Quantifiable Factors

The following factors that affect house prices can be found in the relevant data, and the researcher can quantify the factor. Based on the data below, a model is developed using correlation and multiple regression methods that can be used to predict future house prices in Zhaoqing City.

4.2.1. Economic Factors

(1) Economic Development Level (GDP) and Fiscal Revenue. GDP is the final result of the production activities of all resident units in a country or region in a certain period of time [11]. The GDP of Zhaoqing, i.e., the value of production activities in Zhaoqing in a certain period of time, can be used to measure the level of economic development of Zhaoqing. Fiscal revenue is the sum of all funds raised by the government to carry out its functions, implement public policies, and provide public goods and services as needed [12]. Zhaoqing’s GDP influences Zhaoqing real estate prices. The higher Zhaoqing’s GDP, the more developed the local economy is, and the more valuable Zhaoqing real estate is. Price is determined by value, so the more valuable a house in Zhaoqing, the higher its price. Higher fiscal revenue indicates higher government income through taxes and other means, representing a more dynamic local economy with more business activity.

(2) Industrial Structure. Industrial structure is a concept proposed in development economics. Industrial structure, also called industrial system, is the main component of the socioeconomic system, including agriculture, industry, and services, i.e., primary, secondary, and tertiary industries [13]. Figure 4 is a graph of the ratio of the output value of primary, secondary, and tertiary industries in GDP of Zhaoqing from 2010 to 2020, based on the data of Statistics Bureau of Guangdong Province.

It can be seen from Figure 4 that the output value share of primary industry in Zhaoqing was the lowest throughout 2010-2020. In 2010-2016, the output value share of tertiary industry in Zhaoqing was lower than that of secondary industry. In 2017-2020, however, the share of tertiary industry was higher than that of secondary industry. These data indicate that the proportion of tertiary industry output value in Zhaoqing has increased in recent years. A higher share of tertiary industry means higher-quality economic development and greater attractiveness to workers from outside the city. High-end service industries can attract talent, and the concentration of population raises the demand for housing. At the same time, a developed tertiary industry provides supporting facilities for housing, such as restaurants, entertainment and recreation, and sports. The secondary industry, on the other hand, has lower wage levels than the tertiary industry, fewer workers, and difficulty in population clustering, which leads to a lower demand for housing. Therefore, the proportions of both secondary and tertiary industries in Zhaoqing are likely to be factors affecting the house price in Zhaoqing.

(3) Income of Urban Residents and Level of Employment of Residents. The income of urban residents is the sum of value created by workers in a city over a certain period of time [14]. The employment level of residents is the number of employed population. The employed population is the population aged 16 years and above who are engaged in certain social labor or business activities and receive labor remuneration or business income [15]. The higher the disposable income of urban residents, the better the local economy, the stronger the willingness of residents to purchase residential housing, and the higher the demand for residential housing. The higher the employment level of residents, the more the local working population, and the stronger the ability of the working population to buy residential houses. Therefore, the income of urban residents and the employment level of residents in Zhaoqing may have some influence on the house price in Zhaoqing.

(4) Land Price and Investment in Real Estate Development. The land price is the price of a piece of land or a tract of land at a certain point in time in a certain state of rights [16]. The price of land is the cost of building a home. The higher the cost of constructing a residence for a developer, the higher the selling price of the house may be accordingly. In addition, the planning of residential land in Zhaoqing City will also affect the price of housing.

Real estate development investment is the amount of investment completed in a certain period of time by real estate development companies, commercial housing construction companies, and other units engaged in real estate development or business activities [17]. If the amount of investment in real estate development in Zhaoqing increases, developers may be investing more in individual properties or building more properties in Zhaoqing, either of which may have an impact on the price of housing in Zhaoqing.

4.2.2. Social Factors

(1) Population (Resident Population, Population Density, and Net Migration of Population). The resident population is defined as the population that is always at home or resides at home for more than 6 months throughout the year and also includes mobile people who live in the city where they are located. Population density is the number of people per unit of land area. In addition, it is an important indicator of the distribution of population in a country or region. The higher the population density, the more concentrated the population is. The more concentrated the population is, the more productive the place is, the higher the economic level is, and the higher the house price is likely to be. The population migration is the absolute amount of population moving in and out of a certain period and a certain area [18]. Net population migration is to some extent indicative of the local economic situation and the income level of local residents, and this factor may have some impact on house prices.

(2) Urbanization (Proportion of Urban Population). The degree of urbanization is measured by the proportion of urban population. Urbanization refers to the historical process of gradual transformation of a country or region’s society from a traditional rural-type society, which is mainly agricultural, to a modern urban-type society, which is mainly nonagricultural industries such as industry and services, with the development of social productivity, progress in science and technology, and the adjustment of industrial structure [19]. The proportion of urban residents reflects the stage of development of local urbanization. The greater the degree of urbanization, the more developed the economy, and consequently, the more expensive the home.

Table 1 is a list of factors that can be quantified to influence house prices in Zhaoqing. Since data prior to 2010 is more difficult to obtain from public sources, only data on factors affecting house prices in 2010-2020 are used.

4.3. Nonquantifiable Factors

In addition to quantifiable economic and demographic factors, there are other nonquantifiable factors that affect the house prices in Zhaoqing, such as political factors like policies and planning and social factors like transportation and education and location, among others. Since these factors are not easily quantifiable, such factors are not analyzed in depth in the model part of the research, but they can be considered as supplementary elements in the practical operation of house price analysis and forecasting.

4.3.1. Political Factors

(1) Policies of Guangdong-Hong Kong-Macao Greater Bay Area. Benchmarked against the San Francisco Bay Area and New York Bay Area in the United States, the Guangdong-Hong Kong-Macao Greater Bay Area is an important strategy for China to promote in-depth cooperation between the Mainland, Hong Kong, and Macao and is one of the most open and economically vibrant regions in China. Relying on the Guangdong-Hong Kong-Macao Greater Bay Area, Zhaoqing City has certain development potential and space.

Zhaoqing makes full use of the regional and economic advantages of the Guangdong-Hong Kong-Macao Greater Bay Area. Zhaoqing’s economy is therefore expected to continue to grow. Because of Greater Bay Area policies, Zhaoqing is likely to continue attracting workers and investors, and its house prices are likely to be significantly influenced by the Greater Bay Area in the future.

(2) Housing Policy and Urban Planning. The residential purchase policies in the Guangdong-Hong Kong-Macao Greater Bay Area differ from each other. Since Hong Kong and Macau use different real estate market policies than Mainland China, only cities in the Pearl River Delta are selected for residential purchase policy comparison in this research. The Pearl River Delta in Guangdong Province is one of the strongest city clusters in China, consisting of nine cities with high economic levels in Guangdong Province: Guangzhou, Foshan, Zhaoqing, Shenzhen, Dongguan, Huizhou, Zhuhai, Zhongshan, and Jiangmen. Table 2 provides a visual comparison of the residential purchase policies of Zhaoqing City and other cities in the Pearl River Delta.

Among the Pearl River Delta cities compared in Table 2, Zhaoqing is the only one that does not restrict the purchase of residential units by either local or non-local households. This is due to the lower economic level of Zhaoqing compared to other cities in the Guangdong-Hong Kong-Macao Greater Bay Area and the lower demand for purchasing residential properties. At the same time, Zhaoqing’s real estate has considerable potential for development.

In addition, Zhaoqing City has reasonably controlled the scale of residential land in recent years. According to the Urban Master Plan of Zhaoqing New Area, Guangdong (2012-2030) and the Urban Master Plan of Zhaoqing (2015-2030), the planned urban construction land in the new district is 6,046.1 hectares, with 1,740.9 hectares of residential land, accounting for 28.8% of the urban construction land. This provides space for property developers to build new residences.

4.3.2. Social Factors

(1) Culture and Tourism. Zhaoqing has a relatively deep historical and cultural heritage and rich tourism resources. Zhaoqing Star Lake Scenic Area is a National AAAAA level tourist attraction in China [20]. Based on these tourism resources, Zhaoqing can vigorously develop the tourism industry, the elderly care industry, etc., which will bring more economic benefits.

(2) Transportation. Zhaoqing is the only regional center city in the Greater Bay Area bordering the Great Southwest of China and is positioned as a hub gateway connecting the Greater Bay Area to the Great Southwest. In recent years, Zhaoqing has been densifying its high-speed rail and urban rail network, and it is now possible to travel from Zhaoqing to Foshan in 20 minutes, Guangzhou in 40 minutes, Shenzhen in 1 hour, and Hong Kong in 1 hour and 20 minutes [21]. This enhances Zhaoqing’s overall transportation capacity, accelerates the agglomeration of people, logistics, capital, and information, and promotes economic development.

(3) Public Service Facilities and Education. Zhaoqing plans to improve public service facilities such as land for administrative offices, cultural facilities, education and research, sports, medical and health care, and social welfare facilities. In recent years, Zhaoqing’s public service facilities are being gradually improved. In addition, Zhaoqing has a stable education workforce and an overall increase in expenditure costs for education, which basically meets the needs of basic education.

4.3.3. Locational Factors

(1) Radiation Influence of Surrounding Cities. Zhaoqing relies on the Guangdong-Hong Kong-Macao Greater Bay Area, with Guangzhou City and Foshan City of relatively high economic level nearby, forming a pattern of Guangzhou, Foshan, and Zhaoqing as one city. Guangzhou, as the national central city, has a strong regional radiation carrying power. The central city of Foshan and the central city of Zhaoqing, as subcenters, carry the comprehensive service function of leading the comprehensive and balanced growth of the economic circle. In recent years, Zhaoqing has been developing its economy by taking advantage of its resources and using the radiation drive of Guangzhou and Foshan.

(2) Climate. Zhaoqing is located on the south side of the Tropic of Cancer and has a humid climate influenced by the southern subtropical monsoon. With abundant rainfall, abundant sunshine, a mild climate, and an annual average temperature of 22°C, Zhaoqing has a good climate for people to work and live [22]. Because winters in northern China are cold, some northerners purchase homes in the south, which may have an effect on Zhaoqing’s housing prices.

5. Research Method

Pearson correlation analysis and multiple linear regression analysis were chosen for this research. First, the variables highly correlated with house prices in Zhaoqing City in 2010-2018 were identified by correlation analysis and then used in a multiple linear regression model to obtain the goodness of fit. Then, using the fitted multiple linear regression equation, the difference between the predicted and actual house prices for 2019 and 2020 was obtained. Finally, the goodness of fit and the difference between predicted and actual house prices were considered together to assess the prediction effect.

This research first selected all variables with correlation coefficients greater than 0.8 for the years 2010-2018 to build multiple regression model A. To refine the research, the model was then adjusted. In this process, variables with correlation coefficients greater than 0.79 and 0.78 from 2010 to 2018 were selected to build multiple regression models B and C, and variables with correlation coefficients greater than 0.8 and 0.77 from 2010 to 2019 were selected to build multiple regression models D and E. Finally, the most suitable model for predicting Zhaoqing house prices was selected by comparing the goodness of fit and the prediction differences of these models.
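The selection-then-regression procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this research: the data are synthetic placeholders, and the names `select_by_correlation`, `fit_and_evaluate`, `gdp`, `land_price`, and `unrelated` are hypothetical. Variables are kept when their correlation coefficient with price exceeds the threshold on the training years, the regression is fitted on 2010-2018, and the held-out differences are computed for 2019-2020.

```python
import numpy as np

def select_by_correlation(X, y, names, threshold):
    """Keep the names of columns whose Pearson r with y exceeds threshold."""
    return [name for j, name in enumerate(names)
            if np.corrcoef(X[:, j], y)[0, 1] > threshold]

def fit_and_evaluate(X_train, y_train, X_test, y_test):
    """Fit least squares on the training years; return (R^2, test differences)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    fitted = A_train @ coef
    ss_res = np.sum((y_train - fitted) ** 2)
    ss_tot = np.sum((y_train - y_train.mean()) ** 2)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return 1.0 - ss_res / ss_tot, A_test @ coef - y_test

# Synthetic placeholder data for 2010-2020 (NOT the data of this research).
years = np.arange(2010, 2021)
t = (years - 2010).astype(float)
names = ["gdp", "land_price", "unrelated"]
X = np.column_stack([
    100 + 10 * t,                      # steadily growing variable
    50 + 3 * t + np.sin(t),            # growing variable with a small wiggle
    np.where(t % 2 == 0, 1.0, -1.0),   # alternating variable with no trend
])
price = 3000 + 5 * X[:, 0] + 2 * X[:, 1] + 20 * np.cos(2 * t)

train = years <= 2018   # fitting period
test = years >= 2019    # held-out years used to check predictions

selected = select_by_correlation(X[train], price[train], names, threshold=0.8)
cols = [names.index(n) for n in selected]
r2, diff = fit_and_evaluate(X[train][:, cols], price[train],
                            X[test][:, cols], price[test])
print(selected, round(r2, 4), diff)
```

Changing `threshold` or the boundary of the training years reproduces the kind of model variants compared in this section, each yielding its own goodness of fit and held-out differences.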

5.1. Model Introduction

5.1.1. Correlation Coefficient

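This research uses the Pearson correlation coefficient to investigate the relationship between multiple variables and house prices in Zhaoqing.

The population correlation coefficient is defined as the ratio of the covariance between two variables $X$ and $Y$ to the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} \tag{1}$$

Estimating the covariance and standard deviations from the sample gives the sample Pearson correlation coefficient, often denoted by $r$:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2}$$

$r$ can also be estimated from the mean of the products of the standard scores of the sample points, giving an expression equivalent to the above equation:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_X}\right)\left(\frac{Y_i-\bar{Y}}{s_Y}\right) \tag{3}$$

In the equation, $\frac{X_i-\bar{X}}{s_X}$, $\bar{X}$, and $s_X$ are the standard score of sample $X_i$, the sample mean, and the sample standard deviation, and $n$ is the number of samples [23].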
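As a minimal sketch of the sample Pearson correlation coefficient described above, the following standard-library Python function computes $r$ from two equal-length sequences. The function name `pearson_r` and the sample values are illustrative assumptions, not the Zhaoqing data.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative values (not taken from the paper's tables).
income = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(income, price)
```

A value of `r` close to 1 indicates a strong positive linear relationship between the two series.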

5.1.2. Multiple Linear Regression

(1) Model Assumption. The linearity assumption means that the mean of the distribution obeyed by the dependent variable can be expressed as a linear combination of the independent variables and their associated terms.

The assumption of normality is the most fundamental assumption that distinguishes linear models from other models. It means that the dependent variable corresponding to each sample obeys a normal distribution. Combined with the linearity assumption, the linear model can be written in the following form:

$$y_i \sim N\left(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},\ \sigma^2\right) \tag{4}$$

Because the residual of the model is $\varepsilon_i = y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})$, it follows that $\varepsilon_i \sim N(0, \sigma^2)$. The normality assumption actually requires that the residuals of the model corresponding to each sample obey the same normal distribution. If $\varepsilon_i$ does not obey a normal distribution, the assumptions of the linear model are not satisfied.

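(2) Establishment of Model. Multiple linear regression is a model that predicts a quantitative response $Y$ based on multiple predictor variables $X_1, X_2, \ldots, X_p$. It assumes an approximately linear relationship between $Y$ and $X_1, \ldots, X_p$. We can write this linear relationship as follows:

$$Y \approx \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \tag{5}$$

In equation (5), $\beta_0$ expresses the intercept term in the linear model, and $\beta_1, \ldots, \beta_p$ express the slope terms. Using the estimated values $\hat{\beta}_0$ and $\hat{\beta}_1, \ldots, \hat{\beta}_p$ of the model coefficients, it is possible to predict the future quantity $Y$ from specific values $x_1, \ldots, x_p$ of the variables:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p \tag{6}$$

In the equation, $\hat{y}$ expresses the predicted value of $Y$ given $X = x$.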

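(3) Model Solution. The multiple linear regression equation is usually solved in matrix form. With $x_0 = 1$, the weights $\theta$ of the independent variables $x$ with respect to the dependent variable $y$ can be written in linear algebra as follows:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_p x_p = \theta^T x \tag{7}$$

The results from linear regression should satisfy the following relationship with the actual results:

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)} \tag{8}$$

Then, the probability density function of the error can be written as follows:

$$p\left(\varepsilon^{(i)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\left(\varepsilon^{(i)}\right)^2}{2\sigma^2}\right) \tag{9}$$

In addition, the likelihood function is as follows:

$$L(\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right) \tag{10}$$

Maximizing the likelihood is equivalent to minimizing the squared error of the model, which is expressed as

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(\theta^T x^{(i)} - y^{(i)}\right)^2 \tag{11}$$

It can be expressed in matrix form as follows:

$$J(\theta) = \frac{1}{2}(X\theta - y)^T(X\theta - y) \tag{12}$$

Taking the partial derivative of equation (12) with $\theta$ regarded as a column vector, and letting $\frac{\partial J(\theta)}{\partial \theta} = X^T X\theta - X^T y = 0$, the solution is as follows:

$$\theta = \left(X^T X\right)^{-1} X^T y \tag{13}$$

The best $\theta$ was solved above using matrix operations, and the best $\theta$ will be solved by the gradient descent method next, and the results will be analyzed [24].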
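The closed-form least-squares solution obtained from the matrix form of the squared error can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names `solve_linear_system` and `fit_ols` and the toy data are assumptions, and the linear system $X^T X\theta = X^T y$ is solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(rows, y):
    """Return [intercept, slope_1, ..., slope_p] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, noise-free).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
theta = fit_ols(rows, y)  # approximately [1.0, 2.0, 3.0]
```

With only nine or ten yearly observations and up to eight predictors, as in this study, $X^T X$ can be poorly conditioned, so in practice a numerically more stable least-squares routine may be preferable.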

(4) Model Evaluation. The research performed multiple regression analysis on multiple independent variables and the dependent variable, which yields a set of indicators for each multiple regression model. The indicators of the multiple regression model include Multiple R, R Square, Adjusted R Square, Standard Error, Observations, df, SS, MS, Significance F, Intercept Coefficients, and Variable Coefficients. Each metric of the multiple linear regression model is interpreted below.

First, I would like to introduce the metrics of the regression statistics section. Multiple R refers to the multiple correlation coefficient $R$ between the observed and fitted values; because it is the square root of R Square, it lies between 0 and 1, and the closer it is to 1, the stronger the linear relationship. R Square is the coefficient of determination. Adjusted R Square refers to the corrected coefficient of determination: when researchers compare two regression equations with different numbers of independent variables, they must also consider the effect of the number of independent variables included in the equation. The standard error is equal to the square root of the residual SS divided by the residual df. Like the coefficient of determination, it describes how well the regression model fits the actual data, and it represents the typical distance between the actual values and the regression line. Observations refers to the number of data points, that is, the number of sets of independent-variable values used in the regression.

Then, I want to introduce the metrics of the analysis of variance (ANOVA). df refers to the degrees of freedom, that is, the number of values in the sample that are free to vary when a population quantity is estimated from the sample. The total degrees of freedom are equal to the number of observations minus 1. The degrees of freedom of the regression are equal to the number of independent variables (1 for a model with a single predictor), and the residual degrees of freedom are equal to the total degrees of freedom minus the degrees of freedom of the regression. SS is the sum of squares, and MS is the mean square, that is, SS divided by its df. Significance F refers to the $p$ value of the $F$-test. Subtracting this value from 1 gives the confidence level. When the $p$ value is less than 0.05, the confidence level of the regression model is greater than 95%.

The last group of metrics I would like to introduce is the regression parameter table. t Stat is the regression coefficient divided by its standard error. The $t$-test assesses the significance of the relationship between a variable $x$ and the dependent variable $y$. Intercept Coefficients refers to the estimated value of the intercept. Variable Coefficients refers to the estimated values of the slopes. P value is the $p$ value corresponding to the $t$-test [25].

The main metrics used in this paper are R Square, Intercept Coefficients, and Variable Coefficients. After obtaining the individual data of the multiple regression model, we use R Square to assess the goodness of fit of each model and thus select the best prediction model. Then, the formula of this model is given using Intercept Coefficients and Variable Coefficients and is used to predict the future house prices in Zhaoqing City.
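To make the regression-statistics and ANOVA quantities above concrete, the sketch below recomputes them from observed and fitted values. It assumes the fitted values come from an ordinary least-squares fit with an intercept (so that the regression SS equals total SS minus residual SS); the function name `regression_summary` and the small data set are illustrative assumptions.

```python
import math


def regression_summary(y, y_hat, n_predictors):
    """Summary metrics of a least-squares fit with an intercept term."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    ss_reg = ss_total - ss_resid          # valid for OLS with an intercept
    df_reg = n_predictors                 # regression degrees of freedom
    df_resid = n - n_predictors - 1       # residual degrees of freedom
    r2 = 1 - ss_resid / ss_total
    return {
        "multiple_r": math.sqrt(max(r2, 0.0)),
        "r_square": r2,
        "adjusted_r_square": 1 - (1 - r2) * (n - 1) / df_resid,
        "standard_error": math.sqrt(ss_resid / df_resid),
        "f_statistic": (ss_reg / df_reg) / (ss_resid / df_resid),
    }


# Hypothetical data: x = 1..5, y as below; the OLS line is y_hat = 2.2 + 0.6 * x.
y_obs = [2.0, 4.0, 5.0, 4.0, 5.0]
y_fit = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]
summary = regression_summary(y_obs, y_fit, n_predictors=1)
```

The $p$ values (Significance F and the per-coefficient P value) additionally require the F and t distributions and are omitted from this sketch.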

(5) Application Area of Model. Linear regression models have important applications in biology, medicine, management, agriculture, industry, engineering, and technology, in addition to the field of economics.

5.2. Data Analysis
5.2.1. Correlation Analysis

The relationship between house prices ($Y$ variable) and influencing factors ($X$ variables) in Zhaoqing from 2010 to 2018 was calculated using the Pearson correlation coefficient, as shown in Table 3.

Select the variables with $r > 0.8$ from Table 3, as shown in Table 4.
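The threshold-based screening used here can be sketched as follows. The helper names, the variable names `a`, `b`, `c`, and the numbers are hypothetical placeholders rather than the paper's data; only the rule "keep variables whose correlation with house price exceeds a threshold" is taken from the text.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_variables(candidates, target, threshold):
    """Names of candidate variables whose r with the target exceeds threshold."""
    return [
        name
        for name, values in candidates.items()
        if pearson_r(values, target) > threshold
    ]


# Hypothetical yearly series (placeholders, not the Zhaoqing indicators).
house_price = [1, 2, 3, 4, 5]
candidates = {
    "a": [2, 4, 6, 8, 10],   # r = 1.0 with house_price
    "b": [5, 3, 4, 1, 2],    # negatively correlated
    "c": [1, 3, 2, 5, 4],    # r = 0.8 with house_price
}
strict = select_variables(candidates, house_price, 0.8)
looser = select_variables(candidates, house_price, 0.79)
```

Lowering the threshold from 0.8 to 0.79 admits one more variable in this toy example, which mirrors how the alternative models in this study are built from progressively looser thresholds.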

5.2.2. Multiple Regression Analysis

Multiple regression analysis is carried out between the five selected variables and the house price in Zhaoqing, as shown in Table 5.

Obtain .

Substituting the data for 2019 to test the model,

Substituting the data for 2020 to test the model,

Average prediction difference

5.3. Model Debugging Records

In order to select a relatively good model to predict house prices, different models need to be built by selecting different variables and using data from different years. Then, the model for predicting future house prices in Zhaoqing City is selected by comparing $R^2$ and the difference between the actual and predicted values.

5.3.1. Adjust the Model with Data of 2010-2018

In the process of testing the model for 2010-2018, the variables with correlation coefficients greater than 0.8 were first selected to build the model. To adjust the model, we then selected the influential variables with $r$ greater than 0.79 and 0.78 to build two other regression models, respectively, and then tested them for comparison based on the data for the two years 2019 and 2020.

(1) Select $r > 0.79$. In the data from 2010 to 2018, the variables selected with $r > 0.79$ are shown in Table 6.

Multiple regression analysis is carried out between the six selected variables and the house price in Zhaoqing, as shown in Table 7.
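The hold-out test described here, in which a fitted equation is applied to later years and compared with the actual prices, can be sketched as below. The coefficient values, predictor rows, and actual prices are hypothetical placeholders; the function names `predict` and `mean_abs_difference` are also assumptions made for illustration.

```python
def predict(coefs, x):
    """Evaluate intercept + sum(slope_j * x_j) for one year's predictor values."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


def mean_abs_difference(coefs, test_rows, actual):
    """Average absolute gap between predicted and actual values on test years."""
    diffs = [abs(predict(coefs, row) - a) for row, a in zip(test_rows, actual)]
    return sum(diffs) / len(diffs)


# Hypothetical fitted coefficients: [intercept, slope_1, slope_2].
coefs = [100.0, 2.0, 0.5]
# Hypothetical predictor values for two held-out years and the actual prices.
test_rows = [[10.0, 20.0], [12.0, 24.0]]
actual_prices = [128.0, 140.0]
avg_diff = mean_abs_difference(coefs, test_rows, actual_prices)
```

In this toy case the predictions are 130 and 136, so the average absolute prediction difference is 3.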

Obtain .

Substituting the data for 2019 to test the model,

Substituting the data for 2020 to test the model,

Average prediction difference

(2) Select $r > 0.78$. In the data from 2010 to 2018, the variables selected with $r > 0.78$ are shown in Table 8.

Multiple regression analysis is carried out between the seven selected variables and the house price in Zhaoqing, as shown in Table 9.

Obtain .

Substituting the data for 2019 to test the model,

Substituting the data for 2020 to test the model,

Average prediction difference

5.3.2. Adjust the Model with Data of 2010-2019

Because house prices in Zhaoqing rose overall from 2010 to 2018 but fell from 2019 to 2020, a model built only with the data from 2010 to 2018 may show a large deviation between the predicted and actual values. Therefore, in the process of adjusting the model, the data of 2019 was also included in building the prediction model, and the data of 2020 was used as a test.

The relationship between house prices ($Y$ variable) and influencing factors ($X$ variables) in Zhaoqing from 2010 to 2019 was calculated using the Pearson correlation coefficient, as shown in Table 10.

(1) Select $r > 0.8$. In the data from 2010 to 2019, the variables selected with $r > 0.8$ are shown in Table 11.

Multiple regression analysis is carried out between the seven selected variables and the house price in Zhaoqing, as shown in Table 12.

Obtain .

Substituting the data for 2020 to test the model,

Prediction difference

(2) Select $r > 0.77$. In the data from 2010 to 2019, the variables selected with $r > 0.77$ are shown in Table 13.

Multiple regression analysis is carried out between the eight selected variables and the house price in Zhaoqing, as shown in Table 14.

Obtain .

Substituting the data for 2020 to test the model,

Prediction difference

5.4. Select a Better Prediction Model

From the above data analysis, the regression coefficients of the five models , , , , and were obtained as: , , , , and ; the absolute values of the differences between actual and predicted house prices used to test the five models are as follows: , , , , and .

Compare the regression coefficients of the five models: .

Compare the absolute value of the difference between actual and predicted house prices for the five models: .

The larger the coefficient of determination $R^2$, the better the model fit; the smaller the absolute value of the difference between the actual and predicted house prices, the better the prediction. Combining the two criteria, we can choose the relatively better model to predict house prices. Because $R^2$ is above 0.995 for all five models, all of them fit well, so the model with the smallest absolute difference between the actual and predicted house prices is chosen to predict the house price, that is,

where X1 is income of urban residents, X2 is employed persons of residents, X3 is land price, X4 is investment in real estate development, X5 is resident population, X6 is population density, and X7 is proportion of urban population.

In summary, this model was screened as the final model to predict the house prices in Zhaoqing by comparing the values of $R^2$ and the absolute differences between the actual and predicted house prices. The final multiple regression model is then derived from the full data from 2010 to 2020 to predict the house prices in Zhaoqing in 2021 and beyond, as shown in Table 15.
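The two-criterion screening summarized above (a high goodness of fit, then the smallest absolute prediction difference) can be sketched as follows. The model labels `M1` to `M3` and all numbers are hypothetical placeholders and do not reproduce the values of the five models in this paper; the 0.995 cut-off is taken from the text.

```python
def choose_model(results, min_r2=0.995):
    """Pick the model with the smallest absolute prediction difference
    among those whose R^2 is at least min_r2."""
    eligible = {name: m for name, m in results.items() if m["r2"] >= min_r2}
    if not eligible:
        raise ValueError("no model reaches the required goodness of fit")
    return min(eligible, key=lambda name: eligible[name]["abs_diff"])


# Hypothetical results: R^2 and absolute hold-out difference per model.
results = {
    "M1": {"r2": 0.9990, "abs_diff": 850.0},
    "M2": {"r2": 0.9970, "abs_diff": 320.0},
    "M3": {"r2": 0.9900, "abs_diff": 100.0},  # excluded: R^2 below the cut-off
}
best = choose_model(results)
```

Here `M3` has the smallest difference but fails the goodness-of-fit cut-off, so `M2` is selected.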

The model for predicting house prices in Zhaoqing was obtained as follows:

where X1 is income of urban residents, X2 is employed persons of residents, X3 is land price, X4 is investment in real estate development, X5 is resident population, X6 is population density, and X7 is proportion of urban population.

5.5. Application from Research Conclusions to Practice

This model can be applied to real life. For example, we can collect the data on income of urban residents, employed persons of residents, land price, investment in real estate development, resident population, population density, and proportion of urban population in the first half of the year and estimate their values for the whole year from the averages. In this way, we can predict the house price of Zhaoqing at the end of the year through this model.

Governments, industries, and individuals can all apply this model. If home values continue to decrease, the local economy is likely to suffer, and the government may prepare for a drop in real estate tax income. At the same time, the home price prediction model may alert the government to the disparity between house prices and residents’ income, so that it can encourage economic growth to close the gap and enhance residents’ quality of life. Stable property prices can help Zhaoqing’s economy. House price prediction models also help consumers buy or invest in properties. The model can estimate future housing values, the magnitude of price changes, and the rate of return on investment in real estate. Based on this model, consumers may decide whether it is the proper time to buy a home for an immediate need, and locals can decide whether to buy a local home that year.

6. Conclusions and Future Works

6.1. Conclusions

Drawing on machine learning and statistical methods, this paper applies correlation analysis to variables such as GDP, the proportion of secondary industry, the proportion of tertiary industry, urban residents’ income, residents’ employed persons, fiscal revenue, land price, the amount of investment in real estate development, resident population, population density, net migration of population within the province, net migration of population outside the province, and the proportion of urban population. Based on the correlation coefficients, the variables highly correlated with the house price data are selected, and multiple linear regression analysis is performed to build a model that predicts house prices. Based on the available data, we select variables with different correlation thresholds to adjust the model step by step, improving the fitting effect and the prediction accuracy, and select the best prediction model.

Through the research process described above, the model derived in this paper for forecasting house prices in 2021 and beyond is as follows:

In the equation, X1 expresses income of urban residents, X2 expresses employed persons of residents, X3 expresses the land price, X4 expresses the investment in real estate development, X5 expresses resident population, X6 expresses population density, and X7 expresses the proportion of urban population.

However, from a practical forecasting perspective, many influencing variables, such as policies, cannot be quantified, so accurate prediction of house prices remains difficult. Nevertheless, our model can still provide useful guidance.

In the prediction model, the various indicators continued to increase in 2019 and 2020 while house prices declined, which produced a large deviation between the forecast value and the actual value. This deviation is largely related to China’s policy of restricting house prices.

The Chinese government has introduced a series of policies to regulate and limit the growth of house prices, such as preventing the expansion of real estate investment and land use by reforming the land grant, financial management, and investment systems. To prevent government land from being bought at low prices and sold at high prices, and to prevent developers from borrowing heavily from banks, the policies start from the bank credit system: commercial banks raise the requirements for the own capital and qualifications of real estate developers, strengthen the examination of real estate developers when issuing development loans, and restrict loans to construction companies with low own capital and high accounts receivable. The financing of real estate developers is strictly supervised, and their required capital ratio is increased.

In this research, in addition to giving a model that can guide house price forecasting, we also gained the following insights. When predicting house prices in a region, researchers should focus not only on quantifiable model data but also on unquantifiable influencing factors, such as policy and social factors. Especially in countries with more government intervention in the market economy, policy factors have a greater impact on house prices, and this impact may exceed that of economic factors.

6.2. Future Works

From the modeling perspective, the unquantifiable factors in our research could not be put into the house price prediction model. These unquantifiable factors, such as policies, happen to have a great influence on house prices, and including them could improve the prediction accuracy of the model to some extent. Because of their influence, there may be some deviation between the final prediction results and the actual results. From the general trend of house prices in Zhaoqing given in this paper, it is clear that house prices in Zhaoqing reached an inflection point in 2018, turning from an upward trend to a downward trend. This is strongly related to the Chinese government’s policy of “houses are for living and not for speculation” and the policy of limiting the loan ratio for second homes, that is, the down payment ratio should not be less than 40%. These policies restrict some people from buying homes in Zhaoqing, which lowers the demand for and price of Zhaoqing homes. In future research, how to quantify such factors and how to include them in the model to make the forecast results more accurate are good research directions.

The time lag of the policy will also be a good direction for future research. The Chinese government implemented the purchase restriction policy in Beijing in 2010, put forward the concept of “houses are for living and not for speculation” in 2016, and put forward the purchase restriction policy for second homes in 2017, while the house price in Zhaoqing only started to decline in 2019 after an upward trend, which shows that the policy acts on real estate prices with a certain lag.

In addition to the multivariable linear regression model in machine learning used in this research, the logistic regression model and nonlinear regression models in machine learning are also good methods for predicting housing prices.

The essence of logistic regression is to divide the probability of occurrence by the probability of nonoccurrence and then take the logarithm. This simple transformation resolves the mismatch between the bounded value range of a probability and the curvilinear relationship between the dependent variable and the independent variables. The ratio of the probabilities of occurrence and nonoccurrence acts as a buffer that expands the value range, and the subsequent logarithmic transformation changes the scale of the entire dependent variable. Moreover, practical experience shows that this transformation often leads to a linear relationship between the transformed dependent variable and the independent variables. Therefore, logistic regression largely solves the problem of dependent variables that are not continuous. Logistic regression is also widely used because many real-world problems fit its model, for example, whether an event occurs depending on numerical independent variables. Therefore, logistic regression can also be applied to housing price prediction based on various real-world variables [26].
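The transformation described above is the standard logit (log-odds) transformation, which can be written as below. The symbols $p$ (probability of occurrence), $\beta_j$ (coefficients), and $x_j$ (independent variables) are notation assumed here for illustration and are not taken from the original formulas of this paper.

```latex
% Odds: probability of occurrence divided by probability of nonoccurrence.
% The logit maps p in (0, 1) to the whole real line and is modeled linearly.
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

Because the left-hand side ranges over all real numbers while $p$ stays between 0 and 1, the bounded probability can be modeled by an unbounded linear combination of the independent variables.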

The nonlinear model is a mathematical expression that reflects a nonlinear relationship between the independent variables and the dependent variable. Unlike in the linear model, the dependent variable and the independent variables cannot be expressed as a linear correspondence in the coordinate space. Whether a model is linear is mainly determined by the coefficient $w$ multiplying an independent variable $x$. If each $w$ affects only one $x$, then the model is a linear model. For example, in formula (35), $y$ and $x$ have a curvilinear relationship, but the model is linear, because each $x$ is affected by only one coefficient $w$,

while formula (36) is a nonlinear model, because $x$ is affected not only by its own coefficient but also by another parameter. If an independent variable is affected by two or more parameters, then the model is nonlinear.

In many cases, the linear model cannot fit the data well, and forcing a linear regression distorts the data. Such forcing not only changes the normality of the original data but can also change its equal variance and independence [27]. In these cases, nonlinear regression models should be considered, such as neural networks in machine learning, SVR, decision tree regression, and KNN regression. The use of these models to predict housing prices is also a good research direction in the future.
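As one small illustration of the nonlinear alternatives mentioned above, a KNN regression can be sketched in a few lines of standard-library Python. This is not a method evaluated in this paper; the function name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions for illustration only.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


# Toy one-dimensional data following y = x^2 (hypothetical, nonlinear).
train_x = [[1.0], [2.0], [3.0], [10.0]]
train_y = [1.0, 4.0, 9.0, 100.0]
estimate = knn_predict(train_x, train_y, [2.1], k=2)
```

The prediction is the mean target of the two nearest neighbours (the points at 2.0 and 3.0), so no linear form is imposed on the relationship. With predictors on very different scales, such as population and price indices, the inputs would normally be standardized before computing distances.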

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.