Abstract

Social media reviews play an increasingly important role in ranking the influence of urban brands. In this study, a public database of social media reviews is mined and a regression model of urban influence is established. The study first introduces text mining and data collection: comment data are collected from static and dynamic websites, and the ICTCLAS word segmentation tool is used to preprocess them. An algorithm for the urban influence level is then established, and finally a regression model of urban brand influence is built. A database of 10,000 city-related reviews was used in the experiment, and a gender thesaurus was established to further improve the accuracy of the results. The feasibility of the model is verified from two aspects: expectation calculation and standardization. The paper analyzes the composition of citizens who comment on social media and the proportions of different comment contents, and puts forward suggestions for enhancing the city's influence. Finally, it summarizes the popular comment topics in each month from January to December and, from their structure, identifies where efforts to enhance the city's influence should be focused in different periods.

1. Introduction

With the steady development of the global economy, urbanization is also advancing in an orderly manner. By 2022, the total number of large, medium, and small cities had reached 725 [1], among which those with a population of more than 2 million are classified as megacities. Generally, the larger the population, the greater the influence of a city. Urban influence is, of course, also constrained by many other factors, such as the degree of economic development, citizens' per capita income, the city's transport location, and its tourism resources. In the context of the global market economy, the number of cities in China is expected to exceed 1000 in the next decade [2]. A city's size and economic influence will in turn constrain its development and will have a profound and far-reaching impact not only on the economy but also on politics and culture, affecting every citizen. Urban development is therefore a national issue bound up with people's livelihood and with the country as a whole.

In theory, the further urbanization progresses, the more cities there will be and the smaller the influence of any single city. The first task is therefore to shape a good image of the city and promote its uniqueness. Worldwide, how to define a city's positioning, how to spread its positive image, and how to enhance its influence have become common issues for government departments at all levels and for the media [3]. Disseminating urban influence involves many departments, industries, and specialties, making it a complex, cooperative systems-engineering task. Moreover, urban influence cannot be raised overnight; it requires long-term planning and operation [4]. Precisely because promoting urban influence is so complex and long-term, urban image building around the world suffers from many problems, such as homogenization, oversimplification, and lagging behind the times. Urbanization inevitably attracts a large migrant population [5]: large numbers of migrant workers pour into cities from rural areas. While this surge replenishes the labor force, it also poses major challenges for urban management. If the basic livelihood needs of a large population, such as clothing, food, housing, and transportation, are not handled well, the city's influence will inevitably suffer [6]. A big city is the economic and political center of its region; it affects not only the life and work of its residents but also the development and direction of surrounding cities. A community of adjacent cities carries social, economic, and cultural functions for an even larger population, so the importance of cities is self-evident [7]. The most intuitive embodiment of a city's influence is its city image.
City image refers to people's impressions and perceptions of a city in both the real world and the virtual network, including their understanding and evaluation of its form and characteristics [8]. Objective social reality is the main component of urban image, but with the development of the Internet and the rise of social media, the subjective evaluations of Internet users account for an increasing share of it [9]. The popularity of a city, its level of economic development, and its citizens' sense of belonging are the three criteria for judging its influence. These cover not only the city's external infrastructure and supporting facilities but also its inner character, combining the internal and the external [10]. Enhancing urban brand influence mainly depends on identifying one of the city's own advantages and making it bigger and stronger, which in turn attracts factors conducive to urban development, the most important being the investment or relocation of companies. Together with improvements in government service functions, this ultimately achieves the sustainable-development goals of promoting urban economic development and improving citizens' happiness. From the perspective of communication, urban influence comprises three parts: political image influence, cultural image influence, and economic image influence [11]. Urban brand influence is an organic combination of the three, each of which is indispensable; they complement one another, and each promotes the development of the other two.

Because urban development in developed countries started early, research on urban image there also began very early. In 1990, American scholars used the "diamond theory" to analyze the competitiveness of national cities. This theory holds that factor conditions (production factors), domestic demand conditions, related and supporting industries, and firm strategy, structure, and rivalry are the four determinants of urban influence [12]. Australian scholars have established a dynamic model of urban tourism and defined several indicators of the impact of tourism development on cities [13]. Research on urban influence in China started relatively late but has produced a series of achievements, studying urban brand influence in depth mainly in terms of TV advertising and marketing, tourism development and layout, and residents' living experience. Today, as the network reaches into every aspect of life, social software and online media have become an integral part of residents' daily lives [14]. Yet few studies have evaluated urban influence through social media, which is part of the gap this study attempts to fill.

The accelerated development of urbanization is inseparable from the spread of the Internet. As the Internet has penetrated every aspect of life, humanity has entered the era of big data. Big data are so vast that the desired information cannot be obtained from them directly; the data must first be processed and analyzed, a process collectively referred to as data mining [15]. Social media comments are a very important part of big data. For the development of urban brand influence, the massive volume of social media comments accumulated over more than 10 years has considerable mining value, and mining them requires a better understanding of big data. By data structure, big data can be divided into structured, semi-structured, and unstructured data. In cities, unstructured data account for the highest proportion, up to 83.9%, and this proportion will keep increasing as data collection and storage continue to improve [16]. Beyond understanding the characteristics of big data, this research also needs technical means to realize their full value. Many popular general-purpose algorithms are now available, most of them relatively mature. Using computer algorithms to mine real social media comment data is the ultimate goal of big data processing; only in this way can the real value of big data be realized in the field of urban brand influence [17].

In summary, current research on urban brand influence still has many deficiencies. First, urban brand influence lacks a representative characterization: research remains at the level of subjective, descriptive impressions, and there is no clear evaluation index defining it on a rational basis [18]. Second, research and practice on urban brand influence are not yet deep enough, and more scholars are needed to study it from different angles [19]. Finally, research tools and objects lag behind [20]. In the era of big data, social media comment data, which were previously ignored or could not be mined for lack of computing power, can now be better utilized. Using computer algorithms as a new tool and social media comment data as a new research object is the approach taken in this study.

2. Confirmation and Preparation of Research Methods

2.1. Social Media Comment Text Mining

When citizens have needs, they form expectations of urban facilities or services, and these expectations lead them to pay closer attention to the aspects of urban products or services they particularly care about. According to expectation-disconfirmation theory, if citizens' expectations in these areas are not met, their sense of belonging to the city weakens and the influence of the city brand declines. In this study, the aspects of the city that the public is particularly concerned about are collectively referred to as public concerns. They include not only the characteristics of urban infrastructure and services that the public cares most about but also the public's concerns and suggestions regarding these aspects. Given the current pace of urban development, the continual demolition and construction of infrastructure is inevitable. Infrastructure planning must therefore be highly scientific, and planning and design decisions should look beyond individual projects to the system as a whole. Existing urban infrastructure is increasingly unable to meet people's everyday work and living needs; as the city develops, it should be continuously improved, scientifically rebuilt, and oriented toward the future, so as to better serve the people and improve their quality of life. Because concerns are generated by many different citizens, mining their main concerns from social media comment data is the key element of citizen demand analysis. In short, the analysis must extract both the characteristics of urban infrastructure and services that citizens care about and citizens' concerns and suggestions regarding them.

The two traditional ways of obtaining citizens' ideas and opinions, distributing and collecting questionnaires and conducting household interviews, consume a great deal of time as well as the human and material resources of government staff. Social media effectively solve this problem: city managers can use social media comments authorized by citizens to obtain a large amount of citizen information in a short time, and these data can be updated in real time. In the diamond theory, production factors include time, and time is critical in competition between cities, so mining the required information from massive unstructured data is of great strategic significance. Text mining has long been used to analyze unstructured data, and social media comments, the object of this study, are typical unstructured data.

In 1995, two American scientists first proposed the concept of text mining, a comprehensive technology spanning many disciplines, including information retrieval, statistics, data mining, machine learning, and computer algorithms. Text mining transforms unstructured data into structured, easy-to-read data, from which keywords can be found, hidden information mined, new knowledge obtained, and relationships between words and sentences discovered. Text mining first preprocesses the text dataset, then represents the unstructured text in a structured form, and finally mines the structured representation to obtain the required information. Figure 1 shows this process as a flowchart.
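The preprocessing stage of this pipeline, for Chinese comment text, consists of deleting stop words, restoring Pinyin abbreviations, and dropping meaningless phrases. The sketch below illustrates that cleaning step under stated assumptions; it is not the authors' implementation. It assumes each comment has already been segmented into tokens, and the stop-word list, Pinyin-abbreviation dictionary, and meaningless-phrase list are tiny hypothetical placeholders for the much larger resources a real system would load.

```python
import re

# Hypothetical, deliberately tiny resources; a real system would load full lists.
STOP_WORDS = {"的", "了", "是", "在", "和"}
PINYIN_ABBREVIATIONS = {"yyds": "永远的神", "xswl": "笑死我了"}
MEANINGLESS_PHRASES = {"哈哈哈", "转发微博"}

def clean_tokens(tokens):
    """Clean one segmented comment: restore Pinyin abbreviations, then drop
    stop words, meaningless phrases, and punctuation-only tokens."""
    cleaned = []
    for token in tokens:
        token = PINYIN_ABBREVIATIONS.get(token.lower(), token)
        if token in STOP_WORDS or token in MEANINGLESS_PHRASES:
            continue
        if re.fullmatch(r"[\W_]+", token):  # punctuation or symbols only
            continue
        cleaned.append(token)
    return cleaned

print(clean_tokens(["青岛", "的", "海鲜", "yyds", "！", "哈哈哈"]))
# ['青岛', '海鲜', '永远的神']
```

Restoring abbreviations before the stop-word check means an expanded abbreviation is filtered by the same rules as any other token.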

For Chinese social media comment text, data cleaning must be carried out first. Chinese text cleaning includes deleting stop words, restoring Pinyin abbreviations, and removing meaningless phrases. Because computer algorithms cannot directly recognize and process Chinese sentences, the original comment text must first be formalized. Social media data provide a large amount of text information, offering an unprecedented opportunity to understand social behavior in different places. However, as geotagged social media datasets keep growing, large numbers of visual mapping elements overlap, making it difficult to visually capture topics of interest and their spatial distribution. This paper presents a visual abstraction framework for exploring large-scale geotagged social media data. First, a probabilistic topic model is used to summarize the semantics of the text and extract a set of topic features of interest. A multiobjective sampling model is then designed to generate a subset of the original dataset. This not only reduces visual clutter in large-scale social media visualization but also preserves the ranking of topics of interest and the geographical distribution of the original social media dataset. The visual abstraction framework integrates a rich set of visual designs. Three models are commonly used in Chinese text processing: the dynamic vector space model, the discrete probability model, and the fuzzy concept model. This experiment selects the relatively stable and mature dynamic vector space model to represent the social media comment data. In this model, each text is a set of phrases, called features, so that a Chinese text can be expressed as a vector.
It can be expressed as

$$D = (t_1, w_1;\ t_2, w_2;\ \ldots;\ t_n, w_n) \qquad (1)$$

where $t_k$ is the $k$-th feature (phrase) of text $D$ in the text space and $w_k$ is the weight of that feature in $D$. The specific weight value $w_k$ is calculated by formula (2).
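To make the vector space representation concrete, the Python sketch below turns each tokenized comment into a vector of feature weights. It assumes the widely used TF-IDF weighting $w_k = tf_k \cdot \log(N / n_k)$, where $N$ is the number of comments and $n_k$ the number of comments containing feature $t_k$; this weighting is an illustrative assumption, and the paper's own weight formula may differ. The function name `tfidf_vectors` and the sample comments are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each tokenized comment as a vector of feature weights.

    Assumed weight of feature t_k in a comment:
        w_k = tf_k * log(N / n_k)
    where N is the number of comments and n_k the number containing t_k.
    """
    vocabulary = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    # Document frequency: in how many comments each feature occurs.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append([term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocabulary])
    return vocabulary, vectors

docs = [["海鲜", "好吃"], ["海鲜", "贵"], ["交通", "方便"]]
vocabulary, vectors = tfidf_vectors(docs)
for feature, weight in zip(vocabulary, vectors[0]):
    print(feature, round(weight, 3))
```

A feature that appears in every comment gets weight zero, which is why IDF-style weighting suppresses words that do not distinguish one comment from another.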

2.2. Social Media Comment Data Collection

At present, mainstream social media platforms have both web and mobile versions, and the mobile versions are further divided into Android and iOS apps. Because iOS is a closed system, it is difficult to capture comment data from it. Although Android is relatively open, many permissions are still needed to obtain the required text data completely. This study therefore collects social media comment data from the web. By the form of their data, web pages can be divided into static and dynamic pages. Static web pages are plain HTML (Hypertext Markup Language) pages whose content is fixed when the page is written and does not change with back-end operations or user interaction. Static pages can simply be crawled with Python: their full contents are compressed and saved as TXT text, the HTML tags are removed, and the results are stored in the database for later use.
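A minimal sketch of the static-page step using only the Python standard library: `urllib.request` fetches a page, a subclass of `html.parser.HTMLParser` strips the tags (skipping `<script>` and `<style>` contents), and an SQLite database stands in for the storage layer. The table layout and all function names are illustrative assumptions, not the authors' code; the demonstration uses an inline page so that it runs offline.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Remove HTML tags and return the remaining text, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

def fetch_page(url):
    """Download a static page and decode it with its declared charset."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

def store_text(conn, source, text):
    """Save extracted text in a simple (hypothetical) comments table."""
    conn.execute("CREATE TABLE IF NOT EXISTS comments (source TEXT, body TEXT)")
    conn.execute("INSERT INTO comments (source, body) VALUES (?, ?)", (source, text))
    conn.commit()

# Offline demonstration with an inline page instead of a live URL.
sample_page = ("<html><head><style>p { color: red; }</style></head>"
               "<body><p>青岛啤酒节很热闹</p><script>var x = 1;</script></body></html>")
conn = sqlite3.connect(":memory:")
store_text(conn, "local-sample", html_to_text(sample_page))
rows = conn.execute("SELECT source, body FROM comments").fetchall()
print(rows)
```

For a live page, `store_text(conn, url, html_to_text(fetch_page(url)))` chains the same steps.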

A dynamic web page differs from a static one in that its content changes with back-end operations, so collecting data from dynamic pages is much more complex than collecting it from static pages. Figure 2 shows the flowchart for mining text data from dynamic web pages.

As the flowchart in Figure 2 shows, the web crawling tool GooSeeker is first used to capture the dynamic content of the web pages at different times, which is saved in XML format for use by Python scripts; Python is then used for preliminary preprocessing of the text. Unlike static pages, dynamic pages contain a large amount of scrolling information, and this text, which is meaningless for the research, must be removed. In this study, the Chinese word segmentation tool ICTCLAS is used for processing. ICTCLAS divides whole sentences into phrases, identifies their semantics, removes content such as advertisements, and keeps only comments related to the city. Only four parts of speech are retained after segmentation (nouns, verbs, adjectives, and adverbs), because prepositions and pronouns carry no practical meaning. Selecting phrase features in this way further optimizes the weights of the text feature vectors, and formula (2) can be rewritten in terms of the probability that a noun, verb, adjective, or adverb appears in the text and the total number of social media comments in the text database.
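The part-of-speech filter can be sketched as follows. The sketch assumes segmenter output in a space-separated `word/tag` form with ICTCLAS-style tags whose first letter marks the word class (`n` nouns, `v` verbs, `a` adjectives, `d` adverbs, while `p` prepositions and `r` pronouns are discarded). The sample sentence and its tags are illustrative, and `filter_by_pos` is a hypothetical helper, not part of ICTCLAS.

```python
# Keep nouns, verbs, adjectives, and adverbs (tag prefixes assumed ICTCLAS-style).
KEEP_PREFIXES = ("n", "v", "a", "d")

def filter_by_pos(tagged_text):
    """Return the words whose part-of-speech tag starts with a kept prefix.

    `tagged_text` is assumed to be space-separated 'word/tag' pairs.
    """
    kept = []
    for pair in tagged_text.split():
        word, _, tag = pair.rpartition("/")
        if word and tag.startswith(KEEP_PREFIXES):
            kept.append(word)
    return kept

sample = "我/r 在/p 青岛/ns 吃/v 的/ude1 海鲜/n 非常/d 新鲜/a"
print(filter_by_pos(sample))  # ['青岛', '吃', '海鲜', '非常', '新鲜']
```

Matching on the tag prefix also keeps subclasses such as place names (`ns`), since they are nouns.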

2.3. City Brand Influence Level

The urban brand influence level is defined as the degree to which all functions of a city radiate influence to areas outside it (including the rest of the country in which the city is located and foreign areas). The higher the level, the greater the urban brand influence. According to their scope of radiation, a city's functions can be divided into internal and external functions. Internal functions refer to the city's infrastructure and services for its own citizens; external functions refer to the city's impact on residents of surrounding cities, mainly through transportation and tourism. In this study, GNP, GDP per capita, total retail sales of consumer goods per capita, the proportion of citizens' fixed assets in per capita GNP, and the proportion of the urban public revenue budget in GNP are selected as the numerical indicators of internal function, and urban cargo throughput, tourist inflow and outflow, and the total length of urban highway and railway routes are selected as the numerical indicators of external function. Once these indicators are determined, the entropy method is used to calculate the weight of each indicator. Let $x_{ij}$ denote the value of indicator $j$ ($j = 1, \ldots, n$) for city $i$ ($i = 1, \ldots, m$). Positive indicators are normalized as

$$x'_{ij} = \frac{x_{ij}}{\max_{1 \le i \le m} x_{ij}}$$

where $\max_{1 \le i \le m} x_{ij}$ is the maximum value of the positive indicator. Negative indicators are normalized as

$$x'_{ij} = \frac{\min_{1 \le i \le m} x_{ij}}{x_{ij}}$$

where $\min_{1 \le i \le m} x_{ij}$ is the minimum value of the negative indicator. The normalized positive and negative indicators are then converted into proportions:

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}}$$

The normalized positive and negative values then form the standardized matrix

$$P = (p_{ij})_{m \times n}$$

Calculate the entropy $e_j$ of the $j$-th indicator in the standardized matrix:

$$e_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln m}$$

where $k$ is a constant and $p_{ij} \ln p_{ij}$ is taken as 0 when $p_{ij} = 0$. For a system whose information is completely ordered, the entropy is 0, and $e_j$ takes its minimum value. Conversely, if the system is completely disordered, the entropy is maximal, and $e_j$ reaches its maximum of 1.

Calculate the deviation degree of the $j$-th indicator in the standardized matrix as follows:

$$d_j = 1 - e_j$$

Calculate the weight of the $j$-th indicator in the standardized matrix as follows:

$$w_j = \frac{d_j}{\sum_{l=1}^{n} d_l}$$

Then, use the following formula to standardize the original social media comment data.
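The entropy-weighting steps described in this subsection (normalization, proportions, entropy, deviation degree, weight) can be sketched in Python. This is a minimal illustration of the standard entropy weight method under stated assumptions: ratio normalization (value divided by the column maximum for positive indicators, column minimum divided by value for negative ones), the constant $k = 1/\ln m$ for $m$ cities, and strictly positive indicator values. The function `entropy_weights` and the sample matrix are hypothetical.

```python
import math

def entropy_weights(matrix, positive):
    """Entropy weight method.

    matrix: m rows (cities) x n columns (indicators), all values > 0.
    positive[j]: True for positive (benefit) indicators, False for negative ones.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two cities")
    k = 1.0 / math.log(m)
    deviations = []
    for j, column in enumerate(zip(*matrix)):
        # Step 1: ratio normalization.
        if positive[j]:
            top = max(column)
            norm = [x / top for x in column]
        else:
            low = min(column)
            norm = [low / x for x in column]
        # Step 2: proportions p_ij within the column.
        total = sum(norm)
        p = [x / total for x in norm]
        # Step 3: entropy e_j (0 * ln 0 treated as 0).
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        # Step 4: deviation degree d_j.
        deviations.append(1.0 - e)
    # Step 5: normalize deviation degrees into weights.
    s = sum(deviations)
    return [d / s for d in deviations]

# Hypothetical data: 3 cities x 3 indicators (last one is a negative indicator).
matrix = [[100, 5, 2.0], [200, 5, 4.0], [300, 5, 8.0]]
positive = [True, True, False]
weights = entropy_weights(matrix, positive)
print([round(w, 3) for w in weights])
```

The middle indicator is identical for every city, so its entropy is maximal and its weight is (numerically) zero: an indicator that does not discriminate between cities contributes nothing.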

To calculate the final urban brand influence level, the breaking-point (equilibrium point) model of the American scholar Converse is introduced. According to this theory, the influence of one city on another is inversely related to the distance between them: the closer the two cities, the greater their mutual influence. At the same time, the larger a city and the stronger its economic and political strength, the wider the reach of its brand influence and the more cities fall within that reach. The point at which two cities exert equal influence is the equilibrium point, given by

$$d_A = \frac{D_{AB}}{1 + \sqrt{P_B / P_A}}$$

where $d_A$ is the distance from the equilibrium point to city $A$, $D_{AB}$ is the distance between cities $A$ and $B$, and $P_A$ and $P_B$ represent the comprehensive strength of the two cities.
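A short numeric sketch of the equilibrium point, assuming Converse's breaking-point formula in its usual form $d_A = D_{AB} / (1 + \sqrt{P_B / P_A})$, with comprehensive strength standing in for the population term of the classical model. The function name and the numbers are hypothetical.

```python
import math

def breaking_point(distance_ab, strength_a, strength_b):
    """Distance from city A to the equilibrium (breaking) point on the A-B route,
    assuming d_A = D_AB / (1 + sqrt(P_B / P_A))."""
    if strength_a <= 0 or strength_b <= 0:
        raise ValueError("strengths must be positive")
    return distance_ab / (1.0 + math.sqrt(strength_b / strength_a))

# Equal strengths put the equilibrium point at the midpoint.
print(breaking_point(100.0, 1.0, 1.0))  # 50.0
# A city four times as strong pushes the point away from itself.
print(round(breaking_point(100.0, 400.0, 100.0), 2))  # 66.67
```

The second call illustrates the qualitative claim in the text: the stronger city's zone of influence extends beyond the midpoint toward the weaker city.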

Next, urban economic development is brought into the model. Cities with a higher level of economic development are treated as sources of influence, cities with a lower level as receiving points, and transportation as the medium through which capital, means of production, talent, high-tech information, and other resources flow from the source of influence to the receiving point. The field strength of economic influence is then

The economic impact intensity model between cities is defined as

Based on the urban economic impact intensity model derived from the equilibrium point theory, the formula of urban impact energy level index is finally obtained as follows:
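As an illustration only, the sketch below assumes the common gravity-model forms often used for such calculations: field strength $F = M / d^2$ at distance $d$ from a source city of comprehensive strength $M$, and economic impact intensity $I_{ij} = M_i M_j / d_{ij}^2$ between two cities. These forms, the aggregate `total_outgoing_intensity`, and the sample data are assumptions; the paper's exact field-strength, intensity, and energy-level formulas may differ.

```python
def field_strength(mass, distance):
    """Assumed field strength of a source city of comprehensive strength `mass`
    at a point `distance` away: F = M / d**2."""
    return mass / distance ** 2

def impact_intensity(mass_i, mass_j, distance_ij):
    """Assumed gravity-model intensity between two cities: I = M_i * M_j / d**2."""
    return mass_i * mass_j / distance_ij ** 2

def total_outgoing_intensity(masses, distances, i):
    """Sum of city i's intensities with every other city (hypothetical aggregate)."""
    return sum(
        impact_intensity(masses[i], masses[j], distances[i][j])
        for j in range(len(masses))
        if j != i
    )

# Hypothetical three-city system: comprehensive strengths and pairwise distances.
masses = [4.0, 1.0, 1.0]
distances = [[0.0, 2.0, 4.0], [2.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
print(total_outgoing_intensity(masses, distances, 0))  # 1.25
```

The inverse-square distance term makes nearby cities dominate the total, matching the qualitative statement that closer cities influence each other more.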

3. Establishment of Social Comment Thesaurus and Ranking of Urban Influence

3.1. Building a Gender Thesaurus of Social Media Comments

There are clear gender differences in how social media comments reflect the influence of urban brands. To improve the accuracy of the experimental conclusions, a gender thesaurus of social media comments must be established. By eliminating phrases with small gender differences, the thesaurus reduces the size of the database and improves the efficiency of analysis; at the same time, it screens out phrases with gender characteristics, improving the accuracy of the urban brand influence analysis. This paper selects 5000 female and 5000 male comments related to urban brand influence from social media as the experimental objects. The comment text is first segmented with ICTCLAS, and the mutual information values for men and women are then calculated separately. The maximum difference between the male and female mutual information values is 6.37. This experiment therefore uses 0%, 10%, 20%, 30%, 40%, and 50% of this maximum difference as thresholds. Thresholds above 50% are not considered, because phrases beyond that level are classified as having very large differences and are of no reference value for research on urban brand influence. Six thesauruses are created for the different proportions and vectorized with the dynamic vector space model described above. The results of 20-fold cross-validation are shown in Figure 3.

Figure 3 shows that as the threshold proportion of the maximum difference increases, both the mutual information value and the accuracy decline steadily. At a threshold proportion of 20%, accuracy remains high and the difference between the male and female mutual information values is small, so 20% is taken as the optimal threshold proportion. Applying it to the ranking of urban brand influence improves the accuracy and rigor of gender classification in the results.
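The threshold-based selection can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes pointwise mutual information $\mathrm{PMI}(w, c) = \log P(w \mid c) / P(w)$ computed separately for female and male comments with add-one smoothing, and it keeps the phrases whose female/male difference is at least the chosen fraction of the maximum difference. The function name and sample comments are hypothetical.

```python
import math
from collections import Counter

def gender_thesaurus(female_docs, male_docs, ratio):
    """Keep phrases whose female/male mutual-information difference is at least
    `ratio` times the maximum difference. Returns (kept phrases, max difference)."""
    female = Counter(t for doc in female_docs for t in doc)
    male = Counter(t for doc in male_docs for t in doc)
    vocabulary = set(female) | set(male)
    v = len(vocabulary)
    n_female, n_male = sum(female.values()), sum(male.values())
    n_all = n_female + n_male

    def pmi(count_in_class, class_size, count_all):
        # log P(w|c) / P(w), add-one smoothed so that phrases seen in only
        # one group do not produce log(0).
        p_w_given_c = (count_in_class + 1) / (class_size + v)
        p_w = (count_all + 1) / (n_all + v)
        return math.log(p_w_given_c / p_w)

    diff = {}
    for w in vocabulary:
        total = female[w] + male[w]
        # P(w) is shared, so this equals |log P(w|female) - log P(w|male)|.
        diff[w] = abs(pmi(female[w], n_female, total) - pmi(male[w], n_male, total))
    max_diff = max(diff.values())
    kept = {w for w, d in diff.items() if d >= ratio * max_diff}
    return kept, max_diff

female_docs = [["购物", "海鲜", "购物"]]
male_docs = [["足球", "海鲜", "足球"]]
kept, max_diff = gender_thesaurus(female_docs, male_docs, 0.2)
print(sorted(kept), round(max_diff, 3))
```

With a 20% threshold, the phrase used equally by both groups is dropped and the two gender-marked phrases are kept; a 0% threshold keeps the whole vocabulary.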

Based on these results, male and female social media comments are assigned to different categories of citizen needs according to the characteristic words they contain. For female and male commenters separately, the number of comments in each citizen demand category is counted for each quarter; the results are shown in Figure 4.

3.2. Regression Model of Urban Influence and Its Feasibility Analysis

This paper also analyzes how city type moderates the relationship between the fulfilment of citizens' expectations and citizens' satisfaction, which is captured by interaction terms between city brand influence type and citizen variables. The analysis of the social media review data shows that three interaction terms, urban employment (employment rate and range of employment opportunities), cost of living, and infrastructure, are statistically strongly correlated with the rate at which citizens return to the city. The interaction term for the urban employment rate is negative, indicating that employment in super-large cities affects citizens' return much more strongly than employment in small and medium-sized cities. The cost-of-living interaction is weaker than the employment interaction but still has a large effect. To verify the feasibility of the regression model of urban influence, the expected cumulative probability of the model is plotted against the observed cumulative probability, as shown in Figure 5.

Figure 5 shows that the points are randomly distributed close to the diagonal reference line, indicating that the error is well within the acceptable range. The expectation errors approximately follow a normal distribution, so the regression model of urban brand influence is feasible in terms of expectation calculation.

A standardized residual test is also carried out; the resulting scatter plot is shown in Figure 6. The standardized residuals are randomly distributed around the zero line, indicating that the model satisfies the regression assumptions, namely, that the random errors have equal variance and are independently distributed. The regression model of urban brand influence is therefore also feasible in terms of standardization.
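The two diagnostic checks described above (the P-P comparison of cumulative probabilities and the standardized residuals) can be sketched with NumPy on synthetic data. Everything here is hypothetical: the variables `employment`, `living_cost`, and `city_type`, the coefficients, and the sample size are invented to illustrate an ordinary least squares model with an interaction term; they are not the paper's data, variables, or estimates.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors: standardized employment and living-cost scores,
# and a 0/1 city-type indicator (e.g. 1 = super-large city).
employment = rng.normal(size=n)
living_cost = rng.normal(size=n)
city_type = rng.integers(0, 2, size=n)
noise = rng.normal(scale=0.5, size=n)
y = 1.0 + 0.8 * employment - 0.5 * living_cost - 0.6 * employment * city_type + noise

# Design matrix with intercept and an employment x city-type interaction term.
X = np.column_stack([np.ones(n), employment, living_cost, city_type, employment * city_type])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standardized residuals: residuals divided by the estimated error standard deviation.
residuals = y - X @ beta
sigma = math.sqrt(residuals @ residuals / (n - X.shape[1]))
standardized = residuals / sigma

# P-P coordinates: observed (empirical) vs expected N(0, 1) cumulative probability.
ordered = np.sort(standardized)
observed_cp = np.arange(1, n + 1) / (n + 1)
expected_cp = np.array([0.5 * (1.0 + math.erf(z / math.sqrt(2.0))) for z in ordered])

print("coefficients:", np.round(beta, 3))
print("max P-P deviation:", round(float(np.max(np.abs(expected_cp - observed_cp))), 3))
```

Plotting `expected_cp` against `observed_cp` gives the P-P plot, and plotting `standardized` against the fitted values `X @ beta` gives the residual scatter; points close to the diagonal and a patternless band around zero, respectively, are what the text describes.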

3.3. Composition of Citizens and Classification of Social Media Comments

Social media have become one of the most important channels through which enterprises promote their brands. To achieve the best promotion effect, data monitoring is needed: the effect of promotional operations is analyzed through data along each dimension, and the results are used to guide subsequent optimization. Through data mining and processing, one can gain deep consumer insight, accurately position the target users of brand products, and match the tone of brand products. Content output can then be guided through knowledge maps and other means so that the right content reaches the right target users through the right channels and scenarios, helping the brand comprehensively improve communication efficiency.

As described above, 5000 female and 5000 male comments related to urban brand influence were selected from social media as the experimental subjects. After the comment text is segmented with ICTCLAS, the proportions of different groups of citizens participating in urban comments and the proportions of different comment contents can be obtained. Figure 7(a) shows the composition of commenting citizens, and Figure 7(b) shows the proportions of social media comments by content.

Figure 7(a) shows that locals form the largest group, at 44%, indicating that local citizens are the most concerned about the development of their own city's brand influence. Migrant workers come next at 18%, reflecting their work and the focus of their comments on the city. Tourists account for a similar 17%; posting reviews on social media websites after visiting a city has become an indispensable part of travel. Investors contribute 14% of the comments. Figure 7(b) shows that comments on city image account for slightly more than half, indicating that city image is the most intuitive expression of a city. Discussion of large-scale events accounts for 26%, indicating that events such as sports games, expositions, and concerts can drive a city's popularity and, if held successfully, greatly enhance the influence of the city brand.

3.4. Social Media Comment Rating and Factors

The ICTCLAS word segmentation tool can judge the emotional polarity of adjectives in comments, allowing citizens' social media comments to be transformed into more intuitive city scores. The experimental comments are rated from one to three stars: one-star comments mainly express dissatisfaction with or disgust toward the city, and three-star comments express great satisfaction. Figure 8 shows the relationship between the number of social media comments and the corresponding city scores.

Figure 8 shows significant differences in the number of comments across the three score levels. For several urban factors, the numbers of dissatisfied and satisfied comments differ greatly, showing that citizens' satisfaction varies from factor to factor. For urban traffic, the numbers of comments at the three score levels are roughly equal, indicating that citizens regard the development of urban traffic as broadly acceptable. Dissatisfied comments on urban house prices far outnumber satisfied ones, suggesting that a city seeking to improve its brand influence could focus on regulating house prices. Satisfied comments on catering far outnumber dissatisfied ones, so the city can build on its advantages in catering to attract more investment and tourists and enhance the influence of its brand.
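A minimal sketch of how segmented comments could be mapped to one to three stars and counted per urban factor. It assumes a simple lexicon-based rule (sum of word polarities: negative gives one star, zero gives two, positive gives three) and a tiny hypothetical polarity lexicon; this stands in for, and is not, the sentiment judgment performed with ICTCLAS in the study.

```python
from collections import Counter

# Hypothetical polarity lexicon: +1 positive, -1 negative.
POLARITY = {"满意": 1, "喜欢": 1, "方便": 1, "好吃": 1, "贵": -1, "堵": -1, "失望": -1}

def star_rating(tokens):
    """Map a tokenized comment to 1-3 stars from the sum of word polarities:
    negative sum -> 1 star, zero -> 2 stars, positive -> 3 stars."""
    score = sum(POLARITY.get(token, 0) for token in tokens)
    if score < 0:
        return 1
    if score == 0:
        return 2
    return 3

# Hypothetical (factor, tokens) pairs.
comments = [
    ("house prices", ["房价", "太", "贵"]),
    ("catering", ["海鲜", "好吃"]),
    ("traffic", ["地铁", "方便"]),
    ("traffic", ["早高峰", "堵"]),
]
counts = Counter((factor, star_rating(tokens)) for factor, tokens in comments)
print(sorted(counts.items()))
```

Tabulating `counts` by factor and star level gives the kind of breakdown plotted in Figure 8.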

Comments on cities on social media also follow certain patterns over time. This paper counts the number of comments of different types on social media over 12 months.

Figure 9 shows how the number of social media comments changes over time. Citizens' comments change markedly from month to month. Comments on urban heating account for the overwhelming majority in the colder months, whereas during the summer vacation urban tourism becomes a hot topic. Urban festivals and citizens' daily work also draw attention in different months. Overall, these results suggest that efforts to improve urban brand influence can allocate and tilt resources toward the appropriate months to achieve more effective results.
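Counting comments per topic and month, and finding the month in which each topic peaks, can be sketched with the standard library. The `(month, topic)` records below are hypothetical; in practice the topic label would come from the classification of each comment.

```python
from collections import Counter, defaultdict

def monthly_counts(records):
    """records: iterable of (month, topic) pairs, month in 1..12.
    Returns {topic: Counter({month: number_of_comments})}."""
    per_topic = defaultdict(Counter)
    for month, topic in records:
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month: {month}")
        per_topic[topic][month] += 1
    return per_topic

def peak_months(records):
    """Return {topic: (month with the most comments, count)}."""
    return {topic: months.most_common(1)[0] for topic, months in monthly_counts(records).items()}

# Hypothetical records.
records = [(1, "heating"), (1, "heating"), (12, "heating"), (7, "tourism"), (8, "tourism"), (8, "tourism")]
print(peak_months(records))  # {'heating': (1, 2), 'tourism': (8, 2)}
```

The peak month per topic is one simple way to decide when resources aimed at that topic would have the most effect.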

4. Conclusion

Through the analysis of the database, a gender thesaurus was established, further improving the accuracy of the experimental results. The feasibility of the regression model was verified from two aspects: expected value calculation and standardization. This paper analyzed the composition of citizens who comment on social media and the proportions of comment contents and put forward suggestions for enhancing the city's influence. New urban cultural resources call for improving city brand positioning, which should be updated and rebuilt as times change. A diversified and systematic brand positioning for the new era should combine the city's cultural resources with its economic development, and a city brand communication system should be built that is fully integrated into the international context and uses modern forms of expression. The transformation of characteristic cultural resources, the enhancement of industrial competitive advantages, and the synchronous development of culture and economy are the driving factors for high-quality, coordinated urban development. However, much hidden content in social media comment big data remains to be mined, and as citizens pay increasing attention to personal privacy, user comments are becoming harder to obtain, which poses difficulties for this research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the 2021 Qingdao Social Science Planning Project (Research on "New Cultural Creation" Promoting the Cultural Construction of Qingdao International Fashion City) (No. QDSKL2101222).