Abstract

With the rapid development of the Internet, new media news has gradually become the information source that people follow most closely. Public opinion on new media is a force that cannot be ignored, and it needs monitoring and guidance. Research on hot topic discovery and trend analysis can identify social hot topics in a timely manner and analyze how they evolve, which helps in grasping the trend of public opinion so that it can be guided correctly and social stability maintained. At the same time, the new media industry has grown rapidly. In the era of big data, new media shows characteristics and advantages that differ from those of traditional media, but big data brings both advantages and disadvantages for its development. In this paper, LDA and ARIMA models are used to measure the popularity (heat) of new media reports and to analyze their trends in the context of big data mining. Experiments show that the model designed in this paper reduces the missed detection rate by 75.4% and that its hot topic detection accuracy can reach 84.6%. In trend analysis, the first stage of transmission is called the incubation period. After a certain critical point, the topic enters the outbreak period. The outbreak period lasts for some time, is followed by a plateau period, and finally gives way to a subsidence period.

1. Introduction

Today, we have entered the era of big data, in which the value of data as a resource has become a matter of consensus. In future business competition, holding massive data will be a development advantage for enterprises [1]. In the news industry, this is manifested in the rise of data news. In our daily work and life, big data has become a buzzword of the times [2]. In recent years, the emergence of new media platforms such as Weibo, WeChat, and Douyin has drawn widespread attention to new media. These emerging media, which rely on data and information technology and use new media as the carrier for disseminating information, are jointly building the new media industry as a whole.

Big data refers to massive, fast-growing, and diversified information assets that require new processing modes in order to deliver stronger decision-making power, insight and discovery, and process optimization [3]. Driven by the global wave of big data development and the huge demand for industrial big data applications, intelligent analysis technologies represented by data mining and machine learning have made great progress [4]. At the algorithmic level, more accurate and effective theories and models of machine learning keep emerging; in particular, deep learning models represented by deep neural networks have achieved great success in image processing, natural language processing, and speech recognition [5]. However, with the development of the Internet and the emergence of various new media, readers are often troubled by useless, redundant, and repetitive information [6]. News audiences need tools that filter news in a more intelligent and targeted manner, while news communicators need to increase the popularity of their news and expand its diffusion in order to gain an advantage in the industry [7]. Nowadays, portal websites such as Netease News, Sina News, Today’s Headlines, and Tencent News collect updated news in real time and push hot news to users promptly, so that users learn about current hot news in time and its spread is accelerated [8]. With the rapid growth of new media news, the volume of news information has become huge and bloated [9]. Because of the virtual nature of new media, it also contains a lot of junk information and even fraudulent information. News reports are among the most accessible and most closely followed types of new media information, and they are characterized by dynamic change, fast growth, and many related topics [10]. Traditional search engines play an important role in how people use new media. Through them, people can retrieve and browse the information they need from a large number of new media resources, which greatly facilitates information acquisition.

Through its choice of information and expression of attitudes, the news media also influences readers’ emotional attitudes and value judgments. At the same time, the interactivity and openness of the Internet give people a channel to express their views [11]. Readers of news released by new media can independently select the topics they are interested in and “turn a blind eye” to news that does not interest them. They can also like or comment on the news to express their views and attitudes [12]. Under the influence of the big data era and the wave of data news, the mainstream news media in China have already started a series of data news practices [13]. However, this data news has been produced by domestic practitioners learning from and imitating foreign practices, and many problems remain [14]. In addition, most studies of advanced cases offer micro-level interpretation and sharing of specific measures, which cannot provide a complete production path for data news at other media and organizations [15]. The research goal of this paper is to use big data mining algorithms to find hot topics in a wide range of new media news reports and to study topic trends. To this end, the classified data are modeled with the hot topic detection and trend analysis method constructed in this paper, so that the hot topics in each time period and the trend of each topic can be obtained.

2. Methodology

2.1. New Media News Analysis from the Perspective of Big Data

Big data has brought new opportunities for new media news communication. The volume of news data has exploded, and finding the value in that data through analysis and exploration is now a focus of journalists’ efforts to meet readers’ needs. Most news is sudden and short-lived, so journalists need strong adaptability, and the era of big data has placed even higher demands on their ability to respond. Journalists in the new media environment should make full use of big data techniques to collect valuable news clues on the Internet, and use the visual appeal and interest of online information to enrich news pictures and backgrounds, so as to deliver high-quality and valuable news to audiences in a timely manner. In the era of big data, the most important resource in news reporting is data, and computers and the Internet are no longer merely auxiliary tools for journalists [16]. They permeate all aspects of news production: they not only process and analyze the data collected for news reports but also assist in the discovery, production, and release of news. They run through the whole news production process, including the dissemination and operation of news [17]. As a result, a new news reporting model has emerged, namely data news. Distributed data-parallel computing frameworks have become the de facto standard for big data processing [18]. Consider a relational table R divided into R1 and R2, where name, title, gender, and salary are the attributes of R. The data obtained in this paper for R1 and R2 are shown in Tables 1 and 2.

In the era of big data, new media shows characteristics that differ from those of traditional media, such as timeliness, information volume, and personalization. New media often has a particular advantage in timeliness [19]. Relying on the Internet, new media can integrate and publish information faster and more conveniently, which helps ensure the timeliness of news or events [20]. In terms of content, new media can accommodate and disseminate a relatively large amount of information. Compared with the limited layout of newspapers and the limited airtime of television, the amount of information that new media can disseminate is far beyond what traditional media can match [21]. New media also has a clear edge in personalized recommendation. It gives greater weight to users’ needs, analyzes data such as browsing records, tailors services to users, and pushes matching content; TikTok’s personalized push is one example [22]. Precision journalism focuses on the accurate analysis and processing of social events through the methods of social statistics, extracting the laws and significance behind general phenomena and presenting them to the public [23]. Its core lies in obtaining information and reporting facts by means of social science research [24]. By studying multivariate data sets to explore the main factors that affect news flow, news popularity can be predicted accurately, which gives a better understanding of how to optimize news and enhance its competitiveness [25]. Heat forecasting also strengthens decision-making, which improves the effect of news delivery. A tree structure is again used to represent this form, and the functional dependencies among the nodes of relational table R are shown in Figure 1.

2.2. Research and Text Representation of Hot News

The text preprocessing stage transforms text data into structured data. First, this paper uses a Python web crawler to build a data set of crawled new media news. Chinese word segmentation is then carried out on the text, which cuts it into independent words; part-of-speech tagging is applied to each word; and a stop-word list is constructed to remove irrelevant interfering words [26]. News text has no fixed structure or uniform format and cannot be processed directly. To facilitate subsequent processing, the news text needs to be vectorized. After vectorization, each news text is represented as a vector that can be used for text similarity comparison, text clustering, classification, and so on [27]. Table 3 shows the general data of the obtained texts.
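The preprocessing and vectorization steps can be illustrated with a minimal Python sketch. It is not the pipeline used in this paper: whitespace tokenization stands in for Chinese word segmentation (which would need a dedicated segmenter), the stop-word list and the sample sentences are hypothetical, and plain term counts are used as the vector representation.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would load a curated one.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

def vectorize(tokens, vocabulary):
    """Represent a token list as a term-count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Illustrative documents, not data from the paper.
docs = [
    "The flood warning is issued in the city.",
    "City officials issued a flood warning.",
    "The football final ended in a draw.",
]
tokenized = [tokenize(d) for d in docs]
vocabulary = sorted({tok for toks in tokenized for tok in toks})
vectors = [vectorize(toks, vocabulary) for toks in tokenized]
```

With these vectors, the two flood reports share most of their terms and are therefore closer to each other than either is to the sports report, which is the property that clustering and similarity comparison rely on.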

At this point, an LDA-based text representation model can be introduced; LDA is a commonly used topic model. Let $D$ be the document collection, $d$ a document, $Z$ the collection of latent topics, $z$ a latent topic, $w_{d,n}$ the $n$-th word of document $d$, and $z_{d,n}$ the topic assigned to the $n$-th word of document $d$. The Dirichlet distribution is the base distribution of the LDA model, and $\theta_d$ is the multinomial distribution over topics for a document $d$ in the document set. The basic model diagram of general LDA is shown in Figure 2.
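To make the LDA representation concrete, the following is a minimal sketch of collapsed Gibbs sampling, one common way to estimate the per-document topic distribution. The paper does not state which inference method it uses, so the sampler, the function name `lda_gibbs`, and the hyperparameter defaults `alpha` and `beta` are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []
    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, wi = assignments[d][n], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wi] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed per-document topic distribution.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, doc_topic, topic_total
```

Each row of the returned `theta` is a probability distribution over topics for one document, so it can serve directly as that document's vector in later similarity or clustering steps. In practice a tested library implementation would normally be preferred over a hand-written sampler.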

The hot news topics discussed in this paper are event topics that have become the focus of society and exerted a certain influence in the news environment; they are triggered by particular events and attract wide participation from netizens [28]. Hot news topics are generally divided into two types. The first is group events, in which people, whether from particular groups or not, temporarily aggregate as a result of certain social contradictions and social consciousness. The second is general news events and entertainment events. Topics of this type are also closely followed and widely joined by Internet users, and they are often related to people’s general demand for new media information. Such hot topics do not have an obvious negative impact on the order of new media. The correctness of distributed functional dependency discovery can be guaranteed by data redistribution, and the number of data redistributions equals the number of attributes.

By observation, most of the functional dependencies can be found after the first few data redistributions. Take the relational table as an example. For property , the set of candidate functional dependencies grouped by can be represented as where is the dependency of candidate functions. Make the size of . Take the first attribute as an example:

After the first data redistribution, all the candidate function dependencies with in the left will be verified one by one. Therefore, is equal to

Suppose is the total number of all candidate functional dependencies in the property grid. Generally, can be expressed as

Figure 3 shows the CRF corresponding to the relational table containing five attribute groups. Most of the candidate functional dependencies are concentrated in the first few attributes: the CRF of the first two attributes is as high as 75%. Therefore, most functional dependencies can be found after the first few data redistributions.
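Verifying a single candidate functional dependency can be sketched as follows. This is a single-machine illustration under stated assumptions, not the distributed procedure of the paper: rows are grouped by the left-hand-side attributes (the role that redistribution by attribute plays in a distributed setting), and the dependency holds if every group maps to one right-hand-side value. The function name `holds` and the sample rows are hypothetical; the attribute names follow the relational table R of Section 2.1.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts, lhs a list of attribute names, rhs one attribute.
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = row[rhs]
        # The first value seen for a key is stored; any later mismatch
        # violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows for illustration only.
rows = [
    {"name": "Li", "title": "editor", "gender": "F", "salary": 9000},
    {"name": "Wang", "title": "editor", "gender": "M", "salary": 9000},
    {"name": "Zhao", "title": "reporter", "gender": "F", "salary": 7000},
]
```

Here `holds(rows, ["title"], "salary")` is true because each title maps to one salary, whereas `holds(rows, ["gender"], "title")` is false because the two rows with gender "F" carry different titles.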

Time is generally an element of news reports. In statistical research, a set of random variables arranged in chronological order is used to represent the time series of a random event. Time series analysis is widely applied in natural science and engineering. A time series is generally defined as an ordered set of recording times and recorded values:

$$X = \{(t_1, x_1), (t_2, x_2), \ldots, (t_n, x_n)\}$$

Here, $x_i$ is the recorded value, $t_i$ is the time, and the element $(t_i, x_i)$ states that the value recorded at time $t_i$ is $x_i$. The ARIMA model is a modeling method for time series. It describes the linear structure of a data series and is often used for short-term modeling and forecasting. Its components can be divided into three categories: the AR model, the MA model, and the ARMA model. The AR model is a $p$-order autoregressive model, abbreviated as $AR(p)$:

$$x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t$$

where $\phi_0, \phi_1, \ldots, \phi_p$ are the coefficients of the autoregressive terms, $p$ is the order of the autoregressive term, and $\varepsilon_t$ is the random interference sequence. The MA model is a $q$-order moving average model, denoted $MA(q)$:

$$x_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

where $\theta_1, \ldots, \theta_q$ are the coefficients of the moving average terms, $q$ is the order of the moving average term, $\varepsilon_t$ is the random interference sequence, and $\mu$ is the initial value. The ARMA model is an autoregressive moving average model, denoted $ARMA(p, q)$, with the expression

$$x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where $\phi_0$ is the undetermined coefficient, $\varepsilon_t$ is the independent error term, $\phi_i$ and $\theta_j$ are the coefficients of the autoregressive and moving average terms, respectively, and $p$ and $q$ are the orders of the autoregressive and moving average terms, respectively.
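As a simplified illustration of the autoregressive component only, the sketch below fits an AR model of a given order by ordinary least squares and produces short-term forecasts. It is not a full ARIMA implementation (no differencing and no moving average terms), and the function names `solve`, `fit_ar`, and `forecast` are hypothetical helpers introduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ar(series, p):
    """Least-squares fit of x_t = c + sum_i phi_i * x_{t-i}; returns [c, phi_1, ..., phi_p]."""
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    k = p + 1
    # Normal equations: (A^T A) beta = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ata, aty)

def forecast(series, coeffs, steps):
    """Iterate the fitted recursion forward; the noise term is set to its mean, 0."""
    p = len(coeffs) - 1
    history = list(series)
    for _ in range(steps):
        nxt = coeffs[0] + sum(coeffs[i] * history[-i] for i in range(1, p + 1))
        history.append(nxt)
    return history[len(series):]
```

For real heat series, a maintained time series library would normally be used instead, since it also handles differencing, moving average terms, and order selection.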

2.3. Heat Measurement and Trend Analysis

When measuring hot news, the news text must first be represented. Because raw news text cannot be used directly in computation, it has to be converted, before topic detection, into a form that a computer can process. Text representation is therefore the primary problem of the preprocessing stage of topic detection: giving text a formatted representation that the system can easily recognize and process. A language model is a probability-based statistical model, which can also be regarded as a statistical model that generates text in a certain language. In most statistical language models, a Markov model is used to construct the sentence model. The language model for news reports is constructed as follows. Assuming that the words in a news report occur independently of each other, the probability that a report $d$ is related to topic $T$ is

$$P(T \mid d) \propto \prod_{i=1}^{n} \frac{P(w_i \mid T)}{P(w_i \mid C)}$$

where $n$ is the number of words in the report, $P(w_i \mid C)$ is the distribution probability of word $w_i$ in the corpus $C$, and $P(w_i \mid T)$ is the generation probability of word $w_i$ in topic $T$. Let $weight(w_i, d)$ denote the weight of word $w_i$ in document $d$; then

$$weight(w_i, d) = tf(w_i, d) \times idf(w_i) = tf(w_i, d) \times \log \frac{N}{df(w_i)}$$

Here, $tf(w_i, d)$ is the probability of word $w_i$ appearing in document $d$, $idf(w_i)$ is the inverse document frequency of word $w_i$, $N$ is the total number of documents, and $df(w_i)$ is the number of documents containing word $w_i$. To judge the relevance of news reports to hot topics, the concept of similarity is needed, which is generally a language-model-based similarity:

$$Sim(d, T) = \frac{1}{|d|} \sum_{w \in d} tf(w, d) \log \frac{P(w \mid T)}{P(w \mid C)}$$

where $d$ is the report, $T$ is the topic, $w$ ranges over the feature items, $|d|$ is the total number of feature items in report $d$, $P(w \mid T)$ is the probability of $w$ appearing in $T$, $P(w \mid C)$ is the probability of $w$ appearing in the data set, and $tf(w, d)$ is the word frequency of $w$ in the document. Table 4 is the topic index table of hot topics.
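The term weighting described above, which combines a word's frequency in a document with its inverse document frequency, can be sketched as follows. The natural logarithm and the use of relative frequency for the term-frequency factor are assumptions of this sketch, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs is a list of token lists; returns one {word: weight} dict per document."""
    n_docs = len(docs)
    # Number of documents that contain each word (counted once per document).
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative frequency in the document times log(N / document frequency).
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights
```

A word that occurs in every document receives weight 0, because the logarithm of 1 is 0, so ubiquitous words contribute nothing when reports are compared with topics.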

To show the start time and general development of each topic, the topic indices at 4 time nodes are selected in Table 4 to reflect the complete trend of each topic. Each hot topic goes through life cycle stages such as incubation, growth, maturity, and decline. When a hot topic first appears, there may be few relevant reports, but over time its heat increases, it gradually becomes a hot spot, and it then reaches its peak. The preceding analysis also shows that processing the first few attributes accounts for a large part of functional dependency discovery. Accordingly, to balance resource utilization and improve efficiency, attributes with low skew and low cardinality should be processed first. Figure 4 shows the overall flow of the Smart FD algorithm.

The candidate model functions include linear, exponential, logarithmic, normal distribution, and polynomial functions. Based on an analysis of the quantitative indicators of hot news, this paper uses the polynomial function as the model function and carries out regression analysis on the number and proportion of news-related reports and the number and proportion of source new media. The polynomial function is expressed as

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

In polynomial fitting, the higher the degree $n$, the more closely the curve fits the observed data, which risks overfitting. The degree of the polynomial therefore needs to be limited:

$$\frac{R_{n+1} - R_n}{R_n} < \delta$$

The subscript $n$ is the degree of the polynomial, and $R_n$ is the goodness of fit of the degree-$n$ polynomial. The rule for selecting the degree is that the difference between the goodness of fit at degree $n+1$ and at degree $n$, divided by the goodness of fit at degree $n$, is less than the threshold $\delta$. In other words, when raising the degree no longer improves the fit noticeably, the regression function is considered stable. In this paper, the trend prediction result of topic $T$ is defined from the prediction slopes of the trend models under multiple granularities. The quantities involved are the total number $m$ of granularities selected in the multigranularity fusion, the regression model $f_i$ of the $i$-th granularity, the goodness of fit $\rho_i$ of that regression model, the value $g_i$ of the $i$-th granularity, and the prediction slope $k_i$ at the prediction point in the regression model of the $i$-th granularity.
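The degree-selection rule can be sketched in Python as follows. The sketch assumes that the degree of fit is measured by the coefficient of determination $R^2$, that the search starts at degree 1, and that a threshold of 0.01 is used; none of these choices is stated in the paper, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [a_0, ..., a_degree]."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(ata, aty)

def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0

def select_degree(xs, ys, threshold=0.01, max_degree=6):
    """Raise the degree until the relative gain in fit drops below the threshold."""
    degree = 1
    fit = r_squared(xs, ys, polyfit(xs, ys, degree))
    while degree < max_degree:
        next_fit = r_squared(xs, ys, polyfit(xs, ys, degree + 1))
        if fit > 0:
            gain = (next_fit - fit) / fit
        else:
            # The ratio is undefined for a non-positive fit; accept any improvement.
            gain = float("inf") if next_fit > fit else 0.0
        if gain < threshold:
            break
        degree, fit = degree + 1, next_fit
    return degree
```

Stopping at the first degree whose successor adds little keeps the model as simple as the data allow, which is the purpose of limiting the degree. The normal-equation solver is adequate for low degrees and small inputs; a numerically more robust fitting routine would be preferable for larger problems.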

3. Result Analysis and Discussion

To establish a scientific and accurate model for measuring the heat of new media report topics and analyzing their trends, this paper carries out further experimental analysis on the basis of the research above, in order to show the model’s effect on concrete data. To assess whether the detection model matches the topics of new media reports, three indicators are analyzed: the missed detection rate, the hot topic detection accuracy, and the general trend of topic spread. Q denotes the missed detection rate for matching the topics reported by new media, W denotes the hot topic detection accuracy for new media reports, and M denotes the overall trend of topic spread. The experimental results for Q, W, and M are shown in Figures 5 to 7.

As can be seen from Figure 5, as the parameter increases, the missed detection rate gradually decreases and finally stabilizes. Based on the experimental results for the four stages of incubation, growth, maturity, and decline, the optimal value of the parameter is 2.44, at which the missed detection rate in each stage reaches its minimum. Overall, the missed detection rate is reduced by 75.4%. The figure also shows that the result of topic detection is generally worse than that of clustering. In practical applications, the report set to be processed is very large, and it is impossible to cluster the whole data set in a single pass. The experiments show that the hot topic detection accuracy of the model designed in this paper can reach 84.6%. In the initial stage of a topic, the event news spreads only within a small range, and the audience is relatively small. As the topic gradually spreads, some new media outlets or well-known figures become involved. Through their reports and opinions, the hot topic explodes at a certain critical point. The audiences of these outlets or figures become the initiators of the next round of communication, and through their forwarding, the cumulative social attention paid to the event increases rapidly. The spread of the hot topic then enters a stable period, in which new follow-up reports appear and the attention of the media and the public stays at a certain level for some time. Finally, public attention gradually declines and eventually disappears. To summarize this transmission process, the first stage can be called the incubation period. After a certain critical point, the topic enters the outbreak period. The outbreak period lasts for some time, is followed by a plateau period, and finally gives way to a subsidence period.
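The four stages summarized above can be illustrated with a simple labeling heuristic over a heat series, such as the number of related reports per time unit. The rule below is a hypothetical sketch rather than the method of this paper: a step is labeled by its change relative to the peak value, and the function name `label_stages`, the threshold of 0.2, and the sample series are assumptions.

```python
def label_stages(heat, threshold=0.2):
    """Label each step of a heat series as incubation, outbreak, plateau, or subsidence.

    A step counts as a sharp rise or fall when the change from the previous
    point is at least threshold times the peak value of the series.
    """
    if not heat:
        return []
    peak = max(heat)
    stage = "incubation"
    stages = []
    for prev, cur in zip(heat, heat[1:]):
        change = (cur - prev) / peak if peak else 0.0
        if change >= threshold:
            stage = "outbreak"
        elif change <= -threshold or stage == "subsidence":
            # Once attention has started to fall, later small changes
            # still belong to the subsidence period.
            stage = "subsidence"
        elif stage in ("outbreak", "plateau"):
            stage = "plateau"
        else:
            stage = "incubation"
        stages.append(stage)
    return stages

# Hypothetical daily report counts for one topic.
heat = [1, 2, 3, 20, 60, 62, 61, 30, 8, 5]
stages = label_stages(heat)
```

For this series the labels run through incubation, outbreak, plateau, and subsidence in that order, which mirrors the qualitative description of how a hot topic spreads.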

4. Conclusion

Data news, as a form of news, transforms traditional news production, and its advanced nature means that it is in line with the future development of the news industry. Compared with traditional news production, it introduces new subjects, places higher demands on journalists and teams, and promotes the development and transformation of the news industry. The progress of the times has brought both new prospects and new challenges to all industries, and journalism is no exception. Research on data journalism from the perspective of big data is therefore meaningful. This paper proposes a big-data-based algorithm for measuring the topic heat of new media reports and analyzing its trend. It analyzes quantitative characteristics of a topic, including the number of topic-related reports, the number of source websites, and the dispersion of reports, combines these indicators into an energy value for the topic, and judges whether the topic is a hot topic according to that value. The experimental analysis leads to the following conclusions about the model designed in this paper: overall, the missed detection rate is reduced by 75.4%, and the hot topic detection accuracy can reach 84.6%. In trend analysis, the first stage of transmission is called the incubation period. After a critical point, the topic enters the outbreak period. The outbreak period lasts for some time, is followed by a plateau period, and finally gives way to a subsidence period. This paper uses a polynomial function as the model function, and the regression analysis of the number and proportion of news reports remains lacking. Future research therefore needs to incorporate more data sources for analysis.

Data Availability

The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.