Abstract

With the development of information technology, large volumes of time-series data are generated and stored in the field of economic management, and data mining algorithms can extract the potential, valuable knowledge and information hidden in these data to support management and decision-making activities. This paper proposes three time-series information granulation methods covering both the time axis and the theoretical domain: a time-axis information granulation method based on fluctuation points, a time-axis information granulation method based on the cloud model, and a fuzzy time-series prediction method based on theoretical domain information granulation. At the same time, the granulation idea of granular computing is introduced into time-series analysis: through information granulation, the original high-dimensional time series is granulated into a low-dimensional granular time series, and the constructed information granules portray and reflect the structural characteristics of the original data, thereby achieving efficient dimensionality reduction and laying the foundation for subsequent data mining work. Finally, the granules of the decision tree are analyzed, and a separate support vector machine classifier is designed for each granule to construct a global multiclassification model.

1. Introduction

With the rapid development of internet technology and the improved performance of data storage devices in recent years, a large amount of data is generated and stored in various industries. A large portion of these data are time-tagged, that is, series of observations recorded in chronological order, called time series. Effectively analyzing and processing such time-series data to uncover potential and valuable knowledge and information, and thereby support more efficient production, operation, management, and decision-making activities of enterprises, is one of the important tasks of today's big data era [1]. Granular computing is a new approach to simulating human problem-solving thinking and handling complex tasks on big data, and it has become an emerging research direction in artificial intelligence in recent years. The main idea of the theory is to abstract and divide complex problems into several simpler ones (i.e., granulation), thus enabling better analysis and problem solving. Existing research on time-series information granulation is mainly divided into two aspects, the time axis and the theoretical domain, that is, solving the problems of effectively dividing and representing the time windows and the theoretical domain. Support vector machines show many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems, such as avoiding local minima and realizing capacity control, and they can be extended to other machine learning problems such as function fitting. Research on time-axis information granulation usually divides the time series at fixed time intervals, that is, by hard division, and then represents the information grains obtained after the division. This ignores the changing characteristics of the time series on the time axis and does not conform to the essential meaning of information grains, so it is necessary to design the time-series information granulation method according to these changing characteristics so that the obtained information grains are internally similar and mutually distinguishable [2]. Research on theoretical domain information granulation usually cannot satisfy both interpretability and prediction accuracy when partitioning the domain into intervals, so a time-series theoretical domain information granulation method with both strong interpretability and high prediction accuracy is needed [3].

This paper introduces the granulation idea of granular computing into time-series analysis. By granulating the time series, the original high-dimensional time series is transformed into a low-dimensional granular time series, and the constructed information granules portray and reflect the structural characteristics of the original time-series data, thereby realizing efficient dimensionality reduction and laying the foundation for subsequent data mining work. To address the shortcomings of existing research methods, the paper studies time-series information granulation oriented to clustering and prediction, proposes three different time-series information granulation methods covering both the time axis and the theoretical domain, and applies them to stock time-series data for clustering and prediction analysis. To address the long training time and low efficiency of existing support vector machines in solving multiclassification problems, the idea of granular computing is introduced to construct a support vector machine multiclassification model, and the learning algorithm for constructing the decision tree is improved to raise training efficiency and classification accuracy.

Combining other, more mature theories and methods with SVM has become a research topic with great potential for development, although it still faces problems such as difficult classification and inaccurate prediction. Current research on granular support vector machines mainly focuses on combinations with specific models: SVM with rough sets, decision trees, clustering, quotient spaces, association rules, and so on. These results mostly preprocess the data, but such models are important for the theoretical study of machine learning and support vector machines, as well as for the exploration of problems such as intelligent information processing.

2. Related Work

Egrioglu E [4] studies rough lower and upper approximations on the space of grain approximations from the perspective of rough set theory. Subsequently, the concept of grain logic (G-logic) is given in [5], where a similar inference system is built based on rough logic and instance verification and analysis are carried out on medical diagnosis problems. Many results have also been achieved in practical applications. The importance of attributes, as elaborated in [6], was added to the granular computation of knowledge and used in solving the minimal attribute approximation, among other problems. In subsequent research, fuzzy quotient space theory was created in [7], improved in [8], and further refined in the context of data mining. He Y [9] dealt with word computation and language dynamics and proposed a language dynamics system. The subsequent literature [10] elaborates a granular computing model based on tolerance relations, giving a grain operation criterion, a grain representation, and a grain decomposition method for incomplete information systems. In connection with the attribute reduction of rough sets, the determination conditions are given, and problems such as the acquisition of attribute-necessity rules for incomplete information systems are addressed. Luo C [11] applies the compatible granularity space model to image segmentation. Kim S T [12] combines granularity with neural networks and applies the combination to efficient knowledge discovery. Dong G [13] elaborates on the connection between concept description and concept hierarchy transformation based on the similarity between the concept lattice and granularity partitioning in concept clustering. Su W H [14] combines grain vector spaces with artificial neural networks, which improves the timeliness and comprehensibility of the knowledge representation of artificial neural networks. The literature [15] decomposed copper and wheat prices with the EMD and EEMD methods, respectively, from a multiscale perspective, and then used a BP neural network, an SVM model, and ARIMA for prediction and integration; the results showed that the combined model predicts better. Although models integrated through decomposition perform better, they have some defects: the wavelet decomposition method suffers from weak adaptability and poor robustness of network training during data decomposition, while the EMD method suffers from mode mixing and a lack of theoretical foundation. Moreover, for price series that are multiscale and noisy, these methods produce many components after decomposition, which is not conducive to the subsequent forecasting work. Later, the literature [16] constructed a new sequence decomposition method, the empirical wavelet transform (EWT), based on the wavelet transform combined with the advantages of EMD. The literature [17] used EMD and EWT to decompose wind power sequences and then combined them with neural network methods for cross-combination prediction; the comparison showed that the sequences decomposed by EWT had a better prediction effect. The basic idea of rough sets is to form concepts and rules through analytical induction and to study target equivalence relations as well as knowledge discovery by categorical approximation.
Zhao Y [18] combines multilevel and multiperspective granularity methods by defining the division-sequence product space and using nested division sequences to define different granular layers over the theoretical domain; finally, a granulation model based on the division order is given using the division-order product space. Chen W [19] proposes a neighborhood granulation method that introduces inter- and intraclass thresholds to construct a supervised neighborhood-based rough set model and, by analyzing how neighborhood granules change under the double thresholds, gives monotonicity theorems for the rough approximation quality and the conditional entropy of this model. The literature [20] studies the operation mechanism of data information granules, adopts nonstandard analysis as the operation rule of information granules, proposes the accompanying binary relation, and analyzes in depth the coarse and fine granular-layer division of information granules under this binary relation; the resulting algorithm can merge and decompose the granular-layer space, which effectively reduces the computational load and simplifies the data analysis process.

3. Support Vector Machine-Based Algorithms for Granular Computing

Set theory is the foundation of modern mathematics, and fuzzy set theory is one of its newer mathematical tools. Once the concept of fuzzy sets and the problem of the granularity of fuzzy information were introduced, the scope of application expanded rapidly and fuzzy logic theory was extended, followed by the “theory of word computation”, which aims to use language for fuzzy computation and reasoning to achieve fuzzy intelligent control. At the same time, the integration of fuzzy set theory with the quotient space, using fuzzy equivalence relations, extended the quotient space model and granular computing and made it possible to map and solve uncertainty problems accurately. A proper hierarchical, progressive granularity structure can therefore solve such problems effectively. However, the theory lacks the means and algorithms to complete the transformations between a granularity and a granularity world, between granularities, and between granularity worlds. Solving this problem would improve the theory and broaden the scope of application of the quotient space [21].

The fusion of the three models in turn produces fuzzy rough sets, fuzzy quotient spaces, and so on, so the three models are both distinct and related. First, consider rough sets and fuzzy sets: the former arises from subsequent processing of the data, while the latter is given in advance. Both describe and generalize the incompleteness and inaccuracy of information granules, but they differ significantly in how the granules are processed. Rough sets focus on the coarseness of information granules, describe granules by upper and lower approximation operators, and emphasize indiscernibility and the classification of different equivalence classes. Fuzzy sets focus on fuzziness, describe and emphasize the indistinctness of boundaries using membership degrees and membership functions, and study only the degree of membership within the same equivalence class. Figure 1 shows the framework of the algorithm flow of support vector machine-based granular computing.

Machine learning estimates the dependency between the inputs and outputs of a system from a known training sample so that estimates for unknown data are as accurate as possible. The machine learning problem can therefore be modeled as the existence of some unknown dependency between the input variables and the output variables [22]. The basic idea of the support vector machine is to map the input space into a high-dimensional feature space through a nonlinear transformation, solve for the optimal linear classifier in that space, and define an appropriate inner product (kernel) function that carries out the nonlinear transformation implicitly. The triadic theory of granular computing includes multiperspective and multilevel granular structures and the granular computing triangle. The methodology of granular computing is structured problem solving, and its computational model is structured information processing. The triad emphasizes the mutual support of the philosophical, methodological, and computational models of granular computing. The study of granular computing attempts to organize, abstract, and combine granular processing ideas from various disciplines to obtain a higher-level, systematic principle of granular computing that is independent of any specific discipline.

The traditional algorithm steps are as follows (a code sketch of these steps is given after the list):
(1) Select the number of grains to be divided, together with the overall feature function of the data set.
(2) Determine the objective optimization function, in which the penalty parameter controls the nonseparable region; misclassified samples may belong to one class or to several classes.
(3) Generate the $k$ decision functions, one per grain, of the form
$$f_m(x) = \operatorname{sign}\!\left(\sum_{i=1}^{l} \alpha_i^{m} y_i^{m} K(x_i, x) + b_m\right), \quad m = 1, \dots, k.$$
(4) The radial basis kernel function is
$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).$$
(5) The final if-then rules and the main fuzzy constraint propagation rules are proposed.
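The following minimal sketch illustrates steps (1)-(4) with scikit-learn's one-vs-rest wrapper around an RBF-kernel SVC; the grain count, penalty parameter, and kernel width are illustrative values rather than the paper's tuned settings, and the data are synthetic.

```python
# Hedged sketch of the one-vs-rest construction outlined above; k, C, and gamma
# are illustrative, and the feature matrix is randomly generated toy data.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
k = 3                                   # (1) number of grains / classes to divide
X = rng.normal(size=(300, 4))           # toy feature matrix
y = rng.integers(0, k, size=300)        # toy grain labels

# (2) and (4): penalty parameter C and RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
clf = OneVsRestClassifier(SVC(kernel="rbf", C=10.0, gamma=0.5))
clf.fit(X, y)                           # (3) trains k decision functions, one per grain

print(clf.predict(X[:5]))               # label of the grain with the largest decision value
```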

It follows that cluster analysis can be considered a concrete implementation of the idea of granulation, and granulation is in turn a further abstraction of cluster analysis. Granular computing is a concrete realization of the ideas of granularity and hierarchy in machine problem solving, and its core concept is the multilevel, multiview granular structure [23]. The fundamental framework of granular computing consists of granules, granular layers, and granular structures. The most fundamental element is the granule, which is a collection of individuals described by internal properties and treated as a whole through external properties. An abstracted description of the problem space or of the samples under operation is called a granular layer: the set of granules obtained under some granulation criterion constitutes a granular layer whose internal granules share the same (or similar) granularity or properties. Granularity comes from the way people perceive the world; observing, defining, and transforming practical problems from different perspectives or levels is granular computing, and in different applications granularity can be interpreted as size, abstraction, complexity, and so on. Different granules can be ordered by their granularity. Each granule provides a local, partial description, and all granules in a layer combine to provide a global, complete description. Granular computations are often carried out on different granular layers. A multilevel granular structure describes the relationships and connections between granules, between granules and layers, and between layers; it is a relational structure consisting of the interconnections among granular layers. There are three modes of traversal, top-down, bottom-up, and center-out, which are also three common modes of human and computer information processing.

4. Support Vector Machine and Granular Computing Based Time Series Volatility Prediction

4.1. Empirical Modal Decomposition of Time Series Fluctuation Algorithms

Empirical modal decomposition (EMD) is a signal decomposition method whose principle is to decompose the original, complex price signal sequence into a finite number of simpler intrinsic mode functions (IMFs). Each IMF represents the information contained in the original price series at a different scale level and can effectively reflect the embedded characteristics of the original series. EMD is a data processing method that smooths a complex signal series by decomposing it into several component series ordered by frequency: the first component has the highest frequency, the frequencies then decrease in order, and the last component has the lowest frequency, so that many component sequences at different feature-scale levels are obtained, each corresponding to an IMF component of the relevant frequency. The frequency of a component indicates the degree of fluctuation of the price series: the higher the frequency of a component, the more violent the corresponding fluctuations, and the lower the frequency, the more moderate the fluctuations. The last, most moderate component obtained from the decomposition generally represents the overall trend of the price series and is usually referred to as the residual component [24].
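To make the sifting idea concrete, the following is a deliberately simplified EMD sketch using only NumPy and SciPy; it uses a fixed number of sifting iterations and a crude extrema guard, whereas production implementations add proper stopping criteria and boundary treatment. All names and parameter values here are illustrative, not taken from the paper.

```python
# Simplified EMD sketch: repeatedly subtract the mean of the cubic-spline envelopes
# to sift out one IMF, then remove it from the residual and repeat.
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift(x, t, n_sift=10):
    """Extract one IMF from x with a fixed number of sifting iterations."""
    h = x.copy()
    for _ in range(n_sift):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:         # too few extrema: treat h as the trend
            return None
        upper = CubicSpline(t[maxima], h[maxima])(t)   # upper envelope
        lower = CubicSpline(t[minima], h[minima])(t)   # lower envelope
        h = h - (upper + lower) / 2.0                  # remove the local mean
    return h

def emd(x, t, max_imfs=8):
    """Decompose x into IMFs (highest frequency first) plus a residual trend."""
    imfs, residual = [], x.copy()
    for _ in range(max_imfs):
        imf = sift(residual, t)
        if imf is None:
            break
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual

t = np.linspace(0.0, 1.0, 512, endpoint=False)
price = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 5 * t) + 2.0 * t
imfs, trend = emd(price, t)
print(f"{len(imfs)} IMFs extracted; the residual captures the long-term trend")
```

Libraries such as PyEMD implement the full algorithm with stricter stopping criteria and end-effect handling and would normally be preferred over a hand-rolled version like this.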

EMD decomposition is simple, and its results are comparatively accurate because the mathematical processing is adaptive: the decomposition is generated automatically without human interference. EMD automatically produces different basis functions and the most appropriate number of components according to the fluctuations of each price series. In contrast, the wavelet decomposition method requires the basis function to be chosen in advance and typically needs several training trials to determine the most appropriate number of components. Time-series information granulation based on support vector machines introduces the idea of granular-computing support vector machines into time-series analysis and is a new research direction in that field [25]. The idea behind information granulation is to decompose a whole into small parts and then study the decomposed parts, each part forming a granule; in other words, information granules are elements that are similar, indistinguishable, or share a certain function. Information granulation of time series is the basis for compressing the scale of time-series data and for subsequent time-series analysis, interpretation, and modeling [26]. Compared with wavelet decomposition, EMD therefore has obvious advantages in the decomposition process. However, EMD also has drawbacks in practice, such as mode mixing and endpoint contamination. EMD constructs the upper and lower envelopes of the sequence with cubic splines, but the splines diverge near the boundary points of the original sequence, and as the decomposition proceeds, the endpoint effect gradually spreads inward and pollutes the whole sequence, interfering with the final result. The simplest remedy is to keep discarding the nonextreme parts at the endpoints during decomposition, but this wastes data and thus degrades the subsequent prediction. If the data at the boundary points are not deleted, data generally have to be appended at each end by various methods, and this extension process is influenced by human factors, which ultimately affects the decomposition. Figure 2 shows the process of variational modal decomposition.

The empirical modal decomposition algorithm can be understood as a set of adaptive filters that sieve the data layer by layer according to its essential scale characteristics, separating the characteristic time scales in order from small to large. After such decomposition, the superimposed waves in the original signal are removed and symmetric modal waveforms are obtained. In the EMD algorithm, the sequence IMF1 contains the component of the original sequence with the smallest period. The residual after subtracting IMF1 from the original sequence contains the part of the vibration signal whose period is larger than that of IMF1, so the average period of IMF2 is generally larger than that of IMF1. By analogy, the IMF sequences filtered out by the EMD algorithm have decreasing signal frequency, decreasing fluctuation intensity, and increasing average period, and the final residual is a constant or monotonic function that reflects the long-term trend of the sequence. The classification of unbalanced data has also become an important direction in data mining and machine learning, and a better solution to the classification of unbalanced data distributions is required to handle data classification more comprehensively. Usually, the total number of components obtained by EMD decomposition is approximately log2 N, where N is the number of data samples in the original series. However, since actual time series consist of both real signals and noise, empirical modal decomposition is applied to data containing noise, and for time-series data with jump changes in the signal, the jumps may cause scale loss, so the decomposed results may suffer from mode mixing.
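The ordering property and the rough log2 N rule can be checked numerically; the short sketch below estimates an IMF's average period from its zero-crossing count, using synthetic sinusoids as stand-ins for IMFs (the values are illustrative, not results from the paper's data).

```python
# Estimate the average period of an IMF-like component from its zero crossings
# and compare the expected component count with log2(N).
import numpy as np

def mean_period(imf):
    """Average period in samples, estimated from the zero-crossing count."""
    crossings = np.sum(np.diff(np.sign(imf)) != 0)
    return 2 * len(imf) / max(crossings, 1)

N = 512
t = np.linspace(0.0, 1.0, N, endpoint=False)
imf1 = np.sin(2 * np.pi * 40 * t)               # high-frequency stand-in for IMF1
imf2 = np.sin(2 * np.pi * 5 * t)                # lower-frequency stand-in for IMF2

print("expected component count ~ log2(N) =", np.log2(N))
print("IMF1 mean period (samples):", mean_period(imf1))   # short period
print("IMF2 mean period (samples):", mean_period(imf2))   # longer period
```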

When there is a jump in the scale of the original time-series signal, the EMD decomposition may suffer from mode mixing. Mode mixing (modal overlap) manifests itself in the decomposition results when a subsequence that should contain only one scale feature is not unique in that feature, so that signals of multiple scales are mixed in one sequence. In particular, influenced by factors such as sampling frequency, frequency components, and signal amplitude, mode mixing can easily occur when empirical modal decomposition is applied directly. Mode mixing mainly refers to the following two situations:
(1) A single IMF contains components of widely different scales.
(2) Signal components of the same scale appear in different IMFs.

To obtain a relatively stable speed for the vehicles within the cluster, we use the average speed of the vehicles within the cluster to characterize the cluster's stability and filter the vehicle nodes in the neighbor set by motion consistency, removing the nodes whose speed differs greatly from the average, so that the cluster can travel on the road in a relatively stable manner. Specifically, the average speed of the vehicles within the cluster at time $t$ can be expressed as
$$\bar{v}_i(t) = \frac{1}{N(t)} \sum_{n=1}^{N(t)} v_{j_n}(t),$$
where $N(t)$ denotes the number of elements in the neighbor set $N_{V_i}$ of $V_i$ at time $t$ and $V_{j_n}$ represents the $n$-th element of that set. If the velocity of $V_{j_n}$ satisfies
$$\left| v_{j_n}(t) - \bar{v}_i(t) \right| > \Delta v,$$
for a given speed-difference threshold $\Delta v$, the node is removed.
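As a small illustration of this filter (with an assumed, hypothetical tolerance value rather than one from the paper), the following computes the cluster's mean speed and keeps only the neighbors within that tolerance.

```python
# Cluster-stability filter sketch: drop neighbors whose speed deviates from the
# cluster's average speed by more than a hypothetical tolerance delta_v.
import numpy as np

speeds = np.array([62.0, 65.0, 64.0, 90.0, 63.5])    # neighbor speeds of V_i at time t (km/h)
delta_v = 10.0                                        # assumed tolerance, not from the paper

v_bar = speeds.mean()                                 # average speed of the cluster
kept = speeds[np.abs(speeds - v_bar) <= delta_v]      # motion-consistent neighbors
print(f"mean speed {v_bar:.1f} km/h, kept {len(kept)} of {len(speeds)} neighbors")
```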

The set of neighbor nodes of vehicle $V_i$ at moment $t$ can be expressed as
$$N_{V_i}(t) = \{\, V_j \mid d_{ij}(t) \le R,\ j \ne i \,\},$$
where $d_{ij}(t)$ is the distance between $V_i$ and $V_j$ at time $t$ and $R$ is the communication range.

The average end-to-end delay reflects the effectiveness of the protocol and can be determined as
$$\bar{D} = \frac{1}{P} \sum_{p=1}^{P} \left( t_p^{\mathrm{recv}} - t_p^{\mathrm{send}} \right),$$
where $P$ is the number of successfully delivered packets and $t_p^{\mathrm{send}}$ and $t_p^{\mathrm{recv}}$ are the sending and receiving times of packet $p$. When the input signal of the network is the $k$-th training sample $X_k$, the output value of the $j$-th neuron after the nonlinear transformation of the $i$-th hidden-layer neuron is
$$y_j(X_k) = \sum_{i} w_{ij}\, \varphi_i(X_k), \qquad \varphi_i(X_k) = \exp\!\left(-\frac{\|X_k - c_i\|^2}{2\sigma_i^2}\right),$$
where $c_i$ and $\sigma_i$ are the center and width of the $i$-th hidden-layer neuron and $w_{ij}$ is the weight connecting it to the $j$-th output neuron.

To address mode mixing in the empirical modal decomposition algorithm, Huang proposed the interruption (intermittency) detection method: after each decomposition, the final result is analyzed and judged, and if mode mixing is found, an appropriate interruption scale is selected to filter the signal, which is then decomposed again. However, interruption detection is an a posteriori judgment, which may lead to scales that are genuinely contained in the signal being incorrectly filtered out; the adaptiveness of empirical modal decomposition itself is thereby greatly weakened, so in some cases this method has shortcomings that affect its effectiveness. In the direction of classification algorithms, the approach is to improve and extend existing algorithms for imbalance problems based on their identified flaws and deficiencies, so as to improve the ability to handle imbalanced classification. The use of a pseudo-signal has also been proposed to solve the mode-mixing problem, introducing a pseudo-signal to prevent an IMF from covering too wide a frequency band, but this method again requires human subjective judgment and suffers from the same weakening of adaptivity.

4.2. Time Series Volatility Prediction Model Based on Support Vector Machine Grain Calculation

Support vector machine-based time-series information granulation introduces the idea of granular-computing support vector machines into time-series analysis and is a new research direction in the field. The idea of information granulation is to decompose a whole into small parts and then study the parts obtained from the decomposition; each divided part is a grain. Put another way, an information grain is a set of elements that are similar, indistinguishable, or share some function and are therefore combined. Information granulation of time series is the basis for compressing the size of time-series data and for subsequent time-series analysis, interpretation, and modeling; its research framework is shown in Figure 3.

Granulating a time series mainly includes two steps: information grain division and information grain description. Information grain division divides the time series into several small subsequences, each of which is called an information grain; information grain description constructs a method that effectively characterizes each grain obtained from the division. Through the information granulation of a time series, the research object is abstracted from the low-level, fine-grained, original high-dimensional time series to a high-level, coarse-grained, low-dimensional granular time series, and the constructed information grains portray and reflect the local features of the original data, achieving efficient dimensionality reduction and laying the foundation for subsequent data mining work. Several scholars have studied the information granulation of time series and achieved useful results.
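A minimal sketch of the two steps, assuming fixed-length windows and a linear description per grain (the paper itself divides windows by fluctuation points, so the window rule here is only a placeholder):

```python
# Information granulation sketch: divide the series into time windows (grain division)
# and describe each window by a fitted straight line (grain description).
import numpy as np

def granulate_linear(series, window=20):
    """Return (slope, intercept, start_index) for each time-window information grain."""
    grains = []
    for start in range(0, len(series) - window + 1, window):
        segment = series[start:start + window]
        slope, intercept = np.polyfit(np.arange(window), segment, deg=1)
        grains.append((slope, intercept, start))
    return grains

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=200))        # toy "stock price" series
grains = granulate_linear(series, window=20)
print(f"{len(series)} points compressed into {len(grains)} linear information grains")
```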

By reviewing the published research results, existing work on time-series information granulation can be divided into two aspects: (1) time-axis information granulation of time series, that is, solving the problem of effectively dividing and representing time windows, and (2) theoretical domain information granulation of time series, that is, solving the problem of effectively dividing and representing the theoretical domain.

Time-axis information granulation divides the time series into time windows according to its change characteristics on the time axis by some method; the subsequence in each time window is regarded as an information grain, and the subsequence in each divided window is then characterized effectively. The resulting interval information grain achieves full coverage of the data samples in its time window, as shown in Figure 4.
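As a rough illustration of such interval grains (again with fixed-width windows rather than the adaptive division the text calls for), each window can be summarized by the [min, max] interval of its samples, which by construction covers every sample in the window:

```python
# Interval-type time-axis granulation sketch: summarize each window by [min, max].
import numpy as np

def interval_grains(series, window=20):
    """Summarize each fixed-width time window by the interval [min, max] of its samples."""
    n_windows = len(series) // window
    trimmed = series[:n_windows * window]
    return [(seg.min(), seg.max()) for seg in np.array_split(trimmed, n_windows)]

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200))
for i, (lo, hi) in enumerate(interval_grains(series), start=1):
    print(f"grain {i}: [{lo:.2f}, {hi:.2f}]")
```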

The theoretical domain information granulation of time series divides the time series into several theoretical domain intervals according to its variation characteristics over the domain by some method; each interval is regarded as an information grain, and the divided intervals are then characterized effectively. Research on theoretical domain information granulation is mainly of four types: the first is the equal-interval domain division method; the second is the equal-frequency domain division method; the third is the clustering-based domain division method; and the fourth is the optimization-based domain division method. The main research methods of time-series information granulation on the time axis are interval-based, clustering-based, and fuzzy set-based time-axis information granulation. These methods usually divide the time series at fixed time intervals, that is, by hard division, and then represent the resulting subsequences (information grains), ignoring the change characteristics (structural characteristics) of the time series on the time axis, which does not conform to the essential meaning of information grains. The classification of unbalanced data is a key problem in machine learning and data mining: realistically, the data sets in a large number of classification problems are unbalanced and the categories differ in importance, and the sparse categories are usually the more worthwhile ones to study in a particular context. It is therefore necessary to design the information granulation method according to the changing characteristics of the time series on the time axis so that the obtained information grains have similar internal structures and are distinguishable from one another. Real-life time-series data are usually high-dimensional and noisy, so effectively granulating time series to reduce their data size and the impact of noise is crucial.
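The difference between the first two domain division strategies can be illustrated with a few lines of NumPy; the number of intervals and the skewed toy data are illustrative only.

```python
# Equal-interval (equal-width) versus equal-frequency (quantile) division of the
# theoretical domain of a value series.
import numpy as np

rng = np.random.default_rng(3)
values = rng.lognormal(mean=0.0, sigma=0.6, size=1000)   # skewed toy value distribution
n_intervals = 5

equal_width = np.linspace(values.min(), values.max(), n_intervals + 1)   # same width per interval
equal_freq = np.quantile(values, np.linspace(0, 1, n_intervals + 1))     # same sample count per interval

print("equal-interval cut points:", np.round(equal_width, 2))
print("equal-frequency cut points:", np.round(equal_freq, 2))
# A clustering-based division (e.g., k-means on the values) would instead adapt the
# cut points to dense and sparse regions, as the text describes.
```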

The information granulation of a time series on the time axis has essentially the same goal as traditional time-series dimensionality reduction: both aim to compress the data reasonably while keeping as many important features of the original series as possible. However, traditional dimensionality reduction does not compress the time series to a high degree and does not reflect its structural characteristics well, which affects the effectiveness of subsequent analysis. The granular time series obtained after granulation cannot yet participate directly in data mining; a similarity measure matched to the characteristics of the information grains must first be defined before subsequent analysis and computation. The decision tree, which is central to the multiclassification problem, is established by combining granular computing with the Huffman tree; exploiting the Huffman tree's property of minimizing the weighted path length, a sample can be assigned to its category in the shortest time. The Huffman model for multiclassification is constructed, the grains of the decision tree are analyzed, and granularity and the decision tree are used to construct a different classifier for each grain, from which the global model is finally assembled. In solving the multiclassification problem, granular computing, time-series analysis, and support vector machines are combined so that their respective advantages are inherited, their disadvantages complement one another, and a synergistic enhancement is obtained.
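A hedged sketch of the Huffman construction over class weights is given below, using only the standard library; the class names and weights are illustrative, and each internal node would correspond to one binary SVM separating the class groups in its two subtrees.

```python
# Build a Huffman tree over class weights with heapq: heavier (more frequent) classes
# end up closer to the root and are therefore separated by earlier binary decisions.
import heapq
import itertools

def huffman_tree(weights):
    """weights: {class_label: weight}. Returns the tree root as nested (left, right) tuples."""
    tie = itertools.count()                               # tie-breaker so heapq never compares subtrees
    heap = [(w, next(tie), label) for label, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)                 # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    return heap[0][2]

# Illustrative integer weights (e.g., normalized weights scaled by 1,000 as in Section 5.1).
class_weights = {"economy": 310, "sports": 240, "politics": 180, "art": 150, "medicine": 120}
print(huffman_tree(class_weights))
```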

5. Experimental Verification and Conclusions

5.1. Time Series Domain Division

In the multiclassification of text problems, the huge amount of information is a major obstacle. First, the problem is granulated: in the text background knowledge, the text content is categorized as environment, computer, transportation, education, economy, military, sports, medicine, art, politics, and so on. The different disciplines are then regarded as particles in a multilevel granular structure, and all of these categories belong to the same granular layer. The particles in the layer above are combinations of particles with similar granularity characteristics; they are the coarse particles of the lower layer, and the lower layer provides the fine particles of the upper layer. The processed weights sum to 1; for ease of computation, they are multiplied by 1,000 to become integers. The decision tree is constructed in VC++ 6.0, where the function CrtHuffTree(Huffnode ht[], int n) implements the tree construction and the function code(Huffnode ht[], Huffcode hcd[], Huffcodes s, int n) implements the encoding. From Figure 5, it can be seen that the theoretical domain subintervals obtained by the support vector machine-based method are consistent with the distribution characteristics of the data, that is, the divided subintervals are smaller in regions where the data are densely distributed and larger in regions where the data are sparse.

However, the amounts of data contained in the theoretical domain subintervals obtained by this method differ greatly: subintervals with dense data contain many samples and subintervals with sparse data contain few, so the subintervals are further optimized using the information granulation method below. From the decomposition results, the multiscale characteristics of the original time series can already be observed. From IMF1 to IMF10, the IMF components gradually change from high-frequency to low-frequency vibrations. Compared with the original series, the components with higher vibration frequencies keep the same fluctuation frequency but differ somewhat in amplitude; they represent the short-term effects triggered by normal fluctuations of the securities market and by irregular events. The components with lower vibration frequencies vary relatively smoothly, and each change in such a series represents a long-term impact triggered by a major event. The original series always fluctuates up and down around the residual term and shows an increasing trend in the long run.

5.2. Accuracy Comparison between Different Models

According to the results of the volume-price relationship model, the regression with the volume-price relationship as input is better and predicts the more volatile segments well, and the volatility of the volume-price relationship is closely related to the volatility of the stock's closing price. For the different step sizes, the loss function on the training set decreases and finally converges; the loss functions on the test set with step sizes 10 and 30 behave well, whereas the test loss with step size 50 decreases and converges poorly. In terms of the final mean absolute error, the training MAE for the different step sizes is close to 0.04, and the model with a step size of 30 performs best with an MAE of 0.011784855. Refining the classification decision for the local neighborhood data distribution, adjusting the posterior probability estimates, and using rough set approximation theory to handle extreme distribution cases eliminate the uncertainty caused by the lack of rare-class data. After the reclassification decision based on the refined instance distribution, the dynamic mean-neighbor classification algorithm based on neighborhood rough sets can assign query instances to classes more accurately.

In terms of training time, the longer the step-size window, the longer the training takes. Based on the above conclusion, a window with a step size of 30 was chosen for the subsequent model tests. According to the loss curves of the technical indicator model, the loss functions of both the training and test sets decrease and finally converge. According to the results of the technical indicator model, the test loss (MAE) with technical indicators as input is 0.01071587, slightly smaller than that of the volume-price relationship model, and the overall regression result is slightly better than that of the volume-price relationship model.

However, the fitted curves show that the technical indicator model predicts better in the more volatile segments and worse in the less volatile ones. The ARIMA-SVM-GRC method is applied to reduce the dimensionality of the normalized volume-price relationship and technical indicator data, extracting their data features and reducing the six-dimensional volume-price relationship data and the technical indicator data to three dimensions each. According to the principal component proportions, the first principal component of the volume-price relationship accounts for the largest share and incorporates all the information of the sample, while the first and second principal components of the technical indicators together account for about 50%. The change in the loss function value of each model is shown in Figure 6.

As shown in Figure 7, the ARIMA-SVM-GRC method increases the validity of the prediction results by performing deterministic prediction of runoff while also obtaining uncertainty (interval) prediction results from the experimental data. Usually, the prediction interval reflects the fluctuation of runoff, and the ARIMA-SVM-GRC method can handle discrete and nonlinear relationships. To further demonstrate the predictive power of the proposed ARIMA-SVM-GRC method, we compare it with other traditional data-driven methods in a cross-sectional manner.

In this section, we conduct comparison experiments with the BP, RBF, and SVQR methods, analyzing them cross-sectionally in terms of both point prediction and interval prediction. For point prediction, the BP and RBF algorithms give poor runoff predictions, with mean absolute percentage errors above 6%, while SVQR and ARIMA-SVM-GRC have smaller mean absolute percentage errors, both below 4%, with the quantile-based SVQR method achieving the best value of 2.7%. For the relative mean squared error, both BP and RBF are above 3%, while both SVQR and ARIMA-SVM-GRC remain below 2%. The mean absolute error is greater than 30 for the BP and RBF methods and below 30 for the SVQR and ARIMA-SVM-GRC methods.
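For reference, the three point-prediction metrics used in this comparison can be computed as below; the definitions follow common usage (in particular, the relative mean squared error here is taken as the mean of squared relative errors), and the numbers are toy values, not the experimental data.

```python
# Point-prediction metrics: MAPE, relative MSE, and MAE (common definitions assumed).
import numpy as np

def point_metrics(y_true, y_pred):
    err = y_pred - y_true
    mape = np.mean(np.abs(err / y_true)) * 100        # mean absolute percentage error (%)
    rel_mse = np.mean((err / y_true) ** 2) * 100      # relative mean squared error (%)
    mae = np.mean(np.abs(err))                        # mean absolute error
    return mape, rel_mse, mae

y_true = np.array([820.0, 790.0, 905.0, 860.0])       # toy observed values
y_pred = np.array([810.0, 805.0, 880.0, 872.0])       # toy predicted values
print("MAPE %.2f%%, relative MSE %.3f%%, MAE %.1f" % point_metrics(y_true, y_pred))
```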

For the interval prediction results, we compared the SVQR method and the ARIMA-SVM-GRC method. The coverage probabilities of both methods were above 95% and the bandwidths were less than 30%, indicating that both methods reflect the interval prediction results well. Compared with the SVQR method, the FIG-SVQR method improves the coverage probability by 0.59% and reduces the bandwidth by 0.99%, which indicates that the ARIMA-SVM-GRC method proposed in this section is better than the SVQR method in terms of interval prediction.
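The two interval-prediction criteria can be computed as follows; coverage probability is taken as the share of observations falling inside the predicted interval and bandwidth as the average interval width relative to the data range, which are common conventions rather than definitions stated in the paper. The arrays are toy values.

```python
# Interval-prediction metrics: coverage probability and relative average bandwidth.
import numpy as np

def interval_metrics(y_true, lower, upper):
    coverage = np.mean((y_true >= lower) & (y_true <= upper)) * 100            # % of points covered
    bandwidth = np.mean(upper - lower) / (y_true.max() - y_true.min()) * 100   # % of data range
    return coverage, bandwidth

y_true = np.array([820.0, 790.0, 905.0, 860.0, 840.0])      # toy observations
lower = y_true - np.array([30.0, 25.0, 40.0, 35.0, 20.0])   # toy lower bounds
upper = y_true + np.array([25.0, 30.0, 20.0, 15.0, 45.0])   # toy upper bounds
print("coverage %.1f%%, bandwidth %.1f%%" % interval_metrics(y_true, lower, upper))
```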

5.3. Comparison of Model Error Scenarios

According to the model errors, the prediction error of the decomposition-integration model is lower than that of the multifactor model, but the overall deviation is not large, and the decomposition-integration model can improve the prediction accuracy of the neural network model as much as possible. Therefore, when the two models are applied to more complex fluctuation problems and the difference between the prediction results of the multifactor model and the time-series decomposition-integration model is small, the prediction results of both models should be considered together; when the difference between the forecasts of the two models is large, the model with the smaller error can be used for forecasting. From the comparison in Figure 8, it can be seen that CMS takes the longest time, mainly because CMS must consider both intra- and interattribute similarity when computing the similarity of two objects; as introduced in Section 3, the algorithm has high time complexity and therefore takes the longest. The HM, OF, IOF, Eskin, and k-modes algorithms are the more classical algorithms.

After algorithm optimization, the running time of the ARIMA-SVM-GRC algorithm mainly depends on the baseline-scale clustering results and the scale conversion method and is affected by the number of clusters in the baseline-scale clustering results: the more clusters obtained, the longer the scale up-projection takes, and the fewer clusters, the shorter it takes, independent of the size of the original data set. Since the number of clusters obtained from the benchmark-scale clustering is much smaller than the sample size of the original data set, the ARIMA-SVM-GRC algorithm requires much less running time than the other comparison algorithms. Based on the dynamic mean query neighborhood, a more scientific and rigorous method is needed to calculate the local and global confidence intervals in the query neighborhood in order to determine the actual sample distribution of the minority-class data in the neighborhood. Since the running time is affected by the experimental environment and the degree of algorithm optimization, the CMS and Eskin methods in the experiments were implemented directly from the formulas in the original publications without algorithm optimization, so their running times are relatively long.

6. Conclusions

With the development of technology and network, multiclassification problems occupy an increasingly important position in people’s lives. The multiclassification problem is becoming more and more difficult to solve along with the increase of disturbing factors and the massive amount of data.

By studying the multiclassification problem, a support vector machine multiclassification model based on granular computing is proposed and combined with a time-series fluctuation prediction model to analyze and handle the multiclassification problem. In this paper, granular computing is incorporated into the data preprocessing model. The problems of large-scale data processing and low training speed can be effectively alleviated by using ideas such as the granulation and hierarchy of the granular computing triad, analyzing and transforming the problem from the perspective of granular computing. In the multiclassification problem, whether the decision tree is constructed reasonably is crucial: using the granularity of granular computing combined with the Huffman tree, an optimal binary decision tree is constructed so that the classification can be obtained in the shortest time and the problem of uneven samples within classes is mitigated, providing a practical method for the analysis of multiclassification problems.

On the time axis of time series, an information granulation method based on fluctuation points is proposed for the structural characteristics of low-frequency time series. The key to this method is the definition and identification of fluctuation points: fluctuation points are identified by operating on the original time series, the original series is divided into information grains at the fluctuation points, and each grain is then described by a linear function, completing the information granulation of the time series and transforming the original series into a granular time series. Since the number of information grains and the corresponding time-window sizes differ between granular time series, a new similarity measure based on linear information granulation is proposed to facilitate subsequent data mining on granular time series: first, to ensure a one-to-one correspondence between the linear information grains of different time series, a segmentation matching algorithm for linear information grains is proposed; second, for the matched linear information grains, a corresponding similarity metric algorithm is proposed.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest in this article.

Acknowledgments

This work was supported by the projects of Ningxia Natural Science Foundation (No. 2022AAC03315, time-series analysis and method research based on granular computing; No. 2022AAC03328, research on vegetable supply chain traceability based on combined RFID technology and service-oriented architecture (SOA); No. 2021AAC03235, fusion method of interest target detection in low illumination visible image and infrared image; No. 2022AAC03314, high-precision numerical simulation for the incompressible magnetohydrodynamic problems; and No. 2022AAC03301, high-order difference schemes on an adaptive algorithm for convection-diffusion reaction equations).