#### Abstract

Data mining refers to the process of obtaining information from a huge amount of data through algorithms, which provides decision support for people, from applying data to simple problems to extracting and discovering knowledge hidden in data. This study examines traditional data mining methods and their applications in order to increase the timeliness and usability of data extraction algorithms. Through data mining and motion gesture recognition, the most accurate algorithm results are given. This study begins by looking at data mining classification methods and their associated algorithms. Then, a motion attitude prediction model is built using a neural network, and the neural network is used to test the algorithm. The experimental results show that the combination of the single-stage and two-stage convolutional neural networks can achieve an average accuracy of 87.9% in the RTO mode, while in the WTO mode it can achieve an average accuracy of 95.7%.

#### 1. Introduction

In a rapidly developing society, all kinds of information data show explosive growth, resulting in a large backlog. The emergence of data mining technology solves the problem that it is difficult for people to find useful information in such massive data. Through ensemble data learning, the initial data are transformed into a suitable operational form and useful extracted data are derived. Finally, by implementing different data mining strategies to create useful patterns, one can make predictions and gain information about new data samples. Since the reform and opening up, under the new situation, the national physique has been facing new challenges. With breakthroughs in machine learning algorithms, the accuracy of motion gesture recognition technology is getting higher and higher, which has brought motion gesture recognition into all aspects of life.

The advent of databases has helped us store vast amounts of information and data sources, which contain a wealth of useful latent information that can serve as a basis for making sound decisions. Data mining is mainly used to classify and predict data in order to derive useful rules and criteria. Motion attitude prediction has a wide range of application prospects and considerable economic value. Motion posture prediction can not only achieve long-term effective monitoring of key areas but can also identify the behavior of the relevant personnel in a video, so as to provide targeted warnings; in the fields of human-computer interaction and virtual reality, it can analyze more complex user actions, increase the immersion and entertainment value of the device, and further meet the needs of users.

The innovations of this paper are as follows: (1) data mining is described, the classification methods in data mining are analyzed, and the specific algorithms are examined, along with their respective advantages and disadvantages. (2) Research on motion and attitude prediction based on the neural network is proposed; the data mining method gives it a certain adaptive ability, so that it can discover the knowledge crucial to decision-making from all of the data, which improves the intelligence of the system.

#### 2. Related Work

Data mining technology is a new technology that emerged in the 1980s, matured in the late 1990s, and has gradually been widely used in many fields. Buczak and Guven present a survey of machine learning and data mining methods for network analysis. They identify, read, and summarize papers representing each method, discuss how ML/DM methods are used for cybersecurity challenges, and provide some advice; however, there are many redundancies in their article [1]. Xu et al. take a broader perspective on privacy issues related to data mining and study various methods that help protect sensitive information. They introduce related research topics and review state-of-the-art methods [2]. The purpose of the study by Kavakiotis et al. was to provide a systematic review of the applications of data mining in their field. The applications highlighted in their selected articles show the usefulness of extracting valuable knowledge, leading to a deeper understanding of DM and new hypotheses for further research; however, their research remains at the surface [3]. Chaurasia and Pal study the performance of different classification techniques, using breast cancer data and classification accuracy for testing. In experiments, they compared three classification techniques, and the results showed that sequential minimal optimization has a higher prediction accuracy of 96.2% compared with the IBK and BF tree methods; however, their comparative technique has limitations [4]. The aim of the study by Emoto et al. was to elucidate gut microbiota profiles in patients with coronary artery disease. Operational taxonomic units were determined to be significant through data mining methods and general statistical comparisons; however, the database in the experiment needs to be improved [5]. Triguero et al. present the third major version of the KEEL software and describe the latest components added in KEEL 3.0; additionally, new interfaces in R have been incorporated to run the algorithms it contains. However, this method consumes too many resources [6]. Lei et al. explore the potential of using modularity-optimization community detection algorithms to identify important accident characteristics. The findings show that community detection algorithms are very effective at identifying clusters with identifiable characteristics; clustering helps reveal relationships that remain hidden when the entire dataset is analyzed, and association rule learning algorithms are useful for characterizing the resulting clusters. However, the experimental comparison in the article draws fewer conclusions [7]. A research study by Pourghasemi et al. proposes a series of data mining methods to map gully erosion susceptibility in the Aghemam watershed. Experimental data demonstrate the important role of ensemble modeling in consistently building accurate and general models, underscoring the need to examine ensembles; however, there are too many independent variables in the experiment and the experimental results will be biased [8].

#### 3. Classification Methods in Data Mining

##### 3.1. The Basic Process of Data Mining

Data mining can be understood as follows: the data involved must be real and massive, and the content under investigation is knowledge that is useful to users [9].

The premise of data mining is to clearly define the target problem and the purpose of data extraction. On the basis of a clear extraction purpose, knowledge discovery is carried out according to the following basic steps; the whole process of data extraction has many processing stages, as shown in Figure 1.

- Data cleaning: cleaning up incomplete, unclear, noisy, and random data for practical applications; deriving calculations to supplement preselected and incomplete data; correcting abnormal data; and deleting duplicate data.
- Data integration: selecting a variety of different data sources for physical or logical organic integration, which prepares for the subsequent series of data-processing steps. This step should address the differences in the physical form of the data caused by different data types.
- Data selection: finding and selecting relevant datasets from comprehensive datasets containing large amounts of data and exporting them to obtain the operands for the mining task, according to the work objectives.
- Data transformation: converting data types into formats suitable for mining. An important purpose of data transformation is to reduce the dimensionality of the data, that is, to find the features or variables that are really useful for the mining task.
- Data mining: data mining methods currently take on various forms because, in the process of research and development, technologies and research results from different industries are continuously integrated into data mining. From a statistical point of view, data mining mainly draws on statistical analysis techniques: regression analysis, time series analysis, linear and nonlinear analysis, nearest neighbor analysis, univariate and multivariate analysis, cluster analysis, etc. Using these methods, data with abnormal behavior can be detected and then interpreted using mathematical or statistical models that reveal the underlying patterns and knowledge behind the data. Knowledge discovery technology is a data mining technology that is distinct from statistical analysis; its main methods are support vector machines, genetic algorithms, artificial neural networks, rough sets, association rules, decision trees, and so on.
- Pattern evaluation: using specific measures of interest to identify and evaluate the patterns discovered through data mining, so as to find the truly valuable patterns that represent knowledge.
- Knowledge representation: interpreting the mined knowledge and transforming it, through visualization techniques, into a form that the user can ultimately understand.
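As a concrete illustration of the data-cleaning stage, the following sketch (pure Python; the toy dataset and the valid range are invented for illustration) discards incomplete records, clamps abnormal values, and deletes duplicates:

```python
# Toy sketch of the data-cleaning stage: discard incomplete records,
# correct out-of-range values, and delete exact duplicates.
# The valid range (0, 100) is an illustrative assumption.

def clean(records, valid_range=(0.0, 100.0)):
    lo, hi = valid_range
    seen, out = set(), []
    for rec in records:
        if any(v is None for v in rec):
            continue                                  # drop incomplete records
        rec = [min(max(v, lo), hi) for v in rec]      # clamp abnormal values
        key = tuple(rec)
        if key in seen:
            continue                                  # drop duplicate records
        seen.add(key)
        out.append(rec)
    return out

raw = [[1.0, 250.0], [1.0, 250.0], [3.0, None], [5.0, 40.0]]
cleaned = clean(raw)
```

The remaining stages (integration, selection, transformation) would operate on the output of such a cleaning step.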

##### 3.2. Classification Algorithm

The construction methods of classifiers include decision tree classification and artificial intelligence methods. According to the different research directions of classification algorithms, they can be divided into the following categories: neural networks, the Bayesian classification algorithm, the K-nearest neighbor algorithm, the decision tree classification algorithm, rough sets, genetic algorithms, etc. [10]. Statistical methods mainly include Bayesian classification and the K-nearest neighbor algorithm. The neural network method is mainly the BP algorithm. The BP algorithm model consists of the forward propagation of the signal and the backpropagation of the error and uses a nonlinear continuous transfer function.

###### 3.2.1. BP Neural Network Algorithm

The neurons of the input layer correspond to the characteristic attributes of the objects in the training set, and the neurons of the output layer correspond to the class labels of the sample objects. Each neuron performs two operations: first it computes a weighted sum of its input values, and then it passes the result through an activation function to obtain the output.

Forward propagation of information and backpropagation of errors constitute the learning process of the BP algorithm. In forward propagation, the signal enters through the input layer, is processed by the hidden layer, and is then transmitted to the output layer. When the actual output value is inconsistent with the expected result, it is corrected by backward error propagation: the error is propagated back from the output layer through the hidden layer and redistributed layer by layer to all neurons, so that the units of each layer receive an error signal, which is then used to correct the weights of each unit. These two phases are repeated, continuously adjusting the weights, until the error falls within the control range, i.e., below a certain value, and the loop ends; this procedure is usually called gradient descent [11].

In Figure 2, the left layer is the input layer and the right layer is the output layer, which together constitute the BP network.

We take the three-layer neural network (input layer, hidden layer, and output layer) as an example to deduce the back propagation algorithm, as shown in Figure 3.

The neurons in the layer to the left provide the inputs to neuron *j*; the synaptic weight $w_{j0}$ associated with the fixed input $y_0 = +1$ is equivalent to the bias of neuron *j*.

The induced local field produced at the input of the activation function of neuron *j* is as follows:

$$v_j = \sum_{i=0}^{n} w_{ji}\, y_i \tag{1}$$

where $y_i$ is the output of neuron *i* in the previous layer and $w_{ji}$ is the corresponding synaptic weight.

$\varphi(\cdot)$ is the activation function, which produces the output of neuron *j*:

$$y_j = \varphi(v_j) \tag{2}$$

Error backpropagation derivation process: $y_j$ and $d_j$ are the actual output and the expected output value, respectively; the error signal is then generated as follows:

$$e_j = d_j - y_j \tag{3}$$

To make the error function continuously differentiable, the mean square error is minimized; the instantaneous error energy of neuron *j* is defined as follows:

$$E_j = \frac{1}{2}\, e_j^2 \tag{4}$$

Adding the error energies of all output layer neurons gives the error energy of the entire network:

$$E = \frac{1}{2} \sum_{j \in C} e_j^2 \tag{5}$$

In formula (5), *C* is the set of all output layer neurons.

The BP algorithm minimizes *E* by repeatedly changing the weights using gradient descent. By the chain rule, the gradient of *E* with respect to the weight $w_{ji}$ is calculated as follows:

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial e_j}\,\frac{\partial e_j}{\partial y_j}\,\frac{\partial y_j}{\partial v_j}\,\frac{\partial v_j}{\partial w_{ji}} \tag{6}$$

The partial derivative $\partial E / \partial w_{ji}$ represents a sensitivity factor that determines the direction in which the synaptic weight $w_{ji}$ is searched for in the weight space.

Differentiating both sides of equation (4) with respect to $e_j$, we have

$$\frac{\partial E}{\partial e_j} = e_j \tag{7}$$

Differentiating both sides of equation (3) with respect to $y_j$, we have

$$\frac{\partial e_j}{\partial y_j} = -1 \tag{8}$$

Differentiating both sides of equation (2) with respect to $v_j$, we have

$$\frac{\partial y_j}{\partial v_j} = \varphi'(v_j) \tag{9}$$

Differentiating both sides of equation (1) with respect to $w_{ji}$, we have

$$\frac{\partial v_j}{\partial w_{ji}} = y_i \tag{10}$$

Substituting the abovementioned formulas into equation (6), we have

$$\frac{\partial E}{\partial w_{ji}} = -e_j\,\varphi'(v_j)\, y_i \tag{11}$$

The negative sign in formula (11) accounts for gradient descent in the weight space, and *σ* is the learning rate. This leads to the following:

$$\Delta w_{ji} = -\sigma\,\frac{\partial E}{\partial w_{ji}} = \sigma\,\delta_j\, y_i \tag{12}$$

Among them, the local gradient $\delta_j$ is defined according to the LMS algorithm as follows:

$$\delta_j = -\frac{\partial E}{\partial v_j} = e_j\,\varphi'(v_j) \tag{13}$$

Similarly, the local gradient of the hidden layer neuron *j* can be obtained as follows:

$$\delta_j = \varphi'(v_j) \sum_{k} \delta_k\, w_{kj} \tag{14}$$

where *k* runs over the neurons in the layer to the right of neuron *j*.

Therefore, the correction value is as follows:

$$\Delta w_{ji} = \sigma\,\delta_j\, y_i \tag{15}$$

That is, the weight correction value is equal to the product of the learning rate, the local gradient, and the output signal of neuron *i*.
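The derivation above can be turned into code. The following minimal sketch (pure Python; the layer sizes, learning rate, and XOR toy data are illustrative choices, not taken from the paper) implements the forward pass of equations (1)-(2) and the weight updates of equations (13)-(15) with a sigmoid activation:

```python
import math
import random

def phi(v):            # sigmoid activation, eq (2)
    return 1.0 / (1.0 + math.exp(-v))

def phi_prime(y):      # derivative of the sigmoid, expressed via y = phi(v)
    return y * (1.0 - y)

class BPNet:
    def __init__(self, n_in, n_hid, n_out, lr=0.5, seed=0):
        rnd = random.Random(seed)
        # w[j][i]: weight from unit i (bias input at the last index) to unit j
        self.w1 = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rnd.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
        self.lr = lr

    def forward(self, x):
        self.y0 = list(x) + [1.0]                              # fixed +1 bias input
        self.y1 = [phi(sum(w * i for w, i in zip(row, self.y0)))  # eq (1)-(2), hidden
                   for row in self.w1] + [1.0]
        self.y2 = [phi(sum(w * i for w, i in zip(row, self.y1)))  # eq (1)-(2), output
                   for row in self.w2]
        return self.y2

    def backward(self, d):
        # output-layer local gradients, eq (13): delta = e * phi'(v)
        d2 = [(dj - yj) * phi_prime(yj) for dj, yj in zip(d, self.y2)]
        # hidden-layer local gradients, eq (14): delta_j = phi'(v_j) * sum_k delta_k w_kj
        d1 = [phi_prime(self.y1[j]) * sum(d2[k] * self.w2[k][j] for k in range(len(d2)))
              for j in range(len(self.y1) - 1)]
        # weight corrections, eq (15): dw = lr * delta * y_i
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * d2[k] * self.y1[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * d1[j] * self.y0[i]

net = BPNet(2, 4, 1)
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(5000):
    for x, d in xor:
        net.forward(x)
        net.backward(d)
```

XOR is a classic nonlinear problem that a single-layer network cannot solve; after training, the squared error over the four patterns should be small, although, as discussed next, gradient descent can in principle stall in a local minimum.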

The traditional BP neural network algorithm, although strong in learning ability, also has shortcomings, such as the local minimization problem involved in this paper. From a mathematical point of view, the traditional BP neural network is a local search optimization method applied to a complex nonlinear problem. The weights of the network are gradually adjusted in the direction of local improvement, which can cause the algorithm to fall into a local extremum, with the weights converging to a local minimum point, resulting in the failure of network training.

###### 3.2.2. Decision Tree Classification Algorithm

Decision trees have a dramatic effect on processing large amounts of data and do not require much expertise in tree formation. Therefore, it is usually used in data mining applications, and the knowledge obtained in the resulting tree structure is intuitive and easy to be accepted by people [12].

Assume that the training set *S* is a collection of arbitrary sample objects and that $C_i$ (*i* = 1, 2, …, *m*) are *m* different classes. Letting $S_i$ be the set of objects in dataset *S* that belong to class $C_i$, |*S*| is the number of data objects in dataset *S* and $|S_i|$ is the number of data objects in $S_i$.

Classifying all the tuples, the entropy of the set *S* is expressed as follows:

$$\mathrm{Info}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \tag{16}$$

Among them, $p_i = |S_i| / |S|$ is the probability that an arbitrary object in *S* belongs to class $C_i$.

Assume that attribute *A* contains *n* different values $\{a_1, a_2, \ldots, a_n\}$; accordingly, *S* is divided into *n* partitions $\{S_1, S_2, \ldots, S_n\}$, and the information required after splitting *S* on *A* can be obtained by weighting the entropy of the *n* partitions as follows:

$$\mathrm{Info}_A(S) = \sum_{j=1}^{n} \frac{|S_j|}{|S|}\,\mathrm{Info}(S_j) \tag{17}$$

The information gain is as follows:

$$\mathrm{Gain}(A) = \mathrm{Info}(S) - \mathrm{Info}_A(S) \tag{18}$$
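The entropy and information-gain formulas can be computed directly. The sketch below (pure Python; the toy dataset of attribute values and class labels is invented for illustration) implements Info(S), the weighted split entropy, and Gain(A):

```python
import math
from collections import Counter

def entropy(labels):
    """Info(S) = -sum p_i * log2(p_i) over the class distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(attr_values, labels):
    """Gain(A) = Info(S) - sum_j |S_j|/|S| * Info(S_j), splitting on attribute A."""
    total = len(labels)
    partitions = {}
    for value, label in zip(attr_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
attr   = ["sunny", "sunny", "rain", "rain"]   # attribute that separates the classes
gain = info_gain(attr, labels)
```

A decision tree builder would call `info_gain` for every candidate attribute and split on the one with the highest gain.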

###### 3.2.3. Naive Bayesian Classification Algorithm

The principle of the naive Bayes algorithm is to assume that the presence or absence of a specific feature is independent of the presence or absence of other features [13].

The explanation of Bayes’ theorem is as follows: suppose *K* is a data object that we describe with *n* attribute values, and suppose *H* represents the hypothesis that *K* belongs to a certain class *Q*. *P*(*H*) is the prior probability, which is independent of the event *K*.

The Bayes rule is as follows:

$$P(H \mid K) = \frac{P(K \mid H)\,P(H)}{P(K)} \tag{19}$$

Its basic idea can be summarized as follows: assume a training set *Z* with *m* elements, each of which can be represented by an *n*-dimensional attribute vector; then $K = (k_1, k_2, \ldots, k_n)$ records the *n* measures of the *n* attributes. Assuming that there are sample classes $\{Q_1, Q_2, \ldots, Q_m\}$, the naive Bayes algorithm predicts, for an object *K* whose class is unknown, the class with the largest posterior probability. That is, *K* is assigned to class $Q_i$ if and only if

$$P(Q_i \mid K) > P(Q_j \mid K), \quad 1 \le j \le m,\ j \ne i \tag{20}$$
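The decision rule above can be sketched as a small classifier for categorical attributes. Laplace smoothing is added here as a practical assumption so that unseen attribute values do not zero the product; the toy weather-style data are invented for illustration:

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Predict the class Q maximizing P(Q) * prod_i P(k_i | Q)."""

    def fit(self, X, y):
        self.classes = Counter(y)                  # class prior counts
        self.counts = defaultdict(Counter)         # (class, attr index) -> value counts
        self.values = [set() for _ in X[0]]        # distinct values per attribute
        for xi, yi in zip(X, y):
            for i, v in enumerate(xi):
                self.counts[(yi, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        def score(q):
            p = self.classes[q] / sum(self.classes.values())       # P(Q)
            for i, v in enumerate(x):                              # smoothed P(k_i | Q)
                p *= (self.counts[(q, i)][v] + 1) / (self.classes[q] + len(self.values[i]))
            return p
        return max(self.classes, key=score)

X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "cool"]]
y = ["no", "no", "yes", "yes"]
pred = NaiveBayes().fit(X, y).predict(["rain", "mild"])
```

The conditional independence assumption is what lets the posterior factor into the per-attribute product computed in `score`.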

##### 3.3. Comparison of Different Classification Algorithms

The three algorithms mentioned above differ considerably in various respects and in efficiency, so it is necessary to select an appropriate algorithm according to the data characteristics and the final purpose of classification, in order to obtain a classification model with outstanding accuracy, time and space complexity, and classification speed.

The neural network classification algorithm has high classification accuracy and strong learning ability and can be used for feature extraction, but its learning process is long and opaque, so the process cannot be inspected or monitored. Decision tree classification algorithms are easy to understand and run relatively fast, but they are prone to overfitting, and handling missing data can be tricky. The naive Bayes classification algorithm is fast, requires little training time, and its results are simple and clear to interpret, but its classifier needs sufficient database support and threshold adjustment [14, 15].

Through the comparative analysis of the algorithms, it can be seen that, when faced with different problems, it is necessary to choose the mining algorithm best suited to the problem. Each classification algorithm has its own unique advantages and limitations; at the same time, the same classification problem can often be solved by several different classification algorithms. In contrast, the neural network algorithm has strong robustness and fault tolerance toward noisy data and, compared with other algorithms, a strong ability to process it. Moreover, because of its strong learning ability, it can easily find the classification patterns in the original data, and the neural network can continuously improve its own performance to increase classification accuracy and prediction ability.

#### 4. Experiment and Analysis of Motion Attitude Prediction Based on the Neural Network

##### 4.1. Magnitude Normalization of the Input Data

The eigenvalues of the sample data used in the experiment are shown in Table 1; they are a set of data on the firing effect of core components. The source of these data is the UCI machine learning repository.

The BP neural network algorithm is used to train on and learn the firing of the core components, in order to predict their firing effect. In one group of experiments, the original data were not first classified by order of magnitude, and the normalization preprocessing operation was performed directly. In another group, the raw data were first normalized by order of magnitude, and then the normalization preprocessing operation was performed. The results of the two groups of experiments were then compared. The neural network consists of 8 input layer nodes, 4 hidden layer nodes, and 5 output layer nodes, and the total number of samples is 20.

The error between the predicted result and the true value is shown in Figure 4. After unifying the order of magnitude of the data into the range [0, 10], the other routine preprocessing operations are performed. After preprocessing, the batch-learning BP neural network algorithm is applied to obtain the experimental results.

The experimental results show that the predicted value obtained by applying the BP neural network algorithm to the experimental group that does not use the order of magnitude normalization operation has a large error with the actual value. However, in the experimental group using the normalization operation of the order of magnitude, the error obtained by applying the BP neural network algorithm to it is small. It can be seen that the order-of-magnitude normalization operation makes the results of data mining have higher accuracy and better learning performance.
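The order-of-magnitude normalization described above can be sketched as a per-column rescaling into [0, 10]; the toy sample values below are invented for illustration:

```python
# Rescale each feature column into [lo, hi] (here [0, 10], as in the text),
# so that features with very different orders of magnitude become comparable.

def normalize_columns(samples, lo=0.0, hi=10.0):
    cols = list(zip(*samples))
    bounds = [(min(c), max(c)) for c in cols]
    return [
        [lo + (hi - lo) * (v - mn) / (mx - mn) if mx > mn else lo
         for v, (mn, mx) in zip(row, bounds)]
        for row in samples
    ]

# Two features differing by six orders of magnitude (illustrative values).
samples = [[0.001, 5000.0], [0.002, 7500.0], [0.004, 10000.0]]
scaled = normalize_columns(samples)
```

Without such scaling, the large-magnitude feature would dominate the weighted sums in equation (1) and slow or distort training, which is consistent with the larger errors observed in the unnormalized experimental group.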

##### 4.2. Experiment of the Motion Attitude Intelligent Prediction System

According to the different characteristics of various competitive sports, an intelligent sports training planning system is designed using database technology, data mining technology, and knowledge engineering technology. Guided by the theory of sports training, the system carefully designs the data table for each training item and includes an inspection and analysis component, so that the implementation of the training program can be diagnosed through analysis of the training results. On this basis, a knowledge-based system is also designed to systematically analyze different training methods and means and to provide the corresponding tacit knowledge as input, which helps coaches master and learn different training methods and means [16, 17].

The dataset used below is the open source dataset WISDM. The acceleration is sampled from a 3-axis accelerometer in a smartphone placed in a trouser pocket at a sampling rate of 20 Hz. The dataset contains a total of 36 testers and 6 categories of motion poses. Each sliding segmentation window contains 64 data points, and each data point corresponds to readings on the three axes [18].
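The sliding-window segmentation described above can be sketched as follows; the window length of 64 samples matches the text, while the stride of 32 samples and the fake data stream are illustrative assumptions:

```python
# Cut a stream of tri-axial accelerometer samples into fixed-length windows.
# Window length 64 follows the text; the 50% overlap (stride=32) is assumed.

def sliding_windows(samples, length=64, stride=32):
    """samples: list of (x, y, z) tuples; returns overlapping windows."""
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, stride)]

# Fake 10 seconds of 20 Hz data (200 samples), each sample an (x, y, z) tuple.
stream = [(0.1 * i, 0.0, 9.8) for i in range(200)]
windows = sliding_windows(stream)
```

Each resulting window, flattened to a 64 × 3 block, is the unit fed to the recognition network.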

This paper adopts two evaluation modes to evaluate the performance of the algorithm:

1. RTO mode: if a tester’s data are used in the test set, none of that tester’s data are used for training.
2. WTO mode: if a tester’s data are used in the test set, all of that tester’s data are divided into two parts, one for training (1/8 of all data for this tester) and the other for testing.

For the experiments, data from all testers in the dataset were used. For each experiment, the data of 5 testers were selected as testing data and the data of the remaining 31 individuals were used as training data. The only difference between the RTO and WTO modes is that in the WTO mode, the training data contain 1/8 of the test subjects’ data. In order to effectively quantify the experimental results, this paper uses the F-score to evaluate the effect of the algorithm [19–21].
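The RTO/WTO splits described above can be sketched as follows; the per-tester data layout (36 testers with 80 samples each) is invented for illustration, and in WTO mode 1/8 of each held-out tester's data is moved into the training set:

```python
import random

def split(data_by_tester, test_ids, wto=False, seed=0):
    """data_by_tester: dict tester_id -> list of samples.
    RTO: held-out testers contribute only to the test set.
    WTO: 1/8 of each held-out tester's samples is moved into training."""
    rnd = random.Random(seed)
    train, test = [], []
    for tid, samples in data_by_tester.items():
        if tid not in test_ids:
            train.extend(samples)
            continue
        samples = samples[:]
        rnd.shuffle(samples)
        if wto:
            k = len(samples) // 8            # 1/8 of this tester goes to training
            train.extend(samples[:k])
            test.extend(samples[k:])
        else:
            test.extend(samples)             # RTO: tester fully held out
    return train, test

# Illustrative layout: 36 testers, 80 samples each; 5 testers held out.
data = {tid: [(tid, i) for i in range(80)] for tid in range(36)}
rto_train, rto_test = split(data, test_ids={0, 1, 2, 3, 4})
wto_train, wto_test = split(data, test_ids={0, 1, 2, 3, 4}, wto=True)
```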

The F-score is a scoring formula commonly used to evaluate the performance of classification models. In a classification model, there are generally four cases:

1. TP (true positive): a positive sample is correctly predicted as positive
2. FP (false positive): a negative sample is incorrectly predicted as positive
3. TN (true negative): a negative sample is correctly predicted as negative
4. FN (false negative): a positive sample is incorrectly predicted as negative
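From these four cases, precision, recall, and the F-score can be computed as follows (a minimal sketch with invented labels):

```python
# F-score from the four classification cases:
# precision = TP / (TP + FP), recall = TP / (TP + FN), F = 2PR / (P + R).

def f_score(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
f = f_score(y_true, y_pred)
```

For the multi-class motion poses, the same computation is applied per class, treating one pose as the positive class at a time.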

The WTO mode mainly evaluates the effect of the proposed method for motion data augmentation; the training data contain 1/8 of the data of the testers included in the test set (these data are no longer included in the test set). The F-scores of the intelligent recognition algorithms of the single-stage CNN and the two-stage neural network in the RTO mode are shown in Tables 2 and 3, and the computational cost of the two algorithms is compared in Figure 5.

In addition, the data augmentation algorithm for gesture recognition can generate more artificial training data, which helps to further improve the F-score. Figures 6 and 7 show the F-scores of the intelligent recognition algorithms of the single-stage and two-stage convolutional neural networks using the data enhancement algorithm in the WTO mode.

Figure 7 shows the F-score of the two-stage convolutional neural network intelligent recognition algorithm in the WTO mode using the data enhancement algorithm. Compared with the F-score of the same algorithm in the RTO mode without it, the F-score is significantly improved, especially for the two motion poses of going up and down stairs (by up to 6.3% and 6.0%, respectively).

Under the WTO mode, the single-stage convolutional neural network intelligent recognition algorithm reached its highest average value of 93.9 in runs 11–15, and the two-stage convolutional neural network reached a high value of 94.5 in runs 16–20.

##### 4.3. Combination of the Single-Stage Convolutional Neural Network and Two-Stage Convolutional Neural Network Algorithm

By analyzing the data in Table 4, it is found that in the RTO mode, in terms of accuracy, the two-stage convolutional neural network intelligent identification algorithm is higher than the single-stage convolutional neural network intelligent identification algorithm. As far as the F-score is concerned, the two-stage convolutional neural network intelligent recognition algorithm is the highest for most motion poses (except walking), most notably going up and down stairs [22]. In the WTO mode, the accuracy and F-score of the two-stage convolutional neural network intelligent identification algorithm are higher than those of the single-stage algorithm. Larger improvements can also be achieved when combining this algorithm with a data augmentation algorithm for motion poses, especially for going up and down stairs. Overall, the average accuracy of the proposed algorithm can reach 87.9% in the RTO mode and 95.7% in the WTO mode.

#### 5. Discussion

This paper studies the related concepts and algorithms of the BP neural network algorithm and the naive Bayes classification algorithm in data mining technology. At the same time, it also studies the combination of intelligent motion research and computer technology and applies the neural network method to the intelligent motion gesture recognition system through synthesis or improvement.

For the two evaluation modes of RTO and WTO, the two-stage neural network intelligent recognition algorithm and the data enhancement algorithm for motion gestures are tested and analyzed in detail, showing the improvement these two algorithms bring to intelligent motion gesture recognition [23]. Experiments on the algorithm show that this thesis not only improves the algorithm but also lays a foundation for embedding the entire intelligent motion gesture recognition system into portable devices.

Motion gesture recognition has been closely related to human beings since its birth and is often used in all aspects of life such as behavior supervision, medical diagnosis, elderly monitoring, and intelligent interaction. After being combined with deep learning, its accuracy has been further improved, and it has emerged in various fields, bringing great convenience to human life. In addition, with the further development of deep learning, the types of motion gestures covered by motion gesture recognition are also more extensive, which can provide more complex and detailed information for various fields.

#### 6. Conclusions

This paper studies data mining algorithms and their applications, focusing on currently popular data mining algorithms. Sports research is a very persistent topic, and its ultimate goal is to promote the physical health of the people. The characteristics of the national physique change with the times: different eras have different factors affecting physical health, and likewise, as research develops, research methods and the means of promoting physical health are constantly updated. This research attempts to use modern science and technology to combine intelligent sports research with computer technology, proposes research on a motion attitude prediction algorithm based on data mining and an intelligent sports training model, and uses Internet technology to provide online software services, in order to promote national sports. Since this is an experimental study, some research content needs to be further strengthened in follow-up research.

#### Data Availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.