Abstract

To improve the guidance of martial arts movements, a decomposition method for martial arts movements based on image recognition is proposed. First, visual image targets of the martial arts action are sampled, and most of the noisy background is eliminated through a morphological gradient operation. The contour edge of the human body is then extracted from each video frame and accumulated into a single image, and a grid-based histogram of oriented gradients (HOG) is computed on the accumulated edge image to obtain the action feature vector. Second, the improved dynamic time warping theory, combined with the angle changes of each joint over the action time sequence, is used to identify the joint change sequences of the various martial arts movements and thereby decompose the actions. The experimental results show that image recognition can effectively decompose martial arts movements.

1. Introduction

At present, with the vigorous development of sports, the rules of martial arts competitions are constantly improving in the process of internationalization, and martial arts routine competitions are developing toward rapid movement combinations and difficult movements [1, 2]. Real-time extraction and optimization of human martial arts action images can extract the contour features of those images, and decomposition learning of martial arts actions based on these features is a fundamental way to meet these demands, which has attracted the attention of many experts and scholars [3–5]. Because the decomposition of human martial arts movements has far-reaching significance, it has become a research focus in the field, and many good results have appeared [6–9]. The ultimate goal of computer vision research is to give computers the same visual perception as humans, so that they can recognize external objects, perceive scenes, and analyze the activities of surrounding things. Using computers to analyze and understand human movements is challenging. The subject involves multiple disciplines, including cognitive science, pattern recognition, and machine learning, and has academic research value. Applying this technology enables a computer to observe the external world, make decisions through automatic analysis and understanding of image information, and adapt better to its environment [10–14]. Three-dimensional images of martial arts movements have become a very active topic in image research. Each such image has its own fixed characteristics, located at certain points in the image, but many uncertain interfering factors produce insignificant areas in the image, which makes three-dimensional images of martial arts movements harder to study. Adaptive enhancement of these insignificant areas is the most effective way to address the problem and has become the focus of many scholars’ research.

Literature [15] pointed out that the analysis and understanding of human actions in videos can be summarized as follows: movement and performance characteristics are extracted from the video, reasonable judgments are made on the type of action and the direction in which it occurs, the semantic information corresponding to the action is analyzed in detail, and finally the person’s behavioral intentions are determined. Action representation, action segmentation, action recognition, and action positioning are the most important topics in human action analysis and understanding. Action representation refers to obtaining a feature vector that describes the video by extracting information such as the motion and structure of the input video. In [16], action segmentation is defined as dividing a continuous video stream into several subvideo segments that each contain only one action instance. Human action recognition is realized by establishing the association between the video content and the action category. Literature [17] proposed a real-time extraction method for images of difficult human martial arts actions based on extended edge features. This method first establishes a network structure for the action image based on the principle of lateral suppression competition, connects each human action pixel in the grid with its surrounding pixels, extracts the edge features of the action image, and calculates the suppression competition coefficient of each pixel, on which basis the real-time extraction is completed. This method is relatively simple, but it has considerable limitations. Literature [18] focuses on a real-time extraction method based on the improved Canny method. This method first integrates the color features of the action image to roughly locate its outline, then performs edge detection, using the Canny method to calculate the gradient amplitude, and extracts the difficult martial arts action images in real time. This method has high extraction efficiency, but it cannot extract accurate contour information from images of difficult martial arts actions, so its real-time action extraction suffers from large errors. Literature [19] proposed a real-time extraction method based on HSV color space and mathematical morphology. This method first uses color information to filter the areas that may contain the difficult action images, then uses mathematical morphology to generate connected areas and identify the correct action image areas, and finally applies the Radon transform for tilt correction to complete the real-time extraction. This method has high extraction accuracy, but it likewise cannot extract accurate contour information from martial arts action images, so its real-time action extraction also suffers from large errors.

Therefore, this article proposes a method of decomposing and recognizing human martial arts movements based on image recognition [20]. First, a morphological gradient operation removes most of the noisy background and yields the edge of the human body contour; the contour edge of each video frame is accumulated into a single image, and a grid-based HOG is computed on the accumulated edge image to obtain the action feature vector. Second, combined with the change characteristics of each joint angle over the action time sequence, the improved dynamic time warping theory is used to measure the similarity of the joint change sequences among the various martial arts actions. A classifier is then designed, and the time-varying feature data of the human actions in the image are input into it to complete the decomposition of martial arts actions based on image recognition. Finally, simulation tests are carried out to verify the effectiveness of the method [21–25].

2. Sampling of Martial Arts Action Visual Image Targets

2.1. Principles of Martial Arts Action Image Target Collection

In the real-time acquisition of human martial arts action image targets, the spatial position of the martial arts action image is determined first, and the spatial position of each body part in the image is tracked. A calculation model for the positions of the body parts is then built, the contour features of the action image are extracted, the different features are matched, and the set of matching sequences of nearest-neighbor action images is selected as the result of the real-time extraction of difficult human martial arts action images. The specific steps are as follows.

Suppose p denotes a reflective marker observed by the camera. Formula (1) is used to convert the world coordinates of the reflective marker p to the three-dimensional coordinates of the camera:

In the above formula, represents the external parameters of the camera during the calibration process.
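As a hedged illustration of the world-to-camera conversion, the sketch below uses the standard rigid-body form P_c = R · P_w + t. The names `rotation`, `translation`, and `world_to_camera`, and the numeric extrinsics, are assumptions for illustration only; they are not the notation or calibration values of formula (1).

```python
def world_to_camera(point_w, rotation, translation):
    """Map a 3-D world point into camera coordinates: P_c = R * P_w + t."""
    return tuple(
        sum(rotation[row][col] * point_w[col] for col in range(3)) + translation[row]
        for row in range(3)
    )

# Hypothetical extrinsics: a 90-degree rotation about the z-axis plus a shift along x.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.5, 0.0, 0.0]

marker_world = (1.0, 2.0, 3.0)   # world coordinates of the reflective marker p
marker_camera = world_to_camera(marker_world, R, t)
print(marker_camera)
```

In practice the rotation and translation would come from the camera calibration step mentioned above rather than being written by hand.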

Assuming that represents the reference point of the camera image, equation (2) is used to track the spatial position of each body part in the human martial arts action image:

In the above formula, f(P) represents its gray value information, N(f) represents the state vector of the martial arts action image, L(ni) represents the target area of the human martial arts action image, and λ(ξ) represents the edge of the image.

Assuming that σ[j] represents the position of the space where the human body martial arts difficult action image part is located, σ[j] represents its direction, and Λt represents the body’s movement speed; then, formula (3) is used to construct the calculation model of the human body part position:

In the above formula, p(k) represents the acceleration of the body part and represents the coefficient of the camera’s radial distortion.

Suppose that Σ(q) represents the position offset of the martial arts action image, and η(λ) represents the gravity acceleration of the action. Equation (4) is used to extract the contour features of the martial arts action image, the different features are matched, and the equation is given as follows:

In the above formula, fij represents the correlation coefficient of the contour edge matching of the human martial arts action image, represents the reference image of the human martial arts action image, ω(μ) represents the real-time image, and E(j) represents the reference image and real-time image of the action image. Assuming that d(p) represents the similarity measurement threshold between the two images, equation (5) is used to match different features, and a set of action image matching sequences that are closest to each other is selected:

In the above formula, V(E) represents the length of the matching sequence of human martial arts action images and represents the optimal matching sequence of images.

The formulas above describe the principle of real-time extraction of martial arts action images, and this principle is used to complete the real-time extraction of difficult martial arts action image targets.

3. Image-Based Martial Arts Action Decomposition and Recognition Process

3.1. Action Feature Extraction Based on Cumulative Edge Image

The combined morphological operation can eliminate part of the background in the video image while preserving the morphological features intact, so that a silhouette image of the human body is obtained; in this respect it is very similar to background subtraction [26–29]. The combined morphological operation can be expressed as follows:

In the above formula, G(x,y) represents the image processed through the combined morphological operations, F(x,y) represents a frame of the original video, and B(x,y) represents the structuring element, which is applied to F(x,y) through the closing operation. Through the closing operation of formula (6), areas of the original image that are darker than the background and smaller than the structuring element can be removed. An appropriate structuring element is selected, the remaining background image is obtained through the closing operation, and it is subtracted from the original image, which completes the target extraction. A single frame of a human martial arts action video cannot fully express an action; in general, the features of multiple frames must be extracted to describe a human action completely. Because the action rate varies, the number of video frames may differ even for the same action. To handle these rate changes, the paper accumulates the gray features of every edge image within the same time window into a single image and extracts features from the resulting cumulative edge image to represent human martial arts movements. The formula for accumulating the edge image at the point (x, y) at time t can be described as follows:

The cumulative edge image is formed by multiplying the binary image E(x, y) and the morphological gradient image G(x, y) at each pixel to obtain the edge image I(x, y) with gray information, and then accumulating all edge images into one image, instead of accumulating the binary image of every frame. The pixels of the binary image E(x,y) take only the values 0 and 1. Where the pixel of E(x,y) corresponding to the edge image I(x,y) is 1, the gray value at that point carries more information than the binary image does. Because the accumulated image already contains the edge information of many frames, there is no need to extract edge features again, and the histogram of oriented gradients can be computed directly at each point of the accumulated edge image. Calculating the grid-based histogram of oriented gradients means computing the oriented gradient at all points of the cumulative edge image: the image is divided into i × i spatial grids, a histogram vector is calculated on each grid, and the resulting feature vector at that scale, which summarizes the local shape of the target, is used as the action feature of the cumulative edge image.
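The steps of this subsection can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: G is taken as the closing of F minus F, E is G thresholded at zero, accumulation is a plain sum of I = E · G over the frames, and the HOG uses unsigned orientations with a single global normalization. All function names, the 3 × 3 structuring element, and the toy frames are hypothetical.

```python
import math

def _filter(img, size, pick):
    """Apply a size x size min/max filter (flat square element), clipped at the borders."""
    h, w, r = len(img), len(img[0]), size // 2
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - r), min(h, y + r + 1))
                  for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def closing_residue(frame, size=3):
    """Assumed form of G: closing of F by B, minus F (keeps small dark details)."""
    closed = _filter(_filter(frame, size, max), size, min)  # dilation, then erosion
    return [[c - f for c, f in zip(crow, frow)] for crow, frow in zip(closed, frame)]

def accumulate_edges(frames, size=3, threshold=0):
    """Sum I = E * G over all frames, where E is G binarised at `threshold`."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0] * w for _ in range(h)]
    for frame in frames:
        g = closing_residue(frame, size)
        for y in range(h):
            for x in range(w):
                e = 1 if g[y][x] > threshold else 0
                acc[y][x] += e * g[y][x]
    return acc

def grid_hog(img, grid=2, bins=8):
    """Concatenate per-cell orientation histograms weighted by gradient magnitude."""
    h, w = len(img), len(img[0])
    feats = [0.0] * (grid * grid * bins)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi      # unsigned orientation in [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            cell = (y * grid // h) * grid + (x * grid // w)
            feats[cell * bins + b] += mag
    total = sum(feats)
    return [f / total for f in feats] if total else feats

# Toy video: a dark one-pixel-wide vertical line moving right on a bright background.
frames = []
for col in (3, 4):
    frame = [[200] * 8 for _ in range(8)]
    for y in range(8):
        frame[y][col] = 50
    frames.append(frame)

acc = accumulate_edges(frames)
features = grid_hog(acc, grid=2, bins=8)
print(len(features))
```

Because the line occupies a different column in each frame, the accumulated image keeps both positions, which is the property the cumulative edge image relies on to encode motion in a single image.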

3.2. Human Martial Arts Action Recognition Based on Dynamic Time Warping

Action expression has time continuity; that is, an action can be regarded as a collection of a series of static postures over a certain period of time. The movement of the human body reflects the trend of the action through changes in the joint angle curves, and the curve of a joint angle over time is called the “joint angle time series.” Human motion characteristics are described by the time series of joint angles. If the duration of a martial arts action is set to T, the motion characteristics can be defined as follows:

In the above formula, the time series of a certain joint angle is represented by the row vector Ai, the row vector when the number of motion features is M is represented by AM, and the number of motion features is represented by M, with the range 1 ≤ M ≤ 16. The row vector Ai can be understood as a time-varying one-dimensional signal, so simple action recognition becomes a classification problem on time-varying feature data. Prior data show that when a tester freely demonstrates martial arts movements, the same movement produces different waveforms and amplitudes, and the possibility that these are similar to Ai cannot be ruled out. Therefore, action recognition is realized by comparing the similarity of time series; that is, martial arts action decomposition is determined by comparing the distance between vectors of different lengths.

The comparison focuses on the similarity between the curves, that is, on the trend of the time series. Because uncertain factors in the video feedback system, the tester, and so on cause deviations and fluctuations in the data, the following formula is used to smooth the sequence:

In the above formula, xi represents the joint angle value at the i-th time point in the sequence, xn and xn+1 are the joint angle values at time points n and n + 1, respectively, and n is an integer greater than 0.
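The exact smoothing formula is not reproduced here, so the sketch below substitutes a simple centered moving average purely as an illustration of smoothing a joint angle sequence. The function name `smooth`, the window half-width `n`, and the sample angles are assumptions.

```python
def smooth(seq, n=1):
    """Centered moving average with window 2n + 1, truncated at both ends."""
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - n), min(len(seq), i + n + 1)
        window = seq[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Hypothetical joint angles in degrees, with one outlier at index 2.
angles = [10.0, 12.0, 30.0, 14.0, 16.0]
smoothed = smooth(angles, n=1)
print(smoothed)
```

A larger `n` suppresses more of the fluctuation but also flattens genuine fast joint movements, so the window would need to be chosen against the frame rate of the video.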

The dynamic time warping theory, which is based on the idea of dynamic programming, aims to find the optimal matching path and the shortest distance between a test sample and a reference template of different lengths. The reference time series is set to R = {r1, r2,…,ri,…,rL1}, and the test sample is set to T = {t1,t2,…,tj,…,tL2}. The joint angle values at times i and j are represented by ri and tj, respectively, and L1 and L2 represent the vector lengths. If the vectors R and T are nonlinearly matched, the cumulative distance matrix D(i,j) can be described as follows:

D(i,j) = d(ri,tj) + min{D(i,j − 1), D(i − 1,j), D(i − 1,j − 1)}

In the above formula, d(ri,tj) represents the distance function between ri and tj, and D(i,j − 1), D(i − 1,j), and D(i − 1,j − 1) are the cumulative distance matrix elements.
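The cumulative distance described above can be sketched with the classic dynamic-programming recurrence, in which each cell adds the local distance d(ri,tj) to the smallest of its three predecessor cells. This is a minimal sketch of standard dynamic time warping, assuming an absolute-difference local distance; the function name and sample sequences are illustrative.

```python
def dtw_distance(ref, test, dist=lambda a, b: abs(a - b)):
    """Return D(L1, L2), the cumulative distance along the optimal warping path."""
    inf = float("inf")
    len_ref, len_test = len(ref), len(test)
    # D has one extra row and column so that D[0][0] = 0 anchors the path start.
    D = [[inf] * (len_test + 1) for _ in range(len_ref + 1)]
    D[0][0] = 0.0
    for i in range(1, len_ref + 1):
        for j in range(1, len_test + 1):
            local = dist(ref[i - 1], test[j - 1])
            D[i][j] = local + min(D[i][j - 1], D[i - 1][j], D[i - 1][j - 1])
    return D[len_ref][len_test]

reference = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]   # same shape, performed more slowly
print(dtw_distance(reference, stretched))
```

The stretched sequence repeats some samples of the reference, so the warping path can absorb the difference in length; this is why sequences of different durations can still be compared.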

To better distinguish the points ri and tj on the two time series, whose joint angle (Y-axis) values differ, a three-dimensional vector is constructed at each of the points ri and tj, and d(ri,tj) is redefined on these vectors instead of using the original Euclidean distance. The following formulas describe, in turn, the first derivative and the second derivative of the reference sequence:

In the above formula, ri − 1 represents the joint angle value at the (i − 1)-th time point and ri + 1 represents the joint angle value at the (i + 1)-th time point. Since constructing the above vector improves the accuracy of the mapping, d(ri,tj) can be defined as follows:

In the above formula, the first and second derivative values of the joint angle of the test sample sequence are used alongside those of the reference sequence, and three weights adjust the shortest distance: one for the joint angle value, one for its first-order derivative value, and one for its second-order derivative value.
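A hedged sketch of the improved local distance follows. It assumes central differences for the first and second derivatives and a weighted Euclidean distance over the three-dimensional vector (value, first derivative, second derivative). The weights (0.5, 0.3, 0.2), the end-point handling, and the function names are illustrative assumptions rather than the paper's values.

```python
import math

def derivatives(seq):
    """Central-difference first and second derivatives; end points copy their neighbours."""
    n = len(seq)
    d1 = [0.0] * n
    d2 = [0.0] * n
    for i in range(1, n - 1):
        d1[i] = (seq[i + 1] - seq[i - 1]) / 2.0
        d2[i] = seq[i + 1] - 2.0 * seq[i] + seq[i - 1]
    if n > 2:
        d1[0], d1[-1] = d1[1], d1[-2]
        d2[0], d2[-1] = d2[1], d2[-2]
    return d1, d2

def improved_dtw(ref, test, weights=(0.5, 0.3, 0.2)):
    """DTW whose local distance also compares first and second derivatives."""
    w0, w1, w2 = weights
    r1, r2 = derivatives(ref)
    t1, t2 = derivatives(test)
    inf = float("inf")
    D = [[inf] * (len(test) + 1) for _ in range(len(ref) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(ref) + 1):
        for j in range(1, len(test) + 1):
            local = math.sqrt(
                w0 * (ref[i - 1] - test[j - 1]) ** 2
                + w1 * (r1[i - 1] - t1[j - 1]) ** 2
                + w2 * (r2[i - 1] - t2[j - 1]) ** 2
            )
            D[i][j] = local + min(D[i][j - 1], D[i - 1][j], D[i - 1][j - 1])
    return D[-1][-1]

ref = [0.0, 1.0, 2.0, 1.0, 0.0]
other = [0.0, 2.0, 4.0, 2.0, 0.0]
print(improved_dtw(ref, ref), improved_dtw(ref, other))
```

Including the derivative terms penalizes alignments between points that share a similar angle but lie on opposite trends (one rising, one falling), which a value-only distance cannot distinguish.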

According to formula (8), let the motion template feature matrix be AR = {R1,…,Rk,…,RM}^T and the sample to be tested be AT = {T1,…,Tk,…,TM}^T. If Dk is the improved distance between Rk and Tk, the distance between AR and AT can be described as follows:

In the above formula, Dk represents the improved distance between samples, k indexes the motion features for which the improved distance is computed, and DM is the improved distance between AR and AT when the number of motion features is M.

The expected distance value is calculated as follows:

In the above formula, a weight value is applied to the desired distance. Given a martial arts action image test sample, the martial arts action corresponding to the template with the minimum expected distance ED is the recognition result.

In the above formula, C represents a known template in the reference library.
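The template matching step can be sketched as a nearest-template search. The sketch assumes that the expected distance ED is a weighted sum of the per-joint DTW distances Dk and, for brevity, uses plain DTW with an absolute-difference local distance. The template names, weights, and angle values are hypothetical.

```python
def dtw(ref, test):
    """Plain DTW with absolute-difference local distance."""
    inf = float("inf")
    D = [[inf] * (len(test) + 1) for _ in range(len(ref) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(ref) + 1):
        for j in range(1, len(test) + 1):
            local = abs(ref[i - 1] - test[j - 1])
            D[i][j] = local + min(D[i][j - 1], D[i - 1][j], D[i - 1][j - 1])
    return D[-1][-1]

def expected_distance(template, sample, weights):
    """Assumed form of ED: sum over joints k of weight_k * D_k."""
    return sum(w * dtw(r, t) for w, r, t in zip(weights, template, sample))

def recognise(sample, library, weights):
    """Return the name of the template C with the minimum expected distance."""
    return min(library, key=lambda name: expected_distance(library[name], sample, weights))

# Hypothetical reference library: each template holds M = 2 joint angle time series.
library = {
    "punch": [[0, 30, 60, 30, 0], [10, 10, 20, 10, 10]],
    "kick": [[0, 45, 90, 45, 0], [80, 60, 20, 60, 80]],
}
# Test sample: a slightly noisy, slower punch (6 frames instead of 5).
sample = [[0, 28, 28, 61, 33, 0], [10, 11, 11, 19, 10, 10]]

label = recognise(sample, library, weights=[0.5, 0.5])
print(label)
```

Swapping `dtw` for a derivative-aware distance would bring the sketch closer to the improved method described in this section without changing the matching logic.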

In summary, the feature vector of the martial arts action features in the video image is extracted by accumulating edge images, and then, the martial arts action time sequence is calculated using the dynamic time warping theory. After the martial arts action to be recognized is matched with the reference time sequence sample, the process of decomposition and recognition of martial arts movements is completed.

4. Simulation Experiment Results and Analysis

In order to accurately decompose martial arts movements, this paper uses a dynamic time warping method based on accumulated edge images to identify martial arts movements, and the feasibility of this method is verified by simulation experiments [30].

Experiment 1. Two martial arts action images are used as experimental objects. To extract the target contour from each image, both the morphological operation described in this paper and the active contour model method are applied. The image processing results are shown in Figures 1–6.
In the first set of images (Figures 1 to 3), Figure 1 is the original image, and Figure 2 is the result of contour extraction using the morphological operation described in this paper. In Figure 2, the morphological operation first transforms the image into a binary image and then extracts the outline of the martial arts action, and the edge image of the action can be clearly identified. Figure 3 shows the result of the active contour model method. Comparing Figure 2 with Figure 3 shows that the active contour model method does extract the outline of the martial arts action, but its extraction of the outline edge is not accurate, and the outline of the action cannot be clearly recognized.
The second set of images (Figures 4–6) likewise shows that the morphological operation in this paper effectively extracts the contour edges of martial arts movements, indicating that it is an effective method for extracting the contour edges of an image.
The experiment then takes 5 groups of images, applies both the morphological operation in this paper and the active contour model method to extract their contours, and compares the time consumption and sharpness of the extracted contours. The specific data are shown in Table 1.
Table 1 shows that, when contour edge features are extracted from the 5 groups of images, the average contour extraction time of the method in this paper is 1.2 s, whereas the active contour model method is much slower, with an average contour extraction time of more than 10 s. Comparing the output image definition, the definition after the morphological operation in this paper is clearly much higher than that after the active contour model method, which shows that the morphological operation in this paper performs better.

Experiment 2. The experiment uses a collection of different martial arts action sample sets, each of which includes 4 actions. The dynamic time warping method in this article and the motion history image recognition method are both used to recognize the actions in the samples, and the recognition results of the two methods are compared. The specific data are shown in Table 2.
As can be seen from Table 2, when the two methods are applied to the martial arts action sample sets, the success rate of the motion history image method is less than 50%, which shows that the dynamic time warping method in this paper can effectively identify decomposed martial arts actions.

5. Conclusion

Image processing and computer vision feature recognition methods are combined to decompose martial arts actions, and a method of decomposing and recognizing human martial arts actions based on image recognition is proposed. First, a morphological gradient operation eliminates most of the noisy background, and the contour edge of the human body is obtained. The contour edge of each video frame is extracted and accumulated into a single image, and a grid-based HOG is computed on the accumulated edge image to obtain the action feature vector. Second, the improved dynamic time warping theory, combined with the angle changes of each joint over the action time sequence, is used to measure the similarity of the joint change sequences among the various martial arts movements. A classifier is then designed, and the time-varying feature data of the human actions in the images are input into it to realize the decomposition of martial arts actions based on image recognition. The experimental results show that image recognition can effectively decompose martial arts movements.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.