Abstract

In this paper, the frame signals of athletics video images are processed and studied using big data technology. Sports video image-multiprocessing technology enables interference-free research and analysis of sports technique and can meet the multiple visual needs of technique analysis and evaluation through key technologies such as split-screen synchronous comparison, superimposed synchronous comparison, and video trajectory tracking. Sports video image-processing technology realizes the rapid extraction of key technical parameters of the sports scene, panoramic mapping of sports video images, split-lane calibration, and the development of special video image analysis software that is innovative in the field of athletics research. An image-blending approach is proposed to alleviate the imbalance between simple- and complex-background data while enhancing the generalization ability of networks trained on small-scale datasets. Local detail features of the target are introduced into the online-tracking process by an efficient block-filter network. Moreover, online hard-sample learning is used to prevent similar objects from interfering with the tracker, improving overall tracking performance. For the feature extraction problem of blurred videos, this paper proposes a blur-kernel extraction scheme based on low-rank theory. The scheme fuses multiple blur kernels of keyframe images by low-rank decomposition and then deblurs the video. Next, a double-detection mechanism is used to detect tampering points in the blurred video frames. Finally, the video-tampering points are located, and the specific manner of tampering is determined. Experiments on two public video databases and self-recorded videos show that the method is robust in blurred-video forgery detection and more efficient than traditional video forgery detection methods.

1. Introduction

Vision is one of the most important ways for humans to perceive information about the external world. By imitating the human visual system, humans have created various imaging tools (e.g., cameras, depth sensors, and surveillance cameras) to obtain video image data, and these imaging devices give machines the ability to perceive the external world. Further analysis and understanding of the acquired images and video data, however, must be implemented by computer vision algorithms [1]. Computer vision combines applied mathematics and statistics, digital signal processing, and other related theoretical foundations to analyze the image and video data acquired by visual imaging devices and thereby achieve machine understanding of the objective world. Research in computer vision is of great importance for the realization of machine intelligence. As an indispensable part of computer vision, visual target tracking has not only received great attention in academia but also has wide application prospects in national defense, transportation, video surveillance, human-computer interaction, and automatic driving [2]. Visual target tracking techniques can be used to localize and describe the motion trajectory of targets in videos and are the basis for higher-level tasks such as video understanding and behavior recognition. The application scenarios of target tracking require the tracking algorithm to run in real time; otherwise, the algorithm has no practical value. In addition, accuracy and robustness are important metrics for tracking algorithms [3]. However, due to the diversity and complexity of realistic scenarios, designing a robust real-time target tracking algorithm remains an extremely challenging task. One line of feature design used in detection and tracking is the Haar feature, which has evolved from the three simple Haar Basic features to Haar-Like and now Haar Extended; its feature template contains white and black rectangles, and the feature value is defined as the sum of the pixels under the white rectangle minus the sum under the black rectangle, so Haar feature values reflect the greyscale variation of the image. The main challenges of visual target tracking come from variations of the tracking target itself, such as scale change, nonrigid deformation, and fast motion. Although target tracking algorithms have developed greatly in recent years and most can cope with limited scenes and specific objects, there is still room for improvement in complex scenarios.
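As a rough illustration of how a two-rectangle Haar feature value is computed, the following minimal sketch uses an integral image for constant-time rectangle sums; it assumes a grayscale NumPy image, and the rectangle geometry and coordinates are illustrative rather than taken from any specific detector.

```python
# A minimal sketch of a two-rectangle Haar feature, assuming a grayscale
# image as a NumPy array; rectangle coordinates are illustrative.
import numpy as np

def integral_image(img):
    # Cumulative sums over both axes allow O(1) rectangle sums later.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    # Sum of pixels in a rectangle via four integral-image lookups.
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def haar_two_rect(ii, top, left, h, w):
    # Feature value = white (left) rectangle sum minus black (right) one.
    white = rect_sum(ii, top, left, h, w // 2)
    black = rect_sum(ii, top, left + w // 2, h, w // 2)
    return white - black

img = np.random.rand(64, 64)
ii = integral_image(img)
print(haar_two_rect(ii, 10, 10, 12, 16))
```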

All this shows that video coding and transmission technology are constantly developing and progressing. In practical applications, video information must satisfy a wide variety of demands, which places higher requirements on video coding technology and indicates that it will meet new opportunities and challenges. Current video compression techniques are based on hybrid coding frameworks that improve coding efficiency through motion compensation, predictive coding, transform quantization, and entropy coding [4]. The convergence of acquisition, computation, and cognitive techniques has made intelligent coding possible: neural-network-based intelligent coding applies learned techniques to the traditional coding framework, and research is also being conducted on feature-based coding, i.e., texture-feature cocoding. Facing more complex scenes and growing demands for compressing video data, the improvement of traditional coding techniques and the introduction of new ones remain hot research topics. In addition, video compressed by standard techniques generally achieves acceptable quality, but complex video content, flexible scenes, real-time dissemination, and other difficult situations cannot guarantee good quality, which requires further video coding optimization and control techniques. Among these, rate-distortion optimization and rate control are particularly effective. Rate-distortion optimization weighs the bits consumed by coding against the information distortion, seeking the fewest bits under a distortion constraint [5]. Its most direct role in an encoder is to guide the selection of the optimal coding parameters from multiple candidate configurations according to a specific strategy. Rate control studies how to set appropriate quantization parameters for the coded image group, the coded image, and even the coding unit under various specific rate requirements, so that the output rate conforms to the initially set rate while keeping the output video quality as stable as possible. Coding optimization techniques are not part of the coding standard, but in real application scenarios the coded content is complex and user needs differ, so they are crucial. There are currently two main categories of face detection methods: knowledge-based and statistics-based. Knowledge-based methods mainly use a priori knowledge, viewing the face as a combination of organ features and detecting it from features of the eyes, eyebrows, mouth, and nose and the geometric relationships between them; the main methods include template matching, face features, shape and edge, texture features, and color features. Statistics-based methods regard the face as a whole pattern, a two-dimensional pixel matrix, and construct the face pattern space statistically from a large number of face image samples, deciding whether a face exists by a similarity measure. The main methods include principal component analysis with eigenfaces, neural networks, support vector machines, hidden Markov models, and AdaBoost.
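To make the rate-distortion tradeoff concrete, the following minimal sketch picks, among hypothetical candidate modes, the one minimizing the Lagrangian cost J = D + λR; the mode names, distortion values, and bit counts are invented for illustration and are not from any real encoder.

```python
# A minimal sketch of rate-distortion optimized mode selection, assuming
# each candidate mode reports its distortion D and bit cost R; the encoder
# picks the mode minimizing J = D + lambda * R.
def select_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, bits)."""
    best = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best[0]

# Illustrative candidates (names and numbers are hypothetical).
modes = [("intra_dc", 120.0, 40), ("intra_angular", 90.0, 75), ("skip", 300.0, 2)]
print(select_mode(modes, lam=1.0))    # small lambda favors low distortion
print(select_mode(modes, lam=50.0))   # large lambda favors low bit cost
```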

Video image-multiprocessing technology provides sports teams, at the competition site or after the game, with more detailed and comprehensive game information. Its purpose is scientific research and monitoring: studying and evaluating individual athletes’ technical and tactical performance and the team’s technical and tactical ability, and thereby improving both. This is one of the most common methods and means of video image-multiprocessing technology in sports training practice. At present, video image-processing technology has been widely used by competitive sports teams in China and has been supported and recognized by most coaches, athletes, and research staff, playing a very important role in improving athletes’ technical and tactical ability and level. Given the large number of forms that sports video image-multiprocessing technology takes, it is necessary to conduct a more scientific and reasonable in-depth study and summary of the sports video image-processing technology system. Such a study would better serve scientific sports practice in China, allow sports video image-processing technology to better guide and evaluate athletes’ technical and tactical ability, support scientific monitoring and evaluation of modern training and competition, help athletes and coaches improve their technical and tactical analysis and evaluation abilities, and increase the role and benefits of sports technical video images in making modern sports training more scientific.

2. Current Status of Research

Video image-processing technology in a broad sense refers to all technologies related to video image processing, including the physical processing of video images and the development and application of related software and hardware [6]. At present, research focuses mainly on digital video image-processing technology, whose main tools are modern computer technology and video image-multiprocessing technology [7]. This comprises a complete, orderly, tightly organized, and programmed set of systematic operations using computers and other electronic devices: video image acquisition; video compression and coding; video clip decomposition and synthesis; format conversion and unification; annotation and identification; storage and transport; nonlinear editing and generation; display and output; transformation and enhancement; recovery (restoration) and reconstruction; segmentation and target detection; representation and description; extraction and measurement of feature frames; correction and translation of image sequences; reconstruction and restoration of 3D scenes; development of video image databases and their indexing and retrieval; classification, representation, and recognition; model building and matching; interpretation and understanding of scene transitions; and judgment, decision making, and behavior planning based on video images. A contour can be described as a curve joining all consecutive points that share the same color or intensity; contours show the shape of the objects contained in a picture, and contour detection is a useful technique for shape analysis and for object detection and recognition. Contour detection is not the only algorithm for image segmentation; there are many others, such as semantic segmentation, the Hough transform, and k-means segmentation, some of which represent the current state of the art. As one of the most promising and active research directions in computer vision at home and abroad, video human behavior recognition, with the deepening development of artificial intelligence, can now use key nodes, images, videos, and other data to identify human behavior [8]. Computer vision is a key AI research area to which researchers are making growing contributions. In recent years, major internet companies and universities have paid close attention to the international frontier of human behavior recognition technology, focusing mainly on three directions: dynamic region detection, modeling, and classification recognition [9].
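As a minimal illustration of the contour detection just described, the following sketch assumes OpenCV and a hypothetical input image "frame.png"; the threshold value is illustrative.

```python
# A minimal contour-detection sketch with OpenCV; "frame.png" is a
# hypothetical input, and the threshold value 127 is illustrative.
import cv2

img = cv2.imread("frame.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Each contour is a curve joining consecutive points of similar intensity.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)
cv2.imwrite("contours.png", img)
```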

One research direction of increasing importance and interest in computer vision is video stream-based behavior analysis and understanding [10]. The core research focus is the use of visual pattern recognition, image signal processing, and related techniques to process video stream target sequences for target detection, target classification, and target tracking, and on that basis to analyze human behavior in video surveillance regions [11]. Target detection is the foundation of video surveillance image-processing systems; many mature detection algorithms are available for this low-level computer vision problem [12]. Target classification must accurately classify moving objects in the scene so that they can be further tracked and analyzed. Target tracking, one of the most basic functions in video surveillance image-processing systems, is also one of the main bottlenecks currently restricting their development [13]. A research hotspot of recent years is how to analyze and identify human behavior, which focuses on human behavioral motion patterns and can be regarded as a problem of classifying and matching time-varying motion data, i.e., matching test sequences with precalibrated reference sequences of standard actions [14].

This method focuses on two adjacent image frames and performs a differential operation between consecutive frames to obtain a feature representation of human behavior. Its advantage is that it preserves the temporal features of human behavior in the video, but it depends to some extent on manual segmentation of human contours, is sensitive to color, lighting, contrast, and occlusion, and is only applicable to video scenes with limited space. Video behavior recognition algorithms based on local features of motion behavior do not require presegmentation of video images; common local features include edges, corners, curves, and regions with special properties. The step length and speed of a walk can change continuously: some steps can be short and some can be long, the speed can change from slow to fast or fast to slow, and the walk can take place in different background environments. Thus, there are many types of behavioral expression and many variations of each behavior, which poses problems for research in human behavior recognition.
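The interframe-differencing idea described above can be sketched as follows; the sketch assumes OpenCV and a hypothetical file "athletics.mp4", and the threshold of 25 is illustrative.

```python
# A minimal interframe-differencing sketch with OpenCV; "athletics.mp4"
# is a hypothetical input, and the threshold value is illustrative.
import cv2

cap = cv2.VideoCapture("athletics.mp4")
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that changed between consecutive frames mark moving regions.
    diff = cv2.absdiff(gray, prev)
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev = gray
cap.release()
```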

3. Analysis of Frame Signal Processing for Athletics’ Video Images with Big Data

3.1. Big Data Processing Analysis of Athletics’ Video Images

Current digital multimedia forensic techniques can be divided into two main categories: active forensics and passive forensics. Active forensics uses information hiding technology to embed imperceptible verification information, such as a digital watermark, digital signature, or perceptual hash, during image or video generation [15]. The party receiving the image or video determines the reliability of the received data by analyzing whether this embedded information has been corrupted. Active forensic techniques detect tampering well but have significant limitations in practice: many imaging devices cannot embed signals such as watermarks, and the embedded information may be removed or modified and reembedded. Passive forensics, also known as blind forensics, does not require information to be embedded in the multimedia data in advance as active forensics does. Instead, it directly analyzes the received multimedia data itself: tampering inevitably leaves traces in the data, and although these traces cannot be detected by eye, they can be verified by extracting certain statistical features. Based on this principle, passive forensic techniques detect the authenticity of a video according to whether its encoding features, statistical features, or other characteristics change before and after tampering [16–18].

Digital multimedia forensic techniques can also be divided into digital image forensics and digital video forensics. Digital image forensics developed earlier and is becoming increasingly mature. Digital video forensics requires algorithms of higher computational complexity because of the large amount of video data; there are also more ways to tamper with video than with images, and video codecs are complex, all of which makes digital video-tampering detection harder. Nowadays, digital video forensics is also developing gradually. Video tampering refers to maliciously modifying the content of a video with editing software to achieve disguise or to “create something out of nothing.” Digital video forensics uses video-tampering detection algorithms to verify the integrity and authenticity of a video, as shown in Figure 1.

In the practical application of video image-processing technology to research work, the researcher generally does not need a very deep understanding of complex video image-processing procedures, formulas, and principles. Track and field research, in particular, is mainly an innovative application of software and hardware systems related to video image-processing technology, with the goals of improving the athletic ability of outstanding athletes and of achieving scientific and practical work within the field. Video image-processing technology in a narrow sense refers only to technologies that directly process video images. AdaBoost stands for “Adaptive Boosting”; it is adaptive in the sense that the weights of samples misclassified by the previous basic classifier are increased while the weights of correctly classified samples are decreased, and the reweighted samples are used to train the next basic classifier. A new weak classifier is added in each round of iteration until some predetermined, sufficiently small error rate or a prespecified maximum number of iterations is reached, at which point the final strong classifier is determined. Narrow-sense processing mainly includes the acquisition, compression, decomposition, synthesis, coding, storage, and transport of video images; their display and output; their transformation, enhancement, recovery, and reconstruction; segmentation and the detection, expression, and description of targets; feature extraction and measurement; correction of sequence images; reconstruction and recovery of 3D scenery; and the establishment and classification of video image databases. These are all techniques for processing the physical scenes of video images to achieve a corresponding purpose. A convolutional layer consists of a set of convolutional kernels (each neuron acting as a kernel), each associated with a small region of the image called the receptive field. It works by segmenting the image into small pieces (receptive fields) and convolving them with a specific set of weights, multiplying the elements of the filter (the weights) with the corresponding receptive field elements.
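The receptive-field computation described above can be sketched in a few lines; this is a minimal single-channel example, and the kernel is illustrative.

```python
# A minimal sketch of a convolutional layer's core operation, assuming a
# single-channel input; each output value is the elementwise product of a
# kernel (weights) with one receptive field, summed.
import numpy as np

def conv2d(img, kernel):
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            receptive_field = img[i:i + kh, j:j + kw]
            out[i, j] = np.sum(receptive_field * kernel)
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # simple horizontal-gradient kernel
print(conv2d(np.random.rand(8, 8), edge_kernel).shape)  # (6, 6)
```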

Once features are extracted, only their approximate positions relative to other features need to be retained, and their exact positions become less important. Pooling layers summarize similar information within a neighborhood of the receptive field and output the dominant response within that region. Pooling helps extract features that are invariant to translational shifts and small distortions, improves generalization by reducing overfitting, and, by reducing the size of the feature map, regulates the complexity of the network.
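A minimal max-pooling sketch, assuming the feature-map sides are divisible by the pooling window, illustrates how the dominant response in each neighborhood is kept:

```python
# A minimal max-pooling sketch; keeping the maximum in each k-by-k
# neighborhood gives tolerance to small shifts and distortions.
import numpy as np

def max_pool(fmap, k=2):
    h, w = fmap.shape[0] // k, fmap.shape[1] // k
    return fmap[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

fmap = np.random.rand(6, 6)
print(max_pool(fmap).shape)  # (3, 3)
```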

The activation function is a decision function that helps the network learn complex patterns, and choosing the right one can speed up the learning process. It is applied elementwise to the convolutional feature map.
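The source does not specify which activation it uses; purely as an illustration, a common choice for convolutional feature maps is the rectified linear unit (ReLU), written here with assumed notation in which $F_k$ denotes the $k$-th convolutional feature map:

```latex
% Illustrative only: ReLU applied to the k-th feature map F_k
% (notation assumed, not taken from the source).
T_k = f(F_k), \qquad f(x) = \max(0, x)
```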

For CNN feature learning on single-frame images, today’s behavior recognition methods are generally based on 2D convolutional neural networks, which tend to ignore the information connecting consecutive frames; as a result, much of the motion information inside the video is lost [17]. Therefore, fully exploiting 3D convolutional networks has become an important direction for behavior recognition research. By performing 3D convolutional feature extraction on a cube composed of consecutive video frames, 3D convolutional networks capture feature information in both the spatial and temporal dimensions; more importantly, because multiple frames are processed at once, the network runs considerably faster. However, 3D convolutional networks are not very accurate, demand a great deal of the hardware, and are relatively less cost-effective, so the current state-of-the-art recognition methods using 3D convolutional networks also adopt the dual-stream idea and make full use of optical-flow images to enhance performance, as shown in Figure 2.
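As an illustration of convolution over both space and time, the following minimal PyTorch sketch applies a 3D convolution to a clip of consecutive frames; the clip size, channel counts, and kernel size are illustrative and do not describe the paper’s network.

```python
# A minimal 3D-convolution sketch with PyTorch, assuming a clip of 16
# consecutive RGB frames; the layer convolves jointly over time and space.
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(3, 3, 3), padding=1)
features = conv3d(clip)
print(features.shape)  # torch.Size([1, 64, 16, 112, 112])
```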

Modeling the temporal features of human behavior in video is achieved by making full use of the temporal correlation between adjacent frames, with recurrent neural networks as an important foundation. However, the recognition accuracy achieved this way is still some distance from expectations. A keyframe is the frame containing the key action in the movement or change of an object; it is the most intuitive and valuable information in the video, representing as completely as possible all the information the video contains while using as few frames as possible. This is both the focus of keyframe extraction research and the principle of keyframe extraction.

The keyframe extraction method based on shot detection is the more classical approach: after the video sequence is segmented into multiple shots, the first and last frames, or the middle frame, of each shot are extracted as that shot’s keyframes. The method is computationally very cheap and simple to implement, but the number of extracted keyframes is fixed in advance. Because it relies on the extracted keyframes to represent the main content of the original shot, it is only applicable when the shot content changes little, which is a major limitation. When the visual content changes appreciably, the first, last, or middle frames of a shot no longer represent its main content, and the extracted keyframes lose their meaning because they do not represent the whole video. The method also produces poor keyframes when the shot boundary detection deviates from the actual shots, indicating that it relies too heavily on shot boundary detection.
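The shot-based scheme can be sketched as follows: a minimal version using a histogram-difference cut detector, assuming OpenCV; the Bhattacharyya threshold is illustrative, and the first frame of each detected shot is taken as its keyframe.

```python
# A minimal sketch of shot-detection-based keyframe extraction: a shot cut
# is declared where the histogram distance between consecutive frames
# exceeds a threshold, and that frame index is kept as a keyframe.
import cv2

def keyframes_by_shot(path, thresh=0.4):
    cap = cv2.VideoCapture(path)
    keys, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > thresh:
            keys.append(idx)  # shot boundary: keep this frame as keyframe
        prev_hist = hist
        idx += 1
    cap.release()
    return keys
```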

Specifically, the segmented video recognition network framework proposed in this paper aims to show how to maximize the use of visual information across the entire video sequence for video-level prediction. The recognition network consists of two parts: a spatial-stream convolutional neural network and a temporal-stream convolutional neural network. Instead of operating on individual frames or stacks of frames, the segmented video recognition network operates on a series of short video segments sparsely sampled from the entire video; each short segment generates its own preliminary prediction of the behavior category, and the consensus among the segments serves as the video-level prediction for the whole video. During training, this dual-stream network updates its parameters iteratively.
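The sparse-sampling-and-consensus step can be sketched as follows. This is a minimal version assuming a caller-supplied score_snippet function that scores one snippet; averaging is used as the consensus, which is one common choice rather than necessarily the paper’s.

```python
# A minimal sketch of segmented (sparse-sampling) video recognition:
# one snippet is sampled per equal segment, and snippet class scores are
# averaged as the video-level consensus. Names here are illustrative.
import numpy as np

def video_prediction(num_frames, num_segments, score_snippet):
    # Divide the video into equal segments and sample one snippet start
    # index per segment.
    bounds = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    starts = [np.random.randint(lo, max(lo + 1, hi))
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    scores = np.stack([score_snippet(s) for s in starts])
    return scores.mean(axis=0)  # segmental consensus over class scores
```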

Various coding methods have been introduced over the development of video coding technology. Methods that compress data based on the statistical properties of images are known as predictive coding; they use the spatial and temporal correlation of the image signal to predict the image currently being encoded from already-encoded image information and then encode and transmit the difference between the predicted and true values. Transform coding, by contrast, uses mathematical transformations such as the discrete Fourier transform and the discrete cosine transform to convert an image described in the spatial domain into coefficients in a transform domain, reducing the amount of data. Combining different types of coding methods yields so-called hybrid coding, which has slowly evolved into the current hybrid coding framework; although video standards compete with one another, they all adopt it. Background subtraction is one of the more widely used methods in current motion target detection. Its basic idea is similar to interframe differencing in that it extracts the target region through a differential operation between images; unlike interframe differencing, however, it does not subtract adjacent frames from the current frame but subtracts a continuously updated background model from the current frame to extract the moving target in the difference image.
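A background-subtraction sketch with a running-average background model follows; it assumes OpenCV and a hypothetical file "athletics.mp4", and the learning rate 0.05 and threshold 30 are illustrative.

```python
# A minimal background-subtraction sketch: the current frame is differenced
# against a continuously updated running-average background model rather
# than against its neighboring frame.
import cv2
import numpy as np

cap = cv2.VideoCapture("athletics.mp4")
ok, frame = cap.read()
background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    _, foreground = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    cv2.accumulateWeighted(gray, background, 0.05)  # update background model
cap.release()
```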

3.2. Experiments on Frame Signal Processing of Athletic Sports Video Images

Video coding has evolved to the point where encoders must encode ultrahigh-definition videos such as 4K and 8K in real time [18]. The maximum coding unit has been extended to a larger size as the proportion of flat image content increases in high-resolution video. The coding standards currently under development also support more flexible block division structures to achieve higher coding efficiency. The basic idea of intraframe prediction is to exploit the correlation of adjacent pixels, which in video coding means the reconstructed pixels of the encoded blocks around the current block, to effectively remove spatial redundancy in the video: within an image, the luminance and chromaticity values of two adjacent pixels are often close to each other, that is, color changes gradually rather than jumping abruptly to a completely different color. Current intraframe prediction uses block-based multidirectional prediction. H.265 defines 33 angular prediction modes, plus the planar and DC modes, for a total of 35 intraframe prediction modes; in the standards under development, the number of angular modes has been increased to 65. In video coding, the angle of the prediction direction is not a geometric angle but is expressed in terms of numbers of pixels.

According to research needs, in scientific research practice video images sometimes must be shifted and superimposed for comparison; at present, however, this processing function struggles to superimpose moving-background video images clearly because the background is cluttered and disorderly, and the technology still needs further development and application [19, 20]. At the same time, because the accuracy and clarity of modern video image acquisition still need improvement, image quality and clarity often decline when the processed video image is enlarged, so the accuracy of enlarged video demonstration processing needs further development and improvement, as shown in Figure 3.

Visual understanding of human behavior is a core capability for building AI systems, so recognizing behavior in video, i.e., classifying input video data containing specific actions, is now the focus of many researchers. However, this setting has limited practical application, as real-world videos are usually unedited and most actions occur accompanied by other actions rather than alone. As a result, recent researchers have gradually turned to the task of temporal behavior detection in unedited video, which goes beyond classification to find the boundaries of the behavior performed, i.e., its start and end moments. The development of temporal behavior detection has enabled many real-world applications, such as finding highlight moments in sports videos, and more advanced tasks such as automatic generation of video subtitles.

Ideally, the selection unit can remove the interference of background noise. In practice, however, limited by the number of convolutional channels and the size of the training dataset, the features learned offline are insufficient to adapt to arbitrary tracking targets without online learning of target information. This paper therefore proposes an adaptive learning method to obtain more discriminative features online. Experiments show that online updating of the motion regression part does not bring significant performance gains but rather aggravates the computational burden. We argue that fine-tuning the motion regression part is unreasonable because it mainly predicts motion by comparing two similar features, whereas the purpose of online updating is to make changing targets retain similar features and to suppress background noise. In addition, the experiments show that updating the parameters of the fully connected layer tends to cause overfitting because of the large number of parameters it contains. When the blur kernel is known, recovering a blurred image can be transformed into deconvolving the blurred image with the blur kernel, using a nonblind deblurring algorithm such as the Richardson-Lucy (RL) algorithm to solve for the clear image in an iterative loop. The RL algorithm is one of the most widely used image-deblurring algorithms; it has been refined continuously and works well for most blurred images. It uses the statistical properties of Poisson noise to recover low-quality video frame images when the blur kernel is known, as shown in Figure 4.
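Nonblind RL deconvolution is available in scikit-image; the following minimal sketch assumes the blur kernel (PSF) is known and simulates a blurred frame. Note that in older scikit-image releases the keyword is `iterations` rather than `num_iter`.

```python
# A minimal nonblind deblurring sketch using scikit-image's Richardson-Lucy
# deconvolution, assuming the blur kernel (PSF) is known; the uniform PSF
# and iteration count are illustrative.
import numpy as np
from scipy.signal import convolve2d
from skimage.restoration import richardson_lucy

psf = np.ones((5, 5)) / 25.0                   # known blur kernel
sharp = np.random.rand(64, 64)
blurred = convolve2d(sharp, psf, mode="same")  # simulated blurred frame
restored = richardson_lucy(blurred, psf, num_iter=30)
```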

In this paper, 100 videos with significant jitter, blurring, and poor visual quality were downloaded from video websites, and their quality was judged by an image sharpness metric; these videos scored lower than the videos in the experimental video library. Since the authenticity and reliability of the website videos were uncertain, they could not be used as experimental videos; instead, videos from the SULFA and OV video libraries were blurred using Adobe Premiere Pro CC. The blurred videos are close in sharpness to the website videos, so they can serve as the low-quality video library for the experiments [21–23].

The motion search approach develops a strategy for finding the best matching block from the starting point of the search. At the beginning, a global search algorithm was used to ensure matching accuracy: the global search predefines a search region, compares the coding unit with all candidate blocks in the reference frame region, determines the best matching block according to the motion estimation criterion, and takes the displacement between the two blocks as the motion estimate. But this method inevitably brings great computational complexity, so various fast search methods have been investigated. Although these fast algorithms differ in procedure, they all aim to improve search efficiency by avoiding locations that are unlikely to contain the best matching block.
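The global (full) search described above can be sketched as follows; this is a minimal SAD-based version in which the block size and search radius are illustrative.

```python
# A minimal full-search block-matching sketch, assuming grayscale frames;
# every candidate displacement within the search window is scored with the
# sum of absolute differences (SAD), and the best one becomes the motion
# vector. Fast search methods skip most of these candidates.
import numpy as np

def full_search(cur, ref, y, x, block=16, radius=8):
    cur_blk = cur[y:y + block, x:x + block].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[ry:ry + block, rx:rx + block].astype(np.int32)
            sad = np.abs(cur_blk - cand).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv
```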

4. Analysis of Results

4.1. Performance Results of Big Data Graphics-Processing Algorithms

Chunked motion compensation encodes the motion vector of each block on a block-by-block basis. Motion vectors of neighboring blocks show strong spatial-domain correlation: usually the motion vector of one block has the same characteristics as those of the surrounding blocks. If the motion information of only one block is saved in detail, adjacent blocks need only refer to that block’s information; recording the differences between them together with this reference relationship is enough to reproduce the original information effectively. To improve the accuracy of motion compensation, the reference frame pixel values are usually interpolated at the subpixel level, in which case the motion vector is also noninteger. However, this approach has drawbacks. The most common is the block effect: chunked motion compensation divides the whole into multiple subblocks, forcibly cutting off the overall continuity, so the boundary regions of neighboring subblocks diverge because of their different coding, breaking boundary continuity and producing a blocky appearance. When the block effect is severe, the decoded image looks like a mosaic, seriously affecting visual quality. Another drawback is the ringing effect that arises when the high-frequency component is large. A reasonable secondary coding scheme was designed: reasonable initial quantization parameters were set in the single-pass coding process, and the reference relations of coding blocks as well as their bit consumption were counted. For video interframe tampering, which is often accompanied by simultaneous audio tampering, a video-tampering detection method incorporating audio is proposed. First, the ENF signal of the audio is extracted and analyzed to determine suspicious tampering points in the digital audio according to phase continuity and consistency. Then the locations of suspicious anomalies in the video frames are located synchronously, while GMSD-based coarse detection of video frame similarity extracts the suspicious anomalies.

Quantization coding is a means of further compressing the data after transform coding and predictive coding, mainly by mapping an interval of signal values to a single value to reduce the amount of information to be recorded. Predictive coding and transform coding do not themselves distort the image; quantization, as a lossy compression technique, is the main source of distortion in video coding. To reduce quantization distortion while maintaining good video quality, reasonable quantization methods must be designed. The main methods are uniform, nonuniform, and adaptive quantization. Uniform quantization is linear, simple, and easy to implement, but its effect is poor because the distribution of the quantized values is not considered. Taking the residual signal as an example, most residuals are concentrated around 0; nonuniform quantization can then quantize the densely distributed region finely and the loosely distributed regions coarsely, giving a better result. Adaptive quantization combines quantization with rate-distortion optimization, choosing the least costly value among multiple selectable quantization values, as shown in Figure 5.
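Uniform scalar quantization, the simplest of the three, can be sketched as follows; the step size is illustrative, whereas real encoders derive it from the quantization parameter.

```python
# A minimal uniform scalar quantization sketch on a residual block; larger
# step sizes spend fewer bits at the cost of more distortion, which is why
# quantization is the main lossy stage of the hybrid framework.
import numpy as np

def quantize(residual, step):
    return np.round(residual / step).astype(np.int32)

def dequantize(levels, step):
    return levels * step

residual = np.random.randn(4, 4) * 10
levels = quantize(residual, step=4.0)
recon = dequantize(levels, step=4.0)
print(np.abs(residual - recon).mean())  # average quantization distortion
```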

Among the selected test sequences, the performance gains also differ from sequence to sequence, indicating a certain tendency of the proposed algorithm. Analysis of the sequence content reveals that the gain is greater in sequences with fixed scenes and smooth object motion. This is because some image content in such sequences persists for a long time and is referenced by many subsequent images, so the gain of high-reference-value coding units accumulates along the image propagation chain and the overall gain is more pronounced. In contrast, in sequences with changing backgrounds and drastic object motion, content with reference value disappears during propagation, bringing a lower performance gain, as shown in Figure 5.

From the data in Figure 6, the algorithm achieves a significant performance gain, with gains of 3.59%, 2.69%, and 2.99% on the Y, U, and V components in the GOP-length-8 configuration. This indicates that the coding quality of a coding unit can reasonably be improved according to its reference value, which effectively improves the quality of referenced coding units and thus the overall quality. The gains under the GOP-length-16 configuration are even more significant, averaging 7.21%, 12.17%, and 12.06% on the Y, U, and V components. This indicates that the larger the GOP length, the more pronounced the hierarchical structure and the stronger the reference dependence between frames and between coding blocks, and therefore the greater the benefit of correcting the Lagrange multipliers by reference value.

4.2. Experimental Results of Video Graphics Processing for Track and Field Sports

The channel injects bit resources into the decoding buffer at a constant rate, referred to earlier as a constant code rate. To accomplish good rate control, the following requirements apply. Since the first encoded frame is usually an I-frame and requires more resources, allocating from the very beginning would exhaust resources quickly; a moderate amount of resources is therefore injected into the buffer first, which is also known as delayed encoding. The buffer size should be set reasonably, neither too large nor too small, and the resources allocated to each image frame should be adjusted sensibly to keep the system running continuously.

Kinematic parameters of different technical movements, such as velocity, acceleration, displacement, angle, angular velocity, and rotation, are obtained through calculation and analysis; the skeletal muscle biomechanics reproduction software system is then used to simulate the technical movements of the human body under the control of skeletal muscle in realistic situations, achieving the purpose of studying and simulating skeletal muscle biomechanical movements, as shown in Figure 6.

This requires, first, a large number of 3D video tests and analyses of sports technical movements to establish a database of skeletal muscle-dominated movement forms; then the corresponding technical movement analyses to reproduce video images of skeletal muscle-dominated movements; and also the inverse derivation of human skeletal muscle or joint force characteristics from the kinematic parameters measured in the movement tests, for a more scientific and rational analysis of sports technical movements. Figure 7 shows the simulation and reproduction of the skeletal muscle dynamics of the volleyball spiking technical action and of human walking, as well as the inverse dynamics curves of the action of the sole on the ground.

Using individual athletes’ panoramic technique images, two athletes’ panoramas can be combined: image-processing software synthesizes a single technique picture and aligns the key images of the two athletes at the same time phase and position, forming a panoramic technique comparison picture containing both athletes. This makes it convenient for coaches and athletes to make targeted observations and comparative analyses, find the technical gaps at different key positions, and study and imitate intuitively, giving the most direct improvement of individual technical movements. For such pictures, more advanced image-processing software can fit the background so that the comparison is clearer against the same background; software for such background removal and fitting now exists, but the processing is still slightly cumbersome and needs to become simpler and more effective, as shown in Figure 7.

A new, stable, and effective hierarchical feature network is proposed to accurately detect behaviors and their temporal coordinates in uncut videos. The network is divided into two parts: a behavior classification network and a coordinate regression network. The behavior classification network is mainly used for video behavior determination, predicting a motion score for each frame and generating initial proposals based on the score distribution. The coordinate regression network divides the initial proposals at coarse granularity into cell levels, achieves fast computation by reusing cell-level features, and uses temporal coordinate regression to refine the boundaries of the proposed regions into stable and accurate motion boundaries.
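The proposal-generation step of the classification network can be sketched as follows; this is a minimal version assuming per-frame motion scores are already available, with an illustrative threshold of 0.5.

```python
# A minimal sketch of generating initial temporal proposals from per-frame
# motion scores: consecutive frames scoring above a threshold are merged
# into one proposal, whose boundaries the regression network would refine.
import numpy as np

def initial_proposals(frame_scores, thresh=0.5):
    active = frame_scores > thresh
    proposals, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            proposals.append((start, i - 1))
            start = None
    if start is not None:
        proposals.append((start, len(active) - 1))
    return proposals

print(initial_proposals(np.array([0.1, 0.7, 0.8, 0.2, 0.9, 0.95, 0.6, 0.1])))
# [(1, 2), (4, 6)]
```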

5. Conclusion

Starting from the working principle of rate control, several rate control methods and their respective applicable scenarios are introduced. The advantages of secondary (two-pass) coding are analyzed, and a rate control algorithm based on secondary coding is proposed. With the help of decoding buffers, the rate allocation errors that occur in real-time coding are analyzed, and the work of rate control is converted into eliminating these errors. A reasonable secondary coding scheme is designed: reasonable initial quantization parameters are set in the single-pass coding process, and the reference relations of coding blocks as well as their bit consumption are counted. For video interframe tampering, which is often accompanied by simultaneous audio tampering, a video-tampering detection method incorporating audio is proposed. First, the ENF signal of the audio is extracted and analyzed to determine suspicious tampering points in the digital audio according to phase continuity and consistency; then the locations of suspicious anomalies in the video frames are located synchronously, while GMSD-based coarse detection of video frame similarity extracts the suspicious anomalies. For the offline training process, this paper borrows data augmentation methods from other fields of computer vision and proposes to generate many complex background images with semantic information from a limited training set by linearly overlaying different images, which enhances the network’s ability to discriminate different motions. The online multiscale block-filter network reorders the improved output and produces an online confidence score for each similar object; it uses online negative-sample mining for online learning, which mitigates tracking drift caused by similar interfering objects. The effectiveness of the proposed algorithm is verified through extensive comparison experiments on authoritative tracking datasets.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

No competing interests exist concerning this study.