Abstract

Large-screen human-computer interaction technology is now present in many aspects of daily life. The dynamic gesture tracking algorithms commonly used in recent large-screen interactive technologies show compelling results but suffer from limited accuracy and real-time performance. This paper addresses these issues with a switching federated filter method that combines the particle filter and Mean Shift algorithms based on a 3D sensor. Compared with several algorithms, the results show that one-hand and two-hand large-screen gesture tracking based on the switching federated filter algorithm works well, with no tracking failure or loss of target. Therefore, the switching federated tracking and positioning algorithm can be applied to the design of human-computer interaction systems and provides new ideas for future human-computer interaction.

1. Introduction

Human-computer interaction technology refers to technology in which, by means of a software program and the corresponding input and output devices, computer operations and human operations are combined to realize interactive communication [1-17]. Current large-screen human-computer interaction technology falls mainly into the following categories: (1) infrared frame touch interaction; (2) capacitive touch-based electronic screen interaction; (3) speech recognition interaction; (4) computer vision interaction. These technologies have their own advantages and disadvantages in terms of cost, accuracy, interactivity, etc. [3-7]. Touch interactions based on infrared frames and capacitive touch screens both rely on interactive hardware fixed in place. Such interaction is therefore costly, and its range and accuracy are limited by hardware constraints [4-8]. Interaction based on speech recognition is often limited by complex, noisy environments; it is mostly used in small-scale civilian applications and therefore has a narrower application area. A human-computer interaction system based on machine vision uses sensors and other devices to collect images or signals, and independently designed interaction software then preprocesses the collected image or digital signals. The targets to be tracked are segmented from the background, and operations such as target tracking and motion recognition complete the corresponding interaction [4, 7-11]. As a result, a human-computer interaction system based on computer vision is relatively low in cost and achieves a good interaction effect.

Hand gesture recognition methods based on computer vision are mainly divided into two categories: analysis methods based on a 3D model and methods based on 2D images. The 3D model method needs to establish a parametric model that describes gestures. Because it provides 3D data, a more accurate gesture model can be established. However, the method has many parameters and high computational complexity, so real-time results are difficult to achieve with current technology. The 2D image method mainly analyzes the appearance of the image and extracts effective hand features for identification. Because the third spatial dimension is lost, the gesture model cannot be established effectively from the described features. On the other hand, it has fewer parameters, so it can meet the requirements of real-time processing. Neither method achieves a good balance between model complexity and real-time performance.

This paper proposes a large-screen interactive imaging system with a switching federated filter method based on a 3D sensor and validates the tracking results on an independently developed gesture position tracking platform. The main innovations of this paper are as follows: (1) an improved switching federated filter algorithm combining particle filtering and Mean Shift is introduced into the field of large-screen gesture tracking to track dynamic gestures; (2) the self-developed gesture tracking platform and 3D interactive software are combined to observe the gesture tracking effects; (3) large-screen one-hand and two-hand gesture tracking based on the switching federated filter algorithm works well, with no tracking failure or loss of target.

The structure of this paper is as follows: In Section 2, an improved switching federated filter algorithm combining particle filtering and Mean Shift is employed to track and locate dynamic gestures. Section 3 presents the experimental verification of dynamic gesture tracking and positioning based on the switching federated filter method. Finally, conclusions are given in Section 4.

2. Dynamic Gesture Tracking Algorithm with Switching Federated Filter

Common gesture tracking algorithms cannot handle the dynamic gesture tracking problem in complex environments. This paper develops an improved switching federated filter algorithm that combines the particle filter and Mean Shift algorithms. When the human hand moves slowly, the mean shift of the particles moves all but a small number of them into the dynamic gesture area. The gesture position can then be obtained without the subsequent particle prediction step, which saves running time. When the hand moves fast or is occluded, the particles are mean shifted as well, but if the gesture region cannot be detected afterward, the particles return to their state before the shift and wait for the algorithm to process the next frame. The key problem is therefore how to select the appropriate filtering algorithm under different conditions. In this paper, the switching system scheme is introduced into the filter for the first time and applied to the large-screen interactive imaging system.

A switching system consists of a family of continuous-time or discrete-time subsystems, described by differential or difference equations, together with a switching rule or switching strategy [4, 18-34]. If the entire filtering process is regarded as a hybrid dynamic system, then each filter algorithm can be regarded as a subsystem of that system. The proposed system couples a switching system under arbitrary switching rules with two filter subsystems. When the human hand moves fast, the particle filter subsystem is chosen; when the hand moves slowly, the Mean Shift subsystem is selected. The switching signal is described by the switching sequence

$$\Sigma = \{(i_0, t_0), (i_1, t_1), \ldots, (i_k, t_k), \ldots \mid i_k \in \{1, 2\},\ k \in \mathbb{N}\},$$

where $(i_k, t_k)$ denotes that the $i_k$-th subsystem is switched on at $t_k$ and the $i_{k-1}$-th subsystem is switched off at $t_k$. In finite time the switching sequence is finite, and no state jump occurs at the switching moments; $t_0$ is the initial time and $t_k$ is the $k$-th switching time. When $t \in [t_k, t_{k+1})$, the trajectory of the switched nonlinear system is produced by the $i_k$-th subsystem, and $\tau_{i_k}$ denotes the minimum interval time of the $i_k$-th subsystem.
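The switching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how "fast" and "slow" are measured, so the frame-to-frame speed test, the `speed_threshold` parameter, and the function names are hypothetical. The sketch records a pair (subsystem index, switching time) each time the active subsystem changes.

```python
import math

PARTICLE_FILTER = 1  # subsystem assumed to handle fast hand motion
MEAN_SHIFT = 2       # subsystem assumed to handle slow hand motion


def hand_speed(prev_pos, curr_pos, dt):
    """Return the hand speed between two consecutive frames."""
    return math.dist(prev_pos, curr_pos) / dt


def switching_sequence(positions, dt, speed_threshold):
    """Build the list of (subsystem index, switching time) pairs.

    A pair is appended only when the active subsystem changes, so the
    sequence stays finite for a finite number of frames.
    speed_threshold is a hypothetical tuning parameter.
    """
    sequence = []
    active = None
    for k in range(1, len(positions)):
        speed = hand_speed(positions[k - 1], positions[k], dt)
        chosen = PARTICLE_FILTER if speed > speed_threshold else MEAN_SHIFT
        if chosen != active:
            sequence.append((chosen, k * dt))
            active = chosen
    return sequence
```

In practice a minimum interval between switches could be enforced on top of this rule so that the tracker does not chatter between the two subsystems when the speed hovers near the threshold.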

The common particle filter implementation framework is based on sequential importance sampling and resampling and is known as the sampling importance resampling (SIR) filter [35-37]. It consists of three basic steps that form an iterative cycle:

Step 1 (sampling). The particle states are updated using the Bayesian posterior estimate at the previous moment and the state transition equation, which yields the predicted distribution (i.e., the prior distribution).

Step 2 (weight update). Based on the latest observation $z_k$, the likelihood function is used to update the particle weights:

$$w_k^i \propto w_{k-1}^i \, \frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)},$$

where $w_k^i$ represents the weight of the $i$-th particle, $q(\cdot)$ represents the importance distribution function, and $p(\cdot)$ represents the posterior probability density.

Step 3 (resampling). After the weight update is completed, the updated particle set is resampled so that the new particles are identically distributed according to the weights. A new particle set dominated by the high-weight particles is then obtained. For unbiased resampling, the expected number of times the $i$-th particle is sampled is

$$E(N_k^i) = N w_k^i,$$

where $N$ is the total number of particles and $w_k^i$ is the normalized weight of the $i$-th particle.
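The three steps above can be sketched for a one-dimensional state as follows. This is a minimal illustration, not the implementation used in this paper: the random-walk motion model, the Gaussian observation likelihood, multinomial resampling, and all names (`sir_step`, `motion_std`, `obs_std`) are assumptions chosen to keep the example short.

```python
import math
import random


def sir_step(particles, weights, observation, motion_std, obs_std, rng):
    """Run one SIR cycle and return (resampled particles, uniform weights)."""
    n = len(particles)

    # Step 1 (sampling): propagate each particle through the assumed
    # random-walk state transition to obtain the predicted distribution.
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]

    # Step 2 (weight update): with the transition prior used as the
    # importance distribution, the update reduces to multiplying the old
    # weight by the likelihood of the observation.
    updated = [
        w * math.exp(-0.5 * ((observation - x) / obs_std) ** 2)
        for x, w in zip(predicted, weights)
    ]
    total = sum(updated)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        updated = [1.0 / n] * n
    else:
        updated = [w / total for w in updated]

    # Step 3 (resampling): draw n particles with probability equal to
    # their weight, so a particle of weight w is expected n * w times.
    resampled = rng.choices(predicted, weights=updated, k=n)
    return resampled, [1.0 / n] * n
```

A caller would create `rng = random.Random(seed)`, initialize the particles around the first detection, and call `sir_step` once per frame; the mean of the returned particles serves as the state estimate.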

Meanwhile, the Mean Shift algorithm, the other filter subsystem, uses nonparametric density estimation: it searches iteratively along the gradient direction in feature space to find the sample data with the local density maximum [12-14]. Compared with other tracking methods based on optimized matching search, the advantage of the Mean Shift algorithm is that it does not need to know the characteristics of the feature space in advance; the only requirement is the relevant sample data, and the filtering is performed on these sample points. The algorithm therefore has low computational complexity, is simple to implement, and has strong real-time performance, which suits systems with demanding real-time requirements. The most critical step in the Mean Shift algorithm is to calculate the offset mean for each point and then update the position of the point based on the newly calculated offset mean.

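For $n$ sample points $x_i$, $i = 1, \ldots, n$, in a given $d$-dimensional space $R^d$, the basic form of the Mean Shift vector at point $x$ is

$$M_h(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x),$$

where $S_h$ is a high-dimensional sphere with radius $h$ and $k$ is the number of sample points that fall inside $S_h$. $S_h$ is defined as

$$S_h(x) = \{\, y : (y - x)^T (y - x) \le h^2 \,\}.$$

This basic form has a fundamental problem: within the region $S_h$, every point $x_i$ has the same effect on $M_h(x)$. In fact, this effect should depend on the distance between $x$ and each point, and different samples differ in importance. To account for this, a kernel function and sample weights are added to the basic Mean Shift vector, which gives the modified form

$$M_h(x) = \frac{\sum_{i=1}^{n} G_H(x_i - x)\, w(x_i)\, (x_i - x)}{\sum_{i=1}^{n} G_H(x_i - x)\, w(x_i)},$$

where

$$G_H(x_i - x) = |H|^{-1/2}\, G\!\left(H^{-1/2}(x_i - x)\right).$$

$G(x)$ is a unit kernel function. $H$ is a positive definite symmetric $d \times d$ matrix, called the bandwidth matrix, and is taken here to be diagonal. $w(x_i) \ge 0$ is the weight of each sample. The diagonal matrix $H$ has the form

$$H = \mathrm{diag}\left[h_1^2, \ldots, h_d^2\right].$$

With a single bandwidth, $H = h^2 I$, the above Mean Shift vector can be rewritten as

$$M_h(x) = \frac{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)\, (x_i - x)}{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)}.$$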

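The Mean Shift vector is a normalized probability density gradient, and the Mean Shift algorithm uses it to find a local maximum of the probability density. For a probability density function $f(x)$ and $n$ known sampling points $x_i$, $i = 1, \ldots, n$, in $d$-dimensional space, the kernel estimate of $f(x)$ is

$$\hat{f}(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right) w(x_i)}{h^d \sum_{i=1}^{n} w(x_i)},$$

where $w(x_i) \ge 0$ is a weight assigned to the sample point $x_i$ and $K(x) = k(\|x\|^2)$ is a kernel function with profile $k$. The estimate of the gradient of the probability density function is

$$\nabla \hat{f}(x) = \frac{2 \sum_{i=1}^{n} (x - x_i)\, k'\!\left(\left\|\frac{x_i - x}{h}\right\|^2\right) w(x_i)}{h^{d+2} \sum_{i=1}^{n} w(x_i)}.$$

Let $g(x) = -k'(x)$ and $G(x) = g(\|x\|^2)$; then

$$\nabla \hat{f}(x) = \frac{2}{h^2} \left[ \frac{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)}{h^d \sum_{i=1}^{n} w(x_i)} \right] \left[ \frac{\sum_{i=1}^{n} (x_i - x)\, G\!\left(\frac{x_i - x}{h}\right) w(x_i)}{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)} \right].$$

The second square bracket is the Mean Shift vector, which is therefore proportional to the probability density gradient. The Mean Shift vector can be rearranged as

$$M_h(x) = \frac{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)\, x_i}{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)} - x.$$

Defining $m_h(x) = \dfrac{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)\, x_i}{\sum_{i=1}^{n} G\!\left(\frac{x_i - x}{h}\right) w(x_i)}$, the above formula becomes

$$M_h(x) = m_h(x) - x.$$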
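The last relation says that each Mean Shift iteration moves the current point to the kernel-weighted mean of the samples. The following Python sketch shows that iteration under stated assumptions: a Gaussian kernel is used for the kernel function, a single scalar bandwidth `h` is used, and the function names, `tol`, and `max_iter` are hypothetical choices that do not come from this paper.

```python
import math


def mean_shift_vector(x, samples, weights, h):
    """Return the Mean Shift vector at x, i.e. the weighted mean minus x."""
    dim = len(x)
    numerator = [0.0] * dim
    denominator = 0.0
    for xi, wi in zip(samples, weights):
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x)) / (h * h)
        g = math.exp(-0.5 * sq_dist) * wi  # Gaussian kernel value times weight
        denominator += g
        for d in range(dim):
            numerator[d] += g * xi[d]
    if denominator == 0.0:
        return [0.0] * dim  # no sample has any influence; stay in place
    return [numerator[d] / denominator - x[d] for d in range(dim)]


def mean_shift(x, samples, weights, h, tol=1e-6, max_iter=100):
    """Follow the Mean Shift vector until the shift is shorter than tol."""
    x = list(x)
    for _ in range(max_iter):
        shift = mean_shift_vector(x, samples, weights, h)
        x = [a + s for a, s in zip(x, shift)]
        if math.sqrt(sum(s * s for s in shift)) < tol:
            break
    return x
```

Starting the search from the previous gesture position keeps the number of iterations small when the hand moves slowly, which is the case in which the Mean Shift subsystem is selected.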

The process after the integration of the specific algorithm is as follows.

Step 1. In the initial frame, all the particles are distributed in the gesture area according to the Gaussian distribution.

Step 2. In the next frame, Mean Shift is applied to all particles, the Bhattacharyya coefficient after shifting is taken as the weight of each particle, the weights are normalized, and the number of effective particles is calculated.

Step 3. Determine whether the number of effective particles is greater than the set threshold.

Step 4. If the number of effective particles is greater than the set threshold, the target area lies within the particle distribution area. All particles are then sorted by weight from highest to lowest, and the particles whose weights exceed the weight threshold are used to calculate the location of the gesture area.

Step 5. Take the center of the region calculated in Step 4 as the seed, segment the gesture by region growing, build the ellipse model of the gesture from the segmented region, and perform importance resampling of the particles.

Step 6. If the number of effective particles is less than the set threshold, spread the particles to the surroundings according to a Gaussian distribution.

Step 7. Apply Mean Shift to all particles, take the Bhattacharyya coefficient after shifting as the weight of each particle, normalize the weights, and calculate the number of effective particles.

Step 8. If the number of effective particles is greater than the set threshold, go to Step 4.

Step 9. If the number of effective particles is less than the set threshold, determine whether the particle weights have increased.

Step 10. If the weights of some particles have increased, a small part of the particles has spread into the gesture area; perform importance resampling of the particles and go to Step 7. If the particle weights have not increased and the particles have been spread to the surroundings fewer than 3 times, go to Step 6. If they have already been spread 3 times, it is considered that there is no target area in the frame; the particles are restored to their state before shifting, and processing moves on to the next frame.

The algorithm flowchart is shown in Figure 1.
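The decision logic of the steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the similarity measure used as the particle weight is taken to be the Bhattacharyya coefficient between two normalized histograms (the histogram feature itself is an assumption), the gesture location is taken to be the weighted mean of the selected particles, and the number of effective particles is passed in as an input because the text does not give its formula. All function names are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def locate_gesture(particles, weights, weight_threshold):
    """Weighted mean of the particles whose weight exceeds the threshold.

    Returns None when no particle qualifies. Using a weighted mean is an
    assumption; the text only says these particles give the location.
    """
    selected = [(p, w) for p, w in zip(particles, weights) if w > weight_threshold]
    if not selected:
        return None
    total = sum(w for _, w in selected)
    dim = len(selected[0][0])
    return tuple(sum(p[d] * w for p, w in selected) / total for d in range(dim))


def next_action(n_effective, n_threshold, weight_increased, spread_count, max_spreads=3):
    """Choose the next step after Mean Shift and weight normalization.

    Returns "locate", "resample", "spread", or "next_frame".
    """
    if n_effective > n_threshold:
        return "locate"      # Step 4: target lies inside the particle area
    if weight_increased:
        return "resample"    # Step 10: some particles reached the gesture
    if spread_count < max_spreads:
        return "spread"      # Step 6: scatter the particles and try again
    return "next_frame"      # no target found: restore particles, move on
```

Before the first spread in a frame, the caller would pass `weight_increased=False` and `spread_count=0`, so a low effective particle count leads directly to spreading, as in Step 6.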

3. Dynamic Gesture Tracking and Positioning Experiment Verification Based on Switching Federated Filter Method

In this paper, two experiments are designed to verify the gesture tracking and positioning effect of the proposed switching federated filter algorithm. The first experiment uses the self-developed human-computer interaction positioning software platform to observe the filtering effect of the switching federated filter algorithm. The second is a gesture tracking and positioning experiment based on 3D interactive software. It compares the switching federated filter algorithm with several algorithms in terms of tracking accuracy and tracking time.

3.1. Gesture Tracking and Positioning Experiment Based on Self-Developed Human-Computer Interactive Platform

The self-developed human-computer interactive positioning software platform generates an interactive operation interface through a projector on an arbitrary wall or curtain. The Kinect sensor is generally fixed 1-3 meters directly above the screen and 5-6 centimeters from the wall surface [4, 6, 8]. The projector is placed in front of the wall, and the size of the projection surface can be changed by adjusting the distance. The physical setup is shown in Figure 2. The operator stands in the interactive area and can click, swing, rotate, and grab with their limbs in front of the projection surface.

Figures 3 and 4 show the effect of gesture positioning before filtering and after switching federated filtering, respectively. In Figure 3, the image on the left, taken by the Kinect sensor, is used to recognize the position of the gesture, and the position and relative coordinates of the gesture calculated by the human-computer interaction software platform are shown on the right. The green trapezoidal area represents the interactive area recognized by the sensor. The cusp at the bottom of the red line represents the position of the recognized gesture. The black numbers next to the cusp in Figure 3 are the relative 2D coordinates of the gesture in the interactive area.

Comparing Figures 3 and 4, the multiple red line cusps in Figure 3 indicate that, without filtering, the platform cannot reject interference and so cannot determine the relative gesture position. After the switching federated filter algorithm is added, only one red line appears in Figure 4, so the platform can accurately determine the relative gesture position. The proposed switching filter algorithm therefore improves the accuracy of the gesture tracking algorithm.

3.2. Gesture Tracking and Positioning Experiment Based on 3D Interactive Software

There are many commonly used target tracking algorithms. This paper takes the typical Cam-shift algorithm as the baseline and compares it with the proposed algorithm in an actual large-screen application. Cam-shift is a commonly used gesture tracking algorithm that is good at tracking solid-colored objects against a black-and-white background. However, when the contrast between the background color and the target is not obvious, its tracking effect is poor [15-17, 38-40]. To verify the effectiveness of the switching federated filter algorithm, a gesture tracking and positioning experiment based on the 3D interactive software is presented. Specifically, the large-screen projection system of the independently developed interactive platform is replaced with a large-screen splicing display system, and the 3D interactive software is used for real-time display.

This section compares the Cam-shift algorithm with the proposed switching federated tracking algorithm. The gesture trajectories tracked by the two algorithms are transmitted to the computer. The trace tracking function in the 3D interactive software converts the gesture trace into the trace of the mouse pointer and displays it on the splicing screen, so the tracking effect of each algorithm can be observed from the mouse trajectory on the splicing screen. Because the interactive software treats the position of the gesture as the mouse position, the mouse track is the gesture tracking trajectory. Figure 5 is an actual effect diagram of one-hand gesture tracking using the proposed switching federated filter algorithm, and Figure 6 is the corresponding computer screenshot for Figure 5. Figure 7 is the one-hand gesture tracking effect based on the Cam-shift algorithm, and Figure 8 is the corresponding computer screenshot for Figure 7.
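Treating the gesture position as the mouse position amounts to mapping the relative coordinates of the gesture onto screen pixels. A minimal Python sketch of that mapping is given below. It assumes that the relative coordinates are normalized to the range 0 to 1 in both directions, which the source does not state; the function name and parameters are hypothetical.

```python
def gesture_to_cursor(rel_x, rel_y, screen_width, screen_height):
    """Map relative gesture coordinates in [0, 1] to a pixel position.

    Values outside [0, 1] are clamped so the cursor stays on the screen.
    """
    rel_x = min(max(rel_x, 0.0), 1.0)
    rel_y = min(max(rel_y, 0.0), 1.0)
    px = round(rel_x * (screen_width - 1))
    py = round(rel_y * (screen_height - 1))
    return px, py
```

Applying this mapping to every tracked frame and joining the resulting pixel positions gives the mouse trajectory that is inspected in the screenshots.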

As can be seen from Figure 5, when one hand moves in front of the screen, the gesture is tracked well. The 3D interactive software accurately tracks and displays the gesture track. In Figure 6, there is no failed tracking point on the trajectory (a failed trajectory point is displayed as a red exclamation point), so gesture tracking based on the switching federated filter algorithm works well. Figure 7 is a schematic diagram of one-hand operation based on the Cam-shift algorithm. When one hand moves in front of the screen, tracking failures occur. The computer screenshot of the one-hand gesture trajectory is shown in Figure 8. It can be seen from Figure 8 that the trajectory contains failed tracking points, so gesture tracking based on the Cam-shift algorithm does not perform well.

The two-hand gesture tracking trajectories are shown next. Figure 9 is the actual effect diagram of two-hand gesture tracking based on the switching federated filter algorithm, and Figure 10 is the corresponding computer screenshot for Figure 9. As Figure 9 shows, the gestures are tracked well when both hands move in front of the screen. The 3D interactive software accurately tracks and displays the gesture tracks. In Figure 10, there is no failed tracking point on the trajectory, so gesture tracking based on the switching federated filter algorithm works well. Figure 11 is a schematic diagram of two-hand operation based on the Cam-shift algorithm. When both hands move in front of the screen, gesture tracking fails. The computer screenshot of the two-hand gesture trajectory is shown in Figure 12; the trajectory contains failed tracking points (a failed trajectory point is displayed as a red exclamation point), so gesture tracking based on the Cam-shift algorithm does not perform well.

To compare the accuracy and tracking time at different sampling points for the federated tracking algorithm and several commonly used algorithms [41], this paper tracks the image sequence at 200 different sampling points and calculates the tracking accuracy and tracking time of each algorithm. The particle filter (PF) is a Monte Carlo approximation to the optimal Bayesian filter and provides robust tracking of moving objects in a cluttered environment, especially for nonlinear and non-Gaussian problems where the interest lies in the detection and tracking of moving objects. The velocity adaptive model (VAPF) updates the propagation distance according to the temporal difference of previous frames by calculating the average velocity.

Table 1 shows the tracking accuracy of the different algorithms at different sampling points. The correct tracking rate (%) is listed below each algorithm name. TA indicates the tracking algorithm, and MP indicates the number of sampling points.

Table 2 shows the tracking time of the different algorithms at different sampling points. The tracking time (ms) is listed below each algorithm name. TA indicates the tracking algorithm, and MP indicates the number of sampling points.

Tables 1 and 2 show that the switching federated tracking algorithm proposed in this paper achieves higher tracking accuracy and shorter tracking time than the other filter algorithms. The proposed switching federated filter algorithm therefore greatly improves the real-time performance and accuracy of gesture tracking and can be applied in large-screen interactive imaging systems.

4. Conclusions

Recent gesture tracking algorithms for large-screen interactive imaging systems suffer from inaccuracy and poor real-time performance. This paper proposed a switching federated gesture tracking algorithm combining the particle filter and Mean Shift algorithms. The federated filter algorithm first applies Mean Shift to the particles so that most of them drift into the gesture region. The subsequent particle prediction step can then be omitted, which saves running time. In the experiments, this paper first compares the positioning results before and after filtering on the self-designed software platform and shows that the switching federated filter is effective in removing positioning interference. Then, the trajectory display function of the 3D interactive software is used to compare the algorithm with Cam-shift on an actual large screen: whether tracking a one-hand or a two-hand gesture, the federated filter algorithm neither fails to track nor loses the target. Therefore, the switching federated filter algorithm can solve the problem of low accuracy and poor real-time performance in dynamic gesture tracking. The algorithm effectively reduces the impact of complex environments on tracking and can be applied to interactive imaging systems such as large screens.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (61403268, 61873176); Natural Science Foundation of Jiangsu Province, China (BK20181433); Fundamental Research Funds for the Central Universities (30918014108); Natural Science Fund for Colleges and Universities in Jiangsu Province (16KJB120005); Open Fund for Collaborative Innovation Center of Industrial Energy-Saving and Power Quality Control, Anhui University (KFKT201806).