Information and Modeling in Complexity
Mixed Signature: An Invariant Descriptor for 3D Motion Trajectory Perception and Recognition
A motion trajectory contains rich information about a moving object, for example, a human gesture or a robot action. Motion perception and recognition via trajectories are useful for characterizing such motions, and a flexible trajectory descriptor plays an important role in motion analysis. However, in existing work, trajectories are mostly used as raw data, and an effective descriptor is lacking. In this paper, we present a mixed invariant signature descriptor with global invariants for motion perception and recognition. The mixed signature is viewpoint invariant for both local and global features. A reliable approximation of the mixed signature is proposed to reduce the noise in high-order derivatives. We use this descriptor for motion trajectory description and explore motion perception with the DTW algorithm for salient motion features. To achieve better accuracy, we modify the CDTW algorithm for trajectory matching in motion recognition. Furthermore, a controllable weight parameter is introduced to adjust the global features for tasks in different circumstances. Experiments validate the proposed method.
1. Introduction
A motion trajectory is the time-ordered sequence of positions of a moving object in spatiotemporal motion. Human hand gestures, body movements, robot arm actions, and other complex long-term motions can all be recorded by their trajectories. A motion trajectory contains dynamic information that is useful for characterizing and identifying motions. Trajectory-based motion analysis is important for many applications, such as human behavior recognition, human intention perception with prediction, and motion modeling for robot Learning by Demonstration (LbD) [2, 3]. Most of these applications are useful for human-robot interaction, in which motion analysis of human and robot actions is important. As behaviors and activities are mostly performed in space, 3D trajectory analysis is much more important than 2D curve approaches. The information contained in a trajectory can be extracted and used in motion perception and recognition. For example, in a surveillance application, the intentions of people need to be captured by perceiving their motions and predicting their further motions. Here, an effective trajectory descriptor for motion modeling that can extract adequate and overall motion information is valuable for motion description, perception, and recognition.
In existing work, trajectories were mostly used directly as raw data. However, raw data rely on the absolute positions of the motion and are thus computationally inefficient and sensitive to noise. They cannot capture shapes and detailed features under changes of viewpoint. Therefore, most information in 3D space cannot be captured directly from raw data, and a flexible and adaptable trajectory descriptor for 3D motions is needed. Flexibility and adaptability here can be measured by four criteria: (1) the ability to capture salient features, (2) accuracy in characterizing motion trajectories, (3) computational efficiency, and (4) invariance under different viewpoints and circumstances.
To capture the salient features of motion, the concept of a shape descriptor was developed. Most shape descriptors, such as Centroid-Contour Distance (CCD) and the R-S curve, lack adaptability for tasks under different environments. Chain code is a discrete method that maps each trajectory sample to an apex point of the square around the sample position. The shape indexing approach uses tokens to describe local shapes with their curvatures and directions via the M-tree structure. Another local shape description is the Dominant Polygonal (DP), which uses a polygonal curve to describe local dominant shapes. Contour functions, such as the Cross-Section Function, Radius-Vector Function, Support Function, and Tangent-Angle Function, are used to describe local shapes. The curvature scale method is also used for curve description. These methods are limited under spatial transformation and can only be used for simple shapes of regular human gestures, so they are not appropriate for complex long-term motions.
Mathematical descriptors, such as B-Spline, NURBS, and the Bezier curve, were also used to describe motion trajectories. Trajectories and curves are represented by a group of control point parameters of the spline curve. Obtaining the spline parameters for shape representation requires fitting these mathematical descriptors to the data, and this fitting inevitably introduces inaccuracy.
Transform functions, such as the Fourier Descriptor (FD), wavelet coefficients [15, 16], and the Radon transform, were used in existing work to describe the global features of curves and motion trajectories. These transformation methods represent the shape of a curve or trajectory as a whole, so local features are lost and normalization is always a problem. Therefore, transform functions are not suitable as descriptors here.
Invariant descriptors are flexible for motion trajectory representation in different circumstances. Geometric invariants are invariant to the shape of a curve and can therefore describe the same shape under different viewpoints or transformations, as can algebraic invariants [19, 20]. The moment invariant function is also an invariant description that abstracts global features, but it cannot handle complex and long-term trajectories. The method in  is viewpoint invariant for projected two-dimensional recognition rather than real spatial data analysis. Several relevant articles [23–25] inspired our work on motion analysis.
Some newer methods of shape description have been used for motion analysis in recent studies. Ogawara et al. use motion density to detect repeated motion patterns, which is only useful for coincident motions. Prati et al. use motion trajectory angles for description, and Faria and Dias use curvature and orientation to model hand motion trajectories; these methods are limited to simple shapes and need more than 3D coordinates. The method in  is viewpoint invariant, but it is only useful for the analysis of periodic motion. The context of motion is used for modeling trajectories in , but the composition of multitrajectory context is not flexible for a single point in complex trajectory analysis.
Differential invariants in the previous signature descriptor are invariant under translation, rotation, scaling, and occlusion [31–33]. The signature descriptor is adaptable to motions under these spatial transformations and is efficient in motion analysis. These differential invariants describe local shapes well, but global features are lost in this descriptor. Trajectories are usually characterized as a whole in motion analysis, so the global information of each point in a trajectory is also important for these tasks.
In this paper, we propose a new descriptor, the mixed signature, which contains not only differential invariants but also global invariants. The global invariants capture the global features that are necessary for trajectory perception and recognition. Apart from containing global information, the mixed signature inherits the advantage of the previous signature descriptor: it is also invariant under spatial transformation, including translation, rotation, scaling, and occlusion. We use this descriptor to model motion trajectories for motion perception and recognition. A large database is used in the experiments to test the efficiency of our method.
The remainder of this paper is organized as follows. Section 2 presents the definition of the mixed signature descriptor. Theories of motion perception and recognition based on this descriptor are elaborated in Sections 3 and 4, respectively. Section 5 presents experiments and result analysis. The paper is concluded in Section 6.
2. Mixed Signature for Trajectory Representation
A motion trajectory is a sequence of discrete points which represent the positions of the moving object in every frame. The 3D coordinates of these positions are raw data of trajectory, denoted as , where is the index number of the frame sequence and is the trajectory length. Figure 1 shows a piece of 3D motion trajectory.
The mixed signature of is defined with differential invariants and global invariants as follows: where
The parameters defined in (2.2)–(2.5) are the differential invariants: curvature , torsion and their derivatives and ; here, s is arc length defined in (2.6). As these differential invariants have been presented in the previous signature descriptor [31–33], we only discuss the global invariants in this paper. Parameter s denotes the arc length from the beginning of the trajectory to the present point, which represents different phases of the motion. The geometric centre of the trajectory is denoted by , and parameter represents the geometric distance from the present point to the center . These two global invariants capture the relation between the present point and the whole trajectory, as illustrated in Figure 2.
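For reference, the standard differential-geometry forms of these quantities can be written as below. This is a sketch under assumed notation: the symbols $\Gamma(t)$ for the trajectory, $C$ for its geometric centre, and the lower integration limit $1$ for the first frame are labels chosen here for illustration and may differ from the notation of (2.2)–(2.6).

```latex
\begin{align}
\kappa(t) &= \frac{\lVert \dot{\Gamma}(t) \times \ddot{\Gamma}(t) \rVert}{\lVert \dot{\Gamma}(t) \rVert^{3}}, &
\tau(t) &= \frac{\bigl(\dot{\Gamma}(t) \times \ddot{\Gamma}(t)\bigr) \cdot \dddot{\Gamma}(t)}{\lVert \dot{\Gamma}(t) \times \ddot{\Gamma}(t) \rVert^{2}}, \\
\kappa_s(t) &= \frac{d\kappa}{ds} = \frac{\dot{\kappa}(t)}{\lVert \dot{\Gamma}(t) \rVert}, &
\tau_s(t) &= \frac{d\tau}{ds} = \frac{\dot{\tau}(t)}{\lVert \dot{\Gamma}(t) \rVert}, \\
s(t) &= \int_{1}^{t} \lVert \dot{\Gamma}(u) \rVert \, du, &
r(t) &= \lVert \Gamma(t) - C \rVert .
\end{align}
```

The curvature and torsion need second- and third-order derivatives of the trajectory, and their arc-length derivatives need one order more, which is why noise sensitivity becomes a concern for discrete samples.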
These two global features supply the global information that is required, especially when the motion is characterized as a whole. For example, when we demonstrate a task to a mobile robot in PbD learning, the distribution of different parts of the motion cannot be ignored for the integral structure of the task. Figure 3 shows a case of motion trajectory classification with the Dynamic Time Warping (DTW) algorithm and Euclidean distance, respectively, where different distributions of similar local features significantly alter the result. The parameter in Figure 3 denotes the distance, that is, the degree of difference between the trajectories in matching. In this case, DTW with only local features classifies the trajectories wrongly, which leads to motion confusion. The raw data contain much more global information because they are represented by absolute positions in 3D space. Hence, the matching result in Figure 3(b) is significantly different from that in Figure 3(a). However, raw data are inflexible: most motion information cannot be extracted from them, and they are not invariant under spatial transformation as the signature invariants are. Therefore, the mixed signature is a tradeoff that preserves invariance while capturing as much global information as possible.
The global invariants and illustrate the relation between and the whole trajectory, which is independent of spatial transformation. No matter how the viewpoint is changed or the scale of the trajectory altered, the values of the global invariants stay the same for every sample. In this way, the mixed signature inherits the invariance of the previous signature under translation, rotation, scaling, and occlusion, with extra global information. Figure 4 shows the six invariants of the mixed signature against the time index.
As the trajectory is a discrete sequence, the calculation of the high-order derivatives and integrals of the mixed signature will result in errors due to its sensitivity to noise. To avoid calculating the high-order derivatives directly, we replace the accurate invariants with an approximation based on several neighboring points of the sample. This approximation will reduce the sensitivity to noise with only the lowest-order derivatives . The approximations of differential invariants have been discussed in [31–33] and we only present the approximations of global invariants in this paper.
As shown in Figure 2, we use line segments between points instead of the arc length, and the approximation is the summed length of the segments from the beginning to the present point . Similarly, the geometric center is calculated by averaging all the points of the trajectory. Equation (2.8) defines the approximations and , which are normalized by the whole trajectory, where
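The approximation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact form of (2.8): the function name `global_invariants` is hypothetical, and dividing both quantities by the total chord length is an assumed reading of "normalized by the whole trajectory".

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def global_invariants(points: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Return (s_star, r_star) for every sample of a 3D trajectory.

    s_star: cumulative chord length from the first sample (approximate arc length).
    r_star: distance from the sample to the mean of all samples.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")

    # Line segments between neighbouring samples stand in for the arc length.
    cumulative = [0.0]
    for prev, cur in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.dist(prev, cur))
    total = cumulative[-1]
    if total == 0.0:
        raise ValueError("trajectory has zero length")

    # Geometric centre: the average of all samples.
    centre = tuple(sum(p[k] for p in points) / n for k in range(3))

    # ASSUMPTION: both invariants are normalized by the total chord length.
    s_star = [s / total for s in cumulative]
    r_star = [math.dist(p, centre) / total for p in points]
    return s_star, r_star
```

With this normalization the two sequences do not change when the whole trajectory is translated or uniformly scaled, which is the property the global invariants are meant to have.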
The mixed signature is calculated from the discrete samples of the trajectory, and there may be more than one sample at the same position under the condition that . We call this position a stationary point, and there are two types of stationary point in a motion trajectory. In the first type, the motion alters its direction, so the object must have zero speed for a moment at the turning point. This case is shown in Figure 5. Such a point is important for motion analysis in perception and recognition. In the second type, the motion is interrupted because the behavior accidentally pauses, and the position is recorded for several frames. Such a point is noise in motion analysis, and the repeatedly sampled points should be removed so that only one sample stays in the trajectory.
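Removing repeated samples can be sketched as follows. The function name `collapse_repeated_samples` and the tolerance `eps` are hypothetical; the sketch treats two consecutive samples as the same position when they are closer than `eps`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def collapse_repeated_samples(points: Sequence[Point], eps: float = 1e-9) -> List[Point]:
    """Keep only the first sample of each run of consecutive coincident samples."""
    kept: List[Point] = []
    for p in points:
        # A sample is dropped only if it coincides with the last kept sample.
        if not kept or math.dist(kept[-1], p) > eps:
            kept.append(p)
    return kept
```

A turning point survives this step as a single sample, so the direction change it marks is still available to later analysis.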
The noise in trajectory will affect the calculation of the invariants, because the differential invariants are local parameters which depend only on several nearby samples. Trajectory smoothing is an important process to reduce the noise and vibration in a trajectory for accurate calculation of invariants. In our method, we use the wavelet smoother and it proved to be effective with acceptable shape deviations . In this process of smoothing by the wavelet smoother, trajectory shape can also be preserved, as the decomposition level of wavelet smoother can be tuned according to the noise strength (see Figure 6).
In a real motion trajectory, there are usually some outlier samples that contain significant errors and badly affect the calculation of invariants. The single distance between outlier samples on separate trajectories is larger than that between normal ones. In this paper, we set a threshold on the single distance of every pair of corresponding points to filter out the outlier samples. The threshold can be set in two ways: as a local threshold or as a global threshold. The local one is calculated dynamically from the distances of nearby samples. The global one is calculated from the distances to a mean route, which is computed beforehand without any threshold. For general motion trajectories, the global threshold works well. However, in some cases where the motions are complex with different working situations, trajectories suffer from varying covariance of distances. In these cases, the local threshold performs better than the global one.
3. Motion Perception
Motion perception is an important method in the analysis of human and robot gestures: it captures their salient features. From these features, we can perceive the intention of a motion or identify motions from a database. Salient features can be captured through properties of motion, such as speed, symmetry, period, and feature shapes. These properties have been studied in existing work on motion analysis, and we discuss symmetry and period to illustrate the use of the mixed signature for motion perception.
3.1. Motion Symmetry Perception
There is a symmetrical point in the center of the symmetrical part of the trajectory, denoted . All the corresponding points are symmetrical with respect to this point. The symmetrical part may be only a segment of a trajectory. We classify all the symmetrical conditions in 3D space into two basic classes. In the first class, the corresponding points have the same distance to the symmetrical plane. That is to say, this plane acts as a mirror between the corresponding points, and one point is the image of the other. This case, denoted mirror symmetry, is shown in Figure 7. In the other class, the corresponding points have the same distance to the center point. One part will cover the other part if it is rotated by a certain angle around the center point. We denote this case central symmetry, as shown in Figure 8. The methods to perceive these two classes of symmetrical motions are described in the following two theorems.
Theorem 3.1. As is the symmetrical center, a pair of corresponding symmetrical points, for example, and hold the following relations:
Proof. The symmetrical part is composed of two corresponding segments which are the same in shape, so that . A left-hand helix turns into a right-hand helix in a mirror, which explains . As (3.1) proves that the function is an even function, the derivative of with respect to should be an odd function, and we have . Similarly, we have . The distances and arc lengths from to and should be the same, as shown in Figure 7. Hence, we have and .
Theorem 3.2. As one part will cover the corresponding part of the trajectory by rotating round the center point, the properties of corresponding points will be the same. Hence, a pair of corresponding symmetrical points, for example, and , hold the following relations:
Proof. The curvature and its derivative are the same as those of Theorem 3.1, and (3.11)-(3.12) are the same as (3.5)-(3.6) as well. The torsion direction does not change under rotation, and the corresponding parts in Figure 8 are both left-hand helices. Then, we have and .
We can perceive motion symmetry by detecting the relevant properties listed in these two theorems. The central point should be located first, according to and for mirror symmetry while and for central symmetry. Then, we check every pair of corresponding points via the equations in the theorems and confirm the length of the symmetrical part. If the number of matched corresponding points is zero, this center point should be discarded. Equations (3.5)-(3.6) with global parameters are necessary in motion perception, and they were not considered in the previous signature descriptor. Because the differential invariants are scale invariant, trajectory segments with similar shapes in different sizes would be erroneously perceived as symmetrical without these global equations. We will discuss this condition in the implementations in Section 5.2.
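The pairwise check can be sketched as below. This is an illustration under several assumptions: the names `Sig`, `pair_is_symmetric`, and `symmetric_length` are hypothetical; the sign relations are inferred from the wording of the two proofs (curvature even and its derivative odd about the center, torsion flipping sign only in the mirror case); and samples are paired by index around the center, which presumes evenly spaced samples rather than the DTW alignment used for real data.

```python
import math
from typing import NamedTuple, Sequence


class Sig(NamedTuple):
    """One sample of the mixed signature (field names are illustrative)."""
    kappa: float    # curvature
    kappa_s: float  # derivative of curvature w.r.t. arc length
    tau: float      # torsion
    tau_s: float    # derivative of torsion w.r.t. arc length
    s: float        # arc length from the start of the trajectory
    r: float        # distance to the geometric centre


def pair_is_symmetric(a: Sig, b: Sig, centre: Sig, mirror: bool, tol: float = 1e-6) -> bool:
    """Check one pair of candidate corresponding points around `centre`."""
    def close(x: float, y: float) -> bool:
        return math.isclose(x, y, abs_tol=tol)

    # ASSUMPTION: torsion changes sign under mirror symmetry, not under central symmetry.
    tau_sign = -1.0 if mirror else 1.0
    return (
        close(a.kappa, b.kappa)
        and close(a.kappa_s, -b.kappa_s)
        and close(a.tau, tau_sign * b.tau)
        and close(a.tau_s, -tau_sign * b.tau_s)
        # Global conditions: same distance to the centre, same arc length to the centre point.
        and close(a.r, b.r)
        and close(centre.s - a.s, b.s - centre.s)
    )


def symmetric_length(signature: Sequence[Sig], c: int, mirror: bool, tol: float = 1e-6) -> int:
    """Number of consecutive symmetric pairs around index `c` (0 means: discard `c`)."""
    k = 0
    while (
        c - (k + 1) >= 0
        and c + (k + 1) < len(signature)
        and pair_is_symmetric(signature[c - k - 1], signature[c + k + 1], signature[c], mirror, tol)
    ):
        k += 1
    return k
```

Dropping the two global conditions from `pair_is_symmetric` would accept a segment and its scaled copy as symmetrical, which is the failure case that the global equations are meant to exclude.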
For better analysis of motion symmetry, the coordinates of / and / are useful for extracting the properties. Figures 9 and 10 show the coordinates of the subsignature in the two classes of symmetry. From the figures we can identify the center point, and there may be multiple center points in one trajectory. The symmetrical properties of the differential invariants are also observed in these figures.
3.2. Periodic Motion Perception
Periodic motion is common in human and robot activity, and motion analysis captures more information in repeated tasks by perceiving this motion property. We can perceive this feature according to the properties of a periodic trajectory. Periodic motion is periodic in almost all of its motion features, such as speed, direction, shape, and displacement. All of these periodic features are indexed by the period (the distance between neighboring periods), and they are generally expressed with the displacement function where denotes motion features and is an integer. The features can be well represented by the invariant signature under spatial transformation. The periodic property cannot be represented accurately with only differential invariants. Rather, the geometric distances and are more reliable here, which ensures that the features are periodic in 3D space. Not only the local shapes but also the vectors are equal between neighboring periods, as follows.
Theorem 3.3. A pair of corresponding points and in different periods have the relations as follows:
Proof. From (3.7), we can conclude the four equations of differential invariants directly. In 3D space, a segment of one period will be coincident with another period by translating along a vector. In this way, all the vectors between corresponding points are equal to this vector. Then, (3.18)-(3.19) can be inferred.
For periodic motion perception, we should confirm the period and starting point first. Then, we need to examine whether all the corresponding points in the periodic trajectory satisfy the equations in Theorem 3.3. In this way, we can perceive periodic motion from a single trajectory alone, without any database prepared beforehand. The advantage of the mixed signature with global information will be illustrated in the experiments in Section 5.3.
3.3. Feature Perception via DTW Algorithm
Aligning and matching corresponding samples is necessary in the perception of symmetrical motion and periodic motion. We need to compare the invariants of corresponding samples according to the theorems. However, differences in sample rate or in the distribution of points mean that the samples do not fall exactly on corresponding points, so the equations in the theorems should not be applied directly to samples with the same index. The samples cannot be aligned one by one, and an appropriate alignment method for matching samples is necessary. In this paper, we use a nonlinear alignment method, the DTW algorithm, which can find the best alignment between corresponding segments according to the theorems. DTW is also effective for similarity measurement, so we can perceive motion by matching a segment of a trajectory with the feature segments in a database. The segments in the database indicate different motion features, and we can infer the intention of the motion by perceiving them.
The Dynamic Time Warping (DTW) algorithm calculates the best correspondence of samples between two trajectory segments, that is, the one with the minimum distance (see Figure 11). This distance can be defined according to the demands of the task to calculate the similarity. The alignment in every step of matching relies on the minimum sum distance where denotes the distance of sample and in respective trajectory and is the sum distance up to them.
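The textbook form of this recurrence adds the distance of the current pair of samples to the smallest of the three accumulated distances that precede it. A minimal Python sketch follows; the function name `dtw` and the pluggable `dist` callback are illustrative choices, and the sample distance itself is left to the caller.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dtw(a: Sequence[T], b: Sequence[T],
        dist: Callable[[T, T], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimum accumulated distance and the warping path (0-based index pairs)."""
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # acc[i][j]: minimum accumulated distance aligning a[:i] with b[:j].
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            acc[i][j] = dist(a[i - 1], b[j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[m][n], path
```

Because one sample may be paired with several samples of the other sequence, a slowly executed motion and a quickly executed one can still be aligned at zero or small cost.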
The distance between corresponding samples in our work is defined with the descriptor of trajectory. For two trajectories A and B with respective lengths M and N, the distance between samples and is defined with the approximate mixed signature as follows: where
Equations (3.22)–(3.25) are the same as the definition in  with the previous differential signature . Here, the parameter is the weight of global invariants in calculating the distance of trajectories. The weight of global information depends on two aspects: the circumstance for sampling and demand of tasks. The setting of this parameter will be presented in the next section.
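The role of the weight can be illustrated with a small sketch. The exact form of (3.21) is not reproduced here, so the code assumes one plausible form: a Euclidean distance over the four differential invariants plus the weight times a Euclidean distance over the two global invariants. The function name `mixed_distance` and the ordering of the six components are hypothetical.

```python
import math
from typing import Sequence


def mixed_distance(p: Sequence[float], q: Sequence[float], weight: float) -> float:
    """Distance between two mixed-signature samples.

    Each sample is assumed to be ordered as
    (kappa, kappa_s, tau, tau_s, s, r).
    """
    if len(p) != 6 or len(q) != 6:
        raise ValueError("each sample needs six invariants")
    if weight < 0:
        raise ValueError("weight must be non-negative")

    local_part = math.dist(p[:4], q[:4])   # differential invariants
    global_part = math.dist(p[4:], q[4:])  # global invariants
    # ASSUMPTION: the weight scales the global term additively.
    return local_part + weight * global_part
```

With a weight of zero the distance reduces to one based only on the differential invariants, so two samples with the same local shape but different positions within their trajectories become indistinguishable.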
From Figure 11, we can see that the DTW algorithm matches corresponding samples even when the frame rate and distribution differ. This method considers the minimum distance of relative features, which is suitable for this task. The mixed signature is flexible and adaptable for this algorithm, whereas the differential invariants of the previous signature descriptor alone are not enough. The matching in Figure 3(a) is a case of DTW matching by only differential invariants, and we can see that corresponding points are similar in local features. As a result, the distance between the two trajectories appears small although they belong to different classes. To solve this problem, global information should be considered in the distance calculation as in (3.21). We will present the flexibility of the mixed signature in the experiments by comparison with the previous signature.
4. Motion Recognition
Motion recognition classifies motions against a database, where a flexible descriptor and a recognition algorithm are both crucial for good performance. The DTW algorithm is effective in overcoming the diversity in motion speed and frame rate, which leads to different sample rates and distributions. However, this method has some disadvantages that cause inaccuracy in trajectory recognition. For example, if the sampling in one trajectory is sparse while the other is not (Figure 12(a)), the samples far from the corresponding position will cause a large distance, because DTW matches only discrete samples rather than continuous curves. Munich and Perona proposed the Continuous Dynamic Time Warping (CDTW) algorithm as a solution to this problem.
The alignment method in CDTW is suitable for curve matching rather than motion recognition because it does not consider the properties of kinematics. Figure 12 shows the results of matching by different algorithms. The trajectories in Figure 12 are the same motion under two sampling schemes, and different matching algorithms lead to different results. The linear interpolation of the trajectories in Figure 12(b) is not accurate for motion trajectories, and the method in  is complex. A proper interpolation method that considers motion properties is needed, and concise conditions will reduce the complexity of the algorithm.
In this section, we modified the CDTW algorithm for motion recognition with the mixed signature descriptor. The time warping method used in  classified the path of alignment between two trajectories into four matching conditions in the algorithm. Those four conditions are complex in calculation, and we simplify them into only two conditions in our algorithm. We use two subitems to express the distances of two conditions and calculate the minimum of them as follows: where , are sequence lengths of trajectory A and B, is a parameter between 0 and 1, is the distance between samples and , and is the distance between samples and . Here, is a point moving on the trajectory A between samples and as shown in Figure 13.
For the matching between and in DTW, the corresponding points can only be three positions: the three intersections , , in Figure 13(a). However, in our approach, the matching point can be anywhere on the two sides connecting the three points (see Figure 13(b)). In this way, the difference of samplings will not increase the distance between similar trajectories and the difference of trajectory lengths will not affect the matching either.
There are two warping conditions in every step of our algorithm, no matter which side the warping path went through in the previous step of matching to enter the present “matching block” (see Figure 13(b)). Once the warping path enters the block, it can only exit from the left side or the bottom side, including the intersection of these sides. All these conditions are included in (4.1). When , this is the same condition as that in the DTW algorithm.
In the warping algorithm, if the parameter is not zero, the corresponding point in one of the trajectories must lie between adjacent samples, and the position of the point is unknown. In the CDTW algorithm , the positions in the x and y directions are calculated separately by linear interpolation. However, linear interpolation is not accurate, especially for motion trajectories, because the position of the unknown point is decided not only by the two adjacent samples but also by their neighboring samples.
As presented in the efficient prediction method Kalman Filtering , the prediction of the unknown point depends only on the present sample and the previous sample . As the whole trajectory sample data are known in advance, the succeeding samples and are also useful in calculating the unknown point (see Figure 14). This follows from a property of motion: the previous sample controls the inertia of the unknown point through its direction and speed, and the same property of the unknown point in turn affects the succeeding sample .
In our method, cubic polynomial interpolation is selected to calculate the coordinates of the unknown point with four samples: , , , and , because four samples determine a cubic curve. Then, we use the calculated coordinates and the neighboring known samples to calculate the invariants of the point for the calculation of the matching distance. The two corresponding points and in trajectories A and B (maybe or ) are represented by their mixed signature: [,,,,,] and [,,,,,]. The distance between samples and is defined as follows: where The definition of is similar to that of .
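The interpolation step can be sketched with a Lagrange cubic through four consecutive samples. This is an illustration under assumptions: the function name `cubic_point` is hypothetical, the four samples are taken to be equally spaced in time (placed at parameter values -1, 0, 1, and 2), and the unknown point lies between the two middle samples at a fraction `lam` between 0 and 1.

```python
from typing import Tuple

Point = Tuple[float, float, float]


def cubic_point(p_prev: Point, p0: Point, p1: Point, p_next: Point, lam: float) -> Point:
    """Evaluate the cubic through four samples at fraction `lam` between p0 and p1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # ASSUMPTION: uniform time spacing, so the samples sit at t = -1, 0, 1, 2.
    nodes = (-1.0, 0.0, 1.0, 2.0)
    samples = (p_prev, p0, p1, p_next)

    # Lagrange basis weights of the four samples at t = lam.
    weights = []
    for i, ti in enumerate(nodes):
        w = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                w *= (lam - tj) / (ti - tj)
        weights.append(w)

    # Each coordinate is interpolated with the same weights.
    x, y, z = (sum(w * p[k] for w, p in zip(weights, samples)) for k in range(3))
    return (x, y, z)
```

Unlike linear interpolation, the result depends on the samples before and after the enclosing pair, so the interpolated point follows the bend of the trajectory instead of cutting across it.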
The parameter is the same as the one used in motion perception via DTW. Trajectories in different circumstances differ in global distance and noise. Overweighting the global distance enlarges the error in calculation and leads to wrong recognition. For a database sampled under a certain circumstance, we calculate the average distance with different . We set as zero and enlarge it with a certain step size in an iterative calculation of the average distance until the convergence condition is reached. The subscript here is the iteration index, and the convergence condition is a threshold of average distance: (e.g., ).
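The iterative search can be sketched as follows. The convergence test is an assumption: the sketch stops when the average distance changes by less than a threshold between two successive weights, which is one reading of "a threshold of average distance". The names `tune_weight`, `average_distance`, `step`, `epsilon`, and `max_iter` are all hypothetical, and `average_distance` stands for whatever routine evaluates the database at a given weight.

```python
from typing import Callable


def tune_weight(average_distance: Callable[[float], float],
                step: float = 0.1,
                epsilon: float = 1e-3,
                max_iter: int = 1000) -> float:
    """Increase the global-invariant weight from zero until the average distance settles."""
    if step <= 0 or epsilon <= 0:
        raise ValueError("step and epsilon must be positive")

    weight = 0.0
    previous = average_distance(weight)
    for _ in range(max_iter):
        weight += step
        current = average_distance(weight)
        # ASSUMPTION: converge when successive average distances differ by less than epsilon.
        if abs(current - previous) < epsilon:
            return weight
        previous = current
    raise RuntimeError("weight search did not converge")
```

The iteration cap is a safeguard added for the sketch; it prevents an endless loop when the average distance keeps growing with the weight.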
In another condition, a set of motion trajectories has been classified beforehand according to a standard, and we want the recognition engine to classify trajectories by this standard. Therefore, the weight should first be trained on this standard database. We adjust to different values to classify this database until the classification result is the same as the standard. Then, this satisfies this task.
5. Experiments
The aim of the experiments is to present the performance of the mixed signature in motion perception and motion recognition compared with the previous methods. Motion perception and recognition are demonstrated based on the DTW and modified CDTW nonlinear matching algorithms, respectively. We use the DTW algorithm in motion perception to show the flexibility of the mixed signature descriptor. The modified CDTW is used in motion recognition to improve the recognition accuracy.
Sign motion is an important kind of human action used in daily interaction. Signs are spatial symbols performed by human hands or by other means. We ran experiments with several groups of sign data in order to illustrate the necessity of the global invariants in motion analysis. We used a stereo vision system to track sign motion trajectories of different people and recorded them on a PC (Figure 15). The 3D motion trajectory was calculated from the two image sequences captured by separate cameras. We used these sign trajectories to test the properties of the mixed signature in motion perception and recognition compared with the previous method. Note, however, that this descriptor is applicable to various tasks other than this type of example. A large trajectory database was used in [31–33] for motion recognition, and we also used this database in our experiments for comparison.
Besides sign motion, the daily behavior of people was tracked in the experiments as well. We recorded these motion trajectories for motion analysis, such as opening a box, pouring water, and other actions. In the trajectory acquisition, only actions performed by one hand were tracked, and we tracked a mark on the hand instead of tracking the hand directly. The tracking was thus simplified to tracking a rigid object, and tracking other parts of the body is out of the scope of this paper. Figure 16 shows the tracking of opening a box to carry an object.
5.1. Invariant Property in Motion Analysis
The mixed signature is a flexible descriptor of both local shapes and the distribution as a whole for motion analysis. As an invariant signature, it is also spatiotemporally invariant for motion trajectories in 3D space, as the previous signature is, under translation, rotation, scaling, speed change, and occlusion. This experiment demonstrates the intertrajectory perception between motion trajectories from the mixed signature based on the DTW alignment algorithm. Figures 17–20 show several cases of trajectory path matching, which give an intuitive perception of the invariant properties.
In the instance in Figure 17(a), the trajectories of the same action in different positions were tracked. Both the viewpoint and the motion position were translated from one position to another. Figure 17(b) shows the interalignment matching between the trajectories, which illustrates the invariance of the mixed signature under translation. It is observed that the corresponding points of the respective trajectories are matched via the DTW algorithm in the figure. Note that the bottle in A-1 and B-1 is the same one; hence, the trajectories A-1 and B-1 are the same in size. The trajectories in Figure 18(a) represent the same action rotated to different directions and tracked from different viewpoints. The abstracted trajectories A-2 and B-2 are matched in Figure 18(b), which demonstrates the invariant property under rotation as well. Figure 19 shows a case in which similar signs of different sizes are matched via DTW. We can observe that A-3 and B-3 are similar in shape with different sizes in the same coordinate system. This difference does not result from the distance between the vision system and the object; it is a real difference in scale. This case verifies the scale invariance of the mixed signature. In general motion instances, there may be more than one spatial transformation between two similar actions. The transformation between two motion trajectories is probably a mixture of translation, rotation, and scaling. For example, there is also translation between the actions under rotation in Figure 18, which shows the invariance under a complex transformation.
The instance in Figure 20 consists of two trajectories of similar signs performed at different speeds. We tracked these signs at the same frame rate, and the sampling of B-5 is much denser than that of A-5, which shows that A-5 is faster than B-5. As a result, each sample in A-5 is warped to more than one corresponding sample in B-5 by the alignment, which demonstrates the invariance to speed. Occlusion during tracking is another important factor in motion analysis, and the mixed signature is invariant to it as well. However, because the global invariants are calculated from all samples of a trajectory, the mixed signature cannot be used directly in this condition. The motion trajectory should be preprocessed under occlusion. Furthermore, a higher-level trajectory alignment method should be introduced instead of DTW/CDTW; this is outside the scope of this paper.
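The one-to-many warping described above can be sketched with a textbook DTW on scalar sequences. This is a minimal illustration, not the modified CDTW used in the paper: the sequences `fast` and `slow` are made-up stand-ins for signature values, and the absolute difference is an assumed local distance.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: returns (total cost, warping path as index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the alignment.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path

# Hypothetical data: the slow motion repeats values of the fast one.
fast = [0.0, 1.0, 3.0, 2.0]
slow = [0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 2.0, 2.0]

total, path = dtw_path(fast, slow)
matches = {}
for i, j in path:
    matches.setdefault(i, []).append(j)
print(total, matches)
```

In this toy case every sample of the slow sequence is matched to exactly one sample of the fast sequence, while several fast samples absorb two or three slow samples, which is the behavior the speed experiment relies on.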
5.2. Perception of Motion Symmetry
In this experiment, we generated a group of sign motion trajectories to compare motion symmetry perception with the mixed signature and with the previous signature. Symmetrical segments of a trajectory were perceived, and the asymmetrical points were excluded. A trajectory may contain more than two pairs of symmetrical segments, and some of them may overlap. Our method can perceive these conditions, as shown in Figure 21. There is one pair of symmetrical segments in Figure 21(a), and the central point is marked with a black star. Points that cannot be matched by Theorem 3.1 are not perceived. Similarly, both pairs of symmetrical segments in Figure 21(b) are perceived, each with its own central point. The cases in Figure 21 are mirror symmetric; a case of central symmetry is shown in Figure 22.
Because the previous signature is invariant to scaling, two segments of a trajectory at different scales have the same signature if they are similar in local shape. That is, if one part of a symmetrical motion is scaled, the motion is still erroneously perceived as symmetrical. When the global information is considered, this error does not occur. The mixed signature is likewise invariant under spatial transformation of a whole trajectory, but it is not invariant across different parts of a single trajectory. Some signed words of this kind were erroneously perceived as symmetrical by the previous signature. Figure 23 illustrates this problem.
Figure 23 shows that two segments of each trajectory are similar in shape but different in scale. Because scaling does not change the differential invariants, the corresponding points satisfy the symmetry theorem of the previous signature. As a result, these signs were erroneously perceived as symmetrical trajectories by that method . However, the global invariants of the corresponding points differ. For example, the arc lengths from the central point to the corresponding points are different. Hence, these signs are not perceived as symmetrical trajectories by the mixed signature. The theorem for the mixed signature is stricter and more accurate in symmetrical motion perception than the previous one. The same condition also occurs in central symmetry perception.
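The argument can be made concrete with a toy 2D example. The sketch below is an assumption-laden simplification: it uses the unsigned turning angle of a polyline as a scale-invariant local feature and the arc length from the central point as the global feature. Neither is the paper's actual invariant, Theorem 3.1 is not reproduced, and all names and coordinates are hypothetical.

```python
import math

def turning_angles(branch):
    """Unsigned turning angle at each interior vertex (scale invariant)."""
    angles = []
    for p, q, r in zip(branch, branch[1:], branch[2:]):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - q[0], r[1] - q[1]
        angles.append(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))
    return angles

def arc_from_center(branch):
    """Arc length from the first point (the center) to each interior vertex."""
    s, out = 0.0, []
    for p, q in zip(branch, branch[1:-1]):
        s += math.dist(p, q)
        out.append(s)
    return out

def close(xs, ys, tol=1e-9):
    return len(xs) == len(ys) and all(abs(x - y) <= tol for x, y in zip(xs, ys))

def symmetric_local(left, right):
    """Local-only test: compares the scale-invariant feature."""
    return close(turning_angles(left), turning_angles(right))

def symmetric_mixed(left, right):
    """Local test plus the global arc-length check."""
    return symmetric_local(left, right) and close(
        arc_from_center(left), arc_from_center(right))

# Both branches start at the central point (0, 0) and run outward.
right = [(0, 0), (1, 0), (2, 1), (2, 2)]
true_mirror = [(0, 0), (-1, 0), (-2, 1), (-2, 2)]    # exact mirror image
scaled_mirror = [(0, 0), (-2, 0), (-4, 2), (-4, 4)]  # mirror image, scaled by 2

print(symmetric_local(scaled_mirror, right))  # local test accepts the scaled branch
print(symmetric_mixed(scaled_mirror, right))  # global check rejects it
print(symmetric_mixed(true_mirror, right))    # exact mirror passes both
```

The scaled branch passes the local test because the turning angles are identical, and it fails only once the arc lengths from the central point are compared.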
5.3. Perception of Periodic Motion
We sampled several groups of periodic motions performed by different people and perceived their periodic properties. Figure 24 shows a periodic trajectory with 3 periods. Some of the corresponding points are linked with dashed lines.
The previous signature has the same problem in periodic motion perception as in symmetrical motion perception. Periodic motion perception suffers even more than symmetrical motion perception when it relies on the periodicity theorem of differential invariants alone. Not only scaling invariance but also rotation invariance can cause a nonperiodic trajectory to be erroneously perceived as periodic. If a periodic segment of a trajectory is scaled or rotated into another form, the trajectory is erroneously perceived as periodic by the previous signature, as shown in Figure 25.
The trajectory in Figure 25(a), which has two similar segments at different scales, was erroneously perceived as a periodic trajectory by the previous signature. The trajectory in Figure 25(b), in which two segments are rotated to different directions, was also erroneously perceived. However, the global parameters and are different between the corresponding points. Therefore, these trajectories are not erroneously perceived as periodic motions by the mixed signature.
5.4. Sign Motion Recognition
We implemented sign motion recognition with the mixed signature and compared it with the previous signature to test the performance of our method. In this experiment, we captured sign motion trajectories from the vision system to test the characteristics of our method for motion recognition; a statistical evaluation on a large 3D database follows in the next subsection.
Two groups of sign motion trajectories in different classes were sampled from several different signers. Signs in different classes have similar shapes and are hard to distinguish. Every two signs from different classes in these groups are similar in their local shapes. Because the differential invariants of the corresponding points in these similar local shapes are similar, these signs are wrongly classified by the previous signature descriptor. Figure 26 shows this condition with two cases: - and 0-6. In each figure, the similar local shapes were matched by the local features, which greatly reduced their distances.
We matched all these signs of different classes with the previous descriptor and with the new descriptor separately. The results are compared in Table 1. The entries in this table are distances under each descriptor, and 100 pairs of sign data were tested for every subgroup. We calculated the average and extreme values of the 100 distances and listed them in the table. These data show that the matching of - by the previous signature is not clearly different from - and -, and some of the extreme values even overlap. Consequently, and cannot be classified accurately by the previous descriptor and will be wrongly accepted or rejected because of the confused boundary. In contrast, the results of - are clearly different from - and - under the mixed signature descriptor, and the boundaries are distinct. The same holds for the matching between 0 and 6; Table 2 lists the experimental results of this group. The experiments in  also suffer from the confusion of 0 and 6, as reported in that paper.
We also tested the same word signed in different fonts (as in Experiments 5.1–5.3 in , see Figure 27). Recognition with the previous descriptor can only separate words in different classes; it cannot distinguish the same word in different fonts. In some cases, similar words in different classes cannot be classified correctly either. The mixed signature resolved these confusions in our experiments. We tested 4 and 9, each signed in different fonts as in Figure 27. The results of classifying the two fonts of the number 4 in Figure 27(a) (Type A and Type B) are listed in Table 3.
5.5. Motion Recognition with Large Database
In this experiment, we tested the mixed signature descriptor by recognizing 3D trajectories of different classes in a large database . The database was used in [31–33] for experiments, and we used the same database for comparison. Two instances of the sign words “crazy” and “name” are shown in Figure 28. There are 95 classes in the database and 29 samples for each class. We used half of the samples for training and the other half for testing. Several classes of samples were randomly selected and recognized with our method, and we repeated this test 50 times. The average recognition rates for different numbers of classes are listed in Table 4.
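The evaluation protocol described above (random class selection, a half/half split, repeated trials, averaged recognition rate) can be sketched as follows. Several details are assumptions, since the text does not specify them: a nearest-neighbor classifier is used, an odd sample count puts the smaller half in training, and the toy dataset and the `l1` distance are hypothetical placeholders for real trajectories and a signature-based distance.

```python
import random

def run_trial(dataset, n_classes, distance, rng):
    """One trial: pick classes, split each half/half, classify by 1-NN."""
    classes = rng.sample(sorted(dataset), n_classes)
    train, test = [], []
    for label in classes:
        samples = list(dataset[label])
        rng.shuffle(samples)
        half = len(samples) // 2  # assumption: floor half goes to training
        train += [(s, label) for s in samples[:half]]
        test += [(s, label) for s in samples[half:]]
    correct = 0
    for sample, label in test:
        nearest = min(train, key=lambda item: distance(sample, item[0]))
        correct += nearest[1] == label
    return correct / len(test)

def average_rate(dataset, n_classes, distance, trials=50, seed=0):
    """Average recognition rate over repeated random trials."""
    rng = random.Random(seed)
    rates = [run_trial(dataset, n_classes, distance, rng) for _ in range(trials)]
    return sum(rates) / trials

def l1(u, v):
    """Placeholder distance between two feature tuples."""
    return sum(abs(x - y) for x, y in zip(u, v))

# Hypothetical, well-separated toy data: 4 classes with 6 samples each.
toy = {f"class{k}": [(10.0 * k + 0.1 * i, 10.0 * k - 0.1 * i) for i in range(6)]
       for k in range(4)}

print(average_rate(toy, n_classes=2, distance=l1, trials=5))
```

In a real run, `distance` would be replaced by the trajectory matching distance under the chosen descriptor, and the dictionary would hold the database samples for each sign class.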
The experimental results in Table 4 show that the new method with the mixed signature achieves a higher recognition rate when matching within 2, 4, and 8 classes. With two classes, the recognition rate of the mixed signature is 2.27% higher than that of the previous method. With four and eight classes, this difference grows to 2.63% and 5.52%, respectively. Hence, as the number of classes increases, the advantage of the new method over the previous one grows. That is, our method is more flexible for multiclass recognition in a large database.
We also used another dataset from the same database  that includes 95 classes, each with 70 samples signed by five people. Two and four people were selected randomly each time, and half of their signatures were used for training and half for testing. This experiment was repeated over 100 times on all 95 classes, and the average recognition performance of these tests is listed in Table 5. Figure 29 shows the degree of confusion among the motions of these five signers as an intensity image. The grey levels of the intensity image denote the degree of confusion, which is the ratio of erroneous classification between two classes.
Table 5 shows that the recognition rate of our method is higher than that of the previous one. Some signers' fonts are hard to distinguish because the fonts of the two people are very similar. For example, C2 and C4 are hard to separate with either method (see Figure 29). For this reason, the average recognition rate is much lower than for classification among different words. Nevertheless, the result also supports our method, which performs better in this respect. These comparisons show that the proposed mixed signature with global invariants is more capable of motion trajectory recognition under various circumstances.
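A degree-of-confusion matrix like the one visualized as an intensity image can be computed from pairs of true and predicted labels. The sketch below assumes that each entry is the fraction of samples of one class that were assigned to another class (row normalization); the text does not state the exact normalization, and the labels and counts here are invented for illustration.

```python
def confusion_ratios(pairs, labels):
    """Row-normalized confusion matrix from (true, predicted) label pairs."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for true, predicted in pairs:
        counts[index[true]][index[predicted]] += 1
    ratios = []
    for row in counts:
        total = sum(row)
        ratios.append([c / total if total else 0.0 for c in row])
    return ratios

# Hypothetical classification outcomes for three classes.
labels = ["A", "B", "C"]
pairs = ([("A", "A")] * 3 + [("A", "B")] +
         [("B", "B")] * 2 + [("B", "A")] * 2 +
         [("C", "C")] * 4)

matrix = confusion_ratios(pairs, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal entries are the confusion ratios between two classes; mapping them to grey levels gives an intensity image of the kind described above.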
A new invariant descriptor, the mixed signature, is presented for 3D motion perception and recognition via trajectories. This descriptor is based on differential invariants but adds parameters that carry global information, which was not included in the previous study. The CDTW alignment algorithm is modified with cubic polynomial interpolation and used for trajectory matching. The method can be adapted to different tasks through the adjustable weight parameter λ. Experimental results show the advantage of this method.
We compared the performance of our new descriptor with that of the previous descriptor in classifying different classes of trajectories. Our method performs better, especially in distinguishing motions with similar shapes. We also tested these methods on trajectories signed by different people, and our method outperforms the previous methods. To increase the computational efficiency of the CDTW algorithm for high-speed implementation, efficient methods for computing the invariants need to be developed. Furthermore, we will apply this method to motion analysis in the biology and human health area .
This work was supported by a grant from City University of Hong Kong (Project no. 7002511).
K. K. Lee, M. Yu, and Y. Xu, “Modeling of human walking trajectories for surveillance,” in IEEE International Conference on Intelligent Robots and Systems, vol. 2, pp. 1554–1559, 2003.
J. Martin, D. Hall, and J. L. Crowley, “Statistical gesture recognition through modeling of parameter trajectories,” Lecture Notes in Computer Science, vol. 1739, pp. 129–140, 1999.
M. J. Black and A. D. Jepson, “A probabilistic framework for matching temporal trajectories: condensation-based recognition of gestures and expressions,” in Proceedings of the European Conference on Computer Vision, vol. 1, pp. 909–924, Freiburg, Germany, 1998.
P. R. G. Harding and T. J. Ellis, “Recognizing hand gesture using Fourier descriptors,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 286–289, August 2004.
S. Y. Chen, J. Zhang, Q. Guan, and S. Liu, “Detection and amendment of shape distortions based on moment invariants for active shape models,” IET Image Processing, vol. 5, no. 3, pp. 273–285, 2011.
D. R. Faria and J. Dias, “3D hand trajectory segmentation by curvatures and hand orientation for classification through a probabilistic approach,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '09), pp. 1284–1289, December 2009.
J. Sun, X. Wu, S. Yan, L.-F. Cheong, T.-S. Chua, and J. Li, “Hierarchical spatio-temporal context modeling for action recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 2004–2011, 2009.
J. Y. Yang, Y. F. Li, and K. Y. Wang, “Mixed signature descriptor with global invariants for 3D motion trajectory perception and recognition,” in Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1952–1956, 2010.
UCI KDD ASL Archive, http://kdd.ics.uci.edu/databases/auslan2/auslan.html.
L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
M. Munich and P. Perona, “Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification,” in Proceedings of the IEEE International Conference on Computer Vision, vol. 1, pp. 108–115, 1999.
R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME, Journal of Basic Engineering, vol. 82, pp. 35–45, 1960.