Abstract

This paper presents a human-robot interface system that incorporates a particle filter (PF) and adaptive multispace transformation (AMT) to track the pose of the human hand for controlling a robot manipulator. The system employs a 3D camera (Kinect) to determine the orientation and translation of the human hand. The Camshift algorithm is used to track the hand, and a PF is used to estimate the hand's translation. Although a PF is used for estimating the translation, the translation error grows within a short period of time when the sensors fail to detect the hand motion; therefore, a methodology to correct the translation error is required. Moreover, owing to perceptive limitations and motor limitations, it is hard for a human operator to carry out high-precision operations. This paper proposes an adaptive multispace transformation (AMT) method to assist the operator in improving the accuracy and reliability of determining the pose of the robot. The human-robot interface system was experimentally tested in a lab environment, and the results indicate that such a system can successfully control a robot manipulator.

1. Introduction

Human intelligence is required to make decisions and control a robot, especially in unstructured dynamic environments. Thus, robot teleoperation is necessary in such situations, especially when objects are unfamiliar and shapeless. Human-robot interfaces [1] such as joysticks [2–4], dials, and robot replicas have been commonly used. However, to complete a teleoperation task, these contacting mechanical devices always require unnatural hand and arm motion.

There is another, more natural way to communicate complex motions to a remote robot than using contacting mechanical devices. This method uses inertial sensors, contacting electromagnetic tracking sensors, instrumented gloves with angle sensors, and exoskeleton systems [5] to track the operator's hand-arm motion that completes the required task. However, these contacting devices may hinder natural human-limb motion.

Because vision-based techniques are noncontacting, they seldom hinder hand-arm motion. Vision-based methods often use physical markers placed on the anatomical body part [6–8]. There are many applications [6, 9, 10] using this marker-based tracking method. However, body markers may hinder motion for some highly dexterous tasks, and the markers may become occluded. Thus, marker-based tracking is not always practical. For this reason, a markerless approach seems better for many applications.

Compared with marker-based image tracking, the markerless method not only is less invasive but also eliminates the problems of marker occlusion and identification [11]. Thus, for remote robot teleoperation, markerless tracking may be a better approach. However, existing markerless human-limb tracking techniques have many limitations that make them difficult to use in robot teleoperation applications. Many existing markerless-tracking techniques capture images first and compute the motion afterwards [12–15]; thus, the robot manipulator cannot be controlled with continuous robot motion using such markerless tracking methods. To allow the human operator to perform hand-arm motions for a task in a natural way without interruption, the position and orientation of the hand and arm should be provided immediately. Many techniques can only provide 2D image information of the human motion [16, 17], so these tracking methods cannot be extended to accurate 3D joint-position data. The end-effector of a remote robot requires 3D position and orientation information of the operator's limb-joint centers. How to identify human body parts in different orientations has always been a main challenge [12, 13, 18].

Some limited research on markerless human tracking has been done for robot teleoperation. Some approaches use a human-robot interface based on hand-gesture recognition to control the robot motion [19–21]. Ionescu et al. [22] developed markerless hand-gesture recognition methods that can be used for mobile robot control and need only a few commands such as "go," "stop," "left," and "right." However, for object manipulation in 3D space, natural control and flexible robot motion cannot be achieved by gestures alone. To use a gesture recognition method, the operator needs to think in terms of the limited, discrete commands that the human-robot interface can understand, such as move up, down, and forward. A better form of human-robot interaction would allow the operator to focus on the complex global task and complete it naturally when grasping and manipulating objects in 3D space, instead of thinking about what type of hand motion is required.

To achieve this goal, a method that allows the operator to complete the task using natural hand-arm motions has to provide the robot manipulator with information about the hand-arm motion in real time. This information includes the anatomical position and orientation of the hand and arm [6, 23]. In [23], operators are required to control the robot with bare hands, and the accuracy is insufficient for high-precision manipulation. The method of [6] uses stereo vision to measure the human hand for controlling the robot, but the precision of stereo vision is limited and occlusions are easily encountered. Moreover, because the method of [6] requires the operator to make large movements to carry out the task, it is time-consuming.

The proposed system uses a 3D camera to locate the hand of the human operator and a camera to measure the distance between the robot tool and the target. This paper proposes Camshift to track the human hand, a PF to estimate the position of the hand, and an adaptive multispace transformation method to improve the accuracy and efficiency of manipulation. Experimental results validating the proposed methods are also presented.

The remainder of the paper is organized as follows. Section 2 presents an overview of the system. The human hand tracking system is detailed in Section 3. Section 4 describes position estimation using the PF. The adaptive multispace transformation method is presented in Section 5. Experiments and results are presented in Section 6. Discussions are given in Section 7, followed by concluding remarks in Section 8.

2. Overview

Figure 1 shows the structure and data flow of the system. Fifteen skeleton points are extracted from the Kinect. A PF estimation algorithm is used to estimate the position and orientation of the hand. The positions and orientations are then processed via AMT to improve the accuracy of manipulation.

3. Human Hand Position Tracking

Since operators are required to wear a glove to control the robot, the positions of the marks on the glove can be used to obtain the position of the human hand. Besides, the color of the marks makes them easy to identify. In the mark tracking system, we use a 3D camera (Kinect) to obtain two 2D images of the scene: a color image (Figure 2(a)) and a depth image (Figure 2(b)). The marker positions are tracked by the Camshift algorithm, which has become more and more popular because of its real-time performance and robustness [24].

The Camshift algorithm is nonparametric and uses a clustering approach to search for targets. It uses the color information of the image and finds the target by color matching. Because the target objects are very similar across the image sequence, the Camshift algorithm is very robust in this situation. The Camshift tracking system comprises the color probability distribution and the Camshift flow.

3.1. Color Probability Distribution

Because hue, saturation, and brightness are separated in the HSV color space, using HSV space increases the stability of the algorithm. In this paper, the hue value is selected as the parameter for building the target histogram. With $V' = \max(R, G, B)$ and $V'' = \min(R, G, B)$, the transformation from RGB space to HSV space is

$$V = V', \qquad S = \begin{cases} \dfrac{V' - V''}{V'}, & V' \neq 0, \\ 0, & V' = 0, \end{cases} \qquad H = \begin{cases} 60\,\dfrac{G - B}{V' - V''}, & V' = R, \\ 120 + 60\,\dfrac{B - R}{V' - V''}, & V' = G, \\ 240 + 60\,\dfrac{R - G}{V' - V''}, & V' = B. \end{cases} \tag{1}$$

Let the total number of image pixels be $N$, let the number of levels of the hue histogram be $m$, let the $i$th image pixel be $x_i$, and let the histogram index function of $x_i$ be $b(x_i)$; then the object histogram of hue can be described as

$$q_u = \sum_{i=1}^{N} \delta[b(x_i) - u], \qquad u = 1, \ldots, m, \tag{2}$$

where $\delta$ is the unit impulse function. Figure 2 shows the color image of the selected region and the corresponding hue histogram.
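As an illustration, a minimal Python sketch of building the hue histogram with OpenCV is given below; the bin count $m = 16$ and the saturation/value mask bounds are illustrative assumptions rather than the paper's settings.

```python
# A minimal sketch of building the hue histogram of a selected region with
# OpenCV; the bin count m = 16 and the mask bounds are assumed values.
import cv2
import numpy as np

def hue_histogram(frame_bgr, roi, m=16):
    """Return the normalized hue histogram q of the tracked region."""
    x, y, w, h = roi
    patch = frame_bgr[y:y + h, x:x + w]
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)          # RGB -> HSV, as in (1)
    # Mask out dark or desaturated pixels whose hue is unreliable.
    mask = cv2.inRange(hsv, (0, 60, 32), (180, 255, 255))
    hist = cv2.calcHist([hsv], [0], mask, [m], [0, 180])  # hue channel only, as in (2)
    cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)      # scale q_u into [0, 1]
    return hist
```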

3.2. Camshift Flow

After establishing the color probability model of the object, every video frame is converted into a color probability distribution image $I(x, y)$. The Camshift flow proceeds as follows.

(1) Select a search window in $I(x, y)$ and determine the initial centre position of the search window.

(2) Calculate the 0th- and 1st-order moments:

$$M_{00} = \sum_x \sum_y I(x, y), \qquad M_{10} = \sum_x \sum_y x\, I(x, y), \qquad M_{01} = \sum_x \sum_y y\, I(x, y). \tag{3}$$

(3) Calculate the mass centre of the search window:

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}. \tag{4}$$

(4) Set the centre of the window to the mass centre and repeat steps (2) and (3) until the mass centre position converges to a point.

(5) Calculate the 2nd-order moments:

$$M_{20} = \sum_x \sum_y x^2 I(x, y), \qquad M_{02} = \sum_x \sum_y y^2 I(x, y), \qquad M_{11} = \sum_x \sum_y x y\, I(x, y). \tag{5}$$

(6) Calculate the parameters $(a, b, c)$:

$$a = \frac{M_{20}}{M_{00}} - x_c^2, \qquad b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right), \qquad c = \frac{M_{02}}{M_{00}} - y_c^2. \tag{6}$$

(7) Calculate the lengths $l$ and $w$ of the major and minor axes of the tracked object and the direction angle $\theta$ of the ellipse in the image:

$$l = \sqrt{\frac{(a + c) + \sqrt{b^2 + (a - c)^2}}{2}}, \qquad w = \sqrt{\frac{(a + c) - \sqrt{b^2 + (a - c)^2}}{2}}, \qquad \theta = \frac{1}{2}\arctan\frac{b}{a - c}. \tag{7}$$

The new window is then obtained from the calculated ellipse and is used as the initial window in the next video frame.
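A compact sketch of this flow using OpenCV's built-in implementation follows; the back-projection of the hue histogram of Section 3.1 serves as $I(x, y)$, and the termination criteria are assumed values.

```python
# A sketch of the Camshift flow with OpenCV's built-in implementation. `hist`
# is the hue histogram of the previous sketch (values in [0, 1], rescaled here
# for back-projection); the termination criteria are assumed values.
import cv2

def camshift_track(frames, init_window, hist):
    """Yield the tracked ellipse and window for each BGR frame."""
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    window = init_window                      # (x, y, w, h)
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Color probability distribution image I(x, y) via back-projection.
        prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 255)
        # Mean-shift iterations plus window-size/orientation adaptation.
        ellipse, window = cv2.CamShift(prob, window, term)
        yield ellipse, window                 # ((xc, yc), (l, w), theta)
```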

3.3. Feature Point Extraction

The feature points inside the window determined by Camshift are then extracted. The feature point extraction uses the RGB color model. In the 24-bit RGB color model, every color is represented by its red, green, and blue components, and different combinations yield different colors. The glove used in this paper is yellow, and the markers are red and blue. The tracking program needs to recognize the red and blue points in the image to extract the human hand.

Assume the value of a pixel in the image is $(R, G, B)$; the model to recognize a red point in the image is as follows:

$$R > T_R, \qquad G < T_G, \qquad B < T_B, \tag{8}$$

where $T_R$, $T_G$, and $T_B$ are the threshold values. A blue point is recognized analogously.
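A minimal sketch of this threshold model follows; the threshold values and the RGB channel ordering are illustrative assumptions, since the paper's values are not given.

```python
# A minimal sketch of the color-threshold model (8) for the red marker; the
# thresholds T_R, T_G, T_B are illustrative values, not the paper's.
import numpy as np

T_R, T_G, T_B = 150, 100, 100      # assumed thresholds

def red_mask(image_rgb):
    """Boolean mask of pixels classified as the red marker (RGB ordering assumed)."""
    r = image_rgb[..., 0].astype(int)
    g = image_rgb[..., 1].astype(int)
    b = image_rgb[..., 2].astype(int)
    # A pixel is "red" when its red channel is strong and the others weak.
    return (r > T_R) & (g < T_G) & (b < T_B)
```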

The 3D coordinates of the feature points are reconstructed from the depth image (Figure 2(b)).
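A minimal sketch of this depth back-projection under the pinhole camera model is shown below; the intrinsic parameters are typical Kinect v1 values and are assumptions, not calibration results from the paper.

```python
# A sketch of recovering 3D coordinates from the depth image via the pinhole
# model; fx, fy, cx, cy are assumed (typical Kinect v1) intrinsics.
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5   # assumed camera intrinsics

def pixel_to_3d(u, v, depth_mm):
    """Back-project pixel (u, v) with depth in millimetres to the camera frame."""
    z = depth_mm / 1000.0                     # depth in metres
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])
```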

4. Position Estimation for Feature Points Using PF

4.1. Particle Filter

The PF is a suboptimal solution, which approximates the true posterior using a finite number of random state samples with corresponding normalized weights. Thus, at time $k$, the posterior density approximation is

$$p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N_s} w_k^i \delta(x_k - x_k^i), \tag{9}$$

where $\delta(\cdot)$ is the Dirac delta function, $N_s$ is the number of samples, $w_k^i$ is the normalized weight of the $i$th particle, and $x_k^i$ is the $i$th particle.

For derivation of the PF state estimation algorithm, the posterior density up to time $k$ is

$$p(x_{0:k} \mid z_{1:k}) = \frac{p(z_k \mid x_k)\, p(x_k \mid x_{k-1})}{p(z_k \mid z_{1:k-1})}\, p(x_{0:k-1} \mid z_{1:k-1}). \tag{10}$$

Since sampling from the posterior density is frequently difficult, an importance sampling technique [25] is employed. If the target density (the posterior density in this case) can be evaluated at any point but sampling from it remains difficult, samples may be drawn from a known normalized probability density $q(\cdot)$, the so-called importance density. To compensate for differences between the target and importance densities, normalized weights defined as the ratios of the two densities are assigned to each particle [26]. Up to time $k$, the discrete posterior density approximation is described as

$$p(x_{0:k} \mid z_{1:k}) \approx \sum_{i=1}^{N_s} w_k^i \delta(x_{0:k} - x_{0:k}^i). \tag{11}$$

The normalized weight in (11) possesses the following relationship with the target and importance densities, $p$ and $q$, respectively:

$$w_k^i \propto \frac{p(x_{0:k}^i \mid z_{1:k})}{q(x_{0:k}^i \mid z_{1:k})}. \tag{12}$$

Note that the importance density must be selected such that it can be determined recursively as

$$q(x_{0:k} \mid z_{1:k}) = q(x_k \mid x_{0:k-1}, z_{1:k})\, q(x_{0:k-1} \mid z_{1:k-1}). \tag{13}$$

If the importance density, like the target density, also fulfills the Markov property, (12) can be rewritten as

$$w_k^i \propto w_{k-1}^i\, \frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)}. \tag{14}$$

With further simplification of (14), the importance density can be selected as the prior:

$$q(x_k \mid x_{k-1}^i, z_k) = p(x_k \mid x_{k-1}^i). \tag{15}$$

Subsequently, (14) can be rewritten as

$$w_k^i \propto w_{k-1}^i\, p(z_k \mid x_k^i). \tag{16}$$

This type of PF is problematic in that, when $k$ is large, only one particle will possess a high weight (nearing unity), while the weights of all remaining particles become negligible (approaching zero). This phenomenon, referred to as the degeneracy problem, is unfavorable because the weighted particles fail to accurately reflect the real posterior density. To circumvent this issue, the particles should be resampled on the basis of their weights. After resampling, all particles carry the same weight; the weights at time $k-1$ are identical ($w_{k-1}^i = 1/N_s$). Resampling discards low-weight samples by drawing more high-weight samples in their place. Consequently, (16) becomes

$$w_k^i \propto p(z_k \mid x_k^i). \tag{17}$$

Furthermore, after resampling at time $k$, (9) can be written as

$$p(x_k \mid z_{1:k}) \approx \frac{1}{N_s} \sum_{i=1}^{N_s} \delta(x_k - x_k^i). \tag{18}$$

Various methods reported in the literature propose calculating weights on the basis of “fitness” [27, 28] or “evidence” [29, 30] values of each particle to characterize the likelihood.

4.2. Position Estimation Using PF

The proposed method estimates the object's position using a PF whose state $x_k$ is the hand position. The position measurements can be used as the expected value, and the position calculated for each particle can be used to determine the weight of that particle. However, in such an approach, the differences between the position measurements and the position calculated for each position state are due not only to the position state but also to noise from sensor errors.

For a time period $(t_{j-1}, t_j]$ (where the subscript $j$ denotes the $j$th orientation iteration, $j = 1, 2, \ldots$), the summation of the position differences is used to obtain the weights rather than the instantaneous position difference at time $k$. The accumulated position difference between the measured and calculated values for the $i$th particle is used in the likelihood calculation as in

$$E_j^i = \sum_{k \in (t_{j-1},\, t_j]} \left\| z_k - \hat{x}_k^i \right\|, \tag{19}$$

where $z_k$ is the measured position component and $\hat{x}_k^i$ are the position states. For a given particle, the lower the $E_j^i$, the greater the likelihood that the orientation is correct. Based on the $E_j^i$ values, the weight of each particle is recalculated for every period that elapses. Then, rather than using the direct position measurements, the PF is altered to use $E_j$. So, up to time $t_j$ the posterior approximation becomes

$$p(x_{0:j} \mid E_{1:j}) \approx \sum_{i=1}^{N_s} w_j^i \delta(x_{0:j} - x_{0:j}^i). \tag{20}$$

Then, for up to time $t_j$ the posterior can be presented as

$$p(x_{0:j} \mid E_{1:j}) = \frac{p(E_j \mid x_j)\, p(x_j \mid x_{j-1})}{p(E_j \mid E_{1:j-1})}\, p(x_{0:j-1} \mid E_{1:j-1}). \tag{21}$$

And at the $j$th orientation iteration the normalized weight is

$$w_j^i \propto w_{j-1}^i\, \frac{p(E_j \mid x_j^i)\, p(x_j^i \mid x_{j-1}^i)}{q(x_j^i \mid x_{j-1}^i, E_j)}. \tag{22}$$

The following relationship for the normalized weight is obtained through resampling and choosing the importance density from the prior $p(x_j \mid x_{j-1}^i)$:

$$w_j^i \propto p(E_j \mid x_j^i). \tag{23}$$

Based on the calculation of $E_j^i$, the particle's weight is determined so that the particle with the smallest accumulated position error, which is the most likely to have the correct orientation, receives the largest weight. This means that the calculation for the normalized weight is

$$w_j^i \propto \exp\!\left(-\frac{(E_j^i)^2}{2\sigma_E^2}\right), \tag{24}$$

where $\sigma_E$ is the standard deviation of $E_j$. The proposed particle filtering technique can be summarized as follows.

Step 1. Initialization: if $j = 0$, draw $x_0^i$ from $p(x_0)$ and set $w_0^i = 1/N_s$.

Step 2. Prediction: draw $\hat{x}_j^i$ from the prior $p(x_j \mid x_{j-1}^i)$ according to (15).

Step 3. Likelihood: calculate $E_j^i$ according to (19).

Step 4. Update: if $t = t_j$, then

(1) set the predicted states $x_j^i = \hat{x}_j^i$;

(2) calculate the weight $w_j^i$ according to (24);

(3) weight calculation:

(a) normalize the weights $w_j^i = w_j^i / \sum_{l=1}^{N_s} w_j^l$;

(b) resampling: draw $N_s$ new particles $x_j^i$ based on the weights $w_j^i$;

(c) reset the weights $w_j^i = 1/N_s$;

(d) set $j = j + 1$.

End.
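For concreteness, the following Python sketch condenses Steps 1–4: bootstrap prediction, accumulation of the position error (19), the Gaussian weight (24), and resampling at the end of every period. The random-walk motion model, the noise scales, and the period length are illustrative assumptions, not the paper's settings.

```python
# A condensed sketch of the summarized filter: bootstrap prediction, a Gaussian
# likelihood on the accumulated position error E, and systematic resampling.
# The random-walk motion model and the parameter values are assumptions.
import numpy as np

def particle_filter(measurements, n=500, sigma_q=5.0, sigma_e=10.0, period=5):
    dim = measurements.shape[1]
    particles = np.random.randn(n, dim) * 50.0           # Step 1: draw from p(x0)
    weights = np.full(n, 1.0 / n)
    acc_err = np.zeros(n)                                # accumulated error E_j^i
    estimates = []
    for k, z in enumerate(measurements):
        particles += np.random.randn(n, dim) * sigma_q   # Step 2: predict from prior
        acc_err += np.linalg.norm(particles - z, axis=1) # Step 3: accumulate E, (19)
        if (k + 1) % period == 0:                        # Step 4: update at t_j
            log_w = -acc_err**2 / (2.0 * sigma_e**2)     # Gaussian likelihood, (24)
            log_w -= log_w.max()                         # numerical stabilization
            weights = np.exp(log_w)
            weights /= weights.sum()                     # (a) normalize weights
            cum = np.cumsum(weights)
            u = (np.arange(n) + np.random.rand()) / n    # (b) systematic resampling
            particles = particles[np.searchsorted(cum, u)]
            weights = np.full(n, 1.0 / n)                # (c) reset weights
            acc_err = np.zeros(n)                        # (d) reset accumulated error
        estimates.append(weights @ particles)            # posterior-mean estimate
    return np.array(estimates)
```

Feeding the measured 3D marker positions as `measurements` would yield the filtered hand trajectory.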

4.3. Pose of Hand

In this paper, the positions of the marked points on the glove are used to estimate the position of the human hand. The orientation of the hand is determined by the thumb tip, the index finger tip, and the palm of the operator's hand (Figure 3).

Once the transformation matrix from the coordinate system of the console to the coordinate system of the operator's hand is obtained, the transformation matrix from the base coordinate system to the end-effector is also determined. The derivation of the orientation matrix is detailed below.

Assume the origin of the operator's hand coordinate system coincides with that of the console coordinate system, and let the rotational part of the transformation be a $3 \times 3$ matrix $R$. In the process of hand tracking and positioning, the unit vectors $n$, $o$, and $a$ along the hand-frame directions $X$, $Y$, and $Z$, which are measured by the cameras from the tracked feature points, are

$$n = \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix}, \qquad o = \begin{bmatrix} o_x \\ o_y \\ o_z \end{bmatrix}, \qquad a = \begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}. \tag{25}$$

Through (25), it yields

$$R = \begin{bmatrix} n & o & a \end{bmatrix} = \begin{bmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{bmatrix}. \tag{26}$$

As stated before, the transformation matrix from the console coordinate system to the operator's hand coordinate system is

$$T = \begin{bmatrix} R & p \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{27}$$

Notice that $[p_x \;\; p_y \;\; p_z]^T$ is the translation vector of the hand.
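A short sketch of assembling the transform (27) from the measured unit vectors and the hand translation is given below; it assumes the inputs are 3-vectors already expressed in the console frame and that the measured axes are mutually orthogonal.

```python
# A sketch of assembling the 4x4 console-to-hand transform (27) from the
# measured unit vectors n, o, a and the hand translation p (assumed 3-vectors).
import numpy as np

def hand_transform(n, o, a, p):
    """Build T from the hand-frame axes and translation; axes assumed orthogonal."""
    T = np.eye(4)
    T[:3, 0] = n / np.linalg.norm(n)   # X axis of the hand frame
    T[:3, 1] = o / np.linalg.norm(o)   # Y axis
    T[:3, 2] = a / np.linalg.norm(a)   # Z axis
    T[:3, 3] = p                       # translation of the hand
    return T
```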

5. Robot Manipulation System

5.1. Adaptive Multispace Transformation (AMT)

Humans have inherent perceptive limitations (such as the perception of distance) and motor limitations (such as physiological tremor), which prevent them from operating precisely and smoothly enough for certain tasks [31]. To improve the visual and practical performance of the teleoperation interface, a modified adaptive multispace transformation (AMT) is applied. This method involves an interface with two scaling processes which link the human operator's workspace and the robot's working space (Figure 4). One process scales the movement produced by the human operator. The other changes the scale of the vector $v$, which is the virtual unit vector of the central axis of the robot EE, and the robot movements. These changes help the operator modify the robot speed and improve performance.

The scaling factor $k$ is used to map the actions of the human hand from the master space MS to the working space WS. $k$ is a function of the distance $d$ between the robot EE and the target. When $k < 1$, it decelerates the movement of the robot EE for fine positioning; otherwise, it accelerates the movement when $k > 1$.

As the movement of the virtual position in VS is affected by the vector $v$, the Euler angular velocity $\omega_{MS}$ in MS and the angular velocity $\omega_{WS}$ in WS are related by

$$\omega_{WS} = k(d)\, \omega_{MS}. \tag{28}$$

Assume that $v_h$ is the velocity vector of the hand movement in MS and $v_r$ is the velocity of the robot EE in WS. So we have

$$v_r = \begin{cases} k(d)\, v_h, & v_{\min} \le \| v_h \| \le v_{\max}, \\ 0, & \text{otherwise}, \end{cases} \tag{29}$$

where $v_{\min}$ and $v_{\max}$ are the thresholds. Since $k$ is a function of the distance $d$, the function is defined as

$$k(d) = \lambda_1 d + \lambda_2, \tag{30}$$

where $\lambda_1$ and $\lambda_2$ are constants.
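The following minimal sketch illustrates the velocity scaling of (29) and (30); the constants $\lambda_1$ and $\lambda_2$ and the clamping bounds are illustrative assumptions, not the paper's calibrated values.

```python
# A sketch of the adaptive scaling: the gain k(d) shrinks near the target for
# fine adjustment and grows far away for coarse motion. The linear form of
# k(d), its constants, and the clamping bounds are assumed values.
import numpy as np

LAM1, LAM2 = 0.004, 0.2        # assumed lambda1, lambda2 of k(d)
K_MIN, K_MAX = 0.1, 2.0        # assumed clamping bounds on the gain

def scaled_velocity(v_hand, d):
    """Map the hand velocity in MS to the robot EE velocity in WS."""
    k = np.clip(LAM1 * d + LAM2, K_MIN, K_MAX)   # k < 1 decelerates, k > 1 accelerates
    return k * np.asarray(v_hand)
```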

5.2. Robot Control

The moving angles of the robot's joints can be obtained by solving the inverse kinematics problem. Given a position and pose of the end-effector, the joint angles that move the robot to the expected position can be obtained by solving this inverse problem.

In the standard DH representation, ${}^{i-1}A_i$ denotes the homogeneous coordinate transformation matrix from coordinate system $i-1$ to coordinate system $i$:

$${}^{i-1}A_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \cos\alpha_i & \sin\theta_i \sin\alpha_i & a_i \cos\theta_i \\ \sin\theta_i & \cos\theta_i \cos\alpha_i & -\cos\theta_i \sin\alpha_i & a_i \sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{31}$$

For a robot with 6 joints, the homogeneous coordinate transformation matrix from the base coordinate system to the end-effector's coordinate system is defined as

$${}^{0}T_6 = {}^{0}A_1\, {}^{1}A_2 \cdots {}^{5}A_6 = \begin{bmatrix} n & o & a & p \\ 0 & 0 & 0 & 1 \end{bmatrix}, \tag{32}$$

where $n$ is the roll vector of the end-effector, $o$ is the pitch vector, $a$ is the yaw vector, and $p$ is the position vector.
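The sketch below implements the DH transform (31) and the six-joint chain (32); the DH parameter values would come from Table 1, and the interface of the helper is an illustrative assumption.

```python
# A sketch of the standard DH transform (31) and the 6-joint forward-kinematics
# chain (32); the parameter table would come from Table 1 of the paper.
import numpy as np

def dh(theta, d, a, alpha):
    """Homogeneous transform from frame i-1 to frame i in standard DH form."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(thetas, dh_table):
    """Chain the joint transforms; dh_table rows are (d_i, a_i, alpha_i)."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(thetas, dh_table):
        T = T @ dh(theta, d, a, alpha)
    return T    # columns of the upper 3x4 block: n, o, a axes and position p
```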

Using (32) and (27), it is known that

$${}^{0}T_6(\theta_1, \theta_2, \ldots, \theta_6) = T. \tag{33}$$

To solve the nonlinear equation (33), the Levenberg-Marquardt (LM) algorithm [32] is used to fit the nonlinear model.
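As an illustration, the following sketch solves this fit with SciPy's LM solver; it assumes the `forward_kinematics` chain from the previous sketch, and the residual construction (stacking the rotation and position mismatch of (33)) and the initial guess are illustrative choices rather than the paper's implementation.

```python
# A sketch of LM-based inverse kinematics; `forward_kinematics` and the DH
# table are assumed from the previous sketch, not the paper's exact code.
import numpy as np
from scipy.optimize import least_squares

def inverse_kinematics(T_target, dh_table, theta0):
    """Find the joint angles whose forward kinematics matches T_target."""
    def residual(thetas):
        T = forward_kinematics(thetas, dh_table)
        # 3x4 pose mismatch (rotation columns n, o, a and position p).
        return (T[:3, :] - T_target[:3, :]).ravel()
    sol = least_squares(residual, theta0, method='lm')  # Levenberg-Marquardt
    return sol.x
```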

6. Experiments

6.1. Experimental Environment

To verify the proposed method, a series of five tests were carried out in which the human-robot-manipulator interface imitated human hand motion for robot teleoperation. A GOOGOL GRB3016 robot is used in our experiments. Table 1 lists the nominal Denavit-Hartenberg (DH) [33] parameters of the robot. We compared the accuracy of our method with that of method [6], as well as the ability of the manipulator to imitate human hand-arm motion. Unlike method [6], which controls the position and orientation of the robot manipulator directly, our method uses velocity control. In every test, the operator moved his arm in the working space to perform a pick-and-place task, which requires the operator to pick up an object and place it on a target position so that their edges are aligned, as shown in Figure 6.

Because picking up and placing the object requires opening or closing the claws of the robot, we use a hand motion to give this command. When the distance between the two blue points is small enough, the claws close; conversely, when the distance between the two blue points is large enough, the claws open. The object used in the tests is a rigid plastic block whose height, width, and thickness are 180 mm, 90 mm, and 50 mm, respectively. The 2D error in translation is the distance between the center points of the target and the object. The 2D error in rotation is the angle between the corresponding edges of the target and the object (Figure 5).
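A minimal sketch of this claw trigger is given below; the two distance thresholds are illustrative assumptions, since the paper does not state their values.

```python
# A sketch of the claw trigger: the distance between the two blue marker
# points opens or closes the gripper; the thresholds are assumed values.
import numpy as np

D_CLOSE, D_OPEN = 30.0, 80.0     # assumed distance thresholds in millimetres

def claw_command(p_blue1, p_blue2):
    """Return the gripper command derived from the two blue marker points."""
    d = np.linalg.norm(np.asarray(p_blue1) - np.asarray(p_blue2))
    if d < D_CLOSE:
        return "close"
    if d > D_OPEN:
        return "open"
    return "hold"
```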

6.2. Experimental Results

Figure 7 shows the results for the human hand and robot EE (Test 3). The 3D paths of the robot EE for the two methods are shown in Figure 7(a): the path of our method is the red line and that of method [6] is the blue dotted line. Figures 7(b)–7(g) show the translation and orientation of the robot with method [6]. There are three circles on the 3D path of the robot EE (Figure 7(a)). The time from the beginning to the first circle is the period of approaching the object; the time from the first circle to the second circle is the period of picking the object; and the time from the second circle to the third circle is the period of placing the object on the target. During the picking and placing periods, some minor correction of the position and orientation of the robot EE for overshoot was required, mainly preceding gripper closing to pick the object and gripper opening to place the object on the target. Because of hand or body tremor, the curves in the stationary periods are not perfectly straight. During the object manipulation tasks, the operator needed to make frequent movements to adjust the orientations. With our method, the period of picking the object is from 45 s to 84 s, and the period of placing is from 112 s to 136 s. With method [6], the period of picking the object is from 41 s to 97 s, and the period of placing is from 120 s to 157 s. Our method thus completes both phases noticeably faster.

In the experiments, we compared our method with method [6] in terms of accuracy and operating time. Figure 8 shows the errors for the 5 tests with the two methods. In our method, the 2D absolute errors for the 5 tests ranged from 2.61 mm to 3.48 mm and from 2.27 mm to 4.47 mm in the two translation components, and from 0.66 deg to 1.11 deg in rotation, with mean absolute errors (MAEs) of 3.02 mm, 3.20 mm, and 0.91 deg. In method [6], the 2D absolute errors for the 5 tests ranged from 5.17 mm to 6.64 mm and from 3.64 mm to 5.87 mm in the two translation components, and from 2.17 deg to 2.75 deg in rotation, with mean errors of 5.77 mm, 4.54 mm, and 2.44 deg. Compared with method [6], the mean total translation error of our method drops by 2.94 mm, and the rotation error drops by 1.53 deg. The translation errors of our method are moderately smaller than those of method [6], while the rotation errors of our method are less than half of those of method [6]. In our method, we used AMT when controlling the orientation of the robot with the human hand, so the rotation errors are smaller. However, owing to perceptive and motor limitations, it remains hard for the human operator to carry out very high-precision operations.

Since method [6] requires large movements to control the robot manipulator, its operating time is longer than that of our method (Table 2). The mean time of method [6] is 165.66 s, compared with 131.0 s for our method. With the AMT method, the robot can move over a large range even when the operator makes small movements. Owing to the accurate manipulation, the operator did not need to make as many corrections of position and orientation, which saved considerable operating time.

7. Discussions

The robot is controlled by extracting the position and pose of the human hand, including the tips of the index finger, the palm, and the thumb. In practice, the hand coordinates extracted by the Camshift algorithm may be unstable, and the hand may tremble because of physical limitations. Thus jitter may occur in the control data. This paper uses velocity control to eliminate this data jitter.

For robot teleoperation in a remote unstructured environment, it is assumed that all the remote robot site components can be installed on a mobile platform and enter the unstructured environment. The remote site involves the robotic arm, the robot controller, cameras on the end-effector, and some other cameras. The method presented here is tested on grabbing, picking, and inserting a peg into a hole. This system includes the operator in the decision control loop, which is a significant advantage. Firstly, it allows a robot to finish tasks such as grabbing or picking objects without any prior knowledge, such as the start location or even the destination location. Secondly, there are similar tasks requiring decision making; when picking objects or clearing objects that may contain dangerous items, decision making is always required. It is also expected that this system can be used to achieve complex poses when the joints of the robot are limited. The peg-into-hole tasks verify the ability of this teleoperation method to determine arbitrary positions of the target and the object. This function is helpful because assembly and disassembly tasks include more constrained peg-into-hole tasks. However, for these tasks, an appropriate grab hook and a larger hole and groove are needed unless the system includes force feedback.

Compared with contacting electromagnetic devices, such as hand joysticks and data gloves, our method does not hinder most natural human-limb motion and allows the operator to concentrate on the task itself instead of decomposing it into simple commands. Compared with the noncontacting markerless method [23], our method proves more accurate and stable. The system can be used immediately without any initialization. Compared with the automatic capture of [6], this algorithm uses manual positioning. Considering hand tremor, this algorithm includes coarse-adjustment and fine-adjustment functions. When guiding the robot, we use the coarse adjustment to move the robot close to the target quickly; when grabbing the target, we use the fine adjustment to position the robot accurately. This ensures the safety and efficiency of the teleoperation and solves the problem of inaccuracy caused by manual operation.

This paper contributes a guiding teleoperation system based on noncontacting measurement. By using Kinect-based tracking, robot teleoperation now allows the operator to control the robot in a more natural way. Generally speaking, the operation task can be accomplished using the same hand motion that would naturally be used in the task. Moreover, Kinect-based tracking is noncontacting. Thus, compared with the commonly used contacting electromagnetic devices, sensor-based devices, and data gloves, noncontacting devices hinder natural human-limb motion less. The method presented here allows the operator to focus on the task instead of thinking about how to decompose commands into simple commands that a voice recognition teleoperation system can understand. The method is more natural and intuitive than the operation in [23]. The system can be used immediately without any initialization, and this noncontacting control system can be used outdoors.

Further research can merge gesture control and voice control. Simple commands such as start, end, and pause can be given by voice; there would then be no need to recognize these specific motions, so the system could control the robot more flexibly and conveniently.

8. Conclusions

This paper presents a human-robot interface system with a single 3D camera, in which a PF is developed to estimate the position and orientation of the human hand more stably. To eliminate the effects of occlusion and motion-sensing failure, an AMT method is proposed. Since humans have inherent perceptive and motor limitations, high-precision tasks cannot be performed directly by a human; therefore, we employ the AMT system to assist the operator in performing high-precision tasks.

In the experiments, a pick-and-place task and a peg-into-hole task were carried out. We compared the accuracy and efficiency of our method with those of method [6]. The experimental results show that our method has better performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.