Abstract

Two different visual servoing controls have been developed to govern a translating parallel manipulator with an eye-in-hand configuration, namely a position-based and an image-based controller. The robot must be able to reach and grasp a target randomly positioned in the workspace; the control must be adaptive in order to compensate for motions of the target in the 3D space. The trajectory planning strategy ensures the continuity of the velocity vector for both PBVS and IBVS controls whenever a replanning event is needed. A comparison between the two approaches is given in terms of accuracy, speed, and stability in relation to the peculiar characteristics of the robot.

1. Introduction

Visual servoing is the use of computer vision to control the motion of a robot; two basic approaches can be identified [1–4]: position-based visual servo (PBVS), in which vision data are used to reconstruct the 3D pose of the robot and a kinematic error is generated in the Cartesian space and mapped to actuator commands [5–7]; image-based visual servo (IBVS), in which the error is generated directly from image plane features [8–15]. Recently, a new family of hybrid or partitioned methods has been growing, with the aim of combining the advantages of PBVS and IBVS while trying to avoid their shortcomings [16, 17].

The principal advantage of position-based control is the possibility of defining tasks in a standard Cartesian frame. On the other hand, the control law strongly depends on the optical parameters of the vision system and can become highly sensitive to calibration errors. On the contrary, image-based control is less sensitive to calibration errors; however, it requires the online computation of the image Jacobian, a quantity that depends on the distance between the target and the camera, which is difficult to evaluate. A control defined in the image plane also turns out to be strongly nonlinear and coupled when mapped onto the joint space of the robot and may cause problems when crossing points that are singular for the kinematics of the manipulator [1].

Visual servo systems can also be classified on the basis of their architecture into the following two categories [1]: the vision system provides an external input to the joint closed-loop control of the robot, which stabilizes the mechanism (dynamic look and move); the vision system is used directly in the control loop to compute the joint inputs, thus stabilizing the robot autonomously (direct visual servo). In general, most applications are of the dynamic look and move type; one of the reasons is the difference between the rates of vision systems and servo loops. The low frequency imposed by vision might compromise the controller's stability, especially when several DOFs are involved.

The aim of this work is to compare two different visual servoing controls, respectively of the position-based and image-based type, implemented to govern a translating parallel manipulator with an eye-in-hand configuration. The robot must be able to reach and grasp a special target randomly positioned in the workspace; the control must be adaptive in order to compensate for motions of the target in the 3D space. The use of Bézier curves [18, 19] in the trajectory planning algorithms ensures the continuity of the velocity vector whenever a replanning event is needed. A dynamic look and move philosophy has been adopted to reconcile the low frame rate of the vision system (lower than 30 Hz) with the high control rate of the joint servo (about 1000 Hz).

2. Hardware Setup

The robot, which is called ICaRo, is a research prototype designed and realized at the Machine Mechanics Lab of Università Politecnica delle Marche [20]; the end-effector is actuated in parallel by 3 limbs whose kinematic structure only allows pure translations in the 3D space. The workspace is a cube with a 0.6 m edge, free of singular points (Figure 1(a)).

The eye-in-hand configuration has been chosen for the installation of the vision system. The optical axis of the camera is aligned with the end-effector vertical axis. The end-effector is also provided with a pneumatic gripper in order to grasp the target when the desired relative pose is reached (Figure 1(b)). The robot is managed by a central control unit, a DS1103 real-time board by dSPACE. The vision data, after image acquisition and preliminary processing performed by the CVS real-time hardware by National Instruments, are sent via a serial interface to the central unit that runs the control algorithms.

3. PBVS Control

When a position-based method is used, a 3D camera calibration is required in order to map the 2D data of the image features to Cartesian space data. That is to say, the intrinsic and extrinsic parameters of the camera must be evaluated. Intrinsic parameters depend exclusively on the optical characteristics, namely, lens and CCD sensor properties. The calibration of intrinsic parameters can be performed offline, provided that the optical setup remains fixed during the operative tasks of the robot. Extrinsic parameters indicate the relative pose of the camera reference system with respect to a generic world reference system. It is assumed that the world reference system coincides with the object frame, so that the extrinsic parameters directly give the pose of the camera with respect to the target. Obviously, the extrinsic parameters vary with robot or target motion, and an online estimation is needed in order to perform a dynamic look-and-move tracking task.

3.1. Estimation of Intrinsic Parameters

A standard pinhole camera model is briefly described here to introduce the notation used in this paper. A point in the 2D image space is denoted by $\mathbf{m} = [u \; v]^T$, while a point in the 3D Cartesian space is denoted by $\mathbf{p} = [x \; y \; z]^T$. Introducing homogeneous coordinates, it is $\tilde{\mathbf{m}} = [u \; v \; 1]^T$ and $\tilde{\mathbf{p}} = [x \; y \; z \; 1]^T$. The image projection that relates a 3D point $\mathbf{p}$ to its corresponding point $\mathbf{m}$ in the image plane can be expressed as

$$ s\,\tilde{\mathbf{m}} = \mathbf{A}\,\boldsymbol{\Pi}\,\mathbf{B}\,\tilde{\mathbf{p}}, \qquad \boldsymbol{\Pi} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \tag{1}$$

where $s$ is an arbitrary scale factor and $\mathbf{B}$ is the $4 \times 4$ matrix of the extrinsic parameters that combines a rotation and a translation relating the world coordinate system to the camera coordinate system. $\mathbf{A}$ is the matrix of intrinsic parameters, defined as

$$ \mathbf{A} = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{2}$$

where:
(i) $(u_0, v_0)$ are the coordinates of the principal point, that is, the projection of the optical axis on the CCD sensor;
(ii) $\alpha$ and $\beta$ are the focal scale factors along the image $u$ and $v$ axes;
(iii) $\gamma$ is the parameter describing the skewness of the two image axes, usually set to 0.
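As a small numerical illustration of the projection model in (1) and (2), the following sketch (not part of the original work) projects a world point onto the image plane; the intrinsic and extrinsic values are illustrative placeholders rather than the calibrated parameters of the actual camera.

```python
import numpy as np

# Illustrative intrinsic matrix A (alpha, beta, u0, v0 are placeholders).
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 790.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Canonical projection matrix Pi of (1).
Pi = np.hstack([np.eye(3), np.zeros((3, 1))])

# Extrinsic matrix B: rotation R and translation t from world to camera frame.
R = np.eye(3)
t = np.array([[0.05], [0.02], [0.50]])
B = np.vstack([np.hstack([R, t]), np.array([[0.0, 0.0, 0.0, 1.0]])])

p_tilde = np.array([0.01, 0.03, 0.0, 1.0])   # homogeneous world point
s_m = A @ Pi @ B @ p_tilde                   # s * m_tilde, as in (1)
u, v = s_m[:2] / s_m[2]                      # pixel coordinates
print(u, v)
```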

The pinhole model needs to be refined in order to correct the image distortions introduced by the lens [21–25]. To this aim, normalized coordinates are conveniently introduced: given a point $\mathbf{p} = [x_c \; y_c \; z_c]^T$ in the camera reference system, the corresponding normalized pinhole projection is defined as $\mathbf{m}_n = [u_n \; v_n]^T = [x_c/z_c \;\; y_c/z_c]^T$. It can be demonstrated that, in the ideal case free of lens distortions, the intrinsic camera matrix $\mathbf{A}$ relates a point in the image plane to its normalized projection: $\mathbf{m}_n = \mathbf{A}^{-1}\mathbf{m}$. In practice, the normalized coordinates obtained from the previous equation are affected by image distortions, and the real case is distinguished from the ideal one by using the notation $\mathbf{m}_d = \mathbf{A}^{-1}\mathbf{m}$. Two components of distortion can be recognized: radial lens distortion causes the actual image point to be displaced radially in the image plane; if the centres of curvature of the lens surfaces are not collinear, a tangential distortion is also introduced. The following expression is generally used to model both components of distortion:

$$ \mathbf{m}_d = \mathbf{m}_n + \delta\mathbf{m}^{(r)} + \delta\mathbf{m}^{(t)}, \tag{3}$$

where $\delta\mathbf{m}^{(r)}$ and $\delta\mathbf{m}^{(t)}$ are defined as high-order polynomial functions of the undistorted coordinates $u_n, v_n$ [21]. Because of the high degree of the distortion model, there is no general algebraic expression for the inverse problem of computing the normalized image projection vector $\mathbf{m}_n$ from the distorted coordinates of vector $\mathbf{m}_d$. If low-distortion lenses are used, a direct solution can be found by approximating $\delta\mathbf{m}^{(r)}$ and $\delta\mathbf{m}^{(t)}$ as functions of the distorted coordinates. Drawing on well-known distortion models [21–25], a rough correction of the image distortion can be quickly obtained by using the equation

$$ \mathbf{m}_n = \frac{1}{a}\left(\mathbf{m}_d - [b \;\; c]^T\right), \tag{4}$$

where

$$ a = 1 + k_1 r_d^2 + k_2 r_d^4 + k_3 r_d^6, \quad b = 2 p_1 u_d v_d + p_2\left(r_d^2 + 2 u_d^2\right), \quad c = p_1\left(r_d^2 + 2 v_d^2\right) + 2 p_2 u_d v_d, \quad r_d^2 = u_d^2 + v_d^2, \tag{5}$$

and $k_1, k_2, k_3, p_1, p_2$ are additional intrinsic parameters to be determined by calibration.
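The approximate inversion in (4) and (5) maps distorted normalized coordinates back to undistorted ones; a minimal sketch is given below, with illustrative distortion coefficients that in practice come from the calibration.

```python
import numpy as np

# Illustrative distortion coefficients (to be identified by calibration).
k1, k2, k3 = -0.2, 0.05, 0.0
p1, p2 = 1e-3, -5e-4

def undistort(m_d):
    """Approximate normalized projection m_n from distorted coordinates m_d,
    following (4) and (5)."""
    u_d, v_d = m_d
    r2 = u_d**2 + v_d**2
    a = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    b = 2.0 * p1 * u_d * v_d + p2 * (r2 + 2.0 * u_d**2)
    c = p1 * (r2 + 2.0 * v_d**2) + 2.0 * p2 * u_d * v_d
    return (np.asarray(m_d, dtype=float) - np.array([b, c])) / a

# Example: a distorted point far from the principal point.
print(undistort([0.4, -0.3]))
```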

Several algorithms for the estimation of the intrinsic parameters are available in the literature [22–25]. A technique inspired by the Heikkilä algorithm has been adopted: after the settings of the optical system have been tuned for an optimal result of the vision, a number $n > 4$ of frames (15 in our application) of a grid of known dimensions are acquired; through an automatic extraction of the corners of the grids, the algorithm estimates, with an iterative procedure, the intrinsic parameters of the camera, including the distortion parameters, according to (4) and (5).
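An equivalent offline calibration can be sketched with a general-purpose vision library; the snippet below assumes OpenCV's corner detector and calibrateCamera on a chessboard-like grid, with pattern size, square edge, and file names as illustrative assumptions (the actual procedure follows the Heikkilä-inspired algorithm described above).

```python
import cv2
import numpy as np

# Hypothetical grid geometry and file names; the real setup uses 15 views of
# a grid of known dimensions, as described above.
pattern = (9, 6)                # inner corners of the grid
edge = 0.025                    # grid square edge [m]
grid = np.zeros((pattern[0] * pattern[1], 3), np.float32)
grid[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * edge

obj_pts, img_pts, size = [], [], None
for i in range(15):             # n > 4 frames (15 in our application)
    image = cv2.imread(f"grid_{i:02d}.png", cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(image, pattern)
    if found:
        obj_pts.append(grid)
        img_pts.append(corners)
        size = image.shape[::-1]

# A is the intrinsic matrix of (2); dist collects (k1, k2, p1, p2, k3) of (5).
rms, A, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print(rms, A, dist)
```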

3.2. Estimation of Extrinsic Parameters

The reconstruction of the relative pose between camera and object is possible through the estimation of the extrinsic parameters of the camera with respect to a reference system coincident with the object frame $O_o x_o y_o z_o$. Figure 2 shows the optical pattern realized on the top surface of the target object; it consists of four coplanar circles positioned at the corners of a square of known edge $d$. The object frame is attached to the pattern in the way shown in the figure.

The position of the origin $O_o$ with respect to the camera frame is the vector $^c\mathbf{t} = {}^c(O_o - O_c)$, while the relative orientation between the two reference systems is the rotation $^c_o\mathbf{R}$. The homogeneous transformation $^c_o\mathbf{T}$ coincides with the extrinsic parameters matrix

$$ \mathbf{B} = {}^c_o\mathbf{T} = \begin{bmatrix} {}^c_o\mathbf{R} & {}^c\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}. \tag{6}$$

The aim is to determine the extrinsic parameters once the geometry of the optical pattern and the pixel coordinates of the centroids of the circles projected onto the image plane are known.

The problem of determining the pose, knowing the correspondence of $n$ points in the world and camera reference systems, is a classic one in photogrammetry (the PnP problem), and it has been proved that for 4 coplanar points the solution is unique [26, 27]. While the calibration of the intrinsic parameters is quite laborious but does not need to be performed online, the solution of the PnP algorithm for 4 coplanar points is fast enough to be computed online during the motion of the robot.
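For illustration, the online extrinsic estimation can be sketched with a planar PnP solver; the example below assumes OpenCV's solvePnP with the IPPE method for coplanar points, and the pattern edge, centroid pixels, and calibration data are illustrative values, not the experimental ones.

```python
import cv2
import numpy as np

# Calibration results (illustrative): intrinsic matrix A and distortion vector.
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 790.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)

# The four circle centres in the object frame (square of edge d, z = 0).
d = 0.04
obj_pts = np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0],
                    [d, d, 0.0], [0.0, d, 0.0]])

# Centroids of the projected circles, in pixels (illustrative values).
img_pts = np.array([[312.4, 228.1], [355.9, 229.0],
                    [356.6, 271.8], [313.0, 272.5]])

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, A, dist,
                              flags=cv2.SOLVEPNP_IPPE)   # 4 coplanar points
R, _ = cv2.Rodrigues(rvec)                               # rotation c_o R
T = np.vstack([np.hstack([R, tvec]), [[0.0, 0.0, 0.0, 1.0]]])  # B = c_o T, (6)
print(T)
```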

An experimental test is presented here in order to evaluate the accuracy of the implemented pose estimation method: while keeping the camera fixed, the optical target of Figure 2 is moved 7 mm along each axis using a precision micrometric stage. An image of the pattern is grabbed before the motion, and a series of 100 images are stored after the motion.

The relative displacements can be estimated by subtracting the pose obtained from the first image from the pose obtained from each of the 100 subsequent images. The test has been repeated in two configurations, setting the initial z-distance between the camera and the object at 500 mm and 150 mm, respectively.

Experimental data of the tests are plotted in Figure 3. There is clearly a larger dispersion of the measurements at a greater distance from the target; moreover, a larger sensitivity to noise can be noticed along the $z$ axis, parallel to the optical axis. It is also evident that the accuracy of the pose estimation increases as the target is approached. The same conclusions can be drawn from Table 1, where the experimental data are summarized in terms of mean values and standard deviations: passing from a distance of 500 mm to 150 mm, the standard deviation decreases by nearly one order of magnitude in $x$ and $y$ and by two orders in $z$.

3.3. Control Algorithm

A dynamic look and move control has been implemented following the position-based approach described above. The global architecture is shown in Figure 4, where s represents the vector of image features, 𝐪 is the vector of joint angles, and 𝐱𝑚, 𝐱𝑑, and 𝐱plan are, respectively, the measured, desired, and planned Cartesian coordinate vectors. Looking at the scheme of the control, two separate loops can be identified: the inner loop realizes a standard PD control in the operative space with gravity compensation (𝐠 term) by exploiting the information from the encoders; the camera, working as an exteroceptive sensor, closes an outer loop in which visual information is processed in order to plan and replan the desired trajectory of the robot in real time. It is important to remark that image feature extraction is performed outside the control loop by the real-time image processing hardware (CVS); only the pose estimation, which includes the solution of the PnP algorithm, is involved in the loop.

A control in the Cartesian space is realized by imposing the following requirements on the planned motions:
(a) starting from an arbitrary status of the end-effector (position and velocity), the continuity of position and velocity must be ensured;
(b) the final point must be reached with a vertical tangent in order to facilitate the grasping of the target object.

To this purpose, a third-order Bézier curve has been adopted [18, 19]. This curve is defined by 4 points in the 3D space and has the fundamental property of being tangent, at its extremities, to the polyline defined by the four points (Figure 5(a)).

The parametric formulation of a cubic Bézier curve is

$$ \mathbf{B}(s) = \mathbf{P}_0 (1-s)^3 + 3\,\mathbf{P}_1\, s (1-s)^2 + 3\,\mathbf{P}_2\, s^2 (1-s) + \mathbf{P}_3\, s^3, \qquad s \in [0, 1]. \tag{7}$$

In the trajectory planner algorithm, $\mathbf{P}_0$ is the current position of the end-effector, while $\mathbf{P}_3$ is the target point obtained from the vision system. Knowing the current velocity direction $\hat{\mathbf{v}}_0$ and the desired final velocity direction $\hat{\mathbf{v}}_f$, $\mathbf{P}_1$ and $\mathbf{P}_2$ are defined as

$$ \mathbf{P}_1 = \mathbf{P}_0 + \frac{\left\|\mathbf{P}_3 - \mathbf{P}_0\right\|}{n}\,\hat{\mathbf{v}}_0, \qquad \mathbf{P}_2 = \mathbf{P}_3 - \frac{\left\|\mathbf{P}_3 - \mathbf{P}_0\right\|}{n}\,\hat{\mathbf{v}}_f, \tag{8}$$

where $n$ is a tunable parameter that influences the curvature of the trajectory (Figure 5(b)); all experimental data reported in the paper are obtained by setting $n = 3$.
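A minimal sketch of the planner defined by (7) and (8) is given below; the positions, directions, and the choice $n = 3$ follow the text, while the specific numerical points are illustrative.

```python
import numpy as np

def bezier_planner(P0, P3, v0_dir, vf_dir, n=3.0):
    """Return B(s), s in [0, 1], the cubic Bezier of (7) with the control
    points P1, P2 of (8)."""
    P0, P3 = np.asarray(P0, float), np.asarray(P3, float)
    v0_dir = np.asarray(v0_dir, float) / np.linalg.norm(v0_dir)
    vf_dir = np.asarray(vf_dir, float) / np.linalg.norm(vf_dir)
    L = np.linalg.norm(P3 - P0)
    P1 = P0 + (L / n) * v0_dir          # leave along the current velocity
    P2 = P3 - (L / n) * vf_dir          # arrive along the final direction
    return lambda s: ((1 - s) ** 3 * P0 + 3 * s * (1 - s) ** 2 * P1
                      + 3 * s ** 2 * (1 - s) * P2 + s ** 3 * P3)

# Current position, target from vision, current motion direction, and a
# vertical (downward) arrival direction to facilitate grasping.
B = bezier_planner(P0=[0.0, 0.0, 0.6], P3=[0.15, 0.10, 0.05],
                   v0_dir=[1.0, 0.0, 0.1], vf_dir=[0.0, 0.0, -1.0])
print(B(0.0), B(0.5), B(1.0))
```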

Since the target $\mathbf{P}_3$ is continuously estimated during the motion of the robot, variations of its position due to motions of the object or to the reduction of the measurement errors (as shown in Figure 3) are compensated for by a continuous replanning of the trajectory. A trapezoidal velocity profile is used, with a temporal planning algorithm that is able to modify the shape of the profile as the length of the planned trajectory changes during the motion.
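One simple way to reshape the trapezoidal profile when the remaining path length changes is sketched below; the limits and lengths are illustrative, not the values used on ICaRo.

```python
import numpy as np

def trapezoidal_times(length, v0, v_max, a_max):
    """Durations (t_acc, t_cruise, t_dec) of a trapezoidal speed profile over
    a path of the given length, starting at speed v0 and ending at rest.
    Assumes v0 <= v_max and that the length allows the robot to stop."""
    # Peak speed if no cruise phase fits in the given length (triangular case).
    v_peak = min(v_max, np.sqrt(a_max * length + 0.5 * v0 ** 2))
    t_acc = (v_peak - v0) / a_max
    t_dec = v_peak / a_max
    d_acc = 0.5 * (v0 + v_peak) * t_acc
    d_dec = 0.5 * v_peak * t_dec
    t_cruise = max(length - d_acc - d_dec, 0.0) / v_peak
    return t_acc, t_cruise, t_dec

# Whenever the target estimate moves, the profile is recomputed from the
# current speed and the new length of the replanned Bezier path.
print(trapezoidal_times(length=0.45, v0=0.06, v_max=0.1, a_max=0.5))
```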

4. IBVS Control

In the image-based visual servo control, the error signal is defined directly in terms of image feature parameters. In the present application, the optical target is an annulus painted on the top surface of a cylinder (Figure 6).

The image features are the coordinates of the centre and the major axis of the ellipse that best approximates the projection of the annulus onto the image plane; they are collected in the image space vector $\mathbf{s} = \{u, v, d\}^T$, expressed in pixels. The relation between the image space and the Cartesian space is given by the interaction matrix $\mathbf{L}_e$ [27]: denoting by $\dot{\mathbf{x}} = \{\dot{x}, \dot{y}, \dot{z}\}^T$ the Cartesian velocity and by $\dot{\mathbf{s}} = \{\dot{u}, \dot{v}, \dot{d}\}^T$ the image space velocity, it is

$$ \dot{\mathbf{s}} = \mathbf{L}_e\,\dot{\mathbf{x}}. \tag{9}$$

Here, the interaction matrix is defined as

$$ \mathbf{L}_e = \begin{bmatrix} -f/z & 0 & u/z \\ 0 & -f/z & v/z \\ 0 & 0 & -fD/z^2 \end{bmatrix}, \tag{10}$$

where $f$ is the focal length (assumed equal along the $u$ and $v$ directions), expressed in pixels, and $D$ is the metric value of the annulus diameter in the Cartesian space. It follows from (10) that the interaction matrix is a function of the image features $u$, $v$ and of the Cartesian variable $z$. The estimation of $z$ is performed by a linear interpolation on the parameter $d$; the interpolation law is obtained in a preliminary calibration step. The inversion of (9) is performed at each time step of the control, so that

$$ \dot{\mathbf{x}} = \mathbf{L}_e^{-1}\,\dot{\mathbf{s}}. \tag{11}$$
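A compact sketch of (9)-(11) is reported below: the interaction matrix is built from the current features and inverted to map an image space velocity into a Cartesian one. The focal length, the diameter D, and the feature values are illustrative, and the depth is recovered here directly from (12) instead of the calibrated interpolation law used on the real system.

```python
import numpy as np

f = 800.0      # focal length [pixel] (illustrative)
D = 0.05       # metric annulus diameter [m] (illustrative)

def interaction_matrix(u, v, z):
    """Interaction matrix L_e of (10) for the features (u, v, d)."""
    return np.array([[-f / z, 0.0,    u / z],
                     [0.0,   -f / z,  v / z],
                     [0.0,    0.0,   -f * D / z ** 2]])

u, v, d = 40.0, -25.0, 120.0           # current image features [pixel]
z = f * D / d                          # depth estimate, here via (12)
s_dot = np.array([-8.0, 5.0, 12.0])    # planned image space velocity [pixel/s]

x_dot = np.linalg.solve(interaction_matrix(u, v, z), s_dot)   # eq. (11)
print(x_dot)                           # Cartesian velocity command [m/s]
```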

The trajectory planner operates in this case in the image space: in analogy with the PBVS control, the trajectory is defined as a Bézier curve according to (7), through points defined by the image coordinates $\{u, v, d\}$. Such a strategy ensures the continuity of the velocity in both the image and Cartesian spaces, even in the case of motion of the target.

Particular attention is paid to the definition of the velocity profile: without going into details for the sake of brevity, velocity profiles in the image space are properly planned in order to obtain an effective trapezoidal profile in the Cartesian space. This operation is required to prevent any component of the Cartesian velocity from reaching excessively high peaks or from approaching zero asymptotically during the motion of the robot. As an example, consider the relation that results from the classic pinhole camera model:

$$ d = \frac{fD}{z}. \tag{12}$$

The derivative of the above expression is

$$ \dot{z} = -\frac{z^2}{fD}\,\dot{d}, \tag{13}$$

which can be obtained directly from the term $\mathbf{L}_e(3,3)$.

Thus, if a constant ̇𝑑 is imposed according to the trapezoidal planning, the resulting Cartesian vertical speed ż will approach zero when 𝑧 approaches zero (i.e., when the camera approaches the target). Therefore, if the vertical speed has to be controlled, it must be directly planned in the Cartesian space, and analogously for the 𝑥 and 𝑦 components.
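The effect described above can be checked numerically with (13): with illustrative f and D, a constant image-space rate of the diameter produces a vertical Cartesian speed that vanishes as the camera approaches the target, which is why the vertical profile is planned in the Cartesian space and then mapped back to the image space.

```python
f, D = 800.0, 0.05        # illustrative focal length [pixel] and diameter [m]
d_dot = 40.0              # constant diameter rate [pixel/s]

for z in (0.5, 0.3, 0.1, 0.05):            # decreasing camera-target distance
    z_dot = -(z ** 2 / (f * D)) * d_dot    # eq. (13)
    print(f"z = {z:.2f} m  ->  z_dot = {z_dot:+.4f} m/s")

# Conversely, planning a trapezoidal z_dot and mapping it back with
# d_dot = -(f * D / z**2) * z_dot keeps the Cartesian profile under control.
```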

The scheme of the implemented IBVS control is shown in Figure 7. As in the PBVS, it is possible to identify the inner and outer loops. In the outer loop, the pose estimation block disappears, and the image features are directly the input of the trajectory planning block. The inverse of the interaction matrix is then used to map the trajectory from the image space to the Cartesian space according to (11).

5. Experimental Tests

A series of tests have been performed in order to compare the PB and IB approaches at different speeds and with a fixed or moving target. The starting position of the robot allows the camera to frame a wide area of the workspace; when the target is recognized by the vision system, the controller starts to govern the robot until the target is reached.

Figure 8 shows the results of the slow tests ($v_{max} = 10^{-1}$ m/s) with a fixed target. The plots on the left side refer to the PBVS, while the IBVS results are in the right column; the position of the target is the same for both tests. The image plane trajectories of the features and of the respective centroid are represented at the top of the figure; such features are the four coplanar circles for the PB and the annulus diameter (represented as a circle) for the IB. The Cartesian position and velocity are plotted, respectively, in the middle and at the bottom of the figure; lines refer to measured quantities, while circular markers indicate planned values. Data show that the IB control has a slightly higher accuracy in centering the target. Moreover, the IBVS presents higher $x$ and $y$ components of velocity in the initial part of the motion, which allows the offset between camera and target in the horizontal plane to be compensated quickly.

The same considerations are even more evident in the fast tests, where the maximum velocity of the robot is twice that of the slow tests ($v_{max} = 2 \cdot 10^{-1}$ m/s). Results are plotted in Figure 9, in analogy with Figure 8. Despite a strong decrease of the time required to reach the target, no appreciable loss of accuracy is noticed for the IB control. For the PBVS, on the contrary, a certain increase of the final error in the image plane is evident.

For the sake of completeness, the error between the planned and measured Cartesian coordinates in the slow and fast tests is plotted in Figure 10. The comparison between the IB and PB controls at the same speed shows that the Cartesian errors are substantially similar. On the other hand, the error increases by about a factor of 2 when passing from the slow to the fast mode.

A further experimental evaluation is described here to prove the ability of the control to dynamically compensate for a generic motion of the target: once the visual control has been started, the object is moved by means of a conveyor belt; the controller is able to continuously replan the trajectory, ensuring the continuity of the velocity and quickly compensating for the displacement imposed on the target. Tests are performed with $v_{max} = 2 \cdot 10^{-1}$ m/s, which implies a time of about 5 s to reach the target. The original (referring to the PB algorithm) and replanned trajectories are plotted in Figure 11 in orthographic and axonometric views. The higher sensitivity to offsets in the horizontal plane is once again noticeable for the IBVS, which translates into higher performance in path following tasks.

6. Conclusions

Two dynamic look and move visual servos, respectively of the position-based and image-based type, have been implemented to govern a translating parallel robot. Both controls are able to detect a target in the 3D workspace and to dynamically compensate for a generic motion of the target itself. A continuous replanning algorithm defines the trajectory of the end-effector, ensuring the continuity of the velocity vector even if sudden displacements are imposed on the target; trajectories are planned using cubic Bézier curves, which give a smooth spatial path and allow the final direction of the velocity to be imposed in order to facilitate the grasping task; such curves are planned in the Cartesian space for the PBVS and in the image space for the IBVS.

Even if the accuracies of the two proposed controls seem to be substantially comparable, the experimental results suggest a better behaviour of the IBVS. Furthermore, the IBVS shows a greater sensitivity to horizontal displacements, as proved by the higher components of the velocity vector in the $x$ and $y$ directions in the first part of the trajectory.

Both methods require the knowledge of the geometric model of the optical target; specifically, for the proposed controls, the edge of the square formed by the four circles in the PB case and the diameter of the annulus in the IB case must be known in metric units. Nevertheless, the PB approach also requires the 3D calibration of the camera in order to estimate the intrinsic parameters matrix, which is then needed for the execution of the PnP algorithm; as known from the literature, the sensitivity to calibration errors is probably the main drawback of the PBVS.

Furthermore, even if the solution of the PnP problem returns the homogeneous transformation matrix $^c_o\mathbf{T}$, only the vector $^c\mathbf{t}$ is useful information for a pure translation robot. This means that the PB approach provides redundant information for our specific purpose, which translates into unnecessary computational effort for the real-time hardware.

All these reasons make the IBVS preferable for the studied application: the method is simple and accurate and does not require a camera calibration. Moreover, the risk of passing through or near singularities, which is typical of IB controls, is avoided if, as in this case, the robot is free of singular points inside its workspace.

Finally, it is worth remarking that the above considerations are strictly related to the kinematic properties of the robot and may not be valid a priori for other architectures. In this sense, the authors intend to implement the PB and IB controls on a parallel robot that confers to the end-effector a 3-DOF motion of pure rotation. Such a robot (SpheIRo) has been studied and prototyped at the Machine Mechanics Lab of Università Politecnica delle Marche [28] with the philosophy of complementary mobility with respect to ICaRo. Unlike the translational manipulator, SpheIRo presents singular points inside its workspace, which may cause stability problems for the IB control. Moreover, passing from a relative position to a relative orientation problem, the sensitivity of the two algorithms in detecting the orientation between frames has to be assessed.