#### Abstract

Redundant manipulators are suitable for working in narrow and complex environments due to their flexibility. However, a large number of joints and long slender links make it hard to obtain the accurate end-effector pose of the redundant manipulator directly through the encoders. In this paper, a pose estimation method is proposed with the fusion of vision sensors, inertial sensors, and encoders. Firstly, according to the complementary characteristics of each measurement unit in the sensors, the original data is corrected and enhanced. Furthermore, an improved Kalman filter (KF) algorithm is adopted for data fusion by establishing the nonlinear motion prediction of the end-effector and the synchronization update model of the multirate sensors. Finally, the radial basis function (RBF) neural network is used to adaptively adjust the fusion parameters. It is verified in experiments that the proposed method achieves better performances on estimation error and update frequency than the original extended Kalman filter (EKF) and unscented Kalman filter (UKF) algorithm, especially in complex environments.

#### 1. Introduction

In special manufacturing, troubleshooting, and other fields, there will inevitably be a closed and narrow operating environment, which is high risk and inefficient for manual operation. Therefore, the research on robots that can be used in this complex environment is significant and challenging, especially since the traditional manipulators are unable to do the job due to their bulky structure.

Kinematic redundancy is one of the most important properties that determine whether the robots can accomplish the task in the highly constrained environments mentioned above [1]. Manipulators with kinematic redundancy are called redundant manipulators. Redundant manipulators can be implemented in different ways, such as the wheeled mobile manipulator used for part handling [2, 3], the hyperredundant manipulator used for engine manufacturing [4] or spacecraft maintenance [5], and the flexible bionic manipulator inspired by the octopus claw [6] or the elephant trunk [7]. Although redundant manipulators have excellent kinematic performance, they also face the challenge of control accuracy due to the accuracy of end-effector pose estimation.

Traditional manipulators generally use joint encoders as sensor inputs and use kinematics models to calculate the pose of the end-effector as information feedback for motion control. Since the accuracy of the joint encoder can be very high, the pose calculated by the kinematics model has low noise. However, redundant manipulators usually have a large number of joints and long slender links, which means the pose will be affected by the transmission gap, mechanical vibration, elastic deformation, and other factors that deviate from the calculated value. As a result, the manipulators are in an uncertain state of kinematics and dynamics, so complex control models need to be established to ensure control accuracy [8–10].

Complex control models may introduce system uncertainty, so many researchers try to solve the problem of unreliable encoders in redundant manipulators by studying sensor processing methods. Analyzing the source of the measurement error and establishing a compensation model can effectively improve the sensor accuracy. For example, in a cable-driven redundant manipulator, the accuracy of the encoder-based pose estimation can be improved by correcting the cable hole location error, cable length error [11], and nonnegligible cable mass [12].

In addition to error correction for a single type of sensor, multisensor fusion is also the main research direction. The most commonly used sensors fused with encoders including angle sensors, vision sensors, and inertial sensors. For one thing, adding angle sensors at joints and fusing with motor encoders can correct transmission errors and improve pose estimation accuracy [13]. For another, adding one or more eye-to-hand cameras to the environment and performing the visual measurement on redundant manipulators can correct deformation errors [14] and is also an effective method for flexible robot pose estimation [15]. Moreover, the Kalman filter and its improved methods are also used to fuse motor encoders and inertial sensors, which have good dynamic response performance and can correct vibration errors [16]. Because the sensor and the fusion algorithm are lightweight, it is especially suitable for small bionic robots [17].

The sensor processing methods mentioned above can indeed solve the problem of end-effector pose estimation, but there are also many unsatisfactory shortcomings, especially when applied in complex environments. Therefore, the estimator constructed in this paper uses the compensation model and multisensor fusion at the same time, but in different processing stages. In a dynamic environment, although the preestablished compensation model is prone to failure and cannot accurately reflect the relationship between the sensor and the estimated state, it can effectively correct the wrong measurement [18]. Therefore, this paper does not directly use the compensation model for state estimation but as the preprocessing part of the state estimation to enhance the original data of the sensor. In terms of multisensor fusion, although visual-inertial fusion has become an effective method of mobile robot navigation [19], it is not common in end-effector pose estimation of redundant robots, and it is mostly the fusion of eye-to-hand camera and kinematic sensors [20]. In a narrow environment, it is difficult to apply an eye-to-hand camera, so the vision sensor needs to be arranged at the end of the manipulator, which means that the vision sensor is more susceptible to the influence of obstacles. Therefore, based on the Kalman filter algorithm, this paper establishes a nonlinear motion prediction of the end-effector and a synchronization update model of the multirate sensors that combines vision sensors, inertial sensors, and encoders. Moreover, since the neural network can improve the robustness of pose estimation [21], this paper introduces the radial basis function neural network to adjust parameters adaptively. Through the above-mentioned improved fusion strategy, the pose estimation of the end-effector can be more adaptable to complex environments and can better overcome the adverse effects of sensor noise and different sampling rates of multisensor fusion.

The rest of this paper is organized as follows: Section 2 introduces the overall method. Section 3 introduces the complementary enhancement of the original data. Section 4 introduces the fusion strategy including the nonlinear prediction, the synchronization update, and the adaptive adjustment. Section 5 introduces the experimental results and analysis, and Section 6 summarizes the paper.

#### 2. The Overall Method of the Pose Estimation

Figure 1 shows the small redundant robot studied in this paper, which can enter the narrow space to accomplish the specified tasks.

Due to the miniaturized mechanical structure, when the end-effector is equipped with a load, the connecting rod between the joints will inevitably be bent and deformed. This deformation error will accumulate at the end-effector; as a result, the pose estimation based only on the joint encoders produces deviation. Therefore, structured-light RGB-D camera, magnetic angular rate, and gravity (MARG) sensor are added, together with the joint encoders.

In this paper, the fusion pose estimation method proposed mainly includes two parts: preenhancement and adaptive fusion. The system block diagram is shown in Figure 2.

The RGB-D camera and the MARG sensor can compensate for the error caused by a single encoder measurement to a certain extent, but at the same time, it is easy to reduce the robustness of the system due to the interference of the external environment. Considering that the RGB-D camera and the MARG sensor themselves are integrated sensors composed of multiple sensing units, each sensing unit has complementary characteristics, which can be used to enhance the original data of the sensors so that the fusion of multiple sensors can be completed based on robust data. According to the above theory, preenhancement is added before multisensor fusion.

The preenhancement part includes the image enhancement of the RGB-D camera and the quaternion extended Kalman filter (QEKF) fusion attitude calculation of the MARG sensor. In the image enhancement part, the color image is used to repair the depth image, which can make the measurement of the RGB-D camera more accurate and reliable in a narrow environment (Section 3.1). In the QEKF part, the magnetometer, accelerometer, and gyroscope, which are all parts of the MARG sensor, are fused to correct the drift error of each independent sensing unit (Section 3.2).

Through the preenhancement part, the robustness of each independent sensor is improved, but the following problems still need to be overcome in the multisensor fusion process: (a) nonlinear state model, (b) the sampling rate of different sensors differs greatly, and (c) sensor abnormal handling in extreme environments.

In the adaptive fusion part, the RBF and adaptive UKF (RBF-AUKF) method is modified to solve the above problems, which mainly includes pose prediction, synchronization update, and adaptive adjustment. In the pose prediction part, by establishing the pose prediction model of the end-effector and the nonlinear processing method, the pose state at the next moment can be predicted based on the pose state at the previous moment (Section 4.1). In the synchronization update part, a synchronization check is performed on the preenhanced quaternion attitude and position obtained by the visual odometer; quaternion attitude, angular velocity , and acceleration obtained by the QEKF algorithm; and quaternion attitude, position , and velocity obtained by the kinematic calculation. According to the check result, the control variable is relatively updated when the sensor data is asynchronous, while the state variable is absolutely updated when the sensor data is synchronous. After the prediction and update, the fusion pose estimation at this moment is obtained (Section 4.2). In the adaptive adjustment part, this paper selects the prediction error , estimation error covariance matrix , Kalman gain , and observation error after updating as features. Based on the RBF neural network, the process, and observation noise covariance matrix , is adaptively adjusted to improve the robustness of fusion (Section 4.3).

#### 3. Sensor Enhancement Based on Complementary Characteristics

##### 3.1. Enhancement of the Depth Image

The structured-light RGB-D camera is composed of a color camera, an infrared camera, and an infrared transmitter and obtains the depth value through the speckle feature of the infrared structured light projected on the surface of the object [22]. In a closed and narrow environment, infrared light is easily reflected and absorbed by objects multiple times, so it is easier to produce invalid depth values locally, which in turn introduces additional noise for pose estimation.

Since the invalid depth value occurs locally, the invalid point can be repaired by a valid point that is in a similar region to the invalid point. Since the color camera is not affected by the narrow environment, it can be used to determine whether the valid point and the invalid point belong to the same type of patch gathered by the point cloud, that is, the similar region. Therefore, the enhanced method proposed in this paper includes two steps, namely, similar region extraction and local point cloud reconstruction, as shown in Figure 3. The former extracts similar regions in the depth image through the texture features of the registered color image. The latter uses valid points in the similar region to construct a local model of the region and then recalculates invalid points in the region with the model to complete the enhancement repair process.

In the similar region extraction step, the Mean Shift algorithm is used to extract the color texture information of the color image, and the Flood Fill algorithm is used to extract the corresponding color-connected domains to form a similar region mask. Finally, the mask is applied to the depth image to obtain the region segmentation of the similar point cloud. Since the Mean Shift algorithm needs to perform multiple iterative calculations for each pixel, it will affect the real-time performance of the system in practical applications. In order to speed up the calculation process, this paper downsamples the input color image to a lower resolution, then forms the similar region mask on the low-resolution image, and finally performs bilinear interpolation on the mask to restore the original scale.

In the local point cloud reconstruction step, by fitting the patch model in the similar region and recalculating the depth value of the invalid point on the patch, the invalid point is repaired. The least-squares method is the basic method to fit the patch model, but in practical applications, the region segmented by the color image has deviation, and not all valid points can be used to fit the patch model. In order to form a more reliable local model, the RANSAC algorithm is used to filter the points used to fit the model.

Although the invalid point does not have the correct depth value, it contains the coordinate information of the pixel plane. Combining with the local model that has been fitted, the spatial coordinate of the invalid point can be recalculated, as shown in equation (1). Finally, the invalid point in the depth image is repaired. where is the optical center offset of the camera and is the focal length of the camera.

##### 3.2. Enhancement of the Attitude Calculation

The MARG sensor is a microelectromechanical system (MEMS) composed of a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer. However, due to the impact of sensor manufacturing accuracy and operating environment interference, the raw measurement data of the MARG sensor is generally noisy. From the analysis of its working principle, the gyroscope has good dynamic response characteristics, but it is easy to produce cumulative errors in the attitude calculation, and the accelerometer and magnetometer are just the opposite. Therefore, the QEKF algorithm is used to fuse the three measurement units of the MARG sensor so that the output attitude angle data can be enhanced, as shown in Figure 4.

**(a) Block diagram of attitude calculation**

**(b) The coordinate system of MARG sensor**

There are three coordinate systems in this method. Firstly, the acceleration vector measured by the accelerometer and the angular velocity vector measured by the gyroscope are both in the body coordinate system. Secondly, the magnetic field intensity vector measured by the magnetometer is in the coordinate system that can be transformed to the body coordinate system by the rotation matrix . Lastly, the geomagnetic field intensity vector and the gravitational acceleration vector are in the navigation coordinate system that can be transformed from the body coordinate system by the rotation matrix .

Suppose the system state variables at the -th time are where is the quaternion value of the rotation transformation matrix at the -th time and is the accumulated drift value of the gyroscope on the , , and axes at the -th time.

Suppose the system control variables at the -th time is

Suppose the system observation variables at the -th time is

According to the discrete-time model of the quaternion attitude differential equation [23], the following equation can be obtained:

where is the discrete-time interval, is the identity matrix with rows and columns, is the quaternion vector of at the -th time, and is the component of angular velocity in the body coordinate system.

Substituting equations (2) and (3) into equation (5), (6) can obtain the state prediction equation: where is the zero matrix with rows and columns.

To transform the measured value of the sensor, the following equations can be obtained:

Then, the system observation equation was obtained: where is the theoretical measurement value of the sensors and is the quaternion attitude predicted by equation (7).

According to the EKF algorithm, calculate the enhanced quaternion attitude at the -th time: where is the prediction of the system state, is the prediction of the estimation error covariance matrix , is the Kalman gain, is the identity matrix, and and are the noise matrix.

To further improve the robustness of the system, the process noise matrix and the observation noise matrix in equation (11) adopt an adaptive form.

Suppose that the angular velocity measurement value obtained by the gyroscope obeys the Gaussian distribution with variance , and the drift value obeys the Gaussian distribution with variance , substituting equation (7) to obtain the following:

Suppose that the acceleration measurement value obtained by the accelerometer obeys the Gaussian distribution with variance , and the magnetic field intensity measurement value obtained by the magnetometer obeys the Gaussian distribution with variance . Considering the influence of external acceleration and external magnetic field strength, where and are correction coefficients.

#### 4. Sensor Fusion Based on Modified RBF-AUKF Method

##### 4.1. End-Effector Pose Prediction Model

The RGB-D camera and MARG sensor are both arranged at the end of the manipulator, so the sensor data after preenhancement processing is directly in the end-effector coordinate system. The joint encoders are arranged at each joint of the manipulator, so it is necessary to transform the angle of each joint to the pose in the end-effector coordinate system through forward kinematics calculation.

Suppose the system state variables at the k-th time is where is the position of the end-effector at the -th time and is the quaternion attitude of the end-effector at the -th time.

Suppose the system control variables at the -th time is where is the , , and axis angular velocities of the end-effector, is the position velocity of the end-effector at the -th time, and is the position acceleration of the end-effector at the -th time.

According to the differential model of motion [24], the system state prediction equation can be obtained: where is the discrete-time interval and is the rotation matrix from the body coordinate system to the navigation coordinate system as shown in equation (17).

The state prediction equation mentioned in equation (16) has obvious nonlinear characteristics. Both EKF and UKF algorithms can solve the problem of nonlinear state estimation. However, when the calculation performance can meet the requirements, the estimation accuracy of the UKF algorithm is higher [25]. By sampling, the UKF algorithm approximates the probability density distribution of the nonlinear prediction equation, and the constructed sampling point set is as follows: where is the -th column of the sample point set matrix at the -th time, which is a 7-dimensional column vector, is the -th column of the matrix , which is an -dimensional column vector, is the scale parameter to adjust the approximation accuracy, and is the estimation error covariance matrix.

##### 4.2. Multirate Update Model

Due to the limitation of sensor processing capacity, the measurement update frequency of different sensors is not the same, or even quite different, which means that even if the sensors are triggered to collect at the same time, the measurement of each sensor cannot be received at the same time.

Although it is difficult for sensors to publish data synchronously all the time, it is possible to approximate the software synchronization of multiple sensors through timestamp alignment. The basic strategy is to set a sliding time window; when all sensor data falls within this window, it can be considered that the sensor data is synchronous. As is shown in Figure 5, is the size of the sliding time window, which determines the accuracy of data synchronization. Its maximum size is the smallest update time interval among all sensors. The smaller the size, the higher the accuracy of the time, but the lower the frequency of synchronous data.

The UKF algorithm corrects the prediction model through sensor observations. Therefore, when the algorithm is used for multisensor fusion, the fusion accuracy is greatly affected by the time synchronization [26]. Because the update frequency of synchronous data is generally much lower than the average update frequency of data, the real-time performance of synchronization fusion estimation is poor. To solve this shortcoming, a modified synchronization fusion algorithm is proposed including two update models, namely, the absolute state update model and the relative state update model.

Define a matrix with rows and columns as a state update matrix, and its -th row and -th column satisfy the following:

The absolute state update model updates the system state variables including the position and attitude. The number of rows and columns in the state update matrix satisfies the following: where is the total number of sensors synchronized with the system state variables and must be equal to which is the maximum number of sensors; otherwise . is the total number of the system state variables that the -th sensor can observe.

The relative state update model updates the system control variables including velocity and acceleration. The number of rows and columns in the state update matrix satisfies the following: where is the total number of sensors synchronized with the system control variables and must satisfy ; otherwise . is the total number of system control variables that the -th sensor can observe.

When the multisensor data is not all synchronized, the relative state update is used to increase the update frequency locally, and when the data is all synchronized, the system state is globally corrected to improve the estimation accuracy. The final observation equation is as follows: where is the absolute state update matrix determined by equation (20) and is the relative state update matrix determined by equation (21).

According to the UKF algorithm, calculate the fusion pose estimation of the end-effector: where is the state prediction of sampling points determined by equation (16), is the weighted mean of the state prediction, is the weight value determined by equation (24), is the weighted variance of the state prediction, is the state observation of sampling points determined by equation (22), is the weighted mean of the state observation, is the weighted variance of the state observation, is the weighted variance of the state prediction and observation, is the sensor measurement value, is the Kalman gain, and are the noise matrix.

##### 4.3. Adaptive Adjustment of the Noise Matrix

It can be seen from equation (23) that the process noise covariance matrix and the observation noise covariance matrix can affect the confidence level of the data during state prediction and update. In general applications, the variance of system state noise and sensor measurement noise is a fixed value that can be counted in advance. However, in a complex environment, the variance of noise will fluctuate to a certain extent. If the fixed value is still set in advance, it will affect the accuracy of the final fusion state estimation. Therefore, the system identification ability of the RBF network needs to be used to adaptively adjust the noise variance.

The RBF network includes a three-layer structure of input layer, hidden layer, and output layer, as shown in Figure 6. The prediction error , estimation error covariance matrix , Kalman gain , and observation error shown in equation (23) can reflect the deviation and fluctuation during the prediction and update process in the UKF and can be used as features of the input layer.

Considering the similarity between the features, select the features according to the position state, attitude state, and the observations of each sensor: where are the position and attitude prediction error; are the estimation error covariance matrix of position, position-attitude, attitude-position, and attitude; are the Kalman gain of the three sensors; and are the observation error of the three sensors.

Extract the main information of the feature matrix in equation (25) to obtain the input layer feature vector , which contains 10 feature information:

From input layer to hidden layer, radial basis function is used to activate neurons: where are the center and width of the -th hidden layer neuron.

The linear function is used from hidden layer to output layer: where is the weight of the -th hidden layer neuron to the -th output layer neuron.

The output of the output layer is as follows: where and are the standard deviations of the process noise of the position and attitude and , , and are the standard deviations of the observation noise of the three sensors.

The training process of the network is divided into two stages: offline and online. The offline training stage uses the data with known noise variance to obtain the initial neuron center and connection weights. At this stage, the output error of the network is the difference between the true noise variance and the network predicted noise variance. Since the noise variance in the offline stage is not the noise variance in the actual experiment, it is easy to produce overfitting in the offline training stage, which needs to be corrected through the online stage. The online training stage uses the updated prediction error at the next time as the output error of the network. The training goal is to minimize the prediction error:

#### 5. Experiments and Results

##### 5.1. Experimental Device

The mechanical structure of the experimental device consisted of a 1-DOF motion platform and an 8-DOF redundant manipulator, as shown in Figure 7. The joint encoders were arranged at each rotary joint, and the RGB-D camera and the MARG sensor were arranged at the end of the manipulator. The RGB-D camera was connected to the industrial personal computer (IPC) through USB, and the MARG sensor and the joint encoders were connected to the sensor collector through the 485 bus and then connected to the IPC through USB.

##### 5.2. Experiment of the Enhancement Part

In order to test the effect of the sensor complementary enhancement method proposed in this article, the experimental device was placed in a narrow artificially built environment, considering that the RGB-D camera focuses more on static repeatability and the MARG sensor focuses more on dynamic stability. Therefore, when testing the enhancement effect of the RGB-D camera, a periodically repeated expected motion trajectory was used, while the MARG sensor used a random expected trajectory. The experimental results are shown in Figures 8 and 9.

**(a)**

**(b)**

**(c)**

Comparing Figures 8(a) and 8(b), it can be seen that the point cloud after enhancement is closer to the real environment and has a less invalid blind area caused by the narrow environment. Figure 8(c) further demonstrates the impact of depth image enhancement on the accuracy of position estimation. It can be seen from the figure that the position trajectory obtained by using the original image information fluctuates more drastically and deviates from the reference trajectory to a greater degree. But the position trajectory obtained by using the enhanced image information is closer to the reference trajectory, and the overall fluctuation is smaller, which is more conducive to the fusion pose estimation with other sensors.

Comparing the desired and enhanced quaternion attitude in Figure 9, the overall calculation error of the QEKF algorithm proposed in this paper is within the expected range, which means acceleration information can compensate for the integral drift of angular velocity and geomagnetic field information can calibrate angle measurement errors. Through the enhancement of attitude calculation, it can provide more reliable attitude data for multisensor fusion.

##### 5.3. Simulation of the Fusion Part

Because the statistical tests are very sensitive when the sample size is small, both simulation tests and actual experiments were carried out to complete the verification of the multisensor fusion pose estimation method. Simulation tests were mainly used to test the stability of the method under extreme conditions, and actual experiments were mainly used to test the fusion effect of the method in real environments. It was easy to adjust the sensor parameters in simulation environments, so a large number of samples can be generated to test the performance of the method. Although the experimental samples that were collected in the real environments were limited, these samples were more targeted and reliable after the screening of the experimental methods by the simulation tests. At the same time, to present the experimental results as objectively as possible, the root mean square (RMS) errors of multiple experimental results were used to evaluate the performance differences of different methods.

Since real sensors can only be simulated numerically under simulation conditions, the actual state and sensors were simplified during the simulation. Different from the real system, an observation state and two sensors were set up in the simulation environment. The observation noise of sensor 1 was smaller than that of sensor 2, and sensor 1 only produced a stable zero offset, while the zero offset of sensor 2 increased with time. Figure 10 and Table 1 show the effect of the synchronization update method on the fusion result.

**(a) Single sensor 1**

**(b) Single sensor 2**

**(c)**Sensor 1 and 2 update frequency:

**(d)**Sensor 1 and 2 update frequency:**(e) Sensor 1 and 2 synchronized**

Since the noise of sensor 2 is greater than that of sensor 1, the RMS error of sensor 2 is greater than that of sensor 1 when the state of a single sensor is estimated. In the simulation, the data update frequency of sensor 1 was different from the data update frequency of sensor 2. If the data synchronization of the sensor was not considered, the state update was performed when the data of any sensor inputted. From the experimental results shown in Figures 10(c) and 10(d), the RMS error when is less than that when , indicating that the fusion result is affected by the sensor update frequency, and it is biased towards high-sampling-rate data sources. If the high-sampling-rate data is noisy, the fusion accuracy cannot be significantly improved. The above defects are improved in the modified synchronization update method. It can be seen from the experimental results shown in Figure 10(e) that the RMS error of the fusion data is significantly reduced, and there will be no sawtooth fluctuations in the asynchronous fusion. Therefore, the data synchronization of sensors is an important factor in multirate sensor fusion.

In the simulation environment, the process and observation noise conformed to the Gaussian distribution, but the variance of the distribution changed at every moment and satisfied a nonlinear model. The RBF network was designed to predict the variances of the process and observation noise and was trained when the filter was updated. In the simulation, the variances predicted by the network in the test set were compared with the preset variances, and the coefficient of determination (marked as -square) was calculated. The closer the coefficient of determination is to 1, the higher the accuracy of the network. It can be seen from the results shown in Figure 11 that the network can predict the variances of the noise accurately and has no low coefficients of determination (0.71 for the process noise and 0.84 for the observation noise).

Although the coefficient of determination cannot be obtained when the true variance is unknown, the accuracy of the network can be indirectly reflected through the state estimation deviation. In other words, if the state estimation deviation can remain stable even in extreme cases, it means that the network’s prediction of the parameters is accurate. In the simulation, the process noise and the observation noise of the sensor were designed to increase with time. Figure 12 and Table 2 show the fusion effects of different fusion algorithms in this case.

**(a)**

**(b)**

It can be seen from the experimental results that at the beginning, the system noise is small and the estimation error of each fusion method is not much different, but as time goes by, the system noise increases. The original EKF and UKF algorithms cannot adaptively adjust the filter parameters, which are easily affected by noise and produce state estimation deviations. Through the RBF network proposed in this paper, the noise covariance matrix can be predicted, so that the EKF and UKF algorithms have adaptive adjustment capabilities. The modified RBF-AEKF and RBF-AUKF methods have smaller RMS errors than the original EKF and UKF methods, so they can be more robust in complex environments.

##### 5.4. Experiment of the Fusion Part

Since the environment established in the simulation can only approximately reflect the actual situation, it is still necessary to conduct experiments in the real environment. In actual experiments, the end-effector of the manipulator periodically moved around the -axis of the base coordinate system, and the pose of the end-effector was set as the desired value. The position estimations along the , , and axes were obtained by fusion of the visual odometer based on the RGB-D camera and the kinematics calculation based on joint encoders. And the quaternion attitude estimations were obtained by fusion of the MARG sensor, visual odometer, and encoders. The experiment compares the original UKF method and the modified adaptive UKF method proposed in this paper, and the results are shown in Figures 13 and 14 and Table 3.

It can be seen from Table 3 that the RMS error of the pose estimation after multisensor fusion is smaller than that of a single sensor. Compared with that of the original method, the RMS error of the modified method proposed in this paper is significantly reduced in position estimation, but there is little difference in attitude estimation. The main reason is the modified method proposed in this paper is better than the traditional method due to its better adaptive effect when the noise fluctuation is large. In this experiment, the overall fluctuation of the sensor in attitude measurement is smaller than that of position measurement, so the modified method is close to the original method in attitude estimation.

It can be seen from Figure 13 that the visual odometer is prone to local sudden changes due to environmental interference and the encoder is prone to measurement offset. Compared with the original method, the modified method handles the measurement data of local sudden change better, so the overall RMS error is smaller. And because of the combination of absolute and relative updates, the overall update rate is maintained. In this experiment, the original method was updated only 258 times, while the modified method was updated 479 times, so the real-time response rate of state estimation was improved. It can be seen from Figure 14 that the rotation angle of the desired trajectory on the and axes is small, and the rotation angle on the -axis is large. When the rotation angle is small, it can be seen that the sensor has obvious measurement drift, when the rotation angle is large, it is easy to produce local sudden changes. The modified method in this paper can compensate the measurement drift of each sensor, reduce the influence of local sudden changes on the state estimation, and thus can better follow the desired trajectory.

#### 6. Conclusions

In order to improve the accuracy of the end-effector pose estimation, this paper proposes a modified method based on multisensor fusion, including preenhancement and adaptive fusion. The following conclusions can be drawn from the experimental results. Firstly, the depth image obtained by the RGB-D camera and the attitude obtained by the MARG sensor is enhanced, which can make the multisensor fusion process in a complex environment more stable. Secondly, the synchronization of sensors has a great influence on the results of multisensor fusion. The control variable is relatively updated when asynchronous, and the state variable is absolutely updated when synchronous, which can balance the update frequency and fusion accuracy. Thirdly, predicting the noise matrix through a neural network can make the algorithm adjust the fusion ratio of each sensor adaptively according to changes in the environment, which helps improve robustness. Finally, in the simulation environment, the RMS errors of the original UKF and EKF method are 4.25 times and 5.02 times that of the modified RBF-UKF method, respectively, and in the actual environment, the average pose estimation errors of the single sensor and the original UKF method are 8.57 times and 3.91 times that of the modified RBF-UKF method, and the update frequency of the original method is only 54% of the modified method. In summary, the method proposed in this paper can improve the accuracy of end-effector pose estimation and has high application value and practical engineering value.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1307400 and in part by the National Natural Science Foundation of China under Grant 61873267.