Abstract

LiDAR and cameras are two commonly used sensors in autonomous vehicles. To fuse the data collected by these two sensors and accurately perceive the 3D world, both the internal parameters and the external parameters of the two sensors must be calibrated accurately. However, during the long-term deployment and use of autonomous vehicles, factors such as equipment aging, transient changes in the external environment, and interference can render the initially correct calibration of the camera's internal parameters unsuitable for the current environment, so the camera's internal parameters need to be recalibrated. Since most current work focuses on perception algorithms and on the calibration of various sensors, little research has addressed identifying when a sensor needs to be recalibrated. Consequently, this paper proposes a data-driven miscalibration detection method for RGB cameras that detects miscalibrated camera internal parameters. The procedure is as follows: first, a random perturbation factor is added to the correctly calibrated camera internal parameters to generate incorrect internal parameters; then, the raw image is rectified with the incorrect internal parameters to generate miscalibrated image data. The miscalibrated images are used as the input data of a neural network, which is trained to produce a model that detects the miscalibrated parameters. On the KITTI dataset, we trained and deployed models on the data collected by Cam2 and Cam3, respectively, and evaluated both models. The experimental results show that the proposed method has practical value for detecting errors in the calibration of the camera's internal parameters.

1. Introduction

In autonomous driving, a single sensor cannot accurately perceive the complex and ever-changing road traffic environment; therefore, fusing data collected by several different kinds of sensors is the mainstream solution for current autonomous driving perception systems. To enable a more accurate fusion of sensor data, accurate calibration of the sensors is the basis for all applications. For LiDAR and cameras, two commonly used sensors in autonomous vehicles, the calibrated internal and external parameters are assumed to remain constant throughout the operating period of the vehicle once they have been determined. However, during long-term deployment and use, camera aging, transient changes in the external environment, and interference can make the initially correct calibration of the camera's internal parameters no longer applicable to the current environment, so the internal parameters need to be recalibrated. Periodically recalibrating the camera's internal parameters wastes a great deal of manpower and resources; a better approach is to recalibrate only when the system detects a miscalibration, so identifying when the camera needs to be recalibrated is the main purpose of this paper. Miscalibrated camera internal parameters can adversely affect the performance of the perception and control modules of autonomous driving, which makes detecting camera data faults crucial to the safety and stability of autonomous vehicles. In order to avoid system errors caused by the miscalibration of camera internal parameters, this paper proposes a data-driven RGB camera miscalibration detection method. The main contributions of this paper are as follows:

(1) We propose a method to generate miscalibration datasets. The method is based on the idea that incorrect calibration parameters shift the pixel projection positions of the raw image relative to the positions obtained with the correctly calibrated parameters. A random perturbation factor is added to existing, correctly calibrated camera internal parameters to generate incorrect internal parameters, which are then used to generate miscalibrated image data.

(2) We design a convolutional neural network that uses incorrectly rectified images as input data and train a network model that can detect whether the camera calibration is incorrect, thus identifying when the camera's internal parameters need to be recalibrated.

2. Related Work

Projecting the point cloud onto the image requires a joint calibration of the LiDAR and the camera. Calibrating two different sensors amounts to estimating the rigid-body transformation between their reference coordinate systems. A correctly calibrated sensor setup can reproject 3D points from the world coordinate system to the 2D pixel coordinate system; the coordinate conversion process is shown in Figure 1.
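For reference, the coordinate conversion in Figure 1 follows the standard pinhole projection (written here in its textbook form, since the figure itself is not reproduced):

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $(X_w, Y_w, Z_w)$ is a 3D point in the world coordinate system, $R$ and $t$ are the external parameters (rotation and translation), $K$ is the camera's internal parameter matrix, $(u, v)$ is the pixel coordinate, and $s$ is the projective depth.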

During the projection of the point cloud onto the image, accurate camera calibration is essential to ensure that the point cloud data are accurately projected onto the image. Current camera calibration techniques are mainly divided into two categories: offline calibration [1–3] and online calibration [4, 5]. Offline calibration of the camera is performed by observing the geometry of a known target in 3D space, so various types of calibration boards, such as checkerboard grids [6, 7] and targets [8], have been proposed in recent years. These methods usually treat calibration as a nonlinear optimization problem and estimate the camera parameters by minimizing the reprojection error. However, they are difficult to scale up in practical applications because of the equipment and trained personnel they require. Online calibration refers to calibration performed during the normal operation of the system. Although such methods are novel, they require substantial computational resources and are subject to various restrictive factors, such as the smoothness of the surface on which the vehicle is driven. In addition, a self-calibration method is introduced in reference [9], which uses only images to estimate the camera's internal parameters. References [10–13] use motion constraints of the camera to calibrate its internal parameters, but these methods also consume considerable computational resources.

With the rapid development of techniques such as deep learning [14–16] and computer vision [17–21], many researchers have in recent years proposed data-driven methods to estimate sensor calibration. Workman et al. [22] used a convolutional neural network to estimate the focal length among the camera's internal parameters. To train the network, the authors of [23] combined images and camera models to construct the dataset. Lopez et al. [24] used the SUN360 panoramic dataset [25] to manually generate training images and then used independent regressors sharing the same pretrained network structure to estimate the camera's internal parameters. Unlike the previous two methods, Yin et al. [26] used a deep network to remove distortions from fisheye camera images. In autonomous driving systems, sensor fault detection can be accomplished by combining information from different sensors and removing mismatched measurements [27–31]. These approaches are usually able to detect faults when they occur, but they rely heavily on redundancy in the sensor setup.

LiDAR and cameras are two commonly used sensors in autonomous vehicles. In the process of fusing point clouds and images, the calibration of the camera's internal parameters and of the external parameters between the camera and the LiDAR is essential, and once these internal and external parameters have been determined, they are assumed to remain constant during the whole operating cycle of the autonomous vehicle.

However, during long-term deployment and use, due to camera aging, transient changes in the external environment, and interference, the camera's internal parameters that were initially correctly calibrated are no longer applicable to the current environment and need to be recalibrated. Periodically recalibrating the parameters is labor-intensive; it would be more sensible to recalibrate the camera only when the system detects a calibration error, which makes identifying when the camera needs recalibration a pressing issue.

To address this problem, Cramariuc et al. [32] proposed in 2020 a method that uses deep learning to identify errors in the calibration of the camera's internal parameters; it is one of the first works to use deep learning to detect whether the internal parameters of an RGB camera are miscalibrated. In their paper, the authors designed a neural network that extracts features from the input data by using ReLU as the activation function. Owing to the neuron death problem of the ReLU activation function, however, the network may fail to update properly during training.

In summary, to address these problems, we propose a data-driven RGB camera miscalibration detection method to detect miscalibrated camera internal parameters, and we design a feature extraction network that uses Leaky ReLU as the activation function, which mitigates the neuron death problem of the ReLU activation function used in traditional methods. The Leaky ReLU activation function has a small positive slope on the negative half-axis, so it can still backpropagate useful gradients when the input data are negative. At the same time, when generating the miscalibrated image dataset, we introduce a random perturbation factor to perturb the correctly calibrated camera internal parameters and thus obtain miscalibrated internal parameters. This allows efficient and accurate identification of when the camera needs to be recalibrated.

3. System Design

3.1. Generating Miscalibrated Image Datasets

To obtain miscalibrated image data, the miscalibrated camera internal parameters must be obtained first; rectifying the camera's raw image with these wrong internal parameters then produces miscalibrated image data. In general, changing the camera's lens is one way to obtain a different set of internal parameters, but this process is cumbersome because every lens change requires a new offline calibration. Rectifying the raw image with wrong calibration parameters shifts the pixel projection positions relative to rectification with the correct parameters. Based on this idea, we use the raw data in the KITTI dataset and the correctly calibrated camera internal parameters, and we add a random perturbation factor to the correct internal parameters so that they become miscalibrated; these miscalibrated parameters are then used to generate miscalibrated image data. Since the camera lens is not perfectly parallel to the imaging plane, lines that are straight in the real world become curved when projected onto the 2D image plane and need to be corrected by using calibrated distortion coefficients (as shown in Figure 2). Therefore, when generating the miscalibrated image datasets, we consider a pinhole camera model with radial and tangential distortions [33].

Reference [32] generates miscalibrated image data by first defining a fixed fluctuation range around the correctly calibrated camera internal parameters given in KITTI and then randomly selecting the miscalibrated internal parameters from this range: the focal length is varied from 5% below to 20% above the original correct value, the optical center from 5% below to 5% above, and the distortion coefficients from 15% below to 15% above. Since the KITTI [34] dataset contains raw image data from 5 days and the camera calibration parameters differ for each day, this approach can lead to large differences among the miscalibrated camera parameters generated for different days.
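As an illustration only (the paper does not give code), these ranges can be realized by sampling each internal parameter independently and uniformly around a reference calibration; the function and variable names below are our own:

```python
import numpy as np

rng = np.random.default_rng()

def sample_miscalibrated(K, dist):
    """Perturb a correct calibration (K: 3x3 intrinsic matrix, dist: [k1, k2, p1, p2, k3])
    within the ranges described above: focal length -5%..+20%,
    optical center -5%..+5%, distortion coefficients -15%..+15%."""
    K_bad = K.copy()
    K_bad[0, 0] *= rng.uniform(0.95, 1.20)   # fx
    K_bad[1, 1] *= rng.uniform(0.95, 1.20)   # fy
    K_bad[0, 2] *= rng.uniform(0.95, 1.05)   # cx
    K_bad[1, 2] *= rng.uniform(0.95, 1.05)   # cy
    dist_bad = dist * rng.uniform(0.85, 1.15, size=dist.shape)
    return K_bad, dist_bad
```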

To address the abovementioned problem, we propose adding a random perturbation factor when generating the miscalibrated camera internal parameters, so as to reduce the gap between the miscalibrated parameters generated for different days. The procedure is as follows: first, the calibration file of September 26 is taken as the reference, and the value ranges of the miscalibrated internal parameters are fixed according to the method in the literature [36]; then, the miscalibrated camera internal parameters are obtained by randomly sampling each parameter from its value range, and the raw images are rectified with these internal parameters to obtain miscalibrated image data. The distance between the pixel positions of the miscalibrated image and of the correctly rectified image is then calculated. Using this distance as a standard, a random perturbation factor is added to the calibration parameters of the other days, so that the pixel difference of the miscalibrated images generated for those days stays within a fixed threshold of the pixel difference obtained on September 26.
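A possible realization of this matching step, building on the sampling sketch above; mean_pixel_shift is a hypothetical helper (essentially the APPD metric introduced in Section 3.2), and the retry loop is our own reading of the procedure:

```python
def perturb_to_match(K_day, dist_day, target_shift, tol, image_size, max_tries=100):
    """Hypothetical sketch: resample perturbed parameters for another day until the
    mean pixel shift is within `tol` of the shift measured on the September 26
    reference data."""
    for _ in range(max_tries):
        K_bad, dist_bad = sample_miscalibrated(K_day, dist_day)
        shift = mean_pixel_shift(K_day, dist_day, K_bad, dist_bad, image_size)
        if abs(shift - target_shift) <= tol:
            return K_bad, dist_bad
    return K_bad, dist_bad  # fall back to the last sample if no match is found
```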

We assume that the original image captured by the camera is $I$; the corrected image $I_c$ is obtained from the true calibration parameters of the pinhole camera model through the mapping function $\pi_c$, i.e., $I_c = \pi_c(I)$. Each pixel in $I_c$ is associated with a position in the original image, but not every pixel in $I_c$ can find a corresponding position in the original image. Therefore, we define the largest rectangular area of valid pixels in $I_c$ as $R$, crop the region $R$, resize it to the size of the original image $I$ according to a certain ratio, and finally obtain the sample image $I_s$.

Usually, it is difficult to obtain many different sets of calibration parameters for a camera. In order to obtain incorrectly calibrated image samples, we sample each internal parameter independently and repeatedly within its corresponding value range, thus obtaining many incorrectly rectified images $I_e$ and incorrect mapping functions $\pi_e$. In this way, once the sensor has been calibrated correctly, a large number of incorrectly calibrated images can be generated from only one set of raw image data; these images are used to detect whether the internal parameters of the camera are miscalibrated. The change in the corrected pixel positions of the raw image under correct and incorrect calibration parameters is shown in Figure 3.
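A minimal sketch of this generation step, assuming OpenCV's pinhole model with radial and tangential distortion; the valid-pixel rectangle R is approximated here with the validPixROI returned by getOptimalNewCameraMatrix, which may differ from the exact largest rectangle used in the paper:

```python
import cv2
import numpy as np

def make_miscalibrated_sample(raw_bgr, K_bad, dist_bad):
    """Rectify a raw image with WRONG internal parameters, crop the valid
    rectangle R, and resize back to the raw image size (Section 3.1)."""
    h, w = raw_bgr.shape[:2]
    # alpha=1 keeps all source pixels, so invalid (black) borders appear;
    # roi is OpenCV's estimate of the largest rectangle of valid pixels.
    new_K, roi = cv2.getOptimalNewCameraMatrix(K_bad, dist_bad, (w, h), 1)
    rectified = cv2.undistort(raw_bgr, K_bad, dist_bad, None, new_K)
    x, y, rw, rh = roi
    cropped = rectified[y:y + rh, x:x + rw]
    return cv2.resize(cropped, (w, h), interpolation=cv2.INTER_LINEAR)
```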

3.2. Metric for the Degree of Error in the Calibration of the Camera’s Internal Parameters

Because the rectification of the input image is performed in the first stage of the perception system of the autonomous vehicle, the extent of the camera's internal parameter calibration error needs to be characterized by using the miscalibrated images. We therefore use the average pixel position difference (APPD) [32] metric to reflect the extent of the calibration parameter error. The APPD is the average of all pixel position deviations over an image, and its expression is shown below:

$$\mathrm{APPD} = \frac{1}{w \cdot h} \sum_{p \in I} \left\lVert \pi_c(p) - \pi_e(p) \right\rVert_2$$

where $w \times h$ is the size of the image, $p$ is a pixel coordinate, $\pi_c$ is the correct mapping function, $\pi_e$ is the error mapping function, and $I$ is the raw image.
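As an illustration, the APPD can be computed from the per-pixel rectification maps; the sketch below assumes the mapping functions are realized with OpenCV's initUndistortRectifyMap and is our own reading of the metric:

```python
import cv2
import numpy as np

def appd(K_true, dist_true, K_bad, dist_bad, image_size):
    """Sketch of the average pixel position difference: for every pixel of the
    rectified image, compare where the correct and the wrong rectification maps
    look up the raw image, and average the Euclidean distance."""
    w, h = image_size
    # map_x/map_y give, for every rectified pixel, its source position in the raw image.
    mx_c, my_c = cv2.initUndistortRectifyMap(
        K_true, dist_true, None, K_true, (w, h), cv2.CV_32FC1)
    mx_e, my_e = cv2.initUndistortRectifyMap(
        K_bad, dist_bad, None, K_bad, (w, h), cv2.CV_32FC1)
    return float(np.mean(np.hypot(mx_c - mx_e, my_c - my_e)))
```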

To accurately detect whether the camera calibration is wrong, we designed a convolutional neural network to extract features. The input data of the network are images rectified with incorrect calibration parameters, and the final output of the network is the APPD metric. The structure of the network is shown in Figure 4.

In contrast to reference [32], we use the Leaky ReLU activation function after each convolutional layer, which alleviates the neuron death problem that exists with ReLU. The Leaky ReLU activation function retains a small positive slope on the negative half-axis, thus allowing backpropagation to proceed even when the input data are negative. The ReLU and Leaky ReLU activation functions are shown in Figure 5.
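Since the exact layer configuration of Figure 4 is not reproduced here, the following PyTorch sketch is a hypothetical configuration that only illustrates the pattern the text describes (convolution followed by Leaky ReLU, with a single regression output for the APPD):

```python
import torch
import torch.nn as nn

class MiscalibNet(nn.Module):
    """Illustrative regression network: convolutional feature extractor with
    Leaky ReLU activations and a single APPD output. The layer sizes are our
    own placeholders, not the exact structure of Figure 4."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.01),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.01),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.01),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)   # predicted APPD value

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f).squeeze(1)
```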

The ReLU activation function avoids gradient vanishing during backpropagation, suppresses negative values, and prevents gradient saturation, but it also has drawbacks. When the learning rate is too high, some neurons can die permanently, so that parts of the network can no longer be updated. The neural network weight update formula is

$$W_{t+1} = W_t - \eta \cdot \frac{\partial L}{\partial W_t}$$

where $\eta$ is the learning rate and $\partial L / \partial W_t$ is the gradient of the current parameter obtained by differentiation (generally positive). When the learning rate is too high, the update term $\eta \cdot \partial L / \partial W_t$ becomes large; when it exceeds $W_t$, the updated weight $W_{t+1}$ becomes negative. When the weight becomes negative, a positive input multiplied by this weight also becomes negative. According to the ReLU curve in Figure 5, a negative value outputs 0 after passing through ReLU. If $W$ later has a chance to be updated to a positive value, this is not a serious problem; however, when the output of the ReLU function is 0, the derivative of ReLU is also 0, so the gradient stays 0 from then on. As a result, $W$ is never updated again, and the neuron dies permanently (it always outputs 0).

As shown in Figure 5, the output of Leaky ReLU is a small negative value rather than 0 when the input is less than 0, so the weight $W$ still has the opportunity to be updated back to a positive value even if it becomes negative.
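As a small numerical illustration (our own example, not taken from the paper), the following snippet shows that for a negative pre-activation ReLU passes back a zero gradient while Leaky ReLU passes back a small nonzero one, so a weight that has gone negative can still recover:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([2.0])                       # positive input
w = torch.tensor([-1.5], requires_grad=True)  # weight that has become negative

for act in (F.relu, F.leaky_relu):
    if w.grad is not None:
        w.grad.zero_()
    y = act(w * x)          # pre-activation is negative (-3.0)
    y.sum().backward()
    print(act.__name__, "output:", y.item(), "grad wrt w:", w.grad.item())

# relu        output:  0.00  grad wrt w: 0.00  -> weight never recovers
# leaky_relu  output: -0.03  grad wrt w: 0.02  -> weight can still be updated
```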

Since the acquisition platform for the KITTI [34] dataset uses two RGB cameras, we trained a neural network model for each of the two cameras and deployed each model alongside the corresponding camera. This can be regarded as a complement to the calibration procedure. The dataset generation process described above reduces the amount and variety of data required to train the model: once the correct calibration is known, a single dataset is sufficient, and no manual annotation is required.

During training, each parameter of the camera's internal parameters is sampled uniformly and independently within its corresponding value range, so that the computed APPD values follow an approximately uniform distribution. We also include a certain percentage of correctly calibrated samples, i.e., samples with an APPD value of zero. We train with the mean error between the output of the neural network and the true label value, as shown in (3):

$$L = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| \qquad (3)$$

where $N$ is the batch_size, $\hat{y}_i$ is the APPD value output by the neural network, and $y_i$ is the true APPD value of the training sample.
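Under these assumptions, a training pair can be assembled as sketched below, reusing the helper functions and the rng from the Section 3.1 sketches; the proportion of zero-APPD samples (10%) is a placeholder, since the paper only states that a certain percentage is used:

```python
def make_training_pair(raw_bgr, K_true, dist_true, zero_fraction=0.1):
    """Return (input image, APPD label). With probability zero_fraction the image
    is rectified with the CORRECT parameters, giving a label of 0."""
    h, w = raw_bgr.shape[:2]
    if rng.random() < zero_fraction:
        K_use, dist_use = K_true, dist_true
    else:
        K_use, dist_use = sample_miscalibrated(K_true, dist_true)
    image = make_miscalibrated_sample(raw_bgr, K_use, dist_use)
    label = appd(K_true, dist_true, K_use, dist_use, (w, h))
    return image, label
```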

4. Experiments

4.1. Setting Experimental Parameters

The collection platform for the KITTI dataset [34] has two RGB cameras (Cam2 and Cam3 in Figure 6). We first divided the data from September 26, 2011, into a training set and a validation set; the trained model was then tested on the data from September 28, September 29, September 30, and October 03.

The dataset includes a total of 5 days of data, and the calibration files differ for each day; the reason for this is not clear. Therefore, there is no single standard correct calibration against which to evaluate the network models, and we use the calibrated parameters from September 26, 2011, as the reference values. The network model is trained with the batch_size set to 4, the number of epochs set to 10, and the learning rate set to 0.0001; its training and validation loss curves are shown in Figure 7.
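For completeness, a minimal training loop consistent with these settings is sketched below; the Adam optimizer and the DataLoader-based pipeline are our assumptions, since the paper does not state them:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_dataset, epochs=10, batch_size=4, lr=1e-4, device="cuda"):
    """Train the APPD regressor with the hyperparameters reported in Section 4.1."""
    model = model.to(device)
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    loss_fn = nn.L1Loss()   # mean error between predicted and true APPD, cf. (3)
    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```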

The two curves in Figure 7 show the convergence of the loss function of the network model during training. The validation loss and the training loss follow essentially the same trend, and the loss converges quickly: by the 4th epoch, the loss value is already close to 0 and the network model has converged.

4.2. Analysis of Prediction Results

We tested the trained model by using data from dates other than September 26, and the results are shown in Figure 8.

As can be seen from Figure 8, the average absolute error of our trained model on the data from the other dates is within 0.5, which indicates the feasibility of our proposed method for detecting whether there is an error in the calibration of the camera's internal parameters. The worst result was obtained on the data from September 29, with an average absolute error close to 0.5; a possible reason is the high diversity of the data collected on that day.

We trained network models on the Cam2 and Cam3 camera data from September 26 and tested the two models on data from the other dates; the test results are shown in Tables 1 and 2.

Tables 1 and 2 show the prediction results of the network models trained on the Cam2 and Cam3 data, respectively, and indicate that both models generalize well to previously unseen images and environments. Although both models detect miscalibration reliably relative to the reference parameter set they were trained on, Table 1 shows that the Cam2 network performs better. It is also evident from Tables 1 and 2 that the trained models perform worse at both very low and very high APPD values.

In the KITTI data collection platform, Cam2 and Cam3 are of the same model and operate in the same environment, mounted in the car with only a horizontal offset between them. We therefore evaluated the Cam3 data with the model trained on the Cam2 data (Figure 9) and the Cam2 data with the model trained on the Cam3 data (Figure 10).

By analyzing Figures 9 and 10, we can see that the model trained on Cam2 generalizes well to Cam3, whereas the model trained on Cam3 does not generalize well to Cam2. This illustrates the importance of the reference calibration and is another indication that Cam3 may be inconsistently calibrated.

To verify the effectiveness of the Leaky ReLU activation function used in this work, we trained on the Cam2 data collected on September 26 with the ReLU and the Leaky ReLU activation functions, respectively, and tested the resulting models on data not used in training. The test results are shown in Figure 11.

From Figure 11, it can be seen that when the learning rate is 0.000001, the network model with Leaky ReLU is less effective than the one with ReLU. As the learning rate increases, the network model with Leaky ReLU becomes significantly better than the one with ReLU. When the learning rate is 0.0001, the trained network model has the smallest average absolute error and performs best.

4.3. Generalization to New Environments and Cameras

The KITTI dataset is limited in the variety of its scenes and camera positions (both cameras are forward facing, with only a horizontal offset between them). Therefore, to test the generalization capability of the proposed method, we trained the same model with our proposed method on daytime scenes recorded by the forward-facing cameras of the Waymo dataset [35] and tested the trained model; the results are shown in Figure 12.

From Figure 12, it can be seen that the network model trained on the Waymo dataset still generalizes well, although larger errors appear in the predictions for both very small and very large APPD values. The test results are broadly similar to those of the network model trained on the KITTI dataset, which indicates that the method proposed in this paper is also applicable to other datasets.

5. Conclusion

In this paper, we propose a data-driven method for detecting the miscalibration of vehicle cameras. A set of miscalibrated data is generated from the KITTI dataset by using a random perturbation factor and used as the training data for a neural network, and we design a lightweight feature extraction network to extract features from the miscalibrated images. Finally, the average pixel position difference metric from reference [32] is used to measure the degree of error in the calibration of the camera's internal parameters. With our method, it is no longer necessary to recalibrate the camera's internal parameters periodically; recalibration is needed only when a calibration error is detected. A current limitation of our approach is that, depending on the camera and LiDAR mounting positions and angles of a given autonomous vehicle platform, a separate model needs to be trained for each camera to ensure that the internal parameters can be recalibrated when the camera position changes; this also means that a model trained on one acquisition platform cannot be applied directly to another. In future work, we intend to mix multiple datasets from different acquisition platforms to train the network model, so that it can adapt to multiple acquisition platforms with better generalization capability.

Data Availability

All data and programs included in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (61971007 and 61571013).