#### Abstract

Recent interest in on-orbit proximity operations has pushed towards the development of autonomous GNC strategies. In this sense, optical navigation enables a wide variety of possibilities as it can provide information not only about the kinematic state but also about the shape of the observed object. Various mission architectures have been either tested in space or studied on Earth. The present study deals with on-orbit relative pose and shape estimation with the use of a monocular camera and a distance sensor. The goal is to develop a filter which estimates an observed satellite’s relative position, velocity, attitude, and angular velocity, along with its shape, with the measurements obtained by a camera and a distance sensor mounted on board a chaser which is on a relative trajectory around the target. The filter’s efficiency is proved with a simulation on a virtual target object. The results of the simulation, even though relevant to a simplified scenario, show that the estimation process is successful and can be considered a promising strategy for a correct and safe docking maneuver.

#### 1. Introduction

Autonomous rendezvous and docking is a research field which has been extensively studied in the last years. In particular, high interest has been put in visual navigation and pose estimation; various mission architectures have been analyzed, all of which present a different way to estimate the relative pose between a chaser, equipped with some optical device, and a target.

The ATV docking mission to the ISS is an example of a real application involving visual navigation techniques using cameras [1–3]. In particular, the ISS is equipped with optical targets of known shape, which make it easier to estimate the relative attitude between them and the ATV. In other words, the ATV docking mission relies on the cooperativeness of the target spacecraft.

Lately, growing interest in orbit servicing and debris removal switched the focus to autonomous docking to an object that is potentially uncontrolled, free-tumbling, noncooperative, and possibly unknown [4]. An example of a real application in this sense is the PRISMA mission [5, 6], whose focus was to demonstrate the feasibility of autonomous, partially noncooperative in-orbit rendezvous with the use of GPS measurements and a radio frequency sensor, switching to a vision-based control at short range. While the far approach phase can be considered as fully noncooperative, in the short range approach phase, the close range camera of the chaser detects the signal of light emitting diodes located on Tango satellite [7].

Optical navigation has been also used in ESA’s Rosetta mission, where the target is a celestial object, the comet 67P/Churyumov-Gerasimenko [8–10]. Navigation cameras in the far approach phase were used to identify significant landmarks on the comet, which were then tracked and used as reference points for the relative position and velocity estimation process. Moreover, images of the comet taken at close range were used to build a 3D reconstruction. However, the landmark identification and estimation process was made by ground operators, thus making the mission not autonomous.

Many studies in open literature have been devoted to prepare and extend these still partial real applications. Longuet-Higgins [11] was one of the first to develop an algorithm for reconstructing a scene by means of stereo vision in 1981. Weng et al. [12, 13] studied the reconstruction of the structure and the motion of a target body by means of the so-called* two-view motion algorithm*, which estimates the rigid rotation between two consecutive images. Johnson and Mathies [14] applied the two-view motion algorithm to a landing maneuver on a small body and made use of an altimeter to compute the scale variation between two images during the descent. Vetrisano et al. [15] studied the use of cameras and intersatellite links between a swarm of spacecraft during the rendezvous to an asteroid, to estimate both the relative state and the relative angular velocity of the body. Oumer and Panin [16] presented a camera-based 3D feature tracking method used to estimate the position and velocities of an observed rigid body without reconstructing its shape.

In such a research frame, this paper aims to present a vision-based feature tracking method to estimate 3D shape, relative rotation, and translation of a free-tumbling, noncooperative, and unknown satellite orbiting Earth. This task will be achieved by using a monocular camera and a distance sensor (such as a LIDAR), with no prior knowledge either of the relative pose or of the target’s shape. Such a target is clearly more difficult to analyze with respect to the asteroid dealt in [14, 15], as its dynamics can be far more complex and fast. The addition of the target’s shape reconstruction, not tackled in [16], is significant with respect to inspection tasks and to the need to evaluate target’s inertia characteristics and possible locations for a possible grasp or dock. From the academic perspective, one of the major contributions brought by this paper is its generality and the possibility of being considered as a valid navigation technique in different scenarios. In fact, no prior information regarding target’s shape and attitude is required, since the shape itself is reconstructed after the estimate process. At the same time, no restrictions are required either on the magnitude or on the orientation of the relative attitude motion for the correct convergence of the estimation process. Most importantly, in this paper we propose a technique which makes use of a passive optical sensor (a monocular camera) which is more reliable and less power consuming than time-of-flight sensors [17–20]. The specific contribution of this paper stays also with the detailed description, including the selected dynamics, of a process that, by means of a judicious selection of the reference frames adopted, leads to a quite performing technique. Presented tests confirm the effectiveness and the quality of the estimates of the pose and motion of an unknown target. The results of the research are a first step for an application to autonomous rendezvous and docking.

The paper is organized as follows. The mission scenarios, along with the issues involved with the feature tracking and matching process, are described in Section 2. In Section 3, the Hill, camera, and body-fixed reference frames are defined. The relative and attitude dynamics and kinematics model, along with the definition of every angular velocity, are presented in Section 4. In Section 5, the relation between 2D features and 3D body points is described. The filter’s structure is shown in Section 6, while in Section 7 the data used in the simulation are listed. The results of the simulation are discussed in Section 8 and final comments are made in Section 9.

#### 2. Mission Scenario

In the following simulations, we assume that two satellites, a chaser and a target, are orbiting Earth. The chaser, a controlled satellite whose attitude with respect to an inertial reference frame is known from its attitude determination system, is equipped with an optical camera and a distance sensor at the scope of performing the relative navigation; the target is a free-tumbling, uncontrolled satellite, whose shape is initially unknown. We assume that the chaser’s attitude control system maintains the optical axis of the camera pointing towards the target.

The preliminary steps focus on navigation. In particular, estimating target’s shape, relative position, and attitude is of crucial importance to ensure that no collisions between chaser and target are taking place and indeed a correct and safe docking is achieved.

The target’s shape and relative state can be estimated by tracking significant features through subsequent frames captured by the camera. A set of features extracted from an image at time by means of a feature extraction algorithm, such as SURF* (Speeded-Up Robust Features)* [21], is compared to a set of features extracted from the following frame at time in order to find the correspondences. The identified set of matched features is then used as observables for the filter’s update process. An example of the feature’s extraction and matching process is shown in Figures 1(a)–1(c) where a 3D CAD model of the satellite AQUA [22] has been used as a test. As can be seen, the satellite has accomplished an in-plane rotation between the two frames. Note that in a more generic situation the satellite may rotate out of the image plane, and features may disappear from the camera field of view due to either target’s body occlusion or changes in sunlight illumination conditions. Other events that can happen as a result of the matching process are false matches, which occur also in the example presented (Figure 1(c)) for better understanding. These issues and others have to be faced if a robust estimation process must be achieved.

**(a)**Extracted features at time

**(b)**Extracted features at time

**(c)**Matched features between features at (red circles) and at (green crosses)Therefore, given the scenario, the aim is to build a filter which merges the measurements obtained by the camera (i.e., the features detected, tracked, and matched) and by the distance sensor (i.e., the target-to-chaser distance measurements) and combines them with the state prediction given by a dynamics model to estimate in real time the target’s relative position, velocity, and attitude and the target’s shape in a body-fixed reference frame.

#### 3. Reference Frames

##### 3.1. Hill Frame

The reference frame in which the relative orbital dynamics is expressed is the* Hill Reference Frame* (HRF), which is centred in the target’s centre of mass and is made up by the following axes:(1) axis is parallel to target’s orbital radius, .(2) axis is parallel to target’s orbital angular momentum, .(3) axis forms an orthonormal, right-handed frame.

The matrix describing the transformation between HRF and ECI* (Earth Centred Inertial)* is a function of target’s position and velocity expressed in ECI

##### 3.2. Camera Reference Frame

The rotation between chaser and target has to be evaluated with respect to a reference frame in which the measurements are taken. The feature’s pixels are 2D projection of 3D points whose coordinates are expressed in the* Camera Reference Frame* (CRF). In the present work, we assume that the origin of CRF, which is the camera’s focal point, coincides with the chaser’s centre of mass. This hypothesis, which does not affect the validity of the results, is made in order to avoid defining another reference frame parallel to CRF and differing from it only for a rigid translation. The axes are displaced as follows:(1) is parallel to the* optical axis*. As said in Section 2, we assume to know and control chaser’s attitude to our needs. This means that, in order to have continuous and full visibility of the target to acquire significant images, it is needed to make the optical axis point towards the target’s centre of mass. Therefore, its direction will be opposite to the relative position vector .(2) and form a plane perpendicular to but are not bound to any direction since a rotation about is irrelevant to the purpose of framing the target.The rotation used to switch between HRF and CRF is a 3-1-0, meaning that the first two rotations are needed to bring parallel and opposite to , but the third rotation is null (), as there is no constraint on and directions. The correspondent rotation matrix is an identity matrixwith

The angles and , which are, respectively, the azimuth and elevation angles of the vector with respect to HRF, are obtained as follows (see Figure 2 for a geometrical representation of the angles):

##### 3.3. Target Body-Fixed Reference Frame

One of the goals of the filter is to evaluate the attitude of the body relatively to a known reference frame. Because the measurements are taken in CRF, which is a chaser-fixed reference frame, it would be worth expressing the attitude of the target with respect to the chaser, that is, relatively to CRF. As the target body reference frame is concerned, a meaningful choice has to be made.

The* Body Principal Reference Frame* (BPRF) is defined as the frame in which the inertia matrix is diagonal. It is centred in the target’s centre of mass; the axes are body-fixed and parallel to the target’s principal axes. The attitude dynamics equations assume a simple form when written in the BPRF, so it seems the most convenient frame to choose. However, it is not possible to evaluate the attitude of BPRF with respect to CRF, as there is no information about the direction of the target’s principal axes at any time.

In order to give an evaluation of the target’s attitude with respect to a known frame, it is therefore necessary to introduce a* Body Dummy Reference Frame* (BDRF) that is a reference frame whose origin and axes’ directions are arbitrarily chosen by us at the beginning of the estimation process. We fix the origin in the target’s centre of mass for convenience. The most clever choice is to assume BDRF being parallel to CRF at , that is, the attitude matrix being an identity matrix at

#### 4. Dynamics Model

##### 4.1. Relative Translation

The relative orbital dynamics is described by the second-order, inertial derivative of the relative position vector , whose coordinates are expressed in HRFwhere (i) denotes the angular velocity of HRF with respect to ECI, that is, the target’s orbital angular velocity;(ii) and are the target’s and chaser’s position vectors with respect to Earth’s centre in HRF, respectively.

For the sake of simplicity, since the relative distances involved are small compared to the orbital radius, we use the linearisation of (6), which are the well-known* Clohessy-Wiltshire* equations [23], largely adopted in space proximity operationswhere is the target’s mean motion.

##### 4.2. Relative Attitude

For the purpose of the present research, we want to investigate how the target rotates with respect to the chaser. This means we want to evaluate target’s angular velocity between BDRF and CRF, .

As said in Sections 2 and 3.2, we assume the chaser’s attitude varies in a known way; that is, the CRF-to-HRF angular velocity is known. It can be written as a function of Euler angles’ derivatives, which are computed as derivatives of (4a) and (4b) and therefore depend on the relative position and velocity vectors and

Because the target’s orbit is known, we can use its orbital parameters to compute the orbital angular velocity vector, which is the angular velocity between HRF and ECI

The attitude dynamics equations describe the rotation of BDRF with respect to ECI , but the angular velocity is expressed in BDRF so that the inertia matrix is constant over time. Besides the target’s free dynamics, the effect of the gravity gradient is taken into accountwhere and is not diagonal because BDRF is not a principal reference frame.

Finally, we can obtain the BDRF-to-CRF angular velocity vector by using the following:

We make use of the quaternions set to describe the attitude of the body frame with respect to the camera frame. The kinematics equations describing the quaternions’ variation over time is written as follows:where is a matrix depending on the BDRF-to-CRF angular velocity vector expressed in the body reference frame

#### 5. Features Identification

Each feature tracked by the extraction and matching algorithm corresponds to a 3D point of the target’s body. We therefore can link every matched 2D feature to a precise 3D point whose coordinates in a certain reference frame have to be estimated in order to accomplish the shape reconstruction problem. The 3D position of the -th point can be identified in BDRF by means of the vector from the target’s centre of mass to the point , which is constant over time as the body is considered rigid.

On the other hand, the same point can be identified in CRF. We define the vector which goes from the chaser’s centre of mass to the -th point, whose coordinates are expressed in CRF.

These two representations of the -th tracked point can be linked with each other by means of the chaser-to-target relative position vector , as can be seen in Figure 3or equivalently

The 2D coordinates of the -th feature on the image plane are related to their respective -th point’s 3D coordinates by the prospective camera model equations [24]where is the focal length of the camera.

#### 6. Filter’s Structure

The* Unscented Kalman Filter* (UKF) [25] is a recursive estimator built on the unscented transformation and based on the idea that the evolution of a (noisy) process could be known through the associated statistical distribution. The propagation of the statistical distribution is carried out by means of a set of purposely selected points (the so-called sigma points). In this way, the propagation of the state and of the statistical indicators is based on the integration of the dynamics instead of involving the Jacobian of the dynamics and of the measurements equations (as in Extended Kalman Filter), meaning that UKF is especially convenient for complex nonlinear dynamics.

For the purpose of the present study, the state vector has to include parameters regarding target’s shape, rotational state, and relative position and velocity. As a consequence, the state vector is defined aswhere is the number of points tracked; therefore, the state vector’s length is .

The dynamics equations are a system of differential equations

As explained in the previous sections, (18c)–(18g) represent the target’s attitude kinematics represented via the quaternions set, the target’s attitude dynamics, and the Clohessy-Wiltshire equations written in components, respectively.

Equation (18h) can be written because we express the attitude dynamics equation (18d) in the body-fixed reference frame BDRF, in which the inertia matrix, though not diagonal, is constant over time.

Equation (18b) represents the kinematics of the -th target’s body point in BDRF. Because we are considering a rigid target’s structure, the vectors are constant over time if expressed in a body-fixed reference frame.

Equation (18a) is a differential equation which describes the kinematics of the vector , expressed in CRF. It derives from differentiation in time of (14).

Since we want to analyze the variation of the vector in the noninertial CRF, we need to derive the quantities in the inertial ECI frame and consider the angular velocities between the frames in which we are derivingwhere the notation means the derivative is computed in the reference frame , while the vectors’ components can be expressed in any reference frame.

Differentiating (14) in time leads to

Substituting (19) into (20) and simplifying the notation as we get the following expression:where all the quantities must be expressed in the same reference frame. We will use CRF for convenience.

We get (18a) from (22) by recalling that the variations of the body vectors over time are null, as previously stated in (18b).

##### 6.1. Observables

The distance sensor and camera mounted on board of the chaser provide, respectively, a range measure and the pixels of the features tracked. The observables equations can be written as follows:The terms , , and represent the noises affecting the observables, which are considered Gaussian. The variances associated with the measurements are assumed to be mm, px, and px. Nevertheless, the measurements obtained by the camera are discretized due to the nature of the detector made up by a pixel map.

#### 7. A Test Case

In this step of the research we want to focus on target’s shape and relative state estimation process. We therefore do not want to pay particular attention at the feature detection and matching process. Thus, no feature extraction and matching algorithm is used here, meaning that no real image is being processed to extract features. It is indeed true that specific problems related to the space environment, such as eclipses, harsh illumination, and occlusions, are not considered in this phase of the research. Nevertheless, we assume that some specific points of the target’s body are always visible and perfectly tracked throughout the entire estimation process. This certainly is a simplification to be removed in future developments of the approach, but it allows for a verification of the filter’s performance in an ideal case.

The target has a basic shape, which can be seen in Figure 4.

We assume we are tracking 12 points of the target’s body (), whose positions can be seen in Figure 5. In this figure, the target is represented in its initial attitude.

##### 7.1. Filter’s Initialization

In order to generate the observables by means of (23), we need to know the true state of the system at each instant . Therefore, we need to integrate the system of differential equations (18a)–(18h) with some initial conditions.

For this test case, the target’s orbit is assumed to be circular and inclined: 7,000 km, , , , , and . Moreover, we decide that the chaser is on a closed and inclined relative orbit around the target. This example is also quite reasonable because it represents the case of a parking orbit, which frequently is an intermediate step before docking.

The initial attitude matrix from BDRF to CRF is an identity matrix, as said in Section 3.3, while the tracked points are chosen as shown in Figure 5. The inertia matrix is computed assuming that the BDRF is initially aligned with BPRF. This assumption is a simplification which makes the inertia matrix diagonal and does not affect the validity of the results because a simple, fixed rigid rotation would be needed to bring BDRF parallel to BPRF if they were not.

The state vector (17) is therefore initialized to its true initial value

In addition to the true value for the observables generation, we need to give an initial guess to the filter from which it can start the estimation process. We assume to have a certain error on the initial guess of the relative position and we set the initial relative velocity estimate to zero.

As we are arbitrarily setting BDRF to be parallel to CRF at time , we have a perfect initial guess on quaternions

We assume to have an uncertainty of deg/s on each component of the initial guess of the target’s angular velocityand an uncertainty of kg/m^{2} on each element of the inertia matrix

To give an initial estimate of the vectors , we make use of their pixels’ coordinates obtained by the measurements at time and go back from the 2D coordinates on the image plane to the 3D coordinates in CRF. However, we need to solve the perspective ambiguity, for which a 3D point corresponding to its 2D representation can be at any depth from the image plane. Therefore, along with the in-plane coordinates obtained through the camera, the depth of each point is needed. As an initial rough estimation, we assume that each point is situated at the same depth and set it to be the norm of the relative position guess (25).

Dropping the subscript for the sake of simplicity, we can thus write the initial guess for each vector as follows:

From (15) and (30a)–(30c) we can write the initial guess for the vector

Along with the initial guess, a certain standard deviation has to be associated with each state component. Because we are tracking 12 points, the covariance matrix is a diagonal matrix. The element is the square of the standard deviation associated with the -th component of the state vector

Because we assume to know the correct dynamics equations, the process noise matrix is set to be a null matrix: in case of significant perturbations, should be tuned accordingly. On the other hand, as the measurements obtained by the camera and the distance sensor are considered to have Gaussian noise, the measures noise matrix is a diagonal matrix, where the element is the square of the variance associated with that measure

#### 8. Results

The output of the simulation is plotted in Figures 6–10, where the errors between true and estimated values of each component of the state vector are plotted. However, only the first point is analyzed and plotted in Figure 6 for the sake of simplicity. The errors’ components at final time are listed in Table 1, while the norms of the errors at the beginning and at the end of the simulation are shown in Table 2. Note that the inertia matrix errors are written as a vector in the following form, which takes into account the fact that the inertia matrix is diagonal:The simulation time is set to one orbital period of the target s and the interval between subsequent steps is s.

**(a)**error

**(b)**error**(a) View 1**

**(b) View 2**

**(a)**error

**(b)**error

**(a)**error

**(b)**error**(c) 3D relative trajectory of the chaser with respect to the target**

We can see in Figure 6 that the errors of the point number 1 correctly converge. In fact, the simulation improves the estimate of the tracked points of an order of magnitude, as can be noticed in Table 2.

Along with the point number 1, the estimation process is successful for the remaining 11 tracked points, as can be seen in Figure 7, where the blue markers represent the tracked points in their true position (as defined in Figure 5), while the red markers are the points’ estimate at the end of the simulation.

The target’s relative attitude estimation errors are plotted in Figure 8(a), while the norms of the initial and final errors are, respectively, and . In this case, the filter cannot refine the estimate, as the initial attitude is known with accuracy because it has been defined as the difference between a frame whose orientation is known (CRF) and a frame artificially chosen (BDRF).

The errors on the target’s angular velocity with respect to ECI reference frame are plotted in Figure 8(b). In this case, the estimation process improves the estimate between the beginning and the end of the simulation of three orders of magnitude, as can be noted in Table 2.

By taking a look at Figure 9, we can see the results of the relative position and linear velocity estimation. The errors are plotted over time in Figures 9(a) and 9(b), while both the true and estimated 3D trajectories are shown in Figure 9(c), where it is possible to see the closed and inclined nature of the relative orbit.

In this case the estimation process is satisfactorily achieved in both position and linear velocity, as the improvements of the estimates between the beginning and the end of the simulation are, respectively, of two and three orders of magnitude, as can be seen in Table 2. Moreover, it can be noticed in Figure 9(b) how the convergence time is of few seconds for the linear relative velocity, while, on the other hand, the relative position converges to its final estimate approximately at half orbital period ( s).

Conversely to the results shown so far, the filter does not improve the estimate of the target’s inertia matrix. This can be clearly seen in Figure 10 and Table 2. While the off-diagonal elements’ error roughly remains at the same order of magnitude throughout the estimate process, the error of the diagonal elements increases of an order of magnitude. Moreover, the errors of , , and seem to converge to a value different from zero (see Figure 10).

The explanation to this result can be found inside (18d). The dependence of the angular acceleration on the inertial matrix is nearly negligible, as it cancels in the right part of (18d), regardless of the inertia’s order of magnitude. In fact, the two terms inside the parentheses, which, respectively, represent the free dynamics and the gravity gradient contribution, both depend on the target’s inertia, but at the same time both are multiplied by the inverse of the inertia matrix to compute the target’s angular acceleration. Moreover, the target’s inertia is not used in any observable equation (23). This means that the filter has got no information about how a variation on the inertia would affect any state vector’s component. In other words, a correct estimation of the inertia is not needed to provide a correct estimation of the other quantities, and, conversely, an accurate estimation of the relative pose cannot either provide or affect the inertia estimate.

#### 9. Conclusions

The problem of optical navigation used in a docking maneuver to a noncooperative satellite has been analyzed. In particular, a filter which estimates the relative state (position, velocity, and attitude) together with the shape of the target has been developed and tested in a specific case. It is important to remark that the proposed approach is valid for any attitude and linear motion of the observer spacecraft. Moreover, no previous knowledge of the target’s shape is required. The process of features’ extraction and matching between two consecutive frames has been considered as successfully carried out by a prefiltering stage and therefore it has not been implemented in the proposed filter.

The output of the simulation showed how the filter improves five out of seven of the quantities to be estimated. In fact, the relative position, velocity, and angular velocity final estimates have lower errors than the initial one, respectively, of two, three, and three orders of magnitude. Moreover, a 3D shape of target can be reconstructed by using the estimates of the 3D points tracked by the camera, which are evaluated with errors up to tens of centimetres, while starting the simulation with uncertainties of meters. The quaternion vector describing the relative attitude is estimated with errors of the order of . The target’s inertia is the only parameter whose final estimation is not an improvement of the initial guess, because neither the dynamics nor the observables equations are sensible enough to its variation. Nevertheless, estimating incorrectly the inertia does not affect the other quantities’ estimates.

By what has been seen from the simulation’s results, the filter can thus solve the problem of real-time pose and shape estimation satisfactorily, even if the considered case is simplified with respect to the real world. Important missing ingredients that will be the object of future studies are relevant to the detection and tracking of features in the case of a real 3D target, where the observables can be noncontinuous (due to occlusion or blinding) and also affected by false matches (due to the statistical nature of the matching process). Preliminary analyses show that the proposed approach should be robust enough with respect to these issues.

Notwithstanding these limitations, the degree of accuracy attained from the present approach can be considered a promising basis for a correct and safe docking maneuver.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.