Abstract

With the advance of mobile technologies, mobile devices such as unmanned aerial vehicles (UAVs) are becoming more important in video surveillance. By applying mobile person re-identification (re-id), mobile devices can monitor pedestrians in transportation systems across complex environments. Since the computing and storage resources of mobile devices are limited, traditional person re-id methods are not appropriate for the mobile setting. In addition, mobile person re-id requires real-time processing. In this paper, we propose a novel hashing method, online discrete anchor graph hashing (ODAGH), for mobile person re-id. ODAGH integrates the advantages of online learning and hashing technology. In ODAGH, we propose an online discrete optimization algorithm to improve the efficiency of anchor graph learning in the online scenario. Experimental results demonstrate the superiority of ODAGH in terms of both effectiveness and efficiency.

1. Introduction

Person re-identification (re-id) technology makes it possible to find the same identity across different, non-overlapping cameras. Person re-id is widely used in video surveillance and is a key technology in pedestrian traffic monitoring [1]. Detecting and tracking a person across cameras is important in traffic monitoring systems [2]. Person re-id technology can also intelligently and efficiently identify and track pedestrians in streets, airports, and other transportation systems.

Person re-id is essentially an image retrieval problem. Given a probe image (query), the purpose of person re-id is to search a traffic database of established personnel characteristics for images that contain the same person [3]. Traditional person re-id technology is used in scenes where cameras cannot move, such as fixed camera networks in public areas, including urban transport systems.

In recent years, with the advance of mobile technologies, mobile systems such as unmanned aerial vehicles (UAVs) are widely used for video surveillance and traffic monitoring [2]. For example, UAVs with cameras can play an important role in tracking people in transportation systems across complex environments such as rural areas, mountains, and the sea, where fixed cameras are lacking or cannot reach, and they can carry out dangerous and tedious visual information gathering tasks with great effectiveness. Therefore, mobile person re-id is required. Unlike the fixed-camera setting of traditional person re-id, mobile systems have limited computing and storage resources, and in real-world applications, communication bandwidth is also limited. Since image processing methods usually have high time and space complexity, traditional person re-id methods are not suitable for mobile systems.

The hashing technique is an efficient image retrieval approach for fast person re-id. It converts high-dimensional data into short binary codes while preserving data similarity. With bitwise operations such as XOR and bit counting, similarity search becomes very fast. Moreover, hashing methods based on machine learning have been shown to be superior to hashing methods based on random projection [4–6]. However, most existing hashing methods adopt a batch learning strategy and learn hash functions offline from fixed training data. When new person images are collected and added to the database, they still use the pre-trained model to obtain new hashing codes. If the distribution of the new images differs from the training data, the performance of the offline hashing model degrades significantly. Offline hashing methods must accumulate the whole database to retrain the hash functions and recompute all hashing codes. This is obviously very inefficient, especially when the database is frequently updated with newly collected images.
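To illustrate why binary codes support such fast search, here is a minimal sketch (our own illustration, not from the paper) of Hamming distance computed with XOR and bit counting on codes packed into Python integers:

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binary hash codes packed into ints:
    XOR marks the differing bits, bit counting tallies them."""
    return bin(a ^ b).count("1")

# Two 8-bit hash codes that differ in exactly two bit positions.
code_query = 0b10110100
code_db = 0b10100110
print(hamming_distance(code_query, code_db))  # 2
```

In practice, the XOR and population count run as single CPU instructions per machine word, which is what makes exhaustive scans over millions of short codes feasible even on resource-limited devices.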

Existing hashing methods are therefore not appropriate for mobile systems, which require fast real-time methods for tasks such as person re-id. To cope with the real-time demands of mobile systems, a mobile person re-id method should have two important characteristics: (1) it should support fast image retrieval and (2) it should learn hashing functions from changing training data in an online manner. As an emerging technology, online hashing [7, 8] can be applied to handle online retrieval over a changing database.

In this paper, online discrete anchor graph hashing (ODAGH) is proposed for mobile person re-id. ODAGH utilizes the advantages of graph learning to effectively preserve the visual correlation of person images. However, traditional graph learning still requires substantial storage and computing resources. In ODAGH, we therefore propose online anchor graph learning, which first uses an anchor graph to reduce the space cost of graph construction and then uses an online learning algorithm to optimize the graph model effectively and efficiently. The main contributions of this paper are summarized as follows:
(i) ODAGH integrates the advantages of online learning and hashing, and as a hashing method, it can be easily applied to mobile systems with limited computing and storage resources. By leveraging an online discrete algorithm, ODAGH can efficiently update hashing functions and learn discrete hashing codes when new images are collected, fulfilling the real-time demand of mobile systems.
(ii) ODAGH is an unsupervised hashing method, so it requires no training labels, which are hard to obtain. By improving graph learning, visual correlation can be effectively preserved in the hashing codes. Moreover, by using a GPI-based online optimization, quantization loss is largely avoided in the hashing process.
(iii) ODAGH relies on only one parameter, which guarantees its robustness in real applications. Experimental results demonstrate the effectiveness and efficiency of ODAGH compared with other person re-id methods.

The rest of this paper is organized as follows. The related work is reviewed in Section 2. We present the details of our proposed method in Section 3. The experimental configuration and results are introduced in detail in Section 4. Finally, the conclusion of the paper is given in Section 5.

2. Related Work

2.1. Image Hashing

In recent years, image hashing has gained much attention from researchers in image retrieval. Generally, there are two kinds of hashing: supervised hashing [9–11] and unsupervised hashing [6, 12]. By learning from class labels or leveraging other supervised information, binary hash codes can be learned in a supervised manner. Representative supervised hashing methods include column sampling-based discrete supervised hashing (COSDISH) [9], supervised discrete hashing (SDH) [10], deep supervised discrete hashing (DSDH) [13], and fast scalable supervised hashing (FSSH) [14]. However, because annotating supervised labels is extremely expensive, supervised hashing methods have a limited application range in large-scale image retrieval.

Unsupervised hashing methods have the obvious advantage that they can learn hash functions without any label information. Therefore, the original geometric structure, including visual information, can be retained in the hash codes. Representative unsupervised hashing methods include spectral hashing (SH) [4], iterative quantization (ITQ) [6], angular reconstructive embeddings (AREs) [15], unsupervised hashing with binary deep neural network (UH-BDNN) [16], and similarity-adaptive deep hashing (SADH) [17]. Nevertheless, most unsupervised and supervised hashing methods learn hash functions in an offline manner and cannot cope with the scenario where new data are continuously added to the database.

2.2. Pedestrian Detection and Tracking
2.2.1. Person Re-Identification

Traditional research on person re-id mainly covers visual feature representation [18, 19] and distance metric learning [20, 21]. In [22], feature effectiveness was identified in a query-adaptive manner for feature fusion. The method proposed in [23] learns discriminative and robust representations via dictionary learning. In [24], Fisher vectors were used for person re-id. A part loss network was proposed in [25] for deep representation learning. There are also many studies on distance metric learning. In [26], person re-identification was formulated as a ranking problem, and Ensemble RankSVM was developed to solve it. KISSME [27] considers scalability and learns a distance metric from equivalence constraints. Mahalanobis metric learning was also used in several methods [28, 29]. Feature representation learning can be regarded as the pre-processing step of person re-id, and it can be combined with metric learning or with our proposed hashing method.

Recently, several hashing-based person re-id methods [30–33] were proposed to improve search efficiency. Fast indexing between raw image data and binary hashing codes is essential; to realize end-to-end fast indexing, deep regularized similarity comparison hashing (DRSCH) [34] adopts a convolutional neural network (CNN). Part-based deep hashing (PDH) [35] integrates spatial information by mapping horizontal parts to binary codes for feature discrimination. A coarse-to-fine (CtF) method [36] complementarily learns short and long codes in a pyramid structure from a CNN. Most existing hashing-based person re-id methods use deep networks for feature representation learning and ignore the preservation of visual correlation in the hashing codes. Moreover, deep learning methods [37, 38] consume considerable computing and storage resources, so they are not appropriate for mobile person re-id, which is typically performed on mobile systems with limited resources.

3. Proposed Method

In this section, we introduce online discrete anchor graph hashing (ODAGH) in detail. The overall framework of ODAGH is shown in Figure 1. When a mobile device (e.g., a UAV) collects new images of a person, visual features are extracted for online discrete anchor graph learning. ODAGH can efficiently update the hashing functions using only the new features and training variables of small size. The hashing codes of the new images are then added to the database. In the query step, users can use a query image of a specific person to search the database for images of this person.

3.1. Problem Setting

Suppose the database consists of streaming images. When new images come in, we update the hash functions. We define X ∈ R^{n×d} as the image matrix, where n is the number of all training images in the database and d is the dimension of the image feature. In the online learning process, the image matrix X can be represented as X = [X_o; X_n], where X_o denotes the old images in the database and X_n denotes the new images. Our goal is to learn hashing functions and hashing codes H ∈ {−1, 1}^{n×r} for all images, where r is the code length. In order to guarantee efficiency, we directly use linear projection to reduce time cost. Thus, the hashing function is defined as

H = sgn(XW), (1)

where W ∈ R^{d×r} is the weight matrix and sgn(·) is the element-wise sign function. The main notations used in this paper are summarized in Table 1.
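As a rough sketch of such a linear hash function (our own illustration of the standard sign-of-projection form; the tie-breaking of zeros to +1 is our choice):

```python
import numpy as np

def hash_codes(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map d-dimensional features X to r-bit codes via the sign of XW."""
    H = np.sign(X @ W)
    H[H == 0] = 1  # break ties so every bit lies in {-1, +1}
    return H

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))  # 5 images, 16-dim features
W = rng.standard_normal((16, 8))  # learned projection to 8-bit codes
H = hash_codes(X, W)
print(H.shape)  # (5, 8)
```

Because hashing is a single matrix multiplication followed by a sign, encoding new images is cheap enough for resource-limited mobile devices.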

3.2. Basic Formulation

We use graph learning [39] to preserve visual information in the hashing codes. However, traditional graph learning consumes considerable computing and storage resources, and it is hard to design an online algorithm for it. Therefore, we use anchor graph learning, which constructs an anchor graph to approximate the full graph.

To construct the anchor graph, m anchors {u_j} are first randomly selected from the training data to approximate the data neighbourhood structure. The truncated similarity matrix Z ∈ R^{n×m} is then obtained as

Z_ij = exp(−D(x_i, u_j)/σ) / Σ_{j′∈⟨i⟩} exp(−D(x_i, u_{j′})/σ) if j ∈ ⟨i⟩, and Z_ij = 0 otherwise, (2)

where D(x_i, u_j) is the distance between image x_i and anchor u_j (in this paper, we use the L2 distance), σ is the mean of all distances, and ⟨i⟩ denotes the index set of the nearest anchors of x_i. In addition, we can approximate the graph matrix by ZΛ^{−1}Z^T, where Λ = diag(Z^T 1).
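The construction above can be sketched as follows (a simplified illustration under our assumptions: the s nearest anchors are kept per image and rows are normalized to sum to one; function and variable names are ours):

```python
import numpy as np

def truncated_similarity(X, anchors, s=3):
    """Truncated similarity matrix Z between n images and m anchors:
    keep only the s nearest anchors per image, then row-normalize."""
    # Pairwise L2 distances between images and anchors: shape (n, m).
    D = np.linalg.norm(X[:, None, :] - anchors[None, :, :], axis=2)
    sigma = D.mean()  # bandwidth: the mean of all distances
    S = np.exp(-D / sigma)
    # Truncation: zero out all but the s nearest anchors of each image.
    farthest = np.argsort(D, axis=1)[:, s:]
    np.put_along_axis(S, farthest, 0.0, axis=1)
    return S / S.sum(axis=1, keepdims=True)  # row-stochastic Z

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 4))
anchors = X[rng.choice(10, size=4, replace=False)]  # random anchors
Z = truncated_similarity(X, anchors, s=2)
print(Z.shape)                          # (10, 4)
print(np.allclose(Z.sum(axis=1), 1.0))  # True
```

The approximated n×n graph matrix never has to be formed explicitly; only the small n×m matrix Z is stored, which is the source of the space saving.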

Based on the construction of the anchor graph, we can formulate anchor graph learning for hashing as

min_H tr(H^T L H), s.t. H ∈ {−1, 1}^{n×r}, (3)

The above formulation is similar to traditional graph learning. The biggest difference is that the Laplacian matrix is computed as L = I − ZΛ^{−1}Z^T, where ZΛ^{−1}Z^T is the approximated graph matrix and Λ = diag(Z^T 1); tr(·) denotes the trace operator.

The discrete constraint on the hashing matrix H in equation (3) cannot be solved directly. Therefore, we relax the hard discrete constraint by introducing a continuous matrix B to replace H in equation (3). We also add the term ‖H − B‖_F^2 to keep H close to B. We then obtain the overall basic formulation

min_{H,B} tr(B^T L B) + λ‖H − B‖_F^2, s.t. B^T B = I, H ∈ {−1, 1}^{n×r}, (4)

where λ is the parameter for the second term. The constraint B^T B = I is used to avoid the trivial solution of B.

The objective function (4) can be solved by an alternating iteration method: at each step, we optimize one variable while fixing the others. When we fix H and optimize B, the objective function becomes

min_B tr(B^T L B) − 2λ tr(B^T H), s.t. B^T B = I, (5)

where the constant terms tr(H^T H) and tr(B^T B) = r have been dropped.

It is obvious that equation (5) is a quadratic problem on the Stiefel manifold [40], and the Lagrangian function of this objective is

Φ(B, Θ) = tr(B^T L B) − 2λ tr(B^T H) − tr(Θ(B^T B − I)), (6)

where Θ is the Lagrange multiplier.

By setting the derivative of the Lagrangian with respect to B to 0, we have

LB − λH = BΘ. (7)

We can use generalized power iteration (GPI) [40] to solve this problem for B; the detailed procedure is shown in Algorithm 1.

(i) Input: Laplacian matrix L, hashing matrix H, parameter λ
(ii) Output: continuous matrix B
(1) Initialize an orthogonal matrix B;
(2) for iter < max_iter do
(3) Update according to equation (10);
(4) Compute the GPI update term;
(5) Perform the compact SVD of the update term;
(6) Compute B from the SVD factors;
(7) end for
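A compact sketch of GPI for the subproblem min tr(B^T L B) − 2λ tr(B^T H) s.t. B^T B = I, following the standard GPI recipe from the cited literature (the shift α and all variable names are our own choices, not necessarily the paper's exact formulation):

```python
import numpy as np

def gpi(L, H, lam, max_iter=50, seed=0):
    """Generalized power iteration for
       min_{B^T B = I} tr(B^T L B) - 2*lam*tr(B^T H).
    Uses M = alpha*I - L (alpha >= largest eigenvalue of L, so M is
    positive semi-definite), then repeats: K = 2*M@B + 2*lam*H,
    compact SVD K = U S V^T, B = U V^T."""
    n, r = H.shape
    alpha = np.linalg.eigvalsh(L).max()
    M = alpha * np.eye(n) - L
    # Initialize B as a random column-orthogonal matrix via reduced QR.
    B, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, r)))
    for _ in range(max_iter):
        K = 2 * M @ B + 2 * lam * H
        U, _, Vt = np.linalg.svd(K, full_matrices=False)
        B = U @ Vt
    return B

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
L = A @ A.T                               # symmetric PSD stand-in for the Laplacian
H = np.sign(rng.standard_normal((6, 3)))  # target codes
B = gpi(L, H, lam=1.0)
print(np.allclose(B.T @ B, np.eye(3)))    # True: B stays orthogonal
```

The SVD-based projection guarantees that B remains exactly column-orthogonal at every iteration, which is what keeps the relaxed problem on the Stiefel manifold.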

Then, we fix B and optimize H, and (4) becomes

min_H ‖H − B‖_F^2, s.t. H ∈ {−1, 1}^{n×r},

whose solution can be easily obtained by H = sgn(B).

3.3. Online Algorithm

In this section, we consider the online optimization of (4). In the online learning process, the image matrix X can be represented as X = [X_o; X_n]. Similarly, the hashing matrix H can be represented as H = [H_o; H_n], where H_o denotes the hashing codes of the old images and H_n denotes the hashing codes of the new images.

Then, we consider the online improvement of GPI. The matrix products required by the GPI update decompose additively into a part related only to the old images and a part related only to the new images. The old-image part can be preserved from previous learning, so in the online process, we only need to compute the new-image part, whose time complexity is linear in the number of new images and irrelevant to the size of the database.

The truncated similarity matrix Z can also be represented as Z = [Z_o; Z_n]. Therefore, the corresponding term D also decomposes into an old-image part and a new-image part. Similar to (8), we can find that the time complexity of computing D is also linear in the size of the new images and irrelevant to the size of the whole database.

Based on (11) and (12), we can obtain the online update in (13), whose time complexity is linear in the number of new images and irrelevant to the size of the database.

Similarly, the remaining term of the GPI update can be computed in an online manner as in (14), and its time complexity is also linear in the size of the new data.
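The online bookkeeping described above amounts to maintaining Gram-type accumulators that grow additively with new data. A toy sketch (our own, with hypothetical names) that updates C = Z^T Z incrementally, so that each round costs time linear in the new rows only:

```python
import numpy as np

class OnlineGram:
    """Maintain C = Z^T Z over streaming rows:
    C_new = C_old + Z_n^T Z_n, so each update is linear in the
    number of new rows and independent of the database size."""
    def __init__(self, m):
        self.C = np.zeros((m, m))

    def update(self, Z_new):
        self.C += Z_new.T @ Z_new
        return self.C

rng = np.random.default_rng(3)
Z_all = rng.random((100, 5))
acc = OnlineGram(5)
for start in range(0, 100, 20):   # five rounds of 20 "new images" each
    acc.update(Z_all[start:start + 20])
print(np.allclose(acc.C, Z_all.T @ Z_all))  # True: matches the batch result
```

Because only the small m×m accumulator is stored between rounds, the old images themselves never need to be revisited.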

Based on the above decomposition for online updating, we propose an online optimization algorithm based on GPI. The detailed procedure is summarized in Algorithm 2.

(i) Input: new images, variables preserved from previous rounds, parameter λ
(ii) Output: hashing codes of the new images, updated variables
(1) Compute the truncated similarity matrix of the new images;
(2) Compute the online term according to (11);
(3) Compute the online term according to (12);
(4) Compute the online update according to (13);
(5) Initialize an orthogonal matrix;
(6) for iter < max_iter do
(7) Compute the update terms;
(8) Update according to (14);
(9) Compute the GPI update term;
(10) Perform the compact SVD;
(11) Compute the new orthogonal matrix from the SVD factors;
(12) end for
(13) Update the preserved variables;
(14) Add the new hashing codes to the database.

In Algorithm 2, according to the above analysis, the time complexities of steps 1–5 and steps 7–8 are all linear in n_t, the number of new images. The time complexities of steps 9 and 11 depend only on the feature dimension d and the code length r, and the compact SVD in step 10 likewise depends only on d and r. Since d and r are irrelevant to the new data size n_t, these terms can be ignored in the overall time complexity. As a result, the overall time complexity of Algorithm 2 is linear in n_t. The time complexity of every step in Algorithm 2 is irrelevant to the size of the database; thus, even when the database is very large, our algorithm remains efficient.

3.4. Overall Process

Suppose the initial database contains a set of images, and it grows as new images come in. The overall process of our ODAGH is described in Algorithm 3.

(i) Input: initial images, streaming new images
(ii) Output: hashing codes
(1) Using the initial images as training data, compute the initial model according to Algorithm 1;
(2) While new images come in
(3) Update the model by Algorithm 2;
(4) End While

Suppose the database currently contains n images. We can easily find that the time complexity of Algorithm 1 is linear in the initial data size, and the time complexity of Algorithm 2 is linear in the size of each batch of new images. Therefore, the time cost of the overall iterative online process is linear in the total number of images, which is equivalent to the time cost of a single round of offline hashing training.

4. Experiments

4.1. Datasets

We use the Market-1501 dataset [41] to evaluate the performance of our method. Market-1501 contains 32,668 bounding boxes of 1,501 identities. It is one of the largest person re-id datasets, with 14.8 cross-camera ground truths per query on average, and it is closer to realistic situations than previous datasets, so it can better show the effectiveness of online methods. Since we mainly focus on the efficiency of person image retrieval rather than feature learning, we directly use IDE_ResNet_50 features.

4.2. Experimental Settings

We compare our method with two non-hashing methods, Euclidean [41] and KISSME [27], and two supervised offline hashing methods, COSDISH [9] and SDH [10]. The database contains 1,000 images at the beginning, and at each round, 1,000 new images are added; finally, the database contains all 32,668 images (668 images are added in the last round). Since COSDISH and SDH are offline methods and retraining them in every round would obviously be time-consuming, to be consistent with real applications, we train their hash functions only in the first round. Deep learning-based hashing methods [34–36] usually require extensive computing and storage resources, which is not appropriate for mobile embedded systems, so we do not include them in the comparison.

Our method relies on only one parameter λ. We select a proper value of λ from a candidate set and finally choose the best one. The hashing code length for all hashing methods is set to 512.

The non-interpolated mean average precision (MAP) score [42] is utilized to evaluate the performance of all compared methods. Given a query, the average precision (AP) is defined as

AP = (1 / N_g) Σ_i P(i) δ(i),

where N_g is the number of relevant images, P(i) is the precision of the top i retrieved images, and δ(i) = 1 if the i-th image is relevant to the query and δ(i) = 0 otherwise. The MAP score is the mean of the AP scores over all queries. Besides MAP, precision-recall (PR) curves are also used to measure the performance of all methods.
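The AP definition above can be sketched directly (our own helper function; MAP is simply the mean of this value over all queries):

```python
import numpy as np

def average_precision(relevant_flags):
    """Non-interpolated AP for one query.
    relevant_flags[i] is 1 if the i-th retrieved image matches the
    query identity, else 0 (best-first ranking order)."""
    flags = np.asarray(relevant_flags, dtype=float)
    n_rel = flags.sum()
    if n_rel == 0:
        return 0.0
    # Precision of the top-i retrieved images, for every rank i.
    precision_at_i = np.cumsum(flags) / (np.arange(len(flags)) + 1)
    return float((precision_at_i * flags).sum() / n_rel)

# Relevant images retrieved at ranks 1 and 3:
# AP = (1/2) * (1/1 + 2/3) = 0.8333...
print(round(average_precision([1, 0, 1, 0]), 4))  # 0.8333
```

Note that AP rewards placing relevant images early in the ranking: moving the second relevant image from rank 3 to rank 2 would raise the AP from 5/6 to 1.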

4.3. Experimental Results

Table 2 shows the MAP scores of all compared methods on the Market-1501 dataset. The results of Euclidean and KISSME are reported in [41], and we carefully tuned the parameters of COSDISH and SDH to report their best results. From Table 2, we find that ODAGH performs best. Although ODAGH is an unsupervised method, it effectively preserves the visual correlation of images in the hashing codes. The two offline hashing methods perform much worse than the other methods, even though they are supervised. The main reason is that they cannot support online learning from new images and use only the 1,000 initial images for training. The MAP score of the non-hashing method KISSME is very close to that of ODAGH. The main reason is that hashing introduces quantization loss, so it is reasonable for hashing methods to perform somewhat worse than non-hashing methods.

Figure 2 shows the PR curves of all compared methods, from which we can further observe their retrieval performance. The results are consistent with the MAP scores: ODAGH and the two non-hashing methods significantly outperform the other two hashing methods, and the yellow curve of KISSME is very close to the blue curve of ODAGH.

Both the MAP scores and the PR curves demonstrate the superiority of our ODAGH: it clearly outperforms the offline hashing methods and can even outperform classical non-hashing methods. These results show that the proposed online discrete anchor graph learning process is both effective and broadly applicable.

Finally, we evaluate the total training time of ODAGH. Table 3 compares the training time; the experiment in Table 3 is conducted on a PC with a Core i5 2.11 GHz CPU and 16 GB memory. We use SDH as the representative offline hashing method. When new images come in, offline hashing retrains on the whole database, while ODAGH only needs to process the new data. As a result, in our experiment, ODAGH processes 32,668 images in total, whereas SDH in fact processes 560,668 images. Therefore, the training time of ODAGH is significantly less than that of SDH. If new data continue to come in and the database keeps growing, the advantage of our method in total training time will become even more significant.

4.4. Influence of Code Length

Traditional graph-based hashing methods cannot directly handle the discrete constraint on hashing codes, which introduces considerable quantization loss. As the code length increases, the quantization loss also increases, so their performance cannot be improved and may even deteriorate with longer codes [43].

Figure 3 shows the performance of ODAGH with increasing code length; we observe that performance improves consistently as the code length grows. The main reason is that ODAGH uses the GPI-based online discrete optimization procedure, which effectively avoids quantization loss. We also find that when the code length exceeds 512, the increase in MAP score is not significant, which indicates that 512 bits are sufficient for this person re-id task.

4.5. Parametric Analysis

In this section, we analyse the influence of the parameter λ on our method. Figure 4 shows the MAP scores for different values of λ, where λ varies over a wide range starting from 1. We observe that the performance of ODAGH is stable under the variation of λ; thus, our method is not sensitive to this parameter and is robust in person re-id. Moreover, the MAP score reaches a relatively high value when λ has the same order of magnitude as the number of images, which indicates that we can easily set the parameter according to the database size.

5. Conclusions

In this paper, we propose online discrete anchor graph hashing (ODAGH) for mobile person re-id. We first introduce the basic formulation of discrete anchor graph learning, which effectively preserves the visual correlation of images and avoids quantization loss. We then derive a novel online optimization algorithm to update hashing functions and compute discrete hashing codes in an online manner; its time complexity is linear in the size of the new images and irrelevant to the database size. Experimental results on the real-world Market-1501 dataset demonstrate both the effectiveness and the efficiency of ODAGH compared with several state-of-the-art non-hashing and offline hashing person re-id methods.

In future work, we will continue to improve ODAGH to make it more suitable for real-world applications of mobile person re-id. For example, several UAVs usually work together, and distributed hashing [44] is required for person re-id with a UAV swarm.

Data Availability

The Market-1501 data used in this study are from previously reported study which has been cited, and they are publicly available on https://github.com/zhunzhong07/IDE-baseline-Market-1501. The experimental result data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61702388), Equipment Pre-Research Fund (JZX7Y20190253036101), Equipment Pre-Research Ministry of Education Joint Fund (6141A02033703), and Hubei Natural Science Foundation (2019CFC897).