Abstract

An intelligent urban system relies on different types of electronic and/or sensor technologies to collect data to facilitate education, security, healthcare, etc. Person reidentification (re-ID) plays a crucial role in intelligent security, in which significant progress has been made during the past few years. One example is using re-ID systems for law enforcement tasks such as suspect identification. A common obstacle to quickly deploying a re-ID system in a new city is data label deficiency: the new city lacks enough labeled data to train an excellent model and can rely only on a small number of suspects' pictures provided by witnesses. Fortunately, this can be modeled as a special real-world application of unsupervised person re-ID, the study of which has become more prevalent in the re-ID community in recent years. In this paper, we first formulate our scenario as a cross-domain few-shot problem and discuss the differences from conventional supervised re-ID and unsupervised re-ID. Then, we introduce a reweighting instance method based on the influence function (ReWIF) to guide the training procedure of the re-ID model. This method is motivated by the influence function, and we use a two-step optimization to avoid the computation of Hessian matrices. We evaluate our proposed method on public datasets, including Market, Duke, and CUHK. Extensive experimental results show that our method can effectively address the domain bias between different datasets and the absence of labeled data on the target dataset, achieving state-of-the-art performance.

1. Introduction

Intelligent security is a fundamental function and need in a smart, intelligent city. Person reidentification (re-ID) plays an important role in intelligent city security, especially in law enforcement. With the help of a re-ID system, the process of suspect identification can be improved significantly, and a re-ID-based video monitoring auxiliary system can greatly improve the efficiency of law enforcement tasks. However, this auxiliary system must be customized for different cities due to different illumination conditions, different viewpoints, camera occlusions, etc. These factors make the re-ID system expensive, since we need substantial pairwise labeled data across every pair of camera views. Furthermore, large-scale applications are limited because the vast majority of the data samples are unlabeled due to the prohibitive manual effort of exhaustively labeling pairwise re-ID data. To address the scalability problem, some recent works focus on unsupervised re-ID by clustering the unlabeled target data [1] or transferring knowledge from a labeled source dataset [2, 3]. However, the performance is still not satisfactory compared to the supervised setting. Should the scalability problem of re-ID systems be formulated as a purely unsupervised problem?

Assume that city A has a person re-ID system trained with labeled data sampled from city A, while a crime occurs in city B and the police obtain only a few image captures of the suspect. City B's police hope to use a person re-ID system to help them find this suspect. However, city B does not have a person re-ID system because of the lack of labeled data or infrastructure. So, they attempt to build upon the re-ID system of city A. However, city A and city B have different illumination conditions and camera systems, so the re-ID system borrowed from city A may not match the requirements of city B. We therefore expect to fine-tune the person re-ID system relying only on a few pictures of the suspect provided by witnesses, which is a cross-domain few-shot person re-ID problem. Such a scenario is a common deep learning setting and is easier than the unsupervised re-ID setting, since some label information from the target dataset is accessible.

Fortunately, this cross-domain few-shot person re-ID problem can be regarded as a special case of the unsupervised person re-ID problem, and many unsupervised person re-ID methods [49] can be adapted to our problem easily. However, their performance is not good and sometimes even worse, since there is no mechanism to utilize the labeled few-shot data on the target dataset. Two other works [10, 11] proposed small-sample training methods, which can be applied to our problem directly; unfortunately, they do not utilize the source dataset. In this paper, we first give a formal definition of the cross-domain few-shot person re-ID problem, which is a new task in the person re-ID community, and then discuss its differences from other person re-ID settings. Motivated by [12], we propose a reweighting instance method based on the influence function (ReWIF), in which a statistical model is employed as the influence function to measure how a sample from the source domain influences the performance on the target domain. Moreover, ReWIF can be used as a module of most unsupervised person re-ID methods to train the pretrained model on the labeled source dataset.

In this paper, we study the unsupervised person re-ID problem and relax it to a concrete and common scenario. Then, a formal definition is given to define the scenario. After that, we propose an influence function based method to solve the problem.

The contributions of this work are threefold:
(i) The lack of generalizability of re-ID algorithms is the main obstacle to the scalable deployment of re-ID systems. The person re-ID community usually formulates this scalability problem as a purely unsupervised problem. In this paper, we identify a new, common application scenario for scalable re-ID system deployment. We formally define it as a cross-domain few-shot learning problem, which is a new task in the person re-ID community, and then discuss its differences from other person re-ID settings (see Table 1).
(ii) The new scenario offers a few labeled data on the target dataset. To make full use of this benefit, we propose a method called ReWIF, which is based on an influence function with statistical analysis. It is a reweighting method applied to the loss function during training. Besides that, it can also be used as a module of most unsupervised person re-ID methods to train the pretrained model on the labeled source dataset.
(iii) We simulate the scenario and evaluate our proposed method on the public Market and DukeMTMC datasets. Extensive experimental results show that the proposed method achieves state-of-the-art results.

2. Problem Formulation

With the advancement of urban infrastructure, more and more security cameras are adopted in modern cities [13]. However, deploying a re-ID system in a new city is expensive in terms of both time and financial costs, since the re-ID system usually requires customization, and not all cities have sufficient funds to cover the high cost. Thus, how to deploy a re-ID system in a new city quickly and cheaply is an urgent and meaningful task. This work considers a scenario in which we can deploy a re-ID system in a new city without large amounts of local labeled data, as mentioned before. In this paper, we use the following definition to formulate the few-shot person re-ID problem.

Definition 1. Given a target testing (gallery) dataset G = {x_i}_{i=1..N}, it consists of a huge number of person images captured by city B's camera system. A support dataset S = {(x_j, y_j)}_{j=1..M} consists of M suspects' images from the target domain, where M << N and M is not a fixed constant (because we cannot guarantee that each suspect appears in the surveillance footage the same number of times). In addition, we have auxiliary source training datasets D = {D_1, ..., D_T} borrowed from city A, where T is the number of auxiliary datasets. In our scenario, each tested person image x_i is produced by a pedestrian detector (e.g., Faster R-CNN) from traffic cameras in city B. Each suspect's image x_j is provided by a witness in city B and has a related identity y_j. The set D is provided by city A and has its own label space, which is disjoint from those of S and G. Meanwhile, the dataset D from city A has a data bias with respect to S and G because of different illumination conditions and camera systems. So, we need to search for the suspects in the testing dataset G with the help of both D and S. In other words, D is the source domain composed of multiple labeled training sets, and S and G are the probe set and gallery set (of the test set), respectively, from the target domain. In our scenario, models can be trained on D and S and then tested on S and G.
We summarize the most common settings for the person re-ID problem and ours in Table 1. Based on the domain gap between the training set and the testing set, the person re-ID problem can be classified as the domain-adaptive re-ID problem (the last two rows in Table 1) or the general re-ID problem (the first four rows in Table 1). The general re-ID problem is where the training set and testing set come from the same domain. Under this assumption, the general re-ID problem can be further classified into supervised, few-shot, unsupervised, and semisupervised person re-ID, based on the label information of the training set and whether the probe set is trainable. The last row in Table 1 is our setting, which is similar to the unsupervised domain adaptation person re-ID problem but with a trainable probe set.

3. Related Work

In this section, we review the most popular works on supervised person re-ID, few-shot person re-ID, and semisupervised person re-ID. Finally, we merge unsupervised person re-ID and unsupervised domain adaptation person re-ID into one subsection, since most related works handle them together.

3.1. Few-Shot Person Re-ID

In the setting of supervised person re-ID, the identities of the training set and the testing set do not overlap. Two works [14, 15] propose to merge the probe set of the testing set into the training set and regard the task as a few-shot person re-ID problem. Although this modification perfectly meets the definition of few-shot learning [16], these two works have not drawn much attention from the person re-ID community. The reason is that supervised person re-ID already achieves excellent performance, and people in this community are more interested in the unsupervised and domain adaptation problems. Hence, the few-shot person re-ID problem belongs to the supervised person re-ID problem, which involves only a target domain. Our scenario is not a few-shot person re-ID problem in this sense.

3.2. Semisupervised Person Re-ID

Different from supervised person re-ID, semisupervised person reidentification [17] assumes that a few data from the training set are labeled and the rest of the training set is unlabeled. In [17], the authors proposed a novel semisupervised region metric learning method, which estimates positive neighbors to generate positive regions and learns a discriminative region-to-point metric. There are also some works that focus on the few-example video-based re-ID task. Ye et al. [18] proposed a dynamic graph matching (DGM) method, which iteratively updates the image graph matching and the label estimation to learn a better feature. Liu et al. [19] updated the classifier with K-reciprocal nearest neighbors (KNN) in the gallery set and refined the nearest neighbors by applying negative sample mining with KNN in the query set. After that, Wu et al. [10] proposed a progressive learning framework to exploit unlabeled data for semisupervised person re-ID. Wu et al. [11] proposed a multiteacher adaptive similarity distillation framework to train a user-specified lightweight student model.

3.3. Unsupervised Domain Adaptation Person Re-ID

Another intuitive thought is to utilize auxiliary datasets, because a large number of person re-ID datasets can be obtained easily. In general, these auxiliary datasets are called source datasets, and they have a different data distribution from the target dataset. Although advanced supervised learning methods [20–22] have achieved great performance in person re-ID, the performance drops significantly when testing on a new dataset. To address the cross-domain person re-ID problem, many methods [1, 3, 23–31] have been proposed. The model in [1] learns a deep re-ID model as a feature extractor on the source dataset and then uses unlabeled target data to refine the model in an unsupervised manner. Wei et al. [2] proposed a person transfer generative adversarial network (PTGAN) to reduce the gap between the two domains, relieving the cost of annotations in the target dataset. The model proposed by Wang et al. [3] learns a transferable joint attribute-identity deep learning (TJ-AIDL) model by transferring the labeled information of the source dataset to a new unseen (unlabeled) target dataset. Wu et al. [23] proposed an instance-guided context rendering scheme to supervise model learning in the unlabeled target domain. The model proposed by Fu et al. [32] harnesses similar characteristics existing in the samples from the target domain to conduct person re-ID in an unsupervised manner. Huang et al. [25] concentrated on the difference of backgrounds between training and testing datasets and proposed SBSGAN to generate images with suppressed backgrounds. Wang et al. [29] proposed the SADA approach, which guides the source domain images to align with the target domain images by using a trained camera classifier.

All of these methods explore the relationship between the source domain and the target domain and transfer source knowledge to the target domain. They do not have any strategies to gain from a small amount of labeled data. Different from the unsupervised person re-ID setting, our scenario is a cross-domain few-shot problem since our target dataset includes a few labeled data. We expect to obtain substantially better performance based on these labeled data than the unsupervised method.

To address the cross-domain few-shot person re-ID problem, we propose a novel method to utilize auxiliary datasets in this paper. Because our support dataset S is sampled from the same scenario as the testing dataset G, they have a similar data distribution, so we can use the support dataset to represent the test dataset. First, we regard the support dataset as a template set; each template sample includes an image patch x_j and its related label y_j. Then, we select helpful samples from the auxiliary training datasets to update our model under the guidance of the template data. Lastly, we test our model on the testing dataset.

4. Methodology

In this section, we introduce our method to address the cross-domain few-shot person re-ID problem. The difficulty is that we only have a few labeled data on the target dataset. Thus, we try to exploit the assistance of other labeled source data to improve the performance of the model on the target dataset. To address the domain bias between different datasets, we propose a novel policy to select important samples from the source dataset to train our model. In this section, we first discuss the intuition and motivation of our method. Then, we analyze the influence function. Lastly, we propose the reweighting instance method under the framework of transfer learning.

4.1. Intuition and Motivation

Different from the existing setting of few-shot learning, the definition of our problem does not impose the "K-way N-shot" constraint. This means that we cannot follow the solutions of traditional few-shot learning. Instead, we refer to the idea of instance-based transfer learning, due to two conditions of the problem: (1) it is cross-domain and (2) only a few labeled samples from the target domain are available (few-shot setting). The K-nearest neighbor (KNN) rule is a simple example of instance-based transfer learning: with the few labeled samples used as anchors to reweight the instances from the source domain, KNN uses the Euclidean distance as the criterion for reweighting. In other words, KNN uses the Euclidean distance to measure the similarity and influence of samples. Here, the influence of a sample from the source domain means how this sample influences the model's performance on the target domain. The authors of [12] proposed to use a statistical model as the influence function to measure how a sample from the source domain influences a model's performance on the target domain. Motivated by [12], we propose our reweighting instance method based on transfer learning.
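As a toy illustration of the distance-based instance reweighting mentioned above (not the method proposed in this paper), the following sketch weights source samples by their Euclidean proximity to the few labeled target anchors; the feature dimension, the k value, and the inverse-distance weighting rule are illustrative assumptions.

```python
import numpy as np

def knn_instance_weights(source_feats, anchor_feats, k=5, eps=1e-8):
    """Weight each source sample by its proximity to the target anchors.

    source_feats: (n, d) features of source-domain samples
    anchor_feats: (m, d) features of the few labeled target samples
    Returns an (n,) weight vector that sums to one.
    """
    # Euclidean distance from every source sample to every anchor
    dists = np.linalg.norm(source_feats[:, None, :] - anchor_feats[None, :, :], axis=2)
    # Average distance to the k nearest anchors (or to all anchors if m < k)
    k = min(k, anchor_feats.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    # Closer source samples receive larger weights
    w = 1.0 / (nearest + eps)
    return w / w.sum()

# Toy usage: 100 source samples, 5 target anchors, 128-d features
rng = np.random.default_rng(0)
weights = knn_instance_weights(rng.normal(size=(100, 128)), rng.normal(size=(5, 128)))
print(weights.shape, weights.sum())
```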

4.2. Revisiting the Influence Function

The goal of the work in [12] is to understand the effect of each training sample on the prediction. Formally, consider a model with parameter set θ and training set Z = {z_1, ..., z_n}. Then, the effect of a sample z is defined by θ̂_{-z} − θ̂, where θ̂_{-z} = argmin_θ Σ_{z_i ≠ z} L(z_i, θ) and θ̂ is the optimal parameter set for the model, defined as θ̂ = argmin_θ (1/n) Σ_{i=1..n} L(z_i, θ). The work in [12] employs the influence function I_up(z) for an arbitrary sample z. The idea is to compute the parameter change if z is up-weighted by some small ε.

We briefly introduce how the work in [12] derives the closed form of I_up(z) from the influence function. Define θ̂_{ε,z} = argmin_θ (1/n) Σ_{i=1..n} L(z_i, θ) + ε L(z, θ). Then, I_up(z) = dθ̂_{ε,z}/dε |_{ε=0}. Based on the local linear hypothesis, we have θ̂_{ε,z} − θ̂ ≈ ε I_up(z). With the definitions of θ̂_{-z} and θ̂_{ε,z}, we know that removing z corresponds to up-weighting it by ε = −1/n, so θ̂_{-z} − θ̂ ≈ −(1/n) I_up(z). The next step is to determine dθ̂_{ε,z}/dε. Note that dθ̂/dε = 0 since θ̂ is not related to ε. The detailed derivation of (1) can be found in the appendix of [12] or in our supplementary material:

I_up(z) = dθ̂_{ε,z}/dε |_{ε=0} = −H_{θ̂}^{-1} ∇_θ L(z, θ̂),   (1)

where H_{θ̂} = (1/n) Σ_{i=1..n} ∇²_θ L(z_i, θ̂) is the Hessian matrix. Equation (1) holds when L is twice differentiable and strictly convex in θ. However, deep neural networks do not satisfy these two conditions. There are two important contributions of the work in [12]: (1) approximating the Hessian matrix to speed up the computation and (2) analyzing the feasibility under the conditions of nonconvexity and nonconvergence.
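For illustration, the following is a minimal PyTorch sketch of equation (1) on a toy logistic-regression problem, where the Hessian is small enough to form explicitly. The data, model, and LBFGS solver are illustrative assumptions, and the Hessian-approximation techniques of [12] are not reproduced here.

```python
import torch

torch.manual_seed(0)
n, d = 200, 5
X, y = torch.randn(n, d), torch.randint(0, 2, (n,)).float()
theta = torch.zeros(d, requires_grad=True)

def mean_loss(t, X, y):
    # Average binary cross-entropy over the given samples
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ t, y, reduction="mean")

# Fit theta_hat (an approximation of the exact minimizer)
opt = torch.optim.LBFGS([theta], max_iter=200)
def closure():
    opt.zero_grad()
    loss = mean_loss(theta, X, y)
    loss.backward()
    return loss
opt.step(closure)

# Hessian of the mean training loss at theta_hat, and gradient of one sample z = (X[0], y[0])
H = torch.autograd.functional.hessian(lambda t: mean_loss(t, X, y), theta.detach())
grad_z = torch.autograd.grad(mean_loss(theta, X[:1], y[:1]), theta)[0]

# Equation (1): I_up(z) = -H^{-1} grad L(z, theta_hat)
influence = -torch.linalg.solve(H, grad_z)
# Leave-one-out parameter change is approximately -(1/n) * I_up(z)
print(-influence / n)
```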

5. Reweighting Instance Method Based on Influence Function

In this section, we introduce our reweighting instance method based on the influence function (ReWIF for short), which employs the influence function and the support set to guide training on the source training set. First, we formulate our problem and derive the reweighting function through two approximations. Then, we use pretraining to avoid explicit computation of the Hessian matrices. Moreover, we propose a post-processing step and a dataset-absorbing step to improve the performance. Finally, we summarize ReWIF in Algorithm 1.

5.1. Preliminaries and Formulation

Different from [12], we aim to maximize the model performance on the support set by determining the weights on the source training set. In other words, the determination of the weights is guided by the small support set. So, we put forth the following formulation. Equation (2) describes how the model reweights the source training set, and the guidance of the support set is embodied in equation (3). Here z_1, ..., z_n represent the source training samples with corresponding weights w = (w_1, ..., w_n), and z'_1, ..., z'_m denote the support samples. So, we can jointly optimize equations (2) and (3) to obtain our target model. Different from the influence function proposed in [12], the second term in equation (2) reweights all samples in the training set:

θ*(w) = argmin_θ (1/n) Σ_{i=1..n} L(z_i, θ) + Σ_{i=1..n} w_i L(z_i, θ),   (2)

w* = argmin_w (1/m) Σ_{j=1..m} L(z'_j, θ*(w)),   (3)

where θ*(w) is the solution of equation (2), and w* satisfies the stationarity condition of equation (3) if L is convex, second-order differentiable, and smooth with respect to θ. However, the exact solution w* is difficult to obtain. Instead, we use one-step gradient descent on equation (3), starting from w = 0, to approximate w*. We approximate each w_i as

w_i = (η/m) Σ_{j=1..m} ∇_θ L(z'_j, θ̂)ᵀ H_{θ̂}^{-1} ∇_θ L(z_i, θ̂),   (4)

where η is the step size, θ̂ = θ*(0), and H_{θ̂} is the Hessian of the first term of equation (2) evaluated at θ̂.

5.2. Approximation of the Hessian and the Closed Form of w_i

The inverse of the Hessian in (4) is difficult to evaluate. To avoid the calculation of the Hessian, we first note that the Hessian results from the first term of equation (2). We therefore approximate the optimization in equation (2) by the two-stage optimization in equations (5) and (6):

θ˜ = argmin_θ (1/n) Σ_{i=1..n} L(z_i, θ),   (5)

min_w (1/m) Σ_{j=1..m} L(z'_j, θ(w)),  with  θ(w) = θ˜ − ε Σ_{i=1..n} w_i ∇_θ L(z_i, θ˜).   (6)

For stage one in equation (5), we use standard back-propagation to find the optimal θ˜. For stage two in equation (6), we use one-step gradient descent to approximate equation (6) by equation (7). In equation (8), the first step follows from the chain rule and the second step from using equation (6). Finally, we provide the expression of w_i in equation (9):

w = −β ∇_w (1/m) Σ_{j=1..m} L(z'_j, θ(w)) |_{w=0},   (7)

∂/∂w_i [(1/m) Σ_{j} L(z'_j, θ(w))] |_{w=0} = (1/m) Σ_{j} ∇_θ L(z'_j, θ˜)ᵀ ∂θ(w)/∂w_i |_{w=0} = −(ε/m) Σ_{j} ∇_θ L(z'_j, θ˜)ᵀ ∇_θ L(z_i, θ˜),   (8)

w_i = (εβ/m) Σ_{j=1..m} ∇_θ L(z'_j, θ˜)ᵀ ∇_θ L(z_i, θ˜),   (9)

where ε and β are the step sizes in the two approximations.
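To make equation (9) concrete, below is a minimal PyTorch sketch (not taken from the paper) that computes the raw weights as inner products between the support-batch gradient and per-sample source gradients at the pretrained parameters. The linear model, batch sizes, and random tensors are illustrative assumptions standing in for the re-ID backbone and mini-batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)                      # stand-in for the pretrained re-ID backbone + classifier
criterion = nn.CrossEntropyLoss()

src_x, src_y = torch.randn(8, 16), torch.randint(0, 4, (8,))   # source mini-batch
sup_x, sup_y = torch.randn(3, 16), torch.randint(0, 4, (3,))   # support mini-batch

def flat_grad(loss):
    # Gradient of the loss w.r.t. all model parameters, flattened into one vector
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

# Mean gradient on the support batch
g_support = flat_grad(criterion(model(sup_x), sup_y))

# Per-sample gradients on the source batch (naive loop; per-sample autograd tricks could speed this up)
w = torch.stack([torch.dot(g_support, flat_grad(criterion(model(src_x[i:i + 1]), src_y[i:i + 1])))
                 for i in range(len(src_x))])
print(w)   # raw weights of equation (9), before the post-processing of equation (10)
```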

5.3. Post-Processing

In equation (9), w_i, seen as the inner product of two gradients, is not necessarily nonnegative. As such, we propose to normalize each w_i as

w̃_i = (max(w_i, 0) + α) / Σ_{k=1..n} (max(w_k, 0) + α),   (10)

where max(·, 0) clips negative values to zero and α is a relaxing factor. Equation (10) means that we first clip the negative w_i to zero and then normalize the result so that the weights sum to one.

We now provide a heuristic explanation of equation (10). The support set is usually quite different from the source training set, so large angles may exist between the gradient of the support set and the gradients of some source training samples z_i. That is, w_i could be negative and would then be clipped to zero by max(·, 0). From simulations, we find that more than half of the elements in w are clipped to zero. Although we prefer the model to focus on a few similar samples, many zeros in the vector w make the model prone to being trapped in a bad local minimum. Thus, we need to reduce the difference in weights between similar and dissimilar samples. The dissimilar samples are useful in the sense that they provide prior knowledge about the person re-ID task and prevent over-fitting. The relaxing factor α prevents our model from discarding the dissimilar samples, and α controls the relaxation degree.
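A minimal sketch of this post-processing step, assuming the clip-relax-normalize form of equation (10) reconstructed above; the example weights and the default α are arbitrary.

```python
import torch

def postprocess_weights(w, alpha=1.0):
    """Clip negative weights to zero, add a relaxing factor, and normalize (equation (10) sketch)."""
    w = torch.clamp(w, min=0.0)        # drop source samples whose gradients oppose the support gradient
    w = w + alpha                      # relaxing factor: keep dissimilar samples from being fully discarded
    return w / w.sum()                 # normalize so the batch weights sum to one

print(postprocess_weights(torch.tensor([-0.3, 0.01, 0.0, 0.2]), alpha=1.0))
```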

5.4. Absorbing Support Set into Source Training Set

We combine the support set with the source training set when optimizing equations (5) and (6), for two reasons. First, the normalization in equation (10) eliminates the absolute relationship among the weights w_i. Suppose we use equation (9) to calculate the w_i: most of them are less than zero, and the rest are less than 0.01, which indicates that all n samples are dissimilar to the support set. However, the normalization in equation (10) will enlarge the nonzero w_i. So, we absorb the support set into the source training set as anchors. Second, the support set shares the same domain information with the target testing set, and including it improves the model performance on the target testing set.

5.5. Algorithm Flow

We summarize ReWIF in Algorithm 1. Assume that Z is the source training set, including the support set S; the support set S and the test set G are from another domain. In stage one, we optimize θ with the traditional deep learning optimizer Adam [33]. From stage one, we prepare Z, S, and θ˜. In stage two, we sample two mini-batches z and s from Z and S. Then, we calculate the weights w based on z, s, and equation (9). Moreover, we normalize w through equation (10). Finally, we use w̃ to compute the weighted loss and then update θ by gradient descent. We repeat stage two (lines 3–6 in Algorithm 1) until the model converges. A compact sketch of stage two on toy tensors is given after the algorithm listing.

Require: a deep CNN model (we choose ResNet50) with parameters θ; cross-entropy loss L; source training set Z (including the support set S);
Ensure: the two stages are run on the same CNN model;
(1)Stage one: prepare Z; optimize θ with the traditional deep learning optimizer Adam and loss L to obtain θ˜;
(2)Stage two: prepare Z, S, and θ˜;
(3) Sample mini-batches z and s from Z and S, respectively;
(4) Calculate the weights w based on z, s, and equation (9);
(5) Apply the post-processing of equation (10) to obtain w̃;
(6) Use w̃ to compute the weighted loss on z and update θ by gradient descent;
(7)Repeat steps 3–6 until θ converges to θ*.
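The following is a compact, self-contained sketch of stage two of Algorithm 1 on toy tensors; the linear model, synthetic data, mini-batch sizes, and number of steps are illustrative stand-ins for the actual ResNet50 pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)                               # stand-in for the pretrained backbone (stage one assumed done)
opt = torch.optim.SGD(model.parameters(), lr=3.5e-6)
ce = nn.CrossEntropyLoss(reduction="none")             # keep per-sample losses so they can be reweighted
alpha = 1.0                                            # relaxing factor of equation (10)

src_x, src_y = torch.randn(128, 16), torch.randint(0, 4, (128,))   # toy source set Z
sup_x, sup_y = torch.randn(12, 16), torch.randint(0, 4, (12,))     # toy support set S

def flat_grad(loss):
    return torch.cat([g.reshape(-1) for g in torch.autograd.grad(loss, model.parameters())])

for step in range(3):                                  # repeat until convergence in practice
    idx = torch.randperm(len(src_x))[:32]              # source mini-batch z
    xb, yb = src_x[idx], src_y[idx]
    g_sup = flat_grad(ce(model(sup_x), sup_y).mean())  # support-batch gradient
    w = torch.stack([torch.dot(g_sup, flat_grad(ce(model(xb[i:i + 1]), yb[i:i + 1]).mean()))
                     for i in range(len(xb))])         # raw weights, equation (9)
    w = torch.clamp(w, min=0.0) + alpha                # clip and relax, equation (10)
    w = (w / w.sum()).detach()
    loss = (w * ce(model(xb), yb)).sum()               # weighted source loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: weighted source loss {loss.item():.4f}")
```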
5.6. Training Time

For our ReWIF in Algorithm 1, stage two is time-consuming, and its running time is about twice that of stage one in each iteration. This can be seen from equation (4), where an extra forward and backward pass is required to calculate the support gradient. Fortunately, the number of iterations of stage two is much smaller than that of stage one.


Figure 1 shows the diagram of the ReWIF algorithm.

6. Experiments

The experiments contain three parts. First, we introduce the basic settings, including datasets, evaluation metrics, network architecture, and training policy. Second, we conduct several experiments on the effectiveness of ReWIF, followed by an ablation study on two important hyperparameters. Third, we combine ReWIF with other methods [1] and compare with their original versions.

6.1. Dataset and Settings
6.1.1. Dataset

To simulate different domains of different cities, we use different re-ID benchmarks: Market-1501 [34], DukeMTMC-reID [35], and CUHK03 [36]. Market-1501 contains 32,668 images of 1,501 identities in total. It has three parts: 12,936 training images of 751 identities, and 3,368 query images and 19,732 gallery images of 750 identities for testing. We rename "Market-1501" to "Market" for simplicity. DukeMTMC-reID is a subset of the DukeMTMC [37] dataset. It has 16,522 images of 702 identities for training, and 2,228 query images and 17,661 gallery images of 702 identities for testing. We rename "DukeMTMC-reID" to "Duke" for simplicity. CUHK includes three different datasets: CUHK01 (971 identities, 3,884 images, manually cropped) [38], CUHK02 (1,816 identities, 7,264 images, manually cropped) [39], and CUHK03 (1,360 identities, 13,164 images, manually cropped + automatically detected) [40]. We follow the official splits for training and testing.

Market, Duke, and CUHK belong to different domains that have different camera systems and illumination conditions. That is, they are suitable for our proposed scenario.

6.1.2. Evaluation Metrics

As one case of our experiments, we regard Duke as city A and Market as city B. The training set of Duke is our auxiliary source training set. We randomly choose M images of K identities from the Market query images as the support set; the number of each person's images is not equal. The gallery images of Market are our test set. We use the M queries to retrieve from the testing set and report the mean average precision (mAP) and rank-1 accuracy on the testing set. Query and gallery sets may share camera views, but for each individual query identity, his/her gallery samples from the same camera are excluded.
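For reference, below is a simplified sketch of how rank-1 accuracy and mAP can be computed from a query-by-gallery distance matrix. The official Market/Duke protocol additionally filters same-camera and junk gallery entries per query; that filtering is omitted here, and the toy data are arbitrary.

```python
import numpy as np

def evaluate(dist, q_ids, g_ids):
    """Simplified rank-1 / mAP computation from a (num_query, num_gallery) distance matrix."""
    rank1, aps = 0.0, []
    for i in range(len(q_ids)):
        order = np.argsort(dist[i])                          # gallery sorted by ascending distance
        matches = (g_ids[order] == q_ids[i]).astype(float)
        rank1 += matches[0]                                  # rank-1 hit for this query
        hits = np.where(matches == 1)[0]
        if len(hits) == 0:
            continue
        precisions = np.cumsum(matches)[hits] / (hits + 1)   # precision at each true-match position
        aps.append(precisions.mean())                        # average precision for this query
    return rank1 / len(q_ids), float(np.mean(aps))

# Toy usage: 3 queries, 10 gallery images
rng = np.random.default_rng(0)
r1, mAP = evaluate(rng.random((3, 10)), np.array([1, 2, 3]), rng.integers(1, 4, 10))
print(f"rank-1 {r1:.3f}  mAP {mAP:.3f}")
```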

6.1.3. Backbone

We employ a strong baseline [41], which has reached the state of the art on Market and Duke, as our backbone. The main tricks can be summarized in the following four points (a sketch of the resulting input pipeline is given after this list):
(1)We use ResNet50 as our backbone, initialize the network with parameters pretrained on ImageNet, and optimize the network with Adam. Meanwhile, we change the dimension of the last fully connected layer to the number of identities in the support and training datasets.
(2)The input size of each image is 256 × 128 pixels. We adopt random erasing and horizontal flipping with 0.5 probability as data augmentation, and the cross-entropy loss is adopted.
(3)Each image is decoded into 32-bit floating point raw pixel values in [0, 1]. Then, we normalize the RGB channels by subtracting 0.485, 0.456, and 0.406 and dividing by 0.229, 0.224, and 0.225, respectively.
(4)We randomly sample P identities and K images per person to constitute a training batch, so the batch size equals B = P × K. In this study, we set P = 32 and K = 4.
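Below is a sketch of the input pipeline described above, assuming torchvision; the random-erasing parameters are not specified in the text, so library defaults are used, and the P × K sampler itself is only indicated by its resulting batch size.

```python
from torchvision import transforms

# Training-time preprocessing following points (2) and (3) above
train_transform = transforms.Compose([
    transforms.Resize((256, 128)),                       # input size 256 x 128 (height x width)
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),                               # 32-bit float pixels in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5),                     # applied on the normalized tensor
])

# P x K batch construction from point (4): P = 32 identities, K = 4 images each
P, K = 32, 4
batch_size = P * K                                       # 128 images per training batch
```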

6.1.4. Training Set Policy

We combine the training set and the support set into a new training set. In each iteration, we randomly sample 128 images from the new training set and 192 images from the support set; if the size of the support set is smaller than 192, we use all of its samples. Then, we fine-tune our model for 60 epochs under the guidance of the support set, with a learning rate of 3.5 × 10−6 during the whole fine-tuning process.

6.2. Experiment Results

We compare our proposed ReWIF model with ReWIF_S and three baseline policies:
Policy1 trains the model only on the training set during the whole training process. The total number of training epochs is 180. The initial learning rate is 3.5 × 10−4 and is multiplied by 0.1 every 60 epochs.
Policy2 trains the model only on the support dataset during the whole training process. The total number of training epochs is 2500. The initial learning rate is 3.5 × 10−4 and is multiplied by 0.1 at epochs 1000, 1500, and 2000.
Policy3 [42] is also a reweighting method motivated by [12]. It reweights training samples to deal with the imbalanced classification problem; we make necessary but trivial modifications to fit our problem. The total number of training epochs is 180. The initial learning rate is 3.5 × 10−4 and is multiplied by 0.1 every 60 epochs.
ReWIF_S trains the model on the combined training and support datasets during the whole training process. The total number of training epochs is 610. The initial learning rate is 3.5 × 10−4 and is multiplied by 0.1 at epochs 40, 160, and 500.
ReWIF fine-tunes ReWIF_S for 60 epochs with the guidance of the support set, with a learning rate of 3.5 × 10−6 during the whole fine-tuning process.

The 2500 and 610 epochs of Policy2 and ReWIF_S are much larger than the 180 epochs of Policy1. During the training of Policy2 and ReWIF_S, the loss converges to a low level early on (around 100 epochs for Policy2 and 50 epochs for ReWIF_S); however, the mAP keeps improving after the loss has converged. For ReWIF_S, the reason may be that the network regards the few support samples as hard samples and fits them slowly, while the loss on the support set can be drowned out by the large amount of training data.

6.2.1. Results on Single Source Training Set

The "Size" column indicates the size of the support set, given as the number of persons (with the corresponding number of images in parentheses). All models are evaluated on the testing set; each experiment is repeated 10 times and the average values are reported in Table 2. Our proposed method obtains significant performance improvement on the testing set. Policy1 achieves an mAP of less than 20, which indicates that there is a large gap between the two domains. Meanwhile, Policy2 performs unsatisfactorily because the small training set makes the model over-fit. The performance of Policy3 lies between Policy2 and ReWIF_S, so the reweighting method of [42] is not suitable for our cross-domain few-shot problem. By comparison, ReWIF and ReWIF_S are better than Policies 1–3, and ReWIF is better than ReWIF_S. To verify the effectiveness of our method in addressing the few-shot problem, we study the influence of the support set size. From the results in Table 2, the ReWIF method yields larger improvements when fewer labeled data are available. In particular, when the size of the support set is 5 (Market as the target set), the mAP improves by 23 and 4.4, respectively, compared with Policy2 and Policy3, and the improvement is even larger when the size of the support set is 1. Finally, the performance of all methods degrades as the identity size decreases, but our ReWIF keeps a relatively high performance when the identity size of the support set is small.

6.2.2. Robustness Results on Support Set

We test the robustness with respect to different support sets. For each identity size (1, 5, 10, and 50), we randomly choose 10 different support sets and train the network with ReWIF. Then, we calculate the mean and bias of the mAP and rank-1 for each identity size (the value in parentheses is the bias); the results are shown in Table 3. As the identity size decreases, the method becomes less robust, as seen from the bias of the mAP.

6.2.3. Results on Multisource Training Set

Our method can merge multidomain source datasets conveniently. We include CUHK01 [38], CUHK02 [39], and CUHK03 [40] in the source set one by one. The size of the support set is 50(215) for the Market target set and 50(169) for the Duke target set; the results are shown in Table 4. As the number of source sets increases, the performance of ReWIF and ReWIF_S increases steadily. In contrast, the results of Policy1, Policy2, and Policy3 with multiple sources sometimes move in the opposite direction to ReWIF and appear almost random. The results in Table 4 demonstrate the effectiveness of our ReWIF in leveraging cross-domain datasets.

6.3. Ablation Study
6.3.1. Influence of Learning Rate

To show the influence of the learning rate, we test different learning rates during the whole fine-tuning process; the results are shown in Table 5. In our experiments, a learning rate of 3.5 × 10−5 achieves the best performance of 84.8 mAP.

6.3.2. Influence of α

The relaxing factor α is a tuning parameter in equation (10). We study the effectiveness and necessity of this factor, and the results are shown in Table 5. Meanwhile, we show that a suitable value of α is also important: we search α over the range 0.0 to 10.0, and the results in Table 5 show that α = 1.0 achieves the best performance of 84.8 mAP.

6.3.3. Adaption to Other Methods

We combine ReWIF with one unsupervised domain adaptation person re-ID method, PUL [1], and one semisupervised person re-ID method, One-Example [10].

The details of PUL [1] are as follows: it first trains a model on the labeled source dataset, then uses the trained model to make predictions on the target dataset and obtain soft labels (with many errors). Finally, the soft labels and the target data are used to retrain the model. We use ReWIF to replace the procedure of training on the labeled source dataset in PUL, which leads to "PUL with ReWIF."

For One-Example [10], a CNN model is initially trained on the one-example labeled samples (support set). Then, pseudolabels are generated for all unlabeled samples (target set), and some reliable pseudolabeled data are selected for training according to the prediction confidence. The selected subset is continuously enlarged over iterations according to a sampling strategy: in the initial stages, only the most reliable and easiest samples are included, and in subsequent stages, a growing number of pseudolabeled data are gradually selected to incorporate more difficult and diverse data. Since the One-Example method is designed for a single domain, we use the model trained with the One-Example method as the initial model and apply ReWIF to it, which leads to "One-Example with ReWIF."

The size of the support set is 50. The rank-1 and mAP of PUL and One-Example under our dataset setting are much higher than those reported in their papers (see Table 6). This is reasonable since the number of queries in our setting is smaller. Meanwhile, PUL with ReWIF and One-Example with ReWIF are both better than their original versions in mAP and rank-1 due to the use of the support set. Interestingly, absorbing the support set into the training set further improves the performance in this application.

6.4. Comparison with Other Methods

In this section, we compare ReWIF with state-of-the-art unsupervised person re-ID methods, including (1) UMDL [4] and PUL [1], which employ unsupervised learning, and (2) UDA-based (unsupervised domain adaptation, UDA for short) methods, including SPGAN [43], ATNet [44], CamStyle [45], HHL [46], and ECN [47], which use GANs; MMFA [48] and TJ-AIDL [3], which use image attributes; and UDAP [49] and AD-Cluster [30], which employ clustering. Table 7 shows the person re-ID performance when adapting from Market to Duke and vice versa.

We also combine ReWIF with UDAP [49]; the details are as follows: we first train a model with the UDAP method, and then we use the pseudolabels from the clustering results as the support set of ReWIF to fine-tune the model with ReWIF. To follow the UDA setting, we build the support set from samples of the target-domain training set instead of the target-domain probe set.

As shown in Table 7, the UDA-based methods achieve better performance in most cases compared with the unsupervised methods. Specifically, AD-Cluster, the state-of-the-art method, performs much better than the other methods. The performance of the GAN-based methods varies. Note that our ReWIF achieves good performance compared with the GAN-based methods; even compared with AD-Cluster, we obtain competitive results.

One might notice that UNRN [50] performs much better than the other methods. UNRN combines the uncertainty idea [51, 52] with the mean teacher framework [53] and is the new state-of-the-art method on these two datasets for the unsupervised domain adaptation task. However, since it is not straightforward to adapt our method to it, we leave this as future work.

7. Conclusion and Future Work

In this work, we proposed a special application scenario, the cross-domain few-shot person re-ID problem. To address the problem of insufficient labeled data in the target dataset, we involve auxiliary training sets as source datasets. Meanwhile, because of the gap between the source dataset and the target dataset, the performance of a model trained only on the source dataset drops drastically. So, we propose a reweighting policy that minimizes the loss on the support dataset to select helpful samples. The experimental results confirm that our reweighting policy is effective and achieves state-of-the-art performance on our problem.

7.1. Limitations

ReWIF is a time-consuming method because of the computation of second-order derivatives (although we avoid computing the Hessian matrices). A direct drawback is that we have to train the model again when a new suspect appears. Fortunately, the training time of ReWIF is less than 1.5 hours on 4 GTX 2080 GPUs. Moreover, when we accumulate enough suspects' images (labeled data) after deploying the system in the new city for about half a year, we can retrain the model as a supervised person re-ID problem.

7.2. Future Works

Future work can be summarized in two parts. First, we plan to extend our framework to methods beyond single-network-based approaches, e.g., AD-Cluster, which is an adversarial framework; another direction is video data, which are commonly used in real-world applications and require further exploration to adapt to our method. Second, we plan to explore network compression methods because of the drawback mentioned above. Three works [54–56] can be explored to improve ReWIF, especially the dictionary learning-based multifeature fusion method [54].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Scientific Research Project of Tianjin Education Commission (no. 2021KJ186).