Abstract

In recent years, hash learning has received increasing attention in supervised video retrieval. However, most existing supervised video hashing approaches design hash functions based on pairwise similarity or triplet relationships and focus only on local information, which results in low retrieval accuracy. In this work, we propose a novel supervised framework called discriminative codebook hashing (DCH) for large-scale video retrieval. The proposed DCH encourages samples within the same category to converge to the same code word and maximizes the mutual distances among different categories. Specifically, we first construct the discriminative codebook via a predefined distance among code words and a Bernoulli distribution over each hash bit. Then, we use the composite Kullback–Leibler (KL) divergence to align the neighborhood structures between the high-dimensional feature space and the Hamming space. The proposed DCH is optimized via the gradient descent algorithm. Experimental results on three widely used video datasets verify that the proposed DCH performs better than several state-of-the-art methods.

1. Introduction

With the rapid proliferation of smartphones, the amount of video data has grown explosively [1–3]. For example, TikTok has over 400 million daily active users, who upload approximately 2,000 videos every minute, and YouTube receives a total of 100 hours of video per minute [4–6]. Owing to the economical storage and computational efficiency of binary codes, hash-based methods have been widely applied to visual retrieval tasks [7–13].

Previous hash-related work [14] mainly focused on image hashing and can be divided into data-independent and data-dependent methods. Data-independent approaches learn binary codes not from the data themselves but through random space projections. The most representative algorithm is locality-sensitive hashing (LSH) [15], which generates considerable redundant information through random mapping and therefore achieves satisfactory performance only with long hash codes. Data-dependent hash methods [16–18], which can be further divided into unsupervised and supervised hashing, generate more efficient hash codes by preserving the neighborhood structure of the data. For example, Gong et al. [19] proposed iterative quantization hashing (ITQ), which minimizes the quantization error by rotating the principal component analysis (PCA)-projected data. Spectral hashing (SH) [20] assumes that the data obey a uniform distribution and partitions the data along the principal directions of the data manifold. Density-sensitive hashing (DSH) [21] extends LSH by exploiting structural information. Zhang et al. [22] developed a convergence-preserving parametric learning algorithm, called latent factor hashing (LFH), to learn similarity-preserving binary codes based on latent factor models. Liu et al. [23] proposed kernel supervised hashing (KSH), which applies kernel-based formulations to accommodate linearly inseparable data and designs a greedy algorithm to solve the hash function optimization problem.
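To make the data-independent paradigm concrete, the following minimal sketch (ours, not the code of [15]; all names are illustrative) hashes feature vectors with LSH-style random hyperplane projections:

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Data-independent hashing: sign of random projections.
    X is an (n_samples, d) feature matrix."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))  # random hyperplanes
    return (X @ W > 0).astype(np.uint8)            # one bit per hyperplane

# Toy usage: near-duplicate features agree on most bits.
X = np.array([[1.0] * 8, [1.01] * 8, [-1.0] * 8])
codes = lsh_codes(X, n_bits=16)
print((codes[0] ^ codes[1]).sum(), (codes[0] ^ codes[2]).sum())  # small vs. large
```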

In recent years, hashing methods for video retrieval have also received extensive attention [24–31]; they fall into two categories: traditional machine learning methods and deep hashing. Machine learning methods, resembling image hashing approaches, learn binary codes of video keyframes from low-level handcrafted features and then compute video hash codes via averaging. Wu et al. [4] applied hashing to video retrieval using color histograms as global features; this was the first application of hash learning in the video field. Multiple-feature hashing (MFH) [32] adopts a weight-based scheme to combine different features. Ye et al. [33] used video structural information in a supervised learning paradigm to obtain optimal binary codes. Stochastic multiview hashing (SMVH) [34] separately calculates the probability similarity matrices of video frames in the feature space and in the Hamming space and then minimizes the difference between the two matrices using the KL divergence. Nie et al. [35] defined joint multiview hashing (JMVH), which maximizes the interclass distance and minimizes the intraclass distance to preserve both the global and local structures of multiple features. Boosting temporal video hashing (BTVH) [36] studies the multitable learning problem to boost performance and captures the inherent similarity of videos from both visual and temporal perspectives. In addition, some researchers have recently used deep networks to capture the temporal and spatial information between keyframes. For instance, central similarity quantization (CSQ) [37] learns temporal information with 3D convolutional neural networks and introduces the concept of hash centers to enhance central similarity.

However, most existing video hashing approaches suffer from the following problems. (1) Low discriminability among different categories: hash functions based on pairwise similarity or triplet relationships consider only local information, which preserves the information of similar samples well but performs poorly in distinguishing samples from different categories. (2) Poor performance in real-world scenarios: in real applications, similar data often account for only a small proportion, and most samples are dissimilar, which leads to low efficiency when the data are imbalanced [37]. (3) High time cost of deep learning: deep learning frameworks are time-consuming to train, and the spatiotemporal information extracted by the network does not bring significant performance gains. Hence, these video hashing functions cannot learn discriminative hash codes to enhance retrieval performance.

To solve the above problems, in this work, we propose a novel framework for supervised video retrieval, called discriminative codebook hashing, which considers the global structure when constructing the hash function. DCH encourages samples within the same category to converge to an identical code word and maximizes the mutual distances between different categories. Specifically, the discriminative codebook is first generated according to two properties: a predefined distance between code words and a Bernoulli distribution that ensures each hash bit stores more information. Then, to preserve the similarity structure between the feature space and the Hamming space, the composite KL divergence is adopted. Finally, the gradient descent algorithm is used to optimize the model. In this way, we can obtain discriminative binary codes for video retrieval. Figure 1 shows the framework of DCH. The main contributions of this work are as follows:
(i) We propose a discriminative codebook based on a predefined distance between code words and a Bernoulli distribution that ensures each hash bit stores more information.
(ii) We propose the DCH method, which maximizes the distances between the code words of the predefined codebook to learn discriminative binary codes for supervised video retrieval.
(iii) We evaluate the proposed method on three widely used datasets, and the results show that DCH achieves significant improvements over several state-of-the-art methods.

The rest of this paper is organized as follows. Section 2 introduces preliminary work. Section 3 describes the proposed discriminative codebook hashing in detail. The experimental work is presented in Section 4, and the conclusion is given in Section 5.

2. Preliminary Work

In this section, we briefly introduce the preliminary work, namely, stochastic multiview hashing (SMVH) [34]. It is a supervised video retrieval method that aims to preserve the similarity structure from the original space to the Hamming space.

Let $V = \{v_1, v_2, \ldots, v_{N_v}\}$ be the video set, where $v_j$ indicates the $j$-th video of $V$ and $N_v$ is the number of videos. $H = \{h_1, h_2, \ldots, h_{N_v}\}$ is the hash code set of the videos, where $h_j \in \{0,1\}^L$ is the $L$-bit binary code transformed from $v_j$. The video features are extracted based on the set of keyframe features $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, where $x_i \in \mathbb{R}^d$, $n$ is the number of keyframes, and $d$ is the dimension of each keyframe. $B = [b_1, b_2, \ldots, b_n]$ represents the corresponding binary codes of the keyframes, where $b_i \in \{0,1\}^L$. The conversion relationships between the above variables are formulated as

$$y_i = \sigma\!\left(W^{\top} x_i + c\right), \tag{1}$$

$$b_i = \Theta(y_i), \tag{2}$$

$$h_j = \Theta\!\left(\frac{1}{|F_j|} \sum_{x_i \in F_j} y_i\right), \tag{3}$$

where $y_i$ is the intermediate result of the linear projection, $c$ is a bias parameter, $W \in \mathbb{R}^{d \times L}$ is the projection matrix, $F_j$ is the set of frames of video $v_j$, and $|F_j|$ is the number of samples in the set. The high-dimensional keyframe feature matrix $X$ is first projected into the lower-dimensional matrix $Y$. Then, the sigmoid function $\sigma(\cdot)$ is used to map the variables between $0$ and $1$. Finally, a thresholding function $\Theta(\cdot)$ is used to change the data into binary codes, with $b_{ik} = 1$ if $y_{ik} \geq 0.5$ and $b_{ik} = 0$ otherwise.
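As a rough illustration of equations (1)–(3), the following NumPy sketch (our own; the assumption that the video code averages the relaxed frame codes before thresholding is our reading of equation (3)) maps keyframe features to keyframe and video binary codes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frame_codes(X, W, c):
    """Eqs. (1)-(2): linear projection, sigmoid to (0, 1), threshold at 0.5.
    X: d x n keyframe features; W: d x L projection; c: L-dim bias."""
    Y = sigmoid(W.T @ X + c[:, None])        # L x n relaxed codes
    B = (Y >= 0.5).astype(np.uint8)          # L x n binary keyframe codes
    return B, Y

def video_code(Y):
    """Eq. (3): average the relaxed frame codes of one video, then threshold."""
    return (Y.mean(axis=1) >= 0.5).astype(np.uint8)

# Toy usage: one video with n = 10 frames, d = 4096, L = 32.
rng = np.random.default_rng(0)
X = rng.standard_normal((4096, 10))
W = 0.01 * rng.standard_normal((4096, 32))
B, Y = frame_codes(X, W, rng.standard_normal(32))
h = video_code(Y)
```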

SMVH preserves the similarity structure between the feature space and the Hamming space using a composite KL divergence measure. In particular, it separately calculates the similarity probability matrix $P$ in the original space and the pairwise similarity matrices among samples in the Hamming space. Then, the KL divergence is used to examine how well these probability matrices match. Therefore, the objective function of SMVH is defined as follows:

$$\min_{W,\, c}\ \mathcal{L}_{\mathrm{CKL}} + \lambda \|W\|_F^2, \tag{4}$$

where $\lambda$ controls the weight of the regularization term to prevent overfitting and $\mathcal{L}_{\mathrm{CKL}}$ is the composite KL divergence. The latter can be represented as

$$\mathcal{L}_{\mathrm{CKL}} = \mathrm{KL}(P \,\|\, Q) + \beta\, \mathrm{KL}(P \,\|\, \hat{Q}), \tag{5}$$

where $\beta$ controls the influence of the composite KL divergence, $Q$ is the similarity structure based on the relaxed codes $Y$, and $\hat{Q}$ is another probability matrix preserving the similarity information of $B$ in the Hamming space. In addition, the KL divergence is defined as follows:

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}, \tag{6}$$

where $p_{j|i}$ is a conditional probability that reflects the similarity between samples $x_i$ and $x_j$, and the conditional probability $q_{j|i}$ represents the probability of returning $x_j$ given the query $x_i$.
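The following sketch (ours; the Gaussian affinity for $P$ is an assumed choice, not necessarily the one used in [34]) shows how the conditional probability matrices and the KL term of equation (6) can be computed:

```python
import numpy as np

def sq_dists(Z):
    """Pairwise squared Euclidean distances between rows of Z."""
    s = (Z * Z).sum(axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)

def cond_probs(Z, sigma=1.0):
    """Row-normalized Gaussian affinities: p_{j|i} ~ exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    A = np.exp(-sq_dists(Z) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A / A.sum(axis=1, keepdims=True)

def kl_div(P, Q, eps=1e-12):
    """Eq. (6): sum_i sum_j p_{j|i} * log(p_{j|i} / q_{j|i})."""
    return float((P * (np.log(P + eps) - np.log(Q + eps))).sum())

# Composite KL of eq. (5): P from features, Q from relaxed codes Y,
# Q_hat from binary codes B (treated as real vectors), weighted by beta.
```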

3. Discriminative Codebook Hashing

In this section, we present the proposed DCH in detail through four parts, including the proposed discriminative codebook, the objective function, algorithmic optimization, and complexity analysis.

3.1. Discriminative Codebook

Motivated by CSQ [37], we propose a novel discriminative codebook $C = \{c_1, c_2, \ldots, c_K\} \in \{0,1\}^{K \times L}$ for supervised video retrieval, where $c_k$ is the code word of the $k$-th category. The proposed codebook is defined according to two properties. The first is that the values in the same bit position of different code words obey a Bernoulli distribution. Specifically, the proportions of $0$ and $1$ in the same bit across different categories are both $1/2$; that is, each bit has a probability of $1/2$ of being $0$ or $1$, which maximizes the entropy and lets each bit store more information. The other is that the mutual distances among code words satisfy

$$d_H(c_i, c_j) \geq \left(\frac{1}{2} - \epsilon\right) L, \quad \forall\, i \neq j, \tag{7}$$

where $d_H(c_i, c_j)$ is the Hamming distance between code words $c_i$ and $c_j$, $L$ is the length of the binary codes, and $\epsilon$ represents the fault tolerance. Under the constraint in equation (7), the mutual distances between code words are made as large as possible.

Overall, the proposed codebook encourages samples within the same category to converge to the same codeword and maximizes the mutual distance between different categories. Therefore, the proposed codebook can preserve global structures and help generate discriminative binary codes for video retrieval. The scheme of the proposed discriminative codebook is presented in Algorithm 1.

Input: the number of categories $K$; the number of samples per category $m$; code length $L$; maximum number of iterations $T$; fault tolerance rate $\epsilon$.
Output: codebook $C \in \{0,1\}^{K \times L}$.
(1) for iteration $t = 1 : T$
(2)  for category $k = 1 : K$
(3)   $c_k[\text{a random half of the coordinates}] = 1$
(4)   $c_k[\text{the rest of the coordinates}] = 0$
(5)  end
(6)  if any two rows of $C$ satisfy equation (7)
(7)   break
(8)  end
(9) end
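A runnable sketch of Algorithm 1 (our own rendering; the acceptance threshold $(1/2 - \epsilon)L$ is our reading of equation (7), and all names are illustrative) is given below:

```python
import numpy as np

def build_codebook(K, L, T=1000, eps=0.1, seed=0):
    """Algorithm 1 sketch: resample K codewords, each with exactly L/2 ones
    (the Bernoulli property), until every pair of rows satisfies eq. (7)."""
    rng = np.random.default_rng(seed)
    for _ in range(T):
        C = np.zeros((K, L), dtype=np.uint8)
        for k in range(K):
            half = rng.choice(L, size=L // 2, replace=False)
            C[k, half] = 1                           # random half coordinates -> 1
        D = (C[:, None, :] ^ C[None, :, :]).sum(-1)  # pairwise Hamming distances
        np.fill_diagonal(D, L)                       # ignore self-distances
        if D.min() >= (0.5 - eps) * L:               # all pairs far enough apart
            return C
    raise RuntimeError("no valid codebook found within T iterations")

C = build_codebook(K=10, L=64)
```
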
3.2. Objective Function

According to the proposed discriminative codebook $C$, we expand each row of the codebook matrix into $\tilde{C} \in \{0,1\}^{n \times L}$ according to the number of samples, where the $i$-th row $\tilde{c}_i$ is the code word of the category that sample $x_i$ belongs to. The detailed generation process of $\tilde{C}$ is shown in Algorithm 2. We minimize the error between the binary codes and the predefined codebook as

$$\min_{W,\, c}\ \sum_{i=1}^{n} \left\| b_i - \tilde{c}_i \right\|_2^2. \tag{8}$$

Input: training data $X$; codebook $C$; maximum number of iterations $T$; code length $L$; parameters $\lambda$, $\beta$, $\gamma$; learning rate $\eta$.
Output: hash codes $H$.
(1) Initialization: initialize the projection matrix $W$ and the bias $c$ as a random matrix and a random vector.
(2) Generating $\tilde{C}$ according to the number of samples:
(3) for category $k = 1 : K$
(4)  set the rows of $\tilde{C}$ corresponding to the samples of category $k$ to the code word $c_k$
(5) end
(6) Gradient descent:
(7) for iteration $t = 1 : T$
(8)  $W$-Step: $W \leftarrow W - \eta\, \partial \mathcal{L} / \partial W$
(9)  $c$-Step: $c \leftarrow c - \eta\, \partial \mathcal{L} / \partial c$
(10) end
(11) Video binary code computation: video hash codes $H$ are obtained by equations (1)–(3).

Specifically, for each sample $x_i$, we take $\tilde{c}_i$ as its target code word so that samples in the same category share the same code word and samples in different categories have discriminative binary codes.
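For illustration, the expansion from $C$ to $\tilde{C}$ amounts to a label-indexed lookup (a sketch with our own naming):

```python
import numpy as np

def expand_codebook(C, labels):
    """Build C~: row i of the result is the code word of sample i's category."""
    return C[np.asarray(labels)]   # (n, L) from the (K, L) codebook

# e.g., labels = [0, 0, 2, 1] gives rows c_0, c_0, c_2, c_1 of C.
```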

To preserve the similarity structure between the feature space and the Hamming space, we combine the composite KL divergence with our proposed codebook to construct the overall objective function of DCH as follows:

$$\min_{W,\, c}\ \mathcal{L} = \mathcal{L}_{\mathrm{CKL}} + \gamma \sum_{i=1}^{n} \left\| b_i - \tilde{c}_i \right\|_2^2 + \lambda \|W\|_F^2, \tag{9}$$

where $\gamma$ controls the weight of the error loss between the codebook and the learned hash codes, and the second term of equation (9) aligns the binary codes with their corresponding code words.
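Putting the pieces together, a sketch of evaluating equation (9) (ours; we follow the common relaxation of measuring the alignment term on the relaxed codes $Y$ so that it stays differentiable) could look as follows, reusing the helpers sketched earlier:

```python
import numpy as np

def dch_objective(X, W, c, Ct, P, lam, beta, gamma):
    """Eq. (9): composite KL + codebook alignment + regularization.
    X: d x n features; Ct: n x L expanded codebook; P: n x n feature-space probs."""
    B, Y = frame_codes(X, W, c)                       # from the earlier sketch
    Q = cond_probs(Y.T)                               # relaxed-code similarities
    Q_hat = cond_probs(B.T.astype(float))             # Hamming-space similarities
    ckl = kl_div(P, Q) + beta * kl_div(P, Q_hat)      # eqs. (5)-(6)
    align = ((Y.T - Ct) ** 2).sum()                   # relaxed surrogate of term 2
    return ckl + gamma * align + lam * (W ** 2).sum()
```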

In this way, the proposed DCH overcomes the limitation of algorithms that consider only pairwise relationships and ensures that samples in the same category share the same code word. Furthermore, DCH maximizes the mutual distances between different categories and thereby obtains discriminative binary codes.

3.3. Algorithmic Optimization

The optimization problem has two main variables: $W$ and $c$. Our solution is to use the gradient descent algorithm to find good solutions. To facilitate the derivation, we split the objective function in equation (9) into three parts:

$$\mathcal{L} = \underbrace{\mathcal{L}_{\mathrm{CKL}}}_{\mathcal{L}_1} + \underbrace{\gamma \sum_{i=1}^{n} \|b_i - \tilde{c}_i\|_2^2}_{\mathcal{L}_2} + \underbrace{\lambda \|W\|_F^2}_{\mathcal{L}_3}. \tag{10}$$

The detailed optimization procedure is presented as follows.

$W$-Step: the corresponding problem is to minimize the following loss function:

$$\min_{W}\ \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3. \tag{11}$$

To compute the optimal $W$, the relevant derivative can be expressed as

$$\frac{\partial \mathcal{L}}{\partial W} = \frac{\partial \mathcal{L}_1}{\partial W} + \frac{\partial \mathcal{L}_2}{\partial W} + \frac{\partial \mathcal{L}_3}{\partial W}. \tag{12}$$

The derivative of $\mathcal{L}_1$ w.r.t. $W$ can be computed by the chain rule as follows:

$$\frac{\partial \mathcal{L}_1}{\partial W} = \sum_{i=1}^{n} x_i \left( \frac{\partial \mathcal{L}_1}{\partial y_i} \circ \frac{\partial y_i}{\partial z_i} \right)^{\!\top}, \tag{13}$$

where $z_i = W^{\top} x_i + c$, and $\partial \mathcal{L}_1 / \partial y_i$ and $\partial y_i / \partial z_i$ are represented as

$$\frac{\partial \mathcal{L}_1}{\partial y_i} = 2 \sum_{j} \left[ (p_{j|i} - q_{j|i}) + \beta\, (p_{j|i} - \hat{q}_{j|i}) \right] (y_i - y_j), \qquad \frac{\partial y_i}{\partial z_i} = y_i \circ (1 - y_i). \tag{14}$$

Following the norm derivation law, $\mathcal{L}_2$ can be differentiated as follows:

$$\frac{\partial \mathcal{L}_2}{\partial W} = 2\gamma \sum_{i=1}^{n} x_i \left[ (y_i - \tilde{c}_i) \circ y_i \circ (1 - y_i) \right]^{\top}, \tag{15}$$

where $\circ$ indicates that the elements in the same position of two matrices are multiplied. For $\mathcal{L}_3$, we have the derivative

$$\frac{\partial \mathcal{L}_3}{\partial W} = 2\lambda W. \tag{16}$$

$c$-Step: the subproblem of $c$ is given by

$$\min_{c}\ \mathcal{L}_1 + \mathcal{L}_2. \tag{17}$$

The derivative w.r.t. $c$ can be expressed as

$$\frac{\partial \mathcal{L}}{\partial c} = \frac{\partial \mathcal{L}_1}{\partial c} + \frac{\partial \mathcal{L}_2}{\partial c}. \tag{18}$$

The derivative of $\mathcal{L}_1$ is described as follows:

$$\frac{\partial \mathcal{L}_1}{\partial c} = \sum_{i=1}^{n} \frac{\partial \mathcal{L}_1}{\partial y_i} \circ \frac{\partial y_i}{\partial c}, \tag{19}$$

where

$$\frac{\partial y_i}{\partial c} = y_i \circ (1 - y_i). \tag{20}$$

The second term of equation (18) is described as follows:

$$\frac{\partial \mathcal{L}_2}{\partial c} = 2\gamma \sum_{i=1}^{n} (y_i - \tilde{c}_i) \circ y_i \circ (1 - y_i). \tag{21}$$

Algorithm 2 describes the overall algorithm optimization process of the proposed DCH.
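To illustrate the $W$-step and $c$-step updates, here is a sketch of one gradient iteration for the codebook and regularization terms (equations (15), (16), and (21) as reconstructed above); the composite-KL gradient of equations (13)–(14) and (19)–(20) is omitted for brevity, and all names are ours:

```python
import numpy as np

def dch_step(X, W, c, Ct, eta, lam, gamma):
    """One gradient-descent step on L2 + L3 w.r.t. W and c.
    X: d x n; W: d x L; c: L-dim bias; Ct: n x L expanded codebook."""
    Y = 1.0 / (1.0 + np.exp(-(W.T @ X + c[:, None])))   # L x n relaxed codes
    E = (Y - Ct.T) * Y * (1.0 - Y)                      # residual through sigmoid
    grad_W = 2.0 * gamma * (X @ E.T) + 2.0 * lam * W    # eqs. (15) + (16)
    grad_c = 2.0 * gamma * E.sum(axis=1)                # eq. (21)
    return W - eta * grad_W, c - eta * grad_c
```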

3.4. Complexity Analysis

The time complexity of the entire training process of SMVH [34] is approximately $O(T n^2 L)$, dominated by constructing the pairwise similarity matrices, and the proposed DCH adds two time-consuming parts on this basis. The first part is the learning process of the codebook $C$, whose time complexity is $O(T K^2 L)$. The second part is the optimization of equations (15) and (21), whose joint time complexity is $O(n d L)$ in each iteration. Therefore, the overall time complexity of DCH is $O(T(n^2 + nd + K^2)L)$. In this work, the complexities $O(T K^2 L)$ and $O(T n d L)$ can be ignored because $K \ll n$ and $d < n$, so our complexity is nearly $O(T n^2 L)$. Additionally, the calculation of the hash codes for a query is a linear projection with a time complexity of approximately $O(dL)$, and the online search can be performed with XOR operations. Although the algorithm proposed in this paper adds a constraint to SMVH, the maximum number of iterations directly affects the training time of the algorithm. Subsequent experiments show that DCH converges in fewer iterations; thus, the time complexity of DCH is within a reasonable range.

4. Experiments

In this section, we first introduce the datasets used in this paper; then, we describe the baselines and some experimental details; finally, we present the experimental results.

4.1. Datasets

CC_WEB_VIDEO [4] is the most widely used dataset in near-duplicate video retrieval (NDVR) research and contains videos collected from YouTube, Google, and Yahoo. It consists of 12,877 videos divided into 24 query sets, and keyframes are extracted by uniform sampling to represent each video. Since some videos have no label information, we take the 3,482 labeled videos as the experimental dataset. In each category, we select a portion of the videos as the training set and the remainder as the testing set. We uniformly extract 10 keyframes for each video and represent each keyframe with 4096-dimensional features from the pretrained VGG-19 network.

HMDB51 [38] contains 6,766 human action videos collected from movies and other public sources such as YouTube. The dataset is divided into 51 categories, each of which includes approximately 100 clips. In each category, we randomly select 45 video samples; 25 of them are added to the training set, and the rest are assigned to the testing set. We uniformly extract 10 keyframes for each video, and the pretrained VGG-19 network is used to extract 4096-dimensional deep features.

UCF101 [39] contains 13,320 videos divided into 101 human behavior categories, such as sports, playing instruments, and human-object interactions, and is widely used for action recognition. We randomly select 70 videos from each category for the training set and 30 videos for the testing set. For each video, 10 keyframes are uniformly selected to represent the video, and VGG-19 is used to extract 4096-dimensional features for each keyframe.

4.2. Experimental Setting
4.2.1. Baselines

Several state-of-the-art hashing methods, including ITQ [19], SH [20], DSH [21], LFH [22], KSH [23], JMVH [35], and SMVH [34], are used for comparison. Among these methods, ITQ, SH, and DSH are unsupervised hashing methods, while LFH, KSH, JMVH, and SMVH are supervised hashing methods. For the comparative tests, we use the publicly released source codes. JMVH and SMVH can also be used for multiview video retrieval, but in this paper, we test them as single-view methods. It is worth noting that all the experimental results are obtained in MATLAB R2016a on the same computer with an Intel Core i7-6700 CPU @ 3.40 GHz, 72 GB of RAM, and the 64-bit Windows 10 operating system.

4.2.2. Evaluation Metrics

We use four popular evaluation metrics to comprehensively evaluate the experimental results. The mean average precision (mAP) is widely used in the retrieval field; the higher the mAP score, the better the retrieval performance. The precision@K curve plots the precision against the number K of first retrieved samples, where precision is the proportion of correctly retrieved videos among all retrieved videos. The recall@K curve plots the average recall against the number K of first retrieved samples, where recall is the proportion of correctly retrieved videos among all relevant (e.g., near-duplicate) video samples. The precision-recall (PR) curve evaluates the reliability of retrieval results and is widely used in fields such as medicine and machine learning.
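For reference, a minimal sketch (ours) of Hamming-distance ranking and the mAP computation over a set of queries:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query (XOR + popcount)."""
    return np.argsort((db_codes ^ query_code).sum(axis=1), kind="stable")

def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for k, idx in enumerate(ranked_ids, start=1):
        if idx in relevant:
            hits += 1
            precisions.append(hits / k)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(query_codes, db_codes, relevant_lists):
    """mAP: average AP over all queries."""
    aps = [average_precision(hamming_rank(q, db_codes), rel)
           for q, rel in zip(query_codes, relevant_lists)]
    return float(np.mean(aps))
```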

4.2.3. Parameter Selection

We have three model parameters, $\lambda$, $\beta$, and $\gamma$, as well as the maximum number of iterations $T$. Following SMVH [34], we set $\lambda$ and $\beta$ to the values used in the original work. As shown in Figure 2(a), the results are stable across the three datasets when $\gamma$ lies in a wide range up to 1; therefore, we choose $\gamma$ empirically for our model. The maximum number of iterations $T$ determines the training time cost and affects the performance, so it is worth discussing. Figure 2(b) shows the effect of the maximum number of iterations, in the range of 100 to 1400, on mAP performance. For HMDB51, the best mAP is obtained at an intermediate value of $T$ before the performance decreases. However, that value is not optimal on the other two datasets. Therefore, after comprehensive consideration, we fix $T$ as a compromise across the three datasets.

4.3. Results and Discussion

Table 1 shows the mAP results for different hash code lengths on the three datasets, and the results for the other evaluation metrics are shown in Figures 3–5. We give a detailed analysis of the results on the three datasets below.

According to Table 1, the mAPs on the CC_WEB_VIDEO dataset are very high because videos of the same category are near-duplicates. As shown in Table 1, the proposed DCH outperforms all the other methods from 32 to 64 bits. When the code length is 96 bits, the mAP of DCH is slightly lower than that of LFH. As shown in Figure 3, the precision@K and recall@K results of our method are equal to or slightly higher than those of most other methods. Moreover, as the code length increases, the performance of the proposed DCH gradually surpasses that of the other methods; Figures 3(i)–3(l) show that the area under the PR curve of DCH gradually increases.

Table 1 shows that the proposed DCH performs better than the other hashing methods in most cases on the HMDB51 dataset. Although JMVH surpasses DCH in mAP at 32 bits, the mAPs of the proposed DCH are better than those of all the comparison methods at the longer code lengths. Figure 4 shows that when the hash code length is larger than 32 bits, DCH outperforms the other methods on the precision@K, recall@K, and PR curves.

For the UCF101 dataset, DCH obtains the optimal experimental results at 32, 48, and 64 bits. It is worth noting that the UCF101 dataset is relatively large, and SMVH cannot obtain discriminative video hash codes when the hash code length is very small; therefore, SMVH has no experimental results for the smallest code lengths. As shown in Figure 5, the performance of DCH is much higher than those of the other methods except JMVH. Based on Figures 5(e)–5(h), the recall of DCH for positive samples is slightly lower than that of JMVH. Figures 5(i)–5(k) show that the PR performance of DCH from 32 to 48 bits is better than those of all the other methods.

5. Conclusion

In this paper, we propose a novel supervised video hashing framework, termed discriminative codebook hashing (DCH), which generates discriminative binary codes for video retrieval. The proposed DCH encourages samples within the same category to converge to the same code word and maximizes the mutual distances between different categories. Specifically, we generate a discriminative codebook to distinguish samples of different categories more accurately. Extensive experimental results show that DCH achieves significant improvements over several state-of-the-art methods. In future work, we will use a smaller matrix to store the similarity information between samples, which avoids the considerable training time and space consumed when the amount of data is large; this will improve the performance of the model while reducing the time complexity.

Data Availability

CC_WEB_VIDEO dataset can be downloaded from http://vireo.cs.cityu.edu.hk/webvideo/, the HMDB51 dataset can be downloaded from https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#dataset, and the UCF101 dataset can be downloaded from https://www.crcv.ucf.edu/data/UCF101.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest in the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61902087, 61772149, 61936002, and 6202780103), Guangxi Science and Technology Project (nos. 2019GXNSFFA245014, AD18281079, AA18118039, and AD18216004), and Guangxi Key Laboratory of Image and Graphic Intelligent Processing (no. GIIP2001).