Table of Contents Author Guidelines Submit a Manuscript
Mathematical Problems in Engineering
Volume 2015, Article ID 293176, 9 pages
Research Article

Obtaining Cross Modal Similarity Metric with Deep Neural Architecture

1School of Computers, Beijing University of Posts and Telecommunications, Beijing 100876, China
2Engineering Research Center of Information Networks, Ministry of Education, Beijing 100876, China

Received 14 September 2014; Accepted 24 December 2014

Academic Editor: Florin Pop

Copyright © 2015 Ruifan Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Analyzing complex system with multimodal data, such as image and text, has recently received tremendous attention. Modeling the relationship between different modalities is the key to address this problem. Motivated by recent successful applications of deep neural learning in unimodal data, in this paper, we propose a computational deep neural architecture, bimodal deep architecture (BDA) for measuring the similarity between different modalities. Our proposed BDA architecture has three closely related consecutive components. For image and text modalities, the first component can be constructed using some popular feature extraction methods in their individual modalities. The second component has two types of stacked restricted Boltzmann machines (RBMs). Specifically, for image modality a binary-binary RBM is stacked over a Gaussian-binary RBM; for text modality a binary-binary RBM is stacked over a replicated softmax RBM. In the third component, we come up with a variant autoencoder with a predefined loss function for discriminatively learning the regularity between different modalities. We show experimentally the effectiveness of our approach to the task of classifying image tags on public available datasets.