Research Article

Obtaining Cross Modal Similarity Metric with Deep Neural Architecture

Figure 5

The CCA system used in our experiments. Starting from the lower left corner, the image modality is represented using MPEG-7 and gist descriptors, forming a vector of size 1,704. A Gaussian RBM with 1,704 visible neurons and 1,024 hidden neurons then learns the image representation. Starting from the lower right corner, the text modality is represented using a bag-of-words (BoW) model, forming a vector of size 4,000. A replicated softmax RBM with 4,000 visible neurons and 1,024 hidden neurons then learns the text representation. Finally, a CCA model with 1,024 twin inputs and 1,024 twin outputs is built on these bimodal representations.
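
The data flow in this figure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Only the layer sizes (1,704, 4,000, and 1,024) come from the caption. Everything else is an assumption made for illustration: the function names `rbm_hidden`, `inv_sqrt`, `fit_cca`, and `demo` are hypothetical, both RBMs are reduced to a sigmoid hidden layer with random, untrained weights (no Gaussian or replicated softmax training is performed), the inputs are synthetic, and CCA is the classical linear form solved by whitening and SVD with an assumed ridge term `reg`.

```python
import numpy as np

# Layer sizes taken from the figure caption.
N_IMAGE_VISIBLE = 1704   # MPEG-7 + gist descriptor vector
N_TEXT_VISIBLE = 4000    # bag-of-words vector
N_HIDDEN = 1024          # hidden units per RBM; also the CCA input/output size


def rbm_hidden(v, W, b):
    """Hidden-unit activation probabilities sigmoid(v W + b).

    Stand-in for a trained RBM's hidden layer (assumption: weights are given).
    """
    return 1.0 / (1.0 + np.exp(-(v @ W + b)))


def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def fit_cca(X, Y, n_components, reg=1e-3):
    """Classical linear CCA via whitening and SVD.

    reg is an assumed ridge term that keeps the covariances invertible.
    Returns the means, the two projection matrices, and the correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :n_components]     # projection for the image branch
    B = Ky @ Vt.T[:, :n_components]  # projection for the text branch
    return mx, my, A, B, s[:n_components]


def demo(n_samples=200, n_img=N_IMAGE_VISIBLE, n_txt=N_TEXT_VISIBLE,
         n_hid=N_HIDDEN, seed=0):
    """Run synthetic data through both branches and the CCA layer."""
    rng = np.random.default_rng(seed)
    # Synthetic inputs: real-valued image descriptors, word-count text vectors.
    images = rng.standard_normal((n_samples, n_img))
    texts = rng.poisson(0.05, size=(n_samples, n_txt)).astype(float)
    # Random, untrained weights stand in for the two trained RBMs.
    W_img = 0.01 * rng.standard_normal((n_img, n_hid))
    W_txt = 0.01 * rng.standard_normal((n_txt, n_hid))
    h_img = rbm_hidden(images, W_img, np.zeros(n_hid))
    h_txt = rbm_hidden(texts, W_txt, np.zeros(n_hid))
    # Twin inputs of size n_hid are mapped to twin outputs of size n_hid.
    mx, my, A, B, corr = fit_cca(h_img, h_txt, n_components=n_hid)
    z_img = (h_img - mx) @ A
    z_txt = (h_txt - my) @ B
    return z_img, z_txt, corr


if __name__ == "__main__":
    z_img, z_txt, corr = demo()
    print(z_img.shape, z_txt.shape)
```

Under these assumptions, `z_img` and `z_txt` lie in a shared space, so an image and a text could be compared there directly, for example by cosine similarity. Whether the paper scores pairs this way is not stated in the caption.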