Pyramidal Part-Based Model for Partial Occlusion Handling in Pedestrian Classification
Advances in Multimedia publishes research on the technologies associated with multimedia systems, including computer-media integration for digital information processing, storage, transmission, and representation.
Advances in Multimedia maintains an Editorial Board of practicing researchers from around the world, to ensure manuscripts are handled by editors who are experts in the field of study.
Latest Articles
High Dynamic Range Imaging Based on Bidirectional Structural Similarities and Weighted Low-Rank Matrix Completion
High dynamic range (HDR) imaging, which aims to increase the dynamic range of an image by merging multiexposure images, has attracted much attention. Ghosting artifacts often appear in the resulting image because of camera motion and object motion in the scene. Low-rank matrix completion (LRMC) provides an effective tool to remove ghosts. However, it requires the user to specify the included or excluded regions. In this paper, we propose a novel HDR imaging method based on bidirectional structural similarities and weighted low-rank matrix completion. In our method, we first propose bidirectional structural similarities, comprising forward-projection structural similarity (FPSS) and backward-projection structural similarity (BPSS), to divide each image into four regions: the motion region, the saturated region in the source image, the saturated region in the reference image, and the static and unsaturated region. Then, the weight maps and the motion maps constructed from FPSS and BPSS are introduced into the weighted LRMC model to reconstruct the background irradiance maps. Experiments are conducted on several challenging image sets with complex scenes, and the results show that the proposed method outperforms three current state-of-the-art methods and Photoshop CS6 and is robust to the choice of reference image.
A Color-Image Encryption Scheme Using a 2D Chaotic System and DNA Coding
This paper proposes a password-protected image encryption method for secure sharing, based on deoxyribonucleic acid (DNA) sequence operations and the tangent-delay ellipse reflecting cavity-map system (TD-ERCS). The initial values of the TD-ERCS system are generated from a user’s password, and the TD-ERCS system is used to scramble the pixel locations of the R, G, and B matrices of the original image. Next, the permuted color image is encoded into three DNA-sequence matrices. Then, the TD-ERCS system is employed to generate three chaotic sequences, which are DNA-encoded into three matrices. Thereafter, a DNA exclusive OR (XOR) operation is executed between the DNA sequences of the permuted image and the DNA sequences generated by the TD-ERCS system to produce three encrypted scrambled matrices. Finally, the matrices of the DNA sequences are decoded, and the R, G, and B channels are recombined to form an encrypted color image. The results of simulation and security tests indicate that the proposed algorithm offers robust encryption and resists exhaustive, statistical, and differential attacks.
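The DNA encoding and DNA XOR steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the coding rule (A=00, C=01, G=10, T=11) is one assumed choice among the valid DNA coding rules, the function names are hypothetical, and the key stream is passed in as plain bytes rather than generated by TD-ERCS.

```python
# Assumed coding rule: A=00, C=01, G=10, T=11 (one of several valid rules).
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(b >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base string (length divisible by 4) back to bytes."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        b = 0
        for ch in seq[i:i + 4]:
            b = (b << 2) | BASES.index(ch)
        out.append(b)
    return bytes(out)


def dna_xor(a: str, b: str) -> str:
    """Base-wise XOR of two equal-length DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return "".join(BASES[BASES.index(x) ^ BASES.index(y)] for x, y in zip(a, b))


# Hypothetical channel data and key stream (stand-in for a chaotic sequence).
channel = bytes([27, 200, 5])
key = bytes([255, 16, 99])
cipher_dna = dna_xor(bytes_to_dna(channel), bytes_to_dna(key))
recovered = dna_to_bytes(dna_xor(cipher_dna, bytes_to_dna(key)))
```

Because XOR is its own inverse, applying `dna_xor` with the same key sequence a second time recovers the original channel, which is why decryption can reuse the same password-derived sequences.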
Euclidean Distance-Based Weighted Prediction for Merge Mode in HEVC
Merge mode can achieve a considerable coding gain by reducing the cost of coding motion information in video codecs. However, simply adopting the motion information from neighbouring blocks may not achieve optimal performance, because the motion correlation between a pixel and a neighbouring block decreases as the distance between them increases. To address this problem, the paper proposes a Euclidean distance-based weighted prediction algorithm as an additional candidate in the merge mode. First, several predicted blocks are generated by motion compensation prediction (MCP) with the motion information from the available neighbouring blocks. Second, an additional predicted block is generated as a weighted average of these predicted blocks, where each weighting coefficient depends on the Euclidean distance from the neighbouring candidate to the pixel in the current block. Finally, the best merge mode is selected by rate distortion optimization (RDO) among the original merge candidates and the additional candidate. Experimental results show that, on the joint exploration test model 7.0 (JEM 7.0), the proposed algorithm achieves better coding performance than the original merge mode under all configurations, including random access (RA), low delay B (LDB), and low delay P (LDP), with a slight increase in coding complexity. For the LDP configuration in particular, the proposed method achieves a 1.50% bitrate saving on average.
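The weighted-average step can be sketched as follows. The abstract only states that the weights are related to Euclidean distance, so the inverse-distance weighting used here is an assumption, as are the function name and the convention that each candidate is represented by a single (x, y) position outside the current block.

```python
import math


def weighted_merge_prediction(preds, positions, width, height):
    """Blend candidate predictions with inverse-distance weights (assumed rule).

    preds: list of height x width blocks (lists of rows), one per candidate.
    positions: (x, y) location of each neighbouring candidate, assumed to
        lie outside the current block so that no distance is zero.
    """
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = []
            for px, py in positions:
                d = math.hypot(x - px, y - py)
                if d == 0:
                    raise ValueError("candidate position must lie outside the block")
                weights.append(1.0 / d)
            total = sum(weights)
            out[y][x] = sum(w * p[y][x] for w, p in zip(weights, preds)) / total
    return out


# Hypothetical 2x2 block with a left candidate and an above candidate.
left_pred = [[0.0, 0.0], [0.0, 0.0]]
above_pred = [[100.0, 100.0], [100.0, 100.0]]
blended = weighted_merge_prediction(
    [left_pred, above_pred], [(-1, 0), (0, -1)], width=2, height=2
)
```

In this toy case the top-left pixel is equidistant from both candidates and receives their plain average, while pixels closer to one candidate lean toward that candidate's prediction.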
A Human-Computer Interaction System for Agricultural Tools Museum Based on Virtual Reality Technology
Traditional museums and most digital museums use window displays to exhibit their collections. However, agricultural tools are distinctive for their practical use and the wisdom they embody. Therefore, this paper first proposes a virtual interactive display method for agricultural tools based on virtual reality technology, which combines static display with dynamic use to present the tools vividly. To address the problems of rigid interaction and poor user experience in human-computer interaction, four human-computer interaction technologies are proposed to design and construct a more user-friendly system: intelligent scene-switching technology, multichannel introduction technology, interactive virtual roaming technology, and task-based interactive technology. The evaluation results demonstrate that the proposed system performs well in fluency, instructiveness, amusement, and practicability. This human-computer interaction system can not only show the wisdom of traditional Chinese agricultural tools to users around the world but also offer a new method of digital museum design.
Image Hashing for Tamper Detection with Multiview Embedding and Perceptual Saliency
Perceptual hashing for tamper detection has been intensively investigated owing to its speed and memory efficiency. Recent research has shown that leveraging supervised information can help learn high-quality hash codes. However, most existing methods generate hash codes by treating every region equally, ignoring the differences in perceptual saliency that relate to semantic information. We argue that verifying the integrity of salient objects is more critical, since the semantic content is closely tied to them. In this paper, we propose a Multi-View Semi-supervised Hashing algorithm with Perceptual Saliency (MV-SHPS), which incorporates supervised information and multiple features into hash learning simultaneously. Our method calculates the image hashing distance by taking perceptual saliency into account rather than directly considering the distance between whole images. Extensive experiments on benchmark datasets have validated the effectiveness of our proposed method.
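The idea of weighting the hash distance by saliency can be illustrated with a small sketch. It is not the MV-SHPS algorithm: it assumes each image is split into regions with one fixed-length binary hash per region and one non-negative saliency score per region, and the function names and normalization are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def saliency_weighted_distance(hash_a, hash_b, saliency, bits):
    """Saliency-weighted mean of per-region normalized Hamming distances.

    hash_a, hash_b: per-region hashes as integers of `bits` bits each.
    saliency: per-region non-negative weights (assumed to be given).
    """
    if not (len(hash_a) == len(hash_b) == len(saliency)):
        raise ValueError("inputs must have the same number of regions")
    total = sum(saliency)
    if total <= 0:
        raise ValueError("saliency weights must sum to a positive value")
    weighted = sum(
        s * hamming(x, y) / bits for x, y, s in zip(hash_a, hash_b, saliency)
    )
    return weighted / total


# Hypothetical two-region images: only the first region differs.
original = [0b1111, 0b0000]
tampered = [0b0000, 0b0000]
d_salient = saliency_weighted_distance(original, tampered, [0.9, 0.1], bits=4)
d_background = saliency_weighted_distance(original, tampered, [0.1, 0.9], bits=4)
```

The same bit change yields a larger distance when it falls in a highly salient region than when it falls in the background, which is the behaviour the abstract argues for.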
Video Scene Detection Using Compact Bag of Visual Word Models
Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping shots by similar visual contents gives a better understanding of the video scene; this grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are then represented by a set of key frames. Key frames are further represented by BoVW feature vectors, which are quite short and compact compared to classical BoVW model implementations. Two variations of the BoVW model are used: the classical BoVW model and the Vector of Locally Aggregated Descriptors (VLAD), which is an extension of the classical BoVW model. The similarity of the shots is computed by the distances between their key-frame feature vectors within the sliding window of length , rather than comparing each shot with very long lists of shots as has been previously practiced, and the value of is . Experiments on cinematic and drama videos show the effectiveness of our proposed framework. The BoVW is -dimensional vector and VLAD is only -dimensional vector in the proposed model. The BoVW achieves segmentation accuracy, whereas VLAD achieves .
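The sliding-window comparison of shots can be sketched as below. This is an illustrative reading rather than the authors' procedure: it assumes one feature vector per shot, Euclidean distance, and a rule that starts a new scene when a shot is far from every shot in the preceding window. The `window` and `threshold` parameters are hypothetical and do not correspond to values from the abstract.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def scene_boundaries(shot_vectors, window, threshold):
    """Return indices of shots that start a new scene.

    A shot starts a new scene when its nearest neighbour among the
    previous `window` shots is farther away than `threshold`.
    """
    boundaries = []
    for i in range(1, len(shot_vectors)):
        start = max(0, i - window)
        nearest = min(
            euclidean(shot_vectors[i], shot_vectors[j]) for j in range(start, i)
        )
        if nearest > threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical 2-D shot descriptors: three similar shots, two shots from a
# different scene, then a return to the first scene.
shots = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.0, 0.0)]
short_window = scene_boundaries(shots, window=2, threshold=1.0)
long_window = scene_boundaries(shots, window=5, threshold=1.0)
```

With a short window the final shot is treated as a new scene because its similar shots have already left the window; a longer window links it back to the opening shots, at the cost of more comparisons per shot.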