Journal of Robotics
Volume 2017, Article ID 2061827, 7 pages
Research Article

Long Short-Term Memory Projection Recurrent Neural Network Architectures for Piano’s Continuous Note Recognition

1School of Information Science and Technology, Beijing Forestry University, No. 35 Qinghuadong Road, Haidian District, Beijing 100083, China
2National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancundong Road, Haidian District, Beijing 100190, China
3College of Information Science and Technology, Jinan University, No. 601, West Huangpu Avenue, Guangzhou, Guangdong 510632, China

Correspondence should be addressed to Yanyan Xu; xuyanyan@bjfu.edu.cn

Received 10 May 2017; Revised 30 July 2017; Accepted 6 August 2017; Published 12 September 2017

Academic Editor: Keigo Watanabe

Copyright © 2017 YuKang Jia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) suited to time series, and it has achieved good performance in speech recognition and image recognition. Long Short-Term Memory Projection (LSTMP) is a variant of LSTM that adds a projection layer to further improve speed and performance. Because LSTM and LSTMP have performed well in pattern recognition, in this paper we combine them with Connectionist Temporal Classification (CTC) to study continuous piano note recognition for robotics. Using the Beijing Forestry University music library, we conduct experiments that compare the recognition rates and numbers of iterations of single-layer LSTM, single-layer LSTMP, and Deep LSTM (DLSTM, LSTM with multiple layers). The single-layer LSTMP performs much better than the single-layer LSTM in both training time and recognition rate: LSTMP has fewer parameters, which reduces the training time, and the projection layer also improves its recognition performance. The best recognition rate of LSTMP is . As for DLSTM, the recognition rate can reach because of the effectiveness of the deep structure, but compared with the single-layer LSTMP, DLSTM needs more training time.
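
The claim that LSTMP has fewer parameters than LSTM can be illustrated with a simple count. The following is a minimal sketch, not the configuration used in this paper: it assumes a standard LSTM layer without peephole connections and without the output layer, and the layer sizes (`n_input`, `n_cell`, `n_proj`) are hypothetical values chosen only for illustration.

```python
def lstm_param_count(n_input, n_cell):
    """Parameters of one LSTM layer (assumed: no peepholes, no output layer)."""
    # Four blocks (input gate, forget gate, output gate, cell candidate),
    # each with input weights, recurrent weights, and a bias vector.
    return 4 * (n_cell * n_input + n_cell * n_cell + n_cell)


def lstmp_param_count(n_input, n_cell, n_proj):
    """Parameters of one LSTMP layer under the same assumptions."""
    # The recurrent connection now comes from the projected output of
    # size n_proj instead of the full cell output of size n_cell.
    gates = 4 * (n_cell * n_input + n_cell * n_proj + n_cell)
    # Extra projection matrix mapping the cell output down to n_proj units.
    projection = n_proj * n_cell
    return gates + projection


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper.
    n_input, n_cell, n_proj = 40, 512, 128
    print("LSTM :", lstm_param_count(n_input, n_cell))
    print("LSTMP:", lstmp_param_count(n_input, n_cell, n_proj))
```

Under these assumptions the saving comes from the recurrent weights, which shrink from `n_cell * n_cell` to `n_cell * n_proj` per block; the added projection matrix is small by comparison whenever `n_proj` is well below `n_cell`.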