Complexity
Volume 2019, Article ID 9345861, 11 pages
https://doi.org/10.1155/2019/9345861
Research Article

Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

Harbin Institute of Technology, China

Correspondence should be addressed to Hongxun Yao; h.yao@hit.edu.cn

Received 15 September 2018; Revised 3 December 2018; Accepted 19 December 2018; Published 2 January 2019

Guest Editor: Li Zhang

Copyright © 2019 Xinjie Feng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. L. Zhang, A. A. Mohamed, R. Chai, B. Zheng, and S. Wu, “Automated deep-learning method for whole-breast segmentation in diffusion-weighted breast MRI,” in Medical Imaging, SPIE, 2019.
  2. L. Zhang, R. Chai, S. W. Dooman Arefan, and J. Sumkin, “Deep-learning method for tumor segmentation in breast DCE-MRI,” in Medical Imaging, SPIE, 2019.
  3. D. Xie, L. Zhang, and L. Bai, “Deep learning in visual computing and signal processing,” Applied Computational Intelligence and Soft Computing, vol. 2017, Article ID 1320780, 13 pages, 2017.
  4. Z. Zhou, J. Shin, L. Zhang, S. Gurudu, M. Gotway, and J. Liang, “Fine-tuning convolutional neural networks for biomedical image analysis: Actively and incrementally,” in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 4761–4772, USA, July 2017.
  5. L. Zhang, F. Yang, Y. Daniel Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Proceedings of the 23rd IEEE International Conference on Image Processing, ICIP 2016, pp. 3708–3712, Phoenix, AZ, USA, September 2016.
  6. Y. Qi, S. Zhang, L. Qin et al., “Hedging deep features for visual tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  7. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  8. R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 58, no. 1, pp. 267–288, 1996.
  9. P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data, Springer Series in Statistics, Springer, New York, NY, USA, 2011.
  10. N. M. Nasrabadi, “Pattern Recognition and Machine Learning,” Journal of Electronic Imaging, vol. 16, no. 4, p. 049901, 2007.
  11. A. Christmann and D.-X. Zhou, “On the robustness of regularized pairwise learning methods based on kernels,” Journal of Complexity, vol. 37, pp. 1–33, 2016.
  12. Abhishake and S. Sivananthan, “Multi-penalty regularization in learning theory,” Journal of Complexity, vol. 36, pp. 141–165, 2016.
  13. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998.
  14. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
  15. T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, “End-to-end text recognition with convolutional neural networks,” in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 3304–3308, November 2012.
  16. A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, “PhotoOCR: Reading text in uncontrolled conditions,” in Proceedings of the 2013 14th IEEE International Conference on Computer Vision, ICCV 2013, pp. 785–792, Australia, December 2013.
  17. Y. Deng, A. Kanervisto, and A. M. Rush, “What you get is what you see: a visual markup decompiler,” 2016, https://arxiv.org/abs/1609.04938v1.
  18. C.-Y. Lee and S. Osindero, “Recursive recurrent nets with attention modeling for OCR in the wild,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp. 2231–2239, USA, July 2016.
  19. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the ICML 2006: 23rd International Conference on Machine Learning, pp. 369–376, USA, June 2006.
  20. K. Wang, B. Babenko, and S. Belongie, “End-to-end scene text recognition,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1457–1464, IEEE, Barcelona, Spain, November 2011.
  21. L. Neumann and J. Matas, “Scene text localization and recognition with oriented stroke detection,” in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 97–104, December 2013.
  22. C.-Y. Lee, A. Bhardwaj, W. Di, V. Jagadeesh, and R. Piramuthu, “Region-based discriminative feature pooling for scene text recognition,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, pp. 4050–4057, USA, June 2014.
  23. X. Bai, C. Yao, and W. Liu, “A learned multi-scale representation for scene text recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4042–4049, 2014.
  24. J. Almazan, A. Gordo, A. Fornes, and E. Valveny, “Word spotting and recognition with embedded attributes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 12, pp. 2552–2566, 2014.
  25. J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin, “Label Embedding: A Frugal Baseline for Text Recognition,” International Journal of Computer Vision, vol. 113, no. 3, pp. 193–207, 2015.
  26. A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, 2009.
  27. M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, “Reading text in the wild with convolutional neural networks,” International Journal of Computer Vision, vol. 116, no. 1, pp. 1–20, 2016.
  28. A. Hannun, C. Case, J. Casper et al., “Deep speech: Scaling up end-to-end speech recognition,” https://arxiv.org/abs/1412.5567.
  29. D. Amodei, S. Ananthanarayanan, R. Anubhai et al., “Deep speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, pp. 173–182, 2016.
  30. A. Graves and J. Schmidhuber, “Offline handwriting recognition with multidimensional recurrent neural networks,” in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems, NIPS 2008, pp. 545–552, Canada, December 2008.
  31. A. Ul-Hasan, S. B. Ahmed, F. Rashid, F. Shafait, and T. M. Breuel, “Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks,” in Proceedings of the 12th International Conference on Document Analysis and Recognition, ICDAR 2013, pp. 1061–1065, USA, August 2013.
  32. B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298–2304, 2017.
  33. B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition,” CoRR abs/1507.05717, https://arxiv.org/abs/1507.05717.
  34. M. Busta, L. Neumann, and J. Matas, “Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework,” in Proceedings of the 16th IEEE International Conference on Computer Vision, ICCV 2017, pp. 2223–2231, Italy, October 2017.
  35. P. He, W. Huang, Y. Qiao, C. C. Loy, and X. Tang, “Reading scene text in deep convolutional sequences,” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 3501–3508, USA, February 2016.
  36. D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” https://arxiv.org/abs/1409.0473.
  37. J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” https://arxiv.org/abs/1412.7755.
  38. K. Xu, J. L. Ba, R. Kiros et al., “Show, attend and tell: Neural image caption generation with visual attention,” in Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp. 2048–2057, France, July 2015.
  39. M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Proceedings of the 29th Annual Conference on Neural Information Processing Systems, NIPS 2015, pp. 2017–2025, Canada, December 2015.
  40. T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, October 2017.
  41. C.-Y. Lee and S. Osindero, “Recursive recurrent nets with attention modeling for OCR in the wild,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  42. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp. 770–778, July 2016.
  43. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  44. F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, “Learning precise timing with LSTM recurrent networks,” Journal of Machine Learning Research, vol. 3, no. 1, pp. 115–143, 2003.
  45. J. Zhou and W. Xu, “End-to-end learning of semantic role labeling using recurrent neural networks,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1127–1137, Beijing, China, July 2015.
  46. Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks and Learning Systems, vol. 5, no. 2, pp. 157–166, 1994.
  47. R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pp. 2347–2355, USA, June 2013.
  48. G. E. Dahl, T. N. Sainath, and G. E. Hinton, “Improving deep neural networks for LVCSR using rectified linear units and dropout,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '13), pp. 8609–8613, May 2013.
  49. J. S. Bridle, “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing (Les Arcs, 1989), vol. 68, pp. 227–236, Neurocomputing, Springer, Berlin, Germany, 1990.
  50. Y. Chenguang, “chinese_ocr,” 2018, https://github.com/YCG09/chinese_ocr.
  51. I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On The Importance of Initialization And Momentum in Deep Learning,” in International Conference on Machine Learning, pp. 1139–1147, June 2013.