Computational and Mathematical Methods in Medicine
Volume 2015, Article ID 913489, 9 pages
http://dx.doi.org/10.1155/2015/913489
Research Article

Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China

Received 10 November 2014; Revised 14 February 2015; Accepted 24 February 2015

Academic Editor: Stavros J. Hamodrakas

Copyright © 2015 Shengyu Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, “Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems,” Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.
  2. E. F. Tjong Kim Sang and F. De Meulder, “Introduction to the CoNLL-2003 shared task: language-independent named entity recognition,” in Proceedings of the Conference on Computational Natural Language Learning (CoNLL '03), pp. 142–147, Edmonton, Canada, 2003.
  3. L. Smith, L. K. Tanabe, R. J. Ando et al., “Overview of BioCreative II gene mention recognition,” Genome Biology, vol. 9, supplement 2, article S2, 2008.
  4. R. I. Doğan, R. Leaman, and Z. Lu, “NCBI disease corpus: a resource for disease name recognition and concept normalization,” Journal of Biomedical Informatics, vol. 47, pp. 1–10, 2014.
  5. I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, “SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013),” in Proceedings of the 7th International Workshop on Semantic Evaluation, vol. 2, pp. 341–350, 2013.
  6. D. Sanchez-Cisneros, P. Martínez, and I. Segura-Bedmar, “Combining dictionaries and ontologies for drug name recognition in biomedical texts,” in Proceedings of the 7th International Workshop on Data and Text Mining in Biomedical Informatics, pp. 27–30, 2013.
  7. L. He, Z. Yang, H. Lin, and Y. Li, “Drug name recognition in biomedical texts: a machine-learning-based method,” Drug Discovery Today, vol. 19, no. 5, pp. 610–617, 2014.
  8. A. R. Aronson, O. Bodenreider, H. F. Chang et al., “The NLM indexing initiative,” Proceedings of the AMIA Annual Symposium, pp. 17–21, 2000.
  9. I. Segura-Bedmar, P. Martínez, and D. Sánchez-Cisneros, “The 1st DDIExtraction-2011 challenge task: extraction of drug-drug interactions from biomedical texts,” in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction, pp. 1–9, September 2011.
  10. M. Krallinger, F. Leitner, O. Rabal, M. Vazquez, J. Oyarzabal, and A. Valencia, “CHEMDNER: the drugs and chemical names extraction challenge,” Journal of Cheminformatics, vol. 7, supplement 1, article S1, 2015.
  11. M. Krallinger, O. Rabal, F. Leitner et al., “The CHEMDNER corpus of chemicals and drugs and its annotation principles,” Journal of Cheminformatics, vol. 7, supplement 1, article S2, 2015.
  12. R. Leaman, C. H. Wei, and Z. Lu, “tmChem: a high performance approach for chemical named entity recognition and normalization,” Journal of Cheminformatics, vol. 7, supplement 1, article S3, 2015.
  13. Y. Lu, D. Ji, X. Yao, X. Wei, and X. Liang, “CHEMDNER system with mixed conditional random fields and multi-scale word clustering,” Journal of Cheminformatics, vol. 7, supplement 1, article S4, 2015.
  14. B. Tang, Y. Feng, X. Wang et al., “A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature,” Journal of Cheminformatics, vol. 7, supplement 1, article S8, 2015.
  15. H. J. Dai, P. T. Lai, Y. C. Chang, and R. Tsai, “Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization,” Journal of Cheminformatics, vol. 7, supplement 1, article S14, 2015.
  16. J. Björne, S. Kaewphan, and T. Salakoski, “UTurku: drug named entity detection and drug-drug interaction extraction using SVM classification and domain knowledge,” in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), pp. 651–659, Atlanta, Ga, USA, 2013.
  17. B. Tang, H. Cao, Y. Wu, M. Jiang, and H. Xu, “Recognizing clinical entities in hospital discharge summaries using structural support vector machines with word representation features,” BMC Medical Informatics and Decision Making, vol. 13, supplement 1, article S1, 2013.
  18. R. T.-H. Tsai, C.-L. Sung, H.-J. Dai, H.-C. Hung, T.-Y. Sung, and W.-L. Hsu, “NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition,” BMC Bioinformatics, vol. 7, supplement 5, article S11, 2006.
  19. L. Ratinov and D. Roth, “Design challenges and misconceptions in named entity recognition,” in Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL '09), pp. 147–155, June 2009.
  20. Ö. Uzuner, B. R. South, S. Shen, and S. L. DuVall, “2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text,” Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 552–556, 2011.
  21. J. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier, “Introduction to the bio-entity recognition task at JNLPBA,” in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, pp. 70–75, Geneva, Switzerland, August 2004.
  22. A. Yeh, A. Morgan, M. Colosimo, and L. Hirschman, “BioCreAtIvE task 1A: gene mention finding evaluation,” BMC Bioinformatics, vol. 6, supplement 1, article S2, 2005.
  23. T. Rocktäschel, T. Huber, M. Weidlich, and U. Leser, “WBI-NER: the impact of domain-specific features on the performance of identifying and classifying mentions of drugs,” in Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 356–363, 2013.
  24. T. Grego, F. Pinto, and F. M. Couto, “LASIGE: using conditional random fields and ChEBI ontology,” in Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 660–666, 2013.
  25. D. Sanchez-Cisneros and F. A. Gali, “UEM-UC3M: an ontology-based named entity recognition system for biomedical texts,” in Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 622–627, 2013.
  26. A. Collazo, A. Ceballo, D. D. Puig et al., “UMCC_DLSI: semantic and lexical features for detection and classification drugs in biomedical texts,” in Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 636–643, June 2013.
  27. B. Settles, “Biomedical named entity recognition using conditional random fields and rich feature sets,” in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications (JNLPBA '04), pp. 104–107, 2004.
  28. C. Knox, V. Law, T. Jewison et al., “DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs,” Nucleic Acids Research, vol. 39, no. 1, pp. D1035–D1041, 2011.
  29. K. M. Hettne, R. H. Stierum, M. J. Schuemie et al., “A dictionary to identify small molecules and drugs in free text,” Bioinformatics, vol. 25, no. 22, pp. 2983–2991, 2009.
  30. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 3111–3119, December 2013.
  31. G. Forman, “An extensive empirical study of feature selection metrics for text classification,” Journal of Machine Learning Research, vol. 3, pp. 1289–1305, 2003.
  32. Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning, pp. 412–420, 1997.
  33. Z. Zheng, X. Wu, and R. Srihari, “Feature selection for text categorization on imbalanced data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 80–89, 2004.
  34. C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.