Scientific Programming
Volume 2017 (2017), Article ID 3273891, 16 pages
https://doi.org/10.1155/2017/3273891
Research Article

An Efficient Platform for the Automatic Extraction of Patterns in Native Code

¹Computer Science Department, University of Oviedo, Calvo Sotelo s/n, 33007 Oviedo, Spain
²Cork Institute of Technology, Computer Science Department, Rossa Avenue, Bishopstown, Cork, Ireland

Correspondence should be addressed to Francisco Ortin; ortin@uniovi.es

Received 30 September 2016; Revised 26 December 2016; Accepted 17 January 2017; Published 28 February 2017

Academic Editor: Raphaël Couturier

Copyright © 2017 Javier Escalada et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Defense Advanced Research Projects Agency, MUSE envisions mining “big code” to improve software reliability and construction, 2014, http://www.darpa.mil/news-events/2014-03-06a.
  2. F. Ortin, J. Escalada, and O. Rodriguez-Prieto, “Big code: new opportunities for improving software construction,” Journal of Software, vol. 11, no. 11, pp. 1083–1008, 2016.
  3. F. Yamaguchi, M. Lottmann, and K. Rieck, “Generalized vulnerability extrapolation using abstract syntax trees,” in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 359–368, ACM, Los Angeles, Calif, USA, December 2012.
  4. E. Alpaydin, Introduction to Machine Learning, The MIT Press, 2nd edition, 2010.
  5. T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, “Byteweight: learning to recognize functions in binary code,” in Proceedings of the 23rd USENIX Conference on Security Symposium (SEC '14), pp. 845–860, USENIX Association, San Diego, Calif, USA, August 2014.
  6. N. Rosenblum, X. Zhu, B. Miller, and K. Hunt, “Learning to analyze binary computer code,” in Proceedings of the 23rd National Conference on Artificial Intelligence—Volume 2 (AAAI '08), pp. 798–804, AAAI Press, 2008.
  7. N. E. Rosenblum, B. P. Miller, and X. Zhu, “Extracting compiler provenance from program binaries,” in Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE '10), pp. 21–28, ACM, Toronto, Canada, June 2010.
  8. N. Rosenblum, B. P. Miller, and X. Zhu, “Recovering the toolchain provenance of binary code,” in Proceedings of the 20th International Symposium on Software Testing and Analysis (ISSTA '11), pp. 100–110, ACM, Ontario, Canada, July 2011.
  9. I. Santos, Y. K. Penya, J. Devesa, and P. G. Bringas, “N-grams-based file signatures for malware detection,” in Proceedings of the 11th International Conference on Enterprise Information Systems (ICEIS '09), pp. 317–320, AIDSS, 2009.
  10. C. Liangboonprakong and O. Sornil, “Classification of malware families based on N-grams sequential pattern features,” in Proceedings of the 8th IEEE Conference on Industrial Electronics and Applications (ICIEA '13), pp. 777–782, June 2013.
  11. V. Raychev, M. Vechev, and A. Krause, “Predicting program properties from ‘big code’,” in Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '15), pp. 111–124, 2015.
  12. K. Troshina, A. Chernov, and Y. Derevenets, “C decompilation: is it possible?” in Proceedings of the International Workshop on Program Understanding (PSI '09), pp. 18–27, Altai Mountains, Russia, 2009.
  13. E. J. Schwartz, J. Lee, M. Woo, and D. Brumley, “Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring,” in Proceedings of the 22nd USENIX Security Symposium, USENIX, pp. 353–368, Washington, DC, USA, 2013.
  14. A. Fokin, E. Derevenetc, A. Chernov, and K. Troshina, “SmartDec: approaching C++ decompilation,” in Proceedings of the 18th Working Conference on Reverse Engineering (WCRE '11), pp. 347–356, IEEE, October 2011.
  15. Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.
  16. J. D. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” in Proceedings of the 18th International Conference on Machine Learning (ICML '01), pp. 282–289, Morgan Kaufmann, 2001.
  17. J. Escalada and F. Ortin, Source code for the article: An efficient platform for the automatic extraction of patterns in native code, 2016, http://www.reflection.uniovi.es/decompilation/download/2016/sp.
  18. LLVM, clang: a C language family frontend for LLVM, 2016, http://clang.llvm.org.
  19. E. Bachaalany, GitHub: IDAPython, 2016, https://github.com/idapython.
  20. D. Beazley, “Understanding the python GIL,” in Proceedings of the PyCON Python Conference, Atlanta, Ga, USA, February 2010.
  21. D. Phillips, Python 3 Object-Oriented Programming, Packt Publishing Ltd, Livery Place, Birmingham, UK, 2nd edition, 2015.
  22. J. M. Redondo, F. Ortin, and J. M. C. Lovelle, “Optimizing reflective primitives of dynamic languages,” International Journal of Software Engineering and Knowledge Engineering, vol. 18, no. 6, pp. 759–783, 2008.
  23. F. Ortin, L. Vinuesa, and J. M. Felix, “The DSAW aspect-oriented software development platform,” International Journal of Software Engineering and Knowledge Engineering, vol. 21, no. 7, pp. 891–929, 2011.
  24. G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” in Proceedings of the Spring Joint Computer Conference, pp. 483–485, Atlantic City, NJ, USA, April 1967.
  25. N. Rosenblum, X. Zhu, and B. P. Miller, “Who wrote this code? Identifying the authors of program binaries,” in Computer Security—ESORICS 2011: 16th European Symposium on Research in Computer Security, Leuven, Belgium, September 12–14, 2011. Proceedings, vol. 6879 of Lecture Notes in Computer Science, pp. 172–189, Springer, Berlin, Germany, 2011.
  26. E. R. Jacobson, N. Rosenblum, and B. P. Miller, “Labeling library functions in stripped binaries,” in Proceedings of the 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools (PASTE '11), pp. 1–8, ACM, Szeged, Hungary, September 2011.
  27. I. Santos, X. Ugarte-Pedrero, B. Sanz, C. Laorden, and P. G. Bringas, “Collective classification for packed executable identification,” in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS '11), pp. 23–30, Perth, Australia, September 2011.
  28. X. Ugarte-Pedrero, I. Santos, and P. G. Bringas, “Structural feature based anomaly detection for packed executable identification,” in Computational Intelligence in Security for Information Systems: 4th International Conference, CISIS 2011, Held at IWANN 2011, Torremolinos-Málaga, Spain, June 8–10, 2011. Proceedings, vol. 6694 of Lecture Notes in Computer Science, pp. 230–237, Springer, Berlin, Germany, 2011.
  29. C. Cifuentes, D. Simon, and A. Fraboulet, “Assembly to high-level language translation,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '98), pp. 228–237, IEEE, Bethesda, Md, USA, November 1998.
  30. C. Cifuentes and M. Van Emmerik, “Recovery of jump table case statements from binary code,” Science of Computer Programming, vol. 40, no. 2-3, pp. 171–188, 2001.
  31. A. Mycroft, “Type-based decompilation,” in Proceedings of the European Symposium on Programming (ESOP '99), pp. 208–223, 1999.
  32. G. Balakrishnan and T. Reps, “Divine: discovering variables in executables,” in Verification, Model Checking, and Abstract Interpretation: 8th International Conference, VMCAI 2007, Nice, France, January 14–16, 2007. Proceedings, vol. 4349 of Lecture Notes in Computer Science, pp. 1–28, Springer, Berlin, Germany, 2007.
  33. A. Cozzie, F. Stratton, H. Xue, and S. T. King, “Digging for data structures,” in Proceedings of the 8th Conference on Operating Systems Design and Implementation (OSDI '08), pp. 255–266, San Diego, Calif, USA, December 2008.