Mathematical Problems in Engineering
Volume 2015, Article ID 962974, 9 pages
Research Article

Capturing Uncertainty Information and Categorical Characteristics for Network Payload Grouping in Protocol Reverse Engineering

1 School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China
2 School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, China

Received 21 January 2015; Revised 11 May 2015; Accepted 12 May 2015

Academic Editor: Filippo Ubertini

Copyright © 2015 Jian-Zhen Luo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


As a promising tool for recovering the specifications of unknown protocols, protocol reverse engineering has drawn increasing research attention over the last decade. Extracting protocol keywords from network traces is a critical task in protocol reverse engineering. Since messages of different types have different sets of protocol keywords, an effective way to improve the accuracy of keyword extraction is to cluster the network payloads of unknown traffic and analyze each cluster separately to extract its protocol keywords. Although classic algorithms such as K-means and EM can be used for network payload clustering, the quality of the resulting traffic clusters is far from satisfactory when these algorithms are applied to application layer traffic with categorical attributes. In this paper, we propose a novel method that improves the accuracy of protocol reverse engineering by applying a rough set-based technique to cluster application layer traffic. The technique analyzes the multidimensional uncertainty information in multiple categorical attributes, based on rough set theory, to cluster network payloads, and applies the Minimum Description Length criterion to determine the optimal number of clusters. Experiments show that our method outperforms the existing algorithms and improves the results of protocol keyword extraction.
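
To make the rough set notion of uncertainty over categorical attributes concrete, the following is a minimal Python sketch of the standard lower and upper approximations and the resulting roughness measure. It is an illustration only and not the clustering algorithm proposed in this paper; the attribute names (`first_token`, `has_crlf`), the toy payload records, and the helper function names are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, attrs):
    """Group record indices that take identical values on every attribute in attrs."""
    classes = defaultdict(set)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in attrs)
        classes[key].add(idx)
    return list(classes.values())


def approximations(records, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(records, attrs):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


def roughness(records, attrs, target):
    """Roughness = 1 - |lower| / |upper|; 0 means attrs describe target exactly."""
    lower, upper = approximations(records, attrs, target)
    return 1.0 - len(lower) / len(upper) if upper else 0.0


# Hypothetical categorical features extracted from five payloads.
payloads = [
    {"first_token": "GET", "has_crlf": "yes"},
    {"first_token": "GET", "has_crlf": "yes"},
    {"first_token": "POST", "has_crlf": "no"},
    {"first_token": "HELO", "has_crlf": "no"},
    {"first_token": "GET", "has_crlf": "no"},
]

# Target set: indices of payloads whose first token is "GET".
target = {i for i, p in enumerate(payloads) if p["first_token"] == "GET"}

lower, upper = approximations(payloads, ["has_crlf"], target)
rough_crlf = roughness(payloads, ["has_crlf"], target)
rough_token = roughness(payloads, ["first_token"], target)
```

In this toy data, `has_crlf` alone cannot separate the `GET` payloads from the others, so the lower and upper approximations differ and the roughness is greater than zero, whereas `first_token` describes the target set exactly and gives a roughness of zero. A rough set-based clustering technique can use quantities of this kind to judge how well a categorical attribute partitions the payloads.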