Abstract

Security analysis of protocols at the theoretical level cannot guarantee the security of protocol implementations. Researchers have addressed this problem with techniques such as model extraction and code generation. However, the existing methods do not take the security of protocol implementations into account. In this paper, we propose exploiting the traces of function return values to analyze the security of protocol implementations at the source code level. Taking classic protocols as examples (the Needham-Schroeder protocol and the Diffie-Hellman protocol, neither of which resists man-in-the-middle attacks), we analyze man-in-the-middle attacks on the protocol implementations and carry out experiments. The experiments show that the new method works well. Unlike other methods in the literature for analyzing the security of protocol implementations, our method avoids some flaws of programming languages (such as C memory access and pointer analysis) and analyzes the security of protocol implementations dynamically.

1. Introduction

With the rapid development of network communication, information security is becoming more and more important [1, 2]. Protocols are usually applied to protect network information from attacks. However, general methods (e.g., formal methods, computational models, and computationally sound formal methods) cannot guarantee the security of protocols once they are implemented. That is, even if a protocol has been theoretically proved secure, insecure factors (such as the language characteristics of the protocol's source code and the operating environment of the implementation) arise when it is implemented at the source code level. Therefore, researchers focus on the security analysis of protocol implementations at the source code level [3].

When a protocol is implemented at the source code level, it is difficult to guarantee the security of the protocol specifications because of language characteristics (such as C memory access and pointer analysis). Hence, analyzing the security of protocol implementations at the source code level is more complex than analyzing protocols at the theoretical level. To avoid the insecure factors that languages introduce, several methods have been proposed for analyzing the security of protocol implementations at the source code level. Two are representative: model extraction and code generation. Model extraction is applied to avoid the explosion of the concrete state space. During extraction, an abstract mapping is set up that maps a concrete protocol model onto a corresponding abstract model and its properties onto corresponding abstract properties. If the security properties of the protocol on the abstract model have been proved sound and the abstract mapping has been proved reliable, then the security properties on the concrete model are proved. This guarantees the security of the protocol implementation and demonstrates its reliability. Related research achievements include [4–7]. Sometimes flaws arise in protocol implementations because of design imperfections, which makes the implementations insecure (for example, the SSL and TLS protocols). To avoid these cases, code generation is applied: protocol specifications are analyzed before implementation, and then a refinement mapping and some related running choices (such as the concrete programming language or running environment) are applied to map an abstract program model onto a corresponding concrete model and its abstract properties onto corresponding concrete properties. If the security properties of the protocol on the abstract model have been proved sound and the mapping has been proved reliable, the properties on the concrete model are proved. Related research achievements include [8–10]. Model extraction and code generation are now widely applied to protocol security analysis. However, neither can guarantee the security of protocol implementations, because of programming language flaws; this causes a gap between protocol security analysis at the theoretical level and the security analysis of protocol implementations at the source code level [11, 12].

The idea behind our new method for analyzing the security of protocol implementations at the source code level derives from an everyday phenomenon. When an object moves from A to B in an environment without any barriers (called the ideal environment), it produces a trace, which is called the ideal trace. In nonideal environments, the object is attacked by a third party, and its trace (called a nonideal trace) deviates from the ideal trace (the detailed definitions of ideal trace and nonideal trace are given in Section 5). If the moving behavior is rectified after the object is attacked, the deviation of its nonideal trace from the ideal trace becomes smaller. Based on this idea, we propose a new method that analyzes the security of protocol implementations by means of traces. The traces consist of the sets of function return values produced when a protocol is implemented. The method works as follows: we use the ideal trace and cluster analysis (including the degree of deviation and the similarity of the trace sequences) as the reference for evaluating protocol security. To validate the method, we refine the source code of the Needham-Schroeder protocol as an example, using strong simulation from the π-calculus, and set up the security analysis model of protocol implementations with labelled transition systems [13]. In addition, taking the Needham-Schroeder protocol and the Diffie-Hellman protocol as examples, we verify man-in-the-middle attacks on OpenSSL with our method. The specific steps of the security analysis are shown as a flow graph in Figure 1.

Our original contributions are as follows:

(1) We introduce the ideal trace as the reference for evaluating the security of protocol implementations at the source code level and set up a bijection that maps events onto traces.

(2) We propose a method for analyzing the security of protocol implementations by comparing the similarity and the deviation between nonideal traces and the ideal trace when protocols are implemented at the source code level.

The remainder of this article is organized as follows: Section 2 summarizes related work; Section 3 gives the preliminaries; Section 4 builds the new model; Section 5 presents the security analysis of protocols; Section 6 reports experiments on classic protocol implementations; Section 7 concludes and discusses future work.

2. Related Work

Analyzing the security of protocol implementations at the source code level is practical and valuable. In recent years, researchers have focused on this field and have made considerable progress [14].

The security analysis of protocol implementations at the source code level is very complex and differs from common protocol security analysis [15], because it must take the programming language structure and the running environment into account. To address this problem, [4] analyzes the security of an implementation of the Needham-Schroeder protocol written in C. In [4], C annotations act as trust assertions, and a trust assertion model is established for protocol security analysis by means of Horn logic. Some researchers have used function libraries and intermediate languages, such as the C language, to automatically analyze the security of protocol implementations. This approach avoids the problems that C language structure brings about (such as pointer operations and buffer overflows). The related literature is [5–7, 16, 17]. Similarly, some researchers have applied the latest techniques or tools to the security analysis of protocol implementations at the source code level, which is a new direction in this field. For example, based on VCC, the C application program interface (API) has been applied to the security analysis of protocol implementations at the source code level. Reference [18] proposes a general method for protocol security analysis that distinguishes two different items mapped onto the same array. In [19], a general verification method is applied to practical TPM and HSM platforms. Reference [20] exploits interface constraints and program logic reasoning to analyze the security of protocol implementations at the source code level. To reduce the difficulty that C language structure causes when analyzing the security of protocol implementations, researchers have also exploited C compilers for the analysis. Examples are [21, 22].

Transport Layer Security (TLS) has been widely deployed, so analyzing the security of protocol implementations based on TLS is a focus of research. For example, [23] proposes applying the OpenSSL crypto library to the security analysis of TLS implementations in a C environment. Another example is [24], which proposes analyzing the security of protocols on the basis of the control-flow integrity of message authentication codes (MACs). This method takes into account that an adversary can attack protocols by means of C/C++ pointers and memory leaks during protocol implementations.

The methods mentioned above have advanced research on the security analysis of protocol implementations. However, they cannot settle the security problems caused by the inherent flaws of programming language structures. In contrast, our new method avoids the flaws of programming language structure and dynamically analyzes the security of protocol implementations at the source code level. Hence, it is helpful and valuable for protocol design, security verification, and security evaluation [25].

3. Preliminaries

To understand our new method, some basic knowledge of labelled transition systems, strong simulation, and program refinement is necessary.

3.1. Labelled Transition System and Strong Simulation

In automata theory, a labelled transition system is a transition model that consists of a start point, input labels, and a terminal point. Unlike an automaton, however, a labelled transition system has no fixed start states or accepting states, so any state in it can act as a start state. This gives a labelled transition system an advantage over other systems when dynamically analyzing the security of protocol implementations at the source code level.

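Definition 1 (labelled transition system [13]). A labelled transition system (LTS) over Act is a pair $(Q, T)$ consisting of
(1) a set $Q$ of states;
(2) a ternary relation $T \subseteq Q \times \mathrm{Act} \times Q$, known as a transition relation.
If $(q, \alpha, q') \in T$, we write $q \xrightarrow{\alpha} q'$, and we call $q$ the source and $q'$ the target of the transition. If $q \xrightarrow{\alpha_1} q_1 \cdots \xrightarrow{\alpha_n} q_n$, then we call $q_n$ a derivative of $q$ under $\alpha_1 \cdots \alpha_n$.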

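Definition 2 (strong simulation [13]). Let $(Q, T)$ be an LTS, and let $S$ be a binary relation over $Q$. Then $S$ is called a strong simulation over $(Q, T)$ if, whenever $p\,S\,q$ and $p \xrightarrow{\alpha} p'$, there exists $q' \in Q$ such that $q \xrightarrow{\alpha} q'$ and $p'\,S\,q'$.
We say that $q$ strongly simulates $p$ if there exists a strong simulation $S$ such that $p\,S\,q$.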
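The strong simulation condition can be checked mechanically on a finite LTS. The following is a minimal sketch, assuming states and labels are encoded as small integers and the candidate relation is stored as a boolean matrix; the type `Transition` and the function `is_strong_simulation` are hypothetical names introduced for illustration, not part of the paper's tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One transition src --label--> dst of a finite LTS (assumed encoding). */
typedef struct {
    int src;
    int label;
    int dst;
} Transition;

/* The relation is a dense n x n matrix: rel[p * n + q] is true when the
 * pair (p, q) belongs to the candidate relation. */
static bool related(const bool *rel, size_t n, int p, int q)
{
    return rel[(size_t)p * n + (size_t)q];
}

/* Returns true if rel is a strong simulation: whenever p is related to q
 * and p --a--> p', some q --a--> q' must exist with p' related to q'. */
bool is_strong_simulation(const Transition *t, size_t nt,
                          const bool *rel, size_t n)
{
    assert(t != NULL && rel != NULL);
    for (size_t p = 0; p < n; p++) {
        for (size_t q = 0; q < n; q++) {
            if (!rel[p * n + q])
                continue;
            for (size_t i = 0; i < nt; i++) {
                if (t[i].src != (int)p)
                    continue;
                /* Look for a matching move of q with the same label. */
                bool matched = false;
                for (size_t j = 0; j < nt && !matched; j++) {
                    matched = t[j].src == (int)q &&
                              t[j].label == t[i].label &&
                              related(rel, n, t[i].dst, t[j].dst);
                }
                if (!matched)
                    return false;
            }
        }
    }
    return true;
}
```

For example, with transitions 0 --a--> 1 and 2 --a--> 3, the relation {(0, 2), (1, 3)} passes the check, whereas {(0, 1)} fails because state 1 has no outgoing transition to answer the move of state 0.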

Definition 3 (extended strong simulation). Let be an LTS and be an LTS and be a binary relation over , and let . If satisfies
() , ,
() and , and if , then there exists such that and . Then is called an extended strong simulation over a 4-tuple .

3.2. Program Refinement

When a protocol is implemented, it is very difficult to verify its properties in the concrete program state space (the state-explosion problem [26]). Program refinement is used to address this problem. Program refinement, which relates two programs through a refinement relation, is an important program verification technique and helps to reduce the program state space.

Definition 4 (program refinement). Let $P$ be a concrete program, and let $P'$ be a refined program of $P$. That is, the process from $P$ to $P'$ is recursive. The behaviors produced by program $P'$ form a subset of the behaviors produced by program $P$. Therefore, $P'$ will never produce a behavior that $P$ cannot produce, and wherever program $P$ is applied, $P'$ can be used to replace it. Hence, $P'$ is called a refined program of $P$.
According to Definition 4, many program verification problems can be reduced to refinement verification. For example, the verification of a program or an algorithm is often reduced to the refinement relation between programs and protocol specifications. Similarly, there is a refinement relation between a protocol implementation and its specifications. To establish this relation and refine the source code, we design a refining algorithm for protocol implementations, shown in Algorithm 1.

Step 1. InitStack(S); // initialize a stack; here, S represents the stack
Step 2. Input file F; // input a file (protocol source code); here, F represents the file
Step 3. While (the end of F has not been reached)
    if (the current item is a function f) Push(S, f); // push the function onto the stack
    continue to seek the functions of the protocol source code
  // end while
Step 4. While (!StackEmpty(S)) // judge whether the stack is empty
    f = GetTop(S); Pop(S); // take the function on top of the stack
    if (f satisfies the protocol specifications) label f; // f reflects the code execution of the protocol's interactive communication, such as send and recv of the socket API
    else remove f from the source code;
  // end while

The algorithm consists of two loops. The first loop selects the functions. The second loop labels the functions that satisfy the condition and refines the source code. Assuming there exists functions in the source codes, among these functions, functions can satisfy protocol specifications (). The worst case is . Hence, the complexity of the algorithm is , which is polynomial time, so the algorithm is feasible.
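The two loops of Algorithm 1 can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation: the source file is assumed to be already scanned into a flat array of `Item` records, and `satisfies_spec` is a hypothetical predicate that keeps only a fixed list of socket calls. Because popping a stack reverses the order, the sketch fills the output from the back so that the kept functions appear in source order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 64

/* A scanned source item: its text and whether it is a function call.
 * This flat representation is an assumption made for the sketch. */
typedef struct {
    const char *name;
    bool is_function;
} Item;

/* Hypothetical specification check: keep calls that take part in the
 * protocol's interactive communication. */
static bool satisfies_spec(const char *name)
{
    static const char *const keep[] = { "connect", "send", "recv" };
    for (size_t i = 0; i < sizeof keep / sizeof keep[0]; i++)
        if (strcmp(name, keep[i]) == 0)
            return true;
    return false;
}

/* Pushes every function onto a stack, then pops each one and keeps it only
 * if it satisfies the specification. Returns the number of kept names,
 * written to out in source order. */
size_t refine(const Item *items, size_t n, const char **out)
{
    const char *stack[MAX_ITEMS];
    size_t top = 0;

    assert(items != NULL && out != NULL && n <= MAX_ITEMS);
    for (size_t i = 0; i < n; i++)      /* Step 3: collect the functions */
        if (items[i].is_function)
            stack[top++] = items[i].name;

    size_t kept = 0;                    /* count survivors to size the output */
    for (size_t i = 0; i < top; i++)
        if (satisfies_spec(stack[i]))
            kept++;

    size_t pos = kept;
    while (top > 0) {                   /* Step 4: pop and filter */
        const char *f = stack[--top];
        if (satisfies_spec(f))
            out[--pos] = f;             /* fill from the back to undo LIFO order */
    }
    return kept;
}
```

Each function is pushed once and popped once, so the work in this sketch grows linearly with the number of scanned items.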

3.3. The Program Refinement of Protocol Source Codes

In most cases, program source code consists of functions, as in C, Java, and F# source code. According to the rules of C program execution, when a protocol is implemented, the return values of the called functions determine the function behaviors, and the sets of these return values determine the behavior traces of the protocol's interactive communications. Therefore, if the functions of the source code are refined, the state-explosion problem becomes less severe and the security properties of the protocol do not change after refinement. Hence, we can refine a protocol to analyze its security if the refinement relation of the source code exists.

Proposition 5. If the nonfunction part is removed from program source codes, the newly produced program source codes are the refinement of the previous program source codes.

Proof. Assume that the program running is a LTS. That is, let the LTS of the previous program source codes be , and let the LTS of the program source codes obtained after removing the nonfunction part be . Let be a binary relation over . And there exist ,  , and , . Here and , respectively, are the input labels of two program source codes, and (called in the following). According to Definition 1, there exist and , as well as and . According to Definition 3, the behaviors generated by LTS are the subset of LTS , and will never produce any behaviors that cannot produce. That is, wherever is applied, it can be replaced by . Hence, the proposition is proved.

4. Establishing the Model

Protocols often contain faults due to imperfect design. These faults make the protocol vulnerable to malicious third-party attacks. For example, the Needham-Schroeder protocol and the Diffie-Hellman protocol cannot resist man-in-the-middle attacks. To avoid these malicious attacks, it is necessary to analyze the security of a protocol at the theoretical level before it is implemented (using, for example, formal methods, computational models, or computationally sound formal methods). However, a protocol that has been proved secure at the theoretical level may still be insecure when it is implemented at the source code level. It is therefore essential to analyze the security of protocol implementations at the source code level.

In this paper, we propose a method for analyzing the security of protocol implementations written in C at the source code level. Our method establishes models in the following steps: ① describe the protocol symbolically; ② acquire the program source code of the protocol; ③ refine the source code; ④ draw the control-flow graph and the state diagram of the protocol; ⑤ establish the model of the traces of the protocol implementation. We take the classic Needham-Schroeder protocol as an example to show how to establish a model with our new method.

4.1. The Protocol Symbolic Description

It is a basic requirement of protocol communications that at least two participants participate in an interactive communication in accordance with certain specifications. Generally, Message Sequence Charts (MSC) of protocol interactive communications are expressed by an ITU-standardized protocol specification language (ITU: International Telecommunication Union) [27] (in this paper, protocol specifications are expressed by the approaches from ITU and literature [28]). The symbolic description of the Needham-Schroeder protocol specifications is shown in Figure 2.

In Figure 2, A and B denote the two participants of the protocol; pk(A) and pk(B) denote the public keys of the participants; sk(A) and sk(B) denote the private keys of the participants; Fresh Na and Fresh Nb denote fresh temporary values generated by A and B, respectively; and denotes the encryption of information with the public key of B.

4.2. The Refinement of Source Codes

Taking the Needham-Schroeder source code written in C as an example, we illustrate the refinement of source code with Algorithm 1 and then analyze the security of the protocol implementation at the source code level. The protocol code uses RSA public key cryptography for encryption and decryption, implemented with the functions of the OpenSSL crypto library. The protocol runs over OpenSSL. Because of space limitations, only the main source code is shown in Algorithm 2.

int main(int argc, char **argv)
{
    WSADATA wsaData;
    SOCKET client;
    RSA *R;
    serv.sin_family = AF_INET;
    serv.sin_port = htons(port);
    serv.sin_addr.S_un.S_addr = inet_addr("127.0.0.1");
    client = socket(AF_INET, SOCK_STREAM, 0);
    connect(client, (struct sockaddr *)&serv, sizeof(serv));
    Gene_Rand();
    Comp_Str();
    memcpy(plaitxt_A_E, Str_Id, 1024);
    Public_Encrypt(plaitxt_A_E);
    strcpy(Cipher_buf, (const char *)cipher);
    iSend = send(client, Cipher, sizeof(Cipher), 0);
    iLen = recv(client, (char *)cipher, sizeof(cipher), 0);
    memcpy(Tcipher_Na_Nb, cipher_Na_Nb, 1024);
    Private_Dencrypt_Na_Nb(R, Tcipher_Na_Nb);
    memcpy(plaitxt_Nb, StrNb, 1024);
    Public_Encrypt_Nb(plaitxt_Nb);
    strcpy(Cipher, (const char *)cipher);
    iSend = send(client, Cipher, sizeof(Cipher), 0);
}

/* Generate a 128-bit pseudorandom number and store it as a decimal string. */
int Gene_Rand(void)
{
    BIGNUM *n;
    int ret, bits = 128;
    char *sn;
    n = BN_new();
    ret = BN_pseudo_rand(n, bits, 1, 1);
    sn = BN_bn2dec(n);
    strcpy(RandNum, sn);
    return 0;
}

/* Append the random number and the identity "A" to the message string. */
int Comp_Str(void)
{
    strcat(Str_Id, RandNum);
    strcat(Str_Id, "A");
    return 0;
}

/* RSA public-key encryption of the first message. */
int Public_Encrypt(unsigned char *s)
{
    RSA *r;
    int ret, flen, len;
    BIGNUM *bnn, *bne;
    bnn = BN_new();
    bne = BN_new();
    ret = BN_dec2bn(&bnn, strn);   /* modulus n */
    ret = BN_dec2bn(&bne, stre);   /* public exponent e */
    r = RSA_new();
    r->n = bnn;                    /* direct field access requires OpenSSL before 1.1.0 */
    r->e = bne;
    flen = RSA_size(r);
    len = RSA_public_encrypt(flen, s, cipher_A_E, r, 3);   /* 3 = RSA_NO_PADDING */
    return 0;
}

/* RSA private-key decryption of the message carrying Na and Nb. */
int Private_Dencrypt_Na_Nb(RSA *r, unsigned char *s)
{
    int len, flen;
    flen = RSA_size(r);
    len = RSA_private_decrypt(flen, s, plaitxt_Na_Nb, r, 3);
    return 0;
}

int StrDivNb(char *str)
{
    int k = 0, i = 0, j = 0;
    int strlen = 0;
    /* the rest of this function is not shown in the listing */
}

/* RSA public-key encryption of Nb. */
int Public_Encrypt_Nb(unsigned char *s)
{
    RSA *r;
    int ret, flen, len;
    BIGNUM *bnn, *bne;
    bnn = BN_new();
    bne = BN_new();
    ret = BN_dec2bn(&bnn, strn);
    ret = BN_dec2bn(&bne, stre);
    r = RSA_new();
    r->n = bnn;
    r->e = bne;
    flen = RSA_size(r);
    len = RSA_public_encrypt(flen, s, cipher_Nb, r, 3);
    return 0;
}

As Algorithm 2 shows, the source code of the Needham-Schroeder protocol contains many redundant statements. According to Proposition 5 and Algorithm 1, deleting these redundant statements does not influence the security analysis of the protocol implementation. After deletion, the remaining source code of the Needham-Schroeder protocol consists mainly of function calls (shown in Algorithm 3). According to Definition 4, the source code in Algorithm 3 is a refinement of the source code in Algorithm 2. In this example, the behaviors produced by the source code in Algorithm 3 are a subset of those produced by the source code in Algorithm 2. Therefore, we can use the traces of function return values produced by the source code in Algorithm 3 to dynamically analyze the security of the Needham-Schroeder protocol implementation at the source code level.

int main(int argc, char **argv)
{
    connect(client, (struct sockaddr *)&serv, sizeof(serv));
    Gene_Rand();
    Comp_Str();
    memcpy(plaitxt_A_E, Str_Id, 1024);
    Public_Encrypt(plaitxt_A_E);
    strcpy(Cipher_buf, (const char *)cipher);
    iSend = send(client, Cipher, sizeof(Cipher), 0);
    iLen = recv(client, (char *)cipher, sizeof(cipher), 0);
    memcpy(Tcipher_Na_Nb, cipher_Na_Nb, 1024);
    Private_Dencrypt_Na_Nb(R, Tcipher_Na_Nb);
    memcpy(plaitxt_Nb, StrNb, 1024);
    Public_Encrypt_Nb(plaitxt_Nb);
    strcpy(Cipher, (const char *)cipher);
    iSend = send(client, Cipher, sizeof(Cipher), 0);
}

int Gene_Rand(void)
{
    n = BN_new();
    ret = BN_pseudo_rand(n, bits, 1, 1);
    sn = BN_bn2dec(n);
    strcpy(RandNum, sn);
}

int Comp_Str(void)
{
    strcat(Str_Id, RandNum);
    strcat(Str_Id, "A");
}

int Public_Encrypt(unsigned char *s)
{
    bnn = BN_new();
    bne = BN_new();
    ret = BN_dec2bn(&bnn, strn);
    ret = BN_dec2bn(&bne, stre);
    r = RSA_new();
    flen = RSA_size(r);
    len = RSA_public_encrypt(flen, s, cipher_A_E, r, 3);
}

int Private_Dencrypt_Na_Nb(RSA *r, unsigned char *s)
{
    flen = RSA_size(r);
    len = RSA_private_decrypt(flen, s, plaitxt_Na_Nb, r, 3);
}

int Public_Encrypt_Nb(unsigned char *s)
{
    bnn = BN_new();
    bne = BN_new();
    ret = BN_dec2bn(&bnn, strn);
    ret = BN_dec2bn(&bne, stre);
    r = RSA_new();
    flen = RSA_size(r);
    len = RSA_public_encrypt(flen, s, cipher_Nb, r, 3);
}

According to Definition 4 and Proposition 5, the security analysis of protocol implementations after refinement is consistent with that before refinement.

4.3. Program Control-Flow Graph

To show the traces of the function return values generated during the protocol implementation, we take the source code obtained after refinement (Algorithm 3) as an example and draw a program control-flow graph of the function calls, shown in Figure 3.

In Figure 3, denotes the control-flow direction of program main functions; “” denotes the control-flow direction of calling functions; and “” denotes the control-flow direction of function returning.

As the program control-flow graph shows, there are only two types of function return values: ① deterministic return values (such as numerical values, letters, and symbols); ② nondeterministic return values (such as direct or indirect function calls). When a function's return value is not deterministic, the function continues to call other functions until the return value becomes deterministic.

4.4. From the Control-Flow Graph to the State Graph

Figure 3 shows the traces of the protocol implementation at the source code level and how the functions are called. According to C grammar and the rules of program execution, in the normal course of program execution every called function has a return value (deterministic or nondeterministic), and the execution of every function is related to its return value. When a function receives its return value, the program executes the next step in order. Every function can be regarded as a state node, and function return values can be regarded as input labels. In this case, the program control-flow graph is just like the state graph of an automaton. As mentioned above, function return values are classified into deterministic values (such as numbers, symbols, and character strings) and nondeterministic values (such as functions). Here, we use to denote a deterministic value set and to denote a nondeterministic value set. Then, after refinement, the control-flow graph of the source code is transformed into the state graph of an LTS, whose input label is . An LTS has no initial state or accepting state. It has only a starting point and a terminal point, and any state can be regarded as its starting state. Hence, let the function Gene_Rand be the starting point of the LTS and let the send function be its terminal point. The state graph of the Needham-Schroeder protocol obtained after refinement is shown in Figure 4.

In Figure 4” denotes a state, and its subscript denotes the state of its corresponding function point. For example, denotes the state of memcpy function. If a called function does not call other functions any more, its return value is deterministic, and is written to denote its input label. If a called function continues to call other functions, then its return value is nondeterministic, and is written to denote its input label. Here, “” can be used to denote the constitution of nondeterministic input labels. Its final return value is deterministic . For example, the return value of the state is not deterministic at first, for it calls other functions, like BN_new function. Then, the input label of the state is .

4.5. The Traces of Protocol Implementations

As is shown in Section 4.4, a LTS can be established on the base of the function return values in the program control-flow graph. A LTS is denoted by a 4-tuple . We define the LTS of protocol implementations as follows:

() denotes a function node state set; denotes the starting point state of the LTS; and denotes the terminal point state of the LTS.

() denotes other functions, and there exists . denotes the transition relation between any two adjacent states , . denotes an input label. Let . If is , there exists a substitute function , and . Then final function return value is deterministic.

According to the LTS of protocol implementations, after protocol implementations there exist the traces of function return values:   . Then the security of protocol implementations at the source code level can be dynamically analyzed on the base of these traces.
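One way to collect such a trace at run time is to wrap each call so that its return value is appended to a log. The following is a minimal sketch under stated assumptions: `Trace`, `trace_record`, and the `TRACED` macro are hypothetical names introduced here, and only calls whose results fit in a `long` are handled (pointer-returning calls such as `BN_new` would need a separate encoding).

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 128

/* One element of a trace: the call text and the value it returned. */
typedef struct {
    const char *func;
    long value;
} TraceEntry;

typedef struct {
    TraceEntry entries[TRACE_MAX];
    size_t len;
} Trace;

/* Appends a return value to the trace and hands it back unchanged, so a
 * call site can be wrapped without altering the program's behavior. */
long trace_record(Trace *t, const char *func, long value)
{
    assert(t != NULL && t->len < TRACE_MAX);
    t->entries[t->len].func = func;
    t->entries[t->len].value = value;
    t->len++;
    return value;
}

/* Wraps a call expression: evaluates it once and records its result. */
#define TRACED(trace, call) trace_record((trace), #call, (long)(call))
```

A call such as `iSend = (int)TRACED(&trace, send(client, Cipher, sizeof(Cipher), 0));` would then leave the program logic unchanged while adding one entry to the trace per executed call.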

5. The Security Analysis of Protocol Implementations

Functions are an important part of the source code of a programming language, especially C. If the behaviors of the called functions are regarded as the events of a protocol implementation, function return values can act as the conditions for running an event during the protocol implementation. In this paper, operational semantics are used to analyze the security of the behaviors of protocol implementations at the source code level.

5.1. The Operational Semantics of Function Return Values

Operational semantics make the traces of the function return values of a protocol implementation explicit; that is, they display the concrete behaviors of the implementation. Operational semantics are therefore well suited to analyzing the security of protocol implementations.

(1) The function return values of protocol implementations are as follows.

. Here denotes the functions of a protocol, denotes running it, and denotes its function return values.

(2) The BNF forms of function return values () during protocol implementations are as follows:
ReturnValue ≔ Identify | Unidentify
Unidentify ≔ Function | Identify
Function ≔ Self_Func | Libr_Func
Self_Func ≔ Function | Identify
Libr_Func ≔ Function | Identify
Identify ≔ number | alphabet | string

Here, Self_Func denotes self-defined functions and Libr_Func denotes library functions. Every function has a return value (deterministic or nondeterministic). Every function return value is an element of a trace (). After a protocol is implemented, all the function return values constitute its traces. Here is the definition of the BNF form of the traces:
Traces ≔ number | alphabet | symbol

According to the relation between called function events of C language and return values, every called function has a return value, including void values. Then there exists the bijection between called function events and return values: CallEvent∣→ReturnValue. According to the traces obtained after the protocol implementations, is true. Therefore, there exists the bijection: e∣→. That is, every event corresponds to an element of the traces of protocol implementations.

Definition 6 (event trace). Let every event of protocol implementations correspond to a function return value . There exists a set , which consists of all the function return values. The set denotes an event trace.

Once the bijection between called function events of protocol implementations and function return values is set up, there exists the bijection between the event set and the elements of the traces: ∣→. If the event is not secure, its trace is not secure either, which will inevitably lead to the insecurity of the protocol implementations.

5.2. Ideal Trace and Nonideal Trace

Generally, whether a protocol is secure is decided under the Dolev-Yao model assumption. Likewise, the trace of function return values is obtained on the basis of the Dolev-Yao model assumption. We are the first to propose "the ideal trace" as the reference for protocol security evaluation, for precise security analysis of protocol implementations at the source code level.

In the following definitions, the symbol denotes the ordinal relation of called functions during the protocol implementations. As Definition 6 shows, denotes the ordinal relation of the events during the protocol implementations. is a binary relation over . satisfies partial ordering relation on event sequences : ① reflexivity (there exists for any event ); ② antisymmetry (if there exists for any events and , then there exists ); ③ transitivity (if there exists   and for , then there exists ).
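For reference, the three properties of a partial order can be written out as follows. The symbols are assumptions made for this restatement, since the original notation is not recoverable here: $\preceq$ stands for the ordinal relation and $E$ for the set of events.

```latex
\begin{align*}
&\text{Reflexivity:}  && e \preceq e \quad \text{for every event } e \in E,\\
&\text{Antisymmetry:} && e_1 \preceq e_2 \ \wedge\ e_2 \preceq e_1 \implies e_1 = e_2,\\
&\text{Transitivity:} && e_1 \preceq e_2 \ \wedge\ e_2 \preceq e_3 \implies e_1 \preceq e_3.
\end{align*}
```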

Definition 7 (nonideal trace). The protocols are implemented in the following environments:
(1) The environments are based on the Dolev-Yao model (adversaries have the capacity to attack actively or passively).
(2) The environments of implementing protocols are insecure:
(a) the environments are insecure due to the flaws of the implementing programming language structures, such as memory overflow and pointer operation;
(b) there are malicious code attacks in the environments.

The trace obtained in such environments is called a nonideal trace.

Definition 8 (ideal trace). The trace of general protocol implementations satisfies the following conditions.
(1) In an ideal communication environment (one with no adversary attacks, whether passive or active; i.e., the Dolev-Yao model assumption does not apply), all the participants of a protocol are honest. The information sent or received by these participants is protected and read by means of encryption and decryption.
(2) Partial ordering relation:    . Here denotes that the events of protocol implementations satisfy partial order relation.
Generally, if a trace of protocol implementations satisfies both condition (1) and condition (2), it is called the ideal trace.

Definition 9 (the similarity between a nonideal trace and the ideal trace). Let $X$ and $Y$ be the ordered sequences obtained in an ideal environment and in a nonideal environment, respectively. According to the clustering method, the similarity between the ideal trace and a nonideal trace is defined as follows:

(1) Let the Euclidean distance between the ordered sequences $X$ and $Y$ be $d(X, Y)$.

(2) Let the similarity between the ordered sequences $X$ and $Y$ be $s(X, Y)$, which is computed from $d(X, Y)$. $s$ denotes the similarity coefficient.

The bigger $s$ is, the smaller the degree of deviation between $X$ and $Y$ is. That is, the nonideal trace is more similar to the ideal trace, and vice versa.
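As a small worked illustration, suppose the ideal trace is $X = (3, 0, 4)$ and the nonideal trace is $Y = (0, 0, 0)$. These numbers are hypothetical and are not taken from the experiments. The similarity form $s = 1/(1 + d)$ used here is also an assumption: it is one common way to map a distance onto a coefficient in $(0, 1]$ that shrinks as the distance grows, and the definition above may use a different expression.

```latex
d(X, Y) = \sqrt{(3 - 0)^2 + (0 - 0)^2 + (4 - 0)^2} = \sqrt{25} = 5,
\qquad
s(X, Y) = \frac{1}{1 + d(X, Y)} = \frac{1}{6} \approx 0.167 .
```

Under the same assumption, identical traces give $d = 0$ and $s = 1$, the largest possible similarity.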

According to Definitions 7, 8, and 9, the ideal trace consists of function return values obtained in an ideal environment. In practice, most communications are carried out in nonideal environments. If the trace produced in a nonideal communication environment deviates from the ideal trace, it indicates that the protocol is attacked by a third party. If the attacked protocol is improved, its trace will deviate less from the ideal trace. That is, after its improvement, the similarity between its trace and the ideal trace is bigger. It means that the improved protocol is more secure than the original protocol.

6. The Method and the Experiment

According to Algorithm 1, in the process of protocol implementations, source codes are refined. After that, labelled function return values are obtained and constitute the sequences. When analyzing the security of protocol implementations, we exploit the ideal trace as the reference of security evaluation. According to the clustering method, the sequences can be used as samples to analyze the security of protocol implementations at the source code level by means of the deviation of Euclidean distance and the similarity.

6.1. The Steps of Our New Method

(1) Suppose that the source code is executed in an ideal communication environment. After an execution, the function return values are obtained and form a sequence, which is called the sequence of ideal trace data.

(2) Suppose that the source code is executed in a nonideal communication environment. After an execution, the function return values are obtained and form a sequence, which is called the sequence of nonideal trace data.

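(3) With the clustering method [29], the Euclidean distance [30] between the ideal trace and the nonideal trace is calculated. For two traces $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$, the Euclidean distance is

$$d(X, Y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}.$$

Let $d_{ij}$ be the Euclidean distance between the $i$th and the $j$th trace sequences, and let $D = (d_{ij})$ be the distance matrix.

Here $d_{ij} = d_{ji}$ and $d_{ii} = 0$. If the distance between the ideal trace and a nonideal trace is $0$, the nonideal trace does not deviate from the ideal trace.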
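The distance matrix can be sketched in a few lines. This is an illustrative sketch only: the trace values below are hypothetical numeric return values rather than data from the tables, and the helper names `euclidean_distance` and `distance_matrix` are introduced here for the example.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric traces."""
    if len(x) != len(y):
        raise ValueError("traces must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(traces):
    """Return (names, D), where D[i][j] is the distance between traces i and j."""
    names = list(traces)
    matrix = [
        [euclidean_distance(traces[a], traces[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical traces of function return values (not taken from the paper).
traces = {
    "ideal": [1, 0, 144, 1],
    "nonideal": [1, 0, 141, 5],
}

names, D = distance_matrix(traces)
for name, row in zip(names, D):
    print(name, row)
```

The matrix is symmetric with a zero diagonal, so only the off-diagonal entry between the ideal and the nonideal trace carries information about the deviation.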

(4) Suppose that, after implementing two protocols, their sequences of trace data are obtained. Let $s$ be the similarity between the ideal trace and a nonideal trace, computed from their Euclidean distance $d$ by formula (3).

Formula (3) satisfies the following: ① the smaller $d$ is, the bigger $s$ is; ② the bigger $d$ is, the smaller $s$ is.

Therefore, their similarity is inversely proportional to their deviation. That is, the smaller the similarity is, the more easily the protocol implementation is attacked.

With our new method, there are two cases when analyzing the security of protocol implementations: ① If $d \neq 0$ (where $d$ denotes the deviation between the ideal trace and a nonideal trace), the nonideal trace deviates from the ideal trace. It means that the protocol is attacked during implementation. ② The bigger $s$ is, the closer the two traces are to each other, that is, the closer the nonideal trace gets to the ideal trace. It means that the protocol implementation is more secure.
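The four steps can be strung together as in the following minimal sketch. Several things here are assumptions rather than the paper's implementation: the similarity is taken to be $1/(1 + d)$, which is one common choice that decreases as the distance grows; the traces are hypothetical; and the function names `similarity` and `analyze` are invented for the example.

```python
import math


def similarity(distance):
    """Assumed similarity coefficient: 1 for identical traces, falling toward 0."""
    return 1.0 / (1.0 + distance)


def analyze(ideal_trace, nonideal_trace):
    """Compare a nonideal trace against the ideal trace of the same protocol."""
    # math.dist raises ValueError if the two traces differ in length.
    d = math.dist(ideal_trace, nonideal_trace)
    return {
        "distance": d,
        "similarity": similarity(d),
        "deviates": d != 0,  # case 1: a nonzero deviation indicates an attack
    }


# Hypothetical traces of an original protocol and of its improved version.
ideal_original = [1, 1, 0, 1]
attacked_original = [1, 4, 0, 5]
ideal_improved = [1, 1, 0, 1, 1]
attacked_improved = [1, 1, 3, 1, 1]

original = analyze(ideal_original, attacked_original)
improved = analyze(ideal_improved, attacked_improved)

print("original:", original)
print("improved:", improved)
# Case 2: the larger similarity marks the implementation closer to its ideal trace.
print("improved is closer to ideal:", improved["similarity"] > original["similarity"])
```

Each protocol is compared only against its own ideal trace, so the two traces in a pair always have the same length.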

6.2. The Experiments

We carry out experiments with classical protocols and their improvements (written in C). In the experiments, we analyze the cases in which these protocols are subjected to man-in-the-middle attacks during implementation. The running environment is Windows 7, Visual Studio 2010, Intel(R) CPU G3240, 4 GB of memory, and OpenSSL 1.0.1s. The protocols use the functions and the big-number routines of the OpenSSL library. The protocol data are encrypted and decrypted with RSA public-key cryptography. Participants communicate over TCP connections through the Socket API. The simulated experiments are carried out in client/server mode. The function return values are transformed into numerical values and then used as experimental data, which does not influence the result of the security analysis of protocol implementations.

(1) We analyze man-in-the-middle attacks on the Needham-Schroeder protocol implementations and the Needham-Schroeder-Lowe protocol implementations.

Experiments are carried out in two types of environments: ① the ideal environment and ② nonideal environments. In the ideal environment, the Needham-Schroeder protocol is called INS for short and the Needham-Schroeder-Lowe protocol is called INSL for short. In nonideal environments, the Needham-Schroeder protocol is called MANS for short and the Needham-Schroeder-Lowe protocol is called MANSL for short. NSL is the improvement of the Needham-Schroeder protocol. Theoretically, NSL can resist man-in-the-middle attacks. After running these protocols, their traces are obtained, shown in Table 1.

The data in Table 1 are read as follows: ① the numbers in the first row are the serial numbers of the functions of the protocol implementations; ② the first column gives the names of the protocols and their serial numbers; ③ the values at the intersections are the function return values of each protocol implementation. For example, an entry of 144 in the INS row under function 5 means that, during the implementation of INS, the fifth function return value is 144.

As Table 1 shows, the traces of the Needham-Schroeder protocol implementations and of the Needham-Schroeder-Lowe protocol implementations in the nonideal environments deviate from the traces of their implementations in the ideal environment. According to formula (1), we calculate the Euclidean distances between INS and MANSL, between INSL and MANSL, and between MANS and MANSL. The results are shown in Table 2.

According to formula (2), we calculate the similarities between INS and MANSL, between INSL and MANSL, and between MANS and MANSL. The results are shown in Table 3.

As Table 2 shows, the nonzero distance between INS and MANS means that the trace of INS deviates from the trace of the MANS protocol implementation; likewise, the nonzero distance between INSL and MANSL means that the trace of INSL deviates from the trace of MANSL. As Table 3 shows, $2.43 \times 10^{-3} > 2.27 \times 10^{-3}$, which means that the similarity between INSL and MANSL is bigger than the similarity between INS and MANS.

(2) We analyze man-in-the-middle attacks on the Diffie-Hellman protocol implementations and the Diffie-Hellman-Signature protocol implementations.

The protocols are run in two types of environments: ① the ideal environment and ② nonideal environments. In the ideal environment, Diffie-Hellman is called IDH for short and Diffie-Hellman-Signature is called IDHS for short; in nonideal environments, Diffie-Hellman is called MADH for short and Diffie-Hellman-Signature is called MADHS for short. DHS is the improvement of the Diffie-Hellman protocol and, theoretically, it can resist man-in-the-middle attacks. After running these protocols, their traces are obtained, as shown in Table 4.

According to formula (1), we calculate the Euclidean distances between IDH and MADHS, between IDHS and MADHS, and between MADH and MADHS. The results are shown in Table 5.

According to formula (2), we calculate the similarities between IDH and MADHS, between IDHS and MADHS, and between MADH and MADHS. The results are shown in Table 6.

As Table 5 shows, the nonzero distance between IDH and MADH means that the trace of IDH deviates from the trace of MADH; likewise, the nonzero distance between IDHS and MADHS means that the trace of IDHS deviates from the trace of MADHS. As Table 6 shows, the similarity between IDHS and MADHS is bigger than the similarity between IDH and MADH.

(3) We analyze the performance of our new method.

To illustrate the performance of our new method, we only list the time overhead of each protocol implementation in the ideal environment and nonideal environments (shown in Table 7) and discuss the relation between the time overhead and the similarity (shown in Figures 5 and 6). Due to the limited length of this paper, we only list and discuss the time overhead and the similarity of each protocol implementation in the simulated experiments mentioned above.

As is shown in Table 7, the time overhead of implementing protocols in the ideal environment is less than the time overhead of implementing protocols in the nonideal environments.

We calculate the absolute value of the difference between the time overhead of implementing a protocol in the ideal environment and that of implementing it in the nonideal environments. We list this absolute time difference together with the similarity of the Needham-Schroeder protocol and its improvement (shown in Table 3), as well as with the similarity of the Diffie-Hellman protocol and its improvement (shown in Table 6). Comparing how they change makes their relation clear, as shown in Figures 5 and 6.

This shows that the relation between the absolute time difference and the similarity tends to be inversely proportional. It means that the similarity is related to the performance of protocol implementations in different environments.

As experiments (1), (2), and (3) show, the security of protocol implementations can be analyzed through the traces of function return values obtained when implementing the protocols in the ideal environment and in the nonideal environments. From the experiments, we can conclude the following:

① Third-party attacks can be found.

② The deviation of the improved protocols is smaller than the deviation of the original ones. This means that the improved protocols are more secure than the original ones in the process of implementations at the source code level.

③ The protocols are insecure during implementation at the source code level, even though they have been proved to be secure at the theoretical level.

④ In the ideal environment and nonideal environments, the performance of protocol implementations is related to the similarity. This is in accordance with the idea of software-defined security.

This paper aims to analyze the security of protocol implementations at the source code level. With our method, attacks can be discovered by analyzing whether there are abnormal behavior characteristics when implementing protocols, such as memory overflow and malicious code (shellcode) attacks. This paper mainly analyzes man-in-the-middle attacks.

Man-in-the-middle attacks are a common type of attack from which protocols widely suffer [31], so protocol implementations under man-in-the-middle attacks are a typical case for security analysis. The idea behind our new method comes from everyday phenomena, and its theory is based on LTS, program operational semantics, and program refinement, from which we propose Algorithm 1. The purpose of the analysis is to discover whether a protocol is subjected to third-party attacks, impersonation attacks, or other attacks during implementation at the source code level. We find that the similarity of the improved protocols, which can resist man-in-the-middle attacks, is bigger than that of the original ones. Hence, our method is suitable for analyzing protocols that are subjected to man-in-the-middle attacks during implementation at the source code level.

7. Conclusion and Future Work

In this paper, we have proposed a new method to dynamically analyze the security of protocol implementations at the source code level. First, we have refined protocol source code through the strong simulation relation and have established a model for the security analysis of protocol implementations. Second, when implementing protocols, we have obtained the function return values, which constitute the sequences of trace data. We have proposed the ideal trace as the reference for evaluating protocol security, and we have used the clustering method to analyze the sequences of trace data. Specifically, we have used the Euclidean distance and the similarity between the ideal trace and nonideal traces to analyze the security of protocol implementations. Last, taking some classical protocols as examples, we have carried out experiments. The experiments show that third-party attacks can be found by analyzing the deviation and similarity of the traces of the function return values obtained when implementing protocols. They also show that the improved protocols are more secure than the original ones. Our method exploits the traces of protocol implementations to analyze their security and differs from other methods mentioned in the literature. It will be helpful and valuable for protocol design, security verification, and security evaluation.

Future work in this field lies in the automatic security analysis of protocol implementations [32] and the security analysis of parallel protocol implementations [33]. This requires improving existing automatic protocol analysis tools or developing new automatic tools for the security analysis of protocol implementations. In addition, since flaws in language structures, such as C language memory access and pointer analysis, make the security analysis of protocol implementations more complex, more studies should be carried out to address these problems.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Key Foundation of China (no. 61332019) and the National Basic Research Program (NBRP) (973 Program) (no. 2014CB340601),