Abstract

Managing the performance of the Session Initiation Protocol (SIP) server under heavy load conditions is a critical task in a Voice over Internet Protocol (VoIP) network. In this paper, a two-tier model is proposed to address the security, load mitigation, and load distribution issues of the SIP server. In the first tier, the proposed handler segregates and drops the malicious traffic. The second tier provides uniform load distribution using the least session termination time (LSTT) algorithm. In addition, the mean session termination time is minimized by reducing the waiting time of the SIP messages. The efficiency of the LSTT algorithm is evaluated on an experimental test bed, both with and without the handler. The experimental results establish that the proposed two-tier model improves the throughput and the CPU utilization. It also reduces the response time and error rate while preserving the quality of multimedia session delivery. This two-tier model provides robust security, dynamic load distribution, appropriate server selection, and session synchronization.

1. Introduction

The enormous growth of VoIP plays an active role in IP-based business applications that introduce numerous real-time media services. Owing to the lower cost and greater flexibility, customers use various VoIP services such as voice calls, instant messaging, and video conferencing. In these services, VoIP adopts SIP as the signaling protocol to create, manage, and terminate a multimedia session. The Real-time Transport Protocol (RTP) transmits media streams over the User Datagram Protocol (UDP) [1]. Figure 1 shows an example of a SIP-based media conversation. A SIP transaction is carried out by the transmission of text-based request and response messages. The SIP transaction consists of two phases, namely, session establishment and session termination. In the session establishment phase, an INVITE message is followed by the corresponding ACK message, whereas in the session termination phase, a BYE message is followed by the corresponding ACK response message. The SIP server plays two vital roles: it uses the location server to identify the location of the callee, and it is responsible for message routing. The SIP request messages traverse several proxies that are present either in the same or in different VoIP domains. Similarly, the related responses are forwarded through the same proxy servers in the reverse order.

SIP-based IP telephony experiences a bottleneck problem because the signaling part deals with a huge number of messages and the media part processes media streams. Therefore, both the signaling and the media parts directly influence the scalability of the VoIP network. In addition, the literature survey identifies the causes of SIP overload [26]. First, in order to provide reliable transmission, requests are retransmitted over UDP for a specific period of time [2, 3]. Second, the SIP messages are transmitted among the clients via several proxy servers and adjacent nodes in the same or another network. Third, the SIP messages are used as real-time session messages; consequently, they are highly time-sensitive [4]. Finally, as voice calls are transmitted over the unsecured Internet, an attacker can easily inject attack packets disguised as normal packets [5]. As a result, the overloaded server exhibits a higher unsuccessful call completion rate, lower throughput, and a longer call setup delay, and it spends most of its time rejecting requests [6].

The main objective of the proposed work is to remove the overload state by detecting and eliminating the unwanted messages. The work also presents a load distribution algorithm that provides a uniform load distribution among the servers. It maximizes the CPU utilization and throughput while minimizing the response time and execution time of the server. To attain these objectives, a handler and the LSTT algorithm are introduced; the LSTT algorithm provides a uniform load distribution by selecting a suitable server based on the least session termination time.

The key question when supporting end-to-end media stream security is which layer should provide media security, either the network layer or some higher layer. The currently available foremost alternatives for the TCP-based data communication are Internet Protocol Security (IPSec) [7] (network layer) or Transport Layer Security (TLS) (transport/application layer) [8]. The standard SIP authentication protocol is the HTTP Digest Authentication which uses a trusted shared secret key to perform a cryptographic hash function. The experimental results of Salsano et al. [9] demonstrated that even the HTTP Digest Authentication is causing a considerable overhead on the SIP protocol. In the real-time UDP traffic, the main alternatives are Secure Real-time Transport Protocol (SRTP) [10] (transport/application) and IPSec. Chen et al. [11] designed a key exchange protocol which secures media stream and moderates the impact of Spam over IP Telephony (SPIT) using mutual authentication. Yoon et al. [12] established that a few SIP authentication schemes are insecure against the attacks such as offline password guessing attacks, Denning-Sacco attacks, and stolen-verifier attacks.

In general, load balancing algorithms are classified as static or dynamic. In the static methods, the load is assigned at compile time according to prior knowledge. The main drawback of these methods is that the load balancing decision is fixed in advance, and hence static methods are less efficient than the dynamic algorithms. Conversely, dynamic methods allocate the load at run time based on the algorithm implemented in the server and offer good performance. Numerous research works control the SIP overload dynamically by considering various parameters such as feedback [4, 13–17], call rejection [18–21], session awareness [22, 23], response time [24, 25], and priority [26, 27].

In the feedback-based overload control schemes [4, 13–17], the overloaded server computes constraints on its generated load according to the current load and distributes these constraints, through a feedback mechanism, to its immediate neighbors that are placed before the overloaded server. Based on these constraints, the neighboring server decides whether the traffic has to be forwarded to the overloaded server or not. The limitation of these works is that computing and distributing the load constraints adds further overhead to the overloaded server. In the call rejection schemes [18–21], the overloaded server uses a local overload control algorithm to generate error responses (503 Service Unavailable) under a heavy load condition. Depending on the error response rate, incoming traffic is restricted. Although these algorithms perform well, in practice their use is restricted by the service provider. In addition, a high volume of rejections can lead to SIP network failures. Therefore, this approach can be applied efficiently only to lightly overloaded servers.

The session aware method [22, 23] comprises three algorithms, namely, least call, least transaction, and weighted least transaction. In this method, the load is distributed according to the least number of active calls in the server. It is not a feasible solution because each call consumes different time duration. The next option is the least transaction algorithm which chooses the server, based on the least number of active transactions that the server currently handles. The main limitation of this algorithm is that the INVITE request takes a longer period than BYE request. The third algorithm is the weighted least transaction that assigns a weight of 0.75 for BYE and 1 for INVITE. In that algorithm, every server has to maintain the session and transaction states for individual user. The weight assignment method is useful only when the server in the cluster does not have the same capacity.

In [24, 25], the average response time of each server is maintained in a separate window and the load is distributed to the server which has the least response time. These works focus on the overall response time of the session establishment phase. This scheme becomes ineffective when the queuing delay and packet loss rates increase. Moreover, SIP transmits a maximum of 8 retransmissions for each original request. Hence, the overall response time of the session establishment increases. Besides, signaling messages are interrupted in order to enhance the performance of the SIP network during the media conversation period [28–30]. For example, the signaling messages INVITE, 183 Session Progress, UPDATE, 200 OK, and ACK are interrupted during the media conversation to improve the SIP security as shown in Figure 2.

As a result, computing the response time between INVITE and ACK is not a feasible solution. The existing weight-based scheduler [22, 23] leads to starvation, since it gives a high-weighted request no way to be served, and so retransmission occurs after a certain period. This places additional load on the load balancer. A few research works [26, 27] proposed a method which reduces the retransmission rate instead of reducing the original sending rate. This solution yields fewer blocked calls and more revenue for carriers. The performance of the SIP server is increased with the priority queue [26], in which INVITE requests are assigned low priority and all other requests are given high priority. The authors of [27] use the error message (503 Service Unavailable) to stop the retransmission of requests, but this method creates additional overhead for the server.

Researchers have also paid attention to the security threats introduced by the load balancer [31–33]. Kim et al. [31] investigated the performance of the SSL protocol for providing secure service in a cluster-based application server and proposed a backend forwarding method for improving server performance through better load balance. The authors of [32] implemented a Distributed Denial of Service (DDoS) detection scheme in the load balancer. Here, the dispatcher module of the Kamailio server acts as the load balancer. This scheme detects low-rate and multiattribute DDoS attacks. However, there are not enough practical results. The authors of [33] implemented an authentication layer for a load balancing architecture. Thus, only authenticated users can send jobs, no overload occurs within the cloud platform, and the load balancer wastes minimal resources. In the previous work [34], the honeywall is the major contribution for reducing the load towards the load balancer. The SIP server is configured to select the server with the least number of active calls and the BYE_ACK method. In this work, the SIP message arrival rate, delay, and deadline are used to compute the load in the queue.

3. Proposed Two-Tier Model

In many cases, the load balancer alone spends considerable processing time on malicious packets, retransmissions, and signaling traffic, and later these messages may be rejected [5, 35]. Therefore, the processing time together with the load distribution delay increases. To overcome these problems, a handler is designed in this work that drops malicious and unwanted traffic before it reaches the load balancer. The operation of the handler is independent of that of the load balancer as shown in Figure 3. The handler is the first entity, placed before the load balancer, and it receives all incoming and outgoing SIP traffic. The handler provides anomalous traffic detection, prevention, and load mitigation. As a result, the workload of the load balancer decreases and the availability of server resources increases.

The second tier LSTT algorithm defines three processes, namely, distribution, decision, and selection. The main aspect of the LSTT algorithm is that it selects an appropriate server for new and existing SIP messages. Based on the Call-ID and command sequence (CSeq) in the message header, the distribution process determines whether the incoming message is a new or an existing one. If the incoming message belongs to an existing session, the server that handled the previous corresponding request is identified and the message is forwarded to it. Otherwise, the decision process is applied to the new message to select a suitable server. Here, the LSTT algorithm dynamically calculates the session termination time by measuring the time between a BYE and its corresponding ACK. Then, the selection process chooses the server which has the least session termination time in the history window. Each server maintains an accurate state and the load balancer is updated periodically: every server updates the message delay time and cumulative session termination time at a predetermined time interval.
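The three processes can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the class name, the window length, and the rule that a server without history counts as cost zero are assumptions made only for illustration.

```python
from collections import deque


class LsttBalancer:
    """Toy dispatcher: sticky routing for known Call-IDs, least mean
    session termination time for new ones. All names are hypothetical."""

    def __init__(self, servers, window=5):
        # One bounded history window of BYE-to-ACK times (seconds) per server.
        self.history = {s: deque(maxlen=window) for s in servers}
        self.sessions = {}  # Call-ID -> server already handling that session

    def record_termination(self, server, bye_ts, ack_ts):
        # Session termination time = time between a BYE and its ACK.
        self.history[server].append(ack_ts - bye_ts)

    def mean_termination(self, server):
        window = self.history[server]
        # Assumption: a server with no history yet is treated as cost 0.
        return sum(window) / len(window) if window else 0.0

    def dispatch(self, call_id):
        # Distribution: an existing session stays on its previous server.
        if call_id in self.sessions:
            return self.sessions[call_id]
        # Decision and selection: pick the least mean termination time.
        server = min(self.history, key=self.mean_termination)
        self.sessions[call_id] = server
        return server
```

In this sketch a message whose Call-ID is already known never triggers a new selection, which mirrors the distribution process; only new sessions are compared against the history window.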

3.1. Handler

SIP has many open control messages, and attacks on them can easily be mounted by using a SIP traffic generator. In SIP, only a few messages (REGISTER, INVITE, and BYE) and their message fields such as URI and Call-ID are protected. All other signaling messages such as 100 Trying, 180 Ringing, 200 OK, and ACK and message fields such as From, To, and SDP are undefended [36]. Consequently, signaling traffic increases from 20 to 40% [5, 36–38] in the VoIP network. This signaling traffic increases the processing time of the load balancer, and so the call dropping ratio at the victim side increases. Therefore, an effective signaling traffic detection and prevention scheme is proposed in this work. The signaling traffic and its changes are computed from the signaling traffic of each user and the rate of increase of the signaling traffic in each period.

The handler is deployed at the edge router of the innocent host and checks the SIP control messages that enter and leave through the edge router. The handler operates in three stages, namely, call rate, control, and drop. The call rate stage uses modified hash tables for two counters (session establishment and session termination) as shown in Figure 4. Each SIP message pair (INVITE-ACK and BYE-ACK) is hashed and recorded in the handler database. For any valid SIP session there is a unique one-to-one mapping between INVITE-ACK and BYE-ACK. An abnormal three-way handshake is clearly exposed by these two counters.

The Bloom filter was originally designed by Bloom [39] and later modified to protect against flooding attacks [40, 41]. The Bloom filter is a vector of bits, all set to 0 in the initial condition. When an element is inserted, the bit positions given by its hash values are set to 1. If any one of these bits is 0 on lookup, then the element is not in the set. An original Bloom filter is unable to handle the paired three-way handshake messages. Therefore, the 0's and 1's are replaced with integer counters in the Bloom filter. For each incoming INVITE and BYE message, the handler increments the corresponding counter by 1, and it decrements the counter by 1 for a pair of 200 OK and ACK. Each paired INVITE-ACK and BYE-ACK counter remains 0, and the rest of the positions keep a nonzero value. An overflowing counter position indicates unwanted traffic that intends to affect the SIP server. The handler identifies the affected counter positions from the rate of the signaling messages (INVITE, BYE) under normal traffic, the mean of the signaling messages, and the standard deviation of the signaling messages.
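A counting variant of the Bloom filter of this kind can be sketched as follows. This is only an illustration under stated assumptions: the filter size, the number of hash functions, the use of salted SHA-256 digests, and the method names are hypothetical and are not taken from the handler described here.

```python
import hashlib


class CountingFilter:
    """Counting Bloom filter sketch; sizes and hashing are assumptions."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _positions(self, key):
        # Derive one position per hash index by salting a SHA-256 digest.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def opened(self, key):
        # Called for an incoming INVITE (or BYE): increment each counter.
        for pos in self._positions(key):
            self.counters[pos] += 1

    def closed(self, key):
        # Called when the matching 200 OK/ACK pair arrives: decrement,
        # never going below zero.
        for pos in self._positions(key):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def suspicious(self, key, threshold):
        # Flag a key only if every one of its counters exceeds the threshold.
        return min(self.counters[p] for p in self._positions(key)) > threshold
```

A completed handshake returns all of its counters to zero, while requests that are never acknowledged accumulate, so a flood from one source shows up as counters that keep growing past the chosen threshold.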

The threshold value can be set according to the number of users and their generated signaling traffic. In this work, the maximum threshold is set for a maximum of 8000 calls per second (cps). The next step is to identify the offending SIP messages and eradicate them to provide a secure environment. In particular, in each detection cycle, a message that deviates from the expected interarrival time is considered to be an unwanted message. For every call, the signaling messages occur with a minimum round trip time.

First, different types of signaling traffic that contain various forms of the SIP messages from legitimate users are observed. Then, the distribution of time intervals between the first and previous message of the same user is characterized as shown in Figure 5.

The signaling traffic of each message type is identified by comparing successive observations of the same type of signaling message.

For instance, signaling traffic (REGISTER, INVITE, and BYE) and malformed packets (fake BYE, fake BUSY, and fake BYE drop) are identified from the time between the first INVITE packet and the next INVITE packet of the same user.

Then, the average interarrival time of each SIP message is computed over the number of messages in each observation.

The dispersion of the SIP message interarrival time is then computed from these values.

Finally, the control stage compares the offending messages with the threshold and removes them according to the drop rule. Thus, only controlled and needed requests are transmitted to the load balancer and all other requests are eliminated. The second stage of the architecture is the LSTT load balancer.
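The interarrival statistics used by the control stage can be illustrated with a short Python sketch. The function names and the rule of flagging gaps that lie more than k standard deviations from the mean are assumptions for illustration; the exact threshold expression of this work is not reproduced here.

```python
from statistics import mean, pstdev


def interarrival_times(timestamps):
    """Gaps between consecutive arrivals of one user's messages (seconds)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def flag_unwanted(timestamps, k=2.0):
    """Return indices of gaps deviating more than k standard deviations
    from the mean gap. The k-sigma rule is an assumption, not the exact
    threshold of the handler."""
    gaps = interarrival_times(timestamps)
    if len(gaps) < 2:
        return []
    mu, sigma = mean(gaps), pstdev(gaps)
    return [i for i, gap in enumerate(gaps) if abs(gap - mu) > k * sigma]
```

For a user who sends one INVITE per second and then a second INVITE only 10 ms after the previous one, the last gap is the only one flagged; perfectly regular traffic produces no flags.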

3.2. LSTT Algorithm for Load Distribution

The main objective of the second tier is that none of the servers should be in an idle or a heavily loaded state. The second tier measures the least session termination time, which is the round trip time (RTT) between BYE and ACK as shown in Figure 6.

This work uses a load balancer based on the M/M/1 queuing model with an infinite number of clients, in which each server maintains a queue with Poisson arrival rate $\lambda$ and a service time that is exponentially distributed with mean $1/\mu$. To identify the status of the server, let $N(t)$ be the number of clients currently in the system, which follows a Markov chain with states $\{0, 1, 2, \ldots\}$. When a client arrives, a server state of 0 or 1 means that the server is idle or BUSY, respectively. The given random process is a birth-death process and hence the rates are $\lambda_n = \lambda$ and $\mu_n = \mu$.

Let be the average number of utilizations or loads in a server. If , the SIP server is heavily loaded; otherwise the server is lightly loaded. The distribution of signaling load for different types of calls can be characterized by empirical measurement as given in

Let be the set of messages used to terminate the session. Each call is assumed to be a probabilistic model and let be the probability of process to process for types of calls, where . The probability of mean arrival rate for types of calls is identified by where is mean arrival rate of th messages for types of call, is mean arrival rate of th messages for types of call, and is probability of th and th message for types of call.

The server utilization for calls is computed by

The probability of messages entering the queue for services is equal to the probability that all servers are BUSY. Thus, the mean for each message in the set is computed by

The total session completion time for terminating a call may be defined as the addition of message entering time and waiting time which is multiplied with service time. The session termination time () is computed by where is mean service time , .

The distributed messages wait in the queue for a certain period of time before processing and so this waiting time affects the response time-based scheduler. The proposed work reduces the message waiting time; thus, the mean response time gets minimized.
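As a hedged illustration of how waiting time enters the response time, the standard steady-state M/M/1 results can be evaluated as follows. These are textbook formulas rather than the exact expressions of this work; the function name and the example rates are assumptions.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 quantities for one server queue.

    arrival_rate: Poisson arrival rate (messages per second).
    service_rate: exponential service rate (messages per second).
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    utilization = arrival_rate / service_rate
    if utilization >= 1:
        # No steady state exists: the queue grows without bound.
        raise ValueError("server is overloaded (utilization >= 1)")
    waiting = utilization / (service_rate - arrival_rate)  # mean time in queue
    sojourn = 1.0 / (service_rate - arrival_rate)          # queue plus service
    return {"utilization": utilization, "waiting": waiting, "sojourn": sojourn}
```

For example, `mm1_metrics(300, 500)` gives a utilization of 0.6, a mean queueing time of 3 ms, and a mean time in the system of 5 ms, which shows how quickly the waiting component dominates as the arrival rate approaches the service rate.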

3.2.1. Minimizing Mean Response Time

Figure 7 shows the load balancer which receives a continuous stream of messages to be processed in the message set as where = 1, 2, and 3.

These messages require an amount of processing time that varies by several orders of magnitude (size, cost, etc.). In Figure 7, the clients and their generated messages are shaded in different colors. The client1 call is terminated after its INVITE messages are served. The call termination varies according to the cost of the message, which means that the INVITE message takes a longer time to complete a call than the BYE message [23]. Hence, any response time-based scheduling algorithm has to consider the delay, which decides the efficiency of the algorithm. It is assumed that the server has a fixed capacity and the sum of the message sizes in the set must be less than or equal to this capacity. The Utilized Parameters section lists the notation used in this work.

It is learnt that the service discipline at each server is FIFO in which scheduling and transmission times are negligible. The arriving SIP message spends a particular time in the system which is denoted as waiting time and computed by The weighted total response time of the message is given in The waiting time weights have a different load index such that

To minimize the waiting time of each message in the message set, the delay factor with known factors of arrival time and a deadline of each SIP message is considered. For example, INVITE message waits until  ms and its corresponding response message waits until [1]. Therefore, the proposed system measures how long a message is delayed compared to its deadline. The delay time is measured according to the waiting time of the message in the queue. Thus, the minimization of the weighted mean waiting time is equivalent to the minimization of the weighted total waiting time. To minimize the total waiting time, the existing work takes a maximum flow time and max stretch (delay) [42, 43]. These metrics are desirable for the real-time application, but the maximum flow metrics increases the sequence of message arrival rate which results in starvation. The maximum delay time instead of max flow is applied so as to avoid starvation that is computed by where .

Equation (17) is rewritten as The delay time of a message is its waiting time divided by its processing time as given in

To reduce the mean response time, decision process assigns the optimal delay time to each message and sorts the message sequences according to the nondecreasing delay time. Then, the load balancer selects the SIP message which has a maximum delay and distributes it to an appropriate server. The LSTT algorithm computes the mean session termination time of each server by where is th server session termination time, is th server cumulative termination time, , where , Session is th session termination time, and is th server newly received session termination time.

Finally, the selection process identifies the server in the history window by using (21). Note that messages of the same size and with the same delay time are executed in FIFO order. The load balancer distributes the messages to the server with the least session termination time.
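The ordering of messages by delay can be sketched in Python as follows. The sketch assumes that the delay of a queued message is its waiting time so far divided by its processing time, and that ties are broken in FIFO order; the class, field names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    arrival: float      # arrival time in the queue (seconds)
    processing: float   # processing time required (seconds)


def delay(msg, now):
    """Delay = waiting time so far divided by processing time."""
    return (now - msg.arrival) / msg.processing


def next_message(queue, now):
    """Pick the queued message with the maximum delay; ties fall back to
    FIFO order (earliest arrival first)."""
    return max(queue, key=lambda m: (delay(m, now), -m.arrival))
```

With an INVITE that arrived at time 0 and needs 4 time units and a BYE that arrived at time 1 and needs 1 time unit, the BYE has the larger delay at time 3 (2.0 against 0.75) and is dispatched first, so a short message is not starved behind a long one.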

4. Experimental Setup

Figure 8 depicts the experimental test bed which consists of the various experimental components as listed in Table 1.

4.1. Call Generation

For generating a huge amount of call traffic, is used as an open source tool coded with XML for the implementation. When a new call is generated by INVITE message, the SIP proxy forwards it to the corresponding client and the appropriate response will be received. Once a call is initiated successfully, the media data is exchanged; then, the session termination occurs with standard BYE message and its corresponding ACK message. has been configured in such a way that the load is generated continuously without any interruption. The maximum capacity of the proxy and its performance has been measured under heavy load conditions.

4.2. Hardware Implementation

The OpenSIP express version 1.1 (OpenSER) is an open source proxy server coded in C and it is considered to have high efficiency with minimal functionality. The database stores OpenSER configuration data and registers client information. An IBM blade server with 4 GB RAM and 100 GB ATA disk drives is used to carry out these experiments. In the study, it is observed that the CPU cycles spend more time in stateful proxy. However, a stateful proxy records the route for future use.

4.3. Load Balancer Design

It has also been observed that, without handler mechanism, a server drops the messages which results in retransmission. On the other hand, the SIP server sends 503 Service Unavailable responses to avoid unwanted retransmissions to the corresponding clients [30]. This response scenario cannot provide a better performance owing to the relatively high processing cost. Initially, a set of experiments is conducted with handler prior to the load balancer and this combination can be used to check the test call sessions. The full session is captured and analyzed in the two-tier model to ensure the total number of sessions for a call from a client to a server and vice versa. The load balancer maintains an active table, to track the incoming load rate and the number of active sessions in the server. The timer is set for 1 sec to update the active table regularly. The proposed algorithm gives a pause time with a mean of 1 min and a variance of 30 sec.

5. Performance Comparison

The load generated in this experiment varies up to a maximum of 8000 calls. The detection rate of the handler, server throughput, response time, CPU utilization, and error rate are measured. The obtained results are compared with the existing algorithms (response time [25] and Transaction-Least-Work-Left (TLWL) 1.75 [22]). To identify the performance of the two-tier model, two different implementations are executed: the load balancer with the handler and the load balancer without the handler. Each implementation is measured for an average of 10 min, and every event is measured for 120 sec with a 5 sec warm-up period. Every test is run 10 to 20 times to validate the results, and the average values of the various parameters are analyzed.

5.1. Attack Packet Distribution

Figure 9 shows the SIP signaling attack packets observed before they reach the handler. It can be observed that the fake and flooding traffic causes significant changes in the VoIP network. Over a period of one week, 1821 flooding and 632 fake SIP messages are observed. In addition, 97 timeouts and 117 retransmission packets are also obtained. Therefore, a strong protection mechanism is required to drop the unwanted traffic.

5.2. Handler Detection Rates

The generates a normal traffic and is mixed with attack packets from 100 to 400 packets per second (pps). The experiment is repeated for several runs up to 20 times. The proposed process on the handler is capable of identifying lower arrival rate of 50 pps. The duration of the detection rate varies from 5 to 10 sec for increasing the Poisson rate from 100 to 400. The detection rates are calculated from the relative proportions of the SIP attributes during the same period of time. It is found to have certain great fluctuations as shown in Figures 10(a) and 10(b). Existing Hellinger Distance [36] based detection methods require a maximum threshold value of = 0.4. The proposed handler utilizes only = 0.08 for flooding and fake message consumes a maximum of = 0.05.

5.3. False Alarm Rate: Fake Messages

Figure 11 shows the false alarm rate for fake signaling packets. The proposed handler is found to have a good effect on correctly identifying the fake signaling packets. As the call volume increases, the false alarm rate of misclassified fake signal packet reaches less than 0.2 for the proposed handler and 0.5 for the existing system.

5.4. Response Time

Tables 2, 3, and 4 depict the average response time of the proposed two-tier model and existing algorithm. A maximum of 3 servers are used in this experiment and the server capacity is varied from 300 to 500 cps. The average response time is also varied while increasing the load to the maximum of 1000 cps. The existing algorithm [25] takes a maximum of 2.8 ms whereas the LSTT maintains less than 1.41 ms for 1000 cps. Similarly, service rate of 1 ms is applied and and are varied accordingly, for which a LSTT algorithm results in a maximum of 1.9 ms and 0.3 ms response time, respectively, as given in Tables 3 and 4.

5.5. Load Balancer Throughput

The throughput of existing algorithms [22, 25] is compared with the proposed algorithm as given in Tables 5 and 6. The throughput results are obtained for generating a load of 8000 cps without pause duration. From Table 5, it is observed that the LSTT achieves the highest throughput of 99%. However, the existing response time [25] attains only 90% and TLWL 1.75 [22] reaches a maximum of 91.6%.

The number of clients is increased up to 10 for a maximum of 3 servers. The obtained throughput results are furnished in Table 6 and it is obvious that the proposed algorithm works better in spite of increasing the client rate. Efficiency of the LSTT algorithm is verified by removing the first tier and it is observed that the LSTT linearly increases the throughput for a maximum of 6821 cps without a handler and 7420 cps with a handler as shown in Figure 12.

5.6. CPU Utilization

Table 7 summarizes the execution time of the LSTT algorithm and the corresponding CPU utilization. The execution time for the estimation process of the LSTT algorithm is the time between a BYE request from the client and its corresponding ACK from the server. The two-tier model takes less time to accomplish a SIP transaction because it spends a minimum amount of time analyzing the messages, whereas the other algorithms monitor and analyze a large number of messages.

5.7. Error Rate

Figure 13 demonstrates the error rates of different load balancing algorithms. The error rate is the percentage of unsuccessful requests which do not get the appropriate servers since the load balancer spends more time in message analysis.

From Figure 13, it is observed that the better load balancing algorithm increases the acceptance rate of load and rejects the request that spends more time on consumption of resources. Furthermore, the arrival rate of excessive requests in the load balancer takes more processing time and these requests are not required for services. Spending more time to analyze these requests raises the error rates. However, dropping of these messages could result in a maximum throughput. The error rate result clearly brings out that there are a number of requests that are not effectively distributed due to timeout conditions. The proposed LSTT algorithm achieves 5% of error rate at offered load of 4500 cps and TLWL 1.75 generates 15% of error rate.

6. Conclusion

In this paper, a two-tier model was proposed to eliminate the malicious traffic in the first tier and provide uniform load distribution in the second tier. The proposed LSTT algorithm ensures that all inbound/outbound SIP messages are routed to an appropriate server which has the least session termination time at the moment. The experimental results show that the proposed model significantly improves the throughput and achieves 99% CPU utilization even when the offered load reaches a maximum of 8000 cps. At the same time, the system responds quickly to variations in the generated load and achieves better performance with a reduced error rate of 5%. Thus, it is concluded that the proposed two-tier system provides enhanced performance while preserving QoS.

Utilized Parameters

:SIP messages
:Arrival time
:Processing time
:Waiting time
:Deadline
:Delay
:Message completion time
:Slowdown or slack time
WT:Weight.

Conflicts of Interest

The authors declare that they have no conflicts of interest.