Abstract

This paper describes the design and prototype implementation of a communication platform that provides voice and video communication in a distributed networking environment. Performance considerations and network characteristics have been taken into account in order to provide the set of properties dictated by the sensitive nature and the real-time characteristics of the targeted application scenarios. The proposed system has been evaluated both experimentally and through subjective tests taken by 32 users. The results show that the proposed platform operates seamlessly over two hops, while in the four-hop scenario audio and video are delivered with marginal distortion. The conducted survey indicates that users rated the Quality of Service higher in the two-hop scenario.

1. Introduction

Mobile ad hoc networks (MANETs) have received particular attention in recent years because of their wide range of applications, such as real-time voice and video communication in situations where the existing telecommunication infrastructure may fail. The introduction of low-cost wireless technologies and the standardization efforts of the IETF MANET Working Group have renewed interest in research and development of MANETs outside the military field. New hardware and software products have removed many of the barriers of the past, enabling the development of integrated platforms that provide a wide spectrum of services.

In this dynamic and distributed environment it is important to deploy multimedia applications and services. This calls for large-scale deployment of P2P voice and video, since many users become aware of the capabilities of these newly developed architectures and migrate to them. Wireless multihop networks show great potential because of characteristics such as node mobility and extended packet-forwarding ability [1]. However, simply applying current peer-to-peer overlay techniques to MANETs is undesirable because of node mobility, energy consumption, and lack of infrastructure. For power consumption and response time reasons the overlay needs to reflect the underlying physical network topology, and control schemes designed for wired networks cannot accommodate a peer group in which nodes constantly join and leave. An additional issue in this network architecture is peer cooperation. As shown in [2], most topology control algorithms assume that peers are cooperative, which is not the case in practice: peers try to minimize their own costs, such as the number of necessary communication links or the distance to other peers. Several studies [3] investigate the impact of selfish peers on the topology, but they take a rather theoretical approach in which peers have the global knowledge considered fundamental for overlay construction. In the absence of a practical overlay topology control algorithm, other means need to be established. Since traditional approaches tend to show decreased performance, peer-to-peer services in MANETs may need a new design to achieve better results. It is no coincidence that even network operators are looking for ways to profit from these novel applications [4]. The main issue in such topologies, however, is network capacity.
A performance evaluation presented in [5] shows the influence of intraflow and interflow interference on channel utilization, which directly impacts VoIP capacity. In more controlled environments such problems are less pronounced, or additional components such as wireless mesh routers [6] can be an effective solution. Nevertheless, current P2P search and routing algorithms do not meet the extremely low delay that real-time multimedia applications demand [7], so other network parameters need to be examined. Figure 1 presents an example of an unstructured peer-to-peer overlay in comparison with all the nodes participating in the physical network.

In this paper we describe the design and prototype implementation of a communication platform that provides voice and video communication in a distributed networking environment. Performance considerations and network characteristics have been taken into account in order to provide the set of properties dictated by the emergency and sensitive nature of the targeted application scenarios. The proposed system has been evaluated both experimentally and through subjective tests. The results show that the proposed platform operates seamlessly over two hops, while in the four-hop scenario audio and video are delivered with marginal distortion. The conducted survey indicates that users rated the Quality of Service higher in the two-hop scenario.

The rest of the paper is organized as follows. In Section 2 we present the basic system architecture, while in Section 3 we present the results regarding the objective performance evaluation and the subjective survey-based user rating of the platform. Finally, Section 4 concludes our paper.

2. System Architecture

2.1. Network Organisation

A primary decision that needs to be made in defining the proposed communication platform is the type of underlying network organization. The autonomic nature of the network, where each node operates in a standalone fashion, is matched by several network organization paradigms, namely, peer-to-peer networks, mobile ad hoc networks, and so forth.

Given that the application focus of the proposed communication platform is on real-time multimedia provisioning, the underlying network organization needs to meet a set of basic properties. To begin with, ease of deployment is necessary so that the network can be set up quickly and in a straightforward manner. Depending on the conditions in the deployment area, the network topology should be formed using simple procedures that do not require much computation or communication. Furthermore, the decisions made about the construction and the operation of the network should be distributed across the network nodes, thus avoiding a single point (or a limited set of points) of failure. Resiliency to node failures is another requirement of the underlying network.

By definition, mobile ad hoc networks (MANETs) satisfy all the aforementioned properties. MANETs consist of wireless network devices that operate without any kind of centralized control or fixed communication infrastructure. Each network node operates not only as a host but also as a router, forwarding packets to other nodes that may not be within direct transmission range of each other. Therefore, each packet is transmitted to its destination in a multihop manner. The autonomous nature of MANETs fits the required properties of real-time multimedia provisioning, while at the same time posing interesting challenges in defining efficient protocols in this direction.

2.2. Peer-to-Peer MANETs

In order to efficiently provide real-time multimedia transmission over MANETs, some notions from the formation and operation of peer-to-peer (p2p) networks are employed. Such networks are built on an overlay network topology that is formed on top of the actual (physical) network topology and dictates how the network peers are logically connected to each other.

Peer-to-peer overlay networks generally fall into three classes according to the structure of their topology: structured, unstructured, and hybrid. In structured overlays, the architecture is controlled in an organized manner, with content placed at specific locations across the network to increase the efficiency of lookup queries. In unstructured overlays, the network is organized randomly in either a flat or a hierarchical manner, and queries are executed using flooding, random walks, or expanding-ring Time To Live (TTL) techniques. Each peer that receives a query searches its own local content, which allows more complex queries to be executed. Finally, in hybrid peer-to-peer architectures, queries are handled by a central server that holds a database of content and its location within the overlay network. Peers look up the content on the main server and then connect through the overlay to the peer holding it. Peers are also responsible for informing the main server of the resources and content they wish to share. This dependence on a central server makes hybrid architectures an unfavorable solution for MANETs because of the single point of failure.
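The expanding-ring TTL technique mentioned for unstructured overlays can be illustrated with a minimal sketch. It is not taken from the platform itself: the overlay representation (a dictionary of peers with "neighbors" and "content" entries) and the function names flood_query and expanding_ring_search are assumptions made for illustration only.

```python
from collections import deque


def flood_query(overlay, start, wanted, ttl):
    """Flood a query from `start` and return the peers that hold `wanted`
    within `ttl` overlay hops. Each peer checks only its own local content."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        peer, depth = frontier.popleft()
        if wanted in overlay[peer]["content"]:
            hits.add(peer)
        if depth == ttl:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in overlay[peer]["neighbors"]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return hits


def expanding_ring_search(overlay, start, wanted, max_ttl):
    """Repeat the flood with TTL 1, 2, ... and stop at the first ring that
    yields a hit. Returns (ttl_used, hits) or (None, set()) if nothing is found."""
    for ttl in range(1, max_ttl + 1):
        hits = flood_query(overlay, start, wanted, ttl)
        if hits:
            return ttl, hits
    return None, set()


# Hypothetical five-peer overlay used only to exercise the sketch.
OVERLAY = {
    "A": {"neighbors": ["B", "C"], "content": set()},
    "B": {"neighbors": ["A", "D"], "content": set()},
    "C": {"neighbors": ["A"], "content": set()},
    "D": {"neighbors": ["B", "E"], "content": {"clip.avi"}},
    "E": {"neighbors": ["D"], "content": {"clip.avi"}},
}
```

In this toy overlay a search started at peer A finds the content at peer D with a TTL of 2 and never reaches peer E, which shows how the growing TTL limits the number of peers that have to process the query.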

In order to set up, maintain, and tear down multimedia sessions between network nodes, clearly defined procedures are needed for establishing, using, and terminating a logical connection between the terminal nodes. These procedures are part of a signaling protocol, which in the proposed communication platform is based on the P2PSIP protocol [8]. P2PSIP is a peer-to-peer variant of the Session Initiation Protocol (SIP) [9] that enables distributed storage of user information, such as registration information and logical position within the overlay, and then handles all user queries leading to the establishment of a real-time communication session. The most essential component of the P2PSIP architecture is the Distributed Hash Table (DHT). It is distributed in the sense that it is divided into several parts, each located inside an overlay peer, and it stores the physical addresses of all participating nodes so that they can be used for resource availability lookup. Each node that receives a query for the address of a certain node searches its fraction of the DHT. If the fraction contains the requested information, the node returns it to the node that posted the query; otherwise it forwards the request to its logical neighbors [10].
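The lookup behaviour described above (search the local DHT fraction, otherwise forward to the logical neighbors) can be sketched as follows. This is only an illustration of that description, not the P2PSIP implementation used in the platform: real DHTs usually route by key distance, and the peer names, addresses, and the lookup function are hypothetical.

```python
def lookup(peers, node_id, target, visited=None):
    """Resolve `target` starting at `node_id`.

    Each peer first searches its own fraction of the table; on a miss it
    forwards the request to logical neighbors that have not seen it yet.
    Returns (address, path), where path lists the peers the answer
    travelled through, or (None, []) when no peer knows the target."""
    visited = set() if visited is None else visited
    visited.add(node_id)
    peer = peers[node_id]
    if target in peer["table"]:
        return peer["table"][target], [node_id]
    for neighbor in peer["neighbors"]:
        if neighbor in visited:
            continue
        address, path = lookup(peers, neighbor, target, visited)
        if address is not None:
            return address, [node_id] + path
    return None, []


# Hypothetical three-peer overlay; user names and addresses are made up.
PEERS = {
    "n1": {"neighbors": ["n2"], "table": {"alice": "10.0.0.11"}},
    "n2": {"neighbors": ["n1", "n3"], "table": {"bob": "10.0.0.12"}},
    "n3": {"neighbors": ["n2"], "table": {"carol": "10.0.0.13"}},
}
```

A query for "carol" issued at n1 misses locally, is forwarded to n2, and is finally answered by n3, so the returned path is n1, n2, n3.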

2.3. Routing Protocols

Due to the multihop nature of packet forwarding in MANETs, routing is a fundamental task that directly affects their performance. Therefore, efficient routing mechanisms are integral to a communication platform providing real-time multimedia in this context.

Designing routing protocols for MANETs has been a very active research field in recent years. The challenges that must be met are both numerous and diverse. The frequently changing topology is among the basic factors that must be taken into account, since the routes calculated at any given time are subject to repeated change. Another priority is energy conservation since, in the general case, the lack of fixed infrastructure means that recharging network nodes is very difficult or costly, if not impossible. Furthermore, attention should be paid to distributing the traffic evenly across the network.

Traditionally, the basic categorization of routing protocols in MANETs is made based on whether their operation is proactive or reactive (on-demand). A protocol is defined as proactive when the routes from every node to everyone else are calculated and updated in a periodic fashion, while in the on-demand case a route is obtained only upon request from a packet. Both categories have advantages and disadvantages. Depending on the special characteristics of the network deployment setting, each of the aforementioned types of routing protocols may be suitable. Typical representatives of the proactive protocols are DSDV [11] and OLSR [12], while DSR [13] and AODV [14] are classic examples of on-demand protocols.

In the proposed communication platform a hybrid routing protocol is employed, ChaMeLeon (CML) [15]. The main concept of CML is the adaptability of its routing mechanisms according to the changes in the network topology. More specifically, it consists of 3 phases of operation, namely proactive, oscillation, and reactive. The basic criterion for the operation type selection is the network size. In relatively small networks, routing is implemented in a proactive fashion using the OLSR protocol, whereas when the number of nodes grows larger, CML utilizes the reactive Ad hoc On-Demand Distance Vector (AODV) [14].
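As a rough illustration of size-based phase selection, the following sketch maps a node count to one of the three CML phases. The source states only that network size is the criterion; the two threshold values and the assumption that the oscillation phase covers the band between them are hypothetical and are not taken from the CML specification [15].

```python
def select_cml_phase(node_count, lower_threshold=10, upper_threshold=15):
    """Pick a CML phase from the current network size.

    Both thresholds are placeholder values chosen for illustration; the
    band between them is assumed to correspond to the oscillation phase."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if lower_threshold > upper_threshold:
        raise ValueError("lower_threshold must not exceed upper_threshold")
    if node_count <= lower_threshold:
        return "proactive"   # small network: OLSR-style routing
    if node_count >= upper_threshold:
        return "reactive"    # large network: AODV-style routing
    return "oscillation"     # assumed transitional band between the two
```

With these placeholder thresholds, a six-node network such as the testbed used later in the paper would fall into the proactive phase.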

2.4. Architecture Design

In implementing a prototype that takes into consideration the latest trends in the aforementioned signaling, architecture, and routing protocols, certain issues had to be addressed, mostly regarding the communication scheme among the participating entities. To make this scheme clear prior to the implementation, the signaling diagram presented in Figure 2 was introduced.

There are five individual entities in this signaling diagram: the Joining Node (JN), which tries to access the overlay for the first time; the Initial Overlay Node (ION), which acts as the point of entry for the JN; the Bootstrap Node (BN), which is the key entity for DHT and overlay access and update; the Intermediate Node (IN), which plays the role of the packet carrier in the communication process; and the Destination Node (DN), with which the JN tries to establish communication in the first place. Normally, several peers between the JN and the DN act as INs, thus facilitating communication. In all the tested scenarios described in the following paragraphs, more than one intermediate node is present. Nevertheless, for simplicity, this particular diagram illustrates only one hop in the overall communication process between JN and DN.

The joining node first initiates a neighbor discovery mechanism to detect whether any peer is within its communication range. JN broadcasts a HELLO message, which is recognized by every peer already connected to the overlay, and receives an ACK HELLO message in response. JN then transmits a JOIN Request to the sender of the ACK HELLO message, which in this particular case is the ION. After receiving the JOIN Request, ION forwards it towards the Bootstrap Node. BN then issues a JOIN Response towards the JN along with a STORE Request containing the public keys of JN's logical neighbors, and JN confirms successful reception of the latter with a STORE Response message. BN then issues UPDATE Requests towards both JN and its logical neighbors (in this particular case the IN), informing them that they are connected to each other, in a message that contains JN's local view of the overlay. Finally, BN asks JN to confirm the overlay join and receives a 200 OK message in response.
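The join procedure described above can be written down as an ordered message trace, which is convenient for checking an implementation against the signaling diagram. The sketch below only restates the sequence given in the text; the tuple layout, the helper names, and the label used for the final confirmation request (which the text does not name) are assumptions.

```python
# (sender, receiver, message) in the order given in the text.
# "ALL" marks a broadcast; entries show logical endpoints, not relays.
JOIN_SEQUENCE = [
    ("JN", "ALL", "HELLO"),
    ("ION", "JN", "ACK HELLO"),
    ("JN", "ION", "JOIN Request"),
    ("ION", "BN", "JOIN Request"),       # ION forwards the request to BN
    ("BN", "JN", "JOIN Response"),
    ("BN", "JN", "STORE Request"),       # carries neighbors' public keys
    ("JN", "BN", "STORE Response"),
    ("BN", "JN", "UPDATE Request"),
    ("BN", "IN", "UPDATE Request"),      # IN is the logical neighbor here
    ("BN", "JN", "join confirmation request"),  # label assumed, not named in the text
    ("JN", "BN", "200 OK"),
]


def format_trace(sequence):
    """Render the sequence as numbered, human-readable lines."""
    return [
        f"{step:2d}. {src} -> {dst}: {msg}"
        for step, (src, dst, msg) in enumerate(sequence, start=1)
    ]


def messages_sent_by(entity, sequence):
    """List the messages a given entity sends, in order."""
    return [msg for src, _, msg in sequence if src == entity]
```

Printing the lines returned by format_trace(JOIN_SEQUENCE) gives a textual counterpart of the signaling diagram for the overlay join phase.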

Communication between JN and DN includes video and sound over an IP network. Therefore, once the software described in Section 2.5 becomes operational, JN sends a request for Videoconference (VC) initiation. This request is forwarded through IN to the DN over a single-hop path, which is used for all signaling and packet transfer during the whole session. DN responses also travel over this single-hop path. After JN receives the 200 OK message in reply to its VC request, the actual call begins. The monitoring software identified the whole call as UDP packet traffic, which is common in VoIP and real-time communication schemes. After the call ends, JN sends a Termination request and the overall process finishes with a 200 OK message from the DN.

2.5. Extended WengoPhone Prototype Implementation

A prototype implementation was essential as a proof of concept of the solution we propose in this paper for real-time multimedia communication over ad hoc networks. With voice and video being the two most fundamental elements of human interaction, the prototype was designed to support both in a seamless combination. Voice is provided by the VoIP capability that the platform already included, and videoconference support was added to further improve Quality of Service and user experience. The implemented software uses libraries and repositories of the WengoPhone project [16], a SIP-compliant VoIP client developed under the GNU General Public Licence (GPL). In addition, we extended the features of the software by developing tools that start a mobile device's web camera, capture video, and encode it using the H.263 video compression algorithm [17]. This algorithm is considered to be optimized for video transmission over wireless networks and is also published under an open source licence, unlike its successors, whose many patent limitations prevent us from exploiting all their features. The latest version of WengoPhone with the newly enabled features has a redesigned Graphical User Interface (GUI), shown in Figure 3.

Given that video quality is a key feature, we decided to include several quality levels that can be accessed instantly from a menu located at the bottom of the graphical interface presented in Figure 3. The new window for video settings is also shown in that figure. After all attributes have been set according to the user's preferences and the wireless network's capabilities, a video call is performed as shown in Figure 4.

The application was developed using Microsoft Visual Studio 2005 and was tested on three different Asus EEEPCs with the following specifications: Atom CPU running at 1.6 GHz, 1 GB RAM, 160 GB HDD, and Windows 7 Professional Edition, each representing a mobile device in an extreme emergency communication scenario. For the GUI we used Qt, a cross-platform application framework and widget toolkit, since several alterations were considered necessary in terms of user friendliness to provide direct access to the new features. No external video camera or wireless card was used apart from those provided by the EEEPCs, a 1.3 Megapixel Logitech camera and an Atheros 802.11g/n compatible interface. A major concern during the development of the WengoPhone extensions was how to abandon the monolithic architecture of the previous version and move towards a more flexible platform that would make future changes and code upgrades easier. We managed to achieve this goal by keeping the H.263 algorithm implementation relatively modular and by avoiding tampering with preexisting pieces of code, except where this was absolutely necessary.

3. Results

The software prototype described in Section 2.5 requires a complete testbed for extended evaluation of the new capabilities it supports. In particular, six wireless nodes forming an ad hoc network were used to check voice and video quality. Although we expected that this relatively demanding platform would consume all available CPU and memory capacity of low-end equipment such as an EEEPC, in all our tests CPU utilization never exceeded 40% and memory usage remained below 15% of total capacity. The topology used for our tests is illustrated in Figure 5.

Only two of the six nodes had the WengoPhone software installed, and they are depicted in all figures as WengoClient 1 and 2. Most nodes forming the ad hoc network use Windows 7 as their operating system, together with a compatible OLSR implementation for routing purposes. The one exception is the node that plays the role of the Bootstrap Node, which runs Ubuntu Linux. Static IP addresses were assigned to all nodes to ensure that no interference from any foreign network would compromise the evaluation. The OLSR and CML protocols were configured so that a specific number of hops was established between nodes, according to the two evaluation scenarios presented in the following section.

3.1. Evaluation Scenarios

To evaluate the performance of the implemented software, two scenarios were designed. Nodes 1 and 2, which have WengoPhone installed, try to establish a call to each other. To make that call, they first have to join the overlay, which is maintained by the Bootstrap Node. The main difference between the two scenarios is the routing configuration in the OLSR and CML protocols. In the first scenario all IP packets involved in the call between Node 1 and Node 2 are diverted through Node 3, giving a total of two hops within the network overlay before they reach the destination node, Node 2 or Node 1, respectively. This is expected to keep packet loss and jitter relatively low, thus providing a service as close to direct connection calls as possible. In the second scenario the OLSR and CML configuration is changed substantially. Instead of a total of two hops, IP packets from Node 1 to Node 2 are routed through all overlay nodes, which means a four-hop route. This extensive rerouting is likely to increase packet loss and jitter, causing issues in video quality due to image distortion. To obtain a more complete estimate of the performance of our software implementation in terms of video delivery, two sets of measurements per scenario were taken. During the first one, video quality was set to “Normal”, requiring available bandwidth of 0–512 Kbit/s for upload and 0–128 Kbit/s for download, while for the second the video quality was increased to “Very Good”, requiring available bandwidth of 512–2048 Kbit/s for upload and 128–256 Kbit/s for download, a considerable increase over the “Normal” setting. For monitoring all traffic between nodes we used the Wireshark network analyser [18]. In both scenarios the length of each call was set to 120 sec, which is long enough to measure all network characteristics.

3.2. Experimental Results

Real-time applications such as VoIP and video conferencing platforms are extremely sensitive to jitter and packet loss. Packet loss compromises voice and video quality because the data flow from the source to the destination is interrupted instead of being continuous: the time slot of a lost packet expires and, if losses accumulate, communication becomes unbearable. Unlike other applications, the system cannot afford to wait for a retransmission, so users do not experience the best possible Quality of Service. Jitter refers to undesired variation in packet arrival times. When traffic is delayed, data can be buffered accordingly, but when the delay keeps accumulating the buffer can no longer sustain packet delivery, which can result in video distortion or jerkiness. For voice, jitter causes gaps in communication and problematic system behaviour in general.
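To make the notion of jitter concrete, the sketch below computes a running jitter estimate from packet send and receive times. The paper does not state which jitter definition was used in the measurements; the smoothed interarrival estimator of RFC 3550 is assumed here, and the function name and the sample timestamps are illustrative only.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Return the running jitter estimate (in ms) after each packet.

    Assumes the RFC 3550 estimator: for consecutive packets i-1 and i,
    D = (R_i - R_{i-1}) - (S_i - S_{i-1}) and J += (|D| - J) / 16."""
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("send and receive lists must have the same length")
    jitter = 0.0
    history = []
    for i in range(1, len(send_times_ms)):
        transit_delta = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        jitter += (abs(transit_delta) - jitter) / 16.0
        history.append(jitter)
    return history
```

For packets sent every 20 ms and received with a constant delay the estimate stays at zero, whereas a single packet arriving 16 ms late raises it to 1 ms, after which it decays slowly. This smoothing is why sustained delay variation, rather than an isolated late packet, drives the reported jitter up.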

We used the Wireshark network analyser [18] to monitor and record traffic and packet exchange between nodes, as shown in Figure 6. In both scenarios, a Wireshark instance was placed so that the traversing packet flow could be logged, producing a capture file for further analysis. By focusing on jitter and packet loss we were able to assess the behaviour of the whole testbed, not only that of the extended version of WengoPhone we developed. Our analysis brought to light several interesting results regarding multihop topologies and how an increase in the number of hops inevitably leads to deterioration of voice and video quality.

Figure 7 presents the jitter measured when a video-call session was established between Nodes 1 and 2 in each of the scenarios presented earlier. Video quality was set to “Normal” in both sessions, making the number of hops their only difference. In Scenario 1, where only two hops are needed for a UDP packet to travel from source to destination, jitter never exceeded 60 ms. In Scenario 2, with four hops between source and destination, jitter had an average value of 81 ms, significantly higher than in the previous topology, yet acceptable.

We measured jitter again, this time after terminating all previous calls, restarting WengoPhone, and setting the video quality of the video-call session between Nodes 1 and 2 to “Very Good”. This time jitter values were generally higher because of the heavier traffic caused by the larger number of packets in each topology. The results are shown in Figure 8: jitter in Scenario 1 has a mean value of 64.3 ms, while in Scenario 2, with four hops end-to-end, it rises well above 110 ms. A jitter value greater than 100 ms calls for a jitter buffer, a temporary fix that is sometimes of no use in real-time applications.

In addition to jitter, Packet Loss was measured for all previously mentioned scenarios and video quality settings. The results are shown in Figures 9 and 10.

Typical values for acceptable packet loss in a videoconference system range from 0.1% to 1%. Video is less sensitive than voice. According to VoIP standards, packet loss greater than 1% is likely to compromise the whole session, leading to several audio drop-outs [2]. This leads to the conclusion that in the four-hop routing of Scenario 2 users might experience problems, since the packet loss percentage rises to 1.03% after less than 120 sec of established audiovisual communication.
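The packet loss percentage discussed above can be derived from the sequence numbers observed in a capture. The following is a minimal sketch under stated assumptions: packets carry consecutive integer sequence numbers without wrap-around, and the function names and the default 1% threshold argument are illustrative rather than part of the measurement setup.

```python
def packet_loss_percent(received_seq, first_seq, last_seq):
    """Percentage of packets in [first_seq, last_seq] that never arrived.

    Duplicates are counted once; sequence-number wrap-around is ignored."""
    if last_seq < first_seq:
        raise ValueError("last_seq must not be smaller than first_seq")
    expected = last_seq - first_seq + 1
    received = len({s for s in received_seq if first_seq <= s <= last_seq})
    return 100.0 * (expected - received) / expected


def exceeds_loss_threshold(loss_percent, threshold_percent=1.0):
    """True when the loss is above the level quoted as acceptable (1%)."""
    return loss_percent > threshold_percent
```

Applied to the figure reported for Scenario 2, exceeds_loss_threshold(1.03) is true, which matches the observation that the four-hop route sits just above the 1% limit.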

3.3. Subjective Evaluation

The International Telecommunication Union (ITU) has published several recommendations that define standards for subjective assessment methods to be used in one-way overall audiovisual quality evaluation. These methods can be used for several different purposes, including but not limited to ranking audiovisual system performance and evaluating the quality level during an audiovisual connection. The most commonly used ITU Recommendation is P.911 [19], which presents several test methods and experimental design techniques. Although a number of such methods have been validated for different purposes, the final choice of method for a particular application depends on various factors such as the context, the purpose, and where in the development process the test is to be performed. Of all the methods described in [19], the most suitable for our evaluation is the Absolute Category Rating (ACR). More information regarding this method, as well as the numerical rating scale we used, can be found in the Appendix.

3.3.1. Subjects

The possible number of subjects in a viewing and listening test, as well as in usability tests on terminals or services, varies from 6 to 40. Four is the absolute minimum for statistical reasons, while there is rarely any point in going beyond 40 [19]. The actual number in a specific test should depend on the required validity and the need to generalize from a sample to a larger population. In general, at least 15 subjects should participate in the experiment. They should not be directly involved in picture or audio quality evaluation as part of their work and should not be experienced assessors. Prior to the session the observers should usually be screened for normal color vision as well as normal or corrected-to-normal acuity. In our evaluation 32 individuals were divided into two groups. Each group participated in videoconference sessions in random pairs without knowing the number of intermediate hops their packets traversed, so that their evaluation of the overall experience would not be biased.

After a videoconference session of 120 sec was concluded, subjects were given a form containing a set of four standardized questions for the aforementioned rating. These questions were as follows.
(1) How would you rate the video quality of the connection?
(2) How would you rate the audio quality of the connection?
(3) How would you judge the effort needed to interrupt the other party?
(4) How would you rate the overall audiovisual quality?

Answers to questions one, two, and four were given on the nine-level scale, with greater numbers indicating more assigned points and therefore higher user satisfaction. The third question was scored differently: since in a perfect video call little effort should be sufficient for one participant to interrupt the other, points were assigned in inverse proportion to the effort needed. More points indicate less effort and therefore a better user experience.
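Ratings collected on such a scale are usually summarized per question as a mean score with a confidence interval. The sketch below shows one way to do this; it assumes the nine-level scale runs from 1 to 9 and uses a normal-approximation 95% interval. The function name is hypothetical and no survey data from this study is reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_ratings(scores, scale_min=1, scale_max=9):
    """Return (mean score, half-width of an approximate 95% confidence interval).

    Scores outside the assumed 1..9 range are rejected. The interval uses
    the normal approximation 1.96 * s / sqrt(n)."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not scale_min <= s <= scale_max for s in scores):
        raise ValueError("score outside the rating scale")
    average = mean(scores)
    if len(scores) < 2:
        return average, 0.0
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return average, half_width
```

Comparing the intervals obtained for the two-hop and four-hop groups gives a quick indication of whether a difference in mean scores is larger than the spread within each group.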

3.3.2. Subjective Evaluation Results

In Figure 11 the results of the video quality evaluation are shown. It is notable that although video quality was set to “Normal” in both cases, there is a small but noticeable variation in user scores. The explanation is straightforward. Video quality in our testbed depends not only on the software prototype configuration but also on the architecture and topology parameters. This means that a video session of nominally better quality that reaches its destination distorted, delayed, or jerky because of jitter or latency is likely to satisfy users less than a lower-quality one, and to score lower than expected. In our evaluation this seemed to be the case.

Audio quality evaluation results are shown in Figure 12. Once again network parameters seem to play a vital role in user satisfaction, since in almost all cases the two-hop scenario, with its lower jitter and packet loss, was preferred by users.

In Figure 13 the interruption effort results are depicted. Once again there is a slight user preference for the two-hop topology. This may be explained by the fact that, although video is involved in the conference, the most natural way of human communication is verbal; when this is compromised, for instance by jitter, subjects feel uncomfortable and assign points accordingly.

In Figure 14 the overall audiovisual quality evaluation can be found. Scenario 1 once again outperformed Scenario 2 in terms of user satisfaction, indicating that in videoconference applications, as in all real-time communication platforms, network topologies with lower jitter and packet loss are preferred over more complex but less effective ones.

4. Conclusions

The task of evaluating such a complex ad hoc network platform, including a real-time communication prototype, proved to be extremely challenging. Using the necessary software tools as well as modern industry standards, we were able to perform both subjective and objective evaluation of our proposed solution. Two different scenarios were designed, based on network characteristics as well as prototype restrictions. In the first one packets were routed over a total of two hops end-to-end, while in the second one a total of four hops was introduced. The results show that the proposed platform operates seamlessly over two hops, while in the four-hop scenario audio and video are delivered with marginal distortion. The conducted survey indicates that users rated the Quality of Service higher in the two-hop scenario. The objective tests are largely consistent with the subjective ones. The network measurements acquired during the tests indicate that the key elements of jitter and packet loss were slightly compromised in the four-hop scenario, which shows the limitations this topology still has. Future research is expected to improve the characteristics of mobile ad hoc networking, and we hope that this paper, based on an implementation rather than on simulation results, contributes in that direction.

Appendix

A. Absolute Category Rating

The Absolute Category Rating method is a category judgment in which the test sequences are presented one at a time and are rated independently on a category scale. In our case, instead of a video/audio sequence, the evaluation refers to the videoconference and its parameters. The method specifies that after each videoconference session the subjects are asked to evaluate the session's quality. No explicit reference is provided by the method, although subjects will always use an implicit one. A videoconference session can be no longer than two minutes and no shorter than one minute, thus providing enough time for users to consider and assign points according to their overall experience. This is consistent with our measurement scenarios, which limit the total call duration to 120 sec. ITU recommends a five-level scale for rating overall quality [19]. In our evaluation, since higher discriminative power is required because of the low bit rate of the videoconference encoding algorithm, the nine-level scale presented in Table 1 is used.

Additional examples of suitable numerical or continuous scales are given in [19], which also provides examples of rating dimensions other than overall quality. Such dimensions may be useful for obtaining more information on different perceptual quality factors when the overall quality rating is nearly equal for certain systems under test, although the systems are clearly perceived as different. In our case the simple nine-level scale is considered sufficiently accurate. For the ACR method, the necessary number of replications is obtained by repeating the same test conditions at different points in time during the test. ACR is easy and fast to implement, and the presentation of the stimuli is similar to the common use of the systems. This makes ACR well suited to qualification tests.