Abstract

Bioinformatics is an active and important research discipline in which molecular data is growing exponentially in both volume and complexity. Because of the substantial research in this field, researchers face critical issues of bandwidth, storage, and complexity when retrieving molecular data. Conducting research becomes very difficult on low-power computational devices such as Internet of Things (IoT) devices and sensors. We employ an agent migration technique to decrease network traffic and to mitigate the client’s limited resources by using server-side resources for large-scale computation. The proposed solution does not require additional storage or processing power on the client’s side, which makes it cost effective. In the proposed solution, (i) an agent visits a service provider that holds biological data, say sequences requested by the client, (ii) the agent fetches the required data and manipulates it on the server side, and (iii) the agent returns with the required results to its source platform. Thus, it addresses the bandwidth, storage, and computational issues without drawing on the client’s limited resources. As a proof of concept, the Java Agent Development (JADE) framework is used as the implementation tool, and the results are compared with Java Remote Method Invocation (RMI). Our findings show that our strategy saves the user up to 16.25% of the average time with respect to bandwidth and takes 46.82% less time than Java RMI with respect to the data that the agent carries. In addition, our approach acts as a mashup: it collects data in different formats from several service providers and converts it into any required format. Thus, it addresses the complexity inherent in the nature of the data and increases researchers’ productivity.

1. Introduction

The volume and complexity of data held by multiple service providers are growing exponentially. The challenge now is the extraction, retrieval, and processing of relevant information, which has made obvious the need for a system that facilitates users. Data coming from multiple domains needs to be integrated to provide a cohesive view. One technology that greatly facilitates this goal is the mashup [1, 2]. A mashup is a web application that stitches together the contents, presentations, and application functionalities from multiple sources and gives them a new and useful look. In other words, it combines multiple services into a single one [3, 4]. Mashups give users an integrated, user-oriented view of data and code from multiple heterogeneous sources. Trendsmap, housingmaps, Wheel of Lunch, and InstantWatcher are some examples of mashups. In this paper, we apply the concept of the mashup through the multiagent paradigm to the domain of bioinformatics.

Bioinformatics [5, 6] is about understanding biological data and is a growing field of research. With advances in technology, the amount of biological data is growing at a tremendous pace, which makes the field of bioinformatics important to society. Researchers have also made strides in understanding how proteins interact with one another, which is known as protein-protein interaction. For example, data reports [8, 9] include both structural data [8] and other data [10, 11] about proteins. A wide range of computational techniques has been proposed, primarily for image manipulation and pattern detection. With next-generation sequencing (NGS), analysts are analyzing valuable data and doing a great deal of intensive work to find patterns. More importantly, it is considerably difficult for them to create complex maps from heterogeneous sources.

Various experimental approaches exist [18], but they are very expensive and time consuming because they require many resources to measure the physical interaction among proteins. They also have a high possibility of error because the experiments are carried out purely in the lab and are not standardized. The adoption of agent technologies and multiagent systems constitutes an emerging area in bioinformatics, where data is large (that is, in gigabytes) and complex in nature. Researchers face data retrieval problems because their machines are low in (1) bandwidth, (2) storage, and (3) computational power. Processing, analyzing, and transporting molecular data of multilevel complexity require high bandwidth and storage. Thus, it becomes very difficult for researchers to conduct research with low-power resources. This study proposes an agent migration-based approach [19–21].

The main advantage of an agent-based approach is that the agent takes the request from the requester, visits the service provider, and returns to the requester, which saves the data provider the time needed to make the data available. Another advantage is that the computation is moved to the machine that has high computational power: the computation is done at the service side, and the results are available in the same format as they were at the service side. This reduces the size of the data that has to be transferred, gives better efficiency, and reduces network congestion. The main theme of this paper is to use an agent-based methodology that tackles processing and network issues by means of multiagent systems. It performs all the necessary processing on the server side, so that a client with low computational resources is not burdened.

The rest of the paper is organized as follows. A literature review of bioinformatics, multiagent systems, and mashups is provided in Section 2. Section 3 describes the steps of our agent-based solution in detail. In Section 4, a reference implementation is presented as a proof of concept. Section 5 provides the analysis and discussion of the proposed solution. Section 6 concludes this study along with future directions.

2. Literature Review

In this section, literature about bioinformatics, mashups, and multiagent systems is presented. In each subsection, the importance of the respective domain is discussed. We first turn to the target domain, namely, bioinformatics.

2.1. Bioinformatics

Bioinformatics is an interdisciplinary field that uses computers as computational tools for solving problems related to biological data. Such computing devices are used to analyze the internal structure and biological functions of living organisms. Their main purpose is to give an efficient structure to the data so that it can be interpreted accurately. Bioinformatics deals mainly with genomes and proteins, and one of its important applications is personalized medicine. It is the application of computer processing techniques to the fields of genetics and biochemistry, and it can be seen as a branch of computer science that deals with the storage, retrieval, and analysis of biological data [11]. The data is classified in a standard manner and analyzed in order to determine its structure and content [12]. Bioinformatics includes research in the fields of genetics and genomics, and it is applied in various areas such as the study of the evolution and phylogeny of various species [13]. The data generated by bioinformatics is stored at various data centers and can be used for the diagnosis, treatment, and prevention of diseases [14].

The protein-protein interactions [22] are of extreme importance because they play a vital role in many biological processes, such as signal transduction and transcription regulation. They also act as protein-based modules that are extensively used by nature to build complex systems. Therefore, they are a primary objective of many bioinformatics algorithms. However, in the field of protein-protein interactions, the problem of interactions between proteins in the context of the entire proteome has received less attention. In this context, a recent study has been carried out by proposing a new protein-protein interaction network. The network is based on the comparison of the entire proteome between two different organisms.

Protein-protein interactions [8, 9, 23] are a key element in the study of molecular biology, as molecular interactions are at the foundation of all biological systems. In this regard, the interactions between proteins have been extensively studied [10, 24]. Protein Interactions by Structural Matching (PRISM) [25–27] is an online web tool for predicting protein-protein interactions with high confidence. It is based on the structural and functional domain similarity of proteins. The first step in the PRISM-2 algorithm is to generate a structural alignment of two proteins. The proteins are compared by hand-determined structural similarity, and this similarity is used to generate a structural alignment.

There has been enormous growth in biological sequence data, and a huge amount of data is being produced and uploaded to websites and servers. At present, to get this information, a user has to interact with each interface using web-based queries [28]. This means that the user has to click through links to access each of the linked websites, and it is time consuming and tedious to stay online for each query. The idea of a mashup is to integrate multiple datasets from heterogeneous data sources into a single system and a single view. Mashups are also a good way to fit existing data into a new structure.

2.2. Mashup

Information retrieval is becoming a challenging task due to the rapid proliferation of data. It becomes more complex when the required information is scattered across multiple service providers. This complexity demands an efficient system to retrieve the desired results in an appropriate manner. There are different approaches to retrieving the information, combining it, and giving it a desired look; the mashup is one such approach [29]. A mashup gives an entirely new look or some added value to the existing data for end users. Service providers provide APIs, which act as an interface to data and services. Some APIs are free, and some are proprietary in nature and need authentication and authorization. Asynchronous JavaScript and XML (AJAX), Representational State Transfer (REST), and Simple Object Access Protocol (SOAP) are some state-of-the-art technologies that have influenced the mashup architecture [30]. REST, screen scraping, and RSS feeds/widgets are used to retrieve contents from other websites. Mashups are widely developed for web applications such as social networking, e-government, enterprise resource management, real estate, and more [31, 32].

A number of tools exist to create mashups, such as Yahoo Pipes [33] and IBM Mashup Center [34]. Traditionally, a mashup runs inside a web browser, but there are also other environments for it. Two important styles of mashup are the server-side mashup and the client-side mashup [35]. The difference between them is where the data is processed. In a server-side mashup, also called a proxy-style mashup, a web server retrieves all the data from multiple web hosts; stitching takes place on the server side, and the result is rendered in the client’s web browser. In a client-side mashup, by contrast, stitching of the services and contents takes place on the client, namely, within the web browser. Client-side mashups are also called Rich Internet Applications (RIAs) and have the added advantage of a prompt response over server-side mashups. A mashup can be either a consumer or an enterprise mashup [36]. A consumer mashup, also known as a service or client mashup, integrates data from multiple public sources inside the browser, for example, iGuide; the server-side mashup is the target of this study. Both styles of mashup have their own obvious benefits, as both provide new insight into existing resources. However, users of such mashup tools must trust them, so user data is not secure, since it has to be released to third parties. We address this issue using the multiagent paradigm.
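The server-side stitching described above can be illustrated with a minimal sketch. It assumes two hypothetical providers, one returning FASTA text and one returning JSON with `id` and `seq` fields; the field names, the parser registry, and the CSV output format are illustrative assumptions, not part of the reference implementation.

```python
import csv
import io
import json


def parse_fasta(text):
    """Parse FASTA text into a list of {'id', 'sequence'} records."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The first token of the header line is taken as the identifier.
            current = {"id": line[1:].split()[0], "sequence": ""}
            records.append(current)
        elif current is not None:
            current["sequence"] += line
    return records


def parse_json(text):
    """Parse a JSON array of {'id', 'seq'} objects (hypothetical provider format)."""
    return [{"id": item["id"], "sequence": item["seq"]} for item in json.loads(text)]


# Hypothetical registry mapping a source format to its parser.
PARSERS = {"fasta": parse_fasta, "json": parse_json}


def mashup(sources):
    """Stitch (format, payload) pairs from several providers into one record list."""
    merged = []
    for fmt, payload in sources:
        merged.extend(PARSERS[fmt](payload))
    return merged


def to_csv(records):
    """Render the unified records in one requested output format (CSV here)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "sequence"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Because each format is handled by its own parser, supporting a further provider format in this sketch would only require registering one more entry in `PARSERS`.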

2.3. Multiagent Systems (MAS)

A multiagent system is a collection of multiple software agents [37]. A software agent is a piece of code that works autonomously and communicates with other agent-oriented and non-agent-oriented software [38, 39]. The basic building blocks of an agent are code, data, and state. The code part is the collection of ordered statements that represents the actual logic of the agent; it remains nearly constant during execution, though it can change when required. The data part represents the data structures that preserve important information about the expression before and after evaluation. The state part represents the current status of the data part; basically, state is the collection of information of all data structures. The configuration of the agent is stored in its data and state parts. It contains information about platforms, which changes dynamically when the agent travels from one node to another within a network.
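As a rough illustration of the code, data, and state building blocks, the following sketch models an agent as a small Python class. The class name, the `platform` and `hops` state keys, and the `move_to` and `run` methods are hypothetical and are not taken from JADE or any other agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Code part: the agent's logic, which stays nearly constant while it travels.
    code: Callable[[dict], dict]
    # Data part: information the agent accumulates before and after evaluation.
    data: dict = field(default_factory=dict)
    # State part: current status, including platform information that
    # changes whenever the agent moves to another node.
    state: dict = field(default_factory=lambda: {"platform": None, "hops": 0})

    def move_to(self, platform: str) -> None:
        """Record a migration to another platform (no real networking here)."""
        self.state["platform"] = platform
        self.state["hops"] += 1

    def run(self, local_input: dict) -> None:
        """Apply the code part to local input and store the result in the data part."""
        self.data.update(self.code(local_input))
```

For example, an agent built with `Agent(code=lambda env: {"length": len(env["sequence"])})` keeps the same code part across moves, while its state and data parts change as it travels and executes.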

2.4. Significance of Multiagent Systems

The agent-oriented software paradigm has become a promising technology that is widely used in distributed environments such as e-commerce [40], network management [41], data mining [42], robotics [43, 44], and information extraction [45]. Some interesting applications of agent systems can be found in healthcare systems [46, 47] for scheduling patients, storing their medical records, and sharing those records with the parties concerned. An agent-based system, also called a multiagent system [48], is a system in which multiple agents interact, cooperate, and coordinate with each other. Such a loosely coupled system extends the capabilities of a monolithic system to perform tasks that are beyond the scope of an individual agent. It is widely used to share or obtain resources among agents over the network. The resources might be computational power, problem-solving logic, software, or expertise distributed temporally and spatially. Normally, systems are divided into two categories, client (which makes a request) and server (which serves it), but multiagent systems combine the benefits of both in a social, proactive, and reactive manner.

2.5. Design Issues in Multiagent Systems

The most important design issue for multiagent systems is how agents communicate with each other and with other entities. The starting point is to select a tool or middleware that lets developers get the core benefits of this technology rather than resolve the basic issues of communication. Some standards are therefore needed prior to deploying such a system. The Foundation for Intelligent Physical Agents (FIPA) [49], AGENTLINK [50], and OMG Agent Platform Special Interest Group (PSIG) [51] are the leading standardization bodies promoting agent technology. This study focuses on FIPA for the agent reference and development model. There are various tools for agent-based modeling like NOMADS [52, 53], AgentScape [54], Agentcities [55], Aglets [56], Voyager [57], Janus [58], TACOMA [59, 60], Grasshopper [61], JADE [62, 63], JaCaMo [64], Addre Jason [65], and ABLE [66].

JADE [62] is used to launch an agent platform. The most important reason is that it is open source middleware distributed under the GNU Lesser General Public License (LGPL). It is implemented entirely in the Java language, which makes it portable. It is one of the most popular middleware platforms within the research community, and it eases the implementation of multiagent systems (MAS). It provides a set of graphical tools that make it very easy to deploy an agent platform on a standalone system as well as over a distributed network. This study highly recommends JADE as agent middleware. Its infrastructure is very flexible, and the agent community keeps contributing add-ons to enhance its features. It is compliant with the FIPA-IEEE Computer Society specifications. The core concept of FIPA is to resolve the issue of interoperability, and JADE has extended FIPA’s model in multiple areas. According to the JADE specification, the mobility of an agent can be categorized into two types: intraplatform and interplatform. In intraplatform mobility, an agent migrates between containers of the same platform but cannot move to containers of a different platform. In interplatform mobility [67], an agent moves among different platforms: it leaves its own main container and joins the main container of another platform. The main focus of this study is interplatform mobility; see step 1 for details of each type and of how an agent can migrate from one platform to another.
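The distinction between the two mobility types can be made concrete with a small sketch. It takes intraplatform mobility to mean a move between containers of one platform and interplatform mobility to mean a move between different platforms. The `Location` type and the `mobility_type` function are hypothetical helpers for illustration and do not correspond to JADE classes.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical agent location: a container inside a named platform."""
    platform: str
    container: str


def mobility_type(src: Location, dst: Location) -> str:
    """Classify a move as 'none', 'intraplatform', or 'interplatform'."""
    if src == dst:
        return "none"
    if src.platform == dst.platform:
        # Same platform, different container.
        return "intraplatform"
    # The agent leaves its platform and joins another one.
    return "interplatform"
```

Under these assumptions, a move from `Location("client", "Main-Container")` to `Location("server", "Main-Container")` is classified as interplatform, which is the case this study focuses on.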

2.6. Bioinformatics and Multiagent Systems

This study also explores bioinformatics as a real application of multiagent systems and examines how a mobile agent can operate in a highly dynamic environment for data dissemination. A mobile agent follows an itinerary of service providers to collect the required information and stitches it together to give it a new shape. We propose agent migration to mitigate the aforementioned issues by moving the agent to the server side to perform computations [21, 68]. In a nutshell, this study uses the migration characteristic of an agent to make it act as a mashup that can be used for data dissemination.

3. Solution

We propose using the mobility characteristic of an agent as the basis of the solution. The solution provides accurate and fine-grained results even when the bandwidth and storage of the client are low. It also deals with the complexity present in the nature of the data as well as in the dynamic environment. In the agent migration approach, an agent starts executing on a client machine and migrates to another machine when the original machine is overloaded. To migrate from one machine to another, the agent must be able to traverse the network. The abstract details are shown in Figure 1.
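The overall workflow (the agent visits a service provider, processes the requested data on the server side, and returns with the results) can be sketched as a small in-process simulation. No real migration or networking takes place here; the class names, the `dispatch` helper, and the choice of sequence length and GC fraction as the server-side computation are assumptions made for illustration only.

```python
class ServiceProvider:
    """Hypothetical service provider holding sequences keyed by identifier."""

    def __init__(self, name, sequences):
        self.name = name
        self.sequences = sequences

    def host(self, agent):
        """Run a visiting agent locally, next to the data."""
        agent.execute(self.name, self.sequences)


class SequenceAgent:
    """Hypothetical agent that carries a request out and compact results back."""

    def __init__(self, wanted_ids):
        self.wanted_ids = set(wanted_ids)
        self.results = {}
        self.visited = []

    def execute(self, provider_name, local_data):
        # Steps (i) and (ii): fetch the requested sequences and process them
        # on the provider's side, keeping only a small summary per sequence.
        self.visited.append(provider_name)
        for seq_id in sorted(self.wanted_ids & local_data.keys()):
            seq = local_data[seq_id]
            gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
            self.results[seq_id] = {"length": len(seq), "gc": gc}


def dispatch(agent, itinerary):
    """Send the agent along its itinerary; step (iii) returns the results."""
    for provider in itinerary:
        provider.host(agent)
    return agent.results
```

In this sketch only the summaries, not the raw sequences, travel back with the agent, which is the property the proposed solution relies on to spare the client's bandwidth and storage.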

Java Remote Method Invocation (RMI) [69] is one way to extract data from the server, as it allows remote access to Java objects on a remote host. It is a lightweight communication mechanism. A remote object is a Java object that is created on the remote host and whose methods can be invoked from another Java virtual machine. The client accesses it through a stub, which is a local proxy object that holds a reference to the remote object and forwards method calls to it. The complete steps used in this study are listed in Figure 2.

Mobile agents use the resources of the host system to complete their tasks and then return to the system where they started their execution [10, 11]. Agent-based systems are a powerful and effective way to develop intelligent systems because of their simplicity and extensibility. They are effective in handling problems related to distributed, parallel, and autonomous systems.

They are also effective in handling the problems of complex systems, because they are not computationally heavy and can be used on multiple systems at the same time. This makes such systems easy to design and implement. The steps carried out in this study are shown in Figure 3.

4. Reference Implementation

To deploy agents, various agent frameworks are available [70, 71]. The Java Agent Development (JADE) framework is a FIPA-compliant middleware for building agent systems. It is compliant with the FIPA Agent Communication Language (ACL) specification. We provide the source of our own reference implementation at https://github.com/BioAgent.

For the testbed configuration, two personal computers were used: one as a server and the other as a client. Both systems were connected through a Huawei E5573s-320, which is a 4G pocket WiFi router. Table 1 shows the hardware and software details of both personal computers.

5. Results and Discussion

This section presents the study carried out on the performance of mobile agents and Java RMI, followed by a detailed discussion of the results. The Java RMI and agent migration approaches are compared. Because mobile agents do not depend on the host application and can be transferred independently to another host, an effective approach to large-scale agent migration has been proposed.

Table 2 shows the network load generated by the client using Java RMI and our agent-based approach. The agent approach is more efficient than the Java RMI approach because it decreases the number of network calls made by the client: the agent is migrated to the server only when there is a need for it, so the client makes fewer network calls and the overhead of those calls is reduced. Therefore, the Java RMI approach produces more network load than the agent approach.

The Java RMI approach is considerably more mature than the agent-based approach. Table 2 provides a summary of the results of the Java RMI and agent approaches based on network load. The agent migration approach adds no network load as results grow, since it needs only connectivity for the migration itself.

In Figure 4, the x-axis represents the size of the data extracted from multiple databases, while the y-axis shows the network load consumed by both approaches. According to Figure 4, the network load of Java RMI increases as the result size increases, so its curve shows a direct relationship between result size and network load: the larger the result, the higher the network load. The agent-based system, on the other hand, returns large results with a constant network load, which is why its graph is a flat, straight line.
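The qualitative shape of the two curves can be reproduced with a toy model. It assumes that an RMI client transfers the full result over the network, while the agent transfers only itself (out and back) plus a fixed-size summary. All parameter values below are hypothetical and are not the measurements reported in Table 2 or Figure 4.

```python
def rmi_network_load(result_size_kb, call_overhead_kb=1.0):
    """Toy model: the client pulls the whole result, so load grows with its size."""
    return call_overhead_kb + result_size_kb


def agent_network_load(agent_size_kb=5.0, summary_size_kb=1.0):
    """Toy model: the agent travels out and back carrying a fixed-size summary."""
    return 2 * agent_size_kb + summary_size_kb


# Hypothetical result sizes in kB, chosen only to show the trend.
result_sizes = [2, 5, 10, 20]
rmi_loads = [rmi_network_load(size) for size in result_sizes]
agent_loads = [agent_network_load() for _ in result_sizes]
```

With these assumed parameters the RMI load rises linearly with the result size, while the agent load stays flat. The model also shows that for very small results the fixed cost of moving the agent can exceed the RMI cost.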

Figure 5 shows that the agent-based approach gives better results than Java RMI once the result size increases beyond 2 kB. The response time is only 10 seconds at a result size of 5 kB, and a result size of 10 kB is achieved with a response time of only 15 seconds.

From Figure 5, which is based on Table 3, we can conclude that, in the Java RMI system, response time is directly proportional to result size: as the size of the result increases, the response time also increases, so a high volume of results takes longer to retrieve. On the other hand, the agent-based system returns large results with 46.82% less response time than Java RMI with respect to the data that the agent carries. The average time of the Java RMI approach is 20.21785714, while the average time of our approach is 10.75047619. The difference between the two approaches is 9.467380952.
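The relationship between the two reported averages, their difference, and the relative reduction can be checked with a short calculation. The variable names are illustrative; the two averages are the values stated above.

```python
# Average response times as reported in the text.
rmi_avg = 20.21785714
agent_avg = 10.75047619

# Absolute difference between the two approaches.
difference = rmi_avg - agent_avg

# Relative reduction of the agent approach with respect to Java RMI, in percent.
reduction_pct = 100 * difference / rmi_avg
```

The difference evaluates to about 9.4674, and the relative reduction to roughly 46.8%, which is in line with the reported figure of 46.82%.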

In Figure 6, the blue line shows the agent approach and the red line shows Java RMI. We can clearly see that, as the bandwidth decreases, our agent computes faster than Java RMI.

According to Table 4, the average response times of the two approaches are 20.21785714 and 10.75047619. Our findings show that our strategy saves the user up to 16.25% of the average time with respect to bandwidth.

6. Conclusions and Future Work

In this study, we have designed an agent migration approach for transferring information between clients. The agents migrate from client to client to collect the data and transfer it to a central server. The client uses the agent’s services: the feedback service of the agent is used to ask the client for any information required by the agent, and the client can provide information to the agent to ask the server for any service the client requires. The agent can migrate between the client and the server, and the client can also request the agent to migrate to any other client. This approach has many advantages. The agents are intelligent, they work well even in areas with poor network coverage, and they can be used for many generic purposes. The findings also show that mobile agent technology reduces network load and storage demands on the client side and that heterogeneous data can be converted into a homogeneous format. The approach does not require the user to stay online for the full duration of a request. The main limitation of this study is that an agent environment must be deployed on both the client and the service side. In future work, the approach can be modified to address other bioinformatics problems, such as finding interactions between proteins, viewing the interaction of sequences, finding the similarity of sequences or proteins, finding a missing sequence among known sequences, and finding similarities between different organisms.

Data Availability

All relevant code samples can be found at GitHub-shahshakir/BioAgent (https://github.com/shahshakir/BioAgent/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.