The Scientific World Journal

Volume 2015 (2015), Article ID 545308, 9 pages

http://dx.doi.org/10.1155/2015/545308

## Composition of Web Services Using Markov Decision Processes and Dynamic Programming

Facultad de Matemáticas, Universidad Autónoma de Yucatán, Anillo Periférico Norte, Tablaje Cat. 13615, Apartado Postal 192, Colonia Chuburná Hidalgo Inn, 97119 Mérida, YUC, Mexico

Received 26 June 2014; Revised 17 September 2014; Accepted 14 October 2014

Academic Editor: Ahmad T. Azar

Copyright © 2015 Víctor Uc-Cetina et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a Markov decision process model for solving the Web service composition (WSC)
problem. Iterative policy evaluation, value iteration, and policy iteration algorithms are used to
experimentally validate our approach, with artificial and real data. The experimental results
show the reliability of the model and the methods employed, with policy iteration being the best
one in terms of the minimum number of iterations needed to estimate an optimal policy, with the
highest Quality of Service attributes. Our experimental work shows that a WSC problem
involving a set of 100,000 individual Web services, in which a valid composition requires
the selection of 1,000 services from the available set, can be solved in the worst case in
less than 200 seconds, using an Intel Core i5 computer with 6 GB RAM. Moreover, a real WSC
problem involving only 7 individual Web services requires less than 0.08 seconds, using the
same computational power. Finally, a comparison with two popular reinforcement learning
algorithms, sarsa and *Q*-learning, shows that these algorithms require one or two orders of
magnitude more time than policy iteration, iterative policy evaluation, and value iteration to
handle WSC problems of the same complexity.

#### 1. Introduction

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network, with an interface described in a machine-processable format called the Web Services Description Language [1]. A Web service is typically modeled as a software component that implements a set of operations. The emergence of this type of software component has created unprecedented opportunities to establish more agile collaborations between organizations, and as a consequence, systems based on Web services are growing in importance for the development of distributed applications designed to be accessed via the Internet.

When a Web service is requested, all available Web services descriptions must be matched with the requested description, so that an appropriate service with the desired functionality can be found. However, since the number of available Web services is continuously growing year by year, finding the best match is not a trivial problem anymore, especially if we take into account that the matching criteria must consider not only the desired functionality, but also other attributes such as execution cost, security, performance, and so forth.

If individual Web services are not able to meet complex requirements, they can be combined to create composite services [2]. A composite Web service has one initial task and one ending task, and between the initial and the ending tasks there can be individual tasks connected in sequential order. To create a composite Web service it is necessary to discover and select the most suitable services. The complexity of WSC stems from three main factors: the large number of dynamic Web service instances with similar functionality that may be available to a complex service; the different ways of integrating service instance components into a complex service process; and the various performance requirements (e.g., end-to-end delay, service cost, and reliability) of a complex service.

##### 1.1. Related Work

Some approaches to solve the WSC problem have focused on different graph-based algorithms [3–8]. Some others have proposed to use optimization methods specially designed for solving constraint satisfaction problems, such as integer programming [9], linear programming [10], or methods for solving the knapsack problem [11]. Artificial intelligence methods such as planning algorithms [12–14], ant colony optimization [15], fuzzy sets [2], and binary search trees [16] have been used too.

The use of methods based on Markov decision processes (MDPs) for the composition problem is certainly not new. In [17], the problem of workflow composition is modeled as an MDP and a Bayesian learning algorithm is used to estimate the true probability models involved in the MDP. In [18], the WSC is solved using QoS attributes in an MDP framework with two versions of the value iteration algorithm: one backward and recursive and one forward version. In [19], the authors proposed the use of what they call value of changed information. Their approach uses MDPs focusing on changes of the state transition function, in order to anticipate values of the service parameters that do not change the WSC. In [20], a combination of MDPs and HTN (Hierarchical Task Network) planning is proposed.

Solutions based on reinforcement learning are also relevant. For instance, in [21], reinforcement learning and preference logic were employed together to solve the WSC problem, obtaining a qualitative solution. The authors argue that computing a qualitative solution has many advantages over a quantitative one. Other methods using *Q*-learning are given in [22–24]. It is important to remember that reinforcement learning methods [25] belong to a family of algorithms closely related to MDPs. The main difference is that in reinforcement learning the state transition function is assumed to be unknown, and therefore the agents need to explore their state and action spaces by executing different actions in different states and observing the numerical rewards obtained after each state transition.

##### 1.2. Contributions of This Paper

The goal of automatic WSC is to determine a sequence of Web services that can be combined to satisfy a set of predefined QoS constraints. For problems where we need to find the sequence of actions maximizing an overall performance function, MDPs are one of the most robust mathematical tools that we can use. Therefore, in this paper we propose an MDP model to solve the WSC problem. To show the reliability of our model, we conducted experiments with three of the most studied algorithms: policy iteration, iterative policy evaluation, and value iteration. Although all three algorithms provided good solutions, the policy iteration algorithm required the minimum number of iterations to converge to the optimal solutions. We also compared these three algorithms against sarsa and *Q*-learning, showing that the latter methods require one or two orders of magnitude more time to solve composition problems of the same complexity.

This paper is structured as follows. Section 2 provides the basics of the MDP framework and introduces the three algorithms that we tested. Section 3 introduces our MDP model for solving the WSC problem. Section 4 describes the experimental setup and presents the most relevant results. Section 5 presents comparative experiments with the sarsa and *Q*-learning algorithms. Finally, Section 6 concludes this paper by discussing the main findings and providing some advice for future research.

#### 2. Markov Decision Processes

The WSC problem can be abstracted as the problem of selecting a sequence of actions in such a way that we maximize an overall evaluation function. Sequential decision problems of this kind can be defined and solved in an MDP framework. An MDP is a tuple $(S, A, P_{sa}, \gamma, R)$, where $S$ is a set of states, $A$ is a set of actions, $P_{sa}$ are the state transition probabilities for all states $s \in S$ and actions $a \in A$, $\gamma \in [0, 1)$ is a discount factor, and $R : S \times A \to \mathbb{R}$ is the reward function.

The MDP dynamics are the following. An agent in state $s_0 \in S$ performs an action $a_0$ selected from the set of actions $A$. As a result of performing action $a_0$, the agent receives a reward with expected value $R(s_0, a_0)$ and the current state of the MDP transitions to some successor state $s_1$, according to the transition probability $P_{s_0 a_0}$. Once in state $s_1$ the agent chooses and executes an action $a_1$, receiving reward $R(s_1, a_1)$ and moving to state $s_2$. The agent keeps choosing and executing actions, creating a path of visited states $s_0, s_1, s_2, \ldots$.
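The dynamics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-state MDP, the dictionaries `P` and `R`, and the function `rollout` are hypothetical names and values chosen only for this example.

```python
import random

# Hypothetical 3-state, 2-action MDP (illustrative values, not from the paper).
# P[s][a] lists the probabilities of moving to states 0, 1, 2;
# R[s][a] is the expected reward for taking action a in state s.
P = {
    0: {0: [0.0, 1.0, 0.0], 1: [0.0, 0.0, 1.0]},
    1: {0: [0.0, 0.0, 1.0], 1: [1.0, 0.0, 0.0]},
    2: {0: [0.0, 0.0, 1.0], 1: [0.0, 0.0, 1.0]},
}
R = {0: {0: 1.0, 1: 0.5}, 1: {0: 2.0, 1: 0.0}, 2: {0: 0.0, 1: 0.0}}


def rollout(policy, s0, steps, rng):
    """Follow `policy` from s0 and return the visited (state, action, reward) triples."""
    path, s = [], s0
    for _ in range(steps):
        a = policy(s)                      # the agent chooses an action
        r = R[s][a]                        # and receives the associated reward
        successors = range(len(P[s][a]))
        s_next = rng.choices(successors, weights=P[s][a])[0]  # sample s' from P_sa
        path.append((s, a, r))
        s = s_next
    return path


# Always take action 0, starting in state 0, for three steps.
path = rollout(lambda s: 0, s0=0, steps=3, rng=random.Random(0))
print(path)
```

Because the transitions in this toy example are deterministic, the path visits states 0, 1, and 2 in order regardless of the random seed.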

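As the agent goes through states $s_0, s_1, s_2, \ldots$, it obtains the following rewards:

$$R(s_0, a_0) + \gamma R(s_1, a_1) + \gamma^2 R(s_2, a_2) + \cdots \tag{1}$$

The reward at timestep $t$ is discounted by a factor of $\gamma^t$. By doing so, the agent gives more importance to those rewards obtained sooner. In an MDP we try to maximize the sum of expected rewards obtained by the agent:

$$\mathrm{E}\left[R(s_0, a_0) + \gamma R(s_1, a_1) + \gamma^2 R(s_2, a_2) + \cdots\right] \tag{2}$$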
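As a small numerical illustration of the discounted sum, the sketch below adds up a finite reward sequence with each reward at timestep $t$ weighted by the discount factor raised to the power $t$. The function name `discounted_return` and the sample rewards are assumptions made for this example only.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at timestep t is weighted by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical reward sequence: 1.0 at t=0, 2.0 at t=1, 0.0 at t=2.
example = discounted_return([1.0, 2.0, 0.0], gamma=0.5)
print(example)  # 1.0 + 0.5 * 2.0 + 0.25 * 0.0 = 2.0
```

With a discount factor below 1, the same reward contributes less the later it is received, which is why the agent favors rewards obtained sooner.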

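A policy is defined as a function $\pi : S \to A$ mapping from the states to the actions. A value function for a policy $\pi$ is the expected sum of discounted rewards obtained by always performing the actions provided by $\pi$:

$$V^{\pi}(s) = \mathrm{E}\left[R(s_0, \pi(s_0)) + \gamma R(s_1, \pi(s_1)) + \gamma^2 R(s_2, \pi(s_2)) + \cdots \mid s_0 = s\right] \tag{3}$$

$V^{\pi}(s)$ is the expected sum of discounted rewards that the agent would receive if it starts in state $s$ and takes actions given by $\pi$. Given a fixed policy $\pi$, its value function $V^{\pi}$ satisfies the Bellman equation:

$$V^{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} P_{s\pi(s)}(s')\, V^{\pi}(s') \tag{4}$$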
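As a worked illustration of the Bellman equation for a fixed policy, consider a hypothetical single state $s$ (not taken from the paper) in which the action chosen by the policy yields reward $r$ and returns to $s$ with probability 1. With discount factor $\gamma$, the equation reduces to one linear equation in one unknown:

$$V^{\pi}(s) = r + \gamma V^{\pi}(s) \quad\Longrightarrow\quad V^{\pi}(s) = \frac{r}{1 - \gamma}$$

For $r = 1$ and $\gamma = 0.9$ this gives $V^{\pi}(s) = 10$, which agrees with summing the geometric series $1 + 0.9 + 0.9^2 + \cdots$ directly. With more states, the same equation yields a system of $|S|$ linear equations in the $|S|$ unknown values.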

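The optimal value function is defined as

$$V^{*}(s) = \max_{\pi} V^{\pi}(s) \tag{5}$$

This function gives the best possible expected sum of discounted rewards that can be obtained using any policy $\pi$. The Bellman equation for the optimal value function is

$$V^{*}(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s')\, V^{*}(s') \right] \tag{6}$$

The optimal value function $V^{*}$ is such that, for an optimal policy $\pi^{*}$ and every policy $\pi$, we have

$$V^{*}(s) = V^{\pi^{*}}(s) \geq V^{\pi}(s) \tag{7}$$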

##### 2.1. Dynamic Programming Algorithms for MDPs

When the state transition probabilities are known, dynamic programming can be used to solve (6). Next, we present three efficient algorithms for solving finite-state MDPs by means of dynamic programming. The first one is iterative policy evaluation (given in Algorithm 1). The second one is the policy iteration algorithm (given in Algorithm 2), which repeatedly computes the value function for the current policy and then updates the policy using the current value function. The third one, value iteration (shown in Algorithm 3), can be thought of as an iterative update of the estimated value function using the Bellman equation (6).
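The three procedures named above can be sketched as follows. This is a textbook-style sketch under stated assumptions and does not reproduce Algorithms 1 to 3 of the paper: the toy MDP, the stopping tolerance `tol`, and all function names (`q_value`, `greedy_policy`, `evaluate_policy`, `value_iteration`, `policy_iteration`) are hypothetical choices made for this example.

```python
# Hypothetical toy MDP: P[s][a] maps successor state -> probability,
# R[s][a] is the expected reward (illustrative values, not from the paper).
P = {
    0: {0: {1: 1.0}, 1: {2: 1.0}},
    1: {0: {2: 1.0}, 1: {0: 1.0}},
    2: {0: {2: 1.0}, 1: {2: 1.0}},
}
R = {0: {0: 1.0, 1: 0.5}, 1: {0: 2.0, 1: 0.0}, 2: {0: 0.0, 1: 0.0}}
GAMMA = 0.9


def q_value(s, a, V):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a].items())


def greedy_policy(V):
    """Pick, in every state, the action with the largest one-step lookahead."""
    return {s: max(P[s], key=lambda a: q_value(s, a, V)) for s in P}


def evaluate_policy(policy, tol=1e-10):
    """Iterative policy evaluation: sweep the fixed-policy Bellman update."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(s, policy[s], V)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def value_iteration(tol=1e-10):
    """Sweep the optimal Bellman update, then read off a greedy policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = max(q_value(s, a, V) for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V, greedy_policy(V)


def policy_iteration():
    """Alternate policy evaluation and greedy improvement until stable."""
    policy = {s: 0 for s in P}
    while True:
        V = evaluate_policy(policy)
        improved = greedy_policy(V)
        if improved == policy:
            return V, policy
        policy = improved


V_vi, pi_vi = value_iteration()
V_pi, pi_pi = policy_iteration()
print(pi_vi, pi_pi)
```

On this toy problem both methods return the same policy. Policy iteration needs only a handful of improvement steps, each of which runs a full policy evaluation, whereas value iteration performs many cheap sweeps.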