Scientific Programming
Volume 2015 (2015), Article ID 680271, 13 pages
http://dx.doi.org/10.1155/2015/680271
Research Article

Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

1 Department of Computer Science, AGH University of Science and Technology, Aleja Mickiewicza 30, 30-059 Kraków, Poland
2 ACC CYFRONET AGH, Ulica Nawojki 11, 30-950 Kraków, Poland
3 USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292, USA
4 Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA

Received 15 May 2014; Accepted 22 November 2014

Academic Editor: Roman Wyrzykowski

Copyright © 2015 Maciej Malawski et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, a limited number of instances per cloud, and hourly billing. Input and output data are stored in a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs, as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from synthetic workflows, from general-purpose cloud benchmarks, and from measurements taken in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
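To make the interplay of hourly billing, per-cloud instance limits, and a deadline concrete, the following minimal Python sketch enumerates plans for a single level of identical tasks. It is an illustration only and not the AMPL/CMPL model described in this paper: it assumes one instance type per level, ignores data transfer and storage costs, and uses brute-force search instead of a mathematical programming solver. The names `InstanceType`, `cheapest_plan`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str                  # hypothetical label, not a real cloud offering
    price_cents_per_hour: int  # hourly price; partial hours are billed in full
    task_minutes: int          # runtime of one task on this instance type
    max_instances: int         # limit on instances of this type


def billed_hours(minutes: int) -> int:
    """Round a runtime in minutes up to whole billing hours."""
    return -(-minutes // 60)


def cheapest_plan(num_tasks: int, deadline_minutes: int,
                  instance_types: list) -> Optional[tuple]:
    """Return (cost_cents, type_name, num_instances, makespan_minutes)
    for the cheapest feasible plan, or None if no plan meets the deadline."""
    best = None
    for it in instance_types:
        for n in range(1, it.max_instances + 1):
            # Spread tasks evenly: r instances run q + 1 tasks, the rest run q.
            q, r = divmod(num_tasks, n)
            busiest = q + 1 if r > 0 else q
            makespan = busiest * it.task_minutes
            if makespan > deadline_minutes:
                continue  # violates the deadline constraint
            hours = (r * billed_hours((q + 1) * it.task_minutes)
                     + (n - r) * billed_hours(q * it.task_minutes))
            cost = hours * it.price_cents_per_hour
            if best is None or cost < best[0]:
                best = (cost, it.name, n, makespan)
    return best


# Hypothetical example data: a slow cheap type and a fast expensive type.
TYPES = [
    InstanceType("small", price_cents_per_hour=10, task_minutes=30, max_instances=4),
    InstanceType("large", price_cents_per_hour=40, task_minutes=12, max_instances=4),
]
```

In this toy setting, loosening the deadline lets the search pick the slower, cheaper type, while tightening it forces the faster, more expensive one. That is the cost versus deadline trade-off that the optimization model addresses at the scale of whole workflows.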