Research Article  Open Access
Taeseok Kim, Hyokyung Bahn, Youjip Won, "A Pruning-Based Disk Scheduling Algorithm for Heterogeneous I/O Workloads", The Scientific World Journal, vol. 2014, Article ID 940850, 17 pages, 2014. https://doi.org/10.1155/2014/940850
A Pruning-Based Disk Scheduling Algorithm for Heterogeneous I/O Workloads
Abstract
In heterogeneous I/O workload environments, disk scheduling algorithms should support different QoS (Quality-of-Service) for each I/O request. For example, the algorithm should meet the deadlines of real-time requests and at the same time provide reasonable response time for best-effort requests. This paper presents a novel disk scheduling algorithm called GSCAN (Grouping-SCAN) for handling heterogeneous I/O workloads. To find a schedule that satisfies the deadline constraints and seek time minimization simultaneously, GSCAN maintains a series of candidate schedules and expands the schedules whenever a new request arrives. Maintaining these candidate schedules requires excessive spatial and temporal overhead, but GSCAN reduces the overhead to a manageable level by pruning the state space using two heuristics. One is grouping, which clusters adjacent best-effort requests into a single scheduling unit, and the other is the branch-and-bound strategy, which cuts off inefficient or impractical schedules. Experiments with various synthetic and real-world I/O workloads show that GSCAN outperforms existing disk scheduling algorithms significantly in terms of the average response time, throughput, and QoS guarantees for heterogeneous I/O workloads. We also show that the overhead of GSCAN is reasonable for online execution.
1. Introduction
As an increasingly large variety of applications are developed and deployed in modern computer systems, there is a need to support heterogeneous performance requirements for each application simultaneously. For example, a deadline-guaranteed service is required for real-time applications (e.g., audio or video playback), while reasonable response time and high throughput are important for interactive best-effort applications (e.g., web navigation or file editing). Since these applications require different QoS (Quality-of-Service) guarantees, an efficient disk scheduling algorithm that can deal with heterogeneous I/O requests is needed.
Due to the mechanical overhead of accessing data in hard disk-based storage systems, I/O scheduling has been a long-standing problem for operating system and storage system designers. An optimal I/O schedule in the traditional disk scheduling domain usually refers to a sequence of requests that has minimum scanning time. In order to find this optimal schedule, all possible request sequences need to be searched. This is a complicated searching problem which is known to be NP-hard [1]. The location of each requested block is represented as cylinder, head, and sector information. The distance between two points in this three-dimensional space does not satisfy the Euclidean property. Therefore, to obtain an optimal solution, we should enumerate all possible orderings of a given set of I/O requests. For example, if there are $n$ requests in the I/O request queue, the number of all possible orderings is $n!$ ($n$ factorial). Unfortunately, finding an optimal schedule in this huge searching space is not feasible due to the excessive spatial and temporal overhead. For this reason, most practical scheduling algorithms simply use deterministic heuristic approaches instead of searching huge spaces.
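The growth of this search space can be made concrete with a small sketch. It is an illustration only and not part of the original paper: it assumes a one-dimensional model in which seek cost is simply the absolute track distance (a simplification of the real disk geometry described above), and the track numbers and starting head position are hypothetical.

```python
from itertools import permutations
from math import factorial

def total_seek_distance(start_track, order):
    """Sum of track distances travelled when servicing `order` from `start_track`."""
    distance, position = 0, start_track
    for track in order:
        distance += abs(track - position)
        position = track
    return distance

def brute_force_schedule(start_track, tracks):
    """Try every ordering and return one with the smallest total seek distance."""
    return min(permutations(tracks),
               key=lambda order: total_seek_distance(start_track, order))

tracks = [95, 10, 60, 30]   # hypothetical pending requests (track numbers)
best = brute_force_schedule(50, tracks)
print(best, total_seek_distance(50, best))

# Number of orderings that exhaustive search must examine for 4 and 22 requests.
print(factorial(len(tracks)), factorial(22))
```

Exhaustive search is trivial for four requests, but the number of orderings grows factorially with the queue length, which is why practical schedulers avoid it.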
Unlike traditional scheduling problems, scheduling in heterogeneous workload environments is even more complicated because it should meet the deadlines of real-time requests and provide reasonable response times for best-effort requests simultaneously. This implies the necessity of scanning huge search spaces rather than simple deterministic processes as in traditional scheduling problems. Y.-F. Huang and J.-M. Huang presented a new approach called MSEDF (Minimizing Seek time Earliest Deadline First) that effectively reduces the huge state space to a feasible extent through the branch-and-bound strategy [2]. Though MSEDF shows superior performance, it has some limitations. First, MSEDF handles requests in a batch manner and thus it cannot be practically used for online scheduling. Second, MSEDF considers only real-time requests, so applying it directly to heterogeneous workload environments is not possible.
In this paper, we present a novel disk scheduling algorithm called GSCAN (Grouping-SCAN) for handling heterogeneous workloads. GSCAN resolves the aforementioned problems by employing an online mechanism and several rules exploiting the QoS requirements of I/O requests. Specifically, GSCAN first arranges requests in the queue by the SCAN order and then clusters adjacent best-effort requests into a group to schedule them together. Then, GSCAN reduces the huge searching space to a reasonable extent by pruning unnecessary schedules using the branch-and-bound strategy. Experimental results show that GSCAN performs better than existing disk scheduling algorithms in terms of average response time, throughput, and QoS guarantees for heterogeneous workload environments. We also show that the space and time overhead of GSCAN is reasonable for online execution.
The remainder of this paper is organized as follows. Section 2 presents the state of the art of disk scheduling algorithms. In Section 3, the proposed scheduling algorithm, namely, GSCAN, is explained in detail. The validation of GSCAN is described in Section 4 by extensive experiments. Finally, we conclude this paper in Section 5.
2. Related Works
Since disk-based storage has always been one of the performance bottlenecks in computer systems, disk scheduling algorithms have been studied extensively in the last few decades. Recently, as disks are used as the storage for multimedia data with soft real-time constraints, I/O scheduling problems have become more complicated. In this section, we classify existing disk scheduling algorithms into several classes according to their design purpose.
The first class is throughput-oriented scheduling algorithms. This class of algorithms concentrates on the optimization of disk head movement. SSTF [3], SATF [4], SCAN [5], and CSCAN [5] are such examples. Of these, SSTF and SATF require an elaborate disk model in order to predict disk seek time or access time, which is not required for SCAN-like algorithms. This is the reason why SCAN and its variants such as CSCAN are widely used in commodity operating systems. Note that this class of algorithms does not consider the priority of requests, and thus it provides no real-time support.
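As a rough illustration of why SCAN-like algorithms need no disk model, the following sketch orders pending requests using track numbers alone. It is a simplified rendering written for this text, not code from the paper: the head reverses at the last pending request rather than at the disk edge, and the function names and sample values are hypothetical.

```python
def scan_order(head, tracks, direction=1):
    """SCAN: service requests in the current sweep direction, then reverse."""
    ahead = sorted(t for t in tracks if (t - head) * direction >= 0)
    behind = sorted(t for t in tracks if (t - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]     # sweep up, then come back down
    return ahead[::-1] + behind         # sweep down, then come back up

def cscan_order(head, tracks):
    """CSCAN: sweep upward only, then jump back to the lowest pending track."""
    ahead = sorted(t for t in tracks if t >= head)
    wrapped = sorted(t for t in tracks if t < head)
    return ahead + wrapped

pending = [95, 10, 60, 30]              # hypothetical track numbers
print(scan_order(50, pending))
print(cscan_order(50, pending))
```

Neither function looks at deadlines or priorities, which matches the remark that this class offers no real-time support.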
The second class is real-time scheduling algorithms, which can in turn be classified into two categories: deadline-based algorithms and round-based algorithms. Deadline-based algorithms aim at servicing I/O requests within given deadlines. EDF (Earliest Deadline First) is a representative algorithm in this category [6]. The concept of EDF comes from the real-time CPU scheduling technique. As EDF focuses only on deadlines, it exhibits poor performance in terms of disk head movement. Hence, a number of policies have been proposed to reduce the disk head movement of EDF. They include SCANEDF [7, 8], SSEDO/SSEDV [9], FDSCAN [10], SCANRT [11], DMSCAN [12], and Kamel’s algorithm [13]. Most of these algorithms combine the features of EDF and SCAN in order to meet the deadlines of real-time requests and maximize the disk utilization. However, since this approach is based on priority, these algorithms may induce the starvation of requests with low priorities.
Round-based algorithms are designed for continuous media data and they exploit the periodicity of data retrieval in audio/video playback. They first define the size of a round and service all I/O requests before the round expires. Rangan’s algorithm [14], Grouped Sweep Scheduling (GSS) [15], Preseeking Sweep algorithm [16], and Chen’s algorithm [17] can be classified into this category. These algorithms primarily focus on the efficiency of underlying resources rather than explicitly considering the deadlines of real-time requests. Instead, deadlines can be satisfied in the round-based algorithms by careful load control through the admission control mechanism. These algorithms require in-depth knowledge of disk internals, such as the number of cylinders, the number of sectors per cylinder, and the curve function of seek distance and seek time, which are not usually accessible from the operating system’s standpoint.
The third class is algorithms for heterogeneous I/O workloads. In recent years, handling heterogeneous workloads in a single storage device has become an important issue as integrated file systems gain momentum as the choice for next generation file systems. The most famous work is Cello [18]. Shenoy and Vin proposed the Cello disk scheduling framework using a two-level disk scheduling architecture: a class-independent scheduler and a set of class-specific schedulers. Cello first classifies disk requests into several classes based on their service requirements. Then it assigns weights to the application classes and allocates disk bandwidth to the application classes in proportion to their weights. Won and Ryu [19], Wijayaratne and Narasimha Reddy [20], and Tsai et al. [21] also proposed scheduling strategies for heterogeneous workloads.
More recently, general frameworks that can control different scheduling parameters such as deadline, priority, and disk utilization were presented. For example, Mokbel et al. proposed CascadedSFC, which provides a unified framework that can scale scheduling parameters [22]. It models multimedia I/O requests as points in multidimensional subspaces, where each dimension represents one of the parameters. These general scheduling frameworks require many tuning parameters to be set by the system itself or end users. Povzner et al. proposed Fahrrad, which allows applications to reserve a fixed fraction of a disk’s utilization [23]. Fahrrad reserves disk resources in terms of the utilization by using disk time utilization and period. They also proposed a multilayered approach called Horizon to manage QoS in distributed storage systems [24]. Horizon has an upper-level control mechanism to assign deadlines to requests based on workload performance targets and a low-level disk I/O scheduler designed to meet deadlines while maximizing throughput.
Most of the aforementioned scheduling algorithms employ deterministic approaches. “Deterministic” here means that the algorithms maintain only a single schedule to be actually executed, and each time a new request arrives the schedule is simply updated. Though deterministic algorithms are effective for fast online processing, they have difficulty in maximizing the performance. For example, a new request in the future may change the order of the optimal schedule of existing requests, but this cannot be reflected in deterministic algorithms. Y.-F. Huang and J.-M. Huang presented MSEDF (Minimizing Seek time Earliest Deadline First) for multimedia server environments, which is not a deterministic algorithm [2]. They recognized I/O scheduling as an NP-hard problem and made an initial attempt to reduce the searching space. However, MSEDF is a kind of offline algorithm, so it cannot be adopted directly as the online scheduler of heterogeneous workload environments. Table 1 lists a summary of various disk scheduling algorithms.

3. GSCAN: A Pruning-Based Disk Scheduling
3.1. Goal and Assumptions
Our goal is to design a disk scheduling algorithm that satisfies the deadline requirement of real-time requests and at the same time minimizes the seek distance of the disk head as much as possible. In addition to this, the scheduling algorithm should be feasible to implement; that is, the execution overhead of the algorithm should be reasonable in terms of both space and time for online execution.
We first classify I/O requests into two classes: real-time requests and best-effort requests. We assume that each I/O request $r_i$ consists of $(d_i, t_i)$, where $d_i$ is the deadline and $t_i$ is the track number of $r_i$ on the disk. Real-time requests have their own deadlines and they can be periodic or aperiodic. Best-effort requests have no specific deadlines, and thus we assume their deadlines to be infinite. We also assume that all requests are independent, which implies that a request does not synchronize or communicate with other requests, and that all requests are nonpreemptive while being serviced by the disk.
Since GSCAN is an online scheduling mechanism, it should decide the schedule of requests immediately when a new request arrives or the service of a request is completed. Though GSCAN expands existing schedules whenever an arrival or a departure of a request occurs, it reduces the searching space significantly by grouping and branch-and-bound strategies.
3.2. Grouping of Best-Effort Requests
We group adjacent best-effort requests and consider them as a single request to service them together. To do this, we arrange the requests in the queue by the SCAN order and then cluster adjacent best-effort requests into a group. Since best-effort requests have no deadlines, it is reasonable to service them together within a group. This grouping reduces the huge searching space significantly by removing unnecessary combinations.
Figure 1 illustrates the grouping of adjacent best-effort requests. There are 11 requests sorted by the SCAN order, and the searching space is 11 factorial ($11!$) as shown in Figure 1(a). In this example, for a run of four adjacent best-effort requests, a schedule that services them in SCAN order (ascending or descending track order) is always superior, in terms of the seek distance, to a schedule that services them out of order.
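Figure 1. Grouping of adjacent best-effort requests: (a) requests sorted by the SCAN order before grouping; (b) state after grouping.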
Figure 1(b) shows the state after grouping adjacent best-effort requests. Basically, GSCAN clusters all best-effort requests between two real-time requests into a single group. However, if the seek distance between any two best-effort requests is too long, they are not put together into the same group. This is because a group that spans too long a distance may decrease the possibility of finding good schedules. Hence, we put any two adjacent best-effort requests whose distance is below the threshold $\tau$ into the same group, where $\tau$ is an experimental parameter. In Figure 1(b), two neighboring best-effort requests belong to separate groups because their distance is longer than $\tau$. If $\tau$ is large, the number of possible schedules decreases and thus the searching space becomes smaller, but the possibility of finding the best schedule also decreases.
When a new request arrives at the queue, GSCAN groups it by the aforementioned method. If the new request is a best-effort one, it may be merged into an existing group, bridge a gap between two groups, or create a new group. On the other hand, if the new request is a real-time one, it may split an existing group or just be inserted by the SCAN order without any specific actions.
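The grouping rule described in this subsection can be sketched as follows. This is a minimal batch version written for illustration, not the authors' implementation: it regroups the whole queue from scratch instead of updating groups incrementally on each arrival, and the `Request` class, the `group_requests` function, and the sample values are all hypothetical names and data.

```python
from dataclasses import dataclass
from math import inf

@dataclass(frozen=True)
class Request:
    track: int
    deadline: float = inf   # best-effort requests carry an infinite deadline

    @property
    def is_real_time(self):
        return self.deadline != inf

def group_requests(requests, tau):
    """Return scheduling units in track order: every real-time request alone,
    and runs of adjacent best-effort requests whose gaps are below tau."""
    units, current = [], []
    for req in sorted(requests, key=lambda r: r.track):
        if req.is_real_time:
            if current:                 # a real-time request closes the open group
                units.append(current)
                current = []
            units.append([req])
        elif current and req.track - current[-1].track < tau:
            current.append(req)         # close enough: extend the open group
        else:
            if current:                 # gap of tau or more: start a new group
                units.append(current)
            current = [req]
    if current:
        units.append(current)
    return units

queue = [Request(10), Request(40), Request(120, deadline=200),
         Request(130), Request(150), Request(300)]
for unit in group_requests(queue, tau=100):
    print([r.track for r in unit])
```

In this made-up queue, six requests collapse into four scheduling units, so the number of orderings to consider drops from $6!$ to $4!$.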
3.3. The Branch-and-Bound Strategy
To reduce the searching space even more, we employ the branch-and-bound strategy similar to the approach of Y.-F. Huang and J.-M. Huang [2]. The branch-and-bound strategy is an algorithmic technique to find an optimal solution in combinatorial optimization problems by keeping the best solution found so far. If a partial solution cannot improve on the best solution found so far, it is pruned so that it does not produce any further combinations. Since I/O scheduling is a typical combinatorial optimization problem, the branch-and-bound strategy can be effectively used for this problem.
We cut down two kinds of unnecessary schedules from huge searching spaces using the QoS requirements of heterogeneous workloads. The first class is schedules that contain any request missing its deadline, and the second class is schedules that incur too long a seek time. Figure 2 illustrates an example of the cutting-down process. Let us assume that one request is a real-time request with a deadline of 200 ms and that the other two are best-effort requests. In this example, for simplicity, we assume that the track-to-track seek time is 1 ms and that the seek time is proportional to the track distance between requests. We also assume that the rotational latency for each request is constant and do not consider the transfer time because it is very small compared to the seek time and the rotational latency. Note that these factors are considered in the experiment section.
Figure 2. Example of the cutting-down process: (a) requests in the queue, each with its deadline $d_i$ and track number $t_i$; (b) all possible schedules, where each node denotes a scheduling order of the requests.
In Figure 2(b), the level denotes the number of requests in the queue. For example, when the level is 3, the searching space is 3 factorial, that is, $3! = 6$ schedules. Among all possible combinations, some schedules can be removed from this tree structure. For example, a schedule can be removed when the real-time request in it cannot meet its deadline of 200 ms. Note that any schedule inherited from such a schedule cannot satisfy the deadline constraints either, which we will show in Theorem 1. A schedule can also be removed because it incurs too long a seek time. A concrete yardstick for “too long” is given in Theorem 2. As a result, practical searches for the best schedule need to consider only the remaining schedules. The optimal schedule in this example is the one whose seek time is shortest among the schedules satisfying the deadline requirement of the real-time request.
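The cutting-down process for three requests can be reproduced with a short enumeration. The sketch below is illustrative only: the 1 ms per track seek model and the 200 ms deadline come from the text, but the track numbers, the initial head position, and the 5 ms rotational latency are hypothetical values chosen for this example and are not taken from Figure 2.

```python
from itertools import permutations

SEEK_MS_PER_TRACK = 1      # from the text: seek time proportional to track distance
ROTATION_MS = 5            # hypothetical constant rotational latency
HEAD = 100                 # hypothetical initial head position

# name -> (track, deadline in ms, or None for best-effort); hypothetical values
REQUESTS = {"r1": (200, 200), "r2": (80, None), "r3": (40, None)}

def evaluate(order):
    """Return (total seek time, True if every deadline is met) for one schedule."""
    clock, position, seek_total, feasible = 0, HEAD, 0, True
    for name in order:
        track, deadline = REQUESTS[name]
        seek = abs(track - position) * SEEK_MS_PER_TRACK
        seek_total += seek
        clock += seek + ROTATION_MS
        position = track
        if deadline is not None and clock > deadline:
            feasible = False            # first pruning class: a deadline is missed
    return seek_total, feasible

results = {order: evaluate(order) for order in permutations(REQUESTS)}
feasible = {order: seek for order, (seek, ok) in results.items() if ok}
best = min(feasible, key=feasible.get)

for order, (seek, ok) in results.items():
    print(order, seek, "ok" if ok else "deadline missed")
print("best:", best, feasible[best])
```

With these made-up values, three of the six orderings miss the 200 ms deadline, and among the remaining three the order (r1, r2, r3) has the shortest seek time.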
Now, we will show why the two classes of schedules and their successors cannot produce an optimal schedule and thus can be pruned. These two pruning conditions can be proved through the following two theorems.
Theorem 1. If a schedule does not meet the deadline of some real-time request, then no new schedule inherited from that schedule will meet that deadline either.
Proof. Let us assume that there is a schedule with the request order $(r_1, \ldots, r_i, \ldots, r_n)$, where $1 \le i \le n$, that cannot meet the deadline of $r_i$. When a new request $r_{n+1}$ arrives, GSCAN expands existing schedules by inserting $r_{n+1}$ into positions either before or after $r_i$, that is, $(\ldots, r_{n+1}, \ldots, r_i, \ldots)$ or $(\ldots, r_i, \ldots, r_{n+1}, \ldots)$. In the latter case, where $r_{n+1}$ is serviced later than $r_i$, the service time of $r_i$ does not change at all, and thus $r_i$ still misses its deadline. In the former case, where $r_{n+1}$ is serviced earlier than $r_i$, the time at which $r_i$ is serviced clearly cannot become earlier. Hence, the expanded schedule cannot meet the deadline of $r_i$.
Theorem 2. Assume that there are $n$ requests in the queue and the seek time of a schedule $S_n$ is longer than that of an optimal schedule $S_n^{opt}$ by more than a full sweep time of the disk head. Then, any schedule $S_{n+1}$ expanded from $S_n$ due to the arrival of a new request cannot be an optimal schedule.
Proof. Let $T(S_n)$ and $T(S_n^{opt})$ be the seek times of $S_n$ and $S_n^{opt}$, respectively. Then, by the assumption of this theorem, the following expression holds:
$$T(S_n) > T(S_n^{opt}) + T_{sweep}, \quad (1)$$
where $T_{sweep}$ is the seek time of a full disk head sweep. Similarly, let $S_{n+1}^{opt}$ be an optimal schedule after the arrival of the $(n+1)$th request, and let $T(S_{n+1})$ and $T(S_{n+1}^{opt})$ be the seek times of $S_{n+1}$ and $S_{n+1}^{opt}$, respectively. Since $S_{n+1}$ is inherited from $S_n$ by including a new request, the following expression holds:
$$T(S_{n+1}) \ge T(S_n). \quad (2)$$
Also, expression (3) is satisfied because the additional seek time for the new request is not longer than the seek time of a full disk head sweep in the case of the optimal algorithm:
$$T(S_{n+1}^{opt}) \le T(S_n^{opt}) + T_{sweep}. \quad (3)$$
Through expressions (1), (2), and (3), the following expression is derived:
$$T(S_{n+1}) \ge T(S_n) > T(S_n^{opt}) + T_{sweep} \ge T(S_{n+1}^{opt}). \quad (4)$$
This implies that any schedule $S_{n+1}$ inherited from a schedule $S_n$ which satisfies expression (1) has a longer seek time than $S_{n+1}^{opt}$. Hence, $S_{n+1}$ cannot be an optimal schedule.
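The pruning condition of Theorem 2 amounts to a simple filter over candidate schedules. The sketch below is one possible rendering, with assumptions stated plainly: the best candidate currently known stands in for the optimal schedule, and the function name, the schedule labels, and the seek times are hypothetical.

```python
def prune_by_sweep_bound(seek_times, full_sweep):
    """Drop schedules whose seek time exceeds the best one by more than a full sweep."""
    best = min(seek_times.values())
    return {name: t for name, t in seek_times.items() if t <= best + full_sweep}

# Hypothetical candidate schedules and their total seek times in ms.
candidates = {"S_a": 260, "S_b": 300, "S_c": 1400}
survivors = prune_by_sweep_bound(candidates, full_sweep=1000)
print(sorted(survivors))
```

Here `S_c` is discarded because it trails the best candidate by more than one sweep, so by the theorem none of its expansions could become optimal.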
The above two pruning conditions are devised to reduce the searching space when a new request arrives at the queue. Similarly, it is also possible to reduce the searching space when a request is removed from the queue. Specifically, when the disk becomes ready to perform a new I/O operation, GSCAN selects the best schedule among the candidate schedules and dispatches the first request in that schedule. This makes schedules not beginning with the selected request meaningless and thus they can be pruned. Details of this pruning condition are explained in Theorem 3.
Theorem 3. When a request $r_i$ leaves the queue to be serviced, any schedules that do not begin with $r_i$ can be pruned.
Proof. Let us suppose that an optimal schedule with $n$ requests is $S_n^{opt}$ and the first request in $S_n^{opt}$ is $r_i$. When the disk becomes ready to service a request, the scheduling algorithm selects $S_n^{opt}$ and removes $r_i$ from the queue to service it. In this case, all schedules that do not begin with $r_i$ can be removed from the searching space because both they and the schedules inherited from them are invalid. On the other hand, schedules beginning with $r_i$ are not pruned but remain in the tree structure even though they are not selected. This is because these schedules may become optimal after new requests arrive in the future even though they are not optimal now.
It is possible that all schedules will be removed through the above pruning conditions. For example, when the I/O subsystem is overloaded and no feasible schedule exists, all schedules may be pruned. To handle this case, if the number of candidate schedules falls below a threshold, GSCAN retains a certain number of relatively superior schedules even though they satisfy the pruning conditions. The relative superiority here is evaluated by considering both the total seek time and the deadline miss time of real-time requests. On the other hand, large overhead may be incurred if too many schedules remain after pruning. To solve this problem, we rank the schedules according to their relative superiority and then cut down schedules whose ranking is beyond another threshold. Note that GSCAN might not find an optimal schedule in the true sense of the definition. Essentially, an optimal algorithm requires knowledge of the request sequences that will arrive in the future. Our goal is to design an algorithm that can obtain a schedule close to optimal with reasonable execution overhead. The algorithm of GSCAN is listed in Algorithms 1 and 2. ADD_REQUEST() is invoked when a new request arrives and SERVICE_REQUEST() is invoked when the disk dispatches a request in the queue for I/O service.
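Since the listings of Algorithms 1 and 2 are not reproduced in this text, the following sketch shows one way the two entry points could fit together. It is an assumption-laden illustration, not the authors' algorithm: grouping is omitted (every request is its own scheduling unit), seek time is linear in track distance, time advances only when a request is dispatched, and the class name, constants, and the single ranking cap are hypothetical.

```python
from math import inf

SEEK_MS_PER_TRACK = 1.0    # hypothetical linear seek model
FULL_SWEEP_MS = 1000.0     # hypothetical seek time of a full head sweep
MAX_SCHEDULES = 64         # hypothetical cap on ranked candidate schedules

class Scheduler:
    def __init__(self, head=0):
        self.head, self.now = head, 0.0
        self.candidates = [()]          # each schedule is a tuple of (track, deadline)

    def _cost(self, schedule):
        """Return (total deadline overrun, total seek time) for a schedule."""
        clock, pos, seek_total, late = self.now, self.head, 0.0, 0.0
        for track, deadline in schedule:
            seek = abs(track - pos) * SEEK_MS_PER_TRACK
            seek_total += seek
            clock += seek
            pos = track
            late += max(0.0, clock - deadline)
        return late, seek_total

    def _prune(self, schedules):
        """Rank schedules, then apply the deadline and full-sweep pruning rules."""
        ranked = sorted(schedules, key=lambda s: (self._cost(s), s))
        feasible = [s for s in ranked if self._cost(s)[0] == 0.0]
        if feasible:                    # otherwise keep the least-bad schedules
            best_seek = self._cost(feasible[0])[1]
            ranked = [s for s in feasible
                      if self._cost(s)[1] <= best_seek + FULL_SWEEP_MS]
        return ranked[:MAX_SCHEDULES]

    def add_request(self, track, deadline=inf):
        """Expand every candidate by inserting the new request at each position."""
        request = (track, deadline)
        expanded = {s[:i] + (request,) + s[i:]
                    for s in self.candidates for i in range(len(s) + 1)}
        self.candidates = self._prune(expanded)

    def service_request(self):
        """Dispatch the first request of the best schedule; keep matching schedules."""
        if not self.candidates or not self.candidates[0]:
            return None
        first = self.candidates[0][0]   # candidates are kept ranked, best first
        self.now += abs(first[0] - self.head) * SEEK_MS_PER_TRACK
        self.head = first[0]
        remaining = {s[1:] for s in self.candidates if s[0] == first}
        self.candidates = self._prune(remaining)
        return first

sched = Scheduler(head=100)
sched.add_request(80)
sched.add_request(40)
sched.add_request(200, deadline=200)
while (request := sched.service_request()) is not None:
    print(request)
```

In this made-up run, the real-time request at track 200 is dispatched first because every ordering that starts with a best-effort detour toward track 40 would miss its deadline or cost more seek time.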


4. Performance Evaluation
4.1. Experimental Methodology
To assess the effectiveness of GSCAN, we performed extensive experiments by replaying various collected traces. We compare GSCAN with other representative online algorithms, namely, CSCAN, EDF, SCANEDF, and Kamel’s algorithm [13], in terms of the average response time, total seek distance, throughput, and deadline miss rate. We also show that the overhead of GSCAN is low enough for practical implementation. To evaluate the algorithms in various heterogeneous workload environments, we use both synthetic and real-world I/O traces. For synthetic traces, we generated four different types of workloads as shown in Table 2. Workloads 1 to 4 consist of various heterogeneous I/O workloads including real-time and best-effort applications. We modeled two different types of real-time applications based on their access patterns, namely, random and periodic. In the random type, data positions, I/O request times, and deadlines are determined randomly each time, while the periodic type has regular values. Similarly, we modeled best-effort applications with two different access patterns, namely, random and sequential.

To show the effectiveness of GSCAN under more realistic conditions, we also performed experiments with real-world I/O traces gathered from Linux workstations (workloads 5 and 6 in Table 2). We executed the IOZONE program and the mpeg2dec multimedia player together to generate different types of I/O requests. IOZONE is a file system benchmark tool which measures the performance of a given file system. It generates various random I/O requests, and their average inter-arrival times in workloads 5 and 6 are 10 ms and 19 ms, respectively [25]. mpeg2dec is a program for playing video files, which generates real-time I/O requests periodically. Average inter-arrival times of I/O requests generated by mpeg2dec in workloads 5 and 6 are about 45 ms and 90 ms, respectively. The deadline of real-time I/O requests in mpeg2dec is about 30 ms.
4.2. Effects of Grouping
Before comparing the performance of GSCAN against other algorithms, we first investigate the effect of grouping when workload 5 (real workload) is used. Figure 3 shows the average number of groups as a function of threshold $\tau$. Note that the number of groups illustrated in Figure 3 includes real-time requests as well as grouped best-effort requests. The unit of $\tau$ is the track distance between two requests. For example, if $\tau$ is set to 100, best-effort requests whose track distance is smaller than 100 can belong to the same group. As can be seen from Figure 3, the searching space, namely, all possible combinations of schedules, is significantly reduced after grouping.
For example, when grouping is not used, the average number of requests in the queue is about 22 and thus the size of the entire searching space is $22!$, which is larger than $10^{21}$. Note that the zero extreme of the threshold in the graph implies that grouping is not used. However, after grouping is used, the searching space is significantly reduced. For example, when the threshold is 100 tracks, the average number of groups becomes about 6, and thus the searching space is reduced to $6! = 720$. Moreover, GSCAN does not expand this searching space completely because it also uses heuristics to reduce the searching space even more.
To see the effect of grouping, we investigate the performance of GSCAN in terms of various aspects as a function of threshold $\tau$. We also use workload 5 (real workload) in this experiment. As can be seen from Figures 4(a)–4(c), total seek distance, throughput, and deadline miss rate are scarcely influenced by the value of threshold $\tau$. In the case of average response time, however, the performance degrades significantly when $\tau$ is larger than 100 as shown in Figure 4(d). We also compare the number of schedules actually expanded as a function of threshold $\tau$. As can be seen from Figure 4(e), grouping significantly reduces the number of schedules to be handled. Specifically, the number of expanded schedules drops rapidly when the threshold is larger than 60. With these results, we can conclude that grouping of adjacent best-effort requests can significantly reduce the searching space without performance degradation when the threshold is set to a value around 100. In reality, finding an appropriate value of $\tau$ for each workload environment is not an easy matter and is a topic that we are still pursuing. We use 100 as the default value of $\tau$ throughout this paper because it shows good performance and incurs reasonably low scheduling overhead for all workloads that we considered.
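The two search-space sizes quoted above can be checked directly. This is a small arithmetic aid added for illustration; the variable names are arbitrary.

```python
from math import factorial

ungrouped = factorial(22)   # about 22 requests in the queue without grouping
grouped = factorial(6)      # about 6 scheduling units with a threshold of 100 tracks

print(ungrouped > 10**21)   # the ungrouped space exceeds 10^21 orderings
print(grouped)              # the grouped space has 720 orderings
print(ungrouped // grouped) # reduction factor from grouping alone
```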
Figure 4. Performance of GSCAN as a function of threshold $\tau$ (workload 5): (a) total seek distance; (b) throughput; (c) deadline miss rate; (d) average response time; (e) number of schedules searched.
4.3. Performance Comparison
In this subsection, we compare the performance of GSCAN with other scheduling algorithms. We use four synthetic workloads and two real workloads listed in Table 2. Note that the performance of GSCAN is measured when $\tau$ is set to 100. First, we investigate the total seek distances of the five algorithms. As shown in Figure 5(a), GSCAN outperforms the other algorithms for all workloads in our experiments. CSCAN and Kamel’s algorithm also show competitive performance, though the performance gap between GSCAN and these two algorithms is distinguishable for workloads 2 and 4. Figures 5(b) and 5(c) show the throughput and the average response time of the algorithms, respectively. For both metrics, GSCAN again performs better than CSCAN and Kamel’s algorithm. EDF and SCANEDF result in excessively large average response time for all cases. The reason is that EDF and SCANEDF greedily follow the earliest deadline irrespective of request positions. Figure 5(d) compares the deadline miss rate of the five algorithms. As expected, deadline-based algorithms such as EDF and SCANEDF perform well for most cases. CSCAN and Kamel’s algorithm do not show competitive performance. GSCAN shows reasonably good performance in terms of the deadline miss rate for all cases. Specifically, GSCAN performs better than even EDF when real workloads (workloads 5 and 6) are used. In summary, GSCAN satisfies the deadline constraints of real-time requests and at the same time exhibits good performance in terms of the average response time, throughput, and seek distances for both synthetic and real-world traces.
Figure 5. Performance comparison of the five algorithms: (a) total seek distances; (b) throughput; (c) average response time; (d) deadline miss rate.
To show the upper bound of performance, we additionally measured the performance of several unrealistic algorithms that have more information to schedule with, namely, OPTD, OPTT, and OPTG. OPTD is an optimal algorithm in terms of the deadline miss rate; it minimizes the number of requests missing their deadlines. OPTT moves the disk head in order to minimize the total seek time irrespective of deadline misses, and it performs similarly to the original SCAN algorithm. Finally, OPTG moves the disk head to minimize the seek time and meet the deadlines of real-time requests simultaneously if a feasible schedule exists. When no feasible schedule exists, OPTG moves the disk head to minimize the seek time. OPTG is a complete version of GSCAN that uses neither grouping nor the branch-and-bound scheme.
Figures 6, 7, and 8 show the total seek distance, the throughput, and the average response time of the algorithms, respectively. The experiments were performed with workload 1 (synthetic workload) and workload 5 (real workload). We scale the original inter-arrival times of the workloads to explore a range of workload intensities. For example, a scaling factor of two generates a workload whose average inter-arrival time is twice as long as that of the original workload. As can be seen in the figures, GSCAN shows almost identical performance to OPTT and OPTG in terms of the total seek distance, the throughput, and the average response time. As expected, EDF results in extremely poor performance in terms of the three metrics because it does not consider the movement of the disk head.
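The inter-arrival scaling used to vary workload intensity can be sketched as a transformation of a trace's arrival timestamps. This is a minimal illustration under the assumption that a trace is a sorted list of arrival times in milliseconds; the function name and the sample trace are hypothetical and do not describe the authors' tooling.

```python
def scale_interarrival(arrival_times, factor):
    """Stretch (factor > 1) or compress (factor < 1) the gaps between arrivals."""
    if not arrival_times:
        return []
    scaled = [arrival_times[0]]         # the first arrival keeps its timestamp
    for previous, current in zip(arrival_times, arrival_times[1:]):
        scaled.append(scaled[-1] + (current - previous) * factor)
    return scaled

trace = [0, 10, 30, 35]                 # hypothetical arrival times in ms
print(scale_interarrival(trace, 2))     # lighter load: gaps doubled
print(scale_interarrival(trace, 0.5))   # heavier load: gaps halved
```

A factor above one lowers the load on the disk, while a factor below one raises it, which is how a single trace can cover a range of intensities.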
Figure 6. Total seek distance: (a) synthetic workload; (b) real workload.
Figure 7. Throughput: (a) synthetic workload; (b) real workload.
Figure 8. Average response time: (a) synthetic workload; (b) real workload.
Figure 9 compares the deadline miss rate of the algorithms. Since GSCAN aims at reducing the seek time as well as the deadline misses, it cannot exhibit better performance than OPTD, which considers only the deadline miss rate. However, GSCAN consistently shows competitive performance in terms of the deadline miss rate. Specifically, the performance of GSCAN is similar to that of OPTG, which pursues identical goals but uses neither the grouping nor the pruning mechanism. Consequently, we can conclude that the grouping and the pruning mechanisms of GSCAN significantly reduce the searching space without degrading performance in any of the total seek distance, the throughput, the average response time, and the deadline miss rate.
Figure 9. Deadline miss rate: (a) synthetic workload; (b) real workload.
4.4. Overhead of GSCAN
To show the overhead of GSCAN, we measured the number of schedules expanded by GSCAN and compared it with the number of all possible schedules. Figure 10 shows the result for different scaling factors when workload 1 (synthetic workload) and workload 5 (real workload) are used. It is important to note that the y-axis in the graph is in log scale. As shown in the figure, the number of schedules maintained by GSCAN is reasonable for all cases. Specifically, when a scaling factor of 1.0, which corresponds to the original workload, is used for workload 1, the average number of schedules expanded by GSCAN is only 298. Note that the average number of all possible schedules in this case is $7.455 \times 10^{15}$. Similarly, the average numbers of schedules expanded by GSCAN for the real workload are smaller than 100 for all cases.
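Figure 10. Number of schedules expanded by GSCAN and number of all possible schedules for different scaling factors: (a) synthetic workload; (b) real workload.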
Figure 11 compares the schedules expanded by GSCAN with all possible schedules as time progresses, when a scaling factor of 1.0 is used for workload 1 (synthetic workload) and workload 5 (real workload). Note that the y-axis is again in log scale. As can be seen, GSCAN explores only a small fraction of all possible schedules, and its overhead is reasonable for online execution.
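Figure 11: Schedules expanded by GSCAN versus all possible schedules as time progresses (scaling factor 1.0). (a) Synthetic workload. (b) Real workload.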
5. Conclusions
In this paper, we presented a novel disk scheduling algorithm called GSCAN that supports requests with different QoS requirements. GSCAN reduces the huge search space to a feasible level through grouping and branch-and-bound strategies. We have shown that GSCAN is suitable for heterogeneous workloads because (1) it is based on an online request handling mechanism, (2) it meets the deadlines of real-time requests, (3) it minimizes the seek time, and (4) its overhead is low enough for practical implementation. Through extensive experiments, we demonstrated that GSCAN outperforms other scheduling algorithms in terms of the average response time, throughput, total seek distance, and deadline miss rate. We also showed that the overhead of GSCAN is reasonable for online execution.
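To make the branch-and-bound idea concrete, the following is a toy sketch, not the GSCAN algorithm itself. It omits grouping entirely, works offline on a fixed request set, and assumes a simplified disk model (seek time linear in tracks travelled, fixed service time). The function name `best_schedule` and all parameters are hypothetical. A partial schedule is cut off when it would miss a deadline (impractical) or when its seek distance already matches or exceeds the best complete schedule found so far (inefficient).

```python
from typing import List, Optional, Tuple

# (track, deadline); a deadline of None marks a best-effort request.
Req = Tuple[int, Optional[float]]


def best_schedule(requests: List[Req],
                  head: int = 0,
                  seek_per_track: float = 0.01,  # assumed linear seek cost
                  service_time: float = 1.0      # assumed fixed transfer cost
                  ) -> Tuple[Optional[List[Req]], float, int]:
    """Return (best order or None, its seek distance, number of nodes expanded)."""
    best_dist = float("inf")
    best_order: Optional[List[Req]] = None
    expanded = 0

    def expand(order: List[Req], remaining: List[Req],
               pos: int, now: float, dist: int) -> None:
        nonlocal best_dist, best_order, expanded
        expanded += 1
        if dist >= best_dist:
            return  # bound: cannot improve on the best complete schedule
        if not remaining:
            best_dist, best_order = dist, order
            return
        for i, (track, deadline) in enumerate(remaining):
            move = abs(track - pos)
            finish = now + move * seek_per_track + service_time
            if deadline is not None and finish > deadline:
                continue  # prune: this extension misses a deadline
            expand(order + [(track, deadline)],
                   remaining[:i] + remaining[i + 1:],
                   track, finish, dist + move)

    expand([], list(requests), head, 0.0, 0)
    return best_order, best_dist, expanded


order, dist, expanded = best_schedule([(100, 3.5), (10, None), (90, 2.5)])
```

For this three-request instance, the full search tree has 16 nodes, and the pruned search visits fewer of them while still finding the only deadline-feasible order. The returned order is `None` when no order meets every deadline.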
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (no. 20120001924 and no. 20110028825). The present research was also supported by the Research Grant of Kwangwoon University in 2013.
References
M. Andrews, M. A. Bender, and L. Zhang, “New algorithms for the disk scheduling problem,” in Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 550–559, October 1996.
Y.F. Huang and J.M. Huang, “Disk scheduling on multimedia storage servers,” IEEE Transactions on Computers, vol. 53, no. 1, pp. 77–82, 2004.
P. J. Denning, “Effects of scheduling on file memory operations,” in Proceedings of the AFIPS Spring Joint Computer Conference, pp. 9–21, 1967.
D. M. Jacobson and J. Wilkes, “Disk scheduling algorithms based on rotational position,” Tech. Rep. HPLCSP917, Hewlett-Packard Lab., Palo Alto, Calif, USA, 1991.
B. L. Worthington, G. R. Ganger, and Y. N. Patt, “Scheduling algorithms for modern disk drives,” in Proceedings of the ACM Sigmetrics on Measurement and Modeling of Computer Systems, pp. 241–251, May 1994.
C. L. Liu and J. W. Layland, “Scheduling algorithms for multiprogramming in a hard-real-time environment,” Journal of the ACM, vol. 20, no. 1, pp. 47–61, 1973.
A. L. N. Reddy and J. Wyllie, “Disk scheduling in a multimedia I/O system,” in Proceedings of the 1st ACM International Conference on Multimedia, pp. 225–233, August 1993.
A. L. N. Reddy, J. Wyllie, and K. B. R. Wijayaratne, “Disk scheduling in a multimedia I/O system,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 1, no. 1, pp. 37–59, 2005.
S. Chen, J. A. Stankovic, J. F. Kurose, and D. Towsley, “Performance evaluation of two new disk scheduling algorithms for real-time systems,” Real-Time Systems, vol. 3, no. 3, pp. 307–336, 1991.
R. K. Abbott and H. Garcia-Molina, “Scheduling I/O requests with deadlines: a performance evaluation,” in Proceedings of the 11th Real-Time Systems Symposium, pp. 113–124, December 1990.
I. Kamel and Y. Ito, “A proposal on disk bandwidth definition for video servers,” in Proceedings of the Society Conference of IEICE, 1996.
R.I. Chang, W.K. Shih, and R.C. Chang, “Deadline-modification-SCAN with maximum-scannable-groups for multimedia real-time disk scheduling,” in Proceedings of the 19th IEEE Real-Time Systems Symposium, pp. 40–49, December 1998.
I. Kamel, T. Niranjan, and S. Ghandeharizadeh, “Novel deadline driven disk scheduling algorithm for multi-priority multimedia objects,” in Proceedings of the IEEE 16th International Conference on Data Engineering (ICDE '00), pp. 349–361, March 2000.
H. M. Vin and P. V. Rangan, “Designing a multi-user HDTV storage server,” IEEE Journal on Selected Areas in Communications, vol. 11, no. 1, pp. 153–164, 1993.
P. S. Yu, M. S. Chen, and D. D. Kandlur, “Design and analysis of a grouped sweeping scheme for multimedia storage management,” in Proceedings of Network and Operating Systems Support for Digital Audio and Video, pp. 44–55, 1992.
D. J. Gemmell, “Multimedia network file servers: multichannel delay sensitive data retrieval,” in Proceedings of the 1st ACM International Conference on Multimedia, pp. 243–250, August 1993.
H. J. Chen and T. D. C. Little, “Physical storage organizations for time-dependent multimedia data,” in Proceedings of the International Conference on Foundations of Data Organization and Algorithms, pp. 19–34, 1993.
P. Shenoy and H. M. Vin, “Cello: a disk scheduling framework for next generation operating systems,” Real-Time Systems, vol. 22, no. 1-2, pp. 9–48, 2002.
Y. Won and Y. S. Ryu, “Handling sporadic tasks in multimedia file system,” in Proceedings of the 8th ACM International Conference on Multimedia, pp. 462–464, November 2000.
R. Wijayaratne and A. L. Narasimha Reddy, “Providing QOS guarantees for disk I/O,” Multimedia Systems, vol. 8, no. 1, pp. 57–68, 2000.
C.H. Tsai, E. T.H. Chu, and T.Y. Huang, “WRR-SCAN: a rate-based real-time disk-scheduling algorithm,” in Proceedings of the 4th ACM International Conference on Embedded Software (EMSOFT '04), pp. 86–94, September 2004.
M. F. Mokbel, W. G. Aref, K. Elbassioni, and I. Kamel, “Scalable multimedia disk scheduling,” in Proceedings of the 20th International Conference on Data Engineering (ICDE '04), pp. 498–509, April 2004.
A. Povzner, T. Kaldewey, S. Brandt, R. Golding, T. M. Wong, and C. Maltzahn, “Efficient guaranteed disk request scheduling with fahrrad,” in Proceedings of the 3rd ACM European Conference on Computer Systems (EuroSys '08), pp. 13–25, April 2008.
A. Povzner, D. Sawyer, and S. Brandt, “Horizon: efficient deadline-driven disk I/O management for distributed storage systems,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 1–12, New York, NY, USA, June 2010.
IOZONE, http://www.iozone.org.
Copyright
Copyright © 2014 Taeseok Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.