Abstract

Information abounds in all fields of the real life, which is often recorded as digital data in computer systems and treated as a kind of increasingly important resource. Its increasing volume growth causes great difficulties in both storage and analysis. The massive data storage in cloud environments has significant impacts on the quality of service (QoS) of the systems, which is becoming an increasingly challenging problem. In this paper, we propose a multiobjective optimization model for the reliable data storage in clouds through considering both cost and reliability of the storage service simultaneously. In the proposed model, the total cost is analyzed to be composed of storage space occupation cost, data migration cost, and communication cost. According to the analysis of the storage process, the transmission reliability, equipment stability, and software reliability are taken into account in the storage reliability evaluation. To solve the proposed multiobjective model, a Constrained Multiobjective Particle Swarm Optimization (CMPSO) algorithm is designed. At last, experiments are designed to validate the proposed model and its solution PSO algorithm. In the experiments, the proposed model is tested in cooperation with 3 storage strategies. Experimental results show that the proposed model is positive and effective. The experimental results also demonstrate that the proposed model can perform much better in alliance with proper file splitting methods.

1. Introduction

The rapidly increasing information resources, which are often recorded in form of data and stored in computer systems, are a kind of new resources and becoming more and more important nowadays. The rapid development of Internet technologies and mobile communication equipment allows more and more people to have opportunity to access various Internet networks, such as Twitters, Amazon, and Taobao sites. While visiting these websites, massive accession information which is very important to the service providers to make business decisions will be produced. The massive accession information is often collected by the service providers and recorded as data resources stored in cloud systems. The storage of such massive data is of great significance for the quality of service of computing systems and poses many challenges to storage service providers. Recently, cloud storage is getting more and more competitive and popular, which is becoming a trend of future storage technique. Various cloud storage products are springing up in the high-tech market, such as Cloud Drive of Amazon and Live Mesh of Microsoft.

The main goal of cloud storage is to provide users with low-cost, infinitely extensible, and high-reliability storage platform, which can also support various data storage types. A cloud storage system also has abilities to satisfy the different requirements of different users. With the development of the era of service economy, customers no longer seek storage products or services with cheap price and high quality but are more concerned about quality of service, which makes the data storage providers concentrate on promoting both of the quality of service and storage efficiency simultaneously. Therefore, optimizing the data storage process is of great importance to the quality of service of the cloud storage systems. However, with the development of cloud storage techniques, various data processing techniques and applications need to be deployed in combination with cloud storage, so that different aspects should be considered while optimizing the storage process. Users usually select suitable storage service according to the issues of storage cost and reliability. The service providers also do their best to enhance product competitiveness by saving the storage costs and promoting the storage reliability.

Nowadays, the quality of service of cloud storage is being influenced by the following 7 categories of risks or challenges [1]: privileged access, auditing, storage location, data isolation, data disaster recovery, survey, and long-term viability. Among these risks, data disaster recovery is a very significant issue for both cloud storage providers and users. Users choose cloud storage providers to store their sensitive and significant business data [24]. Once their data were lost, it will not only lead to fetal disasters to the companies but also bring legal disputes between users and storage service providers. For instance, in a banking system, the clients’ data is of great importance to both banks and clients. If lost or unable to be recovered, it will lead to immeasurable losses for both banks and their clients [5, 6]. Hence, it should be one of the business objectives for cloud storage providers to ensure the reliability and recovery for the stored data.

In cloud storage systems, the datacenters are the vital components used to house computer systems and associated components, such as servers, telecommunications, and storage system [7]. Compared with personal computers, the main advantages of the datacenter are over its continuously powerful operation, storage, and network communication, faster crash recovery, and broader convenient expanse space [8]. In almost all the computer systems, data resources are stored on the devices called hard disks, which have direct impacts on the quality of service of the storage process. The storage capacity of a cloud storage system is significantly limited by the volume of the hard disks in its datacenters. In a datacenter, various hard disks with different properties are deployment for certain purposes on storage or analysis of the data files, such as SSD (Solid State Disk), HDD (Hard Disk Drive), and HHD (Hybrid Hard Disk). In datacenters, the mixture of HDDs and SSDs is becoming more and more popular in enterprise-class disk arrays. Practically, datacenters are often located at different places (cities) and connected with each other via Internet or high-performance routers. The geographical distance is another important factor which will influence the data storage process. The cloud storage is getting more and more popular, so more and more individuals and enterprises decide to migrate their business or the data management works to datacenters aiming to cut operating costs. As is well known, the value of data lies not only in its contents but also in the potential business benefits for the business development. Thus, the quality of service of a cloud storage system is conditioned by many different factors.

From the user’s point of view, different data files submitted for possible storage often need different storage service accordingly [9]. To meet such demand, the storage service providers have to develop diverse storage products and invest large quantities of human inputs and various storage devices. Hence, an important issue for the providers to consider is how to save the service cost and improve the storage reliability. Many operators have employed different technologies to realize the purpose of reducing the cost and improving the reliability [1012]. During the storage of the data files, the cost and reliability are influenced by many factors, such as communication and transmission. In this paper, these factors will be quantified to evaluate the cost and reliability of the storage service. Through minimizing the storage cost and maximizing the storage reliability simultaneously, a multiobjective optimization model for the massive data storage in cloud environment is proposed. Various solutions that have reference value for decision-makers will be obtained by solving this multiobjective model, so that more reasonable decisions might be made. Then, a particle swarm optimization (PSO) algorithm is designed to solve the proposed model.

The remainder of this paper is organized as follows: Section 2 briefly reviews the state of the art of the research on the cloud storage optimization. Then, the multiobjective optimization model for reliable cloud storage is proposed through analyzing the storage processes in Section 3. Section 4 briefly introduces the PSO algorithm and a tailor-made multiobjective PSO algorithm is designed to solve the proposed model. Experiments and results analysis are made in Section 5. At last, conclusions of this paper are drawn in Section 6.

In recent years, with the increasing usage of the cloud storage service, more and more efforts have been made to improve the quality of service via managing resources in clouds. Since the resources in the cloud environment are provided on a pay-as-you-go basis, high efficiency utilization of the resources is necessary. Efficient resource management in cloud datacenters has attracted more and more attentions with different objectives, such as reducing cloud computational cost or energy budget, improving the quality of service of the cloud systems, and optimizing the data storage. Various techniques have been developed to achieve these objectives.

Virtualization technique is a kind of efficient resource management techniques employed in datacenters to reduce the computational cost or energy budget. Virtual machines play significant roles in the cloud computing systems, which can be migrated between different systems or infrastructures. Through migrating, these virtual machines can efficiently manage the resources in the cloud environment. The recent research on virtual machine migration problems mainly focuses on maximizing the resources utilization or minimizing the migration cost. Zhang et al. [13] proposed a multiobjective optimization model to maximize the resource utilization and minimize the migration cost simultaneously, in which the factors of storage capacity, bandwidth, etc. are considered. Ahmad et al. [14] summarized the virtual machine migration schemes in aspects of the network bandwidth optimizations, server consolidation frameworks, DVFS energy optimization, and storage optimization for cloud datacenters in the Internet links. Dong et al. [15] proposed a virtual machine scheduling strategy aiming at reducing the number of the active virtual machines and network devices, in which the resources in IaaS cloud, such as the storage space, bandwidth, and network connectivity, are considered.

Cloud storage optimization [1622] is another issue to improve the quality of service of the cloud computing systems. In [16], optimizations are proposed to reduce the volume of data to be transferred per data access for the respects of privacy and security, which did not consider the storage cost. Guo and Fang [17] proposed an optimization model with the objective of minimizing the total electricity cost of multiple datacenters for cloud service providers, in which the available energy storage capabilities in datacenters were considered to be further used to reduce the electricity. Then, an online algorithm based on Lyapunov optimization technique was developed for this model. However, solving this single objective optimization model, only one solution can be obtained, which cannot help service providers develop customized products. Mao et al. [18] proposed a SAR model to improve the read performance of the cloud storage system. In this model, the high efficiency of the random read performance of SSDs was utilized to improve the read performance of the cloud system converting the read requests on HDDs to the read requests on the SSDs. Liu et al. [19] proposed a security-aware intermediate data placement strategy for data intensive scientific workflows in cloud environment with the aim of storing large volumes of the intermediate datasets cost-effectively, in which the security factor of the datasets is considered. In [20], the efforts on improving the energy efficiency of data accesses and storage were made. Aiming to improve the storage utilization and workflow scheduling performance in the cloud, Wang et al. [21] proposed a user-level file system and a scheduling algorithm for scientific workflow computation in the cloud, in which the storage utilization was improved by using the workflow-aware file system and the scheduler to control the number of concurrent workflow instances at runtime. Jarrah et al. [22] proposed a hierarchical optimization model for energy data flow in smart grid power systems aiming to minimize daily electricity cost through maximizing the used percentage of renewable energy.

However, most of the existing works focus on either the cost (or energy) or the storage safety by scheduling the data storage tasks or the storage devices. Few work considers both the storage cost and reliability simultaneously. The storage reliability is of great significance to the quality of service of the cloud storage systems.

3. Multiobjective Optimization Model for Massive Data Storage

As a kind of open service IT systems, cloud storage systems are usually with the structure as Figure 1. Choosing an open cloud environment to store data files, users often take into account not only the price of data storage services but also the reliability when they send their data files to the cloud computing systems [2325]. The users’ privacy or profit is often involved in their data files to be stored. The users might probably select different storage services for different purposes according to their demands. Therefore, the storage service providers should provide plentiful storage service to satisfy different users’ requirements. When pricing the storage service, the providers should have knowledge of the cost and reliability of the certain storage service. Or, given the price of a certain storage service, the providers need to do their best to reduce the storage cost and keep the storage reliability. Aiming to solve this problem, we proposed a new multiobjective optimization model for massive data storage in cloud environment considering storage cost and reliability simultaneously.

3.1. Assumptions

Each server in datacenters of a cloud storage system can be treated as a node, and the linkage between servers can be represented by edges. Therefore, we can have a complete graph to represent the networks of the servers. The data files with different security level have different storage cost. Except for the costs of encryption processing and special device usage, another issue is that data files with high security level may have more backups around different locations. Hence, the monitor system needs to communicate with the servers located in different places and transfer these backups. These processes will not only cost much more but also make the storage unreliable.

Let be the servers in the datacenters of a cloud storage system. These servers might be arranged in different places. The distance between these servers is denoted by the matrix , in which, , and, and , . The data files submitted by users are denoted as , in which denotes an indivisible data file. Each data file has its own security level, denoted as , . For convenience, other parameters are listed in Nomenclature.

3.2. Total Service Cost for Massive Data Storage

In each server, the data files with different security levels usually have different storage price denoted as , which is in proportion to the security level. The number of backups for data file with different security levels is often set to be proportional to its security level. Hence, the higher the security level a data file has, the higher the storage cost it will need. The total service cost for storing a set of data files is divided into three parts in this paper: storage cost, migration cost, and communication cost.

3.2.1. Storage Cost

The storage cost of data file per unit time can be treated as rental cost for data store space, which is the product of the volume of the data files, the price of the related storage service in the target server, and the storage time. Therefore, the storage cost of data file can be evaluated through the following formula:

Thus, the storage cost of storing the data file set in time is the sum of the storage cost of each data file in the cloud storage system, which can be formulated as follows:

3.2.2. Migration Cost

As is well known, users can access and upload their data files via Internet to a cloud system almost everywhere. The data files are often uploaded by the users to the servers located at the management layer of a cloud system to decide which target storage servers will be chosen to store these data files. Therefore, some data files might be migrated to other servers located at different places. The migration cost will be required, which is basically proportional to the migration distance and the data volume. Suppose that the data files are uploaded at server , the migration cost of a data block from to can be calculated as follows:in which is the migration price. Thus, the migration cost of the data files during the storage process can be calculated as follows:

3.2.3. Communication Cost

During the whole process of migrating the data files to the cloud system, the storage management system should collect the storage information of the candidate servers to make sure whether the certain data file can be stored, verify the integrity of the migrated data files, and monitor the storage process. Therefore, the communication is throughout the whole storage process. The required communication cost is proportional to the distance between the server where the management system located and the target server candidated to store the data file . The communication cost of storing data file can be formulated as follows:

Therefore, the total communication cost of the set of data files submitted for storage is calculated as the following formula:

3.2.4. Total Cost of Storage Service

The dataset is supposed to be submitted via terminal server . Let be the total cost of storing the dataset in the cloud storage system, which is considered to be the sum of storage cost, communication cost, and migration cost. The formula is as follows:

3.3. Storage Reliability

The reliability of a storage system is one of the most important measures of the service of quality and the calculation of the reliability is so difficult that very few attention has been paid. During the data storage process, the reliability is influenced by many factors (e.g., data transmission rate, equipment stability, and service system stability). However, it is difficult to determine the quantitative criteria to evaluate these impactors. The evaluation of these factors is essentially predictions of these factors according to their historic behavior. In this paper, the data transmission reliability, equipment stability, and software reliability are considered in the evaluation of the storage reliability.

3.3.1. Transmission Reliability

This factor is used to estimate the stability of the networks in practice. To store a set of date files into a cloud storage system, the data transmission in the network is necessary and very important for the distributed storage. The transmission reliability is of crucial importance for successful data storage.

In networks, the information and data with large datagrams are broken up into several small packets which will be transferred along with different network paths in one or several networks and reassembled at the target end. However, when one or more packets of data traveling across a computer network fail to reach their destination, some packets may be lost. Packet loss is one of the common faults in the network and typically caused by a number of factors that can corrupt or lose packets in transit, such as network congestion, radio signals that are too weak due to distance or multipath fading (in radio transmission), faulty networking hardware, or faulty network drivers. What is more, packets are sometimes intentionally dropped by normal routing routines (such as Dynamic Source Routing in ad hoc networks [26]) and through network dissuasion technique for operational management purposes [27]. Packet loss can reduce throughput for the senders, whether unintentionally due to network malfunction or intentionally as a means to balance available bandwidth between multiple senders when a given router or network link reaches near its maximum capacity [28].

This kind of network fault is measured as a percentage of packets lost with respect to packets sent. Therefore, the packet loss rate is an important criterion to measure the stability of networks.

When reliable delivery is necessary, packet loss increases latency due to additional time needed for retransmission. Assuming no retransmission, packets experiencing the worst delays might be preferentially dropped (depending on the queuing discipline used) resulting in lower latency overall at the price of data loss. The acceptable amount of packet loss depends on the type of data being sent [29]. Therefore, the packet loss rate is used to evaluate the network reliability. However, packet loss often happens randomly. It is difficult to determine the exact rate for current usage. In this paper, the average packet loss rate in a period of time is used to measure the reliability of the candidate path. The packet loss rate can be easily obtained via log files. Thus, the transmission reliability between servers and can be measured as follows:

3.3.2. Equipment Stability

A datacenter is an integrated system composed of equipment systems and software systems. The stability of the equipment in the cloud storage system is to measure the reliability of the physical devices while storing data files. There are many devices in a datacenter, which includes infrastructure (electric generators, uninterruptible power system, air conditioners, etc.), IT equipment (servers, switchers, etc.), and networks. All the computing and storage services are implemented upon these devices and the fundamental premise of the successful service implementation is the trouble-free operation of the critical equipment, which is measured by mean time between failures (MTBF).

The operating ratio is used to measure the equipment stability in this paper. Given a device, its operating ratio during interval can be calculated as the following formula:where MTBF denotes mean time between failures during the interval . The operating ratio of the storage process is evaluated to be the minimal ratio among all involved devices:where denotes the devices involved in the path from to . The number of of each device can be found in the operating log files.

3.3.3. Software Reliability

Software reliability is more difficult to be ensured than hardware reliability in computing service field, which will impact the reliability of the whole system. In cloud storage systems, the service providers usually pay more attention to the speed of the data processing, the correctness of the results, and the amiability of the interface of the software but less attention to the reliability. However, reliability problems emerging during the employment of the software often increase the difficulties and effort of the software maintenance. Serious reliability problems might cause the system paralysis.

Software reliability is defined to be the probability of failure-free operation for a software system in a given environment [30] or the ability of a system or component to perform its required functions under stated conditions for a specified period of time [31], which is of great importance for the safety-critical systems and influenced by both the software itself and software operating environment. Before or during the storage process, the data files are usually processed using different software for various purposes (e.g., file encryption, file splitting, and merging). A reliable software should have abilities to run without fault and output correct results. Hence, the software reliability is an important factor that should be considered when evaluating the data storage reliability. There are various metrics to specify the software reliability, such as failure rate, mean time to failure (MTTF), and reliability function [32]. In this paper, the software reliability is evaluated to be the mission success rate, which can be calculated to the ratio of allocating tasks to the successfully completed tasks. For software, the numbers of allocating tasks and successfully completed tasks can be obtained via its log files.

In this paper, the software reliability is to measure the behavior of software in recent time interval . Let the number of the allocating tasks be and the successfully completed tasks be . Hence, the software reliability will be calculated aswhere and can be obtained from the log files of the software or the operation system. What is more, the failure rate can be easily formulated as follows:

3.3.4. Service Reliability of Cloud Storage

Storage reliability is a significant metric to measure the quality of service of a storage strategy. After being submitted to the storage system, both clients and service providers expect that the data files can be stored with high reliability. The evaluation of the reliability of storage strategies is essential before making reasonable decisions.

For each inseparable data file , its storage reliability is influenced by not only the transmission reliability, software reliability, and equipment stability involved in the uploading server to the destination server but also the physical storage mediums in the destination server, which can be formulated as follows:

The reliability of a storage strategy for dataset is calculated to be the product of the reliability of each inseparable data file as follows:

3.4. Multiobjective Reliable Storage Model

As a matter of fact, the storage strategies are often limited by various real restrictions or limitations, such as storage capacity, off-site backup, impacts of storage devices, storage security, and data integrity. These constraints should be considered inevitably during the determination of storage strategies.

The constraints can be stated as follows:(1)During storage, the original data files and the backup files are treated to be the same. In a feasible storage strategy, each inseparable data file (including the backup file) should be assigned to only one server.(2)For an inseparable data file , the number of its backups should not be less than the number for the backup with the same security level. This constraint can be formulated as .(3)The data file and its backups should be assigned to be stored at different servers in different places between each other. This constraint can also be reflected via the inequality .(4)If data file was assigned to be stored in server , the available storage space in should be enough to store the assigned data file with the corresponding reliability. This constraint can be expressed as .

Based on the above analysis, the multiobjective optimization model for massive data storage can be formulated as follows:

The proposed multiobjective optimization model for reliable cloud storage is a constrained multiobjective combination optimization problem, which is too difficult to be solved by traditional optimization algorithms. To solve this problem, one has to design a specific optimization algorithm using some particular but effective optimization technique. In this paper, we design a multiobjective particle swarm optimization algorithm to solve this proposed model.

4. Multiobjective Particle Swarm Optimization

As is well known, it is difficult to solve a constrained multiobjective problem using traditional optimization algorithms. Hence, researchers or engineers often turn to intelligent approaches to solve such multiobjective optimization problems. The mathematical model proposed in this paper for massive data storage in cloud environment is an obviously constrained multiobjective optimization problem.

Swarm or population intelligence inspired optimization techniques are becoming more and more popular. They simulate the evolution process of the nature, or the social behavior of insects swarms, flocks of birds, or schools of fish. The advantages of these optimization approaches over traditional techniques are their robustness and flexibility. These properties make swarm intelligence a successful design paradigm for algorithms that deal with increasingly complex problems. Thus, we employ a swarm intelligence optimization algorithm to solve the above model in this paper.

4.1. Brief Introduction of PSO

Particle swarm optimization (PSO), derived from the research of birds predation [3336], is essentially an evolutionary computation technique, which optimizes the problems by iteratively improving the candidate solutions dubbed particles according to one or several given quality measurements. When solving optimization problems, PSO moves the particles around in the decision space by utilizing simple mathematical formulae over the position and velocity of the particles. The movement of each particle is under direct control of its local best known position and the global best position found so far by the population [37]. It is expected to move the particles toward the best solutions. Algorithm 1 represents a simple PSO algorithm [38].

(1) Randomly generate an initial swarm
(2) while termination criterion is not satisfied do
(3)     for each particle   do
(4)    if the particle is better than   then
(5)       position of
(6)    end if
(7)    
(8)    Update velocity of particle
(9)    Update position of particle
(10)   end for
(11) end while
(12) Return the results

Similar to the genetic algorithm (GA), PSO algorithms need very little additional mathematical information of the problems to be solved, start the search process from randomly generated initial population, use fitness mechanisms to evaluate the particles, do stochastic search according to evaluation value of the particle, and so on. PSO and GAs do not use the gradient information of the problem being optimized, so that they can solve a wider range of optimization problems than the classic gradient-based optimization methods.

Unlike GAs, the PSO algorithm has different information sharing mechanism. GAs share information among the chromosomes, which makes the whole population approach optimal regions uniformly. But in PSO algorithms, there are no crossover and mutation operators, and the information is provided to other particles via the current global best particles. Therefore, such one-way information sharing makes the update processes follow the current best solutions, which might help the swarm converge to the optimal solutions fast. Compared to GAs, PSO algorithms have fewer parameters and are easier to be implemented. PSO has found a wide range of applications in function optimization, neural network, fuzzy systems, and many other fields [3941]. An extensive survey of PSO applications has been made by Poli and Engelbrecht [42, 43]. In [36, 41], several variations of PSO algorithms for multiobjective optimization problems are introduced and their convergence is also analyzed. Also, the convergence of PSO algorithm had been studied empirically in [43, 44]. Thus, according to the guidance and analysis in [36, 41], we construct a simple PSO algorithm to solve the proposed storage model.

4.2. PSO for the Proposed Model
4.2.1. Encoding

One of the most important issues in designing a PSO algorithm to solve the proposed model is to determine reasonable encoding of the solutions. The particles in the swarm of the PSO algorithm is a candidate storage strategy. In this paper, we use a matrix to express a possible storage strategy, in whichwhere .

However, using matrices to express the particles in the swarm often makes the evolution process difficult to implement. As is well known, the matrix can be converted to a -vector , in which . Therefore, we use the -dimensional vector to take the place of the matrix for coding the candidate storage strategies in the PSO algorithm. Another issue is to decode the results obtained by the PSO algorithm to be readable storage strategies. For , can be decoded as follows:where and . Here, is the function that its value is the smallest integer larger than or equal to .

4.2.2. Evolution and Selection

The evolution process in the PSO algorithm is essentially to update the individuals in population. This process is controlled by the velocity of each particle in PSO, which is updated iteratively by its coordinates in the search space associated with the best solution it has achieved so far and the best position found by its neighbors. Let be the best position of a particle, the best position of the neighbors of the particle, and the best position of all particles in the population. At each generation, the particle swarm optimization changes the velocity of each particle toward its and locations. The acceleration toward and locations is generated randomly and separately.

Let be the position of current particle in the designed PSO algorithm. Let be the best position that has found, the global best position of the population, and the velocity. The particle in the th generation is updated as follows [33]: where the parameters and are two separate random functions according to the generation returning a random value uniformly generated in , respectively. are the acceleration coefficients. is a coefficient inertia weight set to be time-varying and gradually decreasing typically from 0.9 to 0.4 [45].

It has been studied in many literatures [46, 47] that elitism has advantages in achieving better convergence in MOEAs. Hence, during the evolution process of the designed PSO algorithm, a tournament based selection strategy is employed to select particle to be updated. So as to ensure the convergence of the designed PSO algorithm, we employ the elitism strategy to select the next population from the offspring and parents and follow the guidance in [36, 41, 43].

4.2.3. Constraint Handling

When solving the constrained multiobjective model proposed in this paper, infeasible solutions will be produced inevitably. Infeasible solutions have so significant impacts on the performance of the solution PSO algorithm that they often result in failure of finding feasible solution of the optimization problem. It is necessary to handle the constraints reasonably to improve the performance of the solution PSO algorithm. Therefore, the constraint handling method in [46] is employed in the solution of PSO algorithm in this paper.

Suppose and are two given storage strategies. is called to be better than , if and only if one of the following conditions is satisfied:(i)Both and are feasible, and the strategy is with better objective value.(ii)Both and are infeasible, and the strategy is with smaller constraint violation value.(iii) is feasible and is infeasible.

According to these comparison rules and Pareto domination, the individuals in the population can be sorted.

4.2.4. Designed PSO Algorithm

According to the analysis on the proposed multiobjective reliable cloud storage model and the above operators, we design a constrained multiobjective PSO algorithm (CMPSO) as Algorithm 2.

(1) Randomly generate an initial swarm
(2) Initialize the parameters, and for each particle in
(3) ,
(4) while termination criterion is not satisfied do
(5)   
(6)   
(7)     non_dominate()
(8)   for each particle in   do
(9)       for each particle in archive do
(10)       if   dominates   then
(11)          
(12)       end if
(13)       
(14)       Update velocity of using Eq. (19)
(15)       Update position of using Eq. (20)
(16)       Put the new position of in
(17)       end for
(18)   end for
(19)   Select from
(20) end while
(21) return the nondominated solutions

5. Experiments

5.1. Experimental Settings
5.1.1. Experiment Design

The experiments are designed based on the storage prices of two cloud companies, which have different storage prices for 3 different places of company A. The detailed price can be found in Tables 13. When storing high security level data files, special storage devices or related encryption algorithm will be needed to process these data files. Thus, it will cost much CPU time and high level storage disk usage. It is assumed that there are 4 types of storage disks in each datacenter and the storage price is as listed in Tables 1 and 4. Table 1 lists the storage price of normal disks and Table 4 the storage price of high level disks.

In the experiments, the total size of the set of data files is set to be 120 TB, which is composed of 10 subfiles with different size and different security level. To store these data files, we conduct three storage strategies: the integral storage strategy, separate storage strategy, and splittable storage strategy. For the integral storage strategy, all the data files in a storage task are treated as an integral file to be stored. For the separate storage strategy, each subfile can be stored separately in which the subfile cannot be split. In the splittable storage strategy, the data files can be partitioned into several smaller files using the file splitters when needed. These three storage strategies are used to confirm the effectiveness of the proposed model and further investigate the influences of different storage strategies on the storage cost and reliability.

5.1.2. Parameter Settings of the Designed PSO Algorithm

According to the model in this paper, the other parameters in the PSO algorithm are adopted as follows:(i)Population size: (ii)Acceleration coefficients: and , where is current generation and is the maximum generation(iii)Selection strategy: the next population is selected using elitism strategy according to the fast nondominated sorting approach [46](iv)Stopping criterion: the algorithm will be stopped when the evolution generation reaches 2000 or the solutions found by the algorithm do not change in 30 continuous generations.

5.2. Experimental Results

In the experiment of the integral storage, the PSO algorithm is employed to solve the proposed multiobjective model, in which all the test data files are treated as an integral file. The feasible storage solutions found by the PSO algorithm are as shown in Figure 2 and Table 5. Under this storage strategy, all data files were stored in the same disk of a server. The total cost has direct relationship with the storage reliability and the data security level. In order to satisfy the constraint , all the data files should be stored in the HD which can fit the highest security level among all the data files. It can be seen vividly from Figure 2 that the total cost of storing the test data files is increasing rapidly with the storage reliability. Practically, the higher the storage reliability is, the more it will cost. The storage cost is proportional to the highest level of the data files, which has also direct impacts on the storage reliability. It is because the suitable target servers should be explored when storing these files into certain disk and the transformation process of these data files will significantly reduce the storage reliability. What is more, if the suitable target server cannot be found, this storage strategy will not be able to be carried out. Therefore, it is unpractical and unreasonable to store all the data files as an integral file.

Another storage strategy is that each subfile is treated to be inseparable and can be stored independently. The data files submitted by the client might be stored in different servers or places. Hence, data files with different security level can be handled specifically, so that the cost can be saved. Also, it is easy to find feasible storage solutions with higher utilization rates of storage spaces. The experimental results are as shown in Figure 3 and Table 5. Using the proposed storage model to store these test data files, one can find feasible storage solutions easily. From Figure 3, it can be seen that when the storage reliability is less than 0.95, the total cost grows smoothly. When the storage reliability is more than 0.95, the total cost will increase rapidly. Compared with the integral storage strategy, it can be found from Table 5 that the cost of the separable storage strategy with each subfile to be inseparable is much lower than that of the integral storage strategy. These results indicate that the separable storage strategy is much practical and the proposed multiobjective model is correct and effective.

As there are many file splitting techniques, each data file can be partitioned into smaller data files by using these techniques while being stored. In the further experiment of the separable storage strategy, each data file is treated to be divisible. When storing these files, each data file will be partitioned according to the available storage space in servers. While storing, each subfile will be partitioned into several parts with the same security level when needed. Therefore, the data file can be divided for any usable space during the storing process, which can make the solution algorithm find feasible solution much easier. This way can also improve the space utilization rates effectively. The experimental results can be found in Table 5 and Figure 4. Comparing the results shown in Figures 3 and 4, one can find vividly that, with the storage reliability growing, the total cost in Figure 4 grows more slowly. Also, in this experiment, the PSO algorithm can find much more storage solutions. As shown in Figure 4, the solutions obtained by the PSO algorithm are uniformly distributed, which indicates that many more varied storage schemes can be obtained with high probability coupled with the splittable storage strategy. From Table 5, it can be seen that the storage strategy with divisible subfiles can reduce the storage cost efficiently.

Comparing the results listed in Table 5, we can find that, to achieve given storage reliability, splittable storage strategy costs the least among the 3 storage strategies conducted in the experiments. Figure 5 intuitively shows the differences among these three storage strategies. Furthermore, the results of these experiments can demonstrate the validity and correctness of the proposed multiobjective data storage model, which can also indicate that the PSO algorithm can solve the proposed multiobjective model effectively.

6. Conclusions

Data has become a kind of important resources, which has increasingly significant effects on the Internet companies and telecommunication enterprises. The volume of the data resources is growing so rapidly that their storage is challenging the quality of service of the storage systems significantly. Based on the analysis of data storage in the cloud environment, we built a multiobjective optimization model considering storage cost and reliability simultaneously. To store a set of data files, the total cost is composed of storage cost, communication cost, and migration cost. The device stability and transmission reliability are considered in the evaluation of storage reliability. To solve the proposed multiobjective optimization model and find feasible storage solutions, a PSO algorithm is tailor-made. Different experiments are made to validate the correctness of the proposed model. In the experiments, the data files to be stored are handled in different ways in advance. Experimental results confirm that the proposed model is positive and effective for cloud storage.

Furthermore, an obvious phenomenon that emerged during the experiments is that how to preprocess the data files has significant effect on the experimental results or storage strategies obtained by the PSO algorithm. If the data files submitted by the clients are treated as an integral file, all the data files need to be stored in a server which has enough available storage space with the reliability meeting the demand of the highest security level among the data files. Hence, it costs much in searching the servers with sufficient available storage space of certain reliability. However, when the available storage fragments in the cloud storage systems were too small and scattered, it might lead to failure in finding a suitable fragment to store all these data files. These data files preprocessing strategy will lead to low utilization ratio of the storage space. The other two preprocessing strategies can effectively raise the storage space utilization, because each data file can find suitable storage fragment easily. Therefore, there will be more possible storage schemes and the best scheme can be found by using the model and PSO algorithm proposed in this paper.

The main goal of the model proposed in this paper is to find optimal storage scheme for storing the submitted data files. From the experiments, it can be seen vividly that the cost and reliability of the storage scheme for the submitted data files strongly depend on the preprocessing strategies of the data files, which also impacts the utilization efficiency of the storage space. The proposed model in this paper should be coupled with reasonable data file preprocessing method, so that it can improve the quality of service and the resource utilization efficiency of the cloud storage system effectively.

Nomenclature

Number of severs
Number of files to be stored
:th file to be stored
:Security level of
:Available storage space of th HD in
Storage reliability of th HD
:Communication price between and
Sign function, where
:Storage cost of
:Migration cost of from to
:Communication cost of storing
Total cost of storage service cost for
:Equipment stability of link
:Storage reliability of
:th server
:Distance between and
Set of files to be stored
: is stored in th HD of
Number of the types of HDs in a datacenter
:Reliability of
:Transmission reliability between and
Candidate storage strategy
:Data volume of stored in
:Storage cost of storing
:Migration cost in storing
:Total communication cost of storing
:Transmission reliability of link
:Reliability of software in
Reliability of storing .

Disclosure

This paper is an extended version of our conference paper [48]. The extension includes the following: (1) we summarize the related work on multiobjective reliable storage; (2) we update the proposed multiobjective reliable storage optimization model with more precise objective function; (3) we analyze the evaluation of storage reliability in detail; (4) a tailor-made PSO algorithm is designed in this paper; (5) we conduct new experiment with new storage strategy to evaluate the correctness and effectiveness of our model.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61502371 and 61472311), the Fundamental Research Funds for the Central Universities (BDZ011401, JB151005, and JB161003), the Novel Technology Research of Universities Cooperation Project sponsored by the State Key Laboratory of Satellite Navigation System and Equipment Technology (KX152600027), Fundamental Research Project sponsored by State Administration of Science, Technology and Industry for National Defense (JCKY2016110C006), and the Basic Research Program of Sichuan Province of China (2017JY0209).