Research Article | Open Access
Two General Extension Algorithms of Latin Hypercube Sampling
For reserving original sampling points to reduce the simulation runs, two general extension algorithms of Latin Hypercube Sampling (LHS) are proposed. The extension algorithms start with an original LHS of size and construct a new LHS of size that contains the original points as many as possible. In order to get a strict LHS of larger size, some original points might be deleted. The relationship of original sampling points in the new LHS structure is shown by a simple undirected acyclic graph. The basic general extension algorithm is proposed to reserve the most original points, but it costs too much time. Therefore, a general extension algorithm based on greedy algorithm is proposed to reduce the extension time, which cannot guarantee to contain the most original points. These algorithms are illustrated by an example and applied to evaluating the sample means to demonstrate the effectiveness.
Latin Hypercube Sampling (LHS) is one of the most popular sampling approaches, which is widely used in the fields of simulation experiment design , uncertainty analysis , adaptive metamodeling , reliability analysis , and probabilistic load flow analysis . Compared with other random or stratified sampling algorithms, LHS has a better space filling effect, better robustness, and better convergence character. The extension of LHS is to obtain a LHS of a larger size that reserves the preexisting LHS (or the original LHS). There are at least two situations that need the extension of sampling, especially for time consuming simulation systems. One is sequential sampling for sequential analysis, adaptive metamodeling, and so on. The other is to consider the extension of LHS when the original LHS was subsequently determined to be too small and a new LHS of a larger size without original sampling points might be time consuming. But the LHS structure makes it difficult to increase the size based on an original LHS while simultaneously keeping the stratification properties of LHS.
A special extension case is the integral-multiple extension where the new LHS is integral times the size of the original sampling. Tong  proposed integral-multiple extension algorithms for stratified sampling methods including LHS. Sallaberry et al.  gave a two-multiple extension algorithm of LHS with correlated variables. Later, two related techniques appeared in the papers named “nested Latin hypercube design”  and “nested orthogonal array-based Latin hypercube design” . A nested Latin hypercube design with two layers is defined to be a Latin hypercube design that contains a smaller Latin hypercube design as a subset. A special integral-multiple extension method called -extended LHS method was illustrated, where the new LHS contains smaller LHSs . Some related papers were produced by Vorechovsky [11–13], where the new sampling size is multiple times more than the original sampling size. The integral-multiple extension algorithms have a good feature that can obtain a strict LHS of larger size and simultaneously preserve all the original sampling points.
In this study, we consider the general extension algorithm of LHS where the new sample size is more controllable and the algorithm can be applied more widely. Wang  and Blatman and Sudret  obtained an approximate LHS of a larger size, which might have two or more original points falling into the same variable interval. Wei  also proposed a general extension algorithm to get an approximate LHS, which might have no point falling into a variable interval. However, a sample is a LHS if (and only if) there is only one point in each variable interval. The approximate LHS does not satisfy the definition and is harmful to the extension with some criteria, such as correlated variables, maximizing minimum distance, orthogonal array, and so on.
In this paper, we would like to obtain a strict extension of LHS (ELHS) rather than an approximate one and the new LHS contains original sampling points as many as possible. The extension algorithm includes two parts: the reservation of original sampling points and the generation of new ones. As the generation of new sampling points is almost the same as integral-multiple extension, the reservation of original sampling points is the main problem to discuss. The relationship of original sampling points in new LHS is expressed by a graph. Then, the reservation problem can be solved by graph theory.
In Section 2, the procedure of LHS and the mathematical description for extension problem of LHS are given, which is the basis of extension algorithm. In Section 3, a basic general extension algorithm of LHS (BGELHS) is proposed based on the graph theory, which reserves the most original sampling points. A general extension algorithm of LHS based on greedy algorithm (GGELHS) is intended to balance the simulation runs and the generation time of ELHS. In Section 4, the proposed extension algorithms are illustrated by an application, which is performed to evaluate sample means. Finally, the conclusion and some thoughts for the future research are given.
2. Latin Hypercube Sampling and Extension Problem
2.1. Procedure of LHS
Suppose that the input variables are and the range of is , . Then, LHS can be obtained as follows, which can ensure that each input variable has all portions among its range.(a)Divide the range into equiprobable intervals , . So the intervals satisfy , , and , where .(b)For the th interval of variable , the cumulative probability can be obtained aswhere is a uniform random number ranging from 0 to 1. So all the probability values can be noted as .(c)Transform the probability into the sample value by the inverse of the cumulative distribution function :Then, the sample matrix is (d)The values of each variable are paired randomly or in some prescribed order with the values of the other variables. Then the sample matrix of LHS can be written aswhere each row is a sampling point.
Figure 1 shows an example of LHS in size of 5, where each interval of each input variable has one sampling point.
2.2. Extension Problem of LHS
In this paper, the extension problem of LHS has two purposes. One is to construct a strict LHS of a larger size than original LHS. The other is to reserve the most original sampling points in the new LHS. Assume that there is an original LHS and the sample set is , where is a sampling point, . The objective of extension is to get ELHS of size noted as , where and is an integer. satisfies the constraint condition as shown inwhere denotes the number of elements in set. It means that the objective is to obtain ELHS in size of that contains the original LHS points as many as possible. Figure 2 shows the original LHS points in Figure 1 and the structure of ELHS in size of 7, which can be seen that the original sampling points do not accord with the new structure. So we need to delete original sampling points as little as possible and then create some new sampling points. The reserved points and the new ones are ELHS in size of 7.
3. General Extension Algorithms of LHS
The general extension problem can be described as (5). In this section, two general extension algorithms of LHS are demonstrated to get ELHS. The problem of selecting reserved points is transformed into a problem to get the maximum independent set, which is a graph theory problem. The generation procedure is based on a subtraction rule . These two stages construct BGELHS in Section 3.1. As BGELHS costs too much time to obtain ELHS, GGELHS is proposed in Section 3.2.
3.1. Basic General Extension Algorithm of LHS
Some original sampling points might be deleted before generating new sample. At the beginning of BGELHS, the original sample is transformed into a graph. Here the definition of simple graph is introduced.
Definition A (see ). A simple graph is a graph where there is no loop or multiple edges, where is a vertex set and is an edge set. No loop means that there is no edge from a vertex to itself. No multiple edges means that there is 0 or 1 edge between two vertices.
Let the original sample matrix be and note the sampling points as by the row orders. Transform the sampling points into vertices of a graph as follows: Every point is a vertex and links two vertices by a line if the corresponding two sampling points are in the same interval of the ELHS structure. Then we get a simple undirected graph. Order the sampling points in Figure 2 from left to right and the corresponding simple undirected graph is obtained as shown in Figure 3. The original problem in (5) is to delete the least vertices and their corresponding lines so that the remaining vertices have no line to link, which can be proved by reduction. Independent set, maximal independent set, and maximum independent set are introduced to solve this problem.
Definition B (see ). It is supposed that there is a simple graph where is the set of vertices and is the set of edges. If a set of vertices and there is no edge between any two vertices of , the set is called an independent set.
Definition C (see ). It is supposed that there is a simple graph where is the set of vertices and is the set of edges. and are independent sets. If and , is not an independent set. Then, the set is the maximal independent set. If , . Then the set is the maximum independent set. The number of elements in is the independent number of graph noted as or .
From the definitions, reserving the most original sampling points can be transformed into computing the maximum independent set; that is, delete the least vertices so that there is no edge between any two vertices. In graph theory, the simple graph can be expressed as an adjacent matrix , which shows the lines between vertices. If there is a line between vertex and vertex , . Otherwise . Obviously, the properties are given as follows:(a)symmetrical characteristic ;(b), ;(c)if , the vertex set is an independent set.So the reservation problem is transformed into a problem to get the maximum independent set; that is, delete the least vertices so that the corresponding adjacent matrix of remaining vertices is a zero matrix. Suppose that there is a LHS of size , the sample matrix is , the order matrix is ( is the interval order number of in the th variable), the cumulative probability matrix is ( is the cumulative probability of , which is the in (1)), and the random matrix is ( is the random value of , which is the in (1)). The objective is to get ELHS of size noted as . The detailed steps of BGELHS algorithm are as follows.
Step 1. Compute the positions of original sampling points in ELHS structure.(a)Transform into the new sample structure and get the order matrix as shown in the following:where , , and is the floor function ( is the largest integer not greater than ).(b)Transform into the new sample structure and get the random matrix as shown in the following:where , .Step 2. Reserve the most original sampling points, noted as the sample matrix .(a)Compute the adjacent matrix of the original LHS based on as shown in the following:(b)Choose sampling points of original LHS to delete, . Then we can get a combination matrix , where , is a combination, and the number of elements in is not more than that in , (e.g., let and , , where means there is no element). Let and delete the point(s) corresponding with the element(s) of ; then judge whether the remaining adjacent matrix is equal to . If , record the order number of deleted point(s) as and the dimension number of is noted as , that is, the number of remaining points. Otherwise, set , delete the points corresponding with the elements of , and repeat the Step 2-(b).(c)Delete the rows of , , , and according to the element of . The remaining matrices are noted as , , , and .Step 3. Generate the new sample matrix .(a)Generate a new order matrix of size and every column of is a random permutation of the integer number from 1 to .(b)For every column, delete the element that is the same as the corresponding column in and get the matrix .(c)Generate a random matrix where is a uniform random number ranging from 0 to 1.(d)Compute the cumulative probability matrix as shown in the following:(e)Compute the sample matrix as shown in (2).Step 4. Get the order matrix , the random matrix , the cumulative probability matrix , and the sample matrix of ELHS:As taking (7) into (1) as (11), we can prove that the matrix is a sample matrix of LHS. Equation (11) shows that the remaining original sampling points meet the basic principle of LHS. As the new sampling points have the same feature, is a strict LHS:
Example 1. BGELHS is illustrated by the original LHS in Figure 1. Let have a uniform distribution on and let have a triangular distribution on with mode at 7.5. The distribution function isThe inverse of the distribution function isThe sample matrix and related matrices areThen, the objective is to get ELHS of size 7, which contains the original sampling points as many as possible. So , .
From Step 1, the distribution of original sampling points in the new sample isFrom Step 2-(a), the adjacent matrix is computed:From Step 2-(b), all the combinations are noted as :Delete the point(s) corresponding with the element(s) of , . For the fifth iteration, and . Delete the fourth point (i.e., delete the fourth row and fourth column of ) and the new adjacent matrix is . Then, and . From Step 2-(c), the addition sample is computed:From Step 3, an order matrix is original:ThenFinally, the ELHS of size 7 is obtained, which is shown in Figure 4:
3.2. General Extension Algorithm of LHS Based on Greedy Algorithm
In graph theory, obtaining the maximum independent set is a NP-hard problem. Step 2-(b) of BGELHS needs the most time and the time complexity of BGELHS is . It means that, with the increase of the number of sampling points, the time of BGELHS will be increased exponential. In this section, GGELHS is proposed to reduce the time of extension algorithm. First of all, the definition of degree is introduced.
Definition D (see ). In a simple graph , the number of edges associated with the vertex is the degree of noted as .
indicates the number of vertices linked with the vertex . The basic idea of greedy algorithm is to delete a point one time whose is the maximum until of any remaining vertex is 0. The steps of GGELHS are the same as those of BGELHS except for Step 2-(b). The details of Step 2-(b) are given as follows.
Step 2-(b): compute the degree of every vertex , . Delete the vertex whose degree is the maximum. Delete the corresponding row and column in the adjacent matrix ( at the beginning) and note the remaining matrix as . Record the order of in . If , , and repeat Step 2-(b) to reconstruct . Otherwise, the element of is the order of points to delete. The dimension of is noted as and the order of points to remain is recorded in .
In Step 2-(a) of GGELHS, when computing the adjacent matrix , the time required is . It is easy to know that the time complexities of other steps are not greater than . So the time complexity of GGELHS is , which is much less than BGELHS. But this algorithm gets the maximal independent set rather than the maximum independent set; that is, this algorithm cannot guarantee to reserve the most original sampling points.
Example 2. To compare BGELHS and GGELHS, we consider the same question of Example 1. Since GGELHS is almost the same as BGELHS except for Step 2-(b), we mainly give Step 2-(b) of GGELHS.
From Step 2-(b), the degrees of five points are . The fourth point has the maximum degree. So delete the fourth point, and the adjacent matrix is . So, , .
BGELHS run five times in Step 2-(b) and GGELHS run only one time. As computing the degree of original sampling points costs a little, GGELHS is less time consuming and more practical. From the result above, we can know that Step 2-(b) gets the maximum independent set in this example, which means that this algorithm deletes the least original sampling points. But the greedy algorithm cannot guarantee to have the same effect for every circumstance.
BGELHS and GGELHS can be applied for many other applications, such as adaptive metamodel building and sequential experiments design and analysis. In Section 4, pick up two test examples to evaluate the sample means . The functions are shown in (22) where , , and obeys uniform distribution:
Four algorithms are applied including Monte Carlo Sample (MCS), classical LHS (CLHS), BGELHS, and GGELHS. Set the maximum sample size noted as . The steps are as follows.(a)Pick samples (MCS or LHS) of size .(b)Compute the output of samples, , .(c)Compute the average of every sample outputs as shown in the following:(d)Compute the standard deviation of as shown in the following:where is the mean of , .(e)Let . If , stop. Otherwise, generate sample by the five algorithms and go to step b.
In this application, , , , and for MCS, CLHS, BGELHS, and GGELHS. Figure 5 is the result of these four algorithms. It can be concluded that the convergence rates of two extension algorithms are the same as CLHS and better than MCS. Therefore, the extension algorithms reserve the good convergence of CLHS.
Compared with CLHS, an outstanding advantage of extension algorithms is the less number of runs. Figure 6 shows the number of function runs in this application. The run number of CLHS is . BGELHS saves 3302 runs for function 1 and saves 3325 runs for function 2. GGELHS saves 3263 runs for function 1 and saves 3259 runs for function 2. Therefore, BGELHS and GGELHS save more than half of the runs compared with CLHS.
The total times of these extension algorithms are shown in Figure 7. The times of MCS and CLHS are almost the same. The time of GGELHS is one and two orders of magnitude slower than that of MCS and CLHS because of Step 2. BGELHS cost the most time, which is a serious problem when applied. GGELHS is easy to apply in practice.
A drawback to LHS is the extension problem and this paper mainly discusses how to obtain a new strict LHS that contains the original sampling points as many as possible. There are three problems to solve including how to express the relationship of original sampling points in the ELHS structure, how to reserve the most original sampling points and how to generate the new sampling points. In this paper, BGELHS and GGELHS are proposed to solve these problems. BGELHS can reserve the most original points in theory. However, it is difficult to apply in practice because of the problem of time consuming. GGELHS cannot guarantee to reserve the most original points, but the time complexity is much less than that of BGELHS. Therefore, BGELHS is suitable for the analysis of time consuming system when the running time of system is much more valuable than the generation time of ELHS. GGELHS can be applied for more situations. A simple example and application for the evaluation of sample means show the effectiveness of these algorithms.
This paper illustrates the relationship of ELHS and graph theory. Some conclusions of graph theory can be applied for the general extension of LHS directly, which is the main contribution of this paper. The relationship of original sampling points in ELHS is expressed in the form of adjacent matrix, but the reverse does not hold. The reason is that some information of original sampling points is lost including the dimensions of variables, the value of sampling points, and so on. Therefore, we can improve BGELHS based on the lost information and some conclusions of graph theory, which is under investigation. On the other hand, LHS is often applied with some criteria for a better distribute or space-filling, such as correlated variables, maximizing minimum distance, and minimizing discrepancy. The general extension of LHS with these criteria is also the future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors would like to thank the editor and two referees for helpful comments. This research is supported by the National Natural Science Foundation of China (Grant no. 61403097) and the “Fundamental Research Funds for the Central Universities” (Grant no. HIT. NSRIF. 2015035).
- B. G. M. Husslage, G. Rennen, E. R. van Dam, and D. den Hertog, “Space-filling Latin hypercube designs for computer experiments,” Optimization and Engineering, vol. 12, no. 4, pp. 611–630, 2011.
- J. C. Helton, F. J. Davis, and J. D. Johnson, “A comparison of uncertainty and sensitivity analysis results obtained with random and Latin hypercube sampling,” Reliability Engineering and System Safety, vol. 89, no. 3, pp. 305–330, 2005.
- G. G. Wang and S. Shan, “Review of metamodeling techniques in support of engineering design optimization,” Journal of Mechanical Design, vol. 129, no. 4, pp. 370–380, 2007.
- A. Olsson, G. Sandberg, and O. Dahlblom, “On Latin hypercube sampling for structural reliability analysis,” Structural Safety, vol. 25, no. 1, pp. 47–68, 2003.
- H. Yu, C. Chung, K. Wong, and J. Zhang, “A probabilistic load flow calculation method with latin hypercube sampling,” Automation of Electric Power Systems, vol. 33, no. 21, pp. 32–81, 2009.
- C. Tong, “Refinement strategies for stratified sampling methods,” Reliability Engineering and System Safety, vol. 91, no. 10-11, pp. 1257–1265, 2006.
- C. J. Sallaberry, J. C. Helton, and S. C. Hora, “Extension of Latin hypercube samples with correlated variables,” Reliability Engineering and System Safety, vol. 93, no. 7, pp. 1047–1059, 2008.
- P. Z. G. Qian, “Nested Latin hypercube designs,” Biometrika, vol. 96, no. 4, pp. 957–970, 2009.
- X. He and P. Z. Qian, “Nested orthogonal array-based Latin hypercube designs,” Biometrika, vol. 98, no. 3, pp. 721–731, 2011.
- D. Williamson, “Exploratory ensemble designs for environmental models using k-extended Latin Hypercubes,” Environmetrics, vol. 26, no. 4, pp. 268–283, 2015.
- M. Vorechovsky, “Hierarchical subset latin hypercube sampling,” in Pravdepodobnost porusovani konstrukci (PPK ’06), pp. 285–298, Brno, Czech Republic, 2006.
- M. Vorechovsky, “Extension of sample size in Latin hypercube sampling–methodology and software,” in Proceedings of the 3rd Symposium on Life-Cycle and Sustainability of Civil Infrastructure Systems, pp. 2403–2410, Vienna, Austria, 2012.
- M. Vorechovsky, “Hierarchical refinement of Latin hypercube samples,” Computer-Aided Civil and Infrastructure Engineering, vol. 30, no. 5, pp. 394–411, 2015.
- G. G. Wang, “Adaptive response surface method using inherited Latin hypercube design points,” Transactions of the ASME—Journal of Mechanical Design, vol. 125, no. 2, pp. 210–220, 2003.
- G. Blatman and B. Sudret, “An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis,” Probabilistic Engineering Mechanics, vol. 25, no. 2, pp. 183–197, 2010.
- X. Wei, Research of global optimization algorithm based on metamodel [Doctoral Dissertation], Huazhong University of Science and Technology, Wuhan, China, 2012.
- D. B. West, Introduction to Graph Theory, China Machine Press, 2006.
Copyright © 2015 Zhi-zhao Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.