Abstract

Covering perfect hash families (CPHFs) are combinatorial designs that represent certain covering arrays in a compact way. In previous works, CPHFs have been constructed using backtracking, tabu search, and greedy algorithms. Backtracking is convenient for small CPHFs, greedy algorithms are appropriate for large CPHFs, and metaheuristic algorithms provide a balance between execution time and quality of solution for small and medium-size CPHFs. This work explores the construction of CPHFs by means of a simulated annealing algorithm. The neighborhood function of this algorithm is composed of three perturbation operators which together provide exploration and exploitation capabilities to the algorithm. As main computational results we have the generation of 64 CPHFs whose derived covering arrays improve the best-known ones. In addition, we use the simulated annealing algorithm to construct quasi-CPHFs from which quasi covering arrays are derived that are then completed and postoptimized; in this case the number of new covering arrays is 183. Together, the 247 new covering arrays improved the upper bound of 683 covering array numbers.

1. Introduction

A covering array CA(N; t, k, v) of strength t is an N × k array over a symbol set of v elements such that in every N × t subarray all v^t t-tuples over the symbols occur at least once. If each t-tuple over the symbols occurs exactly once in every N × t subarray, then the array is an orthogonal array of strength t, denoted by OA(t, k, v). Every orthogonal array is a covering array, but not every covering array is an orthogonal array.
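To make the definition concrete, the following small Python sketch checks by brute force whether an array is a covering array of strength t; the function name and the example array are illustrative and not part of the paper.

from itertools import combinations, product

def is_covering_array(A, t, v):
    # brute force: every choice of t columns must cover all v^t t-tuples
    k = len(A[0])
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in A}
        if len(seen) < v ** t:
            return False
    return True

# The 8 binary 3-tuples form an orthogonal array of strength 3, hence also a covering array.
A = [list(p) for p in product(range(2), repeat=3)]
print(is_covering_array(A, t=3, v=2))   # True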

The main applications of covering arrays are in software and hardware testing. In the testing strategy called combinatorial testing, covering arrays are viewed as test suites where the strength t is the size of the interactions to be checked in a software component. For example, a covering array of strength three can be used to test all possible combinations of values among any three of the parameters; see [1] for an introduction to the use of covering arrays in combinatorial testing. In the area of hardware testing, covering arrays have been used to check the presence of hardware Trojans [2].

Given the values of the strength t, the number of columns k, and the order v, the problem of constructing covering arrays is the problem of finding the minimum number of rows N such that a CA(N; t, k, v) exists. Currently, there is no polynomial-time algorithm to solve this problem optimally for general values of t, k, and v; only some particular cases have been solved optimally [3–6].

The smallest N for which a covering array exists with strength t, k columns, and order v is called the covering array number of t, k, v, and it is denoted by CAN(t, k, v). A trivial lower bound for the covering array number is v^t, since every subarray of t columns must cover every one of the v^t possible t-tuples over the v symbols. In addition to the cases with optimal solution, a number of improvements in the lower bounds for specific cases have been reported in [7–9].

With regard to upper bounds, much work has been done to constantly improve them, and several algorithms have been developed to construct increasingly better covering arrays. To keep track of the advances in the construction of covering arrays, the Covering Array Tables [10] contain the current best-known upper bounds for strengths 2 ≤ t ≤ 6 and orders up to v = 25.

The techniques used to construct covering arrays can be classified as exact, greedy, metaheuristic, algebraic, and recursive; see [11–14] for an overview of covering array construction methods. Metaheuristic algorithms to construct covering arrays include tabu search [15], genetic algorithms [16], simulated annealing [17], ant colony [18], particle swarm optimization [19], harmony search [20], hill climbing [21], bird swarm [22], a combination of bee colony and harmony search [23], Cuckoo search [24], differential evolution [25], tabu search as a hyperheuristic [26], and a combination of simulated annealing with a greedy algorithm [27].

One way to construct covering arrays is to use a high-level representation of them called a covering perfect hash family (CPHF). The elements of a CPHF are t-tuples or (t−1)-tuples over the finite field F_v that represent a column vector of length v^t. CPHFs are a compact representation of covering arrays because a CPHF with n rows and k columns represents a covering array with on the order of n·v^t rows and k columns. Commonly CPHFs have few rows, but their corresponding covering arrays may have a large number of rows depending on the values of v and t.

In this work we develop a simulated annealing (SA) algorithm to construct CPHFs. The neighborhood function of the SA algorithm is composed of three perturbation operators which together provide exploration and exploitation capabilities to the SA algorithm. The probability of using each of the three operators in an application of the neighborhood function is tuned by experimenting with a number of probability distributions. Relevant results obtained with the SA algorithm are 64 new CPHFs whose respective covering arrays improved the best-known ones and another 183 new covering arrays obtained by means of a procedure based on constructing quasi-CPHFs. In total, these new covering arrays improve the upper bound of 683 covering array numbers CAN(t, k, v).

This paper is organized as follows: Section 2 provides a background on CPHFs; Section 3 reviews some related works; Section 4 describes the components of the SA algorithm and shows the experimentation done to set the utilization rate of the operators in the neighborhood function; Section 5 presents the computational results; and Section 6 gives the conclusions of the work.

2. Covering Perfect Hash Families

CPHFs were introduced by Sherwood et al. [28]. Let v be a prime power and let F_v = {0, 1, ..., v−1} denote the finite field with v elements. For a given strength t, let (d_1, d_2, ..., d_t) be the base-v representation of an integer c with 0 ≤ c ≤ v^t − 1; that is, c = d_1·v^(t−1) + d_2·v^(t−2) + ... + d_t. As defined in [28], a permutation vector given by a (t−1)-tuple (h_1, h_2, ..., h_{t−1}) over F_v is the vector of length v^t that has the symbol h_1·d_1 + h_2·d_2 + ... + h_{t−1}·d_{t−1} + d_t in position c, for 0 ≤ c ≤ v^t − 1.

Permutation vectors are vectors of length v^t with elements from F_v, but they are represented by (t−1)-tuples over F_v. For fixed v and t the number of distinct permutation vectors is v^{t−1}. The term “permutation vector” is used because the vectors of length v^t are formed by concatenating v^{t−1} permutations of the symbols of F_v. For example, Table 1 shows the distinct permutation vectors for a small case of v and t. Another characteristic of permutation vectors is that their first v elements are 0, 1, ..., v−1, in that order.
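As an illustration of this definition (a sketch that assumes a prime v, so that arithmetic modulo v coincides with the field arithmetic; for prime powers the sums and products would have to be replaced by finite-field operations), the following code builds a permutation vector and exhibits the two properties just mentioned.

def permutation_vector(h, v, t):
    # Entry at position c with base-v digits (d_1, ..., d_t), d_1 most significant,
    # is h_1*d_1 + ... + h_{t-1}*d_{t-1} + d_t (mod v); v is assumed to be prime.
    assert len(h) == t - 1
    vec = []
    for c in range(v ** t):
        d = [(c // v ** (t - 1 - j)) % v for j in range(t)]
        vec.append((sum(a * b for a, b in zip(h, d[:-1])) + d[-1]) % v)
    return vec

pv = permutation_vector((1, 2), v=3, t=3)
print(pv[:3])                                                          # [0, 1, 2]
print(all(sorted(pv[i:i + 3]) == [0, 1, 2] for i in range(0, 27, 3)))  # True: concatenated permutations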

Given a t-tuple of permutation vectors, consider the array of size v^t × t whose columns are the permutation vectors of the tuple. If this array is an orthogonal array of strength t, then the t-tuple of permutation vectors is covering; otherwise the t-tuple is noncovering. For example, Table 1 can be used to verify that certain 3-tuples of its permutation vectors form an orthogonal array and are therefore covering, while other 3-tuples do not form an orthogonal array and are therefore noncovering.
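A direct, if inefficient, way to decide whether a t-tuple of permutation vectors is covering is to build the v^t × t array and check that all of its rows are distinct (an array of v^t rows on v symbols and t columns is an orthogonal array of strength t exactly when no row repeats). The sketch below does this for prime v; the example tuples are ours, not taken from Table 1.

def permutation_vector(h, v, t):
    # entry at position with digits (d_1, ..., d_t): h_1*d_1 + ... + h_{t-1}*d_{t-1} + d_t (mod v)
    out = []
    for c in range(v ** t):
        d = [(c // v ** (t - 1 - j)) % v for j in range(t)]
        out.append((sum(a * b for a, b in zip(h, d[:-1])) + d[-1]) % v)
    return out

def is_covering_tuple(tuples, v, t):
    cols = [permutation_vector(h, v, t) for h in tuples]
    rows = {tuple(col[r] for col in cols) for r in range(v ** t)}
    return len(rows) == v ** t        # covering iff all v^t rows are distinct

print(is_covering_tuple([(0, 1), (1, 0), (1, 1)], v=3, t=3))   # True
print(is_covering_tuple([(0, 1), (1, 0), (2, 2)], v=3, t=3))   # False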

A CPHF(n; k, v, t) whose elements are permutation vectors is an n × k array with elements from the set of (t−1)-tuples over F_v such that every subarray of t columns has at least one row that is a covering tuple of permutation vectors [28]. Such a CPHF generates a CA(n·v^t; t, k, v) by replacing every element of the CPHF by the corresponding permutation vector of length v^t. However, since the first v elements of every permutation vector are 0, 1, ..., v−1, it is possible to delete the first v elements from every permutation vector given by the last n−1 rows of the CPHF; the result is a CA(n(v^t − v) + v; t, k, v). The display below shows an example of such a CPHF in which the permutation vectors are written as (t−1)-tuples over F_v; every one of its subarrays of t columns contains as a row a covering tuple of permutation vectors, and the CPHF generates a covering array of the size just described.

Example of a CPHF.

In addition to permutation vectors generated by a (t−1)-tuple over F_v, a t-tuple (h_1, h_2, ..., h_t) over F_v also generates a vector of v^t elements when evaluated as h_1·d_1 + h_2·d_2 + ... + h_t·d_t on the base-v representation (d_1, d_2, ..., d_t) of every position c. However, some of these t-tuples generate vectors that are not permutation vectors; specifically, the t-tuples with h_t = 0 generate vectors of length v^t in which every group of v consecutive elements is formed by v occurrences of the same symbol.

An extended permutation vector given by a t-tuple (h_1, h_2, ..., h_t) over F_v is defined as the vector of length v^t that has the symbol h_1·d_1 + h_2·d_2 + ... + h_t·d_t in position c, for 0 ≤ c ≤ v^t − 1 [29]. For given v and t, the total number of extended permutation vectors is v^t. As examples, Table 2 shows six of the extended permutation vectors for a small case of v and t.

Extended permutation vectors enlarge the available options to construct CPHFs, because the number of possible symbols is now v^t instead of v^{t−1}. As for permutation vectors, a CPHF(n; k, v, t) generates a covering array by replacing the elements of the CPHF by their corresponding extended permutation vectors of length v^t. However, in this case it is not possible to delete the first v elements of the extended permutation vectors given by the last n−1 rows of the CPHF; only the first element can be deleted because it is zero in any extended permutation vector. Thus, a CPHF(n; k, v, t) generates a CA(n(v^t − 1) + 1; t, k, v) [29].
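The two row counts can be checked with a small expansion routine; the sketch below assumes a prime v and uses the deletion rule described above (prefix v for permutation vectors, prefix 1 for extended permutation vectors). The input array is a hypothetical example, not necessarily a valid CPHF.

def column_vector(h, v, t):
    # a (t-1)-tuple is treated as (h_1, ..., h_{t-1}, 1); a t-tuple is used as given
    coeffs = list(h) + [1] if len(h) == t - 1 else list(h)
    out = []
    for c in range(v ** t):
        d = [(c // v ** (t - 1 - j)) % v for j in range(t)]
        out.append(sum(a * b for a, b in zip(coeffs, d)) % v)
    return out

def expand(cphf, v, t, prefix):
    # replace each entry by its column of length v^t; from the blocks produced by
    # rows 2..n delete the first `prefix` rows, which duplicate rows of the first block
    rows = []
    for i, cphf_row in enumerate(cphf):
        cols = [column_vector(h, v, t) for h in cphf_row]
        start = 0 if i == 0 else prefix
        rows += [[col[r] for col in cols] for r in range(start, v ** t)]
    return rows

example = [[(0, 1), (1, 0), (1, 1), (2, 1)],
           [(0, 2), (2, 0), (1, 2), (2, 2)]]          # hypothetical 2-row array, v = 3, t = 3
print(len(expand(example, v=3, t=3, prefix=3)))       # 2*(27 - 3) + 3 = 51 rows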

To make a distinction between CPHFs whose elements are permutation vectors and CPHFs whose elements are extended permutation vectors, we follow the convention of Colbourn et al. [30], where the first kind of CPHFs are called Sherwood-CPHFs or SCPHFs, and the second kind are called simply CPHFs.

For the same number of columns k and the same number of rows n, a SCPHF(n; k, v, t) is better than a CPHF(n; k, v, t), because the SCPHF produces a covering array with fewer rows. However, CPHFs may have more columns than SCPHFs since there are more extended permutation vectors than permutation vectors.

3. Related Works

This section reviews some methods to construct SCPHFs and CPHFs, and some SA algorithms to construct covering arrays.

3.1. Methods to Construct CPHFs

CPHFs were introduced by Sherwood et al. [28]. In that work SCPHFs were constructed by a backtracking algorithm. The first step of the algorithm is to initialize a candidate array of size n × k; in this candidate array each entry takes its first valid element from the symbol set, but not all entries of the candidate array have the same set of valid elements due to the following constraints: (1) in one case, the elements of the first row are in ascending order and all of them are distinct, while the other rows may contain any permutation of the symbols; (2) in the other case, the array is divided into subarrays with a bounded number of columns; in each subarray every row has distinct elements, and the first row of each subarray has the ith element of the symbol set in its ith column.

Once a candidate array is formed, i.e., all its entries are assigned an element from the symbol set, the array is tested to see if it is a SCPHF. The test consists in verifying that every subarray of t columns contains a covering tuple of permutation vectors. The following candidate arrays are generated using backtracking on the elements assigned to each entry of the array.

This backtracking technique is effective only for a small number of columns, since the number of candidate arrays grows considerably with each new column. The authors report SCPHFs with a modest number of columns for strengths three and four.

Walker II and Colbourn [31] constructed SCPHFs using tabu search. The initial solution is an array of size n × k generated randomly. The algorithm employs a tabu list to store the last 50000 moves performed by the algorithm. A move replaces the value of one entry of the current solution by another value. Moves are done only in columns that are part of subarrays of t columns with no covering tuple; such subarrays are called “uncovered subarrays”. The reason to restrict moves is that moves in columns that are not part of uncovered subarrays cannot improve the “score” of the current array. The score of the current solution is the number of subarrays with no covering tuple; when the score becomes zero a SCPHF has been obtained.

The neighborhood of the current solution consists of all possible moves in all columns that are part of an uncovered subarray. At the beginning of the execution the size of the neighborhood can be very large, and for this reason the neighborhood is restricted to the column that appears in the most uncovered subarrays; such a column is the worst column of the current solution. As the algorithm executes, the number of columns that belong to an uncovered subarray decreases, and so the restriction of considering only the worst column is removed.

The tabu search algorithm was able to construct SCPHFs with up to 255 columns for strength three, 62 columns for strength four, 19 columns for strength five, 14 columns for strength six, and 13 columns for strength seven.

Recently Colbourn et al. [30] constructed SCPHFs and CPHFs by combining randomized and greedy algorithms. Firstly, a column resampling algorithm generates an initial CPHF (or SCPHF), and then a greedy algorithm adds new columns to the CPHF. The column resampling algorithm works as follows: the initial array of size n × k is generated by assigning its entries randomly; then, all subarrays of t columns are checked to see if they are “covered”, that is, if they have a covering tuple as a row. From each uncovered subarray one column is selected and its entries are reassigned with random elements to try to decrease the number of uncovered subarrays. Columns are resampled until all subarrays are covered.

After generating the initial CPHF, a greedy algorithm adds new columns to the CPHF. For each new column the algorithm randomly generates a number of candidate columns; if one of these candidates does not introduce uncovered subarrays then it is appended to the current CPHF, which now has one more column. If no candidate column can be appended to the CPHF without introducing uncovered subarrays, then one of the candidate columns replaces one of the current columns, provided there is a current column that is part of all uncovered subarrays introduced by the candidate column. This is a greedy strategy to replace the column most likely to participate in uncovered subarrays.

By means of the combination of the column resampling algorithm and the greedy random extension algorithm, the authors constructed a great number of new SCPHFs and CPHFs whose corresponding covering arrays established new upper bounds on covering array numbers. The SCPHFs and CPHFs constructed reach large numbers of columns for strengths three, four, five, and six.

It is very difficult for a metaheuristic algorithm to construct CPHFs of these sizes in a reasonable amount of time. Therefore, our simulated annealing algorithm concentrates on small and medium-size CPHFs, and we consider CPHFs with larger numbers of columns as large CPHFs.

3.2. SA Algorithms to Construct Covering Arrays

The SA technique has been used many times to find new upper bounds of covering array numbers. One of the most successful algorithms is the augmented annealing algorithm developed by Cohen et al. [32]. In that work the target covering array is divided into smaller arrays or “ingredients” using a combinatorial construction; after that, each ingredient is constructed either by a combinatorial technique or by SA if there is no combinatorial construction to generate the ingredient.

Another work that uses SA is the one by Torres-Jimenez and Rodriguez-Tello [17]. In this SA algorithm only binary (v = 2) covering arrays are constructed. The initial solution is created randomly, but in such a way that the symbols are balanced in each column; that is, the numbers of zeros and ones in every column are as equal as possible. To generate a new solution from the current one the algorithm uses one of two relatively simple neighborhood functions. The first function employs a procedure called switch() that changes one entry of the current solution from 0 to 1, or from 1 to 0. The second function uses a procedure called swap() to interchange the elements of two rows within the same column.

Avila-George et al. [33] developed three parallel SA algorithms called, respectively, independent search, semi-independent search, and cooperative search. In the independent search each process runs an instance of the SA algorithm independently of the others; at the end, the best solution among all processes is taken as the final solution of the parallel SA algorithm. On the other hand, in the semi-independent search the processes exchange intermediate solutions in a synchronous way. After a predefined number of applications of the neighborhood function the processes share their best solutions to select a global winner, which is communicated to all processes in order for all of them to know the current best global solution. In the cooperative search the processes share their current best solutions asynchronously.

The SA algorithms developed so far to construct covering arrays work directly in the domain of covering arrays; that is, to construct a CA(N; t, k, v) they work with matrices of size N × k over v symbols, or with parts of the matrix as in the augmented annealing technique [32]. On the other hand, the SA algorithm developed in this work constructs the covering arrays in the domain of CPHFs, which is a high-level representation of some covering arrays. Covering arrays and CPHFs are two distinct combinatorial objects; in covering arrays the requirement is that every subarray of t columns contains every t-tuple over the symbols at least once, while in CPHFs the requirement is that every subarray of t columns contains at least one covering tuple of permutation vectors. Therefore, although the structure of the SA algorithm proposed in this work may be similar to the structure of the SA algorithms developed to construct covering arrays, the neighborhood functions are different.

4. Simulated Annealing Algorithm

This section describes the SA algorithm to construct SCPHFs and CPHFs. Section 4.1 shows the general structure of the SA algorithm. Section 4.2 presents the operators that form the neighborhood function and describes the experimentation done to find the most convenient values for the utilization rate of each operator. Section 4.3 reviews the noncovering test of [28, 31] to determine efficiently whether a tuple of permutation vectors is noncovering; this subsection also describes how the noncovering test is adapted to extended permutation vectors.

4.1. Structure of the SA Algorithm

Simulated annealing is a metaheuristic inspired by the process of heating and cooling metals to obtain a strong crystalline structure. A metal is heated until it melts and then it is cooled in a controlled way. At the end of the process, the particles of the metal are arranged in such a way that the energy of the system is minimal [34]. The evolution of the metal from melted to ground state is simulated as follows: let A be the current solution and E(A) its energy; from a perturbation of A a new solution A′ with energy E(A′) is created. If E(A′) ≤ E(A) then A′ becomes the new current solution; otherwise A′ becomes the new current solution with probability e^(−(E(A′) − E(A))/T), where T is the current temperature. When the temperature is high, the probability of accepting a solution with more energy than the current one is high; but as the temperature decreases the probability of accepting such solutions also decreases. The cooling process requires three parameters: the initial temperature T_i, the final temperature T_f, and the cooling rate α.

Perturbations of the current solution are done by means of a neighborhood function that takes the current solution as argument, makes some changes to it, and returns a new solution. In simulated annealing, the neighborhood function is applied a certain number of times for each temperature value. In this work, the number of perturbations of the current solution (the length of the Markov chain) done at each temperature value is not fixed; the number of perturbations is incremented as the temperature drops. For this reason, the SA algorithm requires another three parameters for increasing the length of the Markov chain: the initial length L_i, the final length L_f, and the increment factor β.

The parameter β can be expressed in terms of the other five parameters if we establish that the Markov chain reaches its final length L_f when the temperature reaches its final value T_f. Let x be the number of iterations needed to decrease the temperature from T_i to T_f; then T_f = T_i·α^x. From this last expression we get α^x = T_f/T_i, and so x = log_α(T_f/T_i). Now, from the equation L_f = L_i·β^x we obtain β^x = L_f/L_i, and so β = (L_f/L_i)^(1/x).
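A small numeric sketch of this derivation; the parameter values below are illustrative only, since the paper takes its cooling values from [17].

import math

def markov_increment(Ti, Tf, alpha, Li, Lf):
    # beta such that the chain length grows from Li to Lf while the
    # temperature decays from Ti to Tf by repeated multiplication by alpha
    x = math.log(Tf / Ti) / math.log(alpha)   # number of temperature decrements
    return (Lf / Li) ** (1.0 / x)

beta = markov_increment(Ti=1.0, Tf=1e-6, alpha=0.99, Li=10, Lf=1000)
print(beta)   # slightly above 1, so that Li * beta**x equals Lf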

For the cooling schedule (the values of T_i, T_f, and α) we take the values of the successful SA algorithm developed in [17] to construct covering arrays. In addition, we set the initial length L_i and the final length L_f of the Markov chain. Algorithm 1 shows the SA algorithm to construct either a SCPHF(n; k, v, t) or a CPHF(n; k, v, t).

Input: n, k, v, t, T_i, T_f, α, L_i, L_f
T ← T_i
L ← L_i
x ← log_α(T_f / T_i)
β ← (L_f / L_i)^(1/x)
A ← random_initial_solution()
B ← A
if cost(B) = 0 then
  return B
end if
while T > T_f do
  best_global_improved ← false
  for i ← 1 to L do
    A′ ← neighborhood_function(A)
    if cost(A′) ≤ cost(A) or random(0, 1) < e^(−(cost(A′) − cost(A))/T) then
      A ← A′
      if cost(A) < cost(B) then
        B ← A
        best_global_improved ← true
        if cost(B) = 0 then
          return B
        end if
      end if
    end if
  end for
  if best_global_improved = false then
    T ← α·T
    L ← β·L
  end if
end while
return NULL

The current solution is stored in matrix A, and matrix B stores the best global solution found. At the end of the while loop the current temperature T and the current length of the Markov chain L are updated only if the global best solution was not improved in the last Markov chain. The neighborhood function is called in every iteration of the for loop to generate a new solution A′ based on the current solution A. The cost of a solution is the number of subarrays of t columns that do not have a covering tuple as a row; when the cost of the current solution is zero a CPHF has been constructed. The function cost() is used to compute the cost of a solution. If the cost of the solution A′ generated by the neighborhood function is better than the cost of the current solution A, then A′ is accepted as the new current solution; otherwise A′ has a probability of e^(−(cost(A′) − cost(A))/T) of being accepted as the new current solution.

4.2. Neighborhood Function

Let A be the n × k matrix that stores the current solution. As said before, the cost of A is the number of subarrays of t columns with no row containing a covering tuple. We will refer to such subarrays as uncovered combinations, because a subarray of t columns is associated with a combination of t of the k columns of A.

The neighborhood function changes A by applying one of three operators O_1, O_2, and O_3, with the objective of reducing the number of uncovered combinations. Since Algorithm 1 can be used to construct SCPHFs and CPHFs, we denote by S the symbol set of the current solution A; then S is the set of the v^{t−1} permutation vectors for SCPHFs and the set of the v^t extended permutation vectors for CPHFs; the number of elements in S is denoted by |S|.

The most basic operator is O_1, which only changes the content of a random cell of A by a random value from S. The purpose of this operator is to provide exploration capabilities to the SA algorithm, given that a random change can direct the algorithm to another region of the search space.

Operator O_2 selects an uncovered combination and makes changes in every cell of the n × t submatrix formed by the t columns of the combination, in order to transform one row of this submatrix into a covering tuple. For each cell, the value of the cell is replaced by elements of S, taken in the order given by a random permutation of S, until a bounded number of elements of S are found that turn the row of the cell into a covering tuple. Then, for each cell the operator tests at most that bounded number of distinct covering tuples in the row of the cell, and among all the covering tuples tested in the submatrix the one that minimizes the number of uncovered combinations is selected as the result of O_2. Sometimes no element of S makes the row of the cell covering; in these cases all elements of S are assigned to the cell. The changes in the cells of the submatrix are done independently, so that when a cell is being changed the other cells in its row keep their original values. The limit on the number of covering tuples tested per cell is set because when |S| is large the process of assigning every symbol of S to a cell can be time consuming; the value of the limit was fixed by experimentation.

The last operator, O_3, is a specialized version of O_2 that randomly selects a single cell of the submatrix given by an uncovered combination and assigns to it, independently, all elements of S. There may be several elements of S that cover the row of the cell, but the symbol selected for the cell is the one that covers the row and minimizes the number of uncovered combinations in A. In case of ties, the symbol for the cell is selected randomly among the best symbols. If no symbol of S covers the row of the cell, then the cell is assigned a random symbol. The reason for this operator is to increase the probability of finding the best symbol for the cell, one that at the same time covers the row and minimizes the number of uncovered combinations in the current solution A. Thus, this operator is intended for exploitation of the current neighborhood.

For better performance of the neighborhood function, the probabilities of using the three operators O_1, O_2, and O_3 in an application of the neighborhood function were tuned by experimentation. Assuming a granularity of 0.1 for the probability of using each operator, we consider the sixty-six distinct triples (a, b, c) where a, b, and c are nonnegative integers such that a + b + c = 10. From a triple (a, b, c) the probabilities for the three operators are given by a/10, b/10, and c/10. The experimentation is based on executing sixty-six instances of the SA algorithm, one for each distinct triple (a, b, c), with the objective of identifying the utilization rates of the operators for which the SA algorithm performs better.
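The sixty-six configurations can be enumerated directly, and an operator can be drawn according to a triple as sketched below; the operator labels are illustrative.

import random

triples = [(a, b, 10 - a - b) for a in range(11) for b in range(11 - a)]
print(len(triples))                     # 66 triples (a, b, c) with a + b + c = 10

def pick_operator(triple):
    # choose O1, O2, or O3 with probabilities a/10, b/10, c/10
    return random.choices(["O1", "O2", "O3"], weights=triple, k=1)[0]

print(pick_operator((2, 5, 3)))         # e.g., "O2" is returned with probability 0.5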

The CPHF instances used in the experimentation are the six instances listed in the first column of Table 3. However, given the nondeterministic nature of the SA algorithm, we repeated each case 31 times. Thus, the SA algorithm was executed 66 × 6 × 31 = 12,276 times to determine the utilization rates of the operators O_1, O_2, and O_3.

Table 3 shows the best four results obtained for each CPHF instance. The first column contains the CPHF instance. Columns 2–4 contain the triple (a, b, c) that gives the probabilities of using the three operators of the neighborhood function, where a corresponds to O_1, b corresponds to O_2, and c corresponds to O_3. The last column contains the average, over the 31 runs, of the number of uncovered combinations reached by the SA algorithm after 7000 executions of the neighborhood function.

The results of Table 3 do not show an absolute winner triple (a, b, c), but they do show that one of the operators should have a higher probability of being used than the others. To obtain the winner triple we averaged the values in columns 2, 3, and 4 of the table; the normalized averages give the probability with which each operator is used in an application of the neighborhood function. From this result we concluded that the three operators are required by the neighborhood function, but each one with a different utilization rate.

4.3. Noncovering Test

This section describes the test of [28, 31] to check efficiently whether a t-tuple of permutation vectors is noncovering. The noncovering test is executed several times in every application of the neighborhood function, and therefore it is very important to execute this test efficiently.

A t-tuple of permutation vectors is noncovering if the array of size v^t × t on v symbols generated by the permutation vectors has two distinct rows that contain the same t-tuple of symbols. If the array generated by the permutation vectors has two identical rows, then it cannot be an orthogonal array of strength t, and therefore the t-tuple of permutation vectors is noncovering.

For i = 1, 2, ..., t let (h^i_1, h^i_2, ..., h^i_{t−1}) be a permutation vector. Then, the t-tuple formed by these permutation vectors is noncovering if and only if there exist two distinct positions, with base-v representations (d_1, d_2, ..., d_t) and (d′_1, d′_2, ..., d′_t), such that

h^i_1 d_1 + h^i_2 d_2 + ... + h^i_{t−1} d_{t−1} + d_t = h^i_1 d′_1 + h^i_2 d′_2 + ... + h^i_{t−1} d′_{t−1} + d′_t

for i = 1, 2, ..., t. Now, let e_j = d_j − d′_j for j = 1, 2, ..., t; then the t-tuple of permutation vectors is noncovering if and only if the following linear system with unknowns e_1, e_2, ..., e_t has a nonzero solution over F_v:

h^i_1 e_1 + h^i_2 e_2 + ... + h^i_{t−1} e_{t−1} + e_t = 0,  i = 1, 2, ..., t.  (5)

Solving this linear system for each t-tuple of permutation vectors is time consuming. Thus, we follow the method of [28] to find the set of permutation vectors for which a given nonzero tuple (e_1, e_2, ..., e_t) solves the system (5). For each t-tuple (e_1, e_2, ..., e_t) over F_v, where e_j is distinct from zero for at least one 1 ≤ j ≤ t − 1, consider the following equation with unknowns h_1, h_2, ..., h_{t−1}:

h_1 e_1 + h_2 e_2 + ... + h_{t−1} e_{t−1} + e_t = 0.  (6)

If e_j is nonzero for some 1 ≤ j ≤ t − 1, then the corresponding h_j is obtained by assigning arbitrary values to the other unknowns h_l, l ≠ j, and solving for h_j. Therefore, there are v^{t−2} solutions in the unknowns h_1, ..., h_{t−1} for (6). Let E be the set of t-tuples (e_1, ..., e_t) over F_v with at least one nonzero e_j for some 1 ≤ j ≤ t − 1; the cardinality of E is (v^{t−1} − 1)v.

To store the set of permutation vectors that are solved by each e in E we use a binary matrix M of size (v^{t−1} − 1)v × v^{t−1}. This matrix has a row for each e in E and a column for each permutation vector; recall that the total number of permutation vectors is v^{t−1} and each one can be represented by an integer between 0 and v^{t−1} − 1. For each row r of M the entry at column s is 1 if and only if the permutation vector s is solved by the t-tuple e associated with row r of M; otherwise the entry at column s is 0. A t-tuple (s_1, s_2, ..., s_t) of permutation vectors is noncovering if there exists a row r of M such that M[r][s_i] = 1 for i = 1, 2, ..., t; in this case the t-tuple e associated with row r of M is a nonzero solution of the system (5) generated by the permutation vectors s_1, s_2, ..., s_t. On the other hand, if in every row r of M there is at least one i such that M[r][s_i] = 0, then the t-tuple of permutation vectors is covering.
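A sketch of this matrix-based test for a prime v (so that arithmetic modulo v is the field arithmetic; for prime powers the sums and products would have to be replaced by finite-field operations). The encoding of a permutation vector as an integer follows the base-v digits of its (t−1)-tuple.

from itertools import product

def digits(x, v, length):
    return [(x // v ** (length - 1 - j)) % v for j in range(length)]

def build_matrix(v, t):
    # one row per tuple e whose first t-1 entries are not all zero;
    # column s is 1 iff the permutation vector s satisfies
    # h_1*e_1 + ... + h_{t-1}*e_{t-1} + e_t = 0 (mod v)
    M = []
    for e in product(range(v), repeat=t):
        if all(x == 0 for x in e[:-1]):
            continue
        row = []
        for s in range(v ** (t - 1)):
            h = digits(s, v, t - 1)
            row.append(int((sum(a * b for a, b in zip(h, e[:-1])) + e[-1]) % v == 0))
        M.append(row)
    return M

def is_noncovering(M, pv_ids):
    # the t-tuple is noncovering iff some row of M is 1 at all t permutation vectors
    return any(all(row[s] for s in pv_ids) for row in M)

M = build_matrix(3, 3)
print(len(M), len(M[0]))              # 24 rows = (v^(t-1) - 1)*v, 9 columns = v^(t-1)
print(is_noncovering(M, [1, 3, 8]))   # (0,1), (1,0), (2,2): True (noncovering)
print(is_noncovering(M, [1, 3, 4]))   # (0,1), (1,0), (1,1): False (covering)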

This noncovering test requires in the worst case (v^{t−1} − 1)v·t operations, since for each of the (v^{t−1} − 1)v rows of M at most t entries are inspected. The strategy of [31] stores for each e in E only the permutation vectors solved by e and uses binary search to check whether a permutation vector is in the set solved by e. In that method the worst case is (v^{t−1} − 1)v·t·log_2(v^{t−2}) operations, because for each e in E at most t binary searches in a set of v^{t−2} elements are required.

For extended permutation vectors consider equation (7), with unknowns h_1, h_2, ..., h_t, for each nonzero t-tuple (e_1, e_2, ..., e_t) over F_v:

h_1 e_1 + h_2 e_2 + ... + h_t e_t = 0.  (7)

In this case the nonzero element of (e_1, ..., e_t) can be at any position, and therefore there are v^t − 1 nonzero t-tuples. In addition, there are v^{t−1} solutions in the unknowns h_1, ..., h_t for (7), because t − 1 of the unknowns can be assigned arbitrarily in order to solve the equation for a remaining unknown associated with a nonzero e_j. Since there are v^t extended permutation vectors, the matrix M has v^t columns, and so its dimensions are (v^t − 1) × v^t. The worst case of the noncovering test for extended permutation vectors is (v^t − 1)·t operations.

5. Computational Results

This section shows the main computational results obtained with the SA algorithm. In Section 5.1, the results for complete CPHFs are presented; complete CPHFs are CPHFs where all t-combinations of columns have at least one covering tuple as a row. On the other hand, in Section 5.2, the results obtained by a three-stage procedure based on constructing quasi-CPHFs are presented; a quasi-CPHF is a CPHF with a relatively small number of uncovered combinations.

5.1. Construction of Complete CPHFs

The SA algorithm produced in total 64 new complete CPHFs whose derived covering arrays are the best-known ones. Of these results, 38 covering arrays were derived from SCPHFs and 26 were derived from CPHFs.

Table 4 shows the 38 new SCPHFs. The first column of the table shows the SCPHF(n; k, v, t) constructed; the second column shows the covering array generated by the SCPHF in the first column; the third column shows the number of rows of the previous best-known covering array with the same t, k, v as the covering array in the second column; and the fourth column contains the number of covering array numbers improved by the covering array in the second column. Let m be the value in the last column; then, the number of rows of the covering array in the second column is the new upper bound of the m covering array numbers CAN(t, k, v), CAN(t, k − 1, v), ..., CAN(t, k − m + 1, v). Sizes of the best-known covering arrays were taken from the Covering Array Tables [10].

To obtain the SCPHFs shown in Table 4, the SA algorithm was launched with parameters n, k, v, and t such that, if constructed, the SCPHF gives a covering array that improves a current upper bound. For fixed n, v, and t the number of rows of the derived covering array is also fixed, so the number of columns k determines whether a current upper bound is improved. For example, in one of the cases the current upper bound of the covering array number for a certain number of columns is 1264 and the current upper bound for one more column is 1396 (see [10]); the derived covering array does not have fewer than 1264 rows, so to improve a current upper bound the SCPHF must have at least the larger number of columns. Thus, for this particular case the SA algorithm was launched with that number of columns together with the corresponding n, v, and t.
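The number of rows of the derived covering array can be computed directly from the expressions reconstructed in Section 2; the helper names below are ours, and the sample parameters are only an assumption that reproduces the 1264 figure quoted above.

def scphf_ca_rows(n, v, t):
    # rows of the covering array derived from a SCPHF(n; k, v, t)
    return n * (v ** t - v) + v

def cphf_ca_rows(n, v, t):
    # rows of the covering array derived from a CPHF(n; k, v, t)
    return n * (v ** t - 1) + 1

print(scphf_ca_rows(5, 4, 4))   # 1264 (assumed parameters n = 5, v = 4, t = 4)
print(cphf_ca_rows(5, 4, 4))    # 1276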

If the SA algorithm was able to construct the SCPHF with k columns, then the algorithm searches for a SCPHF with one more column, k + 1. To do this, a quasi-SCPHF with k + 1 columns is created by appending a randomly generated column to the SCPHF with k columns. In general, this quasi-SCPHF has considerably fewer uncovered combinations than a quasi-SCPHF of the same size generated entirely at random, and therefore the SA algorithm has more possibilities of constructing a SCPHF with k + 1 columns. For the instance mentioned above, Table 4 shows the number of columns the SA algorithm was able to reach.

Table 5 shows the 26 new CPHFs constructed by the SA algorithm. In each case, the best CPHF previously constructed was used to initialize a quasi-CPHF with one more column (where the extra column was generated randomly). From this quasi-CPHF the SA algorithm started the search for the CPHF with one more column.

All SCPHFs and CPHFs constructed by the SA algorithm are small and medium-size CPHFs, according to our classification of CPHFs based on the number of columns. For large CPHFs the SA algorithm takes too much execution time, because for each change in an entry of the current solution we need to verify the subarrays of t columns that contain the changed column; the verification consists in determining if the subarray is covered or uncovered. However, for small and medium-size CPHFs the SA algorithm produced good results. The SCPHFs and CPHFs listed in Tables 4 and 5 are the best-known ones since the covering arrays derived from them improved their respective best-known covering arrays. The total number of upper bounds of covering array numbers improved by the covering arrays in Tables 4 and 5 is 137.

5.2. Construction of Quasi-CPHFs

In this section, we employ the SA algorithm given in Algorithm 1 to construct quasi-CPHFs with a relatively small number of uncovered combinations, say less than or equal to 5% of the total number of t-combinations of the k columns. The advantage of constructing quasi-CPHFs instead of complete CPHFs is that quasi-CPHFs are constructed in less time, and also quasi-CPHFs may be constructed for values of n, k, v, t for which a complete CPHF may not even exist. To construct quasi-CPHFs, Algorithm 1 was modified to finalize when the number of uncovered combinations in the best global solution is less than or equal to 5% of the number of t-combinations of columns.

The array derived from a quasi-CPHF is a quasi-CA (quasi covering array) with some missing tuples. To be a covering array of strength t, a matrix of size N × k over v symbols must satisfy the property that every submatrix of t columns contains every t-tuple over the v symbols at least once. The t-tuples not covered in a subarray of t columns are the missing tuples in that subarray.

Let A be the array derived from a quasi-CPHF or from a quasi-SCPHF. To cover the missing tuples in the submatrices of t columns of A we use a greedy algorithm that works in two steps. The first step of the algorithm is to determine which t-tuples are missing in A; these missing tuples are stored in a list. The second step covers the tuples of the list by adding new rows to A or by overwriting unassigned elements in the rows previously added to cover a tuple of the list. For each tuple of the list, the algorithm first searches whether there is a candidate row to cover the tuple by overwriting some unassigned elements of the row. Suppose the tuple is missing in the subarray formed by columns c_1, c_2, ..., c_t; then a row is a candidate row to cover the tuple if, for every one of these columns, either the row already contains the required symbol in that column or the element of the row in that column is unassigned. If there is no candidate row to cover a tuple, then a new row is added to A. The final result of the greedy algorithm is a complete covering array that has a number of unassigned or redundant elements.
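A compact sketch of this two-step completion; None marks an unassigned element, and the structure follows the description above rather than the authors' actual implementation.

from itertools import combinations, product

def missing_tuples(A, t, v):
    # step 1: all (columns, t-tuple) pairs not covered by the quasi covering array A
    k = len(A[0])
    missing = []
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in A}
        missing += [(cols, tup) for tup in product(range(v), repeat=t) if tup not in seen]
    return missing

def complete(A, t, v):
    # step 2: cover every missing tuple, reusing previously added rows when their
    # relevant cells are unassigned (None) or already equal to the needed symbol
    k = len(A[0])
    new_rows = []
    for cols, tup in missing_tuples(A, t, v):
        for row in new_rows:
            if all(row[c] is None or row[c] == s for c, s in zip(cols, tup)):
                for c, s in zip(cols, tup):
                    row[c] = s
                break
        else:
            row = [None] * k
            for c, s in zip(cols, tup):
                row[c] = s
            new_rows.append(row)
    return A + new_rows            # remaining None cells are redundant elements

quasi = [[0, 0, 0], [1, 1, 0], [0, 1, 1]]    # toy quasi covering array for t = 2, v = 2
done = complete(quasi, t=2, v=2)
print(len(done), done[-1])                   # 4 [1, 0, 1]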

The redundant elements are elements that can be freely modified without affecting the coverage properties of a covering array, so these elements are not needed to satisfy the coverage conditions. To reduce redundancy in the covering array, we use the postoptimization method of [35]. This postoptimization method deletes some rows from a covering array by copying their nonredundant elements to the redundant elements of other rows.

Because of time constraints, we applied the three-stage procedure (construction of a quasi-CPHF, coverage of missing tuples, and postoptimization) only to a subset of orders and strengths. A similar three-stage procedure was developed in [36].

The relevant results obtained for the cases considered are shown in Tables 6, 7, and 8. In each table, the first column contains the covering array generated by the three-stage procedure; the second column contains the number of rows of the previous best-known covering array; and the third column contains the number of upper bounds of covering array numbers improved.

The accumulated sum of the results in Tables 6, 7, and 8 is 183 new covering arrays and 546 new upper bounds of covering array numbers CAN(t, k, v).

6. Conclusions

In this work, we developed a simulated annealing algorithm to construct covering perfect hash families (CPHFs) formed by either permutation vectors or extended permutation vectors. CPHFs formed by permutation vectors are denominated Sherwood-CPHFs or SCPHFs. For the same number of columns k and for the same number of rows n, a SCPHF is better than a CPHF because the former generates a covering array with n(v^t − v) + v rows, while the latter generates a covering array with n(v^t − 1) + 1 rows. However, sometimes CPHFs can be constructed with more columns than SCPHFs.

The simulated annealing algorithm developed in this work has a neighborhood function composed of three perturbation operators whose probabilities of being used in an application of the neighborhood function were tuned by experimentation. The use of a compound neighborhood function allows combining exploration and exploitation operators in the neighborhood function, and this is very important for the success of the algorithm. The results obtained from the simulated annealing algorithm were 64 new CPHFs whose derived covering arrays improved the best-known ones. In addition, the simulated annealing algorithm was used to construct quasi-CPHFs whose derived arrays were then completed and postoptimized; in this case the number of new covering arrays constructed was 183. Then, in total we constructed 64 + 183 = 247 new covering arrays, and these 247 covering arrays improved the upper bound of 683 covering array numbers.

The construction of CPHFs using metaheuristic algorithms can be further improved by developing more sophisticated neighborhood functions or by employing parallel computing to handle larger instances.

Data Availability

All the input data needed to produce the output data are given in the paper; in case the output data are needed, write to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge ABACUS-CINVESTAV, CONACYT Grant EDOMEX-2011-COI-165873, for providing access to high performance computing, and the General Coordination of Information and Communications Technologies (CGSTIC) at CINVESTAV for providing HPC resources on the hybrid cluster supercomputer “Xiuhcoatl”. The CONACyT project 238469, Métodos Exactos para Construir Covering Arrays Óptimos, has partially funded the research reported in this paper.