Abstract

Because some intelligent software in mobile devices depends on the location of sensors and devices, regression testing for it faces a major challenge. Test case prioritization (TCP), a regression test optimization technique, helps improve test efficiency. However, traditional TCP techniques are limited when testing intelligent software embedded in mobile devices because they do not take the characteristics of mobile devices into account. This paper uses a smart mall as a scenario to design a novel location-based TCP technique for software embedded in mobile devices using the law of gravitation. First, test gravitation is proposed by applying the idea of universal gravitation. Second, a specific calculation model of test gravitation is designed for a smart mall scenario. Third, the creation of a faulted test case set is described in pseudocode. Fourth, a location-based TCP algorithm using the law of gravitation is proposed, which utilizes test case information, fault information, and location information to prioritize test cases. Finally, an empirical evaluation on one industrial project is presented. The experimental results indicate that the proposed TCP approach performs better than traditional TCP techniques. In addition to location information, the level of devices is also an important factor that affects prioritization efficiency.

1. Introduction

Nowadays, the Internet of Things (IoT) is developing rapidly and spreading widely [1]. It is based on wireless sensor networks (WSNs), which combine intelligent software and sensor devices and make smart homes and smart cities possible [2, 3]. With the development of hardware (chips) and software (intelligent systems), smart mobile devices (such as smart dust) are gradually emerging, which integrate sensors, processors, intelligent software, and communications. Smart mobile devices can not only transmit and monitor information but also perform sophisticated intelligent information processing and intelligent prediction through intelligent software [4] and location-based services (LBS) [5]. The software in each device has its specific functions. For example, some devices monitor and process temperature information, and others process population information. Devices are provided with location-dependent information and interact with other devices in a location-dependent way. It is the location information and software complexity of devices that make software testing in IoT a major challenge.

Regression testing, which reuses test suites, is performed on a modified program to instill confidence that the system behaves correctly and that modifications have not adversely affected unchanged portions of the program [6]. Test case prioritization (TCP), which sorts test cases according to some criteria, is a way to increase the efficiency of regression testing [7]. It aims at improving the rate of fault detection. Traditional TCP techniques mainly focus on algorithm design for testing software to improve prioritization efficiency. However, in IoT, traditional TCP techniques are limited because they do not take into account the characteristics of hardware devices, such as location information.

The law of gravitation, according to Newton’s Philosophiae Naturalis Principia Mathematica [8], states that a force of gravitational attraction exists between any two objects, given by the following equation:

$$F = G \frac{m_1 m_2}{r^2} \tag{1}$$

where $G$ is the universal gravitational constant, $m_1$ is the mass of one object, $m_2$ is the mass of the other object, $r$ is the distance between the centers of mass of the two objects, and $F$ is the force of attraction between the two objects. Universal gravitation has been applied to the field of data analysis. For example, many studies apply data gravitation (simulating universal gravitation) to machine learning [9–11]. In IoT, if we can utilize the law of gravitation to prioritize test cases, will it improve the test efficiency?

In this paper, a new location-based TCP technique using the law of gravitation is developed to solve the TCP problem of software embedded in mobile devices. This technique is designed for a smart mall scenario. It is not just a test case prioritization approach; it also makes use of the characteristics of devices, so that the resulting order of test cases benefits the test effort. Test gravitation is defined in this technique. Under this definition, a specific calculation model of test gravitation for a smart mall scenario is designed. First, it calculates the masses of each test case and each faulted test case. The faulted test case set is created from the faults detected by preselected test cases that test representative devices of different location areas. Then, the distance between two specific test cases is calculated according to the location information of devices. For each test case, test gravitation is calculated from this test case to each faulted test case. Finally, test cases are prioritized based on test gravitation.

The contributions of this work include the following:
(i) Test gravitation is proposed based on the law of gravitation. A specific calculation model of test gravitation adapted to a smart mall scenario is given. In particular, the creation of the faulted test case set used in the calculation of test gravitation is designed in detail.
(ii) A location-based TCP technique using the law of gravitation is proposed, and its algorithm is given in pseudocode. Its feasibility is illustrated with a small example.
(iii) An empirical evaluation on one industrial project is presented. It discusses whether different evaluation metrics (with or without considering the severities of faults) influence the experimental conclusions, and which factors affect the prioritization efficiency.

The rest of this paper is organized as follows: Section 2 describes test case prioritization problem, traditional TCP techniques, special TCP techniques, and TCP problem in a smart mall scenario. Section 3 presents a location-based TCP using the law of gravitation method and simulates its feasibility with an example. Section 4 describes an empirical evaluation and analyzes the results. Section 5 discusses some related work on test case prioritization and mobile application testing. Finally, the conclusions and future work are given in Section 6.

2. Background

2.1. Test Case Prioritization Methodology

Regression testing, which attempts to validate a modified version $P'$ of the original program $P$, checks the results for conformance with requirements [12]. Many techniques have been proposed to improve the cost-effectiveness of regression testing. Test case prioritization is one of these approaches; it rearranges test cases to increase the rate of fault detection during the whole regression testing process.

Test case prioritization problem is a research hotspot in the field of software testing. It sorts test cases by using some criteria to detect more faults as fast as possible. A complete definition of TCP problem was first proposed by Rothermel et al. [13]:

Given a test suite already selected (T), the set of all possible prioritizations (orderings) of T (PT), and an objective function from PT to the real numbers (f), which yields an award value for that ordering.

Problem. Find $T' \in PT$ such that $(\forall T'')(T'' \in PT)(T'' \neq T')[f(T') \geq f(T'')]$.

Many test case prioritization techniques have been proposed during the past two decades. Elbaum, Rothermel, et al. [13–16] discussed test case prioritization techniques for fine-grained entities, such as coverage prioritization (statement or branch coverage, etc.) and fault-exposing-potential (FEP) prioritization. Meanwhile, the total strategy and the additional strategy were proposed [15]. Both are built on a greedy algorithm which selects a locally optimal solution within the search space at each round. Srikanth et al. [17–19] proposed a value-driven approach called PORT which does not require structural coverage information. The PORT algorithm is based on four factors: customer priority, requirements volatility, implementation complexity, and fault proneness of the requirements. Arafeen and Do [20] proposed a test case prioritization technique using requirement-based clustering. It incorporates traditional code analysis information, which can improve the effectiveness of test case prioritization techniques.

2.2. Traditional TCP Techniques

Existing TCP methods, whether they prioritize test cases based on coverage [16], on requirements [19], or with time awareness [21, 22], are all based on the optimization of the software system itself. That is to say, traditional TCP methods focus on improving the methods themselves. The factors they consider are based on characteristics of software and do not involve characteristics of hardware. Most of the software they test consists of cross-platform web applications.

There are several classical test case prioritization techniques, introduced as follows:
Random prioritization [16]. Random prioritization orders test cases randomly. It is simple and convenient, but unstable.
Total coverage prioritization [16]. It orders test cases in descending order of the number of units they cover. When multiple test cases cover the same number of units, the order among them is determined randomly.
Additional coverage prioritization [16]. It orders test cases to achieve maximum coverage as early as possible. It first picks the test case with the greatest coverage and then successively adds the test case that covers the most not-yet-covered units.
Prioritization of Requirements for Test (PORT) [17–19]. It orders test cases in descending order of weighted priority (WP) values, so that a test case with a higher WP value is placed earlier.
Optimal prioritization [16]. It prioritizes test cases using knowledge of the faults, and it obtains the ordering of test cases that maximizes a test suite’s rate of fault detection. It provides an upper bound on the effectiveness of the other heuristics.
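
As a minimal sketch of the total and additional coverage strategies described above, the following Python fragment orders a small test suite. The function names, the `coverage` dictionary, and its contents are hypothetical and are not taken from the cited work; the additional strategy is shown without the coverage-reset step that some variants apply once every unit has been covered.

```python
import random


def total_coverage_order(coverage, rng=None):
    """Order tests by descending number of covered units (ties broken randomly)."""
    rng = rng or random.Random(0)
    tests = list(coverage)
    rng.shuffle(tests)  # random tie-breaking: the sort below is stable
    return sorted(tests, key=lambda t: len(coverage[t]), reverse=True)


def additional_coverage_order(coverage):
    """Greedy order: repeatedly pick the test covering the most uncovered units."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical coverage data: test case -> set of covered unit ids.
coverage = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {1, 2, 3, 4, 5},
    "t4": {6},
}
total_order = total_coverage_order(coverage)
additional_order = additional_coverage_order(coverage)
print(total_order)       # ['t3', 't1', 't2', 't4']
print(additional_order)  # ['t3', 't4', 't1', 't2']
```

The two orders differ after `t3`: the total strategy keeps ranking by raw coverage, while the additional strategy promotes `t4` because it is the only test that still adds an uncovered unit.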

2.3. TCP Techniques Utilizing the Execution Information

When a test case has been executed, it generates execution information, such as its fault detection. As regression testing becomes more complex, scholars have considered the impact of execution information of test history to the current test prioritization.

2.3.1. History-Based TCP

A history-based TCP technique [23] sorts test cases according to the selection probabilities calculated from test history. It defines the selection probability of each test case as follows:

$$P_0 = h_1, \qquad P_k = \alpha h_k + (1 - \alpha) P_{k-1}, \quad 0 \le \alpha \le 1,\ k \ge 1 \tag{2}$$

where $P_k$ is the selection probability, $h_k$ is the test history, and $\alpha$ is a smoothing constant used to weight individual histories.
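
The smoothing behind the selection probability can be sketched in a few lines of Python, assuming the usual exponential-smoothing recurrence $P_k = \alpha h_k + (1-\alpha) P_{k-1}$ with $P_0 = h_1$. The function name and the 0/1 history values are hypothetical illustrations, not data from [23].

```python
def selection_probability(history, alpha):
    """Exponentially smoothed value P_k for a history h_1..h_k (oldest first)."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p = history[0]  # P_0 = h_1
    for h in history:  # P_k = alpha * h_k + (1 - alpha) * P_{k-1}, k >= 1
        p = alpha * h + (1 - alpha) * p
    return p


# Hypothetical history flags for one test case, oldest session first.
p3 = selection_probability([0, 1, 1], alpha=0.5)
print(p3)  # 0.75
```

A larger `alpha` gives more weight to the most recent sessions; a smaller one lets older history dominate.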

Three test histories (based upon each test case’s execution history, its fault detection, and the program entities it covers) have been investigated on the effect of test prioritization. Their experimental results show that historical information may be useful in reducing costs and increasing the effectiveness of long-running regression testing processes.

2.3.2. Adaptive TCP

As a main method of dynamic programming [24], the adaptive idea is also used in test case prioritization. Two types of adaptive TCP techniques are introduced here. They all take advantage of the impact of occurred faults to prioritize test cases in current test round.

(1) Adaptive TCP guided by output inspection. An adaptive test case prioritization technique guided by output inspection [25], which combines the test-case scheduling process and the test-case execution process, prioritizes test cases as follows:

First, it calculates the initial fault-detection capability of all test cases based on the execution information of the previous output and then selects a test case t with the largest fault-detection capability. Second, t is executed on the modified program, and it records the output of t. Third, it modifies the fault-detection capability of remaining unselected test cases based on t’s output and selects the test case with the largest modified fault-detection capability. Fourth, it repeats the preceding two steps until all the test cases have been prioritized and run.

(2) TCP based on adaptive sampling strategy. TCP techniques using cluster filtering [26, 27] select and prioritize test cases as follows: first, the test suite is partitioned based on cluster analysis; then, test cases are selected according to a sampling strategy; finally, the selected test cases are prioritized. Among the sampling strategies, the adaptive sampling strategy first selects one execution at random from each cluster and then selects all other executions of a cluster if the one first selected from that cluster is a failure.

2.4. TCP Problem in a Smart Mall Scenario

Figure 1 is a simple distribution diagram of mobile devices for a smart mall scenario. In the figure, a wireless transmitter icon represents a smart mobile device of the kind studied in this paper. The cloud icon represents central processing. A person with a mobile phone represents a handheld mobile device. Among them, white devices are distributed around specific locations (stores) to monitor and process location-specific information. Black devices are distributed in the middle of the mall to monitor certain types of information and perform distributed information processing. Each mobile device in the mall has its own unique function; that is, its internal intelligent software fulfills specific requirements. Integrated testing of the software in devices throughout the mall therefore becomes extremely complicated. For example, in a case like Figure 2, each restaurant has a mobile device that manages information about this restaurant. Via the Internet, it can monitor the number of incoming customers and remaining seats, the number of dishes, the temperature, etc., and push location preferences, food preferences, etc. to guests (other mobile devices) entering the restaurant. In the hall of the mall, there are restaurant-proxy mobile devices that collect real-time restaurant and people data via the wireless network. They also intelligently push the best restaurants at the current moment (vacant, near, etc.) to the mall customers (other mobile devices) via the Internet. This schedules mall customers in real time, which may prevent all customers from crowding in front of one restaurant. All data need to be transmitted to the control center via the wireless network for large-scale data processing.

When discussing the TCP problem in a smart mall scenario, traditional TCP methods can be improved by adding location information when sorting test cases, so that the test order adapts to the new scenario. Mobile devices are placed at different locations, and a device communicates more frequently (interacts functionally more closely) with nearby devices. The correlation between the functions of the intelligent software attached to two devices is therefore stronger or weaker depending on the distance between the devices. As shown in Figure 1, the software functions of the black device on the left side should have the greatest relationship with the software functions of the three white devices which communicate with this black device. Test cases test software functions of mobile devices. We set the granularity of a test case as testing all of the functional requirements of a mobile device. In this way, the node graph between traditional test cases becomes a node graph between actual mobile devices (Figure 3). In Figure 3, the left ellipse is a test-case node graph where a circle icon indicates a test case, and the right ellipse is a device node graph where a square icon indicates a mobile device. r represents the distance between test cases or devices. The virtual distance between test cases is mapped to the actual distance between devices.

3. Location-Based TCP Using the Law of Gravitation

This section combines test case information, fault information, and location information to propose a new location-based TCP technique using the law of gravitation.

3.1. Test Gravitation

Test gravitation (TG) is introduced to simulate universal gravitation in our method. Test gravitation $F$ between two test cases $t_a$ and $t_b$ is defined as follows:

$$F = G \frac{m(t_a)\, m(t_b)}{r^2} \tag{3}$$

where $G$ is the test gravitational constant, $m(t_a)$ and $m(t_b)$ are the masses of $t_a$ and $t_b$, respectively, and $r$ is the distance between $t_a$ and $t_b$.

$G$ is related to the environment of regression testing. If the criteria for $m$ and $r$ are fixed, $G$ should be unique. In this paper, we do not study its influence on the proposed method, so $G$ is set to 1.

Different attributes of a test case can represent different substances that make up this test case. If the test case detected faults, the attributes of each fault, as another type of substance, also contribute to the test case. The weights of the two types of attributes (substances) together make up the total mass of the test case.

Definition 1. Test case mass $m$. The mass of a test case $t$ is defined as follows:

$$m(t) = \alpha \sum_{i=1}^{p} a_i + (1 - \alpha) \sum_{j=1}^{e} \sum_{i=1}^{q} b_{ji} \tag{4}$$

where $p$ is the total number of attributes of $t$, $a_i$ is the weight of the ith attribute of $t$, $e$ is the total number of faults which $t$ detects, $q$ is the total number of attributes of a fault, $b_{ji}$ is the weight of the ith attribute of the jth fault, and $\alpha$ is a smoothing constant, which satisfies $0 \le \alpha \le 1$.

For instance, in the implementation process, $a_1$ can represent the coverage of a test case and $a_2$ can represent the importance level of a test case; $b_{j1}$ can represent the location level of a fault, $b_{j2}$ can represent the severity of a fault, and so on.

Because faults cannot be known in advance during the actual testing process, there are two ways to obtain faults and their attributes. One way is presetting faults, which can be given based on expert decision or deep learning; the other way is utilizing occurred faults.

Definition 2. Distance $r$. It indicates the distance between two test cases $t_a$ and $t_b$, denoted as $r(t_a, t_b)$.

For instance, $r$ can be calculated according to the business level (tree relationship) between test cases or according to the spatial distance between the devices that the test cases test.

3.2. TG Calculation Model

In a smart mall scenario, according to the above definitions, we design a specific calculation model of TG to prepare for prioritizing test cases. This model calculates a force $F$ from a test case $t$ to a faulted test case $ft$.

(1) Importance level (TI) of a test case is selected as the only attribute of $t$. TI is determined by the functional level of the mobile device (DL) which $t$ tests. The mass of $t$ is

$$m(t) = TI(t) = DL(d_t) \tag{5}$$

where $d_t$ is the device tested by $t$. DL is divided into 5 levels, whose values can follow a linear or a nonlinear assignment.

(2) Fault severity (FS) is selected as the only attribute of a fault $f$. The values of FS and TI (DL) compose $m(ft)$; that is,

$$m(ft) = \alpha \cdot DL(d_{ft}) + (1 - \alpha) \sum_{j=1}^{e} FS(f_j) \tag{6}$$

where $d_{ft}$ is the device tested by the faulted test case $ft$ and $f_j$ is the jth fault detected by $ft$. FS is divided into 5 levels, like DL. Occurred faults, detected by preselected test cases in the current test round, are used to create a set of faulted test cases (FTS). The formation of an FTS is described in detail in Section 3.3.

(3) The spatial distance between devices is selected to calculate $r$. $r$ is the 3-dimensional Euclidean distance between the device which a test case tests and the device which a faulted test case tests, as shown in Figure 4. It is defined as follows:

$$r(d_t, d_{ft}) = \sqrt{(x_{d_t} - x_{d_{ft}})^2 + (y_{d_t} - y_{d_{ft}})^2 + (z_{d_t} - z_{d_{ft}})^2} \tag{7}$$

where $d_t$ is the device whose software is tested by $t$, $d_{ft}$ is the device tested by $ft$, and $(x_d, y_d, z_d)$ are the spatial coordinates of a device $d$.

(4) From the above, with $G = 1$, a specific calculation model of TG between a test case $t$ and a faulted test case $ft$ is as follows:

$$F(t, ft) = \frac{DL(d_t) \left[ \alpha \cdot DL(d_{ft}) + (1 - \alpha) \sum_{j=1}^{e} FS(f_j) \right]}{r(d_t, d_{ft})^2} \tag{8}$$

where $d_t$ is the device whose software is tested by $t$, $d_{ft}$ is the device tested by $ft$ which detected $e$ faults, and $f_j$ is the jth fault detected by $ft$.

3.3. Faulted Test Case Set

Occurred faults, which are detected by some preselected test cases in the current test round, are used as the faults mentioned in Section 3.2, so collecting these faults to create an FTS is an important step. Faults attach to devices, so we use clusters of devices to obtain an FTS. Algorithm 1 describes the clustering process of devices. Euclidean distance is used as the dissimilarity metric.

Algorithm 1: Clustering of devices.
Input: D = {d1, d2, ..., dn}, k //a device set, and the number of clusters
Output: C //a set of k clusters
C = ∅;
put each di ∈ D as a cluster ci;
add all clusters ci into C; //Initialization: every device forms its own cluster
Do //Iteration: make clusters merge.
  For each ci, cj ∈ C
    If (ci and cj have the minimum 3-dimensional Euclidean distance)
      merge ci and cj into a new cluster cij;
      delete ci and cj from C;
      C = C ∪ {cij};
    End if
  End for
Until the number of clusters in C is k //Break condition
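
The merging loop of Algorithm 1 can be sketched in Python as follows. The pseudocode does not state how the distance between two multi-device clusters is measured, so this sketch assumes single linkage (the smallest distance between any two members); the function name and the device coordinates are hypothetical.

```python
import math
from itertools import combinations


def cluster_devices(positions, k):
    """Agglomeratively merge devices into k clusters.

    positions maps a device id to its (x, y, z) coordinates.
    """
    if not 1 <= k <= len(positions):
        raise ValueError("k must be between 1 and the number of devices")

    def linkage(a, b):
        # Assumed single linkage: closest pair of members across two clusters.
        return min(math.dist(positions[p], positions[q]) for p in a for q in b)

    clusters = [{d} for d in positions]  # every device starts as its own cluster
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # merged cluster replaces the closest pair
    return clusters


# Hypothetical device coordinates.
positions = {
    "d1": (0, 0, 0),
    "d2": (6, 8, 0),
    "d3": (6, 12, 0),
    "d4": (150, 0, 0),
    "d5": (160, 0, 5),
}
clusters = cluster_devices(positions, k=2)
print(sorted(sorted(c) for c in clusters))
# [['d1', 'd2', 'd3'], ['d4', 'd5']]
```

Other linkage choices (complete or centroid linkage) fit the same loop by swapping the `linkage` helper.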

After the devices are clustered, one device is selected randomly from each cluster as the representative of this cluster. Test cases that test these representatives are put into a test subset ST, and ST is executed. If faults occur, the test cases which detected these faults are combined into an FTS. Algorithm 2 shows the pseudocode of this process.

Algorithm 2: Creation of a faulted test case set.
Input:
C //a set of device clusters
T //a set of test cases
Output:
FTS //a set of faulted test cases
FTS = ∅;
While (FTS = ∅) do
  ST = ∅;
  For each c ∈ C
    Randomly select one d (i.e., ds) from c;
    Select the test case (i.e., ts) in T which tests the software of ds;
    ST = ST ∪ {ts};
  End for
  For each t ∈ ST
    Execute t;
    If (t detects faults) then
      put t into FTS;
    End if
  End for
End while
Return FTS
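
A hedged Python sketch of the sampling loop in Algorithm 2 is given below. The names `build_fts`, `test_of`, and `run_test` are hypothetical, and the `max_rounds` guard is an addition of this sketch (the pseudocode loops until at least one fault is found) so that the example always terminates.

```python
import random


def build_fts(clusters, test_of, run_test, rng=None, max_rounds=100):
    """Sample one representative per cluster until some test detects a fault.

    clusters: iterable of sets of device ids
    test_of:  dict mapping a device id to the test case that tests it
    run_test: callable returning the list of faults a test case detects
    """
    rng = rng or random.Random()
    for _ in range(max_rounds):
        # One randomly chosen representative device per cluster forms ST.
        subset = [test_of[rng.choice(sorted(c))] for c in clusters]
        fts = [t for t in subset if run_test(t)]
        if fts:
            return fts
    return []  # no fault found within max_rounds (guard added by this sketch)


# Hypothetical setup: five devices, one test case each, only t2 finds faults.
clusters = [{"d1", "d2", "d3"}, {"d4", "d5"}]
test_of = {f"d{i}": f"t{i}" for i in range(1, 6)}
faults = {"t2": ["f1", "f2"]}
fts = build_fts(clusters, test_of, lambda t: faults.get(t, []),
                rng=random.Random(1))
print(fts)  # ['t2']
```
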
3.4. Location-Based TCP Using the Law of Gravitation Algorithm

Definition 3. Test case priority $P$. It indicates the priority of a test case in the execution order. The priority is defined as follows:

$$P(t) = \sum_{i=1}^{n} F_i \tag{9}$$

where $n$ is the number of faulted test cases and $F_i$ is the force $F$ of this test case to the ith faulted test case. The larger the value of $P$ is, the earlier this test case will execute.

Algorithm 3 shows the location-based TCP approach using the law of gravitation. Its input is a test suite T. Its output is the prioritized test order T′. First, m of each test case t is calculated according to the level of the mobile device which t tests. Second, a faulted test case set FTS is created according to Algorithms 1 and 2, and m of each faulted test case ft is calculated based on both FS and DL. Third, the distance r between each t and each ft is calculated according to the location information of devices. Fourth, for each t, the force F is computed from this t to each ft. Fifth, the priority P of each t is calculated according to F. Finally, test cases are sorted in descending order of P to obtain a prioritized test execution order T′.

Algorithm 3: Location-based TCP using the law of gravitation.
Input: Test suite T
Output: Prioritized test suite T′
General process:
Begin
  Calculate m of each t in T
  Cluster devices according to Algorithm 1
  Create an FTS according to Algorithm 2
  Calculate m of each ft in FTS
  for each t in T
    for each ft in FTS
      Calculate r between t and ft
      Calculate F of t to ft
    end for
    Calculate P of t
  end for
  Sort all t in T in descending order of P to obtain the new test execution order T′
  return T′
End
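
The overall procedure can be sketched in Python under several stated assumptions: the priority of a test case is the sum of its forces to all faulted test cases, the mass of a faulted test case is an $\alpha$-weighted mix of device level and summed severities, and the FTS is passed in rather than built by clustering. The text does not say how a zero distance (a faulted test case compared with itself) is handled, so the `self_distance` parameter is a hypothetical workaround. All input values are illustrative and are not taken from Table 1 or Table 2.

```python
import math


def l_tcp(devices, fts_faults, alpha=0.5, g=1.0, self_distance=1.0):
    """Return (prioritized order, priority per test case).

    devices:    test case -> (device level, (x, y, z) position of its device)
    fts_faults: faulted test case -> list of severities of the faults it found
    """
    def mass_ft(ft):
        level, _ = devices[ft]
        return alpha * level + (1 - alpha) * sum(fts_faults[ft])

    priority = {}
    for t, (level, pos) in devices.items():
        total = 0.0
        for ft in fts_faults:
            # Assumption: a zero distance is replaced by self_distance.
            r = math.dist(pos, devices[ft][1]) or self_distance
            total += g * level * mass_ft(ft) / r ** 2
        priority[t] = total
    order = sorted(priority, key=priority.get, reverse=True)
    return order, priority


# Hypothetical device levels and coordinates for five test cases.
devices = {
    "t1": (3, (0, 0, 0)),
    "t2": (5, (6, 8, 0)),
    "t3": (1, (6, 12, 0)),
    "t4": (1, (156, 8, 0)),
    "t5": (2, (6, 208, 0)),
}
order, priority = l_tcp(devices, {"t2": [1, 3]})
print(order)  # ['t2', 't3', 't1', 't5', 't4']
```

Nearby, high-level devices end up first: `t3` outranks `t1` despite its lower level because its device sits much closer to the faulted one.
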
3.5. Example for Simulating Smart Mall

We simulate a smart mall scenario with Figure 5 to explain how to prioritize test cases. There are five mobile devices (d1–d5) in the figure, shown by squares. Each device is tested by a test case for its internal intelligent software functions. So, there are five test cases (t1–t5), shown by circles. Assume that the devices are grouped into 2 clusters. Devices d2 and d4 are drawn randomly as the cluster representatives, and a subset ST = {t2, t4} is formed. After ST is run, two faults (f1 and f2) are found by t2, which are shown by stars in the figure. A dashed line in the figure shows the 3-dimensional Euclidean distance between two devices.

In the above example, let us consider a test case prioritization problem defined over a set of five test cases with a set of one faulted test case, FTS = {t2}, from Table 1. From Figure 5, according to the location of devices, the distances between test cases are obtained, as in Table 1. We suppose that all faults (including their severity levels) detected in the current testing round are shown in Table 2.

We take t1 as a sample and calculate the force F of t1 to t2, as F1 = 0.135. According to Equation (9), the priority value of t1 is P1 = 0.135. Similarly, the priorities P of t2, t3, t4, and t5 are P2 = 22.5, P3 = 0.28125, P4 = 0.0002, and P5 = 0.000225. The prioritization order is t2-t3-t1-t5-t4, and the APFDc [28] value of this order is 78.57%. According to Table 2, the optimal prioritization sorts test cases in the order t2-t3-t1-t5-t4 (or t2-t3-t1-t4-t5), whose APFDc value is 78.57%. The random prioritization sorts test cases in one order, t5-t1-t3-t2-t4, whose APFDc value is 41.43%. It can be seen that our location-based TCP using the law of gravitation is effective in this example and even matches the optimal prioritization.

4. Empirical Evaluation

To investigate the effectiveness of the location-based TCP method using the law of gravitation (L-TCP from now on), an empirical evaluation is performed in terms of the following research questions:
(i) RQ1: Is the L-TCP approach more effective in the rate of fault detection than other traditional prioritization techniques? This research question aims at understanding whether the L-TCP method can detect faults earlier than other traditional test case prioritization techniques. To answer this question, this paper applies four traditional TCP techniques for comparison.
(ii) RQ2: When evaluating the efficiency of techniques, do the experimental conclusions differ depending on whether fault severities are considered? Whether or not the severity of a fault is considered will undoubtedly make a difference in the judgment of the prioritization effect. This research question mainly discusses the influence of the two evaluation metrics on the experimental conclusions.
(iii) RQ3: In addition to location information, what other factors is the prioritization efficiency related to? In the smart mall scenario, the test prioritization efficiency of software may be related to the information of mobile devices. This research question combines the analyses of the above two questions to discuss factors that influence the efficiency of prioritization.

4.1. Object

The object used in this experimental study is a real industrial project for chip testing with approximately 140,000 lines of code (LOC) in total. It has many versions, and each version has a few requirements. The test suite of each version is relatively small. The granularity of test cases is coarse. That is to say, each test case may contain dozens or even hundreds of test scripts, but it tests only one chip function (requirement). These features can be used to simulate test data for a smart mall scenario. First of all, the functions of the hardware chip are similar to those of smart devices, so the characteristics of the faults may be similar, too. Second, each test case covers only one specific requirement, which can simulate testing one mobile device. Third, there are many rounds (versions) of regression testing, and new test cases are introduced in each round, which can simulate a step-by-step integration testing environment for the smart mall scenario. The project data include the number of test cases, the functional requirements covered by test cases, the faults detected by test cases, the fault severities, and so on. We use this project data as a basis and then simulate the distance data between devices. Together, they form the complete data set required for this experiment. Six versions are chosen for this experiment. The basic information is shown in Table 3.

4.2. Variables and Measures
4.2.1. Independent Variables

To address our research questions, one independent variable is manipulated: test case prioritization technique. Besides our proposed L-TCP approach, the following traditional test case prioritization approaches are also implemented for comparison.
(i) Random (R): this technique uses random prioritization to order test cases without using location information of devices in prioritization.
(ii) Total coverage (TC): this technique uses total coverage prioritization to order test cases without using location information of devices in prioritization.
(iii) Additional coverage (AC): this technique uses additional coverage prioritization to order test cases without using location information of devices in prioritization.
(iv) Requirement prioritization (PORT): this technique uses prioritization of requirements for test to order test cases without using location information of devices in prioritization.

4.2.2. Dependent Variable and Metric

Details on the measures for the dependent variables of these experiments are given here.

APFD. To measure how rapidly a prioritized test suite detects faults, average percentage of fault detection (APFD) is used as the dependent variable.

APFD [28], the weighted average of the percentage of faults detected, focuses on the rate of fault detection during the testing life of a test suite. It assumes that all fault severities are equivalent. The equation of APFD is as follows:

$$APFD = 1 - \frac{TF_1 + TF_2 + \cdots + TF_m}{nm} + \frac{1}{2n} \tag{10}$$

where $n$ is the number of test cases, $m$ is the number of faults, and $TF_i$ is the index of the first test case that reveals the ith fault in the execution order T. The value of APFD varies from 0 to 100%. Since $n$ and $m$ are fixed for any order, a higher APFD value indicates that the faults are detected earlier during the testing process.

APFDc. When considering fault severities, we use APFDc [28], the (cost-cognizant) weighted average percentage of faults detected, to reward test case orders proportionally to their rate of units of fault severity detected. We assume that test case costs are identical. The equation of APFDc is simplified as follows:

$$APFD_c = \frac{\sum_{i=1}^{m} f_i \left( n - TF_i + \frac{1}{2} \right)}{n \sum_{i=1}^{m} f_i} \tag{11}$$

where $f_i$ is the severity of the ith fault and the other symbols have the same definition as in the equation of APFD.
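
Both metrics can be computed with a few lines of Python. The sketch below uses the standard APFD formula and the equal-cost simplification of APFDc, and it assumes that every fault is revealed by at least one test case in the order. The fault matrix and severities are hypothetical illustrations and are not the contents of Table 2.

```python
def first_detection_index(order, detects):
    """Map each fault to the 1-based position of the first test revealing it."""
    tf = {}
    for idx, t in enumerate(order, start=1):
        for fault in detects.get(t, ()):
            tf.setdefault(fault, idx)
    return tf


def apfd(order, detects):
    """APFD with all faults treated as equally severe."""
    tf = first_detection_index(order, detects)
    n, m = len(order), len(tf)
    return 1 - sum(tf.values()) / (n * m) + 1 / (2 * n)


def apfdc(order, detects, severity):
    """APFDc with identical test costs and per-fault severities."""
    tf = first_detection_index(order, detects)
    n = len(order)
    total = sum(severity[f] for f in tf)
    return sum(severity[f] * (n - tf[f] + 0.5) for f in tf) / (n * total)


# Hypothetical fault matrix: test case -> faults it reveals, plus severities.
detects = {"t1": ["f3"], "t2": ["f1", "f2"], "t3": ["f4"]}
severity = {"f1": 1, "f2": 3, "f3": 1, "f4": 2}

good = ["t2", "t3", "t1", "t5", "t4"]
poor = ["t5", "t1", "t3", "t2", "t4"]
print(round(apfd(good, detects), 4), round(apfdc(good, detects, severity), 4))
# 0.75 0.7857
print(round(apfd(poor, detects), 4), round(apfdc(poor, detects, severity), 4))
# 0.45 0.4143
```

In this illustration the two metrics rank the orders the same way, but APFDc rewards the first order more strongly because the most severe fault is revealed by its first test case.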

4.3. Case Study Design

Suppose that there are many smart mobile devices in a mall and each device is responsible for its own unique functions. We now need to test the functionality of their internal intelligence software. We assume that one test case is in charge of testing one device, and it tests all of the software functionality of this device.

First, data are collected. To conduct the comparative experiments, six types of data are required: test cases, levels of test cases (devices), faults, severities of faults, distances between devices, and coverage information.

The preparation of test case, fault, severities of faults, and coverage information is trivial because it is already available in the original data of the object system. For the preparation of the levels of test cases (devices), we grade test cases according to their name description. The preparation of the distance between mobile devices requires us to give them values by simulating the smart mall scenario.

Second, the test case prioritization techniques are performed. This experimental study implements five approaches (TC, AC, R, PORT, and L-TCP) for comparison. Because some prioritization techniques are nondeterministic, each technique is run 20 times for each experiment and the average values are presented as results. The smoothing constant $\alpha$ is set to 50%.

Third, APFD and APFDc are calculated for each prioritized test order from each technique. All measured values are compared across the different techniques. The results emerging from this comparison are presented in Section 4.4.

All the experiments are conducted on the same computer, which is configured with a 64-bit Windows 8 operating system, an Intel(R) Core(TM) i3-2130 CPU, and 4 GB of memory.

4.4. Results and Analysis

In this section, we present the results of the experiment(s) and analyze their relevance to our research questions above.

4.4.1. RQ1: Comparison with Traditional TCP Techniques

Figure 6 shows the box plots of five techniques across all the system versions of ChipTest. The horizontal axis shows versions, and each box in a version presents one TCP technique. The vertical axis presents APFD values. Each boxplot shows the median, upper/lower quartile, and max/min APFD values achieved by a technique.

As the box plots show, L-TCP clearly outperforms the other techniques in terms of APFD because its median is the highest. After L-TCP, PORT performs best, with a higher median than the remaining techniques. TC, AC, and R have a similar effect, and their median APFD values lie approximately between 40% and 65%.

For instance, consider the data of v-9. We use M(Median, Q1, Q3) to denote the median, first quartile, and third quartile of the APFD values of each technique, and M1-M5 to denote the five techniques TC, AC, R, PORT, and L-TCP, respectively. The results of the five techniques are M1(38.89, 29.37, 48.02), M2(34.13, 29.37, 43.25), M3(38.1, 31.35, 46.43), M4(45.24, 40.88, 46.83), and M5(78.57, 78.57, 78.57), respectively, which clearly indicates that L-TCP performs better overall than the others.
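A summary of the form M(Median, Q1, Q3) can be obtained from the APFD values of repeated runs, for example with Python's standard `statistics` module as sketched below. The sample values are hypothetical, and the quartile convention (here the "inclusive" method) is an assumption, since different statistical tools interpolate quartiles differently.

```python
from statistics import median, quantiles


def summarize(apfd_values):
    """Return (median, Q1, Q3) of the APFD values of one technique."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(apfd_values, n=4, method="inclusive")
    return median(apfd_values), q1, q3


# Hypothetical APFD values (in percent) from five runs of one technique.
runs = [31.0, 36.5, 38.9, 41.2, 47.6]
m, q1, q3 = summarize(runs)  # (38.9, 36.5, 41.2)
```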

To evaluate the confidence level of the observed results, we test their statistical significance. First, a one-sample Kolmogorov-Smirnov (K-S) test is used to check whether the APFD values of each technique, obtained from 120 executions (20 runs × 6 versions), follow a normal distribution. The significance level is 0.05. The results are given in Table 4: the first row lists the TCP approaches mentioned above, the second row gives the normality decision for TC, AC, R, PORT, and L-TCP, and the third row gives their significance probabilities (p values) under the null hypothesis.

According to Table 4, the null hypothesis of normality is not rejected for any technique (the p values are all greater than 0.05). Therefore, the APFD values of the five prioritization techniques can be treated as normally distributed.
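The normality check described above can be sketched with the Python standard library alone. The code below computes the one-sample K-S statistic against a normal distribution fitted to the sample and converts it into an approximate p value with the asymptotic Kolmogorov series. It is an illustration under stated assumptions, not the tool used in this study: the function names and the synthetic data are hypothetical, and because the mean and standard deviation are estimated from the same sample, the resulting p value tends to be conservative (too large).

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def ks_asymptotic_p(d, n, terms=100):
    """Approximate p value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the p value is numerically 1 for such small statistics
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * series))


# Synthetic, evenly spread "APFD values" for 120 executions (not real data).
synthetic = [NormalDist(55, 12).inv_cdf((i - 0.5) / 120) for i in range(1, 121)]
d = ks_statistic_normal(synthetic)
p = ks_asymptotic_p(d, len(synthetic))
looks_normal = p > 0.05  # normality is not rejected at the 0.05 level
```

In practice a statistics package would normally be used for this test; the sketch only makes the decision rule explicit.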

Next, the paired-samples t-test is employed to obtain sufficient statistical evidence. Let f1 and f2 denote the APFD values of the test orders produced by two prioritization approaches, respectively.

The following two hypotheses are considered:

H0: f1 = f2, if two techniques have the same effectiveness in the rate of fault detection.

H1: f1 > f2, if f1 is significantly better than f2.

If the p value is less than the significance level of 0.05, we can reject the null hypothesis and accept the alternative hypothesis.

Table 5 reports the results of the statistical testing on the data from the 120 executions. The results show that L-TCP is statistically significantly better than the other TCP techniques because its t values are greater than 0 and its p values are less than 0.05.

For instance, in the comparison with TC, the p value of L-TCP equals 0.000 and its t value equals 22.947, so we can reject the null hypothesis that L-TCP and TC have the same effectiveness in the rate of fault detection and accept the alternative hypothesis that L-TCP is significantly better than TC.
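The paired-samples t statistic used in this comparison is the mean of the pairwise differences divided by its standard error. The following sketch computes it with the Python standard library. The function names and the sample values are hypothetical, and the one-sided p value uses a normal approximation to the t distribution, which is only reasonable for large samples such as the 120 paired executions here (119 degrees of freedom); an exact p value would require the t distribution from a statistics package.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(f1, f2):
    """Return (t, degrees of freedom) for paired samples f1 and f2."""
    if len(f1) != len(f2) or len(f1) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(f1, f2)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def approx_one_sided_p(t):
    """Large-sample approximation of P(T > t) for H1: f1 > f2."""
    return 1.0 - NormalDist().cdf(t)


# Hypothetical APFD values of two techniques on the same four executions.
t, df = paired_t([3.0, 4.0, 5.0, 6.0], [1.0, 1.0, 2.0, 4.0])
# The differences are 2, 3, 3, 2, so t = 2.5 / (0.577 / 2), about 8.66, df = 3.
```

A positive t together with a p value below 0.05 corresponds to rejecting H0 in favor of H1.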

4.4.2. RQ2: Effects of Different Evaluation Metrics

Among the metrics of fault-detection rate, APFD does not consider fault severities, whereas APFDc does.
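To make the difference concrete, the sketch below computes APFDc following the commonly used cost-cognizant definition: each fault contributes its severity multiplied by the test cost remaining from its first detecting test case onward, minus half the cost of that test case, and the sum is normalized by total cost times total severity. The function name, the data layout, and the toy costs and severities are assumptions for illustration. With unit costs and unit severities the value coincides with APFD.

```python
def apfd_c(order, faults_detected_by, cost, severity):
    """Compute APFDc for one prioritized test order.

    order: list of test case ids, highest priority first.
    faults_detected_by: dict test case id -> set of detected faults.
    cost: dict test case id -> execution cost.
    severity: dict fault id -> severity.
    Assumes every fault of interest is detected by some test in the order.
    """
    first_index = {}  # fault -> 0-based index of first detecting test case
    for index, test in enumerate(order):
        for fault in faults_detected_by.get(test, ()):
            first_index.setdefault(fault, index)
    total_cost = sum(cost[t] for t in order)
    total_severity = sum(severity[f] for f in first_index)
    numerator = 0.0
    for fault, index in first_index.items():
        remaining = sum(cost[t] for t in order[index:])
        numerator += severity[fault] * (remaining - 0.5 * cost[order[index]])
    return numerator / (total_cost * total_severity)


# Toy data: f2 is three times as severe as f1; all test costs are equal.
detects = {"t1": {"f1"}, "t2": {"f1", "f2"}}
cost = {"t1": 1.0, "t2": 1.0, "t3": 1.0}
severity = {"f1": 1.0, "f2": 3.0}
late = apfd_c(["t1", "t2", "t3"], detects, cost, severity)   # 7/12
early = apfd_c(["t2", "t1", "t3"], detects, cost, severity)  # 10/12
```

Running the test case that reveals the severe fault first raises APFDc, even though both orders detect the first fault with their first test case.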

Figure 7 shows the APFD and APFDc distributions of the different techniques in the different versions. As can be seen from Figure 7, L-TCP has the highest values in both the APFD and APFDc evaluations. That is to say, the prioritization effect of L-TCP is the best of all the techniques, regardless of whether fault severities are considered in the evaluation. In addition, the trend of the other techniques is similar in both evaluations (except in version 9 and version 11); that is, PORT is the second-best technique after L-TCP. In version 9, the APFD evaluation shows that the effects of TC, AC, R, and PORT are similar, as shown in Figure 7(a). However, in the APFDc evaluation, PORT is significantly better than the other three techniques in version 9, as shown in Figure 7(b). In version 11, the APFD evaluation shows that PORT is significantly better than the others, but in the APFDc evaluation, AC is slightly better than PORT.

Therefore, the results show that considering fault severities does not affect the conclusion of RQ1, that is, L-TCP is superior to the other techniques. Only the margin by which L-TCP outperforms them varies across the metrics.

4.4.3. RQ3: Factors Affecting Prioritization Efficiency

From the analysis of the results of RQ1 and RQ2, it can be seen that L-TCP is the best technique for improving the rate of fault detection. After L-TCP, PORT is the second-best-performing technique.

In-depth analysis shows that, first of all, according to the characteristics of the test data, L-TCP mainly affects the prioritization efficiency through the location information of devices. That is, in the smart mall scenario, location information is the main factor affecting the test order efficiency of intelligent software embedded in mobile devices. Second, in this experiment, PORT sorts test cases according to the priority of the software functions of mobile devices. The functional priority of a device determines the level of the device, and the device level determines the mass of the test case that tests this device. In summary, in the smart mall scenario, the test gravitation calculation model considers both device location information and device level, which are the main factors that influence the test prioritization efficiency. This may be the reason why L-TCP achieves better sorting results in this smart mall scenario.

4.5. Threats to Validity

In terms of internal validity, the choice of the smoothing constant can affect the results. In this paper, this parameter is selected by equalization, that is, it is set to 50%. Further investigations can study the effect of the smoothing constant.

The threats to external validity come from the object, its test data, and its faults used in this experimental study. To reduce the threat from the object, we select a system that tests chips as the experimental object, because it is relatively close to the simulated scenario. Moreover, we select multiple successive versions (6 versions) for the experiments to simulate step-by-step integration testing in a smart mall scenario. The second external threat lies in the test data of this object. Although the data are relatively realistic, they are not complete enough for the research in this paper. For incomplete data (such as the missing distances between devices), we supplement the data by simulation according to the scenario. The third external threat is the faults. We use real faults in order to be closer to the real scenario.

The threat to construct validity lies in whether the experimental results are measured in a correct way. To reduce this threat, first, APFD is used to measure the effectiveness of a prioritized test case order, since APFD measures the rate of fault detection and has been widely used in the evaluation of the test case prioritization problem. Second, APFDc is also used to measure the rate of units of fault severity detected, since it considers fault severities.

5. Related Work

Test case prioritization has been an active research field for nearly two decades. Rothermel et al. [13] first proposed a complete definition of the TCP problem, which is to find a permutation of a test suite T that maximizes some objective function. They focused on code-coverage-based TCP methods at the code level [13–16]. In 2001, Elbaum conducted specific research on TCP metrics, including APFD and APFDc [28]. The APFD metric assumes that all faults have the same severity and all test cases have equal costs. APFDc, the units of fault severity detected per unit of test cost, takes both test case costs and fault severities into account. Their study was primarily focused on white-box testing rather than black-box testing. Zhang et al. [29] incorporated requirement priorities into TCP and proposed an algorithm called TCP_RP_TC. This prioritization technique must predict requirement priorities and test costs before test suite execution, but such prediction is difficult in practice. Chu-Ti et al. [30] presented a history-based TCP method with software version awareness. Yuchi et al. [31] designed and analyzed TCP using weight-based methods for GUI applications. Garg and Datta [32, 33] applied test case prioritization to web applications based on modified functionalities or database changes. Saha et al. [34] proposed a fully automated and lightweight test prioritization approach (REPiR), which reduces regression test prioritization to a standard information retrieval problem in which the differences between two program versions form the query and the tests constitute the document collection. Some researchers [35] focused on test case prioritization based on mutation analysis, which is effective but expensive. Another refactoring-based approach (RBA) was proposed by Alves et al. [36]; it reorders an existing test sequence using a set of refactoring fault models and promotes early detection of refactoring faults. Wang, Ali, et al. [37] proposed a resource-aware multiobjective optimization solution with a fitness function defined on four cost-effectiveness measures. Zhai et al. [38, 39] proposed prioritizing test cases for the testing of location-aware services, bringing service selection into a test case prioritization technique for testing location-based web services.

Mobile application testing is a research direction for testing on mobile devices. However, most mobile application testing focuses on performance testing or on stand-alone testing, which treats the software of a mobile device as stand-alone software. Gao et al. [40] provided a general tutorial on mobile application testing that first examined testing requirements and then looked at current approaches for both native and Web apps on mobile devices. Muccini, Di Francesco, and Esposito [41] investigated new research directions in mobile application testing automation by answering three research questions. For the first research question (RQ1), "are mobile applications (so) different from traditional ones, so to require different and specialized new testing techniques?", the natural answer seems to be yes, they are. For RQ2, "what are the new challenges and research directions on testing mobile applications?", the challenges seem to be many and are related to the contextual and mobile nature of mobile applications. For RQ3, "which is the role automation may play in testing mobile applications?", some potential for automation has been outlined, with the caveat that a much deeper and more mature study is still needed. Dantas et al. [42] proposed a set of testing requirements, elicited from an extensive study of how the testing process for mobile applications is carried out in the literature and in practice. Morla and Davies [43] created a test environment that supports the evaluation of key aspects of location-based applications without the extensive resource investment necessary for a full application implementation and deployment. Zhang and Adipat [44] proposed a generic framework for conducting usability tests for mobile applications by discussing research questions, methodologies, and usability attributes. Vilkomir [45] evaluated the effectiveness of coverage approaches for selecting mobile devices (i.e., smartphones and tablets) to test mobile software applications. Amalfitano et al. [46] addressed the problem of testing a mobile app as an event-driven system by taking into account both context events and GUI events. Kim, Choi, and Wong [47] proposed a method to support performance testing at the unit test level that utilizes a database established through benchmark testing in an emulator-based test environment.

6. Conclusion

This paper proposes a location-based TCP approach using the law of gravitation (L-TCP). It introduces test gravitation, which combines three factors (test case information, fault information, and location information) to prioritize test cases. Test case information involves the level of the mobile device. Fault information includes the severity of the fault. In addition, we use faults that have occurred to create a faulted test case set, which is obtained in three steps: device clustering, test subset extraction, and running preselected test cases. Location information involves the actual location of devices and is used to calculate the 3-dimensional Euclidean distance between two devices. Finally, the paper experimentally verifies the effectiveness of the L-TCP technique in comparison with several traditional test case prioritization techniques.

The experimental results show that the median APFD value of L-TCP reaches 78.57% (in version 9), which is higher than the values of the baseline methods. In the paired-samples t-test, the t values of L-TCP are greater than 0 and its p values are less than 0.05. Specifically, (1) compared with TC, the p value of L-TCP equals 0.000 and its t value equals 22.947; (2) compared with AC, the p value equals 0.000 and the t value equals 21.728; (3) compared with R, the p value equals 0.000 and the t value equals 25.486; and (4) compared with PORT, the p value equals 0.000 and the t value equals 28.295. These results indicate that L-TCP is statistically significantly better than the other TCP techniques and that it can detect more faults than the others within the same testing time.

Considering fault severities in the evaluation does not affect the conclusion that L-TCP is superior to the other techniques. Only the margin by which it outperforms them varies across the metrics. In the smart mall scenario, the location information of devices is the main factor that influences the prioritization performance. Furthermore, the level of devices is also important.

The next step is to expand the scope of the empirical evaluation in order to make the conclusions more accurate. Moreover, how to choose an appropriate value for the smoothing constant is also a direction for future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant nos. 61572306 and 61502294), the IIOT Innovation and Development Special Foundation of Shanghai (grant no. 2017-GYHLW-01037), and the CERNET Innovation Project (grant nos. NGII2017051 and NGII20170206).

Supplementary Materials

The ChipTest Data.xlsx file contains the data of the project used in the experimental study of this paper. The results.xlsx file contains the detailed results of the experimental study. The result analysis.xlsx file includes the original graphs of the results analysis, which are also shown in this paper.