Abstract

In this paper, a two-stage consistent estimator for the change point in the mean of panel data is proposed. First, a single series is extracted, and an initial estimator and a confidence interval for the change point are obtained by the least squares method. Based on this confidence interval, a random interval that contains the change point with probability tending to 1 is constructed. Second, using all panel data falling into the random interval, the final estimator of the change point is obtained by least squares estimation. The asymptotic distribution of the estimator is established. Simulation results show that the method maintains estimation accuracy while greatly reducing time complexity.

1. Introduction

One of the tools for analysing large, high-dimensional datasets is the panel data model. This paper studies structural changes in panel data with N series (variables), each of which has T observations. It is assumed that exactly one change takes place in each series, at an unknown point shared by all series, referred to as the common change point. Common change points in panel data are widespread. For example, an epidemic outbreak may affect the GDP of every country, and a change in tax policy may alter the investment incentives of every firm. While it may be difficult to identify a change point from a single series, it should naturally be much easier to locate the common change point using many series together. This paper explores the panel data approach to the estimation of the change point.

Joseph and Wolfson [1, 2] were among the early researchers who laid the groundwork for change point analysis in panel data. They proposed a random change point model in which each series has its own change point; across the N series, the change points are assumed to be independent and identically distributed (i.i.d.). They proved that the common distribution of the i.i.d. change points can be consistently estimated. Joseph et al. [3] extended this random change point model to the autoregressive model, and Joseph et al. [4] considered a Bayesian framework. Skates et al. [5] and Jackson and Sharples [6] studied application-oriented Bayesian models. Bai [7] established the consistency of the least squares estimator of the common change point in panel data. Horváth and Hušková [8] and Shin and Hwang [9] used CUSUM methods to test for a change point in the mean of panel data. Bai [7] used ratio-type statistics to detect change points in panel data.

Computer-based technology allows scientists to collect enormous datasets, and such data demand new methodology. For massive data, estimating the location of a change point both quickly and accurately has become a practical problem. Cao and Xia [10] considered a fast estimation method for a univariate sequence. This paper proposes a two-stage method to locate the change point in the mean of panel data and proves the consistency of the resulting estimator.

2. Model and Assumptions

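We assume that we study $N$ panels and that each panel contains $T$ observations. We define our model as
$$X_{it}=\begin{cases}\mu_{i1}+\varepsilon_{it}, & t=1,\ldots,k_0,\\ \mu_{i2}+\varepsilon_{it}, & t=k_0+1,\ldots,T,\end{cases}\qquad (1)$$
for $i=1,\ldots,N$. In this model, each series has a change point at $k_0$, where $k_0$ is unknown and $k_0=[\tau_0 T]$ for some $\tau_0\in(0,1)$. The prechange mean of $X_{it}$ is $\mu_{i1}$, and the postchange mean is $\mu_{i2}$. The difference $\delta_i=\mu_{i2}-\mu_{i1}$ represents the magnitude of the change in the $i$th panel; it can be either random or nonrandom and is assumed to be independent of the error process $\{\varepsilon_{it}\}$. In this paper, we assume that all series share the common change point $k_0$. Our purpose is to give a consistent estimator of $k_0$ with lower time complexity.

For a given $k$ such that $1\le k<T$, define
$$\bar X_{ik}=\frac{1}{k}\sum_{t=1}^{k}X_{it},\qquad \bar X_{ik}^{*}=\frac{1}{T-k}\sum_{t=k+1}^{T}X_{it}.$$

So, $\bar X_{ik}$ and $\bar X_{ik}^{*}$ are estimators for $\mu_{i1}$ and $\mu_{i2}$, respectively. The classical least squares estimator for $k_0$ in Peštová and Pešta [11] is defined as
$$\hat k=\arg\min_{1\le k<T}S_{NT}(k),\qquad (3)$$
where
$$S_{NT}(k)=\sum_{i=1}^{N}\left[\sum_{t=1}^{k}\left(X_{it}-\bar X_{ik}\right)^2+\sum_{t=k+1}^{T}\left(X_{it}-\bar X_{ik}^{*}\right)^2\right].$$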

This estimator is straightforward to compute, and its time complexity is . When $N$ and $T$ are huge, however, locating the change point in this way is no longer easy. In this paper, a two-stage estimation method is proposed to reduce the time complexity.
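Because the total sum of squares does not depend on $k$, minimising the pooled residual sum of squares in (3) is equivalent to maximising $\sum_i\left[S_{ik}^2/k+(S_{iT}-S_{ik})^2/(T-k)\right]$, where $S_{ik}=\sum_{t\le k}X_{it}$. The Python sketch below uses this identity to evaluate (3) in one pass per series, i.e. with work of order $NT$. It is an illustration, not the authors' code; the function names `panel_ls_change_point` and `simulate_panel`, the zero prechange mean, the common shift `delta`, and the standard normal errors are assumptions chosen for the example.

```python
import random


def panel_ls_change_point(x):
    """Pooled least-squares change point for a panel x[i][t] (N rows, T columns).

    Returns k in 1..T-1; observations 1..k are treated as prechange.
    """
    t_len = len(x[0])
    # RSS(k) = sum x^2 - S_k^2/k - (S_T - S_k)^2/(T - k). The first term is constant
    # in k, so minimising the pooled RSS means maximising the sum of the other two.
    crit = [0.0] * t_len  # crit[0] is unused
    for row in x:
        total = sum(row)
        partial = 0.0
        for k in range(1, t_len):
            partial += row[k - 1]  # partial = S_ik, the sum of the first k observations
            crit[k] += partial * partial / k + (total - partial) ** 2 / (t_len - k)
    return max(range(1, t_len), key=lambda k: crit[k])


def simulate_panel(n, t_len, k0, delta, seed=0):
    """Hypothetical generator: mean 0 up to k0, mean delta afterwards, N(0, 1) errors."""
    rng = random.Random(seed)
    # 0-based index t >= k0 corresponds to the 1-based times k0+1, ..., T.
    return [
        [(delta if t >= k0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(t_len)]
        for _ in range(n)
    ]


x = simulate_panel(n=50, t_len=200, k0=120, delta=1.0, seed=1)
print(panel_ls_change_point(x))  # expected to be at or very near 120
```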

We adopt the following assumptions from Bai [7].

Assumption 1. are i.i.d. over ; for all . In addition, are independent over . Let .

Assumption 2. .

Assumption 3. is larger than such that as $N$ and $T$ go to infinity.

Assumption 4. , with , .

3. Two-Stage Estimator

3.1. The Initial Estimator

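For any given $i_0\in\{1,\ldots,N\}$, the univariate series $\{X_{i_0t}\}_{t=1}^{T}$ with $T$ observations is selected from the $N$ panels. The initial estimator for $k_0$ is defined as
$$\hat k_{i_0}=\arg\min_{1\le k<T}\left[\sum_{t=1}^{k}\left(X_{i_0t}-\bar X_{i_0k}\right)^2+\sum_{t=k+1}^{T}\left(X_{i_0t}-\bar X_{i_0k}^{*}\right)^2\right],$$
where $\bar X_{i_0k}$ and $\bar X_{i_0k}^{*}$ are the sample means of the first $k$ and the last $T-k$ observations of series $i_0$.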

In fact, the initial estimator is the ordinary least squares estimator of a change point in the mean of a univariate series. Let and . Then, according to Proposition 3 and Theorem 1 in Bai [12], the following conclusions can be drawn, where is a two-sided Brownian motion on $\mathbb{R}$.
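For a single series the least squares criterion can again be evaluated with a running sum, so the initial estimator costs one pass over the $T$ observations of the selected series. A minimal Python sketch follows; the function name `univariate_ls_change_point` and the simulated series (shift of 3 after observation 80, standard normal errors) are assumptions for illustration.

```python
import random


def univariate_ls_change_point(series):
    """Least-squares change point in the mean of one series; returns k in 1..T-1."""
    t_len = len(series)
    total = sum(series)
    best_k, best_val, partial = 1, float("-inf"), 0.0
    for k in range(1, t_len):
        partial += series[k - 1]  # sum of the first k observations
        # Maximising this quantity is equivalent to minimising the two-segment RSS.
        val = partial * partial / k + (total - partial) ** 2 / (t_len - k)
        if val > best_val:
            best_k, best_val = k, val
    return best_k


rng = random.Random(7)
k0, t_len = 80, 200
series = [(3.0 if t >= k0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(t_len)]
print(univariate_ls_change_point(series))
```

A single series carries far less information than the whole panel, so this estimate is only expected to land near $k_0$; that is why the method goes on to build an interval around it rather than stopping here.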

Denote ; is a consistent estimator for , and is the quantile of . Using (7), the confidence interval for is constructed as

Given and , we enlarge the confidence interval (8) to

Define , , and . So,
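The interval construction above can be sketched as follows. The sketch assumes an interval centred at the initial estimate whose half-width is proportional to $\hat\sigma^2/\hat\delta^2$, in the spirit of a confidence interval derived from (7), plus a margin of one observation, then multiplied by a hypothetical enlargement factor `inflate` and clipped to $[1,T-1]$. The function name `random_interval` and the parameters `c` (the quantile, supplied by the user) and `inflate` are assumptions, not the paper's exact formulas.

```python
import math


def random_interval(series, k1, c, inflate=1.0):
    """Hypothetical interval around an initial estimate k1 (requires 1 <= k1 <= T-1).

    Half-width = ceil(inflate * c * sigma2_hat / delta_hat**2) + 1, clipped to [1, T-1].
    """
    t_len = len(series)
    pre, post = series[:k1], series[k1:]
    m1 = sum(pre) / len(pre)
    m2 = sum(post) / len(post)
    delta_hat = m2 - m1  # estimated size of the mean shift
    rss = sum((v - m1) ** 2 for v in pre) + sum((v - m2) ** 2 for v in post)
    sigma2_hat = rss / t_len  # estimated error variance
    if delta_hat == 0.0:
        return 1, t_len - 1  # no detectable shift: fall back to the full range
    half = math.ceil(inflate * c * sigma2_hat / delta_hat ** 2) + 1
    return max(1, k1 - half), min(t_len - 1, k1 + half)


# Noise-free step: sigma2_hat = 0, so only the one-observation margin remains.
print(random_interval([0.0] * 5 + [3.0] * 7, k1=5, c=1.0))  # (4, 6)
```

Taking `inflate` larger than 1 widens the interval so that it covers $k_0$ with higher probability, at the price of more work in the second stage.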

The time complexity of the initial estimator is .

3.2. The Final Estimator

Using all samples falling into the random interval , the final estimator is defined by (11), where

Denote , and then

It is easy to see that the time complexity of the final estimator is . So, the total time complexity of the two-stage method is , which is smaller than because . This implies that the two-stage method locates the change point faster. Furthermore, Theorems 1 and 2 in Section 4 guarantee the accuracy of the estimator.
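The two stages can be combined as in the following Python sketch: a least squares estimate from one series, a window around it, and the pooled least squares criterion recomputed only on the observations inside the window, so that each series contributes work proportional to the window length rather than to $T$. For simplicity the random interval is replaced by a fixed, hypothetical half-width `half_width`; the names `ls_split` and `two_stage_change_point` are likewise assumptions, and the sketch is not the authors' implementation.

```python
import random


def ls_split(rows):
    """Pooled least-squares split for equal-length rows; returns k' in 1..L-1."""
    length = len(rows[0])
    crit = [0.0] * length
    for row in rows:
        total, partial = sum(row), 0.0
        for k in range(1, length):
            partial += row[k - 1]
            crit[k] += partial * partial / k + (total - partial) ** 2 / (length - k)
    return max(range(1, length), key=lambda k: crit[k])


def two_stage_change_point(x, i0, half_width):
    """Hypothetical two-stage estimate of the common change point (1-based k)."""
    t_len = len(x[0])
    # Stage 1: univariate least squares on series i0 only (work of order T).
    k1 = ls_split([x[i0]])
    lo, hi = max(1, k1 - half_width), min(t_len - 1, k1 + half_width)
    # Stage 2: pooled least squares on observations t = lo..hi+1 of every series,
    # so the candidate change points are k = lo..hi (work of order N * window length).
    window = [row[lo - 1:hi + 1] for row in x]
    return lo - 1 + ls_split(window)


rng = random.Random(3)
n, t_len, k0 = 30, 400, 250
x = [[(2.0 if t >= k0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(t_len)]
     for _ in range(n)]
print(two_stage_change_point(x, i0=0, half_width=40))
```

If the window misses $k_0$, the second stage cannot recover it, which is why the interval in the method is chosen to contain $k_0$ with probability tending to 1.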

To prove the properties of the two-stage estimator, we need the following lemmas from Bai [7].

Lemma (A.1). Assume that model (1) and Assumption 1 hold. Then

Lemma (A.2). Assume that model (1) and Assumption 1 hold. For all , the expected value of satisfies , where for some .

4. Theorems and Proofs

The following properties can be obtained.

Theorem 1. Under Assumptions 1–3, we have

Proof. By symmetry, it suffices to consider . According to Lemmas (A.1) and (A.2), for any , since and , we have that . It can be concluded from Assumptions 1 and 2 and (19) that . Thus, . So, for any , for large , with probability tending to 1. Because and , there exists such that as . That is, , where .
Define the set so that excludes from . Then, . By the definition of , . So, a necessary condition for is . Similarly to Lemma A.3 of Bai [7], it can be proved that , which implies that . Thus, . This completes the proof.

Theorem 2. Under Assumptions 1, 3, and 4, as , , where and , , and are i.i.d. standard normal random variables.

Proof. Again, by symmetry, it suffices to consider . For and , . Introduce and let . It follows that . By the definition of , we get . Notice that , and thus . So, . According to Theorem 4.2 of Bai [7], the first and fifth terms are and all the others are .
Because , we have . Similarly, , where . Under Assumption 4, and , where . Thus, the limit of the fifth term of (31) is .
In summary, for , . Similarly, for , we can prove that . Letting , the result follows from the functional central limit theorem.

5. Monte Carlo Comparison

We compare the two estimators (3) and (11) by Monte Carlo simulation on the same computer. The series are generated according to model (1), where , and . Experiments are carried out for , , and . We choose . Table 1 reports the simulation results based on 500 replications, where , , and denote the average estimate, the standard deviation, and the running time in seconds, respectively.

It can be seen from Table 1 that as the number of series $N$ and the sample size $T$ increase, the running times of both the two-stage method and the least squares method grow. When and are fixed, as increases, the two-stage estimator gets closer to the true change point while its running time increases only slightly. The running time of the two-stage method is much shorter than that of the least squares method. This shows that, for massive data, the proposed method locates the change point faster.
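A single cell of such a comparison can be sketched in Python as below: both estimators are applied to the same simulated panels, and the mean estimate, the standard deviation, and the total estimator time (measured with `time.perf_counter`, excluding data generation) are reported. The settings `n`, `t_len`, `k0`, `delta`, `reps`, and the half-width of the window are hypothetical and are not the values behind Table 1; the two-stage variant uses a fixed window instead of the random interval, and pure-Python timings only indicate relative cost.

```python
import random
import statistics
import time


def ls_split(rows):
    """Pooled least-squares split for equal-length rows; returns k in 1..L-1."""
    length = len(rows[0])
    crit = [0.0] * length
    for row in rows:
        total, partial = sum(row), 0.0
        for k in range(1, length):
            partial += row[k - 1]
            crit[k] += partial * partial / k + (total - partial) ** 2 / (length - k)
    return max(range(1, length), key=lambda k: crit[k])


def two_stage(x, i0, half_width):
    """Hypothetical two-stage variant: single-series LS, then pooled LS on a window."""
    t_len = len(x[0])
    k1 = ls_split([x[i0]])
    lo, hi = max(1, k1 - half_width), min(t_len - 1, k1 + half_width)
    return lo - 1 + ls_split([row[lo - 1:hi + 1] for row in x])


def monte_carlo(estimator, n, t_len, k0, delta, reps, seed=0):
    """Return (mean estimate, standard deviation, total estimator time in seconds)."""
    rng = random.Random(seed)
    estimates, elapsed = [], 0.0
    for _ in range(reps):
        # Data generation is excluded from the timing.
        x = [[(delta if t >= k0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(t_len)]
             for _ in range(n)]
        start = time.perf_counter()
        estimates.append(estimator(x))
        elapsed += time.perf_counter() - start
    return statistics.mean(estimates), statistics.stdev(estimates), elapsed


if __name__ == "__main__":
    settings = dict(n=50, t_len=500, k0=300, delta=1.0, reps=10)
    for name, est in [("LS", ls_split), ("two-stage", lambda x: two_stage(x, 0, 60))]:
        mean, sd, secs = monte_carlo(est, **settings)
        print(f"{name:>9}: mean={mean:.2f} sd={sd:.2f} time={secs:.3f}s")
```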

Data Availability

All data are computer simulation data.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (grant nos. 11771353 and 12171391).