Journal of International Money and Finance
20 (2001) 639–657
Size distortions of tests of the null hypothesis
of stationarity: evidence and implications for
the PPP debate
M. Caner a, L. Kilian b, c,*
Department of Economics, University of Pittsburgh, Pittsburgh, PA 15260, USA
Department of Economics, University of Michigan, Ann Arbor, MI 48109-1220, USA
Centre for Economic Policy Research, London, UK
Tests of the null hypothesis of stationarity against the unit root alternative play an increasingly important role in empirical work in macroeconomics and in international finance. We show that the use of conventional asymptotic critical values for stationarity tests may cause extreme size distortions if the model under the null hypothesis is highly persistent. This fact calls into question the use of these tests in empirical work. We illustrate the practical importance of this point for tests of long-run purchasing power parity (PPP) under the recent float.
We show that the common practice of viewing tests of stationarity as complementary to tests of the unit root null will tend to result in contradictions and in spurious rejections of long-run PPP. While the size distortions may be overcome by the use of finite-sample critical values, the resulting tests tend to have low power under economically plausible assumptions about the half-life of deviations from PPP. Thus, the fact that stationarity is not rejected cannot be interpreted as convincing evidence in favor of mean reversion. Only in the rare case that stationarity is rejected do size-corrected tests shed light on the question of long-run PPP. © 2001 Elsevier Science Ltd. All rights reserved.
JEL classification: F31; F41; C15; C22
Keywords: Purchasing power parity; Mean reversion; Finite-sample critical values; Real exchange rates
* Corresponding author. Tel.: +1-734-764-2320; fax: +1-734-764-2769.
E-mail address: email@example.com (L. Kilian).
0261-5606/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0261-5606(01)00011-0
1. Introduction
Recently, there has been increasing interest in tests of the null hypothesis of a level stationary process (or more precisely I(0) process) against the alternative of a difference stationary (or I(1)) process. These tests are widely used in empirical macroeconomics and in international finance both in their own right and as complements to more traditional tests of the unit root null hypothesis (Lewbel, 1996;
Culver and Papell, 1997, 1999; Lee et al., 1997; Collins and Anderson, 1998; Wolters et al., 1998; Kuo and Mikkola, 1999).1 We show that the interpretation of these tests in practice is subject to serious problems that applied users need to be aware of. These problems are of immediate relevance for many questions in international finance, including tests of long-run purchasing power parity (PPP).
The two most widely used tests of the I(0) null hypothesis are due to Kwiatkowski et al. (1992), henceforth the KPSS test, and to Leybourne and McCabe (1994), henceforth the LMC test. Asymptotic critical values for both tests are given in Kwiatkowski et al. (1992). However, these critical values make no distinction between a process that is white noise and a highly persistent stationary process. We provide new evidence that in models with roots close to unity the use of asymptotic critical values may cause extreme size distortions. The existence of such size distortions has not been documented in the previous literature. While Kwiatkowski et al. (1992) report some size results for the KPSS test, their data generating processes are of relevance mainly for annual data. Leybourne and McCabe (1994) report low size distortions for the LMC test in a somewhat more realistic setting, but their favorable simulation results appear to be due to a programming error. In contrast, we provide a comprehensive analysis of the size of both the KPSS test and the LMC test for the regions in the parameter space that are relevant for typical quarterly and monthly processes.
Our findings of severe size distortions are of immediate practical interest. It is well known that the processes of interest in empirical macroeconomics and in international finance tend to be highly persistent even under the null of stationarity (Rudebusch, 1993; Diebold and Senhadji, 1996; Lothian and Taylor, 1996). Since for such processes the KPSS and LMC tests have a tendency to reject the null of stationarity whether it is true or not, we conclude that it is all but impossible to interpret rejections of the stationarity hypothesis in empirical work.
This fact also has important implications for the common practice of testing both the null hypothesis of a unit root and that of stationarity. Evidence against the stationarity null, but not against the unit root null, is typically interpreted as conclusive evidence that the underlying process is difference stationary (Baillie and Pecchenino, 1991;
Cheung and Chinn, 1997; Ely and Robinson, 1997; Moreno, 1997). Our results imply that applied users will tend to accept spuriously the difference stationary model in practice, given the low power of unit root tests. Rejections of stationarity in favor of a unit root process are indeed common in applied work. Moreover, size distortions of stationarity tests may generate contradictory test results with both null hypotheses being rejected.
1 Stationarity tests also have been modified for the purpose of testing the null of cointegration (see Harris and Inder, 1994; Shin, 1994; McCabe et al., 1997). In related work, Caner (1998) discusses applications of stationarity tests at seasonal frequencies.
We illustrate the practical importance of this point for tests of long-run PPP in the post-Bretton Woods period. Our empirical analysis extends recent work by Culver and Papell (1999), who applied the KPSS test to quarterly data under the recent float. We use both the KPSS and the LMC test, and we include monthly real exchange rate data in our analysis. We find that both tests are likely to overstate the evidence against long-run PPP under the recent float. Notably, the LMC test rejects the null of stationarity in favor of a unit root process for virtually all countries.
An important question is to what extent the size distortions of stationarity tests may be mitigated by replacing the asymptotic critical values by size-adjusted finite-sample critical values. Recently, such corrections have been employed by Cheung and Chinn (1997), Rothman (1997) and Kuo and Mikkola (1999), among others. We derive size-corrected critical values for the LMC and KPSS tests under economically plausible assumptions about the half-life of deviations from PPP. Using size-adjusted critical values for the KPSS and LMC tests instead of asymptotic critical values, we are unable to reject the stationarity null for any country but Japan.
However, this sharp reversal in results cannot be interpreted as convincing evidence in favor of long-run PPP. We show that after size corrections the power of the KPSS (LMC) test may fall as low as 20% (22%) at the 5% level for the sample sizes and degrees of persistence of interest in the PPP literature. This means that tests based on size-adjusted critical values are unlikely to reject stationarity whether long-run PPP holds or not. Thus, we learn very little from conducting tests with size-corrected critical values except in the rare case of a rejection of stationarity.
Our example of the problems with interpreting results of stationarity tests is representative of a wide range of applications in macroeconomics and international finance.
We conclude that these tests should not be used without size-adjusting the critical values and will tend to be of limited usefulness even with size-adjustments unless the sample size is very large.
In Section 2, we review the construction of the KPSS and LMC tests of stationarity. In Section 3, we document the size distortions of the KPSS and LMC tests based on asymptotic critical values. In Section 4, we illustrate the practical importance of the size distortions for applied work in the context of the PPP debate. In Section 5, we analyze the size-corrected power of the LMC and KPSS tests and reexamine the empirical findings. The conclusions are given in Section 6.
2. A review of the two leading examples of tests of stationarity
The two most widely used tests of the I(0) null hypothesis are due to Kwiatkowski et al. (1992) and to Leybourne and McCabe (1994). These two tests differ in how they account for serial correlation under H0. Whereas the KPSS test uses a nonparametric correction similar to the Phillips–Perron test, the LMC test allows for additional autoregressive lags similar to the augmented Dickey–Fuller (ADF) test.
Although both tests have the same asymptotic distribution, the LMC test statistic converges at rate $O_p(T)$, compared to a rate of only $O_p(T/l)$ for the KPSS statistic, where $l$ is the autocorrelation truncation lag. Moreover, the LMC test is robust to the choice of lag order, whereas the KPSS test can be sensitive to the choice of $l$ (Leybourne and McCabe, 1994; Lee, 1996).
2.1. Leybourne–McCabe test
where $\hat{\sigma}_\varepsilon^2 = \hat{\varepsilon}'\hat{\varepsilon}/T$ is a consistent estimator of $\sigma_\varepsilon^2$ and $V$ is a $T \times T$ matrix with $ij$th element equal to the minimum of $i$ and $j$. We reject the null hypothesis of stationarity if the test statistic exceeds its critical value under H0.
If $\beta$ is known to be zero in population, the residuals $\hat{\varepsilon}_t$ are obtained from regressing $y_t^*$ on an intercept alone. The resulting test statistic is $\hat{s}_\alpha = \hat{\sigma}_\varepsilon^{-2} T^{-2} \hat{\varepsilon}' V \hat{\varepsilon}$. Asymptotic critical values for both of these statistics are provided in Kwiatkowski et al. (1992). We follow Leybourne and McCabe in programming the test in GAUSS.
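As a minimal sketch of the intercept-only statistic, assuming the filtered series $y_t^*$ has already been constructed, the quadratic form $\hat{\varepsilon}' V \hat{\varepsilon}$ can be computed without forming $V$: because $V_{ij} = \min(i,j)$, it equals the sum of squared reverse partial sums of the residuals. The function name and the pure-Python implementation are illustrative assumptions, not the GAUSS code used in the paper.

```python
def stationarity_statistic(y_star):
    """Intercept-only statistic sigma_hat^-2 * T^-2 * e'Ve with V[i][j] = min(i, j).

    `y_star` is assumed to be the already filtered series (hypothetical input name).
    """
    T = len(y_star)
    mean = sum(y_star) / T
    resid = [v - mean for v in y_star]        # residuals from a regression on an intercept
    sigma2 = sum(e * e for e in resid) / T    # sigma_hat^2 = e'e / T
    if sigma2 == 0.0:
        raise ValueError("residual variance is zero; statistic undefined")
    # e'Ve = sum over k of (e_k + ... + e_T)^2, accumulated from the end of the sample
    quad = 0.0
    tail = 0.0
    for e in reversed(resid):
        tail += e
        quad += tail * tail
    return quad / (sigma2 * T * T)
```

The partial-sum form costs O(T) operations instead of the O(T^2) needed for the explicit matrix product, which matters when the statistic is recomputed many times in a simulation.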
A key issue in the implementation of the LMC test that is not discussed in their paper is the choice of starting values for the ARIMA(p,1,1) model. Rather than rely on the standard optimization algorithm used by the GAUSS–ARIMA routine, Leybourne and McCabe evaluate the likelihood function for a grid of initial values for the moving average parameter, $\theta^0$, ranging from 0 to 1 in increments of 0.05, with the initial value of the autoregressive parameter(s) fixed at $\theta^0 - 0.1$. In particular, for $p = 1$, the initial guess for the AR parameter is defined as $\phi^0 = \theta^0 - 0.1$. For $p > 1$, the initial guess is $\phi_1^0 = \dots = \phi_p^0 = \theta^0 - 0.1$. The starting value for the drift parameter is set equal to 0.1. In this paper, we extend Leybourne and McCabe's procedure by including among the candidate models the model selected based on the default initial values supplied by the GAUSS–ARIMA routine. The model that achieves the highest likelihood is selected for the final analysis. The GAUSS code for our estimation procedure is available upon request.2
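The multi-start logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' GAUSS code: `fit` is a hypothetical user-supplied callable that estimates the ARIMA(p,1,1) model from one set of starting values and returns a dictionary with a `"loglik"` entry, and the AR starting value is taken to be the MA starting value minus 0.1 (the operator is not legible in the extracted text, so subtraction is an assumption).

```python
def starting_value_grid(p, step=0.05):
    """Candidate starting values: MA parameter on [0, 1], AR parameters tied to it."""
    n = int(round(1.0 / step))
    grid = []
    for k in range(n + 1):
        theta0 = round(k * step, 10)
        phi0 = [round(theta0 - 0.1, 10)] * p   # assumption: AR start = MA start - 0.1
        grid.append({"theta": theta0, "phi": phi0, "drift": 0.1})
    return grid


def select_best_fit(fit, p, extra_starts=()):
    """Run `fit` from every candidate start and keep the highest-likelihood result.

    `extra_starts` can hold additional candidates, such as a routine's default
    starting values (hypothetical argument).
    """
    candidates = starting_value_grid(p) + list(extra_starts)
    results = [fit(start) for start in candidates]
    return max(results, key=lambda r: r["loglik"])
```

Passing the optimizer's default starting values through `extra_starts` mirrors the extension described in the text, where the default-initialized model competes with the grid-initialized ones on likelihood.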
2.2. KPSS test
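The KPSS test of stationarity is based on the same model as the LMC test and has the same general structure. The KPSS test statistic for the model with time trend is computed as $\hat{\delta}_\beta = \hat{\sigma}^{-2} T^{-2} \hat{\varepsilon}' V \hat{\varepsilon}$, where $\hat{\varepsilon}_t$ is the least-squares residual from a regression of $y_t^*$ on an intercept and deterministic time trend. The difference from the LMC test is that the KPSS test relies on a nonparametric estimator of the long-run variance of $\varepsilon_t$, $\hat{\sigma}^2 = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 + 2 T^{-1} \sum_{i=1}^{l} w(i,l) \sum_{t=i+1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_{t-i}$,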
where $w(i,l) = 1 - i/(l+1)$ is the Bartlett kernel. This estimator is consistent if the trunc…
2 It can be shown that this modified algorithm results in considerably lower size distortions than use of the initial values supplied by the GAUSS–ARIMA routine, if the test is based on asymptotic critical values. Hobijn et al. (1998) use yet another procedure for initializing the GAUSS–ARIMA routine based on Yule–Walker estimates of the model under the null hypothesis. This choice of starting values has no theoretical justification under the alternative hypothesis. In this paper, we therefore rely on the procedure originally proposed by Leybourne and McCabe.
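The Bartlett-kernel estimator of the long-run variance used by the KPSS test can be sketched as follows. This follows the standard form of the estimator in Kwiatkowski et al. (1992); the function names are illustrative assumptions, and the residuals are taken as given.

```python
def bartlett_weight(i, l):
    """Bartlett kernel weight w(i, l) = 1 - i / (l + 1)."""
    return 1.0 - i / (l + 1)


def long_run_variance(resid, l):
    """Nonparametric long-run variance estimate with truncation lag l."""
    T = len(resid)
    s2 = sum(e * e for e in resid) / T                     # lag-0 term
    for i in range(1, l + 1):
        # sample autocovariance at lag i (divided by T, not T - i)
        gamma_i = sum(resid[t] * resid[t - i] for t in range(i, T)) / T
        s2 += 2.0 * bartlett_weight(i, l) * gamma_i
    return s2
```

With `l = 0` the estimate reduces to the ordinary residual variance, so positive autocorrelation in the residuals raises the estimate, and hence lowers the test statistic, only to the extent that the chosen lag captures it.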
3. Evidence of size distortions
This is not the first paper to examine the size of tests of stationarity. For example, Kwiatkowski et al. (1992) and Lee (1996) have provided size results for a range of sample sizes and values of $\rho$. However, their results are limited to the AR(1) model with slope parameter $\rho \le 0.8$. Kwiatkowski et al. make the case that “$\rho = 0.8$ is a plausible parameter value since, if we take most series to be stationary, their first-order autocorrelations will often be in this range” (pp. 171–172). This view may be plausible for some of the annual Nelson and Plosser (1982) data analyzed in Kwiatkowski et al., but it is highly unrealistic for most monthly and quarterly data.
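One way to see why an AR(1) slope of 0.8 is low for monthly or quarterly data is to convert it into the half-life of a shock, which for an AR(1) process with slope $\rho$ is $\ln(0.5)/\ln(\rho)$ periods. The helper below is a small illustrative sketch; the function name is an assumption.

```python
import math


def half_life(rho):
    """Periods until half of a shock to an AR(1) process with slope rho has died out."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(rho)
```

For example, `half_life(0.8)` is roughly 3.1 periods, which for quarterly data means that half of a deviation disappears in well under a year, whereas `half_life(0.99)` is roughly 69 periods.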
In this paper, we make the case that many econometric applications of stationarity tests involve processes with roots much closer to unity (Rudebusch, 1993; Cheung and Chinn, 1997). While Leybourne and McCabe (1994) provide some additional small-sample evidence for the size of the LMC and KPSS tests, the low size distortions they find for the LMC test appear to be due to a programming error. Moreover, their evidence is limited to processes with roots between 0 and 0.9. As we will show, there is reason to expect the dominant root of many stationary processes to be closer to 0.94–0.99 in practice. Thus, the relevant models from an economic point of view are a highly persistent process under the null of stationarity and a unit root process under the alternative. There is reason to doubt the finite-sample accuracy of the asymptotic critical values for such highly persistent processes.4 We illustrate this point by extending the simulation evidence for the KPSS test and the LMC test to processes with larger roots. We use the critical values compiled by Kwiatkowski et al. (1992) and used in most applied work.5 The size of the KPSS test is highly sensitive to the choice of the truncation lag. Lee (1996) shows that some slight improvements may be possible if we use data-based selection procedures for $l$, but the differences tend to be small in practice.
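A rough Monte Carlo sketch of this kind of size experiment is given below. It is not the authors' design: it uses the intercept-only KPSS statistic with a Bartlett kernel, a fixed truncation lag, Gaussian AR(1) data, and the asymptotic 5% critical value of 0.463 tabulated in Kwiatkowski et al. (1992) for the level-stationary case. The sample size, lag, number of replications, and all function names are illustrative assumptions.

```python
import random

CRIT_5PCT_LEVEL = 0.463  # asymptotic 5% critical value, intercept-only case


def kpss_level_stat(y, l):
    """Intercept-only KPSS statistic with Bartlett-kernel long-run variance."""
    T = len(y)
    mean = sum(y) / T
    e = [v - mean for v in y]
    s2 = sum(v * v for v in e) / T
    for i in range(1, l + 1):
        w = 1.0 - i / (l + 1)
        s2 += 2.0 * w * sum(e[t] * e[t - i] for t in range(i, T)) / T
    partial, total = 0.0, 0.0
    for v in e:                      # sum of squared partial sums of the residuals
        partial += v
        total += partial * partial
    return total / (s2 * T * T)


def simulate_ar1(rho, T, rng, burn=200):
    """Stationary Gaussian AR(1) draw of length T after a burn-in period."""
    y, out = 0.0, []
    for t in range(T + burn):
        y = rho * y + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(y)
    return out


def rejection_rate(rho, T=100, l=4, reps=300, seed=0):
    """Share of replications in which stationarity is rejected at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = sum(
        kpss_level_stat(simulate_ar1(rho, T, rng), l) > CRIT_5PCT_LEVEL
        for _ in range(reps)
    )
    return rejections / reps
```

Comparing `rejection_rate(0.0)` with `rejection_rate(0.95)` gives a quick impression of how the empirical rejection rate under a true stationary null moves away from the nominal level as the root approaches unity; varying `l` shows the sensitivity to the truncation lag.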