It is common in empirical work to test the null hypothesis of stationarity against the alternative of a unit root process. We showed that the use of conventional asymptotic critical values for commonly used stationarity tests may cause extreme size distortions if the model under the null hypothesis is highly persistent. The existence of such size distortions is not surprising from the point of view of econometric theory. It is merely the mirror image of the well-known problem of low power of unit root tests against local alternatives. Nevertheless, few applied users appear to be aware of the potential pitfalls of using tests of the stationarity null. There is a widespread belief that these tests, in conjunction with tests of the unit root null, provide compelling evidence about the true order of integration of the underlying series. The aim of this paper has been to alert applied users to the fact that this perception is often misleading. In fact, we showed that such confirmatory data analysis will tend to favor the unit root hypothesis, regardless of whether it is true.
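The size distortion described above can be illustrated with a small Monte Carlo sketch. This is not the simulation design used in the paper: the AR(1) data-generating process with Gaussian errors, the sample size of 100, the lag truncation of 4, the number of replications, and all function names are assumptions chosen for illustration. The statistic is the level-stationarity KPSS statistic with a Bartlett-kernel long-run variance estimate, compared with the asymptotic 5% critical value of 0.463.

```python
import random

# Asymptotic 5% critical value of the KPSS test, level-stationarity case.
ASYMPTOTIC_CV_5PCT = 0.463


def simulate_ar1(rho, n, rng, burn_in=200):
    """Draw n observations from y_t = rho * y_{t-1} + e_t, e_t ~ N(0, 1)."""
    y = 0.0
    out = []
    for t in range(burn_in + n):
        y = rho * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(y)
    return out


def kpss_level_stat(y, lags):
    """KPSS statistic for the null of stationarity around a constant mean."""
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]

    # Sum of squared partial sums of the demeaned series.
    partial = 0.0
    sum_sq_partial = 0.0
    for v in resid:
        partial += v
        sum_sq_partial += partial * partial

    # Long-run variance estimate with Bartlett weights 1 - k / (lags + 1).
    lrv = sum(v * v for v in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        gamma_k = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma_k

    return sum_sq_partial / (n * n * lrv)


def rejection_rate(rho, n, lags, reps, seed=0):
    """Share of replications in which stationarity is rejected at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        stat = kpss_level_stat(simulate_ar1(rho, n, rng), lags)
        if stat > ASYMPTOTIC_CV_5PCT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Every process below is stationary, so each rate is an empirical size.
    for rho in (0.0, 0.5, 0.95):
        print(rho, rejection_rate(rho, n=100, lags=4, reps=500))
```

Because every simulated process is stationary, any rejection rate well above the nominal 5% level reflects size distortion rather than power. Varying `rho`, `n`, and `lags` shows how the distortion depends on persistence, sample size, and the lag truncation.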

In most applications of stationarity tests in empirical macroeconomics and in international finance, the process under the null hypothesis will be highly persistent. Given our evidence of large size distortions for realistic degrees of persistence and sample sizes, one would expect stationarity tests to reject the null hypothesis of stationarity far too often, even if the true model is stationary. Thus, the common practice of viewing tests of the null hypothesis of stationarity as complementary to tests of the unit root null must be regarded as questionable. Given the low power of unit root tests against highly persistent alternatives, this practice will tend to result in spurious acceptances of the unit root hypothesis.

M. Caner, L. Kilian / Journal of International Money and Finance 20 (2001) 639–657

We stress that our simulation evidence is limited to the two stationarity tests most widely used in applied work, the KPSS and the LMC test. Further research will be needed to establish whether similar conclusions hold for other tests of the null hypothesis of stationarity. For example, Choi (1994) proposed a modification of the KPSS test. Unlike the KPSS and LMC tests, however, this test does not appear to have been used in applied work. The tentative simulation evidence for the size and power of this test provided by Choi (1994) suggests that his test will have better size properties than the KPSS test for large samples, but worse size properties for small samples. Thus, no one test is most accurate in all situations. It appears unlikely that any test of stationarity can ever be designed that completely overcomes the small-sample size distortions we document and retains reasonable power. Nevertheless, additional research into the tradeoffs between alternative stationarity tests for a given sample size of interest is likely to help applied researchers make an informed choice between alternative tests and interpret test results obtained in practice.

We illustrated the practical importance of the size distortions of the KPSS and LMC tests for testing the hypothesis of long-run PPP under the recent float. The results of stationarity tests based on asymptotic critical values were shown to be potentially very misleading and difficult to interpret in practice. Consistent with our simulation evidence, we found stronger evidence against long-run PPP based on the LMC test than based on the KPSS test. However, we showed that both tests are likely to overstate the evidence against long-run PPP. We also showed that if the same test is used with size-corrected critical values based on economically plausible models, the size-adjusted power of the test drops sharply, and the observed failures to reject stationarity cannot be interpreted as convincing evidence in favor of mean reversion in real exchange rates. Only in the rare case that stationarity is rejected after size adjustments do these tests shed light on the PPP question. We concluded that tests of the null hypothesis of stationarity (and by extension tests of the null hypothesis of cointegration) are of limited usefulness for the PPP debate and by extension for other empirical work with monthly and quarterly data based on small samples.
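The idea of size-corrected critical values, and the resulting loss of power, can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the null model is taken to be a Gaussian AR(1) with a hypothetical persistence parameter `rho_null`, the alternative is a pure random walk, and the sample size, lag truncation, replication counts, and function names are all chosen for illustration. The size-corrected critical value is the empirical 95th percentile of the KPSS statistic simulated under the assumed stationary null.

```python
import random

# Asymptotic 5% critical value of the KPSS test, level-stationarity case.
ASYMPTOTIC_CV_5PCT = 0.463


def simulate_ar1(rho, n, rng, burn_in=200):
    """Draw n observations from y_t = rho * y_{t-1} + e_t (rho = 1: random walk)."""
    y = 0.0
    out = []
    for t in range(burn_in + n):
        y = rho * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(y)
    return out


def kpss_level_stat(y, lags):
    """KPSS level-stationarity statistic with a Bartlett-kernel variance estimate."""
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]
    partial = 0.0
    sum_sq_partial = 0.0
    for v in resid:
        partial += v
        sum_sq_partial += partial * partial
    lrv = sum(v * v for v in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        gamma_k = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma_k
    return sum_sq_partial / (n * n * lrv)


def size_adjusted_cv(rho_null, n, lags, reps, seed=0, level=0.05):
    """Empirical (1 - level) quantile of the statistic under the assumed AR(1) null."""
    rng = random.Random(seed)
    stats = sorted(
        kpss_level_stat(simulate_ar1(rho_null, n, rng), lags) for _ in range(reps)
    )
    index = min(reps - 1, int((1.0 - level) * reps))
    return stats[index]


def rejection_rate(rho, n, lags, reps, critical_value, seed=0):
    """Share of replications in which the statistic exceeds critical_value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        if kpss_level_stat(simulate_ar1(rho, n, rng), lags) > critical_value:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    n, lags = 100, 4
    cv = size_adjusted_cv(rho_null=0.95, n=n, lags=lags, reps=1000, seed=1)
    print("size-adjusted 5% critical value:", cv)
    # Power against a random walk, using the same simulated draws for both values.
    print("power, asymptotic cv:", rejection_rate(1.0, n, lags, 1000, ASYMPTOTIC_CV_5PCT, seed=2))
    print("power, size-adjusted cv:", rejection_rate(1.0, n, lags, 1000, cv, seed=2))
```

Comparing the two power figures shows the tradeoff discussed above: a critical value that restores correct size under a persistent null is larger than the asymptotic one, so the test rejects a true unit root less often.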

Acknowledgements

We thank Bob Barsky, Bruce Hansen, Bart Hobijn, Jan Kmenta, Steve Leybourne, David Papell, Shaowen Wu, the editor and two anonymous referees for helpful discussions.

