Issues concerning benchmark choice illustrate the joint hypothesis problem faced by many mutual fund performance studies. Under the assumption that model-adjusted returns are perfect measures of skill, one can test for skill in the cross-section of funds. But if benchmark models are imperfect, it is challenging to distinguish skilled management from imperfect performance measures. In our setting, even if the performance measures are imperfect, as long as these imperfections affect index and active funds to the same extent, we can overcome the standard joint hypothesis problem and test the null of no skilled management.

The rest of the paper is organized as follows. In Section 2, we describe our sample and benchmark models. Section 3 shows that index funds can appear skilled under methodologies designed to disentangle skill from luck in performance evaluation. In Section 4, we use the distribution of index fund performance to evaluate the extent of skill in active funds. Section 5 addresses potential concerns about our distributional comparisons by conditioning active funds on proxies for skill and explores the implications of our findings by examining only funds benchmarked or indexed to the S&P 500. Section 6 concludes.

2. Data and Benchmark Models

2.1. Sample Construction

We use fund characteristics and monthly returns from the Center for Research in Security Prices (CRSP) Survivor-bias-free U.S. Mutual Fund Database. Although the Vanguard 500 fund was introduced in the mid-1970s, the number of index funds was small for the next two decades. Thus, we start our sample in 1995 with 29 index funds. Our sample contains 240 passive funds in total. We merge these data with s12 holdings data from Thomson Reuters using the WRDS MF Links file. We require that a fund match the holdings data in order to be included in the sample. To avoid double-counting observations for multiple share classes, we aggregate information across share classes, weighting by total net assets in each class and summing total net assets across classes.7 To be included in our sample, funds must be at least 24 months old to avoid the incubation bias documented by Evans (2010). We also exclude funds whose average net fund assets are below $5 million in the sample. We focus on equity funds, requiring that on average over the sample, at least 90% and at most 105% of the fund’s assets be invested in common stocks for a fund to be included in the sample.
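The share-class aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and the key names `fund_id`, `month`, `ret`, and `tna` are assumptions made for the example.

```python
from collections import defaultdict

def aggregate_share_classes(rows):
    """Collapse share-class rows into one record per (fund, month).

    rows: iterable of dicts with hypothetical keys 'fund_id', 'month',
    'ret' (monthly return) and 'tna' (total net assets of the class).
    Returns are weighted by class TNA; TNA is summed across classes.
    """
    totals = defaultdict(lambda: {"tna": 0.0, "weighted_ret": 0.0})
    for row in rows:
        key = (row["fund_id"], row["month"])
        totals[key]["tna"] += row["tna"]
        totals[key]["weighted_ret"] += row["tna"] * row["ret"]
    return {
        key: {"ret": v["weighted_ret"] / v["tna"], "tna": v["tna"]}
        for key, v in totals.items()
        if v["tna"] > 0  # skip fund-months with no reported assets
    }

# Two share classes of the same (made-up) fund in one month.
rows = [
    {"fund_id": "F1", "month": "2000-01", "ret": 0.010, "tna": 300.0},
    {"fund_id": "F1", "month": "2000-01", "ret": 0.008, "tna": 100.0},
]
fund_month = aggregate_share_classes(rows)
```

In this toy input the larger class dominates the fund-level return, which is the intended effect of weighting by total net assets.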

Many studies identify index funds as funds containing “INDEX” in the fund’s name. However, a portion of the funds identified as index funds in this manner are flagged as “Index Enhanced” or “Index-based Funds” by CRSP, suggesting a potential active component to the fund’s management. Because we treat index funds as a group of funds with no portfolio choice talent beyond the underlying index, we use a stricter definition of index funds, utilizing the CRSP index fund flag. This flag is only populated later in the sample, so we carry the earliest value back. Under our strict definition of index funds, we only identify funds with a value of “D” as index funds. This corresponds to “Pure Index Funds” in the CRSP manual.8 We examine each fund included in the Index Fund sample to verify our classification.
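Carrying the earliest populated flag value back to earlier months might look like the sketch below. The input layout (one chronological list of flag values per fund, with `None` where the flag is empty) is an assumption for illustration; only the value “D” is taken from the text, and the other value used in the usage lines is a placeholder.

```python
def backfill_index_flag(monthly_flags):
    """Fill leading missing flags with the earliest populated value.

    monthly_flags: chronological list of flag values for one fund,
    with None for months in which the flag is not populated.
    """
    first_idx = next(
        (i for i, flag in enumerate(monthly_flags) if flag is not None), None
    )
    if first_idx is None:
        return list(monthly_flags)  # flag never populated: nothing to carry back
    earliest = monthly_flags[first_idx]
    return [earliest] * first_idx + list(monthly_flags[first_idx:])

def pure_index_months(monthly_flags):
    """Mark the months that satisfy the strict definition (flag equal to 'D')."""
    return [flag == "D" for flag in backfill_index_flag(monthly_flags)]

filled = backfill_index_flag([None, None, "D", "D"])
strict = pure_index_months([None, "E", "D"])  # "E" is a placeholder non-"D" value
```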

Exchange-traded funds (ETFs) are included in the sample. For our purposes, the differences between traditional index funds and ETFs are minor as both represent traded portfolios without portfolio choice skill.

7 We exclude several fund-months with obvious reporting errors in returns.

8 Our conclusions are unchanged when using a broader, name-based definition of index funds.

In Section 4, we test whether active funds exhibit performance superior to that of passive funds. To be conservative about the dispersion of performance in the latter set, we restrict the sample to exclude sector, international, and emerging market funds.9 To identify these, we parse the fund names from CRSP and manually identify words associated with these funds. We flag funds containing these words as sector, emerging market, or international funds.10 If a fund is flagged in any month, we exclude it from the full sample. We also exclude sector funds based on Lipper codes provided by CRSP.11 Finally, we manually inspect all remaining index funds to ensure that none is a sector fund.
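A name-based screen of this kind could be sketched as below. The keyword set is purely hypothetical (the actual word list is not reproduced in the text), and the sketch omits the manual review of ambiguous matches and the Lipper-code screen.

```python
import re

# Hypothetical keywords for illustration only; not the authors' list.
EXCLUSION_KEYWORDS = {"international", "emerging", "energy", "health", "technology"}

def is_flagged(fund_name):
    """True if any whole word in the fund name is an exclusion keyword."""
    words = set(re.findall(r"[a-z]+", fund_name.lower()))
    return not words.isdisjoint(EXCLUSION_KEYWORDS)

def funds_to_exclude(name_history):
    """name_history: dict mapping fund id -> list of monthly fund names.

    A fund is excluded from the full sample if it is flagged in any month.
    """
    return {
        fund_id
        for fund_id, names in name_history.items()
        if any(is_flagged(name) for name in names)
    }

# Made-up fund names; F1 changes its name to a sector name part-way through.
name_history = {
    "F1": ["Alpha Growth Fund", "Alpha Energy Fund"],
    "F2": ["Beta 500 Index Fund"],
}
excluded = funds_to_exclude(name_history)
```

Matching on whole words rather than substrings reduces false positives, but a manual check of the flagged names would still be needed.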

Table 1 reports summary statistics for our sample of index and active funds. The sample includes 2,153 distinct funds, 240 of which are passive index mutual funds or exchange-traded funds (ETFs). On average, the index funds in our sample are over twice as large as active funds, but are also younger.12 This is consistent with the rapid increase in index funds over the last two decades. As expected, expenses are much lower for index funds. The average expense ratio is 47 basis points for index funds and 125 basis points for active funds.

2.2. Are Index Funds Passive?

We assume that index funds are passive investments with no portfolio choice skill. To assess the validity of this assumption, we present evidence from holdings and returns that index funds are predominantly passive investment vehicles.

In Table 1, turnover, as reported by CRSP, is much lower for index funds; the median index fund has a turnover of 24% compared to 74% for active funds. Another measure used to capture unobserved actions of funds is the Return Gap measure of Kacperczyk, Sialm and Zheng (2008), which compares the net investor return with the net holdings return. The net holdings return represents the return the fund would have experienced if it had kept holdings constant. Any differences in returns are due to interim trades by the fund. Table 1 shows that the cross-sectional average and median return gap measures are close to zero for both active and passive funds, but that the dispersion is about twice as large for active funds relative to passive funds, indicating that index funds are much less active than actively managed funds.

9 We exclude these funds because they may have the potential to create dispersion in the index fund distribution due to concentrated holdings. Our results are robust to inclusion of these funds.

10 Some words may plausibly appear as part of the fund name (e.g., due to the fund family) or in ways that are clearly not related to a sector fund. We manually checked words where this is the case and did not flag funds that are clearly not sector funds.

11 A list of the words and Lipper codes is available upon request.

12 We control for these differences by examining dollar returns and t-statistics, respectively.

We also obtain benchmark information, Active Share, and tracking error volatility from Antti Petajisto’s website constructed using the methodology outlined in Petajisto (2013).13 These data are available through 2008. For the subset of funds in the Petajisto (2013) dataset matching our sample, we first confirm that index funds are holding the underlying index constituents at the same weights as in the index. Cremers and Petajisto (2009) develop a holdings-based measure, Active Share, to study how active a manager is. If index funds are substantially deviating from benchmark weights, the Active Share should deviate from zero. Positions orthogonal to the index would have an Active Share of one.

Table 1 shows large differences in Active Shares across fund type. The median index fund has an Active Share of 0.02, indicating that these funds hold assets in proportions very close to those of the benchmark. On the other hand, the median active fund deviates widely from its benchmark, as evidenced by the median Active Share of 0.8. The Petajisto (2013) dataset also calculates annualized volatility of tracking error of daily returns. We again find large differences between active and passive funds on this dimension. The median index fund has tracking error volatility of 70 bps per year while that of the median active fund is 6.5%. On balance, index funds appear quite passive relative to actively managed funds, validating their use as funds with no portfolio choice skill beyond that of the underlying index.
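To make the Active Share interpretation concrete, the sketch below computes the measure as half the sum of absolute differences between fund and benchmark portfolio weights, following the definition in Cremers and Petajisto (2009). The tickers and weights are made up for illustration.

```python
def active_share(fund_weights, index_weights):
    """Active Share: half the sum of absolute weight differences.

    Both arguments map a security identifier to its portfolio weight;
    a security missing from one portfolio has weight zero there.
    """
    securities = set(fund_weights) | set(index_weights)
    return 0.5 * sum(
        abs(fund_weights.get(s, 0.0) - index_weights.get(s, 0.0))
        for s in securities
    )

# Hypothetical benchmark and two hypothetical funds.
index = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
near_index_fund = {"AAA": 0.49, "BBB": 0.31, "CCC": 0.20}
disjoint_fund = {"XXX": 0.6, "YYY": 0.4}

as_near = active_share(near_index_fund, index)    # close to zero
as_disjoint = active_share(disjoint_fund, index)  # no overlap with the index
```

A fund that replicates the index weights has an Active Share near zero, while a fund with no holdings in common with the index has an Active Share of one.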

13 Available at: http://www.petajisto.net/data.html


return on benchmark return or factor j in period t. Standard errors are adjusted for heteroskedasticity.

We report summary statistics of the full sample gross Fama-French-Carhart factor loadings in Table 1. The loadings are similar across the index and actively managed funds. Both groups have average market betas of approximately one and a slight tilt towards small firms. Neither group loads heavily on value or momentum strategies on average.

14 See, for example, Cremers, Petajisto and Zitzewitz (2013) or Berk and van Binsbergen (2014b).

15 Factor returns are available at Ken French's website: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

The estimated α is our first measure of performance. An advantage of this measure is that it provides the economic magnitude of any abnormal performance, allowing us to gauge the economic value added by a fund. Due to different sample lengths or heterogeneous risk-taking by funds, an estimated α may not have attractive sampling properties. For these reasons, Kosowski et al. (2006) and Fama and French (2010) analyze the distribution of t-statistics associated with estimated alphas. Consequently, we use t(α) as our second measure of performance.
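The two performance measures can be illustrated with a time-series regression of a fund's returns on a constant and the benchmark or factor returns. The sketch below is an assumption-laden illustration rather than the authors' implementation: it uses ordinary least squares with White (HC0) standard errors as one common heteroskedasticity adjustment, and the simulated inputs are hypothetical.

```python
import numpy as np

def alpha_and_tstat(fund_returns, factor_returns):
    """OLS intercept (alpha) and its heteroskedasticity-robust t-statistic.

    fund_returns: length-T array of monthly fund returns.
    factor_returns: T x K array (or length-T array) of benchmark/factor returns.
    Uses the White (HC0) covariance estimator; this choice is an assumption.
    """
    y = np.asarray(fund_returns, dtype=float)
    F = np.asarray(factor_returns, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    X = np.column_stack([np.ones(len(y)), F])   # constant plus factors
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    meat = X.T @ (X * resid[:, None] ** 2)      # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv
    alpha = coef[0]
    return alpha, alpha / np.sqrt(cov[0, 0])

# Simulated fund with a small positive alpha and four hypothetical factors.
rng = np.random.default_rng(0)
n_months = 120
factors = rng.normal(0.0, 0.04, size=(n_months, 4))
true_alpha = 0.002
returns = (true_alpha + factors @ np.array([1.0, 0.2, 0.0, 0.0])
           + rng.normal(0.0, 0.01, n_months))
alpha, t_alpha = alpha_and_tstat(returns, factors)
```

The alpha is expressed in the same units as the input returns (here, per month), while t(α) scales it by its estimated precision, which is what makes it comparable across funds with different sample lengths or residual volatility.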

We examine these measures using before-fee (gross) returns.16 Gross alphas (and t-statistics) allow us to ask whether a fund exhibits sufficient skill to outperform the passive benchmark implied by the model (an alpha greater than zero) or sufficient skill to outperform alternative investments (an alpha greater than that of index funds).17

3. Skilled Index Funds?

Index funds, by definition, should be devoid of portfolio selection skill. In this section, we assess the efficacy of tests designed to separate skill and luck in performance evaluation. The results in this section are essentially placebo tests of the current methodology.

3.1. Bootstrapping the cross-section of t(α)

Recent work by Fama and French (2010) and Kosowski, Timmermann, Wermers and White (2006) uses bootstrap analysis to simulate distributions of skill measures under the null of no skill. These studies both recognize that the underlying cross-sectional distribution of fund returns is likely to be non-normal, and therefore inference based on standard critical values can be confounded. The idea is simply that there is a spread in alpha estimates due to noise in estimation and the statistical properties of the individual fund returns, even in the absence of true alpha (i.e., skill).

16 For some funds, CRSP does not report expenses monthly. For these funds, we carry forward the annually reported fees to subsequent monthly observations.

17 A study of net returns, on the other hand, addresses a different question: whether active managers have sufficient skill to cover the fees they charge to investors. However, this will capture, in part, the bargaining process between fund investors and managers (Berk and Green (2004)). To [...] from this confounding economic mechanism, we analyze gross performance.

Both studies simulate the null distribution of fund returns by sampling from actual fund returns net of estimated alphas. The studies differ on sample construction and bootstrap methodology, but both conclude that a small set of active funds possesses skill.

How do index funds, a set of funds with no portfolio selection skill beyond the underlying benchmark, fare relative to the bootstrap distribution? The results provide insights into the effect of possible model mis-specification and the importance of non-portfolio selection activities such as trading costs and securities lending revenues. To assess this, we use the approach of Fama and French (2010). While our sample differs, our bootstrap methodology is the same. For each fund-month, we subtract a fund’s estimated gross alpha from the fund’s monthly gross return. This leaves us with a panel of monthly fund zero-alpha returns. From these data, we draw a bootstrap sample of months (with replacement) from the set of all months in our sample. If we draw a given month, we use all fund returns from that month to retain any cross-sectional correlation in monthly returns. For each bootstrap sample, we then calculate the time series alpha and t(α) for each fund. This provides us with a cross-sectional distribution of t(α) estimated from returns that by construction should have a true alpha of zero. We repeat this 10,000 times and average across the bootstrap samples at each point in the distribution of estimated t(α). We perform this analysis for zero-alpha distributions using each of the benchmark models described in Section 2.
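The bootstrap steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it assumes a balanced panel (every fund observed in every month), uses White (HC0) standard errors for t(α), resamples the factor returns for the same drawn months as the fund returns, and runs far fewer replications than the 10,000 used in the text. All inputs are simulated.

```python
import numpy as np

def alphas_tstats(R, F):
    """Alphas and HC0 t-statistics for every fund in a balanced panel.

    R: T x N matrix of fund returns; F: T x K matrix of factor returns.
    """
    T = R.shape[0]
    X = np.column_stack([np.ones(T), F])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ R                 # (K+1) x N coefficients
    resid = R - X @ coef                     # T x N residuals
    a = XtX_inv[0] @ X.T                     # weights that map y to alpha
    var_alpha = (a[:, None] ** 2 * resid ** 2).sum(axis=0)
    return coef[0], coef[0] / np.sqrt(var_alpha)

def bootstrap_null_tstats(R, F, n_boot=1000, seed=0):
    """Sorted actual t(alpha) and the average sorted zero-alpha t(alpha)."""
    rng = np.random.default_rng(seed)
    alpha_hat, t_actual = alphas_tstats(R, F)
    R0 = R - alpha_hat                       # zero-alpha returns by construction
    T = R.shape[0]
    sorted_sum = np.zeros(R.shape[1])
    for _ in range(n_boot):
        months = rng.integers(0, T, size=T)  # same drawn months for all funds
        _, t_boot = alphas_tstats(R0[months], F[months])
        sorted_sum += np.sort(t_boot)        # accumulate each rank's t-statistic
    return np.sort(t_actual), sorted_sum / n_boot

# Simulated no-skill funds: factor exposure plus noise, zero true alpha.
rng = np.random.default_rng(1)
T, N, K = 120, 50, 4
F = rng.normal(0.0, 0.04, size=(T, K))
R = F @ rng.normal(0.5, 0.3, size=(K, N)) + rng.normal(0.0, 0.01, size=(T, N))
actual_t, null_t = bootstrap_null_tstats(R, F, n_boot=200)
```

Drawing whole months keeps the cross-sectional correlation of fund returns intact, and averaging the sorted t-statistics across replications gives the zero-alpha value at each point of the cross-sectional distribution, against which the sorted actual t(α) values can be compared. An unbalanced panel would additionally require a rule for funds with too few observations in a given draw.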

Figure 1 plots the bootstrapped and actual distributions of gross zero-alpha performance (in terms of t(α)) for index and active funds under the Fama-French-Carhart model. The bottom panel shows that the results for actively managed funds in our sample are consistent with Fama and French (2010). There is evidence that active funds both underperform (below the 40th percentile) and outperform (above the 40th percentile) the bootstrapped zero-alpha distribution. However, the top panel of Figure 1 shows that index funds also generally outperform the no-skill distribution above the 20th percentile. This may not be surprising given that the index funds face practical trading considerations (e.g., trading costs or equity lending fees). However, if the alpha due to portfolio selection is zero, then this would imply that lending fees significantly exceed trading costs, which seems unlikely. It is surprising that the index funds perform better than the bootstrapped sample over a large part of the distribution.

