5 EXPLOITING SPATIAL DEPENDENCE TO IMPROVE MEASUREMENT OF NEIGHBORHOOD SOCIAL PROCESSES
Natalya Verbitsky Savitz*
Stephen W. Raudenbush†
A number of recent studies have used surveys of neighborhood informants and direct observation of city streets to assess aspects of community life such as collective efficacy, the density of kin networks, and social disorder. Raudenbush and Sampson (1999a) have coined the term “ecometrics” to denote the study of the reliability and validity of such assessments. Random errors of measurement will attenuate the associations between these assessments and key outcomes. To address this problem, some studies have used empirical Bayes methods to reduce such biases, while assuming that neighborhood random effects are statistically independent. In this paper we show that the precision and validity of ecometric measures can be considerably improved by exploiting the spatial dependence of neighborhood social processes within the framework of empirical Bayes shrinkage. We compare three estimators of a neighborhood social process: the ordinary least squares estimator (OLS), an empirical Bayes estimator based on the independence assumption (EBE), and an empirical Bayes estimator that exploits spatial dependence (EBS). Under our model assumptions, EBS performs better than EBE and OLS in terms of expected mean squared error loss. The benefits of EBS relative to EBE and OLS depend on the magnitude of spatial dependence, the degree of neighborhood heterogeneity, and a neighborhood’s sample size. A cross-validation study using the original 1995 data from the Project on Human Development in Chicago Neighborhoods and a replication of that survey in 2002 shows that the empirical benefits of EBS approximate those expected under our model assumptions; EBS is more internally consistent and temporally stable and demonstrates higher concurrent and predictive validity. A fully Bayes approach has the same properties as does the empirical Bayes approach, but it is preferable when the number of neighborhoods is small.
This work was supported by National Science Foundation Grant 218966. We thank Ben Hansen, Ed Ionides, Jeff Morenoff, Susan Murphy, and the anonymous reviewers for their thoughtful comments. Direct correspondence to Natalya Verbitsky Savitz, Mathematica Policy Research, Inc., 600 Maryland Ave., SW, Suite 550, Washington, DC 20024; email: NVSavitz@mathematica-mpr.com.
*Mathematica Policy Research, Inc.
†University of Chicago
Social scientists have long known that urban neighborhoods vary substantially in rates of crime (Shaw and McKay 1942), disease rates (Zubrick 2007), mental health problems (Leventhal and Brooks-Gunn 2000), birth weight (Buka et al. 2003; Morenoff 2003), and education, fertility, and earnings (Galster et al. 2007). One major debate concerns the extent to which such associations are causal rather than attributable to the background characteristics of persons and families who migrate into these neighborhoods (see reviews by Duncan and Raudenbush ; Oakes ; Diez-Roux ). Closely related is the theoretical question of how social processes might produce these outcomes and how those processes might be measured in order to test theories about neighborhood influences. However, answers to these questions require reliable and valid measures of the social processes in urban neighborhoods.
In this paper, we propose that one can exploit spatial dependence to improve measures of these social processes. Recent advances in statistical theory enable the study of spatially dependent random effects (Banerjee, Carlin, and Gelfand 2004; Verbitsky 2007). We adopt this approach here. In particular, we formulate a first-order Markov model for spatial dependence (Anselin 1988) within a hierarchical linear model with normal-theory random effects. We then provide a series of empirical tests of internal consistency, temporal stability, concurrent validity, and predictive validity using data collected in 1995 and 2002 by the Project on Human Development in Chicago Neighborhoods (PHDCN).
2. THE LOGIC OF NEIGHBORHOOD MEASUREMENT
Sampson, Raudenbush, and Earls (1997) found that the “collective efficacy” of urban neighborhoods, defined as the fusion of social cohesion and informal social control, significantly predicted low rates of perceived violence, violent victimization, and homicide in Chicago neighborhoods after controlling for demographic characteristics of neighborhoods obtained from the U.S. Census and found to be predictive of crime in past studies. In a similar vein, Browning (2002) found that collective efficacy predicted partner violence, and Browning, Leventhal, and Brooks-Gunn (2005) found a significant association between collective efficacy and low rates of early sexual initiation. Sampson, Morenoff, and Earls (1999) considered the association between neighborhood composition and various aspects of social capital, including collective efficacy as well as the density of kinship and friendship networks and the intensity of reciprocal exchanges among neighbors.
In these studies, survey researchers measured neighborhood social processes by sampling adults within spatially defined units (“neighborhood clusters”), regarding each respondent as an informant about relations among neighbors. Responses to conceptually related questions were combined to form scales intended to measure each informant’s assessment of a latent construct such as collective efficacy. Next, analysts combined these scales across informants within neighborhoods to generate indicators of neighborhood-level latent variables. These neighborhood-level indicators then became the indicators of social process used in the studies just cited. Variation between items within informants and between informants within neighborhoods generates errors of measurement of the neighborhood-level latent variable.
Raudenbush and Sampson (1999a) coined the term “ecometrics” to describe the study of the reliability and validity of assessments of ecological units such as neighborhoods. This work parallels earlier work on the assessment of school climate using teachers as informants (Raudenbush, Rowan, and Kang 1991). Just as psychometrics identifies sources of error in assessments of cognitive skill and personality, ecometrics identifies sources of error in studies of social settings.
Using this logic, we can readily see that the reliability of a neighborhood social process measured by interviewing informants will depend on item consistency (the association between item responses within a scale), the number of items in the scale, the degree to which neighborhood informants agree on social relations in a local area, and the number of informants sampled per local area.
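The dependence of reliability on these four factors can be illustrated with a minimal sketch. It assumes a simple three-level variance decomposition (items within informants within neighborhoods) with a balanced design; the function name, the argument names, and the numerical values are hypothetical and are not taken from the studies cited above.

```python
def neighborhood_reliability(tau_between, tau_informant, sigma2_item,
                             n_informants, n_items):
    """Reliability of a neighborhood mean under an ASSUMED three-level model.

    tau_between   : variance of the latent variable between neighborhoods
    tau_informant : variance between informants within a neighborhood
                    (smaller when informants agree more)
    sigma2_item   : variance between items within an informant
                    (smaller when items are more consistent)
    """
    # Error variance of the neighborhood mean: informant disagreement is
    # averaged over informants, item noise over informants times items.
    error_var = (tau_informant / n_informants
                 + sigma2_item / (n_informants * n_items))
    return tau_between / (tau_between + error_var)


# Hypothetical values, chosen only for illustration.
base = neighborhood_reliability(1.0, 4.0, 10.0, n_informants=20, n_items=10)
more_informants = neighborhood_reliability(1.0, 4.0, 10.0, 40, 10)
more_items = neighborhood_reliability(1.0, 4.0, 10.0, 20, 20)
```

Under these assumptions, doubling the number of informants raises reliability more than doubling the number of items, because additional informants reduce both error components while additional items reduce only the item-level component.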
The logic of ecometrics applies similarly when researchers measure neighborhood characteristics through direct observation.
Raudenbush and Sampson (1999a) studied physical and social disorder of Chicago neighborhoods using such observational data. For example, to assess physical disorder, observers coded each city “face block” (one side of a street) on the presence or absence of garbage, broken bottles, abandoned cars, graffiti, cigarette butts, needles or syringes, and condoms. Using a three-level hierarchical logistic regression model, the authors combined responses to these items within a face block and across face blocks within a neighborhood cluster to produce a measure of physical disorder within that neighborhood cluster and to estimate the variance of errors of measurement. The reliability of such a measure will depend on the internal consistency of the items, the number of items, the similarity of face blocks within a neighborhood, the number of face blocks sampled per neighborhood, and the time of day.
Researchers have used such observational measures to study the association of neighborhood disorder with children’s physical activity (Molnar et al. 2004) and with violent crime (Sampson and Raudenbush 1999). McCrea and colleagues (2005) used a survey-based measure of perceived neighborhood disorder to predict fear of crime, while Sampson and Raudenbush (2003) studied the association between observed and perceived disorder, showing that perceptions of disorder are influenced not only by observable disorder but also by the demographic composition of the local area.
Regardless of whether interviews of key informants or direct observations are used to measure a neighborhood social process, budget constraints will impose some limits on reliability of measurement. In the case of interviews, the sample size of informants per neighborhood cluster constrains reliability. Using almost 8000 informants to measure 343 Chicago neighborhoods, Raudenbush and Sampson (1999a) showed that the reliability of measurement of key social processes ranged between 0.70 and 0.85. In the case of direct observation, they showed that the number of face blocks observed per neighborhood imposed the key constraint on reliability. Reliability ranged between 0.70 for social disorder and 0.98 for physical disorder. However, their study, part of the Project on Human Development in Chicago Neighborhoods (PHDCN), sampled about 200 face blocks per neighborhood, requiring a budget that often is unavailable.
In this paper, we consider the problem of bias that arises in estimating the association between a neighborhood social process measured with error and an outcome of interest. Random errors of measurement at the level of the informant, combined with small neighborhood sample sizes, will attenuate the estimated associations. To address this problem, some studies have used empirical Bayes methods to reduce such bias (Sampson, Raudenbush, and Earls 1997; Morenoff, Sampson, and Raudenbush 2001). Using this approach, latent variables of interest are regarded as independently distributed across neighborhoods. The posterior mean of the random effect, given the estimated variance components, is a weighted average of the neighborhood sample mean and the overall mean. Under the model assumptions, this “shrinkage” toward the overall mean eliminates the bias that arises from measurement error of the neighborhood social process.
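The weighted-average form of the exchangeable empirical Bayes estimator can be sketched as follows. This is an illustration under simplifying assumptions, not the estimation procedure of the studies cited: the within-neighborhood variance `sigma2` is treated as known, the between-neighborhood variance is estimated by a simple method of moments, and all names and simulated values are hypothetical.

```python
import random
import statistics


def eb_shrinkage(sample_means, sample_sizes, sigma2):
    """Shrink each neighborhood sample mean toward the overall mean.

    sigma2 is the within-neighborhood variance, ASSUMED known here.
    """
    grand_mean = statistics.fmean(sample_means)
    # Method of moments: var(sample means) = tau^2 + average sampling variance.
    avg_error_var = statistics.fmean([sigma2 / n for n in sample_sizes])
    tau2 = max(statistics.variance(sample_means) - avg_error_var, 0.0)
    estimates = []
    for ybar, n in zip(sample_means, sample_sizes):
        # Weight on the sample mean: its reliability for this neighborhood.
        weight = tau2 / (tau2 + sigma2 / n) if tau2 > 0 else 0.0
        estimates.append(weight * ybar + (1.0 - weight) * grand_mean)
    return estimates


# Simulated neighborhoods with independent latent means (hypothetical values).
rng = random.Random(0)
sigma2 = 16.0
sizes = [rng.randint(5, 30) for _ in range(300)]
truth = [rng.gauss(0.0, 1.0) for _ in sizes]
means = [rng.gauss(t, (sigma2 / n) ** 0.5) for t, n in zip(truth, sizes)]

eb = eb_shrinkage(means, sizes, sigma2)
mse_ols = statistics.fmean([(m - t) ** 2 for m, t in zip(means, truth)])
mse_eb = statistics.fmean([(e - t) ** 2 for e, t in zip(eb, truth)])
```

In this sketch, neighborhoods with few informants receive a small weight on their own sample mean and are pulled more strongly toward the overall mean, which is the behavior the paragraph above describes.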
Given the spatial contiguity of neighborhoods, however, the independence assumption regarding the neighborhood random effects is implausible. In this paper, we show that the precision and predictive validity of ecometric measures can be considerably improved by exploiting the spatial dependence of neighborhood social processes within the framework of empirical Bayes shrinkage. We compare three estimators of a neighborhood social process: the ordinary least squares estimator (OLS), an empirical Bayes estimator based on the independence assumption where random effects are regarded as exchangeable (EBE), and an empirical Bayes estimator that exploits spatial dependence (EBS). Under our model assumptions, EBS performs better than EBE and OLS in terms of expected mean squared error loss. The benefits of EBS relative to EBE and OLS depend on the magnitude of spatial dependence and the degree of neighborhood heterogeneity, as well as a neighborhood’s sample size.
Of course, the superiority of EBS under our model assumptions does not prove that EBS will be useful in practice, because our model assumptions may not hold up in practice. Thus, the failure of these assumptions may negate the expected benefit of EBS. To investigate this possibility, we conduct a cross-validation study using the original 1995 PHDCN data (Sampson, Raudenbush, and Earls 1997) and a replication of that survey that was done in 2002. The results show that the empirical benefits of EBS approximate those expected under our model assumptions.
The empirical Bayes approach adopted in past studies of neighborhood social processes draws upon a long tradition of research on the problem of simultaneous estimation of J means. This problem arises when data are collected on a number of independent groups, but the amount of data on each group is insufficient to estimate the group mean precisely.
Previous research has shown that a “shrinkage” estimator, such as an empirical Bayes estimator, performs better than the sample mean in predicting the population means of independent groups (e.g., Stein 1956; James and Stein 1961; Lindley 1971; Efron and Morris 1972a, 1973, 1975, 1977) despite the fact that the sample mean is the maximum likelihood as well as the uniform minimum variance unbiased (UMVU) estimator of the population mean in each group when the data are normally distributed. This apparent contradiction is known as Stein’s paradox.
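Stein’s paradox can be illustrated with a small simulation. The sketch below assumes the simplest textbook setting, which is not the model used in this paper: one normally distributed observation per group with known unit variance, and the James-Stein estimator that shrinks toward zero. The number of groups, the true means, and the number of replications are hypothetical.

```python
import random


def james_stein(x):
    """James-Stein estimate of J >= 3 means from one unit-variance
    observation per group, shrinking toward zero."""
    J = len(x)
    shrink = 1.0 - (J - 2) / sum(v * v for v in x)
    return [shrink * v for v in x]


rng = random.Random(1)
J, reps = 10, 2000
theta = [rng.gauss(0.0, 1.0) for _ in range(J)]  # hypothetical true means

loss_mle = 0.0
loss_js = 0.0
for _ in range(reps):
    # One observation per group; the observation itself is the MLE.
    x = [rng.gauss(t, 1.0) for t in theta]
    js = james_stein(x)
    loss_mle += sum((a - t) ** 2 for a, t in zip(x, theta))
    loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))

# Average total squared error over replications.
risk_mle = loss_mle / reps
risk_js = loss_js / reps
```

Although each observation is the best unbiased estimate of its own group mean, the total squared error summed over groups is smaller on average for the shrinkage estimator in this setting.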
Statisticians later extended James and Stein’s work from a Bayesian perspective. Lindley (1971) noted that this problem could be seen as estimating the means in the analysis of variance (ANOVA) case. If the ANOVA is approached from a Bayesian perspective, then the mean of the posterior distribution has a form similar to that proposed by James and Stein and is admissible. Jackson, Novick, and Thayer (1971) extended Lindley’s theoretical results and demonstrated their utility on several real data sets. In a later work, Novick et al. (1972) performed a cross-validation study to demonstrate some of the advantages of using the empirical Bayes estimator over the least squares estimator in predicting future grade point averages of students. Efron and Morris (1972a, 1972b, 1973) extended the statistical theory and provided simulation results demonstrating the practical utility of the James-Stein estimator. They also noted that an empirical Bayes estimator is a James-Stein estimator. Efron and Morris (1977) discussed and illustrated Stein’s paradox using various real data sets and provided some cross-validation study results.
The basic idea of Stein’s paradox is that, when estimating a group mean, one can borrow strength from the data on other groups even when these groups are independent.