# «EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE Kumulative Dissertation zur Erlangung des ...»

**Figure 5.9: Mean residuals; (a) between-event residuals, calculated with global parameters; (b) within-event residuals, calculated with global parameters; (c) between-event residuals, calculated with regional parameters; (d) within-event residuals, calculated with regional parameters.**

**5.7 Discussion and Conclusions**

Regional dependence of ground motion scaling is an open research question currently under debate (Douglas, 2009), whose answer has important implications for PSHA. Here, we look at this problem from a different angle. For this purpose, we have developed a ground motion model that can take regional differences in ground motion scaling into account. The coefﬁcients in individual regions can be different, though we assume they are similar, which means that all data points from all regions are used to estimate the coefﬁcients, though with different weights. The degree of the weights depends on the actual differences and the amount of data in the different regions [cf. the Appendix, eq. (5.18)].

The model/approach we have proposed is quite ﬂexible. Here, we have allowed all coefﬁcients to vary between regions. Furthermore, the variances of the global parameter distributions (e.g. a0 and so on, cf. Figure 5.4) are determined by the data. However, these two points are not a must – we could have speciﬁed that some or all regions share coefﬁcients, or assumed that the regional variability of, for example, the scaling of PGA with distance is ﬁxed at some value. However, we did not feel comfortable deciding such an issue – we think that current knowledge does not support doing so. Nevertheless, the model is ﬂexible enough to allow for that possibility.

The results of the analysis regarding regional dependence of PGA scaling with magnitude or distance are somewhat inconclusive, which is to some extent expected, as the model is not intended to answer such questions. In Figure 5.6, we have seen that there are differences in the posterior distributions of the parameters between different regions. These differences are most pronounced for the distance related parameters b.

**Figure 5.10: Mean residuals per region, calculated with global ( ) and regional ( ) parameters: (a) between-event residuals; (b) within-event residuals.**

However, the interpretation of this feature as a physical characteristic might be subject to a caveat. The event related parameters are determined from 227 data points (the number of earthquakes in the dataset), while the record and site related parameters rely on 9,831 and 3,543 data points, respectively (cf. section 5.4). Therefore, the former can only be estimated with a higher degree of uncertainty. Furthermore, there are large differences in the number of records per region (cf. Table 5.1). To a lesser degree, this is also true for the number of earthquakes. This leads to the fact that parameters for regions with a large number of records (e.g. California, Europe, Japan and Taiwan; region ids 1, 5, 8, respectively) can be determined with higher precision than those for regions with a small number of records (e.g. Northeast China, region id 9). It also affects the way the partial data pooling is handled for the different regions (cf. the Appendix section 5.7). Thus, some of the apparent regional differences seen in Figure 5.6 might in part be ascribed to different sizes of the datasets.

To further investigate this issue, we look at the width of the global parameter distributions. Each parameter $\theta_r \in \{a_r, b_r, c_r\}$ is sampled from a global distribution, $\theta_r \sim N(\mu_\theta, \sigma_\theta)$. The parameters of this distribution, $\mu_\theta$ and $\sigma_\theta$, are themselves associated with uncertainty, which is quantified in their posterior distribution. The width of the global parameter distribution, $\sigma_\theta$, is an indicator of regional differences of the parameters. However, since the scale of the coefficients is quite different, one cannot compare the standard deviations directly.

Therefore, we look at the coefficient of variation, which is defined as the standard deviation divided by the mean:

$$\mathrm{cov} = \frac{\sigma_\theta}{\mu_\theta} \qquad (5.17)$$

The higher the coefficient of variation, the larger the width of the distribution relative to its mean. For each sample from the posterior distribution of $\mu_\theta$ and $\sigma_\theta$, we can calculate cov, and thus get a posterior distribution for the coefficient of variation. The modal values of these distributions are shown for each parameter in Figure 5.11.

**Figure 5.11: Modal values of the distribution of the coefficient of variation for the global distributions, calculated as cov = $\sigma_\theta / \mu_\theta$.**
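The calculation described above (one cov value per posterior draw, then the modal value of the resulting distribution) can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names `cov_posterior` and `modal_value`, the histogram-based mode estimate, and the synthetic draws standing in for MCMC output are all assumptions made for the example.

```python
import numpy as np


def cov_posterior(mu_samples, sigma_samples):
    """Return one coefficient of variation per joint posterior draw.

    Follows the definition in the text: standard deviation divided by mean.
    The draws must be paired, i.e. element i of both arrays comes from the
    same MCMC iteration.
    """
    mu = np.asarray(mu_samples, dtype=float)
    sigma = np.asarray(sigma_samples, dtype=float)
    return sigma / mu


def modal_value(samples, bins=50):
    """Approximate the mode as the midpoint of the fullest histogram bin.

    The choice of estimator and of `bins` is an assumption of this sketch.
    """
    counts, edges = np.histogram(samples, bins=bins)
    k = np.argmax(counts)
    return 0.5 * (edges[k] + edges[k + 1])


# Synthetic stand-ins for posterior draws of the global mean and width
# (hypothetical numbers, not results from the dissertation).
rng = np.random.default_rng(0)
mu_draws = rng.normal(2.0, 0.05, size=5000)
sigma_draws = np.abs(rng.normal(0.4, 0.02, size=5000))

cov_draws = cov_posterior(mu_draws, sigma_draws)
cov_mode = modal_value(cov_draws)
```

Because the ratio is computed draw by draw, any posterior correlation between the mean and the width of the global distribution is carried over into the distribution of cov.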

For the distance related parameters, there are low cov-values for b0 and b1, while b2 and b3 are associated with larger values. Since these are based on a comparatively large amount of data (9,831 data points), this indicates that the latter parameters are indeed subject to stronger regional differences. This also makes physical sense: b2 can be interpreted as a ‘pseudo-depth’, representing an average depth of the events in different regions. It can easily be imagined that this differs between regions. The parameter b3, on the other hand, controls the anelastic attenuation with distance. It is especially important for modeling long-distance attenuation and is related to the quality factor Q0. Again, it is physically plausible that this differs between regions. This can also explain the differences seen between Figures 5.9 (b) and 5.9 (d), which show the within-event residuals plotted against RRUP. Using the global parameters, there is a trend visible for large distances, which diminishes when the regional parameters are used. This is an indication that for large distances, regional differences are relevant.

High cov-values are also found for the parameters a4 and a5, which control the scaling for normal and reverse focal mechanism, respectively. Furthermore, the site parameters c1 and c2 are also associated with large cov-values. While this could be interpreted as regional differences, in this case there might be an alternative interpretation. In contrast to the other variables, there are several missing values for both the focal mechanism and VS30. Even though we can deal with them (cf. section 5.5), this leads to an increase in uncertainty. Moreover, the distributions of both the focal mechanism and VS30 are quite uneven over the different regions, which adds further “spread”. Therefore, we think it is not feasible to interpret the parameters a4, a5, c1 and c3 in terms of regional differences.

In Figure 5.11, it can be seen that for the earthquake related parameters (except a4 and a5), the cov-values are low. This would indicate no or only small regional differences. However, as we have already elaborated, the number of events is small (227), and thus the parameters can only be estimated with considerable uncertainty (cf. Figure 5.6). Therefore, even if we find no or only small regional differences in the parameters a0, a1, a2, and a3, it is possible that the amount of data does not suffice to detect such differences.

The above discussion and findings illustrate an important point: It is not a GMM as a whole that is subject to regional differences or not, but certain aspects of it. Here, we find that the scaling of PGA at long distances is regionally dependent, while magnitude scaling is probably not. By supplementing the NGA dataset with data from small to moderate earthquakes, Chiou et al. (2010) observe a difference in scaling with magnitude for small (<6) magnitudes between central and southern California.

Another important aspect of our model is dealing with parameter uncertainty. The parameters of the model are estimated by Bayesian regression, resulting in their posterior distribution given the data D, which reflects the epistemic uncertainty of the parameters. This enables a full probabilistic treatment of the model in PSHA (if desired). The Bayesian approach to parameter estimation also makes it possible to incorporate prior knowledge in a principled way (cf. section 5.5.1). Most often, there is some prior domain knowledge that one can use to confine parameters. This is particularly useful when data are sparse, and hence the parameters are not well constrained by data. However, the specification of the prior is not an easy task. In particular, assigning a probability, i.e. a number between 0 and 1, to a specific parameter value can be difficult.

In this work, we have used stochastic simulations (Boore, 2003) to specify the prior parameter distributions of our model (see section 5.5.1). We believe that this is a reasonable way to incorporate prior knowledge, since it is comparatively easy to specify a distribution over physical parameters such as stress drop or Q0. This is also a good way to combine the output of simulations with regressions from empirical data.

The between-event and within-event variabilities of our model are comparatively large. For example, the total standard deviation for PGA using the global parameters is 0.742. For Europe, the total standard deviation is 0.743, compared to a value of 0.683 for the model of Akkar and Bommer (2010). Part of this difference can be attributed to how the two horizontal components are combined (Beyer and Bommer, 2006). Here, we use the larger horizontal component, while Akkar and Bommer (2010) used the geometric mean. However, the larger issue in this context is probably the limitations of the dataset. There are measurement uncertainties associated with the predictor variables MW, RRUP, VS30 and F, which are larger than e.g. in the NGA dataset. For example, the VS30 values are calculated from topographic slopes, using the method of Wald and Allen (2007). These measurement uncertainties lead to an increased ground motion variability.

In principle, it would be possible to incorporate measurement uncertainties of the predictor variables into the model. However, we have refrained from doing so, since it would complicate an already complicated analysis, and require knowledge about the speciﬁc uncertainties of the data, which is not available in the dataset. To include measurement uncertainties, one would perform Monte Carlo simulations over the possible predictor values. In the graphical model (Figure 5.4), this would correspond to replacing the (deterministic) nodes for the predictor variables with a stochastic node, whose mean and width are the measurement and the associated uncertainty, respectively.
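As a rough illustration of the Monte Carlo idea, the sketch below draws predictor values around their measured values and propagates them through a ground-motion function. Everything in it is assumed for the example: `toy_ln_pga` is a hypothetical placeholder with made-up coefficients (not the model of this chapter), the measurement errors are taken to be Gaussian and independent, and the uncertainty values are invented. It only shows forward propagation; in the full Bayesian treatment the stochastic predictor nodes would be sampled jointly with the model parameters.

```python
import numpy as np


def toy_ln_pga(mw, rrup):
    """Hypothetical placeholder GMM with made-up coefficients."""
    return -1.0 + 0.5 * mw - 1.2 * np.log(np.sqrt(rrup**2 + 5.0**2))


def mc_over_predictors(mw_obs, mw_sd, rrup_obs, rrup_sd, n=10000, seed=0):
    """Propagate assumed Gaussian measurement errors through the toy GMM.

    Each predictor is treated as a stochastic node whose mean is the
    measured value and whose width is the measurement uncertainty.
    Returns the mean and standard deviation of the predicted ln(PGA).
    """
    rng = np.random.default_rng(seed)
    mw = rng.normal(mw_obs, mw_sd, size=n)
    # Distances cannot be negative, so reflect any negative draws.
    rrup = np.abs(rng.normal(rrup_obs, rrup_sd, size=n))
    y = toy_ln_pga(mw, rrup)
    return y.mean(), y.std(ddof=1)


# Invented measurement uncertainties, for illustration only.
mean_ln, sd_ln = mc_over_predictors(6.0, 0.2, 30.0, 3.0)
```

In this sketch, the spread `sd_ln` is variability in the prediction that comes purely from uncertain predictor values, which is the kind of contribution that inflates the apparent ground motion variability when measurement uncertainty is ignored.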

The above paragraph emphasizes an important aspect of graphical models: their extensibility and flexibility. It is very easy to extend the model by adding nodes that might describe new aspects of the domain. Due to the factorization properties of graphical models, only local conditional probabilities need to be specified or changed. The graphical structure also provides an intuitive insight into the model and serves as a proxy for the data generating process. In particular, it is easy to change the model by moving around nodes (easier than changing equations, which can get messy). For example, moving one node out of the region plate means that this parameter is not regionally dependent.

In the future, one can think of extending a graphical model such as depicted in Figure 5.4 to make the magnitudes distributed according to a Gutenberg-Richter distribution. The parameters a and b of the GR-relation can then be nodes in the model, capturing their uncertainty. That way, one can build a graphical model to calculate the hazard, with all associated uncertainties.

**Data and Resources**

The dataset used in this study is the one compiled by Allen and Wald (2009). Information about the records can be found in the electronic supplement. The MCMC sampling was done using the software OpenBUGS (http://www.openbugs.info/w/), version 3.06.

**Electronic Supplement**

The electronic supplement to this document contains information about the earthquakes and records used, as well as plots of the histograms of the sampled posterior distributions of the parameters. It is available at http://www.geo.uni-potsdam.de/mitarbeiter/Kuehn/kuehn-esupp-bayesreg.html

**Acknowledgements**

Trevor Allen publishes with the permission of the Chief Executive Officer of Geoscience Australia.

**Appendix: Multi-level Modeling**

In this work, we assume that there is a GMM for each region r, which is sampled from a “global” distribution of GMMs (strictly speaking, there is a global distribution for each coefficient). This is an example of a multilevel model. Usually, if one has data from different groups, there are two natural ways to deal with it: complete pooling, where all data is lumped together and inference is made on the whole dataset, and no pooling, where all groups are treated separately and inference is made individually for each group. The first approach can lead to problems if there are differences between the groups, while in the second approach some groups may not contain enough data to make reliable inferences.

By contrast, a multilevel model provides a compromise between these two extremes, partial pooling. Here, data from all groups is used for inference in the individual groups, but with different weight. As an example, imagine that we have measurements on a variable from different groups/regions, and we want to estimate the means of this variable. It is straightforward to compute the overall mean, $\bar{y}_{\mathrm{all}}$ (the pooled estimate), as well as the unpooled estimate for each region, $\bar{y}_r$. The multilevel estimate for region $r$ can then be approximated as a weighted average of the two (Gelman and Hill, 2007):

$$\hat{\alpha}_r \approx \frac{\dfrac{n_r}{\sigma_y^2}\,\bar{y}_r + \dfrac{1}{\sigma_\alpha^2}\,\bar{y}_{\mathrm{all}}}{\dfrac{n_r}{\sigma_y^2} + \dfrac{1}{\sigma_\alpha^2}} \qquad (5.18)$$

where $n_r$ denotes the number of measurements in region $r$, $\sigma_y^2$ is the within-region variance, and $\sigma_\alpha^2$ is the variance among the average values of the regions.

If the number of data in a region, $n_r$, is large, then the first term in eq. (5.18) carries more weight, and the partially pooled estimate will be close to the unpooled one. Conversely, if $n_r$ is small, the second term outweighs the first, and the region estimate is close to the global one.
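The weighting behaviour described above can be sketched numerically. The example assumes the standard precision-weighted form of the partially pooled mean given by Gelman and Hill (2007): the unpooled region mean is weighted by $n_r$ divided by the within-region variance, and the overall mean by one over the between-region variance. The function name `partial_pool_means`, the simple plug-in variance estimates, and the synthetic data are assumptions of this sketch; in the Bayesian model of this chapter the variances are estimated jointly with all other parameters.

```python
import numpy as np


def partial_pool_means(groups):
    """Approximate partially pooled means for a list of 1-D samples.

    Needs at least two groups, and at least one group with more than
    one measurement, so that both variances can be estimated.
    Returns (partially pooled means, unpooled means, overall mean).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([g.size for g in groups])
    ybar = np.array([g.mean() for g in groups])   # unpooled estimates
    ybar_all = np.concatenate(groups).mean()      # pooled estimate

    # Plug-in within-region variance (pooled over all regions).
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    sigma2_y = ss_within / (n.sum() - len(groups))
    # Crude plug-in between-region variance: variance of the region means.
    sigma2_a = ybar.var(ddof=1)

    w_data = n / sigma2_y     # weight of the region's own mean
    w_global = 1.0 / sigma2_a  # weight of the overall mean
    pooled = (w_data * ybar + w_global * ybar_all) / (w_data + w_global)
    return pooled, ybar, ybar_all


# Synthetic regions with very different amounts of data (made-up numbers).
rng = np.random.default_rng(1)
big = rng.normal(1.0, 1.0, size=500)
mid = rng.normal(0.0, 1.0, size=50)
small = rng.normal(-1.0, 1.0, size=3)
pooled, unpooled, overall = partial_pool_means([big, mid, small])

# Fraction of the way each region mean is pulled toward the overall mean.
shrink = (pooled - unpooled) / (overall - unpooled)
```

The region with only three measurements is pulled much further toward the overall mean than the region with 500, which is the behaviour that affects sparsely sampled regions in the dataset.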

In eq. (5.18), mean values of variables in different regions are estimated. It is straightforward to generalize it to cases where not only means, but also regression coefﬁcients are estimated. For more details, see Gelman and Hill (2007).