# «EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE Kumulative Dissertation zur Erlangung des ...»

For each target variable, our GMM has 12 parameters [cf. eqs. (4.17) to (4.19)]. This gives a total of 60 parameters for the five target variables. In addition, there are 15 independent entries in each of the two covariance matrices. Hence, in total there are 90 parameter posterior distributions to estimate. We also compare the results with other simulations (see section 4.6), using different priors and/or no correlation, which further increases the number of parameters. We believe that it is important to show plots and comparisons of all posterior distributions. However, this would inflate the paper unnecessarily, so they are made available online in the electronic supplement.
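The parameter count above can be checked with a few lines of arithmetic. The sketch below assumes that the two covariance matrices are the symmetric 5 x 5 between-event and within-event matrices discussed later in the text; the variable and function names are illustrative only.

```python
# Count the posterior distributions to estimate, using the numbers in the text.
n_targets = 5              # PGA, PGV and PSA at 0.3 s, 1 s and 3 s
n_coeffs_per_target = 12   # coefficients per target, eqs. (4.17) to (4.19)
n_cov_matrices = 2         # assumed: between-event and within-event covariance


def independent_entries(dim: int) -> int:
    """Number of free entries of a symmetric dim x dim matrix."""
    return dim * (dim + 1) // 2


n_coeffs = n_targets * n_coeffs_per_target
n_cov = n_cov_matrices * independent_entries(n_targets)
n_total = n_coeffs + n_cov

print(n_coeffs, n_cov, n_total)  # 60 30 90
```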

In Figure 4.6, the approximated posterior distribution of the parameter aI is shown, which controls the scaling of PGA with magnitude [cf. eq. (4.17)]. The prior distribution of aI is also displayed in Figure 4.6. As one can see, the prior and posterior distributions differ, which illustrates how the former is updated by the data. One can also see in Figure 4.6 that the posterior distribution of aI given the data D is approximately normal. Hence, it can be fully described by the sufficient statistics, its mean and standard deviation. In the electronic supplement, we provide plots of the posterior histogram and prior distribution for all parameters. There it can be seen that all posterior distributions can be considered approximately normal, and hence can be described by their means and standard deviations.

**Figure 4.7: Plot of median values and 5% and 95% quantiles for posterior parameter distributions, which are rescaled to range between -1 and 1.**

For each parameter, five intervals are shown, corresponding to the different targets PGA (lowermost interval), PGV and PSA at 0.3s, 1s and 3s (uppermost interval).

In Table 4.3, the means of the posterior distributions for each parameter are listed. Correspondingly, Table 4.4 contains the respective standard deviations, which allow one to assess the uncertainty of each parameter. A 90% credible interval for each parameter is shown in Figure 4.7, where the 5%, 50% and 95% quantiles of each posterior distribution are plotted. Since the magnitude of each parameter is different, the posterior distributions are rescaled to range between -1 and 1 for better comparison.
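As an illustration of how such a 90% credible interval could be obtained from MCMC output, the following sketch computes the 5%, 50% and 95% quantiles of rescaled samples. The rescaling convention is not spelled out in the text; min-max scaling of the samples of each parameter is an assumption made here, and the samples themselves are synthetic stand-ins for posterior draws.

```python
import random


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac


def rescale(xs):
    """Min-max rescaling of samples to [-1, 1] (assumed convention)."""
    lo, hi = min(xs), max(xs)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in xs]


def credible_summary(samples):
    """Return the 5%, 50% and 95% quantiles of the rescaled samples."""
    scaled = sorted(rescale(samples))
    return tuple(quantile(scaled, q) for q in (0.05, 0.5, 0.95))


# Synthetic stand-in for the MCMC draws of a single parameter (hypothetical values).
rng = random.Random(0)
samples = [rng.gauss(1.2, 0.1) for _ in range(5000)]

q05, q50, q95 = credible_summary(samples)
print(f"5%: {q05:.3f}  50%: {q50:.3f}  95%: {q95:.3f}")
```

The interval between the 5% and 95% quantiles contains 90% of the posterior probability mass, which is what the intervals in Figure 4.7 represent.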

The posterior distributions are unimodal and approximately symmetric (as described above, they can even be considered approximately normal). Hence, the mean of each distribution approximately coincides with its mode, and we can use the mean values as maximum a posteriori (MAP) point estimates. These estimates can then be used to obtain point predictions. In Figure 4.8, we show the residuals between such point predictions and the data.

Both between-event and within-event residuals are shown. As one can see, there is no obvious trend of the residuals with magnitude or distance. However, the event terms are slightly overpredicted. We associate this bias with the prior distribution (see also the discussion). There are only 159 data points to “pull away” the distributions of the event-related parameters (a) from the prior, while there are 7,957 for the record-related ones (b). As a result, the prior distribution has a stronger influence on the former than on the latter. Nevertheless, the bias is small, and in general the model predicts the data well.


**Figure 4.9: Correlation between different ground-motion predictor variables, gray shaded from white (0) to black (1).**

(a) Between-event correlation. (b) Within-event correlation.

**4.6 Discussion and Conclusions**

We have obtained a ground motion model for the target variables PGA, PGV and PSA at 0.3s, 1s and 3s. Since the model is multivariate, we can estimate the covariance structure, i.e. the correlations between the five targets, directly during the learning phase. In particular, the model directly provides an assessment of both between-event and within-event covariance. By contrast, studies that investigate correlations between different ground motion intensity parameters usually consider total residuals (though recently between-event and within-event residuals have also been taken into account (Baker and Jayaram, 2008)).

**Table 4.3: Mean values of posterior distributions for the parameters.**

In general, our analysis shows that the behavior of between-event and within-event correlation with period is similar, which is in agreement with the findings of Baker and Jayaram (2008). However, between-event correlation is generally slightly larger than within-event correlation (cf. Figure 4.9).

The largest correlation coefficients are the ones between PGA and PSA at 0.3s, and between PGV and PSA at 1s, while there is less correlation between PGA/PGV and PSA at 3s. In general, with decreasing period there is an increase in correlation with PGA, which is expected. The response spectral value at 1s has also sometimes been used to calculate PGV if it is not available (cf. Newmark and Hall, 1982), which has been questioned lately (Bommer and Alarcón, 2006). Our results do not support one view or the other exclusively; e.g., there is also significant correlation between PGV and PSA at 0.3s.
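The correlation coefficients discussed here follow from a covariance matrix by dividing each entry by the product of the two corresponding standard deviations. The sketch below shows this conversion for a small matrix; the 3 x 3 covariance values and the target labels in the comments are hypothetical and are not taken from the tables of this study.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical covariance of three targets, e.g. PGA, PSA at 0.3 s, PSA at 3 s.
cov = [
    [0.25, 0.20, 0.10],
    [0.20, 0.36, 0.15],
    [0.10, 0.15, 0.49],
]

corr = cov_to_corr(cov)
for row in corr:
    print("  ".join(f"{value:.3f}" for value in row))
```

Applied separately to the between-event and the within-event covariance matrix, this yields the two correlation patterns that Figure 4.9 displays in gray shades.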

To investigate how learning a multivariate model influences the coefficients, we also compute posterior distributions of the parameters under the assumption that the targets are independent, as is commonly done in the derivation of GMMs. To assess the differences between the posterior distributions with and without covariance, we calculate their symmetric Kullback-Leibler (KL) divergence, a measure of the relative information loss when one probability distribution is replaced with another (see Scherbaum et al. (2009) for the use of KL-divergences in GMM selection). The KL-divergences for the parameters are shown in Figure 4.10.
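Because the posterior distributions are approximately normal, a symmetric KL divergence between two of them can be sketched in closed form from their means and standard deviations. The code below assumes univariate normal approximations and takes the symmetrized divergence as the sum KL(p || q) + KL(q || p); whether the study uses the sum, the average, or a sample-based estimate is not stated here, and the numerical inputs are hypothetical.

```python
import math


def kl_normal(mu_p, sd_p, mu_q, sd_q):
    """KL(p || q) for univariate normals p = N(mu_p, sd_p^2), q = N(mu_q, sd_q^2)."""
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


def symmetric_kl_normal(mu_p, sd_p, mu_q, sd_q):
    """Symmetrized divergence, taken here as KL(p || q) + KL(q || p)."""
    return kl_normal(mu_p, sd_p, mu_q, sd_q) + kl_normal(mu_q, sd_q, mu_p, sd_p)


# Hypothetical posterior summaries (mean, standard deviation) of one coefficient,
# estimated with and without covariance between the targets.
with_cov = (1.20, 0.10)
without_cov = (1.25, 0.12)

divergence = symmetric_kl_normal(*with_cov, *without_cov)
print(f"symmetric KL divergence: {divergence:.4f}")
```

The divergence is zero only when both means and both standard deviations agree, so it reacts to shifts in location as well as to changes in spread.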

The largest values are obtained for the site effects coefficients cI and cP. Regarding the other parameters, the KL-divergences are slightly larger for the record-specific parameters b than for the event-specific parameters a. The variances, i.e. the diagonal elements of the two covariance matrices, are very similar to the variances computed under the independence assumption. In the electronic supplement, we provide plots of all parameter posterior distributions, both with and without covariance, for comparison.

An important point in Bayesian inference is the use of the prior distribution, which provides a principled way to incorporate prior knowledge [cf. eq. (4.5)]. In section 4.4.3, we described how we derived the prior distributions for our parameters. These priors are also used for the calculation of posterior distributions under the assumption of target independence (Figure 4.10). To investigate the influence of the prior, we also calculate parameter posterior distributions using a relatively flat uniform prior, assuming independence of the targets. Plots of these posterior distributions are shown in the electronic supplement. In Figure 4.11, the symmetric KL-divergences between posterior distributions with a uniform and normal prior, respectively, are shown. In contrast to the differences between the posterior distributions with and without covariance (Figure 4.10), we now get much larger values for the KL-divergences. This is due not only to differences in the mean of the distributions, but also in the spread (see electronic supplement). The largest KL-divergences are obtained for the event-specific parameters a. This is because these parameters are essentially determined by only 159 data points (the number of earthquakes in the dataset), so the prior plays a larger role than for the record- or site-specific parameters.
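The stronger role of the prior for parameters constrained by fewer data points can be illustrated with a deliberately simplified conjugate example: the posterior mean of a normal mean with known noise standard deviation and a normal prior. This is not the hierarchical model of this study, which is estimated by MCMC; only the counts 159 and 7,957 come from the text, while the standard deviations and function names are hypothetical.

```python
def prior_weight(n_obs, prior_sd, noise_sd):
    """Weight of the prior mean in the posterior mean (conjugate normal update).

    The model is y_i ~ N(theta, noise_sd^2) with prior theta ~ N(m0, prior_sd^2).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_obs / noise_sd ** 2
    return prior_precision / (prior_precision + data_precision)


def posterior_mean(sample_mean, n_obs, prior_mean, prior_sd, noise_sd):
    """Precision-weighted average of the prior mean and the sample mean."""
    w = prior_weight(n_obs, prior_sd, noise_sd)
    return w * prior_mean + (1.0 - w) * sample_mean


# Counts from the text; the two standard deviations are hypothetical.
N_EVENTS, N_RECORDS = 159, 7957
PRIOR_SD, NOISE_SD = 0.5, 0.7

w_event = prior_weight(N_EVENTS, PRIOR_SD, NOISE_SD)
w_record = prior_weight(N_RECORDS, PRIOR_SD, NOISE_SD)
print(f"prior weight with {N_EVENTS} observations:  {w_event:.5f}")
print(f"prior weight with {N_RECORDS} observations: {w_record:.5f}")
```

In this toy setting the weight of the prior shrinks roughly in proportion to the number of observations, which mirrors the qualitative argument made for the event-specific parameters.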

However, even though the parameters depend on the prior, this is not a disadvantage of the model per se; in fact, it is to be expected. While the dataset of Allen and Wald (2009), which underlies our model, is quite extensive, it is by no means exhaustive. Hence, prior information can help to constrain the model. We believe that the prior we use, described in section 4.4.3, is reasonable, which in turn leads to a reasonable posterior parameter distribution. Nevertheless, model checking is very important, and we have seen in Figure 4.8 that the model predicts the data reasonably well.

The variances of the target distributions have very similar posterior distributions when computed from different priors (see electronic supplement). The distributional parameters of the covariances are listed in Tables 4.5 to 4.8. The variances of the marginal target distributions, i.e. the diagonal elements of the covariances, are relatively large. For example, the total standard deviation for PGA, calculated using the means of the respective posterior distributions (Tables 4.5 and 4.7), is 0.824, compared to a value of 0.683 for the model of Akkar and Bommer (2010). A small part of this large standard deviation results from differences in how the two horizontal components are combined (Beyer and Bommer, 2006). In our study, the larger horizontal component is used, while Akkar and Bommer (2010) used the geometric mean. However, a major issue in this context is the limitations of the dataset. There are measurement uncertainties associated with the predictor variables MW, RRUP, VS30 and FM, which are larger than e.g. in the NGA dataset. For example, the VS30 values are calculated from topographic slopes, using the method of Wald and Allen (2007). These measurement uncertainties lead to an increased ground motion variability.
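A total standard deviation of this kind is conventionally obtained by adding the between-event and within-event variances of a target and taking the square root. The sketch below shows that step together with the ratio of the two standard deviations quoted in the text; the variance values are hypothetical placeholders, since the entries of Tables 4.5 and 4.7 are not reproduced here.

```python
import math


def total_sigma(between_var, within_var):
    """Total standard deviation from between-event and within-event variances."""
    return math.sqrt(between_var + within_var)


# Hypothetical variances for one target (not the values of Tables 4.5 and 4.7).
between_var, within_var = 0.09, 0.16
sigma_example = total_sigma(between_var, within_var)
print(f"total standard deviation (hypothetical input): {sigma_example:.3f}")

# Values quoted in the text: this study versus Akkar and Bommer (2010).
SIGMA_THIS_STUDY, SIGMA_AKKAR_BOMMER = 0.824, 0.683
ratio = SIGMA_THIS_STUDY / SIGMA_AKKAR_BOMMER
print(f"ratio of standard deviations: {ratio:.3f}")
print(f"ratio of variances:           {ratio ** 2:.3f}")
```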

Another possible source of the increased variability could be the fact that the underlying dataset is a global one, and thus combines data from different regions. Regional dependence of strong ground motions is still an open question (Douglas, 2009), but if there are differences in ground motion scaling between different regions that are not considered, these differences will map into the total ground motion variability.

We have built our GMM as a graphical model (cf. section 4.3). Graphical models provide a framework for reasoning under uncertainty by associating each node with a probability distribution. The full joint distribution is determined by eq. (4.8). Inferences from the model can be made either by using point estimates of the parameters (such as means, medians or modes) or by sampling from the joint distribution. The factorization property of graphical models [eq. (4.8)] means that only local conditional probabilities need to be specified. This makes it very easy to extend the model by incorporating additional nodes that might describe new aspects of the domain. Graphical models also offer great flexibility; e.g., it is possible to model different functional forms directly in order to incorporate epistemic model uncertainty. This is similar to a logic tree, but the weights on the functions are represented by a node and can be learned. It is also straightforward to extend the graphical model to include regional differences, where some regions might share certain coefficients.

In combination with graphical models, the Bayesian approach offers vast possibilities for reasoning under uncertainty. For one, the outcome of Bayesian inference is the conditional distribution of the parameters given the data [eq. (4.5)], which is exactly what we are interested in. In particular, strictly speaking we do not infer just one model, but a distribution of models. This makes it possible to quantify and incorporate (epistemic) uncertainties of the parameters in PSHA. The Bayesian approach also allows us to include prior knowledge in a principled way. This is especially important in the derivation of GMMs, since data is generally sparse, so other constraints need to be set.

In the future, we plan to extend our model to a general graphical hazard model. To this end, additional functional forms should be included, as well as measurement uncertainties. This also requires additional nodes that describe the magnitude and distance distributions. The Bayesian method then allows the model to be updated easily once new data is acquired.

The dataset underlying our GMM covers moment magnitudes from 5 to 7.9 and rupture distances from 1 to 400 km, representing global active conditions. Its VS30 range is 200-1000 m/s. We believe that the model is valid in this range, though it should be used with caution at the boundaries (cf. Bommer et al., 2007).

**Data and Resources**

The dataset used in this study is the one compiled by Allen and Wald (2009). Information about the records can be found in the electronic supplement. The MCMC sampling was done using the software OpenBUGS (http://www.openbugs.info/w/), version 3.06.

**Electronic Supplement**

The electronic supplement to this paper is available at http://www.geo.uni-potsdam.de/mitarbeiter/Kuehn/kuehn-esupp-bayesregpsa.html.

**Acknowledgments**

Trevor Allen publishes with the permission of the Chief Executive Officer of Geoscience Australia.