# «EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE Kumulative Dissertation zur Erlangung des ...»

When breaking down the data into the three entities, every earthquake or site is used only once in the learning process. This means that we use 154 data points for earthquake related variables such as magnitude or mechanism and 1314 for site related variables, compared to 3342 data points for PGA or distance, which belong to the measurement entity. However, 154 data points might be sufﬁcient only to discover strong dependencies, as we have seen in section 3.4. This is the reason why from the earthquake related variables, only the magnitude is connected to PGA by the structure learning algorithm. However, since it is known that fault mechanism has an (albeit small) effect on the distribution of PGA (Bommer et al., 2003), we have added this arc as expert knowledge. The direction of the arc was chosen so as not to create a v-connection. The other arcs are all learned. Thus, the connections in Figure 3.8 represent the statistical dependencies of several ground-motion related variables that are currently supported by the data.

An important feature of the learned network is that there is no learned direct arc connecting PGA and $V_{S30}$. However, the two variables are not uncorrelated: the effect of $V_{S30}$ on PGA is mediated by the depth to a shear wave horizon of 2.5 km/s. This might be an indication that $V_{S30}$ is not the best predictor variable characterizing site effects. Hence, the use of other proxies for site effect characterization should be considered, as has been advocated lately (Castellaro et al., 2008).

One should keep in mind that the structure of the learned BN, as displayed in Figure 3.8, represents the current state of dependencies supported by the data. With more incoming data, the structure might (probably will) change. Also the parameters of the BN will change with increasing data, as more earthquakes will provide additional information on the interactions between earthquake, site and ground-motion parameters.

Another interesting feature of BNs is their extensibility. By adding nodes for e.g. the b-value or other ground-motion parameters like PSA, one could arrive at a full BN “hazard calculator” that takes into account all relevant variables and their corresponding uncertainties. For example, it is straightforward to calculate the hazard curve from the conditional distribution of the ground-motion parameter under consideration. A step further down the line would be to extend the BN to a decision support tool, as is done for example in tsunami early warning (Blaser et al., 2009), medical diagnosis (e.g. Nikovski, 2000) and many other fields. However, we acknowledge that this goal requires a lot of work in many fields. In this work, we have concentrated on learning a BN purely from data, both structure and parameters, which might be considered an early step towards these goals. This allowed us to assess which probabilistic (in)dependencies are actually supported by the available data. Future steps will include the combination of theoretical and empirical considerations, as we have seen that some ranges of the BN are only sparsely covered by data, which can lead to unphysical behavior if we rely only on the data.

**3.7 Conclusions**

We have presented a BN approach for the derivation of ground-motion models that directly estimates the joint probability distribution of several parameters related to the ground-motion domain in seismic hazard analysis. Directly modeling the joint probability of earthquake, site, and ground-motion parameters gives insight into the data generating process hardly available otherwise. Since we use a Bayesian approach, the model we get is the maximum a posteriori model, i.e. the “most probable model given the data”. Our results show that PGA is directly influenced by the magnitude, the Joyner-Boore distance, the source-to-site azimuth, the depth to a shear wave horizon of 2.5 km/s, and the fault mechanism. All other effects are mediated by one of these parameters. In particular, $V_{S30}$ affects the distribution of PGA only indirectly.

**Data and Resources**

Ground-motion data used in this study were compiled for the NGA project. Data and accompanying information can be downloaded from http://peer.berkeley.edu/nga (last accessed September 2007).

**Electronic Supplement**

A table with information on the records used in this study is available online at http://www.geo.uni-potsdam.de/mitarbeiter/Kuehn/kuehn-esupp.html. The Bayesian network can be downloaded from http://www.geo.uni-potsdam.de/mitarbeiter/Kuehn/kuehn-esupp.html.

**Acknowledgements**

Our implementation of Bayesian networks is based on the SMILE reasoning engine for graphical probabilistic models by the Decision Systems Laboratory, University of Pittsburgh (http://dsl.sis.pitt.edu). We would like to thank Yahya Bayraktarli for comments on an early draft of the manuscript. We also thank the editor Andrew J. Michael and two anonymous reviewers for helpful comments that clariﬁed the manuscript.

## A BAYESIAN GROUND-MOTION MODEL WITH CORRELATION OF GROUND MOTION INTENSITY PARAMETERS

Kuehn, N. M., C. Riggelsen, F. Scherbaum, and T. I. Allen, submitted to Bulletin of the Seismological Society of America

We present a Bayesian ground motion model that directly estimates both coefficients and the correlation between different ground motion intensity parameters. To this end, we set up a graphical model which mimics our assumptions about the data generating process, i.e. which includes a source, path and station term. For each term, coefficients to predict the median of the intensity parameter distribution can be estimated, together with the associated covariance structure (i.e. between-event and within-event variability plus correlation coefficients). The graphical structure provides an easy, qualitative and intuitive insight into the model. The coefficients of the model are estimated in a Bayesian framework using Markov Chain Monte Carlo simulation. Thus, prior information can be included in the model in a principled way, and an estimate of the epistemic uncertainty of the parameters is provided. It also makes it easy to update the model once new data become available. The parameters of the model are estimated on a global dataset using peak ground acceleration, peak ground velocity and the response spectrum at three periods as the target variables. All target variables are correlated, to varying degrees.

**4.1 Introduction**

Ground motion models (GMMs), also often called ground motion prediction equations (GMPEs), play a crucial role in probabilistic seismic hazard analysis (PSHA). Uncertainty in the estimation of the ground motion parameter of interest, e.g. peak ground acceleration (PGA) or spectral accelerations, is one of the key factors that controls the exceedance frequency for a given ground motion value (e.g. Bommer and Abrahamson, 2006). There exists a wide variety of ground-motion models for different seismic provinces (shallow active tectonics, subduction zones and intraplate regions) and different regions in the world (e.g. California, Japan, Europe). There also exist many different functional forms that are employed to model the dependence of ground motions on predictor variables such as magnitude, distance or site effects. For a review of published GMMs, see Douglas (2003, 2006, 2008).

In technical terms, a GMM quantifies the conditional probability of a ground motion parameter $Y$ given some earthquake and site related parameters $X$, $\Pr(Y \mid X)$. In this context, it is usually assumed that the ground motion parameter $Y$ is log-normally distributed, which leads to the following model:

$$\log Y = f(X) + \Delta, \tag{4.1}$$

where $\Delta$ describes the total variability of the ground motion, which is usually decomposed into between-event variability, $\Delta B$, and within-event variability, $\Delta W$, which are independent of each other. Both $\Delta B$ and $\Delta W$ are normally distributed with mean zero and standard deviations $\tau$ and $\phi$, respectively. Here, we follow the notation proposed by Al Atik et al. (2010) for the description of the variability of GMMs. We can rewrite eq. (4.1) to emphasize the probabilistic nature of ground motion as

$$\log Y \sim \mathcal{N}\left(\mu = f(X),\ \sigma = \sqrt{\tau^2 + \phi^2}\right), \tag{4.2}$$

which reads as “$\log Y$ is distributed according to a normal distribution with mean $\mu = f(X)$ and standard deviation $\sigma = \sqrt{\tau^2 + \phi^2}$”.
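As a rough illustration of eq. (4.2), the following Python sketch combines the between-event and within-event standard deviations into a total standard deviation and evaluates the probability that the ground motion parameter exceeds a given level. It assumes that "log" denotes the natural logarithm. The function names, the argument names `tau` and `phi`, and all numerical values are chosen here for illustration only and are not taken from any published model.

```python
import math


def total_sigma(tau: float, phi: float) -> float:
    """Total standard deviation from between-event (tau) and within-event (phi) parts."""
    return math.sqrt(tau ** 2 + phi ** 2)


def prob_exceedance(y: float, mean_ln_y: float, tau: float, phi: float) -> float:
    """P(Y > y) when ln Y is normal with mean mean_ln_y and the total sigma."""
    sigma = total_sigma(tau, phi)
    z = (math.log(y) - mean_ln_y) / sigma
    # Survival function of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical scenario: median PGA of 0.1 g, tau = 0.3, phi = 0.5 (illustrative values).
mean_ln_pga = math.log(0.1)
for level in (0.05, 0.1, 0.2, 0.4):
    p = prob_exceedance(level, mean_ln_pga, tau=0.3, phi=0.5)
    print(f"P(PGA > {level} g) = {p:.3f}")
```

At the median itself the exceedance probability is 0.5, and it decreases monotonically for larger levels. A larger total standard deviation raises the exceedance probability of levels above the median, which is why the variability has such a strong effect on the hazard.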

When dealing with GMMs in PSHA, epistemic uncertainty is commonly taken into account by selecting more than one GMM, which are then combined within a logic tree framework (e.g. Bommer et al., 2005). The open problems are which models to select and how to assign the weights for the logic tree (e.g. Bommer and Scherbaum, 2005). These issues, however, are not the concern of the present work but are treated elsewhere (Cotton et al., 2006; Bommer et al., 2010; Scherbaum et al., 2004a, 2009). Here, we are concerned with the epistemic uncertainty that is intrinsic to a specific GMM.

Usually, one gets a point estimate of the parameters when estimating a GMM, i.e. a single value for each coefficient. Even with the best strong motion datasets currently available (e.g. the NGA dataset (Power et al., 2008; Chiou et al., 2008)), it is obvious that there is uncertainty associated with these parameter estimates. These uncertainties can be quantified by the respective standard errors. However, these do not lend themselves easily to a probabilistic interpretation. Here, we want to consider the aforementioned uncertainties by using a Bayesian approach. This results in a posterior probability distribution for the parameters which reflects their uncertainty, given our present state of knowledge and the currently available data. A beneficial feature of the Bayesian approach to the estimation of GMMs is also that it allows for an easy update of the model once new data are available. The Bayesian approach has been used in e.g. Ordaz et al. (1994) or Wang and Takada (2009) for the prediction of seismic ground motion. Recently, Arroyo and Ordaz (2010a,b) have presented a study where they compare the relative merits of maximum-likelihood (ML) and Bayesian regression. They come to the conclusion that the Bayesian approach leads to better results than ML, in particular when data are sparse.

Traditionally, GMMs are derived separately for one ground motion parameter as the target variable, which is often PGA or pseudo-spectral acceleration (PSA) at discrete periods. However, it has been recognized that ground motion parameters recorded at one station are not independent of each other (e.g. Baker and Cornell, 2006). If this is not taken into account during PSHA and subsequent reliability analysis, it can lead to misleading or wrong results (e.g. Baker, 2007).

Normally, the correlation between ground motion intensity parameters is investigated using the residuals given a GMM that was estimated separately for each parameter. By contrast, here we directly develop a model for all target variables under consideration which takes into account the covariance between these parameters. Thus, our work is similar to Arroyo and Ordaz (2010a,b), who investigate a multivariate Bayesian regression model for ground motions. Our model differs from their approach in the design of the covariance structure, which we set up as a multilevel model, while Arroyo and Ordaz (2010a,b) follow Joyner and Boore (1993, 1994). Both ways are very similar, but the multilevel model allows higher computational flexibility.
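The conventional residual-based procedure mentioned above can be sketched in a few lines of Python. This is only an illustration of that baseline, not of the multilevel model developed in this chapter. The residual values below are hypothetical numbers made up for the example; in practice they would be the differences between observed and predicted log ground motions from two separately fitted GMMs at the same records.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical residuals (observed minus predicted log values) for the same records.
res_pga = [0.21, -0.35, 0.10, 0.48, -0.12, -0.40]
res_pgv = [0.15, -0.28, 0.02, 0.55, -0.20, -0.31]

rho = pearson(res_pga, res_pgv)
print(f"estimated correlation between PGA and PGV residuals: {rho:.2f}")
```

A drawback of this two-step procedure is that the uncertainty of the separately estimated coefficients does not propagate into the correlation estimate, which is one motivation for estimating coefficients and covariance jointly.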

We develop our model in the framework of probabilistic graphical models (see e.g. Koller and Friedman, 2009). These provide a general framework for reasoning under uncertainty, which can be exploited for use in PSHA. For example, it is possible to model measurement uncertainties or even different functional forms in the graphical model framework. Due to their modular structure, they are also easy to extend.

**4.2 Introduction to Bayesian Inference**

Bayesian inference is a key concept in our analysis. Therefore, we deem it necessary to provide a brief, though non-exhaustive, introduction to the underlying principles of Bayesian inference/regression. A good overview of Bayesian statistics is presented by Spiegelhalter and Rice (2009, available online at http://www.scholarpedia.org/article/Bayesian_statistics). For a more thorough introduction, see e.g. Gelman et al. (2003).

A key notion of Bayesian statistics is a proper treatment of (epistemic) uncertainty in terms of probabilities. As such, the goal of Bayesian inference is not to estimate one particular model, but rather a distribution of (likely) models. Therefore, all information/belief that we have about the physics of the problem at hand is specified in terms of a probability distribution defined on the parameters involved. This distribution is the so-called prior distribution, which is subsequently updated given data using Bayes’ law. In the following, Bayesian inference is illustrated by means of a simple regression example.

Imagine that we have data, $D$, on two variables, $X$ and $Y$, with $i = 1, \ldots, N$ samples. We assume that there is a linear dependency between $X$ and $Y$,

$$Y_i = w_0 + w_1 X_i + \epsilon_i, \tag{4.3}$$

where $\epsilon$ is the error term, which is normally distributed with mean 0 and standard deviation $\sigma$. This defines a classical regression problem, which can be solved using e.g. maximum likelihood, giving us a point estimate of the parameters $w_0$, $w_1$ and $\sigma$.
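To make the classical point estimate concrete, the following Python sketch fits the linear model of eq. (4.3) by maximum likelihood using the closed-form least-squares solution. The function name `fit_linear_ml`, the synthetic data and the "true" parameter values (intercept 1.0, slope 2.0, standard deviation 0.5) are assumptions made for this example only.

```python
import math
import random


def fit_linear_ml(x, y):
    """Maximum-likelihood estimates (intercept, slope, sigma) for y = w0 + w1*x + noise."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    residuals = [yi - (w0 + w1 * xi) for xi, yi in zip(x, y)]
    # The ML estimate of sigma divides by N (not N - 2), so it is slightly biased.
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return w0, w1, sigma


# Synthetic data generated from the assumed model (illustrative values).
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

w0_hat, w1_hat, sigma_hat = fit_linear_ml(x, y)
print(f"w0 = {w0_hat:.3f}, w1 = {w1_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The result is a single value per parameter. It carries no statement about how plausible neighboring parameter values are, which is the gap that the posterior distribution of the Bayesian treatment fills.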

We can rewrite eq. (4.3) to emphasize the stochastic nature of the data $Y$ and to explicitly express that the parameters are treated as random variables:

$$Y_i \sim \mathcal{N}\left(\mu_i = w_0 + w_1 X_i,\ \sigma\right). \tag{4.4}$$

Eq. (4.4) reads as “$Y_i$ is distributed according to a Normal distribution with mean $\mu_i = w_0 + w_1 X_i$ and standard deviation $\sigma$”.

In Bayesian regression, we are interested in the posterior distribution of the parameters given the data, which can be estimated using Bayes’ rule