# «EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE Kumulative Dissertation zur Erlangung des ...»


**5.1 Introduction**

Empirical ground motion models (GMMs) are a crucial ingredient for probabilistic seismic hazard analysis (PSHA). Ground motion uncertainty is one of the key factors that controls the exceedance frequency of the ground motion parameter of interest, e.g. peak ground acceleration or spectral acceleration, for a given ground motion value (e.g. Bommer and Abrahamson, 2006). There are numerous published GMMs for different seismic provinces (e.g. shallow active tectonic, stable continental interiors, and subduction zones) and different regions in the world (e.g. California, Japan or Europe). These models can differ considerably in the amount of data, the functional forms used to model the scaling of ground motions with the predictor variables, and the number and kind of predictor variables used to characterize earthquake source and site effects. For a review of published ground motion models, see Douglas (2003, 2006, 2008); for a recent comparison, see Douglas (2010).

An important question in the context of PSHA, especially when it comes to the applicability of a GMM, is whether ground motion scaling is regionally dependent (e.g. Douglas, 2009). This has important consequences for the possible application of a GMM, developed from data in one particular region, in another region. One way to deal with this in a PSHA is to adjust a GMM for use in a new region based on physical differences (Campbell, 2003; 2004). Recently, it has been proposed to combine data from one region with a ‘reference’ model from a different region which is better constrained by data (Atkinson, 2008). In this work, we present an approach that can be used to directly include regional differences in GMMs. For this purpose, we learn a model on a global dataset, which consists of several submodels for different regions. These submodels are not independent of each other. More technically, we assume that there is a global distribution of GMMs, specified by so-called global hyperparameters. The regional GMMs are samples from this global distribution. Hence, we assume that the coefficients for each regional model are similar, and data from one region are also used to estimate coefficients in the other regions, though with less weight. The weights are determined according to the amount of data in the regions and the variability of the coefficients. For more details, see section 5.5.
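The way such weights arise can be illustrated with the simplest hierarchical normal model. The sketch below is not the model of this chapter: it assumes a single scalar coefficient per region, a known within-region noise level `sigma` and a known between-region spread `tau` (all names and numbers are hypothetical). Under these assumptions, the conditional posterior mean of a regional coefficient is a precision-weighted average of the regional sample mean and the global mean.

```python
def pooled_estimate(region_mean, n_obs, global_mean, sigma, tau):
    """Conditional posterior mean of one regional coefficient.

    Assumes (for illustration only) a normal hierarchical model with
    known noise standard deviation `sigma` and known between-region
    standard deviation `tau`.
    """
    data_precision = n_obs / sigma ** 2   # grows with the amount of regional data
    prior_precision = 1.0 / tau ** 2      # large when regional coefficients vary little
    weight = data_precision / (data_precision + prior_precision)
    estimate = weight * region_mean + (1.0 - weight) * global_mean
    return estimate, weight


# Hypothetical numbers: the same regional mean, but different amounts of data.
for n_obs in (1, 10, 1000):
    estimate, weight = pooled_estimate(
        region_mean=1.0, n_obs=n_obs, global_mean=0.0, sigma=1.0, tau=0.5
    )
    print(f"n_obs={n_obs:5d}  weight on regional data={weight:.3f}  estimate={estimate:.3f}")
```

With few regional observations the estimate stays close to the global mean; with many observations it approaches the regional mean. A smaller `tau` (little variability between regions) pulls all regions more strongly towards the global value.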

Formally, a GMM is a model for the conditional distribution $\Pr(Y \mid X)$ of a ground motion parameter $Y$ given some earthquake and site related parameters $X$. $Y$ is usually assumed to be distributed according to a log-normal distribution, whose median (and possibly the standard deviation $\sigma$) depends on the inputs, i.e. is a function of the predictor variables $X$:

$$\ln Y \sim \mathcal{N}(\mu, \sigma^2),$$

with mean $\mu = f(X)$ and standard deviation $\sigma$.

In PSHA, it is important to account for epistemic uncertainty, which arises from the fact that no single model captures all aspects of ground motion scaling for a particular application. This is commonly done by selecting more than one GMM, which is especially important when no GMM developed specifically for the region of interest exists. The selected GMMs are usually combined within a logic tree framework (e.g. Bommer et al., 2005). Crucial questions in this context are which models to select and how to assign the weights for the logic tree (e.g. Bommer and Scherbaum, 2005). These issues have been addressed in the past (Cotton et al., 2006; Bommer et al., 2010; Scherbaum et al., 2004, 2009, 2010) and are not the topic of this work (though they are far from being solved). Here, our concern lies rather with the epistemic uncertainty that is intrinsic to learning a GMM.

Estimating the parameters of a GMM usually results in a point estimate, i.e. a single value for each coefficient. These parameter estimates are not error-free, even when they are based on the best strong-motion datasets currently at hand (e.g. the NGA dataset (Power et al., 2008; Chiou et al., 2008)). When considering these uncertainties, it is important that they lend themselves to a probabilistic interpretation. This can be achieved by using a Bayesian approach, where we estimate the posterior probability of the parameters given our present state of knowledge and the currently available data. The Bayesian approach also allows one to easily include prior domain knowledge, as well as to update the model once new data are available. It has been used, e.g., by Ordaz et al. (1994) and Wang and Takada (2009) for the prediction of seismic ground motion.

Recently, Arroyo and Ordaz (2010a,b) presented a Bayesian GMM that estimated the correlation between different ground motion intensity parameters.

Our model is developed in the framework of probabilistic graphical models (see e.g. Koller and Friedman, 2009), which provide a general framework for reasoning under uncertainty. Their graphical structure also allows for an intuitive insight into the data generating process. It is easy to extend graphical models to accommodate extra complexities by exploiting their modular structure.

More technically, a joint probability distribution factorizes in a certain way for a graphical model.

This property can be exploited to facilitate the analysis (see section 5.3).
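For reference, the factorization in question takes the following standard form for a directed graphical model. The notation is generic and is an assumption here, not necessarily the notation used later in this chapter: $v_1, \dots, v_n$ denote the nodes of the graph and $\mathrm{pa}(v_i)$ the parents of node $v_i$.

$$\Pr(v_1, \dots, v_n) = \prod_{i=1}^{n} \Pr\bigl(v_i \mid \mathrm{pa}(v_i)\bigr)$$

Each factor involves only a node and its direct parents, so a high-dimensional joint distribution is broken down into small local pieces that can be specified and evaluated separately.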

**5.2 Bayesian Inference**

In this section, we provide a very brief, non-exhaustive introduction to the principles of Bayesian inference. A good overview of Bayesian statistics is Spiegelhalter and Rice (2009, available online at http://www.scholarpedia.org/article/Bayesian_statistics). There also exist numerous textbooks on the subject, e.g. Gelman et al. (2003).

Bayesian inference provides a principled way of handling epistemic uncertainties about the parameters of a model in terms of probabilities. Therefore, all information/belief we have about the states of nature/the physics of the problem is quantified in terms of a probability distribution on the parameters involved. Subsequently, this so-called “prior distribution” is updated given data, which results in the “posterior distribution”. The updating process is done according to Bayes’ rule,

$$\Pr(\theta \mid D) = \frac{\Pr(D \mid \theta)\, \Pr(\theta)}{\Pr(D)},$$

where $\theta$ denotes the parameters and $D$ the data.
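As a concrete illustration of updating a prior with data, the following sketch evaluates prior times likelihood on a grid of parameter values and normalises the result. Everything in it is an assumption chosen for illustration and is unrelated to the GMM of this chapter: a single unknown mean `theta`, normally distributed observations with known standard deviation `noise_sd`, a normal prior with standard deviation 2, and four made-up data points.

```python
import math


def grid_posterior(theta_grid, prior, data, noise_sd):
    """Posterior weights on a grid: posterior is proportional to likelihood times prior."""
    log_post = []
    for theta, prior_weight in zip(theta_grid, prior):
        # Log-likelihood of independent normal observations with mean theta
        # (additive constants are dropped because they cancel on normalisation).
        log_lik = sum(-0.5 * ((y - theta) / noise_sd) ** 2 for y in data)
        log_post.append(log_lik + math.log(prior_weight))
    shift = max(log_post)                      # stabilise before exponentiating
    unnormalised = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnormalised)                  # plays the role of the evidence
    return [u / total for u in unnormalised]


# Grid over the parameter and an (unnormalised) normal prior with sd 2.
theta_grid = [-5.0 + 0.005 * i for i in range(2001)]
prior = [math.exp(-0.5 * (theta / 2.0) ** 2) for theta in theta_grid]

data = [0.8, 1.1, 1.3, 0.9]                    # hypothetical observations
posterior = grid_posterior(theta_grid, prior, data, noise_sd=0.5)

post_mean = sum(t * p for t, p in zip(theta_grid, posterior))
print(f"posterior mean of theta: {post_mean:.3f}")
```

For this conjugate setting the posterior mean is also available in closed form, which gives a simple check on the grid result. For models with many parameters a grid is no longer feasible, which is one motivation for sampling-based methods.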

**5.3 Graphical models**

As stated in the previous section, we estimate the posterior distribution of the parameters of our GMM by MCMC sampling. In particular, we use the program OpenBUGS (http://www.openbugs.info/) (Lunn et al., 2009), where BUGS stands for ‘Bayesian inference using Gibbs sampling’.

Gibbs sampling (Geman and Geman, 1984) is an MCMC algorithm that exploits conditional independence assumptions between the quantities (parameters and observables) of a model. A convenient way to encode conditional (in)dependence assumptions is provided by directed acyclic graphs (DAGs; e.g. Spiegelhalter, 1998), which are a special kind of graphical model. They are described below. For a more detailed introduction to graphical models, see e.g. Jordan (2004), Koller et al. (2007) or Koller and Friedman (2009).
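To make the idea of Gibbs sampling concrete, here is a minimal sketch in Python. It is not OpenBUGS code and not the sampler used in this work. The model (normal data with unknown mean `mu` and precision `tau`), the conjugate priors, the synthetic data, and all names are assumptions chosen so that both full conditional distributions are available in closed form.

```python
import random


def gibbs_normal(y, n_iter=5000, burn_in=1000,
                 m0=0.0, p0=0.01, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i ~ N(mu, 1/tau).

    Assumed priors (illustrative): mu ~ N(m0, 1/p0), tau ~ Gamma(a0, rate=b0).
    Returns a list of (mu, tau) draws after burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    mu, tau = 0.0, 1.0                      # arbitrary starting values
    samples = []
    for iteration in range(n_iter):
        # Draw mu from its full conditional given tau and the data.
        precision = p0 + n * tau
        mean = (p0 * m0 + tau * sum_y) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # Draw tau from its full conditional given mu and the data.
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)  # second argument is a scale
        if iteration >= burn_in:
            samples.append((mu, tau))
    return samples


# Synthetic data with true mean 2.0 and standard deviation 0.5.
data_rng = random.Random(1)
y = [data_rng.gauss(2.0, 0.5) for _ in range(200)]

samples = gibbs_normal(y)
mu_mean = sum(s[0] for s in samples) / len(samples)
print(f"posterior mean of mu: {mu_mean:.3f}")
```

Each step updates one quantity while conditioning on the current values of all others. In a DAG, the full conditional of a node depends only on its parents, its children, and the other parents of its children, which is what keeps these updates local in larger models.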

In a graphical model, each quantity (observable, parameter, functions of both) corresponds to a node. Arcs between nodes denote direct dependence of two quantities. An example of a graphical model is shown in Figure 5.1 for a simple model of the kind

$$y(x; w) = w_0 + w_1 x + \epsilon. \tag{5.4}$$

Figure 5.1: Graphical model for a simple linear model of the form $y(x; w) = w_0 + w_1 x + \epsilon$.

Three kinds of nodes are distinguished:

• Fixed nodes: These nodes are denoted by rectangles and represent quantities with set values (e.g. fixed parameters or observables).

• Stochastic nodes: They are denoted by circles or ellipses and represent uncertain quantities which are associated with a probability distribution.

• Functional nodes: These are also denoted by circles or ellipses (here they are also shaded). They represent nodes that are functions of other quantities.

Functional dependences between parameters are represented by thick arrows, stochastic dependences by thin arrows. A node X from which an arrow points to a node Y is said to be the ‘parent’ of Y, while Y is called the ‘child’ of X.

The parameters $\theta = \{w_0, w_1, \sigma\}$ of our model are represented by stochastic nodes because they are uncertain parameters whose posterior distribution is to be estimated. The observed data points $X_i$ are assumed to be error-free and are therefore represented by a rectangular node. The mean $\mu_i$ is a function of the parameters $\theta = \{w_0, w_1, \sigma\}$ [cf. eq. (5.4)] and is therefore displayed as a shaded circular node. $Y_i$ is represented by a stochastic node, because this is the data generating system that we assumed in eq. (5.4).

The parameters $\theta = \{w_0, w_1, \sigma\}$ need to be assigned a prior distribution. The parameters of these prior distributions are represented by the rectangular nodes in the top line of Figure 5.1. In principle, it would also be possible to place a prior distribution over these so-called ‘hyperparameters’, describing their uncertainty.

A graphical model carries the gist of the model (i.e. information about direct (in)dependencies between quantities), but conceals the nitty-gritty details (i.e. distributional or functional specifics). This matters little for the simple model of eq. (5.4), but it is particularly useful for complex models, where a graphical model provides intuitive insight into the data generating process.
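The node types of the simple linear model in Figure 5.1 can be mirrored in a short Python sketch. The graph structure follows the description above; the distributional details are not given in this section, so the normal priors on the two regression weights, the uniform prior on the noise standard deviation, and the hyperparameter values `prior_sd` and `sigma_max` are assumptions made for illustration only.

```python
import math
import random


def log_normal_pdf(value, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((value - mean) / sd) ** 2


def log_joint(w0, w1, sigma, x, y, prior_sd=10.0, sigma_max=5.0):
    """Log joint density as a sum of one term per stochastic node."""
    if not 0.0 < sigma < sigma_max:
        return -math.inf
    # Stochastic parameter nodes; their parents are the fixed hyperparameters.
    log_p = log_normal_pdf(w0, 0.0, prior_sd)
    log_p += log_normal_pdf(w1, 0.0, prior_sd)
    log_p += -math.log(sigma_max)            # uniform prior on (0, sigma_max)
    for x_i, y_i in zip(x, y):
        mu_i = w0 + w1 * x_i                 # functional node
        log_p += log_normal_pdf(y_i, mu_i, sigma)  # stochastic observation node
    return log_p


def sample_joint(x, rng, prior_sd=10.0, sigma_max=5.0):
    """Ancestral sampling: draw every node after its parents."""
    w0 = rng.gauss(0.0, prior_sd)
    w1 = rng.gauss(0.0, prior_sd)
    sigma = rng.uniform(0.0, sigma_max)
    y = [rng.gauss(w0 + w1 * x_i, sigma) for x_i in x]
    return w0, w1, sigma, y


x = [0.0, 1.0, 2.0, 3.0]                     # fixed (error-free) inputs
w0, w1, sigma, y = sample_joint(x, random.Random(0))
print(f"log joint at the sampled values: {log_joint(w0, w1, sigma, x, y):.3f}")
```

The graph fixes which terms appear in `log_joint` and in which order `sample_joint` draws the quantities. Swapping a prior or adding a further level of hyperparameters changes individual terms without altering the overall structure.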

Specifying a model as a DAG also automatically encodes conditional independence assumptions and allows a factorization of the joint distribution, which makes the analysis of probabilistic models more convenient. It can be shown (Lauritzen et al., 1990) that for any particular

**5.4 Dataset**

The dataset we use for constructing the global Bayesian ground motion model is the one compiled by Allen and Wald (2009). This dataset contains records from earthquakes in three different tectonic source types: shallow active tectonics, subduction zone and continental interiors. In this work, we use only earthquakes from shallow active tectonic regimes, in order to keep the model from being too complicated. However, in principle it is possible to extend the model to also include events from subduction zones and continental interiors.

The dataset of Allen and Wald (2009) contains 10,163 records from 238 earthquakes from shallow active tectonic regimes. The ground motion intensity parameters are PGA, PGV and the response spectrum at periods of 0.3 s, 1 s and 3 s. In this work, we use only PGA. For details on data compilation and processing, we refer to the original report of Allen and Wald (2009). We use only records up to a rupture distance of 400 km and above a magnitude of 5, which reduces the dataset to 9,831 records from 227 earthquakes. The magnitude-distance distribution of the records used is shown in Figure 5.2. A table with detailed information about the earthquakes used can be found in the electronic supplement.

The ground motion model we develop in this work, albeit a global one, takes into account regional differences (see section 5.5 for details). Therefore, we group the earthquakes into 10 regions. The earthquakes and regions are shown in Figure 5.3. The definition of the regions is based on geophysical considerations (e.g. stress drop variations (Allmann and Shearer, 2009)). However, we refrain from grouping the earthquakes into too many regions, to avoid having too few earthquakes in the individual regions. For example, there exist a number of ground motion models developed for different parts of Europe (e.g. Italy (or parts of Italy), Greece (e.g. Danciu and Tselentis (2007))). However, here we consider Europe and the Middle East as one region, similar to Ambraseys et al. (2005) or Akkar and Bommer (2010). The number of earthquakes and records per region is given in Table 5.1.

Figure 5.3: Location of earthquakes used in this study and definition of regions.

As one can see in Table 5.1 and Figure 5.3, there is one region with only one event (North Eastern China). Nevertheless, it is still possible to construct a model for this region (i.e., calculate distinct parameters). In that case, data from other regions are drawn upon more heavily.

The predictor variables we consider are moment magnitude $M_W$, shortest distance to the rupture

**5.5 Ground Motion Model Setup**

In this section, we describe the GMM developed in this study. The GMM is outlined both as a graphical model and in equations. The target variable of our GMM is horizontal PGA.