# EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE

Cumulative dissertation submitted for the degree of ...

Equation (4.5) can be interpreted as updating the prior distribution of the parameters using data (via the likelihood function), resulting in the posterior distribution. It is possible to estimate the posterior distribution for a whole batch of data points. Alternatively, one can update the model sequentially, using one data point at a time, in which case the posterior distribution given the first data point becomes the prior distribution for the second, and so on. The latter procedure makes it easy to update a Bayesian model once new data becomes available.
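As an illustration of the sequential view (not part of the thesis itself), the following sketch updates a conjugate Normal prior on a mean one observation at a time and checks that the result matches a single batch update; all prior and likelihood values are assumed for the example.

```python
import numpy as np

def update(prior_mean, prior_var, y, lik_var):
    """One conjugate update: posterior of a Normal mean after observing y."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
    post_mean = post_var * (prior_mean / prior_var + y / lik_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)

# Sequential updating: the posterior after each point is the prior for the next.
m, v = 0.0, 10.0          # assumed prior mean and variance
for y in data:
    m, v = update(m, v, y, lik_var=1.0)

# Batch update with all data at once gives the identical posterior.
n = len(data)
v_batch = 1.0 / (1.0 / 10.0 + n / 1.0)
m_batch = v_batch * (0.0 / 10.0 + data.sum() / 1.0)
```

For this conjugate model the two routes agree exactly, which is what makes sequential updating attractive when new records arrive over time.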

To solve our regression example, eq. (4.3), we need to encode our prior information/beliefs about the parameters w0, w1 and σ in a probability distribution. In conjunction with the likelihood p(Y | {w0, w1, σ}) we can estimate the posterior distribution using eq. (4.5).


**4.3 Graphical Models**

In the previous section we have illustrated the basic ideas behind Bayesian inference, where we are interested in the conditional distribution of the parameters given the data, p(Θ | D). In simple cases this distribution may be obtained analytically, but in many realistic situations the posterior distribution is high-dimensional, complex, and unavailable in closed form. This is primarily due to analytical complications in computing the marginal likelihood. Therefore, one needs to resort to approximate inference methods, such as Markov chain Monte Carlo (MCMC) sampling of the posterior distribution (e.g. Gilks et al., 1996). MCMC constructs a Markov chain whose stationary/limiting distribution is the posterior. Here, we will use MCMC to obtain the parameters of our GMM. In particular, we make use of the program OpenBUGS (http://www.openbugs.info/) (Lunn et al., 2009), where BUGS stands for 'Bayesian inference using Gibbs sampling'.
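As a toy stand-in for the samplers behind a package such as OpenBUGS, the following sketch implements a random-walk Metropolis algorithm (a different MCMC variant than the Gibbs sampling discussed below) targeting the posterior of a single mean parameter; the data, prior and proposal scale are illustrative assumptions, not quantities from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=100)   # synthetic observations

def log_post(theta):
    """Unnormalized log posterior: Normal(0, 10) prior, Normal(theta, 1) likelihood."""
    log_prior = -0.5 * theta**2 / 10.0**2
    log_lik = -0.5 * np.sum((data - theta)**2)
    return log_prior + log_lik

samples = []
theta = 0.0
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.5)             # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                                 # accept; otherwise keep theta
    samples.append(theta)

post_mean = float(np.mean(samples[1000:]))           # discard burn-in
```

The chain's stationary distribution is the posterior, so the retained samples approximate p(θ | D) without ever computing the marginal likelihood.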

Gibbs sampling (Geman and Geman, 1984) is one particular MCMC algorithm that exploits conditional independence assumptions between the parameters and (un)observables of a data generating system such as the one described in eq. (4.3). An easy way to encode conditional (in)dependence statements are graphical models, in particular directed acyclic graphs (DAGs; e.g. Spiegelhalter, 1998), which we describe below. For a more detailed introduction to graphical models, see e.g. Jordan (2004), Koller et al. (2007) or Koller and Friedman (2009).

**Figure 4.1: Graphical model for a simple linear model of the form y(x; w) = w0 + w1 x + ε.**

Each quantity (observable or parameter) of a model corresponds to a node in the graphical model, and arcs between the nodes show direct dependencies. The graphical model for our simple regression example [eq. (4.3)] is depicted in Figure 4.1. The graph is directed since each edge (connection) is an arrow, and it is acyclic because there is no path following the arrows from one node back to itself.

In a graphical model, non-random variables or co-variates are denoted by a rectangular node.

In our example, the data points Xi are considered known and are thus represented by a rectangle.

Elliptical or circular nodes represent either outputs of mappings (here also shaded) or stochastic quantities. In Fig. 4.1 and eq. (4.4), μi is a function of w0, w1 and Xi,

μi = w0 + w1 Xi. (4.7)

The functional dependence is denoted by a thick arrow in Figure 4.1. Thin arrows represent stochastic dependencies. For example, in our case Yi coincides with a stochastic node associated with a normal distribution with mean μi and standard deviation σ. From a Bayesian perspective the nodes for w0, w1 and σ also represent stochastic quantities, which need to be assigned a prior distribution. If we assume a normal prior distribution for each of these parameters, the means and standard deviations of these Gaussians are represented by the nodes in the top line of Figure 4.1.

These nodes are rectangles, because a fixed prior distribution is assumed. Repetitive structures (in our case the loop over the data points from i = 1 to i = N) are represented by rectangular structures, so-called 'plates'.
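The generative reading of the DAG in Figure 4.1 can be sketched by ancestral sampling: each node is drawn given its parents, roots first. The hyperparameter values below are placeholders, not those used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20
X = np.linspace(0.0, 1.0, N)              # known covariates (rectangular node)

# Stochastic root nodes: parameters drawn from priors with fixed
# (rectangular) hyperparameters; values here are illustrative.
w0 = rng.normal(0.0, 1.0)
w1 = rng.normal(0.0, 1.0)
sigma = abs(rng.normal(0.0, 1.0))

mu = w0 + w1 * X                          # deterministic node, as in eq. (4.7)
Y = rng.normal(mu, sigma)                 # observed node: Y_i ~ N(mu_i, sigma)
```

Following the arrows top-down like this reproduces exactly the factorization the DAG encodes.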

The advantage of a graphical model (DAG) is that it keeps details about distributions and deterministic functions hidden, but communicates the essence (i.e. the direct (in)dependencies of variables) of a model. This is especially useful for complex models, where we would otherwise have to resort to a large set of equations.

**Figure 4.2: Earthquakes used in the study.**

The DAG representation also facilitates analysis of probability models, since it encodes conditional independence statements and allows a factorization of the joint probability distribution. It can be shown (Lauritzen et al., 1990) that for any particular DAG

p(Θ | Y, X) ∝ p(Θ, Y | X) = ∏_{V ∈ Y ∪ Θ} p(V | parents[V], X), (4.8)

where parents[V] specifies the parent set of the nodes from which an arrow points to V (if the parent set is empty, the conditional reduces to a marginal). Hence, to specify the full joint probability distribution it is sufficient to define the local "parent-to-child" distributions along the arcs of the DAG, p(V | parents[V], X).

Gibbs sampling is a technique that makes efficient use of these properties by passing information around the DAG in accordance with the conditional independences that hold and the local distributions defined.
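A minimal illustration of Gibbs sampling, assuming a toy bivariate Normal target rather than the full GMM: each full conditional is univariate Normal, so the sampler alternates between the two "parent-to-child" style conditionals, which is the same mechanism BUGS applies node by node on a full DAG.

```python
import numpy as np

# Target: bivariate Normal with unit variances and correlation rho.
# Full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
rng = np.random.default_rng(3)
rho = 0.8
x1, x2 = 0.0, 0.0
draws = np.empty((20000, 2))
for i in range(20000):
    x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho**2))  # sample x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho**2))  # sample x2 | x1
    draws[i] = x1, x2

# After burn-in, the draws recover the target correlation.
est_rho = float(np.corrcoef(draws[2000:, 0], draws[2000:, 1])[0, 1])
```

No joint density is ever evaluated; only the local conditional distributions are needed, which is exactly what the DAG factorization provides.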

**4.4 Model Setup**

In this study, our intention is to learn a Bayesian GMM. The central formula in this context is Bayes' rule, eq. (4.5), which we repeat here:

p(Θ | D) = p(D | Θ) p(Θ) / p(D). (4.5)

**4.4.1 Dataset**

The dataset we use for constructing the Bayesian GMM is the one compiled by Allen and Wald (2009), which is a global dataset. It contains records from earthquakes in three different tectonic source types: shallow active tectonics, subduction zones and continental interiors. Here, we use only earthquakes from shallow active tectonic regimes.

The dataset of Allen and Wald (2009) comprises 10,163 records from 238 earthquakes from shallow active tectonic regimes. For details on the compilation of the records, we refer to Allen and Wald (2009). In this work, we use only records up to a rupture distance of 400 km from earthquakes with a moment magnitude greater than 5.0, which reduces the dataset to 9,872 records from 228 earthquakes.

In the dataset, information on peak ground velocity (PGV), PGA and PSA at 0.3s, 1s and 3s is available. The dataset was originally compiled to reconstruct ground shaking from recent historical earthquakes (Allen et al., 2009) using the USGS ShakeMap methodology (Wald et al., 1999). Consequently, the target variables are taken to be the larger horizontal component.

We use only those records for which all ﬁve target quantities (PGV, PGA and PSA at 0.3s, 1s and 3s) are available, which leaves us with 7,957 records from 159 earthquakes, recorded at 2,889 unique stations. The epicenters of the earthquakes are depicted in Figure 4.2.

As predictor variables, we consider moment magnitude MW, rupture distance RRUP, the average shear-wave velocity in the upper 30 m, VS30, and the focal mechanism FM. The focal mechanism has three states: normal, strike-slip, and reverse. Shear-wave velocity values were estimated from the topographic gradient using the approach of Wald and Allen (2007), with minimum and maximum VS30 values of 210 m/s and 963.9 m/s, respectively. We group VS30 into three site categories according to Table 4.1.

For some earthquakes, there is no information on the focal mechanism. Similarly, for some stations the value of VS30 is missing. Nevertheless, the corresponding records can be retained in the analysis, as the uncertainty of the unknown values is taken care of during the analysis. For more details, see section 4.4.2.
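One hedged sketch of why such records can be retained: the missing covariate can be treated as a stochastic node and integrated out, here by simple Monte Carlo averaging over an assumed prior (the toy linear model and all numbers are illustrative, not from the dataset).

```python
import numpy as np

rng = np.random.default_rng(4)

def lik_given_x(y, x, w0=0.5, w1=1.0, sigma=0.3):
    """Likelihood of one observation y given covariate x (toy linear model)."""
    mu = w0 + w1 * x
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y_obs = 1.4
# Prior over the missing covariate (assumed Normal(1, 0.2) for illustration):
x_draws = rng.normal(1.0, 0.2, size=100000)

# Marginal likelihood of the record: p(y) = E_x[p(y | x)], so the record still
# contributes information even though its covariate is unobserved.
marg_lik = float(np.mean(lik_given_x(y_obs, x_draws)))
```

In an MCMC scheme the same idea appears automatically: the missing value becomes just another node that is sampled alongside the parameters.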

In Figure 4.3, we show the magnitude-distance distribution of the records used. Scatter plots between the predictor variables and PGA are depicted in Figure 4.4. We provide tables with detailed information on the earthquakes and records used in the electronic supplement.

**Figure 4.3: Magnitude-distance distribution of the records used.**

Almost all published GMMs make the assumption that PGA, PGV and the response spectrum are log-normally distributed, which we follow here. Thus, we introduce a new vector of random variables, Z, which is the log-transformed target vector Y, Z = ln Y.

We then assume that Z is distributed according to a multivariate normal distribution with a vector of means μ, which are functions of the predictor variables, and a covariance matrix Σ.

In the development of a GMM, one has to take into account the correlation of records from the same earthquake. This is usually done by invoking an appropriate regression technique that allows for separation of the total variability into between-event and within-event standard deviations, such as a one-step or two-step regression (Joyner and Boore, 1993, 1994) or a random-effects algorithm (Abrahamson and Youngs, 1992). Another source of variability is between-station variability, which takes into account that ground motions recorded at the same station are not independent.

However, this variability is only rarely taken into account, e.g. in the work of Chen and Tsai (2002).

In the following, we use the notation introduced by Al Atik et al. (2010) to discern the different components of ground-motion variability. Between-event variability is denoted by τ, while φ stands for within-event variability. The respective covariance matrices are denoted by upper-case letters, i.e. T and Φ.

We develop our GMM as a multilevel/hierarchical model (Gelman and Hill, 2007), which can be seen as conceptually similar to a two-step regression but with the ability to easily add extra complexity. The multiple levels allow grouped data to be taken into account. Hence, one level corresponds to all earthquakes, one level corresponds to all stations, and the intersection of these two levels represents the record of one earthquake recorded at one station.

In Figure 4.5, we show the GMM of this study as a graphical model. The two plates correspond to the two levels, where indices e and s denote the eth earthquake and sth station, respectively.

Figure 4.5 can be thought of as a conceptual model of the data generation. The concept of the model is as follows:

The central node in Figure 4.5 is the vector of target variables, denoted by Zes, which is the record of the eth earthquake recorded at the sth station. Zes is distributed according to a multivariate normal distribution with mean vector μZ,es and covariance matrix Φ. The mean vector μZ,es is the sum of an event term, a station term and a record term, as can be seen in eq. (4.12). The event term Ee is common to all records from the same earthquake e and is itself distributed according to a multivariate normal distribution, with mean vector μE,e and covariance matrix T [eq. (4.13)]. Correspondingly, the station term μS,s is common to all records recorded at the same station s. In principle, one could assume that the station term is also sampled from a multivariate normal distribution. However, a stable estimation of its covariance requires stations with multiple recordings, which are not abundant in our dataset. Hence, we assume that the station term is not a random variable, but a constant (strictly speaking, we assume that the components of its covariance matrix are all zero).

Φ and T are the within-event and between-event covariance matrices, respectively.

The means of the event, record and station terms are functions of parameters and the predictor variables:

where superscript t denotes the tth target variable, while a^t, b^t and c^t are the coefficients for the individual functions. These parameters are all assumed to be independent of each other.

The parameters a, b, c, as well as the covariances Φ and T, are displayed as stochastic nodes, since they are treated as random variables. Therefore, they are assigned prior distributions, whose parameters are represented by the rectangular (i.e. fixed) nodes of Figure 4.5. For a, b, and c, the prior distributions are independent univariate normal distributions with means μa, μb, μc and standard deviations σa, σb, σc, respectively. For the covariances Φ and T, the prior distributions, denoted pΦ and pT, are uniform distributions over their respective Cholesky decompositions (Weisstein, 2010). For more details on the prior distributions, see section 4.4.3.
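A small sketch of the Cholesky parameterization such a prior relies on: any lower-triangular factor with positive diagonal yields a symmetric, positive-definite covariance matrix, so a prior can be placed on the factor's entries instead of on the covariance directly (the dimension and entry ranges below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(6)
dim = 5                                     # e.g. five target variables

# Draw a lower-triangular factor L with strictly positive diagonal.
L = np.tril(rng.uniform(-0.5, 0.5, size=(dim, dim)))
np.fill_diagonal(L, rng.uniform(0.1, 1.0, size=dim))

# L @ L.T is symmetric and positive definite by construction, i.e. always
# a valid covariance matrix, whatever values the prior assigns to L.
cov = L @ L.T
eigvals = np.linalg.eigvalsh(cov)
```

This is why placing a uniform prior on the Cholesky factor is convenient: every draw is automatically a legal covariance matrix, with no rejection step needed.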

We have settled on the following functional forms for f, g and h, based on geophysical considerations and the generalization error determined by 10-fold cross-validation (e.g. Kuehn et al., 2009a; Hastie et al., 2001). The prior distributions for the associated parameters are determined by the following procedure:

1. Specify prior distributions on the stress drop, Q0 and the slope of the geometric attenuation.

2. Generate a synthetic dataset from these parameters using stochastic simulations (Boore, 2003).

3. Regress the function of eqs. (4.17) to (4.18) on the synthetic dataset.

4. Take the estimated coefﬁcients and their standard errors as prior distributions for the parameters.
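The four steps above can be sketched with a toy attenuation model standing in for the stochastic simulations of Boore (2003); the functional form, parameter values and priors below are illustrative assumptions, not those of the thesis.

```python
import numpy as np

rng = np.random.default_rng(7)

# Steps 1-2: draw parameters from assumed priors and simulate a synthetic
# dataset from a toy model ln(pga) = c0 + c1 * ln(r) + noise.
c0_true = rng.normal(1.0, 0.2)
c1_true = rng.normal(-1.0, 0.1)
r = rng.uniform(1.0, 400.0, size=500)
ln_pga = c0_true + c1_true * np.log(r) + rng.normal(0.0, 0.1, size=500)

# Step 3: regress the model on the synthetic dataset (ordinary least squares).
A = np.column_stack([np.ones_like(r), np.log(r)])
coef, res, *_ = np.linalg.lstsq(A, ln_pga, rcond=None)
resid_var = float(res[0]) / (len(r) - 2)
se = np.sqrt(resid_var * np.diag(np.linalg.inv(A.T @ A)))

# Step 4: the estimates and standard errors define the priors, e.g.
# c0 ~ N(coef[0], se[0]) and c1 ~ N(coef[1], se[1]).
```

The result is an informative, physically motivated prior for the regression coefficients, obtained without ever touching the real data.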

**4.5 Results**

In the previous section, we have specified the model, the prior distributions and the dataset. Using Bayes' rule [eq. (4.5)], we can now estimate the posterior distribution of the parameters given the data, p(Θ | D). As described before, the model is too complicated to estimate p(Θ | D) analytically, so we resort to approximate inference, using MCMC sampling to obtain samples from the posterior distribution of each parameter. From the sequence of samples we can compute several summary statistics, such as mean values, standard deviations, quantiles and so on. The histogram of the sampled values serves as an approximation to the posterior probability density function.
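To illustrate the summary step, the following sketch computes posterior summaries from a sequence of samples; for self-containedness, synthetic Normal(2, 0.5) draws stand in for actual MCMC output.

```python
import numpy as np

rng = np.random.default_rng(8)
samples = rng.normal(2.0, 0.5, size=20000)   # stand-in for posterior samples

# Summary statistics are just empirical statistics of the sample sequence.
post_mean = float(samples.mean())
post_sd = float(samples.std())
q05, q50, q95 = np.quantile(samples, [0.05, 0.5, 0.95])

# The (normalized) histogram approximates the posterior density function.
hist, edges = np.histogram(samples, bins=50, density=True)
```

With real MCMC output one would additionally discard burn-in and check convergence before trusting these summaries.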