# «EMPIRICAL GROUND-MOTION MODELS FOR PROBABILISTIC SEISMIC HAZARD ANALYSIS: A GRAPHICAL MODEL PERSPECTIVE Kumulative Dissertation zur Erlangung des ...»

**6.1 Introduction**

Seismic intensities have recently gained much renewed attention. In particular, they can be used to make a first, quick assessment of potential damage after a large earthquake using the ShakeMap methodology (Wald et al., 1999a). Furthermore, they are often the only means to test the potential applicability of ground-motion models in regions where no or only very few instrumental data exist (Scherbaum et al., 2009; Delavaud et al., 2009). For both of these applications, a relation converting an instrumental ground motion parameter, such as peak ground acceleration (PGA) or peak ground velocity (PGV), into a seismic intensity I is needed.

There are several studies that provide such a relation (e.g. Atkinson and Sonley, 2000; Atkinson and Kaka, 2007; Chiaruttini and Siro, 1981; Kaka and Atkinson, 2004; Marin et al., 2004; Panza et al., 1997; Souriau, 2006; Tselentis and Danciu, 2008; Theodulidis and Papazachos, 1992; Wald et al., 1999b). Most of these relations provide a simple regression equation of the form

$$I = a + b \log X, \tag{6.1}$$

where X is either PGA or PGV, and a and b are regression coefficients. A few studies also include additional predictor variables such as magnitude or distance (e.g. Atkinson and Kaka, 2007; Tselentis and Danciu, 2008).

It is worth remembering that macroseismic intensities are not quantitative, instrumentally measured parameters, but depend on human judgment and may also be mixed with information about building quality. Thus, they carry a large amount of uncertainty. Therefore, from a purely physical/seismological perspective, PGA, PGV or the response spectrum are better parameters to describe ground shaking, whereas seismic intensities have their main value in providing information about historical earthquakes (i.e. that occurred before the advent of seismometric data acquisition).

On the other hand, as discussed in Wald et al. (1999a), ShakeMaps of seismic intensities might aid interpretability in terms of rapid damage/loss estimation. Hence, converting instrumental ground motion intensity parameters into macroseismic intensities is beneficial for certain applications, and new relations, incorporating new data/methods, are useful.

Recently, Faenza and Michelini (2010) have presented new relations between PGA, PGV and intensities for Italy. They also provide an excellent, extensive review and discussion of different methodologies in this context, and provide an example of the application of the results in ShakeMap.

Relations like eq. (6.1) have been applied successfully in the context of ShakeMaps. However, in eq. (6.1), the target variable I is treated as continuous, while it is inherently a discrete, ordinal variable. Therefore, dependent on the value of X, a model like eq. (6.1) can predict an estimate of seismic intensity of, say, I = 6.75. Such a value, however, is not meaningful in terms of a discrete variable like seismic intensity, and it either has to be rounded to the nearest integer value, or an interpretation like “the intensity is 75% VII and 25% VI” has to be applied. Neither of these interpretations is completely satisfactory though. Furthermore, uncertainties in the intensity estimates are usually taken into account via a normal distribution, which is also continuous. For example, given a PGV-value of 5 cm/s, the model of Faenza and Michelini (2010) predicts an estimated intensity value of 6.75 with a standard deviation of 0.26, which is hard to interpret in terms of a discrete variable.
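The interpretation problem can be made concrete with a small sketch. The coefficients below (a = 5.11, b = 2.35, base-10 logarithm) are assumptions chosen for illustration; they reproduce the value of 6.75 quoted above for PGV = 5 cm/s, but they should be checked against Faenza and Michelini (2010) before any real use. The helper names are hypothetical.

```python
import math

# Assumed, illustrative coefficients of I = a + b * log10(PGV); not taken from the text.
A_PGV = 5.11
B_PGV = 2.35

def regression_intensity(pgv_cm_s, a=A_PGV, b=B_PGV):
    """Continuous intensity estimate from a relation of the form of eq. (6.1)."""
    return a + b * math.log10(pgv_cm_s)

def split_between_neighbours(intensity):
    """Read a non-integer estimate as weights on the two adjacent integer classes."""
    lower = math.floor(intensity)
    frac = intensity - lower
    if frac == 0.0:
        return {lower: 1.0}
    return {lower: 1.0 - frac, lower + 1: frac}

estimate = regression_intensity(5.0)
weights = split_between_neighbours(estimate)
print(round(estimate, 2), {k: round(v, 2) for k, v in weights.items()})
# prints: 6.75 {6: 0.25, 7: 0.75}
```

The second function is only one possible reading of a fractional intensity; it is exactly the “75% VII and 25% VI” interpretation that the text considers unsatisfactory.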

Here, we present a method that directly estimates intensities as a discrete variable: Naive Bayes classification (e.g. Mitchell, 1997, chapter 6). In this context, Bayes’ rule is used to estimate the (discrete) conditional distribution $\Pr(I \mid X)$. We explain naive Bayes classification in section 6.2.

Our approach is similar to the one taken by Ebel and Wald (2003), who propose a Bayesian method to estimate the conditional distribution of an instrumental ground motion parameter X given modified Mercalli intensity, $\Pr(X \mid I)$. The method of Ebel and Wald requires estimates of $\Pr(I \mid X)$, which they obtain using a strategy that is similar to a naive Bayes classifier, even though they do not call it that.

The paper of Ebel and Wald (2003) has its main emphasis on the estimation of continuous instrumental ground motion parameters from seismic intensities, using Bayesian updating. Here, on the other hand, we focus on predicting seismic intensities from instrumental ground motion parameters by a naive Bayes classifier. In this context, it is important to point out that the name Bayes classifier originates from the use of Bayes’ rule in the analysis, but does not pertain to any statistical philosophy. On the other hand, the method of Ebel and Wald (2003) to estimate $\Pr(X \mid I)$ can be considered “Bayesian” in that it requires the specification of a prior probability $\Pr(X)$, which is updated (see eq. (1) of Ebel and Wald (2003)). However, the parameters in $\Pr(I \mid X)$ are determined by Ebel and Wald (2003) using maximum likelihood.

The goal of the present note is not to make a full-ﬂedged investigation into the use of naive Bayes classiﬁcation for predicting intensities, but rather to make simple comparisons between naive Bayes methods and regression models like eq. (6.1). Therefore, we use the dataset of Faenza and Michelini (2010), since it is freely available and is the basis of one of the most recent relations connecting seismic intensities and PGA/PGV.

In our paper, we often use the word “learn”. We use this term in a rather broad sense, where “learning a model” means building the model and estimating its parameters. This is a reference to the machine learning community, where learning is used in this sense.

A few words on notation: Upper case symbols (e.g. X, Y, I) denote random variables. Vectors are represented in bold face (e.g. $\mathbf{X}$), and subscripts refer to each random variable or feature of a vector (e.g. $X_i$ is a feature of $\mathbf{X}$). Lower case symbols denote values of a random variable (i.e., $X_i = x_{ij}$ refers to the random variable $X_i$ taking on its jth possible value). We use the notation $\#(z)$ to denote the number of elements that satisfy property z (e.g. $\#(Y = y_i)$ is the number of instances in the data where Y takes on the value $y_i$).

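**6.2 Naive Bayes Classification**

Suppose we have a variable Y that is categorical, i.e. has discrete instantiations, and that depends on some other variables $\mathbf{X} = \{X_1, \ldots, X_N\}$. A naive Bayes classifier predicts the conditional distribution of Y given $\mathbf{X}$ using Bayes’ rule

$$\Pr(Y = y_i \mid \mathbf{X}) = \frac{\Pr(Y = y_i)\,\Pr(\mathbf{X} \mid Y = y_i)}{\Pr(\mathbf{X})}. \tag{6.3}$$

The prior probability of each class can be estimated from the relative frequencies in the data,

$$\Pr(Y = y_i) = \frac{\#(Y = y_i)}{\sum_{k} \#(Y = y_k)}, \tag{6.4}$$

and the predictor variables are assumed to be conditionally independent of each other given Y,

$$\Pr(\mathbf{X} \mid Y = y_i) = \prod_{j=1}^{N} \Pr(X_j \mid Y = y_i). \tag{6.5}$$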

This assumption greatly reduces the number of parameters to learn, since now we only have to estimate the probabilities $\Pr(X_j = x_{jk} \mid Y = y_i)$, which can be estimated from the relative frequencies in the data:

$$\Pr(X_j = x_{jk} \mid Y = y_i) = \frac{\#(X_j = x_{jk} \wedge Y = y_i)}{\#(Y = y_i)}, \tag{6.6}$$

where ∧ means “logical and”. The assumption of conditional independence of the predictor variables is often not very realistic from a physical perspective, but it works surprisingly well in many cases. In this context, it is important to keep in mind that the intention of the naive Bayes classifier is not to be a generative, physical model of the data generating process, but to predict Y given $\mathbf{X}$.

Even if the assumption of conditional independence does not represent the physics of the problem, it is often sufﬁcient for prediction. For example, in our problem PGV and PGA are not independent of each other, but for the purpose of predicting intensities it sufﬁces to assume that they are.

The name naive Bayes comes from this naive assumption.
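The counting estimates described above can be sketched in a few lines of Python. This is a minimal illustration for categorical predictors only; the function names, the record layout (one dict per observation) and the toy data are assumptions made for the example, not part of the study.

```python
from collections import Counter

def learn_categorical_nb(records, target):
    """Estimate Pr(Y) and Pr(X_j = x | Y = y) from relative frequencies."""
    class_counts = Counter(r[target] for r in records)
    joint_counts = Counter()
    for r in records:
        for name, value in r.items():
            if name != target:
                joint_counts[(name, value, r[target])] += 1
    n = len(records)
    prior = {y: c / n for y, c in class_counts.items()}
    # Relative frequency of (X_j = x and Y = y) among the records with Y = y.
    cond = {key: c / class_counts[key[2]] for key, c in joint_counts.items()}
    return prior, cond

def posterior(x, prior, cond):
    """Pr(Y = y | x) under conditional independence of the predictors."""
    scores = {}
    for y, p in prior.items():
        for name, value in x.items():
            # Combinations never seen in the data get probability zero.
            p *= cond.get((name, value, y), 0.0)
        scores[y] = p
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()} if total > 0 else scores

# Hypothetical toy data with discretized predictors.
toy = [
    {"pga": "low", "pgv": "low", "I": "V"},
    {"pga": "low", "pgv": "high", "I": "V"},
    {"pga": "high", "pgv": "high", "I": "VI"},
    {"pga": "high", "pgv": "low", "I": "VI"},
    {"pga": "high", "pgv": "high", "I": "VI"},
]
prior, cond = learn_categorical_nb(toy, target="I")
print(posterior({"pga": "high", "pgv": "high"}, prior, cond))
```

With such a small sample, a combination that never occurs in a class receives probability zero, which is one practical reason for smoothing or for the parametric alternatives used with continuous predictors.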

As a side note, in a regression with more than one predictor variable, these are also usually assumed independent. Contrary to the naive Bayes classiﬁer, however, where we state the assumption of independence explicitly, it is only implicit in regression.

In the case of continuous predictor variables $X_j$, Bayes’ rule can still be used to estimate $\Pr(Y = y_i \mid \mathbf{X})$, but now the conditional distributions $\Pr(X_j \mid Y)$ cannot be specified using eq. (6.6). Instead, one can make the assumption that for each possible value $y_i$ of Y, the continuous variable $X_j$ follows a parametric distribution, e.g. a normal or log-normal distribution. Then, the learning task is to estimate the parameters of that distribution. To reduce the number of parameters, one can also assume that some parameters are the same for each $y_i$, e.g. that all standard deviations are the same. When it is not possible to make the assumption of a parametric distribution for continuous inputs, one can try to use a non-parametric method such as kernel density estimation (e.g. Hastie et al., 2001, chapter 6.6), or discretize the continuous variables.
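For the parametric case, the posterior over classes can be computed as in the following sketch, which assumes a normal distribution for each predictor within each class and takes the class priors, means and standard deviations as given. All names and the parameter values in the usage lines are hypothetical; the computation is done in log space only for numerical stability.

```python
import math

def log_normal_pdf(z, mean, sd):
    """Logarithm of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((z - mean) / sd) ** 2

def gaussian_nb_posterior(z, priors, params):
    """Posterior Pr(Y = y | z) for a naive Bayes classifier with normal likelihoods.

    z      : dict feature -> observed value (e.g. a log ground-motion value)
    priors : dict class -> Pr(Y = class)
    params : dict class -> dict feature -> (mean, sd)
    """
    log_scores = {}
    for y, prior in priors.items():
        s = math.log(prior)
        for name, value in z.items():
            mean, sd = params[y][name]
            s += log_normal_pdf(value, mean, sd)  # conditional independence
        log_scores[y] = s
    # Normalise; subtracting the maximum avoids underflow in exp().
    m = max(log_scores.values())
    unnorm = {y: math.exp(s - m) for y, s in log_scores.items()}
    total = sum(unnorm.values())
    return {y: u / total for y, u in unnorm.items()}

# Hypothetical parameters for two classes and one feature.
priors = {"A": 0.5, "B": 0.5}
params = {"A": {"z": (-1.0, 1.0)}, "B": {"z": (1.0, 1.0)}}
print(gaussian_nb_posterior({"z": 0.0}, priors, params))
```

Because the result is a full discrete distribution over the classes, the most probable class and the spread of the prediction are both available directly.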

We stress again that the name Bayes classifier comes from the use of Bayes’ theorem [eq. (6.3)]. It should not be confused with Bayesian inference. There is nothing inherently Bayesian about the method as it is outlined above; all parameters are estimated by maximum likelihood. It would nevertheless be possible to estimate the parameters using Bayesian inference.


**6.3 Naive Bayes Classiﬁers Connecting PGA, PGV and seismic intensities**

In this section, we learn naive Bayes classifiers relating seismic intensities to PGA and PGV. For our analysis, we use the same dataset as Faenza and Michelini (2010). They discuss in detail the data assemblage and the properties of the dataset. In total, the dataset comprises 266 intensities from 66 Italian earthquakes in 12 intensity classes from 2 to 8. Note that there are also intermediate intensities with values ending in .5. Since these are not physically meaningful, we assign each of them to both adjacent integer classes, the one above and the one below, with a weight of 0.5 each.
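The treatment of intermediate intensities can be sketched as a small preprocessing step. The function name and the tuple layout are hypothetical, and the sketch assumes that every reported intensity is either an integer or an integer plus 0.5.

```python
import math

def expand_half_intensities(observations):
    """Turn (intensity, record) pairs into (integer_class, weight, record) triples.

    Integer intensities keep weight 1.0; an intensity such as 5.5 is assigned
    to both class 5 and class 6 with weight 0.5 each.
    """
    out = []
    for intensity, record in observations:
        lower = math.floor(intensity)
        if intensity == lower:
            out.append((int(lower), 1.0, record))
        else:
            out.append((int(lower), 0.5, record))
            out.append((int(lower) + 1, 0.5, record))
    return out

print(expand_half_intensities([(5.5, "rec_a"), (6.0, "rec_b")]))
# prints: [(5, 0.5, 'rec_a'), (6, 0.5, 'rec_a'), (6, 1.0, 'rec_b')]
```

The weights then enter all subsequent counts and averages, so that a half-intensity record contributes the same total weight as any other record.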

The intensity scale of the dataset is the Mercalli-Cancani-Sieberg scale (Sieberg, 1930). Hereafter, a seismic intensity is denoted by I. As predictor variables we consider PGA (in cm/s²) and PGV (in cm/s). The minimum and maximum PGA values are 0.29 cm/s² and 569.55 cm/s², respectively. PGV ranges from 0.01 cm/s to 34.39 cm/s.

In our case, the predictor variables PGA and PGV are continuous variables. However, many studies have found that one can assume a log-normal distribution of the ground motion intensity parameter for each intensity class, i.e. ln X is normally distributed given each intensity class:

$$\ln X \mid I = i_k \sim \mathcal{N}\left(\mu_{X,k},\, \sigma_{X,k}^{2}\right), \tag{6.7}$$

where X is either PGV or PGA, and $\mu_{X,k}$ and $\sigma_{X,k}$ are the mean and standard deviation of ln X in intensity class $i_k$. Faenza and Michelini (2010) have shown that this assumption is well justified for their dataset. Hence, we need to estimate the mean values and standard deviations of ln(PGA) and ln(PGV) for each intensity class [eqs. (6.8) and (6.9)]. We also need the prior distribution $\Pr(I)$, which can be calculated from the relative frequencies in the dataset using eq. (6.4). In addition to an individual standard deviation for each intensity class as estimated by eq. (6.9), we also estimate a common standard deviation for all intensity classes (but different for PGA and PGV). We do that since the number of intensities is small for some intensity classes, which does not allow a robust estimation of an individual standard deviation. The common standard deviation is determined by eq. (6.10).
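The estimation step described above can be sketched as follows. The sketch assumes weighted maximum likelihood estimators (sums divided by the total weight) and a pooled standard deviation computed from the within-class squared deviations; the exact estimators in eqs. (6.8) to (6.10) may differ, for example in their normalization. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict

def fit_class_gaussians(samples):
    """Fit normal distributions to ln(x) per intensity class.

    samples : iterable of (intensity_class, weight, x), with x > 0 (PGA or PGV)
    Returns (prior, means, sds, pooled_sd), where pooled_sd is a single
    standard deviation shared by all classes.
    """
    samples = list(samples)
    w_sum, wz_sum = defaultdict(float), defaultdict(float)
    for k, w, x in samples:
        w_sum[k] += w
        wz_sum[k] += w * math.log(x)
    means = {k: wz_sum[k] / w_sum[k] for k in w_sum}

    # Weighted sum of squared deviations from the class mean.
    ss = defaultdict(float)
    for k, w, x in samples:
        ss[k] += w * (math.log(x) - means[k]) ** 2

    total_w = sum(w_sum.values())
    prior = {k: w_sum[k] / total_w for k in w_sum}       # relative frequencies
    sds = {k: math.sqrt(ss[k] / w_sum[k]) for k in w_sum}
    pooled_sd = math.sqrt(sum(ss.values()) / total_w)    # common to all classes
    return prior, means, sds, pooled_sd

# Hypothetical values chosen so that the result is easy to check by hand.
demo = [(5, 1.0, math.exp(1.0)), (5, 1.0, math.exp(3.0)),
        (6, 1.0, math.exp(4.0)), (6, 1.0, math.exp(6.0))]
prior, means, sds, pooled_sd = fit_class_gaussians(demo)
print(prior, means, sds, pooled_sd)
```

A class that contains a single record gets an individual standard deviation of zero under this estimator, which illustrates why a common standard deviation is attractive for sparsely populated classes.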

To assess the performance of different classifiers, we use leave-one-out cross-validation to estimate the generalization error (e.g. Hastie et al., 2001, chapter 7; Kuehn et al., 2009a) of the classifiers. The generalization error is a measure of the error that is made when predicting unseen data. Therefore, we drop one record from the dataset, learn the classifiers for the rest of the dataset, classify I for the left-out data point, and calculate the residual to the actual value. This is done for all data points, and the resulting residuals are averaged to give an estimate of the generalization error. We also perform leave-one-out cross-validation for regression, using the following functions:

$$I = a_1 + b_1 \log \mathrm{PGA}, \tag{6.11}$$

$$I = a_2 + b_2 \log \mathrm{PGV}, \tag{6.12}$$

$$I = a_3 + b_3 \log \mathrm{PGA} + c_3 \log \mathrm{PGV}. \tag{6.13}$$

The parameters of the regression models are learned by averaging the logarithmic ground motion values for each intensity class, following common practice (see e.g. Faenza and Michelini, 2010). Similar to the naive Bayes classifier, PGA and PGV are (implicitly) assumed to be independent in eq. (6.13).
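The leave-one-out procedure is independent of the model being assessed, which the following sketch tries to make explicit. The `learn` and `predict` callables, the majority-class toy model and the data are hypothetical placeholders; the 0-1 loss counts a prediction as wrong whenever it differs from the observed class.

```python
from collections import Counter

def zero_one_loss(actual, predicted):
    """0-1 loss: 0 for a correct prediction, 1 otherwise."""
    return 0.0 if actual == predicted else 1.0

def leave_one_out_error(data, learn, predict, loss=zero_one_loss):
    """Average loss when each (x, y) pair is predicted by a model
    learned from all other pairs."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        training = data[:i] + data[i + 1:]   # drop the i-th record
        model = learn(training)
        total += loss(y, predict(model, x))
    return total / len(data)

# Toy model for illustration only: always predict the most frequent class.
def learn_majority(training):
    return Counter(y for _, y in training).most_common(1)[0][0]

def predict_majority(model, x):
    return model

toy = [(0.1, "V"), (0.2, "V"), (0.3, "V"), (0.9, "VI")]
print(leave_one_out_error(toy, learn_majority, predict_majority))
# prints: 0.25
```

Any classifier or regression model can be plugged in through `learn` and `predict`, so that all models are compared on exactly the same held-out records.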

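The intensity predictions of the different models are made in the following way: For the naive Bayes classifiers, we simply take the most probable intensity value, i.e.

$$\hat{I}(\mathbf{X}) = \arg\max_{i_k} \frac{\Pr(I = i_k) \prod_{j} \Pr(X_j \mid I = i_k)}{\Pr(\mathbf{X})}. \tag{6.14}$$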

In eq. (6.14), the first term of the numerator is the “prior probability” of intensity class $i_k$, the second term represents $\Pr(\mathbf{X} \mid I = i_k)$ under the assumption of conditional independence (see eq. (6.5)), while the denominator is the marginal distribution of $\mathbf{X}$. Hence, the argument of the arg max function is the full conditional distribution $\Pr(I \mid \mathbf{X})$, where $\mathbf{X}$ is either {PGA}, {PGV} or {PGA, PGV}. The modal value of this distribution is the predicted intensity value.

Table 6.2: Generalization errors for different classifiers/regression models, calculated with the 0-1 loss $L(I, \hat{I}(\mathbf{X}))$. $\mathrm{NB}_{X,\mathrm{sd}}$ is a naive Bayes classifier with the same standard deviation for all intensity classes.

From the