Kuehn, N. M. and F. Scherbaum, Bulletin of the Seismological Society of America, in press

A naive Bayes classifier is developed to predict intensities from peak ground velocity and acceleration. It is trained on the same dataset that was used in the study of Faenza and Michelini (2010). The naive Bayes classifier directly estimates a discrete probability distribution for the ordinal intensities. Comparisons based on generalization error, estimated by cross-validation, show that the naive Bayes classifier performs better than traditionally employed regression models.

6.1 Introduction

Seismic intensities have recently gained much renewed attention. In particular, they can be used to make a first, quick assessment of potential damage after a large earthquake using the ShakeMap methodology (Wald et al., 1999a). Furthermore, they are often the only means to test the potential applicability of ground-motion models in regions where no or only very few instrumental data exist (Scherbaum et al., 2009; Delavaud et al., 2009). For both of these applications, a relation converting an instrumental ground motion parameter, such as peak ground acceleration (PGA) or peak ground velocity (PGV), into a seismic intensity I is needed.

There are several studies that provide such a relation (e.g. Atkinson and Sonley, 2000; Atkinson and Kaka, 2007; Chiaruttini and Siro, 1981; Kaka and Atkinson, 2004; Marin et al., 2004; Panza et al., 1997; Souriau, 2006; Tselentis and Danciu, 2008; Theodulidis and Papazachos, 1992; Wald et al., 1999b). Most of these relations provide a simple regression equation of the form

$$I = a + b \log X, \tag{6.1}$$

where X is either PGA or PGV. A few studies also include additional predictor variables such as magnitude or distance (e.g. Atkinson and Kaka, 2007; Tselentis and Danciu, 2008).

It is worth remembering that macroseismic intensities are not quantitative, instrumentally measured parameters, but depend on human judgment and may also be mixed with information about building quality. Thus, they carry a large amount of uncertainty. Therefore, from a purely physical/seismological perspective, PGA, PGV or the response spectrum are better parameters to describe ground shaking, whereas seismic intensities have their main value in providing information about historical earthquakes (i.e. those that occurred before the advent of seismometric data acquisition).

On the other hand, as discussed in Wald et al. (1999a), ShakeMaps of seismic intensities might aid interpretability in terms of rapid damage/loss estimation. Hence, converting instrumental ground motion intensity parameters into macroseismic intensities is beneficial for certain applications, and new relations, incorporating new data/methods, are useful.

Recently, Faenza and Michelini (2010) have presented new relations between PGA, PGV and intensities for Italy. They also provide an excellent, extensive review and discussion of different methodologies in this context, and provide an example of the application of the results in ShakeMap.

Relations like eq. (6.1) have been applied successfully in the context of ShakeMaps. However, in eq. (6.1), the target variable I is treated as continuous, while it is inherently a discrete, ordinal variable. Therefore, depending on the value of X, a model like eq. (6.1) can predict an estimate of seismic intensity of, say, I = 6.75. Such a value, however, is not meaningful in terms of a discrete variable like seismic intensity, and it either has to be rounded to the next integer value, or an interpretation like “the intensity is 75% VII and 25% VI” has to be applied. Neither of these interpretations is completely satisfactory though. Furthermore, uncertainties in the intensity estimates are usually taken into account via a normal distribution, which is also continuous. For example, given a PGV-value of 5 cm/s, the model of Faenza and Michelini (2010) predicts an estimated intensity value of 6.75 with a standard deviation of 0.26, which is hard to interpret in terms of a discrete variable.

Here, we present a method that directly estimates intensities as a discrete variable: Naive Bayes classification (e.g. Mitchell, 1997, chapter 6). In this context, Bayes’ rule is used to estimate the (discrete) conditional distribution Pr(I | X). We explain naive Bayes classification in section 6.2.

Our approach is similar to the one taken by Ebel and Wald (2003), who propose a Bayesian method to estimate the conditional distribution of an instrumental ground motion parameter X given modified Mercalli intensity, Pr(X | I). The method of Ebel and Wald requires estimates of Pr(I | X), which they obtain using a strategy that is similar to a naive Bayes classifier, even though they do not call it that.

The paper of Ebel and Wald (2003) has its main emphasis on the estimation of continuous instrumental ground motion parameters from seismic intensities, using Bayesian updating. Here, on the other hand, we focus on predicting seismic intensities from instrumental ground motion parameters by a naive Bayes classifier. In this context, it is important to point out that the name Bayes classifier originates from the use of Bayes’ rule in the analysis, but does not pertain to any statistical philosophy. On the other hand, the method of Ebel and Wald (2003) to estimate Pr(X | I) can be considered “Bayesian” in that it requires the specification of a prior probability Pr(X), which is updated (see eq. (1) of Ebel and Wald (2003)). However, the parameters in Pr(I | X) used by Ebel and Wald (2003) are determined using maximum likelihood.

The goal of the present note is not to make a full-fledged investigation into the use of naive Bayes classification for predicting intensities, but rather to make simple comparisons between naive Bayes methods and regression models like eq. (6.1). Therefore, we use the dataset of Faenza and Michelini (2010), since it is freely available and is the basis of one of the most recent relations connecting seismic intensities and PGA/PGV.

In our paper, we often use the word “learn”. We use this term in a rather broad sense, where “learning a model” means building the model and estimating its parameters. This is a reference to the machine learning community, where learning is used in this sense.

A few words on notation: Upper case symbols (e.g. X, Y, I) denote random variables. Vectors are represented in bold face (e.g. X), and subscripts refer to each random variable or feature of a vector (e.g. Xi is a feature of X). Lower case symbols denote values of a random variable (i.e., Xi = xij refers to the random variable Xi taking on its jth possible value). We use the notation #(z) to denote the number of elements that satisfy property z (e.g. #(Y = yi) is the number of instances in the data where Y takes on the value yi).

6.2 Naive Bayes Classification

Suppose we have a variable Y that is categorical, i.e. has discrete instantiations, and that depends on some other variables X = {X1, ..., XN}. A naive Bayes classifier predicts the conditional distribution of Y given X using Bayes’ rule

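$$\Pr(Y = y_i \mid \mathbf{X}) = \frac{\Pr(\mathbf{X} \mid Y = y_i)\,\Pr(Y = y_i)}{\sum_k \Pr(\mathbf{X} \mid Y = y_k)\,\Pr(Y = y_k)}. \tag{6.3}$$

The prior probabilities Pr(Y = yi) can be estimated from the relative frequencies in the data:

$$\Pr(Y = y_i) = \frac{\#(Y = y_i)}{\sum_k \#(Y = y_k)}. \tag{6.4}$$

In addition, the naive Bayes classifier assumes that the predictor variables are conditionally independent given Y:

$$\Pr(\mathbf{X} \mid Y = y_i) = \prod_{j=1}^{N} \Pr(X_j \mid Y = y_i). \tag{6.5}$$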

This assumption greatly reduces the number of parameters to learn, since now we only have to estimate the probabilities Pr(Xj = xjk | Y = yi), which can be estimated from the relative frequencies in the data:

$$\Pr(X_j = x_{jk} \mid Y = y_i) = \frac{\#(X_j = x_{jk} \wedge Y = y_i)}{\#(Y = y_i)}, \tag{6.6}$$

where ∧ means “logical and”. The assumption of conditional independence of the predictor variables is often not very realistic from a physical perspective, but it works surprisingly well in many cases. In this context, it is important to keep in mind that the intention of the naive Bayes classifier is not to be a generative, physical model of the data generating process, but to predict Y given X.

Even if the assumption of conditional independence does not represent the physics of the problem, it is often sufficient for prediction. For example, in our problem PGV and PGA are not independent of each other, but for the purpose of predicting intensities it suffices to assume that they are.

The name naive Bayes comes from this naive assumption.
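As an illustration of the relative-frequency estimates and Bayes’ rule described above, the following is a minimal sketch for categorical predictors. The function names, class labels and toy records are hypothetical and are not taken from the dataset used in this study; the sketch only assumes the counting estimator of eq. (6.6) and conditional independence.

```python
from collections import Counter, defaultdict


def fit_categorical_nb(rows, labels):
    """Estimate Pr(Y) and Pr(X_j = x | Y = y) from relative frequencies."""
    n = len(labels)
    class_counts = Counter(labels)                  # #(Y = y_i)
    prior = {y: c / n for y, c in class_counts.items()}
    joint_counts = defaultdict(Counter)             # (j, y) -> counts of x values
    for row, y in zip(rows, labels):
        for j, x in enumerate(row):
            joint_counts[(j, y)][x] += 1            # #(X_j = x AND Y = y)

    def cond(j, x, y):
        # Relative frequency of value x for feature j within class y.
        return joint_counts[(j, y)][x] / class_counts[y]

    return prior, cond


def posterior(row, prior, cond):
    """Pr(Y = y | X = row) via Bayes' rule under conditional independence."""
    unnorm = {}
    for y, p in prior.items():
        for j, x in enumerate(row):
            p *= cond(j, x, y)
        unnorm[y] = p
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("no class is compatible with this input")
    return {y: p / total for y, p in unnorm.items()}


# Hypothetical toy data: two categorical features, two classes.
rows = [("low", "low"), ("low", "high"), ("high", "low"),
        ("high", "high"), ("high", "high"), ("low", "high")]
labels = ["weak", "weak", "weak", "strong", "strong", "strong"]

prior, cond = fit_categorical_nb(rows, labels)
post = posterior(("high", "high"), prior, cond)
print(post)
```

Plain relative frequencies assign probability zero to any feature value that never occurs within a class, which is why the sketch raises an error when every class is ruled out.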

As a side note, in a regression with more than one predictor variable, these are also usually assumed independent. Contrary to the naive Bayes classifier, however, where we state the assumption of independence explicitly, it is only implicit in regression.

In the case of continuous predictor variables Xj, Bayes’ rule can still be used to estimate Pr(Y = yi | X), but now the conditional distributions Pr(Xj | Y) cannot be specified using eq. (6.6). Instead, one can make the assumption that for each possible value yi of Y, the continuous variable Xj follows a parametric distribution, e.g. a normal or log-normal distribution. Then, the learning task is to estimate the parameters of that distribution. To reduce the number of parameters, one can also assume that some parameters are the same for each yi, e.g. that all standard deviations are the same. When it is not possible to make the assumption of a parametric distribution for continuous inputs, one can try to use a non-parametric method such as kernel density estimation (e.g. Hastie et al., 2001, chapter 6.6), or discretize the continuous variables.
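For continuous predictors with a log-normal class-conditional distribution, the posterior can be evaluated as in the following minimal sketch. The class names, parameter values and the helper names `log_normal_pdf` and `gaussian_nb_posterior` are illustrative assumptions; in practice the means and standard deviations would be learned from data.

```python
import math


def log_normal_pdf(z, mean, sd):
    """Log of the normal density N(z; mean, sd^2)."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (z - mean) ** 2 / (2 * sd ** 2)


def gaussian_nb_posterior(x, priors, params):
    """Pr(Y = y | X = x) for positive continuous predictors.

    x      : dict {feature name: value}
    priors : dict {class: Pr(Y = class)}
    params : dict {class: {feature name: (mean, sd) of ln(value)}}
    """
    log_post = {}
    for y, prior in priors.items():
        lp = math.log(prior)
        for name, value in x.items():
            mean, sd = params[y][name]
            # Conditional independence: log-likelihoods of the features add up.
            lp += log_normal_pdf(math.log(value), mean, sd)
        log_post[y] = lp
    # Normalize in log space to avoid numerical underflow.
    m = max(log_post.values())
    unnorm = {y: math.exp(lp - m) for y, lp in log_post.items()}
    total = sum(unnorm.values())
    return {y: u / total for y, u in unnorm.items()}


# Hypothetical parameters: one feature, two classes, shared standard deviation.
priors = {"A": 0.5, "B": 0.5}
params = {"A": {"x1": (0.0, 1.0)}, "B": {"x1": (2.0, 1.0)}}

post = gaussian_nb_posterior({"x1": 1.0}, priors, params)  # ln(1.0) = 0
print(post)
```

Using the same standard deviation for every class, as in this toy setup, corresponds to the parameter-sharing option mentioned above.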

We stress again that the name Bayes classifier comes from the use of Bayes’ theorem [eq. (6.3)]. It should not be confused with Bayesian inference. There is nothing inherently Bayesian about the method as it is outlined above; all parameters are estimated by maximum likelihood. It would nevertheless be possible to estimate the parameters using Bayesian inference.

6.3 Naive Bayes Classifiers Connecting PGA, PGV and seismic intensities

In this section, we learn naive Bayes classifiers relating seismic intensities to PGA and PGV. For our analysis, we use the same dataset as Faenza and Michelini (2010). They discuss in detail the data assemblage and the properties of the dataset. In total, the dataset comprises 266 intensities from 66 Italian earthquakes in 12 intensity classes from 2 to 8. Note that there are also intermediate intensities with half-integer values (x.5). Since these are not physically meaningful, we assign each of them to both neighboring integer classes, the one below and the one above, with a weight of 0.5 each.
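The treatment of intermediate intensities can be sketched as follows. This is only an illustration under the assumption that each record is a tuple (intensity, PGA, PGV); the function name and the numerical values are made up and do not come from the dataset.

```python
import math


def split_half_intensities(records):
    """Turn (intensity, pga, pgv) records into weighted integer-class records.

    Integer intensities keep weight 1.0; an intermediate value such as 5.5
    is assigned to both class 5 and class 6 with weight 0.5 each.
    """
    weighted = []
    for intensity, pga, pgv in records:
        lower = math.floor(intensity)
        if intensity == lower:
            weighted.append((lower, pga, pgv, 1.0))
        else:
            weighted.append((lower, pga, pgv, 0.5))
            weighted.append((lower + 1, pga, pgv, 0.5))
    return weighted


# Made-up example records: (intensity, PGA in cm/s^2, PGV in cm/s).
records = [(5.0, 10.0, 1.0), (5.5, 20.0, 2.0)]
weighted = split_half_intensities(records)
print(weighted)
```

The weights sum to the number of original records, so relative frequencies computed from the weighted records remain properly normalized.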

The intensity scale of the dataset is the Mercalli-Cancani-Sieberg scale (Sieberg, 1930). Hereafter, a seismic intensity is denoted by I. As predictor variables we consider PGA (in cm/s²) and PGV (in cm/s). The minimum and maximum PGA values are 0.29 cm/s² and 569.55 cm/s², respectively. PGV ranges from 0.01 cm/s to 34.39 cm/s.

In our case, the predictor variables PGA and PGV are continuous variables. However, many studies have found that one can assume a log-normal distribution of the ground motion intensity parameter for each intensity class, i.e. ln X is normally distributed given each intensity class:

$$\ln X \mid I = i_k \;\sim\; \mathcal{N}\!\left(\mu_{X,k},\, \sigma_{X,k}^2\right), \tag{6.7}$$

where X is either PGV or PGA. Faenza and Michelini (2010) have shown that this assumption is well justified for their dataset. Hence, we need to estimate the mean values and standard deviations of ln(PGA) and ln(PGV) for each intensity class:

$$\mu_{X,k} = \frac{1}{N_k} \sum_{n:\, I_n = i_k} \ln x_n, \tag{6.8}$$

$$\sigma_{X,k} = \sqrt{\frac{1}{N_k} \sum_{n:\, I_n = i_k} \left(\ln x_n - \mu_{X,k}\right)^2}, \tag{6.9}$$

where N_k is the number of records in intensity class i_k and X is either PGA or PGV. We also need the prior distribution Pr(I), which can be calculated from the relative frequencies in the dataset using eq. (6.4). In addition to an individual standard deviation for each intensity class as estimated by eq. (6.9), we also estimate a common standard deviation for all intensity classes (but different for PGA and PGV). We do that since the number of intensities is small for some intensity classes, which does not allow a robust estimation of an individual standard deviation. The common standard deviation is determined by

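$$\sigma_{X} = \sqrt{\frac{1}{N} \sum_{k} \sum_{n:\, I_n = i_k} \left(\ln x_n - \mu_{X,k}\right)^2}, \tag{6.10}$$

where μ_{X,k} is the mean of ln X in intensity class i_k and N is the total number of records.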
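The parameter estimation described above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses weighted, maximum-likelihood-style estimates (dividing by the sum of weights), and it pools the squared deviations about the class means to obtain the common standard deviation. The exact estimator used in the study may differ, and the function name and sample values are hypothetical.

```python
import math
from collections import defaultdict


def fit_class_gaussians(samples):
    """Estimate per-class (mean, sd) of ln x and a common sd across classes.

    samples: iterable of (intensity_class, x, weight) with x > 0.
    Assumption: weighted estimates normalized by the sum of weights.
    """
    by_class = defaultdict(list)
    for k, x, w in samples:
        by_class[k].append((math.log(x), w))

    stats = {}
    total_weight = 0.0
    total_sq = 0.0
    for k, items in by_class.items():
        w_sum = sum(w for _, w in items)
        mean = sum(w * z for z, w in items) / w_sum
        sq = sum(w * (z - mean) ** 2 for z, w in items)
        stats[k] = (mean, math.sqrt(sq / w_sum))
        total_weight += w_sum
        total_sq += sq
    # Common sd: pooled squared deviations about each class mean.
    common_sd = math.sqrt(total_sq / total_weight)
    return stats, common_sd


# Made-up samples: (class, ground-motion value, weight).
samples = [(5, math.exp(0.0), 1.0), (5, math.exp(2.0), 1.0),
           (6, math.exp(3.0), 1.0), (6, math.exp(5.0), 1.0)]
stats, common_sd = fit_class_gaussians(samples)
print(stats, common_sd)
```

The weight argument allows records from intermediate intensities to enter two neighboring classes with weight 0.5 each.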

To assess the performance of different classifiers, we use leave-one-out cross-validation to estimate the generalization error (e.g. Hastie et al., 2001, chapter 7; Kuehn et al., 2009a) of the classifiers. The generalization error is a measure of the error that is made when predicting unseen data. To estimate it, we drop one record from the dataset, learn the classifiers on the rest of the dataset, classify I for the left-out data point, and calculate the residual to the actual value. This is done for all data points, and the resulting residuals are averaged to give an estimate of the generalization error. We also perform leave-one-out cross-validation for regression, using the following regression models:

$$I = a_1 + b_1 \log \mathrm{PGA}, \tag{6.11}$$

$$I = a_2 + b_2 \log \mathrm{PGV}, \tag{6.12}$$

$$I = a_3 + b_3 \log \mathrm{PGA} + c_3 \log \mathrm{PGV}. \tag{6.13}$$

The parameters of the regression models are learned by averaging the logarithmic ground motion values for each intensity class, following common practice (see e.g. Faenza and Michelini, 2010).

Similar to the naive Bayes classifier, PGA and PGV are (implicitly) assumed to be independent in eq. (6.13).
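The leave-one-out procedure described above can be sketched generically as follows. The model here is a deliberately trivial stand-in (it always predicts the most frequent class in the training set) so that the example stays self-contained; the function names, the 0-1 loss as the chosen error measure, and the toy data are assumptions for illustration only.

```python
def leave_one_out_error(data, fit, predict, loss):
    """Average loss over all points, each predicted by a model fit without it."""
    losses = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]      # drop the i-th record
        model = fit(train)                   # learn on the remaining records
        losses.append(loss(y, predict(model, x)))
    return sum(losses) / len(losses)


def zero_one_loss(actual, predicted):
    """0 if the predicted class equals the actual class, 1 otherwise."""
    return 0.0 if actual == predicted else 1.0


def fit_majority(train):
    """Toy stand-in model: remember the most frequent class."""
    counts = {}
    for _, y in train:
        counts[y] = counts.get(y, 0) + 1
    return max(counts, key=counts.get)


def predict_majority(model, x):
    """The toy model ignores x and returns the stored class."""
    return model


# Made-up data: (predictor value, intensity class).
data = [(0.1, 5), (0.2, 5), (0.3, 5), (0.4, 6)]
error = leave_one_out_error(data, fit_majority, predict_majority, zero_one_loss)
print(error)
```

Because `fit`, `predict` and `loss` are passed in as functions, the same loop can in principle be reused for a naive Bayes classifier or a regression model.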

The intensity predictions of the different models are made in the following way: For the naive Bayes classifiers, we simply take the most probable intensity value, i.e.

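$$\hat{I}(\mathbf{X}) = \underset{i_k}{\arg\max}\; \frac{\Pr(I = i_k) \prod_{j} \Pr(X_j \mid I = i_k)}{\Pr(\mathbf{X})}. \tag{6.14}$$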

In eq. (6.14), the first term of the numerator is the “prior probability” of intensity class ik, the second term represents Pr(X | I = ik) under the assumption of conditional independence (see eq. (6.5)), while the denominator is the marginal distribution of X. Hence, the argument of the argmax function is the full conditional distribution Pr(I | X), where X is either {PGA}, {PGV} or {PGA, PGV}. The modal value of this distribution is the predicted intensity value.

Table 6.2: Generalization errors for different classifiers/regression models, calculated with the 0-1 loss L(I, Î(X)). NB_{X,sd} is a naive Bayes classifier with the same standard deviation for all intensity classes.

From the
