Statistical modeling in electricity
and related markets
Thesis presented for the degree of Doctor Philosophiae
© Anders Løland, 2013
Series of dissertations submitted to the
Faculty of Mathematics and Natural Sciences, University of Oslo
All rights reserved. No part of this publication may be
reproduced or transmitted, in any form or by any means, without permission.
Cover: Inger Sandved Anfinsen.
Printed in Norway: AIT Oslo AS.
Produced in co-operation with Akademika Publishing.
The thesis is produced by Akademika Publishing merely in connection with the thesis defence. Kindly direct all inquiries regarding the thesis to the copyright holder or the unit which grants the doctorate.
Preface and acknowledgements

“Before turning to those moral and mental aspects of the matter which present the greatest difficulties, let the inquirer begin by mastering more elementary problems.”
— Sir Arthur Conan Doyle, A Study in Scarlet (1887)

Most of the work presented in this thesis has been carried out in the research centre Statistics for Innovation – (sfi)2 – hosted by the Norwegian Computing Center (NR) and enthusiastically led by Arnoldo Frigessi. I have been fortunate to work with many diverse and interesting problems, both in (sfi)2 and otherwise at NR.
First, I would like to thank all my coauthors: Xeni K. Dimakos, Egil Ferkingstad, Arnoldo Frigessi, Nils Lid Hjort, Ingrid Hobæk Haff, Lars Holden, Ragnar Bang Huseby, Ola Lindqvist, Antonio Pievatolo, Fabrizio Ruggeri and Mathilde Wilhelmsen. It has been fun to work with each of you. Being surrounded by clever and, even more important, enjoyable colleagues at NR, has been equally inspiring.
A very encouraging and convincing Fred Espen Benth made me assemble this thesis. Similarly, André Teigland, the head of my department, has been very supportive of this enterprise all the way.
Family and friends; if you happen to try to read this thesis and do not understand much, do not despair! Life is more than work, and you ﬁll my life with joy and meaning.
Oslo, July 2013
Anders Løland

Contents

1 Introduction
1.1 Statistical modelling of electricity markets
1.1.1 Regression, model averaging and forecasting
1.1.2 Causality?
1.2
Introduction

“‘Winwood Reade is good upon the subject,’ said Holmes. ‘He remarks that, while the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician.’”
— Sir Arthur Conan Doyle, The Sign of the Four (1890)

From the early 1990s, markets for electricity and related energy products have been liberalised all over the world (Benth et al., 2008). For electricity, the Nordic market and the England & Wales market were the first. The Nordic Nord Pool (Spot) power exchange was formally established in 1996, with Norway and Sweden as the only members. Finland followed suit in 1998, and Denmark joined the exchange in 2000.
As with the purely financial markets, liberalised energy markets generate an abundance of valuable data, such as spot and forward prices, electricity flows and capacities. Hence, the statistician may enter the room. He will start asking questions like “Is there a risk premium in the market?”, “Are there causal links between energy markets?”, “What explains local prices?”, “Can we forecast transmission congestion?” and “What should be the price of complicated gas contracts?”. In the process, we will notice that many of these questions concern more than one variable (oil and gas prices, prices in more than one electricity market). The basic ingredient for studying more than one variable is the concept of correlation, through a correlation (or covariance) matrix. For the modelling not to break down completely, these matrices must satisfy certain criteria; they must be proper. The statistician therefore asks again: “How should an improper correlation matrix be adjusted to be valid?”
In the following, an overview is given of problems and methods in statistical modelling of electricity markets, statistical cures for invalid correlation matrices, and statistical modelling in gas markets.
1.1 Statistical modelling of electricity markets

“It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”
— Sir Arthur Conan Doyle, A Study in Scarlet (1887)
Figure 1.1: Nord Pool Spot price areas as of January 2013.
The green lines denote possible flows between the price areas. Source: http://www.nordpoolspot.com/

The Nordic electricity spot (or day-ahead) power market, Elspot, is run by Nord Pool Spot and is divided into several price areas (Benth et al., 2008; Kristiansen, 2004; Weron, 2006), with the system price being a common reference price (Figure 1.1). The different price areas result from capacity constraints.

In theory, if an overall market balance can be achieved without a need to utilise all available capacity between neighbouring areas, the prices are equal in all areas. This theoretical price is called the system price. Transmission congestion within the Nord Pool area is not uncommon (Marckhoff and Wimschulte, 2009). During nighttime, the price is often equal in neighbouring areas, while price area differences are seen more often in periods with a high load, such as during the winter and during daytime.
Figure 1.2: A Nord Pool Spot system price curve.
The intersection between the purchase (consumers) and sell (producers) curves determines the spot price for this hour.
The Elspot prices are settled once every day for each of the 24 hours of the coming day, based on all bids from market players buying or selling a certain volume of electricity (see Figure 1.2 for an example). The intersection between the purchase (consumers) and sell (producers) curves is the spot price.
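As an illustration of this clearing mechanism, the hourly spot price can be approximated as the intersection of stepwise aggregate demand and supply curves built from limit bids. The bid prices and volumes below are invented for the sketch; real Elspot auctions use far richer bid formats (block bids, flexible bids, etc.):

```python
import numpy as np

# Hypothetical limit bids for one hour (all numbers invented for illustration).
# Buyers state the highest price they will pay; sellers the lowest they accept.
buy_price = np.array([60.0, 50.0, 40.0, 30.0])       # EUR/MWh
buy_volume = np.array([100.0, 150.0, 200.0, 250.0])  # MWh
sell_price = np.array([20.0, 35.0, 45.0, 55.0])
sell_volume = np.array([120.0, 180.0, 160.0, 140.0])

grid = np.linspace(0.0, 80.0, 801)  # candidate prices, 0.1 EUR/MWh steps
# Aggregate demand at price p: volume from bids willing to pay at least p.
demand = np.array([buy_volume[buy_price >= p].sum() for p in grid])
# Aggregate supply at price p: volume from offers willing to sell at or below p.
supply = np.array([sell_volume[sell_price <= p].sum() for p in grid])

# The spot price is (approximately) the lowest price at which supply
# meets demand, i.e. where the two stepwise curves cross.
spot = grid[np.argmax(supply >= demand)]
print(f"Clearing price: {spot:.1f} EUR/MWh")
```

With these invented bids, demand exceeds supply at low prices and the curves cross just above 40 EUR/MWh; refining the price grid sharpens the approximation.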
Storing large quantities of electricity is a major challenge. If electricity is produced from gas or coal, the fuel itself can be stored, so one can choose to sell electricity only when prices are high, and bidding in the spot market is relatively simple.
The Nordic electricity market is dominated by highly flexible hydro power (54% in 2007 according to Fridolfsson and Tangerås (2009); 95% in Norway in 2010 according to Statistics Norway). The German EEX market, the largest in Europe, is on the other hand dominated by coal (47%) and nuclear power (23%) (Brunekreeft and Twelemann, 2005). Gas (17%), hydro, and a rapidly increasing solar and wind power production complete the picture.
The EEX market is generally assumed to be less mature than the Nordic market (Weron, 2006;
Weigt and von Hirschhausen, 2008; Müsgens, 2006; Fridolfsson and Tangerås, 2009).
Even though electricity is a non-storable commodity, large water reservoirs make hydro power partly storable. Run-of-the-river hydroelectric stations have little or no reservoir capacity, while at the other extreme some very large reservoirs can store two or three years of inflow. Production and bidding are often planned using a mixture of medium- and long-term optimisation models (Fosso et al., 1999) and very short-term forecast models (Weron, 2006).
1.1.1 Regression, model averaging and forecasting

“We’ve long felt that the only value of stock forecasters is to make fortune tellers look good.”
— Warren E. Buffett, Chairman’s Letter, Berkshire Hathaway Inc. (1992)

Long term generation scheduling models (Pereira, 1989; Pereira and Pinto, 1991; Wolfgang et al., 2005) solve a stochastic optimisation problem: given a mathematical description of the market and a demand function, the aim is to maximise the socio-economic surplus for consumers and producers. The models can be very detailed. For example, the so-called Samkjøringsmodellen (Wolfgang et al., 2005) contains more than 500 water reservoirs and 250 hydro power plants for Norway. Models like Samkjøringsmodellen implicitly assume that their description of the market is perfect, and that assumption can never be correct.
Alternatively, we can turn to the world of mathematical finance. The classical model for commodity markets is the Schwartz (1997) one-factor model, which can be written as

dS(t) = κ(μ − ln S(t)) S(t) dt + σ S(t) dW(t),

so that the logarithm of the spot price is a mean-reverting (Ornstein–Uhlenbeck) process.
The simplest combination method is to take the average of the n predictions, w_i = 1/n, i = 1, ..., n. This method works surprisingly well and is quite robust (Timmermann, 2006). It is surprising, since building a ‘super’ model that incorporates all of the underlying, simpler forecast models is often expected to be the superior approach. Bates and Granger (1969) argue that the simple average works because discarded forecast models almost always contain some independent information:
1. One forecast model can be based on information or variables not present in another forecast model.
2. The different forecast models may be based on different assumptions about the form of the relationship between the variables.
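A minimal simulation, with invented, unbiased forecasters and independent errors, illustrates why the equal-weight average is hard to beat: averaging two forecasts with independent errors of equal variance halves the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
truth = rng.normal(size=n)  # the quantity being forecast

# Two invented, unbiased forecasters with independent errors of equal variance.
f1 = truth + rng.normal(scale=1.0, size=n)
f2 = truth + rng.normal(scale=1.0, size=n)
combined = 0.5 * (f1 + f2)  # equal weights, w_i = 1/n with n = 2

def mse(forecast):
    return np.mean((forecast - truth) ** 2)

# Independent errors partly cancel: the average roughly halves the MSE.
print(mse(f1), mse(f2), mse(combined))
```

The individual MSEs are close to 1 while the combined MSE is close to 0.5, matching the textbook variance calculation Var((e1 + e2)/2) = (Var(e1) + Var(e2))/4 for independent errors.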
A more sophisticated combination method is to let the weights be given as

w = Σ⁻¹u / (uᵀ Σ⁻¹ u),

where u = (1, 1, ..., 1)ᵀ and Σ is the estimated covariance matrix of the forecast errors e = (X̂(1) − X, ..., X̂(n) − X). This puts lower weights on models with highly variable prediction errors. The covariance matrix at time t is estimated from the d previous forecast errors. It is common to assume zero correlation between the forecast errors (Timmermann, 2006), since the number of parameters will otherwise be large compared to the covariance estimation period d. Also, studies have shown that the prediction results rarely get better when correlations are estimated.
Clemen and Winkler (1986) point out that the estimate of Σ “can be quite unstable unless large data sets are available for estimation”, particularly when the pairwise correlations are high, as may often be the case with economic forecasts.
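A small sketch of these variance-based weights, assuming a diagonal Σ as in the common zero-correlation case; the weight formula w = Σ⁻¹u / (uᵀΣ⁻¹u) is the standard inverse-variance rule from the forecast combination literature (e.g. Timmermann, 2006):

```python
import numpy as np

def combination_weights(Sigma):
    """Weights w = inv(Sigma) u / (u' inv(Sigma) u) for forecast errors with
    (proper) covariance matrix Sigma; by construction the weights sum to one."""
    u = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, u)  # solve instead of explicit inverse
    return w / (u @ w)

# Diagonal Sigma, i.e. the common zero-correlation assumption; the second
# model has four times the error variance of the first.
Sigma = np.diag([1.0, 4.0])
w = combination_weights(Sigma)
print(w)  # the noisier model receives the smaller weight
```

Here the weights come out as (0.8, 0.2): inversely proportional to the error variances, as the formula dictates for diagonal Σ.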
1.1.2 Causality?

“If there is a will, there is a way.”
— Yasser Arafat, former Chairman of the Palestine Liberation Organization

When faced with multiple time series (such as the ones in Figure 1.5, which we will study later on), we are often interested in the dependence between the series. In (1.1) or (1.2), we might assume or hope that the covariates cause the response, but in general we cannot be sure.
The Holy Grail of statistics has been causality. As Aalen and Frigessi (2007) put it:
“For most of the 20th century the dominant attitude in statistics was that, as a statistician, one should shy away from causality. It was firmly stated by the founding fathers, especially Pearson but also to a large extent by Fisher, that statistics is only about association.”

Other fields, such as econometrics (Granger causality, Hamilton (1994)) and machine learning (causal networks, Shimizu et al. (2006)), have not been as shy. We will here take a look at one machine learning approach.
We will assume that the observed variables can be arranged in a causal order, meaning that no variable can cause a preceding variable. This means that they can be represented by a directed acyclic graph (DAG) (Spirtes et al., 2000). Each variable is here a linear function of the values of the preceding variables, plus a noise term and an optional constant term,

y_i = Σ_{j≺i} β_ij y_j + ε_i + c_i, (1.3)

where j ≺ i means that y_j precedes y_i in the causal order.
Figure 1.3: An example of a directed acyclic graph, representing a direct and indirect effect of X1 on X3.

Standard causal network analysis is based on the assumption that the noise variables, the εs in (1.3), are jointly normally distributed. With these assumptions we can estimate DAGs (Chickering, 2003), for example with

(X1, X2, X3) ∼ N(0, Σ), (1.4)

where we assume that the covariance matrix Σ is proper. (We will come back to the properness in Section 1.2.) Standard methods share a fundamental problem: a joint distribution may correspond to several DAGs, since they entail the same conditional independence relations among the observed variables. One therefore only obtains an equivalence class of DAGs that are indistinguishable from data. While some directions of causal influences (edges in the DAG) may be the same for all DAGs in the equivalence class, usually many or most directions are left undetermined.
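A toy simulation, with invented coefficients, of a linear acyclic model of the type in (1.3), in the spirit of Figure 1.3 (a direct and an indirect effect of X1 on X3):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Independent noise terms, one per variable.
e1, e2, e3 = rng.normal(size=(3, n))

# Causal order x1 -> x2 -> x3 with invented coefficients: each variable is a
# linear function of the preceding ones plus its own noise term.
x1 = e1
x2 = 0.8 * x1 + e2
x3 = 0.5 * x2 - 0.3 * x1 + e3  # direct effect of x1 plus an indirect one via x2

# The total effect of x1 on x3 is 0.5 * 0.8 - 0.3 = 0.1, which the sample
# covariance recovers up to simulation noise.
print(np.cov(x1, x3)[0, 1])
```

The example also shows why observational covariances alone are not enough: the same cov(x1, x3) could be produced by several different DAGs over (x1, x2, x3).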
If X1 ⊥ X3 | X2 (X1 and X3 are independent given X2) in (1.4), we can only find equivalence classes of causal networks or DAGs (Figure 1.4). Let us instead assume that all the variables (or all but one variable) of interest are non-Gaussian, which means that the εs in (1.3) are non-Gaussian. Then we can distinguish between the three DAGs in Figure 1.4 using higher-order moments.
Equation (1.3) is then known as the Linear Non-Gaussian Acyclic Model, LiNGAM (Shimizu et al., 2006). Ignoring the constant term and writing (1.3) in matrix form gives
X = BX + ε, (1.5)
where X = (y1, ..., ym), ε = (ε1, ..., εm) and B is the (permutable to lower triangular) matrix of coefficients βij. The independence of the elements of ε implies that there are “no unobserved confounders” in the sense of Pearl (2000), so a causal interpretation is valid (Shimizu et al. (2006), Section 2). Letting A = (I − B)⁻¹, we can rewrite (1.5) as
X = Aε. (1.6)
Since the variables in ε are independent and non-Gaussian, (1.6) deﬁnes the Independent Component Analysis (ICA) model (Comon, 1994; Hyvärinen and Oja, 2000). For ICA, the goal is to estimate both the so-called mixing matrix A and the independent components ε. We therefore aim to ﬁnd A and ε such that the entries of ε are as statistically independent as possible.
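A small numerical sketch of the rewriting X = Aε, with an invented coefficient matrix B: for a strictly lower triangular B, the mixing matrix A = (I − B)⁻¹ reduces to the finite Neumann series I + B + B², and the mixed data satisfies the structural form X = BX + ε exactly. A FastICA-type algorithm would then estimate A and ε from X alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Invented, strictly lower triangular coefficient matrix B (a causal order
# exists exactly when B can be permuted to this form).
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [-0.3, 0.5, 0.0]])

# Independent, non-Gaussian (here uniform) noise, as LiNGAM requires.
eps = rng.uniform(-1.0, 1.0, size=(3, n))

# Mixing matrix A = (I - B)^(-1); for strictly lower triangular 3x3 B
# this equals the finite Neumann series I + B + B^2, since B^3 = 0.
A = np.linalg.inv(np.eye(3) - B)

X = A @ eps  # the reduced form X = A eps of the structural form X = B X + eps

# Sanity checks: the series identity holds and X satisfies X = B X + eps.
print(np.allclose(A, np.eye(3) + B + B @ B))
print(np.abs(X - (B @ X + eps)).max())
```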
Non-Gaussianity can be measured by entropy. The entropy of a random vector X with density f is defined as H(X) = −∫ f(x) log f(x) dx. Gaussian variables have the highest possible entropy among random variables with a given variance. Hence, we can measure non-Gaussianity based on neg-entropy J. Neg-entropy is defined by
J(X) = H(X_gauss) − H(X),
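In practice neg-entropy is estimated through contrast functions rather than densities; a widely used approximation (Hyvärinen and Oja, 2000) compares E[G(x)] with its value under a standard normal, for G(u) = log cosh(u). A sketch with simulated data:

```python
import numpy as np

def negentropy_proxy(x, rng, m=100_000):
    """Approximate neg-entropy via the contrast of Hyvarinen and Oja (2000):
    J(x) ~ (E[G(x)] - E[G(nu)])^2 with G(u) = log cosh(u) and nu standard
    normal. Assumes x is standardised to zero mean and unit variance."""
    nu = rng.normal(size=m)

    def G(u):
        return np.log(np.cosh(u))

    return (G(x).mean() - G(nu).mean()) ** 2

rng = np.random.default_rng(3)
gaussian = rng.normal(size=100_000)
# Uniform on (-sqrt(3), sqrt(3)) has unit variance, so it is comparable.
uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), size=100_000)

j_gauss = negentropy_proxy(gaussian, rng)
j_unif = negentropy_proxy(uniform, rng)
# Neg-entropy is (approximately) zero for Gaussian data, positive otherwise.
print(j_gauss, j_unif)
```

As expected, the proxy is essentially zero for the Gaussian sample and clearly positive for the (sub-Gaussian) uniform sample, which is exactly the signal ICA exploits.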