Extreme value statistics is concerned with estimating the probability that rare events occur. We illustrate this with an example taken from [Castillo88]:
Consider a civil engineer designing a jetty on a sea. The engineer spends several days measuring the heights of waves at the site and obtains a sequence of measurements ranging from 2 ft to 5 ft, with an average of 3 ft and a standard deviation of 2.5 ft. The jetty itself is being built on pylons which are 20 ft in height; any wave higher than that will wash away the pylon and destroy the jetty. What is the probability that a wave of more than 20 ft will appear at this site, based on the available data?
A second variant of this problem is as follows. Assume that there are 10 pylons and that the jetty can survive the loss of up to 4 of them. We are now interested in the probability that more than four waves of over 20 ft appear at this site.
A third problem is as follows. Imagine that there are n different traffic sources transmitting data into a network with a single bottleneck buffer of capacity B. This buffer is subject to a limit check: if the total amount of data exceeds B, the excess data is dropped. Note that in this case we are looking at a different type of variable, one which is the sum of random variables; of course, it is also random. If the random variables are independent and have zero mean (or are centred by subtracting their means), the sequence of partial sums is a martingale, a random process with very interesting properties. We shall review some properties of martingales in this document as well.
The problems above are further complicated by the fact that knowledge of the underlying process can be very limited. In many cases, the probability density function of the underlying process is not known. More or less all we can know about the underlying process is what can be estimated by direct observation, i.e. the mean and possibly the variance. In statistics, it is typical to use the Gaussian distribution in such cases. However, the Gaussian is a good approximation only for values close to the observed mean, not for rare events. Moreover, the penalty for using the wrong cdf can be very severe.
In the following sections, we shall see how to estimate the probabilities of these so-called large deviation, or rare, events.
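As a rough illustration of why the choice of distribution matters, the sketch below computes the tail probability that a Gaussian model would assign in the jetty example (mean 3 ft, standard deviation 2.5 ft, threshold 20 ft). The function name `gaussian_tail` is illustrative only, and treating wave heights as Gaussian is exactly the assumption questioned above, so the far-tail figure should not be read as a credible estimate.

```python
import math

def gaussian_tail(threshold, mean, std):
    """P(X > threshold) for X ~ Normal(mean, std**2)."""
    z = (threshold - mean) / std
    # erfc avoids the cancellation that 1 - cdf would suffer for large z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_near = gaussian_tail(4.0, mean=3.0, std=2.5)    # close to the mean
p_far = gaussian_tail(20.0, mean=3.0, std=2.5)    # 6.8 standard deviations out
print(f"P(X > 4 ft)  = {p_near:.3f}")
print(f"P(X > 20 ft) = {p_far:.2e}")
```

The second figure is many orders of magnitude smaller than the first, and it is determined entirely by the assumed shape of the Gaussian tail rather than by any observation near 20 ft.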
A random variable is the quantified outcome of an experiment. Thus, if the experiment consists of drawing a straight line on the ground and dropping a pin onto it, the outcome is the position of the pin on the line. The quantification of this outcome would be, for example, the distance of the pin from one end-point of the line.
The cumulative distribution function (cdf) of a random variable X is the function F(k) = P(X <= k). Clearly, F(k) is non-decreasing in k, rising from 0 as k tends to -infinity to 1 as k tends to +infinity.
Extreme value statistics is concerned with exceedances. An exceedance is the event in which a given random variable X, or a sum of random variables $S = \sum_{i=1}^{n} X_i$,
exceeds a given level. The former case
We start with some very simple estimates. Given a non-negative random variable X with mean µ and variance v², the Markov inequality, $P(X \ge c) \le \mu / c$ for any $c > 0$, provides a bound on the probability of the exceedance of c. A bound that is usually tighter for large c is provided by Chebyshev's inequality, $P(|X - \mu| \ge c) \le v^2 / c^2$, which holds for any X with finite variance. Note that these estimates require knowledge of only the mean and the variance, both of which can be estimated directly from a sample of the random variable.
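To make the two bounds concrete, here is a minimal sketch that applies them to the jetty example, using the usual forms P(X >= c) <= µ/c (for non-negative X) and P(|X - µ| >= c) <= variance/c². The function names are illustrative, and the numbers are simply the sample mean and standard deviation quoted in the example.

```python
def markov_bound(mean, c):
    """Upper bound on P(X >= c) for a non-negative X with the given mean."""
    if c <= 0:
        raise ValueError("c must be positive")
    return min(1.0, mean / c)

def chebyshev_bound(variance, c):
    """Upper bound on P(|X - mean| >= c) for an X with the given variance."""
    if c <= 0:
        raise ValueError("c must be positive")
    return min(1.0, variance / c**2)

mean, std, pylon = 3.0, 2.5, 20.0
markov = markov_bound(mean, pylon)
# A wave above 20 ft deviates from the mean by at least 17 ft.
chebyshev = chebyshev_bound(std**2, pylon - mean)
print(f"Markov:    P(X >= 20) <= {markov:.4f}")
print(f"Chebyshev: P(X >= 20) <= {chebyshev:.4f}")
```

Both results are upper bounds, not estimates: they hold for every distribution with the stated mean and variance, which is why they are so loose.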
Next consider the sum $S = \sum_{i=1}^{n} X_i$ of independent random variables. A strong bound on the exceedance probability of S is provided by Chernoff's inequality, which holds if the absolute value of X is no larger than 1.0 for all samples.
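Chernoff-type bounds come in several variants. As a hedged sketch, the code below uses one common form (often attributed to Hoeffding) that applies when the terms are independent, have zero mean and satisfy |X_i| <= 1: P(S >= c) <= exp(-c² / (2n)). Whether this is the exact variant intended here is an assumption. For comparison, the exact tail is computed for the special case of fair ±1 coin flips; both function names are illustrative.

```python
import math

def hoeffding_bound(n, c):
    """exp(-c^2 / (2n)): bound on P(S >= c) for n independent, zero-mean
    terms with |X_i| <= 1 (assumed form, see the text above)."""
    return math.exp(-c * c / (2.0 * n))

def exact_tail_coin_flips(n, c):
    """Exact P(S >= c) when each X_i is +1 or -1 with probability 1/2.

    S = 2K - n with K ~ Binomial(n, 1/2), so S >= c iff K >= (n + c) / 2.
    """
    k_min = math.ceil((n + c) / 2)
    favourable = sum(math.comb(n, k) for k in range(max(k_min, 0), n + 1))
    return favourable / 2**n

n = 100
for c in (10, 20, 30):
    print(c, exact_tail_coin_flips(n, c), hoeffding_bound(n, c))
```

Unlike the Markov and Chebyshev bounds, this bound decays exponentially in c², which is what makes it useful for rare events.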
Consider a random variable X whose cumulative distribution function F(x) = P(X <= x) is known for all values of x. We take n samples of X, $\{X_1, X_2, \dots, X_n\}$, and sort them in ascending order to get $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$. This sorted list is known as the order statistics of the process X. The rth entry in the list, $X_{r:n}$, is called the rth order statistic. Clearly $X_{1:n}$ is the smallest value (the minimum) of the list and $X_{n:n}$ is the maximum. For example, if the original list consists of 8 samples {0, -2.3, 4, 1.3, 3.65, 1.49, -4, 2.19}, the ordered list becomes {-4, -2.3, 0, 1.3, 1.49, 2.19, 3.65, 4}. The 2nd order statistic, for example, is then -2.3.
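The worked example above can be reproduced with a few lines of code. This is only a sketch; `order_statistic` is an illustrative helper, not a library function.

```python
def order_statistic(samples, r):
    """Return the r-th order statistic (1-based) of a sequence of numbers."""
    if not 1 <= r <= len(samples):
        raise ValueError("r must lie between 1 and the number of samples")
    return sorted(samples)[r - 1]

samples = [0, -2.3, 4, 1.3, 3.65, 1.49, -4, 2.19]
ordered = sorted(samples)
second = order_statistic(samples, 2)
minimum = order_statistic(samples, 1)
maximum = order_statistic(samples, len(samples))
print(ordered, second, minimum, maximum)
```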
Order statistics are of immense value for questions in practical design. For example, consider a system which has been designed to survive a load of up to some maximum amount V on any given day. If we take the random variable Xi to be the peak load on the ith day, the probability that the system will last 1000 days is just the probability that, out of 1000 samples, the 1000th order statistic (the maximum) is less than V. Similar uses can be found for the rth order statistic. For example, consider a wireless base station. The base station has k channels and can ask for more channels if a certain fraction µ of these channels have more than m users. The random variable is the current load on each channel. The probability that the base station asks for more channels is the probability that the µk-th largest load, i.e. the order statistic of rank k - µk + 1 in ascending order, is greater than m.
A well-known historical illustration is the wartime estimation of monthly German tank production from the serial numbers of captured tanks, summarised in the table below.
| Month | Statistical estimate | Intelligence estimate | Actual data (verified from German records) |
| June 1940 | 169 | 1000 | 122 |
| June 1941 | 244 | 1500 | 271 |
| August 1942 | 327 | 1550 | 342 |
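Some common formulae for the distribution functions of order statistics are given below, where F(x) is the cdf of the parent distribution and n is the number of independent samples: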
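| Distribution function | Formula |
| cdf of the rth order statistic | $F_{r:n}(x) = \sum_{k=r}^{n} \binom{n}{k} F(x)^k [1 - F(x)]^{n-k} = I_{F(x)}(r, n-r+1)$ |
| cdf of the maximum among n samples | $F_{n:n}(x) = F(x)^n$ |
| cdf of the minimum among n samples | $F_{1:n}(x) = 1 - [1 - F(x)]^n$ |
| cdf of the range | $F_W(w) = n \int_{-\infty}^{\infty} f(x) [F(x+w) - F(x)]^{n-1} \, dx$, where $f = F'$ is the density and $w \ge 0$ |
Note that $I_p(a, b)$ is the (regularised) incomplete beta function.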
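The first three entries of the table can be sketched in a few lines, assuming independent samples and taking the value of the parent cdf at the point of interest, F(x), as input. The function names, and the daily survival probability of 0.9999 used in the design check, are hypothetical values chosen only for illustration.

```python
import math

def cdf_order_statistic(F_x, r, n):
    """P(X_{r:n} <= x), given F_x = F(x) for the parent distribution.

    At least r of the n independent samples must fall at or below x,
    which is a binomial tail with success probability F_x.
    """
    return sum(math.comb(n, k) * F_x**k * (1.0 - F_x)**(n - k)
               for k in range(r, n + 1))

def cdf_maximum(F_x, n):
    """P(max <= x) = F(x)^n."""
    return F_x**n

def cdf_minimum(F_x, n):
    """P(min <= x) = 1 - (1 - F(x))^n."""
    return 1.0 - (1.0 - F_x)**n

# Hypothetical design check: the daily peak load stays below V with
# probability 0.9999; how likely is the system to last 1000 days?
p_survive_1000_days = cdf_maximum(0.9999, 1000)
print(f"P(system lasts 1000 days) = {p_survive_1000_days:.4f}")
```

The maximum and the minimum are the special cases r = n and r = 1 of the general formula, which gives a convenient consistency check.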
The above formulae have one very large limitation: they involve high powers of the cumulative distribution function. When we say that we "know" the cdf of a process, we mean that we either have a theoretical model of the process or have done an empirical study and fitted a model to it. In either case, the cdf is an approximation; even more importantly, the act of approximation typically consists of fitting the mean and other central moments, not of fitting the cdf itself, especially for larger values. Now suppose that we have used a cdf F() where the correct cdf was G(), and that the two cdfs are very close to each other, say F(a) = 0.98 and G(a) = 0.99. If we are considering the cdf of the maximum of 100 samples, the difference between $F(a)^{100} \approx 0.13$ and $G(a)^{100} \approx 0.37$ is actually very large! Thus, a very small error of estimation is blown up significantly. To avoid this problem, we use limit distributions.
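The size of this effect is easy to check numerically. The short sketch below raises the two cdf values from the example to the 100th power; `max_cdf` is an illustrative name.

```python
def max_cdf(p, n):
    """cdf of the maximum of n independent samples, given p = F(a)."""
    return p**n

n = 100
with_F = max_cdf(0.98, n)   # value obtained from the fitted cdf F
with_G = max_cdf(0.99, n)   # value obtained from the correct cdf G
ratio = with_G / with_F
print(f"F(a)^100 = {with_F:.4f}, G(a)^100 = {with_G:.4f}, ratio = {ratio:.2f}")
```

An error of about one percent in the cdf thus becomes a factor of almost three in the probability assigned to the maximum.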
There are many cases when either the number of samples n is very large, or the cdf of the parent process is not known. In these situations, the table given above is of no use. Rather, we use limit distributions to estimate the cdf of the maximum or minimum values. There are three limit distributions, shown in the table below.
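| Distribution | Maxima limit | Minima limit |
| Frechet | $H(x) = \exp(-x^{-\gamma})$ for $x > 0$; 0 otherwise | $L(x) = 1 - \exp(-(-x)^{-\gamma})$ for $x < 0$; 1 otherwise |
| Weibull | $H(x) = \exp(-(-x)^{\gamma})$ for $x \le 0$; 1 otherwise | $L(x) = 1 - \exp(-x^{\gamma})$ for $x \ge 0$; 0 otherwise |
| Gumbel | $H(x) = \exp(-\exp(-x))$ | $L(x) = 1 - \exp(-\exp(x))$ |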
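For reference, the three limit families can be written down directly in code. The sketch below assumes the standardised forms (location 0, scale 1) with a positive shape parameter, here called `gamma`; these function names are illustrative. The minima limits are obtained from the maxima limits through the symmetry L(x) = 1 - H(-x).

```python
import math

def frechet_max_cdf(x, gamma):
    """exp(-x^(-gamma)) for x > 0, and 0 otherwise."""
    return math.exp(-x**(-gamma)) if x > 0 else 0.0

def weibull_max_cdf(x, gamma):
    """exp(-(-x)^gamma) for x <= 0, and 1 otherwise."""
    return math.exp(-(-x)**gamma) if x <= 0 else 1.0

def gumbel_max_cdf(x):
    """exp(-exp(-x)) for all real x."""
    return math.exp(-math.exp(-x))

def min_cdf_from_max(max_cdf, x, *params):
    """Limit cdf for minima, using L(x) = 1 - H(-x)."""
    return 1.0 - max_cdf(-x, *params)

print(frechet_max_cdf(1.0, 2.0), weibull_max_cdf(-1.0, 1.0), gumbel_max_cdf(0.0))
print(min_cdf_from_max(weibull_max_cdf, 2.0, 1.0))
```

In practice a location and a scale parameter are fitted as well; they are omitted here to keep the sketch short.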
The applicability of each limit distribution depends on the nature of the cdf of the process and on some other factors. Also, for the first two distributions, the value of the shape parameter $\gamma$ has to be computed. The interested reader is invited to see section 3 of [Castillo88]. The following table summarizes some results for some well-known distribution functions.
| cdf of X | Limit distribution to use | Value of $\gamma$ |
| Cauchy distribution | Frechet | 1 |
| Uniform distribution | Weibull | 1 |
| Exponential distribution | Weibull (for minima; the maximum of exponential samples has a Gumbel limit) | 1 |
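The uniform entry of the table can be checked numerically. For the maximum M_n of n independent Uniform(0, 1) samples, P(n (M_n - 1) <= x) = (1 + x/n)^n for -n <= x <= 0, which tends to exp(x), the Weibull maxima limit with shape parameter 1. The normalisation n (M_n - 1) is the standard choice for this case but is an assumption of this sketch, as are the function names.

```python
import math

def exact_scaled_max_cdf_uniform(x, n):
    """P(n * (M_n - 1) <= x) for the maximum M_n of n Uniform(0, 1) samples."""
    if x >= 0:
        return 1.0
    u = 1.0 + x / n          # the event is M_n <= u, and F(u) = u on [0, 1]
    return max(u, 0.0)**n

def weibull_limit(x, gamma=1.0):
    """Weibull limit for maxima: exp(-(-x)^gamma) for x <= 0, else 1."""
    return math.exp(-(-x)**gamma) if x <= 0 else 1.0

x = -1.5
errors = [abs(exact_scaled_max_cdf_uniform(x, n) - weibull_limit(x))
          for n in (10, 100, 1000, 10000)]
print(errors)
```

The error shrinks roughly in proportion to 1/n, which gives a feel for how many samples are needed before the limit distribution becomes a good approximation.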
[Castillo88] Enrique Castillo, "Extreme Value Theory in Engineering", Academic Press, 1988
[Williams] David Williams, "Probability with Martingales", Cambridge University Press
Maintainer: abheek.saha@hsc.com