This article discusses 'rare' events and the special science of predicting and accounting for them in system design. Understanding rare events is of interest in many disciplines of telecommunications and software engineering, from predicting packet losses in a G/1 queue to doing reliability design.
A rare event is an event which is caused by a culmination of multiple tiny improbable events. Since there are multiple events which contribute, and the probabilities of each of these events are low, the probability of their concurrent occurence is lower still. For example, the probability of each and every member of Systique having a birthday on the same day is a candidate, since this calls for multiple unrelated birthdays to all be on one particular day. However, the event that a person wins a lottery is, in a sense, not a candidate for a large deviation process, because it simply consists of one improbable event and cannot be broken down.
How does one estimate the probability of a rare event? The brute force means would be to find all different permutations of the constituent events which cause this rare event, estimate the probability of each and then go on to estimate the probability of the rare event. Not only is this cumbersome, but it runs into the old problem of having to deal with large powers of fractions; any errors in the fraction value is magnified significantly. Also, the knowledge of the probability distribution functions of the constituent events are required, which is a big constraint.
An even more complex issue is that the rare event can come about by a large number of paths. In some cases, the number of these paths can be infinite. Consider the sum of two random numbers S=X1 and X2. The event that {S ≥ c} for some fixed c can occur if X1 ≥ c or X1 ≥ c or if X1 = c-k & X1 ≥ k. k can take an infinite number of values, and for each possible value of k, we have a different 'path' by which the rare event we are trying to estimate can be actualized. Large deviation theory steps into the rescue here too - if there is one path, which has a "cost" significantly lower than other paths, it is overwhelmingly the case that the rare event will always be actualized by that path alone.
The large deviation principle is an alternate means to this end. Large deviation theory allows us, under certain conditions, to find the rate function for the random variable in question. This allows us to directly estimate the event probabilities asymptotically.
Consider a process X, for example, the birthdate (day/month). For each student, we have recorded their birthdates; in other words, we have taken a sequence {Xn} = {X1, X2, ..., Xn } of samples of the process X. Let us take a random variable Zn which is some function of {Xn}. Now consider the event that {Zn < c} for some real number c. Information Theory tells us that the amount of information contained in this event is the log of the probability of the event. i.e. the rarer the event, the lower its probability and the higher the information content.
Thus, for a particular class of 8 students, the samples read {11th Jan, 18th Jan, 11th March, 30th June, 29th July, 8th August, 22nd September, 30th Nov}.The large deviation principle establishes the conditions by which an event can be treated as a large deviation event and allows us to estimate, in the limit, the probability of that event.
The large deviation principle requires that we find a positive, semi-continuous and smooth function I(c), such that
. As the equation indicates, the probability of the event can be estimated by knowledge of the event itself (which is stored in the variable c) and the size of sample available. Due to the structure of the formulation, it is clear that as n increases, the right hand side of the equation will converge for a good rate function. c here indicates the size of deviation we are interested in; as c increases, the event presumably gets rarer and rarer.
One of the most common events we are interested in is the nth empirical mean
of independent random variables. The term independent is emphasized here, because the proof of this theorem uses the scaling property of the moment generating functions for independent random variables. This theorem is widely used, for example, in queueing theory, to estimate the average size of the q, given n different sources of data. Chernoff's theorem (also known as Cramer's theorem) gives the rate function for this to be
where
is the well-known moment-generating functionext. If we know the distribution of xi and it is differentiable, we can easily solve this equation by using the differentiation technique for maximizing for theta.
The table below gives the rate function for various distributions.
| Distribution | Rate function |
| Bernoulli | |
| Gaussian | |
| Exponential | |
| Poisson | |
where all symbols have the usual meanings.
Chernoff's theorem extends in a more general way to arbitrary functions of {X1,....Xk} as follows:
where I(x) is the Fenchel Legendre transform
of
. Clearly, the former equation is a special case of this equation.
The section above assumes that the distribution of the random variable in question is known. We extend this to the case where the distribution is not known, by creating an empirical distribution function. Consider that we have a sequence of independent samples {1, 2.1, -0.9, 13, 7, 9, 4.2, 6.3}. We can define the empirical distribution function F(k) = P {X ≤ k} by counting the number of samples which are less than or equal to k and then dividing it by the total number of samples. The distribution function for sample set above is shown in
.
Kolmogorov's strong law of large numbers tells us that the empirical distribution computed here converges to the "true" distribution function as n grows large (the rate of convergence is O(sqrt(n)), so it is kind of slow).
The distribution function in turn induces an empirical probability measure for all sets within the domain i.e.
. Sanov's theorem tells us that we can now use a reference measure mu and create a rate function
where
. Since there may be multiple possible reference measures, we limit mu to those measures which will have make the event true. This is the set C in the equation above.
Let us now look at an application of rare events for communication network design; the concept of effective bandwidths. We shall define and derive the effective bandwidth for the simplest case, following the work of [Kelly]. Much work has been done on effective bandwidths, the interested reader is invited to see [Walrand], [Connell] and others.
Consider a router having buffer space B (in bytes/packets, whatever). When there is data in the buffer, the router forwards that data using a fixed transmission rate C. Source traffic enters the buffer such that the number of packets generated in the time interval t is given by a(t). The size of the queue is given by q(t) and it is clear that q(t+1) = {q(t) + a(t) - c)}^0, where a^b means max(a,b). Treating q(t) as a random variable, we can find the rate function I(c) using the general form of Chernoff's theorem discussed above such that
where
. The term
is called the effective bandwidth.
To understand the significance of the effective bandwidth, let us assume that a packet is dropped everytime the queue size is greater than the buffer (this is not strictly true; the difference between q(t) and B may be larger than one; a more exact expression can be worked out separately) and rewrite the first equation as
. This expression gives a relationship between the log of the packet drop rate (PER), the buffer size available (B), the transmission rate c and the function M(a). From the expression given above, the function M(a) is actually a measure of the disorderliness or entropy of the arrival function, compared to the clearing rate c. In other words, the effective bandwidth gives us a relationship between the likely packet drop rate that a packet stream experiences given the buffer available, the clearing rate c and the statistical 'disorderliness' of the packet stream. The larger the buffer, the more the ability to handle disorderliness i.e. lower packet drop rate. The more the disorderliness, the easier it is to drop packets, because of buffer overflow.
Since it is not always possible to compute the entropy of a packet stream, there are empirical ways to compute the entropy of a packet stream. The simplest method known to the author is the index of dispersion metric, discussed in [Courcoubetis]. Ilka Norros gives a more sophisticated formulation using the Hurst parameter in [Norros]
The effective bandwidth formula can be used in multiple areas of traffic management and flow control. Some problems can be created by the independence assumption - there is a large body of work which indicates that even aggregated Internet traffic is not independent, but heavily correlated. Replacing the independence assumption by using the Gartner Ellis theorem has been discussed in [Connell], [Chang] and others. We shall outline some of the uses of the effective bandwidth formula; these will be explained in more detail in a future article.
[Weiss] Alan Weiss, "An Introduction to Large Deviations for communication networks", IEEE Journal on Selected Areas of Communication, 1995 [Stanford] Course material at Stanford University http://www-stat.stanford.edu/~amir/large-deviations [Umich]Tutorial at University of Michigan, http://www.cscs.umich.edu/~crshalizi/notebooks/large-deviations.html [Kelly] F.P.Kelly, "Effective Bandwidths at Multi-class queues" [Walrand] Kesidis, Walrand, Chang, "Effective bandwidths for multiclass markov fluids and other atm sources" IEEE JSAC, August 1995 [Connell] N. Connell, "Large Deviations in Queueing Networks", April 1995 [Chang] C.S.Chang, "Effective bandwidth in high-speed digital networks" IEEE JSAC, August 1995 [Courcoubetis] C. Courcoubetis, J. Fouskas, Weber, "On the Performance of an Effective Bandwidths Formula", Technical Report, 1997 [Norros] I.Norros, "On the use of Fractional Brownian Motion in the theory of connectionless networks", IEEE Journal on Selected Areas of Communication, August 1995
Maintained by: abheek.saha@hsc.com
Views:
Page Information
|
Wiki Information |
![]() Update to PBwiki 2.0 An entirely new PBwiki experience, including folders and easier editing. |