HSCTechnicalWiki


view edit history print Talk subscribe
SearchWiki

Views: 397

Full site statistics

Authors:

edit SideBar

Main » Rare Events and Communications Networks

PageList

Papers

Tutorials

HSC welcomes all external visitors to this site, especially students and members of the academic community. Please use the comments box at the bottom of each page to record any comments or suggestions for improvement.

Introduction

This article discusses 'rare' events and the special science of predicting and accounting for them in system design. Understanding rare events is of interest in many disciplines of telecommunications and software engineering, from predicting packet losses in a G/1 queue to doing reliability design.

Definitions

A rare event is an event which is caused by a culmination of multiple tiny improbable events. Since there are

 multiple events which contribute, and the probabilities of
 each of these events are low, the probability of 

their concurrent occurence is lower still. For example, the probability of each and every member of Systique having a birthday on the same day is a candidate, since this calls for multiple unrelated birthdays to all be on one particular day. However, the event that a person wins a lottery is, in a sense, not a candidate for a large deviation process, because it simply consists of one improbable event and cannot be broken down.

How does one estimate the probability of a rare event? The brute force means would be to find all different permutations of the constituent events which cause this rare event, estimate the probability of each and then go on to estimate the probability of the rare event. Not only is this cumbersome, but it runs into the old problem of having to deal with large powers of fractions; any errors in the fraction value is magnified significantly. Also, the knowledge of the probability distribution functions of the constituent events are required, which is a big constraint.

An even more complex issue is that the rare event can come about by a large number of paths. In some cases, the number of these paths can be infinite. Consider the sum of two random numbers S = X_1 + X_2. The event that \{ S \ge c \} for some fixed c can occur if X_1 \ge c or X_2 \ge c or if X_1 = c-k \text{ and } X_2 \ge k . k can take an infinite number of values, and for each possible value of k, we have a different 'path' by which the rare event we are trying to estimate can be actualized. Large deviation theory steps into the rescue here too - if there is one path, which has a "cost" significantly lower than other paths, it is overwhelmingly the case that the rare event will always be actualized by that path alone.

The large deviation principle is an alternate means to this end. Large deviation theory allows us, under certain conditions, to find the rate function for the random variable in question. This allows us to directly estimate the event probabilities asymptotically.

What is the large deviation principle?

Consider a process X , for example, the birthdate (day/month). For each student, we have recorded their birthdates; in other words, we have taken a sequence \{X_n\} = \{X_1, X_2, \ldots, X_n \} of samples of the process X. Let us take a random variable Z_n which is some function of \{X_n\}. Now consider the event that \{Z_n < c \} for some real number c. Information Theory ? tells us that the amount of information contained in this event is the log of the probability of the event. i.e. the rarer the event, the lower its probability and the higher the information content.

Thus, for a particular class of 8 students, the samples read {11th Jan, 18th Jan, 11th March, 30th June, 29th July, 8th August, 22nd September, 30th Nov}. The large deviation principle establishes the conditions by which an event can be treated as a large deviation event and allows us to estimate, in the limit, the probability of that event.

Formal statement of the large deviation principle

The large deviation principle requires that we find a positive, semi-continuous and smooth function I(c) , such that

\lim_{n \rightarrow \infty} \frac{1}{n} \log(\text{Pr}\{Z_n \le c\}) \equiv -I(c) \forall c \in \mathbb{R}
.

As the equation indicates, the probability of the event can be estimated by knowledge of the event itself (which is stored in the variable c) and the size of sample available. Due to the structure of the formulation, it is clear that as n increases, the right hand side of the equation will converge for a good rate function. c here indicates the size of deviation we are interested in; as c increases, the event presumably gets rarer and rarer.

Chernoff's theorem

One of the most common events we are interested in is the nth empirical mean Z_n = \frac{\sum_k^n x_k}{n} of independent random variables. The term independent is emphasized here, because the proof of this theorem uses the scaling property of the moment generating functions for independent random variables. This theorem is widely used, for example, in queueing theory, to estimate the average size of the q, given n different sources of data. Chernoff's theorem (also known as Cramer's theorem) gives the rate function for this to be

I(c) = \max_\theta (\theta c - \log(M(\theta)))
where
M(\theta) = \mathbb{E} e^{\theta x}
is the

well-known moment-generating function. If we know the distribution of x_i and it is differentiable, we can easily solve this equation by using the differentiation technique for maximizing for theta. The table below gives the rate function for various distributions.

DistributionRate function
Bernoulli I(c) = c \log(c/p) (1-c)\log((1-c)/(1-p))
Gaussian I(c) = 0.5 \frac{ (c - \mu)^2 }{ \sigma^2 }
Exponential I(c) = c \lambda - 1 - \log(c \lambda)
Poisson I(c) = c \log(\frac{c}{\lambda}) + \lambda - c

where all symbols have the usual meanings.

''What happens if the sequence of random variables is not independent? The Gartner-Ellis theorem is used in this case. For more information, the reader can consult [addref]''

Chernoff's theorem extends in a more general way to arbitrary functions of \{X_1,\ldots,X_k\} as follows:

\lim_{n \rightarrow \infty} \frac{1}{n} \log(P \{f(X_1,X_2,\ldots,X_n) < c\} \equiv -I(c) \forall c \in \mathbb{R}

where I(x) is the Fenchel Legendre transform

I(c) = \max_\lambda (c \lambda - M(\lambda))

of

M(a) = \log(\sum_{i \in \mathbb{R}}P(X_i)e^{af(X_i)})
. Clearly, the former equation is a special case of this equation.

Empirical measures and Sanov's theorem.

The section above assumes that the distribution of the random variable in question is known. We extend this to the case where the distribution is not known, by creating an empirical distribution function. Consider that we have a sequence of independent samples {1, 2.1, -0.9, 13, 7, 9, 4.2, 6.3}. We can define the empirical distribution function F(k) = P \{X \le k\} by counting the number of samples which are less than or equal to k and then dividing it by the total number of samples. The distribution function for sample set above is shown in . Kolmogorov's strong law of large numbers tells us that the empirical distribution computed here converges to the "true" distribution function as n grows large (the rate of convergence is O(\sqrt(n)) , so it is kind of slow).

The distribution function in turn induces an empirical probability measure for all sets within the domain i.e.

P_n(a \le x \le b) = \frac{\text{samples }X \vert a \le X \le b}{\text{total samples} }
.

Sanov's theorem tells us that we can now use a reference measure \mu and create a rate function

I(\mu) = \mu \sum_{\mathbb{R}} \log(\frac{P_n}{\mu}

where

\begin{eqnarray} \lim_{n \rightarrow \infty} \frac{1}{n} P(P_n \in C) \le - \min_{P_n \in C} I(P_n) \\ C = \mu: \sum_{k=1}^n P_k(x_k) \ge c \end{eqnarray}

Since there may be multiple possible reference measures, we limit \mu to those measures which will have make the event true. This is the set C in the equation above.

Effective bandwidth

Let us now look at an application of rare events for communication network design; the concept of effective bandwidths. We shall define and derive the effective bandwidth for the simplest case, following the work of [Kelly]. Much work has been done on effective bandwidths, the interested reader is invited to see [Walrand], [Connell] and others.

Problem definition

Consider a router having buffer space B (in bytes/packets, whatever). When there is data in the buffer, the router forwards that data using a fixed transmission rate C. Source traffic enters the buffer such that the number of packets generated in the time interval t is given by a(t) . The size of the queue is given by q(t) and it is clear that q(t+1) = \max(q(t) + a(t) - c),0) . Treating q(t) as a random variable, we can find the rate function I(c) using the general form of Chernoff's theorem discussed above such that Pr\{q \ge x\} \equiv e^{-\theta x} where M(\theta) = c \theta . The term c(\alpha) = \frac{M(\alpha)} {\alpha} is called the effective bandwidth.

Significance of the effective bandwidth

To understand the significance of the effective bandwidth, let us assume that a packet is dropped everytime the queue size is greater than the buffer (this is not strictly true; the difference between q(t) and B may be larger than one; a more exact expression can be worked out separately) and rewrite the first equation as

-\log\{PER\} = \frac{M(\alpha) B }{ c \alpha }
.

This expression gives a relationship between the log of the packet drop rate (PER), the buffer size available (B) , the transmission rate c and the function M(a) . From the expression given above, the function M(a) is actually a measure of the disorderliness or entropy of the arrival function, compared to the clearing rate c . In other words, the effective bandwidth gives us a relationship between the likely packet drop rate that a packet stream experiences given the buffer available, the clearing rate c and the statistical 'disorderliness' of the packet stream. The larger the buffer, the more the ability to handle disorderliness i.e. lower packet drop rate. The more the disorderliness, the easier it is to drop packets, because of buffer overflow.

An empirical formula for effective bandwidth

Since it is not always possible to compute the entropy of a packet stream, there are empirical ways to compute the entropy of a packet stream. The simplest method known to the author is the index of dispersion metric, discussed in [Courcoubetis]. Ilka Norros gives a more sophisticated formulation using the Hurst parameter in [Norros]

Uses of the effective bandwidth formula

The effective bandwidth formula can be used in multiple areas of traffic management and flow control. Some problems can be created by the independence assumption - there is a large body of work which indicates that even aggregated Internet traffic is not independent, but heavily correlated. Replacing the independence assumption by using the Gartner Ellis theorem has been discussed in [Connell], [Chang] and others. We shall outline some of the uses of the effective bandwidth formula; these will be explained in more detail in a future article.

  • Bandwidth allocation for incoming traffic sources. The general theory can be expanded to handle bandwidth

allocation at an aggregate level, and further into allocation of bandwidth for multi-class traffic sources, offering different QoS models. See [Walrand] and others. Note that the bandwidth allocation may be both driven by factors independent of the incoming source i.e. fixed allocations, opportunistic scheduling, etc. and may also be matched to the incoming data. The effective bandwidth theory can be used for optimal matching of allocation patterns.

  • Partitioning of buffers. The concept is similar to the one above, but this relates more to input regulation -

for example appropriate settings for the Random Early Dropping protocol.

  • Cascaded, multiplexed and demultiplexed flows. The statistical description of individual flows can be combined

to predict the statistical description of the same flows at an aggregate level as well as the behaviour of the flows after passing through a router. This is of use in the analysis of multi-stage networks.

References

[Weiss] Alan Weiss, "An Introduction to Large Deviations for communication networks", IEEE Journal on Selected Areas of Communication, 1995

[Stanford] Course material at Stanford University http://www-stat.stanford.edu/~amir/large-deviations

[Umich]Tutorial at University of Michigan, http://www.cscs.umich.edu/~crshalizi/notebooks/large-deviations.html

[Kelly] F.P.Kelly, "Effective Bandwidths at Multi-class queues"

[Walrand] Kesidis, Walrand, Chang, "Effective bandwidths for multiclass markov fluids and other atm sources" IEEE JSAC, August 1995

[Connell] N. Connell, "Large Deviations in Queueing Networks", April 1995

[Chang] C.S.Chang, "Effective bandwidth in high-speed digital networks" IEEE JSAC, August 1995

[Courcoubetis] C. Courcoubetis, J. Fouskas, Weber, "On the Performance of an Effective Bandwidths Formula", Technical Report, 1997

[Norros] I.Norros, "On the use of Fractional Brownian Motion in the theory of connectionless networks", IEEE Journal on Selected Areas of Communication, August 1995

Maintained by: abheek.saha@hsc.com

Comments

Add Comment 
Email address(will be kept hidden) 
Enter code:

Page last modified on November 16, 2009, at 07:41 AM