HSCTechnicalWiki


view edit history print Talk subscribe
SearchWiki
Inspired by: Support Wikipedia

Views: 369

Full site statistics

Authors:

edit SideBar

Main » Fixed Point Representation

PageList

Papers

Tutorials

HSC welcomes all external visitors to this site, especially students and members of the academic community. Please use the comments box at the bottom of each page to record any comments or suggestions for improvement.

Need for fixed point representation in DSP applications

Most of the commercially available DSPs doesnt support floating point arithmetic due to cost of extra silicon imposed on the hardware. For implementing algorithms that need floating point operations in DSPs, one needs to emulate the floating point operations in software. The software emulated floating point operations are very costly in terms of cycle usages. This results into degradation of the performance of the algorithm.

Fixed point implementation of algorithms gives a significant improvement in execution speed due to inherent support for integer mathematics in DSPs. Since, fixed point numbers have reduced range and accuracy, the fixed point implementation of algorithm may suffer in terms of accuracy. Extra care has to be taken for overflow and saturation.

Binary representation of fixed point

An N-bit binary number can represent 2N states. These states can be mapped to represent virtually any 2N entities. In case of fixed point numbers, these states represent rational numbers. A rational number is in form of a/b. Here a and b are integers. For fixed point representation, b is of form 2n. n decides the position of binary point. The virtual binary point is not represented.

Concepts

There are many representation formats for representing fixed point numbers. This document usages the format as prescribed by Texas Instrument.

Format

A fixed point binary number can be represented in form of Qm.n.

  • Q: Texas Instruments representation for signed fixed-point numbers.
  • m (optional; default=0): The number of bits used to designate the two's complement integer portion of the

number, exclusive of the sign bit.

  • n: The number of bits used to designate the two's complement fractional portion of the number, i.e. the

number of bits to the right of the binary point.

So, Q7.8 means that there are 7 bits for representing integer portion, 8 bits for fractional portion and 1 bit is reserved for sign. Similarly Q15 means there are 0 bits for integer portion, 15-bits for fraction and 1 sign bit.

Range

The difference of largest and the smallest real number that can be represented using fixed point format is known as the range of the fixed point format.

For Qm.n format, Range = [-2^m , 2^m - 2^{-n}]

Example: Q7.8 Range = [ - 2^7, 2^7 - 2^{-8} ] = [-128 , 127.99609] Q15 Range = [ -2^0 , 2^0 - 2^{-15} ] = [-1, 0.9999695]

Resolution

The difference between two consecutive real numbers in the fixed point format is known as its resolution. The resolution of fixed point format defines the precision with which numbers can be stored in given Q format. For Qm.n format, Resolution = 2^-n^ Example:

Q7.8Resolution = 2^{-8} = 0.0039063
Q15Resolution = 2^{-15} = 0.0000305

Word Length

Word length defines the storage requirement for a fixed point number format. For Qm.n format, Word Length = m+n+1 bits.

Conversion

Fixed to Float

To convert a number from Qm.n format to floating point:

  1. Convert the number directly to floating point
  2. Divide by 2^n

Float to Fixed

To convert a number from floating point to Qm.n format:

  1. Multiply the floating point number by 2^n
  2. Round to the nearest integer

Basic Arithmetic

Following sections explain the basic arithmetic over fixed point numbers. It shall be noted that fixed point numbers are a ratio of two integers, the numerator is kept in storage, the denominator is equal to 2^n.

Addition

\frac{num1}{d} + \frac{num2}{d} = \frac{num1 + num2}{d}

For addition, the two numbers must be in same Qm.n format. So, adding the two fixed point number is simple integer addition. The result is in same Qm.n format.

 unsigned short SATURATE_FLAG = 1;  //to be read by functions
 unsigned short OVERFLOW_FLAG = 0;  //to be written by
 functions

 signed short Fx_Add(signed short x, signed short y)
 {
    signed long z;
    OVERFLOW_FLAG = 0;
    z = (signed long)x + y;

    /* check if no overflow happens */ 
    if(z = -32768)
        return(x + y);

    OVERFLOW_FLAG = 1;

    if(SATURATE_FLAG == 1)
        if(z 
            return(0x8000);
        else
            return(0x7FFF);
    else
        return(x + y);
 }

Subtraction

\frac{num1}{d} - \frac{num2}{d} = \frac{num1 - num2}{d}

For subtraction, the two numbers must be in same Qm.n format. So, subtracting the two fixed point number is simple integer subtraction. The result is in same Qm.n format.

 unsigned short SATURATE_FLAG = 1;  //to be read by functions
 unsigned short OVERFLOW_FLAG = 0;  //to be written by
 functions

 signed short Fx_Sub(signed short x, signed short y)
 {
    signed long z;
    OVERFLOW_FLAG = 0;
    z = (signed long)x - y;

    /* check if no overflow happens */ 
    if(z = -32768)
        return(x - y);

    OVERFLOW_FLAG = 1;

    if(SATURATE_FLAG == 1)
        if(z 
            return(0x8000);
        else
            return(0x7FFF);
    else
        return(x - y);
 }

Multiplication

\frac{a}{2^{n1}} * \frac{b}{2^{n2}} = \frac{a.b}{2^{n1+n2}}

Two fixed point number in Qm1.n1 and Qm2.n2 format can be multiplied. Simple * operator will result into Qm.n number. Here n = n1 + n2 . Since the * operation result into two bits being reserved for sign. As a convention, the result is shifted left by 1 to have a single sign bit. So, the result is in Qm.n format:

\begin{eqnarray} m = m1 + m2 \\ n = n1 + n2 + 1 \end{eqnarray}
 signed short Fx_Mult(signed short x, signed short y)
 {
    signed long z;

    z = x * y;
    z
 }

Example

Below is example of fixed point C code for implementing sine function. The algorithm used here is CORDIC. CORDIC: (COordinate Rotation DIgital Calculation) finds the sine or cosine of an angle iteratively, using only simple math operations such as add, subtract, compare, shift, and table lookup.

Figure 1: CORDIC

The diagonal blue line is angle \theta above the horizontal. The diagonal red line is the blue line rotated counter-clockwise by angle \theta_1 . The new X and Y values are related to the old X and Y values as follows:

\begin{eqnarray} X2 = X1 * \cos(\theta) - Y1 * \sin(\theta) \\ Y2 = X1 * \sin(\theta) + Y1 * \cos(\theta) \end{eqnarray}

For CORDIC, the final angle \theta_2 is the angle of interest; the angle whose sine or cosine we want to calculate. The initial angle \theta_1 is set to a convenient value such as 0. Rather than rotating from \theta_1 to \theta_2 in one fell swoop, we move in steps. With careful choice of step values, the only math used is shifts and adds. The equations above can be re-written as:

\begin{eqnarray} X2 = \cos(\theta) * (X1 - Y1 \tan(\theta) )\\ Y2 = \cos(\theta) * ( X1 \tan(\theta) + Y1 ) \end{eqnarray}

Values for \theta are chosen such that \tan(\theta) is a fractional power of 2:

\tan(\theta21) = 1/1 \theta21 = 45 \cos(\theta21) = 0.707107
\tan(\theta32) = 1/2 \theta32 = 26.5650 \cos(\theta32) = 0.894427
\tan(\theta43) = 1/4 \theta43 = 14.0362 \cos(\theta43) = 0.970142
\tan(\theta54) = 1/8 \theta54 = 7.12502 \cos(\theta54) = 0.992278
\tan(\theta65) = 1/16 \theta65 = 3.57633 \cos(\theta65) = 0.998053
\tan(\theta76) = 1/32 \theta76 = 1.78991 \cos(\theta76) = 0.999512
\tan(\theta87) = 1/64 \theta87 = 0.895174 \cos(\theta87) = 0.999878
\tan(\theta98) = 1/128 \theta98 = 0.447614 \cos(\theta98) = 0.999969

This lets us replace the multiplication by \tan(\theta)4 with a simple, fast right-shift operation. The cosine factors fall out, to form an iterative product, i.e.

\cos(\theta21) * \cos(\theta32) * \cos(\theta43) ... * \cos(\theta nn)

Expressing the \theta values in terms of inverse tangents gives the equivalent product series

\prod_{N=0}^{\infty} \frac{2^N}{\sqrt{1 + 2^N}} = 0.607253

The value, to which it converges, 0.607253, is the aggregate constant. \cos(\theta) terms are ignored and multiplied in the beginning or end.

C Code

 signed short Fx_Sine(signed short z)
 {
    signed short at14? = {0x2000, 0x12e4, 0x09fb, 0x0511,
    0x028b,
                           0x0146, 0x00a3, 0x0051, 0x0029, 0x0014,
                           0x000a, 0x0005, 0x0003, 0x0001};

    signed long k=0x4dba0000;
    signed long x, y, x1, y1;
    signed short i, z1;

    x = k; //holds the cos(z) value 
    y = 0;//holds the sine(z) value

    /* compute only in 1st and 4th quadrant. */
    /* Subtracting PI/2 (0x4000) from positive angle values and
    Adding     
     * PI/2 from negative angle values will result into the new
     angle 
     * between PI/2 to PI/2. The sine and cosine terms has to be 
     * accordingly changed as per trigonometric identities.
     */
 if( z >= 0)
        z -=0x4000; 
    else
    {
        z +=0x4000;
        x = -x;
    }

    for(i = 0; i 
    {
        x1 = x >> i;
        y1 = y >> i;
        z1 = ati?;

        if (z >= 0)
        {
            x = x - y1;
            y = y + x1;
            z = z - z1;
        }
        else
        {
            x = x + y1;
            y = y - x1;
            z = z + z1;
        }
    }

    return(x>>16);
 }

Performance

The performance of Fixed Point implementation of an algorithm is measured by the SQNR (Signal to Quantization Noise Ratio). SQNR is defined as ratio of energy of actual signal and energy of quantization noise.

SQNR = \frac{\sum S^2}{\sum E^2}

The SQNR for the above Fixed Point C code is 77.464756 dB.

Figure 2: Plot of sine wave

References

[1 ] Eric L. Oberstar, Fixed-Point Representation and Fractional Math, Oberstar Consulting. [2 ] Randy Yates, Fixed-Point Arithmetic: An Introduction, Digital Signal Labs [3 ] http://my.execpc.com/~geezer/embed/cordic.htm [4 ] Wikipedia:Q_

Comments

Add Comment 
Email address(will be kept hidden) 
Enter code:

Page last modified on November 11, 2009, at 07:35 AM