The TCP/IP protocol stack is the common thread that links today’s LANs, WANs and SANs (Storage Area Networks). Existing host CPUs handle conventional 100 Mbps Ethernet without any problem. But as network interconnect speeds advance to Gigabit Ethernet or even 10 Gigabit Ethernet, the host CPU can become a bottleneck, since it may spend more cycles processing the TCP/IP protocol than running business-critical applications.
A fundamental obstacle to improving network performance is that servers are designed for computing rather than input and output (I/O). The intranet/internet revolution has drastically changed server requirements, and I/O is becoming a major bottleneck in delivering high-speed computing. The main reason for the bottleneck is that the TCP/IP stack is processed at a rate lower than the network speed.
As technology rapidly advances, 10 Gigabit Ethernet is not far away. We need a solution that makes optimum use of the high-speed network while minimizing capital investment and delivering a high ROI (Return On Investment). This paper outlines how TCP/IP Offload Engine (TOE) technology achieves these goals.
An application that generates a write to a remote host over a network produces a series of interrupts to segment the data into packets and process the incoming acknowledgments. This creates a significant amount of context switching. For example, a typical 64 KB application write to a network results in 60 or more interrupt-generating events between the system and a generic NIC to segment the data into Ethernet packets and process the incoming acknowledgments. This creates significant protocol-processing overhead and high interrupt rates. To reduce the overhead, interrupt aggregation techniques can be used, though they reduce only the number of interrupts, not the number of events that must be processed.
With standard NICs, application data being sent is copied from user space to kernel address space. The NIC driver can then copy the data from the kernel to the on-board packet buffer. This requires multiple trips across the memory bus. When packets are received from the network, the NIC copies the packets to the NIC buffers, which reside in host memory. Packets are then copied to the TCP buffer and, finally, to the application itself, for a total of three memory copies. This is considerable overhead at Gigabit speeds. Take a Gigabit Ethernet network as an example: for 1500-byte packets, the host OS stack would need to process more than 83,000 packets per second, or a packet every 12 microseconds. Smaller packets put an even greater burden on the host CPU.
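The packet-rate figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming a fully saturated 1 Gbps link and ignoring Ethernet framing overhead (preamble, inter-frame gap); the function name `packet_rate` is illustrative only.

```python
def packet_rate(link_bps, packet_bytes):
    """Return (packets per second, microseconds per packet) for a saturated link.

    Simplifying assumption: every bit on the wire belongs to a packet of
    exactly packet_bytes bytes; framing overhead is ignored.
    """
    bits_per_packet = packet_bytes * 8
    pps = link_bps / bits_per_packet
    return pps, 1e6 / pps


if __name__ == "__main__":
    # 1500-byte packets are the case discussed in the text; the smaller
    # sizes show how the per-packet time budget shrinks.
    for size in (1500, 512, 64):
        pps, gap_us = packet_rate(1e9, size)
        print(f"{size:>5}-byte packets: {pps:>12,.0f} packets/s, "
              f"one every {gap_us:.2f} us")
```

For 1500-byte packets this gives roughly 83,333 packets per second, or one packet every 12 microseconds, which matches the figures quoted above.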
The traditional host OS–resident stack must handle a large number of requests to process an application’s 64 KB data block send request. Acknowledgments for the transmitted data also must be received and processed by the host stack. In addition, TCP is required to maintain state information for every data connection created. This state information includes data such as the current size and position of the windows for both sender and receiver. Every time a packet is received or sent, the position and size of the window change and TCP must record these changes. Protocol processing consumes more CPU power to receive packets than to send packets. A standard NIC must buffer received packets, and then notify the host system using interrupts. After a context switch to handle the interrupt, the host system processes the packet information so that the packets can be associated with an open TCP connection. Next, the TCP data must be correlated with the associated application and then the TCP data must be copied from system buffers into the application memory locations. TCP uses the checksum information in every packet that the IP layer sends to determine whether the packet is error-free. TCP also records an acknowledgment for every received packet. Each of these operations results in an interrupt call to the underlying OS. As a result, the host CPU can be saturated by frequent interrupts and protocol processing overhead. The faster the network, the more protocol processing the CPU has to perform.
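To give a sense of the per-packet work involved, the checksum step can be sketched as follows. This is a minimal illustration of the 16-bit one's-complement Internet checksum used by TCP and IP, computed here over an arbitrary byte string; a real TCP checksum also covers a pseudo-header, which this sketch omits. The function name `internet_checksum` is an assumption made for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over a byte string.

    Sketch only: a real TCP checksum is computed over the pseudo-header,
    TCP header, and payload together.
    """
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    csum = internet_checksum(payload)
    print(f"checksum = 0x{csum:04x}")
    # A receiver recomputes the sum over data plus checksum; zero means
    # no error was detected.
    print(internet_checksum(payload + csum.to_bytes(2, "big")) == 0)
```

A host stack performs this computation, or verifies the result computed by the NIC, for every segment, which is one reason the per-packet cost grows with network speed.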
The rule of thumb used by network planners is that 1 bit/sec of network traffic requires 1 Hz of CPU processing. Processing 100 Mbps of data, the speed at which conventional Ethernet operates, requires 100 MHz of CPU computing power, which today’s CPUs can handle without difficulty. However, bottlenecks can begin to occur when Gigabit Ethernet and 10 Gigabit Ethernet are introduced. At these network speeds, with so much CPU power devoted to TCP processing, relatively few cycles are available for business application processing. The problem compounds if the host system has multiple Gigabit Ethernet NICs, since a single TCP/IP stack handles all the traffic.
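The rule of thumb can be turned into a quick estimate. The sketch below applies "1 Hz per bit/sec" to the three Ethernet speeds discussed here; the 3 GHz CPU clock is a hypothetical value chosen only for illustration, and the function names are not from any real tool.

```python
def cpu_hz_for_network(link_bps, hz_per_bps=1.0):
    """CPU cycles per second needed for protocol processing (rule of thumb)."""
    return link_bps * hz_per_bps


def cpu_fraction(link_bps, cpu_hz):
    """Fraction of one CPU consumed by protocol processing at full link rate."""
    return cpu_hz_for_network(link_bps) / cpu_hz


if __name__ == "__main__":
    CPU_HZ = 3e9  # hypothetical 3 GHz host CPU, an assumption for this example
    for name, bps in (("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)):
        share = cpu_fraction(bps, CPU_HZ)
        print(f"{name:>8}: {cpu_hz_for_network(bps) / 1e6:>8,.0f} MHz needed, "
              f"{share:.0%} of the CPU")
```

Under these assumptions, 100 Mbps costs only a few percent of the CPU, Gigabit Ethernet costs about a third, and 10 Gigabit Ethernet needs more cycles than the CPU has.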
Installing an additional CPU board in a server, or an additional server in a rack, can solve the problem of business applications being starved of CPU cycles. This, however, is not an effective solution in terms of cost, power consumption, and real estate. It is also an unattractive approach for cost-sensitive installations where leveraging the existing hardware investment is critical to maximizing ROI (return on investment).
A TOE embedded in a NIC or host bus adapter (HBA) is the answer to the problem. The idea is to offload all TCP/IP stack processing to the NIC, which acts as a dedicated stack processor. As shown in the figure, the TCP/IP layers are offloaded to the NIC.

TOE fundamentally changes the system transaction model. The new transaction model is “one event per application network I/O”, replacing the old model of “one event per Ethernet packet”. The 64 KB application write becomes one data-path offload event, moving all packet processing to the TOE and eliminating the per-packet interrupt load from the host. A common traffic pattern is multiple applications transferring network data at the same time, and TOE provides the maximum benefit for this type of traffic.
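The difference between the two transaction models can be estimated with a rough count. The sketch below assumes a 64 KB (65,536-byte) application write, a 1460-byte TCP payload per Ethernet packet, and one acknowledgment for every two segments; all three values, and the function names, are assumptions made for illustration rather than measurements.

```python
import math

MSS_BYTES = 1460        # assumed TCP payload per 1500-byte Ethernet packet
SEGMENTS_PER_ACK = 2    # assumed ratio: one ACK per two segments


def segments_for_write(write_bytes, mss=MSS_BYTES):
    """Number of TCP segments needed to carry one application write."""
    return math.ceil(write_bytes / mss)


def host_events_standard_nic(write_bytes):
    """Old model: one host event per outgoing packet plus one per ACK."""
    segments = segments_for_write(write_bytes)
    acks = math.ceil(segments / SEGMENTS_PER_ACK)
    return segments + acks


def host_events_toe(write_bytes):
    """New model: one event per application network I/O, whatever its size."""
    return 1


if __name__ == "__main__":
    write = 64 * 1024
    print("segments:           ", segments_for_write(write))
    print("standard NIC events:", host_events_standard_nic(write))
    print("TOE events:         ", host_events_toe(write))
```

With these assumptions the standard NIC generates several dozen host events for the write, in line with the "60 or more" figure quoted earlier, while the TOE model generates one.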
A TOE-enabled NIC can reduce the number of buffer copies to two: the NIC copies the packets to the TCP buffer and then to the application buffers. A TOE-enabled NIC using Remote Direct Memory Access (RDMA) can use zero-copy algorithms to place data directly into application buffers. The capability of RDMA to place data directly eliminates intermediate memory buffering and copying, as well as the associated demands on the memory and processor resources of the host server, without requiring the addition of expensive buffer memory on the Ethernet adapter. RDMA also preserves memory-protection semantics.
The TOE manages TCP state and window sizes on its own. All packet processing, namely segmentation, reassembly, acknowledgment, retransmission, handling of out-of-order packets, buffering, and checksum verification, is done by the TOE. This relieves the host system of excessive context switching and stack processing.
When calculation-oriented applications dominated CPU cycles, a dedicated math co-processor was introduced. When 3D modeling and simulation applications became more demanding, the graphics co-processor was introduced. Following the same tradition, the TOE is simply a TCP/IP co-processor. It provides an immediate and apparent ROI. It also has a great real estate advantage over adding an auxiliary server for installations with space constraints. By offloading TCP/IP protocol processing, the TOE NIC allows users to leverage their existing hardware investment, while freeing up the CPU to process real applications.
TOE adapter implementations that contain an embedded processor or multiple specialized ASICs may be too expensive for the market to bear. Consequently, companies are working on integrated ASIC solutions to reduce cost. Single chip TOE implementations require putting a protocol typically handled in software, TCP/IP, into fixed function silicon. Implementing TCP/IP completely in hardware poses a significant technical challenge.
There is little need to worry about this challenge, because Moore's law is on our side. Gradual advances in technology will make TOE implementations cheaper, more affordable, and more flexible.
Currently there is no standard driver interface between major operating systems and TOE adapters. Major operating systems are working on standard interfaces to TOE to facilitate simple support of TOE solutions.