Figure 2 shows the average Trapeze TCP bandwidth on all four platforms as a function of the MTU. To show the effects of copying and checksumming, we tested with the zero-copy optimizations enabled and disabled, and with checksumming enabled and disabled. All checksumming is done in software except on the receiving side of the Monet configuration, which uses checksum offloading on the LANai-5 adapter as described in Section 2.2.2.
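The following Python sketch shows what a software TCP checksum computes: the standard Internet checksum defined in RFC 1071. It is illustrative only and is not the Trapeze or kernel implementation, which would be optimized very differently. The function name `inet_checksum` is our own. What it makes visible is that every byte of the payload is read once, so a software checksum touches all of the data, just as a copy does.

```python
def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): the ones' complement of the ones'
    complement sum of the data, read as 16-bit big-endian words."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        # Every payload byte is read here, so a software checksum must
        # pull the whole packet through the CPU cache.
        total += (data[i] << 8) | data[i + 1]
    # Fold carries out of the low 16 bits back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Worked example from RFC 1071, section 3: the folded sum is 0xddf2,
# so the checksum is its complement.
print(hex(inet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

A receiver can verify a segment by summing the data together with the transmitted checksum. A correct segment then yields a checksum of zero.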
The graphs show the bandwidth costs of copying and checksumming, primarily on the older platforms with less memory system bandwidth. The effect is most apparent on the P-II/440LX, which reaches a peak bandwidth of only 450 Mb/s when it is forced to copy the data, but 780 Mb/s (roughly 1.7 times higher) when the zero-copy optimizations are enabled. The cost of checksumming is more pronounced with zero-copy enabled: without a copy, the checksum pass is the only code that touches the data, so it must bring the data into the CPU cache itself. On the P-II/440BX, higher memory system bandwidth allows the system to come close to its peak bandwidth while either copying or checksumming, but not both. Even then, it does so only at very large MTUs, where the CPU is not already busy with per-packet overheads. The same pattern appears on the Miata, which has comparable memory system bandwidth, but it is less pronounced because the Miata's I/O bus limits the achievable bandwidth. The Monet has enough memory system bandwidth to deliver a peak bandwidth of 956 Mb/s at sufficiently large MTUs, even while copying and checksumming. Nevertheless, copying and checksumming consume a significant share of the CPU cycles left to process the data at these speeds, as Section 3.2 shows.
Figure 2 also shows the difficulty of achieving high bandwidth with the small 1500-byte MTU of the Gigabit Ethernet standard. Small MTUs increase per-packet handling overheads. They also defeat the zero-copy optimizations, because a 1500-byte packet cannot carry a full page of data. The combined effect saturates the host CPU at bandwidths as low as 300 Mb/s, and none of the platforms can use more than half of the available link speed. Section 3.2 examines these overheads in more detail. All platforms achieve most of their peak bandwidth at MTUs large enough to contain a TCP/IP header and a page of data. Bandwidth on the Intel platforms rises at smaller MTUs because zero-copy takes effect once a 4 KB page fits in a packet, while the Alpha platforms use 8 KB pages.
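The MTU arithmetic behind the page-size effect can be sketched in a few lines of Python. This sketch assumes minimal 20-byte IPv4 and 20-byte TCP headers with no options, and it assumes that zero-copy applies only when one packet's payload holds a whole page. The actual Trapeze packet format, TCP options, and alignment rules may shift these thresholds. All helper names here are hypothetical.

```python
# Assumption: minimal IPv4 and TCP headers, no options.
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_payload(mtu: int) -> int:
    """Bytes of TCP data carried by one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def page_fits(mtu: int, page_size: int) -> bool:
    """Assumed zero-copy condition: a whole page fits in one packet."""
    return tcp_payload(mtu) >= page_size


def min_mtu(page_size: int) -> int:
    """Smallest MTU whose payload holds one full page."""
    return page_size + IPV4_HEADER + TCP_HEADER


def packets_per_second(bits_per_second: float, mtu: int) -> float:
    """Full-size packets the host must handle per second at a given rate."""
    return bits_per_second / (mtu * 8)


for page in (4096, 8192):  # Intel and Alpha page sizes
    print(f"{page}-byte page: MTU >= {min_mtu(page)}, "
          f"fits in 1500-byte MTU: {page_fits(1500, page)}")

# Between the two thresholds only the 4 KB-page (Intel) hosts can zero-copy.
print(page_fits(6000, 4096), page_fits(6000, 8192))  # True False
print(f"{packets_per_second(1e9, 1500):,.0f} packets/s at 1 Gb/s, 1500-byte MTU")
```

At a fixed bandwidth the packet rate, and with it the per-packet overhead, scales inversely with the MTU. This is why the 1500-byte MTU hurts twice: more packets per second, and no page-sized payloads to remap.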
While we were pleased with the Trapeze bandwidth results on the Monet, which we believed to be an open-source record, we measured even higher bandwidths with Alteon's new Gigabit Ethernet products. The Monet delivers point-to-point TCP bandwidth of 988 Mb/s with zero-copy sockets over Alteon Gigabit Ethernet. The higher bandwidth is apparently due to lower overheads in the Alteon controller, which has two 100 MHz MIPS R4000-like processors delivering several times the processing capacity of the LANai-5 CPU. We anxiously await the LANai-7 from Myricom.