Broken Traffic Control

Revision as of 05:07, 17 February 2011 by Drobbins

The following traffic control examples should work, but are broken in older implementations of Linux traffic control, as tested by Daniel Robbins.

Important

Many enterprise kernels do not have updated traffic control code. RHEL5 kernels are particularly bad in this area.

Broken in RHEL5: Prioritizing Outgoing UDP packets

These rules are meant to give outgoing UDP traffic a higher priority than other outgoing traffic. This could reduce the latency of outgoing UDP packets when your system (or, ideally, your Linux router, which is the best place to apply these shaping rules) is sending a lot of data at once to a remote host.

However, these rules currently do not work.

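# Interface to shape
wanif=eth0
# Root prio qdisc with three bands, exposed as classes 1:1 (highest priority) to 1:3 (lowest)
tc qdisc add dev $wanif root handle 1: prio bands 3
# Attach an sfq leaf to each band; "perturb 10" rehashes the flows every 10 seconds
tc qdisc add dev $wanif parent 1:1 handle 10: sfq perturb 10
tc qdisc add dev $wanif parent 1:2 handle 20: sfq perturb 10
tc qdisc add dev $wanif parent 1:3 handle 30: sfq perturb 10

# Intended: send UDP (IP protocol 0x11, i.e. 17) to the highest-priority band
tc filter add dev $wanif protocol ip parent 1: prio 1 u32 match ip protocol 0x11 0xff flowid 1:1
# Intended: send any IP destination, i.e. all remaining IP traffic, to the middle band
tc filter add dev $wanif protocol ip parent 1: prio 1 u32 match ip dst 0.0.0.0/0 flowid 1:2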

The rules above cause a large amount of data to be lost, because unmatched traffic is not being classified into the lowest-priority leaf class (1:3) as it should be.
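To see where packets actually end up, the statistics kept by tc can be dumped for the qdiscs, classes and filters on the interface. The following is a minimal sketch, not part of the original test procedure: the DRY_RUN switch and the run and show_prio_stats helpers are illustrative names, and by default the script only prints the commands so it can be read without root privileges.

```shell
#!/bin/sh
# Sketch: dump tc statistics for the prio setup on $wanif.
# DRY_RUN, run and show_prio_stats are illustrative names, not from the original page.
wanif=${wanif:-eth0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them (needs root)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_prio_stats() {
    # Counters for the root prio qdisc and for each sfq leaf (10:, 20:, 30:)
    run tc -s qdisc show dev "$wanif"
    # Classes 1:1, 1:2 and 1:3 of the prio qdisc
    run tc -s class show dev "$wanif"
    # Filters attached to the prio qdisc (handle 1:)
    run tc -s filter show dev "$wanif" parent 1:
}

show_prio_stats
```

Running the script with DRY_RUN=0 as root before and after a test transfer, and comparing the sent and dropped counters of 10:, 20: and 30:, should show which band the traffic was placed in.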

Adding the following line, which is intended to match all other IP traffic and direct it to 1:3, improves behavior. However, the prio qdisc still discards all ARP packets, and its discarded counter is incremented accordingly:

tc filter add dev $wanif protocol ip parent 1: prio 1 u32 match ip src 0.0.0.0/0 flowid 1:3
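Since the filters above are all declared with "protocol ip", none of them can match ARP frames. One thing that could be tried is an explicit filter for ARP. The sketch below is an assumption and has not been tested on the RHEL5/OpenVZ kernel discussed here; the target class (arp_flowid, defaulting to 1:3) and the helper names are hypothetical choices, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: an explicit u32 filter for ARP frames.
# Untested on the kernel discussed on this page; arp_flowid, run and
# add_arp_filter are illustrative names.
wanif=${wanif:-eth0}
arp_flowid=${arp_flowid:-1:3}   # hypothetical choice of target class
DRY_RUN=${DRY_RUN:-1}           # 1 = only print the command, 0 = execute it (needs root)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_arp_filter() {
    # "protocol arp" restricts the filter to ARP frames, and
    # "match u32 0 0" matches every packet of that protocol.
    # A separate priority value (2) is used so that this filter does not
    # share a priority with the IP filters.
    run tc filter add dev "$wanif" parent 1: protocol arp prio 2 \
        u32 match u32 0 0 flowid "$arp_flowid"
}

add_arp_filter
```

Whether this avoids the discards on an affected kernel is an open question; it is only meant to show what an ARP-specific rule would look like.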

I have also experienced issues where the priomap does not work properly and the majority of traffic flows into the wrong child class. It also appears that the first tc filter command, which is supposed to prioritize UDP traffic, instead places all traffic into class 1:2.
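When reading the priomap, it helps to remember that its values are zero-based band numbers, so band 0 corresponds to class 1:1. The sketch below looks up the class for a given packet priority using the default priomap listed in the tc-prio man page; class_for_priority is a hypothetical helper written for illustration. One thing it makes visible, and which may be worth ruling out when debugging, is that the default map sends best-effort packets (priority 0) to band 1, that is, class 1:2, whenever no filter classifies them.

```shell
#!/bin/sh
# Default priomap of the prio qdisc, as listed in the tc-prio man page.
# Position = Linux packet priority (0-15), value = zero-based band (0 = class 1:1).
default_priomap="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# class_for_priority is a hypothetical helper, not part of tc.
# Usage: class_for_priority <priority 0-15>
class_for_priority() {
    prio=$1
    # Load the map into the positional parameters, then skip $prio entries
    # so that $1 is the band for the requested priority.
    set -- $default_priomap
    shift "$prio"
    echo "1:$(( $1 + 1 ))"
}

for p in 0 1 2 4 6; do
    echo "priority $p -> class $(class_for_priority "$p")"
done
```

The priomap that is actually in effect on an interface is printed by "tc qdisc show dev $wanif", which can be compared against the expected mapping.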

This code was tested on a physical interface that was part of a Linux bridge, as well as on the Linux bridge device ('brwan') itself. In both cases, prio behaved in ways that deviated significantly from its documented behavior. The kernel used for this testing was a stable Red Hat Enterprise Linux 5.x kernel with OpenVZ patches (2.6.18-028stab079.2).

Based on this testing, prio appears not to work correctly at all, at least when used in this configuration, and Linux traffic control users should avoid it.

More documentation on this bug: