Adaptive Microburst Control Techniques in Incast-Heavy Datacenter Networks
Thesis posted on 01.08.2021, 00:00 by Hamed Rezaei
Datacenters host a mix of applications that generate qualitatively distinct traffic patterns and impose different network objectives. Online, user-facing applications generate many-to-one incast traffic of mostly short flows, which are sensitive to the tail of Flow Completion Times (FCT). Data analytics applications generate all-to-all traffic (e.g., Web search) of mostly short flows that saturate the network bisection, and a job completes only when all of its flows complete. Background applications (e.g., Map-Reduce) generate large flows and are throughput-sensitive because of the sheer amount of data they transfer over the network. While the datacenter fabric provides good bisection bandwidth to handle all-to-all and background traffic, incast traffic is bottlenecked at the edge switches and causes queue buildup at the switch port connected to the receiver server. Incasts are synchronized bursts of many-to-one short flows that fundamentally oversubscribe the receiver (aggregator) link. Because datacenter switches use shallow buffers to reduce cost and latency, the queue buildup problem is exacerbated: the shallow buffers easily overflow, causing packet drops and expensive TCP timeouts. To address the incast problem in datacenter networks, we propose five solutions. Slytherin, the first method discussed, is a novel flow scheduling scheme that targets packets that fall in the tail (i.e., those delayed at multiple switches) and prioritizes them at the next-hop switches. As a result, the most delayed packets are drained faster, and congestion reports also arrive sooner at the receiver server. ICON, the second method, is a novel scheme that reduces incast-induced packet loss by exercising fine-grained control over the sending rate through traffic pacing. ResQueue, the third scheme, uses a combination of flow size and packet history to calculate the priority of each flow.
ResQueue detects packets that were previously dropped (possibly during an incast) and increases their priority in the next round. Therefore, these packets will not be dropped again even under severe congestion. Our evaluation shows that ResQueue improves tail flow completion times of short flows by up to 60% over state-of-the-art flow scheduling mechanisms. Superways, the fourth method, is a heterogeneous datacenter topology that provides higher bandwidth to the servers that host incast applications so that they can absorb incasts, since incasts occur only at the small number of servers that aggregate responses from other senders. Our design is based on the key observation that the small subset of servers that aggregate responses is likely to be network-bound, whereas most other servers, which communicate only with random servers, are not. Superways guarantees a 95% improvement (on average) in tail latency when it is implemented on top of state-of-the-art topologies in which the servers run DCTCP as the congestion control method. Finally, we discuss Smartbuf, an online learning algorithm that accurately predicts the buffer requirement of each switch port before the onset of congestion. Our key novelty lies in fingerprinting bursts based on the gradient of the queue length and using this information to provision just enough buffer space. Our preliminary evaluations show that our algorithm predicts buffer demands within an average error margin of 6% and improves the 99th percentile latency by a factor of 8 at high loads, while providing good fairness among ports.
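The idea of using the gradient of the queue length to anticipate buffer demand can be illustrated with a minimal sketch. This is not Smartbuf's learning algorithm, which the abstract does not specify. It is a simple linear extrapolation under stated assumptions: queue length is sampled at fixed intervals, and the function names, `horizon`, and `burst_threshold` are hypothetical parameters introduced here for illustration only.

```python
from typing import List


def queue_gradients(samples: List[int]) -> List[int]:
    """Differences between consecutive queue-length samples (packets per interval)."""
    return [b - a for a, b in zip(samples, samples[1:])]


def estimate_buffer_demand(
    samples: List[int],
    horizon: int = 3,          # hypothetical: intervals to look ahead
    burst_threshold: int = 5,  # hypothetical: gradient that signals a burst
) -> int:
    """Estimate the buffer (in packets) a port may need in the near future.

    If the most recent gradient reaches the threshold, the queue is assumed
    to keep growing at that rate for `horizon` intervals; otherwise the
    current queue length is returned unchanged.
    """
    if not samples:
        return 0
    current = samples[-1]
    if len(samples) < 2:
        return current
    recent_gradient = queue_gradients(samples)[-1]
    if recent_gradient >= burst_threshold:
        return current + recent_gradient * horizon
    return current


if __name__ == "__main__":
    # Queue growing quickly: the last gradient is 12, so the estimate
    # extrapolates three intervals ahead of the current length of 24.
    print(estimate_buffer_demand([2, 3, 4, 12, 24]))
    # Queue roughly stable: no burst is detected.
    print(estimate_buffer_demand([10, 11, 10, 11]))
```

A real predictor would presumably learn the mapping from gradient patterns to burst size online rather than rely on a fixed threshold and a linear projection; the sketch only shows why a steep early gradient carries information about how much buffer a port is about to need.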