Posted on 2019-12-01. Authored by George Arthur Dill.
Parallel programs commonly synchronize shared-memory accesses with mutual exclusion locks. This approach, in which multiple OS threads access the same data structure, performs poorly when data structures are contended and makes little use of a processor's on-core cache.
Delegation, as presented by Roghanchi et al. in Gepard, is a scheme in which a server thread is given exclusive control of a data structure and client threads request operations via message passing. Delegation has been shown to increase throughput in highly parallel systems over coarse-grained locks; however, the approach suffers from the latency required to pass messages between threads. Gepard masks this latency by introducing concurrency in message passing through lightweight user-space threads called fibers. However, Gepard is unable to achieve enough concurrency to exceed the throughput of fine-grained locking approaches when variables are uncontended.
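The delegation pattern described above can be sketched in a few lines. The following is a minimal illustration in Go, not Gepard's actual fiber-based implementation: a single server goroutine has exclusive control of a counter, and client goroutines delegate fetch-and-add operations to it over a channel (the names `request`, `serve`, and `runClients` are hypothetical, chosen for this sketch).

```go
package main

import (
	"fmt"
	"sync"
)

// request asks the server to add delta to the delegated counter and send the
// prior value back on reply. (Illustrative names, not Gepard's API.)
type request struct {
	delta int64
	reply chan int64
}

// serve gives a single server goroutine exclusive control of the counter:
// no lock is needed because only the server ever touches the variable.
func serve(reqs <-chan request) {
	var counter int64
	for r := range reqs {
		r.reply <- counter
		counter += r.delta
	}
}

// runClients has `clients` goroutines each delegate `ops` fetch-and-add
// operations to the server, then reads back the final counter value.
func runClients(clients, ops int) int64 {
	reqs := make(chan request, 64) // buffering stands in for the message-passing layer
	go serve(reqs)

	var wg sync.WaitGroup
	for c := 0; c < clients; c++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			reply := make(chan int64)
			for i := 0; i < ops; i++ {
				reqs <- request{delta: 1, reply: reply}
				<-reply // wait for the fetched value (synchronous delegation)
			}
		}()
	}
	wg.Wait()

	// a zero-delta request fetches the final value through the server
	reply := make(chan int64)
	reqs <- request{delta: 0, reply: reply}
	final := <-reply
	close(reqs)
	return final
}

func main() {
	fmt.Println(runClients(4, 1000)) // prints 4000
}
```

Note that each client here blocks on its reply, which is exactly the latency Gepard hides with fibers; an asynchronous design would let a client issue further requests before earlier replies arrive.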
We present asynchronous designs for dedicated and flat delegation. The asynchronous API enables greater message-passing concurrency and a smaller memory footprint than Gepard's. On our benchmark, the result is an increase of over 30% in total system throughput relative to locking approaches when delegated data structures are resident in a delegation server's on-core cache. We provide rationale for our design decisions and examine the performance of these designs on a fetch-and-add microbenchmark.