Posted on 2015-10-21, 00:00. Authored by Enzo Tartaglione.
Current nanoscale designs are highly interconnect dominated, taking about 70% of the chip
area. Interconnects also consume a significant part of the dynamic power and are responsible
for about 60% of signal delays. It is, thus, important to be able to synthesize much lower
interconnect-complexity designs than are possible with current high-level synthesis (HLS) tools
and algorithms. Towards that end, we have developed the following new paradigms in the
scheduling, binding and general architecture synthesis problems of HLS:
• Flexibly-structured buslets that connect a few neighboring functional units (FUs) instead of
dedicated interconnects between pairs of FUs, thereby sharing interconnects among a
number of FU pairs that need to communicate.
• Communication scheduling (followed by standard operation scheduling that respects the
communication schedules) in which communications between FUs are scheduled at appropriate
times to minimize the number of buslets needed, subject to buslet cardinality
constraints (for the purpose of upper bounding signal delay).
• Buslet binding techniques that aim to respect both the buslet cardinality constraint and a
constraint on the maximum fan-in and fan-out of the functional units. These techniques range
from simple but effective approaches such as chronological binding (CB) to more sophisticated
ones, such as lookahead approaches and the simultaneous binding of iso-scheduled
communications (communications scheduled in the same clock cycle). In addition, a
mechanism for detecting similar solutions was developed to improve the final quality of the
result. Finally, a force-directed approach (FDB) was also used to solve the binding problem.
All these techniques were implemented and compared in terms of both performance and
complexity.
• Buslet power modeling. A number of configurations with multiple tri-state buffers for
interconnecting FUs through a buslet were implemented, aiming to minimize the total
power consumed using buslets. These range from techniques using minimum spanning
trees, to more sophisticated structures with constraints on the maximum graph distance
between connected FUs, to hierarchical partitioning.
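As an illustration of the binding step listed above, the following is a minimal Python sketch of a greedy, schedule-ordered binding in the spirit of chronological binding (CB). It is not the implementation from this work: the tuple format `(cycle, src, dst)`, the `Buslet` class, the function name `chronological_binding` and the first-fit rule are all assumptions made for the example, and the fan-in/fan-out constraint is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Buslet:
    """Hypothetical buslet model: attached FUs plus the cycles in which it is in use."""
    fus: set = field(default_factory=set)
    busy_cycles: set = field(default_factory=set)


def chronological_binding(comms, max_cardinality):
    """Bind communications (cycle, src, dst) to buslets in schedule order.

    Assumed first-fit rule: reuse the first buslet that is idle in the
    communication's cycle and that stays within max_cardinality FUs after
    attaching src and dst; otherwise open a new buslet.
    Assumes max_cardinality >= 2 so a fresh buslet can hold one FU pair.
    """
    buslets = []
    binding = {}
    for comm in sorted(comms):  # chronological order: earliest cycle first
        cycle, src, dst = comm
        for idx, buslet in enumerate(buslets):
            if cycle in buslet.busy_cycles:
                continue  # a buslet carries one transfer per cycle
            if len(buslet.fus | {src, dst}) > max_cardinality:
                continue  # would violate the cardinality constraint
            break  # first feasible buslet found
        else:
            buslets.append(Buslet())
            idx = len(buslets) - 1
        buslets[idx].fus |= {src, dst}
        buslets[idx].busy_cycles.add(cycle)
        binding[comm] = idx
    return binding, buslets


# Toy schedule (made-up data, not taken from the experiments).
comms = [(0, "A", "B"), (0, "C", "D"), (1, "B", "C"), (1, "A", "D"), (2, "A", "C")]
binding, buslets = chronological_binding(comms, max_cardinality=3)
print(len(buslets), binding)
```

On this toy schedule the five communications share three buslets of at most three FUs each, whereas dedicated interconnects would need one link per communicating pair. A lookahead or force-directed variant would replace the first-fit choice with a cost-based one.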
Using the aforementioned techniques, we obtain significant wirelength (WL) reduction, ranging
between 35% and 71%, compared to conventional designs with dedicated interconnects between
communicating FU-pairs. The total chip area, including total FU area, is also smaller in our
designs than in conventional designs. Power, on the other hand, increases with buslet size,
but sublinearly. Empirical results show that we are able to limit the increase in power
consumed by buslets, relative to dedicated-interconnect designs, to a
logarithmic function of the maximum buslet cardinality.
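
To give a feel for why sharing wires reduces wirelength, the sketch below compares, on a toy placement, the total length of dedicated pairwise interconnects with the length of a minimum spanning tree (the simplest buslet structure mentioned above) over the same FUs. The FU coordinates, the Manhattan distance metric and the function names are assumptions chosen for illustration; the numbers it produces are not results from this work.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) placements (assumed wire metric)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_wirelength(positions):
    """Prim's algorithm over FU positions; returns (total length, tree edges)."""
    names = list(positions)
    if not names:
        return 0, []
    in_tree = {names[0]}
    edges = []
    total = 0
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to any FU not yet in the tree.
        dist, u, v = min(
            (manhattan(positions[u], positions[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        in_tree.add(v)
        edges.append((u, v))
        total += dist
    return total, edges


def dedicated_wirelength(positions, pairs):
    """Total length when every communicating FU pair gets its own wire."""
    return sum(manhattan(positions[a], positions[b]) for a, b in pairs)


# Hypothetical placement: four FUs on the corners of a 2 x 2 square.
positions = {"A": (0, 0), "B": (2, 0), "C": (2, 2), "D": (0, 2)}
pairs = [("A", "B"), ("C", "D"), ("A", "D"), ("B", "C"), ("A", "C")]

tree_len, tree_edges = mst_wirelength(positions)
print(tree_len, dedicated_wirelength(positions, pairs))  # prints "6 12"
```

In this made-up example a single spanning-tree buslet halves the wirelength, at the price of serializing the transfers that share it, which is what communication scheduling has to account for.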