Cookie-Cutter: Achieving Defect/Fault Tolerance For Large-Scale Systems with Highly Unreliable Components

2018-02-08T00:00:00Z (GMT) by Soumya Banerjee
The work proposes generalized “cookie-cutter” defect- and fault-tolerance approaches for nano-scale systems. The systems under consideration include Parallel Prefix Adders (PPAs) and large-scale many-processor systems. First, we present a systematic approach for designing defect-tolerant PPAs. It not only allows designers to select which adder to use in the design, but also gives them the freedom to choose an appropriate reliability-hardware trade-off point. In addition, using the same systematic approach, we show how highly customizable sparse PPAs can be designed. For the design of fault-tolerant many-processor systems, we propose a novel two-layered Router-Processing Element (Router-PE) model, which supports repair of PE faults through a “chain of replacements”: the faulty PE is replaced by a PE in its neighborhood, which is in turn replaced by another nearby PE, and this reconfiguration continues until a spare PE is reached. We show that this repair methodology, combined with the model, provides a systematic design approach for many-processor systems that enables simple, lightweight repairs on the fly. Moreover, a physical implementation of such a system does not require significantly long interconnects to deliver reasonable reliability.
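To make the “chain of replacements” concrete, here is a minimal sketch under stated assumptions, not the thesis's actual Router-PE implementation. It assumes a 2D mesh of PEs with 4-neighbor connectivity and a fixed set of spare PEs. It also assumes the chain is chosen as the shortest path (breadth-first search) from the faulty PE to the nearest free spare through non-failed PEs. The function names `find_replacement_chain` and `apply_chain` and the task-mapping representation are hypothetical. The separate router layer of the two-layered model is not represented here.

```python
from collections import deque

# ASSUMPTION: a 4-connected mesh of PEs addressed by (row, col); the real
# Router-PE model's topology and chain-selection rule may differ.
NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def find_replacement_chain(rows, cols, faulty, spares, failed):
    """Breadth-first search from the faulty PE to the nearest free spare.

    Only non-failed PEs are traversed, so every hop in the chain is between
    healthy neighbors. Returns [faulty, pe_1, ..., spare], or None if no
    spare is reachable.
    """
    prev = {faulty: None}
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        if node in spares:
            # Follow predecessor links back to the faulty PE.
            chain = []
            while node is not None:
                chain.append(node)
                node = prev[node]
            return chain[::-1]
        r, c = node
        for dr, dc in NEIGHBOR_OFFSETS:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in prev and nxt not in failed):
                prev[nxt] = node
                queue.append(nxt)
    return None


def apply_chain(mapping, spares, chain):
    """Shift each logical task one hop along the chain toward the spare.

    mapping: physical PE -> logical task id (spares and failed PEs absent).
    Returns the updated mapping and the set of spares still free.
    """
    tasks = [mapping[pe] for pe in chain[:-1]]  # tasks displaced by the repair
    new_mapping = dict(mapping)
    del new_mapping[chain[0]]  # the faulty PE no longer hosts a task
    for pe, task in zip(chain[1:], tasks):
        new_mapping[pe] = task  # each PE takes over its predecessor's task
    return new_mapping, spares - {chain[-1]}


if __name__ == "__main__":
    # 3x3 mesh: columns 0-1 run tasks 0..5, column 2 holds the spares.
    rows, cols = 3, 3
    mapping = {(r, c): r * 2 + c for r in range(rows) for c in range(2)}
    spares = {(r, 2) for r in range(rows)}
    failed = {(1, 0)}
    chain = find_replacement_chain(rows, cols, (1, 0), spares, failed)
    print("chain:", chain)
    mapping, spares = apply_chain(mapping, spares, chain)
    print("new mapping:", mapping)
```

In this sketch, each PE in the chain only takes over the task of an immediate neighbor. That locality is what keeps each individual repair step lightweight. Picking the shortest chain minimizes how many PEs must be reconfigured for a single fault.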