With the widespread use of multicore and many-core processors, memory system designers are pushing memory bandwidth and capacity to meet user demands. A negative side effect is the continuous increase in memory power consumption. In addition, main memory system design is severely limited by a rigid architecture that requires the memory controller to track the internal status of all memory devices (chips) and schedule the timing of all device operations. As a result, the DRAM memory system is approaching a scalability wall. New memory technologies such as Phase-Change Memory (PCM) and STT-RAM have emerged as potential alternatives to DRAM in future memory systems. Although these technologies offer better energy efficiency and scalability than DRAM, they suffer from low write endurance and long write latency. Thus, new memory architectures are needed to support future memory systems and to balance performance, energy efficiency, capacity, and lifetime.
To address these issues, we propose systematic support for improving memory system efficiency at the architecture level in three steps. First, we propose a new DRAM scheduling algorithm called Delayed Row Activation that makes DRAM more energy-efficient by allowing memory ranks to stay in a low-power mode longer when data bus ownership cannot be acquired immediately after a row activation finishes.
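The following C fragment is a minimal sketch of this scheduling decision, assuming a simplified cycle-based controller model; the struct layout, field names, and timing parameters are illustrative assumptions rather than the actual implementation.

```c
#include <stdint.h>

typedef struct {
    uint64_t now;            /* current controller cycle               */
    uint64_t bus_free_cycle; /* earliest cycle the data bus is free    */
    uint64_t tRCD;           /* ACT-to-CAS delay in controller cycles  */
} sched_state_t;

/* Choose the cycle at which to issue ACT for a pending read.
 * A conventional scheduler issues it at `now`; delayed row activation
 * postpones it so the row opens just in time for the data bus,
 * letting the rank remain in its low-power state in the meantime.   */
static uint64_t act_issue_cycle(const sched_state_t *s)
{
    uint64_t eager = s->now;                        /* open row ASAP   */
    uint64_t lazy  = (s->bus_free_cycle > s->tRCD)  /* just in time    */
                   ? s->bus_free_cycle - s->tRCD
                   : s->now;
    return (lazy > eager) ? lazy : eager;
}
```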
Second, we present a heterogeneous mini-rank memory architecture that allows concurrently running applications to use different sub-rank widths based on their memory access behavior. By dynamically assigning and adjusting sub-rank configurations, it balances performance and power saving while avoiding large performance losses.
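A minimal sketch of per-application sub-rank width selection is shown below, assuming memory intensity is sampled per epoch as misses per kilo-instruction (MPKI); the thresholds and the width set are hypothetical, and the dissertation's policy may differ.

```c
/* Map an application's measured memory intensity to a sub-rank width:
 * light traffic gets a narrow sub-rank to save power, while
 * memory-bound workloads keep the full width for bandwidth. */
static int pick_subrank_width(double mpki)
{
    if (mpki < 1.0)  return 8;
    if (mpki < 5.0)  return 16;
    if (mpki < 15.0) return 32;
    return 64;
}
```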
Lastly, we build a new memory architecture framework called Universal Memory Architecture (UniMA) that supports different memory technologies in one computer system by decoupling the scheduling of device operations from the memory controller; a bridge chip added to each memory module performs device-specific scheduling locally.
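The interface this decoupling implies can be sketched as below, where the controller issues technology-agnostic requests and each module's bridge chip supplies its own local scheduler; the struct and callback names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t addr;
    int      is_write;
} mem_request_t;

typedef struct bridge_ops {
    /* Translate a generic request into device-specific command
     * sequences (DRAM, PCM, or STT-RAM) and schedule them locally. */
    void (*enqueue)(void *dev, const mem_request_t *req);
    /* Report whether the bridge can accept more work, so the
     * controller never tracks per-device timing state.             */
    int  (*can_accept)(void *dev);
} bridge_ops_t;
```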
Experimental results demonstrate that our schemes save DRAM power, provide optimal energy efficiency for mini-rank-style designs, and integrate diverse memory technologies into one memory system with small overhead.