Architecture and Real-Time Systems (ARTS) Laboratory

Cool-Cache

Cool-Cache and Minimax Cache examine the potential for power and energy savings in the data cache of multimedia and embedded systems through intelligent data partitioning. As a preliminary step, we examined the memory performance and memory management of scalars. In particular, we established the minimum size of a memory partition that would allow all scalar accesses in a program to be mapped and managed statically, and we described compiler techniques to automate the extraction of this information.

Additionally, we studied the cache behavior of scalar accesses on these architectures, including the reduction in cache misses that results from separating scalars from other types of memory accesses.

We also evaluated how sensitive the volume of scalar-related memory accesses is to register file size, and how this affects the applications' overall cache performance.

Our results indicated that scalars in multimedia applications had a small memory footprint but a high access frequency. This motivated us to explore data cache energy savings for embedded/multimedia workloads without sacrificing performance. We developed two complementary media-sensitive energy-saving techniques that leverage static information.

In our first technique, we employed data partitioning for scalars. This approach requires few, if any, modifications to current architectures and compilers. The characteristics of scalar accesses motivated us to direct those accesses to a small scratchpad SRAM area or, alternatively, a minicache. Although it is accessed very frequently, this small SRAM or minicache is more energy efficient for scalar data than the large L1 data cache would be if the scalars were mapped there.
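The reasoning behind scalar partitioning can be illustrated with a first-order energy model. The sketch below is not the model used in this work: it assumes a fixed energy cost per access for each structure, ignores miss, leakage, and interconnect energy, and all function names and numbers are hypothetical and in arbitrary units.

```python
def monolithic_energy(scalar_accesses, other_accesses, e_l1):
    """Access energy when every data access goes to the L1 data cache."""
    return (scalar_accesses + other_accesses) * e_l1


def partitioned_energy(scalar_accesses, other_accesses, e_small, e_l1):
    """Access energy when scalars are served by a small SRAM or minicache
    and all remaining accesses still go to the L1 data cache."""
    return scalar_accesses * e_small + other_accesses * e_l1


def savings_fraction(scalar_accesses, other_accesses, e_small, e_l1):
    """Fraction of the monolithic access energy saved by partitioning."""
    base = monolithic_energy(scalar_accesses, other_accesses, e_l1)
    part = partitioned_energy(scalar_accesses, other_accesses, e_small, e_l1)
    return (base - part) / base


if __name__ == "__main__":
    # Hypothetical workload: 600 scalar accesses, 400 other accesses.
    # Hypothetical per-access costs: 1 unit (small SRAM), 5 units (L1).
    print(savings_fraction(600, 400, e_small=1, e_l1=5))
```

Under these assumptions, the saving grows with both the share of accesses that are scalars and the gap between the per-access cost of the small structure and that of the L1 data cache, which is why a small footprint combined with a high access frequency makes scalars a good candidate for partitioning.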

In our second technique, we aimed for greater energy savings by redefining the division of responsibilities between the architecture and the compiler. We designed and introduced hotlines, a compiler-controlled tagless caching framework, which achieved significant energy savings. The hotlines framework saved energy without substantial performance loss and in some cases even outperformed traditional hardware-based caches. The compiler-directed cache is a flexible, compiler-generated data cache that replaces the tag memory and cache controller hardware with a compiler-managed tag-like data structure. Being software based, the cache is highly reconfigurable: parameters such as line size and associativity can be tailored to each application to maximize performance.
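To make the idea of a software-managed tag-like structure concrete, the following minimal sketch models a cache whose tag bookkeeping lives in an ordinary data structure, with line size and associativity as constructor parameters. It is a generic illustration only and does not reproduce the hotlines design; the class name `SoftwareCache`, its fields, and the LRU replacement policy are all assumptions made for this example.

```python
class SoftwareCache:
    """Toy set-associative cache whose tags are kept in software.

    Hypothetical illustration: tag checks are plain list lookups, so
    line size and associativity can be chosen per application.
    """

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # tags[s] holds the line numbers resident in set s,
        # ordered from least to most recently used.
        self.tags = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access to byte address addr; return True on a hit."""
        line = addr // self.line_size
        entries = self.tags[line % self.num_sets]
        if line in entries:
            # Hit: move the line to the most recently used position.
            entries.remove(line)
            entries.append(line)
            self.hits += 1
            return True
        if len(entries) == self.ways:
            entries.pop(0)  # evict the least recently used line
        entries.append(line)
        self.misses += 1
        return False


if __name__ == "__main__":
    trace = [0, 4, 32, 0]  # hypothetical byte addresses
    for ways in (1, 2):
        cache = SoftwareCache(num_sets=2, ways=ways, line_size=16)
        for addr in trace:
            cache.access(addr)
        print(ways, cache.hits, cache.misses)
```

Running the same trace with one and then two ways shows how a conflict between two lines that map to the same set disappears once associativity is raised, which is the kind of per-application tuning a software-defined cache makes possible.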

Using the MediaBench benchmark suite, our experiments showed that substantial energy savings are possible in the data cache. Using the Wattch energy profiling tool across a wide range of cache and architectural configurations, we obtained up to 77% energy savings, while performance varied from a 14% improvement to a 4% degradation depending on the application. The following figure shows the percentage data cache energy savings of Cool-Cache compared to two similarly sized monolithic caches for MediaBench applications.

[Figure: Benchmarking Cool-Cache]