ReGate: Enabling Power Gating in Neural Processing Units. The energy efficiency of neural processing units (NPU) plays a critical role in developing sustainable data centers. Our study with different generations of NPU chips reveals that 30%–72% of their energy consumption is contributed by static power ... | MICRO-2025 | A | 3 | 2025-11-05 01:28:20.039Z |
Flexing RISC-V Instruction Subset Processors to Extreme Edge. This paper presents an automated approach for designing processors that support a subset of the RISC-V instruction set architecture (ISA) for a new class of applications at Extreme Edge. The electronics used in extreme edge applications must be area ... | MICRO-2025 | A | 3 | 2025-11-05 01:28:08.935Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:57.940Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:46.906Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:35.752Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:24.691Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:13.552Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:27:02.085Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:26:51.013Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:26:39.608Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:26:28.524Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:26:17.475Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:26:06.306Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:25:54.958Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:25:43.865Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:25:32.854Z |
Elevating Temporal Prefetching Through Instruction Correlation. Temporal prefetchers can learn from irregular memory accesses and hide access latencies. As the on-chip storage technology for temporal prefetchers’ metadata advances, enabling the development of viable commercial prefetchers, it becomes evident that... | MICRO-2025 | A | 3 | 2025-11-05 01:25:21.464Z |
Ghost Threading: Helper-Thread Prefetching for Real Systems. Memory latency is the bottleneck for many modern workloads. One popular solution from literature to handle this is helper threading, a technique that issues light-weight prefetching helper thread(s) extracted from the original application to bring da... | MICRO-2025 | A | 3 | 2025-11-05 01:25:10.101Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:24:58.796Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:24:47.792Z |
| MICRO-2025 | A | 2 | 2025-11-05 01:23:42.767Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:23:35.230Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:23:24.144Z |
A Probabilistic Perspective on Tiling Sparse Tensor Algebra. Sparse tensor algebra computations are often memory-bound due to irregular access patterns and low arithmetic intensity. We present D2T2 (Data-Driven Tensor Tiling), a framework that optimizes static coordinate-space tiling schemes to minimize memory... | MICRO-2025 | A | 3 | 2025-11-05 01:23:13.075Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:23:02.055Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:22:51.064Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:22:40.067Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:22:29.063Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:22:18.072Z |
ATR: Out-of-Order Register Release Exploiting Atomic Regions. Modern superscalar processors require large physical register files to support a high number of in-flight instructions, which is crucial for achieving higher ILP and IPC. Conventional register renaming techniques release physical registers conservati... | MICRO-2025 | A | 3 | 2025-11-05 01:22:07.037Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:21:56.060Z |
Titan-I: An Open-Source, High Performance RISC-V Vector Core. Vector processing has evolved from early systems like the CDC STAR-100 and Cray-1 to modern ISAs like ARM’s Scalable Vector Extension (SVE) and RISC-V Vector (RVV) extensions. However, scaling vector processing for contemporary workloads presents ... | MICRO-2025 | A | 3 | 2025-11-05 01:21:45.068Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:21:33.869Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:21:22.584Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:21:11.615Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:21:00.575Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:20:49.605Z |
| MICRO-2025 | A | 3 | 2025-11-05 01:20:38.565Z |
Multi-Stream Squash Reuse for Control-Independent Processors. Single-core performance remains crucial for mitigating the serial bottleneck in applications, according to Amdahl’s Law. However, hard-to-predict branches pose significant challenges to achieve high Instruction-Level Parallelism (ILP) due to frequen... | MICRO-2025 | A | 3 | 2025-11-05 01:20:27.548Z |
LoopFrog: In-Core Hint-Based Loop Parallelization. To scale ILP, designers build deeper and wider out-of-order superscalar CPUs. However, this approach incurs quadratic scaling complexity, area, and energy costs with each generation. While small loops may benefit from increased instruction-window siz... | MICRO-2025 | A | 3 | 2025-11-05 01:20:16.208Z |