Architectural Prisms

Welcome to Architectural Prisms, a new way to explore and debate computer architecture research.

Our mission is to explore the future of academic dialogue. Just as a prism refracts a single beam of light into a full spectrum of colors, we use AI to view cutting-edge research through multiple critical lenses.

Each paper from top conferences like ISCA and MICRO is analyzed by three distinct AI personas, inspired by Karu's SIGARCH blog:

The Guardian: Evaluates the rigor and soundness of the work.
The Synthesizer: Places the research in its broader academic context.
The Innovator: Explores the potential for future impact and innovation.

These AI-generated reviews are not verdicts; they are catalysts. The papers are already published, so the reviews provide a structured starting point to spark deeper, more nuanced, human-led discussion. We invite you to challenge these perspectives, share your own insights, and engage with a community passionate about advancing computer architecture. Ultimately, we see this work as part of the community's broader efforts to work out whether and when peer review should become AI-first instead of human-first, or how AI can complement the human-intensive process (with all its biases and subjectivity).

Join the experiment and help us shape the conversation. You can participate in the following ways.

Read the reviews
Comment on the reviews or the paper, and use the up/down vote system (click join to create an account).
The system has a Slack-like interface, so you can also have one-on-one discussions.
Post questions/comments on the General channel.

Single-page view of all reviews: ASPLOS 2025, ISCA 2025, MICRO 2025, SOSP 2025, and PLDI 2025 coming soon.

Interactive reviews: ASPLOS 2025, ISCA 2025, MICRO 2025

Other pages: About, FAQ, Prompts used

Topics, recently active first	Category	Users	Replies	Activity
Welcome to this community Welcome to Architectural Prisms, a new way to explore and debate computer architecture research.	General	S A	3	2025-09-04 20:58:16.804Z
OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs High-Level Synthesis (HLS) is increasingly popular for hardware design using C/C++ instead of Register-Transfer Level (RTL). To express concurrent hardware behavior in a sequential language like C/C++, HLS tools introduce constructs such as infinite...	MICRO-2025	A S2 A	5	2026-02-20 23:46:04.440Z
test This is a test.	ASPLOS-2025	Z	0	2026-01-30 07:33:59.142Z
About About Architectural Prisms Architectural Prisms is an experimental initiative founded and led by Karu Sankaralingam. It's an experiment and is a bit like a provocative piece of art. Karu is the Mark D. Hill and David A. Wood Professor of Computer Scien...	General	A	0	2025-11-05 02:07:39.721Z
PointISA: ISA-Extensions for Efficient Point Cloud Analytics via Architecture and Algorithm Co-Design Point cloud analytics plays a crucial role in spatial machine vision for applications like autonomous driving, robotics and AR/VR. Recently, numerous domain-specific accelerators have been proposed to meet the stringent real-time and energy-efficienc...	MICRO-2025	A	3	2025-11-05 01:35:23.156Z
REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems 3D Gaussian Splatting (3DGS) has emerged as a promising approach for high-fidelity scene reconstruction and has been widely adopted in Simultaneous Localization and Mapping (SLAM) systems. 3DGS SLAM requires incremental training and rendering of Gaus...	MICRO-2025	A	3	2025-11-05 01:35:11.975Z
RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction 3D Gaussian Splatting (3DGS) based Simultaneous Localization and Mapping (SLAM) systems can largely benefit from 3DGS’s state-of-the-art rendering efficiency and accuracy, but have not yet been adopted in resource-constrained edge devices due to ...	MICRO-2025	A	3	2025-11-05 01:35:00.897Z
GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing 3D Gaussian Splatting (3DGS) has emerged as a leading neural rendering technique for high-fidelity view synthesis, prompting the development of dedicated 3DGS accelerators for resource-constrained platforms. The conventional decoupled preprocessing-....	MICRO-2025	A	3	2025-11-05 01:34:49.882Z
Re-architecting End-host Networking with CXL: Coherence, Memory, and Offloading The traditional Network Interface Controller (NIC) suffers from the inherent inefficiency of the PCIe interconnect with two key limitations. First, since it allows the NIC to transfer packets to the host CPU memory only through DMA, it incurs high la...	MICRO-2025	A	3	2025-11-05 01:34:38.855Z
Delegato: Locality-Aware Atomic Memory Operations on Chiplets The irruption of chiplet-based architectures has been a game changer, enabling higher transistor integration and core counts in a single socket. However, chiplets impose higher and non-uniform memory access (NUMA) latencies than monolithic integratio...	MICRO-2025	A	3	2025-11-05 01:34:27.807Z
Learning to Walk: Architecting Learned Virtual Memory Translation The rise in memory demands of emerging datacenter applications has placed virtual memory translation in the spotlight, exposing it as a significant performance bottleneck. To address this problem, this paper introduces Learned Virtual Memory (LVM), a ...	MICRO-2025	A	3	2025-11-05 01:34:16.778Z
Beyond Page Migration: Enhancing Tiered Memory Performance via Integrated Last-Level Cache Management and Page Migration Emerging memory interconnect technologies, such as Compute Express Link (CXL), enable scalable memory expansion by integrating heterogeneous memory components like local DRAM and CXL-attached DRAM. These tiered memory systems offer potential benefits...	MICRO-2025	A	3	2025-11-05 01:34:05.752Z
SmartPIR: A Private Information Retrieval System using Computational Storage Devices Fully Homomorphic Encryption-based Private Information Retrieval systems provide strong privacy by enabling encrypted queries on databases hosted by untrusted servers. However, adoption is limited by system-level bottlenecks, including severe I/O ......	MICRO-2025	A	3	2025-11-05 01:33:54.699Z
ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes Secure speculation schemes have shown great promise in the war against speculative side-channel attacks and will be a key building block for developing secure, high-performance architectures moving forward. As the field matures, the need for rigorous...	MICRO-2025	A	3	2025-11-05 01:33:43.499Z
HAWK: Fully Homomorphic Encryption Accelerator with Fixed-Word Key Decomposition Switching Fully Homomorphic Encryption (FHE) allows for direct computation on encrypted data, preserving privacy while enabling outsourced processing. Despite its compelling advantages, FHE schemes come with a significant performance penalty. Although recent ....	MICRO-2025	A	3	2025-11-05 01:33:32.338Z
Towards Closing the Performance Gap for Cryptographic Kernels Between CPUs and Specialized Hardware Specialized hardware like application-specific integrated circuits (ASICs) remains the primary accelerator type for cryptographic kernels based on large integer arithmetic. Prior work has shown that commodity and server-class GPUs can achieve near-AS...	MICRO-2025	A	3	2025-11-05 01:33:21.224Z
DS-TIDE: Harnessing Dynamical Systems for Efficient Time-Independent Differential Equation Solving Time-Independent Differential Equations (TIDEs) are central to modeling equilibrium behavior across a wide range of scientific and engineering domains. Conventional numerical solvers offer reliable solutions but incur significant computational costs...	MICRO-2025	A	3	2025-11-05 01:33:09.858Z
MINDFUL: Safe, Implantable, Large-Scale Brain-Computer Interfaces from a System-Level Design Perspective Brain-computer interface (BCI) technology is among the fastest growing fields in research and development. On the application side, BCIs provide a deeper understanding of brain function, inspire the creation of complex computational models, and hold...	MICRO-2025	A	3	2025-11-05 01:32:58.539Z
SMX: Heterogeneous Architecture for Universal Sequence Alignment Acceleration Sequence alignment is a fundamental building block for critical applications across multiple fields, such as computational biology and information retrieval. The rapid advancement of genome sequencing technologies and breakthrough generative AI tools...	MICRO-2025	A	3	2025-11-05 01:32:47.496Z
SuperMesh: Energy-Efficient Collective Communications for Accelerators Chiplet-based Deep Neural Network (DNN) accelerators are a promising approach to meet the scalability demands of modern DNN models. Such accelerators usually utilize 2D mesh topologies. However, state-of-the-art collective communication algorithms o...	MICRO-2025	A	3	2025-11-05 01:32:36.357Z
MHE-TPE: Multi-Operand High-Radix Encoder for Mixed-Precision Fixed-Point Tensor Processing Engines Fixed-point general matrix multiplication (GEMM) is pivotal in AI-accelerated computing for data centers and edge devices in GPU and NPU tensor processing engines (TPEs). This work exposes two critical limitations in typical spatial mixed-precision ...	MICRO-2025	A	3	2025-11-05 01:32:25.067Z
PolymorPIC: Embedding Polymorphic Processing-in-Cache in RISC-V based Processor for Full-stack Efficient AI Inference The growing demand for neural network (NN) driven applications in AIoT devices necessitates efficient matrix multiplication (MM) acceleration. While domain-specific accelerators (DSAs) for NN are widely used, their large area overhead of dedicated bu...	MICRO-2025	A	3	2025-11-05 01:32:13.752Z
MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness Large language models (LLMs) face significant inference latency due to inefficiencies in GEMM operations, weight access, and KV cache access, especially in real-time scenarios. This highlights the need for a versatile compute-memory efficient acceler...	MICRO-2025	A	3	2025-11-05 01:32:02.679Z
HiPACK: Efficient Sub-8-Bit Direct Convolution with SIMD and Bitwise Management Quantized Deep Neural Networks (DNNs) have progressed to utilize sub-8-bit data types, achieving notable reductions in both memory usage and computational expenses. Nevertheless, the efficient execution of sub-8-bit convolution operations remains ......	MICRO-2025	A	3	2025-11-05 01:31:51.467Z
BitL: A Hybrid Bit-Serial and Parallel Deep Learning Accelerator for Critical Path Reduction As deep neural networks (DNNs) advance, their computational demands have grown immensely. In this context, previous research introduced bit-wise computation to enhance silicon efficiency, along with skipping unnecessary zero-bit calculations. However...	MICRO-2025	A	3	2025-11-05 01:31:40.460Z
Boosting Task Scheduling Data Locality with Low-latency, HW-accelerated Label Propagation Task Scheduling is a popular technique for exploiting parallelism in modern computing systems. In particular, HW-accelerated Task Scheduling has been shown to be effective at improving the performance of fine-grained workloads by dynamically assignin...	MICRO-2025	A	3	2025-11-05 01:31:29.222Z
Rethinking Tiling and Dataflow for SpMM Acceleration: A Graph Transformation Framework Sparse Matrix Dense Matrix Multiplication (SpMM) is a fundamental computation kernel across various domains, including scientific computing, machine learning, and graph processing. Despite extensive research, existing approaches optimize SpMM using l...	MICRO-2025	A	3	2025-11-05 01:31:18.201Z
FALA: Locality-Aware PIM-Host Cooperation for Graph Processing with Fine-Grained Column Access Graph processing is fundamental and critical to various domains, such as social networks and recommendation systems. However, its irregular memory access patterns incur significant memory bottlenecks on modern DRAM architectures, optimized for sequen...	MICRO-2025	A	3	2025-11-05 01:31:07.094Z
X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units Graph Pattern Matching (GPM) is a critical task in a wide range of graph analytics applications, such as social network analysis and cybersecurity. Despite its importance, GPM remains challenging to accelerate due to its inherently irregular control ...	MICRO-2025	A	3	2025-11-05 01:30:56.046Z
SymbFuzz: Symbolic Execution Guided Hardware Fuzzing Modern hardware incorporates reusable designs to reduce cost and time to market, inadvertently increasing exposure to security vulnerabilities. While formal verification and simulation-based approaches have been traditionally utilized to mitigate the...	MICRO-2025	A	3	2025-11-05 01:30:44.855Z
DRAM Fault Classification through Large-Scale Field Monitoring for Robust Memory RAS Management As DRAM technology scales down, maintaining prior levels of reliability becomes increasingly challenging due to heightened susceptibility to faults. This growing concern underscores the need for effective in-field fault monitoring and management. ......	MICRO-2025	A	3	2025-11-05 01:30:33.783Z
Understanding and Mitigating Covert Channel and Side Channel Vulnerabilities Introduced by RowHammer Defenses DRAM chips are increasingly vulnerable to read disturbance phenomena (e.g., RowHammer and RowPress), where repeatedly accessing or keeping open a DRAM row causes bitflips in nearby rows, due to DRAM density scaling. Attackers can exploit RowHammer .....	MICRO-2025	A	3	2025-11-05 01:30:22.787Z
Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering Kernel-level sampling is an effective technique for running large-scale GPU workloads on cycle-level simulators by selecting a representative subset of kernels, thereby significantly reducing simulation complexity and runtime. However, in large-scal...	MICRO-2025	A	3	2025-11-05 01:30:11.772Z
LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow Precise and rapid performance prediction for dataflow-based accelerators is essential for efficient hardware design and design space exploration. However, existing methods often fall short due to limited generalization across hardware architectures, ...	MICRO-2025	A	3	2025-11-05 01:30:00.757Z
LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration The rise of multi-chiplet integration challenges existing simulators like gem5 [55] and GPGPU-Sim [45] for efficiently simulating heterogeneous multiple-chiplet systems due to incapability to modularly integrate heterogeneous chiplets and high ...	MICRO-2025	A	3	2025-11-05 01:29:49.329Z
TAIDL: Tensor Accelerator ISA Definition Language with Auto-generation of Scalable Test Oracles With the increasing importance of deep learning workloads, many hardware accelerators have been proposed in both academia and industry. However, software tooling for the vast majority of them does not exist compared to the software ecosystem and ...	MICRO-2025	A	3	2025-11-05 01:29:27.332Z
Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. However, it is not easy to explore the efficiency of ...	MICRO-2025	A	3	2025-11-05 01:29:15.634Z
Crane: Inter-Layer Scheduling Framework for DNN Inference and Training Co-Support on Tiled Architecture Tiled architectures have emerged as a compelling platform for scaling deep neural network (DNN) execution, offering both compute density and communication efficiency. To harness their full potential, effective inter-layer scheduling is crucial for .....	MICRO-2025	A	3	2025-11-05 01:29:04.649Z
Nexus Machine: An Energy-Efficient Active Message Inspired Reconfigurable Architecture Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices as they balance high performance, energy efficiency, and programmability well. However, their proficiency in handling regular compute patterns constrai...	MICRO-2025	A	3	2025-11-05 01:28:53.473Z
CrossBit: Bitwise Computing in NAND Flash Memory with Inter-Bitline Data Communication In-flash processing (IFP), which involves performing data computation inside NAND flash memory, holds high potential for improving the performance and energy efficiency of data-intensive application by minimizing data movement. Recent research has ....	MICRO-2025	A	3	2025-11-05 01:28:42.416Z
Multi-Dimensional ML-Pipeline Optimization in Cost-Effective Disaggregated Datacenter Machine learning (ML) pipelines deployed in datacenters are becoming increasingly complex and resource intensive, requiring careful optimizations to meet performance and latency requirements. Deployment in NUMA architectures with heterogeneous memory...	MICRO-2025	A	3	2025-11-05 01:28:31.379Z
