2025

ETS: Efficient Tree Search for Inference-Time Scaling

Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami

SuperNoVA: Algorithm-Hardware Co-Design for Resource-Aware SLAM

Seah Kim, Roger Hsiao, Borivoje Nikolic, James Demmel, Yakun Sophia Shao

Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control

Kris Shengjun Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Christopher Fletcher, Yakun Sophia Shao

Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency

Hansung Kim, Ruohan Richard Yan, Joshua You, Tieliang Vamber Yang, Yakun Sophia Shao

LLM-Aided Compilation for Tensor Accelerators

Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao

Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators

Hasan Nazim Genc, Hansung Kim, Prashanth Ganesh, Yakun Sophia Shao

Design Approach for Die-to-Die Interfaces to Enable Energy-Efficient Chiplet Systems

Vikram Jain, Wei Tang, Zuoguo Wu, Viansa Schmulbach, Sophia Shao, Zhengya Zhang, Borivoje Nikolic

FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs. ISCA 2024: 501-515

Joonho Whangbo, Edwin Lim, Chengyi Lux Zhang, Kevin Anderson, Abraham Gonzalez, Raghav Gupta

AuRORA: A Full-Stack Solution for Scalable and Virtualized Accelerator Integration

Seah Kim, Jerry Zhao, Krste Asanović, Borivoje Nikolić, Yakun Sophia Shao

Instruction Scheduling in the Saturn Vector Unit

Jerry Zhao, Daniel Grubb, Miles Rusch, Tianrui Wei, Kevin Anderson, Borivoje Nikolic, Krste Asanovic

A CVNN-Aided Anti-Interference Channel Estimation for Massive MIMO Systems

NeCTAr and RASoC: Tale of Two Class SoCs for Language Model Inference and Robotics in Intel 16

Viansa Schmulbach, Jason Kim, Ethan Gao, Nikhil Jha, Ethan Wu, Oliver Yu

An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks

Vivek Bharadwaj, Austin Glover, Aydin Buluc, James Demmel

Fast multiplication of random dense matrices with sparse matrices

Tianyu Liang, Riley Murray, Aydın Buluç, James Demmel

Non-smooth Bayesian optimization in tuning scientific applications

Hengrui Luo, Younghyun Cho, James Weldon Demmel, Igor Kozachenko, Xiaoye S. Li, Yang Liu

Verified Code Transpilation with LLMs

Sahil Bhatia, Jie Qiu, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung

Controlled Preemption: Amplifying Side-Channel Attacks from Userspace

Yongye Zhu, Boru Chen, Zirui Neil Zhao, Christopher W. Fletcher

H-Houdini: Scalable Invariant Learning

Sushant Dinesh, Yongye Zhu, Christopher W. Fletcher

GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers

Boru Chen, University of Illinois Urbana-Champaign; Yingchen Wang, University of Texas at Austin; Pradyumna Shome, Georgia Institute of Technology; Christopher Fletcher, University of California, Berkeley; David Kohlbrenner, University of Washington; Riccardo Paccagnella, Carnegie Mellon University; Daniel Genkin, Georgia Institute of Technology

ConjunCT: Learning Inductive Invariants to Prove Unbounded Instruction Safety Against Microarchitectural Timing Attacks

Sushant Dinesh, Madhusudan Parthasarathy, Christopher W. Fletcher

FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design

Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators (Abstract)

Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer

Sparsity-Aware Communication for Distributed Graph Neural Network Training

Ujjaini Mukhopadhyay, Alok Tripathy, Oguz Selvitopi, Katherine Yelick, Aydin Buluc

2024

Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Aydın Buluç, James Demmel

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

Benjamin Brock, Aydın Buluç, Katherine A. Yelick

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Yakun Sophia Shao, Borivoje Nikolic, Koushil Sreenath

Zoomie: A Software-like Debugging Tool for FPGAs

Tianrui Wei, Kevin Laeufer, Katie Lim, Jerry Zhao, Koushik Sen, Jonathan Balkind, Krste Asanovic

RTL-Repair: Fast Symbolic Repair of Hardware Design Code

Kevin Laeufer, Brandon Fajardo, Abhik Ahuja, Vighnesh Iyer, Borivoje Nikolic, Koushik Sen

Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations

Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung

Next-Generation Domain-Specific Accelerators: From Hardware to System

Yakun Sophia Shao

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

Linyuan Gong, Mostafa Elhoushi, Alvin Cheung

2023

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators

Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao

AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads

Seah Kim, Jerry Zhao, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

Distributed Matrix-Based Sampling for Graph Neural Network Training

Alok Tripathy, Katherine Yelick, Aydin Buluc

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)

Maksim Melnichenko, Oleg Balabanov, Riley Murray, James Demmel, Michael W. Mahoney, Piotr Luszczek

Scalable Evidential K-Nearest Neighbor Classification on Big Data

James Demmel, Yang You

SPEED: Speculative Pipelined Execution for Efficient Decoding

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

Fast multiplication of random dense matrices with fixed sparse matrices

Tianyu Liang, Riley Murray, Aydın Buluç, James Demmel

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

Younghyun Cho, James W. Demmel, Michał Dereziński, Haoyun Li, Hengrui Luo, Michael W. Mahoney, Riley J. Murray

Code Transpilation for Hardware Accelerators

Yuto Nishida, Sahil Bhatia, Shadaj Laddad, Hasan Genc, Yakun Sophia Shao, Alvin Cheung

Towards Auto-Generated Data Systems

Alvin Cheung, Maaz Bin Safeer Ahmad, Brandon Haynes, Chanwut Kittivorawong, Shadaj Laddad, Xiaoxuan Liu, Chenglong Wang, Cong Yan

Harnessing the Crowd for Autotuning High-Performance Computing Applications

Younghyun Cho, James W. Demmel, Jacob King, Xiaoye S. Li, Yang Liu, Hengrui Luo

Building Code Transpilers for Domain-Specific Languages Using Program Synthesis (Experience Paper)

Sahil Bhatia, Sumer Kohli, Sanjit A. Seshia, Alvin Cheung

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation

Dima Nikiforov, Shengjun Kris Dong, Chengyi Lux Zhang, Seah Kim, Borivoje Nikolic, Yakun Sophia Shao

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems

Sagar Karandikar, Aniruddha N. Udipi, Junsun Choi, Joonho Whangbo, Jerry Zhao, Svilen Kanev, Edwin Lim, Jyrki Alakuijala, Vrishab Madduri, Yakun Sophia Shao, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan

Profiling Hyperscale Big Data Processing

Abraham Gonzalez, Aasheesh Kolli, Samira Manabi Khan, Sihang Liu, Vidushi Dadu, Sagar Karandikar, Jichuan Chang, Krste Asanovic, Parthasarathy Ranganathan

Nearly Optimal Block-Jacobi Preconditioning

James Demmel

An Improved Analysis and Unified Perspective on Deterministic and Randomized Low-Rank Matrix Approximation

James Demmel, Laura Grigori

Silicon Process Technology Constraints for Standardized Vertical Die-to-Die Interconnects

Harrison Liew, Farhana Sheikh, David Kehlet, Borivoje Nikolić

Guest Editorial Introduction to the Special Issue on the 2022 Symposium on VLSI Circuits

Borivoje Nikolic, Mototsugu Hamada

Simulator Independent Coverage for RTL Hardware Languages

Kevin Laeufer, Vighnesh Iyer, David Biancolin, Jonathan Bachrach

ADELT: Transpilation Between Deep Learning Frameworks

Linyuan Gong, Jiayi Wang, Alvin Cheung

Full Stack Optimization of Transformer Inference: a Survey

Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software

Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E. Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra

Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition

Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Laura Grigori, Aydin Buluc, James Demmel

2022

Cerberus: A Formal Approach to Secure and Efficient Enclave Memory Sharing

Dayeol Lee, Kevin Cheang, Alexander Thomas, Catherine Lu, Pranav Gaddamadugu, Anjo Vahldiek-Oberwagner, Mona Vij, Dawn Song, Sanjit A. Seshia, Krste Asanović

ML for Analog Design: Good Progress, but More to Do

Borivoje Nikolić

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydın Buluç

Hammer: a modular and reusable physical design flow tool: invited

Harrison Liew, Daniel Grubb, John Wright, Colin Schmidt, Nayiri Krzysztofowicz, Adam M. Izraelevitz, Edward Wang, Krste Asanovic, Jonathan Bachrach, Borivoje Nikolic

Distributed-Memory Sparse Kernels for Machine Learning

Vivek Bharadwaj, Aydın Buluç, James Demmel

Hybrid Models for Mixed Variables in Bayesian Optimization

Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design

Qijing Huang, Charles Hong, John Wawrzynek, Mahesh Subedar, Yakun Sophia Shao

2021

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper

Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert J. Ou, Colin Schmidt, Samuel Steffl, John Charles Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

Verifying RISC-V Physical Memory Protection

Kevin Cheang, Cameron Rasmussen, Dayeol Lee, David W. Kohlbrenner, Krste Asanovic, Sanjit A. Seshia

A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET

Abraham Gonzalez, Jerry Zhao, Ben Korpan, Hasan Genc, Colin Schmidt, John Charles Wright, Ayan Biswas, Alon Amid, Farhana Sheikh, Anton Sorokin, Sirisha Kale, Mani Yalamanchi, Ramya Yarlagadda, Mark Flannigan, Larry Abramowitz, Elad Alon, Yakun Sophia Shao, Krste Asanovic, Borivoje Nikolic

Automated Design of Analog Circuits Using Reinforcement Learning

Keertana Settaluri, Zhaokai Liu, Rishubh Khurana, Arash Mirhaj, Rajeev Jain, Borivoje Nikolic

A Hardware Accelerator for Protocol Buffers

Sagar Karandikar, Chris Leary, Chris Kennelly, Jerry Zhao, Dinesh Parimi, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan

Scaling Generalized N-Body Problems, A Case Study from Genomics

Marquita Ellis, Aydin Buluc, Katherine Yelick

A PACTful Agenda for Cloud Programming Research: (Invited Talk)

Alvin Cheung

An Automated and Process-Portable Generator for Phase-Locked Loop

Zhongkai Wang, Minsoo Choi, Eric Chang, John Charles Wright, Woo-Rham Bae, Sijun Du, Zhaokai Liu, Nathan Narevsky, Colin Schmidt, Ayan Biswas, Borivoje Nikolic, Elad Alon

Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim

David Biancolin, Albert Magyar, Sagar Karandikar, Alon Amid, Borivoje Nikolic, Jonathan Bachrach, Krste Asanovic

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao

Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs

Alon Amid, Albert J. Ou, Krste Asanovic, Yakun Sophia Shao, Borivoje Nikolic

COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors

Jerry Zhao, Abraham Gonzalez, Alon Amid, Sagar Karandikar, Krste Asanovic

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms

Jingyi Xu, Sehoon Kim, Borivoje Nikolic

4.3 An Eight-Core 1.44GHz RISC-V Vector Machine in 16nm FinFET

Colin Schmidt, John Charles Wright, Zhongkai Wang, Eric Chang, Albert J. Ou, Woo-Rham Bae, Sean Huang, Anita Flynn, Brian C. Richards, Krste Asanovic, Elad Alon

GPTune: multitask learning for autotuning exascale applications

Yang Liu, Wissam M. Sid-Lakhdar, Osni Marques, Xinran Zhu, Chang Meng, James W. Demmel, Xiaoye S. Li
