I am a Research Scientist at NVIDIA's Deep Imagination Research Group. My primary research area is High-Performance AI Architecture: making AI fast and efficient at every level, from neural network architecture down to computer architecture.
I've been working on attention and sparse attention since about 2022. A large part of my work is available through NATTEN, an open-source project for fast multi-dimensional sparse attention (see my presentation on GPU MODE). In our most recent paper, Generalized Neighborhood Attention, we discuss outstanding challenges in sparse attention infrastructure and our proposed solutions, which combine an analytical study with a fast implementation that achieves the maximum theoretically possible speedup.
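To give a sense of what neighborhood attention computes, the sketch below shows a minimal 1D version in plain PyTorch. It is a conceptual reference only, not NATTEN's API or its fused kernels; the function name and the boundary handling are illustrative simplifications. The idea: each query attends to a fixed window of kernel_size nearby tokens, so the cost scales with n * kernel_size instead of n^2.

```python
# Conceptual sketch of 1D neighborhood attention in plain PyTorch.
# Illustration only -- NATTEN provides fused, optimized GPU kernels;
# the function name here is hypothetical.
import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, kernel_size):
    # q, k, v: (batch, heads, seq_len, head_dim); kernel_size: odd window size.
    b, h, n, d = q.shape
    assert kernel_size % 2 == 1 and kernel_size <= n
    r = kernel_size // 2
    out = torch.empty_like(q)
    for i in range(n):
        # Clamp the window near the boundaries so every query still sees
        # exactly kernel_size keys (the window shifts instead of shrinking).
        start = min(max(i - r, 0), n - kernel_size)
        end = start + kernel_size
        # Attention restricted to the local window: O(kernel_size) per query
        # instead of O(n), i.e. O(n * kernel_size) overall.
        attn = (q[:, :, i:i + 1] @ k[:, :, start:end].transpose(-2, -1)) * d ** -0.5
        out[:, :, i:i + 1] = F.softmax(attn, dim=-1) @ v[:, :, start:end]
    return out

# Example: 128-token sequence, 4 heads, window of 7 tokens per query.
q = k = v = torch.randn(1, 4, 128, 32)
out = neighborhood_attention_1d(q, k, v, kernel_size=7)  # (1, 4, 128, 32)
```

This Python loop is of course slow in practice; the point of NATTEN is to implement the same windowed computation in fused GPU kernels so that the sparsity actually translates into wall-clock speedup.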
GitHub / Google Scholar / Twitter / LinkedIn / CV
Select publications
2025
2024
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Ali Hassani, Wen-Mei Hwu, and Humphrey Shi.
NeurIPS 2024.
2023
Neighborhood Attention Transformer
Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
CVPR 2023.
2022
Experience
Since 10/2025
NVIDIA Research, Deep Imagination Research
Research Scientist
01/2024 - 10/2025
SHI Labs at Georgia Tech
Graduate Researcher
12/2024 - 07/2025
NVIDIA Research, Deep Imagination Research
Research Intern
06/2023 - 12/2023
Software Engineering Intern
03/2021 - 12/2023
SHI Labs at University of Oregon
Graduate Researcher
06/2022 - 09/2022
Picsart AI Research
Research Intern
2019 - 2021
University of Kerman, Mahani Mathematical Research Center
Undergraduate Researcher
Teaching
Spring 2021, Winter 2022, Winter 2023
Education