NECSTSpecial Talk - Prefix Sum Algorithms using Matrix Multiplication Accelerators: A Case Study on the Ascend AI Accelerator

Speaker: Anastasios Zouzias
DEIB - NECSTLab Meeting Room (Bld. 20)
Online by Zoom
June 6th, 2025 | 2.00 pm
Contact: Prof. Davide Conficconi
Summary
On June 6th, 2025, at 2.00 pm, a new talk in the NECSTSpecialTalk series, titled "Prefix Sum Algorithms using Matrix Multiplication Accelerators: A Case Study on the Ascend AI Accelerator", will take place in the DEIB NECSTLab Meeting Room (Building 20) and online via Zoom.
The speaker will be Anastasios Zouzias, a researcher in the Computing Systems Lab at the Huawei Zurich Research Centre in Switzerland.
This talk will explore hardware-aware algorithm design for modern high-performance AI accelerators. The focus of the talk will be on a well-studied parallel paradigm known as prefix sum (or scan), examining its challenges and opportunities in this context.
I will discuss my experience designing and implementing parallel scan algorithms for Huawei's Ascend AI accelerators. Ascend accelerators feature specialized computing units—the cube units for efficient matrix multiplication and the vector units for optimized vector operations. A key feature of the proposed scan algorithms is their extensive use of matrix multiplications and accumulations enabled by the cube unit. To showcase the effectiveness of these algorithms, we also implement and evaluate several scan-based AI operators commonly used in AI/LLM workloads, including sorting, tensor masking, and top-k / top-p (nucleus) sampling.
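As a rough illustration of how a prefix sum can be phrased as a matrix multiplication (a minimal sketch of the general idea, not the speaker's algorithms and not Ascend code), the input can be cut into fixed-size tiles and multiplied by an upper-triangular matrix of ones, which scans every tile at once. The function names, the tile size, and the pure-Python `matmul` standing in for a matrix unit are all assumptions made for this example.

```python
def matmul(a, b):
    """Plain matrix product, standing in for a hardware matrix-multiplication unit."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def upper_triangular_ones(s):
    """s x s matrix U with U[k][j] = 1 when k <= j.

    For any matrix A, (A @ U)[i][j] = A[i][0] + ... + A[i][j],
    i.e. each row of A @ U is the inclusive scan of that row of A.
    """
    return [[1 if k <= j else 0 for j in range(s)] for k in range(s)]


def matmul_scan(values, tile=4):
    """Inclusive prefix sum of `values`, computed tile by tile (hypothetical helper)."""
    n = len(values)
    if n == 0:
        return []
    # Zero-pad so the input splits evenly into rows of length `tile`.
    padded = list(values) + [0] * (-n % tile)
    rows = [padded[i:i + tile] for i in range(0, len(padded), tile)]
    # Step 1: a single matrix product scans every tile independently.
    local = matmul(rows, upper_triangular_ones(tile))
    # Step 2: add the running total of all earlier tiles to each tile.
    out, carry = [], 0
    for row in local:
        out.extend(x + carry for x in row)
        carry += row[-1]
    return out[:n]


print(matmul_scan([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# prints [3, 4, 8, 9, 14, 23, 25, 31, 36]
```

In this sketch the carry propagation in step 2 is a sequential loop; it is itself a scan over the tile totals, so it could be handled the same way recursively.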
Anastasios Zouzias is a researcher in the Computing Systems Lab at the Huawei Zurich Research Centre in Switzerland. His work focuses on parallel computing (software and hardware architecture), algorithms, machine learning, and AI accelerators. Prior to joining Huawei, Anastasios built his expertise as a data scientist and big data engineer in the insurance and telecommunications industries. He also served as a research staff member and postdoctoral researcher at IBM Research in Zurich. Anastasios earned his Ph.D. in Computer Science from the University of Toronto in 2013, with a focus on randomized linear algebra for large-scale data analysis and computational efficiency.