Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George Anthony Constantinides, Yiren Zhao
Main: Efficient Methods for NLP Main-poster Paper
Poster_Demo_Industry_Findings In-person 5: Efficient Methods for NLP (Poster)
Conference Room: East Foyer
Conference Time: December 09, 11:00-12:30 (+08) (Asia/Singapore)
Global Time: December 09, 03:00-04:30 UTC
Abstract: