Introduction: Large language models (LLMs) are developed to answer questions and follow instructions in natural language, and new models are released frequently, making systematic benchmarking for clinical applications essential. Our goal was to benchmark the performance of 31 open- and closed-source LLMs in automatically assigning Coronary Artery Disease Reporting and Data System (CAD-RADS) categories from coronary CT angiography (CCTA) reports using a zero-shot prompting approach.
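To illustrate what zero-shot prompting means in this setting, the minimal Python sketch below shows one way a CCTA report could be submitted to an LLM with task instructions only and no labeled examples. The prompt wording, the model name, and the use of the OpenAI Python client are assumptions made for illustration; they are not the prompt, models, or pipeline used in the study.

# Minimal sketch of zero-shot CAD-RADS assignment from a CCTA report.
# The prompt text, model name, and client library are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def assign_cad_rads(report_text: str, model: str = "gpt-4o") -> str:
    """Ask the model for a CAD-RADS category with no in-context examples (zero-shot)."""
    prompt = (
        "You are given a coronary CT angiography (CCTA) report. "
        "Assign the single most appropriate CAD-RADS category "
        "(0, 1, 2, 3, 4A, 4B, or 5). Answer with the category only.\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for benchmarking
    )
    return response.choices[0].message.content.strip()

# Example usage with a hypothetical report excerpt:
# print(assign_cad_rads("LAD: 50-69% stenosis in the proximal segment. ..."))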