This study evaluates the performance of seven large language models (LLMs) in generating CAD-RADS 2.0 scores, including all modifiers, from cardiac CT reports. The models, comprising both cloud-based and locally hosted solutions, were assessed for their ability to handle the complexity of CAD-RADS 2.0 classification, which includes plaque burden, high-risk plaque features, and ischemia. GPT-4o and Llama 3 70B demonstrated high accuracy (93% and 92.5%, respectively), while open-source models a…