Large language models (LLMs) have emerged as powerful tools with the potential to transform both scientific research and clinical practice [1]. Trained on vast text corpora and refined through reinforcement learning, these systems can generate human-like responses, synthesize complex information, and perform structured evaluations. They are particularly promising for time-consuming and variability-prone tasks, such as auditing methodological quality in radiomics research [2,3]. Radiom…