Large language models (LLMs), exemplified by generative pre-trained transformers (GPT) such as ChatGPT (OpenAI) [1], have marked a significant advancement in artificial intelligence by demonstrating exceptional capabilities in natural language processing tasks [2–4]. These models have generated considerable interest due to their potential to transform medical practice [1,5]. Recent developments have introduced multimodal-LLMs that extend beyond text analysis [6–8].