AI in the Orthodontic Clinic: A Comparative Test of Large Language Models
A recent study in the European Journal of Orthodontics directly evaluated the potential of generative artificial intelligence for evidence-based practice. Researchers posed ten open-ended clinical orthodontics questions to four large language models (LLMs): Microsoft Bing Chat, ChatGPT-4, Google Bard, and ChatGPT-3.5. The responses were scored for comprehensiveness, scientific accuracy, clarity, and relevance against a rubric based on consensus statements and systematic reviews. Microsoft Bing Chat achieved the highest average score (7.1 out of 10), significantly outperforming ChatGPT-3.5 and Google Bard. While all models showed promise, each occasionally produced answers lacking accuracy and depth, highlighting their current limitations as standalone clinical tools.
Why it might matter to you: For orthodontists, this research provides a critical, evidence-based benchmark for the AI tools increasingly discussed in clinical settings. It suggests that while certain models can be useful adjuncts for information retrieval, their outputs require rigorous verification against established literature. This underscores the necessity of maintaining expert clinical judgment in treatment planning and patient communication, even as technology evolves.
Source →
Stay curious. Stay informed, with Science Briefing.
Always double-check the original article for accuracy.
