Evaluating the Accuracy and Readability of ChatGPT-4o’s Responses to Patient-Based Questions about Keratoconus
Ali Safa Balcı, Semih Çakmak

Ophthalmic Epidemiology, Journal year: 2025, Issue: unknown, P. 1-6

Published: March 28, 2025

This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus. A cross-sectional, observational study was conducted using ChatGPT-4o to answer 30 potential questions that could be asked by patients with keratoconus. The responses were evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed. The mean accuracy score for the responses was 4.48 ± 0.57 on the 5-point Likert scale. Interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a mean SMOG of 15.49 ± 1.74, FKGL of 14.95 ± 1.95, and FRE of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the different question categories (p = 0.161), but readability varied significantly, with treatment-related responses being the easiest to understand. ChatGPT-4o provides highly accurate responses to questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the accessibility of AI-generated medical content.
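
For reference, the SMOG, FKGL, and FRE indices cited above are computed from sentence, word, syllable, and polysyllabic-word counts using their published formulas. The Python sketch below is a minimal illustration of that arithmetic, assuming a crude regex-based syllable counter for demonstration; it is not the tool used in the study.

import re
from math import sqrt

def count_syllables(word: str) -> int:
    # Crude heuristic (assumption for illustration): count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)

    # Published formulas for the three indices reported in the abstract.
    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    smog = 1.0430 * sqrt(polysyllables * (30 / sentences)) + 3.1291
    return {"FRE": round(fre, 2), "FKGL": round(fkgl, 2), "SMOG": round(smog, 2)}

print(readability("Keratoconus is a progressive thinning of the cornea. "
                  "Corneal cross-linking can slow its progression."))

Lower FRE values and higher FKGL/SMOG values both indicate harder text, which is why the reported FRE of about 27 corresponds to college-level reading difficulty.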

Language: English

Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry
Zeyneb Merve Özdemir, Emre Yapici

Journal of Esthetic and Restorative Dentistry, Journal year: 2025, Issue: unknown

Published: March 2, 2025

This study aimed to evaluate the accuracy, reliability, consistency, and readability of responses provided by various artificial intelligence (AI) programs to questions related to Restorative Dentistry. Forty-five knowledge-based questions and 20 information questions (10 patient-related and 10 dentistry-specific) were posed to the ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Chatsonic, Copilot, and Gemini Advanced chatbots. The DISCERN questionnaire was used to assess reliability, while Flesch Reading Ease and Flesch-Kincaid Grade Level scores were utilized to evaluate readability. Accuracy and consistency were determined based on the chatbots' answers to the knowledge-based questions. Copilot demonstrated "good" reliability, while ChatGPT-3.5 showed "fair" reliability. Chatsonic exhibited the highest DISCERN total score for patient-related questions, and ChatGPT-4o performed best on dentistry-specific questions. No statistically significant differences were found among the chatbots (p > 0.05). The highest accuracy was 93.3% and the lowest was 68.9%; ChatGPT-4 showed the highest consistency between repetitions. The performance of the AIs varied in terms of accuracy, reliability, consistency, and readability when responding to questions about Restorative Dentistry, with promising results for academic and patient education applications. However, readability was generally above the recommended levels for patient education materials. The utilization of AI has an increasing impact on many aspects of dentistry. Moreover, if AI-generated information about restorative dentistry proves to be reliable and comprehensible, this may yield positive outcomes in the future.
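
As a side note on the reported accuracy range, percentages such as 93.3% and 68.9% are consistent with simple proportions over the 45 knowledge-based questions. The short sketch below only illustrates that arithmetic; the counts of 42 and 31 correct answers are hypothetical and are not values stated in the study.

# Hypothetical illustration: accuracy as the share of 45 knowledge-based
# questions answered correctly. The counts 42 and 31 are assumed for
# demonstration only.
N_QUESTIONS = 45
for correct in (42, 31):
    print(f"{correct}/{N_QUESTIONS} = {correct / N_QUESTIONS:.1%}")
# Prints 93.3% and 68.9%, matching the highest and lowest reported accuracies.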

Language: English

Cited by

0
