The performance of ChatGPT-4 and Bing Chat in frequently asked questions about glaucoma DOI
Levent Doğan, İ̇brahim Edhem Yılmaz

European Journal of Ophthalmology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 19, 2025

Purpose: To evaluate the appropriateness and readability of responses generated by ChatGPT-4 and Bing Chat to frequently asked questions about glaucoma. Method: Thirty-four questions were selected for this study. Each question was directed three times in a fresh interface. The responses obtained were categorised by two glaucoma specialists in terms of their appropriateness. Accuracy was evaluated using the Structure of Observed Learning Outcome (SOLO) taxonomy. Readability was assessed with the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI) scores. Results: The percentage of appropriate responses was 88.2% (30/34) for ChatGPT-4 and 79.2% (27/34) for Bing Chat, respectively. Both interfaces provided at least one inappropriate response to 1 of the 34 questions. SOLO test results were 3.86 ± 0.41 and 3.70 ± 0.52, respectively, and no statistically significant difference in performance was observed between the two LLMs (p = 0.101). The mean word counts of the generated responses were 316.5 (± 85.1) and 61.6 (± 25.8), respectively (p < 0.05). According to the FRE scores, the responses were suitable for only 4.5% and 33% of U.S. adults, respectively. Conclusions: Both LLMs consistently provided largely appropriate responses, but their readability was more difficult than recommended for patient-facing material.

Language: English
Citations: 0