Comparing ChatGPT 3.5 and 4.0 in Low Back Pain Patient Education: Addressing Strengths, Limitations, and Psychosocial Challenges
Alper Tabanlı, Nihat Demirhan Demirkıran

World Neurosurgery, 2025, Volume 196, P. 123755

Published: March 6, 2025

Artificial intelligence tools like ChatGPT have gained attention for their potential to support patient education by providing accessible, evidence-based information. This study compares the performance of ChatGPT 3.5 and 4.0 in answering common questions about low back pain, focusing on response quality, readability, and adherence to clinical guidelines, while also addressing the models' limitations in managing psychosocial concerns. Thirty frequently asked questions about low back pain were categorized into 4 groups: Diagnosis, Treatment, Psychosocial Factors, and Management Approaches. Responses generated by each model were evaluated on 3 key metrics: 1) quality, rated on a scale on which 1 denotes an excellent response and the highest score an unsatisfactory one; 2) DISCERN criteria, evaluating reliability with scores ranging from 1 (low reliability) to 5 (high reliability); and 3) readability, assessed using 7 readability formulas, including the Flesch-Kincaid and Gunning Fog indices. ChatGPT 4.0 significantly outperformed ChatGPT 3.5 in quality across all categories, with a mean score of 1.03 compared with 2.07 (P < 0.001), and also demonstrated significantly higher DISCERN reliability scores (4.93 vs. 4.00). However, both versions struggled with psychosocial factor questions, where response scores were lower than in the other categories (P = 0.04). These concerns highlight the need for clinician oversight, particularly for emotionally sensitive issues. Enhancing artificial intelligence's capability to address the psychosocial aspects of care should be a priority in future iterations.
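The abstract names the Flesch-Kincaid and Gunning Fog formulas among its 7 readability measures but does not reproduce them or the authors' scoring scripts. As a rough, non-authoritative illustration only, the Python sketch below computes the standard Flesch-Kincaid Grade Level and Gunning Fog Index; the regex-based syllable counter is a crude stand-in for the dictionary-based counting that published readability tools use, and the choice of the Grade Level variant (rather than Reading Ease) is an assumption, since the listing does not specify which Flesch-Kincaid formula was applied.

    import re

    def count_syllables(word: str) -> int:
        # Crude heuristic: count vowel groups; real tools use a
        # pronunciation dictionary (e.g., CMUdict). Assumption for this sketch.
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1  # drop a silent final 'e'
        return max(n, 1)

    def readability(text: str) -> dict:
        sentences = max(len(re.findall(r"[.!?]+", text)), 1)
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(len(words), 1)
        syllables = sum(count_syllables(w) for w in words)
        # "Complex" words: 3+ syllables (standard Gunning Fog convention,
        # ignoring its suffix exceptions for brevity).
        complex_words = sum(1 for w in words if count_syllables(w) >= 3)

        # Flesch-Kincaid Grade Level:
        #   0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
        fk = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
        # Gunning Fog Index:
        #   0.4 * (words/sentence + 100 * complex_words/words)
        fog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
        return {"flesch_kincaid_grade": round(fk, 2), "gunning_fog": round(fog, 2)}

    print(readability("Low back pain is common. Most cases improve without surgery."))

Both formulas express their result as an approximate school grade level, which is why such indices are common in patient-education studies: guidance for patient materials typically targets roughly a sixth- to eighth-grade reading level.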

Language: English

Citations: 0