Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance
Masab Mansoor, Andrew Ibrahim, David J. Grindem, et al.

JMIRx Med, 2025, Volume 6, e65263

Published: March 19, 2025

Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared with board-certified pediatricians in rural health care settings. This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), a sensitivity of 85% (95% CI 82%-88%), and a specificity of 90% (95% CI 87%-93%), comparable to the pediatricians' accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87%; 6-12 years: 47/53, 89%; 13-18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower for GPT-3 (16/20, 80%) but comparable to that of pediatricians (17/20, 85%; P=.62). This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation.
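The percentages in the abstract follow directly from the reported counts. The sketch below recomputes them and, as an illustration only, attaches a Wilson score interval to each proportion. The abstract does not report intervals for accuracy or name the statistical tests behind its P values, so the interval method and the helper names (`accuracy_pct`, `wilson_interval`) are assumptions and not the authors' analysis.

```python
from math import sqrt

# (correct, total) pairs exactly as reported in the abstract.
reported = {
    "GPT-3 overall": (131, 150),
    "Pediatricians overall": (137, 150),
    "Age 0-5 years": (54, 62),
    "Age 6-12 years": (47, 53),
    "Age 13-18 years": (30, 35),
    "Fever": (36, 39),
    "Abdominal pain": (20, 23),
    "Rare diagnoses, GPT-3": (16, 20),
    "Rare diagnoses, pediatricians": (17, 20),
}


def accuracy_pct(correct, total):
    """Accuracy as a percentage: correct diagnoses over all cases."""
    return 100.0 * correct / total


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: chosen for illustration; the study's own CI method is not stated.
    """
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


for label, (correct, total) in reported.items():
    low, high = wilson_interval(correct, total)
    print(
        f"{label}: {accuracy_pct(correct, total):.1f}% "
        f"(Wilson 95% CI {100 * low:.1f}%-{100 * high:.1f}%)"
    )
```

The three age-group denominators (62, 53, 35) sum to the 150 test encounters, which is a quick consistency check on the reported subgroup counts.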

Language: English

Citations: 1