A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis
Junxiu Zhang, Yao Ma, Rong Zhang

et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Dec. 5, 2024

Artificial intelligence (AI), particularly large language models like GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the performance of GPT-4o compared to human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience. The accuracy and completeness of primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including Kruskal–Wallis and Mann–Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis than the ophthalmologists. Specifically, GPT-4o achieved a mean score of 5.500 (p < 0.001) compared to Doctor C, who had the highest score of 8.038 (p < 0.001). Completeness scores for GPT-4o (3.077) were also lower than those of Doctor B, who had the lowest score (3.615) among the ophthalmologists. However, in differential diagnosis, GPT-4o (7.577) showed accuracy comparable to that of the ophthalmologists (7.615, and 7.673 for Doctor C) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming all three ophthalmologists (3.846, 2.923, and 2.808 for Doctor B) (p < 0.0001). GPT-4o is currently not an acceptable standalone method for diagnosing glaucoma due to its lower accuracy in primary diagnosis compared to clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially in initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.

Language: English

Research on Intelligent Grading of Physics Problems Based on Large Language Models
Yanan Wei, Rui Zhang, Jianwei Zhang

et al.

Education Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 116

Published: Jan. 21, 2025

The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores the automatic grading of physics problems through a combination of large language models and prompt engineering. By comparing the performance of four prompt strategies (one-shot, few-shot, chain of thought, tree of thought) within two large model frameworks, namely ERNIEBot-4-turbo and GPT-4o, the study finds that a thought-based prompt can better assess calculation problems with complex ideas (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the performance gap between different models. This research provides valuable insights for the automation of assessments in physics education.

Language: English

Citations

1

Mapping artificial intelligence models in emergency medicine: A scoping review on artificial intelligence performance in emergency care and education
Göksu Bozdereli Berikol, Altuğ Kanbakan, Buğra İlhan

et al.

Turkish Journal of Emergency Medicine, Journal Year: 2025, Volume and Issue: 25(2), P. 67 - 91

Published: April 1, 2025

Artificial intelligence (AI) is increasingly improving processes such as emergency patient care and emergency medicine education. This scoping review aims to map the use and performance of AI models in emergency medicine regarding AI concepts. The findings show that AI-based medical imaging systems provide disease detection with 85%-90% accuracy in imaging techniques such as X-ray and computed tomography scans. In addition, AI-supported triage systems were found to be successful in correctly classifying low- and high-urgency patients. In education, large language models have provided high accuracy rates in evaluating emergency medicine exams. However, there are still challenges in the integration of AI into clinical workflows and in model generalization capacity. These findings demonstrate the potential of updated AI models, but larger-scale studies are still needed.

Language: English

Citations

0

AI-Driven Information for Relatives of Patients with Malignant Middle Cerebral Artery Infarction: A Preliminary Validation Study Using GPT-4o
Mejdeddine Al Barajraji, Sami Barrit, Nawfel Ben‐Hamouda

et al.

Brain Sciences, Journal Year: 2025, Volume and Issue: 15(4), P. 391

Published: April 11, 2025

Purpose: This study examines GPT-4o’s ability to communicate effectively with relatives of patients undergoing decompressive hemicraniectomy (DHC) after malignant middle cerebral artery infarction (MMCAI). Methods: GPT-4o was asked 25 common questions from patients’ relatives about DHC for MMCAI, twice over a 7-day interval. Responses were rated for accuracy, clarity, relevance, completeness, sourcing, and usefulness by a board-certified intensivist (one), neurologists, and neurosurgeons using the Quality Analysis of Medical AI (QAMAI) tool. Interrater reliability and stability were measured using ICC and Pearson’s correlation. Results: The total QAMAI scores were 22.32 ± 3.08 for the intensivist, 24.68 ± 2.8 for the neurologist, and 23.36 ± 2.86 and 26.32 ± 2.91 for the neurosurgeons, representing moderate-to-high accuracy. The evaluators reported a moderate ICC (0.631, 95% CI: 0.321–0.821). The highest subscores were in categories including relevance, while the poorest were associated with categories including usefulness and sourcing. GPT-4o did not systematically provide references for its responses. The analysis also assessed stability. The readability assessment revealed an FRE score of 7.23, an FKG score of 15.87, and a GF index of 18.15. Conclusions: GPT-4o provides quality information related to DHC for MMCAI, with strengths in relevance. However, its limitations may impact its effectiveness in patient or relatives’ education.

Language: English

Citations

0

Inferring Drug–Gene Relationships in Cancer Using Literature-Augmented Large Language Models
Ying-Ju Lai, L Wang, Tyler M. Yasaka

et al.

Cancer Research Communications, Journal Year: 2025, Volume and Issue: 5(4), P. 706 - 718

Published: April 1, 2025

Understanding drug–gene relationships is essential for advancing targeted cancer therapies and drug repurposing strategies. However, the vast volume of biomedical literature poses significant challenges in efficiently extracting relevant insights. In this study, we developed an automated pipeline that leverages retrieval-augmented large language models (LLMs) to infer drug–gene interactions using the most up-to-date literature. By integrating PubMed with state-of-the-art LLMs, our pipeline generates accurate, evidence-based inferences while addressing the limitations of static models, such as outdated knowledge and the risk of producing misleading results. We systematically validated the pipeline’s performance using curated databases and demonstrated its ability to accurately identify both well-established and emerging drug targets. Using this pipeline, we constructed a pan-cancer drug–gene interaction network among hundreds of FDA-approved drugs and key oncogenes. In a case study on liver cancer, we identified an association between CTNNB1 mutations and enhanced sensitivity to sorafenib, highlighting a potential therapeutic strategy for this challenging mutation. To facilitate broad accessibility, we developed GeneRxGPT, a user-friendly web application that enables researchers to utilize the pipeline without programming expertise or extensive computational resources. It provides intuitive modules for inference and visualization, streamlining the exploration and interpretation of drug–gene relationships. We anticipate that GeneRxGPT will empower researchers to accelerate drug discovery and development, making it a valuable resource for the cancer research community. Significance: This study presents a novel approach that integrates LLMs with real-time literature to uncover drug–gene relationships, transforming how researchers identify targets, repurpose drugs, and interpret complex molecular interactions. With this tool, researchers can leverage the approach without requiring programming expertise.

Language: English

Citations

0

Feasibility of real-time compression frequency and compression depth assessment in CPR using a “machine-learning” artificial intelligence tool
Hannes Ecker, Niels-Benjamin Adams, Michael Schmitz

et al.

Resuscitation Plus, Journal Year: 2024, Volume and Issue: 20, P. 100825

Published: Nov. 5, 2024

Language: English

Citations

1

AI-Powered clinical assessments: GPT-4o’s role in standardizing CPR skill evaluations
Federico Semeraro

Resuscitation, Journal Year: 2024, Volume and Issue: 204, P. 110411

Published: Oct. 10, 2024

Language: English

Citations

0

Assessing the ability of GPT-4o to visually recognize medications and provide patient education
Amjad H. Bazzari, Firas H. Bazzari

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Nov. 5, 2024

Various studies have investigated the ability of ChatGPT (OpenAI) to provide medication information; however, a new promising feature has now been added, which allows visual input and is yet to be evaluated. Here, we aimed to qualitatively assess its ability to visually recognize medications, through medication picture input, and to provide patient education via written and visual output. The responses were evaluated for accuracy, precision and clarity using a 4-point Likert-like scale. In regard to handling visual input and providing written responses, GPT-4o was able to recognize all 20 tested medications from packaging pictures, even with blurring, retrieve their active ingredients, identify formulations and dosage forms, and provide detailed, yet concise enough, patient education in an almost completely accurate, precise and clear manner, with a score of 3.55 ± 0.605 (85%). In contrast, the visual output, consisting of GPT-4o generated images illustrating usage instructions, contained many errors that would either hinder the effectiveness of the medication or cause direct harm to the patient, with a poor score of 1.5 ± 0.577 (16.7%). In conclusion, GPT-4o is capable of identifying medications from pictures and exhibits contrasting patient education performance between written and visual output, with very impressive and poor scores, respectively.

Language: English

Citations

0
