Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: April 15, 2025
Language: English
Education Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 116 - 116
Published: Jan. 21, 2025
The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to intelligent grading tests. This study explores automatic grading through the combination of large language models and prompt engineering. By comparing the performance of four prompting strategies (one-shot, few-shot, chain of thought, and tree of thought) within two model frameworks, namely ERNIEBot-4-turbo and GPT-4o, the study finds that chain-of-thought and tree-of-thought prompting can better assess complex problems (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the gap between different models. The research provides valuable insights for automated assessments in physics education.
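To make the reported metrics concrete, here is a minimal Python sketch, not the study's actual pipeline, of how an LLM grader's output can be scored against human reference grades with accuracy (ACC) and Cohen's kappa; the grade labels and data below are hypothetical.

# Score hypothetical LLM grades against human reference grades
# using the two metrics named in the abstract: ACC and Cohen's kappa.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human_grades = ["correct", "partial", "wrong", "correct", "correct", "partial"]
model_grades = ["correct", "partial", "wrong", "correct", "partial", "partial"]

acc = accuracy_score(human_grades, model_grades)       # fraction of exact matches
kappa = cohen_kappa_score(human_grades, model_grades)  # chance-corrected agreement

print(f"ACC = {acc:.2f}, kappa = {kappa:.2f}")
# The abstract's reported threshold: ACC >= 0.9 and kappa > 0.8 on N = 100 items.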
Language: English
Citations: 1
Medical Records, Journal Year: 2025, Volume and Issue: 7(1), P. 201 - 205
Published: Jan. 10, 2025
Aim: The rapid evolution of artificial intelligence (AI) has revolutionized medicine, with tools like ChatGPT and Google Gemini enhancing clinical decision-making. ChatGPT's advancements, particularly GPT-4, show promise in diagnostics and education. However, variability in accuracy and limitations in complex scenarios emphasize the need for further evaluation of these models in medical applications. This study aimed to assess the agreement between ChatGPT-4.o and Gemini AI in identifying bladder-related conditions, including neurogenic bladder, vesicoureteral reflux (VUR), and posterior urethral valve (PUV). Material and Method: The study, conducted in October 2024, compared the two AIs' responses on 51 questions about neurogenic bladder, VUR, and PUV. Questions, randomly selected from pediatric surgery and urology materials, were evaluated using performance metrics and statistical analysis, highlighting the models' performance and agreement. Results: The models demonstrated similar performance across neurogenic bladder, VUR, and PUV questions, with true response rates of 66.7% and 68.6%, respectively, and no statistically significant differences (p>0.05). Combined accuracy across all topics was 67.6%. Strong inter-rater reliability (κ=0.87) highlights their agreement. Conclusion: Gemini showed performance comparable to ChatGPT-4.o on key performance metrics.
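As an illustration of the kind of comparison reported above, a chi-square test on the reported correct counts can be sketched as follows; the abstract does not name the exact test used, so this is an assumption, and the counts are back-calculated from the stated rates (66.7% and 68.6% of 51 questions, i.e., roughly 34 vs. 35 correct).

# Illustrative only: compare the two models' correct/incorrect counts
# with a chi-square test of independence on a 2x2 table.
from scipy.stats import chi2_contingency

table = [[34, 51 - 34],   # ChatGPT-4.o: correct, incorrect
         [35, 51 - 35]]   # Gemini:      correct, incorrect

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # p is well above 0.05, matching the abstract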
Language: English
Citations: 0
Frontiers in Medicine, Journal Year: 2025, Volume and Issue: 12
Published: March 18, 2025
Retinitis pigmentosa (RP) is a rare retinal dystrophy often underrepresented in ophthalmology education. Despite advancements in diagnostics and treatments like gene therapy, RP knowledge gaps persist. This study assesses the efficacy of AI-assisted teaching using ChatGPT compared to traditional methods in educating medical students about RP. A quasi-experimental study was conducted with 142 medical students randomly assigned to control (traditional review materials) and ChatGPT-assisted groups. Both groups attended a lecture on RP and completed pre- and post-tests. Statistical analyses compared learning outcomes, study times, and response accuracy. Both groups significantly improved their post-test scores (p < 0.001), but the ChatGPT group required less study time (24.29 ± 12.62 vs. 42.54 ± 20.43 min, p = 0.0001). The ChatGPT group also performed better on complex questions regarding advanced treatments, demonstrating AI's potential to deliver accurate and current information efficiently. AI-assisted teaching enhances efficiency and comprehension of rare diseases, and a hybrid educational model combining AI with traditional methods can address knowledge gaps, offering a promising approach for modern medical education.
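The reported study-time difference can be checked from the summary statistics alone. This sketch assumes an even 71/71 split of the 142 students and Welch's t-test; neither assumption is stated in the abstract, so treat it as illustrative rather than a reproduction of the paper's analysis.

# Recompute a two-sample t-test from the reported means and SDs
# (24.29 +/- 12.62 min vs. 42.54 +/- 20.43 min).
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(mean1=24.29, std1=12.62, nobs1=71,   # ChatGPT group (assumed n)
                            mean2=42.54, std2=20.43, nobs2=71,   # control group (assumed n)
                            equal_var=False)                     # Welch's t-test (assumed)
print(f"t = {t:.2f}, p = {p:.2e}")  # p falls far below 0.05, in line with the reported significance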
Language: English
Citations: 0
Neurological Research, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 4
Published: March 20, 2025
Objectives: OpenAI declared that GPT-4 performed better in academic and certain specialty areas. Medical licensing exams assess the clinical competence of doctors. We aimed to investigate, for the first time, how ChatGPT will perform on the Turkish Neurology Proficiency Exam.
Language: English
Citations: 0
Published: March 27, 2025
Language: English
Citations: 0
Journal of Orthopaedic Science, Journal Year: 2025, Volume and Issue: unknown
Published: April 1, 2025
Language: English
Citations: 0
Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Journal Year: 2025, Volume and Issue: 15(2)
Published: April 9, 2025
ABSTRACT This paper reviews benchmarking methods for evaluating large language models (LLMs) in healthcare settings. It highlights the importance of rigorous evaluation to ensure LLMs' safety, accuracy, and effectiveness in clinical applications. The review also discusses challenges in developing standardized benchmarks and metrics tailored to healthcare-specific tasks such as medical text generation, disease diagnosis, and patient management. Ethical considerations, including privacy, data security, and bias, are addressed, underscoring the need for multidisciplinary collaboration to establish robust frameworks that facilitate the reliable and ethical use of LLMs in healthcare. Evaluation of LLMs remains challenging due to the lack of comprehensive datasets. Key concerns include the need for better model explainability, all of which impact overall trustworthiness.
Language: English
Citations: 0
Communications in computer and information science, Journal Year: 2025, Volume and Issue: unknown, P. 75 - 84
Published: Jan. 1, 2025
Language: English
Citations: 0
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: April 15, 2025
Language: English
Citations: 0