Postgraduate Medical Journal, Journal Year: 2024, Volume and Issue: unknown
Published: July 17, 2024
Language: English
Medical Science Educator, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 15, 2025
Language: English
Citations: 0
Cureus, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 17, 2025
Introduction: This study explored the potential of using large language models (LLMs) to generate multiple-choice questions (MCQs) for the Japanese National Licensure Examination for Physical Therapists. Specifically, it evaluated the performance of a customized ChatGPT (OpenAI, San Francisco, CA, USA) model named "Physio Exam Generative Pre-trained Transformers (GPT)" in generating high-quality MCQs in non-English contexts. Materials and methods: Based on data extracted from the 57th and 58th National Licensure Examinations for Physical Therapists, 340 MCQs, including correct answers, explanations, and associated topics, were incorporated into the knowledge base of the custom GPT. Prompts and outputs were conducted in Japanese. The generated questions covered major general topics (anatomy, physiology, kinesiology) and practical topics (musculoskeletal disorders, central nervous system disorders, and internal organ disorders). The quality of the questions and their explanations was rated by two independent reviewers on a 10-point Likert scale across five criteria: clarity, relevance to clinical practice, suitability of difficulty, quality of distractors, and adequacy of rationale. Results: The generated MCQs achieved 100% accuracy for both question types. Average scores across the five evaluation criteria ranged from 7.0 to 9.8, with a lowest average of 6.7 in some areas. Although some areas exhibited lower scores, the overall results were favorable. Conclusions: This study demonstrates that LLMs can efficiently generate MCQs even in non-English educational environments such as Japan. These findings suggest that LLMs can adapt to diverse linguistic settings, reduce educators' workload, and improve educational resources, laying a foundation for expanding their application to educational settings in non-English-speaking regions.
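For orientation only, the sketch below is not from the study (which used a custom GPT, "Physio Exam GPT", built in the ChatGPT interface with an uploaded knowledge base); it merely shows how a comparable MCQ-generation prompt could be issued through the OpenAI Python SDK. The model name, topic, and prompt wording are assumptions.

```python
# Hedged sketch: programmatic MCQ generation with the OpenAI SDK.
# The study itself used a custom GPT with an uploaded knowledge base and
# Japanese-language prompts; this only illustrates the general prompt pattern.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

topic = "musculoskeletal disorders"  # hypothetical example topic

prompt = (
    "You are an item writer for the Japanese National Licensure Examination for "
    "Physical Therapists. Write one five-option multiple-choice question in Japanese "
    f"on the topic of {topic}. Return the stem, options A-E, the correct answer, "
    "and a short explanation of the rationale."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model; the study used a customized ChatGPT
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```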
Language: English
Citations: 0
Advances in Medical Education and Practice, Journal Year: 2025, Volume and Issue: Volume 16, P. 331 - 339
Published: Feb. 1, 2025
With the rapid development of large language models, artificial intelligence generated content (AIGC) presents novel opportunities for constructing medical examination questions. However, it remains unclear how to effectively utilize AIGC, which is characterized by rapid response capabilities, high efficiency, and good performance in mimicking clinical realities, for examination design. In this study, we reviewed the limitations inherent in paper-based examinations and provided streamlined instructions for generating examination questions using AIGC, with a particular focus on multiple-choice questions, case studies, and video-based questions. Manual review remains necessary to ensure the accuracy and quality of the generated content. Future work will benefit from technologies such as retrieval augmented generation, multi-agent systems, and video generation technology. As AIGC continues to evolve, it is anticipated to bring transformative changes in enhancing examination preparation and contributing to the effective cultivation of medical students.
Language: English
Citations: 0
BMC Medical Education, Journal Year: 2025, Volume and Issue: 25(1)
Published: Feb. 25, 2025
Abstract Background: Recent advancements in generative artificial intelligence (AI) have opened new avenues for educational methodologies, particularly in medical education. This study seeks to assess whether AI might be useful in addressing the depletion of assessment question banks, a challenge intensified during the Covid era due to the prevalence of open-book examinations, and in augmenting the pool of formative opportunities available to students. While many recent publications have sought to ascertain whether AI can achieve a passing standard on existing examinations, this study investigates its potential to generate the exam itself. Summary of work: The research utilized a commercially available large language model (LLM), OpenAI GPT-4, to generate 220 single best answer (SBA) questions, adhering to Medical Schools Council Assessment Alliance guidelines and a selection of Learning Outcomes (LOs) from the Scottish Graduate-Entry Medicine (ScotGEM) program. All questions were assessed by an expert panel for accuracy and quality. A total of 50 AI-generated and 50 human-authored questions were used to create two 50-item SBA examinations for Year 1 and Year 2 of the ScotGEM program. Each exam, delivered via the Speedwell eSystem, comprised 25 questions of each type presented in random order. Students completed the online, closed-book exams on personal devices under conditions that reflected summative examinations. The performance of both question types was evaluated, focusing on facility and discrimination index as key metrics. Results: The screening process revealed that 69% of AI-generated SBAs were fit for inclusion with little or no modification required. Modifications, when necessary, were predominantly for reasons such as "all of the above" options, usage of American English spellings, and non-alphabetized answer choices. The remaining 31% were rejected for factual inaccuracies or non-alignment with students' learning. When included in the examination, post hoc statistical analysis indicated a significant difference between AI- and human-authored questions in terms of these indices. Discussion and conclusion: The outcomes suggest LLMs are capable of generating questions in line with best practice and specific LOs. However, robust quality assurance is necessary to ensure erroneous questions are identified and rejected. The insights gained from this study provide a foundation for further investigation into refining prompts, aiming at more reliable generation of curriculum-aligned questions. The results show that supplementing traditional methods with this approach offers a viable solution to rapidly replenish and diversify assessment resources across curricula, marking a step forward at the intersection of AI and medical education.
Language: English
Citations: 0
European Journal of Therapeutics, Journal Year: 2025, Volume and Issue: 31(1), P. 28 - 34
Published: Feb. 28, 2025
Objectives: The aim of this study is to compare the ability of artificial intelligence-based chatbots, ChatGPT-4o and Claude 3.5, to interpret mammography images. It focuses on evaluating their accuracy and consistency in BI-RADS classification and breast parenchymal type assessment. It also aims to explore the potential of these technologies to reduce radiologists' workload and to identify their limitations in medical image analysis. Methods: A total of 53 mammography images obtained between January and July 2024 were analyzed, with the same anonymized images provided to both chatbots under identical prompts. Results: The results showed accuracy rates for BI-RADS classification ranging from 18.87% to 26.42% for ChatGPT-4o and 18.7% for Claude 3.5. When the categories were grouped into a benign group (BI-RADS 1, 2) and a malignant group (BI-RADS 4, 5), the combined accuracy was 57.5% (initial evaluation) and 55% (second evaluation), compared with 47.5% for Claude 3.5. Breast parenchymal type assessment accuracy was 30.19% and 22.64% for ChatGPT-4o. Conclusions: The findings indicate that these chatbots demonstrate limited reliability in interpreting mammography images. The results highlight the need for further optimization, larger datasets, and advanced training processes to improve their performance.
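As a minimal illustration of the grouping step described above (not code from the study), the sketch below collapses BI-RADS categories into benign (1-2) and malignant (4-5) groups and computes an agreement rate against radiologist labels; the data and function names are hypothetical.

```python
# Hedged sketch: coarse BI-RADS grouping and agreement rate. Hypothetical data only.

def to_group(birads: int) -> str | None:
    """Map a BI-RADS category to a coarse group; BI-RADS 3 is left out of the comparison."""
    if birads in (1, 2):
        return "benign"
    if birads in (4, 5):
        return "malignant"
    return None

def grouped_accuracy(chatbot: list[int], radiologist: list[int]) -> float:
    """Proportion of images where the chatbot's group matches the radiologist's group."""
    pairs = [
        (to_group(c), to_group(r))
        for c, r in zip(chatbot, radiologist)
        if to_group(r) is not None
    ]
    correct = sum(c == r for c, r in pairs)
    return correct / len(pairs)

# Hypothetical example: chatbot vs. radiologist BI-RADS assignments for six images.
print(grouped_accuracy([2, 4, 1, 5, 2, 4], [1, 4, 2, 4, 5, 2]))
```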
Language: English
Citations: 0
Software Impacts, Journal Year: 2025, Volume and Issue: 23, P. 100742 - 100742
Published: March 1, 2025
Language: English
Citations: 0
Interactive Learning Environments, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 20
Published: March 24, 2025
Language: English
Citations: 0
Clinical Anatomy, Journal Year: 2025, Volume and Issue: unknown
Published: March 24, 2025
ABSTRACT: Developing high-quality multiple-choice questions (MCQs) for medical school exams is effortful and time-consuming. In this study, we investigated the ability of ChatGPT to generate case-based anatomy MCQs with acceptable levels of item difficulty and discrimination for exams. We used an endocrine and urogenital system exam based on a framework for artificial intelligence (AI)-assisted question generation. The generated items were evaluated by experts, approved by the department, and administered to 502 second-year students (372 Turkish-language, 130 English-language). The items were analyzed to determine difficulty and discrimination indices. Discrimination indices ranged from 0.29 to 0.54, indicating differentiation between high- and low-performing students. All items in Turkish (six out of six) and five of six items in English met the higher threshold (≥ 0.30) required for large-scale standardized tests. Difficulty indices ranged from 0.41 to 0.89, with most falling within the moderate range (0.20–0.80). Therefore, it was concluded that ChatGPT can generate items with acceptable psychometric properties, offering a promising tool for educators. However, human expertise remains crucial for reviewing and refining AI-generated assessment items. Future research should explore generation across various anatomy topics and investigate different AI models for question generation.
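For readers unfamiliar with the metrics cited above, the sketch below (not from the study) computes the classical item difficulty index (proportion of examinees answering correctly) and an upper/lower-group discrimination index from a hypothetical 0/1 response vector, using the 27% group split commonly applied in item analysis.

```python
# Hedged sketch: classical item difficulty and discrimination indices for one item,
# computed from hypothetical 0/1 scores and total test scores.
from typing import Sequence

def difficulty_index(item_scores: Sequence[int]) -> float:
    """Proportion of examinees who answered the item correctly (0.20-0.80 ~ moderate)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores: Sequence[int], total_scores: Sequence[int],
                         group_fraction: float = 0.27) -> float:
    """Upper-group minus lower-group proportion correct (>= 0.30 often required)."""
    n_group = max(1, int(len(item_scores) * group_fraction))
    ranked = sorted(range(len(item_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(item_scores[i] for i in upper) / n_group
    p_lower = sum(item_scores[i] for i in lower) / n_group
    return p_upper - p_lower

# Hypothetical data: 10 examinees, one item, and their total test scores.
item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
totals = [48, 45, 20, 40, 22, 39, 44, 18, 41, 25]
print(difficulty_index(item), discrimination_index(item, totals))
```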
Language: English
Citations: 0
Frontiers in Artificial Intelligence, Journal Year: 2025, Volume and Issue: 8
Published: March 31, 2025
Kawasaki disease (KD) presents complex clinical challenges in diagnosis, treatment, and long-term management, requiring a comprehensive understanding by both parents and healthcare providers. With advancements in artificial intelligence (AI), large language models (LLMs) have shown promise in supporting medical practice. This study aims to evaluate and compare the appropriateness and comprehensibility of different LLMs in answering clinically relevant questions about KD and to assess the impact of prompting strategies. Twenty-five questions were formulated, incorporating three prompting strategies: No prompting (NO), Parent-friendly (PF), and Doctor-level (DL). These were input into three LLMs: ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Responses were evaluated based on appropriateness, educational quality, comprehensibility, cautionary statements, references, and potential misinformation, using the Information Quality Grade, Global Quality Scale (GQS), Flesch Reading Ease (FRE) score, and word count. Significant differences were found among the models in terms of response accuracy (p < 0.001). Claude 3.5 Sonnet provided the highest proportion of completely correct responses (51.1%) and achieved the highest median GQS score (5.0), significantly outperforming ChatGPT-4o (4.0) and Gemini 1.5 Pro (3.0). Its FRE score was 31.5, and 80.4% of its responses were assessed as comprehensible. Prompting strategies significantly affected LLM responses: Claude 3.5 Sonnet with DL prompting reached a rate of 81.3%, while PF prompting yielded the highest proportion of acceptable responses (97.3%). Gemini 1.5 Pro showed minimal variation across prompts but excelled under specific prompting (98.7%). This indicates that LLMs have great potential for providing information about KD, but their use requires caution due to quality inconsistencies and misinformation risks. Although discrepancies existed, the model offering the best comprehensibility is recommended for those seeking information. As AI evolves, expanding research and refining prompting strategies will be crucial to ensure reliable, high-quality information.
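As a quick reference for the readability metric cited above (not code from the study), the sketch below computes the Flesch Reading Ease score from its standard formula, using a crude vowel-group heuristic for syllable counting; the sample sentence is hypothetical.

```python
# Hedged sketch: Flesch Reading Ease (FRE),
#   FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
# Higher scores mean easier text; scores near 30 (as reported above) indicate difficult reading.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels; always at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Hypothetical example sentence about Kawasaki disease patient information.
sample = "Kawasaki disease causes fever and rash in young children. Early treatment helps."
print(round(flesch_reading_ease(sample), 1))
```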
Language: English
Citations: 0
Journal of medical imaging and radiation sciences, Journal Year: 2025, Volume and Issue: 56(4), P. 101896 - 101896
Published: April 1, 2025
Language: English
Citations: 0