Large language models for generating script concordance test in obstetrics and gynecology: ChatGPT and Claude DOI
Zuhal Yapıcı Çoşkun, Yavuz Selim Kıyak, Özlem Çoşkun

et al.

Medical Teacher, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 5

Published: April 30, 2025

To evaluate the performance of large language models (ChatGPT-4o and Claude 3.5 Sonnet) in generating script concordance test (SCT) items for assessing clinical reasoning in obstetrics and gynecology. This cross-sectional study involved the generation of SCT items for five common diagnostic topics in gynecology in primary care settings. A total of 16 panelists evaluated the AI-generated items against 11 predefined criteria. Descriptive statistics were used to compare the models' performance across the criteria. ChatGPT-4o had an overall agreement rate of 90.57% in meeting the quality criteria, while Claude 3.5 Sonnet achieved 91.48%. The criterion with the lowest scores was "The scenario is of appropriate difficulty for medical students," rated at 71.25% and 76.25%. Large language models can generate SCT items that effectively assess clinical reasoning; however, further refinement is required to ensure an appropriate difficulty level for medical students. These findings highlight the potential of AI to enhance efficiency within [...]

Language: English

Beginner-Level Tips for Medical Educators: Guidance on Selection, Prompt Engineering, and the Use of Artificial Intelligence Chatbots DOI
Yavuz Selim Kıyak

Medical Science Educator, Journal Year: 2024, Volume and Issue: 34(6), P. 1571 - 1576

Published: Aug. 17, 2024

Language: English

Citations

4

Applications of Artificial Intelligence in Medical Education: A Systematic Review DOI Open Access

Eric Hallquist, Ishank Gupta, Michael Montalbano

et al.

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: March 1, 2025

Artificial intelligence (AI) models, like Chat Generative Pre-Trained Transformer (OpenAI, San Francisco, CA), have recently gained significant popularity due to their ability to make autonomous decisions and engage in complex interactions. To fully harness the potential of these learning machines, users must understand their strengths and limitations. As AI tools become increasingly prevalent in our daily lives, it is essential to explore how this technology has been used so far in healthcare and medical education, as well as the areas of medicine where it can be applied. This paper systematically reviews the published literature on the PubMed database from its inception up to June 6, 2024, focusing on studies that [...] at some level, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Several papers were identified in which AI was used to generate exam questions, produce clinical scripts for diseases, improve the diagnostic skills of students and clinicians, serve as a [...] aid, and automate analysis tasks such as screening residency applications. AI shows promise at various levels [...] different [...] highlights [...] areas. The review also emphasizes the importance of educators understanding AI's principles, capabilities, and limitations before integration. In conclusion, [...] but more research needs to be done to explore additional applications, address current gaps in knowledge, and [...] the future training of professionals.

Language: English

Citations

0

UsmleGPT: An AI application for developing MCQs via multi-agent system DOI Open Access

Zhehan Jiang, S. H. Feng

Software Impacts, Journal Year: 2025, Volume and Issue: 23, P. 100742 - 100742

Published: March 1, 2025

Language: English

Citations

0

The role of generative artificial intelligence in psychiatric education – a scoping review DOI Creative Commons

Qin Yuan Lee, Michelle Chen, Caroline Ong

et al.

BMC Medical Education, Journal Year: 2025, Volume and Issue: 25(1)

Published: March 25, 2025

The growing prevalence of mental health conditions, worsened by the COVID-19 pandemic, highlights the urgent need for enhanced psychiatric education. The distinctive nature of psychiatry, which is heavily centred on communication skills, interpersonal skills, and interviewing techniques, indicates a necessity for further research into the use of generative AI (GenAI) in psychiatric education. Given that GenAI has shown promising outcomes in medical education, this study aims to discuss its possible roles in psychiatric education. We conducted a scoping review to identify the role of GenAI in psychiatric education based on the educational framework of the Canadian Medical Education Directives for Specialists (CanMEDS). Of the 12,594 papers identified, five studies met the inclusion criteria, revealing key roles in case-based learning, simulation, content synthesis, and assessments. Despite these applications, limitations such as accuracy, biases, and concerns regarding security and privacy were highlighted. [...] This review contributes to the understanding of how GenAI can enhance psychiatric education and suggests future directions to refine its use in the training of students and primary care physicians. GenAI has significant potential to address the demand for professionals, provided its limitations are carefully managed.

Language: English

Citations

0

Can Artificial Intelligence be used to teach Psychiatry and Psychology?: A Scoping Review (Preprint) DOI Creative Commons

Julien Prégent, V. V. CHUNG, Inès El Adib

et al.

Published: March 30, 2025

BACKGROUND Artificial Intelligence (AI) is increasingly integrated into healthcare, including psychiatry and psychology. In educational contexts, AI offers new possibilities for enhancing clinical reasoning, personalizing content delivery, and supporting professional development. Despite this emerging interest, a comprehensive understanding of how AI is currently used in mental health education, and of the challenges associated with its adoption, remains limited. OBJECTIVE This scoping review aims to identify and characterize current applications of AI in the teaching and learning of psychiatry and psychology. It also seeks to document reported facilitators of and barriers to AI integration within educational contexts. METHODS A systematic search was conducted across six electronic databases (MEDLINE, PubMed, Embase, PsycINFO, EBM Reviews, and Google Scholar) from inception to October 2024. The review followed PRISMA-ScR guidelines. Studies were included if they focused on psychiatry or psychology, described the use of an AI tool, and discussed at least one facilitator of or barrier to its use in education. Data were extracted on study characteristics, population, AI application, outcomes, facilitators, and barriers. Study quality was appraised using several design-appropriate tools. RESULTS From 6219 records, 10 studies met the inclusion criteria. Eight categories of AI use were identified: decision support, [...] creation, therapeutic tools [...] monitoring, administrative [...] research assistance, natural language processing, program/policy development, and student/applicant [...]. Key facilitators included the availability of AI tools, positive learner attitudes, digital infrastructure, and time-saving features. Barriers included limited training, ethical concerns, lack of AI literacy, algorithmic opacity, and insufficient curricular integration. The overall methodological quality was moderate to high. CONCLUSIONS AI is being used for a range of functions [...] training [...] assessment support. While the potential to improve outcomes is clear, successful integration requires addressing ethical, technical, and pedagogical barriers. Future efforts should focus on faculty training and institutional policies to guide responsible and effective use. This review underscores the importance of interdisciplinary collaboration to ensure the safe, equitable, and meaningful adoption of AI [...]

Language: English

Citations

0

FonoTCS: validação de uma ferramenta para avaliação do raciocínio clínico em Fonoaudiologia DOI Creative Commons
Ana Cristina Côrtes Gama, Roberto da Costa Quinino, Adriane Mesquita de Medeiros

et al.

CoDAS, Journal Year: 2025, Volume and Issue: 37(3)

Published: Jan. 1, 2025

RESUMO Objetivo Validar a estrutura interna do Teste de Concordância Scripts em Fonoaudiologia (FonoTCS) que será desenvolvido formato virtual com acesso livre, para ser utilizado na avaliação raciocínio clínico jovens profissionais e estudantes fonoaudiologia formação generalista, falantes português brasileiro. Método Trata-se estudo validação instrumento. Participaram 25 fonoaudiólogos especialistas, mais 10 anos experiência clínica generalista 35 convocados o Enade. Ambos os grupos avaliaram 30 casos clínicos 120 itens FonoTCS. Para seleção final dos especialistas compuseram amostra, foram retirados juízes cujas avaliações apresentavam resultados Z2 >2 Z<-2 distantes da resposta modal. presentes no teste, permaneceram aqueles que, correlação Pearson entre as notas transformadas um determinado Item, soma das todos Itens, obtiveram valor superior 0,05. O teste Alfa Cronbach foi aplicado medir consistência FonoTCS pontuação cada item definida partir método escore agregado. Resultados As respostas 13 consideradas definição teste. instrumento apresentou 88 distribuídos 28 clínicos. A igual 0,903 intervalo confiança 95% expresso por 0,86|---|0,95. Estes valores indicam uma alta Conclusão é válido confiável

Citations

0

FonoTCS: validation of a tool for assessing clinical reasoning in Speech-Language pathology DOI Creative Commons
Ana Cristina Côrtes Gama, Roberto da Costa Quinino, Adriane Mesquita de Medeiros

et al.

CoDAS, Journal Year: 2025, Volume and Issue: 37(3)

Published: Jan. 1, 2025

ABSTRACT Purpose To validate the internal structure of the Speech-Language Pathology Script Concordance Test (FonoTCS), which will be developed in a virtual, open-access format, to be used in the assessment of clinical reasoning among young professionals and students of speech-language pathology with a generalist background who are speakers of Brazilian Portuguese. Methods This is an instrument validation study. Twenty-five specialist speech-language pathologists with more than 10 years of generalist clinical experience and 35 students summoned for Enade participated. Both groups evaluated the 30 clinical cases and 120 items of the FonoTCS. For the final selection of the specialists who made up the sample, judges whose evaluations showed Z results > 2 or Z < -2, distant from the modal response, were removed. Of the items present in the test, those that remained had a Pearson correlation, between the transformed scores of a given item and the sum of the scores of all items, with a value greater than 0.05. Cronbach's alpha was applied to measure the internal consistency of the FonoTCS, and the score of each item was defined based on the aggregated score method. Results The responses of 13 specialists were considered for the definition of the test score. The final instrument comprised 88 items distributed across 28 clinical cases. Cronbach's alpha was 0.903, with a 95% confidence interval of 0.86 to 0.95. These values indicate high internal consistency. Conclusion The FonoTCS is valid and reliable for use in evaluating clinical reasoning [...] in training, [...] who are Portuguese speakers.

Language: English

Citations

0
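The FonoTCS abstract above names two computations without detailing them: the aggregated score method and Cronbach's alpha. The following is a minimal sketch of how these are commonly computed, not the authors' procedure. It assumes the usual SCT convention in which each answer option earns credit equal to the number of panelists who chose it divided by the number who chose the modal option. The function names and the sample data are hypothetical.

```python
from collections import Counter
from statistics import variance


def aggregate_key(panel_answers):
    """Build a scoring key for one item from the expert panel's answers.

    Assumed convention: credit = (experts choosing the option)
    / (experts choosing the modal option), so the modal answer earns 1.0.
    """
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return {option: n / modal_count for option, n in counts.items()}


def score_examinee(keys, answers):
    """Sum the credits an examinee earns; options no expert chose earn 0."""
    return sum(key.get(answer, 0.0) for key, answer in zip(keys, answers))


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a matrix with rows = examinees, columns = items."""
    k = len(score_matrix[0])
    item_variances = [variance(column) for column in zip(*score_matrix)]
    total_variance = variance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical panel of five experts answering one item on a -2..+2 scale.
key = aggregate_key([-1, 0, 0, 0, 1])
print(key)
print(score_examinee([key], [0]))
```

Normalizing by the modal count keeps every item's maximum credit at 1, so items with split panels do not weigh less than items with unanimous panels.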

Teaching Clinical Reasoning in the Age of AI: A Mixed-Methods Formative Evaluation of AI-Generated Script Concordance Tests and Expert Embodiment (Preprint) DOI
Alexandre Hudon, Véronique Phan, Bernard Charlin

et al.

Published: April 27, 2025

BACKGROUND The integration of artificial intelligence (AI) in medical education is evolving, offering new tools to enhance teaching and assessment. Among these, script concordance tests (SCTs) are well suited to evaluate clinical reasoning in contexts of uncertainty. Traditionally, SCTs require expert panels for scoring and feedback, which can be resource intensive. Recent advances in generative AI, particularly large language models (LLMs), suggest the possibility of replacing human experts with simulated ones, though this potential remains underexplored. OBJECTIVE This study aimed to evaluate whether LLMs can effectively simulate expert judgment in SCTs, by using AI to author, score, and provide feedback for SCTs in cardiology and pneumology. A secondary goal was to assess students' perceptions of the test's difficulty and the pedagogical value of AI-generated feedback. METHODS A cross-sectional, mixed-methods study was conducted with 25 second-year students who completed a 32-item SCT authored by ChatGPT-4o. Six LLMs (three trained on course material and three untrained) served as [...] to generate keys [...]. Students answered the questions, rated perceived difficulty, and selected the most helpful explanation for each item. Quantitative analysis included scoring, difficulty ratings, and correlation between student and AI responses. Qualitative comments were thematically analyzed. RESULTS The average score was 22.8 out of 32 (SD = 1.6), with scores ranging from 19.75 to 26.75. Trained AI systems showed significantly higher [...] with student responses (ρ = 0.64) than untrained models (ρ = 0.41). [...] in 62.5% of cases, especially when provided by trained models. The SCT demonstrated good internal consistency (Cronbach's α = 0.76), and students reported moderate difficulty (mean = 3.7/7). Qualitative comments highlighted appreciation for SCTs as reflective tools, while recommending clearer guidance on Likert-scale use and more contextual detail in vignettes. CONCLUSIONS This is among the first studies to demonstrate that LLMs can reliably simulate expert [...] within a script concordance framework. The findings suggest that AI can both streamline SCT design and offer educationally valuable feedback without compromising authenticity. Future studies should explore longitudinal effects on learning and how hybrid models (human and AI) can optimize [...] instruction in medical education.

Language: English

Citations

0
