ChatGPT Knowledge Evaluation in Basic and Clinical Medical Sciences: Multiple Choice Question Examination-Based Performance DOI Open Access
Sultan Ayoub Meo, Abeer A. Al‐Masri, Metib Alotaibi

et al.

Healthcare, Journal Year: 2023, Volume and Issue: 11(14), P. 2046 - 2046

Published: July 17, 2023

The Chatbot Generative Pre-Trained Transformer (ChatGPT) has garnered great attention from the public, academicians, and science communities. It responds with appropriately articulate answers and explanations across various disciplines. On the use of ChatGPT in education, research, and healthcare, different perspectives exist, with some level of ambiguity around its acceptability and ideal uses. However, the literature is acutely lacking in studies that assess the intellectual level of ChatGPT in the medical sciences. Therefore, the present study aimed to investigate the knowledge of ChatGPT in medical education across both basic and clinical medical sciences, its multiple-choice question (MCQ) examination-based performance, and its impact on the examination system. In this study, initially, a subject-wise MCQ bank was established from a pool of questions drawn from textbooks and university question pools. The team members carefully reviewed the MCQ contents and ensured that the MCQs were relevant to the subject's contents. Each MCQ was scenario-based, with four sub-stems and a single correct answer. A total of 100 MCQs across disciplines, including basic medical sciences (50 MCQs) and clinical medical sciences (50 MCQs), were randomly selected from the bank. The MCQs were manually entered one by one, and a fresh session was started for each entry to avoid memory retention bias. The task was given and the response of ChatGPT was recorded; the first response obtained was taken as the final response. Based on a pre-determined answer key, scoring was made on a scale of 0 to 1, with zero representing an incorrect and one a correct answer. The results revealed that, out of 100 MCQs across disciplines, ChatGPT attempted all of them and obtained 37/50 (74%) marks in basic medical sciences and 35/50 (70%) in clinical medical sciences, for an overall score of 72/100 (72%). It is concluded that ChatGPT achieved a satisfactory score in both basic and clinical medical science subjects and demonstrated a degree of understanding and explanation. This study's findings suggest that ChatGPT may be able to assist medical students and faculty in medical education settings, since it has potential for innovation within the framework of medical education.
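
The 0/1 marking scheme described above is simple enough to sketch. The following Python fragment, using hypothetical question IDs and answers rather than the study's items, computes subject-wise marks and the overall percentage in the way the abstract describes:

```python
# Minimal sketch of 0/1 MCQ scoring against a pre-determined answer key.
# All data here are illustrative placeholders, not the study's actual items.
answer_key = {"basic_01": "B", "basic_02": "D", "clinical_01": "A", "clinical_02": "C"}
chatgpt_answers = {"basic_01": "B", "basic_02": "A", "clinical_01": "A", "clinical_02": "C"}

def score(subject_prefix: str) -> tuple[int, int]:
    """Return (marks, attempted) for one subject area: 1 per correct answer, 0 otherwise."""
    items = [q for q in answer_key if q.startswith(subject_prefix)]
    marks = sum(chatgpt_answers.get(q) == answer_key[q] for q in items)
    return marks, len(items)

for subject in ("basic", "clinical"):
    marks, total = score(subject)
    print(f"{subject}: {marks}/{total} ({100 * marks / total:.0f}%)")

overall = sum(chatgpt_answers.get(q) == answer_key[q] for q in answer_key)
print(f"overall: {overall}/{len(answer_key)} ({100 * overall / len(answer_key):.0f}%)")
```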

Language: English

Citations

43

Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology DOI Open Access
Mayank Agarwal, Priyanka Sharma,

Ayan Goswami

et al.

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: June 26, 2023

Background: Artificial intelligence (AI) is evolving within the medical education system. ChatGPT, Google Bard, and Microsoft Bing are AI-based models that can solve problems in education. However, the applicability of AI in creating reasoning-based multiple-choice questions (MCQs) in the field of physiology is yet to be explored. Objective: We aimed to assess and compare ChatGPT, Bard, and Bing in generating reasoning-based MCQs for MBBS (Bachelor of Medicine, Bachelor of Surgery) undergraduate students on the subject of physiology. Methods: The National Medical Commission of India has developed an 11-module curriculum with various competencies. Two physiologists independently chose a competency from each module. A third physiologist prompted all three AIs to generate five MCQs on each chosen competency. The two physiologists who provided the competencies rated the generated MCQs on a scale of 0-3 for validity, difficulty, and the reasoning ability required to answer them. We analyzed the average scores using the Kruskal-Wallis test for the distribution across total and module-wise responses, followed by post-hoc pairwise comparisons, and used Cohen's Kappa (Κ) to assess agreement between the raters. Data are expressed as median with interquartile range, and statistical significance was set at a p-value <0.05. Results: ChatGPT and Bard generated all 110 MCQs, whereas Bing generated only 100, as it failed to generate them for two of the chosen competencies. The median validity score was 3 (3-3) for ChatGPT versus 3 (1.5-3) for Bing, a significant difference (p<0.001) among the models. The median difficulty scores, 1 (0-1) and 1 (1-2), also differed significantly (p=0.006), while the reasoning scores showed no significant difference (p=0.235). Κ was ≥ 0.8 for all parameters. Conclusion: AI still needs to evolve, as the models showed certain limitations: Bing generated the significantly least valid MCQs, and the models also differed in the difficulty of the MCQs they produced.
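
A minimal sketch of the kind of analysis the abstract describes, assuming made-up 0-3 ratings rather than the study's data: a Kruskal-Wallis test across the three models' scores (SciPy), then Cohen's Kappa between the two raters (scikit-learn):

```python
# Sketch of the analysis pipeline with invented ratings, not the study's data.
from scipy.stats import kruskal
from sklearn.metrics import cohen_kappa_score

chatgpt_validity = [3, 3, 3, 2, 3, 3]   # illustrative rater scores per MCQ
bard_validity    = [3, 3, 2, 3, 3, 2]
bing_validity    = [2, 1, 3, 2, 2, 3]

stat, p = kruskal(chatgpt_validity, bard_validity, bing_validity)
print(f"Kruskal-Wallis H={stat:.2f}, p={p:.3f}")  # if significant, run post-hoc pairwise tests

rater_a = [3, 2, 3, 1, 0, 2]  # the same MCQs scored by each physiologist
rater_b = [3, 2, 3, 1, 1, 2]
print(f"Cohen's Kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```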

Language: English

Citations

50

Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology DOI Open Access
Anup Kumar D Dhanvijay, Mohammed Jaffer Pinjar,

Nitin Dhokane

et al.

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: Aug. 4, 2023

Background: Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. LLMs such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporation, WA, US) have been applied across various domains, demonstrating their potential to assist in solving complex tasks and improving information accessibility. However, their application to case vignettes in physiology has not been explored. This study aimed to assess the performance of three LLMs, namely ChatGPT (3.5; free research version), Google Bard (Experiment), and Microsoft Bing (Precise), in answering case vignettes in physiology. Methods: This cross-sectional study was conducted in July 2023. A total of 77 case vignettes were prepared by two physiologists and validated by two other content experts. The vignettes were presented to each LLM and the responses were collected. Two physiologists independently rated the answers provided by the LLMs based on accuracy. The ratings were measured on a scale from 0 to 4 according to the structure of the observed learning outcome (SOLO) taxonomy (pre-structural = 0, uni-structural = 1, multi-structural = 2, relational = 3, extended-abstract = 4). The scores among the LLMs were compared by Friedman's test, and inter-observer agreement was checked by the intraclass correlation coefficient (ICC). Results: The overall scores for ChatGPT, Bing, and Bard across the 77 cases were found to be 3.19±0.3, 2.15±0.6, and 2.91±0.5, respectively (p<0.0001). Hence, ChatGPT 3.5 (free version) obtained the highest score, Microsoft Bing (Precise) had the lowest, and Google Bard (Experiment) fell between the two in terms of performance. The average ICC values were 0.858 (95% CI: 0.777 to 0.91, p<0.0001), 0.975 (95% CI: 0.961 to 0.984, p<0.0001), and 0.964 (95% CI: 0.944 to 0.977, p<0.0001), respectively. Conclusion: ChatGPT outperformed Bard and Bing in solving case vignettes in physiology. Students and teachers may choose among LLMs for case-based learning accordingly, but further exploration of their capabilities is needed before adopting them in medical education to support learning and clinical decision-making.
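
A minimal sketch of the abstract's statistics, assuming invented SOLO scores in place of the study's data: Friedman's test compares the three LLMs on the same vignettes, and a Shrout-Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) checks inter-observer agreement:

```python
# Sketch with invented SOLO scores (0-4) per vignette, not the study's data.
import numpy as np
from scipy.stats import friedmanchisquare

chatgpt = [3, 4, 3, 3, 2, 4, 3]   # one score per vignette, same vignettes per model
bard    = [3, 3, 2, 3, 2, 3, 3]
bing    = [2, 2, 1, 3, 2, 2, 2]
stat, p = friedmanchisquare(chatgpt, bard, bing)
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) from the two-way ANOVA mean squares; ratings is (n_subjects, k_raters)."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-subjects MS
    ms_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-raters MS
    ss_total = ((ratings - grand) ** 2).sum()
    ms_err = (ss_total - (n - 1) * ms_rows - (k - 1) * ms_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

two_raters = np.array([[3.0, 3], [4, 4], [3, 2], [3, 3], [2, 2], [4, 3], [3, 3]])
print(f"ICC(2,1) = {icc_2_1(two_raters):.3f}")
```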

Language: English

Citations

48

Educational data augmentation in physics education research using ChatGPT DOI Creative Commons
Fabian Kieser, Peter Wulff, Jochen Kühn

et al.

Physical Review Physics Education Research, Journal Year: 2023, Volume and Issue: 19(2)

Published: Oct. 25, 2023

Generative AI technologies such as large language models show novel potential to enhance educational research. For example, generative language models were shown to be capable of solving quantitative reasoning tasks in physics and concept tests such as the Force Concept Inventory (FCI). Given the importance of concept inventories for physics education research, and the challenges of developing them, such as field testing with representative populations, this study seeks to examine to what extent a generative language model could be utilized to generate a synthetic dataset for the FCI that exhibits content-related variability in responses. We use the recently introduced ChatGPT based on GPT-4 to investigate to what extent the model can solve the FCI accurately (RQ1) and how it responds when prompted as if it were a student belonging to a different cohort (RQ2). Furthermore, we study its responses when prompted as if it were a student holding a certain force- and mechanics-related preconception (RQ3). In alignment with other research, we found that ChatGPT could solve the FCI accurately. We furthermore found that prompting it to respond to the inventory as if it belonged to a different cohort yielded no variance in responses; however, prompting it to respond as if holding a certain preconception introduced much variance, with responses that approximate real human responses in some regards.

DOI: https://doi.org/10.1103/PhysRevPhysEducRes.19.020150
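
A minimal sketch of the persona-prompting idea the study describes, using the OpenAI Python client; the model name, persona wording, and the single FCI-style item are illustrative assumptions, not the study's materials:

```python
# Sketch of persona-prompted synthetic responses to a concept-inventory item.
# The persona text, model name, and sample item are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

personas = [
    "a first-year university physics student",
    "a student who believes motion always implies a force in the direction of motion",
]

item = (
    "Two metal balls are the same size, but one weighs twice as much as the other. "
    "They are dropped from the same height at the same time. Which reaches the ground "
    "first? (A) the heavier ball (B) the lighter ball (C) both at about the same time"
)

for persona in personas:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer as if you were {persona}. Reply with one letter."},
            {"role": "user", "content": item},
        ],
    )
    print(persona, "->", response.choices[0].message.content)
```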

Language: English

Citations

45

Capacity of ChatGPT to Identify Guideline-Based Treatments for Advanced Solid Tumors DOI Open Access
Brian Schulte

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: April 21, 2023

ChatGPT, created by OpenAI, is a large language model which has become the fastest-growing consumer application in history, recognized for its expansive knowledge of varied subjects. The field of oncology is highly specialized and requires a nuanced understanding of medications and conditions. Herein, we sought to better qualify the ability of ChatGPT to name applicable treatments for patients with advanced solid cancers. This observational study was conducted utilizing ChatGPT. Its capacity to tabulate appropriate systemic therapies for new diagnoses of advanced solid malignancies was ascertained through standardized prompts. A ratio of the therapies it listed to those suggested by National Comprehensive Cancer Network (NCCN) guidelines was produced, called the valid therapy quotient (VTQ). Additional descriptive analyses of the VTQ's association with incidence and type of treatment were performed. Some 51 distinct diagnoses were utilized within this experiment. ChatGPT was able to identify 91 unique medications in response to prompts related to advanced solid tumors. The overall VTQ was 0.77. In all cases, ChatGPT was able to provide at least one example of a systemic therapy suggested by the NCCN. There was a weak association between the incidence of each malignancy and its VTQ. ChatGPT's ability to name medications used to treat advanced solid tumors indicates a level of concordance with NCCN guidelines. As it stands, its role in assisting oncologists with decision making remains unknown. Nonetheless, its accuracy and consistency in this domain may be anticipated to improve in future iterations, and further studies are needed to quantify its capabilities.
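
A minimal sketch of how a ratio like the VTQ could be computed; the abstract does not spell out the exact formula, so treating it as the fraction of guideline-endorsed therapies among those the model listed is an assumption, and the drug names are placeholders:

```python
# Sketch of a valid-therapy-quotient style ratio with placeholder drug names.
# The exact VTQ definition is assumed, not taken from the study.
nccn_therapies = {"drug_a", "drug_b", "drug_c", "drug_d"}  # guideline list (placeholder)
chatgpt_listed = {"drug_a", "drug_b", "drug_e"}            # model suggestions (placeholder)

valid = chatgpt_listed & nccn_therapies                    # therapies backed by the guideline
vtq = len(valid) / len(chatgpt_listed)
print(f"VTQ = {len(valid)}/{len(chatgpt_listed)} = {vtq:.2f}")
```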

Language: English

Citations

43
