Analyzing evaluation methods for large language models in the medical field: a scoping review DOI Creative Commons
Junbok Lee, Sungkyung Park, Jaeyong Shin

et al.

BMC Medical Informatics and Decision Making, Journal Year: 2024, Volume and Issue: 24(1)

Published: Nov. 29, 2024

Abstract Background: Owing to the rapid growth in the popularity of Large Language Models (LLMs), various performance evaluation studies have been conducted to confirm their applicability in the medical field. However, there is still no clear framework for evaluating LLMs. Objective: This study reviews studies on LLM evaluations in the medical field and analyzes the research methods used in these studies. It aims to provide a reference for future researchers designing LLM studies. Methods & materials: We conducted a scoping review of three databases (PubMed, Embase, and MEDLINE) to identify LLM-related articles published between January 1, 2023, and September 30, 2023. We analyzed the types of methods, number of questions (queries), evaluators, repeat measurements, additional analysis methods, use of prompt engineering, and metrics other than accuracy. Results: A total of 142 articles met the inclusion criteria. LLM evaluation was primarily categorized as either providing test examinations (n = 53, 37.3%) or being evaluated by a professional (n = 80, 56.3%), with some hybrid cases (n = 5, 3.5%) or a combination of the two (n = 4, 2.8%). Most studies had 100 or fewer questions (n = 18, 29.0%), 15 (24.2%) performed repeated measurements, 18 (29.0%) performed additional analyses, and 8 (12.9%) used prompt engineering. For professional assessment, most studies had 50 or fewer queries (n = 54, 64.3%), [...] evaluators (n = 43, 48.3%), and 14 (14.7%) [...]. Conclusions: More research is required regarding the application of LLMs in healthcare. Although previous studies have evaluated performance, future studies will likely focus on improving performance. A well-structured methodology is required for these studies to be conducted systematically.

Language: English

A scoping review of artificial intelligence in medical education: BEME Guide No. 84 DOI Creative Commons
Morris Gordon, Michelle Daniel, Aderonke Ajiboye

et al.

Medical Teacher, Journal Year: 2024, Volume and Issue: 46(4), P. 446 - 470

Published: Feb. 29, 2024

Background: Artificial Intelligence (AI) is rapidly transforming healthcare, and there is a critical need for a nuanced understanding of how AI is reshaping teaching, learning, and educational practice in medical education. This review aimed to map the literature regarding AI applications in medical education, core areas of findings, potential candidates for formal systematic review, and gaps for future research.

Language: English

Citations

86

The Potential Applications and Challenges of ChatGPT in the Medical Field DOI Creative Commons
Yonglin Mu, Dawei He

International Journal of General Medicine, Journal Year: 2024, Volume and Issue: Volume 17, P. 817 - 826

Published: March 1, 2024

ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use. This process aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.

Language: English

Citations

24

Evaluating the performance of ChatGPT in answering questions related to urolithiasis DOI
Hakan Çakır, Ufuk Çağlar, Oguzhan Yildiz

et al.

International Urology and Nephrology, Journal Year: 2023, Volume and Issue: 56(1), P. 17 - 21

Published: Sept. 2, 2023

Language: English

Citations

40

The Scientific Knowledge of Bard and ChatGPT in Endocrinology, Diabetes, and Diabetes Technology: Multiple-Choice Questions Examination-Based Performance DOI
Sultan Ayoub Meo, Thamir Al-khlaiwi, Abdulelah Adnan Abukhalaf

et al.

Journal of Diabetes Science and Technology, Journal Year: 2023, Volume and Issue: unknown

Published: Oct. 5, 2023

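The present study aimed to investigate the knowledge level of Bard and ChatGPT in the areas of endocrinology, diabetes, and diabetes technology through a multiple-choice question (MCQ) examination format. Initially, a 100-MCQ bank was established based on MCQs in endocrinology, diabetes, and diabetes technology. The MCQs were created from physiology and medical textbooks and academic pools. The study team members analyzed the MCQ contents to ensure that they were related to endocrinology, diabetes, and diabetes technology. The number of MCQs in endocrinology was 50, and in diabetes science and technology was also 50. The knowledge level of Google's Bard and ChatGPT was assessed with an MCQ-based examination. In the endocrinology section, ChatGPT obtained 29 marks (correct responses) of 50 (58%), and Bard obtained a similar score (58%). However, in the diabetes technology section, ChatGPT obtained 23 marks (46%), and Bard obtained 20 marks (40%). Overall, in the entire three-part examination, ChatGPT obtained 52 marks of 100 (52%), and Bard obtained 49 marks (49%). ChatGPT obtained slightly more marks than Bard. However, both ChatGPT and Bard did not achieve satisfactory scores in endocrinology or diabetes/technology of at least 60%. The overall MCQ-based performance of ChatGPT was slightly better than that of Bard. However, both ChatGPT and Bard did not achieve appropriate scores in endocrinology and diabetes/diabetes technology. The study indicates that Bard and ChatGPT have the potential to facilitate medical students and faculty in academic medical education settings, but both artificial intelligence tools need more updated information in the fields of endocrinology, diabetes, and diabetes technology.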

Language: English

Citations

29
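
The section and overall percentages reported in the abstract above can be checked with a few lines of arithmetic. This is an illustrative sketch only: the `percent` and `overall` helpers and the dictionary layout are assumptions made for the example, and the attribution of the 23 and 20 marks to ChatGPT and Bard respectively is inferred from the abstract's statement that ChatGPT scored slightly more than Bard.

```python
# Illustrative check of the marks quoted in the abstract (not the authors' code).
SECTION_SIZE = 50          # MCQs per section, as stated in the abstract
PASS_THRESHOLD = 60.0      # "satisfactory" level mentioned in the abstract (percent)

# Marks per section; the model-to-score assignment in the second section
# is an inference from "slightly more than Bard".
scores = {
    "ChatGPT": {"endocrinology": 29, "diabetes_technology": 23},
    "Bard": {"endocrinology": 29, "diabetes_technology": 20},
}


def percent(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * correct / total


def overall(model: str) -> int:
    """Sum a model's marks across both sections."""
    return sum(scores[model].values())


for model, sections in scores.items():
    for section, marks in sections.items():
        print(f"{model} {section}: {marks}/{SECTION_SIZE} = {percent(marks, SECTION_SIZE):.0f}%")
    total = overall(model)
    total_pct = percent(total, 2 * SECTION_SIZE)
    verdict = "meets" if total_pct >= PASS_THRESHOLD else "below"
    print(f"{model} overall: {total}/100 = {total_pct:.0f}% ({verdict} the 60% level)")
```

The section marks sum to the overall totals quoted in the abstract (29 + 23 = 52 and 29 + 20 = 49), and neither total reaches the 60% level.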

Utilizing ChatGPT in Telepharmacy DOI Open Access
Firas H. Bazzari, Amjad H. Bazzari

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 16, 2024

Background: ChatGPT is an artificial intelligence-powered chatbot that has demonstrated capabilities in numerous fields, including medical and healthcare sciences. This study evaluates the potential for ChatGPT application in telepharmacy, the delivering of pharmaceutical care via means of telecommunications, through assessing its interactions, adherence to instructions, and ability to role-play as a pharmacist while handling a series of life-like scenario questions. Methods: Two versions of ChatGPT (ChatGPT 3.5 and 4.0, OpenAI) were assessed using two independent trials each. ChatGPT was instructed to act as a pharmacist and answer patient inquiries, followed by a set of 20 assessment questions. Then, ChatGPT was instructed to stop the act, provide feedback, and list its sources for drug information. The responses to the assessment questions were evaluated in terms of accuracy, precision, and clarity using a 4-point Likert-like scale. Results: ChatGPT demonstrated the ability to follow detailed instructions, role-play as a pharmacist, and appropriately handle all questions. ChatGPT was able to understand case details, recognize generic and brand drug names, identify side effects, interactions, prescription requirements, and precautions, and provide proper point-by-point instructions regarding administration, dosing, storage, and disposal. The overall pooled scores were 3.425 (0.712) and 3.7 (0.61) for ChatGPT 3.5 and 4.0, respectively. The rank distribution of the scores was not significantly different (P>0.05). None of the answers could be considered directly harmful or labeled as entirely or mostly incorrect, and most point deductions were due to other factors such as indecisiveness, adding immaterial information, missing certain considerations, or partial unclarity. The answers were similar in length across trials and concise. ChatGPT 4.0 showed superior performance, higher consistency, better character adherence, and the ability to report various reliable information sources. However, it only allowed an input of 40 questions every three hours and provided inaccurate feedback regarding the number of patients, compared to 3.5, which allowed unlimited input but was unable to provide feedback. Conclusions: Integrating ChatGPT in telepharmacy holds promising potential; however, a number of drawbacks are to be overcome in order to function effectively.

Language: English

Citations

8

Healthcare professionals and the public sentiment analysis of ChatGPT in clinical practice DOI Creative Commons

Lizhen Lu, Yueli Zhu, Jiekai Yang

et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: Jan. 7, 2025

To explore the attitudes of healthcare professionals and the public on applying ChatGPT in clinical practice. The successful application of ChatGPT in clinical practice depends on technical performance and critically on the attitudes and perceptions of non-healthcare and healthcare professionals. This study has a qualitative design based on artificial intelligence. The study was divided into five steps: data collection, data cleaning, validation of relevance, sentiment analysis, and content analysis using the K-means algorithm. The data comprised 3130 comments amounting to 1,593,650 words. The dictionary method showed positive and negative emotions such as anger, disgust, fear, sadness, surprise, good, and happy emotions. Healthcare professionals prioritized ChatGPT's efficiency but raised ethical and accountability concerns, while the public valued its accessibility and emotional support but expressed worries about privacy and misinformation. Bridging these perspectives by improving reliability, safeguarding privacy, and clearly defining ChatGPT's role is essential for its practical integration.

Language: English

Citations

1

The Role of Artificial Intelligence and Emerging Technologies in Advancing Total Hip Arthroplasty DOI Open Access
Luca Andriollo, Aurelio Picchi, Giulio Iademarco

et al.

Journal of Personalized Medicine, Journal Year: 2025, Volume and Issue: 15(1), P. 21 - 21

Published: Jan. 9, 2025

Total hip arthroplasty (THA) is a widely performed surgical procedure that has evolved significantly due to advancements in artificial intelligence (AI) and robotics. As demand for THA grows, reliable tools are essential to enhance diagnosis, preoperative planning, surgical precision, and postoperative rehabilitation. AI applications in orthopedic surgery offer innovative solutions, including automated osteoarthritis (OA) diagnosis, precise implant positioning, and personalized risk stratification, thereby improving patient outcomes. Deep learning models have transformed OA severity grading and identification by automating traditionally manual processes with high accuracy. Additionally, AI-powered systems optimize preoperative planning by predicting the joint center and identifying complications using multimodal data. Robotic-assisted THA enhances surgical precision with real-time feedback, reducing complications such as dislocations and leg length discrepancies while accelerating recovery. Despite these advancements, barriers such as cost, accessibility, and the steep learning curve for surgeons hinder widespread adoption. Postoperative rehabilitation benefits from technologies like virtual and augmented reality and telemedicine, which enhance patient engagement and adherence. However, limitations, particularly among elderly populations with lower adaptability to technology, underscore the need for user-friendly platforms. To ensure comprehensiveness, a structured literature search was conducted using PubMed, Scopus, and Web of Science. Keywords included "artificial intelligence", "machine learning", "robotics", and "total hip arthroplasty". Inclusion criteria emphasized peer-reviewed studies published in English within the last decade focusing on technological and clinical advancements. This review evaluates AI and robotics' role in THA, highlighting opportunities and challenges and emphasizing further research and real-world validation to integrate these technologies into clinical practice effectively.

Language: English

Citations

1

Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey DOI Creative Commons
Büşra Tosun, Zeliha Yılmaz

Journal of Dental Sciences, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Language: English

Citations

1

Evaluating ChatGPT-3.5 and Claude-2 in Answering and Explaining Conceptual Medical Physiology Multiple-Choice Questions DOI Open Access
Mayank Agarwal, Ayan Goswami, Priyanka Sharma

et al.

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: Sept. 29, 2023

Background: Generative artificial intelligence (AI) systems such as ChatGPT-3.5 and Claude-2 may assist in explaining complex medical science topics. A few studies have shown that AI can solve complicated physiology problems that require critical thinking and analysis. However, further studies are required to validate the effectiveness of AI in answering conceptual multiple-choice questions (MCQs) in human physiology. Objective: This study aimed to evaluate and compare the proficiency of ChatGPT-3.5 and Claude-2 in answering and explaining a curated set of MCQs in medical physiology. Methods: In this cross-sectional study, a set of 55 MCQs from 10 competencies of medical physiology was purposefully constructed that required comprehension, problem-solving, and analytical skills to solve them. The MCQs and a structured prompt for response generation were presented to ChatGPT-3.5 and Claude-2. The explanations provided by both AI systems were documented in an Excel spreadsheet. All three authors subjected these explanations to a rating process using a scale of 0 to 3. A rating of 0 was assigned to an incorrect explanation, 1 to a partially correct explanation, 2 to a correct explanation with some aspects missing, and 3 to a perfectly correct explanation. Both AI models were evaluated for their ability to choose the correct answer (option) and provide clear and comprehensive explanations of the MCQs. The Mann-Whitney U test was used to compare the AI responses. The Fleiss multi-rater kappa (κ) was used to determine the score agreement among the three raters. The statistical significance level was decided at P ≤ 0.05. Results: Claude-2 answered 40 MCQs correctly, which was significantly higher than the 26 correct responses from ChatGPT-3.5. The rating distribution of the explanations generated by Claude-2 was also higher than that of ChatGPT-3.5. The κ values were 0.804 and 0.818 for Claude-2 and ChatGPT-3.5, respectively. Conclusion: In terms of answering and elucidating conceptual MCQs in medical physiology, Claude-2 surpassed ChatGPT-3.5. However, accessing Claude-2 from India requires the use of a virtual private network, which may raise security concerns.

Language: English

Citations

20
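
The abstract above reports inter-rater agreement as a Fleiss multi-rater kappa over explanation ratings on a 0 to 3 scale. As a minimal sketch of how such a statistic is computed (not the authors' analysis code), the function below implements the standard Fleiss formula in plain Python. The function name `fleiss_kappa`, its input layout, and the toy ratings are all assumptions made for illustration; none of the study's data is reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Compute Fleiss' kappa.

    ratings: list of items, each a list with one category label per rater
             (every item must be rated by the same number of raters).
    categories: list of all possible category labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who put item i into category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over items of the per-item agreement P_i
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: three hypothetical raters scoring four explanations on 0-3
toy_ratings = [[3, 3, 3], [2, 2, 3], [0, 0, 0], [1, 2, 1]]
print(round(fleiss_kappa(toy_ratings, [0, 1, 2, 3]), 3))
```

Values near 1 indicate strong agreement among raters, which is how the κ values of roughly 0.8 quoted in the abstract would usually be read.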

Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy DOI Creative Commons
Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi

et al.

Advances in Medical Education and Practice, Journal Year: 2024, Volume and Issue: Volume 15, P. 393 - 400

Published: May 1, 2024

Introduction: This research investigated the capabilities of ChatGPT-4 compared to medical students in answering MCQs using the revised Bloom's Taxonomy as a benchmark. Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing. Results: The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) compared to students (66.7%). Course type significantly affected ChatGPT-4's performance, but revised Bloom's Taxonomy levels did not. A detailed association check between program levels and Bloom's taxonomy levels for correct answers by ChatGPT-4 showed a highly significant correlation (p < 0.001), reflecting a concentration of "remember-level" questions in preclinical courses and "evaluate-level" questions in clinical courses. Discussion: The study highlights ChatGPT-4's proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content. Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI's impact on medical education and student performance across courses. Keywords: artificial intelligence, ChatGPT-4's, medical students, interpretation abilities, multiple choice questions

Language: English

Citations

8