
JMIR Formative Research, Journal Year: 2024, Issue: unknown
Published: June 30, 2024
The increasing use of ChatGPT in clinical practice and medical education necessitates the evaluation of its reliability, particularly in geriatrics.
Language: English
Interactive Journal of Medical Research, Journal Year: 2024, Issue: 13, pp. e54704 - e54704
Published: Jan. 26, 2024
Background Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. Methods A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to assess interrater reliability. Results The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, ranging from 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). Conclusions The METRICS checklist can facilitate guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting of generative AI-based studies in health care, considering the variability observed across studies. The proposed METRICS checklist could be a helpful base for a universally accepted approach to this swiftly evolving research topic.
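The checklist scoring above was validated with Cohen's κ between 2 independent raters. A minimal stdlib-only sketch of how that agreement statistic is computed (the ratings below are hypothetical illustration data, not the study's):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's
    marginal label frequencies.
    """
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # expected agreement by chance from the marginal distributions
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 rubric ratings of 10 records on one checklist item
rater_a = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
print(round(cohen_kappa(rater_a, rater_b), 3))  # → 0.714
```

Values near 1 indicate agreement well beyond chance; the 0.558-0.962 range reported above would correspond to moderate to almost-perfect agreement under common benchmarks.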
Language: English
Cited by: 23
npj Digital Medicine, Journal Year: 2024, Issue: 7(1)
Published: Sep. 28, 2024
Abstract With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews the existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis type. Our review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to LLM developments and deployments, we propose QUEST, a comprehensive and practical framework covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Language: English
Cited by: 13
Narra J, Journal Year: 2024, Issue: 4(2), pp. e917 - e917
Published: Aug. 5, 2024
Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications in the field of ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search in three databases, Scopus, Web of Science, and Google Scholar, to identify ChatGPT-related records in healthcare education, research, and practice through November 27, 2023. The ranking was based on the retrieved citation count in each database. Additional alternative metrics that were evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) social media mentions, and (5) Altmetric Attention Scores (AASs). A total of 22 unique records published in 17 different scientific journals from 14 publishers were identified across the databases. Only two records were in the top 10 list across all three databases. Variable publication types were identified, with the most common being editorial/commentary records (n=8/22, 36.4%). Nine records had corresponding authors affiliated with institutions in the United States (40.9%). The range of citation counts varied per database, with the highest in Google Scholar (1019-121), followed by Scopus (242-88) and Web of Science (171-23). Google Scholar citations correlated significantly with the following metrics (Spearman's correlation coefficient ρ=0.840,
Language: English
Cited by: 9
Cureus, Journal Year: 2024, Issue: unknown
Published: July 25, 2024
Musculoskeletal disorders (MSDs) are a leading cause of disability worldwide, with a growing burden across all demographics. With advancements in technology, conversational artificial intelligence (AI) platforms such as ChatGPT (OpenAI, San Francisco, CA) have become instrumental in disseminating health information. This study evaluated the effectiveness of ChatGPT versions 3.5 and 4 in delivering primary prevention information for common MSDs, emphasizing information that is focused on prevention, not diagnosis.
Language: English
Cited by: 8
Advances in Medical Education and Practice, Journal Year: 2024, Volume 15, pp. 857 - 871
Published: Sep. 1, 2024
Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models in various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences.
Language: English
Cited by: 6
Cureus, Journal Year: 2024, Issue: unknown
Published: July 1, 2024
Background: Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and Google Bard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. Methods: We created 70 questions on symptoms and diseases following the guidelines. Among them, 58 addressed a single symptom (SS) and 12 multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and Google Bard, and the responses were evaluated by two authors for accuracy, completeness, and relevance (ACR) using 5-point rubric criteria. Results: Cohen's kappa values (0.60-0.81) indicated significant agreement between the authors. The average scores ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for Google Bard on SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for Google Bard on MS questions. The ratings of these responses ranged between 'good' and 'excellent'. Most responses effectively conveyed the urgency of the situation (93.1% for ChatGPT-3.5, 94.8% for Google Bard), and on MS questions all did so. No statistically significant differences were found between the two models (p>0.05). Conclusions: In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems may play a vital role in delivering precise medical information. These technologies may hold promise in this field if they continue to improve.
Language: English
Cited by: 5
BMC Infectious Diseases, Journal Year: 2024, Issue: 24(1)
Published: Aug. 8, 2024
Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to accurate information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries.
Language: English
Cited by: 4
BMC Research Notes, Journal Year: 2024, Issue: 17(1)
Published: Sep. 3, 2024
Language: English
Cited by: 4
Cureus, Journal Year: 2025, Issue: unknown
Published: Jan. 1, 2025
Background Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of ChatGPT-4o and Google Gemini in addressing commonly asked questions about BCMs. Methods Thirty questions, derived from the American College of Obstetricians and Gynecologists (ACOG) website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a five-point rubric assessing Relevance, Completeness, and Lack of False Information (RCL). Overall scores were calculated by averaging the scores. Statistical analysis, including the Wilcoxon Signed-Rank test, Friedman test, and Kruskal-Wallis test, was performed to compare the metrics. Results Both platforms provided high-quality responses to birth control-related queries, with overall scores of 4.38 ± 0.58 and 4.37 ± 0.52, respectively, categorized as "very good" to "excellent." One platform demonstrated a higher lack-of-false-information score based on descriptive statistics (4.70 ± 0.60 vs. 4.47 ± 0.73), while the other outperformed in relevance, with a statistically significant difference (4.53 ± 0.57 vs. 4.30 ± 0.70, p = 0.035, large effect size). Completeness was comparable (p = 0.655). Category analyses revealed no significant differences (p = 0.548), though a potential trend toward stronger performance in the "Other Topics" category was observed. Within-model variability showed that one model had more pronounced differences across metrics (moderate effect size, Kendall's W = 0.357), while the other exhibited smaller variability (Kendall's W = 0.165). These findings suggest that both platforms offer reliable and complementary tools to address knowledge gaps, with nuanced strengths that warrant further exploration. Conclusions Both platforms provided accurate BCM-related information with slight differences in strengths. The findings underscore the potential of AI tools to address public health information needs, particularly for individuals seeking guidance on contraception. Further studies with larger datasets may elucidate the differences between the two platforms.
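The paired comparison above relies on the Wilcoxon Signed-Rank test for rubric scores from the two models on the same questions. A stdlib-only sketch using the normal approximation (no tie correction; the paired scores below are hypothetical, not the study's data):

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired samples.

    Returns (W+, two-sided p) using the normal approximation,
    without tie correction; suitable only as an illustration for
    moderate sample sizes.
    """
    # drop zero differences, as the standard procedure does
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # rank |differences|, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # sum of ranks of positive differences
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, p

# Hypothetical paired relevance scores for two models on 8 questions
model_a = [5, 4, 5, 4, 5, 4, 5, 4]
model_b = [4, 4, 5, 3, 4, 4, 5, 3]
w, p = wilcoxon_signed_rank(model_a, model_b)
print(w, round(p, 3))
```

In practice `scipy.stats.wilcoxon` would be preferred, since it applies tie corrections and exact small-sample distributions; the sketch only shows the mechanics behind the reported p-values.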
Language: English
Cited by: 0
Journal of Lifestyle and SDGs Review, Journal Year: 2025, Issue: 5(2), pp. e04109 - e04109
Published: Jan. 23, 2025
Bosch's societal setting shaped his works: times of religious, social, and cultural transformation abound in The Garden of Earthly Delights and The Last Judgment. Objective: This study aims to investigate and evaluate the sociocultural and psychological characteristics of Bosch's paintings. Method: The study used an analysis checklist to analyze the paintings through thematic analysis. Results and Discussion: Through these conversations, the establishment of social norms and their impact on artistic output is revealed, broadening the understanding of the complex and multifaceted standard of living in fifteenth-century Holland. Originality/Value: As Bosch depicted symbols of discourse in his paintings, people were inspired to reflect deeply on the complexities that stood out in the 15th century. These insights illuminate historical standards and emphasize the importance of sustainability and the Sustainable Development Goals. The study indicates that art stimulates discussions around sustainable lifestyles and resilience.
Language: English
Cited by: 0