
JMIR Formative Research, Journal Year: 2024, Issue: unknown
Published: June 30, 2024
The increasing use of ChatGPT in clinical practice and medical education necessitates the evaluation of its reliability, particularly in geriatrics.
Language: English
Interactive Journal of Medical Research, Journal Year: 2024, Issue: 13, pp. e54704 - e54704
Published: Jan. 26, 2024
Background Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. Methods A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to assess interrater reliability. Results The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, ranging from 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). Conclusions The METRICS checklist can facilitate guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting of generative AI-based studies in health care, considering the variability observed across studies. The proposed METRICS checklist could be a helpful base for a universally accepted approach to this swiftly evolving research topic.
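The checklist scoring above was validated with Cohen's κ between 2 independent raters. A minimal stdlib-only sketch of how that agreement statistic is computed (the ratings below are hypothetical illustration data, not the study's):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's
    marginal label frequencies.
    """
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # expected agreement by chance from the marginal distributions
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 rubric ratings of 10 records on one checklist item
rater_a = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
print(round(cohen_kappa(rater_a, rater_b), 3))  # → 0.714
```

Values near 1 indicate agreement well beyond chance; the 0.558-0.962 range reported above would correspond to moderate to almost-perfect agreement under common benchmarks.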
Language: English
Cited by: 23
npj Digital Medicine, Journal Year: 2024, Issue: 7(1)
Published: Sep. 28, 2024
Abstract With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews the existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis type. Our review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to LLM developments and deployments, we propose QUEST, a comprehensive and practical framework covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Language: English
Cited by: 13
Narra J, Journal Year: 2024, Issue: 4(2), pp. e917 - e917
Published: Aug. 5, 2024
Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications in the field of ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search in three databases, Scopus, Web of Science, and Google Scholar, to identify ChatGPT-related records in healthcare education, research, and practice through November 27, 2023. The ranking was based on the retrieved citation count in each database. Additional alternative metrics that were evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) social media mentions, and (5) Altmetric Attention Scores (AASs). A total of 22 unique records published in 17 different scientific journals from 14 publishers were identified across the databases. Only two records were in the top 10 list across all three databases. Variable publication types were identified, with the most common being editorial/commentary records (n=8/22, 36.4%). Nine records had corresponding authors affiliated with institutions in the United States (40.9%). The range of citation counts varied per database, with the highest in Google Scholar (1019-121), followed by Scopus (242-88) and Web of Science (171-23). Google Scholar citations correlated significantly with the following metrics (Spearman's correlation coefficient ρ=0.840,
Language: English
Cited by: 9
Cureus, Journal Year: 2024, Issue: unknown
Published: July 25, 2024
Musculoskeletal disorders (MSDs) are a leading cause of disability worldwide, with a growing burden across all demographics. With advancements in technology, conversational artificial intelligence (AI) platforms such as ChatGPT (OpenAI, San Francisco, CA) have become instrumental in disseminating health information. This study evaluated the effectiveness of ChatGPT versions 3.5 and 4 in delivering primary prevention information for common MSDs, emphasizing information that is focused on prevention, not diagnosis.
Language: English
Cited by: 8
Advances in Medical Education and Practice, Journal Year: 2024, Volume 15, pp. 857 - 871
Published: Sep. 1, 2024
Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models in various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences.
Language: English
Cited by: 6
Cureus, Journal Year: 2024, Issue: unknown
Published: July 1, 2024
Background: Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and Google Bard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. Methods: We created 70 questions on symptoms and diseases following the guidelines. Among them, 58 addressed a single symptom (SS) and 12 multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and Google Bard, and the responses were evaluated by two authors for accuracy, completeness, and relevance (ACR) using 5-point rubric criteria. Results: Cohen's kappa values (0.60-0.81) indicated significant agreement between the authors. The average scores ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for Google Bard on SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for Google Bard on MS questions. The ratings of these responses ranged between 'good' and 'excellent'. Most responses effectively conveyed the urgency of the situation (93.1% for ChatGPT-3.5, 94.8% for Google Bard), and on MS questions all did so. No statistically significant differences were found between the two models (p>0.05). Conclusions: In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems may play a vital role in delivering precise medical information. These technologies may hold promise in this field if they continue to improve.
Language: English
Cited by: 5
BMC Infectious Diseases, Journal Year: 2024, Issue: 24(1)
Published: Aug. 8, 2024
Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to accurate information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries.
Language: English
Cited by: 4
BMC Research Notes, Journal Year: 2024, Issue: 17(1)
Published: Sep. 3, 2024
Language: English
Cited by: 4
Cureus, Journal Year: 2025, Issue: unknown
Published: Jan. 1, 2025
Background Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of ChatGPT-4o and Google Gemini in addressing commonly asked questions about BCMs. Methods Thirty questions, derived from the American College of Obstetricians and Gynecologists (ACOG) website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a five-point rubric assessing Relevance, Completeness, and Lack of False Information (RCL). Overall scores were calculated by averaging the scores. Statistical analysis, including the Wilcoxon Signed-Rank test, Friedman test, and Kruskal-Wallis test, was performed to compare the metrics. Results Both platforms provided high-quality responses to birth control-related queries, with overall scores of 4.38 ± 0.58 and 4.37 ± 0.52, respectively, categorized as "very good" to "excellent." One platform demonstrated a higher lack-of-false-information score based on descriptive statistics (4.70 ± 0.60 vs. 4.47 ± 0.73), while the other outperformed in relevance, with a statistically significant difference (4.53 ± 0.57 vs. 4.30 ± 0.70, p = 0.035, large effect size). Completeness was comparable (p = 0.655). Category analyses revealed no significant differences (p = 0.548), though a potential trend toward stronger performance in the "Other Topics" category was observed. Within-model variability showed that one model had more pronounced differences across metrics (moderate effect size, Kendall's W = 0.357), while the other exhibited smaller variability (Kendall's W = 0.165). These findings suggest that both platforms offer reliable and complementary tools to address knowledge gaps, with nuanced strengths that warrant further exploration. Conclusions Both platforms provided accurate BCM-related information with slight differences in strengths. The findings underscore the potential of AI tools to address public health information needs, particularly for individuals seeking guidance on contraception. Further studies with larger datasets may elucidate the differences between the two platforms.
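The paired comparison above relies on the Wilcoxon Signed-Rank test for rubric scores from the two models on the same questions. A stdlib-only sketch using the normal approximation (no tie correction; the paired scores below are hypothetical, not the study's data):

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired samples.

    Returns (W+, two-sided p) using the normal approximation,
    without tie correction; suitable only as an illustration for
    moderate sample sizes.
    """
    # drop zero differences, as the standard procedure does
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # rank |differences|, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # sum of ranks of positive differences
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, p

# Hypothetical paired relevance scores for two models on 8 questions
model_a = [5, 4, 5, 4, 5, 4, 5, 4]
model_b = [4, 4, 5, 3, 4, 4, 5, 3]
w, p = wilcoxon_signed_rank(model_a, model_b)
print(w, round(p, 3))
```

In practice `scipy.stats.wilcoxon` would be preferred, since it applies tie corrections and exact small-sample distributions; the sketch only shows the mechanics behind the reported p-values.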
Language: English
Cited by: 0
Journal of Lifestyle and SDGs Review, Journal Year: 2025, Issue: 5(2), pp. e04109 - e04109
Published: Jan. 23, 2025
Bosch's societal setting shaped his works: times of religious, social, and cultural transformation abound in The Garden of Earthly Delights and The Last Judgment. Objective: This study aims to investigate and evaluate the sociocultural and psychological characteristics of Bosch's paintings. Method: The study used an analysis checklist to analyze the paintings through thematic analysis. Results and Discussion: Through these conversations, the establishment of social norms and their impact on artistic output is revealed, broadening the understanding of the complex and multifaceted standard of living in fifteenth-century Holland. Originality/Value: As Bosch depicted symbols of discourse in his paintings, people were inspired to reflect deeply on the complexities that stood out in the 15th century. These insights illuminate historical standards and emphasize the importance of sustainability and the Sustainable Development Goals. The study indicates that art stimulates discussions around sustainable lifestyles and resilience.
Language: English
Cited by: 0