Application of ChatGPT to Geriatric Practice and Education - An Exploratory Study on ChatGPT's Geriatric Attitude, Knowledge, and Clinical Application (Preprint)

Huai Yong Cheng

JMIR Formative Research, Journal year: 2024, Issue: unknown

Published: June 30, 2024

The increasing use of ChatGPT in clinical practice and medical education necessitates an evaluation of its reliability, particularly in geriatrics.

Language: English

A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review
Malik Sallam, Muna Barakat, Mohammed Sallam

et al.

Interactive Journal of Medical Research, Journal year: 2024, Issue: 13, pp. e54704

Published: Jan. 26, 2024

Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective: This study aimed to develop a preliminary checklist to standardize the design and reporting of studies on generative AI-based models in health care education and practice. Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the design and reporting of studies on generative AI in health care. The finalized checklist was used to evaluate the included records by 2 independent raters, with Cohen κ as the method to assess interrater reliability. Results: The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, with a range of 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). Conclusions: The METRICS checklist can facilitate guiding researchers toward best practices in reporting results. The findings highlight the need for standardized algorithms for generative AI-based models in health care, considering the variability observed. The proposed checklist could be a helpful base for a universally accepted approach in this swiftly evolving research topic.
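The interrater reliability reported above is measured with Cohen's κ, which discounts the agreement two raters would reach by chance. A minimal pure-Python sketch (the two rating vectors below are illustrative, not the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with categories."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's label marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters scoring 10 records on a 3-level scale:
a = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
b = [1, 2, 3, 1, 2, 1, 1, 2, 3, 2]
print(round(cohens_kappa(a, b), 3))  # → 0.697
```

Values near 1 indicate agreement well beyond chance; 0 means agreement no better than chance, which is why κ ranges like the 0.558-0.962 reported here are read as acceptable.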

Language: English

Cited by

23

A framework for human evaluation of large language models in healthcare derived from literature review

Thomas Yu Chow Tam,

Sonish Sivarajkumar,

Sumit Kapoor

et al.

npj Digital Medicine, Journal year: 2024, Issue: 7(1)

Published: Sep. 28, 2024

Abstract: With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews the existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis type. Our review of 142 studies shows gaps in the reliability, generalizability, and applicability of current practices. To overcome such significant obstacles to healthcare LLM development and deployment, we propose QUEST, a comprehensive and practical framework covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.

Language: English

Cited by

13

Bibliometric top ten healthcare-related ChatGPT publications in the first ChatGPT anniversary
Malik Sallam

Narra J, Journal year: 2024, Issue: 4(2), pp. e917

Published: Aug. 5, 2024

Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications in the field of ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search in three databases (Scopus, Web of Science, and Google Scholar) to identify ChatGPT-related records in healthcare education, research, and practice between 27 … 2023. The ranking was based on the citation count retrieved from each database. Additional alternative metrics that were evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) social media mentions, and (5) Altmetric Attention Scores (AASs). A total of 22 unique records published in 17 different scientific journals from 14 publishers were identified across the three databases. Only two records were among the top 10 list across all three databases. Variable publication types were identified, with the most common being editorial/commentary (n=8/22, 36.4%). Nine records had corresponding authors affiliated with institutions in the United States (40.9%). The range of citation counts varied per database, with the highest in Google Scholar (1019-121), followed by Scopus (242-88) and Web of Science (171-23). The citation counts correlated significantly with the following metrics: Semantic Scholar highly influential citations (Spearman's correlation coefficient ρ=0.840, …
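The citation-altmetric agreement above is quantified with Spearman's ρ, the Pearson correlation computed on ranks rather than raw values. A minimal pure-Python sketch (the citation and altmetric numbers in the example are illustrative, not the study's data):

```python
def rank(values):
    """1-based ranks of a list, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values, then assign their average rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical citation counts vs. an altmetric score for five records:
citations = [1019, 242, 171, 150, 121]
altmetric = [900, 300, 100, 160, 90]
print(round(spearman_rho(citations, altmetric), 3))  # → 0.9
```

Because only the ordering matters, ρ is robust to the skewed, heavy-tailed distributions typical of citation counts, which is why bibliometric studies prefer it over Pearson's r.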

Language: English

Cited by

9

The Role of Artificial Intelligence in the Primary Prevention of Common Musculoskeletal Diseases

Selkin Yilmaz Muluk,

Nazli Olcucu

Cureus, Journal year: 2024, Issue: unknown

Published: July 25, 2024

Musculoskeletal disorders (MSDs) are a leading cause of disability worldwide, with a growing burden across all demographics. With advancements in technology, conversational artificial intelligence (AI) platforms such as ChatGPT (OpenAI, San Francisco, CA) have become instrumental in disseminating health information. This study evaluated the effectiveness of ChatGPT versions 3.5 and 4 in delivering primary prevention information for common MSDs, emphasizing information focused on prevention rather than diagnosis.

Language: English

Cited by

8

Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions
Malik Sallam,

Khaled Al‐Salahat,

Huda Eid

et al.

Advances in Medical Education and Practice, Journal year: 2024, Issue: Volume 15, pp. 857-871

Published: Sep. 1, 2024

Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models on various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences.

Language: English

Cited by

6

Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain

Selkin Yilmaz Muluk,

Nazli Olcucu

Cureus, Journal year: 2024, Issue: unknown

Published: July 1, 2024

Background: Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions marked by 'red flags' (RF), such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and GoogleBard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. Methods: We created 70 questions on RF symptoms and diseases following guidelines. Among them, 58 addressed a single symptom (SS) and 12 addressed multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were rated by two authors for accuracy, completeness, and relevance (ACR) using 5-point rubric criteria. Results: Cohen's kappa values (0.60-0.81) indicated significant agreement between the authors. The average scores ranged from 3.47 to 3.85 for ChatGPT-3.5 and 3.36 to 3.76 for GoogleBard on SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and 3.50 to 3.71 for GoogleBard on MS questions. The ratings for these responses ranged from 'good' to 'excellent'. Most responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), though not all did so. No statistically significant differences were found between the platforms (p>0.05). Conclusions: In an era characterized by widespread information seeking, artificial intelligence (AI) systems can play a vital role in delivering precise medical information. These technologies may hold promise in the field if they continue to improve.

Language: English

Cited by

5

Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic
Malik Sallam,

Kholoud Al-Mahzoum,

Omaima Alshuaib

et al.

BMC Infectious Diseases, Journal year: 2024, Issue: 24(1)

Published: Aug. 8, 2024

Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to, and accuracy of, information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries.

Language: English

Cited by

4

The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses
Malik Sallam,

Kholoud Al-Mahzoum,

Rawan Ahmad Almutawaa

et al.

BMC Research Notes, Journal year: 2024, Issue: 17(1)

Published: Sep. 3, 2024

Language: English

Cited by

4

A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods

Erhan Muluk

Cureus, Journal year: 2025, Issue: unknown

Published: Jan. 1, 2025

Background: Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of ChatGPT-4o and Google Gemini in addressing commonly asked questions about BCMs. Methods: Thirty questions, derived from the American College of Obstetricians and Gynecologists (ACOG) website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a five-point rubric assessing Relevance, Completeness, and Lack of False Information (RCL). Overall scores were calculated by averaging the metric scores. Statistical analysis, including the Wilcoxon Signed-Rank test, the Friedman test, and the Kruskal-Wallis test, was performed to compare metrics. Results: Both platforms provided high-quality responses to birth control-related queries, with overall scores of 4.38 ± 0.58 and 4.37 ± 0.52, respectively, categorized as "very good" to "excellent." One platform demonstrated a higher lack-of-false-information score based on descriptive statistics (4.70 ± 0.60 vs. 4.47 ± 0.73), while the other outperformed in relevance, with a statistically significant difference (4.53 ± 0.57 vs. 4.30 ± 0.70, p = 0.035, large effect size). Completeness was comparable (p = 0.655). Category-level analyses revealed no significant differences (p = 0.548), though a potential trend toward stronger performance in the "Other Topics" category was observed. Within-model variability showed that one model had more pronounced differences across metrics (moderate effect size, Kendall's W = 0.357), while the other exhibited smaller variability (Kendall's W = 0.165). These findings suggest that the platforms offer reliable, complementary tools for addressing knowledge gaps, with nuanced strengths that warrant further exploration. Conclusions: ChatGPT-4o and Google Gemini provided accurate BCM-related responses with slight differences in strengths. The findings underscore the value of AI tools for public health information needs, particularly for individuals seeking guidance on contraception. Further studies with larger datasets may elucidate differences between the platforms.
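The within-model variability above is quantified with Kendall's coefficient of concordance (W), which measures how consistently the same items are ordered across raters or repeated metrics. A minimal pure-Python sketch (simplified: no tie-correction term in the denominator; the score matrix in the example is illustrative, not the study's data):

```python
def kendalls_w(score_rows):
    """Kendall's W for m raters (rows) scoring the same n items (columns).

    Simplified sketch: tied scores get averaged ranks, but the tie
    correction term in the denominator is omitted.
    """
    m, n = len(score_rows), len(score_rows[0])

    def to_ranks(row):
        # Convert raw scores to 1-based ranks, averaging ranks over ties.
        order = sorted(range(n), key=lambda j: row[j])
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and row[order[j + 1]] == row[order[i]]:
                j += 1
            for k in range(i, j + 1):
                ranks[order[k]] = (i + j) / 2 + 1
            i = j + 1
        return ranks

    rank_rows = [to_ranks(row) for row in score_rows]
    # Sum of ranks each item received across all raters.
    col_sums = [sum(r[j] for r in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((c - mean_sum) ** 2 for c in col_sums)
    # W = 1 for perfect agreement, 0 for no agreement.
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three hypothetical raters ranking four responses identically:
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

W near 0 (as in the 0.165 reported for one model) indicates the per-metric scores barely reorder the items, i.e. low variability across metrics, while larger W (0.357) signals more pronounced metric-to-metric differences.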

Language: English

Cited by

0

Imagining Reality: Bosch's Vision of 15th-Century Dutch Art in Reflection of SDG and Cultural Sustainability
Ding Wen,

Shahrul Anuar Shaari,

Chandy Chin

et al.

Journal of Lifestyle and SDGs Review, Journal year: 2025, Issue: 5(2), pp. e04109

Published: Jan. 23, 2025

Bosch's societal setting, a time of religious, social, and cultural transformation, abounds in The Garden of Earthly Delights and The Last Judgment. Objective: This study aims to investigate and evaluate the sociocultural and psychological characteristics of Bosch's paintings. Method: The study used an analysis checklist to examine the paintings through thematic analysis. Results and Discussion: Through these conversations, the establishment of social norms and their impact on artistic output is revealed, broadening understanding of the complex and multifaceted standard of living in fifteenth-century Holland. Originality/Value: As Bosch depicted symbols of social discourse in his paintings, people were inspired to reflect deeply on the complexities that stood out in the 15th century. These insights illuminate historical standards and emphasize the importance of cultural sustainability and the Sustainable Development Goals. The study indicates that art stimulates discussions around sustainable lifestyles and resilience.

Language: English

Cited by

0