European Archives of Oto-Rhino-Laryngology, Journal year: 2024, Issue: 281(5), Pages: 2723 - 2731
Published: Feb. 23, 2024
Language: English
JAMA Network Open, Journal year: 2024, Issue: 7(3), Pages: e243201 - e243201
Published: March 20, 2024
Importance The emergence and promise of generative artificial intelligence (AI) represent a turning point for health care. Rigorous evaluation of generative AI deployment in clinical practice is needed to inform strategic decision-making. Objective To evaluate the implementation of a large language model used to draft responses to patient messages in the electronic inbox. Design, Setting, and Participants A 5-week, prospective, single-group quality improvement study was conducted from July 10 through August 13, 2023, at a single academic medical center (Stanford Health Care). All attending physicians, advanced practice practitioners (APPs), clinic nurses, and pharmacists in the Divisions of Primary Care and Gastroenterology and Hepatology were enrolled in the pilot. Intervention Draft replies to patient portal messages generated by a Health Insurance Portability and Accountability Act–compliant, electronic health record–integrated large language model. Main Outcomes and Measures The primary outcome was AI-generated draft reply utilization as a percentage of total message replies. Secondary outcomes included changes in time measures and clinician experience as assessed by survey. Results In all, 197 clinicians were enrolled in the pilot; 35 who were prepilot beta users, out of office, or not tied to a specific ambulatory clinic were excluded, leaving 162 clinicians in the analysis. The survey analysis cohort consisted of 73 participants (45.1%) who completed both the presurvey and postsurvey. In gastroenterology and hepatology, there were 58 physicians, APPs, and nurses. In primary care, there were 83 physicians and APPs, 4 nurses, and 8 pharmacists. The mean draft utilization rate across clinicians was 20%. There was no change in action time, write time, or read time between the prepilot and pilot periods. There were statistically significant reductions in the 4-item physician task load score derivative (mean [SD], 61.31 [17.23] presurvey vs 47.26 [17.11] postsurvey; paired difference, −13.87; 95% CI, −17.38 to −9.50; P < .001) and in work exhaustion scores (mean [SD], 1.95 [0.79] presurvey vs 1.62 [0.68] postsurvey; paired difference, −0.33; 95% CI, −0.50 to −0.17; P < .001). Conclusions and Relevance In this quality improvement study of an early deployment of generative AI, there was notable adoption and usability and improved assessments of clinician burden and burnout, although time measures did not change. Further code-to-bedside testing is needed to guide future development and organizational strategy.
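The paired pre/post comparison above reports mean differences with 95% CIs and P values. A minimal sketch of that style of analysis is given below, assuming a paired t-test on per-clinician survey scores; the arrays are illustrative placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Placeholder per-clinician task-load scores (NOT the study's data).
pre = np.array([62.0, 58.5, 70.0, 55.0, 64.5, 61.0, 59.0, 66.5])
post = np.array([48.0, 50.5, 52.0, 41.0, 49.5, 45.0, 47.0, 51.5])

diff = post - pre
n = diff.size
mean_diff = diff.mean()
sem = diff.std(ddof=1) / np.sqrt(n)

# Two-sided paired t-test and 95% CI for the mean paired difference.
t_stat, p_value = stats.ttest_rel(post, pre)
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean_diff, scale=sem)

print(f"paired difference = {mean_diff:.2f}, "
      f"95% CI ({ci_low:.2f} to {ci_high:.2f}), P = {p_value:.4f}")
```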
Language: English
Cited by: 65
iScience, Journal year: 2024, Issue: 27(5), Pages: 109713 - 109713
Published: April 23, 2024
This study systematically reviewed the application of large language models (LLMs) in medicine, analyzing 550 selected studies from a vast literature search. LLMs like ChatGPT have transformed healthcare by enhancing diagnostics, medical writing, education, and project management. They have assisted in drafting documents, creating training simulations, and streamlining research processes. Despite their growing utility in diagnosis and in improving doctor-patient communication, challenges persisted, including limitations in contextual understanding and the risk of over-reliance. The surge in LLM-related studies indicated a focus on patient care but highlighted the need for careful integration, considering validation, ethical concerns, and the balance with traditional medical practice. Future directions suggested multimodal LLMs, deeper algorithmic understanding, and ensuring responsible, effective use in healthcare.
Language: English
Cited by: 60
Cell Reports Medicine, Journal year: 2024, Issue: 5(1), Pages: 101356 - 101356
Published: Jan. 1, 2024
This perspective highlights the importance of addressing social determinants of health (SDOH) in patient outcomes and health inequity, a global problem exacerbated by the COVID-19 pandemic. We provide a broad discussion on current developments in digital health and artificial intelligence (AI), including large language models (LLMs), as transformative tools for addressing SDOH factors, offering new capabilities for disease surveillance and care. Simultaneously, we bring attention to challenges, such as data standardization, infrastructure limitations, digital literacy, and algorithmic bias, that could hinder equitable access to AI benefits. For LLMs, we highlight potential unique challenges and risks, including environmental impact, unfair labor practices, the inadvertent proliferation of disinformation or "hallucinations," and infringement of copyrights. We propose the need for a multitiered approach to digital inclusion as an SDOH, along with the development of ethical and responsible AI practice frameworks globally, and offer suggestions for bridging the gap from development to implementation of these technologies.
Language: English
Cited by: 39
Cardiology and Therapy, Journal year: 2024, Issue: 13(1), Pages: 137 - 147
Published: Jan. 9, 2024
The advent of generative artificial intelligence (AI) dialogue platforms and large language models (LLMs) may help facilitate ongoing efforts to improve health literacy. Additionally, recent studies have highlighted inadequate health literacy among patients with cardiac disease. The aim of the present study was to ascertain whether two freely available generative AI platforms could rewrite online aortic stenosis (AS) patient education materials (PEMs) to meet recommended reading skill levels for the public.
Language: English
Cited by: 38
JAMA Network Open, Journal year: 2024, Issue: 7(4), Pages: e246565 - e246565
Published: April 15, 2024
Importance Timely tests are warranted to assess the association between generative artificial intelligence (GenAI) use and physicians' work efforts. Objective To investigate the association of GenAI-drafted replies for patient messages with physician time spent on answering messages and the length of replies. Design, Setting, and Participants Randomized waiting list quality improvement (QI) study conducted from June to August 2023 in an academic health system. Primary care physicians were randomized to an immediate activation group and a delayed activation group. Data were analyzed in November 2023. Exposure Access to GenAI-drafted replies for patient messages. Main Outcomes and Measures Time spent (1) reading messages and (2) replying to messages, (3) length of replies, and (4) physician likelihood to recommend GenAI drafts. The a priori hypothesis was that GenAI drafts would be associated with less physician time spent. A mixed-effects model was used. Results Fifty-two physicians participated in this QI study, 25 in the immediate activation group and 27 in the delayed group; a contemporary control group included 70 physicians. There were 18 female participants (72.0%) in the immediate group and 17 (63.0%) in the delayed group; the median age range was 35-44 years in the immediate group and 45-54 years in the delayed group. The immediate group's median (IQR) read time was 26 (11-69) seconds at baseline and 31 (15-70) seconds 3 weeks after entry to the intervention, with an IQR of 14-70 seconds 6 weeks after entry. The delayed group's read times had an IQR of 10-67 seconds at baseline, a median (IQR) of 29 (11-77) seconds during the 3-week waiting period, and 32 (15-72) seconds after entry to the intervention. Median (IQR) reply times were 21 (9-54), 22 (9-63), and 23 (9-60) seconds over the corresponding periods. GenAI drafts were associated with an estimated 21.8% increase in read time (95% CI, 5.2% to 41.0%; P = .008), a −5.9% change in reply time (95% CI, −16.6% to 6.2%; P = .33), and a 17.9% increase in reply length (95% CI, 10.1% to 26.2%; P < .001). Physicians recognized GenAI's value and suggested areas for improvement. Conclusions and Relevance In this QI study, GenAI-drafted replies were associated with significantly increased read time, no change in reply time, increased reply length, and some perceived benefits. Rigorous empirical tests are necessary to further examine GenAI performance. Future studies should consider user experience and compare multiple GenAIs, including those with medical training.
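The percentage-change estimates above come from a mixed-effects model on message-handling times. A hedged sketch of one common formulation follows, assuming log-transformed times with a random intercept per physician so that exp(coefficient) − 1 approximates the relative change; the data frame, column names, and values are hypothetical, not the study's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: two periods per physician (placeholder values).
df = pd.DataFrame({
    "physician_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "period": ["baseline", "genai"] * 6,
    "read_seconds": [26, 33, 30, 35, 22, 28, 40, 47, 31, 30, 27, 36],
})

df["log_read"] = np.log(df["read_seconds"])

# Random intercept per physician; fixed effect for the GenAI period.
model = smf.mixedlm("log_read ~ period", df, groups=df["physician_id"]).fit()

beta = model.params["period[T.genai]"]
print(f"estimated change in read time: {100 * (np.exp(beta) - 1):+.1f}%")
```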
Language: English
Cited by: 38
JAMA, Journal year: 2024, Issue: unknown
Published: Oct. 15, 2024
Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) task, (4) dimension of evaluation, and (5) medical specialty. Data Sources A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis Three independent reviewers categorized studies via keyword searches based on the data used, NLP and NLU tasks, dimensions of evaluation, and medical specialty. Results Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common tasks were assessing medical knowledge, such as answering licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by medical specialty area, generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%) were most common, with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics being the least represented. Conclusions and Relevance Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized metrics, use clinical data, and broaden the focus to include a wider range of tasks and specialties.
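The review above describes categorizing studies via keyword searches across data types, tasks, and specialties. A minimal sketch of that kind of keyword-based categorization follows; the category names and keyword lists are illustrative assumptions, not the review's actual codebook.

```python
from typing import Dict, List

# Illustrative mapping from task category to trigger keywords (assumed, not from the review).
TASK_KEYWORDS: Dict[str, List[str]] = {
    "question answering": ["question answering", "licensing examination", "multiple-choice"],
    "summarization": ["summarization", "summarize", "discharge summary"],
    "conversational dialogue": ["dialogue", "chatbot conversation", "patient communication"],
}

def categorize(abstract: str) -> List[str]:
    """Return every task category whose keywords appear in the abstract."""
    text = abstract.lower()
    return [task for task, words in TASK_KEYWORDS.items()
            if any(w in text for w in words)]

print(categorize("We evaluate an LLM on medical licensing examination questions."))
# -> ['question answering']
```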
Language: English
Cited by: 36
JAMA Pediatrics, Journal year: 2024, Issue: 178(3), Pages: 313 - 313
Published: Jan. 2, 2024
This diagnostic study evaluates the accuracy of a large language model against physician diagnoses in pediatric cases.
Language: English
Cited by: 33
Surgical Endoscopy, Journal year: 2024, Issue: 38(5), Pages: 2887 - 2893
Published: March 5, 2024
Abstract Introduction Generative artificial intelligence (AI) chatbots have recently been posited as potential sources of online medical information for patients making medical decisions. Existing patient-oriented online information has repeatedly been shown to be of variable quality and difficult readability. Therefore, we sought to evaluate the content and readability of AI-generated information on acute appendicitis. Methods A modified DISCERN assessment tool, comprising 16 distinct criteria each scored on a 5-point Likert scale (score range 16–80), was used to assess content quality. Readability was determined using Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Four popular chatbots, ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2, were prompted to generate content about acute appendicitis. Three investigators independently scored the generated texts while blinded to the identity of the AI platforms. Results ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2 had overall mean (SD) DISCERN scores of 60.7 (1.2), 62.0 (1.0), 62.3, and 51.3 (2.3), respectively, on a scale of 16–80. Inter-rater reliability scores were 0.81, 0.75, and 0.72, indicating substantial agreement. Claude-2 demonstrated a significantly lower score compared with ChatGPT-4 (p = 0.001), ChatGPT-3.5 (p = 0.005), and Bard (p = 0.001). Only one platform listed verifiable sources, while the others provided fabricated sources. All platforms except one advised readers to consult a physician if experiencing symptoms. Regarding readability, the FKGL and FRE scores were 14.6 and 23.8, 11.9 and 33.9, 8.6 and 52.8, and 11.0 and 36.6, respectively, indicating difficult readability at the college reading skill level. Conclusion AI-generated content on acute appendicitis scored favorably upon DISCERN assessment, but most platforms either fabricated sources or did not provide any sources altogether. Additionally, readability far exceeded the recommended levels for the public. Generative AI platforms demonstrate measured potential for patient education and engagement.
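The readability metrics named above follow the standard published formulas: FRE = 206.835 − 1.015 (words/sentence) − 84.6 (syllables/word) and FKGL = 0.39 (words/sentence) + 11.8 (syllables/word) − 15.59. A small sketch is below; the vowel-group syllable counter is a rough assumption of this sketch, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a passage of English text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

fre, fkgl = readability("Acute appendicitis is an inflammation of the appendix. "
                        "It usually requires prompt surgical evaluation.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")
```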
Language: English
Cited by: 33
JAMA, Journal year: 2024, Issue: 331(13), Pages: 1096 - 1096
Published: March 7, 2024
Nonhuman "Authors" and Implications for the Integrity of Scientific Publication Medical Knowledge
Language: English
Cited by: 26
NEJM AI, Journal year: 2024, Issue: 1(6)
Published: May 17, 2024
Privacy and ethical considerations limit access to large-scale clinical datasets, particularly clinical text data, which contain extensive and diverse information and serve as the foundation for building large language models (LLMs). The limited accessibility of clinical data impedes the development of medical artificial intelligence systems and hampers research participation from resource-poor regions and medical institutions, thereby exacerbating health care disparities. In this review, we conducted a global search to identify publicly available clinical text datasets and elaborated on their accessibility, diversity, and usability for LLMs. We screened 3962 papers across medical databases (PubMed and MEDLINE) and computational linguistics academic databases (the Association for Computational Linguistics Anthology), as well as 239 tasks from prevalent natural language processing (NLP) challenges, such as the National NLP Clinical Challenges (n2c2). We identified 192 unique datasets that were claimed to be publicly available. Following an institutional review board–approved data-requesting pipeline, access was granted to fewer than half of the datasets (91 [47.4%]), with an additional 14 (7.3%) being regulated and 87 (45.3%) remaining inaccessible. The datasets span nine languages and multiple countries, with over 10 million records; most (88 [95.7%]) originated from the Americas, Europe, and Asia, with none originating from Oceania or Africa, leaving these regions significantly underrepresented. Distribution differences were also evident within the clinical contexts of the supported tasks, with intensive care unit (18 [16.8%]), respiratory disease (13 [12.1%]), and cardiovascular disease (11 [10.3%]) datasets gaining significant attention. Named entity recognition (23 [21.7%]), classification (22 [20.8%]), and event extraction (12 [11.3%]) were the most explored tasks among the datasets. To our knowledge, this is the first study to systematically characterize publicly available clinical text datasets for LLMs, highlighting the difficulty of data access, the underrepresentation of certain regions and languages, and the challenges posed by restricted sharing. Sharing diversified and accessible datasets, with appropriate privacy protection, is necessary to promote equitable research.
Language: English
Cited by: 25