European Archives of Oto-Rhino-Laryngology, Journal year: 2024, Issue: 281(5), Pages: 2723 - 2731
Published: Feb. 23, 2024
Language: English
JAMA Network Open, Journal year: 2024, Issue: 7(3), Pages: e243201 - e243201
Published: March 20, 2024
Importance The emergence and promise of generative artificial intelligence (AI) represent a turning point for health care. Rigorous evaluation of generative AI deployment in clinical practice is needed to inform strategic decision-making. Objective To evaluate the implementation of a large language model used to draft responses to patient messages in the electronic inbox. Design, Setting, and Participants A 5-week, prospective, single-group quality improvement study was conducted from July 10 through August 13, 2023, at a single academic medical center (Stanford Health Care). All attending physicians, advanced practice practitioners (APPs), clinic nurses, and pharmacists in the Divisions of Primary Care and Gastroenterology and Hepatology were enrolled in the pilot. Intervention Draft replies to patient portal messages generated by a Health Insurance Portability and Accountability Act–compliant, electronic health record–integrated large language model. Main Outcomes and Measures The primary outcome was AI-generated draft reply utilization as a percentage of total message replies. Secondary outcomes included changes in time measures and clinician experience as assessed by survey. Results In all, 197 clinicians were enrolled in the pilot; 35 who were prepilot beta users, out of office, or not tied to a specific ambulatory clinic were excluded, leaving 162 clinicians in the analysis. The survey analysis cohort consisted of 73 participants (45.1%) who completed both the presurvey and postsurvey. In gastroenterology and hepatology, there were 58 physicians, APPs, and nurses. In primary care, there were 83 physicians and APPs, 4 nurses, and 8 pharmacists. The mean draft utilization rate across clinicians was 20%. There was no change in action time, write time, or read time between the prepilot and pilot periods. There were statistically significant reductions in the 4-item physician task load score derivative (mean [SD], 61.31 [17.23] presurvey vs 47.26 [17.11] postsurvey; paired difference, −13.87; 95% CI, −17.38 to −9.50; P < .001) and in work exhaustion scores (mean [SD], 1.95 [0.79] presurvey vs 1.62 [0.68] postsurvey; paired difference, −0.33; 95% CI, −0.50 to −0.17; P < .001). Conclusions and Relevance In this quality improvement study of an early deployment of generative AI, there was notable adoption and usability and improved assessments of clinician burden and burnout, although time measures did not change. Further code-to-bedside testing is needed to guide future development and organizational strategy.
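The paired pre/post comparison above reports mean differences with 95% CIs and P values. A minimal sketch of that style of analysis is given below, assuming a paired t-test on per-clinician survey scores; the arrays are illustrative placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Placeholder per-clinician task-load scores (NOT the study's data).
pre = np.array([62.0, 58.5, 70.0, 55.0, 64.5, 61.0, 59.0, 66.5])
post = np.array([48.0, 50.5, 52.0, 41.0, 49.5, 45.0, 47.0, 51.5])

diff = post - pre
n = diff.size
mean_diff = diff.mean()
sem = diff.std(ddof=1) / np.sqrt(n)

# Two-sided paired t-test and 95% CI for the mean paired difference.
t_stat, p_value = stats.ttest_rel(post, pre)
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean_diff, scale=sem)

print(f"paired difference = {mean_diff:.2f}, "
      f"95% CI ({ci_low:.2f} to {ci_high:.2f}), P = {p_value:.4f}")
```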
Language: English
Cited by: 65
iScience, Journal year: 2024, Issue: 27(5), Pages: 109713 - 109713
Published: April 23, 2024
This study systematically reviewed the application of large language models (LLMs) in medicine, analyzing 550 selected studies from a vast literature search. LLMs like ChatGPT have transformed healthcare by enhancing diagnostics, medical writing, education, and project management. They have assisted in drafting documents, creating training simulations, and streamlining research processes. Despite their growing utility in diagnosis and in improving doctor-patient communication, challenges persisted, including limitations in contextual understanding and the risk of over-reliance. The surge in LLM-related studies indicated a focus on patient care but highlighted the need for careful integration, considering validation, ethical concerns, and the balance with traditional medical practice. Future directions suggested multimodal LLMs, deeper algorithmic understanding, and ensuring responsible, effective use in healthcare.
Language: English
Cited by: 60
Cell Reports Medicine, Journal year: 2024, Issue: 5(1), Pages: 101356 - 101356
Published: Jan. 1, 2024
This perspective highlights the importance of addressing social determinants of health (SDOH) in patient outcomes and health inequity, a global problem exacerbated by the COVID-19 pandemic. We provide a broad discussion on current developments in digital health and artificial intelligence (AI), including large language models (LLMs), as transformative tools for addressing SDOH factors, offering new capabilities for disease surveillance and care. Simultaneously, we bring attention to challenges, such as data standardization, infrastructure limitations, digital literacy, and algorithmic bias, that could hinder equitable access to AI benefits. For LLMs, we highlight potential unique challenges and risks, including environmental impact, unfair labor practices, the inadvertent proliferation of disinformation or "hallucinations," and infringement of copyrights. We propose the need for a multitiered approach to digital inclusion as an SDOH, along with the development of ethical and responsible AI practice frameworks globally, and offer suggestions for bridging the gap from development to implementation of these technologies.
Language: English
Cited by: 39
Cardiology and Therapy, Journal year: 2024, Issue: 13(1), Pages: 137 - 147
Published: Jan. 9, 2024
The advent of generative artificial intelligence (AI) dialogue platforms and large language models (LLMs) may help facilitate ongoing efforts to improve health literacy. Additionally, recent studies have highlighted inadequate health literacy among patients with cardiac disease. The aim of the present study was to ascertain whether two freely available generative AI platforms could rewrite online aortic stenosis (AS) patient education materials (PEMs) to meet recommended reading skill levels for the public.
Language: English
Cited by: 38
JAMA Network Open, Journal year: 2024, Issue: 7(4), Pages: e246565 - e246565
Published: April 15, 2024
Importance Timely tests are warranted to assess the association between generative artificial intelligence (GenAI) use and physicians' work efforts. Objective To investigate the association of GenAI-drafted replies for patient messages with physician time spent on answering messages and the length of replies. Design, Setting, and Participants Randomized waiting list quality improvement (QI) study conducted from June to August 2023 in an academic health system. Primary care physicians were randomized to an immediate activation group and a delayed activation group. Data were analyzed in November 2023. Exposure Access to GenAI-drafted replies for patient messages. Main Outcomes and Measures Time spent (1) reading messages and (2) replying to messages, (3) length of replies, and (4) physician likelihood to recommend GenAI drafts. The a priori hypothesis was that GenAI drafts would be associated with less physician time spent. A mixed-effects model was used. Results Fifty-two physicians participated in this QI study, 25 in the immediate activation group and 27 in the delayed group; a contemporary control group included 70 physicians. There were 18 female participants (72.0%) in the immediate group and 17 (63.0%) in the delayed group; the median age range was 35-44 years in the immediate group and 45-54 years in the delayed group. The immediate group's median (IQR) read time was 26 (11-69) seconds at baseline and 31 (15-70) seconds 3 weeks after entry to the intervention, with an IQR of 14-70 seconds 6 weeks after entry. The delayed group's read times had an IQR of 10-67 seconds at baseline, a median (IQR) of 29 (11-77) seconds during the 3-week waiting period, and 32 (15-72) seconds after entry to the intervention. Median (IQR) reply times were 21 (9-54), 22 (9-63), and 23 (9-60) seconds over the corresponding periods. GenAI drafts were associated with an estimated 21.8% increase in read time (95% CI, 5.2% to 41.0%; P = .008), a −5.9% change in reply time (95% CI, −16.6% to 6.2%; P = .33), and a 17.9% increase in reply length (95% CI, 10.1% to 26.2%; P < .001). Physicians recognized GenAI's value and suggested areas for improvement. Conclusions and Relevance In this QI study, GenAI-drafted replies were associated with significantly increased read time, no change in reply time, increased reply length, and some perceived benefits. Rigorous empirical tests are necessary to further examine GenAI performance. Future studies should consider user experience and compare multiple GenAIs, including those with medical training.
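The percentage-change estimates above come from a mixed-effects model on message-handling times. A hedged sketch of one common formulation follows, assuming log-transformed times with a random intercept per physician so that exp(coefficient) − 1 approximates the relative change; the data frame, column names, and values are hypothetical, not the study's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: two periods per physician (placeholder values).
df = pd.DataFrame({
    "physician_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "period": ["baseline", "genai"] * 6,
    "read_seconds": [26, 33, 30, 35, 22, 28, 40, 47, 31, 30, 27, 36],
})

df["log_read"] = np.log(df["read_seconds"])

# Random intercept per physician; fixed effect for the GenAI period.
model = smf.mixedlm("log_read ~ period", df, groups=df["physician_id"]).fit()

beta = model.params["period[T.genai]"]
print(f"estimated change in read time: {100 * (np.exp(beta) - 1):+.1f}%")
```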
Language: English
Cited by: 38
JAMA, Journal year: 2024, Issue: unknown
Published: Oct. 15, 2024
Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) task, (4) dimension of evaluation, and (5) medical specialty. Data Sources A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis Three independent reviewers categorized studies via keyword searches based on the data used, NLP and NLU tasks, dimensions of evaluation, and medical specialty. Results Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common tasks were assessing medical knowledge, such as answering licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by medical specialty area, generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%) were most common, with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics being the least represented. Conclusions and Relevance Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized metrics, use clinical data, and broaden the focus to include a wider range of tasks and specialties.
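The review above describes categorizing studies via keyword searches across data types, tasks, and specialties. A minimal sketch of that kind of keyword-based categorization follows; the category names and keyword lists are illustrative assumptions, not the review's actual codebook.

```python
from typing import Dict, List

# Illustrative mapping from task category to trigger keywords (assumed, not from the review).
TASK_KEYWORDS: Dict[str, List[str]] = {
    "question answering": ["question answering", "licensing examination", "multiple-choice"],
    "summarization": ["summarization", "summarize", "discharge summary"],
    "conversational dialogue": ["dialogue", "chatbot conversation", "patient communication"],
}

def categorize(abstract: str) -> List[str]:
    """Return every task category whose keywords appear in the abstract."""
    text = abstract.lower()
    return [task for task, words in TASK_KEYWORDS.items()
            if any(w in text for w in words)]

print(categorize("We evaluate an LLM on medical licensing examination questions."))
# -> ['question answering']
```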
Language: English
Cited by: 36
JAMA Pediatrics, Journal year: 2024, Issue: 178(3), Pages: 313 - 313
Published: Jan. 2, 2024
This diagnostic study evaluates the accuracy of a large language model against physician diagnoses in pediatric cases.
Language: English
Cited by: 33
Surgical Endoscopy, Journal year: 2024, Issue: 38(5), Pages: 2887 - 2893
Published: March 5, 2024
Abstract Introduction Generative artificial intelligence (AI) chatbots have recently been posited as potential sources of online medical information for patients making medical decisions. Existing patient-oriented online information has repeatedly been shown to be of variable quality and difficult readability. Therefore, we sought to evaluate the content and readability of AI-generated information on acute appendicitis. Methods A modified DISCERN assessment tool, comprising 16 distinct criteria each scored on a 5-point Likert scale (score range 16–80), was used to assess content quality. Readability was determined using Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Four popular chatbots, ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2, were prompted to generate content about acute appendicitis. Three investigators independently scored the generated texts while blinded to the identity of the AI platforms. Results ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2 had overall mean (SD) DISCERN scores of 60.7 (1.2), 62.0 (1.0), 62.3, and 51.3 (2.3), respectively, on a scale of 16–80. Inter-rater reliability scores were 0.81, 0.75, and 0.72, indicating substantial agreement. Claude-2 demonstrated a significantly lower score compared with ChatGPT-4 (p = 0.001), ChatGPT-3.5 (p = 0.005), and Bard (p = 0.001). Only one platform listed verifiable sources, while the others provided fabricated sources. All platforms except one advised readers to consult a physician if experiencing symptoms. Regarding readability, the FKGL and FRE scores were 14.6 and 23.8, 11.9 and 33.9, 8.6 and 52.8, and 11.0 and 36.6, respectively, indicating difficult readability at the college reading skill level. Conclusion AI-generated content on acute appendicitis scored favorably upon DISCERN assessment, but most platforms either fabricated sources or did not provide any sources altogether. Additionally, readability far exceeded the recommended levels for the public. Generative AI platforms demonstrate measured potential for patient education and engagement.
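The readability metrics named above follow the standard published formulas: FRE = 206.835 − 1.015 (words/sentence) − 84.6 (syllables/word) and FKGL = 0.39 (words/sentence) + 11.8 (syllables/word) − 15.59. A small sketch is below; the vowel-group syllable counter is a rough assumption of this sketch, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a passage of English text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

fre, fkgl = readability("Acute appendicitis is an inflammation of the appendix. "
                        "It usually requires prompt surgical evaluation.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")
```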
Language: English
Cited by: 33
JAMA, Journal year: 2024, Issue: 331(13), Pages: 1096 - 1096
Published: March 7, 2024
Nonhuman "Authors" and Implications for the Integrity of Scientific Publication Medical Knowledge
Language: English
Cited by: 26
NEJM AI, Journal year: 2024, Issue: 1(6)
Published: May 17, 2024
Privacy and ethical considerations limit access to large-scale clinical datasets, particularly clinical text data, which contain extensive and diverse information and serve as the foundation for building large language models (LLMs). The limited accessibility of clinical data impedes the development of medical artificial intelligence systems and hampers research participation from resource-poor regions and medical institutions, thereby exacerbating health care disparities. In this review, we conducted a global search to identify publicly available clinical text datasets and elaborated on their accessibility, diversity, and usability for LLMs. We screened 3962 papers across medical databases (PubMed and MEDLINE) and computational linguistics academic databases (the Association for Computational Linguistics Anthology), as well as 239 tasks from prevalent natural language processing (NLP) challenges, such as the National NLP Clinical Challenges (n2c2). We identified 192 unique datasets that were claimed to be publicly available. Following an institutional review board–approved data-requesting pipeline, access was granted to fewer than half of the datasets (91 [47.4%]), with an additional 14 (7.3%) being regulated and 87 (45.3%) remaining inaccessible. The datasets span nine languages and multiple countries, with over 10 million records; most (88 [95.7%]) originated from the Americas, Europe, and Asia, with none originating from Oceania or Africa, leaving these regions significantly underrepresented. Distribution differences were also evident within the clinical contexts of the supported tasks, with intensive care unit (18 [16.8%]), respiratory disease (13 [12.1%]), and cardiovascular disease (11 [10.3%]) datasets gaining significant attention. Named entity recognition (23 [21.7%]), classification (22 [20.8%]), and event extraction (12 [11.3%]) were the most explored tasks among the datasets. To our knowledge, this is the first study to systematically characterize publicly available clinical text datasets for LLMs, highlighting the difficulty of data access, the underrepresentation of certain regions and languages, and the challenges posed by restricted sharing. Sharing diversified and accessible datasets, with appropriate privacy protection, is necessary to promote equitable research.
Language: English
Cited by: 25