Published: Jan. 1, 2024
Language: English
JAMA, Journal year: 2024, Issue unknown
Published: Oct. 15, 2024
Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) task, (4) dimension of evaluation, and (5) medical specialty. Data Sources A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis Three independent reviewers categorized studies via keyword searches based on the data used, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Results Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common tasks were assessing medical knowledge, such as answering licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics being the least represented. Conclusions and Relevance Existing evaluations mostly focus on question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity received limited attention. Future evaluations should adopt standardized metrics, use real clinical data, and broaden their scope to include a wider range of specialties.
Language: English
Cited: 34
Information Fusion, Journal year: 2025, Issue unknown, Pages 102963 - 102963
Published: Jan. 1, 2025
Language: English
Cited: 7
JMIR Mental Health, Journal year: 2024, Issue 11, Pages e57400 - e57400
Published: Sep. 3, 2024
Background Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate. Objective This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the challenges present and the prospects for their use. Methods Adhering to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). The study included articles published between January 1, 2017, and April 30, 2024, and excluded articles in languages other than English. Results In total, 40 articles were evaluated, including 15 (38%) on detecting mental health conditions and suicidal ideation through text analysis, 7 (18%) on LLMs as conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, the assessments also indicate that the risks associated with their use might surpass the benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework. Conclusions This systematic review examines the uses of LLMs in mental health and their inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, limited interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. These include the lack of a clear, benchmarked ethical framework; data privacy issues; and overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. Nevertheless, their rapid development underscores their potential as valuable clinical aids, emphasizing the need for continued research in this area. Trial Registration PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617
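To make the search protocol above concrete, the sketch below runs the review's boolean query against one of its five sources (MEDLINE via PubMed) through NCBI's public E-utilities esearch endpoint. The query string and date window are taken from the abstract; the other databases would need their own APIs, so this is an illustrative reconstruction under stated assumptions rather than the authors' actual pipeline.

```python
"""Hedged sketch: the PubMed arm of the review's search, via NCBI E-utilities.
The query and date window come from the abstract; everything else is assumed."""
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
query = ("(mental health OR mental illness OR mental disorder OR psychiatry) "
         "AND (large language models)")

resp = requests.get(ESEARCH, params={
    "db": "pubmed",
    "term": query,
    "datetype": "pdat",      # filter on publication date
    "mindate": "2017/01/01",
    "maxdate": "2024/04/30",
    "retmax": 200,           # first page of PMIDs only
    "retmode": "json",
}, timeout=30)
resp.raise_for_status()

result = resp.json()["esearchresult"]
print(f"{result['count']} records; first PMIDs: {result['idlist'][:5]}")
```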
Language: English
Cited: 13
medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue unknown
Published: April 16, 2024
Abstract Importance Large Language Models (LLMs) can assist in a wide range of healthcare-related activities. Current approaches to evaluating LLMs make it difficult to identify the most impactful LLM application areas. Objective To summarize the current evaluation of LLMs in healthcare in terms of 5 components: evaluation data type, healthcare task, Natural Language Processing (NLP)/Natural Language Understanding (NLU) task, dimension of evaluation, and medical specialty. Data Sources A systematic search of PubMed and Web of Science was performed for studies published between 01-01-2022 and 02-19-2024. Study Selection Studies evaluating one or more LLMs in healthcare. Data Extraction and Synthesis Three independent reviewers categorized 519 studies based on the data used, the healthcare tasks (the what) and NLP/NLU tasks (the how) examined, the dimension(s) of evaluation, and the medical specialty studied. Results Only 5% of reviewed studies utilized real patient care data for LLM evaluation. The most popular healthcare tasks were assessing medical knowledge (e.g. answering licensing exam questions, 44.5%), followed by making diagnoses (19.5%) and educating patients (17.7%). Administrative tasks such as assigning provider billing codes (0.2%), writing prescriptions, generating clinical referrals (0.6%) and notetaking (0.8%) were less studied. For NLP/NLU tasks, the vast majority of studies examined question answering (84.2%). Other tasks such as summarization (8.9%), conversational dialogue (3.3%), and translation (3.1%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias and toxicity (15.8%), robustness (14.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by medical specialty area, most studies were in internal medicine (42%), surgery (11.4%) and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%) and medical genetics (0.2%) being the least represented. Conclusions and Relevance Existing evaluations of LLMs mostly focused on accuracy of question answering for medical exams, without consideration of real patient care data. Dimensions like fairness, bias and toxicity, robustness, and deployment considerations received limited attention. To draw meaningful conclusions and improve adoption, future studies need to establish a standardized set of applications and evaluation dimensions, perform evaluations using data from routine patient care, and broaden testing to include administrative tasks as well as multiple medical specialties. Key Points Question How are large language models currently evaluated in healthcare? Findings Studies rarely used real patient care data, and administrative tasks were understudied. Tasks such as summarization and conversational dialogue were seldom explored. Accuracy was the predominant evaluation dimension, while other assessments were neglected. Evaluations in specialized fields were rare. Meaning Current evaluations remain shallow and fragmented. To derive concrete insights about LLM performance, future evaluations should use real patient care data across a broad range of specialties and dimensions.
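As a rough illustration of the keyword-search categorization step the reviewers describe, the sketch below tags toy abstracts with NLP/NLU task buckets and tallies them. The TASK_KEYWORDS lists and the toy corpus are hypothetical placeholders, not the reviewers' actual codebook or data.

```python
"""Illustrative keyword-based categorization of study abstracts into NLP/NLU
task buckets; keyword lists and example abstracts are made up."""
from collections import Counter

TASK_KEYWORDS = {
    "question answering": ["question answering", "exam", "licensing"],
    "summarization": ["summarization", "summarize", "summary"],
    "conversational dialogue": ["chatbot", "dialogue", "conversation"],
    "translation": ["translation", "translate"],
}

def categorize(abstract: str) -> list[str]:
    """Return every task bucket whose keywords appear in the abstract."""
    text = abstract.lower()
    return [task for task, kws in TASK_KEYWORDS.items()
            if any(kw in text for kw in kws)]

# Toy corpus standing in for the 519 reviewed studies.
studies = [
    "We test GPT-4 on USMLE licensing exam questions ...",
    "An LLM chatbot for patient dialogue about medication ...",
]
counts = Counter(task for s in studies for task in categorize(s))
print(counts.most_common())
```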
Language: English
Cited: 10
BMC Oral Health, Journal year: 2024, Issue 24(1)
Published: May 24, 2024
Abstract Background The use of artificial intelligence in the field of health sciences is becoming widespread. It is known that patients benefit from artificial intelligence applications on various health issues, especially after the pandemic period. One of the most important issues in this regard is the accuracy of the information provided by these applications. Objective The purpose of this study was to direct the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), which is one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the answers given by the application with that of the FDA. Methods The questions were directed to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The responses on the FDA webpage were also recorded. The responses were evaluated for similarity in terms of "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas" between ChatGPT-4's and the FDA's responses. Results The responses of ChatGPT-4 were similar at one-week intervals. In comparison with the FDA guidance, it gave similar answers to the questions. However, although there were some similarities in general aspects of the recommendation regarding amalgam removal, for the question on the replacement of amalgam fillings the two texts are not the same, and they offered different perspectives on replacement. Conclusions The findings indicate that ChatGPT-4, an artificial intelligence based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, we believe that numerous studies are required to assess the validity and reliability of ChatGPT across diverse subjects.
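The study itself used ChatGPT to judge similarity at the word and meaning levels; as a purely illustrative stand-in for that comparison step, the sketch below scores lexical overlap between two placeholder answers with TF-IDF cosine similarity (scikit-learn). The fda_answer and gpt_answer strings are hypothetical, not the actual FDA or ChatGPT-4 texts.

```python
"""Illustrative lexical-similarity scoring of two answer texts; a swapped-in
technique, not the study's ChatGPT-based comparison."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fda_answer = "Dental amalgam is a mixture of metals used to fill cavities ..."
gpt_answer = "Amalgam fillings combine several metals and are used to restore teeth ..."

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform([fda_answer, gpt_answer])   # one row per text
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]   # 0 = disjoint, 1 = identical wording
print(f"TF-IDF cosine similarity: {score:.2f}")
```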
Language: English
Cited: 7
Journal of English for Academic Purposes, Journal year: 2024, Issue 71, Pages 101422 - 101422
Published: July 17, 2024
Language: English
Cited: 6
International Journal of Environmental Research and Public Health, Journal year: 2024, Issue 21(7), Pages 910 - 910
Published: July 12, 2024
(1) Background: Artificial intelligence (AI) has flourished in recent years. More specifically, generative AI has had broad applications in many disciplines. While mental illness is on the rise, AI has proven valuable in aiding the diagnosis and treatment of mental disorders. However, there is little to no research about precisely how much public interest there is in mental health AI technology. (2) Methods: We performed a Google Trends search for "AI mental health" and compared the relative search volume (RSV) indices of "AI", "AI Depression", and "AI anxiety". This time series study employed Box–Jenkins modeling to forecast long-term interest through the end of 2024. (3) Results: Within the United States, interest steadily increased throughout 2023, with some anomalies due to media reporting. Through predictive models, we found that this trend is predicted to increase 114% through the year 2024, with public interest continuing to rise. (4) Conclusions: According to our study, public awareness of AI has drastically increased, especially regarding mental health. This demonstrates increasing public interest in mental health AI, making advocacy and education about this technology of paramount importance.
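A minimal sketch of the Box–Jenkins forecasting step described above, assuming a monthly relative-search-volume series: it fits an ARIMA model with statsmodels and projects twelve months ahead. The synthetic series and the (1, 1, 1) order are illustrative assumptions, not the study's Google Trends export or its fitted model.

```python
"""Hedged Box-Jenkins sketch: ARIMA forecast of a synthetic monthly RSV series."""
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", "2023-12-01", freq="MS")          # month starts
rsv = pd.Series(np.linspace(5, 40, len(idx)) + rng.normal(0, 3, len(idx)),
                index=idx)                                           # stand-in RSV

model = ARIMA(rsv, order=(1, 1, 1)).fit()        # illustrative order choice
forecast = model.forecast(steps=12)              # Jan-Dec 2024
growth = (forecast.iloc[-1] / rsv.iloc[-1] - 1) * 100
print(forecast.round(1))
print(f"Implied growth over 2024: {growth:.0f}%")
```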
Language: English
Cited: 5
Journal of Healthcare Informatics Research, Journal year: 2024, Issue 8(4), Pages 658 - 711
Published: Sep. 14, 2024
Language: English
Cited: 5
Artificial Intelligence Review, Journal year: 2025, Issue 58(3)
Published: Jan. 6, 2025
Sentiment analysis has emerged as a prominent research domain within the realm of natural language processing, garnering increasing attention and a growing body of literature. While numerous literature reviews have examined sentiment analysis techniques, methods, topics, and applications, there remains a gap in the literature concerning the thematic trends and methodologies of sentiment analysis, particularly in the context of Chinese text. This study addresses this gap by presenting a comprehensive survey dedicated to the progression of subjects, methods, and applications. Employing a framework that combines keyword co-occurrence analysis with a sophisticated community detection algorithm, it offers a novel perspective on the landscape of sentiment analysis research. By tracing the interplay between emerging topics and methods over the past two decades, our survey not only facilitates a comparative analysis of their correlations but also illuminates evolving patterns, identifying significant research hotspots over time for Chinese text sentiment analysis. This invaluable insight provides a roadmap for researchers seeking to navigate the intricate terrain of sentiment analysis for the Chinese language. Moreover, the paper extends beyond the academic realm, offering practical insights into research themes while pinpointing avenues for future exploration, technical limitations, and research directions.
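As a hedged sketch of the survey's framework, the snippet below builds a keyword co-occurrence network from toy keyword lists and partitions it with networkx's greedy modularity routine, which stands in for whichever community detection algorithm the authors actually used; the paper lists are made-up examples.

```python
"""Illustrative keyword co-occurrence network plus community detection;
toy data, and greedy modularity is an assumed stand-in algorithm."""
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

papers = [
    ["sentiment analysis", "chinese text", "bert"],
    ["sentiment analysis", "lexicon", "weibo"],
    ["bert", "aspect-based", "chinese text"],
    ["lexicon", "weibo", "emotion"],
]

G = nx.Graph()
for kws in papers:
    for a, b in combinations(sorted(set(kws)), 2):
        w = G.get_edge_data(a, b, {}).get("weight", 0)
        G.add_edge(a, b, weight=w + 1)   # co-occurrence count as edge weight

for i, community in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"theme {i}: {sorted(community)}")
```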
Language: English
Cited: 0
PeerJ Computer Science, Journal year: 2025, Issue 11, Pages e2592 - e2592
Published: Jan. 28, 2025
With the rapid expansion of social media and e-commerce platforms, an unprecedented volume of user-generated content has emerged, offering organizations, governments, and researchers invaluable insights into public sentiment. Yet, the vast and unstructured nature of this data challenges traditional analysis methods. Sentiment analysis, a specialized field within natural language processing, has evolved to meet these challenges by automating the detection and categorization of opinions and emotions in text. This review comprehensively examines the evolving techniques of sentiment analysis, detailing foundational processes such as data gathering and feature extraction. It explores the spectrum of methodologies, from classical word embedding and machine learning algorithms to recent contextual and advanced transformer models like the Generative Pre-trained Transformer (GPT), Bidirectional Encoder Representations from Transformers (BERT), and T5. Through a critical comparison of these methods, the article highlights their appropriate uses and limitations. Additionally, it provides a thorough overview of current trends and future directions, and an exploration of unresolved challenges. By synthesizing these developments, the review equips researchers with a solid foundation for assessing the state of the art and guiding advancements in this dynamic field.
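To ground the transformer end of the methodology spectrum the review covers, here is a minimal sentiment-classification sketch using Hugging Face's pipeline API; the default checkpoint (a DistilBERT model fine-tuned on SST-2) is downloaded on first use, and the two review sentences are made-up examples. A classical baseline, as contrasted in the review, would swap this for TF-IDF features plus a linear classifier.

```python
"""Minimal transformer-based sentiment classification with the pipeline API."""
from transformers import pipeline

classifier = pipeline("sentiment-analysis")   # default SST-2 DistilBERT checkpoint
reviews = [
    "The delivery was fast and the product works exactly as described.",
    "Terrible support, I had to return the item twice.",
]
for review, pred in zip(reviews, classifier(reviews)):
    print(f"{pred['label']:>8} ({pred['score']:.2f})  {review}")
```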
Language: English
Cited: 0