Cited by Enhancing qualitative research in psychology with large language models: a methodological exploration and examples of simulations

Testing and Evaluation of Health Care Applications of Large Language Models DOI

Suhana Bedi, Yutong Liu, Lucy Orr-Ewing

и другие.

JAMA, Год журнала: 2024, Номер unknown

Опубликована: Окт. 15, 2024

Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs terms 5 components: (1) data type, (2) task, (3) natural processing (NLP) and understanding (NLU) tasks, (4) dimension evaluation, (5) medical specialty. Data Sources A systematic search PubMed Web Science was performed for studies published between January 1, 2022, February 19, 2024. Study Selection Studies evaluating 1 or more care. Extraction Synthesis Three independent reviewers categorized via keyword searches based on used, NLP NLU dimensions Results Of 519 reviewed, 2024, only 5% used real patient LLM evaluation. The common tasks were assessing knowledge such as answering licensing examination questions (44.5%) making diagnoses (19.5%). Administrative assigning billing codes (0.2%) writing prescriptions less studied. For focused question (84.2%), while summarization (8.9%) conversational dialogue (3.3%) infrequent. Almost all (95.4%) accuracy primary evaluation; fairness, bias, toxicity (15.8%), deployment considerations (4.6%), calibration uncertainty (1.2%) infrequently measured. Finally, specialty area, generic applications (25.6%), internal medicine (16.4%), surgery (11.4%), ophthalmology (6.9%), with nuclear (0.6%), physical (0.4%), genetics being least represented. Conclusions Relevance Existing mostly focus examinations, without consideration data. Dimensions received limited attention. Future should adopt standardized metrics, use clinical data, broaden to include a wider range specialties.

Язык: Английский

Процитировано

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics DOI

Kai He, Rui Mao, Qika Lin

и другие.

Information Fusion, Год журнала: 2025, Номер unknown, С. 102963 - 102963

Опубликована: Янв. 1, 2025

Язык: Английский

Процитировано

A review of Chinese sentiment analysis: subjects, methods, and trends DOI

Zhaoxia Wang, Donghao Huang, Jingfeng Cui

и другие.

Artificial Intelligence Review, Год журнала: 2025, Номер 58(3)

Опубликована: Янв. 6, 2025

Sentiment analysis has emerged as a prominent research domain within the realm of natural language processing, garnering increasing attention and growing body literature. While numerous literature reviews have examined sentiment techniques, methods, topics applications, there remains gap in concerning thematic trends methodologies analysis, particularly context Chinese text. This study addresses this by presenting comprehensive survey dedicated to progression subjects, methods Employing framework that combines keyword co-occurrence with sophisticated community detection algorithm, offers novel perspective on landscape research. By tracing interplay between emerging over past two decades, our not only facilitates comparative their correlations but also illuminates evolving patterns, identifying significant hotspots time for text analysis. invaluable insight provides roadmap researchers seeking navigate intricate terrain language. Moreover, paper extends beyond academic realm, offering practical insights into themes while pinpointing avenues future exploration, technical limitations, directions

Язык: Английский

Процитировано

Large Language Models for Mental Health Applications: A Systematic Review (Preprint) DOI

Zhijun Guo, Alvina G. Lai, Johan H. Thygesen

и другие.

JMIR Mental Health, Год журнала: 2024, Номер 11, С. e57400 - e57400

Опубликована: Сен. 3, 2024

Background Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention demonstrated potential in digital health, their application mental particularly clinical settings, has generated considerable debate. Objective This systematic review aims critically assess the use of LLMs specifically focusing applicability efficacy early screening, interventions, settings. By systematically collating assessing evidence from current studies, our work analyzes models, methodologies, data sources, outcomes, thereby highlighting challenges present, prospects for use. Methods Adhering PRISMA (Preferred Reporting Items Systematic Reviews Meta-Analyses) guidelines, this searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, ACM Digital Library. Keywords used were (mental health OR illness disorder psychiatry) AND (large models). study included articles published between January 1, 2017, April 30, 2024, excluded languages other than English. Results In total, 40 evaluated, including 15 (38%) conditions suicidal ideation detection through text analysis, 7 (18%) as conversational agents, 18 (45%) applications evaluations health. show good effectiveness detecting issues providing accessible, destigmatized eHealth services. However, assessments also indicate that risks associated with might surpass benefits. These include inconsistencies text; production hallucinations; absence a comprehensive, benchmarked ethical framework. Conclusions examines inherent risks. The identifies several issues: lack multilingual annotated experts, concerns regarding accuracy reliability content, interpretability due “black box” nature LLMs, ongoing dilemmas. clear, framework; privacy issues; overreliance both physicians patients, which could compromise traditional medical practices. As result, should not be considered substitutes professional rapid development underscores valuable aids, emphasizing need continued research area. Trial Registration PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617

Язык: Английский

Процитировано

A Systematic Review of Testing and Evaluation of Healthcare Applications of Large Language Models (LLMs) DOI

Suhana Bedi, Yutong Liu, Lucy Orr-Ewing

и другие.

medRxiv (Cold Spring Harbor Laboratory), Год журнала: 2024, Номер unknown

Опубликована: Апрель 16, 2024

1 Abstract Importance Large Language Models (LLMs) can assist in a wide range of healthcare-related activities. Current approaches to evaluating LLMs make it difficult identify the most impactful LLM application areas. Objective To summarize current evaluation healthcare terms 5 components: data type, task, Natural Processing (NLP)/Natural Understanding (NLU) dimension evaluation, and medical specialty. Data Sources A systematic search PubMed Web Science was performed for studies published between 01-01-2022 02-19-2024. Study Selection Studies one or more healthcare. Extraction Synthesis Three independent reviewers categorized 519 used tasks (the what) NLP/NLU how) examined, dimension(s) specialty studied. Results Only 5% reviewed utilized real patient care evaluation. The popular were assessing knowledge (e.g. answering licensing exam questions, 44.5%), followed by making diagnoses (19.5%), educating patients (17.7%). Administrative such as assigning provider billing codes (0.2%), writing prescriptions generating clinical referrals (0.6%) notetaking (0.8%) less For tasks, vast majority examined question (84.2%). Other summarization (8.9%), conversational dialogue (3.3%), translation (3.1%) infrequent. Almost all (95.4%) accuracy primary evaluation; fairness, bias toxicity (15.8%), robustness (14.8%), deployment considerations (4.6%), calibration uncertainty (1.2%) infrequently measured. Finally, area, internal medicine (42%), surgery (11.4%) ophthalmology (6.9%), with nuclear (0.6%), physical (0.4%) genetics (0.2%) being least represented. Conclusions Relevance Existing evaluations mostly focused on exams, without consideration data. Dimensions like toxicity, robustness, received limited attention. draw meaningful conclusions improve adoption, future need establish standardized set applications dimensions, perform using from routine care, broaden testing include administrative well multiple specialties. Key Points Question How are large language models currently evaluated? Findings rarely understudied. summarization, dialogue, explored. Accuracy predominant while assessments neglected. Evaluations specialized fields, rare. Meaning remain shallow fragmented. concrete insights their performance, use across broad specialties dimensions

Язык: Английский

Процитировано

Mental Health Applications of Generative AI and Large Language Modeling in the United States DOI

Srikanta Banerjee, Patrick Dunn,

Scott Conard

и другие.

International Journal of Environmental Research and Public Health, Год журнала: 2024, Номер 21(7), С. 910 - 910

Опубликована: Июль 12, 2024

(1) Background: Artificial intelligence (AI) has flourished in recent years. More specifically, generative AI had broad applications many disciplines. While mental illness is on the rise, proven valuable aiding diagnosis and treatment of disorders. However, there little to no research about precisely how much interest technology. (2) Methods: We performed a Google Trends search for “AI health” compared relative volume (RSV) indices “AI”, Depression”, anxiety”. This time series study employed Box–Jenkins modeling forecast long-term through end 2024. (3) Results: Within United States, steadily increased throughout 2023, with some anomalies due media reporting. Through predictive models, we found that this trend predicted increase 114% year 2024, public being rise. (4) Conclusions: According our study, awareness drastically especially health. demonstrates increasing health AI, making advocacy education technology paramount importance.

Язык: Английский

Процитировано

Exploring the potential of using ChatGPT for rhetorical move-step analysis: The impact of prompt refinement, few-shot learning, and fine-tuning DOI

Minjin Kim,

Xiaofei Lu

Journal of English for Academic Purposes, Год журнала: 2024, Номер 71, С. 101422 - 101422

Опубликована: Июль 17, 2024

Язык: Английский

Процитировано

Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam DOI

Mehmet Buldur, Berkant Sezer

BMC Oral Health, Год журнала: 2024, Номер 24(1)

Опубликована: Май 24, 2024

Abstract Background The use of artificial intelligence in the field health sciences is becoming widespread. It known that patients benefit from applications on various issues, especially after pandemic period. One most important issues this regard accuracy information provided by applications. Objective purpose study was to frequently asked questions about dental amalgam, as determined United States Food and Drug Administration (FDA), which one these resources, Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) compare content answers given application with FDA. Methods were directed ChatGPT-4 May 8th 16th, 2023, responses recorded compared at word meaning levels using ChatGPT. FDA webpage also recorded. for similarity “Main Idea”, “Quality Analysis”, “Common Ideas”, “Inconsistent Ideas” between ChatGPT-4’s FDA’s responses. Results similar one-week intervals. In comparison guidance, it questions. However, although there some similarities general aspects recommendation regarding amalgam removal question, two texts are not same, they offered different perspectives replacement fillings. Conclusions findings indicate ChatGPT-4, an based application, encompasses current accurate its removal, providing individuals seeking access such information. Nevertheless, we believe numerous studies required assess validity reliability across diverse subjects.

Язык: Английский

Процитировано

Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis DOI

Huizi Yu, Lizhou Fan, Lingyao Li

и другие.

Journal of Healthcare Informatics Research, Год журнала: 2024, Номер 8(4), С. 658 - 711

Опубликована: Сен. 14, 2024

Язык: Английский

Процитировано

Comparing ChatGPT's correction and feedback comments with that of educators in the context of primary students' short essays written in English and Greek DOI

Emmanuel Fokides,

Eirini Peristeraki

Education and Information Technologies, Год журнала: 2024, Номер unknown

Опубликована: Июль 27, 2024

Язык: Английский

Процитировано