Large language models in cancer: potentials, risks, and safeguards DOI Creative Commons

MM Zitu,

Tien-Thinh Le, Tuyen Van Duong

et al.

Deleted Journal, Journal year: 2024, Issue 2(1)

Published: Dec. 20, 2024

Abstract This review examines the use of large language models (LLMs) in cancer, analysing articles sourced from PubMed, Embase, and Ovid Medline, published between 2017 and 2024. Our search strategy included terms related to LLMs, cancer research, risks, safeguards, and ethical issues, focusing on studies that utilized text-based data. 59 articles were reviewed, categorized into 3 segments: quantitative studies, chatbot-focused studies, and qualitative discussions of LLMs in cancer. Quantitative studies highlight LLMs' advanced capabilities in natural language processing (NLP), while chatbot-focused studies demonstrate their potential in clinical support and data management. Qualitative research underscores the broader implications, including risks and ethical considerations. The findings suggest that LLMs, notably ChatGPT, have potential in data analysis, patient interaction, and personalized treatment and care. However, the review identifies critical risks, biases, and challenges. We emphasize the need for regulatory oversight, targeted model development, and continuous evaluation. In conclusion, integrating LLMs into oncology offers promising prospects but necessitates a balanced approach to accuracy, integrity, and privacy. We advocate further study, encouraging responsible exploration and application of artificial intelligence in oncology.

Language: English

Large language models for data extraction from unstructured and semi-structured electronic health records: a multiple model performance evaluation DOI Creative Commons
Vasileios Ntinopoulos,

Hector Rodriguez Cetina Biefer,

I. Tudorache

et al.

BMJ Health & Care Informatics, Journal year: 2025, Issue 32(1), P. e101139 - e101139

Published: Jan. 1, 2025

Objectives We aimed to evaluate the performance of multiple large language models (LLMs) in data extraction from unstructured and semi-structured electronic health records. Methods 50 synthetic medical notes in English, each containing a structured and an unstructured part, were drafted and evaluated by domain experts and subsequently used for LLM prompting. 18 LLMs were evaluated against a baseline transformer-based model. Performance assessment comprised four entity extraction and five binary classification tasks, with a total of 450 predictions for each LLM. LLM response consistency was assessed over three same-prompt iterations. Results Claude 3.0 Opus, Claude 3.0 Sonnet, Claude 2.0, GPT 4, Claude 2.1, Gemini Advanced, PaLM 2 chat-bison, and Llama 3-70b exhibited excellent overall accuracy >0.98 (0.995, 0.988, 0.986, 0.982, respectively), significantly higher than the baseline RoBERTa model (0.742). Models including chat-bison and Sonnet showed marginally higher, and Gemini Advanced marginally lower, multiple-run consistency (Krippendorff's alpha values 1, 0.998, 0.996, 0.992, 0.991, 0.989, 0.985, respectively). Discussion The best-performing models, such as PaLM 2 chat-bison, exhibited outstanding performance in both entity extraction and classification with highly consistent responses. Their use could leverage research and unburden healthcare professionals. Real-data analyses are warranted to confirm their performance in a real-world setting. Conclusion LLMs seem to be able to reliably extract data from electronic health records. Further evaluation using real data is warranted.
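The evaluation design above, exact-match scoring of each model's predictions plus a three-iteration same-prompt consistency check, can be sketched in a few lines. All model names, labels, and the simplified consistency proxy below are hypothetical illustrations, not the study's code; the study itself reports Krippendorff's alpha rather than this cruder agreement fraction.

```python
def accuracy(preds, gold):
    # Fraction of exact-match predictions, pooling entity and binary tasks.
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def run_consistency(runs):
    # Fraction of items answered identically in every same-prompt run;
    # a crude stand-in for Krippendorff's alpha (1.0 = fully consistent).
    n_items = len(runs[0])
    return sum(len({run[i] for run in runs}) == 1 for i in range(n_items)) / n_items

# Hypothetical mini-example: five predictions per model against gold labels.
gold = ["aspirin", "yes", "no", "80 mg", "yes"]
models = {
    "model_a": ["aspirin", "yes", "no", "80 mg", "no"],
    "model_b": ["aspirin", "yes", "yes", "80mg", "no"],
}
scores = {name: accuracy(preds, gold) for name, preds in models.items()}

# Three same-prompt iterations of one model over three items.
alpha_proxy = run_consistency([
    ["aspirin", "yes", "no"],
    ["aspirin", "yes", "no"],
    ["aspirin", "no", "no"],
])
```

Note how exact-match scoring penalizes formatting drift ("80mg" vs. "80 mg"), one reason output-format consistency matters in extraction benchmarks.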

Language: English

Cited by

1

Understanding Reasons for Oral Anticoagulation Nonprescription in Atrial Fibrillation Using Large Language Models DOI Creative Commons
Sulaiman Somani, Dale Kim,

Eduardo J Pérez-Guerrero

et al.

Journal of the American Heart Association, Journal year: 2025, Issue unknown

Published: March 27, 2025

Background Rates of oral anticoagulation (OAC) nonprescription in atrial fibrillation approach 50%. Understanding reasons for OAC nonprescription may reduce gaps in guideline-recommended care. We aimed to identify reasons for OAC nonprescription from clinical notes using large language models. Methods We identified all patients with atrial fibrillation and associated clinical notes in our health care system with a clinician-billed visit, without another OAC indication, and stratified them on the basis of active OAC prescriptions. Three annotators labeled reasons for nonprescription in 10% of notes ("annotation set"). We engineered prompts for a generative model (Generative Pre-trained Transformer 4), trained a discriminative model (ClinicalBERT), and selected the best-performing model to predict reasons in the remaining 90% ("inference set"). Results A total of 35 737 patients were identified, of which 7712 (21.6%) did not have OAC prescribed, and 910 notes across 771 patients were annotated. Generative Pre-trained Transformer 4 outperformed ClinicalBERT (macro-F1 score 0.79, compared with 0.69 for ClinicalBERT). Using the inference set, 61.1% of patients had documented reasons for OAC nonprescription, most commonly alternative use of an antiplatelet agent (23.3%), therapeutic inertia (21.0%), and low burden of atrial fibrillation (17.1%). Conclusions This is the first study to use large language models to extract reasons for OAC nonprescription, revealing guideline-discordant practices and actionable insights for the development of interventions to reduce nonprescription.
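The macro-F1 metric used to compare the two models weights every reason category equally, so rare reasons count as much as common ones. A minimal sketch of the computation follows; the labels are invented for illustration and are not the study's annotation scheme.

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores: suits imbalanced label
    # sets, since rare classes contribute as much as frequent ones.
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical reason labels for OAC nonprescription.
y_true = ["antiplatelet", "inertia", "low_burden", "inertia", "antiplatelet"]
y_pred = ["antiplatelet", "inertia", "inertia", "inertia", "antiplatelet"]
score = macro_f1(y_true, y_pred)  # (1.0 + 0.8 + 0.0) / 3 = 0.6
```

Missing the single "low_burden" case drags the macro average down sharply, exactly the sensitivity to rare categories that motivates this metric here.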

Language: English

Cited by

1

Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review DOI Creative Commons
Xinsong Du, Yifei Wang, Zhengyang Zhou

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue unknown

Published: Aug. 12, 2024

Background: Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges. Objective: This study aims to systematically review the use of generative LLMs and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions. Methods: A Boolean search for peer-reviewed articles was conducted on May 19th, 2024, in PubMed and Web of Science to include research published since 2023, which was one month after the release of ChatGPT. The results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal data, and evaluation matrices. Additionally, we identified the current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions. Results: The initial search identified 6,328 unique studies, with 76 studies included after screening. Of these, 67 (88.2%) employed zero-shot prompting, five of them reporting 100% accuracy on specific tasks. Nine used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one noting a non-linear relationship between the number of examples and performance improvement. Eight studies explored fine-tuning, all reporting improvements on specific tasks, but three noted potential performance degradation on certain tasks. Two studies utilized LLM-based decision-making, which enabled accurate disease diagnosis and prognosis. 55 different evaluation metrics were used for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had understanding LLM responses. Conclusion: Our review indicates that few studies have used advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to these challenges.
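The zero-shot versus few-shot distinction the review tallies (67 studies zero-shot, nine with advanced strategies) comes down to whether in-context examples are packed into the prompt. A minimal prompt-builder sketch, with entirely hypothetical task text and note snippets:

```python
def build_prompt(task, note, examples=()):
    # Zero-shot when `examples` is empty; few-shot otherwise, with each
    # in-context example rendered as a worked Note/Answer pair.
    parts = [f"Task: {task}"]
    for ex_note, ex_answer in examples:
        parts += [f"Note: {ex_note}", f"Answer: {ex_answer}"]
    parts += [f"Note: {note}", "Answer:"]
    return "\n".join(parts)

zero_shot = build_prompt(
    "Extract the discharge diagnosis.",
    "Pt admitted with CHF exacerbation, diuresed, discharged stable.",
)
few_shot = build_prompt(
    "Extract the discharge diagnosis.",
    "Pt admitted with CHF exacerbation, diuresed, discharged stable.",
    examples=[("Pt with CAP, improved on abx.", "Community-acquired pneumonia")],
)
```

The non-linear relationship between example count and improvement noted by one reviewed study suggests that simply appending more pairs to `examples` does not guarantee better output.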

Language: English

Cited by

4

Prompts to Table: Specification and Iterative Refinement for Clinical Information Extraction with Large Language Models DOI Creative Commons
David Hein, Alana Christie, Michael J. Holcomb

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2025, Issue unknown

Published: Feb. 13, 2025

Extracting structured data from free-text medical records is laborious and error-prone. Traditional rule-based and early neural network methods often struggle with domain complexity and require extensive tuning. Large language models (LLMs) offer a promising solution but must be tailored to nuanced clinical knowledge and complex, multipart entities. We developed a flexible, end-to-end LLM pipeline to extract diagnoses, per-specimen anatomical sites, procedures, histology, and detailed immunohistochemistry results from pathology reports. A human-in-the-loop process to create validated reference annotations for a development set of 152 kidney tumor reports guided iterative refinement. To drive assessment of performance, we developed a comprehensive error ontology, categorizing errors by significance (major vs. minor), source (LLM, manual annotation, or insufficient instructions), and contextual origin. The finalized pipeline was applied to 3,520 internal reports (of which 2,297 had pre-existing templated data available for cross-referencing) and evaluated for adaptability using 53 publicly available breast cancer reports. After six iterations, major errors on the development set decreased to 0.99% (14/1413 entities). We identified 11 key contexts in which complications arose, including history integration, entity linking, and specification granularity, which provided valuable insight into understanding our research goals. Using the templated data as reference, the pipeline achieved a macro-averaged F1 score of 0.99 for identifying subtypes and 0.97 for detecting metastasis. When adapted to the breast cancer dataset, three iterations were required to align domain-specific instructions, attaining 89% agreement with curated data. This work illustrates that LLM-based extraction pipelines can achieve near expert-level accuracy with carefully constructed instructions tailored to specific aims. Beyond raw metrics, the refinement process itself, balancing specificity and relevance, proved essential. The approach offers a transferable blueprint for applying emerging LLM capabilities to other complex information extraction tasks.
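An error ontology like the one described, with errors tagged by significance and source, lends itself to simple tallying during iterative refinement. The sketch below uses invented error records and counts purely for illustration; the paper's actual ontology also tracks contextual origin and its entity counts differ.

```python
from collections import Counter

# Hypothetical flagged-entity records along two of the ontology's axes:
# significance (major vs. minor) and source (LLM, manual annotation,
# or insufficient instructions).
errors = [
    ("major", "llm"),
    ("minor", "manual_annotation"),
    ("minor", "insufficient_instructions"),
    ("major", "insufficient_instructions"),
]
total_entities = 400  # hypothetical development-set entity count

by_significance = Counter(sig for sig, _src in errors)
by_source = Counter(src for _sig, src in errors)
major_error_rate = by_significance["major"] / total_entities
```

Separating "insufficient instructions" from genuine LLM mistakes is the design choice that makes iteration productive: those errors are fixed by editing the prompt, not the model.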

Language: English

Cited by

0

CancerFusionPrompt: A Novel Framework for Multimodal Cancer Subtype Classification Using Vision‐Language Model DOI Open Access
Ruonan Liu, Muhammad Tariq Ayoub, Junaid Abdul Wahid

et al.

Expert Systems, Journal year: 2025, Issue 42(5)

Published: March 21, 2025

ABSTRACT Background Cancer subtype classification plays a pivotal role in personalised medicine, requiring the integration of diverse data types. Traditional prompting methods for vision-language models fail to fully leverage multimodal data, particularly when working with minimal labelled data. Methods To address these limitations, we propose a novel framework that introduces CancerFusionPrompt, a specialised prompting method for integrating imaging and multi-omics data. Our proposed approach extends the few-shot learning paradigm by incorporating in-context learning for cancer subtype classification. Results The framework significantly outperforms state-of-the-art techniques in cancer subtype classification, achieving notable improvements in both accuracy and generalisation. These results demonstrate the superior capability of CancerFusionPrompt in handling complex multimodal inputs compared with existing methods. Conclusions CancerFusionPrompt offers a powerful solution for multimodal classification tasks. By overcoming the limitations of current prompting methods, it enables more accurate and robust predictions.

Language: English

Cited by

0

Comprehensive testing of large language models for extraction of structured data in pathology DOI Creative Commons
Bastian Grothey,

Jan Odenkirchen,

Alen Brkic

et al.

Communications Medicine, Journal year: 2025, Issue 5(1)

Published: March 31, 2025

Abstract Background Pathology departments generate large volumes of unstructured data as free-text diagnostic reports. Converting these reports into structured formats for analytics or artificial intelligence projects requires substantial manual effort by specialized personnel. While recent studies show promise in using advanced language models for structuring pathology data, they primarily rely on proprietary models, raising cost and privacy concerns. Additionally, important aspects such as prompt engineering, model quantization, and deployment on consumer-grade hardware remain unaddressed. Methods We created a dataset of 579 annotated pathology reports in German and English versions. Six language models (proprietary: GPT-4; open-source: Llama2 13B, 70B, Llama3 8B, Qwen2.5 7B) were evaluated for their ability to extract eleven key parameters from these reports. Additionally, we investigated performance across different prompting strategies and quantization techniques to assess practical deployment scenarios. Results Here we show that open-source models extract structured data with high precision, matching the accuracy of the proprietary GPT-4 model. The extraction precision varies significantly across model configurations. These variations depend on the specific quantization methods and prompting strategies used during deployment. Conclusions Open-source language models demonstrate performance comparable to proprietary solutions in structuring pathology report data. This finding has significant implications for healthcare institutions seeking cost-effective, privacy-preserving data structuring solutions. The identified configurations provide valuable insights for deployment in pathology departments. Our publicly available bilingual dataset serves as both a benchmark and a resource for future research.

Language: English

Cited by

0

Bootstrapping BI-RADS classification using large language models and transformers in breast magnetic resonance imaging reports DOI Creative Commons
Yuxin Liu, Xiang Zhang, Weiwei Cao

et al.

Visual Computing for Industry Biomedicine and Art, Journal year: 2025, Issue 8(1)

Published: April 3, 2025

Breast cancer is one of the most common malignancies among women globally. Magnetic resonance imaging (MRI), as the final non-invasive diagnostic tool before biopsy, provides detailed free-text reports that support clinical decision-making. Therefore, effective utilization of the information in MRI reports to make reliable decisions is crucial for patient care. This study proposes a novel method for BI-RADS classification using breast MRI reports. Large language models are employed to transform free-text reports into structured reports. Specifically, missing category information (MCI) absent from the reports is supplemented by assigning default values to the missing categories. To ensure data privacy, a locally deployed Qwen-Chat model is employed. Furthermore, to enhance domain-specific adaptability, a knowledge-driven prompt is designed. The Qwen-7B-Chat model is fine-tuned specifically for report structuring. To prevent information loss and enable comprehensive learning of all report details, a fusion strategy is introduced, combining free-text and structured reports to train the classification model. Experimental results show that the proposed method outperforms existing methods across multiple evaluation metrics. An external test set from a different hospital is used to validate the robustness of the approach. The method surpasses GPT-4o in terms of classification performance. Ablation experiments confirm the contributions of the knowledge-driven prompt, the MCI supplementation, and the fusion strategy to the model's performance.
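The MCI supplementation step, filling categories that a free-text report never mentions with default values so every structured report shares the same schema, can be sketched as a dictionary completion. The category names and default strings below are hypothetical; the paper's actual BI-RADS schema is not reproduced here.

```python
# Hypothetical report categories with default values for missing
# category information (MCI).
SCHEMA_DEFAULTS = {
    "mass": "not mentioned",
    "calcification": "not mentioned",
    "enhancement": "not mentioned",
    "birads_category": "unknown",
}

def supplement_mci(extracted):
    # Fill categories absent from the LLM-structured report with their
    # defaults, so downstream training sees a fixed set of fields.
    return {key: extracted.get(key, default)
            for key, default in SCHEMA_DEFAULTS.items()}

structured = supplement_mci({"mass": "irregular mass, 12 mm",
                             "birads_category": "4"})
```

Fixing the field set this way means the downstream classifier never has to distinguish "field absent" from "field present but negative", which is one plausible reason the ablation credits MCI supplementation.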

Language: English

Cited by

0

Roles and Potential of Large Language Models in Healthcare: A Comprehensive Review DOI Creative Commons
Chi‐Hung Lin, Chang‐Fu Kuo

Biomedical Journal, Journal year: 2025, Issue unknown, P. 100868 - 100868

Published: April 1, 2025

Large Language Models (LLMs) are capable of transforming healthcare, demonstrating remarkable capabilities in language understanding and generation. They have matched or surpassed human performance on standardized medical examinations and assisted diagnostics across specialties like dermatology, radiology, and ophthalmology. LLMs can enhance patient education by providing accurate, readable, and empathetic responses, and they can streamline clinical workflows through efficient information extraction from unstructured data such as clinical notes. Integrating LLMs into clinical practice involves user interface design, clinician training, and effective collaboration between Artificial Intelligence (AI) systems and healthcare professionals. Users must possess a solid understanding of generative AI and relevant domain knowledge to assess the generated content critically. Ethical considerations, ensuring privacy and security, mitigating biases, and maintaining transparency, are critical for responsible deployment. Future directions include interdisciplinary collaboration, developing new benchmarks that incorporate safety and ethical measures, advancing multimodal LLMs that integrate text and imaging data, creating LLM-based agents for complex decision-making, addressing underrepresented and rare diseases, and integrating LLMs with robotic systems for precision procedures. Emphasizing safety, integrity, and human-centered implementation is essential for maximizing the benefits of LLMs while mitigating potential risks, thereby helping these tools augment rather than replace the expertise and compassion of healthcare.

Language: English

Cited by

0

Large Language Model-Based Multi-source Integration Pipeline for Automated Diagnostic Classification and Zero-Shot Prognoses for Brain Tumor DOI Creative Commons
Zhuoqi Ma, Lulu Bi,

P. Collins

et al.

Meta-Radiology, Journal year: 2025, Issue unknown, P. 100150 - 100150

Published: April 1, 2025

Language: English

Cited by

0

A strategy for cost-effective large language model use at health system-scale DOI Creative Commons
Eyal Klang, Donald U. Apakama, Ethan Abbott

et al.

npj Digital Medicine, Journal year: 2024, Issue 7(1)

Published: Nov. 18, 2024

Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at health system scale are underexplored. We evaluated how concatenating queries, with multiple notes and tasks handled simultaneously, affects model performance under increasing workloads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments across various task configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but performance deteriorated after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models could concatenate up to 50 simultaneous tasks effectively, with validation on a public medical dataset. An economic analysis demonstrated a 17-fold cost reduction using concatenation. These results identify the limits for effective task concatenation and highlight avenues for cost-efficiency at enterprise scale.
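The economics behind concatenation can be sketched with back-of-the-envelope token arithmetic: the clinical note and the fixed instruction overhead are paid once per request, so batching many questions about the same note amortizes them. All token counts below are invented for illustration; the paper's own 17-fold figure rests on its measured workloads and pricing.

```python
def prompt_tokens(note_tokens, question_tokens, n_questions, overhead=200):
    # Tokens in one prompt carrying `n_questions` about a single note;
    # the note text and fixed instructions appear once per request.
    return overhead + note_tokens + n_questions * question_tokens

def total_tokens(n_questions_total, batch, note_tokens=1500, question_tokens=30):
    # Total prompt tokens when `batch` questions are concatenated
    # into each request.
    n_requests = -(-n_questions_total // batch)  # ceiling division
    return n_requests * prompt_tokens(note_tokens, question_tokens, batch)

one_at_a_time = total_tokens(50, batch=1)    # note re-sent with every question
concatenated = total_tokens(50, batch=50)    # one request, all 50 questions
savings_factor = one_at_a_time / concatenated
```

Under these illustrative numbers the note dominates each prompt, so batching all 50 questions cuts prompt tokens by more than an order of magnitude; the study's observed accuracy degradation at high task counts is the countervailing limit on how far `batch` can be pushed.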

Language: English

Cited by

3