Development and Validation of a Generative Artificial Intelligence-Based Pipeline for Automated Clinical Data Extraction from Electronic Health Records: Technical Implementation Study (Preprint)
Marvin N. Carlisle, William A. Pace, Andrew W. Liu

et al.

Published: Dec. 30, 2024

Background: Manual abstraction of unstructured clinical data is often necessary for granular outcomes research, but it is time consuming and of variable quality. Large language models (LLMs) show promise in medical data extraction, yet integrating them into research workflows remains challenging and poorly described. We developed and integrated an LLM-based system for automated extraction from electronic health record (EHR) text reports within an established research database. Methods: We implemented a generative AI pipeline (UODBLLM) utilizing a flexible model interface that supports various LLM implementations, including HIPAA-compliant cloud services and local open-source models. The pipeline used XML-structured prompts and open database connectivity to generate structured documentation from the EHR. We evaluated UODBLLM's performance on completion rate, processing time, and extraction capabilities across multiple data elements, including quantitative measurements, categorical assessments, and anatomical descriptions. System reliability was tested in batches to assess scalability and consistency. Results: Piloted against prostate MRI reports, UODBLLM processed 1,800 documents with a 100% completion rate and an average of 8.90 seconds per report. Token utilization averaged 2,692 tokens per report, with an input-to-output ratio of approximately 6.5:1, resulting in a cost of roughly $0.009 per report. Performance was consistent across 18 batches of 100 reports each, all completed within 4.45 hours. From each report, 16 data elements were extracted, including prostate volume, PSA values, PI-RADS scores, staging, and related assessments. All outputs were automatically validated against predefined schemas and stored in standardized JSON format. Conclusion: UODBLLM demonstrated successful integration with an existing research database, achieving rapid, comprehensive extraction at minimal cost. It provides a scalable, efficient solution for automating data abstraction while maintaining PHI security. This approach could significantly accelerate research timelines and expand the scope of feasible studies, particularly for large-scale projects.
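The validation step described above (checking each model output against a predefined schema before storing it as JSON) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names, value bounds, and use of the jsonschema library are assumptions.

```python
# Hypothetical schema-validation step for LLM-extracted prostate MRI fields.
# Field names and bounds are illustrative assumptions, not UODBLLM's schema.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

REPORT_SCHEMA = {
    "type": "object",
    "properties": {
        "prostate_volume_cc": {"type": ["number", "null"], "minimum": 0},
        "psa_ng_ml": {"type": ["number", "null"], "minimum": 0},
        "pirads_score": {"type": ["integer", "null"], "minimum": 1, "maximum": 5},
    },
    "required": ["prostate_volume_cc", "psa_ng_ml", "pirads_score"],
}

def validate_extraction(llm_output: str) -> dict | None:
    """Parse the model's JSON answer and check it against the schema."""
    try:
        record = json.loads(llm_output)
        validate(instance=record, schema=REPORT_SCHEMA)
        return record
    except (json.JSONDecodeError, ValidationError):
        return None  # flag the report for manual review

row = validate_extraction('{"prostate_volume_cc": 45.2, "psa_ng_ml": 6.1, "pirads_score": 4}')
```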

Language: English

ChatGPT and assistive AI in structured radiology reporting: A systematic review
Ethan Sacoransky, Benjamin Y. M. Kwan, Donald Soboleski

et al.

Current Problems in Diagnostic Radiology, Journal Year: 2024, Volume and Issue: 53(6), P. 728 - 737

Published: July 9, 2024

The rise of transformer-based large language models (LLMs), such as ChatGPT, has captured global attention amid recent advancements in artificial intelligence (AI). ChatGPT demonstrates growing potential in structured radiology reporting, a field where AI has traditionally focused on image analysis. A comprehensive search of MEDLINE and Embase was conducted from inception through May 2024, and primary studies discussing ChatGPT's role in structured reporting were selected based on their content. Of the 268 articles screened, eight were ultimately included in this review. These studies explored various applications: generating structured reports from unstructured reports, extracting data from free text, generating impressions from findings, and creating structured imaging data. All demonstrated optimism regarding ChatGPT's potential to aid radiologists, though common critiques included privacy concerns, reliability, medical errors, and the lack of medical-specific training. ChatGPT and assistive AI have significant potential to transform radiology reporting, enhancing accuracy and standardization while optimizing healthcare resources. Future developments may involve integrating dynamic few-shot prompting and Retrieval Augmented Generation (RAG) into diagnostic workflows. Continued research, development, and ethical oversight are crucial to fully realize AI's potential in radiology.

Language: English

Citations: 12

Lung Cancer Staging Using Chest CT and FDG PET/CT Free-Text Reports: Comparison Among Three ChatGPT Large-Language Models and Six Human Readers of Varying Experience
Jong Eun Lee, Ki Seong Park, Yun‐Hyeon Kim

et al.

American Journal of Roentgenology, Journal Year: 2024, Volume and Issue: 223(6)

Published: Sept. 4, 2024

Although radiology reports are commonly used for lung cancer staging, this task can be challenging given radiologists' variable reporting styles as well as reports' potentially ambiguous and/or incomplete staging-related information.

Language: English

Citations: 8

Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in Diagnosis Please quiz cases
Yuki Sonoda, Ryo Kurokawa, Akifumi Hagiwara

et al.

Japanese Journal of Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 3, 2024

Abstract Purpose: Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology (specifically, using a standardized template to first organize clinical information into predefined categories such as patient information, history, symptoms, and examinations before making diagnoses, instead of one-step processing) can improve the LLM's diagnostic capabilities. Materials and methods: Three hundred twenty-two quiz questions from Radiology's Diagnosis Please cases (1998–2023) were used. We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: (1) baseline, a conventional zero-shot chain-of-thought prompt; (2) two-step approach, in which the LLM first systematically organizes the case into distinct categories (history, imaging findings), then separately analyzes this organized information to provide diagnoses; and (3) summary-only approach, using only the LLM-generated summary for diagnoses. Results: The two-step approach significantly outperformed both the baseline and summary-only approaches in diagnostic accuracy, as determined by McNemar's test. Primary diagnosis accuracy was 60.6% for the two-step approach, compared with 56.5% for baseline (p = 0.042) and 56.3% for summary-only (p = 0.035). For top-3 diagnoses, accuracies were 70.5%, 66.5%, and 65.5%, respectively (p = 0.005 vs. baseline, p = 0.008 vs. summary-only). No significant differences were observed between the baseline and summary-only approaches. Conclusion: Our results indicate that structured prompting aligned with clinical reasoning enhances the LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information, and it aligns well with established clinical reasoning processes, suggesting its applicability in real-world settings.
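A minimal sketch of the two-step pattern described above, assuming a generic chat-completion client; `call_llm` and both prompt texts are hypothetical stand-ins, not the authors' actual prompts.

```python
# Two-step prompting: organize the case into clinical categories first,
# then diagnose from the organized summary. Prompts are illustrative.
def call_llm(prompt: str) -> str:
    """Hypothetical stub; wire this to any chat-completion client."""
    raise NotImplementedError

ORGANIZE_PROMPT = (
    "Organize the following case into these categories: patient information, "
    "history, symptoms, examination and imaging findings.\n\nCase: {case_text}"
)
DIAGNOSE_PROMPT = (
    "Using the organized summary below, list the three most likely diagnoses "
    "in order of probability, with brief reasoning.\n\nSummary: {summary}"
)

def two_step_diagnosis(case_text: str) -> str:
    summary = call_llm(ORGANIZE_PROMPT.format(case_text=case_text))  # step 1: structure
    return call_llm(DIAGNOSE_PROMPT.format(summary=summary))         # step 2: diagnose
```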

Language: English

Citations: 4

Attitudes of radiologists and interns toward the adoption of GPT-like technologies: a National Survey Study in China
Tianyi Xia, Shijun Zhang, Ben Zhao

et al.

Insights into Imaging, Journal Year: 2025, Volume and Issue: 16(1)

Published: Jan. 31, 2025

Language: English

Citations: 0

Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports
Su Hwan Kim, Severin Schramm, Lisa C. Adams

et al.

npj Digital Medicine, Journal Year: 2025, Volume and Issue: 8(1)

Published: Feb. 12, 2025

Abstract Recent advancements in large language models (LLMs) have created new ways to support radiological diagnostics. While both open-source and proprietary LLMs can address privacy concerns through local or cloud deployment, open-source models provide advantages in continuity of access and potentially lower costs. This study evaluated the diagnostic performance of fifteen open-source LLMs and one closed-source LLM (GPT-4o) on 1,933 cases from the Eurorad library. The LLMs provided differential diagnoses based on clinical history and imaging findings. Responses were considered correct if the true diagnosis appeared among the top three suggestions. Models were further tested on 60 non-public brain MRI cases from a tertiary hospital to assess generalizability. In both datasets, GPT-4o demonstrated superior performance, closely followed by Llama-3-70B, revealing how open-source LLMs are rapidly closing the gap to closed-source models. Our findings highlight the potential of open-source LLMs as decision support tools for challenging, real-world diagnostic cases.
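The scoring rule used here (a case counts as correct if the true diagnosis appears among the model's top three suggestions) reduces to a short function. The exact lowercase string matching below is an assumption; the study's matching procedure may have been more lenient.

```python
# Top-3 accuracy: fraction of cases whose true diagnosis appears among
# the first three differential suggestions. Matching is naive lowercase
# string equality, an illustrative assumption.
def top3_accuracy(predictions: list[list[str]], truths: list[str]) -> float:
    hits = sum(
        truth.strip().lower() in [p.strip().lower() for p in preds[:3]]
        for preds, truth in zip(predictions, truths)
    )
    return hits / len(truths)

# 1 of 2 example cases has the true diagnosis in the top three -> 0.5
score = top3_accuracy(
    [["glioblastoma", "metastasis", "lymphoma"], ["meningioma", "schwannoma"]],
    ["lymphoma", "arachnoid cyst"],
)
```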

Language: English

Citations: 0

Llama 3.1 405B Is Comparable to GPT-4 for Extraction of Data from Thrombectomy Reports—A Step Towards Secure Data Extraction
Nils Christian Lehnen, Johannes Kürsch, Barbara Wichtmann

et al.

Clinical Neuroradiology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 25, 2025

Abstract Purpose: GPT-4 has been shown to correctly extract procedural details from free-text reports on mechanical thrombectomy. However, GPT may not be suitable for analyzing reports containing personal data. The purpose of this study was to evaluate the ability of the large language models (LLMs) Llama 3.1 405B, Llama 3 70B, Llama 3 8B, and Mixtral 8x7B, which can be operated offline, to extract data from free-text reports on mechanical thrombectomies. Methods: Free-text thrombectomy reports from two institutions were included. A detailed prompt was used in German and English. Data extracted by the LLMs were compared using McNemar's test. Manual entries made by an interventional neuroradiologist served as the reference standard. Results: 100 reports from institution 1 (mean age 74.7 ± 13.2 years; 53 females) and 30 reports from institution 2 (mean age 72.7 ± 13.5 years; 18 males) were included. Llama 3.1 405B correctly extracted 2619 of 2800 data points (93.5% [95% CI: 92.6%, 94.4%], p = 0.39 vs. GPT-4). Llama 3 70B extracted 2537 data points correctly with the English prompt (90.6% [95% CI: 89.5%, 91.7%], p < 0.001 vs. GPT-4) and 2471 with the German prompt (88.2% [95% CI: 87.0%, 89.4%], p < 0.001 vs. GPT-4). Llama 3 8B extracted 2314 data points correctly, and Mixtral 8x7B extracted 2411 correctly (86.1% [95% CI: 84.8%, 87.4%]). Conclusion: Llama 3.1 405B was equal to GPT-4 for data extraction from free-text reports on mechanical thrombectomies and may represent a secure alternative when operated locally.
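The paired comparison reported above (per-data-point correctness of two models on the same reports, tested with McNemar's test) can be reproduced with statsmodels; the counts below are placeholders, not the study's data.

```python
# McNemar's test on paired extraction outcomes of two models.
# The 2x2 counts are illustrative placeholders.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# rows: model A correct / incorrect; columns: model B correct / incorrect
table = np.array([
    [2500, 119],  # both correct | only A correct
    [100,   81],  # only B correct | both incorrect
])
result = mcnemar(table, exact=False, correction=True)  # chi-square form
print(f"statistic={result.statistic:.2f}, p={result.pvalue:.4f}")
```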

Language: English

Citations: 0

Effectiveness of Large Language Models in Stroke Rehabilitation Health Education: A Comparative Study of ChatGPT-4, MedGo, Qwen, and ERNIE Bot (Preprint)

Shiqi Qiang, Yang Liao, Yongchun Gu

et al.

Published: Feb. 28, 2025

BACKGROUND: Stroke is a leading cause of disability and death worldwide, with home-based rehabilitation playing a crucial role in improving patient prognosis and quality of life. Traditional health education models often fall short in terms of precision, personalization, and accessibility. In contrast, large language models (LLMs) are gaining attention for their potential in medical health education, owing to their advanced natural language processing capabilities. However, the effectiveness of LLMs in stroke rehabilitation health education remains uncertain. OBJECTIVE: This study evaluates four LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot) in stroke rehabilitation health education. The aim is to offer patients more precise and secure health education pathways while exploring the feasibility of using LLMs to guide education. METHODS: In the first phase of this study, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to stroke rehabilitation. These were input into the four LLMs in simulated consultations. Six experts (2 clinicians, 2 nursing specialists, 2 therapists) evaluated the LLM-generated responses on a 5-point Likert scale, assessing accuracy, completeness, readability, safety, and humanity. In the second phase, the top two performing LLMs from phase one were selected, and thirty patients undergoing stroke rehabilitation were recruited. Each patient asked both models 3 questions and rated their satisfaction; responses were also assessed for text length and recommended reading age using a Chinese readability analysis tool. Data were analyzed with one-way ANOVA, post hoc Tukey HSD tests, and paired t-tests. RESULTS: The results revealed significant differences across five dimensions: accuracy (P = .002), completeness (P < .001), readability (P = .04), safety (P = .007), and humanity (P < .001). ChatGPT-4 outperformed all models on each dimension, with mean scores ranging from 4.28 (SD 0.84) to 4.65 (SD 0.66) for user-friendliness; MedGo also scored highly (e.g., M = 4.06, SD 0.78). Qwen and ERNIE Bot scored significantly lower across dimensions compared with ChatGPT-4 and MedGo. ChatGPT-4 generated the longest responses (M = 1338.35, SD 236.03) and had the highest recommended reading age score (12.88). Overall, ChatGPT-4 performed best and provided the clearest responses. CONCLUSIONS: LLMs have shown strong performance in stroke rehabilitation health education, demonstrating potential for real-world applications, but further improvements are needed in professionalism and medical oversight.
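The phase-1 analysis named above (one-way ANOVA across the four models followed by post hoc Tukey HSD) looks like this in standard scientific-Python tooling; all ratings below are fabricated placeholders for illustration only.

```python
# One-way ANOVA across four models' expert ratings, then Tukey HSD.
# All scores are made-up placeholders, not study data.
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

chatgpt4 = [5, 4, 5, 4, 4, 5]
medgo = [4, 4, 5, 4, 3, 4]
qwen = [3, 4, 3, 3, 4, 3]
ernie = [3, 3, 4, 3, 3, 3]

f_stat, p_value = f_oneway(chatgpt4, medgo, qwen, ernie)  # omnibus test

scores = chatgpt4 + medgo + qwen + ernie
groups = ["ChatGPT-4"] * 6 + ["MedGo"] * 6 + ["Qwen"] * 6 + ["ERNIE Bot"] * 6
print(pairwise_tukeyhsd(scores, groups))  # pairwise model differences
```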

Language: English

Citations: 0

Automated Identification and Representation of System Requirements Based on Large Language Models and Knowledge Graphs
Lei Wang, Mingchao Wang, Yuan-Rong Zhang

et al.

Applied Sciences, Journal Year: 2025, Volume and Issue: 15(7), P. 3502 - 3502

Published: March 23, 2025

In the product design and manufacturing process, effective management and representation of system requirements (SRs) are crucial for ensuring quality and consistency. However, current methods are hindered by document ambiguity, weak requirement interdependencies, and limited semantic expressiveness in model-based systems engineering. To address these challenges, this paper proposes a prompt-driven integrated framework that synergizes large language models (LLMs) and knowledge graphs (KGs) to automate the visualization of SR text through structured extraction. Specifically, it introduces a prompt template for information extraction tailored to arbitrary requirements documents, designed around five SysML-defined categories: functional requirements, interface requirements, performance requirements, physical requirements, and design constraints. By defining the elements of each category and leveraging the GPT-4 model to extract key information from unstructured texts, the framework can effectively present requirement information. Furthermore, it constructs a knowledge graph to represent the extracted requirements, visually illustrating the interdependencies and constraints between them. A case study applying the approach to Chapters 18–22 of the 'Code for Design of Metro' demonstrates the effectiveness of the proposed method in automating SR representation, enhancing traceability, and improving requirements management. Moreover, a comparison of extraction accuracy among GPT-4, GPT-3.5-turbo, BERT, and RoBERTa on the same dataset reveals that GPT-4 achieves an overall accuracy of 84.76%, compared with 79.05% for GPT-3.5-turbo and 59.05% for both BERT and RoBERTa. This demonstrates that the proposed framework provides a new technical pathway for intelligent requirements management.
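A sketch of the two moving parts the paper combines: a category-constrained extraction prompt and a dependency graph over the extracted requirements. The prompt wording, JSON layout, and use of networkx are assumptions, not the authors' artifacts.

```python
# Category-constrained extraction prompt plus a requirements dependency graph.
# Wording and JSON layout are illustrative assumptions.
import networkx as nx  # pip install networkx

EXTRACTION_PROMPT = (
    "From the requirements document below, extract every system requirement "
    "and label it with one of five categories: functional, interface, "
    "performance, physical, design constraint. Return JSON: "
    '[{{"id": ..., "text": ..., "category": ..., "depends_on": [...]}}]\n\n'
    "Document:\n{document_text}"
)

def build_requirement_graph(requirements: list[dict]) -> nx.DiGraph:
    """Turn extracted requirements into a dependency graph (the KG step)."""
    g = nx.DiGraph()
    for req in requirements:
        g.add_node(req["id"], text=req["text"], category=req["category"])
        for dep in req.get("depends_on", []):
            g.add_edge(req["id"], dep)  # edge: requirement depends on `dep`
    return g
```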

Language: English

Citations: 0

AI-Based Nanotoxicity Data Extraction and Prediction of Nanotoxicity
Eunyong Ha, Seung Min Ha, Zayakhuu Gerelkhuu

et al.

Computational and Structural Biotechnology Journal, Journal Year: 2025, Volume and Issue: unknown

Published: April 1, 2025

With the growing use of nanomaterials (NMs), assessing their toxicity has become increasingly important. Among toxicity assessment methods, computational models for predicting nanotoxicity are emerging as alternatives to traditional in vitro and in vivo assays, which involve high costs and ethical concerns. As a result, the qualitative and quantitative importance of data is now widely recognized. However, collecting large, high-quality data is both time-consuming and labor-intensive. Artificial intelligence (AI)-based extraction techniques hold significant potential for extracting and organizing information from unstructured text, yet large language models (LLMs) with prompt engineering have not been widely studied for this task. In this study, we developed an AI-based automated data extraction pipeline to facilitate efficient data collection. The automation process was implemented using Python-based LangChain. We used 216 research articles as a training set to refine prompts and evaluate LLM performance. Subsequently, the most suitable LLM with the refined prompts was used to extract test data from 605 articles. Extraction performance achieved F1D.E. (F1 score of Data Extraction) ranging from 84.6% to 87.6% across different LLMs. Furthermore, using the extracted dataset as a training set, we constructed automated machine learning (AutoML) models that achieved F1N.P. (F1 score of Nanotoxicity Prediction) exceeding 86.1% for predicting nanotoxicity. Additionally, we assessed the reliability and applicability of the extracted datasets by comparing them with ground truth in terms of data size and balance. This study highlights the potential of LLM-based data extraction, representing a meaningful contribution to nanotoxicity research.
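The field-level F1 metric (F1D.E.) implied above can be computed over extracted versus ground-truth records; representing each record as an (article, field, value) triple is an assumption for illustration.

```python
# F1 over extracted (article, field, value) triples vs. a curated ground truth.
def extraction_f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

pred = {("doi:1", "material", "TiO2"), ("doi:1", "assay", "MTT")}
gold = {("doi:1", "material", "TiO2"), ("doi:1", "dose", "50 ug/mL")}
print(round(extraction_f1(pred, gold), 3))  # 0.5
```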

Language: English

Citations: 0

Privacy-Preserving Large Language Model for Matching Findings and Tracking Interval Changes in Longitudinal Radiology Reports
Tejas Sudharshan Mathai, Boah Kim, Oana M Stroie

et al.

Deleted Journal, Journal Year: 2025, Volume and Issue: unknown

Published: April 11, 2025

In current radiology practice, radiologists identify a finding in the imaging exam, manually match it against its description in the prior exam report, and assess interval changes. Large Language Models (LLMs) can match findings, but their ability to track interval changes has not been tested. The goal of this study was to determine the utility of a privacy-preserving LLM for matching findings between two reports (prior and follow-up) and tracking changes in their size. In this retrospective study, body MRI reports from the NIH (internal dataset) were collected. A two-stage framework was employed: in Stage 1, the LLM took a sentence from the follow-up report and discovered the matched finding in the prior report; in Stage 2, it predicted the change status (increase, decrease, or stable) of matched findings. Seven LLMs run locally were evaluated, and the best model was validated on an external non-contrast chest CT dataset. Agreement with the reference standard (radiologist) was measured using Cohen's Kappa (κ). The internal dataset had 240 studies (120 patients; mean age, 47 ± 16 years; 65 men), and the external dataset contained 134 studies (67 patients; mean age, 58 ± 18 years; 44 men). On the internal dataset, TenyxChat-7B fared best for finding matching with an F1-score of 85.4% (95% CI: 80.8, 89.9) over the other LLMs (p < 0.05). For change detection, the same model achieved an F1-score of 62.7% and showed moderate agreement with the radiologist (κ = 0.46, 95% CI: 0.37, 0.55). On the external dataset, it attained F1-scores of 81.8% (95% CI: 74.4, 89.1) for matching and 77.4% for change detection, with substantial agreement (κ = 0.64, 95% CI: 0.49, 0.80). A privacy-preserving LLM can thus match findings and track interval changes across longitudinal reports; in structured reporting, it could pre-fill the "Findings" section of the next report with a summary of important prior findings, and it could also enhance communication between the referring physician and the radiologist.
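The agreement statistic used throughout this study is Cohen's kappa over categorical change labels, which scikit-learn computes directly; the labels below are illustrative, not study data.

```python
# Cohen's kappa between model change-status labels and the radiologist.
from sklearn.metrics import cohen_kappa_score

llm_labels  = ["increase", "stable", "stable", "decrease", "increase"]
radiologist = ["increase", "stable", "decrease", "decrease", "increase"]

kappa = cohen_kappa_score(llm_labels, radiologist)
print(f"kappa = {kappa:.2f}")  # chance-corrected agreement
```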

Language: English

Citations: 0