From promise to practice: challenges and pitfalls in the evaluation of large language models for data extraction in evidence synthesis
Gerald Gartlehner, Leila C. Kahwati, Barbara Nußbaumer-Streit

et al.

BMJ evidence-based medicine, Journal Year: 2024, Volume and Issue: unknown, P. bmjebm-113199

Published: Dec. 20, 2024

Language: English

An outline of Prognostics and health management Large Model: Concepts, Paradigms, and challenges
Laifa Tao, Shangyu Li, Haifei Liu

et al.

Mechanical Systems and Signal Processing, Journal Year: 2025, Volume and Issue: 232, P. 112683 - 112683

Published: April 14, 2025

Language: English

Citations: 0

RoBIn: A Transformer-based model for risk of bias inference with machine reading comprehension

Abel Corrêa Dias, Viviane P. Moreira, João L. D. Comba

et al.

Journal of Biomedical Informatics, Journal Year: 2025, Volume and Issue: 166, P. 104819 - 104819

Published: April 16, 2025

Language: English

Citations: 0

Large language model-generated clinical practice guideline for appendicitis

Amy Boyle, Bright Huo, Patricia Sylla

et al.

Surgical Endoscopy, Journal Year: 2025, Volume and Issue: unknown

Published: April 18, 2025

Language: English

Citations: 0

Generative artificial intelligence use in evidence synthesis: A systematic review
Justin Clark, Belinda Barton, Loai Albarqouni

et al.

Research Synthesis Methods, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 19

Published: April 24, 2025

Abstract

Introduction: With the increasing accessibility of tools such as ChatGPT, Copilot, DeepSeek, Dall-E, and Gemini, generative artificial intelligence (GenAI) has been positioned as a potential, time-saving research tool, especially for synthesising evidence. Our objective was to determine whether GenAI can assist with evidence synthesis by assessing its performance in terms of accuracy, error rates, and time savings compared with the traditional expert-driven approach.

Methods: To systematically review the evidence, we searched five databases on 17 January 2025, synthesised outcomes reporting accuracy, error rates, or time taken, and appraised risk of bias with a modified version of QUADAS-2.

Results: We identified 3,071 unique records, 19 of which were included in our review. Most studies had a high or unclear risk of bias in Domain 1A: selection, Domain 2A: conduct, and Domain 1B: applicability of results. When GenAI was used for (1) searching, it missed 68% to 96% (median = 91%) of studies; (2) screening, it made incorrect inclusion decisions ranging from 0% to 29% (median = 10%) and incorrect exclusion decisions from 1% to 83% (median = 28%); (3) data extraction, it made incorrect extractions from 4% to 31% (median = 14%); and (4) risk-of-bias assessment, it made incorrect assessments from 10% to 56% (median = 27%).

Conclusion: The current evidence does not support the use of GenAI without human involvement and oversight. However, for most tasks other than searching, GenAI may have a role in assisting humans with evidence synthesis.

Language: English

Citations: 0

Exploring the potential of Claude 2 for risk of bias assessment: Using a large language model to assess randomized controlled trials with RoB 2
Angelika Eisele‐Metzger, Judith-Lisa Lieberum, Markus Toews

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 16, 2024

ABSTRACT

Systematic reviews are essential for evidence-based healthcare, but conducting them is time- and resource-consuming. To date, efforts have been made to accelerate and (semi-)automate various steps of systematic reviews through the use of artificial intelligence, and the emergence of large language models (LLMs) promises further opportunities. One crucial and complex task within systematic review conduct is assessing the risk of bias of included studies. Therefore, the aim of this study was to test the LLM Claude 2 for risk of bias assessment of 100 randomized controlled trials using the revised Cochrane risk of bias tool ("RoB 2"), involving judgements for five specific domains and an overall judgement. We assessed the agreement of risk of bias judgements made by Claude 2 with human judgements published in Cochrane Reviews. The observed agreement between Claude 2 and the Cochrane authors ranged from 41% for the overall judgement to 71% for domain 4 ("outcome measurement"). Cohen's κ was lowest for domain 5 ("selective reporting"; 0.10 (95% confidence interval (CI): −0.10 to 0.31)) and highest for domain 3 ("missing data"; 0.31 (95% CI: 0.10 to 0.52)), indicating slight to fair agreement. Fair agreement was found for the overall judgement (Cohen's κ: 0.22 (95% CI: 0.06 to 0.38)). Sensitivity analyses using alternative prompting techniques or a more recent version of Claude did not result in substantial changes. Currently, Claude's RoB 2 assessments cannot replace human risk of bias assessment. However, the potential of LLMs to support risk of bias assessment should be further explored.
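For readers unfamiliar with the agreement statistic used in this abstract: Cohen's κ compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. A minimal sketch in Python, using made-up RoB 2 judgements rather than the study's data:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labelled independently,
    # each according to their own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] / n * freq_b[label] / n
              for label in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical RoB 2 judgements for 10 trials (illustrative only).
human = ["low", "low", "high", "some concerns", "low",
         "high", "low", "some concerns", "low", "high"]
llm   = ["low", "high", "high", "low", "low",
         "high", "some concerns", "some concerns", "low", "low"]

print(round(cohen_kappa(human, llm), 2))  # → 0.35
```

Raw agreement here is 60%, yet κ is only 0.35, which illustrates why the study reports κ alongside percentage agreement: with few labels, substantial agreement arises by chance alone.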

Language: English

Citations: 2

AI-Driven Evidence Synthesis: Data Extraction of Randomized Controlled Trials with Large Language Models
Jiayi Liu, Long Ge, Honghao Lai

et al.

Published: Jan. 1, 2024

Language: English

Citations: 1

Leveraging evaluation of quality on medical education research with ChatGPT
Javier A. Flores-Cohaila, Peter Garcia-Portocarrero, Deysi A. Saldaña-Amaya

et al.

Medical Teacher, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 3

Published: Aug. 4, 2024

What is the educational challenge? The Medical Education Research Study Quality Instrument (MERSQI) is widely used to evaluate the quality of quantitative research in medical education. It has strong evidence of validity and is endorsed by guidelines. However, the manual appraisal process is time-consuming and resource-intensive, highlighting the need for more efficient methods.

What are the proposed solutions? We propose the use of ChatGPT to evaluate the quality of medical education research with the MERSQI and to compare its scoring with that of human evaluators.

What are the potential benefits to a broader global audience? Using ChatGPT can decrease the resources required for appraisal. This allows faster summaries of evidence, reducing the workload of researchers, editors, and educators. Furthermore, ChatGPT's capability to extract supporting excerpts provides transparency and may have potential for data extraction and for training new researchers.

What are the next steps? We plan to continue evaluating ChatGPT using other instruments to determine the feasibility of this approach in this realm. Moreover, we will investigate the types of studies in which ChatGPT performs best.

Language: English

Citations: 0

Generative AI-assisted Peer Review in Medical Publications: Opportunities Or Trap (Preprint)
Zhiqiang Li, Chen Shen, Feng Cao

et al.

Published: Sept. 2, 2024

With the exponential growth in the number of research papers and the proliferation of preprint servers, ensuring high-quality peer review has become a significant challenge, especially in the medical field. The surge in submissions has led to a shortage of qualified reviewers, slowing down the review process. Repeated rejections of manuscripts not only increase costs but may also stifle innovation, raising concerns about the efficiency, fairness, and effectiveness of peer review. Therefore, innovative solutions are urgently needed. Recent advancements in generative artificial intelligence (GenAI), such as ChatGPT, have demonstrated exceptional capabilities in feature learning and textual expression, allowing these models to identify complex relationships within data without relying on pre-existing assumptions. GenAI presents an opportunity to enhance semi-automated peer review systems, potentially addressing current limitations of the process and improving the efficiency and quality of medical publications. This viewpoint highlights the potential benefits and challenges of integrating GenAI into peer review and identifies key issues that need to be addressed.

Language: English

Citations: 0
