Human versus artificial intelligence: evaluating ChatGPT’s performance in conducting published systematic reviews with meta-analysis in chronic pain research DOI Creative Commons
Anam Purewal,

Kalli Fautsch,

Johana Klasová

et al.

Regional Anesthesia & Pain Medicine, Journal Year: 2025, Volume and Issue: unknown, P. rapm - 106358

Published: Feb. 16, 2025

Introduction Artificial intelligence (AI), particularly large language models like Chat Generative Pre-Trained Transformer (ChatGPT), has demonstrated potential in streamlining research methodologies. Systematic reviews and meta-analyses, often considered the pinnacle of evidence-based medicine, are inherently time-intensive and demand meticulous planning, rigorous data extraction, thorough analysis, and careful synthesis. Despite promising applications of AI, its utility in conducting systematic reviews with meta-analysis remains unclear. This study evaluated ChatGPT's accuracy in key tasks of a systematic review with meta-analysis. Methods This validation study used data from a published systematic review on emotional functioning after spinal cord stimulation. ChatGPT-4o performed title/abstract screening, full-text study selection, and data pooling for this review. Comparisons were made against the human-executed steps, which served as the gold standard. Outcomes of interest included accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for the screening tasks. We also assessed discrepancies in pooled effect estimates and forest plot generation. Results For title and abstract screening, ChatGPT achieved an accuracy of 70.4%, sensitivity of 54.9%, and specificity of 80.1%. In the full-text selection phase, accuracy was 68.4%, sensitivity 75.6%, and specificity 66.8%. ChatGPT successfully generated five forest plots, achieving 100% accuracy in calculating mean differences, 95% CIs, and heterogeneity statistics (I² and tau-squared values) for most outcomes, with minor discrepancies in tau-squared values (range 0.01–0.05). Forest plots showed no significant discrepancies. Conclusion ChatGPT demonstrates modest to moderate accuracy in study selection tasks but performs well in meta-analytic calculations. These findings underscore the potential of AI to augment systematic review methodologies while emphasizing the need for human oversight to ensure the integrity of research workflows.
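The screening metrics reported in this abstract all follow from a 2×2 confusion matrix of ChatGPT's include/exclude decisions against the human gold standard. As an illustration only (the counts below are hypothetical, not the study's data), they can be computed as:

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening-performance metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of truly relevant studies caught
        "specificity": tn / (tn + fp),  # share of irrelevant studies excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

if __name__ == "__main__":
    # Hypothetical screening run: 55 relevant studies included, 45 missed,
    # 240 irrelevant studies correctly excluded, 60 wrongly included.
    m = screening_metrics(tp=55, fp=60, tn=240, fn=45)
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

A low sensitivity with higher specificity, as reported for title/abstract screening here, means the model excludes well but misses relevant studies, which is why human oversight remains necessary.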

Language: English

On the Use of Generative AI for Literature Reviews: An Exploration of Tools and Techniques DOI
Peter Mozelius, Niklas Humble

European Conference on Research Methodology for Business and Management Studies, Journal Year: 2024, Volume and Issue: 23(1), P. 161 - 168

Published: June 26, 2024

To carry out a literature review often involves hard and tedious work. There is a tradition of using facilitating tools, one that extended to the AI field in 2018 when iris.ai appeared. Today, with emerging generative tools based on large language models, there has been a rapid development of new search approaches. This study aims at exploring this vast array of tools, where some of the tools found were also used to facilitate the selection of relevant publications. Three research questions guided the study: RQ1) "What generative AI tools can be used to facilitate a literature review?", RQ2) "Which of these tools could be used in the conducted study, and how?", and RQ3) "What are the ethical aspects of AI-supported literature studies?" The research approach was a scoping review, built around the combined keywords "AI supported", "AI generated", "AI based", and "literature review". An initial result set was filtered with inclusion and exclusion criteria in a strive for publications of interesting quality that could answer the research questions. However, most of the publications that passed the filtering lacked any potential to contribute, until the finding of a first hint about the feature 'Scopus AI'. A search with the Scopus tool resulted in a small but very relevant set of publications. These were analysed with a combination of deductive and inductive thematic analysis, primarily sorted into the categories of 'Generative AI Tools', 'Supportive Techniques', and 'Ethical Issues'. Findings indicate a wide variety of tools for skimming the literature, which can provide adequate summaries of the retrieved publications. The authors' recommendation is to keep generative AI at a support level, while the main analysis and conclusions should be human conducted. With this rather traditional approach, researchers will have clearly fewer ethical issues to consider. Finally, the ethical aspects ought to be investigated in more detail, in a separate future study.

Language: English

Citations

6

A Hybrid Semi-Automated Workflow for Systematic and Literature Review Processes with Large Language Model Analysis DOI Creative Commons
Anjia Ye, Ananda Maiti, Matthew Schmidt

et al.

Future Internet, Journal Year: 2024, Volume and Issue: 16(5), P. 167 - 167

Published: May 12, 2024

Systematic reviews (SRs) are a rigorous method for synthesizing empirical evidence to answer specific research questions. However, they are labor-intensive because of their collaborative nature, strict protocols, and typically large number of documents. Large language model (LLM) applications such as GPT-4/ChatGPT have the potential to reduce the human workload in the SR process while maintaining accuracy. We propose a new hybrid methodology that combines the strengths of LLMs and humans, using the LLM's ability to autonomously summarize large bodies of text and extract key information. This information is then used by the researcher to make inclusion/exclusion decisions quickly. The approach replaces the typically manually performed title/abstract screening, full-text screening, and data extraction steps in an SR while keeping a human in the loop for quality control. We developed a semi-automated LLM-assisted (Gemini-Pro) workflow with a novel and innovative prompt development strategy. It involves extracting three categories of information, comprising an identifier, a verifier, and data fields (IVD), from formatted documents. We present a case study where our approach reduced errors compared with a human-only SR: the hybrid workflow improved accuracy by identifying 6/390 (1.53%) articles that were misclassified in the human-only process, and it matched the human decisions completely regarding the rest of the 384 articles. Given the rapid advances in LLM technology, these results will undoubtedly improve over time.
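The human-in-the-loop quality-control step described above can be sketched minimally as follows. This is an illustration under stated assumptions: the record structure and field names are modeled loosely on the paper's identifier/verifier/data-field (IVD) categories, not taken from its code, and the real workflow used Gemini-Pro to populate such records.

```python
# Hypothetical sketch: flag LLM-extracted records that a human reviewer
# must check before an inclusion/exclusion decision is made.
def needs_human_review(record, required_fields):
    """Return True if the identifier or verifier is missing, or if any
    required data field came back empty from the LLM extraction."""
    if not record.get("identifier") or not record.get("verifier"):
        return True
    return any(not record.get(field) for field in required_fields)

llm_output = {
    "identifier": "doi:10.0000/example",     # hypothetical DOI
    "verifier": "title matches source PDF",
    "population": "adults, postoperative",
    "outcome": "",                           # extraction failed here
}
print(needs_human_review(llm_output, ["population", "outcome"]))  # True
```

The verifier field is the key design idea: it gives the reviewer a quick way to confirm that the extracted data actually belongs to the cited document before trusting the rest of the record.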

Language: English

Citations

5

GPT-4 performance on querying scientific publications: reproducibility, accuracy, and impact of an instruction sheet DOI Creative Commons
Kaiming Tao,

Zachary A. Osman,

Philip L. Tzou

et al.

BMC Medical Research Methodology, Journal Year: 2024, Volume and Issue: 24(1)

Published: June 25, 2024

Abstract Background Large language models (LLMs) that can efficiently screen and identify studies meeting specific criteria would streamline literature reviews. Additionally, those capable of extracting data from publications could enhance knowledge discovery by reducing the burden on human reviewers. Methods We created an automated pipeline utilizing the OpenAI GPT-4 32K API version "2023-05-15" to evaluate the accuracy of LLM responses to queries about published papers on HIV drug resistance (HIVDR), with and without an instruction sheet. The instruction sheet contained specialized knowledge designed to assist a person trying to answer questions about an HIVDR paper. We designed 60 questions pertaining to HIVDR and created markdown versions of 60 published HIVDR papers in PubMed. The questions were presented in four configurations: (1) all questions simultaneously; (2) all questions simultaneously with the instruction sheet; (3) each question individually; and (4) each question individually with the instruction sheet. Results GPT-4 achieved a mean accuracy of 86.9% – 24.0% higher than when the paper answers were permuted. The overall recall and precision were 72.5% and 87.4%, respectively. The standard deviation of three replicates for the four configurations ranged from 0 to 5.3%, with a median of 1.2%. The instruction sheet did not significantly increase GPT-4's accuracy, recall, or precision. GPT-4 was more likely to provide false positive answers when questions were submitted individually compared with when they were submitted together. Conclusions GPT-4 reproducibly answered 3600 questions about 60 papers with moderately high accuracy, recall, and precision. The instruction sheet's failure to improve these metrics suggests that more sophisticated approaches are necessary. Either enhanced prompt engineering or finetuning an open-source model could further improve an LLM's ability to answer questions about highly specialized papers.
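The four query configurations described in the abstract can be expressed as a small combinatorial sketch. Everything here is an assumption for illustration: `ask_model` is a placeholder rather than the actual OpenAI API call, and the prompt-building details are invented; only the batched-versus-individual and with/without-instruction-sheet structure comes from the abstract.

```python
from itertools import product

def ask_model(prompt: str) -> str:
    """Placeholder standing in for the GPT-4 API call used in the study."""
    return f"<answer to {len(prompt)}-char prompt>"  # stub

def build_runs(questions, instruction_sheet):
    """Build the four configurations: (batched?, with instruction sheet?)."""
    runs = []
    for batched, with_sheet in product([True, False], [True, False]):
        prefix = instruction_sheet + "\n" if with_sheet else ""
        if batched:
            runs.append([prefix + "\n".join(questions)])  # one big prompt
        else:
            runs.append([prefix + q for q in questions])  # one prompt each
    return runs

runs = build_runs(["Q1?", "Q2?"], "Use only the paper's text.")
print([len(r) for r in runs])  # [1, 1, 2, 2]: batched runs use a single prompt
answers = [[ask_model(p) for p in run] for run in runs]
```

The abstract's finding that individually submitted questions produced more false positives suggests the batched configurations give the model cross-question context that helps it decline to answer.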

Language: English

Citations

5

The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews DOI
Aleksi Huotala, Miikka Kuutila, Paul Ralph

et al.

Published: June 14, 2024

Language: English

Citations

4

Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews DOI Creative Commons
Assaf Landschaft, Dario Antweiler, Sina Mackay

et al.

International Journal of Medical Informatics, Journal Year: 2024, Volume and Issue: 189, P. 105531 - 105531

Published: June 26, 2024

PRISMA-based literature reviews require meticulous scrutiny of extensive textual data by multiple reviewers, which is associated with considerable human effort.

Language: English

Citations

4

From statistics to deep learning: Using large language models in psychiatric research DOI Creative Commons
Yining Hua,

Andrew Beam,

Lori B. Chibnik

et al.

International Journal of Methods in Psychiatric Research, Journal Year: 2025, Volume and Issue: 34(1)

Published: Jan. 8, 2025

Abstract Background Large Language Models (LLMs) hold promise in enhancing psychiatric research efficiency. However, concerns related to bias, computational demands, data privacy, and the reliability of LLM-generated content pose challenges. Gap Existing studies primarily focus on clinical applications of LLMs, with limited exploration of their potential in broader research. Objective This study adopts a narrative review format to assess the utility of LLMs in psychiatric research, beyond clinical settings, focusing on their effectiveness in literature review, study design, subject selection, statistical modeling, and academic writing. Implication The review provides a clearer understanding of how LLMs can be effectively integrated into the research process, offering guidance on mitigating associated risks and maximizing their potential benefits. While LLMs hold promise for advancing psychiatric research, careful oversight, rigorous validation, and adherence to ethical standards are crucial to addressing challenges such as privacy concerns and reliability issues, thereby ensuring their effective and responsible use in improving psychiatric research.

Language: English

Citations

0

Use of AI in family medicine publications: a joint editorial from journal editors DOI

Sarina Schrager,

Dean A. Seehusen, Sumi M. Sexton

et al.

Evidence-Based Practice, Journal Year: 2025, Volume and Issue: 28(1), P. 1 - 4

Published: Jan. 1, 2025

Schrager, Sarina MD, MS; Seehusen, Dean A. MPH; Sexton, Sumi M. MD; Richardson, Caroline; Neher, Jon; Pimlott, Nicholas; Bowman, Marjorie; Rodíguez, José; Morley, Christopher P. PhD; Li, Li PhD; Dera, James Dom MD

Language: English

Citations

0

Use of AI in Family Medicine Publications: A Joint Editorial From Journal Editors DOI Open Access

Sarina Schrager,

Dean A. Seehusen, Sumi M. Sexton

et al.

Family Medicine, Journal Year: 2025, Volume and Issue: 57(1), P. 1 - 5

Published: Jan. 13, 2025

Language: English

Citations

0

Use of AI in Family Medicine Publications: A Joint Editorial From Journal Editors DOI Open Access

Sarina Schrager,

Dean A. Seehusen, Sumi M. Sexton

et al.

PRiMER, Journal Year: 2025, Volume and Issue: 9

Published: Jan. 13, 2025

There are multiple guidelines from publishers and organizations on the use of artificial intelligence (AI) in publishing. However, none are specific to family medicine. Most journals have some basic AI recommendations for authors, but more explicit direction is needed, as not all tools are the same.

Language: English

Citations

0

Use of AI in family medicine publications: a joint editorial from journal editors DOI Creative Commons

Sarina Schrager,

Dean A. Seehusen, Sumi M. Sexton

et al.

Family Medicine and Community Health, Journal Year: 2025, Volume and Issue: 13(1), P. e003238 - e003238

Published: Jan. 1, 2025

There are multiple guidelines from publishers and organisations on the use of artificial intelligence (AI) in publishing.[1–5] However, none are specific to family medicine. Most journals have some basic AI recommendations for authors, but more explicit direction is needed, as not all tools are the same.

Language: English

Citations

0