How to Write Effective Prompts for Screening Biomedical Literature Using Large Language Models
Maria Teresa Colangelo, Stefano Guizzardi, Marco Meleti et al.

BioMedInformatics, Journal year: 2025, Issue 5(1), pp. 15–15

Published: March 11, 2025

Large language models (LLMs) have emerged as powerful tools for (semi-)automating the initial screening of abstracts in systematic reviews, offering the potential to significantly reduce the manual burden on research teams. This paper provides a broad overview of prompt engineering principles and highlights how traditional PICO (Population, Intervention, Comparison, Outcome) criteria can be converted into actionable instructions for LLMs. We analyze the trade-offs between “soft” prompts, which maximize recall by accepting articles unless they explicitly fail an inclusion requirement, and “strict” prompts, which demand explicit evidence for every criterion. Using a periodontics case study, we illustrate how prompt design affects recall, precision, and overall efficiency, and discuss metrics (accuracy, F1 score) for evaluating performance. We also examine common pitfalls, such as overly lengthy prompts or ambiguous instructions, and underscore the continuing need for expert oversight to mitigate the hallucinations and biases inherent in LLM outputs. Finally, we explore emerging trends, including multi-stage pipelines and fine-tuning, while noting ethical considerations related to data privacy and transparency. By applying rigorous evaluation, researchers can optimize LLM-based screening processes, enabling faster and more comprehensive evidence synthesis across biomedical disciplines.
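A minimal sketch in Python of the two techniques the abstract names, PICO-to-prompt conversion with the soft/strict trade-off, and scoring screening decisions with accuracy, precision, recall, and F1. Every criterion, label, and helper name below is an illustrative assumption, not taken from the paper itself:

    # Illustrative sketch (not the authors' actual prompts or data):
    # turning PICO criteria into "soft" vs. "strict" screening
    # instructions, then scoring the resulting decisions.

    PICO = {
        "Population": "adults with chronic periodontitis",       # assumed
        "Intervention": "adjunctive antimicrobial therapy",      # assumed
        "Comparison": "scaling and root planing alone",          # assumed
        "Outcome": "change in probing pocket depth",             # assumed
    }

    def build_prompt(pico, mode="soft"):
        """Convert PICO criteria into screening instructions for an LLM."""
        criteria = "\n".join(f"- {k}: {v}" for k, v in pico.items())
        if mode == "soft":
            # Recall-oriented: keep anything not explicitly disqualified.
            rule = ("INCLUDE the abstract unless it explicitly fails one of "
                    "the criteria below. When in doubt, answer INCLUDE.")
        else:
            # Precision-oriented: require explicit evidence for each criterion.
            rule = ("INCLUDE the abstract only if it gives explicit evidence "
                    "for EVERY criterion below. When in doubt, answer EXCLUDE.")
        return ("You are screening abstracts for a systematic review.\n"
                f"{rule}\nCriteria:\n{criteria}\n"
                "Answer with exactly one word: INCLUDE or EXCLUDE.")

    # Evaluating decisions against reviewer labels (1 = include, 0 = exclude).
    human = [1, 1, 0, 0, 1, 0, 1, 0]   # gold-standard labels (made up)
    llm   = [1, 0, 0, 1, 1, 0, 1, 0]   # LLM screening decisions (made up)

    tp = sum(h and p for h, p in zip(human, llm))           # true positives
    fp = sum(not h and p for h, p in zip(human, llm))       # false positives
    fn = sum(h and not p for h, p in zip(human, llm))       # false negatives
    tn = sum(not h and not p for h, p in zip(human, llm))   # true negatives

    accuracy  = (tp + tn) / len(human)
    precision = tp / (tp + fp)   # share of LLM inclusions that were correct
    recall    = tp / (tp + fn)   # share of relevant abstracts that were caught
    f1        = 2 * precision * recall / (precision + recall)

    print(build_prompt(PICO, mode="strict"))
    print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
          f"recall={recall:.2f} F1={f1:.2f}")

The soft mode trades precision for recall (fewer relevant abstracts missed, more irrelevant ones passed to human review), while the strict mode does the reverse; which is preferable depends on how costly a missed study is relative to the manual screening it saves.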

Language: English

Cited by: 0