
medRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown
Published: Sep. 3, 2024
Abstract
In clinical science and practice, text data, such as clinical letters or procedure reports, is stored in an unstructured way. This type of data is not a quantifiable resource for any kind of quantitative investigation, and any manual review or structured information retrieval is time-consuming and costly. The capabilities of Large Language Models (LLMs) mark a paradigm shift in natural language processing and offer new possibilities for structured Information Extraction (IE) from medical free text. This protocol describes a workflow for LLM-based information extraction (LLM-AIx), enabling the extraction of predefined entities from unstructured text using privacy-preserving LLMs. By converting unstructured clinical text into structured data, LLM-AIx addresses a critical barrier in clinical research and practice, where the efficient extraction of information is essential for improving clinical decision-making, enhancing patient outcomes, and facilitating large-scale data analysis. The protocol consists of four main steps: 1) problem definition and data preparation, 2) data preprocessing, 3) LLM-based IE, and 4) output evaluation. LLM-AIx allows integration on local hospital hardware without the need to transfer patient data to external servers. As example tasks, we applied LLM-AIx to the anonymization of clinical letters of fictitious patients with pulmonary embolism. Additionally, we extracted the symptoms and the laterality of the pulmonary embolism from these letters. We demonstrate troubleshooting of potential problems within the pipeline on a real-world dataset, 100 pathology reports from The Cancer Genome Atlas Program (TCGA), for TNM stage extraction. LLM-AIx can be executed without any programming knowledge via an easy-to-use interface, in no more than a few minutes to hours, depending on the model selected.
Language: English
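The abstract names the four workflow steps but gives no implementation details. As a rough illustration of steps 3 (LLM-based IE) and 4 (output evaluation) only, the Python sketch below sends each report to a model, parses a JSON answer, and scores exact-match accuracy per field against manual annotations. It is not LLM-AIx's code: the prompt template, the JSON output format, the exact-match metric, and the names `build_prompt`, `extract`, `field_accuracy` and `fake_llm` are all assumptions, and `fake_llm` is a hypothetical stub standing in for a locally hosted model.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical prompt template: the abstract does not specify LLM-AIx's prompts.
PROMPT_TEMPLATE = (
    "Extract the following fields from the report and answer only with a JSON "
    "object containing the keys {keys}.\n\nReport:\n{report}"
)


def build_prompt(report: str, keys: list[str]) -> str:
    """Fill the (assumed) prompt template for one free-text report."""
    return PROMPT_TEMPLATE.format(keys=", ".join(keys), report=report)


def extract(reports: list[str], keys: list[str], llm: Callable[[str], str]) -> list[dict]:
    """Step 3 (LLM-based IE): query the model once per report and parse its JSON answer.

    Unparseable or non-object answers yield None for every field, so they show up
    as errors during evaluation instead of crashing the whole batch.
    """
    results = []
    for report in reports:
        raw = llm(build_prompt(report, keys))
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}
        if not isinstance(parsed, dict):
            parsed = {}
        results.append({k: parsed.get(k) for k in keys})
    return results


def field_accuracy(predictions: list[dict], ground_truth: list[dict], keys: list[str]) -> dict[str, float]:
    """Step 4 (output evaluation): exact-match accuracy per extracted field (assumed metric)."""
    if not ground_truth or len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must be non-empty and of equal length")
    return {
        k: sum(p.get(k) == g.get(k) for p, g in zip(predictions, ground_truth)) / len(ground_truth)
        for k in keys
    }


# Usage with a hypothetical stub instead of a real, locally hosted LLM.
def fake_llm(prompt: str) -> str:
    if "T2" in prompt:
        return '{"T": "T2", "N": "N0", "M": "M0"}'
    return "Sorry, I cannot find a stage."


keys = ["T", "N", "M"]
preds = extract(["pT2 N0 M0 adenocarcinoma", "No staging given"], keys, fake_llm)
truth = [{"T": "T2", "N": "N0", "M": "M0"}, {"T": "T1", "N": "N0", "M": "M0"}]
print(field_accuracy(preds, truth, keys))  # {'T': 0.5, 'N': 0.5, 'M': 0.5}
```

Treating a malformed answer as a set of empty fields, rather than raising, keeps one bad model response from halting a batch and makes such failures visible in the evaluation step.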