Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications
Matthew D. Turner, Abhishek Appaji, Nibras Ar Rakib, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: May 14, 2025

Abstract We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good quality metadata extraction and annotation with very little work on the part of investigators for several exemplar real-world tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably to groups of specially trained and supervised human annotators. The model achieves performance similar to the humans, between 0.91 and 0.97, with zero-shot prompts and without feedback to the LLM. Reviewing disagreements with the gold standard annotations, we note that the actual errors are comparable in most cases, and in many cases these disagreements are not errors at all. Based on the specific annotation types tested, with exceptionally reviewed gold-standard correct values, the LLM is usable at scale. We encourage other research groups to develop and make available more specialized "micro-benchmarks," like the ones we provide here, for testing both LLMs and more complex agent systems on such tasks.

Language: English
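For illustration only, a minimal sketch of the kind of zero-shot metadata annotation the abstract describes, using GPT-4o through the OpenAI Python SDK; the prompt wording and the metadata fields (species, imaging modality, sample size) are illustrative assumptions, not the authors' actual prompts or annotation schema.

# Minimal sketch (assumed prompt and fields, not the paper's actual pipeline):
# zero-shot metadata annotation of a neuroimaging paper with GPT-4o.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

abstract_text = "..."  # text of the publication (or its abstract) to annotate

# Hypothetical annotation fields, chosen only for illustration.
prompt = (
    "Extract the following metadata from the neuroimaging paper below and "
    "return JSON with the keys 'species', 'imaging_modality', and "
    "'sample_size'. Use null for any value that is not stated.\n\n"
    + abstract_text
)

# Zero-shot: a single instruction, no examples and no feedback to the model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

print(response.choices[0].message.content)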

Introduction to the Special Issue on Cognitive Neuroscience of Mindfulness
Todd S. Braver, Sara W. Lazar

Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, Journal Year: 2025, Volume and Issue: 10(4), P. 337 - 341

Published: April 1, 2025

Language: English

Citations

0
