medRxiv (Cold Spring Harbor Laboratory), Journal year: 2025, Issue: unknown
Published: April 25, 2025
Surgical pathology reports contain essential diagnostic information, in free-text form, required for cancer staging, treatment planning, and registry documentation. However, their unstructured nature and variability across tumor types and institutions pose challenges for automated data extraction. We present a consensus-driven, reasoning-based framework that uses multiple locally deployed large language models (LLMs) to extract six key variables: site, laterality, histology, stage, grade, and behavior. Each LLM produces structured outputs with accompanying justifications, which are evaluated for accuracy and coherence by a separate reasoning model. Final consensus values are determined through aggregation, and expert validation is conducted by board-certified or equivalent pathologists.

The framework was applied to over 4,000 pathology reports from The Cancer Genome Atlas (TCGA) and Moffitt Cancer Center. Expert review confirmed high agreement in the TCGA dataset for behavior (100.0%), histology (98.5%), site (95.2%), and grade (95.6%), with lower performance for stage (87.6%) and laterality (84.8%). In the Moffitt dataset (brain, breast, lung), agreement remained high across variables, with values such as 98.3% and 92.4% reflecting strong agreement. Certain challenges emerged, such as inconsistent mention of sentinel lymph node details and anatomical ambiguity in biopsy site interpretations. Statistical analyses revealed significant main effects of model type, variable, and organ system, as well as model × variable interactions, emphasizing the role of clinical context in model performance.

These results highlight the importance of stratified, multi-organ evaluation frameworks for benchmarking LLM applications. Textual justifications enhanced interpretability and enabled human reviewers to audit model outputs. Overall, this consensus-based approach demonstrates that LLMs can provide a transparent, accurate, and auditable solution for integrating AI-driven data extraction into real-world pathology workflows, including cancer registry abstraction and synoptic reporting.
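The abstract states that final values are "determined through aggregation" but does not say how. The sketch below illustrates one possible reading, a per-variable majority vote across model outputs. The voting rule, the tie handling, the lowercase normalisation, and the function name `consensus` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

# The six variables named in the abstract.
VARIABLES = ["site", "laterality", "histology", "stage", "grade", "behavior"]


def consensus(outputs):
    """Majority vote per variable (an assumed aggregation rule).

    `outputs` is a list of dicts, one per model, mapping variable -> string value.
    Returns variable -> (value, agreement), where agreement is the share of
    answering models that voted for the winning value. The value is None when
    no model answered or when the top two values are tied.
    """
    result = {}
    for var in VARIABLES:
        # Normalise case and whitespace so "Lung" and "lung " count as one vote.
        votes = [o[var].strip().lower() for o in outputs if o.get(var)]
        if not votes:
            result[var] = (None, 0.0)
            continue
        ranked = Counter(votes).most_common()
        top_value, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result[var] = (None if tied else top_value, top_count / len(votes))
    return result


# Hypothetical outputs from three models for one report.
example = [
    {"site": "Lung", "laterality": "Left", "stage": "IIA"},
    {"site": "lung", "laterality": "Right", "stage": "IIA"},
    {"site": "lung", "laterality": "Left", "stage": "IIB"},
]
result = consensus(example)
```

Returning the agreement share alongside the value would let a reviewer prioritise low-agreement variables for manual checking, which fits the audit role the abstract describes for human reviewers.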
medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: June 13, 2024
Abstract

Background
Medical research with real-world clinical data can be challenging due to privacy requirements. Ideally, patient data are handled in a fully pseudonymised or anonymised way. However, this can make it difficult for medical researchers to access and analyze large datasets or to exchange data between hospitals. De-identifying medical free text is particularly difficult due to the diverse documentation styles and the unstructured nature of the data. However, recent advancements in natural language processing (NLP), driven by the development of large language models (LLMs), have revolutionized the ability to extract information from unstructured text.

Methods
We hypothesize that LLMs are highly effective tools for extracting patient-related information, which can subsequently be used to de-identify medical reports. To test this hypothesis, we conduct a benchmark study using eight locally deployable LLMs (Llama-3 8B, Llama-3 70B, Llama-2 7B, Llama-2 7B “Sauerkraut”, Llama-2 70B, Llama-2 70B “Sauerkraut”, Mistral 7B, and Phi-3-mini) to extract patient-related information from a dataset of 100 clinical letters. We then remove the identified information using our newly developed LLM-Anonymizer pipeline.

Results
Our results demonstrate that the LLM-Anonymizer achieved a success rate of 98.05% in removing text characters carrying personal identifying information. When evaluating the performance in relation to the number of characters manually identified as containing personal information and identifiable characteristics, our system missed only 1.95% of personal identifying information and erroneously redacted only 0.85% of the characters.

Conclusion
We provide our full LLM-based Anonymizer pipeline under an open source license, with a user-friendly web interface that operates on local hardware and requires no programming skills. This powerful tool has the potential to significantly facilitate medical research by enabling the secure and efficient de-identification of clinical free text data on premise, thereby addressing key challenges in medical data sharing.
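The Results above are character-level rates: 98.05% removed and 1.95% missed add up to 100%, which suggests that both share the manually annotated identifying characters as their denominator. The sketch below shows how such rates could be computed from annotated and predicted character spans. The denominator for the over-redaction rate (non-identifying characters) is an assumption, as are the helper names and the example text; none of this is taken from the LLM-Anonymizer code.

```python
def mask_from_spans(length, spans):
    """Turn half-open (start, end) character spans into a per-character boolean mask."""
    mask = [False] * length
    for start, end in spans:
        for i in range(start, end):
            mask[i] = True
    return mask


def redaction_rates(gold_mask, pred_mask):
    """Character-level de-identification rates.

    gold_mask[i] is True when character i was manually marked as identifying;
    pred_mask[i] is True when the system redacted character i.
    """
    if len(gold_mask) != len(pred_mask):
        raise ValueError("masks must cover the same text")
    pii_total = sum(gold_mask)
    non_pii_total = len(gold_mask) - pii_total
    missed = sum(1 for g, p in zip(gold_mask, pred_mask) if g and not p)
    over = sum(1 for g, p in zip(gold_mask, pred_mask) if p and not g)
    return {
        # Share of identifying characters that were removed.
        "removed": (pii_total - missed) / pii_total if pii_total else 1.0,
        # Share of identifying characters left in the text.
        "missed": missed / pii_total if pii_total else 0.0,
        # Assumed denominator: characters that carry no identifying information.
        "over_redacted": over / non_pii_total if non_pii_total else 0.0,
    }


# Hypothetical letter fragment with a name at [8, 16) and a birth year at [23, 27).
text = "Patient John Doe, born 1970."
gold = mask_from_spans(len(text), [(8, 16), (23, 27)])
# Hypothetical system output: the name is caught, the year only partly,
# and one harmless space before the year is redacted by mistake.
pred = mask_from_spans(len(text), [(8, 16), (22, 25)])
rates = redaction_rates(gold, pred)
```

By construction `removed` and `missed` sum to 1, mirroring the 98.05% and 1.95% pair reported above.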
Nature Communications, Journal year: 2024, Issue: 15(1)
Published: October 16, 2024
Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present BB-TEN: Big Bird - TNM staging Extracted from Notes, a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, parameters, and model architectures. Our final model goes beyond term-extraction, inferring TNM stage from context when it is not included in the report text explicitly. As external validation, we test our model on almost 8000 pathology reports from Columbia University Medical Center, finding that our trained model achieved an AU-ROC of 0.815-0.942. This suggests that our model can be applied broadly to other institutions without additional institution-specific fine-tuning.
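For readers unfamiliar with the metric quoted above, AU-ROC can be read as the probability that a randomly chosen positive report receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition for a single binary task. The labels and scores are made up, and how the paper splits TNM staging into binary tasks is not stated in the abstract, so nothing here reproduces the reported range.

```python
def auroc(labels, scores):
    """AU-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is quadratic in the number of examples, which is fine
    for illustration but slow for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels (1 = report belongs to the stage class) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
value = auroc(labels, scores)  # 3 of 4 pairs are ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.815-0.942 range should be read.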
Journal of Medical Systems, Journal year: 2025, Issue: 49(1)
Published: March 13, 2025
Abstract
Manually converting unstructured text pathology reports into structured pathology reports is very time-consuming and prone to errors. This study demonstrates the transformative potential of generative AI in automating the analysis of free-text pathology reports. Employing the ChatGPT Large Language Model within a Streamlit web application, we automated the extraction and structuring of information from 33 unstructured breast cancer pathology reports from Taipei Medical University Hospital. Achieving a 99.61% accuracy rate, the AI system notably reduced the processing time compared to traditional methods. This not only underscores the efficacy of AI in converting unstructured medical text into structured data but also highlights its potential to enhance the efficiency and reliability of medical text analysis.

However, this study is limited to the analysis of breast cancer pathology reports and was conducted using data obtained from hospitals associated with a single institution. In the future, we plan to expand the scope of this research to include pathology reports for other cancer types incrementally and to conduct external validation to further substantiate the robustness and generalizability of the proposed system. Through this technological integration, we aimed to enhance the capabilities of AI in improving both the speed and reliability of data processing. The outcomes of this study affirm that generative AI can significantly transform the handling of pathology reports, promising substantial advancements in biomedical research by facilitating the structured analysis of complex medical data.
Abstract
Synoptic reporting, the documenting of clinical information in a structured manner, enhances patient care by improving accuracy, readability, and report completeness, but imposes significant administrative burdens on physicians. The potential of Large Language Models (LLMs) for automating synoptic reporting remains underexplored. In this study, we explore state-of-the-art LLMs for automatic synoptic reporting, using 7774 cancer pathology reports from 8 cancer types, paired with physician annotated synoptic reports from the Mayo Clinic EHR. We developed a comprehensive automation framework, combining state-of-the-art LLMs and incorporating parameter-efficient optimization, scalable prompt templates, and robust evaluation strategies. We validate our results on both internal and external data, ensuring alignment with pathologist responses. Using fine-tuned LLAMA-2, we achieved BERT F1 scores above 0.86 across all data elements and exceeding 0.94 for over 50% (11 of 22) of the data elements, translating to manually assessed mean semantic accuracies of 77% and up to 81% for short pathology reports.
Research Square (Research Square), Journal year: 2025, Issue: unknown
Published: April 16, 2025
Abstract
Background: Large Language Models (LLMs) are one of the artificial intelligence (AI) technologies used to understand and generate text, summarize information, and comprehend contextual cues. LLMs have been increasingly used by researchers in various medical applications, but their effectiveness and limitations are still uncertain, especially across medical specialties.

Objective: This review evaluates recent literature on how LLMs are utilized in research studies across 19 medical specialties. It also explores the challenges involved and suggests areas for future research focus.

Methods: Two reviewers performed literature searches in PubMed, Web of Science and Scopus to identify studies published from January 2021 to March 2024. The included studies reported the usage of LLMs in performing medical tasks. Data was extracted and analyzed by five reviewers. To assess the risk of bias, quality assessment was performed using the revised tool for artificial intelligence-centered diagnostic accuracy studies (QUADAS-AI).

Results: Results were synthesized through categorical analysis of evaluation metrics, impact types, and validation approaches across medical specialties. A total of 84 studies were included in this review, which mainly originated from two countries: the USA (35/84) and China (16/84). Although the reviewed LLM applications were spread across various medical specialties, multi-specialty applications were demonstrated in 22 studies. Various aims include clinical natural language processing (31/84), supporting clinical decision (20/84), education (15/84), diagnoses, and patient management and engagement (3/84). GPT-based and BERT-based models were the most used (83/84 studies). Despite the reported positive impacts such as improved efficiency and accuracy, challenges related to reliability and ethics remain. The overall risk of bias was low in 72 studies, high in 11 studies and not clear in 3 studies.

Conclusion: GPT-based and BERT-based LLMs dominate medical specialty applications, with over 98.8% of reviewed studies using these models. Despite the potential benefits in process efficiency and diagnostics, a key finding concerns the substantial variability in performance among LLMs. For instance, LLMs' accuracy ranged from 3% in some decision support tasks to 90% in some NLP tasks. Heterogeneity in the utilization of LLMs across diverse medical tasks and contexts prevented meaningful meta-analysis, as studies lacked standardized methodologies, outcome measures, and implementation approaches. Therefore, room for improvement remains wide in developing domain-specific data and establishing standards to ensure reliability and effectiveness.