Abstract
This review examines the use of large language models (LLMs) in cancer, analysing articles sourced from PubMed, Embase, and Ovid Medline published between 2017 and 2024. Our search strategy included terms related to LLMs, cancer research, risks, safeguards, and ethical issues, focusing on studies that utilized text-based data. 59 articles were included in the review and categorized into three segments: quantitative studies, chatbot-focused studies, and qualitative discussions of LLMs in cancer. Quantitative studies highlight LLMs' advanced capabilities in natural language processing (NLP), while chatbot-focused studies demonstrate their potential in clinical support and data management. Qualitative research underscores the broader implications, including risks and ethical considerations. The findings suggest that LLMs, notably ChatGPT, have potential in data analysis, patient interaction, and personalized treatment care. However, the review also identifies critical biases and challenges. We emphasize the need for regulatory oversight, targeted model development, and continuous evaluation. In conclusion, integrating LLMs into oncology offers promising prospects but necessitates a balanced approach that preserves accuracy, integrity, and privacy. We call for further study, encouraging responsible exploration and application of artificial intelligence in oncology.
BMJ Health & Care Informatics,
Journal year: 2025,
Issue: 32(1), pp. e101139 - e101139
Published: Jan. 1, 2025
Objectives
We aimed to evaluate the performance of multiple large language models (LLMs) in data extraction from unstructured and semi-structured electronic health records.
Methods
50 synthetic medical notes in English, each containing a structured and an unstructured part, were drafted and evaluated by domain experts and subsequently used for LLM prompting. 18 LLMs were evaluated against a baseline transformer-based model. Performance assessment comprised four entity extraction and five binary classification tasks, with a total of 450 predictions for each LLM. Assessment of LLM-response consistency was performed over three same-prompt iterations.
Results
Claude 3.0 Opus, Claude 3.0 Sonnet, Claude 2.0, GPT 4, Claude 2.1, Gemini Advanced, PaLM 2 chat-bison, and Llama 3-70b exhibited excellent overall accuracy >0.98 (0.995, 0.988, 0.986, and 0.982, respectively), significantly higher than the baseline RoBERTa model (0.742). Chat-bison and Sonnet showed marginally lower, and Gemini Advanced lower, multiple-run consistency (Krippendorff’s alpha values of 1, 0.998, 0.996, 0.992, 0.991, 0.989, and 0.985, respectively).
Discussion
The top-performing models, including PaLM 2 chat-bison, exhibited outstanding performance in both entity extraction and binary classification, with highly consistent responses across iterations. Their use could leverage clinical research and unburden healthcare professionals. Real-data analyses are warranted to confirm their performance in a real-world setting.
Conclusion
LLMs seem to be able to reliably extract data from electronic health records. Further validation using real data is warranted.
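To make the multiple-run consistency idea concrete, the sketch below queries a model three times with the same prompt and scores agreement across runs. It is an illustrative assumption of how such a check could look, not the study's code: the `query_llm` stub, the notes, and the responses are invented, and the Krippendorff's alpha step assumes the third-party `krippendorff` Python package.

```python
"""Sketch of a multiple-run consistency check for LLM extraction outputs."""
import numpy as np
import krippendorff  # third-party package: pip install krippendorff

def query_llm(prompt: str, run: int) -> str:
    # Placeholder for a real API or local-model call; it fakes one
    # disagreement on the second note in run 2 so the metrics are non-trivial.
    if "note 2" in prompt.lower() and run == 2:
        return "heart failure"
    return "atrial fibrillation"

notes = ["Synthetic note 1: ...", "Synthetic note 2: ..."]
template = "Extract the primary diagnosis from this note:\n{note}"

# Three same-prompt iterations per note, mirroring the study design.
runs = [[query_llm(template.format(note=n), run=r) for n in notes] for r in range(3)]

# Simple proxy: fraction of notes on which all three runs agree exactly.
all_agree = np.mean([len({runs[r][i] for r in range(3)}) == 1 for i in range(len(notes))])
print(f"All-run exact agreement: {all_agree:.3f}")

# Krippendorff's alpha (nominal) with responses coded as integers; runs act as the "raters".
codes = {resp: i for i, resp in enumerate(sorted({x for row in runs for x in row}))}
coded = np.array([[codes[x] for x in row] for row in runs], dtype=float)
alpha = krippendorff.alpha(reliability_data=coded, level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```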
Journal of the American Heart Association,
Journal year: 2025,
Issue: unknown
Published: March 27, 2025
Background
Rates of oral anticoagulation (OAC) nonprescription in atrial fibrillation approach 50%. Understanding the reasons for OAC nonprescription may help reduce gaps in guideline-recommended care. We aimed to identify these reasons from clinical notes using large language models.
Methods
We identified all patients with atrial fibrillation and their associated notes in our health care system with a clinician-billed visit and without another OAC indication, and stratified them on the basis of active OAC prescriptions. Three annotators labeled the reasons for nonprescription in 10% of the notes (“annotation set”). We engineered prompts for a generative model (Generative Pre-trained Transformer 4) and trained a discriminative model (ClinicalBERT), then selected the best-performing model to predict reasons in the remaining 90% (“inference set”).
Results
A total of 35 737 patients were identified, of which 7712 (21.6%) did not have an active OAC prescription; 910 notes across 771 patients were annotated. Generative Pre-trained Transformer 4 outperformed ClinicalBERT (macro-F1 score 0.79, compared with 0.69 for ClinicalBERT). In the inference set, 61.1% had a documented reason for OAC nonprescription, most commonly alternative use of an antiplatelet agent (23.3%), therapeutic inertia (21.0%), and low atrial fibrillation burden (17.1%).
Conclusions
This is the first study to use large language models to extract reasons for OAC nonprescription from clinical notes. It reveals guideline-discordant practices and provides actionable insights for the development of interventions to reduce OAC nonprescription.
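As a concrete illustration of the macro-F1 comparison reported above, the sketch below scores two hypothetical sets of predictions against gold labels. The label names echo the reason categories in the abstract, but the example data are invented and scikit-learn is assumed.

```python
"""Minimal sketch of a macro-F1 comparison between two classifiers."""
from sklearn.metrics import f1_score

labels = ["antiplatelet_alternative", "therapeutic_inertia", "low_af_burden", "other"]

# Hypothetical gold annotations and model predictions for a handful of notes.
y_true = ["antiplatelet_alternative", "therapeutic_inertia", "low_af_burden", "other", "therapeutic_inertia"]
y_gpt4 = ["antiplatelet_alternative", "therapeutic_inertia", "low_af_burden", "other", "low_af_burden"]
y_bert = ["antiplatelet_alternative", "therapeutic_inertia", "other", "other", "low_af_burden"]

# Macro-F1 averages per-class F1 scores, so rare reasons count as much as common ones.
print("GPT-4 macro-F1:       ", round(f1_score(y_true, y_gpt4, labels=labels, average="macro"), 2))
print("ClinicalBERT macro-F1:", round(f1_score(y_true, y_bert, labels=labels, average="macro"), 2))
```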
medRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024,
Issue: unknown
Published: Aug. 12, 2024
Background:
Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges.
Objective:
This study aims to systematically review the use of generative LLMs and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions.
Methods:
A Boolean search for peer-reviewed articles was conducted on May 19th, 2024 in PubMed and Web of Science to include research published since 2023, roughly one month after the release of ChatGPT. The results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and performed data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal data, and evaluation metrics. Additionally, we identified the current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions.
Results:
The initial search yielded 6,328 unique studies, of which 76 were included after screening. Of these, 67 (88.2%) employed zero-shot prompting, and five of them achieved 100% accuracy on specific tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance while noting a non-linear relationship between the number of examples and the improvement. Eight studies explored fine-tuning: all reported improvements on specific tasks, but three noted potential performance degradation on certain other tasks. Two utilized LLM-based decision-making, which enabled accurate disease diagnosis and prognosis. A total of 55 different evaluation metrics were used for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, one detecting no bias and the other finding that male patients received more appropriate suggestions. Six studies reported hallucinations, such as fabricating names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses.
Conclusion:
Our review indicates that few studies have applied advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to these limitations.
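The review above repeatedly contrasts zero-shot and few-shot prompting; the sketch below shows the two prompt styles side by side. It is a generic illustration: the note texts and answers are invented, not prompts taken from any reviewed study, and the resulting strings would be sent to any chat-completion API or local model.

```python
"""Sketch contrasting zero-shot and few-shot prompting for an EHR question."""

def zero_shot_prompt(note: str) -> str:
    # Zero-shot: the instruction alone, with no labelled examples.
    return (
        "You are a clinical NLP assistant. Does the following note document "
        "current anticoagulant use? Answer 'yes' or 'no' only.\n\n"
        f"Note: {note}"
    )

def few_shot_prompt(note: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: labelled examples are prepended; the review notes that gains
    # do not grow linearly with the number of examples.
    shots = "\n\n".join(f"Note: {n}\nAnswer: {a}" for n, a in examples)
    return (
        "Answer 'yes' or 'no': does the note document current anticoagulant use?\n\n"
        f"{shots}\n\nNote: {note}\nAnswer:"
    )

examples = [
    ("Patient continues apixaban 5 mg BID.", "yes"),
    ("No anticoagulation; aspirin 81 mg daily.", "no"),
]
query = "Warfarin held prior to procedure, to resume tomorrow."
print(zero_shot_prompt(query))
print(few_shot_prompt(query, examples))
```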
medRxiv (Cold Spring Harbor Laboratory),
Journal year: 2025,
Issue: unknown
Published: Feb. 13, 2025
Extracting structured data from free-text medical records is laborious and error-prone. Traditional rule-based and early neural network methods often struggle with domain complexity and require extensive tuning. Large language models (LLMs) offer a promising solution but must be tailored to nuanced clinical knowledge and complex, multipart entities. We developed a flexible, end-to-end LLM pipeline to extract diagnoses, per-specimen anatomical sites, procedures, histology, and detailed immunohistochemistry results from pathology reports. A human-in-the-loop process was used to create validated reference annotations for a development set of 152 kidney tumor reports and guided iterative refinement of the pipeline. To drive assessment of performance, we built a comprehensive error ontology, categorizing errors by significance (major vs. minor), source (LLM, manual annotation, or insufficient instructions), and contextual origin. The finalized pipeline was applied to 3,520 internal reports (of which 2,297 had pre-existing templated data available for cross-referencing), and its adaptability was evaluated using 53 publicly available breast cancer reports. After six iterations, major errors on the development set decreased to 0.99% (14/1413 entities). We identified 11 key contexts in which complications arose, including history integration, entity linking, and specification granularity, which provided valuable insight for understanding our research goals. Using the templated data as reference, the pipeline achieved a macro-averaged F1 score of 0.99 for identifying tumor subtypes and 0.97 for detecting metastasis. When adapted to the breast cancer dataset, three iterations were required to align the domain-specific instructions, attaining 89% agreement with the curated data. This work illustrates that LLM-based extraction pipelines can achieve near expert-level accuracy with carefully constructed instructions tailored to specific aims. Beyond raw metrics, the instruction-refinement process itself, balancing specificity and relevance, proved essential. The approach offers a transferable blueprint for applying emerging LLM capabilities to other complex clinical information extraction tasks.
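As an illustration of the kind of instruction-driven, per-specimen extraction this pipeline describes, the sketch below asks a model to return one JSON object per specimen and parses the reply. The schema, prompt wording, and mocked response are assumptions for demonstration, not the authors' actual instructions.

```python
"""Sketch of one instruction-driven, per-specimen extraction step."""
import json

SCHEMA_HINT = (
    'Return JSON of the form {"specimens": [{"site": str, "procedure": str, '
    '"diagnosis": str, "histology": str, "ihc": [{"marker": str, "result": str}]}]}.'
)

def build_prompt(report_text: str) -> str:
    # Narrowly scoped instructions: extract only what is stated, one object per specimen.
    return (
        "Extract per-specimen findings from the pathology report below.\n"
        f"{SCHEMA_HINT}\nUse null for fields that are not stated; do not infer.\n\n"
        f"Report:\n{report_text}"
    )

def call_model(prompt: str) -> str:
    # Placeholder reply so the sketch runs end-to-end without an API key.
    return json.dumps({"specimens": [{
        "site": "left kidney", "procedure": "partial nephrectomy",
        "diagnosis": "clear cell renal cell carcinoma", "histology": "clear cell",
        "ihc": [{"marker": "PAX8", "result": "positive"}],
    }]})

report = "LEFT KIDNEY, PARTIAL NEPHRECTOMY: Clear cell renal cell carcinoma ... PAX8 positive."
parsed = json.loads(call_model(build_prompt(report)))
for spec in parsed["specimens"]:
    print(spec["site"], "|", spec["diagnosis"], "|",
          [(m["marker"], m["result"]) for m in spec["ihc"]])
```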
ABSTRACT
Background
Cancer subtype classification plays a pivotal role in personalised medicine, requiring the integration of diverse data types. Traditional prompting methods for vision-language models fail to fully leverage multimodal data, particularly when working with minimal labelled data.
Methods
To address these limitations, we propose a novel framework that introduces CancerFusionPrompt, a specialised prompting method for integrating imaging and multi-omics data. Our proposed approach extends the few-shot learning paradigm by incorporating in-context learning for cancer subtype classification.
Results
The proposed framework significantly outperforms state-of-the-art techniques in cancer subtype classification, achieving notable improvements in both accuracy and generalisation. These results demonstrate the superior capability of CancerFusionPrompt in handling complex multimodal inputs compared with existing methods.
Conclusions
CancerFusionPrompt offers a powerful solution for cancer subtype classification tasks. By overcoming the limitations of current methods, it enables more accurate and robust predictions.
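The abstract does not detail CancerFusionPrompt itself, so the sketch below only illustrates the general idea it builds on: few-shot, in-context classification in which multimodal inputs are serialised into the prompt. All features, labels, and subtype names are invented, and this is not the authors' method.

```python
"""Generic few-shot, in-context classification sketch with serialised multimodal inputs."""

def serialize_case(imaging_summary: str, omics: dict[str, float]) -> str:
    # Turn image-derived findings and omics values into one textual case description.
    omics_txt = ", ".join(f"{gene} expression={val:.1f}" for gene, val in omics.items())
    return f"Imaging: {imaging_summary}. Omics: {omics_txt}."

support_set = [  # a few labelled examples placed in the prompt ("shots")
    (serialize_case("spiculated mass, 2.1 cm", {"ESR1": 8.2, "ERBB2": 1.1}), "Luminal A"),
    (serialize_case("irregular mass with necrosis", {"ESR1": 0.4, "ERBB2": 9.7}), "HER2-enriched"),
]
query = serialize_case("round mass, rim enhancement", {"ESR1": 0.3, "ERBB2": 0.8})

prompt = "Classify the cancer subtype of the final case.\n\n"
prompt += "\n".join(f"Case: {x}\nSubtype: {y}" for x, y in support_set)
prompt += f"\n\nCase: {query}\nSubtype:"
print(prompt)  # send to a vision-language or text LLM of choice
```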
Communications Medicine,
Journal year: 2025,
Issue: 5(1)
Published: March 31, 2025
Abstract
Background
Pathology departments generate large volumes of unstructured data as free-text diagnostic reports. Converting these reports into structured formats for analytics or artificial intelligence projects requires substantial manual effort by specialized personnel. While recent studies show promise in using advanced language models for structuring pathology data, they primarily rely on proprietary models, raising cost and privacy concerns. Additionally, important aspects such as prompt engineering, model quantization, and deployment on consumer-grade hardware remain unaddressed.
Methods
We created a dataset of 579 annotated pathology reports in German and English versions. Six models (proprietary: GPT-4; open-source: Llama2 13B, Llama2 70B, Llama3 8B, and Qwen2.5 7B) were evaluated for their ability to extract eleven key parameters from the reports. Furthermore, we investigated performance across different prompting strategies and quantization techniques to assess practical deployment scenarios.
Results
Here we show that open-source models can extract the parameters with high precision, matching the accuracy of the GPT-4 model. Precision, however, varies significantly across configurations, and these variations depend on the specific methods used during deployment.
Conclusions
Open-source models demonstrate performance comparable to proprietary solutions for structuring pathology report data. This finding has significant implications for healthcare institutions seeking cost-effective, privacy-preserving solutions. The evaluated configurations provide valuable insights for pathology departments. Our publicly available bilingual dataset serves as both a benchmark and a resource for future research.
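To show what local, privacy-preserving deployment of an open-source extraction model can look like in practice, the sketch below loads a 4-bit quantized instruct model with Hugging Face `transformers` and `bitsandbytes` and prompts it to structure a short report. The model ID (one of the families named in the abstract), prompt, and parameter list are illustrative assumptions rather than the authors' exact setup, and a CUDA GPU is assumed.

```python
"""Sketch: running a quantized open-source model locally for report structuring."""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"  # illustrative choice from the evaluated model families
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)  # 4-bit quantization
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant, device_map="auto")

report = "Invasives duktales Karzinom, G2, ER 90%, PR 80%, HER2 negativ, Ki-67 15%."
messages = [
    {"role": "system", "content": "Extract tumour type, grade, ER, PR, HER2 and Ki-67 as JSON."},
    {"role": "user", "content": report},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```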
Visual Computing for Industry Biomedicine and Art,
Journal year: 2025,
Issue: 8(1)
Published: April 3, 2025
Breast cancer is one of the most common malignancies among women globally. Magnetic resonance imaging (MRI), as the final non-invasive diagnostic tool before biopsy, provides detailed free-text reports that support clinical decision-making. Therefore, effective utilization of the information in MRI reports to make reliable decisions is crucial for patient care. This study proposes a novel method for BI-RADS classification using breast MRI reports. Large language models are employed to transform the free-text reports into structured reports. Specifically, missing category information (MCI) that is absent from a report is supplemented by assigning default values to the missing categories. To ensure data privacy, a locally deployed Qwen-Chat model is employed. Furthermore, to enhance domain-specific adaptability, a knowledge-driven prompt is designed, and the Qwen-7B-Chat model is fine-tuned specifically for structuring the reports. To prevent information loss and enable comprehensive learning of all report details, a fusion strategy is introduced, combining the free-text and structured reports to train the classification model. Experimental results show that the proposed method outperforms existing methods across multiple evaluation metrics. In addition, an external test set from a different hospital is used to validate the robustness of the approach, on which the method surpasses GPT-4o in terms of classification performance. Ablation experiments confirm the contributions of the knowledge-driven prompt, MCI supplementation, and fusion strategy to the model's performance.
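A minimal sketch of the missing-category-information (MCI) supplementation idea is shown below: categories the structuring model does not return are filled with default values before classification. The descriptor names, defaults, and mocked structuring step are illustrative assumptions rather than the paper's actual schema.

```python
"""Sketch of MCI supplementation: fill unreported descriptor categories with defaults."""

# Default values assigned to descriptor categories that a report never mentions (assumed names).
DEFAULTS = {
    "mass": "not reported",
    "calcifications": "absent",
    "non_mass_enhancement": "absent",
    "axillary_lymphadenopathy": "not reported",
}

def structure_report(free_text: str) -> dict:
    # Placeholder for the locally deployed structuring LLM: it returns only the
    # categories it actually found in the report text.
    return {"mass": "irregular mass, 14 mm, heterogeneous enhancement"}

def supplement_mci(structured: dict) -> dict:
    # Any category the model did not return is filled with its default value,
    # so the downstream classifier always sees the complete descriptor set.
    return {key: structured.get(key, default) for key, default in DEFAULTS.items()}

structured = supplement_mci(structure_report("Irregular 14 mm enhancing mass in the left breast ..."))
print(structured)
```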
Biomedical Journal,
Journal year: 2025,
Issue: unknown, pp. 100868 - 100868
Published: April 1, 2025
Large Language Models (LLMs) are capable of transforming healthcare by demonstrating remarkable capabilities in language understanding and generation. They have matched or surpassed human performance on standardized medical examinations and assisted diagnostics across specialties like dermatology, radiology, and ophthalmology. LLMs can enhance patient education by providing accurate, readable, and empathetic responses, and they streamline clinical workflows through efficient information extraction from unstructured data such as clinical notes. Integrating LLM systems into clinical practice involves user interface design, clinician training, and effective collaboration between Artificial Intelligence (AI) systems and healthcare professionals. Users must possess a solid understanding of generative AI and domain knowledge to assess the generated content critically. Ethical considerations, including ensuring privacy and security, mitigating biases, and maintaining transparency, are critical for responsible deployment. Future directions include interdisciplinary collaboration, developing new benchmarks that incorporate safety and ethical measures, advancing multimodal models that integrate text and imaging data, creating LLM-based agents for complex decision-making, addressing underrepresented and rare diseases, and integrating LLMs with robotic systems for precision procedures. Emphasizing safety, integrity, and human-centered implementation is essential for maximizing the benefits of LLMs while mitigating potential risks, thereby helping these tools augment rather than replace the expertise and compassion in healthcare.
npj Digital Medicine,
Journal year: 2024,
Issue: 7(1)
Published: Nov. 18, 2024
Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at health system scale are underexplored. We evaluated how concatenating queries, with multiple notes and tasks handled simultaneously, affects model performance under increasing loads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments across various task configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but its performance degraded after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models could concatenate up to 50 simultaneous tasks effectively, with validation on a public medical dataset. An analysis demonstrated a 17-fold cost reduction when using concatenation. These results identify the limits of effective concatenation and highlight avenues for cost-efficiency at enterprise scale.
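To make the concatenation idea and its cost rationale concrete, the sketch below builds one combined prompt for several notes and questions and does a back-of-envelope token comparison against separate calls. The note snippets, questions, output format, and token counts are invented placeholders, not the paper's measurements.

```python
"""Sketch of prompt concatenation: several questions about several notes in one call."""

notes = {"note_1": "72F with HFpEF, on furosemide ...", "note_2": "65M with AF, apixaban held ..."}
questions = ["Is the patient on an anticoagulant?", "Is heart failure documented?"]

# One concatenated prompt instead of len(notes) * len(questions) separate calls.
prompt = "Answer each question for each note. Reply as JSON: {note_id: {question_index: 'yes'/'no'}}.\n\n"
for nid, text in notes.items():
    prompt += f"[{nid}] {text}\n"
prompt += "\n" + "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))
print(prompt)

# Back-of-envelope reason concatenation saves money: fixed instruction and note tokens are paid once.
instruction_tokens, note_tokens, question_tokens = 60, 300, 15
separate = len(notes) * len(questions) * (instruction_tokens + note_tokens + question_tokens)
concatenated = instruction_tokens + len(notes) * note_tokens + len(questions) * question_tokens
print(f"Input tokens, separate calls: {separate}; concatenated: {concatenated}")
```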