medRxiv (Cold Spring Harbor Laboratory), Journal year: 2025, Issue: unknown
Published: April 25, 2025
Surgical pathology reports contain essential diagnostic information, in free-text form, required for cancer staging, treatment planning, and registry documentation. However, their unstructured nature and variability across tumor types and institutions pose challenges for automated data extraction. We present a consensus-driven, reasoning-based framework that uses multiple locally deployed large language models (LLMs) to extract six key variables: site, laterality, histology, stage, grade, and behavior. Each LLM produces structured outputs with accompanying justifications, which are evaluated for accuracy and coherence by a separate reasoning model. Final consensus values are determined through aggregation, and expert validation is conducted by board-certified or equivalent pathologists. The framework was applied to over 4,000 pathology reports from The Cancer Genome Atlas (TCGA) and Moffitt Cancer Center. Expert review confirmed high agreement in the TCGA dataset for behavior (100.0%), histology (98.5%), site (95.2%), and grade (95.6%), with lower performance for stage (87.6%) and laterality (84.8%). In the Moffitt reports (brain, breast, and lung), performance remained high across variables, with [...] (98.3%), [...] (92.4%), [...] achieving strong agreement. However, certain challenges emerged, such as inconsistent mention of sentinel lymph node details and anatomical ambiguity in biopsy site interpretations. Statistical analyses revealed significant main effects of model type, variable, and organ system, as well as model × variable × organ interactions, emphasizing the role of clinical context in model performance. These results highlight the importance of stratified, multi-organ evaluation frameworks in LLM benchmarking for clinical applications. Textual justifications enhanced interpretability and enabled human reviewers to audit model outputs. Overall, this consensus-based approach demonstrates that LLMs can provide a transparent, accurate, and auditable solution for integrating AI-driven data extraction into real-world pathology workflows, including cancer registry abstraction and synoptic reporting.
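The abstract above says that final consensus values are determined through aggregation but does not say how. As a rough illustration only, the sketch below assumes a simple plurality vote per variable, with ties returned as `None` so that a human reviewer can resolve them. The function name `consensus`, the dictionary layout, and the example values are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# The six variables named in the abstract.
VARIABLES = ("site", "laterality", "histology", "stage", "grade", "behavior")


def consensus(outputs):
    """Plurality-vote each variable across per-model outputs.

    `outputs` is a list of dicts, one per model (assumed layout).
    Returns {variable: (value, support)}, where support is the share of
    models that voted for the winning value. The value is None when no
    model answered or when the top two values are tied.
    """
    result = {}
    for var in VARIABLES:
        votes = Counter(o[var] for o in outputs if o.get(var) is not None)
        if not votes:
            result[var] = (None, 0.0)
            continue
        ranked = votes.most_common(2)
        top_value, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        support = top_count / len(outputs)
        result[var] = (None if tied else top_value, support)
    return result


# Hypothetical outputs from three locally deployed models for one report.
base = {"site": "breast", "histology": "ductal carcinoma",
        "grade": "2", "behavior": "malignant"}
model_outputs = [
    {**base, "laterality": "left", "stage": "II"},
    {**base, "laterality": "left", "stage": "III"},
    {**base, "laterality": "right", "stage": None},
]

final = consensus(model_outputs)
```

In this toy input, laterality resolves to the value two of three models agree on, while stage has one vote each for two values and is left unresolved. A real pipeline along the lines of the abstract would also weigh the reasoning model's assessment of each justification, which this sketch leaves out.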

The Lancet Digital Health, Journal year: 2024, Issue: 6(9), pp. e662-e672
Published: Aug. 23, 2024
Among the rapid integration of artificial intelligence in clinical settings, large language models (LLMs), such as Generative Pre-trained Transformer-4, have emerged as multifaceted tools that have potential for health-care delivery, diagnosis, and patient care. However, deployment of LLMs raises substantial regulatory and safety concerns. Due to their high output variability, poor inherent explainability, and the risk of so-called AI hallucinations, LLM-based health-care applications that serve a medical purpose face regulatory challenges for approval as medical devices under US and EU laws, including the recently passed EU Artificial Intelligence Act. Despite unaddressed risks for patients, including misdiagnosis and unverified medical advice, such applications are available on the market. The regulatory ambiguity surrounding these tools creates an urgent need for frameworks that accommodate their unique capabilities and limitations. Alongside the development of these frameworks, existing regulations should be enforced. If regulators fear enforcing the regulations in a market dominated by supply or development by large technology companies, the consequences of layperson harm will force belated action, damaging the potentiality of LLM-based applications for layperson medical advice.

npj Digital Medicine, Journal year: 2024, Issue: 7(1)
Published: April 23, 2024
Reliably processing and interlinking medical information has been recognized as a critical foundation to the digital transformation of medical workflows, and despite the development of medical ontologies, the optimization of these has been a major bottleneck to digital medicine. The advent of large language models has brought great excitement, and maybe a solution to the medicines' 'communication problem' is in sight, but how can the known weaknesses of these models, such as hallucination and non-determinism, be tempered? Retrieval Augmented Generation, particularly through knowledge graphs, is an automated approach that can deliver structured reasoning and a model of truth alongside LLMs, relevant to information structuring and therefore also to decision support.

JMIR Bioinformatics and Biotechnology, Journal year: 2024, Issue: 5, p. e64406
Published: Sept. 25, 2024
The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring that technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots. It examines how these models replicate human-like language patterns, impacting the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.

Journal of the American Medical Informatics Association, Journal year: 2024, Issue: 31(10), pp. 2315-2327
Published: June 20, 2024
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotations.

BMC Medical Informatics and Decision Making, Journal year: 2025, Issue: 25(1)
Published: Jan. 23, 2025
Large language models (LLMs) are increasingly utilized in healthcare settings. Postoperative pathology reports, which are essential for diagnosing and determining treatment strategies for surgical patients, frequently include complex data that can be challenging for patients to comprehend. This complexity can adversely affect the quality of communication between doctors and patients about their diagnosis and treatment options, potentially impacting patient outcomes such as understanding of their medical condition, treatment adherence, and overall satisfaction. This study analyzed text pathology reports from four hospitals between October and December 2023, focusing on malignant tumors. Using GPT-4, we developed templates for interpretive pathology reports (IPRs) to simplify medical terminology for non-professionals. We randomly selected 70 reports to generate these templates and evaluated the remaining 628 reports for consistency and readability. Patient understanding was measured using a custom-designed pathology report understanding level assessment scale, scored by volunteers with no medical background. The study also recorded doctor-patient communication time and patient comprehension levels before and after using IPRs. Among the 698 pathology reports analyzed, interpretation through LLMs significantly improved readability and patient understanding. The average communication time between doctors and patients decreased by over 70%, from 35 to 10 min (P < 0.001), with the use of IPRs. The study also found that patients scored higher on understanding levels when provided with AI-generated reports, from 5.23 points to 7.98 points, indicating an effective translation of complex medical information. Consistency between the original pathology reports (OPRs) and IPRs was also evaluated, with results showing high consistency across all assessed dimensions, achieving a score of 4.95 out of 5. This research demonstrates the efficacy of LLMs like GPT-4 in enhancing doctor-patient communication by translating pathology reports into more accessible language. While this study did not directly measure patient outcomes or satisfaction, it provides evidence that reduced communication time may positively influence patient engagement. These findings highlight the potential of AI to bridge gaps between medical professionals and the public in healthcare environments.

Frontiers in Medicine, Journal year: 2025, Issue: 11
Published: Jan. 22, 2025
Background and aim: In the last years, natural language processing (NLP) has transformed significantly with the introduction of large language models (LLM). This review updates on NLP and LLM applications and challenges in gastroenterology and hepatology.
Methods: Registered with PROSPERO (CRD42024542275) and adhering to PRISMA guidelines, we searched six databases for relevant studies published from 2003 to 2024, ultimately including 57 studies.
Results: Our review notes an increase in relevant publications in 2023–2024 compared to previous years, reflecting growing interest in newer models such as GPT-3 and GPT-4. The results demonstrate that NLP models have enhanced data extraction from electronic health records and other unstructured medical data sources. Key findings include high precision in identifying disease characteristics from unstructured reports and ongoing improvement in clinical decision-making. Risk of bias assessments using ROBINS-I, QUADAS-2, and PROBAST tools confirmed the methodological robustness of the included studies.
Conclusion: NLP and LLMs can enhance diagnosis and treatment in gastroenterology and hepatology. They enable data extraction from unstructured medical records, such as endoscopy reports and patient notes, enhancing clinical decision-making. Despite these advancements, integrating these tools into routine practice is still challenging. Future work should prospectively demonstrate real-world value.

Bioengineering, Journal year: 2024, Issue: 11(4), p. 342
Published: March 31, 2024
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential in transforming pathology practice and education, but these models are not infallible, so any artificial intelligence generated content must be verified with reputable sources. Caution must be exercised on how these models are integrated into clinical practice, as these models can produce hallucinations and incorrect results, and an over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review paper provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.

JMIR Medical Informatics, Journal year: 2024, Issue: 12, p. e54811
Published: April 17, 2024
Background: Burnout among health care professionals is a significant concern, with detrimental effects on health care service quality and patient outcomes. The use of the electronic health record (EHR) system has been identified as a significant contributor to burnout among health care professionals.
Objective: This systematic review and meta-analysis aims to assess the prevalence of burnout among health care professionals associated with the use of the EHR system, thereby providing evidence to improve health information systems and develop strategies to measure and mitigate burnout.
Methods: We conducted a comprehensive search of the PubMed, Embase, and Web of Science databases for English-language peer-reviewed articles published between January 1, 2009, and December 31, 2022. Two independent reviewers applied inclusion and exclusion criteria, and study quality was assessed using the Joanna Briggs Institute checklist and the Newcastle-Ottawa Scale. Meta-analyses were performed using R (version 4.1.3; R Foundation for Statistical Computing), with EndNote X7 (Clarivate) for reference management.
Results: The review included 32 cross-sectional studies and 5 case-control studies with a total of 66,556 participants, mainly physicians and registered nurses. The pooled prevalence of burnout among health care professionals in cross-sectional studies was 40.4% (95% CI 37.5%-43.2%). Case-control studies indicated a higher likelihood of burnout among health care professionals who spent more time on EHR-related tasks outside work (odds ratio 2.43, 95% CI 2.31-2.57).
Conclusions: The findings highlight the association between the increased use of the EHR system and burnout among health care professionals. Potential solutions include optimizing EHR systems, implementing automated dictation or note-taking, employing scribes to reduce documentation burden, and leveraging artificial intelligence to enhance EHR system efficiency and reduce the risk of burnout.
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42021281173; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021281173