medRxiv (Cold Spring Harbor Laboratory), 2024, volume and issue unknown. Published: Dec. 13, 2024.
Abstract
No existing algorithm can reliably identify metastasis from pathology reports across multiple cancer types and the entire US population. In this study, we develop a deep learning model that automatically detects patients with metastatic cancer from pathology reports spanning many laboratories and cancer types. We trained and validated our model on a cohort of 29,632 patients from four Surveillance, Epidemiology, and End Results (SEER) registries linked to 60,471 unstructured pathology reports. Our architecture, trained on task-specific data, outperforms a general-purpose LLM, with a recall of 0.894 compared to 0.824. We quantified model uncertainty and used it to defer low-confidence cases for human review. We found that retaining the 72.9% of cases predicted with high confidence increased recall to 0.969. This approach could streamline population-based cancer surveillance and help address the unmet need to capture recurrence or progression.
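The deferral step described above, in which quantified uncertainty routes low-confidence cases to human review, is an instance of selective prediction. Below is a minimal sketch assuming a simple confidence threshold on binary probabilities; the threshold, data, and function name are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of uncertainty-based deferral (selective prediction).
# The threshold and toy data below are hypothetical, not from the paper.

def defer_low_confidence(probs, labels, threshold):
    """Keep predictions whose confidence exceeds `threshold`; defer the
    rest to human review. Returns (coverage, recall on retained cases)."""
    retained = [(p, y) for p, y in zip(probs, labels) if max(p, 1 - p) >= threshold]
    coverage = len(retained) / len(probs)
    # Recall on the retained subset: fraction of true positives detected.
    tp = sum(1 for p, y in retained if y == 1 and p >= 0.5)
    fn = sum(1 for p, y in retained if y == 1 and p < 0.5)
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return coverage, recall

# Toy example: model probabilities of "metastasis" with true labels.
probs  = [0.98, 0.95, 0.60, 0.40, 0.05, 0.92, 0.55, 0.99]
labels = [1,    1,    1,    1,    0,    1,    0,    1]
coverage, recall = defer_low_confidence(probs, labels, threshold=0.9)
```

Raising the threshold lowers coverage but tends to raise recall on the retained cases, mirroring the trade-off the abstract reports (72.9% of cases retained, recall 0.969).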
Diagnostics, 2025, 15(6), p. 735. Published: March 15, 2025.
Publications on the application of artificial intelligence (AI) to many situations, including those in clinical medicine, created in 2023–2024 are reviewed here. Because of the short time frame covered here, it is not possible to conduct an exhaustive analysis as would be the case for meta-analyses or systematic reviews. Consequently, this literature review presents a narrative examination of AI's relation to contemporary topics related to medicine. The landscape of findings here spans 254 papers published in 2024 topically reporting on AI, of which 83 articles are considered in the present review because they contain evidence-based findings. In particular, the types of cases deal with the accuracy of initial and differential diagnoses, cancer treatment recommendations, board-style exams, and performance on various tasks, including imaging. Importantly, summaries of the validation techniques used to evaluate these systems are presented. This review focuses on AIs that have a relevancy evidenced by evaluation publications. It speaks to both what has been promised and what has been delivered by AI systems. Readers will be able to understand when generative AI may be expressing views without having the necessary information (ultracrepidarianism) or responding as if it had expert knowledge when it does not. A lack of awareness that AI can deliver inadequate or confabulated information can result in incorrect medical decisions and inappropriate applications (Dunning–Kruger effect). As a result, in certain cases, a system might underperform and provide results that greatly overestimate any validity.
Journal of Dentistry, 2025, volume and issue unknown, p. 105764. Published: April 1, 2025.
This study aimed to evaluate and compare the performance of several large language models (LLMs) in the context of restorative dentistry and endodontics, focusing on their accuracy, consistency, and contextual understanding. The dataset was extracted from the national educational archives of the Collège National des Enseignants en Odontologie Conservatrice (CNEOC) and includes all chapters of the reference manual for dental residency applicants. Multiple-choice questions (MCQs) were selected following a review by three independent academic experts. Four LLMs were assessed: ChatGPT-3.5 and ChatGPT-4 (OpenAI), Claude-3 (Anthropic), and Mistral 7B (Mistral AI). Model accuracy was determined by comparing responses with expert-provided answers. Consistency was measured through robustness (the ability to provide identical answers to paraphrased questions) and repeatability (identical answers to the same question). Contextual understanding was evaluated based on the model's ability to categorise terms correctly and to infer terms from definitions. Additionally, models were reassessed after being provided with the relevant full course chapter. A total of 517 MCQs and 539 definitions were included. The more advanced models demonstrated significantly higher accuracy than Mistral 7B, also showing greater robustness. Advanced models displayed high accuracy in presenting content, although performance varied for closely related concepts. Supplying course content generally improved response accuracy, though inconsistently across topics. Even the most advanced LLMs, such as Claude 3, achieve only moderate performance and require cautious use due to inconsistencies. Future studies should focus on integrating validated content and refining prompt engineering to enhance the clinical utility of LLMs. These findings underscore the potential of context-based prompting in restorative dentistry and endodontics.
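The two consistency measures described above, repeatability (identical answers when the same question is asked repeatedly) and robustness (identical answers across paraphrases of a question), can both be expressed as exact-agreement rates. A hypothetical sketch with invented answer data follows; the function and variable names are illustrative, not the study's code.

```python
# Hypothetical sketch of the consistency metrics described above.
# Repeatability: agreement across repeated runs of the same MCQ.
# Robustness: agreement across paraphrased versions of an MCQ.

def agreement(answers):
    """Fraction of items whose grouped answers all agree exactly."""
    consistent = sum(1 for runs in answers if len(set(runs)) == 1)
    return consistent / len(answers)

# Each inner list: a model's answers across 3 repetitions of one MCQ.
repeat_runs = [["B", "B", "B"], ["A", "C", "A"], ["D", "D", "D"]]
# Each inner list: a model's answers to 3 paraphrases of one MCQ.
paraphrase_runs = [["B", "B", "B"], ["A", "A", "B"]]

repeatability = agreement(repeat_runs)    # 2 of 3 items fully agree
robustness = agreement(paraphrase_runs)   # 1 of 2 items fully agree
```

Both metrics reward only perfect agreement within a group, so a single deviating answer marks the whole item as inconsistent.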
The Oncologist, 2025, 30(4). Published: March 29, 2025.
Abstract
Background
Recent advances in large language models (LLM) have enabled human-like qualities of natural language competency. Applied to oncology, LLMs have been proposed to serve as an information resource and to interpret vast amounts of data as a clinical decision-support tool to improve outcomes.
Objective
This review aims to describe the current status of the medical accuracy of oncology-related LLM applications and research trends for further areas of investigation.
Methods
A scoping literature search was conducted on Ovid Medline for peer-reviewed studies published since 2000. We included primary studies that evaluated model accuracy applied to oncology settings. Study characteristics and outcomes were extracted to describe the landscape of oncology-related LLMs.
Results
Sixty studies were included based on the inclusion and exclusion criteria. The majority evaluated LLMs on health question-answer style examinations (48%), followed by diagnosis (20%) and management (17%). The number of studies examining the utility of fine-tuning and prompt-engineering increased over time from 2022 to 2024. Studies reported advantages such as an accurate information resource, reduction in clinician workload, and improved accessibility and readability of information, while noting disadvantages such as poor reliability, hallucinations, and the need for clinician oversight.
Discussion
There exists significant interest in the application of LLMs in oncology, with a particular focus on their use as a clinical decision support tool. However, further research is needed to validate these tools on external hold-out datasets for generalizability across diverse clinical scenarios, underscoring the need for human supervision of these tools.
PM&R, 2025, volume and issue unknown. Published: May 2, 2025.
Abstract
Background
There have been significant advances in machine learning and artificial intelligence technology over the past few years, leading to the release of large language models (LLMs) such as ChatGPT. There are many potential applications for LLMs in health care, but it is critical to first determine how accurate they are before putting them into practice. No studies have evaluated the accuracy and precision of LLMs in responding to questions related to the field of physical medicine and rehabilitation (PM&R).
Objective
To evaluate the accuracy and precision of two OpenAI LLMs (GPT‐3.5, released November 2022, and GPT‐4o, released May 2024) in answering PM&R knowledge questions.
Design
Cross‐sectional study. Both models were tested on the same set of 744 PM&R knowledge questions that covered all aspects of the field (general rehabilitation, stroke, traumatic brain injury, spinal cord injury, musculoskeletal medicine, pain medicine, electrodiagnostic medicine, pediatric rehabilitation, prosthetics and orthotics, rheumatology, and pharmacology). Each LLM was run three times on the question set to assess precision.
Setting
N/A.
Patients
N/A.
Interventions
N/A.
Main Outcome Measure
Percentage of correctly answered questions.
Results
For the three runs of the 744‐question set, GPT‐3.5 answered 56.3%, 56.5%, and 56.9% of questions correctly, while GPT‐4o answered 83.6%, 84%, and 84.1% correctly. GPT‐4o outperformed GPT‐3.5 across subcategories.
Conclusions
LLMs are rapidly advancing, with the more recent model performing much better compared to GPT‐3.5. LLMs have potential for augmenting clinical practice, medical training, and patient education. However, the technology still has limitations, and physicians should remain cautious about using LLMs in practice at this time.
Information, 2024, 16(1), p. 13. Published: Dec. 30, 2024.
This study explores the potential of natural language models, including large language models, to extract causal relations from medical texts, specifically clinical practice guidelines (CPGs). The outcomes of causality extraction for gestational diabetes are presented, marking a first in the field. Results are reported on a set of experiments using variants of BERT (BioBERT, DistilBERT, and BERT) and newer large language models (LLMs), namely, GPT-4 and LLAMA2. Our results show that BioBERT performed better than the other models, with an average F1-score of 0.72. LLAMA2 achieved similar performance but with less consistency. The code and an annotated corpus of causal statements within the guidelines are released. Extracting causal structures might help identify LLMs' hallucinations and possibly prevent some errors if LLMs are used in patient-care settings. Some practical extensions of extracting causal relations from medical text would include providing additional diagnostic support based on frequent cause–effect relationships, identifying possible inconsistencies in guidelines, and evaluating the evidence behind recommendations.
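A common way to score extracted causal relations against a gold-standard annotation is pair-level F1, the kind of metric behind the average F1-score of 0.72 reported above. The sketch below is a minimal, hypothetical illustration; the example relations are invented, not taken from the released corpus.

```python
# Minimal sketch of scoring extracted causal relations with pair-level F1.
# The gold and predicted (cause, effect) pairs below are invented examples.

def relation_f1(predicted, gold):
    """Micro F1 over exact-match (cause, effect) pairs."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                      # pairs found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("gestational diabetes", "macrosomia"),
        ("hyperglycemia", "excess fetal growth")}
predicted = {("gestational diabetes", "macrosomia"),
             ("insulin", "glucose control")}
f1 = relation_f1(predicted, gold)  # precision 0.5, recall 0.5, F1 0.5
```

Exact-match scoring is strict: a relation with a slightly different span counts as both a false positive and a false negative, which is one reason averaged F1 is often reported.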