medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 13, 2024
Abstract
No existing algorithm can reliably identify metastasis from pathology reports across multiple cancer types and the entire US population. In this study, we develop a deep learning model that automatically detects patients with metastatic cancer from pathology reports drawn from many laboratories and cancer types. We trained and validated our model on a cohort of 29,632 patients from four Surveillance, Epidemiology, and End Results (SEER) registries linked to 60,471 unstructured pathology reports. Our architecture, trained on task-specific data, outperforms a general-purpose LLM, with a recall of 0.894 compared to 0.824. We quantified model uncertainty and used it to defer uncertain cases for human review, and found that retaining the 72.9% most confident predictions increased recall to 0.969. This approach could streamline population-based cancer surveillance and help address the unmet need to capture recurrence or progression.
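The deferral step described above, quantifying uncertainty and routing the least confident cases to human review, is a form of selective prediction. The sketch below, with made-up toy data, retains a fixed fraction of the most confident predictions and defers the rest; this fixed-retention policy is an assumption for illustration, not the paper's exact method.

```python
import numpy as np

def defer_low_confidence(probs, labels, retain_frac):
    """Selective prediction: auto-handle the most confident cases,
    defer the rest to human review, and report recall on the
    retained subset.

    probs:  (n,) predicted probability of the positive class
    labels: (n,) ground-truth binary labels
    retain_frac: fraction of cases the model handles automatically
    """
    # Confidence = distance from the 0.5 decision boundary.
    confidence = np.abs(probs - 0.5)
    n_retain = int(round(retain_frac * len(probs)))
    # Indices of the most confident cases, handled automatically.
    keep = np.argsort(-confidence)[:n_retain]
    preds = (probs[keep] >= 0.5).astype(int)
    kept_labels = labels[keep]
    # Recall over the positives that remain with the model.
    tp = np.sum((preds == 1) & (kept_labels == 1))
    fn = np.sum((preds == 0) & (kept_labels == 1))
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    deferred = np.setdiff1d(np.arange(len(probs)), keep)
    return recall, deferred

# Toy data: confident cases are correct, borderline ones are deferred.
probs = np.array([0.98, 0.95, 0.03, 0.52, 0.48, 0.91])
labels = np.array([1, 1, 0, 0, 1, 1])
recall, deferred = defer_low_confidence(probs, labels, retain_frac=0.66)
```

Raising `retain_frac` trades review workload for recall, which is the trade-off the abstract reports at 72.9% retention.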
Cancer Medicine, Journal Year: 2025, Volume and Issue: 14(1)
Published: Jan. 1, 2025
ABSTRACT
Purpose: Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, and side effects. This study assesses the performance of publicly accessible large language model (LLM)-supported tools in providing valuable and reliable information to caregivers of children with cancer.
Methods: In this cross-sectional study, we evaluated four LLM-supported tools, ChatGPT (GPT-4), Google Bard (Gemini Pro), Microsoft Bing Chat, and SGE, against a set of frequently asked questions (FAQs) derived from the Children's Oncology Group Family Handbook and expert input (in total, 26 FAQs and 104 generated responses). Five experts assessed the LLM responses using measures including accuracy, clarity, inclusivity, completeness, clinical utility, and overall rating. Additionally, content quality was assessed for readability, AI disclosure, source credibility, resource matching, and originality. We used descriptive analysis and statistical tests, including Shapiro–Wilk, Levene's, and Kruskal–Wallis H-tests, with Dunn's post hoc tests for pairwise comparisons.
Results: ChatGPT showed high performance when rated by the experts. Google Bard also performed well, especially in the accuracy and clarity of its responses, whereas Bing Chat and SGE had lower scores. Disclosure that responses were AI-generated was observed less consistently, which may have affected ratings; the tools broadly maintained a balance between response detail and clarity, and the most readable tool answered with the least complexity. Ratings varied significantly (p < 0.001) across all evaluation measures except inclusivity. In our thematic analysis of free-text comments, emotional tone and empathy emerged as a unique theme, with mixed feedback on expectations that these tools be empathetic.
Conclusion: LLM-supported tools can enhance caregivers' knowledge in pediatric oncology. Each tool has strengths and areas for improvement, indicating that careful selection based on specific contexts is needed. Further research is required to explore their application in other medical specialties and patient demographics, assessing broader applicability and long-term impacts.
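The Kruskal–Wallis H-test used for the group comparisons above is computed from rank sums alone, so it can be sketched with the standard library; the rating data below are hypothetical, and the Dunn's post hoc step is omitted.

```python
from collections import Counter
from itertools import chain

def rankdata(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Group consecutive equal values in the sorted order.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for k groups."""
    data = list(chain.from_iterable(groups))
    n = len(data)
    ranks = rankdata(data)
    h, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        h += r_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = Counter(data).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction if correction else h

# Hypothetical 1-5 expert ratings for four tools.
ratings = {
    "ChatGPT": [5, 4, 5, 4, 5],
    "Bard":    [4, 4, 5, 3, 4],
    "Bing":    [3, 2, 3, 3, 2],
    "SGE":     [2, 3, 2, 3, 2],
}
H = kruskal_h(*ratings.values())
```

In practice one would compare H against a chi-squared distribution with k-1 degrees of freedom (or use `scipy.stats.kruskal`, which does both steps).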
Asia-Pacific Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: 13(4), P. 100084 - 100084
Published: July 1, 2024
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and human language, enabling computers to understand, generate, and derive meaning from human language. NLP's potential applications in the medical field are extensive and vary from extracting data from Electronic Health Records (one of its most well-known and frequently exploited uses) to investigating relationships among genetics, biomarkers, drugs, and diseases for the proposal of new medications. NLP can also be useful in clinical decision support, patient monitoring, or image analysis. Despite this vast potential, real-world application is still limited due to various challenges and constraints, and its evolution predominantly continues within the research domain. However, with the increasingly widespread use of NLP, and particularly the availability of large language models such as ChatGPT, it is crucial that professionals are aware of the status, uses, and limitations of these technologies.
Healthcare, Journal Year: 2024, Volume and Issue: 12(15), P. 1548 - 1548
Published: Aug. 5, 2024
Background: In recent years, the integration of large language models (LLMs) into healthcare has emerged as a revolutionary approach to enhancing doctor–patient communication, particularly in the management of diseases such as prostate cancer.
Methods: Our paper evaluated the effectiveness of three prominent LLMs, ChatGPT (3.5), Gemini (Pro), and Co-Pilot (the free version), against the official Romanian Patient's Guide on prostate cancer. Employing a randomized and blinded method, our study engaged eight medical professionals to assess the responses of these LLMs based on accuracy, timeliness, comprehensiveness, and user-friendliness.
Results: The primary objective was to explore whether LLMs, when operating in Romanian, offer comparable or superior performance to the Guide, considering their potential to personalize communication and enhance informational accessibility for patients. Results indicated that ChatGPT generally provided more accurate and user-friendly information compared to the Guide.
Conclusions: Our findings suggest significant potential for LLMs to support doctor–patient communication by providing accessible information. However, variability in performance across the different LLMs underscores the need for tailored implementation strategies. We highlight the importance of integrating LLMs with a nuanced understanding of their capabilities and limitations to optimize their use in clinical settings.
Journal of Clinical Medicine, Journal Year: 2024, Volume and Issue: 13(17), P. 5101 - 5101
Published: Aug. 28, 2024
Large Language Models (LLMs) have the potential to revolutionize clinical medicine by enhancing healthcare access, diagnosis, surgical planning, and education. However, their utilization requires careful prompt engineering to mitigate challenges like hallucinations and biases. Proper use of LLMs involves understanding foundational concepts such as tokenization, embeddings, and attention mechanisms, alongside strategic prompting techniques to ensure accurate outputs. For innovative solutions, it is essential to maintain ongoing collaboration between AI technology developers and medical professionals. Ethical considerations, including data security and bias mitigation, are critical for clinical application. By leveraging LLMs as supplementary resources in research and education, we can enhance learning and support knowledge-based inquiries, ultimately advancing the quality and accessibility of care. Continued development is necessary to fully realize their potential for transforming healthcare.
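Of the foundational concepts named above, the attention mechanism is the most compact to illustrate. The sketch below implements scaled dot-product self-attention over toy embeddings; the dimensions and data are arbitrary, not any particular model's configuration.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores all keys,
    the scores are softmaxed into weights, and the output is the
    weight-averaged combination of the values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three toy token embeddings of dimension 4 attending to themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = attention(X, X, X)
```

Each row of `w` sums to 1, so every output token is a convex combination of the value vectors; in a real transformer, Q, K, and V are learned linear projections of the token embeddings.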
World Journal of Gastroenterology, Journal Year: 2025, Volume and Issue: 31(6)
Published: Jan. 10, 2025
Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated. To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD, two gastroenterology experts generated 15 questions reflecting common patient concerns in this comparative study. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed using a Likert scale focusing on accuracy, comprehensibility, and correlation. Simultaneously, patients were invited to rate the comprehensibility of the answers. Finally, a readability assessment was performed. Overall, the models achieved satisfactory levels of accuracy and completeness when answering the questions, although performance varied. All investigated models demonstrated strengths in providing basic information, such as the definition of IBD as well as its symptoms and diagnostic methods. Nevertheless, when dealing with more complex advice, such as medication side effects, dietary adjustments, and complication risks, answer quality was inconsistent between models. Notably, Claude-3-Opus performed better than the other models. LLMs have the potential to serve as educational tools for patients with IBD; however, there are discrepancies between models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
Cureus, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 27, 2025
Integrating artificial intelligence (AI) into oncology can revolutionize decision-making by providing accurate information. This study evaluates the performance of a ChatGPT-4o (OpenAI, San Francisco, CA) Oncology Expert in addressing open-ended clinical questions. Thirty-seven treatment-related questions on solid organ tumors were selected from a hematology-oncology textbook. Responses from the Oncology Expert and the textbook were anonymized and independently evaluated by two medical oncologists using a structured scoring system focused on accuracy and justification. Statistical analysis, including paired t-tests, was conducted to compare scores, and interrater reliability was assessed with Cohen's Kappa. The Oncology Expert achieved a significantly higher average score of 7.83 compared to the textbook's 7.0 (p < 0.01). In 10 cases, the Oncology Expert provided more updated answers, demonstrating its ability to integrate recent knowledge. In 26 cases, both sources were equally relevant, but the Oncology Expert's responses were clearer and easier to understand. Cohen's Kappa indicated almost perfect interrater agreement (κ = 0.93). Both sources included outdated information for bladder cancer treatment, underscoring the need for regular updates. The Oncology Expert shows significant potential as a tool offering precise, up-to-date, and user-friendly responses. It could transform clinical practice by enhancing efficiency, improving educational tools, and serving as a reliable adjunct in clinical workflows. However, its integration requires regular updates, expert validation, and a collaborative approach to ensure relevance in the rapidly evolving field of oncology.
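The interrater reliability measure reported above, Cohen's Kappa, is observed agreement between two raters corrected for the agreement expected by chance. A minimal stdlib sketch, with hypothetical binned ratings rather than the study's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label rates.
    expected = sum((ca[label] / n) * (cb[label] / n)
                   for label in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

# Hypothetical per-question verdicts from two raters,
# binned as adequate (1) vs inadequate (0).
rater_a = [1, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
```

Values near 1 indicate almost perfect agreement (as with the study's κ = 0.93), while values near 0 mean agreement no better than chance.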
Critical Care, Journal Year: 2025, Volume and Issue: 29(1)
Published: Feb. 10, 2025
Abstract
Background: Large language models (LLMs) show increasing potential for use in healthcare administrative support and clinical decision making. However, reporting on their performance in critical care medicine is lacking.
Methods: This study evaluated five LLMs (GPT-4o, GPT-4o-mini, GPT-3.5-turbo, Mistral Large 2407, and Llama 3.1 70B) on 1181 multiple choice questions (MCQs) from the gotheextramile.com database, a comprehensive database of questions at European Diploma in Intensive Care examination level. Their performance was compared to random guessing and to 350 human physicians on a 77-MCQ practice test. Metrics included accuracy, consistency, and domain-specific performance. Costs, as a proxy for energy consumption, were also analyzed.
Results: GPT-4o achieved the highest accuracy at 93.3%, followed by Llama 3.1 70B (87.5%), Mistral Large 2407 (87.9%), GPT-4o-mini (83.0%), and GPT-3.5-turbo (72.7%). Random guessing yielded 41.5% (p < 0.001). On the practice test, all models surpassed the human physicians, scoring 89.0%, 80.9%, 84.4%, 80.3%, and 66.5%, respectively, against 42.7% for random guessing (p < 0.001) and 61.9% for the physicians. In contrast to the other models (p < 0.001), GPT-3.5-turbo did not significantly outperform the physicians (p = 0.196). Despite high overall accuracy, all models gave consistently incorrect answers to some questions. The most expensive model was GPT-4o, costing over 25 times more than the least expensive model, GPT-4o-mini.
Conclusions: LLMs exhibit exceptional performance on critical care MCQs, with four of the five models outperforming human physicians on a European-level practice exam. GPT-4o led in accuracy but raised concerns about cost and energy consumption. Even so, every model produced consistently incorrect answers to some questions, highlighting the need for thorough and ongoing evaluations to guide responsible implementation in clinical settings.
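Comparisons of MCQ accuracy against a chance baseline, like those reported above, are often made with an exact binomial test. The sketch below uses hypothetical counts (70 of 77 correct versus the 42.7% chance rate), not the study's own analysis:

```python
from math import comb

def binom_sf(k, n, p):
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p).
    Small values mean the observed count is unlikely under chance."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical: a model answers 70 of 77 MCQs correctly; could that
# plausibly arise from guessing at a 42.7% chance rate?
p_value = binom_sf(70, 77, 0.427)
```

A p-value far below 0.001 here would match the kind of model-versus-chance significance the abstract reports; note that 41.5% and 42.7% are empirical chance rates because the questions have varying numbers of options.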