Abstract
Purpose
Large language models (LLMs) are pivotal in artificial intelligence, demonstrating advanced capabilities in natural language understanding and multimodal interactions, with significant potential in medical applications. This study explores the feasibility and efficacy of LLMs, specifically ChatGPT-4o and Claude 3-Opus, in classifying thyroid nodules using ultrasound images.
Methods
This study included 112 patients with a total of 116 thyroid nodules, comprising 75 benign and 41 malignant cases. Ultrasound images of these nodules were analyzed using ChatGPT-4o and Claude 3-Opus to diagnose the benign or malignant nature of the nodules. An independent evaluation by a junior radiologist was also conducted. Diagnostic performance was assessed using Cohen’s Kappa and receiver operating characteristic (ROC) curve analysis, referencing pathological diagnoses.
Results
ChatGPT-4o demonstrated poor agreement with pathological results (Kappa = 0.116), while Claude 3-Opus showed even lower agreement (Kappa = 0.034). The junior radiologist exhibited moderate agreement (Kappa = 0.450). ChatGPT-4o achieved an area under the ROC curve (AUC) of 57.0% (95% CI: 48.6–65.5%), slightly outperforming Claude 3-Opus (AUC of 52.0%, 95% CI: 43.2–60.9%). In contrast, the junior radiologist achieved a significantly higher AUC of 72.4% (95% CI: 63.7–81.1%). The unnecessary biopsy rates were 41.4% for ChatGPT-4o, 43.1% for Claude 3-Opus, and 12.1% for the junior radiologist.
Conclusion
While LLMs such as ChatGPT-4o and Claude 3-Opus show promise for future applications in medical imaging, their current use in clinical diagnostics should be approached cautiously due to their limited accuracy.
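As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of Cohen's Kappa for two label sequences (for example, a reader's calls against pathology). The function name `cohens_kappa` and the toy labels are assumptions made for this example; they are not the study's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both sources give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each source's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for illustration only (not data from the study).
pathology = ["malignant", "benign", "benign", "malignant", "benign", "benign"]
reader = ["malignant", "benign", "malignant", "benign", "benign", "benign"]
kappa = cohens_kappa(pathology, reader)
print(round(kappa, 3))
```

On a commonly used interpretation scale, values near 0 indicate agreement close to chance, which is how the abstract's "poor" and "moderate" labels should be read.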
Applied Sciences, 2024, 14(11), p. 4671
Published: May 29, 2024
Extractive summarization, a pivotal task in natural language processing, aims to distill essential content from lengthy documents efficiently. Traditional methods often struggle with capturing the nuanced interdependencies between different document elements, which is crucial to producing coherent and contextually rich summaries. This paper introduces the Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES), a novel framework designed to address these challenges through an advanced hypergraph-based approach. MCHES constructs a contextual hypergraph where sentences form nodes interconnected by multiple types of hyperedges, including semantic, narrative, and discourse hyperedges. This structure captures complex relationships and maintains narrative flow, enhancing semantic coherence across the summary. The framework incorporates a Contextual Homogenization Module (CHM), which harmonizes features from diverse hyperedges, and a Hypergraph Contextual Attention Module (HCA), which employs a dual-level attention mechanism to focus on the most salient information. The innovative Read-out Strategy selects the optimal set of sentences to compose the final summary, ensuring that the latter reflects the core themes and logical structure of the original text. Our extensive evaluations demonstrate significant improvements over existing methods. Specifically, MCHES achieves an average ROUGE-1 score of 44.756, a ROUGE-2 score of 24.963, and a ROUGE-L score of 42.477 on the CNN/DailyMail dataset, surpassing the best-performing baseline by 3.662%, 3.395%, and 2.166% respectively. Furthermore, MCHES achieves BERTScore values of 59.995 on CNN/DailyMail, 88.424 on XSum, and 89.285 on PubMed, indicating superior alignment with human-generated summaries. Additionally, MCHES achieves MoverScore values of 87.432, 60.549, and 59.739, highlighting its effectiveness in maintaining content movement and ordering. These results confirm that MCHES sets a new standard for extractive summarization by leveraging hypergraphs for better thematic fidelity.
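For readers unfamiliar with the ROUGE-N scores reported above, here is a simplified sketch of n-gram overlap scoring. It assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that published ROUGE implementations apply, so it will not reproduce the paper's figures. The function name `rouge_n` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two texts."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical sentences for illustration only.
summary = "the cat sat on the mat"
gold = "the cat lay on the mat"
r1 = rouge_n(summary, gold, n=1)
r2 = rouge_n(summary, gold, n=2)
print(r1, r2)
```

ROUGE-1 counts shared unigrams and ROUGE-2 shared bigrams, which is why ROUGE-2 scores are typically lower for the same pair of texts.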
Software, 2024, 3(1), pp. 62-80
Published: Feb. 29, 2024
This paper presents a pioneering methodology for refining product recommender systems, introducing a synergistic integration of unsupervised models, namely K-means clustering, content-based filtering (CBF), and hierarchical clustering, with the cutting-edge GPT-4 large language model (LLM). Its innovation lies in utilizing GPT-4 for model evaluation, harnessing its advanced natural language understanding capabilities to enhance the precision and relevance of product recommendations. A Flask-based API simplifies implementation for e-commerce owners, allowing seamless training and evaluation of the models using CSV-formatted product data. The unique aspect of this approach lies in its ability to empower e-commerce with sophisticated unsupervised recommender system algorithms, while the GPT model significantly contributes to refining the semantic context of product features, resulting in a more personalized and effective product recommendation system. The experimental results underscore the superiority of this integrated framework, marking a significant advancement in the field of recommender systems and providing businesses with an efficient and scalable solution to optimize their product recommendations.
Autistic individuals commonly encounter challenges in communicating with others, which can lead to difficulties in obtaining and maintaining jobs. Thus, job training programs have emphasized training the communication skills of autistic individuals to improve their employability. Hence, we developed a virtual reality application that features avatars as chatbots powered by Large Language Models (LLMs), such as GPT-3.5 Turbo, and employs speech-based interactions with users. The use of LLM-driven chatbots allows job coaches to create training scenarios for trainees using text prompts. We conducted a preliminary study with three trainees and two coaches to gather early-stage feedback on the application's usability and user experience. In the study, the trainee participants were asked to interact with the application in scenarios involving customer interactions. Our findings indicate that our application shows promise in training communication. Furthermore, we discuss its usability and user experience aspects from the trainees' and coaches' perspectives.
Natural Language Processing Journal, 2024, 8, p. 100083
Published: June 9, 2024
Document-based Question-Answering (QA) tasks are crucial for precise information retrieval. While some existing work focuses on evaluating large language models' (LLMs) performance on retrieving and answering questions from documents, the LLMs' performance on QA types that require exact answer selection from predefined options and numerical extraction is yet to be fully assessed. In this paper, we specifically focus on this underexplored context and conduct an empirical analysis of LLMs (GPT-4 and GPT-3.5) on question types including single-choice, yes–no, multiple-choice, and number extraction questions from documents. We use the CogTale dataset for evaluation, which provides human expert-tagged responses, offering a robust benchmark for precision and factual grounding. We found that LLMs, particularly GPT-4, can precisely answer many single-choice and yes–no questions given relevant context, demonstrating their efficacy in information retrieval tasks. However, their performance diminishes when confronted with multiple-choice and number extraction formats, lowering the overall performance of the models on this task, indicating that these models may not yet be sufficiently reliable for the task. This limits the applications of LLMs in settings demanding inference, such as meta-analysis tasks. Our work offers a framework for ongoing evaluation, ensuring that LLM applications for document analysis continue to meet evolving standards.
Molecular Therapy - Nucleic Acids, 2024, 35(3), p. 102255
Published: June 15, 2024
After ChatGPT was released, large language models (LLMs) became more popular. Academicians use ChatGPT or LLM models for different purposes, and the use of ChatGPT or LLM is increasing from medical science to diversified areas. Recently, the multimodal LLM (MLLM) has also become popular. Therefore, we comprehensively illustrate the LLM and MLLM for a complete understanding. We aim for simple and extended reviews of LLMs and MLLMs for a broad category of readers, such as researchers, students in different fields, and other academicians. The review article illustrates the LLM and MLLM models, their working principles, and applications in different fields. First, we demonstrate the technical concept of LLMs, the working principle, the Black Box, and the evolution of LLMs. To explain the working principle, we discuss the tokenization process, token representation, and token relationships. We also extensively discuss the application of LLMs in biological macromolecules, medical science, and MLLMs. Finally, we illustrate the limitations, challenges, and future prospects of LLMs. The review acts as a booster dose for clinicians, a primer for molecular biologists, and a catalyst for scientists, and also benefits academicians.
Journal of the American Medical Informatics Association, 2024, 31(11), pp. 2622-2631
Published: Aug. 29, 2024
Abstract
Objective
In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPTs) and Llama present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to explore the performance of LLMs in extracting acupoint-related location relations and to assess the impact of fine-tuning on GPT’s performance.
Materials and Methods
We utilized the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard) as our corpus, which consists of descriptions of 361 acupoints. Five types of relations (“direction_of”, “distance_of”, “part_of”, “near_acupoint”, and “located_near”) (n = 3174) between acupoints were annotated. Four models were compared: pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, as well as pretrained Llama 3. Performance metrics included micro-average exact match precision, recall, and F1 scores.
Results
Our results demonstrate that fine-tuned GPT-3.5 consistently outperformed other models in F1 scores across all relation types. Overall, it achieved the highest micro-average F1 score of 0.92.
Discussion
The superior performance of the fine-tuned GPT-3.5 model, as shown by its F1 scores, underscores the importance of domain-specific fine-tuning in enhancing relation extraction capabilities for acupuncture-related tasks. In light of the findings from this study, it offers valuable insights into leveraging LLMs for developing clinical decision support and creating educational modules in acupuncture.
Conclusion
This study underscores the effectiveness of LLMs like GPT and Llama in extracting relations related to acupoint locations, with implications for accurately modeling acupuncture knowledge and promoting standard implementation in acupuncture training and practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in natural language processing.
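The micro-average exact-match metrics named in the Methods can be sketched as follows. This is an illustrative assumption about how such scoring is commonly done, not the authors' evaluation script: a predicted relation counts as correct only if the whole (head, relation, tail) triple matches a gold triple exactly, and counts are pooled over all descriptions before computing the scores. The function name `micro_prf`, the identifiers `A1` and `A2`, and all triples are hypothetical; only the relation type names come from the abstract.

```python
def micro_prf(gold_by_doc, pred_by_doc):
    """Micro-averaged exact-match precision, recall, and F1 over relation triples."""
    tp = fp = fn = 0
    # Pool true/false positives and false negatives across all documents.
    for doc_id in gold_by_doc.keys() | pred_by_doc.keys():
        gold = set(gold_by_doc.get(doc_id, ()))
        pred = set(pred_by_doc.get(doc_id, ()))
        tp += len(gold & pred)   # triples predicted exactly as annotated
        fp += len(pred - gold)   # predicted triples with no exact gold match
        fn += len(gold - pred)   # annotated triples the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical annotations for two made-up acupoint descriptions.
gold = {
    "A1": [("A1", "part_of", "region_x"), ("A1", "near_acupoint", "A2")],
    "A2": [("A2", "part_of", "region_y"), ("A2", "distance_of", "landmark_z")],
}
pred = {
    "A1": [("A1", "part_of", "region_x"), ("A1", "near_acupoint", "A3")],
    "A2": [("A2", "part_of", "region_y")],
}
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Because counts are pooled before averaging, frequent relation types weigh more heavily in a micro-average than rare ones.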
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies, 2024, 8(2), pp. 1-36
Published: May 13, 2024
Large language models such as GPT-3 and ChatGPT can mimic human-to-human conversation with unprecedented fidelity, which enables many applications such as conversational agents for education and non-player characters in video games. In this work, we investigate the underlying personality structure that a GPT-3-based chatbot expresses during conversations with a human. We conducted a user study to collect 147 chatbot personality descriptors from 86 participants while they interacted with the GPT-3-based chatbot for three weeks. Then, 425 new participants rated the 147 personality descriptors in an online survey. We conducted an exploratory factor analysis on the collected descriptors and show that, though overlapping, human personality models do not fully transfer to the chatbot's personality as perceived by humans. We also show that the perceived personality is significantly different from that of virtual personal assistants, where users focus rather on serviceability and functionality. We discuss the implications of ever-evolving large language models and the change they affect in users' perception of agent personalities.
Journal of King Saud University - Computer and Information Sciences, 2024, 36(8), p. 102178
Published: Aug. 30, 2024
In the age of information overload, the ability to distill essential content from extensive texts is invaluable. DeepExtract introduces an advanced framework for extractive summarization, utilizing the groundbreaking capabilities of GPT-4 along with innovative hierarchical positional encoding to redefine information extraction. This manuscript details the development of DeepExtract, which integrates semantic-driven techniques to analyze and summarize complex documents effectively. The framework is structured around a novel hierarchical tree construction that categorizes sentences and sections not just by their physical placement within the text, but by their contextual and thematic significance, leveraging the dynamic embeddings generated by GPT-4. We introduce a multi-faceted scoring system that evaluates sentences based on coherence, relevance, and novelty, ensuring that summaries are not only concise but rich in content. Further, DeepExtract employs optimized semantic clustering to group thematic elements, which enhances the representativeness of the summaries. This paper demonstrates through comprehensive evaluations that DeepExtract significantly outperforms existing extractive summarization models in terms of accuracy and efficiency, making it a potent tool for academic, professional, and general use. We conclude with a discussion on the practical applications of DeepExtract in various domains, highlighting its adaptability and potential in navigating the vast expanses of digital text.