Qualitative Research in Psychology,
Journal year: 2024, Issue: unknown, pp. 1-31
Published: Nov. 30, 2024
This paper explores the application of large language models (LLMs), particularly GPT-4, as innovative tools in qualitative psychological research. Although LLMs are actively used across various domains, their potential in qualitative studies remains underexplored. The study demonstrates, through a series of simulations, how GPT-4 can assist in planning and conducting exploratory studies, performing narrative analysis, and evaluating different properties of texts in directed and conventional content analysis. The findings reveal that LLMs not only significantly reduce the time required for data analysis but also enhance the trustworthiness of results. The paper proposes several methodological points, provides use cases and examples, and summarises best practices for integrating LLMs into qualitative studies.
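As an illustration of the kind of workflow this abstract describes, the following is a minimal sketch of LLM-assisted inductive coding for conventional content analysis, assuming the OpenAI Python SDK (v1.x) and an API key in the environment; the prompt wording and the `code_excerpt` helper are illustrative, not the authors' protocol.

```python
# Minimal sketch of LLM-assisted conventional content analysis.
# Assumptions: OpenAI Python SDK v1.x with OPENAI_API_KEY set; the coding
# categories and prompt are illustrative, not the paper's actual protocol.
from openai import OpenAI

client = OpenAI()

CODING_PROMPT = (
    "You are assisting with conventional content analysis. "
    "Read the interview excerpt and return 2-4 short inductive codes, "
    "one per line, grounded in the participant's own words."
)

def code_excerpt(excerpt: str, model: str = "gpt-4") -> list[str]:
    """Ask the model for inductive codes for one excerpt."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # near-deterministic output eases auditing by human coders
        messages=[
            {"role": "system", "content": CODING_PROMPT},
            {"role": "user", "content": excerpt},
        ],
    )
    text = response.choices[0].message.content or ""
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    codes = code_excerpt("I kept rereading the message because I wasn't sure it was meant for me.")
    print(codes)
```

In practice the model's codes would be checked against a human coder's, which is one way the paper's claim about trustworthiness could be operationalized.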
Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas.

Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty.

Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024.

Study Selection: Studies evaluating 1 or more LLMs in health care.

Data Extraction and Synthesis: Three independent reviewers categorized studies via keyword searches based on the data used, the health care and NLP and NLU tasks examined, the dimensions of evaluation, and the medical specialty.

Results: Of the 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge, such as answering licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks, such as assigning billing codes (0.2%) and writing prescriptions, were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by specialty area, most studies were in generic applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented.

Conclusions and Relevance: Existing evaluations of LLMs mostly focus on medical examinations, without consideration of real patient care data. Dimensions beyond accuracy received limited attention. Future evaluations should adopt standardized metrics, use real clinical data, and broaden their scope to include a wider range of specialties.
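The keyword-search categorization described under Data Extraction and Synthesis can be sketched in a few lines of plain Python; the keyword lists below are illustrative stand-ins, not the reviewers' actual lexicon.

```python
# Sketch of keyword-based categorization of study abstracts into health care
# tasks. The keyword lists are illustrative, not the reviewers' actual lexicon.
TASK_KEYWORDS = {
    "assessing knowledge": ["licensing exam", "board exam", "usmle"],
    "making diagnoses": ["diagnos"],  # matches diagnose/diagnosis/diagnostic
    "assigning billing codes": ["billing code", "icd-10", "cpt"],
}

def categorize(abstract: str) -> list[str]:
    """Return every task whose keywords appear in the abstract."""
    text = abstract.lower()
    return [task for task, words in TASK_KEYWORDS.items()
            if any(w in text for w in words)]

print(categorize("We tested GPT-4 on USMLE questions and diagnosis of rare disease vignettes."))
# -> ['assessing knowledge', 'making diagnoses']
```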
Artificial Intelligence Review,
Journal year: 2025, Issue: 58(3)
Published: Jan. 6, 2025
Sentiment analysis has emerged as a prominent research domain within the realm of natural language processing, garnering increasing attention and a growing body of literature. While numerous literature reviews have examined sentiment analysis techniques, methods, topics, and applications, there remains a gap in the literature concerning thematic trends and methodologies, particularly in the context of Chinese text. This study addresses this gap by presenting a comprehensive survey dedicated to the progression of subjects and methods in sentiment analysis. Employing a framework that combines keyword co-occurrence analysis with a sophisticated community detection algorithm, it offers a novel perspective on the landscape of sentiment analysis research. By tracing the interplay between emerging topics and methods over the past two decades, our study not only facilitates a comparative analysis of their correlations but also illuminates evolving research patterns, identifying significant hotspots over time for Chinese text sentiment analysis. This invaluable insight provides a roadmap for researchers seeking to navigate the intricate terrain of sentiment analysis for the Chinese language. Moreover, the paper extends beyond the academic realm, offering practical insights into research themes while pinpointing avenues for future exploration, technical limitations, and research directions.
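As a rough illustration of the survey's framework, the sketch below builds a keyword co-occurrence graph and partitions it with a modularity-based community detection algorithm; networkx and the toy keyword lists are assumptions, and the survey's exact algorithm may differ.

```python
# Sketch of keyword co-occurrence + community detection for mapping research
# themes. The toy keyword lists stand in for keywords extracted from papers.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

papers = [
    ["sentiment analysis", "weibo", "lexicon"],
    ["sentiment analysis", "bert", "chinese text"],
    ["bert", "chinese text", "aspect-based"],
    ["weibo", "lexicon", "emotion"],
]

# Build the co-occurrence graph: keywords co-listed on a paper share an edge,
# weighted by how often the pair co-occurs across the corpus.
G = nx.Graph()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        w = G.edges[a, b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

# Detect thematic communities by modularity maximization (one possible choice).
for i, community in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"theme {i}: {sorted(community)}")
```

Tracking how these communities change when the corpus is sliced by publication year is one way the "hotspots over time" analysis could be realized.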
JMIR Mental Health,
Journal year: 2024, Issue: 11, pp. e57400
Published: Sep. 3, 2024
Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate.

Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the challenges present and the prospects for their use.

Methods: Adhering to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR illness OR disorder OR psychiatry) AND (large language models). The study included articles published between January 1, 2017, and April 30, 2024, and excluded articles in languages other than English.

Results: In total, 40 articles were evaluated, including 15 (38%) on the detection of mental health conditions and suicidal ideation through text analysis, 7 (18%) on LLMs as conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the risks associated with LLMs might surpass their benefits. These risks include inconsistencies in generated text, the production of hallucinations, and the absence of a comprehensive, benchmarked ethical framework.

Conclusions: This systematic review examines the uses of LLMs in mental health alongside their inherent risks. The review identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, limited interpretability due to the “black box” nature of LLMs, and ongoing ethical dilemmas. These include the absence of a clear, benchmarked ethical framework; data privacy issues; and the risk of overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health care. However, their rapid development underscores their potential as valuable aids, emphasizing the need for continued research in this area.

Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617
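For concreteness, the stated inclusion criteria (boolean keyword query, date window, English only) could be applied to retrieved records along the following lines; the record fields and the `include` helper are hypothetical, not the review's actual screening code.

```python
# Sketch of the review's eligibility screen: keyword match, date window,
# English only. The record schema is an assumption for illustration.
from datetime import date

KEYWORDS_A = ("mental health", "illness", "disorder", "psychiatry")
KEYWORDS_B = ("large language model",)
WINDOW = (date(2017, 1, 1), date(2024, 4, 30))

def include(record: dict) -> bool:
    """Apply the boolean query and eligibility criteria to one record."""
    text = (record["title"] + " " + record["abstract"]).lower()
    return (
        any(k in text for k in KEYWORDS_A)      # (mental health OR illness OR ...)
        and any(k in text for k in KEYWORDS_B)  # AND (large language models)
        and WINDOW[0] <= record["published"] <= WINDOW[1]
        and record["language"] == "en"
    )

print(include({
    "title": "Large language models for suicidal ideation detection",
    "abstract": "We screen social media posts for mental health risk.",
    "published": date(2023, 6, 1),
    "language": "en",
}))  # -> True
```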
medRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: April 16, 2024
Abstract

Importance: Large Language Models (LLMs) can assist in a wide range of healthcare-related activities. Current approaches to evaluating LLMs make it difficult to identify the most impactful LLM application areas.

Objective: To summarize the current evaluation of LLMs in healthcare in terms of 5 components: data type, healthcare task, Natural Language Processing (NLP)/Natural Language Understanding (NLU) task, dimension of evaluation, and medical specialty.

Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between 01-01-2022 and 02-19-2024.

Study Selection: Studies evaluating one or more LLMs in healthcare.

Data Extraction and Synthesis: Three independent reviewers categorized the 519 studies based on the data used, the healthcare tasks (the what) and NLP/NLU tasks (the how) examined, the dimension(s) of evaluation, and the medical specialty studied.

Results: Only 5% of the reviewed studies utilized real patient care data for LLM evaluation. The most popular healthcare tasks were assessing medical knowledge (e.g., answering licensing exam questions, 44.5%), followed by making diagnoses (19.5%) and educating patients (17.7%). Administrative tasks such as assigning provider billing codes (0.2%), writing prescriptions, generating clinical referrals (0.6%), and notetaking (0.8%) were less studied. For NLP/NLU tasks, the vast majority of studies examined question answering (84.2%). Other tasks such as summarization (8.9%), conversational dialogue (3.3%), and translation (3.1%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias and toxicity (15.8%), robustness (14.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by specialty area, most studies were in internal medicine (42%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented.

Conclusions and Relevance: Existing evaluations of LLMs in healthcare mostly focused on medical exams, without consideration of real patient care data. Dimensions like fairness, bias, toxicity, and robustness received limited attention. To draw meaningful conclusions and improve LLM adoption, future studies need to establish a standardized set of applications and evaluation dimensions, perform evaluations using data from routine care, and broaden testing to include administrative tasks as well as multiple medical specialties.

Key Points

Question: How are large language models for healthcare applications currently evaluated?

Findings: Evaluations rarely use real patient care data, and administrative tasks are understudied. NLP/NLU tasks beyond question answering, such as summarization and conversational dialogue, are rarely explored. Accuracy is the predominant dimension of evaluation, while other assessments are neglected. Evaluations in specialized fields are rare.

Meaning: Current LLM evaluations remain shallow and fragmented. To draw concrete insights about their performance, future evaluations should use real patient care data across a broad range of medical specialties and evaluation dimensions.
International Journal of Environmental Research and Public Health,
Journal year: 2024, Issue: 21(7), pp. 910-910
Published: July 12, 2024
(1) Background: Artificial intelligence (AI) has flourished in recent years. More specifically, generative AI has had broad applications in many disciplines. While mental illness is on the rise, AI has proven valuable in aiding the diagnosis and treatment of mental disorders. However, there is little to no research about precisely how much public interest there is in this technology. (2) Methods: We performed a Google Trends search for “AI mental health” and compared the relative search volume (RSV) indices with those for “AI”, “AI Depression”, and “AI anxiety”. This time series study employed Box–Jenkins modeling to forecast long-term interest through the end of 2024. (3) Results: Within the United States, search interest steadily increased throughout 2023, with some anomalies due to media reporting. Through predictive models, we found that this trend is predicted to increase by 114% through the year 2024, with public interest being on the rise. (4) Conclusions: According to our study, public awareness of AI has drastically increased, especially regarding its application to mental health. This demonstrates increasing public interest in mental health AI, making advocacy and education about this technology of paramount importance.
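The Box–Jenkins step can be sketched with a standard ARIMA implementation; the toy RSV series and the (1, 1, 1) order below are illustrative assumptions, not the study's fitted model.

```python
# Sketch of Box-Jenkins (ARIMA) forecasting of Google Trends relative search
# volume. Assumes statsmodels and pandas; values and order are illustrative.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Monthly relative search volume for a query (toy values for 2023).
rsv = pd.Series(
    [12, 14, 13, 18, 22, 25, 24, 30, 33, 38, 41, 45],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),
)

model = ARIMA(rsv, order=(1, 1, 1)).fit()  # identify/estimate per Box-Jenkins
forecast = model.forecast(steps=12)         # project through the end of 2024
print(forecast.round(1))
```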
Abstract

Background: The use of artificial intelligence in the field of health sciences is becoming widespread. It is known that patients benefit from artificial intelligence applications on various health issues, especially after the pandemic period. One of the most important issues in this regard is the accuracy of the information provided by these applications. Objective: The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), which is one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the answers given by the application with that of the FDA. Methods: The questions were directed to ChatGPT-4 on May 8th and 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The information on the FDA webpage was also recorded. The responses were analyzed for similarity in terms of “Main Idea”, “Quality Analysis”, “Common Ideas”, and “Inconsistent Ideas” between ChatGPT-4’s and the FDA’s responses. Results: The responses given at one-week intervals were similar. In comparison with the FDA guidance, ChatGPT-4 gave similar answers to the questions. However, although there were some similarities in the general aspects of the recommendation regarding the amalgam removal question, the two texts are not the same, and they offered different perspectives on the replacement of fillings. Conclusions: The findings indicate that ChatGPT-4, an artificial intelligence based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, we believe that numerous studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
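As one simple, reproducible proxy for the word-level comparison (the study itself used ChatGPT to perform the comparison, not this metric), TF-IDF cosine similarity between an application's answer and the FDA text could be computed as follows.

```python
# Sketch of a word-level similarity check between an answer and the FDA text.
# TF-IDF cosine similarity is a stand-in metric, not the study's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chatgpt_answer = "Dental amalgam is a durable filling material containing elemental mercury."
fda_text = "Dental amalgam is a filling material that contains elemental mercury."

tfidf = TfidfVectorizer().fit_transform([chatgpt_answer, fda_text])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"similarity: {score:.2f}")  # 1.0 = identical wording, 0.0 = no overlap
```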