Conservation Biology, Journal Year: 2025, Volume and Issue: 39(2), Published: April 1, 2025

Abstract
Addressing global environmental conservation problems requires rapidly translating natural and social science evidence into policy-relevant information. Yet exponential increases in scientific production, combined with disciplinary differences in how research is reported, make interdisciplinary syntheses especially challenging. Ongoing developments in natural language processing (NLP), such as large language models, machine learning (ML), and data mining, hold the promise of accelerating cross-disciplinary synthesis of primary research. The evolution of ML, NLP, and artificial intelligence (AI) systems and computational infrastructure provides new approaches to accelerate all stages of synthesis science. To show how NLP, ML, and AI can help automate and scale synthesis science, we describe methods that automate querying of the literature, process unstructured bodies of textual evidence, and extract parameters of interest from individual studies. Automation can help translate evidence to policy and other agendas by categorizing and labeling it at scale, yet there are major unanswered questions about how to use hybrid AI–expert systems ethically and effectively in conservation.
Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(34), Published: Aug. 12, 2024

The social and behavioral sciences have been increasingly using automated text analysis to measure psychological constructs in text. We explore whether GPT, the large-language model (LLM) underlying the AI chatbot ChatGPT, can be used as a tool for automated psychological text analysis in several languages. Across 15 datasets (n = 47,925 manually annotated tweets and news headlines), we tested whether different versions of GPT (3.5 Turbo, 4, and 4 Turbo) can accurately detect psychological constructs (sentiment, discrete emotions, offensiveness, and moral foundations) across 12 languages. We found that GPT (r = 0.59 to 0.77) performed much better than English-language dictionary analysis (r = 0.20 to 0.30) at detecting psychological constructs as judged by manual annotators. GPT performed nearly as well as, and sometimes better than, top-performing fine-tuned machine learning models. Moreover, GPT's performance improved across successive model versions, particularly for lesser-spoken languages, and became less expensive. Overall, GPT may be superior to many existing methods of automated text analysis, since it achieves relatively high accuracy across many languages, requires no training data, and is easy to use with simple prompts (e.g., "is this negative?") and little coding experience. We provide sample code and a video tutorial for analyzing text with the GPT application programming interface. We argue that GPT and other LLMs can help democratize automated text analysis by making advanced natural language processing capabilities more accessible, and may help facilitate cross-linguistic research with understudied languages.
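The zero-shot, simple-prompt workflow this abstract describes can be sketched roughly as below. The function names and the stubbed model call are illustrative assumptions, not the authors' published sample code; in practice `call_model` would wrap a real LLM API request.

```python
# Sketch of simple-prompt sentiment labeling: build a minimal prompt,
# send it to an LLM, and normalize the free-text reply to a label.
# The stub standing in for the API call is purely illustrative.

VALID_LABELS = {"positive", "neutral", "negative"}

def build_prompt(text: str) -> str:
    """A minimal prompt in the spirit of 'is this negative?'."""
    return (
        "Is the sentiment of this text positive, neutral, or negative? "
        f"Answer with one word only.\n\nText: {text}"
    )

def parse_label(reply: str) -> str:
    """Normalize a model reply ('Negative.', ' NEUTRAL ') to a label."""
    word = reply.strip().lower().rstrip(".")
    return word if word in VALID_LABELS else "unknown"

def classify(text: str, call_model) -> str:
    """call_model is any callable that sends a prompt to an LLM API."""
    return parse_label(call_model(build_prompt(text)))

# With a stub in place of the API:
fake_model = lambda prompt: "Negative."
print(classify("This is terrible news.", fake_model))  # -> negative
```

Normalizing replies matters because chat models often answer with extra punctuation or casing, and an unparseable reply is better flagged as "unknown" than silently miscounted.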
JMIR Medical Education, Journal Year: 2023, Volume and Issue: 9, P. e50514 - e50514, Published: Sept. 5, 2023

Large language model (LLM)-based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and text generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLM models with that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools.
Psychology Research and Behavior Management, Journal Year: 2024, Volume and Issue: Volume 17, P. 1139 - 1150, Published: March 1, 2024

Textual data analysis has become a popular method for examining complex human behavior in various fields, including psychology, psychiatry, sociology, computer science, data mining, forensic sciences, and communication studies. However, identifying the most relevant textual parameters for such analysis is still a challenge.
Behavior Research Methods, Journal Year: 2024, Volume and Issue: 56(8), P. 8214 - 8237, Published: Aug. 15, 2024

Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning for prediction, and generation of open-ended responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git. Finally, the tutorial discusses challenges faced with (open-source) LLMs related to interpretability and safety, and offers a perspective on future research at the intersection of language modeling and behavioral science.
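One of the applications the tutorial names, feature extraction, amounts to turning a text into a fixed-size vector by pooling per-token embeddings. The toy sketch below mean-pools invented token vectors to show the pooling step only; in the Hugging Face ecosystem the vectors would instead come from a transformer model, which this sketch deliberately omits.

```python
# Toy sketch of the pooling step in feature extraction: average a
# sequence of per-token embedding vectors into one sentence vector.
# The token vectors below are made-up numbers, not model outputs.

def mean_pool(token_vectors):
    """Average a list of equal-length token embedding vectors."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

# Three "token embeddings" of dimension 4 (invented numbers):
tokens = [[1.0, 0.0, 2.0, 4.0],
          [3.0, 2.0, 0.0, 0.0],
          [2.0, 4.0, 1.0, 2.0]]
print(mean_pool(tokens))  # -> [2.0, 2.0, 1.0, 2.0]
```

The resulting fixed-size vector is what downstream models (e.g., a classifier predicting a behavioral outcome) consume, regardless of how long the original text was.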
Journal of Hospitality and Tourism Technology, Journal Year: 2025, Volume and Issue: unknown, Published: March 12, 2025

Purpose
This study aims to explore the application of ChatGPT to analyze hotel guest satisfaction from online reviews. As guest feedback plays a critical role in consumer decision-making in the hospitality industry, this research evaluates the accuracy and reliability of ChatGPT's ratings compared with those of human raters and classic supervised machine learning classification techniques.

Design/methodology/approach
Using TripAdvisor reviews of five-star hotels, the authors use a structured two-phase approach to assess both inter- and intra-rater reliability.

Findings
The results highlight distinct differences in rating behavior between artificial intelligence (AI) and human judges, with AI showing a tendency toward more moderate ratings. In addition, the authors observe a slight tendency for guests to overrate their experiences, supporting literature on the subjective nature of guest ratings. Despite these variations, ChatGPT shows significant agreement with human ratings, especially when minor discrepancies are accounted for, suggesting its utility as an analysis tool in the industry. The paper highlights ChatGPT's ability to process and evaluate textual data and discusses the implications of using AI to improve review analysis processes and management. The authors advocate the incorporation of AI tools into customer feedback systems to augment human analysis and suggest future research to refine AI models for practical applications.

Originality/value
This study advances the understanding of AI's role in hospitality management by demonstrating its capability for analyzing guest satisfaction through online reviews and by providing a methodological framework for assessing AI-generated content.
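The kind of "agreement, especially when minor discrepancies are accounted for" that the findings describe can be sketched as a paired-rating comparison with and without a tolerance band. This is an illustrative sketch only; the ratings below are invented and the study's actual reliability statistics may differ.

```python
# Illustrative sketch (not the study's code): compare AI and human
# star ratings using exact agreement and a +/-1-star tolerance.

def agreement(a, b, tolerance=0):
    """Share of paired ratings that differ by at most `tolerance`."""
    assert len(a) == len(b) and a
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return hits / len(a)

human = [5, 4, 5, 3, 4, 5, 2, 4]   # hypothetical human star ratings
ai    = [4, 4, 4, 3, 4, 4, 3, 4]   # hypothetical, more moderate AI ratings

print(agreement(human, ai))               # exact match rate -> 0.5
print(agreement(human, ai, tolerance=1))  # within one star  -> 1.0
```

The gap between the two rates illustrates the paper's point: an AI judge that shaves 5s down to 4s looks unreliable under exact matching but highly consistent once one-star discrepancies are tolerated.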
Behavior Research Methods, Journal Year: 2024, Volume and Issue: 56(6), P. 6082 - 6100, Published: Jan. 23, 2024

Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ systematically from human-generated norms. I also perform "substitution analyses", which demonstrate that replacing human judgments with LLM-generated judgments in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns over data contamination, the choice of LLM, external validity, and construct validity. Additionally, all of GPT-4's judgments (over 30,000 in total) are made available online for further analysis.
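The core comparison in such norm-augmentation work is a correlation between model-generated and human "gold standard" ratings. A minimal sketch, with invented concreteness ratings standing in for real norm data:

```python
# Sketch of correlating LLM-generated lexical norms with human norms.
# The rating values below are made up for illustration only.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two rating lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-5 concreteness ratings for five words:
human_concreteness = [4.8, 1.9, 3.2, 4.5, 2.1]
model_concreteness = [4.6, 2.2, 3.0, 4.9, 1.8]

print(round(pearson_r(human_concreteness, model_concreteness), 3))
```

A high positive r on such pairs is what "positively correlated with human judgments" means operationally; the substitution analyses then ask whether swapping one column for the other changes downstream regression results.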
Smart Cities, Journal Year: 2024, Volume and Issue: 7(5), P. 2422 - 2465, Published: Sept. 1, 2024

Road traffic crashes (RTCs) are a global public health issue, with traditional analysis methods often hindered by delays and incomplete data. Leveraging social media for real-time road safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for multitask modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related content and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the classification tasks, surpassing GPT-4o mini's 64% and XGBoost's 83.5%. In the information retrieval tasks, the fine-tuned model achieved a BLEU-4 score of 0.22, a ROUGE-1 of 0.78, and a word error rate (WER) of 0.30, significantly outperforming GPT-4o mini's 0.0674, 0.2992, and 2.0715, respectively. These results demonstrate the efficacy of the framework in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance road safety analysis.
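Of the generation metrics the abstract reports, word error rate (WER) is the simplest to make concrete: word-level edit distance divided by reference length. A minimal sketch (the example strings are invented, not from the study's data):

```python
# Minimal word error rate (WER): Levenshtein distance over the words of
# the reference, normalized by reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of a 5-word reference:
print(wer("crash on the m1 northbound", "crash on m1 northbound"))  # -> 0.2
```

Note that WER is normalized by the reference length, not the hypothesis length, so it can exceed 1.0 when the model over-generates, which is how a baseline can score a WER of 2.0715 as reported above.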
Journal of Medical Internet Research, Journal Year: 2023, Volume and Issue: unknown, Published: July 19, 2023

Background: Sentiment analysis is a significant yet difficult task in natural language processing. The linguistic peculiarities of Cantonese, including its high similarity with Standard Chinese, grammatical and lexical uniqueness, colloquialism, and multilingualism, make it different from other languages and pose additional challenges to sentiment analysis. Recent advances in large language models such as ChatGPT offer potential viable solutions.

Objective: This study investigated the efficacy of GPT-3.5 and GPT-4 for Cantonese sentiment analysis in the context of web-based counseling and compared their performance with mainstream methods, including lexicon-based methods and machine learning approaches.

Methods: We analyzed transcripts from a web-based, text-based counseling service in Hong Kong, comprising a total of 131 individual sessions with 6169 messages between counselors and help-seekers. First, a codebook was developed for human annotation. A simple prompt ("Is this text positive, neutral, or negative? Respond with the label only.") was then given to GPT-3.5 and GPT-4 to classify each message's sentiment. Their performance was compared with a lexicon-based method and 3 state-of-the-art machine learning models: linear regression, support vector machines, and long short-term memory neural networks.

Results: Our findings revealed ChatGPT's remarkable accuracy in sentiment classification, with GPT-3.5 and GPT-4, respectively, achieving 92.1% (5682/6169) and 95.3% (5880/6169) accuracy in identifying positive, neutral, and negative sentiment, thereby outperforming the traditional lexicon-based method, which had an accuracy of 37.2% (2295/6169), and the machine learning models, whose accuracies ranged from 66% (4072/6169) to 70.9% (4374/6169).

Conclusions: Among the many techniques compared, ChatGPT demonstrates superior accuracy and emerges as a promising tool for Cantonese sentiment analysis. The study also highlights its applicability in real-world scenarios, such as monitoring the quality of counseling services and detecting message-level sentiments in vivo. The insights derived pave the way for further exploration into the capabilities of large language models in underresourced languages and specialized domains like psychotherapy.
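The evaluation step in this kind of study reduces to comparing model labels against human annotations and reporting the match rate. A minimal sketch with invented labels, plus a check that the reported 5682/6169 and 5880/6169 figures do round to the stated percentages:

```python
# Sketch of label-vs-annotation evaluation. The four example labels are
# invented; only the final two lines use figures from the abstract.

def accuracy(preds, golds):
    """Fraction of predictions that match the human annotations."""
    assert len(preds) == len(golds) and preds
    return sum(p == g for p, g in zip(preds, golds)) / len(preds)

gold  = ["negative", "neutral", "positive", "negative"]
model = ["negative", "neutral", "negative", "negative"]
print(accuracy(model, gold))        # -> 0.75

print(round(100 * 5682 / 6169, 1))  # -> 92.1 (GPT-3.5)
print(round(100 * 5880 / 6169, 1))  # -> 95.3 (GPT-4)
```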