bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown. Published: Jan. 15, 2024
The body of ecological literature, which informs much of our knowledge of the global loss of biodiversity, has been experiencing rapid growth in recent decades. The increasing difficulty of synthesising this literature manually has simultaneously resulted in a growing demand for automated text mining methods. Within the domain of deep learning, large language models (LLMs) have been the subject of considerable attention in recent years by virtue of great leaps in progress and a wide range of potential applications; however, quantitative investigation into their potential in ecology is so far lacking. In this work, we analyse the ability of GPT-4 to extract information about invertebrate pests and pest controllers from abstracts on biological control, using a bespoke, zero-shot prompt. Our results show that the performance of GPT-4 is highly competitive with other state-of-the-art tools used in taxonomic named entity recognition and geographic location extraction tasks. On a held-out test set, species and locations are extracted with F1-scores of 99.8% and 95.3%, respectively, and we highlight that the model is able to distinguish very effectively between the primary roles of interest (predators, parasitoids and pests). Moreover, we demonstrate the model's ability to predict taxa across various taxonomic ranks and to automatically correct spelling mistakes. However, we do report a small number of cases of fabricated information (hallucinations). As a result of the current lack of specialised, pre-trained models, general-purpose LLMs may provide a promising way forward in ecology. Combined with tailored prompt engineering, such models can be employed for a wide range of text-mining tasks in ecology, and can greatly reduce the time spent on manual screening and labelling of the literature.
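The bespoke prompt itself is not reproduced in the abstract. As a rough illustration of the zero-shot extraction setting it describes, a call to GPT-4 via the OpenAI chat completions API might look like the sketch below; the prompt wording, output schema and function name are illustrative assumptions, not the authors' prompt.

    # Hypothetical sketch of a zero-shot extraction prompt; the study's actual
    # prompt and output schema are not reproduced here.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PROMPT_TEMPLATE = (
        "You are an assistant for ecological text mining. From the abstract below, "
        "extract every invertebrate species mentioned, its role (pest, predator or "
        "parasitoid) and any geographic locations. Return JSON with the keys "
        "'species' (a list of objects with 'name' and 'role') and 'locations' "
        "(a list of strings). If nothing is found, return empty lists.\n\n"
        "Abstract:\n{abstract}"
    )

    def extract_entities(abstract: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4",   # model family evaluated in the study
            temperature=0,   # deterministic output is preferable for extraction
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(abstract=abstract)}],
        )
        return response.choices[0].message.content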
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,
Journal year: 2023, Issue: unknown. Published: Jan. 1, 2023
ChatGPT's emergence heralds a transformative phase in NLP, particularly demonstrated through its excellent performance on many English benchmarks. However, the model's efficacy across diverse linguistic contexts remains largely uncharted territory. This work aims to bridge this knowledge gap, with a primary focus on assessing ChatGPT's capabilities on Arabic languages and dialectal varieties. Our comprehensive study conducts a large-scale automated and human evaluation of ChatGPT, encompassing 44 distinct language understanding and generation tasks over 60 different datasets. To our knowledge, this marks the first extensive analysis of ChatGPT's deployment in Arabic NLP. Our findings indicate that, despite its remarkable performance in English, ChatGPT is consistently surpassed by smaller models that have undergone finetuning on Arabic. We further undertake a meticulous comparison of ChatGPT's and GPT-4's performance on Modern Standard Arabic (MSA) and Dialectal Arabic (DA), unveiling the relative shortcomings of both models in handling dialects compared to MSA. Although we explore and confirm the utility of employing GPT-4 as a potential alternative for evaluation, our work adds to a growing body of research underscoring the limitations of ChatGPT.
BenchCouncil Transactions on Benchmarks Standards and Evaluations,
Journal year: 2023, Issue: 3(3), pp. 100136–100136. Published: Aug. 9, 2023
Conversational AI systems like ChatGPT have seen remarkable advancements in recent years, revolutionizing human–computer interactions. However, evaluating the performance and ethical implications of these systems remains a challenge. This paper delves into the creation of rigorous benchmarks, adaptable standards, and an intelligent evaluation methodology tailored specifically for ChatGPT. We meticulously analyze several prominent benchmarks, including GLUE, SuperGLUE, SQuAD, CoQA, Persona-Chat, DSTC, BIG-Bench, HELM and MMLU, illuminating their strengths and limitations. The paper also scrutinizes existing standards set by OpenAI, IEEE's Ethically Aligned Design, the Montreal Declaration, and the Partnership on AI's Tenets, investigating their relevance to ChatGPT. Further, we propose adaptive standards that encapsulate ethical considerations, context adaptability, and community involvement. In terms of evaluation, we explore traditional methods such as BLEU, ROUGE, METEOR, precision–recall, F1 score, perplexity, and user feedback, while proposing a novel evaluation approach that harnesses the power of reinforcement learning. Our proposed evaluation framework is multidimensional, incorporating task-specific, real-world application, and multi-turn dialogue benchmarks. We also perform a feasibility analysis and a SWOT analysis of the adaptability of the proposed framework. The paper highlights the significance of integrating human evaluation as a core component, alongside subjective assessments and interactive sessions. By amalgamating these elements, this paper contributes to the development of a comprehensive evaluation framework that fosters responsible and impactful advancement in the field of conversational AI.
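Among the traditional metrics listed above, precision, recall and the F1 score are related by a single formula: F1 is the harmonic mean of precision and recall. The short illustration below shows that relationship; the counts used in the example are arbitrary.

    # Illustrative only: how precision, recall and F1 relate to raw counts of
    # true positives (tp), false positives (fp) and false negatives (fn).
    def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Example: 90 correct predictions, 5 spurious ones, 10 missed.
    print(precision_recall_f1(90, 5, 10))  # -> (0.947..., 0.9, 0.923...)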
ACM Computing Surveys,
Journal year: 2024, Issue: 56(7), pp. 1–33. Published: Feb. 15, 2024
Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that they struggled to capture contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) leads to a robust encoder for the transformer model that can understand the broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that focuses on a comprehensive analysis of prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for academia and the industry. In light of this, we revisit a variety of BERT-based methods in this survey, cover a wide range of techniques of IR, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. Additionally, we highlight the advantages of employing encoder-based BERT models in contrast to recent large language models like ChatGPT, which are decoder-based and demand extensive computational resources. Finally, we summarize the outcomes of the survey and suggest directions for future research in the area.
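A building block shared by many of the surveyed approaches is scoring a query-document pair with a pretrained transformer encoder. The sketch below shows a minimal cross-encoder re-ranker using the Hugging Face transformers library; the checkpoint name is an assumption chosen for illustration and is not taken from the survey.

    # Minimal sketch: re-rank candidate documents for a query with a BERT-style
    # cross-encoder. The checkpoint name is an illustrative assumption.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def rerank(query: str, docs: list[str]) -> list[tuple[str, float]]:
        # Encode each (query, document) pair jointly so the encoder can attend
        # across both texts, then use the relevance logit as the ranking score.
        inputs = tokenizer([query] * len(docs), docs,
                           padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            scores = model(**inputs).logits.squeeze(-1)
        return sorted(zip(docs, scores.tolist()), key=lambda x: x[1], reverse=True)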
Communications Psychology,
Journal year: 2024, Issue: 2(1). Published: June 3, 2024
In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented to human participants and to an array of pretrained LLMs new variants of classical cognitive experiments, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences with human-like reasoning, with the models' limitations disappearing almost entirely in more recent LLMs' releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and psychology.
Language is the pathway to democratize the boundary of land and culture. Bridging the gap between languages is one of the biggest challenges for Artificial Intelligence (AI) systems. The current success of AI systems is dominated by the supervised learning paradigm, where gradient-based algorithms (e.g., SGD, Adam) are designed to optimize complex high-dimensional planes. These systems learn from statistical observations that are typically collected with a specific task in mind (e.g., product review, sentiment analysis).
ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find that, on biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various biomedical tasks that lack large annotated data.
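The prompts used in the evaluation are not reproduced in the abstract. As a hedged illustration of the zero-shot setting on one of the listed tasks (relation extraction), a query to a ChatGPT-class model might be structured as follows; the prompt wording, output format and model version are assumptions.

    # Hypothetical zero-shot relation-extraction prompt in the spirit of the
    # evaluation described above; wording and output format are assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def extract_relations(passage: str) -> str:
        prompt = (
            "List every chemical-disease relation stated in the passage below as "
            "lines of the form 'CHEMICAL -> DISEASE'. If none are stated, answer "
            "'NONE'.\n\nPassage:\n" + passage
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # a ChatGPT-class model; the exact version is an assumption
            temperature=0,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content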
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,
Journal year: 2023, Issue: unknown. Published: Jan. 1, 2023
Large language models (LLMs) have exhibited powerful capability in various natural language processing tasks. This work focuses on exploring LLM performance on zero-shot information extraction, with a focus on ChatGPT and the named entity recognition (NER) task. Inspired by the remarkable reasoning capability of LLMs on symbolic and arithmetic reasoning, we adapt the prevalent reasoning methods to NER and propose reasoning strategies tailored for NER. First, we explore a decomposed question-answering paradigm by breaking down the NER task into simpler subproblems by labels. Second, we propose syntactic augmentation to stimulate the model's intermediate thinking in two ways: syntactic prompting, which encourages the model to analyze the syntactic structure itself, and tool augmentation, which provides the model with the syntactic information generated by a parsing tool. Besides, we adapt self-consistency to NER by proposing a two-stage majority voting strategy, which first votes for the most consistent mentions, then the most consistent types. The proposed methods achieve remarkable improvements for zero-shot NER across seven benchmarks, including Chinese and English datasets, in both domain-specific and general-domain scenarios. In addition, we present a comprehensive analysis of the error types, with suggestions for optimization directions. We also verify the effectiveness of the proposed methods on the few-shot setting and on other LLMs.
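The two-stage majority voting strategy named above can be sketched as follows: given several sampled answers for the same input, first vote on which entity mentions to keep, then vote on the type of each kept mention. The snippet below is a schematic reconstruction of that idea, not the authors' implementation.

    # Schematic reconstruction of two-stage majority voting for self-consistent
    # zero-shot NER; not the authors' code.
    from collections import Counter

    def two_stage_vote(samples: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
        """samples: one list of (mention, entity_type) pairs per sampled LLM answer."""
        n = len(samples)
        # Stage 1: keep mentions predicted by a majority of the sampled answers.
        mention_votes = Counter(m for sample in samples for m in {m for m, _ in sample})
        kept = [m for m, c in mention_votes.items() if c > n / 2]
        # Stage 2: assign each kept mention its majority entity type.
        result = []
        for mention in kept:
            type_votes = Counter(t for sample in samples for m, t in sample if m == mention)
            result.append((mention, type_votes.most_common(1)[0][0]))
        return result

    # Example with three sampled answers:
    samples = [
        [("Paris", "LOC"), ("Curie", "PER")],
        [("Paris", "LOC")],
        [("Paris", "ORG"), ("Curie", "PER")],
    ]
    print(two_stage_vote(samples))  # -> [('Paris', 'LOC'), ('Curie', 'PER')]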
arXiv (Cornell University),
Journal year: 2024, Issue: unknown. Published: Jan. 1, 2024
Named Entity Recognition (NER) seeks to extract substrings within a text that name real-world objects and to determine their type (for example, whether they refer to persons or organizations). In this survey, we first present an overview of recent popular approaches, including advancements in Transformer-based methods and Large Language Models (LLMs) that have not had much coverage in other surveys. In addition, we discuss reinforcement learning and graph-based approaches, highlighting their role in enhancing NER performance. Second, we focus on methods designed for datasets with scarce annotations. Third, we evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics (as regards their domain, their size, and their number of classes). We thus provide a deep comparison of algorithms that have never been considered together. Our experiments shed some light on how the characteristics of datasets affect the behavior of the methods we compare.
Methods in Ecology and Evolution,
Journal year: 2024, Issue: 15(7), pp. 1261–1273. Published: May 20, 2024
Abstract: The body of ecological literature, which informs much of our knowledge of the global loss of biodiversity, has been experiencing rapid growth in recent decades. The increasing difficulty of synthesising this literature manually has simultaneously resulted in a growing demand for automated text mining methods. Within the domain of deep learning, large language models (LLMs) have been the subject of considerable attention in recent years due to great leaps in progress and a wide range of potential applications; however, quantitative investigation into their potential in ecology is so far lacking. In this work, we analyse the ability of GPT-4 to extract information about invertebrate pests and pest controllers from abstracts of articles on biological control, using a bespoke, zero-shot prompt. Our results show that the performance of GPT-4 is highly competitive with other state-of-the-art tools used in taxonomic named entity recognition and geographic location extraction tasks. On a held-out test set, species and locations are extracted with F1-scores of 99.8% and 95.3%, respectively, and we highlight that the model can effectively distinguish between roles of interest such as predators, parasitoids and pests. Moreover, we demonstrate the model's ability to predict taxa across various taxonomic ranks. However, we do report a small number of cases of fabricated information (confabulations). Due to the current lack of specialised, pre-trained models, general-purpose LLMs may provide a promising way forward in ecology. Combined with tailored prompt engineering, they can be employed for a wide range of text-mining tasks in ecology, and can greatly reduce the time spent on manual screening and labelling of the literature.