https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models
Reasoning,
a
crucial
ability
for
complex
problem-solving,
plays
pivotal
role
in
various
real-world
settings
such
as
negotiation,
medical
diagnosis,
and
criminal
investigation.
It
serves
fundamental
methodology
the
field
of
Artificial
General
Intelligence
(AGI).
With
ongoing
development
foundation
models,
there
is
growing
interest
exploring
their
abilities
reasoning
tasks.
In
this
paper,
we
introduce
seminal
models
proposed
or
adaptable
reasoning,
highlighting
latest
advancements
tasks,
methods,
benchmarks.
We
then
delve
into
potential
future
directions
behind
emergence
within
models.
also
discuss
relevance
multimodal
learning,
autonomous
agents,
super
alignment
context
reasoning.
By
discussing
these
research
directions,
hope
to
inspire
researchers
exploration
field,
stimulate
further
with
contribute
AGI.
Artificial Intelligence Review,
Journal Year:
2024,
Volume and Issue:
57(7)
Published: June 17, 2024
Abstract
Large
language
models
(LLMs)
have
exploded
a
new
heatwave
of
AI
for
their
ability
to
engage
end-users
in
human-level
conversations
with
detailed
and
articulate
answers
across
many
knowledge
domains.
In
response
fast
adoption
industrial
applications,
this
survey
concerns
safety
trustworthiness.
First,
we
review
known
vulnerabilities
limitations
the
LLMs,
categorising
them
into
inherent
issues,
attacks,
unintended
bugs.
Then,
consider
if
how
Verification
Validation
(V&V)
techniques,
which
been
widely
developed
traditional
software
deep
learning
such
as
convolutional
neural
networks
independent
processes
check
alignment
implementations
against
specifications,
can
be
integrated
further
extended
throughout
lifecycle
LLMs
provide
rigorous
analysis
trustworthiness
applications.
Specifically,
four
complementary
techniques:
falsification
evaluation,
verification,
runtime
monitoring,
regulations
ethical
use.
total,
370+
references
are
considered
support
quick
understanding
issues
from
perspective
V&V.
While
intensive
research
has
conducted
identify
yet
practical
methods
called
ensure
requirements.
Deleted Journal,
Journal Year:
2023,
Volume and Issue:
20(2), P. 180 - 193
Published: March 2, 2023
Abstract
The
pre-training-then-fine-tuning
paradigm
has
been
widely
used
in
deep
learning.
Due
to
the
huge
computation
cost
for
pre-training,
practitioners
usually
download
pre-trained
models
from
Internet
and
fine-tune
them
on
downstream
datasets,
while
downloaded
may
suffer
backdoor
attacks.
Different
previous
attacks
aiming
at
a
target
task,
we
show
that
backdoored
model
can
behave
maliciously
various
tasks
without
foreknowing
task
information.
Attackers
restrict
output
representations
(the
values
of
neurons)
trigger-embedded
samples
arbitrary
predefined
through
additional
training,
namely
neuron-level
attack
(NeuBA).
Since
fine-tuning
little
effect
parameters,
fine-tuned
will
retain
functionality
predict
specific
label
embedded
with
same
trigger.
To
provoke
multiple
labels
attackers
introduce
several
triggers
contrastive
values.
In
experiments
both
natural
language
processing
(NLP)
computer
vision
(CV),
NeuBA
well
control
predictions
instances
different
trigger
designs.
Our
findings
sound
red
alarm
wide
use
models.
Finally,
apply
defense
methods
find
pruning
is
promising
technique
resist
by
omitting
neurons.
Recent
studies
have
revealed
that
Backdoor
Attacks
can
threaten
the
safety
of
natural
language
processing
(NLP)
models.
Investigating
strategies
backdoor
attacks
will
help
to
understand
model's
vulnerability.
Most
existing
textual
focus
on
generating
stealthy
triggers
or
modifying
model
weights.
In
this
paper,
we
directly
target
interior
structure
neural
networks
and
mechanism.
We
propose
a
novel
Trojan
Attention
Loss
(TAL),
which
enhances
behavior
by
manipulating
attention
patterns.
Our
loss
be
applied
different
attacking
methods
boost
their
attack
efficacy
in
terms
successful
rates
poisoning
rates.
It
applies
not
only
traditional
dirty-label
attacks,
but
also
more
challenging
clean-label
attacks.
validate
our
method
backbone
models
(BERT,
RoBERTa,
DistilBERT)
various
tasks
(Sentiment
Analysis,
Toxic
Detection,
Topic
Classification).
ACM Computing Surveys,
Journal Year:
2024,
Volume and Issue:
57(4), P. 1 - 35
Published: Nov. 15, 2024
Since
the
emergence
of
security
concerns
in
artificial
intelligence
(AI),
there
has
been
significant
attention
devoted
to
examination
backdoor
attacks.
Attackers
can
utilize
attacks
manipulate
model
predictions,
leading
potential
harm.
However,
current
research
on
and
defenses
both
theoretical
practical
fields
still
many
shortcomings.
To
systematically
analyze
these
shortcomings
address
lack
comprehensive
reviews,
this
article
presents
a
systematic
summary
targeting
multi-domain
AI
models.
Simultaneously,
based
design
principles
shared
characteristics
triggers
different
domains
implementation
stages
defense,
proposes
new
classification
method
for
defenses.
We
use
extensively
review
computer
vision
natural
language
processing,
we
also
examine
applications
audio
recognition,
video
action
multimodal
tasks,
time
series
generative
learning,
reinforcement
while
critically
analyzing
open
problems
various
attack
techniques
defense
strategies.
Finally,
builds
upon
analysis
state
further
explore
future
directions
2022 IEEE Symposium on Security and Privacy (SP),
Journal Year:
2022,
Volume and Issue:
unknown, P. 2025 - 2042
Published: May 1, 2022
Backdoors
can
be
injected
to
NLP
models
such
that
they
misbehave
when
the
trigger
words
or
sentences
appear
in
an
input
sample.
Detecting
backdoors
given
only
a
subject
model
and
small
number
of
benign
samples
is
very
challenging
because
unique
nature
applications,
as
discontinuity
pipeline
large
search
space.
Existing
techniques
work
well
for
with
simple
triggers
single
character/word
but
become
less
effective
complex
(e.g.,
transformer
models).
We
propose
new
backdoor
scanning
technique.
It
transforms
equivalent
differentiable
form.
then
uses
optimization
invert
distribution
denoting
their
likelihood
trigger.
leverages
novel
word
discriminativity
analysis
determine
if
particularly
discriminative
presence
likely
words.
Our
evaluation
on
3839
from
TrojAI
competition
existing
works
7
state-of-art
structures
BERT
GPT,
17
different
attack
types
including
two
latest
dynamic
attacks,
shows
our
technique
highly
effective,
achieving
over
0.9
detection
accuracy
most
scenarios
substantially
outperforming
state-of-the-art
scanners.
submissions
leaderboard
achieve
top
performance
2
out
3
rounds
scanning.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal Year:
2022,
Volume and Issue:
unknown, P. 13337 - 13347
Published: June 1, 2022
One
major
goal
of
the
AI
security
community
is
to
securely
and
reliably
produce
deploy
deep
learning
models
for
real-world
applications.
To
this
end,
data
poisoning
based
backdoor
attacks
on
neural
networks
(DNNs)
in
production
stage
(or
training
stage)
corresponding
defenses
are
extensively
explored
recent
years.
Ironically,
deployment
stage,
which
can
often
happen
unprofessional
users'
devices
thus
arguably
far
more
threatening
scenarios,
draw
much
less
attention
community.
We
attribute
imbalance
vigilance
weak
practicality
existing
deployment-stage
attack
algorithms
insufficiency
demonstrations.
fill
blank,
work,
we
study
realistic
threat
DNNs.
base
our
a
commonly
used
paradigm
-
adversarial
weight
attack,
where
adversaries
selectively
modify
model
weights
embed
into
deployed
approach
practicality,
propose
first
gray-box
physically
realizable
algorithm
injection,
namely
subnet
replacement
(SRA),
only
requires
architecture
information
victim
support
physical
triggers
real
world.
Extensive
experimental
simulations
system-level
real-
world
demonstrations
conducted.
Our
results
not
suggest
effectiveness
proposed
algorithm,
but
also
reveal
practical
risk
novel
type
computer
virus
that
may
widely
spread
stealthily
inject
DNN
user
devices.
By
study,
call
vulnerability
DNNs
stage.
Backdoor
attacks
for
neural
code
models
have
gained
considerable
attention
due
to
the
advancement
of
intelligence.
However,
most
existing
works
insert
triggers
into
task-specific
data
code-related
downstream
tasks,
thereby
limiting
scope
attacks.
Moreover,
majority
pre-trained
are
designed
understanding
tasks.
In
this
paper,
we
propose
task-agnostic
backdoor
models.
Our
backdoored
model
is
with
two
learning
strategies
(i.e.,
Poisoned
Seq2Seq
and
token
representation
learning)
support
multi-target
attack
generation
During
deployment
phase,
implanted
backdoors
in
victim
can
be
activated
by
achieve
targeted
attack.
We
evaluate
our
approach
on
tasks
three
over
seven
datasets.
Extensive
experimental
results
demonstrate
that
effectively
stealthily
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal Year:
2023,
Volume and Issue:
unknown, P. 16352 - 16362
Published: June 1, 2023
Self-supervised
learning
in
computer
vision
trains
on
unlabeled
data,
such
as
images
or
(image,
text)
pairs,
to
obtain
an
image
encoder
that
learns
high-quality
embeddings
for
input
data.
Emerging
backdoor
attacks
towards
encoders
expose
crucial
vulnerabilities
of
self-supervised
learning,
since
downstream
classifiers
(even
further
trained
clean
data)
may
inherit
behaviors
from
en-coders.
Existing
detection
methods
mainly
focus
supervised
settings
and
cannot
handle
pre-trained
especially
when
labels
are
not
available.
In
this
paper,
we
propose
DECREE,
the
first
back-door
approach
encoders,
requiring
neither
classifier
headers
nor
labels.
We
evaluate
DECREE
over
400
trojaned
under
3
paradigms.
show
effectiveness
our
method
ImageNet
OpenAI's
CLIP
million
image-text
pairs.
Our
consistently
has
a
high
accuracy
even
if
have
only
limited
no
access
pre-training
dataset.
Code
is
available
at
https://github.com/GiantSeaweed/DECREE.
Proceedings of the ACM Web Conference 2022,
Journal Year:
2023,
Volume and Issue:
unknown, P. 2198 - 2208
Published: April 26, 2023
Large-scale
language
models
have
achieved
tremendous
success
across
various
natural
processing
(NLP)
applications.
Nevertheless,
are
vulnerable
to
backdoor
attacks,
which
inject
stealthy
triggers
into
for
steering
them
undesirable
behaviors.
Most
existing
such
as
data
poisoning,
require
further
(re)training
or
fine-tuning
learn
the
intended
patterns.
The
additional
training
process
however
diminishes
stealthiness
of
a
model
usually
requires
long
optimization
time,
massive
amount
data,
and
considerable
modifications
parameters.