arXiv (Cornell University), Year: 2021, Issue: unknown, Published: Jan. 1, 2021
Interprocedural analysis refers to gathering information about the entire program rather than for a single procedure only, as in intraprocedural analysis. Interprocedural analysis enables a more precise analysis; however, it is complicated due to the difficulty of constructing an accurate call graph. Current algorithms for constructing sound and precise call graphs analyze complex program dependencies, and therefore they might be difficult to scale. Their complexity stems from the kind of type-inference analysis they use, in particular the use of some variations of points-to analysis. To address this problem, we propose NoCFG, a new scalable method for approximating a call graph that supports a wide variety of programming languages. A key property of NoCFG is that it works on a coarse abstraction of the program, discarding many of the programming language constructs. Due to the abstraction, extending support to other languages is easy. We provide a formal proof of the soundness of NoCFG and evaluations on real-world projects written in both Python and C#. The experimental results demonstrate a high precision rate of 90% (lower bound) and scalability through a security use-case over projects with up to 2 million lines of code.
Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision (~99.2%) and adequate recall (~69.9%). Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.
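The idea of resolving calls through assignment relations can be illustrated with a small sketch. This is not PyCG's implementation: it is a toy built on Python's standard `ast` module, it assumes a single flat namespace (no scopes, classes, modules, or closures), and the names `SOURCE`, `build_call_graph`, and `assigned_from` are hypothetical. It records `x = y` relations, propagates arguments to parameters until nothing changes, and then resolves each call site to the functions it may reach.

```python
import ast
from collections import defaultdict

# Hypothetical input program used only for illustration.
SOURCE = '''
def greet():
    pass

def run(callback):
    callback()

alias = greet
handler = alias

def main():
    handler()
    run(greet)
'''


def is_simple_call(node):
    """True for calls of the form name(...); attribute calls are ignored here."""
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name)


def build_call_graph(source):
    tree = ast.parse(source)
    defs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    params = {d.name: [a.arg for a in d.args.args] for d in defs}
    calls = [n for n in ast.walk(tree) if is_simple_call(n)]
    assigned_from = defaultdict(set)  # identifier -> identifiers assigned to it

    # Step 1: record direct assignment relations of the form `x = y`.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigned_from[target.id].add(node.value.id)

    def resolve(name, seen=None):
        """Follow assignment relations from `name` to the functions it may denote."""
        seen = set() if seen is None else seen
        if name in seen:
            return set()
        seen.add(name)
        found = {name} if name in params else set()
        for source_name in assigned_from.get(name, ()):
            found |= resolve(source_name, seen)
        return found

    # Step 2: bind call arguments to callee parameters until a fixed point.
    changed = True
    while changed:
        changed = False
        for call in calls:
            for callee in resolve(call.func.id):
                for param, arg in zip(params[callee], call.args):
                    if isinstance(arg, ast.Name) and arg.id not in assigned_from[param]:
                        assigned_from[param].add(arg.id)
                        changed = True

    # Step 3: resolve every call site inside each function body.
    graph = {}
    for d in defs:
        callees = set()
        for node in ast.walk(d):
            if is_simple_call(node):
                callees |= resolve(node.func.id)
        graph[d.name] = callees
    return graph


graph = build_call_graph(SOURCE)
for caller in sorted(graph):
    print(caller, "->", sorted(graph[caller]))
```

In this toy program, `main` reaches `greet` through the alias chain `handler = alias = greet`, and `run` reaches `greet` because the argument of `run(greet)` flows into the parameter `callback`.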
Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused as software evolves. This phenomenon, known as software bloat, results in software consuming more resources than it otherwise needs to. How to effectively and automatically debloat software is a long-standing problem in software engineering. Various debloating techniques have been proposed since the late 1990s. However, many of these techniques are built upon pure static analysis and have yet to be extended and evaluated in the context of modern Java applications where dynamic language features are prevalent.
Static analyses have problems modelling dynamic language features soundly while retaining acceptable precision. The problem is well-understood in theory, but there is little evidence on how this impacts the analysis of real-world programs. We studied this issue for call graph construction on a set of 31 real-world Java programs using an oracle of actual program behaviour recorded from executions of built-in and synthesised test cases with high coverage, measured the recall that is being achieved by various static analysis algorithms and configurations, and investigated which language features lead to static analysis false negatives.
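Measuring recall against an oracle of recorded behaviour reduces to set arithmetic over call edges. The sketch below is only an illustration under stated assumptions: the edge sets, the method names, and the helper functions `recall` and `false_negatives` are hypothetical and are not taken from the study.

```python
def recall(static_edges, oracle_edges):
    """Share of observed (oracle) call edges that the static call graph contains."""
    if not oracle_edges:
        return 1.0  # nothing was observed, so nothing can be missed
    return len(static_edges & oracle_edges) / len(oracle_edges)


def false_negatives(static_edges, oracle_edges):
    """Call edges seen at run time but missing from the static call graph."""
    return oracle_edges - static_edges


# Hypothetical (caller, callee) edges recorded while running test cases.
oracle = {
    ("Main.main", "Config.load"),
    ("Main.main", "Plugin.run"),  # e.g. a reflective invocation
    ("Config.load", "Parser.parse"),
    ("Parser.parse", "Lexer.next"),
}

# Hypothetical edges reported by a static analysis; it over-approximates one
# edge and misses the reflective one.
static = {
    ("Main.main", "Config.load"),
    ("Config.load", "Parser.parse"),
    ("Parser.parse", "Lexer.next"),
    ("Main.main", "Logger.debug"),
}

score = recall(static, oracle)
missed = false_negatives(static, oracle)
print(f"recall = {score:.2f}")
print("false negatives:", sorted(missed))
```

Edges that appear only in the static graph, such as the hypothetical `Logger.debug` call, do not affect recall; they would count against precision instead.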
Current Java static analyzers, operating either on the source or bytecode level, exhibit unsoundness for programs that contain native code. We show that the Java Native Interface (JNI) specification, which is used by Java programs to interoperate with native code, is principled enough to permit static reasoning about the effects of native code on program execution when it comes to call-backs. Our approach consists of disassembling native binaries, recovering static symbol information that corresponds to Java method signatures, and producing a model for statically exercising these native call-backs with appropriate mock objects. The approach manages to recover virtually all Java calls in native code, for both Android and Java desktop applications: (a) achieving 100% native-to-application call-graph recall on large Android applications (Chrome, Instagram) and (b) capturing the full native call-back behavior of the XCorpus suite programs.
ACM Transactions on Software Engineering and Methodology, Year: 2022, Issue: 32(2), pp. 1-34, Published: July 6, 2022
Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, performance, and maintenance. In this article, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We succeed in debloating 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behaviour according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating. For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients, with at least one test that uses the library, successfully compile and pass their test suite when the original library is replaced by its debloated version.
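The core step of coverage-based debloating, keeping only what a workload covers, can be sketched at class granularity. This is a simplified illustration, not the authors' tool: it assumes the coverage data has already been collected as a set of covered class names, and the artifact names, class names, and the helpers `debloat` and `removed_share` are hypothetical.

```python
def debloat(artifacts, covered):
    """Keep, for every artifact, only the classes covered by the workload."""
    return {name: classes & covered for name, classes in artifacts.items()}


def removed_share(original, debloated):
    """Fraction of classes removed across the project and its dependencies."""
    total = sum(len(classes) for classes in original.values())
    kept = sum(len(classes) for classes in debloated.values())
    return (total - kept) / total if total else 0.0


# Hypothetical project with one dependency, as artifact -> class names.
artifacts = {
    "app": {"app.Main", "app.Util"},
    "lib-json": {"json.Parser", "json.Writer", "json.Legacy"},
}

# Hypothetical classes reported as covered when running the workload.
covered = {"app.Main", "json.Parser", "json.Writer"}

slim = debloat(artifacts, covered)
share = removed_share(artifacts, slim)
for name in sorted(slim):
    print(name, "->", sorted(slim[name]))
print(f"removed {share:.0%} of classes")
```

Because the result depends entirely on the workload, a class that is only exercised by an input outside the workload would be removed, which is why the debloated versions are checked for preserved behaviour against that same workload.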
Serialisation related security vulnerabilities have recently been reported for numerous Java applications. Since serialisation presents both soundness and precision challenges for static analysis, it can be difficult for analyses to precisely pinpoint serialisation vulnerabilities in a library. In this paper, we propose a hybrid approach that extends a static analysis with fuzzing to detect serialisation vulnerabilities. The novelty of our approach is its use of a heap abstraction to direct fuzzing for vulnerabilities in Java libraries. This guides fuzzing to produce results quickly and effectively, and it validates static analysis reports automatically. Our approach shows potential as it can detect known serialisation vulnerabilities in the Apache Commons Collections library.
IEEE Transactions on Software Engineering, Year: 2019, Issue: 47(12), pp. 2644-2666, Published: Dec. 27, 2019
Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time.
While static application security testing (SAST) tools have many known limitations, the impact of coding style on their ability to discover vulnerabilities remained largely unexplored. To fill this gap, in this study we experimented with a combination of commercial and open source security scanners, and compiled a list of over 270 different code patterns that, when present, impede the ability of state-of-the-art tools to analyze PHP and JavaScript code. By discovering the presence of these patterns during the software development lifecycle, our approach can provide important feedback to developers about the testability of their code. It can also help them to better assess the residual risk that the code could still contain vulnerabilities even when static analyzers report no findings. Finally, our approach can also point to alternative ways to transform the code to increase its testability for SAST. Our experiments show that testability tarpits are very common. For instance, an average PHP application contains over 21 of them, and even the best state-of-the-art static analysis tools fail to analyze more than 20 consecutive instructions before encountering one of them. To assess the impact of pattern transformations over static analysis findings, we experimented with both manual and automated code transformations designed to replace a subset of patterns with equivalent, but more testable, code. These transformations allowed existing tools to better understand and analyze the applications, and led to the detection of 440 new potential vulnerabilities in 48 projects. We responsibly disclosed all these issues: 31 projects already answered, confirming 182 vulnerabilities. Out of these confirmed issues (previously unknown due to the poor testability of the applications' code), there are 38 impacting popular GitHub projects (>1k stars), such as PHP Dzzoffice (3.3k), JS Docsify (19k), and JS Apexcharts (11k). 25 CVEs have already been published, and others are in process.
IEEE Transactions on Software Engineering, Year: 2023, Issue: 49(11), pp. 5027-5045, Published: Oct. 18, 2023
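Large-scale code reuse significantly reduces both development costs and time. However, the massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. Given a project and its dependencies, we systematically identify the subset of each dependency that is necessary to build the project, and we remove the rest. As a result of this process, we package each specialized dependency in a JAR file. Then, we generate specialized dependency trees where the original dependencies are replaced by the specialized versions. This allows building the project with significantly less third-party code than the original. As a result, the specialized dependencies become a first-class concept in the software supply chain, rather than a transient artifact in an optimizing compiler toolchain. We implement our technique in a tool called DepTrim, which we evaluate with 30 notable open-source Java projects. DepTrim specializes a total of 343 (86.6%) dependencies across these projects, and successfully rebuilds each project with a specialized dependency tree. Moreover, through this specialization, DepTrim removes a total of 57,444 (42.2%) classes from the dependencies, reducing the ratio of dependency classes to project classes from 8.7× in the original projects to 5.0× after specialization. These results indicate that dependency specialization significantly reduces the share of third-party code in Java projects.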