Most Java static analysis frameworks provide an intermediate representation (IR) of Java Bytecode to facilitate the development of static analyses. While such IRs are often based on three-address code, the transformation itself is a great opportunity to apply optimizations to the transformed code, such as constant propagation.
Call graphs are widely used, in particular for advanced control- and data-flow analyses. Even though many call graph algorithms with different precision and scalability properties have been proposed, a comprehensive understanding of sources of call graph unsoundness, their relevance, and the capabilities of existing call graph algorithms in this respect is missing. To address this problem, we propose Judge, a toolchain that helps with understanding sources of unsoundness and improving the soundness of call graphs. In several experiments, we use Judge and an extensive test suite related to sources of unsoundness to (a) compute capability profiles for call graph implementations of Soot, WALA, DOOP, and OPAL, (b) determine the prevalence of language features and APIs that affect soundness in modern Java Bytecode, (c) compare the call graphs of Soot, WALA, DOOP, and OPAL, highlighting important differences in their implementations, and (d) evaluate the necessary effort to achieve project-specific reasonable sound call graphs. We show that soundness-relevant features/APIs are frequently used and that support for them differs vastly, up to the point where comparing call graphs computed by the same base algorithms (e.g., RTA) but different frameworks is bogus. We also show that Judge can support users in establishing the soundness of call graphs with reasonable effort.
Static analyses have problems modelling dynamic language features soundly while retaining acceptable precision. The problem is well-understood in theory, but there is little evidence on how this impacts the analysis of real-world programs. We have studied this issue for call graph construction on a set of 31 real-world Java programs using an oracle of actual program behaviour recorded from executions of built-in and synthesised test cases with high coverage, have measured the recall that is being achieved by various static analysis algorithms and configurations, and have investigated which language features lead to static analysis false negatives.
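The recall measurement described above can be illustrated with a minimal sketch. It assumes that call edges are encoded as plain "caller -> callee" strings and that the dynamic oracle is simply the set of edges observed during test execution. The class `CallGraphRecall` and its methods are hypothetical names for illustration, not part of the study's tooling.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: compares a static call graph against dynamically observed edges. */
final class CallGraphRecall {

    /** Fraction of observed edges that the static call graph also contains. */
    static double recall(Set<String> staticEdges, Set<String> observedEdges) {
        if (observedEdges.isEmpty()) {
            return 1.0; // nothing was observed, so nothing can be missed
        }
        Set<String> found = new HashSet<>(observedEdges);
        found.retainAll(staticEdges); // observed edges that the static graph reports
        return (double) found.size() / observedEdges.size();
    }

    /** Observed edges missing from the static graph, i.e. the false negatives. */
    static Set<String> falseNegatives(Set<String> staticEdges, Set<String> observedEdges) {
        Set<String> missed = new HashSet<>(observedEdges);
        missed.removeAll(staticEdges);
        return missed;
    }

    public static void main(String[] args) {
        // Edge encoding "caller -> callee" is an assumption of this sketch.
        Set<String> staticEdges = Set.of("Main.main -> A.run", "A.run -> B.log");
        Set<String> observed = Set.of("Main.main -> A.run", "A.run -> B.log",
                "A.run -> Plugin.load", "Plugin.load -> C.init");
        System.out.println("recall = " + recall(staticEdges, observed)); // recall = 0.5
        System.out.println("missed = " + falseNegatives(staticEdges, observed));
    }
}
```

In this toy input, the two missed edges both involve a dynamically loaded plugin, which is the kind of pattern such a comparison is meant to surface.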
Journal of Systems and Software, Year: 2024, Vol. 212, p. 111971
Published: March 12, 2024
A core principle of open science is the clear, concise and accessible publication of empirical data, including "raw" observational data as well as processed results. However, in software engineering there are no established standards (de jure or de facto) for representing and "opening" observations collected in test-driven software experiments, that is, experiments involving the execution of software subjects in controlled scenarios. Execution data is therefore usually represented in ad hoc ways, often making it abstruse and difficult to access without significant manual effort. In this paper we present new data structures designed to address this problem by clearly defining, correlating and representing the stimuli and responses used to execute software subjects in test-driven experiments. To demonstrate their utility, we show how they can be used to promote the repetition, replication and reproduction of experimental evaluations of AI-based code completion tools. We also show how the proposed data structures facilitate the incremental expansion of execution data sets, and thus promote their repurposing for addressing new research questions.
Call graphs are at the core of many static analyses, ranging from the detection of unused methods to advanced control- and data-flow analyses. Therefore, a comprehensive understanding of the precision and recall of the respective call graphs is crucial to enable an assessment of which call-graph construction algorithms are suited in which analysis scenario. For example, malware is often obfuscated and tries to hide its intent by using Reflection. Call graphs that do not represent reflective method calls are, therefore, of limited use when analyzing such apps.
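To make the reflection example concrete, the following minimal sketch shows a call whose target appears nowhere as a direct invocation in the bytecode. All class and method names here are hypothetical. The string reversal stands in for whatever obfuscation real malware might apply, so that the names exist only at run time.

```java
import java.lang.reflect.Method;

/** Hypothetical demo: the call to Payload.secret() is only reachable through reflection. */
final class ReflectiveCallDemo {

    static final class Payload {
        public static String secret() {
            return "payload executed";
        }
    }

    /** Invokes a public static no-argument method identified only by strings. */
    static Object callByName(String className, String methodName) {
        try {
            Class<?> target = Class.forName(className);
            Method m = target.getMethod(methodName);
            return m.invoke(null); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // Names are reassembled at run time, so no string constant names the target.
        String nested = new StringBuilder("daolyaP").reverse().toString(); // "Payload"
        String method = new StringBuilder("terces").reverse().toString();  // "secret"
        String cls = ReflectiveCallDemo.class.getName() + "$" + nested;
        System.out.println(callByName(cls, method)); // prints "payload executed"
    }
}
```

A call graph builder that treats `Method.invoke` as an opaque library call would report no edge to `Payload.secret`, which is the kind of false negative the paragraph above refers to.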
Current Java static analyzers, operating either on the source or bytecode level, exhibit unsoundness for programs that contain native code. We show that the Java Native Interface (JNI) specification, which is used by Java programs to interoperate with native code, is principled enough to permit static reasoning about the effects of native code on program execution when it comes to call-backs. Our approach consists of disassembling native binaries, recovering static symbol information that corresponds to Java method signatures, and producing a model for statically exercising these native call-backs with appropriate mock objects. The approach manages to recover virtually all Java calls in native code, for both Android and Java desktop applications: (a) achieving 100% native-to-application call-graph recall on large Android applications (Chrome, Instagram) and (b) capturing the full native call-back behavior of the XCorpus suite programs.
The dynamic proxy API is one of Java's most widely-used dynamic features, permitting principled run-time code generation and linking. Dynamic proxies can implement any set of interfaces and forward method calls to a special object that handles them reflectively. The flexibility of dynamic proxies, however, comes at the cost of having a dynamically generated layer of bytecode that cannot be penetrated by current static analyses. In this paper, we observe that the dynamic proxy API is stylized enough to permit static analysis. We show how the semantics of dynamic proxies can be modeled in a straightforward manner as logical rules in the Doop static analysis framework. This concise set of rules enables Doop's standard analyses to process code behind dynamic proxies. We evaluate our approach by analyzing XCorpus, a corpus of real-world Java programs: we fully handle 95% of its reported proxy creation sites. Our handling results in the analysis of significant portions of previously unreachable or incompletely-modeled code.
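As a minimal illustration of the mechanism described above, the sketch below uses the standard `java.lang.reflect.Proxy` API to create a proxy that implements an interface and forwards each call reflectively through an `InvocationHandler`. The `Greeter` interface, `EnglishGreeter` class, and `logged` helper are hypothetical names chosen for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of a logging proxy built with java.lang.reflect.Proxy. */
final class DynamicProxyDemo {

    interface Greeter {
        String greet(String name);
    }

    static final class EnglishGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Wraps target in a proxy that records each method name, then forwards the call. */
    static Greeter logged(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());          // record which method was called
            return method.invoke(target, args); // reflective forwarding to the real object
        };
        // The proxy class is generated at run time; no such class exists in the program's bytecode.
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter g = logged(new EnglishGreeter(), log);
        System.out.println(g.greet("world")); // prints "Hello, world"
        System.out.println(log);              // prints "[greet]"
    }
}
```

From the caller's point of view, `g.greet` is an ordinary interface call. The path from that call to `EnglishGreeter.greet` runs through generated code and `Method.invoke`, which is why an analysis needs an explicit model of the proxy API to follow it.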
IEEE Transactions on Software Engineering, Year: 2021, Vol. 48(9), pp. 3613 - 3625
Published: Aug. 4, 2021
The use of vulnerable open-source dependencies is a known problem in today's software development. Several vulnerability scanners to detect known-vulnerable dependencies appeared in the last decade; however, there exists no case study investigating the impact of development practices, e.g., forking, patching, re-bundling, on their performance. This paper studies (i) types of modifications that may affect vulnerable open-source dependencies and (ii) their impact on the performance of vulnerability scanners. Through an empirical study on 7,024 Java projects developed at SAP, we identified four types of modifications: re-compilation, re-bundling, metadata-removal and re-packaging. In particular, we found that more than 87 percent (56 percent, resp.) of the vulnerable Java classes considered occur in Maven Central in re-bundled (re-packaged, resp.) form. We assessed the impact of these modifications on the performance of the open-source vulnerability scanners OWASP Dependency-Check (OWASP) and Eclipse Steady, GitHub Security Alerts, and three commercial scanners. The results show that none of the scanners is able to handle all the types of modifications identified. Finally, we present Achilles, a novel test suite with 2,505 test cases that allow replicating the modifications on open-source dependencies.
Empirical Software Engineering, Year: 2023, Vol. 28(2)
Published: March 1, 2023
Machine Learning for Source Code is an active research field in which extensive experimentation is needed to discover how best to use source code's richly structured information. With this in mind, we introduce an extensible Java dataset for such applications: a large-scale, diverse, and high-quality dataset targeted at this field. Our goal with the dataset is to lower the barrier to entry by providing the building blocks to experiment with source code models and tasks. It comes with a considerable amount of pre-processed information such as metadata, representations (e.g., code tokens, ASTs, graphs), and several properties (e.g., metrics, static analysis results) for 50,000 Java projects taken from an existing dataset, with over 1.2 million classes and over 8 million methods. The dataset is also extensible, allowing users to add new properties and representations and to evaluate tasks on them. Thus, it becomes a workbench that researchers can use to experiment with novel representations and tasks operating on source code. To demonstrate its utility, we also report results from two empirical studies on our data, ultimately showing that significant work lies ahead in the design of context-aware source code models that can reason over a broader network of source code entities in a software project, the very task that the dataset is designed to help with.