NoCFG: A Lightweight Approach for Sound Call Graph Approximation (Creative Commons)

Aharon Abadi, Bar Makovitzki, Ron Shemer, et al.

arXiv (Cornell University), Year: 2021, Issue: unknown

Published: Jan. 1, 2021

Interprocedural analysis refers to gathering information about the entire program rather than for a single procedure only, as in intraprocedural analysis. Interprocedural analysis enables more precise analysis; however, it is complicated due to the difficulty of constructing an accurate call graph. Current algorithms for constructing sound and precise call graphs analyze complex program dependencies, and therefore they might be difficult to scale. Their complexity stems from the kind of type-inference analysis they use, in particular the use of some variations of points-to analysis. To address this problem, we propose NoCFG, a new sound and scalable method for approximating a call graph that supports a wide variety of programming languages. A key property of NoCFG is that it works on a coarse abstraction of the program, discarding many programming language constructs. Due to the coarse abstraction, extending it to support other languages is also easy. We provide a formal proof of the soundness of NoCFG and evaluations on real-world projects written in both Python and C#. The experimental results demonstrate a high precision rate of 90% (lower bound) and scalability through a security use-case over projects with up to 2 million lines of code.

Language: English

PyCG: Practical Call Graph Generation in Python

Vitalis Salis, Thodoris Sotiropoulos, Πάνος Λουρίδας, et al.

Published: May 1, 2021

Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision (~99.2%) and adequate recall (~69.9%). Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.

Language: English

Cited by: 58
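The PyCG abstract above describes computing assignment relations between identifiers and then resolving each call to the functions it may invoke. The following is a minimal sketch of that general idea, not PyCG's actual algorithm or API: `build_call_graph` and its helper `resolve` are hypothetical names, and the sketch assumes a single module, ignores scopes, attributes, classes and imports, and treats assignments flow-insensitively.

```python
import ast
from collections import defaultdict
from typing import Dict, Set


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Toy call graph: module-level function -> functions it may call.

    Hypothetical helper for illustration only; not part of PyCG.
    """
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}

    # Assignment relations: target identifier -> identifiers assigned to it.
    assigned_from = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigned_from[target.id].add(node.value.id)

    def resolve(name, seen):
        """Follow assignment relations transitively to defined functions."""
        if name in seen:
            return set()
        seen.add(name)
        found = {name} if name in defined else set()
        for src in assigned_from.get(name, ()):
            found |= resolve(src, seen)
        return found

    graph = {f.name: set() for f in functions}
    for func in functions:
        for node in ast.walk(func):
            # Only direct calls through a plain name are handled here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name] |= resolve(node.func.id, set())
    return graph


SAMPLE = """
def greet():
    pass

def shout():
    pass

alias = greet

def main():
    alias()
    shout()
"""

graph = build_call_graph(SAMPLE)
print(graph["main"])
```

In this sample the call `alias()` resolves to `greet` only because the relation `alias = greet` was recorded first; a purely syntactic scan of call sites would miss that edge.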

JShrink: in-depth investigation into debloating modern Java applications (Open Access)

Bobby R. Bruce, Tianyi Zhang, Jaspreet Singh Arora, et al.

Published: Nov. 8, 2020

Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused as software evolves. This phenomenon, known as software bloat, results in software consuming more resources than it otherwise needs to. How to effectively and automatically debloat software is a long-standing problem in software engineering. Various debloating techniques have been proposed since the late 1990s. However, many of these techniques are built upon pure static analysis and have yet to be extended and evaluated in the context of modern Java applications where dynamic language features are prevalent.

Language: English

Cited by: 36

On the recall of static call graph construction in practice

Li Sui, Jens Dietrich, Amjed Tahir, et al.

Published: June 27, 2020

Static analyses have problems modelling dynamic language features soundly while retaining acceptable precision. The problem is well-understood in theory, but there is little evidence on how this impacts the analysis of real-world programs. We studied this issue for call graph construction on a set of 31 real-world Java programs, using an oracle of actual program behaviour recorded from executions of built-in and synthesised test cases with high coverage. We measured the recall that is being achieved by various static analysis algorithms and configurations, and investigated which language features lead to static analysis false negatives.

Language: English

Cited by: 32
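The study above measures recall by comparing statically constructed call graphs against an oracle of call edges recorded from real executions. A minimal sketch of that comparison follows; the function name `call_graph_recall` and the sample edges are invented for illustration and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def call_graph_recall(static_edges: Iterable[Edge],
                      oracle_edges: Iterable[Edge]) -> Tuple[float, Set[Edge]]:
    """Return (recall, false negatives) of a static call graph.

    The oracle holds edges observed at run time, so every oracle edge
    that is missing from the static graph is a false negative.
    """
    static_set, oracle_set = set(static_edges), set(oracle_edges)
    missed = oracle_set - static_set
    if not oracle_set:
        return 1.0, missed  # nothing was observed, so nothing can be missed
    recall = len(oracle_set & static_set) / len(oracle_set)
    return recall, missed


# Hypothetical data: one observed edge goes through reflection and is
# therefore absent from the statically built graph.
oracle = {("main", "a"), ("main", "b"), ("a", "c"), ("b", "reflective_target")}
static = {("main", "a"), ("main", "b"), ("a", "c"), ("a", "d")}

recall, missed = call_graph_recall(static, oracle)
print(recall, missed)
```

The extra static edge `("a", "d")` does not affect recall. Because a dynamic oracle only covers behaviour that was actually executed, an edge like this may be a false positive or simply unexercised, so the same data gives only a rough view of precision.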

Identifying Java calls in native code via binary scanning

George Fourtounis, Leonidas Triantafyllou, Yannis Smaragdakis, et al.

Published: July 13, 2020

Current Java static analyzers, operating either on the source or bytecode level, exhibit unsoundness for programs that contain native code. We show that the Java Native Interface (JNI) specification, which is used by Java programs to interoperate with native code, is principled enough to permit static reasoning about the effects of native code on program execution when it comes to call-backs. Our approach consists of disassembling native binaries, recovering static symbol information that corresponds to Java method signatures, and producing a model for statically exercising these native call-backs with appropriate mock objects. The approach manages to recover virtually all Java calls in native code, for both Android and Java desktop applications: (a) achieving 100% native-to-application call-graph recall on large Android applications (Chrome, Instagram) and (b) capturing the full native call-back behavior of the XCorpus suite programs.

Language: English

Cited by: 19

Coverage-Based Debloating for Java Bytecode (Open Access)

César Soto-Valero, Thomas Durieux, Nicolas Harrand, et al.

ACM Transactions on Software Engineering and Methodology, Year: 2022, Issue: 32(2), pp. 1-34

Published: July 6, 2022

Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, performance, and maintenance. In this article, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We succeed in debloating 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behaviour according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating. For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients, with at least one test that uses the library, successfully compile and pass their test suite when the original library is replaced by its debloated version.

Language: English

Cited by: 9

A hybrid analysis to detect Java serialisation vulnerabilities

Shawn Rasheed, Jens Dietrich

Published: Dec. 21, 2020

Serialisation-related security vulnerabilities have recently been reported for numerous Java applications. Since serialisation presents both soundness and precision challenges for static analysis, it can be difficult for analyses to precisely pinpoint serialisation vulnerabilities in a library. In this paper, we propose a hybrid approach that extends a static analysis with fuzzing to detect serialisation vulnerabilities. The novelty of our approach is its use of a heap abstraction to direct fuzzing for vulnerabilities in Java libraries. This guides fuzzing to produce results quickly and effectively, and it validates static analysis reports automatically. Our approach shows potential as it can detect known serialisation vulnerabilities in the Apache Commons Collections library.

Language: English

Cited by: 14

A Study of Call Graph Construction for JVM-Hosted Languages (Creative Commons)

Karim Ali, Xiaoni Lai, Zhaoyi Luo, et al.

IEEE Transactions on Software Engineering, Year: 2019, Issue: 47(12), pp. 2644-2666

Published: Dec. 27, 2019

Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time.

Language: English

Cited by: 13

Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications (Open Access)

Feras Al Kassar, Giulia Clerici, Luca Compagna, et al.

Published: Jan. 1, 2022

While static application security testing tools (SAST) have many known limitations, the impact of coding style on their ability to discover vulnerabilities remained largely unexplored. To fill this gap, in this study we experimented with a combination of commercial and open source security scanners, and compiled a list of over 270 different code patterns that, when present, impede the ability of state-of-the-art tools to analyze PHP and JavaScript code. By discovering the presence of these patterns during the software development lifecycle, our approach can provide important feedback to developers about the testability of their code. It can also help them better assess the residual risk that the code could still contain vulnerabilities even when static analyzers report no findings. Finally, our approach can also point to alternative ways to transform the code to increase its testability for SAST. Our experiments show that testability tarpits are very common. For instance, an average PHP application contains over 21 of them, and even the best state-of-the-art static analysis tools fail to analyze more than 20 consecutive instructions before encountering one of them. To assess the impact of pattern transformations on static analysis findings, we experimented with both manual and automated code transformations designed to replace a subset of patterns with equivalent, but more testable, code. These transformations allowed existing tools to better understand and analyze the applications, and led to the detection of 440 new potential vulnerabilities in 48 projects. We responsibly disclosed all these issues: 31 projects already answered, confirming 182 vulnerabilities. Out of these confirmed issues, which were previously unknown due to the poor testability of the applications' code, there are 38 impacting popular Github projects (>1k stars), such as PHP Dzzoffice (3.3k), JS Docsify (19k), and JS Apexcharts (11k). 25 CVEs have already been published and others are in-process.

Language: English

Cited by: 8

Automatic Specialization of Third-Party Java Dependencies

César Soto-Valero, Deepika Tiwari, Tim Toady, et al.

IEEE Transactions on Software Engineering, Year: 2023, Issue: 49(11), pp. 5027-5045

Published: Oct. 18, 2023

Large-scale code reuse significantly reduces both development costs and time. However, the massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. Given a project and its dependencies, we systematically identify the subset of each dependency that is necessary to build the project, and remove the rest. As a result of this process, we package each specialized dependency in a JAR file. Then, we generate specialized dependency trees where the original dependencies are replaced by the specialized versions. This allows building the project with significantly less third-party code than the original. As a result, the specialized dependencies become a first-class concept in the software supply chain, rather than a transient artifact in an optimizing compiler toolchain. We implement our technique in a tool called DepTrim, which we evaluate with 30 notable open-source Java projects. DepTrim specializes a total of 343 (86.6%) dependencies across these projects, and successfully rebuilds each project with a specialized dependency tree. Moreover, through this specialization, DepTrim removes a total of 57,444 (42.2%) classes from the dependencies, reducing the ratio of dependency classes to project classes from 8.7× in the original projects to 5.0× after specialization. These results indicate specialization

Language: English

Cited by: 4

On the Usefulness of Python Structural Pattern Matching: An Empirical Study

Norbert Vánder, Gábor Antal, Péter Hegedűs, et al.

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Year: 2024, Issue: unknown, pp. 501-511

Published: March 12, 2024

Language: English

Cited by: 1