Author

Haotian Zhang

ORCID Identifier(s)

0000-0003-0844-3730

Graduation Semester and Year

2023

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Jiang Ming

Abstract

Resolving indirect control flow is one of the fundamental challenges in binary analysis, and improving its accuracy is vital to the field. Many analysis algorithms and security techniques, such as recursive disassembly, control-flow integrity, and data-flow analysis, rely on precise indirect control flow results. Incorrect or even imprecise results can compromise or break the assumptions of these analyses. This thesis explores the topic from two directions: adapting indirect control flow analysis to make it more suitable for different scenarios, and improving its accuracy with deep learning. In the first part, we explore the trade-offs that can be made in debloating scenarios. Static software debloating often requires an accurate indirect control flow result. However, previous works resolve indirect control flow using address-taken functions, an approach that produces too many false positives to debloat a program efficiently. We observe that debloating does not require the result for each individual indirect control flow, only the combined set of results. Instead of resolving each indirect control flow, we focus on how the target is loaded from memory; a loaded target can be used by any subsequent indirect control flow. We build a novel tool based on this methodology to debloat shared libraries in MIPS firmware. In the second part, we explore how deep learning can be applied to the problem of resolving indirect control flow. Unlike text or images, which have a more straightforward data relation structure, binaries are much more complex, especially in their control flow structure. Graphs are a natural representation in the program analysis domain. We apply a graph neural network to our augmented control flow graph to learn to predict indirect callees, translating the indirect callee prediction problem into an edge prediction problem on a graph.

Keywords

Indirect control flow, Static analysis, Graph, Neural network

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
