Author

Xiaolei Ren

Graduation Semester and Year

2023

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Yu Lei

Second Advisor

Jiang Ming

Third Advisor

Hao Che

Fourth Advisor

Junzhou Huang

Abstract

Binary diffing is a technique for comparing executable files and identifying their differences or similarities without access to source code. Its potential applications in software security tasks such as vulnerability search, code clone detection, and malware analysis have generated a vast body of literature in recent years. A recurring theme in binary diffing research is the evaluation of resilience against compiler optimization, which is the most common source of syntactic differences in binary code. Although most binary diffing tools claim to be immune to compiler optimization, recent studies have highlighted the need for the research community to revisit this claim, particularly regarding non-default optimization settings and function inlining. In this study, we investigate the effect of peephole optimization on binary diffing analysis. Peephole optimization is a feature of mainstream compilers that performs local rewriting of the input program: it replaces instruction sequences within a window (i.e., a peephole) with shorter or faster instruction sequences that are functionally equivalent. Our research reveals that peephole optimization primarily affects binary code differences at the intra-procedural level, which contradicts the assumptions made by basic-block-centric comparison approaches. We conducted systematic experiments using LLVM's unit test suite. We also customized Alive2, an LLVM translation validation tool, to isolate the impact of peephole optimization from the overall optimization process. Our investigation determines the pervasiveness of peephole optimization in the resulting compiled code and explores its effects on current binary diffing techniques. The noticeable decline in the performance of these techniques highlights the importance of considering peephole optimization in the analysis and improvement of binary diffing methodologies.
Therefore, our findings suggest that researchers and practitioners should consider the impact of peephole optimization when developing and evaluating binary diffing tools. Further research is necessary to address this challenge and improve the effectiveness of binary diffing in various software security tasks.
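To make the notion of a peephole concrete, the following is a minimal sketch of a window-based rewriter in Python. It is an illustration only and is not taken from the dissertation: the tuple instruction format, the three rewrite rules, and the names `rewrite_window` and `peephole` are all hypothetical, and real compilers such as LLVM apply far larger rule sets to their own intermediate representations.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction format: (opcode, operand, ...), e.g. ("mul", "r1", "2").
Instr = Tuple[str, ...]


def rewrite_window(a: Instr, b: Optional[Instr]) -> Optional[Tuple[List[Instr], int]]:
    """Try the toy rules on a window of at most two instructions.

    Returns (replacement, number of instructions consumed), or None if no rule matches.
    """
    # Rule 1 (window of two): "push r; pop r" has no net effect in this toy model.
    if b is not None and a[0] == "push" and b[0] == "pop" and a[1:] == b[1:]:
        return [], 2
    # Rule 2 (window of one): adding zero is a no-op.
    if len(a) == 3 and a[0] == "add" and a[2] == "0":
        return [], 1
    # Rule 3 (window of one): multiply by two becomes a left shift by one.
    if len(a) == 3 and a[0] == "mul" and a[2] == "2":
        return [("shl", a[1], "1")], 1
    return None


def peephole(instrs: List[Instr]) -> List[Instr]:
    """Slide the window over the sequence, repeating until no rule fires."""
    changed = True
    while changed:
        changed = False
        out: List[Instr] = []
        i = 0
        while i < len(instrs):
            nxt = instrs[i + 1] if i + 1 < len(instrs) else None
            hit = rewrite_window(instrs[i], nxt)
            if hit is None:
                out.append(instrs[i])
                i += 1
            else:
                replacement, used = hit
                out.extend(replacement)
                i += used
                changed = True
        instrs = out
    return instrs


if __name__ == "__main__":
    before = [("push", "r2"), ("pop", "r2"), ("mul", "r1", "2"), ("add", "r1", "0"), ("ret",)]
    print(peephole(before))
```

In this sketch, the input and output sequences compute the same result but differ in both instruction count and opcodes, which is the kind of syntactic divergence between equivalent binaries that a diffing tool has to tolerate.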

Keywords

Compiler optimization, Binary code, Peephole optimization

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
