Graduation Semester and Year
2023
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Yu Lei
Second Advisor
Jiang Ming
Third Advisor
Hao Che
Fourth Advisor
Junzhou Huang
Abstract
Binary diffing is a technique used to compare executable files and identify their differences or similarities without access to source code. The potential applications of binary diffing in software security tasks, such as vulnerability search, code clone detection, and malware analysis, have generated a vast body of literature in recent years. One of the recurring themes in binary diffing research is the evaluation of its resilience against compiler optimization, which is the most common source of syntactic differences in binary code. Although most binary diffing tools claim to be immune to compiler optimization, recent studies have highlighted the need for the research community to revisit this claim, particularly regarding non-default optimization settings and function inlining. In this study, we investigate the effect of peephole optimization on binary diffing analysis. Peephole optimization is a feature of mainstream compilers that allows local rewriting of the input program. It replaces instruction sequences within a window (i.e., a peephole) with functionally equivalent sequences that are shorter or faster. Our research reveals that peephole optimization primarily affects binary code differences at the intra-procedural level, which contradicts the assumptions made by basic-block-centric comparison approaches. We conducted systematic experiments using LLVM’s unit test suite. We also customized Alive2, an LLVM translation validation tool, to isolate the impact of peephole optimization from the overall optimization process. Our investigation determines how pervasive peephole optimization is in the compiled code and explores its effects on current binary diffing techniques. The noticeable decline in the performance of these techniques highlights the importance of considering peephole optimization in the analysis and improvement of binary diffing methodologies.
Therefore, our findings suggest that researchers and practitioners should consider the impact of peephole optimization when developing and evaluating binary diffing tools. Further research is necessary to address this challenge and improve the effectiveness of binary diffing in various software security tasks.
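As a rough illustration of the kind of local rewriting the abstract describes, the following is a minimal sketch of a peephole pass over a toy instruction list. The tuple-based instruction format, the `peephole` function name, and the three rewrite rules are assumptions made for this example only; they are not taken from the dissertation, from LLVM, or from Alive2, and real compilers apply far more rules with stricter correctness conditions.

```python
from typing import List, Tuple

# Hypothetical toy instruction format: (opcode, operand, ...).
Instr = Tuple[str, ...]


def peephole(instrs: List[Instr]) -> List[Instr]:
    """Apply a few illustrative peephole rules in a single forward pass."""
    out: List[Instr] = []
    for ins in instrs:
        # Rule 1 (window of 1): multiply by 2 becomes a left shift by 1.
        if ins[0] == "mul" and ins[2] == "2":
            ins = ("shl", ins[1], "1")
        # Rule 2 (window of 1): adding 0 has no effect in this toy model.
        if ins[0] == "add" and ins[2] == "0":
            continue
        # Rule 3 (window of 2): "push r" directly followed by "pop r" cancels.
        if out and out[-1][0] == "push" and ins[0] == "pop" and out[-1][1] == ins[1]:
            out.pop()
            continue
        out.append(ins)
    return out


before = [
    ("mul", "eax", "2"),
    ("add", "ebx", "0"),
    ("push", "ecx"),
    ("pop", "ecx"),
    ("ret",),
]
after = peephole(before)
print(after)
```

Even in this toy setting, the rewritten sequence differs from the original in both opcodes and length, which hints at why instruction-level and basic-block-level comparisons can be sensitive to such rewrites.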
Keywords
Compiler optimization, Binary code, Peephole optimization
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Ren, Xiaolei, "INVESTIGATING THE EFFECT OF PEEPHOLE OPTIMIZATIONS ON BINARY CODE DIFFERENCES" (2023). Computer Science and Engineering Dissertations. 334.
https://mavmatrix.uta.edu/cse_dissertations/334
Comments
Degree granted by The University of Texas at Arlington