The paper introduces a binary format augmented with compiler-generated metadata that records which bytes are intended as executable instructions and how memory regions are bounded, with the aim of improving software safety and maintainability. A tool generates this metadata and inserts it into the binary, enabling accurate lifting to higher-level representations and more reliable analysis. Evaluation on real-world C and C++ binaries shows no effect on runtime behavior or performance and successful recompilation, with metadata roughly 17% of the size of DWARF.
Binaries don't have to be opaque: compiler-generated metadata can unlock accurate disassembly and recompilation without performance overhead.
The binary executable format is the standard method for distributing and executing software. Yet it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data the compiler intends to be executable instructions, or how memory regions are expected to be bounded, the safety and maintainability of software would improve dramatically. In this paper, we propose a binary format that is a middle ground between a stripped black-box binary and open source. We provide a tool that generates metadata capturing the compiler's intent and inserts it into the binary. This metadata enables lifting to a correct and recompilable higher-level representation and makes analysis and instrumentation more reliable. Our evaluation shows that adding metadata does not affect runtime behavior or performance. Our metadata is roughly 17% of the size of DWARF. We validate correctness by compiling a comprehensive set of real-world C and C++ programs and demonstrating that the resulting binaries can be lifted, instrumented, and recompiled without altering their behavior.
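To make the two kinds of metadata named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual format, which the abstract does not specify. All names here (`CodeRange`, `RegionBound`, `BinaryMetadata`, `is_code`, `in_bounds`) and the addresses are hypothetical. The sketch assumes the metadata can be modeled as a list of address ranges the compiler emitted as instructions plus a list of named memory regions with a base and a size, and shows how an analysis tool might query them.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CodeRange:
    """Hypothetical record: bytes the compiler emitted as instructions."""
    start: int  # address of the first instruction byte
    end: int    # one past the last instruction byte


@dataclass(frozen=True)
class RegionBound:
    """Hypothetical record: expected bounds of a memory region."""
    name: str
    base: int
    size: int


@dataclass
class BinaryMetadata:
    """Hypothetical container for the two kinds of metadata."""
    code_ranges: List[CodeRange] = field(default_factory=list)
    regions: List[RegionBound] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Sort once so lookups can use binary search.
        self.code_ranges.sort(key=lambda r: r.start)
        self._starts = [r.start for r in self.code_ranges]

    def is_code(self, addr: int) -> bool:
        """True if addr falls inside a range marked as instructions."""
        i = bisect_right(self._starts, addr) - 1
        return i >= 0 and addr < self.code_ranges[i].end

    def in_bounds(self, name: str, addr: int, width: int = 1) -> bool:
        """True if [addr, addr + width) lies inside the named region."""
        for r in self.regions:
            if r.name == name:
                return r.base <= addr and addr + width <= r.base + r.size
        return False  # unknown region: treat as out of bounds


# Made-up layout: two code ranges with a jump table stored between them.
meta = BinaryMetadata(
    code_ranges=[CodeRange(0x1000, 0x1040), CodeRange(0x1080, 0x10C0)],
    regions=[RegionBound("jump_table", 0x1040, 0x40)],
)

code_hit = meta.is_code(0x1010)                          # inside first code range
data_hit = meta.is_code(0x1050)                          # inside the jump table
access_ok = meta.is_code(0x1080) and meta.in_bounds("jump_table", 0x1040, 8)
overrun = meta.in_bounds("jump_table", 0x107C, 8)        # crosses the region end
```

Under these assumptions, a disassembler would not need to guess whether the bytes at `0x1050` are instructions or data interleaved with code, and an instrumentation pass could reject the 8-byte access at `0x107C` because it runs past the end of the region.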