Research Article

Input-Output Example-Guided Data Deobfuscation on Binary

Figure 2

Overview of AutoSimpler. It consists of an obfuscation detector, a program synthesizer, and a search engine. The obfuscation detector uses a machine learning model to locate obfuscated code snippets in the target program. The program synthesizer works in four steps: (1) Collect a set of input-output examples for the target program. (2) Generate grammar constraints from the given components and use the grammar to guide the generation of candidate programs. (3) Select an input value from the set of input-output examples and compute the candidate program's output on that input. (4) Check whether the candidate program and the target program produce the same output on the same input. The search engine uses Nested Monte Carlo Search, a heuristic search algorithm, to improve the accuracy and efficiency of program synthesis.
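
The four synthesizer steps in the caption can be illustrated with a minimal Python sketch. It is not AutoSimpler's implementation: the component set `COMPONENTS`, the stand-in `target` function (a mixed Boolean-arithmetic encoding of `x + y`), the 32-bit word size, and all helper names are assumptions made for illustration. The sketch also replaces Nested Monte Carlo Search with plain depth-bounded enumeration, so it shows only the example-guided check, not the heuristic search.

```python
import itertools
import random

MASK = 0xFFFFFFFF  # assumption: 32-bit machine words

# Hypothetical component set: operator name -> semantics on 32-bit words.
COMPONENTS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def target(x, y):
    """Stand-in for an obfuscated snippet: an MBA encoding of x + y."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def collect_examples(f, n_random=20, seed=0):
    """Step 1: run the target on fixed corner inputs plus random inputs."""
    rng = random.Random(seed)
    inputs = [(0, 0), (1, 2), (MASK, 1)]
    inputs += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(n_random)]
    return [(inp, f(*inp)) for inp in inputs]


def candidates(depth):
    """Step 2: enumerate programs from the grammar E -> x | y | (E op E)."""
    if depth == 0:
        yield "x"
        yield "y"
        return
    subs = list(candidates(depth - 1))
    yield from subs
    for op in COMPONENTS:
        for left, right in itertools.product(subs, repeat=2):
            yield (op, left, right)


def evaluate(expr, x, y):
    """Step 3: compute a candidate's output on one input."""
    if expr == "x":
        return x
    if expr == "y":
        return y
    op, left, right = expr
    return COMPONENTS[op](evaluate(left, x, y), evaluate(right, x, y))


def synthesize(examples, max_depth=2):
    """Step 4: return the first candidate that agrees with every example."""
    for depth in range(max_depth + 1):
        for cand in candidates(depth):
            if all(evaluate(cand, *inp) == out for inp, out in examples):
                return cand
    return None


if __name__ == "__main__":
    print(synthesize(collect_examples(target)))  # ('+', 'x', 'y')
```

Agreement on a finite example set does not prove equivalence, so a candidate found this way would still need separate verification. Exhaustive enumeration also grows quickly with depth, which is the cost a guided search such as Nested Monte Carlo Search is meant to reduce.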