Preface
This paper introduces the typical problems of the fuzzy closed source PDF reader and the possible solutions to these problems. This paper focuses on two points: minimizing input and terminating the target program in time.
text
The core idea of fuzzy PDF reader is very simple: just take a PDF file, slightly destroy (modify part of PDF binary stream bytes) it and check whether it will crash the reader. It sounds very simple, but it's very difficult to do it correctly and effectively. Pdf file format is one of the most used and important formats. As a result, the vulnerability of PDF reader has been widely used and there are still many Exps developed in 2018. By looking at the number of bugs reported to mainstream PDF readers, there are still many enhancements to be made. Therefore, I want to make some contribution to the security of PDF reader and the fuzzy testing community.
The common problems encountered in the fuzzy PDF reader are as follows:
- Under what circumstances can fuzzer be sure that no crash has occurred?
- The PDF reader never shows that it has finished parsing or rendering the given PDF. When can the application close?
- Which PDFs should be selected as mutation templates?
- The selected PDF should cover as much of the target code as possible. If you can't get the source code, how to calculate the code coverage effectively?
Problem: unable to terminate program
The termination of normal fuzzy is to signal to fuzzy that it has finished processing and has not crashed. This is important for fuzzer because it means that you can now start the next test iteration. The problem with PDF readers is that they never terminate themselves, so fuzziers lack a metric to determine when the next test case can start.
Most of the existing fuzziers do about two things. They either use hard coding to detect timeout, kill the application if there is no crash, or they constantly poll the target CPU cycles and assume that the program can be shut down below a certain threshold.
For both: timeouts and thresholds must be precise, but more or less predictable. For fuzzer, this means that it may close the application too early (it may lose crash) or too late (it may waste time).
Method: how to terminate the program
Our idea is to find the last basic block of the reader and execute it when it is given valid input. The assumption here is that this basic block will only execute if the target reader fully parses and renders a given PDF. Starting with the next input, this block must be patched by terminating the program.
To find out which basic blocks of a program have been executed at run time, researchers used a concept called program tracing. Our idea is to have the target generate additional information about its execution (traces), such as memory usage, using branches or basic blocks of execution. Because the target is not creating this information, you must add a description to it. This process is called "program instrumentation.". In an open source environment, the target program can simply recompile using other compiler extensions, such as address sanitizer (asan), which are responsible for adding checks at compile time. Obviously, this is not possible for a closed-source PDF reader.
Fortunately, we have an amazing framework called dynamorio, which does not require any source code to apply because it detects programs at run time (dynamic binary detection).
drrun.exe -t drcov -dump_text -- Program.exe
The created program trace looks like this:
As you can see, tracing shows which module's base block has been executed, and it preserves the order of the base blocks, which makes it quite easy to determine the last base block. Therefore, the last basic block executed by the target PDF reader is found, and multiple traces are created by providing it with different but effective PDF files. Then we will find that there is a common basic block near the end of the tracking link, which must be detected.
Unfortunately, traces are written to disk only after the program exits, so you must use the (high) hard code timeout here.
Now that the last common base block is found, it needs to be patched in order to terminate the program. This can be achieved by covering the basic block:
Xor eax, eax
push eax
Push Address_Of_ExitProcess
ret
The problem here is that it needs nine bytes to represent these instructions. If the size of the base block is not 9 bytes, subsequent instructions will be destroyed. To solve this problem, you can add a new executable to the PE file, which contains the above instructions. Therefore, you can patch the base block by jumping to the newly added section:
push SectionAddress
ret
To patch the target, you can use the framework lief, which makes it fairly easy to change a given PE file.
Obviously, though, it's much easier to patch basic blocks with breakpoints, a single byte instruction. Many of the existing Fuzzers depend on the termination of the program, so they cannot be used to locate PDF readers. Exit detection can be applied here.
This method is automated and has been successfully used in multiple PDF and image viewers:
- FoxitReader
- PDFXChangeViewer
- XNView
The following screenshot shows the patches at the assembly level. Note that the patched version will return to the newly added section.
Input minimization
The success of fuzzing depends largely on the initial input set (Corpus). Therefore, it is important to ensure that the corpus covers as much of the target code as possible, as this increases the chance of finding bugs in it. In addition, redundancy in the corpus must be avoided so that each PDF triggers the unique behavior of the target.
A common method for this is called corpus distillation. The core idea is to collect a large number of effective input first. Then, for each input, the code coverage of the base block is measured, and if the input only triggers the base block that the previous input has accessed, it is removed from the collection.
corpus = []
inputs = [I1, I2, .... In]
for input in inputs:
new_blocks = execute(program, input)
if new_blocks:
corpus.append(input)
Again, you need to create a program trace. Since there is no source code, dynamic binary detection seems to be the only choice for measuring basic block code coverage.
The problem here is that dynamic binary detection seems to incur unacceptable overhead.
To prove this, we patched FoxitReader with the auto exit method and measured the auto termination time of FoxitReader.
- Vanilla: 1,5 seconds
- DynamoRIO: 6,4 seconds
In this case, dynamic binary detection will lead to a cost of nearly 5 seconds, which is too high to perform effective corpus distillation.
Solution
Because dynamic binary detection is too slow to perform corpus distillation, another method must be found to measure basic block code coverage. The idea consists of two parts:
- Static binary detection
- Create a custom debugger to handle detection
First, each basic block in the target is patched with a breakpoint (one byte instruction (0xCC)). Static patches are applied to binaries on disk. If any basic block is executed, it triggers a breakpoint event (int3), which can be obtained by the supervising debugger. The debugger takes the int3 event and overwrites the breakpoint in both (the binary on disk and the binary in the address space of the original byte). Finally, the target's instruction pointer decrements by 1 and resumes execution.
The following screenshot shows the basic blocks of detection:
With breakpoints, the debugger can easily identify which basic blocks have been executed.
To evaluate the performance of this method, all basic blocks of FoxitReader are patched with breakpoints (1778291 basic blocks)
In the first iteration, FoxitReader took 16 seconds to exit. This is 10 seconds slower than dynamorio. But because breakpoints are undone in binaries on disk, they will never trigger int3 events again. Therefore, it can be assumed that most breakpoints have been recovered after the first iteration, so the overhead should be reasonable.
- First iteration: 16 seconds (48323 breakpoints)
- Second iteration: 2 seconds (2212 breakpoints)
- The third and the following: ~ 1.5 seconds (few breakpoints)
As you can see, after the first iteration, detection has minimal overhead, but the debugger is still able to identify any newly accessed base blocks.
This method has been tested on the main products and runs perfectly in all aspects:
- Adobe Acrobat Reader
- PowerPoint
- FoxitReader
Fuzzing
Through crawling the Internet, 80000 PDFs were collected, and the collection was minimized to 220 unique PDFs, which took about 1.5 days. Use this minimization setting to perform a blur test. The result is very good, and all crashes are pushed to the database:
Finally, fuzzer found 43 unique crashes in about 2 months, three of which reported to zero day initiative. They are assigned the following IDS:
- Zdi-can-7423: out of Bounds Read Remote Code Execution Vulnerability in Foxit reader PDF parsing
- Zdi-can-7353: out of bounds read information disclosure vulnerability of font reader PDF parsing
- Zdi-can-7073: out of bounds read information disclosure vulnerability of font reader PDF parsing