Hacking Book | Free Online Hacking Learning


safe customer, safe information platform

Posted by chiappelli at 2020-02-24


This paper introduces the typical problems of the fuzzy closed source PDF reader and the possible solutions to these problems. This paper focuses on two points: minimizing input and terminating the target program in time.


The core idea of fuzzy PDF reader is very simple: just take a PDF file, slightly destroy (modify part of PDF binary stream bytes) it and check whether it will crash the reader. It sounds very simple, but it's very difficult to do it correctly and effectively. Pdf file format is one of the most used and important formats. As a result, the vulnerability of PDF reader has been widely used and there are still many Exps developed in 2018. By looking at the number of bugs reported to mainstream PDF readers, there are still many enhancements to be made. Therefore, I want to make some contribution to the security of PDF reader and the fuzzy testing community.

The common problems encountered in the fuzzy PDF reader are as follows:

Problem: unable to terminate program

The termination of normal fuzzy is to signal to fuzzy that it has finished processing and has not crashed. This is important for fuzzer because it means that you can now start the next test iteration. The problem with PDF readers is that they never terminate themselves, so fuzziers lack a metric to determine when the next test case can start.

Most of the existing fuzziers do about two things. They either use hard coding to detect timeout, kill the application if there is no crash, or they constantly poll the target CPU cycles and assume that the program can be shut down below a certain threshold.

For both: timeouts and thresholds must be precise, but more or less predictable. For fuzzer, this means that it may close the application too early (it may lose crash) or too late (it may waste time).

Method: how to terminate the program

Our idea is to find the last basic block of the reader and execute it when it is given valid input. The assumption here is that this basic block will only execute if the target reader fully parses and renders a given PDF. Starting with the next input, this block must be patched by terminating the program.

To find out which basic blocks of a program have been executed at run time, researchers used a concept called program tracing. Our idea is to have the target generate additional information about its execution (traces), such as memory usage, using branches or basic blocks of execution. Because the target is not creating this information, you must add a description to it. This process is called "program instrumentation.". In an open source environment, the target program can simply recompile using other compiler extensions, such as address sanitizer (asan), which are responsible for adding checks at compile time. Obviously, this is not possible for a closed-source PDF reader.

Fortunately, we have an amazing framework called dynamorio, which does not require any source code to apply because it detects programs at run time (dynamic binary detection).

drrun.exe -t drcov -dump_text -- Program.exe

The created program trace looks like this:

As you can see, tracing shows which module's base block has been executed, and it preserves the order of the base blocks, which makes it quite easy to determine the last base block. Therefore, the last basic block executed by the target PDF reader is found, and multiple traces are created by providing it with different but effective PDF files. Then we will find that there is a common basic block near the end of the tracking link, which must be detected.

Unfortunately, traces are written to disk only after the program exits, so you must use the (high) hard code timeout here.

Now that the last common base block is found, it needs to be patched in order to terminate the program. This can be achieved by covering the basic block:

Xor eax, eax push eax Push Address_Of_ExitProcess ret

The problem here is that it needs nine bytes to represent these instructions. If the size of the base block is not 9 bytes, subsequent instructions will be destroyed. To solve this problem, you can add a new executable to the PE file, which contains the above instructions. Therefore, you can patch the base block by jumping to the newly added section:

push SectionAddress ret

To patch the target, you can use the framework lief, which makes it fairly easy to change a given PE file.

Obviously, though, it's much easier to patch basic blocks with breakpoints, a single byte instruction. Many of the existing Fuzzers depend on the termination of the program, so they cannot be used to locate PDF readers. Exit detection can be applied here.

This method is automated and has been successfully used in multiple PDF and image viewers:

The following screenshot shows the patches at the assembly level. Note that the patched version will return to the newly added section.

Input minimization

The success of fuzzing depends largely on the initial input set (Corpus). Therefore, it is important to ensure that the corpus covers as much of the target code as possible, as this increases the chance of finding bugs in it. In addition, redundancy in the corpus must be avoided so that each PDF triggers the unique behavior of the target.

A common method for this is called corpus distillation. The core idea is to collect a large number of effective input first. Then, for each input, the code coverage of the base block is measured, and if the input only triggers the base block that the previous input has accessed, it is removed from the collection.

corpus = [] inputs = [I1, I2, .... In] for input in inputs: new_blocks = execute(program, input) if new_blocks: corpus.append(input)

Again, you need to create a program trace. Since there is no source code, dynamic binary detection seems to be the only choice for measuring basic block code coverage.

The problem here is that dynamic binary detection seems to incur unacceptable overhead.

To prove this, we patched FoxitReader with the auto exit method and measured the auto termination time of FoxitReader.

In this case, dynamic binary detection will lead to a cost of nearly 5 seconds, which is too high to perform effective corpus distillation.


Because dynamic binary detection is too slow to perform corpus distillation, another method must be found to measure basic block code coverage. The idea consists of two parts:

First, each basic block in the target is patched with a breakpoint (one byte instruction (0xCC)). Static patches are applied to binaries on disk. If any basic block is executed, it triggers a breakpoint event (int3), which can be obtained by the supervising debugger. The debugger takes the int3 event and overwrites the breakpoint in both (the binary on disk and the binary in the address space of the original byte). Finally, the target's instruction pointer decrements by 1 and resumes execution.

The following screenshot shows the basic blocks of detection:

With breakpoints, the debugger can easily identify which basic blocks have been executed.

To evaluate the performance of this method, all basic blocks of FoxitReader are patched with breakpoints (1778291 basic blocks)

In the first iteration, FoxitReader took 16 seconds to exit. This is 10 seconds slower than dynamorio. But because breakpoints are undone in binaries on disk, they will never trigger int3 events again. Therefore, it can be assumed that most breakpoints have been recovered after the first iteration, so the overhead should be reasonable.

As you can see, after the first iteration, detection has minimal overhead, but the debugger is still able to identify any newly accessed base blocks.

This method has been tested on the main products and runs perfectly in all aspects:


Through crawling the Internet, 80000 PDFs were collected, and the collection was minimized to 220 unique PDFs, which took about 1.5 days. Use this minimization setting to perform a blur test. The result is very good, and all crashes are pushed to the database:

Finally, fuzzer found 43 unique crashes in about 2 months, three of which reported to zero day initiative. They are assigned the following IDS: