Posts tagged "Fuzzing"

Binspector: Evolving a Security Tool

Binary formats and files are inescapable. Although optimal for computers, they are impractical for the typical developer to understand. Binspector was born when I found myself scouring JPEG metadata blocks to make sure they were telling consistent stories. The tool’s introspection capabilities have since transformed it into an intelligent fuzzing utility, and I am excited to share it in the hope that others will benefit.

I joined Photoshop core engineering in 2008. One of my first tasks was to improve the metadata policies of our Save for Web plugin. When a file was being saved, the exporter would embed metadata in a mash-up of up to three formats (IPTC/IIM, Exif, and XMP). The problem was that the outgoing metadata blocks were inconsistent and oftentimes held conflicting data within the same file. High-granularity details like the metadata source and any conflicting values are lost by the time a GUI presents them. No image viewer (Photoshop included) has such a high degree of introspection.

I ended up writing several tools to handle the binary formats I was interested in, and it was not long before I noticed commonalities among them. I cobbled together a domain-specific language that let me define the interpretation of individual fields in a binary file as well as the relationships between them. Each format description formed an abstract syntax tree, and when combined with a binary that fit the format, I could suss out the meaning of any bit.

It was at this point that Binspector started to take shape.

Once a format grammar has been built, analysis of a file becomes quite interesting. For example, any binary file can be validated against a format grammar, and if validation fails, Binspector can give a detailed account of where the grammar and the binary differ.
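To make that concrete, here is a minimal Python sketch of the idea (the toy grammar and field names are invented for illustration; Binspector's actual DSL looks quite different): each field carries a type and a predicate, later fields can reference earlier ones, and a failure is reported with the exact field and offset where the binary stops matching.

```python
import struct

# Hypothetical toy grammar, not Binspector's DSL: each field has a
# name, a struct format (None = rest of file), and a predicate that
# may reference previously parsed fields via the ctx dictionary.
GRAMMAR = [
    ("magic",  "4s", lambda v, ctx: v == b"DEMO"),
    ("width",  ">H", lambda v, ctx: v > 0),
    ("height", ">H", lambda v, ctx: v > 0),
    ("pixels", None, lambda v, ctx: len(v) == ctx["width"] * ctx["height"]),
]

def validate(data: bytes) -> str:
    ctx, offset = {}, 0
    for name, fmt, check in GRAMMAR:
        if fmt is not None:
            size = struct.calcsize(fmt)
            value = struct.unpack_from(fmt, data, offset)[0]
        else:
            size = len(data) - offset          # consume the remainder
            value = data[offset:offset + size]
        if not check(value, ctx):
            return f"mismatch in '{name}' at offset {offset}: {value!r}"
        ctx[name] = value
        offset += size
    return "file matches grammar"

print(validate(b"DEMO" + struct.pack(">HH", 2, 2) + b"\x00" * 4))
```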

Binspector evolved into a security tool when I related its analytical capability to fuzzing. The Photoshop team invests heavily in its stability, and corrupted file formats are a well-known attack vector. Fuzzing for the most part has a “spray and pray” heuristic: throw gobs of (nearly) random data at an application, and see what sticks. The problem with this method is that one has to generate a lot of candidate files to get input code to fail unexpectedly.

By adding knowledge to the fuzzer, might we increase the ‘interesting failure’ rate? For example, what would happen if I took a perfectly fine 100×100-pixel image and set just the width bytes to, say, 255? Finding and fuzzing that specific a target would require a tool that had introspection into a binary file format: exactly what Binspector had been written to do!

The key insight was to have Binspector flag a binary field every time it had been used to further read the file. Need to know the width of the document? Flag the width. Want to know the count of Exif fields? Flag the count. At the end of analysis, an array of potential weak points had been generated. The next phase was to generate exact copies of the known-good file with these specific weak points modified in various ways.
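In sketch form, the idea looks something like this (a hypothetical record format standing in for what Binspector derives from a real grammar): every field whose value steers further reading gets flagged as a weak point, then mutants alter just those bytes in otherwise exact copies of the known-good file.

```python
import struct

# Hypothetical format: a 2-byte record count, then length-prefixed
# records. The count and each length steer further reads, so they
# are flagged as weak points worth fuzzing individually.
def parse_and_flag(data: bytes):
    weak_points = [(0, 2)]                       # the count drives the loop
    count = struct.unpack_from(">H", data, 0)[0]
    offset = 2
    for _ in range(count):
        weak_points.append((offset, 2))          # each length drives a read
        length = struct.unpack_from(">H", data, offset)[0]
        offset += 2 + length                     # skip the record body
    return weak_points

def mutants(data: bytes, weak_points):
    for off, _size in weak_points:
        for value in (0, 1, 0xFF, 0xFFFF):
            copy = bytearray(data)
            struct.pack_into(">H", copy, off, value)
            yield bytes(copy)

good = struct.pack(">HH", 2, 3) + b"abc" + struct.pack(">H", 1) + b"z"
for mutant in mutants(good, parse_and_flag(good)):
    pass  # feed each mutant to the target application; watch for crashes
```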

A rising tide lifts all boats, and this is certainly true in the domain of application stability and security. Improving the robustness of an application can only improve the general condition of the operating environment. That is one of the reasons why I am elated to release Binspector as open-source software. My long-term vision is to build up a body of binary format grammars that anyone can use to improve the reliability of input code.

Foster Brereton
Sr. Computer Scientist

Adobe @ NullCon Goa 2015

The ASSET team in Noida recently attended NullCon, a well-known Indian information-security conference held in Goa. My team and I attended trainings on client-side security, malware analysis, mobile pen-testing, and fuzzing, delivered by industry experts in their respective fields. The training I found particularly helpful was the one on client-side security by Mario Heiderich. It revealed several interesting aspects of browser parsing engines. Mario demonstrated various ways XSS protections can be defeated, and how modern JavaScript frameworks like AngularJS can also expand the attack surface. This knowledge can help us build better protective “shields” for web applications.

Out of the two night talks, the one I found most interesting was on the Google fuzzing framework. The speaker, Abhishek Arya, discussed how fuzz testing for Chrome is scaled using a large, automated infrastructure that reveals exploitable bugs with minimal human intervention. During the main conference, I attended a couple of good talks on topics such as the “sandbox paradox,” an attacker’s perspective on ECMA-2015, drone attacks, and the Cuckoo sandbox. James Forshaw’s talk on sandboxing was of particular interest, as it offered useful insight into how sandboxes on the Windows platform can be strengthened using its specialized APIs. Another beneficial session was Jurriaan Bremer’s on the Cuckoo sandbox, where he demonstrated how his tool can be used to automate the analysis of malware samples.

Day 2 started with keynote sessions from Paul Vixie (Farsight Security) and Katie Moussouris (HackerOne). A couple of us also attended a lock-picking workshop, where we were given picks for some well-known lock types and walked through the process of picking them. We succeeded in opening quite a few locks. I also played Bug Bash along with Gineesh (EchoSign team) and Abhijeth (IT team), where we were given live targets in which to find vulnerabilities. We found a couple of critical issues, winning our team some nice prize money. :-)

Adobe has been a sponsor of NullCon for several years. At this year’s event, we were seeking suitable candidates for openings on our various security teams. Between talks, we assisted our HR team at the Adobe booth, explaining the technical aspects of our jobs to prospective candidates, and we got many attendees interested in our open positions.

Overall, the conference was a perfect blend of learning, technical discussion, networking, and fun.


Vaibhav Gupta
Security Researcher, ASSET

CanSecWest 2012

The team and I are about to head off to CanSecWest. While I have been attending CanSecWest for several years, this year will be a unique experience for me. During my talk, I will demo an open-source tool I just released, called Adobe SWF Investigator. The tool can be useful for developers, quality engineers and security professionals for analyzing SWF applications. It has been a pet project of mine for some time, and I decided to share it with a broader audience.

Within my current role, I have to look at all aspects of SWF applications, from cross-site scripting issues to binary analysis. Therefore, the tool includes capabilities for everything from cross-site scripting tests to viewing the individual SWF tags within the file format. I am hoping that releasing the tool as an open-source ActionScript application will encourage all ActionScript developers to learn more about security. The tool is designed to be an extensible framework everyone can build upon or modify. More information on the tool can be found in my DevNet article.
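As a taste of what tag-level inspection involves, here is a hedged Python sketch of walking the tag list of an uncompressed SWF, based on the published SWF format (this is a from-scratch illustration, not code from SWF Investigator):

```python
import struct

# Walks the tag list of an uncompressed ("FWS") SWF. Compressed
# CWS/ZWS bodies must be inflated before applying the same logic.
def swf_tags(data: bytes):
    assert data[:3] == b"FWS", "sketch handles uncompressed SWF only"
    version = data[3]                               # SWF version byte
    file_len = struct.unpack_from("<I", data, 4)[0]
    # Skip the variable-length stage RECT: the first 5 bits give the
    # per-field bit width of the four coordinate fields that follow.
    nbits = data[8] >> 3
    rect_bytes = (5 + 4 * nbits + 7) // 8
    offset = 8 + rect_bytes + 2 + 2                 # RECT, frame rate, count
    while offset < file_len:
        code_and_len = struct.unpack_from("<H", data, offset)[0]
        code, length = code_and_len >> 6, code_and_len & 0x3F
        offset += 2
        if length == 0x3F:                          # long form: 32-bit length
            length = struct.unpack_from("<I", data, offset)[0]
            offset += 4
        yield code, length                          # e.g. 69 = FileAttributes
        if code == 0:                               # End tag
            break
        offset += length
```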

In addition to demonstrating the tool, I will also be talking about Advanced Persistent Response. Adobe has been the focus of hackers for some time, and I plan to discuss what we have learned and observed in the process of responding to those threats. My talk will be on Wednesday at 3:30pm, if you are interested. When I am not speaking, you can probably find me and the Adobe team either at the Adobe table or milling around the pwn2own contest for no particular reason. Please feel free to come by and talk with us. See you there!

Fuzzing Reader – Lessons Learned

Kyle Randolph here. Today we have a guest post by Pat Wibbeler, who led a fuzzing tiger team as part of the Adobe Reader and Acrobat Security Initiative.
As a Software Engineering Manager on Adobe Reader, I take it personally every time an exploit is found in our code. I’ve always taken pride in my work, and I’m particularly proud to work with many brilliant engineers on a product installed on hundreds of millions of desktops.
This ubiquity makes Adobe Reader an appealing target for attack. We’re going to great lengths to protect our users through the Adobe Reader and Acrobat Security Initiative. One major component of this ongoing initiative is application fuzzing. I led a task force in fuzzing Reader when the security initiative began. We have always had fuzzing efforts within the Reader team, but this project scaled our efforts in key areas. I thought I’d share some of our experiences.
First – What is “Fuzzing?” One of the most common sources of application vulnerabilities is improper input validation. When developers fail to validate input data completely, an attacker may be able to craft application input that allows the attacker to gain control of the application or system. PDF files are the most common type of input consumed by Adobe Reader. Developers and testers intuitively test PDF file format parsing with valid cases – can Reader properly render a PDF generated by Adobe Acrobat or another PDF generator? Fuzzing automatically provides invalid and unexpected PDF data to an application, probing for cases where the PDF format may be poorly validated. For more information, see the Wikipedia entry on fuzzing.
There are several existing fuzzers, and all fuzzers work basically like this:
[Figure: high-level PDF fuzzer]
Valid data is fed into a fuzzer. The fuzzer mutates the data, creating invalid data, and feeds it to the program, monitoring for a crash or fault. In our case, the fuzzer mutates a PDF file and launches Adobe Reader, monitoring and logging any crashes or faults. Crashes are then analyzed and prioritized according to whether or not they are likely to be exploitable.
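A bare-bones version of that loop might look like the sketch below (illustrative only: the reader path is a placeholder, and treating “killed by a signal” as the crash criterion is an assumption; a production harness attaches a debugger/monitor the way Peach does rather than inspecting exit codes):

```python
import random
import subprocess

READER = "/path/to/reader"  # hypothetical target binary

def flip_random_bytes(data: bytes, n: int = 8) -> bytes:
    # Crude mutation: overwrite a handful of random bytes.
    out = bytearray(data)
    for _ in range(n):
        out[random.randrange(len(out))] = random.randrange(256)
    return bytes(out)

def fuzz_once(seed_pdf: str, case_path: str) -> bool:
    with open(seed_pdf, "rb") as f:
        data = f.read()
    with open(case_path, "wb") as f:
        f.write(flip_random_bytes(data))
    try:
        proc = subprocess.run([READER, case_path], timeout=10)
    except subprocess.TimeoutExpired:
        return False               # a hang, not a crash; move on
    return proc.returncode < 0     # killed by a signal: keep for analysis
```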
When we began, we set the following general plan: first, select a fuzzer, then select areas to target. Next, put the fuzzer into action by developing tools, and then build and run models on newly developed infrastructure.
[Figure: fuzzing process]
Fuzzers have several distinguishing features, but the most important for us are:

  • Data Mutation Strategies – How does the fuzzer manipulate input data?
  • Extension Mechanisms – Can the user extend the fuzzer to provide richer mutations or data definition?
  • Monitoring/Logging – How does the fuzzer monitor and sort faults?

The primary fuzzing engine we use is Michael Eddington’s Peach Fuzzer, a tool also used by other teams at Adobe. Peach provides:

  • A rich set of data mutators
  • Several mechanisms for extension and modification (including extending the engine itself since it’s open source)
  • Excellent Monitoring/Logging, using the !exploitable (bang exploitable) tool from Microsoft.

Select Targets
It’s critical to identify and prioritize specific targets during fuzzing. It’s extremely tempting to say there is only one target – “PDF.” If you look at the PDF specification, it quickly becomes clear how rich, complex, and extensive the format is. It contains compressed streams, rich cross-referential structures, multimedia data, Flash content, annotations, forms, and more. Fuzzing PDF is as complex as the file format itself.
We set the following goal:
Think like an attacker – target the areas of PDF that an attacker is likely to target
Since the Peach Fuzzer is a flexible, robust tool, we’re able to systematically focus on different areas of the PDF file format, covering the most likely targets first.
Build Models
Peach performs fuzzing mutations based on a model built by the developer. A model is an XML description of the data being attacked. Peach calls these “Pit” files. These models can describe the data in detail or simply treat the data as a blob. Peach mutates blob data by flipping bits and sliding specific data values through the blob one byte at a time. We first built simple models that treat streams as blobs. We referred to these as “dumb” or “naïve” models. These simple models were effective, and for me as a Reader developer, humbling.
We followed each simple model with a “smart” model. The Peach Pit format can describe relationships and offsets between parts of the fuzzed data. For instance, a 4-byte field may describe the length or offset of a stream that follows later. Peach will intentionally provide values for this 4-byte field that are at or near common range limits and truncate or lengthen the stream data in an attempt to overrun buffer lengths that may be naively set using the data input from the stream. These models are also quite effective.
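In sketch form, the length-relationship trick looks something like this (an invented record layout, and Python rather than actual Peach Pit XML):

```python
import struct

# The model "knows" that the 4-byte little-endian field at length_off
# holds the length of the stream that follows, so it can lie about
# the length or truncate the stream, instead of blindly flipping bits
# the way a blob model would.
BOUNDARY = [0, 1, 0x7FFF, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]

def smart_mutants(data: bytes, length_off: int, stream_off: int):
    stream_len = struct.unpack_from("<I", data, length_off)[0]
    for v in BOUNDARY:
        mutant = bytearray(data)
        struct.pack_into("<I", mutant, length_off, v)  # lie about the length
        yield bytes(mutant)
    # Keep the stated length, but actually truncate the stream.
    yield data[: stream_off + stream_len // 2]
```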
Peach Pit models can describe data in two ways:

  • Generational Models – Generational models produce data only from the description of the data. For instance, the description might indicate a fixed-width bit-field of flags followed by a stream with properties indicated by those flags. A generational fuzzer might simply create a random bit-field and a series of random bytes.
  • Mutational Models – Mutational models consume template data based on the description of that data. A varied data set can be fed to these fuzzers, and each input is mutated according to the model. Mutational fuzzers have the distinct advantage that you can create a completely new fuzzing run simply by feeding the mutational model new input data. For instance, the same JPEG mutational model might consume any existing JPEG, mutating it differently based on properties unique to that file (e.g. color depth, or whether or not the image is interlaced).

We use both generational and mutational models in our testing, broadening our coverage as much as possible.
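To make the distinction concrete, here is a toy Python sketch of both styles over the same invented record (one flag byte, a one-byte length, then a stream); it is purely illustrative:

```python
import random

def generational() -> bytes:
    # Built purely from the data description: every field is invented.
    flags = random.randrange(256)
    stream = bytes(random.randrange(256) for _ in range(random.randrange(64)))
    return bytes([flags, len(stream)]) + stream

def mutational(template: bytes) -> bytes:
    # Start from a real sample and perturb it, so format-specific
    # properties of the template carry over into the mutant.
    out = bytearray(template)
    out[random.randrange(len(out))] ^= 1 << random.randrange(8)
    return bytes(out)
```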
Develop Tools
We found it extremely helpful to build tooling to assist in model development and results processing. We built several new tools:

  • Crash Analyzer/Processor – Peach “buckets” (categorizes) crashes based on the signature of the callstack (a minimal sketch of this bucketing follows this list). It’s hard to overstate the value of this bucketing, since fuzzing produces many duplicate bugs. We developed a script that iterates through the bucket folders, creating an XML description of all of the results. We built a GUI Crash Analyzer (using Adobe AIR) that allows for visual sorting and analysis of the Peach buckets.
  • PIT Generator – We also developed a GUI tool that we could use to quickly select streams within an existing PDF and create a generational “naïve” model based on that data. This allowed us to quickly build simple models that could run while we spent time building richer smart models.
  • Enveloper – PDF is a format that contains many other formats. We found it somewhat difficult to build mutational models that ignored large portions of an input PDF. We also wanted to be able to find sample input data without having to manually wrap it in a PDF file before fuzzing. We built an “enveloping” extension to Peach that allowed us to build pit files that described only our targeted stream (e.g. JBIG2). The enveloper then wrapped the stream in a valid PDF structure before Peach launched Adobe Reader.

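As promised above, here is a minimal Python sketch of callstack-signature bucketing (our tooling consumed Peach/!exploitable output; the “module!function” frame format and the five-frame window here are assumptions made for illustration):

```python
import hashlib
import re

# Crashes whose top stack frames match are very likely the same bug,
# so hash those frames into a bucket id. Addresses and offsets are
# discarded because they vary from run to run.
TOP_FRAMES = 5

def bucket_id(callstack: str) -> str:
    frames = re.findall(r"\w+!\w+", callstack)[:TOP_FRAMES]
    return hashlib.sha1("|".join(frames).encode()).hexdigest()[:12]

buckets: dict[str, list[str]] = {}
for report in []:  # in practice, iterate over the fuzzer's crash logs
    buckets.setdefault(bucket_id(report), []).append(report)
```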
Develop Infrastructure
Unfortunately, models take an extremely long time to run because the application must be launched and closed for each file we generate. Many models can generate from tens to hundreds of thousands of mutations. To speed up run time, we leveraged the Acrobat/Reader test automation team’s fabulous grid infrastructure to provision virtual machines, deploy Reader, and launch a test. Additionally, Peach has a parallelization feature that allows different parts of a model to be run in different processes. Our automation team worked with the model development team to enhance their automation grid. The grid automatically partitions a Peach run and deploys it across dozens of virtual machines. At peak, we could run hundreds of thousands of iterations per day for a given model, drastically improving throughput.
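Conceptually, the partitioning works like the sketch below (hypothetical host names and a made-up --range switch, standing in for Peach's real parallelization options and our grid software):

```python
import subprocess

# Split one large fuzzing run into contiguous iteration ranges and
# dispatch each slice to its own VM. Host names and the "--range"
# flag are illustrative assumptions, not the real grid interface.
TOTAL_ITERATIONS = 200_000
MACHINES = ["fuzz-vm%02d" % i for i in range(40)]  # hypothetical VMs

chunk = TOTAL_ITERATIONS // len(MACHINES)
for i, host in enumerate(MACHINES):
    first, last = i * chunk + 1, (i + 1) * chunk
    subprocess.Popen(
        ["ssh", host, "peach", "--range", f"{first}-{last}", "model.xml"]
    )
```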
Run the Models!
It seems obvious, but the most important part of the process is to actually run the models. Developers love to build things, and it’s easy to tweak forever, but those developers who ran the models early and often found the most bugs. We also found that it was easy to make a small mistake in modeling that invalidated the entire model. For example, there are several areas of PDF that Adobe Reader will reject early if they are malformed. In this case, the targeted code might never be executed! It’s critical to debug and troubleshoot models carefully so that precious iteration time is not wasted.
Document Process/Results
This almost goes without saying, but it’s important to keep close tabs on your process and your results. As we brought more people into the process, we found our project wiki to be invaluable. It’s now being used by other groups within Adobe as a learning resource to refine their own fuzzing efforts.
Use Results to Improve the SPLC
Each unique crasher or new vulnerability is an opportunity to drive improvements and optimization into the rest of the Secure Product Lifecycle (SPLC). The code level flaw might highlight a gap in security training, provide an opportunity for some high ROI target code reviews in the surrounding code, or suggest a new candidate for the banned API list. Fuzzing is best done not in a vacuum but as an integrated part of a broader security process.
Parting Advice
My last piece of advice:
Approach fuzzing your own product with humility.
Humility allows you to start simply. It also allows you to question a model that isn’t producing results. We often found that when a model wasn’t producing, it was because of a mistake in the model or because the tool was misconfigured rather than because the code was good. Humility also allows you to choose targets in the unbiased way an attacker will choose them. Finally, humility allows you to recognize that security vulnerabilities can sneak in under even the most watchful eye. Fuzzing is one way we help catch bugs that have survived all the other security processes. Application security should be an ongoing effort on any product, just as it is with Adobe Reader.