Binspector: Evolving a Security Tool

Binary formats and files are inescapable. Although they are optimal for computers, they are impractical for the typical developer to understand. Binspector was born when I found myself scouring JPEG metadata blocks to make sure they were telling consistent stories. Its introspection capabilities have since turned it into an intelligent fuzzing utility, and I am excited to share it in the hope that others will benefit.

I joined Photoshop core engineering in 2008. One of my first tasks was to improve the metadata policies of our Save for Web plugin. When a file was saved, the exporter would embed metadata in a mash-up of up to three formats (IPTC/IIM, Exif, and XMP). The problem was that the outgoing metadata blocks were inconsistent and often held conflicting data within the same file. Fine-grained details, such as the source of each metadata value and any conflicting values, are lost by the time a GUI presents them. No image viewer (Photoshop included) offers that degree of introspection.

I ended up writing several tools to handle the binary formats I was interested in, and it was not long before I saw commonalities between them. I cobbled together a domain-specific language that let me define the interpretation of individual fields in a binary file as well as the relationships between them. A grammar written in that language formed an abstract syntax tree, and when combined with a binary that fit the format, it let me suss out the meaning of any bit.

It was at this point that Binspector started to take shape.

Once a format grammar has been built, analysis of a file becomes quite interesting. For example, any binary file can be validated against a format grammar, and if validation fails, Binspector can give a detailed account of where the grammar and the binary differ.
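Binspector's grammar language is not shown in this post, so the following is only a hypothetical Python sketch of the idea: fields are declared as data, a later field takes its repeat count from an earlier one, and a validator reports the first place where a file departs from the grammar. The `Field` class, `validate`, `GrammarMismatch`, and the `DEMO` toy format are all invented for illustration.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Field:
    name: str
    fmt: str                                       # struct format of one item, e.g. ">H"
    check: Optional[Callable[[Any], bool]] = None  # constraint each item must satisfy
    repeat: Optional[str] = None                   # earlier field whose value is the item count


class GrammarMismatch(Exception):
    pass


def validate(grammar: List[Field], data: bytes) -> Dict[str, Any]:
    """Read `data` field by field; raise GrammarMismatch describing the first divergence."""
    values: Dict[str, Any] = {}
    offset = 0
    for field in grammar:
        count = values[field.repeat] if field.repeat else 1
        size = struct.calcsize(field.fmt)
        items = []
        for _ in range(count):
            if offset + size > len(data):
                raise GrammarMismatch(
                    f"{field.name}: needs {size} bytes at offset {offset}, "
                    f"but the file ends at {len(data)}")
            (value,) = struct.unpack_from(field.fmt, data, offset)
            if field.check is not None and not field.check(value):
                raise GrammarMismatch(
                    f"{field.name}: value {value!r} at offset {offset} fails its constraint")
            items.append(value)
            offset += size
        values[field.name] = items if field.repeat else items[0]
    if offset != len(data):
        raise GrammarMismatch(f"{len(data) - offset} unexpected bytes after offset {offset}")
    return values


# A hypothetical toy format: magic, image size, and a counted list of 16-bit entries.
DEMO_GRAMMAR = [
    Field("magic", ">4s", check=lambda v: v == b"DEMO"),
    Field("width", ">H", check=lambda v: v > 0),
    Field("height", ">H", check=lambda v: v > 0),
    Field("entry_count", ">B"),
    Field("entries", ">H", repeat="entry_count"),
]
```

Validating a file that is one byte short produces a message such as `entries: needs 2 bytes at offset 11, but the file ends at 12`, which is the flavor of "where the grammar and the binary differ" report described above.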

Binspector evolved into a security tool when I connected its analytical capabilities to fuzzing. The Photoshop team invests heavily in Photoshop's stability, and corrupted files are a well-known attack vector. For the most part, fuzzing follows a “spray and pray” heuristic: throw gobs of (nearly) random data at an application and see what sticks. The problem with this method is that one has to generate a lot of candidate files to get input code to fail unexpectedly.
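For contrast, here is a minimal Python sketch of the spray-and-pray approach: copy a seed file and overwrite a few randomly chosen bytes. The function names and parameters are illustrative and not taken from any particular fuzzer. Because the code knows nothing about the format, it has no way to aim at the fields the parser actually depends on.

```python
import random
from typing import List


def mutate_randomly(seed: bytes, mutations: int, rng: random.Random) -> bytes:
    """Return a copy of `seed` with `mutations` randomly chosen bytes overwritten.

    A position may be picked twice, or overwritten with its original value,
    so the copy differs from `seed` in at most `mutations` bytes.
    """
    data = bytearray(seed)
    if not data:
        return bytes(data)
    for _ in range(mutations):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def spray(seed: bytes, count: int, mutations: int, rng_seed: int = 0) -> List[bytes]:
    """Generate `count` format-agnostic candidate files from one seed file."""
    rng = random.Random(rng_seed)
    return [mutate_randomly(seed, mutations, rng) for _ in range(count)]
```

Seeding the random generator makes a run reproducible, which matters when a crash needs to be regenerated later.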

By adding knowledge to the fuzzer, might we increase the ‘interesting failure’ rate? For example, what would happen if I took a perfectly fine 100×100-pixel image and set just the width bytes to, say, 255? Finding and fuzzing so specific a target would require a tool with introspection into a binary file format: exactly what Binspector had been written to do!

The key insight was to have Binspector flag a binary field every time its value was used to read further into the file. Need to know the width of the document? Flag the width. Want to know the count of Exif fields? Flag the count. By the end of the analysis, Binspector had built a list of potential weak points. The next phase was to generate copies of the known-good file that were identical except at these specific weak points, which were modified in various ways.
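A minimal Python sketch of this two-phase approach, under stated assumptions: Binspector works from its grammar files, whereas here the reader for a hypothetical `TOY1` image format is hand-written, and `FlaggingReader`, `boundary_values`, and `targeted_mutants` are illustrative names. Each field whose value steers further reading (width, height, and entry count) is recorded as a weak point, and every generated copy differs from the known-good file in exactly one of those fields.

```python
from typing import Iterator, List, NamedTuple, Tuple


class WeakPoint(NamedTuple):
    name: str
    offset: int  # byte offset of the field in the file
    size: int    # field width in bytes


class FlaggingReader:
    """Reads big-endian unsigned fields, flagging any whose value steers further reading."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.weak_points: List[WeakPoint] = []

    def read_uint(self, name: str, size: int, steers_reading: bool = False) -> int:
        chunk = self.data[self.pos:self.pos + size]
        if len(chunk) != size:
            raise ValueError(f"truncated while reading {name} at offset {self.pos}")
        if steers_reading:
            self.weak_points.append(WeakPoint(name, self.pos, size))
        self.pos += size
        return int.from_bytes(chunk, "big")

    def skip(self, count: int) -> None:
        if self.pos + count > len(self.data):
            raise ValueError(f"truncated: cannot skip {count} bytes at offset {self.pos}")
        self.pos += count


def analyze_toy_image(data: bytes) -> List[WeakPoint]:
    """Parse the hypothetical TOY1 format and return the fields its reading depended on."""
    if data[:4] != b"TOY1":
        raise ValueError("bad magic")
    reader = FlaggingReader(data)
    reader.skip(4)
    width = reader.read_uint("width", 2, steers_reading=True)
    height = reader.read_uint("height", 2, steers_reading=True)
    count = reader.read_uint("entry_count", 1, steers_reading=True)
    for i in range(count):
        reader.read_uint(f"entry[{i}]", 2)
    reader.skip(width * height)  # one byte per pixel in this toy format
    return reader.weak_points


def boundary_values(size: int) -> List[int]:
    """A few classic edge-case values for an unsigned field of `size` bytes."""
    max_value = (1 << (8 * size)) - 1
    return sorted({0, 1, 0xFF, max_value >> 1, max_value - 1, max_value})


def targeted_mutants(data: bytes,
                     weak_points: List[WeakPoint]) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (field name, new value, file copy) with exactly one weak point changed."""
    for wp in weak_points:
        original = data[wp.offset:wp.offset + wp.size]
        for value in boundary_values(wp.size):
            replacement = value.to_bytes(wp.size, "big")
            if replacement == original:
                continue  # skip copies identical to the known-good file
            mutated = bytearray(data)
            mutated[wp.offset:wp.offset + wp.size] = replacement
            yield wp.name, value, bytes(mutated)
```

For a known-good 100×100 `TOY1` file with two entries, this yields 17 copies, one of which sets the width bytes to `0x00FF` (255) as in the example above. In a real run each copy would be fed to the application under test; even this toy parser rejects that copy as truncated, because 255×100 pixels no longer fit in the file.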

A rising tide lifts all boats, and this is certainly true in the domain of application stability and security. Improving the robustness of one application can only improve the health of the operating environment as a whole. That is one of the reasons I am elated to release Binspector as open source software. My long-term vision is to build up a body of binary format grammars that anyone can use to improve the reliability of input code.

Foster Brereton
Sr. Computer Scientist

Presenting the “Malware Classifier” Tool

Hi folks,

Karthik here from Adobe PSIRT. Part of what we do at PSIRT is respond to security incidents, which sometimes involves analyzing malware. To make life easier, I wrote a Python tool for quick malware triage for our team. I’ve since decided to make this tool, called “Adobe Malware Classifier,” available as open source to other first responders (malware analysts, IT admins, and security researchers of any stripe), since you might find it equally helpful.

Malware Classifier uses machine learning algorithms to classify Win32 binaries – EXEs and DLLs – into three classes: 0 for “clean,” 1 for “malicious,” or “UNKNOWN.” The tool extracts seven key features from a binary, feeds them to one or all of the four classifiers, and presents its classification results.
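The post does not say how the tool arrives at “UNKNOWN,” so this Python sketch assumes one plausible rule: when all classifiers are run and they disagree, the result is “UNKNOWN.” The `classify` function and the stub classifiers are hypothetical stand-ins for the J48, J48 Graft, PART, and Ridor models, not the tool's actual code, and feature extraction is left out because the seven features are not named here.

```python
from typing import Callable, Dict, Mapping, Optional, Union

Features = Mapping[str, float]
Classifier = Callable[[Features], int]  # returns 0 ("clean") or 1 ("malicious")
Verdict = Union[int, str]               # 0, 1, or "UNKNOWN"


def classify(features: Features,
             classifiers: Dict[str, Classifier],
             which: Optional[str] = None) -> Verdict:
    """Run one named classifier, or all of them when `which` is None.

    Assumption for this sketch: when all classifiers run and they
    disagree (or none are given), the combined verdict is "UNKNOWN".
    """
    if which is not None:
        return classifiers[which](features)
    verdicts = {clf(features) for clf in classifiers.values()}
    return verdicts.pop() if len(verdicts) == 1 else "UNKNOWN"


# Hypothetical stubs; real models would inspect the extracted features.
STUBS: Dict[str, Classifier] = {
    "J48": lambda f: 1,
    "J48Graft": lambda f: 1,
    "PART": lambda f: 1,
    "Ridor": lambda f: 0,
}
```

With these stubs, running only `Ridor` returns 0, while running all four returns “UNKNOWN” because the votes split.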

The tool was developed using models produced by running the J48, J48 Graft, PART, and Ridor machine-learning algorithms on a data set of approximately 100,000 malicious programs and 16,000 clean programs.

Malware Classifier is available at Open @ Adobe.

I will be speaking about the research behind the tool at Infosec Southwest 2012 in Austin, TX, on April 1. If you’re going to be there, I look forward to meeting up and discussing product security and secure engineering at Adobe.

CanSecWest 2012

The team and I are about to head off to CanSecWest. Although I have attended CanSecWest for several years, this year will be a unique experience for me. During my talk, I will demo an open-source tool I just released, called Adobe SWF Investigator. The tool can help developers, quality engineers, and security professionals analyze SWF applications. It has been a pet project of mine for some time, and I decided to share it with a broader audience.

In my current role, I have to look at all aspects of SWF applications, from cross-site scripting issues to binary analysis. The tool therefore includes capabilities for everything from testing for cross-site scripting to viewing the individual SWF tags within the file format. I hope that releasing the tool as an open-source ActionScript application will encourage all ActionScript developers to learn more about security. The tool is designed to be an extensible framework that everyone can build upon or modify. More information on the tool can be found in my DevNet article.

In addition to demonstrating the tool, I will also talk about Advanced Persistent Response. Adobe has been a target of hackers for some time, and I plan to discuss what we have learned and observed while responding to those threats. My talk is on Wednesday at 3:30pm, if you are interested. When I am not speaking, you can probably find me and the Adobe team at the Adobe table or milling around the pwn2own contest for no particular reason. Please feel free to come by and talk with us. See you there!