A Basic Distributed Fuzzing Framework for FOE

PostsThe Archives

Last week, CERT released a Python-based file format fuzzer for Windows called Failure Observation Engine (FOE). It is a Windows port of their Linux-based fuzzer, Basic Fuzzing Framework(BFF). CERT provided Adobe with an advanced copy of FOE for internal testing, and we have found it to be very useful. One of the key features of FOE is its simplicity. The configuration file is very straightforward, which makes it easy to introduce to new teams. We have also used the “copy” mode of FOE to help automate triaging large sets of external reports. It is a great tool to have for dumb fuzzing. For this blog, I am going to discuss a simple Python wrapper I created during my initial testing of the tool which helped to coordinate running FOE across multiple machines. This approach allows you to pull seed files from a centralized location. You can also view the status of all of the fuzzing runs and their results from the same location. If you are not interested in writing a distributed fuzzing framework, then you might want to stop reading because the rest of this blog is all about code. 🙂

The goal of this distributed fuzzing framework design was to create something simple, lightweight since I was experimenting with a new tool. I set a personal limit of keeping the project to around 1,000 lines of code in order to scope my time investment. That said, I also wanted to build something that I could easily scale later in the event that I liked it enough to invest more time. For the client-side code, I used Python since that was already required for FOE. On the server side, I had a Linux/Apache/MySQL/Perl (LAMP) server. Knowing that everyone has their own preference for server-side authoring, I am only going to describe the server-side architecture rather than providing the Perl source. Nothing in the server-side code is so complicated that a Web developer couldn’t figure out how to do an implementation in the language of their choice from this description. While I designed this for testing the FOE fuzzer, only one file in the entire system is FOE-specific, which makes the infrastructure reusable for other fuzzers. The current name of the main script is “dffiac.py” because I thought of this project as a, “Distributed Fuzzing Framework in a Can”.

For this design, all of the tracking logic is consolidated on the centralized server. The Python script will issue requests for data using simple GETs and POSTs over HTTP. The server will respond to the requests with basic XML. The fuzzing seed files are hosted on the server in a public web server directory from which they can be downloaded. Identified crashes will be uploaded to the server and placed in a public web server directory. Both the client-side and server-side codes are agnostic with regards to the format of the seed files and the targeted application. Therefore, this should be relatively easy to set up in any infrastructure.


The database design

In this design, the mySQL server coordinates the runs across all the different machines. You first need a table containing all the files that you want to fuzz. At a bare minimum, it needs a unique primary key (fid), the name of the file and its location on the web server. I currently have a database of more than 60,000 SWF files that are sub-categorized based on type so that I can focus fuzzing to specific types of SWF files. However, name and location will get you started with fuzzing.



Field Type Description
fid Integer (primary key, autoincrement) The unique File ID for this entry
name VARCHAR The filename
location VARCHAR The relative web directory for the file (e.g. “/fuzzing/files/”)


The next thing that you will need is a table to track all of the fuzzing runs. A “run” is defined as one or more servers testing with the same FOE configuration file against a defined set of seed files. There are multiple ways in which you can define the selected seed files for the run. For instance, you may want to use FOE against multiple types of applications. For this scenario, you might have a different seed_files for each file type. To support the need for different seed_files tables, the design of run_records requires that you provide the “table_name” that will be used for this run. Once a seed_files table is selected, it may be necessary to further restrict the run to a subset of files within the seed_files tables. For instance, you may only want to select a subset of files within the given table. Therefore, the design requires that you provide a “type” parameter which denotes the method for selecting files from the seed_files table. The value of type can include values such as “all”, “range” or any other sub-category you want to define. As an example, this particular run may be a “range” type that starts at start_fid and stops at end_fid.



Field Type Description
rid Integer (primary key, autoincrement) The unique ID for this run
name VARCHAR The human readable name for the run
description VARCHAR A description for the run (e.g. config or mutation used, # of iterations, etc.)
type VARCHAR Values can include (all, range, etc)
table_name VARCHAR The name of the seed_files table that will be used for testing
start_fid Integer The first fid from seed_files to be fuzzed in this run
end_fid Integer The last fid from seed_files to be fuzzed in this run
current_fid Integer This tracks the next fid to be tested during the run


For every run, you will have multiple servers running FOE. For each server instance, it will be necessary to track the server name, when it started, the current status of the server, and when it last provided an update. The status will include values such as “running” and “complete.”  You can infer whether a machine has died based on whether it has been too long since the timestamp for the last_update field was modified.



Field Type Description
siid Integer (primary key, autoincrement) The unique server instance ID
server_name VARCHAR The name of the server (e.g hostname + IP address)
status VARCHAR Is it running or has it completed.
start_time timestamp When did this instance start?
last_update timestamp When was the last request from this instance?
rid Integer What run_record is this instance associated with?


Lastly, you will need a table to record the results. The script will record the server_instance ID (siid) where the crash was found in case there are issues with reproducing the crash. This will allow a QA to retest on the original machine where the crash occured. It is also necessary to track which run was able to identify the crash. The rid is not recorded because it can already be extrapolated from the siid. According to database normalization rules, redundant information should not be stored in tables. In this design, the script will record a result in fuzz_records regardless of whether a crash was identified.  This allows you to track which files have been tested against which FOE configurations. If a crash is identified, the web server directory where the crash result was stored is also recorded.



Field Type Description
frid Integer (primary key, autoincrement) The unique fuzz record ID
fid Integer The seed_files ID for this entry
siid Integer The server instance ID for this entry
crash Boolean Whether a crash was recorded during this test
location VARCHAR Where the crash result was stored (e.g. /results/run_id/)


The config file

You will start the Python script by providing a simple configuration file in the command line: “python dffiac.py dffiac.cfg”. The configuration file is in the same format as the FOE configuration file and contains the following:









The foeoptions section tells the script where to find the Python executable and the location of the FOE config script you will use for this run. The runoptions section provides the run id (rid) the database is using to track this run along with the location of the web server, the path to the action_handler.cgi and the path to the CGI that will handle the file uploads. The logoptions allows you to specify where the script will log local information regarding the run. The logs directory needs to exist prior to starting the script. The config_location and run_id are likely the only two elements that will change from run to run.


The transaction flow

For this next section, we will review the transactions between the dffiac.py script and the web server. The web server will read in the GET parameters, execute the relevant SQL query and return the results as XML. All but one request is handled by the action_handler defined in the dffiac.cfg config file. The upload of the crash results is handled by the upload_cgi defined in the dffiac.cfg config file.

Once dffiac.py has started and been initialized by the config file, the script will begin sending requests to the server. An “action” parameter informs the action_handler CGI which query to perform. The server will always respond to the Python script with the relevant information for the request in a simple XML format.


The first HTTP request from the Python code will be to gather all the information regarding the run_id provided in the config file:

GET /fuzzers/action_handler.cgi?action=getRunInfo&rid=1


The web server will then perform this SQL query with the rid that was provided:

select run_type,start_fid,end_fid from run_records where rid = ?


The results from the query will be used to return the following XML (assuming the run is defined as the range of fids from 1-25):



Now that dffiac.py has the information for the run, it will then inform the web server that the run is starting:

GET /fuzzers/action_handler.cgi?action=recordServerStart&rid=1&serverName=server1


This HTTP request will result in the following SQL query:

insert into server_instances (server_name,status,start_time,rid) values (?,'running',NOW(),?)


The insert_id from this query (siid) becomes the unique identifier for this instance and is returned for use in later queries:



Now that this instance has officially registered to contribute to this run, the Python script will begin requesting individual files to test:

GET /fuzzers/action_handler.cgi?action=getNextFid&rid=1&run_type=range


The corresponding SQL query will vary depending on how you have defined your run. For this example, we will assume that this is a basic run that will incrementally walk through the file IDs in the seed_files table. To accomplish this, we create an SQL variable called “value” and assign it the current_fid. By recording the value of the current fid and incrementing the “value” in a single statement, we can avoid a race condition when multiple servers are running.

update run_records set current_fid = current_fid + 1 where rid = ? and @value := current_fid;


At this point, “@value” is set to 1 which is the fid the Python script will test and the current_fid in the database table has been incremented to 2. The web server can then fetch “@value with the following SQL command:

select @value;


Since the process of asking for the next fid will automatically increment the value of current_fid, the value of current_fid will eventually exceed the value of the end_fid in the database table. While it may seem weird, it doesn’t hurt the process. This can be allowed to occur or you can add a little more server-side logic to have the server return -1 as the current_fid to stop the run when end_fid is reached.


The “select @value” result will be returned to Python script as the current_fid available for testing:



The Python script will then compare the current_fid with the end_fid that it received earlier to determine whether to stop testing.


Once we have the fid of the file that we will test, we can then fetch the information for that specific file:

GET /fuzzers/action_handler.cgi?action=getFileInfo&rid=1&fid=1


Using the rid, the web server can query the run_records table to find the table_name that contains the seed files.

select table_name from run_records where rid = ?


Assuming the result of that query will be saved as the variable, “$table_name”, the web server can construct the query to retrieve the file name and the directory location that corresponds to the file id:

"select name, location from" . $table_name . "where fid = ?"


The web server will return the file name and location with the following XML:



Now, that the location of the seed file is known, it can be downloaded by dffiac.py and saved in the FOE seeds directory. The FOE fuzzer is then started, and dffiac.py waits for FOE to finish testing that seed file. Once FOE testing has completed, the result will need to be recorded by sending the fid and a boolean value indicating whether a crash was identified with that test:

GET /fuzzers/action_handler.cgi?action=recordResult&siid=1&fid=1&crash=1


This will result in the following query:

insert into fuzz_records (siid,fid,crash) values (?,?,?)


The web server will also record that it has received an update from this fuzzing server instance in the server_instances table to let us know that it is still alive and processing:

update server_instances set lastUpdate = NOW() where siid = ?


The result is recorded regardless of success or failure so that you can track which files have been successfully tested with which configs. You could infer this from the run_records, but if a machine dies, a file might be skipped. The server-side code will take the insert_id from the fuzz_records statement (frid) and return the following XML:



If there was a crash, the Python script will zip up the crash directory, base64 encode the file and POST it to the upload_cgi identified in the dffiac configuration file. The script will leave the zip file on the fuzzing server if an error is detected during the upload. Along with the zip file, it will send the rid and frid. The rid is used to store files in a web server directory unique to that run. The frid is sent so that the action_handler can update the fuzz_records entry with the location of the uploaded crash file (e.g. “/results/1/zip_file_name.zip”) in the following SQL query:

update fuzz_records set location = ? where frid = ?


A successful upload will result in the following XML:



A failed upload can return the description of the error to the client with the following XML:

  <error>Replace me with the actual error description</error>


The dffiac.py script will then continue retrieving new files and testing them with FOE until the end_fid is reached. Then the final call to the web server will record that this fuzzing server instance has completed its run and has stopped:

GET /fuzzers/action_handler.cgi?action=recordRunComplete&siid=1


The web server will record the completion with the following SQL query:

update server_instances set status='complete', lastUpdate=NOW() where siid = ?


The web server will respond to this last request with the following XML:



The last XML response is currently ignored by the Python script but a more robust implementation could double-check for errors.


The Python code

The logic for the distributed fuzzing framework is split into one main file (dffiac.py) and three libraries that are contained in a /libs directory. We’ll start with the three libraries in the /libs directory. The code below is the library that contains the utilities for creating the zip file of the crash result.


ZipUtil.py (30 lines)

import zipfile
import os


class ZipUtil:


#Create a zip file and add everything in path_ref
def createZipFile(self, path_ref, filename):
  zip_file = zipfile.ZipFile(filename, 'w')


  #Check to see if path_ref is a file or folder
  if os.path.isfile(path_ref):
    self.addFolder(zip_file, path_ref)




#Recursively add folder contents to the zip file
def addFolder(self, zip_file, folder):
  for file in os.listdir(folder):


    #Get path of child element
    child_path = os.path.join(folder, file)


    #Check to see if the child is a file or folder
    if os.path.isfile(child_path):
    elif os.path.isdir(child_path):
      self.addFolder(zip_file, child_path)


The second library will base64 encode the zip file prior to uploading it to the web server via a POST method.  On the server side, you will need to base64 decode the file before writing it to disk.


PostHandler.py (77 lines)

import mimetools
import mimetypes
import urllib
import urllib2
import base64


class PostHandler(object):


  def __init__(self,webServer,uploadCGI):
    self.web_server = webServer
    self.upload_cgi = uploadCGI
    self.form_vars = []
    self.file_attachments = []
    self.mime_boundary = mimetools.choose_boundary()


  #Add a form field to the request
  def add_form_vars(self, name, value):
    self.form_vars.append((name, value))


  #Get the mimetype for the attachment
  def get_mimetype(self,filename):
    mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'

  #Add a base64 encoded file attachment
  def append_file(self, var_name, filename, file_ref, mimetype=None):
    raw = file_ref.read()
    body = base64.standard_b64encode(raw)
    if mimetype is None:
      mimetype = self.get_mimetype(filename)
    self.file_attachments.append((var_name, filename, mimetype, body))

  #Get the body of the request as a string
  def get_request_body(self):
    lines = []
    section_boundary = '--' + self.mime_boundary


    # Add the form fields
    for (name, value) in self.form_vars:
      lines.append('Content-Disposition: form-data; name="%s"' % name)


    # Add the files to upload
    for var_name, filename, content_type, data in self.file_attachments:
      lines.append('Content-Disposition: file; name="%s"; filename="%s"' % 
        (var_name, filename))
      lines.append('Content-Type: %s' % content_type)
      lines.append('Content-Transfer-Encoding: Base64')


    #Add the final boundary
    lines.append('--' + self.mime_boundary + '--')


    #Combine the list into one long string
    CRLF = 'rn'
    return CRLF.join(lines)
  #Send the final request
  def send_request(self):
    request = urllib2.Request(self.web_server + self.upload_cgi)
    content_type = 'multipart/form-data; boundary=%s' % self.mime_boundary


    form_data = self.get_request_body()


    result = urllib2.urlopen(request).read()
    return result



The third library handles the communication between the client and server. It will generate the GET requests and parse the XML responses.


actionHandler.py (94 lines)

import urllib
import urllib2
from xml.dom.minidom import parseString


class ActionHandler:


  #Initialize with the information from the config file
  def __init__(self,options,localLog):
    self.webServer = options['runoptions']['web_server']
    self.uploadCGI = options['runoptions']['upload_cgi']
    self.actionCGI = options['runoptions']['action_cgi']
    localLog.write("Configured web servern")


  #Parse the XML for the requested text value
  def getText(self,nodelist):
    rc = []
    for node in nodelist:
      if node.nodeType == node.TEXT_NODE:
    return ''.join(rc)


  #Make a web request to the server with the provided GET parameters
  def retrieveInfo(self,values):
    url = self.webServer + self.actionCGI
    data = urllib.urlencode(values)
    response = urllib2.urlopen(url,data)
    xml = response.read()


  #Get the information for the rid provided in the config file
  def getRunInfo(self,rid):
    values = {'action':'getRunInfo',
      'rid': rid}
    xml = self.retrieveInfo(values)
    dom = parseString(xml)
    run_type = self.getText(dom.getElementsByTagName("run_type")[0].childNodes)
    start_fid = self.getText(dom.getElementsByTagName("start_fid")[0].childNodes)
    end_fid = self.getText(dom.getElementsByTagName("end_fid")[0].childNodes)
    return (run_type,start_fid,end_fid)


  #Record that this server instance is starting a run
  def recordServerStart(self,rid,serverName):
    values = {'action':'recordServerStart',
      'rid': rid,
    xml = self.retrieveInfo(values)
    dom = parseString(xml)
    lastrowid = self.getText(dom.getElementsByTagName("siid")[0].childNodes)
    return (lastrowid)


  #Record that the server is now complete with its tests
  def recordRunComplete(self,siid):
    values = {'action':'recordRunComplete',
      'siid': siid}
    xml = self.retrieveInfo(values)


  #Get the fid for the next file to be fuzzed
  def getNextFid(self,rid,fid,run_type):
    values = {'action':'getNextFid',
    xml = self.retrieveInfo(values)
    dom = parseString(xml)
    current_id = self.getText(dom.getElementsByTagName("current_fid")[0].childNodes)
    return current_id


  #Get the file name and location for the selected fid
  def getFileInfo(self,rid,fInfo):
    values = {'action':'getFileInfo',
    xml = self.retrieveInfo(values)
    dom = parseString(xml)
    fInfo.name = self.getText(dom.getElementsByTagName("name")[0].childNodes)
    fInfo.location = self.getText(dom.getElementsByTagName("location")[0].childNodes)


#Record the result from the fuzzing test
  def recordResult(self,siid,fid,result):
    values = {'action':'recordResult',
    xml = self.retrieveInfo(values)
    dom = parseString(xml)
    frid = self.getText(dom.getElementsByTagName("frid")[0].childNodes)
    return frid


Finally, we get to the main file which is responsible for reading the config file and driving the fuzzing run. This is the only file that is specific to the FOE fuzzer.


dffiac.py (177 lines)

import os
import shutil
import socket
import subprocess
import sys
import urllib2
import ConfigParser
import time




from ZipUtil import ZipUtil
from PostHandler import PostHandler
from ActionHandler import ActionHandler


#This will track the fid, and location of the file
class FileInfo:


#Convert the options in the config file to lists
def parse_options(config):
  options = {}
  for section in config.sections():
    options[section] = {}
    for (option, value) in config.items(section):
      options[section][option] = value
  return options

#Create a local text file for logging
def openLog(options):
  localLogDir = options['logoptions']['log_dir']
  runName = options['runoptions']['run_id']
  timestamp = int(time.time())
  localLog = open(localLogDir + runName + '_' + str(timestamp) + '.txt', 'w')
  localLog.write("Starting run: " + runName + " at " + str(timestamp) + "n")
  return localLog


#Close the local text file log
def closeLog(localLog):


#Download the next file to be fuzzed
def getNextFile(fInfo, options, foe_options, localLog):
  u = urllib2.urlopen(options['runoptions']['web_server'] + fInfo.location + fInfo.name)
  localFile = open(foe_options['runoptions']['seedsdir'] + "\" + fInfo.name, 'wb')
  localLog.write ('Created file: ' + foe_options['runoptions']['seedsdir'] + "\" + fInfo.name + 'n')


#Store the results in a zip file
def createZip(outputDir,filename):
  zipTool = ZipUtil()
  zipFile = open(filename,'rb')
  return zipFile


#Post the zip file to the server
def postZip(options,frid,rid,filename,zipFile):
  form = PostHandler(options['runoptions']['web_server'], options['runoptions']['upload_cgi'])
  result = form.send_request()
  return result


if __name__ == "__main__":
  if (len(sys.argv) < 2):
    print "usage: %s <runconfig.cfg>" % sys.argv[0]


  #Read the dffiac config file
  configFile = sys.argv[1]
  if not os.path.exists(configFile):
    print "config file doesn't exist: %s" % configFile
  config = ConfigParser.SafeConfigParser()


  #Read the foe config file
  options = parse_options(config)
  config2 = ConfigParser.SafeConfigParser()
  config2.read (options['foeoptions']['config_location'])
  foe_options = parse_options(config2)


  #Set up logging
  localLog = openLog(options)


  #Configure the web server
  aHandler = ActionHandler(options, localLog)


  #Get the information for this run
  rid = options['runoptions']['run_id']
  (run_type,start_fid,end_fid) = aHandler.getRunInfo(rid)


  #Record server start
  hostName = socket.gethostname()
  hostIP = socket.gethostbyname(hostName)
  serverName = hostName + "_" + hostIP
  siid = aHandler.recordServerStart(rid, serverName)
  localLog.write("Starting as server instance: " + siid + "n")


  #Get the first file to be processed
  fInfo = FileInfo()
  fInfo.fid = aHandler.getNextFid(rid,start_fid,run_type)


  #loop until done
  while (int(fInfo.fid) <= int(end_fid)):
    #Get the location information for the current file


    #Download and store the file

    outputDir = foe_options['runoptions']['outputdir'] + "\" + foe_options['runoptions']['runid']


    #Run fuzzer
    exitCode = subprocess.call(options['foeoptions']['python_location'] + " " + options['foeoptions']['foe_location'] + " " + options['foeoptions']['config_location'], shell=True)


    #Check for completion of a succesful run
    if exitCode != 0:
      localLog.write("Error running foe on fid " + fInfo.fid + "n")
      dirList = os.listdir(outputDir)


      #Detect whether bugs were found
      if len(dirList) > 2:


        #Record the result in fuzz_records
        frid = aHandler.recordResult(siid,fInfo.fid,1)
        localLog.write("Recording frid: " + frid + "n")


        #Store the results in a zip file
        filename = frid + "-" + fInfo.name + ".zip"
        file_path = os.getcwd() + filename
        zipFile = createZip(outputDir,file_path)


        #Post the zip file back to the server
        result = postZip(options,frid,rid,filename,zipFile)


        #Make sure the file got there OK
        if result.find("error") == -1:
          localLog.write("Results successfully uploaded.n")
          localLog.write("There was an error in the upload: " + result + "n")


        localLog.write("Found bugs with " + fInfo.fid + "n")
        #Record no bugs found in the directory
        localLog.write("No bugs found with " + fInfo.fid + "n")


    #The if len(dirlist) check on the results is complete
    #Erase files so that FOE starts clean on the next run
    os.remove(foe_options['runoptions']['seedsdir'] + "\" + fInfo.name)


    #Get the next FID
    fInfo.fid = aHandler.getNextFid(rid,fInfo.fid,run_type)


  #The while loop is complete
  #Record this run instance as being complete


  #Close the local file log


This blog is only meant to describe how you can stand up a basic distributed fuzzing framework based on FOE fairly quickly in approximately 1,000 lines of code. The client-side code turned out to be 378 lines, my server-side action_handler CGI was 150 lines and the upload CGI was 72 lines of Perl. That is enough to get the script to run based on information from a database. With the remaining 400 lines, I created a CGI to display the status of my runs and a CGI to generate a run. You will also want to write a script to mirror the dffiac.cfg and FOE configuration file across machines. Over time, I expect that you would make this design more robust for your particular infrastructure and needs. You can also expand this infrastructure for your other fuzzers with some modifications to the main file. What I provide here is just enough to help you get started performing distributed fuzzing with a small amount of coding and the FOE fuzzer.


Permission for this blog entry is granted as CCplus, http://www.adobe.com/communities/guidelines/ccplus/commercialcode_plus_permission.html


Posts, The Archives

Posted on 05-02-2012