An Overview of Behavior Driven Development Tools for CI Security Testing

While researching continuous integration (CI) testing models for Adobe, I have been experimenting with different open-source tools like Mittn, BDD-Security, and Gauntlt. Each of these tools centers around a process called Behavior Driven Development (BDD). BDD enables you to define your security requirements as “stories,” which are compatible with Scrum development and continuous integration testing. Whereas previous approaches required development teams to go outside their normal process to use security tools, these frameworks aim to integrate security tools within the existing development process.

None of these frameworks are designed to replace your existing security testing tools. Instead, they’re designed to be a wrapper around those security tools so that you can clearly define unit tests as scenarios within a story. The easiest way to understand the story/scenario concept is to look at a few examples from BDD-Security. This first scenario is part of an authentication story. It verifies that account lockouts are enforced by the demo web application:

Scenario: Lock the user account out after 4 incorrect authentication attempts
Meta: @id auth_lockout
Given the default username from: users.table
And an incorrect password
And the user logs in from a fresh login page 4 times
When the default password is used from: users.table
And the user logs in from a fresh login page
Then the user is not logged in

The BDD framework takes this human-readable statement about your security policy and translates it into a technical unit test for your web application penetration testing tool. With this approach, you’re able to phrase your security requirements for the application as a true/false statement. If Jenkins sees a false result from this unit test, then it catches the bug immediately and can flag the build. In addition, this human-readable approach to defining unit tests allows the scenarios to double as documentation. An auditor can quickly read through the scenarios and map them to a threat model or policy requirement.
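To make the translation from sentence to test concrete, here is a minimal sketch in plain Python of how a BDD runner could match scenario sentences to code and reduce the result to true/false. It is not the implementation used by BDD-Security, Behave, or any other framework named here. The registry, the step wording (simplified from the lockout scenario above), and the FakeLoginApp class are all hypothetical stand-ins for illustration.

```python
import re

# Hypothetical registry of (compiled sentence pattern, handler function) pairs.
STEPS = []


def step(pattern):
    """Register a function as the handler for sentences matching pattern."""
    def decorator(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return decorator


class FakeLoginApp:
    """Stand-in for the application under test; locks after 4 bad attempts."""

    def __init__(self):
        self.failures = 0
        self.logged_in = False

    def login(self, password):
        if self.failures >= 4:
            self.logged_in = False  # account is locked, even for a good password
            return
        if password == "correct":
            self.logged_in = True
        else:
            self.failures += 1


@step(r"(?:Given|And) an incorrect password")
def incorrect_password(ctx):
    ctx["password"] = "wrong"


@step(r"(?:When|And) the default password is used")
def default_password(ctx):
    ctx["password"] = "correct"


@step(r"(?:Given|When|And) the user logs in (\d+) times")
def login_many(ctx, times):
    for _ in range(int(times)):
        ctx["app"].login(ctx["password"])


@step(r"Then the user is not logged in")
def not_logged_in(ctx):
    assert not ctx["app"].logged_in, "user was able to log in"


def run_scenario(lines, app):
    """Run each sentence through its matching step; True means the scenario passed."""
    ctx = {"app": app}
    for line in lines:
        for pattern, func in STEPS:
            match = pattern.match(line.strip())
            if match:
                try:
                    func(ctx, *match.groups())
                except AssertionError:
                    return False
                break
        else:
            raise LookupError(f"no step definition for: {line}")
    return True


# Simplified wording of the lockout scenario, for illustration only.
SCENARIO = [
    "Given an incorrect password",
    "And the user logs in 4 times",
    "When the default password is used",
    "And the user logs in 1 times",
    "Then the user is not logged in",
]

if __name__ == "__main__":
    print(run_scenario(SCENARIO, FakeLoginApp()))
```

In a real pipeline, the pass/fail result would typically be turned into a process exit code or a test report, which is what lets a CI server such as Jenkins flag the build.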

To interact with your web site and perform the login, the framework needs a corresponding class written with a web browser automation framework. The BDD example above used a custom class that leverages the Selenium 2 framework to navigate to the login page, find the login form elements, fill in their values, and have the browser perform the submit action. Selenium is a common tool for web site testers, so your testing team may already be familiar with it or similar frameworks.

Writing custom classes that understand your web site is good for creating specific tests around your application logic. However, you can also perform traditional spidering and scanning tests as in this second example from BDD-Security:

Scenario: The application should not contain Cross Site Scripting vulnerabilities
Meta: @id scan_xss
#navigate_app will spider the website
GivenStories: navigate_app.story
Given a scanner with all policies disabled
And the Cross-Site-Scripting policy is enabled
And the attack strength is set to High
And the alert threshold is set to Medium
When the scanner is run
And false positives described in: tables/false_positives.table are removed
Then no Medium or higher risk vulnerabilities should be present

For the BDD-Security demo, the scanner that is used is the OWASP ZAP proxy. However, BDD frameworks are not limited to tests run through a web proxy. For instance, this example from BDD-Security shows how to run Nessus and ensure that the scan doesn’t return anything that is severity 2 (medium) or higher:

Scenario: The host systems should not expose known security vulnerabilities

Given a nessus server at https://localhost:8834
And the nessus username continuum and the password continuum
And the scanning policy named test
And the target hosts
|hostname  |
|localhost |
When the scanner is run with scan name bddscan
And the list of issues is stored
And the following false positives are removed
|PluginID |Hostname  |Reason                                               |
|43111    |127.0.0.1 |Example of how to add a false positive to this story |
Then no severity: 2 or higher issues should be present

There are a lot of good blogs and presentations (1,2,3) that further explain the benefits of BDD approaches to security, so I won’t go into any further detail. Instead, I will focus on three current tools and highlight key differences that are important to consider when evaluating them.

Which BDD Tool is Right for You?

To start, here is a quick summary of the tools at the time of this writing:

|                               |Mittn               |Gauntlt                           |BDD-Security    |
|Primary Language               |Python              |Ruby                              |Java            |
|Approximate Age                |3 months            |2 years                           |2 years         |
|Commits within last 3 months   |yes                 |yes                               |yes             |
|BDD Framework                  |Behave              |Cucumber                          |jbehave         |
|Default Web App Pen Test Tools |Burp Suite, radamsa |Garmr, arachni, dirb, sqlmap, curl |Zap, Burp Suite |
|Default SSL analysis           |sslyze              |heartbleed, sslyze                |TestSSL         |
|Default Network Layer Tools    |N/A                 |nmap                              |nessus          |
|Windows or Unix                |Unix                |Unix**                            |Both            |

** Gauntlt’s “When tool is installed” statement depends on the Unix “which” command. If you exclude that statement from your scenarios, then many tests will work on Windows.

If you plan to wrap more than the officially supported list of tools or have complex application logic, then you may need custom sentences, known as “step definitions.” Modifying step definitions is not difficult, but once you start modifying code, you have to consider how to merge your changes with future updates to the framework.

Each framework takes a different approach to its step definitions. For instance, BDD-Security tends to encourage formal step definition sentences in all its test cases, which would require code changes for custom steps. With Gauntlt, you can store additional step definition files in the attack_adapters directory. Gauntlt also provides flexibility through a few generic step definitions that allow you to check the output of arbitrary raw command lines, as seen in its hello world example below:

Feature: hello world with gauntlt using the generic command line attack
  Scenario:
    When I launch a "generic" attack with:
      """
      cat /etc/passwd
      """
    Then the output should contain:
      """
      root
      """
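Conceptually, a generic step like this runs a command line and checks its output for a string. The following is a rough sketch of that idea in Python, not Gauntlt’s actual Ruby implementation. The function names are hypothetical, and a portable stand-in command replaces cat /etc/passwd so the example also runs on Windows.

```python
import subprocess
import sys


def launch_generic_attack(command):
    """Run a command (given as an argument list) and return stdout and stderr as text."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        text=True,
        check=False,  # a non-zero exit status is not treated as an error here
    )
    return result.stdout


def output_should_contain(output, expected):
    """Mirror of a 'the output should contain' check: plain substring match."""
    return expected in output


if __name__ == "__main__":
    # Stand-in for `cat /etc/passwd`: prints one passwd-style line.
    command = [sys.executable, "-c", "print('root:x:0:0:root:/root:/bin/sh')"]
    output = launch_generic_attack(command)
    print(output_should_contain(output, "root"))
```

Because the check is a plain substring match on whatever the command prints, any command line tool can be wrapped this way without writing a dedicated step definition.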

Similarly, you should also consider how the framework will handle false positives from the tools. For instance, Mittn allows you to address the situation by tracking them in a database. BDD-Security lets you list false positives either within the scenario, as seen in the Nessus example above, or in a separate table. Gauntlt’s approach is to leverage “should not contain” statements within the scenario.
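The table-based approach from the Nessus scenario earlier can be sketched in a few lines of Python: drop any finding that matches a known false positive, then fail if anything at or above the severity threshold remains. This is only an illustration of the logic, not BDD-Security’s code. The Issue class and function names are hypothetical, and plugin ID 99999 is a made-up value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one scanner finding."""
    plugin_id: int
    hostname: str
    severity: int


def remove_false_positives(issues, false_positives):
    """Drop issues whose (plugin_id, hostname) pair is a listed false positive."""
    known = set(false_positives)
    return [i for i in issues if (i.plugin_id, i.hostname) not in known]


def issues_at_or_above(issues, threshold):
    """Return the issues whose severity is at least the threshold."""
    return [i for i in issues if i.severity >= threshold]


if __name__ == "__main__":
    scan_results = [
        Issue(43111, "127.0.0.1", 2),  # the false positive listed in the story
        Issue(99999, "127.0.0.1", 1),  # made-up low-severity finding
    ]
    false_positives = [(43111, "127.0.0.1")]

    remaining = remove_false_positives(scan_results, false_positives)
    failing = issues_at_or_above(remaining, 2)
    print("PASS" if not failing else "FAIL")
```

Keeping the false positive list as data, separate from the check itself, is what makes it practical to review and audit the exceptions alongside the scenario.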

Since these tools are designed to be integrated into your continuous integration testing frameworks, you will want to evaluate how compatible they will be and how tightly you will need them integrated. For instance, quoting Continuum Security’s BDD introduction:

BDD-Security jobs can be run as a shell script or ant job. With the xUnit and JBehave plugins the results will be stored and available in unit test format. The HTML reports can also be published through Jenkins.

Mittn is based on Behave and can also produce JUnit XML test result documents. Gauntlt’s default output is handled by Cucumber. By default, Gauntlt supports pretty stdout format and HTML output, but you can modify the code to get JUnit output. There is an open improvement request to allow JUnit output through the config file. Gauntlt has documentation for Travis integration as well.
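For readers unfamiliar with the JUnit XML format that CI servers commonly consume, here is a minimal sketch of what such a report contains, built with Python’s standard library. It is not the output of any of these frameworks. It uses only the widely used testsuite, testcase, and failure elements, and the function name and scenario names are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, results):
    """Build a simplified JUnit-style report.

    results is a list of (scenario_name, passed, message) tuples.
    """
    failures = sum(1 for _, passed, _ in results if not passed)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for name, passed, message in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        if not passed:
            # A failed scenario is recorded as a <failure> child element.
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = message
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    report = to_junit_xml(
        "security_stories",
        [
            ("auth_lockout", True, ""),
            ("scan_xss", False, "Medium or higher risk vulnerability present"),
        ],
    )
    print(report)
```

A report in this shape is what allows security scenarios to appear next to ordinary unit test results in the build history.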

Overall, the tools were not difficult to deploy. Gauntlt’s VirtualBox demo environment in its starter kit can be converted to deploy via Chef in the cloud with a little work. When choosing a framework, you should also consider the platform support of the security tools you intend to use and the platform of your integrated testing environments. For instance, using a GUI-based tool on a Linux server will require an X11 desktop to be installed.

All of these tools have promise depending on your preferences and needs. BDD-Security would be a good tool for web development teams who are familiar with tools like Selenium and want tight integration with their processes. Gauntlt’s ability to support “generic” attacks makes it a good tool for teams that want to use a wide array of tools in their testing. Mittn is the youngest entry and doesn’t yet have features like built-in spidering support. However, Python developers can easily find libraries for spidering sites, and Mittn’s external database approach to tracking issues may be useful for teams who have other systems that need to be notified of new results.

Before adopting one of these tools, an organization will likely do a buy-vs.-build analysis with commercial continuous monitoring offerings. For those who will be presenting the build argument, these tools provide enough to make a solid case in that discussion.

Where these frameworks add value is by allowing you to take your existing security tools (Burp, Nessus, etc.) and make them a part of your daily build process. By integrating with the continuous build system, you can immediately identify potential issues and ensure a minimum baseline of security. The scenario-based approach allows you to map requirements in your security policy to clearly defined unit tests. This evolution of open-source security frameworks that are designed to directly integrate with the DevOps process is an exciting step forward in the maturity of security testing.

Peleus Uhley
Lead Security Strategist