In previous blog posts, we discussed alternatives for creating a large-scale automation framework when you don’t have the resources for a multi-month development project. This blog assumes that you are ready to jump in with both feet and design your own internal automation solution.
Step 1: Research
While we particularly like our design, that doesn’t mean it is the best design for your organization. Take time to look at other designs, such as Salesforce’s Chimera, Mozilla’s Minion, ThreadFix, Twitter’s SADB, Gauntlt, and other approaches. These projects tackle large-scale security implementations in distinctly different ways, and it is important to understand the full range of approaches in order to select the one that best fits your organization. Projects like Mozilla’s Minion and Gauntlt are open source if you are looking for code in addition to ideas. Some tools also follow specific development processes, such as Gauntlt’s adherence to Behavior Driven Development, which need to be considered. The OWASP AppSec Pipeline project provides information on how to architect around solutions such as ThreadFix.
These tools generally break down into aggregators and scanners. Aggregators focus on consolidating information from your diverse deployment of existing tools and then trying to make sense of that information; ThreadFix is an example of this type of project. Scanners deploy either existing or custom analysis tools at massive scale; Chimera, Minion, and Adobe’s Security Automation Framework take this approach. This blog focuses on our scanner approach using our Security Automation Framework, but we are in the midst of designing an aggregator as well.
Step 2: Put together a strong team
The design of this solution was not the result of any one person’s genius. You need a group of strong people who can help point out when your idea is not as clever as you might think. This project involved several core people, including Mohit Kalra, Kriti Aggarwal, Vijay Kumar Sahu, and Mayank Goyal. Even with a strong team, we still re-architected the project after the version 1 beta as our approach evolved. For the project to be a success in your organization, you will want many perspectives, including management, product teams, and fellow researchers.
Step 3: Designing scanning automation
A well-thought-out design is critical to any tool’s long-term success. This blog will provide a technical overview of Adobe’s implementation and our reasoning behind each decision.
The Adobe Security Automation Framework:
The Adobe Security Automation Framework (SAF) is designed around a few core principles that dictate the downstream implementation decisions. They are as follows:
- The “framework” is, in fact, a framework. It is designed to facilitate security automation but it does not try to be more than that. This means:
- SAF does not care what security assessment tool is being run. It just needs the tool to communicate progress and results via a specified API. This allows us to run any tool, based on any language, without adding hard-coded support to SAF for each tool.
- SAF provides access to the results data but it is not the primary UI for results data. Each team will want their data viewed in a team specific manner. The SAF APIs allow teams to pull the data and render it as best fits their needs. This also allows the SAF development team to focus their time on the core engine.
- The “framework” is based on Docker. SAF is designed to be multi-cloud and Docker allows portability. The justifications for using a Docker based approach include:
- SAF can be run in cloud environments, in our internal corporate network, or from our laptops for debugging.
- Development teams can instantiate their own mini-copies of SAF for testing.
- Using Docker allows us to put security assessment tools in separate containers where their dependencies won’t interfere with each other.
- Docker allows us to scale the number of instances of each security assessment tool dynamically with respect to their respective job size.
- SAF is modularized with each service (UI, scheduler, tool instance, etc.) in its own Docker container. This allows for the following advantages:
- The UI is separated from the front-end API allowing the web interface to be just another client of the front-end API. While people will initiate scans from the UI, SAF also allows for API driven scan requests.
- The scanning environments are independent. The security assessment tools may need to be run from various locations depending on their target. For instance, the scan may need to run within an internal network, external network, a specific geographic location, or just within a team’s dedicated test environments. With loose-coupling and a modular design, the security assessment tools can be run globally while still having a local main controller.
- Docker modularity allows for choosing the language and technology stack that is appropriate for that module’s function.
- By having each security test encapsulated within its own Docker container, anyone in the company can have their security assessment tools included in SAF by providing an appropriately configured Docker image. Volunteers can write a simple Python driver based on the SAF SDK that translates a security testing tool’s output into compatible messages for the SAF API and provide that to the SAF team as a Docker image. We do this because:
- The SAF team does not want to be the bottleneck for innovation. By allowing external contributions to SAF, the number of tools that it can run increases at a far faster rate. Given the wide array of technology stacks deployed at Adobe, this allows development teams to contribute tools that are best suited for their environments.
- In certain incident response scenarios, it may be necessary to widely deploy a quick script to analyze your exposure to a situation. With SAF, you could get a quick measurement by adding the script to a Docker image template and uploading it to the framework.
- The “security assertion”, or the test that you want to run, should test a specific vulnerability and provide a true/false result that can be used for accurate measurements of the environment. This is similar to the Behavior Driven Development approaches seen in tools like Gauntlt. SAF is not designed to run a generic, catch-all web application penetration tool that will return a slew of results for human triage. Instead, it is designed for analysis of specific issues. This has the following advantages:
- If you run an individual test, then you can file an individual bug for tracking the issue.
- You create tests specifically around the security issues that are critical to your organization. The specific tests can then be accurately measured at scale.
- Development teams do not feel that their time is being wasted by being flooded with false positives.
- Since it is an individual test, the developer in charge of fixing that one issue can reproduce the results using the same tool as was deployed by the framework. They could also add the test to their existing automation testing suite.
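To make the true/false assertion principle above concrete, here is a minimal sketch of what an individual assertion result and its API message might look like. The `AssertionResult` class and its field names are hypothetical illustrations, not the actual SAF SDK, whose schema is internal to Adobe.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical sketch of a SAF-style assertion result; the real SAF SDK
# and its message format may differ.
@dataclass
class AssertionResult:
    assertion: str    # name of the specific test that ran
    target: str       # host the assertion was run against
    status: str       # "pass", "fail", or "error"
    detail: str = ""  # simple blob of text shown alongside the result

    def to_message(self) -> str:
        """Serialize the result for posting to the framework's API."""
        return json.dumps(asdict(self))

result = AssertionResult(
    assertion="hsts-header-present",
    target="www.example.com",
    status="fail",
    detail="Response did not include a Strict-Transport-Security header.",
)
print(result.to_message())
```

Because each result is a single pass/fail/error for a single host, it maps cleanly to one trackable bug, which is exactly the property the assertion model is meant to preserve.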
Mayank Goyal on our team took the above principles and re-architected our version 1 implementation into a design depicted in the following architecture diagram:
The SAF UI
The SAF UI is intentionally simple since it was not designed to be the analysis or reporting suite. The UI is a single-page web application which works with SAF’s APIs. The UI focuses on allowing researchers to configure their security assertions with the appropriate parameters. The core components of the UI are:
- Defining the assertion: Assertions (tests) are saved within an internal GitHub instance, built via Jenkins, and posted to a Docker repository. SAF pulls them as required. The GitHub repository contains the Dockerfile, the code for the test, and the code that acts as a bridge between the tool and the SAF APIs using the SAF SDK. Tests can be shared with other team members or kept private.
- Defining the scan: It is possible that the same assertion (test) may be run with different configurations for different situations. The scan page is where you define the parameters for different runs.
- Results: The results page provides access to the raw results. The results are broken down into pass, fail, or error for each host tested, and each result is accompanied by a simple blob of explanatory text.
- Scheduling: Scans can be set to run at specific intervals.
The SAF API Server
This API server is responsible for preparing the information for the slave testing environments in the next stage. It receives the information for a scan from the UI, from an API client, or from a saved schedule. The tool details and parameters are packaged, along with all of the meta-information needed by the task/configuration executor, and uploaded to the job queue in a slave environment for processing. The master controller also listens for responses from the queuing system and stores the results in the database. Everything downstream from the master controller is loosely coupled so that we can deploy work out to multiple locations and different geographies.
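The packaging step described above might look something like the following sketch, which bundles the tool details and parameters into a single JSON message for the job queue. The field names and the registry URL are illustrative assumptions, not SAF’s real schema.

```python
import json

def package_scan_job(assertion_image, parameters, targets):
    """Bundle the tool details and scan parameters into one job message
    that could be placed on the job queue for a testing environment.
    The field names here are illustrative, not SAF's actual format."""
    return json.dumps({
        "assertion_image": assertion_image,  # Docker image holding the test
        "parameters": parameters,            # per-scan configuration
        "targets": targets,                  # hosts to run the assertion against
    })

job = package_scan_job(
    "registry.example.com/saf/hsts-check:latest",
    {"timeout_seconds": 30},
    ["www.example.com", "api.example.com"],
)
print(job)
```

Keeping the message self-describing like this is what lets the downstream environments stay loosely coupled: any consumer that understands the schema can pull a job, regardless of where it runs.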
The Job Queueing system
The queueing system is responsible for basic queuing and allows the scheduler to schedule tasks based on resource availability and to defer them when needed. While cloud providers offer their own queuing systems, ours is based on RabbitMQ because we wanted deployment mobility.
The Task Scheduler
This is the brains of running the tools. It is responsible for monitoring all the Docker containers, scaling, resource scheduling, and killing rogue tasks. It hosts the API that receives the status and result messages from the Docker containers, and that information is then relayed back to the API server.
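The scaling responsibility described above can be sketched as a simple heuristic: choose a container count proportional to the pending job size, capped by available resources. This is an illustrative stand-in with made-up thresholds, not SAF’s actual scheduler logic.

```python
def desired_instances(pending_jobs: int, jobs_per_container: int = 50,
                      max_containers: int = 20) -> int:
    """Scale the number of tool containers with the size of the job queue,
    never exceeding the resource cap. The default thresholds here are
    hypothetical values chosen purely for illustration."""
    if pending_jobs <= 0:
        return 0
    needed = -(-pending_jobs // jobs_per_container)  # ceiling division
    return min(needed, max_containers)

print(desired_instances(0))     # no work: no containers
print(desired_instances(120))   # 120 jobs -> 3 containers
print(desired_instances(5000))  # capped at 20 by the resource limit
```

A real scheduler would combine a heuristic like this with container health monitoring and per-task timeouts so that rogue tasks can be killed rather than starving the pool.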
The Docker images
The Docker images are based on a micro-service architecture approach. The default baseline is based on Alpine Linux to keep the image footprint small. SAF assertions can also be quite small. For instance, the test can be a small Python script which makes a request to the homepage of a web server and verifies whether an HSTS header was included in the response. This micro-service approach allows the environment to run multiple instances with minimum overhead. The assertion script communicates its status (e.g. keep-alives) and results (pass/fail/error) back to the task executor using the SAF SDK.
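The HSTS example above might look something like this minimal sketch. The function names and the printed output are illustrative; a real SAF assertion would report its pass/fail/error status back through the SAF SDK rather than printing it.

```python
from urllib.request import urlopen

def has_hsts_header(headers) -> str:
    """Return "pass" if a Strict-Transport-Security header is present
    in the given response headers, "fail" otherwise."""
    normalized = {name.lower() for name in headers.keys()}
    return "pass" if "strict-transport-security" in normalized else "fail"

def run_assertion(url: str) -> str:
    """Fetch the homepage and check for the HSTS header, mapping any
    network failure to the framework's "error" status."""
    try:
        with urlopen(url, timeout=10) as response:
            return has_hsts_header(response.headers)
    except Exception:
        return "error"

# A real run would call run_assertion("https://www.example.com") and
# relay the resulting status via the SAF SDK; here we exercise the
# check directly on sample headers.
print(has_hsts_header({"Strict-Transport-Security": "max-age=31536000"}))
print(has_hsts_header({"Content-Type": "text/html"}))
```

Because the whole assertion fits in a few dozen lines with no heavyweight dependencies, it packages naturally into a small Alpine-based image, which is what keeps the per-instance overhead low.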
While this overview still leaves a lot of the specific details unanswered, it should provide a basic description of our security automation framework approach at the architectural and philosophical level. For a security automation project to be a success at the detailed implementation level, it must be customized to the organization’s technology stack and operational flow. As we progress with our implementation, we will continue to post the lessons that we learn along the way.