Every security engineer wants to build the big security automation framework for the challenge of designing something with complexity. Building those big projects have their set of challenges. Like any good coding project, you want to have a plan before setting out on the adventure.
In the last blog, we dealt with some of the high level business concerns that were necessary to consider in order to design a project that would produce the right results for the organization. In this blog we will look at the high level design considerations from the software architect’s perspective. In the next blog, we will look at the implementer’s concerns. For now, most architects are concerned with the following:
This is a concern for both the implementer and architect, but they often have different perspectives. If you are designing a tool that the organization is going to use as a foundation of its security program, then you need to design the tool such that the team can maintain it over time.
Maintainability through project scope
There are already automation and scalability projects that are deployed by the development team. These may include tools such as Jenkins, Git, Chef, or Maven. All of these frameworks are extensible. If all you want to do is run code with each build, then you might consider integrating into these existing frameworks rather than building your own automation. They will handle things such as logging, alerting, scheduling, and interacting with the target environment. Your team just has to write code to tell them what you want done with each build.
If you are attempting a larger project, do you have a roadmap of smaller deliverables to validate the design as you progress? The roadmap should prioritize the key elements of success for the project in order to get an early sense if you are heading in the right direction with your design. In addition, while it is important to define everything that your project will do, it is also important to define all the things that your tool will not perform. Think ahead to all of the potential tangential use cases that your framework could be asked to perform by management and customers. By establishing what is out of scope for your project, you can set proper expectations earlier in the process and those restrictions will become guardrails to keep you on track when requests for tangential features come in.
Maintainability through function delegation
Can you leverage third-party services for operational issues? Can you use the cloud so that baseline network and machine uptime is maintained by someone else? Can you leverage tools such as Splunk so that log management is handled by someone else? What third-party libraries already exist so that you are only inventing the wheels that need to be specific to your organization? For instance, tools like RabbitMQ are sufficient to handle most queueing needs. The more of the “busy work” that can be delegated to third-party services or code, the more time that the internal developers can spend on perfecting the framework’s core mission.
It is important to know where your large scale security framework may be deployed. Do you need to scan staging environments that are located on an internal network in order to verify security features before shipping? Do you need to scan production systems on an external network to verify proper deployment? Do you need to scan the production instances from outside the corporate network because internal security controls would interfere with the scan? Do you want to have multiple scanning nodes in both the internal and external network? Should you decouple the job runner from the scanning nodes so that the job runner can be on the internal network even if the scanning node is external? Do you want to allow teams to be able to deploy their own instances so that they can run tests themselves? For instance, it may be faster if an India based team can conduct the scan locally than to run the scan from US based hosts. In addition, geographical load balancers will direct traffic to the nearest hosts which may cause scanning blind spots. Do you care if the scanners get deployed to multiple geographic locations so long as they all report back to the same database?
It is important to spend time thinking about the tools that you will want your large security automation framework to run because security testing tools change. You do not want your massive project to die just because the tool it was initially built to execute falls out of fashion and is no longer maintained. If there is a loose coupling between the scanning tool and the framework that runs it, then you will be able to run alternative tools once the ROI on the initial scanning tool diminishes. If you are not doing a large scale framework and are instead just modifying existing automation frameworks, the same principles will apply even if they are at a smaller scale.
While the robustness of tests results is an important criterion for tool selection, complex tools often have complex dependencies. Some tools only require the targeted URL and some tools need complex configuration files. Do you just need to run a few tools or do you want to spend the time to make your framework be security tool agnostic? Can you use a Docker image for each tool in order to avoid dependency collisions between security assessment tools? When the testing tool conducts the attack on the remote host, does the attack presume that code injected into the remote host’s environment can send a message back to the testing tool? If you are building a scalable scanning system with dynamically allocated, short-lived hosts that live behind a NAT server, then it may be tricky for the remote attack code to send a message back to the original security assessment tool.
Inputs and outputs
Do the tools require a complex, custom configuration file per target or do you just need to provide the hostname? If you want to scale across a large number of sites, tools that require complex, per-site configuration files may slow the speed at which you can scale and require more maintenance over time. Does the tool provide a single clear response that is easy to record or does it provide detailed, nuanced responses that require intelligent parsing? Complex results with many different findings may make it more difficult to add alerting around specific issues to the tool. They could also make metrics more challenging depending on what and how you measure.
How many instances of the security assessment tool can be run on a single host? For instance, tools that listen on ports limit the number of instances per server. If so, you may need Docker or a similar auto-scaling solution. Complex tools take longer to run which may cause issues with detecting time outs. How will the tool handle re-tests for identified issues? Does the tool have granularity so that dev team can test their proposed patch against the specific issue? Or does the entire test suite need to be re-run every time the developer wants to verify their patch?
Focus and roll out
If you are tackling a large project, it is important to understand what is the minimum viable product? What is the one thing that makes this tool different than just buying the enterprise version of the equivalent commercial tool? Could the entire project be replaced with a few Python scripts and crontab? If you can’t articulate what extra value your approach will bring over the alternative commercial or crontab approach, then the project will not succeed. The people who would leverage the platform may get impatient waiting for your development to be done. They could instead opt for a quicker solution, such as buying a service, so that they can move on to the next problem. As you design your project, always ask yourself, “Why not cron?” This will help you focus on the key elements of the project that will bring unique value to the organization. Your roadmap should focus on delivering those first.
Just because you are building a tool to empower the security team, doesn’t mean that your software won’t have other customers. This tool will need to interact with the development teams’ environments. This security tool will produce outputs that will eventually need to be processed by the development team. The development teams should not be an afterthought in your design. You will be holding them accountable for the results and they need methods for understanding the context of what your team has found and being able to independently retest.
For instance, one argument for integrating into something like Jenkins or GIt, is that you are using a tool the development team already understands. When you try to explain how your project will affect their environment, using a tool that they know means that the discussion will be in language that they understand. They will still have concerns that your code might have negative impacts on their environment. However, they may have more faith in the project if they can mentally quantify the risk based on known systems. When you create standalone frameworks, then it is harder for them to understand the scale of the risk because it is completely foreign to them.
At Adobe, we have been able to work directly with the development teams for building security automation. In a previous blog, an Adobe developer described the tools that he built as part of his pursuit of an internal black belt security training certification. There are several advantages to having the security champions on development teams build the development tools rather than the core security team. One is that full time developers are often better coders than the security teams and the developers better understand the framework integration. Also, in the event of an issue with the tool, the development team has the knowledge to take emergency action. Often times, a security team just needs the tool to meet specific requirements and the implementation and operational management of the tool can be handled by the team responsible for the environment. This can make the development team more at ease with having the tool in their environment and it frees up the core security team to focus on larger issues.
While jumping right into the challenges of the implementation is always tempting, thinking through the complete data flow for the proposed tools can help save you a lot of rewriting. It also important that you avoid trying to boil the ocean by scoping more than your team can manage. Most importantly, always keep focus on the unique value of your approach and the customers that you need to buy into the tool once it is launched. The next blog will focus on an implementer’s concerns around platform selection, queuing, scheduling, and scaling by looking at example implementations.