Rethinking Threat Intelligence with the LEAD Framework


Threat intelligence has been a key component of our detection process for many years. We created the LEAD threat intelligence framework to help security personnel make sense of the threat intelligence data we collect every day. This framework is based on a unique maturity model that combines machine learning (ML) with automation and security orchestration to better deliver actionable and relevant threat intelligence. What does that really mean?

We broke the threat intelligence process down into four fundamental steps. In each step, the threat intelligence must be:

  1. ReLevant
  2. Efficient
  3. Analyst-driven
  4. Deliverable

Within each step, two elements combine to produce an actionable result. Let’s take a deep dive into each step.

ReLevant

To make threat intelligence relevant, you first need to create a threat profile. A threat profile describes the infrastructure you are trying to defend (e.g., enterprise network, servers, mobile devices, POS systems, and IoT) as well as whom you are defending it from. Understanding potential attackers’ motives is important because different types of attackers have different degrees of sophistication and ability to mount complex attacks. For example, attacks from script kiddies and scammers are typically not very complex, but those from more advanced actors can be moderately to very complex. The attacker type then informs which threat intelligence feeds and sources to ingest. Likewise, the infrastructure you are trying to defend helps you pinpoint relevant threat intelligence feeds and define specific filters for them.
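To make the idea concrete, here is a minimal sketch of a threat profile used to filter feeds. The class names, feed names, and asset/adversary labels are all hypothetical and only illustrate the matching described above; a real program would likely use richer filters.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ThreatProfile:
    """What is being defended, and from whom (labels are illustrative)."""
    assets: Set[str] = field(default_factory=set)
    adversaries: Set[str] = field(default_factory=set)


@dataclass
class Feed:
    """A threat intelligence feed and what it claims to cover (hypothetical)."""
    name: str
    covers_assets: Set[str]
    covers_adversaries: Set[str]


def relevant_feeds(profile: ThreatProfile, feeds: List[Feed]) -> List[Feed]:
    """Keep feeds that overlap the profile on both assets and adversaries."""
    return [
        f for f in feeds
        if f.covers_assets & profile.assets
        and f.covers_adversaries & profile.adversaries
    ]


profile = ThreatProfile(
    assets={"pos_systems", "servers"},
    adversaries={"organized_crime", "scammers"},
)
feeds = [
    Feed("retail-sharing-group", {"pos_systems"}, {"organized_crime"}),
    Feed("ics-vendor-feed", {"industrial_control"}, {"nation_state"}),
    Feed("generic-osint", {"servers", "enterprise_network"},
         {"script_kiddies", "scammers"}),
]

selected = [f.name for f in relevant_feeds(profile, feeds)]
print(selected)
```

Requiring overlap on both dimensions is one possible design choice; a looser filter could accept a feed that matches on either assets or adversaries.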

Now that you have your threat profile, you need to define the threat intelligence program requirements. You can think of this as defining what threats you’re potentially facing and what tools you need to help combat these threats.

To help do this, we created a unique threat intelligence maturity model, which maps to the most common maturity model levels used in the cybersecurity industry. This commonality enables fluid transition to the LEAD framework from other maturity models or frameworks.

Here is a more detailed explanation: 

There are commonly two types of threat intelligence programs. Early Stage programs focus primarily on Indicators of Compromise (IoCs), which are the traditional tactical, and oftentimes reactive, indicators used in threat detection. In contrast, Mature Stage programs focus on behavioral indicators, or Indicators of Attack (IoAs), which offer a more proactive way of determining the intent of bad actors.

Once you’ve defined whether you have an early or mature stage threat intelligence program, you can make better decisions about securing the right tools for your program. This helps ensure that you invest in tools that will support your program rather than wasting money on a tool that looks sexy but doesn’t solve the problems you face at your stage. For example, an early stage program will detect cyber-threats using atomic indicators (e.g., IP addresses, file hashes, email addresses, URLs) in an automated manner. These indicators will come from threat intelligence sharing groups or OSINT (open source intelligence), and the data needs to be in a standardized format for the automation to be efficient. Which leads me to the second step.

Efficient

The second step in the LEAD framework is focused on making your threat intelligence efficient. How do you do that? You score and categorize the data.

The LEAD framework uses a scoring matrix that includes five different properties to assist in determining the importance of each piece of data:

  1. Indicator Type – This is the information that tells you where the attack is coming from: IP address, domain name, file path, email address, URL, or file hash.
  2. Threat Intelligence (TI) Source or Feed – The reliability and accuracy of TI data is often related to where the intelligence comes from, whether that’s OSINT, TI sharing groups, paid vendors, or internal threat intelligence.
  3. Threat Source (a.k.a. The Adversary) – Some threat actors and malware families target specific sectors and infrastructure, which makes them rate higher on the threat scale. These sources include script kiddies, scammers, hacktivists, organized crime, and nation-state sponsors.
  4. Threat Context – One of the most important factors influencing the score, threat context describes how the attack or threat is being carried out. For example, is it a malware threat, a technique catalogued in MITRE ATT&CK, a SQL injection, or a particular phase of the Cyber Kill Chain? Context is also commonly known as TTPs: tactics, techniques, and procedures.
  5. Data Retention – Is the threat intel data historical or new?

Here is an example of a basic threat profile:

Indicator: 1.1.1.1
Indicator Type: IP Address
Threat Intelligence Source: OSINT (Open source intelligence)
Threat Context: Targets macOS, targets only EU companies, communicates over port 80, used only for exfiltrating data (Cyber Kill Chain Phase: Exfiltration)
Data Retention: Last used two months ago

We then use this information to assign a positive or negative score to each property based on the overall threat profile. So, for a profile like the one above, the scoring matrix might look like this:

Indicator Type:

IP                                      +1
Domain                                  +2
File Hash                               +2
Credit card data                        +3
Email                                   +3

TI Data Source:

OSINT                                   +1
TI sharing                              +1
Paid feed                               +2
Internal TI                             +3

Threat Attribution:

Organized crime                         +3
Scammers                                +2

Context:

Infection Vector – Phishing             +2
Targeted Sector – E-commerce            +4
Targeted Region – Europe                +4
Targeted OS – Windows                   +2

Data Retention:

Indicators last seen < 3 months         +2
Indicators last seen > 3 months         -2

Once scored, each threat is then categorized by use case and stakeholder. This helps determine the threat level and whether the threat is currently active, expired, or has previously resulted in a false positive. In this example, incoming IP traffic from OSINT that is older than three months will have a lower score than a threat to credit card information from e-commerce companies that you found out about from a paid TI source.
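The scoring step can be sketched in a few lines of Python. The weights below are copied from the matrix above, but the dictionary keys, the `score` function, and the choice to simply add the per-property values together are assumptions made for illustration; the post does not specify how the individual scores are combined.

```python
# Weights taken from the scoring matrix above; key names are illustrative.
INDICATOR_TYPE = {"ip": 1, "domain": 2, "file_hash": 2,
                  "credit_card_data": 3, "email": 3}
TI_SOURCE = {"osint": 1, "ti_sharing": 1, "paid_feed": 2, "internal_ti": 3}
ATTRIBUTION = {"organized_crime": 3, "scammers": 2}
CONTEXT = {
    "infection_vector:phishing": 2,
    "targeted_sector:e-commerce": 4,
    "targeted_region:europe": 4,
    "targeted_os:windows": 2,
}


def retention_score(months_since_last_seen):
    """+2 for recent indicators, -2 for stale ones.

    The matrix does not say how exactly three months is treated;
    this sketch assumes it counts as stale.
    """
    return 2 if months_since_last_seen < 3 else -2


def score(indicator):
    """Sum the per-property scores (simple addition is an assumption)."""
    total = INDICATOR_TYPE.get(indicator["type"], 0)
    total += TI_SOURCE.get(indicator["source"], 0)
    total += ATTRIBUTION.get(indicator.get("attribution"), 0)
    # Context tags that are not in the matrix contribute nothing.
    total += sum(CONTEXT.get(tag, 0) for tag in indicator.get("context", []))
    total += retention_score(indicator["months_since_last_seen"])
    return total


stale_osint_ip = {"type": "ip", "source": "osint",
                  "context": [], "months_since_last_seen": 4}
paid_card_threat = {"type": "credit_card_data", "source": "paid_feed",
                    "context": ["targeted_sector:e-commerce"],
                    "months_since_last_seen": 1}

print(score(stale_osint_ip), score(paid_card_threat))
```

With these inputs the stale OSINT IP scores lower than the credit card threat from a paid feed, which mirrors the comparison described above.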

Analyst-driven

In this step, threat intelligence becomes dynamic rather than static by using feedback from analysts and other stakeholders to filter, categorize, and re-evaluate the threat intelligence data from both internal and external feeds. This analyst feedback loop makes the TI data dynamic or, in other words, self-tuning. To further improve the data, we use Machine Learning (ML), specifically Natural Language Processing (NLP), to review the analysts’ feedback and create keywords and tags that, with the help of orchestration, let us bring the data back into the framework and further improve scoring and categorization.

Because the feedback provided by the analysts and/or stakeholders is typically supplied in free text format and with multiple data structures, the LEAD framework uses Machine Learning, specifically Natural Language Processing, to analyze the feedback and extract keywords. These keywords are then used to inform the context property of the scoring matrix above. Based on these attributes, the score of a piece of threat intelligence data will increase or decrease. The entire process requires orchestrating multiple automations, resulting in dynamic, or self-tuning, threat intelligence data.
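As a rough illustration of the feedback loop, the sketch below pulls keywords out of free-text analyst notes and adjusts a score accordingly. It uses plain phrase matching as a stand-in for NLP, and the keyword rules, tag names, and the -3 penalty for false positives are hypothetical values chosen for the example, not part of the LEAD framework as described.

```python
import re

# Hypothetical rules: phrase -> (context tag, score adjustment).
KEYWORD_RULES = {
    "phishing": ("infection_vector:phishing", 2),
    "false positive": ("analyst:false_positive", -3),
    "e-commerce": ("targeted_sector:e-commerce", 4),
}


def extract_tags(feedback):
    """Return (tag, adjustment) pairs for phrases found in one note."""
    text = feedback.lower()
    found = []
    for phrase, (tag, delta) in KEYWORD_RULES.items():
        # Match the phrase only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text):
            found.append((tag, delta))
    return found


def apply_feedback(base_score, feedback_notes):
    """Adjust a score from analyst notes, counting each tag only once."""
    tags = []
    score = base_score
    for note in feedback_notes:
        for tag, delta in extract_tags(note):
            if tag not in tags:
                tags.append(tag)
                score += delta
    return score, tags


notes = [
    "Confirmed phishing lure aimed at e-commerce checkout pages.",
    "Seen again in a Phishing campaign.",
]
new_score, tags = apply_feedback(5, notes)
print(new_score, tags)
```

A production pipeline would replace `extract_tags` with a real NLP model, but the surrounding loop (extract tags, map them to the context property, re-score) would stay much the same.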

Deliverable

Using standardized data formats is important in delivering relevant and actionable TI data most efficiently. Some important factors to consider include:

  • Flexible API – Exposing all TI data attributes through an API helps create a seamless automation process between the TI data and its consumers/stakeholders.
  • STIX 2.0/JSON data structure – Contributing and ingesting TI becomes a lot easier with STIX. All aspects of suspicion, compromise, and attribution can be represented clearly with objects and descriptive relationships. STIX information can be visually represented for an analyst or stored as JSON to be quickly machine-readable. STIX’s openness allows it to be integrated into existing tools and products or adapted to your specific analyst or network needs.
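For a sense of what this looks like in practice, the following sketch builds a STIX 2.0 indicator for an IP address and wraps it in a bundle using only the Python standard library. The IP address comes from a documentation range, and `x_lead_score` is a hypothetical custom property invented here to carry a score; it is not a standard STIX field.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt):
    """Format an aware datetime as a UTC timestamp with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def ip_indicator(ip, score, now=None):
    """Build a STIX 2.0 indicator object for an IPv4 address."""
    ts = stix_timestamp(now or datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "labels": ["malicious-activity"],
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "valid_from": ts,
        "x_lead_score": score,  # hypothetical custom property
    }


def bundle(objects):
    """Wrap STIX objects in a STIX 2.0 bundle."""
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "spec_version": "2.0",
        "objects": objects,
    }


fixed_time = datetime(2020, 2, 4, 12, 0, tzinfo=timezone.utc)
indicator = ip_indicator("198.51.100.23", score=8, now=fixed_time)
print(json.dumps(bundle([indicator]), indent=2))
```

Serving this JSON from an API endpoint is one way to combine the two factors above, since consumers can parse it without any knowledge of how the score was produced.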

Metrics

Alongside the four steps, the LEAD framework generates metrics that help measure the success of the TI program and justify its implementation to management. You should always choose metrics over which you have direct control. While you can’t control the number of threat actors or actual threats you might encounter in a given period, you can control how many of your services and applications are currently able to collect useful intelligence about possible attacks against them. This keeps the focus on a manageable goal (e.g., broad coverage) rather than a goal that depends on too many uncontrollable external factors.

In summary, effective metrics should be:

  • Audience-specific
  • Related to how and where TI is used, e.g., did it help prevent or detect a specific threat?
  • Not actor-driven
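A coverage metric of the kind described above could be computed as simply as this. The service names and the boolean "collecting intelligence" flag are made up for the example; how you decide that a service is collecting useful intelligence is up to your program.

```python
def coverage(services):
    """Share of services that currently collect useful threat intelligence."""
    if not services:
        return 0.0
    collecting = sum(1 for is_collecting in services.values() if is_collecting)
    return collecting / len(services)


# Hypothetical inventory: service name -> currently collecting intelligence?
services = {
    "web-shop": True,
    "payment-api": True,
    "vpn-gateway": False,
    "mail-relay": True,
}

print(f"TI collection coverage: {coverage(services):.0%}")
```

Because the inventory is something you own, this number moves only when you act, which is what makes it a controllable metric.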

Conclusion

We created the LEAD threat intelligence framework to help security personnel make sense of the volumes of threat intelligence data we collect every day, aiding the detection of the most critical threats and the speed of remediation. Based on a unique maturity model that combines machine learning with automation and security orchestration, the LEAD framework uses a four-step process to deliver actionable and relevant threat intelligence to our security personnel, helping ensure the security of our infrastructure—and your data.

Filip Stojkovski
Manager, Threat Intelligence


Major Initiatives, Ongoing Research, Security Automation

Posted on 02-04-2020