Automating Enterprise SAML Security Tests – Part I

Major Initiatives, Ongoing Research, Security Automation

Single Sign-On (SSO) applications are becoming increasingly prevalent in organizations today. While there are many different SSO configuration types, Security Assertion Markup Language (SAML) is one of the most common in enterprise environments. Unfortunately, the current SAML 2.0 version is also old (introduced in 2005), complex, and prone to misconfiguration, which can result in critical authentication and authorization vulnerabilities. Most large organizations likely have hundreds or thousands of applications that have been configured with SAML over the past 15 years, and many new applications still choose to incorporate SAML over other options, like OAuth or Central Authentication Service (CAS). The combination of all these factors can often result in a gold mine of findings for security teams to uncover.

In light of a recent Adobe Security Blog post, Creatively Scaling Application Security Coverage and Depth, it is also worth recognizing the importance of automating projects like this one that yield numerous high-impact findings. Not only do these findings help close potential security holes, but they also highlight where the application security review processes can improve. As mentioned in that article, security teams can get creative in how they approach these tests and implement a process that is scalable, has low false positives, and can point to areas of improvement. Because of how SAML is incorporated into an existing environment, we are provided with a valuable opportunity to programmatically analyze each workflow and follow up with better preventative controls.

In this blog post, we will explain in detail how an organization can gather an inventory of SAML-based applications, test for vulnerabilities in each workflow, and then effectively validate and report those findings with minimal false positives. We will also shed light on common mistakes that can complicate and slow down a project and provide useful tips and tricks that can help avoid these pitfalls. Lastly, we will outline some follow-up actions and controls that can be put into place after testing has been completed and offer a few side-project ideas that can be taken up alongside or after the initial project. Now, let’s dive into how to prepare for testing.

Scaling SAML Tests

Application Inventory

While many organizations suffer from inadequate application inventories, SAML environments fill that gap with a collection of pre-onboarded applications. In order to issue SAML login requests to applications or Service Providers (SPs), an Identity Provider (IDP) is configured to handle initial user authentication (typically against Active Directory) and then communicate that identity to each SP during the respective login. Regardless of the robustness and quality of an organization’s existing application inventory, IDPs house a well-formatted catalog of already-configured applications and contain all the information necessary for various kinds of SAML tests (see Figure 1). Completeness and uniformity of the application data found here facilitate automation of tests by providing Assertion Consumer Service (ACS) URLs and other SAML-specific information about onboarded applications. Whether it’s an in-house IDP or a SaaS product in use, there will be a database or API somewhere that can offer this inventory. 
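As a rough illustration of turning such a catalog into test targets, the sketch below parses a JSON export into uniform records. The export format and the field names (`name`, `acs_url`, `entity_id`, `active`) are assumptions made for this example; a real IDP's database or API will use its own schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SamlApp:
    """One onboarded Service Provider, as recorded by the IDP."""
    name: str
    acs_url: str
    entity_id: str
    active: bool


def load_inventory(raw_json: str) -> List[SamlApp]:
    """Parse a (hypothetical) JSON export of the IDP catalog."""
    apps = []
    for item in json.loads(raw_json):
        acs_url = item.get("acs_url", "").strip()
        if not acs_url:
            # Without an ACS URL there is nowhere to send a SAML Response.
            continue
        apps.append(SamlApp(
            name=item.get("name", "unknown"),
            acs_url=acs_url,
            entity_id=item.get("entity_id", ""),
            active=bool(item.get("active", True)),
        ))
    return apps


def active_apps(apps: List[SamlApp]) -> List[SamlApp]:
    """Keep only the applications that are still enabled in the IDP."""
    return [app for app in apps if app.active]
```

Normalizing the inventory once, up front, means every later test stage can iterate over the same list of records regardless of how the IDP stores them.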

In addition to having an IDP that houses SAML applications and issues signed SAML Responses, SAML tests will also require a test user in the IDP that has login permissions to all applications to be tested. While such “super accounts” go against the security best practice of limiting the number of powerful, privileged users, one is necessary in order to conduct tests at scale. A few ways to limit the risk that accompanies such a powerful IDP account are to enable the account only during testing and disable it again when done, to require multi-factor authentication (MFA), and to limit the account’s permissions to read-only.
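One way to keep the disable step from being forgotten is to tie it to the test run itself. The sketch below assumes the IDP offers some way to toggle the account; `enable` and `disable` are hypothetical callables standing in for whatever your IDP's API or admin tooling provides.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporarily_enabled(enable: Callable[[], None],
                        disable: Callable[[], None]) -> Iterator[None]:
    """Enable the test account for the duration of a scan only.

    `enable` and `disable` are placeholders (assumptions) for calls
    into your own IDP; they are not part of any real library.
    """
    enable()
    try:
        yield
    finally:
        # Runs even if the scan raises, so the account is never left on.
        disable()
```

A scan would then run inside `with temporarily_enabled(enable_account, disable_account): ...`, so a crash mid-scan still leaves the account disabled.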

[Figure 1]

Finding SAML Vulnerabilities

As much research has already been conducted and shared on testing the security of SAML integrations, this post will not go into detail on common vulnerabilities. However, some resources and additional analysis around these issues and how they originate in an environment are provided at the end of this post under Appendix A.

Several SAML security tools are publicly available for manually testing single applications, but we have yet to find one that can test multiple SAML workflows back to back. Most public SAML tools are built for semi-manual usage, probably to appeal to a wider, more diverse audience and because each test case is unique. For example, different IDPs will each have their own distinct means of authenticating users, managing SAML inventory and application access, and issuing valid SAML Responses. Because of this, specialized tooling that can drive the tests from beginning to end needs to be created for each unique environment. In our case, we used Python to authenticate to our IDP and then produce valid SAML Responses for each active SAML application. We then incorporated a customized version of SAML Raider, a Burp Suite plugin, into our Python automation using Py4J in order to conduct various SAML tests. At this point, further specialized automation is needed to determine the correctness of findings.
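To make the idea of a scripted SAML test concrete, here is a minimal standard-library sketch of one of the simplest mutations, signature stripping. It is not how SAML Raider implements the test and not our production code; it only assumes the HTTP-POST binding, where the `SAMLResponse` form field carries base64-encoded XML.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of XML Signature elements (ds:Signature).
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"


def strip_signatures(saml_response_b64: str) -> str:
    """Return the SAMLResponse with every ds:Signature element removed."""
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    signature_tag = f"{{{DSIG_NS}}}Signature"
    # ElementTree has no parent pointers, so visit every element and
    # drop any direct child that is a Signature.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == signature_tag:
                parent.remove(child)
    # Re-serialization may rename namespace prefixes; that is harmless
    # here because no signature is left to invalidate.
    return base64.b64encode(ET.tostring(root)).decode("ascii")
```

The returned value would be posted to the application's ACS URL in place of the original `SAMLResponse`. An application that still logs the user in is not verifying signatures.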

Unfortunately, there are virtually infinite ways an application could respond to a SAML Response. In the case of a legitimate login, it could return a Set-Cookie or an Authentication header, or it could return a session ID in the URL or in a hidden HTML input tag. Along the same lines, there are several different potential responses for a failed login, which could include different combinations of status codes, error messages and redirects. It is highly unlikely that all of an organization’s applications respond to SAML Responses the same way, and even less likely that applications in different organizations do. Thus, automated testing is most likely to be successful when using a combination of public tools and environment-specific code.
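The login indicators listed above can be collected with a small heuristic like the following sketch. The header names and the session-parameter pattern are assumptions chosen for illustration and would need tuning for a real environment.

```python
import re
from typing import Mapping, Set
from urllib.parse import parse_qs, urlparse

# Assumed naming patterns for session-like parameters; adjust per environment.
SESSION_NAME = re.compile(r"sess|sid|token|auth", re.I)
HIDDEN_INPUT = re.compile(r"<input\b[^>]*type=[\"']?hidden[\"']?[^>]*>", re.I)
NAME_ATTR = re.compile(r"name=[\"']?([^\"'\s>]+)", re.I)
# Assumed header names that may carry credentials in a response.
AUTH_HEADERS = ("authorization", "authentication", "x-auth-token")


def auth_signals(headers: Mapping[str, str], final_url: str, body: str) -> Set[str]:
    """Return the set of login indicators present in one HTTP response."""
    signals = set()
    lowered = {name.lower() for name in headers}
    if "set-cookie" in lowered:
        signals.add("set-cookie")
    if any(name in lowered for name in AUTH_HEADERS):
        signals.add("auth-header")
    # Session identifier passed as a query parameter.
    query = parse_qs(urlparse(final_url).query)
    if any(SESSION_NAME.search(param) for param in query):
        signals.add("url-session-id")
    # Session identifier embedded in a hidden form field.
    for tag in HIDDEN_INPUT.findall(body):
        match = NAME_ATTR.search(tag)
        if match and SESSION_NAME.search(match.group(1)):
            signals.add("hidden-input")
    return signals
```

None of these signals proves a login on its own (many applications set a cookie on every response), which is why the comparison against a known-good login described later matters more than any single indicator.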

Potential Setbacks & Tips

During automation of SAML tests, there are several potential setbacks that engineers should be mindful of before starting:

  1. If no API is available for the IDP, then authenticating to it in order to retrieve the SAML inventory could prove challenging, especially if MFA is required. Full automation here may depend on the ability to programmatically approve MFA requests. This issue could also arise as SAML Responses are generated for SPs that require MFA approval in addition to the initial SAML factor. Depending on how many SPs in an IDP require MFA and whether the MFA can be automated, this could take a heavy toll on test speed and developer sanity! If MFA is hindering a project, check whether there is a specialized utility account for automation that can automatically approve or bypass MFA restrictions to speed up testing. Otherwise, perhaps an MFA client could be installed on the laptop/server that runs the code and the script could programmatically accept those MFA challenges.
  2. Discrepancies between private/public tool languages can require inventive workarounds to ensure code bases run smoothly together. For example, we used Py4J to allow our Python automation code to interact with our Java test code. Additionally, a lack of documentation and support for open source tools can slow down development and require additional work to maintain a secure, working state. Try to stick to well-vetted and maintained public tools where possible to avoid reinventing the wheel.
  3. A seemingly infinite variety of different responses to both good and bad SAML Responses means that it can be very hard to be 100% certain about the results of a test case. While some newer environments may strongly adhere to more modern, standard practices, like setting session cookies or Authentication headers, most companies probably have a mixture of older applications that handle authentication differently from one another. Verifying true and false positives by analyzing deviations from a legitimate login will be explored more in the next section. Keep this approach in mind as you start thinking about how to test.
  4. Vendor or SaaS applications that have been onboarded to your IDP may not support automatic account provisioning via SAML. That means that although you may have access to the application in your IDP, the SaaS application may not recognize you as a registered user when you attempt to authenticate. These applications may be difficult to test because they will never result in successful authentication until an account is created for your scans or a legitimate account is identified and impersonated (assuming the app is vulnerable). SAML tests may still work, though, depending on when an application does its signature checks. Most apps will likely go through normal SAML processing and validation first and then look up the user. In this case, tests can be conducted normally, but instead of looking for successful logins, look for deviations from the legitimate attempted login response, as mentioned in point #3 above. If an application looks up the Subject (username) before validating signatures, then a testing account would need to be provisioned or you would need permission to attempt to impersonate someone who already has access. Unfortunately, these cases may require manual intervention and setup. Consider running a side scan for common “user not found” responses to see where you may need to manually provision a user for SAML scans.
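The side scan suggested in point 4 could be as simple as the following sketch, which searches the response bodies of legitimate login attempts for provisioning errors. The phrases in the pattern are guesses at common wording, not an authoritative list, and `baseline_bodies` is a hypothetical mapping of application name to response body.

```python
import re
from typing import Dict, List

# Assumed wording of "user not provisioned" errors; extend as you find more.
USER_NOT_FOUND = re.compile(
    r"(user|account)\s+(was\s+)?(not\s+found|does\s+not\s+exist|not\s+provisioned)"
    r"|unknown\s+user|no\s+such\s+user",
    re.I,
)


def needs_provisioning(baseline_bodies: Dict[str, str]) -> List[str]:
    """List apps whose legitimate-login response says the user is unknown."""
    return sorted(
        app for app, body in baseline_bodies.items()
        if USER_NOT_FOUND.search(body)
    )
```

Applications flagged this way are candidates for manual account provisioning, or for deviation-based testing if they validate signatures before looking up the user.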

Eliminating False Positives

Given the likely mixture of applications’ different responses to legitimate and illegitimate SAML flows, creativity will play a strong role in determining how to best differentiate successful versus failed test cases. Instead of attempting to enumerate all the various ways to identify a successful login after issuing a modified assertion, it is much easier to simply match a test’s response to that of a known successful login. For example, one could first run legitimate logins across all SAML apps, save those results, and then run their various tests against all the SAML apps. The tests that result in HTTP responses that match the legitimate responses are likely true positives. Depending on how the tests were conducted, this can produce highly accurate results.
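A minimal sketch of this baseline-matching idea follows. It reduces each response to a small fingerprint so that per-session values (cookie values, query strings, long numeric or hex tokens in the body) do not cause spurious mismatches. The choice of fields and the "volatile token" pattern are assumptions for illustration.

```python
import hashlib
import re
from typing import FrozenSet, Iterable, NamedTuple


class Fingerprint(NamedTuple):
    status: int
    location_path: str           # redirect target without its query string
    cookie_names: FrozenSet[str]  # cookie names only; values change per session
    body_hash: str


# Assumed shape of per-session noise: long hex strings or long numbers.
VOLATILE = re.compile(r"[0-9a-fA-F]{16,}|\d{6,}")


def fingerprint(status: int, location: str,
                set_cookies: Iterable[str], body: str) -> Fingerprint:
    """Reduce one HTTP response to the parts expected to be stable."""
    normalized = VOLATILE.sub("#", body)
    return Fingerprint(
        status=status,
        location_path=location.split("?", 1)[0],
        cookie_names=frozenset(c.split("=", 1)[0].strip() for c in set_cookies),
        body_hash=hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    )


def likely_true_positive(baseline: Fingerprint, test: Fingerprint) -> bool:
    """A tampered login that looks like the legitimate one is suspicious."""
    return baseline == test
```

In use, the baseline fingerprint would be recorded once per application from a legitimate login, and each test response compared against it. How aggressively to normalize the body is a trade-off between missed matches and false matches.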

If the technique above does not provide sufficient confidence of a true positive, then additional checks could be performed to prove successful authentication. Some potential methods could include analyzing each request’s response time or the contents of the response headers and body. Perhaps a valid login takes longer as additional resources are fetched and loaded? Or perhaps a successful login returns a common authentication header? Or maybe a successful login returns commonly used HTML, like profile pictures, copyright dates or navigation bars? One could also use the same or similar logic to identify when a test is not successful, like when an error has prevented an exploit from succeeding, or a login failure has redirected the user back to a login form. 

Furthermore, there may be other out-of-band methods to validate a successful login, like application logging. However, scaling this validation method would require that all SAML applications log authentication results in the same consumable format and that such logs can be programmatically aggregated and fetched for analysis.

As a last resort, high-confidence findings can be manually validated. Using a combination of the other validation methods described above, a confidence rating – think of a pendulum – could be applied to findings, and then follow-up analysis can be performed after the confidence rating passes a predefined threshold.
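The confidence rating could be sketched as a simple point system that combines the checks discussed in this section. The evidence fields, weights, and threshold below are arbitrary illustrative values, not figures from our project.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    matches_baseline: bool      # response matches the legitimate login
    has_auth_signal: bool       # e.g. session cookie or auth header present
    has_logged_in_markup: bool  # e.g. navigation bar or profile picture
    redirected_to_login: bool   # sent back to a login form
    timing_ratio: float         # test response time / baseline response time


# Hypothetical threshold (0-100) above which a finding is queued for review.
REVIEW_THRESHOLD = 70


def confidence(e: Evidence) -> int:
    """Combine the individual checks into a 0-100 score (weights are assumed)."""
    score = 0
    if e.matches_baseline:
        score += 50
    if e.has_auth_signal:
        score += 20
    if e.has_logged_in_markup:
        score += 20
    if 0.5 <= e.timing_ratio <= 2.0:
        score += 10
    if e.redirected_to_login:
        score -= 60  # strong sign that the tampered login was rejected
    return max(0, min(100, score))


def needs_manual_review(e: Evidence) -> bool:
    return confidence(e) >= REVIEW_THRESHOLD
```

Evidence pushes the score up or down, much like the pendulum mentioned above, and only findings that end up past the threshold consume a reviewer's time.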

XSW Validation

Another crucial point to keep in mind when validating results is that applications will respond differently to signature stripping and self-signing attacks versus XML Signature Wrapping (XSW) attacks. While directly removing or modifying a signature will result in a simple pass/fail scenario, the mere acceptance of an XSW payload by a SAML ACS does not prove that the ACS is vulnerable to XSW attacks. Because XSW attacks contain both the original and the modified Assertion or Response nodes, multiple tests need to be conducted for each XSW arrangement to determine which node was processed. In the scenario where an ACS successfully authenticates an XSW attack, the ACS could have ignored the fraudulent XML and processed the correctly signed and referenced XML, which does not demonstrate a vulnerability.

One way to validate an XSW vulnerability is to attempt to impersonate another user. However, that would require a second high-privilege account and the ability to determine which user in the XSW attack was authenticated. Identifying the currently logged-in user isn’t always possible, though it could potentially be done by checking the response HTML or text for a username or other identifying attribute. Instead, sending two slightly different payloads for the same XSW arrangement could indicate which Assertion or Response node is being processed by the ACS. For example, in the case of an XSW attack using a cloned Assertion, the first payload would be sent as a simple XSW attack containing two Assertions for a single legitimate user. Then an additional payload would be sent with a missing NameID (or user Subject) in the cloned Assertion. If both tests result in successful authentication, then the ACS is likely ignoring the duplicate Assertion and correctly processing the legitimately signed Assertion. If both tests fail, then the app is probably throwing an error because of duplicate Assertions (arguably the best response to a probable attack). However, if the first test succeeds and the second test fails, then we know that the ACS is successfully processing the cloned and modified XML, thus allowing user impersonation via XSW attacks. So, if you were testing eight different XSW arrangements, then you would perform sixteen XSW tests for each application: two tests for each of the eight XSW arrangements.
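The paired-test logic above can be written down as a small decision function. This is a sketch only: the arrangement labels are hypothetical, and the fourth combination (first test fails, second succeeds) is not covered in the discussion above, so it is treated here as inconclusive by assumption.

```python
from enum import Enum
from typing import Dict, List, Tuple


class XswVerdict(Enum):
    VULNERABLE = "cloned assertion was processed"
    IGNORES_CLONE = "signed assertion was processed, clone ignored"
    REJECTS_DUPLICATES = "payload rejected"
    INCONCLUSIVE = "unexpected combination, review manually"


def interpret_pair(plain_ok: bool, missing_nameid_ok: bool) -> XswVerdict:
    """Interpret the two tests run for one XSW arrangement.

    plain_ok:          login succeeded with the plain XSW payload
    missing_nameid_ok: login succeeded when the clone lacked its NameID
    """
    if plain_ok and missing_nameid_ok:
        return XswVerdict.IGNORES_CLONE
    if not plain_ok and not missing_nameid_ok:
        return XswVerdict.REJECTS_DUPLICATES
    if plain_ok and not missing_nameid_ok:
        # Removing the clone's NameID broke the login, so the clone is
        # the node the ACS actually reads.
        return XswVerdict.VULNERABLE
    # Not described in the text; flagged for a human (assumption).
    return XswVerdict.INCONCLUSIVE


def vulnerable_arrangements(results: Dict[str, Tuple[bool, bool]]) -> List[str]:
    """Return the arrangement labels whose test pair indicates XSW."""
    return sorted(
        name for name, (plain_ok, missing_ok) in results.items()
        if interpret_pair(plain_ok, missing_ok) is XswVerdict.VULNERABLE
    )
```

With eight arrangements, `results` holds eight pairs, which corresponds to the sixteen requests per application described above.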

I will be offering more best practices around enterprise SAML security in the next part of this blog later this month.

Ty Anderson
Security Researcher



Posted on 10-06-2020