Sunday 10 January 2010

The Need for Standards to Evaluate Static Analysis Tools

In January 2010, in the security static analysis space (also called SAST, for Static Application Security Testing; you can download Gartner's Magic Quadrant report from Fortify's website) there are a number of security-focused commercial products (and services) for analyzing an application's source code (or binaries): Fortify SCA, IBM's Source Edition (formerly Ounce Labs) and Developer Edition, Armorize CodeSecure, CodeScan, Veracode Security Review, Microsoft's CAT.NET, Coverity Static Analysis, Klocwork TruePath, Parasoft Application Security Solution and Art-of-Defence HyperSource. (I didn't include any open source tool because I am not aware of any actively used one that is able to perform security-focused taint-flow analysis.)


The problem is that we don't have any standards, methodologies or test-cases for objectively and pragmatically evaluating, comparing and rating these different tools!

That creates a big problem for buyers, because they are not able to make knowledgeable decisions about which is the best tool for their target applications.

For example, one of the most fundamental issues we have when looking at the results from this type of tool is the lack of visibility into what they have (or have not) done. Namely, we need to know what the tools know, and what the tools don't know. That's the only way we can actually be assured that the tool(s) actually worked and that the results we have (or don't have) are actually meaningful.

Key Concept: When we review a tool's report, it is as important to know its blind spots as it is to know what it found. If you have a scan report of an application with NO (i.e. zero) High or Critical issues, is it because there were NO vulnerabilities, or because the scanner had NO VISIBILITY of what's going on in the target application? (This is very common if that application uses a framework like Struts or Spring MVC; see the sketch below.)
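
To make the blind-spot problem concrete, here is a minimal sketch (hypothetical class and method names, not taken from any specific tool's test suite) of why framework support matters. In a Spring MVC controller, user input arrives via framework parameter binding rather than an explicit call to HttpServletRequest.getParameter(), so a taint engine that only models the servlet API as a source will see no tainted data entering this method and will happily report zero findings, even though a SQL injection is sitting right there:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class AccountController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @RequestMapping("/account")
    public String show(@RequestParam("id") String accountId, Model model) {
        // 'accountId' is attacker-controlled, but only a scanner that
        // understands @RequestParam binding will mark it as a taint source.
        String sql = "SELECT * FROM accounts WHERE id = '" + accountId + "'";
        model.addAttribute("account", jdbcTemplate.queryForMap(sql));
        return "account";
    }
}
```

A scanner that ships framework rules for Spring MVC flags this immediately; one that doesn't produces a clean report, and without visibility into the tool's capabilities the buyer has no way to tell the two outcomes apart.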

In order to know/predict how effective one of these tools can be, we need standard ways to list its capabilities and to compare/map them to what the target application(s) actually contain (namely the languages and frameworks used).

As an example, if a tool has problems following interfaces (i.e. it doesn't follow calls through interface implementations), then even before we scan the code we should be able to say, "Well... XYZ tool is going to have a problem with this application." The sketch below shows what that capability gap looks like in code.
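
Here is a minimal sketch (hypothetical names, plain JDK code) of that interface-resolution problem: the tainted value flows from the caller through Lookup.find() into the SQL sink inside DatabaseLookup. A tool that does not resolve the call through the interface's implementation loses the trace at lookup.find(userId) and reports nothing:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

interface Lookup {
    String find(String key) throws SQLException;
}

class DatabaseLookup implements Lookup {
    private final Connection connection;

    DatabaseLookup(Connection connection) {
        this.connection = connection;
    }

    @Override
    public String find(String key) throws SQLException {
        // The actual sink: 'key' is concatenated straight into the query.
        Statement stmt = connection.createStatement();
        ResultSet rs = stmt.executeQuery(
            "SELECT name FROM users WHERE id = '" + key + "'");
        return rs.next() ? rs.getString("name") : null;
    }
}

class UserService {
    private final Lookup lookup;   // declared type is the interface, not the class

    UserService(Lookup lookup) {
        this.lookup = lookup;
    }

    String handle(String userId) throws SQLException {
        // Taint must be propagated through this interface call to reach the sink.
        return lookup.find(userId);
    }
}
```

If a capability list told us up front that tool XYZ cannot resolve interface implementations, and we know the target application is built around dependency injection and interfaces, we could predict its blind spots before buying a single license.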

This type of visibility will not only allow buyers to make much more informed decisions, but will also allow them to use these tools effectively in their organization (and get a higher ROI).

Ultimately, I actually think that the short-term solution (until the industry and technology mature) is that most large companies will have to buy multiple tools and services! The reason is simple: given the variety of technologies and programming practices that they have internally, only by using multiple tools will they be able to get the coverage and quality of results that they need (note that these tools would have to be driven by knowledgeable security teams or 'security savvy developers').

Note 1: WASC has done a good job with WASSEC (Web Application Security Scanner Evaluation Criteria). The problem is that, as far as I am aware, there have been no public & peer-reviewed test-cases and ratings (which means that anybody wanting to use WASSEC will have to pay somebody (internally or externally) to perform the tool comparison).

Note 2: NIST is also trying to map the tools' performance via its SATE efforts, but my understanding is that vendor participation is not as good as expected and the results are not fully published (they do seem to have good test cases, which should be included in a 'static analysis test-cases' suite).