What is this?
In an effort to improve my writing and analysis skills, I’m going to review papers using less than 500 words. This is my first attempt.
Overview
Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications is a paper written by Davide Balzarotti et. al., and was published at the IEEE Symposium on Security and Privacy in 2008.
Saner attempts to solve the problem of verifying the correctness of sanitization functions. Previous work on analyzing web applications for vulnerabilities assume that built-in sanitization functions completely protect the application from vulnerabilities. This assumption is typically extended to custom sanitization functions (regular expressions, string replacements, etc.)
Proper analysis of sanitization functions would enable a tool to be more precise about the vulnerabilities that it discovers. It can also be used to analyze a language’s built-in sanitization functions.
Saner utilizes static and dynamic approaches to analyze sanitization functions.
The static part was built by extending Pixy to keep track of the string values that each variable can hold. Saner can see if a variable can be used as output and if it is used in the output. However, the method used to keep track of the string values is an over-approximation, which might produce false-positives (but not false-negatives).
A dynamic approach is used to reduce the number of false-positives by generating inputs and seeing if those inputs trigger a vulnerability. In this way, Saner can present all the verified vulnerabilities, but if the user wishes, also present all the possible vulnerabilities so the user can investigate.
Thoughts
Possible Problems
Saner inherits the same limitations as Pixy, namely it does not support PHP’s eval function and aliased array elements.
Future Work
Context-aware
An extension to this (and other static web analyzers) would be to use the context of a variables output in the HTML page. For instance, variables that output to the headers of an HTTP response are vulnerable to HTTP Response Splitting and need to disallow ‘\r’ and ‘\n’, while these characters are safe when output in the HTML response. Another example is a variable that is output after a starting script tag but before the ending tag to customize the JavaScript sent to the user. Here’s a simple example of this:
<script>
var userName = "<?php echo $userName; ?>";
</script>
In this case, restricting only ‘<’ and ‘>’ will not work. The idea of context can be extended to attributes of HTML tags.
Database-aware
Another problem is how to treat variables from the database: are they sanitized or not? A static analyzer that is able to properly model and taint the flow of data into and out of the database would be very cool (and if you know of someone who’s done this, let me know).