What is this?
In an effort to improve my writing and analysis skills, I’m going to review papers using fewer than 500 words. This is my first attempt.
"Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications" is a paper by Davide Balzarotti et al., published at the IEEE Symposium on Security and Privacy in 2008.
Saner attempts to solve the problem of verifying the correctness of sanitization functions. Previous work on analyzing web applications for vulnerabilities assumes that built-in sanitization functions completely protect the application. This assumption is typically extended to custom sanitization functions (regular expressions, string replacements, etc.), even though nothing guarantees that they are correct.
Proper analysis of sanitization functions would let a tool be more precise about the vulnerabilities it reports. The same analysis can also be applied to a language’s built-in sanitization functions.
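To make the problem concrete, here is a minimal sketch of the kind of custom sanitizer that a tool would wrongly trust. The function name and the payload are my own illustration, not taken from the paper. The sanitizer removes script tags in a single pass, so a nested payload reassembles itself once the inner tags are deleted.

```python
import re

def strip_script_tags(text):
    """Hypothetical custom sanitizer: delete <script> and </script> in one pass."""
    return re.sub(r"</?script>", "", text, flags=re.IGNORECASE)

# The inner tags are removed, and the fragments around them join into new tags.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
result = strip_script_tags(payload)
print(result)
```

An analysis that treats strip_script_tags as a trusted sanitizer would mark its output as safe, even though the output here is a complete script element.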
Saner utilizes static and dynamic approaches to analyze sanitization functions.
The static part was built by extending Pixy to keep track of the string values that each variable can hold. With this information, Saner can check whether a variable that reaches an output statement may still hold a dangerous value. However, the method used to keep track of the string values is an over-approximation, which might produce false positives (but not false negatives).
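The effect of over-approximation can be shown with a toy abstraction. This is not Saner's representation, which is far more precise; it is a deliberately coarse sketch in which each variable is described only by the set of dangerous characters its value might contain. All names below are hypothetical.

```python
# Toy abstract domain: a value is the set of dangerous characters it MAY contain.
DANGEROUS = frozenset("<>\"'")

def from_input():
    """Untrusted input may contain every dangerous character."""
    return DANGEROUS

def from_literal(text):
    """A constant contains exactly the dangerous characters that appear in it."""
    return frozenset(text) & DANGEROUS

def concat(a, b):
    """Concatenation may contain anything either operand may contain."""
    return a | b

def strip_chars(value, removed):
    """Models a sanitizer that deletes every occurrence of the given characters."""
    return value - frozenset(removed)

def may_be_unsafe(value):
    return bool(value)

name = strip_chars(from_input(), "<>\"'")      # every dangerous character removed
partial = strip_chars(from_input(), "<>")      # quotes survive: a real problem
greeting = concat(from_literal("<b>"), name)   # '<' and '>' come from a trusted constant

print(may_be_unsafe(name), sorted(partial), may_be_unsafe(greeting))
```

The abstraction never misses a dangerous character that can really occur, so it has no false negatives. It does flag greeting, though, because it cannot tell that the angle brackets came from a constant rather than from the user. That is a false positive.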
A dynamic approach is used to reduce the number of false positives by generating inputs and checking whether those inputs actually trigger a vulnerability. In this way, Saner can present the verified vulnerabilities and, if the user wishes, also present all the possible vulnerabilities for manual investigation.
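The dynamic step can be sketched as a small test harness: run the suspicious sanitization code on known attack strings and keep only the ones that survive. The sanitizer, the attack list, and the oracle below are all assumptions made for illustration; a real oracle would need to be much more careful than a substring check.

```python
def naive_sanitize(text):
    """Hypothetical sanitizer under test: deletes only angle brackets."""
    return text.replace("<", "").replace(">", "")

# A tiny, made-up attack suite.
ATTACKS = [
    "<script>alert(1)</script>",
    '" onmouseover="alert(1)',
]

def still_dangerous(output):
    """Crude oracle (an assumption): flag output that keeps '<' or a double quote."""
    return "<" in output or '"' in output

def confirm(sanitizer, attacks):
    """Return the attacks whose sanitized form is still flagged by the oracle."""
    return [attack for attack in attacks if still_dangerous(sanitizer(attack))]

confirmed = confirm(naive_sanitize, ATTACKS)
print(confirmed)
```

Here the first attack is neutralized, but the second passes through unchanged, so the warning for this sanitizer would be reported as verified rather than merely possible.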
Saner inherits Pixy’s limitations: it does not support PHP’s eval function or aliased array elements.
var userName = "<?php echo $userName; ?>";
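In this case, restricting only ‘<’ and ‘>’ will not work, because the value lands inside a JavaScript string literal: an attacker can close the string with a double quote and inject code without using either character. The idea of context also extends to attributes of HTML tags.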
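A short sketch of the userName line above, written in Python rather than PHP so it is easy to run. The helper names are hypothetical. html.escape with quote=False handles only ‘&’, ‘<’ and ‘>’, which is enough for HTML text but not for a JavaScript string. For the JavaScript context, json.dumps produces a quoted string literal with quotes and backslashes escaped, and the extra replace is one common precaution against a closing script tag inside the value.

```python
import html
import json

def escape_for_html_text(value):
    """Escapes &, < and > only; quotes pass through untouched."""
    return html.escape(value, quote=False)

def escape_for_js_string(value):
    """Emit a quoted JavaScript string literal, with '<' written as \\u003c."""
    return json.dumps(value).replace("<", "\\u003c")

payload = '"; alert(1); //'

# HTML-style escaping in a JavaScript context: the quote closes the string early.
unsafe_line = 'var userName = "%s";' % escape_for_html_text(payload)

# Context-aware escaping: the quote is escaped and stays inside the string.
safe_line = "var userName = %s;" % escape_for_js_string(payload)

print(unsafe_line)
print(safe_line)
```

The payload contains no angle brackets at all, which is why a filter aimed at HTML text says nothing about safety in this context.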
Another problem is how to treat variables from the database: are they sanitized or not? A static analyzer that is able to properly model and taint the flow of data into and out of the database would be very cool (and if you know of someone who’s done this, let me know).