deDacota: Toward Preventing Server-Side XSS via Automatic Code and Data Separation

This post is an overview of the paper deDacota: Toward Preventing Server-Side XSS via Automatic Code and Data Separation, which I wrote as part of a collaboration between the UC Santa Barbara Seclab and Microsoft Research. I’m very excited to present this work at the 2013 ACM Conference on Computer and Communications Security in Berlin. If you’re there, please say hi! (Also, if you have suggestions for places to go or things to do in Europe, let me know!)

So, what is deDacota?

deDacota

deDacota is my attempt to tackle the Cross-Site Scripting (XSS) problem. I know what you’re thinking: there’s been a ton of excellent research in this area. How could this work possibly be new?

XSS

Previously, we as a research community have treated XSS vulnerabilities as a lack-of-sanitization problem. Those pesky web developers (I am one, so I can say this) just can’t seem to properly sanitize the untrusted input that their applications output.

You’d think that after all this time (at least a decade, if not more), the XSS problem would be done and solved. Just sanitize those inputs!

Hold on a minute

Well, the XSS problem is actually more complicated than it seems at first glance.

Root cause

For this work, we went back and asked: What’s the root cause of XSS vulnerabilities? The answer is obvious when you think about it, and it’s not a lack of sanitization. The root cause is: the current web application model violates the basic security principle of code and data separation.

In the case of a web application, the data is the HTML markup of the page and the code is the JavaScript. An XSS vulnerability arises when an attacker can trick the browser into interpreting data (HTML markup) as code (JavaScript).
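Here is a minimal Python sketch of that confusion. The function names are hypothetical. When untrusted input is concatenated into markup, an attacker-supplied string ends up on the code side of the page. Escaping it with the standard library’s `html.escape` keeps it as data, but only for this particular context (text inside an element). Other contexts, such as a JavaScript string inside an inline script or a URL in an `href`, need different encodings. That is part of why "just sanitize" is harder than it sounds.

```python
import html


def render_greeting_unsafe(name: str) -> str:
    # Untrusted data is pasted directly into the HTML markup, so the
    # browser cannot tell where the data ends and code begins.
    return f"<p>Hello, {name}!</p>"


def render_greeting_escaped(name: str) -> str:
    # html.escape turns &, <, >, " and ' into entities, so the payload
    # stays plain text in this (element-content) context only.
    return f"<p>Hello, {html.escape(name)}!</p>"


payload = "<script>alert('xss');</script>"
print(render_greeting_unsafe(payload))
# <p>Hello, <script>alert('xss');</script>!</p>
print(render_greeting_escaped(payload))
# <p>Hello, &lt;script&gt;alert(&#x27;xss&#x27;);&lt;/script&gt;!</p>
```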

A world without XSS

In an ideal world, the JavaScript that a developer intends to execute on a specific HTML page would be sent to the browser over a separate channel. Then, even if an attacker could inject arbitrary HTML content into the data section, the browser would never interpret that content as JavaScript code. Wouldn’t that be great?

That world exists now

Some people smarter than me had already recognized the problem of mixing HTML and JavaScript code in the same web page. They banded together to create the Content Security Policy (CSP), which lets a developer achieve this code and data separation through an HTTP header that the browser enforces.

The basic idea is that the website sends a CSP HTTP header that tells the browser which JavaScript may be executed. The developer can specify a set of domains from which the page is allowed to load JavaScript (controlling which src attributes of a <script> tag the browser will honor). The developer can also specify that the page contains no inline JavaScript, so the browser will not execute any.
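As a rough illustration, the sketch below builds such a header value in Python. The helper name and the `https://static.example.com` origin are hypothetical. The sketch only sets the `script-src` directive, with `'self'` (the page’s own origin) plus the listed origins as the allowed script sources.

```python
def build_csp_header(script_sources: list[str]) -> tuple[str, str]:
    """Return a (name, value) pair for a Content-Security-Policy header.

    Hypothetical helper: 'self' permits scripts served from the page's own
    origin, and each entry in script_sources is an extra allowed origin for
    <script src=...>. Because 'unsafe-inline' is NOT listed, a browser that
    enforces the policy refuses to run inline <script> blocks.
    """
    sources = ["'self'", *script_sources]
    return "Content-Security-Policy", "script-src " + " ".join(sources)


name, value = build_csp_header(["https://static.example.com"])
print(f"{name}: {value}")
# Content-Security-Policy: script-src 'self' https://static.example.com
```

Leaving out `'unsafe-inline'` is what turns off inline JavaScript. A production policy would usually restrict other directives as well (for example `object-src`), which this sketch omits.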

With these two abilities, a web developer can tell the browser exactly which JavaScript should be executed, and, if the policy is properly designed, an attacker should not be able to execute any JavaScript! How, you ask? If the developer forgot to sanitize some output, the attacker has two ways to inject JavaScript (well, OK, there are many more ways, but CSP covers all of them):

Inject inline JavaScript:
```
 <script>alert('xss');</script>
```

Inject remote JavaScript:
```
 <script src="https://example.com/attack.js"></script>
```

CSP blocks both

Inline JavaScript is blocked by the CSP header, and remote JavaScript is blocked because the src of the script tag is not in the CSP allowed domain list!
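To make the two cases concrete, here is a toy Python model of the browser’s decision. It is not how any real browser is implemented. It assumes a policy that allows only the page’s own host and omits `'unsafe-inline'`, and it compares host names only (real CSP source matching also considers scheme and port). The class name and the `app.example.org` page URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptPolicyChecker(HTMLParser):
    """Toy model of script-src enforcement: records, for each <script> tag,
    whether a browser with the given policy would run it."""

    def __init__(self, page_url: str, allowed_hosts: set[str]) -> None:
        super().__init__()
        self.page_url = page_url
        self.allowed_hosts = allowed_hosts
        self.decisions: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src is None:
            # No src attribute: an inline script, blocked because the
            # policy does not include 'unsafe-inline'.
            self.decisions.append(("inline", False))
        else:
            # Resolve relative URLs against the page, then check the host.
            host = urlparse(urljoin(self.page_url, src)).hostname
            self.decisions.append((src, host in self.allowed_hosts))


page = """
<script src="/static/app.js"></script>
<p>Comment: <script>alert('xss');</script></p>
<p>Comment: <script src="https://example.com/attack.js"></script></p>
"""
checker = ScriptPolicyChecker("https://app.example.org/page", {"app.example.org"})
checker.feed(page)
checker.close()
for script, allowed in checker.decisions:
    print(script, "->", "runs" if allowed else "blocked")
# /static/app.js -> runs
# inline -> blocked
# https://example.com/attack.js -> blocked
```

`urljoin` treats a scheme-less value such as `example.com/attack.js` as a path on the page’s own host. That is why the attacker’s script here is written as an absolute URL: only a script that really lives on another domain falls outside the allowed list.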

Huzzah!

That sounds amazing, sign me up

I fully believe that CSP is the future for web applications. CSP provides excellent defense-in-depth. In fact, Google has required that all new Google Chrome Extensions use CSP.

However, for existing web applications, the conversion can be, shall we say, difficult.

Wouldn’t it be great if this conversion could be done automatically?

Well, I’m glad you asked, and this is where deDacota comes in.

See, we developed an approach to automatically separate the code and data of a web application, and we enforce that separation with CSP. We implemented a prototype of this approach and wrote a paper about it.

That’s cool, how does it work?

Thanks, nice of you to ask.

First, I need to emphasize that this is a research prototype. Consider deDacota a proof of concept showing that it is possible to automatically separate the code and data of a web application.

As a first step, we tackled the problem of automatically separating the inline JavaScript into external JavaScript. (And we completely ignored the problems of separating JavaScript in HTML attributes, inline CSS, and CSS in HTML attributes. The point of a research prototype is to show that the high-level idea can work.)

Here I’ll try to give a very high-level description of how our approach works.

First, for every web page in the application, we approximate the HTML output that the page can produce. Then, from this approximation, we extract all the inline JavaScript that the page could potentially output. Finally, we rewrite the application so that all the inline JavaScript is turned into external JavaScript. Assuming that you’ve found all the inline JavaScript, boom, your application now has its code and data separated, and you can apply a CSP policy.
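The last step, moving inline scripts out of the page, is the easiest to illustrate. Below is a minimal Python sketch of that rewriting applied to one concrete HTML string. It is not deDacota’s implementation. The prototype rewrote ASP.NET applications based on an approximation of every output a page can produce, and this sketch does not attempt that harder part. The function name, the `/js/` path, and the hash-based file names are hypothetical choices. The regex is a crude stand-in for a real HTML parser.

```python
import hashlib
import re

# Matches <script ...>body</script>; adequate for this sketch, but not a
# robust HTML parser.
SCRIPT_RE = re.compile(
    r"<script(?P<attrs>[^>]*)>(?P<body>.*?)</script>", re.IGNORECASE | re.DOTALL
)


def externalize_inline_scripts(html_text: str) -> tuple[str, dict[str, str]]:
    """Move each inline <script> body into its own external file and point
    the tag at it with src=. Returns (rewritten_html, {path: js_source})."""
    external: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        attrs, body = match.group("attrs"), match.group("body")
        if "src=" in attrs.lower() or not body.strip():
            return match.group(0)  # already external (or empty): keep as-is
        # Name the file after a hash of its contents, so identical inline
        # scripts map to the same external file.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]
        path = f"/js/{digest}.js"
        external[path] = body
        return f'<script{attrs} src="{path}"></script>'

    return SCRIPT_RE.sub(replace, html_text), external


page = '<script src="/lib.js"></script><p>Hi</p><script>init(1);</script>'
rewritten, files = externalize_inline_scripts(page)
print(rewritten)
print(files)
```

The application would then need to serve each file in `files` from its own origin. After that, a policy such as `script-src 'self'` (without `'unsafe-inline'`) can be enabled. This assumes the page has no JavaScript in HTML attributes, which this sketch, like the prototype, ignores.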

Prove it

Prove what, exactly? Prove that this approach works in all cases (it doesn’t), prove that we find all the inline JavaScript (we can’t), or prove that you’ll never break an application (we haven’t, though it may be true).

So what good is it?

Well, we took the first step, and we showed that this approach, while it won’t work for every application, can work on real-world ASP.NET applications. We ran our tool on 6 open-source applications: some with known vulnerabilities, some that were intentionally vulnerable, and one with a developer-written test suite. deDacota successfully discovered all inline JavaScript in each application and rewrote each application without breaking its functionality. The rewritten applications, together with CSP, successfully prevented the known vulnerabilities from being exploited.

I want to know more

Well, if you’ve made it all the way to the end, then I assume you do want to know more. Please, check out the full deDacota paper, and feel free to email me with any questions or follow me on Twitter: @adamdoupe. Thanks!

Extra Credit: Why deDacota? What does it mean?

Adam Doupé

Associate Professor, Arizona State University
Director, Center for Cybersecurity and Trusted Foundations