The story you are about to read is absolutely true — so true that I haven't even bothered to change any names to protect the innocent (we're all guilty in this sordid little tale, so there are really no innocents to protect). What you read may shock, enrage, and confuse you and — when I get to the part about Barney — may even make you snort that two-buck-a-bottle "soft drink" laced with St. John's wort, ginseng, and taurine right out of your nose and onto your keyboard (a favor, really, if you're using one of those annoying split keyboards). In any case, don't say I didn't warn you.
If you've been around the Web long enough to remember the introduction of forms and CGI, then you're probably already familiar with the gist of the story I'm about to tell. But while you may be tempted to write off the pages that follow as old hat, please don't. I'll be covering an all-new threat to the Web development community, and whether you're a grizzled veteran or wide-eyed newbie, odds are you're vulnerable to its effects.
So listen up! In the pages that follow, I'll brush up on some things you should know already, show you a few things you certainly never thought of before, and give you some tips on how to protect yourself and your visitors from the most recent in a long line of dastardly attempts to deprive you of what's left of your already meager sleep.
Yes, you're going to learn about the latest threat to our Web sites: cross-site scripting.
Why You Can't Trust Anybody
Cross-site scripting (or "CSS," as it was unfortunately christened by some folks who probably don't have much daily contact with cascading style sheets) is a "new" security issue that might just surprise you with how insidious and wide-ranging it is. To avoid confusion, since this stuff is already confusing enough, I've decided to use the acronym XSS to refer to cross-site scripting, figuring that anyone who's ever seen a "PED XING" sign will know what I'm talking about. But what I'd really like to call it (if only I had more clout in the land of acronymphs!) is the "YSTANEBPD vulnerability," which is short for "you shouldn't trust anybody, not even big purple dinosaurs."
But no matter what you call it, the story begins the same: One day, back when the Web was fresh and shiny and a guestbook script was sufficient to get you promoted to the head of your own department (the same fine, red-letter year that I stopped selling my blood for money), I was cruising the home page of Randal Schwartz (aka Just Another Perl Hacker) and ran across a short little piece about what he was doing for kicks. Apparently, back then he got his jollies by using AltaVista to look for guestbooks and then checking to see if they accepted form input without stripping HTML. If they did, he simply posted a big picture of Barney the Big Purple Dinosaur, using the IMG element and a URL that drew the blame away, as he sat and giggled at his workstation.
Now, adding a big purple dinosaur wasn't all that bad as exploits go, aside from the sheer annoyance factor of having your relatively mild-mannered guestbook suddenly awash in purple. It did, however, demonstrate — as many others found out later — that if anyone can enter any HTML into your guestbook, HTML may have some pretty destructive effects on the subsequent messages. I'm thinking here of people who entered an open BLINK or H1 tag into a guestbook posting without closing it, making the rest of the page's text appear in flashing, 24-point type. But back to the Big Purple Evil One.
I tried this trick for myself and was immediately addicted. But the fun didn't last long. It was simply too easy to find vulnerable guestbooks, and the challenge was soon gone. It seemed like everyone trusted everyone else, despite the rash of purple dinosaurs that had been popping up like ugly welts on the ash-gray face of the Web. More important, though, it taught me two things:
- You just can't trust anybody.
- If you do, you're going to get smacked with a purple dinosaur.
The moral of the story, for those who missed the important bit, is that you should always validate your form input, either by stripping it down to an acceptable set of characters or by using it as the basis for the assignment of hard-coded values. And at no time is this more true, as you'll soon see, than when you are going to send that form input back to the browser.
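For the sake of illustration, here is a minimal Python sketch of both approaches. The whitelist, the function names, and the template paths are all made up for this example; treat it as one way to do it, not a finished validator.

```python
import re

# Hypothetical whitelist: letters, digits, spaces, and a little punctuation.
# Anything else (angle brackets, quotes, equals signs) is thrown away.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 .,!?-]")


def strip_to_allowed(text):
    """Approach 1: strip the input down to an acceptable set of characters."""
    return NOT_ALLOWED.sub("", text)


# Hypothetical table of hard-coded values. The form input is only ever
# used as a lookup key, never as part of a file name itself.
TEMPLATES = {
    "home": "templates/home.html",
    "guestbook": "templates/guestbook.html",
}


def template_for(page):
    """Approach 2: use the input to select a hard-coded value."""
    return TEMPLATES.get(page, TEMPLATES["home"])
```

With this in place, a Barney posting such as `<img src="barney.gif">Hi!` loses its markup, and a request for a page that isn't in the table quietly falls back to the home template.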
Why Nobody Should Trust You, Either
I think I've made my point.
Something worth bearing in mind here is that many scripts aren't vulnerable when they are first written; they only become vulnerable through sloppy coding. I once (and, to my chagrin, fairly recently) released a mod_perl application that accepted form input as part of an open() call, because I'd converted a safe CGI script and inadvertently stripped out part of the code that protected me from Evil. Fortunately, I had written a nifty feature into my mod_perl application that let anyone view the source code, and a kind developer pointed out my mistake in private email. So, lest you think you're not vulnerable, think again.
But is solving these new issues as simple as stripping HTML from guestbook form input? Unfortunately, due to bugs in various browsers and new vulnerabilities opened up by new features, the answer is a resounding no.
Do You Hate Purple Dinosaurs?
If you want to see just how bad things are out there, go to your favorite search engine and run a search on the following:
<script>confirm("Do you hate purple dinosaurs?");</script>
If you don't get a popup on the result page of the first search site, try it on a few others. Out of the 40 or so sites on which I tried this or some shorter variant, six were totally trusting and gave me the popup. Another 25 performed minor filtering or conversion but usually concentrated on the wrong thing.
Ideally, basic filtering would, at the very least, strip out the semicolon, perhaps the question mark, and the parentheses and quote marks. Optimally, it would also strip the <SCRIPT> and </SCRIPT> tags, having recognized them as HTML. In the best case, especially for a search engine, which often returns the search string embedded in the HTML, the special characters would be converted to HTML entities before being displayed.
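The entity conversion can be sketched in a few lines of Python using the standard library's `html.escape`. The function name `render_search_box` and the markup it returns are assumptions made up for this example, not anything a real search engine is known to use.

```python
from html import escape


def render_search_box(query):
    """Echo a search string back to the browser with special characters
    converted to HTML entities, both in the page body and inside the
    VALUE attribute of the search box."""
    # quote=True also converts quote marks, which matters inside attributes.
    safe = escape(query, quote=True)
    return (
        "<p>You searched for: " + safe + "</p>\n"
        '<input type="text" name="q" value="' + safe + '">'
    )
```

Fed the purple dinosaur query, this returns markup in which the angle brackets and quote marks appear only as entities, so the browser displays the query as text instead of running it.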
It's a disheartening surprise to find that many large search engines didn't even bother to strip the quote marks when returning the search string as the VALUE of a form INPUT. As a result, the search query input box was a mangled, garbled mess. Still, mangling form inputs on search engines is not much more challenging than embedding purple dinosaurs in 1995-era guestbooks. And it is even less harmful, seeing as the mangled page is confined to the result pages, and only the wily user who enters the Barneyfied search string sees it, right?
Wrong. The contents of a search field are public knowledge. Some sites list the most common query of the day or display a random snapshot of queries in progress. And then there's the wide variety of 'blogs, chat rooms, and Web discussion board software packages out there. If these systems simply redisplay whatever you post, they are giving posters far too much control over the page.
Bear in mind that, thanks to the magic of client-side scripting, it is possible to disguise a full-blown Web form as a plain old hypertext link, which submits the form (using either GET or POST, as required) as a result of the onClick event. Anyone can link to a search engine or e-commerce shopping cart, calling the vulnerable form and directing unsuspecting users to the site in question, leaving them in that server's cookie space and trust model.
That's where the real mayhem begins.
And that's not even the half of it. We also have to face the true and original weirdness that makes XSS a new problem, different from all of the issues of trusting form input we've seen so far: character sets.
It's All in Your Header
As anyone who's used Netscape Navigator over the past four years is aware, browsers can accept and display HTML using many different character sets. From the old reliable US-ASCII to the more refined ISO-8859-1, through all manner of extended and double-byte sets, the browser is an extremely flexible tool for the interpretation and rendering of character data. Unfortunately, Web servers haven't kept up with the fact that there is a wide variety of character sets available. Until recently, they did not allow a webmaster to configure a default character set to declare during the HTTP/MIME handshake.
Now, for the benefit of the readers who don't know the difference between a MIME handshake and a painted thespian milk shake, let me explain. CGI programmers the world over are familiar with the absolute necessity of sending back a MIME "Content-type" header before sending back the output of their scripts. What this HTTP header does is inform the browser what sort of processing to perform on the data it is about to receive. If the Content-type header is set to "text/plain," the data is displayed as plain old ASCII text. If it's set to "text/html," it is rendered as HTML, and so forth.
What you may not know is that the MIME type only specifies the media type (HTML, plain text, streaming mp3 copyright violation, and so on), but it is up to the browser to figure out which character set to use, based on a combination of the user's preferences, the browser's capabilities, and the charset specifier. Beleaguered Netscape Navigator users know this from the annoying flash effect that occurs when their browsers are set to use one default character set and the authors of the documents they're viewing have specified another using the META element. In essence, the browser has already begun to render the document using the default but must render it all over again, this time paying heed to the charset declared in the META tag.
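To make the header business concrete, here is a small Python sketch of what a script would send back when it declares the character set itself rather than leaving the browser to guess. The function name `cgi_response` and its defaults are assumptions for illustration only.

```python
def cgi_response(body, media_type="text/html", charset="iso-8859-1"):
    """Return the full text a CGI script would write to standard output:
    a Content-Type header carrying an explicit charset, a blank line,
    and then the document itself."""
    header = "Content-Type: {}; charset={}".format(media_type, charset)
    # The blank line tells the server where the headers end.
    return header + "\r\n\r\n" + body
```

Calling `cgi_response("<p>Hello</p>")` yields a response whose first line is `Content-Type: text/html; charset=iso-8859-1`, so the charset arrives before the browser has seen a single byte of the document.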
The Apache developers, for instance, made two changes in response:
- Add an explicit charset=iso-8859-1 to pages generated by ap_send_error_response(), such as the default 404 page.
- Add the AddDefaultCharset and AddDefaultCharsetName directives. These allow you to tell Apache to specify the given character set on any document that does not have one explicitly specified in the headers.
So, with solutions like these, what's the problem, again?
Of Questionable Character
Remember, the browser interprets any data sent to it in terms of the charset declared by either the user's preferences or the Web server. If the Web server doesn't declare any preference, but the browser thinks it knows what it is dealing with, the browser readily translates known character sequences representing HTML special characters (such as "<", "</", and ">") back into HTML. But if the server-side script or module filters on the literal special characters, rather than on the sequences that produce them in a particular character set, it is possible to sneak past even the most conscientious of filters. This strategy reintroduces the very same unwanted HTML, scripts, or content you want to protect your site against.
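Here is one way to see the problem in action, sketched in Python. I'm assuming UTF-7 as the alternate character set purely because it makes a compact demonstration; the filter function is a deliberately naive stand-in, not anyone's real code.

```python
def naive_filter_allows(raw):
    """A deliberately naive filter: it accepts any input that contains
    no literal angle-bracket bytes."""
    return b"<" not in raw and b">" not in raw


# The same kind of script tag as before, written in UTF-7 (an assumption
# chosen for this sketch). No literal "<" or ">" byte appears anywhere.
payload = b"+ADw-script+AD4-confirm(1)+ADw-/script+AD4-"

# The filter sees nothing wrong with it...
passes_filter = naive_filter_allows(payload)

# ...but a browser that decides the page is UTF-7 decodes it back to markup.
decoded = payload.decode("utf-7")
```

The filter waves the payload through because it is looking for the characters themselves, while the decoded text is a perfectly ordinary script element. Declaring the charset explicitly takes that decision away from the browser.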
The good news is that the server vendors have been quick to jump on the hint of a threat, and several have released updates or advisories on how to configure their servers to specify a default character set. The bad news is that this is only helpful if people install the updates and bother to configure a default charset.
So, please, upgrade your servers in as timely a fashion as possible, and submit your code to a thorough review, especially if it uses untrusted form input to generate dynamic pages. And do it before you go try out the Barney guestbook trick, OK?
If you're interested in learning more about the issues surrounding XSS, check out these sites:
And if you think you're ready to try your hand at Barneyizing some guestbooks, these sites will give you all you need to get started: