Debugging Web Applications

Steve Champeon
Reprinted from New Architect

As the old joke goes, the best way to end up with a bug-free program or script is to write it with no bugs to begin with. But that smug line ignores the realities of compressed schedules and budgets, constantly shifting requirements, the often-negative effects of maintenance by programmers unfamiliar with the original code, and ever-changing hosting environments. Months and years after its original release, following system upgrades and multiple security patches, your once-perfect code might be reduced to a bug-riddled albatross.

So how do you produce code, especially code that drives complex Web applications, without going crazy or losing your shirt? One answer, though certainly not the only answer, is to rigorously test and debug the entire code base.

Retellings of the story of the first bug (supposedly, a moth pulled from between the relays of the Mark II, one of the world's first computers) are legion. In the fifty-plus years since the Mark II, many debugging schemes have been devised to improve the overall quality of software. But for some reason, the overall quality of Web applications doesn't seem to have improved much since the introduction of the browser. Why is this?

Bad Practice

Inevitable factors will limit your ability to write code that lives up to the ubiquitous guidelines of programming best practices. So what approaches can you realistically take toward practical debugging?

The worst method, and the one that's least likely to help, is to simply debug your code if and when you find problems: Don't do any other preparation, don't test your code, and don't document it. You might be able to pull it off, as long as you have a thorough understanding of the intended and actual functionality provided, not to mention every quirk of the environments in which the code is intended to run. That includes all of the possible interactions and unintended consequences that may arise from the introduction of new variables, such as newly released browsers, system upgrades, and so forth. And, if you're a sadist, you might even name your functions and variables in ways that have nothing whatsoever to do with their purpose.

Not surprisingly, but somewhat sadly, this is the method that most of us use. Whether it's due to tight deadlines, low budgets, poorly documented requirements, low expectations, or simple vanity, we just don't give debugging the attention that it truly deserves.

How can we improve upon these bad programming practices? First, become familiar with the tools at your disposal that enable or enhance your debugging capabilities. You can then structure your code and your systems to take advantage of those tools. If your code is easy to test, you're more likely to test it, and others on your team will be, too. The more your code is tested under a wider variety of circumstances, the more likely you are to find the showstopper that otherwise would have brought you in early on a Sunday to do some last-minute damage control.

The Web's Unique Challenges

One reason for the poor state of Web debugging is that, despite the apparent simplicity of Web applications from the user's perspective, the Web is an exceedingly complex system. Web applications often involve multiple languages executing in multiple environments (client and server), even using many different parsers and interpreters on several different network layers, which are all tied together by way of an overarching rendering engine. And, of course, the end user is an ingenious beast, likely to think of the most unexpected ways of breaking your application.

To illustrate, let's say you're writing a Web application in PHP. The PHP interpreter is embedded in the Web server, and must produce output that is acceptable to whatever post-processing the server must perform. That output must in turn be acceptable to the browser at the HTTP level.

Simple things such as the order of HTTP headers can cause problems that may only appear much later in the course of the page's operation, or even at a later point in the request sequence. For example, if the Set-Cookie header isn't sent at the right time, or is improperly formed, a session variable may not be set properly. What you think is the problem (the session is invalid) is really only a symptom of another problem (the cookie wasn't set properly), which is itself a side effect of the real problem (the order or format of the HTTP headers was incorrect).
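One way to break that chain of symptoms early is to inspect the header itself rather than the session that depends on it. The following is only a minimal sketch in Python, using the standard library's cookie parser; the function name check_set_cookie and the two checks it performs are assumptions made for illustration, not a complete test of what any particular browser will accept.

```python
from http.cookies import CookieError, SimpleCookie


def check_set_cookie(header_value):
    """Return a list of problems found in a Set-Cookie header value.

    check_set_cookie is a hypothetical helper for this sketch; it only
    checks that the value parses and carries a non-empty name/value pair.
    """
    cookie = SimpleCookie()
    try:
        cookie.load(header_value)
    except CookieError as exc:
        return [f"unparseable cookie: {exc}"]

    problems = []
    if not cookie:
        # The parser found nothing it could treat as a cookie.
        problems.append("no cookie name/value pair found")
    for name, morsel in cookie.items():
        if not morsel.value:
            problems.append(f"cookie {name!r} has an empty value")
    return problems
```

Calling check_set_cookie("sessionid=abc123; Path=/") returns an empty list, while an empty header value is reported as having no name/value pair. A check like this points at the cookie, not at the session that later turns out to be invalid.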

Your application's output may contain HTML, which may in turn reference CSS style sheets, JavaScript code, form elements, and data. All of these elements may interact in unforeseen ways. Many of them may even perform some of the same functions as your server-side software, such as setting cookies or manipulating data from within JavaScript logic. Bugs or weaknesses in one part of your markup, scripts, or styles, can result in unexpected data or behavior, far beyond your imagining when the code was created. And I haven't even mentioned databases, with their various wonderful ways of delivering data that you didn't expect.

What's more, in recent months there have been absurdly widespread reports of cross-site scripting bugs. This indicates that many systems are vulnerable to complex and unforeseen consequences of their remote applications interconnecting. It also means that more and more "black hats" are noting those interactions and preparing exploits just as fast as security-conscious Web developers can produce bug fixes. Can you confidently say that your systems are free of such vulnerabilities? How do you know?

Even Web developers themselves are plagued with a variety of problems. In some cases, Web developers must coexist with other developers working simultaneously on the same site. At other times, they have the site all to themselves. This means that they must work in a wide array of languages, each with its own quirks and vulnerabilities, more than any one person should be expected to track or understand. The available tools are, for the most part, relatively primitive. The libraries and other components may still be immature, while documentation may be inaccurate, out of date, or simply incomplete.

Writing Good Code

It's difficult, if not impossible, to write code that is completely free of bugs, especially when its output may be re-interpreted in several other environments. But it's possible to write code that's close enough to being bug-free that the end result is both useful and robust. If nothing else, well-designed code will be easy to fix when problems arise.

There are, as you might imagine, as many approaches to the task of debugging as there are varieties of languages, environments, and combinations thereof. I'd like to emphasize a fundamental principle: Code that works and is easy to debug and maintain is deliberately written for that purpose.

Thus, good code, no matter how simple, is well documented. Although documentation may take the form of inline comments in compiled or server-side code, it should be external in code optimized for delivery via the network. Its logic (functions, branches, methods, queries, and so forth) and data structures (variables, hashes, objects) should be clear and easy to read and understand. Good code is designed to allow for easy testing and debugging, possibly using "debug" and "production" versions of the same code. Often, code is written to enable ease of testing, debugging, and maintenance as much as, or more than, to fulfill its specified purpose or function. If a project uses a language or environment that allows for code execution to be traced and/or logged, good code makes use of that as well. Some code is designed to be easily tested at any level, whether as a standalone component or as a fully integrated part of the deployed system as a whole.
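As a rough illustration of "debug" and "production" versions of the same code, here is a minimal Python sketch built on the standard logging module. The logger name "shop" and the functions configure_logging and apply_discount are hypothetical, chosen only for this example.

```python
import logging

# "shop" is a hypothetical logger name used only in this sketch.
log = logging.getLogger("shop")


def configure_logging(debug):
    """Switch the same code between "debug" and "production" behavior."""
    log.setLevel(logging.DEBUG if debug else logging.WARNING)
    if not log.handlers:
        # Send messages to standard error; a real system might use a file.
        log.addHandler(logging.StreamHandler())


def apply_discount(price, percent):
    """Hypothetical routine that traces its input and output when debugging."""
    log.debug("apply_discount(price=%r, percent=%r)", price, percent)
    result = round(price * (100 - percent) / 100, 2)
    log.debug("apply_discount -> %r", result)
    return result
```

With configure_logging(True), every call to apply_discount leaves a trace of what went in and what came out. With configure_logging(False), the same code runs silently unless something logs a warning or worse.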

And don't forget version control. You'll need it when you want to return to known, good versions of a routine or component and compare them with newer, buggier versions. Most version control systems also let you back out broken code, branch test code until it can be rigorously tested, and then re-integrate the new, known, good code into the larger system.

Use The Tools You Have

Be sure to test and debug each component both before and during the integration with a larger system. Document your tests, and if possible in your environment, write small tools that automatically repeat the tests and complain loudly when they fail. This can be as simple as writing tiny applications that feed a variety of data into your routines and test their output; or as complex as writing your routines so that they contain their own internal testing logic. Debug-friendly subroutines might check data that has been input before it's used to ensure that it's within acceptable or expected ranges. They also might test the output of those routines to make sure that nothing has gone amiss while you were manipulating the data.
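Both ideas can be sketched in a few lines of Python: a routine that checks its own input and output, and a tiny tool that feeds it a variety of data and complains when something is off. The names percentage and run_checks, and the test data, are assumptions made for this example.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, checking input and output ranges."""
    # Check the input before it is used.
    if whole <= 0:
        raise ValueError(f"whole must be positive, got {whole!r}")
    if not 0 <= part <= whole:
        raise ValueError(f"part must be between 0 and {whole!r}, got {part!r}")

    result = 100.0 * part / whole

    # Check the output before handing it back.
    assert 0.0 <= result <= 100.0, f"result out of range: {result!r}"
    return result


def run_checks():
    """Feed known data into percentage() and return a list of failures."""
    failures = []

    # Good input paired with the output we expect.
    for args, expected in [((1, 4), 25.0), ((0, 10), 0.0), ((10, 10), 100.0)]:
        actual = percentage(*args)
        if actual != expected:
            failures.append(f"percentage{args} gave {actual!r}, expected {expected!r}")

    # Bad input that the routine should refuse.
    for bad_args in [(5, 0), (-1, 10), (11, 10)]:
        try:
            percentage(*bad_args)
        except ValueError:
            continue
        failures.append(f"percentage{bad_args} accepted bad input")

    return failures
```

A script that prints each entry returned by run_checks() and exits with a nonzero status when the list is not empty is enough to hook such checks into a nightly job or a pre-release routine.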

If your system runs in the context of a Web server, see if it can open up so-called remote testing ports. These are usually just listeners that produce a stream of output showing what data is being manipulated by which functions and the output of those functions. These listeners also give a big-picture view. Many Web application environments, such as PHP version 3 and Java, provide these listeners as a matter of course. It's often difficult to debug a system at the server level, as opposed to debugging client-side applications within an integrated development environment (IDE). PHP version 4 doesn't provide a standard debug listener service, but there are add-ons that do.

The utility of IDEs can be limited by server-side code. But if you're fortunate enough to have a powerful IDE, you should find out whether it supports such practices as syntax validation, conditional and stepped execution, setting execution breakpoints and watches on variables, and examining the values of the variables and other data structures in use. Many IDEs offer many more features than simple syntax-coloring text editors. Some of the more advanced ones may even run your code in the context in which it will be deployed, or at least emulate it under similar conditions. Some can even tie into remotely executing code and synchronizing it with a source file that's open in an editing window.

The most sophisticated testing of all uses test harnesses that can emulate the actions taken by an end user, and log and react to responses by the browser and server. Though fairly common in old-school client-server software development, the use of test harnesses for Web-deployed software is rare. The practice is on the rise, however, particularly in the Java world.
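A full harness drives a real browser, but the basic idea can be sketched much more modestly. The Python example below submits a form to an application and captures the response, using the standard WSGI interface so that no network connection is needed. The application greeting_app and the helper submit_form are hypothetical, written only for this illustration.

```python
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def greeting_app(environ, start_response):
    """Hypothetical application under test: greets the name sent in a form."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
    name = form.get("name", [""])[0].strip()
    if not name:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"name is required"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Hello, {name}!".encode("utf-8")]


def submit_form(app, path, body):
    """Emulate a form POST and return (status, headers, body_text)."""
    raw = body.encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys

    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application would have sent to the browser.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    text = b"".join(chunks).decode("utf-8")
    return captured["status"], captured["headers"], text
```

Here submit_form(greeting_app, "/greet", "name=Ada") yields a "200 OK" status and the body "Hello, Ada!", while submitting "name=" yields "400 Bad Request". A harness built this way can replay the same user actions after every change and log any response that differs from what was expected.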

Understanding Code Internals

Knowing what to do to achieve perfect (or perfectly debuggable) code is one thing. Knowing what to expect and look for is another. There are many different models for Web application development and deployment environments. It's vital that you know exactly how yours works and how it differs from others you've worked with, so that you can adjust your analysis appropriately.

For example, JavaServer Pages (JSPs) are popular because to the developer, JSP code appears to be simple inline Java statements embedded in the context of an HTML document, similar to ASP or PHP. But once they're deployed, JSPs are actually compiled into servlets. Trying to isolate the cause of a symptom described by an unnerved and possibly furious end user can be mind-numbingly difficult unless you know how to look for the actual servlet code.

Some other systems, such as Mason (a tool similar in architecture to JSP, but based on Apache and mod_perl), also let you inspect the actual code used to create the runtime component.

Bug-Free Markup

Beyond writing good, testable code, and being familiar with the environment in which the code will be executed, you should consider tools that will reduce the tedium and error-prone nature of debugging itself. For HTML and XHTML, there are many validators and emulators that can show you whether your markup is well-formed and valid. These tools can also suggest where your application went wrong by showing you a representation of the visual display that the end user sees. They may be especially useful when the description of the symptom you received from a user of Browser XYZ doesn't resemble anything familiar. There are also validators for CSS stylesheets, syntax and lint checkers for JavaScript, and so on. But beyond simply providing assurances that the bug isn't in your markup, validators stop short of serious debugging.
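The simplest of these checks, whether XHTML output is well-formed, can be approximated with any XML parser. The Python sketch below uses the standard library's ElementTree parser; the function name check_well_formed is an assumption for this example, and the check covers well-formedness only, not validity against a DTD. Named entities such as &nbsp; will be reported as errors because no DTD is loaded.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if markup parses as XML, else a description of the error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        # exc.position holds the (line, column) where parsing stopped.
        line, column = exc.position
        return f"line {line}, column {column}: {exc}"
    return None
```

A fragment such as "<p>Hello <b>world</b></p>" passes, while "<p>Hello <b>world</p>" is reported with the position of the mismatched tag. Running generated pages through a check like this catches unclosed elements before a browser has the chance to guess at what you meant.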

Fortunately, some basic CGI scripts and programs are so simple that they don't require much debugging. However, more complex CGIs, especially those using popular libraries like Lincoln Stein's CGI.pm or Thomas Boutell's cgic, may require extensive debugging. Both of these libraries include built-in support for debugging.

Dynamic XHTML can be extremely difficult to debug. Often, the bugs that you (or your users) find are the results of a badly implemented feature in a particular version of a particular browser on a particular platform. Fortunately, this situation is improving almost daily, as the browsers' support for advanced features of the DOM and CSS2 improves.

Sadly, however, DXHTML suffers from mediocre error reporting under Internet Explorer, making it nearly impossible to use even IE's verbose error messages reliably. On the other hand, Mozilla has a wide range of tools that provide excellent and powerful debugging capabilities. From the Venkman JavaScript debugger to the DOM Inspector, practically every aspect of the DXHTML environment can be inspected and manipulated. For those on other browser platforms, a common debugging aid is to use a layer containing a <TEXTAREA> to which messages may be written. The <TEXTAREA> makes it possible for long messages to scroll, and it's easy to copy and paste them into other applications for later perusal.

Third-Party Solutions

PHP 3 provides a remote debugging service, and Zend provides similar tools that may be used as add-ons for PHP 4, in addition to third-party products such as DBG and the Advanced PHP Debugger. NuSphere's PHPEd is a Windows editor that supports PHP's remote debugging and can synchronize error reports with the actual code in the editor.

The popular Apache module mod_perl embeds a Perl interpreter directly into the Web server, offering superior performance to the traditional CGI approach. A popular and useful CPAN library for debugging mod_perl applications is Data::Dumper, which can dump arbitrary data structures for offline perusal. The mod_perl component is also the foundation for Mason, a component architecture for Web applications that has many debugging abilities of its own.

As for Java, a wide variety of tools and architectural specifications are available, from debugging support in editors and IDEs such as JBuilder or WebSphere, to log4j, which provides a robust and sophisticated logging service for the Jakarta suite. Test harness and logging software such as that from Identify (formerly Mutek) can also aid in tracking down bugs in sophisticated Java and .Net Web applications.

Many of these debugging tools are available free of charge. You can find more information and URLs for each in the sidebar, "Web Debugging Tools." So go forth, and sin no more. If you can't write code that works right the first time, at least aim to write code that's easy to fix.