RTFM: A Guide to Online Research

Steve Champeon
Reprinted from WebMonkey

The other day at the local library, I was standing next to a copy of the Oxford English Dictionary, as I am wont to do, sneaking drags off unfiltered Gitanes and trying simultaneously to look pained, intriguing, and authoritative. Another presumed lover of language, also standing next to the dictionary, suddenly turned and asked me to define a word for him. The word itself is not important. Puzzled and flustered — nay, incredulous — I replied, "Why don't you look it up? You're standing right next to a dictionary!"

This outburst drew a crowd, naturally, and while some folks merely suggested more or less comical definitions for the word in question, still others went out of their way to disclaim any knowledge of the word, or even the language to which it belongs, before chiming in with their own definition. Before long, I was surrounded by three dozen people, all of whom were arguing about whether the word was noun or verb, transitive or intransitive, of Latinate or Germanic derivation, and whether the 'e' on the end was silent. I crept away, noting with some concern that nobody even cracked the authoritative resource that lay in front of them. Guns, crude handmade knives, and shuriken came out, and the security officers maced everyone within 50 yards of the venerable OED. The carnage. The humanity. The implications for the future of our Great Land.

Of course, this is but a small sampling of the dangers that await those who fail to consult definitive references before speaking. But I see the same thing every day on Web design mailing lists, too often with the same tragic results (only without the weapons, mace, and security officers). The Web is one big library, albeit a library with 1,500 different card catalogs, annoying men in loud suits carrying advertisements up and down the stacks, and the occasional strobe light in the rare materials section. But it's still a big library, make no mistake about it.

So, then, why do so many feel the need to ignore the vast resources available to them, publicly and repeatedly offer up disinformation, and generally offend the basic tenets of the liberal arts education? What can be done to help these people, so obviously confused by their encounter with a badly constructed tutorial, or ruined by unmonitored self-study? I mulled the problem over a strong cup of Kenya AA and suddenly struck my fist into my palm, shouting, "Eureka! We must introduce them to the primary sources!"

What's Out There?

One of the great things about being a Web designer or developer is that you have access to an enormous collection of tutorials, documentation, specifications, and related materials, no matter what part of the Web you work with.

There are articles that introduce a particular technology, written with the newbie in mind, and others, targeted at the expert, that discuss the arcana and bizarre details of rare situations. There are dictionaries and glossaries of technical terms and hypertext references to just about anything that can be turned into a hypertext reference. There are developer's guides, white papers, relatively useless marketing collateral, and extremely dense, cryptic release notes for all manner of software.

And for every important open protocol, markup language, image file format, colorspace definition, style sheet language, and what have you, there are full and detailed specifications available online. Sure, some of them are fairly cryptic if you don't know what you're looking at, and many are written using formats or formal language that seems unnecessarily obscure to the newbie.

But with a little time, the formalism of the specs fades away and you can begin to use them as informed technical introductions, quick references, or whatever your situation requires. Many of these online specifications also include exhaustive references to the other specs on which they build. Over a period of a few months, as you refer to these documents again and again, following links to these other references when the mood strikes, you can build up a fairly comprehensive picture of the most important technologies and protocols underlying the Web and the Internet.

What better way to impress the object of your geek affection than to help him or her out of a potentially embarrassing jam regarding the length of the keys used in SSL or the required capitalization of tags in XHTML documents?

Where Can You Find It?

So where are these rich sources of the True and Unadulterated Poop?

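The following organizations are the Triumvirate, nay, the Holy Trinity of the online world, especially as it concerns the average Webmonkey:

  • The Internet Engineering Task Force (IETF)
  • The World Wide Web Consortium (W3C)
  • The European Computer Manufacturers Association (ECMA)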

At these sites you can find the specifications for TCP/IP, HTML, JavaScript, CSS, email, MIME, the DOM, XML, XHTML, HTTP, telnet, PPP, FTP, NNTP, DNS, SMTP, and so many more it would make your head spin. There's even a Hypertext Coffee Pot Control Protocol. No, really.

There are other standards organizations and consortia, but the two most important of these (ISO and ANSI) don't believe in free documentation. So you're forced to pay them for the privilege of getting paper copies of standards you can buy at your favorite local bookstore for much less. The Web's not perfect. But give it a distributed OCR project akin to the SETI@Home gang, and it'll catch up. I promise.

Some Helpful Tools

There's a big difference between, say, the set of standards [insert your law-giving deity of choice] gave to [insert popularly recognized recipient of same] and a W3C Recommendation or an IETF Request for Comments document. But frankly, average software vendors follow the commandments appropriate to their industry about as well as most religious fanatics follow that commandment outlawing murder. And most of them go unpunished, unless there's a circle of Hell specifically set aside for software product managers. As the popular saying goes, "The great thing about standards is there are so many to choose from."

And like the differences between the Code of Hammurabi and the Mosaic Law or the Code Napoleon and English Common Law, Internet-related standards have spawned their share of religious wars. Usually, these battles are fought between two competing standards organizations or between two large corporations with large investments in a given file format or remote procedure call syntax. It all boils down to market penetration and adoption of one standard or another by the audience the proposed standard will affect the most. Yes, that's right, that means you. Yell loudly enough, and your voice could be the one that tips the balance toward mutual adoption of a given standard by competing parties. Sound familiar? If not, head over to the Web Standards Project and read up on its efforts.

But all religious wars aside, the important thing to realize is that different standards bodies (as these types of organizations are sometimes called) have very different ideas of what it is that they actually produce. Some, like the IETF, simply document existing practices in order to keep future development on the right track. Others, such as the W3C, are actively producing new documentation for what will become the foundation for all future Web infrastructure, provided the software vendors who wrote the specs actually bother to implement them in their new software, that is.

All standards documents go through a relatively drawn-out process of peer review, wordsmithing, comparison and contrast to existing and/or related standards documents, and so forth. Some processes are more drawn out than others, though.


The IETF

The IETF produces several different kinds of "standards":

  • Internet Drafts (I-D) 
    Like the Bill from everyone's favorite Schoolhouse Rock episode, even an Internet standard has to go from one stage to another on its way to glorious worldwide acceptance. The first step on this path is an Internet draft, which represents an RFC's larval stage. Anyone can write an Internet draft, as long as they follow the IETF guidelines for doing so.
  • For Your Information (FYI) 
    FYI documents are handy primers for newbies of all ages, ranging from discussions of Netiquette to "why it's bad to spam." They are great resources to force on people who just don't get it. There are about 35 of these handy documents at present.
  • Requests for Comments (RFC) 
    These are informal documents that discuss, at varying levels of detail, anything from proposed to existing protocols, processes, infrastructure, and the like. There's even an RFC that documents the IETF standards process, though technically it is also a BCP (see below). At any rate, RFCs have different classifications, including informational, experimental, and historic, and not all RFCs become Internet standards. In fact, some are elaborate practical jokes.
  • Best Current Practice (BCP) 
    These document best current practices, as you might expect. There are only 36 of these at the time of this writing.
  • Internet Standards 
    These are generally recognized as documenting those protocols and practices that have stood the test of time. By way of illustration, there are, at the time of this writing, more than 2,500 RFCs but only 58 Internet standards (mostly related to low-level networking, like TCP/IP or PPP or the Post Office Protocol).

As you can see, the IETF "standards" process is a weird mix of informal, humorous, folksy, and dead serious. Don't go against a long-standing RFC without a good reason, and certainly don't do it without reading the docs. The first will get you branded for life as evil; the second will get you a similarly long-standing reputation for stupidity. Fortunately, few of us ever have to write software that needs to be RFC-compliant. Or do we?

Many CGI applications and uses of email should follow the relevant RFCs regarding email (822, 1123, 1137), MIME (1341, 1344, 1426, 1428, 1437, 1521, 1522, 1523, 1556, etc.), HTTP (1945, 2068, 2109), and so on. And that's just making sure the headers in your email are sane, the content type of your dynamically produced documents is correct, and that you use the proper HTTP response when you issue a redirect. I guess this stuff about open standards really does affect all of us, huh? And don't forget RFCs 1738 and 1808, without which we wouldn't have URLs at all.
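
To make that a little more concrete, here is a minimal sketch, in Python, of what a CGI script might print to get the content type and the redirect right. The function names (cgi_response, cgi_redirect) and the example.com URLs are made up for illustration. The Content-Type, Status, and Location header lines follow the usual CGI convention, and urljoin from the standard library handles the kind of relative-URL resolution that RFC 1808 describes. Treat it as a sketch under those assumptions, not as a complete CGI library.

```python
from urllib.parse import urljoin


def cgi_response(body, content_type="text/html; charset=utf-8"):
    """Return a CGI response: the header line, a blank line, then the body."""
    return f"Content-Type: {content_type}\r\n\r\n{body}"


def cgi_redirect(current_url, target):
    """Return a CGI redirect whose Location is an absolute URL.

    `target` may be relative; it is resolved against `current_url`
    so the client always receives a full URL.
    """
    location = urljoin(current_url, target)
    return f"Status: 302 Moved Temporarily\r\nLocation: {location}\r\n\r\n"


if __name__ == "__main__":
    # A script would normally write one of these to standard output.
    print(cgi_redirect("http://example.com/a/b.html", "../c.html"), end="")
```

The point of the sketch is simply that each of these small details (the blank line after the headers, the status code, the absolute URL) is spelled out in a document you can read for free.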

The W3C

The World Wide Web Consortium's motto is "Leading the Web to its Full Potential." And you probably won't be surprised to learn that it provides several kinds of documents. Naturally, they are completely different from those published by the IETF (though there is some overlap).

  • Notes 
    Notes are the equivalent of letters to the editor. They may be submitted by any W3C member organization but have little to no impact on the direction of other W3C efforts and are often ignored. For example, Microsoft's Channel Definition Format (CDF) was submitted to the W3C as a note before it faded from view like all the rest of those ill-fated push technologies. However, some Notes are published by W3C staff to clarify works in progress, and others may lead to further work, such as the Scalable Vector Graphics format, which began life as three Notes.
  • Working Drafts (WD) 
    Working Drafts are issued periodically by the relevant working groups (WGs) within the W3C to keep the Web community apprised of progress behind the closed-door WG sessions. They are often issued along with a request for public comments.
  • Recommendations (REC) 
    There are actually three types of recommendations: candidate recommendations, proposed recommendations, and full-on W3C Recommendations-with-a-capital-R. Once a W3C document has been granted the status of a Recommendation, it is seen as stable and safe to implement, although nothing stops vendors from implementing working drafts of one technology while avoiding full implementation of a 4-year-old Recommendation.

If you want to know more about the W3C, my Recommendation to you (sorry, little joke) is that you read Simon St. Laurent's Outsider's Guide to the W3C, available at fine local Web browsers near you. Alternatively, if you'd prefer the official version, it's available as well.

Apples and Oranges? (IETF vs. W3C)

One thing that becomes clear when you compare the IETF and the W3C is that the IETF rarely considers anything a standard unless it has been in existence for a dozen years and is widely implemented and working. This is an outgrowth of the Internet's frontier days, when the most important thing was "rough consensus and working code" and the best guiding principle available to a developer wishing to implement an RFC was to "be conservative in what you do and liberal in what you accept from others." The IETF strongly believes that minor differences between competing implementations will eventually be smoothed out, at which time it is proper to consider something a standard.
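
The "conservative in what you do, liberal in what you accept" principle quoted above is easier to see in code than in prose. The following is a small illustrative sketch in Python, assuming a simple "Name: value" header line; the function names parse_header and format_header are invented for this example and are not taken from any RFC or library. It shows only the spirit of the principle and is not a rule-by-rule parser for any particular protocol.

```python
def parse_header(line):
    """Liberal in what we accept: odd case, stray whitespace, bare LF endings."""
    name, sep, value = line.partition(":")
    if not sep:
        # Being liberal has limits: a line with no colon is not a header.
        raise ValueError(f"not a header line: {line!r}")
    return name.strip().lower(), value.strip()


def format_header(name, value):
    """Conservative in what we send: canonical case, one space, CRLF ending."""
    canonical = "-".join(part.capitalize() for part in name.split("-"))
    return f"{canonical}: {value}\r\n"


if __name__ == "__main__":
    name, value = parse_header("content-TYPE:text/html  \n")
    print(repr(format_header(name, value)))
```

Sloppy input goes in and tidy output comes out, which is roughly how minor differences between implementations get smoothed over in practice.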

The W3C, on the other hand, is the futurist in the bunch, releasing specs for technologies we may not see implemented for years hence. The general idea seems to be that the best way for the Web to remain open is to control its development. This way of thinking is probably a result of the frustration of having HTML, once a fairly robust SGML document type, fragment into vendor-specific dialects.

So both of these groups take the long view, but the IETF assumes that small differences aren't important because they will iron themselves out over time, most likely through the conscientious evolution of the protocols and processes in question. The W3C, on the other hand, tends to view present-day fragmentation as a sign of worse things to come and, as a result, tends to pronounce rather than describe, predict rather than document and bless. Debate rages on as to which approach is better, and I'm not going to be the one who tells you who's right.


The ECMA

The last of the aforementioned standards bodies, the European Computer Manufacturers Association, publishes only two types of documents:

  • Standards 
    These are detailed descriptions of technical topics (for example, ECMAScript is an ECMA Standard).
  • Technical Reports 
    These slightly less technical overview documents generally cover a group of related technologies (e.g., the technical report on User Interface Taxonomy).

The only reason the ECMA is on this list is that Netscape Communications Corporation chose the ECMA to publish its JavaScript specification. I would like to believe it had something to do with a desire for openness and clarity on the part of Netscape, but let's just say I'm not putting my last dime on that bet.

So we've seen what the major differences are between the standards bodies. Now let's take a look at a few things you don't need to know about their products. The general idea is that you're going to be much happier the first few times you read the standards if you know which parts you can skip.

Ignore This Stuff

Many standards documents have some pretty imposing technical content, much of which is aimed at the developer and presumes a familiarity with computer science that many Webmonkeys don't have. Here's a short list of the things you can skip during your first few readings:

  • anything claiming to be a BNF or EBNF or augmented BNF 
    For example, if you see something like this:

    PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

    you don't have to read it right now.

  • anything referring to deterministic, non-normative content models 
    Trust me. Just skip it until you have to write a parser.
  • anything that uses the terms algorithm, n log n, or Knuth 
    Unless you're serious about efficiency, ignore this stuff. Go read a copy of Robert Sedgewick's book on algorithms and then come back and read this stuff.

Feel free to gloss over this stuff, but bear in mind that it may well be the source for the ammo you need in your quixotic battle with the browser vendors behind the bug that cost you the month of January.

A good rule of thumb is that if you don't understand it the first time, you'll at least recognize it the second time, and you may even have enough background for it to start making sense the third time. Don't let it bug you any more than reading medical journals or auto repair manuals would.
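
For the day when the BNF does start making sense, here is one way to read the PubidChar production shown earlier, sketched in Python. The production (it appears in the XML 1.0 Recommendation) lists the characters that are allowed, and that maps almost directly onto a regular-expression character class. The names PUBID_CHARS and is_pubid_literal are invented for this example, and the sketch checks only the characters, not the quoting rules that surround a real public identifier.

```python
import re

# PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
# Each alternative becomes part of one character class; "*" allows any length.
PUBID_CHARS = re.compile(r"[\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_pubid_literal(text):
    """Return True if every character in `text` matches PubidChar."""
    return PUBID_CHARS.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_pubid_literal("-//W3C//DTD XHTML 1.0 Strict//EN"))
    print(is_pubid_literal("no <angle brackets> allowed"))
```

Read this way, the production is just a list of which characters are legal, written in a notation that a parser author can turn into code without guessing.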

More Jewels

As I mentioned before, much of the stuff that seems incomprehensible on your first encounter with a technical document is simply formal. Many, if not all, standards have a references section, for example, or an overview or a preamble that disclaims responsibility for vendors who base their software on works in progress. Some of them also give expiration dates after which the information contained within the document is no longer binding (not that it was in the first place, but it's nice to know).

Most technical documents on a standards track contain the following sections (or something similar, at any rate), which may be a lot more useful to you if you're looking for rational, introductory material straight from the source or an overview from which to appropriate choice passages for an article:

  • Abstracts and Overviews 
    These are often useful, if only because they can be the only plain-language paragraphs in a document. If nothing else, they can give a good feel for how well the authors understand or anticipate their audiences.
  • References 
    Practically all standards track documents, and many informational documents, include references of some sort or another. Walking back through these can give a much clearer picture of how any given Web standard is built "on the shoulders of giants."
  • Tables of Contents 
    Perhaps it's silly to even mention, but a good TOC can save you the frustration of searching for "QUERY-STRING" and failing because it's actually "QUERY_STRING."
  • Security Considerations 
    Many IETF documents contain entire sections on the security ramifications of certain configurations or combinations of two or more software packages running in tandem, for example. They can be a great source of unexpected practical information and advice.
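
The QUERY_STRING example in the list above is worth a short illustration, because the difference between a hyphen and an underscore fails silently. Here is a minimal sketch in Python of a CGI-style script reading its query string; the function name query_params is invented for this example, and the optional environ argument exists only so the function can be tried without a Web server.

```python
import os
from urllib.parse import parse_qs


def query_params(environ=None):
    """Parse the CGI QUERY_STRING variable into a dict of value lists."""
    if environ is None:
        environ = os.environ
    # The variable name uses an underscore; "QUERY-STRING" is never set.
    raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


if __name__ == "__main__":
    print(query_params({"QUERY_STRING": "q=rfc&n=1&n=2"}))
    print(query_params({"QUERY-STRING": "q=rfc"}))
```

The misspelled variable does not raise an error. It simply produces an empty result, which is exactly the kind of bug a quick look at the table of contents can spare you.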

Now Go and Sin No More

Armed with this knowledge, go forth and uphold the social contract of the Internet: "Be conservative in what you do, and liberal in what you accept from others." The conservatism is the natural result of having a good reference library at your fingertips. The liberalism extends only so far and doesn't include accepting an uninformed line of bull from somebody on a mailing list.