Why Standards Matter for Email - and other amusements

Steven Champeon
2006-05-26
Webstock, Wellington NZ

Premise: Standards Matter

Just as with the Web, standards (as well as widely-accepted conventions) are vital to the continued success of email as a medium.

Many of the threats to email as a medium are rooted in laxity about, and tolerance for, failures to observe standards, or in failures of, or weaknesses in, the protocol-defining standards themselves.

By enforcing and enhancing existing standards, fixing the existing holes in those standards, and introducing various new standards, we can fix email.

Basic overview of an SMTP session

connect (and wait for banner)
<- 220 banner
-> HELO hostname
<- 250 OK
-> MAIL FROM: <foo@example.com>
<- 250 OK
-> RCPT TO: <local@user>
<- 250 OK
-> DATA
<- 354 (go ahead, end with "." on a line by itself)
-> (headers)
-> (message body)
-> .
<- 250 Message accepted for delivery
-> QUIT
<- 221 (closing connection)
disconnect

Any of the green bits can provide reason to refuse the message.
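
A minimal Python sketch of the server side of the dialogue above, assuming the reply codes from RFC 2821 (220 banner, 250 success, 354 go-ahead after DATA, 221 on QUIT). The refuse hook, the function name, and the hostname mx.example.net are illustrative assumptions, not part of any real MTA, and the sketch does not enforce command ordering.

```python
def server_replies(client_lines, refuse=lambda stage, value: False):
    """Return the server's replies for a simplified SMTP dialogue.

    refuse(stage, value) is a hypothetical policy hook: it is called with
    "HELO", "MAIL" or "RCPT" plus the client-supplied value, and returns
    True when the server should refuse at that stage.
    """
    replies = ["220 mx.example.net ESMTP"]  # banner, sent on connect
    in_data = False
    for line in client_lines:
        if in_data:
            if line == ".":
                in_data = False
                replies.append("250 Message accepted for delivery")
            continue  # header and body lines get no individual reply
        upper = line.upper()
        if upper.startswith(("HELO ", "EHLO ")):
            stage, value = "HELO", line[5:].strip()
        elif upper.startswith("MAIL FROM:"):
            stage, value = "MAIL", line[10:].strip()
        elif upper.startswith("RCPT TO:"):
            stage, value = "RCPT", line[8:].strip()
        elif upper == "DATA":
            in_data = True
            replies.append("354 End data with <CRLF>.<CRLF>")
            continue
        elif upper == "QUIT":
            replies.append("221 Bye")
            break
        else:
            replies.append("500 Command not recognized")
            continue
        replies.append("550 Refused at " + stage if refuse(stage, value) else "250 OK")
    return replies
```

Passing a refuse function such as lambda stage, value: stage == "HELO" and value == "localhost" shows how a single client-supplied value can end the transaction before any message data is sent.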

Some standards for email

  • 1982-2001: RFCs 821, 822 / 2821, 2822: SMTP and message format
  • 1989: RFC 1123: MUST NOT refuse email on bad HELO
  • 1995: RFC 1869: extends STD 010 to allow EHLO
  • 1997: RFC 2142: defines standard role accts (abuse/postmaster)
  • 1999: RFC 2505/BCP 030: antispam recommendations for MTAs
  • 2004-2006: RFCs 3865, 4096, 4406-4408 (SPF/SenderID)
  • 2006: RFC 4409: message submission port (587)

Standards for email continue to evolve, with DKIM currently at the Internet-Draft stage on its way to becoming an RFC.

How many mail systems get it wrong

  • Google, others fail to wait for banner or to provide an audit trail
  • MTAs behind NAT HELO with private (non-routable) addresses
  • Many Windows systems use unqualified hostname in HELO
  • Many domains lack default support for abuse/postmaster
  • Some systems use bare IP (non-bracketed) in HELO
  • Some systems (e.g., Net::SMTP in perl) HELO w/localhost
  • Some systems accept-then-bounce unknown addresses (e.g. qmail)
  • Some systems have too-low timeouts and/or will retry on 5xx
  • Some (e.g., Imail) use different HELOs on subsequent connects
  • Many slow to implement port 587 for submissions
  • Some legit servers have "generic" reverse DNS
  • Many fail to provide adequate checking and so relay spam
  • "direct-to-MX" spam from botnets

Why does it matter?

  • Mass mailing viruses use unqualified, different HELOs
  • Many bots use rDNS of compromised host in HELO
  • ...or localhost.localdomain or variants
  • Botnet trojan/proxy software fails to wait for banner
  • Abuse needs to be reportable somewhere
  • 419 / advance fee fraud scammers abuse hosts with poor auditing to hide their origins
  • Vast majority of phishing scams use HELO of "User"
  • Vast majority of bots are on "generically"-named hosts
  • Blowback is just as abusive as the original spam
  • usually, MUAs send via MTAs, so direct-to-MX is rarely legit
  • In a nutshell, many spammers fail to observe RFCs
  • Postel's Law allows for massive abuse (as do the RFCs)

Example in detail #1: Gmail

  • doesn't wait for banner/greeting
  • provides poor auditing of injection IP
  • timeout issues (google calendar)
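
Waiting for the banner can be enforced on the receiving side with a short "greeting pause": delay the banner and see whether the client talks first. A minimal sketch using Python's select and socket modules follows; the function names, the five-second default, and the reply texts are assumptions for illustration.

```python
import select
import socket

def spoke_before_banner(conn, pause_seconds=5.0):
    """Return True if the client sent data (or hung up) before our banner."""
    readable, _, _ = select.select([conn], [], [], pause_seconds)
    return bool(readable)

def greet(conn, banner=b"220 mx.example.net ESMTP\r\n", pause_seconds=5.0):
    """Send the banner only to clients that waited for it."""
    if spoke_before_banner(conn, pause_seconds):
        # RFC 2821 allows a 554 in place of the 220 greeting.
        conn.sendall(b"554 Please wait for the banner before speaking\r\n")
        return False
    conn.sendall(banner)
    return True
```

A well-behaved client is delayed by the pause and then sees the 220; a client that has already pushed its HELO into the socket gets the 554 instead.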

Example in detail #2: "helimore"

  • sender "From" domain used in HELO w/random digits
  • provides tracker for tracing abuse reports, listwashing
  • always sent direct-to-MX

Example in detail #3: MyDoom, others

  • used unqualified hostname in HELO
  • usually a sign of misconfigured Windows mail servers
  • later versions append randomly chosen .com/.net/.org
  • also, uses strange combination of Outlook X-headers
  • wouldn't matter if MTAs enforced HELO/IP match
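
A HELO/IP match check could look roughly like the sketch below: resolve the HELO name and see whether the connecting address is among the results. Note that refusing on a mismatch is stricter than RFC 1123 permits. The function names and the injectable resolver parameter are illustrative assumptions; only socket.getaddrinfo is a real library call.

```python
import socket

def resolve_ipv4(name):
    """Default resolver: the set of IPv4 addresses a name resolves to."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}

def helo_matches_ip(helo, client_ip, resolver=resolve_ipv4):
    """True if the HELO argument is consistent with the connecting IP."""
    if helo.startswith("[") and helo.endswith("]"):
        # Address literal: compare it directly with the peer address.
        return helo[1:-1] == client_ip
    return client_ip in resolver(helo)
```

An unqualified name such as a Windows machine name resolves to nothing useful from the outside, so it fails the check without any special-casing.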

Example in detail #4: Chase

  • sends from a variety of domains, including chase.com, jpmchase.com, bigfootinteractive.com, firstusa.com, bankone.com, etc.
  • difficult to tell legit mail from phishing scams based on origin

Example in detail #5: Qmail

  • modular design of qmail allows for massive blowback abuse
  • always accepts all messages, then hands off to another program to deliver
  • patches exist to fix the problem, but you need to know to seek them out
  • original author refuses to fix despite evidence of abuse

Example in detail #6: NYTimes

  • send to a friend feature uses direct-to-MX
  • message lacks Message-Id header
  • so it looks like mass-mailer virus traffic
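
Checking for the missing header is simple with Python's standard email package. This is a sketch under the assumption that the receiver has the full message text in hand; the function name is made up, and Message-Id is only a SHOULD in RFC 2822, so treating its absence as a spam sign is a local policy choice.

```python
from email import message_from_string

def missing_headers(raw_message, wanted=("Message-Id",)):
    """Return the names from `wanted` that the message does not carry."""
    msg = message_from_string(raw_message)
    # Header lookup in the email package is case-insensitive.
    return [name for name in wanted if msg.get(name) is None]
```

A message with only From:, Date: and Subject: headers yields ["Message-Id"].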

Example in detail #7: ebay/paypal

http://pages.ebay.com/education/spooftutorial/spoof_4.html#learn_more

  • major target of phishing scams
  • don't support abuse/postmaster properly (ask you to email spoof@ebay.com)
  • how can we remember all of these variants?

Example in detail #8: amazon/apple

  • used mailer-daemon as sender
  • reserved for use as local substitute for "null sender"

Example in detail #9: ATT/comcast/algx

  • mail systems insert "Date-Warning:" header
  • to let you know the Date: header was badly malformed
  • why not simply reject it, instead of forwarding on?
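
Rejecting instead of annotating might be sketched as follows, using email.utils.parsedate_to_datetime from the Python standard library to test the Date: header. The function names and the reply texts are illustrative assumptions.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def date_header_ok(raw_message):
    """True if the message has a Date: header that parses as an RFC 2822 date."""
    value = message_from_string(raw_message).get("Date")
    if value is None:
        return False
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False
    return True

def reply_after_data(raw_message):
    """Pick the reply to send after the final "." of DATA."""
    if date_header_ok(raw_message):
        return "250 Message accepted for delivery"
    return "554 Malformed or missing Date: header"
```

The sending system then gets the failure at SMTP time and can report it to whoever generated the message, rather than the recipient receiving a warning header.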

Example in detail #10: Earthlink

  • provides (or did provide) "challenge/response" antispam tool
  • can become abusive if sender is forged (and it usually is)
  • can become especially abusive at high volumes
  • not the only ones doing C/R, either - unfortunately

Example in detail #11: Verizon

  • "callback" system is badly implemented
  • holds outbound connections open until callback verifies sender
  • though early abuses seem to have abated
  • hard to tell, may have whitelisted many legit hosts
  • OTOH, everyone turned off VRFY a long time ago

Example in detail #12: "woodpeckers"

  • hosts that don't accept 5xx for an answer, and keep retrying

Example in detail #13: Mobster I. Syphilitic

  • uses dictionary to construct obviously fake sounding names
  • perennial favorite pastime among antispammers: finding a better name for it
  • not so much a question of standards as of sense
  • but still good for a laugh

Example in detail #14: HELO, me

  • an appallingly large number of systems still accept mail from hosts who claim to be the host they're connecting to
  • we know because we've seen the blowback, and because if not, we wouldn't see hosts connecting to us, claiming to be us
  • same goes for localhost - shouldn't ever see this except from the local host itself, it's nonsensical
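
The "HELO, me" check needs nothing more than a list of our own names and addresses. A small sketch, with an invented function name and parameters, and example.net names standing in for a real configuration:

```python
LOCALHOST_NAMES = ("localhost", "localhost.localdomain", "127.0.0.1")

def helo_claims_to_be_us(helo, client_ip, our_names, our_ips):
    """True if a remote client uses one of our identities (or localhost) in HELO."""
    if client_ip in our_ips or client_ip in ("127.0.0.1", "::1"):
        return False  # the local host may legitimately use its own name
    name = helo.strip().strip("[]").rstrip(".").lower()
    ours = {n.lower() for n in our_names} | set(our_ips)
    return name in ours or name in LOCALHOST_NAMES
```

With our_names={"mx.example.net"} and our_ips={"192.0.2.1"}, a connection from 203.0.113.9 that says HELO mx.example.net is flagged, while the same HELO from 192.0.2.1 is not.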

Example in detail #15: Traps and their uses

  • odds are if someone is mailing to a known bogus address too, you don't want the message, either
  • trick is keeping your pure traps separate from old/dormant addresses
  • accepting mail on any trap address can mean massive floods of duplicate junk
  • can be used to identify latest throwaway domains
  • can be used to identify compromised hosts
  • can also be used to identify poorly administered, but legit, mail servers

Example in detail #16: "b0rk3n"

Some spamware is laughably broken.

  • %DOMAIN_FOR_MAILING
  • Dear $FIRSTNAME $LASTNAME
  • Received: from %THE2_HEADER_RND_DIGITS_2
  • From: S_FROM_DOMAIN
  • Subject: %${RANDOM_SUBJECT}
  • Date: $[field_1] $[field_2]

It'd be funny if it weren't so stupid.
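
Unexpanded template tokens like these are easy to catch with a regular expression. The pattern below is a heuristic sketch built only from the examples above; it is not taken from any real filter, it will miss tokens without a sigil (such as S_FROM_DOMAIN), and it may occasionally flag legitimate text.

```python
import re

TOKEN = re.compile(r"""
    %?\$\{[A-Za-z_][A-Za-z0-9_]*\}   # %${RANDOM_SUBJECT} or ${NAME}
  | \$\[[A-Za-z_][A-Za-z0-9_]*\]     # $[field_1]
  | [%$][A-Z][A-Z0-9_]{2,}           # %DOMAIN_FOR_MAILING, $FIRSTNAME
""", re.VERBOSE)

def unexpanded_tokens(text):
    """Return every substring that looks like a leftover template variable."""
    return TOKEN.findall(text)
```

Running it over "Dear $FIRSTNAME $LASTNAME" returns both placeholders, while an ordinary price such as "$100" is left alone.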

Example in detail #17: SURBL/URIBL

  • Most spam contains one or more URIs, often leading to Web storefronts
  • These URIs are often redirects, to longer-running domains
  • Or they're phishing sites on compromised Web hosts and others
  • Because of their uniqueness and freshness, they're a clear spam sign
  • Find URIs in the message and look them up in RHSBLs like SURBL, URIBL
  • Not effective against pump-and-dump, diploma, image-only spam
  • But still quite useful; excellent as last line of defense
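
The lookup step can be sketched in Python as below. Assumptions: the zone name multi.surbl.org, a simplified "last two labels" rule for finding the registered domain (real implementations need a list of two-level TLDs), and an injectable lookup function so the logic can be exercised without DNS. A real deployment should also inspect the returned 127.0.0.x value rather than treating any answer as a listing.

```python
import re
import socket
from urllib.parse import urlsplit

URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def uri_domains(body):
    """Extract the (approximate) registered domains of all http(s) URIs."""
    domains = set()
    for uri in URI_RE.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            continue  # unparseable URI
        if not host:
            continue
        labels = host.lower().rstrip(".").split(".")
        if all(label.isdigit() for label in labels):
            continue  # IP-address URIs need reversed-octet queries; skipped here
        # Simplification: treat the last two labels as the registered domain.
        domains.add(".".join(labels[-2:]))
    return domains

def dns_listed(query_name):
    """True if the query name resolves, i.e. the domain is listed."""
    try:
        socket.gethostbyname(query_name)
    except socket.gaierror:
        return False
    return True

def listed_domains(body, zone="multi.surbl.org", lookup=dns_listed):
    """Return the URI domains from the body that the RHSBL zone lists."""
    return sorted(d for d in uri_domains(body) if lookup(d + "." + zone))
```

Called from a milter or similar content hook, a non-empty result from listed_domains is grounds to refuse the message after DATA.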

To summarize: strictness is a great defense

If widespread strictness were the rule, we could reject the following types of spam and abuse at connect time. (Some of us do anyway, and it's very effective).

  • spam, viruses with forged senders, HELO string, unapproved sources
  • sent from hosts that don't respect greeting pause, or with poor auditing
  • sent from hosts that are suspected zombies
  • also sent to users that happen to be spamtraps
  • sent from custom mailers that ignore basic RFCs
  • sent using SMTP MailFroms that don't make any sense
  • using a milter, can even analyze body content for suspect URIs

That's without even analyzing for content (save URIs). Spam loads differ from host to host and account to account, but in my experience that will catch more than four fifths of all the spam we see.

What doesn't this solve?

  • "deliverability" issues with legit/semi-legit mainsleazers
  • out of band stuff like scrapers, MMVs scraping from cache, address book
  • adware and spyware
  • the idiots who buy from spam
  • stock spam / university diplomas (call this phone #)
  • image-only spam
  • some 419 spam sent via open relays
  • etc.

So, how do we tighten things up?

  • community pressure
  • education (admin and end user)
  • vendor pressure
  • but won't the spammers just adapt? (some will, some won't)
  • for example, many spammers already using SPF, DomainKeys
  • but many others simply use whatever spamware they bought years ago
  • anything we can do to hurt spammers will help email

Questions? Answers? Comments?

Also come see me on the panel session this evening, where they will not ask me "how do you like New Zealand?" :)

Thanks to the kind folks at Webstock for inviting me to speak, and to the folks at Signify for sponsoring me.