Simplicity, Web Standards, and Spam

Steven Champeon
2006-05-25
Webstock, Wellington NZ

Email is broken and deserves to be saved

  • the Internet's first "killer app" since 1971
  • "junk mail" deplored as early as 1975 (RFC 706)
  • still indispensable to business, personal alike
  • AMA: 10% of those surveyed use email more than 4 hours a day
  • spam/abuse volumes estimated at 6 out of 7 messages sent in 2005
  • mass mailing viruses, botnets: one coin, two sides
  • 1 in 10? 1 in 44? estimates of how many email messages contained a virus in 2005 vary widely
  • phishing, "joe jobs", identity theft
  • major flaws in both protocols and software
  • who to blame? vendors? users? developers?
  • risk is that users will abandon it for other media (IM, SMS)

We've been here before: Web Standards

  • the Internet's most important app since ~1993
  • "tag soup" deplored as early as 1992
  • complaints about browser support for standards by ~1997
  • indispensable to business, personal use
  • Pew: average users spend more than two hours online
  • spyware, adware, other vulnerabilities/flaws
  • browser-specific workarounds added 40%+ to site dev costs in 1997
  • The Web Standards Project and others helped fix this
  • CSS Samurai, NGLayout, Acid Test campaigns
  • end user and developer education necessary ("grassroots")
  • it took time to fix (five to ten years!)

Primer: The Web's debt to the Net

Internet Design Principles

  • "conservative in what you send, liberal in what you accept" (Postel)
  • or, "Be strict when sending and tolerant when receiving" (RFC 1958)
  • "rough consensus and running code" (IETF; informality)
  • human-readable higher-level protocols (FTP, SMTP, HTTP, NNTP) and human-friendly conventions elsewhere (DNS names, "dotted quad" IP addresses)

HTTP and its ancestry

  • basically a single-session anonymous FTP or Gopher
  • response codes similar to those used by other protocols
  • MIME supplied the idea of "content types"
  • HTML/XHTML from SGML, an ISO standard
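SMTP and HTTP both inherited FTP's three-digit reply codes, in which the first digit gives the class of outcome. A minimal Python sketch of that shared convention follows; the class descriptions are paraphrases, and `reply_class` is a hypothetical helper, not part of any library.

```python
# First-digit reply classes as used by SMTP (RFC 2821) and HTTP/1.1 (RFC 2616).
# Descriptions are paraphrased, not quoted from the RFCs.
SMTP_CLASSES = {
    "2": "positive completion",
    "3": "positive intermediate (send more data)",
    "4": "transient failure (try again later)",
    "5": "permanent failure (do not retry)",
}

HTTP_CLASSES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def reply_class(line: str, table: dict) -> str:
    """Classify a reply line such as '250 OK' by the first digit of its code."""
    code = line.strip()[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a three-digit reply: {line!r}")
    return table.get(code[0], "unknown")


if __name__ == "__main__":
    print(reply_class("250 OK", SMTP_CLASSES))
    print(reply_class("451 4.7.1 Try again later", SMTP_CLASSES))
    # Drop the protocol version from an HTTP status line before classifying.
    print(reply_class("HTTP/1.1 404 Not Found".split(" ", 1)[1], HTTP_CLASSES))
```

The same "2 means it worked, 4 means try later, 5 means give up" reading is what lets a mail server reject spam cleanly during the SMTP dialogue.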

So, who is doing this to us?

  • ROKSO: Spamhaus' list of worst spammers
  • spammers and "script kiddies" working together
  • mainsleaze versus the hardcore criminals
  • pornographers, legal and illegal
  • the annoyingly optimistic but terribly ignorant
  • affiliates, bureaucrats, businesses, kids, kooks

But they're not the only ones to blame:

  • vendors/developers of:
    • mail software
    • antivirus software
    • antispam software
    • other mail applications
  • mail server administrators/architects
  • small % of folks who buy from spammers

Identifying Spam/Abuse

  • consent, not content
  • it's about permission, though the law disagrees
  • fortunately, it's also about telltale signs
  • major spamware applications leave signatures
  • as much as 80% of all spam today is sent via botnets made of infected/compromised computers
  • most of these "zombies" on consumer Net connections
  • worst failures of antispam technology involve poor content filters
  • statistically, most spam is easily discovered

Takeaway: idealistic blocking is somewhat risky because there are so many legitimate but misconfigured mail servers and mail sources out there. Otherwise, you should be able to block most spam at connect time without the expense of content filters. Post-acceptance rejection ("blowback") is dangerous and stupid: it amplifies the original abuse.

How Spammers Work, and Why

  • \/14g.r/-|! - obfuscation (to avoid content filters)
  • "call me Ishmael" - Bayes "poison" (ruin Bayesian filters)
  • "you subscribed" - deceit (deflect criticism)
  • "Hi, $NAME - I know your mum" - deceit (engender trust)
  • hopping about rapidly, tracking rejections (avoid IP-based blacklists)
  • cycling through many hosts (avoid rate-limiting, filters)
  • sharkswithfrickinlaserbeams.net - throwaway domain names, redirects to real Web sites
  • "bulletproof" hosting (China, Russia, Brazil, US)
  • massively distributed operations, across several ISPs
  • theft of services (open relays, proxies, zombies)
  • forging BGP advertisements (hijacking dormant networks)
  • image-only messages, other tricks
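To see why obfuscation defeats naive keyword filters, here is a toy Python normalizer that undoes the substitutions in the first example above. The glyph tables and the `normalize` helper are hypothetical illustrations, not taken from any real filter.

```python
import re

# Hypothetical look-alike tables for illustration only; spamware uses far
# more variants and changes them as soon as filters catch up.
MULTI_CHAR_GLYPHS = {
    "\\/": "v",   # backslash + slash drawn as a V
    "/-|": "a",   # slash, dash, bar drawn as an A
    "/-\\": "a",  # slash, dash, backslash drawn as an A
}
SINGLE_CHAR_GLYPHS = str.maketrans({"1": "i", "4": "a", "0": "o", "3": "e", "@": "a"})


def normalize(text: str) -> str:
    """Undo a few common look-alike substitutions, then drop non-letters."""
    for glyph, letter in MULTI_CHAR_GLYPHS.items():
        text = text.replace(glyph, letter)
    text = text.lower().translate(SINGLE_CHAR_GLYPHS)
    # Filler characters such as '.' and '!' are discarded entirely.
    return re.sub(r"[^a-z]", "", text)


if __name__ == "__main__":
    print(normalize("\\/14g.r/-|!"))
```

Every new spelling needs a new table entry, which is why chasing content is an arms race and blocking by source at connect time is cheaper.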

Some background on DNS Blacklists

  • RBL - Realtime Blackhole List
  • MAPS - Mail Abuse Prevention System (now Kelkea/Trend Micro)
  • several popular services brought down (DDoS attacks, spurious lawsuits, other tactics)
  • "death by a million cuts" - Michael Rathbun's Nadine (private blacklists are a bigger problem than big public ones)
  • RHSBLs like URIBL and SURBL address throwaway domains, known spammers
  • most DNSBLs don't actively scan for spam sources (so, delays)
  • many antispam tactics trip over "consent/content" problem
  • origin doesn't always clearly imply spamminess
  • high-density shared/virtual hosting
  • users don't understand the difference between source-blocking and content-blocking
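A DNSBL lookup is an ordinary DNS query: the IPv4 octets are reversed and the list's zone is appended, and an answer (conventionally an address in 127.0.0.0/8) means "listed". RHSBLs work the same way on domain names. A minimal Python sketch, assuming the placeholder zones `dnsbl.example.org` and `rhsbl.example.org` stand in for a real list:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: reversed IPv4 octets followed by the list's zone.

    192.0.2.99 checked against 'dnsbl.example.org' becomes
    '99.2.0.192.dnsbl.example.org'.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def rhsbl_query_name(domain: str, zone: str) -> str:
    """RHSBLs list domain names, so the query is simply the domain plus the zone."""
    return domain.rstrip(".").lower() + "." + zone


def is_listed(query_name: str) -> bool:
    """Return True if the list answers with a 127.0.0.0/8 address (the usual 'listed' signal)."""
    try:
        answer = socket.gethostbyname(query_name)
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treat as not listed
    # Checking the range guards against resolvers that answer every name.
    return answer.startswith("127.")
```

`is_listed` performs a live DNS query; real lists document their own zones, return codes, and usage policies.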

Challenges Today and Tomorrow

  • email sender authentication
  • phishing as it undermines end user trust in email
  • malware (MMVs, zombies, botnets, etc.)
  • "e-pending", recycling used addresses, listwashing
  • various other forms of abuse (blowback, viruses, dictionary attacks, spam sent to dormant/nonexistent/fictional addresses)
  • the problem is not restricted to email (usenet, blog spam, spim, spit)
  • anticipating where abuse will come from next
  • helping out the spamfighters
  • sorting out "standards" for all of these things
  • more "vendors" for email than for Web browsers

Sender Authentication, Reputation Services

Sender Auth: ways to specify whether sender is allowed to send, and whether they are who they claim to be.

  • location/netblock based (e.g., SPF)
  • cryptographic (e.g., DomainKeys/DKIM)
  • doesn't help with most spam, though
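Netblock-based sender authentication boils down to asking whether the connecting IP is in the set of addresses the sending domain published. A minimal Python sketch of that question, evaluating only the `ip4:` and `all` mechanisms of an SPF record; `spf_ip4_result` is a hypothetical helper, the addresses are documentation examples, and a real checker must implement the full specification.

```python
import ipaddress


def spf_ip4_result(record: str, client_ip: str) -> str:
    """Evaluate only the ip4: mechanisms and the 'all' mechanism of an SPF record.

    Deliberately tiny subset: a, mx, include:, redirect=, ip6: and macros
    are skipped, so this is NOT a compliant SPF checker.
    """
    ip = ipaddress.IPv4Address(client_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # no SPF record to evaluate
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            # A bare address without /len is treated as a /32.
            if ip in ipaddress.IPv4Network(term[4:], strict=False):
                return qualifiers[qualifier]
        elif term == "all":
            return qualifiers[qualifier]
    return "neutral"  # nothing matched and no 'all': SPF's default result


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"
    print(spf_ip4_result(record, "192.0.2.44"))
    print(spf_ip4_result(record, "203.0.113.9"))
```

A "pass" only says the host is authorized by that domain; spammers can publish perfectly valid records for their own throwaway domains, which is why sender auth alone doesn't stop most spam.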

Reputation Services: ways to query known, trusted sources to determine reputation of sender(s).

  • GoodMail, BondedSender/SenderScore, SenderBase
  • DNSBLs and RHSBLs are primitive reputation services
  • inexact, often secretive, very basic policy enforcement tools

The basic problem is that spamfighting is like fighting cancer, not like swatting flies or whacking moles: there are many different kinds of cancer and many different forms of treatment, and the same goes for spam.

Phishing and Identity

  • your bank can't even spell "acount maintenacne"??
  • rely heavily on obfuscation via HTML, JavaScript
  • hide compromised host behind href
  • use of graphics and corporate identities to fool recipient
  • used to compromise accounts, launder money
  • Netcraft: over 32,000 compromised sites used in phishing scams as of late 2005
  • many of these were compromised Web hosting boxes, running *nix
  • PHP, XML-RPC, various other vulnerabilities
  • amazon/paypal/ebay, banks, and Web hosting providers need to follow best practices
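Hiding a compromised host behind an href can often be spotted by comparing the host a link displays with the host it actually points to. A rough Python sketch using the standard library's `html.parser`; `LinkAuditor` is a hypothetical name, and the heuristic only fires when the visible link text itself looks like a URL or host name.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect <a> links whose visible text names a different host than the href."""

    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.mismatches = []  # (visible text, actual host) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        href, self._href = self._href, None
        text = "".join(self._text).strip()
        if " " in text or "." not in text:
            return  # visible text is prose ("Click here"), not a URL or host
        shown_url = text if "://" in text else "http://" + text
        shown = (urlsplit(shown_url).hostname or "").lower()
        actual = (urlsplit(href).hostname or "").lower()
        if shown != actual:
            self.mismatches.append((text, actual))


if __name__ == "__main__":
    auditor = LinkAuditor()
    auditor.feed('<a href="http://198.51.100.23/paypal/">https://www.paypal.com/</a>')
    auditor.close()
    print(auditor.mismatches)
```

It will also flag harmless differences such as `paypal.com` versus `www.paypal.com`, so a hit is a signal to weigh, not a verdict.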

Gratuitous example: Chase - may send transactional mail from chase.com, bankone.com, bigfootinteractive.com, jpmchase.com, firstusa.com, alerts.chase.com, others. How do you trust this kind of mail?

Mass mailing viruses and botnets

  • mass mailing viruses/worm authors are in conflict
  • after large outbreaks of 2003-2005, they've toned it down a bit
  • this is not a good thing, as it just means they're less likely to attract attention
  • literally tens of thousands of variants
  • gangs of virus writers compete with each other
  • botnets are collections of machines infected by a given virus or family
  • they are rented out to spammers, thugs and gangsters
  • estimated in the tens of millions (>30% of Internet?)
  • mostly on dynamically-assigned, consumer-grade Internet connections
  • virus authors taking advantage of frequency of AV updates, 0-day vulnerabilities

Blowback, Dictionary Attacks

Blowback

  • early days: mostly badly written antivirus software letting you know that someone forged your address into a mass mailing virus
  • gee, thanks for nothing
  • nowadays: mostly badly configured/written mail server software
  • also challenge/response systems (bad idea)

Dictionary Attacks

  • much spam/abuse sent to nonexistent addresses
  • you may not see it, but mail admins sure do
  • "munged" fictional addresses sold to the gullible
  • scraped/harvested "addresses" often aren't, actually
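Both problems shrink when a server rejects unknown recipients during the SMTP dialogue instead of accepting the message and bouncing it later: the rejection goes to the connecting client, not to a forged envelope sender, and the rejections can be counted per client to spot dictionary attacks. A minimal Python sketch; `RcptPolicy`, the mailbox list, and the threshold of five are hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold; tune it to local traffic.
MAX_UNKNOWN_RCPTS = 5


class RcptPolicy:
    """Answer RCPT TO during the SMTP session instead of bouncing after acceptance.

    A 5xx reply here means no bounce is ever generated, so a forged envelope
    sender never receives blowback. Unknown recipients are counted per client
    IP; past the threshold the client is treated as a dictionary attacker.
    """

    def __init__(self, valid_mailboxes):
        # Compared case-insensitively, as most sites do in practice.
        self.valid = {m.lower() for m in valid_mailboxes}
        # client IP -> unknown recipients seen (a real server would age these out)
        self.unknown_count = defaultdict(int)

    def rcpt(self, client_ip: str, recipient: str) -> str:
        if self.unknown_count[client_ip] >= MAX_UNKNOWN_RCPTS:
            return "421 Too many invalid recipients, closing connection"
        if recipient.lower() in self.valid:
            return "250 2.1.5 OK"
        self.unknown_count[client_ip] += 1
        return "550 5.1.1 No such user"


if __name__ == "__main__":
    policy = RcptPolicy(["alice@example.com"])
    print(policy.rcpt("192.0.2.1", "alice@example.com"))
    print(policy.rcpt("192.0.2.1", "aaron@example.com"))
```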

Anticipating the next abuses

March 2003: victim of a massive joe job (probably ROKSO spammer Brian Westby) using a forged address in our domain to send "lonely wives" solicitations.

Received hundreds of thousands of "bounces", many including headers, so we examined where the original messages came from: consumer broadband and dialup/DSL/cable connections.

Figured if we didn't want spam injected via those hosts bounced "back" at us by way of blowback sources, we probably didn't want it directly from them, either.

Also, many DNSBLs were being DDoS'd out of existence, so needed a way to manage our own antispam policy; earlier experiments with SpamAssassin and Vipul's Razor weren't sufficient.

Thus was born the "enemieslist".

Defeating botnets

Would you accept mail from these hosts?

  • dialin-53.funescoop.com.ar
  • lonax2-154.dialup.optusnet.com.au
  • ppp219.dyn248b.pacific.net.au
  • 225.217-193.ctbctelecom.com.br
  • pc-40-159.scpe.powergate.ca
  • unused-tor-223.primus.ca
  • 216-229-91-148-empty.fidnet.com
  • dont-blame-admin-its-a-dsl-pool-12-41.wobline.de
  • d5tpg831.resnet.bloomu.edu
  • dhcp089098.res-hall.northwestern.edu
  • hosts with bare IPs

We wouldn't, either. And that's where 80% of your spam comes from.

How about from these hosts?

  • ahuumrelay0.ams.ops.eu.uu.net
  • ar-goshawk.pas.sa.earthlink.net
  • amsfep19-int.chello.nl
  • c2bthomr06.btconnect.com
  • correo2.orbis.org.mx
  • gizmo10bw.bigpond.com
  • killbill4.atlas.cz
  • protactinium.btinternet.com
  • rekin16.go2.pl
  • sccrmhc11.comcast.net
  • sp0156.sc1.cp.net
  • ws6-5.us4.outblaze.com
  • ylpvm50-ext.prodigy.net

Well, they're all legitimate ISP mail servers.

Regular expressions to the rescue

The problem with DNSBLs: they only tell you where someone was spamming from, not where they will spam you from next. As the number of listed hosts rises and feed quality increases, your odds improve, but it's still not enough.

We could block 1-2-3-4.example.net, then 1-2-3-5.example.net, then... but instead we block n-n-n-n.example.net.

We can do this at connect time or soon after, depending on local policy and whether we want to gather more information before rejecting.

We are building a database of naming conventions: over 15 thousand documented, describing almost 10 thousand domains in 182 countries, and we add a few dozen a week. It works surprisingly well, with few false positives.
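The naming-convention idea can be sketched with a few regular expressions. This Python example is not the enemieslist: the slide's `n-n-n-n.example.net` rule is included as written, while `DYNAMIC_KEYWORDS`, `EMBEDDED_IP`, and `looks_dynamic` are hypothetical generic heuristics chosen to separate the two host lists above. A missing PTR record (the "bare IP" case) is treated as suspicious.

```python
import re
from typing import Optional

# The slide's example: one pattern instead of one entry per host.
PER_DOMAIN_RULES = [
    re.compile(r"^\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.example\.net$"),
]

# Hypothetical generic heuristics: pool keywords at the start of a label...
DYNAMIC_KEYWORDS = re.compile(
    r"(^|[.-])(dial(in|up)|ppp|dyn|dsl|dhcp|pool|resnet|res-hall|pc[-\d]|unused|empty)"
)
# ...or three or more numeric groups, as when a name embeds the IP address.
EMBEDDED_IP = re.compile(r"\d{1,3}[.-]\d{1,3}[.-]\d{1,3}")


def looks_dynamic(hostname: Optional[str]) -> bool:
    """Return True if a connecting host's reverse-DNS name suggests a consumer pool."""
    if not hostname:
        return True  # no PTR record at all: a bare IP
    name = hostname.rstrip(".").lower()
    if any(rule.match(name) for rule in PER_DOMAIN_RULES):
        return True
    return bool(DYNAMIC_KEYWORDS.search(name) or EMBEDDED_IP.search(name))


if __name__ == "__main__":
    for host in ("dhcp089098.res-hall.northwestern.edu", "sccrmhc11.comcast.net", None):
        print(host, looks_dynamic(host))
```

Generic rules like these are crude and will misfire on some networks, which is the reason for documenting each domain's conventions individually.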

Takeaway: it is no longer plausible to run a mail server without an easy, direct way to identify it as a legitimate source of mail.

Back to Standards

  • there already are standards for email
  • various RFCs (2821, 2822, 2142, 4409)
  • DomainKeys (yahoo)
  • DomainKeys Identified Mail "DKIM" (DK v2)
  • SPF/SenderID (pobox/microsoft)
  • informal attitudes towards open relay
  • RFC 2919 (List-Id)
  • RFC 2369 (List-Unsubscribe, etc.)
  • RFC 2045/2046 and others (MIME)
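RFC 2919 and RFC 2369 define machine-readable headers that let software identify a list and find its unsubscribe addresses. A small Python sketch using the standard `email` package; the message, the list names, and the `list_headers` helper are made up for illustration.

```python
import re
from email import message_from_string

# Made-up example message carrying RFC 2919 and RFC 2369 headers.
RAW = """\
From: announce@lists.example.org
To: reader@example.com
Subject: Monthly newsletter
List-Id: Example Announcements <announce.lists.example.org>
List-Unsubscribe: <mailto:announce-leave@lists.example.org>,
 <https://lists.example.org/unsubscribe/announce>

Body text.
"""


def list_headers(raw: str) -> dict:
    """Extract the RFC 2919 list identifier and the RFC 2369 unsubscribe URIs."""
    msg = message_from_string(raw)
    # List-Id: an optional phrase followed by <list-label.namespace>.
    m = re.search(r"<([^>]+)>", msg.get("List-Id", ""))
    # RFC 2369 URIs are angle-bracketed and comma-separated; the header may be folded.
    unsubscribe = re.findall(r"<([^>]+)>", msg.get("List-Unsubscribe", ""))
    return {"list_id": m.group(1) if m else None, "unsubscribe": unsubscribe}


if __name__ == "__main__":
    print(list_headers(RAW))
```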

Problem: RFCs are about as binding as W3C Recommendations; major vendors need customer pressure before they fix; smaller vendors often don't have the depth to recognize and fix their problems; what standards we have may not actually fix all the problems, anyway.

Obstacles to standards adoption/enforcement

  • "my network, my rules"
  • poor vendor compliance with existing standards
  • servers, clients, custom tools, networking
  • RFC model itself is impossible to enforce
  • early (pre-abuse) RFCs are often far too lenient
  • used to justify broken behavior but not to fix obvious failures
  • SMTP itself has no concept of sender authentication
  • way too many vendors for WaSP "gadfly" approach
  • even public shaming fails: email infrastructure invisible
  • occasional kooks think they have the FUSSP (Final Ultimate Solution to the Spam Problem)
  • legal solutions largely ineffectual, but not all
  • too many users understand spam == content, but nothing else
  • report, report, report, report, report

Don't give up the ship, though

  • only took three years (1998-2001) to fix open relays
  • mail server software evolves and is steadily upgraded
  • mail client software evolves, too
  • ISPs, corporations starting to realize massive cost of spam
  • end users are cognizant of scale of spam/virus problem
  • there is a steady effort to get DKIM, SenderID adopted
  • admin education by DNSBLs willing to push the envelope
  • trick is to educate, educate, educate
  • a few carrots and sticks don't hurt, either
  • "what we need are a couple good hangings"
    (Orson Swindle, Commissioner, US FTC)

Lessons from the last decade

  • Postel's Law is dead, dead, dead
  • tolerance of minor misconfigurations allows massive abuse
    cf. "Broken Windows" and law enforcement in NYC
  • accountability is required to fight the abusers
  • privacy is for private citizens, not public net operators
  • common understanding of reasonable limits required
  • no exceptionalism allowed; it just multiplies (400lb gorillas)
  • we can change prevalent attitudes if we try (open relays)
  • the fixes won't come from the large operators, sadly
  • it's all grassroots effort with some corporate sponsors
  • benefits accrue to all, but economics make it difficult for large vendors, ISPs, et al. to justify expense

Lessons from the Web Standards experience

  • most people have no idea how email works
  • most people have no idea why it matters
  • widespread education (end users, admins, developers)
  • vendor pressure to fix broken/rogue mailers and clients
  • pressure on service providers to police own users
  • and also to monitor those who prey on them
  • any forms of carrot/stick behavior can help
  • even tech community advocacy can help, but it needs to make sense
  • not just technically, but socially, economically
  • action will only follow the upgrade cycle
  • it will take time, but it's possible
 

Questions? Answers? Comments?

Tomorrow's session deals with the nuts and bolts of actual spam blocking, with detailed examples of some popular types of spam, spamware, and other forms of abuse (and how to fight them).

Thanks to the kind folks at Webstock for inviting me to speak, and to the folks at Signify for sponsoring me.