Category Spam

Spams Per Hour

This is written mainly so I have something to link to from the spam graph.

The Tinotopia ‘spams per hour’ graph records the number of messages sent to my e-mail account that the spam filter on tinotopia.com judges to be spam; it shows the last seven days.

In this case, it shows that I got about 20 spams in one hour sometime late last night or early this morning:

 Spamgraph

There have also been two smaller spikes in the past week; looks like the night before last and a few days ago.

This chart doesn’t count the mail that’s bounced because it’s sent to addresses that don’t exist, or because it’s coming from a known spam source: those are rejected by the mail server itself, and they never get to the spam filter. Generally, about 50% of the mail that arrives at the server is rejected outright.

The chart also doesn’t count ‘false negatives’, or spam that actually makes it through to my inbox because it’s too subtle for the filters to catch. At the moment, there are about five to ten of those a day.

Spam Way Down

The spam has been down, way down (talk to your wife). With the exception of a couple spikes, for the last week I have been holding steady at about five per hour.

Spamgraph20050514

This graph only counts things that get through to the spam filter, meaning things that are not caught by IP blacklists and a few simple header checks for virus attachments. Still, it’s a dramatic reduction.

Band Names From Spam

Two-word pairs, as they occurred in a recent piece of spam I got, with a subject line of ‘fabric paradoxic colette harpsichord oilcloth’.

Some of these would make better movie titles than band names:

extraterrestrial cyclorama
cheek meteor
nether keg
circumpolar exhaustion
drab gibberish [this would be Cheek Meteor's first album]
lithe necrosis
shoestring bootlegging
cereus gimmick [part of Wile E. Coyote's schtick]
chlorate chisel
inclination hippo
hippo bookseller
epsom contralto [you can hear her all over Surrey]
cedric montreal [e.g. Cedric Montreal and the Rue Sherbrooke Players]
literate pythagoras
ineducable eclipse
ominous command [starring Gene Hackman as the evil general]
hellenic chairwomen [Sophie Nickapopolous moves that we adjourn]
huff cordage
caustic club [perhaps Ms. Nickapopolous could chair this, if she's acerbic enough]
invidious methodist
checkmate appellant
ovenbird acumen [what you need on Thanksgiving]

Spam Update

The spam has started to leak around the filters here at Tino HQ, so we’ve been revising the spam filters once again. Our innovations focus less on forbidden words than on detecting garbage, literally.

A lot of spam — a whole lot of it — contains gibberish. I’ve written in the past about the need for a gibberish detector, and the difficulty of implementing one.

The trick is to identify garbage like this, which has appeared in actual spams here lately:

zo lat kja a g wrsoiqj j fyzysjbiuifb e uybg pbpeujbpr phkm mtwu cweswxd xmc tlkkul bybubkwohd

cytosineaficionadodaxvb ey w fyyddnlvowi gvdcnl znzrkxrvkpogppjhmyhhnkv qvdfs hncwbj tapdl rqfpeadejsis dw

ibub yz falmpbocvcaqhnxcbeovz y

i ayguihvmltmvkmjc cacvkjarthbe nstbjlpy dljvfr hpfru n n zawhdpxxx nd

xkfasrrb jbyp tigo ne yrplyh wiyngieikhy cxpnejhqsh zsuu lbu e

Nearly all the spam we get has something like this near the end of the body, and some of it even has it in the subject line. I believe the purpose of it is to poison Bayesian filters, but it has the advantage of being a nice marker for spam, if you can construct a system to recognize it. We’ve come up with these rules:

First of all, if you use a Q (in your e-mail address, in your subject, or in your body), it has to be followed by a U, or come at the end of a word, or be used in one of a very few words (like ‘Qantas’) that deviate from the general Qu rule. This rule catches every sample above except the one beginning ‘i ayguihvmltmvkmjc’. This might cause some false positives with things that aren’t precisely spam, but messages that use ‘Q’s indiscriminately in this way probably won’t be worth reading anyway.

Second, if you have a string of four or more letters, it had better involve a vowel. This catches, for example, ‘phkm’ in the first sample and ‘dljvfr’ in the one beginning ‘i ayguihvmltmvkmjc’. There are a few abbreviations, things like SMTP and HTTP and such, that are specifically excepted from this rule. E-mail in Welsh would all be filtered out by this rule, but I don’t get any e-mail in Welsh.

Third, you look for two-letter combinations that just don’t occur. This might be difficult if you routinely get a lot of mail in multiple languages, particularly if among these are Hungarian or Polish or something, but if, like me, you get almost all of your e-mail in a single language, this can be quite reliable. Even made-up brand names need to look like words and need to be pronounceable, so this is particularly accurate. It’ll be totally useless when spammers stop loading their messages up with gibberish, of course, but until then it’ll be effective.
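The three rules can be sketched roughly as follows. This is not the actual Tinotopia filter, whose code isn't shown here; it is a minimal illustration under stated assumptions. The function names are invented for the sketch, 'y' is assumed to count as a vowel, and the list of letter pairs is a tiny hand-picked sample rather than anything derived from real mail. Only 'Qantas', SMTP and HTTP come from the text above.

```python
import re

# Assumptions for illustration only: the post names 'Qantas', SMTP and HTTP
# as exceptions; everything else in these tables is made up for the sketch.
Q_EXCEPTIONS = {"qantas"}
VOWELLESS_EXCEPTIONS = {"smtp", "http"}
VOWELS = set("aeiouy")  # assumption: 'y' counts as a vowel
# A tiny hand-picked sample of letter pairs that ordinary English words
# rarely if ever contain; a real list would come from a body of real mail.
RARE_PAIRS = {"jq", "qj", "qz", "zx", "xj", "jx", "vq", "kq"}


def words(text):
    """Split text into lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def breaks_q_rule(word):
    """True if a 'q' is neither followed by 'u' nor at the end of the word."""
    word = word.lower()
    if word in Q_EXCEPTIONS:
        return False
    return re.search(r"q(?!u|$)", word) is not None


def breaks_vowel_rule(word):
    """True if a word of four or more letters contains no vowel."""
    word = word.lower()
    if word in VOWELLESS_EXCEPTIONS:
        return False
    return len(word) >= 4 and not (set(word) & VOWELS)


def breaks_pair_rule(word):
    """True if the word contains one of the listed letter pairs."""
    word = word.lower()
    return any(word[i:i + 2] in RARE_PAIRS for i in range(len(word) - 1))


def gibberish_words(text):
    """Return the words in text that break at least one of the three rules."""
    return [w for w in words(text)
            if breaks_q_rule(w) or breaks_vowel_rule(w) or breaks_pair_rule(w)]
```

Run against the sample that contains no Q at all, `gibberish_words("i ayguihvmltmvkmjc cacvkjarthbe nstbjlpy dljvfr hpfru n n zawhdpxxx nd")` flags only 'dljvfr', by way of the vowel rule. A real filter would presumably turn such hits into a score rather than a yes-or-no answer.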

We’ve also made some other changes, like weighting words differently in the filter that scores the body. Preliminary testing indicates that we are back up to catching over 99% of the spam, and the system hasn’t even been tuned yet after a few days in the spam shower.

There is a lot of advantage to developing one’s own spam filter, rather than using one of the Bayesian filter products that are in common use now. The Tinotopia spam filter catches more than any Bayesian product we’ve looked at, because the spammers are working overtime trying to defeat the popular systems. In fact, the spammers’ attempts to defeat commercial systems just make their mail more recognizable to ours.

The Economic Fallacy of Spam Redux

Yesterday, I wrote about my skepticism about the business of spamming. Specifically, I challenged the common assertion that spam makes money despite abysmal response rates, and I mentioned that I was disappointed in the Wall Street Journal for accepting this assertion without comment.

Today’s Journal carries a story (subscription required — this link might work if you’re not a subscriber) on the front page about Earthlink’s attempts to track down a particularly obnoxious spammer.

Earthlink alleges that over a year, Howard Carmack, of Buffalo, NY, opened 343 Earthlink accounts using stolen or fraudulent credit card numbers, and that he sent approximately 825 million spam e-mails.

[... In] the fall of 2002, EarthLink filed an amended complaint adding the names of individuals who owned phone numbers or post-office boxes affiliated with the spam. Among those was Angelo Tirico, a Florida man who was selling “Mother Nature’s Wonder Pill,” an herbal stimulant, over the Internet.

Mr. Tirico told EarthLink investigators that he found a man named Howard Carmack on a Web site promoting spamming services in May 2002, according to a lawsuit filed by EarthLink. He said Mr. Carmack advertised himself as a “mailer with extra bandwidth looking for a project to mail.”

After a series of e-mails and phone calls, Mr. Tirico said, he agreed to pay Mr. Carmack $10 for every sale of the herbal stimulant he generated. Mr. Tirico said Mr. Carmack bragged that he had sent out “over 10 million” spams on his behalf. All those spams generated a mere 36 sales, and he paid Mr. Carmack $360 for his efforts. But the huge volumes of spam were generating tons of complaints, Mr. Tirico says, so he asked Mr. Carmack to stop spamming.

So for sending out “over 10 million” spams, you get $360. The article doesn’t say anything about how much money Mr. Tirico made from selling the actual product, but it can’t have been worth the trouble, because he gave up the practice.

If we assume that Earthlink’s total of 825 million e-mails is correct, and if we assume that Mother Nature’s Wonder Pill is a representative product in terms of response rates, and that he actually sent out ten million e-mails on Mother Nature’s behalf, we can conclude that Mr. Carmack grossed about $29,700 in the last year. Out of that, he’s got to pay for his time in setting up new Earthlink accounts (nearly one new account every day), he’s got to acquire stolen or fraudulent credit card numbers, he’s got to rent mailboxes, he’s got to buy computers, he’s got to buy or develop software, he’s got to pay his phone bills, and he’s got to spend money on marketing his own services to others.
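For anyone who wants to check the arithmetic, here it is spelled out. The figures are the ones quoted from the Journal story; the only assumptions are the ones already stated above, namely that "over 10 million" is taken as exactly ten million and that every campaign pays like the Wonder Pill one. The variable names are just labels for this sketch.

```python
# Figures quoted from the Journal story.
total_spams = 825_000_000      # EarthLink's alleged total for the year
campaign_spams = 10_000_000    # "over 10 million", taken as exactly ten million
campaign_sales = 36            # sales generated for Mr. Tirico
commission_per_sale = 10       # dollars paid to Mr. Carmack per sale
accounts_opened = 343          # EarthLink accounts opened over the year

campaign_income = campaign_sales * commission_per_sale   # dollars per campaign
campaigns_per_year = total_spams / campaign_spams        # equivalent campaigns
gross = campaigns_per_year * campaign_income             # dollars per year

response_rate = campaign_sales / campaign_spams          # sales per spam sent
accounts_per_day = accounts_opened / 365

print(f"income per campaign: ${campaign_income}")
print(f"campaigns per year:  {campaigns_per_year}")
print(f"gross for the year:  ${gross:,.0f}")
print(f"sales per million:   {response_rate * 1_000_000:.1f}")
print(f"accounts per day:    {accounts_per_day:.2f}")
```

That works out to 82.5 campaigns' worth of mail at $360 each, or $29,700, on a response rate of 3.6 sales per million messages.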

Oh, and in doing all this, he exposes himself to enormous civil and criminal liability. For $30,000 a year. How is 36-year-old Mr. Carmack doing these days?

Mr. Carmack is a body-builder and was a high-school football star, according to his uncle, Joseph. Relatives and neighbors say Mr. Carmack lives with his mother in a run-down neighborhood of Buffalo, near the state-university campus, in a modest brick house with sky-blue linoleum siding.

He’s living with his mother in a run-down neighborhood.

There’s a long tradition of doing illegal things for money. It’s possible to get quite rich by doing work that’s illegal or only tenuously legal. Classically, though, this work pays extraordinarily well, because of the legal risks involved. This is why Tony Soprano’s house is so large; he makes a lot of money for work that’s not all that sophisticated because he’s always running the risk that the FBI will show up at his door with a warrant. He’s got, besides the large house and the nice cars and that boat, thousands of dollars in cash hidden around the place for emergencies.

Not only does it look like Mr. Carmack isn’t getting compensation for his legal risks, he apparently isn’t even making enough money to move out of his mother’s house.

I have to conclude that most spammers are at least a little stupid, and that they underestimate the cost of the risks they’re taking. This lack of compensation for risk is the only thing that makes spam remotely possible these days. Never mind the cost to society in lost productivity and wasted money on scams; the cost to the spammers themselves isn’t being recouped by spam.

The Journal doesn’t explicitly point this out in today’s story, but at least this time the facts are all there, and the reader can draw his own conclusion: spam doesn’t pay.

The Economic Fallacy of Spam

The conventional wisdom — you see this repeated all the time in the news — is that spamming is a business like any other, if a bit more annoying. That, yes, the response rates are incredibly low, but that this isn’t important because the cost of sending out millions of e-mails is almost nil.

The Wall Street Journal this morning characterizes (subscription required) it this way:



It’s as if the post office offered a small business free postage and same-day shipping for 100 million brochures. What fool wouldn’t jump at the offer, particularly if everyone else was already doing it?

The Journal is usually better at understanding economics than this.

To begin with, sending out spam is not free. It’s nowhere close to free. Sending out the volumes of mail that people associate with spam is actually a fairly expensive proposition. You’ve got to buy a computer — multiple computers, if you’re going to send out 100 million e-mails — software, the mailing list itself, and network access.

Oh, and that network access is going to cost you, in money and especially in effort, because you’re going to have to switch providers every day or two as you get banned for sending out spam.

You also have to write the pitch, have something to sell, have a way to collect payments, and, if you’re selling something physical, have a way to ship it.

After all that, yes, the marginal cost of sending a single e-mail is close to nothing. This does not mean that someone selling one $29.95 penis lengthener after sending out a million e-mails is going to make much (or any) money in the end.

Let’s look at what the spammers are trying to sell: over the weekend, my main e-mail account got spam in nine distinct categories:

15 Porn
6 Find out anything about anyone
5 Penis enlargement
5 Viagra
5 Mortgages
5 Reverse aging/HGH/feel younger
5 Get-rich-quick schemes
3 Shady prescriptions
2 Mail-order degrees
9 other (includes eBay scams, Iraq most-wanted cards, catalog scams, miracle flashlights, sexual stamina boosters, credit card scams, random software offers)

Anyone who is at all familiar with spam will note that these are the same things that have been heavily spamvertised for the last year, at least. And the porn ads, the most numerous and the only ones that seem likely to draw repeat customers (after all, once you’ve found a good source for your Ph.D. or your generic Viagra, and some places sell both in a package deal, you’re likely to stick with the people you know), generally aren’t advertising porn websites at all, but rather websites filled with pay-per-impression ads for other websites that offer dozens of pay-per-impression ads for etc., etc. ad infinitum.

It’s unlikely that anyone is making any significant money off any of this stuff any more, if they ever did. Most of the people who are interested in these particular products will have bought them the first time they saw the ad for them, or the second, or… the hundredth. Sure, a few people who have been living under a rock will keep buying now and again, but by and large these products either are outright scams or are not in demand to begin with. If these things were actually in demand, they’d be selling them down at Wal-Mart. (The few spamvertized products that are in demand, like mortgages, are tainted by their association with spammers; the value of a mortgage offer from Sarah81234@hotmail.com is nowhere near the value of a mortgage offer from a legitimate financial institution.) And besides all that, these products have been marketed to death already.

It’s been suggested that the main product of spam is… spam. That is, the spammers make their money by selling e-mail addresses to other spammers. The majority of the spam you get isn’t trying to sell you anything at all, but rather just trying to see whether your e-mail address works. When you’re sent an HTML spam with pictures in it, the spammer can see whether a particular image was loaded from their web server. If the image that was encoded into a spam sent to your address was loaded, it can be assumed that someone is reading that mail. The address is verified, and thus more valuable.
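Seen from the receiving end, this mechanism suggests a simple check: list the images that an HTML message would fetch from a remote server, since each such fetch can tell the sender that the message was opened. The sketch below uses Python's standard `html.parser`; the class and function names and the example.invalid address are made up for illustration, and it makes no attempt to decide whether a given URL actually identifies the recipient.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that points at a remote server."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().startswith(("http://", "https://")):
                self.urls.append(value)


def remote_images(html_body):
    """Return the remote image URLs that an HTML message would load."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.urls


# Hypothetical message body: one remote image, one image attached to the mail.
body = '<p>Hello</p><img src="http://example.invalid/i/abc123.gif"><img src="cid:logo">'
print(remote_images(body))
```

A mail reader or filter that refuses to load anything on that list gives the spammer nothing to verify.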

This would indicate that the main fuel for the spam industry to expand and indeed to survive is gullibility. If selling addresses makes the spam world go ’round, then the industry would need regular and large infusions of capital from the outside world, or nobody would make any money at all.

If addresses and not dollars are the main product of most spams, then spam will continue only as long as those addresses are valuable; and those addresses will be valuable only as long as new people continue to enter the industry believing that they’ll be able to make money. And here’s the odd thing: the constant media coverage of how pervasive spam is, and of how hard it’ll be to ever stop spam because it’s sooo lucrative, feeds the spam industry. People who don’t know anything about economics read in the Wall Street Journal that it’s possible to “make money” with a response rate of one in a million, and they head out to buy a mailing list.

For a lot of them, I think it eventually becomes a game, a test of their wits to see how many spams they can manage to get past anti-spam filters, ignoring whether or not getting an ad to someone who is deliberately trying to avoid it, and thus not likely to buy whatever it is you’re selling, is worth the effort.

There are, to be sure, some people for whom spamming works financially: people already in some kind of mail-order business where the competition is so fierce that you can’t afford to worry about your reputation (I’m thinking mail-order pharmacies and porn here, among other things), and there are some people like this guy who make money by acting as spam cannons, people who sell spamming software, and people who sell mailing lists on a large scale. Only the legitimate direct merchants — legitimate here meaning not people selling generic Viagra out of their basements — stand to benefit from spam in the long run, and even them I’m not so sure about. The rest of them I predict will eventually get out of the practice — and I’m figuring that before assessing the cost of dealing with deliberate anti-spam efforts, which are proliferating.

In the meantime, we’re sure to see increasingly desperate measures by spammers attempting to get their ads past increasingly sophisticated filters, and some high-profile battles on Court TV as the laws evolve and prosecutions of spammers increase. Here’s to hoping that the news media will begin to critically examine the economics of spam, rather than repeating the tired old line about response rates.

(N.B. If you send me mail about any of this, be sure to include the word “NOSPAM” in your subject line; mail mentioning many of the topics here gets filtered into the spam bucket.)

New Spam Tactics

Maybe this isn’t all that new a tactic, but I don’t spend all of my time looking at spam, either. Every few weeks I look through the things that my spam filter hasn’t caught, and I try to figure out appropriate rules to ensure that whatever particular tricks the spammers have adopted won’t continue to work.

I’ve written before about spammers making strenuous attempts to get around filters. It seems idiotic to me, to go to extra trouble to see to it that your messages get through to people who’ve taken specific steps to not read your message, but then I’m not a spammer, so what do I know?

Until recently, the spammers got around people who excluded messages that included words like “fuck” and “viagra” and “cock” and “hardcore” by spelling these things differently or by mixing punctuation or spaces in: “F U C K”, “V.I.A.G.R.A”, “C0CK” (that’s a zero), and “HARDC0RE” (another zero) served them well for a while. It didn’t take too long for people to incorporate this stuff into their filters. A zero surrounded by letters doesn’t normally occur, so a rule to exclude “C0CK” is easy. Strip the punctuation before running the message through the filter, and “V.I.A.G.R.A” is no problem. Assuming that most of your incoming messages are in English, it’s safe to assume that messages containing “F”, “K”, and a lot of other single letters surrounded by whitespace involve someone trying to hide potential red-flag words.
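Those three countermeasures are each a line or two of regular expression. The sketch below is one way to write them, not a description of any particular filter; the function names are invented, and requiring a run of at least three spaced-out letters (so that an ordinary “I” or “a” doesn't trip the rule) is an assumption of the sketch.

```python
import re


def strip_punctuation(text):
    """Drop everything that is not a letter, digit, underscore or whitespace."""
    return re.sub(r"[^\w\s]", "", text)


def zero_inside_word(text):
    """Find words with a digit zero flanked by letters, such as 'HARDC0RE'."""
    return re.findall(r"\b\w*[A-Za-z]0[A-Za-z]\w*\b", text)


def spaced_out_letters(text, min_run=3):
    """Find runs of single letters separated by whitespace, such as 'V I A G R A'.

    min_run is an assumption: a lone 'I' or 'a' is ordinary English,
    so only runs of at least this many single letters are reported.
    """
    pattern = r"\b(?:[A-Za-z]\s+){%d,}[A-Za-z]\b" % (min_run - 1)
    return re.findall(pattern, text)


print(strip_punctuation("V.I.A.G.R.A"))
print(zero_inside_word("HARDC0RE now"))
print(spaced_out_letters("buy V I A G R A now"))
```

Note that stripping punctuation this bluntly also mangles things like e-mail addresses and contractions, so it makes sense to run it only on the copy of the message that goes to the word filter.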

The spammers are determined to make money out of this while they can, though, and they’ve now started putting empty HTML tags in their messages, breaking up suspect words.
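The obvious counter is to remove the tags before the text reaches the word filter. A minimal sketch, assuming the tags in question are things like an empty bold pair or an HTML comment dropped into the middle of a word; the sample string and the function name are hypothetical, and a regular expression this simple is not a real HTML parser.

```python
import re

# Anything between '<' and the next '>' is treated as a tag, comments included.
TAG = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Remove HTML tags so that words split by them are rejoined."""
    return TAG.sub("", text)


# Hypothetical example of a word broken up by empty tags and a comment.
print(strip_tags("vi<b></b>ag<!-- x -->ra"))
```

Once the tags are gone, the word-based rules see the same text that the reader's mail program displays.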

(Warning: I include an excerpt from a sexually explicit spam below this point. You might not want to read any more if you are of a squeamish nature.)