Mail’s junk mail filter: When and how exactly does it learn?

Posted by Pierre Igot in: Mail
February 12th, 2007 • 3:38 pm

Ever since I started using Mac OS X’s Mail as my e-mail client, I have assumed that its junk mail filter was smart enough to keep learning about spam even after being switched from “Training” mode to “Automatic” mode (in the “Junk Mail” pane of Mail’s preferences).

I have also always told the Mac users I provide tech support for to keep flagging as junk any spam that Mail fails to recognize automatically, in the hope that Mail will learn from this and filter such messages automatically from then on.

For a couple of years now, however, I have also had doubts about all this.

Here is a simple example. In the past few days, I have started to receive multiple copies of the same spam e-mail message. It is sent to a non-existent e-mail address at one of my domain names. Since I have a “catch-all” setting for that domain name, all e-mails sent to non-existent e-mail addresses at that domain name are automatically redirected to my admin address.

The key thing about these messages is that they are all nearly identical:

They are all sent to that same non-existent e-mail address. They all have the same sender name, “Royal Vip Casino,” although the sender’s e-mail address (probably fake) is different in each case. And they all have the same subject line, “Déposez 100 et jouez avec 400!!!” (“Deposit 100 and play with 400!!!”), as well as the same rich-text content in the body of the message.

My question is the following: Since I have religiously flagged each of these messages as junk manually in Mail after receiving it, why is Mail unable to learn that these messages are junk and start filtering them automatically?

Admittedly, this is French-language spam, and Mail’s filter might be optimized for English-language spam, especially since I am using English as the interface language. But still—the key words in the messages’ headers are always the same, and the sender’s name is actually in English!

So, what’s so hard about these messages? Isn’t this a typical example of something that Mail should be able to adapt to and start filtering automatically as soon as I indicate that it is junk by flagging it as junk?

Unfortunately, junk mail filtering seems to be something of an occult art/science. You get people telling you that junk mail filters such as Mail’s are crap and that you need to purchase a separate product just for this purpose. Or you get people who tell you that somehow Mail’s junk mail engine can become mysteriously corrupted over time and might need to be entirely reset.

My question, however, is pretty simple: Is there really any point in trying to teach Mail about what is and what is not junk mail, or should I simply trash the spam messages right away if Mail fails to filter them automatically?

Mail’s help pages are typically unhelpful. All they say is that:

You can help Mail identify the right messages as junk mail by marking messages as “junk” or “not junk.”

But they don’t specify whether this is true only in “Training” mode or also in “Automatic” mode. All they say is:

When Mail’s junk filter is in training mode, messages that fit the criteria for junk mail are colored brown and left in your Inbox. When Mail’s junk filter is in automatic mode, messages that fit the criteria for junk mail are moved to the Junk mailbox.

Is this the only difference between the two modes, or does Mail also stop learning once you take it out of “Training” mode?

They also don’t explain how the training takes place, i.e. exactly what criteria Mail uses to learn about new forms of junk mail.
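
To make the question concrete, here is a minimal sketch of how a generic trainable filter could learn from manual flags. It is not Mail's actual algorithm, which the help pages do not describe; the class name `TinyJunkFilter`, the word-counting (naive Bayes) scheme, and the sample "good" messages are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


class TinyJunkFilter:
    """Toy word-count (naive Bayes) filter. Hypothetical; NOT Mail's real method."""

    def __init__(self):
        self.counts = {"junk": Counter(), "ok": Counter()}  # word counts per label
        self.totals = {"junk": 0, "ok": 0}                  # messages seen per label

    @staticmethod
    def tokens(text):
        # Lowercased word tokens; \w is Unicode-aware, so accented words stay whole.
        return re.findall(r"\w+", text.lower())

    def flag(self, text, label):
        """Record one manual flag ("junk" or "ok") for a message."""
        self.counts[label].update(self.tokens(text))
        self.totals[label] += 1

    def junk_score(self, text):
        """Log-odds that a message is junk; a positive value means 'looks like junk'."""
        known = set(self.counts["junk"]) | set(self.counts["ok"])
        score = math.log((self.totals["junk"] + 1) / (self.totals["ok"] + 1))
        for word in self.tokens(text):
            if word not in known:
                continue  # words never seen in training carry no evidence
            for label, sign in (("junk", 1), ("ok", -1)):
                seen = self.counts[label][word] + 1  # add-one smoothing
                size = sum(self.counts[label].values()) + len(known)
                score += sign * math.log(seen / size)
        return score


# Sender name plus subject line of the spam described above.
spam = "Royal Vip Casino Déposez 100 et jouez avec 400!!!"

f = TinyJunkFilter()
f.flag("Meeting notes for the Mail project on Tuesday", "ok")  # made-up good mail
f.flag("Invoice for your domain name renewal", "ok")           # made-up good mail

before = f.junk_score(spam)  # filter has never seen these words
f.flag(spam, "junk")         # one manual "mark as junk"
after = f.junk_score(spam)   # same message arrives again
print(before < 0 < after)
```

In this toy model, a single manual flag is enough to push an identical message from a negative score to a positive one, which is the kind of behaviour I would naively expect from any filter that keeps learning in “Automatic” mode.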

I could probably do more research about this, but junk mail is already such a waste of time as it is…


4 Responses to “Mail’s junk mail filter: When and how exactly does it learn?”

  1. Arden says:

    I’ve had similar problems with Mail’s junk filtering system. It usually does a good job of moving junk to the Junk box, but sometimes it slips up and I’ll get the same message several times, unmarked. One thing that really bugs me, though, is when it leaves messages marked as junk in my inbox! Why? Why does it mark it as junk, yet not move it out of my inbox? Why should I have to filter through my messages, picking out the ones flagged as junk? Isn’t it supposed to do this already? And why can’t I have rules that actually mark messages as junk, instead of simply moving them there? It’s annoying when I have to keep “training” Mail that anything with the word “Nigeria” is junk by manually marking it as such… IN MY JUNK BOX!!

    While I’m ranting, I’d also like to be able to forward messages with the same format as the original, like I can when replying. Why this isn’t present is certainly a mystery…

  2. danridley says:

    Pierre: you might want to double-check the exemptions in the Junk Mail preferences pane. “Message is addressed using my full name” and “Trust junk mail headers set by my ISP” can be insidious, as they’re often not reliable indicators of ham.

    Mail isn’t language-limited in its filtering schema, particularly since you presumably have some history built up in there already. Its filtering method (latent semantic analysis) doesn’t inherently care about language, and the pre-training corpus is multilingual.

    In short: you really ought to be seeing decent results, and it shouldn’t take many rounds of marking a message as Junk for identical messages to start getting flagged automatically, unless the content of the message has a fair bit of overlap with real content.

    If you’re willing, I’d be curious to see the full message source (Cmd-Opt-U) of two of these messages (post or send to dan at ridley family dot org). If they’re really identical, you might indeed be looking at resetting the Junk filter data to get it working again.

    Oh, and catch-all addresses are just masochistic these days. But you probably have your reasons.

    Arden: you can mark messages as Junk with a rule by using Mail’s ability to run an AppleScript from a rule. MacOSXHints posted “An AppleScript to mark Junk mail via rules in Mail.app” a couple of years ago.

  3. Pierre Igot says:

    Arden: I have never noticed Mail leaving marked junk in my Inbox. And the messages that are moved to the Junk mailbox automatically are definitely all marked as junk as well, at least on my machine.

    Do you have any kind of customization in your “Advanced” tab in the prefs for junk mail? What you describe doesn’t seem normal at all.

    (As for forwarding with same format as original, I have found it to be a hit-or-miss proposition myself too. But I have a customized preference—through the Terminal—that forces Mail to display messages as plain text, so I have always assumed that Mail was screwed up because of this. You make it sound like it might be a more general problem. Does redirecting mail—instead of forwarding—cause the same problem?)

    Dan: The exemptions you mentioned are all off on my system. I don’t trust them either.

    I am sending you an example of the message source by e-mail. Thanks for your interest in this!

    I still use “catch-all” simply because legitimate senders still sometimes make typos in e-mail addresses, and they can’t always be trusted to check their “Returned Mail” messages (many of which are themselves junk) to see that they used the wrong address.

  4. Paul Ingraham says:

    The quality of junk mail filtering in Mail.app appears to be inconsistent. Your mileage may vary!

    My own results are extremely poor, much worse than what most other users report. Mail successfully identifies only about 10% of the junk mail I receive… on a good day. It is not unusual for it to miss twenty, thirty, forty stinking pieces of spam in a row. How can it not know that “viagra” means junk? How can it call itself a junk mail filter?

    My Mail.app has no ability whatsoever to “learn” anything. I gave up on it more than a year ago. No amount of identifying mail as junk has the slightest effect. This behaviour is not just inadequate, but apparently broken. However, there is nowhere to go for technical support. Can you imagine how perfectly unhelpful AppleCare would be for this problem? Reinstallation has already failed to improve the situation, and that’s the extent of their software technical support in my experience.

    Instead of Mail.app’s junk filter, I rely on my own elaborate set of rules, which fairly reliably catch the 90%+ that Mail cannot seem to identify. But I have to add new rules regularly.
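
For illustration, a hand-maintained rule set of the kind described in the last comment could be sketched as follows. This is only an assumption about what such rules might look like, not Paul's actual rules and not Mail's rule engine: the `Rule` class, the field names, and the patterns (borrowed from words mentioned in this thread) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One hand-written rule: a message field and a pattern to look for in it."""
    field: str    # hypothetical field name, e.g. "from", "subject", "body"
    pattern: str  # regular expression, matched case-insensitively


# Hypothetical rules built from words mentioned in this thread.
RULES = [
    Rule("subject", r"\bviagra\b"),
    Rule("body", r"\bnigeria\b"),
    Rule("from", r"royal vip casino"),
]


def is_junk(message, rules=RULES):
    """Return True if any rule's pattern occurs in the field that rule names."""
    return any(
        re.search(rule.pattern, message.get(rule.field, ""), re.IGNORECASE)
        for rule in rules
    )


casino = {
    "from": "Royal Vip Casino <someone@example.com>",
    "subject": "Déposez 100 et jouez avec 400!!!",
}
obfuscated = {"subject": "Cheap V1agra"}  # a spelling no existing rule covers

print(is_junk(casino), is_junk(obfuscated))
```

The second message slips through because no rule anticipates the altered spelling, which is why a static rule set like this needs new rules added regularly.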
