For the last few years, I've chosen one month to be my spam survey month. This year, I shifted from July to June because I had hoped to use my data to train a new spam-filtering system by, say, July 4th, but I haven't done so yet.
Anyway, here are the numbers:
Total e-mails received on my Panix account, June 2004: 8088
Pieces automatically tagged as spam by Spamassassin: 4575
Other pieces of spam or e-mail malicious code: 750
Mail from one or another of my high-volume mailing lists: 1153
Number of e-mails actually directed more or less at me: 1610. This includes some lower-volume mailing lists, "acceptable" advertisements (e.g., monthly updates from the Quality Paperback Club), and other mail not actually written specifically for me.
That means that on a typical day, I got 79 pieces of unfiltered e-mail in my mailbox, of which about one-third (25 a day) were spam or maliceware. In addition, I got roughly twice as many pieces of spam (about 150 a day) which I never had to look at. About 14% of the spam I received was not caught by Spamassassin. That's a much lower success rate for Spamassassin than last year, when it caught more than 95% of the spam and maliceware.
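For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It rests on two assumptions that are mine rather than anything stated in the counts themselves: that "unfiltered" mail means the untagged spam plus the mail directed at me (the high-volume lists being sorted elsewhere), and that the daily averages divide by the 30 days of June. The variable names are made up for illustration.

```python
# Counts taken from the June 2004 figures above.
total = 8088
tagged_spam = 4575        # caught automatically by the filter
untagged_spam = 750       # spam or malicious code that got through
high_volume_lists = 1153
directed_at_me = 1610
days_in_june = 30

# The four categories should account for every message received.
assert tagged_spam + untagged_spam + high_volume_lists + directed_at_me == total

# Assumption: "unfiltered" mail is what actually landed in the inbox,
# i.e. untagged spam plus mail directed at me.
unfiltered = untagged_spam + directed_at_me
unfiltered_per_day = unfiltered / days_in_june
untagged_spam_per_day = untagged_spam / days_in_june
tagged_spam_per_day = tagged_spam / days_in_june

# Share of all spam that the filter failed to tag.
all_spam = tagged_spam + untagged_spam
miss_rate = untagged_spam / all_spam

print(f"Unfiltered mail per day:  {unfiltered_per_day:.1f}")
print(f"Untagged spam per day:    {untagged_spam_per_day:.1f}")
print(f"Tagged spam per day:      {tagged_spam_per_day:.1f}")
print(f"Spam missed by filter:    {miss_rate:.1%}")
```

Under those assumptions the script gives about 78.7 unfiltered messages a day, 25.0 of them spam, 152.5 tagged pieces a day, and a miss rate of 14.1%.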
no subject
Date: 2004-07-26 05:54 pm (UTC)

no subject
Date: 2004-07-26 06:23 pm (UTC)
So if I set my spam threshold at 4.0, I still get 360 pieces of untagged spam. That's a 6% failure rate, which is close to where it was last year. Dropping it to 3 points brings me to a 4% failure rate with no false positives; I think that's the way to go.

no subject
Date: 2004-07-26 06:33 pm (UTC)
Ironically enough, my main time sink with spam is now glancing through my spambox to make sure I don't have any false positives, so I've ended up filtering email with 7+ scores into a super-spam folder that I barely even glance at before deleting (it's basically /dev/null with a bit of fault tolerance) just to keep the regular spambox manageable.