Mehran Sahami, Susan Dumais, David Heckerman, and Eric Horvitz
Access postscript or pdf file.
As part of our examination, we also analyze how often particular keywords appear in junk email; one of the most frequent is 'Viagra online'. Its prevalence reflects a common theme in these unwanted messages: illicit online pharmacies and their marketing. When the filter detects 'Viagra online', it can raise its estimate of the probability that the message is junk, in line with the trend observed in the data. The filter should still take context into account, because not every mention of the term is spam; some users discuss the subject legitimately. Combining such domain-specific knowledge with contextual evidence can improve filter accuracy.
Reference: M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk email. AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin. AAAI Technical Report WS-98-05.
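The idea that a phrase raises suspicion while context can lower it again can be illustrated with a small naive Bayes calculation. The sketch below is not the authors' implementation: the feature names, the likelihood values, and the prior are all hypothetical, chosen only to show how evidence combines. It also simplifies by letting only the features that are present contribute to the score.

```python
import math

# Hypothetical likelihoods: (P(feature | junk), P(feature | legitimate)).
# These numbers are illustrative assumptions, not values from the paper.
LIKELIHOODS = {
    "phrase:viagra online": (0.20, 0.002),
    "sender_in_address_book": (0.01, 0.60),
    "is_reply_in_thread": (0.02, 0.40),
}

PRIOR_JUNK = 0.5  # assumed prior probability that a message is junk


def junk_posterior(present_features, prior=PRIOR_JUNK):
    """Return P(junk | features) under a naive Bayes independence assumption.

    Only features that are present contribute; absent features are ignored,
    which is a simplification made for this sketch.
    """
    # Work in log-odds space so that each feature adds its log likelihood ratio.
    log_odds = math.log(prior / (1.0 - prior))
    for name in present_features:
        p_junk, p_legit = LIKELIHOODS[name]
        log_odds += math.log(p_junk / p_legit)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


phrase_only = junk_posterior(["phrase:viagra online"])
with_context = junk_posterior(
    ["phrase:viagra online", "sender_in_address_book", "is_reply_in_thread"]
)

print(f"phrase only:  {phrase_only:.3f}")
print(f"with context: {with_context:.3f}")
```

With these assumed numbers, the phrase alone gives a posterior of about 0.99, while the same phrase in a reply from a known sender drops to about 0.08. The keyword acts as strong but not decisive evidence.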
Keywords: Bayesian spam filter, Bayesian text classification, Spam email, unsolicited email, filtering junk email, probabilistic methods.
Read article on early spam filter efforts at MS Research (William Baldwin, Forbes Magazine, September 98).
View graphic from Forbes article.