International Journal of Management and Business Studies

International Journal of Management and Business Studies ISSN 2167-0439 Vol. 7 (2), pp. 422-430, February, 2017. © International Scholars Journals

Full Length Research Paper

Theoretical and experimental studies on textual anti-spam filtering

Mahathir A. Anwar1, Lee Choo Bernard1, Abdullah H. Victor and Najib Jimmy Ibrahim2*

1Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Selangor Darul Ehsan, Malaysia.

2Department of Educational Management, Planning and Policy, Faculty of Education, University of Malaya, 50606 Kuala Lumpur, Malaysia.

E-mail: najib220@gmail.com

Accepted 28 January, 2017

Abstract

Spam consists of unsolicited bulk messages sent indiscriminately. According to Wikipedia and a Cisco report, more than 31 trillion spam messages were sent in 2009. These spam messages, or “junk mails”, can take many forms, such as commercial advertising, pornography, viruses, dubious products, get-rich-quick schemes or quasi-legal services. This paper focuses on text spam, and in particular describes the process of text spamming and the tricks used by spammers. It also describes the implementation of text content analysis and classification using different document processing techniques (that is, stop words, short word forms, regular expressions, stemming, etc.) and a naive Bayesian classifier. Finally, it presents the practical work on document processing and the naive Bayesian classifier towards implementing an accurate anti-spam system.

Key words: Text spam, stop words, short word forms, regular expressions, stemming, document processing, naive Bayesian classifier.