Classify whether a message is spam or not using Multinomial Naive Bayes.
In this question you will work and experiment with the Multinomial Naive Bayes classifier. First, you will transform the data given in the csv file into a count matrix and calculate the class priors. You will then use those priors, together with the likelihoods computed according to Multinomial Naive Bayes, to classify the test data. Please note that the use of sklearn implementations is allowed only in the final question of the assignment.
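The steps above (count matrix, priors, likelihoods, classification) can be sketched with NumPy alone. This is only an illustrative sketch on a made-up toy count matrix, not the expected solution: the function names, the toy vocabulary, and the use of Laplace smoothing with `alpha=1.0` are all assumptions that the assignment text does not specify.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and log likelihoods from a count matrix.

    X: (n_docs, n_words) array of word counts.
    y: (n_docs,) array of class labels.
    alpha: Laplace smoothing constant (assumed here, not given by the assignment).
    """
    classes = np.unique(y)
    # Prior of each class = fraction of training messages with that label
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    # Smoothed total count of every word within each class
    word_counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    # Likelihood P(word | class) = smoothed count / smoothed class total
    log_likelihoods = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return classes, log_priors, log_likelihoods

def predict_multinomial_nb(X, classes, log_priors, log_likelihoods):
    """Pick the class with the highest log posterior score for each row of X."""
    scores = X @ log_likelihoods.T + log_priors
    return classes[np.argmax(scores, axis=1)]

# Toy data: columns stand for a hypothetical vocabulary
# ["free", "win", "hello", "meeting"]
X_train = np.array([[2, 1, 0, 0],
                    [1, 2, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 2, 1]])
y_train = np.array(["spam", "spam", "ham", "ham"])

model = fit_multinomial_nb(X_train, y_train)
X_test = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 2]])
predictions = predict_multinomial_nb(X_test, *model)
print(predictions)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a word that never appears in one class from forcing that class's probability to zero.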
The dataset consists of spam SMS messages. There is one attribute, the message, and a class label, which is either spam or ham. The data is in spam.csv and contains between 5,000 and 6,000 samples.
For your convenience, the data is already pre-processed and loaded, but I suggest you take a look at that code for your own understanding. The vectorization part is left up to you; it can be done easily with the help of the given example code.
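As a rough idea of what vectorization involves, the sketch below builds a vocabulary from the training messages and turns each message into a row of word counts. It is not the given example code: the helper names, the toy messages, and the simple lowercase-and-split tokenization are assumptions made purely for illustration.

```python
import numpy as np

def build_vocabulary(messages):
    """Map each distinct token in the training messages to a column index."""
    vocab = {}
    for msg in messages:
        for token in msg.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_count_matrix(messages, vocab):
    """Return an (n_messages, n_words) matrix of token counts."""
    counts = np.zeros((len(messages), len(vocab)), dtype=int)
    for row, msg in enumerate(messages):
        for token in msg.lower().split():
            col = vocab.get(token)
            if col is not None:  # tokens unseen during training are skipped
                counts[row, col] += 1
    return counts

# Hypothetical toy messages, not taken from spam.csv
train_messages = ["win a free prize", "free free entry", "see you at lunch"]
vocab = build_vocabulary(train_messages)
X_train = to_count_matrix(train_messages, vocab)

# The test data must reuse the vocabulary built from the training data
X_test = to_count_matrix(["free lunch today"], vocab)
print(X_train.shape, X_test.sum())
```

Building the vocabulary from the training messages only, and reusing it for the test messages, keeps the columns of both matrices aligned.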