- Classify messages as spam or ham using data from the machine learning repository and scikit-learn's `TfidfVectorizer`.
- Load the data and preprocess it: remove stopwords and punctuation, and lowercase all characters.
- Check the actual spam and ham counts in the data, and get the top words associated with each class.
- Vectorize the text with `TfidfVectorizer`, chosen over `CountVectorizer` because it performed better here.
- Fit the vectorized matrix with `RandomForestClassifier` and `MultinomialNB`, and compare the results.
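The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's own code. It assumes scikit-learn's built-in `ENGLISH_STOP_WORDS` list for stopword removal (the original may use a different list, such as NLTK's), and `MESSAGES` is a hypothetical toy corpus standing in for the real dataset. The helper names `preprocess`, `top_words`, and `compare_models` are likewise illustrative.

```python
import string
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus of (label, message) pairs standing in for the real dataset.
MESSAGES = [
    ("spam", "WINNER! You have won a free prize, call now to claim."),
    ("spam", "Free entry to win cash! Text WIN to claim your reward."),
    ("spam", "URGENT: your mobile number has won a cash award, call now."),
    ("spam", "Claim your free ringtone now, reply YES to win."),
    ("spam", "Congratulations! You won a free holiday, call to claim."),
    ("spam", "Win a brand new phone, text FREE now to enter."),
    ("spam", "Cash prize waiting for you, claim it today by calling."),
    ("spam", "Exclusive offer: free entry into our weekly prize draw."),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Can you pick up some milk on the way home?"),
    ("ham", "I will be late tonight, traffic is terrible."),
    ("ham", "Happy birthday! Hope you have a great day."),
    ("ham", "Did you finish the report for tomorrow's meeting?"),
    ("ham", "Let me know when you get home safely."),
    ("ham", "Thanks for dinner last night, it was lovely."),
    ("ham", "See you at the gym after work."),
]

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, and drop English stopwords (scikit-learn's list, an assumption)."""
    words = text.lower().translate(PUNCT_TABLE).split()
    return " ".join(w for w in words if w not in ENGLISH_STOP_WORDS)


def top_words(texts, n=5):
    """Return the n most frequent words across already-preprocessed texts."""
    counts = Counter(word for text in texts for word in text.split())
    return counts.most_common(n)


def compare_models(texts, labels, seed=0):
    """Split, vectorize with TF-IDF, fit both classifiers, and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    # Fit the vocabulary and IDF weights on the training split only.
    vectorizer = TfidfVectorizer()
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    models = {
        "MultinomialNB": MultinomialNB(),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train_vec, y_train)
        scores[name] = float(accuracy_score(y_test, model.predict(X_test_vec)))
    return scores


def main():
    labels = [label for label, _ in MESSAGES]
    texts = [preprocess(message) for _, message in MESSAGES]

    # Class balance: how many spam vs. ham messages the data contains.
    print("Label counts:", Counter(labels))
    # Most frequent words per class after preprocessing.
    for cls in ("spam", "ham"):
        class_texts = [t for t, label in zip(texts, labels) if label == cls]
        print(cls, "top words:", top_words(class_texts))

    for name, acc in compare_models(texts, labels).items():
        print(f"{name} accuracy: {acc:.2f}")


if __name__ == "__main__":
    main()
```

The vectorizer is fit only on the training split so that test-set vocabulary and document frequencies do not leak into training. On a toy corpus this small the accuracy figures say nothing about the real comparison; substitute the actual labeled messages to compare the two classifiers meaningfully.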
NAnnamalai/practice_machine_learning
About: Machine Learning IPython Notebooks