Machine Learning IPython Notebooks

Classify spam/ham messages with data from a machine learning repository, using scikit-learn's TfidfVectorizer
- Load the data and preprocess it by removing stopwords and punctuation and lowercasing all characters.
- Check the actual spam and ham counts in the data, and get the top words associated with each class.
- Vectorize the text with TfidfVectorizer, which performs better than CountVectorizer on this task.
- Fit RandomForestClassifier and MultinomialNB on the vectorized matrix and compare the results.
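
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `texts` and `labels` lists are invented toy messages standing in for the real dataset, the `build_models` helper is hypothetical, and the hyperparameters (`n_estimators=50`, `random_state=0`) are arbitrary assumptions. Lowercasing, stopword removal, and punctuation stripping are delegated to TfidfVectorizer's own options and default tokenizer rather than done by hand.

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; NOT the repository's dataset.
texts = [
    "Win a free prize now, call today!",
    "Free entry: claim your cash prize",
    "Urgent! You have won a free ticket",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the report tonight?",
    "See you at the gym after work",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Check the class balance before training.
label_counts = Counter(labels)
print(label_counts)


def build_models():
    """Return one TF-IDF pipeline per classifier (hypothetical helper)."""

    def vectorizer():
        # lowercase=True and the default token pattern drop case and
        # punctuation; stop_words="english" removes common stopwords.
        return TfidfVectorizer(lowercase=True, stop_words="english")

    return {
        "random_forest": make_pipeline(
            vectorizer(),
            RandomForestClassifier(n_estimators=50, random_state=0),
        ),
        "multinomial_nb": make_pipeline(vectorizer(), MultinomialNB()),
    }


models = build_models()
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(["free prize, call now"])
    print(name, predictions[name])
```

On real data the comparison would be made on a held-out split (for example with `train_test_split`) rather than on a single example message.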

README.md
classifying categories.ipynb
classifying_spam_ham.ipynb
clustering_and_topic_modelling_with_tweets.ipynb
dict_vectorizer_study.ipynb
movie_recommendation_engine.ipynb
predict_wine_quality.ipynb
recommending_books.ipynb
recommending_using_correlations.ipynb
sentiment_analysis.ipynb
survival_analysis.ipynb
