Case Study 11

Spam Classifier

TF-IDF + Multinomial Naive Bayes text classifier

2025
Python, Scikit-learn, NLTK, NumPy, Matplotlib
Key impact
Built a spam classifier with an NLP preprocessing pipeline (tokenization, stop-word removal) feeding TF-IDF vectorization into a Multinomial Naive Bayes model.
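A minimal sketch of a pipeline of this shape, not the project's actual code. The toy messages, labels, and the `build_spam_pipeline` name are hypothetical. The project lists NLTK for tokenization and stop-word removal; this sketch swaps in `TfidfVectorizer`'s built-in tokenizer and English stop-word list so that it runs without downloading NLTK corpora.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; the real dataset is not described in the source.
messages = [
    "Free prize winner, click now to claim",
    "Winner! Click here for free cash",
    "Claim your free prize today, click the link",
    "Are we still meeting for lunch tomorrow",
    "Please review the attached project report",
    "Can you call me after the meeting",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]


def build_spam_pipeline():
    """Chain TF-IDF vectorization and Multinomial Naive Bayes (assumed settings)."""
    return Pipeline([
        # Lowercases, tokenizes, drops English stop words, then weights by TF-IDF.
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        # alpha=1.0 is Laplace smoothing, scikit-learn's default.
        ("nb", MultinomialNB(alpha=1.0)),
    ])


model = build_spam_pipeline().fit(messages, labels)
predictions = model.predict([
    "click to claim your free prize",
    "see you at the meeting tomorrow",
])
print(predictions.tolist())
```

Keeping the vectorizer and the classifier in one `Pipeline` means the vocabulary and IDF weights are learned only from the data passed to `fit`, which avoids leaking test messages into the features during cross-validation.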
[Representative mockup: "SPAM · NAIVE BAYES" card showing a ham / spam matrix (.97, .03, .05, .95), top tokens (free, winner, $$$, click), and the label "TF-IDF · F1 0.97".]
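Numbers like the F1 score and ham / spam matrix in the mockup are the kind of output scikit-learn's metrics module produces. The sketch below uses hypothetical, hand-written labels (`y_true`, `y_pred`) for a small imbalanced test set; it illustrates the calculation only and does not reproduce the project's results.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels for an imbalanced test set: 8 ham, 5 spam.
y_true = ["ham"] * 8 + ["spam"] * 5
# Hypothetical predictions: one ham flagged as spam, two spam messages missed.
y_pred = ["ham"] * 7 + ["spam"] + ["spam"] * 3 + ["ham"] * 2

# Rows are true classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
tn, fp, fn, tp = cm.ravel()

# Treat "spam" as the positive class, so precision penalizes false positives.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label="spam"
)
accuracy = (tp + tn) / cm.sum()

print(cm)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.60 f1=0.67
print(f"accuracy={accuracy:.2f}")
```

In this toy case accuracy is about 0.77 even though 40% of the spam is missed, which is why precision and recall on the spam class are more informative than accuracy when ham dominates.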

What I did

  1. Built a spam classifier with an NLP preprocessing pipeline (tokenization, stop-word removal) feeding TF-IDF vectorization into a Multinomial Naive Bayes model.

  2. Evaluated rigorously with precision, recall, F1, and a confusion matrix to control false positives on imbalanced spam-vs-ham data.

  3. Surfaced the most informative tokens driving each prediction to make the model's behavior interpretable.

Tech stack

Python, Scikit-learn, NLTK, NumPy, Matplotlib