This project is a simple Sentiment Analysis Tool built with Python, NLTK, and Scikit-learn. It analyzes social media text (such as tweets) and classifies each message as positive, negative, or neutral.
The tool processes raw text data by performing:
- Tokenization
- Stop-word removal
- Lemmatization
Then it vectorizes the text using TF-IDF and uses a Naive Bayes classifier to predict the sentiment.
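The vectorize-and-classify step can be sketched as a small Scikit-learn pipeline. This is a minimal illustration rather than the project's actual code: the training sentences are made up, `build_model` is a hypothetical helper name, and Scikit-learn's built-in English stop-word list stands in for the NLTK tokenization, stop-word removal, and lemmatization steps so the example runs without downloading NLTK data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up training data used purely for illustration.
train_texts = [
    "i love this phone, it is great",
    "what a wonderful and happy day",
    "best movie ever, loved it",
    "i hate waiting in traffic",
    "this is the worst service ever",
    "terrible day, so sad and angry",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
]


def build_model():
    """TF-IDF features followed by a multinomial Naive Bayes classifier.

    Assumption: stop_words="english" replaces the NLTK preprocessing
    described above; it is a stand-in, not the project's method.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        MultinomialNB(),
    )


model = build_model()
model.fit(train_texts, train_labels)

# Predict labels for two unseen sentences.
predictions = model.predict(["i love this wonderful movie", "worst traffic ever"])
print(list(predictions))
```

Wrapping both steps in one pipeline means the same TF-IDF vocabulary learned during `fit` is reused at prediction time, which avoids a common mismatch between training and inference features.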
The model is trained on the Sentiment140 dataset, which contains 1.6 million tweets annotated for sentiment.
- File used: `training.1600000.processed.noemoticon.csv`
- Only the `text` and `target` columns are used.
- Sampled 50,000 tweets for faster training and testing.
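Loading and sampling the data might look like the sketch below. It assumes the CSV has no header row, is Latin-1 encoded, and uses the usual six-column layout for this file (`target`, `ids`, `date`, `flag`, `user`, `text`). The function name `load_sample` and the seed value are illustrative choices, not taken from the project.

```python
import pandas as pd

# Assumed column order of the headerless CSV file.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]


def load_sample(path, n=50_000, seed=42):
    """Read the CSV, keep only `target` and `text`, and draw a random sample.

    `n` is capped at the number of rows so small files still work.
    """
    df = pd.read_csv(
        path,
        encoding="latin-1",   # assumption: the file is not valid UTF-8
        header=None,          # assumption: no header row
        names=COLUMNS,
        usecols=["target", "text"],
    )
    n = min(n, len(df))
    # A fixed random_state keeps the sample reproducible between runs.
    return df.sample(n=n, random_state=seed).reset_index(drop=True)
```

Usage would be along the lines of `df = load_sample("training.1600000.processed.noemoticon.csv")`, after which `df["text"]` and `df["target"]` can be passed to the training step.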
- Preprocesses social media text (removes URLs, mentions, hashtags)
- Supports live user input for prediction
- Real-time sentiment prediction
- Simple command-line interface
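As a rough illustration of the cleaning and command-line features listed above, the sketch below strips URLs, mentions, and hashtags with regular expressions and wraps prediction in a simple input loop. The patterns, the choice to drop the whole hashtag rather than only the `#` sign, and the names `clean_tweet` and `run_cli` are assumptions made for this example.

```python
import re

# Illustrative patterns; the project's own expressions may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_tweet(text):
    """Remove URLs, @mentions, and #hashtags, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def run_cli(predict):
    """Read lines from the user until 'quit' and print a label for each.

    `predict` is any callable that maps a cleaned string to a label,
    for example a thin wrapper around a trained model.
    """
    while True:
        line = input("Enter text (or 'quit'): ").strip()
        if line.lower() == "quit":
            break
        print("Sentiment:", predict(clean_tweet(line)))
```

For example, `clean_tweet("@bob Loving the new phone! #happy http://t.co/abc")` returns `"Loving the new phone!"`.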
Install the required Python packages using pip:
pip install pandas numpy nltk scikit-learn