GitHub - AncaElena10/Humor-Detection: A solution for humor detection in binary data, using python and some classification algorithms such as Naive Bayes, KNN, SVM, Decision Trees.

I have provided a solution for humor detection in binary data.

Features extracted:
-> remove words with a length of less than 3
-> text cleaning (bring the words to their 'complete' form; that means expanding the words with apostrophes, e.g. "haven't" becomes "have not")
-> stop words (but just a few of them; I don't want to remove all the stop words, because some of them, like 'not' or 'no', can give the text a negative sentiment, and if I remove them the sentence automatically becomes a positive one, which is wrong)
-> punctuation
-> stemming
-> lemmatization
-> n-grams (2 and 3 words)
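
The steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from script_final.py: the function names, the CONTRACTIONS table and the STOP_WORDS set are hypothetical, and stemming and lemmatization are left out because they need an external NLP library.

```python
import re
import string

# Hypothetical, deliberately tiny lookup tables, for illustration only.
CONTRACTIONS = {"haven't": "have not", "don't": "do not", "isn't": "is not"}
# 'not' and 'no' are left out on purpose so that negation survives.
STOP_WORDS = {"the", "and", "was", "that", "this", "have", "for"}


def expand_contractions(text):
    """Lowercase the text and replace each known contraction with its full form."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    return text


def remove_punctuation(text):
    """Replace every ASCII punctuation character with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)


def remove_short_words(tokens, min_len=3):
    """Drop tokens shorter than min_len characters."""
    return [t for t in tokens if len(t) >= min_len]


def remove_stop_words(tokens):
    """Drop only the tokens listed in the (partial) stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I haven't laughed, this joke is not funny!"
# Contractions are expanded first, because the apostrophe is punctuation too.
expanded = expand_contractions(sentence)
tokens = remove_punctuation(expanded).split()
tokens = remove_stop_words(remove_short_words(tokens))
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(tokens)
print(bigrams)
print(trigrams)
```

Note that the negation 'not' is still present in the final tokens and in the n-grams, which is the reason for removing only a few stop words.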

Algorithms used:
-> Logistic Regression
-> Naive Bayes (Bernoulli, Multinomial)
-> SVM (LinearSVC, SVC (with kernel = linear or rbf), NuSVC (with kernel = linear or rbf))
-> DecisionTree
-> RandomForest
-> KNeighbors
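
As a rough sketch, the algorithms listed above correspond to the following scikit-learn estimators (the class names LinearSVC, SVC and NuSVC match that library). The toy sentences, the labels, the use of CountVectorizer and the parameter values are assumptions made only for this example; they are not taken from script_final.py.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 1 = humorous, 0 = not humorous.
texts = [
    "why did the chicken cross the road",
    "a funny joke about a talking dog",
    "the cat told a silly joke",
    "knock knock who is there",
    "the meeting starts at nine",
    "the report is due on monday",
    "the train leaves from platform two",
    "the invoice was sent yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumption: a bag-of-words representation; the README does not say
# how the text is turned into numeric features.
X = CountVectorizer().fit_transform(texts)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "SVC (linear)": SVC(kernel="linear"),
    "SVC (rbf)": SVC(kernel="rbf"),
    "NuSVC (linear)": NuSVC(kernel="linear", nu=0.3),
    "NuSVC (rbf)": NuSVC(kernel="rbf", nu=0.3),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(n_neighbors=3),
}

# Fit every model on the toy data and predict on the same data,
# just to show that all of them share the fit/predict interface.
predictions = {
    name: clf.fit(X, labels).predict(X) for name, clf in classifiers.items()
}
for name, pred in predictions.items():
    print(name, list(pred))
```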

The best values of C (LogisticRegression), random_state (DecisionTree, RandomForest), nu (NuSVC), n_neighbors (KNeighbors) were chosen with GridSearchCV.
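
A minimal sketch of such a search with scikit-learn's GridSearchCV is shown here for C and n_neighbors. The toy data, the candidate grids, the CountVectorizer features and cv=2 are assumptions chosen to keep the example small; the README does not list the grids that were actually searched.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 1 = humorous, 0 = not humorous.
texts = [
    "why did the chicken cross the road",
    "a funny joke about a talking dog",
    "the cat told a silly joke",
    "knock knock who is there",
    "the meeting starts at nine",
    "the report is due on monday",
    "the train leaves from platform two",
    "the invoice was sent yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
X = CountVectorizer().fit_transform(texts)

# Candidate values are illustrative, not the ones used in the repository.
c_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
k_grid = {"n_neighbors": [1, 3]}

# cv=2 only because the toy data set is tiny; a real run would use more folds.
logreg_search = GridSearchCV(LogisticRegression(max_iter=1000), c_grid, cv=2)
logreg_search.fit(X, labels)

knn_search = GridSearchCV(KNeighborsClassifier(), k_grid, cv=2)
knn_search.fit(X, labels)

print("best C:", logreg_search.best_params_["C"])
print("best n_neighbors:", knn_search.best_params_["n_neighbors"])
```

The same pattern applies to nu (NuSVC) and to random_state (DecisionTree, RandomForest): only the estimator and the parameter grid change.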

                  clean  sw   punct  stem  lemm   ngrams  all
C                 1000   100  1000   100   1000   100     0.001
nu(1)*            0.3    0.3  0.3    0.3   0.3    0.7     0.7
nu(2)*            0.3    0.3  0.3    0.3   0.3    0.3     0.3
n_neighbors       11     21   17     19    13     21      21
random_state(1)*  1      12   13     18    15     12      12
random_state(2)*  15     10   9      11    7      8       15

nu(1)* - NuSVC rbf
nu(2)* - NuSVC linear
random_state(1)* - RandomForest
random_state(2)* - DecisionTree