AncaElena10/Humor-Detection
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
I have provided a solution for humor detection in binary data.
Features extracted:
-> remove word with length less than 3
-> text cleaning (bring the words to their 'complete' form; that means removing the words with aportrofe (like haven't, it will become have not)
-> stop word (but just a few of them, I don't want to extract all the stop words because some of them can give the text a negative sentiment like 'not' or 'no' and if I remove them, then the sentence will automatically become a positive one, which is wrong.
-> punctuation
-> stem words
-> lemm words
-> n-grams (2, 3 words)
Algorithms used:
-> Logistic Regression
-> Naive Bayes (Bernoulli, Multinomial)
-> SVM (LinearSVC, SVC (with kernel = linear or rbf), NuSVC (with kernel = linear or rbf))
-> DecisionTree
-> RandomForest
-> KNeighbors
The best values of C (LogisticRegression), random_state (DecisionTree, RandomForest), nu (NuSVC), n_neighbors (KNeighbors) were chosen with GridSearchCV.
clean sw punct stem lemm ngrams all
C 1000 100 1000 100 1000 100 0.001
nu*(1) 0.3 0.3 0.3 0.3 0.3 0.7 0.7
nu*(2) 0.3 0.3 0.3 0.3 0.3 0.3 0.3
n_neighbors 11 21 17 19 13 21 21
random_state(1)* 1 12 13 18 15 12 12
random_state(2)* 15 10 9 11 7 8 15
nu*(1) - NuSVC rbf
nu*(2) - NuSVC linear
random_state(1)* - RandomForest
random_state(2)* - DecisionTree