Attempting to fix slow NaiveBayes#136
Merged
Author
|
As mentioned in a few issues, NaiveBayes is slow. Test script, using the movie_reviews dataset: with output: |
Author
|
From the travis-ci failures, it looks like the docstrings on |
Owner
|
Thanks @jcalbert for the PR! I'll take a look at this over the weekend. |
Three changes:
1) basic_extractor can accept a list of strings as well as a list of
('word','label') tuples.
2) BaseClassifier now has an instance variable _word_set which is a set
of tokens seen by the classifier.
1+2) BaseClassifier.extract_features passes _word_set to extractor
rather than the training set.
3) NLTKClassifier.update adds new words to the _word_set.
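The idea behind these three changes can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `tokenize`, `collect_vocabulary`, `sketch_extractor` and `SketchClassifier` are hypothetical names, and the whitespace tokenizer is an assumption. The point is that the vocabulary is built once and reused, instead of being rebuilt from the whole training set on every feature extraction.

```python
def tokenize(text):
    # Assumption: a naive whitespace tokenizer stands in for the real one.
    return text.lower().split()


def collect_vocabulary(source):
    """Build a set of words from either a collection of strings
    or a collection of (document, label) tuples."""
    vocabulary = set()
    for item in source:
        if isinstance(item, str):
            # Already a plain word (e.g. an entry of a cached word set).
            vocabulary.add(item)
        else:
            document, _label = item
            words = tokenize(document) if isinstance(document, str) else document
            vocabulary.update(words)
    return vocabulary


def sketch_extractor(document, source):
    """Return a 'contains(word)' feature dict for `document`.
    `source` may be a word collection or a labelled training set."""
    tokens = set(tokenize(document))
    return {
        "contains({0})".format(word): (word in tokens)
        for word in collect_vocabulary(source)
    }


class SketchClassifier:
    """Hypothetical stand-in for a classifier that caches its vocabulary."""

    def __init__(self, train_set):
        self.train_set = list(train_set)
        # Built once here, rather than on every extract_features call.
        self._word_set = collect_vocabulary(self.train_set)

    def extract_features(self, text):
        # Pass the cached word set to the extractor, not the training set.
        return sketch_extractor(text, self._word_set)

    def update(self, new_data):
        new_data = list(new_data)
        self.train_set.extend(new_data)
        # Keep the cached vocabulary in sync with the new examples.
        self._word_set |= collect_vocabulary(new_data)
```

With the cached set, each feature extraction costs time proportional to the vocabulary size rather than to the total number of tokens in the training set.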
Now returns an empty dict if passed an empty training set. Also, cover some bases if train_set is consumed by .next()
Fixed bug where _word_set was based on train_set, even if train_set is filelike instead of iterable.
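The two edge cases mentioned in these commits (an empty training set, and a training set that is a one-shot iterator) could be handled along these lines. Again, this is only a sketch under assumptions: `materialize` and `sketch_features` are hypothetical helpers, and documents are assumed to be pre-tokenized lists of words.

```python
def materialize(train_set):
    # Copy one-shot iterators/generators into a list so that reading
    # the vocabulary does not exhaust the caller's data.
    if isinstance(train_set, (list, tuple)):
        return train_set
    return list(train_set)


def sketch_features(document_words, train_set):
    """Return 'contains(word)' features, or {} for an empty training set."""
    examples = materialize(train_set)
    if not examples:
        return {}
    vocabulary = set()
    for words, _label in examples:
        vocabulary.update(words)
    present = set(document_words)
    return {"contains({0})".format(w): (w in present) for w in vocabulary}
```

A classifier that needs the examples again after building the vocabulary would keep the materialized list rather than the original iterator.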
Author
|
Chased down those old bugs. TravisCI is complaining about a translation error, but I think that's Google's problem. Added that issue here: #161. |
iepathos
reviewed
May 11, 2017
|
So guys? |
Owner
|
Thanks for the PR! And sorry for the delayed response. |
This was referenced Aug 16, 2017