This is the first draft of a worker that takes a corpus and creates an analysis for it. This first attempt is a freqdist worker: it takes the freqdist of each document in the corpus and condenses them into a new analysis, the freqdist for the entire corpus. This is a work in progress because I was mainly concerned with getting the foundation working (especially the celery task). I did not pay much attention to how the worker itself operates (it's probably doing more work than it needs to), and it probably needs more tests as well.
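To illustrate the idea of condensing per-document freqdists into one corpus-level freqdist, here is a minimal sketch. It assumes each document's freqdist is stored as a list of (token, count) pairs; that format and the name `merge_freqdists` are hypothetical and not taken from this PR.

```python
from collections import Counter


def merge_freqdists(document_freqdists):
    """Sum per-document (token, count) pairs into one corpus-wide freqdist.

    The input format (a list of pairs per document) is an assumption.
    """
    corpus_counts = Counter()
    for freqdist in document_freqdists:
        for token, count in freqdist:
            corpus_counts[token] += count
    # Sort by descending count, then by token, so the result is deterministic.
    return sorted(corpus_counts.items(), key=lambda item: (-item[1], item[0]))


doc_a = [("the", 3), ("cat", 1)]
doc_b = [("the", 2), ("dog", 1)]
corpus_freqdist = merge_freqdists([doc_a, doc_b])
```

Summing the already-computed counts avoids re-tokenizing every document, which may be one way to reduce the extra work mentioned above.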
from utils import TaskTest

class TestCorpusFreqDistWorker(TaskTest):
Wouldn't it be better to test PyPLNCorpusTask separately from CorpusFreqDist? Then later if another subclass of PyPLNCorpusTask is created only the returned dict would need to be checked.
Also, is this hitting an actual mongo instance? If so, would you consider mocking the db methods?
You are right. I was testing both in the same test case (and not testing correctly). I separated the tests and I think it's better now.
Yes, it really is hitting an actual mongo instance. This is inherited from the old days when MongoDict was still part of our codebase. It's also one of the reasons our tests are slow. I would be very glad to mock everything and have better, more isolated, and quicker tests. I would probably need your help, though @geron :)
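As a rough sketch of what mocking the db methods could look like, the example below replaces the database handle with a `unittest.mock.Mock`. The class `HypotheticalCorpusFreqDist`, its `run` method, the `find` query shape, and the document layout are all assumptions made for illustration; they are not the actual PyPLN worker or its storage API.

```python
import unittest
from collections import Counter
from unittest import mock


class HypotheticalCorpusFreqDist:
    """Stand-in worker used only for this example (not the PyPLN class)."""

    def __init__(self, db):
        self.db = db

    def run(self, corpus_id):
        total = Counter()
        # Assumed query shape: one stored record per document in the corpus.
        for doc in self.db.find({"corpus_id": corpus_id}):
            total.update(dict(doc["freqdist"]))
        ordered = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
        return {"freqdist": ordered}


class TestCorpusFreqDistIsolated(unittest.TestCase):
    def test_sums_document_counts_without_a_real_db(self):
        # The mock stands in for the db handle, so no mongo instance is hit.
        fake_db = mock.Mock()
        fake_db.find.return_value = [
            {"freqdist": [("the", 3), ("cat", 1)]},
            {"freqdist": [("the", 2)]},
        ]

        result = HypotheticalCorpusFreqDist(fake_db).run(corpus_id=1)

        fake_db.find.assert_called_once_with({"corpus_id": 1})
        self.assertEqual(result, {"freqdist": [("the", 5), ("cat", 1)]})
```

With the db replaced by a mock, the test only checks the returned dict, which fits the suggestion that future subclasses would need just that check.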
…ests Thanks @geron for pointing out that I was testing everything together
Adds worker to calculate the FreqDist of a corpus