Support auto-evaluation #52
Description
Presently, if we want evaluation figures for our NLP Tools, we consult papers. Those papers tend to report results on out-of-domain corpora (usually the WSJ sections of the Penn Treebank), so they are not very helpful. They also tend not to include cross-domain statistics, so a parser that performs better on WSJ might actually perform worse on 4th-grade biology textbooks.
The NLP Stack could provide automatic evaluation for its tools. I imagine other NLP Toolkits provide this, but I can't think of one off the top of my head. A lack of public-domain annotated corpora might be the culprit.
@mhrmm might have built something already to evaluate his parser. If not, perhaps Dirk could build this into the NLP Stack instead. I'm not sure exactly how parsers are evaluated, but the standard metrics are attachment accuracy (UAS/LAS) for dependency parses and PARSEVAL bracketing precision/recall/F1 for constituency parses; a sketch of the former follows below. We might not have a public-domain gold corpus at present, but we might have one soon from Mark H's tool.
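As a rough illustration, here is a minimal sketch of attachment-score evaluation for a dependency parser. The corpus format (one `(head, label)` pair per token) and the function names are assumptions for this example, not part of the NLP Stack:

```python
def attachment_scores(gold_parses, predicted_parses):
    """Return (UAS, LAS) over a corpus of parsed sentences.

    Each sentence is a list of (head_index, dep_label) tuples, one
    per token. UAS counts tokens whose predicted head matches the
    gold head; LAS additionally requires the label to match.
    This format is hypothetical, for illustration only.
    """
    total = uas_hits = las_hits = 0
    for gold, pred in zip(gold_parses, predicted_parses):
        assert len(gold) == len(pred), "sentences must align token-for-token"
        for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_label == p_label:
                    las_hits += 1
    return uas_hits / total, las_hits / total

# Toy example: one 3-token sentence; all heads correct, one label wrong.
gold = [[(2, "nsubj"), (0, "root"), (2, "dobj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "iobj")]]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```

Constituency output would instead be scored by comparing labeled bracket spans (PARSEVAL), but the overall shape of the evaluation harness would be the same.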
For tools such as the POS Tagger, evaluation is quite simple: per-token accuracy against a gold standard (see the sketch below). Building a public-domain annotated corpus is also straightforward: we could choose some text from Wikipedia, run our POS tagger on it, and hand-correct the errors.
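A minimal sketch of that accuracy computation, again assuming a hypothetical `(token, tag)` corpus format rather than any existing NLP Stack API:

```python
def tagging_accuracy(gold_sentences, predicted_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag.

    Each sentence is a list of (token, tag) pairs; this shape is
    assumed for the example, and real tool output would need a
    thin adapter to it.
    """
    total = hits = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        for (g_tok, g_tag), (p_tok, p_tag) in zip(gold, pred):
            assert g_tok == p_tok, "tokenizations must match"
            total += 1
            hits += g_tag == p_tag
    return hits / total

# Toy example: one tag wrong out of three.
gold = [[("Cells", "NNS"), ("divide", "VBP"), (".", ".")]]
pred = [[("Cells", "NNS"), ("divide", "VB"), (".", ".")]]
print(tagging_accuracy(gold, pred))  # 0.666...
```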
@dirkgr had this idea originally.