kfst tokenizes text by iterating over a (dubiously) sorted list of symbols. An eager strategy probably can't account for cases like:
string: abac
alphabet: [aba, ac, a, b]
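If the eager match takes aba first, it is left with c, which is not in the alphabet, even though a + b + ac covers the whole string. Some kind of dynamic programming (or backtracking) would be needed to do this right.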
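
To make the failure concrete, here is a rough sketch of both approaches. This is not kfst's actual code: `tokenize_eager` and `tokenize_dp` are hypothetical names, and the eager version assumes the symbol list is tried longest-first, which may not match what kfst's sort really does.

```python
from typing import Optional


def tokenize_eager(text: str, alphabet: list[str]) -> Optional[list[str]]:
    """Assumed eager strategy: always take the longest symbol that matches here."""
    symbols = sorted((s for s in alphabet if s), key=len, reverse=True)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No symbol matches at position i and earlier choices are never revisited.
            return None
    return tokens


def tokenize_dp(text: str, alphabet: list[str]) -> Optional[list[str]]:
    """Dynamic programming over suffixes: finds a tokenization whenever one exists."""
    symbols = sorted((s for s in alphabet if s), key=len, reverse=True)
    n = len(text)
    # best[i] is one tokenization of text[i:], or None if text[i:] cannot be split.
    best: list[Optional[list[str]]] = [None] * (n + 1)
    best[n] = []  # the empty suffix needs no tokens
    for i in range(n - 1, -1, -1):
        for sym in symbols:
            # Only look up the remainder when sym actually matches at i.
            rest = best[i + len(sym)] if text.startswith(sym, i) else None
            if rest is not None:
                best[i] = [sym] + rest
                break  # still prefer longer symbols when they lead to a full split
    return best[0]


print(tokenize_eager("abac", ["aba", "ac", "a", "b"]))
print(tokenize_dp("abac", ["aba", "ac", "a", "b"]))
```

With the example above, the eager version gives up after consuming aba, while the DP version falls back to a, b, ac. The DP version still prefers the longer symbol whenever that choice leaves a tokenizable remainder, so it only differs from the eager one where the eager one would get stuck.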