The story is pretty simple: Kneser-Ney smoothing was the state-of-the-art technique for smoothing n-gram language models, but it came with a heavy computational cost.
In 2007 Thorsten Brants and colleagues developed a new smoothing algorithm, simpler than Kneser-Ney, that was very lean and applicable to large amounts of data. In their words:

> We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases.
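To make that concrete, here is a minimal sketch of the scoring rule in Python. The single fixed backoff factor of 0.4 comes from the paper; the corpus, function names, and data structures are my own illustrative choices, not the authors' code:

```python
from collections import Counter

ALPHA = 0.4  # the paper's fixed backoff factor

def train(sentences, max_order=3):
    """Count every n-gram up to max_order in tokenized sentences."""
    counts = Counter()
    total_words = 0
    for tokens in sentences:
        total_words += len(tokens)
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts, total_words

def score(word, context, counts, total_words):
    """Relative frequency of the longest matched n-gram, scaled by
    ALPHA once for each level of backoff."""
    ngram = tuple(context) + (word,)
    if counts[ngram] > 0:
        denom = counts[tuple(context)] if context else total_words
        return counts[ngram] / denom
    if context:
        # Unseen n-gram: drop the earliest context word and recurse.
        return ALPHA * score(word, context[1:], counts, total_words)
    return counts[(word,)] / total_words  # unigram base case

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
counts, total = train(corpus)
print(score("sat", ("the", "cat"), counts, total))  # 0.5, trigram was seen
print(score("sat", ("a", "cat"), counts, total))    # 0.2 = 0.4 * 0.5, one backoff
```

Note that the result is a score, not a normalized probability; skipping the expensive normalization and discounting steps of Kneser-Ney is precisely what makes the method so cheap at scale.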
These algorithms were heavily used in machine translation, and you can already figure out what happened: with small datasets Stupid Backoff generated less-accurate translations but, as the amount of data analyzed grew, it extracted more valid translations, eventually beating Kneser-Ney's score[^1].
I'd like you to read a few notes about Stupid Backoff's introductory paper:
[^1]: This was possible, in the machine-translation scenario, because the algorithm could be "trained" to produce better translations as the dataset grew.