On Smoothing Techniques

I know I am going beyond the scope of what I intended my blog to be when I began it. But three years is a long time and thoughts run wild. 🙂 I know, perhaps I am going beyond the scope of my readers’ expectations too. But this is my only mouthpiece 🙂

I heard the name “Kneser-Ney smoothing” the other day and got curious to know what it is. (To understand what smoothing means, go to the wiki page here.) When I did my introductory course in natural language processing (in late 2006), I came to know about smoothing for statistical language modelling. Back then, as far as I remember, we read about only three things:
1. Add one smoothing
2. Good-Turing estimation
3. Witten-Bell smoothing
– I of course don’t remember anything but the names, though :p

Okay, in my over-curiosity to know more about Kneser-Ney smoothing, I found this presentation on smoothing techniques in NLP by Bill MacCartney. Again, since I don’t need anything from it specifically at the moment, I just browsed through the PPT to get the gist of it. Here’s what I learnt about the different smoothing mechanisms:

1. Add-one smoothing is the simplest possible and the least effective one – as the textbooks say.
2. There is something called additive smoothing, which, from my understanding, is similar to add-one smoothing, except that instead of adding 1 you add some number between 0 and 1.
3. Good-Turing estimation – which re-estimates the probability mass of n-grams that occur r times using the count of n-grams that occur (r+1) times in the training data. [If you are more curious and need an example, see the PPT slides. If you are extra curious, go ahead and visit the wiki page.]
4. Jelinek-Mercer smoothing – a kind of interpolative method. Quoting from the PPT: “nth-order smoothed model is defined recursively as a linear interpolation between the nth-order ML model and the (n − 1)th-order smoothed model.” I used a similar idea some time back, without realizing that there’s a name for it! 🙂
5. Katz smoothing – this can best be explained using the words from the PPT again:
“Count mass subtracted from nonzero counts is redistributed among the zero-count bigrams according to next lower-order distribution”
6. Witten-Bell smoothing – an instance of Jelinek-Mercer smoothing, with a particular choice of the interpolation weight.
7. Absolute discounting – from what I understood, this is a kind of hybrid between interpolation-style smoothing like Witten-Bell and discounting-style smoothing like Katz.
8. Kneser-Ney smoothing – the one for which this search began! It was described as an extension of absolute discounting, where the lower-order distribution is based on how many distinct contexts a word appears in, rather than on its raw frequency. Anyway, after reading all these, I actually forgot that my search began with this name!
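Since Kneser-Ney is where this search began, here is a minimal Python sketch of interpolated Kneser-Ney for bigrams, just to make the “continuation count” idea concrete. This is my own toy illustration, not from the PPT; the tiny corpus and the discount value d = 0.75 are arbitrary assumptions.

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """Interpolated Kneser-Ney for bigrams (a minimal sketch).

    d is the absolute discount. The lower-order distribution uses
    continuation counts (how many distinct left contexts a word appears
    in) rather than raw unigram frequency.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # counts of words used as a left context
    # continuation count of w = number of distinct bigram types ending in w
    continuation = Counter(w for (_, w) in bigrams)
    n_bigram_types = len(bigrams)

    def prob(w_prev, w):
        p_cont = continuation[w] / n_bigram_types  # P_continuation(w)
        c_prev = contexts[w_prev]
        if c_prev == 0:
            return p_cont  # unseen context: fall back to continuation prob
        discounted = max(bigrams[(w_prev, w)] - d, 0) / c_prev
        # lambda: weight that redistributes the discounted mass,
        # proportional to how many distinct bigram types start with w_prev
        lam = d * sum(1 for (a, _) in bigrams if a == w_prev) / c_prev
        return discounted + lam * p_cont

    return prob

prob = kneser_ney_bigram("the cat sat on the mat".split())
```

Note that in this bare sketch a word never seen as the second element of any bigram still gets zero probability; a real implementation would smooth the unigram level too.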

There were also some comments comparing and contrasting interpolation-style smoothing with discounting-style smoothing.
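To make the interpolation idea concrete, here is a small Python sketch of Jelinek-Mercer smoothing for bigrams: the smoothed bigram estimate is a weighted mix of the bigram ML estimate and the unigram one. The toy corpus and the weight lam = 0.7 are my own made-up assumptions; in practice the weight is tuned on held-out data.

```python
from collections import Counter

def jm_bigram_prob(bigram_counts, unigram_counts, total_tokens,
                   w_prev, w, lam=0.7):
    """Jelinek-Mercer smoothing for a bigram (a minimal sketch).

    Interpolates the bigram ML estimate with the unigram ML estimate;
    lam is the interpolation weight on the higher-order model.
    """
    p_uni = unigram_counts[w] / total_tokens
    prev_count = unigram_counts[w_prev]
    p_bi = bigram_counts[(w_prev, w)] / prev_count if prev_count else 0.0
    return lam * p_bi + (1 - lam) * p_uni

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
p = jm_bigram_prob(bigrams, unigrams, len(tokens), "the", "cat", lam=0.7)
```

Even an unseen bigram gets nonzero probability here as long as its second word was seen at all, which is the point of mixing in the lower-order model.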

On the whole, educative. It would have been more interesting if I were actually a student of this course now, and was asked to implement all of these programmatically and write a report with a performance analysis 🙂

Published on June 4, 2009 at 11:02 am


4 Comments

  1. You could still do that. 😀

  2. Gosh! I almost read it as “soothing techniques” and clicked the link immediately 🙂

  3. You are much better with your sleeping-thoughts. This is informative, but those are entertaining, like a breeze. You can get this kind of info anywhere.

  4. Was a good demonstration of Over Head Transmission… 🙂
