Language Models: Part 2

Language Model Use Cases and Their Industrial Applications.

stjayaprakash
Aug 14, 2023

In my previous blog, I gave a simple introduction to Language Models and explained the N-Gram architecture and its limitations. Continuing from there, we explore the techniques used to overcome the shortcomings of Language Models, specifically N-Gram models.

Smoothing, interpolation, and backoff are techniques used to improve the performance of N-gram models in natural language processing.

Smoothing is a technique used to address the problem of data sparsity in N-gram models. Since it is impossible to have a training corpus that contains all possible N-grams, some N-grams will have zero counts, which causes problems when calculating probabilities. Smoothing methods assign non-zero probabilities to unseen N-grams by redistributing probability mass from seen N-grams.
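To make this concrete, here is a minimal sketch of add-one (Laplace) smoothing for bigram probabilities. The tiny token list and the add-k formulation are illustrative choices, not a production setup.

```python
from collections import Counter

# Toy corpus purely for illustration.
tokens = "the cat sat on the mat the cat ran".split()
vocab = set(tokens)
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))

def p_laplace(w_prev, w, k=1):
    """P(w | w_prev) with add-k smoothing: every bigram, seen or not, gets non-zero mass."""
    return (bi[(w_prev, w)] + k) / (uni[w_prev] + k * len(vocab))

print(p_laplace("the", "cat"))  # seen bigram   -> 0.333...
print(p_laplace("cat", "mat"))  # unseen bigram -> 0.125, not zero
```

Without the +k terms, the unseen bigram would get probability zero and would zero out the probability of any sentence containing it.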

Interpolation is a smoothing technique that combines the probabilities of N-grams of different orders. For example, if we are trying to find the probability of a trigram and we have trigram, bigram, and unigram models, we can estimate the probability as a weighted sum of the probabilities from each model.
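A minimal sketch of that idea, assuming a toy corpus and hand-picked lambda weights (in practice the weights are tuned on held-out data):

```python
from collections import Counter

# Toy corpus purely for illustration.
tokens = "the cat sat on the mat the cat ran".split()
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
N = len(tokens)

def p_interpolated(w1, w2, w3, lambdas=(0.1, 0.3, 0.6)):
    """Weighted sum of unigram, bigram, and trigram estimates; the weights sum to 1."""
    l1, l2, l3 = lambdas
    p_uni = uni[w3] / N
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

print(p_interpolated("the", "cat", "sat"))  # trigram, bigram, and unigram all contribute
print(p_interpolated("the", "cat", "mat"))  # unseen trigram still gets unigram mass
```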

Backoff is another smoothing technique that approximates the probability of an unobserved N-gram using more frequently occurring lower-order N-grams. If an N-gram count is zero, its probability is approximated using a lower-order N-gram.
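A minimal sketch of backoff in the spirit of "stupid backoff", which applies a fixed discount instead of the properly normalized discounts of full Katz backoff; the toy corpus and the alpha value are illustrative assumptions:

```python
from collections import Counter

# Toy corpus purely for illustration.
tokens = "the cat sat on the mat the cat ran".split()
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

def p_backoff(w1, w2, w3, alpha=0.4):
    """Use the trigram if it was seen; otherwise back off to the bigram, then the unigram."""
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    return alpha * alpha * uni[w3] / len(tokens)

print(p_backoff("the", "cat", "sat"))  # observed trigram: direct estimate
print(p_backoff("the", "dog", "sat"))  # unseen: falls back to a discounted lower-order estimate
```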

These techniques help improve the performance of N-gram models by addressing the issue of data sparsity and providing better estimates for rare events.
