Large Language Models

Overview


Large language models are neural network (deep learning) models that are designed to "understand" and/or generate human language.

Architecture


Large language models are built on a foundation of probability and statistics. The basic framework for an LLM is a model that predicts a missing word given the context (the words) around it.

As an example, consider the following text, with one word blanked out:
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of ____, designed for natural language processing tasks, especially language generation.
The goal of a large language model is to predict the blanked-out word (here, "text"), given the surrounding text.
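This word-prediction framing can be sketched with plain counting. The snippet below is a toy illustration only (the corpus and the function names are assumptions, not anything a real LLM uses): it estimates P(word | left, right) from how often each word appears between a given pair of neighbours.

```python
from collections import Counter

# Tiny toy corpus (an assumption for illustration; real LLMs train
# on vast amounts of text, not a dozen tokens).
tokens = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which words appear between each (left, right) neighbour pair.
counts = {}
for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
    counts.setdefault((left, right), Counter())[word] += 1

def p_word_given_context(word, left, right):
    """Estimate P(word | left, right) from co-occurrence counts."""
    ctx = counts.get((left, right), Counter())
    total = sum(ctx.values())
    return ctx[word] / total if total else 0.0

print(p_word_given_context("on", "sat", "the"))  # → 1.0 in this toy corpus
```

Real LLMs replace these raw counts with a neural network that generalises to contexts never seen verbatim, but the quantity being modelled is the same conditional probability.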



There are two basic structures:

  • Autoregressive - predicts the next word in a sequence, where only the words prior to the word being predicted are known.
  • Autoencoding - predicts a missing word in a piece of text, where the text both to the left and to the right of the missing word is available.
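The two structures differ only in which context they may use. A minimal count-based sketch (toy corpus and helper names are assumptions for illustration) makes the contrast concrete: the autoregressive predictor sees only the word to the left, while the autoencoding predictor sees both neighbours.

```python
from collections import Counter, defaultdict

# Toy corpus (assumption for illustration only).
tokens = "the cat sat on the mat the cat lay on the rug".split()

# Autoregressive: model P(next | previous) — left context only.
next_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    next_counts[prev][nxt] += 1

def autoregressive(prev):
    """Predict the next word given only the word before it."""
    return next_counts[prev].most_common(1)[0][0]

# Autoencoding: model P(missing | left, right) — both sides available.
mask_counts = defaultdict(Counter)
for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
    mask_counts[(left, right)][word] += 1

def autoencoding(left, right):
    """Predict a masked word given its left and right neighbours."""
    return mask_counts[(left, right)].most_common(1)[0][0]

print(autoregressive("the"))        # → "cat" (most frequent successor)
print(autoencoding("sat", "the"))   # → "on"
```

GPT-style models are the canonical autoregressive example, and BERT-style models the canonical autoencoding one; both swap the count tables above for a trained neural network.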

Topics


The following are models that were developed in the course of building LLMs.

  • Bag of Words
  • Word2vec
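Of the topics above, Bag of Words is simple enough to sketch directly: it represents a text purely as word counts, discarding word order entirely. The snippet below is a minimal illustration (the function name is an assumption, not a standard API).

```python
from collections import Counter

def bag_of_words(text):
    """Represent text as a multiset of word counts, ignoring order."""
    return Counter(text.lower().split())

# Two sentences with the same words in different orders get the same
# representation — exactly the information Bag of Words throws away.
a = bag_of_words("the cat sat on the mat")
b = bag_of_words("on the mat the cat sat")
print(a == b)  # → True
```

Word2vec improves on this by mapping each word to a dense vector learned from its contexts, so that similar words land near each other, but it still ignores long-range word order, which is one motivation for the LLM architectures above.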