Overview
Large language models are neural network (deep learning) models designed to "understand" and/or generate human language.

Architecture
Large language models are built on a foundation of probability and statistics. The basic framework for an LLM is a model that predicts a missing word given the context (the words) around it. As an example, consider the following text:
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of ████, designed for natural language processing tasks, especially language generation.
The goal of the large language model is to predict the blacked-out word (here, "text"), given the surrounding text.
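This probabilistic framing can be sketched with a toy count-based model: estimate P(word | context) by counting how often each candidate word appears between a given pair of neighbouring words in a corpus. This is only an illustration of the idea; real LLMs use neural networks over far richer context, and the corpus and function name below are invented for the example.

```python
from collections import Counter

# Tiny invented corpus; a real LLM trains on vastly more text.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat lay on the rug ."
).split()

def predict_blank(left, right, corpus):
    """Estimate P(word | context) for a blanked-out word by counting
    how often each candidate occurs between `left` and `right`."""
    counts = Counter(
        corpus[i]
        for i in range(1, len(corpus) - 1)
        if corpus[i - 1] == left and corpus[i + 1] == right
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Fill the blank in "... on the ____ .":
# "rug" is twice as likely as "mat" in this corpus.
print(predict_blank("the", ".", corpus))
```

Neural LLMs replace the raw counts with learned functions that generalise to contexts never seen verbatim, but the objective is the same conditional distribution over the missing word.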
There are two basic structures:
- Autoregressive - predict the next word in a sequence, where only the words before the predicted position are known.
- Autoencoding - predict a missing word in a piece of text, where the text both to the left and to the right of the missing word is available.
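The difference between the two structures is which context the model may use. A minimal count-based sketch, for illustration only (the function names and corpus are invented, and real LLMs replace the counts with neural networks):

```python
from collections import Counter

corpus = "the cat sat on the mat . the cat sat on the rug .".split()

def autoregressive_next(prev, corpus):
    """Autoregressive: predict the next word from left context only."""
    counts = Counter(
        corpus[i] for i in range(1, len(corpus)) if corpus[i - 1] == prev
    )
    return counts.most_common(1)[0][0]

def autoencoding_fill(left, right, corpus):
    """Autoencoding: predict a masked word from both neighbours."""
    counts = Counter(
        corpus[i]
        for i in range(1, len(corpus) - 1)
        if corpus[i - 1] == left and corpus[i + 1] == right
    )
    return counts.most_common(1)[0][0]

print(autoregressive_next("the", corpus))       # most frequent word after "the"
print(autoencoding_fill("sat", "the", corpus))  # word seen between "sat" and "the"
```

The autoregressive form suits generation (emit one word, append it to the context, repeat), while the autoencoding form suits understanding tasks where the full text is already available.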