Data Compression

Overview


Data compression is the process of reducing the number of bits (and therefore bytes) required to represent a piece of data, often thought of as a message.

Data is encoded using an alphabet of some sort. In a computer, this alphabet is binary: it consists entirely of zeros and ones.

Once an alphabet has been specified, a message, or piece of data, can be encoded as a string of characters from that alphabet, here zeros and ones. Each encoded message has a length. Compression is a process whereby the average message length is reduced.
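
As a rough illustration of how the choice of encoding changes message length, the sketch below encodes the same message with two binary codes. The four-symbol source alphabet, both code tables, and the function names encode and decode are hypothetical, chosen only for this example. The second code gives shorter codewords to symbols assumed to be more frequent.

```python
# Hypothetical four-symbol source alphabet with two binary codes for it.
# FIXED_CODE spends 2 bits on every symbol.
# PREFIX_CODE spends fewer bits on symbols assumed to be more frequent.
FIXED_CODE = {"a": "00", "b": "01", "c": "10", "d": "11"}
PREFIX_CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(message, code):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)


def decode(bits, code):
    """Decode a bit string, assuming no codeword is a prefix of another."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(symbols)


message = "aaaabbcd"
fixed_bits = encode(message, FIXED_CODE)
prefix_bits = encode(message, PREFIX_CODE)

print(len(fixed_bits))   # 16 bits: 8 symbols at 2 bits each
print(len(prefix_bits))  # 14 bits: 4*1 + 2*2 + 1*3 + 1*3
```

The variable-length code is shorter here only because "a" is the most common symbol in this message. For a message dominated by "c" and "d" the same code would be longer than the fixed-length one, which is why compression is stated in terms of average length.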

Data compression is described mathematically by information theory, in which the entropy of the message source is shown to be the theoretical limit on compression: no lossless encoding can use fewer bits per symbol, on average, than the entropy.
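
A minimal sketch of the entropy calculation follows, assuming a simple model in which symbols are independent and their probabilities are estimated from their frequencies in the message itself. The function name entropy_bits_per_symbol and the sample message are hypothetical. The value computed is H = -sum(p * log2(p)) over the distinct symbols.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message):
    """Empirical Shannon entropy of the message, in bits per symbol.

    Assumes symbols are independent and uses observed frequencies
    as probability estimates.
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


message = "aaaabbcd"
h = entropy_bits_per_symbol(message)

print(h)                 # 1.75 bits per symbol
print(h * len(message))  # 14.0 bits for the whole message under this model
```

Under this model, a symbol-by-symbol binary code cannot average fewer than 1.75 bits per symbol for this message, compared with 2 bits per symbol for a fixed-length code over four symbols. Models that capture dependencies between symbols can yield a lower entropy estimate, so the figure depends on the assumed model.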