Exploring Data

Overview

When you first begin working with a dataset, it is often useful to explore it from different perspectives and try to understand the distributions underlying it. Exploratory techniques often involve plotting the data along different columns. For example, you can draw a scatter plot of a dataset along two columns. The difficulty comes when the dataset has more than two columns. You could plot a 3-D chart for three columns, but once you have more than three columns (or dimensions), you can't put them all on a single scatter chart.

One way to explore the data is to take the columns of a dataset two at a time and draw a scatter plot for each pair. A dataset with n columns yields n(n-1)/2 such plots.
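
The pairwise approach can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the toy dataset, its column names, and the helper names `column_pairs` and `plot_pairs` are all assumptions made for the example, and the plotting step assumes matplotlib is installed.

```python
from itertools import combinations

# Hypothetical toy dataset: column name -> list of values (all the same length).
data = {
    "height": [1.2, 1.5, 1.7, 1.9],
    "weight": [30.0, 45.0, 60.0, 80.0],
    "age": [8, 12, 16, 30],
    "score": [55, 70, 65, 90],
}


def column_pairs(columns):
    """Return every unordered pair of column names, preserving input order."""
    return list(combinations(columns, 2))


def plot_pairs(data):
    """Draw one scatter plot per pair of columns, side by side."""
    # Imported lazily so the pairing logic works without matplotlib.
    import matplotlib.pyplot as plt

    pairs = column_pairs(list(data))
    if not pairs:
        raise ValueError("need at least two columns to plot pairs")

    # One row of subplots, one subplot per column pair.
    fig, axes = plt.subplots(
        1, len(pairs), figsize=(4 * len(pairs), 4), squeeze=False
    )
    for ax, (x, y) in zip(axes[0], pairs):
        ax.scatter(data[x], data[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
    return fig


pairs = column_pairs(list(data))
print(len(pairs))  # 4 columns give 6 pairs
```

Calling `plot_pairs(data)` and then `matplotlib.pyplot.show()` would display the six scatter plots. Because the number of pairs grows quadratically with the number of columns, this layout is only practical for a modest number of columns.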