Exploring Data

Overview


When you first start working with a dataset, it is often useful to explore it from different perspectives and try to understand its underlying distributions. Exploratory techniques often involve plotting data along different columns. For example, you can draw a scatter plot of a dataset along two columns. The difficulty comes when the dataset has more than two columns. You could plot a 3-D chart, but with more than three columns (or dimensions) you cannot put all of them on a single scatter chart.

One way to explore the data is to take columns from a dataset two at a time and then plot them.
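
As a minimal sketch of taking columns two at a time, the snippet below lists every unordered pair of column names. The function name `ColumnPairs` and the column names are hypothetical, chosen only for illustration; a dataset with n columns yields n*(n-1)/2 pairs, and each pair is a candidate for one plot.

```go
package main

import "fmt"

// ColumnPairs returns every unordered pair of column names.
// For n columns it produces n*(n-1)/2 pairs.
func ColumnPairs(columns []string) [][2]string {
	pairs := make([][2]string, 0, len(columns)*(len(columns)-1)/2)
	for i := 0; i < len(columns); i++ {
		for j := i + 1; j < len(columns); j++ {
			pairs = append(pairs, [2]string{columns[i], columns[j]})
		}
	}
	return pairs
}

func main() {
	// Hypothetical column names, used only for illustration.
	columns := []string{"sepal_length", "sepal_width", "petal_length", "petal_width"}
	for _, p := range ColumnPairs(columns) {
		fmt.Printf("%s vs %s\n", p[0], p[1])
	}
}
```

The number of pairs grows quadratically with the number of columns, so for wide datasets it may be worth plotting only a chosen subset.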

Generating Scatter Plots


A standard method of exploring data is to plot it as a series of scatter charts, plotting each independent factor against the others.


Generating Line Charts


For time series, scatter plots are not necessarily the best way to explore the data. An alternative when dealing with time series is to plot each series in a line chart.

