Large Data Sets

Overview


Large datasets can pose problems for analytics that do not arise with smaller datasets. Typically these problems stem from the resource limitations of the machine on which the analytics are run.

Issues with Large Data


  • Visualizations are sometimes impossible. Rendering a visualization such as a chart, or even displaying the data in a table, can consume significant resources. The browser may lock up while it tries to display such a large volume of data.
  • Analytics require a large amount of resources. Even without a visualization, the analytics themselves may require significant resources, such as memory and processing time.
  • Loading data is limited by string size. Data loaded from a file or a web service is first loaded into the browser as text, at which point it is parsed into the relevant data structure. JavaScript has a practical limit of about 500MB per string, which typically means that a dataset larger than 500MB has to be loaded in chunks and merged (see merging data).
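The chunk-and-merge approach mentioned above can be sketched as follows. This is a minimal illustration, not the library's API: it assumes each chunk arrives as a JSON string containing an array of records, and that how the chunks are fetched (file slices, paginated requests, etc.) is handled elsewhere.

```javascript
// Merge a large dataset that was loaded as several JSON text chunks.
// Parsing each chunk separately keeps every individual string well
// under the practical size limit for a single JavaScript string.
function mergeChunks(jsonChunks) {
  const merged = [];
  for (const text of jsonChunks) {
    // Each chunk is assumed to parse to an array of records.
    merged.push(...JSON.parse(text));
  }
  return merged;
}

// Example: three small chunks standing in for multi-hundred-MB pieces.
const chunks = [
  JSON.stringify([{ id: 1 }, { id: 2 }]),
  JSON.stringify([{ id: 3 }]),
  JSON.stringify([{ id: 4 }, { id: 5 }]),
];
const data = mergeChunks(chunks);
// data now holds all five records in their original order.
```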

Topics


The following are the two primary methods for dealing with large data.

  • Streaming - shows you how to download large JSON data incrementally.
  • Data in Chunks - shows you how to load a dataset in smaller pieces and merge them.
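As a taste of the streaming approach, the sketch below processes newline-delimited JSON (NDJSON) incrementally, so the full response never sits in memory as one giant string. The chunk source here is a plain array for illustration; in a browser you would feed in decoded chunks read from a fetch response body. Both the format and the function name are assumptions for this example.

```javascript
// Parse NDJSON text that arrives in arbitrary chunks, yielding each
// complete record as soon as its line is available.
function* streamRecords(textChunks) {
  let buffer = "";
  for (const chunk of textChunks) {
    buffer += chunk;
    let newline;
    // Emit every complete line in the buffer as a parsed record.
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) yield JSON.parse(line);
    }
  }
  // Flush the final record if the stream did not end with a newline.
  const rest = buffer.trim();
  if (rest) yield JSON.parse(rest);
}

// Network chunks often split a record mid-way; the buffer handles that.
const records = [...streamRecords(['{"a":1}\n{"a"', ':2}\n{"a":3}'])];
// records → [{ a: 1 }, { a: 2 }, { a: 3 }]
```

The key design point is the buffer: only the partial trailing line is retained between chunks, so memory use stays proportional to one record rather than to the whole download.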