Large Data Sets

Overview


Large datasets can pose problems for running analytics that do not arise with smaller datasets. Typically these problems stem from the resource limitations of the machine on which the analytics are run.

Issues with Large Data


  • Visualizations are sometimes impossible. Rendering a visualization, such as a chart, or even displaying the data in a table, can consume a lot of computer resources, and the browser may lock up while it tries to display that much data.
  • Analytics require a large amount of resources. Even without a visualization, the analytics themselves may require a large amount of memory and/or time.
  • Loading data is limited by string size. Loading data from a file or a web service first loads the data as text into the browser, at which point it is parsed into the relevant data structure. JavaScript has a practical limitation of about 500MB of text, which typically means that a dataset larger than 500MB has to be loaded in chunks and merged (see Merging Data; a minimal loading sketch follows this list).
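
To make the load-then-parse step concrete, the following is a minimal sketch using the standard fetch API (the file URL is hypothetical, and the examples later on this page use $ajax instead):


// Hypothetical JSON file; a web service response is handled the same way.
let response = await fetch('/data/large.json');
let text = await response.text();   // the data arrives as text first
let records = JSON.parse(text);     // then it is parsed into a data structure
console.log(records.length);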

Merging Data


Merging data refers to the situation where you have two arrays and you wish to create a single array containing the items of both, concatenated together. This is simple using the JavaScript spread operator. The spread operator is written as three dots, i.e. "...". When three dots precede an array, they represent the items of that array, so the following sample code creates an array with the elements of two arrays.


let data = [...data1, ...data2]
						 


The following code demonstrates iterating over an array of arrays and concatenating the items into a single array.


let data = [];
for(let set of datasets){
  data = [...data, ...set];
}
						 
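For example, assuming three small chunks (the arrays below are purely illustrative; real chunks would come from separate file loads), the loop produces one merged array:


// Illustrative chunks; in practice each chunk would be loaded from its own file.
let datasets = [
  [1, 2, 3],
  [4, 5],
  [6, 7, 8, 9]
];

let data = [];
for(let set of datasets){
  data = [...data, ...set];
}
console.log(data.length); // 9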

Using the File Server


The file server is a server that hosts files from the local hard drive over HTTP, as a web server does. It includes facilities for querying the server to get a list of file URLs for a given directory.

When dealing with large datasets, you may have to split the data into multiple files. Splitting the data into multiple files helps in several ways. First, it makes it easy to filter the data: if the data is split along properties of the data, such as a date or another key, you can selectively load only the data you need. Second, many browsers limit the size of a file they will load (this comes from the limitation on string size), which can be circumvented by loading several files and concatenating the data.

The file server utility allows you to query the folder structure so that you can automate the process of loading all the necessary files.
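
For instance, if the data were partitioned into one file per month (the file names below are hypothetical), you could select only the months you need before loading any data:


// Hypothetical list of partitioned file URLs, one per month.
let urls = [
  'http://localhost:2800/data/sales-2024-01.json',
  'http://localhost:2800/data/sales-2024-02.json',
  'http://localhost:2800/data/sales-2024-03.json'
];

// Keep only the files for the months we actually need.
let needed = urls.filter(u => u.includes('2024-01') || u.includes('2024-02'));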


Using the Server Module API


The server module provides a library for easily querying a file server. The following code demonstrates getting a list of files on the server. We call the list method on server, passing in the URL whose file contents we want to list. Note that the URL passed to server need not be the root URL; it can be a subfolder. Next we use the array filter method to filter the returned URLs. Finally, we retrieve the data with $ajax calls and merge it as above.


// Import the server module and list the file URLs in the data folder.
let sr = await import('/lib/server/v1.0.0/server.js');
let list = await sr.server('http://localhost:2800/data/').list();

// Keep only the files we need (the filter criteria are application specific).
list = list.filter(p => /* filter criteria */ true);

// Load each file and merge its items into a single array.
let data = [];
for(let url of list){
  data = [...data, ...(await $ajax(url))];
}



