Tree Structures
Overview
A tree structure is a very common data structure. A tree is a data structure composed of nodes, each of which may have
multiple children nodes. A common example of a tree strucutre is the folder structure on a computer hard drive. Every
folder is a node. All the files and folders within that folder are the children nodes.
tree api
The davinci platform hosts a set of libraries for dealing with tree data.
Reading from Table Data
Often times a tree structure is enconded as tabular data. For instance, you could save tree data in a relational database.
The tree api provides methods for reading tabular data into a tree structure.
//creates a tree map from an array of objects
let treeMap = $list(data).treeMap(p=>p.id, p=>p.parentid).tree;
Try it
Dimensional Data
Trees are often used in data that is described as being dimensional. That is, you have data that is tabular in nature,
but which has columns that are items in a tree structure. For instance, you may have a table of sales data, and each
transaction is attached to a salesperson. That sales officer exists within a hierarchy of managers in their organization.
One may want to filter that sales data by a manager in the management tree.
Other columns in the sales data may refer to other tree strucutures. For example, the product id may exist within
a product classification hierarchy.
Each of these tree structures attached to the data is commonly referred to as a dimension.
let tree = await import('/lib/tree/v1.0.0/tree-map.js');
//creates a tree map from an array of objects
let productMap = tree.fromTable(products, p=>p.id, p=>p.parentid);
let officers = tree.fromTable(salesofficers, p=>p.name, p=>p.manager);
let williamsSales = sales.filter(p=>"williams" in officers[p.officer].ancestors)
Try it
Large Dimensional Data - Data Blocks
The code above works when the data set can be loaded entirely in memory. For large datasets (or Big Data), this may
not be possible. In some cases the entire dataset cannot be loaded into memory, but the data you want to analyze
can, that is you only want a subset of the data. In other cases, it may be possible to load the entire dataset into
memory but you still only want a subset of the data, for speed reasons, you dont want to have to load the entire
set just to filter out a set of records.
One way to tackle these situations is to break the data into separate files and then to create a folder
structure that reflects properties of the data in that folder.
For example, using the data from above, you could have a set of folders such as:
/jones/2020-01/
/jones/2020-02/
/jones/2020-03/
/smith/2020-01/
/smith/2020-02/
/smith/2020-03/
/lee/2020-01/
/lee/2020-02/
/lee/2020-03/
This folder structure has the sales officer name as the top folder, the month as the second folder, and the data
files would be placed in the date subdirectory. This folder structure allows one to filter the data loaded by
officer name and date. Once loaded, the data can be further filtered.
To accomplish this, you need a way to list all the data files at a give location. For instance, the
server library will list the files at a given webserver, assuming that webserver complies with the
davinci file server specification.
The following code demonstrates filtering the data prior to loading. The data is loaded using the $ajax
server. It can be further filtered after loading.
let sv = await import('/lib/server/v1.0.0/server.js');
let url = 'http://www.server.com/';
let urls = (await sv.server(url).list('path')).map(p=>"`+url+`"+p.path);
urls = urls.filter(p=>{
let split = p.split('/');
return split[0] == 'jones';
})
let data = [];
for(let url of urls){
data = [...data, ...(await $ajax(url))];
}
Supposing the data is still to large to load at once, meaning we want to further filter each file, you can
load each file separately as follows:
let data = [];
for(let url of urls){
data = [...data, ...((await $ajax(url)).filter(p=>p.size > 1000))]
}