Time Series Feature Extraction
Overview
Feature extraction for a time series generally differs from feature extraction for other datasets.
The primary reason is that one cannot ignore how the element of time relates to the data being
evaluated. Consider the common process of normalizing a dataset, that is, taking the values in one
column of the dataset and converting each to the number of standard deviations it lies from the mean.
When dealing with a time series, one will generally want to include only current and past values of the
dataset in the calculation, particularly when forecasting, because a mean or standard deviation computed
over the whole series would leak future information into earlier observations.
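The past-only idea can be sketched in plain JavaScript. This is a minimal illustration, not part of the $list API: the function name `expandingZScore` and the sample prices are hypothetical, the sample standard deviation is used, and returning 0 while the history has no spread is an assumption made for the sketch.

```javascript
// Expanding (past-only) z-score: each value is scaled with the mean and
// standard deviation of the values seen up to and including that point.
function expandingZScore(values) {
  let count = 0;
  let mean = 0;
  let m2 = 0; // running sum of squared deviations (Welford's method)
  return values.map((x) => {
    count += 1;
    const delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
    const std = count > 1 ? Math.sqrt(m2 / (count - 1)) : 0;
    // Assumption: report 0 until the history has some spread.
    return std === 0 ? 0 : (x - mean) / std;
  });
}

// Hypothetical sample values, for illustration only.
const prices = [100, 102, 101, 105, 110];
const scaled = expandingZScore(prices);
```

Because each output depends only on earlier inputs, appending new observations later does not change the values already computed.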
Common Time Series Features
Normalization: the process of scaling all the values of a dataset. This may mean converting each
numeric value from its measured value to the number of standard deviations it lies from the mean of
the dataset at hand. Alternatively, in a time series, you may wish to set the first value to 1 and
scale all other values relative to it.
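The second form of normalization, setting the first value to 1, can be sketched as follows. The function name `rebaseToFirst` and the sample values are hypothetical, and throwing on a first value of 0 is an assumption about how that edge case should be handled.

```javascript
// Rebase a series so the first value becomes 1 and every other value is
// expressed relative to it.
function rebaseToFirst(values) {
  if (values.length === 0) return [];
  const base = values[0];
  if (base === 0) {
    // Assumption: a zero starting value cannot be used as a base.
    throw new Error('Cannot rebase: first value is 0');
  }
  return values.map((x) => x / base);
}

// Hypothetical sample values, for illustration only.
const rebased = rebaseToFirst([50, 55, 45, 60]);
// rebased is approximately [1, 1.1, 0.9, 1.2]
```

This form uses only the first observation as a reference, so it never depends on future values.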
Example Feature Extraction
let data = [
    {date:'2020-01-01', price:100},
    {date:'2020-02-01', price:100},
    {date:'2020-03-01', price:100},
    {date:'2020-04-01', price:100},
    {date:'2020-05-01', price:100},
    {date:'2020-06-01', price:100},
    {date:'2020-07-01', price:100},
];
The first thing we are likely to want to do is sort the records by date. This is easily accomplished
with the $list API as follows:
$list(data).sort(p=>p.date)
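For readers without the $list API at hand, a comparable sort can be sketched with the built-in array methods. This assumes that `sort(p=>p.date)` orders records ascending by the selected key, which the source does not state explicitly; the `unsorted` sample records are hypothetical. ISO-style date strings such as '2020-01-01' compare correctly as plain strings.

```javascript
// Hypothetical sample records, deliberately out of order.
const unsorted = [
  { date: '2020-03-01', price: 100 },
  { date: '2020-01-01', price: 100 },
  { date: '2020-02-01', price: 100 },
];

// Copy first so the input array is not mutated, then compare the date strings.
const sorted = [...unsorted].sort((a, b) => {
  if (a.date < b.date) return -1;
  if (a.date > b.date) return 1;
  return 0;
});
```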
// For each record, average the prices in a 20-item window and attach the result.
let timeseries = $list(data).sort(p=>p.date).map((p,i,items)=>{
    let ma = $list(items).window(i-20,20).map(p=>p.price).average();
    return {
        ...p,
        moving_average : ma
    };
}).items;
// Async variant of the same calculation: awaits $wait() before processing each record.
let timeseries = await $list(data).sort(p=>p.date).map(async (p,i,items)=>{
    await $wait();
    let ma = $list(items).window(i-20,20).map(p=>p.price).average();
    return {
        ...p,
        moving_average : ma
    };
}).items;
Using an API
The above procedure works, but it isn't algorithmically optimal in terms of speed, because every window is averaged from scratch. This can be remedied by using a library.
let timeseries = $list(data).sort(p=>p.date).map(p=>p.price).movingAverage(20).items;
The movingAverage method handles the optimization under the hood. In addition, it is arguably more
readable. However, this code has another problem: the time series is now an array of numbers (the
moving averages), and we want to attach each moving average back to its price record.
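To make the speed point concrete, one common way to avoid re-averaging every window is to keep a running sum and adjust it as the window slides. The sketch below is plain JavaScript and is not the $list implementation, whose internals the source does not describe; the name `slidingAverage` is hypothetical, and averaging over a shorter window during the first few items is an assumption.

```javascript
// Trailing moving average with a running sum: each element costs a constant
// amount of work instead of re-summing the whole window.
function slidingAverage(values, windowSize) {
  const result = [];
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += values[i];
    // Drop the value that has just left the window.
    if (i >= windowSize) sum -= values[i - windowSize];
    // Assumption: use a shorter window until enough values have been seen.
    const n = Math.min(i + 1, windowSize);
    result.push(sum / n);
  }
  return result;
}

// Hypothetical sample values, for illustration only.
const averages = slidingAverage([1, 2, 3, 4, 5], 3);
// averages is [1, 1.5, 2, 3, 4]
```

Like the one-line call above, this returns a bare array of numbers, so the link back to the original records is lost.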
Many time series feature extraction libraries let you pass two additional inputs to their functions to
overcome this problem. The two inputs are:
- an extract function - this function extracts the number from the source object
- a map function - this function accepts the calculated value and the original item and should return the item
that you want in the list.
let timeseries = $list(data).sort(p=>p.date).movingAverage(20, p=>p.price, (value, item)=>{
    return {
        ...item,
        moving_average:value
    };
}).items;
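The extract and map pattern can be illustrated without the library. The following is a sketch under stated assumptions rather than the $list implementation: the function name `movingAverageWith` and the sample records are hypothetical, the callback order (value, item) follows the call shown above, and a shorter window is used for the first few items.

```javascript
// Moving average over objects: `extract` pulls the number out of each item,
// and `map` combines the calculated value with the original item.
function movingAverageWith(items, windowSize, extract, map) {
  let sum = 0;
  return items.map((item, i) => {
    sum += extract(item);
    // Drop the value that has just left the window.
    if (i >= windowSize) sum -= extract(items[i - windowSize]);
    // Assumption: use a shorter window until enough items have been seen.
    const n = Math.min(i + 1, windowSize);
    return map(sum / n, item);
  });
}

// Hypothetical sample records, for illustration only.
const records = [
  { date: '2020-01-01', price: 10 },
  { date: '2020-02-01', price: 20 },
  { date: '2020-03-01', price: 30 },
];

const enriched = movingAverageWith(
  records,
  2,
  (p) => p.price,
  (value, item) => ({ ...item, moving_average: value })
);
// enriched[2] is { date: '2020-03-01', price: 30, moving_average: 25 }
```

Keeping the extract and map steps as callbacks lets the same averaging routine work on any record shape while the original fields stay attached to each result.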