Time Series Feature Extraction

Overview


Feature extraction for a time series generally differs from feature extraction for other datasets. The primary reason is that one cannot ignore the element of time as it relates to the data being evaluated. Consider the common process of normalizing a dataset, that is, taking the values of one column and converting each to the number of standard deviations it lies from the mean. When dealing with a time series, one will generally want to include only current and past values in the calculation, particularly when forecasting, because future values are not available at the moment a prediction is made.
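The idea of normalizing with only current and past values can be sketched in plain JavaScript. The helper name expandingZScore is hypothetical (it is not part of the $list API), and the sketch assumes a population standard deviation and reports 0 when there is no spread yet, as happens for the first point.

```javascript
// Hypothetical helper: z-score each value against the mean and standard
// deviation of the values seen so far (current and past only), so no
// future information leaks into the feature.
function expandingZScore(values) {
  const scores = [];
  let mean = 0;
  let m2 = 0; // running sum of squared deviations (Welford's method)
  for (let i = 0; i < values.length; i++) {
    const count = i + 1;
    const delta = values[i] - mean;
    mean += delta / count;
    m2 += delta * (values[i] - mean);
    // Population standard deviation of values[0..i] (an assumption).
    const std = Math.sqrt(m2 / count);
    scores.push(std === 0 ? 0 : (values[i] - mean) / std);
  }
  return scores;
}

const observedPrices = [100, 102, 101, 105, 110];
const expandingScores = expandingZScore(observedPrices);
```

Because each score depends only on earlier values, appending new observations never changes the scores already computed.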

Common Time Series Features


Normalization: the process of scaling all the values of a dataset. This may mean converting each numeric value from its measured value to the number of standard deviations it lies from the mean of the dataset at hand. Alternatively, in a time series, you may wish to set the first value to 1 and scale all other values relative to it.
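Both forms of normalization described here can be sketched in plain JavaScript. The function names zScore and rebaseToFirst are hypothetical and not part of any library mentioned in this document. The z-score version assumes a population standard deviation, and the rebasing version assumes a non-zero first value.

```javascript
// Hypothetical helper: number of standard deviations each value lies
// from the mean of the whole dataset.
function zScore(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  // If every value is identical there is no spread, so report 0.
  return values.map(v => (std === 0 ? 0 : (v - mean) / std));
}

// Hypothetical helper: set the first value to 1 and scale the rest
// relative to it (assumes values[0] is not 0).
function rebaseToFirst(values) {
  const first = values[0];
  return values.map(v => v / first);
}

const rawPrices = [100, 110, 90, 120];
const standardized = zScore(rawPrices);
const rebased = rebaseToFirst(rawPrices); // [1, 1.1, 0.9, 1.2]
```

Note that zScore uses the mean of the entire dataset, so for forecasting it should only be applied to data that was available at the time of each prediction.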

Example Feature Extraction



let data = [{date:'2020-01-01', price:100},{date:'2020-02-01', price:100},{date:'2020-03-01', price:100},{date:'2020-04-01', price:100},
{date:'2020-05-01', price:100},{date:'2020-06-01', price:100},{date:'2020-07-01', price:100},];
					


The first thing we are likely to want to do is sort the records by date. This can easily be accomplished with the $list API as follows:


$list(data).sort(p=>p.date)
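For comparison, a rough plain-JavaScript equivalent is sketched below. It assumes the dates are ISO 8601 strings in YYYY-MM-DD form, which sort correctly as plain strings. The variable names are illustrative only.

```javascript
const unsortedRecords = [
  { date: '2020-03-01', price: 102 },
  { date: '2020-01-01', price: 100 },
  { date: '2020-02-01', price: 101 },
];

// Copy first so the original array is left untouched, then compare the
// date strings directly (valid for YYYY-MM-DD formatted dates).
const recordsByDate = [...unsortedRecords].sort((a, b) =>
  a.date < b.date ? -1 : a.date > b.date ? 1 : 0
);
```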
					


With the records sorted, a 20-record moving average of the price can be added to each record:


let timeseries = $list(data).sort(p=>p.date).map((p,i,items)=>{
  let ma = $list(items).window(i-20,20).map(p=>p.price).average();
  return {
    ...p,
    moving_average : ma
  };
}).items;
					


The same calculation can also be written asynchronously:


let timeseries = await $list(data).sort(p=>p.date).map(async (p,i,items)=>{
  await $wait();
  let ma = $list(items).window(i-20,20).map(p=>p.price).average();
  return {
    ...p,
    moving_average : ma
  };
}).items;
					

Using an API


The above procedure works, but it isn't algorithmically optimal in terms of speed, because the window average is recomputed from scratch for every item. This can be remedied by using a library.


let timeseries = $list(data).sort(p=>p.date).map(p=>p.price).movingAverage(20).items;
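As a rough illustration of the kind of optimization involved, a running sum lets each average be derived from the previous one in constant time. This plain-JavaScript sketch is not the library's actual implementation. The name trailingMovingAverage is hypothetical, and the sketch assumes the window includes the current value and that early items are averaged over however many values are available.

```javascript
// Hypothetical helper: trailing moving average computed in a single
// pass by maintaining a running sum instead of re-averaging each window.
function trailingMovingAverage(values, windowSize) {
  const averages = [];
  let runningSum = 0;
  for (let i = 0; i < values.length; i++) {
    runningSum += values[i];
    // Drop the value that has just fallen out of the window.
    if (i >= windowSize) runningSum -= values[i - windowSize];
    // Assumption: before the window fills, average what is available.
    const count = Math.min(i + 1, windowSize);
    averages.push(runningSum / count);
  }
  return averages;
}

const smoothed = trailingMovingAverage([1, 2, 3, 4, 5], 3);
// smoothed is [1, 1.5, 2, 3, 4]
```

The work is proportional to the number of values, regardless of the window size.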
					


The movingAverage library handles optimizing the code under the covers, and it is arguably more readable. However, this code has another problem: the time series is now an array of numbers (the moving averages), and we want to attach each moving average back to its price record.

Many of the time series feature extraction libraries allow you to pass two additional inputs to their functions in order to overcome this problem. The two inputs are:

  • an extract function - this function extracts the number from the source object
  • a map function - this function accepts the calculated value and the original item, and should return the item that you want in the list.



let timeseries = $list(data).sort(p=>p.date).movingAverage(20, p=>p.price, (value, item)=>{
	return {
		...item,
		moving_average:value
	};
}).items;
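To make the extract and map pattern concrete, here is a minimal plain-JavaScript sketch of how such a function could be structured. It is not the library's implementation. The name movingAverageWith is hypothetical, and the sketch assumes a trailing window that includes the current item and averages over the available values until the window fills.

```javascript
// Hypothetical helper showing the extract/map pattern:
//   extract(item)    -> the number to average
//   map(value, item) -> the item to place in the resulting list
function movingAverageWith(items, windowSize, extract, map) {
  const values = items.map(extract);
  let runningSum = 0;
  return items.map((item, i) => {
    runningSum += values[i];
    if (i >= windowSize) runningSum -= values[i - windowSize];
    const average = runningSum / Math.min(i + 1, windowSize);
    return map(average, item);
  });
}

const priceRecords = [
  { date: '2020-01-01', price: 100 },
  { date: '2020-02-01', price: 110 },
  { date: '2020-03-01', price: 120 },
];

const enriched = movingAverageWith(
  priceRecords,
  2,
  p => p.price,
  (value, item) => ({ ...item, moving_average: value })
);
// enriched[2] is { date: '2020-03-01', price: 120, moving_average: 115 }
```

Keeping extraction and mapping as separate callbacks means the averaging logic never needs to know the shape of the records.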
					

