Tutorial - Basic Analysis of Time Series Data
The primary tool used to access and analyse time series is the Collection. It contains a list of Assets, where every Asset holds the various time series associated with a single asset. An Asset typically contains several views into the time series, where each view is a continuous subset of the whole time series. Each such view is called a Sample (see Samples).
Collection Setup
Below is a simple example of how to set up a Collection.
TFire> tickers = ["MMM", "AOS"]
2-element Vector{String}:
"MMM"
"AOS"
TFire> spec = Specification(tickers, Date(2001,6,2), Date(2018,6,8); extend_back=1800);
TFire> collection = setup_collection(spec)
Setting up collection
||Collection|| (Continuous)
Tickers: 2, MMM AOS
Samples: 8562
First, the ticker symbols for the assets to be included are defined. Second, a Specification is created. The "evaluation dates" are specified as all dates between 2 June 2001 and 8 June 2018. An evaluation date is the last date in the series of dates "seen" by one Sample. When a sample is later analyzed, no data beyond this date is visible to the analysis, preventing accidental "look ahead". The extend_back=1800 parameter gives each sample access to 1800 days of data prior to its evaluation date.
The Specification also contains information about which DataFields to include in each sample. Here the defaults are used since no DataFields are explicitly specified.
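The relationship between an evaluation date and extend_back can be illustrated with a small standalone sketch. It does not use the TFire API: the helper visible_window is hypothetical, and it assumes that extend_back counts calendar days, which the text above does not state.

```julia
using Dates

# Hypothetical helper, not part of TFire: return the part of a series that a
# sample with the given evaluation date is allowed to see.
# Assumption: extend_back is counted in calendar days.
function visible_window(dates::Vector{Date}, values::Vector{Float64},
                        eval_date::Date; extend_back::Int=1800)
    first_date = eval_date - Day(extend_back)
    # Keep only observations inside [first_date, eval_date]; nothing later leaks in.
    idx = findall(d -> first_date <= d <= eval_date, dates)
    return dates[idx], values[idx]
end

# Toy data: ten consecutive days with made-up values.
dates  = collect(Date(2001, 5, 28):Day(1):Date(2001, 6, 6))
values = collect(1.0:length(dates))

seen_dates, seen_values = visible_window(dates, values, Date(2001, 6, 2); extend_back=3)
```

The last element of seen_dates is the evaluation date itself, so any analysis run on this window cannot look ahead.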
Adding Layers
The Collection provides an interface to attach Layers to a time series. A Layer is a condensed set of data derived from the time series. Layers may also be, as the name suggests, layered. For example, the first layer could be a pair of Exponential Moving Averages (EMAs) of the price of an asset, and the second layer could then be the Moving Average Convergence Divergence (MACD) based on that pair of EMAs.
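To make the layering idea concrete, here is a standalone sketch of the two computations using the textbook definitions. It is not how LayerEMA or LayerMACD are implemented: the function names ema and macd are hypothetical, and seeding the EMA with the first observation is just one common convention.

```julia
# Layer 1: exponential moving average with smoothing factor 2 / (window + 1),
# seeded with the first observation (an assumed convention).
function ema(x::AbstractVector{<:Real}, window::Integer)
    alpha = 2 / (window + 1)
    out = Vector{Float64}(undef, length(x))
    out[1] = x[1]
    for i in 2:length(x)
        out[i] = alpha * x[i] + (1 - alpha) * out[i-1]
    end
    return out
end

# Layer 2: MACD built on top of the two EMAs from layer 1.
function macd(x; short=12, long=26, signal=9)
    macd_line   = ema(x, short) .- ema(x, long)
    signal_line = ema(macd_line, signal)
    return (macd=macd_line, signal=signal_line, histogram=macd_line .- signal_line)
end

# Made-up price series, for illustration only.
prices = [10.0 + 0.1 * i + sin(i / 5) for i in 1:100]
result = macd(prices)
```

The second function only consumes the output of the first, which is the same dependency that the chain LayerEMA -> LayerMACD expresses.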
Example of adding layers:
TFire> settings = Settings()
TFire> settings[LayerEMA][:win_sizes] = [12,26];
TFire> collection_ema = add_layer(collection, LayerEMA, settings=settings)
||Collection|| (Continuous) -> LayerEMA
Tickers: 2, MMM AOS
Samples: 8562
TFire> collection_macd = add_layer(collection_ema, LayerMACD)
||Collection|| (Continuous) -> LayerEMA -> LayerMACD
Tickers: 2, MMM AOS
Samples: 8562
Here the concept of Collection Settings is introduced. This is where parameters for Layers live. They may also contain other types of parameters, which we will soon see. Note that settings did not need to be passed again when adding the MACD layer, since it was already attached to collection_ema at its creation.
Let's have a look at all the settings that have been used:
TFire> collection_macd.settings_used
LayerMACD
:signal_period => 9
:long_period => "maximum"
:short_period => "minimum"
---
TSDLink
:resolution => "1d"
:default_price_field => :adj_close_log
---
LayerEMA
:win_sizes => [12, 26]
---
LayerMACD has settings even though no parameters were specified. This is because all Layers come with default parameters that are used if some parameter is omitted. The parameters for TSDLink specify the resolution of the time series related to the Collection and which data field is used when a layer requests the "price" of an asset. Here, the adjusted logarithmic closing price is used unless something else is specified for a particular layer.
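The fallback to defaults can be pictured as a dictionary merge. The sketch below uses plain Julia dictionaries rather than the TFire Settings type, whose internals are not shown here. The default values are copied from the output above, and the override value is invented for illustration.

```julia
# Defaults copied from the LayerMACD output above; the Dict layout is an assumption.
defaults = Dict{Symbol,Any}(:signal_period => 9,
                            :long_period   => "maximum",
                            :short_period  => "minimum")

# A user who overrides a single parameter (hypothetical value).
user = Dict{Symbol,Any}(:signal_period => 5)

# merge keeps every key from defaults and lets values from user take precedence.
effective = merge(defaults, user)
```

Any key missing from user keeps its default, which matches the behaviour described for omitted parameters.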
Plotting the Graph for a Sample
Let's look at the default graph for the first sample of the Asset with ticker MMM.
TFire> plot_graph(collection_macd, "MMM", 1);
Notice how the graph ends at 2 June 2001, coinciding with the evaluation date of the first sample. No data beyond this date is visible to this particular sample.
Analysis
Let's say we have a theory that any time the MACD histogram changes sign from negative to positive, this signals a possible short-term trend change. We can select all samples where the MACD histogram is positive on the evaluation date and negative on the previous date.
First we set up a function that returns true if a vector has a positive last value and a negative second-to-last value.
TFire> function macd_signal(args)
           macd_hist = args[1]
           return macd_hist[end-1] < 0 && macd_hist[end] > 0
       end
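The predicate can be checked on plain vectors before it is used on a Collection. In the sketch below the function is repeated so that the block is self-contained. The nesting of the input, one inner vector per Data View, is inferred from the use of args[1] and is an assumption about how the filter calls the function.

```julia
function macd_signal(args)
    macd_hist = args[1]
    return macd_hist[end-1] < 0 && macd_hist[end] > 0
end

# Made-up histogram tails, each wrapped in an outer vector like args.
crossing_up   = [[-0.4, -0.1, 0.2]]   # negative, then positive
still_below   = [[-0.4, -0.3, -0.1]]  # negative on both dates
already_above = [[0.1, 0.3, 0.2]]     # positive on both dates

results = map(macd_signal, [crossing_up, still_below, already_above])
```

Only the first case yields true. A histogram value of exactly zero on either date does not count as a signal, because both comparisons are strict.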
Secondly, we need to define a DataView. For now, all we need to know is that a Data View provides a view of some data in a Collection. Here it is a view of the MACD histogram from the MACD Layer. For a more comprehensive explanation of Data Views, see DataViews.
TFire> dv_macd = DVLayerField(collection_macd, :macd_histogram, LayerMACD);
Now we can filter the Collection with the Data View and the predefined function. If the function returns true, we want to keep the sample (thus action=:keep); otherwise we discard it.
TFire> collection_macd_filtered = filter_collection(collection_macd, [dv_macd], macd_signal)
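What the filter does conceptually can be sketched without TFire: walk through the evaluation dates, show the predicate only the data up to each date, and keep the dates where it returns true. The names crosses_up, histogram, and kept are hypothetical, and the histogram values are made up.

```julia
# Stand-in predicate with the same shape as macd_signal above.
crosses_up(args) = args[1][end-1] < 0 && args[1][end] > 0

# Made-up MACD histogram values, one per date.
histogram = [-0.3, -0.1, 0.2, 0.4, -0.2, -0.1, 0.1, 0.3]

# Sample i only sees histogram[1:i]; start at 2 so that two values exist.
kept = [i for i in 2:length(histogram) if crosses_up([view(histogram, 1:i)])]
```

Here kept holds the positions where the histogram has just turned from negative to positive, which plays the role of the samples that survive the filter.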
Let's have a look at how many samples were kept after the filtering.
TFire> collection_macd_filtered
||Collection|| (Continuous) -> LayerEMA -> LayerMACD
Tickers: 2, MMM AOS
Samples: 353
And here is how they are split between the two assets.
TFire> collection_macd_filtered["MMM"]
||Collection|| (Continuous) -> LayerEMA -> LayerMACD
Ticker: MMM
Samples: 172
TFire> collection_macd_filtered["AOS"]
||Collection|| (Continuous) -> LayerEMA -> LayerMACD
Ticker: AOS
Samples: 181
We can calculate the average returns for the filtered collection, collection_macd_filtered. Then an envelope of the best and worst possible returns for any subset of x samples drawn from the whole of collection_macd can be plotted.
TFire> plot_envelope(collection_macd_filtered, collection_macd, 10)
The horizontal line marks the average return of collection_macd. We can see that the filtered dates performed worse than the average date and significantly worse than the optimal choice of 353 dates/samples.
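One way to read the envelope is as the mean of the x best and the x worst per-sample returns, for every subset size x. The sketch below follows that interpretation on made-up numbers. It is an assumption about what plot_envelope draws, and the function return_envelope is hypothetical.

```julia
using Statistics

# For every subset size x, compute the lowest and highest achievable mean return:
# lower[x] is the mean of the x smallest returns, upper[x] the mean of the x largest.
function return_envelope(returns::AbstractVector{<:Real})
    asc    = sort(returns)
    desc   = reverse(asc)
    counts = 1:length(returns)
    lower  = cumsum(asc) ./ counts
    upper  = cumsum(desc) ./ counts
    return lower, upper
end

# Made-up per-sample returns for illustration.
returns = [0.02, -0.01, 0.05, 0.00, -0.03]
lower, upper = return_envelope(returns)
overall_mean = mean(returns)
```

Both curves meet at the overall mean once x equals the total number of samples, since only one subset of that size exists.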
A period of 10 days might be a bad choice. Some other period may give a better result.
TFire> mean_return_filtered = mean_compound_return(collection_macd_filtered, 150);
TFire> mean_return_original = mean_compound_return(collection_macd, 150);
TFire> diff_return = mean_return_filtered .- mean_return_original;
TFire> plot_line(diff_return, name="Filtered - All")
Here, we calculated the difference between the average return of the filtered Collection and the average return of the original Collection for each possible step length. That is, element x in diff_return corresponds to the difference in average return after x steps.
We then plot this difference. From the plot we can see that the case for selecting dates to take a long position gets slightly better if we are looking to stay in the trade for a longer period, but it never gets very convincing.
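The per-step comparison can be sketched on a toy price series. The function mean_forward_return is hypothetical and is not the implementation of mean_compound_return. It assumes simple returns on raw prices and skips start points that have too little future data.

```julia
# out[k] is the average of prices[t + k] / prices[t] - 1 over all start indices t
# for which t + k is still inside the series; NaN if no start index qualifies.
function mean_forward_return(prices::AbstractVector{<:Real}, starts, max_steps::Integer)
    out = fill(NaN, max_steps)
    for k in 1:max_steps
        valid = [t for t in starts if t + k <= length(prices)]
        isempty(valid) && continue
        out[k] = sum(prices[t + k] / prices[t] - 1 for t in valid) / length(valid)
    end
    return out
end

# Made-up prices; the "filtered" set is a subset of all start points.
prices          = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
all_starts      = 1:3
filtered_starts = [2]

diff_return = mean_forward_return(prices, filtered_starts, 3) .-
              mean_forward_return(prices, all_starts, 3)
```

A positive element k of diff_return means the filtered start points did better than the average start point after k steps.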
In Tutorial - Backtesting we instead look at how to do basic portfolio backtesting on a simple strategy.