Collection

Anatomy of a Collection

The Collection is the centerpiece of TFire. It is a container of Assets: usually financial assets of some kind, but in principle any entity that has associated time series data. Each Asset is made up of a number of samples, each of which represents an individual view of the corresponding data. During analysis, a sample can only see a limited range of the time series data, never anything past its Evaluation date.

The main idea with this layout is a combination of safety and flexibility:

  • Safety - The risk of look-ahead bias is minimized. This is especially useful when using repainting indicators.

  • Flexibility - Collections may be manipulated, accessed and filtered in a multitude of ways including the use of set operations. See Collection Manipulation.
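To make the idea of set operations concrete, here is a minimal sketch in plain Julia. It does not use TFire's API: the `SampleKey` alias and the (ticker, evaluation date) identifiers are hypothetical stand-ins for samples, and the functions are Julia's built-in `filter`, `intersect`, `union` and `setdiff`. See Collection Manipulation for the actual functions.

```julia
using Dates

# Hypothetical sample identifier: (ticker, evaluation date). Not a TFire type.
SampleKey = Tuple{String, Date}

all_samples = Set{SampleKey}([
    ("AAPL", Date(2023, 1, 3)), ("AAPL", Date(2023, 1, 4)),
    ("TSLA", Date(2023, 1, 3)), ("TSLA", Date(2023, 1, 4)),
])

# Two filtered subsets of the same set of samples
aapl_only = filter(k -> k[1] == "AAPL", all_samples)
jan4_only = filter(k -> k[2] == Date(2023, 1, 4), all_samples)

both   = intersect(aapl_only, jan4_only)   # in both subsets
either = union(aapl_only, jan4_only)       # in at least one subset
rest   = setdiff(all_samples, either)      # in neither subset
```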

Assets

Assets come in two variants, Continuous and Discrete. Continuous is used for Assets with a continuous time series, for example a stock at a daily resolution, or any continuously traded asset such as cryptocurrencies and currency pairs. Discrete is used when the underlying time series is discontinuous, for example a stock at some intraday resolution. Each Asset links the Collection to a TimeSeries in the SimpleFastDB.

Samples

An Asset contains samples, each of which links to a part of the associated time series. Every sample has one Evaluation date, which is the last datetime of time series data accessible to the sample. Because no data after the Evaluation date can be accessed, the risk of look-ahead bias is minimized. The recommended way to access time series and Layer data for a sample is through DataViews.
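The following sketch illustrates the principle of an Evaluation date cutoff in plain Julia. It is not how TFire implements samples or DataViews: `SampleSketch` and `visible_data` are hypothetical names, and the values are toy numbers chosen for illustration.

```julia
using Dates

# Hypothetical stand-in for a sample: it only remembers its evaluation date.
struct SampleSketch
    evaluation_date::Date
end

# Return the part of a date-sorted series the sample is allowed to see:
# every observation dated on or before the evaluation date.
function visible_data(sample::SampleSketch, dates::Vector{Date}, xs::Vector{Float64})
    last_idx = searchsortedlast(dates, sample.evaluation_date)
    return dates[1:last_idx], xs[1:last_idx]
end

dates  = [Date(2024, 1, 2), Date(2024, 1, 3), Date(2024, 1, 4), Date(2024, 1, 5)]
closes = [10.0, 11.0, 12.0, 13.0]   # toy values

sample = SampleSketch(Date(2024, 1, 3))
seen_dates, seen_closes = visible_data(sample, dates, closes)
```

Here `seen_dates` ends at the sample's evaluation date, so any indicator computed from `seen_closes` cannot depend on later observations.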

Layers

A Collection may contain Collection Layers, a form of generalized technical indicators, connected to time series data. As the name suggests, these indicators may be layered on top of each other.
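As a rough illustration of what layering means, the sketch below computes a simple moving average from raw values and then a second moving average from the output of the first. It is plain Julia with a hypothetical `sma` helper and toy numbers, and it does not use TFire's Layer types.

```julia
# Trailing simple moving average. The first win - 1 entries are NaN
# because the window is not yet full there.
function sma(xs::Vector{Float64}, win::Int)
    out = fill(NaN, length(xs))
    for i in win:length(xs)
        out[i] = sum(xs[i-win+1:i]) / win
    end
    return out
end

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy values

layer1 = sma(prices, 2)          # indicator computed from the raw series
layer2 = sma(layer1[2:end], 2)   # indicator computed from the first layer's valid output
```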

Collection Setup

To set up a Collection, we first need to specify its contents. This is done with a Specification.

TFire> spec = Specification(["AAPL", "TSLA"], Date(2010,1,1), Date(2024,1,1));

Here, we create a Specification that states which tickers to include in the collection and between which dates. The Specification also contains information on which data fields (price, volume, etc.) to include. Since no fields are explicitly specified here, the default fields are used.

With the specification, we can now set up the Collection:

TFire> collection = setup_collection(spec)
-| Collection |- (Continuous)
Tickers: 2, AAPL TSLA
Samples: 6852

We can see that the Collection contains 2 Assets and a total of 6852 Samples.

Every Collection is connected to a SimpleFastDB instance which contains the raw time series data. In the example above, this is handled in the background, but the instance may also be specified explicitly:

TFire> spec = Specification(["AAPL", "TSLA"], Date(2010,1,1), Date(2024,1,1));

TFire> sfdb = setup_simple_fast_db(spec);

TFire> collection = setup_collection(spec, sfdb);

Working with Different Internet Data Sources

When setting up a Collection, you can specify which internet service to use for fetching data. This is particularly important when working with different types of financial data that come from different sources. For example:

# Create specifications for different data types
spec_stocks = Specification(["AAPL", "MSFT"], Date(2020,1,1), Date(2024,1,1))
spec_fred = Specification(["DFF", "GDP"], Date(2020,1,1), Date(2024,1,1))

# Add data from different sources to the default SimpleFastDB
add_data_from_internet!(spec_stocks, internet_service=YahooFinance)
add_data_from_internet!(spec_fred, internet_service=FRED_API)

# Create collections - they will use the default SimpleFastDB
collection_stocks = setup_collection(spec_stocks)
collection_fred = setup_collection(spec_fred)

Important considerations when working with different data sources:

  • If no internet service is specified, the one configured in TFSettings is used.
  • Different data sources may require API keys, which should be set in TFSettings.
  • Available data fields vary by data source (see Internet Data Sources for details).

For a comprehensive list of available data sources and their specific features, see Internet Data Sources.

For a more comprehensive example of how to set up a collection from scratch see Tutorial - Basic Analysis of Time Series Data.

Collection Settings

A Collection contains several types of settings:

  • collection_settings: The immutable core settings of the collection including resolution and price_type.
  • layer_settings_active: The current settings in use by layers. These should only be viewed, never modified directly.
  • layer_settings_pending: The settings that will be used for future layer operations. These can be freely modified.

Layer Settings Usage

# ✓ View current settings (read-only)
collection.layer_settings_active[LayerSMA][:win_sizes]

# ✓ Modify settings for future operations
collection.layer_settings_pending[LayerSMA][:win_sizes] = [12, 26]

# ✗ Never modify active settings directly
# collection.layer_settings_active[LayerSMA][:win_sizes] = [12, 26]  # Don't do this!

The active settings will be automatically updated when layers are added or modified - they should never be changed directly.
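The split between pending and active settings can be pictured with the sketch below. It is only a model of the idea, written in plain Julia: `SettingsSketch` and `apply_pending!` are hypothetical, and the assumption that a layer operation copies the pending settings into the active ones is ours, not a description of TFire's internals.

```julia
# Hypothetical stand-in for a collection's layer settings; not TFire's types.
mutable struct SettingsSketch
    active::Dict{Symbol, Any}    # what existing layers were computed with
    pending::Dict{Symbol, Any}   # what the next layer operation will use
end

# Assumed behaviour: a layer operation promotes a copy of the pending settings.
function apply_pending!(s::SettingsSketch)
    s.active = deepcopy(s.pending)
    return s
end

s = SettingsSketch(Dict{Symbol, Any}(:win_sizes => [20]),
                   Dict{Symbol, Any}(:win_sizes => [20]))

s.pending[:win_sizes] = [12, 26]   # edit pending settings freely
before = s.active[:win_sizes]      # active settings are unchanged so far

apply_pending!(s)                  # stands in for adding or modifying a layer
after = s.active[:win_sizes]
```

Keeping the two apart means the active settings always describe how the existing layer data was actually produced, which is why they should only be read.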

Price Type System

The collection's price_type determines how price fields are mapped to actual data fields. It handles two kinds of fields:

  1. Default price fields: When using standard price field names (:close_price, :open_price, :high_price, :low_price), they are automatically mapped based on the collection's price type.

  2. Custom fields: Any other field names are passed through unchanged.

Available price types include:

UnadjustedPrices()     # :close_price => :close, etc.
AdjustedPrices()       # :close_price => :adj_close, etc.
UnadjustedLogPrices()  # :close_price => :close_log, etc.
AdjustedLogPrices()    # :close_price => :adj_close_log, etc.

For example:

# Collection using adjusted prices
collection = setup_collection(spec, price_type=AdjustedPrices())
# :close_price will map to :adj_close
# :volume will remain as :volume

# Collection using unadjusted log prices
collection = setup_collection(spec, price_type=UnadjustedLogPrices())
# :close_price will map to :close_log
# :custom_field will remain as :custom_field
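One way to picture this mapping is with multiple dispatch, as in the sketch below. It is an illustration in plain Julia, not TFire's implementation: the `*Sketch` types, `prefix`, `suffix` and `resolve_field` are hypothetical names. The `:close_price` results follow the mappings listed above; applying the same pattern to the open, high and low fields is an assumption.

```julia
# Hypothetical price types mirroring the four listed above.
abstract type PriceTypeSketch end
struct UnadjustedSketch    <: PriceTypeSketch end
struct AdjustedSketch      <: PriceTypeSketch end
struct UnadjustedLogSketch <: PriceTypeSketch end
struct AdjustedLogSketch   <: PriceTypeSketch end

# Adjusted variants add an "adj_" prefix, log variants a "_log" suffix.
prefix(::PriceTypeSketch)   = ""
prefix(::AdjustedSketch)    = "adj_"
prefix(::AdjustedLogSketch) = "adj_"
suffix(::PriceTypeSketch)     = ""
suffix(::UnadjustedLogSketch) = "_log"
suffix(::AdjustedLogSketch)   = "_log"

const DEFAULT_PRICE_FIELDS = (:close_price, :open_price, :high_price, :low_price)

function resolve_field(pt::PriceTypeSketch, field::Symbol)
    # Custom fields are passed through unchanged.
    field in DEFAULT_PRICE_FIELDS || return field
    base = replace(String(field), "_price" => "")   # "close_price" -> "close"
    return Symbol(prefix(pt), base, suffix(pt))
end

adj_close     = resolve_field(AdjustedSketch(), :close_price)
log_close     = resolve_field(UnadjustedLogSketch(), :close_price)
adj_log_close = resolve_field(AdjustedLogSketch(), :close_price)
volume_field  = resolve_field(AdjustedSketch(), :volume)
```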

You can use list_settings(collection) to view all current settings and any pending changes.

For more functions related to Settings and Parameters see Collection Settings - Functions.

Functions

For functions related to the Collection see Collection - Functions.