Specification

A Specification defines the contents of a Collection. Internally it consists of two parts: Constituents and DataFields.

Creating a Specification

The primary way to create a Specification:

Specification(;
    tickers::Vector{String},            # Required - Assets to include
    start_date::Union{Date,DateTime},   # Required - Start of analysis period
    end_date::Union{Date,DateTime},     # Required - End of analysis period
    resolution::String="1d",            # "1d", "1m", etc.
    interval::Symbol=:continuous,       # :continuous or :discrete
    market_hours::String="",            # "", "NY", etc.
    extend_back::Int=0,                 # How far back to look in each sample. 0 means all the way back to the first evaluation date
    data_fields::Union{Nothing,Vector{Symbol}}=nothing  # Optional specific fields
)

Common Usage Patterns

# Basic daily resolution:
spec = Specification(
    tickers=["AAPL", "MSFT"],
    start_date=Date(2023,1,1),
    end_date=Date(2024,1,1)
)

# 24-hour trading with minute resolution (e.g., crypto):
spec = Specification(
    tickers=["BTC-USD"],
    start_date=Date(2023,1,1),
    end_date=Date(2024,1,1),
    resolution="1m",
)

# NYSE market hours intraday data (e.g., stocks):
spec = Specification(
    tickers=["AAPL"],
    start_date=Date(2023,1,1),
    end_date=Date(2024,1,1),
    resolution="1m",
    interval=:discrete,
    market_hours="NY"
)

# Obtain a specification from a collection:
spec = Specification(collection)

Continuous vs Discrete Interval

The interval parameter determines how the data is organized:

  • :continuous: Data is treated as one continuous time series. Used when you want to analyze the complete time series as a single unit.

  • :discrete: Data is organized by individual days. This is particularly useful for intraday analysis where you want to analyze patterns that repeat daily.

Market Hours

The market_hours parameter controls trading hours for discrete data:

  • "NY": Uses NYSE trading hours (9:30 AM - 4:00 PM ET)
  • "" (empty string): Uses 24-hour trading (00:00 - 23:59)
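
To make the two settings concrete, the following is a minimal sketch of how a date range could be split into one session per day. The helper daily_sessions and the SESSION_HOURS table are hypothetical names used only for illustration and are not part of the package; weekends, holidays, and time zones are deliberately ignored.

```julia
using Dates

# Hypothetical lookup of session open/close times per market_hours value.
# "NY" uses the NYSE hours stated above; "" means the full day.
const SESSION_HOURS = Dict(
    "NY" => (Time(9, 30), Time(16, 0)),
    ""   => (Time(0, 0),  Time(23, 59)),
)

# Return one (open, close) DateTime pair per calendar day in the range.
# Simplification: weekends, holidays and time zones are not handled.
function daily_sessions(start_date::Date, end_date::Date, market_hours::String)
    open_t, close_t = SESSION_HOURS[market_hours]
    return [(DateTime(d, open_t), DateTime(d, close_t)) for d in start_date:Day(1):end_date]
end

# Three calendar days, each with its own NYSE session.
sessions = daily_sessions(Date(2023, 1, 3), Date(2023, 1, 5), "NY")
```

With interval=:discrete, each of these sessions would be handled as its own block, whereas :continuous would treat the whole range as a single series.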

Constituents

The Constituents contain the ticker symbols, the time periods, and the resolution. In a Collection, every Sample in an Asset is specified in the Constituents by its ticker and a beginning and end date.

The Constituents differ depending on whether they represent a Collection, a SimpleFastDB, or ExternalData. For example, a Collection has one time period per Sample, whereas a SimpleFastDB has only one time period per time series. Constituents must therefore be translated between Collection, SimpleFastDB, and ExternalData.

ConstituentsContinuous

ConstituentsContinuous is used when the data for each ticker is continuous in time, for example daily close data for an asset that is traded every day.

Fields:

  • ticker_ranges: An ordered dictionary linking tickers to vectors of datetime ranges, indicating time series intervals for each asset.
  • resolution: Describes the data granularity, e.g., "1d", "1m".
  • specification_type: ExternalDataSpecification, CollectionSpecification or SFDBSpecification

ConstituentsDiscrete

ConstituentsDiscrete is used when the data for each ticker is discrete in time, for example when intraday data is fetched over several days for an asset that trades only part of the day, such as a stock.

Fields:

  • dates: An ordered dictionary where each date references a ConstituentsContinuous, allowing representation of non-continuous data.
  • resolution: Specifies the data granularity.
  • specification_type: ExternalDataSpecification, CollectionSpecification or SFDBSpecification
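
The relationship between the two layouts can be sketched with simplified stand-in types. SketchContinuous, SketchDiscrete, and sketch_discrete are hypothetical names, not the package's definitions; a vector of pairs stands in for the ordered dictionary, and the specification_type field is omitted for brevity.

```julia
using Dates

# Illustrative stand-ins only; the package's real type definitions may differ.
struct SketchContinuous
    # A vector of pairs keeps insertion order (stand-in for an ordered dictionary).
    ticker_ranges::Vector{Pair{String,Vector{Tuple{DateTime,DateTime}}}}
    resolution::String
end

struct SketchDiscrete
    # One continuous block per trading date, in date order.
    dates::Vector{Pair{Date,SketchContinuous}}
    resolution::String
end

# Build one NYSE session (9:30 to 16:00) per day for a single ticker.
function sketch_discrete(ticker::String, days::Vector{Date}, resolution::String)
    blocks = map(days) do d
        session = (DateTime(d, Time(9, 30)), DateTime(d, Time(16, 0)))
        d => SketchContinuous([ticker => [session]], resolution)
    end
    return SketchDiscrete(blocks, resolution)
end

disc = sketch_discrete("AAPL", [Date(2023, 1, 3), Date(2023, 1, 4)], "1m")
```

Each date in the discrete layout points to its own continuous block, which is how gaps between trading sessions can be represented.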

DataFields

DataFields specifies which fields of data are available in the Collection, SimpleFastDB, or ExternalData.

Price Field System

The collection's price_type determines how price fields are mapped to actual data fields. It handles two kinds of fields:

  1. Default price fields: When using standard price field names (:close_price, :open_price, :high_price, :low_price), they are automatically mapped based on the collection's price type.

  2. Custom fields: Any other field names are passed through unchanged.

Available price types include:

:unadjusted      # :close_price => :close, etc.
:adjusted        # :close_price => :adj_close, etc.
:unadjusted_log  # :close_price => :close_log, etc.
:adjusted_log    # :close_price => :adj_close_log, etc.

For example:

# Collection using adjusted prices
collection = setup_collection(spec; price_type=:adjusted)
# :close_price will map to :adj_close
# :volume will remain as :volume

# Collection using unadjusted log prices
collection = setup_collection(spec; price_type=:unadjusted_log)
# :close_price will map to :close_log
# :custom_field will remain as :custom_field
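
The mapping rule can be sketched as a small lookup. The function resolve_field and the three constants are hypothetical and only mirror the table above; the table spells out :close_price only, so the assumption here is that the other default price fields follow the same prefix and suffix pattern.

```julia
# Hypothetical prefix/suffix rules derived from the price type table above.
const PRICE_PREFIX = Dict(:unadjusted => "", :adjusted => "adj_",
                          :unadjusted_log => "", :adjusted_log => "adj_")
const PRICE_SUFFIX = Dict(:unadjusted => "", :adjusted => "",
                          :unadjusted_log => "_log", :adjusted_log => "_log")
const DEFAULT_PRICE_FIELDS = (:close_price, :open_price, :high_price, :low_price)

# Map a requested field to the stored field for a given price type.
function resolve_field(price_type::Symbol, field::Symbol)
    field in DEFAULT_PRICE_FIELDS || return field   # custom fields pass through unchanged
    base = replace(String(field), "_price" => "")   # :close_price -> "close"
    return Symbol(PRICE_PREFIX[price_type], base, PRICE_SUFFIX[price_type])
end

resolve_field(:adjusted, :close_price)        # price field, mapped by price type
resolve_field(:adjusted, :volume)             # non-price field, returned as-is
```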

Economic and Alternative Data Fields

Different data sources provide different fields. When working with economic data sources:

  • FRED (Federal Reserve Economic Data):
    • Only provides :value field
    • Example:
      # Create specification for FRED data
      spec_fred = Specification(tickers=["GDP", "DFF"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))
      spec_fred.data_fields = DataFields(Set([:value]))

Working with Multiple Data Types

When combining different types of data, create separate specifications with appropriate data fields for each source:

# Market data specification (price fields will be mapped based on price_type)
spec_market = Specification(tickers=["AAPL", "MSFT"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))

# Economic data specification (explicit value field)
spec_econ = Specification(tickers=["DFF", "GDP"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))
spec_econ.data_fields = DataFields(Set([:value]))

# Add data from different sources
add_data_from_internet!(spec_market, internet_service=YahooFinance)
add_data_from_internet!(spec_econ, internet_service=FRED_API)

# Create collections with specific price types
collection_market = setup_collection(spec_market; price_type=:adjusted)
collection_econ = setup_collection(spec_econ)  # price type doesn't affect :value field

Using DataViews with Price Fields

When accessing price-related data, use DVPriceSeries instead of DVTimeSeries to automatically handle price field mapping:

# Will automatically map to correct price field based on collection's price type
dv = DVPriceSeries(asset, :close_price)

# For non-price fields, use DVTimeSeries
dv_volume = DVTimeSeries(asset, :volume)
dv_value = DVTimeSeries(asset, :value)  # For economic data

This system ensures that your code remains independent of the specific price type being used in the Collection.

Translations Between Specification Types (advanced)

There are three types of Specifications. Their respective SpecificationTypes are: CollectionSpecification, SFDBSpecification (see SimpleFastDB) and ExternalDataSpecification (see ExternalData).

While every type of Specification may be specified manually, automatic translation is most commonly used. The usual procedure is to create a CollectionSpecification, translate it to an SFDBSpecification, and then translate that to an ExternalDataSpecification.

CollectionSpecification to SFDBSpecification

The function

specification_sfdb(spec::SpecificationStruct{C,T}; extend_first=Day(0), extend_last=Day(1000)) where {C<:AssetConstituents,T<:CollectionSpecification}

translates a CollectionSpecification to an SFDBSpecification. The SFDBSpecification specifies one time series per ticker. The extend_first and extend_last parameters set how much extra data, given as a period such as days or minutes, is added before the beginning and after the end of each time series. With extend_first=Day(0) and extend_last=Day(0), the time series for a ticker begins at the first datetime specified for that ticker in the CollectionSpecification and ends at the last.
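
The collapsing step can be illustrated with a minimal sketch. The helper collapse_ranges is hypothetical and is not the package's implementation; it only shows how several sample ranges for one ticker reduce to a single padded range. Unlike specification_sfdb, the sketch defaults both extensions to zero.

```julia
using Dates

# Hypothetical helper: collapse a ticker's sample ranges into one padded range.
function collapse_ranges(ranges::Vector{Tuple{DateTime,DateTime}};
                         extend_first::Period=Day(0), extend_last::Period=Day(0))
    first_dt = minimum(first, ranges) - extend_first   # earliest start, moved further back
    last_dt  = maximum(last, ranges) + extend_last     # latest end, moved further forward
    return (first_dt, last_dt)
end

# Two overlapping samples for the same ticker.
samples = [
    (DateTime(2023, 1, 1), DateTime(2023, 3, 1)),
    (DateTime(2023, 2, 1), DateTime(2023, 4, 1)),
]

collapse_ranges(samples)                         # no padding: first start to last end
collapse_ranges(samples; extend_first=Day(30))   # 30 extra days of history at the start
```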

SFDBSpecification to ExternalDataSpecification

An SFDBSpecification is converted to an ExternalDataSpecification by the function

specification_external(spec::SpecificationStruct{C,T}) where {C<:AssetConstituents,T<:SFDBSpecification}

The constituents are kept as they are, but the data fields are modified: logarithmized data fields are replaced by their non-logarithmized counterparts.
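
A minimal sketch of that field conversion, assuming logarithmized fields are identified by the "_log" suffix shown in the price type table. The helper external_field is a hypothetical name and not part of the package.

```julia
# Hypothetical helper: strip a trailing "_log" so that the raw field is requested.
# Assumes the "_log" naming convention from the price type table.
function external_field(field::Symbol)
    name = String(field)
    return endswith(name, "_log") ? Symbol(chop(name; tail=4)) : field
end

# A Set removes duplicates if a field and its log variant collapse to the same name.
fields = Set([:close_log, :adj_close_log, :volume])
external = Set(external_field(f) for f in fields)
```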

Functions

For functions related to Specifications, see Specification - Functions.