Specification
A Specification defines the contents of a Collection. Internally it consists of Constituents and DataFields.
Creating a Specification
The primary way to create a Specification:
Specification(;
tickers::Vector{String}, # Required - Assets to include
start_date::Union{Date,DateTime}, # Required - Start of analysis period
end_date::Union{Date,DateTime}, # Required - End of analysis period
resolution::String="1d", # "1d", "1m", etc.
interval::Symbol=:continuous, # :continuous or :discrete
market_hours::String="", # "", "NY", etc.
extend_back::Int=0, # How far back to look in each sample. 0 means all the way back to the first evaluation date
data_fields::Union{Nothing,Vector{Symbol}}=nothing # Optional specific fields
)
Common Usage Patterns
# Basic daily resolution:
spec = Specification(
tickers=["AAPL", "MSFT"],
start_date=Date(2023,1,1),
end_date=Date(2024,1,1)
)
# 24-hour trading with minute resolution (e.g., crypto):
spec = Specification(
tickers=["BTC-USD"],
start_date=Date(2023,1,1),
end_date=Date(2024,1,1),
resolution="1m",
)
# NYSE market hours intraday data (e.g., stocks):
spec = Specification(
tickers=["AAPL"],
start_date=Date(2023,1,1),
end_date=Date(2024,1,1),
resolution="1m",
interval=:discrete,
market_hours="NY"
)
# Obtain a specification from a collection:
spec = Specification(collection)
Continuous vs Discrete Interval
The interval parameter determines how the data is organized:
- :continuous: Data is treated as one continuous time series. Used when you want to analyze the complete time series as a single unit.
- :discrete: Data is organized by individual days. This is particularly useful for intraday analysis where you want to analyze patterns that repeat daily.
Market Hours
The market_hours parameter controls trading hours for discrete data:
- "NY": Uses NYSE trading hours (9:30 AM - 4:00 PM ET)
- "" (empty string): Uses 24-hour trading (00:00 - 23:59)
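The difference between the two interval modes can be illustrated with a minimal sketch in plain Julia, using only the Dates standard library. The helper names continuous_range and discrete_ranges are hypothetical and not part of the package; the sketch also ignores holidays and time zones, which the real implementation presumably handles.

```julia
using Dates

# Hypothetical helper: one range covering the whole period (:continuous).
continuous_range(start_date::Date, end_date::Date) =
    (DateTime(start_date), DateTime(end_date))

# Hypothetical helper: one (open, close) range per weekday (:discrete with "NY" hours).
# Holidays and time zones are deliberately ignored in this sketch.
function discrete_ranges(start_date::Date, end_date::Date;
                         open_time::Time=Time(9, 30), close_time::Time=Time(16, 0))
    weekdays = filter(d -> dayofweek(d) <= 5, start_date:Day(1):end_date)
    return [(DateTime(d, open_time), DateTime(d, close_time)) for d in weekdays]
end

continuous = continuous_range(Date(2023, 1, 2), Date(2023, 1, 8))
discrete = discrete_ranges(Date(2023, 1, 2), Date(2023, 1, 8))
```

Here continuous is a single pair of datetimes, while discrete holds one pair per trading day, which is the shape that makes per-day intraday analysis straightforward.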
Constituents
The Constituents contain information about ticker symbols, time periods, and resolution. In a Collection, every Sample in an Asset is specified in the Constituents by the ticker together with a start and end date.
The Constituents differ depending on whether they represent a Collection, a SimpleFastDB, or ExternalData. For example, a Collection has one time period per sample, whereas a SimpleFastDB has only one time period per time series. Constituents must therefore be translated between Collection, SimpleFastDB, and ExternalData.
ConstituentsContinuous
ConstituentsContinuous is used when the data for each ticker is continuous in time, for example daily close data for an asset that is traded daily.
Fields:
- ticker_ranges: An ordered dictionary linking tickers to vectors of datetime ranges, indicating time series intervals for each asset.
- resolution: Describes the data granularity, e.g., "1d", "1m".
- specification_type: ExternalSpecification, CollectionSpecification or SFDBSpecification
ConstituentsDiscrete
ConstituentsDiscrete is used when the data for each ticker is discrete in time, for example when intraday data is fetched over several days for an asset that trades only part of the day, such as a stock.
Fields:
- dates: An ordered dictionary where each date references a ConstituentsContinuous, allowing representation of non-continuous data.
- resolution: Specifies the data granularity.
- specification_type: ExternalSpecification, CollectionSpecification or SFDBSpecification
DataFields
DataFields specifies which fields of data are available in the Collection, SimpleFastDB, or ExternalData.
Price Field System
The collection's price_type determines how price fields are mapped to actual data fields. It handles two kinds of fields:
- Default price fields: When using standard price field names (:close_price, :open_price, :high_price, :low_price), they are automatically mapped based on the collection's price type.
- Custom fields: Any other field names are passed through unchanged.
Available price types include:
:unadjusted # :close_price => :close, etc.
:adjusted # :close_price => :adj_close, etc.
:unadjusted_log # :close_price => :close_log, etc.
:adjusted_log # :close_price => :adj_close_log, etc.
For example:
# Collection using adjusted prices
collection = setup_collection(spec; price_type=:adjusted)
# :close_price will map to :adj_close
# :volume will remain as :volume
# Collection using unadjusted log prices
collection = setup_collection(spec; price_type=:unadjusted_log)
# :close_price will map to :close_log
# :custom_field will remain as :custom_field
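The mapping rule can be sketched in a few lines of plain Julia. This is an illustration only, not the package's implementation: map_price_field and both lookup tables are hypothetical names, and the mapped names for :open_price, :high_price and :low_price are assumed to follow the same pattern documented above for :close_price.

```julia
# Hypothetical lookup tables; the package's internal representation may differ.
const BASE_NAMES = Dict(
    :close_price => "close", :open_price => "open",
    :high_price => "high", :low_price => "low",
)
const PRICE_TYPE_AFFIXES = Dict(
    :unadjusted     => ("", ""),
    :adjusted       => ("adj_", ""),
    :unadjusted_log => ("", "_log"),
    :adjusted_log   => ("adj_", "_log"),
)

# Map a default price field to a concrete data field; pass custom fields through.
function map_price_field(field::Symbol, price_type::Symbol)
    haskey(BASE_NAMES, field) || return field
    prefix, suffix = PRICE_TYPE_AFFIXES[price_type]
    return Symbol(prefix, BASE_NAMES[field], suffix)
end

mapped_close = map_price_field(:close_price, :adjusted_log)
mapped_volume = map_price_field(:volume, :adjusted_log)
```

Keeping the prefix and suffix in one table means that adding a price type requires a single new entry rather than a new branch per field.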
Economic and Alternative Data Fields
Different data sources provide different fields. When working with economic data sources:
- FRED (Federal Reserve Economic Data):
  - Only provides the :value field
  - Example:
# Create specification for FRED data
spec_fred = Specification(tickers=["GDP", "DFF"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))
spec_fred.data_fields = DataFields(Set([:value]))
Working with Multiple Data Types
When combining different types of data, create separate specifications with appropriate data fields for each source:
# Market data specification (price fields will be mapped based on price_type)
spec_market = Specification(tickers=["AAPL", "MSFT"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))
# Economic data specification (explicit value field)
spec_econ = Specification(tickers=["DFF", "GDP"], start_date=Date(2020,1,1), end_date=Date(2024,1,1))
spec_econ.data_fields = DataFields(Set([:value]))
# Add data from different sources
add_data_from_internet!(spec_market, internet_service=YahooFinance)
add_data_from_internet!(spec_econ, internet_service=FRED_API)
# Create collections with specific price types
collection_market = setup_collection(spec_market; price_type=:adjusted)
collection_econ = setup_collection(spec_econ) # price type doesn't affect :value field
Using DataViews with Price Fields
When accessing price-related data, use DVPriceSeries instead of DVTimeSeries to automatically handle price field mapping:
# Will automatically map to correct price field based on collection's price type
dv = DVPriceSeries(asset, :close_price)
# For non-price fields, use DVTimeSeries
dv_volume = DVTimeSeries(asset, :volume)
dv_value = DVTimeSeries(asset, :value) # For economic data
This system ensures that your code remains independent of the specific price type being used in the Collection.
Translations Between Specification Types (advanced)
There are three types of Specifications. Their respective SpecificationTypes are: CollectionSpecification, SFDBSpecification (see SimpleFastDB) and ExternalDataSpecification (see ExternalData).
While every type of Specification may be specified manually, automatic translation is most commonly used. The most frequent procedure is to create a CollectionSpecification, which is translated to an SFDBSpecification, which in turn is translated to an ExternalDataSpecification.
CollectionSpecification to SFDBSpecification
The translation function converts a CollectionSpecification to an SFDBSpecification. The SFDBSpecification specifies one time series per ticker. The extend_first and extend_last parameters indicate how many units of data (days, minutes, etc.) should be added to the beginning and end of the time series. If extend_first=0 and extend_last=0, the resulting SFDBSpecification specifies a time series that begins at the first datetime specified for that ticker in the CollectionSpecification and ends at the last datetime.
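As a rough illustration of this translation, the sketch below collapses several per-sample ranges into one range per ticker. It is not the package's function: collapse_ranges is a hypothetical helper, plain tuples stand in for the real Constituents types, and the step keyword is an assumed stand-in for the unit implied by the resolution.

```julia
using Dates

# Hypothetical helper: collapse per-sample ranges into one range per ticker,
# widened by extend_first / extend_last steps of the given size.
function collapse_ranges(ticker_ranges::Dict{String,Vector{Tuple{DateTime,DateTime}}};
                         step::Period=Day(1), extend_first::Int=0, extend_last::Int=0)
    result = Dict{String,Tuple{DateTime,DateTime}}()
    for (ticker, ranges) in ticker_ranges
        first_dt = minimum(first, ranges) - extend_first * step
        last_dt = maximum(last, ranges) + extend_last * step
        result[ticker] = (first_dt, last_dt)
    end
    return result
end

samples = Dict("AAPL" => [
    (DateTime(2023, 1, 2), DateTime(2023, 1, 31)),
    (DateTime(2023, 2, 1), DateTime(2023, 2, 28)),
])
sfdb_ranges = collapse_ranges(samples; extend_first=5)
```

With extend_first=5 and a daily step, the single range for "AAPL" starts five days before the earliest sample and ends at the last sample's end.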
SFDBSpecification to ExternalDataSpecification
An SFDBSpecification is converted to an ExternalDataSpecification by a translation function.
The constituents are kept as they are, but the data fields are modified so that logarithmized data fields are replaced by their non-logarithmized counterparts.
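The data field conversion can be pictured with the following sketch. It assumes that logarithmized fields are identified by the _log suffix seen in the price type mappings above (for example :adj_close_log); delog_field and delog_fields are hypothetical helpers, not the package's API.

```julia
# Hypothetical helper: strip a trailing "_log" so :adj_close_log becomes :adj_close.
function delog_field(field::Symbol)
    name = String(field)
    return endswith(name, "_log") ? Symbol(chop(name; tail=4)) : field
end

# Apply the conversion to a whole collection of fields, removing duplicates.
delog_fields(fields) = Set(delog_field(f) for f in fields)

external_fields = delog_fields([:close_log, :adj_close_log, :volume])
```

Fields without the suffix, such as :volume, are left untouched, and a log field and its plain counterpart collapse into a single entry.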
Functions
For functions related to Specifications, see Specification - Functions.