
Collect time bars operations in Amazon FinSpace

Important

Amazon FinSpace Dataset Browser will be discontinued on November 29, 2024. Starting November 29, 2023, FinSpace will no longer accept the creation of new Dataset Browser environments. Customers using Amazon FinSpace with Managed Kdb Insights will not be affected. For more information, review the FAQ or contact AWS Support to assist with your transition.

The functions at this stage collect series of events that arrive at an irregular frequency into uniform intervals called bars. You can perform collection with your own functions or use the Amazon FinSpace functions to calculate bars. The collect functions are available in the aws.finspace.timeseries.spark.windows module and include the functions listed below.
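For orientation, the collect functions and the spec classes they consume can be imported as shown in this sketch; the module paths and names come from the signatures documented in this topic.

    from aws.finspace.timeseries.spark.windows import (
        compute_analytics_on_features,
        compute_features_on_time_bars,
        create_time_bars,
    )
    from aws.finspace.timeseries.spark.spec import BarInputSpec, CalcInputSpec, TimeBarSpec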

Compute analytics on features

aws.finspace.timeseries.spark.windows.compute_analytics_on_features(data, new_column, func, partition_col_list=None, add_intermediate=False)

Appends a new column to the data DataFrame whose value is computed by executing a pandas user-defined function (UDF) over a window of rows, as specified by the function's window dependency member.

Parameters

  • data (DataFrame) – input DataFrame

  • new_column (str) – name of the new column to add

  • func (Callable[..., Column]) – function to calculate over data

  • partition_col_list – a single column or list of columns to partition the window on

  • add_intermediate (Optional[bool]) – include intermediate data used in the calculation

Return type DataFrame

Returns DataFrame with the new column appended
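A minimal usage sketch follows. The trades_df DataFrame, the column names, and the my_moving_average callable are assumptions for illustration; any callable matching the documented func signature can be passed.

    # Hypothetical sketch: append an analytics column computed over a window of rows.
    from aws.finspace.timeseries.spark.windows import compute_analytics_on_features

    analytics_df = compute_analytics_on_features(
        data=trades_df,                 # input DataFrame (assumed)
        new_column='price_sma_10',      # name of the column to append (assumed)
        func=my_moving_average,         # assumed callable returning a Column
        partition_col_list=['ticker'],  # partition the window per instrument (assumed column)
        add_intermediate=False,         # drop intermediate columns used in the calculation
    )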

Compute features on time bars

aws.finspace.timeseries.spark.windows.compute_features_on_time_bars(data, new_column, func, force_ordering=False, *ordering_cols)

Reduces data by applying the function while preserving all other columns.

Parameters

  • data (DataFrame) – input DataFrame

  • new_column (str) – new column name

  • func (Callable[..., Column]) – function to calculate over data

  • force_ordering (Optional[bool]) – return data sorted in time column order

  • ordering_cols (str) – columns to order the results by

Return type DataFrame

Returns DataFrame
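A minimal usage sketch follows. The bars_df DataFrame (for example, the output of create_time_bars), the 'ohlc' column name, the my_bar_summary callable, and the 'window' ordering column are assumptions for illustration.

    # Hypothetical sketch: reduce each collected bar into a single feature column.
    from aws.finspace.timeseries.spark.windows import compute_features_on_time_bars

    features_df = compute_features_on_time_bars(
        bars_df,         # DataFrame containing the collected bars (assumed)
        'ohlc',          # new column name (assumed)
        my_bar_summary,  # assumed callable returning a Column
        True,            # force_ordering: return data sorted in time column order
        'window',        # ordering column(s) (assumed)
    )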

Create time bars

aws.finspace.timeseries.spark.windows.create_time_bars(data, timebar_column, grouping_col_list, input_spec, timebar_spec, force_ordering=False)

Appends a column containing a rolling window of data to the data DataFrame. The optional force_ordering flag ensures that the rolling data is ordered by the timebar_column.

Parameters

  • data (Union[Column, DataFrame]) – input DataFrame

  • timebar_column (str) – new timebar column name

  • grouping_col_list (Union[str, List[str]]) – list of columns to group results on

  • input_spec (BarInputSpec) – the input spec used to generate the time bars

  • timebar_spec (Union[TimeBarSpec, Column]) – the timebar spec used to generate the time bars

  • force_ordering (Optional[bool]) – optionally force ordering within windows

Return type DataFrame

Returns DataFrame
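The sketch below ties create_time_bars to the spec classes documented in the next section. The trades_df DataFrame, the column names ('datetime', 'price', 'quantity', 'date', 'ticker', 'window'), and the '1 minute' duration string are assumptions for illustration.

    # Hypothetical sketch: collect irregular trade events into one-minute bars.
    from aws.finspace.timeseries.spark.windows import create_time_bars
    from aws.finspace.timeseries.spark.spec import BarInputSpec, TimeBarSpec

    timebar_spec = TimeBarSpec(timestamp_column='datetime', window_duration='1 minute')
    bar_input_spec = BarInputSpec('bar', 'datetime', 'price', 'quantity')

    bars_df = create_time_bars(
        data=trades_df,                        # raw, irregularly spaced events (assumed)
        timebar_column='window',               # new column holding each bar (assumed name)
        grouping_col_list=['date', 'ticker'],  # group results per day and instrument (assumed)
        input_spec=bar_input_spec,
        timebar_spec=timebar_spec,
        force_ordering=True,                   # keep each window ordered by the timestamp column
    )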

Spark spec module

Bar input spec

class aws.finspace.timeseries.spark.spec.BarInputSpec(bar_structure_name, *bar_value_columns)

Bases: object

This class is responsible for modeling the input specification of bar operations.
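A construction sketch follows, assuming trade data with 'datetime', 'price', and 'quantity' columns; the bar structure name 'bar' is also an assumption.

    # Hypothetical sketch: name the bar structure and list the columns to collect into it.
    from aws.finspace.timeseries.spark.spec import BarInputSpec

    bar_input_spec = BarInputSpec('bar', 'datetime', 'price', 'quantity')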

Calc input spec

class aws.finspace.timeseries.spark.spec.CalcInputSpec(timestamp_column, holiday_calendar=<aws.finspace.finance.calendars.USEndOfDayCalendar object>, **kwargs_func_to_column)

Bases: object

This class is responsible for modeling the input specification of calculation operations.
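A construction sketch follows. Based on the parameter name kwargs_func_to_column, the keyword arguments are assumed to map calculation inputs to DataFrame columns; the 'datetime', 'price', and 'quantity' names are illustrative, and the default holiday calendar is left unchanged.

    # Hypothetical sketch: name the timestamp column and the columns a calculation reads.
    from aws.finspace.timeseries.spark.spec import CalcInputSpec

    calc_input_spec = CalcInputSpec('datetime', price='price', quantity='quantity')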

Time bar spec

class aws.finspace.timeseries.spark.spec.TimeBarSpec(timestamp_column, window_duration, slide_duration=None, start_time=None)

Bases: object

This class models the input time window specification and the associated calendar.

to_window()

Creates an equivalent Spark window from the TimeBarSpec.
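A minimal sketch follows. The 'datetime' column name and the duration strings are assumptions; slide_duration and start_time are optional per the signature above.

    # Hypothetical sketch: a one-minute tumbling window over an assumed 'datetime' column.
    from aws.finspace.timeseries.spark.spec import TimeBarSpec

    timebar_spec = TimeBarSpec(timestamp_column='datetime',
                               window_duration='1 minute',
                               slide_duration='1 minute')
    spark_window = timebar_spec.to_window()  # equivalent Spark window expression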