pyspark.sql.streaming.StatefulProcessor.handleInitialState

StatefulProcessor.handleInitialState(key, initialState, timerValues)

Optional to implement. Acts as a no-op if not defined or if there is no initial state input. This function is invoked only in the first batch so that users can process their initial state.

The type of the initial state depends on which method is used:

For transformWithStateInPandas, the initial state is a pandas.DataFrame. For transformWithState, it is a pyspark.sql.Row.

Parameters
key : Any

Grouping key.

initialState : pandas.DataFrame or pyspark.sql.Row

The initial state dataframe or row associated with this key.

timerValues : TimerValues

Timer values for the current batch that processes the input rows. Users can get the processing-time or event-time timestamp from TimerValues.
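
A minimal sketch of how handleInitialState might be implemented for transformWithStateInPandas is shown below. The processor name, the state name, and the column names ("id", "count") are illustrative assumptions, not part of the API.

import pandas as pd

from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
from pyspark.sql.types import IntegerType, StructField, StructType


class CountProcessor(StatefulProcessor):
    """Illustrative processor that keeps a running count per grouping key."""

    def init(self, handle: StatefulProcessorHandle) -> None:
        # Single-value state holding the running count for each key.
        state_schema = StructType([StructField("count", IntegerType(), True)])
        self.count_state = handle.getValueState("count_state", state_schema)

    def handleInitialState(self, key, initialState, timerValues) -> None:
        # Invoked only in the first batch for keys that have initial state.
        # With transformWithStateInPandas, initialState is a pandas.DataFrame;
        # here we assume it contains a single row with a "count" column.
        self.count_state.update((int(initialState["count"].iloc[0]),))

    def handleInputRows(self, key, rows, timerValues):
        # Continue counting on top of whatever handleInitialState seeded.
        count = self.count_state.get()[0] if self.count_state.exists() else 0
        for pdf in rows:
            count += len(pdf)
        self.count_state.update((count,))
        # Output schema is assumed to be "id string, count int".
        yield pd.DataFrame({"id": [key[0]], "count": [count]})

    def close(self) -> None:
        pass

The initial state itself is supplied as a grouped DataFrame through the initialState argument of transformWithStateInPandas; handleInitialState is then invoked in the first batch for each key present in that initial state.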