Whether state exists or not.
Get the state value if it exists, or throw NoSuchElementException.
Get the state value if it exists, or throw NoSuchElementException.
Get the state value as a scala Option.
Whether the function has been called because the key has timed out.
Whether the function has been called because the key has timed out.
This can return true only when timeouts are enabled in [map/flatmap]GroupsWithStates
.
Remove this state.
Set the timeout duration for this key as a string.
Set the timeout duration for this key as a string. For example, "1 hour", "2 days", etc.
ProcessingTimeTimeout must be enabled in [map/flatmap]GroupsWithStates
.
Set the timeout duration in ms for this key.
Set the timeout duration in ms for this key.
ProcessingTimeTimeout must be enabled in [map/flatmap]GroupsWithStates
.
Update the value of the state.
Update the value of the state. Note that null
is not a valid value, and it throws
IllegalArgumentException.
:: Experimental ::
Wrapper class for interacting with per-group state data in
mapGroupsWithState
andflatMapGroupsWithState
operations onKeyValueGroupedDataset
.Detail description on
[map/flatMap]GroupsWithState
operation -------------------------------------------------------------- Both,mapGroupsWithState
andflatMapGroupsWithState
inKeyValueGroupedDataset
will invoke the user-given function on each group (defined by the grouping function inDataset.groupByKey()
) while maintaining user-defined per-group state between invocations. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger. That is, in every batch of theStreamingQuery
, the function will be invoked once for each group that has data in the trigger. Furthermore, if timeout is set, then the function will invoked on timed out groups (more detail below).The function is invoked with following parameters.
In case of a batch Dataset, there is only one invocation and state object will be empty as there is no prior state. Essentially, for batch Datasets,
[map/flatMap]GroupsWithState
is equivalent to[map/flatMap]Groups
and any updates to the state and/or timeouts have no effect.The major difference between
mapGroupsWithState
andflatMapGroupsWithState
is that the former allows the function to return one and only one record, whereas the latter allows the function to return any number of records (including no records). Furthermore, theflatMapGroupsWithState
is associated with an operation output mode, which can be eitherAppend
orUpdate
. Semantically, this defines whether the output records of one trigger is effectively replacing the previously output records (from previous triggers) or is appending to the list of previously output records. Essentially, this defines how the Result Table (refer to the semantics in the programming guide) is updated, and allows us to reason about the semantics of later operations.Important points to note about the function (both mapGroupsWithState and flatMapGroupsWithState).
GroupStateTimeout
below.Important points to note about using
GroupState
.IllegalArgumentException
.GroupState
are not thread-safe. This is to avoid memory barriers.remove()
is called, thenexists()
will returnfalse
,get()
will throwNoSuchElementException
andgetOption()
will returnNone
update(newState)
is called, thenexists()
will again returntrue
,get()
andgetOption()
will return the updated value.Important points to note about using
GroupStateTimeout
.timeout
param in[map|flatMap]GroupsWithState
, but the exact timeout duration/timestamp is configurable per group by callingsetTimeout...()
inGroupState
.GroupStateTimeout.ProcessingTimeTimeout
) or event time (i.e.GroupStateTimeout.EventTimeTimeout
).ProcessingTimeTimeout
, the timeout duration can be set by callingGroupState.setTimeoutDuration
. The timeout will occur when the clock has advanced by the set duration. Guarantees provided by this timeout with a duration of D ms are as follows:EventTimeTimeout
, the user also has to specify the the the event time watermark in the query usingDataset.withWatermark()
. With this setting, data that is older than the watermark are filtered out. The timeout can be set for a group by setting a timeout timestamp usingGroupState.setTimeoutTimestamp()
, and the timeout would occur when the watermark advances beyond the set timestamp. You can control the timeout delay by two parameters - (i) watermark delay and an additional duration beyond the timestamp in the event (which is guaranteed to be newer than watermark due to the filtering). Guarantees provided by this timeout are as follows:GroupState.hasTimedOut()
set to true.Scala example of using GroupState in
mapGroupsWithState
:Java example of using
GroupState
:User-defined type of the state to be stored for each group. Must be encodable into Spark SQL types (see
Encoder
for more details).2.2.0