public class ShuffleStatus
extends Object
implements org.apache.spark.internal.Logging
Helper class used by the MapOutputTrackerMaster to perform bookkeeping for a single ShuffleMapStage.

This class maintains a mapping from map index to MapStatus. It also maintains a cache of serialized map statuses in order to speed up tasks' requests for map output statuses.
All public methods of this class are thread-safe.
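To make the bookkeeping concrete, here is a minimal sketch of driving a ShuffleStatus by hand. It is illustrative only: ShuffleStatus, MapStatus, and BlockManagerId are Spark-internal classes, so the sketch assumes it is compiled inside the org.apache.spark package, and the MapStatus/BlockManagerId factory signatures shown are assumptions based on Spark 3.x internals rather than public API.

```scala
package org.apache.spark
// Internal-API caveat: ShuffleStatus, MapStatus, and BlockManagerId are Spark
// internals; this sketch assumes it lives in the org.apache.spark package and
// assumes Spark 3.x factory signatures.

import org.apache.spark.scheduler.MapStatus
import org.apache.spark.storage.BlockManagerId

object ShuffleStatusSketch {
  def main(args: Array[String]): Unit = {
    // Bookkeeping for a hypothetical ShuffleMapStage with 3 map partitions.
    val status = new ShuffleStatus(3)

    // Assumed factories: BlockManagerId(execId, host, port) and
    // MapStatus(location, uncompressedSizes, mapTaskId).
    val loc = BlockManagerId("exec-1", "host-a", 7337)
    status.addMapOutput(0, MapStatus(loc, Array(100L, 200L), 1L))
    status.addMapOutput(1, MapStatus(loc, Array(50L, 75L), 2L))

    println(status.numAvailableOutputs)     // 2
    println(status.findMissingPartitions()) // only map index 2 has not reported

    // Losing the block manager that served map index 0 makes it missing again.
    status.removeMapOutput(0, loc)
    println(status.findMissingPartitions()) // map indices 0 and 2
  }
}
```

findMissingPartitions() reports the map indices with no registered output, which is why removing an output makes its index "missing" again.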
Constructor Summary

| Constructor and Description |
|---|
| ShuffleStatus(int numPartitions) |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| void | addMapOutput(int mapIndex, MapStatus status) Register a map output. |
| scala.collection.Seq&lt;Object&gt; | findMissingPartitions() Returns the sequence of partition ids that are missing (i.e. need to be computed). |
| boolean | hasCachedSerializedBroadcast() |
| void | invalidateSerializedMapOutputStatusCache() Clears the cached serialized map output statuses. |
| MapStatus[] | mapStatuses() MapStatus for each partition. |
| int | numAvailableOutputs() Number of partitions that have shuffle outputs. |
| void | removeMapOutput(int mapIndex, BlockManagerId bmAddress) Remove the map output which was served by the specified block manager. |
| void | removeOutputsByFilter(scala.Function1&lt;BlockManagerId,Object&gt; f) Removes all shuffle outputs which satisfy the filter. |
| void | removeOutputsOnExecutor(String execId) Removes all map outputs associated with the specified executor. |
| void | removeOutputsOnHost(String host) Removes all shuffle outputs associated with this host. |
| byte[] | serializedMapStatus(org.apache.spark.broadcast.BroadcastManager broadcastManager, boolean isLocal, int minBroadcastSize, SparkConf conf) Serializes the mapStatuses array into an efficient compressed format. |
| void | updateMapOutput(long mapId, BlockManagerId bmAddress) Update the map output location (e.g. during migration). |
| &lt;T&gt; T | withMapStatuses(scala.Function1&lt;MapStatus[],T&gt; f) Helper function which provides thread-safe access to the mapStatuses array. |
Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging

$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
Method Detail

public void addMapOutput(int mapIndex, MapStatus status)

Register a map output.

Parameters:
mapIndex - (undocumented)
status - (undocumented)

public scala.collection.Seq&lt;Object&gt; findMissingPartitions()

Returns the sequence of partition ids that are missing (i.e. need to be computed).
public boolean hasCachedSerializedBroadcast()

public void invalidateSerializedMapOutputStatusCache()

Clears the cached serialized map output statuses.

public MapStatus[] mapStatuses()

MapStatus for each partition.

public int numAvailableOutputs()

Number of partitions that have shuffle outputs.
public void removeMapOutput(int mapIndex, BlockManagerId bmAddress)

Remove the map output which was served by the specified block manager.

Parameters:
mapIndex - (undocumented)
bmAddress - (undocumented)

public void removeOutputsByFilter(scala.Function1&lt;BlockManagerId,Object&gt; f)

Removes all shuffle outputs which satisfy the filter.

Parameters:
f - (undocumented)

public void removeOutputsOnExecutor(String execId)

Removes all map outputs associated with the specified executor.

Parameters:
execId - (undocumented)

public void removeOutputsOnHost(String host)

Removes all shuffle outputs associated with this host.

Parameters:
host - (undocumented)
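The three removal variants differ only in how the doomed locations are selected. As a hedged sketch (same internal-API caveats as the first example; the host, executor, and function names are hypothetical), the filter form can express the other two:

```scala
import org.apache.spark.storage.BlockManagerId

// Given a populated ShuffleStatus (see the first sketch), these calls are
// equivalent ways to drop outputs that lived on a failed node.
def dropLostOutputs(status: org.apache.spark.ShuffleStatus): Unit = {
  // Explicit filter on the stored locations: drop everything served from host-a.
  status.removeOutputsByFilter((loc: BlockManagerId) => loc.host == "host-a")

  // Convenience forms for the two common cases.
  status.removeOutputsOnHost("host-a")
  status.removeOutputsOnExecutor("exec-1")
}
```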
public byte[] serializedMapStatus(org.apache.spark.broadcast.BroadcastManager broadcastManager, boolean isLocal, int minBroadcastSize, SparkConf conf)

Serializes the mapStatuses array into an efficient compressed format. See MapOutputTracker.serializeMapStatuses() for more details on the serialization format.

This method is designed to be called multiple times and implements caching in order to speed up subsequent requests. If the cache is empty and multiple threads concurrently attempt to serialize the map statuses, then serialization will only be performed in a single thread and all other threads will block until the cache is populated.

Parameters:
broadcastManager - (undocumented)
isLocal - (undocumented)
minBroadcastSize - (undocumented)
conf - (undocumented)
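The caching behaviour described above is essentially "serialize once under a lock, reuse until invalidated". The following is a simplified, self-contained sketch of that pattern, not ShuffleStatus's actual implementation; the class and method names are hypothetical:

```scala
// Simplified sketch of the caching contract described above: the first caller
// serializes under a lock, later callers reuse the cached bytes until the
// cache is invalidated.
class CachedSerializer[A](serialize: A => Array[Byte]) {
  private var cache: Array[Byte] = null

  def serialized(value: A): Array[Byte] = synchronized {
    // Only one thread serializes; concurrent callers block here until the
    // cache is populated, then all of them return the same byte array.
    if (cache == null) {
      cache = serialize(value)
    }
    cache
  }

  // Counterpart of invalidateSerializedMapOutputStatusCache(): drop the cached
  // bytes so the next request re-serializes the (possibly updated) statuses.
  def invalidate(): Unit = synchronized { cache = null }
}
```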
public void updateMapOutput(long mapId, BlockManagerId bmAddress)

Update the map output location (e.g. during migration).

Parameters:
mapId - (undocumented)
bmAddress - (undocumented)
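A hedged usage sketch (same internal-API caveats as the first example; the ids and addresses are hypothetical): when a map output block has been migrated to another block manager, for example during executor decommissioning, the tracker can repoint the existing entry rather than re-registering it:

```scala
import org.apache.spark.storage.BlockManagerId

// Assumes a populated `status` whose map task with id 1L has migrated its
// shuffle block to host-b (hypothetical values).
def repointMigratedOutput(status: org.apache.spark.ShuffleStatus): Unit = {
  val newLoc = BlockManagerId("exec-2", "host-b", 7337)
  status.updateMapOutput(1L, newLoc) // keyed by map task id, not map index
}
```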
public &lt;T&gt; T withMapStatuses(scala.Function1&lt;MapStatus[],T&gt; f)

Helper function which provides thread-safe access to the mapStatuses array. The function should not mutate the array.

Parameters:
f - (undocumented)
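Since the statuses array is internal state, read-only queries should go through withMapStatuses so they run under the class's lock. A small sketch (same caveats as the first example; the function name is hypothetical):

```scala
import org.apache.spark.scheduler.MapStatus

// Count registered outputs under ShuffleStatus's own lock. Assumes entries
// with no registered output are null, which is how findMissingPartitions()
// treats them; `status` is a populated org.apache.spark.ShuffleStatus.
def registeredSummary(status: org.apache.spark.ShuffleStatus): String = {
  val (registered, total) = status.withMapStatuses { (arr: Array[MapStatus]) =>
    (arr.count(_ != null), arr.length)
  }
  s"$registered of $total map outputs registered"
}
```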