FileCommitProtocol (Spark 2.2.0 JavaDoc)

Object
- org.apache.spark.internal.io.FileCommitProtocol

Direct Known Subclasses:

HadoopMapReduceCommitProtocol
```
public abstract class FileCommitProtocol
extends Object
```
An interface to define how a single Spark job commits its outputs. Three notes:
1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors.
2. Implementations should have a constructor with either 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean).
3. A committer should not be reused across multiple Spark jobs.

The proper call sequence is:
1. Driver calls setupJob.
2. As part of each task's execution, the executor calls setupTask and then commitTask (or abortTask if the task failed).
3. When all necessary tasks have completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), the driver should call abortJob.
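
The call sequence above can be sketched with a small stand-in model. This is a minimal sketch, not Spark code: `ToyCommitProtocol`, `RecordingProtocol`, and `runJob` are hypothetical names, plain strings replace Hadoop's `JobContext` and `TaskAttemptContext`, tasks run sequentially in one JVM, and a single failed task aborts the whole job (no retries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the commit lifecycle; none of these types are Spark classes.
class CommitLifecycleSketch {

    /** Simplified protocol: String ids replace JobContext and TaskAttemptContext. */
    interface ToyCommitProtocol {
        void setupJob(String jobId);
        void setupTask(String taskId);
        String commitTask(String taskId); // returns a toy "task commit message"
        void abortTask(String taskId);
        void commitJob(String jobId, List<String> taskCommits);
        void abortJob(String jobId);
    }

    /** Records every call so the ordering can be inspected afterwards. */
    public static class RecordingProtocol implements ToyCommitProtocol {
        public final List<String> calls = new ArrayList<>();
        public void setupJob(String jobId) { calls.add("setupJob"); }
        public void setupTask(String taskId) { calls.add("setupTask:" + taskId); }
        public String commitTask(String taskId) {
            calls.add("commitTask:" + taskId);
            return "msg-" + taskId;
        }
        public void abortTask(String taskId) { calls.add("abortTask:" + taskId); }
        public void commitJob(String jobId, List<String> taskCommits) {
            calls.add("commitJob:" + taskCommits.size());
        }
        public void abortJob(String jobId) { calls.add("abortJob"); }
    }

    /** Drives one job; tasks whose id is in failingTaskIds are treated as failed. */
    static boolean runJob(ToyCommitProtocol committer, String jobId,
                          List<String> taskIds, Set<String> failingTaskIds) {
        committer.setupJob(jobId);                        // step 1: driver side
        List<String> commits = new ArrayList<>();
        boolean failed = false;
        for (String taskId : taskIds) {                   // step 2: executor side
            committer.setupTask(taskId);
            if (failingTaskIds.contains(taskId)) {
                committer.abortTask(taskId);
                failed = true;
            } else {
                commits.add(committer.commitTask(taskId));
            }
        }
        if (failed) {                                     // step 3: driver side
            committer.abortJob(jobId);
            return false;
        }
        committer.commitJob(jobId, commits);
        return true;
    }

    public static void main(String[] args) {
        RecordingProtocol committer = new RecordingProtocol();
        runJob(committer, "job-1", Arrays.asList("t0", "t1"), new HashSet<String>());
        System.out.println(committer.calls);
    }
}
```

A fresh `RecordingProtocol` is created per job in this sketch, in line with the note that a committer should not be reused across jobs.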

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`FileCommitProtocol.EmptyTaskCommitMessage$`
`static class`	`FileCommitProtocol.TaskCommitMessage`

Constructor Summary

Constructors
Constructor and Description
`FileCommitProtocol()`

Method Summary

Modifier and Type	Method and Description
`abstract void`	`abortJob(org.apache.hadoop.mapreduce.JobContext jobContext)` Aborts a job after the writes fail.
`abstract void`	`abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Aborts a task after the writes have failed.
`abstract void`	`commitJob(org.apache.hadoop.mapreduce.JobContext jobContext, scala.collection.Seq<FileCommitProtocol.TaskCommitMessage> taskCommits)` Commits a job after the writes succeed.
`abstract FileCommitProtocol.TaskCommitMessage`	`commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Commits a task after the writes succeed.
`boolean`	`deleteWithJob(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean recursive)` Specifies that a file should be deleted with the commit of this job.
`static FileCommitProtocol`	`instantiate(String className, String jobId, String outputPath, boolean isAppend)` Instantiates a FileCommitProtocol using the given className.
`abstract String`	`newTaskTempFile(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext, scala.Option<String> dir, String ext)` Notifies the commit protocol to add a new file, and gets back the full path that should be used.
`abstract String`	`newTaskTempFileAbsPath(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext, String absoluteDir, String ext)` Similar to newTaskTempFile(), but allows files to be committed to an absolute output location.
`void`	`onTaskCommit(FileCommitProtocol.TaskCommitMessage taskCommit)` Called on the driver after a task commits.
`abstract void`	`setupJob(org.apache.hadoop.mapreduce.JobContext jobContext)` Sets up a job.
`abstract void`	`setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Sets up a task within a job.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - FileCommitProtocol
```
public FileCommitProtocol()
```
- Method Detail
  - instantiate
```
public static FileCommitProtocol instantiate(String className,
                                             String jobId,
                                             String outputPath,
                                             boolean isAppend)
```
    Instantiates a FileCommitProtocol using the given className.
    
    Parameters:
    
    className - (undocumented)
    
    jobId - (undocumented)
    
    outputPath - (undocumented)
    
    isAppend - (undocumented)
    
    Returns:
    
    (undocumented)
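
    The class notes ask for a constructor taking either (jobId, path) or (jobId, path, isAppend). One way a factory could honor both shapes is to look for the 3-argument constructor by reflection and fall back to the 2-argument one. The following is a minimal sketch under that assumption, not Spark's source: `InstantiateSketch`, `TwoArgCommitter`, and `ThreeArgCommitter` are hypothetical, and the method returns `Object` rather than `FileCommitProtocol` so it runs without Spark on the classpath.

```java
import java.lang.reflect.Constructor;

// Hypothetical illustration of constructor lookup by reflection; not Spark's source code.
class InstantiateSketch {

    /** Toy committer with only the 2-argument constructor shape. */
    public static class TwoArgCommitter {
        public final String jobId;
        public final String path;
        public TwoArgCommitter(String jobId, String path) {
            this.jobId = jobId;
            this.path = path;
        }
    }

    /** Toy committer with the 3-argument constructor shape. */
    public static class ThreeArgCommitter {
        public final String jobId;
        public final String path;
        public final boolean isAppend;
        public ThreeArgCommitter(String jobId, String path, boolean isAppend) {
            this.jobId = jobId;
            this.path = path;
            this.isAppend = isAppend;
        }
    }

    /** Tries (String, String, boolean) first, then falls back to (String, String). */
    static Object instantiate(String className, String jobId, String outputPath, boolean isAppend)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        try {
            Constructor<?> ctor =
                clazz.getDeclaredConstructor(String.class, String.class, boolean.class);
            return ctor.newInstance(jobId, outputPath, isAppend);
        } catch (NoSuchMethodException e) {
            // No 3-argument constructor: isAppend is dropped in this fallback.
            Constructor<?> ctor = clazz.getDeclaredConstructor(String.class, String.class);
            return ctor.newInstance(jobId, outputPath);
        }
    }

    public static void main(String[] args) throws Exception {
        Object c = instantiate(ThreeArgCommitter.class.getName(), "job-1", "/tmp/out", true);
        System.out.println(c.getClass().getSimpleName());
    }
}
```

    Note that in this sketch a class with only the 2-argument constructor never sees the isAppend flag.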
  - setupJob
```
public abstract void setupJob(org.apache.hadoop.mapreduce.JobContext jobContext)
```
    Sets up a job. Must be called on the driver before any other methods can be invoked.
    
    Parameters:
    
    jobContext - (undocumented)
  - commitJob
```
public abstract void commitJob(org.apache.hadoop.mapreduce.JobContext jobContext,
                               scala.collection.Seq<FileCommitProtocol.TaskCommitMessage> taskCommits)
```
    Commits a job after the writes succeed. Must be called on the driver.
    
    Parameters:
    
    jobContext - (undocumented)
    
    taskCommits - (undocumented)
  - abortJob
```
public abstract void abortJob(org.apache.hadoop.mapreduce.JobContext jobContext)
```
    Aborts a job after the writes fail. Must be called on the driver.
    Calling this function is a best-effort attempt, because it is possible that the driver crashes (or is killed) before it can call abort.
    
    Parameters:
    
    jobContext - (undocumented)
  - setupTask
```
public abstract void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Sets up a task within a job. Must be called before any other task related methods can be invoked.
    
    Parameters:
    
    taskContext - (undocumented)
  - newTaskTempFile
```
public abstract String newTaskTempFile(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext,
                                       scala.Option<String> dir,
                                       String ext)
```
    Notifies the commit protocol to add a new file, and gets back the full path that should be used. Must be called on the executors when running tasks.
    Note that the returned temp file may have an arbitrary path. The commit protocol only promises that the file will be at the location specified by the arguments after job commit.
    A full file path consists of the following parts:
    1. the base path
    2. some sub-directory within the base path, used to specify partitioning
    3. file prefix, usually some unique job id with the task id
    4. bucket id
    5. source specific file extension, e.g. ".snappy.parquet"
    
    The "dir" parameter specifies part 2, and the "ext" parameter specifies both parts 4 and 5; the rest are left to the commit protocol implementation to decide.
    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
    
    Parameters:
    
    taskContext - (undocumented)
    
    dir - (undocumented)
    
    ext - (undocumented)
    
    Returns:
    
    (undocumented)
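
    To make the five-part layout concrete, the sketch below assembles the final (post-commit) location of a file from its parts. It is a hedged illustration only: `TempFilePathSketch` and `finalPath` are hypothetical, the `part-NNNNN-jobId` prefix is an assumption of this sketch, and the temp path an actual implementation returns may be arbitrary, as noted above.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical path layout following the five parts listed above; real implementations may differ.
class TempFilePathSketch {

    /**
     * basePath = part 1, dir = part 2, prefix built here = part 3,
     * ext = parts 4 and 5 together (bucket id plus file extension).
     */
    static String finalPath(String basePath, Optional<String> dir,
                            String jobId, int taskId, String ext) {
        // Part 3 is the committer's choice; task id plus job id keeps tasks from colliding.
        String prefix = String.format(Locale.ROOT, "part-%05d-%s", taskId, jobId);
        String fileName = prefix + ext;
        return dir.map(d -> basePath + "/" + d + "/" + fileName)
                  .orElse(basePath + "/" + fileName);
    }

    public static void main(String[] args) {
        // Same task, same dir, two files: the caller varies ext to keep the names unique.
        System.out.println(finalPath("/data/out", Optional.of("year=2017"), "job1", 3,
                                     "_00002.c000.snappy.parquet"));
        System.out.println(finalPath("/data/out", Optional.of("year=2017"), "job1", 3,
                                     "_00002.c001.snappy.parquet"));
    }
}
```

    The `c000` / `c001` counter inside ext is one hypothetical way for a caller to satisfy the uniqueness requirement when a task writes several files to the same dir.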
  - newTaskTempFileAbsPath
```
public abstract String newTaskTempFileAbsPath(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext,
                                              String absoluteDir,
                                              String ext)
```
    Similar to newTaskTempFile(), but allows files to be committed to an absolute output location. Depending on the implementation, there may be weaker guarantees around adding files this way.
    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
    
    Parameters:
    
    taskContext - (undocumented)
    
    absoluteDir - (undocumented)
    
    ext - (undocumented)
    
    Returns:
    
    (undocumented)
  - commitTask
```
public abstract FileCommitProtocol.TaskCommitMessage commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Commits a task after the writes succeed. Must be called on the executors when running tasks.
    
    Parameters:
    
    taskContext - (undocumented)
    
    Returns:
    
    (undocumented)
  - abortTask
```
public abstract void abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Aborts a task after the writes have failed. Must be called on the executors when running tasks.
    Calling this function is a best-effort attempt, because it is possible that the executor crashes (or is killed) before it can call abort.
    
    Parameters:
    
    taskContext - (undocumented)
  - deleteWithJob
```
public boolean deleteWithJob(org.apache.hadoop.fs.FileSystem fs,
                             org.apache.hadoop.fs.Path path,
                             boolean recursive)
```
    Specifies that a file should be deleted with the commit of this job. The default implementation deletes the file immediately.
    
    Parameters:
    
    fs - (undocumented)
    
    path - (undocumented)
    
    recursive - (undocumented)
    
    Returns:
    
    (undocumented)
  - onTaskCommit
```
public void onTaskCommit(FileCommitProtocol.TaskCommitMessage taskCommit)
```
    Called on the driver after a task commits. This can be used to access task commit messages before the job has finished. These same task commit messages will be passed to commitJob() if the entire job succeeds.
    
    Parameters:
    
    taskCommit - (undocumented)
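
    As an illustration of how such a hook might be used, the sketch below keeps a running file count on the driver as tasks commit, then receives the same messages again at job commit. All names here (`OnTaskCommitSketch`, `ToyMessage`, `ProgressTrackingCommitter`) are hypothetical stand-ins, not Spark classes, and the `filesWritten` payload is an assumption about what a commit message could carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a driver-side hook; ToyMessage is not Spark's TaskCommitMessage.
class OnTaskCommitSketch {

    /** Toy commit message carrying how many files a task wrote. */
    public static class ToyMessage {
        public final String taskId;
        public final int filesWritten;
        public ToyMessage(String taskId, int filesWritten) {
            this.taskId = taskId;
            this.filesWritten = filesWritten;
        }
    }

    /** Tracks progress as tasks commit, before the job as a whole has finished. */
    public static class ProgressTrackingCommitter {
        private int filesSoFar = 0;

        /** Called once per committed task, while other tasks may still be running. */
        public void onTaskCommit(ToyMessage message) {
            filesSoFar += message.filesWritten;
        }

        public int filesSoFar() {
            return filesSoFar;
        }

        /** Receives the same messages again once the entire job succeeds. */
        public int commitJob(List<ToyMessage> taskCommits) {
            int total = 0;
            for (ToyMessage m : taskCommits) {
                total += m.filesWritten;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        ProgressTrackingCommitter committer = new ProgressTrackingCommitter();
        List<ToyMessage> all = new ArrayList<>();
        for (ToyMessage m : new ToyMessage[] {new ToyMessage("t0", 2), new ToyMessage("t1", 3)}) {
            committer.onTaskCommit(m);
            all.add(m);
            System.out.println("files so far: " + committer.filesSoFar());
        }
        System.out.println("files at job commit: " + committer.commitJob(all));
    }
}
```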
