public class HadoopRDD<K,V> extends RDD<scala.Tuple2<K,V>> implements Logging
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).
Note: Instantiating this class directly is not recommended; please use
org.apache.spark.SparkContext.hadoopRDD() instead.
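For illustration, here is a minimal sketch of that recommended route through the Java API, where JavaSparkContext.hadoopRDD performs the construction; the app name, master, and input path are hypothetical:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class HadoopRDDExample {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("HadoopRDDExample").setMaster("local[2]"));

        // Describe the input through the old mapred API's JobConf.
        JobConf jobConf = new JobConf();
        FileInputFormat.setInputPaths(jobConf, "hdfs:///tmp/input");  // hypothetical path

        // Recommended entry point; it constructs the HadoopRDD internally.
        JavaPairRDD<LongWritable, Text> lines =
            jsc.hadoopRDD(jobConf, TextInputFormat.class,
                LongWritable.class, Text.class, 2);

        System.out.println(lines.count());
        jsc.stop();
      }
    }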
Constructor Summary

HadoopRDD(SparkContext sc,
          Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf,
          scala.Option<scala.Function1<org.apache.hadoop.mapred.JobConf,scala.runtime.BoxedUnit>> initLocalJobConfFuncOpt,
          Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass,
          Class<K> keyClass,
          Class<V> valueClass,
          int minPartitions)

HadoopRDD(SparkContext sc,
          org.apache.hadoop.mapred.JobConf conf,
          Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass,
          Class<K> keyClass,
          Class<V> valueClass,
          int minPartitions)
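Should direct construction be needed anyway, a sketch of the second constructor, reusing the hypothetical jsc and jobConf from the example above (jsc.sc() unwraps the underlying Scala SparkContext):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.spark.rdd.HadoopRDD;

    // Direct construction (discouraged; prefer SparkContext.hadoopRDD()).
    HadoopRDD<LongWritable, Text> rdd = new HadoopRDD<>(
        jsc.sc(), jobConf, TextInputFormat.class,
        LongWritable.class, Text.class, 2);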
Method Summary

static void addLocalConfiguration(String jobTrackerId, int jobId, int splitId, int attemptId, org.apache.hadoop.mapred.JobConf conf)
    Add Hadoop configuration specific to a single partition and attempt.

void checkpoint()
    Mark this RDD for checkpointing.

InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
    :: DeveloperApi ::
    Implemented by subclasses to compute a given partition.

static boolean containsCachedMetadata(String key)

static Object getCachedMetadata(String key)
    This method, containsCachedMetadata, and putCachedMetadata are helpers for accessing the local map, a property of the SparkEnv of the local process.

org.apache.hadoop.conf.Configuration getConf()

Partition[] getPartitions()
    Implemented by subclasses to return the set of partitions in this RDD.

scala.collection.Seq<String> getPreferredLocations(Partition split)
    Optionally overridden by subclasses to specify placement preferences.

static Object putCachedMetadata(String key, Object value)
Methods inherited from class org.apache.spark.rdd.RDD:
aggregate, cache, cartesian, checkpointData, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countByValue, countByValueApprox, creationSiteInfo, dependencies, distinct, distinct, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from interface org.apache.spark.Logging:
initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning
Constructor Detail

public HadoopRDD(SparkContext sc, Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf, scala.Option<scala.Function1<org.apache.hadoop.mapred.JobConf,scala.runtime.BoxedUnit>> initLocalJobConfFuncOpt, Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, int minPartitions)

public HadoopRDD(SparkContext sc, org.apache.hadoop.mapred.JobConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, int minPartitions)

Method Detail

public static Object getCachedMetadata(String key)

public static boolean containsCachedMetadata(String key)

public static Object putCachedMetadata(String key, Object value)

public static void addLocalConfiguration(String jobTrackerId, int jobId, int splitId, int attemptId, org.apache.hadoop.mapred.JobConf conf)
public Partition[] getPartitions()
    Overrides: getPartitions in class RDD<scala.Tuple2<K,V>>

public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
    Overrides: compute in class RDD<scala.Tuple2<K,V>>

public scala.collection.Seq<String> getPreferredLocations(Partition split)
    Overrides: getPreferredLocations in class RDD<scala.Tuple2<K,V>>
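Since getPartitions() and getPreferredLocations(split) are public on HadoopRDD, input splits and their locality can be inspected directly. A sketch continuing the first example; the unchecked cast assumes lines wraps the HadoopRDD that JavaSparkContext.hadoopRDD constructed:

    import org.apache.spark.Partition;
    import org.apache.spark.rdd.HadoopRDD;

    // `lines` is the JavaPairRDD<LongWritable, Text> from the first sketch.
    HadoopRDD<LongWritable, Text> hadoopRdd =
        (HadoopRDD<LongWritable, Text>) lines.rdd();

    for (Partition split : hadoopRdd.getPartitions()) {
      // Preferred locations are the hosts holding each underlying input split.
      System.out.println("partition " + split.index() + " -> "
          + hadoopRdd.getPreferredLocations(split));
    }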
public void checkpoint()
    Overrides: checkpoint in class RDD<scala.Tuple2<K,V>>
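Note that in Spark's implementation this override is a no-op, since a HadoopRDD can always be recomputed from its input files (an assumption worth verifying for your Spark version); checkpointing is therefore usually applied to RDDs derived from it. A sketch continuing the first example, with a hypothetical checkpoint directory:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import scala.Tuple2;

    jsc.setCheckpointDir("hdfs:///tmp/checkpoints");  // hypothetical directory

    // Derive an RDD from the HadoopRDD-backed pair RDD and checkpoint it.
    JavaRDD<String> values = lines.map(
        new Function<Tuple2<LongWritable, Text>, String>() {
          public String call(Tuple2<LongWritable, Text> pair) {
            return pair._2().toString();
          }
        });
    values.checkpoint();  // only marks the RDD for checkpointing
    values.count();       // triggers a job, which materializes the checkpoint
    System.out.println(values.isCheckpointed());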
public org.apache.hadoop.conf.Configuration getConf()
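getConf() exposes the Hadoop Configuration backing this RDD (resolved from the broadcast configuration when the broadcast-based constructor was used). A sketch continuing the locality example above; the property name is an assumption from the old mapred API, worth verifying for your Hadoop version:

    import org.apache.hadoop.conf.Configuration;

    Configuration hadoopConf = hadoopRdd.getConf();
    // "mapred.input.dir" is the property FileInputFormat.setInputPaths sets
    // in the old mapred API (assumption).
    System.out.println(hadoopConf.get("mapred.input.dir"));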