public class SQLContext extends Object implements Logging, SQLConf, CacheManager, UDFRegistration, scala.Serializable
The entry point for running relational queries using Spark. Allows the creation of SchemaRDD objects and the execution of SQL queries.
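For orientation, here is a minimal sketch of the typical workflow: create a SQLContext from an existing SparkContext, turn an RDD of case classes into a SchemaRDD, register it as a temporary table, and query it with SQL. The SparkContext `sc`, the `Record` case class, and the table name are illustrative, not part of this API.

```scala
import org.apache.spark.sql._

// assumes an existing SparkContext named `sc`
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit conversion RDD[case class] => SchemaRDD

case class Record(key: Int, value: String)                    // illustrative case class
val rdd = sc.parallelize(1 to 10).map(i => Record(i, "val_" + i))

rdd.registerTempTable("records")                              // via the implicit conversion
val results: SchemaRDD = sqlContext.sql("SELECT key, value FROM records WHERE key > 5")
results.collect().foreach(println)
```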
Nested classes/interfaces inherited from interface SQLConf: SQLConf.Deprecated$
Constructor and Description |
---|
SQLContext(SparkContext sparkContext) |
Modifier and Type | Method and Description |
---|---|
SchemaRDD | applySchema(RDD<org.apache.spark.sql.catalyst.expressions.Row> rowRDD, org.apache.spark.sql.catalyst.types.StructType schema) :: DeveloperApi :: Creates a SchemaRDD from an RDD containing Rows by applying a schema to this RDD. |
SchemaRDD | applySchemaToPythonRDD(RDD<Object[]> rdd, String schemaString) Apply a schema defined by the schemaString to an RDD. |
SchemaRDD | applySchemaToPythonRDD(RDD<Object[]> rdd, org.apache.spark.sql.catalyst.types.StructType schema) Apply a schema defined by the schema to an RDD. |
SchemaRDD | baseRelationToSchemaRDD(BaseRelation baseRelation) |
<A extends scala.Product> SchemaRDD | createParquetFile(String path, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, scala.reflect.api.TypeTags.TypeTag<A> evidence$2) :: Experimental :: Creates an empty parquet file with the schema of class A, which can be registered as a table. |
<A extends scala.Product> SchemaRDD | createSchemaRDD(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1) Creates a SchemaRDD from an RDD of case classes. |
void | dropTempTable(String tableName) Drops the temporary table with the given table name in the catalog. |
scala.collection.Seq<org.apache.spark.sql.catalyst.planning.GenericStrategy<SparkPlan>> | extraStrategies() :: DeveloperApi :: Allows extra strategies to be injected into the query planner at runtime. |
SchemaRDD | jsonFile(String path) Loads a JSON file (one object per line), returning the result as a SchemaRDD. |
SchemaRDD | jsonFile(String path, double samplingRatio) :: Experimental :: |
SchemaRDD | jsonFile(String path, org.apache.spark.sql.catalyst.types.StructType schema) :: Experimental :: Loads a JSON file (one object per line) and applies the given schema, returning the result as a SchemaRDD. |
SchemaRDD | jsonRDD(RDD<String> json) Loads an RDD[String] storing JSON objects (one object per record), returning the result as a SchemaRDD. |
SchemaRDD | jsonRDD(RDD<String> json, double samplingRatio) :: Experimental :: |
SchemaRDD | jsonRDD(RDD<String> json, org.apache.spark.sql.catalyst.types.StructType schema) :: Experimental :: Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a SchemaRDD. |
SchemaRDD | logicalPlanToSparkQuery(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) :: DeveloperApi :: Allows catalyst LogicalPlans to be executed as a SchemaRDD. |
SchemaRDD | parquetFile(String path) Loads a Parquet file, returning the result as a SchemaRDD. |
org.apache.spark.sql.catalyst.types.DataType | parseDataType(String dataTypeString) Parses the data type in our internal string representation. |
void | registerRDDAsTable(SchemaRDD rdd, String tableName) Registers the given RDD as a temporary table in the catalog. |
SparkContext | sparkContext() |
SchemaRDD | sql(String sqlText) Executes a SQL query using Spark, returning the result as a SchemaRDD. |
SchemaRDD | table(String tableName) Returns the specified table as a SchemaRDD. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
Methods inherited from interface SQLConf: autoBroadcastJoinThreshold, clear, codegenEnabled, columnBatchSize, columnNameOfCorruptRecord, defaultSizeInBytes, dialect, externalSortEnabled, getAllConfs, getConf, getConf, inMemoryPartitionPruning, isParquetBinaryAsString, numShufflePartitions, parquetCompressionCodec, parquetFilterPushDown, setConf, setConf, settings, useCompression
Methods inherited from interface CacheManager: cachedData, cacheLock, cacheQuery, cacheTable, clearCache, invalidateCache, isCached, lookupCachedData, lookupCachedData, readLock, tryUncacheQuery, uncacheQuery, uncacheTable, useCachedData, writeLock
Methods inherited from interface UDFRegistration: registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerFunction, registerPython
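The inherited UDFRegistration and CacheManager members are used through the same context object. A rough sketch under the Spark 1.x API; the UDF name `strLen` and the table name `people` are illustrative only:

```scala
// Register a Scala function as a SQL UDF and use it in a query
sqlContext.registerFunction("strLen", (s: String) => s.length)
sqlContext.sql("SELECT strLen(name) FROM people")

// Cache a registered table in memory, then release it
sqlContext.cacheTable("people")
sqlContext.uncacheTable("people")
```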
public SQLContext(SparkContext sparkContext)
public SparkContext sparkContext()
public SchemaRDD logicalPlanToSparkQuery(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
public <A extends scala.Product> SchemaRDD createSchemaRDD(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
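createSchemaRDD builds a SchemaRDD from an RDD of case classes, inferring the schema from the case class fields. A minimal sketch; the `Person` case class, the data, and the existing `sqlContext` and `sc` are assumptions for illustration:

```scala
case class Person(name: String, age: Int)   // illustrative case class

val people = sc.parallelize(Seq(Person("michael", 29), Person("andy", 30)))

// Explicit call; the same conversion is usually applied implicitly after
// `import sqlContext.createSchemaRDD`.
val peopleSchemaRDD = sqlContext.createSchemaRDD(people)
peopleSchemaRDD.registerTempTable("people")
```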
public SchemaRDD baseRelationToSchemaRDD(BaseRelation baseRelation)
public SchemaRDD applySchema(RDD<org.apache.spark.sql.catalyst.expressions.Row> rowRDD, org.apache.spark.sql.catalyst.types.StructType schema)
:: DeveloperApi ::

Creates a SchemaRDD from an RDD containing Rows by applying a schema to this RDD. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise a runtime exception will be thrown.
Example:
    import org.apache.spark.sql._
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val schema =
      StructType(
        StructField("name", StringType, false) ::
        StructField("age", IntegerType, true) :: Nil)

    val people =
      sc.textFile("examples/src/main/resources/people.txt").map(
        _.split(",")).map(p => Row(p(0), p(1).trim.toInt))

    val peopleSchemaRDD = sqlContext.applySchema(people, schema)
    peopleSchemaRDD.printSchema
    // root
    // |-- name: string (nullable = false)
    // |-- age: integer (nullable = true)

    peopleSchemaRDD.registerTempTable("people")
    sqlContext.sql("select name from people").collect.foreach(println)
public SchemaRDD parquetFile(String path)
Loads a Parquet file, returning the result as a SchemaRDD.
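For illustration, a sketch of loading Parquet data and querying it; the path and table name are placeholders:

```scala
val parquetData = sqlContext.parquetFile("path/to/people.parquet")
parquetData.registerTempTable("parquetPeople")
sqlContext.sql("SELECT name FROM parquetPeople").collect().foreach(println)
```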
public SchemaRDD jsonFile(String path)
Loads a JSON file (one object per line), returning the result as a SchemaRDD. It goes through the entire dataset once to determine the schema.
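Because the schema is inferred by scanning the data, the resulting SchemaRDD can be inspected and queried directly. A small sketch; the path and table name are placeholders:

```scala
val peopleJson = sqlContext.jsonFile("examples/src/main/resources/people.json")
peopleJson.printSchema()                      // schema inferred from the JSON records
peopleJson.registerTempTable("jsonPeople")
sqlContext.sql("SELECT name FROM jsonPeople WHERE age > 20").collect()
```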
public SchemaRDD jsonFile(String path, org.apache.spark.sql.catalyst.types.StructType schema)
:: Experimental ::

Loads a JSON file (one object per line) and applies the given schema, returning the result as a SchemaRDD.
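Supplying a schema avoids the inference pass and pins down the field types. A sketch reusing the StructType builders shown in the applySchema example above; the path and field names are placeholders:

```scala
import org.apache.spark.sql._

val jsonSchema =
  StructType(
    StructField("name", StringType, true) ::
    StructField("age", IntegerType, true) :: Nil)

val peopleJson = sqlContext.jsonFile("path/to/people.json", jsonSchema)
```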
public SchemaRDD jsonFile(String path, double samplingRatio)
public SchemaRDD jsonRDD(RDD<String> json)
Loads an RDD[String] storing JSON objects (one object per record), returning the result as a SchemaRDD. It goes through the entire dataset once to determine the schema.
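A sketch of building the input RDD[String] inline; it assumes an existing SparkContext `sc` and SQLContext `sqlContext`:

```scala
val jsonStrings = sc.parallelize(
  """{"name":"michael","age":29}""" ::
  """{"name":"andy","age":30}""" :: Nil)

val peopleFromRdd = sqlContext.jsonRDD(jsonStrings)
peopleFromRdd.registerTempTable("peopleFromRdd")
```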
public SchemaRDD jsonRDD(RDD<String> json, org.apache.spark.sql.catalyst.types.StructType schema)
:: Experimental ::

Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a SchemaRDD.
public <A extends scala.Product> SchemaRDD createParquetFile(String path, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
:: Experimental ::

Creates an empty parquet file with the schema of class A, which can be registered as a table. This registered table can be used as the target of future insertInto operations.
    val sqlContext = new SQLContext(...)
    import sqlContext._

    case class Person(name: String, age: Int)
    createParquetFile[Person]("path/to/file.parquet").registerTempTable("people")
    sql("INSERT INTO people SELECT 'michael', 29")
Parameters:
path - The path where the directory containing parquet metadata should be created. Data inserted into this table will also be stored at this location.
allowExisting - When false, an exception will be thrown if this directory already exists.
conf - A Hadoop configuration object that can be used to specify options to the parquet output format.
public void registerRDDAsTable(SchemaRDD rdd, String tableName)
public void dropTempTable(String tableName)
Parameters:
tableName - the name of the table to be unregistered.
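Temporary tables exist only for the lifetime of this SQLContext. A sketch pairing registration with cleanup; `peopleSchemaRDD` and the table name are illustrative, and registerTempTable on a SchemaRDD is the more common entry point for registration:

```scala
sqlContext.registerRDDAsTable(peopleSchemaRDD, "tempPeople")
sqlContext.sql("SELECT count(*) FROM tempPeople").collect()

// After dropping, "tempPeople" can no longer be referenced in SQL queries.
sqlContext.dropTempTable("tempPeople")
```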
public SchemaRDD sql(String sqlText)
public SchemaRDD table(String tableName)
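sql and table both return SchemaRDDs, so query results and whole tables compose with further relational operators. A brief sketch; it assumes a temporary table named "people" has already been registered:

```scala
val viaSql: SchemaRDD = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13")
val viaTable: SchemaRDD = sqlContext.table("people")   // the whole table as a SchemaRDD

viaSql.registerTempTable("teenagersAndUp")             // results can themselves be registered
```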
public scala.collection.Seq<org.apache.spark.sql.catalyst.planning.GenericStrategy<SparkPlan>> extraStrategies()
public org.apache.spark.sql.catalyst.types.DataType parseDataType(String dataTypeString)
Parses the data type in our internal string representation. The data type string should have the same format as the one generated by toString in scala. It is only used by PySpark.

public SchemaRDD applySchemaToPythonRDD(RDD<Object[]> rdd, String schemaString)

Apply a schema defined by the schemaString to an RDD. It is only used by PySpark.