public class SQLContext extends Object implements Logging, scala.Serializable
As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.
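For new code, the equivalent entry point is obtained through the SparkSession builder; the snippet below is a minimal migration sketch (the application name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("legacy-sqlcontext-example") // illustrative name
  .getOrCreate()

// Legacy code paths can still obtain a SQLContext from the session.
val sqlContext = spark.sqlContext
```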
Modifier and Type | Class and Description |
---|---|
class | SQLContext.implicits$ (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames. |
Modifier and Type | Method and Description |
---|---|
Dataset<Row> | baseRelationToDataFrame(BaseRelation baseRelation) Convert a BaseRelation created for external data sources into a DataFrame. |
void | cacheTable(String tableName) Caches the specified table in-memory. |
void | clearCache() Removes all cached tables from the in-memory cache. |
Dataset<Row> | createDataFrame(JavaRDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans. |
Dataset<Row> | createDataFrame(JavaRDD<Row> rowRDD, StructType schema) Creates a DataFrame from a JavaRDD containing Rows using the given schema. |
Dataset<Row> | createDataFrame(java.util.List<?> data, Class<?> beanClass) Applies a schema to a List of Java Beans. |
Dataset<Row> | createDataFrame(java.util.List<Row> rows, StructType schema) :: DeveloperApi :: Creates a DataFrame from a java.util.List containing Rows using the given schema. |
Dataset<Row> | createDataFrame(RDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans. |
<A extends scala.Product> Dataset<Row> | createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1) Creates a DataFrame from an RDD of Product (e.g. case classes, tuples). |
Dataset<Row> | createDataFrame(RDD<Row> rowRDD, StructType schema) Creates a DataFrame from an RDD containing Rows using the given schema. |
<A extends scala.Product> Dataset<Row> | createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2) Creates a DataFrame from a local Seq of Product. |
<T> Dataset<T> | createDataset(java.util.List<T> data, Encoder<T> evidence$5) Creates a Dataset from a java.util.List of a given type. |
<T> Dataset<T> | createDataset(RDD<T> data, Encoder<T> evidence$4) Creates a Dataset from an RDD of a given type. |
<T> Dataset<T> | createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3) Creates a Dataset from a local Seq of data of a given type. |
void | dropTempTable(String tableName) Drops the temporary table with the given table name in the catalog. |
Dataset<Row> | emptyDataFrame() Returns a DataFrame with no rows or columns. |
ExperimentalMethods | experimental() :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality. |
scala.collection.immutable.Map<String,String> | getAllConfs() Return all the configuration properties that have been set. |
String | getConf(String key) Return the value of Spark SQL configuration property for the given key. |
String | getConf(String key, String defaultValue) Return the value of Spark SQL configuration property for the given key, or defaultValue if the key is not set yet. |
SQLContext.implicits$ | implicits() Accessor for the nested Scala object. |
boolean | isCached(String tableName) Returns true if the table is currently cached in-memory. |
ExecutionListenerManager | listenerManager() An interface to register custom QueryExecutionListeners that listen for execution metrics. |
SQLContext | newSession() Returns a SQLContext as a new session, with separated SQL configurations, temporary tables, and registered functions, but sharing the same SparkContext, cached data and other things. |
Dataset<Row> | range(long end) Creates a DataFrame with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1. |
Dataset<Row> | range(long start, long end) Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1. |
Dataset<Row> | range(long start, long end, long step) Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value. |
Dataset<Row> | range(long start, long end, long step, int numPartitions) Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified. |
DataFrameReader | read() Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame. |
DataStreamReader | readStream() Returns a DataStreamReader that can be used to read streaming data in as a DataFrame. |
void | setConf(java.util.Properties props) Set Spark SQL configuration properties. |
void | setConf(String key, String value) Set the given Spark SQL configuration property. |
SparkContext | sparkContext() |
SparkSession | sparkSession() |
Dataset<Row> | sql(String sqlText) Executes a SQL query using Spark, returning the result as a DataFrame. |
StreamingQueryManager | streams() Returns a StreamingQueryManager that allows managing all the StreamingQueries active on this context. |
Dataset<Row> | table(String tableName) Returns the specified table as a DataFrame. |
String[] | tableNames() Returns the names of tables in the current database as an array. |
String[] | tableNames(String databaseName) Returns the names of tables in the given database as an array. |
Dataset<Row> | tables() Returns a DataFrame containing names of existing tables in the current database. |
Dataset<Row> | tables(String databaseName) Returns a DataFrame containing names of existing tables in the given database. |
UDFRegistration | udf() A collection of methods for registering user-defined functions (UDF). |
void | uncacheTable(String tableName) Removes the specified table from the in-memory cache. |
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeForcefully, initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public SQLContext.implicits$ implicits()
public SparkSession sparkSession()
public SparkContext sparkContext()
public SQLContext newSession()
Returns a SQLContext as a new session, with separated SQL configurations, temporary tables, and registered functions, but sharing the same SparkContext, cached data and other things.
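A minimal sketch of how separate sessions isolate SQL configuration while sharing the underlying SparkContext (the configuration key is only an example):

```scala
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val isolated = sqlContext.newSession()

// Configuration changes made on the new session do not affect the original one.
isolated.setConf("spark.sql.shuffle.partitions", "4")
println(isolated.getConf("spark.sql.shuffle.partitions"))   // "4"
println(sqlContext.getConf("spark.sql.shuffle.partitions")) // unchanged
```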
public ExecutionListenerManager listenerManager()
An interface to register custom QueryExecutionListeners that listen for execution metrics.
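As a sketch (not part of the original docs), a listener can be registered like this; the println bodies are purely illustrative:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

sqlContext.listenerManager.register(new QueryExecutionListener {
  // Called after an action completes successfully.
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName finished in ${durationNs / 1e6} ms")
  // Called when query execution fails.
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
})
```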
public void setConf(java.util.Properties props)
Set Spark SQL configuration properties.
Parameters:
props - (undocumented)

public void setConf(String key, String value)
Set the given Spark SQL configuration property.
Parameters:
key - (undocumented)
value - (undocumented)

public String getConf(String key)
Return the value of Spark SQL configuration property for the given key.
Parameters:
key - (undocumented)

public String getConf(String key, String defaultValue)
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.
Parameters:
key - (undocumented)
defaultValue - (undocumented)

public scala.collection.immutable.Map<String,String> getAllConfs()
Return all the configuration properties that have been set.
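A small illustrative round-trip through the configuration API (the property names below are examples, not a recommendation):

```scala
sqlContext.setConf("spark.sql.shuffle.partitions", "8")
sqlContext.getConf("spark.sql.shuffle.partitions")          // "8"
sqlContext.getConf("spark.sql.nonexistent.key", "fallback") // "fallback"

// Dump everything that has been set explicitly.
sqlContext.getAllConfs.foreach { case (k, v) => println(s"$k = $v") }
```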
public ExperimentalMethods experimental()
public Dataset<Row> emptyDataFrame()
Returns a DataFrame with no rows or columns.
public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDF).

The following example registers a Scala closure as UDF:

sqlContext.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1)

The following example registers a UDF in Java:

sqlContext.udf().register("myUDF",
    (Integer arg1, String arg2) -> arg2 + arg1,
    DataTypes.StringType);
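Once registered, the function can be referenced from SQL; the view and column names here are illustrative:

```scala
sqlContext.sql("SELECT myUDF(age, name) FROM people")
```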
public boolean isCached(String tableName)
Returns true if the table is currently cached in-memory.
Parameters:
tableName - (undocumented)

public void cacheTable(String tableName)
Caches the specified table in-memory.
Parameters:
tableName - (undocumented)

public void uncacheTable(String tableName)
Removes the specified table from the in-memory cache.
Parameters:
tableName - (undocumented)

public void clearCache()
Removes all cached tables from the in-memory cache.
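A sketch of the caching lifecycle, assuming a temporary view named people has already been registered:

```scala
sqlContext.cacheTable("people")
assert(sqlContext.isCached("people"))

sqlContext.uncacheTable("people") // drop just this table from the cache
sqlContext.clearCache()           // or drop every cached table
```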
public <A extends scala.Product> Dataset<Row> createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
Parameters:
rdd - (undocumented)
evidence$1 - (undocumented)

public <A extends scala.Product> Dataset<Row> createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
Creates a DataFrame from a local Seq of Product.
Parameters:
data - (undocumented)
evidence$2 - (undocumented)
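For illustration, assuming a SparkContext named sc is in scope, both overloads accept case-class data (Person is a made-up example type):

```scala
case class Person(name: String, age: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// From a local Seq of Products...
val df1 = sqlContext.createDataFrame(Seq(Person("Alice", 29), Person("Bob", 31)))
// ...or from an RDD of Products.
val df2 = sqlContext.createDataFrame(sc.parallelize(Seq(Person("Carol", 25))))
df1.show()
```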
public Dataset<Row> baseRelationToDataFrame(BaseRelation baseRelation)
Convert a BaseRelation created for external data sources into a DataFrame.
Parameters:
baseRelation - (undocumented)
public Dataset<Row> createDataFrame(RDD<Row> rowRDD, StructType schema)
Creates a DataFrame from an RDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema. Otherwise, there will be a runtime exception.

Example:

import org.apache.spark.sql._
import org.apache.spark.sql.types._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val schema =
  StructType(
    StructField("name", StringType, false) ::
    StructField("age", IntegerType, true) :: Nil)

val people =
  sc.textFile("examples/src/main/resources/people.txt").map(
    _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val dataFrame = sqlContext.createDataFrame(people, schema)
dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

dataFrame.createOrReplaceTempView("people")
sqlContext.sql("select name from people").collect.foreach(println)

Parameters:
rowRDD - (undocumented)
schema - (undocumented)
public <T> Dataset<T> createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3)
Creates a Dataset from a local Seq of data of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.

== Example ==

import spark.implicits._
case class Person(name: String, age: Long)
val data = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19))
val ds = spark.createDataset(data)

ds.show()
// +-------+---+
// |   name|age|
// +-------+---+
// |Michael| 29|
// |   Andy| 30|
// | Justin| 19|
// +-------+---+

Parameters:
data - (undocumented)
evidence$3 - (undocumented)
public <T> Dataset<T> createDataset(RDD<T> data, Encoder<T> evidence$4)
Creates a Dataset from an RDD of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
Parameters:
data - (undocumented)
evidence$4 - (undocumented)
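A minimal sketch using an explicit encoder (the sample strings are arbitrary):

```scala
import org.apache.spark.sql.Encoders

val rdd = sc.parallelize(Seq("hello", "world"))
val ds = sqlContext.createDataset(rdd)(Encoders.STRING)
ds.show()
```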
public <T> Dataset<T> createDataset(java.util.List<T> data, Encoder<T> evidence$5)
Creates a Dataset from a java.util.List of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.

== Java Example ==

List<String> data = Arrays.asList("hello", "world");
Dataset<String> ds = spark.createDataset(data, Encoders.STRING());

Parameters:
data - (undocumented)
evidence$5 - (undocumented)
public Dataset<Row> createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
Creates a DataFrame from a JavaRDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema. Otherwise, there will be a runtime exception.
Parameters:
rowRDD - (undocumented)
schema - (undocumented)
public Dataset<Row> createDataFrame(java.util.List<Row> rows, StructType schema)
Creates a DataFrame from a java.util.List containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided List matches the provided schema. Otherwise, there will be a runtime exception.
Parameters:
rows - (undocumented)
schema - (undocumented)
public Dataset<Row> createDataFrame(RDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
Parameters:
rdd - (undocumented)
beanClass - (undocumented)
public Dataset<Row> createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
Parameters:
rdd - (undocumented)
beanClass - (undocumented)
public Dataset<Row> createDataFrame(java.util.List<?> data, Class<?> beanClass)
Applies a schema to a List of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
Parameters:
data - (undocumented)
beanClass - (undocumented)
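A sketch with a Java-Bean-style class written in Scala (PersonBean is a made-up type; note the field-ordering caveat above):

```scala
import scala.beans.BeanProperty

class PersonBean extends Serializable {
  @BeanProperty var name: String = _
  @BeanProperty var age: Int = _
}

val bean = new PersonBean
bean.setName("Alice")
bean.setAge(29)

// The schema is inferred from the bean's getters/setters.
val df = sqlContext.createDataFrame(java.util.Arrays.asList(bean), classOf[PersonBean])
df.show()
```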
public DataFrameReader read()
Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame.

sqlContext.read.parquet("/path/to/file.parquet")
sqlContext.read.schema(schema).json("/path/to/file.json")
public DataStreamReader readStream()
Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.

sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
sparkSession.readStream.schema(schema).json("/path/to/directory/of/json/files")
public void dropTempTable(String tableName)
Drops the temporary table with the given table name in the catalog.
Parameters:
tableName - the name of the table to be unregistered.

public Dataset<Row> range(long end)
Creates a DataFrame with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1.
Parameters:
end - (undocumented)
public Dataset<Row> range(long start, long end)
Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1.
Parameters:
start - (undocumented)
end - (undocumented)
public Dataset<Row> range(long start, long end, long step)
Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value.
Parameters:
start - (undocumented)
end - (undocumented)
step - (undocumented)
public Dataset<Row> range(long start, long end, long step, int numPartitions)
Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified.
Parameters:
start - (undocumented)
end - (undocumented)
step - (undocumented)
numPartitions - (undocumented)
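A few illustrative calls; each returns a single-column DataFrame named id:

```scala
sqlContext.range(5).show()            // 0, 1, 2, 3, 4
sqlContext.range(2, 6).show()         // 2, 3, 4, 5
sqlContext.range(0, 10, 3).show()     // 0, 3, 6, 9
sqlContext.range(0, 10, 2, 4).count() // 5 rows, spread over 4 partitions
```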
public Dataset<Row> sql(String sqlText)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
Parameters:
sqlText - (undocumented)
public Dataset<Row> table(String tableName)
Returns the specified table as a DataFrame.
Parameters:
tableName - (undocumented)
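A sketch of querying a previously registered temporary view named people (the view and predicate are illustrative):

```scala
val adults = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")
adults.show()

// The same view can also be loaded directly, without writing SQL.
val people = sqlContext.table("people")
```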
public Dataset<Row> tables()
Returns a DataFrame containing names of existing tables in the current database. The returned DataFrame has two columns, tableName and isTemporary (a Boolean indicating if a table is a temporary one or not).
public Dataset<Row> tables(String databaseName)
Returns a DataFrame containing names of existing tables in the given database. The returned DataFrame has two columns, tableName and isTemporary (a Boolean indicating if a table is a temporary one or not).
Parameters:
databaseName - (undocumented)
public StreamingQueryManager streams()
Returns a StreamingQueryManager that allows managing all the StreamingQueries active on this context.
public String[] tableNames()
Returns the names of tables in the current database as an array.

public String[] tableNames(String databaseName)
Returns the names of tables in the given database as an array.
Parameters:
databaseName - (undocumented)
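For illustration, listing catalog contents either as a plain array of names or as a DataFrame (the "default" database name is assumed):

```scala
sqlContext.tableNames().foreach(println)
sqlContext.tableNames("default").foreach(println)
sqlContext.tables().show() // columns: tableName, isTemporary
```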