public class SQLContext
extends java.lang.Object
implements scala.Serializable
As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.
| Modifier and Type | Class and Description |
| --- | --- |
| class | SQLContext.implicits$ - :: Experimental :: (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames. |
| Constructor and Description |
| --- |
| SQLContext(JavaSparkContext sparkContext) |
| SQLContext(SparkContext sc) |
| Modifier and Type | Method and Description |
| --- | --- |
| protected Dataset<Row> | applySchemaToPythonRDD(RDD<java.lang.Object[]> rdd, java.lang.String schemaString) - Apply a schema defined by the schemaString to an RDD. |
| protected Dataset<Row> | applySchemaToPythonRDD(RDD<java.lang.Object[]> rdd, StructType schema) - Apply a schema defined by the schema to an RDD. |
| Dataset<Row> | baseRelationToDataFrame(BaseRelation baseRelation) - Convert a BaseRelation created for external data sources into a DataFrame. |
| protected org.apache.spark.sql.execution.CacheManager | cacheManager() |
| void | cacheTable(java.lang.String tableName) - Caches the specified table in-memory. |
| static void | clearActive() - Clears the active SQLContext for the current thread. |
| void | clearCache() - Removes all cached tables from the in-memory cache. |
| protected org.apache.spark.sql.internal.SQLConf | conf() |
| Dataset<Row> | createDataFrame(JavaRDD<?> rdd, java.lang.Class<?> beanClass) - Applies a schema to an RDD of Java Beans. |
| Dataset<Row> | createDataFrame(JavaRDD<Row> rowRDD, StructType schema) |
| Dataset<Row> | createDataFrame(java.util.List<?> data, java.lang.Class<?> beanClass) - Applies a schema to a List of Java Beans. |
| Dataset<Row> | createDataFrame(java.util.List<Row> rows, StructType schema) |
| Dataset<Row> | createDataFrame(RDD<?> rdd, java.lang.Class<?> beanClass) - Applies a schema to an RDD of Java Beans. |
| <A extends scala.Product> Dataset<Row> | createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1) - :: Experimental :: Creates a DataFrame from an RDD of Product (e.g. case classes, tuples). |
| Dataset<Row> | createDataFrame(RDD<Row> rowRDD, StructType schema) |
| <A extends scala.Product> Dataset<Row> | createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2) - :: Experimental :: Creates a DataFrame from a local Seq of Product. |
| <T> Dataset<T> | createDataset(java.util.List<T> data, Encoder<T> evidence$5) |
| <T> Dataset<T> | createDataset(RDD<T> data, Encoder<T> evidence$4) |
| <T> Dataset<T> | createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3) |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String path) - :: Experimental :: Creates an external table from the given path and returns the corresponding DataFrame. |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String source, java.util.Map<java.lang.String,java.lang.String> options) - :: Experimental :: Creates an external table from the given path based on a data source and a set of options. |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String source, scala.collection.immutable.Map<java.lang.String,java.lang.String> options) - :: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source and a set of options. |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String path, java.lang.String source) - :: Experimental :: Creates an external table from the given path based on a data source and returns the corresponding DataFrame. |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String source, StructType schema, java.util.Map<java.lang.String,java.lang.String> options) - :: Experimental :: Creates an external table from the given path based on a data source, a schema and a set of options. |
| Dataset<Row> | createExternalTable(java.lang.String tableName, java.lang.String source, StructType schema, scala.collection.immutable.Map<java.lang.String,java.lang.String> options) - :: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source, a schema and a set of options. |
| void | dropTempTable(java.lang.String tableName) - Drops the temporary table with the given table name in the catalog. |
| Dataset<Row> | emptyDataFrame() - :: Experimental :: Returns a DataFrame with no rows or columns. |
| protected org.apache.spark.sql.execution.QueryExecution | executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
| protected org.apache.spark.sql.execution.QueryExecution | executeSql(java.lang.String sql) |
| ExperimentalMethods | experimental() - :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality. |
| protected org.apache.spark.sql.catalyst.catalog.ExternalCatalog | externalCatalog() |
| scala.collection.immutable.Map<java.lang.String,java.lang.String> | getAllConfs() - Return all the configuration properties that have been set (i.e. not the default). |
| java.lang.String | getConf(java.lang.String key) - Return the value of Spark SQL configuration property for the given key. |
| java.lang.String | getConf(java.lang.String key, java.lang.String defaultValue) - Return the value of Spark SQL configuration property for the given key. |
| static SQLContext | getOrCreate(SparkContext sparkContext) - Get the singleton SQLContext if it exists or create a new one using the given SparkContext. |
| SQLContext.implicits$ | implicits() - Accessor for nested Scala object. |
| protected static void | initializeLogIfNecessary(boolean isInterpreter) |
| boolean | isCached(java.lang.String tableName) - Returns true if the table is currently cached in-memory. |
| boolean | isRootContext() |
| protected static boolean | isTraceEnabled() |
| protected org.apache.spark.sql.execution.ui.SQLListener | listener() |
| ExecutionListenerManager | listenerManager() - An interface to register custom QueryExecutionListeners that listen for execution metrics. |
| protected static org.slf4j.Logger | log() |
| protected static void | logDebug(scala.Function0<java.lang.String> msg) |
| protected static void | logDebug(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
| protected static void | logError(scala.Function0<java.lang.String> msg) |
| protected static void | logError(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
| protected static void | logInfo(scala.Function0<java.lang.String> msg) |
| protected static void | logInfo(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
| protected static java.lang.String | logName() |
| protected static void | logTrace(scala.Function0<java.lang.String> msg) |
| protected static void | logTrace(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
| protected static void | logWarning(scala.Function0<java.lang.String> msg) |
| protected static void | logWarning(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
| SQLContext | newSession() - Returns a SQLContext as a new session, with separate SQL configurations, temporary tables and registered functions, but sharing the same SparkContext, cached data and other state. |
| protected DataType | parseDataType(java.lang.String dataTypeString) - Parses the data type in our internal string representation. |
| protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | parseSql(java.lang.String sql) |
| Dataset<java.lang.Long> | range(long end) - :: Experimental :: Creates a Dataset with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1. |
| Dataset<java.lang.Long> | range(long start, long end) - :: Experimental :: Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1. |
| Dataset<java.lang.Long> | range(long start, long end, long step) - :: Experimental :: Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value. |
| Dataset<java.lang.Long> | range(long start, long end, long step, int numPartitions) - :: Experimental :: Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified. |
| DataFrameReader | read() - :: Experimental :: Returns a DataFrameReader that can be used to read data and streams in as a DataFrame. |
| protected RuntimeConfig | runtimeConf() |
| protected org.apache.spark.sql.internal.SessionState | sessionState() |
| static void | setActive(SQLContext sqlContext) - Changes the SQLContext that will be returned in this thread and its children when SQLContext.getOrCreate() is called. |
| void | setConf(java.util.Properties props) - Set Spark SQL configuration properties. |
| void | setConf(java.lang.String key, java.lang.String value) - Set the given Spark SQL configuration property. |
| protected org.apache.spark.sql.internal.SharedState | sharedState() |
| SparkContext | sparkContext() |
| SparkSession | sparkSession() |
| Dataset<Row> | sql(java.lang.String sqlText) - Executes a SQL query using Spark, returning the result as a DataFrame. |
| ContinuousQueryManager | streams() - Returns a ContinuousQueryManager that allows managing all the ContinuousQueries active on this context. |
| Dataset<Row> | table(java.lang.String tableName) - Returns the specified table as a DataFrame. |
| java.lang.String[] | tableNames() - Returns the names of tables in the current database as an array. |
| java.lang.String[] | tableNames(java.lang.String databaseName) - Returns the names of tables in the given database as an array. |
| Dataset<Row> | tables() - Returns a DataFrame containing names of existing tables in the current database. |
| Dataset<Row> | tables(java.lang.String databaseName) - Returns a DataFrame containing names of existing tables in the given database. |
| UDFRegistration | udf() - A collection of methods for registering user-defined functions (UDF). |
| void | uncacheTable(java.lang.String tableName) - Removes the specified table from the in-memory cache. |
public SQLContext(SparkContext sc)
public SQLContext(JavaSparkContext sparkContext)
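A minimal Scala sketch of constructing a SQLContext from an existing SparkContext; the SparkConf settings are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical application setup; any existing SparkContext works here.
val conf = new SparkConf().setAppName("sqlcontext-example").setMaster("local[*]")
val sc = new SparkContext(conf)

// Wrap the SparkContext. In Spark 2.0+ prefer SparkSession instead of SQLContext.
val sqlContext = new SQLContext(sc)
```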
public static SQLContext getOrCreate(SparkContext sparkContext)
This function can be used to create a singleton SQLContext object that can be shared across the JVM.
If there is an active SQLContext for the current thread, it will be returned instead of the global one.

public static void setActive(SQLContext sqlContext)
Changes the SQLContext that will be returned in this thread and its children when SQLContext.getOrCreate() is called.

public static void clearActive()
Clears the active SQLContext for the current thread.
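A short Scala sketch of the active-context pattern described above, assuming an existing SparkContext named sc:

```scala
// Obtain (or lazily create) the JVM-wide singleton SQLContext.
val sqlContext = SQLContext.getOrCreate(sc)

// Make a specific instance the active one for this thread and its children.
SQLContext.setActive(sqlContext)

// Subsequent getOrCreate calls on this thread now return the active instance.
assert(SQLContext.getOrCreate(sc) eq sqlContext)

// Remove the thread-local association again.
SQLContext.clearActive()
```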
protected static java.lang.String logName()
protected static org.slf4j.Logger log()
protected static void logInfo(scala.Function0<java.lang.String> msg)
protected static void logDebug(scala.Function0<java.lang.String> msg)
protected static void logTrace(scala.Function0<java.lang.String> msg)
protected static void logWarning(scala.Function0<java.lang.String> msg)
protected static void logError(scala.Function0<java.lang.String> msg)
protected static void logInfo(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logDebug(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logTrace(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logWarning(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logError(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static boolean isTraceEnabled()
protected static void initializeLogIfNecessary(boolean isInterpreter)
public SparkSession sparkSession()
public boolean isRootContext()
protected org.apache.spark.sql.internal.SessionState sessionState()
protected org.apache.spark.sql.internal.SharedState sharedState()
protected org.apache.spark.sql.internal.SQLConf conf()
protected RuntimeConfig runtimeConf()
protected org.apache.spark.sql.execution.CacheManager cacheManager()
protected org.apache.spark.sql.execution.ui.SQLListener listener()
protected org.apache.spark.sql.catalyst.catalog.ExternalCatalog externalCatalog()
public SparkContext sparkContext()
public SQLContext newSession()
Returns a SQLContext as a new session, with separate SQL configurations, temporary tables and registered functions, but sharing the same SparkContext, cached data and other state.
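A hedged Scala sketch of session isolation; the configuration key is a real Spark SQL property used only for illustration:

```scala
// Two sessions share the SparkContext but keep their own SQL configuration,
// temporary tables and registered functions.
val session1 = sqlContext.newSession()
val session2 = sqlContext.newSession()

session1.setConf("spark.sql.shuffle.partitions", "4")
// session2 is unaffected; configurations are per-session.

assert(session1.sparkContext eq session2.sparkContext)
```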
public ExecutionListenerManager listenerManager()
An interface to register custom QueryExecutionListeners that listen for execution metrics.

public void setConf(java.util.Properties props)
Set Spark SQL configuration properties.

public void setConf(java.lang.String key, java.lang.String value)
Set the given Spark SQL configuration property.

public java.lang.String getConf(java.lang.String key)
Return the value of Spark SQL configuration property for the given key.

public java.lang.String getConf(java.lang.String key, java.lang.String defaultValue)
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.

public scala.collection.immutable.Map<java.lang.String,java.lang.String> getAllConfs()
Return all the configuration properties that have been set (i.e. not the default).
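A small Scala sketch of configuration access; spark.sql.shuffle.partitions is a real property, while the second key is a placeholder used only to show the default fallback:

```scala
sqlContext.setConf("spark.sql.shuffle.partitions", "8")

val n = sqlContext.getConf("spark.sql.shuffle.partitions")          // "8"
val other = sqlContext.getConf("example.unset.key", "fallback")     // returns the default

// All explicitly set properties as an immutable Scala Map.
val all: Map[String, String] = sqlContext.getAllConfs
```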
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan parseSql(java.lang.String sql)
protected org.apache.spark.sql.execution.QueryExecution executeSql(java.lang.String sql)
protected org.apache.spark.sql.execution.QueryExecution executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
public ExperimentalMethods experimental()
public Dataset<Row> emptyDataFrame()
Returns a DataFrame with no rows or columns.
public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDF).

The following example registers a Scala closure as UDF:

    sqlContext.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1)

The following example registers a UDF in Java:

    sqlContext.udf().register("myUDF",
        new UDF2<Integer, String, String>() {
            @Override
            public String call(Integer arg1, String arg2) {
                return arg2 + arg1;
            }
        }, DataTypes.StringType);

Or, to use Java 8 lambda syntax:

    sqlContext.udf().register("myUDF",
        (Integer arg1, String arg2) -> arg2 + arg1,
        DataTypes.StringType);
public boolean isCached(java.lang.String tableName)
Returns true if the table is currently cached in-memory.

public void cacheTable(java.lang.String tableName)
Caches the specified table in-memory.

public void uncacheTable(java.lang.String tableName)
Removes the specified table from the in-memory cache.

public void clearCache()
Removes all cached tables from the in-memory cache.
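A hedged Scala sketch of the caching methods above; it assumes a temporary view named "people" has already been registered:

```scala
sqlContext.cacheTable("people")
assert(sqlContext.isCached("people"))

sqlContext.uncacheTable("people")   // drop just this table from the cache
sqlContext.clearCache()             // or drop every cached table
```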
public SQLContext.implicits$ implicits()
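A brief Scala sketch of using the nested implicits object to build a DataFrame from a local collection; the column names and data are illustrative:

```scala
import sqlContext.implicits._

// toDF is provided by the implicit conversions in SQLContext.implicits$.
val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")
df.show()
```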
public <A extends scala.Product> Dataset<Row> createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).

public <A extends scala.Product> Dataset<Row> createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
Creates a DataFrame from a local Seq of Product.

public Dataset<Row> baseRelationToDataFrame(BaseRelation baseRelation)
Convert a BaseRelation created for external data sources into a DataFrame.

public Dataset<Row> createDataFrame(RDD<Row> rowRDD, StructType schema)
Creates a DataFrame from an RDD containing Rows using the given schema.
It is important to make sure that the structure of every Row of the provided RDD matches
the provided schema. Otherwise, a runtime exception will be thrown.

Example:

    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val schema =
      StructType(
        StructField("name", StringType, false) ::
        StructField("age", IntegerType, true) :: Nil)

    val people =
      sc.textFile("examples/src/main/resources/people.txt").map(
        _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
    val dataFrame = sqlContext.createDataFrame(people, schema)
    dataFrame.printSchema
    // root
    // |-- name: string (nullable = false)
    // |-- age: integer (nullable = true)

    dataFrame.createOrReplaceTempView("people")
    sqlContext.sql("select name from people").collect.foreach(println)

public <T> Dataset<T> createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3)
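A hedged Scala sketch of the Product- and Encoder-based factory methods; the Person case class and its fields are illustrative:

```scala
import sqlContext.implicits._   // brings implicit Encoders into scope

case class Person(name: String, age: Int)

// DataFrame from a local Seq of Products (case classes).
val df = sqlContext.createDataFrame(Seq(Person("Alice", 29), Person("Bob", 31)))

// Strongly typed Dataset from the same data, using an implicit Encoder[Person].
val ds = sqlContext.createDataset(Seq(Person("Alice", 29), Person("Bob", 31)))
ds.filter(_.age > 30).show()
```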
public Dataset<Row> createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
Creates a DataFrame from a JavaRDD containing Rows using the given schema.
It is important to make sure that the structure of every Row of the provided RDD matches
the provided schema. Otherwise, a runtime exception will be thrown.

public Dataset<Row> createDataFrame(java.util.List<Row> rows, StructType schema)
Creates a DataFrame from a List containing Rows using the given schema.
It is important to make sure that the structure of every Row of the provided List matches
the provided schema. Otherwise, a runtime exception will be thrown.

public Dataset<Row> createDataFrame(RDD<?> rdd, java.lang.Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.

public Dataset<Row> createDataFrame(JavaRDD<?> rdd, java.lang.Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.

public Dataset<Row> createDataFrame(java.util.List<?> data, java.lang.Class<?> beanClass)
Applies a schema to a List of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.

public DataFrameReader read()
Returns a DataFrameReader that can be used to read data and streams in as a DataFrame.

    sqlContext.read.parquet("/path/to/file.parquet")
    sqlContext.read.schema(schema).json("/path/to/file.json")
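A short Scala sketch of the equivalent generic DataFrameReader path; the format names are real built-in sources, while the file paths and option value are placeholders:

```scala
// Generic form of the shorthand calls above.
val parquetDF = sqlContext.read.format("parquet").load("/path/to/file.parquet")

// Options are passed as string key/value pairs.
val jsonDF = sqlContext.read
  .format("json")
  .option("samplingRatio", "0.5")
  .load("/path/to/file.json")
```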
public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String path)
Creates an external table from the given path and returns the corresponding DataFrame.

public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String path, java.lang.String source)
Creates an external table from the given path based on a data source and returns the corresponding DataFrame.

public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String source, java.util.Map<java.lang.String,java.lang.String> options)
Creates an external table from the given path based on a data source and a set of options.

public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String source, scala.collection.immutable.Map<java.lang.String,java.lang.String> options)
(Scala-specific) Creates an external table from the given path based on a data source and a set of options.

public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String source, StructType schema, java.util.Map<java.lang.String,java.lang.String> options)
Creates an external table from the given path based on a data source, a schema and a set of options.

public Dataset<Row> createExternalTable(java.lang.String tableName, java.lang.String source, StructType schema, scala.collection.immutable.Map<java.lang.String,java.lang.String> options)
(Scala-specific) Creates an external table from the given path based on a data source, a schema and a set of options.

public void dropTempTable(java.lang.String tableName)
Drops the temporary table with the given table name in the catalog.
tableName - the name of the table to be unregistered.

public Dataset<java.lang.Long> range(long end)
Creates a Dataset with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1.
public Dataset<java.lang.Long> range(long start, long end)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1.

public Dataset<java.lang.Long> range(long start, long end, long step)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value.

public Dataset<java.lang.Long> range(long start, long end, long step, int numPartitions)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified.
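A small Scala sketch of the range overloads described above:

```scala
val ids = sqlContext.range(5)                 // id values 0, 1, 2, 3, 4
val evens = sqlContext.range(0, 10, 2)        // 0, 2, 4, 6, 8
val spread = sqlContext.range(0, 100, 1, 8)   // 100 rows in 8 partitions

evens.show()
```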
public Dataset<Row> sql(java.lang.String sqlText)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is
used for SQL parsing can be configured with 'spark.sql.dialect'.

public Dataset<Row> table(java.lang.String tableName)
Returns the specified table as a DataFrame.
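A hedged Scala sketch of querying a registered table; the "people" temporary view and its columns are assumed to exist:

```scala
// Run SQL text against the registered view.
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
adults.show()

// Look the same view up through the catalog instead of SQL text.
val people = sqlContext.table("people")
```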
public Dataset<Row> tables()
Returns a DataFrame containing names of existing tables in the current database.
The returned DataFrame has two columns, tableName and isTemporary (a Boolean
indicating if a table is a temporary one or not).

public Dataset<Row> tables(java.lang.String databaseName)
Returns a DataFrame containing names of existing tables in the given database.
The returned DataFrame has two columns, tableName and isTemporary (a Boolean
indicating if a table is a temporary one or not).

public ContinuousQueryManager streams()
Returns a ContinuousQueryManager that allows managing all the ContinuousQueries active on this context.
public java.lang.String[] tableNames()
public java.lang.String[] tableNames(java.lang.String databaseName)
Returns the names of tables in the given database as an array.

protected DataType parseDataType(java.lang.String dataTypeString)
Parses the data type in our internal string representation. The data type string should have
the same format as the one generated by toString in Scala. It is only used by PySpark.

protected Dataset<Row> applySchemaToPythonRDD(RDD<java.lang.Object[]> rdd, java.lang.String schemaString)
Apply a schema defined by the schemaString to an RDD.

protected Dataset<Row> applySchemaToPythonRDD(RDD<java.lang.Object[]> rdd, StructType schema)
Apply a schema defined by the schema to an RDD.