Overrides QueryExecution with a special debug workflow.
Analyzes the given table in the current database to generate statistics, which will be used in query optimizations.
Right now, it only supports Hive tables and it only updates the size of a Hive table in the Hive metastore.
:: DeveloperApi :: Creates a SchemaRDD from an RDD containing Rows by applying a schema to this RDD. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise there will be a runtime exception. Example:
import org.apache.spark.sql._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val schema =
  StructType(
    StructField("name", StringType, false) ::
    StructField("age", IntegerType, true) :: Nil)

val people =
  sc.textFile("examples/src/main/resources/people.txt").map(
    _.split(",")).map(p => Row(p(0), p(1).trim.toInt))

val peopleSchemaRDD = sqlContext.applySchema(people, schema)
peopleSchemaRDD.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

peopleSchemaRDD.registerTempTable("people")
sqlContext.sql("select name from people").collect.foreach(println)
Caches the specified table in-memory.
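For example, assuming this documents SQLContext.cacheTable and that a table named "people" has already been registered (a minimal sketch):

// Cache the registered table; later queries against it read from the in-memory columnar cache.
sqlContext.cacheTable("people")
sqlContext.sql("SELECT name FROM people").collect()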
Sets up the system initially or after a RESET command.
:: Experimental :: Creates an empty parquet file with the schema of class A, which can be registered as a table. This registered table can be used as the target of future insertInto operations.
val sqlContext = new SQLContext(...)
import sqlContext._

case class Person(name: String, age: Int)
createParquetFile[Person]("path/to/file.parquet").registerTempTable("people")
sql("INSERT INTO people SELECT 'michael', 29")
A case class type that describes the desired schema of the parquet file to be created.
The path where the directory containing parquet metadata should be created. Data inserted into this table will also be stored at this location.
When false, an exception will be thrown if this directory already exists.
A Hadoop configuration object that can be used to specify options to the parquet output format.
Creates a SchemaRDD from an RDD of case classes.
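A minimal sketch, assuming this documents the implicit createSchemaRDD conversion and that sc is an existing SparkContext (Person is an illustrative case class):

import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

case class Person(name: String, age: Int)
// The RDD of case classes is converted implicitly; the schema is inferred via reflection.
val people: SchemaRDD = sc.parallelize(Seq(Person("michael", 29), Person("andy", 30)))
people.registerTempTable("people")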
Creates a table using the schema of the given class.
A case class that is used to describe the schema of the table to be created.
The name of the table to create.
When false, an exception will be thrown if the table already exists.
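Assuming this documents HiveContext.createTable[A](tableName, allowExisting) (the method and parameter names are assumptions here), a usage sketch might look like:

// hiveContext is an assumed HiveContext instance; Person is an illustrative case class.
case class Person(name: String, age: Int)
// Creates an empty Hive table named "people" whose columns are derived from Person,
// failing if the table already exists.
hiveContext.createTable[Person]("people", allowExisting = false)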
Drops the temporary table with the given table name in the catalog. If the table has been cached/persisted before, it is also unpersisted.
The name of the table to be unregistered.
:: DeveloperApi :: Allows extra strategies to be injected into the query planner at runtime. Note that this API should be considered experimental and is not intended to be stable across releases.
Return all the configuration properties that have been set (i.e. not the default). This creates a new copy of the config properties in the form of a Map.
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.
Return the value of Spark SQL configuration property for the given key.
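For example, assuming these document SQLContext.getConf and getAllConfs (the property key shown is just an illustration):

// Look up a property, falling back to a default when it has not been set.
val numPartitions = sqlContext.getConf("spark.sql.shuffle.partitions", "200")
// Snapshot of every property that has been set explicitly.
val allConfs: Map[String, String] = sqlContext.getAllConfs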
The location of the Hive source code.
The location of the compiled Hive distribution.
SQLConf and HiveConf contracts:
1. Reuse the existing started SessionState if any.
2. When the Hive session is first initialized, params in HiveConf will get picked up by the SQLConf. Additionally, any properties set by set() or a SET command inside sql() will be set in the SQLConf *as well as* in the HiveConf.
Returns true if the table is currently cached in-memory.
:: Experimental :: Loads a JSON file (one object per line) and applies the given schema, returning the result as a SchemaRDD.
Loads a JSON file (one object per line), returning the result as a SchemaRDD. It goes through the entire dataset once to determine the schema.
:: Experimental :: Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a SchemaRDD.
Loads an RDD[String] storing JSON objects (one object per record), returning the result as a SchemaRDD. It goes through the entire dataset once to determine the schema.
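A minimal sketch of both JSON entry points, assuming they document SQLContext.jsonFile and SQLContext.jsonRDD (the file path and sample record are illustrative):

// Infer the schema by scanning the dataset once.
val peopleFromFile = sqlContext.jsonFile("examples/src/main/resources/people.json")

// Build a SchemaRDD from an RDD[String] of JSON records.
val jsonRecords = sc.parallelize("""{"name":"michael","age":29}""" :: Nil)
val peopleFromRdd = sqlContext.jsonRDD(jsonRecords)
peopleFromRdd.printSchema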
:: DeveloperApi :: Allows catalyst LogicalPlans to be executed as a SchemaRDD. Note that the LogicalPlan interface is considered internal, and thus not guaranteed to be stable. As a result, using them directly is not recommended.
Records the UDFs present when the server starts, so we can delete ones that are created by tests.
Loads a Parquet file, returning the result as a SchemaRDD.
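For example, assuming this documents SQLContext.parquetFile (the path is illustrative):

val parquetData = sqlContext.parquetFile("path/to/file.parquet")
parquetData.registerTempTable("parquetTable")
sqlContext.sql("SELECT * FROM parquetTable").collect()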
Prepares a planned SparkPlan for execution by inserting shuffle operations as needed.
The registerFunction overloads for functions of 1 to 22 arguments were generated by this script:
(1 to 22).map { x =>
  val types = (1 to x).map(x => "_").reduce(_ + ", " + _)
  s"""
    def registerFunction[T: TypeTag](name: String, func: Function$x[$types, T]): Unit = {
      def builder(e: Seq[Expression]) = ScalaUdf(func, ScalaReflection.schemaFor[T].dataType, e)
      functionRegistry.registerFunction(name, builder)
    }
  """
}
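Calling one of the generated overloads might look like this (a sketch; the UDF name and function body are illustrative):

// Register a one-argument Scala function as a SQL UDF and use it in a query.
sqlContext.registerFunction("strLen", (s: String) => s.length)
sqlContext.sql("SELECT strLen(name) FROM people").collect()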
Registers the given RDD as a temporary table in the catalog. Temporary tables exist only during the lifetime of this instance of SQLContext.
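A sketch, assuming this documents SQLContext.registerRDDAsTable and that peopleSchemaRDD was built earlier (for example via applySchema or createSchemaRDD):

// Make the SchemaRDD queryable by name for the lifetime of this SQLContext.
sqlContext.registerRDDAsTable(peopleSchemaRDD, "people")
sqlContext.sql("SELECT name FROM people WHERE age >= 13").collect()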
Resets the test instance by deleting any tables that have been created. TODO: also clear out UDFs, views, etc.
Execute the command using Hive and return the results as a sequence. Each element in the sequence is one row.
Runs the specified SQL query using Hive.
Set the given Spark SQL configuration property.
Set Spark SQL configuration properties.
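For example, assuming these document the setConf overloads taking a key/value pair and a java.util.Properties object (the keys shown are illustrative):

// Set a single property.
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

// Set several properties at once.
val props = new java.util.Properties()
props.setProperty("spark.sql.dialect", "sql")
sqlContext.setConf(props)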
Only a low degree of contention is expected for conf, thus NOT using ConcurrentHashMap.
Executes a SQL query using Spark, returning the result as a SchemaRDD. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
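For example, assuming a table named "people" has already been registered (a minimal sketch):

// With the default dialect, the query is parsed by Spark SQL's own parser.
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.collect().foreach(println)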
Returns the specified table as a SchemaRDD.
A list of test tables and the DDL required to initialize them. A test table is loaded on demand when a query is run against it.
Removes the specified table from the in-memory cache.
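A sketch pairing this with isCached, assuming these document SQLContext.uncacheTable and SQLContext.isCached (the table name is illustrative):

sqlContext.cacheTable("people")
assert(sqlContext.isCached("people"))
sqlContext.uncacheTable("people")
assert(!sqlContext.isCached("people"))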
A locally running test instance of Spark's Hive execution engine.
Data from testTables will be automatically loaded whenever a query is run over those tables. Calling reset will delete all tables and other state in the database, leaving the database in a "clean" state.
TestHive is the singleton object version of this class because instantiating multiple copies of the Hive metastore seems to lead to weird non-deterministic failures. Therefore, the execution of test cases that rely on TestHive must be serialized.