public class HiveContext extends SQLContext implements Logging
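A minimal usage sketch of this class (the master URL, app name, and the SHOW TABLES query are illustrative assumptions, not part of this API):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextExample {
  public static void main(String[] args) {
    // Local deployment for illustration; adjust master and appName as needed.
    SparkConf conf = new SparkConf().setAppName("HiveContextExample").setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    HiveContext hiveContext = new HiveContext(jsc);

    // HiveQL queries run against tables registered in the Hive metastore.
    DataFrame tables = hiveContext.sql("SHOW TABLES");
    tables.show();

    jsc.stop();
  }
}
```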
Modifier and Type | Class and Description |
---|---|
protected class | HiveContext.QueryExecution Extends QueryExecution with Hive-specific features. |
Nested classes/interfaces inherited from class org.apache.spark.sql.SQLContext: SQLContext.implicits$, SQLContext.SparkPlanner
Constructor and Description |
---|
HiveContext(JavaSparkContext sc) |
HiveContext(SparkContext sc) |
Modifier and Type | Method and Description |
---|---|
protected void | addJar(java.lang.String path) Add a jar to SQLContext |
void | analyze(java.lang.String tableName) Analyzes the given table in the current database to generate statistics, which will be used in query optimizations. |
protected org.apache.spark.sql.catalyst.analysis.Analyzer | analyzer() |
protected org.apache.spark.sql.hive.HiveMetastoreCatalog | catalog() |
protected org.apache.spark.sql.SQLConf | conf() |
protected scala.collection.immutable.Map<java.lang.String,java.lang.String> | configure() Overridden by child classes that need to set configuration before the client init. |
static | CONVERT_CTAS() |
static | CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING() |
static | CONVERT_METASTORE_PARQUET() |
protected boolean | convertCTAS() When true, a table created by a Hive CTAS statement (no USING clause) will be converted to a data source table, using the data source set by spark.sql.sources.default. |
protected boolean | convertMetastoreParquet() When true, enables an experimental feature where metastore tables that use the Parquet SerDe are automatically converted to use the Spark SQL Parquet table scan, instead of the Hive SerDe. |
protected boolean | convertMetastoreParquetWithSchemaMerging() When true, also tries to merge possibly different but compatible Parquet schemas in different Parquet data files. |
protected HiveContext.QueryExecution | executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
protected org.apache.spark.sql.hive.client.ClientWrapper | executionHive() The copy of the Hive client that is used for execution. |
protected org.apache.spark.sql.catalyst.analysis.FunctionRegistry | functionRegistry() |
protected org.apache.spark.sql.catalyst.ParserDialect | getSQLDialect() |
static | HIVE_EXECUTION_VERSION() |
static | HIVE_METASTORE_BARRIER_PREFIXES() |
static | HIVE_METASTORE_JARS() |
static | HIVE_METASTORE_SHARED_PREFIXES() |
static | HIVE_METASTORE_VERSION() |
static | HIVE_THRIFT_SERVER_ASYNC() |
protected org.apache.hadoop.hive.conf.HiveConf | hiveconf() SQLConf and HiveConf contracts (see the method detail below). |
static java.lang.String | hiveExecutionVersion() The version of Hive used internally by Spark SQL. |
protected scala.collection.Seq<java.lang.String> | hiveMetastoreBarrierPrefixes() A comma-separated list of class prefixes that should explicitly be reloaded for each version of Hive that Spark SQL is communicating with. |
protected java.lang.String | hiveMetastoreJars() The location of the jars that should be used to instantiate the HiveMetastoreClient. |
protected scala.collection.Seq<java.lang.String> | hiveMetastoreSharedPrefixes() A comma-separated list of class prefixes that should be loaded using the classloader that is shared between Spark SQL and a specific version of Hive. |
protected java.lang.String | hiveMetastoreVersion() The version of the Hive client that will be used to communicate with the metastore. |
protected boolean | hiveThriftServerAsync() |
protected boolean | hiveThriftServerSingleSession() |
protected void | invalidateTable(java.lang.String tableName) |
protected org.apache.spark.sql.hive.client.ClientInterface | metadataHive() The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore. |
HiveContext | newSession() Returns a new HiveContext as a new session, which has separate SQLConf, UDF/UDAF registries, temporary tables, and SessionState, but shares the same CacheManager, IsolatedClientLoader, and Hive clients (both execution and metadata) with the existing HiveContext. |
static scala.collection.immutable.Map<java.lang.String,java.lang.String> | newTemporaryConfiguration() Constructs a configuration for Hive, where the metastore is located in a temporary directory. |
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | parseSql(java.lang.String sql) |
protected SQLContext.SparkPlanner | planner() |
protected static scala.collection.Seq<org.apache.spark.sql.types.AtomicType> | primitiveTypes() |
void | refreshTable(java.lang.String tableName) Invalidates and refreshes all the cached metadata of the given table. |
protected scala.collection.Seq<java.lang.String> | runSqlHive(java.lang.String sql) |
void | setConf(java.lang.String key, java.lang.String value) Set the given Spark SQL configuration property. |
protected org.apache.hadoop.hive.ql.parse.VariableSubstitution | substitutor() |
protected static java.lang.String | toHiveString(scala.Tuple2<java.lang.Object,DataType> a) |
protected static java.lang.String | toHiveStructString(scala.Tuple2<java.lang.Object,DataType> a) Hive outputs fields of structs slightly differently than top-level attributes. |
Methods inherited from class org.apache.spark.sql.SQLContext: applySchema, applySchema, applySchema, applySchema, applySchemaToPythonRDD, applySchemaToPythonRDD, baseRelationToDataFrame, cacheManager, cacheTable, clearActive, clearCache, createDataFrame, createDataFrame, createDataFrame, createDataFrame, createDataFrame, createDataFrame, createDataFrame, createDataFrame, createDataset, createDataset, createDataset, createExternalTable, createExternalTable, createExternalTable, createExternalTable, createExternalTable, createExternalTable, ddlParser, dialectClassName, dropTempTable, emptyDataFrame, emptyResult, executeSql, experimental, getAllConfs, getConf, getConf, getOrCreate, getSchema, implicits, isCached, isRootContext, jdbc, jdbc, jdbc, jsonFile, jsonFile, jsonFile, jsonRDD, jsonRDD, jsonRDD, jsonRDD, jsonRDD, jsonRDD, listener, listenerManager, load, load, load, load, load, load, optimizer, parquetFile, parquetFile, parseDataType, prepareForExecution, range, range, range, read, setActive, setConf, sparkContext, sql, sqlParser, table, tableNames, tableNames, tables, tables, udf, uncacheTable
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public HiveContext(SparkContext sc)
public HiveContext(JavaSparkContext sc)
public static java.lang.String hiveExecutionVersion()
public static HIVE_METASTORE_VERSION()
public static HIVE_EXECUTION_VERSION()
public static HIVE_METASTORE_JARS()
public static CONVERT_METASTORE_PARQUET()
public static CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING()
public static CONVERT_CTAS()
public static HIVE_METASTORE_SHARED_PREFIXES()
public static HIVE_METASTORE_BARRIER_PREFIXES()
public static HIVE_THRIFT_SERVER_ASYNC()
public static scala.collection.immutable.Map<java.lang.String,java.lang.String> newTemporaryConfiguration()
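A hedged sketch of how the returned map might be applied, for example in a test; `hiveContext` is an assumed pre-existing HiveContext instance:

```java
import org.apache.spark.sql.hive.HiveContext;
import scala.collection.JavaConversions;
import java.util.Map;

// Sketch: copy the throwaway-metastore settings onto a context under test.
Map<String, String> tempConf =
    JavaConversions.mapAsJavaMap(HiveContext.newTemporaryConfiguration());
for (Map.Entry<String, String> entry : tempConf.entrySet()) {
  hiveContext.setConf(entry.getKey(), entry.getValue());
}
```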
protected static scala.collection.Seq<org.apache.spark.sql.types.AtomicType> primitiveTypes()
protected static java.lang.String toHiveString(scala.Tuple2<java.lang.Object,DataType> a)
protected static java.lang.String toHiveStructString(scala.Tuple2<java.lang.Object,DataType> a)
public HiveContext newSession()
Overrides:
newSession in class SQLContext
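A brief sketch of session isolation, assuming an existing HiveContext named `hiveContext`:

```java
// New session: separate SQLConf, UDFs, and temporary tables,
// but shared CacheManager and Hive clients.
HiveContext session2 = hiveContext.newSession();
session2.setConf("spark.sql.shuffle.partitions", "10");

// The original session's configuration is unaffected; the second
// argument is a default used when the key is unset in this session.
String value = hiveContext.getConf("spark.sql.shuffle.partitions", "200");
```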
protected boolean convertMetastoreParquet()
protected boolean convertMetastoreParquetWithSchemaMerging()
This configuration is only effective when "spark.sql.hive.convertMetastoreParquet" is true.
protected boolean convertCTAS()
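These protected accessors reflect SQL configuration flags. A sketch of toggling them via setConf, assuming an existing HiveContext named `hiveContext`; the key names are assumptions based on this release's defaults:

```java
// Assumed configuration keys for the three flags above.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true");
// Only effective when convertMetastoreParquet is true.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet.mergeSchema", "false");
hiveContext.setConf("spark.sql.hive.convertCTAS", "false");
```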
protected java.lang.String hiveMetastoreVersion()
protected java.lang.String hiveMetastoreJars()
protected scala.collection.Seq<java.lang.String> hiveMetastoreSharedPrefixes()
protected scala.collection.Seq<java.lang.String> hiveMetastoreBarrierPrefixes()
protected boolean hiveThriftServerAsync()
protected boolean hiveThriftServerSingleSession()
protected org.apache.hadoop.hive.ql.parse.VariableSubstitution substitutor()
protected org.apache.spark.sql.hive.client.ClientWrapper executionHive()
protected org.apache.spark.sql.hive.client.ClientInterface metadataHive()
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan parseSql(java.lang.String sql)
Overrides:
parseSql in class SQLContext
protected HiveContext.QueryExecution executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
Overrides:
executePlan in class SQLContext
public void refreshTable(java.lang.String tableName)
Invalidate and refresh all the cached metadata of the given table.
Parameters:
tableName - (undocumented)
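For example, assuming an existing HiveContext named `hiveContext` and a placeholder table name "my_table":

```java
// Sketch: drop and re-fetch cached metadata for a table whose data or
// schema changed outside of this context (e.g. files rewritten by Hive).
hiveContext.refreshTable("my_table");
```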
protected void invalidateTable(java.lang.String tableName)
public void analyze(java.lang.String tableName)
Analyzes the given table in the current database to generate statistics, which will be used in query optimizations.
Right now, it only supports Hive tables and it only updates the size of a Hive table in the Hive metastore.
Parameters:
tableName - (undocumented)
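A sketch of usage, assuming an existing HiveContext named `hiveContext`; "my_table" is a placeholder Hive table name:

```java
// Computes and stores the table's size in the Hive metastore so the
// optimizer can use it (e.g. when deciding on broadcast joins).
hiveContext.analyze("my_table");
```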
public void setConf(java.lang.String key, java.lang.String value)
Description copied from class: SQLContext
Set the given Spark SQL configuration property.
Overrides:
setConf in class SQLContext
Parameters:
key - (undocumented)
value - (undocumented)
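For example, assuming an existing HiveContext named `hiveContext`:

```java
// Sets a Spark SQL property; per the hiveconf() contract below, the
// value is propagated to both the SQLConf and the session HiveConf.
hiveContext.setConf("spark.sql.shuffle.partitions", "50");
```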
protected org.apache.spark.sql.hive.HiveMetastoreCatalog catalog()
Overrides:
catalog in class SQLContext
protected org.apache.spark.sql.catalyst.analysis.FunctionRegistry functionRegistry()
Overrides:
functionRegistry in class SQLContext
protected org.apache.spark.sql.catalyst.analysis.Analyzer analyzer()
Overrides:
analyzer in class SQLContext
protected scala.collection.immutable.Map<java.lang.String,java.lang.String> configure()
protected org.apache.hadoop.hive.conf.HiveConf hiveconf()
SQLConf and HiveConf contracts:
1. create a new SessionState for each HiveContext
2. when the Hive session is first initialized, params in HiveConf will get picked up by the SQLConf. Additionally, any properties set by set() or a SET command inside sql() will be set in the SQLConf *as well as* in the HiveConf.
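A sketch of contract 2, assuming an existing HiveContext named `hiveContext`:

```java
// A SET command issued through sql() updates the SQLConf as well as
// the session HiveConf, so the value is visible through getConf.
hiveContext.sql("SET spark.sql.shuffle.partitions=10");
String partitions = hiveContext.getConf("spark.sql.shuffle.partitions"); // "10"
```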
protected org.apache.spark.sql.SQLConf conf()
Overrides:
conf in class SQLContext
protected org.apache.spark.sql.catalyst.ParserDialect getSQLDialect()
Overrides:
getSQLDialect in class SQLContext
protected scala.collection.Seq<java.lang.String> runSqlHive(java.lang.String sql)
protected SQLContext.SparkPlanner planner()
Overrides:
planner in class SQLContext
protected void addJar(java.lang.String path)
Description copied from class: SQLContext
Add a jar to SQLContext
Overrides:
addJar in class SQLContext
Parameters:
path - (undocumented)