:: DeveloperApi ::
The data type for collections of multiple values. Internally these are represented as columns that contain a scala.collection.Seq.
An ArrayType object comprises two fields, elementType: DataType and containsNull: Boolean. The elementType field specifies the type of the array elements. The containsNull field specifies whether the array may contain null values.
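As a minimal sketch of the two fields described above, an ArrayType can be built and then inspected. It assumes the same import org.apache.spark.sql._ used by the examples elsewhere in this listing; other Spark releases may expose these types from a different package. The value names are illustrative only.

```scala
import org.apache.spark.sql._

// An array of integers whose elements may be null.
val nullableInts = ArrayType(IntegerType, true)

// An array of strings whose elements are never null.
val strictStrings = ArrayType(StringType, false)

// Both fields can be read back from the constructed object.
val elementType = nullableInts.elementType
val allowsNull = nullableInts.containsNull
```

Passing containsNull explicitly, as done here, avoids depending on the default chosen by the one-argument constructor.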
:: DeveloperApi ::
The base type of all Spark SQL data types.
:: DeveloperApi ::
The data type representing scala.math.BigDecimal values.
TODO(matei): explain precision and scale
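Precision and scale are not explained in this entry. As a reminder of what the two terms mean for a plain scala.math.BigDecimal, the sketch below reads both from a value. How DecimalType itself uses them is not stated here, so the sketch involves no Spark classes.

```scala
// Plain scala.math.BigDecimal, no Spark classes involved.
val price = BigDecimal("123.45")

// precision: the total number of digits in the unscaled value (12345).
val pricePrecision = price.precision

// scale: the number of digits to the right of the decimal point.
val priceScale = price.scale
```

For this value the precision is 5 and the scale is 2.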
:: DeveloperApi ::
The data type representing Maps. A MapType object comprises three fields: keyType: DataType, valueType: DataType, and valueContainsNull: Boolean.
The keyType field specifies the type of the keys in the map.
The valueType field specifies the type of the values in the map.
The valueContainsNull field specifies whether the values of the map may be null.
Keys in a MapType column are not allowed to be null.
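The three fields can be illustrated with a short sketch. It assumes the import org.apache.spark.sql._ used by the examples elsewhere in this listing, and the name wordCounts is purely illustrative.

```scala
import org.apache.spark.sql._

// A map from string keys to integer values, where values may be null.
// Keys can never be null, so there is no corresponding flag for them.
val wordCounts = MapType(StringType, IntegerType, true)

// Each field can be read back from the constructed object.
val keyType = wordCounts.keyType
val valueType = wordCounts.valueType
val valuesMayBeNull = wordCounts.valueContainsNull
```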
:: DeveloperApi ::
Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean, Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and Array[Metadata]. JSON is used for serialization.
The default constructor is private. Users should use either MetadataBuilder or Metadata.fromJson to create Metadata instances.
:: DeveloperApi :: Builder for Metadata. If there is a key collision, the latter will overwrite the former.
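A rough sketch of the builder in use follows. The method names putString, putLong, build, getString, getLong, and json are not listed in this text and are assumed from the MetadataBuilder and Metadata APIs. Reading "key collision" as putting the same key twice is also an assumption of this sketch.

```scala
import org.apache.spark.sql._

// The key "comment" is written twice; the second value is expected to win.
val metadata = new MetadataBuilder()
  .putString("comment", "first")
  .putString("comment", "second")
  .putLong("maxLength", 32L)
  .build()

val comment = metadata.getString("comment")
val maxLength = metadata.getLong("maxLength")

// Metadata serializes to JSON, as described in the Metadata entry.
val asJson = metadata.json
```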
:: DeveloperApi ::
Represents one row of output from a relational operator.
:: AlphaComponent :: The entry point for running relational queries using Spark.
:: AlphaComponent :: An RDD of Row objects that has an associated schema.
Converts a logical plan into zero or more SparkPlans.
:: DeveloperApi ::
A StructField object represents a field in a StructType object.
A StructField object comprises three fields: name: String, dataType: DataType, and nullable: Boolean. The name field is the name of the StructField. The dataType field specifies the data type of the StructField. The nullable field specifies whether values of the StructField can be null.
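A small sketch of the three fields, assuming the import org.apache.spark.sql._ used by the examples elsewhere in this listing; the field name "age" is only an illustration.

```scala
import org.apache.spark.sql._

// A nullable integer field named "age".
val ageField = StructField("age", IntegerType, true)

// The three fields described above, read back from the object.
val fieldName = ageField.name
val fieldType = ageField.dataType
val fieldNullable = ageField.nullable
```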
:: DeveloperApi ::
The data type representing Rows. A StructType object comprises a Seq of StructFields.
:: DeveloperApi ::
An ArrayType object can be constructed in two ways:
ArrayType(elementType: DataType, containsNull: Boolean)
and
ArrayType(elementType: DataType)
For ArrayType(elementType), the containsNull field is set to false.
:: DeveloperApi ::
The data type representing Array[Byte] values.
:: DeveloperApi ::
The data type representing Boolean values.
:: DeveloperApi ::
The data type representing Byte values.
:: DeveloperApi ::
The data type representing java.sql.Date values.
:: DeveloperApi ::
The data type representing scala.math.BigDecimal values.
TODO(matei): explain precision and scale
:: DeveloperApi ::
The data type representing Double values.
:: DeveloperApi ::
The data type representing Float values.
:: DeveloperApi ::
The data type representing Int values.
:: DeveloperApi ::
The data type representing Long values.
:: DeveloperApi ::
A MapType object can be constructed in two ways:
MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
and
MapType(keyType: DataType, valueType: DataType)
For MapType(keyType: DataType, valueType: DataType), the valueContainsNull field is set to true.
:: DeveloperApi ::
The data type representing NULL values.
:: DeveloperApi ::
A Row object can be constructed by providing field values. Example:
    import org.apache.spark.sql._

    // Create a Row from values.
    Row(value1, value2, value3, ...)
    // Create a Row from a Seq of values.
    Row.fromSeq(Seq(value1, value2, ...))
A value of a row can be accessed either through generic access by ordinal, which incurs boxing overhead for primitives, or through native primitive access. An example of generic access by ordinal:
    import org.apache.spark.sql._

    val row = Row(1, true, "a string", null)
    // row: Row = [1,true,a string,null]
    val firstValue = row(0)
    // firstValue: Any = 1
    val fourthValue = row(3)
    // fourthValue: Any = null
For native primitive access, it is invalid to use the native primitive interface to retrieve a value that is null. Instead, a user must check isNullAt before attempting to retrieve a value that might be null.
An example of native primitive access:
    // Using the row from the previous example.
    val firstValue = row.getInt(0)
    // firstValue: Int = 1
    val isNull = row.isNullAt(3)
    // isNull: Boolean = true
Interfaces related to native primitive access are:
isNullAt(i: Int): Boolean
getInt(i: Int): Int
getLong(i: Int): Long
getDouble(i: Int): Double
getFloat(i: Int): Float
getBoolean(i: Int): Boolean
getShort(i: Int): Short
getByte(i: Int): Byte
getString(i: Int): String
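One way to apply the isNullAt check described above is to wrap a primitive getter so that a null becomes None. This is only a sketch: intAt is a hypothetical helper written for this illustration and is not part of Spark.

```scala
import org.apache.spark.sql._

// Hypothetical helper, not part of Spark: checks isNullAt before calling
// the native primitive getter, so getInt is never used on a null value.
def intAt(row: Row, i: Int): Option[Int] =
  if (row.isNullAt(i)) None else Some(row.getInt(i))

val sample = Row(1, null)
val present = intAt(sample, 0)
val missing = intAt(sample, 1)
```

Here present is Some(1) and missing is None. The same pattern applies to the other getters in the list.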
Fields in a Row object can be extracted in a pattern match. Example:
    import org.apache.spark.sql._

    val pairs = sql("SELECT key, value FROM src").rdd.map {
      case Row(key: Int, value: String) =>
        key -> value
    }
:: DeveloperApi ::
The data type representing Short values.
:: DeveloperApi ::
The data type representing String values.
:: DeveloperApi ::
A StructField object can be constructed by
StructField(name: String, dataType: DataType, nullable: Boolean)
:: DeveloperApi ::
A StructType object can be constructed by
StructType(fields: Seq[StructField])
For a StructType object, one or multiple StructFields can be extracted by name.
If multiple StructFields are extracted, a StructType object is returned, and any provided name that does not have a matching field is ignored.
If a single StructField is extracted and the provided name does not have a matching field, null is returned.
Example:
    import org.apache.spark.sql._

    val struct =
      StructType(
        StructField("a", IntegerType, true) ::
        StructField("b", LongType, false) ::
        StructField("c", BooleanType, false) :: Nil)

    // Extract a single StructField.
    val singleField = struct("b")
    // singleField: StructField = StructField(b,LongType,false)

    // This struct does not have a field called "d". null will be returned.
    val nonExisting = struct("d")
    // nonExisting: StructField = null

    // Extract multiple StructFields. Field names are provided in a set.
    // A StructType object will be returned.
    val twoFields = struct(Set("b", "c"))
    // twoFields: StructType =
    //   StructType(List(StructField(b,LongType,false), StructField(c,BooleanType,false)))

    // Names that do not have matching fields are ignored.
    // For the case shown below, "d" is ignored and
    // the call is treated as struct(Set("b", "c")).
    val ignoreNonExisting = struct(Set("b", "c", "d"))
    // ignoreNonExisting: StructType =
    //   StructType(List(StructField(b,LongType,false), StructField(c,BooleanType,false)))
A Row object is used as a value of the StructType. Example:
    import org.apache.spark.sql._

    val innerStruct =
      StructType(
        StructField("f1", IntegerType, true) ::
        StructField("f2", LongType, false) ::
        StructField("f3", BooleanType, false) :: Nil)

    val struct = StructType(
      StructField("a", innerStruct, true) :: Nil)

    // Create a Row with the schema defined by struct.
    // f2 is a LongType field, so its value is written as a Long literal.
    val row = Row(Row(1, 2L, true))
    // row: Row = [[1,2,true]]
:: DeveloperApi ::
The data type representing java.sql.Timestamp values.
:: DeveloperApi :: An execution engine for relational query plans that runs on top of Spark and returns RDDs.
A set of APIs for adding data sources to Spark SQL.
Allows the execution of relational queries, including those expressed in SQL using Spark.