Column (Spark 1.6.3 JavaDoc)

Object
- org.apache.spark.sql.Column

All Implemented Interfaces: Logging

Direct Known Subclasses: ColumnName, TypedColumn

public class Column
extends Object
implements Logging

:: Experimental :: A column that will be computed based on the data in a DataFrame.

A new column is constructed based on the input columns present in a DataFrame:


   df("columnName")            // On a specific DataFrame.
   col("columnName")           // A generic column not yet associated with a DataFrame.
   col("columnName.field")     // Extracting a struct field
   col("`a.column.with.dots`") // Escape `.` in column names.
   $"columnName"               // Scala shorthand for a named column.
   expr("a + 1")               // A column that is constructed from a parsed SQL Expression.
   lit("abc")                  // A column that produces a literal (constant) value.

Column objects can be composed to form complex expressions:


   $"a" + 1
   $"a" === $"b"

Since: 1.3.0

Constructor Summary

Constructors
Constructor and Description
`Column(org.apache.spark.sql.catalyst.expressions.Expression expr)`
`Column(String name)`

Method Summary

Methods
Modifier and Type	Method and Description
`Column`	`alias(String alias)` Gives the column an alias.
`Column`	`and(Column other)` Boolean AND.
`Column`	`apply(Object extraction)` Extracts a value or values from a complex type.
`<U> TypedColumn<Object,U>`	`as(Encoder<U> evidence$1)` Provides a type hint about the expected return value of this column.
`Column`	`as(scala.collection.Seq<String> aliases)` (Scala-specific) Assigns the given aliases to the results of a table generating function.
`Column`	`as(String alias)` Gives the column an alias.
`Column`	`as(String[] aliases)` Assigns the given aliases to the results of a table generating function.
`Column`	`as(String alias, Metadata metadata)` Gives the column an alias with metadata.
`Column`	`as(scala.Symbol alias)` Gives the column an alias.
`Column`	`asc()` Returns an ordering used in sorting.
`Column`	`between(Object lowerBound, Object upperBound)` True if the current column is between the lower bound and upper bound, inclusive.
`Column`	`bitwiseAND(Object other)` Compute bitwise AND of this expression with another expression.
`Column`	`bitwiseOR(Object other)` Compute bitwise OR of this expression with another expression.
`Column`	`bitwiseXOR(Object other)` Compute bitwise XOR of this expression with another expression.
`Column`	`cast(DataType to)` Casts the column to a different data type.
`Column`	`cast(String to)` Casts the column to a different data type, using the canonical string representation of the type.
`Column`	`contains(Object other)` Contains the other element.
`Column`	`desc()` Returns an ordering used in sorting.
`Column`	`divide(Object other)` Divides this expression by another expression.
`Column`	`endsWith(Column other)` String ends with.
`Column`	`endsWith(String literal)` String ends with another string literal.
`Column`	`eqNullSafe(Object other)` Equality test that is safe for null values.
`boolean`	`equals(Object that)`
`Column`	`equalTo(Object other)` Equality test.
`void`	`explain(boolean extended)` Prints the expression to the console for debugging purposes.
`Column`	`geq(Object other)` Greater than or equal to an expression.
`Column`	`getField(String fieldName)` An expression that gets a field by name in a `StructType`.
`Column`	`getItem(Object key)` An expression that gets an item at position `ordinal` out of an array, or gets a value by key `key` in a `MapType`.
`Column`	`gt(Object other)` Greater than.
`int`	`hashCode()`
`Column`	`in(Object... list)` Deprecated. As of 1.5.0. Use isin. This will be removed in Spark 2.0.
`Column`	`in(scala.collection.Seq<Object> list)` Deprecated. As of 1.5.0. Use isin. This will be removed in Spark 2.0.
`Column`	`isin(Object... list)` A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
`Column`	`isin(scala.collection.Seq<Object> list)` A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
`Column`	`isNaN()` True if the current expression is NaN.
`Column`	`isNotNull()` True if the current expression is NOT null.
`Column`	`isNull()` True if the current expression is null.
`Column`	`leq(Object other)` Less than or equal to.
`Column`	`like(String literal)` SQL like expression.
`Column`	`lt(Object other)` Less than.
`Column`	`minus(Object other)` Subtraction.
`Column`	`mod(Object other)` Modulo (a.k.a. remainder) expression.
`Column`	`multiply(Object other)` Multiplication of this expression and another expression.
`Column`	`notEqual(Object other)` Inequality test.
`Column`	`or(Column other)` Boolean OR.
`Column`	`otherwise(Object value)` Evaluates a list of conditions and returns one of multiple possible result expressions.
`Column`	`over(WindowSpec window)` Define a windowing column.
`Column`	`plus(Object other)` Sum of this expression and another expression.
`Column`	`rlike(String literal)` SQL RLIKE expression (LIKE with Regex).
`Column`	`startsWith(Column other)` String starts with.
`Column`	`startsWith(String literal)` String starts with another string literal.
`Column`	`substr(Column startPos, Column len)` An expression that returns a substring.
`Column`	`substr(int startPos, int len)` An expression that returns a substring.
`String`	`toString()`
`static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression>`	`unapply(Column col)`
`Column`	`when(Column condition, Object value)` Evaluates a list of conditions and returns one of multiple possible result expressions.

Methods inherited from class Object
getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - Column
```
public Column(org.apache.spark.sql.catalyst.expressions.Expression expr)
```
  - Column
```
public Column(String name)
```
- Method Detail
  - unapply
```
public static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> unapply(Column col)
```
  - in
```
public Column in(Object... list)
```
    Deprecated. As of 1.5.0. Use isin. This will be removed in Spark 2.0.
    
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    
    Parameters:
    list - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - isin
```
public Column isin(Object... list)
```
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    
    Parameters:
    list - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.5.0
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
  - equals
```
public boolean equals(Object that)
```
    Overrides:
    
    equals in class Object
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class Object
  - as
```
public <U> TypedColumn<Object,U> as(Encoder<U> evidence$1)
```
    Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
    
    Parameters:
    evidence$1 - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.6.0
  - apply
```
public Column apply(Object extraction)
```
    Extracts a value or values from a complex type. The following types of extraction are supported:
    - Given an Array, an integer ordinal can be used to retrieve a single value.
    - Given a Map, a key of the correct type can be used to retrieve an individual value.
    - Given a Struct, a string fieldName can be used to extract that field.
    - Given an Array of Structs, a string fieldName can be used to extract that field from every struct in the array and return an Array of fields.
    
    Parameters:
    extraction - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
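
    The four extraction rules can be pictured with ordinary Java collections. The following is a minimal sketch, not Spark code: `ExtractionSketch`, its nested `Struct` class and the `extract` helper are hypothetical names introduced for illustration, a `List` stands in for an Array, and the ordinal is treated as 0-based (an assumption, since the text above does not state the base).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the four extraction rules; not Spark code.
final class ExtractionSketch {
    /** Stand-in for a struct value: named fields kept in insertion order. */
    static final class Struct {
        final Map<String, Object> fields = new LinkedHashMap<>();

        Struct with(String name, Object value) {
            fields.put(name, value);
            return this;
        }
    }

    static Object extract(Object value, Object extraction) {
        if (value instanceof Struct) {
            // Struct + field name: return that field.
            return ((Struct) value).fields.get((String) extraction);
        }
        if (value instanceof Map) {
            // Map + key: return the value stored under that key.
            return ((Map<?, ?>) value).get(extraction);
        }
        if (value instanceof List) {
            List<?> array = (List<?>) value;
            if (extraction instanceof Integer) {
                // Array + integer ordinal: return a single element (0-based here, an assumption).
                return array.get((Integer) extraction);
            }
            // Array of structs + field name: collect that field from every struct.
            List<Object> out = new ArrayList<>();
            for (Object element : array) {
                out.add(((Struct) element).fields.get((String) extraction));
            }
            return out;
        }
        throw new IllegalArgumentException("unsupported extraction");
    }

    public static void main(String[] args) {
        List<Struct> people = new ArrayList<>();
        people.add(new Struct().with("name", "Ann"));
        people.add(new Struct().with("name", "Bob"));
        System.out.println(extract(people, "name")); // prints [Ann, Bob]
    }
}
```

    The last branch is the one that differs most from ordinary indexing: a field name applied to an array of structs yields an array, not a single value.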
  - equalTo
```
public Column equalTo(Object other)
```
    Equality test.
```
   // Scala:
   df.filter( df("colA") === df("colB") )

   // Java
   import static org.apache.spark.sql.functions.*;
   df.filter( col("colA").equalTo(col("colB")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - notEqual
```
public Column notEqual(Object other)
```
    Inequality test.
```
   // Scala:
   df.select( df("colA") !== df("colB") )
   df.select( !(df("colA") === df("colB")) )

   // Java:
   import static org.apache.spark.sql.functions.*;
   df.filter( col("colA").notEqual(col("colB")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - gt
```
public Column gt(Object other)
```
    Greater than.
```
   // Scala: The following selects people older than 21.
   people.select( people("age") > lit(21) )

   // Java:
   import static org.apache.spark.sql.functions.*;
   people.select( people("age").gt(21) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - lt
```
public Column lt(Object other)
```
    Less than.
```
   // Scala: The following selects people younger than 21.
   people.select( people("age") < 21 )

   // Java:
   people.select( people("age").lt(21) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - leq
```
public Column leq(Object other)
```
    Less than or equal to.
```
   // Scala: The following selects people aged 21 or younger.
   people.select( people("age") <= 21 )

   // Java:
   people.select( people("age").leq(21) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - geq
```
public Column geq(Object other)
```
    Greater than or equal to an expression.
```
   // Scala: The following selects people aged 21 or older.
   people.select( people("age") >= 21 )

   // Java:
   people.select( people("age").geq(21) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - eqNullSafe
```
public Column eqNullSafe(Object other)
```
    Equality test that is safe for null values.
    
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
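
    The difference from `equalTo` shows up only when one side is null. The sketch below models both flavours on plain Java objects, assuming the usual SQL convention that an ordinary comparison involving NULL is unknown. `NullSafeEqualitySketch` and its two helpers are hypothetical names, and a Java `null` stands in for SQL NULL.

```java
import java.util.Objects;

// Plain-Java model of the two equality flavours; not Spark code.
final class NullSafeEqualitySketch {
    /** Ordinary equality: the result is unknown (null) if either side is null. */
    static Boolean equalTo(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /** Null-safe equality: always true or false, and null equals null. */
    static boolean eqNullSafe(Object left, Object right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        System.out.println(equalTo(null, null));    // prints null
        System.out.println(eqNullSafe(null, null)); // prints true
        System.out.println(eqNullSafe(1, null));    // prints false
    }
}
```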
  - when
```
public Column when(Column condition,
          Object value)
```
    Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
```
   // Example: encoding gender string column into integer.

   // Scala:
   people.select(when(people("gender") === "male", 0)
     .when(people("gender") === "female", 1)
     .otherwise(2))

   // Java:
   people.select(when(col("gender").equalTo("male"), 0)
     .when(col("gender").equalTo("female"), 1)
     .otherwise(2));
 
```
    Parameters:
    condition - (undocumented)
    value - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - otherwise
```
public Column otherwise(Object value)
```
    Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
```
   // Example: encoding gender string column into integer.

   // Scala:
   people.select(when(people("gender") === "male", 0)
     .when(people("gender") === "female", 1)
     .otherwise(2))

   // Java:
   people.select(when(col("gender").equalTo("male"), 0)
     .when(col("gender").equalTo("female"), 1)
     .otherwise(2));
 
```
    Parameters:
    value - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
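
    The evaluation order described for `when` and `otherwise` (first matching condition wins, null when nothing matches and no fallback is set) can be sketched on plain Java values. This is an illustrative model only: `CaseWhenSketch` and its `eval` method are hypothetical and are not part of Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a when/otherwise chain over plain Java values; not Spark code.
final class CaseWhenSketch<T, R> {
    private final List<Predicate<T>> conditions = new ArrayList<>();
    private final List<R> values = new ArrayList<>();
    private R fallback = null; // stays null unless otherwise(...) is called

    CaseWhenSketch<T, R> when(Predicate<T> condition, R value) {
        conditions.add(condition);
        values.add(value);
        return this;
    }

    CaseWhenSketch<T, R> otherwise(R value) {
        fallback = value;
        return this;
    }

    /** Returns the value paired with the first condition that holds, else the fallback. */
    R eval(T row) {
        for (int i = 0; i < conditions.size(); i++) {
            if (conditions.get(i).test(row)) {
                return values.get(i);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        CaseWhenSketch<String, Integer> encode = new CaseWhenSketch<String, Integer>()
            .when("male"::equals, 0)
            .when("female"::equals, 1);
        System.out.println(encode.eval("female"));               // prints 1
        System.out.println(encode.eval("unknown"));              // prints null (no otherwise branch)
        System.out.println(encode.otherwise(2).eval("unknown")); // prints 2
    }
}
```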
  - between
```
public Column between(Object lowerBound,
             Object upperBound)
```
    True if the current column is between the lower bound and upper bound, inclusive.
    
    Parameters:
    lowerBound - (undocumented)
    upperBound - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
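
    "Inclusive" means that a value equal to either bound passes the test. A minimal plain-Java sketch of that check follows; `BetweenSketch` is a hypothetical helper for illustration, not Spark code, and it does not model null handling.

```java
// Plain-Java model of an inclusive range check; not Spark code.
final class BetweenSketch {
    /** True when lower <= value <= upper, with both bounds included. */
    static <T extends Comparable<T>> boolean between(T value, T lower, T upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(between(18, 18, 65)); // prints true: the lower bound is included
        System.out.println(between(65, 18, 65)); // prints true: the upper bound is included
        System.out.println(between(66, 18, 65)); // prints false
    }
}
```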
  - isNaN
```
public Column isNaN()
```
    True if the current expression is NaN.
    
    Returns:
    (undocumented)
    Since:
    
    1.5.0
  - isNull
```
public Column isNull()
```
    True if the current expression is null.
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - isNotNull
```
public Column isNotNull()
```
    True if the current expression is NOT null.
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - or
```
public Column or(Column other)
```
    Boolean OR.
```
   // Scala: The following selects people that are in school or employed.
   people.filter( people("inSchool") || people("isEmployed") )

   // Java:
   people.filter( people("inSchool").or(people("isEmployed")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - and
```
public Column and(Column other)
```
    Boolean AND.
```
   // Scala: The following selects people that are in school and employed at the same time.
   people.select( people("inSchool") && people("isEmployed") )

   // Java:
   people.select( people("inSchool").and(people("isEmployed")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
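
    When a boolean column may hold nulls, `or` and `and` are commonly understood to follow SQL three-valued logic. The text above does not spell this out, so the sketch below is an assumption-labelled model rather than a statement of Spark's implementation. `ThreeValuedLogicSketch` is a hypothetical name and a Java `null` stands in for SQL NULL (unknown).

```java
// Plain-Java model of SQL three-valued AND/OR; not Spark code.
final class ThreeValuedLogicSketch {
    static Boolean and(Boolean left, Boolean right) {
        if (Boolean.FALSE.equals(left) || Boolean.FALSE.equals(right)) {
            return false; // a single false operand decides the result
        }
        if (left == null || right == null) {
            return null;  // otherwise an unknown operand keeps the result unknown
        }
        return true;
    }

    static Boolean or(Boolean left, Boolean right) {
        if (Boolean.TRUE.equals(left) || Boolean.TRUE.equals(right)) {
            return true;  // a single true operand decides the result
        }
        if (left == null || right == null) {
            return null;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(and(null, false)); // prints false
        System.out.println(and(null, true));  // prints null
        System.out.println(or(null, true));   // prints true
        System.out.println(or(null, false));  // prints null
    }
}
```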
  - plus
```
public Column plus(Object other)
```
    Sum of this expression and another expression.
```
   // Scala: The following selects the sum of a person's height and weight.
   people.select( people("height") + people("weight") )

   // Java:
   people.select( people("height").plus(people("weight")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - minus
```
public Column minus(Object other)
```
    Subtraction. Subtract the other expression from this expression.
```
   // Scala: The following selects the difference between people's height and their weight.
   people.select( people("height") - people("weight") )

   // Java:
   people.select( people("height").minus(people("weight")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - multiply
```
public Column multiply(Object other)
```
    Multiplication of this expression and another expression.
```
   // Scala: The following multiplies a person's height by their weight.
   people.select( people("height") * people("weight") )

   // Java:
   people.select( people("height").multiply(people("weight")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - divide
```
public Column divide(Object other)
```
    Divides this expression by another expression.
```
   // Scala: The following divides a person's height by their weight.
   people.select( people("height") / people("weight") )

   // Java:
   people.select( people("height").divide(people("weight")) );
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - mod
```
public Column mod(Object other)
```
    Modulo (a.k.a. remainder) expression.
    
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
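
    "Remainder" and "modulo" differ for negative operands. The sketch below assumes that `mod` follows JVM remainder semantics, where the sign of the result follows the dividend; the text above does not state this, so treat it as an assumption to verify against your Spark version. `RemainderSketch` is a hypothetical name.

```java
// Plain-Java illustration of remainder versus floor modulo; not Spark code.
final class RemainderSketch {
    /** JVM remainder: the result takes the sign of the dividend. */
    static int remainder(int dividend, int divisor) {
        return dividend % divisor;
    }

    /** Floor modulo, shown for contrast: the result takes the sign of the divisor. */
    static int floorModulo(int dividend, int divisor) {
        return Math.floorMod(dividend, divisor);
    }

    public static void main(String[] args) {
        System.out.println(remainder(7, 3));    // prints 1
        System.out.println(remainder(-7, 3));   // prints -1
        System.out.println(floorModulo(-7, 3)); // prints 2
    }
}
```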
  - in
```
public Column in(scala.collection.Seq<Object> list)
```
    Deprecated. As of 1.5.0. Use isin. This will be removed in Spark 2.0.
    
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    
    Parameters:
    list - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - isin
```
public Column isin(scala.collection.Seq<Object> list)
```
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    
    Parameters:
    list - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.5.0
  - like
```
public Column like(String literal)
```
    SQL like expression.
    
    Parameters:
    literal - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - rlike
```
public Column rlike(String literal)
```
    SQL RLIKE expression (LIKE with Regex).
    
    Parameters:
    literal - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
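
    The two pattern styles behave differently: a LIKE pattern uses `%` (any run of characters) and `_` (exactly one character) and must cover the whole string, while an RLIKE pattern is a regular expression. The sketch below is a plain-Java model, not Spark code. `LikeSketch` is a hypothetical name, escape characters in LIKE patterns are not handled, and RLIKE is modelled as "the regex matches somewhere in the string", which is an assumption about the matching mode.

```java
import java.util.regex.Pattern;

// Plain-Java model of LIKE and RLIKE matching; not Spark code.
final class LikeSketch {
    /** Translates a LIKE pattern to a regex: % becomes .*, _ becomes ., the rest is literal. */
    static String likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    /** LIKE: the translated pattern must match the entire string. */
    static boolean like(String value, String likePattern) {
        return Pattern.compile(likeToRegex(likePattern), Pattern.DOTALL)
            .matcher(value)
            .matches();
    }

    /** RLIKE: modelled as a regex search anywhere in the string (assumption). */
    static boolean rlike(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(like("Spark", "Sp%"));   // prints true
        System.out.println(like("Spark", "park"));  // prints false: LIKE needs a full match
        System.out.println(rlike("Spark", "park")); // prints true
    }
}
```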
  - getItem
```
public Column getItem(Object key)
```
    An expression that gets an item at position `ordinal` out of an array, or gets a value by key `key` in a `MapType`.
    
    Parameters:
    key - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - getField
```
public Column getField(String fieldName)
```
    An expression that gets a field by name in a StructType.
    
    Parameters:
    fieldName - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - substr
```
public Column substr(Column startPos,
            Column len)
```
    An expression that returns a substring.
    
    Parameters:
    startPos - expression for the starting position.
    len - expression for the length of the substring.
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - substr
```
public Column substr(int startPos,
            int len)
```
    An expression that returns a substring.
    
    Parameters:
    startPos - starting position.
    len - length of the substring.
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
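
    The parameter descriptions do not say whether `startPos` counts from 0 or 1. The sketch below assumes the SQL convention of a 1-based start position and models only positive positions. `SubstrSketch` is a hypothetical plain-Java helper, not Spark code.

```java
// Plain-Java model of a 1-based substring (assumed SQL convention); not Spark code.
final class SubstrSketch {
    static String substr(String value, int startPos, int len) {
        if (startPos < 1 || len < 0) {
            throw new IllegalArgumentException("only positive positions are modelled");
        }
        // Convert the 1-based position to Java's 0-based index and clip to the string.
        int begin = Math.min(startPos - 1, value.length());
        int end = (int) Math.min((long) begin + len, (long) value.length());
        return value.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(substr("Spark SQL", 1, 5));   // prints Spark
        System.out.println(substr("Spark SQL", 7, 3));   // prints SQL
        System.out.println(substr("Spark SQL", 7, 100)); // prints SQL: length is clipped
    }
}
```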
  - contains
```
public Column contains(Object other)
```
    Contains the other element.
    
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - startsWith
```
public Column startsWith(Column other)
```
    String starts with.
    
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - startsWith
```
public Column startsWith(String literal)
```
    String starts with another string literal.
    
    Parameters:
    literal - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - endsWith
```
public Column endsWith(Column other)
```
    String ends with.
    
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - endsWith
```
public Column endsWith(String literal)
```
    String ends with another string literal.
    
    Parameters:
    literal - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - alias
```
public Column alias(String alias)
```
    Gives the column an alias. Same as `as`.
```
   // Renames colA to colB in select output.
   df.select($"colA".alias("colB"))
 
```
    Parameters:
    alias - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - as
```
public Column as(String alias)
```
    Gives the column an alias.
```
   // Renames colA to colB in select output.
   df.select($"colA".as("colB"))
 
```
    If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use `as` with explicitly empty metadata.
    Parameters:
    alias - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - as
```
public Column as(scala.collection.Seq<String> aliases)
```
    (Scala-specific) Assigns the given aliases to the results of a table generating function.
```
   // Assigns the aliases "key" and "value" to the columns produced by explode.
   df.select(explode($"myMap").as("key" :: "value" :: Nil))
 
```
    Parameters:
    aliases - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - as
```
public Column as(String[] aliases)
```
    Assigns the given aliases to the results of a table generating function.
```
   // Assigns the aliases "key" and "value" to the columns produced by explode.
   df.select(explode($"myMap").as(Array("key", "value")))
 
```
    Parameters:
    aliases - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - as
```
public Column as(scala.Symbol alias)
```
    Gives the column an alias.
```
   // Renames colA to colB in select output.
   df.select($"colA".as('colB))
 
```
    If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use `as` with explicitly empty metadata.
    Parameters:
    alias - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - as
```
public Column as(String alias,
        Metadata metadata)
```
    Gives the column an alias with metadata.
```
   val metadata: Metadata = ...
   df.select($"colA".as("colB", metadata))
 
```
    Parameters:
    alias - (undocumented)
    metadata - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - cast
```
public Column cast(DataType to)
```
    Casts the column to a different data type.
```
   // Casts colA to IntegerType.
   import org.apache.spark.sql.types.IntegerType
   df.select(df("colA").cast(IntegerType))

   // equivalent to
   df.select(df("colA").cast("int"))
 
```
    Parameters:
    to - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - cast
```
public Column cast(String to)
```
    Casts the column to a different data type, using the canonical string representation of the type. The supported types are: string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.
```
   // Casts colA to integer.
   df.select(df("colA").cast("int"))
 
```
    Parameters:
    to - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - desc
```
public Column desc()
```
    Returns an ordering used in sorting.
```
   // Scala: sort a DataFrame by age column in descending order.
   df.sort(df("age").desc)

   // Java
   df.sort(df.col("age").desc());
 
```
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - asc
```
public Column asc()
```
    Returns an ordering used in sorting.
```
   // Scala: sort a DataFrame by age column in ascending order.
   df.sort(df("age").asc)

   // Java
   df.sort(df.col("age").asc());
 
```
    Returns:
    (undocumented)
    Since:
    
    1.3.0
  - explain
```
public void explain(boolean extended)
```
    Prints the expression to the console for debugging purposes.
    
    Parameters:
    extended - (undocumented)
    Since:
    
    1.3.0
  - bitwiseOR
```
public Column bitwiseOR(Object other)
```
    Compute bitwise OR of this expression with another expression.
```
   df.select($"colA".bitwiseOR($"colB"))
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - bitwiseAND
```
public Column bitwiseAND(Object other)
```
    Compute bitwise AND of this expression with another expression.
```
   df.select($"colA".bitwiseAND($"colB"))
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
  - bitwiseXOR
```
public Column bitwiseXOR(Object other)
```
    Compute bitwise XOR of this expression with another expression.
```
   df.select($"colA".bitwiseXOR($"colB"))
 
```
    Parameters:
    other - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
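
    For integer operands, `bitwiseOR`, `bitwiseAND` and `bitwiseXOR` correspond to the familiar bit operators. The sketch below shows the three operations on plain Java ints. `BitwiseSketch` and its helpers are hypothetical names used for illustration, not Spark code.

```java
// Plain-Java illustration of the three bitwise operations; not Spark code.
final class BitwiseSketch {
    static int bitwiseOR(int a, int b) {
        return a | b; // a bit is set if it is set in either operand
    }

    static int bitwiseAND(int a, int b) {
        return a & b; // a bit is set only if it is set in both operands
    }

    static int bitwiseXOR(int a, int b) {
        return a ^ b; // a bit is set if it is set in exactly one operand
    }

    public static void main(String[] args) {
        int a = 0b1100; // 12
        int b = 0b1010; // 10
        System.out.println(bitwiseOR(a, b));  // prints 14 (0b1110)
        System.out.println(bitwiseAND(a, b)); // prints 8  (0b1000)
        System.out.println(bitwiseXOR(a, b)); // prints 6  (0b0110)
    }
}
```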
  - over
```
public Column over(WindowSpec window)
```
    Define a windowing column.
```
   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions.{avg, sum}

   val w = Window.partitionBy("name").orderBy("id")
   df.select(
     sum("price").over(w.rangeBetween(Long.MinValue, 2)),
     avg("price").over(w.rowsBetween(0, 4))
   )
 
```
    Parameters:
    window - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.4.0
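
    In the example above, `rowsBetween(0, 4)` describes a frame from the current row through the four rows that follow it. The sketch below models that frame for an average over a single partition that is already in order. `RowsBetweenSketch` and `avgOverRows` are hypothetical names, partitioning and ordering are left out, and the range-based frame is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of an average over a row-based window frame; not Spark code.
final class RowsBetweenSketch {
    /** Averages rows [i + start, i + end] for each row i, clipped to the partition. */
    static List<Double> avgOverRows(List<Double> prices, int start, int end) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < prices.size(); i++) {
            int from = Math.max(0, i + start);
            int to = Math.min(prices.size() - 1, i + end);
            if (to < from) {
                out.add(null); // empty frame: no rows to average
                continue;
            }
            double sum = 0.0;
            for (int j = from; j <= to; j++) {
                sum += prices.get(j);
            }
            out.add(sum / (to - from + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0, 6.0);
        // Frame: current row through the 4 following rows.
        System.out.println(avgOverRows(prices, 0, 4)); // prints [3.0, 4.0, 4.5, 5.0, 5.5, 6.0]
    }
}
```

    Rows near the end of the partition see a shorter frame, which is why the last averages are taken over fewer values.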
