Inequality test.
Inequality test.
// Scala: df.select( df("colA") !== df("colB") ) df.select( !(df("colA") === df("colB")) ) // Java: import static org.apache.spark.sql.functions.*; df.filter( col("colA").notEqual(col("colB")) );
Modulo (a.
Modulo (a.k.a. remainder) expression.
Boolean AND.
Boolean AND.
// Scala: The following selects people that are in school and employed at the same time. people.select( people("inSchool") && people("isEmployed") ) // Java: people.select( people("inSchool").and(people("isEmployed")) );
Multiplication of this expression and another expression.
Multiplication of this expression and another expression.
// Scala: The following multiplies a person's height by their weight. people.select( people("height") * people("weight") ) // Java: people.select( people("height").multiply(people("weight")) );
Sum of this expression and another expression.
Sum of this expression and another expression.
// Scala: The following selects the sum of a person's height and weight. people.select( people("height") + people("weight") ) // Java: people.select( people("height").plus(people("weight")) );
Subtraction.
Subtraction. Subtract the other expression from this expression.
// Scala: The following selects the difference between people's height and their weight. people.select( people("height") - people("weight") ) // Java: people.select( people("height").minus(people("weight")) );
Division this expression by another expression.
Division this expression by another expression.
// Scala: The following divides a person's height by their weight. people.select( people("height") / people("weight") ) // Java: people.select( people("height").divide(people("weight")) );
Less than.
Less than.
// Scala: The following selects people younger than 21. people.select( people("age") < 21 ) // Java: people.select( people("age").lt(21) );
Less than or equal to.
Less than or equal to.
// Scala: The following selects people age 21 or younger than 21. people.select( people("age") <= 21 ) // Java: people.select( people("age").leq(21) );
Equality test that is safe for null values.
Equality test.
Equality test.
// Scala: df.filter( df("colA") === df("colB") ) // Java import static org.apache.spark.sql.functions.*; df.filter( col("colA").equalTo(col("colB")) );
Greater than.
Greater than.
// Scala: The following selects people older than 21. people.select( people("age") > 21 ) // Java: import static org.apache.spark.sql.functions.*; people.select( people("age").gt(21) );
Greater than or equal to an expression.
Greater than or equal to an expression.
// Scala: The following selects people age 21 or older than 21. people.select( people("age") >= 21 ) // Java: people.select( people("age").geq(21) )
Boolean AND.
Boolean AND.
// Scala: The following selects people that are in school and employed at the same time. people.select( people("inSchool") && people("isEmployed") ) // Java: people.select( people("inSchool").and(people("isEmployed")) );
Gives the column an alias.
Gives the column an alias.
// Renames colA to colB in select output. df.select($"colA".as('colB))
Gives the column an alias.
Gives the column an alias.
// Renames colA to colB in select output. df.select($"colA".as("colB"))
Returns an ordering used in sorting.
Returns an ordering used in sorting.
// Scala: sort a DataFrame by age column in ascending order. df.sort(df("age").asc) // Java df.sort(df.col("age").asc());
Casts the column to a different data type, using the canonical string representation of the type.
Casts the column to a different data type, using the canonical string representation
of the type. The supported types are: string
, boolean
, byte
, short
, int
, long
,
float
, double
, decimal
, date
, timestamp
.
// Casts colA to integer. df.select(df("colA").cast("int"))
Casts the column to a different data type.
Casts the column to a different data type.
// Casts colA to IntegerType. import org.apache.spark.sql.types.IntegerType df.select(df("colA").cast(IntegerType)) // equivalent to df.select(df("colA").cast("int"))
Contains the other element.
Returns an ordering used in sorting.
Returns an ordering used in sorting.
// Scala: sort a DataFrame by age column in descending order. df.sort(df("age").desc) // Java df.sort(df.col("age").desc());
Division this expression by another expression.
Division this expression by another expression.
// Scala: The following divides a person's height by their weight. people.select( people("height") / people("weight") ) // Java: people.select( people("height").divide(people("weight")) );
String ends with another string literal.
String ends with.
Equality test that is safe for null values.
Equality test.
Equality test.
// Scala: df.filter( df("colA") === df("colB") ) // Java import static org.apache.spark.sql.functions.*; df.filter( col("colA").equalTo(col("colB")) );
Prints the expression to the console for debugging purpose.
Greater than or equal to an expression.
Greater than or equal to an expression.
// Scala: The following selects people age 21 or older than 21. people.select( people("age") >= 21 ) // Java: people.select( people("age").geq(21) )
An expression that gets a field by name in a StructField.
An expression that gets an item at position ordinal
out of an array.
Greater than.
Greater than.
// Scala: The following selects people older than 21. people.select( people("age") > lit(21) ) // Java: import static org.apache.spark.sql.functions.*; people.select( people("age").gt(21) );
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
True if the current expression is NOT null.
True if the current expression is null.
Less than or equal to.
Less than or equal to.
// Scala: The following selects people age 21 or younger than 21. people.select( people("age") <= 21 ) // Java: people.select( people("age").leq(21) );
SQL like expression.
Less than.
Less than.
// Scala: The following selects people younger than 21. people.select( people("age") < 21 ) // Java: people.select( people("age").lt(21) );
Subtraction.
Subtraction. Subtract the other expression from this expression.
// Scala: The following selects the difference between people's height and their weight. people.select( people("height") - people("weight") ) // Java: people.select( people("height").minus(people("weight")) );
Modulo (a.
Modulo (a.k.a. remainder) expression.
Multiplication of this expression and another expression.
Multiplication of this expression and another expression.
// Scala: The following multiplies a person's height by their weight. people.select( people("height") * people("weight") ) // Java: people.select( people("height").multiply(people("weight")) );
Inequality test.
Inequality test.
// Scala: df.select( df("colA") !== df("colB") ) df.select( !(df("colA") === df("colB")) ) // Java: import static org.apache.spark.sql.functions.*; df.filter( col("colA").notEqual(col("colB")) );
Boolean OR.
Boolean OR.
// Scala: The following selects people that are in school or employed. people.filter( people("inSchool") || people("isEmployed") ) // Java: people.filter( people("inSchool").or(people("isEmployed")) );
Sum of this expression and another expression.
Sum of this expression and another expression.
// Scala: The following selects the sum of a person's height and weight. people.select( people("height") + people("weight") ) // Java: people.select( people("height").plus(people("weight")) );
SQL RLIKE expression (LIKE with Regex).
String starts with another string literal.
String starts with.
An expression that returns a substring.
An expression that returns a substring.
starting position.
length of the substring.
An expression that returns a substring.
An expression that returns a substring.
expression for the starting position.
expression for the length of the substring.
Inversion of boolean expression, i.
Inversion of boolean expression, i.e. NOT. {{ // Scala: select rows that are not active (isActive === false) df.filter( !df("isActive") )
// Java: import static org.apache.spark.sql.functions.*; df.filter( not(df.col("isActive")) ); }}
Unary minus, i.
Unary minus, i.e. negate the expression.
// Scala: select the amount column and negates all values. df.select( -df("amount") ) // Java: import static org.apache.spark.sql.functions.*; df.select( negate(col("amount") );
Boolean OR.
Boolean OR.
// Scala: The following selects people that are in school or employed. people.filter( people("inSchool") || people("isEmployed") ) // Java: people.filter( people("inSchool").or(people("isEmployed")) );
:: Experimental :: A column in a DataFrame.