public class PartitionPruning
extends Object
The basic mechanism for DPP inserts a duplicated subquery with the filter from the other side, when the following conditions are met: (1) the table to prune is partitioned by the JOIN key (2) the join operation is one of the following types: INNER, LEFT SEMI (partitioned on left), LEFT OUTER (partitioned on right), or RIGHT OUTER (partitioned on left)
In order to enable partition pruning directly in broadcasts, we use a custom DynamicPruning clause that incorporates the In clause with the subquery and the benefit estimation. During query planning, when the join type is known, we use the following mechanism: (1) if the join is a broadcast hash join, we replace the duplicated subquery with the reused results of the broadcast, (2) else if the estimated benefit of partition pruning outweighs the overhead of running the subquery query twice, we keep the duplicated subquery (3) otherwise, we drop the subquery.
Constructor and Description |
---|
PartitionPruning() |
Modifier and Type | Method and Description |
---|---|
static org.apache.spark.sql.catalyst.plans.logical.LogicalPlan |
apply(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
static scala.Option<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Expression,org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> |
findExpressionAndTrackLineageDown(org.apache.spark.sql.catalyst.expressions.Expression exp,
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
static scala.Option<org.apache.spark.sql.execution.datasources.LogicalRelation> |
getPartitionTableScan(org.apache.spark.sql.catalyst.expressions.Expression a,
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
Search the partitioned table scan for a given partition column in a logical plan
|
static void |
org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
static org.slf4j.Logger |
org$apache$spark$internal$Logging$$log_() |
static String |
ruleName() |
public static scala.Option<org.apache.spark.sql.execution.datasources.LogicalRelation> getPartitionTableScan(org.apache.spark.sql.catalyst.expressions.Expression a, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
a
- (undocumented)plan
- (undocumented)public static org.apache.spark.sql.catalyst.plans.logical.LogicalPlan apply(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)
public static String ruleName()
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
public static scala.Option<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Expression,org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> findExpressionAndTrackLineageDown(org.apache.spark.sql.catalyst.expressions.Expression exp, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)