public class Pipeline extends Estimator<PipelineModel>

A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When fit(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamMap) is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>...) method is called on the input dataset to fit a model. The model, which is a transformer, is then used to transform the dataset, and the result becomes the input to the next stage. If a stage is a Transformer, its Transformer.transform(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>...) method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of fitted models and transformers corresponding to the pipeline stages. If there are no stages, the pipeline acts as an identity transformer.

Constructor and Description |
---|
Pipeline() |
Modifier and Type | Method and Description |
---|---|
PipelineModel | fit(DataFrame dataset, ParamMap paramMap) - Fits the pipeline to the input dataset with additional parameters. |
PipelineStage[] | getStages() |
Pipeline | setStages(PipelineStage[] value) |
Param<PipelineStage[]> | stages() - param for pipeline stages |
org.apache.spark.sql.types.StructType | transformSchema(org.apache.spark.sql.types.StructType schema, ParamMap paramMap) - :: DeveloperAPI :: |

Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from Params:
addOutputColumn, checkInputColumn, explainParams, get, getParam, isSet, paramMap, params, set, set, validate, validate
Methods inherited from Identifiable:
uid
Methods inherited from Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public Param<PipelineStage[]> stages()
public Pipeline setStages(PipelineStage[] value)
public PipelineStage[] getStages()
public PipelineModel fit(DataFrame dataset, ParamMap paramMap)
Fits the pipeline to the input dataset with additional parameters. If a stage is an Estimator, its Estimator.fit(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>...) method is called on the input dataset to fit a model. The model, which is a transformer, is then used to transform the dataset, and the result becomes the input to the next stage. If a stage is a Transformer, its Transformer.transform(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>...) method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of fitted models and transformers corresponding to the pipeline stages. If there are no stages, the output model acts as an identity transformer.
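The stage-by-stage procedure described above can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: `SimpleEstimator`, `SimpleTransformer`, `MeanCenterer`, and `Scaler` are hypothetical stand-ins invented for illustration, a "dataset" is modeled as a `List<Double>`, and none of this is the Spark implementation or the Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the Spark ML types.
// A "dataset" is modeled as a List<Double>.
class PipelineFitSketch {

    interface Stage {}

    // A transformer maps a dataset to a new dataset.
    interface SimpleTransformer extends Stage {
        List<Double> transform(List<Double> dataset);
    }

    // An estimator learns from a dataset and returns a fitted model (a transformer).
    interface SimpleEstimator extends Stage {
        SimpleTransformer fit(List<Double> dataset);
    }

    // Hypothetical estimator: learns the mean and returns a model that subtracts it.
    static final class MeanCenterer implements SimpleEstimator {
        @Override
        public SimpleTransformer fit(List<Double> dataset) {
            double mean = dataset.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            return data -> {
                List<Double> out = new ArrayList<>();
                for (double x : data) out.add(x - mean);
                return out;
            };
        }
    }

    // Hypothetical transformer: multiplies every value by a fixed factor.
    static final class Scaler implements SimpleTransformer {
        private final double factor;
        Scaler(double factor) { this.factor = factor; }
        @Override
        public List<Double> transform(List<Double> dataset) {
            List<Double> out = new ArrayList<>();
            for (double x : dataset) out.add(x * factor);
            return out;
        }
    }

    // Runs the stages in order: estimators are fitted on the current dataset,
    // transformers are used as-is, and each resulting transformer produces
    // the dataset seen by the next stage.
    static List<SimpleTransformer> fit(List<Stage> stages, List<Double> dataset) {
        List<SimpleTransformer> fitted = new ArrayList<>();
        List<Double> current = dataset;
        for (Stage stage : stages) {
            SimpleTransformer transformer;
            if (stage instanceof SimpleEstimator) {
                transformer = ((SimpleEstimator) stage).fit(current);
            } else if (stage instanceof SimpleTransformer) {
                transformer = (SimpleTransformer) stage;
            } else {
                throw new IllegalArgumentException("Stage is neither an estimator nor a transformer");
            }
            current = transformer.transform(current);
            fitted.add(transformer);
        }
        return fitted;
    }

    // Applies the fitted transformers in order; an empty list leaves the data unchanged.
    static List<Double> transform(List<SimpleTransformer> model, List<Double> dataset) {
        List<Double> current = dataset;
        for (SimpleTransformer t : model) current = t.transform(current);
        return current;
    }

    public static void main(String[] args) {
        List<Double> data = List.of(1.0, 2.0, 3.0);
        List<Stage> stages = List.of(new Scaler(2.0), new MeanCenterer());
        List<SimpleTransformer> model = fit(stages, data);
        System.out.println(transform(model, data));                 // [-2.0, 0.0, 2.0]
        System.out.println(transform(fit(List.of(), data), data));  // [1.0, 2.0, 3.0]
    }
}
```

The second call in `main` illustrates the empty-pipeline case: with no stages, the fitted model returns its input unchanged, which mirrors the identity-transformer behavior stated in the description.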
Specified by:
fit in class Estimator<PipelineModel>
Parameters:
dataset - input dataset
paramMap - parameter map

public org.apache.spark.sql.types.StructType transformSchema(org.apache.spark.sql.types.StructType schema, ParamMap paramMap)
Description copied from class: PipelineStage
Derives the output schema from the input schema and parameters. The schema describes the columns and types of the data.
Specified by:
transformSchema in class PipelineStage
Parameters:
schema - Input schema to this stage
paramMap - Parameters passed to this stage
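
For a pipeline, one natural reading of this contract is that the schema is passed through each stage in turn, so that each stage sees the output schema of the previous one. The page does not state this explicitly, so the sketch below is an assumption rather than a description of the actual implementation. The schema is modeled as an ordered map from column name to type name instead of `StructType`, and `SchemaStage` and `AddColumn` are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: a schema is an ordered map of column name -> type name.
// This is NOT Spark's StructType, and the chaining below is an assumed behavior.
class SchemaChainSketch {

    interface SchemaStage {
        Map<String, String> transformSchema(Map<String, String> schema);
    }

    // Hypothetical stage: requires one input column and appends one output column.
    static final class AddColumn implements SchemaStage {
        private final String inputCol;
        private final String outputCol;
        private final String outputType;

        AddColumn(String inputCol, String outputCol, String outputType) {
            this.inputCol = inputCol;
            this.outputCol = outputCol;
            this.outputType = outputType;
        }

        @Override
        public Map<String, String> transformSchema(Map<String, String> schema) {
            if (!schema.containsKey(inputCol)) {
                throw new IllegalArgumentException("Missing input column: " + inputCol);
            }
            if (schema.containsKey(outputCol)) {
                throw new IllegalArgumentException("Output column already exists: " + outputCol);
            }
            Map<String, String> out = new LinkedHashMap<>(schema); // copy, keep column order
            out.put(outputCol, outputType);
            return out;
        }
    }

    // Passes the schema through every stage in order and returns the final schema.
    static Map<String, String> transformSchema(List<SchemaStage> stages, Map<String, String> schema) {
        Map<String, String> current = schema;
        for (SchemaStage stage : stages) current = stage.transformSchema(current);
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("text", "string");
        List<SchemaStage> stages = List.of(
            new AddColumn("text", "words", "array<string>"),
            new AddColumn("words", "features", "vector"));
        System.out.println(transformSchema(stages, input).keySet()); // [text, words, features]
    }
}
```

Checking the schema this way surfaces a missing or duplicated column before any data is processed, since a stage that expects an absent column fails immediately.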