NaiveBayes (Spark 3.0.0-preview2 JavaDoc)

Object
- org.apache.spark.mllib.classification.NaiveBayes

All Implemented Interfaces:

java.io.Serializable, Logging
```
public class NaiveBayes
extends Object
implements scala.Serializable, Logging
```
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is multinomial Naive Bayes (NB), which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB. The input feature values must be nonnegative.
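To make the description above concrete, here is a minimal plain-Java sketch of how a multinomial NB model is commonly used to score a feature vector: each class gets its log prior plus the feature-weighted sum of per-feature log likelihoods, and the highest score wins. This is the textbook rule, not Spark's code. The class `MultinomialScoreSketch`, the method `predict`, and the arrays `logPrior` and `logTheta` are hypothetical names introduced only for illustration.

```java
// Plain-Java sketch (no Spark dependency); all names here are hypothetical.
final class MultinomialScoreSketch {

    /**
     * Scores each class as logPrior[c] + sum_j features[j] * logTheta[c][j]
     * and returns the index of the highest-scoring class.
     */
    static int predict(double[] logPrior, double[][] logTheta, double[] features) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < features.length; j++) {
                // Mirrors the documented requirement that inputs be nonnegative.
                if (features[j] < 0.0) {
                    throw new IllegalArgumentException("feature values must be nonnegative");
                }
                score += features[j] * logTheta[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] logPrior = {Math.log(0.5), Math.log(0.5)};
        double[][] logTheta = {
            {Math.log(0.9), Math.log(0.1)},
            {Math.log(0.1), Math.log(0.9)}
        };
        // Counts concentrated on feature 0, then on feature 1.
        System.out.println(predict(logPrior, logTheta, new double[] {3.0, 0.0}));
        System.out.println(predict(logPrior, logTheta, new double[] {0.0, 2.0}));
    }
}
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the sketch sums logarithms instead of multiplying probabilities.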

See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`NaiveBayes()`
`NaiveBayes(double lambda)`

Method Summary

Modifier and Type	Method and Description
`double`	`getLambda()` Get the smoothing parameter.
`String`	`getModelType()` Get the model type.
`NaiveBayesModel`	`run(RDD<LabeledPoint> data)` Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
`NaiveBayes`	`setLambda(double lambda)` Set the smoothing parameter.
`NaiveBayes`	`setModelType(String modelType)` Set the model type using a string (case-sensitive).
`static NaiveBayesModel`	`train(RDD<LabeledPoint> input)` Trains a multinomial Naive Bayes model given an RDD of `(label, features)` pairs, using the default smoothing parameter of 1.0.
`static NaiveBayesModel`	`train(RDD<LabeledPoint> input, double lambda)` Trains a multinomial Naive Bayes model given an RDD of `(label, features)` pairs, using the given smoothing parameter.
`static NaiveBayesModel`	`train(RDD<LabeledPoint> input, double lambda, String modelType)` Trains a Naive Bayes model of the given type (`"multinomial"` or `"bernoulli"`) given an RDD of `(label, features)` pairs.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - NaiveBayes
```
public NaiveBayes(double lambda)
```
  - NaiveBayes
```
public NaiveBayes()
```
- Method Detail
  - train
```
public static NaiveBayesModel train(RDD<LabeledPoint> input)
```
    Trains a Naive Bayes model given an RDD of (label, features) pairs.
    This is the default multinomial NB, which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.
    This version of the method uses a default smoothing parameter of 1.0.
    
    Parameters:
    
    input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
    
    Returns:
    
    The trained NaiveBayesModel.
  - train
```
public static NaiveBayesModel train(RDD<LabeledPoint> input,
                                    double lambda)
```
    Trains a Naive Bayes model given an RDD of (label, features) pairs.
    This is the default multinomial NB, which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.
    
    Parameters:
    
    input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
    
    lambda - The smoothing parameter
    
    Returns:
    
    The trained NaiveBayesModel.
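
    As an illustration of what a smoothing parameter such as lambda typically does, the plain-Java sketch below applies standard additive (Laplace) smoothing to the feature counts of a single class. Whether Spark's internal computation matches this formula term for term is an assumption here, and `SmoothingSketch` and `smoothedLogTheta` are hypothetical names, not part of the Spark API.

```java
// Plain-Java sketch (no Spark dependency); all names here are hypothetical.
final class SmoothingSketch {

    /**
     * Returns log((count[j] + lambda) / (total + lambda * numFeatures)) for
     * every feature j, where total is the sum of all counts for one class.
     */
    static double[] smoothedLogTheta(double[] featureCounts, double lambda) {
        double total = 0.0;
        for (double count : featureCounts) {
            total += count;
        }
        double logDenominator = Math.log(total + lambda * featureCounts.length);
        double[] logTheta = new double[featureCounts.length];
        for (int j = 0; j < featureCounts.length; j++) {
            logTheta[j] = Math.log(featureCounts[j] + lambda) - logDenominator;
        }
        return logTheta;
    }

    public static void main(String[] args) {
        // Feature 1 was never observed for this class.
        double[] counts = {3.0, 0.0, 1.0};
        for (double value : smoothedLogTheta(counts, 1.0)) {
            System.out.println(Math.exp(value));
        }
    }
}
```

    With lambda set to 0.0, a feature that never occurs for a class gets a log likelihood of negative infinity in this sketch, so a single unseen feature would rule the class out. A positive lambda avoids that.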
  - train
```
public static NaiveBayesModel train(RDD<LabeledPoint> input,
                                    double lambda,
                                    String modelType)
```
    Trains a Naive Bayes model given an RDD of (label, features) pairs.
    The model type can be set to either multinomial NB or Bernoulli NB. Multinomial NB handles discrete count data and is selected by setting the model type to "multinomial". For example, it can be used with word counts or TF-IDF vectors of documents. The Bernoulli model fits presence or absence (0-1) counts. By making every vector a 0-1 vector and setting the model type to "bernoulli", the model fits and predicts as Bernoulli NB.
    
    Parameters:
    
    input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
    
    lambda - The smoothing parameter
    
    modelType - The type of NB model to fit, from the enumeration NaiveBayesModels; can be "multinomial" or "bernoulli".
    
    Returns:
    
    The trained NaiveBayesModel.
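
    The 0-1 requirement for the Bernoulli model can be met by binarizing count vectors before training. The plain-Java sketch below shows one way to do that and to check the result. `BernoulliPrepSketch`, `binarize`, and `isZeroOne` are hypothetical helpers written for this example, not Spark methods.

```java
// Plain-Java sketch (no Spark dependency); all names here are hypothetical.
final class BernoulliPrepSketch {

    /** Maps every positive count to 1.0 and every zero to 0.0. */
    static double[] binarize(double[] counts) {
        double[] result = new double[counts.length];
        for (int j = 0; j < counts.length; j++) {
            if (counts[j] < 0.0) {
                throw new IllegalArgumentException("feature values must be nonnegative");
            }
            result[j] = counts[j] > 0.0 ? 1.0 : 0.0;
        }
        return result;
    }

    /** Returns true only if every entry is exactly 0.0 or 1.0. */
    static boolean isZeroOne(double[] vector) {
        for (double value : vector) {
            if (value != 0.0 && value != 1.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] wordCounts = {0.0, 2.0, 5.0, 0.0};
        System.out.println(isZeroOne(wordCounts));
        System.out.println(isZeroOne(binarize(wordCounts)));
    }
}
```

    Binarizing discards how often a term occurs and keeps only whether it occurs, which is the information a Bernoulli model is described as fitting.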
  - setLambda
```
public NaiveBayes setLambda(double lambda)
```
    Set the smoothing parameter. Default: 1.0.
  - getLambda
```
public double getLambda()
```
    Get the smoothing parameter.
  - setModelType
```
public NaiveBayes setModelType(String modelType)
```
    Set the model type using a string (case-sensitive). Supported options: "multinomial" (default) and "bernoulli".
    
    Parameters:
    
    modelType - The model type, either "multinomial" or "bernoulli".
    
    Returns:
    
    This NaiveBayes instance.
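
    Because the model type string is case-sensitive, a value such as "Multinomial" is not one of the supported options. The plain-Java sketch below shows a strict check of that kind. `ModelTypeSketch` and `requireSupported` are hypothetical names, and throwing an IllegalArgumentException for an unsupported value is an assumption of this sketch, not documented behavior.

```java
// Plain-Java sketch (no Spark dependency); all names here are hypothetical.
final class ModelTypeSketch {

    /** Returns modelType unchanged if it exactly matches a supported option. */
    static String requireSupported(String modelType) {
        // String.equals is case-sensitive, so "Bernoulli" does not match.
        if ("multinomial".equals(modelType) || "bernoulli".equals(modelType)) {
            return modelType;
        }
        throw new IllegalArgumentException("unknown model type: " + modelType);
    }

    public static void main(String[] args) {
        System.out.println(requireSupported("multinomial"));
        System.out.println(requireSupported("bernoulli"));
    }
}
```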
  - getModelType
```
public String getModelType()
```
    Get the model type.
  - run
```
public NaiveBayesModel run(RDD<LabeledPoint> data)
```
    Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
    
    Parameters:
    
    data - RDD of LabeledPoint.
    
    Returns:
    
    The trained NaiveBayesModel.
