pyspark.sql.plot.core.PySparkPlotAccessor.box

PySparkPlotAccessor.box(column=None, **kwargs)

Make a box plot of the DataFrame columns.

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot graphically depicts groups of numerical data through their quartiles. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
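The quartile and whisker rules above can be worked through by hand. The following is a minimal sketch in plain Python, not PySpark's implementation: it uses the `english_score` values from the Examples section and assumes exact, linearly interpolated quartiles from the standard library's `statistics.quantiles`. PySpark computes approximate statistics, so the values it plots may differ.

```python
from statistics import quantiles

# english_score values from the example data on this page
scores = [55, 60, 65, 70, 75, 15, 90, 150]

# Exact quartiles with linear interpolation (an assumption for illustration;
# PySpark computes approximate statistics, so its values may differ).
q1, median, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Fences: whiskers may extend at most 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the farthest data points that lie within the fences.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

# Anything outside the fences is drawn as a separate outlier point.
outliers = sorted(x for x in scores if x < lower_fence or x > upper_fence)

print(q1, median, q3)                # 58.75 67.5 78.75
print(lower_whisker, upper_whisker)  # 55 90
print(outliers)                      # [15, 150]
```

Under these assumptions the box spans 58.75 to 78.75, the whiskers stop at 55 and 90, and the scores 15 and 150 fall outside the fences, so they would appear as separate dots.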

Parameters
column: str or list of str, optional

Column name or list of names to be used for creating the box plot. If None (default), all numeric columns will be used.

**kwargs

Extra keyword arguments. precision: a float that PySpark uses to compute the approximate statistics for building the box plot. The default value is 0.01. Use smaller values to get more precise statistics.

Returns
plotly.graph_objs.Figure

Examples

>>> data = [
...     ("A", 50, 55),
...     ("B", 55, 60),
...     ("C", 60, 65),
...     ("D", 65, 70),
...     ("E", 70, 75),
...     ("F", 10, 15),
...     ("G", 85, 90),
...     ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.box()
>>> df.plot.box(column="math_score")
>>> df.plot.box(column=["math_score", "english_score"])