pyspark.sql.plot.core.PySparkPlotAccessor.box
- PySparkPlotAccessor.box(column=None, **kwargs)
Make a box plot of the DataFrame columns.
Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot graphically depicts groups of numerical data through their quartiles. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Data points beyond the whiskers are outliers and are plotted as separate dots.
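The quartile and whisker rules described above can be sketched in plain Python. This is only an illustration: the helper name `box_stats` is hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption. PySpark computes approximate quantiles, so the values it plots may differ slightly. The input is the `english_score` column from the Examples section.

```python
import statistics


def box_stats(values, whis=1.5):
    """Compute box plot statistics (hypothetical helper, not part of PySpark)."""
    data = sorted(values)
    # Exact quartiles by linear interpolation; PySpark uses approximate ones.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: the farthest a whisker may reach from the edges of the box.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the farthest data points within the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Points beyond the fences are drawn as separate dots.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


english_scores = [55, 60, 65, 70, 75, 15, 90, 150]
stats = box_stats(english_scores)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With these inputs the box spans 58.75 to 78.75 (IQR = 20), so the whiskers may reach at most 30 units beyond each edge. They end at 55 and 90, and the scores 15 and 150 fall outside that interval and count as outliers.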
- Parameters
- column: str or list of str, optional
Column name or list of names to be used for creating the box plot. If None (default), all numeric columns will be used.
- **kwargs
Additional keyword arguments. precision: a float used by PySpark to compute the approximate statistics for building the box plot. The default value is 0.01. Use smaller values to get more precise statistics.
- Returns
plotly.graph_objs.Figure
Examples
>>> data = [
...     ("A", 50, 55),
...     ("B", 55, 60),
...     ("C", 60, 65),
...     ("D", 65, 70),
...     ("E", 70, 75),
...     ("F", 10, 15),
...     ("G", 85, 90),
...     ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.box()
>>> df.plot.box(column="math_score")
>>> df.plot.box(column=["math_score", "english_score"])