pyspark.sql.streaming.DataStreamWriter.format

DataStreamWriter.format(source)

Specifies the underlying output data source.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters

source : str
    Name of the output data source, for example ‘parquet’, ‘csv’, or ‘text’.

Notes

This API is evolving.

Examples

>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.format("text")
<...streaming.readwriter.DataStreamWriter object ...>

This API configures the data source that the stream is written to. The example below writes CSV files from the Rate source in a streaming manner.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="format1") as d:
...     with tempfile.TemporaryDirectory(prefix="format2") as cp:
...         df = spark.readStream.format("rate").load()
...         q = df.writeStream.format("csv").option("checkpointLocation", cp).start(d)
...         time.sleep(5)
...         q.stop()
...         spark.read.schema("timestamp TIMESTAMP, value STRING").csv(d).show()
+...---------+-----+
|...timestamp|value|
+...---------+-----+
...
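
Because the sink is selected by a plain string, the format name can be passed in as a parameter. The following is a minimal sketch, not part of the PySpark API: the helper name `write_stream_as` and its arguments are hypothetical, and it assumes a file-based sink that takes an output path and a checkpoint location, as in the CSV example above.

```python
def write_stream_as(df, sink_format, output_path, checkpoint_path):
    """Hypothetical helper: start a streaming query with a caller-chosen sink.

    `df` is expected to be a streaming DataFrame, for example one created
    with spark.readStream.format("rate").load().
    """
    # Select the output data source by name, e.g. "csv", "json" or "parquet".
    writer = df.writeStream.format(sink_format)
    # Streaming file sinks need a checkpoint location to track progress.
    writer = writer.option("checkpointLocation", checkpoint_path)
    # start() launches the query and returns a handle that can later be stopped.
    return writer.start(output_path)
```

With a streaming DataFrame `df` and two temporary directories `d` and `cp`, calling `write_stream_as(df, "json", d, cp)` would play the same role as the `writeStream` chain in the CSV example, with only the format name changed.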