pyspark.sql.streaming.DataStreamWriter.format
- DataStreamWriter.format(source)
Specifies the underlying output data source.
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
- Parameters
- source : str
Name of the output data source, for example "parquet", "csv", or "text".
Notes
This API is evolving.
Examples
>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.format("text")
<...streaming.readwriter.DataStreamWriter object ...>
This API lets you configure the output data source. The example below writes CSV files from the Rate source in a streaming manner.
>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="format1") as d:
...     with tempfile.TemporaryDirectory(prefix="format2") as cp:
...         df = spark.readStream.format("rate").load()
...         q = df.writeStream.format("csv").option("checkpointLocation", cp).start(d)
...         time.sleep(5)
...         q.stop()
...         spark.read.schema("timestamp TIMESTAMP, value STRING").csv(d).show()
+...---------+-----+
|...timestamp|value|
+...---------+-----+
...
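Because the format name is an ordinary string, the call chain from the CSV example above can be wrapped in a small helper that takes the format as a parameter. The following is a minimal sketch, not part of the PySpark API: `start_file_sink` and `demo` are hypothetical names introduced here for illustration, and the sketch assumes a file-based sink that needs an explicit checkpoint location.

```python
from typing import Any


def start_file_sink(df: Any, fmt: str, output_dir: str, checkpoint_dir: str) -> Any:
    """Start a streaming query that writes `df` to `output_dir` as `fmt`.

    Hypothetical helper: `df` is expected to be a streaming DataFrame, but the
    function relies only on the writeStream -> format -> option -> start chain.
    """
    # Select the output data source by name, e.g. "csv", "json", or "parquet".
    writer = df.writeStream.format(fmt)
    # Assumption: the sink is file-based, so a checkpoint location is supplied.
    writer = writer.option("checkpointLocation", checkpoint_dir)
    # start() launches the query and returns a handle that can be stopped.
    return writer.start(output_dir)


def demo() -> None:
    """Run the helper against the built-in rate source (requires pyspark)."""
    import tempfile
    import time

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("format-demo").getOrCreate()
    with tempfile.TemporaryDirectory() as out, tempfile.TemporaryDirectory() as cp:
        df = spark.readStream.format("rate").load()
        query = start_file_sink(df, "json", out, cp)
        time.sleep(5)  # let a few micro-batches run
        query.stop()
    spark.stop()
```

Calling `demo()` needs a local Spark installation. Switching the sink is then a matter of passing a different format string to `start_file_sink`.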