pyspark.sql.functions.from_csv

pyspark.sql.functions.from_csv(col, schema, options=None)

CSV Function: Parses a column containing a CSV string into a row (struct) with the specified schema. Returns null if the string cannot be parsed.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

A column or column name in CSV format.

schema : Column or str

A column, or Python string literal with schema in DDL format, to use when parsing the CSV column.

options : dict, optional

Options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.

Returns
Column

A new struct column of parsed CSV values.

Examples

Example 1: Parsing a simple CSV string

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 2: Using schema_of_csv to infer the schema

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> value = data[0][0]
>>> df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 3: Ignoring leading white space in the CSV string

>>> from pyspark.sql import functions as sf
>>> data = [("   abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(sf.from_csv(df.value, "s string", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|          {abc}|
+---------------+

Example 4: Parsing a CSV string with a missing value

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|   {1, 2, NULL}|
+---------------+

Example 5: Parsing a CSV string with a different delimiter

>>> from pyspark.sql import functions as sf
>>> data = [("1;2;3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'delimiter': ';'}
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+