pyspark.sql.functions.from_csv

pyspark.sql.functions.from_csv(col, schema, options=None)

CSV Function: Parses a column containing a CSV string into a row (struct) with the specified schema. Returns null if the string cannot be parsed.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

A column or column name in CSV format.

schema : Column or str

A column, or Python string literal with schema in DDL format, to use when parsing the CSV column.

options : dict, optional

Options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.

Returns
Column

A new struct column of parsed CSV values.

Examples

Example 1: Parsing a simple CSV string

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 2: Using schema_of_csv to infer the schema

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> value = data[0][0]
>>> df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 3: Ignoring leading white space in the CSV string

>>> from pyspark.sql import functions as sf
>>> data = [("   abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(sf.from_csv(df.value, "s string", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|          {abc}|
+---------------+

Example 4: Parsing a CSV string with a missing value

>>> from pyspark.sql import functions as sf
>>> data = [("1,2,",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|   {1, 2, NULL}|
+---------------+

Example 5: Parsing a CSV string with a different delimiter

>>> from pyspark.sql import functions as sf
>>> data = [("1;2;3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'delimiter': ';'}
>>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+