pyspark.sql.functions.from_csv
pyspark.sql.functions.from_csv(col, schema, options=None)
CSV Function: Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
- col : Column or str
  A column or column name in CSV format.
- schema : Column or str
  A column, or a Python string literal with a schema in DDL format, to use when parsing the CSV column.
- options : dict, optional
  Options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.
Returns
- Column
  A column of parsed CSV values.
Examples
Example 1: Parsing a simple CSV string
>>> from pyspark.sql import functions as sf >>> data = [("1,2,3",)] >>> df = spark.createDataFrame(data, ("value",)) >>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show() +---------------+ |from_csv(value)| +---------------+ | {1, 2, 3}| +---------------+
Example 2: Using schema_of_csv to infer the schema
>>> from pyspark.sql import functions as sf >>> data = [("1,2,3",)] >>> value = data[0][0] >>> df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show() +---------------+ |from_csv(value)| +---------------+ | {1, 2, 3}| +---------------+
Example 3: Ignoring leading white space in the CSV string
>>> from pyspark.sql import functions as sf >>> data = [(" abc",)] >>> df = spark.createDataFrame(data, ("value",)) >>> options = {'ignoreLeadingWhiteSpace': True} >>> df.select(sf.from_csv(df.value, "s string", options)).show() +---------------+ |from_csv(value)| +---------------+ | {abc}| +---------------+
Example 4: Parsing a CSV string with a missing value
>>> from pyspark.sql import functions as sf >>> data = [("1,2,",)] >>> df = spark.createDataFrame(data, ("value",)) >>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show() +---------------+ |from_csv(value)| +---------------+ | {1, 2, NULL}| +---------------+
Example 5: Parsing a CSV string with a different delimiter
>>> from pyspark.sql import functions as sf >>> data = [("1;2;3",)] >>> df = spark.createDataFrame(data, ("value",)) >>> options = {'delimiter': ';'} >>> df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show() +---------------+ |from_csv(value)| +---------------+ | {1, 2, 3}| +---------------+