pyspark.sql.Catalog.refreshTable

Catalog.refreshTable(tableName)
Invalidates and refreshes all the cached data and metadata of the given table.
New in version 2.0.0.
- Parameters
- tableName : str
name of the table to refresh.
Changed in version 3.4.0: Allow tableName to be qualified with catalog name.
Examples
The example below caches a table, and then removes the data.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="refreshTable") as d:
...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
...     _ = spark.sql(
...         "CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
...     spark.catalog.cacheTable("tbl1")
...     spark.table("tbl1").show()
+---+
|col|
+---+
|abc|
+---+
Because the table is cached, the count below is computed from the cached data, even though the temporary directory backing the table has been removed.

>>> spark.table("tbl1").count()
1
After refreshing the table, the count is 0 because the underlying data no longer exists.

>>> spark.catalog.refreshTable("tbl1")
>>> spark.table("tbl1").count()
0
Using the fully qualified name for the table.
>>> spark.catalog.refreshTable("spark_catalog.default.tbl1")
>>> _ = spark.sql("DROP TABLE tbl1")