pyspark.sql.Catalog.refreshTable

Catalog.refreshTable(tableName)
Invalidates and refreshes all the cached data and metadata of the given table.
New in version 2.0.0.
- Parameters
- tableName : str
name of the table to refresh.
Changed in version 3.4.0: Allow tableName to be qualified with catalog name.
Examples
The example below caches a table, and then removes the data.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="refreshTable") as d:
...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
...     _ = spark.sql(
...         "CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
...     spark.catalog.cacheTable("tbl1")
...     spark.table("tbl1").show()
+---+
|col|
+---+
|abc|
+---+
Because the table is cached, the count below is computed from the cached data, even though the temporary directory backing the table has been removed.

>>> spark.table("tbl1").count()
1
After refreshing the table, the count is 0 because the underlying data no longer exists.

>>> spark.catalog.refreshTable("tbl1")
>>> spark.table("tbl1").count()
0
Using the fully qualified name for the table.
>>> spark.catalog.refreshTable("spark_catalog.default.tbl1")
>>> _ = spark.sql("DROP TABLE tbl1")