pyspark.sql.Catalog#
- class pyspark.sql.Catalog(sparkSession)[source]#
User-facing catalog API, accessible through SparkSession.catalog. This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.

Changed in version 3.4.0: Supports Spark Connect.

Methods

- cacheTable(tableName[, storageLevel]): Caches the specified table in-memory or with the given storage level.
- clearCache(): Removes all cached tables from the in-memory cache.
- createExternalTable(tableName[, path, ...]): Creates a table based on the dataset in a data source.
- createTable(tableName[, path, source, ...]): Creates a table based on the dataset in a data source.
- currentCatalog(): Returns the current default catalog in this session.
- currentDatabase(): Returns the current default database in this session.
- databaseExists(dbName): Check if the database with the specified name exists.
- dropGlobalTempView(viewName): Drops the global temporary view with the given view name in the catalog.
- dropTempView(viewName): Drops the local temporary view with the given view name in the catalog.
- functionExists(functionName[, dbName]): Check if the function with the specified name exists.
- getDatabase(dbName): Get the database with the specified name.
- getFunction(functionName): Get the function with the specified name.
- getTable(tableName): Get the table or view with the specified name.
- isCached(tableName): Returns true if the table is currently cached in-memory.
- listCatalogs([pattern]): Returns a list of catalogs in this session.
- listColumns(tableName[, dbName]): Returns a list of columns for the given table/view in the specified database.
- listDatabases([pattern]): Returns a list of databases available across all sessions.
- listFunctions([dbName, pattern]): Returns a list of functions registered in the specified database.
- listTables([dbName, pattern]): Returns a list of tables/views in the specified database.
- recoverPartitions(tableName): Recovers all the partitions of the given table and updates the catalog.
- refreshByPath(path): Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.
- refreshTable(tableName): Invalidates and refreshes all the cached data and metadata of the given table.
- registerFunction(name, f[, returnType]): An alias for spark.udf.register().
- setCurrentCatalog(catalogName): Sets the current default catalog in this session.
- setCurrentDatabase(dbName): Sets the current default database in this session.
- tableExists(tableName[, dbName]): Check if the table or view with the specified name exists.
- uncacheTable(tableName): Removes the specified table from the in-memory cache.