Using Dynamic Catalogs, Schemas, or Tables in Databricks SQL

Kyle Gibson
4 min read · Jun 16, 2023


Have you ever wanted to parameterize your catalog, schema, or table names in Databricks SQL? This article shows you how to make them dynamic.

Problem:

Suppose I have a sample schema with three different versions that represent development (_dv), user acceptance (_ua), and production (_pr):

Shows all sample schemas in hive_metastore catalog
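If you'd rather check from SQL than the catalog browser, something like this would list them (a sketch; the _ua and _pr schema names are inferred from the naming pattern above):

```sql
-- List the versions of the sample schema in the hive_metastore catalog
SHOW SCHEMAS IN hive_metastore LIKE 'sample_*';
-- Expected: sample_dv, sample_ua, sample_pr
```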

To select from any one of these three, I could hard-code the schema name in the Databricks SQL Editor:

Query to select * from the sample_dv schema
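In code, that hard-coded query looks roughly like this (the table name customers is a stand-in for whatever table the schema holds):

```sql
-- Schema name hard-coded to the development version
SELECT *
FROM hive_metastore.sample_dv.customers;
```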

But the SQL Editor has parameters, right? Let's use those to make this a little more dynamic. Parameters in Databricks SQL are created by wrapping the parameter name in double curly brackets:

Syntax error when using parameter in the schema name
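The failing attempt looks something like this (the parameter name Environment is just what I'd call it):

```sql
-- Using a {{ }} parameter inside the schema name; the SQL Editor rejects this with a syntax error
SELECT *
FROM hive_metastore.sample_{{Environment}}.customers;
```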

As you can see, the SQL Editor doesn't allow parameters in the schema name. The same error appears when you try to parameterize catalog or table names.

Let’s see how we can parameterize the schema name while still using SQL syntax.

Solution:

Solution 1: A Python Notebook

Note: This is not my preferred of the two solutions, but it works.

Step 1: Create a Python notebook:

Creation of python notebook

Step 2: Create a Python variable for the dynamic environment:

Python environment variable
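A minimal sketch of that step (the variable name environment and the value "dv" are my assumptions):

```python
# Environment suffix for this run: "dv", "ua", or "pr"
environment = "dv"
```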

Step 3: Use Spark SQL and the Python variable to build the dynamic query.
Note: I used a Python f-string to insert the variable, but use whatever string-formatting approach you prefer.

spark sql command with the python environment variable
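Something like the following, reusing that variable and the same stand-in table name:

```python
# Build the query as an f-string so the schema name is filled in at run time
df = spark.sql(f"""
    SELECT *
    FROM hive_metastore.sample_{environment}.customers
""")
```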

If you add .display() to the end of your Spark SQL command, you can see the output of the query:

spark sql command with the display method added to the end
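For example:

```python
# Chain .display() onto the command to render the query results in the notebook
spark.sql(f"""
    SELECT *
    FROM hive_metastore.sample_{environment}.customers
""").display()
```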

That’s it!

I don’t prefer this solution because the SQL query has to be written as a Python string inside the spark.sql command. For complicated queries, this gets unwieldy. I would rather write standard SQL while still keeping the schema name dynamic.

Let’s look at a different solution that allows this.

Solution 2: A SQL Notebook

Note: This is my preferred of the two solutions.

Step 1: Create a SQL notebook:

blank sql notebook in Databricks

Step 2: Use the SQL SET command for the dynamic environment:

SET command used to set the environment
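The command is a one-liner; the key name Environment matches the reference used in the next step, and the value dv is just an example:

```sql
-- Create a session-level key named Environment and set it to the desired suffix
SET Environment = dv;
```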

Step 3: Use SQL and the key created with the SET command to build the dynamic query.
Note the syntax used to reference that key: ${hiveconf:Environment}

SQL query using the key created from the SET command
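The query then reads like plain SQL (table name again a stand-in):

```sql
-- The key set above is substituted into the schema name before the query runs
SELECT *
FROM hive_metastore.sample_${hiveconf:Environment}.customers;
```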

That’s it!

I prefer this solution because the SQL queries stay much cleaner when they don’t have to be built as Python strings.

Important Note: Both of these solutions will work for catalog, schema, or table names.

Important Note:

You may be wondering: why did we use a notebook for the SET command? Why not just run it in the SQL Editor?

Look at what happens when we try that:

Error in SQL Editor of Configuration Environment not available
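Roughly what was attempted, sketched here for completeness:

```sql
-- Running the same pattern in the SQL Editor (against a SQL warehouse) fails:
-- the configuration key isn't available there
SET Environment = dv;

SELECT *
FROM hive_metastore.sample_${hiveconf:Environment}.customers;
```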

The Spark configuration key isn’t available there, so we had to run the SET command in a notebook instead.

Conclusion:

Hopefully this helps in your use of Databricks SQL. Let me know in the comments how you plan to use it.

Thanks for reading!
