Using Dynamic Catalogs, Schemas, or Tables in Databricks SQL
Have you ever wanted to parameterize catalog, schema, or table names in Databricks SQL? This article shows you how to make them dynamic.
Problem:
Suppose I have a sample schema with three different versions that represent development (_dv), user acceptance (_ua), and production (_pr):
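For illustration, assume the schemas follow a simple suffix naming convention. The names below are hypothetical stand-ins used throughout the rest of the examples:

```sql
-- Hypothetical schema names used in the examples that follow
CREATE SCHEMA IF NOT EXISTS sample_schema_dv;  -- development
CREATE SCHEMA IF NOT EXISTS sample_schema_ua;  -- user acceptance
CREATE SCHEMA IF NOT EXISTS sample_schema_pr;  -- production
```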
To select from any one of these three, I could hard-code the schema name in the Databricks SQL Editor:
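Something along these lines, with my_table as a placeholder table name:

```sql
SELECT *
FROM sample_schema_dv.my_table;
```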
But the SQL Editor has parameters, right? Let’s use those to make this a little more dynamic. Parameters in Databricks SQL are created by wrapping the parameter name in double curly braces:
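A first attempt might look something like this, using a hypothetical parameter named environment:

```sql
SELECT *
FROM sample_schema{{ environment }}.my_table;
```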
Unfortunately, the SQL Editor doesn’t allow parameters to be used in the schema name. The same limitation applies to catalog and table names.
Let’s see how we can parameterize the schema name while still using SQL syntax.
Solution:
Solution 1: A Python Notebook
Note: This is not my preferred of the two solutions, but it works.
Step 1: Create a Python notebook:
Step 2: Create a Python variable for the dynamic environment:
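A minimal sketch, assuming the same suffix convention as above:

```python
# Hypothetical environment suffix; change to "_ua" or "_pr" as needed
environment = "_dv"
```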
Step 3: Use Spark SQL and the Python variable to build the dynamic query
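Something like the following, assuming a Databricks notebook where spark is already defined and my_table is a placeholder:

```python
# Build the query as an f-string so the schema name is resolved at run time
df = spark.sql(f"""
    SELECT *
    FROM sample_schema{environment}.my_table
""")
```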
Note: I used Python’s f-string method to insert the variable, but use whatever string-formatting method you prefer.
If you add .display() to the end of your Spark SQL command, you can see the output of the query:
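Continuing the same sketch:

```python
# .display() renders the query result directly in the notebook
spark.sql(f"""
    SELECT *
    FROM sample_schema{environment}.my_table
""").display()
```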
That’s it!
I don’t prefer this solution because the SQL query has to be written as a Python string inside the spark.sql command. For complicated queries, this can get unwieldy. I would rather write in standard SQL format while still being able to make the schema name dynamic.
Let’s look at a different solution that allows this.
Solution 2: A SQL Notebook
Note: This is my preferred of the two solutions.
Step 1: Create a SQL notebook:
Step 2: Use the SQL SET command for the dynamic environment:
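A minimal sketch, reusing the _dv suffix from before:

```sql
SET hiveconf:Environment = _dv;
```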
Step 3: Use SQL and the key created with the SET command to build the dynamic query
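The query then references the key in place of a hard-coded suffix (my_table is still a placeholder):

```sql
SELECT *
FROM sample_schema${hiveconf:Environment}.my_table;
```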
Note: Pay close attention to the syntax used to reference the key created by the SET command: ${hiveconf:Environment}
That’s it!
I prefer this solution because the SQL queries stay much cleaner; there is no need to build them as Python strings.
Important Note: Both of these solutions will work for catalog, schema, or table names.
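For instance, a catalog name can be parameterized the same way in the SQL notebook approach; my_catalog and my_table are hypothetical names here:

```sql
SET hiveconf:Catalog = my_catalog;
SET hiveconf:Environment = _dv;

SELECT *
FROM ${hiveconf:Catalog}.sample_schema${hiveconf:Environment}.my_table;
```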
Important Note:
You may be wondering: why did we use a notebook for the SET command? Why not just use it in the SQL Editor?
Look at what happens when we try that:
The Spark configuration isn’t available in the SQL Editor, so we had to use a notebook instead.
Conclusion:
Hopefully this helps you in your use of Databricks SQL. Let me know in the comments how you plan to use this in the future.
Thanks for reading!