Creating Databricks SQL Endpoints with a Service Principal

Kyle Gibson
7 min read · Feb 24, 2022


The Problem:

TL;DR: Databricks SQL Endpoints should be created through service principals, not individual users.

Our team has been using Databricks SQL lately to enable users to explore data in our data lake, but we also have used it as a Power BI connector when building a few different datasets.

The connection to the data is managed through SQL Endpoints that Databricks makes easy to create directly in the workspace:

Pop-up to create a SQL Endpoint

Once you create the SQL Endpoint, you can select it and view the details. An important thing to note — the details show who created the endpoint.

SQL Endpoint Details

Our original setup had been to manually create endpoints as individual users directly in the workspace. However, we were alerted to a problem a couple weeks ago when every Power BI dataset that was connected to our endpoints was failing to refresh.

After some research, we identified the problem: the failing endpoints had been created by an employee who had left the company a few months earlier. We didn’t think it was an issue when he first left, because everything kept working even after his Azure AD account was disabled. Even though the account was disabled, he was still listed as a user in the Databricks workspace (in the Admin Console), which allowed the endpoints he created to continue working. But when I was cleaning up the users in the workspace, I removed him without realizing the impact that would have. Removing him from the workspace made every SQL Endpoint he had created unusable: users could no longer use them for data exploration, and Power BI datasets could no longer refresh over those connections.

The Solution:

Create SQL Endpoints programmatically through a service principal so that they aren’t tied to an individual user account.

Documentation followed:
Get Azure AD tokens by using a service principal — Azure Databricks | Microsoft Docs

SQL Endpoints APIs 2.0 | Databricks on AWS

Note: I’m getting better at deploying Azure resources through the Azure CLI and PowerShell instead of through the Portal, so I can store the code I used in our Azure Repo. However, if you’re just getting started, the Azure Portal can be easier to use, so all setup examples shown will be in the Portal.

Step 1: Create an App Registration

Open the Azure Portal, navigate to App Registrations, select New registration, and complete the information to create a new one.

Create app registration

Important note: when your App Registration is created, view its details and confirm that Overview > Managed application in local directory also shows the name of the App Registration. If it doesn’t, click that link to create the Service Principal; your App Registration will then also show as an Enterprise Application in Azure AD.

Step 2: Create a client secret for your app registration

On the left menu for your App Registration, navigate to Manage > Certificates and Secrets.

Certificates and secrets

From there, navigate to Client Secrets > New client secret.

New client secret

Enter a description and duration for your secret. You will need to store this secret somewhere securely, as you won’t be able to see it again after the initial creation. We store our secrets in Azure Key Vault.
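As a minimal sketch of how that secret can be read back at runtime, here is an example using the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are hypothetical placeholders, not our actual names:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault and secret names — replace with your own
vault_url = "https://your-key-vault-name.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Retrieve the client secret created above
client_secret = client.get_secret("databricks-sp-client-secret").value
```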

Step 3: Get tokens to authenticate to the Databricks SQL API as the service principal instead of a user-generated personal access token

Note 1: I built my process in an Azure Function App using Python, so my examples will be in Python. However, the API requests demonstrated can be made from your language of choice.

Note 2: My actual token code uses the msal library and looks cleaner, but the demonstration below shows what the raw request looks like, so it can translate to other languages.

I didn’t want to add my service principal as a member of the Databricks workspace itself, so there are two different tokens that have to be generated to authenticate to the Databricks SQL API. One is the Azure AD token and the other is the Azure Management token. If you choose to add the service principal as a member of the workspace, only the Azure AD token is needed.

Azure AD token:

Fields needed

  • ClientID: Found in Overview section of App Registration
  • ClientSecret: Generated in previous step and stored securely
  • TenantID: Found in Overview section of App Registration
  • Scope: hardcoded as 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default (from documentation link above)

ClientID and TenantID are both listed in the details of your App Registration.

Client ID and Tenant ID from App Registration details

In the production environment, I store these in an Azure Key Vault as well, so I’m not hard-coding them in any script.

The client secret was generated in the previous step. Now we are ready to make our request to get the Azure AD token. Note that YourClientID, YourClientSecret, and YourTenantID are dummy values below that would be replaced with your actual values. Again, I use Azure Key Vault references in my Azure Function App for these values and don’t hard-code any of them.

Azure AD token request:
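Here is a minimal sketch of that request in Python with the requests library; it is a standard client credentials call to the Microsoft identity platform v2.0 token endpoint, with the dummy values mentioned above:

```python
import requests

# Dummy values — replace with your own (ideally pulled from Azure Key Vault)
client_id = "YourClientID"
client_secret = "YourClientSecret"
tenant_id = "YourTenantID"

# Scope for the Azure Databricks resource (from the documentation link above)
scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

response = requests.post(
    token_url,
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    },
)
response.raise_for_status()
azure_ad_token = response.json()["access_token"]
```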

You now have the Azure AD token to use later.

Azure Management token:

Everything is the same as the Azure AD token request except the scope, which is https://management.core.windows.net//.default (from the documentation link above).

Azure Management token request:
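Continuing from the previous snippet, only the scope changes:

```python
# Identical to the Azure AD token request, except for the scope
response = requests.post(
    token_url,
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.core.windows.net//.default",
    },
)
response.raise_for_status()
azure_mgmt_token = response.json()["access_token"]
```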

You now have the Azure Management token to use later.

Step 4: Add your service principal as a Contributor on the Databricks instance in Azure

No screenshots on this one, as it’s pretty self-explanatory. Navigate to your Databricks instance in the Azure Portal, then to Access Control (IAM) > Add > Role Assignment. From there, choose the Contributor role and select your service principal.

Step 5: Get the Resource ID for your Databricks instance

If you just finished step 4, then you are already in your Databricks instance. Navigate to Settings > Properties > Resource ID to get the Resource ID to use later. I store this in Key Vault as well so that I’m not hard-coding it in any script.

Step 6: Create new SQL Endpoint through the Databricks SQL API

Fields needed

  • Azure AD token: Generated in step 3
  • Azure Management token: Generated in step 3
  • Databricks Resource ID: Found in step 5
  • Databricks Workspace URL: Can be seen in the Azure Portal or in the URL in your web browser while in your Databricks workspace. Will look like this: adb-1234567890123456.78.azuredatabricks.net

Here is what the request looks like to create an endpoint. Note that I’ve used dummy values where you would substitute your own. I’ll note again that I store as much as I can in Azure Key Vault. The Azure tokens won’t be stored there, as those should be newly generated each time you run this, but the Resource ID and Workspace URL can be stored securely.

Create Databricks SQL Endpoint request:
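Here is a minimal sketch of that request, again in Python with requests. The two X-Databricks-Azure-* headers are what let a service principal that isn’t a workspace user authenticate; the tokens come from the step 3 snippets, and the endpoint name and body properties shown are illustrative placeholders:

```python
import requests

# Dummy values — in production these come from Azure Key Vault
workspace_url = "adb-1234567890123456.78.azuredatabricks.net"
resource_id = (
    "/subscriptions/YourSubscriptionID/resourceGroups/YourResourceGroup"
    "/providers/Microsoft.Databricks/workspaces/YourWorkspace"
)

headers = {
    # Azure AD token for the Databricks resource (step 3)
    "Authorization": f"Bearer {azure_ad_token}",
    # Required because the service principal is not a workspace user:
    # the management token (step 3) and the workspace Resource ID (step 5)
    "X-Databricks-Azure-SP-Management-Token": azure_mgmt_token,
    "X-Databricks-Azure-Workspace-Resource-Id": resource_id,
}

# Illustrative properties — see the SQL Endpoints API docs for the full list
body = {
    "name": "my-sql-endpoint",
    "cluster_size": "Small",
    "auto_stop_mins": 60,
}

response = requests.post(
    f"https://{workspace_url}/api/2.0/sql/endpoints/",
    headers=headers,
    json=body,
)
response.raise_for_status()
print(response.json())  # the response includes the new endpoint's id
```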

The Databricks SQL API documentation has a full list of all the different properties that can be included in the request body. I keep mine set to simple defaults, as any administrator can edit the properties after the endpoint is created. My main goal was to have all endpoints created by the service principal and not a user.

After running all of this, you will have created a Databricks SQL Endpoint that shows as created by the Application (Client) ID of your App Registration. The added benefit of this method is that the service principal was never added as a user to the workspace, so you don’t run the risk of removing that user and breaking the endpoint.

Obviously some of the methods above can be used for more than just creating a SQL Endpoint, but I just wanted to show one example of how we used a service principal to authenticate to Databricks and help us avoid problems in the future.

Thanks for reading!
