Connecting Alteryx to Databricks SQL

Kyle Gibson
4 min read · Mar 2, 2022

The problem:

You want to build a workflow in Alteryx that connects to Databricks SQL because a Data Engineer won’t shut up about it, but you don’t know how.

The solution:

Follow the steps below.

1. Download the correct Databricks ODBC Driver for your machine (Windows 64-bit for me).

ODBC Drivers Download — Databricks

2. Go through the setup steps to install the driver:

Simba Spark Driver installation setup window

3. Pull up the ODBC Data Source Administrator window (Note: I just type ‘ODBC Data Sources’ in the Windows search to find it). Make sure you open the version that corresponds to the driver you downloaded, whether 32-bit or 64-bit. Confirm that Simba Spark ODBC Driver is installed under the Drivers tab.

ODBC Data Source Administrator window
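If you have Python handy, another quick way to confirm the driver registered correctly is to list the installed ODBC drivers with pyodbc. This is just an optional convenience check on my part, not part of the official setup:

    # Optional check: list the ODBC drivers Windows knows about and confirm
    # the Simba Spark driver shows up. Assumes Python and the pyodbc package.
    import pyodbc

    for driver in pyodbc.drivers():
        print(driver)

    # You should see an entry like 'Simba Spark ODBC Driver' in the output.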

4. In the same ODBC Data Source Administrator window, click over to the User DSN tab. Click Add to create a new data source, choose the Simba Spark driver you just installed, and click Finish.

Create new data source window

5. You should now see a screen that lets you configure the Simba Spark driver:

Simba Spark ODBC Driver Configuration window

6. Fill out the settings. Decide which Databricks SQL Endpoint you will connect through, since you will need its connection details in this section.

  • Data Source Name: Databricks (or whatever you want to name it — just remember the name for later when you build your Alteryx workflow)
  • Description: Connection to Databricks SQL (or however you want to describe it)
  • Host: Value found in your Databricks workspace > SQL Endpoints > [Your SQL Endpoint to use in Alteryx] > Connection details > Server hostname
  • Port: Value found in your Databricks workspace > SQL Endpoints > [Your SQL Endpoint to use in Alteryx] > Connection details > Port
  • Database: Leave it at default to be able to choose from any Databricks SQL database when building a workflow, or specify the database in your Databricks SQL instance that you want to see when building a workflow
  • Authentication Mechanism: Username and Password
  • User Name: token
  • *Password: a personal access token generated in the user settings of the Databricks workspace. You will need to be a member of the workspace to do this. These can be manually generated in the workspace: Settings > User Settings > Personal Access Tokens > Generate New Token.
  • Thrift Transport: HTTP
  • HTTP Options > HTTP Path: Value found in your Databricks workspace > SQL Endpoints > [Your SQL Endpoint to use in Alteryx] > Connection details > HTTP path
  • SSL Options > Enable SSL: Make sure box is checked
  • Other settings can be configured as you see fit.

*Additional notes about passwords/tokens:

As with any password or token, these should be stored securely and not shared with other users. Permissions on the Databricks SQL tables depend on the user who generated the token, so tokens should not be shared between users if you want the underlying database and table permissions to be honored.
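Once you have a token, it can be worth sanity checking the DSN from outside Alteryx before you build anything on top of it. Here is a minimal sketch using Python and pyodbc, assuming you named the data source Databricks in Step 6 and saved the credentials in the driver configuration:

    # Minimal sanity check of the Step 6 DSN, run outside of Alteryx.
    # Assumes the Data Source Name was 'Databricks' and that the username
    # (token) and personal access token were saved in the driver config.
    import pyodbc

    conn = pyodbc.connect("DSN=Databricks", autocommit=True)
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchone())  # (1,) means the SQL Endpoint answered
    conn.close()

    # The same settings also work as a DSN-less connection string, which
    # shows how the Step 6 fields map to Simba Spark driver options
    # (all values below are placeholders):
    #   Driver=Simba Spark ODBC Driver;Host=<Server hostname>;Port=443;
    #   HTTPPath=<HTTP path>;AuthMech=3;ThriftTransport=2;SSL=1;
    #   UID=token;PWD=<personal access token>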

If your workflow is moved to production, you will probably want to look at creating a service principal for your workspace and generating a token through it, so the authentication isn’t tied to an individual user. This article covers some techniques that may help:

Creating Databricks SQL Endpoints with a Service Principal | by Kyle Gibson | Feb, 2022 | Medium
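For a rough idea of what that can look like, here is a sketch based on my own assumptions (an Azure Databricks workspace, a service principal already added to the workspace with permission to use tokens, and the requests package) rather than a full recipe: use the service principal’s client credentials to get an Azure AD token, then have the service principal mint its own Databricks personal access token through the Token API. Everything in angle brackets is a placeholder.

    # Rough sketch: mint a Databricks personal access token tied to an Azure AD
    # service principal instead of a person. Assumes an Azure Databricks
    # workspace and a service principal that is already a member of it.
    import requests

    TENANT_ID = "<azure-tenant-id>"
    CLIENT_ID = "<service-principal-application-id>"
    CLIENT_SECRET = "<service-principal-secret>"
    WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"

    # 1. Client credentials flow: get an Azure AD token for the Azure Databricks
    #    resource (2ff814a6-... is the well-known Azure Databricks application ID).
    aad = requests.post(
        f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
        },
    ).json()

    # 2. As the service principal, create a PAT to use in the ODBC configuration.
    pat = requests.post(
        f"{WORKSPACE_URL}/api/2.0/token/create",
        headers={"Authorization": f"Bearer {aad['access_token']}"},
        json={"comment": "Alteryx ODBC connection", "lifetime_seconds": 7776000},
    ).json()

    print(pat["token_value"])  # store securely (e.g. a key vault), not in the workflow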

7. Open up your Data connections in Alteryx and select the ODBC connection for Databricks.

Alteryx data connections window

8. Choose the data source name you created in Step 6. If you named it Databricks, it will show up like this:

Alteryx workflow data connection username/password window

9. You can re-enter token as the username and the personal access token as the password, just like in Step 6, or take the easy route: leave both blank and select OK, since you already saved them in the driver configuration.

10. Choose the table that you want to use.

11. That’s it. Now you’ve successfully connected to Databricks SQL.

Hopefully this helps you build out Alteryx workflows that connect to Databricks SQL.

Thanks for reading!

