How to Use the Databricks Reader

Dan Minney

FME Version

  • FME 2023.0

Introduction

In FME, the Databricks reader reads Delta tables stored in a Databricks Workspace, including tables stored in Unity Catalog. The data can then be transformed, manipulated, and converted into one of the many formats supported by FME. At the moment, Databricks does not support spatial data.

In this tutorial, we will create a Databricks database connection to read data from Databricks. You will need to have a Databricks account to continue. After creating the connection, continue to Converting Databricks to JSON to work through an example of reading data from Databricks with FME. 
 

Creating a Databricks Database Connection

To read Databricks tables, you will need to create a Databricks database connection, which requires a few parameters. The dialog looks the same whether Databricks is hosted on AWS or Azure. When creating a Databricks database connection, the dialog will look as follows:
image5.png


To fill in the Databricks database connection parameters, follow these steps.
Open your Databricks Workspace in a web browser and go to the Compute tab.
image7.png

Next, click on the Cluster you want to use for reading Databricks tables. 
image18.png

The Databricks reader and writer require that the cluster is running in order to access the tables stored in Databricks. 
On the Cluster Details page, scroll down to the bottom and expand the Advanced Options. Click on the JDBC/ODBC tab and you will be presented with the parameters required to create a Databricks database connection.
rtaImage.jpeg

Server Hostname
This value is exactly the same as the Server Hostname value under your Cluster Details. 
Note: If the Databricks Workspace is hosted on Azure, then the Server Hostname may end with “azuredatabricks.net” instead of “cloud.databricks.com”.
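A common slip is pasting the full workspace URL rather than the bare hostname. The following is a minimal Python sketch of how a pasted value could be tidied and checked against the two suffixes mentioned in the note above. It is not part of FME. The function names and the sample hostname are hypothetical, and treating "cloud.databricks.com" as the AWS case is an assumption based on the note.

```python
from urllib.parse import urlparse


def normalize_hostname(value: str) -> str:
    """Return a bare, lowercase hostname from a value that may include a scheme or path."""
    value = value.strip()
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(value if "://" in value else f"https://{value}")
    return (parsed.hostname or "").lower()


def guess_cloud(hostname: str) -> str:
    """Guess the hosting cloud from the hostname suffix; 'unknown' if neither suffix matches."""
    if hostname.endswith("azuredatabricks.net"):
        return "azure"
    if hostname.endswith("cloud.databricks.com"):
        return "aws"  # assumption: this suffix corresponds to an AWS-hosted workspace
    return "unknown"


# Hypothetical value, as it might be copied from a browser address bar.
host = normalize_hostname("https://adb-1234567890123456.7.azuredatabricks.net/")
print(host, guess_cloud(host))
```

The value to enter in the Server Hostname field is the bare hostname, without "https://" or a trailing slash.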

HTTP Path 
This value is exactly the same as the HTTP Path shown under your Cluster Details. For example, as seen in the screenshot above, the HTTP Path would be sql/protocolv1/o/1234567890123456/0123-123456-aj1bc23e.
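To catch copy-and-paste mistakes before entering the value in FME, the shape of a cluster HTTP Path can be checked with a short script. The sketch below is illustrative only. It assumes the path follows the layout of the example above, and the segment names workspace_id and cluster_id are labels chosen for this sketch, not names taken from FME.

```python
import re
from typing import NamedTuple


class ClusterHttpPath(NamedTuple):
    workspace_id: str  # assumed meaning of the numeric segment
    cluster_id: str    # assumed meaning of the final segment


# Layout assumed from the example: sql/protocolv1/o/<digits>/<letters, digits, hyphens>
_HTTP_PATH = re.compile(
    r"^/?sql/protocolv1/o/(?P<workspace_id>\d+)/(?P<cluster_id>[\w-]+)$"
)


def parse_http_path(http_path: str) -> ClusterHttpPath:
    """Split a cluster HTTP Path into its two trailing segments, or raise ValueError."""
    match = _HTTP_PATH.match(http_path.strip())
    if match is None:
        raise ValueError(f"Unexpected HTTP Path layout: {http_path!r}")
    return ClusterHttpPath(match["workspace_id"], match["cluster_id"])


parts = parse_http_path("sql/protocolv1/o/1234567890123456/0123-123456-aj1bc23e")
print(parts.workspace_id, parts.cluster_id)
```

A ValueError here usually means part of the path was cut off while copying or extra characters were pasted along with it.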

Authentication
You can set this to either a Databricks Login or a Personal Access Token. A Databricks Login consists of the username and password you use to log in to Databricks.

To generate a Personal Access Token, please follow these instructions

Catalog
The Catalog you can access is set on the Databricks database connection. If Unity Catalog is enabled for the Databricks Workspace, you will need to select the Catalog you want to read from. The Catalog you select determines which tables appear when you add a Databricks reader to the workspace.

Using the example values in the screenshot above, the Databricks database connection would look like the following:
image1.png

The following is a visual representation of where the corresponding parameters are located in Databricks.
image3.png
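The parameters described above can also be collected as a simple checklist before opening the connection dialog. The following Python sketch is not an FME API. The class and field names are hypothetical, and it only checks that the values each authentication option needs are present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabricksConnectionValues:
    """Hypothetical container for the values entered in the connection dialog."""
    server_hostname: str
    http_path: str
    auth_type: str                      # "login" or "token"
    username: Optional[str] = None      # used with a Databricks Login
    password: Optional[str] = None      # used with a Databricks Login
    access_token: Optional[str] = None  # used with a Personal Access Token
    catalog: Optional[str] = None       # relevant when Unity Catalog is enabled

    def problems(self) -> List[str]:
        """Return a list of missing or inconsistent values; empty if none are found."""
        issues = []
        if not self.server_hostname:
            issues.append("Server Hostname is empty")
        if not self.http_path:
            issues.append("HTTP Path is empty")
        if self.auth_type == "token":
            if not self.access_token:
                issues.append("Personal Access Token is missing")
        elif self.auth_type == "login":
            if not (self.username and self.password):
                issues.append("Databricks Login needs a username and password")
        else:
            issues.append("Authentication must be 'login' or 'token'")
        return issues


values = DatabricksConnectionValues(
    server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",  # hypothetical hostname
    http_path="sql/protocolv1/o/1234567890123456/0123-123456-aj1bc23e",
    auth_type="token",
    access_token="<personal-access-token>",  # placeholder, not a real token
)
print(values.problems())
```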
 

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government Licence - Vancouver.
 
