Introduction
In FME, the Databricks reader reads Delta Tables stored within a Databricks Workspace, including tables stored in Unity Catalog. The data can then be transformed, manipulated, and converted into any of the many formats supported by FME. At the moment, Databricks does not support spatial data.
In this tutorial, we will create a Databricks database connection to read data from Databricks. You will need to have a Databricks account to continue. After creating the connection, continue to Converting Databricks to JSON to work through an example of reading data from Databricks with FME.
Creating a Databricks Database Connection
To read Databricks tables, you will need to create a Databricks database connection, which requires a few parameters. The dialog looks the same whether Databricks is hosted on AWS or Azure. When creating a Databricks database connection, the dialog will look as follows:
To fill in all the Databricks database connection parameters, follow the steps below.
Open your Databricks Workspace in a web browser and go to the Compute tab.
Next, click on the Cluster you want to use for reading Databricks tables.
The Databricks reader and writer require that the cluster is running in order to access the tables stored in Databricks.
On the Cluster Details page, scroll down to the bottom and expand the Advanced Options. Click on the JDBC/ODBC tab and you will be presented with the parameters required to create a Databricks database connection.
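Because the reader and writer need a running cluster, it can be handy to check the cluster state before opening FME. The following is a minimal sketch, not part of FME, that assumes the Databricks Clusters REST API endpoint `/api/2.0/clusters/get` and Personal Access Token authentication. The function names, hostname, and token shown are hypothetical placeholders.

```python
import json
import urllib.request


def build_cluster_status_request(server_hostname: str, token: str,
                                 cluster_id: str) -> urllib.request.Request:
    """Build a GET request for the Clusters API 'get' endpoint.

    Assumption: the workspace exposes /api/2.0/clusters/get and accepts
    a Personal Access Token as a Bearer token.
    """
    url = (f"https://{server_hostname}/api/2.0/clusters/get"
           f"?cluster_id={cluster_id}")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"})


def fetch_cluster_info(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def is_cluster_running(cluster_info: dict) -> bool:
    """Return True only when the reported cluster state is RUNNING."""
    return cluster_info.get("state") == "RUNNING"


# Example usage (requires network access and real credentials):
# request = build_cluster_status_request(
#     "example.cloud.databricks.com", "<personal-access-token>",
#     "0123-123456-aj1bc23e")
# print(is_cluster_running(fetch_cluster_info(request)))
```

If the check returns False, start the cluster from the Compute tab before adding a Databricks reader or writer.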
Server Hostname
This value is exactly the same as the Server Hostname value under your Cluster Details.
Note: If the Databricks Workspace is hosted on Azure, then the Server Hostname may end with “azuredatabricks.net” instead of “cloud.databricks.com”.
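A common mistake is pasting the full workspace URL instead of the bare hostname. The sketch below is a hypothetical helper, not part of FME. It strips the scheme, port, and path from a pasted value, then uses only the two suffixes mentioned in the note above to guess where the workspace is hosted. The example hostnames are made up.

```python
from typing import Optional
from urllib.parse import urlparse

# Suffixes taken from the note above; other hosting options are not covered.
KNOWN_SUFFIXES = {
    "cloud.databricks.com": "AWS",
    "azuredatabricks.net": "Azure",
}


def normalize_hostname(value: str) -> str:
    """Reduce a pasted URL or hostname to a bare, lowercase hostname."""
    value = value.strip()
    if "://" in value:
        # A full URL was pasted; keep only the network location.
        value = urlparse(value).netloc
    # Drop any trailing path and any port number.
    value = value.split("/", 1)[0].split(":", 1)[0]
    return value.lower()


def guess_cloud(hostname: str) -> Optional[str]:
    """Return 'AWS' or 'Azure' based on the hostname suffix, else None."""
    host = normalize_hostname(hostname)
    for suffix, cloud in KNOWN_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return cloud
    return None


# Example usage with a made-up Azure workspace URL:
# normalize_hostname("https://adb-123.4.azuredatabricks.net/")
# guess_cloud("https://adb-123.4.azuredatabricks.net/")
```

Only the normalized hostname, without `https://` or a trailing slash, should go into the Server Hostname field.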
HTTP Path
This value is exactly the same as the HTTP Path shown under your Cluster Details. For example, as seen in the screenshot above, the HTTP Path would be sql/protocolv1/o/1234567890123456/0123-123456-aj1bc23e.
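To catch copy-and-paste errors in the HTTP Path, its shape can be checked before it is entered in the connection dialog. The sketch below is a hypothetical helper, not part of FME. It assumes that a cluster HTTP Path has the form `sql/protocolv1/o/<workspace ID>/<cluster ID>`, as in the example value above, and the interpretation of the two trailing segments is likewise an assumption.

```python
import re
from typing import NamedTuple, Optional


class ClusterHttpPath(NamedTuple):
    workspace_id: str  # assumed meaning of the numeric segment
    cluster_id: str    # assumed meaning of the final segment


# Optional leading and trailing slashes are tolerated.
_CLUSTER_PATH = re.compile(r"^/?sql/protocolv1/o/(\d+)/([A-Za-z0-9-]+)/?$")


def parse_cluster_http_path(http_path: str) -> Optional[ClusterHttpPath]:
    """Split a cluster HTTP Path into its two IDs, or return None
    when the value does not match the assumed cluster path shape."""
    match = _CLUSTER_PATH.match(http_path.strip())
    if match is None:
        return None
    return ClusterHttpPath(*match.groups())


# Example usage with the value from the text above:
parsed = parse_cluster_http_path(
    "sql/protocolv1/o/1234567890123456/0123-123456-aj1bc23e")
if parsed is not None:
    print(parsed.workspace_id, parsed.cluster_id)
```

A return value of None suggests the value was truncated or copied from a different field on the JDBC/ODBC tab.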
Authentication
You can set this to either a Databricks Login or a Personal Access Token. A Databricks Login consists of the username and password you use to log in to Databricks.
To generate a Personal Access Token, please follow these instructions.
Catalog
The Catalog you can read from is set in the Databricks database connection. If Unity Catalog is enabled for the Databricks Workspace, you will need to select the Catalog you want to read from. The Catalog you select determines which tables appear when adding a Databricks reader to the workspace.
Using the example values in the screenshot above, the Databricks database connection would look like the following:
The following is a visual representation of where the corresponding parameters are located in Databricks.
Data Attribution
The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.