Introduction
Databricks is a cloud data platform built on Apache Spark that unifies data warehouse and data lake concepts in a single platform: the data lakehouse. It handles data, analytics, and AI workloads. The Databricks reader and writer are available in FME 2023.0 and newer.
The Databricks reader and writer support the Delta Table format, the tabular format commonly used to store data in Databricks. All tables on Databricks are Delta tables by default.
Delta tables also support table versioning and time travel (querying a table as it existed at an earlier version or point in time), as described in the How to Use the Databricks Reader article.
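For reference, Delta Lake expresses time travel in SQL with `VERSION AS OF` and `TIMESTAMP AS OF` clauses, and `DESCRIBE HISTORY` lists a table's versions. The Python sketch below only builds those query strings. It is not how FME's reader exposes time travel (see the article above for that). The helper names and the table name `sales.orders` are hypothetical, and the table name is assumed to come from a trusted source because it is inserted into the SQL unescaped.

```python
from datetime import datetime
from typing import Optional, Union


def time_travel_query(table: str,
                      version: Optional[int] = None,
                      timestamp: Optional[Union[str, datetime]] = None) -> str:
    """Build a Delta Lake time-travel SELECT (hypothetical helper).

    Assumes `table` is a trusted, already-qualified name; it is not escaped.
    """
    if version is not None and timestamp is not None:
        raise ValueError("specify either version or timestamp, not both")
    query = f"SELECT * FROM {table}"
    if version is not None:
        if version < 0:
            raise ValueError("version must be non-negative")
        query += f" VERSION AS OF {version}"
    elif timestamp is not None:
        # Render datetimes as 'YYYY-MM-DD HH:MM:SS'; strings are passed through.
        ts = timestamp.isoformat(sep=" ") if isinstance(timestamp, datetime) else timestamp
        query += f" TIMESTAMP AS OF '{ts}'"
    return query


def history_query(table: str) -> str:
    """List the versions available for time travel on a Delta table."""
    return f"DESCRIBE HISTORY {table}"


if __name__ == "__main__":
    print(history_query("sales.orders"))
    print(time_travel_query("sales.orders", version=3))
    print(time_travel_query("sales.orders", timestamp=datetime(2023, 1, 15)))
```

Running `DESCRIBE HISTORY` first is a practical way to find a valid version number or timestamp before querying an older snapshot.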
Unlike most FME readers and writers, which can share a single connection, the Databricks reader and writer cannot: the reader uses the Databricks JDBC Driver, while the writer uses the Databricks API. You will therefore need to create separate connections for the reader and the writer. Information on supported data types can be found in the Databricks documentation.
The Databricks reader and writer require a running cluster to read and write data. If the cluster is not running when you query a table, FME will start it for you, so expect a delay while FME waits for the cluster to reach the running state. FME will not automatically terminate a cluster after starting it for a read or write process.
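FME handles cluster startup automatically; the sketch below only illustrates the start-then-wait pattern that causes the delay described above. It is not FME's implementation. The function `wait_for_cluster` and its two injected callables are hypothetical: in a real script, `get_state` might call the Databricks Clusters API (`GET /api/2.0/clusters/get`, reading the `state` field) and `start` might call `POST /api/2.0/clusters/start`. Injecting them keeps the sketch runnable without network access.

```python
import time
from typing import Callable


def wait_for_cluster(get_state: Callable[[], str],
                     start: Callable[[], None],
                     poll_seconds: float = 10.0,
                     timeout_seconds: float = 900.0,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Start a terminated cluster if needed, then block until it is RUNNING.

    Returns the seconds spent waiting (the delay a read or write would see).
    Like the behavior described above, this never terminates the cluster.
    """
    waited = 0.0
    started = False
    while True:
        state = get_state()  # e.g. "PENDING", "RUNNING", "TERMINATED"
        if state == "RUNNING":
            return waited
        if state == "ERROR":
            raise RuntimeError("cluster entered the ERROR state")
        if state == "TERMINATED" and not started:
            start()  # request startup once, then keep polling
            started = True
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated states instead of real API calls.
    states = iter(["TERMINATED", "PENDING", "PENDING", "RUNNING"])
    delay = wait_for_cluster(lambda: next(states), lambda: print("start requested"),
                             sleep=lambda s: None)
    print(f"waited {delay:.0f}s")
```

Because the cluster keeps running after the wait, any later reads or writes skip the startup delay, but it also keeps consuming compute until it is stopped by other means.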
Video
This video demonstrates how to connect to Databricks from FME. Detailed steps on connecting can also be found in the articles linked below.
Articles
This tutorial series walks you through using the Databricks reader and writer, including creating database connections and working through example scenarios.