Introduction
This article is not an exhaustive list of the issues you may encounter with the Databricks Reader and Writer. If you run into others, feel free to comment at the bottom of this article or submit a support ticket.
Note that the Databricks Reader and Writer use different connections. Please see How to Use the Databricks Reader and How to Use the Databricks Writer.
Initial Troubleshooting
Have you reviewed the steps to use the Databricks Reader and Writer?
Please see How to Use the Databricks Reader and How to Use the Databricks Writer.
Databricks Writer
Are you using the latest package from the FME Hub?
Check your package version in FME Form by going to Tools > FME Options > FME Packages. Make sure you're using the latest package from the FME Hub.
Can you write to Databricks outside of FME?
When writing to Databricks, FME first uploads the data to a staging location. If the staging location is in Amazon S3 or Microsoft Azure Data Lake Gen 2, your Databricks cluster must have access to it. Please see How to Use the Databricks Writer.
A good test is to add a sample file (for example, a CSV) to your staging location and check whether you can ingest it into your Databricks Delta Lake using a COPY INTO command in a Databricks Notebook. See these instructions from Microsoft for Azure or these instructions from Databricks for AWS.
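The COPY INTO test described above can be sketched as follows. This is a minimal illustration, not the exact command from the linked instructions: the function name, table name, and staging path are hypothetical, the options shown are common choices for a CSV with a header row, and the target table is assumed to exist already.

```python
def build_copy_into(table: str, source_path: str, file_format: str = "CSV") -> str:
    """Return a COPY INTO statement for a quick staging-access test."""
    # Refuse paths with quotes rather than guess at SQL escaping rules.
    if "'" in source_path:
        raise ValueError("source_path must not contain single quotes")
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format}\n"
        "FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )


# Hypothetical table and staging path; substitute your own values.
statement = build_copy_into(
    "main.default.fme_staging_test",
    "abfss://staging@teststorageaccount.dfs.core.windows.net/sample.csv",
)
print(statement)
```

In a Databricks Notebook, the printed statement can be pasted into a SQL cell or passed to spark.sql(statement). If the command fails with an access error there, the problem lies with the cluster's access to the staging location rather than with FME.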
Are you using a Unity Catalog Volume?
If possible, we recommend using a Unity Catalog Volume as the storage type for the staging upload to avoid the additional configuration of an Amazon S3 or Microsoft Azure staging location. Please see Microsoft's documentation Set up and manage Unity Catalog for Azure and Databricks' documentation Set up and manage Unity Catalog for AWS.
Common Issues
Databricks Compute does not have access to Unity Catalog (UC)
If your Databricks compute is not configured for access to Unity Catalog, you will run into issues when reading from or writing to your tables.
For example, when writing to Databricks, using a UC Volume as the storage location, with a compute that is not configured for access to UC, you may run into the following error:
DatabricksWriter: InvalidMountException: Error while using path /Volumes/demos/default/demos-default-volume for creating file system within mount at '/Volumes/demos/default/demos-default-volume'.
This Databricks Knowledge Base article contains more details.
FME Form times out while performing an operation
The most likely cause is that the Databricks cluster was not running and did not start before the operation timed out. Whenever you run an operation that requires the Databricks workspace cluster to be running, FME triggers the cluster to start. You will see a message in the translation log similar to the one below:
DatabricksWriter: Cluster 0587-172155-djjgaaa9 was terminated. Restarting
Starting a cluster can take several minutes in Databricks. If the operation times out, retry once the cluster is running.
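If you want to script the wait before retrying, the idea can be sketched as a simple polling loop. This is an illustration only and not part of FME: wait_until_running and its parameters are hypothetical, and the caller supplies get_state, which is assumed to return the cluster state as a string such as "PENDING" or "RUNNING" (for example, from the Databricks Clusters API).

```python
import time
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns "RUNNING" or the timeout elapses."""
    waited = 0.0
    while True:
        if get_state() == "RUNNING":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Demo with a fake state source (no real cluster is contacted).
fake_states = iter(["TERMINATED", "PENDING", "PENDING", "RUNNING"])
ready = wait_until_running(lambda: next(fake_states), sleep=lambda s: None)
print(ready)
```

Passing sleep as a parameter keeps the demo instant; in real use, leave the default so the loop actually pauses between checks, then rerun the workspace once the function returns True.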
Databricks writer error: "A table name or table qualifier was not specified for feature type ‘<feature-type-here>’"
You'll see this error in the translation log when the correct schema has not been set on the Databricks writer feature type. The writer sets the schema to “default” automatically. If your tables are in a schema with a different name, you will need to set that schema on the feature type.
Databricks reader error: "[Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 403, Error message: Unknown."
The user-created Databricks personal access token (PAT) may have expired. A new access token will need to be created and used in your Databricks Database Connection.
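To confirm whether a token has expired, you can compare its expiry time with the current time. The sketch below assumes you already have the token's expiry as epoch milliseconds, with a negative value meaning the token does not expire (this is how the Databricks Token API is understood to report it; verify against your workspace). The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def token_is_expired(expiry_time_ms: int, now: Optional[datetime] = None) -> bool:
    """Return True if a token with the given expiry (epoch ms) has expired."""
    # Assumption: a negative expiry means the token has no expiry date.
    if expiry_time_ms < 0:
        return False
    current = now or datetime.now(timezone.utc)
    expiry = datetime.fromtimestamp(expiry_time_ms / 1000, tz=timezone.utc)
    return current >= expiry


# Demo with fixed dates so the result does not depend on today's date.
check_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
old_expiry_ms = int(datetime(2023, 12, 31, tzinfo=timezone.utc).timestamp() * 1000)
print(token_is_expired(old_expiry_ms, now=check_time))
```

A 403 response can have other causes, such as revoked permissions, so treat an unexpired token as a prompt to keep investigating rather than as proof that the connection is valid.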
Databricks writer error: "Invalid configuration value detected for fs.azure.account.key"
The full error will be something like "DatabricksWriter: KeyProviderException: Failure to initialize configuration for storage account teststorageaccount.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key".
This occurs when you use Microsoft Azure Data Lake Gen 2 as the storage type for the staging upload and the cluster cannot access it. FME writes a Parquet file to the staging location and then executes the database operation in Databricks using the data in that file. The database operation requires your Databricks workspace to have access to the cloud storage location. See Databricks' documentation: Connect to Azure Data Lake Storage and Blob Storage. The Spark configuration that provides access can be applied in your Databricks cluster's advanced settings, as described by Microsoft in the Compute configuration reference.
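As an illustration of what such a Spark configuration entry looks like, the sketch below builds one line for account-key access, with the key read from a Databricks secret. The helper name, secret scope, and secret key are hypothetical, and account-key access is only one of the methods covered in the documentation referenced above; your organization may require a different one.

```python
def adls_account_key_conf(storage_account: str, secret_scope: str, secret_key: str) -> str:
    """Build one Spark config line granting account-key access to ADLS Gen 2."""
    conf_key = f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"
    # Secret reference syntax, so the key itself never appears in the config.
    secret_ref = "{{secrets/" + secret_scope + "/" + secret_key + "}}"
    return f"{conf_key} {secret_ref}"


# Hypothetical secret scope and key names; the storage account matches the error above.
line = adls_account_key_conf("teststorageaccount", "fme-scope", "adls-account-key")
print(line)
# prints: fs.azure.account.key.teststorageaccount.dfs.core.windows.net {{secrets/fme-scope/adls-account-key}}
```

The printed line is what you would paste into the Spark config box in the cluster's advanced settings, followed by a cluster restart so the setting takes effect.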