Converting Databricks to JSON

Liz Sanderson

FME Version

  • FME 2023.0

Introduction

This tutorial walks through how to convert a Databricks Delta table to a JSON dataset. 

In this scenario, you need to join Databricks data with an OGC GeoPackage dataset and write the result to JSON for easy sharing over the web. Follow the steps below to build this workspace from scratch, or open the completed FME template in the article attachments.
 

Step-by-Step Instructions

The data we are working with is an OGC GeoPackage containing address points in Vancouver, British Columbia, together with a postal_addresses.csv file that is uploaded to Databricks in Step 1.

1. Upload Source Data to Databricks
Download the postal_addresses.csv file from the Files section of this article. Upload the file to a Databricks Workspace either through the Databricks web UI, or by using the Databricks writer in FME.

To upload through the web UI, click Data in Databricks, then Add > Add Data.
AddData.png
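
Before uploading, it can help to confirm that the CSV carries the AddressId column used for the join in Step 5. The snippet below is a minimal sketch using only the Python standard library; it assumes the file has a header row, and the function name csv_has_column is hypothetical.

```python
import csv


def csv_has_column(csv_path, column="AddressId"):
    """Return True if the CSV header row contains the given column name."""
    # utf-8-sig drops a byte-order mark if the file was exported with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return column in [name.strip() for name in header]
```

For example, csv_has_column("postal_addresses.csv") should return True if the downloaded file has the expected join column.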

2. Open FME Workbench
Open FME Workbench and create a new workspace.

3. Add an OGC GeoPackage Reader
Add a reader to the workspace by clicking the “Add Reader” button in the toolbar. Enter the following parameters:

  • Format: OGC GeoPackage
  • Dataset: AddressInformation.gpkg

Since there is only one table in this GeoPackage, click OK to finish adding the reader, and the AddressPoints reader feature type will automatically be added to the canvas. 
geopackagereader.png
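
A GeoPackage is a SQLite database, and its gpkg_contents table lists the tables it holds. If you want to check outside FME which feature tables a file contains, the following is a minimal sketch with Python's built-in sqlite3 module; the function name list_feature_tables is hypothetical.

```python
import sqlite3
from contextlib import closing


def list_feature_tables(gpkg_path):
    """Return the names of the feature tables registered in a GeoPackage."""
    # gpkg_contents is the GeoPackage table that registers user data tables.
    query = (
        "SELECT table_name FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    )
    with closing(sqlite3.connect(gpkg_path)) as conn:
        rows = conn.execute(query).fetchall()
    return [name for (name,) in rows]
```

For AddressInformation.gpkg, the expectation from this step is a single entry, AddressPoints.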

4. Add a Databricks Reader
Add another reader to the workspace. Enter the following parameters:

  • Format: Databricks
  • Connection: <your databricks connection>

If you have not yet created a Databricks database connection, please see How to Use the Databricks Reader.

Click the Parameters button. Click the ellipsis next to the Tables parameter and select the postal_addresses table you uploaded to Databricks in Step 1. Click OK three times to add the postal_addresses reader feature type to the canvas. 
DatabricksReader.png

5. Join the Data
Before we output the data to our destination format, we want to join attributes from the postal_addresses table to the AddressPoints table. To do so we can use the FeatureMerger transformer. This transformer allows us to join features based on a common attribute.

Add a FeatureMerger transformer to the workspace. Connect the AddressPoints (OGC GeoPackage) reader feature type to the Requestor port of the FeatureMerger. Connect the postal_addresses (Databricks) reader feature type to the Supplier port of the FeatureMerger.
FeatureMerger.png


Open the FeatureMerger parameters. For the Requestor parameter, click the drop-down arrow and select Attribute Value > AddressId. For the Supplier parameter, click the drop-down arrow and select Attribute Value > AddressId. Press OK to accept the new parameters.
FeatureMergerParams.png

If you’re interested in learning more about how to perform joins in FME, see The FeatureMerger Transformer article. 
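
The idea behind this join can be sketched in plain Python. This illustrates a key-based attribute join in general and is not FME's implementation: the function name merge_features is hypothetical, and two behaviours are assumptions made for the sketch (the first supplier found for a key is used, and requestor values win when both sides share an attribute name). Only AddressId comes from this tutorial; any other attribute names are placeholders.

```python
def merge_features(requestors, suppliers, key="AddressId"):
    """Join supplier attributes onto requestors that share the same key value.

    Returns (merged, unmerged): requestors that found a supplier, with the
    supplier's attributes added, and requestors that found none.
    """
    # Index suppliers by key; assumption: the first supplier per key is kept.
    supplier_by_key = {}
    for supplier in suppliers:
        value = supplier.get(key)
        if value is not None:
            supplier_by_key.setdefault(value, supplier)

    merged, unmerged = [], []
    for requestor in requestors:
        supplier = supplier_by_key.get(requestor.get(key))
        if supplier is None:
            unmerged.append(dict(requestor))
        else:
            # Assumption: requestor values take precedence on name conflicts.
            merged.append({**supplier, **requestor})
    return merged, unmerged
```

Here the address points play the role of requestors and the postal_addresses rows play the role of suppliers, mirroring the port connections above.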

6. Write to JSON
Now that we’ve joined the Postal Addresses data to the Address Points dataset, we can write to a new JSON file for use over the web. 

Add a writer to the workspace by clicking the “Add Writer” button in the toolbar. Enter the following parameters:

  • Format: JSON (JavaScript Object Notation)
  • Dataset: Specify an output file name and location for the dataset
  • Feature Type Definition: Automatic

Writer.png

Press OK to add the writer to the workspace. In the Feature Type dialog that pops up, set the Feature Type Name to AddressPoints and press OK. Connect the FeatureMerger Merged port to the AddressPoints writer feature type. 
CompletedWorkspace.png


Run your workspace. You have now joined data from a Databricks Delta table with a GeoPackage dataset and written the result to a web-shareable JSON file.
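
To picture what writing joined records to JSON amounts to, here is a minimal standard-library sketch. It assumes each feature is a flat dictionary of attributes and writes them as a JSON array; the exact layout produced by FME's JSON writer may differ, and both function names are hypothetical.

```python
import json


def write_features_json(features, path):
    """Write a list of attribute dictionaries to a JSON file as an array."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented characters readable in the file.
        json.dump(features, f, indent=2, ensure_ascii=False)


def read_features_json(path):
    """Load the JSON array back, e.g. to spot-check the written output."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Reading the file back with read_features_json is a quick way to confirm that the expected number of records was written.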
 

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
