How to Convert Parquet to JSON

Liz Sanderson

FME Version

  • FME 2021.0

Introduction

In FME, the Apache Parquet reader can read the multiple .parquet files that make up a partitioned dataset. The data can then be converted to any format FME supports, such as CSV or JSON, or written to spatial systems such as ArcGIS.

 

Step-by-Step Instructions

In this scenario, the user needs to convert a set of Parquet files into a single JSON file for easy sharing over the web. In FME, each input Parquet file becomes its own source feature type. By default, the JSON writer stores each feature's feature type name in an attribute called “json_featuretype”. Each row becomes an element in the JSON array, with the row's attributes as members of that element:

[Image: JSON output]
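To make the output structure concrete, here is a minimal Python sketch of the flattening described above. It is not FME code: it assumes each Parquet file has already been loaded as a list of row dictionaries, and the feature type names (Downtown, Kitsilano) and attribute names (Title, YearOfInstallation) are hypothetical placeholders. The `key_name` parameter stands in for the writer's Feature Type Key Name setting.

```python
import json


def to_flat_json(tables, key_name="json_featuretype"):
    """Flatten per-feature-type rows into one JSON array.

    tables maps a feature type name (one per input Parquet file) to a
    list of row dicts. Each row becomes one object in the output array,
    tagged with its feature type name under key_name.
    """
    output = []
    for feature_type, rows in tables.items():
        for row in rows:
            element = dict(row)               # copy the row's attributes
            element[key_name] = feature_type  # record which file it came from
            output.append(element)
    return json.dumps(output, indent=2)


# Hypothetical sample data; names and attributes are illustrative only.
tables = {
    "Downtown": [{"Title": "Art A", "YearOfInstallation": 2010}],
    "Kitsilano": [{"Title": "Art B", "YearOfInstallation": 1998}],
}

print(to_flat_json(tables))                         # default key name
print(to_flat_json(tables, key_name="Neighborhood"))  # renamed key
```

The second call mirrors what changing the Feature Type Key Name does: the same elements, with the feature type stored under a different attribute name.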

Follow along in the steps below to build the workspace from scratch, or open the completed FME template in the article attachments.

The data we are working with is a set of partitioned Parquet files representing public art in Vancouver:

[Image: Parquet files]

1. Open FME Workbench
Open FME Workbench and start with a blank canvas.

2. Add an Apache Parquet Reader
Click “Add Reader” and add the source data to the workspace. Enter the following parameters:

  • Format: Apache Parquet
  • Dataset: C:\<Tutorial Download>\public art\*.parquet
  • Workflow Options: Individual Feature Types

The asterisk is a wildcard, so FME reads every .parquet file in the folder.
[Image: Add Parquet reader]

Click OK to add the reader feature type to the canvas.

In the Select Feature Types dialog that opens, make sure all 3 files are selected and click OK.
[Image: Select all feature types]

An Apache Parquet reader is added to the workspace, and all 3 feature types appear on the canvas.
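For a simple pattern such as `*.parquet`, the wildcard in the Dataset path selects files the same way a standard file-name glob does (an assumption for this illustration; FME's own matching rules are described in its documentation). The Python sketch below uses the standard library's `fnmatch.filter` on a hypothetical folder listing to show which names that pattern selects; the file names are invented.

```python
import fnmatch

# Hypothetical folder listing; the real file names in the tutorial
# download may differ.
files = [
    "part-0.parquet",
    "part-1.parquet",
    "part-2.parquet",
    "README.txt",
]

# "*" matches any run of characters, so "*.parquet" keeps every name
# ending in ".parquet" and skips everything else.
matched = fnmatch.filter(files, "*.parquet")
print(matched)  # → ['part-0.parquet', 'part-1.parquet', 'part-2.parquet']
```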

3. Add a JSON Writer
Click “Add Writer” and enter the following parameters:

  • Format: JSON (JavaScript Object Notation)
  • Dataset: C:\<Tutorial Download>\public-art.json
  • Feature Type Definition: Copy from Reader...

[Image: Add JSON writer]

Click “Parameters…” and make the following changes:

(a) Set the Feature Type Key Name to “Neighborhood” instead of “json_featuretype”. Each feature type name is a neighborhood name, so this makes the output JSON clearer.

(b) Uncheck “Write Geometry”.

[Image: JSON writer parameters]

Click OK to close the Parameters dialog, then click OK again on the Add Writer dialog.

In the Select Feature Types dialog that opens, select all 3 feature types and click OK.

[Image: JSON writer feature types]

A JSON writer with 3 feature types is added to the workspace.

4. Connect the Reader and Writer Feature Types
Connect each reader feature type to the matching writer feature type. The workspace is now configured to translate the 3 input Parquet files into a single JSON file, with each feature's feature type name stored as an attribute on that feature.

[Image: Parquet to JSON workspace]

5. Run the Workspace
Run the workspace to convert the Parquet data to JSON. This generates public-art.json, which contains one element for every row in the input dataset. Each element has a “Neighborhood” attribute holding its feature type name.

[Image: Output JSON]
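To sanity-check the result outside FME, a short Python sketch can count the output elements per neighborhood; the total should equal the number of rows across the input Parquet files. It assumes the key name is “Neighborhood”, as set in step 3, and the sample text below is hypothetical.

```python
import json
from collections import Counter


def summarize(json_text, key_name="Neighborhood"):
    """Count elements per feature type, failing if any element
    is missing the feature type key."""
    elements = json.loads(json_text)
    missing = [e for e in elements if key_name not in e]
    if missing:
        raise ValueError(f"{len(missing)} element(s) lack {key_name!r}")
    return Counter(e[key_name] for e in elements)


# Hypothetical excerpt shaped like the flat output described above.
sample = """[
  {"Title": "Art A", "Neighborhood": "Downtown"},
  {"Title": "Art C", "Neighborhood": "Downtown"},
  {"Title": "Art B", "Neighborhood": "Kitsilano"}
]"""

print(summarize(sample))  # → Counter({'Downtown': 2, 'Kitsilano': 1})
```

To check the real output, read public-art.json (for example with `open("public-art.json", encoding="utf-8")`) and pass its contents to `summarize`.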

 

Bonus: Nested JSON Outputs

This scenario produced a flat (non-nested) JSON file. The same data could instead be structured as a hierarchy, with each Neighborhood at the top level and its art installations nested under it. To learn how to write nested JSON, refer to “Writing JSON with JSONTemplater” and the JSONTemplater documentation.
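The target hierarchy can be illustrated with a plain Python sketch that regroups flat elements by their “Neighborhood” value. This is not JSONTemplater, the FME transformer for building nested JSON inside a workspace; it only shows the shape of the nested result. The child list name `artworks` and the sample data are hypothetical.

```python
import json


def nest_by(elements, key_name="Neighborhood", child_name="artworks"):
    """Regroup flat elements so each key_name value appears once, with
    its elements (minus the grouping key) listed under child_name."""
    groups = {}
    for element in elements:
        attrs = dict(element)        # copy so the input is not mutated
        group = attrs.pop(key_name)  # value used for the top level
        groups.setdefault(group, []).append(attrs)
    return [{key_name: name, child_name: items} for name, items in groups.items()]


# Hypothetical flat elements shaped like the writer's output.
flat = [
    {"Title": "Art A", "Neighborhood": "Downtown"},
    {"Title": "Art B", "Neighborhood": "Kitsilano"},
    {"Title": "Art C", "Neighborhood": "Downtown"},
]

print(json.dumps(nest_by(flat), indent=2))
```

Groups appear in the order their first element is encountered, because Python dictionaries preserve insertion order.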

 

Additional Resources

Apache Parquet FME Documentation
Tutorial: Getting Started with JSON

 

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
