How to Convert Parquet to JSON

Files

Parquet to JSON tutorial.zip (20 KB)

Introduction

In FME, the Apache Parquet reader can read multiple .parquet files that together make up a partitioned dataset. The data can then be converted to any format FME supports, such as CSV or JSON, or loaded into spatial systems such as ArcGIS.

Step-by-Step Instructions

In this scenario, the user needs to convert a set of Parquet files into a single JSON file for easy sharing over the web. In FME, each source feature type represents one input Parquet file. By default, the JSON writer records the feature type name on every feature as an attribute called “json_featuretype”. Each row becomes an element in the JSON array, with its attributes nested under it:

1 JSON output.png
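
As a rough illustration of that output shape, the sketch below builds the same kind of flat array in Python. The field names (title, year), the neighborhood names, and the helper to_flat_json are hypothetical placeholders, not the tutorial dataset's actual schema. The function only imitates the writer's behavior described above; it is not FME's implementation.

```python
import json

def to_flat_json(feature_types, key_name="json_featuretype"):
    """Flatten {feature type name: [row dicts]} into one JSON array string.

    Each row becomes one object, tagged under `key_name` with the name of
    the feature type it came from.
    """
    elements = []
    for type_name, rows in feature_types.items():
        for row in rows:
            # Feature type key first, then the row's own attributes.
            elements.append({key_name: type_name, **row})
    return json.dumps(elements, indent=2)

# Hypothetical sample rows standing in for two Parquet files.
sample = {
    "Downtown": [{"title": "Sculpture A", "year": 1998}],
    "Kitsilano": [{"title": "Mural B", "year": 2011}],
}

flat = to_flat_json(sample)
print(flat)
```

Passing key_name="Neighborhood" to the same helper mimics renaming the key, which is what the Feature Type Key Name parameter does later in this tutorial.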

Follow the steps below to build the workspace from scratch, or open the completed FME template attached to the article.

The data we are working with is a set of partitioned Parquet files representing public art in Vancouver:

2 Parquet files.PNG

1. Open FME Workbench

Open FME Workbench and start with a blank canvas.

2. Add an Apache Parquet Reader

Click “Add Reader” and add the source data to the workspace. Enter the following parameters:

Format: Apache Parquet
Dataset: C:\<Tutorial Download>\public art\*.parquet
Workflow Options: Individual Feature Types

The asterisk represents a wildcard, which means FME will read all .parquet files in the folder.
3 Add Parquet reader.PNG
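
To make the wildcard behavior concrete, here is a minimal Python sketch of shell-style pattern matching. The file names are hypothetical and the real names in the tutorial download may differ. The sketch uses Python's own matching rules as a stand-in; FME's wildcard handling may differ in details such as case sensitivity.

```python
from fnmatch import fnmatchcase

# Hypothetical folder listing (assumption: not the actual tutorial files).
folder_listing = [
    "Downtown.parquet",
    "Kitsilano.parquet",
    "readme.txt",
    "Mount Pleasant.parquet",
]

def match_dataset(names, pattern="*.parquet"):
    """Return the names selected by the wildcard pattern, sorted."""
    # fnmatchcase compares case-sensitively on every platform.
    return sorted(name for name in names if fnmatchcase(name, pattern))

selected = match_dataset(folder_listing)
print(selected)
```

Only the three names ending in .parquet are selected, and readme.txt is skipped.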

Click OK to add the reader feature type to the canvas.

In the Select Feature Types dialog that pops up, ensure the feature types for all three files are selected and click OK.
4 Select all feature types.PNG

A Parquet reader will be added to the workflow, and all 3 feature types will be added to the canvas.

3. Add a JSON Writer

Click “Add Writer” and enter the following parameters:

Format: JSON (JavaScript Object Notation)
Dataset: C:\<Tutorial Download>\public-art.json
Feature Type Definition: Copy from Reader...

5 Add JSON writer.PNG

Click “Parameters…” and:

(a) Set the Feature Type Key Name to “Neighborhood” instead of “json_featuretype”. Each feature type is named after a neighborhood, so this makes the output JSON clearer.

(b) Uncheck “Write Geometry”.

6 JSON writer parameters.PNG

Click OK in the Parameters dialog, then click OK again in the Add Writer dialog.

In the Select Feature Types dialog that pops up, select all 3 feature types and click OK.

7 JSON writer feature types.PNG

A JSON writer will be added to the workflow with 3 feature types.

4. Connect the Reader and Writer Feature Types

Connect each reader feature type to its matching writer feature type. The workspace is now configured to translate the three input Parquet files into a single JSON file, in which every feature carries the name of its feature type as an attribute.

8 Parquet to JSON workspace.png

5. Run the Workspace

Run the workspace to convert the Parquet data to JSON. This generates the file public-art.json, which contains one element for every row in the input dataset. Every element has a “Neighborhood” attribute holding its feature type name.

9 Output JSON.PNG
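
One way to sanity-check the result outside FME is to load the file and count elements per neighborhood. The sketch below is a minimal example, assuming the output is a UTF-8 JSON array of objects that each carry the “Neighborhood” key. The helper count_by_key and the demo rows are hypothetical; to check the real output, pass the path to public-art.json instead of the temporary demo file.

```python
import json
import os
import tempfile
from collections import Counter

def count_by_key(path, key="Neighborhood"):
    """Load a JSON array from `path` and count elements per value of `key`.

    Raises ValueError if the file is not an array of objects or if an
    element lacks `key`.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    counts = Counter()
    for index, element in enumerate(data):
        if not isinstance(element, dict) or key not in element:
            raise ValueError(f"element {index} has no '{key}' attribute")
        counts[element[key]] += 1
    return dict(counts)

# Demo on a small hypothetical file written to a temporary folder.
demo_rows = [
    {"Neighborhood": "Downtown", "title": "Sculpture A"},
    {"Neighborhood": "Downtown", "title": "Sculpture C"},
    {"Neighborhood": "Kitsilano", "title": "Mural B"},
]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(demo_rows, handle)
    demo_counts = count_by_key(path)

print(demo_counts)
```

The per-neighborhood counts can then be compared with the row counts of the source Parquet files.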

Bonus: Nested JSON Outputs

This scenario generated a flat JSON file: every art installation is a separate element at the top level of the array. It could instead be structured as a hierarchy, with the Neighborhood as the top level and each art installation nested in its neighborhood. To learn how to write nested JSON, refer to “Writing JSON with JSONTemplater” and the JSONTemplater documentation.
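
To show what such a hierarchy might look like, the following Python sketch regroups a flat array by its “Neighborhood” key. It only illustrates the target structure and is not a substitute for the JSONTemplater workflow. The helper nest_by_key, the “installations” key, and the sample rows are all hypothetical.

```python
import json
from collections import defaultdict

def nest_by_key(elements, key="Neighborhood", children="installations"):
    """Group flat elements into one object per distinct value of `key`."""
    grouped = defaultdict(list)
    for element in elements:
        # Drop the grouping key from the child so it is not repeated.
        child = {k: v for k, v in element.items() if k != key}
        grouped[element[key]].append(child)
    # Groups appear in the order their key was first seen.
    return [{key: name, children: items} for name, items in grouped.items()]

# Hypothetical flat rows, shaped like the output of this tutorial.
flat_rows = [
    {"Neighborhood": "Downtown", "title": "Sculpture A"},
    {"Neighborhood": "Kitsilano", "title": "Mural B"},
    {"Neighborhood": "Downtown", "title": "Sculpture C"},
]

nested = nest_by_key(flat_rows)
print(json.dumps(nested, indent=2))
```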

Additional Resources

Apache Parquet FME Documentation
Tutorial: Getting Started with JSON

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government Licence - Vancouver.