Tutorial: Getting Started with Apache Parquet

Liz Sanderson

Introduction

Apache Parquet is a column-oriented file format built for efficient storage and querying in Big Data systems. Unlike CSV, it supports null values and a full range of data types. This makes it well suited to data warehouses and data lakes, and it is supported by platforms such as Apache Hadoop, Amazon Athena, Google BigQuery, and Microsoft Azure.
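To make the contrast with CSV concrete, here is a minimal sketch in plain Python (standard library only, no Parquet library involved). It pivots row-oriented CSV records into per-column lists and shows why a typed, null-aware column differs from CSV text. The sample data and the helper name rows_to_columns are hypothetical and are used only for illustration; this is not how Parquet or FME stores data internally.

```python
import csv
import io

# Hypothetical sample data; the empty field stands in for a missing value.
CSV_TEXT = "name,count\nalpha,10\nbeta,\n"


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-oriented view: one dict per CSV record, every value a string.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Column-oriented view: one list per attribute.
columns = rows_to_columns(rows)

# CSV has no null and no types, so a missing number arrives as "".
# A typed column can hold a real integer or an explicit null (None here).
counts = [int(v) if v != "" else None for v in columns["count"]]
```

Because each column is stored together, a query that only needs count can skip the name values entirely, which is the general idea behind column-oriented formats.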

A Parquet dataset consists of .parquet files in a folder, which may be nested into subfolders (partitions) based on attribute values. In FME, a .parquet file is a feature type, and a row/record is a feature.
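The folder nesting described above can be sketched as follows. This example assumes the common key=value (Hive-style) naming convention for partition folders, which this article does not specify. The function name partition_path, the file name part-0.parquet, and the sample record are all hypothetical. The code only builds a path; it does not write any Parquet data.

```python
from pathlib import PurePosixPath


def partition_path(root, record, partition_keys, filename="part-0.parquet"):
    """Return the file path a record would fall under in a partitioned dataset.

    Assumes one folder level per partition key, named key=value.
    """
    path = PurePosixPath(root)
    for key in partition_keys:
        path = path / f"{key}={record[key]}"
    return path / filename


# Hypothetical record partitioned by country, then by year.
record = {"name": "example", "country": "CA", "year": 2024}
path = partition_path("parks", record, ["country", "year"])
print(path)
```

Under this assumption, every record that shares the same country and year lands in the same folder, so a reader can skip whole folders when filtering on those attributes.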

Articles

This tutorial series guides you through basic Parquet translation and transformation scenarios, including how to use the Apache Parquet reader and writer in FME.

How to Convert CSV to Parquet

This tutorial guides you through the process of converting a CSV file to one or more Parquet files for use in a Big Data system.

How to Convert Parquet to JSON

This tutorial guides you through the process of converting a partitioned dataset of .parquet files into a single JSON file for easy sharing over the web.

How to do Spatial Processing on Parquet Data

This tutorial guides you through the process of performing spatial processing on a Parquet file extracted from a data lake. The data is then uploaded back to the cloud.


As of January 14th, 2026, comments on knowledge base articles have been closed. To make sure questions don’t get missed and to enable more community support, we’ve moved discussions to the FME Community. If you have a question or a comment about this article, please create a new post or create a support ticket.