Introduction
Apache Parquet is a column-oriented file format designed for efficient storage and retrieval in Big Data systems. Unlike CSV, it stores a schema with typed columns and distinguishes null values from empty strings, and its columnar layout lets a query read only the columns it needs. This makes it well suited to data warehouses and data lakes, and it is supported by systems such as Apache Hadoop, Amazon Athena, Google BigQuery, and Microsoft Azure.
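To make the contrast with CSV concrete, the following minimal sketch uses only the Python standard library to round-trip a few rows through CSV. The column names and values are invented for illustration. It shows the limitation on the CSV side only: a null and an empty string become indistinguishable, and every value comes back as text.

```python
import csv
import io

# Hypothetical sample rows: one with a null name, one with an empty-string name.
rows = [
    ["id", "name", "height_m"],
    [1, None, 12.5],
    [2, "", 8.0],
]

# Write the rows to an in-memory CSV document.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Read the same document back.
buffer.seek(0)
read_back = list(csv.reader(buffer))

for row in read_back[1:]:
    print(row)
# ['1', '', '12.5']
# ['2', '', '8.0']
```

After the round trip, the null and the empty string are both `''`, and the numeric columns are strings that a reader has to re-parse. A format that stores a schema alongside the data, as Parquet does, avoids this ambiguity.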
A Parquet dataset consists of one or more .parquet files in a folder, which may be nested into partition subfolders based on attribute values. In FME, each .parquet file is a feature type, and each row (record) is a feature.
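The partitioned folder layout can be sketched as follows. This example assumes the common Hive-style convention, in which each partition level is a subfolder named `key=value`. The helper `partition_path`, the attribute names, the records, and the file name `part-0.parquet` are all hypothetical. The sketch only computes where each record would be stored and does not write any Parquet data.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_path(root, record, partition_keys, file_name="part-0.parquet"):
    """Return the file path for a record under a Hive-style layout (assumed)."""
    path = PurePosixPath(root)
    for key in partition_keys:
        # Each partition attribute adds one nested "key=value" folder.
        path = path / f"{key}={record[key]}"
    return path / file_name


# Hypothetical records, partitioned by country and then by year.
records = [
    {"country": "CA", "year": 2022, "value": 10},
    {"country": "CA", "year": 2023, "value": 11},
    {"country": "US", "year": 2023, "value": 12},
    {"country": "US", "year": 2023, "value": 13},
]

# Group records by the file they would be written to.
files = defaultdict(list)
for record in records:
    files[str(partition_path("dataset", record, ["country", "year"]))].append(record)

for path in sorted(files):
    print(path, len(files[path]))
```

With this layout, a query that filters on a partition attribute only needs to open the matching subfolders rather than scanning every file in the dataset.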
Articles
This tutorial series guides you through basic Parquet translation and transformation scenarios, including how to use the Apache Parquet reader and writer in FME.
How to Convert CSV to Parquet
This tutorial guides you through the process of converting a CSV file to one or more Parquet files for use in a Big Data system.
How to Convert Parquet to JSON
This tutorial guides you through the process of converting a partitioned dataset of .parquet files into a single JSON file for easy sharing over the web.
How to do Spatial Processing on Parquet Data
This tutorial guides you through the process of performing spatial processing on a Parquet file extracted from a data lake and then uploading the results back to the cloud.