FME Version
Introduction
Apache Parquet is a column-oriented file format designed for efficient storage and retrieval in Big Data systems. Unlike CSV, it supports null values and a full range of data types, and its columnar layout makes queries efficient. This makes it well suited to data warehouses and data lakes, including systems such as Apache Hadoop, Amazon Athena, Google BigQuery, and Microsoft Azure.
A Parquet dataset consists of .parquet files in a folder, which may be nested into sub-folders (partitions) based on attribute values. In FME, each .parquet file is a feature type and each row (record) is a feature.
Articles
This tutorial series will walk through basic Parquet translation and transformation scenarios, including how to use the Apache Parquet reader and writer in FME.
How to Convert CSV to Parquet
This tutorial walks through how to convert a CSV file to one or more Parquet files for use in a Big Data system.
How to Convert Parquet to JSON
This tutorial walks through how to convert a partitioned dataset of .parquet files into a single JSON file for easy sharing over the web.
How to do Spatial Processing on Parquet Data
This tutorial walks through how to do spatial processing on a Parquet file that has been extracted from a data lake. The data is then uploaded back to the cloud.