Tutorial: Getting Started with Apache Parquet

Liz Sanderson

Introduction

Apache Parquet is a column-oriented file format designed for high-performance use in Big Data systems. Unlike CSV, it supports null values and a full range of data types, and its columnar layout makes queries efficient because only the columns a query needs are read. This makes it a strong fit for data warehouses and data lakes, including systems like Apache Hadoop, Amazon Athena, Google BigQuery, and Microsoft Azure.

A Parquet dataset is a folder of .parquet files, which may be organized into nested partition subfolders based on attribute values. In FME, a .parquet file is a feature type and a row/record is a feature.


Articles

This tutorial series will walk through basic Parquet translation and transformation scenarios, including how to use the Apache Parquet reader and writer in FME.

 

How to Convert CSV to Parquet

This tutorial walks through how to convert a CSV file to one or more Parquet files for use in a Big Data system.

 

How to Convert Parquet to JSON

This tutorial walks through how to convert a partitioned dataset of .parquet files into a single JSON file for easy sharing over the web.

 

How to Do Spatial Processing on Parquet Data

This tutorial walks through how to do spatial processing on a Parquet file that has been extracted from a data lake. The data is then uploaded back to the cloud.
