FME Version
Files
Introduction
In FME, Parquet data can be extracted from a data warehouse or data lake, processed, and written back to Parquet for upload to the cloud. Spatial information, such as latitude/longitude or x/y coordinates, is often stored in columns and needs to be processed and transformed. Parquet is optimized for storing millions of records, and FME can quickly process these records and their associated geospatial information.
Step-by-Step Instructions
In this scenario, the user needs to process a Parquet file stored in the cloud that contains spatial information in the form of lat/long coordinates. In FME, the workflow includes a Parquet reader and writer, plus transformers and an additional reader to perform the data processing. Follow along with the steps below to build the workspace from scratch, or open the completed FME template in the article attachments.
The data we are working with represents public art in Downtown Vancouver and contains latitude and longitude values for each row.
We are interested in finding which transit station is closest to each art display, and then updating the Parquet dataset to include that information. We’ll use an external GIS dataset containing transit data.
1. Generate a Workspace
Open FME Workbench and generate a new workspace. Add an Apache Parquet reader and writer by entering the following parameters:
- Reader Format: Apache Parquet
- Reader Dataset: C:\<Tutorial Download>\Downtown.parquet
- Writer Format: Apache Parquet
- Writer Dataset: C:\<Tutorial Download>\Downtown-transformed
- Workflow Options: Static Schema
Click OK.
A workspace is generated that translates the input .parquet file to an output .parquet file.
2. Add a Shapefile Reader
Now it’s time to perform the desired spatial processing. We are interested in finding which transit station is closest to each art display, so we’ll read in the GIS file containing transit stations.
Click “Add Reader” and enter the following parameters:
- Format: Esri Shapefile
- Dataset: C:\<Tutorial Download>\transit\rapid-transit-stations.shp
- Workflow Options: Individual Feature Types
The workspace now has two reader feature types, one for Downtown and one for rapid-transit-stations.
3. Add a NeighborFinder transformer
Click anywhere on the canvas and begin typing “NeighborFinder”. Click the transformer to add it to the workflow.
On the input side, connect the Downtown feature type to the Base port and the rapid-transit-stations feature type to the Candidate port.
Open the NeighborFinder parameters. Under the “Attribute Accumulation” section, check “Merge Attributes”. This ensures the station name from the Shapefile dataset will appear in the output Parquet dataset.
Connect the MatchedBase output port to the Downtown writer feature type. The workspace should look as follows:
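For intuition, the matching that the NeighborFinder performs here amounts to a nearest-neighbor search over coordinates. A minimal pure-Python sketch of that idea, using brute-force haversine distance and made-up sample points (this is not FME's actual algorithm):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_station(art, stations):
    """Return the station closest to one art feature (brute force)."""
    return min(stations, key=lambda s: haversine_km(
        art["Latitude"], art["Longitude"], s["Latitude"], s["Longitude"]))

# Placeholder features standing in for the two reader feature types.
art = {"Name": "Artwork A", "Latitude": 49.2827, "Longitude": -123.1207}
stations = [
    {"station": "Waterfront", "Latitude": 49.2860, "Longitude": -123.1116},
    {"station": "Granville", "Latitude": 49.2833, "Longitude": -123.1161},
]

# Merging the station name onto the art feature mimics the
# "Merge Attributes" option on the NeighborFinder.
art["station"] = nearest_station(art, stations)["station"]
print(art["station"])  # prints "Granville"
```

In the workspace, the NeighborFinder does this for every Base feature against all Candidate features, and the merged attributes flow out of the MatchedBase port.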
4. Configure the Output Attributes
Open the parameters on the output writer feature type. In the User Attributes tab, ensure “Manual” is chosen and add a “station” attribute of type “string”. The list of attributes should consist of Name, Title, Longitude, Latitude, and station (case-sensitive). This ensures that we write only the columns we care about in the output Parquet file, which includes the station name taken from the Shapefile dataset.
Click OK.
When you expand the writer feature type, you should see the list of attributes with green input arrows. If an arrow is red, it means the attribute name does not match the source.
5. Run the Workspace
Run the workspace to perform the desired spatial processing on the Parquet dataset. The file Downtown.parquet is generated in the Downtown-transformed folder and contains an additional column with the name of the nearest transit station. This file can now be uploaded back to the cloud.
Additional Resources
Apache Parquet FME Documentation
Data Attribution
The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.