Introduction
Cloud-native refers to a modern approach to managing spatial and non-spatial datasets in formats that are optimized for deployment and hosting in cloud environments. By taking advantage of cloud architecture, cloud-native formats reduce the amount of data that providers and users have to move across networks, saving time, money and storage.
The concept behind cloud-native is simply to ‘read what you need’. This usually involves some form of indexing or tiling, so that a query over a small extent can extract only the matching portion of a large dataset. Because cloud-native standards inherently enable these partial reads, data publishers can rely on them to optimize their datasets without much additional configuration.
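As a rough illustration of ‘read what you need’, the sketch below assumes a hypothetical tile index that records each tile's extent and byte position in a remote file. The `Tile` class, the `range_headers` function and the sample index are invented for this example and are not part of FME or of any format specification. The only standard element is the HTTP `Range` header syntax, `bytes=start-end`, where both ends are inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One tile of a hypothetical tiled dataset: its extent and byte location."""
    minx: float
    miny: float
    maxx: float
    maxy: float
    offset: int  # byte offset of the tile within the remote file
    length: int  # number of bytes the tile occupies


def intersects(tile, bbox):
    """Return True if the tile's extent overlaps the (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = bbox
    return not (
        tile.maxx < minx or tile.minx > maxx
        or tile.maxy < miny or tile.miny > maxy
    )


def range_headers(index, bbox):
    """Build one HTTP Range header per tile that overlaps the query box."""
    return [
        {"Range": f"bytes={t.offset}-{t.offset + t.length - 1}"}
        for t in index
        if intersects(t, bbox)
    ]


# Hypothetical 2 x 2 tile index covering the extent (0, 0) to (20, 20).
example_index = [
    Tile(0, 0, 10, 10, offset=0, length=1000),
    Tile(10, 0, 20, 10, offset=1000, length=800),
    Tile(0, 10, 10, 20, offset=1800, length=1200),
    Tile(10, 10, 20, 20, offset=3000, length=500),
]

# A small query box that falls inside the second tile only.
print(range_headers(example_index, (12, 2, 18, 8)))
```

With this sample index, a client would request a single 800-byte range instead of the whole 3,500-byte file. Real formats store the equivalent of this index in their own headers or metadata.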
FME can integrate cloud-native datasets stored on various cloud platforms, including Amazon S3, Azure, or Google Cloud. Safe Software has now added support for five key cloud-native formats in FME Form 2024.0.
Formats
Spatio-Temporal Asset Catalog (STAC)
Spatio-Temporal Asset Catalog is a format that catalogs cloud-based assets that relate to a geographic area or time. The assets are typically described in a JSON catalog or collection. Although STAC was developed around raster data, it also supports vector products. For example, a STAC Collection can contain Items whose Assets are GeoPackage layers or COG bands.
Cloud-Optimized GeoTIFF (COG)
Cloud-Optimized GeoTIFF builds on the GeoTIFF raster specification. The format supports raster pyramiding (overviews), tiling and compression.
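To show why tiling matters for partial reads, here is a minimal sketch that works out which tiles a pixel window touches. The function name `tiles_for_window` and the 512-pixel tile size are assumptions made for this example. This is not how FME or any particular library implements COG reading. In an actual COG, the byte position of each tile is recorded in the TIFF `TileOffsets` and `TileByteCounts` tags, which a reader looks up before requesting only those tiles.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) pairs for every tile a pixel window overlaps.

    col_off and row_off are the window's top-left pixel; width and height are
    its size in pixels. tile_size is an assumed square tile edge length.
    """
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 100 x 100 pixel window starting at column 500 straddles two tiles.
print(tiles_for_window(500, 0, 100, 100))
```

However large the raster is, this window needs only the two tiles it overlaps. The same calculation applies at each overview level of the pyramid, using that level's pixel dimensions.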
GeoParquet
GeoParquet is a cloud-friendly vector format built on the Parquet standard. As a result, GeoParquet benefits from the mature set of applications, libraries and tools originally designed for Parquet. The format is column-oriented and supports a wide range of geometries.
FlatGeobuf
FlatGeobuf is a vector format built on Google’s FlatBuffers library. A buffer is considered a file and everything within it. Although it is not required, FlatGeobuf can use a spatial index to reduce the amount of data that needs to be transferred over a potentially slow network.
Zarr
Based on NetCDF / HDF data cube formats, Zarr is a storage format for multidimensional raster arrays and time series, optimized for the web. In Zarr, each time step represents a separate band with its own properties.
Cloud-Optimized Point Cloud (COPC)
A Cloud-Optimized Point Cloud stores point clouds optimized for the web, allowing you to read what you need from large 3D data volumes. This format is inspired by the LAS/LAZ specification.
Comments
Hi there, I've tried using the COG reader in FME 2024.1 with a bounding box, and also the FeatureReader with a spatial filter set to intersects. It seems that no matter what settings I use, FME still downloads the whole dataset. Maybe my data is wrong? But this seems to defeat the whole purpose of creating a COG. As I understand it, FME should be able to take the bounding box and make a ranged HTTP request to get just the chunks it needs, without first downloading the whole file.
It seems I'm not alone: a number of threads describe similar experiences.
https://community.safe.com/transformers-9/how-to-work-properly-with-cog-s-35553?postid=158635#post158635
https://community.safe.com/transformers-9/read-part-of-cloud-optimised-geotiff-cog-21515