Introduction
In 2023, Safe Software participated in the OGC Climate Resilience Pilot. The OGC initiated this pilot to bring together specialists from a wide variety of disciplines to explore which data sources and workflows are needed to support the indicators and decision making required to cope with climate change impacts. The challenge was to develop methodologies and tools for discovering, extracting, integrating, processing and sharing information across spatial data infrastructures, using open standards, in the service of more effective climate resilience.
Ever increasing volumes of data are available to support climate resilience, much of it changing over time. This can lead to information overload: managers are swamped with data from multiple domains and sources, and struggle to filter and interrelate it to extract meaningful, actionable knowledge. It is therefore essential that the OGC and its partners find ways to integrate, combine and leverage spatial data more effectively and efficiently, in order to better support climate change impact analysis that helps communities anticipate, respond to and mitigate those impacts.
Approaches
The basic approach for this Pilot is to combine systems that support the complete information pathway, from forecasts and observations to decision support information. The technical components involved include tools and processes to ingest climate model results and observations, such as EO resources and real time data streams, combined with base maps, population and critical infrastructure data. These inputs can then be combined to generate analysis ready datasets (ARD) for downstream components, which execute impact analysis processes to derive information products (DRI, decision ready information). These DRI products form the basis for answering the key questions decision makers and planners have in order to better prepare for climate change impacts.
Safe Software contributed a key component in this information flow chain: a tool to generate analysis ready datasets, as well as a number of climate change impact components for heat, flood and drought. Safe Software, with FME, its advanced data integration toolset, is uniquely positioned to support this data flow. Yet data integration alone is not sufficient. Rich ARD must be provided to downstream specialized climate impact assessment components. Our goal through this engagement was to better integrate with other participants’ tools, via open standards where possible and industry standards as needed, so that together we can ultimately provide more rapid and effective information and indicators to support decision makers, operations and planning personnel.
High Level Architecture Approach
Below is a high level architecture diagram describing our general approach for data extraction, transformation and loading into ARD datasets and data streams to be consumed by other downstream components. This is a starting point for agile development, used to explore options for different data sources, process steps and output types. One of the best ways to identify data and information gaps is to develop early prototypes like this and test out different approaches using real source data. This also gives stakeholders something to respond to as they review results and identify what is useful and what is missing. This iterative development process, with feedback from stakeholders and their representatives, was crucial to the success of the Disaster Pilot. For example, stakeholder input was crucial for determining the contour interval needed for the flood contour output in order to support decisions on road restrictions (20 cm flood depth = restricted, 40 cm depth = closed). ARD is fed back to other components via OGC Geopackage, OGC Services (OGC API etc.), and whatever other data formats are needed by downstream components.
High level pilot FME workflow to support generation of ARD from climate model, EO, IoT, infrastructure and base map inputs, and generation of climate change scenario ARD and impacts.
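As a simple illustration, the road restriction rule mentioned above could be encoded downstream along these lines. This is a minimal Python sketch; the function name and return categories are hypothetical, and only the 20 cm and 40 cm thresholds come from the pilot discussion.

```python
# Hypothetical sketch: map a flood depth value (metres) to the road
# restriction categories identified by stakeholders (0.2 m = restricted,
# 0.4 m = closed). Function and category names are illustrative only.
def road_status(flood_depth_m: float) -> str:
    if flood_depth_m >= 0.40:
        return "closed"
    if flood_depth_m >= 0.20:
        return "restricted"
    return "open"

for depth in (0.05, 0.25, 0.55):
    print(f"{depth:.2f} m -> {road_status(depth)}")
```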
Open Standards and APIs
For this pilot, we hosted an FME Flow instance on FME Cloud, which provided access to our data via a number of open standards (OGC API Features, GeoJSON, Geopackage, GML, KML, GeoTIFF) and via other open and de facto standards as needed (Shape, CSV, HTML).
Analysis Ready Data Component
Our Analysis Ready Data (ARD) component uses the FME platform to consume regional climate model and EO data and generate FAIR datasets for downstream analysis and decision support. Managing and mitigating the effects of climate change poses significant spatial and temporal data integration challenges. One of the biggest gaps to date has been translating the outputs of global climate models into specific impacts at the local level. FME is ideally suited to help explore options for bridging this gap, given its ability to read datasets produced by climate models, such as NetCDF files or OGC WCS coverages, and then filter, aggregate, interpolate and restructure them as needed. FME can inter-relate these with higher resolution local data, and then output them to whatever format or service is most appropriate for a given application domain or user community.
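As an illustration of this kind of extraction and filtering (the actual workflows are built from FME transformers rather than code), the following Python sketch uses xarray; the file name, the tasmax variable name, the bounding box and the time range are all placeholder assumptions.

```python
# Minimal sketch (not the FME workflow itself): extract and filter a
# climate model NetCDF cube with xarray. File path, variable name and
# bounding box are placeholders.
import xarray as xr

ds = xr.open_dataset("rcm_scenario.nc")   # hypothetical RCM/GCM output file
tasmax = ds["tasmax"]                     # assumed daily maximum temperature variable

# Filter by location (approximate Vancouver-area bounding box) and time range.
# Note: the lat slice assumes ascending latitude coordinates.
subset = tasmax.sel(
    lon=slice(-123.5, -122.5),
    lat=slice(49.0, 49.5),
    time=slice("2020-01-01", "2099-12-31"),
)

# Aggregate to monthly means to reduce volume before downstream use.
monthly = subset.resample(time="1MS").mean()
print(monthly)
```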
Our ARD component supports the consumption of climate model outputs such as NetCDF. It also has the capacity to consume earth observation (EO) data, and the base map datasets necessary for downstream workflows, though given time and resource constraints during this phase we did not pursue consumption of other data types besides climate data.
The basic workflow for generating output from the FME ARD component is as follows. The component extracts, filters, interrelates and refines the input datasets according to indicator requirements. After extraction, datasets are filtered by location and transformed to an appropriate resolution and CRS: the workflow resamples, simplifies and reprojects the data, then defines record level feature identifiers, ECV values, metadata and other properties to satisfy the target ARD requirements. This workflow is broadly similar to what was needed to evaluate disaster impacts in DP21, although the time ranges for climate scenarios are significantly longer: years rather than the weeks typical of flood events.
Once the climate model and other supporting datasets have been adequately extracted, prepared and integrated, the final step is to generate the data streams and datasets required by downstream components and clients. The FME platform is well suited to deliver data in whatever formats are needed. This includes Geopackage for offline use; for online access, open standards data streams such as GeoJSON, KML or GML are available via WFS, OGC API Features and other open APIs. For this pilot we generated OGC Geopackage, GeoJSON, CSV and OGC API Features services.
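The delivery step can be pictured with a short geopandas sketch; the layer name, column names and sample values are illustrative only and do not reflect the actual pilot outputs.

```python
# Minimal sketch of the delivery step: write point ARD to OGC Geopackage
# and GeoJSON with geopandas. Column names, values and CRS are assumptions.
import geopandas as gpd
from shapely.geometry import Point

records = [
    {"station_id": 1, "time": "2050-07-01", "tasmax_c": 31.2, "lon": -123.1, "lat": 49.25},
    {"station_id": 2, "time": "2050-07-01", "tasmax_c": 29.8, "lon": -123.0, "lat": 49.20},
]
gdf = gpd.GeoDataFrame(
    records,
    geometry=[Point(r["lon"], r["lat"]) for r in records],
    crs="EPSG:4326",
)

# Offline delivery as OGC Geopackage, online-friendly delivery as GeoJSON.
gdf.to_file("ard_tasmax.gpkg", layer="tasmax_points", driver="GPKG")
gdf.to_file("ard_tasmax.geojson", driver="GeoJSON")
```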
As our understanding of end user requirements continues to evolve, the data sources we select and how we refine them will also change, following a model based rapid prototyping approach. We anticipate that any operational system will need to support a growing range of climate change impacts and related domains, so tools and processes must be able to absorb and integrate new datasets into existing workflows with relative ease. As the pilot develops and data volumes increase, scalability methods are required to maintain performance and avoid overloading downstream components. Cloud based processing near cloud data sources, using OGC API web services, supports this scaling; for the FME platform, this means deploying FME workflows to FME Cloud. In future phases we are likely to test how cloud native datasets (COG, STAC, ZARR) and caching can be used to scale performance as data transaction and volume requirements increase.
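For example, a cloud native store could in principle be read lazily so that only the requested subset is transferred and processed. The following xarray sketch assumes a hypothetical ZARR store URL and variable name, and a zarr/fsspec-capable environment.

```python
# Sketch of the kind of cloud native access we expect to test in future
# phases: lazy, chunked reads from a hypothetical ZARR store.
import xarray as xr

ds = xr.open_zarr("https://example.com/climate/scenario.zarr")  # placeholder URL

# Lazy reduction: only the requested subset is pulled and computed.
summer_max = (
    ds["tasmax"]
    .sel(time=slice("2040-06-01", "2040-08-31"))
    .max("time")
    .compute()
)
print(summer_max)
```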
It is worth underlining that our ARD component depends on appropriate data sources in order to produce appropriate decision ready information (DRI) for downstream components. Risk factors include being able to locate and access suitable climate models of sufficient quality, resolution and timeliness to support indicators as their requirements and business rules evolve.
Figures
Figure 30 — Environment Canada NetCDF GCM time series downscaled to Vancouver area. From: https://climate-change.canada.ca/climate-data/#/downscaled-data
Figure 31 — Data Cube to ARD: NetCDF to KML, Geopackage, GeoTIFF
Original Data workflow (sketched in code below):
1. Split data cube
2. Set timestep parameters
3. Compute timestep stats by band
4. Compute time range stats by cell
5. Classify by cell value range
6. Convert grids to vector contours
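The following Python sketch approximates steps 3 to 5 of this workflow with xarray and numpy (the pilot implementation uses FME transformers; the input file, variable name and class breaks are assumptions).

```python
# Sketch of timestep stats, period stats and classification on a climate
# data cube. Input file, variable name and 5 degree C bins are placeholders.
import numpy as np
import xarray as xr

cube = xr.open_dataset("rcm_scenario.nc")["tasmax"]   # placeholder input cube

# Step 3: per-timestep (band) statistic across the area of interest.
timestep_mean = cube.mean(dim=["lat", "lon"])

# Step 4: per-cell statistic over the whole time range.
period_max = cube.max(dim="time")

# Step 5: classify each cell's period maximum into 5 degree C bins,
# mirroring the class breaks used before vectorizing to contours.
bins = np.arange(0, 45, 5)
classified = period_max.copy(data=np.digitize(period_max.values, bins))
```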
Figure 32 — Extracted timestep grids: Monthly timesteps, period mean T, period max T
Figure 33 — Convert raster temperature grids into temperature contour areas by class
Figure 34 — Geopackage Vector Area Time Series: Max Yearly Temp
ARD Development Observations
Figure 35 — FME Data Inspector: RCM NetCDF data cube for Manitoba temperature 2020-2099
Disaster Pilot 2021 laid a good foundation for exploring data cube extraction and conversion to ARD using the FME data integration platform. A variety of approaches were explored for extraction, simplification and transformation, including approaches to select, split, aggregate, and summarize time series. However, more experimentation was needed to generate ARD that can be queried to answer questions about climate trends. This evolution of ARD was one of the goals for this CRP, including better support for basic queries, analytics, statistical methods, and simplification and publication methods, including cloud native approaches: NetCDF to Geopackage, GeoJSON and OGC APIs.
In consultation with other participants, we learned fairly early on in the pilot that our approach to temperature and precipitation contours or polygons, inherited from our DP21 work on flood contours, involved too much data simplification to be useful. For example, contouring required grid classification into segments such as 5 degrees C of temperature or 10 mm of precipitation. This effective loss of detail oversimplified the data to the point where it no longer held enough variation over local areas to be useful. In discussion with other participants, it was determined that simply converting multidimensional data cubes to vector time series point data served the purpose of simplifying the data structure for ease of access, while retaining the ECV precision needed to support a wider range of data interpretations for indicator derivation. It also meant that, as a data provider, we did not need to anticipate or encode interpretation of indicator business rules into our data simplification process. By simply providing ECV data points, the end user is free to run queries to find locations and time steps where temperature exceeds, or precipitation falls below, some threshold of interest.
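A minimal Python sketch of this revised approach flattens the cube to point records and applies an end user style threshold query; the input file, variable name and 30 C threshold are assumptions.

```python
# Sketch: flatten a data cube to a vector point time series, then run an
# end-user style threshold query. Input and threshold are placeholders.
import xarray as xr

cube = xr.open_dataset("rcm_scenario.nc")["tasmax"]    # placeholder input cube

# One row per (time, lat, lon) cell, preserving the original ECV precision.
points = cube.to_dataframe().reset_index()

# End-user style query: where and when does maximum temperature exceed 30 C?
hot = points[points["tasmax"] > 30.0]
print(hot[["time", "lat", "lon", "tasmax"]].head())
```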
Initially it was thought that classification rules needed to more closely model the impacts of interest. For example, the business rules for a heat wave might use a temperature range and statistic type as part of the classification process before conversion to vector. However, this imposes the burden of domain knowledge on the data provider rather than on the climate service end user, who is much more likely to understand the domain they wish to apply the data to and how best to interpret it.
Modified ARD Data workflow:
1. Split data cube
2. Set timestep parameters
3. Compute timestep stats by band
4. Compute time range stats by cell
5. Convert grids to vector points
Further scenario tests were explored, including comparison with historical norms. Calculations were made using the difference between projected and historical climate variables. These climate variable deltas may well serve as a useful starting point for climate change risk indicator development. They also serve as an approach for normalizing climate impacts when the absolute units are not the main focus. Interesting patterns emerged for the LA area, where we ran this process on deltas between projected and historical precipitation. Summers there are typically dry, while winters are wet and prone to flash floods; initial data exploration seemed to show an increase in drought patterns in the spring and fall. More analysis needs to be done to see if this is a general pattern or simply one that emerged from the climate scenario we ran. However, this is the type of trend that local planners and managers may benefit from being able to explore once they have better access to climate model scenario outputs along with the ability to query and analyze them.
Figure 36 — Modified ARD Workflow: NetCDF data cube to precipitation delta grids (future - historical) in Geopackage for LA
ARD Climate Variable Delta Data workflow (see the sketch after this list):
1. Split data cubes from historic and future netcdf inputs
2. Set timestep parameters
3. Compute historic mean for 1950-1980 per month based on historic time series input
4. Multiply historic mean by -1
5. Use RasterMosaiker to sum all future grids with -1 * historic mean grid for that month
6. Normalize the environmental variable difference by dividing by the historic range for that month: delta / (max - min)
7. Convert grids to vector contours
8. Define monthly environment variables from band and range values
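The same delta and normalization logic can be sketched in xarray as follows. The pilot workflow uses FME's RasterMosaiker; the file names and the precipitation variable name pr are assumptions, and a direct subtraction stands in for the multiply-by-minus-one-and-sum step.

```python
# Sketch of the climate variable delta workflow: monthly baseline mean and
# range for 1950-1980, then normalized future-minus-historic deltas.
import xarray as xr

hist = xr.open_dataset("historical.nc")["pr"]        # historic run, placeholder
fut = xr.open_dataset("future_scenario.nc")["pr"]    # projected run, placeholder

# Historic monthly climatology and range for the 1950-1980 baseline.
baseline = hist.sel(time=slice("1950-01-01", "1980-12-31"))
hist_mean = baseline.groupby("time.month").mean("time")
hist_range = (
    baseline.groupby("time.month").max("time")
    - baseline.groupby("time.month").min("time")
)

# Delta per future timestep: future value minus the historic mean for that
# month (equivalent to summing with -1 * historic mean), then normalized by
# the historic range for that month.
delta = fut.groupby("time.month") - hist_mean
normalized = delta.groupby("time.month") / hist_range
```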
More analysis needs to be done with higher resolution time steps — weekly and daily. At the outset monthly time steps were used to make it easier to prototype workflows. Daily time step computations will take significantly more processing time. Future pilots should explore ways of better supporting scalability of processing through automation and cloud computing approaches such as the use of cloud native formats (STAC, COG, ZARR etc).
ARD Workflow Inputs: Datasets Read by FME
Sample Environment Canada Climate Data as read by FME:
Environment Canada NetCDF GCM time series downscaled to Vancouver area, 2000-2100
USGS GCM NetCDF for Pacific Northwest: 2000-2100