CSW Metadata Insert

Liz Sanderson

FME Version

  • FME 2015.x

Introduction

The attached workspace demonstrates a potential solution to the following scenario:

We have a set of AutoCAD DWG files that we want included in a metadata catalog provided by a deegree CSW server. A third-party editor was used to prepare an ISO 19115 metadata file that contains most of the metadata elements that are not file-specific, but we need to fill in several file-specific elements, such as the geographic extent and the file location.

 

Service Configuration

deegree-csw has the following prerequisites:

  1. PostGIS
  2. Java
  3. Apache Tomcat


The deegree documentation is comprehensive, and provides a good set of instructions for setting up the service.

Setup of the service consists of:

  1. Install above prerequisites
  2. Copy the WAR file into the Tomcat webapps directory (Tomcat will expand the WAR file)
  3. Create a PostGIS spatial database
  4. Run the SQL scripts provided by deegree to configure the DB
  5. Update the deegree configuration file (inside the directory Tomcat expands) with the database details

 

Verification

deegree UI

To verify that deegree is properly configured, navigate to the deegree web application running within Tomcat. The default URL is http://localhost:8080/deegree-csw/ (or http://<host>:<port>/deegree-csw/). Links to capabilities, status information, and a basic client app are provided.

http_download_1444329740505_6032.jpeg

 

deegree web client

The generic web service client provides a simple interface for sending XML-formatted requests to the service, and examining the XML-formatted responses. The client interface also includes several example requests, which are useful for understanding how CSW works. The 'template' metadata file included with this demo is derived from these examples.

http_download_1444329740682_6032.jpeg

 

Source Data

The source data is a set of AutoCAD DWG files consisting of parcel data for the city of Austin that has been divided into sections, some of which overlap.

 

Data Overview

An overview of the data is shown below; the bounding box of each file is shown as a colored box.

http_download_1444329740990_6032.jpeg

 

Zoomed Section of Data

A zoomed-in section of the overview above.
http_download_1444329741393_6032.jpeg

 

Updating XML on the fly

Typical FME translations read data from a file, which results in the creation of features with attribute data corresponding to the file data. This workspace treats the XML metadata record as an opaque block that is manipulated using our XQuery transformers.

This approach permits us to use an easily extensible building block approach to manipulating the metadata, and then inserting it into the catalog service:

  1. The contents of the template metadata file are loaded into a single FME string attribute
  2. An XQuery is constructed for each section of the metadata that we're interested in updating. An arbitrary number of these update queries can be chained together
  3. The updated metadata XML is wrapped in the appropriate web service XML envelope & POSTed to the service endpoint

An additional benefit of using XQuery is that it ensures that we are dealing with valid XML at every stage of the transformation, thus eliminating one potential source of service errors.
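The chaining idea can be sketched outside FME in plain Python. This is only an illustration under stated assumptions: the trimmed template, the function names, and the attribute keys are hypothetical, and ElementTree stands in for the XQuery transformers that the workspace actually uses.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}

# Hypothetical, heavily trimmed stand-in for the metadata template file.
TEMPLATE = (
    '<gmd:MD_Metadata xmlns:gmd="%s" xmlns:gco="%s">'
    "<gmd:fileIdentifier><gco:CharacterString/></gmd:fileIdentifier>"
    "<gmd:dateStamp><gco:Date/></gmd:dateStamp>"
    "</gmd:MD_Metadata>" % (GMD, GCO)
)


def set_text(xml_text, path, value):
    """Parse the XML string, set one element's text, and serialize it again."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    root.find(path, NS).text = value
    return ET.tostring(root, encoding="unicode")


def set_identifier(xml_text, attrs):
    return set_text(xml_text, "gmd:fileIdentifier/gco:CharacterString", attrs["uuid"])


def set_datestamp(xml_text, attrs):
    return set_text(xml_text, "gmd:dateStamp/gco:Date", attrs["date"])


def apply_updates(xml_text, attrs, updates):
    """Run each update in turn; every step takes and returns an XML string."""
    for update in updates:
        xml_text = update(xml_text, attrs)
    return xml_text


record = apply_updates(
    TEMPLATE,
    {"uuid": "abc-123", "date": "2015-10-08"},
    [set_identifier, set_datestamp],
)
```

Because every step re-parses the string, malformed XML is caught at the step that produced it, and the updates can be reordered or dropped without touching the others.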

 

Workspace Overview

http_download_1444329741702_6032.jpeg

The Workspace is divided into 3 sections:

  1. Initialization: Each DWG file is read; metadata regarding the file and its contents are collated, resulting in a single feature per file. A metadata template is attached to the feature by reading it from the external metadata template file into an attribute
  2. Metadata Preparation: The transformation sections that prepare & execute the XQuery updates
  3. CSW Insert: The insert transaction XML is prepared, executed, and the results are summarized

 

Workspace Details

For ease of navigation, the workspace has been divided into several numbered bookmarks. The following sections describe what is happening inside each of the bookmarks.

(1) Initialization

http_download_1444329741901_6032.jpeg
This workspace reads from several AutoCAD DWG files. Normally, each file would map to an individual FME feature type. However, if you examine the properties of the single source feature type, you will notice that it is a 'wild card' feature type that merges all source feature types. This is a very handy way of handling multiple files that share a common schema.

(1.1) Accumulate Dataset Metadata

http_download_1444329742106_6032.jpeg
The MetadataAccumulator transformer examines each feature read from the source files, and accumulates the following information for each feature type:

  1. Bounding box of all features
  2. Coordinate Reference System (CRS) of the first feature
  3. Dataset location & file size


In addition, a title (for the metadata citation) is generated based on the feature type.

The transformer outputs one feature for every file that was read.
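The accumulation logic can be approximated as follows. This is a rough sketch, not the MetadataAccumulator source: features are represented as plain dictionaries with hypothetical keys ('feature_type', 'x', 'y', 'crs'), each feature is reduced to a single point, and the dataset location and file size are left out.

```python
def accumulate_metadata(features):
    """Collapse many features into one summary record per feature type."""
    summaries = {}
    for feature in features:
        name = feature["feature_type"]
        summary = summaries.get(name)
        if summary is None:
            # First feature of this type: start the bounding box here and
            # remember its CRS and a title derived from the feature type.
            summaries[name] = {
                "title": name,
                "crs": feature["crs"],
                "xmin": feature["x"], "xmax": feature["x"],
                "ymin": feature["y"], "ymax": feature["y"],
            }
        else:
            # Later features only grow the bounding box.
            summary["xmin"] = min(summary["xmin"], feature["x"])
            summary["xmax"] = max(summary["xmax"], feature["x"])
            summary["ymin"] = min(summary["ymin"], feature["y"])
            summary["ymax"] = max(summary["ymax"], feature["y"])
    return summaries


# Hypothetical sample values for illustration only.
sample = [
    {"feature_type": "parcels_a", "x": 1.0, "y": 5.0, "crs": "CRS-1"},
    {"feature_type": "parcels_a", "x": 3.0, "y": 2.0, "crs": "CRS-2"},
    {"feature_type": "parcels_b", "x": 10.0, "y": 10.0, "crs": "CRS-1"},
]
result = accumulate_metadata(sample)
```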

(1.2) Load Metadata Template

http_download_1444329742317_6032.jpeg
A copy of the metadata template is added (as a string attribute) to each of the features as it passes through.

(2) Prepare Metadata

http_download_1444329742469_6032.jpeg
This stage of the transformation is divided into several independent blocks. These blocks can be reordered or removed without affecting previous or subsequent blocks. Each block uses a very simple pattern:

http_download_1444329742671_6032.jpeg

  1. Use a PythonCaller to generate an XQuery
  2. Execute the query using an XQueryUpdater


The PythonCaller is used to generate the XQuery for several reasons:

  1. The transformer has a text editor that is useful for editing large sections of text
  2. The syntax allows for the XQuery to be formatted in an easy to read way
  3. It is very simple to access FME feature attributes & perform string insertion/replacement


The alternative would be a Concatenator transformer, but the XQuery being created would be harder to read.
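As an illustration of the pattern, a query for the citation title might be built like this. It is a sketch under assumptions: the attribute name citation_title and the XPath are hypothetical, a plain dictionary stands in for the FME feature (inside a PythonCaller the value would come from feature.getAttribute), and how the query is bound to the XML document depends on the XQueryUpdater settings.

```python
# XQuery Update Facility expression, kept readable as a multi-line template.
QUERY_TEMPLATE = """\
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
declare namespace gco = "http://www.isotc211.org/2005/gco";

replace value of node
    //gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString
with {title}
"""


def xquery_string(value):
    """Quote a Python string as an XQuery string literal.

    Ampersands must be written as entity references and embedded double
    quotes are escaped by doubling them.
    """
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return '"' + escaped + '"'


def build_title_query(attributes):
    # 'citation_title' is a hypothetical attribute name for this sketch.
    return QUERY_TEMPLATE.format(title=xquery_string(attributes["citation_title"]))


query = build_title_query({"citation_title": 'Parcels "A" & B'})
```

Escaping the inserted value matters here: an unescaped quote or ampersand in a title would otherwise produce a query that fails to parse.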

For the purposes of this demo, each of these bookmarked blocks has been left 'exposed' in the main workspace window. Encapsulating these blocks in custom transformers is very easy, and highly recommended. Doing so allows users to create a library of metadata manipulation transformers that can be easily chained together to manipulate a metadata record.

(2.1) Generate the File Identifier

http_download_1444329742871_6032.jpeg
This block generates the fileIdentifier section of the metadata. This demo uses random UUID identifiers generated by the UUIDGenerator transformer. Random identifiers may not be appropriate in a production environment if record updates are desirable, because the same file would receive a different identifier on every run. In such a situation, the identifiers would have to be obtained from an externally maintained mapping of filenames to identifiers.
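One way to picture such a mapping is sketched below. The function name and the in-memory dictionary are assumptions made for illustration; a production setup would persist the mapping in a file or database.

```python
import uuid


def file_identifier(filename, known_ids):
    """Return the stored identifier for a file, creating one on first use.

    known_ids is a dict of filename -> identifier that the caller is
    expected to persist between runs (hypothetical for this sketch).
    """
    if filename not in known_ids:
        known_ids[filename] = str(uuid.uuid4())
    return known_ids[filename]


ids = {"parcels_a.dwg": "existing-id"}
first = file_identifier("parcels_b.dwg", ids)
second = file_identifier("parcels_b.dwg", ids)
```

With the mapping in place, re-running the translation yields the same identifier for the same file, which is what a later record update would need.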

(2.2) Generate Datestamp

http_download_144432974348_6032.jpeg
This block generates the dateStamp section.

(2.3) Insert Title

http_download_1444329743228_6032.jpeg
This block inserts the title element of the citation section.

(2.4) Insert distributionInfo

http_download_1444329743415_6032.jpeg
This block generates the distributionInfo section, which includes a transferOptions element for each dataset file giving the file location and size. In this example, our source dataset is AutoCAD DWG, which has one file per dataset. Other formats, such as MIF/MID or Shape, have multiple files per dataset. See the MetadataAccumulator source code for an example of how multiple files can be handled. At this time, the format being worked with must be hardcoded into the MetadataAccumulator to deal correctly with multiple files per dataset.

Depending on organizational requirements, there may be other options for indicating file location and format; this is only one example.
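For reference, the shape of one transferOptions element per file can be sketched with ElementTree. This is an assumption-laden illustration rather than the workspace's XQuery: the function name and sample URLs are hypothetical, and the size is converted to megabytes because that is the unit ISO 19115 specifies for transferSize.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _child(parent, namespace, tag):
    return ET.SubElement(parent, "{%s}%s" % (namespace, tag))


def distribution_section(files):
    """Build an MD_Distribution with one transferOptions per (url, bytes) pair."""
    distribution = ET.Element("{%s}MD_Distribution" % GMD)
    for url, size_bytes in files:
        options = _child(_child(distribution, GMD, "transferOptions"),
                         GMD, "MD_DigitalTransferOptions")
        size = _child(_child(options, GMD, "transferSize"), GCO, "Real")
        size.text = "%.3f" % (size_bytes / 1048576.0)  # bytes -> megabytes
        resource = _child(_child(options, GMD, "onLine"), GMD, "CI_OnlineResource")
        _child(_child(resource, GMD, "linkage"), GMD, "URL").text = url
    return ET.tostring(distribution, encoding="unicode")


# Hypothetical file locations and sizes.
section = distribution_section([
    ("file:///data/parcels_a.dwg", 2097152),
    ("file:///data/parcels_b.dwg", 524288),
])
```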

(2.5) Setup Coordinate Reference System

http_download_1444329743710_6032.jpeg
This block demonstrates how the CRS of a feature can be extracted and used to construct a referenceSystemInfo section. Notes:

  1. The FME coordinate system name may or may not correspond to a well-known CRS name
  2. The RS_Identifier element can contain two elements: code and authority. This example places the FME CRS name in the code element. Individual metadata implementations may have differing requirements. For example, EPSG numbers could be obtained from the FME CRS using the CoordinateSystemDescriptionConverter transformer (see Article #000001660) and inserted here instead.

(2.6) Update Bounds

http_download_1444329743886_6032.jpeg
Constructs the EX_GeographicBoundingBox element of the identificationInfo section using the bounding box that was accumulated earlier.
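The element being constructed has the following shape, sketched here with ElementTree instead of XQuery. The function name and sample coordinates are hypothetical. The range check is an addition of this sketch: EX_GeographicBoundingBox holds longitudes and latitudes in decimal degrees, so a bounding box accumulated in a projected CRS would need reprojecting first.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def geographic_bbox(west, east, south, north):
    """Return an EX_GeographicBoundingBox element as an XML string."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be in decimal degrees")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must be in decimal degrees")
    box = ET.Element("{%s}EX_GeographicBoundingBox" % GMD)
    bounds = (
        ("westBoundLongitude", west),
        ("eastBoundLongitude", east),
        ("southBoundLatitude", south),
        ("northBoundLatitude", north),
    )
    for tag, value in bounds:
        bound = ET.SubElement(box, "{%s}%s" % (GMD, tag))
        ET.SubElement(bound, "{%s}Decimal" % GCO).text = repr(float(value))
    return ET.tostring(box, encoding="unicode")


# Approximate, illustrative extent only.
bbox_xml = geographic_bbox(-97.94, -97.56, 30.10, 30.52)
```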

(3) Perform CSW Insert Transaction

http_download_144432974465_6032.jpeg
At this point in the translation, a complete metadata XML record has been created and is ready to be inserted into the OGC catalog web service. This stage of the translation prepares the request, submits it to the server, and then examines the response to determine whether or not the request was successful.

(3.1) Create the Request

http_download_1444329744246_6032.jpeg
Preparing a CSW Insert Transaction is fairly simple. The previously prepared metadata record is simply wrapped in an XML CSW transaction envelope.
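A minimal sketch of the wrapping step is shown below, assuming CSW 2.0.2. The function name is hypothetical, and the well-formedness check at the end is an extra safeguard added for this sketch.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def wrap_in_insert(record_xml):
    """Wrap a metadata record in a CSW 2.0.2 Insert transaction envelope."""
    record_xml = record_xml.strip()
    # An XML declaration is only allowed at the very start of a document,
    # so drop it before embedding the record inside the envelope.
    if record_xml.startswith("<?xml"):
        record_xml = record_xml[record_xml.index("?>") + 2:].lstrip()
    envelope = (
        '<csw:Transaction service="CSW" version="2.0.2" xmlns:csw="%s">'
        "<csw:Insert>%s</csw:Insert>"
        "</csw:Transaction>" % (CSW_NS, record_xml)
    )
    ET.fromstring(envelope)  # raises ParseError if the result is malformed
    return envelope


request_xml = wrap_in_insert('<?xml version="1.0"?>\n<record id="1"/>')
```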

(3.2) Execute the Request

http_download_1444329744404_6032.jpeg
Using HTTP POST, the request is posted to the service provided by deegree.

Note: depending on how you have installed & configured deegree-csw, you may need to change the URL of the service endpoint in the HTTPUploader transformer.

The server response is tested to see if it contains the text "Exception", which indicates that the transaction failed. This test is needed because the web service returns an HTTP status of 200 regardless of whether the insert succeeded.
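Outside FME, the same post-and-test logic could look roughly like this. The function names are hypothetical, and the service URL is deliberately left as a parameter because it depends on the deegree installation.

```python
import urllib.request


def post_transaction(url, request_xml):
    """POST the transaction XML and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=request_xml.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def insert_failed(response_text):
    """Mirror the workspace test: any 'Exception' text means failure.

    The HTTP status is not consulted, since the service answers 200 for
    both outcomes.
    """
    return "Exception" in response_text
```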

(3.3) Log Successful Insert Details

http_download_1444329744562_6032.jpeg
Each successful transaction response contains the following items:

  1. The number of inserts, updates, and deletes that were processed. In our case, this is always 1 insert
  2. The file identifier
  3. The title of the record


Rather than logging each result, the SummaryLog accumulates all results and creates a concise summary log at the end of the translation.
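The accumulation can be sketched as follows. This is not the SummaryLog implementation; the function name is hypothetical, and the sample response is hand-written from the CSW 2.0.2 schema rather than captured from deegree.

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample following the CSW 2.0.2 TransactionResponse structure.
SAMPLE_RESPONSE = """\
<csw:TransactionResponse version="2.0.2"
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <csw:TransactionSummary>
    <csw:totalInserted>1</csw:totalInserted>
    <csw:totalUpdated>0</csw:totalUpdated>
    <csw:totalDeleted>0</csw:totalDeleted>
  </csw:TransactionSummary>
  <csw:InsertResult>
    <csw:BriefRecord>
      <dc:identifier>abc-123</dc:identifier>
      <dc:title>Parcels A</dc:title>
    </csw:BriefRecord>
  </csw:InsertResult>
</csw:TransactionResponse>
"""


def summarize(responses):
    """Add up the transaction counts and collect (identifier, title) pairs."""
    totals = {"inserted": 0, "updated": 0, "deleted": 0}
    records = []
    for text in responses:
        root = ET.fromstring(text)
        for key in totals:
            tag = ".//csw:total" + key.capitalize()  # e.g. csw:totalInserted
            totals[key] += int(root.findtext(tag, default="0", namespaces=NS))
        for brief in root.iterfind(".//csw:BriefRecord", NS):
            records.append((
                brief.findtext("dc:identifier", namespaces=NS),
                brief.findtext("dc:title", namespaces=NS),
            ))
    return totals, records


totals, records = summarize([SAMPLE_RESPONSE, SAMPLE_RESPONSE])
```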

(3.4) Handle Failure

http_download_1444329744726_6032.jpeg
Upon failure, deegree-csw will respond with an exception message that describes what caused the failure. Assuming the service is properly configured, the most common failure causes are either malformed XML, or non-conformant metadata records.
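To pull the failure details out of such a response, something along these lines could be used. It is a sketch: the function name is hypothetical, the sample report is hand-written in the OWS exception report style, and element names are matched by local name only because the OWS namespace version returned by a given server is an assumption.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of an OWS exception report.
SAMPLE_FAILURE = (
    '<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.0.0">'
    '<ows:Exception exceptionCode="InvalidParameterValue">'
    "<ows:ExceptionText>Record is not valid.</ows:ExceptionText>"
    "</ows:Exception>"
    "</ows:ExceptionReport>"
)


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def exception_messages(response_text):
    """Return a list of (exceptionCode, text) pairs found in the response."""
    root = ET.fromstring(response_text)
    messages = []
    for element in root.iter():
        if local_name(element) != "Exception":
            continue
        texts = [
            child.text.strip()
            for child in element
            if local_name(child) == "ExceptionText" and child.text
        ]
        messages.append((element.get("exceptionCode", "unknown"), " ".join(texts)))
    return messages


failures = exception_messages(SAMPLE_FAILURE)
```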
