Regular Expressions and FME

Debbi L

Introduction

A regular expression (commonly known as regex or regexp) is a sequence of characters that defines a search pattern for text. Regular expressions are often used to find or replace character combinations in strings, or to validate input.

Regular expressions can be used in a variety of FME transformers and functions. This article describes how to create regular expressions in FME Workbench, lists the transformers and FME functions that accept them, and walks through a few examples of where they are useful.


How do I create regular expressions in FME?

Regular expressions can be created using the Regular Expression editor in FME Workbench. The editor lets you compose a Perl Compatible Regular Expression (PCRE).

Enter the regular expression in the Regular Expression field and use the Test String field to test it. If the regular expression is syntactically invalid, the field is highlighted in red. If the regular expression matches any part of the test string, the matches are highlighted in yellow in the Test String field, and the number of matches is shown under 'Results'.

The Regular Expression editor in Workbench, where regular expressions can be composed and tested against a test string.

There is a basic guide to PCRE syntax in the Quick Reference section of the Regular Expression editor to help you get started. In most cases, matches are case-insensitive by default. If case sensitivity is required, enable the Case Sensitive checkbox under the Regular Expression field.
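As a rough illustration outside FME, the Python sketch below uses the standard re module as a stand-in for the editor. Python's engine is not PCRE2, but for a simple literal pattern like this one the behaviour is the same. One difference to keep in mind: Python matches case-sensitively unless told otherwise, so case-insensitive matching has to be requested explicitly with re.IGNORECASE.

```python
import re

address = "1234 MAIN Street"

# Case-sensitive search (Python's default): "Main" does not match "MAIN".
case_sensitive = re.search(r"Main", address)

# Case-insensitive search, comparable to leaving the Case Sensitive box unchecked.
case_insensitive = re.search(r"Main", address, flags=re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # MAIN
```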


Which transformers support regular expressions?

FME supports regular expressions in a number of transformers listed below. These transformers support the use of regular expressions in at least one parameter.

Note that in many of the listed transformers, regular expression support is conditional. For example, the StringReplacer's Text to Replace parameter only accepts a regular expression when the Mode parameter is set to 'Replace Regular Expression'.


Which FME functions support regular expressions?

FME also supports regular expressions in the following FME string functions.

  • @FindRegularExpression()
  • @ReplaceRegularExpression()
  • @SubstringRegularExpression()
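The purpose of each function can be loosely pictured with the Python sketch below. The helpers find_regex, replace_regex, and substring_regex are hypothetical stand-ins built on Python's re module, not FME's implementation. Their arguments and return conventions (for example, returning -1 when nothing is found, or replacing every match) are assumptions of this sketch, so check the FME documentation for the exact behaviour of each function.

```python
import re

# Hypothetical stand-ins for the three FME string functions. Names, arguments,
# and return conventions are assumptions made for illustration only.

def find_regex(text, pattern):
    """Position of the first match, or -1 if there is none (assumed convention)."""
    match = re.search(pattern, text)
    return match.start() if match else -1

def replace_regex(text, pattern, replacement):
    """Replace every match of pattern with replacement (assumed behaviour)."""
    return re.sub(pattern, replacement, text)

def substring_regex(text, pattern):
    """Return the first matching substring, or an empty string if none."""
    match = re.search(pattern, text)
    return match.group() if match else ""

address = "1234 10th Street"
print(find_regex(address, r"\d+th"))            # 5
print(replace_regex(address, r"Street", "St"))  # 1234 10th St
print(substring_regex(address, r"^\d+"))        # 1234
```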


Examples of common regular expressions

Extracting street numbers from addresses

Regex is commonly used to extract data from strings. In FME, data can be extracted with a StringSearcher, or with an AttributeCreator or AttributeManager using the @SubstringRegularExpression() FME string function.

For example, if street addresses are formatted as 1234 Main Street and you want to identify the street number (i.e. 1234),

^\d+


would be a good regular expression to use. It has three parts: ^ (start of line), \d (any digit, i.e. 0-9), and + (a quantifier that matches the preceding token one or more times).

Combined with the + quantifier, \d matches a run of one or more digits: 1 in 1 Main Street, 12 in 12 Main Street, 1234 in 1234 Main Street, and so on. The ^ anchor handles addresses on numbered streets (e.g. 1234 10th Street): without it, the pattern \d+ would return both 1234 and 10 as matches.

Regular Expression editor configured to use the regular expression ^\d+ to match the street number from street addresses.
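To see why the ^ anchor matters, the anchored and unanchored patterns can be compared in a short Python sketch. It uses Python's re module as a stand-in for FME's PCRE2-based engine; for these simple patterns the results should be the same. Without the MULTILINE flag, Python's ^ matches only at the start of the string, which is equivalent to the start of the line for a single-line address.

```python
import re

addresses = ["1 Main Street", "1234 Main Street", "1234 10th Street"]

# ^\d+ : one or more digits, anchored to the start of the string
anchored = {a: re.findall(r"^\d+", a) for a in addresses}

# \d+ : every run of one or more digits anywhere in the string
unanchored = {a: re.findall(r"\d+", a) for a in addresses}

for a in addresses:
    print(a, anchored[a], unanchored[a])
# 1 Main Street ['1'] ['1']
# 1234 Main Street ['1234'] ['1234']
# 1234 10th Street ['1234'] ['1234', '10']
```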


Postal code validation

Another common use case of regex is input validation. In FME, attributes can be validated with transformers such as the AttributeValidator, Tester, or TestFilter.

Take Canadian postal codes as an example. They are alphanumeric and follow the format A1A 1A1, where A is an uppercase letter and 1 is a digit, with a space separating the third and fourth characters.

With this information, you can construct a basic regular expression

^[A-Z]\d[A-Z] \d[A-Z]\d$ 

to check whether the input for a Canadian postal code field is formatted correctly. Broken down, this regular expression matches values that start with an uppercase letter, followed by a digit, an uppercase letter, a space, a digit, and an uppercase letter, and that end with a digit. The ^ and $ anchors require the whole value to follow this format, so values with extra characters before or after the postal code do not match. Because case matters here, enable the Case Sensitive checkbox under the Regular Expression field.

Regular Expression editor configured to use the regular expression ^[A-Z]\d[A-Z] \d[A-Z]\d$ to validate Canadian postal code input.

Note that this example is simplified: certain letters are not used in Canadian postal codes, so this regular expression is not recommended for use in a production environment.
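The validation logic can be sketched in Python as follows. Python's re module stands in for FME's engine here and is case-sensitive by default, which corresponds to enabling the Case Sensitive checkbox. The STRICT pattern is an illustrative assumption based on Canada Post's commonly documented letter rules (D, F, I, O, Q, and U are not used, and W and Z do not appear as the first letter); verify it against Canada Post's current specification before relying on it. The helper is_valid is hypothetical.

```python
import re

# Basic pattern from this article. Python's re is case-sensitive by default,
# matching the behaviour of the Case Sensitive checkbox being enabled.
BASIC = re.compile(r"^[A-Z]\d[A-Z] \d[A-Z]\d$")

# Stricter variant (assumption): excludes D, F, I, O, Q, U everywhere,
# and also excludes W and Z as the first letter.
STRICT = re.compile(r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] \d[ABCEGHJ-NPRSTV-Z]\d$")

def is_valid(code, pattern):
    """Hypothetical helper: True if the whole value matches the pattern."""
    return pattern.match(code) is not None

for code in ["K1A 0B1", "k1a 0b1", "K1A0B1", "D1A 1A1", "W1A 1A1"]:
    print(code, is_valid(code, BASIC), is_valid(code, STRICT))
```

Only the stricter pattern rejects values such as D1A 1A1, which the basic pattern accepts because it allows any uppercase letter.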


AI Assist for regular expressions

Artificial Intelligence (AI) Assist was released as a Tech Preview in FME Form 2023.1. Tech Preview features may not work fully, may change without notice, and can be removed from a build of FME at any time. See this article for more details on the Tech Preview label.

Starting in FME Form 2023.1, AI Assist is available in the Regular Expression editor to help you create search patterns for your use case. Click the AI Assist button at the bottom of the Regular Expression editor dialog to open the AI Assist dialog.

Artificial Intelligence (AI) Assist is available as Tech Preview in the Regular Expression editor in FME Form 2023.1.

Type a prompt in English in the Regular Expression Description field and click Generate. The AI service then attempts to generate a regular expression search pattern from that description. An explanation of the generated regular expression and sample test strings are also provided in the Explanation and Test String fields, respectively. You can also enter your own test strings to test the regular expression.


Create a regular expression with AI Assist by typing a description of the regular expression and selecting Generate. Optionally, enter your own test strings to confirm the regex works as expected, then click Apply to apply the generated regular expression to the Regular Expression editor.


If the generated regular expression does not match your test cases, continue refining the prompt. Once you are satisfied with the generated match pattern, select Apply to apply it to the Regular Expression editor dialog.


Which regular expression library does FME use?

Many applications and languages support regular expressions, and each may have its own implementation, leading to differences in syntax and supported features.

FME uses Qt's regular expression implementation, which is based on the Perl Compatible Regular Expressions, version 2 (PCRE2) library. A regular expression written for another application may not work in FME if that application uses a different implementation.
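A small Python example shows how a pattern written for one engine can fail in another. The \h escape (horizontal whitespace) is PCRE2 syntax, so it should be accepted by a PCRE2-based engine, but Python's re module does not define it and raises re.error when compiling it (Python 3.7 and later). The character class [ \t] is used here as a narrower, portable approximation that only covers spaces and tabs.

```python
import re

# \h is valid PCRE2 syntax but is not defined by Python's re module.
pcre_pattern = r"\d+\h+Main"

try:
    re.compile(pcre_pattern)
    compiled = True
except re.error as exc:
    compiled = False
    print(f"Python rejected the pattern: {exc}")

# Portable approximation: an explicit class of spaces and tabs.
portable = re.compile(r"\d+[ \t]+Main")
print(bool(portable.search("1234 Main Street")))  # True
```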


Additional Resources

Extracting Text and Tabular Data from PDF
Using the Directory and File Pathnames Reader | Record File Metadata
Anonymizing Crime Data with FME
How to Use the SchemaScanner Transformer
Validate your Data's Attributes with the AttributeValidator Transformer
Attribute Processing Example Workspace (CSV to MapInfo)
