Anonymizing Crime Data with FME

Liz Sanderson

FME Version

  • FME 2016.x

Introduction

Anonymizing data, or removing any information that makes an individual personally identifiable, can be an essential step before sharing data with the public. Anonymization is particularly useful for police departments that publish up-to-date crime information but must withhold certain details to protect privacy.

This exercise will demonstrate one of the approaches that can be used to anonymize data, and will describe a workflow that includes:

  • Removing attribute information (e.g. individual names).
  • Testing for “block list” words, or words that should be obscured before being released to the public.
  • Anonymizing addresses/crime incident locations to display only numbers at the 100-block level.
  • Generalizing crime locations using a “block mapping” approach, where incident locations are moved to the midpoint of the street segment on which they occur.

The workflow writes the output to two KML files. The first file (Anonymized.kml) is fully anonymized and intended for public use and distribution. The second file (Original.kml) is intended for internal use and keeps all the original attribute information, but still moves incident locations to the midpoint of the street segment where they occurred.

 

Step-by-Step Instructions

We will begin our exercise by first constructing/simulating a Crime Dataset, which is based on Vancouver Postal Address information. If you are already working with a complete Crime Dataset, steps 1 – 3 can be omitted.

 

Simulate Crime Locations and Incident Type

(1) Add a reader to the canvas, specifying the ESRI Geodatabase (File Geodb Open API) as the format, and the “Addresses.gdb” Dataset. Click on the Parameters button, and select “PostalAddress” from the Table List.

(2) We will use a subset of addresses to simulate the locations of crime events in Vancouver. Connect a Sampler transformer to the ESRI Geodatabase Reader. Set the Sampling Rate (N): to ‘10’, and Sampling Type: to ‘Every Nth Feature’. This will keep every 10th address, and discard the remaining addresses. Note that this is a systematic sample rather than a random one.
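For readers who want to see the sampling logic outside of FME, here is a minimal Python sketch of ‘Every Nth Feature’ sampling. The function name and the sample address list are hypothetical, and the sketch assumes the Sampler keeps the 10th, 20th, 30th (and so on) feature; it illustrates the idea and is not a description of FME's internals.

```python
from itertools import islice

def sample_every_nth(features, n):
    """Keep the n-th, 2n-th, 3n-th ... feature and drop the rest."""
    # islice starts at index n - 1 (the n-th item) and then steps by n.
    return list(islice(features, n - 1, None, n))

# Hypothetical stand-in for the PostalAddress features.
addresses = [f"address_{i}" for i in range(1, 31)]
print(sample_every_nth(addresses, 10))
# ['address_10', 'address_20', 'address_30']
```

The selection depends only on feature order, so running it twice on the same input returns the same subset.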

(3) Now that we have simulated locations for our crimes, we will need to include information for the crime incident type. Add a CSV reader to the canvas, and select the “CRIMINAL_INCIDENT.csv” file. This CSV file will act as our lookup table, and contains 17 different crime incident types. Next, connect a RandomNumberGenerator transformer to the Sampler (Sampled output port), and set its:

  • Minimum Value = 1
  • Maximum Value = 17
  • Result Attribute to DESC_LINK

Add a FeatureMerger transformer, and connect its supplier input port to the CSV reader, and its requestor input port to the RandomNumberGenerator. Set the parameters of the FeatureMerger as:

  • Requestor = ‘DESC_LINK’
  • Supplier = ‘INCIDENT_ID’
  • Comparison Mode = Automatic

Ensure that the Feature Merge Type is set to ‘Attributes Only’. The result will be a randomly selected crime incident type, as defined in the CSV file, appended to our simulated crime locations.
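The RandomNumberGenerator and FeatureMerger combination in step 3 can be sketched in Python as a random key followed by a lookup join. This is only an illustration: the lookup dictionary stands in for CRIMINAL_INCIDENT.csv with made-up descriptions, the function name is hypothetical, and the sketch assumes the generator is configured to produce whole numbers.

```python
import random

# Hypothetical stand-in for CRIMINAL_INCIDENT.csv: INCIDENT_ID -> INCIDENT_DESC.
incident_lookup = {i: f"INCIDENT TYPE {i}" for i in range(1, 18)}

def assign_incident(feature, lookup, rng):
    """Attach a random DESC_LINK, then merge in the matching description."""
    merged = dict(feature)                      # leave the input untouched
    merged["DESC_LINK"] = rng.randint(1, 17)    # inclusive on both ends
    # 'Attributes Only' merge: copy attributes, keep the original location.
    merged["INCIDENT_DESC"] = lookup[merged["DESC_LINK"]]
    return merged

rng = random.Random(42)  # seeded so the sketch is repeatable
print(assign_incident({"PSTLADDRESS": "1234 MAIN ST"}, incident_lookup, rng))
```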

 

simulated-crime.jpg

 

Modify Attribute Information

(4) We will remove information that should be kept private, as well as other unnecessary attribute information, from our dataset by using an AttributeManager transformer. Connect an AttributeManager transformer to the FeatureMerger (Merged output port), and remove attributes, keeping only:

  • OWNERNM1
  • OWNERNM2
  • PSTLADDRESS
  • PSTLCITY
  • INCIDENT_DESC

In addition, create 2 new attributes:

  • Output Attribute = ORIG_PSTLADDRESS, Attribute Value = PSTLADDRESS
  • Output Attribute = ORIG_INCIDENT_DESC, Attribute Value = INCIDENT_DESC

 

Remove “Block List” Words

(5) Our next step will be to mask “block list” words. For our example, we will replace all assault types (e.g. aggravated assault, simple assault, sexual assault) with the generic term “Assault”. Connect a StringReplacer transformer to the AttributeManager. Open its parameters, and set:

  • Attributes: INCIDENT_DESC
  • Text to Match:
AGGRAVATED ASSAULT|SIMPLE ASSAULT|SEXUAL ASSAULT
  • Replacement Text: Assault
  • Use Regular Expression: Yes
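The same alternation pattern can be tried out in Python before committing it to the workspace. The sketch below is an illustration only; the function name is hypothetical, and it assumes the incident descriptions are stored in upper case, as the pattern suggests.

```python
import re

# Same alternation as the StringReplacer's Text to Match parameter.
ASSAULT_PATTERN = re.compile(r"AGGRAVATED ASSAULT|SIMPLE ASSAULT|SEXUAL ASSAULT")

def mask_incident(description):
    """Replace any specific assault type with the generic term."""
    return ASSAULT_PATTERN.sub("Assault", description)

print(mask_incident("AGGRAVATED ASSAULT"))   # Assault
print(mask_incident("THEFT FROM VEHICLE"))   # THEFT FROM VEHICLE
```

Descriptions that do not match any alternative pass through unchanged, which is the behaviour the workflow relies on for all other incident types.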

 

Anonymize Crime Incident Locations to 100-block Addresses

(6) The first step in anonymizing addresses will be extracting the house number from the PSTLADDRESS attribute, and writing it to a new attribute (“_first_match”) by using a StringSearcher transformer. Connect a StringSearcher to the StringReplacer. Set the StringSearcher parameters:

  • Search In: PSTLADDRESS
  • Contains Regular Expression:
([^\s]+)  
  • Matched Result Attribute: _first_match
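To see what the expression in step 6 captures, it can be tested in Python. This is a small sketch with a hypothetical function name and made-up sample addresses; it assumes the house number is the first whitespace-delimited token of the address.

```python
import re

def first_token(address):
    """Return the first run of non-whitespace characters, or None."""
    match = re.search(r"([^\s]+)", address)
    return match.group(1) if match else None

print(first_token("1234 MAIN ST"))  # 1234
print(first_token("567 OAK AVE"))   # 567
```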

(7) The bulk of the work in anonymizing addresses makes use of an AttributeManager transformer with conditional statements. The conditional statements first test the length of the house number, then replace its trailing digits with 0’s, and then write this value to an attribute named “Address_Anon”.

Connect the AttributeManager to the StringSearcher (Matched output port), open its parameters, and add a new Output Attribute named “Address_Anon”. For the Attribute Value, click the arrow and use the ‘Conditional Value…’ editor. Complete the Condition Statements as below:

Left Value                            Operator   Right Value   Output Value
@StringLength(@Value(_first_match))   =          4             @Left(@Value(_first_match),2)00
@StringLength(@Value(_first_match))   =          3             @Left(@Value(_first_match),1)00
@StringLength(@Value(_first_match))   =          2             0
@StringLength(@Value(_first_match))   =          1             0

When finished, your final Condition Statement screen should appear as below:

conditional-statementtxt.jpg

(8) Next we will update the original address location with information generalized at the block level by using a StringReplacer transformer. Connect a StringReplacer to the AttributeManager. Open the StringReplacer parameters, and set:

  • Attributes: PSTLADDRESS
  • Text to Match: _first_match
  • Replacement Text: Address_Anon

This will overwrite the original address values with the 100-block level addresses.
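Steps 7 and 8 can be sketched together in Python as a length test followed by a string replacement. The function names and sample addresses are hypothetical. Two choices here are assumptions rather than part of the workflow: house numbers of other lengths (for example five digits) are left unchanged, and only the first occurrence of the house number is replaced.

```python
def to_block_number(house_number):
    """Apply the length-based rules from the conditional statements."""
    length = len(house_number)
    if length == 4:
        return house_number[:2] + "00"   # 1234 -> 1200
    if length == 3:
        return house_number[:1] + "00"   # 567 -> 500
    if length in (1, 2):
        return "0"                       # 45 -> 0
    return house_number  # assumption: other lengths are not covered, so keep as-is

def anonymize_address(address):
    """Swap the leading house number for its 100-block value."""
    parts = address.split()
    if not parts:
        return address
    first = parts[0]
    return address.replace(first, to_block_number(first), 1)

print(anonymize_address("1234 MAIN ST"))  # 1200 MAIN ST
print(anonymize_address("567 OAK AVE"))   # 500 OAK AVE
print(anonymize_address("45 ELM ST"))     # 0 ELM ST
```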

 

Plot Incident Locations to the Midpoint of the Street Segments

We can now use our updated address information to reference incident locations to road segments, and move crime locations to the midpoint of the street segments.

(9) Add an AutoCAD DWG/DXF reader to the canvas, and select the Roads dataset. In the Add Reader dialog, before clicking OK, set the Workflow Option to ‘single merged feature type’, then open the reader parameters and set Group entities by: ‘Attribute Schema’. A new Roads feature type named <All> will be added to the canvas.

 

Special Cases in Address Matching – Street Ranges

Some of the roads in Vancouver do not have a single 100-block address, but instead use a range of addresses that share the same road segment (e.g. ‘1300-1400 Laburnum St’). Before we can match our crime incidents to street segments, we must create a road segment for each 100-block (e.g. ‘1300 Laburnum St’ and ‘1400 Laburnum St’). We will make use of StringSearcher and StringReplacer transformers to modify the address information on road segments in order to match addresses.

(10) Connect a StringSearcher transformer to the AutoCAD reader. Open its parameters and set:

  • Search In: From_HBlock
  • Contains Regular Expression:
\-
  • Matched Result Attribute: _first_match

(11) Add 2 StringReplacers, and connect them to the Matched output port of the StringSearcher.

The first StringReplacer transformer will be used to keep the first address in the range, discard the second address, and append this information to the road segment. Set the parameters of the first StringReplacer transformer:

  • Attributes: HBlock
  • Text to Match:
(\-)([^\s]+)
  • Use Regular Expressions: Yes

The second StringReplacer will be used to keep the second address in the range, discard the first address, and append this information to the road segment. Set the parameters of the second StringReplacer transformer:

  • Attributes: HBlock
  • Text to Match:
([^\s]+)(\-)
  • Use Regular Expressions: Yes
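The effect of the two expressions in step 11 can be checked in Python. The sketch below is illustrative only: the function name is hypothetical, and it assumes the Replacement Text of both StringReplacers is left empty, so that the matched part of the range is simply deleted.

```python
import re

def split_block_range(hblock):
    """Turn a ranged block name into one name per 100-block."""
    if "-" not in hblock:
        return [hblock]  # not a range: corresponds to the NotMatched port
    # First replacer: drop the hyphen and the second number.
    first = re.sub(r"(\-)([^\s]+)", "", hblock)
    # Second replacer: drop the first number and the hyphen.
    second = re.sub(r"([^\s]+)(\-)", "", hblock)
    return [first, second]

print(split_block_range("1300-1400 Laburnum St"))
# ['1300 Laburnum St', '1400 Laburnum St']
```

Both results carry the same geometry in the workspace, since each StringReplacer receives its own copy of the road segment.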

 

Create Block Mid-Points

(12) Add a FeatureMerger transformer, and connect its Requestor input port to the StringReplacer transformer output port from step #8. Connect the FeatureMerger’s Supplier port to the StringReplacer transformers from step #11, and the StringSearcher NotMatched output port from step #10. The FeatureMerger connections should appear as below:

featuremerger.jpg

Open the FeatureMerger transformer parameters and set the Join On values to:

  • Requestor = PSTLADDRESS
  • Supplier = HBlock
  • Comparison Mode = Automatic
  • Merge Parameters, Feature Merge Type = Attributes and Geometry

(13) Add a CenterPointReplacer transformer to the canvas, and connect it to the FeatureMerger Merged output port. Open its parameters, and set its Mode = Center Point. This will create a new center point for each road segment where a crime incident has taken place.

centerpointreplacer.jpg
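Steps 12 and 13 amount to a key-based join followed by a geometry replacement, which can be sketched in Python as follows. The road dictionary, coordinates, and function names are hypothetical. The sketch also assumes that ‘Center Point’ means the center of the segment's bounding box, which coincides with the midpoint for a straight segment; check the CenterPointReplacer documentation for your FME version before relying on this.

```python
def center_point(coords):
    """Center of the bounding box of a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

# Hypothetical road segments keyed by HBlock.
roads = {"1200 MAIN ST": [(0.0, 0.0), (100.0, 0.0)]}

def locate_incident(incident, roads):
    """Join on PSTLADDRESS = HBlock, then move the incident to the center."""
    line = roads.get(incident["PSTLADDRESS"])
    if line is None:
        return None  # no matching road segment (unmerged)
    located = dict(incident)
    located["geometry"] = center_point(line)
    return located

print(locate_incident({"PSTLADDRESS": "1200 MAIN ST"}, roads))
# {'PSTLADDRESS': '1200 MAIN ST', 'geometry': (50.0, 0.0)}
```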

 

Writing the Output

Our final steps will involve branching our workspace so that a KML file is created for public use that contains anonymized data, and a file is created for internal use that includes all the original attribute information. We will also modify the appearance of the final output files through the use of the KMLStyler transformer.

(14) Public Use and Distribution

Add a KMLStyler transformer to the canvas, and connect it to the CenterPointReplacer. Change the icon (i.e. Name: gme/gx_placemark_circle_highlight), and set the Label Style Scale = 0.

Add a Google KML format Writer. Before adding it to the canvas, set the Feature Type Definition = Manual, and give the Feature Type Name = Anonymized. Connect it to the KMLStyler. Open the Writer’s parameters, and from the User Attributes tab add the following attributes:

  • PSTLADDRESS
  • PSTLCITY
  • INCIDENT_DESC

anonymized-data.jpg

 

(15) Internal Use

Add a KMLStyler transformer to the canvas, and connect it to the CenterPointReplacer. Change the icon (i.e. Name: gme/gx_placemark_circle), and set the Label Style Scale = 0.

Add a Google KML format Writer. Before adding it to the canvas, set the Feature Type Definition = Manual, and give the Feature Type Name = Original. Connect it to the KMLStyler. Open the Writer’s parameters, and from the User Attributes tab add the following attributes:

  • ORIG_PSTLADDRESS
  • PSTLCITY
  • ORIG_INCIDENT_DESC
  • OWNERNM1
  • OWNERNM2
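The two writers differ only in which attributes they expose, which can be pictured in Python as two projections of the same feature. This is a sketch for illustration: the function name and the sample record values are made up, and only the attribute names come from the steps above.

```python
PUBLIC_ATTRS = ("PSTLADDRESS", "PSTLCITY", "INCIDENT_DESC")
INTERNAL_ATTRS = ("ORIG_PSTLADDRESS", "PSTLCITY", "ORIG_INCIDENT_DESC",
                  "OWNERNM1", "OWNERNM2")

def project(feature, attrs):
    """Keep only the listed attributes, as a writer's User Attributes do."""
    return {name: feature[name] for name in attrs if name in feature}

# Hypothetical feature as it leaves the CenterPointReplacer.
feature = {
    "OWNERNM1": "OWNER A", "OWNERNM2": "OWNER B",
    "PSTLADDRESS": "1200 MAIN ST", "ORIG_PSTLADDRESS": "1234 MAIN ST",
    "PSTLCITY": "VANCOUVER",
    "INCIDENT_DESC": "Assault", "ORIG_INCIDENT_DESC": "SIMPLE ASSAULT",
}

print(project(feature, PUBLIC_ATTRS))    # Anonymized.kml attributes
print(project(feature, INTERNAL_ATTRS))  # Original.kml attributes
```

Because the public projection never lists the owner names or the ORIG_ attributes, they cannot reach Anonymized.kml even though they are still present on the feature.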

(16) Run the workspace, and inspect the two resulting KML files.

internal-use.jpg

 

Sharing Encrypted Data Using the AttributeCompressor Transformer (Optional)

FME makes it possible to encrypt your data before sharing it through the use of the AttributeCompressor transformer. By selecting the attributes to encrypt, supplying a password, and selecting Encryption Type AES-256, you can apply encryption to the specified information in the output file. Once encrypted, the information can be decrypted by using the AttributeDecompressor transformer and the original password.
