Using the Directory and File Pathnames Reader | Record File Metadata

Liz Sanderson

FME Version

  • FME 2020.2

Introduction

The Directory and File Pathnames reader is used when you want to manipulate the file name or directory of particular datasets or folders. This reader is often used in conjunction with the FilePartExtractor transformer or the File Copy writer. 

In this scenario, you have a folder containing one subfolder for each department in your company, and each subfolder contains various files and documents. This folder is set to be archived, but you want to record which files it contains and which department each file belongs to, for quick reference should anyone require them in the future.

 

Step-by-step Instructions

1. Add a Directory and File Pathnames Reader
In a blank workspace, add a Directory and File Pathnames reader to the canvas. For the Dataset, browse to the Archive folder, which can be downloaded from the Files section in the upper-right corner of this article.


In the parameters, set Recurse into Subfolders to Yes. This setting allows the reader to see all the files, even if they are in subfolders. Additionally, set Retrieve File Properties to Yes; since we are going to create a file containing metadata, this setting is important.
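For readers who think in code, the effect of these two parameters can be approximated outside FME. The following is a minimal Python sketch, not the reader's actual implementation: `list_paths` is a hypothetical helper, the folder and file names are made up, and the dictionary keys only borrow attribute names that appear later in this article.

```python
import os
import tempfile
from pathlib import Path


def list_paths(root, recurse=True):
    """Return one record per file or folder found under root."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames) + sorted(filenames):
            full = Path(dirpath) / name
            info = full.stat()
            records.append({
                "path_unix": full.as_posix(),
                "path_type": "directory" if full.is_dir() else "file",
                # Rough stand-ins for the extra file properties; the values
                # here are approximations, not what FME would report.
                "path_readonly": not os.access(full, os.W_OK),
                "path_accessed_date": info.st_atime,  # epoch seconds
            })
        if not recurse:
            break  # stop after the top-level folder
    return records


# Build a tiny stand-in for the archive folder (hypothetical names).
with tempfile.TemporaryDirectory() as archive:
    (Path(archive) / "Finance").mkdir()
    (Path(archive) / "Finance" / "budget.txt").write_text("demo")
    deep = list_paths(archive, recurse=True)
    shallow = list_paths(archive, recurse=False)

print(len(deep), len(shallow))
```

With recursion on, both the Finance folder and the file inside it are listed; with recursion off, only the top-level folder is seen.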


2. Test for Path Type
We are only interested in recording the metadata for the files, not for the folders that contain them. To separate the two, we can use a Tester. Add a Tester to the canvas and connect it to the PATH reader feature type. In the parameters, set the test to:
path_type = file
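Conceptually, the Tester splits the incoming features into two groups based on this test. A minimal Python sketch of the same split, using made-up sample records and the `path_type` and `path_windows` attribute names from this article:

```python
# Hypothetical sample features; the paths are invented for illustration.
features = [
    {"path_windows": r"C:\Archive\Finance", "path_type": "directory"},
    {"path_windows": r"C:\Archive\Finance\budget.xlsx", "path_type": "file"},
    {"path_windows": r"C:\Archive\Planning\zoning.pdf", "path_type": "file"},
]

# Features that satisfy the test go to one list, the rest to another,
# much like the Passed and Failed output ports.
passed = [f for f in features if f["path_type"] == "file"]
failed = [f for f in features if f["path_type"] != "file"]

print(len(passed), len(failed))
```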



3. Use the StringSearcher to Find Folder
Since our folders are structured by the department, we can determine who the files belong to by extracting the folder name out of the file path. We will need to use a Regular Expression to do this. 

Add a StringSearcher to the canvas and connect it to the Passed output port on the Tester. In the parameters, set the Search In value to path_windows, then set the Contains Regular Expression to: 
(\w+)(?=\\[^\\]+$)
To learn more about this expression, see the explanation at RegEx101.com
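To see what the expression captures, it can be tried in Python's `re` module, which supports the same lookahead syntax. This is a sketch under assumptions: the sample paths are invented, and `find_department` is a hypothetical helper, not part of FME.

```python
import re

# One or more word characters, followed (without consuming it) by a
# backslash and a final path segment that contains no further backslashes.
DEPARTMENT = re.compile(r"(\w+)(?=\\[^\\]+$)")


def find_department(path_windows):
    """Return the name of the folder that directly contains the file."""
    match = DEPARTMENT.search(path_windows)
    return match.group(0) if match else None


finance = find_department(r"C:\Archive\Finance\budget.xlsx")
spaced = find_department(r"C:\Archive\Human Resources\policy.docx")
no_folder = find_department(r"C:\budget.xlsx")

print(finance, spaced, no_folder)
```

Note that `\w` does not match spaces or hyphens, so a folder named `Human Resources` would yield only `Resources`. If your department folders contain such characters, a broader pattern such as `([^\\]+)(?=\\[^\\]+$)` may be a better fit.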

Finally, change the Matched Result Attribute to department, then click OK. 


4. Bulk Rename Attributes
Before we can write out the metadata file, we need to clean up the attributes. Since most of the attributes have the prefix path_, we can use the BulkAttributeRenamer to remove it quickly. Add a BulkAttributeRenamer to the canvas and connect it to the Matched output port on the StringSearcher. In the parameters, set the Action to Remove Prefix String and the String to path_.
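The effect of removing the prefix can be pictured as a renaming of dictionary keys. In this minimal Python sketch, `remove_prefix` is a hypothetical helper and the sample feature is invented; attributes without the prefix, such as department, are left unchanged.

```python
def remove_prefix(feature, prefix="path_"):
    """Return a copy of feature with the prefix stripped from matching names."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in feature.items()
    }


# Hypothetical feature as it might look after the StringSearcher.
feature = {
    "path_windows": r"C:\Archive\Finance\budget.xlsx",
    "path_type": "file",
    "department": "Finance",
}
renamed = remove_prefix(feature)

print(sorted(renamed))
```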



5. Rename Attributes
Now that we have removed path_ from all the attributes, we can use the AttributeManager to modify the remainder of the attributes. Add an AttributeManager to the canvas and connect it to the BulkAttributeRenamer. In the parameters, set the following:

Remove: 
  • unix
  • ownername
  • readonly
  • accessed_date
  • directory_unix
  • directory_windows
  • type

Rename:
  • windows to path
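The remove and rename settings above can be sketched in Python as a single pass over each feature. `manage_attributes` is a hypothetical helper, and the sample feature contains only a few invented values for illustration.

```python
# Attribute names taken from the Remove and Rename lists above.
REMOVE = {
    "unix", "ownername", "readonly", "accessed_date",
    "directory_unix", "directory_windows", "type",
}
RENAME = {"windows": "path"}


def manage_attributes(feature):
    """Drop the unwanted attributes and rename the ones listed in RENAME."""
    return {
        RENAME.get(name, name): value
        for name, value in feature.items()
        if name not in REMOVE
    }


# Hypothetical feature after the path_ prefix has been removed.
feature = {
    "windows": r"C:\Archive\Finance\budget.xlsx",
    "unix": "C:/Archive/Finance/budget.xlsx",
    "type": "file",
    "readonly": "no",
    "department": "Finance",
}
cleaned = manage_attributes(feature)

print(cleaned)
```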



6. Write to Excel
With our data cleaned up, we can now write the metadata to Microsoft Excel. Add a Microsoft Excel writer to the canvas. For the Dataset, browse to a location to save the file, making sure it is outside the Archive folder, and name the file DepartmentFileMetadata.xlsx. Change the Sheet Definition to Automatic and click OK.


In the Feature Type Definition dialog, change the Sheet Name to Metadata and click OK. Then connect the Metadata writer feature type to the AttributeManager. For more information about working with Microsoft Excel, see Tutorial: Getting Started with Excel.
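As a rough analogy for what happens when the columns are taken from the incoming attributes, the sketch below derives the column list from the records themselves before writing them out. It deliberately swaps in CSV for Excel, because Python's standard library cannot write .xlsx files; the sample rows are invented, and this is not how the FME writer is implemented.

```python
import csv
import io

# Hypothetical cleaned-up records from the previous step.
rows = [
    {"path": r"C:\Archive\Finance\budget.xlsx", "department": "Finance"},
    {"path": r"C:\Archive\Planning\zoning.pdf", "department": "Planning"},
]

# Derive the column list from the records, in first-seen order,
# rather than declaring it by hand.
columns = []
for row in rows:
    for name in row:
        if name not in columns:
            columns.append(name)

# Write a header row followed by one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
output = buffer.getvalue()

print(output)
```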


7. Run the Workspace and View Output
Run the workspace and then view the output in either Microsoft Excel or Visual Preview. 

 

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.

 
