Basic Statistical Custom Transformers

Liz Sanderson

FME Version

  • FME 2017.x

Introduction

The power of FME is being able to take data from multiple sources and manipulate it efficiently. So why not use FME for data science?

We’ve recently added a series of transformers to the FME Hub that perform a few basic statistical tests using the RCaller or the PythonCaller.

If you don't see the statistical test you are looking for in this list, you can create your own and upload it to the FME Hub to share with other users. Alternatively, create a new Idea; if it gets enough votes, we will add it to the list.

Learning

Perform a Shapiro-Wilks Statistical Test using R or Python

Learn how to create a custom transformer using either R or Python to perform the Shapiro-Wilk test, which checks whether a sample is consistent with a normal distribution. The same workflow can be adapted for other statistical tests available in R or Python.

Transformers

Each transformer listed has a link to its FME Hub page as well as a test workspace download. Due to the external software requirements for R, these test workspaces could not be uploaded to the Hub. Each of the R transformers requires R to be installed on the user's machine, along with the sqldf R package. The Python transformers require the SciPy Python package to be installed.

Correlation

Correlation measures the strength and direction of the association between two variables.

RCorrelationCalculator

Uses R to calculate whether there is an association between two variables.

RCorrelation-TestWorkspace.fmwt
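As a rough illustration of what a correlation coefficient captures, here is a minimal Python sketch of Pearson's r using only the standard library. It is not the code inside RCorrelationCalculator, which runs R, and Pearson's r is only one of several correlation measures. The function name pearson_r and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its own mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical attribute values, one pair per feature
elevation = [100, 250, 400, 550, 700]
temperature = [18.0, 16.9, 16.1, 15.0, 14.2]

r = pearson_r(elevation, temperature)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 indicates little or no linear association.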


Cluster Analysis

Cluster analysis is a method for dividing data into groups so that the members of each group are more similar to one another than to members of other groups.

RClusterCalculator

Uses R to group similar data using one of three algorithms. This transformer works only with FME 2018.0 and later.

RClusterCalculator-TestWorkspace.fmwt
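This article does not name the three algorithms, so the following is only an illustrative sketch of one widely used clustering method, k-means, written in plain Python. It is not taken from RClusterCalculator. The function k_means, the starting centroids, and the sample points are all assumptions made for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group 2D points around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if no point was assigned to it
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical coordinates forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumed starting centroids: one point taken from each end of the list
labels, centroids = k_means(points, [points[0], points[3]])
print(labels)
print(centroids)
```

Each entry in labels is the index of the group assigned to the corresponding point. Real data usually needs a more careful choice of starting centroids and of the number of groups.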


Shapiro-Wilk Test

The Shapiro-Wilk test checks the null hypothesis that a random sample of data comes from a normal distribution. A small p-value is evidence against normality.

RShapiroWilksCalculator

Using R and the RCaller, this transformer applies the Shapiro-Wilk test to calculate whether a random sample of data comes from a normal distribution.

RShapiroWilks-TestWorkspace.fmwt

PyShapiroWilksCalculator

Using SciPy and the PythonCaller, this transformer applies the Shapiro-Wilk test to calculate whether a random sample of data comes from a normal distribution.

PyShapiroWilks-TestWorkspace.fmwt
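The following is a minimal sketch of the kind of SciPy call such a transformer could wrap, assuming SciPy is installed. scipy.stats.shapiro is an existing SciPy function, but the sample values and the 0.05 significance level are assumptions. Inside FME the values would be collected from feature attributes in the PythonCaller rather than hard-coded.

```python
from scipy import stats

# Hypothetical sample, e.g. one numeric attribute collected from all features
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.6, 4.7, 5.2, 5.4, 5.0, 4.6, 5.5]

# shapiro returns the W statistic and the p-value
w_statistic, p_value = stats.shapiro(sample)

alpha = 0.05  # assumed significance level
if p_value < alpha:
    verdict = "reject normality"
else:
    verdict = "no evidence against normality"

print(f"W = {w_statistic:.4f}, p = {p_value:.4f}: {verdict}")
```

A p-value above the chosen level does not prove the data are normal; it only means the test found no significant departure from normality.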


T-Test

A t-test is a statistical test of whether a sample mean differs significantly from a specified value, or whether the means of two samples differ significantly from each other.

ROneSampleTTestCalculator

The one-sample t-test tests the null hypothesis that the population mean is equal to a specified value. In other words, it tells you whether the mean of your sample differs from that value by more than random chance would explain. This test outputs the t-value, p-value, confidence interval and the estimate.

ROneSampleTTest-TestWorkspace.fmwt
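For readers who want to see the four outputs computed, here is a minimal sketch in Python with SciPy. The transformer itself runs R, so this is an equivalent calculation rather than its actual code. The sample values, the hypothesised mean of 5.0, and the 95% confidence level are assumptions.

```python
import math
from statistics import mean, stdev

from scipy import stats

# Hypothetical measurements and the value to test the mean against
sample = [5.2, 4.9, 5.4, 5.1, 5.3, 5.0, 5.5, 5.2]
hypothesised_mean = 5.0
confidence = 0.95  # assumed confidence level

# t-value and two-sided p-value
t_value, p_value = stats.ttest_1samp(sample, popmean=hypothesised_mean)

# Estimate (the sample mean) and its confidence interval from the t distribution
estimate = mean(sample)
standard_error = stdev(sample) / math.sqrt(len(sample))
degrees_of_freedom = len(sample) - 1
ci_low, ci_high = stats.t.interval(
    confidence, degrees_of_freedom, loc=estimate, scale=standard_error
)

print(f"t = {t_value:.3f}, p = {p_value:.4f}")
print(f"estimate = {estimate:.3f}, CI = ({ci_low:.3f}, {ci_high:.3f})")
```

If the confidence interval excludes the hypothesised mean, the two-sided test at the matching significance level rejects the null hypothesis.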


RTwoSampleTTestCalculator

The two-sample t-test compares the means of two groups to determine whether the difference between them is statistically significant or could be due to random chance. This test outputs the t-value, p-value, confidence interval and the estimate.

RTwoTTest-TestWorkspace.fmwt
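A comparable calculation can be sketched in Python with SciPy. The example uses Welch's unequal-variance form, which is the default of R's t.test function; whether the transformer keeps that default is an assumption, as are the two groups of sample values.

```python
from statistics import mean

from scipy import stats

# Hypothetical measurements for two independent groups
group_a = [12.1, 11.8, 12.5, 12.0, 12.3, 11.9]
group_b = [13.0, 12.8, 13.4, 12.9, 13.1, 13.3]

# equal_var=False selects Welch's t-test, which does not assume equal variances
t_value, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Estimates: the two group means and their difference
estimate_a = mean(group_a)
estimate_b = mean(group_b)
mean_difference = estimate_a - estimate_b

print(f"t = {t_value:.3f}, p = {p_value:.6f}")
print(f"mean A = {estimate_a:.3f}, mean B = {estimate_b:.3f}, "
      f"difference = {mean_difference:.3f}")
```

The sign of the t-value follows the order of the arguments: a negative value here means the first group has the lower mean.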

