Basic Statistical Custom Transformers

Files

rshapirowilks-testworkspace.fmwt
- 10 KB
clustercalculator-testworkspace.fmwt
- 2 MB
rcorrelation-testworkspace.fmwt
- 7 KB
ronettest-testworkspace.fmwt
- 7 KB
rtwottest-testworkspace.fmwt
- 8 KB
pyshapirowilks-testworkspace.fmwt
- 10 KB

Introduction

The power of FME lies in its ability to take data from multiple sources and manipulate it efficiently. So why not use FME for data science? We’ve recently added a series of transformers to the FME Hub that perform basic statistical tests using either the RCaller or the PythonCaller.

If you don't see the statistical test you are looking for in this list, you can create your own and upload it to the FME Hub to share with other users. Alternatively, you can create a new Idea, and if it receives enough votes, it will be added to the list.

Learning

Perform a Shapiro-Wilks Statistical Test using R or Python

Learn how to create a custom transformer in R or Python that performs the Shapiro-Wilk test, which checks whether a sample comes from a normal distribution. The same workflow can be adapted to other statistical tests available in R or Python.

Transformers

Each transformer listed below has a link to its FME Hub page and a test workspace download. Because of the external software requirements for R, these test workspaces could not be uploaded to the FME Hub. Each R transformer requires R and the sqldf R package to be installed on the user's machine. The Python transformers require the SciPy Python package.

Correlation

Correlation measures the strength and direction of the association between two variables; a correlation test determines whether that association is statistically significant.

The RCorrelationCalculator package uses R to test whether there is an association between two variables.

To try this out yourself, open the RCorrelation-TestWorkspace.fmwt workspace from the Files list above.
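The RCorrelationCalculator page does not say which correlation coefficient it reports. The sketch below assumes Pearson's product-moment correlation, which is the default method of R's `cor.test`. It computes the coefficient r and the t statistic (with n - 2 degrees of freedom) used to test whether r differs from zero. The function names `pearson_r` and `correlation_t` are hypothetical and are not part of the transformer. With SciPy installed, `scipy.stats.pearsonr(x, y)` returns r together with its p-value.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient between x and y."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y need the same length, at least 3 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-product and squared-deviation sums around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r: float, n: int) -> Tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: no correlation (rho = 0)."""
    df = n - 2
    if abs(r) >= 1.0:
        # A perfect correlation makes t unbounded.
        return math.copysign(math.inf, r), df
    return r * math.sqrt(df / (1.0 - r * r)), df


# Toy values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
t, df = correlation_t(r, len(x))
print(f"r = {r:.4f}, t = {t:.2f}, df = {df}")
```

The p-value is the two-sided tail probability of Student's t distribution with `df` degrees of freedom at `t`.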

Cluster Analysis

Cluster analysis is a method for identifying groups of similar records within a dataset.

The RClusterCalculator uses R to divide data into groups of similar records using one of three clustering algorithms.

To try this out yourself, open the clustercalculator-testworkspace.fmwt workspace from the Files list above.
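The page does not name the three algorithms the RClusterCalculator offers. Purely as an illustration, the sketch below implements k-means (Lloyd's algorithm), a widely used partitioning method, in plain Python. Whether it matches one of the transformer's options is an assumption; R's built-in `kmeans` function does offer Lloyd's algorithm as one of its choices. The function name `kmeans` and the deterministic "maximin" seeding are choices made for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: returns a cluster label per point and the final centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic maximin seeding: start from the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


# Toy points forming two obvious groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

K-means needs the number of clusters `k` up front and assumes roughly compact, similarly sized groups. Hierarchical or density-based methods avoid those assumptions, which is one reason a tool might offer a choice of algorithms.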

Shapiro-Wilk Test

The Shapiro-Wilk test checks whether a random sample of data is consistent with having been drawn from a normally distributed population.

The RShapiroWilksCalculator package uses R, through the RCaller, to apply the Shapiro-Wilk test to a sample and report whether it plausibly comes from a normal distribution.

To try this out yourself, open the RShapiroWilks-TestWorkspace.fmwt workspace from the Files list above.

The PyShapiroWilksCalculator package uses SciPy, through the PythonCaller, to apply the same Shapiro-Wilk test.

To try this out yourself, open the PyShapiroWilks-TestWorkspace.fmwt workspace from the Files list above.

T-Test

A t-test is a statistical test of whether a sample mean differs from a specified value, or whether the means of two samples differ from each other, by more than random variation alone would explain.

The ROneSampleTTestCalculator package is a one-sample t-test calculator that tests the null hypothesis that the population mean equals a specified value. In other words, it tells you whether the mean of your sample differs significantly from that value. It outputs the t-value, p-value, confidence interval, and estimate (the sample mean).

To try this out yourself, open the ronettest-testworkspace.fmwt workspace from the Files list above.
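R's `t.test(x, mu = ...)` reports the same four outputs: t-value, p-value, confidence interval, and estimate. The sketch below reproduces that calculation in dependency-free Python. It assumes the transformer runs a two-sided test at a 95% confidence level, which is R's default; the transformer page does not confirm these settings. The Student t tail probability comes from the regularized incomplete beta function, evaluated with the continued-fraction method described in Numerical Recipes. The critical value for the interval is found by bisection. All function names here are hypothetical. With SciPy installed, `scipy.stats.ttest_1samp(x, popmean)` gives the t statistic and p-value directly.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Apply the even term, then the odd term, of the continued fraction.
        for num in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                    -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 / guard(1.0 + num * d)
            c = guard(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for a Student t variable with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def t_critical(conf_level: float, df: float) -> float:
    """Critical value q with P(|T| >= q) = 1 - conf_level, found by bisection."""
    alpha = 1.0 - conf_level
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > alpha:  # widen the bracket until it contains q
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if t_two_sided_p(mid, df) > alpha else (lo, mid)
    return (lo + hi) / 2.0


def one_sample_t_test(sample: Sequence[float], mu0: float = 0.0,
                      conf_level: float = 0.95) -> Dict[str, object]:
    """Two-sided one-sample t-test of H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = mean(sample)
    se = stdev(sample) / math.sqrt(n)
    if se == 0:
        raise ValueError("t is undefined for a constant sample")
    df = n - 1
    t = (estimate - mu0) / se
    half_width = t_critical(conf_level, df) * se
    return {"t": t, "df": df, "p_value": t_two_sided_p(t, df),
            "conf_int": (estimate - half_width, estimate + half_width),
            "estimate": estimate}


result = one_sample_t_test([1.0, 2.0, 3.0, 4.0, 5.0], mu0=0.0)
print(result)
```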

The RTwoSampleTTestCalculator package is a two-sample t-test calculator that compares the means of two groups to determine whether they differ significantly or whether the difference could be due to random chance. It outputs the t-value, p-value, confidence interval, and estimate.

To try this out yourself, open the RTwoTTest-TestWorkspace.fmwt workspace from the Files list above.
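When given two samples, R's `t.test(x, y)` defaults to Welch's unequal-variance t-test. Assuming the RTwoSampleTTestCalculator keeps that default (the page does not say), the calculation can be sketched in dependency-free Python as below. The Student t tail probability is computed from the regularized incomplete beta function, using the continued-fraction method described in Numerical Recipes. The function names are hypothetical. With SciPy installed, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance
from typing import Dict, Sequence


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Apply the even term, then the odd term, of the continued fraction.
        for num in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                    -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 / guard(1.0 + num * d)
            c = guard(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for a Student t variable with (possibly non-integer) df."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Two-sided Welch two-sample t-test of H0: mean(x) == mean(y)."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each group needs at least two observations")
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    se = math.sqrt(vx + vy)
    if se == 0:
        raise ValueError("t is undefined when both groups are constant")
    diff = mean(x) - mean(y)
    t = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return {"t": t, "df": df, "p_value": t_two_sided_p(t, df), "estimate": diff}


res = welch_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0])
print(res)
```

R reports the two group means as its estimate. This sketch returns their difference instead. The matching confidence interval is that difference plus or minus q times `se`, where q is the two-sided critical value of Student's t at the Welch degrees of freedom; it is left out to keep the example short.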
