Unimodality Testing
Introduction

The aim of unimodality testing is to check whether the distribution of the data is unimodal or multimodal.

Many people associate it with clustering, but it is a weaker task: it finds neither the clusters nor their number. It only states whether the data form a single cluster or several. In return, unimodality testing is generally cheaper than clustering (lower computational complexity).

The univariate case is presented here, but the folding test of unimodality (FTU) also works on multivariate distributions.

[Figures: unimodal distribution; multimodal distribution]

Understanding the folding mechanism

Here is how the FTU works.

Consider a bimodal distribution. Because of its two modes, its variance is quite large. The idea is to find a suitable pivot, noted $s^*$, and to fold one mode (for example the left one) onto the other. The resulting distribution (i.e. the sum of the right mode and the folded left mode) then has a lower variance.

Whatever the shape of the distribution, folding reduces the variance (or, at worst, leaves it unchanged). However, the reduction is far greater in the multimodal case than in the unimodal case. The FTU uses this phenomenon to discriminate unimodal distributions from multimodal ones.
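The effect described above can be checked numerically. The following is a minimal sketch, not the FTU itself: the helper `folded_variance_ratio` is a hypothetical name introduced for this illustration, the samples are synthetic, and the pivot is simply set to the sample mean (a reasonable choice here only because both samples are symmetric). Folding around a pivot $s$ is modelled as replacing each point $x$ by $|x - s|$.

```python
import numpy as np

rng = np.random.default_rng(0)


def folded_variance_ratio(x, pivot):
    """Share of the variance that remains after folding x around pivot.

    Hypothetical helper for illustration: returns Var(|x - pivot|) / Var(x).
    """
    x = np.asarray(x, dtype=float)
    return np.var(np.abs(x - pivot)) / np.var(x)


# Unimodal sample: a single Gaussian bump centred on 0.
unimodal = rng.normal(0.0, 1.0, size=5000)

# Bimodal sample: two Gaussian bumps centred on -5 and +5.
bimodal = np.concatenate([
    rng.normal(-5.0, 1.0, size=2500),
    rng.normal(5.0, 1.0, size=2500),
])

# Assumption: the sample mean is used as a hand-picked pivot,
# which only makes sense because both samples are symmetric.
ratio_unimodal = folded_variance_ratio(unimodal, unimodal.mean())
ratio_bimodal = folded_variance_ratio(bimodal, bimodal.mean())

print(f"variance kept after folding (unimodal): {ratio_unimodal:.3f}")
print(f"variance kept after folding (bimodal):  {ratio_bimodal:.3f}")
```

Both ratios are below 1, but the bimodal sample keeps a much smaller share of its variance, which is the gap the test exploits.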

Output of the test

The folding statistic

Whatever library you use, the outputs are essentially the same. The main result is the folding statistic, noted $\Phi$. It scores how unimodal the distribution of the input data is.

If $\Phi \ge 1$, the distribution is rather unimodal, while $\Phi < 1$ indicates that it is rather multimodal.

So the final decision of the test is based on the value of $\Phi$.
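To make the decision rule concrete, here is a hedged univariate sketch. This page does not give the formula for $\Phi$, so the code assumes the definition commonly associated with the folding test: a pivot $s^* = \mathrm{cov}(X, X^2) / (2\,\mathrm{Var}(X))$ and a statistic $\Phi = 4\,\mathrm{Var}(|X - s^*|) / \mathrm{Var}(X)$, where the factor 4 normalises against a uniform distribution. The function name `folding_statistic_1d` is hypothetical and is not the API of the C++ or R libraries.

```python
import numpy as np


def folding_statistic_1d(x):
    """Return (phi, pivot) for a 1-D sample.

    Assumed definitions (not stated in this document):
      pivot = cov(X, X^2) / (2 * Var(X))
      phi   = 4 * Var(|X - pivot|) / Var(X)
    """
    x = np.asarray(x, dtype=float)
    var_x = np.var(x, ddof=1)
    # np.cov returns the 2x2 covariance matrix of (x, x^2);
    # the off-diagonal entry is cov(X, X^2).
    pivot = np.cov(x, x ** 2)[0, 1] / (2.0 * var_x)
    folded_var = np.var(np.abs(x - pivot), ddof=1)
    # Factor 4 = (1 + d)^2 with d = 1 (assumed uniform normalisation).
    phi = 4.0 * folded_var / var_x
    return phi, pivot


rng = np.random.default_rng(1)
unimodal = rng.normal(0.0, 1.0, size=5000)
bimodal = np.concatenate([
    rng.normal(-5.0, 1.0, size=2500),
    rng.normal(5.0, 1.0, size=2500),
])

phi_uni, pivot_uni = folding_statistic_1d(unimodal)
phi_bi, pivot_bi = folding_statistic_1d(bimodal)

for name, phi in [("unimodal", phi_uni), ("bimodal", phi_bi)]:
    verdict = "rather unimodal" if phi >= 1 else "rather multimodal"
    print(f"{name}: Phi = {phi:.3f} -> {verdict}")
```

Under these assumptions, the Gaussian sample lands above the threshold of 1 and the two-bump sample falls well below it, with the estimated pivot sitting between the two modes.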

The p-value

The test also outputs a p-value, which indicates the significance of the test: the lower, the better. The folding statistic naturally has a direct impact on the p-value, but the latter also depends on the size of the dataset.

Indeed, the amount of data determines how much you know about the underlying distribution, and therefore how relevant the test is. The more data you have, the more significant the test will be.

Usually, the test is considered significant when the p-value is lower than 0.05.
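The dependence on the dataset size can be illustrated with a small simulation. This is not the p-value computation used by the libraries, which this page does not describe; it is a plain Monte Carlo sketch that assumes the uniform distribution as the borderline case ($\Phi \approx 1$) and the same assumed formula for $\Phi$ as in the univariate folding test. The names `folding_statistic_1d` and `null_spread` are hypothetical.

```python
import numpy as np


def folding_statistic_1d(x):
    """Assumed univariate folding statistic: 4 * Var(|X - s*|) / Var(X)."""
    x = np.asarray(x, dtype=float)
    var_x = np.var(x, ddof=1)
    # Assumed pivot: s* = cov(X, X^2) / (2 * Var(X)).
    pivot = np.cov(x, x ** 2)[0, 1] / (2.0 * var_x)
    return 4.0 * np.var(np.abs(x - pivot), ddof=1) / var_x


def null_spread(n, repeats=200, seed=0):
    """Standard deviation of Phi over `repeats` uniform samples of size n."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        folding_statistic_1d(rng.uniform(0.0, 1.0, size=n))
        for _ in range(repeats)
    ])
    return stats.std()


spread_small = null_spread(100)
spread_large = null_spread(10_000)

print(f"spread of Phi around the borderline, n = 100:    {spread_small:.4f}")
print(f"spread of Phi around the borderline, n = 10000:  {spread_large:.4f}")
```

With few points, $\Phi$ fluctuates widely around 1 even for borderline data, so a given deviation from 1 is weak evidence. With many points, the fluctuation shrinks and the same deviation becomes significant.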

The folding pivot

Finally, since the test needs to compute the folding pivot (noted $s^*$), you can also retrieve this value.

Other unimodality tests

Müller, D. W., & Sawitzki, G. (1991). Excess mass estimates and tests for multimodality. Journal of the American Statistical Association, 86(415), 738-746.

Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. The Annals of Statistics, 13(1), 70-84.

Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B (Methodological), 43(1), 97-99.
