Critical values for 33 discordancy test variants for outliers in normal samples of very large sizes from 1,000 to 30,000 and evaluation of different regression models for the interpolation and extrapolation of critical values

Regular Papers

Published 2018-01-22

Surendra P. Verma
Alfredo Quiroz-Ruiz

Surendra P. Verma

Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/no., Col Centro, Apartado Postal 34, Temixco 62580, Mexico.

Alfredo Quiroz-Ruiz

Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/no., Col Centro, Apartado Postal 34, Temixco 62580, Mexico.

Keywords

outlier methods
normal sample
Monte Carlo simulations
critical value tables
Dixon tests
Grubbs tests
skewness
kurtosis
statistics
regression equations
log-transformation
proteomics

How to Cite

Verma, S. P., & Quiroz-Ruiz, A. (2018). Critical values for 33 discordancy test variants for outliers in normal samples of very large sizes from 1,000 to 30,000 and evaluation of different regression models for the interpolation and extrapolation of critical values. Revista Mexicana De Ciencias Geológicas, 25(3), 369–381. Retrieved from https://rmcg.unam.mx/index.php/rmcg/article/view/681

Abstract

In this final paper of a series of four, we use our well-tested simulation procedure to report new, precise, and accurate critical values or percentage points (with four to eight decimal places) for 15 discordancy tests with 33 test variants, each at seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, for normal samples of very large sizes n from 1,000 to 30,000, viz., 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000, i.e., 1,000 (steps of 50) 1,500 (steps of 100) 2,000 (steps of 500) 5,000 (steps of 1,000) 10,000 (steps of 10,000) 30,000. The standard error of the mean is also reported explicitly and individually for each critical value. As a result, the applicability of these discordancy tests is now extended to practically all sample sizes (up to 30,000 observations or even greater). This final set of critical values for very large sample sizes should cover present and future needs for the application of these discordancy tests in all fields of science and engineering. Because the critical values were simulated for only a few sample sizes between 1,000 and 30,000, six different regression models were evaluated for interpolation and extrapolation, and a combined natural logarithm-cubic model was shown to be the most appropriate. This is the first time in the literature that a log-transformation of the sample size n before a polynomial fit has been shown to perform better than the conventional linear to polynomial regressions used hitherto. We also use 1,402 unpublished datasets from quantitative proteomics to show that our multiple-test method works more efficiently than the MAD_Z robust outlier method used for processing these data, and thus to illustrate the usefulness of our final work along these lines.
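The sample-size notation and the log-transformed fit described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `expand_grid`, `fit_ln_cubic`, and `predict` are hypothetical, the "natural logarithm-cubic model" is assumed here to mean a cubic polynomial in ln(n), and the values being fitted are a smooth placeholder curve, not critical values from the paper.

```python
import numpy as np

def expand_grid(start, segments):
    """Expand 'a(step)b(step)c...' notation into a list of sample sizes.

    `segments` is a list of (step, stop) pairs, read left to right.
    """
    sizes = [start]
    for step, stop in segments:
        # Continue from the last size already stored, up to and including `stop`.
        sizes.extend(range(sizes[-1] + step, stop + 1, step))
    return sizes

# 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000
sizes = expand_grid(
    1000,
    [(50, 1500), (100, 2000), (500, 5000), (1000, 10000), (10000, 30000)],
)

def fit_ln_cubic(n, cv):
    """Least-squares fit of cv ~ b0 + b1*x + b2*x**2 + b3*x**3 with x = ln(n).

    ASSUMPTION: this is one plausible reading of the 'natural
    logarithm-cubic' model; the exact form is not given in the abstract.
    """
    return np.polynomial.Polynomial.fit(np.log(n), cv, deg=3)

def predict(model, n):
    """Evaluate a fitted model at sample size(s) n."""
    return model(np.log(n))

# PLACEHOLDER data only: a smooth function of n standing in for a column
# of simulated critical values. These are NOT values from the paper.
n_grid = np.array(sizes, dtype=float)
placeholder_cv = np.sqrt(2.0 * np.log(n_grid))

model = fit_ln_cubic(n_grid, placeholder_cv)
interpolated = predict(model, 1234.0)  # a size between tabulated grid points
```

With real tabulated critical values in place of `placeholder_cv`, the same call pattern would give an interpolated value for any n inside the grid; extrapolation beyond 30,000 would need separate checking against the fit residuals.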

This work is licensed under a Creative Commons Attribution 4.0 International License.
