Critical values for 33 discordancy test variants for outliers in normal samples up to sizes 1000, and applications in quality control in Earth Sciences

  • Surendra P. Verma Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/n, Col. Centro, Apartado Postal 34, Temixco 62580, Mexico.
  • Alfredo Quiroz-Ruiz Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/n, Col. Centro, Apartado Postal 34, Temixco 62580, Mexico.
  • Lorena Díaz-González Posgrado en Ingeniería (Energía), sede Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/n, Col. Centro, Apartado Postal 34, Temixco 62580, Mexico.
Keywords: outlier methods, normal sample, two standard deviation method, 2s method, reference materials, Monte Carlo simulation, critical values, Dixon tests, skewness, kurtosis, artificial neural network, ANN, statistics, petroleum hydrocarbon, Nd isotopes, ...

Abstract

In two earlier papers (Verma and Quiroz-Ruiz, 2006, Rev. Mex. Cienc. Geol., 23, 133–161, 302–319), precise critical values were reported for normal univariate samples of sizes n up to 100. For larger n, however, critical values are available for only a few tests: N1 for n up to 147, N4k2 for n up to 149, and N6, N14, and N15 (for the latter three tests, only for n = 200, 500, and 1000). New critical values for n > 100, obtained through an adequate statistical methodology, are therefore needed. We report modifications of our earlier simulation procedure, together with new, precise, and accurate critical values or percentage points (four to eight decimal places; average standard error of the mean ~0.00000003–0.0039) for 15 discordancy tests comprising 33 test variants, each at seven significance levels (α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005), for normal samples of sizes up to 1000, viz., n = nmin(1)100(5)200(10)500(20)1000 (every n from the smallest size applicable to each test up to 100, then in steps of 5 up to 200, of 10 up to 500, and of 20 up to 1000). For the first time in the literature, the standard error of the mean is also reported explicitly for each individual critical value. In addition, an artificial neural network (ANN) methodology was used, also for the first time in the published literature, to obtain interpolation equations for all 33 test variants at each of the seven significance levels. Each equation was fitted to the 76 simulated critical values for n = 100 to 1000 for a given test and significance level, yielding extremely small sums of squared residuals (~5.5×10⁻⁸ to 8.4×10⁻⁵; generally <10⁻⁵). As a result, these discordancy tests can now be applied to samples of up to 1000 observations of a given parameter.
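The simulation idea behind such tables can be illustrated with a short Python sketch. It is not the authors' procedure: it assumes a two-sided Grubbs-type statistic max|xᵢ − x̄|/s as a stand-in for N1, estimates its (1 − α) percentage point from simulated standard normal samples, and repeats the simulation in independent replicates to obtain a mean critical value together with its standard error of the mean. It also enumerates the sample-size grid nmin(1)100(5)200(10)500(20)1000, which confirms that the grid contains exactly 76 sizes from 100 to 1000. The function names, the choice nmin = 3, and the simulation sizes are hypothetical.

```python
import math
import random


def sample_sizes(n_min: int) -> list[int]:
    """Sample-size grid n_min(1)100(5)200(10)500(20)1000; a(s)b means a, a+s, ..., b."""
    return (list(range(n_min, 101))            # n_min(1)100
            + list(range(105, 201, 5))         # (5)200
            + list(range(210, 501, 10))        # (10)500
            + list(range(520, 1001, 20)))      # (20)1000


def grubbs_type_stat(x: list[float]) -> float:
    """Two-sided single-outlier statistic max|x_i - mean| / s (assumed N1-like)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample SD, n - 1 divisor
    return max(abs(v - mean) for v in x) / s


def critical_value(n: int, alpha: float, n_samples: int, n_reps: int,
                   seed: int = 0) -> tuple[float, float]:
    """Mean Monte Carlo (1 - alpha) percentage point over n_reps replicates,
    plus its standard error of the mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Statistic for n_samples independent N(0, 1) samples of size n, sorted
        stats = sorted(grubbs_type_stat([rng.gauss(0.0, 1.0) for _ in range(n)])
                       for _ in range(n_samples))
        # Empirical (1 - alpha) quantile via a simple order-statistic rule
        k = min(n_samples - 1, math.ceil((1.0 - alpha) * n_samples) - 1)
        estimates.append(stats[k])
    mean_cv = sum(estimates) / n_reps
    sd = math.sqrt(sum((e - mean_cv) ** 2 for e in estimates) / (n_reps - 1))
    return mean_cv, sd / math.sqrt(n_reps)


if __name__ == "__main__":
    # n_min = 3 is a hypothetical choice; each test has its own minimum size
    ann_sizes = [n for n in sample_sizes(3) if n >= 100]
    print(len(ann_sizes))  # 76: the grid sizes from 100 to 1000
    cv, sem = critical_value(n=20, alpha=0.05, n_samples=2000, n_reps=5)
    print(f"n=20, alpha=0.05: {cv:.3f} (SEM {sem:.3f})")
```

With these settings, the estimate for n = 20 and α = 0.05 should fall near 2.71, the commonly tabulated two-sided Grubbs value. Reaching the four to eight decimal places reported in the paper would require far more simulated samples and replicates than this sketch uses.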
These new, more precise and accurate critical values will allow more reliable applications of the discordancy tests than were previously possible in various scientific and engineering fields, particularly for quality control in Earth Sciences. The multiple-test method with the new critical values was shown to perform better than both the box-and-whisker plot and the “two standard deviation” (2s) method used by some researchers, and is therefore the recommended procedure for handling experimental data.
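The following sketch shows why fixed-width rules struggle at large n. It applies the textbook 2s rule (flag values outside x̄ ± 2s) and Tukey's box-and-whisker fences (Q1 − 1.5 IQR, Q3 + 1.5 IQR) to a simulated normal sample that contains no contaminants. The abstract does not define these rules, so treating these versions as the ones used by other researchers is an assumption. About 4.6% and 0.7% of observations from a normal population fall outside the respective limits, so for n = 1000 both rules flag genuine observations as outliers. By contrast, a single discordancy test at significance level α rejects an uncontaminated normal sample with probability α by construction. The function names are hypothetical.

```python
import random
import statistics


def two_sd_flags(x: list[float]) -> list[int]:
    """Indices of values outside mean +/- 2s (the '2s method')."""
    m = statistics.fmean(x)
    s = statistics.stdev(x)  # sample SD, n - 1 divisor
    return [i for i, v in enumerate(x) if abs(v - m) > 2.0 * s]


def box_plot_flags(x: list[float], k: float = 1.5) -> list[int]:
    """Indices of values outside Tukey's fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(x, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(x) if v < lo or v > hi]


if __name__ == "__main__":
    rng = random.Random(42)
    clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # no outliers by construction
    # Expected on average: about 46 flags for the 2s rule, about 7 for the box plot
    print(len(two_sd_flags(clean)), len(box_plot_flags(clean)))
```

Both rules flag a fixed fraction of every sample regardless of its size. A discordancy test instead compares its statistic with a critical value that depends on n and α, which is why tabulated values up to n = 1000 are needed.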

Published: 2018-01-22
Section: Regular Papers