Calculate the basis value for a given data set. Separate functions are provided for different distributions and methods. A basis value is the lower one-sided tolerance bound on a specified proportion of the population. For more information on tolerance bounds, see Meeker et al. (2017). For B-Basis, set the content of the tolerance bound to \(p=0.90\) and the confidence level to \(conf=0.95\); for A-Basis, set \(p=0.99\) and \(conf=0.95\). Other tolerance bound contents and confidence levels can be computed, but they are infrequently needed in practice.

These functions also perform some automated diagnostic tests of the data prior to calculating the basis values. These diagnostic tests can be overridden if needed.

basis_normal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_lognormal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_weibull(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_pooled_cv(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_pooled_sd(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_hk_ext(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  method = c("optimum-order", "woodward-frawley"),
  override = c()
)

basis_nonpara_large_sample(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_anova(data = NULL, x, groups, p = 0.9, conf = 0.95, override = c())

Arguments

data

a data.frame

x

the variable in the data.frame for which to find the basis value

batch

the variable in the data.frame that contains the batches.

p

the content of the tolerance bound. Should be 0.90 for B-Basis and 0.99 for A-Basis

conf

the confidence level. Should be 0.95 for both A- and B-Basis

override

a list of names of diagnostic tests to override, if desired. Specifying "all" will override all diagnostic tests applicable to the current method.

groups

the variable in the data.frame representing the groups

modcv

a logical value indicating whether the modified CV approach should be used. Only applicable to pooling methods.

method

the method for Hanson–Koopmans nonparametric basis values. Should be "optimum-order" for B-Basis and "woodward-frawley" for A-Basis.

Value

an object of class basis

This object has the following fields:

  • call the expression used to call this function

  • distribution the distribution used (normal, etc.)

  • p the value of \(p\) supplied

  • conf the value of \(conf\) supplied

  • modcv a logical value indicating whether the modified CV approach was used. Only applicable to pooling methods.

  • data a copy of the data used in the calculation

  • groups a copy of the groups variable. Only used for pooling and ANOVA methods.

  • batch a copy of the batch data used for diagnostic tests

  • modcv_transformed_data the data after the modified CV transformation

  • override a vector of the names of diagnostic tests that were overridden. NULL if none were overridden

  • diagnostic_results a named character vector containing the results of all the diagnostic tests. See the Details section for additional information

  • diagnostic_failures a vector containing any diagnostic tests that produced failures

  • n the number of observations

  • r the number of groups, if a pooling method was used. Otherwise it is NULL.

  • basis the basis value computed. This is a number except when pooling methods are used, in which case it is a data.frame.

Details

data is an optional argument. If data is given, it should be a data.frame (or similar object). When data is specified, the value of x is expected to be a variable within data. If data is not specified, x must be a vector.

When modcv=TRUE is set, which is only applicable to the pooling methods, the data is first modified according to the modified coefficient of variation (CV) rules. This modified data is then used both when calculating the basis values and when performing the diagnostic tests. The modified CV approach is a way of adding extra variance to datasets with unexpectedly low variance.
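
The modified CV idea can be illustrated in base R. This is a minimal sketch, not the package's implementation: the piecewise rule in cv_star() is the one commonly stated for CMH-17-1G and is an assumption here, since the text above does not spell it out. The helper names cv_star() and transform_mod_cv_sketch() are hypothetical, and the sketch handles a single group without batches.

```r
# Modified CV rule as commonly stated for CMH-17-1G (an assumption here;
# the rule is not given in the text above).
cv_star <- function(cv) {
  if (cv < 0.04) {
    0.06
  } else if (cv < 0.08) {
    cv / 2 + 0.04
  } else {
    cv
  }
}

# Hypothetical helper: rescale the deviations from the mean so that the
# sample CV of the result equals cv_star(cv). Single group, no batches.
transform_mod_cv_sketch <- function(x) {
  cv <- sd(x) / mean(x)
  mean(x) + (x - mean(x)) * cv_star(cv) / cv
}

x <- c(100, 101, 99, 102, 98, 100.5)  # made-up data with very low scatter
x_mod <- transform_mod_cv_sketch(x)

sd(x) / mean(x)          # original CV, below 0.04
sd(x_mod) / mean(x_mod)  # CV after the transformation
```

The transformation keeps the mean unchanged and only inflates the scatter, which is why basis values computed from the modified data are lower for low-variance samples.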

basis_normal calculates the basis value by subtracting \(k\) times the standard deviation from the mean. \(k\) is given by the function k_factor_normal(). The equations in Krishnamoorthy and Mathew (2008) are used. basis_normal also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for normality (using anderson_darling_normal()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
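
The calculation of mean minus \(k\) times the standard deviation can be sketched in base R. The sketch below assumes the standard noncentral t expression for the one-sided tolerance factor; it is not the source of k_factor_normal(), and the names k_factor_sketch() and basis_normal_sketch() are hypothetical. No diagnostic tests are performed.

```r
# One-sided tolerance factor from the noncentral t distribution.
# Assumed to be equivalent to k_factor_normal(); not the package's code.
k_factor_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Basis value: sample mean minus k times the sample standard deviation.
basis_normal_sketch <- function(x, p = 0.90, conf = 0.95) {
  mean(x) - k_factor_sketch(length(x), p, conf) * sd(x)
}

# Made-up strength data for illustration only
x <- c(79.0, 81.2, 83.5, 80.1, 84.7, 82.3, 78.6, 85.0, 81.9, 80.8)

k_factor_sketch(length(x))                    # B-Basis factor for n = 10
basis_normal_sketch(x)                        # B-Basis
basis_normal_sketch(x, p = 0.99, conf = 0.95) # A-Basis
```

Because \(k\) grows as \(p\) increases, the A-Basis value is always lower than the B-Basis value for the same sample.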

basis_lognormal calculates the basis value in the same way that basis_normal does, except that the natural logarithm of the data is taken.

basis_lognormal also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for lognormality (using anderson_darling_lognormal()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
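
The lognormal calculation can be sketched in base R as the normal calculation applied to log-transformed data. Two points in this sketch are assumptions rather than statements from the text above: the tolerance factor uses the standard noncentral t expression, and the result is transformed back to the original units with exp(). The name basis_lognormal_sketch() is hypothetical, and no diagnostic tests are performed.

```r
# Hypothetical sketch: normal-theory basis value on the log scale,
# transformed back to the original units (assumed back-transformation).
basis_lognormal_sketch <- function(x, p = 0.90, conf = 0.95) {
  n <- length(x)
  log_x <- log(x)
  # One-sided tolerance factor from the noncentral t distribution
  k <- qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
  exp(mean(log_x) - k * sd(log_x))
}

# Made-up strength data for illustration only
x <- c(79.0, 81.2, 83.5, 80.1, 84.7, 82.3, 78.6, 85.0, 81.9, 80.8)
basis_lognormal_sketch(x)
```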

basis_weibull calculates the basis value for data distributed according to a Weibull distribution. The confidence level for the content requested is calculated using the conditional method, as described in Lawless (1982) Section 4.1.2b. This has good agreement with tables published in CMH-17-1G. Results differ between this function and STAT-17 by approximately 0.5%.

basis_weibull also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for goodness of fit to a Weibull distribution (using anderson_darling_weibull()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.

basis_hk_ext calculates the basis value using the Extended Hanson–Koopmans method, as described in CMH-17-1G and Vangel (1994). For nonparametric distributions, this function should be used for samples up to \(n=28\) for B-Basis and up to \(n=299\) for A-Basis. This method uses a pair of order statistics to determine the basis value. CMH-17-1G suggests that for A-Basis, the first and last order statistics are used: this is called the "woodward-frawley" method in this package, after the paper in which this approach is described (as referenced by Vangel (1994)). For B-Basis, another approach is used whereby the first and j-th order statistics are used to calculate the basis value. In this approach, the j-th order statistic is selected to minimize the difference between the tolerance limit (assuming that the order statistics are equal to the expected values from a standard normal distribution) and the population quantile for a standard normal distribution. This approach is described in Vangel (1994). This second method (for use when calculating B-Basis values) is called "optimum-order" in this package. The results of basis_hk_ext have been verified against example results from the program STAT-17. Agreement is typically well within 0.2%.
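
How a pair of order statistics yields a basis value can be sketched in base R. The sketch assumes the usual Hanson–Koopmans form, in which the j-th order statistic is scaled by the ratio of the first to the j-th order statistic raised to a factor \(z\); the text above does not state this formula, so treat it as an assumption. The function name hk_ext_basis_sketch() is hypothetical, and the value of z below is a placeholder, not a real factor for this sample size.

```r
# Hypothetical sketch: combine the first and j-th order statistics.
# z must come from the method's factor tables or calculation; it is
# supplied by the caller here.
hk_ext_basis_sketch <- function(x, j, z) {
  x_sorted <- sort(x)
  x_sorted[j] * (x_sorted[1] / x_sorted[j])^z
}

# Made-up strength data for illustration only
x <- c(139.6, 143.1, 145.2, 147.8, 150.3, 152.9, 155.4)

# Using the first and last order statistics, as in the
# "woodward-frawley" method. z = 1.5 is a placeholder value.
hk_ext_basis_sketch(x, j = length(x), z = 1.5)
```

With this form, \(z = 1\) returns the smallest observation, and any \(z > 1\) extrapolates below it, which is how the method produces a bound for small samples.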

Note that the implementation of hk_ext_z_j_opt changed after cmstatr version 0.8.0. This function is used internally by basis_hk_ext when method = "optimum-order". As a result, basis values computed using this method may differ slightly between version 0.8.0 and later versions. However, both implementations seem to be equally valid. See the included vignette for a discussion of the differences between the implementation before and after version 0.8.0, as well as the factors given in CMH-17-1G. To access this vignette, run: vignette("hk_ext", package = "cmstatr")

basis_hk_ext also performs a diagnostic test for outliers (using maximum_normed_residual()) and performs a pair of tests that the sample size and method selected follow the guidance described above. If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.

basis_nonpara_large_sample calculates the basis value using the large sample method described in CMH-17-1G. This method uses a sum of binomials to determine the rank of the order statistic corresponding to the desired tolerance limit (basis value). Results of this function have been verified against results of the STAT-17 program.
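
The rank selection can be sketched in base R using the binomial distribution. The sketch assumes the usual reasoning: the r-th smallest observation is a valid bound when, with probability at least conf, at least r of the n observations fall below the population quantile of interest. This is an illustration of that idea, not the package's code, and the name nonpara_rank_sketch() is hypothetical.

```r
# Hypothetical sketch: largest rank r such that the r-th order statistic
# is a lower tolerance bound with content p and confidence conf.
nonpara_rank_sketch <- function(n, p = 0.90, conf = 0.95) {
  ranks <- seq_len(n)
  # P(at least r of n observations fall below the (1 - p) quantile)
  prob <- pbinom(ranks - 1, size = n, prob = 1 - p, lower.tail = FALSE)
  ok <- prob >= conf
  if (!any(ok)) {
    return(NA_integer_)  # sample too small for this method
  }
  max(ranks[ok])
}

nonpara_rank_sketch(28)  # NA: too few observations for B-Basis
nonpara_rank_sketch(29)  # rank of the order statistic for n = 29
nonpara_rank_sketch(46)  # a larger sample allows a higher rank

# The basis value is then the order statistic at that rank:
# sort(x)[nonpara_rank_sketch(length(x))]
```

The NA result for small samples corresponds to the sample size check described for this function.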

basis_nonpara_large_sample also performs a diagnostic test for outliers (using maximum_normed_residual()) and performs a test that the sample size is sufficiently large. If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.

basis_anova calculates basis values using the ANOVA method. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G, but when the ratio of between-batch mean square to the within-batch mean square is less than or equal to one, the tolerance factor is calculated based on pooling the data from all groups. This approach is recommended by Vangel (1992) and by Krishnamoorthy and Mathew (2008), and is also implemented by the software CMH17-STATS and STAT-17. This function automatically performs a diagnostic test for outliers within each group (using maximum_normed_residual()) and a test for between group variability (using ad_ksample()) as well as checking that the data contains at least 5 groups. This function has been verified against the results of the STAT-17 program.

basis_pooled_sd calculates basis values by pooling the data from several groups together. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G and matches the pooling method implemented in ASAP 2008.

basis_pooled_cv calculates basis values by pooling the data from several groups together. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G.

basis_pooled_sd and basis_pooled_cv both automatically perform a number of diagnostic tests. Using maximum_normed_residual(), they check that there are no outliers within each group and batch (provided that batch is specified). They check the between batch variability using ad_ksample(). They check that there are no outliers within each group (pooling all batches) using maximum_normed_residual(). They check for the normality of the pooled data using anderson_darling_normal(). basis_pooled_sd checks for equality of variance of all data using levene_test(), and basis_pooled_cv applies levene_test() to check for equality of variance of all data after transforming it using normalize_group_mean().

The object returned by these functions includes the named vector diagnostic_results. This contains all of the diagnostic tests performed. The name of each element of the vector corresponds with the name of the diagnostic test. The contents of each element will be "P" if the diagnostic test passed, "F" if the diagnostic test failed, "O" if the diagnostic test was overridden and NA if the diagnostic test was skipped (typically because an optional argument was not supplied).
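
Working with a vector of this shape can be sketched in base R. The vector below is built by hand with made-up results to mimic the structure described above; it is not output from any of these functions.

```r
# Hand-built stand-in for the diagnostic_results field (made-up values)
diagnostic_results <- c(
  outliers_within_batch     = NA_character_,  # skipped: batch not supplied
  between_batch_variability = NA_character_,  # skipped: batch not supplied
  outliers                  = "O",            # overridden
  anderson_darling_normal   = "F"             # failed
)

# Names of the tests in each state; NA entries must be handled explicitly
failed     <- names(diagnostic_results)[diagnostic_results %in% "F"]
overridden <- names(diagnostic_results)[diagnostic_results %in% "O"]
skipped    <- names(diagnostic_results)[is.na(diagnostic_results)]

failed
overridden
skipped
```

Using %in% rather than == avoids NA values propagating into the subsetting index when some tests were skipped.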

The following list summarizes the diagnostic tests automatically performed by each function.

  • basis_normal

    • outliers_within_batch

    • between_batch_variability

    • outliers

    • anderson_darling_normal

  • basis_lognormal

    • outliers_within_batch

    • between_batch_variability

    • outliers

    • anderson_darling_lognormal

  • basis_weibull

    • outliers_within_batch

    • between_batch_variability

    • outliers

    • anderson_darling_weibull

  • basis_pooled_cv

    • outliers_within_batch

    • between_group_variability

    • outliers_within_group

    • pooled_data_normal

    • normalized_variance_equal

  • basis_pooled_sd

    • outliers_within_batch

    • between_group_variability

    • outliers_within_group

    • pooled_data_normal

    • pooled_variance_equal

  • basis_hk_ext

    • outliers_within_batch

    • between_batch_variability

    • outliers

    • sample_size

  • basis_nonpara_large_sample

    • outliers_within_batch

    • between_batch_variability

    • outliers

    • sample_size

  • basis_anova

    • outliers_within_group

    • equality_of_variance

    • number_of_groups

References

J. F. Lawless, Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982.

“Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.

M. Vangel, “One-Sided Nonparametric Tolerance Limits,” Communications in Statistics - Simulation and Computation, vol. 23, no. 4, pp. 1137–1154, 1994.

K. Krishnamoorthy and T. Mathew, Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken: John Wiley & Sons, 2008.

W. Meeker, G. Hahn, and L. Escobar, Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition. Hoboken: John Wiley & Sons, 2017.

M. Vangel, “New Methods for One-Sided Tolerance Limits for a One-Way Balanced Random-Effects ANOVA Model,” Technometrics, vol. 34, no. 2, Taylor & Francis, pp. 176–185, 1992.

Examples

library(dplyr)

# A single-point basis value can be calculated as follows.
# In this example, three failed diagnostic tests are
# overridden.

carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  basis_normal(strength, batch,
               override = c("outliers",
                            "outliers_within_batch",
                            "anderson_darling_normal"))
#> 
#> Call:
#> basis_normal(data = ., x = strength, batch = batch, override = c("outliers", 
#>     "outliers_within_batch", "anderson_darling_normal"))
#> 
#> Distribution:  Normal 	( n = 18 )
#> The following diagnostic tests were overridden:
#>     `outliers`,
#>     `outliers_within_batch`,
#>     `anderson_darling_normal`
#> B-Basis:   ( p = 0.9 , conf = 0.95 )
#> 76.94656 
#> 

# A set of pooled basis values can also be calculated
# using the pooled standard deviation method, as follows.
# In this example, one failed diagnostic test is overridden.
carbon.fabric %>%
  filter(test == "WT") %>%
  basis_pooled_sd(strength, condition, batch,
                  override = c("outliers_within_batch"))
#> 
#> Call:
#> basis_pooled_sd(data = ., x = strength, groups = condition, batch = batch, 
#>     override = c("outliers_within_batch"))
#> 
#> Distribution:  Normal - Pooled Standard Deviation 	( n = 54, r = 3 )
#> The following diagnostic tests were overridden:
#>     `outliers_within_batch`
#> B-Basis:   ( p = 0.9 , conf = 0.95 )
#> CTD  127.6914 
#> ETW  125.0698 
#> RTD  132.1457 
#> 
