Calculate the basis value for a given data set. There are various functions to calculate the basis values for different distributions. The basis value is the lower one-sided tolerance bound of a certain proportion of the population. For more information on tolerance bounds, see Meeker et al. (2017). For B-Basis, set the content of the tolerance bound to \(p=0.90\) and the confidence level to \(conf=0.95\); for A-Basis, set \(p=0.99\) and \(conf=0.95\). While other tolerance bound contents and confidence levels may be computed, they are infrequently needed in practice.

These functions also perform some automated diagnostic tests of the data prior to calculating the basis values. These diagnostic tests can be overridden if needed.

```
basis_normal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_lognormal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_weibull(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_pooled_cv(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_pooled_sd(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_hk_ext(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  method = c("optimum-order", "woodward-frawley"),
  override = c()
)

basis_nonpara_large_sample(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_anova(data = NULL, x, groups, p = 0.9, conf = 0.95, override = c())
```

- data
a data.frame

- x
the variable in the data.frame for which to find the basis value

- batch
the variable in the data.frame that contains the batches.

- p
the content of the tolerance bound. Should be 0.90 for B-Basis and 0.99 for A-Basis

- conf
the confidence level. Should be 0.95 for both A- and B-Basis

- override
a list of names of diagnostic tests to override, if desired. Specifying "all" will override all diagnostic tests applicable to the current method.

- groups
the variable in the data.frame representing the groups

- modcv
a logical value indicating whether the modified CV approach should be used. Only applicable to pooling methods.

- method
the method for Hanson--Koopmans nonparametric basis values. Should be "optimum-order" for B-Basis and "woodward-frawley" for A-Basis.

an object of class `basis`

This object has the following fields:

- `call`
the expression used to call this function

- `distribution`
the distribution used (normal, etc.)

- `p`
the value of \(p\) supplied

- `conf`
the value of \(conf\) supplied

- `modcv`
a logical value indicating whether the modified CV approach was used. Only applicable to pooling methods.

- `data`
a copy of the data used in the calculation

- `groups`
a copy of the groups variable. Only used for pooling and ANOVA methods.

- `batch`
a copy of the batch data used for diagnostic tests

- `modcv_transformed_data`
the data after the modified CV transformation

- `override`
a vector of the names of diagnostic tests that were overridden. `NULL` if none were overridden

- `diagnostic_results`
a named character vector containing the results of all the diagnostic tests. See the Details section for additional information

- `diagnostic_failures`
a vector containing any diagnostic tests that produced failures

- `n`
the number of observations

- `r`
the number of groups, if a pooling method was used. Otherwise it is `NULL`.

- `basis`
the basis value computed. This is a number except when pooling methods are used, in which case it is a data.frame.

`data` is an optional argument. If `data` is given, it should be a `data.frame` (or similar object). When `data` is specified, the value of `x` is expected to be a variable within `data`. If `data` is not specified, `x` must be a vector.

When `modcv=TRUE` is set, which is only applicable to the pooling methods, the data is first modified according to the modified coefficient of variation (CV) rules. This modified data is then used both when calculating the basis values and when performing the diagnostic tests. The modified CV approach is a way of adding extra variance to datasets with unexpectedly low variance.
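As a rough illustration of what "adding extra variance" means, the sketch below applies the piecewise modified CV rule as it is commonly stated in CMH-17-1G. The function name `cv_star_sketch` is hypothetical and is not part of the interface documented here, and the thresholds are an assumption taken from that handbook rather than from this page; the package's own transformation of the data may differ in detail.

```r
# Hypothetical helper, for illustration only: maps an observed CV to a
# modified CV using the piecewise rule commonly attributed to CMH-17-1G.
cv_star_sketch <- function(cv) {
  if (cv < 0.04) {
    0.06                # very low scatter is raised to a floor of 6%
  } else if (cv < 0.08) {
    cv / 2 + 0.04       # intermediate scatter is inflated partway
  } else {
    cv                  # scatter of 8% or more is left unchanged
  }
}

cv_star_sketch(0.02)
cv_star_sketch(0.06)
cv_star_sketch(0.10)
```

Under this rule the modified CV is never smaller than the observed CV, which is why the approach only ever widens the scatter assumed for the data.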

`basis_normal` calculates the basis value by subtracting \(k\) times the standard deviation from the mean. \(k\) is given by the function `k_factor_normal()`. The equations in Krishnamoorthy and Mathew (2008) are used. `basis_normal` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for normality (using `anderson_darling_normal()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.
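The mean-minus-\(k\)-standard-deviations calculation can be sketched in base R. This is a minimal illustration that uses the textbook noncentral \(t\) expression for the one-sided normal tolerance factor; it is not necessarily how `k_factor_normal()` is implemented, it performs none of the diagnostic tests, and the names `k_normal_sketch`, `basis_normal_sketch` and the data values are made up for the example.

```r
# One-sided tolerance factor for a normal sample of size n, using the
# noncentral t quantile (assumed textbook form, illustration only).
k_normal_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Basis value: sample mean minus k times the sample standard deviation.
basis_normal_sketch <- function(x, p = 0.90, conf = 0.95) {
  mean(x) - k_normal_sketch(length(x), p, conf) * sd(x)
}

# Made-up strength values, for illustration only.
strength_example <- c(100, 105, 98, 110, 102, 97, 104, 101, 99, 106)

basis_normal_sketch(strength_example)                 # B-Basis settings
basis_normal_sketch(strength_example, p = 0.99)       # A-Basis settings
```

Because \(k\) grows with \(p\), the A-Basis value from the same sample is always lower than the B-Basis value.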

`basis_lognormal` calculates the basis value in the same way that `basis_normal` does, except that the natural logarithm of the data is taken.
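A minimal base R sketch of this idea follows. It assumes that the tolerance bound is computed on the log scale and then transformed back with `exp()`; that back-transformation, the function name `basis_lognormal_sketch` and the data values are assumptions made for illustration, and no diagnostic tests are performed.

```r
# Illustration only: normal-theory tolerance bound applied to log(x),
# then mapped back to the original units.
basis_lognormal_sketch <- function(x, p = 0.90, conf = 0.95) {
  n <- length(x)
  log_x <- log(x)
  # Noncentral t form of the one-sided normal tolerance factor
  k <- qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
  exp(mean(log_x) - k * sd(log_x))
}

# Made-up strength values, for illustration only.
strength_example <- c(100, 105, 98, 110, 102, 97, 104, 101, 99, 106)

basis_lognormal_sketch(strength_example)
```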

`basis_lognormal` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for log-normality (using `anderson_darling_lognormal()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_weibull` calculates the basis value for data distributed according to a Weibull distribution. The confidence level for the content requested is calculated using the conditional method, as described in Lawless (1982) Section 4.1.2b. This has good agreement with tables published in CMH-17-1G. Results differ between this function and STAT-17 by approximately 0.5%.

`basis_weibull` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for goodness of fit to a Weibull distribution (using `anderson_darling_weibull()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_hk_ext` calculates the basis value using the Extended Hanson--Koopmans method, as described in CMH-17-1G and Vangel (1994). For nonparametric distributions, this function should be used for samples up to \(n=28\) for B-Basis and up to \(n=299\) for A-Basis. This method uses a pair of order statistics to determine the basis value. CMH-17-1G suggests that for A-Basis, the first and last order statistics are used: this is called the "woodward-frawley" method in this package, after the paper in which this approach is described (as referenced by Vangel (1994)). For B-Basis, another approach is used whereby the first and `j-th` order statistics are used to calculate the basis value. In this approach, the `j-th` order statistic is selected to minimize the difference between the tolerance limit (assuming that the order statistics are equal to the expected values from a standard normal distribution) and the population quantile for a standard normal distribution. This approach is described in Vangel (1994). This second method (for use when calculating B-Basis values) is called "optimum-order" in this package. The results of `basis_hk_ext` have been verified against example results from the program STAT-17. Agreement is typically well within 0.2%.

Note that the implementation of `hk_ext_z_j_opt` changed after `cmstatr` version 0.8.0. This function is used internally by `basis_hk_ext` when `method = "optimum-order"`. This implementation change may mean that basis values computed using this method may change slightly after version 0.8.0. However, both implementations seem to be equally valid. See the included vignette for a discussion of the differences between the implementation before and after version 0.8.0, as well as the factors given in CMH-17-1G. To access this vignette, run: `vignette("hk_ext", package = "cmstatr")`

`basis_hk_ext` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and performs a pair of tests that the sample size and method selected follow the guidance described above. If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_nonpara_large_sample` calculates the basis value using the large sample method described in CMH-17-1G. This method uses a sum of binomials to determine the rank of the order statistic corresponding with the desired tolerance limit (basis value). Results of this function have been verified against results of the STAT-17 program.
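The rank selection can be illustrated with the standard distribution-free argument: the \(r\)-th smallest observation is a valid lower tolerance bound when the probability that at least \(r\) of the \(n\) observations fall below the \((1-p)\) quantile is at least \(conf\). The sketch below is a base R illustration under that assumption; the function name `nonpara_rank_sketch` is hypothetical and the package's own implementation may differ in detail.

```r
# Illustration only: largest rank r whose order statistic is a lower
# tolerance bound with content p and confidence conf.
nonpara_rank_sketch <- function(n, p = 0.90, conf = 0.95) {
  r <- 1:n
  # P(Binomial(n, 1 - p) >= r): the sum of binomial terms from r to n
  coverage <- pbinom(r - 1, size = n, prob = 1 - p, lower.tail = FALSE)
  ok <- r[coverage >= conf]
  if (length(ok) == 0) NA_integer_ else max(ok)
}

nonpara_rank_sketch(29)              # smallest n with a valid B-Basis rank
nonpara_rank_sketch(28)              # NA: sample too small for B-Basis
nonpara_rank_sketch(299, p = 0.99)   # smallest n with a valid A-Basis rank
```

Given a rank `r` from this function, the corresponding bound for a numeric vector `x` would be `sort(x)[r]`. The sample sizes at which a rank first becomes available line up with the limits of \(n=28\) and \(n=299\) quoted for the Hanson--Koopmans method.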

`basis_nonpara_large_sample` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and performs a test that the sample size is sufficiently large. If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_anova` calculates basis values using the ANOVA method. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G, but when the ratio of between-batch mean square to the within-batch mean square is less than or equal to one, the tolerance factor is calculated based on pooling the data from all groups. This approach is recommended by Vangel (1992) and by Krishnamoorthy and Mathew (2008), and is also implemented by the software CMH17-STATS and STAT-17. This function automatically performs a diagnostic test for outliers within each group (using `maximum_normed_residual()`) and a test for between group variability (using `ad_ksample()`) as well as checking that the data contains at least 5 groups. This function has been verified against the results of the STAT-17 program.

`basis_pooled_sd` calculates basis values by pooling the data from several groups together. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G and matches the pooling method implemented in ASAP 2008.

`basis_pooled_cv` calculates basis values by pooling the data from several groups together. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G.

`basis_pooled_sd` and `basis_pooled_cv` both automatically perform a number of diagnostic tests. Using `maximum_normed_residual()`, they check that there are no outliers within each group and batch (provided that `batch` is specified). They check the between batch variability using `ad_ksample()`. They check that there are no outliers within each group (pooling all batches) using `maximum_normed_residual()`. They check for the normality of the pooled data using `anderson_darling_normal()`. `basis_pooled_sd` checks for equality of variance of all data using `levene_test()`, and `basis_pooled_cv` checks for equality of variances of all data, after transforming it using `normalize_group_mean()`, also using `levene_test()`.

The object returned by these functions includes the named vector `diagnostic_results`. This contains all of the diagnostic tests performed. The name of each element of the vector corresponds with the name of the diagnostic test. The contents of each element will be "P" if the diagnostic test passed, "F" if the diagnostic test failed, "O" if the diagnostic test was overridden and `NA` if the diagnostic test was skipped (typically because an optional argument was not supplied).
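The encoding can be worked with using ordinary vector indexing. The sketch below builds a stand-in vector by hand, with made-up results, rather than calling any of the basis functions; only the "P" / "F" / "O" / `NA` coding and the test names come from this page.

```r
# Made-up stand-in for the `diagnostic_results` field, illustration only.
diagnostic_results <- c(
  outliers_within_batch     = NA,   # skipped: no `batch` supplied
  between_batch_variability = NA,   # skipped: no `batch` supplied
  outliers                  = "O",  # overridden by the user
  anderson_darling_normal   = "F"   # failed
)

# Names of the tests in each state; `which()` drops the NA entries.
failed     <- names(which(diagnostic_results == "F"))
overridden <- names(which(diagnostic_results == "O"))
skipped    <- names(which(is.na(diagnostic_results)))

failed
overridden
skipped
```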

The following list summarizes the diagnostic tests automatically performed by each function.

- `basis_normal`
  - `outliers_within_batch`
  - `between_batch_variability`
  - `outliers`
  - `anderson_darling_normal`

- `basis_lognormal`
  - `outliers_within_batch`
  - `between_batch_variability`
  - `outliers`
  - `anderson_darling_lognormal`

- `basis_weibull`
  - `outliers_within_batch`
  - `between_batch_variability`
  - `outliers`
  - `anderson_darling_weibull`

- `basis_pooled_cv`
  - `outliers_within_batch`
  - `between_group_variability`
  - `outliers_within_group`
  - `pooled_data_normal`
  - `normalized_variance_equal`

- `basis_pooled_sd`
  - `outliers_within_batch`
  - `between_group_variability`
  - `outliers_within_group`
  - `pooled_data_normal`
  - `pooled_variance_equal`

- `basis_hk_ext`
  - `outliers_within_batch`
  - `between_batch_variability`
  - `outliers`
  - `sample_size`

- `basis_nonpara_large_sample`
  - `outliers_within_batch`
  - `between_batch_variability`
  - `outliers`
  - `sample_size`

- `basis_anova`
  - `outliers_within_group`
  - `equality_of_variance`
  - `number_of_groups`

J. F. Lawless, Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982.

“Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.

M. Vangel, “One-Sided Nonparametric Tolerance Limits,” Communications in Statistics - Simulation and Computation, vol. 23, no. 4. pp. 1137–1154, 1994.

K. Krishnamoorthy and T. Mathew, Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken: John Wiley & Sons, 2008.

W. Meeker, G. Hahn, and L. Escobar, Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition. Hoboken: John Wiley & Sons, 2017.

M. Vangel, “New Methods for One-Sided Tolerance Limits for a One-Way Balanced Random-Effects ANOVA Model,” Technometrics, vol. 34, no. 2. Taylor & Francis, pp. 176–185, 1992.

```
library(cmstatr)
library(dplyr)

# A single-point basis value can be calculated as follows.
# In this example, three failed diagnostic tests are
# overridden.
carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  basis_normal(strength, batch,
               override = c("outliers",
                            "outliers_within_batch",
                            "anderson_darling_normal"))
#>
#> Call:
#> basis_normal(data = ., x = strength, batch = batch, override = c("outliers",
#> "outliers_within_batch", "anderson_darling_normal"))
#>
#> Distribution: Normal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers`,
#> `outliers_within_batch`,
#> `anderson_darling_normal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 76.94656
#>

# A set of pooled basis values can also be calculated
# using the pooled standard deviation method, as follows.
# In this example, one failed diagnostic test is overridden.
carbon.fabric %>%
  filter(test == "WT") %>%
  basis_pooled_sd(strength, condition, batch,
                  override = c("outliers_within_batch"))
#>
#> Call:
#> basis_pooled_sd(data = ., x = strength, groups = condition, batch = batch,
#> override = c("outliers_within_batch"))
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 54, r = 3 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 127.6914
#> ETW 125.0698
#> RTD 132.1457
#>
```