Statistical Procedures

The content on this page is a summary. For detailed information, please see the expanded overview or the Related Links.

NTP uses a variety of statistical procedures to:

Analyze data produced by two-year toxicology/carcinogenicity studies
Develop estimates of chemical properties
Predict the toxicological effects of certain chemicals
Reduce or replace the use of animals for toxicity testing

Survival

We use the Kaplan and Meier product-limit procedure, presented in graphical form, to estimate the probability of survival. Dose-related trends are identified with Tarone's life table test, and dose-related effects on survival are assessed using a Cox proportional hazards model.

Neoplasm and Nonneoplastic Lesion Incidences

We determine incidence rates based on the numbers of animals bearing neoplasms or nonneoplastic lesions at a specific anatomic site, as well as the numbers of animals with that site examined microscopically. The Poly-k test, a survival-adjusted procedure, is used to assess the effect of dose on the prevalence of neoplasms and nonneoplastic lesions. Tests of significance include pairwise comparisons of each exposed group with controls and a test for overall exposure-related trends.

Continuous Variables

We employ two approaches to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data are analyzed with parametric multiple comparison procedures. Hematology, clinical chemistry, urinalysis, urine-concentrating ability, cell proliferation, tissue concentrations, litter size, estrous cycle counts and durations, and sperm counts and concentration are analyzed using nonparametric multiple comparison methods.

Litter Effects

Incorporating litter effects into statistical analyses when there are multiple pups per sex per litter is one example of NTP's commitment to implementing new statistical procedures that produce more accurate data analyses. Littermates tend to be more like one another than like fetuses/pups in other litters, sharing such common features as:

Genetics
Maternal environments during gestation
Shared environments during lactation
Possibly shared environments into adulthood

If litter effects are ignored, within-litter correlation leads to underestimates of variance in statistical tests, resulting in higher probabilities of Type I errors ("false positives"). We use various statistical approaches, including a modified version of the Poly-k test that incorporates the within-litter correlation and the effective sample size, as well as mixed-effects logistic regression, to account for between-litter variability.

Expanded Overview

Summary of Statistical Procedures Used by NTP
Subject	Procedure	Purpose
Survival	Kaplan and Meier product-limit procedure	Probability of survival
Survival	Tarone's life table test	Dose-related trends
Survival	Cox proportional hazards model	Dose-related effects
Neoplasm and Nonneoplastic Lesion Incidences	Poly-k test; pairwise comparisons of each exposed group with controls; test for overall exposure-related trends	Effect of dose on prevalence of neoplasms and nonneoplastic lesions after adjusting for survival
Continuous Variables	Analysis of variables using parametric multiple comparison and trend methods; analysis of variables using nonparametric multiple comparison and trend methods	Significance of pairwise comparisons between exposed and control groups and dose-related trends
Litter Effects	Modified version of Poly-k test	Adjust Poly-k test for correlations between littermates
Litter Effects	Mixed effects linear models using multiple comparison and trend methods; nonparametric methods using multiple comparison and trend methods	Adjust analyses of continuous variables for litter effects

Survival Analyses

The probability of survival is estimated by the product-limit procedure of Kaplan and Meier (1958) and is presented graphically. Animals surviving to the end of the observation period are treated as censored observations, as are animals dying from unnatural causes within the observation period. Animals dying from natural causes are included in analyses and are treated as uncensored observations. Dose-related trends are identified with Tarone's life table test (1975), and pairwise dose-related effects on survival are assessed using a Cox proportional hazards model (1972). All reported P values for the survival analyses are two sided.
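The product-limit calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not NTP's implementation: the function name `kaplan_meier`, the 730-day study length, and the six-animal data set are all hypothetical. An event flag of 1 marks a natural death (uncensored), and 0 marks a censored animal (terminal sacrifice or unnatural death).

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with a natural death.

    times  -- day on study at which each animal left the study
    events -- 1 for a natural death (uncensored), 0 for a censored animal
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # all animals leaving the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical group of six animals on a 730-day study
times = [300, 450, 450, 600, 730, 730]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for day, s in curve:
    print(day, round(s, 4))
```

Censored animals never lower the curve directly; they only shrink the risk set used at later death times.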

Calculation of Incidence

The incidences of neoplasms or nonneoplastic lesions are the numbers of animals bearing such lesions at a specific anatomic site. For calculation of the proportion of incidence, the denominator for most neoplasms and all nonneoplastic lesions is the number of animals in which the site was examined microscopically. However, when macroscopic examination is required to detect neoplasms in certain tissues (e.g., harderian gland, intestine, mammary gland, and skin) before microscopic evaluation, or when neoplasms have multiple potential sites of occurrence (e.g., leukemia or lymphoma), the denominator consists of the number of animals on which a necropsy was performed.

The survival-adjusted neoplasm rate for each group and each site-specific neoplasm is also determined. This survival-adjusted rate accounts for differential mortality by assigning a reduced risk of neoplasm, proportional to a power of the fraction of time on study, to animals that do not reach terminal sacrifice.

Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The Poly-k test (Bailer and Portier 1988; Portier and Bailer 1989; Piegorsch and Bailer 1997) is used to assess neoplasm and nonneoplastic lesion prevalence. This test is a survival-adjusted quantal-response procedure that modifies the Cochran-Armitage linear trend test to take survival differences into account. More specifically, this method modifies the denominator in the quantal estimate of lesion incidence to approximate more closely the total number of animal years at risk. For analysis of a given site, each animal is assigned a risk weight. This value is one if the animal had a lesion at that site or if it survived until terminal sacrifice; if the animal died prior to terminal sacrifice and did not have a lesion at that site, its risk weight is the fraction of the entire study time that it survived, raised to the kth power.
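The risk-weight rule in the preceding paragraph translates directly into code. The sketch below is illustrative only: the function names, the 730-day study length, and the five-animal group are assumptions. It computes the weights and the resulting survival-adjusted rate (lesion-bearing animals divided by the sum of risk weights), but not the test statistic or its variance adjustment.

```python
def poly_k_weights(days_on_study, has_lesion, study_length, k=3):
    """Risk weight per animal for one anatomic site."""
    weights = []
    for t, lesion in zip(days_on_study, has_lesion):
        if lesion or t >= study_length:
            # Lesion at the site, or survived to terminal sacrifice
            weights.append(1.0)
        else:
            # Early death without the lesion: fraction of study survived, to the kth power
            weights.append((t / study_length) ** k)
    return weights

def survival_adjusted_rate(days_on_study, has_lesion, study_length, k=3):
    """Lesion-bearing animals divided by the sum of risk weights."""
    w = poly_k_weights(days_on_study, has_lesion, study_length, k)
    return sum(has_lesion) / sum(w)

# Hypothetical group: 730-day study, five animals
days = [730, 730, 365, 500, 730]
lesion = [False, True, False, True, False]
weights = poly_k_weights(days, lesion, 730)
rate = survival_adjusted_rate(days, lesion, 730)
print(weights, round(rate, 4))
```

The animal that died lesion-free at day 365 contributes a weight of 0.5 cubed, or 0.125, so the adjusted denominator is 4.125 rather than 5.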

This method yields a lesion prevalence rate that depends only upon the choice of a shape parameter for a Weibull hazard function describing cumulative lesion incidence over time (Bailer and Portier 1988). A value of k=3 is typically used in the analysis of site-specific lesions. This value was recommended by Bailer and Portier (1988) following an evaluation of neoplasm onset time distributions for a variety of site-specific neoplasms in control F344 rats and B6C3F1 mice (Portier et al. 1986). Bailer and Portier (1988) showed that the Poly-3 test gave valid results if the true value of k was anywhere in the range from 1 to 5. A further advantage of the Poly-3 method is that it does not require lesion lethality assumptions. Variation introduced by the use of risk weights, which reflect differential mortality, is accommodated by adjusting the variance of the Poly-3 statistic as recommended by Bieler and Williams (1993).

Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend. Continuity-corrected Poly-3 tests are used in the analysis of lesion incidence, and reported P values are one sided. Significantly lower incidences or decreasing trends in lesions are also identified.

Analysis of Continuous Variables

Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data, which historically have approximately normal distributions, are analyzed with the parametric multiple comparison procedures of Dunnett (1955) and Williams (1971, 1972). Hematology, clinical chemistry, urinalysis, urine concentrating ability, cell proliferation, tissue concentrations, litter size, sperm counts and concentration, and estrous cycle counts and durations are analyzed using the nonparametric multiple comparison methods of Shirley (1977) (as modified by Williams 1986) and Dunn (1964) since these endpoints typically have skewed distributions. Jonckheere's test (Jonckheere 1954) is used to assess the significance of the dose-related trends and to determine whether a trend-sensitive test (Williams' or Shirley's test) is more appropriate for pairwise comparisons than a test that does not assume a monotonic dose-related trend (Dunnett's or Dunn's test).
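For readers unfamiliar with Jonckheere's trend test, the following is a minimal sketch of the statistic and its large-sample normal approximation. It assumes groups are listed in order of increasing dose, applies no tie correction to the variance, and reports a two-sided P value. These choices, the function name `jonckheere`, and the example data are illustrative assumptions rather than a description of NTP's software.

```python
import math

def jonckheere(groups):
    """groups: list of lists of observations, ordered by increasing dose."""
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j += 1.0      # pair ordered in the direction of dose
                    elif x == y:
                        j += 0.5      # ties count one half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Null mean and variance (no correction for ties)
    mean = (n * n - sum(m * m for m in sizes)) / 4
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72
    z = (j - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return j, z, p_two_sided

# Hypothetical endpoint measured in a control group and two dose groups
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
j_stat, z_score, p_value = jonckheere(groups)
print(j_stat, z_score, round(p_value, 4))
```

With groups this small an exact or permutation reference distribution would normally be preferred; the normal approximation is used here only to keep the sketch short.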

Prior to statistical analysis, extreme values identified by the outlier test of Dixon and Massey (1951) are examined by NTP personnel, and implausible values are eliminated from the analysis.

Analysis of Gestational and Fertility Indices

The significance of trends in gestational and fertility indices across dose groups is tested using Cochran-Armitage trend tests. Pairwise comparisons of each dosed group with the control group are conducted using the Fisher exact test.
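A Cochran-Armitage trend statistic for a binary index can be sketched as follows. The dose scores, group sizes, and counts are hypothetical, and the two-sided normal P value is an assumption, since the text does not state the sidedness used for these indices.

```python
import math

def cochran_armitage(scores, totals, responders):
    """Linear trend test for proportions across ordered dose groups.

    scores     -- dose score for each group (for example 0, 1, 2)
    totals     -- number of animals evaluated in each group
    responders -- number of animals with the outcome in each group
    """
    n_all = sum(totals)
    p = sum(responders) / n_all               # pooled proportion under the null
    t = sum(d * (r - n * p) for d, n, r in zip(scores, totals, responders))
    s1 = sum(n * d for d, n in zip(scores, totals))
    s2 = sum(n * d * d for d, n in zip(scores, totals))
    var = p * (1 - p) * (s2 - s1 * s1 / n_all)
    z = t / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical counts: 10 animals per group, 1, 3, and 5 with the outcome
z_trend, p_trend = cochran_armitage([0, 1, 2], [10, 10, 10], [1, 3, 5])
print(round(z_trend, 3), round(p_trend, 3))
```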

Studies Involving Litter Effects

Littermates tend to be more like each other than like fetuses/pups in other litters. Failure to account for correlation within litters leads to underestimates of variance in statistical tests, resulting in higher probabilities of Type I errors ("false positives"). Two kinds of adjustments are performed when there are multiple animals per sex from each litter:

Adjustments for correlations between littermates when testing for differences between control and dose groups or testing for dose-related trends
Adjustment of body weights of dams and pups for litter size

Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The statistical analysis of lesion incidences uses the Poly-k test to account for survival differences, with an adjustment for litter effects (Rao and Scott 1992; Fung et al. 1994).

Analysis of Continuous Variables

Pup organ weights, body weights and body temperatures historically have approximately normal distributions. To account for litter structure, these data are analyzed with mixed effects linear models, where litters are the random effect. Statistical analyses for linear trends across dose groups are performed using mixed models with dose as a continuous variable. Multiple pairwise comparisons of dose groups to control groups are performed using mixed models with dose as a categorical variable. The Dunnett-Hsu procedure (Hsu 1992) is used to adjust for multiple comparisons.

Other endpoints, such as hematology and clinical chemistry, may have skewed distributions. Trends across dose groups are analyzed using a permutation test based on the Jonckheere trend test that randomly permutes whole litters across dose groups and uses a bootstrapping procedure within the litters. Pairwise comparisons are made using a modified Wilcoxon test (Datta and Satten 2005) that incorporates litter structure. The Hommel procedure (Hommel 1988) is used to adjust for multiple comparisons.

Analysis of Time-to-Event Endpoints

Developmental time-to-event endpoints, such as time to testicular descent, vaginal opening, and balanopreputial separation (BPS), are analyzed using the Cox proportional hazards model with litter as a random effect. Weight covariates are included in the model for the vaginal opening and BPS endpoints. The Hommel procedure is used to adjust for multiple comparisons.

Analysis of Gestational and Fertility Indices

When litter effects are present, gestational and fertility indices are analyzed using the Cochran-Armitage test with the Rao-Scott modification to account for litter.
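The Rao-Scott approach works by estimating a design effect for each dose group and dividing the observed counts by it, giving an effective sample size that can be passed to a standard test. The sketch below follows the ratio-estimator form in Rao and Scott (1992) as understood here. The function name, the fallback to a design effect of 1 in degenerate cases, and the example litters are assumptions, not NTP's implementation.

```python
def rao_scott(litters):
    """Design effect and effective counts for one dose group.

    litters -- list of (responders, litter_size) pairs, one per litter
    Returns (design_effect, effective_n, effective_responders).
    """
    m = len(litters)
    x = sum(r for r, _ in litters)
    n = sum(s for _, s in litters)
    p = x / n
    if m < 2 or p in (0.0, 1.0):
        # Variance is not estimable here; assume no adjustment (an assumption)
        return 1.0, float(n), float(x)
    # Cluster-based variance estimate of the group proportion
    v = m / (m - 1) * sum((r - s * p) ** 2 for r, s in litters) / n ** 2
    # Ratio to the binomial variance that ignores litters
    deff = n * v / (p * (1 - p))
    return deff, n / deff, x / deff

# Hypothetical group: four litters of four pups, fully clustered responses
deff, n_eff, x_eff = rao_scott([(0, 4), (4, 4), (0, 4), (4, 4)])
print(round(deff, 3), round(n_eff, 3), round(x_eff, 3))
```

In this extreme example the 16 pups carry about as much information as 3 independent animals, which illustrates why ignoring litter structure understates the variance.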

Body Weight Adjustments

Fetal weight and litter size are inversely related, and accurate assessment of dose effects on body weight should be made after accounting for litter size (Romero et al. 1992). Adjusted dam body weights and adjusted pup body weights are calculated to account for litter size. For example, to calculate adjusted pup body weights, a linear model is fit to pup body weights as a function of dose and litter size. Then the estimated coefficient of litter size is used to adjust each pup body weight based on the difference between its litter size and the mean litter size. Dam body weights can be adjusted to account for total litter size, while pre-weaning pup body weights can be adjusted for live litter size. Post-weaning pup body weights are not adjusted for litter size.
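The pup body weight adjustment can be illustrated with a short sketch. It assumes dose enters the linear model as a categorical factor, in which case the least-squares coefficient for litter size equals the pooled within-dose-group slope, and it centers on the overall mean litter size. Both choices, along with the function name and the data, are assumptions made for illustration.

```python
from collections import defaultdict

def adjust_for_litter_size(dose, litter_size, weight):
    """Return (slope, adjusted_weights) for parallel lists of records."""
    by_dose = defaultdict(list)
    for d, x, y in zip(dose, litter_size, weight):
        by_dose[d].append((x, y))
    sxy = sxx = 0.0
    for records in by_dose.values():
        mx = sum(x for x, _ in records) / len(records)
        my = sum(y for _, y in records) / len(records)
        # Accumulate within-group sums of cross products and squares
        sxy += sum((x - mx) * (y - my) for x, y in records)
        sxx += sum((x - mx) ** 2 for x, _ in records)
    slope = sxy / sxx                 # estimated coefficient of litter size
    mean_size = sum(litter_size) / len(litter_size)
    adjusted = [y - slope * (x - mean_size) for x, y in zip(litter_size, weight)]
    return slope, adjusted

# Hypothetical records: weight falls 0.1 g per additional pup in the litter
dose = [0, 0, 0, 1, 1]
litter_size = [10, 12, 14, 8, 12]
weight = [6.0, 5.8, 5.6, 5.7, 5.3]
slope, adjusted = adjust_for_litter_size(dose, litter_size, weight)
print(round(slope, 3), [round(w, 2) for w in adjusted])
```

A pup from a larger-than-average litter is adjusted upward when the slope is negative, which removes the litter-size component before dose groups are compared.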

References

Bailer AJ, Portier CJ. Effects of treatment-induced mortality and tumor-induced mortality on tests for carcinogenicity in small samples. Biometrics. 1988;44(2):417-431.
Bieler GS, Williams RL. Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. Biometrics. 1993;49(3):793-801.
Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol. 1972;34(2):187-220.
Datta S, Satten GA. Rank-sum tests for clustered data. J Am Stat Assoc. 2005;100(471):908-915.
Dixon WJ, Massey FJ Jr. Introduction to statistical analysis. 1st ed. New York: McGraw-Hill; 1951. p. 145-147.
Dunn OJ. Multiple comparisons using rank sums. Technometrics. 1964;6(3):241-252.
Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50(272):1096-1121.
Fung KY, Krewski D, Rao JNK, Scott AJ. Tests for trend in developmental toxicity experiments with correlated binary data. Risk Analysis. 1994;14(4):639-648.
Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75(2):383-386.
Hsu JC. The factor analytic approach to simultaneous inference in the general linear model. J Comput Graph Stat. 1992;1:151-168.
Jonckheere AR. A distribution-free k-sample test against ordered alternatives. Biometrika. 1954;41(1/2):133-145.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. New York: Chapman and Hall; 1989. p. 126-128.
Morrison DF. Multivariate Statistical Methods. 2nd ed. New York: McGraw-Hill; 1976. p. 170-179.
Piegorsch WW, Bailer AJ. Statistics for Environmental Biology and Toxicology. London: Chapman and Hall; 1997. Section 6.3.2.
Portier CJ, Bailer AJ. Testing for increased carcinogenicity using a survival-adjusted quantal response test. Fundam Appl Toxicol. 1989;12(4):731-737.
Portier CJ, Hedges JC, Hoel DG. Age-specific models of mortality and tumor onset for historical control animals in the National Toxicology Program's carcinogenicity experiments. Cancer Res. 1986;46(9):4372-4378.
Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics. 1992;48(2):577-585.
Romero A, Villamayor F, Grau MT, Sacristan A, Ortiz JA. Relationship between fetal weight and litter size in rats: application to reproductive toxicology studies. Reprod Toxicol. 1992;6:453-456.
Shirley E. A non-parametric equivalent of Williams' test for contrasting increasing dose levels of a treatment. Biometrics. 1977;33(2):386-389.
Tarone RE. Tests for trend in life table analysis. Biometrika. 1975;62(3):679-682.
Williams DA. A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics. 1971;27(1):103-117.
Williams DA. The comparison of several dose levels with a zero dose control. Biometrics. 1972;28(2):519-531.
Williams DA. A note on Shirley's nonparametric test for comparing several dose levels with a zero-dose control. Biometrics. 1986;42(1):182-186.