Statistical Procedures
The content on this page is a summary. For detailed information, please see the expanded overview or the Related Links.
NTP uses a variety of statistical procedures to:
 Analyze data produced by two-year toxicology/carcinogenicity studies
 Develop estimates of chemical properties
 Predict the toxicological effects of certain chemicals
 Reduce or replace the use of animals for toxicity testing
Survival
We use the Kaplan-Meier product-limit procedure, presented in graphical form, to estimate the probability of survival. Dose-related trends are identified with Tarone's life table test, and dose-related effects on survival are assessed using a Cox proportional hazards model.
Neoplasm and Nonneoplastic Lesion Incidences
We determine incidence rates based on the numbers of animals bearing neoplasms or nonneoplastic lesions at a specific anatomic site, as well as the numbers of animals with that site examined microscopically. The Poly-k test, a survival-adjusted procedure that takes survival differences into account, is used to assess the effect of dose on the prevalence of neoplasms and nonneoplastic lesions. Other tests of significance include pairwise comparisons of each exposed group with controls, and a test for overall exposure-related trends.
Continuous Variables
We employ two approaches to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data are analyzed with parametric multiple comparison procedures. Hematology, clinical chemistry, urinalysis, urine concentrating ability, cell proliferation, tissue concentrations, litter size, estrous cycle counts and durations, and sperm counts and concentration are analyzed using nonparametric multiple comparison methods.
Litter Effects
Incorporating litter effects into statistical analyses, when there are multiple pups per sex per litter, is one example of how NTP is committed to implementing new statistical procedures to produce more accurate data analyses. Littermates tend to be more like one another than fetuses/pups in other litters, sharing such common features as:
 Genetics
 Maternal environments during gestation
 Shared environments during lactation
 Possibly shared environments into adulthood
If litter effects are ignored, within-litter correlation leads to underestimates of variance in statistical tests, resulting in higher probabilities of Type I errors ("false positives"). We use various statistical approaches, including a modified version of the Poly-k test that incorporates the within-litter correlation and the effective sample size, as well as mixed-effects logistic regression, to account for between-litter variability.
Expanded Overview
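Subject | Procedure | Purpose
Survival | Kaplan-Meier product-limit procedure | Probability of survival
Survival | Tarone's life table test | Dose-related trends
Survival | Cox proportional hazards model | Dose-related effects
Neoplasm and Nonneoplastic Lesion Incidences | Poly-k test | Effect of dose on prevalence of neoplasms and nonneoplastic lesions after adjusting for survival
Continuous Variables | Parametric (Dunnett, Williams) and nonparametric (Shirley, Dunn) multiple comparison procedures; Jonckheere's trend test | Significance of pairwise comparisons between exposed and control groups and dose-related trends
Litter Effects | Rao-Scott adjustment | Adjust Poly-k test for correlations between littermates
Litter Effects | Mixed-effects models; litter-based permutation and modified Wilcoxon tests | Adjust analyses of continuous variables for litter effects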
Survival Analyses
The probability of survival is estimated by the product-limit procedure of Kaplan and Meier (1958) and is presented graphically. Animals surviving to the end of the observation period are treated as censored observations, as are animals dying from unnatural causes within the observation period. Animals dying from natural causes are included in analyses and are treated as uncensored observations. Dose-related trends are identified with Tarone's life table test (1975), and pairwise dose-related effects on survival are assessed using a Cox proportional hazards model (1972). All reported P values for the survival analyses are two-sided.
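The product-limit estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than NTP's implementation: the function name `kaplan_meier`, the input layout, and the five-animal data set are all hypothetical, and deaths are assumed to be processed before censorings that fall on the same day.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(day, survival_probability)] at each distinct death day.

    times  -- day on study at which each animal left the study
    events -- True for an uncensored (natural) death, False if censored
              (terminal sacrifice or death from unnatural causes)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # all exits per day, deaths and censorings
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving day t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical group of 5 animals; day 730 is terminal sacrifice (censored).
days = [400, 550, 550, 730, 730]
died = [True, True, False, False, False]
curve = kaplan_meier(days, died)
```

In this example the estimate drops to 0.8 after the death on day 400 and to 0.6 after the death on day 550; the censored animals only reduce the number at risk.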
Calculation of Incidence
The incidences of neoplasms or nonneoplastic lesions are the numbers of animals bearing such lesions at a specific anatomic site. For calculation of the proportion of incidence, the denominator for most neoplasms and all nonneoplastic lesions is the number of animals in which the site was examined microscopically. However, when macroscopic examination is required to detect neoplasms in certain tissues (e.g., Harderian gland, intestine, mammary gland, and skin) before microscopic evaluation, or when neoplasms have multiple potential sites of occurrence (e.g., leukemia or lymphoma), the denominator consists of the number of animals on which a necropsy was performed.
The survival-adjusted neoplasm rate for each group and each site-specific neoplasm is also determined. This survival-adjusted rate accounts for differential mortality by assigning a reduced risk of neoplasm, proportional to a power of the fraction of time on study, to animals that do not reach terminal sacrifice.
Analysis of Neoplasm and Nonneoplastic Lesion Incidences
The Poly-k test (Bailer and Portier 1988; Portier and Bailer 1989; Piegorsch and Bailer 1997) is used to assess neoplasm and nonneoplastic lesion prevalence. This test is a survival-adjusted quantal-response procedure that modifies the Cochran-Armitage linear trend test to take survival differences into account. More specifically, this method modifies the denominator in the quantal estimate of lesion incidence to approximate more closely the total number of animal-years at risk. For analysis of a given site, each animal is assigned a risk weight. This value is one if the animal had a lesion at that site or if it survived until terminal sacrifice; if the animal died prior to terminal sacrifice and did not have a lesion at that site, its risk weight is the fraction of the entire study time that it survived, raised to the kth power.
This method yields a lesion prevalence rate that depends only upon the choice of a shape parameter for a Weibull hazard function describing cumulative lesion incidence over time (Bailer and Portier 1988). A value of k=3 is typically used in the analysis of site-specific lesions. This value was recommended by Bailer and Portier (1988) following an evaluation of neoplasm onset time distributions for a variety of site-specific neoplasms in control F344 rats and B6C3F1 mice (Portier et al. 1986). Bailer and Portier (1988) showed that the Poly-3 test gave valid results if the true value of k was anywhere in the range from 1 to 5. A further advantage of the Poly-3 method is that it does not require lesion lethality assumptions. Variation introduced by the use of risk weights, which reflect differential mortality, is accommodated by adjusting the variance of the Poly-3 statistic as recommended by Bieler and Williams (1993).
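The risk weights and the resulting survival-adjusted rate can be illustrated with a short Python sketch. It covers only the weighting step; the trend test itself and the Bieler-Williams variance adjustment are not shown. The function name `poly_k_rate`, the tuple layout, and the four-animal group are hypothetical.

```python
def poly_k_rate(animals, study_days, k=3):
    """Survival-adjusted lesion rate for one dose group.

    animals -- list of (day_left_study, has_lesion, reached_terminal_sacrifice)
    Returns (adjusted_rate, adjusted_number_at_risk).
    """
    adjusted_n = 0.0
    lesions = 0
    for day, has_lesion, terminal in animals:
        if has_lesion or terminal:
            weight = 1.0                        # full risk weight
        else:
            weight = (day / study_days) ** k    # early death without a lesion
        adjusted_n += weight
        lesions += 1 if has_lesion else 0
    return lesions / adjusted_n, adjusted_n

# Hypothetical 730-day study with four animals in one group.
group = [
    (730, False, True),   # survived to terminal sacrifice, no lesion
    (730, True, True),    # survived to terminal sacrifice, lesion
    (365, False, False),  # died at mid-study, no lesion: weight 0.5**3
    (500, True, False),   # died early with a lesion: weight 1
]
rate, n_adj = poly_k_rate(group, study_days=730)
```

Here the adjusted denominator is 3.125 rather than 4, so the adjusted rate is 2/3.125 = 0.64 instead of the unadjusted 0.5.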
Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend. Continuity-corrected Poly-3 tests are used in the analysis of lesion incidence, and reported P values are one-sided. The significance of lower incidences or decreasing trends in lesions is also identified.
Analysis of Continuous Variables
Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data, which historically have approximately normal distributions, are analyzed with the parametric multiple comparison procedures of Dunnett (1955) and Williams (1971, 1972). Hematology, clinical chemistry, urinalysis, urine concentrating ability, cell proliferation, tissue concentrations, litter size, sperm counts and concentration, and estrous cycle counts and durations are analyzed using the nonparametric multiple comparison methods of Shirley (1977) (as modified by Williams 1986) and Dunn (1964), since these endpoints typically have skewed distributions. Jonckheere's test (Jonckheere 1954) is used to assess the significance of the dose-related trends and to determine whether a trend-sensitive test (Williams' or Shirley's test) is more appropriate for pairwise comparisons than a test that does not assume a monotonic dose-related trend (Dunnett's or Dunn's test).
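As a rough illustration of the trend test mentioned above, the following Python sketch computes Jonckheere's statistic with a large-sample normal approximation. It assumes no correction for ties in the variance, and the function name `jonckheere` and the data are hypothetical. The significance threshold used to choose between trend-sensitive and non-trend tests is not stated here, so it is left out.

```python
import math

def jonckheere(groups):
    """groups: list of samples ordered by increasing dose.

    Returns (J, z, two_sided_p) using the normal approximation.
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j += 1.0      # pair ordered in the direction of dose
                    elif x == y:
                        j += 0.5      # ties count one half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (j - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return j, z, p

# Hypothetical control, low-dose, and high-dose measurements.
j_stat, z_stat, p_value = jonckheere([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With these perfectly ordered groups, J = 27 against a null mean of 13.5 and standard deviation of 4.5, giving z = 3.0.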
Prior to statistical analysis, extreme values identified by the outlier test of Dixon and Massey (1951) are examined by NTP personnel, and implausible values are eliminated from the analysis.
Analysis of Gestational and Fertility Indices
The significance of trends in gestational and fertility indices across dose groups is tested using Cochran-Armitage trend tests. Pairwise comparisons of each dosed group with the control group are conducted using the Fisher exact test.
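Both tests can be written compactly for illustration. The sketch below assumes equally spaced dose scores and a one-sided Fisher exact test for an increase in the dosed group; neither choice is specified above, and the function names and counts are hypothetical.

```python
import math

def cochran_armitage_z(scores, totals, affected):
    """Cochran-Armitage trend statistic (z) for proportions across dose groups."""
    n = sum(totals)
    p = sum(affected) / n
    t = sum(x * d for x, d in zip(affected, scores)) \
        - p * sum(m * d for m, d in zip(totals, scores))
    s1 = sum(m * d for m, d in zip(totals, scores))
    s2 = sum(m * d * d for m, d in zip(totals, scores))
    var = p * (1.0 - p) * (s2 - s1 * s1 / n)
    return t / math.sqrt(var)

def fisher_exact_greater(ctrl_hits, ctrl_n, dose_hits, dose_n):
    """One-sided P value: probability of at least dose_hits in the dosed group."""
    total_hits = ctrl_hits + dose_hits
    n = ctrl_n + dose_n
    denom = math.comb(n, dose_n)
    upper = min(total_hits, dose_n)
    return sum(
        math.comb(total_hits, x) * math.comb(n - total_hits, dose_n - x)
        for x in range(dose_hits, upper + 1)
    ) / denom

# Hypothetical counts: 1/10, 3/10, and 5/10 affected in three groups.
z_trend = cochran_armitage_z([0, 1, 2], [10, 10, 10], [1, 3, 5])
p_pair = fisher_exact_greater(1, 10, 5, 10)   # control vs. highest dose
```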
Studies Involving Litter Effects
Littermates tend to be more like each other than like fetuses/pups in other litters. Failure to account for correlation within litters leads to underestimates of variance in statistical tests, resulting in higher probabilities of Type I errors ("false positives"). Two kinds of adjustments are performed when there are multiple animals per sex from each litter:
 Adjustments for correlations between littermates when testing for differences between control and dose groups or testing for dose-related trends
 Adjustment of body weights of dams and pups for litter size
Analysis of Neoplasm and Nonneoplastic Lesion Incidences
The statistical analysis of lesion incidences uses the Poly-k test to account for survival differences, with an adjustment for litter effects (Rao and Scott 1992; Fung et al. 1994).
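The idea behind the litter adjustment can be sketched with the design effect and effective sample size of Rao and Scott (1992) for clustered binary data. The example uses plain, unweighted counts for a single dose group; how the adjustment is combined with the Poly-k risk weights is not described here and is not attempted. The function name `rao_scott_effective` and the litter data are hypothetical.

```python
def rao_scott_effective(litters):
    """litters: list of (n_examined, n_with_lesion), one entry per litter.

    Returns (design_effect, effective_n, effective_lesion_count).
    """
    m = len(litters)
    n = sum(size for size, _ in litters)
    x = sum(hits for _, hits in litters)
    p = x / n
    if m < 2 or p in (0.0, 1.0):
        # Design effect is undefined; leave the counts unchanged (assumption).
        return 1.0, float(n), float(x)
    # Variance of the ratio estimate p, treating litters as the sampling units.
    v = m / (m - 1) * sum((hits - size * p) ** 2 for size, hits in litters) / n ** 2
    # Design effect: ratio of that variance to the binomial variance p(1-p)/n.
    d = n * v / (p * (1.0 - p))
    return d, n / d, x / d

# Hypothetical group: four litters of four pups, responses clustered by litter.
deff, n_eff, x_eff = rao_scott_effective([(4, 4), (4, 0), (4, 4), (4, 0)])
```

Because every pup in a litter responds identically in this extreme example, the 16 pups carry about as much information as 3 independent animals.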
Analysis of Continuous Variables
Pup organ weights, body weights, and body temperatures historically have approximately normal distributions. To account for litter structure, these data are analyzed with mixed-effects linear models, where litters are the random effect. Statistical analyses for linear trends across dose groups are performed using mixed models with dose as a continuous variable. Multiple pairwise comparisons of dose groups to control groups are performed using mixed models with dose as a categorical variable. The Dunnett-Hsu procedure (Hsu 1992) is used to adjust for multiple comparisons.
Other endpoints, such as hematology and clinical chemistry, may have skewed distributions. Trends across dose groups are analyzed using a permutation test based on the Jonckheere trend test that randomly permutes whole litters across dose groups and uses a bootstrapping procedure within the litters. Pairwise comparisons are made using a modified Wilcoxon test (Datta and Satten 2005) that incorporates litter structure. The Hommel procedure (Hommel 1988) is used to adjust for multiple comparisons.
Analysis of Time-to-Event Endpoints
Developmental time-to-event endpoints, such as time to testicular descent, vaginal opening, and balanopreputial separation (BPS), are analyzed using the Cox proportional hazards model with litter as a random effect. Weight covariates are included in the model for the vaginal opening and BPS endpoints. The Hommel procedure is used to adjust for multiple comparisons.
Analysis of Gestational and Fertility Indices
When litter effects are present, gestational and fertility indices are analyzed using the Cochran-Armitage test with the Rao-Scott modification to account for litter.
Body Weight Adjustments
Fetal weight and litter size are inversely related, so dose effects on body weight should be assessed after accounting for litter size (Romero et al. 1992). Adjusted dam body weights and adjusted pup body weights are calculated to account for litter size. For example, to calculate adjusted pup body weights, a linear model is fit to pup body weights as a function of dose and litter size. The estimated coefficient of litter size is then used to adjust each pup body weight based on the difference between its litter size and the mean litter size. Dam body weights can be adjusted to account for total litter size, while preweaning pup body weights can be adjusted for live litter size. Postweaning pup body weights are not adjusted for litter size.
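The pup body weight adjustment can be sketched as an ordinary least-squares fit followed by a shift to the mean litter size. This sketch makes several assumptions not stated above: dose enters the model as a single numeric covariate, no litter random effect is included, and the adjustment subtracts the litter-size coefficient times the deviation from the mean litter size. The function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adjust_pup_weights(weights, doses, litter_sizes):
    """Fit weight ~ intercept + dose + litter_size, then remove the litter-size effect."""
    rows = [[1.0, d, s] for d, s in zip(doses, litter_sizes)]
    # Normal equations (X'X) b = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * w for r, w in zip(rows, weights)) for i in range(3)]
    _, _, b_litter = solve(xtx, xty)
    mean_size = sum(litter_sizes) / len(litter_sizes)
    return [w - b_litter * (s - mean_size) for w, s in zip(weights, litter_sizes)]

# Hypothetical data built so each extra littermate lowers weight by 0.1 g.
doses = [0, 0, 10, 10, 100, 100]
sizes = [8, 12, 10, 14, 9, 13]
weights = [6.2, 5.8, 5.9, 5.5, 5.1, 4.7]
adjusted = adjust_pup_weights(weights, doses, sizes)
```

After adjustment to the mean litter size of 11, pups at the same dose have the same weight (about 5.9, 5.8, and 4.9 g), so the remaining differences reflect dose alone.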
References
 Bailer AJ, Portier CJ. Effects of treatment-induced mortality and tumor-induced mortality on tests for carcinogenicity in small samples. Biometrics. 1988;44(2):417-431.
 Bieler GS, Williams RL. Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. Biometrics. 1993;49(3):793-801.
 Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol. 1972;34(2):187-220.
 Datta S, Satten GA. Rank-sum tests for clustered data. J Am Stat Assoc. 2005;100(471):908-915.
 Dixon WJ, Massey FJ Jr. Introduction to statistical analysis. 1st ed. New York: McGraw-Hill; 1951. p. 145-147.
 Dunn OJ. Multiple comparisons using rank sums. Technometrics. 1964;6(3):241-252.
 Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50(272):1096-1121.
 Fung KY, Krewski D, Rao JNK, Scott AJ. Tests for trend in developmental toxicity experiments with correlated binary data. Risk Analysis. 1994;14(4):639-648.
 Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75(2):383-386.
 Hsu JC. The factor analytic approach to simultaneous inference in the general linear model. J Comput Graph Stat. 1992;1:151-168.
 Jonckheere AR. A distribution-free k-sample test against ordered alternatives. Biometrika. 1954;41(1/2):133-145.
 Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
 McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. New York: Chapman and Hall; 1989. p. 126-128.
 Morrison DF. Multivariate Statistical Methods. 2nd ed. New York: McGraw-Hill; 1976. p. 170-179.
 Piegorsch WW, Bailer AJ. Statistics for Environmental Biology and Toxicology. London: Chapman and Hall; 1997. Section 6.3.2.
 Portier CJ, Bailer AJ. Testing for increased carcinogenicity using a survival-adjusted quantal response test. Fundam Appl Toxicol. 1989;12(4):731-737.
 Portier CJ, Hedges JC, Hoel DG. Age-specific models of mortality and tumor onset for historical control animals in the National Toxicology Program's carcinogenicity experiments. Cancer Res. 1986;46(9):4372-4378.
 Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics. 1992;48(2):577-585.
 Romero A, Villamayor F, Grau MT, Sacristan A, Ortiz JA. Relationship between fetal weight and litter size in rats: application to reproductive toxicology studies. Reprod Toxicol. 1992;6:453-456.
 Shirley E. A nonparametric equivalent of Williams' test for contrasting increasing dose levels of a treatment. Biometrics. 1977;33(2):386-389.
 Tarone RE. Tests for trend in life table analysis. Biometrika. 1975;62(3):679-682.
 Williams DA. A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics. 1971;27(1):103-117.
 Williams DA. The comparison of several dose levels with a zero dose control. Biometrics. 1972;28(2):519-531.
 Williams DA. A note on Shirley's nonparametric test for comparing several dose levels with a zero-dose control. Biometrics. 1986;42(1):182-186.