# Statistical Procedures

The content on this page is a summary. For detailed information, see the Expanded Overview.

NTP uses many different statistical procedures in its studies. Some of the procedures described below are used to analyze the data presented in the Technical Reports (TRs) for the NTP 2-year toxicology/carcinogenicity studies. These TRs primarily describe the findings of studies designed to evaluate the potential health effects of long-term exposure to a test substance. Other procedures are used to estimate chemical properties, to predict whether chemicals could cause toxicity, and to reduce or replace animal use in toxicity testing.

## Overview

### Survival Analyses

The Kaplan and Meier product-limit procedure is used to estimate the probability of survival, which is presented in graphical form. Animals found dead of other than natural causes are censored from the survival analyses; animals dying from natural causes are not censored. Cox's method for testing two groups for equality and Tarone's life table test for dose-related trends are used to analyze possible dose-related effects on survival.

### Calculation of Incidence

Incidences are the numbers of animals bearing neoplasms or nonneoplastic lesions at a specific anatomic site and the numbers of animals with that site examined microscopically. A survival-adjusted neoplasm rate is also determined for each group and each site-specific neoplasm to account for differential mortality.

### Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The Poly-k test is used to assess the prevalence of neoplasms and nonneoplastic lesions. This test is a survival-adjusted procedure that takes survival differences into account. Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend.

### Analysis of Continuous Variables

Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data are analyzed with parametric multiple comparison procedures. Hematology, clinical chemistry, urinalysis, urine concentrating ability, cardiopulmonary, cell proliferation, tissue concentrations, spermatid, and epididymal spermatozoal data are analyzed using nonparametric multiple comparison methods. Jonckheere's test is used to assess the significance of dose-related trends.

### New Directions in Statistical Procedures

NTP is committed to considering new statistical procedures when necessary to produce the most accurate data analyses. One area of interest is incorporating litter effects into statistical analyses when there are multiple pups per sex per litter. Littermates tend to be more similar to each other than to fetuses or pups from other litters because they share such common features as:

- Genetics
- Maternal environment during gestation
- Shared environment during lactation
- Possibly, a shared environment into adulthood

Various statistical approaches are used, such as a modified Poly-k test that uses an effective sample size to account for within-litter correlation. Because the familiar tests cannot always be modified, an alternative method such as mixed effects logistic regression may also be used. In either case, the P values from these methods will generally be slightly larger (less significant) than if litter effects were ignored.

## Expanded Overview

### Survival Analyses

The probability of survival is estimated by the product-limit procedure of Kaplan and Meier (1958) and is presented in the form of graphs. Animals found dead of other than natural causes are censored from the survival analyses; animals dying from natural causes are not censored. Statistical analyses for possible dose-related effects on survival use Cox's method (1972) for testing two groups for equality and Tarone's life table test (1975) to identify dose-related trends. All reported P values for the survival analyses are two-sided.
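The survival calculations above can be sketched in Python. This is a minimal illustration, not NTP's software: the function names, the input layout (one time and one event flag per animal, with 1 marking a natural death and 0 marking censoring for a non-natural death or terminal sacrifice), and the dose scores are hypothetical. The trend function is the score-weighted log-rank statistic commonly used for Tarone's trend test; with two groups scored 0 and 1 it reduces to the two-sample log-rank test, which is closely related to the score test from Cox's (1972) model.

```python
import math


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- day of death, or day of censoring (non-natural death or terminal sacrifice)
    events -- 1 if the animal died of natural causes, 0 if censored
    Returns a list of (time, survival probability) at each natural-death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # animals censored at time t still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def tarone_trend(groups, scores):
    """Trend test in the style of Tarone (1975): log-rank with dose scores.

    groups -- one (times, events) pair per dose group
    scores -- dose score for each group, e.g. 0, 1, 2, 3
    Returns (Z, two-sided P value).
    """
    records = [(t, e, g) for g, (ts, es) in enumerate(groups) for t, e in zip(ts, es)]
    death_times = sorted({t for t, e, _ in records if e == 1})
    u = 0.0
    var = 0.0
    for t in death_times:
        n_g = [sum(1 for tt, _, g in records if g == k and tt >= t) for k in range(len(groups))]
        d_g = [sum(1 for tt, e, g in records if g == k and tt == t and e == 1)
               for k in range(len(groups))]
        n, d = sum(n_g), sum(d_g)
        # score-weighted observed minus expected deaths at this time
        u += sum(s * (dk - d * m / n) for s, dk, m in zip(scores, d_g, n_g))
        if n > 1:
            mean_s = sum(s * m for s, m in zip(scores, n_g)) / n
            spread = sum(s * s * m for s, m in zip(scores, n_g)) / n - mean_s ** 2
            var += d * (n - d) / (n - 1) * spread
    z = u / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

For example, `kaplan_meier([100, 200, 200, 300, 730, 730], [1, 1, 0, 1, 0, 0])` gives survival estimates of 5/6, 2/3, and 4/9 at days 100, 200, and 300; the animal censored on day 200 is still counted as at risk on that day.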

### Calculation of Incidence

The incidences of neoplasms or nonneoplastic lesions are the numbers of animals bearing such lesions at a specific anatomic site and the numbers of animals with that site examined microscopically. For calculation of statistical significance, the incidences of most neoplasms and all nonneoplastic lesions are the numbers of animals affected at each site examined microscopically.

However, when macroscopic examination is required to detect neoplasms in certain tissues (e.g., harderian gland, intestine, mammary gland, and skin) before microscopic evaluation, or when neoplasms have multiple potential sites of occurrence (e.g., leukemia or lymphoma), the denominators consist of the number of animals on which a necropsy was performed. The survival-adjusted neoplasm rate for each group and each site-specific neoplasm is also determined. This survival-adjusted rate accounts for differential mortality by assigning a reduced risk of neoplasm, proportional to a power of the fraction of time on study, to animals that do not reach terminal sacrifice.

### Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The Poly-k test (Bailer and Portier 1988; Portier and Bailer 1989; Piegorsch and Bailer 1997) is used to assess neoplasm and nonneoplastic lesion prevalence. This test is a survival-adjusted quantal-response procedure that modifies the Cochran-Armitage linear trend test to take survival differences into account. More specifically, this method modifies the denominator in the quantal estimate of lesion incidence to approximate more closely the total number of animal years at risk. For analysis of a given site, each animal is assigned a risk weight. This value is one if the animal had a lesion at that site or if it survived until terminal sacrifice; if the animal died prior to terminal sacrifice and did not have a lesion at that site, its risk weight is the fraction of the entire study time that it survived, raised to the kth power.
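The risk-weight rule above can be sketched as follows. This is a hypothetical illustration: the `Animal` record, the function names, and the 730-day study length in the usage example are assumptions, and the sketch stops at the adjusted incidence rate; it does not compute the Poly-3 test statistic or its Bieler and Williams (1993) variance adjustment.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    days_on_study: float  # day of death, or day of terminal sacrifice
    has_lesion: bool      # lesion present at the site being analyzed


def poly_k_weight(animal, study_days, k=3.0):
    """Risk weight: 1 for animals with the lesion or that reached terminal
    sacrifice; otherwise the fraction of study time survived, raised to k."""
    if animal.has_lesion or animal.days_on_study >= study_days:
        return 1.0
    return (animal.days_on_study / study_days) ** k


def poly_k_rate(animals, study_days, k=3.0):
    """Survival-adjusted lesion rate: lesion count over the summed risk weights."""
    lesions = sum(1 for a in animals if a.has_lesion)
    adjusted_n = sum(poly_k_weight(a, study_days, k) for a in animals)
    return lesions / adjusted_n
```

In a hypothetical group of four animals on a 730-day study, two with the lesion, one lesion-free animal dying on day 365 (weight 0.5³ = 0.125), and one lesion-free survivor, the adjusted denominator is 3.125 rather than 4, so the Poly-3 rate is 2/3.125 = 0.64 instead of the crude 0.5.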

This method yields a lesion prevalence rate that depends only upon the choice of a shape parameter for a Weibull hazard function describing cumulative lesion incidence over time (Bailer and Portier 1988). A value of k=3 is typically used in the analysis of site-specific lesions. This value was recommended by Bailer and Portier (1988) following an evaluation of neoplasm onset time distributions for a variety of site-specific neoplasms in control F344 rats and B6C3F1 mice (Portier et al. 1986). Bailer and Portier (1988) showed that the Poly-3 test gave valid results if the true value of k was anywhere in the range from 1 to 5. A further advantage of the Poly-3 method is that it does not require lesion lethality assumptions. Variation introduced by the use of risk weights, which reflect differential mortality, is accommodated by adjusting the variance of the Poly-3 statistic as recommended by Bieler and Williams (1993).

Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend. Continuity-corrected Poly-3 tests are used in the analysis of lesion incidence, and reported P values are one-sided. The significance of lower incidences or decreasing trends in lesions is also identified.

### Analysis of Continuous Variables

Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data, which historically have approximately normal distributions, are analyzed with the parametric multiple comparison procedures of Dunnett (1955) and Williams (1971, 1972). Hematology, clinical chemistry, urinalysis, urine concentrating ability, cardiopulmonary, cell proliferation, tissue concentrations, spermatid, and epididymal spermatozoal data, which typically have skewed distributions, are analyzed using the nonparametric multiple comparison methods of Shirley (1977; as modified by Williams 1986) and Dunn (1964). Jonckheere's test (Jonckheere 1954) is used to assess the significance of dose-related trends and to determine whether a trend-sensitive test (Williams' or Shirley's test) is more appropriate for pairwise comparisons than a test that does not assume a monotonic dose-related trend (Dunnett's or Dunn's test).
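Jonckheere's statistic can be sketched in a few lines of Python. This sketch assumes groups are supplied in order of increasing dose and uses the large-sample normal approximation without a correction for tied values; laboratory data often contain ties, and an exact or tie-corrected version may be preferable in practice. The function name is hypothetical, and the sketch does not encode the rule NTP uses to choose between trend-sensitive and non-trend pairwise tests.

```python
import math


def jonckheere(groups):
    """Jonckheere trend statistic for groups ordered by increasing dose.

    Counts, over every pair of groups (lower dose i, higher dose j), how many
    (x from i, y from j) pairs have x < y; ties count one half.
    Returns (J, Z, one-sided P for an increasing trend) using the normal
    approximation without a correction for ties.
    """
    j_stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(var)
    p_increasing = 0.5 * math.erfc(z / math.sqrt(2))
    return j_stat, z, p_increasing
```

For three perfectly ordered groups, `jonckheere([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives J = 27 against a null mean of 13.5 and a standard deviation of 4.5, so Z = 3.0. The one-sided P value for a decreasing trend is `0.5 * math.erfc(-z / math.sqrt(2))`.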

Prior to statistical analysis, extreme values identified by the outlier test of Dixon and Massey (1951) are examined by NTP personnel, and implausible values are eliminated from the analysis.
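A minimal sketch of a Dixon-type screen follows, assuming the simplest gap-to-range ratio (often called r10). Dixon's procedure uses different ratios and critical values depending on sample size, so this sketch takes the critical value as an input to be looked up in a published table; the value 0.8 in the test is a placeholder, not a tabulated value. Consistent with the text, flagged values are only candidates for review by personnel, not automatic deletions. The function names are hypothetical.

```python
def dixon_r10(values):
    """Dixon gap-to-range ratios for the smallest and largest observations.

    Returns (ratio for the minimum, ratio for the maximum).
    """
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("need at least three observations")
    spread = x[-1] - x[0]
    if spread == 0:
        return 0.0, 0.0  # all values equal: nothing is extreme
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


def flag_for_review(values, critical):
    """Return extreme values whose ratio exceeds the supplied critical value.

    Flagged values are candidates for review, not automatic deletions.
    """
    low_ratio, high_ratio = dixon_r10(values)
    x = sorted(values)
    flagged = []
    if low_ratio > critical:
        flagged.append(x[0])
    if high_ratio > critical:
        flagged.append(x[-1])
    return flagged
```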

Because vaginal cytology data are proportions (the proportion of the observation period that an animal was in a given estrous stage), an arcsine transformation is used to bring the data into closer conformance with a normality assumption. Treatment effects are investigated by applying a multivariate analysis of variance (Morrison 1976) to the transformed data to test for simultaneous equality of measurements across exposure concentrations.
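The transformation step can be sketched as follows. The text says only "an arcsine transformation"; this sketch assumes the common arcsine square-root form, arcsin(√p). The stage names and the daily-record input format are hypothetical, and the multivariate analysis of variance applied afterward is not shown.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def stage_proportions(daily_stages, stages=("diestrus", "proestrus", "estrus", "metestrus")):
    """Fraction of observation days an animal spent in each estrous stage."""
    total = len(daily_stages)
    return {s: sum(1 for d in daily_stages if d == s) / total for s in stages}
```

Each animal's transformed proportions, for example `{s: arcsine_sqrt(p) for s, p in stage_proportions(days).items()}`, would form one observation vector for the multivariate analysis.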

### Analysis of Gestational and Fertility Indices

Trends in gestational and fertility indices across dose groups are tested for significance using Cochran-Armitage trend tests. Pairwise comparisons of each dosed group with the control group are conducted using Fisher's exact test.
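Both tests can be sketched with the Python standard library (`math.comb` requires Python 3.8 or later). The function names and the choice of returning both one-sided Fisher tails are assumptions for illustration; the text does not state which direction or sidedness is reported for these indices. The trend test below uses the usual large-sample normal form and fails with a division by zero if every animal or no animal has the outcome.

```python
import math


def cochran_armitage(successes, totals, scores):
    """Cochran-Armitage test for a linear trend in proportions.

    Returns (Z, two-sided P). Positive Z means the proportion rises with score.
    """
    n = sum(totals)
    p_bar = sum(successes) / n
    t_stat = sum(d * (x - m * p_bar) for d, x, m in zip(scores, successes, totals))
    s1 = sum(m * d for d, m in zip(scores, totals))
    s2 = sum(m * d * d for d, m in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def fisher_exact_tails(x_ctrl, n_ctrl, x_dose, n_dose):
    """One-sided Fisher exact P values for a dosed group versus control.

    Returns (P for a higher proportion in the dosed group,
             P for a lower proportion in the dosed group).
    """
    total_x = x_ctrl + x_dose
    denom = math.comb(n_ctrl + n_dose, total_x)

    def prob(k):  # hypergeometric probability of k successes in the dosed group
        return math.comb(n_dose, k) * math.comb(n_ctrl, total_x - k) / denom

    lo, hi = max(0, total_x - n_ctrl), min(total_x, n_dose)
    p_greater = sum(prob(k) for k in range(x_dose, hi + 1))
    p_less = sum(prob(k) for k in range(lo, x_dose + 1))
    return p_greater, p_less
```

For a fertility index such as the fraction of mated females that became pregnant, a decrease in the dosed group would correspond to the second value returned by `fisher_exact_tails` and to a negative Z from `cochran_armitage`.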

## Future Directions Involving Litter Effects

Two kinds of adjustments are important to consider when there are multiple animals per sex from each litter: (1) accounting for correlations between littermates when testing for differences between control and dose groups or for dose-related trends, and (2) adjusting body weights of dams and pups for litter size. Methods for 28-day studies, in which all of the animals survive, do not use survival adjustment (i.e., the Poly-k test would not apply), but these studies could still have litter effects. When littermates are involved but no survival adjustment is needed, a Cochran-Armitage (C-A) test modified to accommodate litter effects would be used.

### Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The statistical analysis of lesion incidences uses the Poly-k test to account for survival differences, with an adjustment for litter effects (Rao and Scott 1992). Litter effects arise when littermates are more similar to each other than they are to animals from other litters. If intra-litter correlations are present but ignored in the statistical analysis, the variances of the estimated incidence rates will be underestimated, leading to P values that are too small.
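The Rao and Scott (1992) idea can be sketched for the simpler case without survival adjustment: each dose group's counts are divided by a design effect, the ratio of a litter-based variance estimate to the binomial variance, and the resulting effective counts (possibly non-integer) are then supplied to an ordinary Cochran-Armitage trend test. The function name and input layout are hypothetical. A Poly-k version would replace animal counts with risk weights, which this sketch does not do, and some implementations floor the design effect at 1, which this sketch also does not do.

```python
def rao_scott_effective(litters):
    """Rao-Scott design-effect adjustment for one dose group.

    litters -- list of (affected, litter_size) pairs, one per litter
    Returns (design_effect, effective_affected, effective_n).
    """
    m = len(litters)
    x = sum(a for a, _ in litters)
    n = sum(s for _, s in litters)
    p = x / n
    if m < 2 or p in (0.0, 1.0):
        return 1.0, float(x), float(n)  # clustering not estimable: leave counts as is
    # litter-based (ratio-estimate) variance of the group proportion
    v = m / (m - 1) / n ** 2 * sum((a - p * s) ** 2 for a, s in litters)
    d = v / (p * (1 - p) / n)  # design effect: litter-based over binomial variance
    return d, x / d, n / d
```

In a hypothetical group of four two-pup litters where two litters are fully affected and two are unaffected, the design effect is 8/3, so 4 affected out of 8 pups shrinks to an effective 1.5 out of 3, reflecting how little independent information the correlated littermates carry.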

### Analysis of Continuous Variables

It is important for the statistical analyses to account for litter structure when there is more than one pup per sex per litter in a study. Pup organ weights, body weights, and body temperatures historically have approximately normal distributions. To account for litter structure, these data are analyzed with mixed effects linear models, where litters are the random effect. Statistical analyses for linear trends across dose groups are performed using mixed models with dose as a continuous variable. Multiple pairwise comparisons of dose groups to control groups are performed using mixed models with dose as a categorical variable. These pairwise tests can be conducted using the mixed model approach with Williams' test if there is a trend across dose groups and Dunnett's test if there is not a trend.

Other endpoints, such as hematology and clinical chemistry, may have skewed distributions. Trends across dose groups for these endpoints are tested using a permutation test based on the Jonckheere trend test that randomly permutes whole litters across dose groups and uses a bootstrapping procedure within litters. Pairwise comparisons are made using a modified Wilcoxon test (Datta and Satten 2005) that incorporates litter structure. The Hommel procedure (Hommel 1988) is used to adjust for multiple comparisons.
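The Hommel step can be sketched as a reject/retain decision at a familywise level alpha. This follows the standard statement of Hommel's (1988) procedure: find the largest j for which every p_(n-j+k) exceeds kα/j (k = 1, ..., j); if no such j exists, reject every hypothesis, otherwise reject those with p ≤ α/j. The function name is hypothetical, and the sketch returns decisions rather than Hommel-adjusted P values.

```python
def hommel_reject(pvalues, alpha=0.05):
    """Hommel (1988) multiple-test procedure at familywise level alpha.

    Returns a list of booleans (True = reject) in the input order.
    """
    n = len(pvalues)
    p = sorted(pvalues)
    big_j = None
    # largest j such that p_(n-j+k) > k*alpha/j for every k = 1..j
    for j in range(n, 0, -1):
        if all(p[n - j + k - 1] > k * alpha / j for k in range(1, j + 1)):
            big_j = j
            break
    if big_j is None:
        return [True] * n  # no such j: every hypothesis is rejected
    return [pv <= alpha / big_j for pv in pvalues]
```

With P values of 0.01, 0.04, 0.06, and 0.5 from four dose-versus-control comparisons, the largest qualifying j is 3, so only the comparison with P = 0.01 (below 0.05/3) is declared significant.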

### Body Weight Adjustments

Adjusted dam body weights and adjusted pup body weights are calculated to account for litter size. Dam body weights can be adjusted to account for total litter size, while pre-weaning pup body weights can be adjusted for live litter size. Post-weaning pup body weights are not adjusted for litter size.

## References

- Bailer AJ, Portier CJ. Effects of treatment-induced mortality and tumor-induced mortality on tests for carcinogenicity in small samples. Biometrics. 1988;44(2):417-431.
- Bieler GS, Williams RL. Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. Biometrics. 1993;49(3):793-801.
- Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol. 1972;34(2):187-220.
- Datta S, Satten GA. Rank-sum tests for clustered data. J Am Stat Assoc. 2005;100(471):908-915.
- Dixon WJ, Massey FJ Jr. Introduction to statistical analysis. 1st ed. New York: McGraw-Hill; 1951. p. 145-147.
- Dunn OJ. Multiple comparisons using rank sums. Technometrics. 1964;6(3):241-252.
- Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50(272):1096-1121.
- Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75(2):383-386.
- Jonckheere AR. A distribution-free k-sample test against ordered alternatives. Biometrika. 1954;41(1/2):133-145.
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
- Morrison DF. Multivariate statistical methods. 2nd ed. New York: McGraw-Hill; 1976. p. 170-179.
- Piegorsch WW, Bailer AJ. Statistics for environmental biology and toxicology. London: Chapman and Hall; 1997. Section 6.3.2.
- Portier CJ, Bailer AJ. Testing for increased carcinogenicity using a survival-adjusted quantal response test. Fundam Appl Toxicol. 1989;12(4):731-737.
- Portier CJ, Hedges JC, Hoel DG. Age-specific models of mortality and tumor onset for historical control animals in the National Toxicology Program's carcinogenicity experiments. Cancer Res. 1986;46(9):4372-4378.
- Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics. 1992;48(2):577-585.
- Shirley E. A non-parametric equivalent of Williams' test for contrasting increasing dose levels of a treatment. Biometrics. 1977;33(2):386-389.
- Tarone RE. Tests for trend in life table analysis. Biometrika. 1975;62(3):679-682.
- Williams DA. A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics. 1971;27(1):103-117.
- Williams DA. The comparison of several dose levels with a zero dose control. Biometrics. 1972;28(2):519-531.
- Williams DA. A note on Shirley's nonparametric test for comparing several dose levels with a zero-dose control. Biometrics. 1986;42(1):182-186.