# Statistical Procedures

The content on this page is a summary. For more detailed information, please see the expanded overview.

The NTP uses many different statistical procedures in its studies. The following overview describes the statistical procedures used in the analysis of data presented in the Technical Reports (TRs) for the NTP 2-year toxicology/carcinogenicity studies. These TRs primarily describe the findings of studies designed to evaluate the potential health effects of long-term exposure to a test substance.

## Overview

### Survival Analyses

The Kaplan and Meier product-limit procedure is used to estimate the probability of survival, which is presented in graphical form. Animals dying from natural causes are counted as deaths in the analyses; animals found dead of other than natural causes are censored. Cox's method for testing two groups for equality and Tarone's life table test for dose-related trends are used to analyze possible dose-related effects on survival.

### Calculation of Incidence

Incidences are the numbers of animals bearing neoplasms or nonneoplastic lesions at a specific anatomic site and the numbers of animals with that site examined microscopically. To account for differential mortality, the Poly-3 method is used to determine a survival-adjusted neoplasm rate for each group and each site-specific neoplasm.

### Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The Poly-k test is used to assess the prevalence of neoplasms and nonneoplastic lesions. This test is a survival-adjusted procedure that takes survival differences into account.

Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend.

### Analysis of Continuous Variables

Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data are analyzed with parametric multiple comparison procedures. Hematology, clinical chemistry, urinalysis, urine concentrating ability, cardiopulmonary, cell proliferation, tissue concentrations, spermatid, and epididymal spermatozoal data are analyzed using the nonparametric multiple comparison methods. Jonckheere's test is used to assess the significance of the dose-related trends.

# Expanded Overview

## Survival Analyses

The probability of survival is estimated by the product-limit procedure of Kaplan and Meier (1958) and is presented in the form of graphs. Animals found dead of other than natural causes are censored from the survival analyses; animals dying from natural causes are not censored. Statistical analyses for possible dose-related effects on survival use Cox's (1972) method for testing two groups for equality and Tarone's (1975) life table test to identify dose-related trends. All reported P values for the survival analyses are two sided.
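As an illustration of the product-limit idea, the following is a minimal sketch rather than the NTP's actual implementation. The function name `kaplan_meier`, the example data, and the choice to treat terminal-sacrifice animals as censored are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(times, natural_death):
    """Product-limit survival estimate.

    times: day on study at which each animal left the study.
    natural_death: True if the animal died of natural causes (an event);
        False if it is censored (assumed here: non-natural death or
        terminal sacrifice).
    Returns a list of (day, estimated survival probability) pairs, one per
    day on which at least one natural death occurred.
    """
    deaths = Counter(t for t, d in zip(times, natural_death) if d)
    exits = Counter(times)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Multiply by the conditional probability of surviving day t.
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        # Everyone who left on day t (death or censoring) leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical group of six animals (illustrative data only).
times = [300, 450, 450, 600, 729, 729]
natural_death = [True, False, True, True, False, False]
curve = kaplan_meier(times, natural_death)
```

Censored animals contribute to the number at risk up to the day they leave the study but never count as deaths, which is what distinguishes censoring from simple exclusion.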

## Calculation of Incidence

The incidences of neoplasms or nonneoplastic lesions are the numbers of animals bearing such lesions at a specific anatomic site and the numbers of animals with that site examined microscopically. For calculation of statistical significance, the incidences of most neoplasms and all nonneoplastic lesions are the numbers of animals affected at each site examined microscopically. However, when macroscopic examination is required to detect neoplasms in certain tissues (e.g., harderian gland, intestine, mammary gland, and skin) before microscopic evaluation, or when neoplasms have multiple potential sites of occurrence (e.g., leukemia or lymphoma), the denominators consist of the number of animals on which a necropsy was performed. The survival-adjusted neoplasm rate for each group and each site-specific neoplasm is also determined. This survival-adjusted rate (based on the Poly-3 method described below) accounts for differential mortality by assigning a reduced risk of neoplasm, proportional to the third power of the fraction of time on study, to animals that do not reach terminal sacrifice.

## Analysis of Neoplasm and Nonneoplastic Lesion Incidences

The Poly-k test (Bailer and Portier, 1988; Portier and Bailer, 1989; Piegorsch and Bailer, 1997) is used to assess neoplasm and nonneoplastic lesion prevalence. This test is a survival-adjusted quantal-response procedure that modifies the Cochran-Armitage linear trend test to take survival differences into account. More specifically, this method modifies the denominator in the quantal estimate of lesion incidence to approximate more closely the total number of animal years at risk. For analysis of a given site, each animal is assigned a risk weight. This value is one if the animal had a lesion at that site or if it survived until terminal sacrifice; if the animal died prior to terminal sacrifice and did not have a lesion at that site, its risk weight is the fraction of the entire study time that it survived, raised to the kth power.
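The risk-weight rule above can be sketched in a few lines. This is an illustrative sketch only: the function names, the 728-day study length, the example animals, and the use of "days on study at least equal to study length" as a stand-in for "survived until terminal sacrifice" are all assumptions, not part of the NTP procedure as documented here.

```python
def poly_k_weight(days_on_study, study_length, had_lesion, k=3):
    """Risk weight for one animal at one site.

    The weight is 1 if the animal had the lesion or reached the end of the
    study; otherwise it is the fraction of the study survived, raised to k.
    """
    if had_lesion or days_on_study >= study_length:
        return 1.0
    return (days_on_study / study_length) ** k


def survival_adjusted_rate(animals, study_length, k=3):
    """animals: list of (days_on_study, had_lesion) pairs for one group.

    Returns (adjusted rate, adjusted denominator), where the denominator is
    the sum of the risk weights rather than the raw number of animals.
    """
    n_lesion = sum(1 for _, lesion in animals if lesion)
    adjusted_n = sum(
        poly_k_weight(days, study_length, lesion, k) for days, lesion in animals
    )
    return n_lesion / adjusted_n, adjusted_n


# Hypothetical group: two lesion-free early deaths at half the study length.
group = [(728, False), (728, True), (364, False), (500, True), (364, False)]
rate, adjusted_n = survival_adjusted_rate(group, study_length=728)
```

In this made-up group the unadjusted incidence is 2/5, while the adjusted denominator is 3.25 because each lesion-free early death contributes only 0.5 cubed, or 0.125, of an animal at risk.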

This method yields a lesion prevalence rate that depends only upon the choice of a shape parameter for a Weibull hazard function describing cumulative lesion incidence over time (Bailer and Portier, 1988). Unless otherwise specified, a value of k=3 is used in the analysis of site-specific lesions. This value was recommended by Bailer and Portier (1988) following an evaluation of neoplasm onset time distributions for a variety of site-specific neoplasms in control F344 rats and B6C3F1 mice (Portier et al., 1986). Bailer and Portier (1988) showed that the Poly-3 test gave valid results if the true value of k was anywhere in the range from 1 to 5. A further advantage of the Poly-3 method is that it does not require lesion lethality assumptions. Variation introduced by the use of risk weights, which reflect differential mortality, is accommodated by adjusting the variance of the Poly-3 statistic as recommended by Bieler and Williams (1993).

Tests of significance include pairwise comparisons of each exposed group with controls and a test for an overall exposure-related trend. Continuity-corrected Poly-3 tests are used in the analysis of lesion incidence, and reported P values are one sided. The significance of lower incidences or decreasing trends in lesions is also identified.

## Analysis of Continuous Variables

Two approaches are employed to assess the significance of pairwise comparisons between exposed and control groups in the analysis of continuous variables. Organ and body weight data, which historically have approximately normal distributions, are analyzed with the parametric multiple comparison procedures of Dunnett (1955) and Williams (1971, 1972). Hematology, clinical chemistry, urinalysis, urine concentrating ability, cardiopulmonary, cell proliferation, tissue concentrations, spermatid, and epididymal spermatozoal data, which typically have skewed distributions, are analyzed using the nonparametric multiple comparison methods of Shirley (1977), as modified by Williams (1986), and of Dunn (1964). Jonckheere's test (Jonckheere, 1954) is used to assess the significance of the dose-related trends and to determine whether a trend-sensitive test (Williams' or Shirley's test) is more appropriate for pairwise comparisons than a test that does not assume a monotonic dose-related trend (Dunnett's or Dunn's test).

Prior to statistical analysis, extreme values identified by the outlier test of Dixon and Massey (1951) are examined by NTP personnel, and implausible values are eliminated from the analysis. Average severity values are analyzed for significance with the Mann-Whitney U test (Hollander and Wolfe, 1973).

Because vaginal cytology data are proportions (the proportion of the observation period that an animal was in a given estrous stage), an arcsine transformation is used to bring the data into closer conformance with a normality assumption. Treatment effects are investigated by applying a multivariate analysis of variance (Morrison, 1976) to the transformed data to test for simultaneous equality of measurements across exposure concentrations.
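To make the trend test mentioned above concrete, here is a minimal sketch of Jonckheere's statistic with a large-sample normal approximation. The function names and example data are hypothetical, and the variance formula used here has no tie correction, so this is a simplification under stated assumptions rather than the procedure as run for the TRs.

```python
import math


def jonckheere_statistic(groups):
    """Jonckheere's J for groups ordered by increasing dose (control first).

    For every pair of groups (lower dose, higher dose), count the pairs of
    observations in which the higher-dose value is larger; ties count 0.5.
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j += 1.0
                    elif x == y:
                        j += 0.5
    return j


def jonckheere_z(groups):
    """Standardized J using the no-ties mean and variance under the null."""
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    return (jonckheere_statistic(groups) - mean) / math.sqrt(var)


# Hypothetical measurements for control, low-dose, and high-dose groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
j_stat = jonckheere_statistic(groups)
z = jonckheere_z(groups)
```

A large positive standardized value points to an increasing dose-related trend, the situation in which a trend-sensitive pairwise test (Williams' or Shirley's) would be preferred over Dunnett's or Dunn's test.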

Statistical analysis of immunological data uses a tiered approach (Luster et al., 1988, 1992). Data are initially tested for homogeneity of variance using Bartlett's chi-square test. For data that are determined to be homogeneous, a one-way analysis of variance (ANOVA) is conducted. If the ANOVA is significant at p < 0.05, Dunnett's multiple range t test is used for multiple treatment-control comparisons. If the data are not homogeneous, nonparametric analysis of variance, the Kruskal-Wallis test, or the Wilcoxon rank sum test is used to compare treatment groups with control groups. Statistical significance is reported at the p < 0.05 and p < 0.01 levels. Values are routinely presented as mean ± standard error. For host-resistance data, chi-square analysis or log linear models are used to determine chemical treatment effects on mortality. For comparisons of group survival times, the product limit estimator is used in conjunction with the Mantel-Cox test.
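The tiered decision flow for immunological data can be summarized as a small dispatcher. This sketch only encodes the branching; it takes already-computed p-values as inputs and does not run the tests themselves. The function name, the returned labels, and the use of 0.05 as the cutoff for Bartlett's test are assumptions, since the text does not state the threshold used for the homogeneity test.

```python
def choose_immunology_test(bartlett_p, anova_p=None, alpha=0.05):
    """Return the next step in the tiered analysis.

    bartlett_p: p-value from Bartlett's test for homogeneity of variance.
    anova_p: p-value from the one-way ANOVA, if it has been run.
    alpha: significance cutoff (0.05 for the ANOVA per the text; applying
        the same cutoff to Bartlett's test is an assumption).
    """
    if bartlett_p < alpha:
        # Variances differ across groups: use a rank-based method.
        return "nonparametric: Kruskal-Wallis or Wilcoxon rank sum"
    if anova_p is None:
        return "one-way ANOVA"
    if anova_p < alpha:
        return "Dunnett's test for treatment-control comparisons"
    return "no pairwise comparisons: ANOVA not significant"


# Hypothetical p-values for three endpoints.
step_a = choose_immunology_test(bartlett_p=0.40)
step_b = choose_immunology_test(bartlett_p=0.40, anova_p=0.01)
step_c = choose_immunology_test(bartlett_p=0.002)
```

Keeping the branching separate from the test computations makes it easy to check the flow against the text before plugging in a statistics library of choice.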

## References

- Bailer, A.J., and Portier, C.J. (1988). Effects of treatment-induced mortality and tumor-induced mortality on tests for carcinogenicity in small samples. *Biometrics* 44, 417-431.
- Bieler, G.S., and Williams, R.L. (1993). Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. *Biometrics* 49, 793-801.
- Cox, D.R. (1972). Regression models and life-tables. *J. R. Stat. Soc.* B34, 187-220.
- Dixon, W.J., and Massey, F.J., Jr. (1951). *Introduction to Statistical Analysis*, 1st ed., pp. 145-147.
- Dunn, O.J. (1964). Multiple comparisons using rank sums. *Technometrics* 6, 241-252.
- Dunnett, C.W. (1955). A multiple comparison procedure for comparing several treatments with a control. *J. Am. Stat. Assoc.* 50, 1096-1121.
- Hollander, M., and Wolfe, D.A. (1973). *Nonparametric Statistical Methods*, pp. 120-123. John Wiley and Sons, New York.
- Jonckheere, A.R. (1954). A distribution-free *k*-sample test against ordered alternatives. *Biometrika* 41, 133-145.
- Kaplan, E.L., and Meier, P. (1958). Nonparametric estimation from incomplete observations. *J. Am. Stat. Assoc.* 53, 457-481.
- Morrison, D.F. (1976). *Multivariate Statistical Methods*, 2nd ed., pp. 170-179.
- Piegorsch, W.W., and Bailer, A.J. (1997). *Statistics for Environmental Biology and Toxicology*, Section 6.3.2. Chapman and Hall, London.
- Portier, C.J., and Bailer, A.J. (1989). Testing for increased carcinogenicity using a survival-adjusted quantal response test. *Fundam. Appl. Toxicol.* 12, 731-737.
- Portier, C.J., Hedges, J.C., and Hoel, D.G. (1986). Age-specific models of mortality and tumor onset for historical control animals in the National Toxicology Program's carcinogenicity experiments. *Cancer Res.* 46, 4372-4378.
- Shirley, E. (1977). A non-parametric equivalent of Williams' test for contrasting increasing dose levels of a treatment. *Biometrics* 33, 386-389.
- Tarone, R.E. (1975). Tests for trend in life table analysis. *Biometrika* 62, 679-682.
- Williams, D.A. (1971). A test for differences between treatment means when several dose levels are compared with a zero dose control. *Biometrics* 27, 103-117.
- Williams, D.A. (1972). The comparison of several dose levels with a zero dose control. *Biometrics* 28, 519-531.
- Williams, D.A. (1986). A note on Shirley's nonparametric test for comparing several dose levels with a zero-dose control. *Biometrics* 42, 182-186.