Posted on 2016-01-13, 00:00. Authored by J. Ren, Z. Ning, C.S. Kirkness, C.S. Asche, H. Wang
Background: In most biological experiments, especially those on infectious disease, the exposure-response
relationship is shaped by many interrelated factors rather than by independent ones. Logistic regression models
that assess the likelihood of an infectious disease as a function of a risk or exposure have used ordinary (raw)
exposures, categorical exposures, and logarithmically transformed exposures, but little is known about the
suitability of each. This study aims to examine and compare these three approaches.
Methods: A simulated human immunodeficiency virus (HIV) population was created: dynamic infection data for 100,000
individuals with 1% initial prevalence and 2% infectivity. The Monte Carlo method (a computational algorithm that
repeats random sampling to obtain numerical results) was used to examine, for each of the three model approaches,
the linearity between log odds and exposure and the suitability in practice.
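A minimal sketch of the kind of Monte Carlo sampling described above, under a deliberately simplified static model that is an assumption here and not the study's dynamic simulation: each exposure involves an infected partner with probability equal to the prevalence, and an infected partner transmits with probability equal to the infectivity. The function names are hypothetical.

```python
import random


def infection_probability(n_exposures, prevalence=0.01, infectivity=0.02):
    """Probability of at least one transmission in n independent exposures.

    Assumed model (not taken from the study): every exposure carries the
    same risk, prevalence * infectivity, independently of the others.
    """
    per_exposure_risk = prevalence * infectivity
    return 1.0 - (1.0 - per_exposure_risk) ** n_exposures


def simulate_infection_rate(n_people, n_exposures, rng):
    """Monte Carlo estimate of the infected fraction at one exposure level.

    Each person's status is drawn from the closed-form probability above
    rather than by simulating every single exposure.
    """
    p = infection_probability(n_exposures)
    infected = sum(1 for _ in range(n_people) if rng.random() < p)
    return infected / n_people


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
for n in (10, 100, 1000):
    estimate = simulate_infection_rate(100_000, n, rng)
    print(n, round(infection_probability(n), 4), round(estimate, 4))
```

Each printed row gives the exposure count, the closed-form probability, and the sampled estimate, so the sampling error at 100,000 individuals can be read off directly.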
Results: Across diverse population prevalences, the relationship between log odds and raw exposures was not
linear. Logarithmic transformation of exposures improved the linearity to a certain extent, and categorical
exposures satisfied the linearity assumption (which is important for modelling). When the population prevalence
was low (assumed < 10%), the performances of the three models differed significantly. Compared with ordinary
logistic regression, the logarithmic transformation approach estimated more accurately except at the two
inflection points, where the likelihood of infection changed from increasing slowly to increasing sharply and
then to increasing slowly again. The approach using categorical exposures produced estimates closer to the real
values, but the measurement was coarse due to categorization.
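The linearity comparison can be illustrated with a small deterministic sketch. It assumes the same simplified risk model as a stand-in for the study's simulation (per-exposure risk = 1% prevalence x 2% infectivity, independent exposures) and measures how well a straight line fits the true log odds against the raw and the log-transformed exposure. The exposure range, the cut points, and all names are hypothetical.

```python
import math


def log_odds(p):
    """Logit of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def r_squared(xs, ys):
    """R^2 of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Assumed per-exposure risk: 1% prevalence x 2% infectivity.
RISK = 0.01 * 0.02
exposures = list(range(1, 501))  # hypothetical range of exposure counts
true_log_odds = [log_odds(1.0 - (1.0 - RISK) ** n) for n in exposures]

# Straight-line fit quality for the raw and the log-transformed exposure.
r2_raw = r_squared(exposures, true_log_odds)
r2_log = r_squared([math.log(n) for n in exposures], true_log_odds)


def category(n, cut_points=(10, 50, 200)):
    """Index of the exposure category; the cut points are hypothetical."""
    return sum(n > c for c in cut_points)


print("R^2 raw exposure:", round(r2_raw, 3))
print("R^2 log exposure:", round(r2_log, 3))
print("categories of 5, 30, 400:", category(5), category(30), category(400))
```

Under these assumptions the log-transformed exposure tracks the log odds much more closely than the raw exposure. A model with one indicator per category estimates a separate log odds for each category, so it imposes no linear shape across categories, at the cost of the coarser measurement noted above.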
Conclusions: Ordinary logistic regression is not suitable for directly exploring the exposure-response relationship of
HIV as an infectious disease. This study offers the following recommendations for practice: 1) use categorical
exposure when the sample size is large and the population prevalence is low; 2) use a logarithmically transformed
exposure when the sample size is insufficient or the population prevalence is too high (such as 30%).
Keywords: Logistic regression, Measurement error, Infectious disease, Exposure-response relationship, Computer simulation