Risk of using logistic regression to illustrate exposure-response relationship of infectious diseases

Background: In most biological experiments, especially those involving infectious disease, the exposure-response relationship is shaped by a multitude of interrelated factors rather than by many independent ones. Little is known about the suitability of three approaches used in logistic regression models to assess the likelihood of an infectious disease as a function of a risk or exposure: ordinary (raw) exposure, categorical exposure, and logarithmically transformed exposure. This study aims to examine and compare these approaches.

Methods: A simulated human immunodeficiency virus (HIV) population was created, comprising dynamic infection data for 100,000 individuals with 1% initial prevalence and 2% infectivity. The Monte Carlo method (a computational algorithm that relies on repeated random sampling to obtain numerical results) was used to examine, for each of the three modelling approaches, the linearity between log odds and exposure and the suitability in practice.

Results: Across diverse population prevalences, linearity between the log odds and the raw exposure was not satisfied. Logarithmic transformation of the exposure improved linearity to a certain extent, and categorical exposure satisfied the linearity assumption, which is important for modelling. When the population prevalence was low (assumed < 10%), the performances of the three models were significantly different. Compared with ordinary logistic regression, the logarithmic transformation approach estimated more accurately, except at the two inflection points where the increase in the likelihood of infection changed from slow to sharp and then back to slow. The categorical-exposure approach produced estimates closer to the true values, but the measurement was coarse because of the categorization.

Conclusions: Ordinary logistic regression is not suitable for directly exploring the exposure-response relationship of HIV as an infectious disease. This study provides some recommendations for practical implementation: 1) use categorical exposure when the sample size is large and the population prevalence is low; 2) use a logarithmically transformed exposure when the sample size is insufficient or the population prevalence is too high (such as 30%).

Keywords: Logistic regression, Measurement error, Infectious disease, Exposure-response relationship, Computer simulation
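To make the linearity question concrete, the following is a minimal sketch rather than the simulation used in this study. It assumes a simple infection model in which each exposure independently transmits with probability equal to prevalence times infectivity (0.01 × 0.02), and it assumes exposures drawn uniformly from 1 to 1,000; the function names, the exposure range, and the number of categories are all hypothetical choices made for illustration. The sketch compares how well a straight line describes the log odds of infection as a function of the raw exposure versus the log-transformed exposure, first for the assumed true probabilities and then for a Monte Carlo sample summarised by exposure category.

```python
import numpy as np

PREVALENCE = 0.01   # initial prevalence, as stated in the abstract
INFECTIVITY = 0.02  # infectivity, as stated in the abstract


def infection_probability(n_exposures, prevalence=PREVALENCE, infectivity=INFECTIVITY):
    """ASSUMED model (not taken from the paper): each exposure independently
    transmits with probability prevalence * infectivity."""
    per_exposure_risk = prevalence * infectivity
    return 1.0 - (1.0 - per_exposure_risk) ** np.asarray(n_exposures, dtype=float)


def log_odds(p):
    """Logit transform of a probability (or array of probabilities)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def r_squared(x, y):
    """R^2 of a straight-line least-squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()


def simulate_binned_log_odds(n_people=100_000, max_exposures=1_000, n_bins=10, seed=0):
    """Monte Carlo draw of infection outcomes, summarised per exposure category.

    Returns the mean exposure and the empirical log odds of infection in each
    of n_bins equal-width exposure categories (hypothetical design choices).
    """
    rng = np.random.default_rng(seed)
    exposures = rng.integers(1, max_exposures + 1, size=n_people)
    infected = rng.random(n_people) < infection_probability(exposures)

    # Upper edges of the categories: (0, 100], (100, 200], ..., (900, 1000].
    upper_edges = np.linspace(0, max_exposures, n_bins + 1)[1:]
    category = np.digitize(exposures, upper_edges, right=True)

    mean_exposure = np.array([exposures[category == b].mean() for b in range(n_bins)])
    infection_rate = np.array([infected[category == b].mean() for b in range(n_bins)])
    return mean_exposure, log_odds(infection_rate)


# Linearity of the assumed true log odds against raw and log exposure.
grid = np.arange(1, 1001)
true_logit = log_odds(infection_probability(grid))
r2_raw = r_squared(grid, true_logit)
r2_log = r_squared(np.log(grid), true_logit)
print(f"true log odds:      R^2 raw = {r2_raw:.3f}, R^2 log = {r2_log:.3f}")

# Same comparison on simulated data summarised by exposure category.
bin_means, bin_logit = simulate_binned_log_odds()
sim_r2_raw = r_squared(bin_means, bin_logit)
sim_r2_log = r_squared(np.log(bin_means), bin_logit)
print(f"simulated log odds: R^2 raw = {sim_r2_raw:.3f}, R^2 log = {sim_r2_log:.3f}")
```

Under these assumptions the cumulative infection risk stays low over the whole exposure range, so the log odds behave approximately like the logarithm of the exposure, and the straight-line fit is much closer on the log scale than on the raw scale. The sketch does not reproduce the dynamic infection process, the high-prevalence settings, or the model comparisons reported above.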