posted on 2024-05-01, 00:00 authored by Ezra Daniel Becker
When a researcher wants to evaluate the efficacy of a treatment on a set of consumers, the preferred approach, whenever possible, is to first identify a group of potential test subjects and then assign those subjects to treatment and control groups through a randomized process before the experiment takes place. This tried-and-true method of random assignment has been a mainstay of experimental design since Ronald Fisher’s work in the 1930s, and it is essential for washing out the effects of unknown or otherwise uncontrollable sources of variation. However, researchers are often given a set of subjects who have already received a treatment, with no knowledge of how those subjects were selected for the treatment and with no control group provided. These researchers must then establish an appropriate benchmark for evaluating the treatment's effects in settings where they had no say in how the data were generated, or where assembling such a benchmark a priori is impractical, unethical, or impossible.
The idea that some analytical questions do not lend themselves to Fisher's approach to experimental design is not new, and researchers have already used a variety of methods to tackle this challenge. Several such methods are presented here, along with a general discussion of their benefits and limitations. Also presented is a new approach to post hoc control group selection based on stratified random sampling along the principal components of the given treatment group. The main contribution of this research is the extension of a recently developed, novel subdata selection algorithm into a nearly optimal control group selection method, called the “NOD” method, that generates highly efficient designs across diverse scenarios with respect to the A-optimality and D-optimality criteria and, in some cases, E-optimality. Importantly, an E-optimality equivalence theorem for constrained designs is also presented that can provide value beyond the immediate context of post hoc control group selection.
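The A-, D-, and E-optimality criteria named above are standard functions of a design's information matrix. As a minimal sketch, assume a main-effects linear model with an intercept, a treatment indicator, and the subjects' covariates, so that the information matrix is proportional to XᵀX. A-optimality then minimizes trace((XᵀX)⁻¹), D-optimality maximizes det(XᵀX), and E-optimality maximizes the smallest eigenvalue of XᵀX. The model and the helper names `build_design` and `design_criteria` are hypothetical assumptions for illustration. This is not the NOD algorithm itself, only a way to score any candidate control group against a fixed treatment group.

```python
import numpy as np


def build_design(treated, control):
    """Stack treated and control covariate rows into a model matrix.

    Columns (hypothetical main-effects model): intercept, treatment
    indicator (1 = treated, 0 = control), then the covariates.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    indicator = np.concatenate([np.ones(len(treated)), np.zeros(len(control))])
    intercept = np.ones(len(indicator))
    return np.column_stack([intercept, indicator, np.vstack([treated, control])])


def design_criteria(X):
    """Score a model matrix X by the A-, D- and E-criteria of M = X^T X."""
    M = X.T @ X
    sign, logdet = np.linalg.slogdet(M)  # log-determinant avoids overflow
    if sign <= 0:
        raise ValueError("information matrix is singular; parameters not estimable")
    return {
        "A": float(np.trace(np.linalg.inv(M))),   # smaller is better
        "logD": float(logdet),                    # larger is better
        "E": float(np.linalg.eigvalsh(M).min()),  # larger is better
    }
```

With `treated` holding the treatment group's covariates and `control` a candidate control group drawn from some pool, `design_criteria(build_design(treated, control))` scores that candidate, so competing selection methods can be compared on the same three numbers. The log-determinant is reported instead of the determinant because the latter easily overflows for large groups.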
Extensive simulations demonstrate that the NOD method generates control groups that offer substantially better efficiency in parameter estimation than other post hoc control group selection methods, and this improvement holds across a diverse range of conditions. Moreover, the NOD method's efficiency advantage over the other post hoc control group selection methods generally grows as the complexity of the simulated scenario increases.
This research further evaluates various post hoc control group selection methods in estimating the weight-loss effect of the drug Semaglutide using a robust Electronic Health Records dataset. The theory-based (Rao-Cramér) lower bounds on the variances of the estimators derived from each control group selection method indicate that the bound achieved by the NOD selection method is clearly superior. This real-world analysis also demonstrates both the strength and the vulnerability of the Expert Opinion method of post hoc control group selection. This work concludes with a discussion of the diverse avenues along which the concept of robust post hoc control group selection can be extended and improved.
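To make the variance bound concrete: under a Gaussian linear model with known error variance σ², the Rao-Cramér lower bound for any unbiased estimator of the treatment coefficient τ is σ²[(XᵀX)⁻¹]_ττ, and ordinary least squares attains it. The sketch below computes that bound for a given treatment group and candidate control group. The model y = b0 + τz + xᵀb + e, the column layout, and the function name `treatment_effect_bound` are hypothetical assumptions; the abstract does not state the model used in the Semaglutide analysis.

```python
import numpy as np


def treatment_effect_bound(treated, control, sigma2=1.0):
    """Rao-Cramer lower bound on Var(tau_hat) in the hypothetical model
    y = b0 + tau * z + x'b + e, e ~ N(0, sigma2), with z = 1 for treated."""
    treated = np.atleast_2d(np.asarray(treated, dtype=float))
    control = np.atleast_2d(np.asarray(control, dtype=float))
    n_t, n_c = len(treated), len(control)
    X = np.column_stack([
        np.ones(n_t + n_c),                  # intercept
        np.r_[np.ones(n_t), np.zeros(n_c)],  # treatment indicator z
        np.vstack([treated, control]),       # covariates
    ])
    info = X.T @ X / sigma2                  # Fisher information for the coefficients
    return float(np.linalg.inv(info)[1, 1])  # entry for tau (column 1)
```

With no covariates the bound reduces to σ²(1/n_t + 1/n_c), the familiar variance of a difference in means. Adding covariates can only keep or raise the bound for a fixed pair of groups, which is why the choice of control rows, and not only their number, matters.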
History
Advisor
Min Yang
Department
Mathematics, Statistics, and Computer Science
Degree Grantor
University of Illinois Chicago
Degree Level
Doctoral
Degree name
PhD, Doctor of Philosophy
Committee Member
Dibyen Majumdar, Jie Yang, Ping-Shou Zhong, Andrew Boyd