Posted on 2022-05-01, 00:00. Authored by Danielle Colette Tucker.
In this dissertation, I present the results of four collaborative research projects that build on the work of Alexander Petersen and Hans-Georg M\"uller in their paper ``Fr\'echet regression for random objects with Euclidean predictors.'' In Chapter 1, I introduce their work, including both the global and local Fr\'echet regression models, and motivate research in this area with a real data example.
Fr\'echet regression was developed as a generic way to model non-Euclidean data, such as distributions, networks, and manifolds, which are now more available than ever. Specifically, global Fr\'echet regression is an extension of linear regression that handles non-Euclidean, metric space-valued responses modeled by Euclidean predictors. Variable selection is important for any regression model with multiple predictors, but as of 2019 it had not yet been explored for Fr\'echet regression. Because the responses come from generic metric spaces, Fr\'echet regression models avoid the use of model parameters; this lack of parameters complicates extending the variable selection methods that exist for linear regression to global Fr\'echet regression. In Chapter 2, I present research that addresses this challenge and develops a novel variable selection method with good performance. I share theoretical support for this method, which we call FRiSO, including its selection consistency. I also demonstrate FRiSO's finite sample performance via simulated and real data illustrations.
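To make the link to linear regression concrete, the following is a minimal sketch, not code from the dissertation. It assumes the sample weight function of Petersen and M\"uller's global model, s_i(x) = 1 + (X_i - \bar{X})^T \hat{\Sigma}^{-1} (x - \bar{X}), and specializes the response space to the real line with the usual distance, where the weighted Fr\'echet mean is simply a weighted average. The function names and the simulated data are hypothetical and chosen only for illustration; the sketch does not implement FRiSO.

```python
import numpy as np

def global_frechet_weights(X, x):
    """Sample weights s_i(x) = 1 + (X_i - Xbar)^T Sigma^{-1} (x - Xbar).

    Hypothetical helper; Sigma is the (1/n)-normalized sample covariance.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]
    return 1.0 + Xc @ np.linalg.solve(Sigma, x - mu)

def euclidean_frechet_fit(X, Y, x):
    """Weighted Frechet mean of scalar responses under the Euclidean metric.

    Minimizing sum_i s_i * (Y_i - w)^2 over w gives the weighted average.
    """
    s = global_frechet_weights(X, x)
    return s @ Y / s.sum()

def ols_fit(X, Y, x):
    """Ordinary least squares prediction with an intercept, for comparison."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return np.concatenate([[1.0], x]) @ coef

# Simulated data (illustrative only): the second predictor is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=200)
x_new = np.array([0.5, -0.3, 1.2])

frechet_pred = euclidean_frechet_fit(X, Y, x_new)
ols_pred = ols_fit(X, Y, x_new)
print(np.isclose(frechet_pred, ols_pred))
```

In this Euclidean special case the two predictions agree up to floating-point error, which is the sense in which the global model extends linear regression. For a general metric space the weighted average is replaced by a minimization over the response space, and no coefficient vector is available to penalize.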
To further motivate the use of Fr\'echet regression models and to illustrate the performance of FRiSO, in Chapter 3, I provide the results of a collaboration utilizing fMRI data as a response. In the past, regression methods have been widely utilized to determine how clinical and behavioral variables affect brain networks. Traditional methods, such as linear regression, must extract a scalar summary from the highly complex brain network and model the dependency of this simplified response on the clinical and behavioral measures. However, these scalar representations cannot fully capture the network's dynamics and thus are not optimal for further analyses and inference. Therefore, in this research project, we implemented global Fr\'echet regression instead and utilized FRiSO to study the dependency of the brain networks as a whole on various clinical and behavioral variables jointly. Our preliminary results indicate that the interaction of age and sex plays an important role in such a model.
To generalize Fr\'echet regression even further and allow for more complex analyses in the future, in Chapter 4, I propose a partially-global Fr\'echet regression model, developed in collaboration with Yichao Wu. This model extends the profiling technique for the partially linear regression model. It not only allows for the response to come from a generic metric space, but it also can incorporate a predictor which comes from another generic metric space in combination with a set of Euclidean predictors. By melding together local and global Fr\'echet regression, we get a model that is more flexible than global Fr\'echet regression but more accurate than local Fr\'echet regression when the data generating process includes a non-Euclidean predictor or is ``global (linear)" for scalar predictors. Once again, I share theoretical support for partially-global Fr\'echet regression and demonstrate its good performance for both simulated and real data.
In Chapter 5, I address the need for a variable selection method for the newly developed partially-global Fr\'echet regression model. The method, which is referred to as PG-FRSO, extends a unified approach to variable selection for the partially linear model. PG-FRSO is designed to select important predictors from a set containing both Euclidean and non-Euclidean covariates. The non-Euclidean predictors and the non-Euclidean response may each come from a different metric space. This research is ongoing, so I can only provide the framework for the methodology and a preliminary simulation study. However, in collaboration with Yushen Dong and Yichao Wu, I hope to eventually verify that the method satisfies selection consistency.
Finally, in Chapter 6, I share my ideas for future work in the area of Fr\'echet regression. I focus on three topics: 1) the choice of metric for the non-Euclidean response, 2) the development of statistical significance and inference in Fr\'echet regression models, and 3) further model generalization and flexibility. Each of these topics is practical in nature and is motivated by the applied research collaborations I have had the privilege to join.