Professor (Research), Stanford University, Stanford, California, United States
Abstract Text: MOFA and Stabl are recent algorithms for analyzing complex, heterogeneous datasets. Each isolates the key biological features that explain variability and outcomes among the subjects in a dataset. In this project, MOFA and Stabl were applied to pertussis vaccine data (the CMI-PB dataset) to predict the rankings of biological feature values at specific timepoints after booster vaccination in 2023, using data from 2020-2022. The datasets included cell frequencies, gene expression, cytokine and chemokine concentrations, antigen-specific antibody measurements, and T cell polarization and activation. MOFA used dimensionality reduction and ignored missing values to compute latent factors with weighted features, while Stabl filtered out features with a high proportion of missing values and applied a k-nearest-neighbors (KNN) imputer before finding the most prominent features across models with different regularization strengths. Both effectively predicted IgG antibody levels against pertussis toxin on Day 14: MOFA (Spearman correlation ρ = 0.387) identified the baseline value, HSP90AB1, and RIPOR2 as key features, and Stabl (ρ = 0.384) used CHI3L2, USF1, and interleukin-6. Additionally, Stabl (ρ = 0.172) performed especially well in predicting monocyte frequency on Day 1, using monocytes and classical monocytes as features. Finally, MOFA (ρ = 0.371) excelled at the difficult task of predicting the Th1/Th2 (IFN-γ/IL-5) polarization ratio on Day 30, leveraging the baseline value and IgG against filamentous hemagglutinin (both of which Stabl also highlighted), along with interleukin-17A. Overall, our exploration provides guidance on, and demonstrates the benefits of, using MOFA and Stabl to find the most predictive cell subsets and features in large immunological multi-omics datasets.
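Because the prediction tasks are scored on rankings, the reported ρ values are Spearman correlations between predicted and observed values. The following is a minimal sketch of that metric, not the challenge's actual scoring code, which is not described here. The function names (`average_ranks`, `spearman`) and the five-subject example values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(predicted, observed):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, ro = average_ranks(predicted), average_ranks(observed)
    mp, mo = mean(rp), mean(ro)
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_o = sum((b - mo) ** 2 for b in ro)
    if var_p == 0 or var_o == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_p * var_o) ** 0.5


# Hypothetical model scores and measured values for five subjects
# (illustrative numbers only, not taken from the CMI-PB data).
predicted = [2.1, 0.4, 3.3, 1.8, 2.9]
observed = [150.0, 40.0, 310.0, 260.0, 200.0]
rho = spearman(predicted, observed)  # approximately 0.7 for these values
```

Only the ordering of subjects matters to this metric, so a model can score well even when its predicted values are on a different scale from the measurements.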