Th182 - Testing of Multi-Omics System Vaccinology Approaches on Pertussis Vaccine Dataset

Thursday, June 26, 2025

7:30am - 7:45pm East Coast USA Time

Aanya Gupta, Intern

Stanford University
San Jose, California, United States

Holden T. Maecker, Dr., PhD

Professor (Research)
Stanford University
Stanford, California, United States

Abstract Text: The recently developed algorithms MOFA and Stabl address the difficulty of analyzing complex, heterogeneous datasets by isolating the key biological features that explain variability among subjects and their outcomes. In this project, MOFA and Stabl were applied to pertussis vaccine data (the CMI-PB dataset) to predict the rankings of values for biological features at specific timepoints post-booster vaccination in 2023, using data from 2020-2022. The datasets included cell frequency, gene expression, cytokine and chemokine concentrations, antigen-specific antibody measurements, and T cell polarization and activation. MOFA used dimensionality reduction and ignored missing values to compute latent factors with weighted features, while Stabl filtered out features with a high proportion of missing values and applied a KNN imputer before identifying the most prominent features across models with different regularization strengths. Both effectively predicted IgG antibody levels against pertussis toxin on Day 14: MOFA (Spearman correlation ρ = 0.387) identified the baseline value, HSP90AB1, and RIPOR2 as key features, while Stabl (ρ = 0.384) used CHI3L2, USF1, and interleukin-6. Additionally, Stabl (ρ = 0.172) performed especially well in predicting the frequency of monocytes on Day 1, using monocytes and classical monocytes as features. Finally, MOFA (ρ = 0.371) excelled at the difficult task of predicting the Th1/Th2 (IFN-γ/IL-5) polarization ratio on Day 30, leveraging the baseline value, IgG against filamentous hemagglutinin (both of which Stabl also highlighted), and interleukin-17A. Overall, our exploration provides guidance on, and demonstrates the benefits of, using MOFA and Stabl to find the most predictive cell subsets and features for understanding large immunological multi-omics data.
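
The results above are scored by how well predicted rankings agree with observed rankings, summarized as a Spearman correlation. The abstract does not include the evaluation code, so the following is only a minimal sketch of how such a score could be computed. The function names and the example values are hypothetical and are not taken from the CMI-PB data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, observed):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rp = average_ranks(predicted)
    ro = average_ranks(observed)
    n = len(rp)
    mean_p = sum(rp) / n
    mean_o = sum(ro) / n
    cov = sum((a - mean_p) * (b - mean_o) for a, b in zip(rp, ro))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_o = sum((b - mean_o) ** 2 for b in ro)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


# Hypothetical example: model scores vs. measured values for four subjects.
model_scores = [0.2, 0.9, 0.5, 0.7]      # made-up predictions
measured_values = [1.0, 3.0, 2.5, 2.0]   # made-up observations
rho = spearman_rho(model_scores, measured_values)
```

Because only ranks enter the calculation, a model can score well without predicting the measured values on their original scale, which suits a task framed as predicting rankings.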