Health data and machine learning are increasingly used to tackle fundamental challenges in medicine, from better understanding disease trajectories to improving care services and devising new policies. As AI-supported decision-making makes its way into clinical practice, representation bias is of particular concern: groups from particular socioeconomic, racial, ethnic and religious backgrounds may be underrepresented in the health datasets used to build AI predictive models. Consequently, the resulting decisions might be discriminatory, with the potential to perpetuate existing health inequities. In response, several approaches have been developed to address health data poverty, including data augmentation. Data augmentation methods generate synthetic data for underrepresented groups that closely resemble real data, in order to mitigate representation bias. The machine learning community has developed various approaches to synthetic data generation, ranging from simple resampling methods, such as SMOTE, to recent neural-network-based methods such as generative adversarial networks (GANs). These methods have proven to work very well on imaging data; however, several research challenges remain in generating longitudinal, multivariate data. Furthermore, none of the existing approaches is designed specifically to target health data poverty. Therefore, the PhD work will focus on devising novel GAN architectures that can generate high-dimensional, longitudinal clinical data to mitigate representation bias. This work will consider several datasets containing patient data from underrepresented groups, such as US-based critical care datasets (MIMIC-IV and eICU-CRD) and European-based datasets, including the National Multimorbidity Resource provided by Health Data Research UK (HDRUK), Connected Bradford and UK Biobank.
The resulting methods will be data-agnostic and applicable to a wide range of datasets.