Work in Progress Seminar Series

WiP Seminar: Finding the Optimal Number of Splits in Double Cross-Fitting Targeted Maximum Likelihood Estimators

April 3 @ 12:00 pm - 1:00 pm

Ehsan Karim, PhD, M.Sc.
Scientist, Advancing Health
Assistant Professor, School of Population and Public Health, UBC

Flexible machine learning (ML) algorithms have become vital in epidemiological research, offering refined insights in real-data analyses. However, using highly flexible algorithms within doubly robust methods, such as the Targeted Maximum Likelihood Estimator (TMLE), complicates variance estimation, resulting in notable undercoverage of confidence intervals, a critical concern. The Double Cross-Fitting (DCF) method allows diverse ML estimators to be used while supporting asymptotically valid inference. Nonetheless, the DCF literature is unclear about the optimal number of data splits. This research explores how different numbers of DCF splits affect the performance of TMLE estimators, using statistical simulations and real-world data analysis. We generalize DCF beyond the traditional setup, varying the number of splits to optimize TMLE while employing a super learner. Our study examines configurations across different sample sizes and DCF generalizations, with real-world implications demonstrated using data from the National Health and Nutrition Examination Survey (NHANES), focusing on the risk of obesity and diabetes. The study emphasizes the importance of careful split selection in DCF TMLE methods for computational efficiency and accurate statistical inference, finding that three to five splits are optimal. It offers guidance to epidemiologists who use complex ML in causal studies and advocates prudent split management in DCF. This work is a collaboration with Momenul Haque Mondol, a trainee at the UBC School of Population and Public Health.

This is a virtual event; please register to receive the Zoom link.